* [PATCH v2 0/3] sched/fair: some fixes for asym_packing @ 2018-12-14 16:01 Vincent Guittot 2018-12-14 16:01 ` [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing Vincent Guittot ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Vincent Guittot @ 2018-12-14 16:01 UTC (permalink / raw) To: peterz, mingo, linux-kernel Cc: valentin.schneider, Morten.Rasmussen, Vincent Guittot During the review of the misfit task patchset, Morten and Valentin raised some problems with the use of the SD_ASYM_PACKING flag on asymmetric systems like the hikey960 arm64 big.LITTLE platform. The study of the use cases has shown some problems that can happen on every system that uses the flag. The 3 patches fix the problems raised for lmbench and for the rt-app UC that creates 2 tasks that start as small tasks and then suddenly become always-running tasks. (I can provide the rt-app json if needed) Changes since v1: - rebase on tip/sched/core - change asym_active_balance() as suggested by Peter Vincent Guittot (3): sched/fair: fix rounding issue for asym packing sched/fair: trigger asym_packing during idle load balance sched/fair: fix unnecessary increase of balance interval kernel/sched/fair.c | 33 +++++++++++++++++++++------------ 1 file changed, 21 insertions(+), 12 deletions(-) -- 2.7.4 ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing 2018-12-14 16:01 [PATCH v2 0/3] sched/fair: some fixes for asym_packing Vincent Guittot @ 2018-12-14 16:01 ` Vincent Guittot 2018-12-18 17:32 ` Valentin Schneider 2018-12-14 16:01 ` [PATCH v2 2/3] sched/fair: trigger asym_packing during idle load balance Vincent Guittot 2018-12-14 16:01 ` [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval Vincent Guittot 2 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-14 16:01 UTC (permalink / raw) To: peterz, mingo, linux-kernel Cc: valentin.schneider, Morten.Rasmussen, Vincent Guittot When check_asym_packing() is triggered, the imbalance is set to: busiest_stat.avg_load * busiest_stat.group_capacity / SCHED_CAPACITY_SCALE. busiest_stat.avg_load also comes from a division, and the final rounding can make the imbalance slightly lower than the weighted load of the cfs_rq. But this is enough to skip the rq in find_busiest_queue() and prevents the asym migration from happening. Add 1 to the avg_load to make sure that the targeted cpu will not be skipped unexpectedly. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> --- kernel/sched/fair.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ca46964..c215f7a 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8215,6 +8215,12 @@ static inline void update_sg_lb_stats(struct lb_env *env, /* Adjust by relative CPU capacity of the group */ sgs->group_capacity = group->sgc->capacity; sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; + /* + * Prevent division rounding to make the computation of imbalance + * slightly less than original value and to prevent the rq to be then + * selected as busiest queue + */ + sgs->avg_load += 1; if (sgs->sum_nr_running) sgs->load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running; -- 2.7.4 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing 2018-12-14 16:01 ` [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing Vincent Guittot @ 2018-12-18 17:32 ` Valentin Schneider 2018-12-19 8:32 ` Vincent Guittot 0 siblings, 1 reply; 27+ messages in thread From: Valentin Schneider @ 2018-12-18 17:32 UTC (permalink / raw) To: Vincent Guittot, peterz, mingo, linux-kernel; +Cc: Morten.Rasmussen On 14/12/2018 16:01, Vincent Guittot wrote: > When check_asym_packing() is triggered, the imbalance is set to : > busiest_stat.avg_load * busiest_stat.group_capacity / SCHED_CAPACITY_SCALE > busiest_stat.avg_load also comes from a division and the final rounding > can make imbalance slightly lower than the weighted load of the cfs_rq. > But this is enough to skip the rq in find_busiest_queue and prevents asym > migration to happen. > > Add 1 to the avg_load to make sure that the targeted cpu will not be > skipped unexpectidly. > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > --- > kernel/sched/fair.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index ca46964..c215f7a 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -8215,6 +8215,12 @@ static inline void update_sg_lb_stats(struct lb_env *env, > /* Adjust by relative CPU capacity of the group */ > sgs->group_capacity = group->sgc->capacity; > sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; > + /* > + * Prevent division rounding to make the computation of imbalance > + * slightly less than original value and to prevent the rq to be then > + * selected as busiest queue > + */ > + sgs->avg_load += 1; I've tried investigating this off-by-one issue by running lmbench, but it seems to be a gnarly one. The adventure starts in update_sg_lb_stats() where we compute sgs->avg_load: sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity Then we drop by find_busiest_group() and call check_asym_packing() which, if we do need to pack, does: env->imbalance = DIV_ROUND_CLOSEST( sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity, SCHED_CAPACITY_SCALE); And finally the one check that seems to be failing, and that you're addressing with a +1 is in find_busiest_queue(): if (rq->nr_running == 1 && wl > env->imbalance && !check_cpu_capacity(rq, env->sd)) continue; Now, running lmbench on my HiKey960 hacked up to use asym packing, I get an example where there's a task that should be migrated via asym packing but isn't: # This a DIE level balance update_sg_lb_stats(): cpu=0 load=1 cpu=1 load=1023 cpu=2 load=0 cpu=3 load=0 avg_load=568 # Busiest group is [0-3] find_busiest_group(): check_asym_packing(): env->imbalance = 1022 find_busiest_queue(): cpu=0 wl=1 cpu=1 wl=1023 cpu=2 wl=0 cpu=3 wl=0 Only CPU1 has rq->nr_running > 0 (==1 actually), but wl > env->imbalance so we skip it in find_busiest_queue(). Now, I thought it was due to IRQ/rt pressure - 1840 isn't the maximum group capacity for the LITTLE cluster, it should be (463 * 4) == 1852. I got curious and threw random group capacity values in that equation while keeping the same avg_load [1], and it gets wild: you have a signal that oscillates between 1024 and 1014! [1]: https://gist.github.com/valschneider/f42252e83be943d276b901f57734b8ba Adding +1 to avg_load doesn't solve those pesky rounding errors that get worse as group_capacity grows, but it reverses the trend: the signal oscillates above 1024. 
So in the end a +1 *could* "fix" this, but a) It'd make more sense to have it in the check_asym_packing() code b) It's ugly and it makes me feel like this piece of code is flimsy AF. > > if (sgs->sum_nr_running) > sgs->load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running; > ^ permalink raw reply [flat|nested] 27+ messages in thread
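As an illustration of the rounding behaviour described in the review above, the following minimal userspace sketch (not part of the thread's patches; the DIV_ROUND_CLOSEST definition and the capacity range are assumptions) replays the two divisions from update_sg_lb_stats() and check_asym_packing() with the trace numbers (group_load = 1 + 1023, busiest wl = 1023). It shows the computed imbalance oscillating between 1022 and 1024 as group_capacity varies, and dropping below the busiest rq's load for some capacities, which is exactly when find_busiest_queue() skips that rq:

#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024UL
/* assumed to match the kernel macro for unsigned operands */
#define DIV_ROUND_CLOSEST(x, d)	(((x) + ((d) / 2)) / (d))

int main(void)
{
	unsigned long group_load = 1024;	/* per-cpu wl: 1 + 1023 + 0 + 0 */
	unsigned long wl_busiest = 1023;	/* load of the only rq with a task */
	unsigned long cap;

	/* sweep group_capacity around the values seen in the trace */
	for (cap = 1830; cap <= 1852; cap++) {
		/* update_sg_lb_stats(): first division */
		unsigned long avg_load = group_load * SCHED_CAPACITY_SCALE / cap;
		/* check_asym_packing(): second division */
		unsigned long imbalance =
			DIV_ROUND_CLOSEST(avg_load * cap, SCHED_CAPACITY_SCALE);

		printf("group_capacity=%lu imbalance=%lu%s\n", cap, imbalance,
		       wl_busiest > imbalance ? " -> busiest rq skipped" : "");
	}
	return 0;
}

Under this model, the +1 proposed in the patch moves the computed imbalance to the other side of the busiest load, while the simplification suggested later in the thread (using group_load directly) removes the intermediate divisions altogether.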
* Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing 2018-12-18 17:32 ` Valentin Schneider @ 2018-12-19 8:32 ` Vincent Guittot 2018-12-19 11:58 ` Valentin Schneider 0 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-19 8:32 UTC (permalink / raw) To: Valentin Schneider Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On Tue, 18 Dec 2018 at 18:32, Valentin Schneider <valentin.schneider@arm.com> wrote: > > On 14/12/2018 16:01, Vincent Guittot wrote: > > When check_asym_packing() is triggered, the imbalance is set to : > > busiest_stat.avg_load * busiest_stat.group_capacity / SCHED_CAPACITY_SCALE > > busiest_stat.avg_load also comes from a division and the final rounding > > can make imbalance slightly lower than the weighted load of the cfs_rq. > > But this is enough to skip the rq in find_busiest_queue and prevents asym > > migration to happen. > > > > Add 1 to the avg_load to make sure that the targeted cpu will not be > > skipped unexpectidly. > > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > > --- > > kernel/sched/fair.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index ca46964..c215f7a 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -8215,6 +8215,12 @@ static inline void update_sg_lb_stats(struct lb_env *env, > > /* Adjust by relative CPU capacity of the group */ > > sgs->group_capacity = group->sgc->capacity; > > sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; > > + /* > > + * Prevent division rounding to make the computation of imbalance > > + * slightly less than original value and to prevent the rq to be then > > + * selected as busiest queue > > + */ > > + sgs->avg_load += 1; > > I've tried investigating this off-by-one issue by running lmbench, but it > seems to be a gnarly one. > > > > The adventure starts in update_sg_lb_stats() where we compute > sgs->avg_load: > > sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity > > Then we drop by find_busiest_group() and call check_asym_packing() which, > if we do need to pack, does: > > env->imbalance = DIV_ROUND_CLOSEST( > sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity, > SCHED_CAPACITY_SCALE); > > And finally the one check that seems to be failing, and that you're > addressing with a +1 is in find_busiest_queue(): > > if (rq->nr_running == 1 && wl > env->imbalance && > !check_cpu_capacity(rq, env->sd)) > continue; > > > > Now, running lmbench on my HiKey960 hacked up to use asym packing, I > get an example where there's a task that should be migrated via asym > packing but isn't: > > # This a DIE level balance > update_sg_lb_stats(): > cpu=0 load=1 > cpu=1 load=1023 > cpu=2 load=0 > cpu=3 load=0 > > avg_load=568 > > # Busiest group is [0-3] > find_busiest_group(): check_asym_packing(): > env->imbalance = 1022 > > find_busiest_queue(): > cpu=0 wl=1 > cpu=1 wl=1023 > cpu=2 wl=0 > cpu=3 wl=0 > > Only CPU1 has rq->nr_running > 0 (==1 actually), but wl > env->imbalance > so we skip it in find_busiest_queue(). > > Now, I thought it was due to IRQ/rt pressure - 1840 isn't the maximum > group capacity for the LITTLE cluster, it should be (463 * 4) == 1852. > I got curious and threw random group capacity values in that equation while > keeping the same avg_load [1], and it gets wild: you have a signal that > oscillates between 1024 and 1014! hmm ... 
this does not seem to be related to $subject > > [1]: https://gist.github.com/valschneider/f42252e83be943d276b901f57734b8ba > > Adding +1 to avg_load doesn't solve those pesky rounding errors that get > worse as group_capacity grows, but it reverses the trend: the signal > oscillates above 1024. > > > > So in the end a +1 *could* "fix" this, but > a) It'd make more sense to have it in the check_asym_packing() code > b) It's ugly and it makes me feel like this piece of code is flimsy AF. This is another UC; asym packing is used at SMT level for now and we don't face this kind of problem there. It has also been tested on DynamiQ configurations, which are similar to SMT: 1 CPU per sched_group. The legacy b.L one was not the main target, although it can be one. The rounding error is there and should be fixed, and your UC is another case for legacy b.L that can be treated in another patch. > > > > > if (sgs->sum_nr_running) > > sgs->load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running; > > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing 2018-12-19 8:32 ` Vincent Guittot @ 2018-12-19 11:58 ` Valentin Schneider 2018-12-19 13:39 ` Vincent Guittot 0 siblings, 1 reply; 27+ messages in thread From: Valentin Schneider @ 2018-12-19 11:58 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On 19/12/2018 08:32, Vincent Guittot wrote: [...] > This is another UC, asym packing is used at SMT level for now and we > don't face this kind of problem, it has been also tested and DynamiQ > configuration which is similar to SMT : 1 CPU per sched_group > The legacy b.L one was not the main target although it can be > > The rounding error is there and should be fixed and your UC is another > case for legacy b.L that can be treated in another patch > I used that setup out of convenience for myself, but AFAICT that use-case just stresses that issue. In the end we still have an imbalance value that can vary with only slight group_capacity changes. And that's true even if that group contains a single CPU, as the imbalance value computed by check_asym_packing() can vary by +/-1 with very tiny capacity changes (rt/IRQ pressure). That means that sometimes you're off by one and don't do the packing because some CPU was pressured, but some other time you might do it because that CPU was *more* pressured. It's doesn't make sense. In the end problem is that in update_sg_lb_stats() we do: sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; and in check_asym_packing() we do: env->imbalance = DIV_ROUND_CLOSEST( sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity, SCHED_CAPACITY_SCALE) So we end up with something like: group_load * SCHED_CAPACITY_SCALE * group_capacity imbalance = -------------------------------------------------- group_capacity * SCHED_CAPACITY_SCALE Which you'd hope to reduce down to: imbalance = group_load but you get division errors all over the place. So actually, the fix we'd want would be: -----8<----- @@ -8463,9 +8463,7 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds) if (sched_asym_prefer(busiest_cpu, env->dst_cpu)) return 0; - env->imbalance = DIV_ROUND_CLOSEST( - sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity, - SCHED_CAPACITY_SCALE); + env->imbalance = sds->busiest_stat.group_load; return 1; } ----->8----- ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing 2018-12-19 11:58 ` Valentin Schneider @ 2018-12-19 13:39 ` Vincent Guittot 2018-12-19 14:59 ` Valentin Schneider 0 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-19 13:39 UTC (permalink / raw) To: Valentin Schneider Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On Wed, 19 Dec 2018 at 12:58, Valentin Schneider <valentin.schneider@arm.com> wrote: > > On 19/12/2018 08:32, Vincent Guittot wrote: > [...] > > This is another UC, asym packing is used at SMT level for now and we > > don't face this kind of problem, it has been also tested and DynamiQ > > configuration which is similar to SMT : 1 CPU per sched_group > > The legacy b.L one was not the main target although it can be > > > > The rounding error is there and should be fixed and your UC is another > > case for legacy b.L that can be treated in another patch > > > > I used that setup out of convenience for myself, but AFAICT that use-case > just stresses that issue. After looking at your UC in detail, your problem comes from the wl=1 for cpu0 whereas there is no running task. But wl!=0 without a running task should not be possible since fdf5f3 ("sched/fair: Disable LB_BIAS by default"). This explains the 1023 instead of 1022 > > In the end we still have an imbalance value that can vary with only > slight group_capacity changes. And that's true even if that group > contains a single CPU, as the imbalance value computed by > check_asym_packing() can vary by +/-1 with very tiny capacity changes > (rt/IRQ pressure). That means that sometimes you're off by one and don't > do the packing because some CPU was pressured, but some other time you > might do it because that CPU was *more* pressured. It's doesn't make sense. > > > In the end problem is that in update_sg_lb_stats() we do: > > sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; > > and in check_asym_packing() we do: > > env->imbalance = DIV_ROUND_CLOSEST( > sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity, > SCHED_CAPACITY_SCALE) > > So we end up with something like: > > group_load * SCHED_CAPACITY_SCALE * group_capacity > imbalance = -------------------------------------------------- > group_capacity * SCHED_CAPACITY_SCALE > > Which you'd hope to reduce down to: > > imbalance = group_load > > but you get division errors all over the place. So actually, the fix we'd > want would be: > > -----8<----- > @@ -8463,9 +8463,7 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds) > if (sched_asym_prefer(busiest_cpu, env->dst_cpu)) > return 0; > > - env->imbalance = DIV_ROUND_CLOSEST( > - sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity, > - SCHED_CAPACITY_SCALE); > + env->imbalance = sds->busiest_stat.group_load; > > return 1; > } > ----->8----- ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing 2018-12-19 13:39 ` Vincent Guittot @ 2018-12-19 14:59 ` Valentin Schneider 2018-12-19 15:05 ` Vincent Guittot 0 siblings, 1 reply; 27+ messages in thread From: Valentin Schneider @ 2018-12-19 14:59 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On 19/12/2018 13:39, Vincent Guittot wrote: [...] >> I used that setup out of convenience for myself, but AFAICT that use-case >> just stresses that issue. > > After looking at you UC in details, your problem comes from the wl=1 > for cpu0 whereas there is no running task. > But wl!=0 without running task should not be possible since fdf5f3 > ("wsched/fair: Disable LB_BIAS by default") > > This explain the 1023 instead of 1022 > > True, I had a look at the trace and there doesn't seem to be any running task on that CPU. That's a separate matter however - the rounding issues can happen regardless of the wl values. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing 2018-12-19 14:59 ` Valentin Schneider @ 2018-12-19 15:05 ` Vincent Guittot 2018-12-19 15:11 ` Valentin Schneider 0 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-19 15:05 UTC (permalink / raw) To: Valentin Schneider Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On Wed, 19 Dec 2018 at 15:59, Valentin Schneider <valentin.schneider@arm.com> wrote: > > > > On 19/12/2018 13:39, Vincent Guittot wrote: > [...] > >> I used that setup out of convenience for myself, but AFAICT that use-case > >> just stresses that issue. > > > > After looking at you UC in details, your problem comes from the wl=1 > > for cpu0 whereas there is no running task. > > But wl!=0 without running task should not be possible since fdf5f3 > > ("wsched/fair: Disable LB_BIAS by default") > > > > This explain the 1023 instead of 1022 > > > > > > True, I had a look at the trace and there doesn't seem to be any running > task on that CPU. That's a separate matter however - the rounding issues > can happen regardless of the wl values. But it means that the rounding fix +1 works and your problem comes from something else ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing 2018-12-19 15:05 ` Vincent Guittot @ 2018-12-19 15:11 ` Valentin Schneider 2018-12-19 15:20 ` Vincent Guittot 0 siblings, 1 reply; 27+ messages in thread From: Valentin Schneider @ 2018-12-19 15:11 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On 19/12/2018 15:05, Vincent Guittot wrote: [...] >> True, I had a look at the trace and there doesn't seem to be any running >> task on that CPU. That's a separate matter however - the rounding issues >> can happen regardless of the wl values. > > But it means that the rounding fix +1 works and your problems comes > from something else Oh yes, I never said it didn't work - I was doing some investigation on the reason as to why we'd need this fix, because it wasn't explicit from the commit message. The rounding errors are countered by the +1, yes, but I'd rather remove the errors altogether and go for the snippet I suggested in my previous reply. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing 2018-12-19 15:11 ` Valentin Schneider @ 2018-12-19 15:20 ` Vincent Guittot 2018-12-19 15:30 ` Valentin Schneider 0 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-19 15:20 UTC (permalink / raw) To: Valentin Schneider Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On Wed, 19 Dec 2018 at 16:11, Valentin Schneider <valentin.schneider@arm.com> wrote: > > On 19/12/2018 15:05, Vincent Guittot wrote: > [...] > >> True, I had a look at the trace and there doesn't seem to be any running > >> task on that CPU. That's a separate matter however - the rounding issues > >> can happen regardless of the wl values. > > > > But it means that the rounding fix +1 works and your problems comes > > from something else > > Oh yes, I never said it didn't work - I was doing some investigation on > the reason as to why we'd need this fix, because it's wasn't explicit from > the commit message. > > The rounding errors are countered by the +1, yes, but I'd rather remove > the errors altogether and go for the snippet I suggested in my previous > reply. except that you don't always want to migrate the whole group load. I prefer keeping the current algorithm and fixing it for now. Trying "new" things can come in a 2nd step ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing 2018-12-19 15:20 ` Vincent Guittot @ 2018-12-19 15:30 ` Valentin Schneider 2018-12-19 15:54 ` Vincent Guittot 0 siblings, 1 reply; 27+ messages in thread From: Valentin Schneider @ 2018-12-19 15:30 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On 19/12/2018 15:20, Vincent Guittot wrote: [...] >> Oh yes, I never said it didn't work - I was doing some investigation on >> the reason as to why we'd need this fix, because it's wasn't explicit from >> the commit message. >> >> The rounding errors are countered by the +1, yes, but I'd rather remove >> the errors altogether and go for the snippet I suggested in my previous >> reply. > > except that you don't always want to migrate all group load. > I prefer keeping current algorithm and fix it for now. Trying "new" > thing can come in a 2nd step We already set the imbalance as the whole group load, we just introduce rounding errors inbetween. As I've already said: in update_sg_lb_stats() we do: sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; and in check_asym_packing() we do: env->imbalance = DIV_ROUND_CLOSEST( sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity, SCHED_CAPACITY_SCALE) So we end up with something like: group_load * SCHED_CAPACITY_SCALE * group_capacity imbalance = -------------------------------------------------- group_capacity * SCHED_CAPACITY_SCALE Which we could reduce down to: imbalance = group_load and not get any rounding errors. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing 2018-12-19 15:30 ` Valentin Schneider @ 2018-12-19 15:54 ` Vincent Guittot 0 siblings, 0 replies; 27+ messages in thread From: Vincent Guittot @ 2018-12-19 15:54 UTC (permalink / raw) To: Valentin Schneider Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On Wed, 19 Dec 2018 at 16:30, Valentin Schneider <valentin.schneider@arm.com> wrote: > > On 19/12/2018 15:20, Vincent Guittot wrote: > [...] > >> Oh yes, I never said it didn't work - I was doing some investigation on > >> the reason as to why we'd need this fix, because it's wasn't explicit from > >> the commit message. > >> > >> The rounding errors are countered by the +1, yes, but I'd rather remove > >> the errors altogether and go for the snippet I suggested in my previous > >> reply. > > > > except that you don't always want to migrate all group load. > > I prefer keeping current algorithm and fix it for now. Trying "new" > > thing can come in a 2nd step > > We already set the imbalance as the whole group load, we just introduce > rounding errors inbetween. As I've already said: > > in update_sg_lb_stats() we do: > > sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; > > and in check_asym_packing() we do: > > env->imbalance = DIV_ROUND_CLOSEST( > sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity, > SCHED_CAPACITY_SCALE) > > So we end up with something like: > > group_load * SCHED_CAPACITY_SCALE * group_capacity > imbalance = -------------------------------------------------- > group_capacity * SCHED_CAPACITY_SCALE > > Which we could reduce down to: > > imbalance = group_load ah yes, i thought the nr_running was involved but it's not the case. This looks better indeed > > and not get any rounding errors. ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 2/3] sched/fair: trigger asym_packing during idle load balance 2018-12-14 16:01 [PATCH v2 0/3] sched/fair: some fixes for asym_packing Vincent Guittot 2018-12-14 16:01 ` [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing Vincent Guittot @ 2018-12-14 16:01 ` Vincent Guittot 2018-12-17 16:59 ` Valentin Schneider 2018-12-14 16:01 ` [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval Vincent Guittot 2 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-14 16:01 UTC (permalink / raw) To: peterz, mingo, linux-kernel Cc: valentin.schneider, Morten.Rasmussen, Vincent Guittot Newly idle load balance is not always triggered when a cpu becomes idle. This prevents the scheduler from getting a chance to migrate a task for asym packing. Enable active migration because of asym packing during idle load balance too. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> --- kernel/sched/fair.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c215f7a..9591e7a 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8861,7 +8861,7 @@ static int need_active_balance(struct lb_env *env) { struct sched_domain *sd = env->sd; - if (env->idle == CPU_NEWLY_IDLE) { + if (env->idle != CPU_NOT_IDLE) { /* * ASYM_PACKING needs to force migrate tasks from busy but -- 2.7.4 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/3] sched/fair: trigger asym_packing during idle load balance 2018-12-14 16:01 ` [PATCH v2 2/3] sched/fair: trigger asym_packing during idle load balance Vincent Guittot @ 2018-12-17 16:59 ` Valentin Schneider 2018-12-18 8:17 ` Vincent Guittot 0 siblings, 1 reply; 27+ messages in thread From: Valentin Schneider @ 2018-12-17 16:59 UTC (permalink / raw) To: Vincent Guittot, peterz, mingo, linux-kernel; +Cc: Morten.Rasmussen Hi Vincent, On 14/12/2018 16:01, Vincent Guittot wrote: > newly idle load balance is not always triggered when a cpu becomes idle. > This prevent the scheduler to get a chance to migrate task for asym packing. > Enable active migration because of asym packing during idle load balance too. > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > --- > kernel/sched/fair.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index c215f7a..9591e7a 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -8861,7 +8861,7 @@ static int need_active_balance(struct lb_env *env) > { > struct sched_domain *sd = env->sd; > > - if (env->idle == CPU_NEWLY_IDLE) { > + if (env->idle != CPU_NOT_IDLE) { > > /* > * ASYM_PACKING needs to force migrate tasks from busy but > That change looks fine. However, you're mentioning newidle load_balance() not being triggered - you'd want to set root_domain->overload for any newidle pull to happen, probably with something like this: -----8<----- @@ -8398,6 +8408,9 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd sg = sg->next; } while (sg != env->sd->groups); + if (check_asym_packing(env, sds)) + sg_status |= SG_OVERLOAD; + #ifdef CONFIG_NO_HZ_COMMON if ((env->flags & LBF_NOHZ_AGAIN) && cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) { ----->8----- It's similar to what is done for misfit, although that's yet another 'twisted' use of that flag which we might want to rename (I suggested something like 'need_idle_balance' a while back but it wasn't really popular). ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/3] sched/fair: trigger asym_packing during idle load balance 2018-12-17 16:59 ` Valentin Schneider @ 2018-12-18 8:17 ` Vincent Guittot 2018-12-18 12:00 ` Valentin Schneider 0 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-18 8:17 UTC (permalink / raw) To: Valentin Schneider Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On Mon, 17 Dec 2018 at 17:59, Valentin Schneider <valentin.schneider@arm.com> wrote: > > Hi Vincent, > > On 14/12/2018 16:01, Vincent Guittot wrote: > > newly idle load balance is not always triggered when a cpu becomes idle. > > This prevent the scheduler to get a chance to migrate task for asym packing. > > Enable active migration because of asym packing during idle load balance too. > > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > > --- > > kernel/sched/fair.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index c215f7a..9591e7a 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -8861,7 +8861,7 @@ static int need_active_balance(struct lb_env *env) > > { > > struct sched_domain *sd = env->sd; > > > > - if (env->idle == CPU_NEWLY_IDLE) { > > + if (env->idle != CPU_NOT_IDLE) { > > > > /* > > * ASYM_PACKING needs to force migrate tasks from busy but > > > > That change looks fine. However, you're mentioning newidle load_balance() > not being triggered - you'd want to set root_domain->overload for any > newidle pull to happen, probably with something like this: It's not needed in this case because the dst cpu is already the target cpu and the migration will happen during this idle load balance. Setting root_domain->overload is useful only if you want another cpu to pull the task during another coming load_balance (newly or normal idle ones) which is not the case here. > > -----8<----- > @@ -8398,6 +8408,9 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd > sg = sg->next; > } while (sg != env->sd->groups); > > + if (check_asym_packing(env, sds)) > + sg_status |= SG_OVERLOAD; > + > #ifdef CONFIG_NO_HZ_COMMON > if ((env->flags & LBF_NOHZ_AGAIN) && > cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) { > ----->8----- > > It's similar to what is done for misfit, although that's yet another > 'twisted' use of that flag which we might want to rename (I suggested > something like 'need_idle_balance' a while back but it wasn't really > popular). ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/3] sched/fair: trigger asym_packing during idle load balance 2018-12-18 8:17 ` Vincent Guittot @ 2018-12-18 12:00 ` Valentin Schneider 0 siblings, 0 replies; 27+ messages in thread From: Valentin Schneider @ 2018-12-18 12:00 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On 18/12/2018 08:17, Vincent Guittot wrote: [...] >> That change looks fine. However, you're mentioning newidle load_balance() >> not being triggered - you'd want to set root_domain->overload for any >> newidle pull to happen, probably with something like this: > > It's not needed in this case because the dst cpu is already the target > cpu and the migration will happen during this idle load balance. > Setting root_domain->overload is useful only if you want another cpu > to pull the task during another coming load_balance (newly or normal > idle ones) which is not the case here. > Right, I got the check wrong. The thing is, if you want to go through a newidle balance, you need to have that overload flag raised beforehand. I was about to draw a diagram but I kinda already did in the log of 757ffdd705ee ("sched/fair: Set rq->rd->overload when misfit"). So you would first need to raise the flag, e.g. when updating the lb stats on a low-priority CPU, and when a higher-priority CPU goes newly idle, the flag is raised, gates to idle_balance() are opened, and a newidle pull from low-priority to high-priority can happen. >> >> -----8<----- >> @@ -8398,6 +8408,9 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd >> sg = sg->next; >> } while (sg != env->sd->groups); >> >> + if (check_asym_packing(env, sds)) >> + sg_status |= SG_OVERLOAD; >> + >> #ifdef CONFIG_NO_HZ_COMMON >> if ((env->flags & LBF_NOHZ_AGAIN) && >> cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) { >> ----->8----- >> >> It's similar to what is done for misfit, although that's yet another >> 'twisted' use of that flag which we might want to rename (I suggested >> something like 'need_idle_balance' a while back but it wasn't really >> popular). ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval 2018-12-14 16:01 [PATCH v2 0/3] sched/fair: some fixes for asym_packing Vincent Guittot 2018-12-14 16:01 ` [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing Vincent Guittot 2018-12-14 16:01 ` [PATCH v2 2/3] sched/fair: trigger asym_packing during idle load balance Vincent Guittot @ 2018-12-14 16:01 ` Vincent Guittot 2018-12-17 16:56 ` Valentin Schneider 2 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-14 16:01 UTC (permalink / raw) To: peterz, mingo, linux-kernel Cc: valentin.schneider, Morten.Rasmussen, Vincent Guittot In case of active balance, we increase the balance interval to cover pinned task cases not covered by the all_pinned logic. Nevertheless, the active migration triggered by asym packing should be treated as the normal unbalanced case and reset the interval to its default value; otherwise, active migration for asym_packing can easily be delayed for hundreds of ms because of this all_pinned detection mechanism. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> --- kernel/sched/fair.c | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9591e7a..487287e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8857,21 +8857,24 @@ static struct rq *find_busiest_queue(struct lb_env *env, */ #define MAX_PINNED_INTERVAL 512 +static inline bool +asym_active_balance(struct lb_env *env) +{ + /* + * ASYM_PACKING needs to force migrate tasks from busy but + * lower priority CPUs in order to pack all tasks in the + * highest priority CPUs. + */ + return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) && + sched_asym_prefer(env->dst_cpu, env->src_cpu); +} + static int need_active_balance(struct lb_env *env) { struct sched_domain *sd = env->sd; - if (env->idle != CPU_NOT_IDLE) { - - /* - * ASYM_PACKING needs to force migrate tasks from busy but - * lower priority CPUs in order to pack all tasks in the - * highest priority CPUs. - */ - if ((sd->flags & SD_ASYM_PACKING) && - sched_asym_prefer(env->dst_cpu, env->src_cpu)) - return 1; - } + if (asym_active_balance(env)) + return 1; /* * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task. @@ -9150,7 +9153,7 @@ static int load_balance(int this_cpu, struct rq *this_rq, } else sd->nr_balance_failed = 0; - if (likely(!active_balance)) { + if (likely(!active_balance) || asym_active_balance(&env)) { /* We were unbalanced, so reset the balancing interval */ sd->balance_interval = sd->min_interval; } else { -- 2.7.4 ^ permalink raw reply related [flat|nested] 27+ messages in thread
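To give a sense of the "hundreds of ms" delay mentioned in the changelog above, here is a rough standalone sketch (the interval values are illustrative assumptions, not taken from the patch): every active balance that falls into the pinned-tasks branch doubles sd->balance_interval, so a handful of asym_packing active migrations is enough to push the next balance opportunity out by several hundred ms:

#include <stdio.h>

int main(void)
{
	unsigned int interval = 8;		/* assumed sd->min_interval, in ms */
	const unsigned int max_interval = 512;	/* assumed sd->max_interval, in ms */
	int nr;

	for (nr = 1; nr <= 7; nr++) {
		if (interval < max_interval)
			interval *= 2;	/* doubling done in the active balance path */
		printf("after %d active balance(s): balance_interval = %u ms\n",
		       nr, interval);
	}
	return 0;
}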
* Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval 2018-12-14 16:01 ` [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval Vincent Guittot @ 2018-12-17 16:56 ` Valentin Schneider 2018-12-18 9:32 ` Vincent Guittot 0 siblings, 1 reply; 27+ messages in thread From: Valentin Schneider @ 2018-12-17 16:56 UTC (permalink / raw) To: Vincent Guittot, peterz, mingo, linux-kernel; +Cc: Morten.Rasmussen Hi Vincent, About time I had a look at this one... On 14/12/2018 16:01, Vincent Guittot wrote: > In case of active balance, we increase the balance interval to cover > pinned tasks cases not covered by all_pinned logic. AFAIUI the balance increase is there to have plenty of time to stop the task before going through another load_balance(). Seeing as there is a cpus_allowed check that leads to env.flags |= LBF_ALL_PINNED; goto out_one_pinned; in the active balancing part of load_balance(), the condition you're changing should never be hit when we have pinned tasks. So you may want to rephrase that bit. > Neverthless, the active > migration triggered by asym packing should be treated as the normal > unbalanced case and reset the interval to default value otherwise active > migration for asym_packing can be easily delayed for hundreds of ms > because of this all_pinned detection mecanism. Mmm so it's not exactly clear why we need this change. If we double the interval because of a pinned task we wanted to active balance, well it's just regular pinned task issues and we can't do much about it. The only scenario I can think of is if you had a workload where you wanted to do an active balance in several successive load_balance(), in which case you would keep increasing the interval even though you do migrate a task every time (which would harm every subsequent active_balance). In that case every active_balance "user" (pressured CPU, misfit) is affected, so maybe what we want is something like this: -----8<----- @@ -9136,13 +9149,11 @@ static int load_balance(int this_cpu, struct rq *this_rq, sd->balance_interval = sd->min_interval; } else { /* - * If we've begun active balancing, start to back off. This - * case may not be covered by the all_pinned logic if there - * is only 1 task on the busy runqueue (because we don't call - * detach_tasks). + * If we've begun active balancing, start to back off. + * Don't increase too much in case we have more active balances + * coming up. */ - if (sd->balance_interval < sd->max_interval) - sd->balance_interval *= 2; + sd->balance_interval = 2 * sd->min_interval; } goto out; ----->8----- Maybe we want a larger threshold - truth be told, it all depends on how long the cpu stopper can take and if that delay increase is still relevant nowadays. > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > --- > kernel/sched/fair.c | 27 +++++++++++++++------------ > 1 file changed, 15 insertions(+), 12 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 9591e7a..487287e 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -8857,21 +8857,24 @@ static struct rq *find_busiest_queue(struct lb_env *env, > */ > #define MAX_PINNED_INTERVAL 512 > > +static inline bool > +asym_active_balance(struct lb_env *env) > +{ > + /* > + * ASYM_PACKING needs to force migrate tasks from busy but > + * lower priority CPUs in order to pack all tasks in the > + * highest priority CPUs. 
> + */ > + return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) && > + sched_asym_prefer(env->dst_cpu, env->src_cpu); > +} > + > static int need_active_balance(struct lb_env *env) > { > struct sched_domain *sd = env->sd; > > - if (env->idle != CPU_NOT_IDLE) { > - > - /* > - * ASYM_PACKING needs to force migrate tasks from busy but > - * lower priority CPUs in order to pack all tasks in the > - * highest priority CPUs. > - */ > - if ((sd->flags & SD_ASYM_PACKING) && > - sched_asym_prefer(env->dst_cpu, env->src_cpu)) > - return 1; > - } > + if (asym_active_balance(env)) > + return 1; > > /* > * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task. > @@ -9150,7 +9153,7 @@ static int load_balance(int this_cpu, struct rq *this_rq, > } else > sd->nr_balance_failed = 0; > > - if (likely(!active_balance)) { > + if (likely(!active_balance) || asym_active_balance(&env)) { AFAICT this makes us reset the interval whenever we do an asym packing active balance (if the task is pinned we would goto out_one_pinned). This goes against the logic I had understood so far and that I explained higher up. > /* We were unbalanced, so reset the balancing interval */ > sd->balance_interval = sd->min_interval; > } else { > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval 2018-12-17 16:56 ` Valentin Schneider @ 2018-12-18 9:32 ` Vincent Guittot 2018-12-18 11:46 ` Valentin Schneider 0 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-18 9:32 UTC (permalink / raw) To: Valentin Schneider Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen Hi Valentin, On Mon, 17 Dec 2018 at 17:56, Valentin Schneider <valentin.schneider@arm.com> wrote: > > Hi Vincent, > > About time I had a look at this one... > > On 14/12/2018 16:01, Vincent Guittot wrote: > > In case of active balance, we increase the balance interval to cover > > pinned tasks cases not covered by all_pinned logic. > > AFAIUI the balance increase is there to have plenty of time to > stop the task before going through another load_balance(). > > Seeing as there is a cpus_allowed check that leads to > > env.flags |= LBF_ALL_PINNED; > goto out_one_pinned; > > in the active balancing part of load_balance(), the condition you're > changing should never be hit when we have pinned tasks. So you may want > to rephrase that bit. In this asym packing case, It has nothing to do with pinned tasks and that's the root cause of the problem: the active balance triggered by asym packing is wrongly assumed to be an active balance due to pinned task(s) and the load balance interval is increased without any reason > > > Neverthless, the active > > migration triggered by asym packing should be treated as the normal > > unbalanced case and reset the interval to default value otherwise active > > migration for asym_packing can be easily delayed for hundreds of ms > > because of this all_pinned detection mecanism. > > Mmm so it's not exactly clear why we need this change. If we double the > interval because of a pinned task we wanted to active balance, well it's > just regular pinned task issues and we can't do much about it. As explained above, it's not a pinned task case > > The only scenario I can think of is if you had a workload where you wanted > to do an active balance in several successive load_balance(), in which case > you would keep increasing the interval even though you do migrate a task > every time (which would harm every subsequent active_balance). > > In that case every active_balance "user" (pressured CPU, misfit) is > affected, so maybe what we want is something like this: > > -----8<----- > @@ -9136,13 +9149,11 @@ static int load_balance(int this_cpu, struct rq *this_rq, > sd->balance_interval = sd->min_interval; > } else { > /* > - * If we've begun active balancing, start to back off. This > - * case may not be covered by the all_pinned logic if there > - * is only 1 task on the busy runqueue (because we don't call > - * detach_tasks). > + * If we've begun active balancing, start to back off. > + * Don't increase too much in case we have more active balances > + * coming up. > */ > - if (sd->balance_interval < sd->max_interval) > - sd->balance_interval *= 2; > + sd->balance_interval = 2 * sd->min_interval; > } > > goto out; > ----->8----- > > Maybe we want a larger threshold - truth be told, it all depends on how > long the cpu stopper can take and if that delay increase is still relevant > nowadays. 
hmm the increase of balance interval is not linked to cpu stopper but to increase the load balance interval when we know that there is no possible load balance to perform Regards, Vincent > > > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > > --- > > kernel/sched/fair.c | 27 +++++++++++++++------------ > > 1 file changed, 15 insertions(+), 12 deletions(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 9591e7a..487287e 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -8857,21 +8857,24 @@ static struct rq *find_busiest_queue(struct lb_env *env, > > */ > > #define MAX_PINNED_INTERVAL 512 > > > > +static inline bool > > +asym_active_balance(struct lb_env *env) > > +{ > > + /* > > + * ASYM_PACKING needs to force migrate tasks from busy but > > + * lower priority CPUs in order to pack all tasks in the > > + * highest priority CPUs. > > + */ > > + return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) && > > + sched_asym_prefer(env->dst_cpu, env->src_cpu); > > +} > > + > > static int need_active_balance(struct lb_env *env) > > { > > struct sched_domain *sd = env->sd; > > > > - if (env->idle != CPU_NOT_IDLE) { > > - > > - /* > > - * ASYM_PACKING needs to force migrate tasks from busy but > > - * lower priority CPUs in order to pack all tasks in the > > - * highest priority CPUs. > > - */ > > - if ((sd->flags & SD_ASYM_PACKING) && > > - sched_asym_prefer(env->dst_cpu, env->src_cpu)) > > - return 1; > > - } > > + if (asym_active_balance(env)) > > + return 1; > > > > /* > > * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task. > > @@ -9150,7 +9153,7 @@ static int load_balance(int this_cpu, struct rq *this_rq, > > } else > > sd->nr_balance_failed = 0; > > > > - if (likely(!active_balance)) { > > + if (likely(!active_balance) || asym_active_balance(&env)) { > > AFAICT this makes us reset the interval whenever we do an asym packing > active balance (if the task is pinned we would goto out_one_pinned). > This goes against the logic I had understood so far and that I explained > higher up. > > > /* We were unbalanced, so reset the balancing interval */ > > sd->balance_interval = sd->min_interval; > > } else { > > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval 2018-12-18 9:32 ` Vincent Guittot @ 2018-12-18 11:46 ` Valentin Schneider 2018-12-18 13:23 ` Vincent Guittot 0 siblings, 1 reply; 27+ messages in thread From: Valentin Schneider @ 2018-12-18 11:46 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On 18/12/2018 09:32, Vincent Guittot wrote: [...] > In this asym packing case, It has nothing to do with pinned tasks and > that's the root cause of the problem: > the active balance triggered by asym packing is wrongly assumed to be > an active balance due to pinned task(s) and the load balance interval > is increased without any reason > [...]> hmm the increase of balance interval is not linked to cpu stopper but > to increase the load balance interval when we know that there is no > possible load balance to perform > > Regards, > Vincent Ah, I think I get it: you're saying that this balance_interval increase is done because it is always assumed we do an active balance with busiest->nr_running > 1 && pinned tasks, and that all that is left to migrate after the active_balance is pinned tasks that we can't actually migrate. Maybe that's what we should target (as always, totally untested): -----8<----- @@ -9131,18 +9144,16 @@ static int load_balance(int this_cpu, struct rq *this_rq, } else sd->nr_balance_failed = 0; - if (likely(!active_balance)) { - /* We were unbalanced, so reset the balancing interval */ - sd->balance_interval = sd->min_interval; - } else { + if (unlikely(active_balance && (env.flags & LBF_ALL_PINNED))) { /* - * If we've begun active balancing, start to back off. This - * case may not be covered by the all_pinned logic if there - * is only 1 task on the busy runqueue (because we don't call - * detach_tasks). + * We did an active balance as a last resort (all other tasks + * were pinned), start to back off. */ if (sd->balance_interval < sd->max_interval) sd->balance_interval *= 2; + } else { + /* We were unbalanced, so reset the balancing interval */ + sd->balance_interval = sd->min_interval; } goto out; ----->8----- can_migrate_task() called by detach_tasks() in the 'if (busiest->nr_running > 1)' block should clear that LBF_ALL_PINNED flag if there is any migratable task (even if we don't end up migrating it), and it's not set if (busiest->nr_running == 1), so that *should* work. I believe the argument that this applies to all kinds of active balances is still valid - this shouldn't be changed just for asym packing active balance. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval 2018-12-18 11:46 ` Valentin Schneider @ 2018-12-18 13:23 ` Vincent Guittot 2018-12-18 14:09 ` Valentin Schneider 0 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-18 13:23 UTC (permalink / raw) To: Valentin Schneider Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On Tue, 18 Dec 2018 at 12:46, Valentin Schneider <valentin.schneider@arm.com> wrote: > > On 18/12/2018 09:32, Vincent Guittot wrote: > [...] > > In this asym packing case, It has nothing to do with pinned tasks and > > that's the root cause of the problem: > > the active balance triggered by asym packing is wrongly assumed to be > > an active balance due to pinned task(s) and the load balance interval > > is increased without any reason > > > [...]> hmm the increase of balance interval is not linked to cpu stopper but > > to increase the load balance interval when we know that there is no > > possible load balance to perform > > > > Regards, > > Vincent > > Ah, I think I get it: you're saying that this balance_interval increase > is done because it is always assumed we do an active balance with > busiest->nr_running > 1 && pinned tasks, and that all that is left to > migrate after the active_balance is pinned tasks that we can't actually > migrate. > > Maybe that's what we should target (as always, totally untested): > > -----8<----- > @@ -9131,18 +9144,16 @@ static int load_balance(int this_cpu, struct rq *this_rq, > } else > sd->nr_balance_failed = 0; > > - if (likely(!active_balance)) { > - /* We were unbalanced, so reset the balancing interval */ > - sd->balance_interval = sd->min_interval; > - } else { > + if (unlikely(active_balance && (env.flags & LBF_ALL_PINNED))) { So it's not exactly LBF_ALL_PINNED but also some pinned tasks but IIUC, you would like to extend the reset of balance interval to all the "good" reasons to trig active load balance In fact the condition used in need_active_balance except the last remaining one. Good reasons are: - asym packing - /* * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task. * It's worth migrating the task if the src_cpu's capacity is reduced * because of other sched_class or IRQs if more capacity stays * available on dst_cpu. */ - misfit task I can extend the test for these conditions > /* > - * If we've begun active balancing, start to back off. This > - * case may not be covered by the all_pinned logic if there > - * is only 1 task on the busy runqueue (because we don't call > - * detach_tasks). > + * We did an active balance as a last resort (all other tasks > + * were pinned), start to back off. > */ > if (sd->balance_interval < sd->max_interval) > sd->balance_interval *= 2; > + } else { > + /* We were unbalanced, so reset the balancing interval */ > + sd->balance_interval = sd->min_interval; > } > > goto out; > ----->8----- > > can_migrate_task() called by detach_tasks() in the > 'if (busiest->nr_running > 1)' block should clear that LBF_ALL_PINNED flag > if there is any migratable task (even if we don't end up migrating it), > and it's not set if (busiest->nr_running == 1), so that *should* work. > > I believe the argument that this applies to all kinds of active balances > is still valid - this shouldn't be changed just for asym packing active > balance. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval 2018-12-18 13:23 ` Vincent Guittot @ 2018-12-18 14:09 ` Valentin Schneider 2018-12-19 8:27 ` Vincent Guittot 0 siblings, 1 reply; 27+ messages in thread From: Valentin Schneider @ 2018-12-18 14:09 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On 18/12/2018 13:23, Vincent Guittot wrote: [...] >> Ah, I think I get it: you're saying that this balance_interval increase >> is done because it is always assumed we do an active balance with >> busiest->nr_running > 1 && pinned tasks, and that all that is left to >> migrate after the active_balance is pinned tasks that we can't actually >> migrate. >> >> Maybe that's what we should target (as always, totally untested): >> >> -----8<----- >> @@ -9131,18 +9144,16 @@ static int load_balance(int this_cpu, struct rq *this_rq, >> } else >> sd->nr_balance_failed = 0; >> >> - if (likely(!active_balance)) { >> - /* We were unbalanced, so reset the balancing interval */ >> - sd->balance_interval = sd->min_interval; >> - } else { >> + if (unlikely(active_balance && (env.flags & LBF_ALL_PINNED))) { > > So it's not exactly LBF_ALL_PINNED but also some pinned tasks > Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top of the 'if (busiest->nr_running > 1)' block and cleared whenever we find at least one task we could pull, even if we don't pull it because of other reasons in can_migrate_task() (e.g. cache hotness). If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't increase the balance_interval, which is what we would want to maintain. > but IIUC, you would like to extend the reset of balance interval to > all the "good" reasons to trig active load balance Right > In fact the condition used in need_active_balance except the last > remaining one. Good reasons are: > - asym packing > - /* > * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task. > * It's worth migrating the task if the src_cpu's capacity is reduced > * because of other sched_class or IRQs if more capacity stays > * available on dst_cpu. > */ > - misfit task > > I can extend the test for these conditions So that's all the need_active_balance() cases except the last sd->nr_balance_failed one. I'd argue this could also be counted as a "good" reason to active balance which shouldn't lead to a balance_interval increase. Plus, it keeps to the logic of increasing the balance_interval only when task affinity gets in the way. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval 2018-12-18 14:09 ` Valentin Schneider @ 2018-12-19 8:27 ` Vincent Guittot 2018-12-19 11:16 ` Valentin Schneider 0 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-19 8:27 UTC (permalink / raw) To: Valentin Schneider Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On Tue, 18 Dec 2018 at 15:09, Valentin Schneider <valentin.schneider@arm.com> wrote: > > On 18/12/2018 13:23, Vincent Guittot wrote: > [...] > >> Ah, I think I get it: you're saying that this balance_interval increase > >> is done because it is always assumed we do an active balance with > >> busiest->nr_running > 1 && pinned tasks, and that all that is left to > >> migrate after the active_balance is pinned tasks that we can't actually > >> migrate. > >> > >> Maybe that's what we should target (as always, totally untested): > >> > >> -----8<----- > >> @@ -9131,18 +9144,16 @@ static int load_balance(int this_cpu, struct rq *this_rq, > >> } else > >> sd->nr_balance_failed = 0; > >> > >> - if (likely(!active_balance)) { > >> - /* We were unbalanced, so reset the balancing interval */ > >> - sd->balance_interval = sd->min_interval; > >> - } else { > >> + if (unlikely(active_balance && (env.flags & LBF_ALL_PINNED))) { > > > > So it's not exactly LBF_ALL_PINNED but also some pinned tasks > > > > Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top > of the 'if (busiest->nr_running > 1)' block and cleared whenever we find > at least one task we could pull, even if we don't pull it because of > other reasons in can_migrate_task() (e.g. cache hotness). > > If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't > increase the balance_interval, which is what we would want to maintain. But there are several other UC to do active migration and increase the interval like all except running tasks are pinned > > > but IIUC, you would like to extend the reset of balance interval to > > all the "good" reasons to trig active load balance > > Right > > > In fact the condition used in need_active_balance except the last > > remaining one. Good reasons are: > > - asym packing > > - /* > > * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task. > > * It's worth migrating the task if the src_cpu's capacity is reduced > > * because of other sched_class or IRQs if more capacity stays > > * available on dst_cpu. > > */ > > - misfit task > > > > I can extend the test for these conditions > > So that's all the need_active_balance() cases except the last > sd->nr_balance_failed one. I'd argue this could also be counted as a > "good" reason to active balance which shouldn't lead to a balance_interval > increase. Plus, it keeps to the logic of increasing the balance_interval > only when task affinity gets in the way. But this is some kind of affinity, the hotness is a way for the scheduler to temporarily pinned the task on a cpu to take advantage of cache hotness. I would prefer to be conservative and only reset the interval when we are sure that active load balance is really what we want to do. Asym packing is one, we can add the misfit case and the move task on cpu with more available capacity as well. For the other case, it's less obvious and I would prefer to keep current behavior ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval 2018-12-19 8:27 ` Vincent Guittot @ 2018-12-19 11:16 ` Valentin Schneider 2018-12-19 13:29 ` Vincent Guittot 0 siblings, 1 reply; 27+ messages in thread From: Valentin Schneider @ 2018-12-19 11:16 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On 19/12/2018 08:27, Vincent Guittot wrote: [...] >> Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top >> of the 'if (busiest->nr_running > 1)' block and cleared whenever we find >> at least one task we could pull, even if we don't pull it because of >> other reasons in can_migrate_task() (e.g. cache hotness). >> >> If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't >> increase the balance_interval, which is what we would want to maintain. > > But there are several other use cases that do an active migration and > increase the interval, like when all tasks except the running ones are pinned > My point is that AFAICT the LBF_ALL_PINNED flag would cover all the cases we care about, although the one you're mentioning is the only one I can think of. In that case LBF_ALL_PINNED would never be cleared, so when we do the active balance we'd know it's because all other tasks were pinned so we should probably increase the interval (see last snippet I sent). [...] >> So that's all the need_active_balance() cases except the last >> sd->nr_balance_failed one. I'd argue this could also be counted as a >> "good" reason to active balance which shouldn't lead to a balance_interval >> increase. Plus, it keeps to the logic of increasing the balance_interval >> only when task affinity gets in the way. > > But this is some kind of affinity: the hotness is a way for the > scheduler to temporarily pin the task on a cpu to take advantage of > cache hotness. > > I would prefer to be conservative and only reset the interval when we > are sure that active load balance is really what we want to do. > Asym packing is one; we can add the misfit case and the case of moving a > task to a cpu with more available capacity as well. For the other case, > it's less obvious and I would prefer to keep the current behavior. > Mmm ok, so this one is kinda about the semantics of what we really consider as "pinning". If we look at the regular load_balance() path, if all tasks are cache hot (so we clear LBF_ALL_PINNED but don't pass can_migrate_task()) we won't increase the balance_interval. Actually, if we have !active_balance we'll reset it. Seeing as the duration of a task's cache hotness (default .5ms) is small compared to any balance_interval (1ms * sd_weight), IMO it would make sense to reset the interval for all active balance cases. On top of that, we would keep to the current logic of increasing the balance_interval *only* when task->cpus_allowed gets in the way. ^ permalink raw reply [flat|nested] 27+ messages in thread
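Putting rough numbers on that comparison (default values assumed; both knobs are tunable, and the domain width below is only an example):

/* Illustrative defaults, e.g. for a hypothetical 8-CPU sched domain. */
#define EXAMPLE_MIGRATION_COST_NS	500000	/* task_hot() window: 0.5 ms */
#define EXAMPLE_SD_WEIGHT		8
#define EXAMPLE_MIN_INTERVAL_MS		(1 * EXAMPLE_SD_WEIGHT)	/*  8 ms */
#define EXAMPLE_MAX_INTERVAL_MS		(2 * EXAMPLE_SD_WEIGHT)	/* 16 ms */

So even the shortest balance_interval in this example is roughly 16 times longer than the default cache-hotness window.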
* Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval 2018-12-19 11:16 ` Valentin Schneider @ 2018-12-19 13:29 ` Vincent Guittot 2018-12-19 15:54 ` Valentin Schneider 0 siblings, 1 reply; 27+ messages in thread From: Vincent Guittot @ 2018-12-19 13:29 UTC (permalink / raw) To: Valentin Schneider Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On Wed, 19 Dec 2018 at 12:16, Valentin Schneider <valentin.schneider@arm.com> wrote: > > On 19/12/2018 08:27, Vincent Guittot wrote: > [...] > >> Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top > >> of the 'if (busiest->nr_running > 1)' block and cleared whenever we find > >> at least one task we could pull, even if we don't pull it because of > >> other reasons in can_migrate_task() (e.g. cache hotness). > >> > >> If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't > >> increase the balance_interval, which is what we would want to maintain. > > > > But there are several other use cases that do an active migration and increase > > the interval, like when all tasks except the running ones are pinned > > > > My point is that AFAICT the LBF_ALL_PINNED flag would cover all the cases > we care about, although the one you're mentioning is the only one I can > think of. In that case LBF_ALL_PINNED would never be cleared, so when we do > the active balance we'd know it's because all other tasks were pinned so > we should probably increase the interval (see last snippet I sent). There are probably several other use cases than the one you mention, as tasks can be discarded for several reasons. So instead of changing the behavior for all of them by default, I would prefer to only change it for those for which we know it's safe. > > [...] > >> So that's all the need_active_balance() cases except the last > >> sd->nr_balance_failed one. I'd argue this could also be counted as a > >> "good" reason to active balance which shouldn't lead to a balance_interval > >> increase. Plus, it keeps to the logic of increasing the balance_interval > >> only when task affinity gets in the way. > > > > But this is some kind of affinity: the hotness is a way for the > > scheduler to temporarily pin the task on a cpu to take advantage of > > cache hotness. > > > > I would prefer to be conservative and only reset the interval when we > > are sure that active load balance is really what we want to do. > > Asym packing is one; we can add the misfit case and the case of moving a > > task to a cpu with more available capacity as well. For the other case, > > it's less obvious and I would prefer to keep the current behavior. > > > > Mmm ok, so this one is kinda about the semantics of what we really consider > as "pinning". > > If we look at the regular load_balance() path, if all tasks are cache hot > (so we clear LBF_ALL_PINNED but don't pass can_migrate_task()) we won't > increase the balance_interval. Actually, if we have !active_balance we'll > reset it. > > Seeing as the duration of a task's cache hotness (default .5ms) is small > compared to any balance_interval (1ms * sd_weight), IMO it would make sense > to reset the interval for all active balance cases. On top of that, we > would keep to the current logic of increasing the balance_interval *only* > when task->cpus_allowed gets in the way. ^ permalink raw reply [flat|nested] 27+ messages in thread
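For reference, the reasons a task can be discarded, and which of them touch the pinned flags, roughly follow this shape in can_migrate_task(). This is a heavily condensed paraphrase (statistics, NUMA locality and the LBF_DST_PINNED handling are omitted), not the verbatim kernel code.

/* Condensed sketch of can_migrate_task(); only the flag handling matters here. */
static int can_migrate_task(struct task_struct *p, struct lb_env *env)
{
	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
		return 0;			/* discarded, flags untouched */

	if (!cpumask_test_cpu(env->dst_cpu, &p->cpus_allowed)) {
		env->flags |= LBF_SOME_PINNED;	/* affinity is the blocker */
		return 0;
	}

	/* found at least one task that could run on dst_cpu */
	env->flags &= ~LBF_ALL_PINNED;

	if (task_running(env->src_rq, p))
		return 0;			/* discarded, but not "pinned" */

	if (task_hot(p, env) &&
	    env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
		return 0;			/* cache hot: discarded too */

	return 1;
}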
* Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval 2018-12-19 13:29 ` Vincent Guittot @ 2018-12-19 15:54 ` Valentin Schneider 2018-12-19 16:54 ` Vincent Guittot 0 siblings, 1 reply; 27+ messages in thread From: Valentin Schneider @ 2018-12-19 15:54 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On 19/12/2018 13:29, Vincent Guittot wrote: [...] >> My point is that AFAICT the LBF_ALL_PINNED flag would cover all the cases >> we care about, although the one you're mentioning is the only one I can >> think of. In that case LBF_ALL_PINNED would never be cleared, so when we do >> the active balance we'd know it's because all other tasks were pinned so >> we should probably increase the interval (see last snippet I sent). > There are probably several other use cases than the one you mention, as > tasks can be discarded for several reasons. > So instead of changing the behavior for all of them by default, I would prefer > to only change it for those for which we know it's safe. I get your point. Thing is, I've stared at the code for a while and couldn't find any other use case where checking LBF_ALL_PINNED wouldn't suffice. It would be nice to convince ourselves it is indeed enough (or not, but then we should be sure of it rather than base ourselves on assumptions), because then we can have just a simple condition rather than something that introduces active balance categories. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval 2018-12-19 15:54 ` Valentin Schneider @ 2018-12-19 16:54 ` Vincent Guittot 0 siblings, 0 replies; 27+ messages in thread From: Vincent Guittot @ 2018-12-19 16:54 UTC (permalink / raw) To: Valentin Schneider Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen On Wed, 19 Dec 2018 at 16:54, Valentin Schneider <valentin.schneider@arm.com> wrote: > > On 19/12/2018 13:29, Vincent Guittot wrote: > [...] > >> My point is that AFAICT the LBF_ALL_PINNED flag would cover all the cases > >> we care about, although the one you're mentioning is the only one I can > >> think of. In that case LBF_ALL_PINNED would never be cleared, so when we do > >> the active balance we'd know it's because all other tasks were pinned so > >> we should probably increase the interval (see last snippet I sent). > > > > There are probably several other use cases than the one you mention, as > > tasks can be discarded for several reasons. > > So instead of changing the behavior for all of them by default, I would prefer > > to only change it for those for which we know it's safe. > > I get your point. Thing is, I've stared at the code for a while and > couldn't find any other use case where checking LBF_ALL_PINNED wouldn't > suffice. The point is that the LBF_ALL_PINNED flag is not set there, otherwise we would have jumped to out_*_pinned. But the conditions are similar. > > It would be nice to convince ourselves it is indeed enough (or not, but then > we should be sure of it rather than base ourselves on assumptions), because > then we can have just a simple condition rather than something that > introduces active balance categories. This can be part of the larger rework that Peter asked for a few days ago. ^ permalink raw reply [flat|nested] 27+ messages in thread
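The control flow being referred to looks roughly like this in load_balance(). Again a condensed paraphrase for orientation only; the redo path, locking and most of the function are omitted.

	/* Condensed sketch of load_balance(), only the parts discussed here. */
	if (busiest->nr_running > 1) {
		env.flags |= LBF_ALL_PINNED;
		cur_ld_moved = detach_tasks(&env); /* clears LBF_ALL_PINNED if any task may move */

		/* every task on busiest was pinned by affinity */
		if (unlikely(env.flags & LBF_ALL_PINNED))
			goto out_all_pinned;	/* skips the active balance below */
	}

	if (!ld_moved && need_active_balance(&env)) {
		/* active balance path: LBF_ALL_PINNED is normally clear by now */
		/* ... */
	}
	/* ... */
out_all_pinned:
	sd->nr_balance_failed = 0;
out_one_pinned:
	/* tune up the balancing interval */
	if (((env.flags & LBF_ALL_PINNED) &&
	     sd->balance_interval < MAX_PINNED_INTERVAL) ||
	    (sd->balance_interval < sd->max_interval))
		sd->balance_interval *= 2;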