From: Vincent Guittot
Date: Tue, 16 Aug 2016 15:57:06 +0200
Subject: Re: [PATCH v3 11/13] sched/fair: Consider spare capacity in find_idlest_group()
To: Morten Rasmussen
Cc: Peter Zijlstra, "mingo@redhat.com", Dietmar Eggemann, Yuyang Du,
 mgalbraith@suse.de, Sai Charan Gurrappadi, Koan-Sin Tan, 小林敬太,
 linux-kernel
In-Reply-To: <1469453670-2660-12-git-send-email-morten.rasmussen@arm.com>
References: <1469453670-2660-1-git-send-email-morten.rasmussen@arm.com>
 <1469453670-2660-12-git-send-email-morten.rasmussen@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 25 July 2016 at 15:34, Morten Rasmussen wrote:
> In low-utilization scenarios, comparing relative loads in
> find_idlest_group() doesn't always lead to the optimal choice.
> Systems with groups containing different numbers of cpus and/or cpus
> of different compute capacity are significantly better off when
> considering spare capacity rather than relative load in those
> scenarios.
>
> In addition to the existing load-based search, an alternative
> spare-capacity-based candidate sched_group is found and selected
> instead if sufficient spare capacity exists. If not, the existing
> behaviour is preserved.
>
> cc: Ingo Molnar
> cc: Peter Zijlstra
>
> Signed-off-by: Morten Rasmussen
> ---
>  kernel/sched/fair.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 41 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9c6ec3bf75ce..e3654409d099 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5164,6 +5164,14 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>  	return 1;
>  }
>
> +static inline int task_util(struct task_struct *p);
> +static int cpu_util_wake(int cpu, struct task_struct *p);
> +
> +static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
> +{
> +	return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
> +}
> +
>  /*
>   * find_idlest_group finds and returns the least busy CPU group within the
>   * domain.
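
Side note for readers following along: capacity_spare_wake() above is
just the cpu's original capacity minus the utilization the cpu would
have with p's own contribution discounted, i.e. the room that would be
left for p there. A minimal, standalone sketch of the idea (not the
kernel implementation; the helpers below are made-up stand-ins for
capacity_orig_of() and cpu_util_wake(), and all numbers are invented):

#include <stdio.h>

/* Hypothetical stand-in; the real helper reads per-cpu state. */
static unsigned long capacity_orig_of(int cpu)
{
	return cpu == 0 ? 1024 : 512;	/* cpu0 "big", cpu1 "little" */
}

/* cpu utilization with the waking task's contribution discounted on
 * the cpu it last ran on (a simplified model of cpu_util_wake()) */
static unsigned long cpu_util_wake(int cpu, int prev_cpu,
				   unsigned long task_util)
{
	unsigned long util = cpu == 0 ? 900 : 400;

	if (cpu == prev_cpu && util > task_util)
		util -= task_util;
	return util;
}

static unsigned long capacity_spare_wake(int cpu, int prev_cpu,
					 unsigned long task_util)
{
	return capacity_orig_of(cpu) - cpu_util_wake(cpu, prev_cpu, task_util);
}

int main(void)
{
	unsigned long task_util = 100;	/* p's utilization */
	int prev_cpu = 0;		/* p last ran on the big cpu */
	int cpu;

	for (cpu = 0; cpu < 2; cpu++)
		printf("cpu%d: spare capacity for p: %lu\n",
		       cpu, capacity_spare_wake(cpu, prev_cpu, task_util));
	return 0;
}

This prints the spare capacity each cpu would offer p, which is the
quantity the new code in find_idlest_group() maximizes per group.
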
> @@ -5173,7 +5181,9 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>  		  int this_cpu, int sd_flag)
>  {
>  	struct sched_group *idlest = NULL, *group = sd->groups;
> +	struct sched_group *most_spare_sg = NULL;
>  	unsigned long min_load = ULONG_MAX, this_load = 0;
> +	unsigned long most_spare = 0, this_spare = 0;
>  	int load_idx = sd->forkexec_idx;
>  	int imbalance = 100 + (sd->imbalance_pct-100)/2;
>
> @@ -5181,7 +5191,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>  		load_idx = sd->wake_idx;
>
>  	do {
> -		unsigned long load, avg_load;
> +		unsigned long load, avg_load, spare_cap, max_spare_cap;
>  		int local_group;
>  		int i;
>
> @@ -5193,8 +5203,12 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>  		local_group = cpumask_test_cpu(this_cpu,
>  					       sched_group_cpus(group));
>
> -		/* Tally up the load of all CPUs in the group */
> +		/*
> +		 * Tally up the load of all CPUs in the group and find
> +		 * the group containing the cpu with most spare capacity.
> +		 */
>  		avg_load = 0;
> +		max_spare_cap = 0;
>
>  		for_each_cpu(i, sched_group_cpus(group)) {
>  			/* Bias balancing toward cpus of our domain */
> @@ -5204,6 +5218,13 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>  				load = target_load(i, load_idx);
>
>  			avg_load += load;
> +
> +			spare_cap = capacity_spare_wake(i, p);
> +
> +			if (spare_cap > max_spare_cap &&
> +			    spare_cap > capacity_of(i) >> 3) {

This condition probably needs a comment: you are not only looking for
the maximum spare capacity but also for a significant amount of spare
capacity (more than 12.5% of capacity_of(i)).

Can't this additional test lead to a strange situation where a CPU
with more spare capacity is not selected because of the 12.5%
condition, whereas another CPU with less spare capacity is selected
because its capacity is lower? (A numeric sketch of this corner case
follows after the quoted patch.)

> +				max_spare_cap = spare_cap;
> +			}
>  		}
>
>  		/* Adjust by relative CPU capacity of the group */
> @@ -5211,12 +5232,27 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>
>  		if (local_group) {
>  			this_load = avg_load;
> -		} else if (avg_load < min_load) {
> -			min_load = avg_load;
> -			idlest = group;
> +			this_spare = max_spare_cap;
> +		} else {
> +			if (avg_load < min_load) {
> +				min_load = avg_load;
> +				idlest = group;
> +			}
> +
> +			if (most_spare < max_spare_cap) {
> +				most_spare = max_spare_cap;
> +				most_spare_sg = group;
> +			}
>  		}
>  	} while (group = group->next, group != sd->groups);
>
> +	/* Found a significant amount of spare capacity. */

It may be worth explaining the threshold at which it becomes better to
choose the group with the most spare capacity instead of the least
loaded group. (See the second sketch at the end of this mail for how
these checks play out with concrete numbers.)

> +	if (this_spare > task_util(p) / 2 &&
> +	    imbalance*this_spare > 100*most_spare)
> +		return NULL;
> +	else if (most_spare > task_util(p) / 2)
> +		return most_spare_sg;
> +
>  	if (!idlest || 100*this_load < imbalance*min_load)
>  		return NULL;
>  	return idlest;
> --
> 1.9.1
>
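
To put numbers on the first question above, here is a minimal sketch
(all values made up; 'capacity' stands in for capacity_of(i)) where a
big cpu with more absolute spare capacity fails the
spare_cap > capacity_of(i) >> 3 test while a little cpu with less
spare capacity passes it:

#include <stdio.h>

struct cpu {
	const char *name;
	unsigned long capacity;	/* stand-in for capacity_of(i) */
	unsigned long spare;	/* stand-in for capacity_spare_wake(i, p) */
};

int main(void)
{
	/* Made-up numbers: the big cpu offers *more* absolute spare
	 * capacity than the little one. */
	struct cpu cpus[] = {
		{ "big",    1024, 120 },
		{ "little",  512,  90 },
	};
	int i;

	for (i = 0; i < 2; i++) {
		/* the patch requires spare_cap > capacity_of(i) >> 3 */
		unsigned long threshold = cpus[i].capacity >> 3;

		printf("%-6s: spare=%3lu threshold=%3lu -> %s\n",
		       cpus[i].name, cpus[i].spare, threshold,
		       cpus[i].spare > threshold ? "eligible" : "rejected");
	}
	return 0;
}

The big cpu is rejected (120 <= 128) while the little one is eligible
(90 > 64), even though the big cpu has 30 more units of spare
capacity, which is exactly the situation questioned inline above.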
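
And for the threshold question, a sketch of the patch's two final
checks with hypothetical inputs (imbalance_pct = 120 giving
imbalance = 110, and task_util(p) = 200; both values are invented):

#include <stdio.h>

/* Mirror of the patch's final checks, with scalar inputs instead of
 * sched_group state. Returns a description of the choice made. */
static const char *choose(unsigned long this_spare, unsigned long most_spare,
			  unsigned long task_util, unsigned long imbalance)
{
	if (this_spare > task_util / 2 &&
	    imbalance * this_spare > 100 * most_spare)
		return "NULL (local group has enough spare capacity)";
	if (most_spare > task_util / 2)
		return "most_spare_sg (remote group clearly better)";
	return "fall through to the load-based comparison";
}

int main(void)
{
	/* as in the patch: imbalance = 100 + (imbalance_pct - 100) / 2 */
	unsigned long imbalance = 100 + (120 - 100) / 2;	/* 110 */
	unsigned long task_util = 200;				/* made up */

	/* local spare is within the imbalance margin of the best remote */
	printf("%s\n", choose(150, 140, task_util, imbalance));

	/* local group too full, but a remote group has plenty of room */
	printf("%s\n", choose(90, 120, task_util, imbalance));

	return 0;
}

The first case stays local (110*150 > 100*140 and 150 > 200/2); the
second returns the spare-capacity candidate (90 <= 100 but 120 > 100).
The cross-over the comment asks about is this imbalance-weighted
comparison between this_spare and most_spare.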