linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>,
	Preeti U Murthy <preeti@linux.vnet.ibm.com>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"riel@redhat.com" <riel@redhat.com>,
	"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
	"srikar@linux.vnet.ibm.com" <srikar@linux.vnet.ibm.com>,
	"pjt@google.com" <pjt@google.com>,
	"benh@kernel.crashing.org" <benh@kernel.crashing.org>,
	"efault@gmx.de" <efault@gmx.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"iamjoonsoo.kim@lge.com" <iamjoonsoo.kim@lge.com>,
	"svaidy@linux.vnet.ibm.com" <svaidy@linux.vnet.ibm.com>,
	"tim.c.chen@linux.intel.com" <tim.c.chen@linux.intel.com>,
	"jason.low2@hp.com" <jason.low2@hp.com>
Subject: Re: [PATCH V2] sched: Improve load balancing in the presence of idle CPUs
Date: Mon, 30 Mar 2015 15:29:09 +0200	[thread overview]
Message-ID: <CAKfTPtAHAYt2f8EsvW3uWfE+RjaFrJHRGYVmRs+ahyCrAXmOdQ@mail.gmail.com> (raw)
In-Reply-To: <20150330122449.GH21418@twins.programming.kicks-ass.net>

On 30 March 2015 at 14:24, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Mar 30, 2015 at 01:03:03PM +0100, Morten Rasmussen wrote:
>> On Mon, Mar 30, 2015 at 12:06:32PM +0100, Peter Zijlstra wrote:
>> > On Fri, Mar 27, 2015 at 05:56:51PM +0000, Morten Rasmussen wrote:
>> >
>> > > I agree that it is hard to predict how many additional cpus you need,
>> > > but I don't think you necessarily need that information as long as you
>> > > start by filling up the cpu that was kicked to do the
>> > > nohz_idle_balance() first.
>> >
>> > > Reducing unnecessary wakeups is quite important for energy consumption
>> > > and something a lot of effort is put into. You really don't want to wake
>> > > up another cluster/package unnecessarily just because there was only one
>> > > nohz-idle cpu left in the previous one which could have handled the
>> > > additional load. It gets even worse if the other cluster is less
>> > > energy-efficient (big.LITTLE).
>> >
>> > So the only way to get tasks to cross your cluster is by balancing that
>> > domain. At this point we'll compute sg stats for either group
>> > (=cluster).
>> >
>> > The only thing we need to ensure is that it doesn't view the small
>> > cluster as overloaded (as long as it really isn't of course), as long as
>> > its not viewed as overloaded it will not pull tasks from it into the big
>> > cluster, no matter how many ILBs we run before the ILB duty cpu's
>> > rebalance_domains() call.
>> >
>> > I'm really not seeing the problem here.
>>
>> I see. The group_classify() should take care of it in all cases of
>> balancing across clusters. You would be iterating over all cpus in the
>> other cluster running rebalance_domains() if the balancer cpu happens to
>> be the last one in the little cluster though. However, within the
>> cluster (in case you have 2 or more nohz-idle cpus) you still take a
>> double hit. No?
>
> It can yes, but typically not I think. This all could use some 'help'
> for sure.
>
> So the thing is, find_new_ilb() simply selects the first idle_cpus_mask
> cpu, while at the same time, nohz_idle_balance() will iterate the
> idle_cpus_mask with the first, being first (obviously).
>
> So it is very like that if we migrate on the ILB it is in fact to the
> current CPU.
>
> In case we cannot, we have no choice but to wake up a second idle,
> nothing really to be done about that.
>
> To put it another way, for ILB purposes the rebalance_domains() call is
> mostly superfluous. The only other case is if the selected ILB target
> became non-idle between being selected and getting to run the softirq
> handler. At which point we should wake another anyhow too.
>
> Maybe something like the below helps -- albeit it could use a comment
> too of course.
>
> ---
>  kernel/sched/fair.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fdae26eb7218..b879d4b3b599 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7624,11 +7624,12 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
>   * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
>   * rebalancing for all the cpus for whom scheduler ticks are stopped.
>   */
> -static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> +static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  {
>         int this_cpu = this_rq->cpu;
> -       struct rq *rq;
>         int balance_cpu;
> +       struct rq *rq;
> +       bool done = false;
>
>         if (idle != CPU_IDLE ||
>             !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> @@ -7647,6 +7648,8 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>                         break;
>
>                 rq = cpu_rq(balance_cpu);
> +               if (rq == this_rq)
> +                       done = true;

AFAICT, this can't happen because we start the for_each _cpu loop with:
if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
continue;

>
>                 /*
>                  * If time for next balance is due,
> @@ -7666,6 +7669,8 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>         nohz.next_balance = this_rq->next_balance;
>  end:
>         clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> +
> +       return done;
>  }
>
>  /*
> @@ -7744,7 +7749,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
>         return kick;
>  }
>  #else
> -static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) { }
> +static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) { return false; }
>  #endif
>
>  /*
> @@ -7765,8 +7770,8 @@ static void run_rebalance_domains(struct softirq_action *h)
>          * load balance only within the local sched_domain hierarchy
>          * and abort nohz_idle_balance altogether if we pull some load.
>          */
> -       nohz_idle_balance(this_rq, idle);
> -       rebalance_domains(this_rq, idle);
> +       if (!nohz_idle_balance(this_rq, idle))
> +               rebalance_domains(this_rq, idle);

the nohz_idle_balance run rebalance_domains for all CPU except this CPU

>  }
>
>  /*

  parent reply	other threads:[~2015-03-30 13:29 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-26 13:02 [PATCH V2] sched: Improve load balancing in the presence of idle CPUs Preeti U Murthy
2015-03-26 17:03 ` Jason Low
2015-03-27  2:12 ` Wanpeng Li
2015-03-27  4:33   ` Preeti U Murthy
2015-03-27  4:23     ` Wanpeng Li
2015-03-27  5:01     ` Jason Low
2015-03-27  5:07   ` Jason Low
2015-03-27  5:39     ` Srikar Dronamraju
2015-03-27  7:00       ` Wanpeng Li
2015-03-27  6:43     ` Wanpeng Li
2015-03-27 16:23     ` Preeti U Murthy
2015-03-27 11:43 ` [tip:sched/core] " tip-bot for Preeti U Murthy
2015-03-27 13:03 ` [PATCH V2] " Srikar Dronamraju
2015-03-27 14:38 ` Morten Rasmussen
2015-03-27 16:46   ` Preeti U Murthy
2015-03-27 17:56     ` Morten Rasmussen
2015-03-30  7:26       ` Preeti U Murthy
2015-03-30 11:30         ` Morten Rasmussen
2015-03-30 11:06       ` Peter Zijlstra
2015-03-30 12:03         ` Morten Rasmussen
2015-03-30 12:24           ` Peter Zijlstra
2015-03-30 12:54             ` Peter Zijlstra
2015-03-30 13:29             ` Vincent Guittot [this message]
2015-03-30 14:01               ` Peter Zijlstra
2015-03-30 15:27               ` Morten Rasmussen
2015-03-31  8:58           ` Preeti U Murthy
2015-03-31 17:30             ` Jason Low
2015-04-01  6:28               ` Preeti U Murthy
2015-04-01 13:03                 ` Morten Rasmussen
2015-04-02  0:55                   ` Jason Low
2015-04-02  3:22                   ` Preeti U Murthy
2015-03-30 13:45 ` Vincent Guittot
2015-03-31  8:06   ` Preeti U Murthy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKfTPtAHAYt2f8EsvW3uWfE+RjaFrJHRGYVmRs+ahyCrAXmOdQ@mail.gmail.com \
    --to=vincent.guittot@linaro.org \
    --cc=benh@kernel.crashing.org \
    --cc=daniel.lezcano@linaro.org \
    --cc=efault@gmx.de \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jason.low2@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=preeti@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=svaidy@linux.vnet.ibm.com \
    --cc=tim.c.chen@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).