From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753427AbbDADhv (ORCPT );
	Tue, 31 Mar 2015 23:37:51 -0400
Received: from mail-ig0-f174.google.com ([209.85.213.174]:36225 "EHLO
	mail-ig0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751315AbbDADhr (ORCPT );
	Tue, 31 Mar 2015 23:37:47 -0400
MIME-Version: 1.0
In-Reply-To:
References: <1425052454-25797-1-git-send-email-vincent.guittot@linaro.org>
	<1425052454-25797-9-git-send-email-vincent.guittot@linaro.org>
Date: Wed, 1 Apr 2015 11:37:45 +0800
Message-ID:
Subject: Re: [PATCH v10 08/11] sched: replace capacity_factor by usage
From: Xunlei Pang
To: Vincent Guittot
Cc: Peter Zijlstra, Ingo Molnar, lkml, Preeti U Murthy, Morten Rasmussen,
	Kamalesh Babulal, Rik van Riel, Linaro Kernel Mailman List,
	Mike Galbraith, Dietmar Eggemann
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Vincent,

On 27 March 2015 at 23:59, Vincent Guittot wrote:
> On 27 March 2015 at 15:52, Xunlei Pang wrote:
>> Hi Vincent,
>>
>> On 27 February 2015 at 23:54, Vincent Guittot
>> wrote:
>>>  /**
>>> @@ -6432,18 +6435,19 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>>>
>>>  		/*
>>>  		 * In case the child domain prefers tasks go to siblings
>>> -		 * first, lower the sg capacity factor to one so that we'll try
>>> +		 * first, lower the sg capacity so that we'll try
>>>  		 * and move all the excess tasks away. We lower the capacity
>>>  		 * of a group only if the local group has the capacity to fit
>>> -		 * these excess tasks, i.e. nr_running < group_capacity_factor. The
>>> -		 * extra check prevents the case where you always pull from the
>>> -		 * heaviest group when it is already under-utilized (possible
>>> -		 * with a large weight task outweighs the tasks on the system).
>>> +		 * these excess tasks. The extra check prevents the case where
>>> +		 * you always pull from the heaviest group when it is already
>>> +		 * under-utilized (possible with a large weight task outweighs
>>> +		 * the tasks on the system).
>>>  		 */
>>>  		if (prefer_sibling && sds->local &&
>>> -		    sds->local_stat.group_has_free_capacity) {
>>> -			sgs->group_capacity_factor = min(sgs->group_capacity_factor, 1U);
>>> -			sgs->group_type = group_classify(sg, sgs);
>>> +		    group_has_capacity(env, &sds->local_stat) &&
>>> +		    (sgs->sum_nr_running > 1)) {
>>> +			sgs->group_no_capacity = 1;
>>> +			sgs->group_type = group_overloaded;
>>>  		}
>>>
>>
>> For SD_PREFER_SIBLING, if local has 1 task and group_has_capacity()
>> returns true for it (but it is not overloaded), and the sgs group has
>> 2 tasks, should we still mark this group overloaded?
>
> yes, the load balance will then choose if it's worth pulling or not
> depending on the load of each group

Maybe I didn't make it clear. For example: CPU0~CPU1 are SMT siblings,
and CPU2~CPU3 are another pair. CPU0 is idle, and each of the other
CPUs has one task. Then, according to this patch, CPU2~CPU3 (as one
group) will be viewed as overloaded (CPU0~CPU1 being the local group,
group_has_capacity() returns true here), so the balancer may initiate
active task movement. This differs from what the current
SD_PREFER_SIBLING logic does. Is this problematic?

>
>>
>> -Xunlei