From: Alex Shi <alex.shi@intel.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linaro-dev@lists.linaro.org, peterz@infradead.org,
	mingo@kernel.org, linux@arm.linux.org.uk, pjt@google.com,
	santosh.shilimkar@ti.com, Morten.Rasmussen@arm.com,
	chander.kashyap@linaro.org, cmetcalf@tilera.com,
	tony.luck@intel.com, preeti@linux.vnet.ibm.com,
	paulmck@linux.vnet.ibm.com, tglx@linutronix.de,
	len.brown@intel.com, arjan@linux.intel.com,
	amit.kucheria@linaro.org, viresh.kumar@linaro.org
Subject: Re: [RFC PATCH v2 3/6] sched: pack small tasks
Date: Mon, 17 Dec 2012 23:24:06 +0800
Message-ID: <50CF3916.4010001@intel.com>
In-Reply-To: <CAKfTPtAt-TUoJpXbL+dAQ2zGTYtewdXpFDN_mSeu4MnuC-2Xjg@mail.gmail.com>

>>>>>> The scheme below tries to summarize the idea:
>>>>>>
>>>>>> Socket      | socket 0 | socket 1   | socket 2   | socket 3   |
>>>>>> LCPU        | 0 | 1-15 | 16 | 17-31 | 32 | 33-47 | 48 | 49-63 |
>>>>>> buddy conf0 | 0 | 0    | 1  | 16    | 2  | 32    | 3  | 48    |
>>>>>> buddy conf1 | 0 | 0    | 0  | 16    | 16 | 32    | 32 | 48    |
>>>>>> buddy conf2 | 0 | 0    | 16 | 16    | 32 | 32    | 48 | 48    |
>>>>>>
>>>>>> But I don't know how this would interact with NUMA load balancing;
>>>>>> it might be better to use conf3.
>>>>>
>>>>> I meant conf2, not conf3.
>>>>
>>>> So socket 3 has to go through 4 levels (0/16/32/) while socket 0 has
>>>> none; the number of levels is unbalanced across the sockets.
>>>
>>> That's the target: we decided to pack the small tasks in socket 0 when
>>> we parsed the topology at boot.
>>> We no longer have to loop over sched_domain or sched_group to find the
>>> best LCPU when a small task wakes up.
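
(Editorial sketch, not code from the patch: one way the "conf2" row of
the table above could be filled in once at boot, so that the wake-up
path never has to walk sched_domain/sched_group. The topology constants
come from the 4-socket/64-LCPU example; all names are made up.)

#include <stdio.h>

#define LCPUS_PER_SOCKET	16
#define NR_SOCKETS		4
#define NR_LCPUS		(NR_SOCKETS * LCPUS_PER_SOCKET)

static int buddy_cpu[NR_LCPUS];

static void build_buddy_map_conf2(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_LCPUS; cpu++) {
		/* conf2: pack on the first LCPU of the local socket only */
		buddy_cpu[cpu] = (cpu / LCPUS_PER_SOCKET) * LCPUS_PER_SOCKET;
		/*
		 * conf1 would additionally chain the first LCPU of each
		 * socket to the first LCPU of the previous socket; conf0
		 * would chain it to LCPUs 0..3 of socket 0 instead.
		 */
	}
}

int main(void)
{
	build_buddy_map_conf2();
	printf("buddy of LCPU 33: %d\n", buddy_cpu[33]);	/* 32 */
	printf("buddy of LCPU 17: %d\n", buddy_cpu[17]);	/* 16 */
	return 0;
}
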
>>
>> Iterating over the domains and groups is an advantage for the power-
>> efficiency goal, not a shortcoming. If some CPUs are already idle
>> before a task forks or wakes up, letting the waking CPU check their
>> load/utilization and then decide which one is the best target reduces
>> late migrations, which saves both performance and power.
> 
> In fact, we have already done this job once at boot, and we consider
> that moving small tasks to the buddy CPU is always a benefit, so we
> don't need to waste time looping over sched_domain and sched_group to
> compute the current capacity of each LCPU on every wake-up of every
> small task. We want all small tasks and background activity to wake up
> on the same buddy CPU, and we let the default behavior of the scheduler
> choose the best CPU for heavy tasks or loaded CPUs.
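
(For illustration only, not the patch's code: the wake-up decision
described above could be reduced to something like the sketch below,
where buddy_of() is a hypothetical lookup into a precomputed map such
as the conf2 one sketched earlier, assuming 16 LCPUs per socket.)

#define LCPUS_PER_SOCKET	16
#define buddy_of(cpu)	(((cpu) / LCPUS_PER_SOCKET) * LCPUS_PER_SOCKET)

/*
 * Small tasks go straight to the waker's buddy without any
 * sched_domain/sched_group walk; everything else keeps the scheduler's
 * default wake-up balancing.
 */
static int pack_small_task_cpu(int waking_cpu, int task_is_small)
{
	if (task_is_small)
		return buddy_of(waking_cpu);

	return -1;	/* caller falls back to the normal wake-up path */
}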

IMHO, the design may work very well for your scenario and your machine,
but when the code moves into the general scheduler we want it to handle
more general scenarios. Sometimes the 'small task' is not as small as
the tasks in cyclictest, which can hardly run longer than the migration
granularity or one tick, so their migration cost really can be ignored.
But when the tasks are not that small, a migration is heavier than a
domain/group walk; that is the usual assumption in fork/exec/wake-up
balancing.
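
(A hypothetical guard along the lines of the argument above, my own
illustration and not from any posted patch: packing is only "free" when
the task's tracked runtime is well below the cost of moving it. The
factor of 2 is an arbitrary placeholder.)

#include <stdint.h>

/*
 * Return non-zero when the task's average runtime is far below the
 * migration cost, i.e. when it is a cyclictest-style task that can be
 * packed without worrying about the price of moving it around.
 */
static int packing_is_cheap(uint64_t avg_runtime_ns,
			    uint64_t migration_cost_ns)
{
	return avg_runtime_ns * 2 < migration_cost_ns;
}
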

> 
>>
>> On the contrary, moving a task by walking through each level of buddies
>> is bad not only for performance but also for power. Consider the quite
>> large latency of waking a CPU out of a deep idle state; we lose too much.
> 
> My results have shown a different conclusion.

That is probably because your tasks are too small for the migration cost
to matter.
> In fact, there is a much better chance that the buddy will not be in a
> deep idle state, as all the small tasks and background activity are
> already waking up on this CPU.

powertop is helpful for tuning your system toward more idle time. Another
reason is that the current kernel simply tries to spread tasks across more
CPUs for performance; my power scheduling patch should help with this.
> 
>>
>>>
>>>>
>>>> And the ground level has just one buddy for 16 LCPUs (8 cores); that's
>>>> not a good design. Consider my previous examples: if there are 4 or 8
>>>> tasks in one socket, you only have 2 choices: spread them across all
>>>> cores, or pack them onto one LCPU. Actually, moving them onto just 2 or
>>>> 4 cores may be a better solution, but the design misses this.
>>>
>>> You speak about tasks without any notion of load. This patch only cares
>>> about small tasks and lightly loaded LCPUs, and it falls back to the
>>> default behavior for other situations. So if there are 4 or 8 small
>>> tasks, they will migrate to socket 0 after 1 and up to 3 migrations
>>> (depending on the conf and the LCPU they come from).
>>
>> According to your patch, by 'notion of load' you mean the CPU
>> utilization, not the load weight of the tasks, right?
> 
> Yes, but not only. The number of tasks that run simultaneously is
> another important input.
> 
>>
>> Yes, I was only talking about task numbers, but it naturally extends to
>> the task utilization of a CPU. Take 8 tasks with 25% utilization: they
>> can fully fill just 2 CPUs, yet that is clearly beyond the capacity of
>> the buddy, so you need to wake up another CPU socket while the local
>> socket still has some idle LCPUs...
> 
> 8 tasks with a running period of 25ms per 100ms that wake up
> simultaneously should probably run on 8 different LCPUs in order to
> race to idle.

Nope, 8 tasks waking up simultaneously is a low-probability case. And
even then they should run in the same socket for power-saving reasons
(my power scheduling patch can do this), instead of being spread across
all sockets.
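
(The arithmetic above, spelled out as a trivial illustration: 8 tasks
at 25% utilization need only 2 LCPUs' worth of capacity, so packing
them onto 2 LCPUs of the local socket is a middle ground between one
buddy LCPU and 8 LCPUs spread over several sockets.)

#include <stdio.h>

int main(void)
{
	int nr_tasks = 8, util_pct = 25;
	int lcpus_needed = (nr_tasks * util_pct + 99) / 100;	/* round up */

	printf("LCPUs needed: %d\n", lcpus_needed);	/* prints 2 */
	return 0;
}
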
> 
> 
> Regards,
> Vincent
> 
>>>
>>> Then, if too many small tasks wake up simultaneously on the same LCPU,
>>> the default load balancing will spread them across the core/cluster/socket.
>>>
>>>>
>>>> Obviously, more and more cores is the trend for every kind of CPU; the
>>>> buddy scheme seems unlikely to keep up with this.
>>>>
>>>>
>>
>>
>> --
>> Thanks
>>     Alex


-- 
Thanks
    Alex
