From: Mike Galbraith
To: Josef Bacik
Cc: riel@redhat.com, mingo@redhat.com, peterz@infradead.org, linux-kernel@vger.kernel.org
Date: Thu, 28 May 2015 05:46:38 +0200
Subject: Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE
Message-ID: <1432784798.3237.81.camel@gmail.com>
In-Reply-To: <1432761736-22093-1-git-send-email-jbacik@fb.com>

On Wed, 2015-05-27 at 17:22 -0400, Josef Bacik wrote:
> [ sorry if you get this twice, it seems like the first submission got lost ]
>
> At Facebook we have a pretty heavily multi-threaded application that is
> sensitive to latency. We have been pulling forward the old SD_WAKE_IDLE code
> because it gives us a pretty significant performance gain (like 20%). It turns
> out this is because there are cases where the scheduler puts our task on a busy
> CPU when there are idle CPUs in the system. We verify this by reading the
> cpu_delay_req_avg_us from the scheduler netlink stuff. With our crappy patch we
> get much lower numbers vs baseline.
>
> SD_BALANCE_WAKE is supposed to find us an idle cpu to run on, however it is just
> looking for an idle sibling, preferring affinity over all else. This is not
> helpful in all cases, and SD_BALANCE_WAKE's job is to find us an idle cpu, not
> guarantee affinity.
> Fix this by first trying to find an idle sibling, and then
> if the cpu is not idle fall through to the logic to find an idle cpu. With this
> patch we get slightly better performance than with our forward port of
> SD_WAKE_IDLE. Thanks,

The job description isn't really "find idle", it's "find least loaded".

> Signed-off-by: Josef Bacik
> Acked-by: Rik van Riel
> ---
>  kernel/sched/fair.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 241213b..03dafa3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4766,7 +4766,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>
>  	if (sd_flag & SD_BALANCE_WAKE) {
>  		new_cpu = select_idle_sibling(p, prev_cpu);
> -		goto unlock;
> +		if (idle_cpu(new_cpu))
> +			goto unlock;
>  	}
>
>  	while (sd) {

Instead of doing what for most will be a redundant idle_cpu() call,
perhaps a couple of cycles can be saved if you move the sd assignment
above the affine_sd assignment, and say if (!sd || idle_cpu(new_cpu))?

You could also stop find_idlest_group() at the first completely idle
group to shave cycles off the not-fully-committed search. It ain't
likely to find a negative load... cool as that would be ;-)

-Mike
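For illustration, a minimal userspace C sketch of the two suggestions in this reply. It is a toy model, not kernel code: idle_cpu_stub(), wake_should_stop(), find_idlest_group_sketch(), the idle_checks counter and the flat per-group load array are all hypothetical. have_balance_sd stands in for "sd != NULL". The real select_task_rq_fair() and find_idlest_group() operate on struct sched_domain / struct sched_group and compare average loads against imbalance_pct, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter: how many times the idle_cpu() stand-in ran. */
int idle_checks;

/* Hypothetical stand-in for the kernel's idle_cpu(): reads a flag array. */
bool idle_cpu_stub(const bool *cpu_idle, int cpu)
{
	idle_checks++;
	return cpu_idle[cpu];
}

/*
 * The reordered test after select_idle_sibling(): stop the wake path
 * when no balance domain was found (have_balance_sd == false, i.e.
 * sd == NULL), or when the chosen sibling is idle. Because || short-
 * circuits, the idle check is skipped entirely when there is no sd.
 */
bool wake_should_stop(bool have_balance_sd, const bool *cpu_idle, int new_cpu)
{
	return !have_balance_sd || idle_cpu_stub(cpu_idle, new_cpu);
}

/*
 * Toy find_idlest_group(): pick the group with the smallest load, but
 * stop scanning at the first group whose load is zero. Loads are
 * unsigned, so no later group can beat zero. *scanned reports how many
 * groups were examined.
 */
int find_idlest_group_sketch(const unsigned long *group_load, int nr_groups,
			     int *scanned)
{
	int idlest = 0;
	unsigned long min_load;

	assert(nr_groups > 0);
	min_load = group_load[0];
	*scanned = 0;

	for (int i = 0; i < nr_groups; i++) {
		(*scanned)++;
		if (group_load[i] < min_load) {
			min_load = group_load[i];
			idlest = i;
		}
		if (min_load == 0)
			break;	/* completely idle group: cannot do better */
	}
	return idlest;
}
```

In this model, wake_should_stop(false, idle, cpu) returns true without incrementing idle_checks, which is the "redundant idle_cpu() call" saving. find_idlest_group_sketch() on loads {300, 0, 0, 100} returns group 1 after examining only two groups.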