From: Chris Mason <clm@fb.com>
To: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Mike Galbraith <mgalbraith@suse.de>,
	<linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH RFC] select_idle_sibling experiments
Date: Tue, 5 Apr 2016 20:44:12 -0400
Message-ID: <20160406004412.GB57524@clm-mbp.masoncoding.com>
In-Reply-To: <20160405200302.GL2701@codeblueprint.co.uk>

On Tue, Apr 05, 2016 at 09:03:02PM +0100, Matt Fleming wrote:
> On Tue, 05 Apr, at 02:08:22PM, Chris Mason wrote:
> > 
> > I started with a small-ish program to benchmark wakeup latencies.  The
> > basic idea is a bunch of worker threads who sit around and burn CPU.
> > Every once in a while they send a message to a message thread.
>  
> This reminds me of something I've been looking at recently; a similar
> workload in Mel's mmtests based on pgbench with 1-client that also has
> this problem of idle_cpu() being false at an inconvenient time in
> select_idle_sibling(), so we move the task off the cpu and the cpu
> then immediately goes idle.
> 
> This leads to tasks bouncing around the socket as we search for idle
> cpus.

It might be worth adding a way to claim an idle cpu.  If there are lots
of them, we'll fan out properly instead of piling up on the first one.
If there are very few of them, we'll find that out much faster.
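
A minimal sketch of what I mean, assuming a hypothetical 'claimed' flag
in struct rq (no such field exists today) that gets cleared when the cpu
actually leaves idle:

    /*
     * Hypothetical: reserve an idle cpu so concurrent wakeups fan out
     * across idle cpus instead of all selecting the same one.
     * rq->claimed is an imaginary field, cleared by the cpu when it
     * exits the idle loop.
     */
    static bool try_claim_idle_cpu(int cpu)
    {
            struct rq *rq = cpu_rq(cpu);

            if (!idle_cpu(cpu))
                    return false;

            /* only one waker wins the race for this cpu */
            return cmpxchg(&rq->claimed, 0, 1) == 0;
    }

The scan in select_idle_sibling() would then skip cpus where the claim
fails instead of trusting idle_cpu() alone.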

> 
> > It has knobs for cpu think time, and for how long the messenger thread
> > waits before replying.  Here's how I'm running it with my patch:
>  
> [...]
> 
> Cool, I'll go have a play with this.

I'm more than open to ways to improve it, and I'll send it to Mel or put
it in a git tree if people find it useful.
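
For reference, the shape of it is roughly the sketch below.  This is not
the actual code and all the names are made up; it just shows the idea:
workers burn cpu for a configurable think time, then ping a messenger
thread over a pipe and time how long the reply takes to come back.

    /*
     * Rough sketch of the benchmark's structure, not the real tool.
     * One worker burns cpu for ~30ms, pings the messenger, and times
     * the round trip; the real thing runs many workers and has knobs
     * for the think time and the messenger's reply delay.
     */
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static int ping[2], pong[2];    /* worker -> messenger, messenger -> worker */

    static uint64_t now_ns(void)
    {
            struct timespec ts;

            clock_gettime(CLOCK_MONOTONIC, &ts);
            return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }

    static void *messenger(void *arg)
    {
            char c;

            while (read(ping[0], &c, 1) == 1) {
                    usleep(100);            /* configurable reply delay */
                    write(pong[1], &c, 1);
            }
            return NULL;
    }

    static void *worker(void *arg)
    {
            char c = 0;
            int i;

            for (i = 0; i < 1000; i++) {
                    uint64_t t = now_ns();

                    /* ~30ms of cpu "think" time */
                    while (now_ns() - t < 30ULL * 1000 * 1000)
                            ;

                    t = now_ns();
                    write(ping[1], &c, 1);
                    read(pong[0], &c, 1);
                    printf("round trip: %llu ns\n",
                           (unsigned long long)(now_ns() - t));
            }
            return NULL;
    }

    int main(void)
    {
            pthread_t m, w;

            pipe(ping);
            pipe(pong);
            pthread_create(&m, NULL, messenger, NULL);
            pthread_create(&w, NULL, worker, NULL);
            pthread_join(w, NULL);
            return 0;
    }

Build with -pthread.  The interesting number is the round-trip time,
since that's where a bad placement decision shows up as latency.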

> 
> > Now, on to the patch.  I pushed some code around and narrowed the
> > problem down to select_idle_sibling().  We have cores going into and out
> > of idle fast enough that even this cut our latencies in half:
> > 
> > static int select_idle_sibling(struct task_struct *p, int target)
> >                                 goto next;
> >  
> >                         for_each_cpu(i, sched_group_cpus(sg)) {
> > -                               if (i == target || !idle_cpu(i))
> > +                               if (!idle_cpu(i))
> >                                         goto next;
> >                         }
> >  
> > IOW, by the time we get down to for_each_cpu(), the idle_cpu() check
> > done at the top of the function is no longer valid.
>  
> Yeah. The problem is that we're racing with the cpu going in and out
> of idle, and since you're exploiting that race condition, this is
> highly tuned to your specific workload.
> 
> Which is a roundabout way of saying, this is probably going to
> negatively impact other workloads.
> 
> > I tried a few variations on select_idle_sibling() that preserved the
> > underlying goal of returning idle cores before idle SMT threads.  They
> > were all horrible in different ways, and none of them were fast.
>  
> I toyed with ignoring idle_cpu() in select_idle_sibling() for my
> workload. That actually was faster ;)
> 
> > The patch below just makes select_idle_sibling pick the first idle
> > thread it can find.  When I ran it through production workloads here, it
> > was faster than the patch we've been carrying around for the last few
> > years.
> 
> It would be really nice if we had a lightweight way to gauge the
> "idleness" of a cpu, and whether we expect it to be idle again soon.
> 
> Failing that, could we just force the task onto 'target' when it makes
> sense and skip the idle search (and the race) altogether?

To me it feels like the search for a fully idle core is futile here.  The
boxes are intentionally loaded to the point where a whole core is never
going to be free.  So we need this loop to quickly pick a good candidate
and move on.
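
As a rough sketch of the direction (not the actual patch), the scan ends
up looking something like this: just take the first idle thread in the
LLC domain and fall back to the original target if nothing is idle.  The
function name is made up:

    static int select_first_idle(struct task_struct *p, int target)
    {
            struct sched_domain *sd;
            int i;

            sd = rcu_dereference(per_cpu(sd_llc, target));
            if (!sd)
                    return target;

            /* first idle thread wins, whether a whole core or an SMT sibling */
            for_each_cpu_and(i, sched_domain_span(sd), tsk_cpus_allowed(p)) {
                    if (idle_cpu(i))
                            return i;
            }

            return target;
    }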

The benchmark is using ~30ms of CPU time in each worker thread, so
picking a CPU with a busy worker thread is going to have a pretty big
penalty.  Just grabbing any CPU and hoping it'll be idle soon isn't
likely to work well, and if it does that's probably a bug in my
benchmark ;)
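
Back of the envelope: if wakeups land at random points inside a worker's
~30ms burst, a waker stuck behind a busy worker waits something like 15ms
on average before it runs, which is orders of magnitude worse than waking
onto a cpu that is actually idle.  Even a small fraction of bad placements
ends up dominating the tail latencies.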

You can see this in action by adding one (or at most two) more worker
threads on the command line.  The p99 latency quickly jumps to 2ms or more.

-chris

