linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Galbraith <mgalbraith@suse.de>
To: Chris Mason <clm@fb.com>, Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Matt Fleming <matt@codeblueprint.co.uk>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] select_idle_sibling experiments
Date: Wed, 06 Apr 2016 09:27:24 +0200	[thread overview]
Message-ID: <1459927644.5612.41.camel@suse.de> (raw)
In-Reply-To: <20160405180822.tjtyyc3qh4leflfj@floor.thefacebook.com>

[-- Attachment #1: Type: text/plain, Size: 3515 bytes --]

> On Tue, 2016-04-05 at 14:08 -0400, Chris Mason wrote:

> Now, on to the patch.  I pushed some code around and narrowed the
> problem down to select_idle_sibling()   We have cores going into and out
> of idle fast enough that even this cut our latencies in half:

Are you using NO_HZ?  If so, you may want to try the attached.

> static int select_idle_sibling(struct task_struct *p, int target)
>                                 goto next;
>  
>                         for_each_cpu(i, sched_group_cpus(sg)) {
> -                               if (i == target || !idle_cpu(i))
> +                               if (!idle_cpu(i))
>                                         goto next;
>                         }
>  
> IOW, by the time we get down to for_each_cpu(), the idle_cpu() check
> done at the top of the function is no longer valid.

Ok, that's only an optimization, could go if it's causing trouble.

> I tried a few variations on select_idle_sibling() that preserved the
> underlying goal of returning idle cores before idle SMT threads.  They
> were all horrible in different ways, and none of them were fast.
> 
> The patch below just makes select_idle_sibling pick the first idle
> thread it can find.  When I ran it through production workloads here, it
> was faster than the patch we've been carrying around for the last few
> years.
> 
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 56b7d4b..c41baa6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4974,7 +4974,6 @@ find_idlest_cpu(struct sched_group *group,
> struct task_struct *p, int this_cpu)
>  static int select_idle_sibling(struct task_struct *p, int target)
>  {
>  	struct sched_domain *sd;
> -	struct sched_group *sg;
>  	int i = task_cpu(p);
>  
>  	if (idle_cpu(target))
> @@ -4990,24 +4989,14 @@ static int select_idle_sibling(struct
> task_struct *p, int target)
>  	 * Otherwise, iterate the domains and find an elegible idle
> cpu.
>  	 */
>  	sd = rcu_dereference(per_cpu(sd_llc, target));
> -	for_each_lower_domain(sd) {
> -		sg = sd->groups;
> -		do {
> -			if
> (!cpumask_intersects(sched_group_cpus(sg),
> -						tsk_cpus_allowed(p))
> )
> -				goto next;
> -
> -			for_each_cpu(i, sched_group_cpus(sg)) {
> -				if (i == target || !idle_cpu(i))
> -					goto next;
> -			}
> +	if (!sd)
> +		goto done;
>  
> -			target =
> cpumask_first_and(sched_group_cpus(sg),
> -					tsk_cpus_allowed(p));
> +	for_each_cpu_and(i, sched_domain_span(sd), &p->cpus_allowed)
> {
> +		if (cpu_active(i) && idle_cpu(i)) {
> +			target = i;
>  			goto done;
> -next:
> -			sg = sg->next;
> -		} while (sg != sd->groups);
> +		}
>  	}
>  done:
>  	return target;
> 

Ew.  That may improve your latency is everything load, but worst case
package walk will hurt like hell on CPUs with insane number of threads.
 That full search also turns the evil face of two-faced little
select_idle_sibling() into it's only face, the one that bounces tasks
about much more than they appreciate.

Looking for an idle core first delivers the most throughput boost, and
only looking at target's threads if you don't find one keeps the bounce
and traverse pain down to a dull roar, while at least trying to get
that latency win.  To me, your patch looks like it trades harm to many,
for good for a few.

A behavior switch would be better.  It can't get any dumber, but trying
to make it smarter makes it too damn fat.  As it sits, it's aiming in
the general direction of the bullseye.. and occasionally hits the wall.

	-Mike

[-- Attachment #2: sched-throttle-nohz.patch --]
[-- Type: text/x-patch, Size: 1535 bytes --]

sched: ratelimit nohz

Entering nohz code on every micro-idle is too expensive to bear.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 include/linux/sched.h    |    5 +++++
 kernel/sched/core.c      |    8 ++++++++
 kernel/time/tick-sched.c |    2 +-
 3 files changed, 14 insertions(+), 1 deletion(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2286,6 +2286,11 @@ static inline int set_cpus_allowed_ptr(s
 #ifdef CONFIG_NO_HZ_COMMON
 void calc_load_enter_idle(void);
 void calc_load_exit_idle(void);
+#ifdef CONFIG_SMP
+extern int sched_needs_cpu(int cpu);
+#else
+static inline int sched_needs_cpu(int cpu) { return 0; }
+#endif
 #else
 static inline void calc_load_enter_idle(void) { }
 static inline void calc_load_exit_idle(void) { }
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -577,6 +577,14 @@ static inline bool got_nohz_idle_kick(vo
 	return false;
 }
 
+int sched_needs_cpu(int cpu)
+{
+	if (tick_nohz_full_cpu(cpu))
+		return 0;
+
+	return  cpu_rq(cpu)->avg_idle < sysctl_sched_migration_cost;
+}
+
 #else /* CONFIG_NO_HZ_COMMON */
 
 static inline bool got_nohz_idle_kick(void)
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -676,7 +676,7 @@ static ktime_t tick_nohz_stop_sched_tick
 	} while (read_seqretry(&jiffies_lock, seq));
 	ts->last_jiffies = basejiff;
 
-	if (rcu_needs_cpu(basemono, &next_rcu) ||
+	if (sched_needs_cpu(cpu) || rcu_needs_cpu(basemono, &next_rcu) ||
 	    arch_needs_cpu() || irq_work_needs_cpu()) {
 		next_tick = basemono + TICK_NSEC;
 	} else {

  parent reply	other threads:[~2016-04-06  7:27 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-05 18:08 [PATCH RFC] select_idle_sibling experiments Chris Mason
2016-04-05 18:43 ` Bastien Bastien Philbert
2016-04-05 19:28   ` Chris Mason
2016-04-05 20:03 ` Matt Fleming
2016-04-05 21:05   ` Bastien Philbert
2016-04-06  0:44   ` Chris Mason
2016-04-06  7:27 ` Mike Galbraith [this message]
2016-04-06 13:36   ` Chris Mason
2016-04-09 17:30   ` Chris Mason
2016-04-12 21:45     ` Matt Fleming
2016-04-13  3:40       ` Mike Galbraith
2016-04-13 15:54         ` Chris Mason
2016-04-28 12:00   ` Peter Zijlstra
2016-04-28 13:17     ` Mike Galbraith
2016-05-02  5:35     ` Mike Galbraith
2016-04-07 15:17 ` Chris Mason
2016-04-09 19:05 ` sched: tweak select_idle_sibling to look for idle threads Chris Mason
2016-04-10 10:04   ` Mike Galbraith
2016-04-10 12:35     ` Chris Mason
2016-04-10 12:46       ` Mike Galbraith
2016-04-10 19:55     ` Chris Mason
2016-04-11  4:54       ` Mike Galbraith
2016-04-12  0:30         ` Chris Mason
2016-04-12  4:44           ` Mike Galbraith
2016-04-12 13:27             ` Chris Mason
2016-04-12 18:16               ` Mike Galbraith
2016-04-12 20:07                 ` Chris Mason
2016-04-13  3:18                   ` Mike Galbraith
2016-04-13 13:44                     ` Chris Mason
2016-04-13 14:22                       ` Mike Galbraith
2016-04-13 14:36                         ` Chris Mason
2016-04-13 15:05                           ` Mike Galbraith
2016-04-13 15:34                             ` Mike Galbraith
2016-04-30 12:47   ` Peter Zijlstra
2016-05-01  7:12     ` Mike Galbraith
2016-05-01  8:53       ` Peter Zijlstra
2016-05-01  9:20         ` Mike Galbraith
2016-05-07  1:24           ` Yuyang Du
2016-05-08  8:08             ` Mike Galbraith
2016-05-08 18:57               ` Yuyang Du
2016-05-09  3:45                 ` Mike Galbraith
2016-05-08 20:22                   ` Yuyang Du
2016-05-09  7:44                     ` Mike Galbraith
2016-05-09  1:13                       ` Yuyang Du
2016-05-09  9:39                         ` Mike Galbraith
2016-05-09 23:26                           ` Yuyang Du
2016-05-10  7:49                             ` Mike Galbraith
2016-05-10 15:26                               ` Mike Galbraith
2016-05-10 19:16                                 ` Yuyang Du
2016-05-11  4:17                                   ` Mike Galbraith
2016-05-11  1:23                                     ` Yuyang Du
2016-05-11  9:56                                       ` Mike Galbraith
2016-05-18  6:41                                   ` Mike Galbraith
2016-05-09  3:52                 ` Mike Galbraith
2016-05-08 20:31                   ` Yuyang Du
2016-05-02  8:46       ` Peter Zijlstra
2016-05-02 14:50         ` Mike Galbraith
2016-05-02 14:58           ` Peter Zijlstra
2016-05-02 15:47             ` Chris Mason
2016-05-03 14:32               ` Peter Zijlstra
2016-05-03 15:11                 ` Chris Mason
2016-05-04 10:37                   ` Peter Zijlstra
2016-05-04 15:31                     ` Peter Zijlstra
2016-05-05 22:03                     ` Matt Fleming
2016-05-06 18:54                       ` Mike Galbraith
2016-05-09  8:33                         ` Peter Zijlstra
2016-05-09  8:56                           ` Mike Galbraith
2016-05-04 15:45                   ` Peter Zijlstra
2016-05-04 17:46                     ` Chris Mason
2016-05-05  9:33                       ` Peter Zijlstra
2016-05-05 13:58                         ` Chris Mason
2016-05-06  7:12                           ` Peter Zijlstra
2016-05-06 17:27                             ` Chris Mason
2016-05-06  7:25                   ` Peter Zijlstra
2016-05-02 17:30             ` Mike Galbraith
2016-05-02 15:01           ` Peter Zijlstra
2016-05-02 16:04             ` Ingo Molnar
2016-05-03 11:31               ` Peter Zijlstra
2016-05-03 18:22                 ` Peter Zijlstra
2016-05-02 15:10           ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1459927644.5612.41.camel@suse.de \
    --to=mgalbraith@suse.de \
    --cc=clm@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=matt@codeblueprint.co.uk \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).