All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Shi <alex.shi@intel.com>
To: Michael Wang <wangyun@linux.vnet.ibm.com>
Cc: mingo@redhat.com, peterz@infradead.org, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com, bp@alien8.de,
	pjt@google.com, namhyung@kernel.org, efault@gmx.de,
	morten.rasmussen@arm.com, vincent.guittot@linaro.org,
	gregkh@linuxfoundation.org, preeti@linux.vnet.ibm.com,
	viresh.kumar@linaro.org, linux-kernel@vger.kernel.org,
	len.brown@intel.com, rafael.j.wysocki@intel.com, jkosina@suse.cz,
	clark.williams@gmail.com, tony.luck@intel.com,
	keescook@chromium.org, mgorman@suse.de, riel@redhat.com
Subject: Re: [patch v3 0/8] sched: use runnable avg in load balance
Date: Wed, 03 Apr 2013 14:53:50 +0800	[thread overview]
Message-ID: <515BD1FE.7030901@intel.com> (raw)
In-Reply-To: <515BCA94.6050703@linux.vnet.ibm.com>

On 04/03/2013 02:22 PM, Michael Wang wrote:
>> > 
>> > If many tasks sleep long time, their runnable load are zero. And if they
>> > are waked up bursty, too light runnable load causes big imbalance among
>> > CPU. So such benchmark, like aim9 drop 5~7%.
>> > 
>> > With this patch the losing is covered, and even is slight better.
> A fast test show the improvement disappear and the regression back
> again...after applied this one as the 8th patch, it doesn't works.

It always is good for on benchmark and bad for another. :)

the following patch include the renamed knob, and you can tune it under 
/proc/sys/kernel/... to see detailed impact degree.

>From e540b31b99c887d5a0c5338f7b1f224972b98932 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@intel.com>
Date: Tue, 2 Apr 2013 10:27:45 +0800
Subject: [PATCH] sched: use instant load for burst wake up

If many tasks sleep long time, their runnable load are zero. And if they
are waked up bursty, too light runnable load causes big imbalance among
CPU. So such benchmark, like aim9 drop 5~7%.

rq->avg_idle is 'to used to accommodate bursty loads in a dirt simple
dirt cheap manner' -- Mike Galbraith.

With this cheap and smart bursty indicator, we can find the wake up
burst, and just use nr_running as instant utilization only.

The 'sysctl_sched_burst_threshold' used for wakeup burst.

With this patch the losing is covered, and even is slight better.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 include/linux/sched/sysctl.h |  1 +
 kernel/sched/fair.c          | 17 +++++++++++++++--
 kernel/sysctl.c              |  7 +++++++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index bf8086b..a3c3d43 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -53,6 +53,7 @@ extern unsigned int sysctl_numa_balancing_settle_count;
 
 #ifdef CONFIG_SCHED_DEBUG
 extern unsigned int sysctl_sched_migration_cost;
+extern unsigned int sysctl_sched_burst_threshold;
 extern unsigned int sysctl_sched_nr_migrate;
 extern unsigned int sysctl_sched_time_avg;
 extern unsigned int sysctl_timer_migration;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dbaa8ca..f41ca67 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -91,6 +91,7 @@ unsigned int sysctl_sched_wakeup_granularity = 1000000UL;
 unsigned int normalized_sysctl_sched_wakeup_granularity = 1000000UL;
 
 const_debug unsigned int sysctl_sched_migration_cost = 500000UL;
+const_debug unsigned int sysctl_sched_burst_threshold = 1000000UL;
 
 /*
  * The exponential sliding  window over which load is averaged for shares
@@ -3103,12 +3104,24 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	unsigned long weight;
 	int balanced;
 	int runnable_avg;
+	int burst = 0;
 
 	idx	  = sd->wake_idx;
 	this_cpu  = smp_processor_id();
 	prev_cpu  = task_cpu(p);
-	load	  = source_load(prev_cpu, idx);
-	this_load = target_load(this_cpu, idx);
+
+	if (cpu_rq(this_cpu)->avg_idle < sysctl_sched_migration_cost ||
+		cpu_rq(prev_cpu)->avg_idle < sysctl_sched_migration_cost)
+		burst= 1;
+
+	/* use instant load for bursty waking up */
+	if (!burst) {
+		load = source_load(prev_cpu, idx);
+		this_load = target_load(this_cpu, idx);
+	} else {
+		load = cpu_rq(prev_cpu)->load.weight;
+		this_load = cpu_rq(this_cpu)->load.weight;
+	}
 
 	/*
 	 * If sync wakeup then subtract the (maximum possible)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index afc1dc6..1f23457 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -327,6 +327,13 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 	{
+		.procname	= "sched_burst_threshold_ns",
+		.data		= &sysctl_sched_burst_threshold,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
 		.procname	= "sched_nr_migrate",
 		.data		= &sysctl_sched_nr_migrate,
 		.maxlen		= sizeof(unsigned int),
-- 
1.7.12


-- 
Thanks Alex

  reply	other threads:[~2013-04-03  6:54 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-02  3:23 [patch v3 0/8] sched: use runnable avg in load balance Alex Shi
2013-04-02  3:23 ` [patch v3 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking" Alex Shi
2013-04-02  3:23 ` [patch v3 2/8] sched: set initial value of runnable avg for new forked task Alex Shi
2013-04-02  3:23 ` [patch v3 3/8] sched: only count runnable avg on cfs_rq's nr_running Alex Shi
2013-04-03  3:19   ` Alex Shi
2013-04-02  3:23 ` [patch v3 4/8] sched: update cpu load after task_tick Alex Shi
2013-04-02  3:23 ` [patch v3 5/8] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
2013-04-02  3:23 ` [patch v3 6/8] sched: consider runnable load average in move_tasks Alex Shi
2013-04-09  7:08   ` Vincent Guittot
2013-04-09  8:05     ` Alex Shi
2013-04-09  8:58       ` Vincent Guittot
2013-04-09 10:38         ` Alex Shi
2013-04-09 11:56           ` Vincent Guittot
2013-04-09 14:48             ` Alex Shi
2013-04-09 15:16               ` Vincent Guittot
2013-04-10  2:31                 ` Alex Shi
2013-04-10  6:07     ` Michael Wang
2013-04-10  6:55       ` Vincent Guittot
2013-04-02  3:23 ` [patch v3 7/8] sched: consider runnable load average in effective_load Alex Shi
2013-04-02  3:23 ` [patch v3 8/8] sched: use instant load for burst wake up Alex Shi
2013-04-02  7:23 ` [patch v3 0/8] sched: use runnable avg in load balance Michael Wang
2013-04-02  8:34   ` Mike Galbraith
2013-04-02  9:13     ` Michael Wang
2013-04-02  8:35   ` Alex Shi
2013-04-02  9:45     ` Michael Wang
2013-04-03  2:46     ` Michael Wang
2013-04-03  2:56       ` Alex Shi
2013-04-03  3:23         ` Michael Wang
2013-04-03  4:28           ` Alex Shi
2013-04-03  5:38             ` Michael Wang
2013-04-03  5:53               ` Michael Wang
2013-04-03  6:01               ` Alex Shi
2013-04-03  6:22             ` Michael Wang
2013-04-03  6:53               ` Alex Shi [this message]
2013-04-03  7:18                 ` Michael Wang
2013-04-03  7:28                   ` Alex Shi
2013-04-03  8:46   ` Alex Shi
2013-04-03  9:37     ` Michael Wang
2013-04-03 11:17       ` Alex Shi
2013-04-07  3:09     ` Michael Wang
2013-04-07  7:30       ` Alex Shi
2013-04-07  8:56         ` Michael Wang
2013-04-09  5:08 ` Alex Shi
2013-04-10 13:12   ` Alex Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=515BD1FE.7030901@intel.com \
    --to=alex.shi@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=clark.williams@gmail.com \
    --cc=efault@gmx.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=jkosina@suse.cz \
    --cc=keescook@chromium.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=preeti@linux.vnet.ibm.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    --cc=wangyun@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.