Subject: Re: [PATCH RFC] select_idle_sibling experiments
From: Mike Galbraith
To: Peter Zijlstra
Cc: Chris Mason, Ingo Molnar, Matt Fleming, linux-kernel@vger.kernel.org
Date: Mon, 02 May 2016 07:35:23 +0200
Message-ID: <1462167323.4507.1.camel@suse.de>
In-Reply-To: <20160428120012.GZ3430@twins.programming.kicks-ass.net>

On Thu, 2016-04-28 at 14:00 +0200, Peter Zijlstra wrote:
> On Wed, Apr 06, 2016 at 09:27:24AM +0200, Mike Galbraith wrote:
> > sched: ratelimit nohz
> > 
> > Entering nohz code on every micro-idle is too expensive to bear.
> > 
> > Signed-off-by: Mike Galbraith
> 
> > +int sched_needs_cpu(int cpu)
> > +{
> > +	if (tick_nohz_full_cpu(cpu))
> > +		return 0;
> > +
> > +	return cpu_rq(cpu)->avg_idle < sysctl_sched_migration_cost;
> 
> So the only problem I have with this patch is the choice of limit. This
> isn't at all tied to the migration cost.
> 
> And some people are already twiddling with the migration_cost knob to
> affect the idle_balance() behaviour -- making it much more aggressive by
> dialing it down. When you do that you also lose the effectiveness of
> this proposed usage, even though those same people would probably want
> this.
> 
> Failing a spot of inspiration for a runtime limit on this; we might have
> to introduce yet another knob :/

sched: ratelimit nohz tick shutdown/restart

Tick shutdown/restart overhead can be substantial when CPUs
enter/exit the idle loop at high frequency.  Ratelimit based upon
rq->avg_idle, and provide an adjustment knob.

Signed-off-by: Mike Galbraith
---
 include/linux/sched.h        |    5 +++++
 include/linux/sched/sysctl.h |    4 ++++
 kernel/sched/core.c          |   10 ++++++++++
 kernel/sysctl.c              |    9 +++++++++
 kernel/time/tick-sched.c     |    2 +-
 5 files changed, 29 insertions(+), 1 deletion(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2286,6 +2286,11 @@ static inline int set_cpus_allowed_ptr(s
 #ifdef CONFIG_NO_HZ_COMMON
 void calc_load_enter_idle(void);
 void calc_load_exit_idle(void);
+#ifdef CONFIG_SMP
+extern int sched_needs_cpu(int cpu);
+#else
+static inline int sched_needs_cpu(int cpu) { return 0; }
+#endif
 #else
 static inline void calc_load_enter_idle(void) { }
 static inline void calc_load_exit_idle(void) { }
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -19,6 +19,10 @@ extern unsigned int sysctl_sched_min_gra
 extern unsigned int sysctl_sched_wakeup_granularity;
 extern unsigned int sysctl_sched_child_runs_first;
 
+#if defined(CONFIG_NO_HZ_COMMON) && defined(CONFIG_SMP)
+extern unsigned int sysctl_sched_nohz_throttle;
+#endif
+
 enum sched_tunable_scaling {
 	SCHED_TUNABLESCALING_NONE,
 	SCHED_TUNABLESCALING_LOG,
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -577,6 +577,16 @@ static inline bool got_nohz_idle_kick(vo
 	return false;
 }
 
+unsigned int sysctl_sched_nohz_throttle = 500000UL;
+
+int sched_needs_cpu(int cpu)
+{
+	if (tick_nohz_full_cpu(cpu))
+		return 0;
+
+	return cpu_rq(cpu)->avg_idle < sysctl_sched_nohz_throttle;
+}
+
 #else /* CONFIG_NO_HZ_COMMON */
 
 static inline bool got_nohz_idle_kick(void)
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -351,6 +351,15 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+#ifdef CONFIG_NO_HZ_COMMON
+	{
+		.procname	= "sched_nohz_throttle_ns",
+		.data		= &sysctl_sched_nohz_throttle,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+#endif
 #ifdef CONFIG_SCHEDSTATS
 	{
 		.procname	= "sched_schedstats",
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -676,7 +676,7 @@ static ktime_t tick_nohz_stop_sched_tick
 	} while (read_seqretry(&jiffies_lock, seq));
 	ts->last_jiffies = basejiff;
 
-	if (rcu_needs_cpu(basemono, &next_rcu) ||
+	if (sched_needs_cpu(cpu) || rcu_needs_cpu(basemono, &next_rcu) ||
 	    arch_needs_cpu() || irq_work_needs_cpu()) {
 		next_tick = basemono + TICK_NSEC;
 	} else {