Date: Wed, 11 May 2016 09:23:47 +0800
From: Yuyang Du
To: Mike Galbraith
Cc: Peter Zijlstra, Chris Mason, Ingo Molnar, Matt Fleming,
	linux-kernel@vger.kernel.org
Subject: Re: sched: tweak select_idle_sibling to look for idle threads
Message-ID: <20160511012347.GA8790@intel.com>
In-Reply-To: <1462940271.3717.57.camel@gmail.com>

On Wed, May 11, 2016 at 06:17:51AM +0200, Mike Galbraith wrote:
> > > static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq)
> > > {
> > > +	if (sched_feat(LB_TIP_AVG_HIGH) && cfs_rq->load.weight > cfs_rq->runnable_load_avg*2)
> > > +		return cfs_rq->runnable_load_avg + min_t(unsigned long, NICE_0_LOAD,
> > > +							 cfs_rq->load.weight/2);
> > > 	return cfs_rq->runnable_load_avg;
> > > }
> > 
> > cfs_rq->runnable_load_avg is for sure no greater than (in this case much
> > less than, maybe 1/2 of) load.weight, whereas load_avg is not a rock in
> > the gearbox that only impedes speeding up; it impedes slowing down too.
> 
> Yeah, just like everything else, it cuts both ways (which is why you
> can't win the sched game). If I can believe tbench, at tasks=cpus,
> reducing lag increased utilization and reduced latency a wee bit, as did
> the reserve thing once a booboo got fixed up.

Ok, so you have a secret IDLE_RESERVE? Good luck, and show it. ;)

> Makes sense: robbing Peter to pay Paul should work out better for Paul.
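Aside, for readers not steeped in the thread: LB_TIP_AVG_HIGH in the hunk
above is an experimental feature bit from the patch under discussion, not
a mainline feature. A minimal sketch of how such a bit would be declared,
assuming the standard sched_feat() machinery in kernel/sched/features.h,
default off so nothing changes until it is enabled:

	/* kernel/sched/features.h (sketch): new feature bit, default off */
	SCHED_FEAT(LB_TIP_AVG_HIGH, false)

With CONFIG_SCHED_DEBUG, the bit can then be flipped at runtime, which is
presumably how the A/B matrix quoted below was produced:

	echo LB_TIP_AVG_HIGH > /sys/kernel/debug/sched_features
	echo NO_LB_TIP_AVG_HIGH > /sys/kernel/debug/sched_features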
> 
> NO_LB_TIP_AVG_HIGH NO_IDLE_RESERVE
> Throughput 27132.9 MB/sec  96 clients  96 procs  max_latency=7.656 ms
> Throughput 28464.1 MB/sec  96 clients  96 procs  max_latency=9.905 ms
> Throughput 25369.8 MB/sec  96 clients  96 procs  max_latency=7.192 ms
> Throughput 25670.3 MB/sec  96 clients  96 procs  max_latency=5.874 ms
> Throughput 29309.3 MB/sec  96 clients  96 procs  max_latency=1.331 ms
> avg 27189 MB/sec  1.000    6.391 ms  1.000
> (avg columns: mean throughput, ratio to baseline, mean max_latency, ratio to baseline)
> 
> NO_LB_TIP_AVG_HIGH IDLE_RESERVE
> Throughput 24437.5 MB/sec  96 clients  96 procs  max_latency=1.837 ms
> Throughput 29464.7 MB/sec  96 clients  96 procs  max_latency=1.594 ms
> Throughput 28023.6 MB/sec  96 clients  96 procs  max_latency=1.494 ms
> Throughput 28299.0 MB/sec  96 clients  96 procs  max_latency=10.404 ms
> Throughput 29072.1 MB/sec  96 clients  96 procs  max_latency=5.575 ms
> avg 27859 MB/sec  1.024    4.180 ms  0.654
> 
> LB_TIP_AVG_HIGH NO_IDLE_RESERVE
> Throughput 29068.1 MB/sec  96 clients  96 procs  max_latency=5.599 ms
> Throughput 26435.6 MB/sec  96 clients  96 procs  max_latency=3.703 ms
> Throughput 23930.0 MB/sec  96 clients  96 procs  max_latency=7.742 ms
> Throughput 29464.2 MB/sec  96 clients  96 procs  max_latency=1.549 ms
> Throughput 24250.9 MB/sec  96 clients  96 procs  max_latency=1.518 ms
> avg 26629 MB/sec  0.979    4.022 ms  0.629
> 
> LB_TIP_AVG_HIGH IDLE_RESERVE
> Throughput 30340.1 MB/sec  96 clients  96 procs  max_latency=1.465 ms
> Throughput 29042.9 MB/sec  96 clients  96 procs  max_latency=4.515 ms
> Throughput 26718.7 MB/sec  96 clients  96 procs  max_latency=1.822 ms
> Throughput 28694.4 MB/sec  96 clients  96 procs  max_latency=1.503 ms
> Throughput 28918.2 MB/sec  96 clients  96 procs  max_latency=7.599 ms
> avg 28742 MB/sec  1.057    3.380 ms  0.528
> 
> > But I really don't know what kind of load the references in
> > select_task_rq() should be. So maybe the real issue is that they are a
> > mix, i.e., conflated balancing and just wanting an idle cpu?
> 
> Depends on the goal. For both, load lagging reality means the high
> frequency component is squelched, meaning less migration cost, but also
> higher latency due to stacking. It's a tradeoff where Chris' "latency
> is everything" benchmark, and _maybe_ the real world load it's based
> upon, is on Peter's end of the rob-Peter-to-pay-Paul transaction. The
> benchmark says it definitely is; the real world load may have already
> been fixed up by the select_idle_sibling() rewrite.

Obviously, load averages are good for balancing at a larger scale, over a
longer timeframe, so they should be used for comparing/balancing sched
domains, not individual cpus. However, that is not the case currently: the
averages are mixed into the idle cpu/core selection, so I think a better
job can be done both before and after select_idle_sibling(). For example,
I don't know what the complex wake_affine() is really doing, or for what.
Am I missing something, do you think?

Kudos to the select_idle_sibling() rewrite: as Peter said, the second step
and even third step scans really help, in addition to the many cleanups
and refactors.
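Aside: the "second step and even third step scans" refer to the shape of
the rewritten idle search: first check the wake target, then scan the LLC
domain for a fully idle core, then settle for any idle hw thread. A
minimal sketch of that shape only; llc_domain_span() and
cpu_is_idle_core() here are simplified stand-ins rather than real kernel
API, and the real code additionally respects task affinity and bounds the
scan cost:

	/* Sketch of the multi-step idle scan, not the actual rewrite. */
	static int pick_idle_cpu_sketch(int target)
	{
		int cpu;

		/* Step 1: the wake target itself is idle, done. */
		if (idle_cpu(target))
			return target;

		/* Step 2: prefer a fully idle core (all SMT siblings
		 * idle), so the wakee disturbs nobody. */
		for_each_cpu(cpu, llc_domain_span(target)) {
			if (cpu_is_idle_core(cpu))
				return cpu;
		}

		/* Step 3: settle for any idle hw thread in the LLC. */
		for_each_cpu(cpu, llc_domain_span(target)) {
			if (idle_cpu(cpu))
				return cpu;
		}

		/* Nothing idle: stay on target, let load balancing sort it out. */
		return target;
	}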