From: Michael Wang
Date: Mon, 21 Jan 2013 13:07:01 +0800
To: Mike Galbraith
CC: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org,
    mingo@kernel.org, a.p.zijlstra@chello.nl
Subject: Re: [RFC PATCH 0/2] sched: simplify the select_task_rq_fair()
Message-ID: <50FCCCF5.30504@linux.vnet.ibm.com>
In-Reply-To: <1358743128.4994.33.camel@marge.simpson.net>
References: <1356588535-23251-1-git-send-email-wangyun@linux.vnet.ibm.com>
    <50ED384C.1030301@linux.vnet.ibm.com>
    <1357977704.6796.47.camel@marge.simpson.net>
    <1357985943.6796.55.camel@marge.simpson.net>
    <1358155290.5631.19.camel@marge.simpson.net>
    <50F79256.1010900@linux.vnet.ibm.com>
    <1358654997.5743.17.camel@marge.simpson.net>
    <50FCACE3.5000706@linux.vnet.ibm.com>
    <1358743128.4994.33.camel@marge.simpson.net>

On 01/21/2013 12:38 PM, Mike Galbraith wrote:
> On Mon, 2013-01-21 at 10:50 +0800, Michael Wang wrote:
>> On 01/20/2013 12:09 PM, Mike Galbraith wrote:
>>> On Thu, 2013-01-17 at 13:55 +0800, Michael Wang wrote:
>>>> Hi, Mike
>>>>
>>>> I've sent out v2, which I expect will fix the BUG below and perform
>>>> better; please do let me know if it still causes issues on your
>>>> arm7 machine.
>>>
>>> s/arm7/aim7
>>>
>>> Someone swiped half of the CPUs/RAM, so the box is now 2 10-core
>>> nodes vs 4.
>>>
>>> stock scheduler knobs
>>>
>>>                  3.8-wang-v2            avg                  3.8-virgin            avg  vs wang
>>> Tasks  jobs/min
>>>     1    436.29    435.66    435.97    435.97    437.86    441.69    440.09    439.88    1.008
>>>     5   2361.65   2356.14   2350.66   2356.15   2416.27   2563.45   2374.61   2451.44    1.040
>>>    10   4767.90   4764.15   4779.18   4770.41   4946.94   4832.54   4828.69   4869.39    1.020
>>>    20   9672.79   9703.76   9380.80   9585.78   9634.34   9672.79   9727.13   9678.08    1.009
>>>    40  19162.06  19207.61  19299.36  19223.01  19268.68  19192.40  19056.60  19172.56     .997
>>>    80  37610.55  37465.22  37465.22  37513.66  37263.64  37120.98  37465.22  37283.28     .993
>>>   160  69306.65  69655.17  69257.14  69406.32  69257.14  69306.65  69257.14  69273.64     .998
>>>   320 111512.36 109066.37 111256.45 110611.72 108395.75 107913.19 108335.20 108214.71     .978
>>>   640 142850.83 148483.92 150851.81 147395.52 151974.92 151263.65 151322.67 151520.41    1.027
>>>  1280  52788.89  52706.39  67280.77  57592.01 189931.44 189745.60 189792.02 189823.02    3.295
>>>  2560  75403.91  52905.91  45196.21  57835.34 217368.64 217582.05 217551.54 217500.74    3.760
>>>
>>> sched_latency_ns = 24ms
>>> sched_min_granularity_ns = 8ms
>>> sched_wakeup_granularity_ns = 10ms
>>>
>>>                  3.8-wang-v2            avg                  3.8-virgin            avg  vs wang
>>> Tasks  jobs/min
>>>     1    436.29    436.60    434.72    435.87    434.41    439.77    438.81    437.66    1.004
>>>     5   2382.08   2393.36   2451.46   2408.96   2451.46   2453.44   2425.94   2443.61    1.014
>>>    10   5029.05   4887.10   5045.80   4987.31   4844.12   4828.69   4844.12   4838.97     .970
>>>    20   9869.71   9734.94   9758.45   9787.70   9513.34   9611.42   9565.90   9563.55     .977
>>>    40  19146.92  19146.92  19192.40  19162.08  18617.51  18603.22  18517.95  18579.56     .969
>>>    80  37177.91  37378.57  37292.31  37282.93  36451.13  36179.10  36233.18  36287.80     .973
>>>   160  70260.87  69109.05  69207.71  69525.87  68281.69  68522.97  68912.58  68572.41     .986
>>>   320 114745.56 113869.64 114474.62 114363.27 114137.73 114137.73 114137.73 114137.73     .998
>>>   640 164338.98 164338.98 164618.00 164431.98 164130.34 164130.34 164130.34 164130.34     .998
>>>  1280 209473.40 209134.54 209473.40 209360.44 210040.62 210040.62 210097.51 210059.58    1.003
>>>  2560 242703.38 242627.46 242779.34 242703.39 244001.26 243847.85 243732.91 243860.67    1.004
>>>
>>> As you can see, the load collapsed at the high-load end with stock
>>> scheduler knobs (desktop latency).  With the knobs set to scale, the
>>> delta disappeared.
>>
>> Thanks for the testing, Mike; please allow me to ask a few questions.
>>
>> What are those tasks actually doing?  What's the workload?
>
> It's the canned aim7 compute load, a mixed-bag load weighted toward
> compute.  The workfile below should give you an idea.
>
> # @(#) workfile.compute:1.3 1/22/96 00:00:00
> # Compute Server Mix
> FILESIZE: 100K
> POOLSIZE: 250M
> 50 add_double
> 30 add_int
> 30 add_long
> 10 array_rtns
> 10 disk_cp
> 30 disk_rd
> 10 disk_src
> 20 disk_wrt
> 40 div_double
> 30 div_int
> 50 matrix_rtns
> 40 mem_rtns_1
> 40 mem_rtns_2
> 50 mul_double
> 30 mul_int
> 30 mul_long
> 40 new_raph
> 40 num_rtns_1
> 50 page_test
> 40 series_1
> 10 shared_memory
> 30 sieve
> 20 stream_pipe
> 30 string_rtns
> 40 trig_rtns
> 20 udp_test
>

That looks like the default workfile.  Could you please show me the
numbers in your datapoint file?  I'm not familiar with this benchmark,
but I'd like to try it on my server to see whether this is a generic
issue.

>> And I'm confused about how those new parameter values were figured
>> out, and how they could help solve the possible issue?
>
> Oh, that's easy.
> I set sched_min_granularity_ns such that last_buddy kicks in when a
> third task arrives on a runqueue, and set sched_wakeup_granularity_ns
> near the minimum that still allows wakeup preemption to occur.  The
> combined effect is reduced over-scheduling.

Catching that timing sounds very hard; whatever the case, it could be
an important clue for the analysis.

>> Do you have any idea about which part of this patch set may cause the issue?
>
> Nope, I'm as puzzled by that as you are.  When the box had 40 cores,
> both virgin and patched showed over-scheduling effects, but not like
> this.  With 20 cores, symptoms changed in a most puzzling way, and I
> don't see how you'd be directly responsible.

Hmm...

>
>> One change by design is that, in the old logic, if it's a wakeup and
>> we find an affine sd, the select function will never go into the
>> balance path, but the new logic will in some cases; do you think this
>> could be a problem?
>
> Since it's the high load end, where looking for an idle core is most
> likely to be a waste of time, it makes sense that entering the balance
> path would hurt _some_, it isn't free.. except that twiddling the
> preemption knobs makes the collapse just go away.  We're still going to
> enter that path if all cores are busy, no matter how I twiddle those
> knobs.

Maybe we could try changing this back to the old way later, after the
aim7 test on my server.

>
>>> I thought perhaps the bogus (shouldn't exist) CPU domain in mainline
>>> somehow contributes to the strange behavioral delta, but killing it
>>> made zero difference.  All of these numbers for both trees were
>>> logged with the below applied, but as noted, it changed nothing.
>>
>> The patch set was supposed to accelerate things by reducing the cost
>> of select_task_rq(), so it should be harmless under all conditions.
>
> Yeah, it should just save some cycles, but I like to eliminate known
> bugs when testing, just in case.

Agreed, that's really important.

Regards,
Michael Wang

>
> -Mike
>
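
[For anyone who wants to sanity-check the knob reasoning above, here is a
small userspace sketch of the arithmetic behind "last_buddy kicks in when a
third task arrives".  It assumes the relation kernel/sched/fair.c used in
this timeframe, sched_nr_latency = DIV_ROUND_UP(sched_latency_ns,
sched_min_granularity_ns), with last_buddy only being set once a runqueue
holds that many tasks; it is illustrative only, not kernel code or part of
the patch, so verify the exact formula against your own tree.]

/*
 * Userspace sketch only -- not kernel code.  Assumed relation from
 * kernel/sched/fair.c of this era:
 *
 *     sched_nr_latency = DIV_ROUND_UP(sched_latency_ns,
 *                                     sched_min_granularity_ns)
 *
 * with last_buddy only set once cfs_rq->nr_running reaches that
 * threshold, and wakeup preemption comparing vruntime deltas against a
 * (load-scaled) sched_wakeup_granularity_ns.
 */
#include <stdio.h>

#define NSEC_PER_MSEC 1000000ULL

int main(void)
{
	/* the "knobs set to scale" values quoted above */
	unsigned long long latency   = 24 * NSEC_PER_MSEC; /* sched_latency_ns */
	unsigned long long min_gran  =  8 * NSEC_PER_MSEC; /* sched_min_granularity_ns */
	unsigned long long wake_gran = 10 * NSEC_PER_MSEC; /* sched_wakeup_granularity_ns */

	/* DIV_ROUND_UP(latency, min_gran): 24ms / 8ms == 3 */
	unsigned long long nr_latency = (latency + min_gran - 1) / min_gran;

	printf("sched_nr_latency = %llu\n", nr_latency);
	printf("last_buddy becomes eligible once nr_running >= %llu "
	       "(i.e. when the third task arrives)\n", nr_latency);
	printf("wakeup granularity = %llu ms vs %llu ms latency\n",
	       wake_gran / NSEC_PER_MSEC, latency / NSEC_PER_MSEC);
	return 0;
}

[Built with any C compiler, it prints a threshold of 3, matching the
"third task arrives" description, while the 10ms wakeup granularity sits
near the point where wakeup preemption is still possible, as described
above.]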