Message-ID: <51A7FA14.70902@intel.com>
Date: Fri, 31 May 2013 09:17:08 +0800
From: Alex Shi
To: Morten Rasmussen
CC: peterz@infradead.org, mingo@kernel.org, preeti@linux.vnet.ibm.com,
 vincent.guittot@linaro.org, efault@gmx.de, pjt@google.com,
 linux-kernel@vger.kernel.org, linaro-kernel@lists.linaro.org,
 arjan@linux.intel.com, len.brown@intel.com, corbet@lwn.net,
 tglx@linutronix.de
Subject: Re: [RFC] Comparison of power-efficient scheduling patch sets
References: <20130530134718.GB32728@e103034-lin>
In-Reply-To: <20130530134718.GB32728@e103034-lin>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/30/2013 09:47 PM, Morten Rasmussen wrote:
> Hi,
>
> A number of patch sets related to power-efficient scheduling have been
> posted over the last couple of months. Most of them do not have much
> data to back them up, so I decided to do some testing.
>
> Common for all of the patch sets that I have tested, except one, is that
> they attempt to pack tasks on as few cpus as possible to allow the
> remaining cpus to enter deeper sleep states - a strategy that should
> make sense on most platforms that support per-cpu power gating and
> multi-socket machines.
>
> Kernel: 3.9
>
> Patch sets:
> rlb-v4: sched: use runnable load based balance (Alex Shi)
>

Thanks for the valuable comparison!

The target of the runnable load balance is performance. It still tries
to spread tasks across as many CPUs as possible. :)
The latest v7 version removes the 6th patch (the wake_affine change)
from v4, fixes a double counting issue in the slept time, and removes
blocked_load_avg from the tg load.
http://comments.gmane.org/gmane.linux.kernel/1498988
Enjoy!

> pas-v7: sched: power aware scheduling (Alex Shi)
>

We still have some internal discussion on this patch set before
updating it. Sorry for the late response on this patch set!

> pst-v3: sched: packing small tasks (Vincent Guittot)
> pst-v4: sched: packing small tasks (Vincent Guittot)
>
> Configuration:
> pas-v7: Set to "powersaving" mode.
> pst-v4: Set to "Full" packing mode.
>
> Platform:
> ARM TC2 (test-chip), 2xCortex-A15 + 3xCortex-A7. Cortex-A15s disabled.
>
> Measurement technique:
> Time spent non-idle (not in idle state) for each cpu based on cpuidle
> ftrace events. TC2 does not have per-core power-gating, so packing
> inside the A7 cluster does not lead to any significant power savings.
> Note that any product grade hardware (TC2 is a test-chip) will very
> likely have per-core power-gating, so in those cases packing will have
> an appreciable effect on power savings.
> Measuring non-idle time rather than power should give a more clear idea
> about the effect of the patch sets given that the idle back-end is
> highly implementation specific.
>
> Benchmarks:
> audio playback (Android): 30s mp3 file playback on Android.
> bbench+audio (Android): Web page rendering while doing mp3 playback.
> andebench_native (Android): Android benchmark running in native mode.
> cyclictest: Short periodic tasks.
>
> Results:
> Two runs for each patch set.
>
> audio playback (Android) SMP
> non-idle %     cpu 0   cpu 1   cpu 2
> 3.9_1          11.96    2.86    2.48
> 3.9_2          12.64    2.81    1.88
> rlb-v4_1       12.61    2.44    1.90
> rlb-v4_2       12.45    2.44    1.90
> pas-v7_1       16.17    0.03    0.24
> pas-v7_2       16.08    0.28    0.07
> pst-v3_1       15.18    2.76    1.70
> pst-v3_2       15.13    0.80    0.38
> pst-v4_1       16.14    0.05    0.00
> pst-v4_2       16.34    0.06    0.00
>
> bbench+audio (Android) SMP
> non-idle %     cpu 0   cpu 1   cpu 2  render time
> 3.9_1          25.00   20.73   21.22          812
> 3.9_2          24.29   19.78   22.34          795
> rlb-v4_1       23.84   19.36   22.74          782
> rlb-v4_2       24.07   19.36   22.74          797
> pas-v7_1       28.29   17.86   16.01          869
> pas-v7_2       28.62   18.54   15.05          908
> pst-v3_1       29.14   20.59   21.72          830
> pst-v3_2       27.69   18.81   20.06          830
> pst-v4_1       42.20   13.63    2.29          880
> pst-v4_2       41.56   14.40    2.17          935
>
> andebench_native (8 threads) (Android) SMP
> non-idle %     cpu 0   cpu 1   cpu 2   Score
> 3.9_1          99.22   98.88   99.61    4139
> 3.9_2          99.56   99.31   99.46    4148
> rlb-v4_1       99.49   99.61   99.53    4153
> rlb-v4_2       99.56   99.61   99.53    4149
> pas-v7_1       99.53   99.59   99.29    4149
> pas-v7_2       99.42   99.63   99.48    4150
> pst-v3_1       97.89   99.33   99.42    4097
> pst-v3_2       99.16   99.62   99.42    4097
> pst-v4_1       99.34   99.01   99.59    4146
> pst-v4_2       99.49   99.52   99.20    4146
>
> cyclictest SMP
> non-idle %     cpu 0   cpu 1   cpu 2
> 3.9_1           9.13    8.88    8.41
> 3.9_2          10.27    8.02    6.30
> rlb-v4_1        8.88    8.09    8.11
> rlb-v4_2        8.49    8.09    8.11
> pas-v7_1       10.20    0.02   11.50
> pas-v7_2        7.86   14.31    0.02
> pst-v3_1       20.44    8.68    7.97
> pst-v3_2       20.41    0.78    1.00
> pst-v4_1       21.32    0.21    0.05
> pst-v4_2       21.56    0.21    0.04
>
> Overall, pas-v7 seems to do a fairly good job at packing. The idle time
> distribution seems to be somewhere between pst-v3 and the more
> aggressive pst-v4 for all the benchmarks. pst-v4 manages to keep two
> cpus nearly idle (<0.25% non-idle) for both cyclictest and audio, which
> is better than both pst-v3 and pas-v7. pas-v7 fails to pack cyclictest.
> Packing does come at a cost which can be seen for bbench+audio, where
> pst-v3 and rlb-v4 get better render times than pas-v7 and pst-v4 which
> do more aggressive packing.
> rlb-v4 does not pack, it is only included
> for reference.
>
> From a packing perspective pst-v4 seems to do the best job for the
> workloads that I have tested on ARM TC2. The less aggressive packing in
> pst-v3 may be a better choice in terms of performance.
>
> I'm well aware that these tests are heavily focused on mobile workloads.
> I would therefore encourage people to share your test results for your
> workloads on your platforms to complete the picture. Comments are also
> welcome.
>
> Thanks,
> Morten
>

--
Thanks
Alex
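
For anyone who wants to reproduce the non-idle numbers on another
platform, the measurement technique quoted above (non-idle time per cpu
derived from cpuidle ftrace events) could be sketched roughly as below.
This is only an illustration, not the tooling used for the results in
this thread. The function name non_idle_percent is made up, and the
sketch assumes the text output of the power:cpu_idle tracepoint, where
state=4294967295 marks an idle exit. A cpu is counted as non-idle
whenever it is not between an idle entry and the matching exit, and the
measurement window is assumed to run from the first to the last cpu_idle
event in the trace.

```python
import re
from collections import defaultdict

# Assumed input: text output of the power:cpu_idle tracepoint, e.g.
#   <idle>-0 [001] d..2 100.250000: cpu_idle: state=1 cpu_id=1
EVENT = re.compile(r"(\d+\.\d+): cpu_idle: state=(\d+) cpu_id=(\d+)")
IDLE_EXIT = 4294967295  # (u32)-1, logged when a cpu leaves idle


def non_idle_percent(lines):
    """Return {cpu: non-idle %} over the span covered by the trace.

    Hypothetical helper for illustration only.
    """
    idle_since = {}                   # cpu -> timestamp of idle entry
    idle_total = defaultdict(float)   # cpu -> accumulated idle seconds
    first = last = None

    for line in lines:
        m = EVENT.search(line)
        if not m:
            continue  # not a cpu_idle event
        ts, state, cpu = float(m.group(1)), int(m.group(2)), int(m.group(3))
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)
        idle_total[cpu] += 0.0        # make sure the cpu shows up in the result
        if state == IDLE_EXIT:
            start = idle_since.pop(cpu, None)
            if start is not None:
                idle_total[cpu] += ts - start
        else:
            # Idle entry; keep the earliest entry if no exit was seen yet.
            idle_since.setdefault(cpu, ts)

    if first is None or last == first:
        return {}                     # no usable measurement window

    # Close idle periods that are still open at the end of the trace.
    for cpu, start in idle_since.items():
        idle_total[cpu] += last - start

    window = last - first
    return {cpu: 100.0 * (1.0 - idle / window)
            for cpu, idle in idle_total.items()}
```

Feeding it the lines of a captured trace, for example
non_idle_percent(open("trace.txt")), where trace.txt is a placeholder
file name, gives one percentage per cpu that can be compared between
kernels in the same way as the tables above.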