From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Paul Turner <pjt@google.com>,
	linux-kernel@vger.kernel.org,
	Bharata B Rao <bharata@linux.vnet.ibm.com>,
	Dhaval Giani <dhaval.giani@gmail.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>,
	Pavel Emelyanov <xemul@openvz.org>
Subject: Re: [patch 00/16] CFS Bandwidth Control v7
Date: Fri, 24 Jun 2011 14:11:06 +0900	[thread overview]
Message-ID: <4E041C6A.4000701@jp.fujitsu.com> (raw)
In-Reply-To: <20110623124310.GA15430@elte.hu>


(2011/06/23 21:43), Ingo Molnar wrote:
> 
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
>> On Wed, 2011-06-22 at 19:05 +0900, Hidetoshi Seto wrote:
>>
>>> I'll continue my test/benchmark on this v7 for a while. Though I 
>>> believe no more bug is there, I'll let you know if there is 
>>> something.
>>
>> Would that testing include performance of a kernel without these 
>> patches vs one with these patches in a configuration where the new 
>> feature is compiled in but not used?
>>
>> It does add a number of if (!cfs_rq->runtime_enabled) return 
>> branches all over the place, some possibly inside a function call 
>> (depending on what the auto-inliner does). So while the impact 
>> should be minimal, it would be very good to test it is indeed so.
> 
> Yeah, doing such performance tests is absolutely required. Branches 
> and instructions impact should be measured as well, beyond the cycles 
> impact.
> 
> The changelog of this recent commit:
> 
>   c8b281161dfa: sched: Increase SCHED_LOAD_SCALE resolution
> 
> gives an example of how to do such measurements.

Thank you for the useful guidance!

I've run pipe-test-100k on both a kernel without the patches (3.0-rc4)
and one with the patches (3.0-rc4+), in a similar way to that described
in the changelog you pointed to (but I added "-d" for more detail).

I took 4 samples for each kernel: 3 runs with --repeat 10, plus 1 run
with --repeat 200.  Cgroups are not used in either case, so on the
patched kernel CFS bandwidth control is compiled in but never exercised.
The results are archived and attached.
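
For reference, the sampling went roughly like this (a sketch of my
procedure, run from the perf directory of my tree; the relative path
to pipe-test-100k is just where it sits in my setup):

  # 3 samples of 10 runs each, then 1 sample of 200 runs, pinned to CPU 0
  for i in 1 2 3; do
      taskset 1 ./perf stat -d -d -d --repeat 10 ../../../pipe-test-100k
  done
  taskset 1 ./perf stat -d -d -d --repeat 200 ../../../pipe-test-100k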

Here is a comparison in diff style:

=====
--- /home/seto/bwc-pipe-test/bwc-rc4-orig.txt   2011-06-24 11:52:16.000000000 +0900
+++ /home/seto/bwc-pipe-test/bwc-rc4-patched.txt        2011-06-24 12:08:32.000000000 +0900
 [seto@SIRIUS-F14 perf]$ taskset 1 ./perf stat -d -d -d --repeat 200 ../../../pipe-test-100k

  Performance counter stats for '../../../pipe-test-100k' (200 runs):

-        865.139070 task-clock                #    0.468 CPUs utilized            ( +-  0.22% )
-           200,167 context-switches          #    0.231 M/sec                    ( +-  0.00% )
-                 0 CPU-migrations            #    0.000 M/sec                    ( +- 49.62% )
-               142 page-faults               #    0.000 M/sec                    ( +-  0.07% )
-     1,671,107,623 cycles                    #    1.932 GHz                      ( +-  0.16% ) [28.23%]
-       838,554,329 stalled-cycles-frontend   #   50.18% frontend cycles idle     ( +-  0.27% ) [28.21%]
-       453,526,560 stalled-cycles-backend    #   27.14% backend  cycles idle     ( +-  0.43% ) [28.33%]
-     1,434,140,915 instructions              #    0.86  insns per cycle
-                                             #    0.58  stalled cycles per insn  ( +-  0.06% ) [34.01%]
-       279,485,621 branches                  #  323.053 M/sec                    ( +-  0.06% ) [33.98%]
-         6,653,998 branch-misses             #    2.38% of all branches          ( +-  0.16% ) [33.93%]
-       495,463,378 L1-dcache-loads           #  572.698 M/sec                    ( +-  0.05% ) [28.12%]
-        27,903,270 L1-dcache-load-misses     #    5.63% of all L1-dcache hits    ( +-  0.28% ) [27.84%]
-           885,210 LLC-loads                 #    1.023 M/sec                    ( +-  3.21% ) [21.80%]
-             9,479 LLC-load-misses           #    1.07% of all LL-cache hits     ( +-  0.63% ) [ 5.61%]
-       830,096,007 L1-icache-loads           #  959.494 M/sec                    ( +-  0.08% ) [11.18%]
-       123,728,370 L1-icache-load-misses     #   14.91% of all L1-icache hits    ( +-  0.06% ) [16.78%]
-       504,932,490 dTLB-loads                #  583.643 M/sec                    ( +-  0.06% ) [22.30%]
-         2,056,069 dTLB-load-misses          #    0.41% of all dTLB cache hits   ( +-  2.23% ) [22.20%]
-     1,579,410,083 iTLB-loads                # 1825.614 M/sec                    ( +-  0.06% ) [22.30%]
-           394,739 iTLB-load-misses          #    0.02% of all iTLB cache hits   ( +-  0.03% ) [22.27%]
-         2,286,363 L1-dcache-prefetches      #    2.643 M/sec                    ( +-  0.72% ) [22.40%]
-           776,096 L1-dcache-prefetch-misses #    0.897 M/sec                    ( +-  1.45% ) [22.54%]
+        859.259725 task-clock                #    0.472 CPUs utilized            ( +-  0.24% )
+           200,165 context-switches          #    0.233 M/sec                    ( +-  0.00% )
+                 0 CPU-migrations            #    0.000 M/sec                    ( +-100.00% )
+               142 page-faults               #    0.000 M/sec                    ( +-  0.06% )
+     1,659,371,974 cycles                    #    1.931 GHz                      ( +-  0.18% ) [28.23%]
+       829,806,955 stalled-cycles-frontend   #   50.01% frontend cycles idle     ( +-  0.32% ) [28.32%]
+       490,316,435 stalled-cycles-backend    #   29.55% backend  cycles idle     ( +-  0.46% ) [28.34%]
+     1,445,166,061 instructions              #    0.87  insns per cycle
+                                             #    0.57  stalled cycles per insn  ( +-  0.06% ) [34.01%]
+       282,370,988 branches                  #  328.621 M/sec                    ( +-  0.06% ) [33.93%]
+         5,056,568 branch-misses             #    1.79% of all branches          ( +-  0.19% ) [33.94%]
+       500,660,789 L1-dcache-loads           #  582.665 M/sec                    ( +-  0.06% ) [28.05%]
+        26,802,313 L1-dcache-load-misses     #    5.35% of all L1-dcache hits    ( +-  0.26% ) [27.83%]
+           872,571 LLC-loads                 #    1.015 M/sec                    ( +-  3.73% ) [21.82%]
+             9,050 LLC-load-misses           #    1.04% of all LL-cache hits     ( +-  0.55% ) [ 5.70%]
+       794,396,111 L1-icache-loads           #  924.512 M/sec                    ( +-  0.06% ) [11.30%]
+       130,179,414 L1-icache-load-misses     #   16.39% of all L1-icache hits    ( +-  0.09% ) [16.85%]
+       511,119,889 dTLB-loads                #  594.837 M/sec                    ( +-  0.06% ) [22.37%]
+         2,452,378 dTLB-load-misses          #    0.48% of all dTLB cache hits   ( +-  2.31% ) [22.14%]
+     1,597,897,243 iTLB-loads                # 1859.621 M/sec                    ( +-  0.06% ) [22.17%]
+           394,366 iTLB-load-misses          #    0.02% of all iTLB cache hits   ( +-  0.03% ) [22.24%]
+         1,897,401 L1-dcache-prefetches      #    2.208 M/sec                    ( +-  0.64% ) [22.38%]
+           879,391 L1-dcache-prefetch-misses #    1.023 M/sec                    ( +-  0.90% ) [22.54%]

-       1.847093132 seconds time elapsed                                          ( +-  0.19% )
+       1.822131534 seconds time elapsed                                          ( +-  0.21% )
=====

As Peter expected, the number of branches is slightly increased.

-       279,485,621 branches                  #  323.053 M/sec                    ( +-  0.06% ) [33.98%]
+       282,370,988 branches                  #  328.621 M/sec                    ( +-  0.06% ) [33.93%]
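
That is roughly (282,370,988 - 279,485,621) / 200,165 context switches,
i.e. about 14 extra branches per switch; only a rough figure, since
these counters are scaled samples.  The extra branches come from the
kind of early-return check Peter mentioned; a simplified sketch with
illustrative function names, not the exact patched code:

  static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq,
                                     unsigned long delta_exec)
  {
          /* one extra, highly predictable branch when unused */
          if (!cfs_rq->runtime_enabled)
                  return;

          /* charge delta_exec against the group's quota */
          __account_cfs_rq_runtime(cfs_rq, delta_exec);
  }

Note that branch-misses actually went down (6,653,998 -> 5,056,568),
which suggests these new branches predict well in this test.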

However, looking at the results overall, I see no significant
performance regression from this patch set.  I'd love to hear from
the maintainers.
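
For completeness, "not used" above means no cpu cgroup has a quota set.
When the feature is exercised, it is driven through the cpu cgroup,
roughly like this (a sketch; the mount point, group name and values are
only examples):

  mount -t cgroup -o cpu none /cgroup
  mkdir /cgroup/g1
  # allow g1 up to 250ms of CPU time every 500ms period
  echo 500000 > /cgroup/g1/cpu.cfs_period_us
  echo 250000 > /cgroup/g1/cpu.cfs_quota_us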


Thanks,
H.Seto

[-- Attachment #2: bwc-pipe-test.tar.bz2 --]
[-- Type: application/octet-stream, Size: 5124 bytes --]

Thread overview: 59+ messages
2011-06-21  7:16 [patch 00/16] CFS Bandwidth Control v7 Paul Turner
2011-06-21  7:16 ` [patch 01/16] sched: (fixlet) dont update shares twice on on_rq parent Paul Turner
2011-06-21  7:16 ` [patch 02/16] sched: hierarchical task accounting for SCHED_OTHER Paul Turner
2011-06-21  7:16 ` [patch 03/16] sched: introduce primitives to account for CFS bandwidth tracking Paul Turner
2011-06-22 10:52   ` Peter Zijlstra
2011-07-06 21:38     ` Paul Turner
2011-07-07 11:32       ` Peter Zijlstra
2011-06-21  7:16 ` [patch 04/16] sched: validate CFS quota hierarchies Paul Turner
2011-06-22  5:43   ` Bharata B Rao
2011-06-22  6:57     ` Paul Turner
2011-06-22  9:38   ` Hidetoshi Seto
2011-06-21  7:16 ` [patch 05/16] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth Paul Turner
2011-06-21  7:16 ` [patch 06/16] sched: add a timer to handle CFS bandwidth refresh Paul Turner
2011-06-22  9:38   ` Hidetoshi Seto
2011-06-21  7:16 ` [patch 07/16] sched: expire invalid runtime Paul Turner
2011-06-22  9:38   ` Hidetoshi Seto
2011-06-22 15:47   ` Peter Zijlstra
2011-06-28  4:42     ` Paul Turner
2011-06-29  2:29       ` Paul Turner
2011-06-21  7:16 ` [patch 08/16] sched: throttle cfs_rq entities which exceed their local runtime Paul Turner
2011-06-22  7:11   ` Bharata B Rao
2011-06-22 16:07   ` Peter Zijlstra
2011-06-22 16:54     ` Paul Turner
2011-06-21  7:16 ` [patch 09/16] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh Paul Turner
2011-06-22 17:29   ` Peter Zijlstra
2011-06-28  4:40     ` Paul Turner
2011-06-28  9:11       ` Peter Zijlstra
2011-06-29  3:37         ` Paul Turner
2011-06-21  7:16 ` [patch 10/16] sched: throttle entities exceeding their allowed bandwidth Paul Turner
2011-06-22  9:39   ` Hidetoshi Seto
2011-06-21  7:17 ` [patch 11/16] sched: allow for positional tg_tree walks Paul Turner
2011-06-21  7:17 ` [patch 12/16] sched: prevent interactions with throttled entities Paul Turner
2011-06-22 21:34   ` Peter Zijlstra
2011-06-28  4:43     ` Paul Turner
2011-06-23 11:49   ` Peter Zijlstra
2011-06-28  4:38     ` Paul Turner
2011-06-21  7:17 ` [patch 13/16] sched: migrate throttled tasks on HOTPLUG Paul Turner
2011-06-21  7:17 ` [patch 14/16] sched: add exports tracking cfs bandwidth control statistics Paul Turner
2011-06-21  7:17 ` [patch 15/16] sched: return unused runtime on voluntary sleep Paul Turner
2011-06-21  7:33   ` Paul Turner
2011-06-22  9:39   ` Hidetoshi Seto
2011-06-23 15:26   ` Peter Zijlstra
2011-06-28  1:42     ` Paul Turner
2011-06-28 10:01       ` Peter Zijlstra
2011-06-28 18:45         ` Paul Turner
2011-06-21  7:17 ` [patch 16/16] sched: add documentation for bandwidth control Paul Turner
2011-06-21 10:30   ` Hidetoshi Seto
2011-06-21 19:46     ` Paul Turner
2011-06-22 10:05 ` [patch 00/16] CFS Bandwidth Control v7 Hidetoshi Seto
2011-06-23 12:06   ` Peter Zijlstra
2011-06-23 12:43     ` Ingo Molnar
2011-06-24  5:11       ` Hidetoshi Seto [this message]
2011-06-26 10:35         ` Ingo Molnar
2011-06-29  4:05           ` Hu Tao
2011-07-01 12:28             ` Ingo Molnar
2011-07-05  3:58               ` Hu Tao
2011-07-05  8:50                 ` Ingo Molnar
2011-07-05  8:52                   ` Ingo Molnar
2011-07-07  3:53                     ` Hu Tao
