From: Jason Baron <jbaron@redhat.com>
To: Paul Turner <pjt@google.com>
Cc: linux-kernel@vger.kernel.org,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Bharata B Rao <bharata@linux.vnet.ibm.com>,
Dhaval Giani <dhaval.giani@gmail.com>,
Balbir Singh <bsingharora@gmail.com>,
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>,
Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
Ingo Molnar <mingo@elte.hu>, Pavel Emelyanov <xemul@openvz.org>
Date: Thu, 21 Jul 2011 20:32:12 -0400 [thread overview]
raw)
rth@redhat.com
Bcc:
Subject: Re: [RFT][patch 17/18] sched: use jump labels to reduce overhead
when bandwidth control is inactive
Reply-To:
In-Reply-To: <20110721184758.403388616@google.com>
On Thu, Jul 21, 2011 at 09:43:42AM -0700, Paul Turner wrote:
> So I'm seeing some strange costs associated with jump_labels; while on paper
> the branches and instructions retired improves (as expected) we're taking an
> unexpected hit in IPC.
>
> [From the initial mail we have workloads:
> mkdir -p /cgroup/cpu/test
> echo $$ > /dev/cgroup/cpu/test (only cpu,cpuacct mounted)
> (W1) taskset -c 0 perf stat --repeat 50 -e instructions,cycles,branches bash -c "for ((i=0;i<5;i++)); do $(dirname $0)/pipe-test 20000; done"
> (W2)taskset -c 0 perf stat --repeat 50 -e instructions,cycles,branches bash -c "$(dirname $0)/pipe-test 100000;true"
> (W3)taskset -c 0 perf stat --repeat 50 -e instructions,cycles,branches bash -c "$(dirname $0)/pipe-test 100000;"
> ]
>
> To make some of the figures more clear:
>
> Legend:
> !BWC = tip + bwc, BWC compiled out
> BWC = tip + bwc
> BWC_JL = tip + bwc + jump label (this patch)
>
>
> Now, comparing under W1 we see:
> W1: BWC vs BWC_JL
> instructions cycles branches elapsed
> ---------------------------------------------------------------------------------------------------------------------
> clovertown [BWC] 845934117 974222228 152715407 0.419014188 [baseline]
> +unconstrained 857963815 (+1.42) 1007152750 (+3.38) 153140328 (+0.28) 0.433186926 (+3.38) [rel]
> +10000000000/1000: 876937753 (+2.55) 1033978705 (+5.65) 160038434 (+3.59) 0.443638365 (+5.66) [rel]
> +10000000000/1000000: 880276838 (+3.08) 1036176245 (+6.13) 160683878 (+4.15) 0.444577244 (+6.14) [rel]
>
> barcelona [BWC] 820573353 748178486 148161233 0.342122850 [baseline]
> +unconstrained 817011602 (-0.43) 759838181 (+1.56) 145951513 (-1.49) 0.347462571 (+1.56) [rel]
> +10000000000/1000: 830109086 (+0.26) 770451537 (+1.67) 151228902 (+1.08) 0.350824677 (+1.65) [rel]
> +10000000000/1000000: 830196206 (+0.30) 770704213 (+2.27) 151250413 (+1.12) 0.350962182 (+2.28) [rel]
>
> westmere [BWC] 802533191 694415157 146071233 0.194428018 [baseline]
> +unconstrained 799057936 (-0.43) 751384496 (+8.20) 143875513 (-1.50) 0.211182620 (+8.62) [rel]
> +10000000000/1000: 812033785 (+0.27) 761469084 (+8.51) 149134146 (+1.09) 0.212149229 (+8.28) [rel]
> +10000000000/1000000: 811912834 (+0.27) 757842988 (+7.45) 149113291 (+1.09) 0.211364804 (+7.30) [rel]
> e.g. Barcelona issues ~0.43% less instructions, for a total of 817011602, in
> the unconstrained case with BWC.
>
>
> Where "unconstrained, 10000000000/1000, 10000000000/10000" are the on
> measurements for BWC_JL, with (%d) being the relative difference to their
> BWC counterparts.
>
> W1: BWC vs BWC_JL is very similar.
> BWC vs BWC_JL
> clovertown [BWC] 985732031 1283113452 175621212 1.375905653
> +unconstrained 979242938 (-0.66) 1288971141 (+0.46) 172122546 (-1.99) 1.389795165 (+1.01) [rel]
> +10000000000/1000: 999886468 (+0.33) 1296597143 (+1.13) 180554004 (+1.62) 1.392576770 (+1.18) [rel]
> +10000000000/1000000: 999034223 (+0.11) 1293925500 (+0.57) 180413829 (+1.39) 1.391041338 (+0.94) [rel]
>
> barcelona [BWC] 982139920 1078757792 175417574 1.069537049
> +unconstrained 965443672 (-1.70) 1075377223 (-0.31) 170215844 (-2.97) 1.045595065 (-2.24) [rel]
> +10000000000/1000: 989104943 (+0.05) 1100836668 (+0.52) 178837754 (+1.22) 1.058730316 (-1.77) [rel]
> +10000000000/1000000: 987627489 (-0.32) 1095843758 (-0.17) 178567411 (+0.84) 1.056100899 (-2.28) [rel]
>
> westmere [BWC] 918633403 896047900 166496917 0.754629182
> +unconstrained 914740541 (-0.42) 903906801 (+0.88) 163652848 (-1.71) 0.758050332 (+0.45) [rel]
> +10000000000/1000: 927517377 (-0.41) 952579771 (+5.67) 170173060 (+0.75) 0.771193786 (+2.43) [rel]
> +10000000000/1000000: 914676985 (-0.89) 936106277 (+3.81) 167683288 (+0.22) 0.764973632 (+1.38) [rel]
>
> Now this is rather odd, almost across the board we're seeing the expected
> drops in instructions and branches, yet we appear to be paying a heavy IPC
> price. The fact that wall-time has scaled equivalently with cycles roughly
> rules out the cycles counter being off.
>
> We are seeing the expected behavior in the bandwidth enabled case;
> specifically the <jl=jmp><ret><cond><ret> blocks are taking an extra branch
> and instruction which shows up on all the numbers above.
>
> With respect to compiler mangling the text is essentially unchanged in size.
> One lurking suspicion is whether the inserted nops have perturbed some of the
> jmp/branch alignments?
>
> text data bss dec hex filename
> 7277206 2827256 2125824 12230286 ba9e8e vmlinux.jump_label
> 7276886 2826744 2125824 12229454 ba9b4e vmlinux.no_jump_label
>
> I have checked to make sure that the right instructions are being patched in
> at run-time. I've also pulled a fully patched jump_label out of the kernel
> into a userspace test (and benchmarked it directly under perf). The results
> here are also exactly as expected.
>
> e.g.
> Performance counter stats for './jump_test':
> 1,500,839,002 instructions, 300,147,081 branches 702,468,404 cycles
> Performance counter stats for './jump_test 1':
> 2,001,014,609 instructions, 400,177,192 branches 901,758,219 cycles
>
> Overall if we can fix the IPC the benefit in the globally unconstrained case
> looks really good.
>
> Any thoughts Jason?
>
Do you have CONFIG_CC_OPTIMIZE_FOR_SIZE set? I know that when
CONFIG_CC_OPTIMIZE_FOR_SIZE is not set, the compiler can make the code
more optimal.
thanks,
-Jason
next reply other threads:[~2011-07-22 0:32 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-22 0:32 Jason Baron [this message]
2011-07-22 0:57 ` Paul Turner
2011-07-22 1:17 ` [RFT][patch 17/18] sched: use jump labels to reduce overhead when bandwidth control is inactive Jason Baron
2011-07-22 1:38 ` Paul Turner
2011-07-27 21:58 ` Jason Baron
2011-08-05 3:53 ` Paul Turner
2011-08-05 7:21 ` Peter Zijlstra
2011-08-05 3:55 ` Paul Turner
2011-08-05 18:28 ` Jason Baron
2011-08-05 8:30 ` Peter Zijlstra
2011-08-05 15:11 ` Richard Henderson
2011-08-05 15:14 ` Peter Zijlstra
2011-08-05 15:24 ` Jason Baron
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110722003211.GA2807@redhat.com \
--to=jbaron@redhat.com \
--cc=a.p.zijlstra@chello.nl \
--cc=bharata@linux.vnet.ibm.com \
--cc=bsingharora@gmail.com \
--cc=dhaval.giani@gmail.com \
--cc=kamalesh@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=pjt@google.com \
--cc=seto.hidetoshi@jp.fujitsu.com \
--cc=svaidy@linux.vnet.ibm.com \
--cc=vatsa@in.ibm.com \
--cc=xemul@openvz.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.