Date: Sun, 25 Nov 2018 19:30:11 -0500
From: "Emilio G. Cota"
Message-ID: <20181126003011.GA12936@flamenco>
References: <20181123144558.5048-1-richard.henderson@linaro.org>
In-Reply-To: <20181123144558.5048-1-richard.henderson@linaro.org>
Subject: Re: [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups
To: Richard Henderson
Cc: qemu-devel@nongnu.org, Alistair.Francis@wdc.com

On Fri, Nov 23, 2018 at 15:45:21 +0100, Richard Henderson wrote:
> This includes everything queued so far -- softmmu out-of-line
> patches

Reviewed-by: Emilio G. Cota for patches 1-9.

I am sad to report that on a Skylake host, this series gives a ~10%
average slowdown for x86_64-softmmu SPEC06int (I'm reporting speedup,
so <1 means slowdown):

  https://imgur.com/a/25iu8Yl

Turns out that despite the higher icache hit rate, the IPC ends up
being lower.
For instance, here are perf counts when running hmmer x3 right after
booting up (bootup is included in the counts, but hmmer is run 3 times
in a row):

- Before:
   249,392,070,159      cycles
   781,327,593,681      instructions              #    3.13  insn per cycle
    85,914,418,873      branches
       242,572,820      branch-misses             #    0.28% of all branches
     1,567,954,032      L1-icache-load-misses

      70.559864567 seconds time elapsed

- After:
   277,806,651,701      cycles
   813,619,725,225      instructions              #    2.93  insn per cycle
   132,453,633,831      branches
       306,969,989      branch-misses             #    0.23% of all branches
     1,250,619,057      L1-icache-load-misses

      78.420517079 seconds time elapsed

On the bright side, in an older system (Sandy Bridge), I get a fairly
neutral average perf impact, with some workloads speeding up and
others slowing down:

  https://imgur.com/a/AokDbkm

(Note that v1 of this series gave an overall slowdown, so that's
progress.)

Given the above, perhaps the best way forward is to add a configure
flag to disable OOL thunks, unless you have any further optimizations
coming up.

Thanks,

		Emilio
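
The derived figures in the perf output above (IPC, branch-miss rate,
and the before/after ratios) can be recomputed from the raw counts.
What follows is a minimal sketch in Python, not part of the original
measurements: the dictionary layout and the names derived(), ratio()
and speedup are illustrative assumptions, and only the numbers are
taken from the counts above.

```python
# Counter values copied from the perf output above.
# The dictionary keys and helper names are illustrative, not perf's own.
before = {
    "cycles": 249_392_070_159,
    "instructions": 781_327_593_681,
    "branches": 85_914_418_873,
    "branch_misses": 242_572_820,
    "icache_misses": 1_567_954_032,
    "seconds": 70.559864567,
}
after = {
    "cycles": 277_806_651_701,
    "instructions": 813_619_725_225,
    "branches": 132_453_633_831,
    "branch_misses": 306_969_989,
    "icache_misses": 1_250_619_057,
    "seconds": 78.420517079,
}


def derived(counts):
    """Return IPC and branch-miss rate (in percent) for one set of counts."""
    return {
        "ipc": counts["instructions"] / counts["cycles"],
        "branch_miss_pct": 100.0 * counts["branch_misses"] / counts["branches"],
    }


def ratio(key):
    """Return the after/before ratio for one counter."""
    return after[key] / before[key]


# Speedup convention used in this mail: before/after elapsed time,
# so a value below 1 means the series made things slower.
speedup = before["seconds"] / after["seconds"]

if __name__ == "__main__":
    for label, counts in (("before", before), ("after", after)):
        d = derived(counts)
        print(f"{label}: IPC {d['ipc']:.2f}, "
              f"branch-miss rate {d['branch_miss_pct']:.2f}%")
    print(f"speedup (before/after elapsed): {speedup:.2f}")
    for key in ("cycles", "instructions", "branches", "icache_misses"):
        print(f"{key}: after/before = {ratio(key):.2f}")
```

Running it reproduces the 3.13 and 2.93 IPC figures and gives a
speedup of about 0.90 for this run. The ratios also show where the
counts moved: L1-icache-load-misses drop to roughly 0.80x, while
branches grow to roughly 1.54x and cycles to roughly 1.11x.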