From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752341AbbDLVPL (ORCPT ); Sun, 12 Apr 2015 17:15:11 -0400 Received: from plane.gmane.org ([80.91.229.3]:58225 "EHLO plane.gmane.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751776AbbDLVPI (ORCPT ); Sun, 12 Apr 2015 17:15:08 -0400 X-Injected-Via-Gmane: http://gmane.org/ To: linux-kernel@vger.kernel.org From: Jan Hubicka Subject: Re: [PATCH] x86: Turn off GCC branch probability heuristics Date: Sun, 12 Apr 2015 21:11:40 +0000 (UTC) Message-ID: References: <20150409175652.GI6464@linux.vnet.ibm.com> <20150409183926.GM6464@linux.vnet.ibm.com> <20150410090051.GA28549@gmail.com> <20150410091252.GA27630@gmail.com> <20150410092152.GA21332@gmail.com> <20150410111427.GA30477@gmail.com> <20150410112748.GB30477@gmail.com> <20150410120846.GA17101@gmail.com> <20150411092021.GA9478@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: sea.gmane.org User-Agent: Loom/3.14 (http://gmane.org/) X-Loom-IP: 24.64.45.84 (Mozilla/5.0 (X11; Linux x86_64; rv:37.0) Gecko/20100101 Firefox/37.0) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, for me as GCC developer, this is definitely an intersting observation. Let me briefly explain what happen here. -fguess-branch-probability does a lot more than just BB reordering. The way GCC works is that it first guesses probability of every branch and then uses the probabilities to estimate a profile (that is frequency at which every BB executed). This profile is then used thorough the optimization queue to make decisions that either bloat the code (such as loop header copying) or that pessimize one code path and improve other code path (such spill code placement). This is based on T Ball, JR Larus: Branch prediction for free and Y Wu, JR Larus: Static branch frequency and program profile analysis. In GCC it was implemented in 2000 http://www.ucw.cz/~hubicka/papers/proj/index.html and the earlier ad-hoc heuristics (that, for example, placed spill code depending on loop depth of the basic blocks) was slowly replaced by profile driven ones. (Most nowdays compilers works this way.) This however also means that the compiler is quite dependent on the profile. -fno-guess-branch-probability will leave all frequencies to be 0 and all edge probabilities 0%. This will make most of these heuristics to shut itself off or possibly behave funny. Disabling all the heuristics except for builtin_expect will likely result in quite funny profiles, too, where frequencies will drop to 50% after every branch. I am very interested in getting -O2 code size down (GCC 5 has quite few improvements in this area thus most motivated by C++ code and LTO). We should try to identify what optimization causes most of bloat and start from these. http://www.ucw.cz/~hubicka/papers/amd64/node4.html has 13 years old data one performance and code size with this flag. Back then branch prediction improved performance by 4.1% on specint at the expense of 5.66% code size cost. Very large portion of the bloat was the basic block reordering. I will try to re-run the benchmarks. In 2003 the largest portion of growth came from basic block reordering (-freorder-blocks) which does some tail code duplication that helped the AMD K7/K8 generation chips it was tested on because of decoder limitations. Basic block reordering pass was implemented in 2000 based on http://dl.acm.org/citation.cfm?id=305178 and while it was improved in meantime (primarily by Google adding special cases and hot/cold partitioning), it is still the same software trace algorithm. So it may be time to rewrite it which should not be too hard. Would it be possible to identify functions that change most and look into these? Honza