From: Linus Torvalds
Date: Tue, 22 Mar 2011 09:59:59 -0700
Subject: Re: [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
To: Ingo Molnar
Cc: Pekka Enberg, Jesper Juhl, linux-kernel@vger.kernel.org,
    Andrew Morton, "Paul E. McKenney", Daniel Lezcano, Eric Paris,
    Roman Zippel, linux-kbuild@vger.kernel.org, Steven Rostedt
In-Reply-To: <20110322102741.GA4448@elte.hu>
References: <20110322102741.GA4448@elte.hu>

On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar wrote:
>
> If that situation has changed - if GCC has regressed in this area - then
> a commit changing the default IMHO gains a lot of credibility if it is
> backed by careful measurements using perf stat --repeat or similar tools.

Also, please don't back up any numbers for the "-O2 is faster than -Os"
case with some benchmark that is hot in the caches.

The thing is, many optimizations that make the code larger look really
good if there are no cache misses and the code runs a million times in a
tight loop.

But kernel code in particular tends not to be like that. Yes, there are
cases where we spend 75% of the time in the kernel (my own personal
favorite is "git diff"), with user space basically looping around just one
single operation. But that is _really_ quite rare in real life. Most of
the time, user space will blow the kernel caches out of the water, and the
kernel loops will be on the order of a few entries (e.g. a "loop" may be
the loop around a pathname lookup, iterating over three path components).
Not millions.

The rule of thumb should be simple: 10% larger code likely means 10% more
I$ misses. Does the larger -O2 code make up for that?

Now, the downside of -Os has always been that it's not all that widely
used, so we've hit compiler bugs several times. That's been almost enough
to make me think it's not worth it. But currently I don't think we have
any known issues, and probably exactly _because_ we use -Os, gcc hasn't
had that many regressions with it. It was much more painful when we first
started trying to use -Os.

(That said, gcc -Os isn't all that wonderful. It sometimes generates
really crappy code just because it's smaller, e.g. using a multiply
instruction in a critical code path just because the equivalent shifts and
adds would be larger. And that can be _so_ much slower that it really
hurts. So we might be better off with a model where we can say "this is
important, core kernel code that everybody uses, do -O2 for this", and
just compile _most_ of the kernel with -Os.)

Linus
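
As a rough illustration of the "compile most of the kernel with -Os, but
force -O2 on a few hot core files" model described above: kbuild already
supports per-object compiler flags (CFLAGS_<file>.o), and gcc honours the
last -O option it is given, so a later -O2 overrides the tree-wide -Os.
This is only a hypothetical sketch; the object names below are purely
illustrative, not a proposal for which files to pick.

    # Hypothetical kbuild fragment (illustrative file names).
    # The tree-wide flags stay at -Os when CONFIG_CC_OPTIMIZE_FOR_SIZE=y;
    # a handful of hot core objects get -O2 appended on top.
    obj-y := sched.o mutex.o workqueue.o

    ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
    # Per-object flags are added after the global KBUILD_CFLAGS, so the
    # later -O2 wins over the earlier -Os for these objects only.
    CFLAGS_sched.o += -O2
    CFLAGS_mutex.o += -O2
    endif

If a whole directory of core code were to be treated this way, the same
idea could presumably be expressed with kbuild's per-directory ccflags-y
instead of listing individual objects.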