Subject: Re: [tip:x86/asm] x86/defconfig: Turn on CONFIG_CC_OPTIMIZE_FOR_SIZE=y in the 64-bit defconfig
From: "H. Peter Anvin"
Date: Sat, 26 Jan 2013 13:08:02 -0800
To: Linus Torvalds
Cc: Borislav Petkov, Ingo Molnar, Linux Kernel Mailing List, Arjan van de Ven, Jan Beulich, ling.ml@alipay.com, Steven Rostedt, Andrew Morton, Thomas Gleixner, linux-tip-commits@vger.kernel.org

The fast rep movsb was introduced on Ivy Bridge, IIRC.

Linus Torvalds wrote:
>On Sat, Jan 26, 2013 at 7:18 AM, H. Peter Anvin wrote:
>> On the CPUs Ling is testing on the downsides of -Os probably matter
>> less, in particular since rep movsb works well.
>>
>> It is questionable as a generic default, though.
>
>So being the person who really pushed for -Os to begin with (I think
>I$ and instruction decode bandwidth is one of the most fundamental
>limits to CPU performance), I wouldn't mind it if we reintroduced it.
>
>HOWEVER.
>
>It wasn't just "rep movs". The thing that killed -Os for me was that
>it makes it impossible to try to optimize hot code, because -Os seems
>to throw out branch prediction information.
>So when you use "likely()" etc to try to teach the compiler to lay out
>code a certain way so that code that never really gets executed isn't
>even brought into the I$, -Os then screws it up completely.
>
>Of course, maybe newer versions of gcc might not suck so horribly with
>-Os, I haven't actually tried in a while.
>
>[ Just tested. Still does it ]
>
>Also, I doubt Ling was testing a SB CPU. Because "rep movb" still
>sucks pretty bad on SB. What core *is* Ling testing? Haswell?
>
>Ugh. We could make it depend on the optimization target. I'd also wish
>there was some way to just tune gcc -Os to be closer to reasonable. Or
>make -O2 not do some of the excessive crap it does (it aligns code
>*much* too much, for example - who cares if you can do it with a
>single instruction, if that instruction is so long that it uses up
>half your decode bandwidth?)
>
>The problem, of course, is that most -O2 code generation is done
>assuming hot loops that don't show much if any I$ issues. And the -Os
>thing is done *purely* for size, not taking any performance into
>account at all. There's no balanced middle ground, which is what _we_
>would want.
>
>                Linus

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.
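For readers unfamiliar with the hints Linus mentions: in the kernel, likely() and unlikely() are defined in include/linux/compiler.h as thin wrappers around GCC's __builtin_expect() (when branch profiling is not configured). The minimal standalone sketch below redefines them the same way and uses them in a hypothetical helper, sum_checked() (not a kernel function), whose error path is marked unlikely so that GCC at -O2 may lay it out away from the hot loop. Per the thread, -Os tended to disregard such hints for block layout; compiling this file with `gcc -O2 -S` and again with `gcc -Os -S` and comparing the assembly is one way to check how a given GCC version behaves.

```c
#include <stddef.h>

/* Same shape as the kernel's definitions in include/linux/compiler.h
 * (branch profiling disabled); redefined here so this file stands alone. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical helper, for illustration only: sums n ints into *out.
 * The argument check is marked unlikely(), which tells GCC the error
 * return is cold; at -O2 GCC may place it out of the hot code path.
 * Returns 0 on success, -1 on bad arguments.
 */
int sum_checked(const int *buf, int n, long *out)
{
    long total = 0;

    if (unlikely(buf == NULL || out == NULL || n < 0))
        return -1;  /* cold path: expected to run rarely, if ever */

    for (int i = 0; i < n; i++)  /* hot loop */
        total += buf[i];

    *out = total;
    return 0;
}
```

The hint only affects code layout and static branch prediction, never the result: sum_checked() returns the same values whichever way the branch is annotated.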