From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752615AbbDLXo7 (ORCPT ); Sun, 12 Apr 2015 19:44:59 -0400
Received: from eddie.linux-mips.org ([148.251.95.138]:51006 "EHLO
	cvs.linux-mips.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752381AbbDLXo4 (ORCPT );
	Sun, 12 Apr 2015 19:44:56 -0400
Date: Mon, 13 Apr 2015 00:44:53 +0100 (BST)
From: "Maciej W. Rozycki"
To: Linus Torvalds
cc: Denys Vlasenko , Ingo Molnar , "Paul E. McKenney" , Jason Low ,
	Peter Zijlstra , Davidlohr Bueso , Tim Chen ,
	Aswin Chandramouleeswaran , LKML , Borislav Petkov ,
	Andy Lutomirski , Brian Gerst , "H. Peter Anvin" ,
	Thomas Gleixner , Peter Zijlstra
Subject: Re: [PATCH] x86: Align jump targets to 1 byte boundaries
In-Reply-To:
Message-ID:
References: <20150409175652.GI6464@linux.vnet.ibm.com>
	<20150409183926.GM6464@linux.vnet.ibm.com>
	<20150410090051.GA28549@gmail.com>
	<20150410091252.GA27630@gmail.com>
	<20150410092152.GA21332@gmail.com>
	<20150410111427.GA30477@gmail.com>
	<20150410112748.GB30477@gmail.com>
	<20150410120846.GA17101@gmail.com>
	<5527C700.3030405@redhat.com>
User-Agent: Alpine 2.11 (LFD 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 10 Apr 2015, Linus Torvalds wrote:

> It turns out that gcc's -Os is just horrible nasty crap. It doesn't
> actually make good tradeoffs for code density, because it doesn't make
> any tradeoffs at all. It tries to choose small code, even when it's
> ridiculously bad small code.
>
> For example, a 24-byte static memcpy is best done as three quad-word
> load/store pairs. That's very cheap, and not at all unreasonable.
>
> But what does gcc do? It does a "rep movsl".
>
> Seriously. That's *shit*. It absolutely kills performance on some very
> critical code.
>
> I'm not making that up. Try "-O2" and "-Os" on the appended trivial
> code. Yes, the "rep movsl" is smaller, but it's incredibly expensive,
> particularly if the result is partially used afterwards.
>
> And I'm not a hater of "rep movs" - not at all. I think that "rep
> movsb" is basically a perfect way to tell the CPU "do an optimized
> memcpy with whatever cache situation you have". So I'm a big fan of
> the string instructions, but only when appropriate. And "appropriate"
> here very much includes "I don't know the memory copy size, so I'm
> going to call out to some complex generic code that does all kinds of
> size checks and tricks".
>
> Replacing three pairs of "mov" instructions with a "rep movs" is insane.
>
> (There are a couple of other examples of that kind of issues with
> "-Os". Like using "imul $15" instead of single shift-by-4 and
> subtract. Again, the "imul" is certainly smaller, but can have quite
> bad latency and throughput issues).
>
> So I'm no longer a fan of -Os. It disables too many obviously good
> code optimizations.

 I think the issue is -Os is a binary yes/no option without further
tuning as to how desperate about code size saving GCC is asked to be.
That's what we'd probably have with speed optimisation too if there was
only a single -O GCC option -- equivalent to today's -O3.  However
instead GCC has -O1, -O2, -O3 that turn on more and more possibly insane
optimisations gradually (plus a load of -f options for further fine
tuning).

 So a possible complementary solution for size saving could be keeping
-Os as it is for people's build recipe compatibility, and then have say
-Os1, -Os2, -Os3 enabling more and more insane optimisations, on the
size side for a change.  In that case -Os3 would be equivalent to
today's -Os.  There could be further fine-tune options to control things
like the string moves you mention.

 The thing here is someone would have to implement all of it and I
gather GCC folks have more than enough stuff to do already.  I'm fairly
sure they wouldn't decline a patch though. ;)

  Maciej
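
 For anyone wanting to try the -O2 vs. -Os comparison quoted above: the
trivial code Linus appended is not reproduced in this message, so the
following is only a minimal stand-in sketch, not his test case.  The
file name, the struct and the function names are all made up for
illustration.  What GCC actually emits depends on the GCC version and
the target/tuning flags, so compare the two assembly listings yourself
rather than taking any particular output for granted.

```c
/*
 * size_vs_speed.c (hypothetical name).  Compare the output of:
 *   gcc -O2 -S size_vs_speed.c -o O2.s
 *   gcc -Os -S size_vs_speed.c -o Os.s
 */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 24-byte object: three quad-words. */
struct blob24 {
	uint64_t a, b, c;
};

static_assert(sizeof(struct blob24) == 24, "expected three 8-byte fields");

/* Fixed-size copy: the length is a compile-time constant (24 bytes). */
void copy24(struct blob24 *dst, const struct blob24 *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Multiply by 15, leaving the instruction choice to the compiler. */
uint32_t mul15(uint32_t x)
{
	return x * 15u;
}

/* The same value spelled as shift-by-4 and subtract: 16x - x = 15x. */
uint32_t mul15_shift_sub(uint32_t x)
{
	return (x << 4) - x;
}
```

 Both mul15 variants compute the same result for every 32-bit input
(unsigned arithmetic wraps modulo 2^32), so any difference between the
two listings is purely one of instruction selection.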