From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751247AbeAELZP (ORCPT + 1 other);
	Fri, 5 Jan 2018 06:25:15 -0500
Received: from mail-it0-f41.google.com ([209.85.214.41]:40191 "EHLO
	mail-it0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750858AbeAELZO (ORCPT );
	Fri, 5 Jan 2018 06:25:14 -0500
X-Google-Smtp-Source: ACJfBourcORXR6NzQi4GAQcXqKgxWTZmuLFCiIFdh+1bwpFuphwdKNZ6kYQDyNYiNiAD0ceJ+z3S6A==
Date: Fri, 5 Jan 2018 03:25:09 -0800
From: Paul Turner
To: David Woodhouse
Cc: Alexei Starovoitov, Linus Torvalds, Andi Kleen, LKML,
	Greg Kroah-Hartman, Tim Chen, Dave Hansen, Thomas Gleixner,
	Kees Cook, Rik van Riel, Peter Zijlstra, Andy Lutomirski,
	Jiri Kosina, One Thousand Gnomes
Subject: Re: [PATCH v3 01/13] x86/retpoline: Add initial retpoline support
Message-ID: <20180105112509.GD253582@google.com>
References: <1515058213.12987.89.camel@amazon.co.uk>
	<20180104143710.8961-1-dwmw@amazon.co.uk>
	<20180104181744.komdplek7nfdvlsw@ast-mbp>
	<20180104183559.wlqoxmp7rf4d44ku@ast-mbp>
	<1515094078.29312.17.camel@infradead.org>
	<20180105102824.GA247671@google.com>
	<1515149738.29312.104.camel@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1515149738.29312.104.camel@infradead.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On Fri, Jan 05, 2018 at 10:55:38AM +0000, David Woodhouse wrote:
> On Fri, 2018-01-05 at 02:28 -0800, Paul Turner wrote:
> > On Thu, Jan 04, 2018 at 07:27:58PM +0000, David Woodhouse wrote:
> > > On Thu, 2018-01-04 at 10:36 -0800, Alexei Starovoitov wrote:
> > > >
> > > > Pretty much.
> > > > Paul's writeup: https://support.google.com/faqs/answer/7625886
> > > > tldr: jmp *%r11 gets converted to:
> > > > call set_up_target;
> > > > capture_spec:
> > > >   pause;
> > > >   jmp capture_spec;
> > > > set_up_target:
> > > >   mov %r11, (%rsp);
> > > >   ret;
> > > > where capture_spec part will be looping speculatively.
> > >
> > > That is almost identical to what's in my latest patch set, except that
> > > the capture_spec loop has 'lfence' instead of 'pause'.
> >
> > When choosing this sequence I benchmarked several alternatives here, including
> > (nothing, nops, fences, and other serializing instructions such as cpuid).
> >
> > The "pause; jmp" sequence proved minutely faster than "lfence;jmp" which is why
> > it was chosen.
> >
> >   "pause; jmp"   33.231 cycles/call   9.517 ns/call
> >   "lfence; jmp"  33.354 cycles/call   9.552 ns/call
> >
> > (Timings are for a complete retpolined indirect branch.)
>
> Yeah, I studiously ignored you here and went with only what Intel had
> *assured* me was correct and put into the GCC patches, rather than
> chasing those 35 picoseconds ;)

It's also notable here that while the difference is small in terms of absolute
values, it's likely due to reduced variation:

I would expect:
- pause to be extremely consistent in its timings
- pause and lfence to be close on their average timings, particularly in a
  micro-benchmark.

Which suggests that the difference may be larger in the occasional cases that
you are getting "unlucky" and seeing some other uarch interaction in the lfence
path.

>
> The GCC patch set already had about four different variants over time,
> with associated "oh shit, that one doesn't actually work; try this".
> What we have in my patch set is precisely what GCC emits at the moment.
>
> I'm all for optimising it further, but maybe not this week.
>
> Other than that, is there any other development from your side that I
> haven't captured in the latest (v4) series?
> http://git.infradead.org/users/dwmw2/linux-retpoline.git/
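
The benchmark figures quoted in this thread can be sanity-checked with a little arithmetic. The Python sketch below recomputes the gap between the "pause; jmp" and "lfence; jmp" rows and the clock rate each row implies. The function names (`deltas`, `implied_ghz`) are hypothetical, and the implied clock rate rests on an assumption the thread does not state: that the cycles/call and ns/call columns were measured over the same runs.

```python
# Figures copied from the benchmark table quoted in the thread.
# Each entry is (cycles per call, nanoseconds per call).
TIMINGS = {
    "pause; jmp": (33.231, 9.517),
    "lfence; jmp": (33.354, 9.552),
}


def deltas(fast, slow):
    """Return (cycle gap, time gap in picoseconds, relative time gap in percent)."""
    fast_cycles, fast_ns = TIMINGS[fast]
    slow_cycles, slow_ns = TIMINGS[slow]
    delta_cycles = round(slow_cycles - fast_cycles, 3)
    delta_ps = round((slow_ns - fast_ns) * 1000.0, 3)  # 1 ns = 1000 ps
    relative_pct = round(100.0 * (slow_ns - fast_ns) / fast_ns, 2)
    return delta_cycles, delta_ps, relative_pct


def implied_ghz(variant):
    """Clock rate implied by one row.

    Assumption (not stated in the thread): both columns time the same runs,
    so cycles divided by nanoseconds gives the clock rate in GHz.
    """
    cycles, ns = TIMINGS[variant]
    return cycles / ns


if __name__ == "__main__":
    d_cycles, d_ps, d_pct = deltas("pause; jmp", "lfence; jmp")
    print(f"gap: {d_cycles} cycles/call, {d_ps} ps/call ({d_pct}% of the pause timing)")
    for name in TIMINGS:
        print(f"{name!r}: implied clock of about {implied_ghz(name):.3f} GHz")
```

The time gap comes out at 35 ps per call, which matches the "35 picoseconds" remark, and both rows imply roughly the same clock rate, so the two columns are at least consistent with each other. These are averages only: the point about reduced variation cannot be checked from the table, since that would need per-sample timings that are not in the thread.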