From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756217AbbFOUUU (ORCPT ); Mon, 15 Jun 2015 16:20:20 -0400
Received: from mail-wg0-f48.google.com ([74.125.82.48]:35821 "EHLO
	mail-wg0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756107AbbFOUUN (ORCPT ); Mon, 15 Jun 2015 16:20:13 -0400
Date: Mon, 15 Jun 2015 22:20:08 +0200
From: Ingo Molnar
To: Denys Vlasenko
Cc: Linus Torvalds , Steven Rostedt , Borislav Petkov , "H. Peter Anvin" ,
	Andy Lutomirski , Oleg Nesterov , Frederic Weisbecker ,
	Alexei Starovoitov , Will Drewry , Kees Cook , x86@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/5] x86/asm/entry/32: Replace RESTORE_RSI_RDI[_RDX] with
	open-coded 32-bit reads
Message-ID: <20150615202008.GA12450@gmail.com>
References: <1433876051-26604-1-git-send-email-dvlasenk@redhat.com>
	<1433876051-26604-4-git-send-email-dvlasenk@redhat.com>
	<20150614084059.GA24562@gmail.com>
	<557D9BEE.8010902@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <557D9BEE.8010902@redhat.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org


* Denys Vlasenko wrote:

> On 06/14/2015 10:40 AM, Ingo Molnar wrote:
> >
> > * Denys Vlasenko wrote:
> >
> >> +8b 74 24 68		mov	0x68(%rsp),%esi
> >> +8b 7c 24 70		mov	0x70(%rsp),%edi
> >> +8b 54 24 60		mov	0x60(%rsp),%edx
> >
> > Btw., could you (in another patch) order the restoration properly, by pt_regs
> > memory order, where possible?
>
> Will do.
>
> > So this:
> >
> >> +	movl	RSI(%rsp), %esi
> >> +	movl	RDI(%rsp), %edi
> >> +	movl	RDX(%rsp), %edx
> >>	movl	RIP(%rsp), %ecx
> >>	movl	EFLAGS(%rsp), %r11d
> >
> > would become:
> >
> >	movl	RDX(%rsp), %edx
> >	movl	RSI(%rsp), %esi
> >	movl	RDI(%rsp), %edi
> >	movl	RIP(%rsp), %ecx
> >	movl	EFLAGS(%rsp), %r11d
> >
> > ... or so.
>
> Actually, ecx and r11 need to be loaded first. They are not so much "restored"
> as "prepared for SYSRET insn". Every cycle lost in loading these delays SYSRET.
> [...]

So in the typical case they will still be cached, and so their max latency should
be around 3 cycles. In fact, because they are memory loads, they don't really have
dependencies, so they should be available to SYSRET almost immediately, i.e.
within a cycle - and there's no reason to believe that these loads wouldn't
pipeline properly and parallelize with the many other things SYSRET has to do to
organize a return to user-space, before it can actually use the target RIP and
RFLAGS.

So I strongly doubt that the placement of the RCX and R11 loads before the SYSRET
matters to performance.

In any case this should be testable by looking at syscall performance and
reordering the instructions.

Thanks,

	Ingo
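
The test suggested above (compare syscall performance before and after
reordering the loads) could be approximated from user space with a small
micro-benchmark. The following is only a minimal sketch, not something posted in
this thread: the helper names now_ns() and syscall_cost_ns() are hypothetical,
getppid is chosen merely because it is a cheap system call, and Linux with glibc
is assumed. To exercise the compat entry path that the patch touches, it would
presumably have to be built as a 32-bit binary (for example with -m32); that is
an assumption too.

```c
/* Sketch: mean round-trip cost of a trivial system call, in nanoseconds.
 * Assumes Linux with glibc; helper names are hypothetical. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Read the monotonic clock and return it as a nanosecond count. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Issue `iters` getppid system calls and return the mean cost of one call.
 * Returns 0.0 when iters is 0, to avoid dividing by zero. */
static double syscall_cost_ns(unsigned long iters)
{
	uint64_t start, end;
	unsigned long i;

	if (iters == 0)
		return 0.0;

	start = now_ns();
	for (i = 0; i < iters; i++)
		syscall(SYS_getppid);	/* enter and leave the kernel once */
	end = now_ns();

	return (double)(end - start) / (double)iters;
}
```

Calling syscall_cost_ns() with a large iteration count (say, a few million) on
kernels built with each instruction order, repeating the run several times and
pinning the task to one CPU, would give numbers to compare. Any difference of the
size discussed here is a handful of cycles at most, so run-to-run noise needs to
be checked before drawing conclusions.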