From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756054AbbFPAYs (ORCPT ); Mon, 15 Jun 2015 20:24:48 -0400
Received: from mx1.redhat.com ([209.132.183.28]:43809 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750996AbbFPAYk (ORCPT ); Mon, 15 Jun 2015 20:24:40 -0400
Message-ID: <557F6CC3.7070709@redhat.com>
Date: Tue, 16 Jun 2015 02:24:35 +0200
From: Denys Vlasenko
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: Ingo Molnar
CC: Linus Torvalds, Steven Rostedt, Borislav Petkov, "H. Peter Anvin",
	Andy Lutomirski, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, x86@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/5] x86/asm/entry/32: Replace RESTORE_RSI_RDI[_RDX] with open-coded 32-bit reads
References: <1433876051-26604-1-git-send-email-dvlasenk@redhat.com>
	<1433876051-26604-4-git-send-email-dvlasenk@redhat.com>
	<20150614084059.GA24562@gmail.com> <557D9BEE.8010902@redhat.com>
	<20150615202008.GA12450@gmail.com>
In-Reply-To: <20150615202008.GA12450@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/15/2015 10:20 PM, Ingo Molnar wrote:
>> Actually, ecx and r11 need to be loaded first. They are not so much "restored"
>> as "prepared for SYSRET insn". Every cycle lost in loading these delays SYSRET.
>> [...]
>
> So in the typical case they will still be cached, and so their max latency should
> be around 3 cycles.

If the syscall flushes caches (say, a large read), or sleeps and the CPU
schedules away, then pt_regs->ip and pt_regs->flags are evicted and need
to be reloaded.

> In fact because they are memory loads, they don't really have dependencies,
> they should be available to SYSRET almost immediately,

They depend on the memory data.

> i.e. within a cycle - and
> there's no reason to believe why these loads wouldn't pipeline properly and
> parallelize with the many other things SYSRET has to do to organize a return to
> user-space, before it can actually use the target RIP and RFLAGS.

This does not sound right. If it takes, say, 20 cycles to pull the data
from, e.g., the L3 cache into ECX, then SYSRET can't possibly complete
in fewer than 20 cycles.
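
To make the dependency argument concrete, here is a toy model of the lower bound, not a measurement and not a description of how any real CPU implements SYSRET. The function names, the `internal_work` parameter, and all cycle counts are made up for illustration. The only assumption is that SYSRET cannot complete before both of its inputs (ECX and R11) have arrived, while whatever work does not need them may overlap with the loads.

```c
/* Toy lower-bound model. All names and numbers here are hypothetical. */

static unsigned max_u(unsigned a, unsigned b)
{
	return a > b ? a : b;
}

/*
 * Earliest cycle at which a hypothetical SYSRET could complete.
 *
 * ecx_ready, r11_ready: cycle at which each load delivers its data
 *                       (about 3 for an L1 hit, about 20 for an L3 hit
 *                       in the examples discussed in this thread).
 * internal_work:        cycles of SYSRET work that do not need ECX/R11
 *                       and can therefore overlap with the loads
 *                       (an assumed quantity, not a documented one).
 */
static unsigned sysret_earliest_done(unsigned ecx_ready, unsigned r11_ready,
				     unsigned internal_work)
{
	/* SYSRET needs both operands, so the slower load decides. */
	unsigned operands_ready = max_u(ecx_ready, r11_ready);

	/* Overlap hides the loads only while they are the shorter path. */
	return max_u(operands_ready, internal_work);
}
```

With an assumed 10 cycles of overlappable work, `sysret_earliest_done(3, 3, 10)` gives 10, so cached loads are fully hidden, which matches the typical case described in the quoted text. `sysret_earliest_done(20, 20, 10)` gives 20: once the loads miss to L3, their latency is the bound, no matter how well everything else pipelines.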