Subject: Re: Rethinking sigcontext's xfeatures slightly for PKRU's benefit?
From: Dave Hansen
To: Andy Lutomirski
Cc: Linus Torvalds, "H. Peter Anvin", Oleg Nesterov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Brian Gerst, "linux-kernel@vger.kernel.org", Christoph Hellwig
Date: Thu, 30 Jun 2016 14:25:00 -0700
Message-ID: <57758E2C.6010202@linux.intel.com>

On 06/30/2016 10:36 AM, Andy Lutomirski wrote:
>>> We make baseline_pkru a process-wide baseline and store it in
>>> mm->context. That way, no matter which thread gets interrupted for a
>>> signal, they see consistent values. We only write to it when an app
>>> _specifically_ asks for it to be updated with a special flag to
>>> sys_pkey_set().
>>>
>>> When an app uses the execute-only support, we implicitly set the
>>> read-disable bit in baseline_pkru for the execute-only pkey.
...
> Looking at your git tree, which I assume is a reasonable approximation
> of your current patches, this seems to be unimplemented.
> I, at least, would be nervous about using PKRU for protection of
> critical data if signal handlers are unconditionally exempt.

I actually went ahead and implemented this using an extra 'flag' for
pkey_get/set(). I just left it out of this stage since I'm having enough
problems getting it merged with the existing set of features. :) I'm
confident we can add this later with the flags we can pass to pkey_get()
and pkey_set().

> Also, the lazily allocated no-read key for execute-only is done in the
> name of performance, but it results in odd semantics. How much of a
> performance win is preserving the init optimization of PKRU in
> practice? (I.e. how much faster are XSAVE and XRSTOR?) I can't test
> because even my Skylake laptop doesn't have PKRU.

This is admittedly not the most realistic benchmark: I ran Ingo's FPU
"measure.c" code on XSAVES/XRSTORS, which runs things in pretty tight
loops where everything is cache-hot.

The XSAVE instructions are monsters and I'm not super-confident in my
measurements, but I'm seeing a difference in the neighborhood of 20-30
cycles for XSAVES/XRSTORS with PKRU in play vs. not.