All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Hansen <dave@sr71.net>
To: Vlastimil Babka <vbabka@suse.cz>, linux-kernel@vger.kernel.org
Cc: x86@kernel.org, linux-api@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-mm@kvack.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	luto@kernel.org, mgorman@techsingularity.net,
	dave.hansen@linux.intel.com, arnd@arndb.de
Subject: Re: [PATCH 09/10] x86, pkeys: allow configuration of init_pkru
Date: Tue, 2 Aug 2016 07:37:56 -0700	[thread overview]
Message-ID: <57A0B044.1010405@sr71.net> (raw)
In-Reply-To: <2cd6331c-7fa7-b358-6892-580bef430755@suse.cz>

On 08/02/2016 01:28 AM, Vlastimil Babka wrote:
> On 07/29/2016 06:30 PM, Dave Hansen wrote:
>> From: Dave Hansen <dave.hansen@linux.intel.com>
>> But, having PKRU be 0 (its init value) provides some nonzero
>> amount of optimization potential to the hardware.  It can, for
>> instance, skip writes to the XSAVE buffer when it knows that PKRU
>> is in its init state.
> 
> I'm not very happy with tuning options that need the admin to make
> choice between reliability and performance. Is there no way to to
> optimize similarly for a non-zero init state?

The init state is architecturally defined and the overhead comes from
hardware cost when the register is not in its 'init state'.  There's
nothing I can think of that we can do in software to work around this.

I did try a few things with our XSAVE/XRSTOR code to optimize this since
most tasks will have the same PKRU value, but they didn't pan out and
added more overhead than they removed.

>> The cost of losing this optimization is approximately 100 cycles
>> per context switch for a workload which lightly using XSAVE
>> state (something not using AVX much).  The overhead comes from a
>> combinaation of actually manipulating PKRU and the overhead of
>> pullin in an extra cacheline.
> 
> So the cost is in extra steps in software, not in hardware as you
> mentioned above?

There are two sources of overhead: a RDPKRU/WRPKRU pair of instructions
at fpu__clear() time (mostly called via execve()) and overhead in the
XSAVE and XRSTOR instructions that occurs at context-switch time.

Taking the PKRU state out of the 'init state' makes us read at least one
additional cacheline during XRSTOR, plus some additional work inside the
instruction that the processor has to do to shuffle registers in/out of
memory.  This, I consider hardware overhead.

>> This overhead is not huge, but it's also not something that I
>> think we should unconditionally inflict on everyone.
> 
> Here, everyone means really all processes on system, that never heard of
> PKEs, and pay the cost just because the kernel was configured for it?

Yes, all processes on all systems that have memory protection keys
enabled in hardware.  In a normal workload that's context switching 1000
times a second is about 3/100,000 cycles on a 3GHz processor, which I
haven't been able to measure other than instrumenting the XSAVE/XRSTOR
paths themselves.

I also expect the relative overhead to decrease as more pervasive AVX
use increases the overall overhead of XSAVE. (AVX state is ~1k and PKU's
64b of space pales in comparison).

> But in that case, all PTEs use the key 0 anyway, so the non-zero default
> actually provides no extra reliability/security?

Correct.  It provides no additional security or reliability for
processes not using protection keys.

> Seems suboptimal that
> admins of such system have to recognize such situation themselves and
> change the default?

To be honest, I don't think anyone will notice.  Most folks will run a
kernel with PKU support on the new hardware that contains this feature
from day one and they'll never know about the 0.003% performance penalty
that I *think* this might cause.  Say that the processor with protection
keys is 5% faster than its predecessor (made up number), it will now
appear to be 4.996% faster.

WARNING: multiple messages have this Message-ID (diff)
From: Dave Hansen <dave@sr71.net>
To: Vlastimil Babka <vbabka@suse.cz>, linux-kernel@vger.kernel.org
Cc: x86@kernel.org, linux-api@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-mm@kvack.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	luto@kernel.org, mgorman@techsingularity.net,
	dave.hansen@linux.intel.com, arnd@arndb.de
Subject: Re: [PATCH 09/10] x86, pkeys: allow configuration of init_pkru
Date: Tue, 2 Aug 2016 07:37:56 -0700	[thread overview]
Message-ID: <57A0B044.1010405@sr71.net> (raw)
In-Reply-To: <2cd6331c-7fa7-b358-6892-580bef430755@suse.cz>

On 08/02/2016 01:28 AM, Vlastimil Babka wrote:
> On 07/29/2016 06:30 PM, Dave Hansen wrote:
>> From: Dave Hansen <dave.hansen@linux.intel.com>
>> But, having PKRU be 0 (its init value) provides some nonzero
>> amount of optimization potential to the hardware.  It can, for
>> instance, skip writes to the XSAVE buffer when it knows that PKRU
>> is in its init state.
> 
> I'm not very happy with tuning options that need the admin to make
> choice between reliability and performance. Is there no way to to
> optimize similarly for a non-zero init state?

The init state is architecturally defined and the overhead comes from
hardware cost when the register is not in its 'init state'.  There's
nothing I can think of that we can do in software to work around this.

I did try a few things with our XSAVE/XRSTOR code to optimize this since
most tasks will have the same PKRU value, but they didn't pan out and
added more overhead than they removed.

>> The cost of losing this optimization is approximately 100 cycles
>> per context switch for a workload which lightly using XSAVE
>> state (something not using AVX much).  The overhead comes from a
>> combinaation of actually manipulating PKRU and the overhead of
>> pullin in an extra cacheline.
> 
> So the cost is in extra steps in software, not in hardware as you
> mentioned above?

There are two sources of overhead: a RDPKRU/WRPKRU pair of instructions
at fpu__clear() time (mostly called via execve()) and overhead in the
XSAVE and XRSTOR instructions that occurs at context-switch time.

Taking the PKRU state out of the 'init state' makes us read at least one
additional cacheline during XRSTOR, plus some additional work inside the
instruction that the processor has to do to shuffle registers in/out of
memory.  This, I consider hardware overhead.

>> This overhead is not huge, but it's also not something that I
>> think we should unconditionally inflict on everyone.
> 
> Here, everyone means really all processes on system, that never heard of
> PKEs, and pay the cost just because the kernel was configured for it?

Yes, all processes on all systems that have memory protection keys
enabled in hardware.  In a normal workload that's context switching 1000
times a second is about 3/100,000 cycles on a 3GHz processor, which I
haven't been able to measure other than instrumenting the XSAVE/XRSTOR
paths themselves.

I also expect the relative overhead to decrease as more pervasive AVX
use increases the overall overhead of XSAVE. (AVX state is ~1k and PKU's
64b of space pales in comparison).

> But in that case, all PTEs use the key 0 anyway, so the non-zero default
> actually provides no extra reliability/security?

Correct.  It provides no additional security or reliability for
processes not using protection keys.

> Seems suboptimal that
> admins of such system have to recognize such situation themselves and
> change the default?

To be honest, I don't think anyone will notice.  Most folks will run a
kernel with PKU support on the new hardware that contains this feature
from day one and they'll never know about the 0.003% performance penalty
that I *think* this might cause.  Say that the processor with protection
keys is 5% faster than its predecessor (made up number), it will now
appear to be 4.996% faster.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-08-02 14:39 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-29 16:30 [PATCH 00/10] [v6] System Calls for Memory Protection Keys Dave Hansen
2016-07-29 16:30 ` Dave Hansen
2016-07-29 16:30 ` [PATCH 01/10] x86, pkeys: add fault handling for PF_PK page fault bit Dave Hansen
2016-07-29 16:30   ` Dave Hansen
2016-09-09 11:10   ` [tip:mm/pkeys] x86/pkeys: Add " tip-bot for Dave Hansen
2016-07-29 16:30 ` [PATCH 02/10] mm: implement new pkey_mprotect() system call Dave Hansen
2016-07-29 16:30   ` Dave Hansen
2016-09-09 11:11   ` [tip:mm/pkeys] mm: Implement " tip-bot for Dave Hansen
2016-07-29 16:30 ` [PATCH 03/10] x86, pkeys: make mprotect_key() mask off additional vm_flags Dave Hansen
2016-07-29 16:30   ` Dave Hansen
2016-09-09 11:11   ` [tip:mm/pkeys] x86/pkeys: Make " tip-bot for Dave Hansen
2016-07-29 16:30 ` [PATCH 04/10] x86, pkeys: allocation/free syscalls Dave Hansen
2016-07-29 16:30   ` Dave Hansen
2016-09-09 11:12   ` [tip:mm/pkeys] x86/pkeys: Allocation/free syscalls tip-bot for Dave Hansen
2016-07-29 16:30 ` [PATCH 05/10] x86: wire up protection keys system calls Dave Hansen
2016-07-29 16:30   ` Dave Hansen
2016-07-29 16:30   ` Dave Hansen
2016-09-09 11:12   ` [tip:mm/pkeys] x86: Wire " tip-bot for Dave Hansen
2016-07-29 16:30 ` [PATCH 06/10] generic syscalls: wire up memory protection keys syscalls Dave Hansen
2016-07-29 16:30   ` Dave Hansen
2016-09-09 11:12   ` [tip:mm/pkeys] generic syscalls: Wire " tip-bot for Dave Hansen
2016-07-29 16:30 ` [PATCH 07/10] pkeys: add details of system call use to Documentation/ Dave Hansen
2016-07-29 16:30   ` Dave Hansen
2016-09-09 11:13   ` [tip:mm/pkeys] pkeys: Add " tip-bot for Dave Hansen
2016-07-29 16:30 ` [PATCH 08/10] x86, pkeys: default to a restrictive init PKRU Dave Hansen
2016-07-29 16:30   ` Dave Hansen
2016-07-29 17:29   ` Andy Lutomirski
2016-07-29 17:29     ` Andy Lutomirski
2016-07-29 17:50     ` Dave Hansen
2016-07-29 17:50       ` Dave Hansen
2016-07-29 19:44       ` Andy Lutomirski
2016-07-29 19:44         ` Andy Lutomirski
2016-08-01 14:42   ` Vlastimil Babka
2016-08-01 14:42     ` Vlastimil Babka
2016-08-01 14:58     ` Dave Hansen
2016-08-01 14:58       ` Dave Hansen
2016-08-02  8:20       ` Vlastimil Babka
2016-08-02  8:20         ` Vlastimil Babka
2016-09-09 11:13   ` [tip:mm/pkeys] x86/pkeys: Default " tip-bot for Dave Hansen
2016-07-29 16:30 ` [PATCH 09/10] x86, pkeys: allow configuration of init_pkru Dave Hansen
2016-07-29 16:30   ` Dave Hansen
2016-08-02  8:28   ` Vlastimil Babka
2016-08-02  8:28     ` Vlastimil Babka
2016-08-02 14:37     ` Dave Hansen [this message]
2016-08-02 14:37       ` Dave Hansen
2016-09-09 11:14   ` [tip:mm/pkeys] x86/pkeys: Allow " tip-bot for Dave Hansen
2016-07-29 16:30 ` [PATCH 10/10] x86, pkeys: add self-tests Dave Hansen
2016-07-29 16:30   ` Dave Hansen
2016-09-09 11:14   ` [tip:mm/pkeys] x86/pkeys: Add self-tests tip-bot for Dave Hansen
2016-08-08 23:18 [PATCH 00/10] [v6] System Calls for Memory Protection Keys Dave Hansen
2016-08-08 23:18 ` [PATCH 09/10] x86, pkeys: allow configuration of init_pkru Dave Hansen
2016-08-08 23:18   ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57A0B044.1010405@sr71.net \
    --to=dave@sr71.net \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.