All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] x86, pkeys: fix prefetch/pkeys interaction
@ 2016-07-22 18:03 Dave Hansen
  2016-07-22 18:03 ` [PATCH 1/3] x86, tracing: fix x86 exceptions trace header Dave Hansen
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Dave Hansen @ 2016-07-22 18:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, luto, Dave Hansen

The first two patches here are useful in any case, I think.

But, as for the third:  There are no known prefetch errata on
processors that support memory protection keys.  There have not
been any that I can find in any recent generations, either.

But, if there were a future erratum, we would need this.
Otherwise, apps who hit the theoretical erratum and used pkeys
would not be fixed up by the prefetch erratum detection code.

It also occurs to me that *if* there were an erratum on a modern
CPU, we might not know because we have so many workarounds in
place.

So, I'm submitting this, but I don't feel that strongly about it.
It doesn't fix a real problem, but it's also not that much code,
or in any kind of fast path.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] x86, tracing: fix x86 exceptions trace header
  2016-07-22 18:03 [PATCH 0/3] x86, pkeys: fix prefetch/pkeys interaction Dave Hansen
@ 2016-07-22 18:03 ` Dave Hansen
  2016-07-29 15:08   ` Steven Rostedt
  2016-07-22 18:03 ` [PATCH 2/3] x86: add some better documentation for probe_kernel_address() Dave Hansen
  2016-07-22 18:03 ` [PATCH 3/3] x86, pkeys: allow instruction fetches in presence of pkeys Dave Hansen
  2 siblings, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2016-07-22 18:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, luto, Dave Hansen, dave.hansen, rostedt, mingo


From: Dave Hansen <dave.hansen@linux.intel.com>

The various tracing headers pass some variables into the tracing
code itself to indicate things like the name of the tracing
directory where the tracepoints should be located in debugfs.
The general pattern is to #undef them before redefining them.

But, if all instances don't do this, and two different trace
headers get included in the same file, you'll get macro
redefinition compile errors.

Fix up the x86 exception tracing code to properly #undef one of
its macros.

Note that this isn't a problem in practice until the next patch
in this series is applied and adds a #include of the FPU
tracing header.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
---

 b/arch/x86/include/asm/trace/exceptions.h |    1 +
 1 file changed, 1 insertion(+)

diff -puN arch/x86/include/asm/trace/exceptions.h~pkeys-899-fix-exceptions-trace-define arch/x86/include/asm/trace/exceptions.h
--- a/arch/x86/include/asm/trace/exceptions.h~pkeys-899-fix-exceptions-trace-define	2016-07-22 10:52:32.543331471 -0700
+++ b/arch/x86/include/asm/trace/exceptions.h	2016-07-22 10:52:32.546331607 -0700
@@ -45,6 +45,7 @@ DEFINE_PAGE_FAULT_EVENT(page_fault_kerne
 
 #undef TRACE_INCLUDE_PATH
 #define TRACE_INCLUDE_PATH .
+#undef  TRACE_INCLUDE_FILE
 #define TRACE_INCLUDE_FILE exceptions
 #endif /*  _TRACE_PAGE_FAULT_H */
 
_

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/3] x86: add some better documentation for probe_kernel_address()
  2016-07-22 18:03 [PATCH 0/3] x86, pkeys: fix prefetch/pkeys interaction Dave Hansen
  2016-07-22 18:03 ` [PATCH 1/3] x86, tracing: fix x86 exceptions trace header Dave Hansen
@ 2016-07-22 18:03 ` Dave Hansen
  2016-07-22 18:10   ` Andy Lutomirski
  2016-07-22 18:03 ` [PATCH 3/3] x86, pkeys: allow instruction fetches in presence of pkeys Dave Hansen
  2 siblings, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2016-07-22 18:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, luto, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

probe_kernel_address() has an unfortunate name since it is used
to probe kernel *and* userspace addresses.  Add a comment
explaining some of the situation to help the next developer who
might make the silly assumption that it is for probing kernel
addresses.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/include/linux/uaccess.h |    8 ++++++++
 1 file changed, 8 insertions(+)

diff -puN include/linux/uaccess.h~pkeys-902-document-probe_kernel_address include/linux/uaccess.h
--- a/include/linux/uaccess.h~pkeys-902-document-probe_kernel_address	2016-07-22 10:52:32.909347999 -0700
+++ b/include/linux/uaccess.h	2016-07-22 10:52:32.912348134 -0700
@@ -106,6 +106,14 @@ extern long strncpy_from_unsafe(char *ds
  * @addr: address to read from
  * @retval: read into this variable
  *
+ * This is safe to call on both userspace and kernel addresses.
+ * Kernel faults (like vmalloc faults) may be handled but do not
+ * sleep.
+ *
+ * If access to @addr is a userspace address and faults, we will
+ * enter the page fault handler, not *handling* the fault in any
+ * way and returning -EFAULT.
+ *
  * Returns 0 on success, or -EFAULT.
  */
 #define probe_kernel_address(addr, retval)		\
_

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 3/3] x86, pkeys: allow instruction fetches in presence of pkeys
  2016-07-22 18:03 [PATCH 0/3] x86, pkeys: fix prefetch/pkeys interaction Dave Hansen
  2016-07-22 18:03 ` [PATCH 1/3] x86, tracing: fix x86 exceptions trace header Dave Hansen
  2016-07-22 18:03 ` [PATCH 2/3] x86: add some better documentation for probe_kernel_address() Dave Hansen
@ 2016-07-22 18:03 ` Dave Hansen
  2 siblings, 0 replies; 8+ messages in thread
From: Dave Hansen @ 2016-07-22 18:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, luto, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

Thanks to Andy Lutomirski for pointing out the potential issue
here.

Memory protection keys only affect data access.  They do not
affect instruction fetches.  So, an instruction may not be
readable, while it *is* executable.

The fault prefetch checking code directly reads instructions and
might trip over an instruction made unreadable by pkeys.  Turn
off pkeys temporarily so we can fetch the instruction.

===== Long explanation: =====

We have a long history of bugs with prefetch instructions faulting
when they should not.  So, in the slow paths of our page fault
code, we go peeking at the instructions being run when the fault
occurred to see if the fault could have been caused by a prefetch
instruction.

To do the peeking, the kernel looks at the instruction pointer
and fetches the instruction, sometimes right out of userspace.
But, protection keys may get in the way and will make the
kernel's data access fault.  The kernel will assume that the
instruction pointer was bogus and go angrily along on the way to
killing the app instead of _actually_ fetching a prefix
instruction.

Does this matter?  Probably not.  The hardware implementing
protection keys is not even publicly available yet.  But, if it
or some future processor has a prefetch bug, this will help us
properly work around it.

This makes the instruction temporarily readable so that we can do
a data access and peek at its opcode.  This operation is only
visible to the thread doing the fault.

I thought this might be painful to solve, requiring something
akin to a kernel_fpu_begin/end() pair.  After thinking about it,
I managed to convince myself that we do not need that harsh of
a solution and this simple one is OK, mostly because we never
do lazy save/restore of PKRU.

This is slow-ish.  The RDPKRU/WRPKRU/WRPKRU sequence probably
costs us dozens of cycles, plus the extra branches from the
if(OSPKE) checks.  It also does that for each byte of the
instructions that it fetches, which is a bit naughty.  But, this
is a slow path already, so I haven't tried to optimize it at all.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
---

 b/arch/x86/mm/fault.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 2 deletions(-)

diff -puN arch/x86/mm/fault.c~pkeys-900-fix-prefetch arch/x86/mm/fault.c
--- a/arch/x86/mm/fault.c~pkeys-900-fix-prefetch	2016-07-22 10:52:33.274364481 -0700
+++ b/arch/x86/mm/fault.c	2016-07-22 10:52:33.277364616 -0700
@@ -20,6 +20,7 @@
 #include <asm/pgalloc.h>		/* pgd_*(), ...			*/
 #include <asm/kmemcheck.h>		/* kmemcheck_*(), ...		*/
 #include <asm/fixmap.h>			/* VSYSCALL_ADDR		*/
+#include <asm/fpu/internal.h>		/* for use_eager_fpu() checks	*/
 #include <asm/vsyscall.h>		/* emulate_vsyscall		*/
 #include <asm/vm86.h>			/* struct vm86			*/
 #include <asm/mmu_context.h>		/* vma_pkey()			*/
@@ -76,6 +77,52 @@ static nokprobe_inline int kprobes_fault
 }
 
 /*
+ * Memory protection keys only affect data access.  They do not
+ * affect instruction fetches.  So, an instruction may not be
+ * readable, while it *is* executable.  This makes the
+ * instruction temporarily readable so that we can do a data
+ * access and peek at its opcode.
+ */
+static
+int probe_insn_opcode(void *insn_address, unsigned char *ret_opcode)
+{
+	int ret;
+	u32 saved_pkru = read_pkru();
+
+	/*
+	 * Clear out all of the access/write-disable bits in
+	 * PKRU.  This ensures that pkeys will not block access
+	 * to @insn_address.  If no keys are access-disabled
+	 * (saved_pkru==0) avoid the cost of the PKRU writes
+	 * and the continued cost of having taken it out of its
+	 * (XSAVE) init state.
+	 *
+	 * Note also that this only affect access to user
+	 * addresses.  Kernel (supervisor) mappings are not
+	 * affected by this register.
+	 */
+	if (saved_pkru)
+		write_pkru(0);
+	/*
+	 * We normally have to be very careful with FPU registers
+	 * and preempt.  But, PKRU is different.  It is never
+	 * lazily saved/restored, so we don't have to be as
+	 * careful with this as normal FPU state.  Enforce this
+	 * assumption with the WARN_ON().
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_OSPKE))
+		WARN_ON_ONCE(!use_eager_fpu());
+	ret = probe_kernel_address(insn_address, *ret_opcode);
+	/*
+	 * Restore PKRU to what it was.  We a
+	 */
+	if (saved_pkru)
+		write_pkru(saved_pkru);
+
+	return ret;
+}
+
+/*
  * Prefetch quirks:
  *
  * 32-bit mode:
@@ -126,7 +173,7 @@ check_prefetch_opcode(struct pt_regs *re
 		return !instr_lo || (instr_lo>>1) == 1;
 	case 0x00:
 		/* Prefetch instruction is 0x0F0D or 0x0F18 */
-		if (probe_kernel_address(instr, opcode))
+		if (probe_insn_opcode(instr, &opcode))
 			return 0;
 
 		*prefetch = (instr_lo == 0xF) &&
@@ -160,7 +207,7 @@ is_prefetch(struct pt_regs *regs, unsign
 	while (instr < max_instr) {
 		unsigned char opcode;
 
-		if (probe_kernel_address(instr, opcode))
+		if (probe_insn_opcode(instr, &opcode))
 			break;
 
 		instr++;
_

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] x86: add some better documentation for probe_kernel_address()
  2016-07-22 18:03 ` [PATCH 2/3] x86: add some better documentation for probe_kernel_address() Dave Hansen
@ 2016-07-22 18:10   ` Andy Lutomirski
  2016-07-22 18:18     ` Dave Hansen
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2016-07-22 18:10 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, X86 ML, Dave Hansen

On Jul 22, 2016 11:03 AM, "Dave Hansen" <dave@sr71.net> wrote:
>
>
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> probe_kernel_address() has an unfortunate name since it is used
> to probe kernel *and* userspace addresses.  Add a comment
> explaining some of the situation to help the next developer who
> might make the silly assumption that it is for probing kernel
> addresses.

This can't work on architectures like s390 that have separate,
overlapping user and kernel address spaces.  Maybe we should fix x86
to stop abusing it and use get_user instead.  (In which case, your new
function should be called get_user_insn_byte or similar.)

--Andy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] x86: add some better documentation for probe_kernel_address()
  2016-07-22 18:10   ` Andy Lutomirski
@ 2016-07-22 18:18     ` Dave Hansen
  2016-07-22 18:22       ` Andy Lutomirski
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2016-07-22 18:18 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: linux-kernel, X86 ML, Dave Hansen

On 07/22/2016 11:10 AM, Andy Lutomirski wrote:
> On Jul 22, 2016 11:03 AM, "Dave Hansen" <dave@sr71.net> wrote:
>> From: Dave Hansen <dave.hansen@linux.intel.com>
>>
>> probe_kernel_address() has an unfortunate name since it is used
>> to probe kernel *and* userspace addresses.  Add a comment
>> explaining some of the situation to help the next developer who
>> might make the silly assumption that it is for probing kernel
>> addresses.
> 
> This can't work on architectures like s390 that have separate,
> overlapping user and kernel address spaces.  Maybe we should fix x86
> to stop abusing it and use get_user instead.  (In which case, your new
> function should be called get_user_insn_byte or similar.)

Urg.

But can't the x86 use in no_context() be called from a kernel-initiated
fault?  Like a prefetch instruction to a vmalloc() page that we needed
to do a vmalloc fault for?

In either case, it would be awfully nice to have the clarity about
exactly what is being probed.  The other 2 calls to is_prefetch() do
appear to be userspace-only.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] x86: add some better documentation for probe_kernel_address()
  2016-07-22 18:18     ` Dave Hansen
@ 2016-07-22 18:22       ` Andy Lutomirski
  0 siblings, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2016-07-22 18:22 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, X86 ML, Dave Hansen

On Fri, Jul 22, 2016 at 11:18 AM, Dave Hansen <dave@sr71.net> wrote:
> On 07/22/2016 11:10 AM, Andy Lutomirski wrote:
>> On Jul 22, 2016 11:03 AM, "Dave Hansen" <dave@sr71.net> wrote:
>>> From: Dave Hansen <dave.hansen@linux.intel.com>
>>>
>>> probe_kernel_address() has an unfortunate name since it is used
>>> to probe kernel *and* userspace addresses.  Add a comment
>>> explaining some of the situation to help the next developer who
>>> might make the silly assumption that it is for probing kernel
>>> addresses.
>>
>> This can't work on architectures like s390 that have separate,
>> overlapping user and kernel address spaces.  Maybe we should fix x86
>> to stop abusing it and use get_user instead.  (In which case, your new
>> function should be called get_user_insn_byte or similar.)
>
> Urg.
>
> But can't the x86 use in no_context() be called from a kernel-initiated
> fault?  Like a prefetch instruction to a vmalloc() page that we needed
> to do a vmalloc fault for?
>
> In either case, it would be awfully nice to have the clarity about
> exactly what is being probed.  The other 2 calls to is_prefetch() do
> appear to be userspace-only.

Indeed.

Perhaps we should pass a bool is_kernel into is_prefetch().

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] x86, tracing: fix x86 exceptions trace header
  2016-07-22 18:03 ` [PATCH 1/3] x86, tracing: fix x86 exceptions trace header Dave Hansen
@ 2016-07-29 15:08   ` Steven Rostedt
  0 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2016-07-29 15:08 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, x86, luto, dave.hansen, mingo

On Fri, 22 Jul 2016 11:03:11 -0700
Dave Hansen <dave@sr71.net> wrote:

> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> The various tracing headers pass some variables into the tracing
> code itself to indicate things like the name of the tracing
> directory where the tracepoints should be located in debugfs.
> The general pattern is to #undef them before redefining them.
> 
> But, if all instances don't do this, and two different trace
> headers get included in the same file, you'll get macro
> redefinition compile errors.
> 
> Fix up the x86 exception tracing code to properly #undef one of
> its macros.
> 
> Note that this isn't a problem in practice until the next patch
> in this series is applied and adds a #include of the FPU
> tracing header.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> ---
> 
>  b/arch/x86/include/asm/trace/exceptions.h |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff -puN arch/x86/include/asm/trace/exceptions.h~pkeys-899-fix-exceptions-trace-define arch/x86/include/asm/trace/exceptions.h
> --- a/arch/x86/include/asm/trace/exceptions.h~pkeys-899-fix-exceptions-trace-define	2016-07-22 10:52:32.543331471 -0700
> +++ b/arch/x86/include/asm/trace/exceptions.h	2016-07-22 10:52:32.546331607 -0700
> @@ -45,6 +45,7 @@ DEFINE_PAGE_FAULT_EVENT(page_fault_kerne
>  
>  #undef TRACE_INCLUDE_PATH
>  #define TRACE_INCLUDE_PATH .
> +#undef  TRACE_INCLUDE_FILE
>  #define TRACE_INCLUDE_FILE exceptions
>  #endif /*  _TRACE_PAGE_FAULT_H */
>  
> _

Acked-by: Steven Rostedt <rostedt@goodmis.org>

I had to check, but my sample code in
samples/trace_events/trace-events-sample.h does indeed include the
#undef of TRACE_INCLUDE_FILE too.

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-07-29 15:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-22 18:03 [PATCH 0/3] x86, pkeys: fix prefetch/pkeys interaction Dave Hansen
2016-07-22 18:03 ` [PATCH 1/3] x86, tracing: fix x86 exceptions trace header Dave Hansen
2016-07-29 15:08   ` Steven Rostedt
2016-07-22 18:03 ` [PATCH 2/3] x86: add some better documentation for probe_kernel_address() Dave Hansen
2016-07-22 18:10   ` Andy Lutomirski
2016-07-22 18:18     ` Dave Hansen
2016-07-22 18:22       ` Andy Lutomirski
2016-07-22 18:03 ` [PATCH 3/3] x86, pkeys: allow instruction fetches in presence of pkeys Dave Hansen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.