Linux-Doc Archive on lore.kernel.org
 help / color / Atom feed
From: ira.weiny@intel.com
To: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Ira Weiny <ira.weiny@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, Dan Williams <dan.j.williams@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Fenghua Yu <fenghua.yu@intel.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: [PATCH RFC V2 17/17] x86/entry: Preserve PKRS MSR across exceptions
Date: Fri, 17 Jul 2020 00:20:56 -0700
Message-ID: <20200717072056.73134-18-ira.weiny@intel.com> (raw)
In-Reply-To: <20200717072056.73134-1-ira.weiny@intel.com>

From: Ira Weiny <ira.weiny@intel.com>

The PKRS MSR is not managed by XSAVE.  It is already preserved through a
context switch but this support leaves exception handling code open to
memory accesses which the interrupted process has allowed.

Close this hole by preserve the current task's PKRS MSR, reset the PKRS
MSR value on exception entry, and then restore the state on exception
exit.

Notice that, in addition to the MSR value itself, the saved task state
must also include the device memory protection reference count which is
maintained to keep kmap re-entrant.  The following shows how this works:

...
	// ref == 0
	dev_access_enable()  // ref += 1 ==> disable protection
		irq()
			// enable protection
			// ref = 0
			_handler()
				dev_access_enable()   // ref += 1 ==> disable protection
				dev_access_disable()  // ref -= 1 ==> enable protection
			// WARN_ON(ref != 0)
			// disable protection
	do_pmem_thing()  // all good here
	dev_access_disable() // ref -= 1 ==> 0 ==> enable protection
...

Nested exceptions should also operate the same way with each exception
storing the previous reference count all the way down.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
RFC NOTES:

First I'm not sure if adding this state to idtentry_state and having
that state copied is the right way to go.  It seems like we should start
passing this by reference instead of value.  But for now this works as
an RFC.  Comments?

Second, I'm not 100% happy with having to save the reference count in
the exception handler.  It seems like a very ugly layering violation but
I don't see a way around it at the moment.

Third, this patch has gone through a couple of revisions as I've had
crashes which just don't make sense to me.  One particular issue I've
had is taking a MCE during memcpy_mcsafe causing my WARN_ON() to fire.
The code path was a pmem copy and the ref count should have been
elevated due to dev_access_enable() but why was
idtentry_enter()->idt_save_pkrs() not called I don't know.

Finally, it looks like the entry/exit code is being refactored into
common code.  So likely this is best handled somewhat there.  Because
this can be predicated on CONFIG_ARCH_HAS_SUPERVISOR_PKEYS and handled
in a generic fashion.  But that is a ways off I think.

This patch depends on:
	https://lore.kernel.org/lkml/159411021855.4006.13113751062324360868.tip-bot2@tip-bot2/
	Which has yet to land upstream.  I just pulled it into my test
	branch which is based on 5.8-rc5 to get the exception state
	tracking.  Hopefully it is self contained enough I'm not causing
	other issues with my tests.
---
 arch/x86/entry/common.c               | 78 ++++++++++++++++++++++++++-
 arch/x86/include/asm/idtentry.h       |  2 +
 arch/x86/include/asm/pkeys_internal.h |  3 +-
 arch/x86/kernel/process.c             |  1 -
 arch/x86/mm/pkeys.c                   |  2 +-
 5 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 0521546022cb..012bf7ecca0d 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -41,6 +41,7 @@
 #include <asm/io_bitmap.h>
 #include <asm/syscall.h>
 #include <asm/irq_stack.h>
+#include <asm/pkeys_internal.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
@@ -558,6 +559,72 @@ SYSCALL_DEFINE0(ni_syscall)
 	return -ENOSYS;
 }
 
+#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+/*
+ * PKRS is a per-logical-processor MSR which overlays additional protection for
+ * pages which have been mapped with a protection key.
+ *
+ * The register is not maintained with XSAVE so we have to maintain the MSR
+ * value in software during context switch and exception handling.
+ *
+ * Context switches save the MSR in the task struct thus taking that value to
+ * other processors if necessary.
+ *
+ * To protect against exceptions having access to this memory we save the
+ * current running value and set the default PKRS value for the duration of the
+ * exception.  Thus preventing exception handlers from having the access of the
+ * interrupted task.
+ *
+ * Furthermore, Zone Device Access Protection maintains access in a re-entrant
+ * manner through a reference count which also needs to be maintained should
+ * exception handlers use those interfaces for memory access.  Here we start
+ * off the exception handler ref count to 0 and ensure it is 0 when the
+ * exception is done.  Then restore it for the interrupted task.
+ */
+
+static void noinstr idt_save_pkrs(idtentry_state_t state)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
+	/*
+	 * Save the ref count of the current running process and set it to 0
+	 * for any irq users to properly track re-entrance
+	 */
+	state.pkrs_ref = current->dev_page_access_ref;
+	current->dev_page_access_ref = 0;
+#endif
+
+	state.pkrs = this_cpu_read(pkrs_cache);
+	if (state.pkrs != INIT_PKRS_VALUE)
+		write_pkrs(INIT_PKRS_VALUE);
+}
+
+static void noinstr idt_restore_pkrs(idtentry_state_t state)
+{
+	u32 pkrs;
+
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	pkrs = this_cpu_read(pkrs_cache);
+	if (state.pkrs != pkrs)
+		write_pkrs(state.pkrs);
+
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
+	WARN_ON_ONCE(current->dev_page_access_ref != 0);
+	/* Restore the interrupted process reference */
+	current->dev_page_access_ref = state.pkrs_ref;
+#endif
+}
+#else
+/* Define as macros to prevent conflict of inline and noinstr */
+#define idt_save_pkrs(state)
+#define idt_restore_pkrs(state)
+#endif
+
+
 /**
  * idtentry_enter - Handle state tracking on ordinary idtentries
  * @regs:	Pointer to pt_regs of interrupted context
@@ -604,6 +671,8 @@ idtentry_state_t noinstr idtentry_enter(struct pt_regs *regs)
 		return ret;
 	}
 
+	idt_save_pkrs(ret);
+
 	/*
 	 * If this entry hit the idle task invoke rcu_irq_enter() whether
 	 * RCU is watching or not.
@@ -694,7 +763,12 @@ void noinstr idtentry_exit(struct pt_regs *regs, idtentry_state_t state)
 	/* Check whether this returns to user mode */
 	if (user_mode(regs)) {
 		prepare_exit_to_usermode(regs);
-	} else if (regs->flags & X86_EFLAGS_IF) {
+		return;
+	}
+
+	idt_restore_pkrs(state);
+
+	if (regs->flags & X86_EFLAGS_IF) {
 		/*
 		 * If RCU was not watching on entry this needs to be done
 		 * carefully and needs the same ordering of lockdep/tracing
@@ -819,6 +893,8 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
 
 	inhcall = get_and_clear_inhcall();
 	if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
+		/* Normally called by idtentry_exit, we must restore pkrs here */
+		idt_restore_pkrs(state);
 		instrumentation_begin();
 		idtentry_exit_cond_resched(regs, true);
 		instrumentation_end();
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 7227225cf45d..92d5e43cda7f 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -14,6 +14,8 @@ void idtentry_enter_user(struct pt_regs *regs);
 void idtentry_exit_user(struct pt_regs *regs);
 
 typedef struct idtentry_state {
+	unsigned int pkrs_ref;
+	u32 pkrs;
 	bool exit_rcu;
 } idtentry_state_t;
 
diff --git a/arch/x86/include/asm/pkeys_internal.h b/arch/x86/include/asm/pkeys_internal.h
index e34f380c66d1..60e7b55dd141 100644
--- a/arch/x86/include/asm/pkeys_internal.h
+++ b/arch/x86/include/asm/pkeys_internal.h
@@ -27,7 +27,8 @@
 #define        PKS_NUM_KEYS            16
 
 #ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
-void write_pkrs(u32 pkrs_val);
+DECLARE_PER_CPU(u32, pkrs_cache);
+void noinstr write_pkrs(u32 pkrs_val);
 #else
 static inline void write_pkrs(u32 pkrs_val) { }
 #endif
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index d69250a7c1bf..85f0cbd5faa5 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -194,7 +194,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
  * headers.
  */
 #ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
-DECLARE_PER_CPU(u32, pkrs_cache);
 static inline void pks_init_task(struct task_struct *tsk)
 {
 	/* New tasks get the most restrictive PKRS value */
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index e565fadd74d7..cf9915fed6c9 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -247,7 +247,7 @@ DEFINE_PER_CPU(u32, pkrs_cache);
  * disabled as it does not guarantee the atomicity of updating the pkrs_cache
  * and MSR on its own.
  */
-void write_pkrs(u32 pkrs_val)
+void noinstr write_pkrs(u32 pkrs_val)
 {
 	this_cpu_write(pkrs_cache, pkrs_val);
 	wrmsrl(MSR_IA32_PKRS, pkrs_val);
-- 
2.28.0.rc0.12.gb6a658bd00c9


  parent reply index

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-17  7:20 [PATCH RFC V2 00/17] PKS: Add Protection Keys Supervisor (PKS) support ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 01/17] x86/pkeys: Create pkeys_internal.h ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 02/17] x86/fpu: Refactor arch_set_user_pkey_access() for PKS support ira.weiny
2020-07-17  8:54   ` Peter Zijlstra
2020-07-17 20:52     ` Ira Weiny
2020-07-20  9:14       ` Peter Zijlstra
2020-07-17 22:36     ` Dave Hansen
2020-07-20  9:13       ` Peter Zijlstra
2020-07-17  7:20 ` [PATCH RFC V2 03/17] x86/pks: Enable Protection Keys Supervisor (PKS) ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 04/17] x86/pks: Preserve the PKRS MSR on context switch ira.weiny
2020-07-17  8:31   ` Peter Zijlstra
2020-07-17 21:39     ` Ira Weiny
2020-07-17  8:59   ` Peter Zijlstra
2020-07-17 22:34     ` Ira Weiny
2020-07-20  9:15       ` Peter Zijlstra
2020-07-20 18:35         ` Ira Weiny
2020-07-17  7:20 ` [PATCH RFC V2 05/17] x86/pks: Add PKS kernel API ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 06/17] x86/pks: Add a debugfs file for allocated PKS keys ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 07/17] Documentation/pkeys: Update documentation for kernel pkeys ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 08/17] x86/pks: Add PKS Test code ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 09/17] memremap: Convert devmap static branch to {inc,dec} ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 10/17] fs/dax: Remove unused size parameter ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 11/17] drivers/dax: Expand lock scope to cover the use of addresses ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 12/17] memremap: Add zone device access protection ira.weiny
2020-07-17  9:10   ` Peter Zijlstra
2020-07-18  5:06     ` Ira Weiny
2020-07-20  9:16       ` Peter Zijlstra
2020-07-17  9:17   ` Peter Zijlstra
2020-07-18  5:51     ` Ira Weiny
2020-07-17  9:20   ` Peter Zijlstra
2020-07-17  7:20 ` [PATCH RFC V2 13/17] kmap: Add stray write protection for device pages ira.weiny
2020-07-17  9:21   ` Peter Zijlstra
2020-07-19  4:13     ` Ira Weiny
2020-07-20  9:17       ` Peter Zijlstra
2020-07-21 16:31         ` Ira Weiny
2020-07-17  7:20 ` [PATCH RFC V2 14/17] dax: Stray write protection for dax_direct_access() ira.weiny
2020-07-17  9:22   ` Peter Zijlstra
2020-07-19  4:41     ` Ira Weiny
2020-07-17  7:20 ` [PATCH RFC V2 15/17] nvdimm/pmem: Stray write protection for pmem->virt_addr ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 16/17] [dax|pmem]: Enable stray write protection ira.weiny
2020-07-17  9:25   ` Peter Zijlstra
2020-07-17  7:20 ` ira.weiny [this message]
2020-07-17  9:30   ` [PATCH RFC V2 17/17] x86/entry: Preserve PKRS MSR across exceptions Peter Zijlstra
2020-07-21 18:01     ` Ira Weiny
2020-07-21 19:11       ` Peter Zijlstra
2020-07-17  9:34   ` Peter Zijlstra
2020-07-17 10:06   ` Peter Zijlstra
2020-07-22  5:27     ` Ira Weiny
2020-07-22  9:48       ` Peter Zijlstra
2020-07-22 21:24         ` Ira Weiny
2020-07-23 20:08       ` Thomas Gleixner
2020-07-23 20:15         ` Thomas Gleixner
2020-07-24 17:23           ` Ira Weiny
2020-07-24 17:29             ` Andy Lutomirski
2020-07-24 19:43               ` Ira Weiny
2020-07-22 16:21   ` Andy Lutomirski
2020-07-23 16:18     ` Fenghua Yu
2020-07-23 16:23       ` Dave Hansen
2020-07-23 16:52         ` Fenghua Yu
2020-07-23 17:08           ` Andy Lutomirski
2020-07-23 17:30             ` Dave Hansen
2020-07-23 20:23               ` Thomas Gleixner
2020-07-23 20:22             ` Thomas Gleixner
2020-07-23 21:30               ` Andy Lutomirski
2020-07-23 22:14                 ` Thomas Gleixner
2020-07-23 19:53   ` Thomas Gleixner
2020-07-23 22:04     ` Ira Weiny
2020-07-23 23:41       ` Thomas Gleixner
2020-07-24 21:24         ` Thomas Gleixner
2020-07-24 21:31           ` Thomas Gleixner
2020-07-25  0:09           ` Andy Lutomirski
2020-07-27 20:59           ` Ira Weiny
2020-07-24 22:19 ` [PATCH RFC V2 00/17] PKS: Add Protection Keys Supervisor (PKS) support Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200717072056.73134-18-ira.weiny@intel.com \
    --to=ira.weiny@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=fenghua.yu@intel.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=vishal.l.verma@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Doc Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-doc/0 linux-doc/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-doc linux-doc/ https://lore.kernel.org/linux-doc \
		linux-doc@vger.kernel.org
	public-inbox-index linux-doc

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-doc


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git