* [PATCH V6 00/10] PKS: Add Protection Key Supervisor support
From: ira.weiny @ 2021-04-01 22:58 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Ira Weiny, Dave Hansen, Dan Williams, x86, linux-kernel,
	Fenghua Yu, linux-doc, linux-kselftest

From: Ira Weiny <ira.weiny@intel.com>

Introduce a new page protection mechanism for supervisor pages, Protection Key
Supervisor (PKS).

Generally, PKS enables protections on 'domains' of supervisor pages to limit
supervisor mode access to pages beyond the normal paging protections.  PKS
works in a similar fashion to user space pkeys, PKU.  As with PKU, supervisor
pkeys are checked in addition to normal paging protections, and access or
writes can be disabled via an MSR update without TLB flushes when permissions
change.

Also like PKU, a page mapping is assigned to a domain by setting pkey bits in
the page table entry for that mapping.

Access is controlled through a PKRS register which is updated via WRMSR/RDMSR.
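
As a rough illustration only, updating one pkey's permissions in PKRS amounts
to a read-modify-write of the MSR with no TLB work.  The helper below is
hypothetical (not code from this series); the 2-bits-per-key layout and
MSR_IA32_PKRS come from later patches:

	/*
	 * Hypothetical sketch -- the series itself goes through a per-cpu
	 * cache (see write_pkrs() in patch 5) rather than a raw RMW.
	 */
	static void pks_set_noaccess_sketch(int pkey)
	{
		u64 pkrs;

		rdmsrl(MSR_IA32_PKRS, pkrs);
		/* Clear both permission bits for this pkey */
		pkrs &= ~(0x3ULL << (pkey * PKR_BITS_PER_PKEY));
		/* Set Access-Disable */
		pkrs |= (u64)PKR_AD_BIT << (pkey * PKR_BITS_PER_PKEY);
		wrmsrl(MSR_IA32_PKRS, pkrs);	/* no TLB flush required */
	}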

XSAVE is not supported for the PKRS MSR.  Therefore the implementation
saves/restores the MSR across context switches and during exceptions.  Nested
exceptions are supported by each exception getting a new PKS state.

For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections on mappings with the default pkey value of 0.

The other keys (1-15) are allocated by an allocator, which prepares for key
contention from day one.  Kernel users should be prepared for the allocator to
fail, either because of key exhaustion or because PKS is not supported on the
CPU instance.
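
As a sketch only, a kernel consumer might look something like this (the names
pks_key_alloc()/pks_key_free() are illustrative placeholders for the API added
in patch 9 and may not match its final signatures):

	int pkey;

	/*
	 * Allocation can fail: keys 1-15 may be exhausted, or the CPU
	 * may not support PKS at all.
	 */
	pkey = pks_key_alloc("my driver");	/* hypothetical */
	if (pkey < 0)
		return pkey;	/* fall back to normal paging protections */

	/* ... map pages with this pkey, toggle access around use ... */

	pks_key_free(pkey);			/* hypothetical */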

The following are key attributes of PKS.

	1) Fast switching of permissions
		1a) Prevents access without page table manipulations
		1b) No TLB flushes required
	2) Works on a per thread basis

PKS is available with 4 and 5 level paging.  Like PKU, it consumes 4 bits from
the PTE to store the pkey within the entry.

All code to support PKS is configured via ARCH_ENABLE_SUPERVISOR_PKEYS, which
is designed to be turned on only when a kernel user is configured.  Those
users must depend on ARCH_HAS_SUPERVISOR_PKEYS to properly handle other
architectures which do not yet support PKS.

Originally this series was submitted as part of a large patch set which
converted the kmap call sites.[1]

Many follow-on discussions revealed a few problems.  The first was that some
callers leak a kmap mapping across threads rather than containing it to a
critical section.  Attempts were made to see if these 'global kmaps' could be
supported.[2]  However, supporting global kmaps had many problems.  Work is
being done in parallel to convert as many kmap calls as possible to the new
kmap_local_page().[3]


Changes from V5 [6]
	From Dave Hansen
		Remove 'we' from comments

Changes from V4 [5]
	From kernel test robot <lkp@intel.com>
		Fix i386 build: pks_init_task not found
	Move MSR_IA32_PKRS and INIT_PKRS_VALUE into patch 5 where they are
		first 'used'.  (Technically nothing is 'used' until the final
		test patch, but review-wise this is much cleaner.)
	From Sean Christopherson
		Add documentation details on what happens if the pkey is violated
		Change cpu_feature_enabled to be in WARN_ON check
		Clean up commit message of patch 6


[1] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/

[2] https://lore.kernel.org/lkml/87mtycqcjf.fsf@nanos.tec.linutronix.de/

[3] https://lore.kernel.org/lkml/20210128061503.1496847-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210205170030.856723-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210217024826.3466046-1-ira.weiny@intel.com/

[4] https://lore.kernel.org/lkml/20201106232908.364581-1-ira.weiny@intel.com/

[5] https://lore.kernel.org/lkml/20210322053020.2287058-1-ira.weiny@intel.com/

[6] https://lore.kernel.org/lkml/20210331191405.341999-1-ira.weiny@intel.com/


Fenghua Yu (1):
  x86/pks: Add PKS kernel API

Ira Weiny (9):
  x86/pkeys: Create pkeys_common.h
  x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
  x86/pks: Add additional PKEY helper macros
  x86/pks: Add PKS defines and Kconfig options
  x86/pks: Add PKS setup code
  x86/fault: Adjust WARN_ON for PKey fault
  x86/pks: Preserve the PKRS MSR on context switch
  x86/entry: Preserve PKRS MSR across exceptions
  x86/pks: Add PKS test code

 Documentation/core-api/protection-keys.rst  | 112 +++-
 arch/x86/Kconfig                            |   1 +
 arch/x86/entry/calling.h                    |  26 +
 arch/x86/entry/common.c                     |  57 ++
 arch/x86/entry/entry_64.S                   |  22 +-
 arch/x86/entry/entry_64_compat.S            |   6 +-
 arch/x86/include/asm/cpufeatures.h          |   1 +
 arch/x86/include/asm/disabled-features.h    |   8 +-
 arch/x86/include/asm/msr-index.h            |   1 +
 arch/x86/include/asm/pgtable.h              |  15 +-
 arch/x86/include/asm/pgtable_types.h        |  12 +
 arch/x86/include/asm/pkeys.h                |   4 +
 arch/x86/include/asm/pkeys_common.h         |  34 +
 arch/x86/include/asm/pks.h                  |  54 ++
 arch/x86/include/asm/processor-flags.h      |   2 +
 arch/x86/include/asm/processor.h            |  47 +-
 arch/x86/include/uapi/asm/processor-flags.h |   2 +
 arch/x86/kernel/cpu/common.c                |   2 +
 arch/x86/kernel/fpu/xstate.c                |  22 +-
 arch/x86/kernel/head_64.S                   |   7 +-
 arch/x86/kernel/process.c                   |   3 +
 arch/x86/kernel/process_64.c                |   2 +
 arch/x86/mm/fault.c                         |  30 +-
 arch/x86/mm/pkeys.c                         | 218 +++++-
 include/linux/pgtable.h                     |   4 +
 include/linux/pkeys.h                       |  34 +
 kernel/entry/common.c                       |  14 +-
 lib/Kconfig.debug                           |  11 +
 lib/Makefile                                |   3 +
 lib/pks/Makefile                            |   3 +
 lib/pks/pks_test.c                          | 694 ++++++++++++++++++++
 mm/Kconfig                                  |   5 +
 tools/testing/selftests/x86/Makefile        |   3 +-
 tools/testing/selftests/x86/test_pks.c      | 149 +++++
 34 files changed, 1527 insertions(+), 81 deletions(-)
 create mode 100644 arch/x86/include/asm/pkeys_common.h
 create mode 100644 arch/x86/include/asm/pks.h
 create mode 100644 lib/pks/Makefile
 create mode 100644 lib/pks/pks_test.c
 create mode 100644 tools/testing/selftests/x86/test_pks.c

-- 
2.28.0.rc0.12.gb6a658bd00c9



* [PATCH V6 01/10] x86/pkeys: Create pkeys_common.h
From: ira.weiny @ 2021-04-01 22:58 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Ira Weiny, Dan Williams, Dave Hansen, x86, linux-kernel,
	Fenghua Yu, linux-doc, linux-kselftest

From: Ira Weiny <ira.weiny@intel.com>

Protection Keys User (PKU) and Protection Keys Supervisor (PKS) work in a
similar fashion and can share common defines.  Specifically, PKS and PKU
each have:

	1. A single control register
	2. The same number of keys
	3. The same number of bits in the register per key
	4. Access and Write disable in the same bit locations

Given the above, share all the macros that synthesize and manipulate
register values between the two features.  Unlike PKU, the PKS
definitions are needed in both pgtable.h and pkeys.h.  Create a common
header for those two headers to share.  The alternative, including
pgtable.h in pkeys.h, triggers complex header dependencies.

Share these defines by moving them into a new header, change their names
to reflect the common use, and include the header where needed.
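
As a concrete example of the shared layout (values follow directly from
the macros in this patch): with 2 bits per key, pkey 2 occupies register
bits 4-5, so

	PKR_AD_KEY(2)                 == 0x1 << 4    == 0x10
	PKR_AD_KEY(2) | PKR_AD_KEY(3) == 0x10 | 0x40 == 0x50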

Also, while editing the code, remove the use of 'we' from the comments
being touched.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
NOTE: The initialization of init_pkru_value causes checkpatch errors
because of the space after the '(' in the macros.  Leave this as is
because it is more readable in this format, and it is existing code.

---
Changes from V5:
	Remove 'we' from code comments

Changes from V3:
	From Dan Williams
		Fix guard macro names
		Reword commit message.

Changes from RFC V3
	Per Dave Hansen
		Update commit message
		Add comment to PKR_AD_KEY macro
---
 arch/x86/include/asm/pgtable.h      | 18 +++++++-----------
 arch/x86/include/asm/pkeys.h        |  2 ++
 arch/x86/include/asm/pkeys_common.h | 15 +++++++++++++++
 arch/x86/kernel/fpu/xstate.c        | 10 +++++-----
 arch/x86/mm/pkeys.c                 | 14 ++++++--------
 5 files changed, 35 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/include/asm/pkeys_common.h

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a02c67291cfc..53bbde334193 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1360,9 +1360,7 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
 }
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
-#define PKRU_AD_BIT 0x1
-#define PKRU_WD_BIT 0x2
-#define PKRU_BITS_PER_PKEY 2
+#include <asm/pkeys_common.h>
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 extern u32 init_pkru_value;
@@ -1372,18 +1370,16 @@ extern u32 init_pkru_value;
 
 static inline bool __pkru_allows_read(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
-	return !(pkru & (PKRU_AD_BIT << pkru_pkey_bits));
+	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
+
+	return !(pkru & (PKR_AD_BIT << pkru_pkey_bits));
 }
 
 static inline bool __pkru_allows_write(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
-	/*
-	 * Access-disable disables writes too so we need to check
-	 * both bits here.
-	 */
-	return !(pkru & ((PKRU_AD_BIT|PKRU_WD_BIT) << pkru_pkey_bits));
+	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
+	/* Access-disable disables writes too so check both bits here. */
+	return !(pkru & ((PKR_AD_BIT|PKR_WD_BIT) << pkru_pkey_bits));
 }
 
 static inline u16 pte_flags_pkey(unsigned long pte_flags)
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 2ff9b98812b7..f9feba80894b 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_PKEYS_H
 #define _ASM_X86_PKEYS_H
 
+#include <asm/pkeys_common.h>
+
 #define ARCH_DEFAULT_PKEY	0
 
 /*
diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
new file mode 100644
index 000000000000..e40b0ced733f
--- /dev/null
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PKEYS_COMMON_H
+#define _ASM_X86_PKEYS_COMMON_H
+
+#define PKR_AD_BIT 0x1
+#define PKR_WD_BIT 0x2
+#define PKR_BITS_PER_PKEY 2
+
+/*
+ * Generate an Access-Disable mask for the given pkey.  Several of these can be
+ * OR'd together to generate pkey register values.
+ */
+#define PKR_AD_KEY(pkey)	(PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))
+
+#endif /*_ASM_X86_PKEYS_COMMON_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 683749b80ae2..406ad78e1969 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -995,7 +995,7 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		unsigned long init_val)
 {
 	u32 old_pkru;
-	int pkey_shift = (pkey * PKRU_BITS_PER_PKEY);
+	int pkey_shift = (pkey * PKR_BITS_PER_PKEY);
 	u32 new_pkru_bits = 0;
 
 	/*
@@ -1012,18 +1012,18 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 	 */
 	WARN_ON_ONCE(pkey >= arch_max_pkey());
 
-	/* Set the bits we need in PKRU:  */
+	/* Set the bits needed in PKRU:  */
 	if (init_val & PKEY_DISABLE_ACCESS)
-		new_pkru_bits |= PKRU_AD_BIT;
+		new_pkru_bits |= PKR_AD_BIT;
 	if (init_val & PKEY_DISABLE_WRITE)
-		new_pkru_bits |= PKRU_WD_BIT;
+		new_pkru_bits |= PKR_WD_BIT;
 
 	/* Shift the bits in to the correct place in PKRU for pkey: */
 	new_pkru_bits <<= pkey_shift;
 
 	/* Get old PKRU and mask off any old bits in place: */
 	old_pkru = read_pkru();
-	old_pkru &= ~((PKRU_AD_BIT|PKRU_WD_BIT) << pkey_shift);
+	old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift);
 
 	/* Write old part along with new part: */
 	write_pkru(old_pkru | new_pkru_bits);
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 8873ed1438a9..f5efb4007e74 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -111,19 +111,17 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
 	return vma_pkey(vma);
 }
 
-#define PKRU_AD_KEY(pkey)	(PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))
-
 /*
  * Make the default PKRU value (at execve() time) as restrictive
  * as possible.  This ensures that any threads clone()'d early
  * in the process's lifetime will not accidentally get access
  * to data which is pkey-protected later on.
  */
-u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
-		      PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) |
-		      PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) |
-		      PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) |
-		      PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15);
+u32 init_pkru_value = PKR_AD_KEY( 1) | PKR_AD_KEY( 2) | PKR_AD_KEY( 3) |
+		      PKR_AD_KEY( 4) | PKR_AD_KEY( 5) | PKR_AD_KEY( 6) |
+		      PKR_AD_KEY( 7) | PKR_AD_KEY( 8) | PKR_AD_KEY( 9) |
+		      PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) |
+		      PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15);
 
 /*
  * Called from the FPU code when creating a fresh set of FPU
@@ -173,7 +171,7 @@ static ssize_t init_pkru_write_file(struct file *file,
 	 * up immediately if someone attempts to disable access
 	 * or writes to pkey 0.
 	 */
-	if (new_init_pkru & (PKRU_AD_BIT|PKRU_WD_BIT))
+	if (new_init_pkru & (PKR_AD_BIT|PKR_WD_BIT))
 		return -EINVAL;
 
 	WRITE_ONCE(init_pkru_value, new_init_pkru);
-- 
2.28.0.rc0.12.gb6a658bd00c9



* [PATCH V6 02/10] x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
From: ira.weiny @ 2021-04-01 22:58 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Ira Weiny, Dan Williams, Dave Hansen, x86, linux-kernel,
	Fenghua Yu, linux-doc, linux-kselftest

From: Ira Weiny <ira.weiny@intel.com>

Define a helper, update_pkey_val(), which will be used to support both
Protection Key User (PKU) and the new Protection Key for Supervisor
(PKS) in subsequent patches.
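
For example, with the 2-bits-per-key register layout, the helper behaves
as follows (values worked out from the implementation below):

	update_pkey_val(0x10, 2, PKEY_DISABLE_WRITE) == 0x20
	update_pkey_val(0x30, 2, 0)                  == 0x0  /* full access */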

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC V3:
	Per Dave Hansen
		Update and add comments per Dave's review
	Per Peter
		Correct attribution
---
 arch/x86/include/asm/pkeys.h |  2 ++
 arch/x86/kernel/fpu/xstate.c | 22 ++++------------------
 arch/x86/mm/pkeys.c          | 23 +++++++++++++++++++++++
 3 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index f9feba80894b..4526245b03e5 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -136,4 +136,6 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 	return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
 }
 
+u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags);
+
 #endif /*_ASM_X86_PKEYS_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 406ad78e1969..00251bdf759b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -994,9 +994,7 @@ const void *get_xsave_field_ptr(int xfeature_nr)
 int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		unsigned long init_val)
 {
-	u32 old_pkru;
-	int pkey_shift = (pkey * PKR_BITS_PER_PKEY);
-	u32 new_pkru_bits = 0;
+	u32 pkru;
 
 	/*
 	 * This check implies XSAVE support.  OSPKE only gets
@@ -1012,21 +1010,9 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 	 */
 	WARN_ON_ONCE(pkey >= arch_max_pkey());
 
-	/* Set the bits needed in PKRU:  */
-	if (init_val & PKEY_DISABLE_ACCESS)
-		new_pkru_bits |= PKR_AD_BIT;
-	if (init_val & PKEY_DISABLE_WRITE)
-		new_pkru_bits |= PKR_WD_BIT;
-
-	/* Shift the bits in to the correct place in PKRU for pkey: */
-	new_pkru_bits <<= pkey_shift;
-
-	/* Get old PKRU and mask off any old bits in place: */
-	old_pkru = read_pkru();
-	old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift);
-
-	/* Write old part along with new part: */
-	write_pkru(old_pkru | new_pkru_bits);
+	pkru = read_pkru();
+	pkru = update_pkey_val(pkru, pkey, init_val);
+	write_pkru(pkru);
 
 	return 0;
 }
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index f5efb4007e74..d1dfe743e79f 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -208,3 +208,26 @@ static __init int setup_init_pkru(char *opt)
 	return 1;
 }
 __setup("init_pkru=", setup_init_pkru);
+
+/*
+ * Replace disable bits for @pkey with values from @flags
+ *
+ * Kernel users use the same flags as user space:
+ *     PKEY_DISABLE_ACCESS
+ *     PKEY_DISABLE_WRITE
+ */
+u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
+{
+	int pkey_shift = pkey * PKR_BITS_PER_PKEY;
+
+	/*  Mask out old bit values */
+	pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
+
+	/*  Or in new values */
+	if (flags & PKEY_DISABLE_ACCESS)
+		pk_reg |= PKR_AD_BIT << pkey_shift;
+	if (flags & PKEY_DISABLE_WRITE)
+		pk_reg |= PKR_WD_BIT << pkey_shift;
+
+	return pk_reg;
+}
-- 
2.28.0.rc0.12.gb6a658bd00c9



* [PATCH V6 03/10] x86/pks: Add additional PKEY helper macros
From: ira.weiny @ 2021-04-01 22:58 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Ira Weiny, Dan Williams, Dave Hansen, x86, linux-kernel,
	Fenghua Yu, linux-doc, linux-kselftest

From: Ira Weiny <ira.weiny@intel.com>

Avoid open coding shift and mask operations by defining and using helper
macros for PKey operations.
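
For instance, for pkey 3 the new helpers evaluate to (derived from the
definitions below):

	PKR_PKEY_SHIFT(3) == 6
	PKR_PKEY_MASK(3)  == 0x3 << 6 == 0xc0
	PKR_WD_KEY(3)     == 0x2 << 6 == 0x80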

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V3:
	new patch suggested by Dan Williams to use macros better.
---
 arch/x86/include/asm/pgtable.h      |  7 ++-----
 arch/x86/include/asm/pkeys_common.h | 11 ++++++++---
 arch/x86/mm/pkeys.c                 |  8 +++-----
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 53bbde334193..07e9779b76d2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1370,16 +1370,13 @@ extern u32 init_pkru_value;
 
 static inline bool __pkru_allows_read(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
-
-	return !(pkru & (PKR_AD_BIT << pkru_pkey_bits));
+	return !(pkru & PKR_AD_KEY(pkey));
 }
 
 static inline bool __pkru_allows_write(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
 	/* Access-disable disables writes too so check both bits here. */
-	return !(pkru & ((PKR_AD_BIT|PKR_WD_BIT) << pkru_pkey_bits));
+	return !(pkru & (PKR_AD_KEY(pkey) | PKR_WD_KEY(pkey)));
 }
 
 static inline u16 pte_flags_pkey(unsigned long pte_flags)
diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
index e40b0ced733f..0681522974ba 100644
--- a/arch/x86/include/asm/pkeys_common.h
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -6,10 +6,15 @@
 #define PKR_WD_BIT 0x2
 #define PKR_BITS_PER_PKEY 2
 
+#define PKR_PKEY_SHIFT(pkey) (pkey * PKR_BITS_PER_PKEY)
+#define PKR_PKEY_MASK(pkey)  (((1 << PKR_BITS_PER_PKEY) - 1) << PKR_PKEY_SHIFT(pkey))
+
 /*
- * Generate an Access-Disable mask for the given pkey.  Several of these can be
- * OR'd together to generate pkey register values.
+ * Generate an Access-Disable and Write-Disable mask for the given pkey.
+ * Several of the AD's are OR'd together to generate a default pkey register
+ * value.
  */
-#define PKR_AD_KEY(pkey)	(PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))
+#define PKR_AD_KEY(pkey)     (PKR_AD_BIT << PKR_PKEY_SHIFT(pkey))
+#define PKR_WD_KEY(pkey)     (PKR_WD_BIT << PKR_PKEY_SHIFT(pkey))
 
 #endif /*_ASM_X86_PKEYS_COMMON_H */
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index d1dfe743e79f..fc8c7e2bb21b 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -218,16 +218,14 @@ __setup("init_pkru=", setup_init_pkru);
  */
 u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
 {
-	int pkey_shift = pkey * PKR_BITS_PER_PKEY;
-
 	/*  Mask out old bit values */
-	pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
+	pk_reg &= ~PKR_PKEY_MASK(pkey);
 
 	/*  Or in new values */
 	if (flags & PKEY_DISABLE_ACCESS)
-		pk_reg |= PKR_AD_BIT << pkey_shift;
+		pk_reg |= PKR_AD_KEY(pkey);
 	if (flags & PKEY_DISABLE_WRITE)
-		pk_reg |= PKR_WD_BIT << pkey_shift;
+		pk_reg |= PKR_WD_KEY(pkey);
 
 	return pk_reg;
 }
-- 
2.28.0.rc0.12.gb6a658bd00c9



* [PATCH V6 04/10] x86/pks: Add PKS defines and Kconfig options
From: ira.weiny @ 2021-04-01 22:58 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Ira Weiny, Dan Williams, Fenghua Yu, Dave Hansen, x86,
	linux-kernel, linux-doc, linux-kselftest

From: Ira Weiny <ira.weiny@intel.com>

Protection Keys for Supervisor pages (PKS) enables fast,
hardware-thread-specific manipulation of permission restrictions on
supervisor page mappings.  It uses the same Protection Keys mechanism
as user mappings but applies it to supervisor mappings using a
supervisor-specific MSR.

Kernel users can define domains of page mappings which have an extra
level of protection beyond those specified in the supervisor page table
entries.

Define the PKS CPU feature bits.

Add the Kconfig ARCH_HAS_SUPERVISOR_PKEYS to indicate to consumers that
an architecture supports supervisor pkeys.

Introduce ARCH_ENABLE_SUPERVISOR_PKEYS to allow kernel users to specify
to the arch that they wish to use the supervisor key support if
ARCH_HAS_SUPERVISOR_PKEYS is available.  ARCH_ENABLE_SUPERVISOR_PKEYS
remains off until the first use case sets it.
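
As a sketch, a first user would wire itself up roughly like this
(MY_PKS_USER is a hypothetical consumer; the two option names are the
ones added by this patch):

	config MY_PKS_USER
		bool "Example PKS consumer"
		depends on ARCH_HAS_SUPERVISOR_PKEYS
		select ARCH_ENABLE_SUPERVISOR_PKEYS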

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V3:
	From Dan
		Clean up commit message
		Add ARCH_ENABLE_SUPERVISOR_PKEYS option so there is no
		overhead of PKS unless there is a user
	Clean up commit message grammar

Changes from V2
	New patch for V3:  Split this off from the enable patch to be
	able to create cleaner bisectability
---
 arch/x86/Kconfig                            | 1 +
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/asm/disabled-features.h    | 8 +++++++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 mm/Kconfig                                  | 4 ++++
 5 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879d398e..5e3a7c2bc342 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1870,6 +1870,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 	depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
 	select ARCH_USES_HIGH_VMA_FLAGS
 	select ARCH_HAS_PKEYS
+	select ARCH_HAS_SUPERVISOR_PKEYS
 	help
 	  Memory Protection Keys provides a mechanism for enforcing
 	  page-based protections, but without requiring modification of the
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cc96e26d69f7..83ed73407417 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -359,6 +359,7 @@
 #define X86_FEATURE_MOVDIR64B		(16*32+28) /* MOVDIR64B instruction */
 #define X86_FEATURE_ENQCMD		(16*32+29) /* ENQCMD and ENQCMDS instructions */
 #define X86_FEATURE_SGX_LC		(16*32+30) /* Software Guard Extensions Launch Control */
+#define X86_FEATURE_PKS			(16*32+31) /* Protection Keys for Supervisor pages */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* MCA overflow recovery support */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index b7dd944dc867..fd09ae852c04 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -44,6 +44,12 @@
 # define DISABLE_OSPKE		(1<<(X86_FEATURE_OSPKE & 31))
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+# define DISABLE_PKS		0
+#else
+# define DISABLE_PKS		(1<<(X86_FEATURE_PKS & 31))
+#endif
+
 #ifdef CONFIG_X86_5LEVEL
 # define DISABLE_LA57	0
 #else
@@ -88,7 +94,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_PKS)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK19	0
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index bcba3c643e63..191c574b2390 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -130,6 +130,8 @@
 #define X86_CR4_SMAP		_BITUL(X86_CR4_SMAP_BIT)
 #define X86_CR4_PKE_BIT		22 /* enable Protection Keys support */
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
+#define X86_CR4_PKS_BIT		24 /* enable Protection Keys for Supervisor */
+#define X86_CR4_PKS		_BITUL(X86_CR4_PKS_BIT)
 
 /*
  * x86-64 Task Priority Register, CR8
diff --git a/mm/Kconfig b/mm/Kconfig
index 24c045b24b95..c7d1fc780358 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -808,6 +808,10 @@ config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
 	bool
+config ARCH_HAS_SUPERVISOR_PKEYS
+	bool
+config ARCH_ENABLE_SUPERVISOR_PKEYS
+	bool
 
 config PERCPU_STATS
 	bool "Collect percpu memory statistics"
-- 
2.28.0.rc0.12.gb6a658bd00c9



* [PATCH V6 05/10] x86/pks: Add PKS setup code
From: ira.weiny @ 2021-04-01 22:58 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Ira Weiny, Dan Williams, Fenghua Yu, Dave Hansen, x86,
	linux-kernel, linux-doc, linux-kselftest

From: Ira Weiny <ira.weiny@intel.com>

Protection Keys for Supervisor pages (PKS) enables fast,
hardware-thread-specific manipulation of permission restrictions on
supervisor page mappings.  It uses the same Protection Keys mechanism
as user mappings but applies it to supervisor mappings using a
supervisor-specific MSR.

Add setup code and the lowest level of PKS MSR write support.  The write
value is cached per-cpu to avoid the overhead of the MSR write if the
value has not changed.

That said, it should be noted that the underlying WRMSR(MSR_IA32_PKRS)
is not serializing but still maintains ordering properties similar to
WRPKRU.  The current SDM section on PKRS needs updating but should be
the same as that of WRPKRU.  So to quote from the WRPKRU text:

	WRPKRU will never execute transiently. Memory accesses affected
	by PKRU register will not execute (even transiently) until all
	prior executions of WRPKRU have completed execution and updated
	the PKRU register.

write_pkrs() contributed by Peter Zijlstra.

Introduce asm/pks.h to declare setup_pks() as an internal function call.
Later patches will also need this new header as a place to declare
internal structures and functions.
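
For reference, the default register value defined below works out to

	INIT_PKRS_VALUE == 0x55555554

i.e. Access-Disable set for keys 1-15 (bits 2, 4, ..., 30) with key 0
left unrestricted.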

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V4:
	Move MSR_IA32_PKRS and INIT_PKRS_VALUE into this patch where
	they are first 'used'.  (Technically nothing is used until the
	test patch but this is correct for review purposes.)

Changes from V3:
	From Dan Williams:
		Update commit message
		Add pks.h to hold ifdefery out of *.c files
		s/ARCH_HAS.../SUPERVISOR_PKEYS
		move setup_pks to pkeys.c (remove more ifdefery)
	From Dan Williams:
		Remove 'domain' language from commit message
		Clarify comment in fault handler
	Move the removal of the WARN_ON_ONCE in the fault path to this
	patch.  Previously it was in:
		[07/10] x86/fault: Report the PKRS state on fault

Changes from V2
	From Thomas: Make this patch last so PKS is not enabled until
	all the PKS mechanisms are in place.  Specifically:
		1) Modify setup_pks() to call write_pkrs() to properly
		   set up the initial value when enabled.

		2) Split this patch into two. 1) a precursor patch with
		   the required defines/config options and 2) this patch
		   which actually enables feature on CPUs which support
		   it.

Changes since RFC V3
	Per Dave Hansen
		Update comment
		Add X86_FEATURE_PKS to disabled-features.h
	Rebase based on latest TIP tree
---
 arch/x86/include/asm/msr-index.h    |  1 +
 arch/x86/include/asm/pkeys_common.h | 14 +++++++++
 arch/x86/include/asm/pks.h          | 15 +++++++++
 arch/x86/kernel/cpu/common.c        |  2 ++
 arch/x86/mm/pkeys.c                 | 48 +++++++++++++++++++++++++++++
 5 files changed, 80 insertions(+)
 create mode 100644 arch/x86/include/asm/pks.h

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 546d6ecf0a35..c15a049bf6ac 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -765,6 +765,7 @@
 
 #define MSR_IA32_TSC_DEADLINE		0x000006E0
 
+#define MSR_IA32_PKRS			0x000006E1
 
 #define MSR_TSX_FORCE_ABORT		0x0000010F
 
diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
index 0681522974ba..6917f1a27479 100644
--- a/arch/x86/include/asm/pkeys_common.h
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -17,4 +17,18 @@
 #define PKR_AD_KEY(pkey)     (PKR_AD_BIT << PKR_PKEY_SHIFT(pkey))
 #define PKR_WD_KEY(pkey)     (PKR_WD_BIT << PKR_PKEY_SHIFT(pkey))
 
+/*
+ * Define a default PKRS value for each task.
+ *
+ * Key 0 has no restriction.  All other keys are set to the most restrictive
+ * value which is access disabled (AD=1).
+ *
+ * NOTE: This needs to be a macro to be used as part of the INIT_THREAD macro.
+ */
+#define INIT_PKRS_VALUE (PKR_AD_KEY(1) | PKR_AD_KEY(2) | PKR_AD_KEY(3) | \
+			 PKR_AD_KEY(4) | PKR_AD_KEY(5) | PKR_AD_KEY(6) | \
+			 PKR_AD_KEY(7) | PKR_AD_KEY(8) | PKR_AD_KEY(9) | \
+			 PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | \
+			 PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15))
+
 #endif /*_ASM_X86_PKEYS_COMMON_H */
diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h
new file mode 100644
index 000000000000..5d7067ada8fb
--- /dev/null
+++ b/arch/x86/include/asm/pks.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PKS_H
+#define _ASM_X86_PKS_H
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+void setup_pks(void);
+
+#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
+static inline void setup_pks(void) { }
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
+#endif /* _ASM_X86_PKS_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ab640abe26b6..de49d0c0f4e0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -58,6 +58,7 @@
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
 #include <asm/uv/uv.h>
+#include <asm/pks.h>
 
 #include "cpu.h"
 
@@ -1594,6 +1595,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 
 	x86_init_rdrand(c);
 	setup_pku(c);
+	setup_pks();
 
 	/*
 	 * Clear/Set all flags overridden by options, need do it
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index fc8c7e2bb21b..f6a3a54b8d7d 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -229,3 +229,51 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
 
 	return pk_reg;
 }
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+static DEFINE_PER_CPU(u32, pkrs_cache);
+
+/*
+ * write_pkrs() optimizes MSR writes by maintaining a per cpu cache which can
+ * be checked quickly.
+ *
+ * It should also be noted that the underlying WRMSR(MSR_IA32_PKRS) is not
+ * serializing but still maintains ordering properties similar to WRPKRU.
+ * The current SDM section on PKRS needs updating but should be the same as
+ * that of WRPKRU.  So to quote from the WRPKRU text:
+ *
+ *     WRPKRU will never execute transiently. Memory accesses
+ *     affected by PKRU register will not execute (even transiently)
+ *     until all prior executions of WRPKRU have completed execution
+ *     and updated the PKRU register.
+ */
+void write_pkrs(u32 new_pkrs)
+{
+	u32 *pkrs;
+
+	if (!static_cpu_has(X86_FEATURE_PKS))
+		return;
+
+	pkrs = get_cpu_ptr(&pkrs_cache);
+	if (*pkrs != new_pkrs) {
+		*pkrs = new_pkrs;
+		wrmsrl(MSR_IA32_PKRS, new_pkrs);
+	}
+	put_cpu_ptr(pkrs);
+}
+
+/*
+ * PKS is independent of PKU and either or both may be supported on a CPU.
+ * Configure PKS if the CPU supports the feature.
+ */
+void setup_pks(void)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	write_pkrs(INIT_PKRS_VALUE);
+	cr4_set_bits(X86_CR4_PKS);
+}
+
+#endif
-- 
2.28.0.rc0.12.gb6a658bd00c9



* [PATCH V6 06/10] x86/fault: Adjust WARN_ON for PKey fault
From: ira.weiny @ 2021-04-01 22:58 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Ira Weiny, Sean Christopherson, Dan Williams, Dave Hansen, x86,
	linux-kernel, Fenghua Yu, linux-doc, linux-kselftest

From: Ira Weiny <ira.weiny@intel.com>

Previously, if a Protection key fault occurred, it indicated something
very wrong, because user page mappings are not supposed to be in the
kernel address space.

Now PKey faults may happen on kernel mappings if the feature is enabled.

Remove the warning in the fault path and allow the oops to occur without
extra debugging if PKS is enabled.

Cc: Sean Christopherson <seanjc@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V5:
	From Dave Hansen
		Remove 'we' from comment

Changes from V4:
	From Sean Christopherson
		Clean up commit message and comment
		Change cpu_feature_enabled to be in WARN_ON check
---
 arch/x86/mm/fault.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a73347e2cdfc..0e0e90968f57 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1141,11 +1141,15 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 		   unsigned long address)
 {
 	/*
-	 * Protection keys exceptions only happen on user pages.  We
-	 * have no user pages in the kernel portion of the address
-	 * space, so do not expect them here.
+	 * X86_PF_PK (Protection key exceptions) may occur on kernel addresses
+	 * when PKS (PKeys Supervisor) is enabled.
+	 *
+	 * However, if PKS is not enabled WARN if this exception is seen
+	 * because there are no user pages in the kernel portion of the address
+	 * space.
 	 */
-	WARN_ON_ONCE(hw_error_code & X86_PF_PK);
+	WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS) &&
+		     (hw_error_code & X86_PF_PK));
 
 #ifdef CONFIG_X86_32
 	/*
-- 
2.28.0.rc0.12.gb6a658bd00c9



* [PATCH V6 07/10] x86/pks: Preserve the PKRS MSR on context switch
From: ira.weiny @ 2021-04-01 22:58 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Ira Weiny, Dan Williams, Fenghua Yu, Dave Hansen, x86,
	linux-kernel, linux-doc, linux-kselftest

From: Ira Weiny <ira.weiny@intel.com>

The PKRS MSR is defined as a per-logical-processor register.  This
isolates memory access by logical CPU.  Unfortunately, the MSR is not
managed by XSAVE.  Therefore, tasks must save/restore the MSR value on
context switch.

Define a saved PKRS value in the task struct, as well as a cached
per-logical-processor MSR value which mirrors the MSR value of the
current CPU.  Initialize all tasks with the default MSR value.  Then, on
schedule in, call write_pkrs() which automatically avoids the overhead
of the MSR write if possible.
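
In sketch form, schedule-in reduces to the following (expanding the
pks_sched_in() macro from this patch together with write_pkrs() from
patch 5):

	/* in __switch_to(), after the rest of the new task's state */
	write_pkrs(current->thread.saved_pkrs);
	/*
	 * write_pkrs() compares against the per-cpu pkrs_cache and only
	 * issues the WRMSR when the value actually changed.
	 */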

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V4
	From kernel test robot <lkp@intel.com>
		Fix i386 build: pks_init_task not found
	Move MSR_IA32_PKRS and INIT_PKRS_VALUE to patch 5 where they are
		'used'.  (Technically nothing is used until the final
		test patch but this organization makes review better.)
	Fix checkpatch errors

Changes from V3
	From Dan Williams
		make pks_init_task() and pks_sched_in() macros
		To avoid Supervisor PKey '#ifdefery' in process.c and
		process_64.c
	Split write_pkrs() to an earlier patch to be used in setup_pks()
		Move Peter's authorship to that patch.
	From Dan Williams
		Use ARCH_ENABLE_SUPERVISOR_PKEYS
	Remove kernel doc comment from write_pkrs
	From Thomas Gleixner
		Fix where pks_sched_in() is called from.
			Should be called from __switch_to()
			NOTE: PKS requires x86_64 so there is no need to
			update process_32.c
		Make pkrs_cache static
		Remove unnecessary pkrs_cache declaration
		Clean up formatting

Changes from V2
	Adjust for PKS enable being final patch.

Changes from V1
	Rebase to latest tip/master
		Resolve conflicts with INIT_THREAD changes

Changes since RFC V3
	Per Dave Hansen
		Update commit message
		move saved_pkrs to be in a nicer place
	Per Peter Zijlstra
		Add Comment from Peter
		Clean up white space
		Update authorship
---
 arch/x86/include/asm/processor.h | 47 +++++++++++++++++++++++++++++++-
 arch/x86/kernel/process.c        |  3 ++
 arch/x86/kernel/process_64.c     |  2 ++
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index dc6d149bf851..e0ffb9c849c5 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -18,6 +18,7 @@ struct vm86;
 #include <asm/cpufeatures.h>
 #include <asm/page.h>
 #include <asm/pgtable_types.h>
+#include <asm/pkeys_common.h>
 #include <asm/percpu.h>
 #include <asm/msr.h>
 #include <asm/desc_defs.h>
@@ -519,6 +520,12 @@ struct thread_struct {
 	unsigned long		cr2;
 	unsigned long		trap_nr;
 	unsigned long		error_code;
+
+#ifdef	CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+	/* Saved Protection key register for supervisor mappings */
+	u32			saved_pkrs;
+#endif
+
 #ifdef CONFIG_VM86
 	/* Virtual 86 mode info */
 	struct vm86		*vm86;
@@ -775,6 +782,37 @@ static inline void spin_lock_prefetch(const void *x)
 	((struct pt_regs *)__ptr) - 1;					\
 })
 
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+void write_pkrs(u32 new_pkrs);
+
+/*
+ * Define pks_init_task and pks_sched_in as macros to avoid requiring the
+ * definition of struct task_struct in this header while keeping the supervisor
+ * pkey #ifdefery out of process.c and process_64.c
+ */
+
+/*
+ * New tasks get the most restrictive PKRS value.
+ */
+#define pks_init_task(tsk) \
+	tsk->thread.saved_pkrs = INIT_PKRS_VALUE
+
+/*
+ * PKRS is only temporarily changed during specific code paths.  Only a
+ * preemption during these windows away from the default value would
+ * require updating the MSR.  write_pkrs() handles this optimization.
+ */
+#define pks_sched_in() \
+	write_pkrs(current->thread.saved_pkrs)
+
+#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
+#define pks_init_task(tsk)
+#define pks_sched_in()
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
 #ifdef CONFIG_X86_32
 #define INIT_THREAD  {							  \
 	.sp0			= TOP_OF_INIT_STACK,			  \
@@ -784,7 +822,14 @@ static inline void spin_lock_prefetch(const void *x)
 #define KSTK_ESP(task)		(task_pt_regs(task)->sp)
 
 #else
-#define INIT_THREAD { }
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+#define INIT_THREAD  {						\
+	.saved_pkrs = INIT_PKRS_VALUE,				\
+}
+#else
+#define INIT_THREAD  { }
+#endif
 
 extern unsigned long KSTK_ESP(struct task_struct *task);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9c214d7085a4..89f8454a8541 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -43,6 +43,7 @@
 #include <asm/io_bitmap.h>
 #include <asm/proto.h>
 #include <asm/frame.h>
+#include <asm/processor.h>
 
 #include "process.h"
 
@@ -195,6 +196,8 @@ void flush_thread(void)
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
 
 	fpu__clear_all(&tsk->thread.fpu);
+
+	pks_init_task(tsk);
 }
 
 void disable_TSC(void)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index d08307df69ad..e590ecac1650 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -632,6 +632,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
 
+	pks_sched_in();
+
 	return prev_p;
 }
 
-- 
2.28.0.rc0.12.gb6a658bd00c9



* [PATCH V6 08/10] x86/entry: Preserve PKRS MSR across exceptions
From: ira.weiny @ 2021-04-01 22:58 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Ira Weiny, Dave Hansen, Dan Williams, x86, linux-kernel,
	Fenghua Yu, linux-doc, linux-kselftest

From: Ira Weiny <ira.weiny@intel.com>

The PKRS MSR is not managed by XSAVE.  It is preserved through a
context switch, but that support alone leaves exception handling code
open to memory accesses during exceptions.

Two possible places for preserving this state were considered,
irqentry_state_t or pt_regs.[1]  pt_regs was much more complicated and
was potentially fraught with unintended consequences.[2]  However, Andy
came up with a way to hide additional values on the stack which could be
accessed as "extended_pt_regs".[3]  This method allows for the
following: any place which has struct pt_regs can get access to the
extra information; no extra information is added to irq_state; and
pt_regs is left intact for compatibility with outside tools like BPF.

To simplify, the assembly code only adds space on the stack.  The
setting or use of any needed values is left to the C code.  While some
entry points may not use this space, it is still added wherever pt_regs
is passed to the C code, for consistency.

Each nested exception gets another copy of this extended space allowing
for any number of levels of exception handling.

In the assembly, a macro is defined to allow a central place to add
space for other uses should the need arise.

Finally export pkrs_{save_set|restore}_irq to the common code to allow
it to preserve the current task's PKRS in the new extended pt_regs if
enabled.
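
In sketch form, the stack layout and the recovery of the extra space
look like this (taken from the definitions added in this patch):

	/*
	 * Stack after entry, higher addresses at the top:
	 *
	 *	hardware frame
	 *	struct pt_regs
	 *	u32 pad			(keeps the stack 8 byte aligned)
	 *	u32 thread_pkrs		<- EXTENDED_PT_REGS_SIZE bytes
	 *				   reserved by the asm macro
	 */
	struct extended_pt_regs *ept_regs = extended_pt_regs(regs);

	ept_regs->thread_pkrs = current->thread.saved_pkrs;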

Peter, Thomas, Andy, Dave, and Dan all suggested parts of the patch or
aided in the development of the patch.

[1] https://lore.kernel.org/lkml/CALCETrVe1i5JdyzD_BcctxQJn+ZE3T38EFPgjxN1F577M36g+w@mail.gmail.com/
[2] https://lore.kernel.org/lkml/874kpxx4jf.fsf@nanos.tec.linutronix.de/#t
[3] https://lore.kernel.org/lkml/CALCETrUHwZPic89oExMMe-WyDY8-O3W68NcZvse3=PGW+iW5=w@mail.gmail.com/

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V5:
	Dave Hansen
		Remove 'we' from comments

Changes from V4:
	Fix checkpatch errors

Changes from V3:
	Fix 0-day issues
	Move all extended regs stuff to pks.h
	From Dan Williams
		Move show_extended_regs_oops ifdefery to pks.h
		Remove a bad comment
		s/irq_save_set_pkrs/pkrs_save_set_irq
		s/irq_restore_pkrs/pkrs_restore_irq
		s/ARCH_HAS/ARCH_ENABLE_SUPERVISOR_PKEYS
	From Dave Hansen:
		remove extra macro parameter for most calls
		clarify with comments
		Add BUILD check for extend regs size
		use subq/addq vs push/pop
	From Dan Williams and Dave Hansen:
		Use a macro call to wrap the c function calls with
		push/pop extended_pt_regs
	From Dave Hansen:
		Guidance on where to find each of the pt_regs being
		passed to C code
	From Thomas Gleixner:
		Remove unnecessary noinstr's
	From Andy Lutomirski:
		Convert to using the extended pt_regs
		Add in showing pks on fault through the extended pt_regs

Changes from V1
	remove redundant irq_state->pkrs
		This value is only needed for the global tracking.  So
		it should be included in that patch and not in this one.

Changes from RFC V3
	Standardize on 'irq_state' variable name
	Per Dave Hansen
		irq_save_pkrs() -> irq_save_set_pkrs()
	Rebased based on clean up patch by Thomas Gleixner
		This includes moving irq_[save_set|restore]_pkrs() to
		the core as well.
---
 arch/x86/entry/calling.h               | 26 ++++++++++++
 arch/x86/entry/common.c                | 57 ++++++++++++++++++++++++++
 arch/x86/entry/entry_64.S              | 22 +++++-----
 arch/x86/entry/entry_64_compat.S       |  6 +--
 arch/x86/include/asm/pks.h             | 16 ++++++++
 arch/x86/include/asm/processor-flags.h |  2 +
 arch/x86/kernel/head_64.S              |  7 ++--
 arch/x86/mm/fault.c                    |  3 ++
 include/linux/pkeys.h                  | 17 ++++++++
 kernel/entry/common.c                  | 14 ++++++-
 10 files changed, 151 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 07a9331d55e7..ec85f8f675be 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -97,6 +97,32 @@ For 32-bit we have the following conventions - kernel is built with
 
 #define SIZEOF_PTREGS	21*8
 
+/*
+ * __call_ext_ptregs - Helper macro to call into C with extended pt_regs
+ * @cfunc:		C function to be called
+ *
+ * This will ensure that extended_ptregs is added and removed as needed during
+ * a call into C code.
+ */
+.macro __call_ext_ptregs cfunc annotate_retpoline_safe:req
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+	/* add space for extended_pt_regs */
+	subq    $EXTENDED_PT_REGS_SIZE, %rsp
+#endif
+	.if \annotate_retpoline_safe == 1
+		ANNOTATE_RETPOLINE_SAFE
+	.endif
+	call	\cfunc
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+	/* remove space for extended_pt_regs */
+	addq    $EXTENDED_PT_REGS_SIZE, %rsp
+#endif
+.endm
+
+.macro call_ext_ptregs cfunc
+	__call_ext_ptregs \cfunc, annotate_retpoline_safe=0
+.endm
+
 .macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
 	.if \save_ret
 	pushq	%rsi		/* pt_regs->si */
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 4efd39aacb9f..239f01d710c5 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -19,6 +19,7 @@
 #include <linux/nospec.h>
 #include <linux/syscalls.h>
 #include <linux/uaccess.h>
+#include <linux/pkeys.h>
 
 #ifdef CONFIG_XEN_PV
 #include <xen/xen-ops.h>
@@ -34,6 +35,7 @@
 #include <asm/io_bitmap.h>
 #include <asm/syscall.h>
 #include <asm/irq_stack.h>
+#include <asm/pks.h>
 
 #ifdef CONFIG_X86_64
 __visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs)
@@ -214,6 +216,59 @@ SYSCALL_DEFINE0(ni_syscall)
 	return -ENOSYS;
 }
 
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+void show_extended_regs_oops(struct pt_regs *regs, unsigned long error_code)
+{
+	struct extended_pt_regs *ept_regs = extended_pt_regs(regs);
+
+	if (cpu_feature_enabled(X86_FEATURE_PKS) && (error_code & X86_PF_PK))
+		pr_alert("PKRS: 0x%x\n", ept_regs->thread_pkrs);
+}
+
+/*
+ * PKRS is a per-logical-processor MSR which overlays additional protection for
+ * pages which have been mapped with a protection key.
+ *
+ * The register is not maintained with XSAVE so the MSR value must be
+ * maintained in software during context switch and exception handling.
+ *
+ * Context switches save the MSR in the task struct thus taking that value to
+ * other processors if necessary.
+ *
+ * To protect against exceptions having access to this memory, save the
+ * current running value and set the PKRS value to be used during the exception.
+ */
+void pkrs_save_set_irq(struct pt_regs *regs, u32 val)
+{
+	struct extended_pt_regs *ept_regs;
+
+	BUILD_BUG_ON(sizeof(struct extended_pt_regs)
+			!= EXTENDED_PT_REGS_SIZE
+				+ sizeof(struct pt_regs));
+
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	ept_regs = extended_pt_regs(regs);
+	ept_regs->thread_pkrs = current->thread.saved_pkrs;
+	write_pkrs(val);
+}
+
+void pkrs_restore_irq(struct pt_regs *regs)
+{
+	struct extended_pt_regs *ept_regs;
+
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	ept_regs = extended_pt_regs(regs);
+	write_pkrs(ept_regs->thread_pkrs);
+	current->thread.saved_pkrs = ept_regs->thread_pkrs;
+}
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
 #ifdef CONFIG_XEN_PV
 #ifndef CONFIG_PREEMPTION
 /*
@@ -270,6 +325,8 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
 
 	inhcall = get_and_clear_inhcall();
 	if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
+		/* Normally called by irqentry_exit, restore pkrs here */
+		pkrs_restore_irq(regs);
 		instrumentation_begin();
 		irqentry_exit_cond_resched();
 		instrumentation_end();
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 400908dff42e..d65952a18ad7 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -331,7 +331,7 @@ SYM_CODE_END(ret_from_fork)
 		movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */
 	.endif
 
-	call	\cfunc
+	call_ext_ptregs \cfunc
 
 	jmp	error_return
 .endm
@@ -434,7 +434,7 @@ SYM_CODE_START(\asmsym)
 
 	movq	%rsp, %rdi		/* pt_regs pointer */
 
-	call	\cfunc
+	call_ext_ptregs \cfunc
 
 	jmp	paranoid_exit
 
@@ -495,7 +495,7 @@ SYM_CODE_START(\asmsym)
 	 * stack.
 	 */
 	movq	%rsp, %rdi		/* pt_regs pointer */
-	call	vc_switch_off_ist
+	call_ext_ptregs vc_switch_off_ist
 	movq	%rax, %rsp		/* Switch to new stack */
 
 	UNWIND_HINT_REGS
@@ -506,7 +506,7 @@ SYM_CODE_START(\asmsym)
 
 	movq	%rsp, %rdi		/* pt_regs pointer */
 
-	call	\cfunc
+	call_ext_ptregs \cfunc
 
 	/*
 	 * No need to switch back to the IST stack. The current stack is either
@@ -541,7 +541,7 @@ SYM_CODE_START(\asmsym)
 	movq	%rsp, %rdi		/* pt_regs pointer into first argument */
 	movq	ORIG_RAX(%rsp), %rsi	/* get error code into 2nd argument*/
 	movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */
-	call	\cfunc
+	call_ext_ptregs \cfunc
 
 	jmp	paranoid_exit
 
@@ -780,7 +780,7 @@ SYM_CODE_START_LOCAL(exc_xen_hypervisor_callback)
 	movq	%rdi, %rsp			/* we don't return, adjust the stack frame */
 	UNWIND_HINT_REGS
 
-	call	xen_pv_evtchn_do_upcall
+	call_ext_ptregs xen_pv_evtchn_do_upcall
 
 	jmp	error_return
 SYM_CODE_END(exc_xen_hypervisor_callback)
@@ -986,7 +986,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	/* Put us onto the real thread stack. */
 	popq	%r12				/* save return addr in %12 */
 	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
-	call	sync_regs
+	call_ext_ptregs sync_regs
 	movq	%rax, %rsp			/* switch stack */
 	ENCODE_FRAME_POINTER
 	pushq	%r12
@@ -1041,7 +1041,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * as if we faulted immediately after IRET.
 	 */
 	mov	%rsp, %rdi
-	call	fixup_bad_iret
+	call_ext_ptregs fixup_bad_iret
 	mov	%rax, %rsp
 	jmp	.Lerror_entry_from_usermode_after_swapgs
 SYM_CODE_END(error_entry)
@@ -1147,7 +1147,7 @@ SYM_CODE_START(asm_exc_nmi)
 
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
-	call	exc_nmi
+	call_ext_ptregs exc_nmi
 
 	/*
 	 * Return back to user mode.  We must *not* do the normal exit
@@ -1183,6 +1183,8 @@ SYM_CODE_START(asm_exc_nmi)
 	 * +---------------------------------------------------------+
 	 * | pt_regs                                                 |
 	 * +---------------------------------------------------------+
+	 * | (Optionally) extended_pt_regs                           |
+	 * +---------------------------------------------------------+
 	 *
 	 * The "original" frame is used by hardware.  Before re-enabling
 	 * NMIs, we need to be done with it, and we need to leave enough
@@ -1359,7 +1361,7 @@ end_repeat_nmi:
 
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
-	call	exc_nmi
+	call_ext_ptregs exc_nmi
 
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 0051cf5c792d..53254d29d5c7 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -136,7 +136,7 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
 .Lsysenter_flags_fixed:
 
 	movq	%rsp, %rdi
-	call	do_SYSENTER_32
+	call_ext_ptregs do_SYSENTER_32
 	/* XEN PV guests always use IRET path */
 	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
 		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
@@ -253,7 +253,7 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
 	UNWIND_HINT_REGS
 
 	movq	%rsp, %rdi
-	call	do_fast_syscall_32
+	call_ext_ptregs do_fast_syscall_32
 	/* XEN PV guests always use IRET path */
 	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
 		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
@@ -410,6 +410,6 @@ SYM_CODE_START(entry_INT80_compat)
 	cld
 
 	movq	%rsp, %rdi
-	call	do_int80_syscall_32
+	call_ext_ptregs do_int80_syscall_32
 	jmp	swapgs_restore_regs_and_return_to_usermode
 SYM_CODE_END(entry_INT80_compat)
diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h
index 5d7067ada8fb..516d82f032b6 100644
--- a/arch/x86/include/asm/pks.h
+++ b/arch/x86/include/asm/pks.h
@@ -4,11 +4,27 @@
 
 #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
 
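+/*
+ * On exception entry the interrupted context's PKRS value is saved in an
+ * extended_pt_regs frame placed directly below pt_regs on the stack.
+ */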
+struct extended_pt_regs {
+	u32 thread_pkrs;
+	/* Keep stack 8 byte aligned */
+	u32 pad;
+	struct pt_regs pt_regs;
+};
+
 void setup_pks(void);
 
+static inline struct extended_pt_regs *extended_pt_regs(struct pt_regs *regs)
+{
+	return container_of(regs, struct extended_pt_regs, pt_regs);
+}
+
+void show_extended_regs_oops(struct pt_regs *regs, unsigned long error_code);
+
 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
 
 static inline void setup_pks(void) { }
+static inline void show_extended_regs_oops(struct pt_regs *regs,
+					   unsigned long error_code) { }
 
 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
 
diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 02c2cbda4a74..4a41fc4cf028 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -53,4 +53,6 @@
 # define X86_CR3_PTI_PCID_USER_BIT	11
 #endif
 
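+/* sizeof(struct extended_pt_regs) - sizeof(struct pt_regs): thread_pkrs + pad */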
+#define EXTENDED_PT_REGS_SIZE 8
+
 #endif /* _ASM_X86_PROCESSOR_FLAGS_H */
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 04bddaaba8e2..80531526b0d2 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -319,8 +319,7 @@ SYM_CODE_START_NOALIGN(vc_boot_ghcb)
 	movq    %rsp, %rdi
 	movq	ORIG_RAX(%rsp), %rsi
 	movq	initial_vc_handler(%rip), %rax
-	ANNOTATE_RETPOLINE_SAFE
-	call	*%rax
+	__call_ext_ptregs *%rax, annotate_retpoline_safe=1
 
 	/* Unwind pt_regs */
 	POP_REGS
@@ -397,7 +396,7 @@ SYM_CODE_START_LOCAL(early_idt_handler_common)
 	UNWIND_HINT_REGS
 
 	movq %rsp,%rdi		/* RDI = pt_regs; RSI is already trapnr */
-	call do_early_exception
+	call_ext_ptregs do_early_exception
 
 	decl early_recursion_flag(%rip)
 	jmp restore_regs_and_return_to_kernel
@@ -421,7 +420,7 @@ SYM_CODE_START_NOALIGN(vc_no_ghcb)
 	/* Call C handler */
 	movq    %rsp, %rdi
 	movq	ORIG_RAX(%rsp), %rsi
-	call    do_vc_no_ghcb
+	call_ext_ptregs do_vc_no_ghcb
 
 	/* Unwind pt_regs */
 	POP_REGS
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 0e0e90968f57..f694fbf9dcb8 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -32,6 +32,7 @@
 #include <asm/pgtable_areas.h>		/* VMALLOC_START, ...		*/
 #include <asm/kvm_para.h>		/* kvm_handle_async_pf		*/
 #include <asm/vdso.h>			/* fixup_vdso_exception()	*/
+#include <asm/pks.h>
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -547,6 +548,8 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		 (error_code & X86_PF_PK)    ? "protection keys violation" :
 					       "permissions violation");
 
+	show_extended_regs_oops(regs, error_code);
+
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
 		struct desc_ptr idt, gdt;
 		u16 ldtr, tr;
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 2955ba976048..a3d17a8e4e81 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -50,4 +50,21 @@ static inline void copy_init_pkru_to_fpregs(void)
 
 #endif /* ! CONFIG_ARCH_HAS_PKEYS */
 
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+void pkrs_save_set_irq(struct pt_regs *regs, u32 val);
+void pkrs_restore_irq(struct pt_regs *regs);
+
+#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
+#ifndef INIT_PKRS_VALUE
+#define INIT_PKRS_VALUE 0
+#endif
+
+static inline void pkrs_save_set_irq(struct pt_regs *regs, u32 val) { }
+static inline void pkrs_restore_irq(struct pt_regs *regs) { }
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
 #endif /* _LINUX_PKEYS_H */
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 8442e5c9cfa2..b50bcc2d3ea5 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -5,6 +5,7 @@
 #include <linux/highmem.h>
 #include <linux/livepatch.h>
 #include <linux/audit.h>
+#include <linux/pkeys.h>
 
 #include "common.h"
 
@@ -363,7 +364,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 		instrumentation_end();
 
 		ret.exit_rcu = true;
-		return ret;
+		goto done;
 	}
 
 	/*
@@ -378,6 +379,8 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 	trace_hardirqs_off_finish();
 	instrumentation_end();
 
+done:
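+	/* Save the interrupted context's PKRS and set the default for this IRQ */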
+	pkrs_save_set_irq(regs, INIT_PKRS_VALUE);
 	return ret;
 }
 
@@ -403,7 +406,12 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 	/* Check whether this returns to user mode */
 	if (user_mode(regs)) {
 		irqentry_exit_to_user_mode(regs);
-	} else if (!regs_irqs_disabled(regs)) {
+		return;
+	}
+
+	pkrs_restore_irq(regs);
+
+	if (!regs_irqs_disabled(regs)) {
 		/*
 		 * If RCU was not watching on entry this needs to be done
 		 * carefully and needs the same ordering of lockdep/tracing
@@ -457,11 +465,13 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 	ftrace_nmi_enter();
 	instrumentation_end();
 
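+	/* Each NMI gets its own PKS state; save the old value, set the default */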
+	pkrs_save_set_irq(regs, INIT_PKRS_VALUE);
 	return irq_state;
 }
 
 void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
 {
+	pkrs_restore_irq(regs);
 	instrumentation_begin();
 	ftrace_nmi_exit();
 	if (irq_state.lockdep) {
-- 
2.28.0.rc0.12.gb6a658bd00c9


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V6 09/10] x86/pks: Add PKS kernel API
  2021-04-01 22:58 [PATCH V6 00/10] PKS: Add Protection Key Supervisor support ira.weiny
                   ` (7 preceding siblings ...)
  2021-04-01 22:58 ` [PATCH V6 08/10] x86/entry: Preserve PKRS MSR across exceptions ira.weiny
@ 2021-04-01 22:58 ` ira.weiny
  2021-04-01 22:58 ` [PATCH V6 10/10] x86/pks: Add PKS test code ira.weiny
  2021-04-16 22:14 ` [PATCH V6 00/10] PKS: Add Protection Key Supervisor support Ira Weiny
  10 siblings, 0 replies; 13+ messages in thread
From: ira.weiny @ 2021-04-01 22:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Fenghua Yu, Sean Christopherson, Dan Williams, Ira Weiny,
	Dave Hansen, x86, linux-kernel, linux-doc, linux-kselftest

From: Fenghua Yu <fenghua.yu@intel.com>

PKS allows kernel users to define domains of page mappings which have
additional protections beyond the paging protections.  Violating those
protections results in an oops.

Add an API to allocate, use, and free a protection key which identifies
such a domain.  Export 5 new symbols: pks_key_alloc(), pks_mk_noaccess(),
pks_mk_readonly(), pks_mk_readwrite(), and pks_key_free().  Add 2 new
macros: PAGE_KERNEL_PKEY(pkey) and _PAGE_PKEY(pkey).

Update the protection key documentation to cover pkeys on supervisor
pages.

Cc: Sean Christopherson <seanjc@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Co-developed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

---
Changes from V5:
	Dave Hansen
		Remove 'we' from comments

Changes from V4:
	From Sean Christopherson
		Add what happens if the pkey is violated

Changes from V3:
	From Dan Williams
		Remove flags from pks_key_alloc()
		Convert to ARCH_ENABLE_SUPERVISOR_PKEYS
	Update documentation for ARCH_ENABLE_SUPERVISOR_PKEYS
	No need to export write_pkrs
	Correct Kernel Doc for API functions
	From Dan Williams
		remove export of update_pkey_val()
		Update documentation
		change __clear_bit to clear_bit_unlock
		remove cpu_feature_enabled from pks_key_free
		remove pr_err stubs when CONFIG_HAS_SUPERVISOR_PKEYS=n
		clarify pks_key_alloc flags parameter with enum
	From Randy Dunlap:
		Fix grammatical errors in doc

Changes from V2
	From Greg KH
		Replace all WARN_ON_ONCE() uses with pr_err()
	From Dan Williams
		Add __must_check to pks_key_alloc() to help ensure users
		are using the API correctly

Changes from V1
	Per Dave Hansen
		Add flags to pks_key_alloc() to help future proof the
		interface if/when the key space is exhausted.

Changes from RFC V3
	Per Dave Hansen
		Put WARN_ON_ONCE in pks_key_free()
		s/pks_mknoaccess/pks_mk_noaccess/
		s/pks_mkread/pks_mk_readonly/
		s/pks_mkrdwr/pks_mk_readwrite/
		Change return pks_key_alloc() to EOPNOTSUPP when not
			supported or configured
	Per Peter Zijlstra
		Remove unneeded preempt disable/enable
---
 Documentation/core-api/protection-keys.rst | 109 +++++++++++++---
 arch/x86/include/asm/pgtable_types.h       |  12 ++
 arch/x86/include/asm/pks.h                 |   4 +
 arch/x86/mm/pkeys.c                        | 137 ++++++++++++++++++++-
 include/linux/pgtable.h                    |   4 +
 include/linux/pkeys.h                      |  17 +++
 6 files changed, 264 insertions(+), 19 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index ec575e72d0b2..5b1b90d8bdab 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -4,25 +4,30 @@
 Memory Protection Keys
 ======================
 
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake (and later) "Scalable Processor"
-Server CPUs. It will be available in future non-server Intel parts
-and future AMD processors.
+Memory Protection Keys provide a mechanism for enforcing page-based
+protections, but without requiring modification of the page tables
+when an application changes protection domains.
 
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
+Protection Keys for Userspace (PKU) is a feature found on Intel's Skylake
+"Scalable Processor" Server CPUs and later.  It will also be available in
+future non-server Intel parts and future AMD processors.
 
-Memory Protection Keys provides a mechanism for enforcing page-based
-protections, but without requiring modification of the page tables
-when an application changes protection domains.  It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
+Protection Keys for Supervisor pages (PKS) has been documented in the SDM
+since May 2020.
+
+pkeys work by dedicating 4 previously Reserved bits in each page table entry to
+a "protection key", giving 16 possible keys.  User and Supervisor pages are
+treated separately.
 
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key.  Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
-thread a different set of protections from every other thread.
+Protections for each page are controlled with per-CPU registers, one for each
+type of page (User and Supervisor).  Each of these 32-bit registers stores two
+separate bits (Access Disable and Write Disable) for each key.
+
+For Userspace pages the register is user-accessible (rdpkru/wrpkru).  For
+Supervisor pages, the register (MSR_IA32_PKRS) is accessible only to the kernel.
+
+Being a CPU register, pkeys are inherently thread-local, potentially giving
+each thread an independent set of protections from every other thread.
 
 There are two new instructions (RDPKRU/WRPKRU) for reading and writing
 to the new register.  The feature is only available in 64-bit mode,
@@ -30,8 +35,11 @@ even though there is theoretically space in the PAE PTEs.  These
 permissions are enforced on data access only and have no effect on
 instruction fetches.
 
-Syscalls
-========
+For kernel space, rdmsr/wrmsr are used to access the supervisor MSR.
+
+
+Syscalls for user space keys
+============================
 
 There are 3 system calls which directly interact with pkeys::
 
@@ -98,3 +106,68 @@ with a read()::
 The kernel will send a SIGSEGV in both cases, but si_code will be set
 to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
 the plain mprotect() permissions are violated.
+
+
+Kernel API for PKS support
+==========================
+
+Similar to user space pkeys, supervisor pkeys allow additional protections to
+be defined for supervisor mappings.  Unlike user space pkeys, violations of
+these protections result in a kernel oops.
+
+The following interface is used to allocate, use, and free a pkey which defines
+a 'protection domain' within the kernel.  Setting a pkey value in a supervisor
+PTE adds this additional protection to the page.
+
+Kernel users intending to use PKS support should check (depend on)
+ARCH_HAS_SUPERVISOR_PKEYS and add their config to ARCH_ENABLE_SUPERVISOR_PKEYS
+to turn on this support within the core.
+
+        int pks_key_alloc(const char * const pkey_user);
+        #define PAGE_KERNEL_PKEY(pkey)
+        #define _PAGE_PKEY(pkey)
+        void pks_mk_noaccess(int pkey);
+        void pks_mk_readonly(int pkey);
+        void pks_mk_readwrite(int pkey);
+        void pks_key_free(int pkey);
+
+pks_key_alloc() allocates keys dynamically to allow better use of the limited
+key space.
+
+Callers of pks_key_alloc() _must_ be prepared for it to fail and take
+appropriate action.  This is mainly because PKS may not be available on all
+architectures, or the key space may be exhausted.  Using any of the rest of
+the API without checking the return of pks_key_alloc() is undefined.
+
+Keys are allocated with 'No Access' permissions.  If other permissions are
+required before the pkey is used, the pks_mk*() family of calls, documented
+below, can be used prior to setting the pkey within the page table entries.
+
+Kernel users must set the pkey in the page table entries for the mappings they
+want to protect.  This can be done with PAGE_KERNEL_PKEY() or _PAGE_PKEY().
+
+The pks_mk*() family of calls allows kernel users to change the protections
+for the domain identified by the pkey parameter.  Three states are available:
+pks_mk_noaccess(), pks_mk_readonly(), and pks_mk_readwrite(), which set the
+access to none, read-only, and read/write respectively.
+
+Finally, pks_key_free() allows a user to return the key to the allocator for
+use by others.
+
+The interface maintains pks_mk_noaccess() (Access Disabled (AD=1)) for all keys
+not currently allocated.  Therefore, the user can depend on access being
+disabled when pks_key_alloc() returns a key and the user should remove mappings
+from the domain (remove the pkey from the PTE) prior to calling pks_key_free().
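+
+A minimal sketch of the expected usage follows; the consumer name and the
+vmalloc mapping are illustrative only::
+
+        int pkey = pks_key_alloc("my driver");
+
+        if (pkey < 0)
+                return pkey;
+
+        ptr = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
+                                   GFP_KERNEL, PAGE_KERNEL_PKEY(pkey), 0,
+                                   NUMA_NO_NODE, __builtin_return_address(0));
+
+        pks_mk_readwrite(pkey);         /* populate the mapping */
+        ...
+        pks_mk_readonly(pkey);          /* stray writes now oops */
+        ...
+        vfree(ptr);
+        pks_key_free(pkey);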
+
+It should be noted that the underlying WRMSR(MSR_IA32_PKRS) is not serializing
+but still maintains ordering properties similar to WRPKRU.  Thus it is safe to
+immediately use a mapping when the pks_mk*() functions return.
+
+Older versions of the SDM on PKRS may be wrong with regard to this
+serialization.  The text should be the same as that of WRPKRU.  From the WRPKRU
+text:
+
+	WRPKRU will never execute transiently. Memory accesses
+	affected by PKRU register will not execute (even transiently)
+	until all prior executions of WRPKRU have completed execution
+	and updated the PKRU register.
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index f24d7ef8fffa..a3cb274351d9 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -73,6 +73,12 @@
 			 _PAGE_PKEY_BIT2 | \
 			 _PAGE_PKEY_BIT3)
 
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+#define _PAGE_PKEY(pkey)	(_AT(pteval_t, pkey) << _PAGE_BIT_PKEY_BIT0)
+#else
+#define _PAGE_PKEY(pkey)	(_AT(pteval_t, 0))
+#endif
+
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 #define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED)
 #else
@@ -228,6 +234,12 @@ enum page_cache_mode {
 #define PAGE_KERNEL_IO		__pgprot_mask(__PAGE_KERNEL_IO)
 #define PAGE_KERNEL_IO_NOCACHE	__pgprot_mask(__PAGE_KERNEL_IO_NOCACHE)
 
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+#define PAGE_KERNEL_PKEY(pkey)	__pgprot_mask(__PAGE_KERNEL | _PAGE_PKEY(pkey))
+#else
+#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL
+#endif
+
 #endif	/* __ASSEMBLY__ */
 
 /*         xwr */
diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h
index 516d82f032b6..2fff4ac484b9 100644
--- a/arch/x86/include/asm/pks.h
+++ b/arch/x86/include/asm/pks.h
@@ -4,6 +4,10 @@
 
 #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
 
+/*  PKS supports 16 keys. Key 0 is reserved for the kernel. */
+#define        PKS_KERN_DEFAULT_KEY    0
+#define        PKS_NUM_KEYS            16
+
 struct extended_pt_regs {
 	u32 thread_pkrs;
 	/* Keep stack 8 byte aligned */
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index f6a3a54b8d7d..94dff3e1fc31 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -3,6 +3,9 @@
  * Intel Memory Protection Keys management
  * Copyright (c) 2015, Intel Corporation.
  */
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/pkeys: " fmt
+
 #include <linux/debugfs.h>		/* debugfs_create_u32()		*/
 #include <linux/mm_types.h>             /* mm_struct, vma, etc...       */
 #include <linux/pkeys.h>                /* PKEY_*                       */
@@ -11,6 +14,7 @@
 #include <asm/cpufeature.h>             /* boot_cpu_has, ...            */
 #include <asm/mmu_context.h>            /* vma_pkey()                   */
 #include <asm/fpu/internal.h>		/* init_fpstate			*/
+#include <asm/pks.h>
 
 int __execute_only_pkey(struct mm_struct *mm)
 {
@@ -276,4 +280,135 @@ void setup_pks(void)
 	cr4_set_bits(X86_CR4_PKS);
 }
 
-#endif
+/*
+ * Do not call this directly, see pks_mk*() below.
+ *
+ * @pkey: Key for the domain to change
+ * @protection: protection bits to be used
+ *
+ * Protection utilizes the same protection bits specified for User pkeys
+ *     PKEY_DISABLE_ACCESS
+ *     PKEY_DISABLE_WRITE
+ *
+ */
+static inline void pks_update_protection(int pkey, unsigned long protection)
+{
+	current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
+						     pkey, protection);
+	write_pkrs(current->thread.saved_pkrs);
+}
+
+/**
+ * pks_mk_noaccess() - Disable all access to the domain
+ * @pkey: the pkey for which the access should change.
+ *
+ * Disable all access to the domain specified by pkey.  This applies to the
+ * entire domain but only affects the current running thread.
+ *
+ * It is a bug for users to call this without a valid pkey returned from
+ * pks_key_alloc()
+ */
+void pks_mk_noaccess(int pkey)
+{
+	pks_update_protection(pkey, PKEY_DISABLE_ACCESS);
+}
+EXPORT_SYMBOL_GPL(pks_mk_noaccess);
+
+/**
+ * pks_mk_readonly() - Make the domain Read only
+ * @pkey: the pkey for which the access should change.
+ *
+ * Allow only read access to the domain specified by pkey.  This applies to
+ * the entire domain but only affects the current running thread.
+ *
+ * It is a bug for users to call this without a valid pkey returned from
+ * pks_key_alloc()
+ */
+void pks_mk_readonly(int pkey)
+{
+	pks_update_protection(pkey, PKEY_DISABLE_WRITE);
+}
+EXPORT_SYMBOL_GPL(pks_mk_readonly);
+
+/**
+ * pks_mk_readwrite() - Make the domain Read/Write
+ * @pkey: the pkey for which the access should change.
+ *
+ * Allow all access, read and write, to the domain specified by pkey.  This
+ * applies to the entire domain but only affects the current running thread.
+ *
+ * It is a bug for users to call this without a valid pkey returned from
+ * pks_key_alloc()
+ */
+void pks_mk_readwrite(int pkey)
+{
+	pks_update_protection(pkey, 0);
+}
+EXPORT_SYMBOL_GPL(pks_mk_readwrite);
+
+static const char pks_key_user0[] = "kernel";
+
+/* Store names of allocated keys for debug.  Key 0 is reserved for the kernel.  */
+static const char *pks_key_users[PKS_NUM_KEYS] = {
+	pks_key_user0
+};
+
+/*
+ * Each key is represented by a bit.  Bit 0 is set for key 0 and reserved for
+ * its use.  Use ulong for the bit operations but only 16 bits are used.
+ */
+static unsigned long pks_key_allocation_map = 1 << PKS_KERN_DEFAULT_KEY;
+
+/**
+ * pks_key_alloc() - Allocate a PKS key
+ * @pkey_user: String stored for debugging of key exhaustion.  The caller is
+ *             responsible for maintaining this memory until pks_key_free().
+ *
+ * Return: pkey if success
+ *         -EOPNOTSUPP if pks is not supported or not enabled
+ *         -ENOSPC if no keys are available
+ */
+__must_check int pks_key_alloc(const char * const pkey_user)
+{
+	int nr;
+
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return -EOPNOTSUPP;
+
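+	/* Find a free key and claim it atomically; retry if another CPU raced in */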
+	while (1) {
+		nr = find_first_zero_bit(&pks_key_allocation_map, PKS_NUM_KEYS);
+		if (nr >= PKS_NUM_KEYS) {
+			pr_info("Cannot allocate supervisor key for %s.\n",
+				pkey_user);
+			return -ENOSPC;
+		}
+		if (!test_and_set_bit_lock(nr, &pks_key_allocation_map))
+			break;
+	}
+
+	/* for debugging key exhaustion */
+	pks_key_users[nr] = pkey_user;
+
+	return nr;
+}
+EXPORT_SYMBOL_GPL(pks_key_alloc);
+
+/**
+ * pks_key_free() - Free a previously allocated PKS key
+ * @pkey: Key to be freed
+ */
+void pks_key_free(int pkey)
+{
+	if (pkey >= PKS_NUM_KEYS || pkey <= PKS_KERN_DEFAULT_KEY) {
+		pr_err("Invalid PKey value: %d\n", pkey);
+		return;
+	}
+
+	/* Restore to default of no access */
+	pks_mk_noaccess(pkey);
+	pks_key_users[pkey] = NULL;
+	clear_bit_unlock(pkey, &pks_key_allocation_map);
+}
+EXPORT_SYMBOL_GPL(pks_key_free);
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5e772392a379..e189e9ab6904 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1464,6 +1464,10 @@ static inline bool arch_has_pfn_modify_check(void)
 # define PAGE_KERNEL_EXEC PAGE_KERNEL
 #endif
 
+#ifndef PAGE_KERNEL_PKEY
+#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL
+#endif
+
 /*
  * Page Table Modification bits for pgtbl_mod_mask.
  *
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index a3d17a8e4e81..6659404af876 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -56,6 +56,13 @@ static inline void copy_init_pkru_to_fpregs(void)
 void pkrs_save_set_irq(struct pt_regs *regs, u32 val);
 void pkrs_restore_irq(struct pt_regs *regs);
 
+__must_check int pks_key_alloc(const char *const pkey_user);
+void pks_key_free(int pkey);
+
+void pks_mk_noaccess(int pkey);
+void pks_mk_readonly(int pkey);
+void pks_mk_readwrite(int pkey);
+
 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
 
 #ifndef INIT_PKRS_VALUE
@@ -65,6 +72,16 @@ void pkrs_restore_irq(struct pt_regs *regs);
 static inline void pkrs_save_set_irq(struct pt_regs *regs, u32 val) { }
 static inline void pkrs_restore_irq(struct pt_regs *regs) { }
 
+static inline __must_check int pks_key_alloc(const char * const pkey_user)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void pks_key_free(int pkey) {}
+static inline void pks_mk_noaccess(int pkey) {}
+static inline void pks_mk_readonly(int pkey) {}
+static inline void pks_mk_readwrite(int pkey) {}
+
 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
 
 #endif /* _LINUX_PKEYS_H */
-- 
2.28.0.rc0.12.gb6a658bd00c9


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V6 10/10] x86/pks: Add PKS test code
  2021-04-01 22:58 [PATCH V6 00/10] PKS: Add Protection Key Supervisor support ira.weiny
                   ` (8 preceding siblings ...)
  2021-04-01 22:58 ` [PATCH V6 09/10] x86/pks: Add PKS kernel API ira.weiny
@ 2021-04-01 22:58 ` ira.weiny
  2021-04-16 22:14 ` [PATCH V6 00/10] PKS: Add Protection Key Supervisor support Ira Weiny
  10 siblings, 0 replies; 13+ messages in thread
From: ira.weiny @ 2021-04-01 22:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Ira Weiny, Dan Williams, Dave Hansen, Fenghua Yu, x86,
	linux-kernel, linux-doc, linux-kselftest

From: Ira Weiny <ira.weiny@intel.com>

The core PKS functionality provides an interface for kernel users to
reserve keys for their domains, set up the page tables with those keys,
and control access to those domains when needed.

Define test code which exercises the core functionality of PKS via a
debugfs entry.  Basic checks can be triggered on boot with a kernel
command line option while both basic and preemption checks can be
triggered with separate debugfs values.

debugfs controls are:

'0' -- Run access tests with a single pkey
'1' -- Set up the pkey register with no access for the pkey allocated to
       this fd
'2' -- Check that the pkey register updated in '1' is still the same.
       (To be used after a forced context switch.)
'3' -- Allocate all pkeys possible and run tests on each pkey allocated.
       DEFAULT when run at boot.

Closing the fd will clean up and release the pkey; therefore, to exercise
context switch testing, a user space program is provided in:

	.../tools/testing/selftests/x86/test_pks.c
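
For example, the basic single-pkey test can be triggered from user space
(illustrative; requires CONFIG_PKS_TEST and a mounted debugfs):

	# echo 0 > /sys/kernel/debug/x86/run_pks

Results appear in the kernel log, and a read() on the same open file
descriptor returns "PASS" or "FAIL".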

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V5
	Dave Hansen
		Remove 'we' from comments

Changes from V3
	Add test into ARCH_ENABLE_SUPERVISOR_PKEYS
	Fix allocate context error handling
	Callback must now take pt_regs instead of irq_state
	Use pipes to ensure code switches contexts
	Add more verbose output
	Add --debug opt to trigger more kernel debug output
		Reduce kernel output by default
	Use #defines for the various options
	Add ability to choose cpu for testing
	Work out how to make pkrs_cache global when CONFIG_PKS_TEST=y
	Comments from Dan Williams:
		Remove walk_table in favor of follow_pte
		Adjust for new MASK and SHIFT macros
		Remove unneeded pkey.h header
		Handle_pks_testing -> handle_pks_test
		Retain static pkrs_cache when not test
		s/PKS_TESTING/PKS_TEST/
		Put pks_test_callback declaration in pks_common.h
		Don't export pks_test_callback
		Add comment explaining context creation
		Remove module boilerplate

Changes for V2
	Fix compilation errors

Changes for V1
	Update for new pks_key_alloc()

Changes from RFC V3
	Comments from Dave Hansen
		clean up whitespace damage
		Clean up Kconfig help
		Clean up user test error output
		s/pks_mknoaccess/pks_mk_noaccess/
		s/pks_mkread/pks_mk_readonly/
		s/pks_mkrdwr/pks_mk_readwrite/
	Comments from Jing Han
		Remove duplicate stdio.h
---
 Documentation/core-api/protection-keys.rst |   5 +-
 arch/x86/include/asm/pks.h                 |  19 +
 arch/x86/mm/fault.c                        |  15 +
 arch/x86/mm/pkeys.c                        |   2 +-
 lib/Kconfig.debug                          |  11 +
 lib/Makefile                               |   3 +
 lib/pks/Makefile                           |   3 +
 lib/pks/pks_test.c                         | 694 +++++++++++++++++++++
 mm/Kconfig                                 |   3 +-
 tools/testing/selftests/x86/Makefile       |   3 +-
 tools/testing/selftests/x86/test_pks.c     | 149 +++++
 11 files changed, 903 insertions(+), 4 deletions(-)
 create mode 100644 lib/pks/Makefile
 create mode 100644 lib/pks/pks_test.c
 create mode 100644 tools/testing/selftests/x86/test_pks.c

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index 5b1b90d8bdab..f1245eb172c7 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -121,7 +121,8 @@ PTE adds this additional protection to the page.
 
 Kernel users intending to use PKS support should check (depend on)
 ARCH_HAS_SUPERVISOR_PKEYS and add their config to ARCH_ENABLE_SUPERVISOR_PKEYS
-to turn on this support within the core.
+to turn on this support within the core.  See the test configuration option
+'PKS_TEST' for an example.
 
         int pks_key_alloc(const char * const pkey_user);
         #define PAGE_KERNEL_PKEY(pkey)
@@ -171,3 +172,5 @@ text:
 	affected by PKRU register will not execute (even transiently)
 	until all prior executions of WRPKRU have completed execution
 	and updated the PKRU register.
+
+Example code can be found in lib/pks/pks_test.c
diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h
index 2fff4ac484b9..67c3bc782c32 100644
--- a/arch/x86/include/asm/pks.h
+++ b/arch/x86/include/asm/pks.h
@@ -32,4 +32,23 @@ static inline void show_extended_regs_oops(struct pt_regs *regs,
 
 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
 
+
+#ifdef CONFIG_PKS_TEST
+
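+/*
+ * Drop 'static' when the test is built so that the test code can reference
+ * the per-cpu pkrs_cache defined in arch/x86/mm/pkeys.c.
+ */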
+#define __static_or_pks_test
+
+bool handle_pks_test(unsigned long hw_error_code, struct pt_regs *regs);
+bool pks_test_callback(struct pt_regs *regs);
+
+#else /* !CONFIG_PKS_TEST */
+
+#define __static_or_pks_test static
+
+static inline bool handle_pks_test(unsigned long hw_error_code, struct pt_regs *regs)
+{
+	return false;
+}
+
+#endif /* CONFIG_PKS_TEST */
+
 #endif /* _ASM_X86_PKS_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f694fbf9dcb8..03bcc2c801a3 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1134,6 +1134,18 @@ bool fault_in_kernel_space(unsigned long address)
 	return address >= TASK_SIZE_MAX;
 }
 
+#ifdef CONFIG_PKS_TEST
+bool handle_pks_test(unsigned long hw_error_code, struct pt_regs *regs)
+{
+	/*
+	 * If a protection key exception occurs it could be because a PKS test
+	 * is running.  If so, pks_test_callback() will clear the protection
+	 * mechanism and return true to indicate the fault was handled.
+	 */
+	return (hw_error_code & X86_PF_PK) && pks_test_callback(regs);
+}
+#endif
+
 /*
  * Called for all faults where 'address' is part of the kernel address
  * space.  Might get called for faults that originate from *code* that
@@ -1154,6 +1166,9 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 	WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS) &&
 		     (hw_error_code & X86_PF_PK));
 
+	if (handle_pks_test(hw_error_code, regs))
+		return;
+
 #ifdef CONFIG_X86_32
 	/*
 	 * We can fault-in kernel-space virtual memory on-demand. The
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 94dff3e1fc31..cecafd91d107 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -236,7 +236,7 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
 
 #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
 
-static DEFINE_PER_CPU(u32, pkrs_cache);
+__static_or_pks_test DEFINE_PER_CPU(u32, pkrs_cache);
 
 /*
  * write_pkrs() optimizes MSR writes by maintaining a per cpu cache which can
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2779c29d9981..b7728ed139f9 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2535,6 +2535,17 @@ config HYPERV_TESTING
 	help
 	  Select this option to enable Hyper-V vmbus testing.
 
+config PKS_TEST
+	bool "PKey (S)upervisor testing"
+	depends on ARCH_HAS_SUPERVISOR_PKEYS
+	help
+	  Select this option to enable testing of PKS core software and
+	  hardware.  The PKS core provides a mechanism to allocate keys as well
+	  as maintain the protection settings across context switches.
+	  Answer N if you don't know what supervisor keys are.
+
+	  If unsure, say N.
+
 endmenu # "Kernel Testing and Coverage"
 
 source "Documentation/Kconfig"
diff --git a/lib/Makefile b/lib/Makefile
index b5307d3eec1a..3606b97e12e0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -354,3 +354,6 @@ obj-$(CONFIG_BITS_TEST) += test_bits.o
 obj-$(CONFIG_CMDLINE_KUNIT_TEST) += cmdline_kunit.o
 
 obj-$(CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED) += devmem_is_allowed.o
+
+# PKS test
+obj-y += pks/
diff --git a/lib/pks/Makefile b/lib/pks/Makefile
new file mode 100644
index 000000000000..9daccba4f7c4
--- /dev/null
+++ b/lib/pks/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_PKS_TEST) += pks_test.o
diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c
new file mode 100644
index 000000000000..329765bce82b
--- /dev/null
+++ b/lib/pks/pks_test.c
@@ -0,0 +1,694 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright(c) 2020 Intel Corporation. All rights reserved.
+ *
+ * Implement PKS testing
+ * Basic tests can be run at boot with a command line parameter
+ * ("pks-test-on-boot"), or more detailed tests can be triggered through:
+ *
+ *    /sys/kernel/debug/x86/run_pks
+ *
+ *  debugfs controls are:
+ *
+ *  '0' -- Run access tests with a single pkey
+ *
+ *  '1' -- Set up the pkey register with no access for the pkey allocated to
+ *         this fd
+ *  '2' -- Check that the pkey register updated in '1' is still the same.  (To
+ *         be used after a forced context switch.)
+ *
+ *  '3' -- Allocate all pkeys possible and run tests on each pkey allocated.
+ *         DEFAULT when run at boot.
+ *
+ *  Closing the fd will clean up and release the pkey.
+ *
+ *  A companion user space program is provided in:
+ *
+ *          .../tools/testing/selftests/x86/test_pks.c
+ *
+ *  which will better test the context switching.
+ *
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/debugfs.h>
+#include <linux/delay.h>
+#include <linux/entry-common.h>
+#include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/mman.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/percpu-defs.h>
+#include <linux/pgtable.h>
+#include <linux/pkeys.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+
+#include <asm/ptrace.h>       /* for struct pt_regs */
+#include <asm/pkeys_common.h>
+#include <asm/processor.h>
+#include <asm/pks.h>
+
+#define PKS_TEST_MEM_SIZE (PAGE_SIZE)
+
+#define RUN_ALLOCATE            "0"
+#define ARM_CTX_SWITCH          "1"
+#define CHECK_CTX_SWITCH        "2"
+#define RUN_ALLOCATE_ALL        "3"
+#define RUN_ALLOCATE_DEBUG      "4"
+#define RUN_ALLOCATE_ALL_DEBUG  "5"
+#define RUN_CRASH_TEST          "9"
+
+DECLARE_PER_CPU(u32, pkrs_cache);
+
+/*
+ * run_on_boot defaults to false; checkpatch complains about explicitly
+ * initializing it, so don't.
+ */
+static bool run_on_boot;
+static struct dentry *pks_test_dentry;
+static bool run_9;
+
+/*
+ * The following globals must be protected for brief periods while the fault
+ * handler checks/updates them.
+ */
+static DEFINE_SPINLOCK(test_lock);
+static int test_armed_key;
+static unsigned long prev_cnt;
+static unsigned long fault_cnt;
+
+struct pks_test_ctx {
+	bool pass;
+	bool pks_cpu_enabled;
+	bool debug;
+	int pkey;
+	char data[64];
+};
+static struct pks_test_ctx *test_exception_ctx;
+
+static bool check_pkey_val(u32 pk_reg, int pkey, u32 expected)
+{
+	pk_reg = (pk_reg & PKR_PKEY_MASK(pkey)) >> PKR_PKEY_SHIFT(pkey);
+	return (pk_reg == expected);
+}
+
+/*
+ * Check if the register @pkey value matches @expected value
+ *
+ * Both the cached and actual MSR must match.
+ */
+static bool check_pkrs(int pkey, u32 expected)
+{
+	bool ret = true;
+	u64 pkrs;
+	u32 *tmp_cache;
+
+	tmp_cache = get_cpu_ptr(&pkrs_cache);
+	if (!check_pkey_val(*tmp_cache, pkey, expected))
+		ret = false;
+	put_cpu_ptr(tmp_cache);
+
+	rdmsrl(MSR_IA32_PKRS, pkrs);
+	if (!check_pkey_val(pkrs, pkey, expected))
+		ret = false;
+
+	return ret;
+}
+
+static void check_exception(u32 thread_pkrs)
+{
+	/* Check the thread saved state */
+	if (!check_pkey_val(thread_pkrs, test_armed_key, PKEY_DISABLE_WRITE)) {
+		pr_err("     FAIL: checking ept_regs->thread_pkrs\n");
+		test_exception_ctx->pass = false;
+	}
+
+	/* Check the exception state */
+	if (!check_pkrs(test_armed_key, PKEY_DISABLE_ACCESS)) {
+		pr_err("     FAIL: PKRS cache and MSR\n");
+		test_exception_ctx->pass = false;
+	}
+
+	/*
+	 * Ensure an update can occur during exception without affecting the
+	 * interrupted thread.  The interrupted thread is checked after
+	 * exception...
+	 */
+	pks_mk_readwrite(test_armed_key);
+	if (!check_pkrs(test_armed_key, 0)) {
+		pr_err("     FAIL: exception did not change register to 0\n");
+		test_exception_ctx->pass = false;
+	}
+	pks_mk_noaccess(test_armed_key);
+	if (!check_pkrs(test_armed_key, PKEY_DISABLE_ACCESS)) {
+		pr_err("     FAIL: exception did not change register to 0x%x\n",
+			PKEY_DISABLE_ACCESS);
+		test_exception_ctx->pass = false;
+	}
+}
+
+/**
+ * pks_test_callback() is called by the fault handler to detect and report
+ * back the status of intentional faults.
+ *
+ * NOTE: It re-enables access for the armed key so that the fault does not
+ * re-trigger.
+ */
+bool pks_test_callback(struct pt_regs *regs)
+{
+	struct extended_pt_regs *ept_regs = extended_pt_regs(regs);
+	bool armed = (test_armed_key != 0);
+
+	if (test_exception_ctx) {
+		check_exception(ept_regs->thread_pkrs);
+		/*
+		 * Stop this check directly within the exception because the
+		 * fault handler clean up code will call again while checking
+		 * the PMD entry and there is no need to check this again.
+		 */
+		test_exception_ctx = NULL;
+	}
+
+	if (armed) {
+		/* Enable read and write to stop faults */
+		ept_regs->thread_pkrs = update_pkey_val(ept_regs->thread_pkrs,
+							test_armed_key, 0);
+		fault_cnt++;
+	}
+
+	return armed;
+}
+
+static bool exception_caught(void)
+{
+	bool ret = (fault_cnt != prev_cnt);
+
+	prev_cnt = fault_cnt;
+	return ret;
+}
+
+static void report_pkey_settings(void *info)
+{
+	u8 pkey;
+	unsigned long long msr = 0;
+	unsigned int cpu = smp_processor_id();
+	struct pks_test_ctx *ctx = info;
+
+	rdmsrl(MSR_IA32_PKRS, msr);
+
+	pr_info("for CPU %d : 0x%llx\n", cpu, msr);
+
+	if (ctx->debug) {
+		for (pkey = 0; pkey < PKS_NUM_KEYS; pkey++) {
+			int ad, wd;
+
+			ad = (msr >> PKR_PKEY_SHIFT(pkey)) & PKEY_DISABLE_ACCESS;
+			wd = (msr >> PKR_PKEY_SHIFT(pkey)) & PKEY_DISABLE_WRITE;
+			pr_info("   %u: A:%d W:%d\n", pkey, ad, wd);
+		}
+	}
+}
+
+enum pks_access_mode {
+	PKS_TEST_NO_ACCESS,
+	PKS_TEST_RDWR,
+	PKS_TEST_RDONLY
+};
+
+static char *get_mode_str(enum pks_access_mode mode)
+{
+	switch (mode) {
+	case PKS_TEST_NO_ACCESS:
+		return "No Access";
+	case PKS_TEST_RDWR:
+		return "Read Write";
+	case PKS_TEST_RDONLY:
+		return "Read Only";
+	default:
+		pr_err("BUG in test: invalid mode\n");
+		break;
+	}
+
+	return "";
+}
+
+struct pks_access_test {
+	enum pks_access_mode mode;
+	bool write;
+	bool exception;
+};
+
+static struct pks_access_test pkey_test_ary[] = {
+	/*  disable both */
+	{ PKS_TEST_NO_ACCESS, true,  true },
+	{ PKS_TEST_NO_ACCESS, false, true },
+
+	/*  enable both */
+	{ PKS_TEST_RDWR, true,  false },
+	{ PKS_TEST_RDWR, false, false },
+
+	/*  enable read only */
+	{ PKS_TEST_RDONLY, true,  true },
+	{ PKS_TEST_RDONLY, false, false },
+};
+
+static int test_it(struct pks_test_ctx *ctx, struct pks_access_test *test, void *ptr)
+{
+	bool exception;
+	int ret = 0;
+
+	spin_lock(&test_lock);
+	WRITE_ONCE(test_armed_key, ctx->pkey);
+
+	if (test->write)
+		memcpy(ptr, ctx->data, 8);
+	else
+		memcpy(ctx->data, ptr, 8);
+
+	exception = exception_caught();
+
+	WRITE_ONCE(test_armed_key, 0);
+	spin_unlock(&test_lock);
+
+	if (test->exception != exception) {
+		pr_err("pkey test FAILED: mode %s; write %s; exception %s != %s\n",
+			get_mode_str(test->mode),
+			test->write ? "TRUE" : "FALSE",
+			test->exception ? "TRUE" : "FALSE",
+			exception ? "TRUE" : "FALSE");
+		ret = -EFAULT;
+	}
+
+	return ret;
+}
+
+static int run_access_test(struct pks_test_ctx *ctx,
+			   struct pks_access_test *test,
+			   void *ptr)
+{
+	switch (test->mode) {
+	case PKS_TEST_NO_ACCESS:
+		pks_mk_noaccess(ctx->pkey);
+		break;
+	case PKS_TEST_RDWR:
+		pks_mk_readwrite(ctx->pkey);
+		break;
+	case PKS_TEST_RDONLY:
+		pks_mk_readonly(ctx->pkey);
+		break;
+	default:
+		pr_err("BUG in test: invalid mode\n");
+		break;
+	}
+
+	return test_it(ctx, test, ptr);
+}
+
+static void *alloc_test_page(int pkey)
+{
+	return __vmalloc_node_range(PKS_TEST_MEM_SIZE, 1, VMALLOC_START, VMALLOC_END,
+				    GFP_KERNEL, PAGE_KERNEL_PKEY(pkey), 0,
+				    NUMA_NO_NODE, __builtin_return_address(0));
+}
+
+static void test_mem_access(struct pks_test_ctx *ctx)
+{
+	int i, rc;
+	u8 pkey;
+	void *ptr = NULL;
+	pte_t *ptep = NULL;
+	spinlock_t *ptl;
+
+	ptr = alloc_test_page(ctx->pkey);
+	if (!ptr) {
+		pr_err("Failed to vmalloc page???\n");
+		ctx->pass = false;
+		return;
+	}
+
+	if (follow_pte(&init_mm, (unsigned long)ptr, &ptep, &ptl)) {
+		pr_err("Failed to walk table???\n");
+		ctx->pass = false;
+		goto done;
+	}
+
+	pkey = pte_flags_pkey(ptep->pte);
+	pr_info("ptep flags 0x%lx pkey %u\n",
+		(unsigned long)ptep->pte, pkey);
+	pte_unmap_unlock(ptep, ptl);
+
+	if (pkey != ctx->pkey) {
+		pr_err("invalid pkey found: %u, test_pkey: %u\n",
+			pkey, ctx->pkey);
+		ctx->pass = false;
+		goto done;
+	}
+
+	if (!ctx->pks_cpu_enabled) {
+		pr_err("PKS not enabled on this CPU; skipping access tests...\n");
+		ctx->pass = true;
+		goto done;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(pkey_test_ary); i++) {
+		rc = run_access_test(ctx, &pkey_test_ary[i], ptr);
+
+		/* saving only the last error is fine */
+		if (rc)
+			ctx->pass = false;
+	}
+
+done:
+	vfree(ptr);
+}
+
+static void pks_run_test(struct pks_test_ctx *ctx)
+{
+	ctx->pass = true;
+
+	pr_info("\n");
+	pr_info("\n");
+	pr_info("     ***** BEGIN: Testing (CPU enabled : %s) *****\n",
+		ctx->pks_cpu_enabled ? "TRUE" : "FALSE");
+
+	if (ctx->pks_cpu_enabled)
+		on_each_cpu(report_pkey_settings, ctx, 1);
+
+	pr_info("           BEGIN: pkey %d Testing\n", ctx->pkey);
+	test_mem_access(ctx);
+	pr_info("           END: PAGE_KERNEL_PKEY Testing : %s\n",
+		ctx->pass ? "PASS" : "FAIL");
+
+	pr_info("     ***** END: Testing *****\n");
+	pr_info("\n");
+	pr_info("\n");
+}
+
+static ssize_t pks_read_file(struct file *file, char __user *user_buf,
+			     size_t count, loff_t *ppos)
+{
+	struct pks_test_ctx *ctx = file->private_data;
+	char buf[32];
+	unsigned int len;
+
+	if (!ctx)
+		len = sprintf(buf, "not run\n");
+	else
+		len = sprintf(buf, "%s\n", ctx->pass ? "PASS" : "FAIL");
+
+	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static struct pks_test_ctx *alloc_ctx(const char *name)
+{
+	struct pks_test_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+
+	if (!ctx) {
+		pr_err("Failed to allocate memory for test context\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ctx->pkey = pks_key_alloc(name);
+	if (ctx->pkey <= 0) {
+		pr_err("Failed to allocate pkey for test context\n");
+		kfree(ctx);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ctx->pks_cpu_enabled = cpu_feature_enabled(X86_FEATURE_PKS);
+	sprintf(ctx->data, "%s", "DEADBEEF");
+	return ctx;
+}
+
+static void free_ctx(struct pks_test_ctx *ctx)
+{
+	pks_key_free(ctx->pkey);
+	kfree(ctx);
+}
+
+static void run_exception_test(void)
+{
+	void *ptr = NULL;
+	bool pass = true;
+	struct pks_test_ctx *ctx;
+
+	pr_info("     ***** BEGIN: exception checking\n");
+
+	ctx = alloc_ctx("Exception test");
+	if (IS_ERR(ctx)) {
+		pr_err("     FAIL: no context\n");
+		pass = false;
+		goto result;
+	}
+	ctx->pass = true;
+
+	ptr = alloc_test_page(ctx->pkey);
+	if (!ptr) {
+		pr_err("     FAIL: no vmalloc page\n");
+		pass = false;
+		goto free_context;
+	}
+
+	pks_mk_readonly(ctx->pkey);
+
+	spin_lock(&test_lock);
+	WRITE_ONCE(test_exception_ctx, ctx);
+	WRITE_ONCE(test_armed_key, ctx->pkey);
+
+	memcpy(ptr, ctx->data, 8);
+
+	if (!exception_caught()) {
+		pr_err("     FAIL: did not get an exception\n");
+		pass = false;
+	}
+
+	/*
+	 * NOTE: The exception code has to enable access (b00) to keep the fault
+	 * from looping forever.  Therefore full access is seen here rather
+	 * than write disabled.
+	 *
+	 * Furthermore, check_exception() disabled access during the exception,
+	 * so this verifies that the interrupted thread's value was properly
+	 * restored.
+	 */
+	if (!check_pkrs(test_armed_key, 0)) {
+		pr_err("     FAIL: PKRS not restored\n");
+		pass = false;
+	}
+
+	if (!ctx->pass)
+		pass = false;
+
+	WRITE_ONCE(test_armed_key, 0);
+	spin_unlock(&test_lock);
+
+	vfree(ptr);
+free_context:
+	free_ctx(ctx);
+result:
+	pr_info("     ***** END: exception checking : %s\n",
+		 pass ? "PASS" : "FAIL");
+}
+
+static void run_all(bool debug)
+{
+	struct pks_test_ctx *ctx[PKS_NUM_KEYS];
+	static char name[PKS_NUM_KEYS][64];
+	int i;
+
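+	/* Key 0 is reserved for the kernel; allocate and test keys 1-15 */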
+	for (i = 1; i < PKS_NUM_KEYS; i++) {
+		sprintf(name[i], "pks ctx %d", i);
+		ctx[i] = alloc_ctx((const char *)name[i]);
+		if (!IS_ERR(ctx[i]))
+			ctx[i]->debug = debug;
+	}
+
+	for (i = 1; i < PKS_NUM_KEYS; i++) {
+		if (!IS_ERR(ctx[i]))
+			pks_run_test(ctx[i]);
+	}
+
+	for (i = 1; i < PKS_NUM_KEYS; i++) {
+		if (!IS_ERR(ctx[i]))
+			free_ctx(ctx[i]);
+	}
+
+	run_exception_test();
+}
+
+static void crash_it(void)
+{
+	struct pks_test_ctx *ctx;
+	void *ptr;
+
+	pr_warn("     ***** BEGIN: Unhandled fault test *****\n");
+
+	ctx = alloc_ctx("crashing kernel\n");
+	if (IS_ERR(ctx)) {
+		pr_err("Failed to allocate context???\n");
+		return;
+	}
+
+	ptr = alloc_test_page(ctx->pkey);
+	if (!ptr) {
+		pr_err("Failed to vmalloc page???\n");
+		free_ctx(ctx);
+		return;
+	}
+
+	pks_mk_noaccess(ctx->pkey);
+
+	spin_lock(&test_lock);
+	WRITE_ONCE(test_armed_key, 0);
+	/* This purposely faults */
+	memcpy(ptr, ctx->data, 8);
+	spin_unlock(&test_lock);
+
+	vfree(ptr);
+	free_ctx(ctx);
+}
+
+static ssize_t pks_write_file(struct file *file, const char __user *user_buf,
+			      size_t count, loff_t *ppos)
+{
+	char buf[2];
+	struct pks_test_ctx *ctx = file->private_data;
+
+	if (copy_from_user(buf, user_buf, 1))
+		return -EFAULT;
+	buf[1] = '\0';
+
+	/*
+	 * WARNING: Test "9" will crash the kernel.
+	 *
+	 * Arm the test and print a warning.  A second "9" will run the test.
+	 */
+	if (!strcmp(buf, RUN_CRASH_TEST)) {
+		if (run_9) {
+			crash_it();
+			run_9 = false;
+		} else {
+			pr_warn("CAUTION: Test 9 will crash the kernel.\n");
+			pr_warn("         Specify 9 a second time to run\n");
+			pr_warn("         Run any other test to clear\n");
+			run_9 = true;
+		}
+	} else {
+		run_9 = false;
+	}
+
+	/*
+	 * Test "3" will test allocating all keys. Do it first without
+	 * using "ctx".
+	 */
+	if (!strcmp(buf, RUN_ALLOCATE_ALL))
+		run_all(false);
+	if (!strcmp(buf, RUN_ALLOCATE_ALL_DEBUG))
+		run_all(true);
+
+	/*
+	 * This context is only required if the file is held open for the below
+	 * tests.  Otherwise the context just gets freed in pks_release_file.
+	 */
+	if (!ctx) {
+		ctx = alloc_ctx("pks test");
+		if (IS_ERR(ctx))
+			return -ENOMEM;
+		file->private_data = ctx;
+	}
+
+	if (!strcmp(buf, RUN_ALLOCATE)) {
+		ctx->debug = false;
+		pks_run_test(ctx);
+	}
+	if (!strcmp(buf, RUN_ALLOCATE_DEBUG)) {
+		ctx->debug = true;
+		pks_run_test(ctx);
+	}
+
+	/* start of context switch test */
+	if (!strcmp(buf, ARM_CTX_SWITCH)) {
+		unsigned long reg_pkrs;
+		int access;
+
+		/* Ensure a known state to test context switch */
+		pks_mk_readwrite(ctx->pkey);
+
+		rdmsrl(MSR_IA32_PKRS, reg_pkrs);
+
+		access = (reg_pkrs >> PKR_PKEY_SHIFT(ctx->pkey)) &
+			  PKEY_ACCESS_MASK;
+		pr_info("Context switch armed : pkey %d: 0x%x reg: 0x%lx\n",
+			ctx->pkey, access, reg_pkrs);
+	}
+
+	/* After a context switch the MSR should have been restored */
+	if (!strcmp(buf, CHECK_CTX_SWITCH) && ctx->pks_cpu_enabled) {
+		unsigned long reg_pkrs;
+		int access;
+
+		rdmsrl(MSR_IA32_PKRS, reg_pkrs);
+
+		access = (reg_pkrs >> PKR_PKEY_SHIFT(ctx->pkey)) &
+			  PKEY_ACCESS_MASK;
+		if (access != 0) {
+			ctx->pass = false;
+			pr_err("Context switch check failed: pkey %d: 0x%x reg: 0x%lx\n",
+				ctx->pkey, access, reg_pkrs);
+		} else {
+			pr_info("Context switch check passed: pkey %d: 0x%x reg: 0x%lx\n",
+				ctx->pkey, access, reg_pkrs);
+		}
+	}
+
+	return count;
+}
+
+static int pks_release_file(struct inode *inode, struct file *file)
+{
+	struct pks_test_ctx *ctx = file->private_data;
+
+	if (!ctx)
+		return 0;
+
+	free_ctx(ctx);
+	return 0;
+}
+
+static const struct file_operations fops_init_pks = {
+	.read = pks_read_file,
+	.write = pks_write_file,
+	.llseek = default_llseek,
+	.release = pks_release_file,
+};
+
+static int __init parse_pks_test_options(char *str)
+{
+	run_on_boot = true;
+
+	return 0;
+}
+early_param("pks-test-on-boot", parse_pks_test_options);
+
+static int __init pks_test_init(void)
+{
+	if (cpu_feature_enabled(X86_FEATURE_PKS)) {
+		if (run_on_boot)
+			run_all(true);
+
+		pks_test_dentry = debugfs_create_file("run_pks", 0600, arch_debugfs_dir,
+						      NULL, &fops_init_pks);
+	}
+
+	return 0;
+}
+late_initcall(pks_test_init);
+
+static void __exit pks_test_exit(void)
+{
+	debugfs_remove(pks_test_dentry);
+	pr_info("test exit\n");
+}
diff --git a/mm/Kconfig b/mm/Kconfig
index c7d1fc780358..463e95ea0df1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -811,7 +811,8 @@ config ARCH_HAS_PKEYS
 config ARCH_HAS_SUPERVISOR_PKEYS
 	bool
 config ARCH_ENABLE_SUPERVISOR_PKEYS
-	bool
+	def_bool y
+	depends on PKS_TEST
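+	# PKS_TEST is currently the only user; new users extend this dependency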
 
 config PERCPU_STATS
 	bool "Collect percpu memory statistics"
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 333980375bc7..32fe0414c6af 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -13,7 +13,8 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie)
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \
 			check_initial_reg_state sigreturn iopl ioperm \
 			test_vsyscall mov_ss_trap \
-			syscall_arg_fault fsgsbase_restore
+			syscall_arg_fault fsgsbase_restore test_pks
+
 TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c
new file mode 100644
index 000000000000..6d6d100e8894
--- /dev/null
+++ b/tools/testing/selftests/x86/test_pks.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdlib.h>
+#include <getopt.h>
+#include <unistd.h>
+#include <assert.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <string.h>
+#include <stdbool.h>
+
+#define PKS_TEST_FILE "/sys/kernel/debug/x86/run_pks"
+
+#define RUN_ALLOCATE            "0"
+#define SETUP_CTX_SWITCH        "1"
+#define CHECK_CTX_SWITCH        "2"
+#define RUN_ALLOCATE_ALL        "3"
+#define RUN_ALLOCATE_DEBUG      "4"
+#define RUN_ALLOCATE_ALL_DEBUG  "5"
+#define RUN_CRASH_TEST          "9"
+
+int main(int argc, char *argv[])
+{
+	cpu_set_t cpuset;
+	char result[32];
+	pid_t pid;
+	int fd;
+	int setup_done[2];
+	int switch_done[2];
+	int cpu = 0;
+	int rc = 0;
+	int c;
+	bool debug = false;
+
+	while (1) {
+		int option_index = 0;
+		static struct option long_options[] = {
+			{"debug",  no_argument,	  0,  0 },
+			{0,	   0,		  0,  0 }
+		};
+
+		c = getopt_long(argc, argv, "", long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			debug = true;
+			break;
+		}
+	}
+
+	if (optind < argc)
+		cpu = strtoul(argv[optind], NULL, 0);
+
+	if (cpu >= sysconf(_SC_NPROCESSORS_ONLN)) {
+		printf("CPU %d is invalid\n", cpu);
+		cpu = sysconf(_SC_NPROCESSORS_ONLN) - 1;
+		printf("   running on max CPU: %d\n", cpu);
+	}
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(cpu, &cpuset);
+	/* Both processes run on the same CPU so that they go through a context switch. */
+	sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset);
+
+	if (pipe(setup_done))
+		printf("Failed to create pipe\n");
+	if (pipe(switch_done))
+		printf("Failed to create pipe\n");
+
+	pid = fork();
+	if (pid == 0) {
+		char done = 'y';
+
+		fd = open(PKS_TEST_FILE, O_RDWR);
+		if (fd < 0) {
+			printf("cannot open %s\n", PKS_TEST_FILE);
+			return -1;
+		}
+
+		cpu = sched_getcpu();
+		printf("Child running on cpu %d...\n", cpu);
+
+		/* Allocate test_pkey1 and run test. */
+		if (debug)
+			write(fd, RUN_ALLOCATE_DEBUG, 1);
+		else
+			write(fd, RUN_ALLOCATE, 1);
+
+		/* Arm for context switch test */
+		write(fd, SETUP_CTX_SWITCH, 1);
+
+		printf("   tell parent to go\n");
+		write(setup_done[1], &done, sizeof(done));
+
+		/* Context switch out... */
+		printf("   Waiting for parent...\n");
+		read(switch_done[0], &done, sizeof(done));
+
+		/* Check msr restored */
+		printf("Checking result\n");
+		write(fd, CHECK_CTX_SWITCH, 1);
+
+		memset(result, 0, sizeof(result));
+		read(fd, result, 10);
+		printf("   #PF, context switch, pkey allocation and free tests: %s\n", result);
+		/* Fail unless the kernel reported PASS */
+		if (strncmp(result, "PASS", 4)) {
+			rc = -1;
+			done = 'F';
+		}
+
+		/* Signal result */
+		write(setup_done[1], &done, sizeof(done));
+	} else {
+		char done = 'y';
+
+		read(setup_done[0], &done, sizeof(done));
+		cpu = sched_getcpu();
+		printf("Parent running on cpu %d\n", cpu);
+
+		fd = open(PKS_TEST_FILE, O_RDWR);
+		if (fd < 0) {
+			printf("cannot open %s\n", PKS_TEST_FILE);
+			return -1;
+		}
+
+		/* run test with alternate pkey */
+		if (debug)
+			write(fd, RUN_ALLOCATE_DEBUG, 1);
+		else
+			write(fd, RUN_ALLOCATE, 1);
+
+		printf("   Signaling child.\n");
+		write(switch_done[1], &done, sizeof(done));
+
+		/* Wait for result */
+		read(setup_done[0], &done, sizeof(done));
+		if (done == 'F')
+			rc = -1;
+	}
+
+	close(fd);
+
+	return rc;
+}
-- 
2.28.0.rc0.12.gb6a658bd00c9


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH V6 00/10] PKS: Add Protection Key Supervisor support
  2021-04-01 22:58 [PATCH V6 00/10] PKS: Add Protection Key Supervisor support ira.weiny
                   ` (9 preceding siblings ...)
  2021-04-01 22:58 ` [PATCH V6 10/10] x86/pks: Add PKS test code ira.weiny
@ 2021-04-16 22:14 ` Ira Weiny
  2021-05-14 20:22   ` Ira Weiny
  10 siblings, 1 reply; 13+ messages in thread
From: Ira Weiny @ 2021-04-16 22:14 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Dave Hansen, Dan Williams, x86, linux-kernel, Fenghua Yu,
	linux-doc, linux-kselftest

On Thu, Apr 01, 2021 at 03:58:23PM -0700, 'Ira Weiny' wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Introduce a new page protection mechanism for supervisor pages, Protection Key
> Supervisor (PKS).

Is there any feedback on this series?

Perhaps I should ping for specific feedback or an ack?  Maybe an ack from
x86/mm?

Ira

> 
> Generally PKS enables protections on 'domains' of supervisor pages to limit
> supervisor mode access to pages beyond the normal paging protections.  PKS
> works in a similar fashion to user space pkeys, PKU.  As with PKU, supervisor
> pkeys are checked in addition to normal paging protections and Access or Writes
> can be disabled via a MSR update without TLB flushes when permissions change.
> 
> Also like PKU, a page mapping is assigned to a domain by setting pkey bits in
> the page table entry for that mapping.
> 
> Access is controlled through a PKRS register which is updated via WRMSR/RDMSR.
> 
> XSAVE is not supported for the PKRS MSR.  Therefore the implementation
> saves/restores the MSR across context switches and during exceptions.  Nested
> exceptions are supported by each exception getting a new PKS state.
> 
> For consistent behavior with current paging protections, pkey 0 is reserved and
> configured to allow full access via the pkey mechanism, thus preserving the
> default paging protections on mappings with the default pkey value of 0.
> 
> Other keys, (1-15) are allocated by an allocator which prepares us for key
> contention from day one.  Kernel users should be prepared for the allocator to
> fail either because of key exhaustion or due to PKS not being supported on the
> CPU instance.
> 
> The following are key attributes of PKS.
> 
> 	1) Fast switching of permissions
> 		1a) Prevents access without page table manipulations
> 		1b) No TLB flushes required
> 	2) Works on a per thread basis
> 
> PKS is available with 4 and 5 level paging.  Like PKRU it consumes 4 bits from
> the PTE to store the pkey within the entry.
> 
> All code to support PKS is configured via ARCH_ENABLE_SUPERVISOR_PKEYS which
> is designed to only be turned on when a user is configured on in the kernel.
> Those users must depend on ARCH_HAS_SUPERVISOR_PKEYS to properly work with
> other architectures which do not yet support PKS.
> 
> Originally this series was submitted as part of a large patch set which
> converted the kmap call sites.[1]
> 
> Many follow on discussions revealed a few problems.  The first of which was
> that some callers leak a kmap mapping across threads rather than containing it
> to a critical section.  Attempts were made to see if these 'global kmaps' could
> be supported.[2]  However, supporting global kmaps had many problems.  Work is
> being done in parallel on converting as many kmap calls to the new
> kmap_local_page().[3]
> 
> 
> Changes from V5 [6]
> 	From Dave Hansen
> 		Remove 'we' from comments
> 
> Changes from V4 [5]
> 	From kernel test robot <lkp@intel.com>
> 		Fix i386 build: pks_init_task not found
> 	Move MSR_IA32_PKRS and INIT_PKRS_VALUE into patch 5 where they are
> 		first 'used'.  (Technically nothing is 'used' until the final
> 		test patch.  But review wise this is much cleaner.)
> 	From Sean Christopherson
> 		Add documentation details on what happens if the pkey is violated
> 		Change cpu_feature_enabled to be in WARN_ON check
> 		Clean up commit message of patch 6
> 
> 
> [1] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/
> 
> [2] https://lore.kernel.org/lkml/87mtycqcjf.fsf@nanos.tec.linutronix.de/
> 
> [3] https://lore.kernel.org/lkml/20210128061503.1496847-1-ira.weiny@intel.com/
>     https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.weiny@intel.com/
>     https://lore.kernel.org/lkml/20210205170030.856723-1-ira.weiny@intel.com/
>     https://lore.kernel.org/lkml/20210217024826.3466046-1-ira.weiny@intel.com/
> 
> [4] https://lore.kernel.org/lkml/20201106232908.364581-1-ira.weiny@intel.com/
> 
> [5] https://lore.kernel.org/lkml/20210322053020.2287058-1-ira.weiny@intel.com/
> 
> [6] https://lore.kernel.org/lkml/20210331191405.341999-1-ira.weiny@intel.com/
> 
> 
> Fenghua Yu (1):
>   x86/pks: Add PKS kernel API
> 
> Ira Weiny (9):
>   x86/pkeys: Create pkeys_common.h
>   x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
>   x86/pks: Add additional PKEY helper macros
>   x86/pks: Add PKS defines and Kconfig options
>   x86/pks: Add PKS setup code
>   x86/fault: Adjust WARN_ON for PKey fault
>   x86/pks: Preserve the PKRS MSR on context switch
>   x86/entry: Preserve PKRS MSR across exceptions
>   x86/pks: Add PKS test code
> 
>  Documentation/core-api/protection-keys.rst  | 112 +++-
>  arch/x86/Kconfig                            |   1 +
>  arch/x86/entry/calling.h                    |  26 +
>  arch/x86/entry/common.c                     |  57 ++
>  arch/x86/entry/entry_64.S                   |  22 +-
>  arch/x86/entry/entry_64_compat.S            |   6 +-
>  arch/x86/include/asm/cpufeatures.h          |   1 +
>  arch/x86/include/asm/disabled-features.h    |   8 +-
>  arch/x86/include/asm/msr-index.h            |   1 +
>  arch/x86/include/asm/pgtable.h              |  15 +-
>  arch/x86/include/asm/pgtable_types.h        |  12 +
>  arch/x86/include/asm/pkeys.h                |   4 +
>  arch/x86/include/asm/pkeys_common.h         |  34 +
>  arch/x86/include/asm/pks.h                  |  54 ++
>  arch/x86/include/asm/processor-flags.h      |   2 +
>  arch/x86/include/asm/processor.h            |  47 +-
>  arch/x86/include/uapi/asm/processor-flags.h |   2 +
>  arch/x86/kernel/cpu/common.c                |   2 +
>  arch/x86/kernel/fpu/xstate.c                |  22 +-
>  arch/x86/kernel/head_64.S                   |   7 +-
>  arch/x86/kernel/process.c                   |   3 +
>  arch/x86/kernel/process_64.c                |   2 +
>  arch/x86/mm/fault.c                         |  30 +-
>  arch/x86/mm/pkeys.c                         | 218 +++++-
>  include/linux/pgtable.h                     |   4 +
>  include/linux/pkeys.h                       |  34 +
>  kernel/entry/common.c                       |  14 +-
>  lib/Kconfig.debug                           |  11 +
>  lib/Makefile                                |   3 +
>  lib/pks/Makefile                            |   3 +
>  lib/pks/pks_test.c                          | 694 ++++++++++++++++++++
>  mm/Kconfig                                  |   5 +
>  tools/testing/selftests/x86/Makefile        |   3 +-
>  tools/testing/selftests/x86/test_pks.c      | 149 +++++
>  34 files changed, 1527 insertions(+), 81 deletions(-)
>  create mode 100644 arch/x86/include/asm/pkeys_common.h
>  create mode 100644 arch/x86/include/asm/pks.h
>  create mode 100644 lib/pks/Makefile
>  create mode 100644 lib/pks/pks_test.c
>  create mode 100644 tools/testing/selftests/x86/test_pks.c
> 
> -- 
> 2.28.0.rc0.12.gb6a658bd00c9
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V6 00/10] PKS: Add Protection Key Supervisor support
  2021-04-16 22:14 ` [PATCH V6 00/10] PKS: Add Protection Key Supervisor support Ira Weiny
@ 2021-05-14 20:22   ` Ira Weiny
  0 siblings, 0 replies; 13+ messages in thread
From: Ira Weiny @ 2021-05-14 20:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Peter Zijlstra
  Cc: Edgecombe, Rick P, Dave Hansen, Dan Williams, x86, linux-kernel,
	Fenghua Yu

On Fri, Apr 16, 2021 at 03:14:56PM -0700, 'Ira Weiny' wrote:
> On Thu, Apr 01, 2021 at 03:58:23PM -0700, 'Ira Weiny' wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Introduce a new page protection mechanism for supervisor pages, Protection Key
> > Supervisor (PKS).
> 
> Is there any feedback on this series?

Please disregard this series (if you have not done so already).

Along with posting his RFC for page table protections[1], Rick raised a
concern with me about pkeys being allocated and then reallocated to another
user.  It turns out the code has a bug in this area, and the fix is not
trivial.

This was not caught in testing because the PMEM use case never needs to free
its key, and even if it did, a small window of residual access was not a
concern there.
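
To make the hazard concrete, here is a sketch of the problematic
sequence, with the same assumed API names as above:

/* Assumed prototypes from the PKS kernel API */
extern int pks_key_alloc(const char *user);
extern void pks_key_free(int pkey);

/* Suppose another thread opened a write window on 'old_pkey' with
 * pks_mk_readwrite() and has not yet closed it. */
static int key_changes_hands(int old_pkey)
{
	pks_key_free(old_pkey);
	/* The allocator may now hand the same key to a new user while
	 * that thread's stale, permissive PKRS bits still cover it,
	 * giving it access to the new user's pages until its PKRS
	 * value is rewritten. */
	return pks_key_alloc("second_user");
}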

The best solutions involve changing the API, so this series is going to need
a respin.

Thank you,
Ira

[1] https://lore.kernel.org/lkml/20210505003032.489164-1-rick.p.edgecombe@intel.com/

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-05-14 20:22 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-01 22:58 [PATCH V6 00/10] PKS: Add Protection Key Supervisor support ira.weiny
2021-04-01 22:58 ` [PATCH V6 01/10] x86/pkeys: Create pkeys_common.h ira.weiny
2021-04-01 22:58 ` [PATCH V6 02/10] x86/fpu: Refactor arch_set_user_pkey_access() for PKS support ira.weiny
2021-04-01 22:58 ` [PATCH V6 03/10] x86/pks: Add additional PKEY helper macros ira.weiny
2021-04-01 22:58 ` [PATCH V6 04/10] x86/pks: Add PKS defines and Kconfig options ira.weiny
2021-04-01 22:58 ` [PATCH V6 05/10] x86/pks: Add PKS setup code ira.weiny
2021-04-01 22:58 ` [PATCH V6 06/10] x86/fault: Adjust WARN_ON for PKey fault ira.weiny
2021-04-01 22:58 ` [PATCH V6 07/10] x86/pks: Preserve the PKRS MSR on context switch ira.weiny
2021-04-01 22:58 ` [PATCH V6 08/10] x86/entry: Preserve PKRS MSR across exceptions ira.weiny
2021-04-01 22:58 ` [PATCH V6 09/10] x86/pks: Add PKS kernel API ira.weiny
2021-04-01 22:58 ` [PATCH V6 10/10] x86/pks: Add PKS test code ira.weiny
2021-04-16 22:14 ` [PATCH V6 00/10] PKS: Add Protection Key Supervisor support Ira Weiny
2021-05-14 20:22   ` Ira Weiny

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).