* [PATCHv4 00/18] MKTME enabling
@ 2018-06-26 14:22 Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 01/18] mm: Do not merge VMAs with different encryption KeyIDs Kirill A. Shutemov
                   ` (17 more replies)
  0 siblings, 18 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Multikey Total Memory Encryption (MKTME)[1] is a technology that allows
transparent memory encryption in upcoming Intel platforms. See overview
below.

Here's an updated version of my patchset that brings support for MKTME.
Please review and consider applying.

The patchset provides in-kernel infrastructure for MKTME, but doesn't yet
provide a userspace interface.

The first 5 patches are for core-mm. The rest are x86-specific.

The patchset is on top of the tip tree plus the page_ext cleanups I posted
earlier[2]. The page_ext cleanups are in the -mm tree now.

Below are performance numbers for a kernel build. Enabling MKTME doesn't
affect performance of non-encrypted memory allocation.

Encrypted memory requires a cache flush when allocating and when freeing
a page. For a kernel build this results in ~20% performance degradation if
we allocate all anonymous memory as encrypted.

We would need to maintain a per-KeyID pool of free pages to minimize cache
flushing. I'm going to work on that optimization on top of this patchset.
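To illustrate the direction, below is a minimal sketch of such a pool.
None of these names exist in this patchset; it only shows the idea that a
page freed with a given KeyID could be reused for the same KeyID without
another cache flush:

	/* Hypothetical sketch, not part of this series. */
	struct keyid_pool {
		struct list_head free_list;	/* pages last used with this KeyID */
		spinlock_t lock;
	};

	static struct keyid_pool keyid_pools[64];	/* size is illustrative */

	static struct page *keyid_pool_alloc(int keyid)
	{
		struct keyid_pool *pool = &keyid_pools[keyid];
		struct page *page = NULL;

		spin_lock(&pool->lock);
		if (!list_empty(&pool->free_list)) {
			page = list_first_entry(&pool->free_list,
						struct page, lru);
			list_del(&page->lru);
		}
		spin_unlock(&pool->lock);

		/* Page already carries the right KeyID: no cache flush needed. */
		return page;
	}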

The patchset can also be found here:

git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git mktme/wip

v4:
 - Address Dave's feedback.

 - Add performance numbers.

v3:
 - The kernel can now access encrypted pages via a per-KeyID direct mapping.

 - Rework page allocation for encrypted memory to minimize overhead on
   non-encrypted pages. It comes with a cost for allocation of encrypted
   pages: we have to flush the cache every time we allocate *and* free an
   encrypted page. We will need to optimize this later.

v2:
 - Store the KeyID of a page in page_ext->flags rather than in anon_vma.
   The anon_vma approach turned out to be problematic. The main problem is
   that the anon_vma of a page is no longer stable after the last mapcount
   has gone. We would like to preserve the last used KeyID even for freed
   pages as it allows us to avoid unnecessary cache flushing on allocation
   of an encrypted page. page_ext serves this well enough.

 - KeyID is now propagated through the page allocator. No need for
   GFP_ENCRYPT anymore.

 - Patch "Decouple dynamic __PHYSICAL_MASK from AMD SME" has been fix to
   work with AMD SEV (need to be confirmed by AMD folks).

------------------------------------------------------------------------------

MKTME is built on top of TME. TME allows encryption of the entirety of
system memory using a single key. MKTME allows multiple encryption
domains, each with its own key -- different memory pages can be encrypted
with different keys.

Key design points of Intel MKTME:

 - The initial HW implementation supports up to 63 keys (plus one default
   TME key), but the number of keys may be as low as 3, depending on SKU
   and BIOS settings.

 - To access encrypted memory you need to use a mapping with the proper
   KeyID in the page table entry. The KeyID is encoded in the upper bits of
   the PFN in the page table entry (see the sketch after this list).

 - The CPU does not enforce coherency between mappings of the same physical
   page with different KeyIDs or encryption keys. We need to take care of
   flushing the cache on allocation of an encrypted page and on returning
   it back to the free pool.

 - For managing keys, there's the MKTME_KEY_PROGRAM leaf of the new PCONFIG
   (platform configuration) instruction. It allows loading and clearing keys
   associated with a KeyID. You can also ask the CPU to generate a key for
   you or disable memory encryption when a KeyID is used.
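
As a sketch of how a KeyID lives in a page table entry: it is just a bit
field above the usable physical address bits. The helpers below are purely
illustrative (the real shift and mask come from MKTME enumeration; see the
mktme_keyid_shift/mktme_keyid_mask variables later in the series):

	/* Illustration only: encode/extract a KeyID in the upper PFN bits. */
	static inline u64 mk_encrypted_paddr(u64 paddr, int keyid, int keyid_shift)
	{
		return paddr | ((u64)keyid << keyid_shift);
	}

	static inline int paddr_keyid(u64 paddr, u64 keyid_mask, int keyid_shift)
	{
		return (paddr & keyid_mask) >> keyid_shift;
	}

	/*
	 * Example: with 52 physical address bits and 6 KeyID bits, the KeyID
	 * occupies bits 51:46, i.e. keyid_shift == 46.
	 */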

Performance numbers for kernel build:

Base (tip tree):

 Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

    5664711.936917      task-clock (msec)         #   34.815 CPUs utilized            ( +-  0.02% )
         1,033,886      context-switches          #    0.183 K/sec                    ( +-  0.37% )
           189,308      cpu-migrations            #    0.033 K/sec                    ( +-  0.39% )
       104,951,554      page-faults               #    0.019 M/sec                    ( +-  0.01% )
16,907,670,543,945      cycles                    #    2.985 GHz                      ( +-  0.01% )
12,662,345,427,578      stalled-cycles-frontend   #   74.89% frontend cycles idle     ( +-  0.02% )
 9,936,469,878,830      instructions              #    0.59  insn per cycle
                                                  #    1.27  stalled cycles per insn  ( +-  0.00% )
 2,179,100,082,611      branches                  #  384.680 M/sec                    ( +-  0.00% )
    91,235,200,652      branch-misses             #    4.19% of all branches          ( +-  0.01% )

     162.706797586 seconds time elapsed                                          ( +-  0.04% )

CONFIG_X86_INTEL_MKTME=y, no encrypted memory:

 Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

    5668508.245004      task-clock (msec)         #   34.872 CPUs utilized            ( +-  0.02% )
         1,032,034      context-switches          #    0.182 K/sec                    ( +-  0.90% )
           188,098      cpu-migrations            #    0.033 K/sec                    ( +-  1.15% )
       104,964,084      page-faults               #    0.019 M/sec                    ( +-  0.01% )
16,919,270,913,026      cycles                    #    2.985 GHz                      ( +-  0.02% )
12,672,067,815,805      stalled-cycles-frontend   #   74.90% frontend cycles idle     ( +-  0.02% )
 9,942,560,135,477      instructions              #    0.59  insn per cycle
                                                  #    1.27  stalled cycles per insn  ( +-  0.00% )
 2,180,800,745,687      branches                  #  384.722 M/sec                    ( +-  0.00% )
    91,167,857,700      branch-misses             #    4.18% of all branches          ( +-  0.02% )

     162.552503629 seconds time elapsed                                          ( +-  0.10% )

CONFIG_X86_INTEL_MKTME=y, all anonymous memory encrypted with KeyID-1,
paying cache flush overhead on allocation and free:

 Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

    7041851.999259      task-clock (msec)         #   35.915 CPUs utilized            ( +-  0.01% )
         1,118,938      context-switches          #    0.159 K/sec                    ( +-  0.49% )
           197,039      cpu-migrations            #    0.028 K/sec                    ( +-  0.80% )
       104,970,021      page-faults               #    0.015 M/sec                    ( +-  0.00% )
21,025,639,251,627      cycles                    #    2.986 GHz                      ( +-  0.01% )
16,729,451,765,492      stalled-cycles-frontend   #   79.57% frontend cycles idle     ( +-  0.02% )
10,010,727,735,588      instructions              #    0.48  insn per cycle
                                                  #    1.67  stalled cycles per insn  ( +-  0.00% )
 2,197,110,181,421      branches                  #  312.007 M/sec                    ( +-  0.00% )
    91,119,463,513      branch-misses             #    4.15% of all branches          ( +-  0.01% )

     196.072361087 seconds time elapsed                                          ( +-  0.14% )

[1] https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
[2] https://lkml.kernel.org/r/20180531135457.20167-1-kirill.shutemov@linux.intel.com

Kirill A. Shutemov (18):
  mm: Do not merge VMAs with different encryption KeyIDs
  mm/ksm: Do not merge pages with different KeyIDs
  mm/page_alloc: Unify alloc_hugepage_vma()
  mm/page_alloc: Handle allocation for encrypted memory
  mm/khugepaged: Handle encrypted pages
  x86/mm: Mask out KeyID bits from page table entry pfn
  x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  x86/mm: Implement page_keyid() using page_ext
  x86/mm: Implement vma_keyid()
  x86/mm: Implement prep_encrypted_page() and arch_free_page()
  x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
  x86/mm: Allow to disable MKTME after enumeration
  x86/mm: Detect MKTME early
  x86/mm: Calculate direct mapping size
  x86/mm: Implement sync_direct_mapping()
  x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  x86: Introduce CONFIG_X86_INTEL_MKTME

 Documentation/x86/x86_64/mm.txt      |   4 +
 arch/alpha/include/asm/page.h        |   2 +-
 arch/x86/Kconfig                     |  21 +-
 arch/x86/include/asm/mktme.h         |  47 +++
 arch/x86/include/asm/page.h          |   1 +
 arch/x86/include/asm/page_64.h       |   3 +-
 arch/x86/include/asm/pgtable_types.h |  15 +-
 arch/x86/include/asm/setup.h         |   6 +
 arch/x86/kernel/cpu/intel.c          |  32 +-
 arch/x86/kernel/head64.c             |   2 +
 arch/x86/kernel/setup.c              |   3 +
 arch/x86/mm/Makefile                 |   2 +
 arch/x86/mm/init_64.c                |  50 +++
 arch/x86/mm/kaslr.c                  |  11 +-
 arch/x86/mm/mktme.c                  | 546 +++++++++++++++++++++++++++
 include/linux/gfp.h                  |  54 ++-
 include/linux/migrate.h              |  12 +-
 include/linux/mm.h                   |  14 +
 include/linux/page_ext.h             |  11 +-
 mm/compaction.c                      |   1 +
 mm/khugepaged.c                      |  10 +
 mm/ksm.c                             |   3 +
 mm/mempolicy.c                       |  28 +-
 mm/migrate.c                         |   4 +-
 mm/mmap.c                            |   3 +-
 mm/page_alloc.c                      |  47 +++
 mm/page_ext.c                        |   3 +
 27 files changed, 901 insertions(+), 34 deletions(-)
 create mode 100644 arch/x86/include/asm/mktme.h
 create mode 100644 arch/x86/mm/mktme.c

-- 
2.18.0



* [PATCHv4 01/18] mm: Do not merge VMAs with different encryption KeyIDs
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 02/18] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

VMAs with different KeyIDs must not be merged together. Only VMAs with the
same KeyID are compatible.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h | 7 +++++++
 mm/mmap.c          | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a0fbb9ffe380..ebf4bd8bd0bf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1541,6 +1541,13 @@ static inline bool vma_is_anonymous(struct vm_area_struct *vma)
 	return !vma->vm_ops;
 }
 
+#ifndef vma_keyid
+static inline int vma_keyid(struct vm_area_struct *vma)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_SHMEM
 /*
  * The vma_is_shmem is not inline because it is used only by slow
diff --git a/mm/mmap.c b/mm/mmap.c
index d1eb87ef4b1a..7823eb264cc0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1217,7 +1217,8 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
 		mpol_equal(vma_policy(a), vma_policy(b)) &&
 		a->vm_file == b->vm_file &&
 		!((a->vm_flags ^ b->vm_flags) & ~(VM_READ|VM_WRITE|VM_EXEC|VM_SOFTDIRTY)) &&
-		b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
+		b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT) &&
+		vma_keyid(a) == vma_keyid(b);
 }
 
 /*
-- 
2.18.0



* [PATCHv4 02/18] mm/ksm: Do not merge pages with different KeyIDs
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 01/18] mm: Do not merge VMAs with different encryption KeyIDs Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-07-09 18:03   ` Konrad Rzeszutek Wilk
  2018-06-26 14:22 ` [PATCHv4 03/18] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Pages encrypted with different encryption keys are not subject to KSM
merging. Otherwise it would cross a security boundary.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h | 7 +++++++
 mm/ksm.c           | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ebf4bd8bd0bf..406a28cadfcf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1548,6 +1548,13 @@ static inline int vma_keyid(struct vm_area_struct *vma)
 }
 #endif
 
+#ifndef page_keyid
+static inline int page_keyid(struct page *page)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_SHMEM
 /*
  * The vma_is_shmem is not inline because it is used only by slow
diff --git a/mm/ksm.c b/mm/ksm.c
index a6d43cf9a982..1bd7b9710e29 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1214,6 +1214,9 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
 	if (!PageAnon(page))
 		goto out;
 
+	if (page_keyid(page) != page_keyid(kpage))
+		goto out;
+
 	/*
 	 * We need the page lock to read a stable PageSwapCache in
 	 * write_protect_page().  We use trylock_page() instead of
-- 
2.18.0



* [PATCHv4 03/18] mm/page_alloc: Unify alloc_hugepage_vma()
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 01/18] mm: Do not merge VMAs with different encryption KeyIDs Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 02/18] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 04/18] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

We don't need separate implementations of alloc_hugepage_vma() for NUMA
and non-NUMA. Using the variant based on alloc_pages_vma() covers both
cases.

This is a preparation patch for allocation of encrypted pages.

alloc_pages_vma() will handle allocation of encrypted pages. With this
change we don't need to cover alloc_hugepage_vma() separately.

The change makes a typo in Alpha's implementation of
__alloc_zeroed_user_highpage() visible. Fix it too.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/alpha/include/asm/page.h | 2 +-
 include/linux/gfp.h           | 6 ++----
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h
index f3fb2848470a..9a6fbb5269f3 100644
--- a/arch/alpha/include/asm/page.h
+++ b/arch/alpha/include/asm/page.h
@@ -18,7 +18,7 @@ extern void clear_page(void *page);
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
-	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vmaddr)
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 extern void copy_page(void * _to, void * _from);
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index a6afcec53795..66f395737990 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -494,21 +494,19 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
 extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
 			struct vm_area_struct *vma, unsigned long addr,
 			int node, bool hugepage);
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order)	\
-	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 #else
 #define alloc_pages(gfp_mask, order) \
 		alloc_pages_node(numa_node_id(), gfp_mask, order)
 #define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
 	alloc_pages(gfp_mask, order)
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order)	\
-	alloc_pages(gfp_mask, order)
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 #define alloc_page_vma(gfp_mask, vma, addr)			\
 	alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)
 #define alloc_page_vma_node(gfp_mask, vma, addr, node)		\
 	alloc_pages_vma(gfp_mask, 0, vma, addr, node, false)
+#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
+	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 
 extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
 extern unsigned long get_zeroed_page(gfp_t gfp_mask);
-- 
2.18.0



* [PATCHv4 04/18] mm/page_alloc: Handle allocation for encrypted memory
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 03/18] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 05/18] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

For encrypted memory, we need to allocate pages for a specific
encryption KeyID.

There are two cases when we need to allocate a page for encryption:

 - Allocation for an encrypted VMA;

 - Allocation for migration of an encrypted page.

The first case can be covered within alloc_page_vma(). We know KeyID
from the VMA.

The second case requires a few new page allocation routines that
allocate a page for a specific KeyID.

An encrypted page has to be cleared after the KeyID is set. This is handled
by prep_encrypted_page(), which will be provided by arch-specific code.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/gfp.h     | 48 ++++++++++++++++++++++++++++++++++++-----
 include/linux/migrate.h | 12 ++++++++---
 mm/compaction.c         |  1 +
 mm/mempolicy.c          | 28 ++++++++++++++++++------
 mm/migrate.c            |  4 ++--
 mm/page_alloc.c         | 47 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 123 insertions(+), 17 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 66f395737990..347a40558cfc 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -446,16 +446,46 @@ static inline void arch_free_page(struct page *page, int order) { }
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
 
+#ifndef prep_encrypted_page
+static inline void prep_encrypted_page(struct page *page, int order,
+		int keyid, bool zero)
+{
+}
+#endif
+
+/*
+ * Encrypted page has to be cleared once keyid is set, not on allocation.
+ */
+static inline bool encrypted_page_needs_zero(int keyid, gfp_t *gfp_mask)
+{
+	if (!keyid)
+		return false;
+
+	if (*gfp_mask & __GFP_ZERO) {
+		*gfp_mask &= ~__GFP_ZERO;
+		return true;
+	}
+
+	return false;
+}
+
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 							nodemask_t *nodemask);
 
+struct page *
+__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask, int keyid);
+
 static inline struct page *
 __alloc_pages(gfp_t gfp_mask, unsigned int order, int preferred_nid)
 {
 	return __alloc_pages_nodemask(gfp_mask, order, preferred_nid, NULL);
 }
 
+struct page *__alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order);
+
 /*
  * Allocate pages, preferring the node given as nid. The node must be valid and
  * online. For more general interface, see alloc_pages_node().
@@ -483,6 +513,19 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 	return __alloc_pages_node(nid, gfp_mask, order);
 }
 
+static inline struct page *alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order)
+{
+	if (nid == NUMA_NO_NODE)
+		nid = numa_mem_id();
+
+	return __alloc_pages_node_keyid(nid, keyid, gfp_mask, order);
+}
+
+extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
+			struct vm_area_struct *vma, unsigned long addr,
+			int node, bool hugepage);
+
 #ifdef CONFIG_NUMA
 extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
 
@@ -491,14 +534,9 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	return alloc_pages_current(gfp_mask, order);
 }
-extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
-			struct vm_area_struct *vma, unsigned long addr,
-			int node, bool hugepage);
 #else
 #define alloc_pages(gfp_mask, order) \
 		alloc_pages_node(numa_node_id(), gfp_mask, order)
-#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
-	alloc_pages(gfp_mask, order)
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 #define alloc_page_vma(gfp_mask, vma, addr)			\
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index f2b4abbca55e..fede9bfa89d9 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -38,9 +38,15 @@ static inline struct page *new_page_nodemask(struct page *page,
 	unsigned int order = 0;
 	struct page *new_page = NULL;
 
-	if (PageHuge(page))
+	if (PageHuge(page)) {
+		/*
+		 * HugeTLB doesn't support encryption. We shouldn't see
+		 * such pages.
+		 */
+		WARN_ON(page_keyid(page));
 		return alloc_huge_page_nodemask(page_hstate(compound_head(page)),
 				preferred_nid, nodemask);
+	}
 
 	if (PageTransHuge(page)) {
 		gfp_mask |= GFP_TRANSHUGE;
@@ -50,8 +56,8 @@ static inline struct page *new_page_nodemask(struct page *page,
 	if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
 		gfp_mask |= __GFP_HIGHMEM;
 
-	new_page = __alloc_pages_nodemask(gfp_mask, order,
-				preferred_nid, nodemask);
+	new_page = __alloc_pages_nodemask_keyid(gfp_mask, order,
+				preferred_nid, nodemask, page_keyid(page));
 
 	if (new_page && PageTransHuge(new_page))
 		prep_transhuge_page(new_page);
diff --git a/mm/compaction.c b/mm/compaction.c
index faca45ebe62d..fd51aa32ad96 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1187,6 +1187,7 @@ static struct page *compaction_alloc(struct page *migratepage,
 	list_del(&freepage->lru);
 	cc->nr_freepages--;
 
+	prep_encrypted_page(freepage, 0, page_keyid(migratepage), false);
 	return freepage;
 }
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 9ac49ef17b4e..b0fc42642f8f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -920,22 +920,28 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
 /* page allocation callback for NUMA node migration */
 struct page *alloc_new_node_page(struct page *page, unsigned long node)
 {
-	if (PageHuge(page))
+	if (PageHuge(page)) {
+		/*
+		 * HugeTLB doesn't support encryption. We shouldn't see
+		 * such pages.
+		 */
+		WARN_ON(page_keyid(page));
 		return alloc_huge_page_node(page_hstate(compound_head(page)),
 					node);
-	else if (PageTransHuge(page)) {
+	} else if (PageTransHuge(page)) {
 		struct page *thp;
 
-		thp = alloc_pages_node(node,
+		thp = alloc_pages_node_keyid(node, page_keyid(page),
 			(GFP_TRANSHUGE | __GFP_THISNODE),
 			HPAGE_PMD_ORDER);
 		if (!thp)
 			return NULL;
 		prep_transhuge_page(thp);
 		return thp;
-	} else
-		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
-						    __GFP_THISNODE, 0);
+	} else {
+		return __alloc_pages_node_keyid(node, page_keyid(page),
+				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
+	}
 }
 
 /*
@@ -2012,9 +2018,16 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 {
 	struct mempolicy *pol;
 	struct page *page;
-	int preferred_nid;
+	bool zero = false;
+	int keyid, preferred_nid;
 	nodemask_t *nmask;
 
+	keyid = vma_keyid(vma);
+	if (keyid && (gfp & __GFP_ZERO)) {
+		zero = true;
+		gfp &= ~__GFP_ZERO;
+	}
+
 	pol = get_vma_policy(vma, addr);
 
 	if (pol->mode == MPOL_INTERLEAVE) {
@@ -2057,6 +2070,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 	page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
 	mpol_cond_put(pol);
 out:
+	prep_encrypted_page(page, order, keyid, zero);
 	return page;
 }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 8c0af0f7cab1..eb8dea219dcb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1847,7 +1847,7 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 	int nid = (int) data;
 	struct page *newpage;
 
-	newpage = __alloc_pages_node(nid,
+	newpage = __alloc_pages_node_keyid(nid, page_keyid(page),
 					 (GFP_HIGHUSER_MOVABLE |
 					  __GFP_THISNODE | __GFP_NOMEMALLOC |
 					  __GFP_NORETRY | __GFP_NOWARN) &
@@ -2030,7 +2030,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	if (numamigrate_update_ratelimit(pgdat, HPAGE_PMD_NR))
 		goto out_dropref;
 
-	new_page = alloc_pages_node(node,
+	new_page = alloc_pages_node_keyid(node, page_keyid(page),
 		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
 		HPAGE_PMD_ORDER);
 	if (!new_page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..aae5fdb235ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3697,6 +3697,39 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
 }
 #endif /* CONFIG_COMPACTION */
 
+#ifndef CONFIG_NUMA
+struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
+		struct vm_area_struct *vma, unsigned long addr,
+		int node, bool hugepage)
+{
+	struct page *page;
+	bool need_zero;
+	int keyid = vma_keyid(vma);
+
+	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
+	page = alloc_pages(gfp_mask, order);
+	prep_encrypted_page(page, order, keyid, need_zero);
+
+	return page;
+}
+#endif
+
+struct page * __alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order)
+{
+	struct page *page;
+	bool need_zero;
+
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+	VM_WARN_ON(!node_online(nid));
+
+	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
+	page = __alloc_pages(gfp_mask, order, nid);
+	prep_encrypted_page(page, order, keyid, need_zero);
+
+	return page;
+}
+
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map __fs_reclaim_map =
 	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
@@ -4401,6 +4434,20 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
 
+struct page *
+__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask, int keyid)
+{
+	struct page *page;
+	bool need_zero;
+
+	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
+	page = __alloc_pages_nodemask(gfp_mask, order, preferred_nid, nodemask);
+	prep_encrypted_page(page, order, keyid, need_zero);
+	return page;
+}
+EXPORT_SYMBOL(__alloc_pages_nodemask_keyid);
+
 /*
  * Common helper functions.
  */
-- 
2.18.0



* [PATCHv4 05/18] mm/khugepaged: Handle encrypted pages
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 04/18] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 06/18] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

khugepaged allocates a page in advance, before we have found a VMA for
collapse. We don't yet know which KeyID to use for the allocation.

The page is allocated with KeyID-0. Once we know that the VMA is
suitable for collapsing, we prepare the page for the KeyID we need, based
on vma_keyid().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/khugepaged.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d7b2a4bf8671..4dff3c114501 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1056,6 +1056,16 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	anon_vma_unlock_write(vma->anon_vma);
 
+	/*
+	 * At this point new_page is allocated as non-encrypted.
+	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
+	 * before copying data.
+	 */
+	if (vma_keyid(vma)) {
+		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
+				vma_keyid(vma), false);
+	}
+
 	__collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl);
 	pte_unmap(pte);
 	__SetPageUptodate(new_page);
-- 
2.18.0



* [PATCHv4 06/18] x86/mm: Mask out KeyID bits from page table entry pfn
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 05/18] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 07/18] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

MKTME claims several upper bits of the physical address in a page table
entry to encode the KeyID. It effectively shrinks the number of bits
available for the physical address. We should exclude the KeyID bits from
physical addresses.

For instance, if the CPU enumerates 52 physical address bits and the number
of bits claimed for the KeyID is 6, bits 51:46 must not be treated as part
of the physical address.

This patch adjusts __PHYSICAL_MASK during MKTME enumeration.
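
For illustration only, here is the arithmetic for that example as a
standalone sketch (the patch below does the equivalent with GENMASK_ULL()
on the enumerated values):

	int phys_bits  = 52;	/* enumerated physical address bits */
	int keyid_bits = 6;	/* bits claimed for KeyID */

	u64 keyid_mask = GENMASK_ULL(phys_bits - 1, phys_bits - keyid_bits);
	/* keyid_mask == 0x000fc00000000000, i.e. bits 51:46 */

	u64 physical_mask = (1ULL << phys_bits) - 1;
	physical_mask &= ~keyid_mask;	/* usable physical address bits */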

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/intel.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index eb75564f2d25..bf2caf9d52dd 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -571,6 +571,29 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		mktme_status = MKTME_ENABLED;
 	}
 
+#ifdef CONFIG_X86_INTEL_MKTME
+	if (mktme_status == MKTME_ENABLED && nr_keyids) {
+		/*
+		 * Mask out bits claimed from KeyID from physical address mask.
+		 *
+		 * For instance, if a CPU enumerates 52 physical address bits
+		 * and number of bits claimed for KeyID is 6, bits 51:46 of
+		 * physical address is unusable.
+		 */
+		phys_addr_t keyid_mask;
+
+		keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
+		physical_mask &= ~keyid_mask;
+	} else {
+		/*
+		 * Reset __PHYSICAL_MASK.
+		 * Maybe needed if there's inconsistent configuration
+		 * between CPUs.
+		 */
+		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+	}
+#endif
+
 	/*
 	 * KeyID bits effectively lower the number of physical address
 	 * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
-- 
2.18.0



* [PATCHv4 07/18] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 06/18] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-07-09 18:09   ` Konrad Rzeszutek Wilk
  2018-06-26 14:22 ` [PATCHv4 08/18] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

mktme_nr_keyids holds the number of KeyIDs available for MKTME, excluding
KeyID zero, which is used by TME. MKTME KeyIDs start from 1.

mktme_keyid_shift holds the shift of the KeyID within the physical address.

mktme_keyid_mask holds the mask to extract the KeyID from a physical address.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 16 ++++++++++++++++
 arch/x86/kernel/cpu/intel.c  | 12 ++++++++----
 arch/x86/mm/Makefile         |  2 ++
 arch/x86/mm/mktme.c          |  5 +++++
 4 files changed, 31 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/include/asm/mktme.h
 create mode 100644 arch/x86/mm/mktme.c

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
new file mode 100644
index 000000000000..df31876ec48c
--- /dev/null
+++ b/arch/x86/include/asm/mktme.h
@@ -0,0 +1,16 @@
+#ifndef	_ASM_X86_MKTME_H
+#define	_ASM_X86_MKTME_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_X86_INTEL_MKTME
+extern phys_addr_t mktme_keyid_mask;
+extern int mktme_nr_keyids;
+extern int mktme_keyid_shift;
+#else
+#define mktme_keyid_mask	((phys_addr_t)0)
+#define mktme_nr_keyids		0
+#define mktme_keyid_shift	0
+#endif
+
+#endif
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bf2caf9d52dd..efc9e9fc47d4 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -573,6 +573,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
 
 #ifdef CONFIG_X86_INTEL_MKTME
 	if (mktme_status == MKTME_ENABLED && nr_keyids) {
+		mktme_nr_keyids = nr_keyids;
+		mktme_keyid_shift = c->x86_phys_bits - keyid_bits;
+
 		/*
 		 * Mask out bits claimed from KeyID from physical address mask.
 		 *
@@ -580,10 +583,8 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		 * and number of bits claimed for KeyID is 6, bits 51:46 of
 		 * physical address is unusable.
 		 */
-		phys_addr_t keyid_mask;
-
-		keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
-		physical_mask &= ~keyid_mask;
+		mktme_keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, mktme_keyid_shift);
+		physical_mask &= ~mktme_keyid_mask;
 	} else {
 		/*
 		 * Reset __PHYSICAL_MASK.
@@ -591,6 +592,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		 * between CPUs.
 		 */
 		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+		mktme_keyid_mask = 0;
+		mktme_keyid_shift = 0;
+		mktme_nr_keyids = 0;
 	}
 #endif
 
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..4ebee899c363 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= pti.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
+
+obj-$(CONFIG_X86_INTEL_MKTME)	+= mktme.o
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
new file mode 100644
index 000000000000..467f1b26c737
--- /dev/null
+++ b/arch/x86/mm/mktme.c
@@ -0,0 +1,5 @@
+#include <asm/mktme.h>
+
+phys_addr_t mktme_keyid_mask;
+int mktme_nr_keyids;
+int mktme_keyid_shift;
-- 
2.18.0



* [PATCHv4 08/18] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 07/18] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 09/18] x86/mm: Implement page_keyid() using page_ext Kirill A. Shutemov
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

An encrypted VMA will have the KeyID stored in vma->vm_page_prot. This way
we don't need to do anything special to set up encrypted page table
entries and don't need to reserve space for the KeyID in a VMA.

This patch changes _PAGE_CHG_MASK to include KeyID bits. Otherwise they
are going to be stripped from vm_page_prot on the first pgprot_modify().

Define PTE_PFN_MASK_MAX similar to PTE_PFN_MASK but based on
__PHYSICAL_MASK_SHIFT. This way we include whole range of bits
architecturally available for PFN without referencing physical_mask and
mktme_keyid_mask variables.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/pgtable_types.h | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 99fff853c944..3731f7e08757 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -120,8 +120,21 @@
  * protection key is treated like _PAGE_RW, for
  * instance, and is *not* included in this mask since
  * pte_modify() does modify it.
+ *
+ * They include the physical address and the memory encryption keyID.
+ * The paddr and the keyID never occupy the same bits at the same time.
+ * But, a given bit might be used for the keyID on one system and used for
+ * the physical address on another. As an optimization, we manage them in
+ * one unit here since their combination always occupies the same hardware
+ * bits. PTE_PFN_MASK_MAX stores combined mask.
+ *
+ * Cast PAGE_MASK to a signed type so that it is sign-extended if
+ * virtual addresses are 32-bits but physical addresses are larger
+ * (ie, 32-bit PAE).
  */
-#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
+#define PTE_PFN_MASK_MAX \
+	(((signed long)PAGE_MASK) & ((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
+#define _PAGE_CHG_MASK	(PTE_PFN_MASK_MAX | _PAGE_PCD | _PAGE_PWT |		\
 			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
 			 _PAGE_SOFT_DIRTY)
 #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
-- 
2.18.0



* [PATCHv4 09/18] x86/mm: Implement page_keyid() using page_ext
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 08/18] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 10/18] x86/mm: Implement vma_keyid() Kirill A. Shutemov
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Store the KeyID in bits 31:16 of the extended page flags. These bits are
unused.

page_keyid() returns zero until page_ext is ready. The page_ext initializer
enables a static branch to indicate that page_keyid() can use page_ext.
The same static branch will gate MKTME readiness in general.

We don't yet set the KeyID for any page. That will come in the following
patch that implements prep_encrypted_page(). All pages have KeyID-0 for
now.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  7 +++++++
 arch/x86/include/asm/page.h  |  1 +
 arch/x86/mm/mktme.c          | 34 ++++++++++++++++++++++++++++++++++
 include/linux/page_ext.h     | 11 ++++++++++-
 mm/page_ext.c                |  3 +++
 5 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index df31876ec48c..7266494b4f0a 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -2,11 +2,18 @@
 #define	_ASM_X86_MKTME_H
 
 #include <linux/types.h>
+#include <linux/page_ext.h>
 
 #ifdef CONFIG_X86_INTEL_MKTME
 extern phys_addr_t mktme_keyid_mask;
 extern int mktme_nr_keyids;
 extern int mktme_keyid_shift;
+
+extern struct page_ext_operations page_mktme_ops;
+
+#define page_keyid page_keyid
+int page_keyid(const struct page *page);
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 7555b48803a8..39af59487d5f 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -19,6 +19,7 @@
 struct page;
 
 #include <linux/range.h>
+#include <asm/mktme.h>
 extern struct range pfn_mapped[];
 extern int nr_pfn_mapped;
 
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 467f1b26c737..09cbff678b9f 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -3,3 +3,37 @@
 phys_addr_t mktme_keyid_mask;
 int mktme_nr_keyids;
 int mktme_keyid_shift;
+
+static DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
+
+static inline bool mktme_enabled(void)
+{
+	return static_branch_unlikely(&mktme_enabled_key);
+}
+
+int page_keyid(const struct page *page)
+{
+	if (!mktme_enabled())
+		return 0;
+
+	return lookup_page_ext(page)->keyid;
+}
+EXPORT_SYMBOL(page_keyid);
+
+static bool need_page_mktme(void)
+{
+	/* Make sure keyid doesn't collide with extended page flags */
+	BUILD_BUG_ON(__NR_PAGE_EXT_FLAGS > 16);
+
+	return !!mktme_nr_keyids;
+}
+
+static void init_page_mktme(void)
+{
+	static_branch_enable(&mktme_enabled_key);
+}
+
+struct page_ext_operations page_mktme_ops = {
+	.need = need_page_mktme,
+	.init = init_page_mktme,
+};
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index f84f167ec04c..d9c5aae9523f 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -23,6 +23,7 @@ enum page_ext_flags {
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
 #endif
+	__NR_PAGE_EXT_FLAGS
 };
 
 /*
@@ -33,7 +34,15 @@ enum page_ext_flags {
  * then the page_ext for pfn always exists.
  */
 struct page_ext {
-	unsigned long flags;
+	union {
+		unsigned long flags;
+#ifdef CONFIG_X86_INTEL_MKTME
+		struct {
+			unsigned short __pad;
+			unsigned short keyid;
+		};
+#endif
+	};
 };
 
 extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/mm/page_ext.c b/mm/page_ext.c
index a9826da84ccb..036658229842 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -68,6 +68,9 @@ static struct page_ext_operations *page_ext_ops[] = {
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_X86_INTEL_MKTME
+	&page_mktme_ops,
+#endif
 };
 
 static unsigned long total_usage;
-- 
2.18.0



* [PATCHv4 10/18] x86/mm: Implement vma_keyid()
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (8 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 09/18] x86/mm: Implement page_keyid() using page_ext Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 11/18] x86/mm: Implement prep_encrypted_page() and arch_free_page() Kirill A. Shutemov
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

We store the KeyID in the upper bits of vm_page_prot, matching the position
of the KeyID in a PTE. vma_keyid() extracts the KeyID from vm_page_prot.

With the KeyID in vm_page_prot we don't need to modify any page table helper
to propagate the KeyID to page table entries.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  5 +++++
 arch/x86/mm/mktme.c          | 12 ++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 7266494b4f0a..f0b7844e36a4 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -4,6 +4,8 @@
 #include <linux/types.h>
 #include <linux/page_ext.h>
 
+struct vm_area_struct;
+
 #ifdef CONFIG_X86_INTEL_MKTME
 extern phys_addr_t mktme_keyid_mask;
 extern int mktme_nr_keyids;
@@ -14,6 +16,9 @@ extern struct page_ext_operations page_mktme_ops;
 #define page_keyid page_keyid
 int page_keyid(const struct page *page);
 
+#define vma_keyid vma_keyid
+int vma_keyid(struct vm_area_struct *vma);
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 09cbff678b9f..a1f40ee61b25 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,3 +1,4 @@
+#include <linux/mm.h>
 #include <asm/mktme.h>
 
 phys_addr_t mktme_keyid_mask;
@@ -37,3 +38,14 @@ struct page_ext_operations page_mktme_ops = {
 	.need = need_page_mktme,
 	.init = init_page_mktme,
 };
+
+int vma_keyid(struct vm_area_struct *vma)
+{
+	pgprotval_t prot;
+
+	if (!mktme_enabled())
+		return 0;
+
+	prot = pgprot_val(vma->vm_page_prot);
+	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
+}
-- 
2.18.0



* [PATCHv4 11/18] x86/mm: Implement prep_encrypted_page() and arch_free_page()
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (9 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 10/18] x86/mm: Implement vma_keyid() Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 12/18] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

The hardware/CPU does not enforce coherency between mappings of the same
physical page with different KeyIDs or encryption keys.
We are responsible for cache management.

Flush the cache when allocating an encrypted page and when returning the
page to the free pool.

prep_encrypted_page() also takes care of zeroing the page. We have to
do this after the KeyID is set for the page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  6 +++++
 arch/x86/mm/mktme.c          | 49 ++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index f0b7844e36a4..44409b8bbaca 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -19,6 +19,12 @@ int page_keyid(const struct page *page);
 #define vma_keyid vma_keyid
 int vma_keyid(struct vm_area_struct *vma);
 
+#define prep_encrypted_page prep_encrypted_page
+void prep_encrypted_page(struct page *page, int order, int keyid, bool zero);
+
+#define HAVE_ARCH_FREE_PAGE
+void arch_free_page(struct page *page, int order);
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index a1f40ee61b25..1194496633ce 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,4 +1,5 @@
 #include <linux/mm.h>
+#include <linux/highmem.h>
 #include <asm/mktme.h>
 
 phys_addr_t mktme_keyid_mask;
@@ -49,3 +50,51 @@ int vma_keyid(struct vm_area_struct *vma)
 	prot = pgprot_val(vma->vm_page_prot);
 	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
 }
+
+void prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
+{
+	int i;
+
+	/* It's not encrypted page: nothing to do */
+	if (!keyid)
+		return;
+
+	/*
+	 * The hardware/CPU does not enforce coherency between mappings of the
+	 * same physical page with different KeyIDs or encryption keys.
+	 * We are responsible for cache management.
+	 *
+	 * We flush cache before allocating encrypted page
+	 */
+	clflush_cache_range(page_address(page), PAGE_SIZE << order);
+
+	for (i = 0; i < (1 << order); i++) {
+		/* All pages coming out of the allocator should have KeyID 0 */
+		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
+		lookup_page_ext(page)->keyid = keyid;
+
+		/* Clear the page after the KeyID is set. */
+		if (zero)
+			clear_highpage(page);
+
+		page++;
+	}
+}
+
+void arch_free_page(struct page *page, int order)
+{
+	int i;
+
+	/* It's not encrypted page: nothing to do */
+	if (!page_keyid(page))
+		return;
+
+	clflush_cache_range(page_address(page), PAGE_SIZE << order);
+
+	for (i = 0; i < (1 << order); i++) {
+		/* Check if the page has reasonable KeyID */
+		WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids);
+		lookup_page_ext(page)->keyid = 0;
+		page++;
+	}
+}
-- 
2.18.0



* [PATCHv4 12/18] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (10 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 11/18] x86/mm: Implement prep_encrypted_page() and arch_free_page() Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 13/18] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Rename the option to CONFIG_MEMORY_PHYSICAL_PADDING. It will be used
not only for KASLR.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/Kconfig    | 2 +-
 arch/x86/mm/kaslr.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c4d64b19acff..fa5e1ec09247 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2197,7 +2197,7 @@ config RANDOMIZE_MEMORY
 
 	   If unsure, say Y.
 
-config RANDOMIZE_MEMORY_PHYSICAL_PADDING
+config MEMORY_PHYSICAL_PADDING
 	hex "Physical memory mapping padding" if EXPERT
 	depends on RANDOMIZE_MEMORY
 	default "0xa" if MEMORY_HOTPLUG
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 61db77b0eda9..4408cd9a3bef 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -102,7 +102,7 @@ void __init kernel_randomize_memory(void)
 	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
 	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+		CONFIG_MEMORY_PHYSICAL_PADDING;
 
 	/* Adapt phyiscal memory region size based on available memory */
 	if (memory_tb < kaslr_regions[0].size_tb)
-- 
2.18.0



* [PATCHv4 13/18] x86/mm: Allow to disable MKTME after enumeration
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (11 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 12/18] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-07-09 18:20   ` Konrad Rzeszutek Wilk
  2018-06-26 14:22 ` [PATCHv4 14/18] x86/mm: Detect MKTME early Kirill A. Shutemov
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

The new helper mktme_disable() allows disabling MKTME even if it was
enumerated successfully. MKTME initialization may fail and this
functionality allows the system to boot regardless of the failure.

MKTME needs a per-KeyID direct mapping, which requires a lot more virtual
address space. This may be a problem in 4-level paging mode if the system
has more physical memory than we can handle with MKTME. The helper allows
MKTME to fail while the system still boots successfully.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 2 ++
 arch/x86/kernel/cpu/intel.c  | 5 +----
 arch/x86/mm/mktme.c          | 9 +++++++++
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 44409b8bbaca..ebbee6a0c495 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -6,6 +6,8 @@
 
 struct vm_area_struct;
 
+void mktme_disable(void);
+
 #ifdef CONFIG_X86_INTEL_MKTME
 extern phys_addr_t mktme_keyid_mask;
 extern int mktme_nr_keyids;
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index efc9e9fc47d4..75e3b2602b4a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -591,10 +591,7 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		 * Maybe needed if there's inconsistent configuration
 		 * between CPUs.
 		 */
-		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
-		mktme_keyid_mask = 0;
-		mktme_keyid_shift = 0;
-		mktme_nr_keyids = 0;
+		mktme_disable();
 	}
 #endif
 
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 1194496633ce..bb6210dbcf0e 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -13,6 +13,15 @@ static inline bool mktme_enabled(void)
 	return static_branch_unlikely(&mktme_enabled_key);
 }
 
+void mktme_disable(void)
+{
+	physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+	mktme_keyid_mask = 0;
+	mktme_keyid_shift = 0;
+	mktme_nr_keyids = 0;
+	static_branch_disable(&mktme_enabled_key);
+}
+
 int page_keyid(const struct page *page)
 {
 	if (!mktme_enabled())
-- 
2.18.0



* [PATCHv4 14/18] x86/mm: Detect MKTME early
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (12 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 13/18] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 15/18] x86/mm: Calculate direct mapping size Kirill A. Shutemov
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

We need to know the number of KeyIDs before KASLR is initialized. The
number of KeyIDs determines how much address space is needed for the
per-KeyID direct mapping.

KASLR initialization happens before full CPU initialization is complete.
Move the detect_tme() call to early_init_intel().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/intel.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 75e3b2602b4a..39830806dd42 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -158,6 +158,8 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
 	return false;
 }
 
+static void detect_tme(struct cpuinfo_x86 *c);
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;
@@ -301,6 +303,9 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	}
 
 	check_mpx_erratum(c);
+
+	if (cpu_has(c, X86_FEATURE_TME))
+		detect_tme(c);
 }
 
 #ifdef CONFIG_X86_32
@@ -766,9 +771,6 @@ static void init_intel(struct cpuinfo_x86 *c)
 	if (cpu_has(c, X86_FEATURE_VMX))
 		detect_vmx_virtcap(c);
 
-	if (cpu_has(c, X86_FEATURE_TME))
-		detect_tme(c);
-
 	init_intel_energy_perf(c);
 
 	init_intel_misc_features(c);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCHv4 15/18] x86/mm: Calculate direct mapping size
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (13 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 14/18] x86/mm: Detect MKTME early Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-07-09 18:32   ` Konrad Rzeszutek Wilk
  2018-06-26 14:22 ` [PATCHv4 16/18] x86/mm: Implement sync_direct_mapping() Kirill A. Shutemov
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

The kernel needs a way to access encrypted memory. We have two
options for how to approach it:

 - Create temporary mappings every time the kernel needs access to encrypted
   memory. That basically brings highmem and its overhead back.

 - Create multiple direct mappings, one per-KeyID. In this setup we
   don't need to create temporary mappings on the fly -- encrypted
   memory is permanently available in kernel address space.

We take the second approach as it has lower overhead.

It's worth noting that with per-KeyID direct mappings a compromised kernel
would give access to decrypted data right away, without additional tricks to
get memory mapped with the correct KeyID.

Per-KeyID mappings require a lot more virtual address space. On a 4-level
machine with 64 KeyIDs we max out the 46-bit virtual address space dedicated
to the direct mapping with just 1TiB of RAM. Given that we round up any
calculation of the direct mapping size to 1TiB, we effectively claim the
whole 46-bit address space for the direct mapping on such a machine
regardless of RAM size.

Increased usage of virtual address space has implications for KASLR:
we have less space for randomization. With 64 TiB claimed for the direct
mapping with 4-level paging, we are left with 27 TiB of entropy to place
page_offset_base, vmalloc_base and vmemmap_base.

5-level paging provides a much wider virtual address space, so KASLR doesn't
suffer significantly from per-KeyID direct mappings.

It's preferred to run MKTME with 5-level paging.

The direct mappings for the KeyIDs are placed next to each other in the
virtual address space. We need a way to find the boundaries of the direct
mapping for a particular KeyID.

The new variable direct_mapping_size specifies the size of a single direct
mapping. With this value, it's trivial to find the direct mapping for
KeyID-N: PAGE_OFFSET + N * direct_mapping_size (see the sketch below).

The size of the direct mapping is calculated during KASLR setup. If KASLR is
disabled, this happens during MKTME initialization.
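
As an illustration only (not part of the patch), a helper following the
formula above might look like this, assuming direct_mapping_size has already
been initialized:

	/* Hypothetical helper: base of the direct mapping for a given KeyID. */
	static inline unsigned long keyid_direct_mapping_start(int keyid)
	{
		return PAGE_OFFSET + (unsigned long)keyid * direct_mapping_size;
	}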

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/x86_64/mm.txt |  4 ++++
 arch/x86/include/asm/page_64.h  |  1 +
 arch/x86/include/asm/setup.h    |  6 +++++
 arch/x86/kernel/head64.c        |  2 ++
 arch/x86/kernel/setup.c         |  3 +++
 arch/x86/mm/init_64.c           | 40 +++++++++++++++++++++++++++++++++
 arch/x86/mm/kaslr.c             | 11 ++++++---
 7 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5432a96d31ff..c5b92904090f 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -61,6 +61,10 @@ The direct mapping covers all memory in the system up to the highest
 memory address (this means in some cases it can also include PCI memory
 holes).
 
+With MKTME, we have multiple direct mappings. One per-KeyID. They are put
+next to each other. PAGE_OFFSET + N * direct_mapping_size can be used to
+find direct mapping for KeyID-N.
+
 vmalloc space is lazily synchronized into the different PML4/PML5 pages of
 the processes using the page fault handler, with init_top_pgt as
 reference.
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 939b1cff4a7b..53c32af895ab 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -14,6 +14,7 @@ extern unsigned long phys_base;
 extern unsigned long page_offset_base;
 extern unsigned long vmalloc_base;
 extern unsigned long vmemmap_base;
+extern unsigned long direct_mapping_size;
 
 static inline unsigned long __phys_addr_nodebug(unsigned long x)
 {
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index ae13bc974416..bcac5080cca5 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -59,6 +59,12 @@ extern void x86_ce4100_early_setup(void);
 static inline void x86_ce4100_early_setup(void) { }
 #endif
 
+#ifdef CONFIG_MEMORY_PHYSICAL_PADDING
+void calculate_direct_mapping_size(void);
+#else
+static inline void calculate_direct_mapping_size(void) { }
+#endif
+
 #ifndef _SETUP
 
 #include <asm/espfix.h>
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 8047379e575a..854e8665aba0 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -59,6 +59,8 @@ EXPORT_SYMBOL(vmalloc_base);
 unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4;
 EXPORT_SYMBOL(vmemmap_base);
 #endif
+unsigned long direct_mapping_size __ro_after_init = -1UL;
+EXPORT_SYMBOL(direct_mapping_size);
 
 #define __head	__section(.head.text)
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 2f86d883dd95..09ddbd142e3c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1053,6 +1053,9 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	init_cache_modes();
 
+	/* direct_mapping_size has to be initialized before KASLR and MKTME */
+	calculate_direct_mapping_size();
+
 	/*
 	 * Define random base addresses for memory sections after max_pfn is
 	 * defined and before each memory section base is used.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a688617c727e..6fc506f33e58 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1399,6 +1399,46 @@ unsigned long memory_block_size_bytes(void)
 	return memory_block_size_probed;
 }
 
+#ifdef CONFIG_MEMORY_PHYSICAL_PADDING
+void __init calculate_direct_mapping_size(void)
+{
+	unsigned long available_va;
+
+	/* 1/4 of the virtual address space is dedicated to the direct mapping */
+	available_va = 1UL << (__VIRTUAL_MASK_SHIFT - 1);
+
+	/* How much memory does the system have? */
+	direct_mapping_size = max_pfn << PAGE_SHIFT;
+	direct_mapping_size = round_up(direct_mapping_size, 1UL << 40);
+
+	if (!IS_ENABLED(CONFIG_X86_INTEL_MKTME) || !mktme_nr_keyids)
+		goto out;
+
+	/*
+	 * Not enough virtual address space to address all physical memory with
+	 * MKTME enabled. Even without padding.
+	 *
+	 * Disable MKTME instead.
+	 */
+	if (direct_mapping_size > available_va / (mktme_nr_keyids + 1)) {
+		pr_err("x86/mktme: Disabled. Not enough virtual address space\n");
+		pr_err("x86/mktme: Consider switching to 5-level paging\n");
+		mktme_disable();
+		goto out;
+	}
+
+	/*
+	 * Virtual address space is divided between per-KeyID direct mappings.
+	 */
+	available_va /= mktme_nr_keyids + 1;
+out:
+	/* Add padding, if there's enough virtual address space */
+	direct_mapping_size += (1UL << 40) * CONFIG_MEMORY_PHYSICAL_PADDING;
+	if (direct_mapping_size > available_va)
+		direct_mapping_size = available_va;
+}
+#endif
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 /*
  * Initialise the sparsemem vmemmap using huge-pages at the PMD level.
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 4408cd9a3bef..bf044ff50ec0 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -101,10 +101,15 @@ void __init kernel_randomize_memory(void)
 	 * add padding if needed (especially for memory hotplug support).
 	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
-	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_MEMORY_PHYSICAL_PADDING;
 
-	/* Adapt phyiscal memory region size based on available memory */
+	/*
+	 * Calculate space required to map all physical memory.
+	 * In case of MKTME, we map physical memory multiple times, one for
+	 * each KeyID. If MKTME is disabled mktme_nr_keyids is 0.
+	 */
+	memory_tb = (direct_mapping_size * (mktme_nr_keyids + 1)) >> TB_SHIFT;
+
+	/* Adapt physical memory region size based on available memory */
 	if (memory_tb < kaslr_regions[0].size_tb)
 		kaslr_regions[0].size_tb = memory_tb;
 
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCHv4 16/18] x86/mm: Implement sync_direct_mapping()
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (14 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 15/18] x86/mm: Calculate direct mapping size Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 17/18] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
  2018-06-26 14:22 ` [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
  17 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

For MKTME we use per-KeyID direct mappings. This allows the kernel to access
encrypted memory.

sync_direct_mapping() syncs the per-KeyID direct mappings with the canonical
one -- KeyID-0.

The function tracks changes in the canonical mapping:
 - creating or removing chunks of the translation tree;
 - changes in mapping flags (i.e. protection bits);
 - splitting a huge page mapping into a page table;
 - replacing a page table with a huge page mapping.

The function needs to be called on every change to the direct mapping:
hotplug, hotremove, changes in permission bits, etc. (see the call pattern
below).

The function is a nop until MKTME is enabled.
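
For reference, the call pattern this patch adds at each direct mapping
change (see the init_64.c hunks below) boils down to:

	int ret;

	/* ... after modifying the canonical (KeyID-0) direct mapping ... */
	ret = sync_direct_mapping();
	WARN_ON(ret);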

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |   8 +
 arch/x86/mm/init_64.c        |  10 +
 arch/x86/mm/mktme.c          | 437 +++++++++++++++++++++++++++++++++++
 3 files changed, 455 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index ebbee6a0c495..ba83fba4f9b3 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -27,10 +27,18 @@ void prep_encrypted_page(struct page *page, int order, int keyid, bool zero);
 #define HAVE_ARCH_FREE_PAGE
 void arch_free_page(struct page *page, int order);
 
+int sync_direct_mapping(void);
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
 #define mktme_keyid_shift	0
+
+static inline int sync_direct_mapping(void)
+{
+	return 0;
+}
+
 #endif
 
 #endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 6fc506f33e58..5a20fe465947 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -698,6 +698,7 @@ kernel_physical_mapping_init(unsigned long paddr_start,
 {
 	bool pgd_changed = false;
 	unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
+	int ret;
 
 	paddr_last = paddr_end;
 	vaddr = (unsigned long)__va(paddr_start);
@@ -731,6 +732,9 @@ kernel_physical_mapping_init(unsigned long paddr_start,
 		pgd_changed = true;
 	}
 
+	ret = sync_direct_mapping();
+	WARN_ON(ret);
+
 	if (pgd_changed)
 		sync_global_pgds(vaddr_start, vaddr_end - 1);
 
@@ -1142,10 +1146,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
 static void __meminit
 kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 {
+	int ret;
 	start = (unsigned long)__va(start);
 	end = (unsigned long)__va(end);
 
 	remove_pagetable(start, end, true, NULL);
+	ret = sync_direct_mapping();
+	WARN_ON(ret);
 }
 
 int __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
@@ -1253,6 +1260,7 @@ void mark_rodata_ro(void)
 	unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 	unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
 	unsigned long all_end;
+	int ret;
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
@@ -1290,6 +1298,8 @@ void mark_rodata_ro(void)
 			(unsigned long) __va(__pa_symbol(rodata_end)),
 			(unsigned long) __va(__pa_symbol(_sdata)));
 
+	ret = sync_direct_mapping();
+	WARN_ON(ret);
 	debug_checkwx();
 
 	/*
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index bb6210dbcf0e..660caf6a5ce1 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,6 +1,8 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
 #include <asm/mktme.h>
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
 
 phys_addr_t mktme_keyid_mask;
 int mktme_nr_keyids;
@@ -42,6 +44,7 @@ static bool need_page_mktme(void)
 static void init_page_mktme(void)
 {
 	static_branch_enable(&mktme_enabled_key);
+	sync_direct_mapping();
 }
 
 struct page_ext_operations page_mktme_ops = {
@@ -107,3 +110,437 @@ void arch_free_page(struct page *page, int order)
 		page++;
 	}
 }
+
+static int sync_direct_mapping_pte(unsigned long keyid,
+		pmd_t *dst_pmd, pmd_t *src_pmd,
+		unsigned long addr, unsigned long end)
+{
+	pte_t *src_pte, *dst_pte;
+	pte_t *new_pte = NULL;
+	bool remove_pte;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pte = !src_pmd && PAGE_ALIGNED(addr) && PAGE_ALIGNED(end);
+
+	/*
+	 * PMD page got split into page table.
+	 * Clear PMD mapping. Page table will be established instead.
+	 */
+	if (pmd_large(*dst_pmd)) {
+		spin_lock(&init_mm.page_table_lock);
+		pmd_clear(dst_pmd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (pmd_none(*dst_pmd)) {
+		new_pte = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pte)
+			return -ENOMEM;
+		dst_pte = new_pte + pte_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pte = pte_offset_map(dst_pmd, addr + keyid * direct_mapping_size);
+	}
+	src_pte = src_pmd ? pte_offset_map(src_pmd, addr) : NULL;
+
+	spin_lock(&init_mm.page_table_lock);
+
+	do {
+		pteval_t val;
+
+		if (!src_pte || pte_none(*src_pte)) {
+			set_pte(dst_pte, __pte(0));
+			goto next;
+		}
+
+		if (!pte_none(*dst_pte)) {
+			/*
+			 * Sanity check: PFNs must match between source
+			 * and destination even if the rest doesn't.
+			 */
+			BUG_ON(pte_pfn(*dst_pte) != pte_pfn(*src_pte));
+		}
+
+		/* Copy entry, but set KeyID. */
+		val = pte_val(*src_pte) | keyid << mktme_keyid_shift;
+		set_pte(dst_pte, __pte(val));
+next:
+		addr += PAGE_SIZE;
+		dst_pte++;
+		if (src_pte)
+			src_pte++;
+	} while (addr != end);
+
+	if (new_pte)
+		pmd_populate_kernel(&init_mm, dst_pmd, new_pte);
+
+	if (remove_pte) {
+		__free_page(pmd_page(*dst_pmd));
+		pmd_clear(dst_pmd);
+	}
+
+	spin_unlock(&init_mm.page_table_lock);
+
+	return 0;
+}
+
+static int sync_direct_mapping_pmd(unsigned long keyid,
+		pud_t *dst_pud, pud_t *src_pud,
+		unsigned long addr, unsigned long end)
+{
+	pmd_t *src_pmd, *dst_pmd;
+	pmd_t *new_pmd = NULL;
+	bool remove_pmd = false;
+	unsigned long next;
+	int ret = 0;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pmd = !src_pud && IS_ALIGNED(addr, PUD_SIZE) && IS_ALIGNED(end, PUD_SIZE);
+
+	/*
+	 * PUD page got split into page table.
+	 * Clear PUD mapping. Page table will be established instead.
+	 */
+	if (pud_large(*dst_pud)) {
+		spin_lock(&init_mm.page_table_lock);
+		pud_clear(dst_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (pud_none(*dst_pud)) {
+		new_pmd = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pmd)
+			return -ENOMEM;
+		dst_pmd = new_pmd + pmd_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pmd = pmd_offset(dst_pud, addr + keyid * direct_mapping_size);
+	}
+	src_pmd = src_pud ? pmd_offset(src_pud, addr) : NULL;
+
+	do {
+		pmd_t *__src_pmd = src_pmd;
+
+		next = pmd_addr_end(addr, end);
+		if (!__src_pmd || pmd_none(*__src_pmd)) {
+			if (pmd_none(*dst_pmd))
+				goto next;
+			if (pmd_large(*dst_pmd)) {
+				spin_lock(&init_mm.page_table_lock);
+				set_pmd(dst_pmd, __pmd(0));
+				spin_unlock(&init_mm.page_table_lock);
+				goto next;
+			}
+			__src_pmd = NULL;
+		}
+
+		if (__src_pmd && pmd_large(*__src_pmd)) {
+			pmdval_t val;
+
+			if (pmd_large(*dst_pmd)) {
+				/*
+				 * Sanity check: PFNs must match between source
+				 * and destination even if the rest doesn't.
+				 */
+				BUG_ON(pmd_pfn(*dst_pmd) != pmd_pfn(*__src_pmd));
+			} else if (!pmd_none(*dst_pmd)) {
+				/*
+				 * Page table is replaced with a PMD page.
+				 * Free and unmap the page table.
+				 */
+				__free_page(pmd_page(*dst_pmd));
+				spin_lock(&init_mm.page_table_lock);
+				pmd_clear(dst_pmd);
+				spin_unlock(&init_mm.page_table_lock);
+			}
+
+			/* Copy entry, but set KeyID. */
+			val = pmd_val(*__src_pmd) | keyid << mktme_keyid_shift;
+			spin_lock(&init_mm.page_table_lock);
+			set_pmd(dst_pmd, __pmd(val));
+			spin_unlock(&init_mm.page_table_lock);
+			goto next;
+		}
+
+		ret = sync_direct_mapping_pte(keyid, dst_pmd, __src_pmd,
+				addr, next);
+next:
+		addr = next;
+		dst_pmd++;
+		if (src_pmd)
+			src_pmd++;
+	} while (addr != end && !ret);
+
+	if (new_pmd) {
+		spin_lock(&init_mm.page_table_lock);
+		pud_populate(&init_mm, dst_pud, new_pmd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_pmd) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(pud_page(*dst_pud));
+		pud_clear(dst_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_pud(unsigned long keyid,
+		p4d_t *dst_p4d, p4d_t *src_p4d,
+		unsigned long addr, unsigned long end)
+{
+	pud_t *src_pud, *dst_pud;
+	pud_t *new_pud = NULL;
+	bool remove_pud = false;
+	unsigned long next;
+	int ret = 0;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pud = !src_p4d && IS_ALIGNED(addr, P4D_SIZE) && IS_ALIGNED(end, P4D_SIZE);
+
+	/*
+	 * P4D page got split into page table.
+	 * Clear P4D mapping. Page table will be established instead.
+	 */
+	if (p4d_large(*dst_p4d)) {
+		spin_lock(&init_mm.page_table_lock);
+		p4d_clear(dst_p4d);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (p4d_none(*dst_p4d)) {
+		new_pud = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pud)
+			return -ENOMEM;
+		dst_pud = new_pud + pud_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pud = pud_offset(dst_p4d, addr + keyid * direct_mapping_size);
+	}
+	src_pud = src_p4d ? pud_offset(src_p4d, addr) : NULL;
+
+	do {
+		pud_t *__src_pud = src_pud;
+
+		next = pud_addr_end(addr, end);
+		if (!__src_pud || pud_none(*__src_pud)) {
+			if (pud_none(*dst_pud))
+				goto next;
+			if (pud_large(*dst_pud)) {
+				spin_lock(&init_mm.page_table_lock);
+				set_pud(dst_pud, __pud(0));
+				spin_unlock(&init_mm.page_table_lock);
+				goto next;
+			}
+			__src_pud = NULL;
+		}
+
+		if (__src_pud && pud_large(*__src_pud)) {
+			pudval_t val;
+
+			if (pud_large(*dst_pud)) {
+				/*
+				 * Sanity check: PFNs must match between source
+				 * and destination even if the rest doesn't.
+				 */
+				BUG_ON(pud_pfn(*dst_pud) != pud_pfn(*__src_pud));
+			} else if (!pud_none(*dst_pud)) {
+				/*
+				 * Page table is replaced with a pud page.
+				 * Free and unmap the page table.
+				 */
+				__free_page(pud_page(*dst_pud));
+				spin_lock(&init_mm.page_table_lock);
+				pud_clear(dst_pud);
+				spin_unlock(&init_mm.page_table_lock);
+			}
+
+			/* Copy entry, but set KeyID. */
+			val = pud_val(*__src_pud) | keyid << mktme_keyid_shift;
+			spin_lock(&init_mm.page_table_lock);
+			set_pud(dst_pud, __pud(val));
+			spin_unlock(&init_mm.page_table_lock);
+			goto next;
+		}
+
+		ret = sync_direct_mapping_pmd(keyid, dst_pud, __src_pud,
+				addr, next);
+next:
+		addr = next;
+		dst_pud++;
+		if (src_pud)
+			src_pud++;
+	} while (addr != end && !ret);
+
+	if (new_pud) {
+		spin_lock(&init_mm.page_table_lock);
+		p4d_populate(&init_mm, dst_p4d, new_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_pud) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(p4d_page(*dst_p4d));
+		p4d_clear(dst_p4d);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_p4d(unsigned long keyid,
+		pgd_t *dst_pgd, pgd_t *src_pgd,
+		unsigned long addr, unsigned long end)
+{
+	p4d_t *src_p4d, *dst_p4d;
+	p4d_t *new_p4d_1 = NULL, *new_p4d_2 = NULL;
+	bool remove_p4d = false;
+	unsigned long next;
+	int ret = 0;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_p4d = !src_pgd && IS_ALIGNED(addr, PGDIR_SIZE) && IS_ALIGNED(end, PGDIR_SIZE);
+
+	/* Allocate a new page table if needed. */
+	if (pgd_none(*dst_pgd)) {
+		new_p4d_1 = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_p4d_1)
+			return -ENOMEM;
+		dst_p4d = new_p4d_1 + p4d_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_p4d = p4d_offset(dst_pgd, addr + keyid * direct_mapping_size);
+	}
+	src_p4d = src_pgd ? p4d_offset(src_pgd, addr) : NULL;
+
+	do {
+		p4d_t *__src_p4d = src_p4d;
+
+		next = p4d_addr_end(addr, end);
+		if (!__src_p4d || p4d_none(*__src_p4d)) {
+			if (p4d_none(*dst_p4d))
+				goto next;
+			__src_p4d = NULL;
+		}
+
+		ret = sync_direct_mapping_pud(keyid, dst_p4d, __src_p4d,
+				addr, next);
+next:
+		addr = next;
+		dst_p4d++;
+
+		/*
+		 * Direct mappings are 1TiB-aligned. With 5-level paging it
+		 * means that on PGD level there can be misalignment between
+		 * source and destination.
+		 *
+		 * Allocate the new page table if dst_p4d crosses page table
+		 * boundary.
+		 */
+		if (!((unsigned long)dst_p4d & ~PAGE_MASK) && addr != end) {
+			if (pgd_none(dst_pgd[1])) {
+				new_p4d_2 = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+				if (!new_p4d_2)
+					ret = -ENOMEM;
+				dst_p4d = new_p4d_2;
+			} else {
+				dst_p4d = p4d_offset(dst_pgd + 1, 0);
+			}
+		}
+		if (src_p4d)
+			src_p4d++;
+	} while (addr != end && !ret);
+
+	if (new_p4d_1 || new_p4d_2) {
+		spin_lock(&init_mm.page_table_lock);
+		if (new_p4d_1)
+			pgd_populate(&init_mm, dst_pgd, new_p4d_1);
+		if (new_p4d_2)
+			pgd_populate(&init_mm, dst_pgd + 1, new_p4d_2);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_p4d) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(pgd_page(*dst_pgd));
+		pgd_clear(dst_pgd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_keyid(unsigned long keyid)
+{
+	pgd_t *src_pgd, *dst_pgd;
+	unsigned long addr, end, next;
+	int ret = 0;
+
+	addr = PAGE_OFFSET;
+	end = PAGE_OFFSET + direct_mapping_size;
+
+	dst_pgd = pgd_offset_k(addr + keyid * direct_mapping_size);
+	src_pgd = pgd_offset_k(addr);
+
+	do {
+		pgd_t *__src_pgd = src_pgd;
+
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*__src_pgd)) {
+			if (pgd_none(*dst_pgd))
+				continue;
+			__src_pgd = NULL;
+		}
+
+		ret = sync_direct_mapping_p4d(keyid, dst_pgd, __src_pgd,
+				addr, next);
+	} while (dst_pgd++, src_pgd++, addr = next, addr != end && !ret);
+
+	return ret;
+}
+
+/*
+ * For MKTME we maintain per-KeyID direct mappings. This allows the kernel
+ * to access encrypted memory.
+ *
+ * sync_direct_mapping() syncs the per-KeyID direct mappings with the
+ * canonical one -- KeyID-0.
+ *
+ * The function tracks changes in the canonical mapping:
+ *  - creating or removing chunks of the translation tree;
+ *  - changes in mapping flags (i.e. protection bits);
+ *  - splitting huge page mapping into a page table;
+ *  - replacing page table with a huge page mapping;
+ *
+ * The function needs to be called on every change to the direct mapping:
+ * hotplug, hotremove, changes in permissions bits, etc.
+ *
+ * The function is a nop until MKTME is enabled.
+ */
+int sync_direct_mapping(void)
+{
+	int i, ret = 0;
+
+	if (!mktme_enabled())
+		return 0;
+
+	for (i = 1; !ret && i <= mktme_nr_keyids; i++)
+		ret = sync_direct_mapping_keyid(i);
+
+	__flush_tlb_all();
+
+	return ret;
+}
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCHv4 17/18] x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (15 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 16/18] x86/mm: Implement sync_direct_mapping() Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 16:38   ` Dave Hansen
  2018-06-26 14:22 ` [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
  17 siblings, 1 reply; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Per-KeyID direct mappings require changes to how we find the right virtual
address for a page and how we do virt-to-phys address translations.

The page_to_virt() definition overrides the default macro provided by
<linux/mm.h>. We only override the macro if MKTME is enabled at compile
time.
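
A quick illustration with assumed values (not part of the patch): for a page
whose page_keyid() is 3, the macro resolves to an address inside the fourth
direct mapping, and __pa() folds it back:

	void *vaddr = page_to_virt(page);  /* __va(PFN_PHYS(pfn)) + 3 * direct_mapping_size */
	phys_addr_t paddr = __pa(vaddr);   /* "% direct_mapping_size" strips the KeyID offset */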

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h   | 3 +++
 arch/x86/include/asm/page_64.h | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index ba83fba4f9b3..dbfbd955da98 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -29,6 +29,9 @@ void arch_free_page(struct page *page, int order);
 
 int sync_direct_mapping(void);
 
+#define page_to_virt(x) \
+	(__va(PFN_PHYS(page_to_pfn(x))) + page_keyid(x) * direct_mapping_size)
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 53c32af895ab..ffad496aadad 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -23,7 +23,7 @@ static inline unsigned long __phys_addr_nodebug(unsigned long x)
 	/* use the carry flag to determine if x was < __START_KERNEL_map */
 	x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));
 
-	return x;
+	return x % direct_mapping_size;
 }
 
 #ifdef CONFIG_DEBUG_VIRTUAL
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
                   ` (16 preceding siblings ...)
  2018-06-26 14:22 ` [PATCHv4 17/18] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
@ 2018-06-26 14:22 ` Kirill A. Shutemov
  2018-06-26 17:30   ` Randy Dunlap
  2018-07-09 18:36   ` Konrad Rzeszutek Wilk
  17 siblings, 2 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Add a new config option to enable/disable Multi-Key Total Memory
Encryption support.

MKTME uses MEMORY_PHYSICAL_PADDING to reserve enough space in per-KeyID
direct mappings for memory hotplug.
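
For example (illustrative .config fragment; 0xa is the default padding with
MEMORY_HOTPLUG, per the hunk below):

	CONFIG_X86_INTEL_MKTME=y
	CONFIG_MEMORY_PHYSICAL_PADDING=0xa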

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/Kconfig | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fa5e1ec09247..9a843bd63108 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1523,6 +1523,23 @@ config ARCH_USE_MEMREMAP_PROT
 	def_bool y
 	depends on AMD_MEM_ENCRYPT
 
+config X86_INTEL_MKTME
+	bool "Intel Multi-Key Total Memory Encryption"
+	select DYNAMIC_PHYSICAL_MASK
+	select PAGE_EXTENSION
+	depends on X86_64 && CPU_SUP_INTEL
+	---help---
+	  Say yes to enable support for Multi-Key Total Memory Encryption.
+	  This requires an Intel processor that has support of the feature.
+
+	  Multikey Total Memory Encryption (MKTME) is a technology that allows
+	  transparent memory encryption in and upcoming Intel platforms.
+
+	  MKTME is built on top of TME. TME allows encryption of the entirety
+	  of system memory using a single key. MKTME allows having multiple
+	  encryption domains, each having own key -- different memory pages can
+	  be encrypted with different keys.
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
@@ -2199,7 +2216,7 @@ config RANDOMIZE_MEMORY
 
 config MEMORY_PHYSICAL_PADDING
 	hex "Physical memory mapping padding" if EXPERT
-	depends on RANDOMIZE_MEMORY
+	depends on RANDOMIZE_MEMORY || X86_INTEL_MKTME
 	default "0xa" if MEMORY_HOTPLUG
 	default "0x0"
 	range 0x1 0x40 if MEMORY_HOTPLUG
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 17/18] x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  2018-06-26 14:22 ` [PATCHv4 17/18] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
@ 2018-06-26 16:38   ` Dave Hansen
  2018-06-27 21:56     ` Kirill A. Shutemov
  0 siblings, 1 reply; 36+ messages in thread
From: Dave Hansen @ 2018-06-26 16:38 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

> diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
> index ba83fba4f9b3..dbfbd955da98 100644
> --- a/arch/x86/include/asm/mktme.h
> +++ b/arch/x86/include/asm/mktme.h
> @@ -29,6 +29,9 @@ void arch_free_page(struct page *page, int order);
>  
>  int sync_direct_mapping(void);
>  
> +#define page_to_virt(x) \
> +	(__va(PFN_PHYS(page_to_pfn(x))) + page_keyid(x) * direct_mapping_size)

Please put this in a generic header so that this hunk represents the
*default* x86 implementation that is used universally on x86.  Then,
please do

#ifndef CONFIG_MKTME_WHATEVER
#define page_keyid(x) (0)
#endif

>  #else
>  #define mktme_keyid_mask	((phys_addr_t)0)
>  #define mktme_nr_keyids		0
> diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
> index 53c32af895ab..ffad496aadad 100644
> --- a/arch/x86/include/asm/page_64.h
> +++ b/arch/x86/include/asm/page_64.h
> @@ -23,7 +23,7 @@ static inline unsigned long __phys_addr_nodebug(unsigned long x)
>  	/* use the carry flag to determine if x was < __START_KERNEL_map */
>  	x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));
>  
> -	return x;
> +	return x % direct_mapping_size;

There are almost *surely* performance implications from this that affect
anyone with this compile option turned on.  There's now a 64-bit integer
division operation which is used in places like kfree().

That's a show-stopper for me until we've done some pretty comprehensive
performance analysis of this, which means much more than one kernel
compile test on one system.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-06-26 14:22 ` [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
@ 2018-06-26 17:30   ` Randy Dunlap
  2018-06-27 21:57     ` Kirill A. Shutemov
  2018-07-09 18:36   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 36+ messages in thread
From: Randy Dunlap @ 2018-06-26 17:30 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm

On 06/26/2018 07:22 AM, Kirill A. Shutemov wrote:
> Add new config option to enabled/disable Multi-Key Total Memory
> Encryption support.
> 
> MKTME uses MEMORY_PHYSICAL_PADDING to reserve enough space in per-KeyID
> direct mappings for memory hotplug.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/Kconfig | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index fa5e1ec09247..9a843bd63108 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1523,6 +1523,23 @@ config ARCH_USE_MEMREMAP_PROT
>  	def_bool y
>  	depends on AMD_MEM_ENCRYPT
>  
> +config X86_INTEL_MKTME
> +	bool "Intel Multi-Key Total Memory Encryption"
> +	select DYNAMIC_PHYSICAL_MASK
> +	select PAGE_EXTENSION
> +	depends on X86_64 && CPU_SUP_INTEL
> +	---help---
> +	  Say yes to enable support for Multi-Key Total Memory Encryption.
> +	  This requires an Intel processor that has support of the feature.
> +
> +	  Multikey Total Memory Encryption (MKTME) is a technology that allows
> +	  transparent memory encryption in and upcoming Intel platforms.

huh?  Maybe drop the "and"?

> +
> +	  MKTME is built on top of TME. TME allows encryption of the entirety
> +	  of system memory using a single key. MKTME allows having multiple
> +	  encryption domains, each having own key -- different memory pages can
> +	  be encrypted with different keys.
> +
>  # Common NUMA Features
>  config NUMA
>  	bool "Numa Memory Allocation and Scheduler Support"
> @@ -2199,7 +2216,7 @@ config RANDOMIZE_MEMORY
>  
>  config MEMORY_PHYSICAL_PADDING
>  	hex "Physical memory mapping padding" if EXPERT
> -	depends on RANDOMIZE_MEMORY
> +	depends on RANDOMIZE_MEMORY || X86_INTEL_MKTME
>  	default "0xa" if MEMORY_HOTPLUG
>  	default "0x0"
>  	range 0x1 0x40 if MEMORY_HOTPLUG
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 17/18] x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  2018-06-26 16:38   ` Dave Hansen
@ 2018-06-27 21:56     ` Kirill A. Shutemov
  0 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-27 21:56 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky,
	Kai Huang, Jacob Pan, linux-kernel, linux-mm

On Tue, Jun 26, 2018 at 04:38:23PM +0000, Dave Hansen wrote:
> > diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
> > index ba83fba4f9b3..dbfbd955da98 100644
> > --- a/arch/x86/include/asm/mktme.h
> > +++ b/arch/x86/include/asm/mktme.h
> > @@ -29,6 +29,9 @@ void arch_free_page(struct page *page, int order);
> >  
> >  int sync_direct_mapping(void);
> >  
> > +#define page_to_virt(x) \
> > +	(__va(PFN_PHYS(page_to_pfn(x))) + page_keyid(x) * direct_mapping_size)
> 
> Please put this in a generic header so that this hunk represents the
> *default* x86 implementation that is used universally on x86.

As I said, I disagree with you on the style preference.

If a maintainer prefers it to be done in your way, I'll move the macros.

> Then, please do
> 
> #ifndef CONFIG_MKTME_WHATEVER
> #define page_keyid(x) (0)
> #endif

Default page_keyid() implementation returns 0.

> >  #else
> >  #define mktme_keyid_mask	((phys_addr_t)0)
> >  #define mktme_nr_keyids		0
> > diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
> > index 53c32af895ab..ffad496aadad 100644
> > --- a/arch/x86/include/asm/page_64.h
> > +++ b/arch/x86/include/asm/page_64.h
> > @@ -23,7 +23,7 @@ static inline unsigned long __phys_addr_nodebug(unsigned long x)
> >  	/* use the carry flag to determine if x was < __START_KERNEL_map */
> >  	x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));
> >  
> > -	return x;
> > +	return x % direct_mapping_size;
> 
> There are almost *surely* performance implications from this that affect
> anyone with this compile option turned on.  There's now a 64-bit integer
> division operation which is used in places like kfree().

Fair point. Apparently, a modern CPU is good enough to hide the overhead.
I'll look into how to avoid the division.

After a quick look, the only way to get it cheap (nearly free on my CPU) is
to have a power-of-2 direct_mapping_size and mask the address before
returning it.

If direct_mapping_size is not a power of 2, the best variant I've come up
with so far costs a branch for non-encrypted memory.

For encrypted memory it is a branch, a 32-bit division and some bit shifting
and masking.

I'll look into this more.
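
Something along these lines (sketch only, assuming direct_mapping_size is
rounded up to a power of two):

	/* Hypothetical replacement for "return x % direct_mapping_size;" */
	return x & (direct_mapping_size - 1);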

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-06-26 17:30   ` Randy Dunlap
@ 2018-06-27 21:57     ` Kirill A. Shutemov
  2018-06-27 23:48       ` Randy Dunlap
  0 siblings, 1 reply; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-06-27 21:57 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky,
	Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm

On Tue, Jun 26, 2018 at 05:30:12PM +0000, Randy Dunlap wrote:
> On 06/26/2018 07:22 AM, Kirill A. Shutemov wrote:
> > Add new config option to enabled/disable Multi-Key Total Memory
> > Encryption support.
> > 
> > MKTME uses MEMORY_PHYSICAL_PADDING to reserve enough space in per-KeyID
> > direct mappings for memory hotplug.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/Kconfig | 19 ++++++++++++++++++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index fa5e1ec09247..9a843bd63108 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1523,6 +1523,23 @@ config ARCH_USE_MEMREMAP_PROT
> >  	def_bool y
> >  	depends on AMD_MEM_ENCRYPT
> >  
> > +config X86_INTEL_MKTME
> > +	bool "Intel Multi-Key Total Memory Encryption"
> > +	select DYNAMIC_PHYSICAL_MASK
> > +	select PAGE_EXTENSION
> > +	depends on X86_64 && CPU_SUP_INTEL
> > +	---help---
> > +	  Say yes to enable support for Multi-Key Total Memory Encryption.
> > +	  This requires an Intel processor that has support of the feature.
> > +
> > +	  Multikey Total Memory Encryption (MKTME) is a technology that allows
> > +	  transparent memory encryption in and upcoming Intel platforms.
> 
> huh?  Maybe drop the "and"?

Ugh.. It has to be "an".

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-06-27 21:57     ` Kirill A. Shutemov
@ 2018-06-27 23:48       ` Randy Dunlap
  0 siblings, 0 replies; 36+ messages in thread
From: Randy Dunlap @ 2018-06-27 23:48 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky,
	Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm

On 06/27/2018 02:57 PM, Kirill A. Shutemov wrote:
> On Tue, Jun 26, 2018 at 05:30:12PM +0000, Randy Dunlap wrote:
>> On 06/26/2018 07:22 AM, Kirill A. Shutemov wrote:
>>> Add new config option to enabled/disable Multi-Key Total Memory
>>> Encryption support.
>>>
>>> MKTME uses MEMORY_PHYSICAL_PADDING to reserve enough space in per-KeyID
>>> direct mappings for memory hotplug.
>>>
>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> ---
>>>  arch/x86/Kconfig | 19 ++++++++++++++++++-
>>>  1 file changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>> index fa5e1ec09247..9a843bd63108 100644
>>> --- a/arch/x86/Kconfig
>>> +++ b/arch/x86/Kconfig
>>> @@ -1523,6 +1523,23 @@ config ARCH_USE_MEMREMAP_PROT
>>>  	def_bool y
>>>  	depends on AMD_MEM_ENCRYPT
>>>  
>>> +config X86_INTEL_MKTME
>>> +	bool "Intel Multi-Key Total Memory Encryption"
>>> +	select DYNAMIC_PHYSICAL_MASK
>>> +	select PAGE_EXTENSION
>>> +	depends on X86_64 && CPU_SUP_INTEL
>>> +	---help---
>>> +	  Say yes to enable support for Multi-Key Total Memory Encryption.
>>> +	  This requires an Intel processor that has support of the feature.
>>> +
>>> +	  Multikey Total Memory Encryption (MKTME) is a technology that allows
>>> +	  transparent memory encryption in and upcoming Intel platforms.
>>
>> huh?  Maybe drop the "and"?
> 
> Ugh.. It has to be "an".

an ... platform.
or
in upcoming Intel platforms.


-- 
~Randy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 02/18] mm/ksm: Do not merge pages with different KeyIDs
  2018-06-26 14:22 ` [PATCHv4 02/18] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
@ 2018-07-09 18:03   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-07-09 18:03 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky,
	Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm

On Tue, Jun 26, 2018 at 05:22:29PM +0300, Kirill A. Shutemov wrote:
> Pages encrypted with different encryption keys are not subject to KSM

Perhaps not allowed instead of subject?
> merge. Otherwise it would cross security boundary.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  include/linux/mm.h | 7 +++++++
>  mm/ksm.c           | 3 +++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ebf4bd8bd0bf..406a28cadfcf 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1548,6 +1548,13 @@ static inline int vma_keyid(struct vm_area_struct *vma)
>  }
>  #endif
>  
> +#ifndef page_keyid
> +static inline int page_keyid(struct page *page)
> +{
> +	return 0;
> +}
> +#endif
> +
>  #ifdef CONFIG_SHMEM
>  /*
>   * The vma_is_shmem is not inline because it is used only by slow
> diff --git a/mm/ksm.c b/mm/ksm.c
> index a6d43cf9a982..1bd7b9710e29 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1214,6 +1214,9 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
>  	if (!PageAnon(page))
>  		goto out;
>  
> +	if (page_keyid(page) != page_keyid(kpage))
> +		goto out;
> +
>  	/*
>  	 * We need the page lock to read a stable PageSwapCache in
>  	 * write_protect_page().  We use trylock_page() instead of
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 07/18] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-06-26 14:22 ` [PATCHv4 07/18] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
@ 2018-07-09 18:09   ` Konrad Rzeszutek Wilk
  2018-07-10 10:48     ` Kirill A. Shutemov
  0 siblings, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-07-09 18:09 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky,
	Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm

> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 4b101dd6e52f..4ebee899c363 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= pti.o
>  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
>  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
>  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
> +
> +obj-$(CONFIG_X86_INTEL_MKTME)	+= mktme.o

Any particular reason to have x86 in the CONFIG?

> diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> new file mode 100644
> index 000000000000..467f1b26c737
> --- /dev/null
> +++ b/arch/x86/mm/mktme.c
> @@ -0,0 +1,5 @@
> +#include <asm/mktme.h>
> +
> +phys_addr_t mktme_keyid_mask;
> +int mktme_nr_keyids;
> +int mktme_keyid_shift;
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 13/18] x86/mm: Allow to disable MKTME after enumeration
  2018-06-26 14:22 ` [PATCHv4 13/18] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
@ 2018-07-09 18:20   ` Konrad Rzeszutek Wilk
  2018-07-10 10:49     ` Kirill A. Shutemov
  0 siblings, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-07-09 18:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky,
	Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm

On Tue, Jun 26, 2018 at 05:22:40PM +0300, Kirill A. Shutemov wrote:
> The new helper mktme_disable() allows to disable MKTME even if it's
> enumerated successfully. MKTME initialization may fail and this
> functionality allows system to boot regardless of the failure.
> 
> MKTME needs per-KeyID direct mapping. It requires a lot more virtual
> address space which may be a problem in 4-level paging mode. If the
> system has more physical memory than we can handle with MKTME.

.. then what should happen?
> The feature allows to fail MKTME, but boot the system successfully.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/asm/mktme.h | 2 ++
>  arch/x86/kernel/cpu/intel.c  | 5 +----
>  arch/x86/mm/mktme.c          | 9 +++++++++
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
> index 44409b8bbaca..ebbee6a0c495 100644
> --- a/arch/x86/include/asm/mktme.h
> +++ b/arch/x86/include/asm/mktme.h
> @@ -6,6 +6,8 @@
>  
>  struct vm_area_struct;
>  
> +void mktme_disable(void);
> +
>  #ifdef CONFIG_X86_INTEL_MKTME
>  extern phys_addr_t mktme_keyid_mask;
>  extern int mktme_nr_keyids;
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index efc9e9fc47d4..75e3b2602b4a 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -591,10 +591,7 @@ static void detect_tme(struct cpuinfo_x86 *c)
>  		 * Maybe needed if there's inconsistent configuation
>  		 * between CPUs.
>  		 */
> -		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> -		mktme_keyid_mask = 0;
> -		mktme_keyid_shift = 0;
> -		mktme_nr_keyids = 0;
> +		mktme_disable();
>  	}
>  #endif
>  
> diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> index 1194496633ce..bb6210dbcf0e 100644
> --- a/arch/x86/mm/mktme.c
> +++ b/arch/x86/mm/mktme.c
> @@ -13,6 +13,15 @@ static inline bool mktme_enabled(void)
>  	return static_branch_unlikely(&mktme_enabled_key);
>  }
>  
> +void mktme_disable(void)
> +{
> +	physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> +	mktme_keyid_mask = 0;
> +	mktme_keyid_shift = 0;
> +	mktme_nr_keyids = 0;
> +	static_branch_disable(&mktme_enabled_key);
> +}
> +
>  int page_keyid(const struct page *page)
>  {
>  	if (!mktme_enabled())
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 15/18] x86/mm: Calculate direct mapping size
  2018-06-26 14:22 ` [PATCHv4 15/18] x86/mm: Calculate direct mapping size Kirill A. Shutemov
@ 2018-07-09 18:32   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-07-09 18:32 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky,
	Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm

On Tue, Jun 26, 2018 at 05:22:42PM +0300, Kirill A. Shutemov wrote:
> The kernel needs to have a way to access encrypted memory. We have two
> option on how approach it:
> 
>  - Create temporary mappings every time kernel needs access to encrypted
>    memory. That's basically brings highmem and its overhead back.
> 
>  - Create multiple direct mappings, one per-KeyID. In this setup we
>    don't need to create temporary mappings on the fly -- encrypted
>    memory is permanently available in kernel address space.
> 
> We take the second approach as it has lower overhead.
> 
> It's worth noting that with per-KeyID direct mappings compromised kernel
> would give access to decrypted data right away without additional tricks
> to get memory mapped with the correct KeyID.
> 
> Per-KeyID mappings require a lot more virtual address space. On 4-level
> machine with 64 KeyIDs we max out 46-bit virtual address space dedicated
> for direct mapping with 1TiB of RAM. Given that we round up any
> calculation on direct mapping size to 1TiB, we effectively claim all
> 46-bit address space for direct mapping on such machine regardless of
> RAM size.
> 
> Increased usage of virtual address space has implications for KASLR:
> we have less space for randomization. With 64 TiB claimed for direct
> mapping with 4-level we left with 27 TiB of entropy to place
> page_offset_base, vmalloc_base and vmemmap_base.
> 
> 5-level paging provides much wider virtual address space and KASLR
> doesn't suffer significantly from per-KeyID direct mappings.
> 
> It's preferred to run MKTME with 5-level paging.


Why not make this a config dependency then?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-06-26 14:22 ` [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
  2018-06-26 17:30   ` Randy Dunlap
@ 2018-07-09 18:36   ` Konrad Rzeszutek Wilk
  2018-07-09 18:44     ` Dave Hansen
  1 sibling, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-07-09 18:36 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky,
	Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm

On Tue, Jun 26, 2018 at 05:22:45PM +0300, Kirill A. Shutemov wrote:
> Add new config option to enabled/disable Multi-Key Total Memory
> Encryption support.
> 
> MKTME uses MEMORY_PHYSICAL_PADDING to reserve enough space in per-KeyID
> direct mappings for memory hotplug.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/Kconfig | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index fa5e1ec09247..9a843bd63108 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1523,6 +1523,23 @@ config ARCH_USE_MEMREMAP_PROT
>  	def_bool y
>  	depends on AMD_MEM_ENCRYPT
>  
> +config X86_INTEL_MKTME

Rip out the X86?
> +	bool "Intel Multi-Key Total Memory Encryption"
> +	select DYNAMIC_PHYSICAL_MASK
> +	select PAGE_EXTENSION

And maybe select 5-page?
> +	depends on X86_64 && CPU_SUP_INTEL
> +	---help---
> +	  Say yes to enable support for Multi-Key Total Memory Encryption.
> +	  This requires an Intel processor that has support of the feature.
> +
> +	  Multikey Total Memory Encryption (MKTME) is a technology that allows
> +	  transparent memory encryption in and upcoming Intel platforms.

How about saying which CPUs? Or just dropping this?
> +
> +	  MKTME is built on top of TME. TME allows encryption of the entirety
> +	  of system memory using a single key. MKTME allows having multiple
> +	  encryption domains, each having own key -- different memory pages can
> +	  be encrypted with different keys.
> +
>  # Common NUMA Features
>  config NUMA
>  	bool "Numa Memory Allocation and Scheduler Support"
> @@ -2199,7 +2216,7 @@ config RANDOMIZE_MEMORY
>  
>  config MEMORY_PHYSICAL_PADDING
>  	hex "Physical memory mapping padding" if EXPERT
> -	depends on RANDOMIZE_MEMORY
> +	depends on RANDOMIZE_MEMORY || X86_INTEL_MKTME
>  	default "0xa" if MEMORY_HOTPLUG
>  	default "0x0"
>  	range 0x1 0x40 if MEMORY_HOTPLUG
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-07-09 18:36   ` Konrad Rzeszutek Wilk
@ 2018-07-09 18:44     ` Dave Hansen
  2018-07-09 18:52       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 36+ messages in thread
From: Dave Hansen @ 2018-07-09 18:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Kirill A. Shutemov
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky,
	Kai Huang, Jacob Pan, linux-kernel, linux-mm

On 07/09/2018 11:36 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Jun 26, 2018 at 05:22:45PM +0300, Kirill A. Shutemov wrote:
> Rip out the X86?
>> +	bool "Intel Multi-Key Total Memory Encryption"
>> +	select DYNAMIC_PHYSICAL_MASK
>> +	select PAGE_EXTENSION
> 
> And maybe select 5-page?

Why?  It's not a strict dependency.  You *can* build a 4-level kernel
and run it on smaller systems.

>> +	depends on X86_64 && CPU_SUP_INTEL
>> +	---help---
>> +	  Say yes to enable support for Multi-Key Total Memory Encryption.
>> +	  This requires an Intel processor that has support of the feature.
>> +
>> +	  Multikey Total Memory Encryption (MKTME) is a technology that allows
>> +	  transparent memory encryption in and upcoming Intel platforms.
> 
> How about saying which CPUs? Or just dropping this?

We don't have any information to share about specifically which processors
will have this feature.  But, this config text does tell someone that they
can't use this feature on today's platforms.

We _did_ say this for previous features (protection keys stands out
where we said it was for "Skylake Servers" IIRC), but we are not yet
able to do the same for this feature.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-07-09 18:44     ` Dave Hansen
@ 2018-07-09 18:52       ` Konrad Rzeszutek Wilk
  2018-07-09 18:59         ` Dave Hansen
  0 siblings, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-07-09 18:52 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Mon, Jul 09, 2018 at 11:44:33AM -0700, Dave Hansen wrote:
> On 07/09/2018 11:36 AM, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jun 26, 2018 at 05:22:45PM +0300, Kirill A. Shutemov wrote:
> > Rip out the X86?
> >> +	bool "Intel Multi-Key Total Memory Encryption"
> >> +	select DYNAMIC_PHYSICAL_MASK
> >> +	select PAGE_EXTENSION
> > 
> > And maybe select 5-page?
> 
> Why?  It's not a strict dependency.  You *can* build a 4-level kernel
> and run it on smaller systems.

Sure, but in one of his commits he mentions that we may run in overlapping
physical memory if we use 4-level paging. Hence why not just move to 5-level
paging and simplify this.
> 
> >> +	depends on X86_64 && CPU_SUP_INTEL
> >> +	---help---
> >> +	  Say yes to enable support for Multi-Key Total Memory Encryption.
> >> +	  This requires an Intel processor that has support of the feature.
> >> +
> >> +	  Multikey Total Memory Encryption (MKTME) is a technology that allows
> >> +	  transparent memory encryption in and upcoming Intel platforms.
> > 
> > How about saying which CPUs? Or just dropping this?
> 
> We don't have any information about specifically which processors with
> have this feature to share.  But, this config text does tell someone
> that they can't use this feature on today's platforms.
> 
> We _did_ say this for previous features (protection keys stands out
> where we said it was for "Skylake Servers" IIRC), but we are not yet
> able to do the same for this feature.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-07-09 18:52       ` Konrad Rzeszutek Wilk
@ 2018-07-09 18:59         ` Dave Hansen
  2018-07-09 20:29           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 36+ messages in thread
From: Dave Hansen @ 2018-07-09 18:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On 07/09/2018 11:52 AM, Konrad Rzeszutek Wilk wrote:
> On Mon, Jul 09, 2018 at 11:44:33AM -0700, Dave Hansen wrote:
>> On 07/09/2018 11:36 AM, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Jun 26, 2018 at 05:22:45PM +0300, Kirill A. Shutemov wrote:
>>> Rip out the X86?
>>>> +	bool "Intel Multi-Key Total Memory Encryption"
>>>> +	select DYNAMIC_PHYSICAL_MASK
>>>> +	select PAGE_EXTENSION
>>>
>>> And maybe select 5-page?
>>
>> Why?  It's not a strict dependency.  You *can* build a 4-level kernel
>> and run it on smaller systems.
> 
> Sure, but in one of his commits he mentions that we may run in overlapping
> physical memory if we use 4-level paging. Hence why not just move to 5-level
> paging and simplify this.

I'm not sure it _actually_ simplifies anything.  We still need code to
handle the cases where we bump into the limits because even 5-level
paging systems can hit the *architectural* limits.  We just don't think
we'll bump into those limits any time soon in practice since they're
512x larger on 5-level systems.

But, a future system that needs physical address space or has a bunch
more KeyID bits might bump into the limits.

It's also _possible_ that a processor could come out that supports MKTME
but not 5-level paging, or a hypervisor would expose such a
configuration to a guest.  We've asked our colleagues very nicely that
Intel not make a processor that does this, but it's still possible one
shows up.
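
To make the limits above concrete, here is a rough, self-contained sketch of
the arithmetic. The per-KeyID direct mapping needs roughly one copy of the
direct mapping per KeyID plus the ordinary KeyID-0 mapping, and all of them
have to fit in the kernel's direct-mapping region. The sizes used below (a
64 TiB region with 4-level paging, 512 times that with 5-level) and the
helper itself are illustrative assumptions for this sketch, not code or
values from the patchset.

#include <stdbool.h>
#include <stdio.h>

#define TiB (1ULL << 40)

/* Can (nr_keyids + 1) copies of physical memory fit in the direct map? */
static bool mktme_direct_map_fits(unsigned long long max_phys_mem,
				  int nr_keyids,
				  unsigned long long direct_map_size)
{
	return (unsigned long long)(nr_keyids + 1) * max_phys_mem <=
	       direct_map_size;
}

int main(void)
{
	unsigned long long dmap_4level = 64 * TiB;		/* assumed size */
	unsigned long long dmap_5level = 512 * dmap_4level;	/* "512x larger" */

	printf("4-level, 1 TiB RAM, 63 keys: %s\n",
	       mktme_direct_map_fits(1 * TiB, 63, dmap_4level) ? "fits" : "does not fit");
	printf("4-level, 4 TiB RAM, 63 keys: %s\n",
	       mktme_direct_map_fits(4 * TiB, 63, dmap_4level) ? "fits" : "does not fit");
	printf("5-level, 4 TiB RAM, 63 keys: %s\n",
	       mktme_direct_map_fits(4 * TiB, 63, dmap_5level) ? "fits" : "does not fit");
	return 0;
}

With these assumed numbers a 4-level kernel with 63 keys stops fitting
somewhere above 1 TiB of RAM, while the same configuration has plenty of
headroom with 5-level paging; that is the 512x margin referred to above.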

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-07-09 18:59         ` Dave Hansen
@ 2018-07-09 20:29           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-07-09 20:29 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Mon, Jul 09, 2018 at 11:59:33AM -0700, Dave Hansen wrote:
> On 07/09/2018 11:52 AM, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jul 09, 2018 at 11:44:33AM -0700, Dave Hansen wrote:
> >> On 07/09/2018 11:36 AM, Konrad Rzeszutek Wilk wrote:
> >>> On Tue, Jun 26, 2018 at 05:22:45PM +0300, Kirill A. Shutemov wrote:
> >>> Rip out the X86?
> >>>> +	bool "Intel Multi-Key Total Memory Encryption"
> >>>> +	select DYNAMIC_PHYSICAL_MASK
> >>>> +	select PAGE_EXTENSION
> >>>
> >>> And maybe select 5-level paging?
> >>
> >> Why?  It's not a strict dependency.  You *can* build a 4-level kernel
> >> and run it on smaller systems.
> > 
> > Sure, but in one of his commits he mentions that we may run into
> > overlapping physical memory mappings if we use 4-level paging. Hence, why
> > not just move to 5-level paging and simplify this?
> 
> I'm not sure it _actually_ simplifies anything.  We still need code to
> handle the cases where we bump into the limits because even 5-level
> paging systems can hit the *architectural* limits.  We just don't think
> we'll bump into those limits any time soon in practice since they're
> 512x larger on 5-level systems.
> 
> But, a future system that needs more physical address space or has a bunch
> more KeyID bits might bump into the limits.

Yikes. So when will we expand to 128-bit page fields?

> 
> It's also _possible_ that a processor could come out that supports MKTME
> but not 5-level paging, or a hypervisor would expose such a
> configuration to a guest.  We've asked our colleagues very nicely that
> Intel not make a processor that does this, but it's still possible one
> shows up.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 07/18] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-09 18:09   ` Konrad Rzeszutek Wilk
@ 2018-07-10 10:48     ` Kirill A. Shutemov
  0 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-07-10 10:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Dave Hansen, Kai Huang, Jacob Pan,
	linux-kernel, linux-mm

On Mon, Jul 09, 2018 at 02:09:49PM -0400, Konrad Rzeszutek Wilk wrote:
> > diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> > index 4b101dd6e52f..4ebee899c363 100644
> > --- a/arch/x86/mm/Makefile
> > +++ b/arch/x86/mm/Makefile
> > @@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= pti.o
> >  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
> >  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
> >  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
> > +
> > +obj-$(CONFIG_X86_INTEL_MKTME)	+= mktme.o
> 
> Any particular reason to have x86 in the CONFIG?

It is consistent with MPX and protection keys.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 13/18] x86/mm: Allow to disable MKTME after enumeration
  2018-07-09 18:20   ` Konrad Rzeszutek Wilk
@ 2018-07-10 10:49     ` Kirill A. Shutemov
  2018-07-10 11:21       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 36+ messages in thread
From: Kirill A. Shutemov @ 2018-07-10 10:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Dave Hansen, Kai Huang, Jacob Pan,
	linux-kernel, linux-mm

On Mon, Jul 09, 2018 at 02:20:55PM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jun 26, 2018 at 05:22:40PM +0300, Kirill A. Shutemov wrote:
> > The new helper mktme_disable() allows to disable MKTME even if it's
> > enumerated successfully. MKTME initialization may fail and this
> > functionality allows system to boot regardless of the failure.
> > 
> > MKTME needs per-KeyID direct mapping. It requires a lot more virtual
> > address space which may be a problem in 4-level paging mode. If the
> > system has more physical memory than we can handle with MKTME.
> 
> .. then what should happen?

We fail MKTME initialization and boot the system. See next sentence.

> > The feature allows to fail MKTME, but boot the system successfully.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/include/asm/mktme.h | 2 ++
> >  arch/x86/kernel/cpu/intel.c  | 5 +----
> >  arch/x86/mm/mktme.c          | 9 +++++++++
> >  3 files changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
> > index 44409b8bbaca..ebbee6a0c495 100644
> > --- a/arch/x86/include/asm/mktme.h
> > +++ b/arch/x86/include/asm/mktme.h
> > @@ -6,6 +6,8 @@
> >  
> >  struct vm_area_struct;
> >  
> > +void mktme_disable(void);
> > +
> >  #ifdef CONFIG_X86_INTEL_MKTME
> >  extern phys_addr_t mktme_keyid_mask;
> >  extern int mktme_nr_keyids;
> > diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> > index efc9e9fc47d4..75e3b2602b4a 100644
> > --- a/arch/x86/kernel/cpu/intel.c
> > +++ b/arch/x86/kernel/cpu/intel.c
> > @@ -591,10 +591,7 @@ static void detect_tme(struct cpuinfo_x86 *c)
> >  		 * Maybe needed if there's inconsistent configuation
> >  		 * between CPUs.
> >  		 */
> > -		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> > -		mktme_keyid_mask = 0;
> > -		mktme_keyid_shift = 0;
> > -		mktme_nr_keyids = 0;
> > +		mktme_disable();
> >  	}
> >  #endif
> >  
> > diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> > index 1194496633ce..bb6210dbcf0e 100644
> > --- a/arch/x86/mm/mktme.c
> > +++ b/arch/x86/mm/mktme.c
> > @@ -13,6 +13,15 @@ static inline bool mktme_enabled(void)
> >  	return static_branch_unlikely(&mktme_enabled_key);
> >  }
> >  
> > +void mktme_disable(void)
> > +{
> > +	physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> > +	mktme_keyid_mask = 0;
> > +	mktme_keyid_shift = 0;
> > +	mktme_nr_keyids = 0;
> > +	static_branch_disable(&mktme_enabled_key);
> > +}
> > +
> >  int page_keyid(const struct page *page)
> >  {
> >  	if (!mktme_enabled())
> > -- 
> > 2.18.0
> > 
> 

-- 
 Kirill A. Shutemov
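
As background on the variables that mktme_disable() resets in the hunk quoted
above (physical_mask, mktme_keyid_mask, mktme_keyid_shift, mktme_nr_keyids),
here is a small runnable userspace sketch of how those masks relate and what
clearing them means. The bit layout used here (6 KeyID bits at the top of a
52-bit physical address) is an assumption chosen for illustration, not an
enumerated value from real hardware.

#include <stdio.h>

/* Assumed layout for the example: 6 KeyID bits at bits 51:46. */
#define __PHYSICAL_MASK_SHIFT	52
#define KEYID_BITS		6
#define KEYID_SHIFT		(__PHYSICAL_MASK_SHIFT - KEYID_BITS)	/* 46 */

int main(void)
{
	unsigned long long keyid_mask = ((1ULL << KEYID_BITS) - 1) << KEYID_SHIFT;
	unsigned long long phys_mask  = (1ULL << KEYID_SHIFT) - 1;	      /* MKTME enabled */
	unsigned long long full_mask  = (1ULL << __PHYSICAL_MASK_SHIFT) - 1; /* after mktme_disable() */

	unsigned long long paddr = 0x123456000ULL;	/* some page-aligned physical address */
	int keyid = 5;

	/* The KeyID rides in the upper bits, above the usable physical address. */
	unsigned long long with_keyid = (paddr & phys_mask) |
					((unsigned long long)keyid << KEYID_SHIFT);

	printf("mktme_keyid_mask = %#llx\n", keyid_mask);
	printf("physical_mask    = %#llx (MKTME enabled)\n", phys_mask);
	printf("physical_mask    = %#llx (after mktme_disable())\n", full_mask);
	printf("paddr %#llx with keyid %d -> %#llx\n", paddr, keyid, with_keyid);
	printf("decoded keyid = %llu, decoded paddr = %#llx\n",
	       (with_keyid & keyid_mask) >> KEYID_SHIFT, with_keyid & phys_mask);
	return 0;
}

Once mktme_disable() has run, the KeyID mask and shift are zero and
physical_mask covers the full __PHYSICAL_MASK_SHIFT range again, so the upper
bits are treated as ordinary physical address bits and the kernel behaves as
if MKTME had never been enumerated.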

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv4 13/18] x86/mm: Allow to disable MKTME after enumeration
  2018-07-10 10:49     ` Kirill A. Shutemov
@ 2018-07-10 11:21       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-07-10 11:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Dave Hansen, Kai Huang, Jacob Pan,
	linux-kernel, linux-mm

On July 10, 2018 6:49:10 AM EDT, "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>On Mon, Jul 09, 2018 at 02:20:55PM -0400, Konrad Rzeszutek Wilk wrote:
>> On Tue, Jun 26, 2018 at 05:22:40PM +0300, Kirill A. Shutemov wrote:
>> > The new helper mktme_disable() allows to disable MKTME even if it's
>> > enumerated successfully. MKTME initialization may fail and this
>> > functionality allows system to boot regardless of the failure.
>> > 
>> > MKTME needs per-KeyID direct mapping. It requires a lot more virtual
>> > address space which may be a problem in 4-level paging mode. If the
>> > system has more physical memory than we can handle with MKTME.
>> 
>> .. then what should happen?
>
>We fail MKTME initialization and boot the system. See next sentence.

Perhaps you can then remove the "." and join the sentences.
>
>> > The feature allows to fail MKTME, but boot the system successfully.
>> > 
>> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> > ---
>> >  arch/x86/include/asm/mktme.h | 2 ++
>> >  arch/x86/kernel/cpu/intel.c  | 5 +----
>> >  arch/x86/mm/mktme.c          | 9 +++++++++
>> >  3 files changed, 12 insertions(+), 4 deletions(-)
>> > 
>> > diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
>> > index 44409b8bbaca..ebbee6a0c495 100644
>> > --- a/arch/x86/include/asm/mktme.h
>> > +++ b/arch/x86/include/asm/mktme.h
>> > @@ -6,6 +6,8 @@
>> >  
>> >  struct vm_area_struct;
>> >  
>> > +void mktme_disable(void);
>> > +
>> >  #ifdef CONFIG_X86_INTEL_MKTME
>> >  extern phys_addr_t mktme_keyid_mask;
>> >  extern int mktme_nr_keyids;
>> > diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
>> > index efc9e9fc47d4..75e3b2602b4a 100644
>> > --- a/arch/x86/kernel/cpu/intel.c
>> > +++ b/arch/x86/kernel/cpu/intel.c
>> > @@ -591,10 +591,7 @@ static void detect_tme(struct cpuinfo_x86 *c)
>> >  		 * Maybe needed if there's inconsistent configuation
>> >  		 * between CPUs.
>> >  		 */
>> > -		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
>> > -		mktme_keyid_mask = 0;
>> > -		mktme_keyid_shift = 0;
>> > -		mktme_nr_keyids = 0;
>> > +		mktme_disable();
>> >  	}
>> >  #endif
>> >  
>> > diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
>> > index 1194496633ce..bb6210dbcf0e 100644
>> > --- a/arch/x86/mm/mktme.c
>> > +++ b/arch/x86/mm/mktme.c
>> > @@ -13,6 +13,15 @@ static inline bool mktme_enabled(void)
>> >  	return static_branch_unlikely(&mktme_enabled_key);
>> >  }
>> >  
>> > +void mktme_disable(void)
>> > +{
>> > +	physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
>> > +	mktme_keyid_mask = 0;
>> > +	mktme_keyid_shift = 0;
>> > +	mktme_nr_keyids = 0;
>> > +	static_branch_disable(&mktme_enabled_key);
>> > +}
>> > +
>> >  int page_keyid(const struct page *page)
>> >  {
>> >  	if (!mktme_enabled())
>> > -- 
>> > 2.18.0
>> > 
>> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2018-07-10 11:24 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-26 14:22 [PATCHv4 00/18] MKTME enabling Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 01/18] mm: Do no merge VMAs with different encryption KeyIDs Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 02/18] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
2018-07-09 18:03   ` Konrad Rzeszutek Wilk
2018-06-26 14:22 ` [PATCHv4 03/18] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 04/18] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 05/18] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 06/18] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 07/18] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
2018-07-09 18:09   ` Konrad Rzeszutek Wilk
2018-07-10 10:48     ` Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 08/18] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 09/18] x86/mm: Implement page_keyid() using page_ext Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 10/18] x86/mm: Implement vma_keyid() Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 11/18] x86/mm: Implement prep_encrypted_page() and arch_free_page() Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 12/18] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 13/18] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
2018-07-09 18:20   ` Konrad Rzeszutek Wilk
2018-07-10 10:49     ` Kirill A. Shutemov
2018-07-10 11:21       ` Konrad Rzeszutek Wilk
2018-06-26 14:22 ` [PATCHv4 14/18] x86/mm: Detect MKTME early Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 15/18] x86/mm: Calculate direct mapping size Kirill A. Shutemov
2018-07-09 18:32   ` Konrad Rzeszutek Wilk
2018-06-26 14:22 ` [PATCHv4 16/18] x86/mm: Implement sync_direct_mapping() Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 17/18] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
2018-06-26 16:38   ` Dave Hansen
2018-06-27 21:56     ` Kirill A. Shutemov
2018-06-26 14:22 ` [PATCHv4 18/18] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
2018-06-26 17:30   ` Randy Dunlap
2018-06-27 21:57     ` Kirill A. Shutemov
2018-06-27 23:48       ` Randy Dunlap
2018-07-09 18:36   ` Konrad Rzeszutek Wilk
2018-07-09 18:44     ` Dave Hansen
2018-07-09 18:52       ` Konrad Rzeszutek Wilk
2018-07-09 18:59         ` Dave Hansen
2018-07-09 20:29           ` Konrad Rzeszutek Wilk
