linux-kernel.vger.kernel.org archive mirror
* [PATCHv5 00/19] MKTME enabling
@ 2018-07-17 11:20 Kirill A. Shutemov
  2018-07-17 11:20 ` [PATCHv5 01/19] mm: Do not merge VMAs with different encryption KeyIDs Kirill A. Shutemov
                   ` (18 more replies)
  0 siblings, 19 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Multikey Total Memory Encryption (MKTME)[1] is a technology that allows
transparent memory encryption in upcoming Intel platforms. See overview
below.

Here's an updated version of my patchset that brings support for MKTME.
Please review and consider applying.

The patchset provides the in-kernel infrastructure for MKTME, but doesn't
yet have a userspace interface.

The first 8 patches are for core-mm. The rest are x86-specific.

The patchset is on top of the tip tree plus the page_ext cleanups I've
posted earlier[2]. The page_ext cleanups are in the -mm tree now.

Below are performance numbers for a kernel build. Enabling MKTME doesn't
affect performance of non-encrypted memory allocation.

Encrypted memory allocation requires a cache flush on allocating and on
freeing encrypted memory. For a kernel build this results in ~20%
performance degradation if we allocate all anonymous memory as encrypted.

We would need to maintain a per-KeyID pool of free pages to minimize cache
flushing. I'm going to work on that optimization on top of this patchset.

The patchset can also be found here:

git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git mktme/wip

v5:
 - Do not merge VMAs with different KeyIDs (for real).

 - Do not use zero page in encrypted VMAs.

 - Avoid division in __pa(). The division is replaced with masking, which
   makes it near-free. I was not able to measure a difference compared to
   the baseline. direct_mapping_size now has to be a power of 2 if MKTME
   is enabled; only in this case can we use masking there. See the sketch
   below.
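
   A minimal illustration of that trick (a generic sketch, not the exact
   kernel code; the helper name is made up):

   /* Offset within a power-of-2 sized region, without a division. */
   static unsigned long region_offset(unsigned long addr,
                                      unsigned long region_size)
   {
           /* Only valid when region_size is a power of 2 */
           return addr & (region_size - 1); /* same as addr % region_size */
   }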

v4:
 - Address Dave's feedback.

 - Add performance numbers.

v3:
 - The kernel can now access encrypted pages via a per-KeyID direct
   mapping.

 - Rework page allocation for encrypted memory to minimize overhead on
   non-encrypted pages. It comes with a cost for allocation of encrypted
   pages: we have to flush the cache every time we allocate *and* free an
   encrypted page. We will need to optimize this later.

v2:
 - Store the KeyID of a page in page_ext->flags rather than in anon_vma.
   The anon_vma approach turned out to be problematic. The main problem is
   that the anon_vma of the page is no longer stable after the last
   mapcount has gone. We would like to preserve the last used KeyID even
   for freed pages as it allows us to avoid unnecessary cache flushing on
   allocation of an encrypted page. page_ext serves this well enough.

 - KeyID is now propagated through the page allocator. No need for
   GFP_ENCRYPT anymore.

 - Patch "Decouple dynamic __PHYSICAL_MASK from AMD SME" has been fix to
   work with AMD SEV (need to be confirmed by AMD folks).

------------------------------------------------------------------------------

MKTME is built on top of TME. TME allows encryption of the entirety of
system memory using a single key. MKTME allows multiple encryption
domains, each with its own key -- different memory pages can be encrypted
with different keys.

Key design points of Intel MKTME:

 - The initial HW implementation would support up to 63 keys (plus one
   default TME key), but the number of keys may be as low as 3, depending
   on SKU and BIOS settings.

 - To access encrypted memory you need to use a mapping with the proper
   KeyID in the page table entry. The KeyID is encoded in the upper bits
   of the PFN in the page table entry (see the sketch after this list).

 - The CPU does not enforce coherency between mappings of the same
   physical page with different KeyIDs or encryption keys. We need to take
   care of flushing the cache on allocation of an encrypted page and on
   returning it back to the free pool.

 - For managing keys, there's the MKTME_KEY_PROGRAM leaf of the new
   PCONFIG (platform configuration) instruction. It allows loading and
   clearing keys associated with a KeyID. You can also ask the CPU to
   generate a key for you or disable memory encryption when a KeyID is
   used.
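
A sketch of the KeyID encoding in a page table entry (illustrative only,
assuming, say, 52 physical address bits and 6 KeyID bits; this is not code
from the patchset):

        /*
         * With 52 physical address bits and 6 KeyID bits, the KeyID
         * occupies bits 51:46 of the page table entry.
         */
        static u64 pte_set_keyid(u64 pteval, int keyid)
        {
                int keyid_shift = 52 - 6;                       /* 46 */
                u64 keyid_mask = GENMASK_ULL(51, keyid_shift);  /* 0x000fc00000000000 */

                pteval &= ~keyid_mask;
                return pteval | ((u64)keyid << keyid_shift);
        }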

Performance numbers for kernel build:

Base (tip tree):

 Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

    5664711.936917      task-clock (msec)         #   34.815 CPUs utilized            ( +-  0.02% )
         1,033,886      context-switches          #    0.183 K/sec                    ( +-  0.37% )
           189,308      cpu-migrations            #    0.033 K/sec                    ( +-  0.39% )
       104,951,554      page-faults               #    0.019 M/sec                    ( +-  0.01% )
16,907,670,543,945      cycles                    #    2.985 GHz                      ( +-  0.01% )
12,662,345,427,578      stalled-cycles-frontend   #   74.89% frontend cycles idle     ( +-  0.02% )
 9,936,469,878,830      instructions              #    0.59  insn per cycle
                                                  #    1.27  stalled cycles per insn  ( +-  0.00% )
 2,179,100,082,611      branches                  #  384.680 M/sec                    ( +-  0.00% )
    91,235,200,652      branch-misses             #    4.19% of all branches          ( +-  0.01% )

     162.706797586 seconds time elapsed                                          ( +-  0.04% )

CONFIG_X86_INTEL_MKTME=y, no encrypted memory:

 Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

    5668508.245004      task-clock (msec)         #   34.872 CPUs utilized            ( +-  0.02% )
         1,032,034      context-switches          #    0.182 K/sec                    ( +-  0.90% )
           188,098      cpu-migrations            #    0.033 K/sec                    ( +-  1.15% )
       104,964,084      page-faults               #    0.019 M/sec                    ( +-  0.01% )
16,919,270,913,026      cycles                    #    2.985 GHz                      ( +-  0.02% )
12,672,067,815,805      stalled-cycles-frontend   #   74.90% frontend cycles idle     ( +-  0.02% )
 9,942,560,135,477      instructions              #    0.59  insn per cycle
                                                  #    1.27  stalled cycles per insn  ( +-  0.00% )
 2,180,800,745,687      branches                  #  384.722 M/sec                    ( +-  0.00% )
    91,167,857,700      branch-misses             #    4.18% of all branches          ( +-  0.02% )

     162.552503629 seconds time elapsed                                          ( +-  0.10% )

CONFIG_X86_INTEL_MKTME=y, all anonymous memory encrypted with KeyID-1,
paying the cache flush overhead on allocation and free:

 Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

    7041851.999259      task-clock (msec)         #   35.915 CPUs utilized            ( +-  0.01% )
         1,118,938      context-switches          #    0.159 K/sec                    ( +-  0.49% )
           197,039      cpu-migrations            #    0.028 K/sec                    ( +-  0.80% )
       104,970,021      page-faults               #    0.015 M/sec                    ( +-  0.00% )
21,025,639,251,627      cycles                    #    2.986 GHz                      ( +-  0.01% )
16,729,451,765,492      stalled-cycles-frontend   #   79.57% frontend cycles idle     ( +-  0.02% )
10,010,727,735,588      instructions              #    0.48  insn per cycle
                                                  #    1.67  stalled cycles per insn  ( +-  0.00% )
 2,197,110,181,421      branches                  #  312.007 M/sec                    ( +-  0.00% )
    91,119,463,513      branch-misses             #    4.15% of all branches          ( +-  0.01% )

     196.072361087 seconds time elapsed                                          ( +-  0.14% )

[1] https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
[2] https://lkml.kernel.org/r/20180531135457.20167-1-kirill.shutemov@linux.intel.com

Kirill A. Shutemov (19):
  mm: Do not merge VMAs with different encryption KeyIDs
  mm: Do not use zero page in encrypted pages
  mm/ksm: Do not merge pages with different KeyIDs
  mm/page_alloc: Unify alloc_hugepage_vma()
  mm/page_alloc: Handle allocation for encrypted memory
  mm/khugepaged: Handle encrypted pages
  x86/mm: Mask out KeyID bits from page table entry pfn
  x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  x86/mm: Implement page_keyid() using page_ext
  x86/mm: Implement vma_keyid()
  x86/mm: Implement prep_encrypted_page() and arch_free_page()
  x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
  x86/mm: Allow to disable MKTME after enumeration
  x86/mm: Detect MKTME early
  x86/mm: Calculate direct mapping size
  x86/mm: Implement sync_direct_mapping()
  x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  x86: Introduce CONFIG_X86_INTEL_MKTME

 Documentation/x86/x86_64/mm.txt      |   4 +
 arch/alpha/include/asm/page.h        |   2 +-
 arch/s390/include/asm/pgtable.h      |   2 +-
 arch/x86/Kconfig                     |  21 +-
 arch/x86/include/asm/mktme.h         |  47 +++
 arch/x86/include/asm/page.h          |   1 +
 arch/x86/include/asm/page_64.h       |   4 +-
 arch/x86/include/asm/pgtable_types.h |  15 +-
 arch/x86/include/asm/setup.h         |   6 +
 arch/x86/kernel/cpu/intel.c          |  32 +-
 arch/x86/kernel/head64.c             |   4 +
 arch/x86/kernel/setup.c              |   3 +
 arch/x86/mm/Makefile                 |   2 +
 arch/x86/mm/init_64.c                |  68 ++++
 arch/x86/mm/kaslr.c                  |  11 +-
 arch/x86/mm/mktme.c                  | 546 +++++++++++++++++++++++++++
 fs/userfaultfd.c                     |   7 +-
 include/linux/gfp.h                  |  54 ++-
 include/linux/migrate.h              |  12 +-
 include/linux/mm.h                   |  20 +-
 include/linux/page_ext.h             |  11 +-
 mm/compaction.c                      |   1 +
 mm/huge_memory.c                     |   3 +-
 mm/khugepaged.c                      |  10 +
 mm/ksm.c                             |   3 +
 mm/madvise.c                         |   2 +-
 mm/memory.c                          |   3 +-
 mm/mempolicy.c                       |  31 +-
 mm/migrate.c                         |   4 +-
 mm/mlock.c                           |   2 +-
 mm/mmap.c                            |  31 +-
 mm/mprotect.c                        |   2 +-
 mm/page_alloc.c                      |  47 +++
 mm/page_ext.c                        |   3 +
 34 files changed, 954 insertions(+), 60 deletions(-)
 create mode 100644 arch/x86/include/asm/mktme.h
 create mode 100644 arch/x86/mm/mktme.c

-- 
2.18.0



* [PATCHv5 01/19] mm: Do not merge VMAs with different encryption KeyIDs
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-17 11:20 ` [PATCHv5 02/19] mm: Do not use zero page in encrypted pages Kirill A. Shutemov
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

VMAs with different KeyIDs do not mix together. Only VMAs with the same
KeyID are compatible.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/userfaultfd.c   |  7 ++++---
 include/linux/mm.h |  9 ++++++++-
 mm/madvise.c       |  2 +-
 mm/mempolicy.c     |  3 ++-
 mm/mlock.c         |  2 +-
 mm/mmap.c          | 31 +++++++++++++++++++------------
 mm/mprotect.c      |  2 +-
 7 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 594d192b2331..bb0db9f9d958 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -890,7 +890,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 				 new_flags, vma->anon_vma,
 				 vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
-				 NULL_VM_UFFD_CTX);
+				 NULL_VM_UFFD_CTX, vma_keyid(vma));
 		if (prev)
 			vma = prev;
 		else
@@ -1423,7 +1423,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		prev = vma_merge(mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
-				 ((struct vm_userfaultfd_ctx){ ctx }));
+				 ((struct vm_userfaultfd_ctx){ ctx }),
+				 vma_keyid(vma));
 		if (prev) {
 			vma = prev;
 			goto next;
@@ -1581,7 +1582,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 		prev = vma_merge(mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
-				 NULL_VM_UFFD_CTX);
+				 NULL_VM_UFFD_CTX, vma_keyid(vma));
 		if (prev) {
 			vma = prev;
 			goto next;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9a35362bbc92..c8780c5835ad 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1544,6 +1544,13 @@ static inline bool vma_is_anonymous(struct vm_area_struct *vma)
 	return vma->vm_ops == &anon_vm_ops;
 }
 
+#ifndef vma_keyid
+static inline int vma_keyid(struct vm_area_struct *vma)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_SHMEM
 /*
  * The vma_is_shmem is not inline because it is used only by slow
@@ -2219,7 +2226,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
 extern struct vm_area_struct *vma_merge(struct mm_struct *,
 	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
 	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
-	struct mempolicy *, struct vm_userfaultfd_ctx);
+	struct mempolicy *, struct vm_userfaultfd_ctx, int keyid);
 extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
 extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
 	unsigned long addr, int new_below);
diff --git a/mm/madvise.c b/mm/madvise.c
index 4d3c922ea1a1..c88fb12be6e5 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -138,7 +138,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
 			  vma->vm_file, pgoff, vma_policy(vma),
-			  vma->vm_userfaultfd_ctx);
+			  vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (*prev) {
 		vma = *prev;
 		goto success;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f0fcf70bcec7..581b729e05a0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -705,7 +705,8 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			((vmstart - vma->vm_start) >> PAGE_SHIFT);
 		prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
 				 vma->anon_vma, vma->vm_file, pgoff,
-				 new_pol, vma->vm_userfaultfd_ctx);
+				 new_pol, vma->vm_userfaultfd_ctx,
+				 vma_keyid(vma));
 		if (prev) {
 			vma = prev;
 			next = vma->vm_next;
diff --git a/mm/mlock.c b/mm/mlock.c
index 74e5a6547c3d..3c96321b66bb 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -534,7 +534,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
 			  vma->vm_file, pgoff, vma_policy(vma),
-			  vma->vm_userfaultfd_ctx);
+			  vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (*prev) {
 		vma = *prev;
 		goto success;
diff --git a/mm/mmap.c b/mm/mmap.c
index 180f19dfb83f..4c604eb644b4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -984,7 +984,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
  */
 static inline int is_mergeable_vma(struct vm_area_struct *vma,
 				struct file *file, unsigned long vm_flags,
-				struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+				struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+				int keyid)
 {
 	/*
 	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -998,6 +999,8 @@ static inline int is_mergeable_vma(struct vm_area_struct *vma,
 		return 0;
 	if (vma->vm_file != file)
 		return 0;
+	if (vma_keyid(vma) != keyid)
+		return 0;
 	if (vma->vm_ops->close)
 		return 0;
 	if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
@@ -1034,9 +1037,10 @@ static int
 can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
 		     struct anon_vma *anon_vma, struct file *file,
 		     pgoff_t vm_pgoff,
-		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+		     int keyid)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, keyid) &&
 	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
 		if (vma->vm_pgoff == vm_pgoff)
 			return 1;
@@ -1055,9 +1059,10 @@ static int
 can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
 		    struct anon_vma *anon_vma, struct file *file,
 		    pgoff_t vm_pgoff,
-		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+		    int keyid)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, keyid) &&
 	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
 		pgoff_t vm_pglen;
 		vm_pglen = vma_pages(vma);
@@ -1112,7 +1117,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			unsigned long end, unsigned long vm_flags,
 			struct anon_vma *anon_vma, struct file *file,
 			pgoff_t pgoff, struct mempolicy *policy,
-			struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+			struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+			int keyid)
 {
 	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
 	struct vm_area_struct *area, *next;
@@ -1145,7 +1151,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			mpol_equal(vma_policy(prev), policy) &&
 			can_vma_merge_after(prev, vm_flags,
 					    anon_vma, file, pgoff,
-					    vm_userfaultfd_ctx)) {
+					    vm_userfaultfd_ctx, keyid)) {
 		/*
 		 * OK, it can.  Can we now merge in the successor as well?
 		 */
@@ -1154,7 +1160,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 				can_vma_merge_before(next, vm_flags,
 						     anon_vma, file,
 						     pgoff+pglen,
-						     vm_userfaultfd_ctx) &&
+						     vm_userfaultfd_ctx,
+						     keyid) &&
 				is_mergeable_anon_vma(prev->anon_vma,
 						      next->anon_vma, NULL)) {
 							/* cases 1, 6 */
@@ -1177,7 +1184,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			mpol_equal(policy, vma_policy(next)) &&
 			can_vma_merge_before(next, vm_flags,
 					     anon_vma, file, pgoff+pglen,
-					     vm_userfaultfd_ctx)) {
+					     vm_userfaultfd_ctx, keyid)) {
 		if (prev && addr < prev->vm_end)	/* case 4 */
 			err = __vma_adjust(prev, prev->vm_start,
 					 addr, prev->vm_pgoff, NULL, next);
@@ -1722,7 +1729,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	 * Can we just expand an old mapping?
 	 */
 	vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
-			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
+			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, 0);
 	if (vma)
 		goto out;
 
@@ -2987,7 +2994,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
 
 	/* Can we just expand an old private anonymous mapping? */
 	vma = vma_merge(mm, prev, addr, addr + len, flags,
-			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
+			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, 0);
 	if (vma)
 		goto out;
 
@@ -3189,7 +3196,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 		return NULL;	/* should never get here */
 	new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
 			    vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
-			    vma->vm_userfaultfd_ctx);
+			    vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (new_vma) {
 		/*
 		 * Source vma may have been merged into new_vma
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 625608bc8962..68dc476310c0 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -349,7 +349,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*pprev = vma_merge(mm, *pprev, start, end, newflags,
 			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
-			   vma->vm_userfaultfd_ctx);
+			   vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (*pprev) {
 		vma = *pprev;
 		VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
-- 
2.18.0



* [PATCHv5 02/19] mm: Do not use zero page in encrypted pages
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
  2018-07-17 11:20 ` [PATCHv5 01/19] mm: Do not merge VMAs with different encryption KeyIDs Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 17:36   ` Dave Hansen
  2018-07-17 11:20 ` [PATCHv5 03/19] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

The zero page is not encrypted, so putting it into an encrypted VMA
produces garbage.

We could map the zero page with KeyID-0 into an encrypted VMA, but this
would violate the security boundary between encryption domains.

Forbid zero pages in encrypted VMAs.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/s390/include/asm/pgtable.h | 2 +-
 include/linux/mm.h              | 4 ++--
 mm/huge_memory.c                | 3 +--
 mm/memory.c                     | 3 +--
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 5ab636089c60..2e8658962aae 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -505,7 +505,7 @@ static inline int mm_alloc_pgste(struct mm_struct *mm)
  * In the case that a guest uses storage keys
  * faults should no longer be backed by zero pages
  */
-#define mm_forbids_zeropage mm_has_pgste
+#define vma_forbids_zeropage(vma) mm_has_pgste(vma->vm_mm)
 static inline int mm_uses_skeys(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c8780c5835ad..151d6e6b16e5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -92,8 +92,8 @@ extern int mmap_rnd_compat_bits __read_mostly;
  * s390 does this to prevent multiplexing of hardware bits
  * related to the physical page in case of virtualization.
  */
-#ifndef mm_forbids_zeropage
-#define mm_forbids_zeropage(X)	(0)
+#ifndef vma_forbids_zeropage
+#define vma_forbids_zeropage(vma)	vma_keyid(vma)
 #endif
 
 /*
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1cd7c1a57a14..83f096c7299b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -676,8 +676,7 @@ int do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 		return VM_FAULT_OOM;
 	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
 		return VM_FAULT_OOM;
-	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
-			!mm_forbids_zeropage(vma->vm_mm) &&
+	if (!(vmf->flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma) &&
 			transparent_hugepage_use_zero_page()) {
 		pgtable_t pgtable;
 		struct page *zero_page;
diff --git a/mm/memory.c b/mm/memory.c
index 02fbef2bd024..a705637d2ded 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3139,8 +3139,7 @@ static int do_anonymous_page(struct vm_fault *vmf)
 		return 0;
 
 	/* Use the zero-page for reads */
-	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
-			!mm_forbids_zeropage(vma->vm_mm)) {
+	if (!(vmf->flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma)) {
 		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
 						vma->vm_page_prot));
 		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-- 
2.18.0



* [PATCHv5 03/19] mm/ksm: Do not merge pages with different KeyIDs
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
  2018-07-17 11:20 ` [PATCHv5 01/19] mm: Do not merge VMAs with different encryption KeyIDs Kirill A. Shutemov
  2018-07-17 11:20 ` [PATCHv5 02/19] mm: Do not use zero page in encrypted pages Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 17:38   ` Dave Hansen
  2018-07-17 11:20 ` [PATCHv5 04/19] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Pages encrypted with different encryption keys are not allowed to be
merged by KSM. Otherwise it would cross a security boundary.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h | 7 +++++++
 mm/ksm.c           | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 151d6e6b16e5..a4ce26aa0b65 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1551,6 +1551,13 @@ static inline int vma_keyid(struct vm_area_struct *vma)
 }
 #endif
 
+#ifndef page_keyid
+static inline int page_keyid(struct page *page)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_SHMEM
 /*
  * The vma_is_shmem is not inline because it is used only by slow
diff --git a/mm/ksm.c b/mm/ksm.c
index a6d43cf9a982..1bd7b9710e29 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1214,6 +1214,9 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
 	if (!PageAnon(page))
 		goto out;
 
+	if (page_keyid(page) != page_keyid(kpage))
+		goto out;
+
 	/*
 	 * We need the page lock to read a stable PageSwapCache in
 	 * write_protect_page().  We use trylock_page() instead of
-- 
2.18.0



* [PATCHv5 04/19] mm/page_alloc: Unify alloc_hugepage_vma()
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 03/19] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 17:43   ` Dave Hansen
  2018-07-17 11:20 ` [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

We don't need to have separate implementations of alloc_hugepage_vma()
for NUMA and non-NUMA. Using the variant based on alloc_pages_vma()
covers both cases.

This is a preparation patch for allocation of encrypted pages.

alloc_pages_vma() will handle allocation of encrypted pages. With this
change we don't need to cover alloc_hugepage_vma() separately.

The change makes a typo in Alpha's implementation of
__alloc_zeroed_user_highpage() visible. Fix it too.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/alpha/include/asm/page.h | 2 +-
 include/linux/gfp.h           | 6 ++----
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h
index f3fb2848470a..9a6fbb5269f3 100644
--- a/arch/alpha/include/asm/page.h
+++ b/arch/alpha/include/asm/page.h
@@ -18,7 +18,7 @@ extern void clear_page(void *page);
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
-	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vmaddr)
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 extern void copy_page(void * _to, void * _from);
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index a6afcec53795..66f395737990 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -494,21 +494,19 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
 extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
 			struct vm_area_struct *vma, unsigned long addr,
 			int node, bool hugepage);
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order)	\
-	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 #else
 #define alloc_pages(gfp_mask, order) \
 		alloc_pages_node(numa_node_id(), gfp_mask, order)
 #define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
 	alloc_pages(gfp_mask, order)
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order)	\
-	alloc_pages(gfp_mask, order)
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 #define alloc_page_vma(gfp_mask, vma, addr)			\
 	alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)
 #define alloc_page_vma_node(gfp_mask, vma, addr, node)		\
 	alloc_pages_vma(gfp_mask, 0, vma, addr, node, false)
+#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
+	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 
 extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
 extern unsigned long get_zeroed_page(gfp_t gfp_mask);
-- 
2.18.0



* [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 04/19] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 23:03   ` Dave Hansen
  2018-07-17 11:20 ` [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

For encrypted memory, we need to allocate pages for a specific
encryption KeyID.

There are two cases when we need to allocate a page for encryption:

 - Allocation for an encrypted VMA;

 - Allocation for migration of an encrypted page;

The first case can be covered within alloc_page_vma(). We know the KeyID
from the VMA.

The second case requires a few new page allocation routines that
allocate the page for a specific KeyID.

An encrypted page has to be cleared after the KeyID is set. This is
handled in prep_encrypted_page(), which will be provided by arch-specific
code. The calling pattern is sketched below.
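
Condensed, the flow the new helpers follow looks like this (a sketch
mirroring __alloc_pages_nodemask_keyid() from the diff below; the wrapper
name is made up):

        static struct page *alloc_keyid_page(gfp_t gfp_mask, unsigned int order,
                                             int nid, nodemask_t *nodemask, int keyid)
        {
                struct page *page;
                bool need_zero;

                need_zero = encrypted_page_needs_zero(keyid, &gfp_mask); /* strips __GFP_ZERO */
                page = __alloc_pages_nodemask(gfp_mask, order, nid, nodemask);
                prep_encrypted_page(page, order, keyid, need_zero); /* set KeyID, zero if needed */
                return page;
        }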

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/gfp.h     | 48 ++++++++++++++++++++++++++++++++++++-----
 include/linux/migrate.h | 12 ++++++++---
 mm/compaction.c         |  1 +
 mm/mempolicy.c          | 28 ++++++++++++++++++------
 mm/migrate.c            |  4 ++--
 mm/page_alloc.c         | 47 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 123 insertions(+), 17 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 66f395737990..347a40558cfc 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -446,16 +446,46 @@ static inline void arch_free_page(struct page *page, int order) { }
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
 
+#ifndef prep_encrypted_page
+static inline void prep_encrypted_page(struct page *page, int order,
+		int keyid, bool zero)
+{
+}
+#endif
+
+/*
+ * Encrypted page has to be cleared once keyid is set, not on allocation.
+ */
+static inline bool encrypted_page_needs_zero(int keyid, gfp_t *gfp_mask)
+{
+	if (!keyid)
+		return false;
+
+	if (*gfp_mask & __GFP_ZERO) {
+		*gfp_mask &= ~__GFP_ZERO;
+		return true;
+	}
+
+	return false;
+}
+
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 							nodemask_t *nodemask);
 
+struct page *
+__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask, int keyid);
+
 static inline struct page *
 __alloc_pages(gfp_t gfp_mask, unsigned int order, int preferred_nid)
 {
 	return __alloc_pages_nodemask(gfp_mask, order, preferred_nid, NULL);
 }
 
+struct page *__alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order);
+
 /*
  * Allocate pages, preferring the node given as nid. The node must be valid and
  * online. For more general interface, see alloc_pages_node().
@@ -483,6 +513,19 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 	return __alloc_pages_node(nid, gfp_mask, order);
 }
 
+static inline struct page *alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order)
+{
+	if (nid == NUMA_NO_NODE)
+		nid = numa_mem_id();
+
+	return __alloc_pages_node_keyid(nid, keyid, gfp_mask, order);
+}
+
+extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
+			struct vm_area_struct *vma, unsigned long addr,
+			int node, bool hugepage);
+
 #ifdef CONFIG_NUMA
 extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
 
@@ -491,14 +534,9 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	return alloc_pages_current(gfp_mask, order);
 }
-extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
-			struct vm_area_struct *vma, unsigned long addr,
-			int node, bool hugepage);
 #else
 #define alloc_pages(gfp_mask, order) \
 		alloc_pages_node(numa_node_id(), gfp_mask, order)
-#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
-	alloc_pages(gfp_mask, order)
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 #define alloc_page_vma(gfp_mask, vma, addr)			\
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index f2b4abbca55e..fede9bfa89d9 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -38,9 +38,15 @@ static inline struct page *new_page_nodemask(struct page *page,
 	unsigned int order = 0;
 	struct page *new_page = NULL;
 
-	if (PageHuge(page))
+	if (PageHuge(page)) {
+		/*
+		 * HugeTLB doesn't support encryption. We shouldn't see
+		 * such pages.
+		 */
+		WARN_ON(page_keyid(page));
 		return alloc_huge_page_nodemask(page_hstate(compound_head(page)),
 				preferred_nid, nodemask);
+	}
 
 	if (PageTransHuge(page)) {
 		gfp_mask |= GFP_TRANSHUGE;
@@ -50,8 +56,8 @@ static inline struct page *new_page_nodemask(struct page *page,
 	if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
 		gfp_mask |= __GFP_HIGHMEM;
 
-	new_page = __alloc_pages_nodemask(gfp_mask, order,
-				preferred_nid, nodemask);
+	new_page = __alloc_pages_nodemask_keyid(gfp_mask, order,
+				preferred_nid, nodemask, page_keyid(page));
 
 	if (new_page && PageTransHuge(new_page))
 		prep_transhuge_page(new_page);
diff --git a/mm/compaction.c b/mm/compaction.c
index faca45ebe62d..fd51aa32ad96 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1187,6 +1187,7 @@ static struct page *compaction_alloc(struct page *migratepage,
 	list_del(&freepage->lru);
 	cc->nr_freepages--;
 
+	prep_encrypted_page(freepage, 0, page_keyid(migratepage), false);
 	return freepage;
 }
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 581b729e05a0..ce7b436444b5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -921,22 +921,28 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
 /* page allocation callback for NUMA node migration */
 struct page *alloc_new_node_page(struct page *page, unsigned long node)
 {
-	if (PageHuge(page))
+	if (PageHuge(page)) {
+		/*
+		 * HugeTLB doesn't support encryption. We shouldn't see
+		 * such pages.
+		 */
+		WARN_ON(page_keyid(page));
 		return alloc_huge_page_node(page_hstate(compound_head(page)),
 					node);
-	else if (PageTransHuge(page)) {
+	} else if (PageTransHuge(page)) {
 		struct page *thp;
 
-		thp = alloc_pages_node(node,
+		thp = alloc_pages_node_keyid(node, page_keyid(page),
 			(GFP_TRANSHUGE | __GFP_THISNODE),
 			HPAGE_PMD_ORDER);
 		if (!thp)
 			return NULL;
 		prep_transhuge_page(thp);
 		return thp;
-	} else
-		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
-						    __GFP_THISNODE, 0);
+	} else {
+		return __alloc_pages_node_keyid(node, page_keyid(page),
+				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
+	}
 }
 
 /*
@@ -2013,9 +2019,16 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 {
 	struct mempolicy *pol;
 	struct page *page;
-	int preferred_nid;
+	bool zero = false;
+	int keyid, preferred_nid;
 	nodemask_t *nmask;
 
+	keyid = vma_keyid(vma);
+	if (keyid && (gfp & __GFP_ZERO)) {
+		zero = true;
+		gfp &= ~__GFP_ZERO;
+	}
+
 	pol = get_vma_policy(vma, addr);
 
 	if (pol->mode == MPOL_INTERLEAVE) {
@@ -2058,6 +2071,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 	page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
 	mpol_cond_put(pol);
 out:
+	prep_encrypted_page(page, order, keyid, zero);
 	return page;
 }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 8c0af0f7cab1..eb8dea219dcb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1847,7 +1847,7 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 	int nid = (int) data;
 	struct page *newpage;
 
-	newpage = __alloc_pages_node(nid,
+	newpage = __alloc_pages_node_keyid(nid, page_keyid(page),
 					 (GFP_HIGHUSER_MOVABLE |
 					  __GFP_THISNODE | __GFP_NOMEMALLOC |
 					  __GFP_NORETRY | __GFP_NOWARN) &
@@ -2030,7 +2030,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	if (numamigrate_update_ratelimit(pgdat, HPAGE_PMD_NR))
 		goto out_dropref;
 
-	new_page = alloc_pages_node(node,
+	new_page = alloc_pages_node_keyid(node, page_keyid(page),
 		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
 		HPAGE_PMD_ORDER);
 	if (!new_page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d800d61ddb7..d7dc54b75f5d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3697,6 +3697,39 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
 }
 #endif /* CONFIG_COMPACTION */
 
+#ifndef CONFIG_NUMA
+struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
+		struct vm_area_struct *vma, unsigned long addr,
+		int node, bool hugepage)
+{
+	struct page *page;
+	bool need_zero;
+	int keyid = vma_keyid(vma);
+
+	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
+	page = alloc_pages(gfp_mask, order);
+	prep_encrypted_page(page, order, keyid, need_zero);
+
+	return page;
+}
+#endif
+
+struct page * __alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order)
+{
+	struct page *page;
+	bool need_zero;
+
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+	VM_WARN_ON(!node_online(nid));
+
+	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
+	page = __alloc_pages(gfp_mask, order, nid);
+	prep_encrypted_page(page, order, keyid, need_zero);
+
+	return page;
+}
+
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map __fs_reclaim_map =
 	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
@@ -4401,6 +4434,20 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
 
+struct page *
+__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask, int keyid)
+{
+	struct page *page;
+	bool need_zero;
+
+	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
+	page = __alloc_pages_nodemask(gfp_mask, order, preferred_nid, nodemask);
+	prep_encrypted_page(page, order, keyid, need_zero);
+	return page;
+}
+EXPORT_SYMBOL(__alloc_pages_nodemask_keyid);
+
 /*
  * Common helper functions.
  */
-- 
2.18.0



* [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 23:11   ` Dave Hansen
  2018-07-17 11:20 ` [PATCHv5 07/19] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

khugepaged allocates a page in advance, before we have found a VMA for
collapse. We don't yet know which KeyID to use for the allocation.

The page is allocated with KeyID-0. Once we know that the VMA is suitable
for collapsing, we prepare the page for the KeyID we need, based on
vma_keyid().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/khugepaged.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5ae34097aed1..d116f4ebb622 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1056,6 +1056,16 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	anon_vma_unlock_write(vma->anon_vma);
 
+	/*
+	 * At this point new_page is allocated as non-encrypted.
+	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
+	 * before coping data.
+	 * before copying data.
+	if (vma_keyid(vma)) {
+		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
+				vma_keyid(vma), false);
+	}
+
 	__collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl);
 	pte_unmap(pte);
 	__SetPageUptodate(new_page);
-- 
2.18.0



* [PATCHv5 07/19] x86/mm: Mask out KeyID bits from page table entry pfn
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 23:13   ` Dave Hansen
  2018-07-17 11:20 ` [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

MKTME claims several upper bits of the physical address in a page table
entry to encode the KeyID. It effectively shrinks the number of bits
available for the physical address. We should exclude the KeyID bits from
physical addresses.

For instance, if the CPU enumerates 52 physical address bits and the
number of bits claimed for the KeyID is 6, bits 51:46 must not be treated
as part of the physical address.

This patch adjusts __PHYSICAL_MASK during MKTME enumeration.
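
A worked example of the numbers (not code from the patch, just the
arithmetic it performs for the 52/6 case):

        /* x86_phys_bits = 52, keyid_bits = 6 */
        phys_addr_t keyid_mask = GENMASK_ULL(52 - 1, 52 - 6); /* bits 51:46, 0x000fc00000000000 */
        u64 phys_mask = (1ULL << 52) - 1;                      /* 0x000fffffffffffff */

        phys_mask &= ~keyid_mask;                              /* 0x00003fffffffffff */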

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/intel.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index eb75564f2d25..bf2caf9d52dd 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -571,6 +571,29 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		mktme_status = MKTME_ENABLED;
 	}
 
+#ifdef CONFIG_X86_INTEL_MKTME
+	if (mktme_status == MKTME_ENABLED && nr_keyids) {
+		/*
+		 * Mask out bits claimed from KeyID from physical address mask.
+		 *
+		 * For instance, if a CPU enumerates 52 physical address bits
+		 * and number of bits claimed for KeyID is 6, bits 51:46 of
+		 * physical address is unusable.
+		 */
+		phys_addr_t keyid_mask;
+
+		keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
+		physical_mask &= ~keyid_mask;
+	} else {
+		/*
+		 * Reset __PHYSICAL_MASK.
+		 * Maybe needed if there's inconsistent configuration
+		 * between CPUs.
+		 */
+		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+	}
+#endif
+
 	/*
 	 * KeyID bits effectively lower the number of physical address
 	 * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
-- 
2.18.0



* [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 07/19] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 23:19   ` Dave Hansen
  2018-07-31  0:08   ` Kai Huang
  2018-07-17 11:20 ` [PATCHv5 09/19] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
                   ` (10 subsequent siblings)
  18 siblings, 2 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

mktme_nr_keyids holds the number of KeyIDs available for MKTME,
excluding KeyID zero, which is used by TME. MKTME KeyIDs start from 1.

mktme_keyid_shift holds the shift of the KeyID within the physical
address.

mktme_keyid_mask holds the mask to extract the KeyID from the physical
address.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 16 ++++++++++++++++
 arch/x86/kernel/cpu/intel.c  | 12 ++++++++----
 arch/x86/mm/Makefile         |  2 ++
 arch/x86/mm/mktme.c          |  5 +++++
 4 files changed, 31 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/include/asm/mktme.h
 create mode 100644 arch/x86/mm/mktme.c

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
new file mode 100644
index 000000000000..df31876ec48c
--- /dev/null
+++ b/arch/x86/include/asm/mktme.h
@@ -0,0 +1,16 @@
+#ifndef	_ASM_X86_MKTME_H
+#define	_ASM_X86_MKTME_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_X86_INTEL_MKTME
+extern phys_addr_t mktme_keyid_mask;
+extern int mktme_nr_keyids;
+extern int mktme_keyid_shift;
+#else
+#define mktme_keyid_mask	((phys_addr_t)0)
+#define mktme_nr_keyids		0
+#define mktme_keyid_shift	0
+#endif
+
+#endif
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bf2caf9d52dd..efc9e9fc47d4 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -573,6 +573,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
 
 #ifdef CONFIG_X86_INTEL_MKTME
 	if (mktme_status == MKTME_ENABLED && nr_keyids) {
+		mktme_nr_keyids = nr_keyids;
+		mktme_keyid_shift = c->x86_phys_bits - keyid_bits;
+
 		/*
 		 * Mask out bits claimed from KeyID from physical address mask.
 		 *
@@ -580,10 +583,8 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		 * and number of bits claimed for KeyID is 6, bits 51:46 of
 		 * physical address is unusable.
 		 */
-		phys_addr_t keyid_mask;
-
-		keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
-		physical_mask &= ~keyid_mask;
+		mktme_keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, mktme_keyid_shift);
+		physical_mask &= ~mktme_keyid_mask;
 	} else {
 		/*
 		 * Reset __PHYSICAL_MASK.
@@ -591,6 +592,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		 * between CPUs.
 		 */
 		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+		mktme_keyid_mask = 0;
+		mktme_keyid_shift = 0;
+		mktme_nr_keyids = 0;
 	}
 #endif
 
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..4ebee899c363 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= pti.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
+
+obj-$(CONFIG_X86_INTEL_MKTME)	+= mktme.o
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
new file mode 100644
index 000000000000..467f1b26c737
--- /dev/null
+++ b/arch/x86/mm/mktme.c
@@ -0,0 +1,5 @@
+#include <asm/mktme.h>
+
+phys_addr_t mktme_keyid_mask;
+int mktme_nr_keyids;
+int mktme_keyid_shift;
-- 
2.18.0



* [PATCHv5 09/19] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 23:30   ` Dave Hansen
  2018-07-17 11:20 ` [PATCHv5 10/19] x86/mm: Implement page_keyid() using page_ext Kirill A. Shutemov
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

An encrypted VMA will have its KeyID stored in vma->vm_page_prot. This
way we don't need to do anything special to set up encrypted page table
entries and don't need to reserve space for the KeyID in a VMA.

This patch changes _PAGE_CHG_MASK to include the KeyID bits. Otherwise
they are going to be stripped from vm_page_prot on the first
pgprot_modify().

Define PTE_PFN_MASK_MAX similar to PTE_PFN_MASK but based on
__PHYSICAL_MASK_SHIFT. This way we include the whole range of bits
architecturally available for the PFN without referencing the
physical_mask and mktme_keyid_mask variables.
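
Roughly, pte_modify() keeps only the bits in _PAGE_CHG_MASK from the old
PTE and takes the rest from the new protection. A simplified sketch (not
the exact helper) of why the KeyID bits must be inside the mask to
survive:

        static pteval_t pte_modify_sketch(pteval_t oldval, pteval_t newprot)
        {
                pteval_t val;

                val = oldval & _PAGE_CHG_MASK;          /* preserved: PFN, KeyID, ... */
                val |= newprot & ~_PAGE_CHG_MASK;       /* taken from the new protection */
                return val;
        }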

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/pgtable_types.h | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 99fff853c944..3731f7e08757 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -120,8 +120,21 @@
  * protection key is treated like _PAGE_RW, for
  * instance, and is *not* included in this mask since
  * pte_modify() does modify it.
+ *
+ * They include the physical address and the memory encryption keyID.
+ * The paddr and the keyID never occupy the same bits at the same time.
+ * But, a given bit might be used for the keyID on one system and used for
+ * the physical address on another. As an optimization, we manage them in
+ * one unit here since their combination always occupies the same hardware
+ * bits. PTE_PFN_MASK_MAX stores combined mask.
+ *
+ * Cast PAGE_MASK to a signed type so that it is sign-extended if
+ * virtual addresses are 32-bits but physical addresses are larger
+ * (ie, 32-bit PAE).
  */
-#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
+#define PTE_PFN_MASK_MAX \
+	(((signed long)PAGE_MASK) & ((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
+#define _PAGE_CHG_MASK	(PTE_PFN_MASK_MAX | _PAGE_PCD | _PAGE_PWT |		\
 			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
 			 _PAGE_SOFT_DIRTY)
 #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
-- 
2.18.0



* [PATCHv5 10/19] x86/mm: Implement page_keyid() using page_ext
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (8 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 09/19] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 23:38   ` Dave Hansen
  2018-07-17 11:20 ` [PATCHv5 11/19] x86/mm: Implement vma_keyid() Kirill A. Shutemov
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Store the KeyID in bits 31:16 of the extended page flags. These bits are
unused.

page_keyid() returns zero until page_ext is ready. The page_ext
initializer enables a static branch to indicate that page_keyid() can use
page_ext. The same static branch will gate MKTME readiness in general.

We don't yet set the KeyID for the page. That will come in the following
patch that implements prep_encrypted_page(). All pages have KeyID-0 for
now.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  7 +++++++
 arch/x86/include/asm/page.h  |  1 +
 arch/x86/mm/mktme.c          | 34 ++++++++++++++++++++++++++++++++++
 include/linux/page_ext.h     | 11 ++++++++++-
 mm/page_ext.c                |  3 +++
 5 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index df31876ec48c..7266494b4f0a 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -2,11 +2,18 @@
 #define	_ASM_X86_MKTME_H
 
 #include <linux/types.h>
+#include <linux/page_ext.h>
 
 #ifdef CONFIG_X86_INTEL_MKTME
 extern phys_addr_t mktme_keyid_mask;
 extern int mktme_nr_keyids;
 extern int mktme_keyid_shift;
+
+extern struct page_ext_operations page_mktme_ops;
+
+#define page_keyid page_keyid
+int page_keyid(const struct page *page);
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 7555b48803a8..39af59487d5f 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -19,6 +19,7 @@
 struct page;
 
 #include <linux/range.h>
+#include <asm/mktme.h>
 extern struct range pfn_mapped[];
 extern int nr_pfn_mapped;
 
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 467f1b26c737..09cbff678b9f 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -3,3 +3,37 @@
 phys_addr_t mktme_keyid_mask;
 int mktme_nr_keyids;
 int mktme_keyid_shift;
+
+static DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
+
+static inline bool mktme_enabled(void)
+{
+	return static_branch_unlikely(&mktme_enabled_key);
+}
+
+int page_keyid(const struct page *page)
+{
+	if (!mktme_enabled())
+		return 0;
+
+	return lookup_page_ext(page)->keyid;
+}
+EXPORT_SYMBOL(page_keyid);
+
+static bool need_page_mktme(void)
+{
+	/* Make sure keyid doesn't collide with extended page flags */
+	BUILD_BUG_ON(__NR_PAGE_EXT_FLAGS > 16);
+
+	return !!mktme_nr_keyids;
+}
+
+static void init_page_mktme(void)
+{
+	static_branch_enable(&mktme_enabled_key);
+}
+
+struct page_ext_operations page_mktme_ops = {
+	.need = need_page_mktme,
+	.init = init_page_mktme,
+};
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index f84f167ec04c..d9c5aae9523f 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -23,6 +23,7 @@ enum page_ext_flags {
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
 #endif
+	__NR_PAGE_EXT_FLAGS
 };
 
 /*
@@ -33,7 +34,15 @@ enum page_ext_flags {
  * then the page_ext for pfn always exists.
  */
 struct page_ext {
-	unsigned long flags;
+	union {
+		unsigned long flags;
+#ifdef CONFIG_X86_INTEL_MKTME
+		struct {
+			unsigned short __pad;
+			unsigned short keyid;
+		};
+#endif
+	};
 };
 
 extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/mm/page_ext.c b/mm/page_ext.c
index a9826da84ccb..036658229842 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -68,6 +68,9 @@ static struct page_ext_operations *page_ext_ops[] = {
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_X86_INTEL_MKTME
+	&page_mktme_ops,
+#endif
 };
 
 static unsigned long total_usage;
-- 
2.18.0



* [PATCHv5 11/19] x86/mm: Implement vma_keyid()
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (9 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 10/19] x86/mm: Implement page_keyid() using page_ext Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 23:40   ` Dave Hansen
  2018-07-17 11:20 ` [PATCHv5 12/19] x86/mm: Implement prep_encrypted_page() and arch_free_page() Kirill A. Shutemov
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

We store the KeyID in the upper bits of vm_page_prot, matching the position
of the KeyID in the PTE. vma_keyid() extracts the KeyID from vm_page_prot.

With the KeyID in vm_page_prot we don't need to modify any page table
helper to propagate the KeyID to page table entries.
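
For illustration, encoding a KeyID into vm_page_prot is just the reverse of
the extraction below (hypothetical sketch; how the KeyID actually gets into
vm_page_prot is not part of this patch):

	/* Hypothetical sketch: mirror of vma_keyid() */
	pgprotval_t prot = pgprot_val(vma->vm_page_prot);

	prot &= ~mktme_keyid_mask;
	prot |= (pgprotval_t)keyid << mktme_keyid_shift;
	vma->vm_page_prot = __pgprot(prot);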

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  5 +++++
 arch/x86/mm/mktme.c          | 12 ++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 7266494b4f0a..f0b7844e36a4 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -4,6 +4,8 @@
 #include <linux/types.h>
 #include <linux/page_ext.h>
 
+struct vm_area_struct;
+
 #ifdef CONFIG_X86_INTEL_MKTME
 extern phys_addr_t mktme_keyid_mask;
 extern int mktme_nr_keyids;
@@ -14,6 +16,9 @@ extern struct page_ext_operations page_mktme_ops;
 #define page_keyid page_keyid
 int page_keyid(const struct page *page);
 
+#define vma_keyid vma_keyid
+int vma_keyid(struct vm_area_struct *vma);
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 09cbff678b9f..a1f40ee61b25 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,3 +1,4 @@
+#include <linux/mm.h>
 #include <asm/mktme.h>
 
 phys_addr_t mktme_keyid_mask;
@@ -37,3 +38,14 @@ struct page_ext_operations page_mktme_ops = {
 	.need = need_page_mktme,
 	.init = init_page_mktme,
 };
+
+int vma_keyid(struct vm_area_struct *vma)
+{
+	pgprotval_t prot;
+
+	if (!mktme_enabled())
+		return 0;
+
+	prot = pgprot_val(vma->vm_page_prot);
+	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
+}
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCHv5 12/19] x86/mm: Implement prep_encrypted_page() and arch_free_page()
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (10 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 11/19] x86/mm: Implement vma_keyid() Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 23:53   ` Dave Hansen
  2018-07-17 11:20 ` [PATCHv5 13/19] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

The hardware/CPU does not enforce coherency between mappings of the same
physical page with different KeyIDs or encryption keys. We are responsible
for cache management.

Flush the cache when allocating an encrypted page and when returning the
page to the free pool.

prep_encrypted_page() also takes care of zeroing the page. We have to do
this after the KeyID is set for the page.
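
A minimal sketch of the intended lifecycle (illustrative only, error
handling elided; the actual call sites live in the page allocator):

	/* Allocation: flush, set the KeyID, then zero via the new mapping */
	page = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
	prep_encrypted_page(page, order, keyid, true);

	/*
	 * Free: __free_pages() ends up in arch_free_page(), which flushes
	 * the cache and resets the KeyID of every 4k page back to zero.
	 */
	__free_pages(page, order);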

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  6 +++++
 arch/x86/mm/mktme.c          | 49 ++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index f0b7844e36a4..44409b8bbaca 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -19,6 +19,12 @@ int page_keyid(const struct page *page);
 #define vma_keyid vma_keyid
 int vma_keyid(struct vm_area_struct *vma);
 
+#define prep_encrypted_page prep_encrypted_page
+void prep_encrypted_page(struct page *page, int order, int keyid, bool zero);
+
+#define HAVE_ARCH_FREE_PAGE
+void arch_free_page(struct page *page, int order);
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index a1f40ee61b25..1194496633ce 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,4 +1,5 @@
 #include <linux/mm.h>
+#include <linux/highmem.h>
 #include <asm/mktme.h>
 
 phys_addr_t mktme_keyid_mask;
@@ -49,3 +50,51 @@ int vma_keyid(struct vm_area_struct *vma)
 	prot = pgprot_val(vma->vm_page_prot);
 	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
 }
+
+void prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
+{
+	int i;
+
+	/* It's not encrypted page: nothing to do */
+	if (!keyid)
+		return;
+
+	/*
+	 * The hardware/CPU does not enforce coherency between mappings of the
+	 * same physical page with different KeyIDs or encryption keys.
+	 * We are responsible for cache management.
+	 *
+	 * We flush cache before allocating encrypted page
+	 */
+	clflush_cache_range(page_address(page), PAGE_SIZE << order);
+
+	for (i = 0; i < (1 << order); i++) {
+		/* All pages coming out of the allocator should have KeyID 0 */
+		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
+		lookup_page_ext(page)->keyid = keyid;
+
+		/* Clear the page after the KeyID is set. */
+		if (zero)
+			clear_highpage(page);
+
+		page++;
+	}
+}
+
+void arch_free_page(struct page *page, int order)
+{
+	int i;
+
+	/* It's not encrypted page: nothing to do */
+	if (!page_keyid(page))
+		return;
+
+	clflush_cache_range(page_address(page), PAGE_SIZE << order);
+
+	for (i = 0; i < (1 << order); i++) {
+		/* Check if the page has reasonable KeyID */
+		WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids);
+		lookup_page_ext(page)->keyid = 0;
+		page++;
+	}
+}
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCHv5 13/19] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (11 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 12/19] x86/mm: Implement prep_encrypted_page() and arch_free_page() Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-17 11:20 ` [PATCHv5 14/19] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Rename the option to CONFIG_MEMORY_PHYSICAL_PADDING. It will be used
not only for KASLR.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/Kconfig    | 2 +-
 arch/x86/mm/kaslr.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6d4774f203d0..b6f1785c2176 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2197,7 +2197,7 @@ config RANDOMIZE_MEMORY
 
 	   If unsure, say Y.
 
-config RANDOMIZE_MEMORY_PHYSICAL_PADDING
+config MEMORY_PHYSICAL_PADDING
 	hex "Physical memory mapping padding" if EXPERT
 	depends on RANDOMIZE_MEMORY
 	default "0xa" if MEMORY_HOTPLUG
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 61db77b0eda9..4408cd9a3bef 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -102,7 +102,7 @@ void __init kernel_randomize_memory(void)
 	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
 	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+		CONFIG_MEMORY_PHYSICAL_PADDING;
 
 	/* Adapt phyiscal memory region size based on available memory */
 	if (memory_tb < kaslr_regions[0].size_tb)
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCHv5 14/19] x86/mm: Allow to disable MKTME after enumeration
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (12 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 13/19] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-17 11:20 ` [PATCHv5 15/19] x86/mm: Detect MKTME early Kirill A. Shutemov
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

The new helper mktme_disable() allows disabling MKTME even if it has been
enumerated successfully. MKTME initialization may fail and this
functionality allows the system to boot regardless of the failure.

MKTME needs per-KeyID direct mappings. They require a lot more virtual
address space, which may be a problem in 4-level paging mode. If the
system has more physical memory than we can handle with MKTME, the helper
allows us to disable MKTME but still boot the system successfully.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 2 ++
 arch/x86/kernel/cpu/intel.c  | 5 +----
 arch/x86/mm/mktme.c          | 9 +++++++++
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 44409b8bbaca..ebbee6a0c495 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -6,6 +6,8 @@
 
 struct vm_area_struct;
 
+void mktme_disable(void);
+
 #ifdef CONFIG_X86_INTEL_MKTME
 extern phys_addr_t mktme_keyid_mask;
 extern int mktme_nr_keyids;
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index efc9e9fc47d4..75e3b2602b4a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -591,10 +591,7 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		 * Maybe needed if there's inconsistent configuation
 		 * between CPUs.
 		 */
-		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
-		mktme_keyid_mask = 0;
-		mktme_keyid_shift = 0;
-		mktme_nr_keyids = 0;
+		mktme_disable();
 	}
 #endif
 
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 1194496633ce..bb6210dbcf0e 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -13,6 +13,15 @@ static inline bool mktme_enabled(void)
 	return static_branch_unlikely(&mktme_enabled_key);
 }
 
+void mktme_disable(void)
+{
+	physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+	mktme_keyid_mask = 0;
+	mktme_keyid_shift = 0;
+	mktme_nr_keyids = 0;
+	static_branch_disable(&mktme_enabled_key);
+}
+
 int page_keyid(const struct page *page)
 {
 	if (!mktme_enabled())
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCHv5 15/19] x86/mm: Detect MKTME early
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (13 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 14/19] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-17 11:20 ` [PATCHv5 16/19] x86/mm: Calculate direct mapping size Kirill A. Shutemov
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

We need to know the number of KeyIDs before KASLR is initialized. The
number of KeyIDs determines how much address space is needed for the
per-KeyID direct mappings.

KASLR initialization happens before full CPU initialization is complete.
Move the detect_tme() call to early_init_intel().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/intel.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 75e3b2602b4a..39830806dd42 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -158,6 +158,8 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
 	return false;
 }
 
+static void detect_tme(struct cpuinfo_x86 *c);
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;
@@ -301,6 +303,9 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	}
 
 	check_mpx_erratum(c);
+
+	if (cpu_has(c, X86_FEATURE_TME))
+		detect_tme(c);
 }
 
 #ifdef CONFIG_X86_32
@@ -766,9 +771,6 @@ static void init_intel(struct cpuinfo_x86 *c)
 	if (cpu_has(c, X86_FEATURE_VMX))
 		detect_vmx_virtcap(c);
 
-	if (cpu_has(c, X86_FEATURE_TME))
-		detect_tme(c);
-
 	init_intel_energy_perf(c);
 
 	init_intel_misc_features(c);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCHv5 16/19] x86/mm: Calculate direct mapping size
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (14 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 15/19] x86/mm: Detect MKTME early Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-17 11:20 ` [PATCHv5 17/19] x86/mm: Implement sync_direct_mapping() Kirill A. Shutemov
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

The kernel needs to have a way to access encrypted memory. We have two
options on how to approach it:

 - Create temporary mappings every time the kernel needs access to
   encrypted memory. That basically brings highmem and its overhead back.

 - Create multiple direct mappings, one per KeyID. In this setup we
   don't need to create temporary mappings on the fly -- encrypted
   memory is permanently available in the kernel address space.

We take the second approach as it has lower overhead.

It's worth noting that with per-KeyID direct mappings a compromised kernel
would give access to decrypted data right away, without additional tricks
to get memory mapped with the correct KeyID.

Per-KeyID mappings require a lot more virtual address space. On a 4-level
machine with 64 KeyIDs we max out the 46-bit virtual address space
dedicated to the direct mapping with just 1TiB of RAM. Given that we round
up any calculation of the direct mapping size to 1TiB, we effectively
claim the whole 46-bit address space for the direct mapping on such a
machine regardless of RAM size.
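
A worked example of the claim above (illustrative numbers; with 4-level
paging the area reserved for the direct mapping is 46 bits, i.e. 64TiB):

	size per KeyID  = round_up(1TiB of RAM, 1TiB)  = 1TiB
	mappings needed = 64 KeyIDs + KeyID-0          = 65
	total           = 65 * 1TiB = 65TiB  >  64TiB available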

Increased usage of virtual address space has implications for KASLR: we
have less space for randomization. With 64TiB claimed for the direct
mapping on 4-level paging, we are left with 27TiB of entropy to place
page_offset_base, vmalloc_base and vmemmap_base.

5-level paging provides a much wider virtual address space and KASLR
doesn't suffer significantly from per-KeyID direct mappings.

It's preferable to run MKTME with 5-level paging.

The direct mappings for the individual KeyIDs are placed next to each
other in the virtual address space. We need a way to find the boundaries
of the direct mapping for a particular KeyID.

The new variable direct_mapping_size specifies the size of one direct
mapping. With that value it's trivial to find the direct mapping for
KeyID-N: PAGE_OFFSET + N * direct_mapping_size.
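
As a sketch (not part of the patch), a helper returning the base of
KeyID-N's direct mapping would be a one-liner:

	/* Illustrative only: base address of the direct mapping for a KeyID */
	static inline unsigned long mktme_keyid_direct_base(int keyid)
	{
		return PAGE_OFFSET + (unsigned long)keyid * direct_mapping_size;
	}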

The size of the direct mapping is calculated during KASLR setup. If KASLR
is disabled, it happens during MKTME initialization.

With MKTME the size of the direct mapping has to be a power of 2. That
makes the implementation of __pa() efficient.
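
To illustrate the point (a sketch of the idea, not the final code; the
actual __pa() change comes later in the series): with a power-of-2 size
the per-KeyID alias folds back to the canonical physical address with a
single mask, while any other size would need a division on a hot path:

	/* power-of-2 direct_mapping_size: strip the per-KeyID alias by masking */
	phys = (vaddr - PAGE_OFFSET) & (direct_mapping_size - 1);

	/* any other size would need a modulo instead, which is far slower */
	phys = (vaddr - PAGE_OFFSET) % direct_mapping_size;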

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/x86_64/mm.txt |  4 +++
 arch/x86/include/asm/page_64.h  |  2 ++
 arch/x86/include/asm/setup.h    |  6 ++++
 arch/x86/kernel/head64.c        |  4 +++
 arch/x86/kernel/setup.c         |  3 ++
 arch/x86/mm/init_64.c           | 58 +++++++++++++++++++++++++++++++++
 arch/x86/mm/kaslr.c             | 11 +++++--
 7 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5432a96d31ff..c5b92904090f 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -61,6 +61,10 @@ The direct mapping covers all memory in the system up to the highest
 memory address (this means in some cases it can also include PCI memory
 holes).
 
+With MKTME, we have multiple direct mappings. One per-KeyID. They are put
+next to each other. PAGE_OFFSET + N * direct_mapping_size can be used to
+find direct mapping for KeyID-N.
+
 vmalloc space is lazily synchronized into the different PML4/PML5 pages of
 the processes using the page fault handler, with init_top_pgt as
 reference.
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 939b1cff4a7b..f57fc3cc2246 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -14,6 +14,8 @@ extern unsigned long phys_base;
 extern unsigned long page_offset_base;
 extern unsigned long vmalloc_base;
 extern unsigned long vmemmap_base;
+extern unsigned long direct_mapping_size;
+extern unsigned long direct_mapping_mask;
 
 static inline unsigned long __phys_addr_nodebug(unsigned long x)
 {
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index ae13bc974416..bcac5080cca5 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -59,6 +59,12 @@ extern void x86_ce4100_early_setup(void);
 static inline void x86_ce4100_early_setup(void) { }
 #endif
 
+#ifdef CONFIG_MEMORY_PHYSICAL_PADDING
+void calculate_direct_mapping_size(void);
+#else
+static inline void calculate_direct_mapping_size(void) { }
+#endif
+
 #ifndef _SETUP
 
 #include <asm/espfix.h>
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 8047379e575a..79b92518ceee 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -59,6 +59,10 @@ EXPORT_SYMBOL(vmalloc_base);
 unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4;
 EXPORT_SYMBOL(vmemmap_base);
 #endif
+unsigned long direct_mapping_size __ro_after_init = -1UL;
+EXPORT_SYMBOL(direct_mapping_size);
+unsigned long direct_mapping_mask __ro_after_init = -1UL;
+EXPORT_SYMBOL(direct_mapping_mask);
 
 #define __head	__section(.head.text)
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 2f86d883dd95..09ddbd142e3c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1053,6 +1053,9 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	init_cache_modes();
 
+	 /* direct_mapping_size has to be initialized before KASLR and MKTME */
+	calculate_direct_mapping_size();
+
 	/*
 	 * Define random base addresses for memory sections after max_pfn is
 	 * defined and before each memory section base is used.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a688617c727e..d3a93ca69c67 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1399,6 +1399,64 @@ unsigned long memory_block_size_bytes(void)
 	return memory_block_size_probed;
 }
 
+#ifdef CONFIG_MEMORY_PHYSICAL_PADDING
+void __init calculate_direct_mapping_size(void)
+{
+	unsigned long available_va;
+
+	/* 1/4 of the virtual address space is dedicated to the direct mapping */
+	available_va = 1UL << (__VIRTUAL_MASK_SHIFT - 1);
+
+	/* How much memory does the system have? */
+	direct_mapping_size = max_pfn << PAGE_SHIFT;
+	direct_mapping_size = round_up(direct_mapping_size, 1UL << 40);
+
+	if (!mktme_nr_keyids)
+		goto out;
+
+	/*
+	 * For MKTME we need direct_mapping_size to be power-of-2.
+	 * It makes __pa() implementation efficient.
+	 */
+	direct_mapping_size = roundup_pow_of_two(direct_mapping_size);
+
+	/*
+	 * Not enough virtual address space to address all physical memory with
+	 * MKTME enabled. Even without padding.
+	 *
+	 * Disable MKTME instead.
+	 */
+	if (direct_mapping_size > available_va / (mktme_nr_keyids + 1)) {
+		pr_err("x86/mktme: Disabled. Not enough virtual address space\n");
+		pr_err("x86/mktme: Consider switching to 5-level paging\n");
+		mktme_disable();
+		goto out;
+	}
+
+	/*
+	 * Virtual address space is divided between per-KeyID direct mappings.
+	 */
+	available_va /= mktme_nr_keyids + 1;
+out:
+	/* Add padding, if there's enough virtual address space */
+	direct_mapping_size += (1UL << 40) * CONFIG_MEMORY_PHYSICAL_PADDING;
+	if (mktme_nr_keyids)
+		direct_mapping_size = roundup_pow_of_two(direct_mapping_size);
+
+	if (direct_mapping_size > available_va)
+		direct_mapping_size = available_va;
+
+	/*
+	 * For MKTME, make sure direct_mapping_size is still power-of-2
+	 * after adding padding and calculate mask that is used in __pa().
+	 */
+	if (mktme_nr_keyids) {
+		direct_mapping_size = rounddown_pow_of_two(direct_mapping_size);
+		direct_mapping_mask = direct_mapping_size - 1;
+	}
+}
+#endif
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 /*
  * Initialise the sparsemem vmemmap using huge-pages at the PMD level.
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 4408cd9a3bef..bf044ff50ec0 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -101,10 +101,15 @@ void __init kernel_randomize_memory(void)
 	 * add padding if needed (especially for memory hotplug support).
 	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
-	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_MEMORY_PHYSICAL_PADDING;
 
-	/* Adapt phyiscal memory region size based on available memory */
+	/*
+	 * Calculate space required to map all physical memory.
+	 * In case of MKTME, we map physical memory multiple times, one for
+	 * each KeyID. If MKTME is disabled mktme_nr_keyids is 0.
+	 */
+	memory_tb = (direct_mapping_size * (mktme_nr_keyids + 1)) >> TB_SHIFT;
+
+	/* Adapt physical memory region size based on available memory */
 	if (memory_tb < kaslr_regions[0].size_tb)
 		kaslr_regions[0].size_tb = memory_tb;
 
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCHv5 17/19] x86/mm: Implement sync_direct_mapping()
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (15 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 16/19] x86/mm: Calculate direct mapping size Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-19  0:01   ` Dave Hansen
  2018-07-17 11:20 ` [PATCHv5 18/19] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
  2018-07-17 11:20 ` [PATCHv5 19/19] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

For MKTME we use per-KeyID direct mappings. This allows the kernel to have
access to encrypted memory.

sync_direct_mapping() syncs the per-KeyID direct mappings with the
canonical one -- KeyID-0.

The function tracks changes in the canonical mapping:
 - creating or removing chunks of the translation tree;
 - changes in mapping flags (i.e. protection bits);
 - splitting a huge page mapping into a page table;
 - replacing a page table with a huge page mapping.

The function needs to be called on every change to the direct mapping:
hotplug, hotremove, changes in permission bits, etc.

The function is a nop until MKTME is enabled.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |   8 +
 arch/x86/mm/init_64.c        |  10 +
 arch/x86/mm/mktme.c          | 437 +++++++++++++++++++++++++++++++++++
 3 files changed, 455 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index ebbee6a0c495..ba83fba4f9b3 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -27,10 +27,18 @@ void prep_encrypted_page(struct page *page, int order, int keyid, bool zero);
 #define HAVE_ARCH_FREE_PAGE
 void arch_free_page(struct page *page, int order);
 
+int sync_direct_mapping(void);
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
 #define mktme_keyid_shift	0
+
+static inline int sync_direct_mapping(void)
+{
+	return 0;
+}
+
 #endif
 
 #endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index d3a93ca69c67..04a5e08f01b3 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -698,6 +698,7 @@ kernel_physical_mapping_init(unsigned long paddr_start,
 {
 	bool pgd_changed = false;
 	unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
+	int ret;
 
 	paddr_last = paddr_end;
 	vaddr = (unsigned long)__va(paddr_start);
@@ -731,6 +732,9 @@ kernel_physical_mapping_init(unsigned long paddr_start,
 		pgd_changed = true;
 	}
 
+	ret = sync_direct_mapping();
+	WARN_ON(ret);
+
 	if (pgd_changed)
 		sync_global_pgds(vaddr_start, vaddr_end - 1);
 
@@ -1142,10 +1146,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
 static void __meminit
 kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 {
+	int ret;
 	start = (unsigned long)__va(start);
 	end = (unsigned long)__va(end);
 
 	remove_pagetable(start, end, true, NULL);
+	ret = sync_direct_mapping();
+	WARN_ON(ret);
 }
 
 int __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
@@ -1253,6 +1260,7 @@ void mark_rodata_ro(void)
 	unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 	unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
 	unsigned long all_end;
+	int ret;
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
@@ -1290,6 +1298,8 @@ void mark_rodata_ro(void)
 			(unsigned long) __va(__pa_symbol(rodata_end)),
 			(unsigned long) __va(__pa_symbol(_sdata)));
 
+	ret = sync_direct_mapping();
+	WARN_ON(ret);
 	debug_checkwx();
 
 	/*
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index bb6210dbcf0e..660caf6a5ce1 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,6 +1,8 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
 #include <asm/mktme.h>
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
 
 phys_addr_t mktme_keyid_mask;
 int mktme_nr_keyids;
@@ -42,6 +44,7 @@ static bool need_page_mktme(void)
 static void init_page_mktme(void)
 {
 	static_branch_enable(&mktme_enabled_key);
+	sync_direct_mapping();
 }
 
 struct page_ext_operations page_mktme_ops = {
@@ -107,3 +110,437 @@ void arch_free_page(struct page *page, int order)
 		page++;
 	}
 }
+
+static int sync_direct_mapping_pte(unsigned long keyid,
+		pmd_t *dst_pmd, pmd_t *src_pmd,
+		unsigned long addr, unsigned long end)
+{
+	pte_t *src_pte, *dst_pte;
+	pte_t *new_pte = NULL;
+	bool remove_pte;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pte = !src_pmd && PAGE_ALIGNED(addr) && PAGE_ALIGNED(end);
+
+	/*
+	 * PMD page got split into page table.
+	 * Clear PMD mapping. Page table will be established instead.
+	 */
+	if (pmd_large(*dst_pmd)) {
+		spin_lock(&init_mm.page_table_lock);
+		pmd_clear(dst_pmd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (pmd_none(*dst_pmd)) {
+		new_pte = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pte)
+			return -ENOMEM;
+		dst_pte = new_pte + pte_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pte = pte_offset_map(dst_pmd, addr + keyid * direct_mapping_size);
+	}
+	src_pte = src_pmd ? pte_offset_map(src_pmd, addr) : NULL;
+
+	spin_lock(&init_mm.page_table_lock);
+
+	do {
+		pteval_t val;
+
+		if (!src_pte || pte_none(*src_pte)) {
+			set_pte(dst_pte, __pte(0));
+			goto next;
+		}
+
+		if (!pte_none(*dst_pte)) {
+			/*
+			 * Sanity check: PFNs must match between source
+			 * and destination even if the rest doesn't.
+			 */
+			BUG_ON(pte_pfn(*dst_pte) != pte_pfn(*src_pte));
+		}
+
+		/* Copy entry, but set KeyID. */
+		val = pte_val(*src_pte) | keyid << mktme_keyid_shift;
+		set_pte(dst_pte, __pte(val));
+next:
+		addr += PAGE_SIZE;
+		dst_pte++;
+		if (src_pte)
+			src_pte++;
+	} while (addr != end);
+
+	if (new_pte)
+		pmd_populate_kernel(&init_mm, dst_pmd, new_pte);
+
+	if (remove_pte) {
+		__free_page(pmd_page(*dst_pmd));
+		pmd_clear(dst_pmd);
+	}
+
+	spin_unlock(&init_mm.page_table_lock);
+
+	return 0;
+}
+
+static int sync_direct_mapping_pmd(unsigned long keyid,
+		pud_t *dst_pud, pud_t *src_pud,
+		unsigned long addr, unsigned long end)
+{
+	pmd_t *src_pmd, *dst_pmd;
+	pmd_t *new_pmd = NULL;
+	bool remove_pmd = false;
+	unsigned long next;
+	int ret;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pmd = !src_pud && IS_ALIGNED(addr, PUD_SIZE) && IS_ALIGNED(end, PUD_SIZE);
+
+	/*
+	 * PUD page got split into page table.
+	 * Clear PUD mapping. Page table will be established instead.
+	 */
+	if (pud_large(*dst_pud)) {
+		spin_lock(&init_mm.page_table_lock);
+		pud_clear(dst_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (pud_none(*dst_pud)) {
+		new_pmd = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pmd)
+			return -ENOMEM;
+		dst_pmd = new_pmd + pmd_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pmd = pmd_offset(dst_pud, addr + keyid * direct_mapping_size);
+	}
+	src_pmd = src_pud ? pmd_offset(src_pud, addr) : NULL;
+
+	do {
+		pmd_t *__src_pmd = src_pmd;
+
+		next = pmd_addr_end(addr, end);
+		if (!__src_pmd || pmd_none(*__src_pmd)) {
+			if (pmd_none(*dst_pmd))
+				goto next;
+			if (pmd_large(*dst_pmd)) {
+				spin_lock(&init_mm.page_table_lock);
+				set_pmd(dst_pmd, __pmd(0));
+				spin_unlock(&init_mm.page_table_lock);
+				goto next;
+			}
+			__src_pmd = NULL;
+		}
+
+		if (__src_pmd && pmd_large(*__src_pmd)) {
+			pmdval_t val;
+
+			if (pmd_large(*dst_pmd)) {
+				/*
+				 * Sanity check: PFNs must match between source
+				 * and destination even if the rest doesn't.
+				 */
+				BUG_ON(pmd_pfn(*dst_pmd) != pmd_pfn(*__src_pmd));
+			} else if (!pmd_none(*dst_pmd)) {
+				/*
+				 * Page table is replaced with a PMD page.
+				 * Free and unmap the page table.
+				 */
+				__free_page(pmd_page(*dst_pmd));
+				spin_lock(&init_mm.page_table_lock);
+				pmd_clear(dst_pmd);
+				spin_unlock(&init_mm.page_table_lock);
+			}
+
+			/* Copy entry, but set KeyID. */
+			val = pmd_val(*__src_pmd) | keyid << mktme_keyid_shift;
+			spin_lock(&init_mm.page_table_lock);
+			set_pmd(dst_pmd, __pmd(val));
+			spin_unlock(&init_mm.page_table_lock);
+			goto next;
+		}
+
+		ret = sync_direct_mapping_pte(keyid, dst_pmd, __src_pmd,
+				addr, next);
+next:
+		addr = next;
+		dst_pmd++;
+		if (src_pmd)
+			src_pmd++;
+	} while (addr != end && !ret);
+
+	if (new_pmd) {
+		spin_lock(&init_mm.page_table_lock);
+		pud_populate(&init_mm, dst_pud, new_pmd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_pmd) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(pud_page(*dst_pud));
+		pud_clear(dst_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_pud(unsigned long keyid,
+		p4d_t *dst_p4d, p4d_t *src_p4d,
+		unsigned long addr, unsigned long end)
+{
+	pud_t *src_pud, *dst_pud;
+	pud_t *new_pud = NULL;
+	bool remove_pud = false;
+	unsigned long next;
+	int ret;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pud = !src_p4d && IS_ALIGNED(addr, P4D_SIZE) && IS_ALIGNED(end, P4D_SIZE);
+
+	/*
+	 * P4D page got split into page table.
+	 * Clear P4D mapping. Page table will be established instead.
+	 */
+	if (p4d_large(*dst_p4d)) {
+		spin_lock(&init_mm.page_table_lock);
+		p4d_clear(dst_p4d);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (p4d_none(*dst_p4d)) {
+		new_pud = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pud)
+			return -ENOMEM;
+		dst_pud = new_pud + pud_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pud = pud_offset(dst_p4d, addr + keyid * direct_mapping_size);
+	}
+	src_pud = src_p4d ? pud_offset(src_p4d, addr) : NULL;
+
+	do {
+		pud_t *__src_pud = src_pud;
+
+		next = pud_addr_end(addr, end);
+		if (!__src_pud || pud_none(*__src_pud)) {
+			if (pud_none(*dst_pud))
+				goto next;
+			if (pud_large(*dst_pud)) {
+				spin_lock(&init_mm.page_table_lock);
+				set_pud(dst_pud, __pud(0));
+				spin_unlock(&init_mm.page_table_lock);
+				goto next;
+			}
+			__src_pud = NULL;
+		}
+
+		if (__src_pud && pud_large(*__src_pud)) {
+			pudval_t val;
+
+			if (pud_large(*dst_pud)) {
+				/*
+				 * Sanity check: PFNs must match between source
+				 * and destination even if the rest doesn't.
+				 */
+				BUG_ON(pud_pfn(*dst_pud) != pud_pfn(*__src_pud));
+			} else if (!pud_none(*dst_pud)) {
+				/*
+				 * Page table is replaced with a pud page.
+				 * Free and unmap the page table.
+				 */
+				__free_page(pud_page(*dst_pud));
+				spin_lock(&init_mm.page_table_lock);
+				pud_clear(dst_pud);
+				spin_unlock(&init_mm.page_table_lock);
+			}
+
+			/* Copy entry, but set KeyID. */
+			val = pud_val(*__src_pud) | keyid << mktme_keyid_shift;
+			spin_lock(&init_mm.page_table_lock);
+			set_pud(dst_pud, __pud(val));
+			spin_unlock(&init_mm.page_table_lock);
+			goto next;
+		}
+
+		ret = sync_direct_mapping_pmd(keyid, dst_pud, __src_pud,
+				addr, next);
+next:
+		addr = next;
+		dst_pud++;
+		if (src_pud)
+			src_pud++;
+	} while (addr != end && !ret);
+
+	if (new_pud) {
+		spin_lock(&init_mm.page_table_lock);
+		p4d_populate(&init_mm, dst_p4d, new_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_pud) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(p4d_page(*dst_p4d));
+		p4d_clear(dst_p4d);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_p4d(unsigned long keyid,
+		pgd_t *dst_pgd, pgd_t *src_pgd,
+		unsigned long addr, unsigned long end)
+{
+	p4d_t *src_p4d, *dst_p4d;
+	p4d_t *new_p4d_1 = NULL, *new_p4d_2 = NULL;
+	bool remove_p4d = false;
+	unsigned long next;
+	int ret = 0;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_p4d = !src_pgd && IS_ALIGNED(addr, PGDIR_SIZE) && IS_ALIGNED(end, PGDIR_SIZE);
+
+	/* Allocate a new page table if needed. */
+	if (pgd_none(*dst_pgd)) {
+		new_p4d_1 = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_p4d_1)
+			return -ENOMEM;
+		dst_p4d = new_p4d_1 + p4d_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_p4d = p4d_offset(dst_pgd, addr + keyid * direct_mapping_size);
+	}
+	src_p4d = src_pgd ? p4d_offset(src_pgd, addr) : NULL;
+
+	do {
+		p4d_t *__src_p4d = src_p4d;
+
+		next = p4d_addr_end(addr, end);
+		if (!__src_p4d || p4d_none(*__src_p4d)) {
+			if (p4d_none(*dst_p4d))
+				goto next;
+			__src_p4d = NULL;
+		}
+
+		ret = sync_direct_mapping_pud(keyid, dst_p4d, __src_p4d,
+				addr, next);
+next:
+		addr = next;
+		dst_p4d++;
+
+		/*
+		 * Direct mappings are 1TiB-aligned. With 5-level paging it
+		 * means that on PGD level there can be misalignment between
+		 * source and destination.
+		 *
+		 * Allocate the new page table if dst_p4d crosses page table
+		 * boundary.
+		 */
+		if (!((unsigned long)dst_p4d & ~PAGE_MASK) && addr != end) {
+			if (pgd_none(dst_pgd[1])) {
+				new_p4d_2 = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+				if (!new_p4d_2)
+					ret = -ENOMEM;
+				dst_p4d = new_p4d_2;
+			} else {
+				dst_p4d = p4d_offset(dst_pgd + 1, 0);
+			}
+		}
+		if (src_p4d)
+			src_p4d++;
+	} while (addr != end && !ret);
+
+	if (new_p4d_1 || new_p4d_2) {
+		spin_lock(&init_mm.page_table_lock);
+		if (new_p4d_1)
+			pgd_populate(&init_mm, dst_pgd, new_p4d_1);
+		if (new_p4d_2)
+			pgd_populate(&init_mm, dst_pgd + 1, new_p4d_2);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_p4d) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(pgd_page(*dst_pgd));
+		pgd_clear(dst_pgd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_keyid(unsigned long keyid)
+{
+	pgd_t *src_pgd, *dst_pgd;
+	unsigned long addr, end, next;
+	int ret;
+
+	addr = PAGE_OFFSET;
+	end = PAGE_OFFSET + direct_mapping_size;
+
+	dst_pgd = pgd_offset_k(addr + keyid * direct_mapping_size);
+	src_pgd = pgd_offset_k(addr);
+
+	do {
+		pgd_t *__src_pgd = src_pgd;
+
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*__src_pgd)) {
+			if (pgd_none(*dst_pgd))
+				continue;
+			__src_pgd = NULL;
+		}
+
+		ret = sync_direct_mapping_p4d(keyid, dst_pgd, __src_pgd,
+				addr, next);
+	} while (dst_pgd++, src_pgd++, addr = next, addr != end && !ret);
+
+	return ret;
+}
+
+/*
+ * For MKTME we maintain per-KeyID direct mappings. This allows the kernel
+ * to have access to encrypted memory.
+ *
+ * sync_direct_mapping() syncs the per-KeyID direct mappings with the
+ * canonical one -- KeyID-0.
+ *
+ * The function tracks changes in the canonical mapping:
+ *  - creating or removing chunks of the translation tree;
+ *  - changes in mapping flags (i.e. protection bits);
+ *  - splitting a huge page mapping into a page table;
+ *  - replacing a page table with a huge page mapping.
+ *
+ * The function needs to be called on every change to the direct mapping:
+ * hotplug, hotremove, changes in permission bits, etc.
+ *
+ * The function is a nop until MKTME is enabled.
+ */
+int sync_direct_mapping(void)
+{
+	int i, ret = 0;
+
+	if (!mktme_enabled())
+		return 0;
+
+	for (i = 1; !ret && i <= mktme_nr_keyids; i++)
+		ret = sync_direct_mapping_keyid(i);
+
+	__flush_tlb_all();
+
+	return ret;
+}
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCHv5 18/19] x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (16 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 17/19] x86/mm: Implement sync_direct_mapping() Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-07-18 22:21   ` Thomas Gleixner
  2018-07-17 11:20 ` [PATCHv5 19/19] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Per-KeyID direct mappings require changes to how we find the right virtual
address for a page and to how we do virt-to-phys address translations.

The page_to_virt() definition overrides the default macro provided by
<linux/mm.h>. We only override the macro if MKTME is enabled at compile
time.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h   | 3 +++
 arch/x86/include/asm/page_64.h | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index ba83fba4f9b3..dbfbd955da98 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -29,6 +29,9 @@ void arch_free_page(struct page *page, int order);
 
 int sync_direct_mapping(void);
 
+#define page_to_virt(x) \
+	(__va(PFN_PHYS(page_to_pfn(x))) + page_keyid(x) * direct_mapping_size)
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index f57fc3cc2246..a4f394e3471d 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -24,7 +24,7 @@ static inline unsigned long __phys_addr_nodebug(unsigned long x)
 	/* use the carry flag to determine if x was < __START_KERNEL_map */
 	x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));
 
-	return x;
+	return x & direct_mapping_mask;
 }
 
 #ifdef CONFIG_DEBUG_VIRTUAL
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCHv5 19/19] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
                   ` (17 preceding siblings ...)
  2018-07-17 11:20 ` [PATCHv5 18/19] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
@ 2018-07-17 11:20 ` Kirill A. Shutemov
  2018-08-15  7:48   ` Pavel Machek
  18 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm,
	Kirill A. Shutemov

Add a new config option to enable/disable Multi-Key Total Memory
Encryption support.

MKTME uses MEMORY_PHYSICAL_PADDING to reserve enough space in per-KeyID
direct mappings for memory hotplug.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/Kconfig | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b6f1785c2176..023a22568c06 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1523,6 +1523,23 @@ config ARCH_USE_MEMREMAP_PROT
 	def_bool y
 	depends on AMD_MEM_ENCRYPT
 
+config X86_INTEL_MKTME
+	bool "Intel Multi-Key Total Memory Encryption"
+	select DYNAMIC_PHYSICAL_MASK
+	select PAGE_EXTENSION
+	depends on X86_64 && CPU_SUP_INTEL
+	---help---
+	  Say yes to enable support for Multi-Key Total Memory Encryption.
+	  This requires an Intel processor that has support of the feature.
+
+	  Multikey Total Memory Encryption (MKTME) is a technology that allows
+	  transparent memory encryption in upcoming Intel platforms.
+
+	  MKTME is built on top of TME. TME allows encryption of the entirety
+	  of system memory using a single key. MKTME allows having multiple
+	  encryption domains, each having own key -- different memory pages can
+	  be encrypted with different keys.
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
@@ -2199,7 +2216,7 @@ config RANDOMIZE_MEMORY
 
 config MEMORY_PHYSICAL_PADDING
 	hex "Physical memory mapping padding" if EXPERT
-	depends on RANDOMIZE_MEMORY
+	depends on RANDOMIZE_MEMORY || X86_INTEL_MKTME
 	default "0xa" if MEMORY_HOTPLUG
 	default "0x0"
 	range 0x1 0x40 if MEMORY_HOTPLUG
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 02/19] mm: Do not use zero page in encrypted pages
  2018-07-17 11:20 ` [PATCHv5 02/19] mm: Do not use zero page in encrypted pages Kirill A. Shutemov
@ 2018-07-18 17:36   ` Dave Hansen
  2018-07-19  7:16     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-18 17:36 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> Zero page is not encrypted and putting it into encrypted VMA produces
> garbage.
> 
> We can map zero page with KeyID-0 into an encrypted VMA, but this would
> be violation security boundary between encryption domains.

Why?  How is it a violation?

It only matters if they write secrets.  They can't write secrets to the
zero page.

Is this only because you accidentally inherited ->vm_page_prot on the
zero page PTE?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 03/19] mm/ksm: Do not merge pages with different KeyIDs
  2018-07-17 11:20 ` [PATCHv5 03/19] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
@ 2018-07-18 17:38   ` Dave Hansen
  2018-07-19  7:32     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-18 17:38 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> Pages encrypted with different encryption keys are not allowed to be
> merged by KSM. Otherwise it would cross security boundary.

Let's say I'm using plain AES (not AES-XTS).  I use the same key in two
keyid slots.  I map a page with the first keyid and another with the
other keyid.

Won't they have the same ciphertext?  Why shouldn't we KSM them?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 04/19] mm/page_alloc: Unify alloc_hugepage_vma()
  2018-07-17 11:20 ` [PATCHv5 04/19] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
@ 2018-07-18 17:43   ` Dave Hansen
  0 siblings, 0 replies; 73+ messages in thread
From: Dave Hansen @ 2018-07-18 17:43 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

A grammar error or two is probably OK in these descriptions, but these
are just riddled with them in a way that makes them hard to read.
Suggestions below.

On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> We don't need to have separate implementations of alloc_hugepage_vma()
> for NUMA and non-NUMA. Using variant based on alloc_pages_vma() we would
> cover both cases.

"Using the"

> This is preparation patch for allocation encrypted pages.

"a preparation"

"allocation encrypted pages" -> "allocation of encrypted pages" or
"allocation encrypted pages" -> "allocating encrypted pages" or

> alloc_pages_vma() will handle allocation of encrypted pages. With this
> change we don' t need to cover alloc_hugepage_vma() separately.

"don' t" -> "don't"

> The change makes typo in Alpha's implementation of

"a typo"

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 18/19] x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  2018-07-17 11:20 ` [PATCHv5 18/19] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
@ 2018-07-18 22:21   ` Thomas Gleixner
  2018-07-23 10:12     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Thomas Gleixner @ 2018-07-18 22:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, x86, H. Peter Anvin, Tom Lendacky, Dave Hansen,
	Kai Huang, Jacob Pan, linux-kernel, linux-mm

On Tue, 17 Jul 2018, Kirill A. Shutemov wrote:

> Per-KeyID direct mappings require changes into how we find the right
> virtual address for a page and virt-to-phys address translations.
> 
> page_to_virt() definition overwrites default macros provided by
> <linux/mm.h>. We only overwrite the macros if MTKME is enabled
> compile-time.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/asm/mktme.h   | 3 +++
>  arch/x86/include/asm/page_64.h | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
> index ba83fba4f9b3..dbfbd955da98 100644
> --- a/arch/x86/include/asm/mktme.h
> +++ b/arch/x86/include/asm/mktme.h
> @@ -29,6 +29,9 @@ void arch_free_page(struct page *page, int order);
>  
>  int sync_direct_mapping(void);
>  
> +#define page_to_virt(x) \
> +	(__va(PFN_PHYS(page_to_pfn(x))) + page_keyid(x) * direct_mapping_size)

This really does not belong into the mktme header.

Please make this the unconditional x86 page_to_virt() implementation in
asm/page.h, which is the canonical and obvious place for it.

The page_keyid() name is quite generic as well. Can this please have some
kind of reference to the underlying mechanism, i.e. mktme?

Please hide the multiplication with direct_mapping_size in the mktme header
as well. It's non interesting for the !MKTME case. Something like:

#define page_to_virt(x) \
	(__va(PFN_PHYS(page_to_pfn(x))) + mktme_page_to_virt_offset(x))

makes it immediately clear where to look and also makes it clear that the
offset will be 0 for a !MKTME enabled kernel and (hopefully) for all !MKTME
enabled processors as well.

And then have a proper implementation of mktme_page_to_virt_offset() with a
proper comment what on earth this is doing. It might be all obvious to you
now, but it's completely non obvious for the casual reader and you will
have to twist your brain around it 6 months from now as well.
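
Something as simple as the following would be enough (completely untested
sketch, just to illustrate the shape):

	/* CONFIG_X86_INTEL_MKTME: offset of the page's per-KeyID alias */
	static inline unsigned long mktme_page_to_virt_offset(struct page *page)
	{
		return page_keyid(page) * direct_mapping_size;
	}

plus a stub returning 0 for the !MKTME case.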

>  #else
>  #define mktme_keyid_mask	((phys_addr_t)0)
>  #define mktme_nr_keyids		0
> diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
> index f57fc3cc2246..a4f394e3471d 100644
> --- a/arch/x86/include/asm/page_64.h
> +++ b/arch/x86/include/asm/page_64.h
> @@ -24,7 +24,7 @@ static inline unsigned long __phys_addr_nodebug(unsigned long x)
>  	/* use the carry flag to determine if x was < __START_KERNEL_map */
>  	x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));
>  
> -	return x;
> +	return x & direct_mapping_mask;

This hunk also lacks any explanation both in the changelog and in form of a
comment.

> Per-KeyID direct mappings require changes into how we find the right
> virtual address for a page and virt-to-phys address translations.

That's pretty useless as it does just tell about 'changes', but not at all
about what kind of changes and why these changes are required. It's really
not helpful to assume that everyone stumbling over this will know the whole
story, especially not 6 months after this has been merged and then someone
ends up with a bisect on that change.

While at it, please get rid of the 'we'. We are neither CPUs nor code.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory
  2018-07-17 11:20 ` [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
@ 2018-07-18 23:03   ` Dave Hansen
  2018-07-19  8:27     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-18 23:03 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

I asked about this before and it still isn't covered in the description:
You were specifically asked (maybe in person at LSF/MM?) not to modify
allocator to pass the keyid around.  Please specifically mention how
this design addresses that feedback in the patch description.

You were told, "don't change the core allocator", so I think you just
added new functions that wrap the core allocator and called them from
the majority of sites that call into the core allocator.  Personally, I
think that misses the point of the original request.

Do I have a better way?  Nope, not really.

> +/*
> + * Encrypted page has to be cleared once keyid is set, not on allocation.
> + */
> +static inline bool encrypted_page_needs_zero(int keyid, gfp_t *gfp_mask)
> +{
> +	if (!keyid)
> +		return false;
> +
> +	if (*gfp_mask & __GFP_ZERO) {
> +		*gfp_mask &= ~__GFP_ZERO;
> +		return true;
> +	}
> +
> +	return false;
> +}

Shouldn't this be zero_page_at_alloc()?

Otherwise, it gets confusing about whether the page needs zeroing at
*all*, vs at alloc vs. free.

> +static inline struct page *alloc_pages_node_keyid(int nid, int keyid,
> +		gfp_t gfp_mask, unsigned int order)
> +{
> +	if (nid == NUMA_NO_NODE)
> +		nid = numa_mem_id();
> +
> +	return __alloc_pages_node_keyid(nid, keyid, gfp_mask, order);
> +}

We have an innumerable number of (__)?alloc_pages* functions.  This adds
two more.  I'm not a big fan of making this worse.

Do I have a better idea?  Not really.  The best I have is to start being
more careful about all of the arguments and actually formalize the list
of things that we need to succeed in an allocation in a struct
alloc_args or something.
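
Something vaguely like this, maybe (very rough sketch, just to illustrate
the idea):

	struct alloc_args {
		gfp_t		gfp_mask;
		unsigned int	order;
		int		nid;
		nodemask_t	*nodemask;
		int		keyid;
	};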

>  #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
>  #define alloc_page_vma(gfp_mask, vma, addr)			\
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index f2b4abbca55e..fede9bfa89d9 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -38,9 +38,15 @@ static inline struct page *new_page_nodemask(struct page *page,
>  	unsigned int order = 0;
>  	struct page *new_page = NULL;
>  
> -	if (PageHuge(page))
> +	if (PageHuge(page)) {
> +		/*
> +		 * HugeTLB doesn't support encryption. We shouldn't see
> +		 * such pages.
> +		 */
> +		WARN_ON(page_keyid(page));
>  		return alloc_huge_page_nodemask(page_hstate(compound_head(page)),
>  				preferred_nid, nodemask);
> +	}

Shouldn't we be returning NULL?  Seems like failing the allocation is
much less likely to result in bad things happening.

>  	if (PageTransHuge(page)) {
>  		gfp_mask |= GFP_TRANSHUGE;
> @@ -50,8 +56,8 @@ static inline struct page *new_page_nodemask(struct page *page,
>  	if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
>  		gfp_mask |= __GFP_HIGHMEM;
>  
> -	new_page = __alloc_pages_nodemask(gfp_mask, order,
> -				preferred_nid, nodemask);
> +	new_page = __alloc_pages_nodemask_keyid(gfp_mask, order,
> +				preferred_nid, nodemask, page_keyid(page));

Needs a comment please.  It's totally non-obvious that this is the
migration case from the context, new_page_nodemask()'s name, or the name
of 'page'.

	/* Allocate a page with the same KeyID as the source page */

> diff --git a/mm/compaction.c b/mm/compaction.c
> index faca45ebe62d..fd51aa32ad96 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1187,6 +1187,7 @@ static struct page *compaction_alloc(struct page *migratepage,
>  	list_del(&freepage->lru);
>  	cc->nr_freepages--;
>  
> +	prep_encrypted_page(freepage, 0, page_keyid(migratepage), false);
>  	return freepage;
>  }

Comments, please.

Why is this here?  What other code might need prep_encrypted_page()?

> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 581b729e05a0..ce7b436444b5 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -921,22 +921,28 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
>  /* page allocation callback for NUMA node migration */
>  struct page *alloc_new_node_page(struct page *page, unsigned long node)
>  {
> -	if (PageHuge(page))
> +	if (PageHuge(page)) {
> +		/*
> +		 * HugeTLB doesn't support encryption. We shouldn't see
> +		 * such pages.
> +		 */
> +		WARN_ON(page_keyid(page));
>  		return alloc_huge_page_node(page_hstate(compound_head(page)),
>  					node);
> -	else if (PageTransHuge(page)) {
> +	} else if (PageTransHuge(page)) {
>  		struct page *thp;
>  
> -		thp = alloc_pages_node(node,
> +		thp = alloc_pages_node_keyid(node, page_keyid(page),
>  			(GFP_TRANSHUGE | __GFP_THISNODE),
>  			HPAGE_PMD_ORDER);
>  		if (!thp)
>  			return NULL;
>  		prep_transhuge_page(thp);
>  		return thp;
> -	} else
> -		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
> -						    __GFP_THISNODE, 0);
> +	} else {
> +		return __alloc_pages_node_keyid(node, page_keyid(page),
> +				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
> +	}
>  }
>  
>  /*
> @@ -2013,9 +2019,16 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
>  {
>  	struct mempolicy *pol;
>  	struct page *page;
> -	int preferred_nid;
> +	bool zero = false;
> +	int keyid, preferred_nid;
>  	nodemask_t *nmask;
>  
> +	keyid = vma_keyid(vma);
> +	if (keyid && (gfp & __GFP_ZERO)) {
> +		zero = true;
> +		gfp &= ~__GFP_ZERO;
> +	}

Comments, please.  'zero' should be 'deferred_zero', at least.

Also, can't we hide this a _bit_ better?

	if (deferred_page_zero(vma))
		gfp &= ~__GFP_ZERO;

Then, later:

	deferred_page_prep(vma, page, order);

and hide everything in deferred_page_zero() and deferred_page_prep().
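
Roughly like this (untested sketch, just to show the shape; tracking
whether __GFP_ZERO was actually requested is elided):

	static inline bool deferred_page_zero(struct vm_area_struct *vma)
	{
		/* Zeroing must wait until the KeyID is set on the page */
		return vma_keyid(vma) != 0;
	}

	static inline void deferred_page_prep(struct vm_area_struct *vma,
					      struct page *page, int order)
	{
		/* Set the KeyID and zero the page through the right mapping */
		prep_encrypted_page(page, order, vma_keyid(vma), true);
	}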


> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3697,6 +3697,39 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
>  }
>  #endif /* CONFIG_COMPACTION */
>  
> +#ifndef CONFIG_NUMA
> +struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
> +		struct vm_area_struct *vma, unsigned long addr,
> +		int node, bool hugepage)
> +{
> +	struct page *page;
> +	bool need_zero;
> +	int keyid = vma_keyid(vma);
> +
> +	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
> +	page = alloc_pages(gfp_mask, order);
> +	prep_encrypted_page(page, order, keyid, need_zero);
> +
> +	return page;
> +}
> +#endif

Is there *ever* a VMA-based allocation that doesn't need zeroing?

> +struct page * __alloc_pages_node_keyid(int nid, int keyid,
> +		gfp_t gfp_mask, unsigned int order)
> +{
> +	struct page *page;
> +	bool need_zero;
> +
> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> +	VM_WARN_ON(!node_online(nid));
> +
> +	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
> +	page = __alloc_pages(gfp_mask, order, nid);
> +	prep_encrypted_page(page, order, keyid, need_zero);
> +
> +	return page;
> +}
> +
>  #ifdef CONFIG_LOCKDEP
>  static struct lockdep_map __fs_reclaim_map =
>  	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
> @@ -4401,6 +4434,20 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  }
>  EXPORT_SYMBOL(__alloc_pages_nodemask);
>  
> +struct page *
> +__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
> +		int preferred_nid, nodemask_t *nodemask, int keyid)
> +{
> +	struct page *page;
> +	bool need_zero;
> +
> +	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
> +	page = __alloc_pages_nodemask(gfp_mask, order, preferred_nid, nodemask);
> +	prep_encrypted_page(page, order, keyid, need_zero);
> +	return page;
> +}
> +EXPORT_SYMBOL(__alloc_pages_nodemask_keyid);

That looks like three duplicates of the same code, wrapping three more
allocator variants.  Do we really have no other alternatives?  Can you
please go ask the folks that gave you the feedback about the allocator
modifications and ask them if this is OK explicitly?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages
  2018-07-17 11:20 ` [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
@ 2018-07-18 23:11   ` Dave Hansen
  2018-07-19  8:59     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-18 23:11 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> khugepaged allocates page in advance, before we found a VMA for
> collapse. We don't yet know which KeyID to use for the allocation.

That's not really true.  We have the VMA and the address in the caller
(khugepaged_scan_pmd()), but we drop the lock and have to revalidate the
VMA.


> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 5ae34097aed1..d116f4ebb622 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1056,6 +1056,16 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	 */
>  	anon_vma_unlock_write(vma->anon_vma);
>  
> +	/*
> +	 * At this point new_page is allocated as non-encrypted.
> +	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
> +	 * before coping data.
> +	 */
> +	if (vma_keyid(vma)) {
> +		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
> +				vma_keyid(vma), false);
> +	}

I guess this isn't horribly problematic now, but if we ever keep pools
of preassigned-keyids, this won't work any more.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 07/19] x86/mm: Mask out KeyID bits from page table entry pfn
  2018-07-17 11:20 ` [PATCHv5 07/19] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
@ 2018-07-18 23:13   ` Dave Hansen
  2018-07-19  9:54     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-18 23:13 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> +	} else {
> +		/*
> +		 * Reset __PHYSICAL_MASK.
> +		 * Maybe needed if there's inconsistent configuation
> +		 * between CPUs.
> +		 */
> +		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> +	}

This seems like an appropriate place for a WARN_ON().  Either that, or
axe this code.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-17 11:20 ` [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
@ 2018-07-18 23:19   ` Dave Hansen
  2018-07-19 10:21     ` Kirill A. Shutemov
  2018-07-31  0:08   ` Kai Huang
  1 sibling, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-18 23:19 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> mktme_nr_keyids holds number of KeyIDs available for MKTME, excluding
> KeyID zero which used by TME. MKTME KeyIDs start from 1.
> 
> mktme_keyid_shift holds shift of KeyID within physical address.

I know what all these words mean, but the combination of them makes no
sense to me.  I still don't know what the variable does after reading this.

Is this the lowest bit in the physical address which is used for the
KeyID?  How many bits you must shift up a KeyID to get to the location
at which it can be masked into the physical address?

> mktme_keyid_mask holds mask to extract KeyID from physical address.

Good descriptions, wrong place.  Please put these in the code.

Also, the grammar constantly needs some work.  "holds mask" needs to be
"holds the mask".

> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/asm/mktme.h | 16 ++++++++++++++++
>  arch/x86/kernel/cpu/intel.c  | 12 ++++++++----
>  arch/x86/mm/Makefile         |  2 ++
>  arch/x86/mm/mktme.c          |  5 +++++
>  4 files changed, 31 insertions(+), 4 deletions(-)
>  create mode 100644 arch/x86/include/asm/mktme.h
>  create mode 100644 arch/x86/mm/mktme.c
> 
> diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
> new file mode 100644
> index 000000000000..df31876ec48c
> --- /dev/null
> +++ b/arch/x86/include/asm/mktme.h
> @@ -0,0 +1,16 @@
> +#ifndef	_ASM_X86_MKTME_H
> +#define	_ASM_X86_MKTME_H
> +
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_X86_INTEL_MKTME
> +extern phys_addr_t mktme_keyid_mask;
> +extern int mktme_nr_keyids;
> +extern int mktme_keyid_shift;
> +#else
> +#define mktme_keyid_mask	((phys_addr_t)0)
> +#define mktme_nr_keyids		0
> +#define mktme_keyid_shift	0
> +#endif
> +
> +#endif
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index bf2caf9d52dd..efc9e9fc47d4 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -573,6 +573,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
>  
>  #ifdef CONFIG_X86_INTEL_MKTME
>  	if (mktme_status == MKTME_ENABLED && nr_keyids) {
> +		mktme_nr_keyids = nr_keyids;
> +		mktme_keyid_shift = c->x86_phys_bits - keyid_bits;
> +
>  		/*
>  		 * Mask out bits claimed from KeyID from physical address mask.
>  		 *
> @@ -580,10 +583,8 @@ static void detect_tme(struct cpuinfo_x86 *c)
>  		 * and number of bits claimed for KeyID is 6, bits 51:46 of
>  		 * physical address is unusable.
>  		 */
> -		phys_addr_t keyid_mask;
> -
> -		keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
> -		physical_mask &= ~keyid_mask;
> +		mktme_keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, mktme_keyid_shift);
> +		physical_mask &= ~mktme_keyid_mask;

Seems a bit silly that we introduce keyid_mask only to make it global a
few patches later.

>  	} else {
>  		/*
>  		 * Reset __PHYSICAL_MASK.
> @@ -591,6 +592,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
>  		 * between CPUs.
>  		 */
>  		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> +		mktme_keyid_mask = 0;
> +		mktme_keyid_shift = 0;
> +		mktme_nr_keyids = 0;
>  	}

Should be unnecessary.  These are zeroed by the compiler.

> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 4b101dd6e52f..4ebee899c363 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= pti.o
>  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
>  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
>  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
> +
> +obj-$(CONFIG_X86_INTEL_MKTME)	+= mktme.o
> diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> new file mode 100644
> index 000000000000..467f1b26c737
> --- /dev/null
> +++ b/arch/x86/mm/mktme.c
> @@ -0,0 +1,5 @@
> +#include <asm/mktme.h>
> +
> +phys_addr_t mktme_keyid_mask;
> +int mktme_nr_keyids;
> +int mktme_keyid_shift;

Descriptions should be here, please, not in the changelog.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 09/19] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  2018-07-17 11:20 ` [PATCHv5 09/19] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
@ 2018-07-18 23:30   ` Dave Hansen
  2018-07-20 12:42     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-18 23:30 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> An encrypted VMA will have KeyID stored in vma->vm_page_prot. This way
> we don't need to do anything special to setup encrypted page table
> entries

We don't do anything special for protection keys, either.  They just
work too.

> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 99fff853c944..3731f7e08757 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -120,8 +120,21 @@
>   * protection key is treated like _PAGE_RW, for
>   * instance, and is *not* included in this mask since
>   * pte_modify() does modify it.
> + *
> + * They include the physical address and the memory encryption keyID.
> + * The paddr and the keyID never occupy the same bits at the same time.
> + * But, a given bit might be used for the keyID on one system and used for
> + * the physical address on another. As an optimization, we manage them in
> + * one unit here since their combination always occupies the same hardware
> + * bits. PTE_PFN_MASK_MAX stores combined mask.
> + *
> + * Cast PAGE_MASK to a signed type so that it is sign-extended if
> + * virtual addresses are 32-bits but physical addresses are larger
> + * (ie, 32-bit PAE).
>   */

Could you please make the comment block consistent?  You're a lot wider
than the comment above.

> -#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
> +#define PTE_PFN_MASK_MAX \
> +	(((signed long)PAGE_MASK) & ((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
> +#define _PAGE_CHG_MASK	(PTE_PFN_MASK_MAX | _PAGE_PCD | _PAGE_PWT |		\
>  			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
>  			 _PAGE_SOFT_DIRTY)

Man, I'm not a fan of this.  This saves us from consuming 6 VM_HIGH bits
(which we are not short on).  But, at the cost of complexity.

Protection keys eat up PTE space and have an interface called
pkey_mprotect().  MKTME KeyIDs take up PTE space and will probably have
an interface called something_mprotect().  Yet, the implementations are
going to be _very_ different with pkeys being excluded from
_PAGE_CHG_MASK and KeyIDs being included.

I think you're saved here because we don't _actually_ do pte_modify() on
an existing PTE: we blow the old one away upon encrypted_mprotect() and
replace the PTE with a new one.

But, this is incompatible with any case where we want to change the
KeyID and keep the old PTE target.  With AES-XTS, I guess this is a safe
assumption, but it's worrying.

Are there scenarios where we want to keep PTE contents, but change the
KeyID?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 10/19] x86/mm: Implement page_keyid() using page_ext
  2018-07-17 11:20 ` [PATCHv5 10/19] x86/mm: Implement page_keyid() using page_ext Kirill A. Shutemov
@ 2018-07-18 23:38   ` Dave Hansen
  2018-07-23  9:45     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-18 23:38 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm, Schofield, Alison

On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> Store KeyID in bits 31:16 of extended page flags. These bits are unused.

I'd love a two-sentence reminder of what page_ext is and why you chose to
use it.  Yes, you need this.  No, not everybody that you want to review
this patch set knows what it is or why you chose it.

> page_keyid() returns zero until page_ext is ready.

Is there any implication of this?  Or does it not matter because we
don't run userspace until after page_ext initialization is done?

> page_ext initializer enables static branch to indicate that

			"enables a static branch"

> page_keyid() can use page_ext. The same static branch will gate MKTME
> readiness in general.

Can you elaborate on this a bit?  It would also be a nice place to hint
to the folks working hard on the APIs to ensure she checks this.

> We don't yet set KeyID for the page. It will come in the following
> patch that implements prep_encrypted_page(). All pages have KeyID-0 for
> now.

It also wouldn't hurt to mention why you don't use an X86_FEATURE_* for
this rather than an explicit static branch.  I'm sure the x86
maintainers will be curious.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 11/19] x86/mm: Implement vma_keyid()
  2018-07-17 11:20 ` [PATCHv5 11/19] x86/mm: Implement vma_keyid() Kirill A. Shutemov
@ 2018-07-18 23:40   ` Dave Hansen
  2018-07-23  9:47     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-18 23:40 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

> --- a/arch/x86/mm/mktme.c
> +++ b/arch/x86/mm/mktme.c
> @@ -1,3 +1,4 @@
> +#include <linux/mm.h>
>  #include <asm/mktme.h>
>  
>  phys_addr_t mktme_keyid_mask;
> @@ -37,3 +38,14 @@ struct page_ext_operations page_mktme_ops = {
>  	.need = need_page_mktme,
>  	.init = init_page_mktme,
>  };
> +
> +int vma_keyid(struct vm_area_struct *vma)
> +{
> +	pgprotval_t prot;
> +
> +	if (!mktme_enabled())
> +		return 0;
> +
> +	prot = pgprot_val(vma->vm_page_prot);
> +	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
> +}

I'm a bit surprised this isn't inlined.  Not that function calls are
expensive, but we *could* entirely avoid them using the normal pattern of:

// In the header:
static inline int vma_keyid(struct vm_area_struct *vma)
{
	if (!mktme_enabled())
		return 0;

	return __vma_keyid(vma); // <- the .c file version
}

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 12/19] x86/mm: Implement prep_encrypted_page() and arch_free_page()
  2018-07-17 11:20 ` [PATCHv5 12/19] x86/mm: Implement prep_encrypted_page() and arch_free_page() Kirill A. Shutemov
@ 2018-07-18 23:53   ` Dave Hansen
  2018-07-23  9:50     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-18 23:53 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

The description doesn't mention the potential performance implications
of this patch.  That's criminal at this point.

> --- a/arch/x86/mm/mktme.c
> +++ b/arch/x86/mm/mktme.c
> @@ -1,4 +1,5 @@
>  #include <linux/mm.h>
> +#include <linux/highmem.h>
>  #include <asm/mktme.h>
>  
>  phys_addr_t mktme_keyid_mask;
> @@ -49,3 +50,51 @@ int vma_keyid(struct vm_area_struct *vma)
>  	prot = pgprot_val(vma->vm_page_prot);
>  	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
>  }
> +
> +void prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
> +{
> +	int i;
> +
> +	/* It's not encrypted page: nothing to do */
> +	if (!keyid)
> +		return;

prep_encrypted_page() is called in the fast path in the page allocator.
This out-of-line copy costs a function call for all users and this is
also out of the reach of the compiler to understand that keyid!=0 is
unlikely.

I think this needs to be treated to the inline-in-the-header treatment.
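
I.e., something like this in the header -- just a sketch, where
__prep_encrypted_page() would be the out-of-line body that stays in
mktme.c:

	static inline void prep_encrypted_page(struct page *page, int order,
					       int keyid, bool zero)
	{
		/* Keep the common KeyID-0 case visible to the compiler */
		if (keyid)
			__prep_encrypted_page(page, order, keyid, zero);
	}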

> +	/*
> +	 * The hardware/CPU does not enforce coherency between mappings of the
> +	 * same physical page with different KeyIDs or encryption keys.
> +	 * We are responsible for cache management.
> +	 *
> +	 * We flush cache before allocating encrypted page
> +	 */
> +	clflush_cache_range(page_address(page), PAGE_SIZE << order);

It's also worth pointing out that this must be done on the keyid alias
direct map, not the normal one.

Wait a sec...  How do we know which direct map to use?

> +	for (i = 0; i < (1 << order); i++) {
> +		/* All pages coming out of the allocator should have KeyID 0 */
> +		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
> +		lookup_page_ext(page)->keyid = keyid;
> +
> +		/* Clear the page after the KeyID is set. */
> +		if (zero)
> +			clear_highpage(page);
> +
> +		page++;
> +	}
> +}
> +
> +void arch_free_page(struct page *page, int order)
> +{
> +	int i;
> +
> +	/* It's not encrypted page: nothing to do */
> +	if (!page_keyid(page))
> +		return;

Ditto on pushing this to a header.

> +	clflush_cache_range(page_address(page), PAGE_SIZE << order);

OK, how do we know which copy of the direct map to use, here?

> +	for (i = 0; i < (1 << order); i++) {
> +		/* Check if the page has reasonable KeyID */
> +		WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids);
> +		lookup_page_ext(page)->keyid = 0;
> +		page++;
> +	}
> +}
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 17/19] x86/mm: Implement sync_direct_mapping()
  2018-07-17 11:20 ` [PATCHv5 17/19] x86/mm: Implement sync_direct_mapping() Kirill A. Shutemov
@ 2018-07-19  0:01   ` Dave Hansen
  2018-07-23 10:04     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-19  0:01 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Kai Huang, Jacob Pan, linux-kernel, linux-mm

On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
>  arch/x86/include/asm/mktme.h |   8 +
>  arch/x86/mm/init_64.c        |  10 +
>  arch/x86/mm/mktme.c          | 437 +++++++++++++++++++++++++++++++++++
>  3 files changed, 455 insertions(+)

I'm not the maintainer.  But, NAK from me on this on the diffstat alone.

There is simply too much technical debt here.  There is no way this code
is not riddled with bugs and I would bet lots of beer on the fact that
this has received little to no testing with all the combinations that
matter, like memory hotplug.  I'd love to be proven wrong, so I eagerly
await to be dazzled with the test results that have so far escaped
mention in the changelog.

Please make an effort to refactor this to reuse the code that we already
have to manage the direct mapping.  We can't afford 455 new lines of
page table manipulation that nobody tests or runs.

How _was_ this tested?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 02/19] mm: Do not use zero page in encrypted pages
  2018-07-18 17:36   ` Dave Hansen
@ 2018-07-19  7:16     ` Kirill A. Shutemov
  2018-07-19 13:58       ` Dave Hansen
  0 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-19  7:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Wed, Jul 18, 2018 at 10:36:24AM -0700, Dave Hansen wrote:
> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> > Zero page is not encrypted and putting it into encrypted VMA produces
> > garbage.
> > 
> > We can map zero page with KeyID-0 into an encrypted VMA, but this would
> > be violation security boundary between encryption domains.
> 
> Why?  How is it a violation?
> 
> It only matters if they write secrets.  They can't write secrets to the
> zero page.

I believe usage of the zero page is wrong here. It would indirectly reveal
the content of a supposedly encrypted memory region.

I can see the argument for why it should be okay and I don't have a very
strong opinion on this.

If folks think it's okay to use the zero page in encrypted VMAs I can
certainly make it work.

> Is this only because you accidentally inherited ->vm_page_prot on the
> zero page PTE?

Yes, in the previous patchset I mapped the zero page with the wrong KeyID.
This is one of the possible fixes for that.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 03/19] mm/ksm: Do not merge pages with different KeyIDs
  2018-07-18 17:38   ` Dave Hansen
@ 2018-07-19  7:32     ` Kirill A. Shutemov
  2018-07-19 14:02       ` Dave Hansen
  0 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-19  7:32 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Wed, Jul 18, 2018 at 10:38:27AM -0700, Dave Hansen wrote:
> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> > Pages encrypted with different encryption keys are not allowed to be
> > merged by KSM. Otherwise it would cross security boundary.
> 
> Let's say I'm using plain AES (not AES-XTS).  I use the same key in two
> keyid slots.  I map a page with the first keyid and another with the
> other keyid.
> 
> Won't they have the same cipertext?  Why shouldn't we KSM them?

We compare plain text, not ciphertext. And for good reason.

Comparing ciphertext would only make KSM successful for AES-ECB, which
doesn't depend on the physical address of the page.

MKTME only supports AES-XTS (no plans to support AES-ECB), so comparing
ciphertext would effectively disable KSM.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory
  2018-07-18 23:03   ` Dave Hansen
@ 2018-07-19  8:27     ` Kirill A. Shutemov
  2018-07-19 14:05       ` Dave Hansen
  2018-07-26 14:25       ` Michal Hocko
  0 siblings, 2 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-19  8:27 UTC (permalink / raw)
  To: Dave Hansen, Michal Hocko
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Wed, Jul 18, 2018 at 04:03:53PM -0700, Dave Hansen wrote:
> I asked about this before and it still isn't covered in the description:
> You were specifically asked (maybe in person at LSF/MM?) not to modify
> allocator to pass the keyid around.  Please specifically mention how
> this design addresses that feedback in the patch description.
> 
> You were told, "don't change the core allocator", so I think you just
> added new functions that wrap the core allocator and called them from
> the majority of sites that call into the core allocator.  Personally, I
> think that misses the point of the original request.
> 
> Do I have a better way?  Nope, not really.

+Michal.

IIRC, Michal was not happy that I propagated the KeyID into the very core
of the allocator and we've talked about wrappers around existing APIs as a
better solution.

Michal, is that correct?

> > +/*
> > + * Encrypted page has to be cleared once keyid is set, not on allocation.
> > + */
> > +static inline bool encrypted_page_needs_zero(int keyid, gfp_t *gfp_mask)
> > +{
> > +	if (!keyid)
> > +		return false;
> > +
> > +	if (*gfp_mask & __GFP_ZERO) {
> > +		*gfp_mask &= ~__GFP_ZERO;
> > +		return true;
> > +	}
> > +
> > +	return false;
> > +}
> 
> Shouldn't this be zero_page_at_alloc()?
> 
> Otherwise, it gets confusing about whether the page needs zeroing at
> *all*, vs at alloc vs. free.

I like your idea with deferred_page_zero() below. I'll go with it.

> > +static inline struct page *alloc_pages_node_keyid(int nid, int keyid,
> > +		gfp_t gfp_mask, unsigned int order)
> > +{
> > +	if (nid == NUMA_NO_NODE)
> > +		nid = numa_mem_id();
> > +
> > +	return __alloc_pages_node_keyid(nid, keyid, gfp_mask, order);
> > +}
> 
> We have an innumerable number of (__)?alloc_pages* functions.  This adds
> two more.  I'm not a big fan of making this worse.
> 
> Do I have a better idea?  Not really.  The best I have is to start being
> more careful about all of the arguments and actually formalize the list
> of things that we need to succeed in an allocation in a struct
> alloc_args or something.

Sounds like a separate project to me :)
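
If we ever get to it, I'd imagine something along these lines -- purely
hypothetical at this point:

	struct alloc_args {
		gfp_t		gfp_mask;
		unsigned int	order;
		int		preferred_nid;
		nodemask_t	*nodemask;
		int		keyid;
	};

with a single entry point taking the struct instead of the ever-growing
argument lists.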

> >  #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
> >  #define alloc_page_vma(gfp_mask, vma, addr)			\
> > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > index f2b4abbca55e..fede9bfa89d9 100644
> > --- a/include/linux/migrate.h
> > +++ b/include/linux/migrate.h
> > @@ -38,9 +38,15 @@ static inline struct page *new_page_nodemask(struct page *page,
> >  	unsigned int order = 0;
> >  	struct page *new_page = NULL;
> >  
> > -	if (PageHuge(page))
> > +	if (PageHuge(page)) {
> > +		/*
> > +		 * HugeTLB doesn't support encryption. We shouldn't see
> > +		 * such pages.
> > +		 */
> > +		WARN_ON(page_keyid(page));
> >  		return alloc_huge_page_nodemask(page_hstate(compound_head(page)),
> >  				preferred_nid, nodemask);
> > +	}
> 
> Shouldn't we be returning NULL?  Seems like failing the allocation is
> much less likely to result in bad things happening.

Okay.

> >  	if (PageTransHuge(page)) {
> >  		gfp_mask |= GFP_TRANSHUGE;
> > @@ -50,8 +56,8 @@ static inline struct page *new_page_nodemask(struct page *page,
> >  	if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
> >  		gfp_mask |= __GFP_HIGHMEM;
> >  
> > -	new_page = __alloc_pages_nodemask(gfp_mask, order,
> > -				preferred_nid, nodemask);
> > +	new_page = __alloc_pages_nodemask_keyid(gfp_mask, order,
> > +				preferred_nid, nodemask, page_keyid(page));
> 
> Needs a comment please.  It's totally non-obvious that this is the
> migration case from the context, new_page_nodemask()'s name, or the name
> of 'page'.
> 
> 	/* Allocate a page with the same KeyID as the source page */

Sure.

> 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index faca45ebe62d..fd51aa32ad96 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1187,6 +1187,7 @@ static struct page *compaction_alloc(struct page *migratepage,
> >  	list_del(&freepage->lru);
> >  	cc->nr_freepages--;
> >  
> > +	prep_encrypted_page(freepage, 0, page_keyid(migratepage), false);
> >  	return freepage;
> >  }
> 
> Comments, please.
> 
> Why is this here?

/* Prepare the page using the same KeyID as the source page */

> What other code might need prep_encrypted_page()?

Custom page allocators, if these pages can end up in encrypted VMAs.

In this case compaction creates its own pool of pages to be used for
allocation during page migration.

> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 581b729e05a0..ce7b436444b5 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -921,22 +921,28 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
> >  /* page allocation callback for NUMA node migration */
> >  struct page *alloc_new_node_page(struct page *page, unsigned long node)
> >  {
> > -	if (PageHuge(page))
> > +	if (PageHuge(page)) {
> > +		/*
> > +		 * HugeTLB doesn't support encryption. We shouldn't see
> > +		 * such pages.
> > +		 */
> > +		WARN_ON(page_keyid(page));
> >  		return alloc_huge_page_node(page_hstate(compound_head(page)),
> >  					node);
> > -	else if (PageTransHuge(page)) {
> > +	} else if (PageTransHuge(page)) {
> >  		struct page *thp;
> >  
> > -		thp = alloc_pages_node(node,
> > +		thp = alloc_pages_node_keyid(node, page_keyid(page),
> >  			(GFP_TRANSHUGE | __GFP_THISNODE),
> >  			HPAGE_PMD_ORDER);
> >  		if (!thp)
> >  			return NULL;
> >  		prep_transhuge_page(thp);
> >  		return thp;
> > -	} else
> > -		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
> > -						    __GFP_THISNODE, 0);
> > +	} else {
> > +		return __alloc_pages_node_keyid(node, page_keyid(page),
> > +				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
> > +	}
> >  }
> >  
> >  /*
> > @@ -2013,9 +2019,16 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
> >  {
> >  	struct mempolicy *pol;
> >  	struct page *page;
> > -	int preferred_nid;
> > +	bool zero = false;
> > +	int keyid, preferred_nid;
> >  	nodemask_t *nmask;
> >  
> > +	keyid = vma_keyid(vma);
> > +	if (keyid && (gfp & __GFP_ZERO)) {
> > +		zero = true;
> > +		gfp &= ~__GFP_ZERO;
> > +	}
> 
> Comments, please.  'zero' should be 'deferred_zero', at least.
> 
> Also, can't we hide this a _bit_ better?
> 
> 	if (deferred_page_zero(vma))
> 		gfp &= ~__GFP_ZERO;
> 
> Then, later:
> 
> 	deferred_page_prep(vma, page, order);
> 
> and hide everything in deferred_page_zero() and deferred_page_prep().
> 
> 
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3697,6 +3697,39 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
> >  }
> >  #endif /* CONFIG_COMPACTION */
> >  
> > +#ifndef CONFIG_NUMA
> > +struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
> > +		struct vm_area_struct *vma, unsigned long addr,
> > +		int node, bool hugepage)
> > +{
> > +	struct page *page;
> > +	bool need_zero;
> > +	int keyid = vma_keyid(vma);
> > +
> > +	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
> > +	page = alloc_pages(gfp_mask, order);
> > +	prep_encrypted_page(page, order, keyid, need_zero);
> > +
> > +	return page;
> > +}
> > +#endif
> 
> Is there *ever* a VMA-based allocation that doesn't need zeroing?

Sure. Any allocations for CoW.

> > +struct page * __alloc_pages_node_keyid(int nid, int keyid,
> > +		gfp_t gfp_mask, unsigned int order)
> > +{
> > +	struct page *page;
> > +	bool need_zero;
> > +
> > +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> > +	VM_WARN_ON(!node_online(nid));
> > +
> > +	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
> > +	page = __alloc_pages(gfp_mask, order, nid);
> > +	prep_encrypted_page(page, order, keyid, need_zero);
> > +
> > +	return page;
> > +}
> > +
> >  #ifdef CONFIG_LOCKDEP
> >  static struct lockdep_map __fs_reclaim_map =
> >  	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
> > @@ -4401,6 +4434,20 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >  }
> >  EXPORT_SYMBOL(__alloc_pages_nodemask);
> >  
> > +struct page *
> > +__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
> > +		int preferred_nid, nodemask_t *nodemask, int keyid)
> > +{
> > +	struct page *page;
> > +	bool need_zero;
> > +
> > +	need_zero = encrypted_page_needs_zero(keyid, &gfp_mask);
> > +	page = __alloc_pages_nodemask(gfp_mask, order, preferred_nid, nodemask);
> > +	prep_encrypted_page(page, order, keyid, need_zero);
> > +	return page;
> > +}
> > +EXPORT_SYMBOL(__alloc_pages_nodemask_keyid);
> 
> That looks like three duplicates of the same code, wrapping three more
> allocator variants.  Do we really have no other alternatives?  Can you
> please go ask the folks that gave you the feedback about the allocator
> modifications and ask them if this is OK explicitly?

Michal, any feedback for the patch?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages
  2018-07-18 23:11   ` Dave Hansen
@ 2018-07-19  8:59     ` Kirill A. Shutemov
  2018-07-19 14:13       ` Dave Hansen
  0 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-19  8:59 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Wed, Jul 18, 2018 at 04:11:57PM -0700, Dave Hansen wrote:
> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> > khugepaged allocates page in advance, before we found a VMA for
> > collapse. We don't yet know which KeyID to use for the allocation.
> 
> That's not really true.  We have the VMA and the address in the caller
> (khugepaged_scan_pmd()), but we drop the lock and have to revalidate the
> VMA.

For !NUMA we allocate the page in khugepaged_do_scan(), well before we
know the VMA.

> 
> 
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 5ae34097aed1..d116f4ebb622 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1056,6 +1056,16 @@ static void collapse_huge_page(struct mm_struct *mm,
> >  	 */
> >  	anon_vma_unlock_write(vma->anon_vma);
> >  
> > +	/*
> > +	 * At this point new_page is allocated as non-encrypted.
> > +	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
> > +	 * before coping data.
> > +	 */
> > +	if (vma_keyid(vma)) {
> > +		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
> > +				vma_keyid(vma), false);
> > +	}
> 
> I guess this isn't horribly problematic now, but if we ever keep pools
> of preassigned-keyids, this won't work any more.

I don't get this. What pools of preassigned-keyids are you talking about?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 07/19] x86/mm: Mask out KeyID bits from page table entry pfn
  2018-07-18 23:13   ` Dave Hansen
@ 2018-07-19  9:54     ` Kirill A. Shutemov
  2018-07-19 14:19       ` Dave Hansen
  0 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-19  9:54 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Wed, Jul 18, 2018 at 04:13:20PM -0700, Dave Hansen wrote:
> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> > +	} else {
> > +		/*
> > +		 * Reset __PHYSICAL_MASK.
> > +		 * Maybe needed if there's inconsistent configuation
> > +		 * between CPUs.
> > +		 */
> > +		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> > +	}
> 
> This seems like an appropriate place for a WARN_ON().  Either that, or
> axe this code.

There's pr_err_once() above in the function.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-18 23:19   ` Dave Hansen
@ 2018-07-19 10:21     ` Kirill A. Shutemov
  2018-07-19 12:37       ` Thomas Gleixner
  2018-07-19 14:23       ` Dave Hansen
  0 siblings, 2 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-19 10:21 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Wed, Jul 18, 2018 at 04:19:10PM -0700, Dave Hansen wrote:
> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> > mktme_nr_keyids holds number of KeyIDs available for MKTME, excluding
> > KeyID zero which used by TME. MKTME KeyIDs start from 1.
> > 
> > mktme_keyid_shift holds shift of KeyID within physical address.
> 
> I know what all these words mean, but the combination of them makes no
> sense to me.  I still don't know what the variable does after reading this.
> 
> Is this the lowest bit in the physical address which is used for the
> KeyID?  How many bits you must shift up a KeyID to get to the location
> at which it can be masked into the physical address?

Right.

I'm not sure what is not clear from the description. It looks fine to me.

> > mktme_keyid_mask holds mask to extract KeyID from physical address.
> 
> Good descriptions, wrong place.  Please put these in the code.

Okay.

> Also, the grammar constantly needs some work.  "holds mask" needs to be
> "holds the mask".

Right. Thanks

> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/include/asm/mktme.h | 16 ++++++++++++++++
> >  arch/x86/kernel/cpu/intel.c  | 12 ++++++++----
> >  arch/x86/mm/Makefile         |  2 ++
> >  arch/x86/mm/mktme.c          |  5 +++++
> >  4 files changed, 31 insertions(+), 4 deletions(-)
> >  create mode 100644 arch/x86/include/asm/mktme.h
> >  create mode 100644 arch/x86/mm/mktme.c
> > 
> > diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
> > new file mode 100644
> > index 000000000000..df31876ec48c
> > --- /dev/null
> > +++ b/arch/x86/include/asm/mktme.h
> > @@ -0,0 +1,16 @@
> > +#ifndef	_ASM_X86_MKTME_H
> > +#define	_ASM_X86_MKTME_H
> > +
> > +#include <linux/types.h>
> > +
> > +#ifdef CONFIG_X86_INTEL_MKTME
> > +extern phys_addr_t mktme_keyid_mask;
> > +extern int mktme_nr_keyids;
> > +extern int mktme_keyid_shift;
> > +#else
> > +#define mktme_keyid_mask	((phys_addr_t)0)
> > +#define mktme_nr_keyids		0
> > +#define mktme_keyid_shift	0
> > +#endif
> > +
> > +#endif
> > diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> > index bf2caf9d52dd..efc9e9fc47d4 100644
> > --- a/arch/x86/kernel/cpu/intel.c
> > +++ b/arch/x86/kernel/cpu/intel.c
> > @@ -573,6 +573,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
> >  
> >  #ifdef CONFIG_X86_INTEL_MKTME
> >  	if (mktme_status == MKTME_ENABLED && nr_keyids) {
> > +		mktme_nr_keyids = nr_keyids;
> > +		mktme_keyid_shift = c->x86_phys_bits - keyid_bits;
> > +
> >  		/*
> >  		 * Mask out bits claimed from KeyID from physical address mask.
> >  		 *
> > @@ -580,10 +583,8 @@ static void detect_tme(struct cpuinfo_x86 *c)
> >  		 * and number of bits claimed for KeyID is 6, bits 51:46 of
> >  		 * physical address is unusable.
> >  		 */
> > -		phys_addr_t keyid_mask;
> > -
> > -		keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
> > -		physical_mask &= ~keyid_mask;
> > +		mktme_keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, mktme_keyid_shift);
> > +		physical_mask &= ~mktme_keyid_mask;
> 
> Seems a bit silly that we introduce keyid_mask only to make it global a
> few patches later.

Is it a big deal?

I found it easier to split changes into logical pieces this way.

> >  	} else {
> >  		/*
> >  		 * Reset __PHYSICAL_MASK.
> > @@ -591,6 +592,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
> >  		 * between CPUs.
> >  		 */
> >  		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> > +		mktme_keyid_mask = 0;
> > +		mktme_keyid_shift = 0;
> > +		mktme_nr_keyids = 0;
> >  	}
> 
> Should be unnecessary.  These are zeroed by the compiler.

No. detect_tme() called for each CPU in the system.

> > diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> > index 4b101dd6e52f..4ebee899c363 100644
> > --- a/arch/x86/mm/Makefile
> > +++ b/arch/x86/mm/Makefile
> > @@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= pti.o
> >  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
> >  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
> >  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
> > +
> > +obj-$(CONFIG_X86_INTEL_MKTME)	+= mktme.o
> > diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> > new file mode 100644
> > index 000000000000..467f1b26c737
> > --- /dev/null
> > +++ b/arch/x86/mm/mktme.c
> > @@ -0,0 +1,5 @@
> > +#include <asm/mktme.h>
> > +
> > +phys_addr_t mktme_keyid_mask;
> > +int mktme_nr_keyids;
> > +int mktme_keyid_shift;
> 
> Descriptions should be here, please, not in the changelog.

Okay.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-19 10:21     ` Kirill A. Shutemov
@ 2018-07-19 12:37       ` Thomas Gleixner
  2018-07-19 13:12         ` Kirill A. Shutemov
  2018-07-19 14:23       ` Dave Hansen
  1 sibling, 1 reply; 73+ messages in thread
From: Thomas Gleixner @ 2018-07-19 12:37 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Kirill A. Shutemov, Ingo Molnar, x86,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> On Wed, Jul 18, 2018 at 04:19:10PM -0700, Dave Hansen wrote:
> > >  	} else {
> > >  		/*
> > >  		 * Reset __PHYSICAL_MASK.
> > > @@ -591,6 +592,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
> > >  		 * between CPUs.
> > >  		 */
> > >  		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> > > +		mktme_keyid_mask = 0;
> > > +		mktme_keyid_shift = 0;
> > > +		mktme_nr_keyids = 0;
> > >  	}
> > 
> > Should be unnecessary.  These are zeroed by the compiler.
> 
> No. detect_tme() called for each CPU in the system.

And then the variables are cleared out while other CPUs can access them?
How is that supposed to work?

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-19 12:37       ` Thomas Gleixner
@ 2018-07-19 13:12         ` Kirill A. Shutemov
  2018-07-19 13:18           ` Thomas Gleixner
  0 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-19 13:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Kirill A. Shutemov, Ingo Molnar, x86,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, Jul 19, 2018 at 02:37:35PM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> > On Wed, Jul 18, 2018 at 04:19:10PM -0700, Dave Hansen wrote:
> > > >  	} else {
> > > >  		/*
> > > >  		 * Reset __PHYSICAL_MASK.
> > > > @@ -591,6 +592,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
> > > >  		 * between CPUs.
> > > >  		 */
> > > >  		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> > > > +		mktme_keyid_mask = 0;
> > > > +		mktme_keyid_shift = 0;
> > > > +		mktme_nr_keyids = 0;
> > > >  	}
> > > 
> > > Should be unnecessary.  These are zeroed by the compiler.
> > 
> > No. detect_tme() called for each CPU in the system.
> 
> And then the variables are cleared out while other CPUs can access them?
> How is that supposed to work?

This code path only matters in a pathological case: when the MKTME
configuration is inconsistent between CPUs. Basically, if the BIOS screwed
things up, we disable MKTME.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-19 13:12         ` Kirill A. Shutemov
@ 2018-07-19 13:18           ` Thomas Gleixner
  2018-07-19 13:23             ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Thomas Gleixner @ 2018-07-19 13:18 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Kirill A. Shutemov, Ingo Molnar, x86,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> On Thu, Jul 19, 2018 at 02:37:35PM +0200, Thomas Gleixner wrote:
> > On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> > > On Wed, Jul 18, 2018 at 04:19:10PM -0700, Dave Hansen wrote:
> > > > >  	} else {
> > > > >  		/*
> > > > >  		 * Reset __PHYSICAL_MASK.
> > > > > @@ -591,6 +592,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
> > > > >  		 * between CPUs.
> > > > >  		 */
> > > > >  		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> > > > > +		mktme_keyid_mask = 0;
> > > > > +		mktme_keyid_shift = 0;
> > > > > +		mktme_nr_keyids = 0;
> > > > >  	}
> > > > 
> > > > Should be unnecessary.  These are zeroed by the compiler.
> > > 
> > > No. detect_tme() called for each CPU in the system.
> > 
> > And then the variables are cleared out while other CPUs can access them?
> > How is that supposed to work?
> 
> This code path only matter in patalogical case: when MKTME configuation is
> inconsitent between CPUs. Basically if BIOS screwed things up we disable
> MKTME.

I still don't see how that's supposed to work.

When the inconsistent CPU is brought up _AFTER_ MKTME is enabled, then how
does clearing the variables help? It does not magically make all the other
stuff go away.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-19 13:18           ` Thomas Gleixner
@ 2018-07-19 13:23             ` Kirill A. Shutemov
  2018-07-19 13:40               ` Thomas Gleixner
  0 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-19 13:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Kirill A. Shutemov, Ingo Molnar, x86,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, Jul 19, 2018 at 03:18:03PM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> > On Thu, Jul 19, 2018 at 02:37:35PM +0200, Thomas Gleixner wrote:
> > > On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> > > > On Wed, Jul 18, 2018 at 04:19:10PM -0700, Dave Hansen wrote:
> > > > > >  	} else {
> > > > > >  		/*
> > > > > >  		 * Reset __PHYSICAL_MASK.
> > > > > > @@ -591,6 +592,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
> > > > > >  		 * between CPUs.
> > > > > >  		 */
> > > > > >  		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> > > > > > +		mktme_keyid_mask = 0;
> > > > > > +		mktme_keyid_shift = 0;
> > > > > > +		mktme_nr_keyids = 0;
> > > > > >  	}
> > > > > 
> > > > > Should be unnecessary.  These are zeroed by the compiler.
> > > > 
> > > > No. detect_tme() called for each CPU in the system.
> > > 
> > > And then the variables are cleared out while other CPUs can access them?
> > > How is that supposed to work?
> > 
> > This code path only matters in a pathological case: when the MKTME
> > configuration is inconsistent between CPUs. Basically, if the BIOS screwed
> > things up, we disable MKTME.
> 
> I still don't see how that's supposed to work.
> 
> When the inconsistent CPU is brought up _AFTER_ MKTME is enabled, then how
> does clearing the variables help? It does not magically make all the other
> stuff go away.

We don't actually enable MKTME in the kernel. The BIOS does. The kernel
makes the choice whether to use it or not. The current design is targeted
at userspace use. So until init there is nothing else that would have to
go away. We can just pretend that MKTME was never there.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-19 13:23             ` Kirill A. Shutemov
@ 2018-07-19 13:40               ` Thomas Gleixner
  2018-07-20 12:34                 ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Thomas Gleixner @ 2018-07-19 13:40 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Kirill A. Shutemov, Ingo Molnar, x86,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> On Thu, Jul 19, 2018 at 03:18:03PM +0200, Thomas Gleixner wrote:
> > On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> > > On Thu, Jul 19, 2018 at 02:37:35PM +0200, Thomas Gleixner wrote:
> > > > On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> > > > > On Wed, Jul 18, 2018 at 04:19:10PM -0700, Dave Hansen wrote:
> > > > > > >  	} else {
> > > > > > >  		/*
> > > > > > >  		 * Reset __PHYSICAL_MASK.
> > > > > > > @@ -591,6 +592,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
> > > > > > >  		 * between CPUs.
> > > > > > >  		 */
> > > > > > >  		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> > > > > > > +		mktme_keyid_mask = 0;
> > > > > > > +		mktme_keyid_shift = 0;
> > > > > > > +		mktme_nr_keyids = 0;
> > > > > > >  	}
> > > > > > 
> > > > > > Should be unnecessary.  These are zeroed by the compiler.
> > > > > 
> > > > > No. detect_tme() called for each CPU in the system.
> > > > 
> > > > And then the variables are cleared out while other CPUs can access them?
> > > > How is that supposed to work?
> > > 
> > > This code path only matters in a pathological case: when the MKTME
> > > configuration is inconsistent between CPUs. Basically, if the BIOS screwed
> > > things up, we disable MKTME.
> > 
> > I still don't see how that's supposed to work.
> > 
> > When the inconsistent CPU is brought up _AFTER_ MKTME is enabled, then how
> > does clearing the variables help? It does not magically make all the other
> > stuff go away.
> 
> We don't actually enable MKTME in the kernel. The BIOS does. The kernel
> makes the choice whether to use it or not. The current design is targeted
> at userspace use. So until init there is nothing else that would have to
> go away. We can just pretend that MKTME was never there.

Hotplug is not guaranteed to happen _BEFORE_ init. Think about physical
hotplug.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 02/19] mm: Do not use zero page in encrypted pages
  2018-07-19  7:16     ` Kirill A. Shutemov
@ 2018-07-19 13:58       ` Dave Hansen
  2018-07-20 12:16         ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-19 13:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On 07/19/2018 12:16 AM, Kirill A. Shutemov wrote:
> On Wed, Jul 18, 2018 at 10:36:24AM -0700, Dave Hansen wrote:
>> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
>>> Zero page is not encrypted and putting it into encrypted VMA produces
>>> garbage.
>>>
>>> We can map zero page with KeyID-0 into an encrypted VMA, but this would
>>> be violation security boundary between encryption domains.
>> Why?  How is it a violation?
>>
>> It only matters if they write secrets.  They can't write secrets to the
>> zero page.
> I believe usage of the zero page is wrong here. It would indirectly reveal
> the content of a supposedly encrypted memory region.
> 
> I can see the argument for why it should be okay and I don't have a very
> strong opinion on this.

I think we should make the zero page work.  If folks are
security-sensitive, they need to write to guarantee it isn't being
shared.  That's a pretty low bar.

I'm struggling to think of a case where an attacker has access to the
encrypted data, the virt->phys mapping, *and* can glean something
valuable from the presence of the zero page.

Please spend some time and focus on your patch descriptions.  Use facts
that are backed up and are *precise* or tell the story of how your patch
was developed.  In this case, citing the "security boundary" is not
precise enough without explaining what the boundary is and how it is
violated.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 03/19] mm/ksm: Do not merge pages with different KeyIDs
  2018-07-19  7:32     ` Kirill A. Shutemov
@ 2018-07-19 14:02       ` Dave Hansen
  2018-07-20 12:23         ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-19 14:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On 07/19/2018 12:32 AM, Kirill A. Shutemov wrote:
> On Wed, Jul 18, 2018 at 10:38:27AM -0700, Dave Hansen wrote:
>> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
>>> Pages encrypted with different encryption keys are not allowed to be
>>> merged by KSM. Otherwise it would cross security boundary.
>> Let's say I'm using plain AES (not AES-XTS).  I use the same key in two
>> keyid slots.  I map a page with the first keyid and another with the
>> other keyid.
>>
>> Won't they have the same cipertext?  Why shouldn't we KSM them?
> We compare plain text, not ciphertext. And for good reason.

What's the reason?  Probably good to talk about it for those playing
along at home.

> Comparing ciphertext would only make KSM successful for AES-ECB, which
> doesn't depend on the physical address of the page.
> 
> MKTME only supports AES-XTS (no plans to support AES-ECB), so comparing
> ciphertext would effectively disable KSM.

But what's the security boundary that is violated?  You are talking
about some practical concerns (KSM scanning inefficiency) which is a far
cry from being any kind of security issue.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory
  2018-07-19  8:27     ` Kirill A. Shutemov
@ 2018-07-19 14:05       ` Dave Hansen
  2018-07-20 12:25         ` Kirill A. Shutemov
  2018-07-26 14:25       ` Michal Hocko
  1 sibling, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-19 14:05 UTC (permalink / raw)
  To: Kirill A. Shutemov, Michal Hocko
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On 07/19/2018 01:27 AM, Kirill A. Shutemov wrote:
>> What other code might need prep_encrypted_page()?
> 
> Custom page allocators, if these pages can end up in encrypted VMAs.
> 
> In this case compaction creates its own pool of pages to be used for
> allocation during page migration.

OK, that makes sense.  It also sounds like some great information to add
near prep_encrypted_page().

Do we have any ability to catch cases like this if we get them wrong, or
will we just silently corrupt data?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages
  2018-07-19  8:59     ` Kirill A. Shutemov
@ 2018-07-19 14:13       ` Dave Hansen
  2018-07-20 12:29         ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-19 14:13 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On 07/19/2018 01:59 AM, Kirill A. Shutemov wrote:
> On Wed, Jul 18, 2018 at 04:11:57PM -0700, Dave Hansen wrote:
>> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
>>> khugepaged allocates page in advance, before we found a VMA for
>>> collapse. We don't yet know which KeyID to use for the allocation.
>>
>> That's not really true.  We have the VMA and the address in the caller
>> (khugepaged_scan_pmd()), but we drop the lock and have to revalidate the
>> VMA.
> 
> For !NUMA we allocate the page in khugepaged_do_scan(), well before we
> know the VMA.

Ahh, thanks for clarifying.  That's some more very good information
about the design and progression of your patch that belongs in the
changelog.

>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 5ae34097aed1..d116f4ebb622 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -1056,6 +1056,16 @@ static void collapse_huge_page(struct mm_struct *mm,
>>>  	 */
>>>  	anon_vma_unlock_write(vma->anon_vma);
>>>  
>>> +	/*
>>> +	 * At this point new_page is allocated as non-encrypted.
>>> +	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
>>> +	 * before coping data.
>>> +	 */
>>> +	if (vma_keyid(vma)) {
>>> +		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
>>> +				vma_keyid(vma), false);
>>> +	}
>>
>> I guess this isn't horribly problematic now, but if we ever keep pools
>> of preassigned-keyids, this won't work any more.
> 
> I don't get this. What pools of preassigned-keyids are you talking about?

My point was that if we ever teach the allocator or something _near_ the
allocator to keep pools of pre-zeroed and/or pre-cache-cleared pages,
this approach will need to change, otherwise we will double-prep pages.

My overall concern with prep_encrypted_page() in this patch set is that
it's inserted pretty ad-hoc.  It seems easy to miss spots where it
should be.  I'm also unsure of the failure mode and anything we've done
to ensure that if we get this wrong, we scream clearly and loudly about
what happened.  Do we do something like that?
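
Even something as simple as a debug check where the mapping is
established would help -- hypothetical, I didn't spot anything like it in
the series:

	/* The page's KeyID must match the VMA it is being mapped into */
	VM_WARN_ON_ONCE(page_keyid(page) != vma_keyid(vma));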

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 07/19] x86/mm: Mask out KeyID bits from page table entry pfn
  2018-07-19  9:54     ` Kirill A. Shutemov
@ 2018-07-19 14:19       ` Dave Hansen
  2018-07-20 12:31         ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-19 14:19 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On 07/19/2018 02:54 AM, Kirill A. Shutemov wrote:
> On Wed, Jul 18, 2018 at 04:13:20PM -0700, Dave Hansen wrote:
>> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
>>> +	} else {
>>> +		/*
>>> +		 * Reset __PHYSICAL_MASK.
>>> +		 * Maybe needed if there's inconsistent configuation
>>> +		 * between CPUs.
>>> +		 */
>>> +		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
>>> +	}
>> This seems like an appropriate place for a WARN_ON().  Either that, or
>> axe this code.
> There's pr_err_once() above in the function.

Do you mean for the (tme_activate != tme_activate_cpu0) check?

But that's about double-activating this feature.  This check is about an
inconsistent configuration between two CPUs which seems totally different.

Could you explain?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-19 10:21     ` Kirill A. Shutemov
  2018-07-19 12:37       ` Thomas Gleixner
@ 2018-07-19 14:23       ` Dave Hansen
  2018-07-20 12:34         ` Kirill A. Shutemov
  1 sibling, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-19 14:23 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On 07/19/2018 03:21 AM, Kirill A. Shutemov wrote:
> On Wed, Jul 18, 2018 at 04:19:10PM -0700, Dave Hansen wrote:
>> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
>>> mktme_nr_keyids holds number of KeyIDs available for MKTME, excluding
>>> KeyID zero which used by TME. MKTME KeyIDs start from 1.
>>>
>>> mktme_keyid_shift holds shift of KeyID within physical address.
>> I know what all these words mean, but the combination of them makes no
>> sense to me.  I still don't know what the variable does after reading this.
>>
>> Is this the lowest bit in the physical address which is used for the
>> KeyID?  How many bits you must shift up a KeyID to get to the location
>> at which it can be masked into the physical address?
> Right.
> 
> I'm not sure what is not clear from the description. It look fine to me.

Well, OK, I guess I can write a better one for you.

"Position in the PTE of the lowest bit of the KeyID"

It's also a name that could use some love (now that I know what it
does).  mktme_keyid_pte_shift would be much better.  Or
mktme_keyid_low_pte_bit.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 02/19] mm: Do not use zero page in encrypted pages
  2018-07-19 13:58       ` Dave Hansen
@ 2018-07-20 12:16         ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-20 12:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, Jul 19, 2018 at 06:58:14AM -0700, Dave Hansen wrote:
> On 07/19/2018 12:16 AM, Kirill A. Shutemov wrote:
> > On Wed, Jul 18, 2018 at 10:36:24AM -0700, Dave Hansen wrote:
> >> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> >>> Zero page is not encrypted and putting it into encrypted VMA produces
> >>> garbage.
> >>>
> >>> We can map zero page with KeyID-0 into an encrypted VMA, but this would
> >>> be violation security boundary between encryption domains.
> >> Why?  How is it a violation?
> >>
> >> It only matters if they write secrets.  They can't write secrets to the
> >> zero page.
> > I believe usage of zero page is wrong here. It would indirectly reveal
> > content of supposedly encrypted memory region.
> > 
> > I can see argument why it should be okay and I don't have very strong
> > opinion on this.
> 
> I think we should make the zero page work.  If folks are
> security-sensitive, they need to write to guarantee it isn't being
> shared.  That's a pretty low bar.
> 
> I'm struggling to think of a case where an attacker has access to the
> encrypted data, the virt->phys mapping, *and* can glean something
> valuable from the presence of the zero page.

Okay.

> Please spend some time and focus on your patch descriptions.  Use facts
> that are backed up and are *precise* or tell the story of how your patch
> was developed.  In this case, citing the "security boundary" is not
> precise enough without explaining what the boundary is and how it is
> violated.

Fair enough. I'll go through all commit messages once again. Sorry.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 03/19] mm/ksm: Do not merge pages with different KeyIDs
  2018-07-19 14:02       ` Dave Hansen
@ 2018-07-20 12:23         ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-20 12:23 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, Jul 19, 2018 at 07:02:34AM -0700, Dave Hansen wrote:
> On 07/19/2018 12:32 AM, Kirill A. Shutemov wrote:
> > On Wed, Jul 18, 2018 at 10:38:27AM -0700, Dave Hansen wrote:
> >> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> >>> Pages encrypted with different encryption keys are not allowed to be
> >>> merged by KSM. Otherwise it would cross security boundary.
> >> Let's say I'm using plain AES (not AES-XTS).  I use the same key in two
> >> keyid slots.  I map a page with the first keyid and another with the
> >> other keyid.
> >>
> >> Won't they have the same cipertext?  Why shouldn't we KSM them?
> > We compare plain text, not ciphertext. And for good reason.
> 
> What's the reason?  Probably good to talk about it for those playing
> along at home.

I'll update commit message.

> > Comparing ciphertext would only make KSM successful for AES-ECB that
> > doesn't dependent on physical address of the page.
> > 
> > MKTME only supports AES-XTS (no plans to support AES-ECB). It effectively
> > disables KSM if we go with comparing ciphertext.
> 
> But what's the security boundary that is violated?  You are talking
> about some practical concerns (KSM scanning inefficiency) which is a far
> cry from being any kind of security issue.

As with the zero page, my initial reasoning was that mixing pages from
different security domains is a bad idea.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory
  2018-07-19 14:05       ` Dave Hansen
@ 2018-07-20 12:25         ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-20 12:25 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Michal Hocko, Kirill A. Shutemov, Ingo Molnar, x86,
	Thomas Gleixner, H. Peter Anvin, Tom Lendacky, Kai Huang,
	Jacob Pan, linux-kernel, linux-mm

On Thu, Jul 19, 2018 at 07:05:36AM -0700, Dave Hansen wrote:
> On 07/19/2018 01:27 AM, Kirill A. Shutemov wrote:
> >> What other code might need prep_encrypted_page()?
> > 
> > Custom pages allocators if these pages can end up in encrypted VMAs.
> > 
> > It this case compaction creates own pool of pages to be used for
> > allocation during page migration.
> 
> OK, that makes sense.  It also sounds like some great information to add
> near prep_encrypted_page().

Okay.

> Do we have any ability to catch cases like this if we get them wrong, or
> will we just silently corrupt data?

I cannot come up with any reasonable way to detect this immediately.
I'll think about this more.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages
  2018-07-19 14:13       ` Dave Hansen
@ 2018-07-20 12:29         ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-20 12:29 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, Jul 19, 2018 at 07:13:39AM -0700, Dave Hansen wrote:
> On 07/19/2018 01:59 AM, Kirill A. Shutemov wrote:
> > On Wed, Jul 18, 2018 at 04:11:57PM -0700, Dave Hansen wrote:
> >> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> >>> khugepaged allocates page in advance, before we found a VMA for
> >>> collapse. We don't yet know which KeyID to use for the allocation.
> >>
> >> That's not really true.  We have the VMA and the address in the caller
> >> (khugepaged_scan_pmd()), but we drop the lock and have to revalidate the
> >> VMA.
> > 
> > For !NUMA we allocate the page in khugepaged_do_scan(), well before we
> > know VMA.
> 
> Ahh, thanks for clarifying.  That's some more very good information
> about the design and progression of your patch that belongs in the
> changelog.

Okay.

> >>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >>> index 5ae34097aed1..d116f4ebb622 100644
> >>> --- a/mm/khugepaged.c
> >>> +++ b/mm/khugepaged.c
> >>> @@ -1056,6 +1056,16 @@ static void collapse_huge_page(struct mm_struct *mm,
> >>>  	 */
> >>>  	anon_vma_unlock_write(vma->anon_vma);
> >>>  
> >>> +	/*
> >>> +	 * At this point new_page is allocated as non-encrypted.
> >>> +	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
> >>> +	 * before coping data.
> >>> +	 */
> >>> +	if (vma_keyid(vma)) {
> >>> +		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
> >>> +				vma_keyid(vma), false);
> >>> +	}
> >>
> >> I guess this isn't horribly problematic now, but if we ever keep pools
> >> of preassigned-keyids, this won't work any more.
> > 
> > I don't get this. What pools of preassigned-keyids are you talking about?
> 
> My point was that if we ever teach the allocator or something _near_ the
> allocator to keep pools of pre-zeroed and/or pre-cache-cleared pages,
> this approach will need to get changed otherwise we will double-prep pages.

It shouldn't be a problem here. It's a pretty slow path. We often wait
for memory to be compacted before the page for khugepaged gets allocated.
Double-prep shouldn't have a visible impact.

> My overall concern with prep_encrypted_page() in this patch set is that
> it's inserted pretty ad-hoc.  It seems easy to miss spots where it
> should be.  I'm also unsure of the failure mode and anything we've done
> to ensure that if we get this wrong, we scream clearly and loudly about
> what happened.  Do we do something like that?

I have a debugging patch that puts BUG_ONs around set_pte_at() to check
whether the page's KeyID matches the VMA's KeyID. But that's not very
systematic. We would need something better than this.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 07/19] x86/mm: Mask out KeyID bits from page table entry pfn
  2018-07-19 14:19       ` Dave Hansen
@ 2018-07-20 12:31         ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-20 12:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, Jul 19, 2018 at 07:19:01AM -0700, Dave Hansen wrote:
> On 07/19/2018 02:54 AM, Kirill A. Shutemov wrote:
> > On Wed, Jul 18, 2018 at 04:13:20PM -0700, Dave Hansen wrote:
> >> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> >>> +	} else {
> >>> +		/*
> >>> +		 * Reset __PHYSICAL_MASK.
> >>> +		 * Maybe needed if there's inconsistent configuation
> >>> +		 * between CPUs.
> >>> +		 */
> >>> +		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> >>> +	}
> >> This seems like an appropriate place for a WARN_ON().  Either that, or
> >> axe this code.
> > There's pr_err_once() above in the function.
> 
> Do you mean for the (tme_activate != tme_activate_cpu0) check?
> 
> But that's about double-activating this feature.  This check is about an
> inconsistent configuration between two CPUs which seems totally different.
> 
> Could you explain?

The (tme_activate != tme_activate_cpu0) check is about inconsistent
configuration. It checks whether the MSR contents on the given CPU match
the MSR on CPU0.
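
I.e. roughly this shape (simplified; tme_activate_cpu0 caches the MSR
value read on the boot CPU):

	u64 tme_activate;

	rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);

	/* Compare against what the boot CPU reported. */
	if (mktme_status != MKTME_UNINITIALIZED &&
	    tme_activate != tme_activate_cpu0) {
		pr_err_once("x86/tme: configuration is inconsistent between CPUs\n");
		pr_err_once("x86/tme: MKTME is not usable\n");
		mktme_status = MKTME_DISABLED;
	}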

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-19 13:40               ` Thomas Gleixner
@ 2018-07-20 12:34                 ` Kirill A. Shutemov
  2018-07-20 13:17                   ` Thomas Gleixner
  0 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-20 12:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Kirill A. Shutemov, Ingo Molnar, x86,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, Jul 19, 2018 at 03:40:41PM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> > On Thu, Jul 19, 2018 at 03:18:03PM +0200, Thomas Gleixner wrote:
> > > On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> > > > On Thu, Jul 19, 2018 at 02:37:35PM +0200, Thomas Gleixner wrote:
> > > > > On Thu, 19 Jul 2018, Kirill A. Shutemov wrote:
> > > > > > On Wed, Jul 18, 2018 at 04:19:10PM -0700, Dave Hansen wrote:
> > > > > > > >  	} else {
> > > > > > > >  		/*
> > > > > > > >  		 * Reset __PHYSICAL_MASK.
> > > > > > > > @@ -591,6 +592,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
> > > > > > > >  		 * between CPUs.
> > > > > > > >  		 */
> > > > > > > >  		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> > > > > > > > +		mktme_keyid_mask = 0;
> > > > > > > > +		mktme_keyid_shift = 0;
> > > > > > > > +		mktme_nr_keyids = 0;
> > > > > > > >  	}
> > > > > > > 
> > > > > > > Should be unnecessary.  These are zeroed by the compiler.
> > > > > > 
> > > > > > No. detect_tme() called for each CPU in the system.
> > > > > 
> > > > > And then the variables are cleared out while other CPUs can access them?
> > > > > How is that supposed to work?
> > > > 
> > > > This code path only matter in patalogical case: when MKTME configuation is
> > > > inconsitent between CPUs. Basically if BIOS screwed things up we disable
> > > > MKTME.
> > > 
> > > I still don't see how that's supposed to work.
> > > 
> > > When the inconsistent CPU is brought up _AFTER_ MKTME is enabled, then how
> > > does clearing the variables help? It does not magically make all the other
> > > stuff go away.
> > 
> > We don't actually enable MKTME in kernel. BIOS does. Kernel makes choose
> > to use it or not. Current design targeted to be used by userspace.
> > So until init we don't have any other stuff to go away. We can just
> > pretend that MKTME was never there.
> 
> Hotplug is not guaranteed to happen _BEFORE_ init. Think about physical
> hotplug.

Ouch. I didn't think about this. :/

In this case I don't see how to handle the situation properly.
Is it okay to WARN() && pray()?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-19 14:23       ` Dave Hansen
@ 2018-07-20 12:34         ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-20 12:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, Jul 19, 2018 at 07:23:27AM -0700, Dave Hansen wrote:
> On 07/19/2018 03:21 AM, Kirill A. Shutemov wrote:
> > On Wed, Jul 18, 2018 at 04:19:10PM -0700, Dave Hansen wrote:
> >> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> >>> mktme_nr_keyids holds number of KeyIDs available for MKTME, excluding
> >>> KeyID zero which used by TME. MKTME KeyIDs start from 1.
> >>>
> >>> mktme_keyid_shift holds shift of KeyID within physical address.
> >> I know what all these words mean, but the combination of them makes no
> >> sense to me.  I still don't know what the variable does after reading this.
> >>
> >> Is this the lowest bit in the physical address which is used for the
> >> KeyID?  How many bits you must shift up a KeyID to get to the location
> >> at which it can be masked into the physical address?
> > Right.
> > 
> > I'm not sure what is not clear from the description. It look fine to me.
> 
> Well, OK, I guess I can write a better one for you.
> 
> "Position in the PTE of the lowest bit of the KeyID"
> 
> It's also a name that could use some love (now that I know what it
> does).  mktme_keyid_pte_shift would be much better.  Or
> mktme_keyid_low_pte_bit.

Okay, thanks.
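
For the record, this is how the shift and mask end up being used
(pte_keyid() is a made-up name, just to illustrate):

/* Extract the KeyID encoded in the upper physical-address bits of a PTE. */
static inline int pte_keyid(pte_t pte)
{
	return (pte_val(pte) & mktme_keyid_mask) >> mktme_keyid_shift;
}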

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 09/19] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  2018-07-18 23:30   ` Dave Hansen
@ 2018-07-20 12:42     ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-20 12:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Wed, Jul 18, 2018 at 04:30:35PM -0700, Dave Hansen wrote:
> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> > An encrypted VMA will have KeyID stored in vma->vm_page_prot. This way
> > we don't need to do anything special to setup encrypted page table
> > entries
> 
> We don't do anything special for protection keys, either.  They just
> work too.
> 
> > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> > index 99fff853c944..3731f7e08757 100644
> > --- a/arch/x86/include/asm/pgtable_types.h
> > +++ b/arch/x86/include/asm/pgtable_types.h
> > @@ -120,8 +120,21 @@
> >   * protection key is treated like _PAGE_RW, for
> >   * instance, and is *not* included in this mask since
> >   * pte_modify() does modify it.
> > + *
> > + * They include the physical address and the memory encryption keyID.
> > + * The paddr and the keyID never occupy the same bits at the same time.
> > + * But, a given bit might be used for the keyID on one system and used for
> > + * the physical address on another. As an optimization, we manage them in
> > + * one unit here since their combination always occupies the same hardware
> > + * bits. PTE_PFN_MASK_MAX stores combined mask.
> > + *
> > + * Cast PAGE_MASK to a signed type so that it is sign-extended if
> > + * virtual addresses are 32-bits but physical addresses are larger
> > + * (ie, 32-bit PAE).
> >   */
> 
> Could you please make the comment block consistent?  You're a lot wider
> than the comment above.

Okay.

> > -#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
> > +#define PTE_PFN_MASK_MAX \
> > +	(((signed long)PAGE_MASK) & ((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
> > +#define _PAGE_CHG_MASK	(PTE_PFN_MASK_MAX | _PAGE_PCD | _PAGE_PWT |		\
> >  			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
> >  			 _PAGE_SOFT_DIRTY)
> 
> Man, I'm not a fan of this.  This saves us from consuming 6 VM_HIGH bits
> (which we are not short on).  But, at the cost of complexity.

15, not 6. We have up to 15 KeyID bits architecturally.

We can just have a separate field in vm_area_struct if we must.
But vm_page_prot works fine so far. I don't see a big reason to change
it.

> Protection keys eat up PTE space and have an interface called
> pkey_mprotect().  MKTME KeyIDs take up PTE space and will probably have
> an interface called something_mprotect().  Yet, the implementations are
> going to be _very_ different with pkeys being excluded from
> _PAGE_CHG_MASK and KeyIDs being included.
> 
> I think you're saved here because we don't _actually_ do pte_modify() on
> an existing PTE: we blow the old one away upon encrypted_mprotect() and
> replace the PTE with a new one.
> 
> But, this is incompatible with any case where we want to change the
> KeyID and keep the old PTE target.  With AES-XTS, I guess this is a safe
> assumption, but it's worrying.
> 
> Are there scenarios where we want to keep PTE contents, but change the
> KeyID?

I don't see such a scenario.

If for some reason we needed to map the same memory with a different
KeyID, it could be done from scratch, without modifying the existing mapping.
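
Just to spell out the mechanism we rely on: pte_modify() keeps everything
in _PAGE_CHG_MASK from the old PTE, so with the KeyID bits folded into it
(via PTE_PFN_MASK_MAX) they are preserved automatically. Roughly
(simplified, not the exact x86 implementation):

static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
	pteval_t val = pte_val(pte);

	/*
	 * Bits in _PAGE_CHG_MASK -- pfn, KeyID, caching, dirty/accessed --
	 * come from the old PTE; the rest comes from the new protection.
	 */
	val &= _PAGE_CHG_MASK;
	val |= pgprot_val(newprot) & ~_PAGE_CHG_MASK;

	return __pte(val);
}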

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-20 12:34                 ` Kirill A. Shutemov
@ 2018-07-20 13:17                   ` Thomas Gleixner
  2018-07-20 13:40                     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Thomas Gleixner @ 2018-07-20 13:17 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Kirill A. Shutemov, Ingo Molnar, x86,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Fri, 20 Jul 2018, Kirill A. Shutemov wrote:
> On Thu, Jul 19, 2018 at 03:40:41PM +0200, Thomas Gleixner wrote:
> > > > I still don't see how that's supposed to work.
> > > > 
> > > > When the inconsistent CPU is brought up _AFTER_ MKTME is enabled, then how
> > > > does clearing the variables help? It does not magically make all the other
> > > > stuff go away.
> > > 
> > > We don't actually enable MKTME in kernel. BIOS does. Kernel makes choose
> > > to use it or not. Current design targeted to be used by userspace.
> > > So until init we don't have any other stuff to go away. We can just
> > > pretend that MKTME was never there.
> > 
> > Hotplug is not guaranteed to happen _BEFORE_ init. Think about physical
> > hotplug.
> 
> Ouch. I didn't think about this. :/
> 
> In this case I don't see how to handle the situation properly.
> Is it okay to WARN() && pray()?

Not really. First of all, you want to do the initial checking on the boot
CPU and then when secondary CPUs are brought up, verify that they have
matching parameters. If they do not, then we should just shut them down
right away before they can touch anything which is TME related and mark
them as 'don't online again'. That needs some extra logic in the hotplug
code, but I already have played with that for different reasons. Stick a
fat comment into that 'not matching' code path for now and I'll give you
the magic for preventing full bringup after polishing it a bit.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-20 13:17                   ` Thomas Gleixner
@ 2018-07-20 13:40                     ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-20 13:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Kirill A. Shutemov, Ingo Molnar, x86,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Fri, Jul 20, 2018 at 03:17:54PM +0200, Thomas Gleixner wrote:
> On Fri, 20 Jul 2018, Kirill A. Shutemov wrote:
> > On Thu, Jul 19, 2018 at 03:40:41PM +0200, Thomas Gleixner wrote:
> > > > > I still don't see how that's supposed to work.
> > > > > 
> > > > > When the inconsistent CPU is brought up _AFTER_ MKTME is enabled, then how
> > > > > does clearing the variables help? It does not magically make all the other
> > > > > stuff go away.
> > > > 
> > > > We don't actually enable MKTME in kernel. BIOS does. Kernel makes choose
> > > > to use it or not. Current design targeted to be used by userspace.
> > > > So until init we don't have any other stuff to go away. We can just
> > > > pretend that MKTME was never there.
> > > 
> > > Hotplug is not guaranteed to happen _BEFORE_ init. Think about physical
> > > hotplug.
> > 
> > Ouch. I didn't think about this. :/
> > 
> > In this case I don't see how to handle the situation properly.
> > Is it okay to WARN() && pray()?
> 
> Not really. First of all, you want to do the initial checking on the boot
> CPU and then when secondary CPUs are brought up, verify that they have
> matching parameters. If they do not, then we should just shut them down
> right away before they can touch anything which is TME related and mark
> them as 'don't online again'. That needs some extra logic in the hotplug
> code, but I already have played with that for different reasons. Stick a
> fat comment into that 'not matching' code path for now and I'll give you
> the magic for preventing full bringup after polishing it a bit.

Got it. Thanks!
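
Something like this, then (a very rough sketch; the function name is made
up, and the actual shutdown plus "don't online again" marking will use your
hotplug changes):

/* Called when a secondary CPU is brought up. */
static void mktme_verify_cpu(void)
{
	u64 tme_activate;

	rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
	if (tme_activate != tme_activate_cpu0) {
		pr_err("x86/mktme: CPU%d MKTME configuration mismatch\n",
		       smp_processor_id());
		/* FIXME: shut the CPU down and mark it "don't online again" */
	}
}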

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 10/19] x86/mm: Implement page_keyid() using page_ext
  2018-07-18 23:38   ` Dave Hansen
@ 2018-07-23  9:45     ` Kirill A. Shutemov
  2018-07-23 17:22       ` Alison Schofield
  0 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-23  9:45 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm, Schofield, Alison

On Wed, Jul 18, 2018 at 04:38:02PM -0700, Dave Hansen wrote:
> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> > Store KeyID in bits 31:16 of extended page flags. These bits are unused.
> 
> I'd love a two sentence remind of what page_ext is and why you chose to
> use it.  Yes, you need this.  No, not everybody that you want to review
> this patch set knows what it is or why you chose it.

Okay.

> > page_keyid() returns zero until page_ext is ready.
> 
> Is there any implication of this?  Or does it not matter because we
> don't run userspace until after page_ext initialization is done?

It matters in the sense that we shouldn't reference page_ext before it's
initialized; otherwise we will get garbage and crash.
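
The helper itself ends up being essentially (sketch; mktme_enabled() is the
static-branch wrapper):

int page_keyid(const struct page *page)
{
	/* The static branch is flipped by the page_ext initializer. */
	if (!mktme_enabled())
		return 0;

	return lookup_page_ext(page)->keyid;
}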

> > page_ext initializer enables static branch to indicate that
> 
> 			"enables a static branch"
> 
> > page_keyid() can use page_ext. The same static branch will gate MKTME
> > readiness in general.
> 
> Can you elaborate on this a bit?  It would also be a nice place to hint
> to the folks working hard on the APIs to ensure she checks this.

Okay.

> > We don't yet set KeyID for the page. It will come in the following
> > patch that implements prep_encrypted_page(). All pages have KeyID-0 for
> > now.
> 
> It also wouldn't hurt to mention why you don't use an X86_FEATURE_* for
> this rather than an explicit static branch.  I'm sure the x86
> maintainers will be curious.

Sure.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 11/19] x86/mm: Implement vma_keyid()
  2018-07-18 23:40   ` Dave Hansen
@ 2018-07-23  9:47     ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-23  9:47 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Wed, Jul 18, 2018 at 04:40:14PM -0700, Dave Hansen wrote:
> > --- a/arch/x86/mm/mktme.c
> > +++ b/arch/x86/mm/mktme.c
> > @@ -1,3 +1,4 @@
> > +#include <linux/mm.h>
> >  #include <asm/mktme.h>
> >  
> >  phys_addr_t mktme_keyid_mask;
> > @@ -37,3 +38,14 @@ struct page_ext_operations page_mktme_ops = {
> >  	.need = need_page_mktme,
> >  	.init = init_page_mktme,
> >  };
> > +
> > +int vma_keyid(struct vm_area_struct *vma)
> > +{
> > +	pgprotval_t prot;
> > +
> > +	if (!mktme_enabled())
> > +		return 0;
> > +
> > +	prot = pgprot_val(vma->vm_page_prot);
> > +	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
> > +}
> 
> I'm a bit surprised this isn't inlined.  Not that function calls are
> expensive, but we *could* entirely avoid them using the normal pattern of:
> 
> // In the header:
> static inline vma_keyid(...)
> {
> 	if (!mktme_enabled())
> 		return 0;
> 
> 	return __vma_keyid(...); // <- the .c file version
> }

Okay. I'll do this. But it would have to be a macro: <asm/mktme.h> gets
included very early, and we cannot really use jump label code there directly.
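
I.e. something like this in the header (sketch, assuming the out-of-line
part stays in mktme.c as __vma_keyid()):

/* Falls back to KeyID-0 when MKTME is not enabled. */
#define vma_keyid(vma)						\
	(mktme_enabled() ? __vma_keyid(vma) : 0)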

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 12/19] x86/mm: Implement prep_encrypted_page() and arch_free_page()
  2018-07-18 23:53   ` Dave Hansen
@ 2018-07-23  9:50     ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-23  9:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Wed, Jul 18, 2018 at 04:53:27PM -0700, Dave Hansen wrote:
> The description doesn't mention the potential performance implications
> of this patch.  That's criminal at this point.
> 
> > --- a/arch/x86/mm/mktme.c
> > +++ b/arch/x86/mm/mktme.c
> > @@ -1,4 +1,5 @@
> >  #include <linux/mm.h>
> > +#include <linux/highmem.h>
> >  #include <asm/mktme.h>
> >  
> >  phys_addr_t mktme_keyid_mask;
> > @@ -49,3 +50,51 @@ int vma_keyid(struct vm_area_struct *vma)
> >  	prot = pgprot_val(vma->vm_page_prot);
> >  	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
> >  }
> > +
> > +void prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
> > +{
> > +	int i;
> > +
> > +	/* It's not encrypted page: nothing to do */
> > +	if (!keyid)
> > +		return;
> 
> prep_encrypted_page() is called in the fast path in the page allocator.
> This out-of-line copy costs a function call for all users and this is
> also out of the reach of the compiler to understand that keyid!=0 is
> unlikely.
> 
> I think this needs to be treated to the inline-in-the-header treatment.

Okay. Again as a macro.

> > +	/*
> > +	 * The hardware/CPU does not enforce coherency between mappings of the
> > +	 * same physical page with different KeyIDs or encryption keys.
> > +	 * We are responsible for cache management.
> > +	 *
> > +	 * We flush cache before allocating encrypted page
> > +	 */
> > +	clflush_cache_range(page_address(page), PAGE_SIZE << order);
> 
> It's also worth pointing out that this must be done on the keyid alias
> direct map, not the normal one.
> 
> Wait a sec...  How do we know which direct map to use?

page_address() -> lowmem_page_address() -> page_to_virt()

page_to_virt() returns the virtual address from the right direct mapping.

> > +	for (i = 0; i < (1 << order); i++) {
> > +		/* All pages coming out of the allocator should have KeyID 0 */
> > +		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
> > +		lookup_page_ext(page)->keyid = keyid;
> > +
> > +		/* Clear the page after the KeyID is set. */
> > +		if (zero)
> > +			clear_highpage(page);
> > +
> > +		page++;
> > +	}
> > +}
> > +
> > +void arch_free_page(struct page *page, int order)
> > +{
> > +	int i;
> > +
> > +	/* It's not encrypted page: nothing to do */
> > +	if (!page_keyid(page))
> > +		return;
> 
> Ditto on pushing this to a header.
> 
> > +	clflush_cache_range(page_address(page), PAGE_SIZE << order);
> 
> OK, how do we know which copy of the direct map to use, here?

The same way as above.

> > +	for (i = 0; i < (1 << order); i++) {
> > +		/* Check if the page has reasonable KeyID */
> > +		WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids);
> > +		lookup_page_ext(page)->keyid = 0;
> > +		page++;
> > +	}
> > +}
> > 
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 17/19] x86/mm: Implement sync_direct_mapping()
  2018-07-19  0:01   ` Dave Hansen
@ 2018-07-23 10:04     ` Kirill A. Shutemov
  2018-07-23 12:25       ` Dave Hansen
  0 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-23 10:04 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Wed, Jul 18, 2018 at 05:01:37PM -0700, Dave Hansen wrote:
> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> >  arch/x86/include/asm/mktme.h |   8 +
> >  arch/x86/mm/init_64.c        |  10 +
> >  arch/x86/mm/mktme.c          | 437 +++++++++++++++++++++++++++++++++++
> >  3 files changed, 455 insertions(+)
> 
> I'm not the maintainer.  But, NAK from me on this on the diffstat alone.
> 
> There is simply too much technical debt here.  There is no way this code
> is not riddled with bugs and I would bet lots of beer on the fact that
> this has received little to know testing with all the combinations that
> matter, like memory hotplug.  I'd love to be proven wrong, so I eagerly
> await to be dazzled with the test results that have so far escaped
> mention in the changelog.
> 
> Please make an effort to refactor this to reuse the code that we already
> have to manage the direct mapping.  We can't afford 455 new lines of
> page table manipulation that nobody tests or runs.

I'll look into this once again. But I'm not sure that there's any better
solution.

The problem boils down to a page allocation issue. We are not able to
allocate enough page tables in early boot for all the direct mappings. At
that stage we have a very limited pool of pages that can be used for page
tables. The pool is allocated at compile time and it's not enough to
handle MKTME.

The syncing approach appeared to be the simplest to me.

The other possibility I see is to write down a journal of operations on the
direct mappings, to be replayed once we have a proper page allocator around.

> How _was_ this tested?

Besides a normal boot with MKTME enabled and accessing pages via the new
direct mappings, I also tested memory hotplug and hotremove with QEMU.

Ideally we would need some self-test for this. But I don't see a way to
simulate hotplug and hotremove. Soft offlining doesn't cut it. We
actually need to see the ACPI event to trigger the code.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 18/19] x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  2018-07-18 22:21   ` Thomas Gleixner
@ 2018-07-23 10:12     ` Kirill A. Shutemov
  2018-07-26 17:26       ` Dave Hansen
  0 siblings, 1 reply; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-23 10:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, H. Peter Anvin,
	Tom Lendacky, Dave Hansen, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, Jul 19, 2018 at 12:21:44AM +0200, Thomas Gleixner wrote:
> On Tue, 17 Jul 2018, Kirill A. Shutemov wrote:
> 
> > Per-KeyID direct mappings require changes into how we find the right
> > virtual address for a page and virt-to-phys address translations.
> > 
> > page_to_virt() definition overwrites default macros provided by
> > <linux/mm.h>. We only overwrite the macros if MTKME is enabled
> > compile-time.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/include/asm/mktme.h   | 3 +++
> >  arch/x86/include/asm/page_64.h | 2 +-
> >  2 files changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
> > index ba83fba4f9b3..dbfbd955da98 100644
> > --- a/arch/x86/include/asm/mktme.h
> > +++ b/arch/x86/include/asm/mktme.h
> > @@ -29,6 +29,9 @@ void arch_free_page(struct page *page, int order);
> >  
> >  int sync_direct_mapping(void);
> >  
> > +#define page_to_virt(x) \
> > +	(__va(PFN_PHYS(page_to_pfn(x))) + page_keyid(x) * direct_mapping_size)
> 
> This really does not belong into the mktme header.
> 
> Please make this the unconditional x86 page_to_virt() implementation in
> asm/page.h, which is the canonical and obvious place for it.

Okay. (and I owe Dave a beer on this :P)

> The page_keyid() name is quite generic as well. Can this please have some
> kind of reference to the underlying mechanism, i.e. mktme?

Hm. I intentionally kept the name generic. It is used outside arch/x86.

We can have an alias (mktme_page_keyid()?) to be used in arch/x86 to
indicate the underlying mechanism.

Is that what you want to see?

> Please hide the multiplication with direct_mapping_size in the mktme header
> as well. It's non interesting for the !MKTME case. Something like:
> 
> #define page_to_virt(x) \
> 	(__va(PFN_PHYS(page_to_pfn(x))) + mktme_page_to_virt_offset(x))
> 
> makes it immediately clear where to look and also makes it clear that the
> offset will be 0 for a !MKTME enabled kernel and (hopefully) for all !MKTME
> enabled processors as well.
> 
> And then have a proper implementation of mktme_page_to_virt_offset() with a
> proper comment what on earth this is doing. It might be all obvious to you
> now, but it's completely non obvious for the casual reader and you will
> have to twist your brain around it 6 month from now as well.

Sure.
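
So something like this (sketch; direct_mapping_size is the per-KeyID
stride of the direct mapping):

/*
 * Offset into the per-KeyID direct mapping. It is 0 for KeyID-0 pages
 * and for !MKTME configurations, where page_keyid() is always 0.
 */
static inline unsigned long mktme_page_to_virt_offset(struct page *page)
{
	return page_keyid(page) * direct_mapping_size;
}

#define page_to_virt(x) \
	(__va(PFN_PHYS(page_to_pfn(x))) + mktme_page_to_virt_offset(x))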

> >  #else
> >  #define mktme_keyid_mask	((phys_addr_t)0)
> >  #define mktme_nr_keyids		0
> > diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
> > index f57fc3cc2246..a4f394e3471d 100644
> > --- a/arch/x86/include/asm/page_64.h
> > +++ b/arch/x86/include/asm/page_64.h
> > @@ -24,7 +24,7 @@ static inline unsigned long __phys_addr_nodebug(unsigned long x)
> >  	/* use the carry flag to determine if x was < __START_KERNEL_map */
> >  	x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));
> >  
> > -	return x;
> > +	return x & direct_mapping_mask;
> 
> This hunk also lacks any explanation both in the changelog and in form of a
> comment.

I'll fix it.

> > Per-KeyID direct mappings require changes into how we find the right
> > virtual address for a page and virt-to-phys address translations.
> 
> That's pretty useless as it does just tell about 'changes', but not at all
> about what kind of changes and why these changes are required. It's really
> not helpful to assume that everyone stumbling over this will know the whole
> story especially not 6 month after this has been merged and then someone
> ends up with a bisect on that change.
> 
> While at it, please get rid of the 'we'. We are neither CPUs nor code.

Okay. I'll rewrite this.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 17/19] x86/mm: Implement sync_direct_mapping()
  2018-07-23 10:04     ` Kirill A. Shutemov
@ 2018-07-23 12:25       ` Dave Hansen
  0 siblings, 0 replies; 73+ messages in thread
From: Dave Hansen @ 2018-07-23 12:25 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On 07/23/2018 03:04 AM, Kirill A. Shutemov wrote:
> On Wed, Jul 18, 2018 at 05:01:37PM -0700, Dave Hansen wrote:
>> Please make an effort to refactor this to reuse the code that we already
>> have to manage the direct mapping.  We can't afford 455 new lines of
>> page table manipulation that nobody tests or runs.
> 
> I'll look into this once again. But I'm not sure that there's any better
> solution.
> 
> The problem boils down to a page allocation issue. We are not able to
> allocate enough page tables in early boot for all the direct mappings. At
> that stage we have a very limited pool of pages that can be used for page
> tables. The pool is allocated at compile time and it's not enough to
> handle MKTME.
> 
> The syncing approach appeared to be the simplest to me.

If that is, indeed, the primary motivation for this design, then please
call that out in the changelog.  It's exceedingly difficult to review
without this information.

We also need data and facts, please.

Which pool are we talking about?  How large is it now?  How large would
it need to be to accommodate MKTME?  How much memory do we need to map
before we run into issues?

>> How _was_ this tested?
> 
> Besides a normal boot with MKTME enabled and accessing pages via the new
> direct mappings, I also tested memory hotplug and hotremove with QEMU.

... also great changelog fodder.

> Ideally we would need some self-test for this. But I don't see a way to
> simulate hotplug and hotremove. Soft offlining doesn't cut it. We
> actually need to see the ACPI event to trigger the code.

That's something that we have to go fix.  For the online side, we always
have the "probe" file.  I guess nobody ever bothered to make an
equivalent for the remove side.  But, that doesn't seem like an
insurmountable problem to me.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 10/19] x86/mm: Implement page_keyid() using page_ext
  2018-07-23  9:45     ` Kirill A. Shutemov
@ 2018-07-23 17:22       ` Alison Schofield
  0 siblings, 0 replies; 73+ messages in thread
From: Alison Schofield @ 2018-07-23 17:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Kirill A. Shutemov, Ingo Molnar, x86,
	Thomas Gleixner, H. Peter Anvin, Tom Lendacky, Kai Huang,
	Jacob Pan, linux-kernel, linux-mm

On Mon, Jul 23, 2018 at 12:45:17PM +0300, Kirill A. Shutemov wrote:
> On Wed, Jul 18, 2018 at 04:38:02PM -0700, Dave Hansen wrote:
> > On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> > > Store KeyID in bits 31:16 of extended page flags. These bits are unused.
> > 
> > I'd love a two sentence remind of what page_ext is and why you chose to
> > use it.  Yes, you need this.  No, not everybody that you want to review
> > this patch set knows what it is or why you chose it.
> 
> Okay.
> 
> > > page_keyid() returns zero until page_ext is ready.
> > 
> > Is there any implication of this?  Or does it not matter because we
> > don't run userspace until after page_ext initialization is done?
> 
> It matters in the sense that we shouldn't reference page_ext before it's
> initialized; otherwise we will get garbage and crash.
> 
> > > page_ext initializer enables static branch to indicate that
> > 
> > 			"enables a static branch"
> > 
> > > page_keyid() can use page_ext. The same static branch will gate MKTME
> > > readiness in general.
> > 
> > Can you elaborate on this a bit?  It would also be a nice place to hint
> > to the folks working hard on the APIs to ensure she checks this.
> 
> Okay.

At API init time we can check if (MKTME_ENABLED && mktme_nr_keyids > 0).
Sounds like this is another dependency we need to check and 'wait' on?
Does it happen after MKTME_ENABLED is set?  Let me know.

> 
> > > We don't yet set KeyID for the page. It will come in the following
> > > patch that implements prep_encrypted_page(). All pages have KeyID-0 for
> > > now.
> > 
> > It also wouldn't hurt to mention why you don't use an X86_FEATURE_* for
> > this rather than an explicit static branch.  I'm sure the x86
> > maintainers will be curious.
> 
> Sure.
> 
> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory
  2018-07-19  8:27     ` Kirill A. Shutemov
  2018-07-19 14:05       ` Dave Hansen
@ 2018-07-26 14:25       ` Michal Hocko
  1 sibling, 0 replies; 73+ messages in thread
From: Michal Hocko @ 2018-07-26 14:25 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Kirill A. Shutemov, Ingo Molnar, x86,
	Thomas Gleixner, H. Peter Anvin, Tom Lendacky, Kai Huang,
	Jacob Pan, linux-kernel, linux-mm

On Thu 19-07-18 11:27:24, Kirill A. Shutemov wrote:
> On Wed, Jul 18, 2018 at 04:03:53PM -0700, Dave Hansen wrote:
> > I asked about this before and it still isn't covered in the description:
> > You were specifically asked (maybe in person at LSF/MM?) not to modify
> > allocator to pass the keyid around.  Please specifically mention how
> > this design addresses that feedback in the patch description.
> > 
> > You were told, "don't change the core allocator", so I think you just
> > added new functions that wrap the core allocator and called them from
> > the majority of sites that call into the core allocator.  Personally, I
> > think that misses the point of the original request.
> > 
> > Do I have a better way?  Nope, not really.
> 
> +Michal.
> 
> IIRC, Michal was not happy that I propagate the KeyID to very core
> allcoator and we've talked about wrappers around existing APIs as a better
> solution.
> 
> Michal, is it correct?

Yes, that is the case. I haven't seen this series and am unlikely to get
to it in the upcoming days, though, so I cannot comment much more,
unfortunately.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 18/19] x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  2018-07-23 10:12     ` Kirill A. Shutemov
@ 2018-07-26 17:26       ` Dave Hansen
  2018-07-27 13:49         ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Dave Hansen @ 2018-07-26 17:26 UTC (permalink / raw)
  To: Kirill A. Shutemov, Thomas Gleixner
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, H. Peter Anvin,
	Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel, linux-mm

On 07/23/2018 03:12 AM, Kirill A. Shutemov wrote:
> page_to_virt() definition overwrites default macros provided by
> <linux/mm.h>. We only overwrite the macros if MTKME is enabled
> compile-time.

Can you remind me why we need this in page_to_virt() as opposed to in
the kmap() code?  Is it because we have lots of 64-bit code that doesn't
use kmap() or something?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 18/19] x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  2018-07-26 17:26       ` Dave Hansen
@ 2018-07-27 13:49         ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-07-27 13:49 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, Kirill A. Shutemov, Ingo Molnar, x86,
	H. Peter Anvin, Tom Lendacky, Kai Huang, Jacob Pan, linux-kernel,
	linux-mm

On Thu, Jul 26, 2018 at 10:26:23AM -0700, Dave Hansen wrote:
> On 07/23/2018 03:12 AM, Kirill A. Shutemov wrote:
> > page_to_virt() definition overwrites default macros provided by
> > <linux/mm.h>. We only overwrite the macros if MTKME is enabled
> > compile-time.
> 
> Can you remind me why we need this in page_to_virt() as opposed to in
> the kmap() code?  Is it because we have lots of 64-bit code that doesn't
> use kmap() or something?

I just found it most suitable. It should cover all cases, even if kmap()
is not used.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2018-07-17 11:20 ` [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
  2018-07-18 23:19   ` Dave Hansen
@ 2018-07-31  0:08   ` Kai Huang
  1 sibling, 0 replies; 73+ messages in thread
From: Kai Huang @ 2018-07-31  0:08 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky
  Cc: Dave Hansen, Jacob Pan, linux-kernel, linux-mm

On Tue, 2018-07-17 at 14:20 +0300, Kirill A. Shutemov wrote:
> mktme_nr_keyids holds number of KeyIDs available for MKTME, excluding
> KeyID zero which used by TME. MKTME KeyIDs start from 1.
> 
> mktme_keyid_shift holds shift of KeyID within physical address.
> 
> mktme_keyid_mask holds mask to extract KeyID from physical address.

Sorry to bring this up, but AMD SME already introduced sme_me_mask and
__sme_{set/clr} in linux/mem_encrypt.h. Should we try to merge MKTME
and SME to have common variables, and reuse mem_encrypt.h? IMHO
sme_me_mask is sort of equivalent to 'KeyID=1'. And with different names
for SME and MKTME, other components which want to use memory
encryption (e.g. the DMA API) have to have explicit code to distinguish
MKTME and SME, which is not good.
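
For reference, the existing SME helpers are roughly:

/* include/linux/mem_encrypt.h (AMD SME) */
#define __sme_set(x)	((x) | sme_me_mask)
#define __sme_clr(x)	((x) & ~sme_me_mask)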

Thanks,
-Kai
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/asm/mktme.h | 16 ++++++++++++++++
>  arch/x86/kernel/cpu/intel.c  | 12 ++++++++----
>  arch/x86/mm/Makefile         |  2 ++
>  arch/x86/mm/mktme.c          |  5 +++++
>  4 files changed, 31 insertions(+), 4 deletions(-)
>  create mode 100644 arch/x86/include/asm/mktme.h
>  create mode 100644 arch/x86/mm/mktme.c
> 
> diff --git a/arch/x86/include/asm/mktme.h
> b/arch/x86/include/asm/mktme.h
> new file mode 100644
> index 000000000000..df31876ec48c
> --- /dev/null
> +++ b/arch/x86/include/asm/mktme.h
> @@ -0,0 +1,16 @@
> +#ifndef	_ASM_X86_MKTME_H
> +#define	_ASM_X86_MKTME_H
> +
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_X86_INTEL_MKTME
> +extern phys_addr_t mktme_keyid_mask;
> +extern int mktme_nr_keyids;
> +extern int mktme_keyid_shift;
> +#else
> +#define mktme_keyid_mask	((phys_addr_t)0)
> +#define mktme_nr_keyids		0
> +#define mktme_keyid_shift	0
> +#endif
> +
> +#endif
> diff --git a/arch/x86/kernel/cpu/intel.c
> b/arch/x86/kernel/cpu/intel.c
> index bf2caf9d52dd..efc9e9fc47d4 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -573,6 +573,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
>  
>  #ifdef CONFIG_X86_INTEL_MKTME
>  	if (mktme_status == MKTME_ENABLED && nr_keyids) {
> +		mktme_nr_keyids = nr_keyids;
> +		mktme_keyid_shift = c->x86_phys_bits - keyid_bits;
> +
>  		/*
>  		 * Mask out bits claimed from KeyID from physical
> address mask.
>  		 *
> @@ -580,10 +583,8 @@ static void detect_tme(struct cpuinfo_x86 *c)
>  		 * and number of bits claimed for KeyID is 6, bits
> 51:46 of
>  		 * physical address is unusable.
>  		 */
> -		phys_addr_t keyid_mask;
> -
> -		keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c-
> >x86_phys_bits - keyid_bits);
> -		physical_mask &= ~keyid_mask;
> +		mktme_keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1,
> mktme_keyid_shift);
> +		physical_mask &= ~mktme_keyid_mask;
>  	} else {
>  		/*
>  		 * Reset __PHYSICAL_MASK.
> @@ -591,6 +592,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
>  		 * between CPUs.
>  		 */
>  		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> +		mktme_keyid_mask = 0;
> +		mktme_keyid_shift = 0;
> +		mktme_nr_keyids = 0;
>  	}
>  #endif
>  
> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 4b101dd6e52f..4ebee899c363 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+
> = pti.o
>  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
>  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
>  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
> +
> +obj-$(CONFIG_X86_INTEL_MKTME)	+= mktme.o
> diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> new file mode 100644
> index 000000000000..467f1b26c737
> --- /dev/null
> +++ b/arch/x86/mm/mktme.c
> @@ -0,0 +1,5 @@
> +#include <asm/mktme.h>
> +
> +phys_addr_t mktme_keyid_mask;
> +int mktme_nr_keyids;
> +int mktme_keyid_shift;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 19/19] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-07-17 11:20 ` [PATCHv5 19/19] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
@ 2018-08-15  7:48   ` Pavel Machek
  2018-08-17  9:24     ` Kirill A. Shutemov
  0 siblings, 1 reply; 73+ messages in thread
From: Pavel Machek @ 2018-08-15  7:48 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Tom Lendacky,
	Dave Hansen, Kai Huang, Jacob Pan, linux-kernel, linux-mm

Hi!

> Add new config option to enabled/disable Multi-Key Total Memory
> Encryption support.
> 
> MKTME uses MEMORY_PHYSICAL_PADDING to reserve enough space in per-KeyID
> direct mappings for memory hotplug.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/Kconfig | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b6f1785c2176..023a22568c06 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1523,6 +1523,23 @@ config ARCH_USE_MEMREMAP_PROT
>  	def_bool y
>  	depends on AMD_MEM_ENCRYPT
>  
> +config X86_INTEL_MKTME
> +	bool "Intel Multi-Key Total Memory Encryption"
> +	select DYNAMIC_PHYSICAL_MASK
> +	select PAGE_EXTENSION
> +	depends on X86_64 && CPU_SUP_INTEL
> +	---help---
> +	  Say yes to enable support for Multi-Key Total Memory Encryption.
> +	  This requires an Intel processor that has support of the feature.
> +
> +	  Multikey Total Memory Encryption (MKTME) is a technology that allows
> +	  transparent memory encryption in upcoming Intel platforms.
> +
> +	  MKTME is built on top of TME. TME allows encryption of the entirety
> +	  of system memory using a single key. MKTME allows having multiple
> +	  encryption domains, each having own key -- different memory pages can
> +	  be encrypted with different keys.
> +
>  # Common NUMA Features
>  config NUMA
>  	bool "Numa Memory Allocation and Scheduler Support"

Would it be good to provide documentation, or link to documentation, explaining
what security guarantees this is supposed to provide, and what disadvantages (if any)
it has? I guess  it costs a bit of performance...

I see that TME helps with cold boot attacks.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCHv5 19/19] x86: Introduce CONFIG_X86_INTEL_MKTME
  2018-08-15  7:48   ` Pavel Machek
@ 2018-08-17  9:24     ` Kirill A. Shutemov
  0 siblings, 0 replies; 73+ messages in thread
From: Kirill A. Shutemov @ 2018-08-17  9:24 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Tom Lendacky, Dave Hansen, Kai Huang, Jacob Pan,
	linux-kernel, linux-mm

On Wed, Aug 15, 2018 at 09:48:12AM +0200, Pavel Machek wrote:
> Hi!
> 
> > Add new config option to enabled/disable Multi-Key Total Memory
> > Encryption support.
> > 
> > MKTME uses MEMORY_PHYSICAL_PADDING to reserve enough space in per-KeyID
> > direct mappings for memory hotplug.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/Kconfig | 19 ++++++++++++++++++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b6f1785c2176..023a22568c06 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1523,6 +1523,23 @@ config ARCH_USE_MEMREMAP_PROT
> >  	def_bool y
> >  	depends on AMD_MEM_ENCRYPT
> >  
> > +config X86_INTEL_MKTME
> > +	bool "Intel Multi-Key Total Memory Encryption"
> > +	select DYNAMIC_PHYSICAL_MASK
> > +	select PAGE_EXTENSION
> > +	depends on X86_64 && CPU_SUP_INTEL
> > +	---help---
> > +	  Say yes to enable support for Multi-Key Total Memory Encryption.
> > +	  This requires an Intel processor that has support of the feature.
> > +
> > +	  Multikey Total Memory Encryption (MKTME) is a technology that allows
> > +	  transparent memory encryption in upcoming Intel platforms.
> > +
> > +	  MKTME is built on top of TME. TME allows encryption of the entirety
> > +	  of system memory using a single key. MKTME allows having multiple
> > +	  encryption domains, each having own key -- different memory pages can
> > +	  be encrypted with different keys.
> > +
> >  # Common NUMA Features
> >  config NUMA
> >  	bool "Numa Memory Allocation and Scheduler Support"
> 
> Would it be good to provide documentation, or link to documentation, explaining
> what security guarantees this is supposed to provide, and what disadvantages (if any)
> it has?

The main goal is to add an additional level of isolation between different
tenants of a machine. It is mostly targeted at VMs and protects against
leaking information between guests.

In the design, the kernel (or hypervisor) is trusted and has a means to
access encrypted memory as long as the key is programmed into the CPU.

It is worth noting that encryption happens in the memory controller, so all
data in caches of all levels is plain-text.

The spec can be found here:

https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf

> I guess  it costs a bit of performance...

Most of the overhead is paid on allocation and freeing of encrypted pages:
switching between KeyIDs for a page requires cache flushing.

Access time to encrypted memory *shouldn't* be measurably slower.
The encryption overhead is hidden within other latencies in the memory
pipeline.

> I see that TME helps with cold boot attacks.

Right.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2018-08-17  9:25 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-17 11:20 [PATCHv5 00/19] MKTME enabling Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 01/19] mm: Do no merge VMAs with different encryption KeyIDs Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 02/19] mm: Do not use zero page in encrypted pages Kirill A. Shutemov
2018-07-18 17:36   ` Dave Hansen
2018-07-19  7:16     ` Kirill A. Shutemov
2018-07-19 13:58       ` Dave Hansen
2018-07-20 12:16         ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 03/19] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
2018-07-18 17:38   ` Dave Hansen
2018-07-19  7:32     ` Kirill A. Shutemov
2018-07-19 14:02       ` Dave Hansen
2018-07-20 12:23         ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 04/19] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
2018-07-18 17:43   ` Dave Hansen
2018-07-17 11:20 ` [PATCHv5 05/19] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
2018-07-18 23:03   ` Dave Hansen
2018-07-19  8:27     ` Kirill A. Shutemov
2018-07-19 14:05       ` Dave Hansen
2018-07-20 12:25         ` Kirill A. Shutemov
2018-07-26 14:25       ` Michal Hocko
2018-07-17 11:20 ` [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
2018-07-18 23:11   ` Dave Hansen
2018-07-19  8:59     ` Kirill A. Shutemov
2018-07-19 14:13       ` Dave Hansen
2018-07-20 12:29         ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 07/19] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
2018-07-18 23:13   ` Dave Hansen
2018-07-19  9:54     ` Kirill A. Shutemov
2018-07-19 14:19       ` Dave Hansen
2018-07-20 12:31         ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 08/19] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
2018-07-18 23:19   ` Dave Hansen
2018-07-19 10:21     ` Kirill A. Shutemov
2018-07-19 12:37       ` Thomas Gleixner
2018-07-19 13:12         ` Kirill A. Shutemov
2018-07-19 13:18           ` Thomas Gleixner
2018-07-19 13:23             ` Kirill A. Shutemov
2018-07-19 13:40               ` Thomas Gleixner
2018-07-20 12:34                 ` Kirill A. Shutemov
2018-07-20 13:17                   ` Thomas Gleixner
2018-07-20 13:40                     ` Kirill A. Shutemov
2018-07-19 14:23       ` Dave Hansen
2018-07-20 12:34         ` Kirill A. Shutemov
2018-07-31  0:08   ` Kai Huang
2018-07-17 11:20 ` [PATCHv5 09/19] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
2018-07-18 23:30   ` Dave Hansen
2018-07-20 12:42     ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 10/19] x86/mm: Implement page_keyid() using page_ext Kirill A. Shutemov
2018-07-18 23:38   ` Dave Hansen
2018-07-23  9:45     ` Kirill A. Shutemov
2018-07-23 17:22       ` Alison Schofield
2018-07-17 11:20 ` [PATCHv5 11/19] x86/mm: Implement vma_keyid() Kirill A. Shutemov
2018-07-18 23:40   ` Dave Hansen
2018-07-23  9:47     ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 12/19] x86/mm: Implement prep_encrypted_page() and arch_free_page() Kirill A. Shutemov
2018-07-18 23:53   ` Dave Hansen
2018-07-23  9:50     ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 13/19] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 14/19] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 15/19] x86/mm: Detect MKTME early Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 16/19] x86/mm: Calculate direct mapping size Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 17/19] x86/mm: Implement sync_direct_mapping() Kirill A. Shutemov
2018-07-19  0:01   ` Dave Hansen
2018-07-23 10:04     ` Kirill A. Shutemov
2018-07-23 12:25       ` Dave Hansen
2018-07-17 11:20 ` [PATCHv5 18/19] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
2018-07-18 22:21   ` Thomas Gleixner
2018-07-23 10:12     ` Kirill A. Shutemov
2018-07-26 17:26       ` Dave Hansen
2018-07-27 13:49         ` Kirill A. Shutemov
2018-07-17 11:20 ` [PATCHv5 19/19] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
2018-08-15  7:48   ` Pavel Machek
2018-08-17  9:24     ` Kirill A. Shutemov
