KVM Archive on lore.kernel.org
* [PATCH, RFC 00/62] Intel MKTME enabling
@ 2019-05-08 14:43 Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 01/62] mm: Do no merge VMAs with different encryption KeyIDs Kirill A. Shutemov
                   ` (63 more replies)
  0 siblings, 64 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

= Intro =

This patchset enables Intel Multi-Key Total Memory Encryption (MKTME).
It consists of changes to multiple subsystems:

 * Core MM: infrastructure for allocating pages, dealing with encrypted VMAs
   and providing an API to set up encrypted mappings.
 * arch/x86: feature enumeration, programming keys into hardware, setting up
   page table entries for encrypted pages and more.
 * Key management service: setup and management of encryption keys.
 * DMA/IOMMU: dealing with encrypted memory on IO side.
 * KVM: interaction with virtualization side.
 * Documentation: description of APIs and usage examples.

The patchset is huge. This submission aims to give a view of the full picture
and get feedback on the overall design. The patchset will be split into more
digestible pieces later.

Please review. Any feedback is welcome.

= Overview =

Multi-Key Total Memory Encryption (MKTME)[1] is a technology that allows
transparent memory encryption in upcoming Intel platforms.  It uses a new
instruction (PCONFIG) for key setup and selects a key for individual pages by
repurposing physical address bits in the page tables.

These patches add support for MKTME into the existing kernel keyring subsystem
and add a new encrypt_mprotect() system call that can be used by applications
to encrypt anonymous memory with keys obtained from the keyring.

This architecture supports encrypting both normal, volatile DRAM and persistent
memory.  However, these patches do not implement persistent memory support.  We
anticipate adding that support next.

== Hardware Background ==

MKTME is built on top of an existing single-key technology called TME.  TME
encrypts all system memory using a single key generated by the CPU on every
boot of the system. TME provides mitigation against physical attacks, such as
physically removing a DIMM or watching memory bus traffic.

MKTME enables the use of multiple encryption keys[2], allowing selection of the
encryption key per-page using the page tables.  Encryption keys are programmed
into each memory controller and the same set of keys is available to all
entities on the system with access to that memory (all cores, DMA engines,
etc...).

MKTME inherits many of the mitigations against hardware attacks from TME.  Like
TME, MKTME does not mitigate vulnerable or malicious operating systems or
virtual machine managers.  MKTME offers additional mitigations when compared to
TME.

TME and MKTME use the AES encryption algorithm in the AES-XTS mode.  This mode,
typically used for block-based storage devices, takes the physical address of
the data into account when encrypting each block.  This ensures that the
effective key is different for each block of memory. Moving encrypted content
across physical addresses results in garbage on read, mitigating block-relocation
attacks.  This property is the reason many of the discussed attacks require
control of a shared physical page to be handed from the victim to the attacker.

== MKTME-Provided Mitigations ==

MKTME adds a few mitigations against attacks that are not mitigated when using
TME alone.  The first set are mitigations against software attacks that are
familiar today:

 * Kernel Mapping Attacks: information disclosures that leverage the
   kernel direct map are mitigated against disclosing user data.
 * Freed Data Leak Attacks: removing an encryption key from the
   hardware mitigates future user information disclosure.

The next set are attacks that depend on specialized hardware, such as an “evil
DIMM” or a DDR interposer:

 * Cross-Domain Replay Attack: data is captured from one domain
   (guest) and replayed to another at a later time.
 * Cross-Domain Capture and Delayed Compare Attack: data is captured
   and later analyzed to discover secrets.
 * Key Wear-out Attack: data is captured and analyzed in order to
   weaken the AES encryption itself.

More details on these attacks are below.

=== Kernel Mapping Attacks ===

Information disclosure vulnerabilities leverage the kernel direct map because
many vulnerabilities involve manipulation of kernel data structures (examples:
CVE-2017-7277, CVE-2017-9605).  We normally think of these bugs as leaking
valuable *kernel* data, but they can leak application data when application
pages are recycled for kernel use.

With this MKTME implementation, there is a direct map created for each MKTME
KeyID which is used whenever the kernel needs to access plaintext.  But, all
kernel data structures are accessed via the direct map for KeyID-0.  Thus,
memory reads which are not coordinated with the KeyID get garbage (for example,
accessing KeyID-4 data with the KeyID-0 mapping).

This means that if sensitive data encrypted using MKTME is leaked via the
KeyID-0 direct map, ciphertext decrypted with the wrong key will be disclosed.
To disclose plaintext, an attacker must “pivot” to the correct direct mapping,
which is non-trivial because there are no kernel data structures in the
KeyID!=0 direct mapping.

=== Freed Data Leak Attack ===

The kernel has a history of bugs around uninitialized data.  Usually, we think
of these bugs as leaking sensitive kernel data, but they can also be used to
leak application secrets.

MKTME can help mitigate the case where application secrets are leaked:

 * App (or VM) places a secret in a page
 * App exits or frees memory to kernel allocator
 * Page added to allocator free list
 * Attacker reallocates page to a purpose where it can read the page

Now, imagine MKTME was in use on the memory being leaked.  The data can only be
leaked as long as the key is programmed in the hardware.  If the key is
de-programmed, for example once all pages are freed after a guest is shut down,
any future reads will just see ciphertext.

Basically, the key is a convenient choke-point: you can be more confident that
data encrypted with it is inaccessible once the key is removed.

=== Cross-Domain Replay Attack ===

MKTME mitigates cross-domain replay attacks where an attacker replaces an
encrypted block owned by one domain with a block owned by another domain.
MKTME does not prevent this replacement from occurring, but it does keep
plaintext from being disclosed if the domains use different keys.

With TME, the attack could be executed as follows:
 * A victim places a secret in memory, at a given physical address.
   Note: AES-XTS is what restricts the attack to being performed at a
   single physical address instead of across different physical
   addresses.
 * Attacker captures the ciphertext of the victim’s secret.
 * Later on, after the victim frees the physical address, the attacker
   gains ownership.
 * Attacker puts the ciphertext at the address and gets the secret
   plaintext.

With MKTME, the attacker and the victim presumably use different keys, so
the attacker cannot successfully decrypt the old ciphertext.

=== Cross-Domain Capture and Delayed Compare Attack ===

This is also referred to as a kind of dictionary attack.

Similarly, MKTME protects against cross-domain capture-and-compare attacks.
Consider the following scenario:
 * A victim places a secret in memory, at a known physical address
 * Attacker captures victim’s ciphertext
 * Attacker gains control of the target physical address, perhaps
   after the victim’s VM is shut down or its memory reclaimed.
 * Attacker computes and writes many possible plaintexts until new
   ciphertext matches content captured previously.

Secrets which have low (plaintext) entropy are more vulnerable to this attack
because they reduce the number of possible plaintexts an attacker has to
compute and write.

The attack will not work if the attacker and victim use different keys.

=== Key Wear-out Attack ===

Repeated use of an encryption key might be used by an attacker to infer
information about the key or the plaintext, weakening the encryption.  The
higher the bandwidth of the encryption engine, the more vulnerable the key is
to wear-out.  The MKTME memory encryption hardware works at the speed of the
memory bus, which has high bandwidth.

Such a weakness has been demonstrated[3] on a theoretical cipher with
properties similar to AES-XTS.

An attack would take the following steps:
 * Victim system is using TME with AES-XTS-128
 * Attacker repeatedly captures ciphertext/plaintext pairs (can be
   performed with an online hardware attack like an interposer).
 * Attacker compels repeated use of the key under attack for a
   sustained time period without a system reboot[4].
 * Attacker discovers a ciphertext collision (two plaintexts
   translating to the same ciphertext)
 * Attacker can induce controlled modifications to the targeted
   plaintext by modifying the colliding ciphertext

MKTME mitigates key wear-out in two ways:
 * Keys can be rotated periodically to mitigate wear-out.  Since TME
   keys are generated at boot, rotation of TME keys requires a
   reboot.  In contrast, MKTME allows rotation while the system is
   booted.  An application could implement a policy to rotate keys at
   a frequency which is not feasible to attack.
 * In the case that MKTME is used to encrypt two guests’ memory with
   two different keys, an attack on one guest’s key would not weaken
   the key used in the second guest.

--

[1] https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
[2] The MKTME architecture supports up to 16 bits of KeyIDs, so a
    maximum of 65535 keys on top of the “TME key” at KeyID-0.  The
    first implementation is expected to support 6 bits, making 63 keys
    available to applications.  However, this is not guaranteed.  The
    number of available keys could be reduced if, for instance,
    additional physical address space is desired over additional
    KeyIDs.
[3] http://web.cs.ucdavis.edu/~rogaway/papers/offsets.pdf
[4] The sustained time required for an attack could vary from days
    to years depending on the attacker’s goals.

Alison Schofield (33):
  x86/pconfig: Set a valid encryption algorithm for all MKTME commands
  keys/mktme: Introduce a Kernel Key Service for MKTME
  keys/mktme: Preparse the MKTME key payload
  keys/mktme: Instantiate and destroy MKTME keys
  keys/mktme: Move the MKTME payload into a cache aligned structure
  keys/mktme: Strengthen the entropy of CPU generated MKTME keys
  keys/mktme: Set up PCONFIG programming targets for MKTME keys
  keys/mktme: Program MKTME keys into the platform hardware
  keys/mktme: Set up a percpu_ref_count for MKTME keys
  keys/mktme: Require CAP_SYS_RESOURCE capability for MKTME keys
  keys/mktme: Store MKTME payloads if cmdline parameter allows
  acpi: Remove __init from acpi table parsing functions
  acpi/hmat: Determine existence of an ACPI HMAT
  keys/mktme: Require ACPI HMAT to register the MKTME Key Service
  acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME
  keys/mktme: Do not allow key creation in unsafe topologies
  keys/mktme: Support CPU hotplug for MKTME key service
  keys/mktme: Find new PCONFIG targets during memory hotplug
  keys/mktme: Program new PCONFIG targets with MKTME keys
  keys/mktme: Support memory hotplug for MKTME keys
  mm: Generalize the mprotect implementation to support extensions
  syscall/x86: Wire up a system call for MKTME encryption keys
  x86/mm: Set KeyIDs in encrypted VMAs for MKTME
  mm: Add the encrypt_mprotect() system call for MKTME
  x86/mm: Keep reference counts on encrypted VMAs for MKTME
  mm: Restrict MKTME memory encryption to anonymous VMAs
  selftests/x86/mktme: Test the MKTME APIs
  x86/mktme: Overview of Multi-Key Total Memory Encryption
  x86/mktme: Document the MKTME provided security mitigations
  x86/mktme: Document the MKTME kernel configuration requirements
  x86/mktme: Document the MKTME Key Service API
  x86/mktme: Document the MKTME API for anonymous memory encryption
  x86/mktme: Demonstration program using the MKTME APIs

Jacob Pan (3):
  iommu/vt-d: Support MKTME in DMA remapping
  x86/mm: introduce common code for mem encryption
  x86/mm: Use common code for DMA memory encryption

Kai Huang (2):
  mm, x86: export several MKTME variables
  kvm, x86, mmu: setup MKTME keyID to spte for given PFN

Kirill A. Shutemov (24):
  mm: Do no merge VMAs with different encryption KeyIDs
  mm: Add helpers to setup zero page mappings
  mm/ksm: Do not merge pages with different KeyIDs
  mm/page_alloc: Unify alloc_hugepage_vma()
  mm/page_alloc: Handle allocation for encrypted memory
  mm/khugepaged: Handle encrypted pages
  x86/mm: Mask out KeyID bits from page table entry pfn
  x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  x86/mm: Detect MKTME early
  x86/mm: Add a helper to retrieve KeyID for a page
  x86/mm: Add a helper to retrieve KeyID for a VMA
  x86/mm: Add hooks to allocate and free encrypted pages
  x86/mm: Map zero pages into encrypted mappings correctly
  x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
  x86/mm: Allow to disable MKTME after enumeration
  x86/mm: Calculate direct mapping size
  x86/mm: Implement syncing per-KeyID direct mappings
  x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  mm/page_ext: Export lookup_page_ext() symbol
  mm/rmap: Clear vma->anon_vma on unlink_anon_vmas()
  x86/mm: Disable MKTME on incompatible platform configurations
  x86/mm: Disable MKTME if not all system memory supports encryption
  x86: Introduce CONFIG_X86_INTEL_MKTME

 .../admin-guide/kernel-parameters.rst         |   1 +
 .../admin-guide/kernel-parameters.txt         |  11 +
 Documentation/x86/mktme/index.rst             |  13 +
 .../x86/mktme/mktme_configuration.rst         |  17 +
 Documentation/x86/mktme/mktme_demo.rst        |  53 ++
 Documentation/x86/mktme/mktme_encrypt.rst     |  57 ++
 Documentation/x86/mktme/mktme_keys.rst        |  96 +++
 Documentation/x86/mktme/mktme_mitigations.rst | 150 ++++
 Documentation/x86/mktme/mktme_overview.rst    |  57 ++
 Documentation/x86/x86_64/mm.txt               |   4 +
 arch/alpha/include/asm/page.h                 |   2 +-
 arch/x86/Kconfig                              |  29 +-
 arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 arch/x86/include/asm/intel-family.h           |   2 +
 arch/x86/include/asm/intel_pconfig.h          |  14 +-
 arch/x86/include/asm/mem_encrypt.h            |  29 +
 arch/x86/include/asm/mktme.h                  |  93 +++
 arch/x86/include/asm/page.h                   |   4 +
 arch/x86/include/asm/page_32.h                |   1 +
 arch/x86/include/asm/page_64.h                |   4 +-
 arch/x86/include/asm/pgtable.h                |  19 +
 arch/x86/include/asm/pgtable_types.h          |  23 +-
 arch/x86/include/asm/setup.h                  |   6 +
 arch/x86/kernel/cpu/intel.c                   |  58 +-
 arch/x86/kernel/head64.c                      |   4 +
 arch/x86/kernel/setup.c                       |   3 +
 arch/x86/kvm/mmu.c                            |  18 +-
 arch/x86/mm/Makefile                          |   3 +
 arch/x86/mm/init_64.c                         |  68 ++
 arch/x86/mm/kaslr.c                           |  11 +-
 arch/x86/mm/mem_encrypt_common.c              |  28 +
 arch/x86/mm/mktme.c                           | 630 ++++++++++++++
 drivers/acpi/hmat/hmat.c                      |  67 ++
 drivers/acpi/tables.c                         |  10 +-
 drivers/firmware/efi/efi.c                    |  25 +-
 drivers/iommu/intel-iommu.c                   |  29 +-
 fs/dax.c                                      |   3 +-
 fs/exec.c                                     |   4 +-
 fs/userfaultfd.c                              |   7 +-
 include/asm-generic/pgtable.h                 |   8 +
 include/keys/mktme-type.h                     |  39 +
 include/linux/acpi.h                          |   9 +-
 include/linux/dma-direct.h                    |   4 +-
 include/linux/efi.h                           |   1 +
 include/linux/gfp.h                           |  51 +-
 include/linux/intel-iommu.h                   |   9 +-
 include/linux/mem_encrypt.h                   |  23 +-
 include/linux/migrate.h                       |  14 +-
 include/linux/mm.h                            |  27 +-
 include/linux/page_ext.h                      |  11 +-
 include/linux/syscalls.h                      |   2 +
 include/uapi/asm-generic/unistd.h             |   4 +-
 kernel/fork.c                                 |   2 +
 kernel/sys_ni.c                               |   2 +
 mm/compaction.c                               |   3 +
 mm/huge_memory.c                              |   6 +-
 mm/khugepaged.c                               |  10 +
 mm/ksm.c                                      |  17 +
 mm/madvise.c                                  |   2 +-
 mm/memory.c                                   |   3 +-
 mm/mempolicy.c                                |  30 +-
 mm/migrate.c                                  |   4 +-
 mm/mlock.c                                    |   2 +-
 mm/mmap.c                                     |  31 +-
 mm/mprotect.c                                 |  98 ++-
 mm/page_alloc.c                               |  50 ++
 mm/page_ext.c                                 |   5 +
 mm/rmap.c                                     |   4 +-
 mm/userfaultfd.c                              |   3 +-
 security/keys/Makefile                        |   1 +
 security/keys/mktme_keys.c                    | 768 ++++++++++++++++++
 .../selftests/x86/mktme/encrypt_tests.c       | 433 ++++++++++
 .../testing/selftests/x86/mktme/flow_tests.c  | 266 ++++++
 tools/testing/selftests/x86/mktme/key_tests.c | 526 ++++++++++++
 .../testing/selftests/x86/mktme/mktme_test.c  | 300 +++++++
 76 files changed, 4301 insertions(+), 122 deletions(-)
 create mode 100644 Documentation/x86/mktme/index.rst
 create mode 100644 Documentation/x86/mktme/mktme_configuration.rst
 create mode 100644 Documentation/x86/mktme/mktme_demo.rst
 create mode 100644 Documentation/x86/mktme/mktme_encrypt.rst
 create mode 100644 Documentation/x86/mktme/mktme_keys.rst
 create mode 100644 Documentation/x86/mktme/mktme_mitigations.rst
 create mode 100644 Documentation/x86/mktme/mktme_overview.rst
 create mode 100644 arch/x86/include/asm/mktme.h
 create mode 100644 arch/x86/mm/mem_encrypt_common.c
 create mode 100644 arch/x86/mm/mktme.c
 create mode 100644 include/keys/mktme-type.h
 create mode 100644 security/keys/mktme_keys.c
 create mode 100644 tools/testing/selftests/x86/mktme/encrypt_tests.c
 create mode 100644 tools/testing/selftests/x86/mktme/flow_tests.c
 create mode 100644 tools/testing/selftests/x86/mktme/key_tests.c
 create mode 100644 tools/testing/selftests/x86/mktme/mktme_test.c

-- 
2.20.1



* [PATCH, RFC 01/62] mm: Do no merge VMAs with different encryption KeyIDs
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 02/62] mm: Add helpers to setup zero page mappings Kirill A. Shutemov
                   ` (62 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

VMAs with different KeyIDs must not be merged. Only VMAs with the same
KeyID are compatible.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/userfaultfd.c   |  7 ++++---
 include/linux/mm.h |  9 ++++++++-
 mm/madvise.c       |  2 +-
 mm/mempolicy.c     |  3 ++-
 mm/mlock.c         |  2 +-
 mm/mmap.c          | 31 +++++++++++++++++++------------
 mm/mprotect.c      |  2 +-
 7 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index f5de1e726356..6032aecda4ed 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -901,7 +901,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 				 new_flags, vma->anon_vma,
 				 vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
-				 NULL_VM_UFFD_CTX);
+				 NULL_VM_UFFD_CTX, vma_keyid(vma));
 		if (prev)
 			vma = prev;
 		else
@@ -1451,7 +1451,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		prev = vma_merge(mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
-				 ((struct vm_userfaultfd_ctx){ ctx }));
+				 ((struct vm_userfaultfd_ctx){ ctx }),
+				 vma_keyid(vma));
 		if (prev) {
 			vma = prev;
 			goto next;
@@ -1613,7 +1614,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 		prev = vma_merge(mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
-				 NULL_VM_UFFD_CTX);
+				 NULL_VM_UFFD_CTX, vma_keyid(vma));
 		if (prev) {
 			vma = prev;
 			goto next;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6b10c21630f5..13c40c43ce00 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1599,6 +1599,13 @@ static inline bool vma_is_anonymous(struct vm_area_struct *vma)
 	return !vma->vm_ops;
 }
 
+#ifndef vma_keyid
+static inline int vma_keyid(struct vm_area_struct *vma)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_SHMEM
 /*
  * The vma_is_shmem is not inline because it is used only by slow
@@ -2275,7 +2282,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
 extern struct vm_area_struct *vma_merge(struct mm_struct *,
 	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
 	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
-	struct mempolicy *, struct vm_userfaultfd_ctx);
+	struct mempolicy *, struct vm_userfaultfd_ctx, int keyid);
 extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
 extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
 	unsigned long addr, int new_below);
diff --git a/mm/madvise.c b/mm/madvise.c
index 21a7881a2db4..e9925a512b15 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -138,7 +138,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
 			  vma->vm_file, pgoff, vma_policy(vma),
-			  vma->vm_userfaultfd_ctx);
+			  vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (*prev) {
 		vma = *prev;
 		goto success;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2219e747df49..14b18449c623 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -731,7 +731,8 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			((vmstart - vma->vm_start) >> PAGE_SHIFT);
 		prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
 				 vma->anon_vma, vma->vm_file, pgoff,
-				 new_pol, vma->vm_userfaultfd_ctx);
+				 new_pol, vma->vm_userfaultfd_ctx,
+				 vma_keyid(vma));
 		if (prev) {
 			vma = prev;
 			next = vma->vm_next;
diff --git a/mm/mlock.c b/mm/mlock.c
index 080f3b36415b..d44cb0c9e9ca 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -535,7 +535,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
 			  vma->vm_file, pgoff, vma_policy(vma),
-			  vma->vm_userfaultfd_ctx);
+			  vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (*prev) {
 		vma = *prev;
 		goto success;
diff --git a/mm/mmap.c b/mm/mmap.c
index bd7b9f293b39..de0bdf4d8f90 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1007,7 +1007,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
  */
 static inline int is_mergeable_vma(struct vm_area_struct *vma,
 				struct file *file, unsigned long vm_flags,
-				struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+				struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+				int keyid)
 {
 	/*
 	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -1021,6 +1022,8 @@ static inline int is_mergeable_vma(struct vm_area_struct *vma,
 		return 0;
 	if (vma->vm_file != file)
 		return 0;
+	if (vma_keyid(vma) != keyid)
+		return 0;
 	if (vma->vm_ops && vma->vm_ops->close)
 		return 0;
 	if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
@@ -1057,9 +1060,10 @@ static int
 can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
 		     struct anon_vma *anon_vma, struct file *file,
 		     pgoff_t vm_pgoff,
-		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+		     int keyid)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, keyid) &&
 	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
 		if (vma->vm_pgoff == vm_pgoff)
 			return 1;
@@ -1078,9 +1082,10 @@ static int
 can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
 		    struct anon_vma *anon_vma, struct file *file,
 		    pgoff_t vm_pgoff,
-		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+		    int keyid)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, keyid) &&
 	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
 		pgoff_t vm_pglen;
 		vm_pglen = vma_pages(vma);
@@ -1135,7 +1140,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			unsigned long end, unsigned long vm_flags,
 			struct anon_vma *anon_vma, struct file *file,
 			pgoff_t pgoff, struct mempolicy *policy,
-			struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+			struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+			int keyid)
 {
 	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
 	struct vm_area_struct *area, *next;
@@ -1168,7 +1174,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			mpol_equal(vma_policy(prev), policy) &&
 			can_vma_merge_after(prev, vm_flags,
 					    anon_vma, file, pgoff,
-					    vm_userfaultfd_ctx)) {
+					    vm_userfaultfd_ctx, keyid)) {
 		/*
 		 * OK, it can.  Can we now merge in the successor as well?
 		 */
@@ -1177,7 +1183,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 				can_vma_merge_before(next, vm_flags,
 						     anon_vma, file,
 						     pgoff+pglen,
-						     vm_userfaultfd_ctx) &&
+						     vm_userfaultfd_ctx,
+						     keyid) &&
 				is_mergeable_anon_vma(prev->anon_vma,
 						      next->anon_vma, NULL)) {
 							/* cases 1, 6 */
@@ -1200,7 +1207,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			mpol_equal(policy, vma_policy(next)) &&
 			can_vma_merge_before(next, vm_flags,
 					     anon_vma, file, pgoff+pglen,
-					     vm_userfaultfd_ctx)) {
+					     vm_userfaultfd_ctx, keyid)) {
 		if (prev && addr < prev->vm_end)	/* case 4 */
 			err = __vma_adjust(prev, prev->vm_start,
 					 addr, prev->vm_pgoff, NULL, next);
@@ -1745,7 +1752,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	 * Can we just expand an old mapping?
 	 */
 	vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
-			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
+			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, 0);
 	if (vma)
 		goto out;
 
@@ -3023,7 +3030,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
 
 	/* Can we just expand an old private anonymous mapping? */
 	vma = vma_merge(mm, prev, addr, addr + len, flags,
-			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
+			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, 0);
 	if (vma)
 		goto out;
 
@@ -3221,7 +3228,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 		return NULL;	/* should never get here */
 	new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
 			    vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
-			    vma->vm_userfaultfd_ctx);
+			    vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (new_vma) {
 		/*
 		 * Source vma may have been merged into new_vma
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 028c724dcb1a..e768cd656a48 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -399,7 +399,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*pprev = vma_merge(mm, *pprev, start, end, newflags,
 			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
-			   vma->vm_userfaultfd_ctx);
+			   vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (*pprev) {
 		vma = *pprev;
 		VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
-- 
2.20.1



* [PATCH, RFC 02/62] mm: Add helpers to setup zero page mappings
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 01/62] mm: Do no merge VMAs with different encryption KeyIDs Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-29  7:21   ` Mike Rapoport
  2019-05-08 14:43 ` [PATCH, RFC 03/62] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
                   ` (61 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

When the kernel sets up an encrypted page mapping, the encryption KeyID is
derived from the VMA. The KeyID is going to be part of vma->vm_page_prot and
it will be propagated transparently to the page table entry on mk_pte().

But there is an exception: the zero page is never encrypted and its mapping
must use KeyID-0, regardless of the VMA's KeyID.

Introduce helpers that create a page table entry for the zero page.

The generic implementation will be overridden by architecture-specific
code that takes care of using the correct KeyID.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/dax.c                      | 3 +--
 include/asm-generic/pgtable.h | 8 ++++++++
 mm/huge_memory.c              | 6 ++----
 mm/memory.c                   | 3 +--
 mm/userfaultfd.c              | 3 +--
 5 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index e5e54da1715f..6d609bff53b9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1441,8 +1441,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
 		mm_inc_nr_ptes(vma->vm_mm);
 	}
-	pmd_entry = mk_pmd(zero_page, vmf->vma->vm_page_prot);
-	pmd_entry = pmd_mkhuge(pmd_entry);
+	pmd_entry = mk_zero_pmd(zero_page, vmf->vma->vm_page_prot);
 	set_pmd_at(vmf->vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
 	spin_unlock(ptl);
 	trace_dax_pmd_load_hole(inode, vmf, zero_page, *entry);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index fa782fba51ee..cde8b81f6f2b 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -879,8 +879,16 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
 }
 #endif
 
+#ifndef mk_zero_pte
+#define mk_zero_pte(addr, prot) pte_mkspecial(pfn_pte(my_zero_pfn(addr), prot))
+#endif
+
 #ifdef CONFIG_MMU
 
+#ifndef mk_zero_pmd
+#define mk_zero_pmd(zero_page, prot) pmd_mkhuge(mk_pmd(zero_page, prot))
+#endif
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 165ea46bf149..26c3503824ba 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -675,8 +675,7 @@ static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
 	pmd_t entry;
 	if (!pmd_none(*pmd))
 		return false;
-	entry = mk_pmd(zero_page, vma->vm_page_prot);
-	entry = pmd_mkhuge(entry);
+	entry = mk_zero_pmd(zero_page, vma->vm_page_prot);
 	if (pgtable)
 		pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, haddr, pmd, entry);
@@ -2101,8 +2100,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 
 	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
 		pte_t *pte, entry;
-		entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
-		entry = pte_mkspecial(entry);
+		entry = mk_zero_pte(haddr, vma->vm_page_prot);
 		pte = pte_offset_map(&_pmd, haddr);
 		VM_BUG_ON(!pte_none(*pte));
 		set_pte_at(mm, haddr, pte, entry);
diff --git a/mm/memory.c b/mm/memory.c
index ab650c21bccd..c5e0c87a12b7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2927,8 +2927,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	/* Use the zero-page for reads */
 	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
 			!mm_forbids_zeropage(vma->vm_mm)) {
-		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
-						vma->vm_page_prot));
+		entry = mk_zero_pte(vmf->address, vma->vm_page_prot);
 		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				vmf->address, &vmf->ptl);
 		if (!pte_none(*vmf->pte))
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index d59b5a73dfb3..ac1ce3866036 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -122,8 +122,7 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 	pgoff_t offset, max_off;
 	struct inode *inode;
 
-	_dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
-					 dst_vma->vm_page_prot));
+	_dst_pte = mk_zero_pte(dst_addr, dst_vma->vm_page_prot);
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
 	if (dst_vma->vm_file) {
 		/* the shmem MAP_PRIVATE case requires checking the i_size */
-- 
2.20.1


^ permalink raw reply	[flat|nested] 153+ messages in thread
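
Illustration for 02/62 (not part of the patch): a minimal user-space
sketch of the zero page exception described above. It assumes a
hypothetical layout with 6 KeyID bits in physical address bits 51:46 and
4 KiB pages. All model_* names are invented for this example, and the
real arch-specific override may look different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 6 KeyID bits in physical address bits 51:46. */
#define MODEL_PAGE_SHIFT  12
#define MODEL_KEYID_SHIFT 46
#define MODEL_KEYID_BITS  6
#define MODEL_KEYID_MASK \
	(((UINT64_C(1) << MODEL_KEYID_BITS) - 1) << MODEL_KEYID_SHIFT)

/* Generic case: the KeyID stored in prot flows into the entry unchanged. */
static inline uint64_t model_mk_pte(uint64_t pfn, uint64_t prot)
{
	/* The pfn must not spill into the KeyID bits in this model. */
	assert(pfn < (UINT64_C(1) << (MODEL_KEYID_SHIFT - MODEL_PAGE_SHIFT)));
	return (pfn << MODEL_PAGE_SHIFT) | prot;
}

/* Zero page case: drop the KeyID bits from prot so the entry uses KeyID-0. */
static inline uint64_t model_mk_zero_pte(uint64_t zero_pfn, uint64_t prot)
{
	return model_mk_pte(zero_pfn, prot & ~MODEL_KEYID_MASK);
}

/* Extract the KeyID encoded in an entry. */
static inline unsigned int model_pte_keyid(uint64_t pte)
{
	return (unsigned int)((pte & MODEL_KEYID_MASK) >> MODEL_KEYID_SHIFT);
}
```

With a prot value carrying KeyID 5, model_mk_pte() yields an entry with
KeyID 5, while model_mk_zero_pte() yields KeyID 0 and keeps the other
protection bits.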

* [PATCH, RFC 03/62] mm/ksm: Do not merge pages with different KeyIDs
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 01/62] mm: Do no merge VMAs with different encryption KeyIDs Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 02/62] mm: Add helpers to setup zero page mappings Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-10 18:07   ` Dave Hansen
  2019-05-08 14:43 ` [PATCH, RFC 04/62] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
                   ` (60 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

KeyID indicates which key to use to encrypt and decrypt a page's content.
Depending on the implementation, the cipher text may be tied to the
physical address of the page. This means that pages with identical plain
text would appear different if KSM looked at the cipher text, which would
effectively disable KSM for encrypted pages.

In addition, some implementations may not allow reading the cipher text
at all.

KSM compares plain text instead (transparently to KSM code).

But we still need to make sure that pages with identical plain text will
not be merged together if they are encrypted with different keys.

To make it work, the kernel only allows merging pages with the same KeyID.
This approach guarantees that the merged page can be read by all users.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h |  7 +++++++
 mm/ksm.c           | 17 +++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 13c40c43ce00..07c36f4673f6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1606,6 +1606,13 @@ static inline int vma_keyid(struct vm_area_struct *vma)
 }
 #endif
 
+#ifndef page_keyid
+static inline int page_keyid(struct page *page)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_SHMEM
 /*
  * The vma_is_shmem is not inline because it is used only by slow
diff --git a/mm/ksm.c b/mm/ksm.c
index fc64874dc6f4..91bce4799c45 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1227,6 +1227,23 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
 	if (!PageAnon(page))
 		goto out;
 
+	/*
+	 * KeyID indicates what key to use to encrypt and decrypt page's
+	 * content.
+	 *
+	 * KSM compares plain text instead (transparently to KSM code).
+	 *
+	 * But we still need to make sure that pages with identical plain
+	 * text will not be merged together if they are encrypted with
+	 * different keys.
+	 *
+	 * To make it work kernel only allows merging pages with the same KeyID.
+	 * The approach guarantees that the merged page can be read by all
+	 * users.
+	 */
+	if (kpage && page_keyid(page) != page_keyid(kpage))
+		goto out;
+
 	/*
 	 * We need the page lock to read a stable PageSwapCache in
 	 * write_protect_page().  We use trylock_page() instead of
-- 
2.20.1


^ permalink raw reply	[flat|nested] 153+ messages in thread
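
Illustration for 03/62 (not part of the patch): a minimal user-space
sketch of the merge rule described above. struct model_page and
model_can_merge() are hypothetical stand-ins. The real check lives in
try_to_merge_one_page() and compares page_keyid() values only, leaving
the content comparison to existing KSM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for struct page: plain-text contents plus KeyID. */
struct model_page {
	int keyid;
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Pages may be merged only if both the KeyID and the plain text match. */
static bool model_can_merge(const struct model_page *page,
			    const struct model_page *kpage)
{
	assert(page && kpage);

	/* Different keys: a merged page would not be readable by all users. */
	if (page->keyid != kpage->keyid)
		return false;

	return memcmp(page->data, kpage->data, MODEL_PAGE_SIZE) == 0;
}
```

Two pages with identical contents but KeyIDs 1 and 2 are rejected, even
though a plain-text comparison alone would have matched them.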

* [PATCH, RFC 04/62] mm/page_alloc: Unify alloc_hugepage_vma()
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 03/62] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 05/62] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
                   ` (59 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

We don't need separate implementations of alloc_hugepage_vma() for NUMA
and non-NUMA. A variant based on alloc_pages_vma() covers both cases.

This is a preparation patch for allocating encrypted pages.

alloc_pages_vma() will handle allocation of encrypted pages. With this
change we don't need to cover alloc_hugepage_vma() separately.

The change makes a typo in Alpha's implementation of
__alloc_zeroed_user_highpage() visible. Fix it too.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/alpha/include/asm/page.h | 2 +-
 include/linux/gfp.h           | 6 ++----
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h
index f3fb2848470a..9a6fbb5269f3 100644
--- a/arch/alpha/include/asm/page.h
+++ b/arch/alpha/include/asm/page.h
@@ -18,7 +18,7 @@ extern void clear_page(void *page);
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
-	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vmaddr)
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 extern void copy_page(void * _to, void * _from);
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fdab7de7490d..b101aa294157 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -511,21 +511,19 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
 extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
 			struct vm_area_struct *vma, unsigned long addr,
 			int node, bool hugepage);
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
-	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 #else
 #define alloc_pages(gfp_mask, order) \
 		alloc_pages_node(numa_node_id(), gfp_mask, order)
 #define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
 	alloc_pages(gfp_mask, order)
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
-	alloc_pages(gfp_mask, order)
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 #define alloc_page_vma(gfp_mask, vma, addr)			\
 	alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)
 #define alloc_page_vma_node(gfp_mask, vma, addr, node)		\
 	alloc_pages_vma(gfp_mask, 0, vma, addr, node, false)
+#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
+	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 
 extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
 extern unsigned long get_zeroed_page(gfp_t gfp_mask);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 153+ messages in thread

* [PATCH, RFC 05/62] mm/page_alloc: Handle allocation for encrypted memory
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 04/62] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-29  7:21   ` Mike Rapoport
  2019-05-08 14:43 ` [PATCH, RFC 06/62] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
                   ` (58 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

For encrypted memory, we need to allocate pages for a specific
encryption KeyID.

There are two cases when we need to allocate a page for encryption:

 - Allocation for an encrypted VMA;

 - Allocation for migration of an encrypted page.

The first case can be covered within alloc_page_vma(). We know the KeyID
from the VMA.

The second case requires a few new page allocation routines that
allocate the page for a specific KeyID.

An encrypted page has to be cleared after its KeyID is set. This is
handled in prep_encrypted_page(), which will be provided by
arch-specific code.

Any custom allocator that deals with encrypted pages has to call
prep_encrypted_page() too. See compaction_alloc() for instance.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/gfp.h     | 45 ++++++++++++++++++++++++++++++++-----
 include/linux/migrate.h | 14 +++++++++---
 mm/compaction.c         |  3 +++
 mm/mempolicy.c          | 27 ++++++++++++++++------
 mm/migrate.c            |  4 ++--
 mm/page_alloc.c         | 50 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 126 insertions(+), 17 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b101aa294157..1716dbe587c9 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -463,16 +463,43 @@ static inline void arch_free_page(struct page *page, int order) { }
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
 
+#ifndef prep_encrypted_page
+static inline void prep_encrypted_page(struct page *page, int order,
+		int keyid, bool zero)
+{
+}
+#endif
+
+/*
+ * Encrypted page has to be cleared once keyid is set, not on allocation.
+ */
+static inline bool deferred_page_zero(int keyid, gfp_t *gfp_mask)
+{
+	if (keyid && (*gfp_mask & __GFP_ZERO)) {
+		*gfp_mask &= ~__GFP_ZERO;
+		return true;
+	}
+
+	return false;
+}
+
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 							nodemask_t *nodemask);
 
+struct page *
+__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask, int keyid);
+
 static inline struct page *
 __alloc_pages(gfp_t gfp_mask, unsigned int order, int preferred_nid)
 {
 	return __alloc_pages_nodemask(gfp_mask, order, preferred_nid, NULL);
 }
 
+struct page *__alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order);
+
 /*
  * Allocate pages, preferring the node given as nid. The node must be valid and
  * online. For more general interface, see alloc_pages_node().
@@ -500,6 +527,19 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 	return __alloc_pages_node(nid, gfp_mask, order);
 }
 
+static inline struct page *alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order)
+{
+	if (nid == NUMA_NO_NODE)
+		nid = numa_mem_id();
+
+	return __alloc_pages_node_keyid(nid, keyid, gfp_mask, order);
+}
+
+extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
+			struct vm_area_struct *vma, unsigned long addr,
+			int node, bool hugepage);
+
 #ifdef CONFIG_NUMA
 extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
 
@@ -508,14 +548,9 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	return alloc_pages_current(gfp_mask, order);
 }
-extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
-			struct vm_area_struct *vma, unsigned long addr,
-			int node, bool hugepage);
 #else
 #define alloc_pages(gfp_mask, order) \
 		alloc_pages_node(numa_node_id(), gfp_mask, order)
-#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
-	alloc_pages(gfp_mask, order)
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 #define alloc_page_vma(gfp_mask, vma, addr)			\
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e13d9bf2f9a5..a6e068762d08 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -38,9 +38,16 @@ static inline struct page *new_page_nodemask(struct page *page,
 	unsigned int order = 0;
 	struct page *new_page = NULL;
 
-	if (PageHuge(page))
+	if (PageHuge(page)) {
+		/*
+		 * HugeTLB doesn't support encryption. We shouldn't see
+		 * such pages.
+		 */
+		if (WARN_ON_ONCE(page_keyid(page)))
+			return NULL;
 		return alloc_huge_page_nodemask(page_hstate(compound_head(page)),
 				preferred_nid, nodemask);
+	}
 
 	if (PageTransHuge(page)) {
 		gfp_mask |= GFP_TRANSHUGE;
@@ -50,8 +57,9 @@ static inline struct page *new_page_nodemask(struct page *page,
 	if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
 		gfp_mask |= __GFP_HIGHMEM;
 
-	new_page = __alloc_pages_nodemask(gfp_mask, order,
-				preferred_nid, nodemask);
+	/* Allocate a page with the same KeyID as the source page */
+	new_page = __alloc_pages_nodemask_keyid(gfp_mask, order,
+				preferred_nid, nodemask, page_keyid(page));
 
 	if (new_page && PageTransHuge(new_page))
 		prep_transhuge_page(new_page);
diff --git a/mm/compaction.c b/mm/compaction.c
index 3319e0872d01..559b8bd6d245 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1557,6 +1557,9 @@ static struct page *compaction_alloc(struct page *migratepage,
 	list_del(&freepage->lru);
 	cc->nr_freepages--;
 
+	/* Prepare the page using the same KeyID as the source page */
+	if (freepage)
+		prep_encrypted_page(freepage, 0, page_keyid(migratepage), false);
 	return freepage;
 }
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 14b18449c623..5cad39fb7b35 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -961,22 +961,29 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
 /* page allocation callback for NUMA node migration */
 struct page *alloc_new_node_page(struct page *page, unsigned long node)
 {
-	if (PageHuge(page))
+	if (PageHuge(page)) {
+		/*
+		 * HugeTLB doesn't support encryption. We shouldn't see
+		 * such pages.
+		 */
+		if (WARN_ON_ONCE(page_keyid(page)))
+			return NULL;
 		return alloc_huge_page_node(page_hstate(compound_head(page)),
 					node);
-	else if (PageTransHuge(page)) {
+	} else if (PageTransHuge(page)) {
 		struct page *thp;
 
-		thp = alloc_pages_node(node,
+		thp = alloc_pages_node_keyid(node, page_keyid(page),
 			(GFP_TRANSHUGE | __GFP_THISNODE),
 			HPAGE_PMD_ORDER);
 		if (!thp)
 			return NULL;
 		prep_transhuge_page(thp);
 		return thp;
-	} else
-		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
-						    __GFP_THISNODE, 0);
+	} else {
+		return __alloc_pages_node_keyid(node, page_keyid(page),
+				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
+	}
 }
 
 /*
@@ -2053,9 +2060,13 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 {
 	struct mempolicy *pol;
 	struct page *page;
-	int preferred_nid;
+	bool deferred_zero;
+	int keyid, preferred_nid;
 	nodemask_t *nmask;
 
+	keyid = vma_keyid(vma);
+	deferred_zero = deferred_page_zero(keyid, &gfp);
+
 	pol = get_vma_policy(vma, addr);
 
 	if (pol->mode == MPOL_INTERLEAVE) {
@@ -2097,6 +2108,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 	page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
 	mpol_cond_put(pol);
 out:
+	if (page)
+		prep_encrypted_page(page, order, keyid, deferred_zero);
 	return page;
 }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 663a5449367a..04b36a56865d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1880,7 +1880,7 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 	int nid = (int) data;
 	struct page *newpage;
 
-	newpage = __alloc_pages_node(nid,
+	newpage = __alloc_pages_node_keyid(nid, page_keyid(page),
 					 (GFP_HIGHUSER_MOVABLE |
 					  __GFP_THISNODE | __GFP_NOMEMALLOC |
 					  __GFP_NORETRY | __GFP_NOWARN) &
@@ -2006,7 +2006,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	int page_lru = page_is_file_cache(page);
 	unsigned long start = address & HPAGE_PMD_MASK;
 
-	new_page = alloc_pages_node(node,
+	new_page = alloc_pages_node_keyid(node, page_keyid(page),
 		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
 		HPAGE_PMD_ORDER);
 	if (!new_page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c02cff1ed56e..ab1d8661aa87 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3930,6 +3930,41 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
 }
 #endif /* CONFIG_COMPACTION */
 
+#ifndef CONFIG_NUMA
+struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
+		struct vm_area_struct *vma, unsigned long addr,
+		int node, bool hugepage)
+{
+	struct page *page;
+	bool deferred_zero;
+	int keyid = vma_keyid(vma);
+
+	deferred_zero = deferred_page_zero(keyid, &gfp_mask);
+	page = alloc_pages(gfp_mask, order);
+	if (page)
+		prep_encrypted_page(page, order, keyid, deferred_zero);
+
+	return page;
+}
+#endif
+
+struct page * __alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order)
+{
+	struct page *page;
+	bool deferred_zero;
+
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+	VM_WARN_ON(!node_online(nid));
+
+	deferred_zero = deferred_page_zero(keyid, &gfp_mask);
+	page = __alloc_pages(gfp_mask, order, nid);
+	if (page)
+		prep_encrypted_page(page, order, keyid, deferred_zero);
+
+	return page;
+}
+
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map __fs_reclaim_map =
 	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
@@ -4645,6 +4680,21 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
 
+struct page *
+__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask, int keyid)
+{
+	struct page *page;
+	bool deferred_zero;
+
+	deferred_zero = deferred_page_zero(keyid, &gfp_mask);
+	page = __alloc_pages_nodemask(gfp_mask, order, preferred_nid, nodemask);
+	if (page)
+		prep_encrypted_page(page, order, keyid, deferred_zero);
+	return page;
+}
+EXPORT_SYMBOL(__alloc_pages_nodemask_keyid);
+
 /*
  * Common helper functions. Never use with __GFP_HIGHMEM because the returned
  * address cannot represent highmem pages. Use alloc_pages and then kmap if
-- 
2.20.1


^ permalink raw reply	[flat|nested] 153+ messages in thread
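
Illustration for 05/62 (not part of the patch): a minimal user-space
sketch of the deferred zeroing order described above. The flag value,
the tiny struct model_page, and the body of
model_prep_encrypted_page() are assumptions made for this example. The
patch only provides an empty generic prep_encrypted_page() and leaves
the real work to arch code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int model_gfp_t;
#define MODEL_GFP_ZERO 0x100u	/* hypothetical flag value */

/* Hypothetical stand-in for struct page with a small payload. */
struct model_page {
	int keyid;
	unsigned char data[64];
};

/*
 * For a non-zero KeyID, strip the zeroing request from the allocation
 * flags and report that zeroing must happen later.
 */
static bool model_deferred_page_zero(int keyid, model_gfp_t *gfp_mask)
{
	assert(gfp_mask);

	if (keyid && (*gfp_mask & MODEL_GFP_ZERO)) {
		*gfp_mask &= ~MODEL_GFP_ZERO;
		return true;
	}

	return false;
}

/* Assumed behaviour: set the KeyID first, clear the contents afterwards. */
static void model_prep_encrypted_page(struct model_page *page, int keyid,
				      bool zero)
{
	page->keyid = keyid;
	if (zero)
		memset(page->data, 0, sizeof(page->data));
}
```

The ordering matters in this model for the reason the commit message
gives: clearing through KeyID-0 and then switching keys would not leave
zeroes visible through the new key.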

* [PATCH, RFC 06/62] mm/khugepaged: Handle encrypted pages
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 05/62] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 07/62] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
                   ` (57 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

For !NUMA, khugepaged allocates the page in advance, before we have
found a VMA to collapse. We don't yet know which KeyID to use for the
allocation.

The page is allocated with KeyID-0. Once we know that the VMA is
suitable for collapsing, we prepare the page for the KeyID we need,
based on vma_keyid().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/khugepaged.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 449044378782..96326a7e9d61 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1055,6 +1055,16 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	anon_vma_unlock_write(vma->anon_vma);
 
+	/*
+	 * At this point new_page is allocated as non-encrypted.
+	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
+	 * before coping data.
+	 */
+	if (vma_keyid(vma)) {
+		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
+				vma_keyid(vma), false);
+	}
+
 	__collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl);
 	pte_unmap(pte);
 	__SetPageUptodate(new_page);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 153+ messages in thread

* [PATCH, RFC 07/62] x86/mm: Mask out KeyID bits from page table entry pfn
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 06/62] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 08/62] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
                   ` (56 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

MKTME claims several upper bits of the physical address in a page table
entry to encode the KeyID. This effectively shrinks the number of bits
available for the physical address. We should exclude the KeyID bits
from physical addresses.

For instance, if the CPU enumerates 52 physical address bits and the
number of bits claimed for KeyID is 6, bits 51:46 must not be treated as
part of the physical address.

This patch adjusts __PHYSICAL_MASK during MKTME enumeration.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/intel.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 3142fd7a9b32..5dfecc9c2253 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -589,6 +589,29 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		mktme_status = MKTME_ENABLED;
 	}
 
+#ifdef CONFIG_X86_INTEL_MKTME
+	if (mktme_status == MKTME_ENABLED && nr_keyids) {
+		/*
+		 * Mask out bits claimed from KeyID from physical address mask.
+		 *
+		 * For instance, if a CPU enumerates 52 physical address bits
+		 * and number of bits claimed for KeyID is 6, bits 51:46 of
+		 * physical address is unusable.
+		 */
+		phys_addr_t keyid_mask;
+
+		keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
+		physical_mask &= ~keyid_mask;
+	} else {
+		/*
+		 * Reset __PHYSICAL_MASK.
+		 * Maybe needed if there's inconsistent configuation
+		 * between CPUs.
+		 */
+		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+	}
+#endif
+
 	/*
 	 * KeyID bits effectively lower the number of physical address
 	 * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
-- 
2.20.1


^ permalink raw reply	[flat|nested] 153+ messages in thread
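
Illustration for 07/62 (not part of the patch): a minimal user-space
sketch of the mask arithmetic from the example above (52 physical
address bits, 6 KeyID bits). model_genmask() is a hypothetical analogue
of the kernel's GENMASK_ULL(), and the other model_* names are invented
for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Contiguous mask with bits high..low set. */
static uint64_t model_genmask(unsigned int high, unsigned int low)
{
	assert(low <= high && high < 64);
	return (~UINT64_C(0) >> (63 - high)) & (~UINT64_C(0) << low);
}

/* Bits claimed for the KeyID: the top keyid_bits of the physical address. */
static uint64_t model_keyid_mask(unsigned int phys_bits,
				 unsigned int keyid_bits)
{
	assert(keyid_bits > 0 && keyid_bits < phys_bits);
	return model_genmask(phys_bits - 1, phys_bits - keyid_bits);
}

/* Physical address mask with the KeyID bits removed. */
static uint64_t model_physical_mask(unsigned int phys_bits,
				    unsigned int keyid_bits)
{
	return model_genmask(phys_bits - 1, 0) &
	       ~model_keyid_mask(phys_bits, keyid_bits);
}
```

For 52 and 6, the KeyID mask is 0x000fc00000000000 (bits 51:46) and the
remaining physical mask is 0x00003fffffffffff (bits 45:0).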

* [PATCH, RFC 08/62] x86/mm: Introduce variables to store number, shift and mask of KeyIDs
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 07/62] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 09/62] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
                   ` (55 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

mktme_nr_keyids holds the number of KeyIDs available for MKTME,
excluding KeyID zero, which is used by TME. MKTME KeyIDs start from 1.

mktme_keyid_shift holds the shift of the KeyID within a physical address.

mktme_keyid_mask holds the mask to extract the KeyID from a physical
address.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 16 ++++++++++++++++
 arch/x86/kernel/cpu/intel.c  | 16 ++++++++++++----
 arch/x86/mm/Makefile         |  2 ++
 arch/x86/mm/mktme.c          | 11 +++++++++++
 4 files changed, 41 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/include/asm/mktme.h
 create mode 100644 arch/x86/mm/mktme.c

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
new file mode 100644
index 000000000000..df31876ec48c
--- /dev/null
+++ b/arch/x86/include/asm/mktme.h
@@ -0,0 +1,16 @@
+#ifndef	_ASM_X86_MKTME_H
+#define	_ASM_X86_MKTME_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_X86_INTEL_MKTME
+extern phys_addr_t mktme_keyid_mask;
+extern int mktme_nr_keyids;
+extern int mktme_keyid_shift;
+#else
+#define mktme_keyid_mask	((phys_addr_t)0)
+#define mktme_nr_keyids		0
+#define mktme_keyid_shift	0
+#endif
+
+#endif
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 5dfecc9c2253..e271264e238a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -591,6 +591,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
 
 #ifdef CONFIG_X86_INTEL_MKTME
 	if (mktme_status == MKTME_ENABLED && nr_keyids) {
+		mktme_nr_keyids = nr_keyids;
+		mktme_keyid_shift = c->x86_phys_bits - keyid_bits;
+
 		/*
 		 * Mask out bits claimed from KeyID from physical address mask.
 		 *
@@ -598,17 +601,22 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		 * and number of bits claimed for KeyID is 6, bits 51:46 of
 		 * physical address is unusable.
 		 */
-		phys_addr_t keyid_mask;
-
-		keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
-		physical_mask &= ~keyid_mask;
+		mktme_keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, mktme_keyid_shift);
+		physical_mask &= ~mktme_keyid_mask;
 	} else {
 		/*
 		 * Reset __PHYSICAL_MASK.
 		 * Maybe needed if there's inconsistent configuation
 		 * between CPUs.
+		 *
+		 * FIXME: broken for hotplug.
+		 * We must not allow onlining secondary CPUs with non-matching
+		 * configuration.
 		 */
 		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+		mktme_keyid_mask = 0;
+		mktme_keyid_shift = 0;
+		mktme_nr_keyids = 0;
 	}
 #endif
 
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..4ebee899c363 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= pti.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
+
+obj-$(CONFIG_X86_INTEL_MKTME)	+= mktme.o
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
new file mode 100644
index 000000000000..91a415612519
--- /dev/null
+++ b/arch/x86/mm/mktme.c
@@ -0,0 +1,11 @@
+#include <asm/mktme.h>
+
+/* Mask to extract KeyID from physical address. */
+phys_addr_t mktme_keyid_mask;
+/*
+ * Number of KeyIDs available for MKTME.
+ * Excludes KeyID-0 which used by TME. MKTME KeyIDs start from 1.
+ */
+int mktme_nr_keyids;
+/* Shift of KeyID within physical address. */
+int mktme_keyid_shift;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 153+ messages in thread
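
Illustration for 08/62 (not part of the patch): a minimal user-space
sketch of how a shift and mask pair like the one introduced above can be
used to read or set the KeyID in a physical address. struct model_mktme
and the helper names are hypothetical, and the nr_keyids value passed in
is an assumption. On real hardware it is enumerated, not derived from
the bit count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bundle of the three values the patch keeps as globals. */
struct model_mktme {
	uint64_t keyid_mask;	/* mask to extract KeyID from an address */
	int keyid_shift;	/* position of KeyID within an address */
	int nr_keyids;		/* usable KeyIDs, excluding KeyID-0 */
};

static struct model_mktme model_mktme_init(int phys_bits, int keyid_bits,
					   int nr_keyids)
{
	struct model_mktme m;

	assert(keyid_bits > 0 && keyid_bits < phys_bits && phys_bits <= 52);
	m.nr_keyids = nr_keyids;
	m.keyid_shift = phys_bits - keyid_bits;
	m.keyid_mask = ((UINT64_C(1) << keyid_bits) - 1) << m.keyid_shift;
	return m;
}

/* Read the KeyID encoded in a physical address. */
static int model_paddr_keyid(const struct model_mktme *m, uint64_t paddr)
{
	return (int)((paddr & m->keyid_mask) >> m->keyid_shift);
}

/* Return paddr with its KeyID field replaced by keyid. */
static uint64_t model_paddr_set_keyid(const struct model_mktme *m,
				      uint64_t paddr, int keyid)
{
	assert(keyid >= 0 && keyid <= m->nr_keyids);
	return (paddr & ~m->keyid_mask) |
	       ((uint64_t)keyid << m->keyid_shift);
}
```

With 52 physical address bits and 6 KeyID bits, the shift is 46, and
setting KeyID 5 on address 0x1000 gives an address from which
model_paddr_keyid() reads back 5.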

* [PATCH, RFC 09/62] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 08/62] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-06-14  9:15   ` Peter Zijlstra
  2019-05-08 14:43 ` [PATCH, RFC 10/62] x86/mm: Detect MKTME early Kirill A. Shutemov
                   ` (54 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

An encrypted VMA will have its KeyID stored in vma->vm_page_prot. This
way we don't need to do anything special to set up encrypted page table
entries and don't need to reserve space for the KeyID in a VMA.

This patch changes _PAGE_CHG_MASK to include the KeyID bits. Otherwise
they are going to be stripped from vm_page_prot on the first
pgprot_modify().

Define PTE_PFN_MASK_MAX similar to PTE_PFN_MASK but based on
__PHYSICAL_MASK_SHIFT. This way we include the whole range of bits
architecturally available for PFN without referencing the physical_mask
and mktme_keyid_mask variables.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/pgtable_types.h | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index d6ff0bbdb394..7d6f68431538 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -117,12 +117,25 @@
 				 _PAGE_ACCESSED | _PAGE_DIRTY)
 
 /*
- * Set of bits not changed in pte_modify.  The pte's
- * protection key is treated like _PAGE_RW, for
- * instance, and is *not* included in this mask since
- * pte_modify() does modify it.
+ * Set of bits not changed in pte_modify.
+ *
+ * The pte's protection key is treated like _PAGE_RW, for instance, and is
+ * *not* included in this mask since pte_modify() does modify it.
+ *
+ * They include the physical address and the memory encryption keyID.
+ * The paddr and the keyID never occupy the same bits at the same time.
+ * But, a given bit might be used for the keyID on one system and used for
+ * the physical address on another. As an optimization, we manage them in
+ * one unit here since their combination always occupies the same hardware
+ * bits. PTE_PFN_MASK_MAX stores combined mask.
+ *
+ * Cast PAGE_MASK to a signed type so that it is sign-extended if
+ * virtual addresses are 32-bits but physical addresses are larger
+ * (ie, 32-bit PAE).
  */
-#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
+#define PTE_PFN_MASK_MAX \
+	(((signed long)PAGE_MASK) & ((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
+#define _PAGE_CHG_MASK	(PTE_PFN_MASK_MAX | _PAGE_PCD | _PAGE_PWT |		\
 			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
 			 _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
 #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
-- 
2.20.1



* [PATCH, RFC 10/62] x86/mm: Detect MKTME early
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (8 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 09/62] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 11/62] x86/mm: Add a helper to retrieve KeyID for a page Kirill A. Shutemov
                   ` (53 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

We need to know the number of KeyIDs before page_ext is initialized.
We are going to use page_ext to store KeyID and it would be handy to
avoid page_ext allocation if there's no MKTME in the system.

page_ext initialization happens before full CPU initialization is complete.
Move detect_tme() call to early_init_intel().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/intel.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index e271264e238a..4c9fadb57a13 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -161,6 +161,8 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
 	return false;
 }
 
+static void detect_tme(struct cpuinfo_x86 *c);
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;
@@ -311,6 +313,9 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	 */
 	if (detect_extended_topology_early(c) < 0)
 		detect_ht_early(c);
+
+	if (cpu_has(c, X86_FEATURE_TME))
+		detect_tme(c);
 }
 
 #ifdef CONFIG_X86_32
@@ -791,9 +796,6 @@ static void init_intel(struct cpuinfo_x86 *c)
 	if (cpu_has(c, X86_FEATURE_VMX))
 		detect_vmx_virtcap(c);
 
-	if (cpu_has(c, X86_FEATURE_TME))
-		detect_tme(c);
-
 	init_intel_energy_perf(c);
 
 	init_intel_misc_features(c);
-- 
2.20.1



* [PATCH, RFC 11/62] x86/mm: Add a helper to retrieve KeyID for a page
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (9 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 10/62] x86/mm: Detect MKTME early Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 12/62] x86/mm: Add a helper to retrieve KeyID for a VMA Kirill A. Shutemov
                   ` (52 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

page_ext allows storing additional per-page information without growing
the main struct page. The additional space can be requested at boot time.

Store KeyID in bits 31:16 of extended page flags. These bits are unused.
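
To make the bit layout concrete, here is a small user-space sketch that
packs and unpacks a KeyID in bits 31:16 of a 32-bit flags word with
explicit shifts and masks. It only models the layout; the ex_* names
are hypothetical and the patch itself may express the same layout
differently.

```c
#include <assert.h>
#include <stdint.h>

#define EX_KEYID_SHIFT	16
#define EX_KEYID_MASK	(UINT32_C(0xffff) << EX_KEYID_SHIFT)

/* Return flags with the KeyID field (bits 31:16) replaced by keyid. */
static inline uint32_t ex_flags_set_keyid(uint32_t flags, unsigned int keyid)
{
	assert(keyid <= 0xffff);
	return (flags & ~EX_KEYID_MASK) | ((uint32_t)keyid << EX_KEYID_SHIFT);
}

/* Extract the KeyID field; the low 16 flag bits are ignored. */
static inline unsigned int ex_flags_get_keyid(uint32_t flags)
{
	return (flags & EX_KEYID_MASK) >> EX_KEYID_SHIFT;
}
```

For example, storing KeyID 7 in a word whose low flag bits are 0x3
yields 0x00070003, and the flag bits stay untouched.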

page_keyid() returns zero until page_ext is ready. page_ext initializer
enables a static branch to indicate that page_keyid() can use page_ext.
The same static branch will gate MKTME readiness in general.

We don't yet set KeyID for the page. It will come in the following
patch that implements prep_encrypted_page(). All pages have KeyID-0 for
now.

page_keyid() will be used by KVM which can be built as a module. We need
to export mktme_enabled_key to be able to inline page_keyid().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 28 ++++++++++++++++++++++++++++
 arch/x86/include/asm/page.h  |  1 +
 arch/x86/mm/mktme.c          | 21 +++++++++++++++++++++
 include/linux/mm.h           |  2 +-
 include/linux/page_ext.h     | 11 ++++++++++-
 mm/page_ext.c                |  3 +++
 6 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index df31876ec48c..51f831b94179 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -2,15 +2,43 @@
 #define	_ASM_X86_MKTME_H
 
 #include <linux/types.h>
+#include <linux/page_ext.h>
+#include <linux/jump_label.h>
 
 #ifdef CONFIG_X86_INTEL_MKTME
 extern phys_addr_t mktme_keyid_mask;
 extern int mktme_nr_keyids;
 extern int mktme_keyid_shift;
+
+DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
+static inline bool mktme_enabled(void)
+{
+	return static_branch_unlikely(&mktme_enabled_key);
+}
+
+extern struct page_ext_operations page_mktme_ops;
+
+#define page_keyid page_keyid
+static inline int page_keyid(const struct page *page)
+{
+	if (!mktme_enabled())
+		return 0;
+
+	return lookup_page_ext(page)->keyid;
+}
+
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
 #define mktme_keyid_shift	0
+
+#define page_keyid(page) 0
+
+static inline bool mktme_enabled(void)
+{
+	return false;
+}
 #endif
 
 #endif
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 7555b48803a8..39af59487d5f 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -19,6 +19,7 @@
 struct page;
 
 #include <linux/range.h>
+#include <asm/mktme.h>
 extern struct range pfn_mapped[];
 extern int nr_pfn_mapped;
 
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 91a415612519..9dc256e3654b 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -9,3 +9,24 @@ phys_addr_t mktme_keyid_mask;
 int mktme_nr_keyids;
 /* Shift of KeyID within physical address. */
 int mktme_keyid_shift;
+
+DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
+EXPORT_SYMBOL_GPL(mktme_enabled_key);
+
+static bool need_page_mktme(void)
+{
+	/* Make sure keyid doesn't collide with extended page flags */
+	BUILD_BUG_ON(__NR_PAGE_EXT_FLAGS > 16);
+
+	return !!mktme_nr_keyids;
+}
+
+static void init_page_mktme(void)
+{
+	static_branch_enable(&mktme_enabled_key);
+}
+
+struct page_ext_operations page_mktme_ops = {
+	.need = need_page_mktme,
+	.init = init_page_mktme,
+};
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 07c36f4673f6..2684245f8503 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1607,7 +1607,7 @@ static inline int vma_keyid(struct vm_area_struct *vma)
 #endif
 
 #ifndef page_keyid
-static inline int page_keyid(struct page *page)
+static inline int page_keyid(const struct page *page)
 {
 	return 0;
 }
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index f84f167ec04c..d9c5aae9523f 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -23,6 +23,7 @@ enum page_ext_flags {
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
 #endif
+	__NR_PAGE_EXT_FLAGS
 };
 
 /*
@@ -33,7 +34,15 @@ enum page_ext_flags {
  * then the page_ext for pfn always exists.
  */
 struct page_ext {
-	unsigned long flags;
+	union {
+		unsigned long flags;
+#ifdef CONFIG_X86_INTEL_MKTME
+		struct {
+			unsigned short __pad;
+			unsigned short keyid;
+		};
+#endif
+	};
 };
 
 extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/mm/page_ext.c b/mm/page_ext.c
index d8f1aca4ad43..1af8b82087f2 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -68,6 +68,9 @@ static struct page_ext_operations *page_ext_ops[] = {
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_X86_INTEL_MKTME
+	&page_mktme_ops,
+#endif
 };
 
 static unsigned long total_usage;
-- 
2.20.1



* [PATCH, RFC 12/62] x86/mm: Add a helper to retrieve KeyID for a VMA
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (10 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 11/62] x86/mm: Add a helper to retrieve KeyID for a page Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages Kirill A. Shutemov
                   ` (51 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

We store the KeyID in the upper bits of vm_page_prot, matching the
position of the KeyID in a PTE. vma_keyid() extracts the KeyID from
vm_page_prot.

With the KeyID in vm_page_prot we don't need to modify any page table
helper to propagate the KeyID to page table entries.
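
The extraction is a mask followed by a shift. A minimal user-space
sketch, assuming a hypothetical 6-bit KeyID field starting at bit 46
(the real mask and shift are enumerated at boot), with ex_* names
invented for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 6 KeyID bits starting at physical address bit 46. */
#define EX_KEYID_SHIFT	46
#define EX_KEYID_MASK	(UINT64_C(0x3f) << EX_KEYID_SHIFT)

/* Extract the KeyID from a protection value. */
static inline int ex_keyid_from_prot(uint64_t prot)
{
	return (int)((prot & EX_KEYID_MASK) >> EX_KEYID_SHIFT);
}

/* Return prot with its KeyID field replaced by keyid. */
static inline uint64_t ex_prot_with_keyid(uint64_t prot, int keyid)
{
	assert(keyid >= 0 && keyid <= 0x3f);
	return (prot & ~EX_KEYID_MASK) | ((uint64_t)keyid << EX_KEYID_SHIFT);
}
```

Because the field sits at the same position as in a PTE, a protection
value built this way can be handed to the usual PTE constructors
unchanged.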

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 12 ++++++++++++
 arch/x86/mm/mktme.c          |  7 +++++++
 2 files changed, 19 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 51f831b94179..b5afa31b4526 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -5,6 +5,8 @@
 #include <linux/page_ext.h>
 #include <linux/jump_label.h>
 
+struct vm_area_struct;
+
 #ifdef CONFIG_X86_INTEL_MKTME
 extern phys_addr_t mktme_keyid_mask;
 extern int mktme_nr_keyids;
@@ -28,6 +30,16 @@ static inline int page_keyid(const struct page *page)
 }
 
 
+#define vma_keyid vma_keyid
+int __vma_keyid(struct vm_area_struct *vma);
+static inline int vma_keyid(struct vm_area_struct *vma)
+{
+	if (!mktme_enabled())
+		return 0;
+
+	return __vma_keyid(vma);
+}
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 9dc256e3654b..d4a1a9e9b1c0 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,3 +1,4 @@
+#include <linux/mm.h>
 #include <asm/mktme.h>
 
 /* Mask to extract KeyID from physical address. */
@@ -30,3 +31,9 @@ struct page_ext_operations page_mktme_ops = {
 	.need = need_page_mktme,
 	.init = init_page_mktme,
 };
+
+int __vma_keyid(struct vm_area_struct *vma)
+{
+	pgprotval_t prot = pgprot_val(vma->vm_page_prot);
+	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
+}
-- 
2.20.1



* [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (11 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 12/62] x86/mm: Add a helper to retrieve KeyID for a VMA Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-06-14  9:34   ` Peter Zijlstra
  2019-05-08 14:43 ` [PATCH, RFC 14/62] x86/mm: Map zero pages into encrypted mappings correctly Kirill A. Shutemov
                   ` (50 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Hook into the page allocator to allocate and free encrypted pages
properly.

The hardware/CPU does not enforce coherency between mappings of the same
physical page with different KeyIDs or encryption keys.
We are responsible for cache management.

Flush the cache when allocating an encrypted page and when returning
the page to the free pool.

prep_encrypted_page() also takes care of zeroing the page. We have to
do this after the KeyID is set for the page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 17 +++++++++++++
 arch/x86/mm/mktme.c          | 49 ++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index b5afa31b4526..6e604126f0bc 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -40,6 +40,23 @@ static inline int vma_keyid(struct vm_area_struct *vma)
 	return __vma_keyid(vma);
 }
 
+#define prep_encrypted_page prep_encrypted_page
+void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero);
+static inline void prep_encrypted_page(struct page *page, int order,
+		int keyid, bool zero)
+{
+	if (keyid)
+		__prep_encrypted_page(page, order, keyid, zero);
+}
+
+#define HAVE_ARCH_FREE_PAGE
+void free_encrypted_page(struct page *page, int order);
+static inline void arch_free_page(struct page *page, int order)
+{
+	if (page_keyid(page))
+		free_encrypted_page(page, order);
+}
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index d4a1a9e9b1c0..43489c098e60 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,4 +1,5 @@
 #include <linux/mm.h>
+#include <linux/highmem.h>
 #include <asm/mktme.h>
 
 /* Mask to extract KeyID from physical address. */
@@ -37,3 +38,51 @@ int __vma_keyid(struct vm_area_struct *vma)
 	pgprotval_t prot = pgprot_val(vma->vm_page_prot);
 	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
 }
+
+/* Prepare page to be used for encryption. Called from page allocator. */
+void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
+{
+	int i;
+
+	/*
+	 * The hardware/CPU does not enforce coherency between mappings
+	 * of the same physical page with different KeyIDs or
+	 * encryption keys. We are responsible for cache management.
+	 */
+	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
+
+	for (i = 0; i < (1 << order); i++) {
+		/* All pages coming out of the allocator should have KeyID 0 */
+		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
+		lookup_page_ext(page)->keyid = keyid;
+
+		/* Clear the page after the KeyID is set. */
+		if (zero)
+			clear_highpage(page);
+
+		page++;
+	}
+}
+
+/*
+ * Handles freeing of encrypted page.
+ * Called from page allocator on freeing encrypted page.
+ */
+void free_encrypted_page(struct page *page, int order)
+{
+	int i;
+
+	/*
+	 * The hardware/CPU does not enforce coherency between mappings
+	 * of the same physical page with different KeyIDs or
+	 * encryption keys. We are responsible for cache management.
+	 */
+	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
+
+	for (i = 0; i < (1 << order); i++) {
+		/* Check if the page has reasonable KeyID */
+		WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids);
+		lookup_page_ext(page)->keyid = 0;
+		page++;
+	}
+}
-- 
2.20.1



* [PATCH, RFC 14/62] x86/mm: Map zero pages into encrypted mappings correctly
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (12 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 15/62] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
                   ` (49 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Zero pages are never encrypted. Keep KeyID-0 for them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/pgtable.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 50b3e2d963c9..59c3dd50b8d5 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -803,6 +803,19 @@ static inline unsigned long pmd_index(unsigned long address)
  */
 #define mk_pte(page, pgprot)   pfn_pte(page_to_pfn(page), (pgprot))
 
+#define mk_zero_pte mk_zero_pte
+static inline pte_t mk_zero_pte(unsigned long addr, pgprot_t prot)
+{
+	extern unsigned long zero_pfn;
+	pte_t entry;
+
+	prot.pgprot &= ~mktme_keyid_mask;
+	entry = pfn_pte(zero_pfn, prot);
+	entry = pte_mkspecial(entry);
+
+	return entry;
+}
+
 /*
  * the pte page can be thought of an array like this: pte_t[PTRS_PER_PTE]
  *
@@ -1133,6 +1146,12 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
 
 #define mk_pmd(page, pgprot)   pfn_pmd(page_to_pfn(page), (pgprot))
 
+#define mk_zero_pmd(zero_page, prot)					\
+({									\
+	prot.pgprot &= ~mktme_keyid_mask;				\
+	pmd_mkhuge(mk_pmd(zero_page, prot));				\
+})
+
 #define  __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
 extern int pmdp_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pmd_t *pmdp,
-- 
2.20.1



* [PATCH, RFC 15/62] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (13 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 14/62] x86/mm: Map zero pages into encrypted mappings correctly Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 16/62] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
                   ` (48 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Rename the option to CONFIG_MEMORY_PHYSICAL_PADDING. It will be used
not only for KASLR.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/Kconfig    | 2 +-
 arch/x86/mm/kaslr.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 62fc3fda1a05..62cfb381fee3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2201,7 +2201,7 @@ config RANDOMIZE_MEMORY
 
 	   If unsure, say Y.
 
-config RANDOMIZE_MEMORY_PHYSICAL_PADDING
+config MEMORY_PHYSICAL_PADDING
 	hex "Physical memory mapping padding" if EXPERT
 	depends on RANDOMIZE_MEMORY
 	default "0xa" if MEMORY_HOTPLUG
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index d669c5e797e0..2228cc7d6b42 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -103,7 +103,7 @@ void __init kernel_randomize_memory(void)
 	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
 	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+		CONFIG_MEMORY_PHYSICAL_PADDING;
 
 	/* Adapt phyiscal memory region size based on available memory */
 	if (memory_tb < kaslr_regions[0].size_tb)
-- 
2.20.1



* [PATCH, RFC 16/62] x86/mm: Allow to disable MKTME after enumeration
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (14 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 15/62] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 17/62] x86/mm: Calculate direct mapping size Kirill A. Shutemov
                   ` (47 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

The new helper mktme_disable() allows disabling MKTME even if it has
been enumerated successfully. MKTME initialization may fail, and this
functionality allows the system to boot regardless of the failure.

MKTME needs a per-KeyID direct mapping. This requires a lot more virtual
address space, which may be a problem in 4-level paging mode. If the
system has more physical memory than we can handle with MKTME, the
helper lets us disable MKTME and still boot the system successfully.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  5 +++++
 arch/x86/kernel/cpu/intel.c  |  5 +----
 arch/x86/mm/mktme.c          | 10 ++++++++++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 6e604126f0bc..454d6d7c791d 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -18,6 +18,8 @@ static inline bool mktme_enabled(void)
 	return static_branch_unlikely(&mktme_enabled_key);
 }
 
+void mktme_disable(void);
+
 extern struct page_ext_operations page_mktme_ops;
 
 #define page_keyid page_keyid
@@ -68,6 +70,9 @@ static inline bool mktme_enabled(void)
 {
 	return false;
 }
+
+static inline void mktme_disable(void) {}
+
 #endif
 
 #endif
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4c9fadb57a13..f402a74c00a1 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -618,10 +618,7 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		 * We must not allow onlining secondary CPUs with non-matching
 		 * configuration.
 		 */
-		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
-		mktme_keyid_mask = 0;
-		mktme_keyid_shift = 0;
-		mktme_nr_keyids = 0;
+		mktme_disable();
 	}
 #endif
 
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 43489c098e60..9221c894e8e9 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -15,6 +15,16 @@ int mktme_keyid_shift;
 DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
 EXPORT_SYMBOL_GPL(mktme_enabled_key);
 
+void mktme_disable(void)
+{
+	physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+	mktme_keyid_mask = 0;
+	mktme_keyid_shift = 0;
+	mktme_nr_keyids = 0;
+	if (mktme_enabled())
+		static_branch_disable(&mktme_enabled_key);
+}
+
 static bool need_page_mktme(void)
 {
 	/* Make sure keyid doesn't collide with extended page flags */
-- 
2.20.1



* [PATCH, RFC 17/62] x86/mm: Calculate direct mapping size
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (15 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 16/62] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 18/62] x86/mm: Implement syncing per-KeyID direct mappings Kirill A. Shutemov
                   ` (46 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

The kernel needs to have a way to access encrypted memory. We have two
options for how to approach it:

 - Create temporary mappings every time the kernel needs access to
   encrypted memory. That basically brings highmem and its overhead back.

 - Create multiple direct mappings, one per-KeyID. In this setup we
   don't need to create temporary mappings on the fly -- encrypted
   memory is permanently available in kernel address space.

We take the second approach as it has lower overhead.

It's worth noting that with per-KeyID direct mappings a compromised
kernel would give access to decrypted data right away, without
additional tricks to get memory mapped with the correct KeyID.

Per-KeyID mappings require a lot more virtual address space. On a 4-level
machine with 64 KeyIDs we max out the 46-bit virtual address space
dedicated to the direct mapping with 1TiB of RAM. Given that we round up
any calculation of the direct mapping size to 1TiB, we effectively claim
all 46-bit address space for the direct mapping on such a machine
regardless of RAM size.
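
The arithmetic behind this can be checked with a short user-space
sketch. It assumes a virtual mask shift of 47 for 4-level and 56 for
5-level paging, takes half of that range as the space available for
direct mappings, and counts 64 mappings in total (treating KeyID-0 as
one of them); the ex_* names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_TIB	(UINT64_C(1) << 40)

/*
 * Virtual address space assumed to be available for direct mappings,
 * modelled as 1 << (virtual_mask_shift - 1).
 */
static inline uint64_t ex_available_va(unsigned int virtual_mask_shift)
{
	assert(virtual_mask_shift >= 1 && virtual_mask_shift <= 63);
	return UINT64_C(1) << (virtual_mask_shift - 1);
}

/* True if nr_mappings mappings of map_size bytes each fit. */
static inline bool ex_mappings_fit(uint64_t map_size,
				   unsigned int nr_mappings,
				   unsigned int virtual_mask_shift)
{
	assert(nr_mappings > 0);
	return map_size <= ex_available_va(virtual_mask_shift) / nr_mappings;
}
```

Under these assumptions 64 mappings of 1TiB use exactly the 64TiB
(2^46 bytes) available with 4-level paging, 2TiB per mapping no longer
fits, and the same 2TiB case fits comfortably with 5-level paging.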

Increased usage of virtual address space has implications for KASLR:
we have less space for randomization. With 64 TiB claimed for the direct
mapping with 4-level paging, we are left with 27 TiB of entropy to place
page_offset_base, vmalloc_base and vmemmap_base.

5-level paging provides much wider virtual address space and KASLR
doesn't suffer significantly from per-KeyID direct mappings.

It's preferred to run MKTME with 5-level paging.

The direct mappings for each KeyID will be placed next to each other in
the virtual address space. We need a way to find the boundaries of the
direct mapping for a particular KeyID.

The new variable direct_mapping_size specifies the size of direct
mapping. With the value, it's trivial to find direct mapping for
KeyID-N: PAGE_OFFSET + N * direct_mapping_size.
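
A user-space sketch of that address computation follows. The
PAGE_OFFSET value and the 1TiB mapping size are example assumptions
(the kernel determines both at boot), and the ex_* helpers are
hypothetical names, not functions from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example values; the kernel computes both at boot. */
#define EX_PAGE_OFFSET		UINT64_C(0xffff888000000000)
#define EX_DIRECT_MAPPING_SIZE	(UINT64_C(1) << 40)	/* 1 TiB */

/* Start of the direct mapping for a given KeyID. */
static inline uint64_t ex_direct_map_base(unsigned int keyid)
{
	return EX_PAGE_OFFSET + (uint64_t)keyid * EX_DIRECT_MAPPING_SIZE;
}

/* Kernel virtual address of paddr as seen through a given KeyID. */
static inline uint64_t ex_phys_to_virt(uint64_t paddr, unsigned int keyid)
{
	assert(paddr < EX_DIRECT_MAPPING_SIZE);
	return ex_direct_map_base(keyid) + paddr;
}
```

KeyID-0 maps at PAGE_OFFSET itself, so code that never uses encryption
sees the same addresses as before.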

The size of the direct mapping is calculated during KASLR setup. If KASLR
is disabled it happens during MKTME initialization.

With MKTME, the size of the direct mapping has to be a power of 2. It
makes the implementation of __pa() efficient.
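
One way to see why a power-of-2 size helps: the offset within any
per-KeyID mapping can be recovered with a single AND instead of a
division or a loop over mappings. The sketch below is a user-space
model under assumed values (PAGE_OFFSET, a 1TiB mapping size) with
hypothetical ex_* helpers; it is not the __pa() implementation from
this series.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_OFFSET		UINT64_C(0xffff888000000000)
#define EX_DIRECT_MAPPING_SIZE	(UINT64_C(1) << 40)	/* power of 2 */
#define EX_DIRECT_MAPPING_MASK	(EX_DIRECT_MAPPING_SIZE - 1)

/* Physical address behind a direct-mapping address, for any KeyID. */
static inline uint64_t ex_virt_to_phys(uint64_t vaddr)
{
	assert(vaddr >= EX_PAGE_OFFSET);
	return (vaddr - EX_PAGE_OFFSET) & EX_DIRECT_MAPPING_MASK;
}

/* Index of the per-KeyID mapping that the address falls into. */
static inline unsigned int ex_virt_to_keyid(uint64_t vaddr)
{
	assert(vaddr >= EX_PAGE_OFFSET);
	return (unsigned int)((vaddr - EX_PAGE_OFFSET) / EX_DIRECT_MAPPING_SIZE);
}
```

An address 0x1234 bytes into the mapping for KeyID 3 translates back to
physical address 0x1234 regardless of which KeyID it was reached
through.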

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/x86_64/mm.txt |  4 +++
 arch/x86/include/asm/page_32.h  |  1 +
 arch/x86/include/asm/page_64.h  |  2 ++
 arch/x86/include/asm/setup.h    |  6 ++++
 arch/x86/kernel/head64.c        |  4 +++
 arch/x86/kernel/setup.c         |  3 ++
 arch/x86/mm/init_64.c           | 58 +++++++++++++++++++++++++++++++++
 arch/x86/mm/kaslr.c             | 11 +++++--
 8 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 804f9426ed17..81a1b96e0902 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -132,6 +132,10 @@ The direct mapping covers all memory in the system up to the highest
 memory address (this means in some cases it can also include PCI memory
 holes).
 
+With MKTME, we have multiple direct mappings. One per-KeyID. They are put
+next to each other. PAGE_OFFSET + N * direct_mapping_size can be used to
+find direct mapping for KeyID-N.
+
 vmalloc space is lazily synchronized into the different PML4/PML5 pages of
 the processes using the page fault handler, with init_top_pgt as
 reference.
diff --git a/arch/x86/include/asm/page_32.h b/arch/x86/include/asm/page_32.h
index 94dbd51df58f..8bce788f9ca9 100644
--- a/arch/x86/include/asm/page_32.h
+++ b/arch/x86/include/asm/page_32.h
@@ -6,6 +6,7 @@
 
 #ifndef __ASSEMBLY__
 
+#define direct_mapping_size 0
 #define __phys_addr_nodebug(x)	((x) - PAGE_OFFSET)
 #ifdef CONFIG_DEBUG_VIRTUAL
 extern unsigned long __phys_addr(unsigned long);
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 939b1cff4a7b..f57fc3cc2246 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -14,6 +14,8 @@ extern unsigned long phys_base;
 extern unsigned long page_offset_base;
 extern unsigned long vmalloc_base;
 extern unsigned long vmemmap_base;
+extern unsigned long direct_mapping_size;
+extern unsigned long direct_mapping_mask;
 
 static inline unsigned long __phys_addr_nodebug(unsigned long x)
 {
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index ed8ec011a9fd..d2861074cf83 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -62,6 +62,12 @@ extern void x86_ce4100_early_setup(void);
 static inline void x86_ce4100_early_setup(void) { }
 #endif
 
+#ifdef CONFIG_MEMORY_PHYSICAL_PADDING
+void calculate_direct_mapping_size(void);
+#else
+static inline void calculate_direct_mapping_size(void) { }
+#endif
+
 #ifndef _SETUP
 
 #include <asm/espfix.h>
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 16b1cbd3a61e..c1a3ef88cb08 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -60,6 +60,10 @@ EXPORT_SYMBOL(vmalloc_base);
 unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4;
 EXPORT_SYMBOL(vmemmap_base);
 #endif
+unsigned long direct_mapping_size __ro_after_init = -1UL;
+EXPORT_SYMBOL(direct_mapping_size);
+unsigned long direct_mapping_mask __ro_after_init = -1UL;
+EXPORT_SYMBOL(direct_mapping_mask);
 
 #define __head	__section(.head.text)
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a527cd9..8b47e3e38926 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1057,6 +1057,9 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	init_cache_modes();
 
+	 /* direct_mapping_size has to be initialized before KASLR and MKTME */
+	calculate_direct_mapping_size();
+
 	/*
 	 * Define random base addresses for memory sections after max_pfn is
 	 * defined and before each memory section base is used.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index bccff68e3267..3a08d707eec8 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1383,6 +1383,64 @@ unsigned long memory_block_size_bytes(void)
 	return memory_block_size_probed;
 }
 
+#ifdef CONFIG_MEMORY_PHYSICAL_PADDING
+void __init calculate_direct_mapping_size(void)
+{
+	unsigned long available_va;
+
+	/* 1/4 of virtual address space is dedicated for direct mapping */
+	available_va = 1UL << (__VIRTUAL_MASK_SHIFT - 1);
+
+	/* How much memory the system has? */
+	direct_mapping_size = max_pfn << PAGE_SHIFT;
+	direct_mapping_size = round_up(direct_mapping_size, 1UL << 40);
+
+	if (!mktme_nr_keyids)
+		goto out;
+
+	/*
+	 * For MKTME we need direct_mapping_size to be power-of-2.
+	 * It makes __pa() implementation efficient.
+	 */
+	direct_mapping_size = roundup_pow_of_two(direct_mapping_size);
+
+	/*
+	 * Not enough virtual address space to address all physical memory with
+	 * MKTME enabled. Even without padding.
+	 *
+	 * Disable MKTME instead.
+	 */
+	if (direct_mapping_size > available_va / (mktme_nr_keyids + 1)) {
+		pr_err("x86/mktme: Disabled. Not enough virtual address space\n");
+		pr_err("x86/mktme: Consider switching to 5-level paging\n");
+		mktme_disable();
+		goto out;
+	}
+
+	/*
+	 * Virtual address space is divided between per-KeyID direct mappings.
+	 */
+	available_va /= mktme_nr_keyids + 1;
+out:
+	/* Add padding, if there's enough virtual address space */
+	direct_mapping_size += (1UL << 40) * CONFIG_MEMORY_PHYSICAL_PADDING;
+	if (mktme_nr_keyids)
+		direct_mapping_size = roundup_pow_of_two(direct_mapping_size);
+
+	if (direct_mapping_size > available_va)
+		direct_mapping_size = available_va;
+
+	/*
+	 * For MKTME, make sure direct_mapping_size is still power-of-2
+	 * after adding padding and calculate mask that is used in __pa().
+	 */
+	if (mktme_nr_keyids) {
+		direct_mapping_size = rounddown_pow_of_two(direct_mapping_size);
+		direct_mapping_mask = direct_mapping_size - 1;
+	}
+}
+#endif
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 /*
  * Initialise the sparsemem vmemmap using huge-pages at the PMD level.
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 2228cc7d6b42..9cfba6627603 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -102,10 +102,15 @@ void __init kernel_randomize_memory(void)
 	 * add padding if needed (especially for memory hotplug support).
 	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
-	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_MEMORY_PHYSICAL_PADDING;
 
-	/* Adapt phyiscal memory region size based on available memory */
+	/*
+	 * Calculate space required to map all physical memory.
+	 * In case of MKTME, we map physical memory multiple times, one for
+	 * each KeyID. If MKTME is disabled mktme_nr_keyids is 0.
+	 */
+	memory_tb = (direct_mapping_size * (mktme_nr_keyids + 1)) >> TB_SHIFT;
+
+	/* Adapt physical memory region size based on available memory */
 	if (memory_tb < kaslr_regions[0].size_tb)
 		kaslr_regions[0].size_tb = memory_tb;
 
-- 
2.20.1



* [PATCH, RFC 18/62] x86/mm: Implement syncing per-KeyID direct mappings
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (16 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 17/62] x86/mm: Calculate direct mapping size Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-06-14  9:51   ` Peter Zijlstra
  2019-05-08 14:43 ` [PATCH, RFC 19/62] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
                   ` (45 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

For MKTME we use per-KeyID direct mappings. This allows the kernel to
access encrypted memory.

sync_direct_mapping() syncs the per-KeyID direct mappings with the
canonical one -- KeyID-0.

The function tracks changes in the canonical mapping:
 - creating or removing chunks of the translation tree;
 - changes in mapping flags (i.e. protection bits);
 - splitting a huge page mapping into a page table;
 - replacing a page table with a huge page mapping.

The function needs to be called on every change to the direct mapping:
hotplug, hotremove, changes in permission bits, etc.

The function is a no-op until MKTME is enabled.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
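The address arithmetic behind the per-KeyID mappings can be illustrated
with a small userspace sketch. It is not part of the patch: the ex_ names
and both constants (a 1 TiB slice, KeyID bits starting at bit 46) are
assumptions picked for the example, and the masking with
__supported_pte_mask that the kernel code applies is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative values only (assumptions, not taken from the patch):
 * a 1 TiB direct mapping slice and KeyID bits starting at physical
 * address bit 46.
 */
#define EX_DIRECT_MAPPING_SIZE	(1ULL << 40)
#define EX_KEYID_SHIFT		46

/* Virtual address of the same byte in the direct mapping of @keyid. */
static uint64_t ex_keyid_vaddr(uint64_t keyid0_vaddr, uint64_t keyid)
{
	/* The slice index must not overflow the address computation. */
	assert(keyid < (1ULL << 16));
	return keyid0_vaddr + keyid * EX_DIRECT_MAPPING_SIZE;
}

/* Copy a KeyID-0 page table entry value and tag it with @keyid. */
static uint64_t ex_keyid_entry(uint64_t keyid0_entry, uint64_t keyid)
{
	return keyid0_entry | (keyid << EX_KEYID_SHIFT);
}
```

With these helpers, KeyID 0 maps an address to itself and every other
KeyID sees the same page one slice further up, with the KeyID encoded in
the upper bits of the entry.
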
 arch/x86/include/asm/mktme.h |   6 +
 arch/x86/mm/init_64.c        |  10 +
 arch/x86/mm/mktme.c          | 441 +++++++++++++++++++++++++++++++++++
 3 files changed, 457 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 454d6d7c791d..bd6707e73219 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -59,6 +59,8 @@ static inline void arch_free_page(struct page *page, int order)
 		free_encrypted_page(page, order);
 }
 
+int sync_direct_mapping(void);
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
@@ -73,6 +75,10 @@ static inline bool mktme_enabled(void)
 
 static inline void mktme_disable(void) {}
 
+static inline int sync_direct_mapping(void)
+{
+	return 0;
+}
 #endif
 
 #endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3a08d707eec8..ad4ea3703faf 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -693,6 +693,7 @@ kernel_physical_mapping_init(unsigned long paddr_start,
 {
 	bool pgd_changed = false;
 	unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
+	int ret;
 
 	paddr_last = paddr_end;
 	vaddr = (unsigned long)__va(paddr_start);
@@ -726,6 +727,9 @@ kernel_physical_mapping_init(unsigned long paddr_start,
 		pgd_changed = true;
 	}
 
+	ret = sync_direct_mapping();
+	WARN_ON(ret);
+
 	if (pgd_changed)
 		sync_global_pgds(vaddr_start, vaddr_end - 1);
 
@@ -1135,10 +1139,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
 static void __meminit
 kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 {
+	int ret;
 	start = (unsigned long)__va(start);
 	end = (unsigned long)__va(end);
 
 	remove_pagetable(start, end, true, NULL);
+	ret = sync_direct_mapping();
+	WARN_ON(ret);
 }
 
 int __ref arch_remove_memory(int nid, u64 start, u64 size,
@@ -1247,6 +1254,7 @@ void mark_rodata_ro(void)
 	unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 	unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
 	unsigned long all_end;
+	int ret;
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
@@ -1280,6 +1288,8 @@ void mark_rodata_ro(void)
 	free_kernel_image_pages((void *)text_end, (void *)rodata_start);
 	free_kernel_image_pages((void *)rodata_end, (void *)_sdata);
 
+	ret = sync_direct_mapping();
+	WARN_ON(ret);
 	debug_checkwx();
 }
 
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 9221c894e8e9..024165c9c7f3 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,6 +1,8 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
 #include <asm/mktme.h>
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
 
 /* Mask to extract KeyID from physical address. */
 phys_addr_t mktme_keyid_mask;
@@ -36,6 +38,8 @@ static bool need_page_mktme(void)
 static void init_page_mktme(void)
 {
 	static_branch_enable(&mktme_enabled_key);
+
+	sync_direct_mapping();
 }
 
 struct page_ext_operations page_mktme_ops = {
@@ -96,3 +100,440 @@ void free_encrypted_page(struct page *page, int order)
 		page++;
 	}
 }
+
+static int sync_direct_mapping_pte(unsigned long keyid,
+		pmd_t *dst_pmd, pmd_t *src_pmd,
+		unsigned long addr, unsigned long end)
+{
+	pte_t *src_pte, *dst_pte;
+	pte_t *new_pte = NULL;
+	bool remove_pte;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pte = !src_pmd && IS_ALIGNED(addr, PMD_SIZE) && IS_ALIGNED(end, PMD_SIZE);
+
+	/*
+	 * PMD page got split into page table.
+	 * Clear PMD mapping. Page table will be established instead.
+	 */
+	if (pmd_large(*dst_pmd)) {
+		spin_lock(&init_mm.page_table_lock);
+		pmd_clear(dst_pmd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (pmd_none(*dst_pmd)) {
+		new_pte = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pte)
+			return -ENOMEM;
+		dst_pte = new_pte + pte_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pte = pte_offset_map(dst_pmd, addr + keyid * direct_mapping_size);
+	}
+	src_pte = src_pmd ? pte_offset_map(src_pmd, addr) : NULL;
+
+	spin_lock(&init_mm.page_table_lock);
+
+	do {
+		pteval_t val;
+
+		if (!src_pte || pte_none(*src_pte)) {
+			set_pte(dst_pte, __pte(0));
+			goto next;
+		}
+
+		if (!pte_none(*dst_pte)) {
+			/*
+			 * Sanity check: PFNs must match between source
+			 * and destination even if the rest doesn't.
+			 */
+			BUG_ON(pte_pfn(*dst_pte) != pte_pfn(*src_pte));
+		}
+
+		/* Copy entry, but set KeyID. */
+		val = pte_val(*src_pte) | keyid << mktme_keyid_shift;
+		val &= __supported_pte_mask;
+		set_pte(dst_pte, __pte(val));
+next:
+		addr += PAGE_SIZE;
+		dst_pte++;
+		if (src_pte)
+			src_pte++;
+	} while (addr != end);
+
+	if (new_pte)
+		pmd_populate_kernel(&init_mm, dst_pmd, new_pte);
+
+	if (remove_pte) {
+		__free_page(pmd_page(*dst_pmd));
+		pmd_clear(dst_pmd);
+	}
+
+	spin_unlock(&init_mm.page_table_lock);
+
+	return 0;
+}
+
+static int sync_direct_mapping_pmd(unsigned long keyid,
+		pud_t *dst_pud, pud_t *src_pud,
+		unsigned long addr, unsigned long end)
+{
+	pmd_t *src_pmd, *dst_pmd;
+	pmd_t *new_pmd = NULL;
+	bool remove_pmd = false;
+	unsigned long next;
+	int ret = 0;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pmd = !src_pud && IS_ALIGNED(addr, PUD_SIZE) && IS_ALIGNED(end, PUD_SIZE);
+
+	/*
+	 * PUD page got split into page table.
+	 * Clear PUD mapping. Page table will be established instead.
+	 */
+	if (pud_large(*dst_pud)) {
+		spin_lock(&init_mm.page_table_lock);
+		pud_clear(dst_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (pud_none(*dst_pud)) {
+		new_pmd = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pmd)
+			return -ENOMEM;
+		dst_pmd = new_pmd + pmd_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pmd = pmd_offset(dst_pud, addr + keyid * direct_mapping_size);
+	}
+	src_pmd = src_pud ? pmd_offset(src_pud, addr) : NULL;
+
+	do {
+		pmd_t *__src_pmd = src_pmd;
+
+		next = pmd_addr_end(addr, end);
+		if (!__src_pmd || pmd_none(*__src_pmd)) {
+			if (pmd_none(*dst_pmd))
+				goto next;
+			if (pmd_large(*dst_pmd)) {
+				spin_lock(&init_mm.page_table_lock);
+				set_pmd(dst_pmd, __pmd(0));
+				spin_unlock(&init_mm.page_table_lock);
+				goto next;
+			}
+			__src_pmd = NULL;
+		}
+
+		if (__src_pmd && pmd_large(*__src_pmd)) {
+			pmdval_t val;
+
+			if (pmd_large(*dst_pmd)) {
+				/*
+				 * Sanity check: PFNs must match between source
+				 * and destination even if the rest doesn't.
+				 */
+				BUG_ON(pmd_pfn(*dst_pmd) != pmd_pfn(*__src_pmd));
+			} else if (!pmd_none(*dst_pmd)) {
+				/*
+				 * Page table is replaced with a PMD page.
+				 * Free and unmap the page table.
+				 */
+				__free_page(pmd_page(*dst_pmd));
+				spin_lock(&init_mm.page_table_lock);
+				pmd_clear(dst_pmd);
+				spin_unlock(&init_mm.page_table_lock);
+			}
+
+			/* Copy entry, but set KeyID. */
+			val = pmd_val(*__src_pmd) | keyid << mktme_keyid_shift;
+			val &= __supported_pte_mask;
+			spin_lock(&init_mm.page_table_lock);
+			set_pmd(dst_pmd, __pmd(val));
+			spin_unlock(&init_mm.page_table_lock);
+			goto next;
+		}
+
+		ret = sync_direct_mapping_pte(keyid, dst_pmd, __src_pmd,
+				addr, next);
+next:
+		addr = next;
+		dst_pmd++;
+		if (src_pmd)
+			src_pmd++;
+	} while (addr != end && !ret);
+
+	if (new_pmd) {
+		spin_lock(&init_mm.page_table_lock);
+		pud_populate(&init_mm, dst_pud, new_pmd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_pmd) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(pud_page(*dst_pud));
+		pud_clear(dst_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_pud(unsigned long keyid,
+		p4d_t *dst_p4d, p4d_t *src_p4d,
+		unsigned long addr, unsigned long end)
+{
+	pud_t *src_pud, *dst_pud;
+	pud_t *new_pud = NULL;
+	bool remove_pud = false;
+	unsigned long next;
+	int ret = 0;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pud = !src_p4d && IS_ALIGNED(addr, P4D_SIZE) && IS_ALIGNED(end, P4D_SIZE);
+
+	/*
+	 * P4D page got split into page table.
+	 * Clear P4D mapping. Page table will be established instead.
+	 */
+	if (p4d_large(*dst_p4d)) {
+		spin_lock(&init_mm.page_table_lock);
+		p4d_clear(dst_p4d);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (p4d_none(*dst_p4d)) {
+		new_pud = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pud)
+			return -ENOMEM;
+		dst_pud = new_pud + pud_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pud = pud_offset(dst_p4d, addr + keyid * direct_mapping_size);
+	}
+	src_pud = src_p4d ? pud_offset(src_p4d, addr) : NULL;
+
+	do {
+		pud_t *__src_pud = src_pud;
+
+		next = pud_addr_end(addr, end);
+		if (!__src_pud || pud_none(*__src_pud)) {
+			if (pud_none(*dst_pud))
+				goto next;
+			if (pud_large(*dst_pud)) {
+				spin_lock(&init_mm.page_table_lock);
+				set_pud(dst_pud, __pud(0));
+				spin_unlock(&init_mm.page_table_lock);
+				goto next;
+			}
+			__src_pud = NULL;
+		}
+
+		if (__src_pud && pud_large(*__src_pud)) {
+			pudval_t val;
+
+			if (pud_large(*dst_pud)) {
+				/*
+				 * Sanity check: PFNs must match between source
+				 * and destination even if the rest doesn't.
+				 */
+				BUG_ON(pud_pfn(*dst_pud) != pud_pfn(*__src_pud));
+			} else if (!pud_none(*dst_pud)) {
+				/*
+				 * Page table is replaced with a pud page.
+				 * Free and unmap the page table.
+				 */
+				__free_page(pud_page(*dst_pud));
+				spin_lock(&init_mm.page_table_lock);
+				pud_clear(dst_pud);
+				spin_unlock(&init_mm.page_table_lock);
+			}
+
+			/* Copy entry, but set KeyID. */
+			val = pud_val(*__src_pud) | keyid << mktme_keyid_shift;
+			val &= __supported_pte_mask;
+			spin_lock(&init_mm.page_table_lock);
+			set_pud(dst_pud, __pud(val));
+			spin_unlock(&init_mm.page_table_lock);
+			goto next;
+		}
+
+		ret = sync_direct_mapping_pmd(keyid, dst_pud, __src_pud,
+				addr, next);
+next:
+		addr = next;
+		dst_pud++;
+		if (src_pud)
+			src_pud++;
+	} while (addr != end && !ret);
+
+	if (new_pud) {
+		spin_lock(&init_mm.page_table_lock);
+		p4d_populate(&init_mm, dst_p4d, new_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_pud) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(p4d_page(*dst_p4d));
+		p4d_clear(dst_p4d);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_p4d(unsigned long keyid,
+		pgd_t *dst_pgd, pgd_t *src_pgd,
+		unsigned long addr, unsigned long end)
+{
+	p4d_t *src_p4d, *dst_p4d;
+	p4d_t *new_p4d_1 = NULL, *new_p4d_2 = NULL;
+	bool remove_p4d = false;
+	unsigned long next;
+	int ret = 0;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_p4d = !src_pgd && IS_ALIGNED(addr, PGDIR_SIZE) && IS_ALIGNED(end, PGDIR_SIZE);
+
+	/* Allocate a new page table if needed. */
+	if (pgd_none(*dst_pgd)) {
+		new_p4d_1 = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_p4d_1)
+			return -ENOMEM;
+		dst_p4d = new_p4d_1 + p4d_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_p4d = p4d_offset(dst_pgd, addr + keyid * direct_mapping_size);
+	}
+	src_p4d = src_pgd ? p4d_offset(src_pgd, addr) : NULL;
+
+	do {
+		p4d_t *__src_p4d = src_p4d;
+
+		next = p4d_addr_end(addr, end);
+		if (!__src_p4d || p4d_none(*__src_p4d)) {
+			if (p4d_none(*dst_p4d))
+				goto next;
+			__src_p4d = NULL;
+		}
+
+		ret = sync_direct_mapping_pud(keyid, dst_p4d, __src_p4d,
+				addr, next);
+next:
+		addr = next;
+		dst_p4d++;
+
+		/*
+		 * Direct mappings are 1TiB-aligned. With 5-level paging it
+		 * means that on PGD level there can be misalignment between
+		 * source and destination.
+		 *
+		 * Allocate the new page table if dst_p4d crosses page table
+		 * boundary.
+		 */
+		if (!((unsigned long)dst_p4d & ~PAGE_MASK) && addr != end) {
+			if (pgd_none(dst_pgd[1])) {
+				new_p4d_2 = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+				if (!new_p4d_2)
+					ret = -ENOMEM;
+				dst_p4d = new_p4d_2;
+			} else {
+				dst_p4d = p4d_offset(dst_pgd + 1, 0);
+			}
+		}
+		if (src_p4d)
+			src_p4d++;
+	} while (addr != end && !ret);
+
+	if (new_p4d_1 || new_p4d_2) {
+		spin_lock(&init_mm.page_table_lock);
+		if (new_p4d_1)
+			pgd_populate(&init_mm, dst_pgd, new_p4d_1);
+		if (new_p4d_2)
+			pgd_populate(&init_mm, dst_pgd + 1, new_p4d_2);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_p4d) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(pgd_page(*dst_pgd));
+		pgd_clear(dst_pgd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_keyid(unsigned long keyid)
+{
+	pgd_t *src_pgd, *dst_pgd;
+	unsigned long addr, end, next;
+	int ret = 0;
+
+	addr = PAGE_OFFSET;
+	end = PAGE_OFFSET + direct_mapping_size;
+
+	dst_pgd = pgd_offset_k(addr + keyid * direct_mapping_size);
+	src_pgd = pgd_offset_k(addr);
+
+	do {
+		pgd_t *__src_pgd = src_pgd;
+
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*__src_pgd)) {
+			if (pgd_none(*dst_pgd))
+				continue;
+			__src_pgd = NULL;
+		}
+
+		ret = sync_direct_mapping_p4d(keyid, dst_pgd, __src_pgd,
+				addr, next);
+	} while (dst_pgd++, src_pgd++, addr = next, addr != end && !ret);
+
+	return ret;
+}
+
+/*
+ * For MKTME we maintain per-KeyID direct mappings. This allows the kernel to
+ * access encrypted memory.
+ *
+ * sync_direct_mapping() syncs the per-KeyID direct mappings with the
+ * canonical one -- KeyID-0.
+ *
+ * The function tracks changes in the canonical mapping:
+ *  - creating or removing chunks of the translation tree;
+ *  - changes in mapping flags (i.e. protection bits);
+ *  - splitting a huge page mapping into a page table;
+ *  - replacing a page table with a huge page mapping.
+ *
+ * The function needs to be called on every change to the direct mapping:
+ * hotplug, hotremove, changes in permission bits, etc.
+ *
+ * The function is a no-op until MKTME is enabled.
+ */
+int sync_direct_mapping(void)
+{
+	int i, ret = 0;
+
+	if (!mktme_enabled())
+		return 0;
+
+	for (i = 1; !ret && i <= mktme_nr_keyids; i++)
+		ret = sync_direct_mapping_keyid(i);
+
+	flush_tlb_all();
+
+	return ret;
+}
-- 
2.20.1



* [PATCH, RFC 19/62] x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (17 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 18/62] x86/mm: Implement syncing per-KeyID direct mappings Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-06-14 11:10   ` Peter Zijlstra
  2019-05-08 14:43 ` [PATCH, RFC 20/62] mm/page_ext: Export lookup_page_ext() symbol Kirill A. Shutemov
                   ` (44 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Per-KeyID direct mappings require changes to how we find the right
virtual address for a page and to virt-to-phys address translation.

The page_to_virt() definition overrides the default macro provided by
<linux/mm.h>.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
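A minimal userspace sketch of the round trip follows. It is not part of
the patch and covers only direct-mapping addresses (the kernel image
case handled by __phys_addr_nodebug() is ignored). The ex_ names, the
base address and the 1 TiB slice size are assumptions made for the
example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example layout: direct mapping base and a 1 TiB slice. */
#define EX_PAGE_OFFSET		0xffff888000000000ULL
#define EX_DIRECT_MAPPING_SIZE	(1ULL << 40)
#define EX_DIRECT_MAPPING_MASK	(EX_DIRECT_MAPPING_SIZE - 1)

/* Direct-mapping address of physical address @phys as seen through @keyid. */
static uint64_t ex_phys_to_virt(uint64_t phys, uint64_t keyid)
{
	assert(phys < EX_DIRECT_MAPPING_SIZE);
	return EX_PAGE_OFFSET + phys + keyid * EX_DIRECT_MAPPING_SIZE;
}

/*
 * Inverse translation: the mask drops the per-KeyID offset.
 * This only works because the slice size is a power of two.
 */
static uint64_t ex_virt_to_phys(uint64_t vaddr)
{
	return (vaddr - EX_PAGE_OFFSET) & EX_DIRECT_MAPPING_MASK;
}
```

Whatever KeyID the virtual address was built with, the masked result is
the same physical address, which is why a single AND is enough in __pa().
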
 arch/x86/include/asm/page.h    | 3 +++
 arch/x86/include/asm/page_64.h | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 39af59487d5f..aff30554f38e 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -72,6 +72,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 extern bool __virt_addr_valid(unsigned long kaddr);
 #define virt_addr_valid(kaddr)	__virt_addr_valid((unsigned long) (kaddr))
 
+#define page_to_virt(x) \
+	(__va(PFN_PHYS(page_to_pfn(x))) + page_keyid(x) * direct_mapping_size)
+
 #endif	/* __ASSEMBLY__ */
 
 #include <asm-generic/memory_model.h>
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index f57fc3cc2246..a4f394e3471d 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -24,7 +24,7 @@ static inline unsigned long __phys_addr_nodebug(unsigned long x)
 	/* use the carry flag to determine if x was < __START_KERNEL_map */
 	x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));
 
-	return x;
+	return x & direct_mapping_mask;
 }
 
 #ifdef CONFIG_DEBUG_VIRTUAL
-- 
2.20.1



* [PATCH, RFC 20/62] mm/page_ext: Export lookup_page_ext() symbol
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (18 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 19/62] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-06-14 11:12   ` Peter Zijlstra
  2019-05-08 14:43 ` [PATCH, RFC 21/62] mm/rmap: Clear vma->anon_vma on unlink_anon_vmas() Kirill A. Shutemov
                   ` (43 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

page_keyid() is an inline function that uses lookup_page_ext(). KVM is
going to use page_keyid(), and since KVM can be built as a module,
lookup_page_ext() has to be exported.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/page_ext.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/page_ext.c b/mm/page_ext.c
index 1af8b82087f2..91e4e87f6e41 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -142,6 +142,7 @@ struct page_ext *lookup_page_ext(const struct page *page)
 					MAX_ORDER_NR_PAGES);
 	return get_entry(base, index);
 }
+EXPORT_SYMBOL_GPL(lookup_page_ext);
 
 static int __init alloc_node_page_ext(int nid)
 {
@@ -212,6 +213,7 @@ struct page_ext *lookup_page_ext(const struct page *page)
 		return NULL;
 	return get_entry(section->page_ext, pfn);
 }
+EXPORT_SYMBOL_GPL(lookup_page_ext);
 
 static void *__meminit alloc_page_ext(size_t size, int nid)
 {
-- 
2.20.1



* [PATCH, RFC 21/62] mm/rmap: Clear vma->anon_vma on unlink_anon_vmas()
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (19 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 20/62] mm/page_ext: Export lookup_page_ext() symbol Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 22/62] x86/pconfig: Set a valid encryption algorithm for all MKTME commands Kirill A. Shutemov
                   ` (42 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

If all pages in the VMA got unmapped, there's no reason to keep it linked
into the original anon VMA hierarchy: it cannot possibly share any pages
with other VMAs.

Set vma->anon_vma to NULL on unlink_anon_vmas(). With the change, the VMA
can be reused. The new anon VMA will be allocated on the first fault.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/rmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index b30c7c71d1d9..4ec2aee7baa3 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -400,8 +400,10 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
 		list_del(&avc->same_vma);
 		anon_vma_chain_free(avc);
 	}
-	if (vma->anon_vma)
+	if (vma->anon_vma) {
 		vma->anon_vma->degree--;
+		vma->anon_vma = NULL;
+	}
 	unlock_anon_vma_root(root);
 
 	/*
-- 
2.20.1



* [PATCH, RFC 22/62] x86/pconfig: Set a valid encryption algorithm for all MKTME commands
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (20 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 21/62] mm/rmap: Clear vma->anon_vma on unlink_anon_vmas() Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 23/62] keys/mktme: Introduce a Kernel Key Service for MKTME Kirill A. Shutemov
                   ` (41 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The Intel MKTME architecture specification requires a valid encryption
algorithm for all command types.

For commands that actually perform encryption, SET_KEY_DIRECT and
SET_KEY_RANDOM, the user specifies the algorithm when requesting the
key through the MKTME Key Service.

For CLEAR_KEY and NO_ENCRYPT commands, a valid encryption algorithm is
also required by the MKTME hardware. However, it does not make sense to
ask userspace to specify one. Define the CLEAR_KEY and NO_ENCRYPT type
commands to always include a valid encryption algorithm.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
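The keyid_ctrl layout described above (COMMAND in bits [7:0], ENC_ALG in
bits [23:8]) can be sketched in userspace as follows. The numeric values
mirror the defines in this patch. The ex_ names and helper functions are
hypothetical and exist only for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout: COMMAND in bits [7:0], ENC_ALG in bits [23:8]. */
#define EX_CMD_MASK		0xffu
#define EX_ALG_SHIFT		8
#define EX_ALG_MASK		(0xffffu << EX_ALG_SHIFT)

#define EX_AES_XTS_128		(1u << EX_ALG_SHIFT)
#define EX_CLEAR_KEY		(2u | EX_AES_XTS_128)
#define EX_NO_ENCRYPT		(3u | EX_AES_XTS_128)

/* Combine a command with an already shifted algorithm field. */
static uint32_t ex_keyid_ctrl(uint32_t command, uint32_t enc_alg)
{
	assert(command <= EX_CMD_MASK);
	assert((enc_alg & ~EX_ALG_MASK) == 0);
	return command | enc_alg;
}

/* Extract the COMMAND field. */
static uint32_t ex_command(uint32_t keyid_ctrl)
{
	return keyid_ctrl & EX_CMD_MASK;
}

/* Extract the ENC_ALG field, still in its shifted position. */
static uint32_t ex_enc_alg(uint32_t keyid_ctrl)
{
	return keyid_ctrl & EX_ALG_MASK;
}
```

Folding the algorithm into the CLEAR_KEY and NO_ENCRYPT defines means
callers can pass them as keyid_ctrl directly, without choosing an
algorithm themselves.
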
 arch/x86/include/asm/intel_pconfig.h | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/intel_pconfig.h b/arch/x86/include/asm/intel_pconfig.h
index 3cb002b1d0f9..15705699a14e 100644
--- a/arch/x86/include/asm/intel_pconfig.h
+++ b/arch/x86/include/asm/intel_pconfig.h
@@ -21,14 +21,20 @@ enum pconfig_leaf {
 
 /* Defines and structure for MKTME_KEY_PROGRAM of PCONFIG instruction */
 
+/* mktme_key_program::keyid_ctrl ENC_ALG, bits [23:8] */
+#define MKTME_AES_XTS_128	(1 << 8)
+#define MKTME_ANY_VALID_ALG	(1 << 8)
+
 /* mktme_key_program::keyid_ctrl COMMAND, bits [7:0] */
 #define MKTME_KEYID_SET_KEY_DIRECT	0
 #define MKTME_KEYID_SET_KEY_RANDOM	1
-#define MKTME_KEYID_CLEAR_KEY		2
-#define MKTME_KEYID_NO_ENCRYPT		3
 
-/* mktme_key_program::keyid_ctrl ENC_ALG, bits [23:8] */
-#define MKTME_AES_XTS_128	(1 << 8)
+/*
+ * CLEAR_KEY and NO_ENCRYPT require the COMMAND in bits [7:0]
+ * and any valid encryption algorithm, ENC_ALG, in bits [23:8]
+ */
+#define MKTME_KEYID_CLEAR_KEY  (2 | MKTME_ANY_VALID_ALG)
+#define MKTME_KEYID_NO_ENCRYPT (3 | MKTME_ANY_VALID_ALG)
 
 /* Return codes from the PCONFIG MKTME_KEY_PROGRAM */
 #define MKTME_PROG_SUCCESS	0
-- 
2.20.1



* [PATCH, RFC 23/62] keys/mktme: Introduce a Kernel Key Service for MKTME
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (21 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 22/62] x86/pconfig: Set a valid encryption algorithm for all MKTME commands Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 24/62] keys/mktme: Preparse the MKTME key payload Kirill A. Shutemov
                   ` (40 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

MKTME (Multi-Key Total Memory Encryption) is a technology that allows
transparent memory encryption in upcoming Intel platforms. MKTME will
support multiple encryption domains, each having its own key.

The MKTME key service will manage the hardware encryption keys. It
will map Userspace Keys to Hardware KeyIDs and program the hardware
with the user requested encryption options.

Here the mapping structure and associated helpers are introduced,
as well as the key service initialization and registration.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
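The reserve/release behaviour of the mapping can be tried out in
userspace with the sketch below. It is a simplified model and not the
patch code: the table is a fixed-size array rather than a kvzalloc()ed
structure, keys are opaque pointers, and the ex_ names and the KeyID
count of 3 are assumptions. As in the patch, slot 0 is never handed out
and a return value of 0 means no KeyID is available.

```c
#include <assert.h>
#include <stddef.h>

#define EX_NR_KEYIDS	3	/* assumed number of hardware KeyIDs */

/* Slot 0 stays unused, the search starts at KeyID 1. */
static const void *ex_map[EX_NR_KEYIDS + 1];
static unsigned int ex_mapped;

/* Return the first free KeyID (1..EX_NR_KEYIDS), or 0 if none is left. */
static int ex_reserve_keyid(const void *key)
{
	int i;

	if (ex_mapped == EX_NR_KEYIDS)
		return 0;

	for (i = 1; i <= EX_NR_KEYIDS; i++) {
		if (!ex_map[i]) {
			ex_map[i] = key;
			ex_mapped++;
			return i;
		}
	}
	return 0;
}

/* Give a previously reserved KeyID back to the pool. */
static void ex_release_keyid(int keyid)
{
	assert(keyid >= 1 && keyid <= EX_NR_KEYIDS && ex_map[keyid]);
	ex_map[keyid] = NULL;
	ex_mapped--;
}
```

A released KeyID becomes the next one handed out if it is the lowest
free slot, since the search always starts from 1.
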
 security/keys/Makefile     |  1 +
 security/keys/mktme_keys.c | 98 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 99 insertions(+)
 create mode 100644 security/keys/mktme_keys.c

diff --git a/security/keys/Makefile b/security/keys/Makefile
index 9cef54064f60..28799be801a9 100644
--- a/security/keys/Makefile
+++ b/security/keys/Makefile
@@ -30,3 +30,4 @@ obj-$(CONFIG_ASYMMETRIC_KEY_TYPE) += keyctl_pkey.o
 obj-$(CONFIG_BIG_KEYS) += big_key.o
 obj-$(CONFIG_TRUSTED_KEYS) += trusted.o
 obj-$(CONFIG_ENCRYPTED_KEYS) += encrypted-keys/
+obj-$(CONFIG_X86_INTEL_MKTME) += mktme_keys.o
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
new file mode 100644
index 000000000000..b5e8289f041b
--- /dev/null
+++ b/security/keys/mktme_keys.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-3.0
+
+/* Documentation/x86/mktme_keys.rst */
+
+#include <linux/init.h>
+#include <linux/key.h>
+#include <linux/key-type.h>
+#include <linux/mm.h>
+#include <keys/user-type.h>
+
+#include "internal.h"
+
+/* 1:1 Mapping between Userspace Keys (struct key) and Hardware KeyIDs */
+struct mktme_mapping {
+	unsigned int	mapped_keyids;
+	struct key	*key[];
+};
+
+struct mktme_mapping *mktme_map;
+
+static inline long mktme_map_size(void)
+{
+	long size = 0;
+
+	size += sizeof(*mktme_map);
+	size += sizeof(mktme_map->key[0]) * (mktme_nr_keyids + 1);
+	return size;
+}
+
+int mktme_map_alloc(void)
+{
+	mktme_map = kvzalloc(mktme_map_size(), GFP_KERNEL);
+	if (!mktme_map)
+		return -ENOMEM;
+	return 0;
+}
+
+int mktme_reserve_keyid(struct key *key)
+{
+	int i;
+
+	if (mktme_map->mapped_keyids == mktme_nr_keyids)
+		return 0;
+
+	for (i = 1; i <= mktme_nr_keyids; i++) {
+		if (mktme_map->key[i] == 0) {
+			mktme_map->key[i] = key;
+			mktme_map->mapped_keyids++;
+			return i;
+		}
+	}
+	return 0;
+}
+
+void mktme_release_keyid(int keyid)
+{
+	mktme_map->key[keyid] = 0;
+	mktme_map->mapped_keyids--;
+}
+
+int mktme_keyid_from_key(struct key *key)
+{
+	int i;
+
+	for (i = 1; i <= mktme_nr_keyids; i++) {
+		if (mktme_map->key[i] == key)
+			return i;
+	}
+	return 0;
+}
+
+struct key_type key_type_mktme = {
+	.name		= "mktme",
+	.describe	= user_describe,
+};
+
+static int __init init_mktme(void)
+{
+	int ret;
+
+	/* Verify keys are present */
+	if (mktme_nr_keyids < 1)
+		return 0;
+
+	/* Mapping of Userspace Keys to Hardware KeyIDs */
+	if (mktme_map_alloc())
+		return -ENOMEM;
+
+	ret = register_key_type(&key_type_mktme);
+	if (!ret)
+		return ret;			/* SUCCESS */
+
+	kvfree(mktme_map);
+
+	return -ENOMEM;
+}
+
+late_initcall(init_mktme);
-- 
2.20.1



* [PATCH, RFC 24/62] keys/mktme: Preparse the MKTME key payload
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (22 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 23/62] keys/mktme: Introduce a Kernel Key Service for MKTME Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 25/62] keys/mktme: Instantiate and destroy MKTME keys Kirill A. Shutemov
                   ` (39 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

It is a requirement of the Kernel Keys subsystem to provide a
preparse method that validates payloads before key instantiate
methods are called.

Verify that userspace provides valid MKTME options and prepare
the payload for use at key instantiate time.

Create a method to free the preparsed payload. The Kernel Key
subsystem will use that to clean up after the key is instantiated.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/keys/mktme-type.h  |  39 +++++++++
 security/keys/mktme_keys.c | 165 +++++++++++++++++++++++++++++++++++++
 2 files changed, 204 insertions(+)
 create mode 100644 include/keys/mktme-type.h

diff --git a/include/keys/mktme-type.h b/include/keys/mktme-type.h
new file mode 100644
index 000000000000..032905b288b4
--- /dev/null
+++ b/include/keys/mktme-type.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/* Key service for Multi-Key Total Memory Encryption */
+
+#ifndef _KEYS_MKTME_TYPE_H
+#define _KEYS_MKTME_TYPE_H
+
+#include <linux/key.h>
+
+/*
+ * The AES-XTS 128 encryption algorithm requires 128 bits for each
+ * user supplied data key and tweak key.
+ */
+#define MKTME_AES_XTS_SIZE	16	/* 16 bytes, 128 bits */
+
+enum mktme_alg {
+	MKTME_ALG_AES_XTS_128,
+};
+
+const char *const mktme_alg_names[] = {
+	[MKTME_ALG_AES_XTS_128]	= "aes-xts-128",
+};
+
+enum mktme_type {
+	MKTME_TYPE_ERROR = -1,
+	MKTME_TYPE_USER,
+	MKTME_TYPE_CPU,
+	MKTME_TYPE_NO_ENCRYPT,
+};
+
+const char *const mktme_type_names[] = {
+	[MKTME_TYPE_USER]	= "user",
+	[MKTME_TYPE_CPU]	= "cpu",
+	[MKTME_TYPE_NO_ENCRYPT]	= "no-encrypt",
+};
+
+extern struct key_type key_type_mktme;
+
+#endif /* _KEYS_MKTME_TYPE_H */
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index b5e8289f041b..92a047caa829 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -6,6 +6,10 @@
 #include <linux/key.h>
 #include <linux/key-type.h>
 #include <linux/mm.h>
+#include <linux/parser.h>
+#include <linux/string.h>
+#include <asm/intel_pconfig.h>
+#include <keys/mktme-type.h>
 #include <keys/user-type.h>
 
 #include "internal.h"
@@ -69,8 +73,169 @@ int mktme_keyid_from_key(struct key *key)
 	return 0;
 }
 
+enum mktme_opt_id {
+	OPT_ERROR,
+	OPT_TYPE,
+	OPT_KEY,
+	OPT_TWEAK,
+	OPT_ALGORITHM,
+};
+
+static const match_table_t mktme_token = {
+	{OPT_TYPE, "type=%s"},
+	{OPT_KEY, "key=%s"},
+	{OPT_TWEAK, "tweak=%s"},
+	{OPT_ALGORITHM, "algorithm=%s"},
+	{OPT_ERROR, NULL}
+};
+
+struct mktme_payload {
+	u32		keyid_ctrl;	/* Command & Encryption Algorithm */
+	u8		data_key[MKTME_AES_XTS_SIZE];
+	u8		tweak_key[MKTME_AES_XTS_SIZE];
+};
+
+/* Make sure arguments are correct for the TYPE of key requested */
+static int mktme_check_options(struct mktme_payload *payload,
+			       unsigned long token_mask, enum mktme_type type)
+{
+	if (!token_mask)
+		return -EINVAL;
+
+	switch (type) {
+	case MKTME_TYPE_USER:
+		if (test_bit(OPT_ALGORITHM, &token_mask))
+			payload->keyid_ctrl |= MKTME_AES_XTS_128;
+		else
+			return -EINVAL;
+
+		if ((test_bit(OPT_KEY, &token_mask)) &&
+		    (test_bit(OPT_TWEAK, &token_mask)))
+			payload->keyid_ctrl |= MKTME_KEYID_SET_KEY_DIRECT;
+		else
+			return -EINVAL;
+		break;
+
+	case MKTME_TYPE_CPU:
+		if (test_bit(OPT_ALGORITHM, &token_mask))
+			payload->keyid_ctrl |= MKTME_AES_XTS_128;
+		else
+			return -EINVAL;
+
+		payload->keyid_ctrl |= MKTME_KEYID_SET_KEY_RANDOM;
+		break;
+
+	case MKTME_TYPE_NO_ENCRYPT:
+		payload->keyid_ctrl |= MKTME_KEYID_NO_ENCRYPT;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/* Parse the options and store the key programming data in the payload. */
+static int mktme_get_options(char *options, struct mktme_payload *payload)
+{
+	enum mktme_type type = MKTME_TYPE_ERROR;
+	substring_t args[MAX_OPT_ARGS];
+	unsigned long token_mask = 0;
+	char *p = options;
+	int ret, token;
+
+	while ((p = strsep(&options, " \t"))) {
+		if (*p == '\0' || *p == ' ' || *p == '\t')
+			continue;
+		token = match_token(p, mktme_token, args);
+		if (token == OPT_ERROR)
+			return -EINVAL;
+		if (test_and_set_bit(token, &token_mask))
+			return -EINVAL;
+
+		switch (token) {
+		case OPT_KEY:
+			ret = hex2bin(payload->data_key, args[0].from,
+				      MKTME_AES_XTS_SIZE);
+			if (ret < 0)
+				return -EINVAL;
+			break;
+
+		case OPT_TWEAK:
+			ret = hex2bin(payload->tweak_key, args[0].from,
+				      MKTME_AES_XTS_SIZE);
+			if (ret < 0)
+				return -EINVAL;
+			break;
+
+		case OPT_TYPE:
+			type = match_string(mktme_type_names,
+					    ARRAY_SIZE(mktme_type_names),
+					    args[0].from);
+			if (type < 0)
+				return -EINVAL;
+			break;
+
+		case OPT_ALGORITHM:
+			ret = match_string(mktme_alg_names,
+					   ARRAY_SIZE(mktme_alg_names),
+					   args[0].from);
+			if (ret < 0)
+				return -EINVAL;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+	}
+	return mktme_check_options(payload, token_mask, type);
+}
+
+void mktme_free_preparsed_payload(struct key_preparsed_payload *prep)
+{
+	kzfree(prep->payload.data[0]);
+}
+
+/*
+ * Key Service Method to preparse a payload before a key is created.
+ * Check permissions and the options. Load the proposed key field
+ * data into the payload for use by the instantiate method.
+ */
+int mktme_preparse_payload(struct key_preparsed_payload *prep)
+{
+	struct mktme_payload *mktme_payload;
+	size_t datalen = prep->datalen;
+	char *options;
+	int ret;
+
+	if (datalen <= 0 || datalen > 1024 || !prep->data)
+		return -EINVAL;
+
+	options = kmemdup_nul(prep->data, datalen, GFP_KERNEL);
+	if (!options)
+		return -ENOMEM;
+
+	mktme_payload = kzalloc(sizeof(*mktme_payload), GFP_KERNEL);
+	if (!mktme_payload) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	ret = mktme_get_options(options, mktme_payload);
+	if (ret < 0) {
+		kzfree(mktme_payload);
+		goto out;
+	}
+	prep->quotalen = sizeof(*mktme_payload);
+	prep->payload.data[0] = mktme_payload;
+out:
+	kzfree(options);
+	return ret;
+}
+
 struct key_type key_type_mktme = {
 	.name		= "mktme",
+	.preparse	= mktme_preparse_payload,
+	.free_preparse	= mktme_free_preparsed_payload,
 	.describe	= user_describe,
 };
 
-- 
2.20.1
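
The validation rules in mktme_check_options() can be tried out in
userspace. The sketch below is an illustration under stated
assumptions, not the kernel code: it uses only standard C string
functions instead of match_token() and hex2bin(), all names are
hypothetical, and it requires exactly 32 hex digits for key= and
tweak=, which is stricter than the patch. It does keep the patch's
rules: every option may appear once, "user" needs algorithm, key and
tweak, "cpu" needs algorithm, and "no-encrypt" needs nothing else.

```c
#include <assert.h>
#include <string.h>

#define HEX "0123456789abcdefABCDEF"

enum opt { OPT_TYPE, OPT_KEY, OPT_TWEAK, OPT_ALGORITHM, OPT_COUNT };
enum key_type { TYPE_ERROR = -1, TYPE_USER, TYPE_CPU, TYPE_NO_ENCRYPT };

static const char *const opt_prefix[OPT_COUNT] = {
	"type=", "key=", "tweak=", "algorithm=",
};
static const char *const type_names[] = { "user", "cpu", "no-encrypt" };

/* Return 0 if the options seen are valid for the requested type, else -1. */
int check_options(unsigned int mask, enum key_type type)
{
	unsigned int alg = 1u << OPT_ALGORITHM;
	unsigned int keys = (1u << OPT_KEY) | (1u << OPT_TWEAK);

	switch (type) {
	case TYPE_USER:
		return ((mask & alg) && (mask & keys) == keys) ? 0 : -1;
	case TYPE_CPU:
		return (mask & alg) ? 0 : -1;
	case TYPE_NO_ENCRYPT:
		return 0;
	default:
		return -1;
	}
}

/* Parse a writable, NUL-terminated option string; 0 if valid, else -1. */
int parse_options(char *options)
{
	enum key_type type = TYPE_ERROR;
	unsigned int mask = 0;
	char *tok = options;

	assert(options);
	for (;;) {
		const char *val = NULL;
		char *end;
		size_t i;
		int opt;

		tok += strspn(tok, " \t");	/* skip separators */
		if (!*tok)
			break;
		end = tok + strcspn(tok, " \t");
		if (*end)
			*end++ = '\0';		/* terminate this token */

		for (opt = 0; opt < OPT_COUNT; opt++) {
			size_t n = strlen(opt_prefix[opt]);

			if (!strncmp(tok, opt_prefix[opt], n)) {
				val = tok + n;
				break;
			}
		}
		if (!val || (mask & (1u << opt)))
			return -1;		/* unknown or repeated option */
		mask |= 1u << opt;

		switch (opt) {
		case OPT_TYPE:
			for (i = 0; i < sizeof(type_names) / sizeof(type_names[0]); i++)
				if (!strcmp(val, type_names[i]))
					type = (enum key_type)i;
			if (type == TYPE_ERROR)
				return -1;
			break;
		case OPT_ALGORITHM:
			if (strcmp(val, "aes-xts-128"))
				return -1;
			break;
		default:	/* OPT_KEY, OPT_TWEAK: exactly 32 hex digits */
			if (strlen(val) != 32 || strspn(val, HEX) != 32)
				return -1;
			break;
		}
		tok = end;
	}
	return check_options(mask, type);
}
```

Recording each option in a bit mask keeps duplicate detection and the
per-type requirement checks separate, which is the same split the patch
makes between mktme_get_options() and mktme_check_options().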



* [PATCH, RFC 25/62] keys/mktme: Instantiate and destroy MKTME keys
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (23 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 24/62] keys/mktme: Preparse the MKTME key payload Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 26/62] keys/mktme: Move the MKTME payload into a cache aligned structure Kirill A. Shutemov
                   ` (38 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Instantiating and destroying are two Kernel Key Service methods
that are invoked by the kernel key service when a key is added
(add_key, request_key) or removed (invalidate, revoke, timeout).

During instantiation, MKTME needs to allocate an available hardware
KeyID and map it to the Userspace Key.

During destroy, MKTME will return the hardware KeyID to the pool of
available keys.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 92a047caa829..14bc4e600978 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -14,6 +14,8 @@
 
 #include "internal.h"
 
+static DEFINE_SPINLOCK(mktme_lock);
+
 /* 1:1 Mapping between Userspace Keys (struct key) and Hardware KeyIDs */
 struct mktme_mapping {
 	unsigned int	mapped_keyids;
@@ -95,6 +97,26 @@ struct mktme_payload {
 	u8		tweak_key[MKTME_AES_XTS_SIZE];
 };
 
+/* Key Service Method called when a Userspace Key is garbage collected. */
+static void mktme_destroy_key(struct key *key)
+{
+	mktme_release_keyid(mktme_keyid_from_key(key));
+}
+
+/* Key Service Method to create a new key. Payload is preparsed. */
+int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
+{
+	unsigned long flags;
+	int keyid;
+
+	spin_lock_irqsave(&mktme_lock, flags);
+	keyid = mktme_reserve_keyid(key);
+	spin_unlock_irqrestore(&mktme_lock, flags);
+	if (!keyid)
+		return -ENOKEY;
+	return 0;
+}
+
 /* Make sure arguments are correct for the TYPE of key requested */
 static int mktme_check_options(struct mktme_payload *payload,
 			       unsigned long token_mask, enum mktme_type type)
@@ -236,7 +258,9 @@ struct key_type key_type_mktme = {
 	.name		= "mktme",
 	.preparse	= mktme_preparse_payload,
 	.free_preparse	= mktme_free_preparsed_payload,
+	.instantiate	= mktme_instantiate_key,
 	.describe	= user_describe,
+	.destroy	= mktme_destroy_key,
 };
 
 static int __init init_mktme(void)
-- 
2.20.1



* [PATCH, RFC 26/62] keys/mktme: Move the MKTME payload into a cache aligned structure
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (24 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 25/62] keys/mktme: Instantiate and destroy MKTME keys Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-06-14 11:35   ` Peter Zijlstra
  2019-05-08 14:43 ` [PATCH, RFC 27/62] keys/mktme: Strengthen the entropy of CPU generated MKTME keys Kirill A. Shutemov
                   ` (37 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

In preparation for programming the key into the hardware, move
the key payload into a cache aligned structure. This alignment
is a requirement of the MKTME hardware.

Use the slab allocator to have this structure readily available.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 39 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 14bc4e600978..a7ca32865a1c 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -15,6 +15,7 @@
 #include "internal.h"
 
 static DEFINE_SPINLOCK(mktme_lock);
+struct kmem_cache *mktme_prog_cache;	/* Hardware programming cache */
 
 /* 1:1 Mapping between Userspace Keys (struct key) and Hardware KeyIDs */
 struct mktme_mapping {
@@ -97,6 +98,27 @@ struct mktme_payload {
 	u8		tweak_key[MKTME_AES_XTS_SIZE];
 };
 
+/* Copy the payload to the HW programming structure and program this KeyID */
+static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
+{
+	struct mktme_key_program *kprog = NULL;
+	int ret;
+
+	kprog = kmem_cache_zalloc(mktme_prog_cache, GFP_ATOMIC);
+	if (!kprog)
+		return -ENOMEM;
+
+	/* Hardware programming requires a cache aligned struct */
+	kprog->keyid = keyid;
+	kprog->keyid_ctrl = payload->keyid_ctrl;
+	memcpy(kprog->key_field_1, payload->data_key, MKTME_AES_XTS_SIZE);
+	memcpy(kprog->key_field_2, payload->tweak_key, MKTME_AES_XTS_SIZE);
+
+	ret = MKTME_PROG_SUCCESS;	/* Future programming call */
+	kmem_cache_free(mktme_prog_cache, kprog);
+	return ret;
+}
+
 /* Key Service Method called when a Userspace Key is garbage collected. */
 static void mktme_destroy_key(struct key *key)
 {
@@ -106,6 +128,7 @@ static void mktme_destroy_key(struct key *key)
 /* Key Service Method to create a new key. Payload is preparsed. */
 int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 {
+	struct mktme_payload *payload = prep->payload.data[0];
 	unsigned long flags;
 	int keyid;
 
@@ -114,7 +137,14 @@ int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 	spin_unlock_irqrestore(&mktme_lock, flags);
 	if (!keyid)
 		return -ENOKEY;
-	return 0;
+
+	if (!mktme_program_keyid(keyid, payload))
+		return MKTME_PROG_SUCCESS;
+
+	spin_lock_irqsave(&mktme_lock, flags);
+	mktme_release_keyid(keyid);
+	spin_unlock_irqrestore(&mktme_lock, flags);
+	return -ENOKEY;
 }
 
 /* Make sure arguments are correct for the TYPE of key requested */
@@ -275,10 +305,15 @@ static int __init init_mktme(void)
 	if (mktme_map_alloc())
 		return -ENOMEM;
 
+	/* Used to program the hardware key tables */
+	mktme_prog_cache = KMEM_CACHE(mktme_key_program, SLAB_PANIC);
+	if (!mktme_prog_cache)
+		goto free_map;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
-
+free_map:
 	kvfree(mktme_map);
 
 	return -ENOMEM;
-- 
2.20.1
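
To illustrate the alignment requirement described above, here is a
minimal userspace sketch that copies a payload into an aligned, zeroed
programming structure. It is not the kernel code: struct key_program
is a hypothetical, simplified layout (the real struct
mktme_key_program comes from asm/intel_pconfig.h, which is not part of
this message), the 64-byte CACHE_LINE value is an assumption, and C11
aligned_alloc() stands in for the slab cache.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KEY_SIZE	16	/* 128-bit data and tweak keys */
#define CACHE_LINE	64	/* assumed alignment requirement */

/* Hypothetical layout; alignas() on the first member aligns the struct. */
struct key_program {
	alignas(CACHE_LINE) uint16_t keyid;
	uint32_t keyid_ctrl;
	uint8_t key_field_1[KEY_SIZE];
	uint8_t key_field_2[KEY_SIZE];
};

/* Allocate an aligned, zeroed structure and fill it from the payload. */
struct key_program *key_program_new(uint16_t keyid, uint32_t keyid_ctrl,
				    const uint8_t *data_key,
				    const uint8_t *tweak_key)
{
	struct key_program *kprog;

	assert(data_key && tweak_key);

	/* sizeof(*kprog) is a multiple of the alignment, as C11 requires */
	kprog = aligned_alloc(alignof(struct key_program), sizeof(*kprog));
	if (!kprog)
		return NULL;

	memset(kprog, 0, sizeof(*kprog));
	kprog->keyid = keyid;
	kprog->keyid_ctrl = keyid_ctrl;
	memcpy(kprog->key_field_1, data_key, KEY_SIZE);
	memcpy(kprog->key_field_2, tweak_key, KEY_SIZE);
	return kprog;
}
```

The caller owns the returned structure and releases it with free()
once programming is done, mirroring the kmem_cache_zalloc() and
kmem_cache_free() pair in the patch.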



* [PATCH, RFC 27/62] keys/mktme: Strengthen the entropy of CPU generated MKTME keys
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (25 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 26/62] keys/mktme: Move the MKTME payload into a cache aligned structure Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 28/62] keys/mktme: Set up PCONFIG programming targets for " Kirill A. Shutemov
                   ` (36 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

If users request CPU generated keys, mix additional entropy bits
from the kernel into the key programming fields used by the
hardware. This additional entropy may compensate for weak user
supplied, or CPU generated, entropy.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index a7ca32865a1c..9fdf482ea3e6 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -7,6 +7,7 @@
 #include <linux/key-type.h>
 #include <linux/mm.h>
 #include <linux/parser.h>
+#include <linux/random.h>
 #include <linux/string.h>
 #include <asm/intel_pconfig.h>
 #include <keys/mktme-type.h>
@@ -102,7 +103,8 @@ struct mktme_payload {
 static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
 {
 	struct mktme_key_program *kprog = NULL;
-	int ret;
+	u8 kern_entropy[MKTME_AES_XTS_SIZE];
+	int ret, i;
 
 	kprog = kmem_cache_zalloc(mktme_prog_cache, GFP_ATOMIC);
 	if (!kprog)
@@ -114,6 +116,14 @@ static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
 	memcpy(kprog->key_field_1, payload->data_key, MKTME_AES_XTS_SIZE);
 	memcpy(kprog->key_field_2, payload->tweak_key, MKTME_AES_XTS_SIZE);
 
+	/* Strengthen the entropy fields for CPU generated keys */
+	if ((payload->keyid_ctrl & 0xff) == MKTME_KEYID_SET_KEY_RANDOM) {
+		get_random_bytes(&kern_entropy, sizeof(kern_entropy));
+		for (i = 0; i < (MKTME_AES_XTS_SIZE); i++) {
+			kprog->key_field_1[i] ^= kern_entropy[i];
+			kprog->key_field_2[i] ^= kern_entropy[i];
+		}
+	}
 	ret = MKTME_PROG_SUCCESS;	/* Future programming call */
 	kmem_cache_free(mktme_prog_cache, kprog);
 	return ret;
-- 
2.20.1
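
The mixing step in this patch is a byte-wise XOR of one kernel entropy
buffer into both key programming fields. A minimal userspace sketch of
that loop follows; mix_entropy() is a hypothetical name, and the
entropy buffer is supplied by the caller rather than by
get_random_bytes().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_SIZE 16	/* 128-bit key programming fields */

/*
 * XOR the same KEY_SIZE bytes of entropy into both fields, in place,
 * as the patch does for CPU generated keys.
 */
void mix_entropy(uint8_t *field_1, uint8_t *field_2, const uint8_t *entropy)
{
	size_t i;

	assert(field_1 && field_2 && entropy);
	for (i = 0; i < KEY_SIZE; i++) {
		field_1[i] ^= entropy[i];
		field_2[i] ^= entropy[i];
	}
}
```

Because XOR is its own inverse, applying the same entropy twice
restores the original fields, which makes the helper easy to check.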



* [PATCH, RFC 28/62] keys/mktme: Set up PCONFIG programming targets for MKTME keys
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (26 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 27/62] keys/mktme: Strengthen the entropy of CPU generated MKTME keys Kirill A. Shutemov
@ 2019-05-08 14:43 ` " Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 29/62] keys/mktme: Program MKTME keys into the platform hardware Kirill A. Shutemov
                   ` (35 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

MKTME Key service maintains the hardware key tables. These key tables
are package scoped per the MKTME hardware definition. This means that
each physical package on the system needs its key table programmed.

These physical packages are the targets of the new PCONFIG programming
command. So, introduce a PCONFIG targets bitmap as well as a CPU mask
that includes the lead CPUs capable of programming the targets.

The lead CPU mask will be used every time a new key is programmed into
the hardware.

Keep the PCONFIG targets bit map around for future use during hotplug
events.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 42 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 9fdf482ea3e6..b5b44decfd3e 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -2,6 +2,7 @@
 
 /* Documentation/x86/mktme_keys.rst */
 
+#include <linux/cpu.h>
 #include <linux/init.h>
 #include <linux/key.h>
 #include <linux/key-type.h>
@@ -17,6 +18,8 @@
 
 static DEFINE_SPINLOCK(mktme_lock);
 struct kmem_cache *mktme_prog_cache;	/* Hardware programming cache */
+unsigned long *mktme_target_map;	/* Pconfig programming targets */
+cpumask_var_t mktme_leadcpus;		/* One lead CPU per pconfig target */
 
 /* 1:1 Mapping between Userspace Keys (struct key) and Hardware KeyIDs */
 struct mktme_mapping {
@@ -303,6 +306,33 @@ struct key_type key_type_mktme = {
 	.destroy	= mktme_destroy_key,
 };
 
+static void mktme_update_pconfig_targets(void)
+{
+	int cpu, target_id;
+
+	cpumask_clear(mktme_leadcpus);
+	bitmap_clear(mktme_target_map, 0, topology_max_packages());
+
+	for_each_online_cpu(cpu) {
+		target_id = topology_physical_package_id(cpu);
+		if (!__test_and_set_bit(target_id, mktme_target_map))
+			__cpumask_set_cpu(cpu, mktme_leadcpus);
+	}
+}
+
+static int mktme_alloc_pconfig_targets(void)
+{
+	if (!alloc_cpumask_var(&mktme_leadcpus, GFP_KERNEL))
+		return -ENOMEM;
+
+	mktme_target_map = bitmap_alloc(topology_max_packages(), GFP_KERNEL);
+	if (!mktme_target_map) {
+		free_cpumask_var(mktme_leadcpus);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
 static int __init init_mktme(void)
 {
 	int ret;
@@ -320,9 +350,21 @@ static int __init init_mktme(void)
 	if (!mktme_prog_cache)
 		goto free_map;
 
+	/* Hardware programming targets */
+	if (mktme_alloc_pconfig_targets())
+		goto free_cache;
+
+	/* Initialize first programming targets */
+	mktme_update_pconfig_targets();
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
+
+	free_cpumask_var(mktme_leadcpus);
+	bitmap_free(mktme_target_map);
+free_cache:
+	kmem_cache_destroy(mktme_prog_cache);
 free_map:
 	kvfree(mktme_map);
 
-- 
2.20.1
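
The lead CPU selection above amounts to "first online CPU seen in each
package wins". A minimal userspace sketch of that rule, with
hypothetical names and a plain array standing in for the kernel's
topology and cpumask helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PACKAGES 8	/* hypothetical upper bound on package ids */

/*
 * cpu_package[cpu] is the package id of an online CPU, or -1 if the
 * CPU is offline. On return is_lead[cpu] is true for exactly one CPU
 * per package that has an online CPU. Returns the number of lead CPUs.
 */
size_t pick_lead_cpus(const int *cpu_package, size_t nr_cpus, bool *is_lead)
{
	bool seen[MAX_PACKAGES] = { false };
	size_t cpu, leads = 0;

	assert(cpu_package && is_lead);
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		int pkg = cpu_package[cpu];

		is_lead[cpu] = false;
		if (pkg < 0 || pkg >= MAX_PACKAGES)
			continue;	/* offline CPU or out-of-range id */
		if (!seen[pkg]) {
			seen[pkg] = true;	/* package now has a lead */
			is_lead[cpu] = true;
			leads++;
		}
	}
	return leads;
}
```

Recomputing both the seen[] set and the lead flags from scratch on
every call matches the clear-then-rebuild approach of
mktme_update_pconfig_targets().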



* [PATCH, RFC 29/62] keys/mktme: Program MKTME keys into the platform hardware
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (27 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 28/62] keys/mktme: Set up PCONFIG programming targets for " Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 30/62] keys/mktme: Set up a percpu_ref_count for MKTME keys Kirill A. Shutemov
                   ` (34 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Finally, the keys are programmed into the hardware via each
lead CPU. Every package has to be programmed successfully.
There is no partial success allowed here.

Here a retry scheme is included for two errors that may succeed
on retry: MKTME_DEVICE_BUSY and MKTME_ENTROPY_ERROR.
However, it's not clear if even those errors should be retried
at this level. Perhaps they, too, should be returned to user space
for handling.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 92 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 91 insertions(+), 1 deletion(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index b5b44decfd3e..f70533b1a7fd 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -102,6 +102,96 @@ struct mktme_payload {
 	u8		tweak_key[MKTME_AES_XTS_SIZE];
 };
 
+struct mktme_hw_program_info {
+	struct mktme_key_program *key_program;
+	int *status;
+};
+
+struct mktme_err_table {
+	const char *msg;
+	bool retry;
+};
+
+static const struct mktme_err_table mktme_error[] = {
+/* MKTME_PROG_SUCCESS     */ {"KeyID was successfully programmed",   false},
+/* MKTME_INVALID_PROG_CMD */ {"Invalid KeyID programming command",   false},
+/* MKTME_ENTROPY_ERROR    */ {"Insufficient entropy",		      true},
+/* MKTME_INVALID_KEYID    */ {"KeyID not valid",		     false},
+/* MKTME_INVALID_ENC_ALG  */ {"Invalid encryption algorithm chosen", false},
+/* MKTME_DEVICE_BUSY      */ {"Failure to access key table",	      true},
+};
+
+static int mktme_parse_program_status(int status[])
+{
+	int cpu, sum = 0;
+
+	/* Success: all CPU(s) programmed all key table(s) */
+	for_each_cpu(cpu, mktme_leadcpus)
+		sum += status[cpu];
+	if (!sum)
+		return MKTME_PROG_SUCCESS;
+
+	/* Invalid Parameters: log the error and return the error. */
+	for_each_cpu(cpu, mktme_leadcpus) {
+		switch (status[cpu]) {
+		case MKTME_INVALID_KEYID:
+		case MKTME_INVALID_PROG_CMD:
+		case MKTME_INVALID_ENC_ALG:
+			pr_err("mktme: %s\n", mktme_error[status[cpu]].msg);
+			return status[cpu];
+
+		default:
+			break;
+		}
+	}
+	/*
+	 * Device Busy or Insufficient Entropy: do not log the
+	 * error. These will be retried, and if the retries run out
+	 * (by time or count) the caller will log the error.
+	 */
+	for_each_cpu(cpu, mktme_leadcpus) {
+		if (status[cpu] == MKTME_DEVICE_BUSY)
+			return status[cpu];
+	}
+	return MKTME_ENTROPY_ERROR;
+}
+
+/* Program a single key using one CPU. */
+static void mktme_do_program(void *hw_program_info)
+{
+	struct mktme_hw_program_info *info = hw_program_info;
+	int cpu;
+
+	cpu = smp_processor_id();
+	info->status[cpu] = mktme_key_program(info->key_program);
+}
+
+static int mktme_program_all_keytables(struct mktme_key_program *key_program)
+{
+	struct mktme_hw_program_info info;
+	int err, retries = 10; /* Maybe users should handle retries */
+
+	info.key_program = key_program;
+	info.status = kcalloc(num_possible_cpus(), sizeof(info.status[0]),
+			      GFP_KERNEL);
+
+	while (retries--) {
+		get_online_cpus();
+		on_each_cpu_mask(mktme_leadcpus, mktme_do_program,
+				 &info, 1);
+		put_online_cpus();
+
+		err = mktme_parse_program_status(info.status);
+		if (!err)			   /* Success */
+			return err;
+		else if (!mktme_error[err].retry)  /* Error no retry */
+			return -ENOKEY;
+	}
+	/* Ran out of retries */
+	pr_err("mktme: %s\n", mktme_error[err].msg);
+	return err;
+}
+
 /* Copy the payload to the HW programming structure and program this KeyID */
 static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
 {
@@ -127,7 +217,7 @@ static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
 			kprog->key_field_2[i] ^= kern_entropy[i];
 		}
 	}
-	ret = MKTME_PROG_SUCCESS;	/* Future programming call */
+	ret = mktme_program_all_keytables(kprog);
 	kmem_cache_free(mktme_prog_cache, kprog);
 	return ret;
 }
-- 
2.20.1
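
The status handling in this patch has two parts: collapse the
per-package results into one status, then retry only while that status
is retryable. The following userspace sketch models both parts. All
names are hypothetical, the enum values simply follow the order of the
mktme_error[] table comments above, and busy_then_ok() is a demo
stand-in for a real programming attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum prog_status {		/* order assumed from mktme_error[] */
	PROG_SUCCESS,
	INVALID_PROG_CMD,
	ENTROPY_ERROR,
	INVALID_KEYID,
	INVALID_ENC_ALG,
	DEVICE_BUSY,
};

bool status_is_retryable(enum prog_status s)
{
	return s == ENTROPY_ERROR || s == DEVICE_BUSY;
}

/* Invalid-parameter errors win, then busy, then entropy, then success. */
enum prog_status summarize_status(const enum prog_status *status, size_t n)
{
	bool busy = false, failed = false;
	size_t i;

	assert(status);
	for (i = 0; i < n; i++) {
		switch (status[i]) {
		case PROG_SUCCESS:
			break;
		case DEVICE_BUSY:
			busy = true;
			failed = true;
			break;
		case ENTROPY_ERROR:
			failed = true;
			break;
		default:
			return status[i];	/* invalid parameter */
		}
	}
	if (!failed)
		return PROG_SUCCESS;
	return busy ? DEVICE_BUSY : ENTROPY_ERROR;
}

/* Call attempt() up to max_tries times; 0 on success, -1 otherwise. */
int program_with_retries(enum prog_status (*attempt)(void *ctx), void *ctx,
			 int max_tries)
{
	assert(attempt);
	while (max_tries-- > 0) {
		enum prog_status s = attempt(ctx);

		if (s == PROG_SUCCESS)
			return 0;
		if (!status_is_retryable(s))
			return -1;	/* retrying cannot help */
	}
	return -1;			/* ran out of retries */
}

/* Demo attempt: reports busy until the counter in ctx reaches zero. */
enum prog_status busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return DEVICE_BUSY;
	}
	return PROG_SUCCESS;
}
```

Unlike the patch, this sketch returns one negative value for every
failure, so callers never have to distinguish a positive hardware
status from an errno.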



* [PATCH, RFC 30/62] keys/mktme: Set up a percpu_ref_count for MKTME keys
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (28 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 29/62] keys/mktme: Program MKTME keys into the platform hardware Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 31/62] keys/mktme: Require CAP_SYS_RESOURCE capability " Kirill A. Shutemov
                   ` (33 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The MKTME key service needs to keep usage counts on the encryption
keys in order to know when it is safe to free a key for reuse.

percpu_ref_count applies well here because the key service will
take the initial reference and typically hold that reference while
the intermediary references are get/put. The intermediaries in this
case are the encrypted VMAs.

Align the percpu_ref_init and percpu_ref_kill with the key service
instantiate and destroy methods respectively.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 40 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index f70533b1a7fd..496b5c1b7461 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -8,6 +8,7 @@
 #include <linux/key-type.h>
 #include <linux/mm.h>
 #include <linux/parser.h>
+#include <linux/percpu-refcount.h>
 #include <linux/random.h>
 #include <linux/string.h>
 #include <asm/intel_pconfig.h>
@@ -80,6 +81,26 @@ int mktme_keyid_from_key(struct key *key)
 	return 0;
 }
 
+struct percpu_ref *encrypt_count;
+void mktme_percpu_ref_release(struct percpu_ref *ref)
+{
+	unsigned long flags;
+	int keyid;
+
+	for (keyid = 1; keyid <= mktme_nr_keyids; keyid++) {
+		if (&encrypt_count[keyid] == ref)
+			break;
+	}
+	if (&encrypt_count[keyid] != ref) {
+		pr_debug("%s: invalid ref counter\n", __func__);
+		return;
+	}
+	percpu_ref_exit(ref);
+	spin_lock_irqsave(&mktme_lock, flags);
+	mktme_release_keyid(keyid);
+	spin_unlock_irqrestore(&mktme_lock, flags);
+}
+
 enum mktme_opt_id {
 	OPT_ERROR,
 	OPT_TYPE,
@@ -225,7 +246,10 @@ static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
 /* Key Service Method called when a Userspace Key is garbage collected. */
 static void mktme_destroy_key(struct key *key)
 {
-	mktme_release_keyid(mktme_keyid_from_key(key));
+	int keyid = mktme_keyid_from_key(key);
+
+	mktme_map->key[keyid] = (void *)-1;
+	percpu_ref_kill(&encrypt_count[keyid]);
 }
 
 /* Key Service Method to create a new key. Payload is preparsed. */
@@ -241,9 +265,15 @@ int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 	if (!keyid)
 		return -ENOKEY;
 
+	if (percpu_ref_init(&encrypt_count[keyid], mktme_percpu_ref_release,
+			    0, GFP_KERNEL))
+		goto err_out;
+
 	if (!mktme_program_keyid(keyid, payload))
 		return MKTME_PROG_SUCCESS;
 
+	percpu_ref_exit(&encrypt_count[keyid]);
+err_out:
 	spin_lock_irqsave(&mktme_lock, flags);
 	mktme_release_keyid(keyid);
 	spin_unlock_irqrestore(&mktme_lock, flags);
@@ -447,10 +477,18 @@ static int __init init_mktme(void)
 	/* Initialize first programming targets */
 	mktme_update_pconfig_targets();
 
+	/* Reference counters to protect in use KeyIDs */
+	encrypt_count = kvcalloc(mktme_nr_keyids + 1, sizeof(encrypt_count[0]),
+				 GFP_KERNEL);
+	if (!encrypt_count)
+		goto free_targets;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
 
+	kvfree(encrypt_count);
+free_targets:
 	free_cpumask_var(mktme_leadcpus);
 	bitmap_free(mktme_target_map);
 free_cache:
-- 
2.20.1
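
The lifecycle described above (the key service holds the initial
reference, encrypted VMAs take and drop their own, and the KeyID is
freed only when the last one goes away) can be modelled with a plain
counter. This is a single-threaded sketch with hypothetical names; it
does not reproduce the per-CPU counters or the release callback of the
kernel's percpu_ref, and it signals "last reference dropped" through a
return value instead.

```c
#include <assert.h>
#include <stdbool.h>

struct keyid_ref {
	long count;	/* outstanding references */
	bool dying;	/* set once the key has been destroyed */
};

/* The key service takes the initial reference at instantiate time. */
void ref_init(struct keyid_ref *ref)
{
	assert(ref);
	ref->count = 1;
	ref->dying = false;
}

/* Take a reference for a new user; refused once the key is dying. */
bool ref_tryget(struct keyid_ref *ref)
{
	assert(ref);
	if (ref->dying)
		return false;
	ref->count++;
	return true;
}

/* Drop a reference; true means this was the last one. */
bool ref_put(struct keyid_ref *ref)
{
	assert(ref && ref->count > 0);
	return --ref->count == 0;
}

/* Destroy path: stop new users and drop the initial reference. */
bool ref_kill(struct keyid_ref *ref)
{
	assert(ref && !ref->dying);
	ref->dying = true;
	return ref_put(ref);
}
```

A true return from ref_put() or ref_kill() corresponds to the point
where the patch's release callback runs and the KeyID goes back to the
pool.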



* [PATCH, RFC 31/62] keys/mktme: Require CAP_SYS_RESOURCE capability for MKTME keys
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (29 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 30/62] keys/mktme: Set up a percpu_ref_count for MKTME keys Kirill A. Shutemov
@ 2019-05-08 14:43 ` " Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 32/62] keys/mktme: Store MKTME payloads if cmdline parameter allows Kirill A. Shutemov
                   ` (32 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The MKTME key type uses capabilities to restrict the allocation
of keys to privileged users. CAP_SYS_RESOURCE is required, but
the broader capability of CAP_SYS_ADMIN is accepted.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 496b5c1b7461..4b2d3dc1843a 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -2,6 +2,7 @@
 
 /* Documentation/x86/mktme_keys.rst */
 
+#include <linux/cred.h>
 #include <linux/cpu.h>
 #include <linux/init.h>
 #include <linux/key.h>
@@ -393,6 +394,9 @@ int mktme_preparse_payload(struct key_preparsed_payload *prep)
 	char *options;
 	int ret;
 
+	if (!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
 	if (datalen <= 0 || datalen > 1024 || !prep->data)
 		return -EINVAL;
 
-- 
2.20.1



* [PATCH, RFC 32/62] keys/mktme: Store MKTME payloads if cmdline parameter allows
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (30 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 31/62] keys/mktme: Require CAP_SYS_RESOURCE capability " Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 33/62] acpi: Remove __init from acpi table parsing functions Kirill A. Shutemov
                   ` (31 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

MKTME (Multi-Key Total Memory Encryption) key payloads may include
data encryption keys, tweak keys, and additional entropy bits. These
are used to program the MKTME encryption hardware. By default, the
kernel destroys this payload data once the hardware is programmed.

However, in order to fully support Memory Hotplug, saving the key data
becomes important. The MKTME Key Service cannot allow a new memory
controller to come online unless it can program the Key Table to match
the Key Tables of all existing memory controllers.

With CPU generated keys (a.k.a. random keys or ephemeral keys) the
saving of user key data is not an issue. The kernel and MKTME hardware
can generate strong encryption keys without recalling any user supplied
data.

With USER directed keys (a.k.a. user type) saving the key programming
data (data and tweak key) becomes an issue. The data and tweak keys
are required to program those keys on a new physical package.

In preparation for adding support for onlining new memory:

   Add an 'mktme_key_store' where key payloads are stored.

   Add 'mktme_storekeys' kernel command line parameter that, when
   present, allows the kernel to store user type key payloads.

   Add 'mktme_bitmap_user_type' to recall when USER type keys are in
   use. If no USER type keys are currently in use, new memory
   may be brought online, despite the absence of 'mktme_storekeys'.
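
As a rough illustration of the storage policy, here is a userspace
model. All model_* names, the key size and the KeyID count are
assumptions made for the sketch; only the policy itself (always keep
the control field, keep key material only for user type keys and only
when storing is allowed) is taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_KEY_SIZE	16	/* assumed, stands in for MKTME_AES_XTS_SIZE */
#define MODEL_NR_KEYIDS	8	/* assumed KeyID count for the model */
#define MODEL_TYPE_USER	0	/* stands in for MKTME_KEYID_SET_KEY_DIRECT */
#define MODEL_TYPE_CPU	1	/* stands in for a CPU generated key */

struct model_payload {
	uint32_t keyid_ctrl;
	uint8_t data_key[MODEL_KEY_SIZE];
	uint8_t tweak_key[MODEL_KEY_SIZE];
};

static struct model_payload model_store[MODEL_NR_KEYIDS + 1];
static bool model_user_type[MODEL_NR_KEYIDS + 1];

static void model_store_payload(int keyid, const struct model_payload *p,
				bool storekeys)
{
	bool user = (p->keyid_ctrl & 0xff) == MODEL_TYPE_USER;

	if (user)
		model_user_type[keyid] = true;	/* always remembered */

	/* The control field is always kept; it holds no key material. */
	model_store[keyid].keyid_ctrl = p->keyid_ctrl;

	/* Key material is kept only for user keys, and only if allowed. */
	if (storekeys && user) {
		memcpy(model_store[keyid].data_key, p->data_key,
		       MODEL_KEY_SIZE);
		memcpy(model_store[keyid].tweak_key, p->tweak_key,
		       MODEL_KEY_SIZE);
	}
}
```

Without 'storekeys', a user type key leaves only its control field
and its bit in the user type map behind, which is enough to refuse a
later hotplug but not enough to reprogram the key.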

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 .../admin-guide/kernel-parameters.rst         |  1 +
 .../admin-guide/kernel-parameters.txt         | 11 ++++
 security/keys/mktme_keys.c                    | 51 ++++++++++++++++++-
 3 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index b8d0bc07ed0a..1b62b86d0666 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -120,6 +120,7 @@ parameter is applicable::
 			Documentation/m68k/kernel-options.txt.
 	MDA	MDA console support is enabled.
 	MIPS	MIPS architecture is enabled.
+	MKTME	Multi-Key Total Memory Encryption is enabled.
 	MOUSE	Appropriate mouse support is enabled.
 	MSI	Message Signaled Interrupts (PCI).
 	MTD	MTD (Memory Technology Device) support is enabled.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2b8ee90bb644..38ea0ace9533 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2544,6 +2544,17 @@
 			in the "bleeding edge" mini2440 support kernel at
 			http://repo.or.cz/w/linux-2.6/mini2440.git
 
+	mktme_storekeys [X86, MKTME] When CONFIG_X86_INTEL_MKTME is set
+			this parameter allows the kernel to store the user
+			specified MKTME key payload. Storing this payload
+			means that the MKTME Key Service can always allow
+			the addition of new physical packages. If the
+			mktme_storekeys parameter is not present, user key
+			data will not be stored, and new physical packages
+			may only be added to the system if no user type
+			MKTME keys are programmed.
+			See Documentation/x86/mktme.rst
+
 	mminit_loglevel=
 			[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
 			parameter allows control of the logging verbosity for
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 4b2d3dc1843a..bcd68850048f 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -22,6 +22,9 @@ static DEFINE_SPINLOCK(mktme_lock);
 struct kmem_cache *mktme_prog_cache;	/* Hardware programming cache */
 unsigned long *mktme_target_map;	/* Pconfig programming targets */
 cpumask_var_t mktme_leadcpus;		/* One lead CPU per pconfig target */
+static bool mktme_storekeys;		/* True if key payloads may be stored */
+unsigned long *mktme_bitmap_user_type;	/* Shows presence of user type keys */
+struct mktme_payload *mktme_key_store;	/* Payload storage if allowed */
 
 /* 1:1 Mapping between Userspace Keys (struct key) and Hardware KeyIDs */
 struct mktme_mapping {
@@ -124,6 +127,27 @@ struct mktme_payload {
 	u8		tweak_key[MKTME_AES_XTS_SIZE];
 };
 
+void mktme_store_payload(int keyid, struct mktme_payload *payload)
+{
+	/* Always remember if this key is of type "user" */
+	if ((payload->keyid_ctrl & 0xff) == MKTME_KEYID_SET_KEY_DIRECT)
+		set_bit(keyid, mktme_bitmap_user_type);
+	/*
+	 * Always store the control fields to program newly
+	 * onlined packages with RANDOM or NO_ENCRYPT keys.
+	 */
+	mktme_key_store[keyid].keyid_ctrl = payload->keyid_ctrl;
+
+	/* Only store "user" type data and tweak keys if allowed */
+	if (mktme_storekeys &&
+	    ((payload->keyid_ctrl & 0xff) == MKTME_KEYID_SET_KEY_DIRECT)) {
+		memcpy(mktme_key_store[keyid].data_key, payload->data_key,
+		       MKTME_AES_XTS_SIZE);
+		memcpy(mktme_key_store[keyid].tweak_key, payload->tweak_key,
+		       MKTME_AES_XTS_SIZE);
+	}
+}
+
 struct mktme_hw_program_info {
 	struct mktme_key_program *key_program;
 	int *status;
@@ -270,9 +294,10 @@ int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 			    0, GFP_KERNEL))
 		goto err_out;
 
-	if (!mktme_program_keyid(keyid, payload))
+	if (!mktme_program_keyid(keyid, payload)) {
+		mktme_store_payload(keyid, payload);
 		return MKTME_PROG_SUCCESS;
-
+	}
 	percpu_ref_exit(&encrypt_count[keyid]);
 err_out:
 	spin_lock_irqsave(&mktme_lock, flags);
@@ -487,10 +512,25 @@ static int __init init_mktme(void)
 	if (!encrypt_count)
 		goto free_targets;
 
+	/* Detect presence of user type keys */
+	mktme_bitmap_user_type = bitmap_zalloc(mktme_nr_keyids, GFP_KERNEL);
+	if (!mktme_bitmap_user_type)
+		goto free_encrypt;
+
+	/* Store key payloads if allowable */
+	mktme_key_store = kzalloc(sizeof(mktme_key_store[0]) *
+				   (mktme_nr_keyids + 1), GFP_KERNEL);
+	if (!mktme_key_store)
+		goto free_bitmap;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
 
+	kfree(mktme_key_store);
+free_bitmap:
+	bitmap_free(mktme_bitmap_user_type);
+free_encrypt:
 	kvfree(encrypt_count);
 free_targets:
 	free_cpumask_var(mktme_leadcpus);
@@ -504,3 +544,10 @@ static int __init init_mktme(void)
 }
 
 late_initcall(init_mktme);
+
+static int mktme_enable_storekeys(char *__unused)
+{
+	mktme_storekeys = true;
+	return 1;
+}
+__setup("mktme_storekeys", mktme_enable_storekeys);
-- 
2.20.1


* [PATCH, RFC 33/62] acpi: Remove __init from acpi table parsing functions
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (31 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 32/62] keys/mktme: Store MKTME payloads if cmdline parameter allows Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 34/62] acpi/hmat: Determine existence of an ACPI HMAT Kirill A. Shutemov
                   ` (30 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

ACPI table parsing functions are useful outside of init time.

For example, the MKTME (Multi-Key Total Memory Encryption) key
service will evaluate the ACPI HMAT table when the first key
creation request occurs.  This will happen after init time.

Additionally, the table parsing functions can be used when
_HMA objects are evaluated at runtime. The _HMA object provides
a completely new HMAT, overriding the existing table. The table
parsing functions will come in handy for those events.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/acpi/tables.c | 10 +++++-----
 include/linux/acpi.h  |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 3d0da38f94c6..35646b0fa7eb 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -47,7 +47,7 @@ static char *mps_inti_flags_trigger[] = { "dfl", "edge", "res", "level" };
 
 static struct acpi_table_desc initial_tables[ACPI_MAX_TABLES] __initdata;
 
-static int acpi_apic_instance __initdata;
+static int acpi_apic_instance;
 
 enum acpi_subtable_type {
 	ACPI_SUBTABLE_COMMON,
@@ -63,7 +63,7 @@ struct acpi_subtable_entry {
  * Disable table checksum verification for the early stage due to the size
  * limitation of the current x86 early mapping implementation.
  */
-static bool acpi_verify_table_checksum __initdata = false;
+static bool acpi_verify_table_checksum = false;
 
 void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
 {
@@ -294,7 +294,7 @@ acpi_get_subtable_type(char *id)
  * On success returns sum of all matching entries for all proc handlers.
  * Otherwise, -ENODEV or -EINVAL is returned.
  */
-static int __init
+static int
 acpi_parse_entries_array(char *id, unsigned long table_size,
 		struct acpi_table_header *table_header,
 		struct acpi_subtable_proc *proc, int proc_num,
@@ -370,7 +370,7 @@ acpi_parse_entries_array(char *id, unsigned long table_size,
 	return errs ? -EINVAL : count;
 }
 
-int __init
+int
 acpi_table_parse_entries_array(char *id,
 			 unsigned long table_size,
 			 struct acpi_subtable_proc *proc, int proc_num,
@@ -402,7 +402,7 @@ acpi_table_parse_entries_array(char *id,
 	return count;
 }
 
-int __init
+int
 acpi_table_parse_entries(char *id,
 			unsigned long table_size,
 			int entry_id,
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 7c7515b0767e..75078fc9b6b3 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -240,11 +240,11 @@ int acpi_numa_init (void);
 
 int acpi_table_init (void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
-int __init acpi_table_parse_entries(char *id, unsigned long table_size,
+int acpi_table_parse_entries(char *id, unsigned long table_size,
 			      int entry_id,
 			      acpi_tbl_entry_handler handler,
 			      unsigned int max_entries);
-int __init acpi_table_parse_entries_array(char *id, unsigned long table_size,
+int acpi_table_parse_entries_array(char *id, unsigned long table_size,
 			      struct acpi_subtable_proc *proc, int proc_num,
 			      unsigned int max_entries);
 int acpi_table_parse_madt(enum acpi_madt_type id,
-- 
2.20.1


* [PATCH, RFC 34/62] acpi/hmat: Determine existence of an ACPI HMAT
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (32 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 33/62] acpi: Remove __init from acpi table parsing functions Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 35/62] keys/mktme: Require ACPI HMAT to register the MKTME Key Service Kirill A. Shutemov
                   ` (29 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Platforms that need to confirm the presence of an HMAT table
can use this function, which simply reports the HMAT's existence.

This is added in support of Multi-Key Total Memory Encryption
(MKTME), a feature on future Intel platforms. These platforms will
need to confirm that an HMAT is present at init time.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/acpi/hmat/hmat.c | 13 +++++++++++++
 include/linux/acpi.h     |  4 ++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c
index 96b7d39a97c6..38e3341f569f 100644
--- a/drivers/acpi/hmat/hmat.c
+++ b/drivers/acpi/hmat/hmat.c
@@ -664,3 +664,16 @@ static __init int hmat_init(void)
 	return 0;
 }
 subsys_initcall(hmat_init);
+
+bool acpi_hmat_present(void)
+{
+	struct acpi_table_header *tbl;
+	acpi_status status;
+
+	status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	acpi_put_table(tbl);
+	return true;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 75078fc9b6b3..fe3ad4ca5bb3 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1339,4 +1339,8 @@ acpi_platform_notify(struct device *dev, enum kobject_action action)
 }
 #endif
 
+#ifdef CONFIG_X86_INTEL_MKTME
+extern bool acpi_hmat_present(void);
+#endif /* CONFIG_X86_INTEL_MKTME */
+
 #endif	/*_LINUX_ACPI_H*/
-- 
2.20.1


* [PATCH, RFC 35/62] keys/mktme: Require ACPI HMAT to register the MKTME Key Service
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (33 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 34/62] acpi/hmat: Determine existence of an ACPI HMAT Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 36/62] acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME Kirill A. Shutemov
                   ` (28 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The ACPI HMAT will be used by the MKTME key service to identify
topologies that support the safe programming of encryption keys.
Those decisions will happen at key creation time and during
hotplug events.

To enable this, we at least need to have the ACPI HMAT present
at init time. If it's not present, do not register the type.

If the HMAT is not present, failure looks like this:
[ ] MKTME: Registration failed. ACPI HMAT not present.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index bcd68850048f..f5fc6cccc81b 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -2,6 +2,7 @@
 
 /* Documentation/x86/mktme_keys.rst */
 
+#include <linux/acpi.h>
 #include <linux/cred.h>
 #include <linux/cpu.h>
 #include <linux/init.h>
@@ -490,6 +491,12 @@ static int __init init_mktme(void)
 	if (mktme_nr_keyids < 1)
 		return 0;
 
+	/* Require an ACPI HMAT to identify MKTME safe topologies */
+	if (!acpi_hmat_present()) {
+		pr_warn("MKTME: Registration failed. ACPI HMAT not present.\n");
+		return -EINVAL;
+	}
+
 	/* Mapping of Userspace Keys to Hardware KeyIDs */
 	if (mktme_map_alloc())
 		return -ENOMEM;
-- 
2.20.1


* [PATCH, RFC 36/62] acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (34 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 35/62] keys/mktme: Require ACPI HMAT to register the MKTME Key Service Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 37/62] keys/mktme: Do not allow key creation in unsafe topologies Kirill A. Shutemov
                   ` (27 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

MKTME, Multi-Key Total Memory Encryption, is a feature on Intel
platforms. The ACPI HMAT table can be used to verify that the
platform topology is safe for the usage of MKTME.

The kernel must be capable of programming every memory controller
on the platform. This means that there must be an online CPU in
the same proximity domain as each memory controller.
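
The requirement can be modelled in userspace roughly as below. The
function name and the use of 64-bit masks are assumptions for the
sketch; each array entry represents the CPUs of the processor
proximity domain attached to one memory controller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: each entry is the CPU mask of the processor
 * proximity domain attached to one memory controller. A zero mask
 * models a memory-only domain.
 */
static bool model_topology_is_safe(const uint64_t *domain_cpus, size_t nr,
				   uint64_t online_cpus)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		/* No online CPU can program this controller's key table. */
		if (!(domain_cpus[i] & online_cpus))
			return false;
	}
	return true;
}
```

A single domain without an online CPU is enough to make the whole
topology unsafe, because its key table could not be kept in sync.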

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/acpi/hmat/hmat.c | 54 ++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h     |  1 +
 2 files changed, 55 insertions(+)

diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c
index 38e3341f569f..936a403c0694 100644
--- a/drivers/acpi/hmat/hmat.c
+++ b/drivers/acpi/hmat/hmat.c
@@ -677,3 +677,57 @@ bool acpi_hmat_present(void)
 	acpi_put_table(tbl);
 	return true;
 }
+
+static int mktme_parse_proximity_domains(union acpi_subtable_headers *header,
+					 const unsigned long end)
+{
+	struct acpi_hmat_proximity_domain *mar = (void *)header;
+	struct acpi_hmat_structure *hdr = (void *)header;
+
+	const struct cpumask *tmp_mask;
+
+	if (!hdr || hdr->type != ACPI_HMAT_TYPE_PROXIMITY)
+		return -EINVAL;
+
+	if (mar->header.length != sizeof(*mar)) {
+		pr_warn("MKTME: invalid header length in HMAT\n");
+		return -1;
+	}
+	/*
+	 * Require a valid processor proximity domain.
+	 * This will catch memory only physical packages with
+	 * no processor capable of programming the key table.
+	 */
+	if (!(mar->flags & ACPI_HMAT_PROCESSOR_PD_VALID)) {
+		pr_warn("MKTME: no valid processor proximity domain\n");
+		return -1;
+	}
+	/* Require an online CPU in the processor proximity domain. */
+	tmp_mask = cpumask_of_node(pxm_to_node(mar->processor_PD));
+	if (!cpumask_intersects(tmp_mask, cpu_online_mask)) {
+		pr_warn("MKTME: no online CPU in proximity domain\n");
+		return -1;
+	}
+	return 0;
+}
+
+/* Returns true if topology is safe for MKTME key creation */
+bool mktme_hmat_evaluate(void)
+{
+	struct acpi_table_header *tbl;
+	bool ret = true;
+	acpi_status status;
+
+	status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	if (acpi_table_parse_entries(ACPI_SIG_HMAT,
+				     sizeof(struct acpi_table_hmat),
+				     ACPI_HMAT_TYPE_PROXIMITY,
+				     mktme_parse_proximity_domains, 0) < 0) {
+		ret = false;
+	}
+	acpi_put_table(tbl);
+	return ret;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index fe3ad4ca5bb3..82b270dfb785 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1341,6 +1341,7 @@ acpi_platform_notify(struct device *dev, enum kobject_action action)
 
 #ifdef CONFIG_X86_INTEL_MKTME
 extern bool acpi_hmat_present(void);
+extern bool mktme_hmat_evaluate(void);
 #endif /* CONFIG_X86_INTEL_MKTME */
 
 #endif	/*_LINUX_ACPI_H*/
-- 
2.20.1


* [PATCH, RFC 37/62] keys/mktme: Do not allow key creation in unsafe topologies
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (35 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 36/62] acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 38/62] keys/mktme: Support CPU hotplug for MKTME key service Kirill A. Shutemov
                   ` (26 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The MKTME feature depends on at least one online CPU capable of
programming each memory controller in the platform.

An unsafe topology for MKTME is a memory only package or a package
with no online CPUs. Key creation with unsafe topologies will fail
with EINVAL and a warning will be logged one time.
For example:
	[ ] MKTME: no online CPU in proximity domain
	[ ] MKTME: topology does not support key creation

These are recoverable errors. CPUs may be brought online that are
capable of programming a previously unprogrammable memory controller,
or an unprogrammable memory controller may be removed from the
platform.
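
The "evaluate once, then remember" behaviour can be sketched in
userspace like this. The model_* names and the evaluation counter are
assumptions added for illustration; 'evaluate' stands in for
mktme_hmat_evaluate().

```c
#include <errno.h>
#include <stdbool.h>

static bool model_allow_keys;	/* mirrors mktme_allow_keys */
static int model_evaluations;	/* counts topology checks, for illustration */

/* Returns 0 if a key may be created, -EINVAL if the topology is unsafe. */
static int model_check_topology(bool (*evaluate)(void))
{
	if (model_allow_keys)
		return 0;		/* cached: topology already verified */

	model_evaluations++;
	if (!evaluate())
		return -EINVAL;		/* recoverable: retried on next request */

	model_allow_keys = true;
	return 0;
}

/* Two trivial evaluators used to exercise the model. */
static bool model_unsafe(void) { return false; }
static bool model_safe(void) { return true; }
```

A failed evaluation is not cached, so the next key creation request
checks the topology again. That is what makes the error recoverable
once a CPU is onlined.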

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 39 ++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index f5fc6cccc81b..734e1d28eb24 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -26,6 +26,7 @@ cpumask_var_t mktme_leadcpus;		/* One lead CPU per pconfig target */
 static bool mktme_storekeys;		/* True if key payloads may be stored */
 unsigned long *mktme_bitmap_user_type;	/* Shows presence of user type keys */
 struct mktme_payload *mktme_key_store;	/* Payload storage if allowed */
+bool mktme_allow_keys;			/* True when topology supports keys */
 
 /* 1:1 Mapping between Userspace Keys (struct key) and Hardware KeyIDs */
 struct mktme_mapping {
@@ -278,33 +279,55 @@ static void mktme_destroy_key(struct key *key)
 	percpu_ref_kill(&encrypt_count[keyid]);
 }
 
+static void mktme_update_pconfig_targets(void);
 /* Key Service Method to create a new key. Payload is preparsed. */
 int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 {
 	struct mktme_payload *payload = prep->payload.data[0];
 	unsigned long flags;
+	int ret = -ENOKEY;
 	int keyid;
 
 	spin_lock_irqsave(&mktme_lock, flags);
+
+	/* Topology supports key creation */
+	if (mktme_allow_keys)
+		goto get_key;
+
+	/* Topology unknown, check it. */
+	if (!mktme_hmat_evaluate()) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* Keys are now allowed. Update the programming targets. */
+	mktme_update_pconfig_targets();
+	mktme_allow_keys = true;
+
+get_key:
 	keyid = mktme_reserve_keyid(key);
 	spin_unlock_irqrestore(&mktme_lock, flags);
 	if (!keyid)
-		return -ENOKEY;
+		goto out;
 
 	if (percpu_ref_init(&encrypt_count[keyid], mktme_percpu_ref_release,
 			    0, GFP_KERNEL))
-		goto err_out;
+		goto out_free_key;
 
-	if (!mktme_program_keyid(keyid, payload)) {
-		mktme_store_payload(keyid, payload);
-		return MKTME_PROG_SUCCESS;
-	}
+	ret = mktme_program_keyid(keyid, payload);
+	if (ret == MKTME_PROG_SUCCESS)
+		goto out;
+
+	/* Key programming failed */
 	percpu_ref_exit(&encrypt_count[keyid]);
-err_out:
+
+out_free_key:
 	spin_lock_irqsave(&mktme_lock, flags);
 	mktme_release_keyid(keyid);
+out_unlock:
 	spin_unlock_irqrestore(&mktme_lock, flags);
-	return -ENOKEY;
+out:
+	return ret;
 }
 
 /* Make sure arguments are correct for the TYPE of key requested */
-- 
2.20.1


* [PATCH, RFC 38/62] keys/mktme: Support CPU hotplug for MKTME key service
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (36 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 37/62] keys/mktme: Do not allow key creation in unsafe topologies Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:43 ` [PATCH, RFC 39/62] keys/mktme: Find new PCONFIG targets during memory hotplug Kirill A. Shutemov
                   ` (25 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The MKTME encryption hardware resides on each physical package.
The encryption hardware includes 'Key Tables' that must be
programmed identically across all physical packages in the
platform. Although every CPU in a package can program its key
table, the kernel uses one lead CPU per package for programming.

CPU Hotplug Teardown
--------------------
MKTME manages CPU hotplug teardown to make sure the ability to
program all packages is preserved when MKTME keys are present.

When MKTME keys are not currently programmed, simply allow
the teardown, and set "mktme_allow_keys" to false. This will
force a re-evaluation of the platform topology before the next
key creation. If this CPU teardown mattered, the MKTME key service
will report an error and fail to create the key. (The user can
online that CPU and try again.)

When MKTME keys are currently programmed, allow teardowns
of non-lead CPUs, and of lead CPUs where another core sibling
CPU can take over as lead. Do not allow teardown of any
lead CPU that would render a hardware key table unreachable!

CPU Hotplug Startup
-------------------
CPUs coming online are of interest to the key service, but since
the service never needs to block a CPU startup event, nor does it
need to prepare for an onlining CPU, a callback is not implemented.

MKTME will catch the availability of the new CPU, if it is
needed, at the next key creation time. If keys are not allowed,
that new CPU will be part of the topology evaluation to determine
if keys should now be allowed.
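
The teardown rules above can be sketched as a small userspace model.
The model_* names and the 64-bit CPU masks are assumptions for the
sketch; 'siblings' plays the role of topology_core_cpumask(cpu).

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NO_CPU	(-1)

/* Lowest set bit in mask other than 'cpu', or MODEL_NO_CPU. */
static int model_any_but(uint64_t mask, int cpu)
{
	int i;

	for (i = 0; i < 64; i++)
		if (i != cpu && (mask & (UINT64_C(1) << i)))
			return i;
	return MODEL_NO_CPU;
}

/*
 * Returns 0 to allow the teardown of 'cpu', -1 to refuse it.
 * 'leadcpus' and 'allow_keys' model mktme_leadcpus and mktme_allow_keys;
 * 'siblings' is the online CPU mask of the package that 'cpu' belongs to.
 */
static int model_cpu_teardown(int cpu, bool keys_in_use, uint64_t siblings,
			      uint64_t *leadcpus, bool *allow_keys)
{
	int new_lead;

	if (!keys_in_use) {
		*allow_keys = false;	/* force a topology re-evaluation */
		return 0;
	}
	if (!(*leadcpus & (UINT64_C(1) << cpu)))
		return 0;		/* not a lead CPU */

	new_lead = model_any_but(siblings, cpu);
	if (new_lead == MODEL_NO_CPU)
		return -1;		/* key table would become unreachable */

	*leadcpus &= ~(UINT64_C(1) << cpu);
	*leadcpus |= UINT64_C(1) << new_lead;
	return 0;
}
```

Only the last lead-capable CPU of a package is ever pinned, and only
while keys are programmed.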

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 51 +++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 734e1d28eb24..3dfc0647f1e5 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -102,9 +102,9 @@ void mktme_percpu_ref_release(struct percpu_ref *ref)
 		return;
 	}
 	percpu_ref_exit(ref);
-	spin_lock_irqsave(&mktme_map_lock, flags);
+	spin_lock_irqsave(&mktme_lock, flags);
 	mktme_release_keyid(keyid);
-	spin_unlock_irqrestore(&mktme_map_lock, flags);
+	spin_unlock_irqrestore(&mktme_lock, flags);
 }
 
 enum mktme_opt_id {
@@ -506,9 +506,46 @@ static int mktme_alloc_pconfig_targets(void)
 	return 0;
 }
 
+static int mktme_cpu_teardown(unsigned int cpu)
+{
+	int new_leadcpu, ret = 0;
+	unsigned long flags;
+
+	/* Do not allow key programming during cpu hotplug event */
+	spin_lock_irqsave(&mktme_lock, flags);
+
+	/*
+	 * When no keys are in use, allow the teardown, and set
+	 * mktme_allow_keys to FALSE. That forces an evaluation
+	 * of the topology before the next key creation.
+	 */
+	if (!mktme_map->mapped_keyids) {
+		mktme_allow_keys = false;
+		goto out;
+	}
+	/* Teardown CPU is not a lead CPU. Allow teardown. */
+	if (!cpumask_test_cpu(cpu, mktme_leadcpus))
+		goto out;
+
+	/* Teardown CPU is a lead CPU. Look for a new lead CPU. */
+	new_leadcpu = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+	if (new_leadcpu < nr_cpumask_bits) {
+		/* New lead CPU found. Update the programming mask */
+		__cpumask_clear_cpu(cpu, mktme_leadcpus);
+		__cpumask_set_cpu(new_leadcpu, mktme_leadcpus);
+	} else {
+		/* New lead CPU not found. Do not allow CPU teardown */
+		ret = -1;
+	}
+out:
+	spin_unlock_irqrestore(&mktme_lock, flags);
+	return ret;
+}
+
 static int __init init_mktme(void)
 {
-	int ret;
+	int ret, cpuhp;
 
 	/* Verify keys are present */
 	if (mktme_nr_keyids < 1)
@@ -553,10 +590,18 @@ static int __init init_mktme(void)
 	if (!mktme_key_store)
 		goto free_bitmap;
 
+	cpuhp = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+					  "keys/mktme_keys:online",
+					  NULL, mktme_cpu_teardown);
+	if (cpuhp < 0)
+		goto free_store;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
 
+	cpuhp_remove_state_nocalls(cpuhp);
+free_store:
 	kfree(mktme_key_store);
 free_bitmap:
 	bitmap_free(mktme_bitmap_user_type);
-- 
2.20.1


* [PATCH, RFC 39/62] keys/mktme: Find new PCONFIG targets during memory hotplug
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (37 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 38/62] keys/mktme: Support CPU hotplug for MKTME key service Kirill A. Shutemov
@ 2019-05-08 14:43 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 40/62] keys/mktme: Program new PCONFIG targets with MKTME keys Kirill A. Shutemov
                   ` (24 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:43 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Introduce a helper function that detects a newly added PCONFIG
target. This will be used in the MKTME memory hotplug notifier
to determine if a new PCONFIG target has been added that needs
to have its Key Table programmed.
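
The detection step amounts to a bitmap difference. A minimal
userspace sketch, assuming one bit per PCONFIG target and a
hypothetical function name:

```c
#include <stdint.h>

/*
 * Hypothetical model with one bit per PCONFIG target (package).
 * Returns the index of the single target present in 'now' but not
 * in 'prev', or -1 if nothing changed or more than one target appeared.
 */
static int model_new_pconfig_target(uint64_t prev, uint64_t now)
{
	uint64_t added = now & ~prev;	/* targets that just appeared */
	int i, target = -1;

	for (i = 0; i < 64; i++) {
		if (!(added & (UINT64_C(1) << i)))
			continue;
		if (target >= 0)
			return -1;	/* more than one new target */
		target = i;
	}
	return target;
}
```

The difference is taken as "now and not prev", so a target that
disappeared is not reported as new.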

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 39 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 3dfc0647f1e5..2c975c48fe44 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -543,6 +543,45 @@ static int mktme_cpu_teardown(unsigned int cpu)
 	return ret;
 }
 
+static int mktme_get_new_pconfig_target(void)
+{
+	unsigned long *prev_map, *tmp_map;
+	int new_target;		/* New PCONFIG target to program */
+
+	/* Save the current mktme_target_map bitmap */
+	prev_map = bitmap_alloc(topology_max_packages(), GFP_KERNEL);
+	bitmap_copy(prev_map, mktme_target_map, topology_max_packages());
+
+	/* Update the global targets - includes mktme_target_map */
+	mktme_update_pconfig_targets();
+
+	/* Nothing to do if the target bitmap is unchanged */
+	if (bitmap_equal(prev_map, mktme_target_map, topology_max_packages())) {
+		new_target = -1;
+		goto free_prev;
+	}
+
+	/* Find the change in the target bitmap */
+	tmp_map = bitmap_alloc(topology_max_packages(), GFP_KERNEL);
+	bitmap_andnot(tmp_map, mktme_target_map, prev_map,
+		      topology_max_packages());
+
+	/* There should only be one new target */
+	if (bitmap_weight(tmp_map, topology_max_packages()) != 1) {
+		pr_err("%s: expected %d new target, got %d\n", __func__, 1,
+		       bitmap_weight(tmp_map, topology_max_packages()));
+		new_target = -1;
+		goto free_tmp;
+	}
+	new_target = find_first_bit(tmp_map, topology_max_packages());
+
+free_tmp:
+	bitmap_free(tmp_map);
+free_prev:
+	bitmap_free(prev_map);
+	return new_target;
+}
+
 static int __init init_mktme(void)
 {
 	int ret, cpuhp;
-- 
2.20.1



* [PATCH, RFC 40/62] keys/mktme: Program new PCONFIG targets with MKTME keys
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (38 preceding siblings ...)
  2019-05-08 14:43 ` [PATCH, RFC 39/62] keys/mktme: Find new PCONFIG targets during memory hotplug Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 41/62] keys/mktme: Support memory hotplug for " Kirill A. Shutemov
                   ` (23 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

When a new PCONFIG target is added to an MKTME platform, its
key table needs to be programmed to match the key tables across
the entire platform. This type of newly added PCONFIG target
may appear during a memory hotplug event.

This key programming path differs from the normal key programming
path in that it programs only a single PCONFIG target, and only
when that programming is allowed.

Programming is allowed when either user type keys are stored, or
no user type keys are currently programmed.

So, after checking that programming is allowed, this helper
function programs the one new PCONFIG target with all the
currently programmed keys.

This will be used in MKTME's memory notifier callback supporting
MEM_GOING_ONLINE events.
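
The "allowed" rule above reduces to a small predicate. The sketch below
is a userspace model under assumed names: keys_stored stands in for the
mktme_storekeys flag and nr_user_keys for the weight of the user type
key bitmap. may_program_new_target() itself is hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the permission check: programming a new target
 * is refused only when user type keys exist and were not stored.
 */
static bool may_program_new_target(bool keys_stored, unsigned int nr_user_keys)
{
	return keys_stored || nr_user_keys == 0;
}
```

For example, may_program_new_target(false, 1) is false, which
corresponds to the -EPERM return in the helper.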

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 44 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 2c975c48fe44..489dddb8c623 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -582,6 +582,50 @@ static int mktme_get_new_pconfig_target(void)
 	return new_target;
 }
 
+static int mktme_program_new_pconfig_target(int new_pkg)
+{
+	struct mktme_payload *payload;
+	int cpu, keyid, ret = 0;
+
+	/*
+	 * Only program new target when user type keys are stored or,
+	 * no user type keys are currently programmed.
+	 */
+	if (!mktme_storekeys &&
+	    (bitmap_weight(mktme_bitmap_user_type, mktme_nr_keyids)))
+		return -EPERM;
+
+	/* Set mktme_leadcpus to only include new target */
+	cpumask_clear(mktme_leadcpus);
+	for_each_online_cpu(cpu) {
+		if (topology_physical_package_id(cpu) == new_pkg) {
+			__cpumask_set_cpu(cpu, mktme_leadcpus);
+			break;
+		}
+	}
+	/* Program the stored keys into the new key table */
+	for (keyid = 1; keyid <= mktme_nr_keyids; keyid++) {
+		/*
+		 * When a KeyID slot is not in use, the corresponding key
+		 * pointer is 0. '-1' is an intermediate state where the
+		 * key is on its way out, but not gone yet. Program '-1's.
+		 */
+		if (mktme_map->key[keyid] == 0)
+			continue;
+
+		payload = &mktme_key_store[keyid];
+		ret = mktme_program_keyid(keyid, payload);
+		if (ret != MKTME_PROG_SUCCESS) {
+			/* Quit on first failure to program key table */
+			pr_debug("mktme: %s\n", mktme_error[ret].msg);
+			ret = -ENOKEY;
+			break;
+		}
+	}
+	mktme_update_pconfig_targets();		/* Restore mktme_leadcpus */
+	return ret;
+}
+
 static int __init init_mktme(void)
 {
 	int ret, cpuhp;
-- 
2.20.1



* [PATCH, RFC 41/62] keys/mktme: Support memory hotplug for MKTME keys
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (39 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 40/62] keys/mktme: Program new PCONFIG targets with MKTME keys Kirill A. Shutemov
@ 2019-05-08 14:44 ` " Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 42/62] mm: Generalize the mprotect implementation to support extensions Kirill A. Shutemov
                   ` (22 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Newly added memory may mean that there is a newly added physical
package.  Intel platforms supporting MKTME need to know about the
new physical packages that may appear during MEM_GOING_ONLINE
events.

Add a memory notifier for MEM_GOING_ONLINE events where MKTME
can evaluate this new memory before it goes online.

MKTME returns NOTIFY_OK immediately for MEM_GOING_ONLINE events if no
MKTME keys are currently programmed. If the newly added memory presents
an unsafe MKTME topology, that will be found and reported during the
next key creation attempt. (The user can repair and retry.)

When MKTME keys are currently programmed, MKTME will evaluate the
platform topology, detect if a new PCONFIG target has been added,
and program that new pconfig target if allowable.
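
The decision flow described above can be modeled in a few lines. This is
only a sketch: the enum and struct names are hypothetical stand-ins for
the notifier return codes and for state the real callback reads from
globals, and locking is left out entirely.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for NOTIFY_OK and NOTIFY_BAD. */
enum sketch_verdict { SKETCH_OK, SKETCH_BAD };

struct sketch_event {
	bool going_online;	/* action == MEM_GOING_ONLINE */
	bool keys_programmed;	/* any KeyID currently mapped */
	bool topology_safe;	/* result of the topology evaluation */
	int  new_target;	/* new package id, or -1 for none */
	bool program_ok;	/* programming the new target succeeded */
};

static enum sketch_verdict sketch_memory_event(const struct sketch_event *ev)
{
	if (!ev->going_online)
		return SKETCH_OK;	/* not an event of interest */
	if (!ev->keys_programmed)
		return SKETCH_OK;	/* checked at next key creation */
	if (!ev->topology_safe)
		return SKETCH_BAD;	/* veto the hotplug event */
	if (ev->new_target < 0)
		return SKETCH_OK;	/* nothing new to program */
	return ev->program_ok ? SKETCH_OK : SKETCH_BAD;
}
```

The order of the checks matters: the topology evaluation and target
programming are reached only when keys are already programmed.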

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 57 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 489dddb8c623..904748b540c6 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -8,6 +8,7 @@
 #include <linux/init.h>
 #include <linux/key.h>
 #include <linux/key-type.h>
+#include <linux/memory.h>
 #include <linux/mm.h>
 #include <linux/parser.h>
 #include <linux/percpu-refcount.h>
@@ -626,6 +627,56 @@ static int mktme_program_new_pconfig_target(int new_pkg)
 	return ret;
 }
 
+static int mktme_memory_callback(struct notifier_block *nb,
+				 unsigned long action, void *arg)
+{
+	unsigned long flags;
+	int ret, new_target;
+
+	/* MEM_GOING_ONLINE is the only mem event of interest to MKTME */
+	if (action != MEM_GOING_ONLINE)
+		return NOTIFY_OK;
+
+	/* Do not allow key programming during hotplug event */
+	spin_lock_irqsave(&mktme_lock, flags);
+
+	/*
+	 * If no keys are actually programmed let this event proceed.
+	 * The topology will be checked on the next key creation attempt.
+	 */
+	if (!mktme_map->mapped_keyids) {
+		mktme_allow_keys = false;
+		ret = NOTIFY_OK;
+		goto out;
+	}
+	/* Do not allow this event if it creates an unsafe MKTME topology */
+	if (!mktme_hmat_evaluate()) {
+		ret = NOTIFY_BAD;
+		goto out;
+	}
+	/* Topology is safe. Is there a new pconfig target? */
+	new_target = mktme_get_new_pconfig_target();
+
+	/* No new target to program */
+	if (new_target < 0) {
+		ret = NOTIFY_OK;
+		goto out;
+	}
+	if (mktme_program_new_pconfig_target(new_target))
+		ret = NOTIFY_BAD;
+	else
+		ret = NOTIFY_OK;
+
+out:
+	spin_unlock_irqrestore(&mktme_lock, flags);
+	return ret;
+}
+
+static struct notifier_block mktme_memory_nb = {
+	.notifier_call = mktme_memory_callback,
+	.priority = 99,				/* priority ? */
+};
+
 static int __init init_mktme(void)
 {
 	int ret, cpuhp;
@@ -679,10 +730,16 @@ static int __init init_mktme(void)
 	if (cpuhp < 0)
 		goto free_store;
 
+	/* Memory hotplug */
+	if (register_memory_notifier(&mktme_memory_nb))
+		goto remove_cpuhp;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
 
+	unregister_memory_notifier(&mktme_memory_nb);
+remove_cpuhp:
 	cpuhp_remove_state_nocalls(cpuhp);
 free_store:
 	kfree(mktme_key_store);
-- 
2.20.1



* [PATCH, RFC 42/62] mm: Generalize the mprotect implementation to support extensions
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (40 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 41/62] keys/mktme: Support memory hotplug for " Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 43/62] syscall/x86: Wire up a system call for MKTME encryption keys Kirill A. Shutemov
                   ` (21 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Today mprotect is implemented to support legacy mprotect behavior
plus an extension for memory protection keys. Make it more generic
so that it can support additional extensions in the future.

This is done in preparation for adding a new system call for memory
encryption keys. The intent is that the new encrypted mprotect will be
another extension to legacy mprotect.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/mprotect.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index e768cd656a48..23e680f4b1d5 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -35,6 +35,8 @@
 
 #include "internal.h"
 
+#define NO_KEY	(-1)
+
 static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long addr, unsigned long end, pgprot_t newprot,
 		int dirty_accountable, int prot_numa)
@@ -452,9 +454,9 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 }
 
 /*
- * pkey==-1 when doing a legacy mprotect()
+ * When pkey==NO_KEY we get legacy mprotect behavior here.
  */
-static int do_mprotect_pkey(unsigned long start, size_t len,
+static int do_mprotect_ext(unsigned long start, size_t len,
 		unsigned long prot, int pkey)
 {
 	unsigned long nstart, end, tmp, reqprot;
@@ -578,7 +580,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot)
 {
-	return do_mprotect_pkey(start, len, prot, -1);
+	return do_mprotect_ext(start, len, prot, NO_KEY);
 }
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -586,7 +588,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot, int, pkey)
 {
-	return do_mprotect_pkey(start, len, prot, pkey);
+	return do_mprotect_ext(start, len, prot, pkey);
 }
 
 SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
-- 
2.20.1



* [PATCH, RFC 43/62] syscall/x86: Wire up a system call for MKTME encryption keys
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (41 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 42/62] mm: Generalize the mprotect implementation to support extensions Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-29  7:21   ` Mike Rapoport
  2019-05-08 14:44 ` [PATCH, RFC 44/62] x86/mm: Set KeyIDs in encrypted VMAs for MKTME Kirill A. Shutemov
                   ` (20 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

encrypt_mprotect() is a new system call to support memory encryption.

It takes the same parameters as legacy mprotect, plus an additional
key serial number that is mapped to an encryption keyid.
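
From userspace, a call could look roughly like the sketch below. It is
an illustration only: encrypt_mprotect_sketch() is a hypothetical
wrapper, and the syscall number is passed in by the caller because the
number proposed in this RFC (428 on x86) is not a stable ABI and must
not be assumed on any other kernel.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() */
#endif
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: same arguments as mprotect() plus the serial
 * number of a key of type "mktme" (key_serial_t is a 32-bit integer).
 * A non-positive 'nr' means "not available" and fails with ENOSYS.
 */
static long encrypt_mprotect_sketch(long nr, void *addr, size_t len,
				    unsigned long prot, int32_t serial)
{
	if (nr <= 0) {
		errno = ENOSYS;
		return -1;
	}
	return syscall(nr, addr, len, prot, serial);
}
```

Passing the number explicitly keeps the sketch from invoking an
unrelated system call on kernels that do not carry this series.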

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 include/linux/syscalls.h               | 2 ++
 include/uapi/asm-generic/unistd.h      | 4 +++-
 kernel/sys_ni.c                        | 2 ++
 5 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 1f9607ed087c..dbcd4c28d743 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -433,3 +433,4 @@
 425	i386	io_uring_setup		sys_io_uring_setup		__ia32_sys_io_uring_setup
 426	i386	io_uring_enter		sys_io_uring_enter		__ia32_sys_io_uring_enter
 427	i386	io_uring_register	sys_io_uring_register		__ia32_sys_io_uring_register
+428	i386	encrypt_mprotect	sys_encrypt_mprotect		__ia32_sys_encrypt_mprotect
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 92ee0b4378d4..d01bd132e9ee 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -349,6 +349,7 @@
 425	common	io_uring_setup		__x64_sys_io_uring_setup
 426	common	io_uring_enter		__x64_sys_io_uring_enter
 427	common	io_uring_register	__x64_sys_io_uring_register
+428	common	encrypt_mprotect	__x64_sys_encrypt_mprotect
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e446806a561f..38a2d7b95397 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -988,6 +988,8 @@ asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
 asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
+asmlinkage long sys_encrypt_mprotect(unsigned long start, size_t len,
+				     unsigned long prot, key_serial_t serial);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index dee7292e1df6..86f942f54b1b 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -832,9 +832,11 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
 __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
 #define __NR_io_uring_register 427
 __SYSCALL(__NR_io_uring_register, sys_io_uring_register)
+#define __NR_encrypt_mprotect 428
+__SYSCALL(__NR_encrypt_mprotect, sys_encrypt_mprotect)
 
 #undef __NR_syscalls
-#define __NR_syscalls 428
+#define __NR_syscalls 429
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index d21f4befaea4..80da8d9ac8b1 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -350,6 +350,8 @@ COND_SYSCALL(pkey_mprotect);
 COND_SYSCALL(pkey_alloc);
 COND_SYSCALL(pkey_free);
 
+/* multi-key total memory encryption keys */
+COND_SYSCALL(encrypt_mprotect);
 
 /*
  * Architecture specific weak syscall entries.
-- 
2.20.1



* [PATCH, RFC 44/62] x86/mm: Set KeyIDs in encrypted VMAs for MKTME
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (42 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 43/62] syscall/x86: Wire up a system call for MKTME encryption keys Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-06-14 11:44   ` Peter Zijlstra
  2019-05-08 14:44 ` [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call " Kirill A. Shutemov
                   ` (19 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

MKTME architecture requires the KeyID to be placed in PTE bits 51:46.
To create an encrypted VMA, place the KeyID in the upper bits of
vm_page_prot that match the position of those PTE bits.

When the VMA is assigned a KeyID it is always considered a KeyID
change. The VMA is either going from not encrypted to encrypted,
or from encrypted with any KeyID to encrypted with any other KeyID.
To make the change safely, remove the user pages held by the VMA
and unlink the VMA's anonymous chain.
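
The bit manipulation can be sketched in isolation. The constants below
assume the 51:46 layout stated above; in the patch itself the shift and
mask are runtime values (mktme_keyid_shift, mktme_keyid_mask), and the
sketch_* names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout from the text above: KeyID lives in PTE bits 51:46. */
#define SKETCH_KEYID_SHIFT	46
#define SKETCH_KEYID_MASK	(UINT64_C(0x3f) << SKETCH_KEYID_SHIFT)

/* Replace the KeyID field of a protection value, keeping other bits. */
static uint64_t sketch_set_keyid(uint64_t prot, unsigned int keyid)
{
	prot &= ~SKETCH_KEYID_MASK;
	prot |= ((uint64_t)keyid << SKETCH_KEYID_SHIFT) & SKETCH_KEYID_MASK;
	return prot;
}

/* Extract the KeyID field from a protection value. */
static unsigned int sketch_get_keyid(uint64_t prot)
{
	return (unsigned int)((prot & SKETCH_KEYID_MASK) >> SKETCH_KEYID_SHIFT);
}
```

Clearing the field before OR-ing in the new value is what makes a
change from one KeyID to another safe against leftover bits.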

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  4 ++++
 arch/x86/mm/mktme.c          | 26 ++++++++++++++++++++++++++
 include/linux/mm.h           |  6 ++++++
 3 files changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index bd6707e73219..0e6df07f1921 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -12,6 +12,10 @@ extern phys_addr_t mktme_keyid_mask;
 extern int mktme_nr_keyids;
 extern int mktme_keyid_shift;
 
+/* Set the encryption keyid bits in a VMA */
+extern void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
+				unsigned long start, unsigned long end);
+
 DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
 static inline bool mktme_enabled(void)
 {
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 024165c9c7f3..91b49e88ca3f 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,5 +1,6 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
+#include <linux/rmap.h>
 #include <asm/mktme.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
@@ -53,6 +54,31 @@ int __vma_keyid(struct vm_area_struct *vma)
 	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
 }
 
+/* Set the encryption keyid bits in a VMA */
+void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
+			  unsigned long start, unsigned long end)
+{
+	int oldkeyid = vma_keyid(vma);
+	pgprotval_t newprot;
+
+	/* Unmap pages with old KeyID if there are any. */
+	zap_page_range(vma, start, end - start);
+
+	if (oldkeyid == newkeyid)
+		return;
+
+	newprot = pgprot_val(vma->vm_page_prot);
+	newprot &= ~mktme_keyid_mask;
+	newprot |= (unsigned long)newkeyid << mktme_keyid_shift;
+	vma->vm_page_prot = __pgprot(newprot);
+
+	/*
+	 * The VMA doesn't have any inherited pages.
+	 * Start anon VMA tree from scratch.
+	 */
+	unlink_anon_vmas(vma);
+}
+
 /* Prepare page to be used for encryption. Called from page allocator. */
 void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2684245f8503..c027044de9bf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2825,5 +2825,11 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+#ifndef CONFIG_X86_INTEL_MKTME
+static inline void mprotect_set_encrypt(struct vm_area_struct *vma,
+					int newkeyid,
+					unsigned long start,
+					unsigned long end) {}
+#endif /* CONFIG_X86_INTEL_MKTME */
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
-- 
2.20.1



* [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (43 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 44/62] x86/mm: Set KeyIDs in encrypted VMAs for MKTME Kirill A. Shutemov
@ 2019-05-08 14:44 ` " Kirill A. Shutemov
  2019-06-14 11:47   ` Peter Zijlstra
                     ` (2 more replies)
  2019-05-08 14:44 ` [PATCH, RFC 46/62] x86/mm: Keep reference counts on encrypted VMAs " Kirill A. Shutemov
                   ` (18 subsequent siblings)
  63 siblings, 3 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Implement memory encryption for MKTME (Multi-Key Total Memory
Encryption) with a new system call that is an extension of the
legacy mprotect() system call.

In encrypt_mprotect the caller must pass a handle to a previously
allocated and programmed MKTME encryption key. The key can be
obtained through the kernel key service type "mktme". The caller
must have KEY_NEED_VIEW permission on the key.

MKTME places an additional restriction on the protected data:
The length of the data must be page aligned. This is in addition
to the existing mprotect restriction that the addr must be page
aligned.
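
A caller can check both restrictions before issuing the call. The
sketch below is a userspace model with hypothetical names; it assumes
the page size is a power of two, as it is on x86.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* page_size must be a power of two (4096 on x86). */
static bool sketch_is_page_aligned(uintptr_t value, size_t page_size)
{
	return (value & (page_size - 1)) == 0;
}

/* Both the start address and the length must be page aligned. */
static bool sketch_encrypt_args_ok(uintptr_t addr, size_t len,
				   size_t page_size)
{
	return sketch_is_page_aligned(addr, page_size) &&
	       sketch_is_page_aligned(len, page_size);
}
```

With a 4096-byte page, a length of 0x2001 is rejected here, whereas
legacy mprotect() would round it up to the next page boundary.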

encrypt_mprotect() looks up the hardware keyid for the given
userspace key. It uses previously defined helpers to insert
that keyid in the VMAs during legacy mprotect() execution.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/exec.c          |  4 +--
 include/linux/mm.h |  3 +-
 mm/mprotect.c      | 68 +++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 2e0033348d8e..695c121b34b3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -755,8 +755,8 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	vm_flags |= mm->def_flags;
 	vm_flags |= VM_STACK_INCOMPLETE_SETUP;
 
-	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
-			vm_flags);
+	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, vm_flags,
+			     -1);
 	if (ret)
 		goto out_unlock;
 	BUG_ON(prev != vma);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c027044de9bf..a7f52d053826 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1634,7 +1634,8 @@ extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long
 			      int dirty_accountable, int prot_numa);
 extern int mprotect_fixup(struct vm_area_struct *vma,
 			  struct vm_area_struct **pprev, unsigned long start,
-			  unsigned long end, unsigned long newflags);
+			  unsigned long end, unsigned long newflags,
+			  int newkeyid);
 
 /*
  * doesn't attempt to fault and will return short.
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 23e680f4b1d5..38d766b5cc20 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -28,6 +28,7 @@
 #include <linux/ksm.h>
 #include <linux/uaccess.h>
 #include <linux/mm_inline.h>
+#include <linux/key.h>
 #include <asm/pgtable.h>
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
@@ -347,7 +348,8 @@ static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
 
 int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
-	unsigned long start, unsigned long end, unsigned long newflags)
+	       unsigned long start, unsigned long end, unsigned long newflags,
+	       int newkeyid)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long oldflags = vma->vm_flags;
@@ -357,7 +359,14 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	int error;
 	int dirty_accountable = 0;
 
-	if (newflags == oldflags) {
+	/*
+	 * Flags match and Keyids match or we have NO_KEY.
+	 * This _fixup is usually called from do_mprotect_ext() except
+	 * for one special case: caller fs/exec.c/setup_arg_pages()
+	 * In that case, newkeyid is passed as -1 (NO_KEY).
+	 */
+	if (newflags == oldflags &&
+	    (newkeyid == vma_keyid(vma) || newkeyid == NO_KEY)) {
 		*pprev = vma;
 		return 0;
 	}
@@ -423,6 +432,8 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	}
 
 success:
+	if (newkeyid != NO_KEY)
+		mprotect_set_encrypt(vma, newkeyid, start, end);
 	/*
 	 * vm_flags and vm_page_prot are protected by the mmap_sem
 	 * held in write mode.
@@ -454,10 +465,15 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 }
 
 /*
- * When pkey==NO_KEY we get legacy mprotect behavior here.
+ * do_mprotect_ext() supports the legacy mprotect behavior plus extensions
+ * for Protection Keys and Memory Encryption Keys. These extensions are
+ * mutually exclusive and the behavior is:
+ *	(pkey==NO_KEY && keyid==NO_KEY) ==> legacy mprotect
+ *	(pkey is valid)  ==> legacy mprotect plus Protection Key extensions
+ *	(keyid is valid) ==> legacy mprotect plus Encryption Key extensions
  */
 static int do_mprotect_ext(unsigned long start, size_t len,
-		unsigned long prot, int pkey)
+			   unsigned long prot, int pkey, int keyid)
 {
 	unsigned long nstart, end, tmp, reqprot;
 	struct vm_area_struct *vma, *prev;
@@ -555,7 +571,8 @@ static int do_mprotect_ext(unsigned long start, size_t len,
 		tmp = vma->vm_end;
 		if (tmp > end)
 			tmp = end;
-		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
+		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags,
+				       keyid);
 		if (error)
 			goto out;
 		nstart = tmp;
@@ -580,7 +597,7 @@ static int do_mprotect_ext(unsigned long start, size_t len,
 SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot)
 {
-	return do_mprotect_ext(start, len, prot, NO_KEY);
+	return do_mprotect_ext(start, len, prot, NO_KEY, NO_KEY);
 }
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -588,7 +605,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot, int, pkey)
 {
-	return do_mprotect_ext(start, len, prot, pkey);
+	return do_mprotect_ext(start, len, prot, pkey, NO_KEY);
 }
 
 SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
@@ -637,3 +654,40 @@ SYSCALL_DEFINE1(pkey_free, int, pkey)
 }
 
 #endif /* CONFIG_ARCH_HAS_PKEYS */
+
+#ifdef CONFIG_X86_INTEL_MKTME
+
+extern int mktme_keyid_from_key(struct key *key);
+
+SYSCALL_DEFINE4(encrypt_mprotect, unsigned long, start, size_t, len,
+		unsigned long, prot, key_serial_t, serial)
+{
+	key_ref_t key_ref;
+	struct key *key;
+	int ret, keyid;
+
+	/* MKTME restriction */
+	if (!PAGE_ALIGNED(len))
+		return -EINVAL;
+
+	/*
+	 * key_ref prevents the destruction of the key
+	 * while the memory encryption is being set up.
+	 */
+
+	key_ref = lookup_user_key(serial, 0, KEY_NEED_VIEW);
+	if (IS_ERR(key_ref))
+		return PTR_ERR(key_ref);
+
+	key = key_ref_to_ptr(key_ref);
+	keyid = mktme_keyid_from_key(key);
+	if (!keyid) {
+		key_ref_put(key_ref);
+		return -EINVAL;
+	}
+	ret = do_mprotect_ext(start, len, prot, NO_KEY, keyid);
+	key_ref_put(key_ref);
+	return ret;
+}
+
+#endif /* CONFIG_X86_INTEL_MKTME */
-- 
2.20.1



* [PATCH, RFC 46/62] x86/mm: Keep reference counts on encrypted VMAs for MKTME
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (44 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call " Kirill A. Shutemov
@ 2019-05-08 14:44 ` " Kirill A. Shutemov
  2019-06-14 11:54   ` Peter Zijlstra
  2019-05-08 14:44 ` [PATCH, RFC 47/62] mm: Restrict MKTME memory encryption to anonymous VMAs Kirill A. Shutemov
                   ` (17 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The MKTME (Multi-Key Total Memory Encryption) Key Service needs
a reference count on encrypted VMAs. This reference count is used
to determine when a hardware encryption KeyID is no longer in use
and can be freed and reassigned to another Userspace Key.

The MKTME Key service does the percpu_ref_init and _kill, so
these gets/puts on encrypted VMAs can be considered the
intermediaries in the lifetime of the key.

Increment/decrement the reference count during encrypt_mprotect()
system call for initial or updated encryption on a VMA.
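
The put-then-get ordering on a KeyID change can be modeled with plain
counters. This is a sketch under stated assumptions: the sketch_* names
are hypothetical, plain longs replace percpu_ref, KeyIDs are assumed to
be below 64, and KeyID 0 means "not encrypted" and is never counted.

```c
#define SKETCH_NR_KEYIDS 64

struct sketch_vma {
	int keyid;	/* 0 means not encrypted */
};

/* One counter per KeyID; stands in for the percpu_ref array. */
static long sketch_refs[SKETCH_NR_KEYIDS];

static void sketch_vma_get(const struct sketch_vma *vma)
{
	if (vma->keyid)
		sketch_refs[vma->keyid]++;
}

static void sketch_vma_put(const struct sketch_vma *vma)
{
	if (vma->keyid)
		sketch_refs[vma->keyid]--;
}

/* Drop the reference on the old KeyID, then take one on the new. */
static void sketch_vma_rekey(struct sketch_vma *vma, int newkeyid)
{
	if (vma->keyid == newkeyid)
		return;
	sketch_vma_put(vma);
	vma->keyid = newkeyid;
	sketch_vma_get(vma);
}
```

A KeyID whose counter has dropped back to zero is the one the key
service may free and reassign.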

Piggyback on the vm_area_dup/free() helpers. If the VMA being
duplicated or freed is encrypted, adjust the reference count.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  5 +++++
 arch/x86/mm/mktme.c          | 37 ++++++++++++++++++++++++++++++++++--
 include/linux/mm.h           |  2 ++
 kernel/fork.c                |  2 ++
 4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 0e6df07f1921..14da002d2e85 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -16,6 +16,11 @@ extern int mktme_keyid_shift;
 extern void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
 				unsigned long start, unsigned long end);
 
+/* MKTME encrypt_count for VMAs */
+extern struct percpu_ref *encrypt_count;
+extern void vma_get_encrypt_ref(struct vm_area_struct *vma);
+extern void vma_put_encrypt_ref(struct vm_area_struct *vma);
+
 DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
 static inline bool mktme_enabled(void)
 {
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 91b49e88ca3f..df70651816a1 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -66,11 +66,12 @@ void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
 
 	if (oldkeyid == newkeyid)
 		return;
-
+	vma_put_encrypt_ref(vma);
 	newprot = pgprot_val(vma->vm_page_prot);
 	newprot &= ~mktme_keyid_mask;
 	newprot |= (unsigned long)newkeyid << mktme_keyid_shift;
 	vma->vm_page_prot = __pgprot(newprot);
+	vma_get_encrypt_ref(vma);
 
 	/*
 	 * The VMA doesn't have any inherited pages.
@@ -79,6 +80,18 @@ void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
 	unlink_anon_vmas(vma);
 }
 
+void vma_get_encrypt_ref(struct vm_area_struct *vma)
+{
+	if (vma_keyid(vma))
+		percpu_ref_get(&encrypt_count[vma_keyid(vma)]);
+}
+
+void vma_put_encrypt_ref(struct vm_area_struct *vma)
+{
+	if (vma_keyid(vma))
+		percpu_ref_put(&encrypt_count[vma_keyid(vma)]);
+}
+
 /* Prepare page to be used for encryption. Called from page allocator. */
 void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
 {
@@ -102,6 +115,22 @@ void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
 
 		page++;
 	}
+
+	/*
+	 * Make sure the KeyID cannot be freed until the last page that
+	 * uses the KeyID is gone.
+	 *
+	 * This is required because the page may live longer than the VMA
+	 * it is mapped into (e.g. in the get_user_pages() case), so
+	 * refcounting per-VMA is not enough.
+	 *
+	 * Taking a reference per 4K page helps if the page is split after
+	 * the allocation. free_encrypted_page() will balance out the
+	 * refcount even if the page was split and freed as a bunch of
+	 * 4K pages.
+	 */
+
+	percpu_ref_get_many(&encrypt_count[keyid], 1 << order);
 }
 
 /*
@@ -110,7 +139,9 @@ void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
  */
 void free_encrypted_page(struct page *page, int order)
 {
-	int i;
+	int i, keyid;
+
+	keyid = page_keyid(page);
 
 	/*
 	 * The hardware/CPU does not enforce coherency between mappings
@@ -125,6 +156,8 @@ void free_encrypted_page(struct page *page, int order)
 		lookup_page_ext(page)->keyid = 0;
 		page++;
 	}
+
+	percpu_ref_put_many(&encrypt_count[keyid], 1 << order);
 }
 
 static int sync_direct_mapping_pte(unsigned long keyid,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a7f52d053826..00c0fd70816b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2831,6 +2831,8 @@ static inline void mprotect_set_encrypt(struct vm_area_struct *vma,
 					int newkeyid,
 					unsigned long start,
 					unsigned long end) {}
+static inline void vma_get_encrypt_ref(struct vm_area_struct *vma) {}
+static inline void vma_put_encrypt_ref(struct vm_area_struct *vma) {}
 #endif /* CONFIG_X86_INTEL_MKTME */
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 9dcd18aa210b..f0e35ed76f5a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -342,12 +342,14 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	if (new) {
 		*new = *orig;
 		INIT_LIST_HEAD(&new->anon_vma_chain);
+		vma_get_encrypt_ref(new);
 	}
 	return new;
 }
 
 void vm_area_free(struct vm_area_struct *vma)
 {
+	vma_put_encrypt_ref(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
-- 
2.20.1



* [PATCH, RFC 47/62] mm: Restrict MKTME memory encryption to anonymous VMAs
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (45 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 46/62] x86/mm: Keep reference counts on encrypted VMAs " Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-06-14 11:55   ` Peter Zijlstra
  2019-05-08 14:44 ` [PATCH, RFC 48/62] selftests/x86/mktme: Test the MKTME APIs Kirill A. Shutemov
                   ` (16 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Memory encryption is only supported for anonymous mappings.
Test the VMAs in an encrypt_mprotect() request to make sure they all
meet that requirement before encrypting any of them.

The encrypt_mprotect() syscall returns -EINVAL and does not encrypt
any VMA if this check fails.
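
The all-or-nothing check described above can be modelled in userspace.
The following is only a sketch: struct mock_vma and
range_supports_encryption() are hypothetical stand-ins invented for
illustration, not the kernel's struct vm_area_struct or the helper added
by this patch. It assumes, as the patch does, that the first VMA is
always tested and that the walk stops at the first VMA starting at or
beyond the end of the range.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct; not the kernel type. */
struct mock_vma {
	unsigned long start;
	unsigned long end;
	bool anonymous;
	struct mock_vma *next;
};

/*
 * Mirrors the do/while walk in the patch: the first VMA is always
 * tested, then every following VMA that starts below 'end'. A single
 * non-anonymous VMA rejects the whole range.
 */
static bool range_supports_encryption(const struct mock_vma *vma,
				      unsigned long end)
{
	do {
		if (!vma->anonymous)
			return false;
		vma = vma->next;
	} while (vma && vma->start < end);
	return true;
}
```

With three adjacent mock VMAs where only the middle one is file-backed,
a range that ends before the middle VMA passes, while a range that
reaches into it is refused as a whole, which corresponds to the -EINVAL
case above.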

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/mprotect.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 38d766b5cc20..53bd41f99a67 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -346,6 +346,24 @@ static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
 	return walk_page_range(start, end, &prot_none_walk);
 }
 
+/*
+ * Encrypted mprotect is only supported on anonymous mappings.
+ * If this test fails on any single VMA, the entire mprotect
+ * request fails.
+ */
+static bool mem_supports_encryption(struct vm_area_struct *vma, unsigned long end)
+{
+	struct vm_area_struct *test_vma = vma;
+
+	do {
+		if (!vma_is_anonymous(test_vma))
+			return false;
+
+		test_vma = test_vma->vm_next;
+	} while (test_vma && test_vma->vm_start < end);
+	return true;
+}
+
 int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	       unsigned long start, unsigned long end, unsigned long newflags,
@@ -532,6 +550,12 @@ static int do_mprotect_ext(unsigned long start, size_t len,
 				goto out;
 		}
 	}
+
+	if (keyid > 0 && !mem_supports_encryption(vma, end)) {
+		error = -EINVAL;
+		goto out;
+	}
+
 	if (start > vma->vm_start)
 		prev = vma;
 
-- 
2.20.1



* [PATCH, RFC 48/62] selftests/x86/mktme: Test the MKTME APIs
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (46 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 47/62] mm: Restrict MKTME memory encryption to anonymous VMAs Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 17:09   ` Alison Schofield
  2019-05-08 14:44 ` [PATCH, RFC 49/62] mm, x86: export several MKTME variables Kirill A. Shutemov
                   ` (15 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

This is a draft for poweron testing.
I'm assuming it needs to be in Intel-next to be available for poweron.

It is not in the selftest Makefiles.
COMPILE w keyutils library ==>  cc -o mktest mktme_test.c -lkeyutils

Usage: mktme_test [options]...
-a                      Run ALL tests
-t <testnum>            Run one <testnum> test
-l                      List available tests
-h, -?                  Show this help

mktest -l
[ 1] Keys: Add each type key
[ 2] Flow: One simple roundtrip
[ 3] Keys: Valid Payload Options
[ 4] Keys: Invalid Payload Options
[ 5] Keys: Add Key Descriptor Field
[ 6] Keys: Add Multiple Same
[ 7] Keys: Change payload, auto update
[ 8] Keys: Update, explicit update
[ 9] Keys: Update, Clear
[10] Keys: Add, Invalidate Keys
[11] Keys: Add, Revoke Keys
[12] Keys: Keyctl Describe
[13] Keys: Clear
[14] Keys: No Encrypt
[15] Keys: Unique KeyIDs
[16] Keys: Get Max KeyIDs
[17] Encrypt: Parameter Alignment
[18] Encrypt: Change Protections
[19] Encrypt: Swap Keys
[20] Encrypt: Counters Same Key
[21] Encrypt: Counters Diff Key
[22] Encrypt: Counters Holes
[23] Flow: Switch key no data
[24] Flow: Switch key multi VMAs
[25] Flow: Switch No Key to Any Key
[26] Flow: madvise
[27] Flow: Invalidate In Use Key
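
The numbered list and the -a / -t / -l options suggest a table-driven
runner. The sketch below shows one way such a dispatcher could look. It
is an assumption for illustration only: struct mktme_test, stub_test(),
list_tests(), run_one() and run_all() are hypothetical names, and the
real mktme_test.c is not shown in this part of the thread and may be
organised differently.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical test table entry: a label plus the function to call. */
struct mktme_test {
	const char *name;
	void (*fn)(void);
};

static int tests_run;	/* counts stub invocations, for demonstration */

static void stub_test(void)
{
	tests_run++;
}

/* Names echo the list above; the bodies are stubs, not the real tests. */
static const struct mktme_test tests[] = {
	{ "Keys: Add each type key",		stub_test },
	{ "Flow: One simple roundtrip",		stub_test },
	{ "Keys: Valid Payload Options",	stub_test },
};

#define NUM_TESTS (sizeof(tests) / sizeof(tests[0]))

/* -l: print the 1-based index and name of each test; returns the count. */
static int list_tests(FILE *out)
{
	size_t i;

	for (i = 0; i < NUM_TESTS; i++)
		fprintf(out, "[%2zu] %s\n", i + 1, tests[i].name);
	return (int)NUM_TESTS;
}

/* -t <testnum>: run one test; returns -1 if testnum is out of range. */
static int run_one(int testnum)
{
	if (testnum < 1 || (size_t)testnum > NUM_TESTS)
		return -1;
	tests[testnum - 1].fn();
	return 0;
}

/* -a: run every test in table order; returns how many were run. */
static int run_all(void)
{
	size_t i;

	for (i = 0; i < NUM_TESTS; i++)
		tests[i].fn();
	return (int)NUM_TESTS;
}
```

Keeping the label and the function pointer in one table means the -l
listing and the -t numbering cannot drift apart when a test is added.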

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 .../selftests/x86/mktme/encrypt_tests.c       | 433 ++++++++++++++
 .../testing/selftests/x86/mktme/flow_tests.c  | 266 +++++++++
 tools/testing/selftests/x86/mktme/key_tests.c | 526 ++++++++++++++++++
 .../testing/selftests/x86/mktme/mktme_test.c  | 300 ++++++++++
 4 files changed, 1525 insertions(+)
 create mode 100644 tools/testing/selftests/x86/mktme/encrypt_tests.c
 create mode 100644 tools/testing/selftests/x86/mktme/flow_tests.c
 create mode 100644 tools/testing/selftests/x86/mktme/key_tests.c
 create mode 100644 tools/testing/selftests/x86/mktme/mktme_test.c

diff --git a/tools/testing/selftests/x86/mktme/encrypt_tests.c b/tools/testing/selftests/x86/mktme/encrypt_tests.c
new file mode 100644
index 000000000000..735d5da89d29
--- /dev/null
+++ b/tools/testing/selftests/x86/mktme/encrypt_tests.c
@@ -0,0 +1,433 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* x86 MKTME Encrypt API Tests */
+
+/* Address & length parameters to encrypt_mprotect() must be page aligned */
+void test_param_alignment(void)
+{
+	size_t datalen = PAGE_SIZE * 2;
+	key_serial_t key;
+	int ret, i;
+	char *buf;
+
+	key = add_key("mktme", "keyname", options_CPU_long,
+		      strlen(options_CPU_long), KEY_SPEC_THREAD_KEYRING);
+
+	if (key == -1) {
+		perror("test_param_alignment");
+		return;
+	}
+	buf = (char *)mmap(NULL, datalen, PROT_NONE,
+			   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+
+	/* Fail if addr is not page aligned */
+	ret = syscall(sys_encrypt_mprotect, buf + 100, datalen / 2, PROT_NONE,
+		      key);
+	if (!ret)
+		fprintf(stderr, "Error: addr is not page aligned\n");
+
+	/* Fail if len is not page aligned */
+	ret = syscall(sys_encrypt_mprotect, buf, 9, PROT_NONE, key);
+	if (!ret)
+		fprintf(stderr, "Error: len is not page aligned\n");
+
+	/* Fail if both addr and len are not page aligned */
+	ret = syscall(sys_encrypt_mprotect, buf + 100, datalen + 100,
+		      PROT_READ | PROT_WRITE, key);
+	if (!ret)
+		fprintf(stderr, "Error: addr and len are not page aligned\n");
+
+	/* Success if both addr and len are page aligned */
+	ret = syscall(sys_encrypt_mprotect, buf, datalen,
+		      PROT_READ | PROT_WRITE, key);
+
+	if (ret)
+		fprintf(stderr, "Fail: addr and len are both page aligned\n");
+
+	ret = munmap(buf, datalen);
+
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "Error: invalidate failed on key [%d]\n", key);
+}
+
+/*
+ * Do encrypt_mprotect and follow with classic mprotects.
+ * KeyID should remain unchanged.
+ */
+void test_change_protections(void)
+{
+	unsigned int keyid, check_keyid;
+	key_serial_t key;
+	void *ptra;
+	int ret, i;
+
+	const int prots[] = {
+		PROT_NONE, PROT_READ, PROT_WRITE, PROT_EXEC,
+		PROT_READ | PROT_WRITE, PROT_READ | PROT_EXEC,
+	};
+
+	key = add_key("mktme", "testkey", options_CPU_long,
+		      strlen(options_CPU_long), KEY_SPEC_THREAD_KEYRING);
+	if (key == -1) {
+		perror(__func__);
+		return;
+	}
+	ptra = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE,
+		    -1, 0);
+	if (ptra == MAP_FAILED) {
+		fprintf(stderr, "Error: mmap failed.\n");
+		goto revoke_key;
+	}
+	/* Encrypt Memory */
+	ret = syscall(sys_encrypt_mprotect, ptra, PAGE_SIZE, PROT_NONE, key);
+	if (ret)
+		fprintf(stderr, "Error: encrypt_mprotect [%d]\n", ret);
+
+	/* Remember the assigned KeyID */
+	keyid = find_smaps_keyid((unsigned long)ptra);
+
+	/* Classic mprotects()  should not change KeyID. */
+	for (i = 0; i < ARRAY_SIZE(prots); i++) {
+		ret = mprotect(ptra, PAGE_SIZE, prots[i]);
+		if (ret)
+			fprintf(stderr, "Error: mprotect [%d]\n", ret);
+
+		check_keyid = find_smaps_keyid((unsigned long)ptra);
+		if (keyid != check_keyid)
+			fprintf(stderr, "Error: keyid change not expected\n");
+	}
+free_memory:
+	ret = munmap(ptra, PAGE_SIZE);
+revoke_key:
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "Error: invalidate failed. [%d]\n", key);
+}
+
+/*
+ * Make one mapping and create a bunch of keys.
+ * Encrypt that one mapping repeatedly with different keys.
+ * Verify the KeyID changes in smaps.
+ */
+void test_key_swap(void)
+{
+	unsigned int prev_keyid, next_keyid;
+	int maxswaps = max_keyids / 2;		/* Not too many swaps */
+	key_serial_t key[maxswaps];
+	long size = PAGE_SIZE;
+	int keys_available = 0;
+	char name[12];
+	void *ptra;
+	int i, ret;
+
+	for (i = 0; i < maxswaps; i++) {
+		sprintf(name, "mk_swap_%d", i);
+		key[i] = add_key("mktme", name, options_CPU_long,
+				 strlen(options_CPU_long),
+				 KEY_SPEC_THREAD_KEYRING);
+		if (key[i] == -1) {
+			perror(__func__);
+			goto free_keys;
+		} else {
+			keys_available++;
+		}
+	}
+
+	printf("     Info: created %d keys\n", keys_available);
+	ptra = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (ptra == MAP_FAILED) {
+		perror("mmap");
+		goto free_keys;
+	}
+	prev_keyid = 0;
+
+	for (i = 0; i < keys_available; i++) {
+		ret = syscall(sys_encrypt_mprotect, ptra, size,
+			      PROT_NONE, key[i]);
+		if (ret) {
+			perror("encrypt_mprotect");
+			goto free_memory;
+		}
+
+		next_keyid = find_smaps_keyid((unsigned long)ptra);
+		if (prev_keyid == next_keyid)
+			fprintf(stderr, "Error %s: expected new keyid\n",
+				__func__);
+		prev_keyid = next_keyid;
+	}
+free_memory:
+	ret = munmap(ptra, size);
+
+free_keys:
+	for (i = 0; i < keys_available; i++) {
+		if (keyctl(KEYCTL_INVALIDATE, key[i]) == -1)
+			perror(__func__);
+	}
+}
+
+/*
+ * These may not work as originally planned. Need to check that the key
+ * is invalidated and then destroyed when the last mapping is removed.
+ */
+void test_counters_same(void)
+{
+	key_serial_t key;
+	int count = 4;
+	void *ptr[count];
+	int ret, i;
+
+	/* Get 4 pieces of memory */
+	i = count;
+	while (i--) {
+		ptr[i] = mmap(NULL, PAGE_SIZE, PROT_NONE,
+			      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+		if (ptr[i] == MAP_FAILED)
+			perror("mmap");
+	}
+	/* Protect with same key */
+	key = add_key("mktme", "mk_same", options_USER, strlen(options_USER),
+		      KEY_SPEC_THREAD_KEYRING);
+
+	if (key == -1) {
+		perror("add_key");
+		goto free_mem;
+	}
+	i = count;
+	while (i--) {
+		ret = syscall(sys_encrypt_mprotect, ptr[i], PAGE_SIZE,
+			      PROT_NONE, key);
+		if (ret)
+			perror("encrypt_mprotect");
+	}
+	/* Discard Key & Unmap Memory (order irrelevant) */
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "Error: invalidate failed.\n");
+free_mem:
+	i = count;
+	while (i--)
+		ret = munmap(ptr[i], PAGE_SIZE);
+}
+
+void test_counters_diff(void)
+{
+	int prot = PROT_READ | PROT_WRITE;
+	long size = PAGE_SIZE;
+	int ret, i;
+	int loop = 4;
+	char name[12];
+	void *ptr[loop];
+	key_serial_t diffkey[loop];
+
+	i = loop;
+	while (i--)
+		ptr[i] = mmap(NULL, size, prot, MAP_ANONYMOUS | MAP_PRIVATE,
+			      -1, 0);
+	i = loop;
+	while (i--) {
+		sprintf(name, "cheese_%d", i);
+		diffkey[i] = add_key("mktme", name, options_USER,
+				     strlen(options_USER),
+				     KEY_SPEC_THREAD_KEYRING);
+		ret = syscall(sys_encrypt_mprotect, ptr[i], size, prot,
+			      diffkey[i]);
+		if (ret)
+			perror("encrypt_mprotect");
+	}
+
+	i = loop;
+	while (i--)
+		ret = munmap(ptr[i], PAGE_SIZE);
+
+	i = loop;
+	while (i--) {
+		if (keyctl(KEYCTL_INVALIDATE, diffkey[i]) == -1)
+			fprintf(stderr, "Error: invalidate failed key:%d\n",
+				diffkey[i]);
+	}
+}
+
+void test_counters_holes(void)
+{
+	int prot = PROT_READ | PROT_WRITE;
+	long size = PAGE_SIZE;
+	int ret, i;
+	int loop = 6;
+	void *ptr[loop];
+	key_serial_t samekey;
+
+	samekey = add_key("mktme", "gouda", options_CPU_long,
+			  strlen(options_CPU_long), KEY_SPEC_THREAD_KEYRING);
+
+	i = loop;
+	while (i--) {
+		ptr[i] = mmap(NULL, size, prot, MAP_ANONYMOUS | MAP_PRIVATE,
+			      -1, 0);
+		if (i % 2) {
+			ret = syscall(sys_encrypt_mprotect, ptr[i], size, prot,
+				      samekey);
+			if (ret)
+				perror("mprotect error");
+		}
+	}
+
+	i = loop;
+	while (i--)
+		ret = munmap(ptr[i], size);
+
+	if (keyctl(KEYCTL_INVALIDATE, samekey) == -1)
+		fprintf(stderr, "Error: invalidate failed\n");
+}
+
+/*
+ * Try on SIMICs. See if the SIMICs 'a1a1' thing does the trick.
+ * May need real hardware.
+ * One buffer  -> encrypt entirety w one key
+ * Same buffer -> encrypt in pieces w different keys
+ */
+void test_split(void)
+{
+	int prot = PROT_READ | PROT_WRITE;
+	int ret, i;
+	int pieces = 10;
+	size_t len = PAGE_SIZE;
+	char name[12];
+	char *buf;
+	key_serial_t firstkey;
+	key_serial_t diffkey[pieces];
+
+	/* get one piece of memory, protect it, memset it */
+	buf = (char *)mmap(NULL, len, PROT_NONE,
+			   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+
+	firstkey = add_key("mktme", "firstkey", options_CPU_long,
+			   strlen(options_CPU_long),
+			   KEY_SPEC_THREAD_KEYRING);
+
+	ret = syscall(sys_encrypt_mprotect, buf, len, PROT_READ | PROT_WRITE,
+		      firstkey);
+
+	if (ret) {
+		printf("firstkey mprotect error:%d\n", ret);
+		goto free_mem;
+	}
+
+	memset(buf, 9, len);
+	/*
+	 * Encrypt pieces of buf with different encryption keys.
+	 * Expect to see the data in those pieces zero'd
+	 */
+	for (i = 0; i < pieces; i++) {
+		sprintf(name, "cheese_%d", i);
+		diffkey[i] = add_key("mktme", name, options_CPU_long,
+				     strlen(options_CPU_long),
+				     KEY_SPEC_THREAD_KEYRING);
+		ret = syscall(sys_encrypt_mprotect, (buf + (i * len)), len,
+			      PROT_READ | PROT_WRITE, diffkey[i]);
+		if (ret)
+			printf("diff key mprotect error:%d\n", ret);
+		else
+			printf("done protecting w i:%d key[%d]\n", i,
+			       diffkey[i]);
+	}
+	printf("SIMICs - this should NOT be all 'f's.\n");
+	for (i = 0; i < len; i++)
+		printf("-%x", buf[i]);
+	printf("\n");
+
+	getchar();
+	i = pieces;
+	for (i = 0; i < pieces; i++) {
+		if (keyctl(KEYCTL_INVALIDATE, diffkey[i]) == -1)
+			fprintf(stderr, "invalidate failed key:%d\n",
+				diffkey[i]);
+	}
+	if (keyctl(KEYCTL_INVALIDATE, firstkey) == -1)
+		fprintf(stderr, "invalidate failed on key:%d\n", firstkey);
+free_mem:
+	ret = munmap(buf, len);
+}
+
+void test_well_suited(void)
+{
+	int prot;
+	long size = PAGE_SIZE;
+	int ret, i;
+	int loop = 6;
+	void *ptr[loop];
+	key_serial_t key;
+	void *addr, *first;
+
+	/* mmap alternating protections so that we get loop# of VMAs */
+	i = loop;
+	/* map the first one */
+	first = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+
+	addr = first + PAGE_SIZE;
+	i--;
+	while (i--)  {
+		prot = (i % 2) ? PROT_READ : PROT_WRITE;
+		ptr[i] = mmap(addr, size, prot, MAP_ANONYMOUS | MAP_PRIVATE,
+			      -1, 0);
+		addr = addr + PAGE_SIZE;
+	}
+	/* Protect with same key */
+	key = add_key("mktme", "mk_suited954", options_USER,
+		      strlen(options_USER), KEY_SPEC_THREAD_KEYRING);
+
+	/* Changing FLAGS and adding KEY */
+	ret = syscall(sys_encrypt_mprotect, ptr[0], (loop * PAGE_SIZE),
+		      PROT_EXEC, key);
+	if (ret)
+		fprintf(stderr, "Error: encrypt_mprotect [%d]\n", ret);
+
+	i = loop;
+	while (i--)
+		ret = munmap(ptr[i], size);
+
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "Error: invalidate failed\n");
+}
+
+void test_not_suited(int argc, char *argv[])
+{
+	int prot;
+	int protA = PROT_READ;
+	int protB = PROT_WRITE;
+	int flagsA = MAP_ANONYMOUS | MAP_PRIVATE;
+	int flagsB = MAP_SHARED | MAP_ANONYMOUS;
+	int flags;
+	int ret, i;
+	int loop = 6;
+	void *ptr[loop];
+	key_serial_t key;
+
+	printf("loop count [%d]\n", loop);
+
+	/* mmap alternating protections so that we get loop# of VMAs */
+	i = loop;
+	while (i--)  {
+		prot = (i % 2) ? PROT_READ : PROT_WRITE;
+		if (i == 2)
+			flags = flagsB;
+		else
+			flags = flagsA;
+		ptr[i] = mmap(NULL, PAGE_SIZE, prot, flags, -1, 0);
+	}
+
+	/* protect with same key */
+	key = add_key("mktme", "mk_notsuited", options_CPU_long,
+		      strlen(options_CPU_long), KEY_SPEC_THREAD_KEYRING);
+
+	/* Changing FLAGS and adding KEY */
+	ret = syscall(sys_encrypt_mprotect, ptr[0], (loop * PAGE_SIZE),
+		      PROT_EXEC, key);
+	if (!ret)
+		fprintf(stderr, "Error: expected encrypt_mprotect to fail.\n");
+
+	i = loop;
+	while (i--)
+		ret = munmap(ptr[i], PAGE_SIZE);
+
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "Error: invalidate failed.\n");
+}
+
diff --git a/tools/testing/selftests/x86/mktme/flow_tests.c b/tools/testing/selftests/x86/mktme/flow_tests.c
new file mode 100644
index 000000000000..87b17d3bf142
--- /dev/null
+++ b/tools/testing/selftests/x86/mktme/flow_tests.c
@@ -0,0 +1,266 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * x86 MKTME:  API Tests
+ *
+ * Flow Tests either
+ *	1) Validate some interaction between the two APIs: Key & Encrypt
+ *	2) or, Validate code flows, scenarios, known/fixed issues.
+ */
+
+/*
+ * Userspace Keys with outstanding memory mappings can be discarded,
+ * (discarded == revoke, invalidate, expire, unlink)
+ * The paired KeyID will not be freed for reuse until the last memory
+ * mapping is unmapped.
+ */
+void test_discard_in_use_key(void)
+{
+	key_serial_t key;
+	void *ptra;
+	int ret;
+
+	key = add_key("mktme", "discard-test", options_CPU_long,
+		      strlen(options_CPU_long), KEY_SPEC_THREAD_KEYRING);
+
+	if (key == -1) {
+		perror("add key");
+		return;
+	}
+	ptra = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE,
+		    -1, 0);
+	if (ptra == MAP_FAILED) {
+		fprintf(stderr, "Error: mmap failed. ");
+		if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+			fprintf(stderr, "Error: invalidate failed. Key:%d\n",
+				key);
+		return;
+	}
+	ret = syscall(sys_encrypt_mprotect, ptra, PAGE_SIZE, PROT_NONE, key);
+	if (ret) {
+		fprintf(stderr, "Error: encrypt_mprotect: %d\n", ret);
+		goto free_memory;
+	}
+	if (keyctl(KEYCTL_INVALIDATE, key) != 0)
+		fprintf(stderr, "Error: invalidate failed on in-use key\n");
+free_memory:
+	ret = munmap(ptra, PAGE_SIZE);
+}
+
+/* TODO: Can this be made useful? Used to reproduce a trace in Kai's setup. */
+void test_kai_madvise(void)
+{
+	key_serial_t key;
+	void *ptra;
+	int ret;
+
+	key = add_key("mktme", "testkey", options_USER, strlen(options_USER),
+		      KEY_SPEC_THREAD_KEYRING);
+
+	if (key == -1) {
+		perror("add_key");
+		return;
+	}
+
+	/* TODO wanted MAP_FIXED here - but kept failing to mmap */
+	ptra = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (ptra == MAP_FAILED) {
+		perror("failed to mmap");
+		goto revoke_key;
+	}
+
+	ret = madvise(ptra, PAGE_SIZE, MADV_MERGEABLE);
+	if (ret)
+		perror("madvise err mergeable");
+
+	if ((madvise(ptra, PAGE_SIZE, MADV_HUGEPAGE)) != 0)
+		perror("madvise err hugepage");
+
+	if ((madvise(ptra, PAGE_SIZE, MADV_DONTFORK)) != 0)
+		perror("madvise err dontfork");
+
+	ret = syscall(sys_encrypt_mprotect, ptra, PAGE_SIZE, PROT_NONE, key);
+	if (ret)
+		perror("mprotect error");
+
+	ret = munmap(ptra, PAGE_SIZE);
+revoke_key:
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "invalidate failed on key [%d]\n", key);
+}
+
+void test_one_simple_round_trip(void)
+{
+	long size = PAGE_SIZE * 10;
+	key_serial_t key;
+	void *ptra;
+	int ret;
+
+	key = add_key("mktme", "testkey", options_USER, strlen(options_USER),
+		      KEY_SPEC_THREAD_KEYRING);
+
+	if (key == -1) {
+		perror("add_key");
+		return;
+	}
+
+	ptra = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (ptra == MAP_FAILED) {
+		perror("failed to mmap");
+		goto revoke_key;
+	}
+
+	ret = syscall(sys_encrypt_mprotect, ptra, size, PROT_NONE, key);
+	if (ret)
+		perror("mprotect error");
+
+	ret = munmap(ptra, size);
+revoke_key:
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "invalidate failed on key [%d]\n", key);
+}
+
+void test_switch_key_no_data(void)
+{
+	key_serial_t keyA, keyB;
+	int ret, i;
+	void *buf;
+
+	/*
+	 * Program 2 keys: Protect with one, protect with other
+	 */
+	keyA = add_key("mktme", "keyA", options_USER, strlen(options_USER),
+		       KEY_SPEC_THREAD_KEYRING);
+	if (keyA == -1) {
+		perror("add_key");
+		return;
+	}
+	keyB = add_key("mktme", "keyB", options_CPU_long,
+		       strlen(options_CPU_long), KEY_SPEC_THREAD_KEYRING);
+	if (keyB == -1) {
+		perror("add_key");
+		return;
+	}
+	buf = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE,
+		   -1, 0);
+	if (buf == MAP_FAILED) {
+		perror("mmap error");
+		goto revoke_key;
+	}
+	ret = syscall(sys_encrypt_mprotect, buf, PAGE_SIZE, PROT_NONE, keyA);
+	if (ret)
+		perror("mprotect error");
+
+	ret = syscall(sys_encrypt_mprotect, buf, PAGE_SIZE, PROT_NONE, keyB);
+	if (ret)
+		perror("mprotect error");
+
+free_memory:
+	ret = munmap(buf, PAGE_SIZE);
+revoke_key:
+	if (keyctl(KEYCTL_INVALIDATE, keyA) == -1)
+		printf("invalidate failed on key [%d]\n", keyA);
+	if (keyctl(KEYCTL_INVALIDATE, keyB) == -1)
+		printf("invalidate failed on key [%d]\n", keyB);
+}
+
+void test_switch_key_mult_vmas(void)
+{
+	int prot = PROT_READ | PROT_WRITE;
+	long size = PAGE_SIZE;
+	int ret, i;
+	int loop = 12;
+	void *ptr[loop];
+	key_serial_t firstkey;
+	key_serial_t nextkey;
+
+	firstkey = add_key("mktme", "gouda", options_CPU_long,
+			   strlen(options_CPU_long), KEY_SPEC_THREAD_KEYRING);
+	nextkey = add_key("mktme", "ricotta", options_CPU_long,
+			  strlen(options_CPU_long), KEY_SPEC_THREAD_KEYRING);
+
+	i = loop;
+	while (i--) {
+		ptr[i] = mmap(NULL, size, PROT_NONE,
+			      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+		if (i % 2) {
+			ret = syscall(sys_encrypt_mprotect, ptr[i],
+				      size, prot, firstkey);
+			if (ret)
+				perror("mprotect error");
+		}
+	}
+	i = loop;
+	while (i--) {
+		if (i % 2) {
+			ret = syscall(sys_encrypt_mprotect, ptr[i], size, prot,
+				      nextkey);
+			if (ret)
+				perror("mprotect error");
+		}
+	}
+	i = loop;
+	while (i--)
+		ret = munmap(ptr[i], size);
+
+	if (keyctl(KEYCTL_INVALIDATE, nextkey) == -1)
+		fprintf(stderr, "invalidate failed key %d\n", nextkey);
+	if (keyctl(KEYCTL_INVALIDATE, firstkey) == -1)
+		fprintf(stderr, "invalidate failed key %d\n", firstkey);
+}
+
+/* Write to buf with no encrypt key, then encrypt buf */
+void test_switch_key0_to_key(void)
+{
+	key_serial_t key;
+	size_t datalen = PAGE_SIZE;
+	char *buf_1, *buf_2;
+	int ret, i;
+
+	key = add_key("mktme", "keyA", options_USER, strlen(options_USER),
+		      KEY_SPEC_THREAD_KEYRING);
+	if (key == -1) {
+		perror("add_key");
+		return;
+	}
+	buf_1 = (char *)mmap(NULL, datalen, PROT_READ | PROT_WRITE,
+			   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (buf_1 == MAP_FAILED) {
+		perror("failed to mmap");
+		goto inval_key;
+	}
+	buf_2 = (char *)mmap(NULL, datalen, PROT_READ | PROT_WRITE,
+			   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (buf_2 == MAP_FAILED) {
+		perror("failed to mmap");
+		goto inval_key;
+	}
+	memset(buf_1, 9, datalen);
+	memset(buf_2, 9, datalen);
+
+	ret = syscall(sys_encrypt_mprotect, buf_1, datalen,
+		      PROT_READ | PROT_WRITE, key);
+	if (ret)
+		perror("mprotect error");
+
+	if (!memcmp(buf_1, buf_2, datalen))
+		fprintf(stderr, "Error: bufs should not have matched\n");
+
+free_memory:
+	ret = munmap(buf_1, datalen);
+	ret = munmap(buf_2, datalen);
+inval_key:
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "invalidate failed on key [%d]\n", key);
+}
+
+void test_zero_page(void)
+{
+	/*
+	 * A write access to the zero page causes it to be replaced with
+	 * a newly allocated page.
+	 * Can this be seen in smaps?
+	 */
+}
+
diff --git a/tools/testing/selftests/x86/mktme/key_tests.c b/tools/testing/selftests/x86/mktme/key_tests.c
new file mode 100644
index 000000000000..ff4c18dbf533
--- /dev/null
+++ b/tools/testing/selftests/x86/mktme/key_tests.c
@@ -0,0 +1,526 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ *  Testing payload options
+ *
+ *  Invalid options should return -EINVAL, not a Key.
+ *  TODO This is just checking for the Key.
+ *       Add a check for the actual -EINVAL return.
+ *
+ *  Invalid option cases are grouped based on why they are invalid.
+ *  Valid option cases are one large array of expected goodness
+ *
+ */
+const char *bad_type_tail = "algorithm=aes-xts-128 key=12345678123456781234567812345678 tweak=12345678123456781234567812345678";
+const char *bad_type[] = {
+	"type=",			/* missing */
+	"type=cpu, type=cpu",		/* duplicate good */
+	"type=cpu, type=user",
+	"type=user, type=user",
+	"type=user, type=cpu",
+	"type=cp",			/* spelling */
+	"type=cpus",
+	"type=pu",
+	"type=cpucpu",
+	"type=useruser",
+	"type=use",
+	"type=users",
+	"type=used",
+	"type=User",			/* case */
+	"type=USER",
+	"type=UsEr",
+	"type=CPU",
+	"type=Cpu",
+};
+
+const char *bad_alg_tail = "type=cpu";
+const char *bad_algorithm[] = {
+	"algorithm=",
+	"algorithm=aes-xts-12",
+	"algorithm=aes-xts-128aes-xts-128",
+	"algorithm=es-xts-128",
+	"algorithm=bad",
+	"algorithm=aes-xts-128-xxxx",
+	"algorithm=xxx-aes-xts-128",
+};
+
+const char *bad_key_tail = "type=cpu algorithm=aes-xts-128 tweak=12345678123456781234567812345678";
+const char *bad_key[] = {
+	"key=",
+	"key=0",
+	"key=ababababababab",
+	"key=blah",
+	"key=0123333456789abcdef",
+	"key=abracadabra",
+	"key=-1",
+};
+
+const char *bad_tweak_tail = "type=cpu algorithm=aes-xts-128 key=12345678123456781234567812345678";
+const char *bad_tweak[] = {
+	"tweak=",
+	"tweak=ab",
+	"tweak=bad",
+	"tweak=-1",
+	"tweak=000000000000000",
+};
+
+/* Bad, missing, repeating tokens and bad overall payload length */
+const char *bad_other[] = {
+	"",
+	" ",
+	"a ",
+	"algorithm= tweak= type= key=",
+	"key=aaaaaaaaaaaaaaaa tweak=aaaaaaaaaaaaaaaa type=cpu",
+	"algorithm=aes-xts-128 tweak=0000000000000000 tweak=aaaaaaaaaaaaaaaa key=0000000000000000  type=cpu",
+	"algorithm=aes-xts-128 tweak=0000000000000000 key=0000000000000000 key=0000000000000000 type=cpu",
+	"algorithm=aes-xts-128 tweak=0000000000000000 key=0000000000000000  type=cpu type=cpu",
+	"algorithm=aes-xts-128 tweak=0000000000000000 key=0000000000000000  type=cpu type=user",
+	"tweak=0000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
+};
+
+void test_invalid_options(const char *bad_options[], unsigned int size,
+			  const char *good_tail, char *descrip)
+{
+	key_serial_t key[size];
+	char options[512];
+	char name[15];
+	int i, ret;
+
+	for (i = 0; i < size; i++) {
+		sprintf(name, "mk_inv_%d", i);
+		sprintf(options, "%s %s", bad_options[i], good_tail);
+
+		key[i] = add_key("mktme", name, options,
+				 strlen(options),
+				 KEY_SPEC_THREAD_KEYRING);
+		if (key[i] > 0)
+			fprintf(stderr, "Error %s: [%s] accepted.\n",
+				descrip, bad_options[i]);
+	}
+	for (i = 0; i < size; i++) {
+		if (key[i] > 0) {
+			ret = keyctl(KEYCTL_INVALIDATE, key[i]);
+			if (ret == -1)
+				fprintf(stderr, "Key invalidate failed: [%d]\n",
+					key[i]);
+		}
+	}
+}
+
+void test_keys_invalid_options(void)
+{
+	test_invalid_options(bad_type, ARRAY_SIZE(bad_type),
+			     bad_type_tail, "Invalid Type Option");
+	test_invalid_options(bad_algorithm, ARRAY_SIZE(bad_algorithm),
+			     bad_alg_tail, "Invalid Algorithm Option");
+	test_invalid_options(bad_key, ARRAY_SIZE(bad_key),
+			     bad_key_tail, "Invalid Key Option");
+	test_invalid_options(bad_tweak, ARRAY_SIZE(bad_tweak),
+			     bad_tweak_tail, "Invalid Tweak Option");
+	test_invalid_options(bad_other, ARRAY_SIZE(bad_other),
+			     "", "Invalid Option");
+}
+
+const char *valid_options[] = {
+	"algorithm=aes-xts-128 type=user key=0123456789abcdef0123456789abcdef tweak=abababababababababababababababab",
+	"algorithm=aes-xts-128 type=user tweak=0123456789abcdef0123456789abcdef key=abababababababababababababababab",
+	"algorithm=aes-xts-128 type=user key=01010101010101010101010101010101 tweak=0123456789abcdef0123456789abcdef",
+	"algorithm=aes-xts-128 tweak=01010101010101010101010101010101 type=user key=0123456789abcdef0123456789abcdef",
+	"algorithm=aes-xts-128 key=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa tweak=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa type=user",
+	"algorithm=aes-xts-128 tweak=aaaaaaaaaaaaaaaa0000000000000000 key=aaaaaaaaaaaaaaaa0000000000000000  type=user",
+	"algorithm=aes-xts-128 type=cpu key=aaaaaaaaaaaaaaaa0123456789abcdef tweak=abababaaaaaaaaaaaaaaaaababababab",
+	"algorithm=aes-xts-128 type=cpu tweak=0123456aaaaaaaaaaaaaaaa789abcdef key=abababaaaaaaaaaaaaaaaaababababab",
+	"algorithm=aes-xts-128 type=cpu key=010101aaaaaaaaaaaaaaaa0101010101 tweak=01234567aaaaaaaaaaaaaaaa89abcdef",
+	"algorithm=aes-xts-128 tweak=01010101aaaaaaaaaaaaaaaa01010101 type=cpu key=012345aaaaaaaaaaaaaaaa6789abcdef",
+	"algorithm=aes-xts-128 key=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa tweak=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa type=cpu",
+	"algorithm=aes-xts-128 tweak=00000000000000000000000000000000 type=cpu",
+	"algorithm=aes-xts-128 key=00000000000000000000000000000000 type=cpu",
+	"algorithm=aes-xts-128 type=cpu",
+	"algorithm=aes-xts-128 tweak=00000000000000000000000000000000 key=00000000000000000000000000000000 type=cpu",
+	"algorithm=aes-xts-128 tweak=00000000000000000000000000000000 key=00000000000000000000000000000000 type=cpu",
+};
+
+void test_keys_valid_options(void)
+{
+	char name[15];
+	int i, ret;
+	key_serial_t key[ARRAY_SIZE(valid_options)];
+
+	for (i = 0; i < ARRAY_SIZE(valid_options); i++) {
+		sprintf(name, "mk_val_%d", i);
+		key[i] = add_key("mktme", name, valid_options[i],
+				 strlen(valid_options[i]),
+				 KEY_SPEC_THREAD_KEYRING);
+		if (key[i] <= 0)
+			fprintf(stderr, "Fail valid option: [%s]\n",
+				valid_options[i]);
+	}
+	for (i = 0; i < ARRAY_SIZE(valid_options); i++) {
+		if (key[i] > 0) {
+			ret = keyctl(KEYCTL_INVALIDATE, key[i]);
+			if (ret)
+				fprintf(stderr, "Invalidate failed key[%d]\n",
+					key[i]);
+		}
+	}
+}
+
+/*
+ *  key_serial_t add_key(const char *type, const char *description,
+ *			 const void *payload, size_t plen,
+ *			 key_serial_t keyring);
+ *
+ *  The Kernel Key Service should validate this. But, let's validate
+ *  some basic syntax. MKTME keys do NOT propose a description based
+ *  on type and payload if no description is provided. (Some other key
+ *  types do make that 'proposal'.)
+ */
+
+void test_keys_descriptor(void)
+{
+	key_serial_t key;
+
+	key = add_key("mktme", NULL, options_CPU_long, strlen(options_CPU_long),
+		      KEY_SPEC_THREAD_KEYRING);
+
+	if (key != -1 || errno != EINVAL)
+		fprintf(stderr, "Fail: expected EINVAL with NULL descriptor\n");
+
+	if (key > 0)
+		if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+			fprintf(stderr, "Key invalidate failed: %s\n",
+				strerror(errno));
+
+	key = add_key("mktme", "", options_CPU_long, strlen(options_CPU_long),
+		      KEY_SPEC_THREAD_KEYRING);
+
+	if (key != -1 || errno != EINVAL)
+		fprintf(stderr,
+			"Fail: expected EINVAL with empty descriptor\n");
+
+	if (key > 0)
+		if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+			fprintf(stderr, "Key invalidate failed: %s\n",
+				strerror(errno));
+}
+
+/*
+ * Test: Add multiple keys with the same descriptor
+ *
+ * Expect that the same Key Handle (key_serial_t) will be returned
+ * on each subsequent request for the same key. This is treated like
+ * a key update.
+ */
+
+void test_keys_add_mult_same(void)
+{
+	int i, inval, num_keys = 5;
+	key_serial_t key[num_keys + 1];	/* indexed 1..num_keys */
+
+	for (i = 1; i <= num_keys; i++) {
+		key[i] = add_key("mktme", "multiple_keys",
+				 options_USER,
+				 strlen(options_USER),
+				 KEY_SPEC_THREAD_KEYRING);
+
+		if (i > 1)
+			if (key[i] != key[i - 1]) {
+				fprintf(stderr, "Fail: expected same key.\n");
+				inval = i;    /* maybe i keys to invalidate */
+				goto out;
+			}
+	}
+	inval = 1;    /* if all works correctly, only 1 key to invalidate */
+out:
+	for (i = 1; i <= inval; i++) {
+		if (keyctl(KEYCTL_INVALIDATE, key[i]) == -1)
+			fprintf(stderr, "Key invalidate failed: %s\n",
+				strerror(errno));
+	}
+}
+
+/*
+ * Add two keys with the same descriptor but different payloads.
+ * The result should be one key with the payload from the second
+ * add_key() request. Key Service recognizes the duplicate
+ * descriptor and allows the payload to be updated.
+ *
+ * mktme key type chooses not to support the keyctl read command.
+ * This means we cannot read the key payloads back to compare.
+ * That piece can only be verified in debug mode.
+ */
+void test_keys_change_payload(void)
+{
+	key_serial_t key_a, key_b;
+
+	key_a = add_key("mktme", "changepay", options_USER,
+			strlen(options_USER), KEY_SPEC_THREAD_KEYRING);
+	if (key_a == -1) {
+		fprintf(stderr, "Failed to add test key_a: %s\n",
+			strerror(errno));
+		return;
+	}
+	key_b = add_key("mktme", "changepay", options_CPU_long,
+			strlen(options_CPU_long), KEY_SPEC_THREAD_KEYRING);
+	if (key_b == -1) {
+		fprintf(stderr, "Failed to add test key_b: %s\n",
+			strerror(errno));
+		goto out;
+	}
+	if (key_a != key_b) {
+		fprintf(stderr, "Fail: expected same key, got new key.\n");
+		if (keyctl(KEYCTL_INVALIDATE, key_b) == -1)
+			fprintf(stderr, "Key invalidate failed: %s\n",
+				strerror(errno));
+	}
+out:
+	if (keyctl(KEYCTL_INVALIDATE, key_a) == -1)
+		fprintf(stderr, "Key invalidate failed: %s\n", strerror(errno));
+}
+
+/*  Add a key, then discard via method parameter: revoke or invalidate */
+void test_keys_add_discard(int method)
+{
+	key_serial_t key;
+
+	key = add_key("mktme", "mtest_add_discard", options_USER,
+		      strlen(options_USER), KEY_SPEC_THREAD_KEYRING);
+	if (key < 0) {
+		perror("add_key");
+		return;
+	}
+	if (keyctl(method, key) == -1)
+		fprintf(stderr, "Key %s failed: %s\n",
+			((method == KEYCTL_INVALIDATE) ? "invalidate"
+			: "revoke"), strerror(errno));
+}
+
+void test_keys_add_invalidate(void)
+{
+	test_keys_add_discard(KEYCTL_INVALIDATE);
+}
+
+void test_keys_add_revoke(void)
+{
+	if (remove_gc_delay()) {
+		fprintf(stderr, "Skipping REVOKE test. Cannot set gc_delay.\n");
+		return;
+	}
+	test_keys_add_discard(KEYCTL_REVOKE);
+	restore_gc_delay();
+}
+
+void test_keys_describe(void)
+{
+	key_serial_t key;
+	char buf[256];
+	int ret;
+
+	key = add_key("mktme", "describe_this_key", options_USER,
+		      strlen(options_USER), KEY_SPEC_THREAD_KEYRING);
+
+	if (key == -1) {
+		fprintf(stderr, "Add_key failed.\n");
+		return;
+	}
+	if (keyctl(KEYCTL_DESCRIBE, key, buf, sizeof(buf)) == -1) {
+		fprintf(stderr, "%s: KEYCTL_DESCRIBE failed\n", __func__);
+		goto revoke_key;
+	}
+	if (strncmp(buf, "mktme", 5))
+		fprintf(stderr, "Error: mktme descriptor missing.\n");
+
+revoke_key:
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "Key invalidate failed: %s\n", strerror(errno));
+}
+
+void test_keys_update_explicit(void)
+{
+	key_serial_t key;
+
+	key = add_key("mktme", "testkey", options_USER, strlen(options_USER),
+		      KEY_SPEC_SESSION_KEYRING);
+
+	if (key == -1) {
+		perror("add_key");
+		return;
+	}
+	if (keyctl(KEYCTL_UPDATE, key, options_CPU_long,
+		   strlen(options_CPU_long)) == -1)
+		fprintf(stderr, "Error: Update key failed\n");
+
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "Key invalidate failed: %s\n", strerror(errno));
+}
+
+void test_keys_update_clear(void)
+{
+	key_serial_t key;
+
+	key = add_key("mktme", "testkey", options_USER, strlen(options_USER),
+		      KEY_SPEC_SESSION_KEYRING);
+
+	if (keyctl(KEYCTL_UPDATE, key, options_CLEAR,
+		   strlen(options_CLEAR)) == -1)
+		fprintf(stderr, "update: clear key failed\n");
+
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "Key invalidate failed: %s\n", strerror(errno));
+}
+
+void test_keys_no_encrypt(void)
+{
+	key_serial_t key;
+
+	key = add_key("mktme", "no_encrypt_key", options_NOENCRYPT,
+		      strlen(options_NOENCRYPT), KEY_SPEC_SESSION_KEYRING);
+
+	if (key == -1) {
+		fprintf(stderr, "Error: add_key type=no_encrypt failed.\n");
+		return;
+	}
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		fprintf(stderr, "Key invalidate failed: %s\n", strerror(errno));
+}
+
+void test_keys_unique_keyid(void)
+{
+	/*
+	 * exists[] array must be of mktme_nr_keyids + 1 size, else the
+	 * uniqueness test will fail. OK for max_keyids under test to be
+	 * less than mktme_nr_keyids.
+	 */
+	unsigned int exists[max_keyids + 1];
+	unsigned int keyids[max_keyids + 1];
+	key_serial_t key[max_keyids + 1];
+	void *ptr[max_keyids + 1];
+	int keys_available = 0;
+	char name[16];	/* "mk_unique_" + up to 2 digits + NUL needs 13 */
+	int i, ret;
+
+	/* Get as many keys as possible */
+	for (i = 1; i <= max_keyids; i++) {
+		sprintf(name, "mk_unique_%d", i);
+		key[i] = add_key("mktme", name, options_CPU_short,
+				 strlen(options_CPU_short),
+				 KEY_SPEC_THREAD_KEYRING);
+		if (key[i] > 0)
+			keys_available++;
+	}
+	/* Create mappings, encrypt them, and find the assigned KeyIDs */
+	for (i = 1; i <= keys_available; i++) {
+		ptr[i] = mmap(NULL, PAGE_SIZE, PROT_NONE,
+			      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+		ret = syscall(sys_encrypt_mprotect, ptr[i], PAGE_SIZE,
+			      PROT_NONE, key[i]);
+		keyids[i] = find_smaps_keyid((unsigned long)ptr[i]);
+	}
+	/* Verify the KeyIDs are unique */
+	memset(exists, 0, sizeof(exists));
+	for (i = 1; i <= keys_available; i++) {
+		if (exists[keyids[i]])
+			fprintf(stderr, "Error: duplicate keyid %d\n",
+				keyids[i]);
+		exists[keyids[i]] = 1;
+	}
+
+	/* Clean up */
+	for (i = 1; i <= keys_available; i++) {
+		ret = munmap(ptr[i], PAGE_SIZE);
+		if (keyctl(KEYCTL_INVALIDATE, key[i]) == -1)
+			fprintf(stderr, "Invalidate failed Serial:%d\n",
+				key[i]);
+	}
+	sleep(1);  /* Rest a bit while keys get freed. */
+}
+
+void test_keys_get_max_keyids(void)
+{
+	key_serial_t key[max_keyids + 1];
+	int keys_available = 0;
+	char name[12];
+	int i, ret;
+
+	for (i = 1; i <= max_keyids; i++) {
+		sprintf(name, "mk_get63_%d", i);
+		key[i] = add_key("mktme", name, options_CPU_short,
+				 strlen(options_CPU_short),
+				 KEY_SPEC_THREAD_KEYRING);
+		if (key[i] > 0)
+			keys_available++;
+	}
+
+	fprintf(stderr, "     Info: got %d of %d system keys\n",
+		keys_available, max_keyids);
+
+	for (i = 1; i <= keys_available; i++) {
+		if (keyctl(KEYCTL_INVALIDATE, key[i]) == -1)
+			fprintf(stderr, "Invalidate failed Serial:%d\n",
+				key[i]);
+	}
+	sleep(1);  /* Rest a bit while keys get freed. */
+}
+
+/*
+ * TODO: Run out of keys, release 1, grab it, repeat
+ * This test is not completed and is not in the run list.
+ */
+void test_keys_max_out(void)
+{
+	key_serial_t key[max_keyids + 1];
+	int keys_available;
+	char name[12];
+	int i, ret;
+
+	/* Get all the keys or as many as possible: keys_available */
+	for (i = 1; i <= max_keyids; i++) {
+		sprintf(name, "mk_max_%d", i);
+		key[i] = add_key("mktme", name, options_CPU_short,
+				 strlen(options_CPU_short),
+				 KEY_SPEC_THREAD_KEYRING);
+		if (key[i] < 0) {
+			fprintf(stderr, "failed to get key[%d]\n", i);
+			break;
+		}
+	}
+	keys_available = i - 1;
+	if (keys_available < max_keyids)
+		printf("Error: only got %d keys, expected %d\n",
+		       keys_available, max_keyids);
+
+	for (i = 1; i <= keys_available; i++) {
+		if (keyctl(KEYCTL_INVALIDATE, key[i]) == -1)
+			fprintf(stderr, "Invalidate failed key:%d\n", key[i]);
+	}
+}
+
+/* Add each type of key */
+void test_keys_add_each_type(void)
+{
+	key_serial_t key;
+	int i;
+
+	const char *options[] = {
+		options_CPU_short, options_CPU_long, options_USER,
+		options_CLEAR, options_NOENCRYPT
+	};
+	static const char *opt_name[] = {
+		"add_key cpu_short", "add_key cpu_long", "add_key user",
+		"add_key clear", "add_key no-encrypt"
+	};
+
+	for (i = 0; i < ARRAY_SIZE(options); i++) {
+		key = add_key("mktme", opt_name[i], options[i],
+			      strlen(options[i]), KEY_SPEC_SESSION_KEYRING);
+
+		if (key == -1) {
+			perror(opt_name[i]);
+		} else {
+			perror(opt_name[i]);
+			if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+				fprintf(stderr, "Key invalidate failed: %d\n",
+					key);
+		}
+	}
+}
diff --git a/tools/testing/selftests/x86/mktme/mktme_test.c b/tools/testing/selftests/x86/mktme/mktme_test.c
new file mode 100644
index 000000000000..6409ccf94d4a
--- /dev/null
+++ b/tools/testing/selftests/x86/mktme/mktme_test.c
@@ -0,0 +1,300 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Tests x86 MKTME Multi-Key Memory Protection
+ *
+ * COMPILE w keyutils library ==>  cc -o mktest mktme_test.c -lkeyutils
+ *
+ * Test requires capability of CAP_SYS_RESOURCE, or CAP_SYS_ADMIN.
+ * $ sudo setcap 'CAP_SYS_RESOURCE+ep' mktest
+ *
+ * Some tests may require root privileges because the test needs to
+ * remove the garbage collection delay /proc/sys/kernel/keys/gc_delay
+ * while testing. This keeps the tests (and system) from appearing to
+ * be out of keys when keys are simply awaiting the next scheduled
+ * garbage collection.
+ *
+ * Documentation/x86/mktme.rst
+ *
+ * There are examples in here of:
+ *  * how to use the Kernel Key Service MKTME API to allocate keys
+ *  * how to use the MKTME Memory Encryption API to encrypt memory
+ *
+ * Adding Tests:
+ *	o Each test should run independently and clean up after itself.
+ *	o There are no dependencies among tests.
+ *	o Tests that use a lot of keys should consider adding sleep(),
+ *	  so that the next test isn't key-starved.
+ *	o Make no assumptions about the order in which tests will run.
+ *	o There are shared defines that can be used for setting
+ *	  payload options.
+ */
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <errno.h>
+#include <keyutils.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
+#define PAGE_SIZE sysconf(_SC_PAGE_SIZE)
+#define sys_encrypt_mprotect 335
+
+/*  TODO get this from kernel. Add to /proc/sys/kernel/keys/ */
+int max_keyids = 63;
+
+/* Use these pre-defined options to simplify the add_key() setup */
+char *options_CPU_short = "algorithm=aes-xts-128 type=cpu";
+char *options_CPU_long = "algorithm=aes-xts-128 type=cpu key=12345678912345671234567891234567 tweak=12345678912345671234567891234567";
+char *options_USER = "algorithm=aes-xts-128 type=user key=12345678912345671234567891234567 tweak=12345678912345671234567891234567";
+char *options_CLEAR = "type=clear";
+char *options_NOENCRYPT = "type=no-encrypt";
+
+/* Helper to check Encryption_KeyID in /proc/self/smaps */
+static FILE *seek_to_smaps_entry(unsigned long addr)
+{
+	FILE *file;
+	char *line = NULL;
+	size_t size = 0;
+	unsigned long start, end;
+	char perms[5];
+	unsigned long offset;
+	char dev[32];
+	unsigned long inode;
+	char path[BUFSIZ];
+
+	file = fopen("/proc/self/smaps", "r");
+	if (!file) {
+		perror("fopen smaps");
+		_exit(1);
+	}
+	while (getline(&line, &size, file) > 0) {
+		if (sscanf(line, "%lx-%lx %4s %lx %31s %lu %s\n",
+			   &start, &end, perms, &offset, dev, &inode, path) < 6)
+			goto next;
+
+		if (start <= addr && addr < end)
+			goto out;
+next:
+		free(line);
+		line = NULL;
+		size = 0;
+	}
+	fclose(file);
+	file = NULL;
+out:
+	free(line);
+	return file;
+}
+
+/* Find the KeyID for this addr from /proc/self/smaps */
+unsigned int find_smaps_keyid(unsigned long addr)
+{
+	unsigned int keyid = 0;
+	char *line = NULL;
+	size_t size = 0;
+	FILE *smaps;
+
+	smaps = seek_to_smaps_entry(addr);
+	if (!smaps) {
+		printf("Unable to parse /proc/self/smaps\n");
+		goto out;
+	}
+	while (getline(&line, &size, smaps) > 0) {
+		if (!strstr(line, "KeyID:")) {
+			free(line);
+			line = NULL;
+			size = 0;
+			continue;
+		}
+		if (sscanf(line, "KeyID:             %5u\n", &keyid) < 1)
+			printf("Unable to parse smaps for KeyID:%s\n", line);
+		break;
+	}
+	free(line);
+	fclose(smaps);
+out:
+	return keyid;
+}
+
+/*
+ * Set the garbage collection delay to 0, so that keys are quickly
+ * available for re-use while running the selftests.
+ *
+ * Most tests use INVALIDATE to remove a key, which has no delay by
+ * design. But, revoke, unlink, and timeout still have a delay, so
+ * they should use this.
+ */
+char current_gc_delay[10] = {0};
+static inline int remove_gc_delay(void)
+{
+	int fd;
+
+	fd = open("/proc/sys/kernel/keys/gc_delay", O_RDWR | O_NONBLOCK);
+	if (fd < 0) {
+		perror("Failed to open /proc/sys/kernel/keys/gc_delay");
+		return -1;
+	}
+	if (read(fd, current_gc_delay, sizeof(current_gc_delay) - 1) <= 0) {
+		perror("Failed to read /proc/sys/kernel/keys/gc_delay");
+		close(fd);
+		return -1;
+	}
+	lseek(fd, 0, SEEK_SET);
+	if (write(fd, "0", sizeof(char)) != sizeof(char)) {
+		perror("Failed to write 0 to gc_delay");
+		close(fd);
+		return -1;
+	}
+	close(fd);
+	return 0;
+}
+
+static inline void restore_gc_delay(void)
+{
+	int fd;
+
+	fd = open("/proc/sys/kernel/keys/gc_delay", O_RDWR | O_NONBLOCK);
+	if (fd < 0) {
+		perror("Failed to open /proc/sys/kernel/keys/gc_delay");
+		return;
+	}
+	if (write(fd, current_gc_delay, strlen(current_gc_delay)) !=
+	    strlen(current_gc_delay)) {
+		perror("Failed to restore gc_delay");
+		close(fd);
+		return;
+	}
+	close(fd);
+}
+
+/*
+ * The tests are sorted into 3 categories:
+ * key_tests and encrypt_tests focus on their specific APIs;
+ * flow_tests are special flows and regression tests of prior issues.
+ */
+
+#include "key_tests.c"
+#include "encrypt_tests.c"
+#include "flow_tests.c"
+
+struct tlist {
+	const char *name;
+	void (*func)(void);
+};
+
+static const struct tlist mktme_tests[] = {
+{"Keys: Add each type key",		test_keys_add_each_type		},
+{"Flow: One simple roundtrip",		test_one_simple_round_trip	},
+{"Keys: Valid Payload Options",		test_keys_valid_options		},
+{"Keys: Invalid Payload Options",	test_keys_invalid_options	},
+{"Keys: Add Key Descriptor Field",	test_keys_descriptor		},
+{"Keys: Add Multiple Same",		test_keys_add_mult_same		},
+{"Keys: Change payload, auto update",	test_keys_change_payload	},
+{"Keys: Update, explicit update",	test_keys_update_explicit	},
+{"Keys: Update, Clear",			test_keys_update_clear		},
+{"Keys: Add, Invalidate Keys",		test_keys_add_invalidate	},
+{"Keys: Add, Revoke Keys",		test_keys_add_revoke		},
+{"Keys: Keyctl Describe",		test_keys_describe		},
+{"Keys: Clear",				test_keys_update_clear		},
+{"Keys: No Encrypt",			test_keys_no_encrypt		},
+{"Keys: Unique KeyIDs",			test_keys_unique_keyid		},
+{"Keys: Get Max KeyIDs",		test_keys_get_max_keyids	},
+{"Encrypt: Parameter Alignment",	test_param_alignment		},
+{"Encrypt: Change Protections",		test_change_protections		},
+{"Encrypt: Swap Keys",			test_key_swap			},
+{"Encrypt: Counters Same Key",		test_counters_same		},
+{"Encrypt: Counters Diff Key",		test_counters_diff		},
+{"Encrypt: Counters Holes",		test_counters_holes		},
+/*
+{"Encrypt: Split",			test_split			},
+{"Encrypt: Well Suited",		test_well_suited		},
+{"Encrypt: Not Suited",			test_not_suited			},
+*/
+{"Flow: Switch key no data",		test_switch_key_no_data		},
+{"Flow: Switch key multi VMAs",		test_switch_key_mult_vmas	},
+{"Flow: Switch No Key to Any Key",	test_switch_key0_to_key		},
+{"Flow: madvise",			test_kai_madvise		},
+{"Flow: Invalidate In Use Key",		test_discard_in_use_key		},
+};
+
+void print_usage(void)
+{
+	fprintf(stderr, "Usage: mktme_test [options]...\n"
+		"  -a			Run ALL tests\n"
+		"  -t <testnum>		Run one <testnum> test\n"
+		"  -l			List available tests\n"
+		"  -h, -?		Show this help\n"
+	       );
+}
+
+int main(int argc, char *argv[])
+{
+	int test_selected = -1;
+	char printtest[12];
+	int trace = 0;
+	int i, c, err;
+	char *temp;
+
+	/*
+	 * TODO: Default case needs to run 'selftests' -  a
+	 * curated set of tests that validate functionality but
+	 * don't hog resources.
+	 */
+	c = getopt(argc, argv, "at:lph?");
+	switch (c) {
+	case 'a':
+		test_selected = -1;
+		printf("Test Selected [ALL]\n");
+		break;
+	case 't':
+		test_selected = strtoul(optarg, &temp, 10);
+		printf("Test Selected [%d]\n", test_selected);
+		break;
+	case 'l':
+		for (i = 0; i < ARRAY_SIZE(mktme_tests); i++)
+			printf("[%2d] %s\n", i + 1,
+			       mktme_tests[i].name);
+		exit(0);
+	case 'p':
+		trace = 1;
+		break;
+	case 'h':
+	case '?':
+	default:
+		print_usage();
+		exit(0);
+	}
+
+/*
+ *	if (!cpu_has_mktme()) {
+ *		printf("MKTME not supported on this system.\n");
+ *		exit(0);
+ *	}
+ */
+	if (trace) {
+		printf("Pausing: start trace on PID[%d]\n", (int)getpid());
+		getchar();
+	}
+
+	if (test_selected == -1) {
+		for (i = 0; i < ARRAY_SIZE(mktme_tests); i++) {
+			printf("[%2d] %s\n", i + 1, mktme_tests[i].name);
+			mktme_tests[i].func();
+		}
+		printf("\nTests Completed\n");
+	} else if (test_selected > 0 &&
+		   test_selected <= ARRAY_SIZE(mktme_tests)) {
+		printf("[%2d] %s\n", test_selected,
+		       mktme_tests[test_selected - 1].name);
+		mktme_tests[test_selected - 1].func();
+		printf("\nTest Completed\n");
+	} else {
+		print_usage();
+	}
+	exit(0);
+}
-- 
2.20.1


* [PATCH, RFC 49/62] mm, x86: export several MKTME variables
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (47 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 48/62] selftests/x86/mktme: Test the MKTME APIs Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-06-14 11:56   ` Peter Zijlstra
  2019-05-08 14:44 ` [PATCH, RFC 50/62] kvm, x86, mmu: setup MKTME keyID to spte for given PFN Kirill A. Shutemov
                   ` (14 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Kai Huang <kai.huang@linux.intel.com>

KVM needs these variables to get and set the memory encryption mask.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
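For readers following along, the relationship between the three exported
variables can be sketched in userspace C. This is only a sketch under stated
assumptions: the struct and function names are hypothetical, and the layout
(KeyID bits taken from the top of the physical address, with example widths
of 52 and 6 bits) is assumed for illustration rather than taken from this
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the exported kernel variables. */
struct keyid_layout {
	uint64_t mask;		/* plays the role of mktme_keyid_mask */
	int shift;		/* plays the role of mktme_keyid_shift */
	int nr_keyids;		/* plays the role of mktme_nr_keyids */
};

/*
 * Assumption: the KeyID occupies the top keyid_bits of a phys_bits-wide
 * physical address. KeyID-0 is excluded from nr_keyids.
 */
static inline struct keyid_layout keyid_layout_init(int phys_bits,
						    int keyid_bits)
{
	struct keyid_layout l;

	assert(keyid_bits > 0 && keyid_bits < phys_bits && phys_bits <= 52);

	l.shift = phys_bits - keyid_bits;
	l.mask = ((UINT64_C(1) << keyid_bits) - 1) << l.shift;
	l.nr_keyids = (1 << keyid_bits) - 1;
	return l;
}

/* Extract the KeyID field from a physical address. */
static inline unsigned int keyid_from_phys(const struct keyid_layout *l,
					   uint64_t paddr)
{
	return (unsigned int)((paddr & l->mask) >> l->shift);
}
```

A consumer such as KVM only needs the mask and the shift to move a KeyID in
and out of an address; the count bounds the valid KeyID range.
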
 arch/x86/mm/mktme.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index df70651816a1..12f4266cf7ea 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -7,13 +7,16 @@
 
 /* Mask to extract KeyID from physical address. */
 phys_addr_t mktme_keyid_mask;
+EXPORT_SYMBOL_GPL(mktme_keyid_mask);
 /*
  * Number of KeyIDs available for MKTME.
  * Excludes KeyID-0 which used by TME. MKTME KeyIDs start from 1.
  */
 int mktme_nr_keyids;
+EXPORT_SYMBOL_GPL(mktme_nr_keyids);
 /* Shift of KeyID within physical address. */
 int mktme_keyid_shift;
+EXPORT_SYMBOL_GPL(mktme_keyid_shift);
 
 DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
 EXPORT_SYMBOL_GPL(mktme_enabled_key);
-- 
2.20.1


* [PATCH, RFC 50/62] kvm, x86, mmu: setup MKTME keyID to spte for given PFN
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (48 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 49/62] mm, x86: export several MKTME variables Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 51/62] iommu/vt-d: Support MKTME in DMA remapping Kirill A. Shutemov
                   ` (13 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Kai Huang <kai.huang@linux.intel.com>

Set up the keyID in the SPTE, which is eventually programmed into the
shadow MMU or EPT table, according to the keyID associated with the page,
so that the guest uses the correct keyID to access guest memory.

Note that the current shadow_me_mask doesn't suit MKTME's needs: MKTME has
no fixed memory encryption mask, because the keyID can vary from 1 to the
maximum keyID. Therefore shadow_me_mask remains 0 for MKTME.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
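The per-page mask described above can be illustrated with a small userspace
sketch. The SKETCH_* constants and sketch_* functions are hypothetical names
with assumed example values (shift 46, 63 KeyIDs); the real code obtains the
KeyID from page_keyid() and the shift from mktme_keyid_shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* 4 KiB pages */
#define SKETCH_KEYID_SHIFT	46	/* assumed value of mktme_keyid_shift */
#define SKETCH_NR_KEYIDS	63	/* assumed value of mktme_nr_keyids */

/* Per-page encryption mask: the KeyID moved into its address bits. */
static inline uint64_t sketch_encryption_mask(unsigned int keyid)
{
	assert(keyid <= SKETCH_NR_KEYIDS);
	return (uint64_t)keyid << SKETCH_KEYID_SHIFT;
}

/* Compose the address part of an SPTE; MMIO pages get no KeyID bits. */
static inline uint64_t sketch_make_spte(uint64_t pfn, unsigned int keyid,
					bool is_mmio)
{
	uint64_t spte = 0;

	if (!is_mmio)
		spte |= sketch_encryption_mask(keyid);
	spte |= pfn << SKETCH_PAGE_SHIFT;
	return spte;
}
```

Unlike a single fixed mask, the value ORed into the SPTE here differs from
page to page, which is why a global shadow_me_mask cannot express it.
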
 arch/x86/kvm/mmu.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d9c7b45d231f..bfee0c194161 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2899,6 +2899,22 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 #define SET_SPTE_WRITE_PROTECTED_PT	BIT(0)
 #define SET_SPTE_NEED_REMOTE_TLB_FLUSH	BIT(1)
 
+static u64 get_phys_encryption_mask(kvm_pfn_t pfn)
+{
+#ifdef CONFIG_X86_INTEL_MKTME
+	struct page *page;
+
+	if (!pfn_valid(pfn))
+		return 0;
+
+	page = pfn_to_page(pfn);
+
+	return ((u64)page_keyid(page)) << mktme_keyid_shift;
+#else
+	return shadow_me_mask;
+#endif
+}
+
 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		    unsigned pte_access, int level,
 		    gfn_t gfn, kvm_pfn_t pfn, bool speculative,
@@ -2945,7 +2961,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		pte_access &= ~ACC_WRITE_MASK;
 
 	if (!kvm_is_mmio_pfn(pfn))
-		spte |= shadow_me_mask;
+		spte |= get_phys_encryption_mask(pfn);
 
 	spte |= (u64)pfn << PAGE_SHIFT;
 
-- 
2.20.1


* [PATCH, RFC 51/62] iommu/vt-d: Support MKTME in DMA remapping
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (49 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 50/62] kvm, x86, mmu: setup MKTME keyID to spte for given PFN Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-06-14 12:04   ` Peter Zijlstra
  2019-05-08 14:44 ` [PATCH, RFC 52/62] x86/mm: introduce common code for mem encryption Kirill A. Shutemov
                   ` (12 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Jacob Pan <jacob.jun.pan@linux.intel.com>

When MKTME is enabled, the keyid is stored in the high-order bits of the
physical address. For DMA transactions targeting encrypted physical memory,
the keyid must be included in the IOVA to physical address translation.

This patch appends the page keyid when setting up the IOMMU PTEs. In the
reverse direction, the keyid bits are cleared in the physical address
lookup. The mapping functions of both DMA ops and IOMMU ops are covered.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
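The set and clear directions described above can be sketched as a round trip
in userspace C. The SKETCH_* constants and sketch_* helpers are hypothetical,
with an assumed 6-bit KeyID field at bit 46; they only mirror the shape of
set_pte_mktme_keyid() and dma_pte_addr() and are not the driver code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_KEYID_SHIFT	46	/* assumed mktme_keyid_shift */
#define SKETCH_KEYID_MASK	(UINT64_C(0x3f) << SKETCH_KEYID_SHIFT)
#define SKETCH_VTD_PAGE_MASK	(~UINT64_C(0xfff))	/* 4 KiB VT-d pages */

/* Replace whatever KeyID the PTE value carries with the given one. */
static inline uint64_t sketch_pte_set_keyid(uint64_t pteval,
					    unsigned int keyid)
{
	assert(keyid <= 0x3f);
	pteval &= ~SKETCH_KEYID_MASK;
	pteval |= (uint64_t)keyid << SKETCH_KEYID_SHIFT;
	return pteval;
}

/* Recover the plain physical address: drop low flag bits and KeyID bits. */
static inline uint64_t sketch_pte_addr(uint64_t pteval)
{
	return pteval & SKETCH_VTD_PAGE_MASK & ~SKETCH_KEYID_MASK;
}
```

Clearing the field before ORing in the new KeyID keeps the helper safe to
call on a value that already carries a KeyID.
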
 drivers/iommu/intel-iommu.c | 29 +++++++++++++++++++++++++++--
 include/linux/intel-iommu.h |  9 ++++++++-
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 28cb713d728c..1ff7e87e25f1 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -862,6 +862,28 @@ static void free_context_table(struct intel_iommu *iommu)
 	spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
+static inline void set_pte_mktme_keyid(unsigned long phys_pfn,
+		phys_addr_t *pteval)
+{
+	unsigned long keyid;
+
+	if (!pfn_valid(phys_pfn))
+		return;
+
+	keyid = page_keyid(pfn_to_page(phys_pfn));
+
+#ifdef CONFIG_X86_INTEL_MKTME
+	/*
+	 * When MKTME is enabled, set keyid in PTE such that DMA
+	 * remapping will include keyid in the translation from IOVA
+	 * to physical address. This applies to both user and kernel
+	 * allocated DMA memory.
+	 */
+	*pteval &= ~mktme_keyid_mask;
+	*pteval |= keyid << mktme_keyid_shift;
+#endif
+}
+
 static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 				      unsigned long pfn, int *target_level)
 {
@@ -888,7 +910,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 			break;
 
 		if (!dma_pte_present(pte)) {
-			uint64_t pteval;
+			phys_addr_t pteval;
 
 			tmp_page = alloc_pgtable_page(domain->nid);
 
@@ -896,7 +918,8 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 				return NULL;
 
 			domain_flush_cache(domain, tmp_page, VTD_PAGE_SIZE);
-			pteval = ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE;
+			pteval = (virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE;
+			set_pte_mktme_keyid(virt_to_dma_pfn(tmp_page), &pteval);
 			if (cmpxchg64(&pte->val, 0ULL, pteval))
 				/* Someone else set it while we were thinking; use theirs. */
 				free_pgtable_page(tmp_page);
@@ -2289,6 +2312,8 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 			}
 
 		}
+		set_pte_mktme_keyid(phys_pfn, &pteval);
+
 		/* We don't need lock here, nobody else
 		 * touches the iova range
 		 */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index fa364de9db18..48a377a2b896 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -34,6 +34,8 @@
 
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
+#include <asm/page.h>
+
 
 /*
  * VT-d hardware uses 4KiB page size regardless of host page size.
@@ -603,7 +605,12 @@ static inline void dma_clear_pte(struct dma_pte *pte)
 static inline u64 dma_pte_addr(struct dma_pte *pte)
 {
 #ifdef CONFIG_64BIT
-	return pte->val & VTD_PAGE_MASK;
+	u64 addr = pte->val;
+	addr &= VTD_PAGE_MASK;
+#ifdef CONFIG_X86_INTEL_MKTME
+	addr &= ~mktme_keyid_mask;
+#endif
+	return addr;
 #else
 	/* Must have a full atomic 64-bit read */
 	return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
-- 
2.20.1


* [PATCH, RFC 52/62] x86/mm: introduce common code for mem encryption
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (50 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 51/62] iommu/vt-d: Support MKTME in DMA remapping Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 16:58   ` Christoph Hellwig
  2019-05-08 14:44 ` [PATCH, RFC 53/62] x86/mm: Use common code for DMA memory encryption Kirill A. Shutemov
                   ` (11 subsequent siblings)
  63 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Jacob Pan <jacob.jun.pan@linux.intel.com>

Both Intel MKTME and AMD SME need to support DMA address translation
with encryption-related bits. This patch introduces common functions so
that the generic DMA code stays abstracted from either implementation.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
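The dispatch between the two schemes can be sketched in userspace C as
follows. Everything here is hypothetical: struct sketch_enc_cfg stands in for
the kernel's global state (sme_active(), sme_me_mask, mktme_keyid_mask,
mktme_keyid_shift), and the KeyID is passed in directly, whereas the patch
looks it up from the page behind the physical address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the platform's memory encryption state. */
struct sketch_enc_cfg {
	bool sme_active;	/* stands in for sme_active() */
	uint64_t sme_me_mask;	/* SME: one fixed encryption bit */
	uint64_t keyid_mask;	/* MKTME: multi-bit KeyID field */
	int keyid_shift;
};

/* Add encryption bits to a DMA address. */
static inline uint64_t sketch_dma_set(const struct sketch_enc_cfg *c,
				      uint64_t daddr, unsigned int keyid)
{
	uint64_t field;

	if (c->sme_active)
		return daddr | c->sme_me_mask;

	field = (uint64_t)keyid << c->keyid_shift;
	assert((field & ~c->keyid_mask) == 0);	/* KeyID must fit the field */
	return (daddr & ~c->keyid_mask) | field;
}

/* Strip encryption bits to get back the plain physical address. */
static inline uint64_t sketch_dma_clear(const struct sketch_enc_cfg *c,
					uint64_t paddr)
{
	if (c->sme_active)
		return paddr & ~c->sme_me_mask;
	return paddr & ~c->keyid_mask;
}
```

The generic DMA code only ever sees the set and clear operations, so it does
not need to know which of the two schemes is active.
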
 arch/x86/Kconfig                 |  4 ++++
 arch/x86/mm/Makefile             |  1 +
 arch/x86/mm/mem_encrypt_common.c | 28 ++++++++++++++++++++++++++++
 3 files changed, 33 insertions(+)
 create mode 100644 arch/x86/mm/mem_encrypt_common.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 62cfb381fee3..ce9642e2c31b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1505,11 +1505,15 @@ config X86_CPA_STATISTICS
 config ARCH_HAS_MEM_ENCRYPT
 	def_bool y
 
+config X86_MEM_ENCRYPT_COMMON
+	def_bool n
+
 config AMD_MEM_ENCRYPT
 	bool "AMD Secure Memory Encryption (SME) support"
 	depends on X86_64 && CPU_SUP_AMD
 	select DYNAMIC_PHYSICAL_MASK
 	select ARCH_USE_MEMREMAP_PROT
+	select X86_MEM_ENCRYPT_COMMON
 	---help---
 	  Say yes to enable support for the encryption of system memory.
 	  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4ebee899c363..89dddbc01b1b 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -55,3 +55,4 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
 
 obj-$(CONFIG_X86_INTEL_MKTME)	+= mktme.o
+obj-$(CONFIG_X86_MEM_ENCRYPT_COMMON)	+= mem_encrypt_common.o
diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c
new file mode 100644
index 000000000000..2adee65eec46
--- /dev/null
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -0,0 +1,28 @@
+#include <linux/mm.h>
+#include <linux/mem_encrypt.h>
+#include <asm/mktme.h>
+
+/*
+ * Encryption bits need to be set and cleared for both Intel MKTME and
+ * AMD SME when converting between DMA address and physical address.
+ */
+dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr)
+{
+	unsigned long keyid;
+
+	if (sme_active())
+		return __sme_set(daddr);
+	keyid = page_keyid(pfn_to_page(__phys_to_pfn(paddr)));
+
+	return (daddr & ~mktme_keyid_mask) | (keyid << mktme_keyid_shift);
+}
+EXPORT_SYMBOL_GPL(__mem_encrypt_dma_set);
+
+phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
+{
+	if (sme_active())
+		return __sme_clr(paddr);
+
+	return paddr & ~mktme_keyid_mask;
+}
+EXPORT_SYMBOL_GPL(__mem_encrypt_dma_clear);
-- 
2.20.1


* [PATCH, RFC 53/62] x86/mm: Use common code for DMA memory encryption
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (51 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 52/62] x86/mm: introduce common code for mem encryption Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 54/62] x86/mm: Disable MKTME on incompatible platform configurations Kirill A. Shutemov
                   ` (10 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Jacob Pan <jacob.jun.pan@linux.intel.com>

Replace the sme_ helpers with the common x86 memory encryption code so
that Intel MKTME can be supported underneath the generic DMA code.
The results of dma_to_phys() and phys_to_dma() are now modified at
runtime by the memory encryption code.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
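A note for reviewers, not part of the commit log: the MKTME branch of
__mem_encrypt_dma_set() and __mem_encrypt_dma_clear() boils down to masking
and shifting. Below is a minimal userspace sketch of that arithmetic.
KEYID_SHIFT and KEYID_BITS are made-up values chosen for illustration (in
the kernel the layout comes from mktme_keyid_shift and mktme_keyid_mask at
runtime), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout for this example only: 6 KeyID bits starting at bit 46. */
#define KEYID_SHIFT 46
#define KEYID_BITS  6
#define KEYID_MASK  (((UINT64_C(1) << KEYID_BITS) - 1) << KEYID_SHIFT)

/* Replace whatever KeyID bits are in daddr with the given keyid. */
static inline uint64_t dma_set_keyid(uint64_t daddr, uint64_t keyid)
{
	return (daddr & ~KEYID_MASK) | (keyid << KEYID_SHIFT);
}

/* Strip the KeyID bits to recover the plain physical address. */
static inline uint64_t dma_clear_keyid(uint64_t paddr)
{
	return paddr & ~KEYID_MASK;
}
```

With these assumed values, dma_set_keyid(0x1000, 4) places the value 4 in
bits 51:46, and dma_clear_keyid() on the result gives back 0x1000.
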
 arch/x86/include/asm/mem_encrypt.h | 29 +++++++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt_common.c   |  2 +-
 include/linux/dma-direct.h         |  4 ++--
 include/linux/mem_encrypt.h        | 23 ++++++++++-------------
 4 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 616f8e637bc3..a2b69cbb0e41 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -55,8 +55,19 @@ bool sev_active(void);
 
 #define __bss_decrypted __attribute__((__section__(".bss..decrypted")))
 
+/*
+ * The __sme_set() and __sme_clr() macros are useful for adding or removing
+ * the encryption mask from a value (e.g. when dealing with pagetable
+ * entries).
+ */
+#define __sme_set(x)		((x) | sme_me_mask)
+#define __sme_clr(x)		((x) & ~sme_me_mask)
+
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
+#define __sme_set(x)		(x)
+#define __sme_clr(x)		(x)
+
 #define sme_me_mask	0ULL
 
 static inline void __init sme_early_encrypt(resource_size_t paddr,
@@ -97,4 +108,22 @@ extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypte
 
 #endif	/* __ASSEMBLY__ */
 
+#ifdef CONFIG_X86_MEM_ENCRYPT_COMMON
+
+extern dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr);
+extern phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr);
+
+#else
+static inline dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr)
+{
+	return daddr;
+}
+
+static inline phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
+{
+	return paddr;
+}
+#endif /* CONFIG_X86_MEM_ENCRYPT_COMMON */
+
+
 #endif	/* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c
index 2adee65eec46..dcc5c710a235 100644
--- a/arch/x86/mm/mem_encrypt_common.c
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -1,5 +1,5 @@
 #include <linux/mm.h>
-#include <linux/mem_encrypt.h>
+#include <asm/mem_encrypt.h>
 #include <asm/mktme.h>
 
 /*
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index b7338702592a..a949adeb6558 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -40,12 +40,12 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
  */
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-	return __sme_set(__phys_to_dma(dev, paddr));
+	return __mem_encrypt_dma_set(__phys_to_dma(dev, paddr), paddr);
 }
 
 static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
-	return __sme_clr(__dma_to_phys(dev, daddr));
+	return __mem_encrypt_dma_clear(__dma_to_phys(dev, daddr));
 }
 
 u64 dma_direct_get_required_mask(struct device *dev);
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index b310a9c18113..ce8ff0ead16c 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -26,6 +26,16 @@
 static inline bool sme_active(void) { return false; }
 static inline bool sev_active(void) { return false; }
 
+static inline dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr)
+{
+	return daddr;
+}
+
+static inline phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
+{
+	return paddr;
+}
+
 #endif	/* CONFIG_ARCH_HAS_MEM_ENCRYPT */
 
 static inline bool mem_encrypt_active(void)
@@ -38,19 +48,6 @@ static inline u64 sme_get_me_mask(void)
 	return sme_me_mask;
 }
 
-#ifdef CONFIG_AMD_MEM_ENCRYPT
-/*
- * The __sme_set() and __sme_clr() macros are useful for adding or removing
- * the encryption mask from a value (e.g. when dealing with pagetable
- * entries).
- */
-#define __sme_set(x)		((x) | sme_me_mask)
-#define __sme_clr(x)		((x) & ~sme_me_mask)
-#else
-#define __sme_set(x)		(x)
-#define __sme_clr(x)		(x)
-#endif
-
 #endif	/* __ASSEMBLY__ */
 
 #endif	/* __MEM_ENCRYPT_H__ */
-- 
2.20.1



* [PATCH, RFC 54/62] x86/mm: Disable MKTME on incompatible platform configurations
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (52 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 53/62] x86/mm: Use common code for DMA memory encryption Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 55/62] x86/mm: Disable MKTME if not all system memory supports encryption Kirill A. Shutemov
                   ` (9 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Icelake Server requires an additional check to make sure that MKTME usage
is safe on Linux.

The kernel needs a way to access encrypted memory. There are different
approaches to this: create a temporary mapping to access the page (using
the kmap() interface), or modify the kernel's direct mapping when an
encrypted page is allocated.

In order to minimize runtime overhead, the Linux MKTME implementation
uses multiple direct mappings, one per KeyID. The kernel uses the direct
mapping that is relevant for the page at the moment.

In some configurations, Icelake Server doesn't allow a page to be mapped
with multiple KeyIDs at the same time, even if only one of the KeyIDs is
actively used. This conflicts with the Linux MKTME implementation.

The OS can check whether it's safe to map the same page with multiple
KeyIDs by examining bit 8 of MSR 0x6F. If the bit is set, we cannot
safely use MKTME on Linux.

The user can disable the Directory Mode in BIOS setup to get the
platform into Linux-compatible mode.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
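For reviewers, the check described above can be sketched as a pure function
on the raw MSR value. This is only an illustration: in the kernel the value
is read with rdmsrl(), whereas here it is passed in as a parameter, and the
helper name and MKTME_ALIAS_BIT are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_ICX_MKTME_STATUS 0x6F	/* MSR number taken from this patch */
#define MKTME_ALIAS_BIT      8		/* bit 8: aliases are forbidden */

/* Returns true if the platform forbids mapping a page with multiple KeyIDs. */
static inline bool mktme_aliases_forbidden(uint64_t msr_value)
{
	return (msr_value >> MKTME_ALIAS_BIT) & 1;
}
```

Only bit 8 matters: a value of 0x100 reports the incompatible configuration,
while 0xFF (all lower bits set) does not.
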
 arch/x86/include/asm/intel-family.h |  2 ++
 arch/x86/kernel/cpu/intel.c         | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/intel-family.h b/arch/x86/include/asm/intel-family.h
index 9f15384c504a..6a633af144aa 100644
--- a/arch/x86/include/asm/intel-family.h
+++ b/arch/x86/include/asm/intel-family.h
@@ -53,6 +53,8 @@
 #define INTEL_FAM6_CANNONLAKE_MOBILE	0x66
 
 #define INTEL_FAM6_ICELAKE_MOBILE	0x7E
+#define INTEL_FAM6_ICELAKE_X		0x6A
+#define INTEL_FAM6_ICELAKE_XEON_D	0x6C
 
 /* "Small Core" Processors (Atom) */
 
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index f402a74c00a1..3fc318f699d3 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,7 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -531,6 +532,16 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
 #define TME_ACTIVATE_CRYPTO_ALGS(x)	((x >> 48) & 0xffff)	/* Bits 63:48 */
 #define TME_ACTIVATE_CRYPTO_AES_XTS_128	1
 
+#define MSR_ICX_MKTME_STATUS		0x6F
+#define MKTME_ALIASES_FORBIDDEN(x)	((x) & BIT(8))
+
+/* Need to check MSR_ICX_MKTME_STATUS for these CPUs */
+static const struct x86_cpu_id mktme_status_msr_ids[] = {
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ICELAKE_X		},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ICELAKE_XEON_D	},
+	{}
+};
+
 /* Values for mktme_status (SW only construct) */
 #define MKTME_ENABLED			0
 #define MKTME_DISABLED			1
@@ -564,6 +575,17 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		return;
 	}
 
+	/* Icelake Server quirk: do not enable MKTME if aliases are forbidden */
+	if (x86_match_cpu(mktme_status_msr_ids)) {
+		u64 status;
+		rdmsrl(MSR_ICX_MKTME_STATUS, status);
+
+		if (MKTME_ALIASES_FORBIDDEN(status)) {
+			pr_err_once("x86/tme: Directory Mode is enabled in BIOS\n");
+			mktme_status = MKTME_DISABLED;
+		}
+	}
+
 	if (mktme_status != MKTME_UNINITIALIZED)
 		goto detect_keyid_bits;
 
-- 
2.20.1



* [PATCH, RFC 55/62] x86/mm: Disable MKTME if not all system memory supports encryption
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (53 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 54/62] x86/mm: Disable MKTME on incompatible platform configurations Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 56/62] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
                   ` (8 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

The UEFI memory attribute EFI_MEMORY_CPU_CRYPTO indicates whether a memory
region supports encryption.

The kernel doesn't handle the situation where only part of the system
memory supports encryption.

Disable MKTME if not all system memory supports encryption.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
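As an illustration of the check for reviewers, here is a small userspace
sketch. struct mem_range is a hypothetical stand-in for an EFI memory
descriptor, the function names are invented for the example, and all ranges
are treated as half-open [start, end).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mem_range {
	uint64_t start;		/* inclusive */
	uint64_t end;		/* exclusive */
	bool cpu_crypto;	/* stands in for EFI_MEMORY_CPU_CRYPTO */
};

/* Two half-open ranges overlap only if each starts before the other ends. */
static inline bool ranges_overlap(uint64_t a_start, uint64_t a_end,
				  uint64_t b_start, uint64_t b_end)
{
	return a_start < b_end && b_start < a_end;
}

/* True if no range lacking crypto support overlaps [node_start, node_end). */
static inline bool node_fully_encryptable(const struct mem_range *descs,
					  size_t n, uint64_t node_start,
					  uint64_t node_end)
{
	for (size_t i = 0; i < n; i++) {
		if (descs[i].cpu_crypto)
			continue;
		if (ranges_overlap(descs[i].start, descs[i].end,
				   node_start, node_end))
			return false;
	}
	return true;
}
```

In this model, a range without crypto support that merely touches the node
boundary does not count as overlapping; any real overlap is enough to treat
the node as not fully encryptable.
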
 arch/x86/mm/mktme.c        | 29 +++++++++++++++++++++++++++++
 drivers/firmware/efi/efi.c | 25 +++++++++++++------------
 include/linux/efi.h        |  1 +
 3 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 12f4266cf7ea..60b479686ea5 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,6 +1,7 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
 #include <linux/rmap.h>
+#include <linux/efi.h>
 #include <asm/mktme.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
@@ -33,9 +34,37 @@ void mktme_disable(void)
 
 static bool need_page_mktme(void)
 {
+	int nid;
+
 	/* Make sure keyid doesn't collide with extended page flags */
 	BUILD_BUG_ON(__NR_PAGE_EXT_FLAGS > 16);
 
+	for_each_node_state(nid, N_MEMORY) {
+		const efi_memory_desc_t *md;
+		unsigned long node_start, node_end;
+
+		node_start = node_start_pfn(nid) << PAGE_SHIFT;
+		node_end = node_end_pfn(nid) << PAGE_SHIFT;
+
+		for_each_efi_memory_desc(md) {
+			u64 efi_start = md->phys_addr;
+			u64 efi_end = md->phys_addr + PAGE_SIZE * md->num_pages;
+
+			if (md->attribute & EFI_MEMORY_CPU_CRYPTO)
+				continue;
+			if (efi_start >= node_end)
+				continue;
+			if (efi_end <= node_start)
+				continue;
+
+			pr_info("Memory range %#llx-%#llx: doesn't support encryption\n",
+					efi_start, efi_end);
+			pr_info("Disabling MKTME\n");
+			mktme_disable();
+			break;
+		}
+	}
+
 	return !!mktme_nr_keyids;
 }
 
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 55b77c576c42..239b2edc78d3 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -848,25 +848,26 @@ char * __init efi_md_typeattr_format(char *buf, size_t size,
 	if (attr & ~(EFI_MEMORY_UC | EFI_MEMORY_WC | EFI_MEMORY_WT |
 		     EFI_MEMORY_WB | EFI_MEMORY_UCE | EFI_MEMORY_RO |
 		     EFI_MEMORY_WP | EFI_MEMORY_RP | EFI_MEMORY_XP |
-		     EFI_MEMORY_NV |
+		     EFI_MEMORY_NV | EFI_MEMORY_CPU_CRYPTO |
 		     EFI_MEMORY_RUNTIME | EFI_MEMORY_MORE_RELIABLE))
 		snprintf(pos, size, "|attr=0x%016llx]",
 			 (unsigned long long)attr);
 	else
 		snprintf(pos, size,
-			 "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
+			 "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
 			 attr & EFI_MEMORY_RUNTIME ? "RUN" : "",
 			 attr & EFI_MEMORY_MORE_RELIABLE ? "MR" : "",
-			 attr & EFI_MEMORY_NV      ? "NV"  : "",
-			 attr & EFI_MEMORY_XP      ? "XP"  : "",
-			 attr & EFI_MEMORY_RP      ? "RP"  : "",
-			 attr & EFI_MEMORY_WP      ? "WP"  : "",
-			 attr & EFI_MEMORY_RO      ? "RO"  : "",
-			 attr & EFI_MEMORY_UCE     ? "UCE" : "",
-			 attr & EFI_MEMORY_WB      ? "WB"  : "",
-			 attr & EFI_MEMORY_WT      ? "WT"  : "",
-			 attr & EFI_MEMORY_WC      ? "WC"  : "",
-			 attr & EFI_MEMORY_UC      ? "UC"  : "");
+			 attr & EFI_MEMORY_NV         ? "NV"  : "",
+			 attr & EFI_MEMORY_CPU_CRYPTO ? "CR"  : "",
+			 attr & EFI_MEMORY_XP         ? "XP"  : "",
+			 attr & EFI_MEMORY_RP         ? "RP"  : "",
+			 attr & EFI_MEMORY_WP         ? "WP"  : "",
+			 attr & EFI_MEMORY_RO         ? "RO"  : "",
+			 attr & EFI_MEMORY_UCE        ? "UCE" : "",
+			 attr & EFI_MEMORY_WB         ? "WB"  : "",
+			 attr & EFI_MEMORY_WT         ? "WT"  : "",
+			 attr & EFI_MEMORY_WC         ? "WC"  : "",
+			 attr & EFI_MEMORY_UC         ? "UC"  : "");
 	return buf;
 }
 
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 6ebc2098cfe1..4b2d0b1a75dc 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -112,6 +112,7 @@ typedef	struct {
 #define EFI_MEMORY_MORE_RELIABLE \
 				((u64)0x0000000000010000ULL)	/* higher reliability */
 #define EFI_MEMORY_RO		((u64)0x0000000000020000ULL)	/* read-only */
+#define EFI_MEMORY_CPU_CRYPTO	((u64)0x0000000000080000ULL)	/* memory encryption supported */
 #define EFI_MEMORY_RUNTIME	((u64)0x8000000000000000ULL)	/* range requires runtime mapping */
 #define EFI_MEMORY_DESCRIPTOR_VERSION	1
 
-- 
2.20.1



* [PATCH, RFC 56/62] x86: Introduce CONFIG_X86_INTEL_MKTME
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (54 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 55/62] x86/mm: Disable MKTME if not all system memory supports encryption Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 57/62] x86/mktme: Overview of Multi-Key Total Memory Encryption Kirill A. Shutemov
                   ` (7 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Add a new config option to enable/disable Multi-Key Total Memory
Encryption support.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/Kconfig | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ce9642e2c31b..4d2cfee50102 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1533,6 +1533,27 @@ config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
 	  If set to N, then the encryption of system memory can be
 	  activated with the mem_encrypt=on command line option.
 
+config X86_INTEL_MKTME
+	bool "Intel Multi-Key Total Memory Encryption"
+	select DYNAMIC_PHYSICAL_MASK
+	select PAGE_EXTENSION
+	select X86_MEM_ENCRYPT_COMMON
+	depends on X86_64 && CPU_SUP_INTEL && !KASAN
+	depends on KEYS
+	depends on !MEMORY_HOTPLUG_DEFAULT_ONLINE
+	depends on ACPI_HMAT
+	---help---
+	  Say yes to enable support for Multi-Key Total Memory Encryption.
+	  This requires an Intel processor that supports the feature.
+
+	  Multi-Key Total Memory Encryption (MKTME) is a technology that allows
+	  transparent memory encryption in upcoming Intel platforms.
+
+	  MKTME is built on top of TME. TME allows encryption of the entirety
+	  of system memory using a single key. MKTME allows having multiple
+	  encryption domains, each with its own key -- different memory pages
+	  can be encrypted with different keys.
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
@@ -2207,7 +2228,7 @@ config RANDOMIZE_MEMORY
 
 config MEMORY_PHYSICAL_PADDING
 	hex "Physical memory mapping padding" if EXPERT
-	depends on RANDOMIZE_MEMORY
+	depends on RANDOMIZE_MEMORY || X86_INTEL_MKTME
 	default "0xa" if MEMORY_HOTPLUG
 	default "0x0"
 	range 0x1 0x40 if MEMORY_HOTPLUG
-- 
2.20.1



* [PATCH, RFC 57/62] x86/mktme: Overview of Multi-Key Total Memory Encryption
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (55 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 56/62] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-29  7:21   ` Mike Rapoport
  2019-07-14 18:16   ` Randy Dunlap
  2019-05-08 14:44 ` [PATCH, RFC 58/62] x86/mktme: Document the MKTME provided security mitigations Kirill A. Shutemov
                   ` (6 subsequent siblings)
  63 siblings, 2 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Provide an overview of MKTME on Intel Platforms.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
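A note for reviewers on the KeyID arithmetic behind the overview: n KeyID
bits give 2^n KeyIDs, one of which (KeyID-0) is the TME key, and every bit
used for KeyIDs is taken away from the physical address. A minimal sketch
follows; the helper names are hypothetical, and the 52-bit physical address
width used in the example is an assumption, not something this series
states.

```c
#include <stdint.h>

/* Keys available besides the TME key at KeyID-0 (valid for bits <= 16). */
static inline uint32_t mktme_usable_keys(unsigned int keyid_bits)
{
	return (UINT32_C(1) << keyid_bits) - 1;
}

/* Physical address bits left over once the KeyID bits are carved out. */
static inline unsigned int remaining_phys_bits(unsigned int phys_bits,
					       unsigned int keyid_bits)
{
	return phys_bits - keyid_bits;
}
```

For instance, mktme_usable_keys(16) is 65535 and mktme_usable_keys(6) is 63;
with an assumed 52-bit physical address, 6 KeyID bits leave 46 bits for
addressing, which is the trade-off between more keys and more address space.
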
 Documentation/x86/mktme/index.rst          |  8 +++
 Documentation/x86/mktme/mktme_overview.rst | 57 ++++++++++++++++++++++
 2 files changed, 65 insertions(+)
 create mode 100644 Documentation/x86/mktme/index.rst
 create mode 100644 Documentation/x86/mktme/mktme_overview.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
new file mode 100644
index 000000000000..1614b52dd3e9
--- /dev/null
+++ b/Documentation/x86/mktme/index.rst
@@ -0,0 +1,8 @@
+
+=========================================
+Multi-Key Total Memory Encryption (MKTME)
+=========================================
+
+.. toctree::
+
+   mktme_overview
diff --git a/Documentation/x86/mktme/mktme_overview.rst b/Documentation/x86/mktme/mktme_overview.rst
new file mode 100644
index 000000000000..59c023965554
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_overview.rst
@@ -0,0 +1,57 @@
+Overview
+=========
+Multi-Key Total Memory Encryption (MKTME)[1] is a technology that
+allows transparent memory encryption in upcoming Intel platforms.
+It uses a new instruction (PCONFIG) for key setup and selects a
+key for individual pages by repurposing physical address bits in
+the page tables.
+
+Support for MKTME is added to the existing kernel keyring subsystem
+and via a new mprotect_encrypt() system call that can be used by
+applications to encrypt anonymous memory with keys obtained from
+the keyring.
+
+This architecture supports encrypting both normal, volatile DRAM
+and persistent memory.  However, persistent memory support is
+not included in the Linux kernel implementation at this time.
+(We anticipate adding that support next.)
+
+Hardware Background
+===================
+
+MKTME is built on top of an existing single-key technology called
+TME.  TME encrypts all system memory using a single key generated
+by the CPU on every boot of the system. TME provides mitigation
+against physical attacks, such as physically removing a DIMM or
+watching memory bus traffic.
+
+MKTME enables the use of multiple encryption keys[2], allowing
+selection of the encryption key per-page using the page tables.
+Encryption keys are programmed into each memory controller and
+the same set of keys is available to all entities on the system
+with access to that memory (all cores, DMA engines, etc.).
+
+MKTME inherits many of the mitigations against hardware attacks
+from TME.  Like TME, MKTME does not mitigate vulnerable or
+malicious operating systems or virtual machine managers.  MKTME
+offers additional mitigations when compared to TME.
+
+TME and MKTME use the AES encryption algorithm in the AES-XTS
+mode.  This mode, typically used for block-based storage devices,
+takes the physical address of the data into account when
+encrypting each block.  This ensures that the effective key is
+different for each block of memory.  Moving encrypted content
+across physical addresses results in garbage on read, mitigating
+block-relocation attacks.  This property is the reason many of
+the discussed attacks require control of a shared physical page
+to be handed from the victim to the attacker.
+
+--
+1. https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
+2. The MKTME architecture supports up to 16 bits of KeyIDs, so a
+   maximum of 65535 keys on top of the “TME key” at KeyID-0.  The
+   first implementation is expected to support 6 bits, making 63
+   keys available to applications.  However, this is not guaranteed.
+   The number of available keys could be reduced if, for instance,
+   additional physical address space is desired over additional
+   KeyIDs.
-- 
2.20.1



* [PATCH, RFC 58/62] x86/mktme: Document the MKTME provided security mitigations
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (56 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 57/62] x86/mktme: Overview of Multi-Key Total Memory Encryption Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 59/62] x86/mktme: Document the MKTME kernel configuration requirements Kirill A. Shutemov
                   ` (5 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Describe the security benefits of Multi-Key Total Memory
Encryption (MKTME) over Total Memory Encryption (TME) alone.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
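For reviewers, a toy model of the capture-and-compare scenario documented in
this patch. This is emphatically not AES-XTS: toy_encrypt() is a made-up
stand-in that only shares the property that ciphertext depends on the key
and on the physical address. All names here are hypothetical. The sketch
shows why guessing works when attacker and victim share a key, and why it
fails to recover the secret when they do not.

```c
#include <stdint.h>

/* Bijective 64-bit mixer (the SplitMix64 finalizer); NOT a real cipher. */
static inline uint64_t mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= UINT64_C(0xbf58476d1ce4e5b9);
	x ^= x >> 27;
	x *= UINT64_C(0x94d049bb133111eb);
	x ^= x >> 31;
	return x;
}

/* Toy address-tweaked encryption: the mask depends on key and address. */
static inline uint64_t toy_encrypt(uint64_t key, uint64_t addr,
				   uint64_t plaintext)
{
	uint64_t tweak = (addr << 32) | (addr >> 32);

	return plaintext ^ mix64(key ^ tweak);
}

/*
 * Write each candidate plaintext at the same address under the attacker's
 * key and compare the result with the captured ciphertext.
 * Returns the matching candidate, or -1 if none matches.
 */
static inline int64_t capture_and_compare(uint64_t attacker_key, uint64_t addr,
					  uint64_t captured,
					  uint64_t max_candidate)
{
	for (uint64_t c = 0; c <= max_candidate; c++)
		if (toy_encrypt(attacker_key, addr, c) == captured)
			return (int64_t)c;
	return -1;
}
```

In this model, an attacker who writes under the victim's key at the victim's
address recovers a low-entropy secret by enumeration. With a different key,
or at a different address, the comparison never identifies the real secret.
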
 Documentation/x86/mktme/index.rst             |   1 +
 Documentation/x86/mktme/mktme_mitigations.rst | 150 ++++++++++++++++++
 2 files changed, 151 insertions(+)
 create mode 100644 Documentation/x86/mktme/mktme_mitigations.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index 1614b52dd3e9..a3a29577b013 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -6,3 +6,4 @@ Multi-Key Total Memory Encryption (MKTME)
 .. toctree::
 
    mktme_overview
+   mktme_mitigations
diff --git a/Documentation/x86/mktme/mktme_mitigations.rst b/Documentation/x86/mktme/mktme_mitigations.rst
new file mode 100644
index 000000000000..90699c38750a
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_mitigations.rst
@@ -0,0 +1,150 @@
+MKTME-Provided Mitigations
+==========================
+
+MKTME adds a few mitigations against attacks that are not
+mitigated when using TME alone.  The first set are mitigations
+against software attacks that are familiar today:
+
+ * Kernel Mapping Attacks: information disclosures that leverage
+   the kernel direct map are mitigated against disclosing user
+   data.
+ * Freed Data Leak Attacks: removing an encryption key from the
+   hardware mitigates future user information disclosure.
+
+The next set are attacks that depend on specialized hardware,
+such as an “evil DIMM” or a DDR interposer:
+
+ * Cross-Domain Replay Attack: data is captured from one domain
+   (guest) and replayed to another at a later time.
+ * Cross-Domain Capture and Delayed Compare Attack: data is
+   captured and later analyzed to discover secrets.
+ * Key Wear-out Attack: data is captured and analyzed in order
+   to weaken the AES encryption itself.
+
+More details on these attacks are below.
+
+Kernel Mapping Attacks
+----------------------
+Information disclosure vulnerabilities leverage the kernel direct
+map because many vulnerabilities involve manipulation of kernel
+data structures (examples: CVE-2017-7277, CVE-2017-9605).  We
+normally think of these bugs as leaking valuable *kernel* data,
+but they can leak application data when application pages are
+recycled for kernel use.
+
+With this MKTME implementation, there is a direct map created for
+each MKTME KeyID which is used whenever the kernel needs to
+access plaintext.  But, all kernel data structures are accessed
+via the direct map for KeyID-0.  Thus, memory reads which are not
+coordinated with the KeyID get garbage (for example, accessing
+KeyID-4 data with the KeyID-0 mapping).
+
+This means that if sensitive data encrypted using MKTME is leaked
+via the KeyID-0 direct map, ciphertext decrypted with the wrong
+key will be disclosed.  To disclose plaintext, an attacker must
+“pivot” to the correct direct mapping, which is non-trivial
+because there are no kernel data structures in the KeyID!=0
+direct mapping.
+
+Freed Data Leak Attack
+----------------------
+The kernel has a history of bugs around uninitialized data.
+Usually, we think of these bugs as leaking sensitive kernel data,
+but they can also be used to leak application secrets.
+
+MKTME can help mitigate the case where application secrets are
+leaked:
+
+ * App (or VM) places a secret in a page
+ * App exits or frees memory to kernel allocator
+ * Page added to allocator free list
+ * Attacker reallocates page to a purpose where it can read the page
+
+Now, imagine MKTME was in use on the memory being leaked.  The data
+can only be leaked as long as the key is programmed in the hardware.
+If the key is de-programmed, like after all pages are freed after a
+guest is shut down, any future reads will just see ciphertext.
+
+Basically, the key is a convenient choke-point: you can be more
+confident that data encrypted with it is inaccessible once the
+key is removed.
+
+Cross-Domain Replay Attack
+--------------------------
+MKTME mitigates cross-domain replay attacks where an attacker
+replaces an encrypted block owned by one domain with a block
+owned by another domain.  MKTME does not prevent this replacement
+from occurring, but it does mitigate plaintext from being
+disclosed if the domains use different keys.
+
+With TME, the attack could be executed by:
+ * A victim places a secret in memory at a given physical address.
+   Note: AES-XTS is what restricts the attack to a single physical
+   address instead of across different physical addresses.
+ * Attacker captures the ciphertext of the victim’s secret.
+ * Later on, after the victim frees the physical address, the
+   attacker gains ownership of it.
+ * Attacker puts the ciphertext at the address and gets the secret
+   plaintext.
+
+With MKTME, the attacker and the victim presumably use different
+keys, so the attacker cannot successfully decrypt the old
+ciphertext.
+
+Cross-Domain Capture and Delayed Compare Attack
+-----------------------------------------------
+This is also referred to as a kind of dictionary attack.
+
+Similarly, MKTME protects against cross-domain capture-and-compare
+attacks.  Consider the following scenario:
+ * A victim places a secret in memory, at a known physical address
+ * Attacker captures victim’s ciphertext
+ * Attacker gains control of the target physical address, perhaps
+   after the victim’s VM is shut down or its memory reclaimed.
+ * Attacker computes and writes many possible plaintexts until new
+   ciphertext matches content captured previously.
+
+Secrets which have low (plaintext) entropy are more vulnerable to
+this attack because they reduce the number of possible plaintexts
+an attacker has to compute and write.
+
+The attack will not work if the attacker and the victim use
+different keys.
+
+Key Wear-out Attack
+-------------------
+Repeated use of an encryption key might be used by an attacker to
+infer information about the key or the plaintext, weakening the
+encryption.  The higher the bandwidth of the encryption engine,
+the more vulnerable the key is to wear-out.  The MKTME memory
+encryption hardware works at the speed of the memory bus, which
+has high bandwidth.
+
+Such a weakness has been demonstrated[1] on a theoretical cipher
+with properties similar to AES-XTS.
+
+An attack would take the following steps:
+ * Victim system is using TME with AES-XTS-128
+ * Attacker repeatedly captures ciphertext/plaintext pairs (can
+   be performed with an online hardware attack like an interposer).
+ * Attacker compels repeated use of the key under attack for a
+   sustained time period without a system reboot[2].
+ * Attacker discovers a ciphertext collision (two plaintexts
+   translating to the same ciphertext)
+ * Attacker can induce controlled modifications to the targeted
+   plaintext by modifying the colliding ciphertext
+
+MKTME mitigates key wear-out in two ways:
+ * Keys can be rotated periodically to mitigate wear-out.  Since
+   TME keys are generated at boot, rotation of TME keys requires a
+   reboot.  In contrast, MKTME allows rotation while the system is
+   booted.  An application could implement a policy to rotate keys
+   at a frequency which is not feasible to attack.
+ * In the case that MKTME is used to encrypt two guests’ memory
+   with two different keys, an attack on one guest’s key would not
+   weaken the key used in the second guest.
+
+--
+1. http://web.cs.ucdavis.edu/~rogaway/papers/offsets.pdf
+2. The sustained time required for an attack could vary from days
+   to years depending on the attacker’s goals.
-- 
2.20.1



* [PATCH, RFC 59/62] x86/mktme: Document the MKTME kernel configuration requirements
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (57 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 58/62] x86/mktme: Document the MKTME provided security mitigations Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 60/62] x86/mktme: Document the MKTME Key Service API Kirill A. Shutemov
                   ` (4 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/mktme/index.rst               |  1 +
 Documentation/x86/mktme/mktme_configuration.rst | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)
 create mode 100644 Documentation/x86/mktme/mktme_configuration.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index a3a29577b013..0f021cc4a2db 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -7,3 +7,4 @@ Multi-Key Total Memory Encryption (MKTME)
 
    mktme_overview
    mktme_mitigations
+   mktme_configuration
diff --git a/Documentation/x86/mktme/mktme_configuration.rst b/Documentation/x86/mktme/mktme_configuration.rst
new file mode 100644
index 000000000000..91d2f80c736e
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_configuration.rst
@@ -0,0 +1,17 @@
+MKTME Configuration
+===================
+
+CONFIG_X86_INTEL_MKTME
+        MKTME is enabled by selecting CONFIG_X86_INTEL_MKTME on Intel
+        platforms supporting the MKTME feature.
+
+mktme_storekeys
+        mktme_storekeys is a kernel cmdline parameter.
+
+        This parameter allows the kernel to store the user specified
+        MKTME key payload. Storing this payload means that the MKTME
+        Key Service can always allow the addition of new physical
+        packages. If the mktme_storekeys parameter is not present,
+        user key data will not be stored, and new physical packages
+        may only be added to the system if no user type MKTME keys
+        are programmed.
-- 
2.20.1
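
Illustration only, not part of the patch: the rule described for
mktme_storekeys can be restated as a small predicate. The function name
and its arguments are hypothetical; the patch series does not define
them, and the real check lives in the key service code.

```c
#include <stdbool.h>

/*
 * Hypothetical helper restating the documented rule for when a new
 * physical package may be added to the system.
 */
static bool mktme_may_add_package(bool storekeys, unsigned int nr_user_keys)
{
	/* Stored payloads can be programmed into the new package. */
	if (storekeys)
		return true;

	/* Without stored payloads, allow it only if no user type keys exist. */
	return nr_user_keys == 0;
}
```

With mktme_storekeys on the kernel command line the first branch always
applies; without it, a single programmed user type key blocks the
addition.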



* [PATCH, RFC 60/62] x86/mktme: Document the MKTME Key Service API
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (58 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 59/62] x86/mktme: Document the MKTME kernel configuration requirements Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 61/62] x86/mktme: Document the MKTME API for anonymous memory encryption Kirill A. Shutemov
                   ` (3 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/mktme/index.rst      |  1 +
 Documentation/x86/mktme/mktme_keys.rst | 96 ++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)
 create mode 100644 Documentation/x86/mktme/mktme_keys.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index 0f021cc4a2db..8cf2b7d62091 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -8,3 +8,4 @@ Multi-Key Total Memory Encryption (MKTME)
    mktme_overview
    mktme_mitigations
    mktme_configuration
+   mktme_keys
diff --git a/Documentation/x86/mktme/mktme_keys.rst b/Documentation/x86/mktme/mktme_keys.rst
new file mode 100644
index 000000000000..161871dee0dc
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_keys.rst
@@ -0,0 +1,96 @@
+MKTME Key Service API
+=====================
+MKTME is a new key service type added to the Linux Kernel Key Service.
+
+The MKTME Key Service type is available when CONFIG_X86_INTEL_MKTME is
+enabled on Intel platforms that support the MKTME feature.
+
+The MKTME Key Service type manages the allocation of hardware encryption
+keys. Users can request an MKTME type key and then use that key to
+encrypt memory with the encrypt_mprotect() system call.
+
+Usage
+-----
+    When using the Kernel Key Service to request an *mktme* key,
+    specify the *payload* as follows:
+
+    type=
+        *user*	User will supply the encryption key data. Use this
+                type to directly program a hardware encryption key.
+
+        *cpu*	User requests a CPU generated encryption key.
+                The CPU generates and assigns an ephemeral key.
+
+        *no-encrypt*
+                 User requests that hardware does not encrypt
+                 memory when this key is in use.
+
+    algorithm=
+        When type=user or type=cpu the algorithm field must be
+        *aes-xts-128*
+
+        When type=no-encrypt the algorithm field must not be
+        present in the payload.
+
+    key=
+        When type=user the user must supply a 128 bit encryption
+        key as exactly 32 ASCII hexadecimal characters.
+
+        When type=cpu the user may optionally supply 128 bits of
+        entropy for the CPU generated encryption key in this field.
+        It must be exactly 32 ASCII hexadecimal characters.
+
+        When type=no-encrypt this key field must not be present
+        in the payload.
+
+    tweak=
+        When type=user the user must supply a 128 bit tweak key
+        as exactly 32 ASCII hexadecimal characters.
+
+        When type=cpu the user may optionally supply 128 bits of
+        entropy for the CPU generated tweak key in this field.
+        It must be exactly 32 ASCII hexadecimal characters.
+
+        When type=no-encrypt the tweak field must not be present
+        in the payload.
+
+Errors
+------
+    In addition to the errors returned by the Kernel Key Service
+    through add_key(2) or keyctl(1), the MKTME Key Service type may
+    return the following errors:
+
+    EINVAL for any payload specification that does not match the
+           MKTME type payload as defined above.
+
+    EACCES for access denied. The MKTME key type uses capabilities
+           to restrict the allocation of keys to privileged users.
+           CAP_SYS_RESOURCE is required, but it will accept the
+           broader capability of CAP_SYS_ADMIN. See capabilities(7).
+
+    ENOKEY if a hardware key cannot be allocated. Additional error
+           messages will describe the hardware programming errors.
+
+Examples
+--------
+    Add a 'user' type key::
+
+        char *options_USER = "type=user "
+                             "algorithm=aes-xts-128 "
+                             "key=12345678912345671234567891234567 "
+                             "tweak=12345678912345671234567891234567";
+
+        key = add_key("mktme", "name", options_USER, strlen(options_USER),
+                      KEY_SPEC_THREAD_KEYRING);
+
+    Add a 'cpu' type key::
+
+        char *options_CPU = "type=cpu algorithm=aes-xts-128";
+
+        key = add_key("mktme", "name", options_CPU, strlen(options_CPU),
+                      KEY_SPEC_THREAD_KEYRING);
+
+    Add a 'no-encrypt' type key::
+
+        key = add_key("mktme", "name", "type=no-encrypt",
+                      strlen("type=no-encrypt"), KEY_SPEC_THREAD_KEYRING);
-- 
2.20.1
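
Illustration only, not part of the patch: a minimal sketch of how
userspace might validate and assemble a type=user payload before calling
add_key(2). It assumes the space-separated option syntax used in the
type=cpu example. The helper names and MKTME_HEX_LEN are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MKTME_HEX_LEN 32	/* 128 bits written as ASCII hex */

/* Return true if s is exactly 32 ASCII hexadecimal characters. */
static bool mktme_hex128_ok(const char *s)
{
	size_t i;

	if (!s || strlen(s) != MKTME_HEX_LEN)
		return false;
	for (i = 0; i < MKTME_HEX_LEN; i++)
		if (!isxdigit((unsigned char)s[i]))
			return false;
	return true;
}

/*
 * Hypothetical helper: format a type=user payload into buf.
 * Returns the payload length, or -1 if key or tweak is malformed or
 * buf is too small.
 */
static int mktme_user_payload(char *buf, size_t len,
			      const char *key, const char *tweak)
{
	int n;

	if (!mktme_hex128_ok(key) || !mktme_hex128_ok(tweak))
		return -1;
	n = snprintf(buf, len,
		     "type=user algorithm=aes-xts-128 key=%s tweak=%s",
		     key, tweak);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

The returned length can be passed straight to add_key() as the payload
size. Checking the hex fields up front gives a clearer failure than the
EINVAL the key service returns for a malformed payload.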



* [PATCH, RFC 61/62] x86/mktme: Document the MKTME API for anonymous memory encryption
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (59 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 60/62] x86/mktme: Document the MKTME Key Service API Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-08 14:44 ` [PATCH, RFC 62/62] x86/mktme: Demonstration program using the MKTME APIs Kirill A. Shutemov
                   ` (2 subsequent siblings)
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/mktme/index.rst         |  1 +
 Documentation/x86/mktme/mktme_encrypt.rst | 57 +++++++++++++++++++++++
 2 files changed, 58 insertions(+)
 create mode 100644 Documentation/x86/mktme/mktme_encrypt.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index 8cf2b7d62091..ca3c76adc596 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -9,3 +9,4 @@ Multi-Key Total Memory Encryption (MKTME)
    mktme_mitigations
    mktme_configuration
    mktme_keys
+   mktme_encrypt
diff --git a/Documentation/x86/mktme/mktme_encrypt.rst b/Documentation/x86/mktme/mktme_encrypt.rst
new file mode 100644
index 000000000000..5cdffabc610f
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_encrypt.rst
@@ -0,0 +1,57 @@
+MKTME API: system call encrypt_mprotect()
+=========================================
+
+Synopsis
+--------
+int encrypt_mprotect(void \*addr, size_t len, int prot, key_serial_t serial);
+
+Where *key_serial_t serial* is the serial number of a key allocated
+using the MKTME Key Service.
+
+Description
+-----------
+    encrypt_mprotect() encrypts the memory pages containing any part
+    of the address range in the interval specified by addr and len.
+
+    encrypt_mprotect() supports the legacy mprotect() behavior plus
+    the enabling of memory encryption. That means that in addition
+    to encrypting the memory, the protection flags will be updated
+    as requested in the call.
+
+    The *addr* and *len* must be aligned to a page boundary.
+
+    The caller must have *KEY_NEED_VIEW* permission on the key.
+
+    The range of memory that is to be protected must be mapped as
+    *ANONYMOUS*.
+
+Errors
+------
+    In addition to the errors returned by legacy mprotect(),
+    encrypt_mprotect() may return:
+
+    ENOKEY *serial* parameter does not represent a valid key.
+
+    EINVAL *len* parameter is not page aligned.
+
+    EACCES Caller does not have *KEY_NEED_VIEW* permission on the key.
+
+Example
+--------
+  Allocate an MKTME Key (opts is "type=cpu algorithm=aes-xts-128")::
+        serial = add_key("mktme", "name", opts, strlen(opts), KEY_SPEC_USER_KEYRING);
+
+  Map ANONYMOUS memory::
+        ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+
+  Protect memory::
+        ret = syscall(SYS_encrypt_mprotect, ptr, size, PROT_READ|PROT_WRITE,
+                      serial);
+
+  Use the encrypted memory
+
+  Free memory::
+        ret = munmap(ptr, size);
+
+  Free the key resource::
+        ret = keyctl(KEYCTL_INVALIDATE, serial);
-- 
2.20.1
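
Illustration only, not part of the patch: since *addr* and *len* must be
page aligned, a caller may want to round the length up and check the
range before issuing the system call. The helper names are hypothetical,
and page_size is assumed to be a power of two, as returned by
sysconf(_SC_PAGE_SIZE).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round len up to a multiple of page_size (a power of two). */
static size_t page_round_up(size_t len, size_t page_size)
{
	return (len + page_size - 1) & ~(page_size - 1);
}

/* Return true if both addr and len sit on a page boundary. */
static bool range_page_aligned(const void *addr, size_t len, size_t page_size)
{
	size_t mask = page_size - 1;

	return ((uintptr_t)addr & mask) == 0 && (len & mask) == 0;
}
```

Addresses returned by mmap() are already page aligned, so in practice
only the length needs rounding when the mapping size comes from user
input.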



* [PATCH, RFC 62/62] x86/mktme: Demonstration program using the MKTME APIs
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (60 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 61/62] x86/mktme: Document the MKTME API for anonymous memory encryption Kirill A. Shutemov
@ 2019-05-08 14:44 ` Kirill A. Shutemov
  2019-05-29  7:30 ` [PATCH, RFC 00/62] Intel MKTME enabling Mike Rapoport
  2019-06-14 12:15 ` Peter Zijlstra
  63 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 14:44 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/mktme/index.rst      |  1 +
 Documentation/x86/mktme/mktme_demo.rst | 53 ++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)
 create mode 100644 Documentation/x86/mktme/mktme_demo.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index ca3c76adc596..3af322d13225 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -10,3 +10,4 @@ Multi-Key Total Memory Encryption (MKTME)
    mktme_configuration
    mktme_keys
    mktme_encrypt
+   mktme_demo
diff --git a/Documentation/x86/mktme/mktme_demo.rst b/Documentation/x86/mktme/mktme_demo.rst
new file mode 100644
index 000000000000..49377ad648e7
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_demo.rst
@@ -0,0 +1,53 @@
+Demonstration Program using MKTME APIs
+======================================
+
+/* Compile with the keyutils library: cc -o mdemo mdemo.c -lkeyutils */
+
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <keyutils.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGE_SIZE)
+#define sys_encrypt_mprotect 428
+
+int main(void)
+{
+	char *options_CPU = "algorithm=aes-xts-128 type=cpu";
+	long size = PAGE_SIZE;
+	key_serial_t key;
+	void *ptra;
+	int ret = 1;
+
+	/* Allocate an MKTME Key */
+	key = add_key("mktme", "testkey", options_CPU, strlen(options_CPU),
+		      KEY_SPEC_THREAD_KEYRING);
+	if (key == -1) {
+		perror("add_key");
+		return 1;
+	}
+	/* Map a page of ANONYMOUS memory; mmap() fails with MAP_FAILED */
+	ptra = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+	if (ptra == MAP_FAILED) {
+		perror("mmap");
+		goto inval_key;
+	}
+	/* Encrypt that page of memory with the MKTME Key */
+	ret = syscall(sys_encrypt_mprotect, ptra, size, PROT_READ|PROT_WRITE,
+		      key);
+	if (ret)
+		perror("encrypt_mprotect");
+
+	/* Enjoy that page of encrypted memory */
+
+	/* Free the memory */
+	munmap(ptra, size);
+inval_key:
+	/* Free the Key */
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		printf("invalidate failed on key [%d]\n", key);
+	return ret ? 1 : 0;
+}
-- 
2.20.1



* Re: [PATCH, RFC 52/62] x86/mm: introduce common code for mem encryption
  2019-05-08 14:44 ` [PATCH, RFC 52/62] x86/mm: introduce common code for mem encryption Kirill A. Shutemov
@ 2019-05-08 16:58   ` Christoph Hellwig
  2019-05-08 20:52     ` Jacob Pan
  0 siblings, 1 reply; 153+ messages in thread
From: Christoph Hellwig @ 2019-05-08 16:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:44:12PM +0300, Kirill A. Shutemov wrote:
> +EXPORT_SYMBOL_GPL(__mem_encrypt_dma_set);
> +
> +phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
> +{
> +	if (sme_active())
> +		return __sme_clr(paddr);
> +
> +	return paddr & ~mktme_keyid_mask;
> +}
> +EXPORT_SYMBOL_GPL(__mem_encrypt_dma_clear);

In general nothing related to low-level dma address should ever
be exposed to modules.  What is your intended user for these two?


* Re: [PATCH, RFC 48/62] selftests/x86/mktme: Test the MKTME APIs
  2019-05-08 14:44 ` [PATCH, RFC 48/62] selftests/x86/mktme: Test the MKTME APIs Kirill A. Shutemov
@ 2019-05-08 17:09   ` Alison Schofield
  0 siblings, 0 replies; 153+ messages in thread
From: Alison Schofield @ 2019-05-08 17:09 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, linux-mm, kvm,
	keyrings, linux-kernel

Please ignore this patch.
It includes an outdated draft from early testing. Other than showing
our intent to deliver selftests, it is not out for review.

Alison


* Re: [PATCH, RFC 52/62] x86/mm: introduce common code for mem encryption
  2019-05-08 16:58   ` Christoph Hellwig
@ 2019-05-08 20:52     ` Jacob Pan
  2019-05-08 21:21       ` Kirill A. Shutemov
  0 siblings, 1 reply; 153+ messages in thread
From: Jacob Pan @ 2019-05-08 20:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells, Kees Cook, Dave Hansen,
	Kai Huang, Alison Schofield, linux-mm, kvm, keyrings,
	linux-kernel, jacob.jun.pan

On Wed, 8 May 2019 09:58:30 -0700
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, May 08, 2019 at 05:44:12PM +0300, Kirill A. Shutemov wrote:
> > +EXPORT_SYMBOL_GPL(__mem_encrypt_dma_set);
> > +
> > +phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
> > +{
> > +	if (sme_active())
> > +		return __sme_clr(paddr);
> > +
> > +	return paddr & ~mktme_keyid_mask;
> > +}
> > +EXPORT_SYMBOL_GPL(__mem_encrypt_dma_clear);  
> 
> In general nothing related to low-level dma address should ever
> be exposed to modules.  What is your intended user for these two?

Right no need to export. It will be used by IOMMU drivers.


* Re: [PATCH, RFC 52/62] x86/mm: introduce common code for mem encryption
  2019-05-08 20:52     ` Jacob Pan
@ 2019-05-08 21:21       ` Kirill A. Shutemov
  0 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-08 21:21 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Christoph Hellwig, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells, Kees Cook, Dave Hansen,
	Kai Huang, Alison Schofield, linux-mm, kvm, keyrings,
	linux-kernel

On Wed, May 08, 2019 at 08:52:25PM +0000, Jacob Pan wrote:
> On Wed, 8 May 2019 09:58:30 -0700
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Wed, May 08, 2019 at 05:44:12PM +0300, Kirill A. Shutemov wrote:
> > > +EXPORT_SYMBOL_GPL(__mem_encrypt_dma_set);
> > > +
> > > +phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
> > > +{
> > > +	if (sme_active())
> > > +		return __sme_clr(paddr);
> > > +
> > > +	return paddr & ~mktme_keyid_mask;
> > > +}
> > > +EXPORT_SYMBOL_GPL(__mem_encrypt_dma_clear);  
> > 
> > In general nothing related to low-level dma address should ever
> > be exposed to modules.  What is your intended user for these two?
> 
> Right no need to export. It will be used by IOMMU drivers.

I will drop these EXPORT_SYMBOL_GPL().

-- 
 Kirill A. Shutemov
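
Illustration only, not part of the thread: the quoted helper strips the
KeyID from a physical address with "paddr & ~mktme_keyid_mask". A
minimal userspace sketch of that bit manipulation follows. The function
names, and the shift and width passed to them, are hypothetical; the
real values are enumerated from hardware, and nr_bits must be below 64.

```c
#include <stdint.h>

/* Build a mask covering nr_bits of KeyID starting at bit position shift. */
static uint64_t keyid_mask(unsigned int shift, unsigned int nr_bits)
{
	return ((UINT64_C(1) << nr_bits) - 1) << shift;
}

/* Encode keyid into the KeyID bits of paddr. */
static uint64_t paddr_set_keyid(uint64_t paddr, uint64_t keyid,
				unsigned int shift, unsigned int nr_bits)
{
	uint64_t mask = keyid_mask(shift, nr_bits);

	return (paddr & ~mask) | ((keyid << shift) & mask);
}

/* Strip the KeyID bits, mirroring "paddr & ~mktme_keyid_mask". */
static uint64_t paddr_clear_keyid(uint64_t paddr,
				  unsigned int shift, unsigned int nr_bits)
{
	return paddr & ~keyid_mask(shift, nr_bits);
}
```

Clearing the KeyID bits recovers the address that was encoded, which is
the transformation the DMA path needs before an address is handed to a
device.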


* Re: [PATCH, RFC 03/62] mm/ksm: Do not merge pages with different KeyIDs
  2019-05-08 14:43 ` [PATCH, RFC 03/62] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
@ 2019-05-10 18:07   ` Dave Hansen
  2019-05-13 14:27     ` Kirill A. Shutemov
  0 siblings, 1 reply; 153+ messages in thread
From: Dave Hansen @ 2019-05-10 18:07 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells
  Cc: Kees Cook, Kai Huang, Jacob Pan, Alison Schofield, linux-mm, kvm,
	keyrings, linux-kernel

On 5/8/19 7:43 AM, Kirill A. Shutemov wrote:
> KeyID indicates what key to use to encrypt and decrypt page's content.
> Depending on the implementation a cipher text may be tied to physical
> address of the page. It means that pages with an identical plain text
> would appear different if KSM would look at a cipher text. It effectively
> disables KSM for encrypted pages.
> 
> In addition, some implementations may not allow to read cipher text at all.
> 
> KSM compares plain text instead (transparently to KSM code).
> 
> But we still need to make sure that pages with identical plain text will
> not be merged together if they are encrypted with different keys.
> 
> To make it work kernel only allows merging pages with the same KeyID.
> The approach guarantees that the merged page can be read by all users.

I can't really parse this description.  Can I suggest replacement text?

Problem: KSM compares plain text.  It might try to merge two pages that
have the same plain text but different ciphertext and possibly different
encryption keys.  When the kernel encrypted the page, it promised that
it would keep it encrypted with _that_ key.  That makes it impossible to
merge two pages encrypted with different keys.

Solution: Never merge encrypted pages with different KeyIDs.


* Re: [PATCH, RFC 03/62] mm/ksm: Do not merge pages with different KeyIDs
  2019-05-10 18:07   ` Dave Hansen
@ 2019-05-13 14:27     ` Kirill A. Shutemov
  0 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-13 14:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Kai Huang, Jacob Pan, Alison Schofield, linux-mm, kvm,
	keyrings, linux-kernel

On Fri, May 10, 2019 at 06:07:11PM +0000, Dave Hansen wrote:
> On 5/8/19 7:43 AM, Kirill A. Shutemov wrote:
> > KeyID indicates what key to use to encrypt and decrypt page's content.
> > Depending on the implementation a cipher text may be tied to physical
> > address of the page. It means that pages with an identical plain text
> > would appear different if KSM would look at a cipher text. It effectively
> > disables KSM for encrypted pages.
> > 
> > In addition, some implementations may not allow to read cipher text at all.
> > 
> > KSM compares plain text instead (transparently to KSM code).
> > 
> > But we still need to make sure that pages with identical plain text will
> > not be merged together if they are encrypted with different keys.
> > 
> > To make it work kernel only allows merging pages with the same KeyID.
> > The approach guarantees that the merged page can be read by all users.
> 
> I can't really parse this description.  Can I suggest replacement text?

Sure.

-- 
 Kirill A. Shutemov


* Re: [PATCH, RFC 02/62] mm: Add helpers to setup zero page mappings
  2019-05-08 14:43 ` [PATCH, RFC 02/62] mm: Add helpers to setup zero page mappings Kirill A. Shutemov
@ 2019-05-29  7:21   ` Mike Rapoport
  0 siblings, 0 replies; 153+ messages in thread
From: Mike Rapoport @ 2019-05-29  7:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:43:22PM +0300, Kirill A. Shutemov wrote:
> When kernel setups an encrypted page mapping, encryption KeyID is

Nit: "when kernel sets up an encrypted..."

> derived from a VMA. KeyID is going to be part of vma->vm_page_prot and
> it will be propagated transparently to page table entry on mk_pte().
> 
> But there is an exception: zero page is never encrypted and its mapping
> must use KeyID-0, regardless VMA's KeyID.
> 
> Introduce helpers that create a page table entry for zero page.
> 
> The generic implementation will be overridden by architecture-specific
> code that takes care about using correct KeyID.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  fs/dax.c                      | 3 +--
>  include/asm-generic/pgtable.h | 8 ++++++++
>  mm/huge_memory.c              | 6 ++----
>  mm/memory.c                   | 3 +--
>  mm/userfaultfd.c              | 3 +--
>  5 files changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index e5e54da1715f..6d609bff53b9 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1441,8 +1441,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
>  		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
>  		mm_inc_nr_ptes(vma->vm_mm);
>  	}
> -	pmd_entry = mk_pmd(zero_page, vmf->vma->vm_page_prot);
> -	pmd_entry = pmd_mkhuge(pmd_entry);
> +	pmd_entry = mk_zero_pmd(zero_page, vmf->vma->vm_page_prot);
>  	set_pmd_at(vmf->vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
>  	spin_unlock(ptl);
>  	trace_dax_pmd_load_hole(inode, vmf, zero_page, *entry);
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index fa782fba51ee..cde8b81f6f2b 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -879,8 +879,16 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
>  }
>  #endif
> 
> +#ifndef mk_zero_pte
> +#define mk_zero_pte(addr, prot) pte_mkspecial(pfn_pte(my_zero_pfn(addr), prot))
> +#endif
> +
>  #ifdef CONFIG_MMU
> 
> +#ifndef mk_zero_pmd
> +#define mk_zero_pmd(zero_page, prot) pmd_mkhuge(mk_pmd(zero_page, prot))
> +#endif
> +
>  #ifndef CONFIG_TRANSPARENT_HUGEPAGE
>  static inline int pmd_trans_huge(pmd_t pmd)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 165ea46bf149..26c3503824ba 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -675,8 +675,7 @@ static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
>  	pmd_t entry;
>  	if (!pmd_none(*pmd))
>  		return false;
> -	entry = mk_pmd(zero_page, vma->vm_page_prot);
> -	entry = pmd_mkhuge(entry);
> +	entry = mk_zero_pmd(zero_page, vma->vm_page_prot);
>  	if (pgtable)
>  		pgtable_trans_huge_deposit(mm, pmd, pgtable);
>  	set_pmd_at(mm, haddr, pmd, entry);
> @@ -2101,8 +2100,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
> 
>  	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
>  		pte_t *pte, entry;
> -		entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
> -		entry = pte_mkspecial(entry);
> +		entry = mk_zero_pte(haddr, vma->vm_page_prot);
>  		pte = pte_offset_map(&_pmd, haddr);
>  		VM_BUG_ON(!pte_none(*pte));
>  		set_pte_at(mm, haddr, pte, entry);
> diff --git a/mm/memory.c b/mm/memory.c
> index ab650c21bccd..c5e0c87a12b7 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2927,8 +2927,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>  	/* Use the zero-page for reads */
>  	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
>  			!mm_forbids_zeropage(vma->vm_mm)) {
> -		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
> -						vma->vm_page_prot));
> +		entry = mk_zero_pte(vmf->address, vma->vm_page_prot);
>  		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>  				vmf->address, &vmf->ptl);
>  		if (!pte_none(*vmf->pte))
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index d59b5a73dfb3..ac1ce3866036 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -122,8 +122,7 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
>  	pgoff_t offset, max_off;
>  	struct inode *inode;
> 
> -	_dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
> -					 dst_vma->vm_page_prot));
> +	_dst_pte = mk_zero_pte(dst_addr, dst_vma->vm_page_prot);
>  	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
>  	if (dst_vma->vm_file) {
>  		/* the shmem MAP_PRIVATE case requires checking the i_size */
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH, RFC 05/62] mm/page_alloc: Handle allocation for encrypted memory
  2019-05-08 14:43 ` [PATCH, RFC 05/62] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
@ 2019-05-29  7:21   ` Mike Rapoport
  2019-05-29 12:47     ` Kirill A. Shutemov
  0 siblings, 1 reply; 153+ messages in thread
From: Mike Rapoport @ 2019-05-29  7:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:43:25PM +0300, Kirill A. Shutemov wrote:
> For encrypted memory, we need to allocate pages for a specific
> encryption KeyID.
> 
> There are two cases when we need to allocate a page for encryption:
> 
>  - Allocation for an encrypted VMA;
> 
>  - Allocation for migration of encrypted page;
> 
> The first case can be covered within alloc_page_vma(). We know KeyID
> from the VMA.
> 
> The second case requires few new page allocation routines that would
> allocate the page for a specific KeyID.
> 
> An encrypted page has to be cleared after KeyID set. This is handled
> in prep_encrypted_page() that will be provided by arch-specific code.
> 
> Any custom allocator that dials with encrypted pages has to call

Nit:                       ^ deals

> prep_encrypted_page() too. See compaction_alloc() for instance.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  include/linux/gfp.h     | 45 ++++++++++++++++++++++++++++++++-----
>  include/linux/migrate.h | 14 +++++++++---
>  mm/compaction.c         |  3 +++
>  mm/mempolicy.c          | 27 ++++++++++++++++------
>  mm/migrate.c            |  4 ++--
>  mm/page_alloc.c         | 50 +++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 126 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index b101aa294157..1716dbe587c9 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -463,16 +463,43 @@ static inline void arch_free_page(struct page *page, int order) { }
>  static inline void arch_alloc_page(struct page *page, int order) { }
>  #endif
> 
> +#ifndef prep_encrypted_page

An explanation of what is expected from the architecture specific
implementation would be nice.

> +static inline void prep_encrypted_page(struct page *page, int order,
> +		int keyid, bool zero)
> +{
> +}
> +#endif
> +
> +/*
> + * Encrypted page has to be cleared once keyid is set, not on allocation.
> + */
> +static inline bool deferred_page_zero(int keyid, gfp_t *gfp_mask)
> +{
> +	if (keyid && (*gfp_mask & __GFP_ZERO)) {
> +		*gfp_mask &= ~__GFP_ZERO;
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  struct page *
>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  							nodemask_t *nodemask);
> 
> +struct page *
> +__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
> +		int preferred_nid, nodemask_t *nodemask, int keyid);
> +
>  static inline struct page *
>  __alloc_pages(gfp_t gfp_mask, unsigned int order, int preferred_nid)
>  {
>  	return __alloc_pages_nodemask(gfp_mask, order, preferred_nid, NULL);
>  }
> 
> +struct page *__alloc_pages_node_keyid(int nid, int keyid,
> +		gfp_t gfp_mask, unsigned int order);
> +
>  /*
>   * Allocate pages, preferring the node given as nid. The node must be valid and
>   * online. For more general interface, see alloc_pages_node().
> @@ -500,6 +527,19 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  	return __alloc_pages_node(nid, gfp_mask, order);
>  }
> 
> +static inline struct page *alloc_pages_node_keyid(int nid, int keyid,
> +		gfp_t gfp_mask, unsigned int order)
> +{
> +	if (nid == NUMA_NO_NODE)
> +		nid = numa_mem_id();
> +
> +	return __alloc_pages_node_keyid(nid, keyid, gfp_mask, order);
> +}
> +
> +extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
> +			struct vm_area_struct *vma, unsigned long addr,
> +			int node, bool hugepage);
> +
>  #ifdef CONFIG_NUMA
>  extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
> 
> @@ -508,14 +548,9 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
>  {
>  	return alloc_pages_current(gfp_mask, order);
>  }
> -extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
> -			struct vm_area_struct *vma, unsigned long addr,
> -			int node, bool hugepage);
>  #else
>  #define alloc_pages(gfp_mask, order) \
>  		alloc_pages_node(numa_node_id(), gfp_mask, order)
> -#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
> -	alloc_pages(gfp_mask, order)
>  #endif
>  #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
>  #define alloc_page_vma(gfp_mask, vma, addr)			\
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index e13d9bf2f9a5..a6e068762d08 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -38,9 +38,16 @@ static inline struct page *new_page_nodemask(struct page *page,
>  	unsigned int order = 0;
>  	struct page *new_page = NULL;
> 
> -	if (PageHuge(page))
> +	if (PageHuge(page)) {
> +		/*
> +		 * HugeTLB doesn't support encryption. We shouldn't see
> +		 * such pages.
> +		 */
> +		if (WARN_ON_ONCE(page_keyid(page)))
> +			return NULL;
>  		return alloc_huge_page_nodemask(page_hstate(compound_head(page)),
>  				preferred_nid, nodemask);
> +	}
> 
>  	if (PageTransHuge(page)) {
>  		gfp_mask |= GFP_TRANSHUGE;
> @@ -50,8 +57,9 @@ static inline struct page *new_page_nodemask(struct page *page,
>  	if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
>  		gfp_mask |= __GFP_HIGHMEM;
> 
> -	new_page = __alloc_pages_nodemask(gfp_mask, order,
> -				preferred_nid, nodemask);
> +	/* Allocate a page with the same KeyID as the source page */
> +	new_page = __alloc_pages_nodemask_keyid(gfp_mask, order,
> +				preferred_nid, nodemask, page_keyid(page));
> 
>  	if (new_page && PageTransHuge(new_page))
>  		prep_transhuge_page(new_page);
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 3319e0872d01..559b8bd6d245 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1557,6 +1557,9 @@ static struct page *compaction_alloc(struct page *migratepage,
>  	list_del(&freepage->lru);
>  	cc->nr_freepages--;
> 
> +	/* Prepare the page using the same KeyID as the source page */
> +	if (freepage)
> +		prep_encrypted_page(freepage, 0, page_keyid(migratepage), false);
>  	return freepage;
>  }
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 14b18449c623..5cad39fb7b35 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -961,22 +961,29 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
>  /* page allocation callback for NUMA node migration */
>  struct page *alloc_new_node_page(struct page *page, unsigned long node)
>  {
> -	if (PageHuge(page))
> +	if (PageHuge(page)) {
> +		/*
> +		 * HugeTLB doesn't support encryption. We shouldn't see
> +		 * such pages.
> +		 */
> +		if (WARN_ON_ONCE(page_keyid(page)))
> +			return NULL;
>  		return alloc_huge_page_node(page_hstate(compound_head(page)),
>  					node);
> -	else if (PageTransHuge(page)) {
> +	} else if (PageTransHuge(page)) {
>  		struct page *thp;
> 
> -		thp = alloc_pages_node(node,
> +		thp = alloc_pages_node_keyid(node, page_keyid(page),
>  			(GFP_TRANSHUGE | __GFP_THISNODE),
>  			HPAGE_PMD_ORDER);
>  		if (!thp)
>  			return NULL;
>  		prep_transhuge_page(thp);
>  		return thp;
> -	} else
> -		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
> -						    __GFP_THISNODE, 0);
> +	} else {
> +		return __alloc_pages_node_keyid(node, page_keyid(page),
> +				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
> +	}
>  }
> 
>  /*
> @@ -2053,9 +2060,13 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
>  {
>  	struct mempolicy *pol;
>  	struct page *page;
> -	int preferred_nid;
> +	bool deferred_zero;
> +	int keyid, preferred_nid;
>  	nodemask_t *nmask;
> 
> +	keyid = vma_keyid(vma);
> +	deferred_zero = deferred_page_zero(keyid, &gfp);
> +
>  	pol = get_vma_policy(vma, addr);
> 
>  	if (pol->mode == MPOL_INTERLEAVE) {
> @@ -2097,6 +2108,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
>  	page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
>  	mpol_cond_put(pol);
>  out:
> +	if (page)
> +		prep_encrypted_page(page, order, keyid, deferred_zero);
>  	return page;
>  }
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 663a5449367a..04b36a56865d 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1880,7 +1880,7 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
>  	int nid = (int) data;
>  	struct page *newpage;
> 
> -	newpage = __alloc_pages_node(nid,
> +	newpage = __alloc_pages_node_keyid(nid, page_keyid(page),
>  					 (GFP_HIGHUSER_MOVABLE |
>  					  __GFP_THISNODE | __GFP_NOMEMALLOC |
>  					  __GFP_NORETRY | __GFP_NOWARN) &
> @@ -2006,7 +2006,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>  	int page_lru = page_is_file_cache(page);
>  	unsigned long start = address & HPAGE_PMD_MASK;
> 
> -	new_page = alloc_pages_node(node,
> +	new_page = alloc_pages_node_keyid(node, page_keyid(page),
>  		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
>  		HPAGE_PMD_ORDER);
>  	if (!new_page)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c02cff1ed56e..ab1d8661aa87 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3930,6 +3930,41 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
>  }
>  #endif /* CONFIG_COMPACTION */
> 
> +#ifndef CONFIG_NUMA
> +struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
> +		struct vm_area_struct *vma, unsigned long addr,
> +		int node, bool hugepage)
> +{

Having the NUMA and !NUMA versions of alloc_pages_vma() in different places
is confusing, but I don't have a better suggestion :(

> +	struct page *page;
> +	bool deferred_zero;
> +	int keyid = vma_keyid(vma);
> +
> +	deferred_zero = deferred_page_zero(keyid, &gfp_mask);
> +	page = alloc_pages(gfp_mask, order);
> +	if (page)
> +		prep_encrypted_page(page, order, keyid, deferred_zero);
> +
> +	return page;
> +}
> +#endif
> +
> +struct page * __alloc_pages_node_keyid(int nid, int keyid,
> +		gfp_t gfp_mask, unsigned int order)
> +{

A kerneldoc description for __alloc_pages_node_keyid() would be appreciated.

> +	struct page *page;
> +	bool deferred_zero;
> +
> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> +	VM_WARN_ON(!node_online(nid));
> +
> +	deferred_zero = deferred_page_zero(keyid, &gfp_mask);
> +	page = __alloc_pages(gfp_mask, order, nid);
> +	if (page)
> +		prep_encrypted_page(page, order, keyid, deferred_zero);
> +
> +	return page;
> +}

Shouldn't this be exported with EXPORT_SYMBOL(), like
__alloc_pages_nodemask_keyid() below?

> +
>  #ifdef CONFIG_LOCKDEP
>  static struct lockdep_map __fs_reclaim_map =
>  	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
> @@ -4645,6 +4680,21 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  }
>  EXPORT_SYMBOL(__alloc_pages_nodemask);
>
> +struct page *
> +__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
> +		int preferred_nid, nodemask_t *nodemask, int keyid)
> +{

A kerneldoc description for __alloc_pages_nodemask_keyid() would be
appreciated.

> +	struct page *page;
> +	bool deferred_zero;
> +
> +	deferred_zero = deferred_page_zero(keyid, &gfp_mask);
> +	page = __alloc_pages_nodemask(gfp_mask, order, preferred_nid, nodemask);
> +	if (page)
> +		prep_encrypted_page(page, order, keyid, deferred_zero);
> +	return page;
> +}
> +EXPORT_SYMBOL(__alloc_pages_nodemask_keyid);
> +
>  /*
>   * Common helper functions. Never use with __GFP_HIGHMEM because the returned
>   * address cannot represent highmem pages. Use alloc_pages and then kmap if
> -- 
> 2.20.1
> 
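
The three new helpers in this patch share one ordering: drop the zeroing
request when a KeyID is in play, allocate, then set the KeyID and zero
afterwards. Below is a minimal userspace model of that ordering. It assumes
deferred_page_zero() behaves the way its call sites suggest (its definition
is not part of this patch), and every toy_ name is hypothetical, not kernel
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_GFP_ZERO	0x1u	/* stand-in for __GFP_ZERO */
#define TOY_PAGE_SIZE	4096

struct toy_page {
	int keyid;			/* KeyID the page was prepared for */
	unsigned char data[TOY_PAGE_SIZE];
};

/* Assumed semantics: defer zeroing only for encrypted (KeyID != 0) pages. */
static bool toy_deferred_page_zero(int keyid, unsigned int *gfp)
{
	if (keyid && (*gfp & TOY_GFP_ZERO)) {
		*gfp &= ~TOY_GFP_ZERO;	/* base allocator must not zero */
		return true;
	}
	return false;
}

/* Base allocator: zeroes only if the flag survived, else leaves junk. */
static struct toy_page *toy_alloc(unsigned int gfp)
{
	struct toy_page *page = malloc(sizeof(*page));

	if (!page)
		return NULL;
	page->keyid = 0;
	memset(page->data, (gfp & TOY_GFP_ZERO) ? 0 : 0xAA, TOY_PAGE_SIZE);
	return page;
}

/* Set the KeyID first, then zero as if through the encrypted mapping. */
static void toy_prep_encrypted_page(struct toy_page *page, int keyid, bool zero)
{
	page->keyid = keyid;
	if (zero)
		memset(page->data, 0, TOY_PAGE_SIZE);
}

/* Same shape as the *_keyid() allocators in the quoted diff. */
static struct toy_page *toy_alloc_keyid(int keyid, unsigned int gfp)
{
	bool deferred_zero = toy_deferred_page_zero(keyid, &gfp);
	struct toy_page *page = toy_alloc(gfp);

	if (page)
		toy_prep_encrypted_page(page, keyid, deferred_zero);
	return page;
}
```

In this model a page requested with a non-zero KeyID and TOY_GFP_ZERO still
comes back zeroed, but the zeroing happens only after the KeyID is set.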

-- 
Sincerely yours,
Mike.



* Re: [PATCH, RFC 43/62] syscall/x86: Wire up a system call for MKTME encryption keys
  2019-05-08 14:44 ` [PATCH, RFC 43/62] syscall/x86: Wire up a system call for MKTME encryption keys Kirill A. Shutemov
@ 2019-05-29  7:21   ` Mike Rapoport
  2019-05-29 18:12     ` Alison Schofield
  0 siblings, 1 reply; 153+ messages in thread
From: Mike Rapoport @ 2019-05-29  7:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:44:03PM +0300, Kirill A. Shutemov wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> encrypt_mprotect() is a new system call to support memory encryption.
> 
> It takes the same parameters as legacy mprotect, plus an additional
> key serial number that is mapped to an encryption keyid.

Shouldn't this patch come after the one that adds encrypt_mprotect()?
 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/entry/syscalls/syscall_32.tbl | 1 +
>  arch/x86/entry/syscalls/syscall_64.tbl | 1 +
>  include/linux/syscalls.h               | 2 ++
>  include/uapi/asm-generic/unistd.h      | 4 +++-
>  kernel/sys_ni.c                        | 2 ++
>  5 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 1f9607ed087c..dbcd4c28d743 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -433,3 +433,4 @@
>  425	i386	io_uring_setup		sys_io_uring_setup		__ia32_sys_io_uring_setup
>  426	i386	io_uring_enter		sys_io_uring_enter		__ia32_sys_io_uring_enter
>  427	i386	io_uring_register	sys_io_uring_register		__ia32_sys_io_uring_register
> +428	i386	encrypt_mprotect	sys_encrypt_mprotect		__ia32_sys_encrypt_mprotect
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 92ee0b4378d4..d01bd132e9ee 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -349,6 +349,7 @@
>  425	common	io_uring_setup		__x64_sys_io_uring_setup
>  426	common	io_uring_enter		__x64_sys_io_uring_enter
>  427	common	io_uring_register	__x64_sys_io_uring_register
> +428	common	encrypt_mprotect	__x64_sys_encrypt_mprotect
> 
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index e446806a561f..38a2d7b95397 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -988,6 +988,8 @@ asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
>  asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
>  				       siginfo_t __user *info,
>  				       unsigned int flags);
> +asmlinkage long sys_encrypt_mprotect(unsigned long start, size_t len,
> +				     unsigned long prot, key_serial_t serial);
> 
>  /*
>   * Architecture-specific system calls
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index dee7292e1df6..86f942f54b1b 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -832,9 +832,11 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
>  __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
>  #define __NR_io_uring_register 427
>  __SYSCALL(__NR_io_uring_register, sys_io_uring_register)
> +#define __NR_encrypt_mprotect 428
> +__SYSCALL(__NR_encrypt_mprotect, sys_encrypt_mprotect)
> 
>  #undef __NR_syscalls
> -#define __NR_syscalls 428
> +#define __NR_syscalls 429
> 
>  /*
>   * 32 bit systems traditionally used different
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index d21f4befaea4..80da8d9ac8b1 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -350,6 +350,8 @@ COND_SYSCALL(pkey_mprotect);
>  COND_SYSCALL(pkey_alloc);
>  COND_SYSCALL(pkey_free);
> 
> +/* multi-key total memory encryption keys */
> +COND_SYSCALL(encrypt_mprotect);
> 
>  /*
>   * Architecture specific weak syscall entries.
> -- 
> 2.20.1
> 
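
For readers who want to see the calling convention, here is a hedged sketch
of a userspace wrapper. The argument order follows the prototype added to
include/linux/syscalls.h above. The wrapper name and the choice to pass the
syscall number as a parameter are assumptions made for illustration: 428 is
only the number used in this RFC and is not a stable ABI value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Number from this RFC's syscall tables; do not assume it on other kernels. */
#define RFC_NR_ENCRYPT_MPROTECT 428

/*
 * Hypothetical wrapper: same arguments as mprotect() plus the key serial
 * (key_serial_t is a 32-bit integer). The caller supplies the syscall
 * number explicitly so that the targeted kernel ABI is always stated.
 * Returns 0 on success, -1 with errno set on failure.
 */
static long encrypt_mprotect_raw(long nr, void *addr, size_t len,
				 unsigned long prot, int32_t serial)
{
	return syscall(nr, addr, len, prot, serial);
}
```

On a kernel built from this series one would pass RFC_NR_ENCRYPT_MPROTECT and
a serial obtained from the keyring. On any other kernel that number may refer
to a different system call, so it should not be used there.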

-- 
Sincerely yours,
Mike.



* Re: [PATCH, RFC 57/62] x86/mktme: Overview of Multi-Key Total Memory Encryption
  2019-05-08 14:44 ` [PATCH, RFC 57/62] x86/mktme: Overview of Multi-Key Total Memory Encryption Kirill A. Shutemov
@ 2019-05-29  7:21   ` Mike Rapoport
  2019-05-29 18:13     ` Alison Schofield
  2019-07-14 18:16   ` Randy Dunlap
  1 sibling, 1 reply; 153+ messages in thread
From: Mike Rapoport @ 2019-05-29  7:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:44:17PM +0300, Kirill A. Shutemov wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Provide an overview of MKTME on Intel Platforms.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  Documentation/x86/mktme/index.rst          |  8 +++
>  Documentation/x86/mktme/mktme_overview.rst | 57 ++++++++++++++++++++++

I'd expect the mktme docs to be added to Documentation/x86/index.rst as well.

>  2 files changed, 65 insertions(+)
>  create mode 100644 Documentation/x86/mktme/index.rst
>  create mode 100644 Documentation/x86/mktme/mktme_overview.rst
> 
> diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
> new file mode 100644
> index 000000000000..1614b52dd3e9
> --- /dev/null
> +++ b/Documentation/x86/mktme/index.rst
> @@ -0,0 +1,8 @@
> +
> +=========================================
> +Multi-Key Total Memory Encryption (MKTME)
> +=========================================
> +
> +.. toctree::
> +
> +   mktme_overview
> diff --git a/Documentation/x86/mktme/mktme_overview.rst b/Documentation/x86/mktme/mktme_overview.rst
> new file mode 100644
> index 000000000000..59c023965554
> --- /dev/null
> +++ b/Documentation/x86/mktme/mktme_overview.rst
> @@ -0,0 +1,57 @@
> +Overview
> +=========
> +Multi-Key Total Memory Encryption (MKTME)[1] is a technology that
> +allows transparent memory encryption in upcoming Intel platforms.
> +It uses a new instruction (PCONFIG) for key setup and selects a
> +key for individual pages by repurposing physical address bits in
> +the page tables.
> +
> +Support for MKTME is added to the existing kernel keyring subsystem
> +and via a new encrypt_mprotect() system call that can be used by
> +applications to encrypt anonymous memory with keys obtained from
> +the keyring.
> +
> +This architecture supports encrypting both normal, volatile DRAM
> +and persistent memory.  However, persistent memory support is
> +not included in the Linux kernel implementation at this time.
> +(We anticipate adding that support next.)
> +
> +Hardware Background
> +===================
> +
> +MKTME is built on top of an existing single-key technology called
> +TME.  TME encrypts all system memory using a single key generated
> +by the CPU on every boot of the system. TME provides mitigation
> +against physical attacks, such as physically removing a DIMM or
> +watching memory bus traffic.
> +
> +MKTME enables the use of multiple encryption keys[2], allowing
> +selection of the encryption key per-page using the page tables.
> +Encryption keys are programmed into each memory controller and
> +the same set of keys is available to all entities on the system
> +with access to that memory (all cores, DMA engines, etc...).
> +
> +MKTME inherits many of the mitigations against hardware attacks
> +from TME.  Like TME, MKTME does not mitigate vulnerable or
> +malicious operating systems or virtual machine managers.  MKTME
> +offers additional mitigations when compared to TME.
> +
> +TME and MKTME use the AES encryption algorithm in the AES-XTS
> +mode.  This mode, typically used for block-based storage devices,
> +takes the physical address of the data into account when
> +encrypting each block.  This ensures that the effective key is
> +different for each block of memory. Moving encrypted content
> +across physical addresses results in garbage on read, mitigating
> +block-relocation attacks.  This property is the reason many of
> +the discussed attacks require control of a shared physical page
> +to be handed from the victim to the attacker.
> +
> +--
> +1. https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
> +2. The MKTME architecture supports up to 16 bits of KeyIDs, so a
> +   maximum of 65535 keys on top of the “TME key” at KeyID-0.  The
> +   first implementation is expected to support 6 bits, making 63
> +   keys available to applications.  However, this is not guaranteed.
> +   The number of available keys could be reduced if, for instance,
> +   additional physical address space is desired over additional
> +   KeyIDs.
> -- 
> 2.20.1
> 
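
The arithmetic behind the key counts in footnote 2 is simple: an n-bit KeyID
field has 2^n encodings, and one of them (KeyID-0) is the TME key. A minimal
sketch, with a hypothetical helper name:

```c
#include <assert.h>

/* Usable MKTME keys for a given KeyID width: every encoding but KeyID-0. */
static unsigned int mktme_usable_keys(unsigned int keyid_bits)
{
	return (1u << keyid_bits) - 1;
}
```

A 16-bit field gives 65535 keys, a 6-bit field gives 63 and a 5-bit field
gives 31.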

-- 
Sincerely yours,
Mike.



* Re: [PATCH, RFC 00/62] Intel MKTME enabling
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (61 preceding siblings ...)
  2019-05-08 14:44 ` [PATCH, RFC 62/62] x86/mktme: Demonstration program using the MKTME APIs Kirill A. Shutemov
@ 2019-05-29  7:30 ` Mike Rapoport
  2019-05-29 18:20   ` Alison Schofield
  2019-06-14 12:15 ` Peter Zijlstra
  63 siblings, 1 reply; 153+ messages in thread
From: Mike Rapoport @ 2019-05-29  7:30 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:43:20PM +0300, Kirill A. Shutemov wrote:
> = Intro =
> 
> The patchset enables Intel Multi-Key Total Memory Encryption.
> It consists of changes to multiple subsystems:
> 
>  * Core MM: infrastructure for allocating pages, dealing with encrypted VMAs
>    and providing an API to set up encrypted mappings.
>  * arch/x86: feature enumeration, programming keys into hardware, setting up
>    page table entries for encrypted pages and more.
>  * Key management service: setup and management of encryption keys.
>  * DMA/IOMMU: dealing with encrypted memory on the IO side.
>  * KVM: interaction with the virtualization side.
>  * Documentation: description of APIs and usage examples.
> 
> The patchset is huge. This submission aims to give a view of the full picture
> and get feedback on the overall design. The patchset will be split into more
> digestible pieces later.
> 
> Please review. Any feedback is welcome.

It would be nice to have a brief usage description in the cover letter rather
than in the last patches of the series ;-)
 
> = Overview =
> 
> Multi-Key Total Memory Encryption (MKTME)[1] is a technology that allows
> transparent memory encryption in upcoming Intel platforms.  It uses a new
> instruction (PCONFIG) for key setup and selects a key for individual pages by
> repurposing physical address bits in the page tables.
> 
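
To make "repurposing physical address bits" concrete, here is a small sketch
of folding a KeyID into the upper bits of a physical address and taking it
out again. The widths (46 physical address bits, 6 of them used for the
KeyID) are hypothetical; a real kernel reads them from the hardware at boot.
All toy_ names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical platform: 46 physical address bits, the top 6 hold the KeyID. */
#define TOY_PHYS_BITS	46
#define TOY_KEYID_BITS	6
#define TOY_KEYID_SHIFT	(TOY_PHYS_BITS - TOY_KEYID_BITS)
#define TOY_KEYID_MASK	((((uint64_t)1 << TOY_KEYID_BITS) - 1) << TOY_KEYID_SHIFT)

/* Fold a KeyID into the upper bits of a physical address. */
static uint64_t toy_set_keyid(uint64_t phys, unsigned int keyid)
{
	return (phys & ~TOY_KEYID_MASK) | ((uint64_t)keyid << TOY_KEYID_SHIFT);
}

/* Extract the KeyID from an address that carries one. */
static unsigned int toy_get_keyid(uint64_t phys_with_keyid)
{
	return (unsigned int)((phys_with_keyid & TOY_KEYID_MASK) >> TOY_KEYID_SHIFT);
}

/* Recover the plain physical address by masking the KeyID bits out. */
static uint64_t toy_strip_keyid(uint64_t phys_with_keyid)
{
	return phys_with_keyid & ~TOY_KEYID_MASK;
}
```

Every bit given to the KeyID field is a bit taken away from the addressable
physical range, which is the trade-off mentioned in footnote [2].
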
> These patches add support for MKTME into the existing kernel keyring subsystem
> and add a new encrypt_mprotect() system call that can be used by applications
> to encrypt anonymous memory with keys obtained from the keyring.
> 
> This architecture supports encrypting both normal, volatile DRAM and persistent
> memory.  However, these patches do not implement persistent memory support.  We
> anticipate adding that support next.
> 
> == Hardware Background ==
> 
> MKTME is built on top of an existing single-key technology called TME.  TME
> encrypts all system memory using a single key generated by the CPU on every
> boot of the system. TME provides mitigation against physical attacks, such as
> physically removing a DIMM or watching memory bus traffic.
> 
> MKTME enables the use of multiple encryption keys[2], allowing selection of the
> encryption key per-page using the page tables.  Encryption keys are programmed
> into each memory controller and the same set of keys is available to all
> entities on the system with access to that memory (all cores, DMA engines,
> etc...).
> 
> MKTME inherits many of the mitigations against hardware attacks from TME.  Like
> TME, MKTME does not mitigate vulnerable or malicious operating systems or
> virtual machine managers.  MKTME offers additional mitigations when compared to
> TME.
> 
> TME and MKTME use the AES encryption algorithm in the AES-XTS mode.  This mode,
> typically used for block-based storage devices, takes the physical address of
> the data into account when encrypting each block.  This ensures that the
> effective key is different for each block of memory. Moving encrypted content
> across physical addresses results in garbage on read, mitigating block-relocation
> attacks.  This property is the reason many of the discussed attacks require
> control of a shared physical page to be handed from the victim to the attacker.
> 
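
The address dependence described above can be illustrated with a toy model.
The sketch below is NOT AES-XTS and is not secure: it swaps in a 4-round
Feistel network on 64-bit blocks as a stand-in cipher and uses an XEX-style
tweak derived from the physical address, which is the general shape of XTS.
All names and constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy round function. NOT AES, NOT secure; for illustration only. */
static uint32_t toy_round(uint32_t half, uint32_t subkey)
{
	uint32_t v = (half ^ subkey) * 0x9E3779B1u;

	return (v << 13) | (v >> 19);
}

static uint32_t toy_subkey(uint64_t key, int round)
{
	return (uint32_t)(key >> (8 * round)) + (uint32_t)round;
}

/* 4-round Feistel encryption of one 64-bit block. */
static uint64_t toy_encrypt(uint64_t block, uint64_t key)
{
	uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;

	for (int i = 0; i < 4; i++) {
		uint32_t t = l ^ toy_round(r, toy_subkey(key, i));

		l = r;
		r = t;
	}
	return ((uint64_t)l << 32) | r;
}

/* Inverse of toy_encrypt(): the same rounds in reverse order. */
static uint64_t toy_decrypt(uint64_t block, uint64_t key)
{
	uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;

	for (int i = 3; i >= 0; i--) {
		uint32_t t = r ^ toy_round(l, toy_subkey(key, i));

		r = l;
		l = t;
	}
	return ((uint64_t)l << 32) | r;
}

/* XEX-style write: the tweak is the encrypted physical address. */
static uint64_t toy_mem_write(uint64_t data_key, uint64_t tweak_key,
			      uint64_t phys_addr, uint64_t plaintext)
{
	uint64_t tweak = toy_encrypt(phys_addr, tweak_key);

	return toy_encrypt(plaintext ^ tweak, data_key) ^ tweak;
}

/* XEX-style read: only the address used for the write gives the right tweak. */
static uint64_t toy_mem_read(uint64_t data_key, uint64_t tweak_key,
			     uint64_t phys_addr, uint64_t ciphertext)
{
	uint64_t tweak = toy_encrypt(phys_addr, tweak_key);

	return toy_decrypt(ciphertext ^ tweak, data_key) ^ tweak;
}
```

Reading the ciphertext back at the address it was written to returns the
plaintext. Reading the same ciphertext at another address uses a different
tweak and returns unrelated bits, which is the block-relocation property.
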
> == MKTME-Provided Mitigations ==
> 
> MKTME adds a few mitigations against attacks that are not mitigated when using
> TME alone.  The first set are mitigations against software attacks that are
> familiar today:
> 
>  * Kernel Mapping Attacks: information disclosures that leverage the
>    kernel direct map are mitigated against disclosing user data.
>  * Freed Data Leak Attacks: removing an encryption key from the
>    hardware mitigates future user information disclosure.
> 
> The next set are attacks that depend on specialized hardware, such as an “evil
> DIMM” or a DDR interposer:
> 
>  * Cross-Domain Replay Attack: data is captured from one domain
>    (guest) and replayed to another at a later time.
>  * Cross-Domain Capture and Delayed Compare Attack: data is captured
>    and later analyzed to discover secrets.
>  * Key Wear-out Attack: data is captured and analyzed in order to
>    weaken the AES encryption itself.
> 
> More details on these attacks are below.
> 
> === Kernel Mapping Attacks ===
> 
> Information disclosure vulnerabilities leverage the kernel direct map because
> many vulnerabilities involve manipulation of kernel data structures (examples:
> CVE-2017-7277, CVE-2017-9605).  We normally think of these bugs as leaking
> valuable *kernel* data, but they can leak application data when application
> pages are recycled for kernel use.
> 
> With this MKTME implementation, there is a direct map created for each MKTME
> KeyID which is used whenever the kernel needs to access plaintext.  But, all
> kernel data structures are accessed via the direct map for KeyID-0.  Thus,
> memory reads which are not coordinated with the KeyID get garbage (for example,
> accessing KeyID-4 data with the KeyID-0 mapping).
> 
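
A sketch of how per-KeyID direct maps could be addressed. It assumes the maps
are laid out back to back, each the same size, starting at the KeyID-0 direct
map. That layout, the base address and the size below are assumptions made
for illustration; the cover letter does not state them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout constants; a real kernel would compute these at boot. */
#define TOY_PAGE_OFFSET		0xffff888000000000ULL	/* KeyID-0 direct map */
#define TOY_DIRECT_MAP_SIZE	(1ULL << 36)		/* size of each map */

/* Kernel virtual address of a physical address in a given KeyID's map. */
static uint64_t toy_direct_map_va(uint64_t phys, unsigned int keyid)
{
	return TOY_PAGE_OFFSET + (uint64_t)keyid * TOY_DIRECT_MAP_SIZE + phys;
}

/* Physical address behind a direct-map address, whatever its KeyID. */
static uint64_t toy_va_to_phys(uint64_t va)
{
	return (va - TOY_PAGE_OFFSET) % TOY_DIRECT_MAP_SIZE;
}

/* Which KeyID's direct map an address belongs to. */
static unsigned int toy_va_to_keyid(uint64_t va)
{
	return (unsigned int)((va - TOY_PAGE_OFFSET) / TOY_DIRECT_MAP_SIZE);
}
```

In this model the same physical page has one alias per KeyID, and a pointer
into the KeyID-0 map never lands in the KeyID-4 map by accident, which is the
"pivot" an attacker would have to perform.
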
> This means that if sensitive data encrypted using MKTME is leaked via the
> KeyID-0 direct map, ciphertext decrypted with the wrong key will be disclosed.
> To disclose plaintext, an attacker must “pivot” to the correct direct mapping,
> which is non-trivial because there are no kernel data structures in the
> KeyID!=0 direct mapping.
> 
> === Freed Data Leak Attack ===
> 
> The kernel has a history of bugs around uninitialized data.  Usually, we think
> of these bugs as leaking sensitive kernel data, but they can also be used to
> leak application secrets.
> 
> MKTME can help mitigate the case where application secrets are leaked:
> 
>  * App (or VM) places a secret in a page
>  * App exits or frees memory to kernel allocator
>  * Page added to allocator free list
>  * Attacker reallocates page to a purpose where it can read the page
> 
> Now, imagine MKTME was in use on the memory being leaked.  The data can only be
> leaked as long as the key is programmed in the hardware.  If the key is
> de-programmed, for example once all pages are freed after a guest is shut
> down, any future reads will just see ciphertext.
> 
> Basically, the key is a convenient choke-point: you can be more confident that
> data encrypted with it is inaccessible once the key is removed.
> 
> === Cross-Domain Replay Attack ===
> 
> MKTME mitigates cross-domain replay attacks where an attacker replaces an
> encrypted block owned by one domain with a block owned by another domain.
> MKTME does not prevent this replacement from occurring, but it does mitigate
> plaintext from being disclosed if the domains use different keys.
> 
> With TME, the attack could be executed by:
>  * A victim places a secret in memory, at a given physical address.
>    Note: AES-XTS is what restricts the attack to being performed at a
>    single physical address instead of across different physical
>    addresses.
>  * Attacker captures the ciphertext of the victim’s secret.
>  * Later on, after the victim frees the physical address, the attacker
>    gains ownership of it.
>  * Attacker puts the ciphertext at the address and gets the secret
>    plaintext.
> 
> With MKTME, however, the attacker and the victim presumably use different
> keys, so the attacker cannot successfully decrypt the old ciphertext.
> 
> === Cross-Domain Capture and Delayed Compare Attack ===
> 
> This is also referred to as a kind of dictionary attack.
> 
> Similarly, MKTME protects against cross-domain capture-and-compare attacks.
> Consider the following scenario:
>  * A victim places a secret in memory, at a known physical address
>  * Attacker captures victim’s ciphertext
>  * Attacker gains control of the target physical address, perhaps
>    after the victim’s VM is shut down or its memory reclaimed.
>  * Attacker computes and writes many possible plaintexts until new
>    ciphertext matches content captured previously.
> 
> Secrets which have low (plaintext) entropy are more vulnerable to this attack
> because they reduce the number of possible plaintexts an attacker has to
> compute and write.
> 
> The attack will not work if the attacker and victim use different keys.
> 
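
The capture-and-compare scenario can be modelled in a few lines. The sketch
below uses a toy address-tweaked cipher built from a splitmix64-style mixer.
It is a stand-in chosen for brevity, NOT AES-XTS and not secure, and all
names are hypothetical. The low-entropy secret is a 4-digit PIN.

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64-style bijective mixer, used here as a toy cipher core. */
static uint64_t toy_mix(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/* Toy address-tweaked encryption of one 64-bit block. NOT AES-XTS. */
static uint64_t toy_encrypt_at(uint64_t key, uint64_t phys_addr,
			       uint64_t plaintext)
{
	uint64_t tweak = toy_mix(key ^ phys_addr);

	return toy_mix(plaintext ^ tweak) ^ tweak;
}

/*
 * The attacker now owns phys_addr, writes every 4-digit PIN through the
 * key it is able to use, and compares the resulting ciphertext with the
 * one captured earlier. Returns the recovered PIN, or -1 if none matches.
 */
static int toy_dictionary_attack(uint64_t attacker_key, uint64_t phys_addr,
				 uint64_t captured)
{
	for (int pin = 0; pin < 10000; pin++) {
		if (toy_encrypt_at(attacker_key, phys_addr, (uint64_t)pin) == captured)
			return pin;
	}
	return -1;
}
```

When the attacker writes through the same key as the victim (the TME case),
the search recovers the PIN. When the attacker's writes go through a
different key, none of the candidate ciphertexts match the captured one.
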
> === Key Wear-out Attack ===
> 
> Repeated use of an encryption key might be used by an attacker to infer
> information about the key or the plaintext, weakening the encryption.  The
> higher the bandwidth of the encryption engine, the more vulnerable the key is
> to wear-out.  The MKTME memory encryption hardware works at the speed of the
> memory bus, which has high bandwidth.
> 
> Such a weakness has been demonstrated[3] on a theoretical cipher with
> properties similar to AES-XTS.
> 
> An attack would take the following steps:
>  * Victim system is using TME with AES-XTS-128
>  * Attacker repeatedly captures ciphertext/plaintext pairs (can be
>    performed with an online hardware attack such as an interposer).
>  * Attacker compels repeated use of the key under attack for a
>    sustained time period without a system reboot[4].
>  * Attacker discovers a ciphertext collision (two plaintexts
>    translating to the same ciphertext)
>  * Attacker can induce controlled modifications to the targeted
>    plaintext by modifying the colliding ciphertext
> 
> MKTME mitigates key wear-out in two ways:
>  * Keys can be rotated periodically to mitigate wear-out.  Since TME
>    keys are generated at boot, rotation of TME keys requires a
>    reboot.  In contrast, MKTME allows rotation while the system is
>    booted.  An application could implement a policy to rotate keys at
>    a frequency which is not feasible to attack.
>  * In the case that MKTME is used to encrypt two guests’ memory with
>    two different keys, an attack on one guest’s key would not weaken
>    the key used in the second guest.
> 
> --
> 
> [1] https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
> [2] The MKTME architecture supports up to 16 bits of KeyIDs, so a
>     maximum of 65535 keys on top of the “TME key” at KeyID-0.  The
>     first implementation is expected to support 6 bits, making 63 keys
>     available to applications.  However, this is not guaranteed.  The
>     number of available keys could be reduced if, for instance,
>     additional physical address space is desired over additional
>     KeyIDs.
> [3] http://web.cs.ucdavis.edu/~rogaway/papers/offsets.pdf
> [4] This sustained time required for an attack could vary from days
>     to years depending on the attacker’s goals.
> 
> Alison Schofield (33):
>   x86/pconfig: Set a valid encryption algorithm for all MKTME commands
>   keys/mktme: Introduce a Kernel Key Service for MKTME
>   keys/mktme: Preparse the MKTME key payload
>   keys/mktme: Instantiate and destroy MKTME keys
>   keys/mktme: Move the MKTME payload into a cache aligned structure
>   keys/mktme: Strengthen the entropy of CPU generated MKTME keys
>   keys/mktme: Set up PCONFIG programming targets for MKTME keys
>   keys/mktme: Program MKTME keys into the platform hardware
>   keys/mktme: Set up a percpu_ref_count for MKTME keys
>   keys/mktme: Require CAP_SYS_RESOURCE capability for MKTME keys
>   keys/mktme: Store MKTME payloads if cmdline parameter allows
>   acpi: Remove __init from acpi table parsing functions
>   acpi/hmat: Determine existence of an ACPI HMAT
>   keys/mktme: Require ACPI HMAT to register the MKTME Key Service
>   acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME
>   keys/mktme: Do not allow key creation in unsafe topologies
>   keys/mktme: Support CPU hotplug for MKTME key service
>   keys/mktme: Find new PCONFIG targets during memory hotplug
>   keys/mktme: Program new PCONFIG targets with MKTME keys
>   keys/mktme: Support memory hotplug for MKTME keys
>   mm: Generalize the mprotect implementation to support extensions
>   syscall/x86: Wire up a system call for MKTME encryption keys
>   x86/mm: Set KeyIDs in encrypted VMAs for MKTME
>   mm: Add the encrypt_mprotect() system call for MKTME
>   x86/mm: Keep reference counts on encrypted VMAs for MKTME
>   mm: Restrict MKTME memory encryption to anonymous VMAs
>   selftests/x86/mktme: Test the MKTME APIs
>   x86/mktme: Overview of Multi-Key Total Memory Encryption
>   x86/mktme: Document the MKTME provided security mitigations
>   x86/mktme: Document the MKTME kernel configuration requirements
>   x86/mktme: Document the MKTME Key Service API
>   x86/mktme: Document the MKTME API for anonymous memory encryption
>   x86/mktme: Demonstration program using the MKTME APIs
> 
> Jacob Pan (3):
>   iommu/vt-d: Support MKTME in DMA remapping
>   x86/mm: introduce common code for mem encryption
>   x86/mm: Use common code for DMA memory encryption
> 
> Kai Huang (2):
>   mm, x86: export several MKTME variables
>   kvm, x86, mmu: setup MKTME keyID to spte for given PFN
> 
> Kirill A. Shutemov (24):
>   mm: Do no merge VMAs with different encryption KeyIDs
>   mm: Add helpers to setup zero page mappings
>   mm/ksm: Do not merge pages with different KeyIDs
>   mm/page_alloc: Unify alloc_hugepage_vma()
>   mm/page_alloc: Handle allocation for encrypted memory
>   mm/khugepaged: Handle encrypted pages
>   x86/mm: Mask out KeyID bits from page table entry pfn
>   x86/mm: Introduce variables to store number, shift and mask of KeyIDs
>   x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
>   x86/mm: Detect MKTME early
>   x86/mm: Add a helper to retrieve KeyID for a page
>   x86/mm: Add a helper to retrieve KeyID for a VMA
>   x86/mm: Add hooks to allocate and free encrypted pages
>   x86/mm: Map zero pages into encrypted mappings correctly
>   x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
>   x86/mm: Allow to disable MKTME after enumeration
>   x86/mm: Calculate direct mapping size
>   x86/mm: Implement syncing per-KeyID direct mappings
>   x86/mm: Handle encrypted memory in page_to_virt() and __pa()
>   mm/page_ext: Export lookup_page_ext() symbol
>   mm/rmap: Clear vma->anon_vma on unlink_anon_vmas()
>   x86/mm: Disable MKTME on incompatible platform configurations
>   x86/mm: Disable MKTME if not all system memory supports encryption
>   x86: Introduce CONFIG_X86_INTEL_MKTME
> 
>  .../admin-guide/kernel-parameters.rst         |   1 +
>  .../admin-guide/kernel-parameters.txt         |  11 +
>  Documentation/x86/mktme/index.rst             |  13 +
>  .../x86/mktme/mktme_configuration.rst         |  17 +
>  Documentation/x86/mktme/mktme_demo.rst        |  53 ++
>  Documentation/x86/mktme/mktme_encrypt.rst     |  57 ++
>  Documentation/x86/mktme/mktme_keys.rst        |  96 +++
>  Documentation/x86/mktme/mktme_mitigations.rst | 150 ++++
>  Documentation/x86/mktme/mktme_overview.rst    |  57 ++
>  Documentation/x86/x86_64/mm.txt               |   4 +
>  arch/alpha/include/asm/page.h                 |   2 +-
>  arch/x86/Kconfig                              |  29 +-
>  arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
>  arch/x86/include/asm/intel-family.h           |   2 +
>  arch/x86/include/asm/intel_pconfig.h          |  14 +-
>  arch/x86/include/asm/mem_encrypt.h            |  29 +
>  arch/x86/include/asm/mktme.h                  |  93 +++
>  arch/x86/include/asm/page.h                   |   4 +
>  arch/x86/include/asm/page_32.h                |   1 +
>  arch/x86/include/asm/page_64.h                |   4 +-
>  arch/x86/include/asm/pgtable.h                |  19 +
>  arch/x86/include/asm/pgtable_types.h          |  23 +-
>  arch/x86/include/asm/setup.h                  |   6 +
>  arch/x86/kernel/cpu/intel.c                   |  58 +-
>  arch/x86/kernel/head64.c                      |   4 +
>  arch/x86/kernel/setup.c                       |   3 +
>  arch/x86/kvm/mmu.c                            |  18 +-
>  arch/x86/mm/Makefile                          |   3 +
>  arch/x86/mm/init_64.c                         |  68 ++
>  arch/x86/mm/kaslr.c                           |  11 +-
>  arch/x86/mm/mem_encrypt_common.c              |  28 +
>  arch/x86/mm/mktme.c                           | 630 ++++++++++++++
>  drivers/acpi/hmat/hmat.c                      |  67 ++
>  drivers/acpi/tables.c                         |  10 +-
>  drivers/firmware/efi/efi.c                    |  25 +-
>  drivers/iommu/intel-iommu.c                   |  29 +-
>  fs/dax.c                                      |   3 +-
>  fs/exec.c                                     |   4 +-
>  fs/userfaultfd.c                              |   7 +-
>  include/asm-generic/pgtable.h                 |   8 +
>  include/keys/mktme-type.h                     |  39 +
>  include/linux/acpi.h                          |   9 +-
>  include/linux/dma-direct.h                    |   4 +-
>  include/linux/efi.h                           |   1 +
>  include/linux/gfp.h                           |  51 +-
>  include/linux/intel-iommu.h                   |   9 +-
>  include/linux/mem_encrypt.h                   |  23 +-
>  include/linux/migrate.h                       |  14 +-
>  include/linux/mm.h                            |  27 +-
>  include/linux/page_ext.h                      |  11 +-
>  include/linux/syscalls.h                      |   2 +
>  include/uapi/asm-generic/unistd.h             |   4 +-
>  kernel/fork.c                                 |   2 +
>  kernel/sys_ni.c                               |   2 +
>  mm/compaction.c                               |   3 +
>  mm/huge_memory.c                              |   6 +-
>  mm/khugepaged.c                               |  10 +
>  mm/ksm.c                                      |  17 +
>  mm/madvise.c                                  |   2 +-
>  mm/memory.c                                   |   3 +-
>  mm/mempolicy.c                                |  30 +-
>  mm/migrate.c                                  |   4 +-
>  mm/mlock.c                                    |   2 +-
>  mm/mmap.c                                     |  31 +-
>  mm/mprotect.c                                 |  98 ++-
>  mm/page_alloc.c                               |  50 ++
>  mm/page_ext.c                                 |   5 +
>  mm/rmap.c                                     |   4 +-
>  mm/userfaultfd.c                              |   3 +-
>  security/keys/Makefile                        |   1 +
>  security/keys/mktme_keys.c                    | 768 ++++++++++++++++++
>  .../selftests/x86/mktme/encrypt_tests.c       | 433 ++++++++++
>  .../testing/selftests/x86/mktme/flow_tests.c  | 266 ++++++
>  tools/testing/selftests/x86/mktme/key_tests.c | 526 ++++++++++++
>  .../testing/selftests/x86/mktme/mktme_test.c  | 300 +++++++
>  76 files changed, 4301 insertions(+), 122 deletions(-)
>  create mode 100644 Documentation/x86/mktme/index.rst
>  create mode 100644 Documentation/x86/mktme/mktme_configuration.rst
>  create mode 100644 Documentation/x86/mktme/mktme_demo.rst
>  create mode 100644 Documentation/x86/mktme/mktme_encrypt.rst
>  create mode 100644 Documentation/x86/mktme/mktme_keys.rst
>  create mode 100644 Documentation/x86/mktme/mktme_mitigations.rst
>  create mode 100644 Documentation/x86/mktme/mktme_overview.rst
>  create mode 100644 arch/x86/include/asm/mktme.h
>  create mode 100644 arch/x86/mm/mem_encrypt_common.c
>  create mode 100644 arch/x86/mm/mktme.c
>  create mode 100644 include/keys/mktme-type.h
>  create mode 100644 security/keys/mktme_keys.c
>  create mode 100644 tools/testing/selftests/x86/mktme/encrypt_tests.c
>  create mode 100644 tools/testing/selftests/x86/mktme/flow_tests.c
>  create mode 100644 tools/testing/selftests/x86/mktme/key_tests.c
>  create mode 100644 tools/testing/selftests/x86/mktme/mktme_test.c
> 
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH, RFC 05/62] mm/page_alloc: Handle allocation for encrypted memory
  2019-05-29  7:21   ` Mike Rapoport
@ 2019-05-29 12:47     ` Kirill A. Shutemov
  0 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-05-29 12:47 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells, Kees Cook, Dave Hansen,
	Kai Huang, Jacob Pan, Alison Schofield, linux-mm, kvm, keyrings,
	linux-kernel

On Wed, May 29, 2019 at 10:21:25AM +0300, Mike Rapoport wrote:
> Shouldn't it be EXPORT_SYMBOL?

We don't have callers outside core-mm at the moment.

I'll add kerneldoc in the next submission.

-- 
 Kirill A. Shutemov


* Re: [PATCH, RFC 43/62] syscall/x86: Wire up a system call for MKTME encryption keys
  2019-05-29  7:21   ` Mike Rapoport
@ 2019-05-29 18:12     ` Alison Schofield
  0 siblings, 0 replies; 153+ messages in thread
From: Alison Schofield @ 2019-05-29 18:12 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells, Kees Cook, Dave Hansen,
	Kai Huang, Jacob Pan, linux-mm, kvm, keyrings, linux-kernel

On Wed, May 29, 2019 at 10:21:37AM +0300, Mike Rapoport wrote:
> On Wed, May 08, 2019 at 05:44:03PM +0300, Kirill A. Shutemov wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > encrypt_mprotect() is a new system call to support memory encryption.
> > 
> > It takes the same parameters as legacy mprotect, plus an additional
> > key serial number that is mapped to an encryption keyid.
> 
> Shouldn't this patch be after the encrypt_mprotect() is added?

COND_SYSCALL(encrypt_mprotect), defined in kernel/sys_ni.c, allowed
it to build in this order, but the order is not logical. Thanks for
pointing it out. I will reorder the two patches.

Alison



* Re: [PATCH, RFC 57/62] x86/mktme: Overview of Multi-Key Total Memory Encryption
  2019-05-29  7:21   ` Mike Rapoport
@ 2019-05-29 18:13     ` Alison Schofield
  0 siblings, 0 replies; 153+ messages in thread
From: Alison Schofield @ 2019-05-29 18:13 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells, Kees Cook, Dave Hansen,
	Kai Huang, Jacob Pan, linux-mm, kvm, keyrings, linux-kernel

On Wed, May 29, 2019 at 10:21:48AM +0300, Mike Rapoport wrote:
> On Wed, May 08, 2019 at 05:44:17PM +0300, Kirill A. Shutemov wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Provide an overview of MKTME on Intel Platforms.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  Documentation/x86/mktme/index.rst          |  8 +++
> >  Documentation/x86/mktme/mktme_overview.rst | 57 ++++++++++++++++++++++
> 
> I'd expect addition of mktme docs to Documentation/x86/index.rst

Got it. Thanks.
Alison



* Re: [PATCH, RFC 00/62] Intel MKTME enabling
  2019-05-29  7:30 ` [PATCH, RFC 00/62] Intel MKTME enabling Mike Rapoport
@ 2019-05-29 18:20   ` Alison Schofield
  0 siblings, 0 replies; 153+ messages in thread
From: Alison Schofield @ 2019-05-29 18:20 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells, Kees Cook, Dave Hansen,
	Kai Huang, Jacob Pan, linux-mm, kvm, keyrings, linux-kernel

On Wed, May 29, 2019 at 10:30:07AM +0300, Mike Rapoport wrote:
> On Wed, May 08, 2019 at 05:43:20PM +0300, Kirill A. Shutemov wrote:
> > = Intro =
> > 
> > The patchset brings enabling of Intel Multi-Key Total Memory Encryption.
> > It consists of changes into multiple subsystems:
> > 
> >  * Core MM: infrastructure for allocation pages, dealing with encrypted VMAs
> >    and providing API setup encrypted mappings.
> >  * arch/x86: feature enumeration, program keys into hardware, setup
> >    page table entries for encrypted pages and more.
> >  * Key management service: setup and management of encryption keys.
> >  * DMA/IOMMU: dealing with encrypted memory on IO side.
> >  * KVM: interaction with virtualization side.
> >  * Documentation: description of APIs and usage examples.
> > 
> > The patchset is huge. This submission aims to give view to the full picture and
> > get feedback on the overall design. The patchset will be split into more
> > digestible pieces later.
> > 
> > Please review. Any feedback is welcome.
> 
> It would be nice to have a brief usage description in cover letter rather
> than in the last patches in the series ;-)
>  

Thanks for making it all the way to the last patches in the set ;)

Yes, we will certainly include that usage model in the cover letters
of future patchsets. 

Alison


* Re: [PATCH, RFC 09/62] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  2019-05-08 14:43 ` [PATCH, RFC 09/62] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
@ 2019-06-14  9:15   ` Peter Zijlstra
  2019-06-14 13:03     ` Kirill A. Shutemov
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14  9:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:43:29PM +0300, Kirill A. Shutemov wrote:
> + * Cast PAGE_MASK to a signed type so that it is sign-extended if
> + * virtual addresses are 32-bits but physical addresses are larger
> + * (ie, 32-bit PAE).

On 32bit, 'long' is still 32bit, did you want to cast to 'long long'
instead? Ideally we'd use pteval_t here, but I see that is unsigned.

>   */
> -#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
> +#define PTE_PFN_MASK_MAX \
> +	(((signed long)PAGE_MASK) & ((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
> +#define _PAGE_CHG_MASK	(PTE_PFN_MASK_MAX | _PAGE_PCD | _PAGE_PWT |		\
>  			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
>  			 _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
>  #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
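
The question above hinges on how a 32-bit value widens to 64 bits before the
AND. A minimal userspace sketch of that C conversion rule follows. It uses
fixed-width types to stand in for a 32-bit 'long'; the DEMO_* names and the
52-bit physical width are assumptions for illustration, not values taken from
the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12
#define DEMO_PHYS_SHIFT	52	/* assumed physical address width */

static_assert(DEMO_PHYS_SHIFT > 32 && DEMO_PHYS_SHIFT < 64,
	      "the demo needs a physical width wider than 32 bits");

/* PAGE_MASK as it would look with a 32-bit 'unsigned long': 0xfffff000. */
static const uint32_t demo_page_mask = ~((UINT32_C(1) << DEMO_PAGE_SHIFT) - 1);

/* No signed cast: the 32-bit mask is zero-extended, so bits 32+ are lost. */
uint64_t demo_pfn_mask_unsigned(void)
{
	return demo_page_mask & ((UINT64_C(1) << DEMO_PHYS_SHIFT) - 1);
}

/*
 * Signed 32-bit cast: the value becomes -4096 (GCC and Clang wrap here; the
 * C standard leaves this conversion implementation-defined) and is then
 * sign-extended when converted to the 64-bit unsigned operand type.
 */
uint64_t demo_pfn_mask_signed(void)
{
	return (int32_t)demo_page_mask & ((UINT64_C(1) << DEMO_PHYS_SHIFT) - 1);
}
```

In this sketch the signed variant keeps the PFN bits above bit 31 and the
unsigned variant does not, which is the behaviour the quoted comment relies on.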


* Re: [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages
  2019-05-08 14:43 ` [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages Kirill A. Shutemov
@ 2019-06-14  9:34   ` Peter Zijlstra
  2019-06-14 11:04     ` Peter Zijlstra
  2019-06-14 13:14     ` Kirill A. Shutemov
  0 siblings, 2 replies; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14  9:34 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:

> +/* Prepare page to be used for encryption. Called from page allocator. */
> +void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
> +{
> +	int i;
> +
> +	/*
> +	 * The hardware/CPU does not enforce coherency between mappings
> +	 * of the same physical page with different KeyIDs or
> +	 * encryption keys. We are responsible for cache management.
> +	 */

On alloc we should flush the unencrypted (key=0) range, while on free
(below) we should flush the encrypted (key!=0) range.

But I seem to have missed where page_address() does the right thing
here.

> +	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
> +
> +	for (i = 0; i < (1 << order); i++) {
> +		/* All pages coming out of the allocator should have KeyID 0 */
> +		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
> +		lookup_page_ext(page)->keyid = keyid;
> +

So presumably page_address() is affected by this keyid, and the below
clear_highpage() then accesses the 'right' location?

> +		/* Clear the page after the KeyID is set. */
> +		if (zero)
> +			clear_highpage(page);
> +
> +		page++;
> +	}
> +}
> +
> +/*
> + * Handles freeing of encrypted page.
> + * Called from page allocator on freeing encrypted page.
> + */
> +void free_encrypted_page(struct page *page, int order)
> +{
> +	int i;
> +
> +	/*
> +	 * The hardware/CPU does not enforce coherency between mappings
> +	 * of the same physical page with different KeyIDs or
> +	 * encryption keys. We are responsible for cache management.
> +	 */

I still don't like that comment much; yes the hardware doesn't do it,
and yes we have to do it, but it doesn't explain the actual scheme
employed to do so.

> +	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
> +
> +	for (i = 0; i < (1 << order); i++) {
> +		/* Check if the page has reasonable KeyID */
> +		WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids);

It should also check keyid > 0, so maybe:

	(unsigned)(keyid - 1) > keyids-1

instead?
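
A standalone sketch of that single-compare range check, for reference. The
function name is hypothetical; the subtraction is done in unsigned arithmetic
so that a KeyID of 0 (or a negative one) wraps to a large value instead of
overflowing a signed int.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when keyid lies outside [1, nr_keyids].
 * keyid == 0 wraps to UINT_MAX and negative values wrap to large numbers,
 * so one unsigned comparison covers both the lower and the upper bound.
 */
bool demo_keyid_out_of_range(int keyid, int nr_keyids)
{
	/* With zero KeyIDs the right-hand side would wrap as well. */
	assert(nr_keyids > 0);
	return (unsigned int)keyid - 1u > (unsigned int)nr_keyids - 1u;
}
```
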

> +		lookup_page_ext(page)->keyid = 0;
> +		page++;
> +	}
> +}
> -- 
> 2.20.1
> 


* Re: [PATCH, RFC 18/62] x86/mm: Implement syncing per-KeyID direct mappings
  2019-05-08 14:43 ` [PATCH, RFC 18/62] x86/mm: Implement syncing per-KeyID direct mappings Kirill A. Shutemov
@ 2019-06-14  9:51   ` Peter Zijlstra
  2019-06-14 22:43     ` Kirill A. Shutemov
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14  9:51 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:43:38PM +0300, Kirill A. Shutemov wrote:
> For MKTME we use per-KeyID direct mappings. This allows kernel to have
> access to encrypted memory.
> 
> sync_direct_mapping() syncs per-KeyID direct mappings with a canonical
> one -- KeyID-0.
> 
> The function tracks changes in the canonical mapping:
>  - creating or removing chunks of the translation tree;
>  - changes in mapping flags (i.e. protection bits);
>  - splitting huge page mapping into a page table;
>  - replacing page table with a huge page mapping;
> 
> The function needs to be called on every change to the direct mapping:
> hotplug, hotremove, changes in permissions bits, etc.

And yet I don't see anything in pageattr.c.

Also, this seems like an expensive scheme; if you know where the changes
were, a more fine-grained update would be faster.
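
To illustrate the cost difference, here is a toy model only: it replaces the
page-table walk with a flat array of protection values per KeyID, and every
name and size in it is an assumption made for the example. It compares
resyncing everything against resyncing just the range that changed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NR_KEYIDS	3	/* hypothetical: KeyID-0 plus two MKTME KeyIDs */
#define DEMO_NR_ENTRIES	8	/* hypothetical size of the modelled mapping */

/* demo_prot[0] is the canonical KeyID-0 mapping; the other rows mirror it. */
static unsigned int demo_prot[DEMO_NR_KEYIDS][DEMO_NR_ENTRIES];

/* Change one canonical entry, as a permission change would. */
void demo_set_prot(size_t idx, unsigned int prot)
{
	assert(idx < DEMO_NR_ENTRIES);
	demo_prot[0][idx] = prot;
}

unsigned int demo_get_prot(int keyid, size_t idx)
{
	assert(keyid >= 0 && keyid < DEMO_NR_KEYIDS && idx < DEMO_NR_ENTRIES);
	return demo_prot[keyid][idx];
}

/* Coarse: copy the whole canonical mapping; returns entries written. */
size_t demo_sync_all(void)
{
	size_t written = 0;

	for (int k = 1; k < DEMO_NR_KEYIDS; k++) {
		memcpy(demo_prot[k], demo_prot[0], sizeof(demo_prot[0]));
		written += DEMO_NR_ENTRIES;
	}
	return written;
}

/* Fine-grained: copy only entries in [start, end); returns entries written. */
size_t demo_sync_range(size_t start, size_t end)
{
	size_t written = 0;

	assert(start <= end && end <= DEMO_NR_ENTRIES);
	for (int k = 1; k < DEMO_NR_KEYIDS; k++) {
		memcpy(&demo_prot[k][start], &demo_prot[0][start],
		       (end - start) * sizeof(demo_prot[0][0]));
		written += end - start;
	}
	return written;
}
```

In the model a single-entry change costs two writes with the ranged variant and
sixteen with the full resync; the real ratio would depend on the mapping size.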

> The function is nop until MKTME is enabled.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/asm/mktme.h |   6 +
>  arch/x86/mm/init_64.c        |  10 +
>  arch/x86/mm/mktme.c          | 441 +++++++++++++++++++++++++++++++++++
>  3 files changed, 457 insertions(+)


> @@ -1247,6 +1254,7 @@ void mark_rodata_ro(void)
>  	unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>  	unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
>  	unsigned long all_end;
> +	int ret;
>  
>  	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
>  	       (end - start) >> 10);
> @@ -1280,6 +1288,8 @@ void mark_rodata_ro(void)
>  	free_kernel_image_pages((void *)text_end, (void *)rodata_start);
>  	free_kernel_image_pages((void *)rodata_end, (void *)_sdata);
>  
> +	ret = sync_direct_mapping();
> +	WARN_ON(ret);
>  	debug_checkwx();
>  }
>  

If you'd done pageattr, the above would not be needed.


* Re: [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages
  2019-06-14  9:34   ` Peter Zijlstra
@ 2019-06-14 11:04     ` Peter Zijlstra
  2019-06-14 13:28       ` Kirill A. Shutemov
  2019-06-14 13:14     ` Kirill A. Shutemov
  1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 11:04 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 11:34:09AM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:
> 
> > +		lookup_page_ext(page)->keyid = keyid;

> > +		lookup_page_ext(page)->keyid = 0;

Also, perhaps paranoid; but do we want something like:

static inline void page_set_keyid(struct page *page, int keyid)
{
	/* ensure nothing creeps after changing the keyid */
	barrier();
	WRITE_ONCE(lookup_page_ext(page)->keyid, keyid);
	barrier();
	/* ensure nothing creeps before changing the keyid */
}

And this is very much assuming there is no concurrency through the
allocator locks.


* Re: [PATCH, RFC 19/62] x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  2019-05-08 14:43 ` [PATCH, RFC 19/62] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
@ 2019-06-14 11:10   ` Peter Zijlstra
  0 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 11:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:43:39PM +0300, Kirill A. Shutemov wrote:
> Per-KeyID direct mappings require changes into how we find the right
> virtual address for a page and virt-to-phys address translations.
> 
> page_to_virt() definition overwrites default macros provided by
> <linux/mm.h>.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/asm/page.h    | 3 +++
>  arch/x86/include/asm/page_64.h | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
> index 39af59487d5f..aff30554f38e 100644
> --- a/arch/x86/include/asm/page.h
> +++ b/arch/x86/include/asm/page.h
> @@ -72,6 +72,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
>  extern bool __virt_addr_valid(unsigned long kaddr);
>  #define virt_addr_valid(kaddr)	__virt_addr_valid((unsigned long) (kaddr))
>  
> +#define page_to_virt(x) \
> +	(__va(PFN_PHYS(page_to_pfn(x))) + page_keyid(x) * direct_mapping_size)
> +
>  #endif	/* __ASSEMBLY__ */

So this is the bit that makes patch 13 make sense. It would've been nice
to have that called out in the Changelog or something.
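
The address arithmetic in the quoted page_to_virt() can be sketched in
userspace as below. This assumes a layout where the direct mapping for KeyID N
sits N * direct_mapping_size bytes above the KeyID-0 mapping; the function and
parameter names are hypothetical stand-ins for __va(), page_keyid() and
friends.

```c
#include <assert.h>
#include <stdint.h>

/*
 * base:                start of the KeyID-0 direct mapping (assumed value)
 * phys:                physical address of the page
 * keyid:               KeyID recorded for the page
 * direct_mapping_size: size of one per-KeyID direct mapping
 */
uint64_t demo_page_to_virt(uint64_t base, uint64_t phys, int keyid,
			   uint64_t direct_mapping_size)
{
	assert(keyid >= 0);
	assert(phys < direct_mapping_size);
	return base + phys + (uint64_t)keyid * direct_mapping_size;
}
```

With this layout, two KeyIDs for the same physical page yield two distinct
virtual aliases, so a flush or clear through page_address() touches the alias
that matches the page's current KeyID.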


* Re: [PATCH, RFC 20/62] mm/page_ext: Export lookup_page_ext() symbol
  2019-05-08 14:43 ` [PATCH, RFC 20/62] mm/page_ext: Export lookup_page_ext() symbol Kirill A. Shutemov
@ 2019-06-14 11:12   ` Peter Zijlstra
  2019-06-14 22:44     ` Kirill A. Shutemov
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 11:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:43:40PM +0300, Kirill A. Shutemov wrote:
> page_keyid() is an inline function that uses lookup_page_ext(). KVM is
> going to use page_keyid() and since KVM can be built as a module
> lookup_page_ext() has to be exported.

I _really_ hate having to export world+dog for KVM. This one might not
be a real issue, but I itch every time I see an export for KVM these
days.


* Re: [PATCH, RFC 26/62] keys/mktme: Move the MKTME payload into a cache aligned structure
  2019-05-08 14:43 ` [PATCH, RFC 26/62] keys/mktme: Move the MKTME payload into a cache aligned structure Kirill A. Shutemov
@ 2019-06-14 11:35   ` Peter Zijlstra
  2019-06-14 17:10     ` Alison Schofield
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 11:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:43:46PM +0300, Kirill A. Shutemov wrote:

> +/* Copy the payload to the HW programming structure and program this KeyID */
> +static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
> +{
> +	struct mktme_key_program *kprog = NULL;
> +	int ret;
> +
> +	kprog = kmem_cache_zalloc(mktme_prog_cache, GFP_ATOMIC);

Why GFP_ATOMIC? AFAICT neither of the usages is with a spinlock held.

> +	if (!kprog)
> +		return -ENOMEM;
> +
> +	/* Hardware programming requires cached aligned struct */
> +	kprog->keyid = keyid;
> +	kprog->keyid_ctrl = payload->keyid_ctrl;
> +	memcpy(kprog->key_field_1, payload->data_key, MKTME_AES_XTS_SIZE);
> +	memcpy(kprog->key_field_2, payload->tweak_key, MKTME_AES_XTS_SIZE);
> +
> +	ret = MKTME_PROG_SUCCESS;	/* Future programming call */
> +	kmem_cache_free(mktme_prog_cache, kprog);
> +	return ret;
> +}


* Re: [PATCH, RFC 44/62] x86/mm: Set KeyIDs in encrypted VMAs for MKTME
  2019-05-08 14:44 ` [PATCH, RFC 44/62] x86/mm: Set KeyIDs in encrypted VMAs for MKTME Kirill A. Shutemov
@ 2019-06-14 11:44   ` Peter Zijlstra
  2019-06-14 17:33     ` Alison Schofield
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 11:44 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:44:04PM +0300, Kirill A. Shutemov wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> MKTME architecture requires the KeyID to be placed in PTE bits 51:46.
> To create an encrypted VMA, place the KeyID in the upper bits of
> vm_page_prot that matches the position of those PTE bits.
> 
> When the VMA is assigned a KeyID it is always considered a KeyID
> change. The VMA is either going from not encrypted to encrypted,
> or from encrypted with any KeyID to encrypted with any other KeyID.
> To make the change safely, remove the user pages held by the VMA
> and unlink the VMA's anonymous chain.

This does not look like a transformation that preserves content; is
mprotect() still a suitable name?
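
For readers following the bit layout quoted above, a small sketch of placing a
KeyID in bits 51:46 of a 64-bit protection value. The shift and width come from
the quoted changelog; the DEMO_* names and helper functions are hypothetical
and are not the patch's own helpers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_KEYID_SHIFT	46
#define DEMO_KEYID_BITS		6	/* bits 51:46 */
#define DEMO_KEYID_MASK \
	(((UINT64_C(1) << DEMO_KEYID_BITS) - 1) << DEMO_KEYID_SHIFT)

/* Replace whatever KeyID is in prot with keyid, leaving other bits alone. */
uint64_t demo_prot_set_keyid(uint64_t prot, unsigned int keyid)
{
	assert(keyid < (1u << DEMO_KEYID_BITS));
	return (prot & ~DEMO_KEYID_MASK) |
	       ((uint64_t)keyid << DEMO_KEYID_SHIFT);
}

/* Extract the KeyID stored in prot. */
unsigned int demo_prot_keyid(uint64_t prot)
{
	return (unsigned int)((prot & DEMO_KEYID_MASK) >> DEMO_KEYID_SHIFT);
}
```

Note that the sketch only rewrites bits; it says nothing about the page
contents, which is the point raised above about the operation not preserving
content.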


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-05-08 14:44 ` [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call " Kirill A. Shutemov
@ 2019-06-14 11:47   ` Peter Zijlstra
  2019-06-14 17:35     ` Alison Schofield
  2019-06-14 11:51   ` Peter Zijlstra
  2019-06-17 15:07   ` Andy Lutomirski
  2 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 11:47 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:44:05PM +0300, Kirill A. Shutemov wrote:
> diff --git a/fs/exec.c b/fs/exec.c
> index 2e0033348d8e..695c121b34b3 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -755,8 +755,8 @@ int setup_arg_pages(struct linux_binprm *bprm,
>  	vm_flags |= mm->def_flags;
>  	vm_flags |= VM_STACK_INCOMPLETE_SETUP;
>  
> -	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
> -			vm_flags);
> +	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, vm_flags,
> +			     -1);

You added a nice NO_KEY helper a few patches back, maybe use it?

>  	if (ret)
>  		goto out_unlock;
>  	BUG_ON(prev != vma);


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-05-08 14:44 ` [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call " Kirill A. Shutemov
  2019-06-14 11:47   ` Peter Zijlstra
@ 2019-06-14 11:51   ` Peter Zijlstra
  2019-06-15  0:32     ` Alison Schofield
  2019-06-17 15:07   ` Andy Lutomirski
  2 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 11:51 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:44:05PM +0300, Kirill A. Shutemov wrote:

> @@ -347,7 +348,8 @@ static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
>  
>  int
>  mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
> -	unsigned long start, unsigned long end, unsigned long newflags)
> +	       unsigned long start, unsigned long end, unsigned long newflags,
> +	       int newkeyid)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	unsigned long oldflags = vma->vm_flags;
> @@ -357,7 +359,14 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
>  	int error;
>  	int dirty_accountable = 0;
>  
> -	if (newflags == oldflags) {
> +	/*
> +	 * Flags match and Keyids match or we have NO_KEY.
> +	 * This _fixup is usually called from do_mprotect_ext() except
> +	 * for one special case: caller fs/exec.c/setup_arg_pages()
> +	 * In that case, newkeyid is passed as -1 (NO_KEY).
> +	 */
> +	if (newflags == oldflags &&
> +	    (newkeyid == vma_keyid(vma) || newkeyid == NO_KEY)) {
>  		*pprev = vma;
>  		return 0;
>  	}
> @@ -423,6 +432,8 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
>  	}
>  
>  success:
> +	if (newkeyid != NO_KEY)
> +		mprotect_set_encrypt(vma, newkeyid, start, end);
>  	/*
>  	 * vm_flags and vm_page_prot are protected by the mmap_sem
>  	 * held in write mode.
> @@ -454,10 +465,15 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
>  }
>  
>  /*
> - * When pkey==NO_KEY we get legacy mprotect behavior here.
> + * do_mprotect_ext() supports the legacy mprotect behavior plus extensions
> + * for Protection Keys and Memory Encryption Keys. These extensions are
> + * mutually exclusive and the behavior is:
> + *	(pkey==NO_KEY && keyid==NO_KEY) ==> legacy mprotect
> + *	(pkey is valid)  ==> legacy mprotect plus Protection Key extensions
> + *	(keyid is valid) ==> legacy mprotect plus Encryption Key extensions
>   */
>  static int do_mprotect_ext(unsigned long start, size_t len,
> -		unsigned long prot, int pkey)
> +			   unsigned long prot, int pkey, int keyid)
>  {
>  	unsigned long nstart, end, tmp, reqprot;
>  	struct vm_area_struct *vma, *prev;
> @@ -555,7 +571,8 @@ static int do_mprotect_ext(unsigned long start, size_t len,
>  		tmp = vma->vm_end;
>  		if (tmp > end)
>  			tmp = end;
> -		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> +		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags,
> +				       keyid);
>  		if (error)
>  			goto out;
>  		nstart = tmp;

I've missed the part where pkey && keyid results in a WARN or error or
whatever.
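
One possible shape for such a check, as a sketch only. NO_KEY is -1 as in the
quoted comment; the function name is hypothetical, and returning -EINVAL is an
assumption, since the patch does not say how the conflict should be reported.

```c
#include <assert.h>
#include <errno.h>

#define NO_KEY	(-1)

static_assert(NO_KEY < 0, "NO_KEY must not collide with a valid key");

/*
 * The quoted comment calls the two extensions mutually exclusive:
 *   both NO_KEY   -> legacy mprotect
 *   only pkey set -> protection key extension
 *   only keyid set-> encryption key extension
 *   both set      -> rejected here
 */
int demo_check_key_args(int pkey, int keyid)
{
	if (pkey != NO_KEY && keyid != NO_KEY)
		return -EINVAL;
	return 0;
}
```
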



* Re: [PATCH, RFC 46/62] x86/mm: Keep reference counts on encrypted VMAs for MKTME
  2019-05-08 14:44 ` [PATCH, RFC 46/62] x86/mm: Keep reference counts on encrypted VMAs " Kirill A. Shutemov
@ 2019-06-14 11:54   ` Peter Zijlstra
  2019-06-14 18:39     ` Alison Schofield
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 11:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:44:06PM +0300, Kirill A. Shutemov wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> The MKTME (Multi-Key Total Memory Encryption) Key Service needs
> a reference count on encrypted VMAs. This reference count is used
> to determine when a hardware encryption KeyID is no longer in use
> and can be freed and reassigned to another Userspace Key.
> 
> The MKTME Key service does the percpu_ref_init and _kill, so
> these gets/puts on encrypted VMA's can be considered the
> intermediaries in the lifetime of the key.
> 
> Increment/decrement the reference count during encrypt_mprotect()
> system call for initial or updated encryption on a VMA.
> 
> Piggy back on the vm_area_dup/free() helpers. If the VMAs being
> duplicated, or freed are encrypted, adjust the reference count.

That all talks about VMAs, but...

> @@ -102,6 +115,22 @@ void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
>  
>  		page++;
>  	}
> +
> +	/*
> +	 * Make sure the KeyID cannot be freed until the last page that
> +	 * uses the KeyID is gone.
> +	 *
> +	 * This is required because the page may live longer than VMA it
> +	 * is mapped into (i.e. in get_user_pages() case) and having
> +	 * refcounting per-VMA is not enough.
> +	 *
> +	 * Taking a reference per-4K helps in case if the page will be
> +	 * split after the allocation. free_encrypted_page() will balance
> +	 * out the refcount even if the page was split and freed as bunch
> +	 * of 4K pages.
> +	 */
> +
> +	percpu_ref_get_many(&encrypt_count[keyid], 1 << order);
>  }
>  
>  /*
> @@ -110,7 +139,9 @@ void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
>   */
>  void free_encrypted_page(struct page *page, int order)
>  {
> -	int i;
> +	int i, keyid;
> +
> +	keyid = page_keyid(page);
>  
>  	/*
>  	 * The hardware/CPU does not enforce coherency between mappings
> @@ -125,6 +156,8 @@ void free_encrypted_page(struct page *page, int order)
>  		lookup_page_ext(page)->keyid = 0;
>  		page++;
>  	}
> +
> +	percpu_ref_put_many(&encrypt_count[keyid], 1 << order);
>  }

counts pages, what gives?
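
For what the quoted comment describes, a userspace sketch of the per-4K
accounting, with a plain counter standing in for percpu_ref. All names here are
hypothetical. It shows why taking 1 << order references at allocation balances
out even if the page is later split and freed in order-0 pieces.

```c
#include <assert.h>

#define DEMO_NR_KEYIDS	4	/* hypothetical: KeyID-0 plus three MKTME KeyIDs */

/* One counter per KeyID; stands in for the patch's percpu_ref array. */
static long demo_encrypt_count[DEMO_NR_KEYIDS];

/* Allocation side: one reference per 4K page in the block. */
void demo_prep_encrypted_page(int keyid, int order)
{
	assert(keyid > 0 && keyid < DEMO_NR_KEYIDS && order >= 0);
	demo_encrypt_count[keyid] += 1L << order;
}

/* Free side: drop one reference per 4K page being freed. */
void demo_free_encrypted_page(int keyid, int order)
{
	assert(keyid > 0 && keyid < DEMO_NR_KEYIDS && order >= 0);
	demo_encrypt_count[keyid] -= 1L << order;
	assert(demo_encrypt_count[keyid] >= 0);
}

long demo_keyid_refs(int keyid)
{
	assert(keyid >= 0 && keyid < DEMO_NR_KEYIDS);
	return demo_encrypt_count[keyid];
}
```

An order-2 allocation followed by four order-0 frees returns the counter to
zero, so the count tracks pages rather than VMAs, as noted above.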


* Re: [PATCH, RFC 47/62] mm: Restrict MKTME memory encryption to anonymous VMAs
  2019-05-08 14:44 ` [PATCH, RFC 47/62] mm: Restrict MKTME memory encryption to anonymous VMAs Kirill A. Shutemov
@ 2019-06-14 11:55   ` Peter Zijlstra
  2019-06-15  0:07     ` Alison Schofield
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 11:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:44:07PM +0300, Kirill A. Shutemov wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Memory encryption is only supported for mappings that are ANONYMOUS.
> Test the VMA's in an encrypt_mprotect() request to make sure they all
> meet that requirement before encrypting any.
> 
> The encrypt_mprotect syscall will return -EINVAL and will not encrypt
> any VMA's if this check fails.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

This should be folded back into the initial implementation, methinks.

* Re: [PATCH, RFC 49/62] mm, x86: export several MKTME variables
  2019-05-08 14:44 ` [PATCH, RFC 49/62] mm, x86: export several MKTME variables Kirill A. Shutemov
@ 2019-06-14 11:56   ` Peter Zijlstra
  2019-06-17  3:14     ` Kai Huang
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 11:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:44:09PM +0300, Kirill A. Shutemov wrote:
> From: Kai Huang <kai.huang@linux.intel.com>
> 
> KVM needs those variables to get/set memory encryption mask.
> 
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/mm/mktme.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> index df70651816a1..12f4266cf7ea 100644
> --- a/arch/x86/mm/mktme.c
> +++ b/arch/x86/mm/mktme.c
> @@ -7,13 +7,16 @@
>  
>  /* Mask to extract KeyID from physical address. */
>  phys_addr_t mktme_keyid_mask;
> +EXPORT_SYMBOL_GPL(mktme_keyid_mask);
>  /*
>   * Number of KeyIDs available for MKTME.
>   * Excludes KeyID-0 which used by TME. MKTME KeyIDs start from 1.
>   */
>  int mktme_nr_keyids;
> +EXPORT_SYMBOL_GPL(mktme_nr_keyids);
>  /* Shift of KeyID within physical address. */
>  int mktme_keyid_shift;
> +EXPORT_SYMBOL_GPL(mktme_keyid_shift);
>  
>  DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
>  EXPORT_SYMBOL_GPL(mktme_enabled_key);

NAK, don't export variables. Who owns the values, who enforces this?
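
For illustration only (this is not code from the patchset): one common way
to answer the ownership question is to keep the variables private to
mktme.c and hand out read-only accessors, so a module can read the values
but has no way to write them. A minimal userspace sketch, assuming
hypothetical accessor names (mktme_get_*) and a made-up setup function
that places 6 KeyID bits at physical address bits 51:46:

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* File-local state: only this translation unit can write it. */
static phys_addr_t mktme_keyid_mask;
static int mktme_nr_keyids;
static int mktme_keyid_shift;

/* Hypothetical setup: 6 KeyID bits at physical address bits 51:46. */
void mktme_setup_example(void)
{
	mktme_keyid_shift = 46;
	mktme_nr_keyids = (1 << 6) - 1;	/* KeyID-0 is excluded */
	mktme_keyid_mask = (phys_addr_t)mktme_nr_keyids << mktme_keyid_shift;
}

/*
 * Hypothetical read-only accessors. In a kernel version these functions,
 * not the variables, would be the exported symbols.
 */
phys_addr_t mktme_get_keyid_mask(void)
{
	return mktme_keyid_mask;
}

int mktme_get_nr_keyids(void)
{
	return mktme_nr_keyids;
}

int mktme_get_keyid_shift(void)
{
	return mktme_keyid_shift;
}
```

With this shape the owner of the values is whoever owns the file, and the
compiler enforces that nobody else writes them.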

* Re: [PATCH, RFC 51/62] iommu/vt-d: Support MKTME in DMA remapping
  2019-05-08 14:44 ` [PATCH, RFC 51/62] iommu/vt-d: Support MKTME in DMA remapping Kirill A. Shutemov
@ 2019-06-14 12:04   ` Peter Zijlstra
  0 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 12:04 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:44:11PM +0300, Kirill A. Shutemov wrote:
> @@ -603,7 +605,12 @@ static inline void dma_clear_pte(struct dma_pte *pte)
>  static inline u64 dma_pte_addr(struct dma_pte *pte)
>  {
>  #ifdef CONFIG_64BIT
> -	return pte->val & VTD_PAGE_MASK;

I don't know this code, but going by the below cmpxchg64, this wants to
be READ_ONCE().
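
A rough userspace sketch of what that could look like, under stated
assumptions: READ_ONCE_U64() is a simplified stand-in for the kernel's
READ_ONCE(), the function name dma_pte_addr_sketch() is made up, and the
KeyID mask is passed as a parameter only to keep the example
self-contained.

```c
#include <stdint.h>

/* Simplified stand-in for READ_ONCE(): force exactly one load. */
#define READ_ONCE_U64(x)	(*(const volatile uint64_t *)&(x))

#define VTD_PAGE_SHIFT	12
#define VTD_PAGE_MASK	(~((1ULL << VTD_PAGE_SHIFT) - 1))

struct dma_pte {
	uint64_t val;
};

/* Return the address bits of a PTE with the KeyID bits stripped. */
static inline uint64_t dma_pte_addr_sketch(struct dma_pte *pte,
					   uint64_t keyid_mask)
{
	/* One load into a local; the compiler may not re-read pte->val. */
	uint64_t addr = READ_ONCE_U64(pte->val);

	addr &= VTD_PAGE_MASK;	/* drop the low flag bits */
	addr &= ~keyid_mask;	/* drop the MKTME KeyID bits */
	return addr;
}
```

The point is only that the value is loaded once and all masking happens
on the local copy.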

> +	u64 addr = pte->val;
> +	addr &= VTD_PAGE_MASK;
> +#ifdef CONFIG_X86_INTEL_MKTME
> +	addr &= ~mktme_keyid_mask;
> +#endif
> +	return addr;
>  #else
>  	/* Must have a full atomic 64-bit read */
>  	return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
> -- 
> 2.20.1
> 

* Re: [PATCH, RFC 00/62] Intel MKTME enabling
  2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
                   ` (62 preceding siblings ...)
  2019-05-29  7:30 ` [PATCH, RFC 00/62] Intel MKTME enabling Mike Rapoport
@ 2019-06-14 12:15 ` Peter Zijlstra
  63 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 12:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield, linux-mm,
	kvm, keyrings, linux-kernel

On Wed, May 08, 2019 at 05:43:20PM +0300, Kirill A. Shutemov wrote:
> = Intro =
> 
> The patchset brings enabling of Intel Multi-Key Total Memory Encryption.
> It consists of changes into multiple subsystems:
> 
>  * Core MM: infrastructure for allocation pages, dealing with encrypted VMAs
>    and providing API setup encrypted mappings.

That wasn't eye-bleeding bad. With the exception of the refcounting; that
looks like something that can easily go funny without people noticing.

>  * arch/x86: feature enumeration, program keys into hardware, setup
>    page table entries for encrypted pages and more.

That seemed incomplete (pageattr seems to be a giant hole).

>  * Key management service: setup and management of encryption keys.
>  * DMA/IOMMU: dealing with encrypted memory on IO side.

Just minor nits, someone else would have to look at this.

>  * KVM: interaction with virtualization side.

You really want to limit the damage random modules can do. They have no
business writing to the mktme variables.

>  * Documentation: description of APIs and usage examples.

Didn't bother with those; if the Changelogs are inadequate to make sense
of the patches, documentation isn't the right place to fix things.

> The patchset is huge. This submission aims to give view to the full picture and
> get feedback on the overall design. The patchset will be split into more
> digestible pieces later.
> 
> Please review. Any feedback is welcome.

I still can't tell if this is worth the complexity :-/

Yes, there's a lot of words, but it doesn't mean anything to me, that
is, nothing here makes me want to build my kernel with this 'feature'
enabled.



* Re: [PATCH, RFC 09/62] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  2019-06-14  9:15   ` Peter Zijlstra
@ 2019-06-14 13:03     ` Kirill A. Shutemov
  0 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-06-14 13:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 11:15:14AM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:43:29PM +0300, Kirill A. Shutemov wrote:
> > + * Cast PAGE_MASK to a signed type so that it is sign-extended if
> > + * virtual addresses are 32-bits but physical addresses are larger
> > + * (ie, 32-bit PAE).
> 
> On 32bit, 'long' is still 32bit, did you want to cast to 'long long'
> instead? Ideally we'd use pteval_t here, but I see that is unsigned.

It will be implicitly converted to unsigned long long by '& ((1ULL <<
__PHYSICAL_MASK_SHIFT) - 1)' and, thanks to sign extension, it comes out
right for PAE.

Just to be on the safe side, I've re-checked that nothing changed for PAE by
the patch using the test below. PTE_PFN_MASK and PTE_PFN_MASK_MAX are
identical when compiled with -m32.

> >   */
> > -#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
> > +#define PTE_PFN_MASK_MAX \
> > +	(((signed long)PAGE_MASK) & ((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
> > +#define _PAGE_CHG_MASK	(PTE_PFN_MASK_MAX | _PAGE_PCD | _PAGE_PWT |		\
> >  			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
> >  			 _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
> >  #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
> 

#include <stdio.h>

typedef unsigned long long u64;
typedef u64 pteval_t;
typedef u64 phys_addr_t;

#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define PAGE_MASK		(~(PAGE_SIZE-1))
#define __PHYSICAL_MASK_SHIFT	52
#define __PHYSICAL_MASK		((phys_addr_t)((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
#define PHYSICAL_PAGE_MASK	(((signed long)PAGE_MASK) & __PHYSICAL_MASK)
#define PTE_PFN_MASK		((pteval_t)PHYSICAL_PAGE_MASK)
#define PTE_PFN_MASK_MAX	(((signed long)PAGE_MASK) & ((1ULL << __PHYSICAL_MASK_SHIFT) - 1))

int main(void)
{
	printf("PTE_PFN_MASK: %#llx\n", PTE_PFN_MASK);
	printf("PTE_PFN_MASK_MAX: %#llx\n", PTE_PFN_MASK_MAX);

	return 0;
}
-- 
 Kirill A. Shutemov

* Re: [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages
  2019-06-14  9:34   ` Peter Zijlstra
  2019-06-14 11:04     ` Peter Zijlstra
@ 2019-06-14 13:14     ` Kirill A. Shutemov
  1 sibling, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-06-14 13:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 11:34:09AM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:
> 
> > +/* Prepare page to be used for encryption. Called from page allocator. */
> > +void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
> > +{
> > +	int i;
> > +
> > +	/*
> > +	 * The hardware/CPU does not enforce coherency between mappings
> > +	 * of the same physical page with different KeyIDs or
> > +	 * encryption keys. We are responsible for cache management.
> > +	 */
> 
> On alloc we should flush the unencrypted (key=0) range, while on free
> (below) we should flush the encrypted (key!=0) range.
> 
> But I seem to have missed where page_address() does the right thing
> here.

As you've seen by now, it will be addressed later in the patchset. I'll
update the changelog to indicate that page_address() handles KeyIDs
correctly.

> > +	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
> > +
> > +	for (i = 0; i < (1 << order); i++) {
> > +		/* All pages coming out of the allocator should have KeyID 0 */
> > +		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
> > +		lookup_page_ext(page)->keyid = keyid;
> > +
> 
> So presumably page_address() is affected by this keyid, and the below
> clear_highpage() then accesses the 'right' location?

Yes. clear_highpage() -> kmap_atomic() -> page_address().

> > +		/* Clear the page after the KeyID is set. */
> > +		if (zero)
> > +			clear_highpage(page);
> > +
> > +		page++;
> > +	}
> > +}
> > +
> > +/*
> > + * Handles freeing of encrypted page.
> > + * Called from page allocator on freeing encrypted page.
> > + */
> > +void free_encrypted_page(struct page *page, int order)
> > +{
> > +	int i;
> > +
> > +	/*
> > +	 * The hardware/CPU does not enforce coherency between mappings
> > +	 * of the same physical page with different KeyIDs or
> > +	 * encryption keys. We are responsible for cache management.
> > +	 */
> 
> I still don't like that comment much; yes the hardware doesn't do it,
> and yes we have to do it, but it doesn't explain the actual scheme
> employed to do so.

Fair enough. I'll do better.

> > +	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
> > +
> > +	for (i = 0; i < (1 << order); i++) {
> > +		/* Check if the page has reasonable KeyID */
> > +		WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids);
> 
> It should also check keyid > 0, so maybe:
> 
> 	(unsigned)(keyid - 1) > keyids-1
> 
> instead?

Makes sense.
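
For reference, a small standalone sketch of that check. The helper name
keyid_out_of_range() is made up for illustration, and it assumes
nr_keyids >= 1. The subtraction is done in unsigned arithmetic so that
keyid == 0 wraps to UINT_MAX and a single comparison rejects both 0 and
too-large values:

```c
#include <stdbool.h>

/*
 * True if keyid is outside [1, nr_keyids].
 *
 * keyid == 0 becomes UINT_MAX after the unsigned subtraction, and a
 * negative keyid becomes a huge unsigned value, so both fail the same
 * comparison that catches keyid > nr_keyids.
 *
 * Assumes nr_keyids >= 1.
 */
static inline bool keyid_out_of_range(int keyid, int nr_keyids)
{
	return (unsigned int)keyid - 1u > (unsigned int)nr_keyids - 1u;
}
```

In free_encrypted_page() this would replace the plain
'keyid > mktme_nr_keyids' test inside the WARN_ON_ONCE().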

-- 
 Kirill A. Shutemov

* Re: [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages
  2019-06-14 11:04     ` Peter Zijlstra
@ 2019-06-14 13:28       ` Kirill A. Shutemov
  2019-06-14 13:43         ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-06-14 13:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 01:04:58PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 14, 2019 at 11:34:09AM +0200, Peter Zijlstra wrote:
> > On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:
> > 
> > > +		lookup_page_ext(page)->keyid = keyid;
> 
> > > +		lookup_page_ext(page)->keyid = 0;
> 
> Also, perhaps paranoid; but do we want something like:
> 
> static inline void page_set_keyid(struct page *page, int keyid)
> {
> 	/* ensure nothing creeps after changing the keyid */
> 	barrier();
> 	WRITE_ONCE(lookup_page_ext(page)->keyid, keyid);
> 	barrier();
> 	/* ensure nothing creeps before changing the keyid */
> }
> 
> And this is very much assuming there is no concurrency through the
> allocator locks.

There's no concurrency for this page: it has been taken off the free list,
but has not yet been passed on to the user. Nobody else sees the page before
allocation is finished.

And barriers/WRITE_ONCE() look excessive to me. It's just yet another bit
of the page's metadata and I don't see why it has to be handled in a special
way.

Does it relax your paranoia? :P

-- 
 Kirill A. Shutemov

* Re: [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages
  2019-06-14 13:28       ` Kirill A. Shutemov
@ 2019-06-14 13:43         ` Peter Zijlstra
  2019-06-14 22:41           ` Kirill A. Shutemov
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-14 13:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 04:28:36PM +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 14, 2019 at 01:04:58PM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 14, 2019 at 11:34:09AM +0200, Peter Zijlstra wrote:
> > > On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:
> > > 
> > > > +		lookup_page_ext(page)->keyid = keyid;
> > 
> > > > +		lookup_page_ext(page)->keyid = 0;
> > 
> > Also, perhaps paranoid; but do we want something like:
> > 
> > static inline void page_set_keyid(struct page *page, int keyid)
> > {
> > 	/* ensure nothing creeps after changing the keyid */
> > 	barrier();
> > 	WRITE_ONCE(lookup_page_ext(page)->keyid, keyid);
> > 	barrier();
> > 	/* ensure nothing creeps before changing the keyid */
> > }
> > 
> > And this is very much assuming there is no concurrency through the
> > allocator locks.
> 
> There's no concurrency for this page: it has been taken off the free list,
> but has not yet been passed on to the user. Nobody else sees the page before
> allocation is finished.
> 
> And barriers/WRITE_ONCE() look excessive to me. It's just yet another bit
> of the page's metadata and I don't see why it has to be handled in a special
> way.
> 
> Does it relax your paranoia? :P

Not really, it all 'works' because clflush_cache_range() includes mb()
and page_address() has an address dependency on the store, and there are
no other sites that will ever change 'keyid', which is all kind of
fragile.

At the very least that should be explicitly called out in a comment.


* Re: [PATCH, RFC 26/62] keys/mktme: Move the MKTME payload into a cache aligned structure
  2019-06-14 11:35   ` Peter Zijlstra
@ 2019-06-14 17:10     ` Alison Schofield
  0 siblings, 0 replies; 153+ messages in thread
From: Alison Schofield @ 2019-06-14 17:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 01:35:23PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:43:46PM +0300, Kirill A. Shutemov wrote:
> 
> > +/* Copy the payload to the HW programming structure and program this KeyID */
> > +static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
> > +{
> > +	struct mktme_key_program *kprog = NULL;
> > +	int ret;
> > +
> > +	kprog = kmem_cache_zalloc(mktme_prog_cache, GFP_ATOMIC);
> 
> Why GFP_ATOMIC, afaict neither of the usage is with a spinlock held.

Got it. GFP_ATOMIC not needed.
That said, this is an artifact of reworking the locking, and that 
locking may need to change again. If it does, will try to pre-allocate
rather than depend on GFP_ATOMIC here.

> 
> > +	if (!kprog)
> > +		return -ENOMEM;
> > +
> > +	/* Hardware programming requires cached aligned struct */
> > +	kprog->keyid = keyid;
> > +	kprog->keyid_ctrl = payload->keyid_ctrl;
> > +	memcpy(kprog->key_field_1, payload->data_key, MKTME_AES_XTS_SIZE);
> > +	memcpy(kprog->key_field_2, payload->tweak_key, MKTME_AES_XTS_SIZE);
> > +
> > +	ret = MKTME_PROG_SUCCESS;	/* Future programming call */
> > +	kmem_cache_free(mktme_prog_cache, kprog);
> > +	return ret;
> > +}

* Re: [PATCH, RFC 44/62] x86/mm: Set KeyIDs in encrypted VMAs for MKTME
  2019-06-14 11:44   ` Peter Zijlstra
@ 2019-06-14 17:33     ` Alison Schofield
  2019-06-14 18:26       ` Dave Hansen
  0 siblings, 1 reply; 153+ messages in thread
From: Alison Schofield @ 2019-06-14 17:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 01:44:08PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:44:04PM +0300, Kirill A. Shutemov wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > MKTME architecture requires the KeyID to be placed in PTE bits 51:46.
> > To create an encrypted VMA, place the KeyID in the upper bits of
> > vm_page_prot that matches the position of those PTE bits.
> > 
> > When the VMA is assigned a KeyID it is always considered a KeyID
> > change. The VMA is either going from not encrypted to encrypted,
> > or from encrypted with any KeyID to encrypted with any other KeyID.
> > To make the change safely, remove the user pages held by the VMA
> > and unlink the VMA's anonymous chain.
> 
> This does not look like a transformation that preserves content; is
> mprotect() still a suitable name?

Data is not preserved across KeyID changes, by design.

Background:
We chose to implement encrypt_mprotect() as an extension
to the legacy mprotect so that memory allocated in any
method could be encrypted. ie. we didn't want to be tied
to mmap. As an mprotect extension, encrypt_mprotect also
supports the changing of access flags.

The usage we suggest is:
1) alloc the memory w PROT_NONE to prevent any usage before
   encryption
2) use encrypt_mprotect() to add the key and change the
   access to  PROT_WRITE|PROT_READ.

Preserving the data across encryption key changes has not
been a requirement. I'm not clear if it was ever considered
and rejected. I believe that copying in order to preserve
the data was never considered.

Back to your naming question:
Since it is an mprotect extension, it seems we need to keep
the mprotect in the name. 

Thanks for bringing it up. It would be good to hear more
thoughts on this.

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-14 11:47   ` Peter Zijlstra
@ 2019-06-14 17:35     ` Alison Schofield
  0 siblings, 0 replies; 153+ messages in thread
From: Alison Schofield @ 2019-06-14 17:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 01:47:32PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:44:05PM +0300, Kirill A. Shutemov wrote:
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 2e0033348d8e..695c121b34b3 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -755,8 +755,8 @@ int setup_arg_pages(struct linux_binprm *bprm,
> >  	vm_flags |= mm->def_flags;
> >  	vm_flags |= VM_STACK_INCOMPLETE_SETUP;
> >  
> > -	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
> > -			vm_flags);
> > +	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, vm_flags,
> > +			     -1);
> 
> You added a nice NO_KEY helper a few patches back, maybe use it?

Sure, done.
(I hesitated to define NO_KEY in mm.h initially. Put it there now.
We'll see how that looks in the next round.)

> 
> >  	if (ret)
> >  		goto out_unlock;
> >  	BUG_ON(prev != vma);

* Re: [PATCH, RFC 44/62] x86/mm: Set KeyIDs in encrypted VMAs for MKTME
  2019-06-14 17:33     ` Alison Schofield
@ 2019-06-14 18:26       ` Dave Hansen
  2019-06-14 18:46         ` Alison Schofield
  0 siblings, 1 reply; 153+ messages in thread
From: Dave Hansen @ 2019-06-14 18:26 UTC (permalink / raw)
  To: Alison Schofield, Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Kai Huang, Jacob Pan, linux-mm, kvm,
	keyrings, linux-kernel

On 6/14/19 10:33 AM, Alison Schofield wrote:
> Preserving the data across encryption key changes has not
> been a requirement. I'm not clear if it was ever considered
> and rejected. I believe that copying in order to preserve
> the data was never considered.

We could preserve the data pretty easily.  It's just annoying, though.
Right now, our only KeyID conversions happen in the page allocator.  If
we were to convert in-place, we'd need something along the lines of:

	1. Allocate a scratch page
	2. Unmap target page, or at least make it entirely read-only
	3. Copy plaintext into scratch page
	4. Do cache KeyID conversion of page being converted:
	   Flush caches, change page_ext metadata
	5. Copy plaintext back into target page from scratch area
	6. Re-establish PTEs with new KeyID
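
The data flow of those six steps can be sketched as a toy userspace
model. Everything here is an assumption made for illustration: the
struct, the function names, and above all the "cipher", which is a
per-KeyID XOR byte and has nothing to do with how MKTME encrypts. The
unmap in step 2 is reduced to a flag, so the hard part is exactly what
this does not model.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

struct toy_page {
	int keyid;			/* stands in for page_ext->keyid */
	int mapped;			/* stands in for live PTEs */
	uint8_t raw[TOY_PAGE_SIZE];	/* bytes as they would sit in DRAM */
};

/* Toy cipher: XOR with a per-KeyID byte. Not how MKTME encrypts. */
static uint8_t toy_pad(int keyid)
{
	return (uint8_t)(keyid * 0x1D);
}

/* Read plaintext through the page's current KeyID. */
void toy_read(const struct toy_page *pg, uint8_t *buf)
{
	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		buf[i] = pg->raw[i] ^ toy_pad(pg->keyid);
}

/* Write plaintext through the page's current KeyID. */
void toy_write(struct toy_page *pg, const uint8_t *buf)
{
	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		pg->raw[i] = buf[i] ^ toy_pad(pg->keyid);
}

/* Convert a page to new_keyid in place, preserving its plaintext. */
void toy_convert(struct toy_page *pg, int new_keyid)
{
	uint8_t scratch[TOY_PAGE_SIZE];	/* 1. scratch page */

	pg->mapped = 0;			/* 2. unmap (not really modelled) */
	toy_read(pg, scratch);		/* 3. plaintext into scratch */
	pg->keyid = new_keyid;		/* 4. cache flush + metadata change */
	toy_write(pg, scratch);		/* 5. plaintext back, under new key */
	pg->mapped = 1;			/* 6. re-establish PTEs */

	/* Best-effort wipe; a compiler is allowed to elide this. */
	memset(scratch, 0, sizeof(scratch));
}
```
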

#2 is *really* hard.  It's similar to the problems that the poor
filesystem guys are having with RDMA these days when RDMA is doing writes.

What we have here (destroying existing data) is certainly the _simplest_
semantic.  We can certainly give it a different name, or even non-PROT_*
semantics where it shares none of mprotect()'s functionality.

Doesn't really matter to me at all.

* Re: [PATCH, RFC 46/62] x86/mm: Keep reference counts on encrypted VMAs for MKTME
  2019-06-14 11:54   ` Peter Zijlstra
@ 2019-06-14 18:39     ` Alison Schofield
  0 siblings, 0 replies; 153+ messages in thread
From: Alison Schofield @ 2019-06-14 18:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 01:54:24PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:44:06PM +0300, Kirill A. Shutemov wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > The MKTME (Multi-Key Total Memory Encryption) Key Service needs
> > a reference count on encrypted VMAs. This reference count is used
> > to determine when a hardware encryption KeyID is no longer in use
> > and can be freed and reassigned to another Userspace Key.
> > 
> > The MKTME Key service does the percpu_ref_init and _kill, so
> > these gets/puts on encrypted VMA's can be considered the
> > intermediaries in the lifetime of the key.
> > 
> > Increment/decrement the reference count during encrypt_mprotect()
> > system call for initial or updated encryption on a VMA.
> > 
> > Piggy back on the vm_area_dup/free() helpers. If the VMAs being
> > duplicated, or freed are encrypted, adjust the reference count.
> 
> That all talks about VMAs, but...
> 
> > @@ -102,6 +115,22 @@ void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
> >  
> >  		page++;
> >  	}
> > +
> > +	/*
> > +	 * Make sure the KeyID cannot be freed until the last page that
> > +	 * uses the KeyID is gone.
> > +	 *
> > +	 * This is required because the page may live longer than VMA it
> > +	 * is mapped into (i.e. in get_user_pages() case) and having
> > +	 * refcounting per-VMA is not enough.
> > +	 *
> > +	 * Taking a reference per-4K helps in case if the page will be
> > +	 * split after the allocation. free_encrypted_page() will balance
> > +	 * out the refcount even if the page was split and freed as bunch
> > +	 * of 4K pages.
> > +	 */
> > +
> > +	percpu_ref_get_many(&encrypt_count[keyid], 1 << order);
> >  }

snip

> 
> counts pages, what gives?

Yeah. Comments are confusing. We implemented the refcounting w VMA's in
mind, and then added the page counting. I'll update the comments and
dig around some more based on your overall concerns about the
refcounting you mentioned in the cover letter.
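
To make the page-counting part concrete, here is a toy userspace model
of the per-4K accounting described in the patch comment. Plain counters
stand in for the per-KeyID percpu_ref, and all names (toy_*,
NR_KEYIDS_EXAMPLE) are made up for illustration:

```c
#define NR_KEYIDS_EXAMPLE 64

/* Plain counters stand in for the per-KeyID percpu_ref in the patch. */
static long encrypt_count[NR_KEYIDS_EXAMPLE];

/* Allocation: take one reference per 4K page in the order-N block. */
void toy_prep_encrypted_page(int keyid, int order)
{
	encrypt_count[keyid] += 1L << order;
}

/*
 * Free: drop 1 << order references. Because the unit is always 4K,
 * the total balances no matter how the block was split before freeing.
 */
void toy_free_encrypted_page(int keyid, int order)
{
	encrypt_count[keyid] -= 1L << order;
}

/* Number of 4K pages currently holding the KeyID. */
long toy_keyid_refs(int keyid)
{
	return encrypt_count[keyid];
}
```

An order-9 allocation takes 512 references; freeing it as 512 order-0
pages, or as any other mix of orders that adds up to the same memory,
brings the count back to where it started.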




* Re: [PATCH, RFC 44/62] x86/mm: Set KeyIDs in encrypted VMAs for MKTME
  2019-06-14 18:26       ` Dave Hansen
@ 2019-06-14 18:46         ` Alison Schofield
  2019-06-14 19:11           ` Dave Hansen
  0 siblings, 1 reply; 153+ messages in thread
From: Alison Schofield @ 2019-06-14 18:46 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Peter Zijlstra, Kirill A. Shutemov, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	Andy Lutomirski, David Howells, Kees Cook, Kai Huang, Jacob Pan,
	linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 11:26:10AM -0700, Dave Hansen wrote:
> On 6/14/19 10:33 AM, Alison Schofield wrote:
> > Preserving the data across encryption key changes has not
> > been a requirement. I'm not clear if it was ever considered
> > and rejected. I believe that copying in order to preserve
> > the data was never considered.
> 
> We could preserve the data pretty easily.  It's just annoying, though.
> Right now, our only KeyID conversions happen in the page allocator.  If
> we were to convert in-place, we'd need something along the lines of:
> 
> 	1. Allocate a scratch page
> 	2. Unmap target page, or at least make it entirely read-only
> 	3. Copy plaintext into scratch page
> 	4. Do cache KeyID conversion of page being converted:
> 	   Flush caches, change page_ext metadata
> 	5. Copy plaintext back into target page from scratch area
> 	6. Re-establish PTEs with new KeyID

Seems like the 'Copy plaintext' steps might disappoint the user, as
much as the 'we don't preserve your data' design. Would users be happy
w the plain text steps ?
Alison

> 
> #2 is *really* hard.  It's similar to the problems that the poor
> filesystem guys are having with RDMA these days when RDMA is doing writes.
> 
> What we have here (destroying existing data) is certainly the _simplest_
> semantic.  We can certainly give it a different name, or even non-PROT_*
> semantics where it shares none of mprotect()'s functionality.
> 
> Doesn't really matter to me at all.

* Re: [PATCH, RFC 44/62] x86/mm: Set KeyIDs in encrypted VMAs for MKTME
  2019-06-14 18:46         ` Alison Schofield
@ 2019-06-14 19:11           ` Dave Hansen
  2019-06-17  9:10             ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Dave Hansen @ 2019-06-14 19:11 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Peter Zijlstra, Kirill A. Shutemov, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	Andy Lutomirski, David Howells, Kees Cook, Kai Huang, Jacob Pan,
	linux-mm, kvm, keyrings, linux-kernel

On 6/14/19 11:46 AM, Alison Schofield wrote:
> On Fri, Jun 14, 2019 at 11:26:10AM -0700, Dave Hansen wrote:
>> On 6/14/19 10:33 AM, Alison Schofield wrote:
>>> Preserving the data across encryption key changes has not
>>> been a requirement. I'm not clear if it was ever considered
>>> and rejected. I believe that copying in order to preserve
>>> the data was never considered.
>>
>> We could preserve the data pretty easily.  It's just annoying, though.
>> Right now, our only KeyID conversions happen in the page allocator.  If
>> we were to convert in-place, we'd need something along the lines of:
>>
>> 	1. Allocate a scratch page
>> 	2. Unmap target page, or at least make it entirely read-only
>> 	3. Copy plaintext into scratch page
>> 	4. Do cache KeyID conversion of page being converted:
>> 	   Flush caches, change page_ext metadata
>> 	5. Copy plaintext back into target page from scratch area
>> 	6. Re-establish PTEs with new KeyID
> 
> Seems like the 'Copy plaintext' steps might disappoint the user, as
> much as the 'we don't preserve your data' design. Would users be happy
> w the plain text steps ?

Well, it got to be plaintext because they wrote it to memory in
plaintext in the first place, so it's kinda hard to disappoint them. :)

IMNHO, in the *vast* majority of cases, folks will allocate memory and then
put a secret in it.  They aren't going to *get* a secret in some
mysterious fashion and then later decide they want to protect it.  In
other words, the inability to convert it is pretty academic and not
worth the complexity.

* Re: [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages
  2019-06-14 13:43         ` Peter Zijlstra
@ 2019-06-14 22:41           ` Kirill A. Shutemov
  2019-06-17  9:25             ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-06-14 22:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 03:43:35PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 14, 2019 at 04:28:36PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Jun 14, 2019 at 01:04:58PM +0200, Peter Zijlstra wrote:
> > > On Fri, Jun 14, 2019 at 11:34:09AM +0200, Peter Zijlstra wrote:
> > > > On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:
> > > > 
> > > > > +		lookup_page_ext(page)->keyid = keyid;
> > > 
> > > > > +		lookup_page_ext(page)->keyid = 0;
> > > 
> > > Also, perhaps paranoid; but do we want something like:
> > > 
> > > static inline void page_set_keyid(struct page *page, int keyid)
> > > {
> > > 	/* ensure nothing creeps after changing the keyid */
> > > 	barrier();
> > > 	WRITE_ONCE(lookup_page_ext(page)->keyid, keyid);
> > > 	barrier();
> > > 	/* ensure nothing creeps before changing the keyid */
> > > }
> > > 
> > > And this is very much assuming there is no concurrency through the
> > > allocator locks.
> > 
> > > There's no concurrency for this page: it has been taken off the free list,
> > > but has not yet been passed on to the user. Nobody else sees the page before
> > > allocation is finished.
> > > 
> > > And barriers/WRITE_ONCE() look excessive to me. It's just yet another bit
> > > of the page's metadata and I don't see why it has to be handled in a special
> > > way.
> > 
> > Does it relax your paranoia? :P
> 
> Not really, it all 'works' because clflush_cache_range() includes mb()
> and page_address() has an address dependency on the store, and there are
> no other sites that will ever change 'keyid', which is all kind of
> fragile.

Hm. I don't follow how the mb() in clflush_cache_range() is relevant...

Any subsequent access to the page's memory by the kernel will go through
page_keyid(), and therefore I believe there's always an address dependency
on the store.

Am I missing something?

> At the very least that should be explicitly called out in a comment.
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 18/62] x86/mm: Implement syncing per-KeyID direct mappings
  2019-06-14  9:51   ` Peter Zijlstra
@ 2019-06-14 22:43     ` Kirill A. Shutemov
  2019-06-17  9:27       ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-06-14 22:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 11:51:32AM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:43:38PM +0300, Kirill A. Shutemov wrote:
> > For MKTME we use per-KeyID direct mappings. This allows kernel to have
> > access to encrypted memory.
> > 
> > sync_direct_mapping() sync per-KeyID direct mappings with a canonical
> > one -- KeyID-0.
> > 
> > The function tracks changes in the canonical mapping:
> >  - creating or removing chunks of the translation tree;
> >  - changes in mapping flags (i.e. protection bits);
> >  - splitting huge page mapping into a page table;
> >  - replacing page table with a huge page mapping;
> > 
> > The function need to be called on every change to the direct mapping:
> > hotplug, hotremove, changes in permissions bits, etc.
> 
> And yet I don't see anything in pageattr.c.

You're right. I've hooked up the sync in the wrong place.
> 
> Also, this seems like an expensive scheme; if you know where the changes
> where, a more fine-grained update would be faster.

Do we have any pageattr users hot enough to make it crucial?

I'll look into this anyway.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 20/62] mm/page_ext: Export lookup_page_ext() symbol
  2019-06-14 11:12   ` Peter Zijlstra
@ 2019-06-14 22:44     ` Kirill A. Shutemov
  2019-06-17  9:30       ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-06-14 22:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 01:12:59PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:43:40PM +0300, Kirill A. Shutemov wrote:
> > page_keyid() is inline funcation that uses lookup_page_ext(). KVM is
> > going to use page_keyid() and since KVM can be built as a module
> > lookup_page_ext() has to be exported.
> 
> I _really_ hate having to export world+dog for KVM. This one might not
> be a real issue, but I itch every time I see an export for KVM these
> days.

Is there any better way? Do we need to invent EXPORT_SYMBOL_KVM()? :P

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 47/62] mm: Restrict MKTME memory encryption to anonymous VMAs
  2019-06-14 11:55   ` Peter Zijlstra
@ 2019-06-15  0:07     ` Alison Schofield
  0 siblings, 0 replies; 153+ messages in thread
From: Alison Schofield @ 2019-06-15  0:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 01:55:20PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:44:07PM +0300, Kirill A. Shutemov wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Memory encryption is only supported for mappings that are ANONYMOUS.
> > Test the VMA's in an encrypt_mprotect() request to make sure they all
> > meet that requirement before encrypting any.
> > 
> > The encrypt_mprotect syscall will return -EINVAL and will not encrypt
> > any VMA's if this check fails.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> This should be folded back into the initial implemention, methinks.

It is part of the initial implementation. I looked for
places to split the implementation into smaller,
reviewable patches, hence this split. None of it gets
built until CONFIG_X86_INTEL_MKTME is introduced
in a later patch.

The encrypt_mprotect() patchset is ordered like this:
1) generalize mprotect to support the mktme extension
2) wire up encrypt_mprotect()
3) implement encrypt_mprotect()
4) keep reference counts on encryption keys (was VMAs)
5) (this patch) restrict to anonymous VMAs.
  
I thought Patch 5) was a small, but meaningful split. It 
accentuates the fact that MKTME is restricted to anonymous
memory.

Alas, I want to make it logical to review, so I'll move it.



^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-14 11:51   ` Peter Zijlstra
@ 2019-06-15  0:32     ` Alison Schofield
  2019-06-17  9:08       ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Alison Schofield @ 2019-06-15  0:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 01:51:37PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:44:05PM +0300, Kirill A. Shutemov wrote:
snip
> >  /*
> > - * When pkey==NO_KEY we get legacy mprotect behavior here.
> > + * do_mprotect_ext() supports the legacy mprotect behavior plus extensions
> > + * for Protection Keys and Memory Encryption Keys. These extensions are
> > + * mutually exclusive and the behavior is:
> > + *	(pkey==NO_KEY && keyid==NO_KEY) ==> legacy mprotect
> > + *	(pkey is valid)  ==> legacy mprotect plus Protection Key extensions
> > + *	(keyid is valid) ==> legacy mprotect plus Encryption Key extensions
> >   */
> >  static int do_mprotect_ext(unsigned long start, size_t len,
> > -		unsigned long prot, int pkey)
> > +			   unsigned long prot, int pkey, int keyid)
> >  {

snip

>
> I've missed the part where pkey && keyid results in a WARN or error or
> whatever.
> 
I wasn't so sure about that since do_mprotect_ext()
is the call 'behind' the system calls. 

legacy mprotect always calls with:  NO_KEY, NO_KEY
pkey_mprotect always calls with:    pkey, NO_KEY
encrypt_mprotect always calls with: NO_KEY, keyid

Would a check on those arguments be debug-only,
to future-proof this?

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 49/62] mm, x86: export several MKTME variables
  2019-06-14 11:56   ` Peter Zijlstra
@ 2019-06-17  3:14     ` Kai Huang
  2019-06-17  7:46       ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Kai Huang @ 2019-06-17  3:14 UTC (permalink / raw)
  To: Peter Zijlstra, Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, David Howells, Kees Cook,
	Dave Hansen, Jacob Pan, Alison Schofield, linux-mm, kvm,
	keyrings, linux-kernel

On Fri, 2019-06-14 at 13:56 +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:44:09PM +0300, Kirill A. Shutemov wrote:
> > From: Kai Huang <kai.huang@linux.intel.com>
> > 
> > KVM needs those variables to get/set memory encryption mask.
> > 
> > Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/mm/mktme.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> > index df70651816a1..12f4266cf7ea 100644
> > --- a/arch/x86/mm/mktme.c
> > +++ b/arch/x86/mm/mktme.c
> > @@ -7,13 +7,16 @@
> >  
> >  /* Mask to extract KeyID from physical address. */
> >  phys_addr_t mktme_keyid_mask;
> > +EXPORT_SYMBOL_GPL(mktme_keyid_mask);
> >  /*
> >   * Number of KeyIDs available for MKTME.
> >   * Excludes KeyID-0 which used by TME. MKTME KeyIDs start from 1.
> >   */
> >  int mktme_nr_keyids;
> > +EXPORT_SYMBOL_GPL(mktme_nr_keyids);
> >  /* Shift of KeyID within physical address. */
> >  int mktme_keyid_shift;
> > +EXPORT_SYMBOL_GPL(mktme_keyid_shift);
> >  
> >  DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
> >  EXPORT_SYMBOL_GPL(mktme_enabled_key);
> 
> NAK, don't export variables. Who owns the values, who enforces this?
> 

Both KVM and the IOMMU driver need page_keyid() and mktme_keyid_shift to put the page's KeyID in the
right place in the PTE (of the KVM EPT and the VT-d DMA page table).

The MKTME key type code needs to know mktme_nr_keyids in order to alloc/free KeyIDs.

Maybe better to introduce functions instead of exposing variables directly?

Or instead of introducing page_keyid(), we use page_encrypt_mask(), which essentially holds
"page_keyid() << mktme_keyid_shift"?

Thanks,
-Kai

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 49/62] mm, x86: export several MKTME variables
  2019-06-17  3:14     ` Kai Huang
@ 2019-06-17  7:46       ` Peter Zijlstra
  2019-06-17  8:39         ` Kai Huang
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-17  7:46 UTC (permalink / raw)
  To: Kai Huang
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Mon, Jun 17, 2019 at 03:14:29PM +1200, Kai Huang wrote:
> On Fri, 2019-06-14 at 13:56 +0200, Peter Zijlstra wrote:
> > On Wed, May 08, 2019 at 05:44:09PM +0300, Kirill A. Shutemov wrote:
> > > From: Kai Huang <kai.huang@linux.intel.com>
> > > 
> > > KVM needs those variables to get/set memory encryption mask.
> > > 
> > > Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > ---
> > >  arch/x86/mm/mktme.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> > > index df70651816a1..12f4266cf7ea 100644
> > > --- a/arch/x86/mm/mktme.c
> > > +++ b/arch/x86/mm/mktme.c
> > > @@ -7,13 +7,16 @@
> > >  
> > >  /* Mask to extract KeyID from physical address. */
> > >  phys_addr_t mktme_keyid_mask;
> > > +EXPORT_SYMBOL_GPL(mktme_keyid_mask);
> > >  /*
> > >   * Number of KeyIDs available for MKTME.
> > >   * Excludes KeyID-0 which used by TME. MKTME KeyIDs start from 1.
> > >   */
> > >  int mktme_nr_keyids;
> > > +EXPORT_SYMBOL_GPL(mktme_nr_keyids);
> > >  /* Shift of KeyID within physical address. */
> > >  int mktme_keyid_shift;
> > > +EXPORT_SYMBOL_GPL(mktme_keyid_shift);
> > >  
> > >  DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
> > >  EXPORT_SYMBOL_GPL(mktme_enabled_key);
> > 
> > NAK, don't export variables. Who owns the values, who enforces this?
> > 
> 
> Both KVM and IOMMU driver need page_keyid() and mktme_keyid_shift to set page's keyID to the right
> place in the PTE (of KVM EPT and VT-d DMA page table).
> 
> MKTME key type code need to know mktme_nr_keyids in order to alloc/free keyID.
> 
> Maybe better to introduce functions instead of exposing variables directly?
> 
> Or instead of introducing page_keyid(), we use page_encrypt_mask(), which essentially holds
> "page_keyid() << mktme_keyid_shift"?

Yes, that's much better, because that strictly limits the access to R/O.
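
A minimal userspace sketch of the accessor approach, for illustration
only. The shift and KeyID count are assumed example values, struct page
is reduced to a single field standing in for the page_ext metadata, and
mktme_nr_keyids_get() is a hypothetical name. The variables stay private
to the file, so callers can read the derived values but cannot modify
them.

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Private to this file: no other translation unit can write these. */
static int mktme_keyid_shift = 46;	/* assumed example value */
static int mktme_nr_keyids = 63;	/* assumed example value */

/* Simplified stand-in: the real KeyID lives in page_ext metadata. */
struct page {
	int keyid;
};

/* Hypothetical read-only accessor for the number of KeyIDs. */
int mktme_nr_keyids_get(void)
{
	return mktme_nr_keyids;
}

/* Returns the KeyID already shifted into its position in a PTE. */
phys_addr_t page_encrypt_mask(const struct page *page)
{
	return (phys_addr_t)page->keyid << mktme_keyid_shift;
}
```

A caller such as KVM or the IOMMU driver would then OR the returned mask
into the page table entry without ever seeing the shift itself.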

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 49/62] mm, x86: export several MKTME variables
  2019-06-17  7:46       ` Peter Zijlstra
@ 2019-06-17  8:39         ` Kai Huang
  2019-06-17 11:25           ` Kirill A. Shutemov
  0 siblings, 1 reply; 153+ messages in thread
From: Kai Huang @ 2019-06-17  8:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Mon, 2019-06-17 at 09:46 +0200, Peter Zijlstra wrote:
> On Mon, Jun 17, 2019 at 03:14:29PM +1200, Kai Huang wrote:
> > On Fri, 2019-06-14 at 13:56 +0200, Peter Zijlstra wrote:
> > > On Wed, May 08, 2019 at 05:44:09PM +0300, Kirill A. Shutemov wrote:
> > > > From: Kai Huang <kai.huang@linux.intel.com>
> > > > 
> > > > KVM needs those variables to get/set memory encryption mask.
> > > > 
> > > > Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > ---
> > > >  arch/x86/mm/mktme.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> > > > index df70651816a1..12f4266cf7ea 100644
> > > > --- a/arch/x86/mm/mktme.c
> > > > +++ b/arch/x86/mm/mktme.c
> > > > @@ -7,13 +7,16 @@
> > > >  
> > > >  /* Mask to extract KeyID from physical address. */
> > > >  phys_addr_t mktme_keyid_mask;
> > > > +EXPORT_SYMBOL_GPL(mktme_keyid_mask);
> > > >  /*
> > > >   * Number of KeyIDs available for MKTME.
> > > >   * Excludes KeyID-0 which used by TME. MKTME KeyIDs start from 1.
> > > >   */
> > > >  int mktme_nr_keyids;
> > > > +EXPORT_SYMBOL_GPL(mktme_nr_keyids);
> > > >  /* Shift of KeyID within physical address. */
> > > >  int mktme_keyid_shift;
> > > > +EXPORT_SYMBOL_GPL(mktme_keyid_shift);
> > > >  
> > > >  DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
> > > >  EXPORT_SYMBOL_GPL(mktme_enabled_key);
> > > 
> > > NAK, don't export variables. Who owns the values, who enforces this?
> > > 
> > 
> > Both KVM and IOMMU driver need page_keyid() and mktme_keyid_shift to set page's keyID to the
> > right
> > place in the PTE (of KVM EPT and VT-d DMA page table).
> > 
> > MKTME key type code need to know mktme_nr_keyids in order to alloc/free keyID.
> > 
> > Maybe better to introduce functions instead of exposing variables directly?
> > 
> > Or instead of introducing page_keyid(), we use page_encrypt_mask(), which essentially holds
> > "page_keyid() << mktme_keyid_shift"?
> 
> Yes, that's much better, because that strictly limits the access to R/O.
> 

Thanks. I think Kirill will be the one to handle your suggestion. :)

Kirill?

Thanks,
-Kai

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-15  0:32     ` Alison Schofield
@ 2019-06-17  9:08       ` Peter Zijlstra
  0 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-17  9:08 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 05:32:31PM -0700, Alison Schofield wrote:
> On Fri, Jun 14, 2019 at 01:51:37PM +0200, Peter Zijlstra wrote:
> > On Wed, May 08, 2019 at 05:44:05PM +0300, Kirill A. Shutemov wrote:
> snip
> > >  /*
> > > - * When pkey==NO_KEY we get legacy mprotect behavior here.
> > > + * do_mprotect_ext() supports the legacy mprotect behavior plus extensions
> > > + * for Protection Keys and Memory Encryption Keys. These extensions are
> > > + * mutually exclusive and the behavior is:

Well, here it states that the extensions are mutually exclusive.

> > > + *	(pkey==NO_KEY && keyid==NO_KEY) ==> legacy mprotect
> > > + *	(pkey is valid)  ==> legacy mprotect plus Protection Key extensions
> > > + *	(keyid is valid) ==> legacy mprotect plus Encryption Key extensions
> > >   */
> > >  static int do_mprotect_ext(unsigned long start, size_t len,
> > > -		unsigned long prot, int pkey)
> > > +			   unsigned long prot, int pkey, int keyid)
> > >  {
> 
> snip
> 
> >
> > I've missed the part where pkey && keyid results in a WARN or error or
> > whatever.
> > 
> I wasn't so sure about that since do_mprotect_ext()
> is the call 'behind' the system calls. 
> 
> legacy mprotect always calls with: NO_KEY, NO_KEY
> pkey_mprotect always calls with:  pkey, NO_KEY
> encrypt_mprotect always calls with  NO_KEY, keyid
> 
> Would a check on those arguments be debug only 
> to future proof this?

But you then don't check that, anywhere, afaict.
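
A minimal sketch of such a check, written as standalone C so it can be
tried outside the kernel. mprotect_ext_check_keys() is a hypothetical
helper, not part of the posted patch, and NO_KEY is assumed to be -1
here. In the kernel, the failing branch could instead be a
WARN_ON_ONCE().

```c
#include <errno.h>

#define NO_KEY (-1)	/* assumed value of the "no key" sentinel */

/*
 * Hypothetical helper: the pkey and keyid extensions are documented as
 * mutually exclusive, so reject a call that supplies both.
 */
int mprotect_ext_check_keys(int pkey, int keyid)
{
	if (pkey != NO_KEY && keyid != NO_KEY)
		return -EINVAL;
	return 0;
}
```

All three existing callers (mprotect, pkey_mprotect, encrypt_mprotect)
pass NO_KEY for at least one argument, so they would pass this check.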

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 44/62] x86/mm: Set KeyIDs in encrypted VMAs for MKTME
  2019-06-14 19:11           ` Dave Hansen
@ 2019-06-17  9:10             ` Peter Zijlstra
  0 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-17  9:10 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Alison Schofield, Kirill A. Shutemov, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	Andy Lutomirski, David Howells, Kees Cook, Kai Huang, Jacob Pan,
	linux-mm, kvm, keyrings, linux-kernel

On Fri, Jun 14, 2019 at 12:11:23PM -0700, Dave Hansen wrote:
> On 6/14/19 11:46 AM, Alison Schofield wrote:
> > On Fri, Jun 14, 2019 at 11:26:10AM -0700, Dave Hansen wrote:
> >> On 6/14/19 10:33 AM, Alison Schofield wrote:
> >>> Preserving the data across encryption key changes has not
> >>> been a requirement. I'm not clear if it was ever considered
> >>> and rejected. I believe that copying in order to preserve
> >>> the data was never considered.
> >>
> >> We could preserve the data pretty easily.  It's just annoying, though.
> >> Right now, our only KeyID conversions happen in the page allocator.  If
> >> we were to convert in-place, we'd need something along the lines of:
> >>
> >> 	1. Allocate a scratch page
> >> 	2. Unmap target page, or at least make it entirely read-only
> >> 	3. Copy plaintext into scratch page
> >> 	4. Do cache KeyID conversion of page being converted:
> >> 	   Flush caches, change page_ext metadata
> >> 	5. Copy plaintext back into target page from scratch area
> >> 	6. Re-establish PTEs with new KeyID
> > 
> > Seems like the 'Copy plaintext' steps might disappoint the user, as
> > much as the 'we don't preserve your data' design. Would users be happy
> > w the plain text steps ?
> 
> Well, it got to be plaintext because they wrote it to memory in
> plaintext in the first place, so it's kinda hard to disappoint them. :)
> 
> IMNHO, the *vast* majority of cases, folks will allocate memory and then
> put a secret in it.  They aren't going to *get* a secret in some
> mysterious fashion and then later decide they want to protect it.  In
> other words, the inability to convert it is pretty academic and not
> worth the complexity.

I'm not saying it is (required to preserve); but I do think it is
somewhat surprising to have an mprotect() call destroy content. It's
traditionally specified to not do that.


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages
  2019-06-14 22:41           ` Kirill A. Shutemov
@ 2019-06-17  9:25             ` Peter Zijlstra
  0 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-17  9:25 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Sat, Jun 15, 2019 at 01:41:31AM +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 14, 2019 at 03:43:35PM +0200, Peter Zijlstra wrote:

> > Not really, it all 'works' because clflush_cache_range() includes mb()
> > and page_address() has an address dependency on the store, and there are
> > no other sites that will ever change 'keyid', which is all kind of
> > fragile.
> 
> Hm. I don't follow how the mb() in clflush_cache_range() relevant...
> 
> Any following access of page's memory by kernel will go through
> page_keyid() and therefore I believe there's always address dependency on
> the store.
> 
> Am I missing something?

The dependency doesn't help with prior calls; consider:

	addr = page_address(page);

	*addr = foo;

	page->key_id = bar;

	addr2 = page_address(page);


Without a barrier() between '*addr = foo' and 'page->key_id = bar', the
compiler is allowed to reorder these stores.

Now, the clflush stuff we do, that already hard orders things -- we need
to be done writing before we start flushing -- so we can/do rely on
that, but we should explicitly mention that.

Now, for the second part, addr2 must observe bar because of the address
dependency; the compiler is not allowed to mess that up, but again, that
is something we should put in a comment.
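
A minimal userspace sketch of the helper with the ordering assumptions
spelled out in comments, assuming GCC or Clang inline-asm syntax.
barrier() and WRITE_ONCE_INT() are simplified stand-ins for the kernel's
barrier() and WRITE_ONCE(), struct page_ext is reduced to one field, and
page_ext_set_keyid() is a hypothetical name.

```c
/* Compiler barrier only: stops the compiler from reordering memory
 * accesses across it; it emits no CPU fence instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Simplified stand-in for WRITE_ONCE(): a single volatile int store. */
#define WRITE_ONCE_INT(x, val) (*(volatile int *)&(x) = (val))

/* Simplified stand-in for the page_ext metadata. */
struct page_ext {
	int keyid;
};

static inline void page_ext_set_keyid(struct page_ext *ext, int keyid)
{
	/* Earlier stores through the old mapping must not be moved by
	 * the compiler to after the keyid change. */
	barrier();
	WRITE_ONCE_INT(ext->keyid, keyid);
	/* Later accesses must not be moved by the compiler to before
	 * the keyid change. */
	barrier();
}
```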

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 18/62] x86/mm: Implement syncing per-KeyID direct mappings
  2019-06-14 22:43     ` Kirill A. Shutemov
@ 2019-06-17  9:27       ` Peter Zijlstra
  2019-06-17 14:43         ` Kirill A. Shutemov
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-17  9:27 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Sat, Jun 15, 2019 at 01:43:09AM +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 14, 2019 at 11:51:32AM +0200, Peter Zijlstra wrote:
> > On Wed, May 08, 2019 at 05:43:38PM +0300, Kirill A. Shutemov wrote:
> > > For MKTME we use per-KeyID direct mappings. This allows kernel to have
> > > access to encrypted memory.
> > > 
> > > sync_direct_mapping() sync per-KeyID direct mappings with a canonical
> > > one -- KeyID-0.
> > > 
> > > The function tracks changes in the canonical mapping:
> > >  - creating or removing chunks of the translation tree;
> > >  - changes in mapping flags (i.e. protection bits);
> > >  - splitting huge page mapping into a page table;
> > >  - replacing page table with a huge page mapping;
> > > 
> > > The function need to be called on every change to the direct mapping:
> > > hotplug, hotremove, changes in permissions bits, etc.
> > 
> > And yet I don't see anything in pageattr.c.
> 
> You're right. I've hooked up the sync in the wrong place.
> > 
> > Also, this seems like an expensive scheme; if you know where the changes
> > where, a more fine-grained update would be faster.
> 
> Do we have any hot enough pageattr users that makes it crucial?
> 
> I'll look into this anyway.

The graphics people would be the most aggressive users of this I'd think.
They're the ones that yelled when I broke it last ;-)


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 20/62] mm/page_ext: Export lookup_page_ext() symbol
  2019-06-14 22:44     ` Kirill A. Shutemov
@ 2019-06-17  9:30       ` Peter Zijlstra
  2019-06-17 11:01         ` Kai Huang
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-17  9:30 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Sat, Jun 15, 2019 at 01:44:43AM +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 14, 2019 at 01:12:59PM +0200, Peter Zijlstra wrote:
> > On Wed, May 08, 2019 at 05:43:40PM +0300, Kirill A. Shutemov wrote:
> > > page_keyid() is inline funcation that uses lookup_page_ext(). KVM is
> > > going to use page_keyid() and since KVM can be built as a module
> > > lookup_page_ext() has to be exported.
> > 
> > I _really_ hate having to export world+dog for KVM. This one might not
> > be a real issue, but I itch every time I see an export for KVM these
> > days.
> 
> Is there any better way? Do we need to invent EXPORT_SYMBOL_KVM()? :P

Or disallow KVM (or parts thereof) from being a module anymore.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 20/62] mm/page_ext: Export lookup_page_ext() symbol
  2019-06-17  9:30       ` Peter Zijlstra
@ 2019-06-17 11:01         ` Kai Huang
  2019-06-17 11:13           ` Huang, Kai
  0 siblings, 1 reply; 153+ messages in thread
From: Kai Huang @ 2019-06-17 11:01 UTC (permalink / raw)
  To: Peter Zijlstra, Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Mon, 2019-06-17 at 11:30 +0200, Peter Zijlstra wrote:
> On Sat, Jun 15, 2019 at 01:44:43AM +0300, Kirill A. Shutemov wrote:
> > On Fri, Jun 14, 2019 at 01:12:59PM +0200, Peter Zijlstra wrote:
> > > On Wed, May 08, 2019 at 05:43:40PM +0300, Kirill A. Shutemov wrote:
> > > > page_keyid() is inline funcation that uses lookup_page_ext(). KVM is
> > > > going to use page_keyid() and since KVM can be built as a module
> > > > lookup_page_ext() has to be exported.
> > > 
> > > I _really_ hate having to export world+dog for KVM. This one might not
> > > be a real issue, but I itch every time I see an export for KVM these
> > > days.
> > 
> > Is there any better way? Do we need to invent EXPORT_SYMBOL_KVM()? :P
> 
> Or disallow KVM (or parts thereof) from being a module anymore.

For this particular symbol export, I don't think it's fair to blame KVM. The fundamental reason is
that page_keyid() (which calls lookup_page_ext()) is implemented as a static inline function in a
header file, so any module that calls page_keyid() will trigger this problem -- in fact the IOMMU
driver calls page_keyid() too, so even w/o KVM lookup_page_ext() needs to be exported.

Thanks,
-Kai

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 20/62] mm/page_ext: Export lookup_page_ext() symbol
  2019-06-17 11:01         ` Kai Huang
@ 2019-06-17 11:13           ` Huang, Kai
  0 siblings, 0 replies; 153+ messages in thread
From: Huang, Kai @ 2019-06-17 11:13 UTC (permalink / raw)
  To: kirill, peterz
  Cc: kvm, kirill.shutemov, keyrings, keescook, linux-kernel, tglx,
	linux-mm, dhowells, jacob.jun.pan, x86, akpm, hpa, mingo, bp,
	Hansen, Dave, luto, Schofield, Alison

On Mon, 2019-06-17 at 23:01 +1200, Kai Huang wrote:
> On Mon, 2019-06-17 at 11:30 +0200, Peter Zijlstra wrote:
> > On Sat, Jun 15, 2019 at 01:44:43AM +0300, Kirill A. Shutemov wrote:
> > > On Fri, Jun 14, 2019 at 01:12:59PM +0200, Peter Zijlstra wrote:
> > > > On Wed, May 08, 2019 at 05:43:40PM +0300, Kirill A. Shutemov wrote:
> > > > > page_keyid() is inline funcation that uses lookup_page_ext(). KVM is
> > > > > going to use page_keyid() and since KVM can be built as a module
> > > > > lookup_page_ext() has to be exported.
> > > > 
> > > > I _really_ hate having to export world+dog for KVM. This one might not
> > > > be a real issue, but I itch every time I see an export for KVM these
> > > > days.
> > > 
> > > Is there any better way? Do we need to invent EXPORT_SYMBOL_KVM()? :P
> > 
> > Or disallow KVM (or parts thereof) from being a module anymore.
> 
> For this particular symbol expose, I don't think its fair to blame KVM since the fundamental
> reason
> is because page_keyid() (which calls lookup_page_ext()) being implemented as static inline
> function
> in header file, so essentially having any other module who calls page_keyid() will trigger this
> problem -- in fact IOMMU driver calls page_keyid() too so even w/o KVM lookup_page_ext() needs to
> be
> exposed.

Oops, it seems the Intel IOMMU driver is not a module but built-in, so yes, KVM is the only module
that calls page_keyid() now. Sorry, my bad. But if any other module calls page_keyid(), this patch
is required.

Thanks,
-Kai
> 
> Thanks,
> -Kai
> 

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 49/62] mm, x86: export several MKTME variables
  2019-06-17  8:39         ` Kai Huang
@ 2019-06-17 11:25           ` Kirill A. Shutemov
  0 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-06-17 11:25 UTC (permalink / raw)
  To: Kai Huang
  Cc: Peter Zijlstra, Kirill A. Shutemov, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	Andy Lutomirski, David Howells, Kees Cook, Dave Hansen,
	Jacob Pan, Alison Schofield, linux-mm, kvm, keyrings,
	linux-kernel

On Mon, Jun 17, 2019 at 08:39:43PM +1200, Kai Huang wrote:
> On Mon, 2019-06-17 at 09:46 +0200, Peter Zijlstra wrote:
> > On Mon, Jun 17, 2019 at 03:14:29PM +1200, Kai Huang wrote:
> > > On Fri, 2019-06-14 at 13:56 +0200, Peter Zijlstra wrote:
> > > > On Wed, May 08, 2019 at 05:44:09PM +0300, Kirill A. Shutemov wrote:
> > > > > From: Kai Huang <kai.huang@linux.intel.com>
> > > > > 
> > > > > KVM needs those variables to get/set memory encryption mask.
> > > > > 
> > > > > Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > > ---
> > > > >  arch/x86/mm/mktme.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> > > > > index df70651816a1..12f4266cf7ea 100644
> > > > > --- a/arch/x86/mm/mktme.c
> > > > > +++ b/arch/x86/mm/mktme.c
> > > > > @@ -7,13 +7,16 @@
> > > > >  
> > > > >  /* Mask to extract KeyID from physical address. */
> > > > >  phys_addr_t mktme_keyid_mask;
> > > > > +EXPORT_SYMBOL_GPL(mktme_keyid_mask);
> > > > >  /*
> > > > >   * Number of KeyIDs available for MKTME.
> > > > >   * Excludes KeyID-0 which used by TME. MKTME KeyIDs start from 1.
> > > > >   */
> > > > >  int mktme_nr_keyids;
> > > > > +EXPORT_SYMBOL_GPL(mktme_nr_keyids);
> > > > >  /* Shift of KeyID within physical address. */
> > > > >  int mktme_keyid_shift;
> > > > > +EXPORT_SYMBOL_GPL(mktme_keyid_shift);
> > > > >  
> > > > >  DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
> > > > >  EXPORT_SYMBOL_GPL(mktme_enabled_key);
> > > > 
> > > > NAK, don't export variables. Who owns the values, who enforces this?
> > > > 
> > > 
> > > Both KVM and IOMMU driver need page_keyid() and mktme_keyid_shift to set page's keyID to the
> > > right
> > > place in the PTE (of KVM EPT and VT-d DMA page table).
> > > 
> > > MKTME key type code need to know mktme_nr_keyids in order to alloc/free keyID.
> > > 
> > > Maybe better to introduce functions instead of exposing variables directly?
> > > 
> > > Or instead of introducing page_keyid(), we use page_encrypt_mask(), which essentially holds
> > > "page_keyid() << mktme_keyid_shift"?
> > 
> > Yes, that's much better, because that strictly limits the access to R/O.
> > 
> 
> Thanks. I think Kirill will be the one to handle your suggestion. :)
> 
> Kirill?

Will do.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH, RFC 18/62] x86/mm: Implement syncing per-KeyID direct mappings
  2019-06-17  9:27       ` Peter Zijlstra
@ 2019-06-17 14:43         ` Kirill A. Shutemov
  2019-06-17 14:51           ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-06-17 14:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Mon, Jun 17, 2019 at 11:27:55AM +0200, Peter Zijlstra wrote:
> On Sat, Jun 15, 2019 at 01:43:09AM +0300, Kirill A. Shutemov wrote:
> > On Fri, Jun 14, 2019 at 11:51:32AM +0200, Peter Zijlstra wrote:
> > > On Wed, May 08, 2019 at 05:43:38PM +0300, Kirill A. Shutemov wrote:
> > > > For MKTME we use per-KeyID direct mappings. This allows kernel to have
> > > > access to encrypted memory.
> > > > 
> > > > sync_direct_mapping() syncs per-KeyID direct mappings with a canonical
> > > > one -- KeyID-0.
> > > > 
> > > > The function tracks changes in the canonical mapping:
> > > >  - creating or removing chunks of the translation tree;
> > > >  - changes in mapping flags (i.e. protection bits);
> > > >  - splitting huge page mapping into a page table;
> > > >  - replacing page table with a huge page mapping;
> > > > 
> > > > The function needs to be called on every change to the direct mapping:
> > > > hotplug, hotremove, changes in permissions bits, etc.
> > > 
> > > And yet I don't see anything in pageattr.c.
> > 
> > You're right. I've hooked up the sync in the wrong place.
> > > 
> > > Also, this seems like an expensive scheme; if you know where the changes
> > > > were, a more fine-grained update would be faster.
> > 
> > Do we have any hot enough pageattr users that makes it crucial?
> > 
> > I'll look into this anyway.
> 
> The graphics people would be the most aggressive users of this I'd think.
> They're the ones that yelled when I broke it last ;-)

I think something like this should do (I'll fold it in after testing):

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 6c973cb1e64c..b30386d84281 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -68,7 +68,7 @@ static inline void arch_free_page(struct page *page, int order)
 		free_encrypted_page(page, order);
 }
 
-int sync_direct_mapping(void);
+int sync_direct_mapping(unsigned long start, unsigned long end);
 
 int mktme_get_alg(int keyid);
 
@@ -86,7 +86,7 @@ static inline bool mktme_enabled(void)
 
 static inline void mktme_disable(void) {}
 
-static inline int sync_direct_mapping(void)
+static inline int sync_direct_mapping(unsigned long start, unsigned long end)
 {
 	return 0;
 }
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index f50a38d86cc4..f8123aeb24a6 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -761,7 +761,7 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
 		pgd_changed = true;
 	}
 
-	ret = sync_direct_mapping();
+	ret = sync_direct_mapping(vaddr_start, vaddr_end);
 	WARN_ON(ret);
 
 	if (pgd_changed)
@@ -1209,7 +1209,7 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 	end = (unsigned long)__va(end);
 
 	remove_pagetable(start, end, true, NULL);
-	ret = sync_direct_mapping();
+	ret = sync_direct_mapping(start, end);
 	WARN_ON(ret);
 }
 
@@ -1315,7 +1315,6 @@ void mark_rodata_ro(void)
 	unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 	unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
 	unsigned long all_end;
-	int ret;
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
@@ -1349,8 +1348,6 @@ void mark_rodata_ro(void)
 	free_kernel_image_pages((void *)text_end, (void *)rodata_start);
 	free_kernel_image_pages((void *)rodata_end, (void *)_sdata);
 
-	ret = sync_direct_mapping();
-	WARN_ON(ret);
 	debug_checkwx();
 }
 
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 9d2bb534f2ba..c099e1da055b 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -76,7 +76,7 @@ static void init_page_mktme(void)
 {
 	static_branch_enable(&mktme_enabled_key);
 
-	sync_direct_mapping();
+	sync_direct_mapping(PAGE_OFFSET, PAGE_OFFSET + direct_mapping_size);
 }
 
 struct page_ext_operations page_mktme_ops = {
@@ -596,15 +596,13 @@ static int sync_direct_mapping_p4d(unsigned long keyid,
 	return ret;
 }
 
-static int sync_direct_mapping_keyid(unsigned long keyid)
+static int sync_direct_mapping_keyid(unsigned long keyid,
+		unsigned long addr, unsigned long end)
 {
 	pgd_t *src_pgd, *dst_pgd;
-	unsigned long addr, end, next;
+	unsigned long next;
 	int ret = 0;
 
-	addr = PAGE_OFFSET;
-	end = PAGE_OFFSET + direct_mapping_size;
-
 	dst_pgd = pgd_offset_k(addr + keyid * direct_mapping_size);
 	src_pgd = pgd_offset_k(addr);
 
@@ -643,7 +641,7 @@ static int sync_direct_mapping_keyid(unsigned long keyid)
  *
  * The function is nop until MKTME is enabled.
  */
-int sync_direct_mapping(void)
+int sync_direct_mapping(unsigned long start, unsigned long end)
 {
 	int i, ret = 0;
 
@@ -651,7 +649,7 @@ int sync_direct_mapping(void)
 		return 0;
 
 	for (i = 1; !ret && i <= mktme_nr_keyids; i++)
-		ret = sync_direct_mapping_keyid(i);
+		ret = sync_direct_mapping_keyid(i, start, end);
 
 	flush_tlb_all();
 
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 6a9a77a403c9..eafbe0d8c44f 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -347,6 +347,28 @@ static void cpa_flush(struct cpa_data *data, int cache)
 
 	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
 
+	if (mktme_enabled()) {
+		unsigned long start, end;
+
+		start = *cpa->vaddr;
+		end = *cpa->vaddr + cpa->numpages * PAGE_SIZE;
+
+		/* Sync all direct mapping for an array */
+		if (cpa->flags & CPA_ARRAY) {
+			start = PAGE_OFFSET;
+			end = PAGE_OFFSET + direct_mapping_size;
+		}
+
+		/*
+		 * Sync per-KeyID direct mappings with the canonical one
+		 * (KeyID-0).
+		 *
+		 * sync_direct_mapping() does full TLB flush.
+		 */
+		sync_direct_mapping(start, end);
+		return;
+	}
+
 	if (cache && !static_cpu_has(X86_FEATURE_CLFLUSH)) {
 		cpa_flush_all(cache);
 		return;
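
Not part of the patch: a small userspace model of the address arithmetic
that sync_direct_mapping_keyid() relies on, where the KeyID-N alias of a
direct-map address sits keyid * direct_mapping_size above it. The EX_*
constants are example values picked for this sketch and the function name
is hypothetical.

```c
#include <stdint.h>

/* Example values only; the kernel derives the real ones at boot. */
#define EX_PAGE_OFFSET         0xffff888000000000ULL
#define EX_DIRECT_MAPPING_SIZE (1ULL << 36)  /* assumed 64 GiB direct map */

/*
 * Virtual address of the KeyID-N alias of a canonical (KeyID-0)
 * direct-map address, mirroring "addr + keyid * direct_mapping_size".
 */
uint64_t ex_keyid_alias(uint64_t addr, unsigned int keyid)
{
	return addr + (uint64_t)keyid * EX_DIRECT_MAPPING_SIZE;
}
```

Passing [start, end) down means only the matching slice of each alias has
to be walked, instead of the whole direct map for every KeyID.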
-- 
 Kirill A. Shutemov


* Re: [PATCH, RFC 18/62] x86/mm: Implement syncing per-KeyID direct mappings
  2019-06-17 14:43         ` Kirill A. Shutemov
@ 2019-06-17 14:51           ` Peter Zijlstra
  2019-06-17 15:17             ` Kirill A. Shutemov
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-17 14:51 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Mon, Jun 17, 2019 at 05:43:28PM +0300, Kirill A. Shutemov wrote:
> On Mon, Jun 17, 2019 at 11:27:55AM +0200, Peter Zijlstra wrote:

> > > > And yet I don't see anything in pageattr.c.
> > > 
> > > You're right. I've hooked up the sync in the wrong place.

> I think something like this should do (I'll fold it in after testing):

> @@ -643,7 +641,7 @@ static int sync_direct_mapping_keyid(unsigned long keyid)
>   *
>   * The function is nop until MKTME is enabled.
>   */
> -int sync_direct_mapping(void)
> +int sync_direct_mapping(unsigned long start, unsigned long end)
>  {
>  	int i, ret = 0;
>  
> @@ -651,7 +649,7 @@ int sync_direct_mapping(void)
>  		return 0;
>  
>  	for (i = 1; !ret && i <= mktme_nr_keyids; i++)
> -		ret = sync_direct_mapping_keyid(i);
> +		ret = sync_direct_mapping_keyid(i, start, end);
>  
>  	flush_tlb_all();
>  
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 6a9a77a403c9..eafbe0d8c44f 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -347,6 +347,28 @@ static void cpa_flush(struct cpa_data *data, int cache)
>  
>  	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
>  
> +	if (mktme_enabled()) {
> +		unsigned long start, end;
> +
> +		start = *cpa->vaddr;
> +		end = *cpa->vaddr + cpa->numpages * PAGE_SIZE;
> +
> +		/* Sync all direct mapping for an array */
> +		if (cpa->flags & CPA_ARRAY) {
> +			start = PAGE_OFFSET;
> +			end = PAGE_OFFSET + direct_mapping_size;
> +		}

Understandable but sad, IIRC that's the most used interface (at least,
it's the one the graphics people use).

> +
> +		/*
> +		 * Sync per-KeyID direct mappings with the canonical one
> +		 * (KeyID-0).
> +		 *
> +		 * sync_direct_mapping() does full TLB flush.
> +		 */
> +		sync_direct_mapping(start, end);
> +		return;

But it doesn't flush cache. So you can't return here.

> +	}
> +
>  	if (cache && !static_cpu_has(X86_FEATURE_CLFLUSH)) {
>  		cpa_flush_all(cache);
>  		return;
> -- 
>  Kirill A. Shutemov


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-05-08 14:44 ` [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call " Kirill A. Shutemov
  2019-06-14 11:47   ` Peter Zijlstra
  2019-06-14 11:51   ` Peter Zijlstra
@ 2019-06-17 15:07   ` Andy Lutomirski
  2019-06-17 15:28     ` Dave Hansen
  2 siblings, 1 reply; 153+ messages in thread
From: Andy Lutomirski @ 2019-06-17 15:07 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, X86 ML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Peter Zijlstra, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	Linux-MM, kvm list, keyrings, LKML

On Wed, May 8, 2019 at 7:44 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> From: Alison Schofield <alison.schofield@intel.com>
>
> Implement memory encryption for MKTME (Multi-Key Total Memory
> Encryption) with a new system call that is an extension of the
> legacy mprotect() system call.
>
> In encrypt_mprotect the caller must pass a handle to a previously
> allocated and programmed MKTME encryption key. The key can be
> obtained through the kernel key service type "mktme". The caller
> must have KEY_NEED_VIEW permission on the key.
>
> MKTME places an additional restriction on the protected data:
> The length of the data must be page aligned. This is in addition
> to the existing mprotect restriction that the addr must be page
> aligned.

I still find it bizarre that this is conflated with mprotect().

I also remain entirely unconvinced that MKTME on anonymous memory is
useful in the long run.  There will inevitably be all kinds of fancy
new CPU features that make the underlying MKTME mechanisms much more
useful.  For example, some way to bind a key to a VM, or a way to
*sanely* encrypt persistent memory.  By making this thing a syscall
that does more than just MKTME, you're adding combinatorial complexity
(you forget pkey!) and you're tying other functionality (change of
protection) to this likely-to-be-deprecated interface.

This is part of why I much prefer the idea of making this style of
MKTME a driver or some other non-intrusive interface.  Then, once
everyone gets tired of it, the driver can just get turned off with no
side effects.

--Andy


* Re: [PATCH, RFC 18/62] x86/mm: Implement syncing per-KeyID direct mappings
  2019-06-17 14:51           ` Peter Zijlstra
@ 2019-06-17 15:17             ` Kirill A. Shutemov
  0 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-06-17 15:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Andy Lutomirski,
	David Howells, Kees Cook, Dave Hansen, Kai Huang, Jacob Pan,
	Alison Schofield, linux-mm, kvm, keyrings, linux-kernel

On Mon, Jun 17, 2019 at 04:51:58PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 17, 2019 at 05:43:28PM +0300, Kirill A. Shutemov wrote:
> > On Mon, Jun 17, 2019 at 11:27:55AM +0200, Peter Zijlstra wrote:
> 
> > > > > And yet I don't see anything in pageattr.c.
> > > > 
> > > > You're right. I've hooked up the sync in the wrong place.
> 
> > I think something like this should do (I'll fold it in after testing):
> 
> > @@ -643,7 +641,7 @@ static int sync_direct_mapping_keyid(unsigned long keyid)
> >   *
> >   * The function is nop until MKTME is enabled.
> >   */
> > -int sync_direct_mapping(void)
> > +int sync_direct_mapping(unsigned long start, unsigned long end)
> >  {
> >  	int i, ret = 0;
> >  
> > @@ -651,7 +649,7 @@ int sync_direct_mapping(void)
> >  		return 0;
> >  
> >  	for (i = 1; !ret && i <= mktme_nr_keyids; i++)
> > -		ret = sync_direct_mapping_keyid(i);
> > +		ret = sync_direct_mapping_keyid(i, start, end);
> >  
> >  	flush_tlb_all();
> >  
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index 6a9a77a403c9..eafbe0d8c44f 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/mm/pageattr.c
> > @@ -347,6 +347,28 @@ static void cpa_flush(struct cpa_data *data, int cache)
> >  
> >  	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
> >  
> > +	if (mktme_enabled()) {
> > +		unsigned long start, end;
> > +
> > +		start = *cpa->vaddr;
> > +		end = *cpa->vaddr + cpa->numpages * PAGE_SIZE;
> > +
> > +		/* Sync all direct mapping for an array */
> > +		if (cpa->flags & CPA_ARRAY) {
> > +			start = PAGE_OFFSET;
> > +			end = PAGE_OFFSET + direct_mapping_size;
> > +		}
> 
> Understandable but sad, IIRC that's the most used interface (at least,
> it's the one the graphics people use).
> 
> > +
> > +		/*
> > +		 * Sync per-KeyID direct mappings with the canonical one
> > +		 * (KeyID-0).
> > +		 *
> > +		 * sync_direct_mapping() does full TLB flush.
> > +		 */
> > +		sync_direct_mapping(start, end);
> > +		return;
> 
> But it doesn't flush cache. So you can't return here.

Thanks for catching this.

	if (!cache)
		return;

should be fine.
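
To spell out the intended flow, here is a userspace model (not kernel code)
of the range selection and of the early-return condition. The ex_/EX_ names
and the flag value are placeholders invented for this sketch.

```c
#include <stdbool.h>

#define EX_PAGE_SIZE 4096UL
#define EX_CPA_ARRAY 0x2  /* placeholder flag value for this sketch */

struct ex_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Range to sync: the span that was changed, or the whole direct map
 * when the request came in as an array of addresses.
 */
struct ex_range ex_cpa_sync_range(unsigned long vaddr, unsigned long numpages,
				  int flags, unsigned long map_start,
				  unsigned long map_size)
{
	struct ex_range r = { vaddr, vaddr + numpages * EX_PAGE_SIZE };

	if (flags & EX_CPA_ARRAY) {
		r.start = map_start;
		r.end = map_start + map_size;
	}
	return r;
}

/*
 * The sync already does a full TLB flush, so returning right after it
 * is only safe when the caller did not ask for a cache flush as well.
 */
bool ex_can_return_after_sync(bool mktme_enabled, int cache)
{
	return mktme_enabled && !cache;
}
```

When a cache flush was requested, the function falls through to the
existing cache flushing code below the new block.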

-- 
 Kirill A. Shutemov


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-17 15:07   ` Andy Lutomirski
@ 2019-06-17 15:28     ` Dave Hansen
  2019-06-17 15:46       ` Andy Lutomirski
  0 siblings, 1 reply; 153+ messages in thread
From: Dave Hansen @ 2019-06-17 15:28 UTC (permalink / raw)
  To: Andy Lutomirski, Kirill A. Shutemov
  Cc: Andrew Morton, X86 ML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Peter Zijlstra, David Howells,
	Kees Cook, Kai Huang, Jacob Pan, Alison Schofield, Linux-MM,
	kvm list, keyrings, LKML

On 6/17/19 8:07 AM, Andy Lutomirski wrote:
> I still find it bizarre that this is conflated with mprotect().

This needs to be in the changelog.  But, for better or worse, it's
following the mprotect_pkey() pattern.

Other than the obvious "set the key on this memory", we're looking for
two other properties: atomicity (ensuring there is no transient state
where the memory is usable without the desired properties) and that it
is usable on existing allocations.

For atomicity, we have a model where we can allocate things with
PROT_NONE, then do mprotect_pkey() and mprotect_encrypt() (plus any
future features), then the last mprotect_*() call takes us from
PROT_NONE to the desired end permissions.  We could just require a plain
old mprotect() to do that instead of embedding mprotect()-like behavior
in these, of course, but that isn't the path we're on at the moment with
mprotect_pkey().

So, for this series it's just a matter of whether we do this:

	ptr = mmap(..., PROT_NONE);
	mprotect_pkey(protect_key, ptr, PROT_NONE);
	mprotect_encrypt(encr_key, ptr, PROT_READ|PROT_WRITE);
	// good to go

or this:

	ptr = mmap(..., PROT_NONE);
	mprotect_pkey(protect_key, ptr, PROT_NONE);
	sys_encrypt(key, ptr);
	mprotect(ptr, PROT_READ|PROT_WRITE);
	// good to go

I actually don't care all that much which one we end up with.  It's not
like the extra syscall in the second option means much.
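
As a concrete but purely illustrative version of the second sequence, here
is a sketch that uses only mmap()/mprotect() plus a do-nothing stub standing
in for sys_encrypt(), which does not exist. The names ex_sys_encrypt() and
ex_alloc_encrypted() are invented here, and the pkey step is left out to
keep it short.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the proposed sys_encrypt(); does nothing. */
static int ex_sys_encrypt(int key, void *ptr, size_t len)
{
	(void)key;
	(void)ptr;
	(void)len;
	return 0;
}

/*
 * Map the range inaccessible, attach the key while nothing can touch
 * the memory, and only then open it up. Returns NULL on failure.
 */
void *ex_alloc_encrypted(int key, size_t len)
{
	void *ptr = mmap(NULL, len, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (ptr == MAP_FAILED)
		return NULL;

	if (ex_sys_encrypt(key, ptr, len) != 0 ||
	    mprotect(ptr, len, PROT_READ | PROT_WRITE) != 0) {
		munmap(ptr, len);
		return NULL;
	}
	return ptr;
}
```

The memory is never readable or writable before the key is attached, which
is the atomicity property described above.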

> This is part of why I much prefer the idea of making this style of
> MKTME a driver or some other non-intrusive interface.  Then, once
> everyone gets tired of it, the driver can just get turned off with no
> side effects.

I like the concept, but not where it leads.  I'd call it the "hugetlbfs
approach". :)  Hugetlbfs certainly got us huge pages, but it's continued
to be a parallel set of code with parallel bugs and parallel
implementations of many VM features.  It's not that you can't implement
new things on hugetlbfs, it's that you *need* to.  You never get them
for free.

For instance, if we do a driver, how do we get large pages?  How do we
swap/reclaim the pages?  How do we do NUMA affinity?  How do we
eventually stack it on top of persistent memory filesystems or Device
DAX?  With a driver approach, I think we're stuck basically
reimplementing things or gluing them back together.  Nothing comes for free.

With this approach, we basically start with our normal, full feature set
(modulo weirdo interactions like with KSM).


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-17 15:28     ` Dave Hansen
@ 2019-06-17 15:46       ` Andy Lutomirski
  2019-06-17 18:27         ` Dave Hansen
  0 siblings, 1 reply; 153+ messages in thread
From: Andy Lutomirski @ 2019-06-17 15:46 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Kirill A. Shutemov, Andrew Morton, X86 ML,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	Peter Zijlstra, David Howells, Kees Cook, Kai Huang, Jacob Pan,
	Alison Schofield, Linux-MM, kvm list, keyrings, LKML

On Mon, Jun 17, 2019 at 8:28 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 6/17/19 8:07 AM, Andy Lutomirski wrote:
> > I still find it bizarre that this is conflated with mprotect().
>
> This needs to be in the changelog.  But, for better or worse, it's
> following the mprotect_pkey() pattern.
>
> Other than the obvious "set the key on this memory", we're looking for
> two other properties: atomicity (ensuring there is no transient state
> where the memory is usable without the desired properties) and that it
> is usable on existing allocations.
>
> For atomicity, we have a model where we can allocate things with
> PROT_NONE, then do mprotect_pkey() and mprotect_encrypt() (plus any
> future features), then the last mprotect_*() call takes us from
> PROT_NONE to the desired end permissions.  We could just require a plain
> old mprotect() to do that instead of embedding mprotect()-like behavior
> in these, of course, but that isn't the path we're on at the moment with
> mprotect_pkey().
>
> So, for this series it's just a matter of whether we do this:
>
>         ptr = mmap(..., PROT_NONE);
>         mprotect_pkey(protect_key, ptr, PROT_NONE);
>         mprotect_encrypt(encr_key, ptr, PROT_READ|PROT_WRITE);
>         // good to go
>
> or this:
>
>         ptr = mmap(..., PROT_NONE);
>         mprotect_pkey(protect_key, ptr, PROT_NONE);
>         sys_encrypt(key, ptr);
>         mprotect(ptr, PROT_READ|PROT_WRITE);
>         // good to go
>
> I actually don't care all that much which one we end up with.  It's not
> like the extra syscall in the second option means much.

The benefit of the second one is that, if sys_encrypt is absent, it
just works.  In the first model, programs need a fallback because
they'll segfault if mprotect_encrypt() gets ENOSYS.
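
To make the difference concrete, a userspace sketch in which both proposed
calls are stubs that fail with ENOSYS, as they would on a kernel without
them. All ex_ names are hypothetical. Whether silently continuing without
encryption is acceptable is a policy question this sketch does not answer.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Stubs modelling a kernel that lacks both proposed syscalls. */
static int ex_mprotect_encrypt(int key, void *p, size_t len, int prot)
{
	(void)key; (void)p; (void)len; (void)prot;
	errno = ENOSYS;
	return -1;
}

static int ex_sys_encrypt(int key, void *p, size_t len)
{
	(void)key; (void)p; (void)len;
	errno = ENOSYS;
	return -1;
}

/*
 * First model: the combined call also changes protection, so on ENOSYS
 * the range is still PROT_NONE and an explicit fallback is required.
 */
int ex_make_usable_combined(int key, void *p, size_t len)
{
	if (ex_mprotect_encrypt(key, p, len, PROT_READ | PROT_WRITE) == 0)
		return 0;
	if (errno != ENOSYS)
		return -1;
	return mprotect(p, len, PROT_READ | PROT_WRITE);
}

/*
 * Second model: protection is changed by plain mprotect(), so a missing
 * ex_sys_encrypt() does not leave the range inaccessible.
 */
int ex_make_usable_split(int key, void *p, size_t len)
{
	if (ex_sys_encrypt(key, p, len) != 0 && errno != ENOSYS)
		return -1;
	return mprotect(p, len, PROT_READ | PROT_WRITE);
}
```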

>
> > This is part of why I much prefer the idea of making this style of
> > MKTME a driver or some other non-intrusive interface.  Then, once
> > everyone gets tired of it, the driver can just get turned off with no
> > side effects.
>
> I like the concept, but not where it leads.  I'd call it the "hugetlbfs
> approach". :)  Hugetlbfs certainly got us huge pages, but it's continued
> to be a parallel set of code with parallel bugs and parallel
> implementations of many VM features.  It's not that you can't implement
> new things on hugetlbfs, it's that you *need* to.  You never get them
> for free.

Fair enough, but...

>
> For instance, if we do a driver, how do we get large pages?  How do we
> swap/reclaim the pages?  How do we do NUMA affinity?

Those all make sense.

>  How do we
> eventually stack it on top of persistent memory filesystems or Device
> DAX?

How do we stack anonymous memory on top of persistent memory or Device
DAX?  I'm confused.

Just to throw this out there, what if we had a new device /dev/xpfo
and MKTME were one of its features.  You open /dev/xpfo, optionally do
an ioctl to set a key, and then map it.  The pages you get are
unmapped entirely from the direct map, and you get a PFNMAP VMA with
all its limitations.  This seems much more useful -- it's limited, but
it's limited *because the kernel can't accidentally read it*.

I think that, in the long run, we're going to have to either expand
the core mm's concept of what "memory" is or just have a whole
parallel set of mechanisms for memory that doesn't work like memory.
We're already accumulating a set of things that are backed by memory
but aren't usable as memory. SGX EPC pages and SEV pages come to mind.
They are faster when they're in big contiguous chunks (well, not SGX
AFAIK, but maybe some day), they have NUMA node affinity, and they
show up in page tables, but the hardware restricts who can read and
write them.  If Intel isn't planning to do something like this with
the MKTME hardware, I'll eat my hat.

I expect that some day normal memory will be able to be repurposed as
SGX pages on the fly, and that will also look a lot more like SEV or
XPFO than like this model of MKTME.

So, if we upstream MKTME as anonymous memory with a magic config
syscall, I predict that, in a few years, it will end up inheriting
all downsides of both approaches with few of the upsides.  Programs
like QEMU will need to learn to manipulate pages that can't be
accessed outside the VM without special VM buy-in, so the fact that
MKTME pages are fully functional and can be GUP-ed won't be very
useful.  And the VM will learn about all these things, but MKTME won't
really fit in.

And, one of these days, someone will come up with a version of XPFO
that could actually be upstreamed, and it seems entirely plausible
that it will be totally incompatible with MKTME-as-anonymous-memory
and that users of MKTME will actually get *worse* security.


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-17 15:46       ` Andy Lutomirski
@ 2019-06-17 18:27         ` Dave Hansen
  2019-06-17 19:12           ` Andy Lutomirski
  2019-06-17 23:59           ` Kai Huang
  0 siblings, 2 replies; 153+ messages in thread
From: Dave Hansen @ 2019-06-17 18:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kirill A. Shutemov, Andrew Morton, X86 ML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	David Howells, Kees Cook, Kai Huang, Jacob Pan, Alison Schofield,
	Linux-MM, kvm list, keyrings, LKML, Tom Lendacky

Tom Lendacky, could you take a look down in the message to the talk of
SEV?  I want to make sure I'm not misrepresenting what it does today.
...


>> I actually don't care all that much which one we end up with.  It's not
>> like the extra syscall in the second option means much.
> 
> The benefit of the second one is that, if sys_encrypt is absent, it
> just works.  In the first model, programs need a fallback because
> they'll segfault if mprotect_encrypt() gets ENOSYS.

Well, by the time they get here, they would have already had to allocate
and set up the encryption key.  I don't think this would really be the
"normal" malloc() path, for instance.

>>  How do we
>> eventually stack it on top of persistent memory filesystems or Device
>> DAX?
> 
> How do we stack anonymous memory on top of persistent memory or Device
> DAX?  I'm confused.

If our interface to MKTME is:

	fd = open("/dev/mktme");
	ptr = mmap(fd);

Then it's hard to combine with an interface which is:

	fd = open("/dev/dax123");
	ptr = mmap(fd);

Where if we have something like mprotect() (or madvise() or something
else taking pointer), we can just do:

	fd = open("/dev/anything987");
	ptr = mmap(fd);
	sys_encrypt(ptr);

Now, we might not *do* it that way for dax, for instance, but I'm just
saying that if we go the /dev/mktme route, we never get a choice.

> I think that, in the long run, we're going to have to either expand
> the core mm's concept of what "memory" is or just have a whole
> parallel set of mechanisms for memory that doesn't work like memory.
...
> I expect that some day normal memory will be able to be repurposed as
> SGX pages on the fly, and that will also look a lot more like SEV or
> XPFO than like this model of MKTME.

I think you're drawing the line at pages where the kernel can manage
contents vs. not manage contents.  I'm not sure that's the right
distinction to make, though.  The thing that is important is whether the
kernel can manage the lifetime and location of the data in the page.

Basically: Can the kernel choose where the page comes from and get the
page back when it wants?

I really don't like the current state of things like with SEV or with
KVM direct device assignment where the physical location is quite locked
down and the kernel really can't manage the memory.  I'm trying really
hard to make sure future hardware is more permissive about such things.
 My hope is that these are a temporary blip and not the new normal.

> So, if we upstream MKTME as anonymous memory with a magic config
> syscall, I predict that, in a few years, it will end up inheriting
> all downsides of both approaches with few of the upsides.  Programs
> like QEMU will need to learn to manipulate pages that can't be
> accessed outside the VM without special VM buy-in, so the fact that
> MKTME pages are fully functional and can be GUP-ed won't be very
> useful.  And the VM will learn about all these things, but MKTME won't
> really fit in.

Kai Huang (who is on cc) has been doing the QEMU enabling and might want
to weigh in.  I'd also love to hear from the AMD folks in case I'm not
grokking some aspect of SEV.

But, my understanding is that, even today, neither QEMU nor the kernel
can see SEV-encrypted guest memory.  So QEMU should already understand
how to not interact with guest memory.  I _assume_ it's also already
doing this with anonymous memory, without needing /dev/sme or something.

> And, one of these days, someone will come up with a version of XPFO
> that could actually be upstreamed, and it seems entirely plausible
> that it will be totally incompatible with MKTME-as-anonymous-memory
> and that users of MKTME will actually get *worse* security.

I'm not following here.  XPFO just means that we don't keep the direct
map around all the time for all memory.  If XPFO and
MKTME-as-anonymous-memory were both in play, I think we'd just be
creating/destroying the MKTME-enlightened direct map instead of a
vanilla one.


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-17 18:27         ` Dave Hansen
@ 2019-06-17 19:12           ` Andy Lutomirski
  2019-06-17 21:36             ` Dave Hansen
  2019-06-18  0:05             ` Kai Huang
  2019-06-17 23:59           ` Kai Huang
  1 sibling, 2 replies; 153+ messages in thread
From: Andy Lutomirski @ 2019-06-17 19:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Kirill A. Shutemov, Andrew Morton, X86 ML,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	Peter Zijlstra, David Howells, Kees Cook, Kai Huang, Jacob Pan,
	Alison Schofield, Linux-MM, kvm list, keyrings, LKML,
	Tom Lendacky

On Mon, Jun 17, 2019 at 11:37 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> Tom Lendacky, could you take a look down in the message to the talk of
> SEV?  I want to make sure I'm not misrepresenting what it does today.
> ...
>
>
> >> I actually don't care all that much which one we end up with.  It's not
> >> like the extra syscall in the second option means much.
> >
> > The benefit of the second one is that, if sys_encrypt is absent, it
> > just works.  In the first model, programs need a fallback because
> > they'll segfault if mprotect_encrypt() gets ENOSYS.
>
> Well, by the time they get here, they would have already had to allocate
> and set up the encryption key.  I don't think this would really be the
> "normal" malloc() path, for instance.
>
> >>  How do we
> >> eventually stack it on top of persistent memory filesystems or Device
> >> DAX?
> >
> > How do we stack anonymous memory on top of persistent memory or Device
> > DAX?  I'm confused.
>
> If our interface to MKTME is:
>
>         fd = open("/dev/mktme");
>         ptr = mmap(fd);
>
> Then it's hard to combine with an interface which is:
>
>         fd = open("/dev/dax123");
>         ptr = mmap(fd);
>
> Where if we have something like mprotect() (or madvise() or something
> else taking pointer), we can just do:
>
>         fd = open("/dev/anything987");
>         ptr = mmap(fd);
>         sys_encrypt(ptr);

I'm having a hard time imagining that ever working -- wouldn't it blow
up if someone did:

fd = open("/dev/anything987");
ptr1 = mmap(fd);
ptr2 = mmap(fd);
sys_encrypt(ptr1);

So I think it really has to be:
fd = open("/dev/anything987");
ioctl(fd, ENCRYPT_ME);
mmap(fd);

But I really expect that the encryption of a DAX device will actually
be a block device setting and won't look like this at all.  It'll be
more like dm-crypt except without device mapper.

>
> Now, we might not *do* it that way for dax, for instance, but I'm just
> saying that if we go the /dev/mktme route, we never get a choice.
>
> > I think that, in the long run, we're going to have to either expand
> > the core mm's concept of what "memory" is or just have a whole
> > parallel set of mechanisms for memory that doesn't work like memory.
> ...
> > I expect that some day normal memory will be able to be repurposed as
> > SGX pages on the fly, and that will also look a lot more like SEV or
> > XPFO than like this model of MKTME.
>
> I think you're drawing the line at pages where the kernel can manage
> contents vs. not manage contents.  I'm not sure that's the right
> distinction to make, though.  The thing that is important is whether the
> kernel can manage the lifetime and location of the data in the page.

The kernel can manage the location of EPC pages, for example, but only
under extreme constraints right now.  The draft SGX driver can and
does swap them out and swap them back in, potentially at a different
address.

>
> Basically: Can the kernel choose where the page comes from and get the
> page back when it wants?
>
> I really don't like the current state of things like with SEV or with
> KVM direct device assignment where the physical location is quite locked
> down and the kernel really can't manage the memory.  I'm trying really
> hard to make sure future hardware is more permissive about such things.
>  My hope is that these are a temporary blip and not the new normal.
>
> > So, if we upstream MKTME as anonymous memory with a magic config
> > syscall, I predict that, in a few years, it will end up inheriting
> > all downsides of both approaches with few of the upsides.  Programs
> > like QEMU will need to learn to manipulate pages that can't be
> > accessed outside the VM without special VM buy-in, so the fact that
> > MKTME pages are fully functional and can be GUP-ed won't be very
> > useful.  And the VM will learn about all these things, but MKTME won't
> > really fit in.
>
> Kai Huang (who is on cc) has been doing the QEMU enabling and might want
> to weigh in.  I'd also love to hear from the AMD folks in case I'm not
> grokking some aspect of SEV.
>
> But, my understanding is that, even today, neither QEMU nor the kernel
> can see SEV-encrypted guest memory.  So QEMU should already understand
> how to not interact with guest memory.  I _assume_ it's also already
> doing this with anonymous memory, without needing /dev/sme or something.

Let's find out!

If it's using anonymous memory, it must be very strange, since it
would more or less have to be mmapped PROT_NONE.  The thing that makes
anonymous memory in particular seem so awkward to me is that it's
fundamentally identified by its *address*, which means it makes no
sense if it's not mapped.

>
> > And, one of these days, someone will come up with a version of XPFO
> > that could actually be upstreamed, and it seems entirely plausible
> > that it will be totally incompatible with MKTME-as-anonymous-memory
> > and that users of MKTME will actually get *worse* security.
>
> I'm not following here.  XPFO just means that we don't keep the direct
> map around all the time for all memory.  If XPFO and
> MKTME-as-anonymous-memory were both in play, I think we'd just be
> creating/destroying the MKTME-enlightened direct map instead of a
> vanilla one.

What I'm saying is that I can imagine XPFO also wanting to be
something other than anonymous memory.  I don't think we'll ever want
regular MAP_ANONYMOUS to enable XPFO by default because the
performance will suck.  Doing this seems odd:

ptr = mmap(MAP_ANONYMOUS);
sys_xpfo_a_pointer(ptr);

So I could imagine:

ptr = mmap(MAP_ANONYMOUS | MAP_XPFO);

or

fd = open("/dev/xpfo"); (or fd = memfd_create(..., XPFO);)
ptr = mmap(fd);

I'm thinking that XPFO is a *lot* simpler under the hood if we just
straight-up don't support GUP on it.  Maybe we should call this
"strong XPFO".  Similarly, the kinds of things that want MKTME may
also want the memory to be entirely absent from the direct map.  And
the things that use SEV (as I understand it) *can't* usefully use the
memory for normal IO via GUP or copy_to/from_user(), so these things
all have a decent amount in common.

Another downside of anonymous memory (in my head, anyway -- QEMU
people should chime in) is that it seems awkward to use it for IO
techniques in which the back-end isn't in the QEMU process.  If
there's an fd involved, you can pass it around, feed it to things like
vfio, etc.  If there's no fd, it's stuck in the creating process.

And another silly argument: if we had /dev/mktme, then we could
possibly get away with avoiding all the keyring stuff entirely.
Instead, you open /dev/mktme and you get your own key under the hood.
If you want two keys, you open /dev/mktme twice.  If you want some
other program to be able to see your memory, you pass it the fd.

I hope this email isn't too rambling :)

--Andy

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-17 19:12           ` Andy Lutomirski
@ 2019-06-17 21:36             ` Dave Hansen
  2019-06-18  0:48               ` Kai Huang
  2019-06-18  0:05             ` Kai Huang
  1 sibling, 1 reply; 153+ messages in thread
From: Dave Hansen @ 2019-06-17 21:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kirill A. Shutemov, Andrew Morton, X86 ML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	David Howells, Kees Cook, Kai Huang, Jacob Pan, Alison Schofield,
	Linux-MM, kvm list, keyrings, LKML, Tom Lendacky

>> Where if we have something like mprotect() (or madvise() or something
>> else taking pointer), we can just do:
>>
>>         fd = open("/dev/anything987");
>>         ptr = mmap(fd);
>>         sys_encrypt(ptr);
> 
> I'm having a hard time imagining that ever working -- wouldn't it blow
> up if someone did:
> 
> fd = open("/dev/anything987");
> ptr1 = mmap(fd);
> ptr2 = mmap(fd);
> sys_encrypt(ptr1);
> 
> So I think it really has to be:
> fd = open("/dev/anything987");
> ioctl(fd, ENCRYPT_ME);
> mmap(fd);

Yeah, shared mappings are annoying. :)

But, let's face it, nobody is going to do what you suggest in the
ptr1/ptr2 example.  It doesn't make any logical sense because it's
effectively asking to read the memory with two different keys.  I
_believe_ fscrypt has similar issues and just punts on them by saying
"don't do that".

We can also quite easily figure out what's going on.  It's a very simple
rule to kill a process that tries to fault a page in whose KeyID doesn't
match the VMA under which it is faulted in, and also require that no
pages are faulted in under VMAs which have their key changed.
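
A minimal sketch of that rule, as a userspace toy model.  The struct and
function names here are hypothetical and do not come from the patchset; the
error returns stand in for killing the task or refusing the key change.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct toy_page { int keyid; };
struct toy_vma  { int keyid; unsigned long nr_faulted; };

/*
 * Fault-time check: a page may only be faulted in under a VMA with the
 * same KeyID.  -EACCES stands in for delivering a fatal signal.
 */
int toy_fault(struct toy_vma *vma, const struct toy_page *page)
{
	if (page->keyid != vma->keyid)
		return -EACCES;
	vma->nr_faulted++;
	return 0;
}

/* A VMA's key may only change while no pages are faulted in under it. */
int toy_set_keyid(struct toy_vma *vma, int keyid)
{
	if (vma->nr_faulted)
		return -EBUSY;
	vma->keyid = keyid;
	return 0;
}
```

In the ptr1/ptr2 example, the second mapping would hit the -EACCES path as
soon as it touched a page that was already populated under the other key.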


>> Now, we might not *do* it that way for dax, for instance, but I'm just
>> saying that if we go the /dev/mktme route, we never get a choice.
>>
>>> I think that, in the long run, we're going to have to either expand
>>> the core mm's concept of what "memory" is or just have a whole
>>> parallel set of mechanisms for memory that doesn't work like memory.
>> ...
>>> I expect that some day normal memory will  be able to be repurposed as
>>> SGX pages on the fly, and that will also look a lot more like SEV or
>>> XPFO than like the this model of MKTME.
>>
>> I think you're drawing the line at pages where the kernel can manage
>> contents vs. not manage contents.  I'm not sure that's the right
>> distinction to make, though.  The thing that is important is whether the
>> kernel can manage the lifetime and location of the data in the page.
> 
> The kernel can manage the location of EPC pages, for example, but only
> under extreme constraints right now.  The draft SGX driver can and
> does swap them out and swap them back in, potentially at a different
> address.

The kernel can't put arbitrary data in EPC pages and can't use normal
memory for EPC.  To me, that puts them clearly on the side of being
unmanageable by the core mm code.

For instance, there's no way we could mix EPC pages in the same 'struct
zone' with non-EPC pages.  Not only are they not in the direct map, but
they never *can* be, even for a second.

>>> And, one of these days, someone will come up with a version of XPFO
>>> that could actually be upstreamed, and it seems entirely plausible
>>> that it will be totally incompatible with MKTME-as-anonymous-memory
>>> and that users of MKTME will actually get *worse* security.
>>
>> I'm not following here.  XPFO just means that we don't keep the direct
>> map around all the time for all memory.  If XPFO and
>> MKTME-as-anonymous-memory were both in play, I think we'd just be
>> creating/destroying the MKTME-enlightened direct map instead of a
>> vanilla one.
> 
> What I'm saying is that I can imagine XPFO also wanting to be
> something other than anonymous memory.  I don't think we'll ever want
> regular MAP_ANONYMOUS to enable XPFO by default because the
> performance will suck.

It will certainly suck for some things.  But, does it suck if the kernel
never uses the direct map for the XPFO memory?  If it were for KVM guest
memory for a guest using direct device assignment, we might not even
ever notice.

> I'm thinking that XPFO is a *lot* simpler under the hood if we just
> straight-up don't support GUP on it.  Maybe we should call this
> "strong XPFO".  Similarly, the kinds of things that want MKTME may
> also want the memory to be entirely absent from the direct map.  And
> the things that use SEV (as I understand it) *can't* usefully use the
> memory for normal IO via GUP or copy_to/from_user(), so these things
> all have a decent amount in common.

OK, so basically, you're thinking about new memory management
infrastructure that a memory-allocating-app can opt into where they get
a reduced kernel feature set, but also increased security guarantees?
 The main insight, though, is that some hardware features *already*
impose (some of) this reduced feature set?

FWIW, I don't think many folks will go for the no-GUP rule.  It's one
thing to say no-GUPs for SGX pages which can't have I/O done on them in
the first place, but it's quite another to tell folks that sendfile() no
longer works without bounce buffers.

MKTME's security guarantees are very different from those of something like SEV.
 Since the kernel is in the trust boundary, it *can* do fun stuff like
RDMA which is a heck of a lot faster than bounce buffering.  Let's say a
franken-system existed with SEV and MKTME.  It isn't even clear to me
that *everyone* would pick SEV over MKTME.  IOW, I'm not sure the MKTME
model necessarily goes away given the presence of SEV.

> And another silly argument: if we had /dev/mktme, then we could
> possibly get away with avoiding all the keyring stuff entirely.
> Instead, you open /dev/mktme and you get your own key under the hook.
> If you want two keys, you open /dev/mktme twice.  If you want some
> other program to be able to see your memory, you pass it the fd.

We still like the keyring because it's one-stop-shopping as the place
that *owns* the hardware KeyID slots.  Those are global resources and
scream for a single global place to allocate and manage them.  The
hardware slots also need to be shared between any anonymous and
file-based users, no matter what the APIs for the anonymous side.
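
To make the "single global place" point concrete, here is a toy sketch of
one shared slot table with reference counts.  Everything in it is
hypothetical: the slot count is made up (the real number is enumerated from
the hardware), slot 0 is assumed to be reserved for the default key, and
none of the names come from the patchset.

```c
#include <assert.h>

#define TOY_NR_KEYIDS 4	/* made-up number of slots, including slot 0 */

/* One global table shared by every kind of user; 0 means "slot free". */
static int toy_keyid_refs[TOY_NR_KEYIDS];

/* Grab a free slot.  Returns a KeyID >= 1, or -1 if all slots are in use. */
int toy_keyid_alloc(void)
{
	for (int id = 1; id < TOY_NR_KEYIDS; id++) {
		if (toy_keyid_refs[id] == 0) {
			toy_keyid_refs[id] = 1;
			return id;
		}
	}
	return -1;
}

/* Another user (anonymous VMA, file mapping, ...) of an allocated slot. */
void toy_keyid_get(int id)
{
	assert(id > 0 && id < TOY_NR_KEYIDS && toy_keyid_refs[id] > 0);
	toy_keyid_refs[id]++;
}

/* Drop one user; the slot becomes reusable when the count reaches zero. */
void toy_keyid_put(int id)
{
	assert(id > 0 && id < TOY_NR_KEYIDS && toy_keyid_refs[id] > 0);
	toy_keyid_refs[id]--;
}
```

Whether that table lives behind the keyring or behind a driver, the anonymous
and file-based paths would both have to go through the same get/put pair.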

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-17 18:27         ` Dave Hansen
  2019-06-17 19:12           ` Andy Lutomirski
@ 2019-06-17 23:59           ` Kai Huang
  2019-06-18  1:34             ` Lendacky, Thomas
  1 sibling, 1 reply; 153+ messages in thread
From: Kai Huang @ 2019-06-17 23:59 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski
  Cc: Kirill A. Shutemov, Andrew Morton, X86 ML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	David Howells, Kees Cook, Jacob Pan, Alison Schofield, Linux-MM,
	kvm list, keyrings, LKML, Tom Lendacky

On Mon, 2019-06-17 at 11:27 -0700, Dave Hansen wrote:
> Tom Lendacky, could you take a look down in the message to the talk of
> SEV?  I want to make sure I'm not misrepresenting what it does today.
> ...
> 
> 
> > > I actually don't care all that much which one we end up with.  It's not
> > > like the extra syscall in the second options means much.
> > 
> > The benefit of the second one is that, if sys_encrypt is absent, it
> > just works.  In the first model, programs need a fallback because
> > they'll segfault of mprotect_encrypt() gets ENOSYS.
> 
> Well, by the time they get here, they would have already had to allocate
> and set up the encryption key.  I don't think this would really be the
> "normal" malloc() path, for instance.
> 
> > >  How do we
> > > eventually stack it on top of persistent memory filesystems or Device
> > > DAX?
> > 
> > How do we stack anonymous memory on top of persistent memory or Device
> > DAX?  I'm confused.
> 
> If our interface to MKTME is:
> 
> 	fd = open("/dev/mktme");
> 	ptr = mmap(fd);
> 
> Then it's hard to combine with an interface which is:
> 
> 	fd = open("/dev/dax123");
> 	ptr = mmap(fd);
> 
> Where if we have something like mprotect() (or madvise() or something
> else taking pointer), we can just do:
> 
> 	fd = open("/dev/anything987");
> 	ptr = mmap(fd);
> 	sys_encrypt(ptr);
> 
> Now, we might not *do* it that way for dax, for instance, but I'm just
> saying that if we go the /dev/mktme route, we never get a choice.
> 
> > I think that, in the long run, we're going to have to either expand
> > the core mm's concept of what "memory" is or just have a whole
> > parallel set of mechanisms for memory that doesn't work like memory.
> 
> ...
> > I expect that some day normal memory will  be able to be repurposed as
> > SGX pages on the fly, and that will also look a lot more like SEV or
> > XPFO than like the this model of MKTME.
> 
> I think you're drawing the line at pages where the kernel can manage
> contents vs. not manage contents.  I'm not sure that's the right
> distinction to make, though.  The thing that is important is whether the
> kernel can manage the lifetime and location of the data in the page.
> 
> Basically: Can the kernel choose where the page comes from and get the
> page back when it wants?
> 
> I really don't like the current state of things like with SEV or with
> KVM direct device assignment where the physical location is quite locked
> down and the kernel really can't manage the memory.  I'm trying really
> hard to make sure future hardware is more permissive about such things.
>  My hope is that these are a temporary blip and not the new normal.
> 
> > So, if we upstream MKTME as anonymous memory with a magic config
> > syscall, I predict that, in a few years, it will be end up inheriting
> > all downsides of both approaches with few of the upsides.  Programs
> > like QEMU will need to learn to manipulate pages that can't be
> > accessed outside the VM without special VM buy-in, so the fact that
> > MKTME pages are fully functional and can be GUP-ed won't be very
> > useful.  And the VM will learn about all these things, but MKTME won't
> > really fit in.
> 
> Kai Huang (who is on cc) has been doing the QEMU enabling and might want
> to weigh in.  I'd also love to hear from the AMD folks in case I'm not
> grokking some aspect of SEV.
> 
> But, my understanding is that, even today, neither QEMU nor the kernel
> can see SEV-encrypted guest memory.  So QEMU should already understand
> how to not interact with guest memory.  I _assume_ it's also already
> doing this with anonymous memory, without needing /dev/sme or something.

Correct, neither QEMU nor the kernel can see SEV-encrypted guest memory. QEMU requires the guest's
cooperation when it needs to interact with the guest, e.g. to support virtual DMA (for virtual
devices in an SEV guest), QEMU requires the SEV guest to set up a bounce buffer (which is not
SEV-encrypted memory, but shared memory that can also be accessed from the host side), so that the
guest kernel can copy DMA data from the bounce buffer to its own SEV-encrypted memory after QEMU or
the host kernel puts the DMA data into the bounce buffer.

And yes, from my reading (better to have the AMD guys confirm), an SEV guest uses anonymous memory,
but it also pins all guest memory (by calling GUP from KVM -- SEV specifically introduced 2 KVM
ioctls for this purpose), since SEV architecturally cannot support swapping or migration of
SEV-encrypted guest memory: SME/SEV also uses the physical address as a "tweak", and there's no way
the kernel can get or use the SEV guest's memory encryption key. In order to swap/migrate SEV guest
memory, we would need something similar to SGX EPC eviction/reload, which SEV doesn't have today.
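
To illustrate the "tweak" point with a toy model: if the ciphertext depends on the physical
address, then copying the raw bytes to a different page breaks decryption. The sketch below uses
an XOR stand-in that is NOT the real SME/SEV cipher; the function names and the multiplier are
made up purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy address-dependent tweak.  The odd multiplier is arbitrary. */
uint64_t toy_tweak(uint64_t pa)
{
	return pa * UINT64_C(0x9E3779B97F4A7C15);
}

/* Toy "encryption": plaintext XOR key XOR tweak(physical address). */
uint64_t toy_encrypt(uint64_t plain, uint64_t key, uint64_t pa)
{
	return plain ^ key ^ toy_tweak(pa);
}

/* Decryption only round-trips if 'pa' matches the address used to encrypt. */
uint64_t toy_decrypt(uint64_t cipher, uint64_t key, uint64_t pa)
{
	return cipher ^ key ^ toy_tweak(pa);
}
```

Data encrypted at one physical address and then read back at another comes out as garbage, so
whoever moves the page has to be able to decrypt and re-encrypt it, which the host kernel cannot
do without the key.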

From this perspective, I think the driver proposal kind of makes sense, since we already have a
security feature which treats normal memory somewhat like "device memory" (no swap, no migration,
etc), so it makes sense for MKTME to just follow that (although the MKTME hardware can support
swap, page migration, etc). The downside of the driver proposal for MKTME, I think, is that, like
Dave mentioned, it's hard (or I am not sure whether it is even possible) to extend it to support
NVDIMM (and file-backed guest memory), since for virtual NVDIMM, QEMU needs to call mmap against
the fd of the NVDIMM.

Thanks,
-Kai

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-17 19:12           ` Andy Lutomirski
  2019-06-17 21:36             ` Dave Hansen
@ 2019-06-18  0:05             ` Kai Huang
  2019-06-18  0:15               ` Andy Lutomirski
  1 sibling, 1 reply; 153+ messages in thread
From: Kai Huang @ 2019-06-18  0:05 UTC (permalink / raw)
  To: Andy Lutomirski, Dave Hansen
  Cc: Kirill A. Shutemov, Andrew Morton, X86 ML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	David Howells, Kees Cook, Jacob Pan, Alison Schofield, Linux-MM,
	kvm list, keyrings, LKML, Tom Lendacky

On Mon, 2019-06-17 at 12:12 -0700, Andy Lutomirski wrote:
> On Mon, Jun 17, 2019 at 11:37 AM Dave Hansen <dave.hansen@intel.com> wrote:
> > 
> > Tom Lendacky, could you take a look down in the message to the talk of
> > SEV?  I want to make sure I'm not misrepresenting what it does today.
> > ...
> > 
> > 
> > > > I actually don't care all that much which one we end up with.  It's not
> > > > like the extra syscall in the second options means much.
> > > 
> > > The benefit of the second one is that, if sys_encrypt is absent, it
> > > just works.  In the first model, programs need a fallback because
> > > they'll segfault of mprotect_encrypt() gets ENOSYS.
> > 
> > Well, by the time they get here, they would have already had to allocate
> > and set up the encryption key.  I don't think this would really be the
> > "normal" malloc() path, for instance.
> > 
> > > >  How do we
> > > > eventually stack it on top of persistent memory filesystems or Device
> > > > DAX?
> > > 
> > > How do we stack anonymous memory on top of persistent memory or Device
> > > DAX?  I'm confused.
> > 
> > If our interface to MKTME is:
> > 
> >         fd = open("/dev/mktme");
> >         ptr = mmap(fd);
> > 
> > Then it's hard to combine with an interface which is:
> > 
> >         fd = open("/dev/dax123");
> >         ptr = mmap(fd);
> > 
> > Where if we have something like mprotect() (or madvise() or something
> > else taking pointer), we can just do:
> > 
> >         fd = open("/dev/anything987");
> >         ptr = mmap(fd);
> >         sys_encrypt(ptr);
> 
> I'm having a hard time imagining that ever working -- wouldn't it blow
> up if someone did:
> 
> fd = open("/dev/anything987");
> ptr1 = mmap(fd);
> ptr2 = mmap(fd);
> sys_encrypt(ptr1);
> 
> So I think it really has to be:
> fd = open("/dev/anything987");
> ioctl(fd, ENCRYPT_ME);
> mmap(fd);

This requires "/dev/anything987" to support the ENCRYPT_ME ioctl, right?

So to support NVDIMM (DAX), we need to add an ENCRYPT_ME ioctl to DAX?

> 
> But I really expect that the encryption of a DAX device will actually
> be a block device setting and won't look like this at all.  It'll be
> more like dm-crypt except without device mapper.

Are you suggesting that we not support MKTME for DAX, or that we add MKTME support to dm-crypt?

Thanks,
-Kai

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  0:05             ` Kai Huang
@ 2019-06-18  0:15               ` Andy Lutomirski
  2019-06-18  1:35                 ` Kai Huang
  2019-06-18 14:13                 ` Dave Hansen
  0 siblings, 2 replies; 153+ messages in thread
From: Andy Lutomirski @ 2019-06-18  0:15 UTC (permalink / raw)
  To: Kai Huang
  Cc: Andy Lutomirski, Dave Hansen, Kirill A. Shutemov, Andrew Morton,
	X86 ML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, David Howells, Kees Cook,
	Jacob Pan, Alison Schofield, Linux-MM, kvm list, keyrings, LKML,
	Tom Lendacky

On Mon, Jun 17, 2019 at 5:05 PM Kai Huang <kai.huang@linux.intel.com> wrote:
>
> On Mon, 2019-06-17 at 12:12 -0700, Andy Lutomirski wrote:
> > On Mon, Jun 17, 2019 at 11:37 AM Dave Hansen <dave.hansen@intel.com> wrote:
> > >
> > > Tom Lendacky, could you take a look down in the message to the talk of
> > > SEV?  I want to make sure I'm not misrepresenting what it does today.
> > > ...
> > >
> > >
> > > > > I actually don't care all that much which one we end up with.  It's not
> > > > > like the extra syscall in the second options means much.
> > > >
> > > > The benefit of the second one is that, if sys_encrypt is absent, it
> > > > just works.  In the first model, programs need a fallback because
> > > > they'll segfault of mprotect_encrypt() gets ENOSYS.
> > >
> > > Well, by the time they get here, they would have already had to allocate
> > > and set up the encryption key.  I don't think this would really be the
> > > "normal" malloc() path, for instance.
> > >
> > > > >  How do we
> > > > > eventually stack it on top of persistent memory filesystems or Device
> > > > > DAX?
> > > >
> > > > How do we stack anonymous memory on top of persistent memory or Device
> > > > DAX?  I'm confused.
> > >
> > > If our interface to MKTME is:
> > >
> > >         fd = open("/dev/mktme");
> > >         ptr = mmap(fd);
> > >
> > > Then it's hard to combine with an interface which is:
> > >
> > >         fd = open("/dev/dax123");
> > >         ptr = mmap(fd);
> > >
> > > Where if we have something like mprotect() (or madvise() or something
> > > else taking pointer), we can just do:
> > >
> > >         fd = open("/dev/anything987");
> > >         ptr = mmap(fd);
> > >         sys_encrypt(ptr);
> >
> > I'm having a hard time imagining that ever working -- wouldn't it blow
> > up if someone did:
> >
> > fd = open("/dev/anything987");
> > ptr1 = mmap(fd);
> > ptr2 = mmap(fd);
> > sys_encrypt(ptr1);
> >
> > So I think it really has to be:
> > fd = open("/dev/anything987");
> > ioctl(fd, ENCRYPT_ME);
> > mmap(fd);
>
> This requires "/dev/anything987" to support ENCRYPT_ME ioctl, right?
>
> So to support NVDIMM (DAX), we need to add ENCRYPT_ME ioctl to DAX?

Yes and yes, or we do it with layers -- see below.

I don't see how we can credibly avoid this.  If we try to do MKTME
behind the DAX driver's back, aren't we going to end up with cache
coherence problems?

>
> >
> > But I really expect that the encryption of a DAX device will actually
> > be a block device setting and won't look like this at all.  It'll be
> > more like dm-crypt except without device mapper.
>
> Are you suggesting not to support MKTME for DAX, or adding MKTME support to dm-crypt?

I'm proposing exposing it by an interface that looks somewhat like
dm-crypt.  Either we could have a way to create a device layered on
top of the DAX devices that exposes a decrypted view or we add a way
to tell the DAX device to kindly use MKTME with such-and-such key.

If there is demand for a way to have an fscrypt-like thing on top of
DAX where different files use different keys, I suppose that could be
done too, but it will need filesystem or VFS help.

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-17 21:36             ` Dave Hansen
@ 2019-06-18  0:48               ` Kai Huang
  2019-06-18  1:50                 ` Andy Lutomirski
  0 siblings, 1 reply; 153+ messages in thread
From: Kai Huang @ 2019-06-18  0:48 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski
  Cc: Kirill A. Shutemov, Andrew Morton, X86 ML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	David Howells, Kees Cook, Jacob Pan, Alison Schofield, Linux-MM,
	kvm list, keyrings, LKML, Tom Lendacky


> 
> > And another silly argument: if we had /dev/mktme, then we could
> > possibly get away with avoiding all the keyring stuff entirely.
> > Instead, you open /dev/mktme and you get your own key under the hook.
> > If you want two keys, you open /dev/mktme twice.  If you want some
> > other program to be able to see your memory, you pass it the fd.
> 
> We still like the keyring because it's one-stop-shopping as the place
> that *owns* the hardware KeyID slots.  Those are global resources and
> scream for a single global place to allocate and manage them.  The
> hardware slots also need to be shared between any anonymous and
> file-based users, no matter what the APIs for the anonymous side.

The MKTME driver (which creates /dev/mktme) can also be the one-stop shop. I think whether to use
the keyring to manage MKTME keys should be based on whether we need to, or should, take advantage
of existing key retention service functionality. For example, with the key retention service we
can revoke/invalidate/set an expiry for a key (although I am not sure whether MKTME needs those),
and we have several keyrings -- thread-specific keyring, process-specific keyring, user-specific
keyring, etc, so we can control who can and cannot find the key. I think managing MKTME keys in the
MKTME driver doesn't have those advantages.

Thanks,
-Kai

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-17 23:59           ` Kai Huang
@ 2019-06-18  1:34             ` Lendacky, Thomas
  2019-06-18  1:40               ` Andy Lutomirski
  0 siblings, 1 reply; 153+ messages in thread
From: Lendacky, Thomas @ 2019-06-18  1:34 UTC (permalink / raw)
  To: Kai Huang, Dave Hansen, Andy Lutomirski
  Cc: Kirill A. Shutemov, Andrew Morton, X86 ML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	David Howells, Kees Cook, Jacob Pan, Alison Schofield, Linux-MM,
	kvm list, keyrings, LKML

On 6/17/19 6:59 PM, Kai Huang wrote:
> On Mon, 2019-06-17 at 11:27 -0700, Dave Hansen wrote:
>> Tom Lendacky, could you take a look down in the message to the talk of
>> SEV?  I want to make sure I'm not misrepresenting what it does today.

Sorry, I'm traveling this week, so responses will be delayed...

>> ...
>>
>>
>>>> I actually don't care all that much which one we end up with.  It's not
>>>> like the extra syscall in the second options means much.
>>>
>>> The benefit of the second one is that, if sys_encrypt is absent, it
>>> just works.  In the first model, programs need a fallback because
>>> they'll segfault of mprotect_encrypt() gets ENOSYS.
>>
>> Well, by the time they get here, they would have already had to allocate
>> and set up the encryption key.  I don't think this would really be the
>> "normal" malloc() path, for instance.
>>
>>>>  How do we
>>>> eventually stack it on top of persistent memory filesystems or Device
>>>> DAX?
>>>
>>> How do we stack anonymous memory on top of persistent memory or Device
>>> DAX?  I'm confused.
>>
>> If our interface to MKTME is:
>>
>> 	fd = open("/dev/mktme");
>> 	ptr = mmap(fd);
>>
>> Then it's hard to combine with an interface which is:
>>
>> 	fd = open("/dev/dax123");
>> 	ptr = mmap(fd);
>>
>> Where if we have something like mprotect() (or madvise() or something
>> else taking pointer), we can just do:
>>
>> 	fd = open("/dev/anything987");
>> 	ptr = mmap(fd);
>> 	sys_encrypt(ptr);
>>
>> Now, we might not *do* it that way for dax, for instance, but I'm just
>> saying that if we go the /dev/mktme route, we never get a choice.
>>
>>> I think that, in the long run, we're going to have to either expand
>>> the core mm's concept of what "memory" is or just have a whole
>>> parallel set of mechanisms for memory that doesn't work like memory.
>>
>> ...
>>> I expect that some day normal memory will  be able to be repurposed as
>>> SGX pages on the fly, and that will also look a lot more like SEV or
>>> XPFO than like the this model of MKTME.
>>
>> I think you're drawing the line at pages where the kernel can manage
>> contents vs. not manage contents.  I'm not sure that's the right
>> distinction to make, though.  The thing that is important is whether the
>> kernel can manage the lifetime and location of the data in the page.
>>
>> Basically: Can the kernel choose where the page comes from and get the
>> page back when it wants?
>>
>> I really don't like the current state of things like with SEV or with
>> KVM direct device assignment where the physical location is quite locked
>> down and the kernel really can't manage the memory.  I'm trying really
>> hard to make sure future hardware is more permissive about such things.
>>  My hope is that these are a temporary blip and not the new normal.
>>
>>> So, if we upstream MKTME as anonymous memory with a magic config
>>> syscall, I predict that, in a few years, it will be end up inheriting
>>> all downsides of both approaches with few of the upsides.  Programs
>>> like QEMU will need to learn to manipulate pages that can't be
>>> accessed outside the VM without special VM buy-in, so the fact that
>>> MKTME pages are fully functional and can be GUP-ed won't be very
>>> useful.  And the VM will learn about all these things, but MKTME won't
>>> really fit in.
>>
>> Kai Huang (who is on cc) has been doing the QEMU enabling and might want
>> to weigh in.  I'd also love to hear from the AMD folks in case I'm not
>> grokking some aspect of SEV.
>>
>> But, my understanding is that, even today, neither QEMU nor the kernel
>> can see SEV-encrypted guest memory.  So QEMU should already understand
>> how to not interact with guest memory.  I _assume_ it's also already
>> doing this with anonymous memory, without needing /dev/sme or something.
> 
> Correct neither Qemu nor kernel can see SEV-encrypted guest memory. Qemu requires guest's
> cooperation when it needs to interacts with guest, i.e. to support virtual DMA (of virtual devices
> in SEV-guest), qemu requires SEV-guest to setup bounce buffer (which will not be SEV-encrypted
> memory, but shared memory can be accessed from host side too), so that guest kernel can copy DMA
> data from bounce buffer to its own SEV-encrypted memory after qemu/host kernel puts DMA data to
> bounce buffer.

That is correct. An SEV guest must use un-encrypted memory if it wishes
for the hypervisor to be able to see it. Also, to support DMA into the
guest, the target memory must be un-encrypted, which SEV does through the
DMA api and SWIOTLB.

SME must also use bounce buffers if the device does not support 48-bit
DMA, since the encryption bit is bit 47. Any device supporting DMA above
the encryption bit position can perform DMA without bounce buffers under
SME.
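
As a rough sketch of that check (the helper names below are hypothetical
and this is not the kernel's actual DMA API code), the question is simply
whether the device's DMA mask covers the encryption bit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SME_ENC_BIT 47	/* encryption bit position, as described above */

/* Mask covering address bits [0, bits - 1], like a driver's DMA mask. */
uint64_t dma_mask_for_bits(unsigned int bits)
{
	return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* A device that cannot drive the encryption bit needs bounce buffers. */
bool sme_needs_bounce(uint64_t dma_mask)
{
	return (dma_mask & (UINT64_C(1) << SME_ENC_BIT)) == 0;
}
```

With these helpers, a 32-bit-only device needs bouncing, while a device
with a 48-bit or 64-bit mask does not.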

> 
> And yes from my reading (better to have AMD guys to confirm) SEV guest uses anonymous memory, but it
> also pins all guest memory (by calling GUP from KVM -- SEV specifically introduced 2 KVM ioctls for
> this purpose), since SEV architecturally cannot support swapping, migraiton of SEV-encrypted guest
> memory, because SME/SEV also uses physical address as "tweak", and there's no way that kernel can
> get or use SEV-guest's memory encryption key. In order to swap/migrate SEV-guest memory, we need SGX
> EPC eviction/reload similar thing, which SEV doesn't have today.

Yes, all the guest memory is currently pinned by calling GUP when creating
an SEV guest. This is to prevent page migration by the kernel. However,
the support to do page migration is available in the 0.17 version of the
SEV API via the COPY command. The COPY command allows for copying of
an encrypted page from one location to another via the PSP. The support
for this is not yet in the Linux kernel, but we are working on it. This
would remove the requirement of having to pin the guest memory.

The SEV API also supports migration of memory via the SEND_* and RECEIVE_*
APIs to support live migration.

Swapping, however, is not yet supported.

Thanks,
Tom

> 
> From this perspective, I think driver proposal kinda makes sense since we already have security
> feature which uses normal memory some kind like "device memory" (no swap, no migration, etc), so it
> makes sense that MKTME just follows that (although from HW MKTME can support swap, page migration,
> etc). The downside of driver proposal for MKTME I think is, like Dave mentioned, it's hard (or not
> sure whether it is possible) to extend to support NVDIMM (and file backed guest memory), since for
> virtual NVDIMM, Qemu needs to call mmap against fd of NVDIMM.
> 
> Thanks,
> -Kai
> 

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  0:15               ` Andy Lutomirski
@ 2019-06-18  1:35                 ` Kai Huang
  2019-06-18  1:43                   ` Andy Lutomirski
  2019-06-18 14:13                 ` Dave Hansen
  1 sibling, 1 reply; 153+ messages in thread
From: Kai Huang @ 2019-06-18  1:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Kirill A. Shutemov, Andrew Morton, X86 ML,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	Peter Zijlstra, David Howells, Kees Cook, Jacob Pan,
	Alison Schofield, Linux-MM, kvm list, keyrings, LKML,
	Tom Lendacky


> > > 
> > > I'm having a hard time imagining that ever working -- wouldn't it blow
> > > up if someone did:
> > > 
> > > fd = open("/dev/anything987");
> > > ptr1 = mmap(fd);
> > > ptr2 = mmap(fd);
> > > sys_encrypt(ptr1);
> > > 
> > > So I think it really has to be:
> > > fd = open("/dev/anything987");
> > > ioctl(fd, ENCRYPT_ME);
> > > mmap(fd);
> > 
> > This requires "/dev/anything987" to support ENCRYPT_ME ioctl, right?
> > 
> > So to support NVDIMM (DAX), we need to add ENCRYPT_ME ioctl to DAX?
> 
> Yes and yes, or we do it with layers -- see below.
> 
> I don't see how we can credibly avoid this.  If we try to do MKTME
> behind the DAX driver's back, aren't we going to end up with cache
> coherence problems?

I am not sure whether I understand correctly, but how is the cache coherence problem related to which
layer the MKTME concept is put in? To make MKTME work with DAX/NVDIMM, I think that no matter which layer
the MKTME concept resides in, eventually we need to put the keyID into the PTE that maps the NVDIMM, and
the kernel needs to manage cache coherence for NVDIMM just as for normal memory, as shown in this series?

Thanks,
-Kai


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  1:34             ` Lendacky, Thomas
@ 2019-06-18  1:40               ` Andy Lutomirski
  2019-06-18  2:02                 ` Lendacky, Thomas
  2019-06-18  4:19                 ` Andy Lutomirski
  0 siblings, 2 replies; 153+ messages in thread
From: Andy Lutomirski @ 2019-06-18  1:40 UTC (permalink / raw)
  To: Lendacky, Thomas
  Cc: Kai Huang, Dave Hansen, Andy Lutomirski, Kirill A. Shutemov,
	Andrew Morton, X86 ML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Peter Zijlstra, David Howells,
	Kees Cook, Jacob Pan, Alison Schofield, Linux-MM, kvm list,
	keyrings, LKML

On Mon, Jun 17, 2019 at 6:34 PM Lendacky, Thomas
<Thomas.Lendacky@amd.com> wrote:
>
> On 6/17/19 6:59 PM, Kai Huang wrote:
> > On Mon, 2019-06-17 at 11:27 -0700, Dave Hansen wrote:

> >
> > And yes from my reading (better to have AMD guys to confirm) SEV guest uses anonymous memory, but it
> > also pins all guest memory (by calling GUP from KVM -- SEV specifically introduced 2 KVM ioctls for
> > this purpose), since SEV architecturally cannot support swapping, migration of SEV-encrypted guest
> > memory, because SME/SEV also uses physical address as "tweak", and there's no way that kernel can
> > get or use SEV-guest's memory encryption key. In order to swap/migrate SEV-guest memory, we need SGX
> > EPC eviction/reload similar thing, which SEV doesn't have today.
>
> Yes, all the guest memory is currently pinned by calling GUP when creating
> an SEV guest.

Ick.

What happens if QEMU tries to read the memory?  Does it just see
ciphertext?  Is cache coherency lost if QEMU writes it?


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  1:35                 ` Kai Huang
@ 2019-06-18  1:43                   ` Andy Lutomirski
  2019-06-18  2:23                     ` Kai Huang
  0 siblings, 1 reply; 153+ messages in thread
From: Andy Lutomirski @ 2019-06-18  1:43 UTC (permalink / raw)
  To: Kai Huang
  Cc: Andy Lutomirski, Dave Hansen, Kirill A. Shutemov, Andrew Morton,
	X86 ML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, David Howells, Kees Cook,
	Jacob Pan, Alison Schofield, Linux-MM, kvm list, keyrings, LKML,
	Tom Lendacky

On Mon, Jun 17, 2019 at 6:35 PM Kai Huang <kai.huang@linux.intel.com> wrote:
>
>
> > > >
> > > > I'm having a hard time imagining that ever working -- wouldn't it blow
> > > > up if someone did:
> > > >
> > > > fd = open("/dev/anything987");
> > > > ptr1 = mmap(fd);
> > > > ptr2 = mmap(fd);
> > > > sys_encrypt(ptr1);
> > > >
> > > > So I think it really has to be:
> > > > fd = open("/dev/anything987");
> > > > ioctl(fd, ENCRYPT_ME);
> > > > mmap(fd);
> > >
> > > This requires "/dev/anything987" to support ENCRYPT_ME ioctl, right?
> > >
> > > So to support NVDIMM (DAX), we need to add ENCRYPT_ME ioctl to DAX?
> >
> > Yes and yes, or we do it with layers -- see below.
> >
> > I don't see how we can credibly avoid this.  If we try to do MKTME
> > behind the DAX driver's back, aren't we going to end up with cache
> > coherence problems?
>
> I am not sure whether I understand correctly, but how is the cache coherence problem related to which
> layer the MKTME concept is put in? To make MKTME work with DAX/NVDIMM, I think that no matter which layer
> the MKTME concept resides in, eventually we need to put the keyID into the PTE that maps the NVDIMM, and
> the kernel needs to manage cache coherence for NVDIMM just as for normal memory, as shown in this series?
>

What I mean is that, to avoid cache coherence problems, something has to
prevent user code from mapping the same page with two different key
ids.  If the entire MKTME mechanism purely layers on top of DAX,
something needs to prevent the underlying DAX device from being mapped
at the same time as the MKTME-decrypted view.  This is obviously
doable, but it's not automatic.
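
A minimal sketch of that invariant, assuming a hypothetical per-file
object that records one KeyID and a count of live mappings.  None of
these names (toy_file, toy_encrypt_me, toy_mmap, toy_munmap) come from
the patch set or from the DAX driver; the only point is that binding a
key is refused while any mapping exists, so two views with different
KeyIDs can never coexist:

```c
#include <errno.h>

/* Assumption for this sketch: KeyID 0 stands for the plaintext view. */
#define TOY_KEYID_NONE 0

/* Hypothetical stand-in for the object behind the fd; not a kernel type. */
struct toy_file {
	int keyid;        /* the single KeyID every mapping must use */
	int nr_mappings;  /* number of live mappings of this file */
};

/*
 * ENCRYPT_ME-style operation: (re)bind the file to a KeyID.
 * Refused while the file is mapped, because an existing mapping would
 * otherwise keep using the old KeyID next to new mappings with the new one.
 */
int toy_encrypt_me(struct toy_file *f, int keyid)
{
	if (f->nr_mappings > 0)
		return -EBUSY;
	f->keyid = keyid;
	return 0;
}

/* Every mapping inherits the file's KeyID; returns the KeyID it will use. */
int toy_mmap(struct toy_file *f)
{
	f->nr_mappings++;
	return f->keyid;
}

/* Drop one mapping. */
void toy_munmap(struct toy_file *f)
{
	if (f->nr_mappings > 0)
		f->nr_mappings--;
}
```

With this ordering, the ptr1/ptr2 sequence quoted above cannot end up
with two different KeyIDs: both mappings inherit whatever the file was
bound to, and a late rebind fails with -EBUSY.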


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  0:48               ` Kai Huang
@ 2019-06-18  1:50                 ` Andy Lutomirski
  2019-06-18  2:11                   ` Kai Huang
  2019-06-18 14:19                   ` Dave Hansen
  0 siblings, 2 replies; 153+ messages in thread
From: Andy Lutomirski @ 2019-06-18  1:50 UTC (permalink / raw)
  To: Kai Huang
  Cc: Dave Hansen, Andy Lutomirski, Kirill A. Shutemov, Andrew Morton,
	X86 ML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, David Howells, Kees Cook,
	Jacob Pan, Alison Schofield, Linux-MM, kvm list, keyrings, LKML,
	Tom Lendacky

On Mon, Jun 17, 2019 at 5:48 PM Kai Huang <kai.huang@linux.intel.com> wrote:
>
>
> >
> > > And another silly argument: if we had /dev/mktme, then we could
> > > possibly get away with avoiding all the keyring stuff entirely.
> > > Instead, you open /dev/mktme and you get your own key under the hood.
> > > If you want two keys, you open /dev/mktme twice.  If you want some
> > > other program to be able to see your memory, you pass it the fd.
> >
> > We still like the keyring because it's one-stop-shopping as the place
> > that *owns* the hardware KeyID slots.  Those are global resources and
> > scream for a single global place to allocate and manage them.  The
> > hardware slots also need to be shared between any anonymous and
> > file-based users, no matter what the APIs for the anonymous side.
>
> MKTME driver (who creates /dev/mktme) can also be the one-stop-shopping. I think whether to choose
> keyring to manage MKTME key should be based on whether we need/should take advantage of existing key
> retention service functionalities. For example, with key retention service we can
> revoke/invalidate/set expiry for a key (not sure whether MKTME needs those although), and we have
> several keyrings -- thread specific keyring, process specific keyring, user specific keyring, etc,
> thus we can control who can/cannot find the key, etc. I think managing MKTME key in MKTME driver
> doesn't have those advantages.
>

Trying to evaluate this with the current proposed code is a bit odd, I
think.  Suppose you create a thread-specific key and then fork().  The
child can presumably still use the key regardless of whether the child
can nominally access the key in the keyring because the PTEs are still
there.

More fundamentally, in some sense, the current code has no semantics.
Associating a key with memory and "encrypting" it doesn't actually do
anything unless you are attacking the memory bus but you haven't
compromised the kernel.  There's no protection against a guest that
can corrupt its EPT tables, there's no protection against kernel bugs
(*especially* if the duplicate direct map design stays), and there
isn't even any fd or other object around by which you can only access
the data if you can see the key.

I'm also wondering whether the kernel will always be able to be a
one-stop shop for key allocation -- if the MKTME hardware gains
interesting new uses down the road, who knows how key allocation will
work?


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  1:40               ` Andy Lutomirski
@ 2019-06-18  2:02                 ` Lendacky, Thomas
  2019-06-18  4:19                 ` Andy Lutomirski
  1 sibling, 0 replies; 153+ messages in thread
From: Lendacky, Thomas @ 2019-06-18  2:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kai Huang, Dave Hansen, Kirill A. Shutemov, Andrew Morton,
	X86 ML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, David Howells, Kees Cook,
	Jacob Pan, Alison Schofield, Linux-MM, kvm list, keyrings, LKML

On 6/17/19 8:40 PM, Andy Lutomirski wrote:
> On Mon, Jun 17, 2019 at 6:34 PM Lendacky, Thomas
> <Thomas.Lendacky@amd.com> wrote:
>>
>> On 6/17/19 6:59 PM, Kai Huang wrote:
>>> On Mon, 2019-06-17 at 11:27 -0700, Dave Hansen wrote:
> 
>>>
>>> And yes from my reading (better to have AMD guys to confirm) SEV guest uses anonymous memory, but it
>>> also pins all guest memory (by calling GUP from KVM -- SEV specifically introduced 2 KVM ioctls for
>>> this purpose), since SEV architecturally cannot support swapping, migration of SEV-encrypted guest
>>> memory, because SME/SEV also uses physical address as "tweak", and there's no way that kernel can
>>> get or use SEV-guest's memory encryption key. In order to swap/migrate SEV-guest memory, we need SGX
>>> EPC eviction/reload similar thing, which SEV doesn't have today.
>>
>> Yes, all the guest memory is currently pinned by calling GUP when creating
>> an SEV guest.
> 
> Ick.
> 
> What happens if QEMU tries to read the memory?  Does it just see
> ciphertext?  Is cache coherency lost if QEMU writes it?

If QEMU tries to read the memory, it would just see ciphertext. I'll
double check on the write situation, but I think you would end up with
a cache coherency issue because the write by QEMU would be with the
hypervisor key and tagged separately in the cache from the guest cache
entry. SEV provides confidentiality of guest memory from the hypervisor;
it doesn't prevent the hypervisor from trashing the guest memory.


Thanks,
Tom

> 


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  1:50                 ` Andy Lutomirski
@ 2019-06-18  2:11                   ` Kai Huang
  2019-06-18  4:24                     ` Andy Lutomirski
  2019-06-18 14:19                   ` Dave Hansen
  1 sibling, 1 reply; 153+ messages in thread
From: Kai Huang @ 2019-06-18  2:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Kirill A. Shutemov, Andrew Morton, X86 ML,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	Peter Zijlstra, David Howells, Kees Cook, Jacob Pan,
	Alison Schofield, Linux-MM, kvm list, keyrings, LKML,
	Tom Lendacky

On Mon, 2019-06-17 at 18:50 -0700, Andy Lutomirski wrote:
> On Mon, Jun 17, 2019 at 5:48 PM Kai Huang <kai.huang@linux.intel.com> wrote:
> > 
> > 
> > > 
> > > > And another silly argument: if we had /dev/mktme, then we could
> > > > possibly get away with avoiding all the keyring stuff entirely.
> > > > Instead, you open /dev/mktme and you get your own key under the hood.
> > > > If you want two keys, you open /dev/mktme twice.  If you want some
> > > > other program to be able to see your memory, you pass it the fd.
> > > 
> > > We still like the keyring because it's one-stop-shopping as the place
> > > that *owns* the hardware KeyID slots.  Those are global resources and
> > > scream for a single global place to allocate and manage them.  The
> > > hardware slots also need to be shared between any anonymous and
> > > file-based users, no matter what the APIs for the anonymous side.
> > 
> > MKTME driver (who creates /dev/mktme) can also be the one-stop-shopping. I think whether to choose
> > keyring to manage MKTME key should be based on whether we need/should take advantage of existing key
> > retention service functionalities. For example, with key retention service we can
> > revoke/invalidate/set expiry for a key (not sure whether MKTME needs those although), and we have
> > several keyrings -- thread specific keyring, process specific keyring, user specific keyring, etc,
> > thus we can control who can/cannot find the key, etc. I think managing MKTME key in MKTME driver
> > doesn't have those advantages.
> > 
> 
> Trying to evaluate this with the current proposed code is a bit odd, I
> think.  Suppose you create a thread-specific key and then fork().  The
> child can presumably still use the key regardless of whether the child
> can nominally access the key in the keyring because the PTEs are still
> there.

Right. This is a little bit odd, although virtualization (Qemu, which is the main use case of MKTME
at least so far) doesn't use fork().

> 
> More fundamentally, in some sense, the current code has no semantics.
> Associating a key with memory and "encrypting" it doesn't actually do
> anything unless you are attacking the memory bus but you haven't
> compromised the kernel.  There's no protection against a guest that
> can corrupt its EPT tables, there's no protection against kernel bugs
> (*especially* if the duplicate direct map design stays), and there
> isn't even any fd or other object around by which you can only access
> the data if you can see the key.

I am not saying that managing MKTME keys/keyIDs in the key retention service is definitely better, but
it seems none of the points you mentioned is related to whether we choose the key retention service to
manage them? Or are you saying it doesn't matter whether we manage keys/keyIDs in the key retention
service or in the MKTME driver, since MKTME barely has any security benefit (besides against physical attack)?

> 
> I'm also wondering whether the kernel will always be able to be a
> one-stop shop for key allocation -- if the MKTME hardware gains
> interesting new uses down the road, who knows how key allocation will
> work?

So far I don't have any use case that requires managing a key/keyID specifically for its own use,
rather than letting the kernel manage keyID allocation. Please share if you have any potential use case in mind.

Thanks,
-Kai


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  1:43                   ` Andy Lutomirski
@ 2019-06-18  2:23                     ` Kai Huang
  2019-06-18  9:12                       ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Kai Huang @ 2019-06-18  2:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Kirill A. Shutemov, Andrew Morton, X86 ML,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	Peter Zijlstra, David Howells, Kees Cook, Jacob Pan,
	Alison Schofield, Linux-MM, kvm list, keyrings, LKML,
	Tom Lendacky

On Mon, 2019-06-17 at 18:43 -0700, Andy Lutomirski wrote:
> On Mon, Jun 17, 2019 at 6:35 PM Kai Huang <kai.huang@linux.intel.com> wrote:
> > 
> > 
> > > > > 
> > > > > I'm having a hard time imagining that ever working -- wouldn't it blow
> > > > > up if someone did:
> > > > > 
> > > > > fd = open("/dev/anything987");
> > > > > ptr1 = mmap(fd);
> > > > > ptr2 = mmap(fd);
> > > > > sys_encrypt(ptr1);
> > > > > 
> > > > > So I think it really has to be:
> > > > > fd = open("/dev/anything987");
> > > > > ioctl(fd, ENCRYPT_ME);
> > > > > mmap(fd);
> > > > 
> > > > This requires "/dev/anything987" to support ENCRYPT_ME ioctl, right?
> > > > 
> > > > So to support NVDIMM (DAX), we need to add ENCRYPT_ME ioctl to DAX?
> > > 
> > > Yes and yes, or we do it with layers -- see below.
> > > 
> > > I don't see how we can credibly avoid this.  If we try to do MKTME
> > > behind the DAX driver's back, aren't we going to end up with cache
> > > coherence problems?
> > 
> > I am not sure whether I understand correctly, but how is the cache coherence problem related to which
> > layer the MKTME concept is put in? To make MKTME work with DAX/NVDIMM, I think that no matter which layer
> > the MKTME concept resides in, eventually we need to put the keyID into the PTE that maps the NVDIMM, and
> > the kernel needs to manage cache coherence for NVDIMM just as for normal memory, as shown in this series?
> > 
> 
> What I mean is that, to avoid cache coherence problems, something has to
> prevent user code from mapping the same page with two different key
> ids.  If the entire MKTME mechanism purely layers on top of DAX,
> something needs to prevent the underlying DAX device from being mapped
> at the same time as the MKTME-decrypted view.  This is obviously
> doable, but it's not automatic.

Assuming I am understanding the context correctly, yes from this perspective it seems having
sys_encrypt is annoying, and having ENCRYPT_ME should be better. But Dave said "nobody is going to
do what you suggest in the ptr1/ptr2 example"? 

Thanks,
-Kai


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  1:40               ` Andy Lutomirski
  2019-06-18  2:02                 ` Lendacky, Thomas
@ 2019-06-18  4:19                 ` Andy Lutomirski
  1 sibling, 0 replies; 153+ messages in thread
From: Andy Lutomirski @ 2019-06-18  4:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Lendacky, Thomas, Kai Huang, Dave Hansen, Kirill A. Shutemov,
	Andrew Morton, X86 ML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Peter Zijlstra, David Howells,
	Kees Cook, Jacob Pan, Alison Schofield, Linux-MM, kvm list,
	keyrings, LKML

On Mon, Jun 17, 2019 at 6:40 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Mon, Jun 17, 2019 at 6:34 PM Lendacky, Thomas
> <Thomas.Lendacky@amd.com> wrote:
> >
> > On 6/17/19 6:59 PM, Kai Huang wrote:
> > > On Mon, 2019-06-17 at 11:27 -0700, Dave Hansen wrote:
>
> > >
> > > And yes from my reading (better to have AMD guys to confirm) SEV guest uses anonymous memory, but it
> > > also pins all guest memory (by calling GUP from KVM -- SEV specifically introduced 2 KVM ioctls for
> > > > this purpose), since SEV architecturally cannot support swapping, migration of SEV-encrypted guest
> > > memory, because SME/SEV also uses physical address as "tweak", and there's no way that kernel can
> > > get or use SEV-guest's memory encryption key. In order to swap/migrate SEV-guest memory, we need SGX
> > > EPC eviction/reload similar thing, which SEV doesn't have today.
> >
> > Yes, all the guest memory is currently pinned by calling GUP when creating
> > an SEV guest.
>
> Ick.
>
> What happens if QEMU tries to read the memory?  Does it just see
> ciphertext?  Is cache coherency lost if QEMU writes it?

I should add: is the current interface that SEV uses actually good, or
should the kernel try to do something differently?  I've spent exactly
zero time looking at SEV APIs or at how QEMU manages its memory.


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  2:11                   ` Kai Huang
@ 2019-06-18  4:24                     ` Andy Lutomirski
  0 siblings, 0 replies; 153+ messages in thread
From: Andy Lutomirski @ 2019-06-18  4:24 UTC (permalink / raw)
  To: Kai Huang
  Cc: Andy Lutomirski, Dave Hansen, Kirill A. Shutemov, Andrew Morton,
	X86 ML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, David Howells, Kees Cook,
	Jacob Pan, Alison Schofield, Linux-MM, kvm list, keyrings, LKML,
	Tom Lendacky

On Mon, Jun 17, 2019 at 7:11 PM Kai Huang <kai.huang@linux.intel.com> wrote:
>
> On Mon, 2019-06-17 at 18:50 -0700, Andy Lutomirski wrote:
> > On Mon, Jun 17, 2019 at 5:48 PM Kai Huang <kai.huang@linux.intel.com> wrote:
> > >
> > >
> > > >
> > > > > And another silly argument: if we had /dev/mktme, then we could
> > > > > possibly get away with avoiding all the keyring stuff entirely.
> > > > > Instead, you open /dev/mktme and you get your own key under the hood.
> > > > > If you want two keys, you open /dev/mktme twice.  If you want some
> > > > > other program to be able to see your memory, you pass it the fd.
> > > >
> > > > We still like the keyring because it's one-stop-shopping as the place
> > > > that *owns* the hardware KeyID slots.  Those are global resources and
> > > > scream for a single global place to allocate and manage them.  The
> > > > hardware slots also need to be shared between any anonymous and
> > > > file-based users, no matter what the APIs for the anonymous side.
> > >
> > > MKTME driver (who creates /dev/mktme) can also be the one-stop-shopping. I think whether to choose
> > > keyring to manage MKTME key should be based on whether we need/should take advantage of existing key
> > > retention service functionalities. For example, with key retention service we can
> > > revoke/invalidate/set expiry for a key (not sure whether MKTME needs those although), and we have
> > > several keyrings -- thread specific keyring, process specific keyring, user specific keyring, etc,
> > > thus we can control who can/cannot find the key, etc. I think managing MKTME key in MKTME driver
> > > doesn't have those advantages.
> > >
> >
> > Trying to evaluate this with the current proposed code is a bit odd, I
> > think.  Suppose you create a thread-specific key and then fork().  The
> > child can presumably still use the key regardless of whether the child
> > can nominally access the key in the keyring because the PTEs are still
> > there.
>
> Right. This is a little bit odd, although virtualization (Qemu, which is the main use case of MKTME
> at least so far) doesn't use fork().
>
> >
> > More fundamentally, in some sense, the current code has no semantics.
> > Associating a key with memory and "encrypting" it doesn't actually do
> > anything unless you are attacking the memory bus but you haven't
> > compromised the kernel.  There's no protection against a guest that
> > can corrupt its EPT tables, there's no protection against kernel bugs
> > (*especially* if the duplicate direct map design stays), and there
> > isn't even any fd or other object around by which you can only access
> > the data if you can see the key.
>
> I am not saying that managing MKTME keys/keyIDs in the key retention service is definitely better, but
> it seems none of the points you mentioned is related to whether we choose the key retention service to
> manage them? Or are you saying it doesn't matter whether we manage keys/keyIDs in the key retention
> service or in the MKTME driver, since MKTME barely has any security benefit (besides against physical attack)?

Mostly the latter.  I think it's very hard to evaluate whether a given
key allocation model makes sense given that MKTME provides such weak
security benefits.  TME has obvious security benefits, as does
encryption of persistent memory, but this giant patch set isn't needed
for plain TME and it doesn't help with persistent memory.


>
> >
> > I'm also wondering whether the kernel will always be able to be a
> > one-stop shop for key allocation -- if the MKTME hardware gains
> > interesting new uses down the road, who knows how key allocation will
> > work?
>
> So far I don't have any use case that requires managing a key/keyID specifically for its own use,
> rather than letting the kernel manage keyID allocation. Please share if you have any potential use case in mind.
>

Other than compliance, I can't think of much reason that using
multiple keys is useful, regardless of how they're allocated.  The only
thing I've thought of is that, with multiple keys, you can use PCONFIG
to remove one and flush caches and the data is most definitely gone.
On the other hand, you can just zero the memory and the data is just
as gone even without any encryption.


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  2:23                     ` Kai Huang
@ 2019-06-18  9:12                       ` Peter Zijlstra
  2019-06-18 14:09                         ` Dave Hansen
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2019-06-18  9:12 UTC (permalink / raw)
  To: Kai Huang
  Cc: Andy Lutomirski, Dave Hansen, Kirill A. Shutemov, Andrew Morton,
	X86 ML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, David Howells, Kees Cook, Jacob Pan,
	Alison Schofield, Linux-MM, kvm list, keyrings, LKML,
	Tom Lendacky

On Tue, Jun 18, 2019 at 02:23:31PM +1200, Kai Huang wrote:
> Assuming I am understanding the context correctly, yes from this perspective it seems having
> sys_encrypt is annoying, and having ENCRYPT_ME should be better. But Dave said "nobody is going to
> do what you suggest in the ptr1/ptr2 example"? 

You have to phrase that as: 'nobody who knows what he's doing is going
to do that', which leaves lots of people and fuzzers.

Murphy states that if it is possible, someone _will_ do it. And this
being something that causes severe data corruption on persistent
storage,...


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  9:12                       ` Peter Zijlstra
@ 2019-06-18 14:09                         ` Dave Hansen
  2019-06-18 16:15                           ` Kirill A. Shutemov
  0 siblings, 1 reply; 153+ messages in thread
From: Dave Hansen @ 2019-06-18 14:09 UTC (permalink / raw)
  To: Peter Zijlstra, Kai Huang
  Cc: Andy Lutomirski, Kirill A. Shutemov, Andrew Morton, X86 ML,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	David Howells, Kees Cook, Jacob Pan, Alison Schofield, Linux-MM,
	kvm list, keyrings, LKML, Tom Lendacky

On 6/18/19 2:12 AM, Peter Zijlstra wrote:
> On Tue, Jun 18, 2019 at 02:23:31PM +1200, Kai Huang wrote:
>> Assuming I am understanding the context correctly, yes from this perspective it seems having
>> sys_encrypt is annoying, and having ENCRYPT_ME should be better. But Dave said "nobody is going to
>> do what you suggest in the ptr1/ptr2 example"? 
> 
> You have to phrase that as: 'nobody who knows what he's doing is going
> to do that', which leaves lots of people and fuzzers.
> 
> Murphy states that if it is possible, someone _will_ do it. And this
> being something that causes severe data corruption on persistent
> storage,...

I actually think it's not a big deal at all to avoid the corruption that
would occur if it were allowed.  But, if you're even asking to map the
same data with two different keys, you're *asking* for data corruption.
What we're doing here is continuing to preserve cache coherency and
ensuring an early failure.

We'd need two rules:
1. A page must not be faulted into a VMA if the page's page_keyid()
   is not consistent with the VMA's
2. Upon changing the VMA's KeyID, all underlying PTEs must either be
   checked or zapped.

If the rules are broken, we SIGBUS.  Andy's suggestion has the same
basic requirements.  But, with his scheme, the error can be to the
ioctl() instead of in the form of a SIGBUS.  I guess that makes the
fuzzers' lives a bit easier.

BTW, note that we don't have any problems with the current anonymous
implementation and fork() because we zap at the encryption syscall.
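
The two rules above can be sketched as a toy model.  This is only an
illustration under stated assumptions: toy_page, toy_vma, toy_fault()
and toy_set_keyid() are hypothetical names, not code from this series,
and -EFAULT merely marks the spot where the real fault path would
deliver SIGBUS.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_PTES 8

/* Hypothetical stand-in for struct page; not a kernel type. */
struct toy_page {
	int keyid;  /* KeyID the page contents are encrypted with */
};

/* Hypothetical stand-in for a VMA plus its page table entries. */
struct toy_vma {
	int keyid;                            /* KeyID this VMA maps with */
	struct toy_page *ptes[TOY_MAX_PTES];  /* NULL means "not present" */
};

/*
 * Rule 1: a page must not be faulted into a VMA whose KeyID differs
 * from the page's.  Returns 0 on success, -EFAULT where the real
 * kernel would raise SIGBUS, -EINVAL for an out-of-range index.
 */
int toy_fault(struct toy_vma *vma, size_t idx, struct toy_page *page)
{
	if (idx >= TOY_MAX_PTES)
		return -EINVAL;
	if (page->keyid != vma->keyid)
		return -EFAULT;
	vma->ptes[idx] = page;
	return 0;
}

/*
 * Rule 2: on a KeyID change, zap every present PTE so the next access
 * goes back through toy_fault() and is checked against the new KeyID.
 * Returns the number of PTEs zapped.
 */
int toy_set_keyid(struct toy_vma *vma, int keyid)
{
	int zapped = 0;

	for (size_t i = 0; i < TOY_MAX_PTES; i++) {
		if (vma->ptes[i]) {
			vma->ptes[i] = NULL;
			zapped++;
		}
	}
	vma->keyid = keyid;
	return zapped;
}
```

Zapping rather than checking keeps the sketch short; the "checked"
alternative would walk the same loop and fail the KeyID change if any
present page disagreed.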


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  0:15               ` Andy Lutomirski
  2019-06-18  1:35                 ` Kai Huang
@ 2019-06-18 14:13                 ` Dave Hansen
  1 sibling, 0 replies; 153+ messages in thread
From: Dave Hansen @ 2019-06-18 14:13 UTC (permalink / raw)
  To: Andy Lutomirski, Kai Huang
  Cc: Kirill A. Shutemov, Andrew Morton, X86 ML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	David Howells, Kees Cook, Jacob Pan, Alison Schofield, Linux-MM,
	kvm list, keyrings, LKML, Tom Lendacky

On 6/17/19 5:15 PM, Andy Lutomirski wrote:
>>> But I really expect that the encryption of a DAX device will actually
>>> be a block device setting and won't look like this at all.  It'll be
>>> more like dm-crypt except without device mapper.
>> Are you suggesting not to support MKTME for DAX, or adding MKTME support to dm-crypt?
> I'm proposing exposing it by an interface that looks somewhat like
> dm-crypt.  Either we could have a way to create a device layered on
> top of the DAX devices that exposes a decrypted view or we add a way
> to tell the DAX device to kindly use MKTME with such-and-such key.

I think this basically implies that we need to settle (or at least
present) on an interface for storage (FS-DAX, Device DAX, page cache)
before we merge one for anonymous memory.

That sounds like a reasonable exercise.


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18  1:50                 ` Andy Lutomirski
  2019-06-18  2:11                   ` Kai Huang
@ 2019-06-18 14:19                   ` Dave Hansen
  1 sibling, 0 replies; 153+ messages in thread
From: Dave Hansen @ 2019-06-18 14:19 UTC (permalink / raw)
  To: Andy Lutomirski, Kai Huang
  Cc: Kirill A. Shutemov, Andrew Morton, X86 ML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	David Howells, Kees Cook, Jacob Pan, Alison Schofield, Linux-MM,
	kvm list, keyrings, LKML, Tom Lendacky

On 6/17/19 6:50 PM, Andy Lutomirski wrote:
> I'm also wondering whether the kernel will always be able to be a
> one-stop shop for key allocation -- if the MKTME hardware gains
> interesting new uses down the road, who knows how key allocation will
> work?

I can't share all the details on LKML, of course, but I can at least say
that this model of allocating KeyID slots will continue to be used for a
number of generations.



* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18 14:09                         ` Dave Hansen
@ 2019-06-18 16:15                           ` Kirill A. Shutemov
  2019-06-18 16:22                             ` Dave Hansen
  0 siblings, 1 reply; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-06-18 16:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Peter Zijlstra, Kai Huang, Andy Lutomirski, Kirill A. Shutemov,
	Andrew Morton, X86 ML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, David Howells, Kees Cook,
	Jacob Pan, Alison Schofield, Linux-MM, kvm list, keyrings, LKML,
	Tom Lendacky

On Tue, Jun 18, 2019 at 07:09:36AM -0700, Dave Hansen wrote:
> On 6/18/19 2:12 AM, Peter Zijlstra wrote:
> > On Tue, Jun 18, 2019 at 02:23:31PM +1200, Kai Huang wrote:
> >> Assuming I am understanding the context correctly, yes from this perspective it seems having
> >> sys_encrypt is annoying, and having ENCRYPT_ME should be better. But Dave said "nobody is going to
> >> do what you suggest in the ptr1/ptr2 example"? 
> > 
> > You have to phrase that as: 'nobody who knows what he's doing is going
> > to do that', which leaves lots of people and fuzzers.
> > 
> > Murphy states that if it is possible, someone _will_ do it. And this
> > being something that causes severe data corruption on persistent
> > storage,...
> 
> I actually think it's not a big deal at all to avoid the corruption that
> would occur if it were allowed.  But, if you're even asking to map the
> same data with two different keys, you're *asking* for data corruption.
> > What we're doing here is continuing to preserve cache coherency and
> ensuring an early failure.
> 
> We'd need two rules:
> 1. A page must not be faulted into a VMA if the page's page_keyid()
>    is not consistent with the VMA's
> 2. Upon changing the VMA's KeyID, all underlying PTEs must either be
>    checked or zapped.
> 
> If the rules are broken, we SIGBUS.  Andy's suggestion has the same
> basic requirements.  But, with his scheme, the error can be to the
> ioctl() instead of in the form of a SIGBUS.  I guess that makes the
> fuzzers' lives a bit easier.

I see a problem with the scheme: if we don't have a way to decide whether the
key is right for the file, a user without access to the right key is able to
prevent a legitimate user from accessing the file. An attacker just needs read
access to the encrypted file to prevent any legitimate user from accessing it.

The problem applies to ioctl() too.

To make sense of it we must have a way to distinguish the right key from
the wrong one. I don't see an obvious solution with the current hardware design.

-- 
 Kirill A. Shutemov

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18 16:15                           ` Kirill A. Shutemov
@ 2019-06-18 16:22                             ` Dave Hansen
  2019-06-18 16:36                               ` Andy Lutomirski
  0 siblings, 1 reply; 153+ messages in thread
From: Dave Hansen @ 2019-06-18 16:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Peter Zijlstra, Kai Huang, Andy Lutomirski, Kirill A. Shutemov,
	Andrew Morton, X86 ML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, David Howells, Kees Cook,
	Jacob Pan, Alison Schofield, Linux-MM, kvm list, keyrings, LKML,
	Tom Lendacky

On 6/18/19 9:15 AM, Kirill A. Shutemov wrote:
>> We'd need two rules:
>> 1. A page must not be faulted into a VMA if the page's page_keyid()
>>    is not consistent with the VMA's
>> 2. Upon changing the VMA's KeyID, all underlying PTEs must either be
>>    checked or zapped.
>>
>> If the rules are broken, we SIGBUS.  Andy's suggestion has the same
>> basic requirements.  But, with his scheme, the error can be to the
>> ioctl() instead of in the form of a SIGBUS.  I guess that makes the
>> fuzzers' lives a bit easier.
> I see a problem with the scheme: if we don't have a way to decide whether the
> key is right for the file, a user without access to the right key is able to
> prevent a legitimate user from accessing the file. An attacker just needs read
> access to the encrypted file to prevent any legitimate user from accessing it.

I think you're bringing up a separate issue.

We were talking about how you resolve a conflict when someone attempts
to use two *different* keyids to decrypt the data in the API and what
the resulting API interaction looks like.

You're describing the situation where one of those is the wrong *key*
(not keyid).  That's a subtly different scenario and requires different
handling (or no handling IMNHO).
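
To make the distinction concrete, here is a toy userspace model of the two
rules quoted above. It is only a sketch: struct toy_page, struct toy_vma,
toy_fault_in() and toy_set_keyid() are hypothetical names invented for
illustration and are not part of the patchset or of any kernel API. Where the
model returns false, the thread proposes that the kernel deliver SIGBUS (or,
in Andy's scheme, fail the ioctl()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PTES 8

/* Hypothetical stand-in for a page and the KeyID it was allocated with. */
struct toy_page {
	int keyid;
};

/* Hypothetical stand-in for a VMA: one KeyID plus a tiny "page table". */
struct toy_vma {
	int keyid;
	struct toy_page *pte[TOY_MAX_PTES];	/* NULL means not mapped */
};

/*
 * Rule 1: a page must not be faulted into a VMA if its KeyID is not
 * consistent with the VMA's KeyID.
 */
bool toy_fault_in(struct toy_vma *vma, size_t idx, struct toy_page *page)
{
	assert(idx < TOY_MAX_PTES);

	if (page->keyid != vma->keyid)
		return false;	/* early failure instead of a bad mapping */

	vma->pte[idx] = page;
	return true;
}

/*
 * Rule 2: when the VMA's KeyID changes, every underlying PTE is checked
 * and zapped if it no longer matches. Returns the number of PTEs zapped.
 */
size_t toy_set_keyid(struct toy_vma *vma, int new_keyid)
{
	size_t zapped = 0;

	vma->keyid = new_keyid;
	for (size_t i = 0; i < TOY_MAX_PTES; i++) {
		if (vma->pte[i] && vma->pte[i]->keyid != new_keyid) {
			vma->pte[i] = NULL;
			zapped++;
		}
	}
	return zapped;
}
```

Note that both checks compare KeyIDs only. Nothing in this model looks at the
key material programmed behind a KeyID, so a wrong *key* under a matching
keyid passes both rules, which is why that case is a separate question.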

* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18 16:22                             ` Dave Hansen
@ 2019-06-18 16:36                               ` Andy Lutomirski
  2019-06-18 16:48                                 ` Dave Hansen
  0 siblings, 1 reply; 153+ messages in thread
From: Andy Lutomirski @ 2019-06-18 16:36 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Peter Zijlstra, Kai Huang, Andy Lutomirski,
	Kirill A. Shutemov, Andrew Morton, X86 ML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, David Howells,
	Kees Cook, Jacob Pan, Alison Schofield, Linux-MM, kvm list,
	keyrings, LKML, Tom Lendacky



> On Jun 18, 2019, at 9:22 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 6/18/19 9:15 AM, Kirill A. Shutemov wrote:
>>> We'd need two rules:
>>> 1. A page must not be faulted into a VMA if the page's page_keyid()
>>>   is not consistent with the VMA's
>>> 2. Upon changing the VMA's KeyID, all underlying PTEs must either be
>>>   checked or zapped.
>>> 
>>> If the rules are broken, we SIGBUS.  Andy's suggestion has the same
>>> basic requirements.  But, with his scheme, the error can be to the
>>> ioctl() instead of in the form of a SIGBUS.  I guess that makes the
>>> fuzzers' lives a bit easier.
>> I see a problem with the scheme: if we don't have a way to decide whether the
>> key is right for the file, a user without access to the right key is able to
>> prevent a legitimate user from accessing the file. An attacker just needs read
>> access to the encrypted file to prevent any legitimate user from accessing it.
> 
> I think you're bringing up a separate issue.
> 
> We were talking about how you resolve a conflict when someone attempts
> to use two *different* keyids to decrypt the data in the API and what
> the resulting API interaction looks like.
> 
> You're describing the situation where one of those is the wrong *key*
> (not keyid).  That's a subtly different scenario and requires different
> handling (or no handling IMNHO).

I think we’re quibbling over details before we look at the big questions:

Should MKTME+DAX encrypt the entire volume or should it encrypt individual files?  Or both?

If it encrypts individual files, should the fs be involved at all?  Should there be metadata that can check whether a given key is the correct key?

If it encrypts individual files, is it even conceptually possible to avoid corruption if the fs is not involved?  After all, many filesystems think that they can move data blocks, compute checksums, journal data, etc.

I think Dave is right that there should at least be a somewhat credible proposal for how this could fit together.


* Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
  2019-06-18 16:36                               ` Andy Lutomirski
@ 2019-06-18 16:48                                 ` Dave Hansen
  0 siblings, 0 replies; 153+ messages in thread
From: Dave Hansen @ 2019-06-18 16:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kirill A. Shutemov, Peter Zijlstra, Kai Huang, Andy Lutomirski,
	Kirill A. Shutemov, Andrew Morton, X86 ML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, David Howells,
	Kees Cook, Jacob Pan, Alison Schofield, Linux-MM, kvm list,
	keyrings, LKML, Tom Lendacky

On 6/18/19 9:36 AM, Andy Lutomirski wrote:
> Should MKTME+DAX encrypt the entire volume or should it encrypt individual files?  Or both?

Our current thought is that there should be two modes: One for entire
DAX namespaces and one for filesystem DAX that would allow it to be at
the file level.  More likely, we would mirror fscrypt and do it at the
directory level.

> If it encrypts individual files, should the fs be involved at all?
> Should there be metadata that can check whether a given key is the
> correct key?

FWIW, this is a question for the fs guys.  Their guidance so far has
been "do what fscrypt does", and fscrypt does not protect against
incorrect keys being specified.  See:

	https://www.kernel.org/doc/html/v5.1/filesystems/fscrypt.html

Which says:

> Currently, fscrypt does not prevent a user from maliciously providing
> an incorrect key for another user’s existing encrypted files. A
> protection against this is planned.

> If it encrypts individual files, is it even conceptually possible to
> avoid corruption if the fs is not involved?  After all, many
> filesystems think that they can move data blocks, compute checksums,
> journal data, etc.

Yes, exactly.  Thankfully, fscrypt has thought about this already and
has infrastructure for this.  For instance:

> Online defragmentation of encrypted files is not supported. The
> EXT4_IOC_MOVE_EXT and F2FS_IOC_MOVE_RANGE ioctls will fail with
> EOPNOTSUPP.


* Re: [PATCH, RFC 57/62] x86/mktme: Overview of Multi-Key Total Memory Encryption
  2019-05-08 14:44 ` [PATCH, RFC 57/62] x86/mktme: Overview of Multi-Key Total Memory Encryption Kirill A. Shutemov
  2019-05-29  7:21   ` Mike Rapoport
@ 2019-07-14 18:16   ` Randy Dunlap
  2019-07-15  9:02     ` Kirill A. Shutemov
  1 sibling, 1 reply; 153+ messages in thread
From: Randy Dunlap @ 2019-07-14 18:16 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel

On 5/8/19 7:44 AM, Kirill A. Shutemov wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Provide an overview of MKTME on Intel Platforms.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  Documentation/x86/mktme/index.rst          |  8 +++
>  Documentation/x86/mktme/mktme_overview.rst | 57 ++++++++++++++++++++++
>  2 files changed, 65 insertions(+)
>  create mode 100644 Documentation/x86/mktme/index.rst
>  create mode 100644 Documentation/x86/mktme/mktme_overview.rst


> diff --git a/Documentation/x86/mktme/mktme_overview.rst b/Documentation/x86/mktme/mktme_overview.rst
> new file mode 100644
> index 000000000000..59c023965554
> --- /dev/null
> +++ b/Documentation/x86/mktme/mktme_overview.rst
> @@ -0,0 +1,57 @@
> +Overview
> +=========
...
> +--
> +1. https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
> +2. The MKTME architecture supports up to 16 bits of KeyIDs, so a
> +   maximum of 65535 keys on top of the “TME key” at KeyID-0.  The
> +   first implementation is expected to support 5 bits, making 63

Hi,
How do 5 bits make 63 keys available?

> +   keys available to applications.  However, this is not guaranteed.
> +   The number of available keys could be reduced if, for instance,
> +   additional physical address space is desired over additional
> +   KeyIDs.


thanks.
-- 
~Randy

* Re: [PATCH, RFC 57/62] x86/mktme: Overview of Multi-Key Total Memory Encryption
  2019-07-14 18:16   ` Randy Dunlap
@ 2019-07-15  9:02     ` Kirill A. Shutemov
  0 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2019-07-15  9:02 UTC (permalink / raw)
  To: Randy Dunlap, Alison Schofield
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, linux-mm, kvm,
	keyrings, linux-kernel

On Sun, Jul 14, 2019 at 06:16:49PM +0000, Randy Dunlap wrote:
> On 5/8/19 7:44 AM, Kirill A. Shutemov wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Provide an overview of MKTME on Intel Platforms.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  Documentation/x86/mktme/index.rst          |  8 +++
> >  Documentation/x86/mktme/mktme_overview.rst | 57 ++++++++++++++++++++++
> >  2 files changed, 65 insertions(+)
> >  create mode 100644 Documentation/x86/mktme/index.rst
> >  create mode 100644 Documentation/x86/mktme/mktme_overview.rst
> 
> 
> > diff --git a/Documentation/x86/mktme/mktme_overview.rst b/Documentation/x86/mktme/mktme_overview.rst
> > new file mode 100644
> > index 000000000000..59c023965554
> > --- /dev/null
> > +++ b/Documentation/x86/mktme/mktme_overview.rst
> > @@ -0,0 +1,57 @@
> > +Overview
> > +=========
> ...
> > +--
> > +1. https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
> > +2. The MKTME architecture supports up to 16 bits of KeyIDs, so a
> > +   maximum of 65535 keys on top of the “TME key” at KeyID-0.  The
> > +   first implementation is expected to support 5 bits, making 63
> 
> Hi,
> How do 5 bits make 63 keys available?

Yep, typo. It has to be 6 bits.

Alison, please correct this.
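
For reference, the arithmetic behind the numbers in the footnote: n KeyID
bits give 2^n KeyIDs, and KeyID-0 is taken by the TME key, leaving 2^n - 1.
A minimal sketch follows; mktme_usable_keyids() is a hypothetical helper
written for this note, not something from the patchset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of KeyIDs left for applications when the hardware exposes
 * keyid_bits bits of KeyID: 2^keyid_bits in total, minus KeyID-0,
 * which is reserved for the TME key.
 */
uint32_t mktme_usable_keyids(unsigned int keyid_bits)
{
	assert(keyid_bits <= 16);	/* architectural maximum per the doc */
	return (UINT32_C(1) << keyid_bits) - 1;
}
```

With this, 5 bits give 31 keys, 6 bits give 63, and the architectural
maximum of 16 bits gives 65535, matching the figures in the document once
"5 bits" is read as "6 bits".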

-- 
 Kirill A. Shutemov

Thread overview: 153+ messages
2019-05-08 14:43 [PATCH, RFC 00/62] Intel MKTME enabling Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 01/62] mm: Do no merge VMAs with different encryption KeyIDs Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 02/62] mm: Add helpers to setup zero page mappings Kirill A. Shutemov
2019-05-29  7:21   ` Mike Rapoport
2019-05-08 14:43 ` [PATCH, RFC 03/62] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
2019-05-10 18:07   ` Dave Hansen
2019-05-13 14:27     ` Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 04/62] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 05/62] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
2019-05-29  7:21   ` Mike Rapoport
2019-05-29 12:47     ` Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 06/62] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 07/62] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 08/62] x86/mm: Introduce variables to store number, shift and mask of KeyIDs Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 09/62] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
2019-06-14  9:15   ` Peter Zijlstra
2019-06-14 13:03     ` Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 10/62] x86/mm: Detect MKTME early Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 11/62] x86/mm: Add a helper to retrieve KeyID for a page Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 12/62] x86/mm: Add a helper to retrieve KeyID for a VMA Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 13/62] x86/mm: Add hooks to allocate and free encrypted pages Kirill A. Shutemov
2019-06-14  9:34   ` Peter Zijlstra
2019-06-14 11:04     ` Peter Zijlstra
2019-06-14 13:28       ` Kirill A. Shutemov
2019-06-14 13:43         ` Peter Zijlstra
2019-06-14 22:41           ` Kirill A. Shutemov
2019-06-17  9:25             ` Peter Zijlstra
2019-06-14 13:14     ` Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 14/62] x86/mm: Map zero pages into encrypted mappings correctly Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 15/62] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 16/62] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 17/62] x86/mm: Calculate direct mapping size Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 18/62] x86/mm: Implement syncing per-KeyID direct mappings Kirill A. Shutemov
2019-06-14  9:51   ` Peter Zijlstra
2019-06-14 22:43     ` Kirill A. Shutemov
2019-06-17  9:27       ` Peter Zijlstra
2019-06-17 14:43         ` Kirill A. Shutemov
2019-06-17 14:51           ` Peter Zijlstra
2019-06-17 15:17             ` Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 19/62] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
2019-06-14 11:10   ` Peter Zijlstra
2019-05-08 14:43 ` [PATCH, RFC 20/62] mm/page_ext: Export lookup_page_ext() symbol Kirill A. Shutemov
2019-06-14 11:12   ` Peter Zijlstra
2019-06-14 22:44     ` Kirill A. Shutemov
2019-06-17  9:30       ` Peter Zijlstra
2019-06-17 11:01         ` Kai Huang
2019-06-17 11:13           ` Huang, Kai
2019-05-08 14:43 ` [PATCH, RFC 21/62] mm/rmap: Clear vma->anon_vma on unlink_anon_vmas() Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 22/62] x86/pconfig: Set a valid encryption algorithm for all MKTME commands Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 23/62] keys/mktme: Introduce a Kernel Key Service for MKTME Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 24/62] keys/mktme: Preparse the MKTME key payload Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 25/62] keys/mktme: Instantiate and destroy MKTME keys Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 26/62] keys/mktme: Move the MKTME payload into a cache aligned structure Kirill A. Shutemov
2019-06-14 11:35   ` Peter Zijlstra
2019-06-14 17:10     ` Alison Schofield
2019-05-08 14:43 ` [PATCH, RFC 27/62] keys/mktme: Strengthen the entropy of CPU generated MKTME keys Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 28/62] keys/mktme: Set up PCONFIG programming targets for " Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 29/62] keys/mktme: Program MKTME keys into the platform hardware Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 30/62] keys/mktme: Set up a percpu_ref_count for MKTME keys Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 31/62] keys/mktme: Require CAP_SYS_RESOURCE capability " Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 32/62] keys/mktme: Store MKTME payloads if cmdline parameter allows Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 33/62] acpi: Remove __init from acpi table parsing functions Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 34/62] acpi/hmat: Determine existence of an ACPI HMAT Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 35/62] keys/mktme: Require ACPI HMAT to register the MKTME Key Service Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 36/62] acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 37/62] keys/mktme: Do not allow key creation in unsafe topologies Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 38/62] keys/mktme: Support CPU hotplug for MKTME key service Kirill A. Shutemov
2019-05-08 14:43 ` [PATCH, RFC 39/62] keys/mktme: Find new PCONFIG targets during memory hotplug Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 40/62] keys/mktme: Program new PCONFIG targets with MKTME keys Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 41/62] keys/mktme: Support memory hotplug for " Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 42/62] mm: Generalize the mprotect implementation to support extensions Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 43/62] syscall/x86: Wire up a system call for MKTME encryption keys Kirill A. Shutemov
2019-05-29  7:21   ` Mike Rapoport
2019-05-29 18:12     ` Alison Schofield
2019-05-08 14:44 ` [PATCH, RFC 44/62] x86/mm: Set KeyIDs in encrypted VMAs for MKTME Kirill A. Shutemov
2019-06-14 11:44   ` Peter Zijlstra
2019-06-14 17:33     ` Alison Schofield
2019-06-14 18:26       ` Dave Hansen
2019-06-14 18:46         ` Alison Schofield
2019-06-14 19:11           ` Dave Hansen
2019-06-17  9:10             ` Peter Zijlstra
2019-05-08 14:44 ` [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call " Kirill A. Shutemov
2019-06-14 11:47   ` Peter Zijlstra
2019-06-14 17:35     ` Alison Schofield
2019-06-14 11:51   ` Peter Zijlstra
2019-06-15  0:32     ` Alison Schofield
2019-06-17  9:08       ` Peter Zijlstra
2019-06-17 15:07   ` Andy Lutomirski
2019-06-17 15:28     ` Dave Hansen
2019-06-17 15:46       ` Andy Lutomirski
2019-06-17 18:27         ` Dave Hansen
2019-06-17 19:12           ` Andy Lutomirski
2019-06-17 21:36             ` Dave Hansen
2019-06-18  0:48               ` Kai Huang
2019-06-18  1:50                 ` Andy Lutomirski
2019-06-18  2:11                   ` Kai Huang
2019-06-18  4:24                     ` Andy Lutomirski
2019-06-18 14:19                   ` Dave Hansen
2019-06-18  0:05             ` Kai Huang
2019-06-18  0:15               ` Andy Lutomirski
2019-06-18  1:35                 ` Kai Huang
2019-06-18  1:43                   ` Andy Lutomirski
2019-06-18  2:23                     ` Kai Huang
2019-06-18  9:12                       ` Peter Zijlstra
2019-06-18 14:09                         ` Dave Hansen
2019-06-18 16:15                           ` Kirill A. Shutemov
2019-06-18 16:22                             ` Dave Hansen
2019-06-18 16:36                               ` Andy Lutomirski
2019-06-18 16:48                                 ` Dave Hansen
2019-06-18 14:13                 ` Dave Hansen
2019-06-17 23:59           ` Kai Huang
2019-06-18  1:34             ` Lendacky, Thomas
2019-06-18  1:40               ` Andy Lutomirski
2019-06-18  2:02                 ` Lendacky, Thomas
2019-06-18  4:19                 ` Andy Lutomirski
2019-05-08 14:44 ` [PATCH, RFC 46/62] x86/mm: Keep reference counts on encrypted VMAs " Kirill A. Shutemov
2019-06-14 11:54   ` Peter Zijlstra
2019-06-14 18:39     ` Alison Schofield
2019-05-08 14:44 ` [PATCH, RFC 47/62] mm: Restrict MKTME memory encryption to anonymous VMAs Kirill A. Shutemov
2019-06-14 11:55   ` Peter Zijlstra
2019-06-15  0:07     ` Alison Schofield
2019-05-08 14:44 ` [PATCH, RFC 48/62] selftests/x86/mktme: Test the MKTME APIs Kirill A. Shutemov
2019-05-08 17:09   ` Alison Schofield
2019-05-08 14:44 ` [PATCH, RFC 49/62] mm, x86: export several MKTME variables Kirill A. Shutemov
2019-06-14 11:56   ` Peter Zijlstra
2019-06-17  3:14     ` Kai Huang
2019-06-17  7:46       ` Peter Zijlstra
2019-06-17  8:39         ` Kai Huang
2019-06-17 11:25           ` Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 50/62] kvm, x86, mmu: setup MKTME keyID to spte for given PFN Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 51/62] iommu/vt-d: Support MKTME in DMA remapping Kirill A. Shutemov
2019-06-14 12:04   ` Peter Zijlstra
2019-05-08 14:44 ` [PATCH, RFC 52/62] x86/mm: introduce common code for mem encryption Kirill A. Shutemov
2019-05-08 16:58   ` Christoph Hellwig
2019-05-08 20:52     ` Jacob Pan
2019-05-08 21:21       ` Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 53/62] x86/mm: Use common code for DMA memory encryption Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 54/62] x86/mm: Disable MKTME on incompatible platform configurations Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 55/62] x86/mm: Disable MKTME if not all system memory supports encryption Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 56/62] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 57/62] x86/mktme: Overview of Multi-Key Total Memory Encryption Kirill A. Shutemov
2019-05-29  7:21   ` Mike Rapoport
2019-05-29 18:13     ` Alison Schofield
2019-07-14 18:16   ` Randy Dunlap
2019-07-15  9:02     ` Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 58/62] x86/mktme: Document the MKTME provided security mitigations Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 59/62] x86/mktme: Document the MKTME kernel configuration requirements Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 60/62] x86/mktme: Document the MKTME Key Service API Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 61/62] x86/mktme: Document the MKTME API for anonymous memory encryption Kirill A. Shutemov
2019-05-08 14:44 ` [PATCH, RFC 62/62] x86/mktme: Demonstration program using the MKTME APIs Kirill A. Shutemov
2019-05-29  7:30 ` [PATCH, RFC 00/62] Intel MKTME enabling Mike Rapoport
2019-05-29 18:20   ` Alison Schofield
2019-06-14 12:15 ` Peter Zijlstra
