Linux-mm Archive on lore.kernel.org
* [PATCHv2 00/59] Intel MKTME enabling
@ 2019-07-31 15:07 Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 01/59] mm: Do no merge VMAs with different encryption KeyIDs Kirill A. Shutemov
                   ` (58 more replies)
  0 siblings, 59 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

= Intro =

This patchset enables Intel Multi-Key Total Memory Encryption (MKTME).
It consists of changes to multiple subsystems:

 * Core MM: infrastructure for allocating pages, dealing with encrypted
   VMAs, and providing an API to set up encrypted mappings.
 * arch/x86: feature enumeration, programming keys into hardware, setting
   up page table entries for encrypted pages, and more.
 * Key management service: setup and management of encryption keys.
 * DMA/IOMMU: dealing with encrypted memory on IO side.
 * KVM: interaction with virtualization side.
 * Documentation: description of APIs and usage examples.

Please review. Any feedback is welcome.

= Overview =

Multi-Key Total Memory Encryption (MKTME)[1] is a technology that allows
transparent memory encryption in upcoming Intel platforms.  It uses a new
instruction (PCONFIG) for key setup and selects a key for individual pages
by repurposing physical address bits in the page tables.
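
As a rough illustration of selecting a key by repurposing physical address
bits, the sketch below packs a KeyID into the upper physical-address bits
of a page table entry.  The 52-bit physical address width, the 6-bit KeyID
field and every toy_* name are assumptions made for this example; they are
not values or helpers taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 52 physical address bits, top 6 hold the KeyID. */
#define TOY_PHYS_BITS	52
#define TOY_KEYID_BITS	6

/* Bit position where the KeyID field starts. */
static inline unsigned toy_keyid_shift(void)
{
	return TOY_PHYS_BITS - TOY_KEYID_BITS;
}

/* Mask covering the KeyID field inside a page table entry. */
static inline uint64_t toy_keyid_mask(void)
{
	return ((UINT64_C(1) << TOY_KEYID_BITS) - 1) << toy_keyid_shift();
}

/* Number of KeyIDs available on top of KeyID-0. */
static inline unsigned toy_nr_keyids(void)
{
	return (1u << TOY_KEYID_BITS) - 1;
}

/* Replace whatever KeyID the entry carries with the given one. */
static inline uint64_t toy_pte_set_keyid(uint64_t pte, unsigned keyid)
{
	assert(keyid <= toy_nr_keyids());
	return (pte & ~toy_keyid_mask()) |
	       ((uint64_t)keyid << toy_keyid_shift());
}

/* Extract the KeyID from an entry. */
static inline unsigned toy_pte_keyid(uint64_t pte)
{
	return (unsigned)((pte & toy_keyid_mask()) >> toy_keyid_shift());
}

/* Page frame address with the KeyID bits and the low flag bits masked out. */
static inline uint64_t toy_pte_phys(uint64_t pte)
{
	uint64_t addr_mask = ((UINT64_C(1) << toy_keyid_shift()) - 1) &
			     ~UINT64_C(0xfff);

	return pte & addr_mask;
}
```

Masking the KeyID bits out again, as toy_pte_phys() does, is what keeps
the real page frame address recoverable from an encrypted entry.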

These patches add support for MKTME into the existing kernel keyring
subsystem and add a new encrypt_mprotect() system call that can be used by
applications to encrypt anonymous memory with keys obtained from the
keyring.

This architecture supports encrypting both normal, volatile DRAM and
persistent memory.  However, these patches do not implement persistent
memory support.  We anticipate adding that support next.

== Hardware Background ==

MKTME is built on top of an existing single-key technology called TME.
TME encrypts all system memory using a single key generated by the CPU on
every boot of the system. TME provides robust mitigation against
single-read physical attacks, such as physically removing a DIMM and
inspecting its contents.  TME provides weaker mitigations against
multiple-read physical attacks.

MKTME enables the use of multiple encryption keys[2], allowing selection
of the encryption key per-page using the page tables.  Encryption keys are
programmed into each memory controller and the same set of keys is
available to all entities on the system with access to that memory (all
cores, DMA engines, etc.).

MKTME inherits many of the mitigations against hardware attacks from TME.
Like TME, MKTME does not fully mitigate vulnerable or malicious operating
systems or virtual machine managers.  MKTME offers additional mitigations
when compared to TME.

TME and MKTME use the AES encryption algorithm in the AES-XTS mode.  This
mode, typically used for block-based storage devices, takes the physical
address of the data into account when encrypting each block.  This ensures
that the effective key is different for each block of memory. Moving
encrypted content to a different physical address results in garbage on
read, mitigating block-relocation attacks.  This property is the reason many of
the discussed attacks require control of a shared physical page to be
handed from the victim to the attacker.
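
The address dependence can be illustrated with a toy model.  The sketch
below is NOT AES-XTS and is not secure: it swaps in a 4-round Feistel
permutation on 64-bit blocks, used in an XEX-style construction where the
tweak is derived from the physical address.  All names and constants are
made up for the example.  It only shows the property described above:
ciphertext read back at the address it was written to decrypts correctly,
while the same ciphertext moved to another address does not.

```c
#include <assert.h>
#include <stdint.h>

/* Toy round function; a Feistel network is invertible for any choice here. */
static uint32_t toy_f(uint32_t half, uint32_t round_key)
{
	uint32_t x = (half ^ round_key) * 0x9E3779B1u;

	return (x << 13) | (x >> 19);
}

/* Toy 4-round Feistel permutation on a 64-bit block (stand-in for AES). */
static uint64_t toy_block_encrypt(uint64_t block, uint64_t key)
{
	uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;

	for (int i = 0; i < 4; i++) {
		uint32_t rk = (uint32_t)(key >> (16 * i));
		uint32_t t = l ^ toy_f(r, rk);

		l = r;
		r = t;
	}
	return ((uint64_t)l << 32) | r;
}

/* Inverse of toy_block_encrypt(): undo the rounds in reverse order. */
static uint64_t toy_block_decrypt(uint64_t block, uint64_t key)
{
	uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;

	for (int i = 3; i >= 0; i--) {
		uint32_t rk = (uint32_t)(key >> (16 * i));
		uint32_t t = r ^ toy_f(l, rk);

		r = l;
		l = t;
	}
	return ((uint64_t)l << 32) | r;
}

struct toy_keys {
	uint64_t data_key;	/* encrypts the data block */
	uint64_t tweak_key;	/* turns the address into a tweak */
};

/* Write path: C = E(P ^ T) ^ T, with T derived from the physical address. */
static uint64_t toy_mem_write(const struct toy_keys *k, uint64_t phys,
			      uint64_t plain)
{
	uint64_t t;

	assert((phys & 7) == 0);	/* 8-byte aligned toy blocks */
	t = toy_block_encrypt(phys, k->tweak_key);
	return toy_block_encrypt(plain ^ t, k->data_key) ^ t;
}

/* Read path: P = D(C ^ T) ^ T, using the address the data is read from. */
static uint64_t toy_mem_read(const struct toy_keys *k, uint64_t phys,
			     uint64_t cipher)
{
	uint64_t t;

	assert((phys & 7) == 0);
	t = toy_block_encrypt(phys, k->tweak_key);
	return toy_block_decrypt(cipher ^ t, k->data_key) ^ t;
}
```

Reading the stored value through toy_mem_read() at a different phys than
the one passed to toy_mem_write() models a block-relocation attempt.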

== MKTME-Provided Mitigations ==

MKTME adds a few mitigations against attacks that are not mitigated when
using TME alone.  The first set are mitigations against software attacks
that are familiar today:

 * Kernel Mapping Attacks: information disclosures that leverage the
   kernel direct map are mitigated against disclosing user data.
 * Freed Data Leak Attacks: removing an encryption key from the
   hardware mitigates future user information disclosure.

The next set are attacks that depend on specialized hardware, such as an
“evil DIMM” or a DDR interposer:

 * Cross-Domain Replay Attack: data is captured from one domain
   (guest) and replayed to another at a later time.
 * Cross-Domain Capture and Delayed Compare Attack: data is captured
   and later analyzed to discover secrets.
 * Key Wear-out Attack: data is captured and analyzed in order to
   later write precise changes to plaintext.

More details on these attacks are below.

MKTME does not mitigate all attacks that can be performed with an “evil
DIMM” or a DDR interposer.  In determining MKTME’s security value in an
environment, the ease and effectiveness of the above attacks mitigated by
MKTME should be compared with those which are not mitigated.  Some key
examples of unmitigated attacks follow:

  * Random Data Modification Attack: An attacker writes random
    ciphertext, which causes the victim to consume random data.
    This can be used to flip security-sensitive bits.
  * Same-Domain Replay Attacks: Data can be captured and replayed
    within a single domain. An attacker could, for instance, replay
    an old ‘struct cred’ value to a newer, less-privileged process.
  * Ciphertext Side Channel Attacks: Similar to delayed-compare
    attacks, useful information might be inferred even from ciphertext.
    This information might be leveraged to infer information about
    secrets such as private keys.

=== Kernel Mapping Attacks ===

Information disclosure vulnerabilities leverage the kernel direct map
because many vulnerabilities involve manipulation of kernel data
structures (examples: CVE-2017-7277, CVE-2017-9605).  We normally think of
these bugs as leaking valuable *kernel* data, but they can leak
application data when application pages are recycled for kernel use.

With this MKTME implementation, there is a direct map created for each
MKTME KeyID which is used whenever the kernel needs to access plaintext.
But, all kernel data structures are accessed via the direct map for
KeyID-0.  Thus, memory reads which are not coordinated with the KeyID get
garbage (for example, accessing KeyID-4 data with the KeyID-0 mapping).

This means that if sensitive data encrypted using MKTME is leaked via the
KeyID-0 direct map, ciphertext decrypted with the wrong key will be
disclosed.  To disclose plaintext, an attacker must “pivot” to the correct
direct mapping, which is non-trivial because there are no kernel data
structures in the KeyID!=0 direct mapping.
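
One way to picture the per-KeyID direct maps is as equally sized windows
laid out back to back in kernel virtual address space.  The sketch below
models that address arithmetic.  The back-to-back layout, the struct and
the toy_* helpers are assumptions made for illustration and are not code
from this series.

```c
#include <assert.h>
#include <stdint.h>

struct toy_direct_map {
	uint64_t base;		/* virtual base of the KeyID-0 direct map */
	uint64_t size;		/* bytes covered by one per-KeyID window */
	unsigned nr_keyids;	/* KeyIDs available on top of KeyID-0 */
};

/* Virtual address through which 'phys' is accessed with a given KeyID. */
static uint64_t toy_phys_to_virt(const struct toy_direct_map *dm,
				 uint64_t phys, unsigned keyid)
{
	assert(phys < dm->size);
	assert(keyid <= dm->nr_keyids);
	return dm->base + (uint64_t)keyid * dm->size + phys;
}

/* Which KeyID window does a direct-map virtual address fall into? */
static unsigned toy_virt_to_keyid(const struct toy_direct_map *dm,
				  uint64_t virt)
{
	assert(virt >= dm->base);
	return (unsigned)((virt - dm->base) / dm->size);
}

/* Physical address behind a direct-map virtual address, for any KeyID. */
static uint64_t toy_virt_to_phys(const struct toy_direct_map *dm,
				 uint64_t virt)
{
	assert(virt >= dm->base);
	return (virt - dm->base) % dm->size;
}
```

In this model a bug that dereferences a KeyID-0 address never lands in
another window by accident: reaching KeyID-4 data in plaintext requires an
address offset by four whole windows.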

=== Freed Data Leak Attack ===

The kernel has a history of bugs around uninitialized data.  Usually, we
think of these bugs as leaking sensitive kernel data, but they can also be
used to leak application secrets.

MKTME can help mitigate the case where application secrets are leaked:

 * App (or VM) places a secret in a page
 * App exits or frees memory to kernel allocator
 * Page added to allocator free list
 * Attacker reallocates page to a purpose where it can read the page

Now, imagine MKTME was in use on the memory being leaked.  The data can
only be leaked as long as the key is programmed in the hardware.  If the
key is de-programmed, for example once all of a guest's pages have been
freed after shutdown, any future reads will just see ciphertext.

Basically, the key is a convenient choke-point: you can be more confident
that data encrypted with it is inaccessible once the key is removed.
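
The choke-point idea can be modelled in a few lines.  In the sketch below
a plain XOR stands in for the real cipher and a small table stands in for
the hardware key slots; both, along with every toy_* name, are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_KEYIDS 4

/* Stand-in for the key slots programmed into the memory controller. */
struct toy_keytable {
	uint64_t key[TOY_NR_KEYIDS];
	bool programmed[TOY_NR_KEYIDS];
};

static void toy_program_key(struct toy_keytable *kt, unsigned keyid,
			    uint64_t key)
{
	assert(keyid < TOY_NR_KEYIDS);
	kt->key[keyid] = key;
	kt->programmed[keyid] = true;
}

/* De-program a slot: the key material is gone from the "hardware". */
static void toy_clear_key(struct toy_keytable *kt, unsigned keyid)
{
	assert(keyid < TOY_NR_KEYIDS);
	kt->key[keyid] = 0;
	kt->programmed[keyid] = false;
}

/* Store a value; XOR is a toy substitute for the real encryption. */
static uint64_t toy_write(const struct toy_keytable *kt, unsigned keyid,
			  uint64_t plain)
{
	assert(keyid < TOY_NR_KEYIDS && kt->programmed[keyid]);
	return plain ^ kt->key[keyid];
}

/* Load a value; without a programmed key only ciphertext comes back. */
static uint64_t toy_read(const struct toy_keytable *kt, unsigned keyid,
			 uint64_t stored)
{
	assert(keyid < TOY_NR_KEYIDS);
	if (!kt->programmed[keyid])
		return stored;
	return stored ^ kt->key[keyid];
}
```

Once toy_clear_key() has run, a stale page that still holds the stored
value no longer yields the secret, no matter who reallocates it.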

=== Cross-Domain Replay Attack ===

MKTME mitigates cross-domain replay attacks where an attacker replaces an
encrypted block owned by one domain with a block owned by another domain.
MKTME does not prevent this replacement from occurring, but it does
prevent plaintext from being disclosed if the domains use different keys.

With TME, the attack could be executed by:
 * A victim places a secret in memory, at a given physical address.
   Note: AES-XTS is what restricts the attack to being performed at a
   single physical address instead of across different physical
   addresses
 * Attacker captures victim secret’s ciphertext
 * Later on, after victim frees the physical address, attacker gains
   ownership
 * Attacker puts the ciphertext at the address and gets the secret
   plaintext

But, due to the presumably different keys used by the attacker and the
victim, the attacker cannot successfully decrypt old ciphertext.

=== Cross-Domain Capture and Delayed Compare Attack ===

This is also referred to as a kind of dictionary attack.

Similarly, MKTME protects against cross-domain capture-and-compare
attacks. Consider the following scenario:
 * A victim places a secret in memory, at a known physical address
 * Attacker captures victim’s ciphertext
 * Attacker gains control of the target physical address, perhaps
   after the victim’s VM is shut down or its memory reclaimed.
 * Attacker computes and writes many possible plaintexts until new
   ciphertext matches content captured previously.

Secrets which have low (plaintext) entropy are more vulnerable to this
attack because they reduce the number of possible plaintexts an attacker
has to compute and write.

The attack will not work if the attacker and victim use different keys.
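
The scenario can be sketched with a toy deterministic cipher.  The code
below is an illustration under stated assumptions: toy_mix() is the
splitmix64 finalizer used as a stand-in for a block cipher, the secret is
assumed to be a small number such as a PIN, and none of the names come
from the patches.  The attacker encrypts every candidate at the captured
address and compares against the captured ciphertext.

```c
#include <assert.h>
#include <stdint.h>

/* Bijective 64-bit mixer (splitmix64 finalizer); NOT a real cipher. */
static uint64_t toy_mix(uint64_t z)
{
	z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
	z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
	return z ^ (z >> 31);
}

/* Deterministic toy encryption bound to a key and a physical address. */
static uint64_t toy_encrypt(uint64_t plain, uint64_t key, uint64_t phys)
{
	return toy_mix(plain ^ key ^ toy_mix(phys));
}

/*
 * Attacker side: write each candidate plaintext at the captured address
 * and compare the resulting ciphertext with the captured one.
 * Returns the recovered plaintext, or -1 if no candidate matches.
 */
static long toy_capture_and_compare(uint64_t captured, uint64_t phys,
				    uint64_t attacker_key,
				    long nr_candidates)
{
	assert(nr_candidates > 0);
	for (long guess = 0; guess < nr_candidates; guess++) {
		if (toy_encrypt((uint64_t)guess, attacker_key, phys) == captured)
			return guess;
	}
	return -1;
}
```

With the victim's key the loop recovers a low-entropy secret after at most
nr_candidates writes; with a different key no candidate in that range
produces the captured ciphertext.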

=== Key Wear-out Attack ===

Repeated use of an encryption key might be used by an attacker to infer
information about the key or the plaintext, weakening the encryption.  The
higher the bandwidth of the encryption engine, the more vulnerable the key
is to wear-out.  The MKTME memory encryption hardware works at the speed
of the memory bus, which has high bandwidth.

This attack requires capturing potentially large amounts of ciphertext,
processing it, then replaying modified ciphertext.  Our expectation is that
most attackers would opt for lower-cost attacks like the replay attack
mentioned above.

For this implementation, the kernel always uses KeyID-0, which is always
vulnerable to wear-out since it cannot be rotated.  KeyID-0 wear-out can
be mitigated by limiting the bandwidth with which an attacker can write
KeyID-0-encrypted data.

Such a weakness has been demonstrated[3] on a theoretical cipher with
properties similar to those of AES-XTS.

An attack would take the following steps:
 * Victim system is using TME with AES-XTS-128
 * Attacker repeatedly captures ciphertext/plaintext pairs (can be
   performed with online hardware attack like an interposer).
 * Attacker compels repeated use of the key under attack for a
   sustained time period without a system reboot[4].
 * Attacker discovers a ‘plaintext XOR ciphertext’ collision pair
 * Attacker can induce controlled modifications to the targeted
   plaintext by modifying the colliding ciphertext

MKTME mitigates key wear-out in two ways:
 * Keys can be rotated periodically to mitigate wear-out.  Since TME
   keys are generated at boot, rotation of TME keys requires a
   reboot.  In contrast, MKTME allows rotation while the system is
   booted.  An application could implement a policy to rotate keys at
   a frequency which is not feasible to attack.
 * In the case that MKTME is used to encrypt two guests’ memory with
   two different keys, an attack on one guest’s key would not weaken
   the key used in the second guest.

== Userspace API ==

Here's an overview of the anonymous memory encryption process
as viewed from user space:

 * Allocate an MKTME Key:
        key = add_key("mktme", "name", "type=cpu algorithm=aes-xts-128", @u);
 * Map memory:
        ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
 * Protect memory:
        ret = syscall(SYS_encrypt_mprotect, ptr, size, PROT_READ|PROT_WRITE,
                      key);

*Enjoy the encrypted memory*

 * Free memory:
        ret = munmap(ptr, size);

 * Free the MKTME key:
        ret = keyctl(KEYCTL_INVALIDATE, key);

See the documentation patches for more info and a demo program.
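
The steps above can be sketched as a set of C helpers.  This is a minimal
sketch under several assumptions, not the demo program from the
documentation patches: it issues raw syscalls instead of using
libkeyutils, it maps the @u shorthand to KEY_SPEC_USER_KEYRING, and the
mktme_* names are invented here.  SYS_encrypt_mprotect is defined only by
headers from a kernel carrying this series, so the wrapper falls back to
ENOSYS elsewhere.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/keyctl.h>

/* Build the "type=... algorithm=..." payload; -1 if it does not fit. */
static int mktme_payload(char *buf, size_t len, const char *type,
			 const char *alg)
{
	int n = snprintf(buf, len, "type=%s algorithm=%s", type, alg);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Allocate an MKTME key on the user keyring (@u). */
static long mktme_add_key(const char *name, const char *payload)
{
	return syscall(SYS_add_key, "mktme", name, payload, strlen(payload),
		       (long)KEY_SPEC_USER_KEYRING);
}

/* Wrapper for the new system call; ENOSYS when headers lack its number. */
static long mktme_encrypt_mprotect(void *addr, size_t len, int prot, long key)
{
#ifdef SYS_encrypt_mprotect
	return syscall(SYS_encrypt_mprotect, addr, len, prot, key);
#else
	(void)addr; (void)len; (void)prot; (void)key;
	errno = ENOSYS;
	return -1;
#endif
}

/* Full flow: add key, map, protect, use, unmap, invalidate key. */
static int mktme_demo(size_t size)
{
	char payload[64];
	long key;
	void *ptr;
	int ret = -1;

	assert(size > 0);
	if (mktme_payload(payload, sizeof(payload), "cpu", "aes-xts-128") < 0)
		return -1;
	key = mktme_add_key("name", payload);
	if (key < 0)
		return -1;

	ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (ptr != MAP_FAILED) {
		if (mktme_encrypt_mprotect(ptr, size, PROT_READ | PROT_WRITE,
					   key) == 0) {
			memset(ptr, 0xa5, size);	/* use the encrypted memory */
			ret = 0;
		}
		munmap(ptr, size);
	}
	syscall(SYS_keyctl, KEYCTL_INVALIDATE, key);
	return ret;
}
```

On a kernel without MKTME support mktme_demo() simply returns -1: either
the "mktme" key type is unknown or the system call is not wired up.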

This update removes support for user type keys. This closes a security
gap, where the encryption keys were exposed to user space. Additionally,
memory hotplug support was basically removed from the API. Only skeleton
support remains to enforce the rule that no new memory may be added to the
MKTME system.  This is a deferral of memory hot add support until the
platform support is in place.

== Changelog ==

v2:
 - Add comments in allocation and free paths on how ordering is ensured.
 - Modify pageattr code to sync direct mapping after the canonical direct
   mapping is modified.
 - Introduce helpers to access number of KeyIDs, KeyID shift and KeyID
   mask.
 - Drop unneeded EXPORT_SYMBOL_GPL().
 - User type key support, keys in which users bring their own encryption
   keys, has been removed. CPU generated keys remain and should be used
   instead of USER type keys.  (removes security gap, reduces complexity)
 - Adding a CPU generated key no longer offers the user an option of
   supplying additional entropy to the data and tweak key. (reduces
   complexity)
 - Memory hotplug add support is removed. This is basically a deferral
   of the feature until we have platform support for the feature.
   (reduces complexity)
 - Documentation is updated to match changes to the add key API.
 - Documentation adds an index in the x86 index, and corrects a typo.
 - Reference counting: code and commit message comments are updated
   to reflect the general nature of the ref counter. Previous comments
   said it counted VMAs only.
 - Replace a GFP_ATOMIC with GFP_KERNEL in mktme_keys.c

--

[1] https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
[2] The MKTME architecture supports up to 16 bits of KeyIDs, so a
    maximum of 65535 keys on top of the “TME key” at KeyID-0.  The
    first implementation is expected to support 6 bits, making 63 keys
    available to applications.  However, this is not guaranteed.  The
    number of available keys could be reduced if, for instance,
    additional physical address space is desired over additional
    KeyIDs.
[3] http://web.cs.ucdavis.edu/~rogaway/papers/offsets.pdf
[4] This sustained time required for an attack could vary from days
    to years depending on the attacker’s goals.

Alison Schofield (30):
  x86/pconfig: Set an activated algorithm in all MKTME commands
  keys/mktme: Introduce a Kernel Key Service for MKTME
  keys/mktme: Preparse the MKTME key payload
  keys/mktme: Instantiate MKTME keys
  keys/mktme: Destroy MKTME keys
  keys/mktme: Move the MKTME payload into a cache aligned structure
  keys/mktme: Set up PCONFIG programming targets for MKTME keys
  keys/mktme: Program MKTME keys into the platform hardware
  keys/mktme: Set up a percpu_ref_count for MKTME keys
  keys/mktme: Clear the key programming from the MKTME hardware
  keys/mktme: Require CAP_SYS_RESOURCE capability for MKTME keys
  acpi: Remove __init from acpi table parsing functions
  acpi/hmat: Determine existence of an ACPI HMAT
  keys/mktme: Require ACPI HMAT to register the MKTME Key Service
  acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME
  keys/mktme: Do not allow key creation in unsafe topologies
  keys/mktme: Support CPU hotplug for MKTME key service
  keys/mktme: Block memory hotplug additions when MKTME is enabled
  mm: Generalize the mprotect implementation to support extensions
  syscall/x86: Wire up a system call for MKTME encryption keys
  x86/mm: Set KeyIDs in encrypted VMAs for MKTME
  mm: Add the encrypt_mprotect() system call for MKTME
  x86/mm: Keep reference counts on hardware key usage for MKTME
  mm: Restrict MKTME memory encryption to anonymous VMAs
  x86/mktme: Overview of Multi-Key Total Memory Encryption
  x86/mktme: Document the MKTME provided security mitigations
  x86/mktme: Document the MKTME kernel configuration requirements
  x86/mktme: Document the MKTME Key Service API
  x86/mktme: Document the MKTME API for anonymous memory encryption
  x86/mktme: Demonstration program using the MKTME APIs

Jacob Pan (3):
  iommu/vt-d: Support MKTME in DMA remapping
  x86/mm: introduce common code for mem encryption
  x86/mm: Use common code for DMA memory encryption

Kai Huang (1):
  kvm, x86, mmu: setup MKTME keyID to spte for given PFN

Kirill A. Shutemov (25):
  mm: Do no merge VMAs with different encryption KeyIDs
  mm: Add helpers to setup zero page mappings
  mm/ksm: Do not merge pages with different KeyIDs
  mm/page_alloc: Unify alloc_hugepage_vma()
  mm/page_alloc: Handle allocation for encrypted memory
  mm/khugepaged: Handle encrypted pages
  x86/mm: Mask out KeyID bits from page table entry pfn
  x86/mm: Introduce helpers to read number, shift and mask of KeyIDs
  x86/mm: Store bitmask of the encryption algorithms supported by MKTME
  x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  x86/mm: Detect MKTME early
  x86/mm: Add a helper to retrieve KeyID for a page
  x86/mm: Add a helper to retrieve KeyID for a VMA
  x86/mm: Add hooks to allocate and free encrypted pages
  x86/mm: Map zero pages into encrypted mappings correctly
  x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
  x86/mm: Allow to disable MKTME after enumeration
  x86/mm: Calculate direct mapping size
  x86/mm: Implement syncing per-KeyID direct mappings
  x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  mm/page_ext: Export lookup_page_ext() symbol
  mm/rmap: Clear vma->anon_vma on unlink_anon_vmas()
  x86/mm: Disable MKTME on incompatible platform configurations
  x86/mm: Disable MKTME if not all system memory supports encryption
  x86: Introduce CONFIG_X86_INTEL_MKTME

 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/mktme/index.rst             |  13 +
 .../x86/mktme/mktme_configuration.rst         |   6 +
 Documentation/x86/mktme/mktme_demo.rst        |  53 ++
 Documentation/x86/mktme/mktme_encrypt.rst     |  56 ++
 Documentation/x86/mktme/mktme_keys.rst        |  61 ++
 Documentation/x86/mktme/mktme_mitigations.rst | 151 ++++
 Documentation/x86/mktme/mktme_overview.rst    |  57 ++
 Documentation/x86/x86_64/mm.rst               |   4 +
 arch/alpha/include/asm/page.h                 |   2 +-
 arch/x86/Kconfig                              |  31 +-
 arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 arch/x86/include/asm/intel_pconfig.h          |  14 +-
 arch/x86/include/asm/mem_encrypt.h            |  29 +
 arch/x86/include/asm/mktme.h                  |  96 +++
 arch/x86/include/asm/page.h                   |   4 +
 arch/x86/include/asm/page_32.h                |   1 +
 arch/x86/include/asm/page_64.h                |   4 +-
 arch/x86/include/asm/pgtable.h                |  19 +
 arch/x86/include/asm/pgtable_types.h          |  23 +-
 arch/x86/include/asm/setup.h                  |   6 +
 arch/x86/kernel/cpu/intel.c                   |  65 +-
 arch/x86/kernel/head64.c                      |   4 +
 arch/x86/kernel/setup.c                       |   3 +
 arch/x86/kvm/mmu.c                            |  18 +-
 arch/x86/mm/Makefile                          |   3 +
 arch/x86/mm/init_64.c                         |  65 ++
 arch/x86/mm/kaslr.c                           |  11 +-
 arch/x86/mm/mem_encrypt.c                     |  30 -
 arch/x86/mm/mem_encrypt_common.c              |  52 ++
 arch/x86/mm/mktme.c                           | 683 ++++++++++++++++++
 arch/x86/mm/pageattr.c                        |  27 +
 drivers/acpi/hmat/hmat.c                      |  67 ++
 drivers/acpi/tables.c                         |  10 +-
 drivers/firmware/efi/efi.c                    |  25 +-
 drivers/iommu/intel-iommu.c                   |  29 +-
 fs/dax.c                                      |   3 +-
 fs/exec.c                                     |   4 +-
 fs/userfaultfd.c                              |   7 +-
 include/asm-generic/pgtable.h                 |   8 +
 include/keys/mktme-type.h                     |  31 +
 include/linux/acpi.h                          |   9 +-
 include/linux/dma-direct.h                    |   4 +-
 include/linux/efi.h                           |   1 +
 include/linux/gfp.h                           |  56 +-
 include/linux/intel-iommu.h                   |   9 +-
 include/linux/mem_encrypt.h                   |  23 +-
 include/linux/migrate.h                       |  14 +-
 include/linux/mm.h                            |  27 +-
 include/linux/page_ext.h                      |  11 +-
 include/linux/syscalls.h                      |   2 +
 include/uapi/asm-generic/unistd.h             |   4 +-
 kernel/fork.c                                 |   2 +
 kernel/sys_ni.c                               |   2 +
 mm/compaction.c                               |   3 +
 mm/huge_memory.c                              |   6 +-
 mm/khugepaged.c                               |  10 +
 mm/ksm.c                                      |  17 +
 mm/madvise.c                                  |   2 +-
 mm/memory.c                                   |   3 +-
 mm/mempolicy.c                                |  30 +-
 mm/migrate.c                                  |   4 +-
 mm/mlock.c                                    |   2 +-
 mm/mmap.c                                     |  31 +-
 mm/mprotect.c                                 |  98 ++-
 mm/page_alloc.c                               |  74 ++
 mm/page_ext.c                                 |   5 +
 mm/rmap.c                                     |   4 +-
 mm/userfaultfd.c                              |   3 +-
 security/keys/Makefile                        |   1 +
 security/keys/mktme_keys.c                    | 590 +++++++++++++++
 72 files changed, 2670 insertions(+), 155 deletions(-)
 create mode 100644 Documentation/x86/mktme/index.rst
 create mode 100644 Documentation/x86/mktme/mktme_configuration.rst
 create mode 100644 Documentation/x86/mktme/mktme_demo.rst
 create mode 100644 Documentation/x86/mktme/mktme_encrypt.rst
 create mode 100644 Documentation/x86/mktme/mktme_keys.rst
 create mode 100644 Documentation/x86/mktme/mktme_mitigations.rst
 create mode 100644 Documentation/x86/mktme/mktme_overview.rst
 create mode 100644 arch/x86/include/asm/mktme.h
 create mode 100644 arch/x86/mm/mem_encrypt_common.c
 create mode 100644 arch/x86/mm/mktme.c
 create mode 100644 include/keys/mktme-type.h
 create mode 100644 security/keys/mktme_keys.c

-- 
2.21.0



* [PATCHv2 01/59] mm: Do no merge VMAs with different encryption KeyIDs
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 02/59] mm: Add helpers to setup zero page mappings Kirill A. Shutemov
                   ` (57 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

VMAs with different KeyIDs must not be merged. Only VMAs with the same
KeyID are compatible.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/userfaultfd.c   |  7 ++++---
 include/linux/mm.h |  9 ++++++++-
 mm/madvise.c       |  2 +-
 mm/mempolicy.c     |  3 ++-
 mm/mlock.c         |  2 +-
 mm/mmap.c          | 31 +++++++++++++++++++------------
 mm/mprotect.c      |  2 +-
 7 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index ccbdbd62f0d8..3b845a6a44d0 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -911,7 +911,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 				 new_flags, vma->anon_vma,
 				 vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
-				 NULL_VM_UFFD_CTX);
+				 NULL_VM_UFFD_CTX, vma_keyid(vma));
 		if (prev)
 			vma = prev;
 		else
@@ -1461,7 +1461,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		prev = vma_merge(mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
-				 ((struct vm_userfaultfd_ctx){ ctx }));
+				 ((struct vm_userfaultfd_ctx){ ctx }),
+				 vma_keyid(vma));
 		if (prev) {
 			vma = prev;
 			goto next;
@@ -1623,7 +1624,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 		prev = vma_merge(mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
-				 NULL_VM_UFFD_CTX);
+				 NULL_VM_UFFD_CTX, vma_keyid(vma));
 		if (prev) {
 			vma = prev;
 			goto next;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0334ca97c584..5bfd3dd121c1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1637,6 +1637,13 @@ int clear_page_dirty_for_io(struct page *page);
 
 int get_cmdline(struct task_struct *task, char *buffer, int buflen);
 
+#ifndef vma_keyid
+static inline int vma_keyid(struct vm_area_struct *vma)
+{
+	return 0;
+}
+#endif
+
 extern unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
@@ -2301,7 +2308,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
 extern struct vm_area_struct *vma_merge(struct mm_struct *,
 	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
 	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
-	struct mempolicy *, struct vm_userfaultfd_ctx);
+	struct mempolicy *, struct vm_userfaultfd_ctx, int keyid);
 extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
 extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
 	unsigned long addr, int new_below);
diff --git a/mm/madvise.c b/mm/madvise.c
index 968df3aa069f..00216780a630 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -138,7 +138,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
 			  vma->vm_file, pgoff, vma_policy(vma),
-			  vma->vm_userfaultfd_ctx);
+			  vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (*prev) {
 		vma = *prev;
 		goto success;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f48693f75b37..14ee933b1ff7 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -731,7 +731,8 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			((vmstart - vma->vm_start) >> PAGE_SHIFT);
 		prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
 				 vma->anon_vma, vma->vm_file, pgoff,
-				 new_pol, vma->vm_userfaultfd_ctx);
+				 new_pol, vma->vm_userfaultfd_ctx,
+				 vma_keyid(vma));
 		if (prev) {
 			vma = prev;
 			next = vma->vm_next;
diff --git a/mm/mlock.c b/mm/mlock.c
index a90099da4fb4..3d0a31bf214c 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -535,7 +535,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
 			  vma->vm_file, pgoff, vma_policy(vma),
-			  vma->vm_userfaultfd_ctx);
+			  vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (*prev) {
 		vma = *prev;
 		goto success;
diff --git a/mm/mmap.c b/mm/mmap.c
index 7e8c3e8ae75f..715438a1fb93 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1008,7 +1008,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
  */
 static inline int is_mergeable_vma(struct vm_area_struct *vma,
 				struct file *file, unsigned long vm_flags,
-				struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+				struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+				int keyid)
 {
 	/*
 	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -1022,6 +1023,8 @@ static inline int is_mergeable_vma(struct vm_area_struct *vma,
 		return 0;
 	if (vma->vm_file != file)
 		return 0;
+	if (vma_keyid(vma) != keyid)
+		return 0;
 	if (vma->vm_ops && vma->vm_ops->close)
 		return 0;
 	if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
@@ -1058,9 +1061,10 @@ static int
 can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
 		     struct anon_vma *anon_vma, struct file *file,
 		     pgoff_t vm_pgoff,
-		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+		     int keyid)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, keyid) &&
 	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
 		if (vma->vm_pgoff == vm_pgoff)
 			return 1;
@@ -1079,9 +1083,10 @@ static int
 can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
 		    struct anon_vma *anon_vma, struct file *file,
 		    pgoff_t vm_pgoff,
-		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+		    int keyid)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, keyid) &&
 	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
 		pgoff_t vm_pglen;
 		vm_pglen = vma_pages(vma);
@@ -1136,7 +1141,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			unsigned long end, unsigned long vm_flags,
 			struct anon_vma *anon_vma, struct file *file,
 			pgoff_t pgoff, struct mempolicy *policy,
-			struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+			struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+			int keyid)
 {
 	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
 	struct vm_area_struct *area, *next;
@@ -1169,7 +1175,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			mpol_equal(vma_policy(prev), policy) &&
 			can_vma_merge_after(prev, vm_flags,
 					    anon_vma, file, pgoff,
-					    vm_userfaultfd_ctx)) {
+					    vm_userfaultfd_ctx, keyid)) {
 		/*
 		 * OK, it can.  Can we now merge in the successor as well?
 		 */
@@ -1178,7 +1184,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 				can_vma_merge_before(next, vm_flags,
 						     anon_vma, file,
 						     pgoff+pglen,
-						     vm_userfaultfd_ctx) &&
+						     vm_userfaultfd_ctx,
+						     keyid) &&
 				is_mergeable_anon_vma(prev->anon_vma,
 						      next->anon_vma, NULL)) {
 							/* cases 1, 6 */
@@ -1201,7 +1208,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			mpol_equal(policy, vma_policy(next)) &&
 			can_vma_merge_before(next, vm_flags,
 					     anon_vma, file, pgoff+pglen,
-					     vm_userfaultfd_ctx)) {
+					     vm_userfaultfd_ctx, keyid)) {
 		if (prev && addr < prev->vm_end)	/* case 4 */
 			err = __vma_adjust(prev, prev->vm_start,
 					 addr, prev->vm_pgoff, NULL, next);
@@ -1746,7 +1753,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	 * Can we just expand an old mapping?
 	 */
 	vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
-			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
+			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, 0);
 	if (vma)
 		goto out;
 
@@ -3025,7 +3032,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
 
 	/* Can we just expand an old private anonymous mapping? */
 	vma = vma_merge(mm, prev, addr, addr + len, flags,
-			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
+			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, 0);
 	if (vma)
 		goto out;
 
@@ -3223,7 +3230,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 		return NULL;	/* should never get here */
 	new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
 			    vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
-			    vma->vm_userfaultfd_ctx);
+			    vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (new_vma) {
 		/*
 		 * Source vma may have been merged into new_vma
diff --git a/mm/mprotect.c b/mm/mprotect.c
index bf38dfbbb4b4..82d7b194a918 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -400,7 +400,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*pprev = vma_merge(mm, *pprev, start, end, newflags,
 			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
-			   vma->vm_userfaultfd_ctx);
+			   vma->vm_userfaultfd_ctx, vma_keyid(vma));
 	if (*pprev) {
 		vma = *pprev;
 		VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
-- 
2.21.0



* [PATCHv2 02/59] mm: Add helpers to setup zero page mappings
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 01/59] mm: Do no merge VMAs with different encryption KeyIDs Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 03/59] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
                   ` (56 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

When the kernel sets up an encrypted page mapping, the encryption KeyID
is derived from the VMA. The KeyID is going to be part of
vma->vm_page_prot and it will be propagated transparently to the page
table entry on mk_pte().

But there is an exception: the zero page is never encrypted and its
mapping must use KeyID-0, regardless of the VMA's KeyID.

Introduce helpers that create a page table entry for the zero page.

The generic implementation will be overridden by architecture-specific
code that takes care of using the correct KeyID.
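
To illustrate the intent, here is a minimal user-space sketch of what an
architecture override could do. It is not the x86 implementation, which
is not part of this patch. The bit layout (6 KeyID bits at 51:46) and
all model_* names are assumptions made only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 6 KeyID bits at 51:46 of a 64-bit entry. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_KEYID_SHIFT	46
#define MODEL_KEYID_BITS	6
#define MODEL_KEYID_MASK \
	(((UINT64_C(1) << MODEL_KEYID_BITS) - 1) << MODEL_KEYID_SHIFT)

/* Model of a VMA's protection bits carrying a KeyID. */
static uint64_t model_prot_with_keyid(uint64_t prot, unsigned int keyid)
{
	assert(keyid < (1u << MODEL_KEYID_BITS));
	return (prot & ~MODEL_KEYID_MASK) |
	       ((uint64_t)keyid << MODEL_KEYID_SHIFT);
}

/* Model of pfn_pte(): combine a frame number with protection bits. */
static uint64_t model_pfn_pte(uint64_t pfn, uint64_t prot)
{
	return (pfn << MODEL_PAGE_SHIFT) | prot;
}

/* Model of an arch override: drop the KeyID so the entry uses KeyID-0. */
static uint64_t model_mk_zero_pte(uint64_t zero_pfn, uint64_t prot)
{
	return model_pfn_pte(zero_pfn, prot & ~MODEL_KEYID_MASK);
}

/* Read the KeyID back out of a modelled page table entry. */
static unsigned int model_pte_keyid(uint64_t pte)
{
	return (unsigned int)((pte & MODEL_KEYID_MASK) >> MODEL_KEYID_SHIFT);
}
```

In this model a regular entry built from the VMA's protection bits keeps
the VMA's KeyID, while the zero page entry always reads back KeyID 0 and
keeps the remaining protection bits.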

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/dax.c                      | 3 +--
 include/asm-generic/pgtable.h | 8 ++++++++
 mm/huge_memory.c              | 6 ++----
 mm/memory.c                   | 3 +--
 mm/userfaultfd.c              | 3 +--
 5 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index a237141d8787..6ecc9c560e62 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1445,8 +1445,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
 		mm_inc_nr_ptes(vma->vm_mm);
 	}
-	pmd_entry = mk_pmd(zero_page, vmf->vma->vm_page_prot);
-	pmd_entry = pmd_mkhuge(pmd_entry);
+	pmd_entry = mk_zero_pmd(zero_page, vmf->vma->vm_page_prot);
 	set_pmd_at(vmf->vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
 	spin_unlock(ptl);
 	trace_dax_pmd_load_hole(inode, vmf, zero_page, *entry);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 75d9d68a6de7..afcfbb4af4b2 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -879,8 +879,16 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
 }
 #endif
 
+#ifndef mk_zero_pte
+#define mk_zero_pte(addr, prot) pte_mkspecial(pfn_pte(my_zero_pfn(addr), prot))
+#endif
+
 #ifdef CONFIG_MMU
 
+#ifndef mk_zero_pmd
+#define mk_zero_pmd(zero_page, prot) pmd_mkhuge(mk_pmd(zero_page, prot))
+#endif
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1334ede667a8..e9a791413730 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -678,8 +678,7 @@ static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
 	pmd_t entry;
 	if (!pmd_none(*pmd))
 		return false;
-	entry = mk_pmd(zero_page, vma->vm_page_prot);
-	entry = pmd_mkhuge(entry);
+	entry = mk_zero_pmd(zero_page, vma->vm_page_prot);
 	if (pgtable)
 		pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, haddr, pmd, entry);
@@ -2109,8 +2108,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 
 	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
 		pte_t *pte, entry;
-		entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
-		entry = pte_mkspecial(entry);
+		entry = mk_zero_pte(haddr, vma->vm_page_prot);
 		pte = pte_offset_map(&_pmd, haddr);
 		VM_BUG_ON(!pte_none(*pte));
 		set_pte_at(mm, haddr, pte, entry);
diff --git a/mm/memory.c b/mm/memory.c
index e2bb51b6242e..81ae8c39f75b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2970,8 +2970,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	/* Use the zero-page for reads */
 	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
 			!mm_forbids_zeropage(vma->vm_mm)) {
-		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
-						vma->vm_page_prot));
+		entry = mk_zero_pte(vmf->address, vma->vm_page_prot);
 		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				vmf->address, &vmf->ptl);
 		if (!pte_none(*vmf->pte))
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index c7ae74ce5ff3..06bf4ea3ee05 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -120,8 +120,7 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 	pgoff_t offset, max_off;
 	struct inode *inode;
 
-	_dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
-					 dst_vma->vm_page_prot));
+	_dst_pte = mk_zero_pte(dst_addr, dst_vma->vm_page_prot);
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
 	if (dst_vma->vm_file) {
 		/* the shmem MAP_PRIVATE case requires checking the i_size */
-- 
2.21.0



* [PATCHv2 03/59] mm/ksm: Do not merge pages with different KeyIDs
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 01/59] mm: Do no merge VMAs with different encryption KeyIDs Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 02/59] mm: Add helpers to setup zero page mappings Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 04/59] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
                   ` (55 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

KSM compares plain text.  It might try to merge two pages that have the
same plain text but different ciphertext and possibly different
encryption keys.  When the kernel encrypted the page, it promised that
it would keep it encrypted with _that_ key.  That makes it impossible to
merge two pages encrypted with different keys.

Never merge encrypted pages with different KeyIDs.
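
The rule can be sketched in user space as follows. This only models the
decision; struct model_page, MODEL_PAGE_SIZE and model_ksm_can_merge()
are hypothetical names, and the real check lives in
try_to_merge_one_page() with the content comparison done elsewhere in
KSM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64	/* shortened page, for illustration only */

/* Hypothetical stand-in for a page: its KeyID plus its plain text. */
struct model_page {
	int keyid;
	unsigned char plaintext[MODEL_PAGE_SIZE];
};

/*
 * Two pages are merge candidates only if they share a KeyID, even when
 * their plain text is byte-for-byte identical.
 */
static bool model_ksm_can_merge(const struct model_page *page,
				const struct model_page *kpage)
{
	assert(page != NULL && kpage != NULL);

	if (page->keyid != kpage->keyid)
		return false;

	return memcmp(page->plaintext, kpage->plaintext,
		      MODEL_PAGE_SIZE) == 0;
}
```

With identical plain text, two pages with KeyIDs 1 and 2 are rejected,
while two pages that both use KeyID 1 are accepted.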

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h |  7 +++++++
 mm/ksm.c           | 17 +++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5bfd3dd121c1..af1a56ff6764 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1644,6 +1644,13 @@ static inline int vma_keyid(struct vm_area_struct *vma)
 }
 #endif
 
+#ifndef page_keyid
+static inline int page_keyid(struct page *page)
+{
+	return 0;
+}
+#endif
+
 extern unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
diff --git a/mm/ksm.c b/mm/ksm.c
index 3dc4346411e4..7d4ef634f38e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1228,6 +1228,23 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
 	if (!PageAnon(page))
 		goto out;
 
+	/*
+	 * KeyID indicates what key to use to encrypt and decrypt page's
+	 * content.
+	 *
+	 * KSM compares plain text instead (transparently to KSM code).
+	 *
+	 * But we still need to make sure that pages with identical plain
+	 * text will not be merged together if they are encrypted with
+	 * different keys.
+	 *
+	 * To make it work kernel only allows merging pages with the same KeyID.
+	 * The approach guarantees that the merged page can be read by all
+	 * users.
+	 */
+	if (kpage && page_keyid(page) != page_keyid(kpage))
+		goto out;
+
 	/*
 	 * We need the page lock to read a stable PageSwapCache in
 	 * write_protect_page().  We use trylock_page() instead of
-- 
2.21.0



* [PATCHv2 04/59] mm/page_alloc: Unify alloc_hugepage_vma()
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 03/59] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 05/59] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
                   ` (54 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

We don't need to have separate implementations of alloc_hugepage_vma()
for NUMA and non-NUMA. Using the variant based on alloc_pages_vma()
covers both cases.

This is a preparation patch for allocating encrypted pages.

alloc_pages_vma() will handle allocation of encrypted pages. With this
change we don't need to cover alloc_hugepage_vma() separately.

The change makes a typo in Alpha's implementation of
__alloc_zeroed_user_highpage() visible. Fix it too.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/alpha/include/asm/page.h | 2 +-
 include/linux/gfp.h           | 6 ++----
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h
index f3fb2848470a..9a6fbb5269f3 100644
--- a/arch/alpha/include/asm/page.h
+++ b/arch/alpha/include/asm/page.h
@@ -18,7 +18,7 @@ extern void clear_page(void *page);
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
-	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vmaddr)
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 extern void copy_page(void * _to, void * _from);
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fb07b503dc45..3d4cb9fea417 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -511,21 +511,19 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
 extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
 			struct vm_area_struct *vma, unsigned long addr,
 			int node, bool hugepage);
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
-	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 #else
 #define alloc_pages(gfp_mask, order) \
 		alloc_pages_node(numa_node_id(), gfp_mask, order)
 #define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
 	alloc_pages(gfp_mask, order)
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
-	alloc_pages(gfp_mask, order)
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 #define alloc_page_vma(gfp_mask, vma, addr)			\
 	alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)
 #define alloc_page_vma_node(gfp_mask, vma, addr, node)		\
 	alloc_pages_vma(gfp_mask, 0, vma, addr, node, false)
+#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
+	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 
 extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
 extern unsigned long get_zeroed_page(gfp_t gfp_mask);
-- 
2.21.0



* [PATCHv2 05/59] mm/page_alloc: Handle allocation for encrypted memory
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 04/59] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 06/59] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
                   ` (53 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

For encrypted memory, we need to allocate pages for a specific
encryption KeyID.

There are two cases when we need to allocate a page for encryption:

 - Allocation for an encrypted VMA;

 - Allocation for migration of an encrypted page.

The first case can be covered within alloc_page_vma(). We know the KeyID
from the VMA.

The second case requires a few new page allocation routines that
allocate the page for a specific KeyID.

An encrypted page has to be cleared after its KeyID is set. This is
handled in prep_encrypted_page(), which will be provided by
arch-specific code.

Any custom allocator that deals with encrypted pages has to call
prep_encrypted_page() too. See compaction_alloc() for instance.
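
The ordering can be sketched in user space as follows. The deferral
helper mirrors deferred_page_zero() from this patch; everything else
(MODEL_GFP_ZERO, struct model_page, the stand-in allocator and the body
of model_prep_encrypted_page()) is an assumption made for illustration,
since the generic prep_encrypted_page() here is an empty stub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_GFP_ZERO	0x100u	/* hypothetical stand-in for __GFP_ZERO */
#define MODEL_PAGE_SIZE	64	/* shortened page, for illustration only */

struct model_page {
	int keyid;
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Same logic as deferred_page_zero(): strip the flag, remember it. */
static bool model_deferred_page_zero(int keyid, unsigned int *gfp_mask)
{
	if (keyid && (*gfp_mask & MODEL_GFP_ZERO)) {
		*gfp_mask &= ~MODEL_GFP_ZERO;
		return true;
	}

	return false;
}

/* Stand-in allocator: hands out a KeyID-0 page, zeroed only on request. */
static void model_alloc(struct model_page *page, unsigned int gfp_mask)
{
	page->keyid = 0;
	memset(page->data, (gfp_mask & MODEL_GFP_ZERO) ? 0 : 0xAA,
	       sizeof(page->data));
}

/* Assumed arch behaviour: set the KeyID first, then clear if asked. */
static void model_prep_encrypted_page(struct model_page *page, int keyid,
				      bool zero)
{
	page->keyid = keyid;
	if (zero)
		memset(page->data, 0, sizeof(page->data));
}

/* Shape shared by the *_keyid() allocation routines in this patch. */
static void model_alloc_keyid(struct model_page *page,
			      unsigned int gfp_mask, int keyid)
{
	bool deferred_zero;

	assert(page != NULL);
	deferred_zero = model_deferred_page_zero(keyid, &gfp_mask);
	model_alloc(page, gfp_mask);
	model_prep_encrypted_page(page, keyid, deferred_zero);
}
```

For KeyID 0 the zeroing flag is left alone and the allocator clears the
page as usual. For a non-zero KeyID the allocator skips zeroing and the
page is cleared only once the KeyID is in place.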

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/gfp.h     | 50 +++++++++++++++++++++++++---
 include/linux/migrate.h | 14 ++++++--
 mm/compaction.c         |  3 ++
 mm/mempolicy.c          | 27 +++++++++++----
 mm/migrate.c            |  4 +--
 mm/page_alloc.c         | 74 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 155 insertions(+), 17 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 3d4cb9fea417..014aef082821 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -463,16 +463,48 @@ static inline void arch_free_page(struct page *page, int order) { }
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
 
+#ifndef prep_encrypted_page
+/*
+ * An architecture may override the helper to prepare the page
+ * to be used with a specific KeyID. To be called on encrypted
+ * page allocation.
+ */
+static inline void prep_encrypted_page(struct page *page, int order,
+		int keyid, bool zero)
+{
+}
+#endif
+
+/*
+ * Encrypted page has to be cleared once keyid is set, not on allocation.
+ */
+static inline bool deferred_page_zero(int keyid, gfp_t *gfp_mask)
+{
+	if (keyid && (*gfp_mask & __GFP_ZERO)) {
+		*gfp_mask &= ~__GFP_ZERO;
+		return true;
+	}
+
+	return false;
+}
+
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 							nodemask_t *nodemask);
 
+struct page *
+__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask, int keyid);
+
 static inline struct page *
 __alloc_pages(gfp_t gfp_mask, unsigned int order, int preferred_nid)
 {
 	return __alloc_pages_nodemask(gfp_mask, order, preferred_nid, NULL);
 }
 
+struct page *__alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order);
+
 /*
  * Allocate pages, preferring the node given as nid. The node must be valid and
  * online. For more general interface, see alloc_pages_node().
@@ -500,6 +532,19 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 	return __alloc_pages_node(nid, gfp_mask, order);
 }
 
+static inline struct page *alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order)
+{
+	if (nid == NUMA_NO_NODE)
+		nid = numa_mem_id();
+
+	return __alloc_pages_node_keyid(nid, keyid, gfp_mask, order);
+}
+
+extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
+			struct vm_area_struct *vma, unsigned long addr,
+			int node, bool hugepage);
+
 #ifdef CONFIG_NUMA
 extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
 
@@ -508,14 +553,9 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	return alloc_pages_current(gfp_mask, order);
 }
-extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
-			struct vm_area_struct *vma, unsigned long addr,
-			int node, bool hugepage);
 #else
 #define alloc_pages(gfp_mask, order) \
 		alloc_pages_node(numa_node_id(), gfp_mask, order)
-#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
-	alloc_pages(gfp_mask, order)
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 #define alloc_page_vma(gfp_mask, vma, addr)			\
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 7f04754c7f2b..a68516271c40 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -38,9 +38,16 @@ static inline struct page *new_page_nodemask(struct page *page,
 	unsigned int order = 0;
 	struct page *new_page = NULL;
 
-	if (PageHuge(page))
+	if (PageHuge(page)) {
+		/*
+		 * HugeTLB doesn't support encryption. We shouldn't see
+		 * such pages.
+		 */
+		if (WARN_ON_ONCE(page_keyid(page)))
+			return NULL;
 		return alloc_huge_page_nodemask(page_hstate(compound_head(page)),
 				preferred_nid, nodemask);
+	}
 
 	if (PageTransHuge(page)) {
 		gfp_mask |= GFP_TRANSHUGE;
@@ -50,8 +57,9 @@ static inline struct page *new_page_nodemask(struct page *page,
 	if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
 		gfp_mask |= __GFP_HIGHMEM;
 
-	new_page = __alloc_pages_nodemask(gfp_mask, order,
-				preferred_nid, nodemask);
+	/* Allocate a page with the same KeyID as the source page */
+	new_page = __alloc_pages_nodemask_keyid(gfp_mask, order,
+				preferred_nid, nodemask, page_keyid(page));
 
 	if (new_page && PageTransHuge(new_page))
 		prep_transhuge_page(new_page);
diff --git a/mm/compaction.c b/mm/compaction.c
index 9e1b9acb116b..874af83214b7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1559,6 +1559,9 @@ static struct page *compaction_alloc(struct page *migratepage,
 	list_del(&freepage->lru);
 	cc->nr_freepages--;
 
+	/* Prepare the page using the same KeyID as the source page */
+	if (freepage)
+		prep_encrypted_page(freepage, 0, page_keyid(migratepage), false);
 	return freepage;
 }
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 14ee933b1ff7..f79b4fa08c30 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -961,22 +961,29 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
 /* page allocation callback for NUMA node migration */
 struct page *alloc_new_node_page(struct page *page, unsigned long node)
 {
-	if (PageHuge(page))
+	if (PageHuge(page)) {
+		/*
+		 * HugeTLB doesn't support encryption. We shouldn't see
+		 * such pages.
+		 */
+		if (WARN_ON_ONCE(page_keyid(page)))
+			return NULL;
 		return alloc_huge_page_node(page_hstate(compound_head(page)),
 					node);
-	else if (PageTransHuge(page)) {
+	} else if (PageTransHuge(page)) {
 		struct page *thp;
 
-		thp = alloc_pages_node(node,
+		thp = alloc_pages_node_keyid(node, page_keyid(page),
 			(GFP_TRANSHUGE | __GFP_THISNODE),
 			HPAGE_PMD_ORDER);
 		if (!thp)
 			return NULL;
 		prep_transhuge_page(thp);
 		return thp;
-	} else
-		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
-						    __GFP_THISNODE, 0);
+	} else {
+		return __alloc_pages_node_keyid(node, page_keyid(page),
+				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
+	}
 }
 
 /*
@@ -2053,9 +2060,13 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 {
 	struct mempolicy *pol;
 	struct page *page;
-	int preferred_nid;
+	bool deferred_zero;
+	int keyid, preferred_nid;
 	nodemask_t *nmask;
 
+	keyid = vma_keyid(vma);
+	deferred_zero = deferred_page_zero(keyid, &gfp);
+
 	pol = get_vma_policy(vma, addr);
 
 	if (pol->mode == MPOL_INTERLEAVE) {
@@ -2097,6 +2108,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 	page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
 	mpol_cond_put(pol);
 out:
+	if (page)
+		prep_encrypted_page(page, order, keyid, deferred_zero);
 	return page;
 }
 EXPORT_SYMBOL(alloc_pages_vma);
diff --git a/mm/migrate.c b/mm/migrate.c
index 8992741f10aa..c1b88eae71d8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1873,7 +1873,7 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 	int nid = (int) data;
 	struct page *newpage;
 
-	newpage = __alloc_pages_node(nid,
+	newpage = __alloc_pages_node_keyid(nid, page_keyid(page),
 					 (GFP_HIGHUSER_MOVABLE |
 					  __GFP_THISNODE | __GFP_NOMEMALLOC |
 					  __GFP_NORETRY | __GFP_NOWARN) &
@@ -1999,7 +1999,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	int page_lru = page_is_file_cache(page);
 	unsigned long start = address & HPAGE_PMD_MASK;
 
-	new_page = alloc_pages_node(node,
+	new_page = alloc_pages_node_keyid(node, page_keyid(page),
 		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
 		HPAGE_PMD_ORDER);
 	if (!new_page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 272c6de1bf4e..963f959350e4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4046,6 +4046,53 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
 }
 #endif /* CONFIG_COMPACTION */
 
+#ifndef CONFIG_NUMA
+struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
+		struct vm_area_struct *vma, unsigned long addr,
+		int node, bool hugepage)
+{
+	struct page *page;
+	bool deferred_zero;
+	int keyid = vma_keyid(vma);
+
+	deferred_zero = deferred_page_zero(keyid, &gfp_mask);
+	page = alloc_pages(gfp_mask, order);
+	if (page)
+		prep_encrypted_page(page, order, keyid, deferred_zero);
+
+	return page;
+}
+#endif
+
+/**
+ * __alloc_pages_node_keyid - allocate a page for a specific KeyID with
+ * preferred allocation node.
+ * @nid: the preferred node ID where memory should be allocated
+ * @keyid: KeyID to use
+ * @gfp_mask: GFP flags for the allocation
+ * @order: the page order
+ *
+ * Like __alloc_pages_node(), but prepares the page for a specific KeyID.
+ *
+ * Return: pointer to the allocated page or %NULL in case of error.
+ */
+struct page * __alloc_pages_node_keyid(int nid, int keyid,
+		gfp_t gfp_mask, unsigned int order)
+{
+	struct page *page;
+	bool deferred_zero;
+
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+	VM_WARN_ON(!node_online(nid));
+
+	deferred_zero = deferred_page_zero(keyid, &gfp_mask);
+	page = __alloc_pages(gfp_mask, order, nid);
+	if (page)
+		prep_encrypted_page(page, order, keyid, deferred_zero);
+
+	return page;
+}
+
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map __fs_reclaim_map =
 	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
@@ -4757,6 +4804,33 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
 
+/**
+ * __alloc_pages_nodemask_keyid - allocate a page for a specific KeyID.
+ * @gfp_mask: GFP flags for the allocation
+ * @order: the page order
+ * @preferred_nid: the preferred node ID where memory should be allocated
+ * @nodemask: allowed nodemask
+ * @keyid: KeyID to use
+ *
+ * Like __alloc_pages_nodemask(), but prepares the page for a specific KeyID.
+ *
+ * Return: pointer to the allocated page or %NULL in case of error.
+ */
+struct page *
+__alloc_pages_nodemask_keyid(gfp_t gfp_mask, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask, int keyid)
+{
+	struct page *page;
+	bool deferred_zero;
+
+	deferred_zero = deferred_page_zero(keyid, &gfp_mask);
+	page = __alloc_pages_nodemask(gfp_mask, order, preferred_nid, nodemask);
+	if (page)
+		prep_encrypted_page(page, order, keyid, deferred_zero);
+	return page;
+}
+EXPORT_SYMBOL(__alloc_pages_nodemask_keyid);
+
 /*
  * Common helper functions. Never use with __GFP_HIGHMEM because the returned
  * address cannot represent highmem pages. Use alloc_pages and then kmap if
-- 
2.21.0



* [PATCHv2 06/59] mm/khugepaged: Handle encrypted pages
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 05/59] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 07/59] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
                   ` (52 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

For !NUMA, khugepaged allocates the page in advance, before we have
found a VMA to collapse. We don't yet know which KeyID to use for the
allocation.

The page is allocated with KeyID-0. Once we know that the VMA is
suitable for collapsing, we prepare the page for the KeyID we need,
based on vma_keyid().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/khugepaged.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index eaaa21b23215..ae9bd3b18aa1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1059,6 +1059,16 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	anon_vma_unlock_write(vma->anon_vma);
 
+	/*
+	 * At this point new_page is allocated as non-encrypted.
+	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
+	 * before copying data.
+	 */
+	if (vma_keyid(vma)) {
+		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
+				vma_keyid(vma), false);
+	}
+
 	__collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl);
 	pte_unmap(pte);
 	__SetPageUptodate(new_page);
-- 
2.21.0



* [PATCHv2 07/59] x86/mm: Mask out KeyID bits from page table entry pfn
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 06/59] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 08/59] x86/mm: Introduce helpers to read number, shift and mask of KeyIDs Kirill A. Shutemov
                   ` (51 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

MKTME claims several upper bits of the physical address in a page table
entry to encode the KeyID. This effectively shrinks the number of bits
available for the physical address. We should exclude the KeyID bits
from physical addresses.

For instance, if a CPU enumerates 52 physical address bits and the
number of bits claimed for KeyID is 6, bits 51:46 must not be treated
as part of the physical address.

This patch adjusts __PHYSICAL_MASK during MKTME enumeration.
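
The arithmetic for the 52/6 example can be checked with a small
user-space sketch. model_genmask_ull() is a hand-written equivalent of
the kernel's GENMASK_ULL(); the other model_* names are hypothetical,
and the starting mask is assumed to cover exactly the enumerated
physical address bits.

```c
#include <assert.h>
#include <stdint.h>

/* User-space equivalent of GENMASK_ULL(h, l): bits h..l set. */
static uint64_t model_genmask_ull(unsigned int h, unsigned int l)
{
	assert(l <= h && h < 64);
	return (~UINT64_C(0) >> (63 - h)) & (~UINT64_C(0) << l);
}

/* Bits claimed for KeyID: the top keyid_bits of the physical address. */
static uint64_t model_keyid_mask(unsigned int phys_bits,
				 unsigned int keyid_bits)
{
	assert(keyid_bits > 0 && keyid_bits < phys_bits);
	return model_genmask_ull(phys_bits - 1, phys_bits - keyid_bits);
}

/* Physical address mask with the KeyID bits removed. */
static uint64_t model_physical_mask(unsigned int phys_bits,
				    unsigned int keyid_bits)
{
	uint64_t mask = model_genmask_ull(phys_bits - 1, 0);

	return mask & ~model_keyid_mask(phys_bits, keyid_bits);
}
```

With 52 physical address bits and 6 KeyID bits, the KeyID mask is
0x000FC00000000000 (bits 51:46) and the remaining physical mask is
0x00003FFFFFFFFFFF (bits 45:0).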

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/intel.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8d6d92ebeb54..f03eee666761 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -616,6 +616,29 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		mktme_status = MKTME_ENABLED;
 	}
 
+#ifdef CONFIG_X86_INTEL_MKTME
+	if (mktme_status == MKTME_ENABLED && nr_keyids) {
+		/*
+		 * Mask out bits claimed from KeyID from physical address mask.
+		 *
+		 * For instance, if a CPU enumerates 52 physical address bits
+		 * and number of bits claimed for KeyID is 6, bits 51:46 of
+		 * physical address is unusable.
+		 */
+		phys_addr_t keyid_mask;
+
+		keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
+		physical_mask &= ~keyid_mask;
+	} else {
+		/*
+		 * Reset __PHYSICAL_MASK.
+		 * Maybe needed if there's inconsistent configuation
+		 * between CPUs.
+		 */
+		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+	}
+#endif
+
 	/*
 	 * KeyID bits effectively lower the number of physical address
 	 * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
-- 
2.21.0



* [PATCHv2 08/59] x86/mm: Introduce helpers to read number, shift and mask of KeyIDs
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 07/59] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 09/59] x86/mm: Store bitmask of the encryption algorithms supported by MKTME Kirill A. Shutemov
                   ` (50 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

mktme_nr_keyids() returns the number of KeyIDs available for MKTME,
excluding KeyID zero, which is used by TME. MKTME KeyIDs start from 1.

mktme_keyid_shift() returns the shift of the KeyID within a physical
address.

mktme_keyid_mask() returns the mask to extract the KeyID from a physical
address.
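
As a rough illustration of how the three values fit together, the
following user-space sketch encodes and extracts a KeyID from a physical
address. struct model_mktme and the model_* functions are hypothetical
and only mirror the meaning of the helpers. The value 63 for the number
of KeyIDs is an assumption for a 6-bit KeyID field; the patch takes the
real number from CPU enumeration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bundle of the three values exposed by the helpers. */
struct model_mktme {
	uint64_t keyid_mask;	/* like mktme_keyid_mask() */
	int keyid_shift;	/* like mktme_keyid_shift() */
	int nr_keyids;		/* like mktme_nr_keyids(), excludes KeyID-0 */
};

static struct model_mktme model_mktme_init(int phys_bits, int keyid_bits,
					   int nr_keyids)
{
	struct model_mktme m;

	assert(keyid_bits > 0 && keyid_bits < phys_bits && phys_bits <= 64);
	m.keyid_shift = phys_bits - keyid_bits;
	m.keyid_mask = ((UINT64_C(1) << keyid_bits) - 1) << m.keyid_shift;
	m.nr_keyids = nr_keyids;
	return m;
}

/* Extract the KeyID encoded in a physical address. */
static int model_paddr_keyid(const struct model_mktme *m, uint64_t paddr)
{
	return (int)((paddr & m->keyid_mask) >> m->keyid_shift);
}

/* Encode a KeyID into a physical address, replacing any previous one. */
static uint64_t model_paddr_set_keyid(const struct model_mktme *m,
				      uint64_t paddr, int keyid)
{
	assert(keyid >= 0 && keyid <= m->nr_keyids);
	return (paddr & ~m->keyid_mask) |
	       ((uint64_t)keyid << m->keyid_shift);
}
```

For 52 physical address bits and 6 KeyID bits the shift is 46. Setting
KeyID 5 on address 0x1000 and extracting it again yields 5, and an
address without KeyID bits reads back as KeyID 0.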

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 19 +++++++++++++++++++
 arch/x86/kernel/cpu/intel.c  | 15 ++++++++++++---
 arch/x86/mm/Makefile         |  2 ++
 arch/x86/mm/mktme.c          | 27 +++++++++++++++++++++++++++
 4 files changed, 60 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/include/asm/mktme.h
 create mode 100644 arch/x86/mm/mktme.c

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
new file mode 100644
index 000000000000..b9ba2ea5b600
--- /dev/null
+++ b/arch/x86/include/asm/mktme.h
@@ -0,0 +1,19 @@
+#ifndef	_ASM_X86_MKTME_H
+#define	_ASM_X86_MKTME_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_X86_INTEL_MKTME
+extern phys_addr_t __mktme_keyid_mask;
+extern phys_addr_t mktme_keyid_mask(void);
+extern int __mktme_keyid_shift;
+extern int mktme_keyid_shift(void);
+extern int __mktme_nr_keyids;
+extern int mktme_nr_keyids(void);
+#else
+#define mktme_keyid_mask()	((phys_addr_t)0)
+#define mktme_nr_keyids()	0
+#define mktme_keyid_shift()	0
+#endif
+
+#endif
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index f03eee666761..7ba44825be42 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -618,6 +618,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
 
 #ifdef CONFIG_X86_INTEL_MKTME
 	if (mktme_status == MKTME_ENABLED && nr_keyids) {
+		__mktme_nr_keyids = nr_keyids;
+		__mktme_keyid_shift = c->x86_phys_bits - keyid_bits;
+
 		/*
 		 * Mask out bits claimed from KeyID from physical address mask.
 		 *
@@ -625,17 +628,23 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		 * and number of bits claimed for KeyID is 6, bits 51:46 of
 		 * physical address is unusable.
 		 */
-		phys_addr_t keyid_mask;
+		__mktme_keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, mktme_keyid_shift());
+		physical_mask &= ~mktme_keyid_mask();
 
-		keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
-		physical_mask &= ~keyid_mask;
 	} else {
 		/*
 		 * Reset __PHYSICAL_MASK.
 		 * Maybe needed if there's inconsistent configuation
 		 * between CPUs.
+		 *
+		 * FIXME: broken for hotplug.
+		 * We must not allow onlining secondary CPUs with non-matching
+		 * configuration.
 		 */
 		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+		__mktme_keyid_mask = 0;
+		__mktme_keyid_shift = 0;
+		__mktme_nr_keyids = 0;
 	}
 #endif
 
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 84373dc9b341..600d18691876 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= pti.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
+
+obj-$(CONFIG_X86_INTEL_MKTME)	+= mktme.o
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
new file mode 100644
index 000000000000..0f48ef2720cc
--- /dev/null
+++ b/arch/x86/mm/mktme.c
@@ -0,0 +1,27 @@
+#include <asm/mktme.h>
+
+/* Mask to extract KeyID from physical address. */
+phys_addr_t __mktme_keyid_mask;
+phys_addr_t mktme_keyid_mask(void)
+{
+	return __mktme_keyid_mask;
+}
+EXPORT_SYMBOL_GPL(mktme_keyid_mask);
+
+/* Shift of KeyID within physical address. */
+int __mktme_keyid_shift;
+int mktme_keyid_shift(void)
+{
+	return __mktme_keyid_shift;
+}
+EXPORT_SYMBOL_GPL(mktme_keyid_shift);
+
+/*
+ * Number of KeyIDs available for MKTME.
+ * Excludes KeyID-0, which is used by TME. MKTME KeyIDs start from 1.
+ */
+int __mktme_nr_keyids;
+int mktme_nr_keyids(void)
+{
+	return __mktme_nr_keyids;
+}
-- 
2.21.0


* [PATCHv2 09/59] x86/mm: Store bitmask of the encryption algorithms supported by MKTME
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 08/59] x86/mm: Introduce helpers to read number, shift and mask of KeyIDs Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 10/59] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
                   ` (49 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Store a bitmask of the supported encryption algorithms in 'mktme_algs'.
It will be used by the key management service.
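
A minimal user-space sketch of the masking this patch performs, assuming
the field layout and the single known algorithm bit defined in the diff.
The helper name known_algs() is hypothetical and only serves the
illustration:

```c
#include <stdint.h>

/* Field layout and algorithm bit as defined in the patch. */
#define TME_ACTIVATE_CRYPTO_ALGS(x)	(((x) >> 48) & 0xffff)	/* Bits 63:48 */
#define TME_ACTIVATE_CRYPTO_AES_XTS_128	1
#define TME_ACTIVATE_CRYPTO_KNOWN_ALGS	TME_ACTIVATE_CRYPTO_AES_XTS_128

/*
 * Hypothetical helper (not in the patch): reduce a raw TME_ACTIVATE value
 * to the algorithms that the kernel also knows about.
 */
static unsigned int known_algs(uint64_t tme_activate)
{
	return TME_ACTIVATE_CRYPTO_ALGS(tme_activate) &
	       TME_ACTIVATE_CRYPTO_KNOWN_ALGS;
}
```

A result of zero corresponds to the "No known encryption algorithm is
supported" error path; any other value is what ends up in 'mktme_algs'.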

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 2 ++
 arch/x86/kernel/cpu/intel.c  | 6 +++++-
 arch/x86/mm/mktme.c          | 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index b9ba2ea5b600..42a3b1b44669 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -10,6 +10,8 @@ extern int __mktme_keyid_shift;
 extern int mktme_keyid_shift(void);
 extern int __mktme_nr_keyids;
 extern int mktme_nr_keyids(void);
+extern unsigned int mktme_algs;
+
 #else
 #define mktme_keyid_mask()	((phys_addr_t)0)
 #define mktme_nr_keyids()	0
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 7ba44825be42..991bdcb2a55a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -553,6 +553,8 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
 #define TME_ACTIVATE_CRYPTO_ALGS(x)	((x >> 48) & 0xffff)	/* Bits 63:48 */
 #define TME_ACTIVATE_CRYPTO_AES_XTS_128	1
 
+#define TME_ACTIVATE_CRYPTO_KNOWN_ALGS	TME_ACTIVATE_CRYPTO_AES_XTS_128
+
 /* Values for mktme_status (SW only construct) */
 #define MKTME_ENABLED			0
 #define MKTME_DISABLED			1
@@ -596,7 +598,7 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy);
 
 	tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
-	if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) {
+	if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_KNOWN_ALGS)) {
 		pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n",
 				tme_crypto_algs);
 		mktme_status = MKTME_DISABLED;
@@ -631,6 +633,8 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		__mktme_keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, mktme_keyid_shift());
 		physical_mask &= ~mktme_keyid_mask();
 
+		tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
+		mktme_algs = tme_crypto_algs & TME_ACTIVATE_CRYPTO_KNOWN_ALGS;
 	} else {
 		/*
 		 * Reset __PHYSICAL_MASK.
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 0f48ef2720cc..755afc6935b5 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -25,3 +25,5 @@ int mktme_nr_keyids(void)
 {
 	return __mktme_nr_keyids;
 }
+
+unsigned int mktme_algs;
-- 
2.21.0


* [PATCHv2 10/59] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (8 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 09/59] x86/mm: Store bitmask of the encryption algorithms supported by MKTME Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 11/59] x86/mm: Detect MKTME early Kirill A. Shutemov
                   ` (48 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

An encrypted VMA will have its KeyID stored in vma->vm_page_prot. This
way we don't need to do anything special to set up encrypted page table
entries and don't need to reserve space for the KeyID in a VMA.

This patch changes _PAGE_CHG_MASK to include KeyID bits. Otherwise they
are going to be stripped from vm_page_prot on the first pgprot_modify().

Define PTE_PFN_MASK_MAX similar to PTE_PFN_MASK but based on
__PHYSICAL_MASK_SHIFT. This way we include the whole range of bits
architecturally available for the PFN without referencing the
physical_mask and mktme_keyid_mask variables.
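
The effect can be sketched in user space. This is a simplified model,
not the kernel code: model_pte_modify() is a hypothetical stand-in for
the "keep the bits in the mask, take the rest from the new protection"
behaviour, and the value 52 for __PHYSICAL_MASK_SHIFT is an assumption
for 64-bit x86:

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT		12
#define MODEL_PHYSICAL_MASK_SHIFT	52	/* assumed __PHYSICAL_MASK_SHIFT */
#define MODEL_PAGE_MASK		(~((1ULL << MODEL_PAGE_SHIFT) - 1))

/* All bits architecturally available for PFN plus KeyID (51:12 here). */
#define MODEL_PTE_PFN_MASK_MAX \
	(MODEL_PAGE_MASK & ((1ULL << MODEL_PHYSICAL_MASK_SHIFT) - 1))

/*
 * Simplified model: bits covered by chg_mask survive from the old entry,
 * everything else comes from the new protection bits.
 */
static uint64_t model_pte_modify(uint64_t pte, uint64_t newprot,
				 uint64_t chg_mask)
{
	return (pte & chg_mask) | (newprot & ~chg_mask);
}
```

With a mask derived from a physical_mask that already excludes the KeyID
bits, a KeyID stored in bits 51:46 is dropped by the modify step. With
MODEL_PTE_PFN_MASK_MAX the same KeyID survives.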

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/pgtable_types.h | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b5e49e6bac63..c23793146759 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -116,12 +116,25 @@
 				 _PAGE_ACCESSED | _PAGE_DIRTY)
 
 /*
- * Set of bits not changed in pte_modify.  The pte's
- * protection key is treated like _PAGE_RW, for
- * instance, and is *not* included in this mask since
- * pte_modify() does modify it.
+ * Set of bits not changed in pte_modify.
+ *
+ * The pte's protection key is treated like _PAGE_RW, for instance, and is
+ * *not* included in this mask since pte_modify() does modify it.
+ *
+ * They include the physical address and the memory encryption keyID.
+ * The paddr and the keyID never occupy the same bits at the same time.
+ * But, a given bit might be used for the keyID on one system and used for
+ * the physical address on another. As an optimization, we manage them in
+ * one unit here since their combination always occupies the same hardware
+ * bits. PTE_PFN_MASK_MAX stores combined mask.
+ *
+ * Cast PAGE_MASK to a signed type so that it is sign-extended if
+ * virtual addresses are 32-bits but physical addresses are larger
+ * (ie, 32-bit PAE).
  */
-#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
+#define PTE_PFN_MASK_MAX \
+	(((signed long)PAGE_MASK) & ((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
+#define _PAGE_CHG_MASK	(PTE_PFN_MASK_MAX | _PAGE_PCD | _PAGE_PWT |		\
 			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
 			 _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
 #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
-- 
2.21.0


* [PATCHv2 11/59] x86/mm: Detect MKTME early
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (9 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 10/59] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 12/59] x86/mm: Add a helper to retrieve KeyID for a page Kirill A. Shutemov
                   ` (47 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

We need to know the number of KeyIDs before page_ext is initialized.
We are going to use page_ext to store the KeyID, and it would be handy
to avoid the page_ext allocation if there's no MKTME in the system.

page_ext initialization happens before full CPU initialization is
complete. Move the detect_tme() call to early_init_intel().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/intel.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 991bdcb2a55a..4c2d70287eb4 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -187,6 +187,8 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
 	return false;
 }
 
+static void detect_tme(struct cpuinfo_x86 *c);
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;
@@ -338,6 +340,9 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	 */
 	if (detect_extended_topology_early(c) < 0)
 		detect_ht_early(c);
+
+	if (cpu_has(c, X86_FEATURE_TME))
+		detect_tme(c);
 }
 
 #ifdef CONFIG_X86_32
@@ -793,9 +798,6 @@ static void init_intel(struct cpuinfo_x86 *c)
 	if (cpu_has(c, X86_FEATURE_VMX))
 		detect_vmx_virtcap(c);
 
-	if (cpu_has(c, X86_FEATURE_TME))
-		detect_tme(c);
-
 	init_intel_misc_features(c);
 }
 
-- 
2.21.0


* [PATCHv2 12/59] x86/mm: Add a helper to retrieve KeyID for a page
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (10 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 11/59] x86/mm: Detect MKTME early Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 13/59] x86/mm: Add a helper to retrieve KeyID for a VMA Kirill A. Shutemov
                   ` (46 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

page_ext allows storing additional per-page information without growing
the main struct page. The additional space can be requested at boot time.

Store KeyID in bits 31:16 of extended page flags. These bits are unused.

page_keyid() returns zero until page_ext is ready. page_ext initializer
enables a static branch to indicate that page_keyid() can use page_ext.
The same static branch will gate MKTME readiness in general.

We don't yet set KeyID for the page. It will come in the following
patch that implements prep_encrypted_page(). All pages have KeyID-0 for
now.

page_keyid() will be used by KVM which can be built as a module. We need
to export mktme_enabled_key to be able to inline page_keyid().
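
The overlay of the KeyID on bits 31:16 of the flags word can be checked
with a small user-space model. It assumes a little-endian 64-bit target
such as x86-64; struct page_ext_model and keyid_from_flags() are
hypothetical names that mirror, but are not, the kernel definitions:

```c
#include <stdint.h>

/*
 * User-space model of the patched struct page_ext. uint64_t stands in
 * for the kernel's 'unsigned long' on x86-64.
 */
struct page_ext_model {
	union {
		uint64_t flags;
		struct {
			uint16_t pad;	/* overlays flags bits 15:0 */
			uint16_t keyid;	/* overlays flags bits 31:16 */
		};
	};
};

/* Hypothetical helper: read the KeyID back through the flags word. */
static unsigned int keyid_from_flags(const struct page_ext_model *ext)
{
	return (unsigned int)((ext->flags >> 16) & 0xffff);
}
```

This is also why the patch adds BUILD_BUG_ON(__NR_PAGE_EXT_FLAGS > 16):
flag bits must stay within the low 16 bits so they never collide with
the KeyID.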

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 26 ++++++++++++++++++++++++++
 arch/x86/include/asm/page.h  |  1 +
 arch/x86/mm/mktme.c          | 21 +++++++++++++++++++++
 include/linux/mm.h           |  2 +-
 include/linux/page_ext.h     | 11 ++++++++++-
 mm/page_ext.c                |  3 +++
 6 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 42a3b1b44669..46041075f617 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -2,6 +2,8 @@
 #define	_ASM_X86_MKTME_H
 
 #include <linux/types.h>
+#include <linux/page_ext.h>
+#include <linux/jump_label.h>
 
 #ifdef CONFIG_X86_INTEL_MKTME
 extern phys_addr_t __mktme_keyid_mask;
@@ -12,10 +14,34 @@ extern int __mktme_nr_keyids;
 extern int mktme_nr_keyids(void);
 extern unsigned int mktme_algs;
 
+DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
+static inline bool mktme_enabled(void)
+{
+	return static_branch_unlikely(&mktme_enabled_key);
+}
+
+extern struct page_ext_operations page_mktme_ops;
+
+#define page_keyid page_keyid
+static inline int page_keyid(const struct page *page)
+{
+	if (!mktme_enabled())
+		return 0;
+
+	return lookup_page_ext(page)->keyid;
+}
+
 #else
 #define mktme_keyid_mask()	((phys_addr_t)0)
 #define mktme_nr_keyids()	0
 #define mktme_keyid_shift()	0
+
+#define page_keyid(page) 0
+
+static inline bool mktme_enabled(void)
+{
+	return false;
+}
 #endif
 
 #endif
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 7555b48803a8..39af59487d5f 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -19,6 +19,7 @@
 struct page;
 
 #include <linux/range.h>
+#include <asm/mktme.h>
 extern struct range pfn_mapped[];
 extern int nr_pfn_mapped;
 
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 755afc6935b5..48c2d4c97356 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -27,3 +27,24 @@ int mktme_nr_keyids(void)
 }
 
 unsigned int mktme_algs;
+
+DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
+EXPORT_SYMBOL_GPL(mktme_enabled_key);
+
+static bool need_page_mktme(void)
+{
+	/* Make sure keyid doesn't collide with extended page flags */
+	BUILD_BUG_ON(__NR_PAGE_EXT_FLAGS > 16);
+
+	return !!mktme_nr_keyids();
+}
+
+static void init_page_mktme(void)
+{
+	static_branch_enable(&mktme_enabled_key);
+}
+
+struct page_ext_operations page_mktme_ops = {
+	.need = need_page_mktme,
+	.init = init_page_mktme,
+};
diff --git a/include/linux/mm.h b/include/linux/mm.h
index af1a56ff6764..3f9640f388ac 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1645,7 +1645,7 @@ static inline int vma_keyid(struct vm_area_struct *vma)
 #endif
 
 #ifndef page_keyid
-static inline int page_keyid(struct page *page)
+static inline int page_keyid(const struct page *page)
 {
 	return 0;
 }
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 09592951725c..a9fa95ae9847 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -22,6 +22,7 @@ enum page_ext_flags {
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
 #endif
+	__NR_PAGE_EXT_FLAGS
 };
 
 /*
@@ -32,7 +33,15 @@ enum page_ext_flags {
  * then the page_ext for pfn always exists.
  */
 struct page_ext {
-	unsigned long flags;
+	union {
+		unsigned long flags;
+#ifdef CONFIG_X86_INTEL_MKTME
+		struct {
+			unsigned short __pad;
+			unsigned short keyid;
+		};
+#endif
+	};
 };
 
 extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 5f5769c7db3b..c52b77c13cd9 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -65,6 +65,9 @@ static struct page_ext_operations *page_ext_ops[] = {
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_X86_INTEL_MKTME
+	&page_mktme_ops,
+#endif
 };
 
 static unsigned long total_usage;
-- 
2.21.0


* [PATCHv2 13/59] x86/mm: Add a helper to retrieve KeyID for a VMA
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (11 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 12/59] x86/mm: Add a helper to retrieve KeyID for a page Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 14/59] x86/mm: Add hooks to allocate and free encrypted pages Kirill A. Shutemov
                   ` (45 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

We store the KeyID in the upper bits of vm_page_prot, matching the
position of the KeyID in a PTE. vma_keyid() extracts the KeyID from
vm_page_prot.

With the KeyID in vm_page_prot we don't need to modify any page table
helper to propagate the KeyID to page table entries.
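
As a worked example of the mask-and-shift arithmetic, here is a
user-space sketch. The model_* names are hypothetical; the numbers used
in the usage note (52 physical address bits, 6 KeyID bits) are the ones
from the comment in detect_tme():

```c
#include <stdint.h>

/* Same contract as the kernel's GENMASK_ULL(h, l): bits h..l set. */
#define MODEL_GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

static int model_keyid_shift(int phys_bits, int keyid_bits)
{
	return phys_bits - keyid_bits;
}

static uint64_t model_keyid_mask(int phys_bits, int keyid_bits)
{
	return MODEL_GENMASK_ULL(phys_bits - 1,
				 model_keyid_shift(phys_bits, keyid_bits));
}

/* Same arithmetic as __vma_keyid() in the patch, on a plain integer. */
static int model_prot_keyid(uint64_t prot, uint64_t mask, int shift)
{
	return (int)((prot & mask) >> shift);
}
```

For 52 physical bits and 6 KeyID bits the shift is 46 and the mask
covers bits 51:46, so a protection value carrying KeyID 5 in those bits
decodes back to 5.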

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 12 ++++++++++++
 arch/x86/mm/mktme.c          |  7 +++++++
 2 files changed, 19 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 46041075f617..52b115b30a42 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -5,6 +5,8 @@
 #include <linux/page_ext.h>
 #include <linux/jump_label.h>
 
+struct vm_area_struct;
+
 #ifdef CONFIG_X86_INTEL_MKTME
 extern phys_addr_t __mktme_keyid_mask;
 extern phys_addr_t mktme_keyid_mask(void);
@@ -31,6 +33,16 @@ static inline int page_keyid(const struct page *page)
 	return lookup_page_ext(page)->keyid;
 }
 
+#define vma_keyid vma_keyid
+int __vma_keyid(struct vm_area_struct *vma);
+static inline int vma_keyid(struct vm_area_struct *vma)
+{
+	if (!mktme_enabled())
+		return 0;
+
+	return __vma_keyid(vma);
+}
+
 #else
 #define mktme_keyid_mask()	((phys_addr_t)0)
 #define mktme_nr_keyids()	0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 48c2d4c97356..d02867212e33 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,3 +1,4 @@
+#include <linux/mm.h>
 #include <asm/mktme.h>
 
 /* Mask to extract KeyID from physical address. */
@@ -48,3 +49,9 @@ struct page_ext_operations page_mktme_ops = {
 	.need = need_page_mktme,
 	.init = init_page_mktme,
 };
+
+int __vma_keyid(struct vm_area_struct *vma)
+{
+	pgprotval_t prot = pgprot_val(vma->vm_page_prot);
+	return (prot & mktme_keyid_mask()) >> mktme_keyid_shift();
+}
-- 
2.21.0


* [PATCHv2 14/59] x86/mm: Add hooks to allocate and free encrypted pages
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (12 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 13/59] x86/mm: Add a helper to retrieve KeyID for a VMA Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 15/59] x86/mm: Map zero pages into encrypted mappings correctly Kirill A. Shutemov
                   ` (44 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Hook into the page allocator to allocate and free encrypted pages
properly.

The hardware/CPU does not enforce coherency between mappings of the same
physical page with different KeyIDs or encryption keys.
We are responsible for cache management.

Flush the cache on allocating an encrypted page and on returning the
page to the free pool.

prep_encrypted_page() also takes care of zeroing the page. We have to
do this after the KeyID is set for the page.

The patch relies on page_address() to return the virtual address of the
page mapping with the current KeyID. It will be implemented later in the
patchset.
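
The ordering of the steps (flush, switch KeyID, then zero) can be
modelled in user space. This is only a sketch: struct model_page,
model_flush() and the tiny page size are hypothetical stand-ins, and the
"flush" merely counts calls instead of touching any cache:

```c
#include <string.h>

#define MODEL_PAGE_SIZE 64	/* tiny stand-in for PAGE_SIZE */

/* Hypothetical stand-in for struct page plus its page_ext KeyID. */
struct model_page {
	int keyid;
	int flushes;	/* how many times this page was flushed */
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Stand-in for clflush_cache_range(); only counts calls. */
static void model_flush(struct model_page *page, int order)
{
	int i;

	for (i = 0; i < (1 << order); i++)
		page[i].flushes++;
}

static void model_prep_encrypted_page(struct model_page *page, int order,
				      int keyid, int zero)
{
	int i;

	if (!keyid)
		return;		/* KeyID-0 pages need no extra work */

	/* 1. Flush lines cached through the old (KeyID-0) mapping. */
	model_flush(page, order);

	for (i = 0; i < (1 << order); i++) {
		/* 2. Switch the page to the new KeyID. */
		page[i].keyid = keyid;
		/* 3. Only now zero it, through the new KeyID. */
		if (zero)
			memset(page[i].data, 0, MODEL_PAGE_SIZE);
	}
}

static void model_free_encrypted_page(struct model_page *page, int order)
{
	int i;

	if (!page->keyid)
		return;

	/* Flush lines cached through the non-zero KeyID mapping. */
	model_flush(page, order);

	for (i = 0; i < (1 << order); i++)
		page[i].keyid = 0;	/* back to KeyID-0 */
}
```

Zeroing before the KeyID switch would write the zeroes through the
KeyID-0 mapping, so the page would not read back as zero through the new
KeyID. That is the reason for step 3 coming last.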

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 17 ++++++++
 arch/x86/mm/mktme.c          | 83 ++++++++++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 52b115b30a42..a61b45fca4b1 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -43,6 +43,23 @@ static inline int vma_keyid(struct vm_area_struct *vma)
 	return __vma_keyid(vma);
 }
 
+#define prep_encrypted_page prep_encrypted_page
+void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero);
+static inline void prep_encrypted_page(struct page *page, int order,
+		int keyid, bool zero)
+{
+	if (keyid)
+		__prep_encrypted_page(page, order, keyid, zero);
+}
+
+#define HAVE_ARCH_FREE_PAGE
+void free_encrypted_page(struct page *page, int order);
+static inline void arch_free_page(struct page *page, int order)
+{
+	if (page_keyid(page))
+		free_encrypted_page(page, order);
+}
+
 #else
 #define mktme_keyid_mask()	((phys_addr_t)0)
 #define mktme_nr_keyids()	0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index d02867212e33..8015e7822c9b 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,4 +1,5 @@
 #include <linux/mm.h>
+#include <linux/highmem.h>
 #include <asm/mktme.h>
 
 /* Mask to extract KeyID from physical address. */
@@ -55,3 +56,85 @@ int __vma_keyid(struct vm_area_struct *vma)
 	pgprotval_t prot = pgprot_val(vma->vm_page_prot);
 	return (prot & mktme_keyid_mask()) >> mktme_keyid_shift();
 }
+
+/* Prepare page to be used for encryption. Called from page allocator. */
+void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
+{
+	int i;
+
+	/*
+	 * The hardware/CPU does not enforce coherency between mappings
+	 * of the same physical page with different KeyIDs or
+	 * encryption keys. We are responsible for cache management.
+	 *
+	 * Flush cache lines with KeyID-0. page_address() returns virtual
+	 * address of the page mapping with the current (zero) KeyID.
+	 */
+	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
+
+	for (i = 0; i < (1 << order); i++) {
+		/* All pages coming out of the allocator should have KeyID 0 */
+		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
+
+		/*
+		 * Change KeyID. From now on page_address() will return address
+		 * of the page mapping with the new KeyID.
+		 *
+		 * We don't need barrier() before the KeyID change because
+		 * clflush_cache_range() above stops compiler from reordering
+		 * past the point with mb().
+		 *
+		 * And we don't need a barrier() after the assignment because
+		 * any future reference of KeyID (i.e. from page_address())
+		 * will create address dependency and compiler is not allowed to
+		 * mess with this.
+		 */
+		lookup_page_ext(page)->keyid = keyid;
+
+		/* Clear the page after the KeyID is set. */
+		if (zero)
+			clear_highpage(page);
+
+		page++;
+	}
+}
+
+/*
+ * Handles freeing of encrypted page.
+ * Called from page allocator on freeing encrypted page.
+ */
+void free_encrypted_page(struct page *page, int order)
+{
+	int i;
+
+	/*
+	 * The hardware/CPU does not enforce coherency between mappings
+	 * of the same physical page with different KeyIDs or
+	 * encryption keys. We are responsible for cache management.
+	 *
+	 * Flush cache lines with non-0 KeyID. page_address() returns virtual
+	 * address of the page mapping with the current (non-zero) KeyID.
+	 */
+	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
+
+	for (i = 0; i < (1 << order); i++) {
+		/* Check if the page has reasonable KeyID */
+		WARN_ON_ONCE(!lookup_page_ext(page)->keyid);
+		WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids());
+
+		/*
+		 * Switch the page back to zero KeyID.
+		 *
+		 * We don't need barrier() before the KeyID change because
+		 * clflush_cache_range() above stops compiler from reordering
+		 * past the point with mb().
+		 *
+		 * And we don't need a barrier() after the assignment because
+		 * any future reference of KeyID (i.e. from page_address())
+		 * will create address dependency and compiler is not allowed to
+		 * mess with this.
+		 */
+		lookup_page_ext(page)->keyid = 0;
+		page++;
+	}
+}
-- 
2.21.0


* [PATCHv2 15/59] x86/mm: Map zero pages into encrypted mappings correctly
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (13 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 14/59] x86/mm: Add hooks to allocate and free encrypted pages Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 16/59] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
                   ` (43 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Zero pages are never encrypted. Keep KeyID-0 for them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/pgtable.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 0bc530c4eb13..f0dd80a920a9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -820,6 +820,19 @@ static inline unsigned long pmd_index(unsigned long address)
  */
 #define mk_pte(page, pgprot)   pfn_pte(page_to_pfn(page), (pgprot))
 
+#define mk_zero_pte mk_zero_pte
+static inline pte_t mk_zero_pte(unsigned long addr, pgprot_t prot)
+{
+	extern unsigned long zero_pfn;
+	pte_t entry;
+
+	prot.pgprot &= ~mktme_keyid_mask();
+	entry = pfn_pte(zero_pfn, prot);
+	entry = pte_mkspecial(entry);
+
+	return entry;
+}
+
 /*
  * the pte page can be thought of an array like this: pte_t[PTRS_PER_PTE]
  *
@@ -1153,6 +1166,12 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
 
 #define mk_pmd(page, pgprot)   pfn_pmd(page_to_pfn(page), (pgprot))
 
+#define mk_zero_pmd(zero_page, prot)					\
+({									\
+	prot.pgprot &= ~mktme_keyid_mask();				\
+	pmd_mkhuge(mk_pmd(zero_page, prot));				\
+})
+
 #define  __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
 extern int pmdp_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pmd_t *pmdp,
-- 
2.21.0


* [PATCHv2 16/59] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (14 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 15/59] x86/mm: Map zero pages into encrypted mappings correctly Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 17/59] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
                   ` (42 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Rename the option to CONFIG_MEMORY_PHYSICAL_PADDING. It will be used
not only for KASLR.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/Kconfig    | 2 +-
 arch/x86/mm/kaslr.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 222855cc0158..2eb2867db5fa 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2214,7 +2214,7 @@ config RANDOMIZE_MEMORY
 
 	   If unsure, say Y.
 
-config RANDOMIZE_MEMORY_PHYSICAL_PADDING
+config MEMORY_PHYSICAL_PADDING
 	hex "Physical memory mapping padding" if EXPERT
 	depends on RANDOMIZE_MEMORY
 	default "0xa" if MEMORY_HOTPLUG
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index dc6182eecefa..580b82c2621b 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -104,7 +104,7 @@ void __init kernel_randomize_memory(void)
 	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
 	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+		CONFIG_MEMORY_PHYSICAL_PADDING;
 
 	/* Adapt phyiscal memory region size based on available memory */
 	if (memory_tb < kaslr_regions[0].size_tb)
-- 
2.21.0


* [PATCHv2 17/59] x86/mm: Allow to disable MKTME after enumeration
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (15 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 16/59] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 18/59] x86/mm: Calculate direct mapping size Kirill A. Shutemov
                   ` (41 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

The new helper mktme_disable() allows disabling MKTME even if it was
enumerated successfully. MKTME initialization may fail, and this
functionality allows the system to boot regardless of the failure.

MKTME needs a per-KeyID direct mapping. It requires a lot more virtual
address space, which may be a problem in 4-level paging mode. If the
system has more physical memory than we can handle with MKTME, the
helper lets us fail MKTME but still boot the system successfully.
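
A user-space sketch of the state that mktme_disable() resets. struct
mktme_state and the model_* functions are hypothetical; the value 52 for
__PHYSICAL_MASK_SHIFT is an assumption for 64-bit x86, and the static
branch handling is left out:

```c
#include <stdint.h>

#define MODEL_PHYSICAL_MASK_SHIFT 52	/* assumed __PHYSICAL_MASK_SHIFT */

/* Hypothetical bundle of the globals touched by enable and disable. */
struct mktme_state {
	uint64_t physical_mask;
	uint64_t keyid_mask;
	int keyid_shift;
	int nr_keyids;
};

/* Mirrors the success path of detect_tme() for a hypothetical CPU. */
static void model_enable(struct mktme_state *s, int phys_bits,
			 int keyid_bits, int nr_keyids)
{
	s->nr_keyids = nr_keyids;
	s->keyid_shift = phys_bits - keyid_bits;
	s->keyid_mask = ((1ULL << keyid_bits) - 1) << s->keyid_shift;
	s->physical_mask = ((1ULL << MODEL_PHYSICAL_MASK_SHIFT) - 1) &
			   ~s->keyid_mask;
}

/* Mirrors mktme_disable(): give the KeyID bits back to the address. */
static void model_disable(struct mktme_state *s)
{
	s->physical_mask = (1ULL << MODEL_PHYSICAL_MASK_SHIFT) - 1;
	s->keyid_mask = 0;
	s->keyid_shift = 0;
	s->nr_keyids = 0;
}
```

After model_disable() the physical mask again covers all 52 bits, which
is what lets the system address its full physical memory once MKTME is
given up.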

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  5 +++++
 arch/x86/kernel/cpu/intel.c  |  5 +----
 arch/x86/mm/mktme.c          | 10 ++++++++++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index a61b45fca4b1..3fc246acc279 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -22,6 +22,8 @@ static inline bool mktme_enabled(void)
 	return static_branch_unlikely(&mktme_enabled_key);
 }
 
+void mktme_disable(void);
+
 extern struct page_ext_operations page_mktme_ops;
 
 #define page_keyid page_keyid
@@ -71,6 +73,9 @@ static inline bool mktme_enabled(void)
 {
 	return false;
 }
+
+static inline void mktme_disable(void) {}
+
 #endif
 
 #endif
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4c2d70287eb4..9852580340b9 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -650,10 +650,7 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		 * We must not allow onlining secondary CPUs with non-matching
 		 * configuration.
 		 */
-		physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
-		__mktme_keyid_mask = 0;
-		__mktme_keyid_shift = 0;
-		__mktme_nr_keyids = 0;
+		mktme_disable();
 	}
 #endif
 
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 8015e7822c9b..1e8d662e5bff 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -33,6 +33,16 @@ unsigned int mktme_algs;
 DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
 EXPORT_SYMBOL_GPL(mktme_enabled_key);
 
+void mktme_disable(void)
+{
+	physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+	__mktme_keyid_mask = 0;
+	__mktme_keyid_shift = 0;
+	__mktme_nr_keyids = 0;
+	if (mktme_enabled())
+		static_branch_disable(&mktme_enabled_key);
+}
+
 static bool need_page_mktme(void)
 {
 	/* Make sure keyid doesn't collide with extended page flags */
-- 
2.21.0



* [PATCHv2 18/59] x86/mm: Calculate direct mapping size
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (16 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 17/59] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 19/59] x86/mm: Implement syncing per-KeyID direct mappings Kirill A. Shutemov
                   ` (40 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

The kernel needs to have a way to access encrypted memory. We have two
options for how to approach it:

 - Create temporary mappings every time the kernel needs access to
   encrypted memory. That basically brings highmem and its overhead back.

 - Create multiple direct mappings, one per-KeyID. In this setup we
   don't need to create temporary mappings on the fly -- encrypted
   memory is permanently available in kernel address space.

We take the second approach as it has lower overhead.

It's worth noting that with per-KeyID direct mappings a compromised kernel
would give access to decrypted data right away, without additional tricks
to get memory mapped with the correct KeyID.

Per-KeyID mappings require a lot more virtual address space. On a 4-level
machine with 64 KeyIDs we max out the 46-bit virtual address space
dedicated to the direct mapping with 1TiB of RAM. Given that we round up
any calculation of direct mapping size to 1TiB, we effectively claim the
whole 46-bit address space for direct mapping on such a machine
regardless of RAM size.
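
As a rough sanity check of the arithmetic above, the following user-space
sketch computes how much virtual address space the direct mappings need.
It is not kernel code: the helper names are hypothetical, and "64 KeyIDs"
is assumed to mean KeyID-0 plus 63 MKTME KeyIDs, i.e. 64 direct mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TIB (1ULL << 40)

/*
 * Hypothetical helper: total virtual address space consumed by the
 * KeyID-0 direct mapping plus one direct mapping per MKTME KeyID.
 */
static uint64_t direct_mapping_va_needed(uint64_t mapping_size,
					 unsigned int nr_keyids)
{
	return mapping_size * (nr_keyids + 1ULL);
}

/*
 * Hypothetical helper: true if all direct mappings fit into the
 * 2^va_bits bytes reserved for the direct mapping.
 */
static bool direct_mappings_fit(uint64_t mapping_size,
				unsigned int nr_keyids,
				unsigned int va_bits)
{
	assert(va_bits < 64);
	return direct_mapping_va_needed(mapping_size, nr_keyids) <=
	       (1ULL << va_bits);
}
```

With 1TiB per mapping and 63 KeyIDs the total is exactly 2^46 bytes, so
direct_mappings_fit(TIB, 63, 46) holds, while 2TiB per mapping no longer
fits into 46 bits.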

Increased usage of virtual address space has implications for KASLR:
we have less space for randomization. With 64 TiB claimed for direct
mapping with 4-level paging, we are left with 27 TiB of entropy to place
page_offset_base, vmalloc_base and vmemmap_base.

5-level paging provides much wider virtual address space and KASLR
doesn't suffer significantly from per-KeyID direct mappings.

It's preferred to run MKTME with 5-level paging.

The direct mappings for the KeyIDs will be put next to each other in the
virtual address space. We need to have a way to find the boundaries of
the direct mapping for a particular KeyID.

The new variable direct_mapping_size specifies the size of direct
mapping. With the value, it's trivial to find direct mapping for
KeyID-N: PAGE_OFFSET + N * direct_mapping_size.
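
A minimal user-space sketch of that lookup, and of its inverse, might look
as follows. The helper names are hypothetical and PAGE_OFFSET is passed in
as a plain parameter; this only illustrates the address arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: start of the direct mapping for a given KeyID. */
static uint64_t keyid_direct_map_base(uint64_t page_offset,
				      uint64_t mapping_size,
				      unsigned int keyid)
{
	return page_offset + (uint64_t)keyid * mapping_size;
}

/*
 * Hypothetical helper: KeyID whose direct mapping contains vaddr.
 * Assumes vaddr lies somewhere inside the per-KeyID direct mappings.
 */
static unsigned int keyid_of_direct_map_addr(uint64_t page_offset,
					     uint64_t mapping_size,
					     uint64_t vaddr)
{
	assert(mapping_size != 0 && vaddr >= page_offset);
	return (unsigned int)((vaddr - page_offset) / mapping_size);
}
```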

Size of direct mapping is calculated during KASLR setup. If KASLR is
disabled it happens during MKTME initialization.

With MKTME, the size of the direct mapping has to be a power of 2. It
makes the implementation of __pa() efficient.
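
To illustrate why the power-of-2 requirement helps: stripping the per-KeyID
part of an offset then becomes a single AND with (size - 1) instead of a
division. The sketch below is a simplified user-space stand-in, not the
kernel's __pa(); it ignores the kernel text mapping, and the helper names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int is_power_of_two(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper: reduce an offset from PAGE_OFFSET to an offset
 * within a single direct mapping. Equivalent to "offset % mapping_size",
 * but valid as a mask only because mapping_size is a power of 2.
 */
static uint64_t direct_map_offset(uint64_t offset, uint64_t mapping_size)
{
	assert(is_power_of_two(mapping_size));
	return offset & (mapping_size - 1);
}
```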

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/x86_64/mm.rst |  4 +++
 arch/x86/include/asm/page_32.h  |  1 +
 arch/x86/include/asm/page_64.h  |  2 ++
 arch/x86/include/asm/setup.h    |  6 ++++
 arch/x86/kernel/head64.c        |  4 +++
 arch/x86/kernel/setup.c         |  3 ++
 arch/x86/mm/init_64.c           | 58 +++++++++++++++++++++++++++++++++
 arch/x86/mm/kaslr.c             | 11 +++++--
 8 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.rst b/Documentation/x86/x86_64/mm.rst
index 267fc4808945..7978afe6c396 100644
--- a/Documentation/x86/x86_64/mm.rst
+++ b/Documentation/x86/x86_64/mm.rst
@@ -140,6 +140,10 @@ The direct mapping covers all memory in the system up to the highest
 memory address (this means in some cases it can also include PCI memory
 holes).
 
+With MKTME, we have multiple direct mappings. One per-KeyID. They are put
+next to each other. PAGE_OFFSET + N * direct_mapping_size can be used to
+find direct mapping for KeyID-N.
+
 vmalloc space is lazily synchronized into the different PML4/PML5 pages of
 the processes using the page fault handler, with init_top_pgt as
 reference.
diff --git a/arch/x86/include/asm/page_32.h b/arch/x86/include/asm/page_32.h
index 94dbd51df58f..8bce788f9ca9 100644
--- a/arch/x86/include/asm/page_32.h
+++ b/arch/x86/include/asm/page_32.h
@@ -6,6 +6,7 @@
 
 #ifndef __ASSEMBLY__
 
+#define direct_mapping_size 0
 #define __phys_addr_nodebug(x)	((x) - PAGE_OFFSET)
 #ifdef CONFIG_DEBUG_VIRTUAL
 extern unsigned long __phys_addr(unsigned long);
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 939b1cff4a7b..f57fc3cc2246 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -14,6 +14,8 @@ extern unsigned long phys_base;
 extern unsigned long page_offset_base;
 extern unsigned long vmalloc_base;
 extern unsigned long vmemmap_base;
+extern unsigned long direct_mapping_size;
+extern unsigned long direct_mapping_mask;
 
 static inline unsigned long __phys_addr_nodebug(unsigned long x)
 {
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index ed8ec011a9fd..d2861074cf83 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -62,6 +62,12 @@ extern void x86_ce4100_early_setup(void);
 static inline void x86_ce4100_early_setup(void) { }
 #endif
 
+#ifdef CONFIG_MEMORY_PHYSICAL_PADDING
+void calculate_direct_mapping_size(void);
+#else
+static inline void calculate_direct_mapping_size(void) { }
+#endif
+
 #ifndef _SETUP
 
 #include <asm/espfix.h>
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 29ffa495bd1c..006d3ff46afe 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -60,6 +60,10 @@ EXPORT_SYMBOL(vmalloc_base);
 unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4;
 EXPORT_SYMBOL(vmemmap_base);
 #endif
+unsigned long direct_mapping_size __ro_after_init = -1UL;
+EXPORT_SYMBOL(direct_mapping_size);
+unsigned long direct_mapping_mask __ro_after_init = -1UL;
+EXPORT_SYMBOL(direct_mapping_mask);
 
 #define __head	__section(.head.text)
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bbe35bf879f5..d12431e20876 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1077,6 +1077,9 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	init_cache_modes();
 
+	 /* direct_mapping_size has to be initialized before KASLR and MKTME */
+	calculate_direct_mapping_size();
+
 	/*
 	 * Define random base addresses for memory sections after max_pfn is
 	 * defined and before each memory section base is used.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a6b5c653727b..4c1f93df47a5 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1440,6 +1440,64 @@ unsigned long memory_block_size_bytes(void)
 	return memory_block_size_probed;
 }
 
+#ifdef CONFIG_MEMORY_PHYSICAL_PADDING
+void __init calculate_direct_mapping_size(void)
+{
+	unsigned long available_va;
+
+	/* 1/4 of virtual address space is dedicated for direct mapping */
+	available_va = 1UL << (__VIRTUAL_MASK_SHIFT - 1);
+
+	/* How much memory does the system have? */
+	direct_mapping_size = max_pfn << PAGE_SHIFT;
+	direct_mapping_size = round_up(direct_mapping_size, 1UL << 40);
+
+	if (!mktme_nr_keyids())
+		goto out;
+
+	/*
+	 * For MKTME we need direct_mapping_size to be power-of-2.
+	 * It makes __pa() implementation efficient.
+	 */
+	direct_mapping_size = roundup_pow_of_two(direct_mapping_size);
+
+	/*
+	 * Not enough virtual address space to address all physical memory with
+	 * MKTME enabled. Even without padding.
+	 *
+	 * Disable MKTME instead.
+	 */
+	if (direct_mapping_size > available_va / (mktme_nr_keyids() + 1)) {
+		pr_err("x86/mktme: Disabled. Not enough virtual address space\n");
+		pr_err("x86/mktme: Consider switching to 5-level paging\n");
+		mktme_disable();
+		goto out;
+	}
+
+	/*
+	 * Virtual address space is divided between per-KeyID direct mappings.
+	 */
+	available_va /= mktme_nr_keyids() + 1;
+out:
+	/* Add padding, if there's enough virtual address space */
+	direct_mapping_size += (1UL << 40) * CONFIG_MEMORY_PHYSICAL_PADDING;
+	if (mktme_nr_keyids())
+		direct_mapping_size = roundup_pow_of_two(direct_mapping_size);
+
+	if (direct_mapping_size > available_va)
+		direct_mapping_size = available_va;
+
+	/*
+	 * For MKTME, make sure direct_mapping_size is still power-of-2
+	 * after adding padding and calculate mask that is used in __pa().
+	 */
+	if (mktme_nr_keyids()) {
+		direct_mapping_size = rounddown_pow_of_two(direct_mapping_size);
+		direct_mapping_mask = direct_mapping_size - 1;
+	}
+}
+#endif
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 /*
  * Initialise the sparsemem vmemmap using huge-pages at the PMD level.
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 580b82c2621b..83af41d289ed 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -103,10 +103,15 @@ void __init kernel_randomize_memory(void)
 	 * add padding if needed (especially for memory hotplug support).
 	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
-	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_MEMORY_PHYSICAL_PADDING;
 
-	/* Adapt phyiscal memory region size based on available memory */
+	/*
+	 * Calculate space required to map all physical memory.
+	 * In case of MKTME, we map physical memory multiple times, one for
+	 * each KeyID. If MKTME is disabled mktme_nr_keyids() is 0.
+	 */
+	memory_tb = (direct_mapping_size * (mktme_nr_keyids() + 1)) >> TB_SHIFT;
+
+	/* Adapt physical memory region size based on available memory */
 	if (memory_tb < kaslr_regions[0].size_tb)
 		kaslr_regions[0].size_tb = memory_tb;
 
-- 
2.21.0



* [PATCHv2 19/59] x86/mm: Implement syncing per-KeyID direct mappings
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (17 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 18/59] x86/mm: Calculate direct mapping size Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 20/59] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
                   ` (39 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

For MKTME we use per-KeyID direct mappings. This allows the kernel to have
access to encrypted memory.

sync_direct_mapping() syncs per-KeyID direct mappings with the canonical
one -- KeyID-0.

The function tracks changes in the canonical mapping:
 - creating or removing chunks of the translation tree;
 - changes in mapping flags (i.e. protection bits);
 - splitting huge page mapping into a page table;
 - replacing page table with a huge page mapping;

The function needs to be called on every change to the direct mapping:
hotplug, hotremove, changes in permission bits, etc.

The function is a no-op until MKTME is enabled.
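
At every level of the page table walk, the core step is the same: copy the
entry from the canonical mapping and set the KeyID bits in it. A simplified
user-space sketch of that step is shown below. The helper name is
hypothetical, entries are modelled as plain 64-bit values, and the KeyID
shift and supported-bits mask are passed in rather than read from the
hardware.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for "copy entry, but set KeyID": OR the KeyID
 * into the entry at keyid_shift, then drop any bits that are not
 * supported (supported_mask plays the role of a supported-bits mask).
 */
static uint64_t entry_with_keyid(uint64_t src_entry, unsigned int keyid,
				 unsigned int keyid_shift,
				 uint64_t supported_mask)
{
	assert(keyid_shift < 64);
	return (src_entry | ((uint64_t)keyid << keyid_shift)) & supported_mask;
}
```

For example, with an assumed shift of 46, an entry tagged with KeyID 5
differs from the canonical entry only in bits 46 and above; the PFN and
flag bits below the shift are unchanged.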

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |   6 +
 arch/x86/mm/init_64.c        |   7 +
 arch/x86/mm/mktme.c          | 439 +++++++++++++++++++++++++++++++++++
 arch/x86/mm/pageattr.c       |  27 +++
 4 files changed, 479 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 3fc246acc279..d26ada6b65f7 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -62,6 +62,8 @@ static inline void arch_free_page(struct page *page, int order)
 		free_encrypted_page(page, order);
 }
 
+int sync_direct_mapping(unsigned long start, unsigned long end);
+
 #else
 #define mktme_keyid_mask()	((phys_addr_t)0)
 #define mktme_nr_keyids()	0
@@ -76,6 +78,10 @@ static inline bool mktme_enabled(void)
 
 static inline void mktme_disable(void) {}
 
+static inline int sync_direct_mapping(unsigned long start, unsigned long end)
+{
+	return 0;
+}
 #endif
 
 #endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4c1f93df47a5..6769650ad18d 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -726,6 +726,7 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
 {
 	bool pgd_changed = false;
 	unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
+	int ret;
 
 	paddr_last = paddr_end;
 	vaddr = (unsigned long)__va(paddr_start);
@@ -762,6 +763,9 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
 		pgd_changed = true;
 	}
 
+	ret = sync_direct_mapping(vaddr_start, vaddr_end);
+	WARN_ON(ret);
+
 	if (pgd_changed)
 		sync_global_pgds(vaddr_start, vaddr_end - 1);
 
@@ -1201,10 +1205,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
 static void __meminit
 kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 {
+	int ret;
 	start = (unsigned long)__va(start);
 	end = (unsigned long)__va(end);
 
 	remove_pagetable(start, end, true, NULL);
+	ret = sync_direct_mapping(start, end);
+	WARN_ON(ret);
 }
 
 void __ref arch_remove_memory(int nid, u64 start, u64 size,
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 1e8d662e5bff..ed13967bb543 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,6 +1,8 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
 #include <asm/mktme.h>
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
 
 /* Mask to extract KeyID from physical address. */
 phys_addr_t __mktme_keyid_mask;
@@ -54,6 +56,8 @@ static bool need_page_mktme(void)
 static void init_page_mktme(void)
 {
 	static_branch_enable(&mktme_enabled_key);
+
+	sync_direct_mapping(PAGE_OFFSET, PAGE_OFFSET + direct_mapping_size);
 }
 
 struct page_ext_operations page_mktme_ops = {
@@ -148,3 +152,438 @@ void free_encrypted_page(struct page *page, int order)
 		page++;
 	}
 }
+
+static int sync_direct_mapping_pte(unsigned long keyid,
+		pmd_t *dst_pmd, pmd_t *src_pmd,
+		unsigned long addr, unsigned long end)
+{
+	pte_t *src_pte, *dst_pte;
+	pte_t *new_pte = NULL;
+	bool remove_pte;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pte = !src_pmd && PAGE_ALIGNED(addr) && PAGE_ALIGNED(end);
+
+	/*
+	 * PMD page got split into page table.
+	 * Clear PMD mapping. Page table will be established instead.
+	 */
+	if (pmd_large(*dst_pmd)) {
+		spin_lock(&init_mm.page_table_lock);
+		pmd_clear(dst_pmd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (pmd_none(*dst_pmd)) {
+		new_pte = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pte)
+			return -ENOMEM;
+		dst_pte = new_pte + pte_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pte = pte_offset_map(dst_pmd, addr + keyid * direct_mapping_size);
+	}
+	src_pte = src_pmd ? pte_offset_map(src_pmd, addr) : NULL;
+
+	spin_lock(&init_mm.page_table_lock);
+
+	do {
+		pteval_t val;
+
+		if (!src_pte || pte_none(*src_pte)) {
+			set_pte(dst_pte, __pte(0));
+			goto next;
+		}
+
+		if (!pte_none(*dst_pte)) {
+			/*
+			 * Sanity check: PFNs must match between source
+			 * and destination even if the rest doesn't.
+			 */
+			BUG_ON(pte_pfn(*dst_pte) != pte_pfn(*src_pte));
+		}
+
+		/* Copy entry, but set KeyID. */
+		val = pte_val(*src_pte) | keyid << mktme_keyid_shift();
+		val &= __supported_pte_mask;
+		set_pte(dst_pte, __pte(val));
+next:
+		addr += PAGE_SIZE;
+		dst_pte++;
+		if (src_pte)
+			src_pte++;
+	} while (addr != end);
+
+	if (new_pte)
+		pmd_populate_kernel(&init_mm, dst_pmd, new_pte);
+
+	if (remove_pte) {
+		__free_page(pmd_page(*dst_pmd));
+		pmd_clear(dst_pmd);
+	}
+
+	spin_unlock(&init_mm.page_table_lock);
+
+	return 0;
+}
+
+static int sync_direct_mapping_pmd(unsigned long keyid,
+		pud_t *dst_pud, pud_t *src_pud,
+		unsigned long addr, unsigned long end)
+{
+	pmd_t *src_pmd, *dst_pmd;
+	pmd_t *new_pmd = NULL;
+	bool remove_pmd = false;
+	unsigned long next;
+	int ret = 0;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pmd = !src_pud && IS_ALIGNED(addr, PUD_SIZE) && IS_ALIGNED(end, PUD_SIZE);
+
+	/*
+	 * PUD page got split into page table.
+	 * Clear PUD mapping. Page table will be established instead.
+	 */
+	if (pud_large(*dst_pud)) {
+		spin_lock(&init_mm.page_table_lock);
+		pud_clear(dst_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (pud_none(*dst_pud)) {
+		new_pmd = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pmd)
+			return -ENOMEM;
+		dst_pmd = new_pmd + pmd_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pmd = pmd_offset(dst_pud, addr + keyid * direct_mapping_size);
+	}
+	src_pmd = src_pud ? pmd_offset(src_pud, addr) : NULL;
+
+	do {
+		pmd_t *__src_pmd = src_pmd;
+
+		next = pmd_addr_end(addr, end);
+		if (!__src_pmd || pmd_none(*__src_pmd)) {
+			if (pmd_none(*dst_pmd))
+				goto next;
+			if (pmd_large(*dst_pmd)) {
+				spin_lock(&init_mm.page_table_lock);
+				set_pmd(dst_pmd, __pmd(0));
+				spin_unlock(&init_mm.page_table_lock);
+				goto next;
+			}
+			__src_pmd = NULL;
+		}
+
+		if (__src_pmd && pmd_large(*__src_pmd)) {
+			pmdval_t val;
+
+			if (pmd_large(*dst_pmd)) {
+				/*
+				 * Sanity check: PFNs must match between source
+				 * and destination even if the rest doesn't.
+				 */
+				BUG_ON(pmd_pfn(*dst_pmd) != pmd_pfn(*__src_pmd));
+			} else if (!pmd_none(*dst_pmd)) {
+				/*
+				 * Page table is replaced with a PMD page.
+				 * Free and unmap the page table.
+				 */
+				__free_page(pmd_page(*dst_pmd));
+				spin_lock(&init_mm.page_table_lock);
+				pmd_clear(dst_pmd);
+				spin_unlock(&init_mm.page_table_lock);
+			}
+
+			/* Copy entry, but set KeyID. */
+			val = pmd_val(*__src_pmd) | keyid << mktme_keyid_shift();
+			val &= __supported_pte_mask;
+			spin_lock(&init_mm.page_table_lock);
+			set_pmd(dst_pmd, __pmd(val));
+			spin_unlock(&init_mm.page_table_lock);
+			goto next;
+		}
+
+		ret = sync_direct_mapping_pte(keyid, dst_pmd, __src_pmd,
+				addr, next);
+next:
+		addr = next;
+		dst_pmd++;
+		if (src_pmd)
+			src_pmd++;
+	} while (addr != end && !ret);
+
+	if (new_pmd) {
+		spin_lock(&init_mm.page_table_lock);
+		pud_populate(&init_mm, dst_pud, new_pmd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_pmd) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(pud_page(*dst_pud));
+		pud_clear(dst_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_pud(unsigned long keyid,
+		p4d_t *dst_p4d, p4d_t *src_p4d,
+		unsigned long addr, unsigned long end)
+{
+	pud_t *src_pud, *dst_pud;
+	pud_t *new_pud = NULL;
+	bool remove_pud = false;
+	unsigned long next;
+	int ret = 0;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_pud = !src_p4d && IS_ALIGNED(addr, P4D_SIZE) && IS_ALIGNED(end, P4D_SIZE);
+
+	/*
+	 * P4D page got split into page table.
+	 * Clear P4D mapping. Page table will be established instead.
+	 */
+	if (p4d_large(*dst_p4d)) {
+		spin_lock(&init_mm.page_table_lock);
+		p4d_clear(dst_p4d);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	/* Allocate a new page table if needed. */
+	if (p4d_none(*dst_p4d)) {
+		new_pud = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_pud)
+			return -ENOMEM;
+		dst_pud = new_pud + pud_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_pud = pud_offset(dst_p4d, addr + keyid * direct_mapping_size);
+	}
+	src_pud = src_p4d ? pud_offset(src_p4d, addr) : NULL;
+
+	do {
+		pud_t *__src_pud = src_pud;
+
+		next = pud_addr_end(addr, end);
+		if (!__src_pud || pud_none(*__src_pud)) {
+			if (pud_none(*dst_pud))
+				goto next;
+			if (pud_large(*dst_pud)) {
+				spin_lock(&init_mm.page_table_lock);
+				set_pud(dst_pud, __pud(0));
+				spin_unlock(&init_mm.page_table_lock);
+				goto next;
+			}
+			__src_pud = NULL;
+		}
+
+		if (__src_pud && pud_large(*__src_pud)) {
+			pudval_t val;
+
+			if (pud_large(*dst_pud)) {
+				/*
+				 * Sanity check: PFNs must match between source
+				 * and destination even if the rest doesn't.
+				 */
+				BUG_ON(pud_pfn(*dst_pud) != pud_pfn(*__src_pud));
+			} else if (!pud_none(*dst_pud)) {
+				/*
+				 * Page table is replaced with a pud page.
+				 * Free and unmap the page table.
+				 */
+				__free_page(pud_page(*dst_pud));
+				spin_lock(&init_mm.page_table_lock);
+				pud_clear(dst_pud);
+				spin_unlock(&init_mm.page_table_lock);
+			}
+
+			/* Copy entry, but set KeyID. */
+			val = pud_val(*__src_pud) | keyid << mktme_keyid_shift();
+			val &= __supported_pte_mask;
+			spin_lock(&init_mm.page_table_lock);
+			set_pud(dst_pud, __pud(val));
+			spin_unlock(&init_mm.page_table_lock);
+			goto next;
+		}
+
+		ret = sync_direct_mapping_pmd(keyid, dst_pud, __src_pud,
+				addr, next);
+next:
+		addr = next;
+		dst_pud++;
+		if (src_pud)
+			src_pud++;
+	} while (addr != end && !ret);
+
+	if (new_pud) {
+		spin_lock(&init_mm.page_table_lock);
+		p4d_populate(&init_mm, dst_p4d, new_pud);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_pud) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(p4d_page(*dst_p4d));
+		p4d_clear(dst_p4d);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_p4d(unsigned long keyid,
+		pgd_t *dst_pgd, pgd_t *src_pgd,
+		unsigned long addr, unsigned long end)
+{
+	p4d_t *src_p4d, *dst_p4d;
+	p4d_t *new_p4d_1 = NULL, *new_p4d_2 = NULL;
+	bool remove_p4d = false;
+	unsigned long next;
+	int ret = 0;
+
+	/*
+	 * We want to unmap and free the page table if the source is empty and
+	 * the range covers whole page table.
+	 */
+	remove_p4d = !src_pgd && IS_ALIGNED(addr, PGDIR_SIZE) && IS_ALIGNED(end, PGDIR_SIZE);
+
+	/* Allocate a new page table if needed. */
+	if (pgd_none(*dst_pgd)) {
+		new_p4d_1 = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!new_p4d_1)
+			return -ENOMEM;
+		dst_p4d = new_p4d_1 + p4d_index(addr + keyid * direct_mapping_size);
+	} else {
+		dst_p4d = p4d_offset(dst_pgd, addr + keyid * direct_mapping_size);
+	}
+	src_p4d = src_pgd ? p4d_offset(src_pgd, addr) : NULL;
+
+	do {
+		p4d_t *__src_p4d = src_p4d;
+
+		next = p4d_addr_end(addr, end);
+		if (!__src_p4d || p4d_none(*__src_p4d)) {
+			if (p4d_none(*dst_p4d))
+				goto next;
+			__src_p4d = NULL;
+		}
+
+		ret = sync_direct_mapping_pud(keyid, dst_p4d, __src_p4d,
+				addr, next);
+next:
+		addr = next;
+		dst_p4d++;
+
+		/*
+		 * Direct mappings are 1TiB-aligned. With 5-level paging it
+		 * means that on PGD level there can be misalignment between
+		 * source and destination.
+		 *
+		 * Allocate the new page table if dst_p4d crosses page table
+		 * boundary.
+		 */
+		if (!((unsigned long)dst_p4d & ~PAGE_MASK) && addr != end) {
+			if (pgd_none(dst_pgd[1])) {
+				new_p4d_2 = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+				if (!new_p4d_2)
+					ret = -ENOMEM;
+				dst_p4d = new_p4d_2;
+			} else {
+				dst_p4d = p4d_offset(dst_pgd + 1, 0);
+			}
+		}
+		if (src_p4d)
+			src_p4d++;
+	} while (addr != end && !ret);
+
+	if (new_p4d_1 || new_p4d_2) {
+		spin_lock(&init_mm.page_table_lock);
+		if (new_p4d_1)
+			pgd_populate(&init_mm, dst_pgd, new_p4d_1);
+		if (new_p4d_2)
+			pgd_populate(&init_mm, dst_pgd + 1, new_p4d_2);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	if (remove_p4d) {
+		spin_lock(&init_mm.page_table_lock);
+		__free_page(pgd_page(*dst_pgd));
+		pgd_clear(dst_pgd);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	return ret;
+}
+
+static int sync_direct_mapping_keyid(unsigned long keyid,
+		unsigned long addr, unsigned long end)
+{
+	pgd_t *src_pgd, *dst_pgd;
+	unsigned long next;
+	int ret = 0;
+
+	dst_pgd = pgd_offset_k(addr + keyid * direct_mapping_size);
+	src_pgd = pgd_offset_k(addr);
+
+	do {
+		pgd_t *__src_pgd = src_pgd;
+
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*__src_pgd)) {
+			if (pgd_none(*dst_pgd))
+				continue;
+			__src_pgd = NULL;
+		}
+
+		ret = sync_direct_mapping_p4d(keyid, dst_pgd, __src_pgd,
+				addr, next);
+	} while (dst_pgd++, src_pgd++, addr = next, addr != end && !ret);
+
+	return ret;
+}
+
+/*
+ * For MKTME we maintain per-KeyID direct mappings. This allows the kernel to
+ * have access to encrypted memory.
+ *
+ * sync_direct_mapping() syncs per-KeyID direct mappings with the canonical
+ * one -- KeyID-0.
+ *
+ * The function tracks changes in the canonical mapping:
+ *  - creating or removing chunks of the translation tree;
+ *  - changes in mapping flags (i.e. protection bits);
+ *  - splitting huge page mapping into a page table;
+ *  - replacing page table with a huge page mapping;
+ *
+ * The function needs to be called on every change to the direct mapping:
+ * hotplug, hotremove, changes in permission bits, etc.
+ *
+ * The function is a no-op until MKTME is enabled.
+ */
+int sync_direct_mapping(unsigned long start, unsigned long end)
+{
+	int i, ret = 0;
+
+	if (!mktme_enabled())
+		return 0;
+
+	for (i = 1; !ret && i <= mktme_nr_keyids(); i++)
+		ret = sync_direct_mapping_keyid(i, start, end);
+
+	flush_tlb_all();
+
+	return ret;
+}
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 6a9a77a403c9..f4e3205d2cdd 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -347,6 +347,33 @@ static void cpa_flush(struct cpa_data *data, int cache)
 
 	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
 
+	if (mktme_enabled()) {
+		unsigned long start, end;
+
+		start = PAGE_OFFSET + (cpa->pfn << PAGE_SHIFT);
+		end = start + cpa->numpages * PAGE_SIZE;
+
+		/* Round to cover huge page possibly split by the change */
+		start = round_down(start, direct_gbpages ? PUD_SIZE : PMD_SIZE);
+		end = round_up(end, direct_gbpages ? PUD_SIZE : PMD_SIZE);
+
+		/* Sync all direct mapping for an array */
+		if (cpa->flags & CPA_ARRAY) {
+			start = PAGE_OFFSET;
+			end = PAGE_OFFSET + direct_mapping_size;
+		}
+
+		/*
+		 * Sync per-KeyID direct mappings with the canonical one
+		 * (KeyID-0).
+		 *
+		 * sync_direct_mapping() does full TLB flush.
+		 */
+		sync_direct_mapping(start, end);
+		if (!cache)
+			return;
+	}
+
 	if (cache && !static_cpu_has(X86_FEATURE_CLFLUSH)) {
 		cpa_flush_all(cache);
 		return;
-- 
2.21.0



* [PATCHv2 20/59] x86/mm: Handle encrypted memory in page_to_virt() and __pa()
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (18 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 19/59] x86/mm: Implement syncing per-KeyID direct mappings Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 21/59] mm/page_ext: Export lookup_page_ext() symbol Kirill A. Shutemov
                   ` (38 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Per-KeyID direct mappings require changes to how we find the right
virtual address for a page and to virt-to-phys address translation.

The page_to_virt() definition overrides the default macro provided by
<linux/mm.h>.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/page.h    | 3 +++
 arch/x86/include/asm/page_64.h | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 39af59487d5f..aff30554f38e 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -72,6 +72,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 extern bool __virt_addr_valid(unsigned long kaddr);
 #define virt_addr_valid(kaddr)	__virt_addr_valid((unsigned long) (kaddr))
 
+#define page_to_virt(x) \
+	(__va(PFN_PHYS(page_to_pfn(x))) + page_keyid(x) * direct_mapping_size)
+
 #endif	/* __ASSEMBLY__ */
 
 #include <asm-generic/memory_model.h>
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index f57fc3cc2246..a4f394e3471d 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -24,7 +24,7 @@ static inline unsigned long __phys_addr_nodebug(unsigned long x)
 	/* use the carry flag to determine if x was < __START_KERNEL_map */
 	x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));
 
-	return x;
+	return x & direct_mapping_mask;
 }
 
 #ifdef CONFIG_DEBUG_VIRTUAL
-- 
2.21.0



* [PATCHv2 21/59] mm/page_ext: Export lookup_page_ext() symbol
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (19 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 20/59] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 22/59] mm/rmap: Clear vma->anon_vma on unlink_anon_vmas() Kirill A. Shutemov
                   ` (37 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

page_keyid() is an inline function that uses lookup_page_ext(). KVM is
going to use page_keyid(), and since KVM can be built as a module,
lookup_page_ext() has to be exported.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/page_ext.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/page_ext.c b/mm/page_ext.c
index c52b77c13cd9..eeca218891e7 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -139,6 +139,7 @@ struct page_ext *lookup_page_ext(const struct page *page)
 					MAX_ORDER_NR_PAGES);
 	return get_entry(base, index);
 }
+EXPORT_SYMBOL_GPL(lookup_page_ext);
 
 static int __init alloc_node_page_ext(int nid)
 {
@@ -209,6 +210,7 @@ struct page_ext *lookup_page_ext(const struct page *page)
 		return NULL;
 	return get_entry(section->page_ext, pfn);
 }
+EXPORT_SYMBOL_GPL(lookup_page_ext);
 
 static void *__meminit alloc_page_ext(size_t size, int nid)
 {
-- 
2.21.0



* [PATCHv2 22/59] mm/rmap: Clear vma->anon_vma on unlink_anon_vmas()
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (20 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 21/59] mm/page_ext: Export lookup_page_ext() symbol Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 23/59] x86/pconfig: Set an activated algorithm in all MKTME commands Kirill A. Shutemov
                   ` (36 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

If all pages in the VMA got unmapped, there's no reason to link it into
the original anon VMA hierarchy: it cannot possibly share any pages with
another VMA.

Set vma->anon_vma to NULL on unlink_anon_vmas(). With this change, the
VMA can be reused: a new anon VMA will be allocated on the first fault.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/rmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2ae6b0d..911367b5fb40 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -400,8 +400,10 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
 		list_del(&avc->same_vma);
 		anon_vma_chain_free(avc);
 	}
-	if (vma->anon_vma)
+	if (vma->anon_vma) {
 		vma->anon_vma->degree--;
+		vma->anon_vma = NULL;
+	}
 	unlock_anon_vma_root(root);
 
 	/*
-- 
2.21.0



* [PATCHv2 23/59] x86/pconfig: Set an activated algorithm in all MKTME commands
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (21 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 22/59] mm/rmap: Clear vma->anon_vma on unlink_anon_vmas() Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 24/59] keys/mktme: Introduce a Kernel Key Service for MKTME Kirill A. Shutemov
                   ` (35 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The Intel MKTME architecture specification requires an activated
encryption algorithm for all command types.

For commands that actually perform encryption, SET_KEY_DIRECT and
SET_KEY_RANDOM, the user specifies the algorithm when requesting the
key through the MKTME Key Service.

For CLEAR_KEY and NO_ENCRYPT commands, do not require the user to
specify an algorithm. Define a default algorithm, 'any activated
algorithm', to cover those two special cases.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
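For reviewers who want to check the bit layout by hand, here is a minimal
userspace sketch of how keyid_ctrl is composed for the two commands that
take no user-chosen algorithm. It is not kernel code: lowest_set_bit() is a
hypothetical stand-in for __ffs(), keyid_ctrl_any_alg() is a name made up
for this note, and 'activated_algs' models mktme_algs under the assumption
that bit 0 of that mask corresponds to AES-XTS-128.

```c
#include <assert.h>
#include <stdint.h>

/* COMMAND values, bits [7:0], as listed in the patch. */
#define CMD_SET_KEY_DIRECT	0u
#define CMD_SET_KEY_RANDOM	1u
#define CMD_CLEAR_KEY		2u
#define CMD_NO_ENCRYPT		3u

/* Hypothetical stand-in for the kernel's __ffs(): index of the lowest set bit. */
static unsigned int lowest_set_bit(uint32_t mask)
{
	unsigned int idx = 0;

	assert(mask != 0);	/* undefined for an empty mask, as with __ffs() */
	while (!(mask & 1u)) {
		mask >>= 1;
		idx++;
	}
	return idx;
}

/*
 * Hypothetical helper: build keyid_ctrl for a command that carries no
 * user-chosen algorithm. 'activated_algs' models mktme_algs.
 */
static uint32_t keyid_ctrl_any_alg(uint32_t command, uint32_t activated_algs)
{
	/* ENC_ALG lives in bits [23:8]; pick the lowest activated algorithm. */
	uint32_t enc_alg = (1u << lowest_set_bit(activated_algs)) << 8;

	return (command & 0xffu) | enc_alg;
}
```

With only AES-XTS-128 activated (mask 0x1), CLEAR_KEY comes out as 0x102 and
NO_ENCRYPT as 0x103, which matches MKTME_AES_XTS_128 ORed with the command.
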
 arch/x86/include/asm/intel_pconfig.h | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/intel_pconfig.h b/arch/x86/include/asm/intel_pconfig.h
index 3cb002b1d0f9..4f27b0c532ee 100644
--- a/arch/x86/include/asm/intel_pconfig.h
+++ b/arch/x86/include/asm/intel_pconfig.h
@@ -21,14 +21,20 @@ enum pconfig_leaf {
 
 /* Defines and structure for MKTME_KEY_PROGRAM of PCONFIG instruction */
 
+/* mktme_key_program::keyid_ctrl ENC_ALG, bits [23:8] */
+#define MKTME_AES_XTS_128	(1 << 8)
+#define MKTME_ANY_ACTIVATED_ALG	(1 << __ffs(mktme_algs) << 8)
+
 /* mktme_key_program::keyid_ctrl COMMAND, bits [7:0] */
 #define MKTME_KEYID_SET_KEY_DIRECT	0
 #define MKTME_KEYID_SET_KEY_RANDOM	1
-#define MKTME_KEYID_CLEAR_KEY		2
-#define MKTME_KEYID_NO_ENCRYPT		3
 
-/* mktme_key_program::keyid_ctrl ENC_ALG, bits [23:8] */
-#define MKTME_AES_XTS_128	(1 << 8)
+/*
+ * CLEAR_KEY and NO_ENCRYPT require the COMMAND in bits [7:0]
+ * and any activated encryption algorithm, ENC_ALG, in bits [23:8]
+ */
+#define MKTME_KEYID_CLEAR_KEY  (2 | MKTME_ANY_ACTIVATED_ALG)
+#define MKTME_KEYID_NO_ENCRYPT (3 | MKTME_ANY_ACTIVATED_ALG)
 
 /* Return codes from the PCONFIG MKTME_KEY_PROGRAM */
 #define MKTME_PROG_SUCCESS	0
-- 
2.21.0



* [PATCHv2 24/59] keys/mktme: Introduce a Kernel Key Service for MKTME
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (22 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 23/59] x86/pconfig: Set an activated algorithm in all MKTME commands Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 25/59] keys/mktme: Preparse the MKTME key payload Kirill A. Shutemov
                   ` (34 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

MKTME (Multi-Key Total Memory Encryption) is a technology that allows
transparent memory encryption in upcoming Intel platforms. MKTME will
support multiple encryption domains, each having its own key.

The MKTME key service will manage the hardware encryption keys. It
will map Userspace Keys to Hardware KeyIDs and program the hardware
with the user requested encryption options.

Here the mapping structure is introduced, as well as the key service
initialization and registration.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/Makefile     |  1 +
 security/keys/mktme_keys.c | 60 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)
 create mode 100644 security/keys/mktme_keys.c

diff --git a/security/keys/Makefile b/security/keys/Makefile
index 9cef54064f60..28799be801a9 100644
--- a/security/keys/Makefile
+++ b/security/keys/Makefile
@@ -30,3 +30,4 @@ obj-$(CONFIG_ASYMMETRIC_KEY_TYPE) += keyctl_pkey.o
 obj-$(CONFIG_BIG_KEYS) += big_key.o
 obj-$(CONFIG_TRUSTED_KEYS) += trusted.o
 obj-$(CONFIG_ENCRYPTED_KEYS) += encrypted-keys/
+obj-$(CONFIG_X86_INTEL_MKTME) += mktme_keys.o
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
new file mode 100644
index 000000000000..d262e0f348e4
--- /dev/null
+++ b/security/keys/mktme_keys.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Documentation/x86/mktme/ */
+
+#include <linux/init.h>
+#include <linux/key.h>
+#include <linux/key-type.h>
+#include <linux/mm.h>
+#include <keys/user-type.h>
+
+#include "internal.h"
+
+static unsigned int mktme_available_keyids;  /* Free Hardware KeyIDs */
+
+enum mktme_keyid_state {
+	KEYID_AVAILABLE,	/* Available to be assigned */
+	KEYID_ASSIGNED,		/* Assigned to a userspace key */
+	KEYID_REF_KILLED,	/* Userspace key has been destroyed */
+	KEYID_REF_RELEASED,	/* Last reference is released */
+};
+
+/* 1:1 Mapping between Userspace Keys (struct key) and Hardware KeyIDs */
+struct mktme_mapping {
+	struct key		*key;
+	enum mktme_keyid_state	state;
+};
+
+static struct mktme_mapping *mktme_map;
+
+struct key_type key_type_mktme = {
+	.name		= "mktme",
+	.describe	= user_describe,
+};
+
+static int __init init_mktme(void)
+{
+	int ret;
+
+	/* Verify keys are present */
+	if (mktme_nr_keyids() < 1)
+		return 0;
+
+	mktme_available_keyids = mktme_nr_keyids();
+
+	/* Mapping of Userspace Keys to Hardware KeyIDs */
+	mktme_map = kvzalloc((sizeof(*mktme_map) * (mktme_nr_keyids() + 1)),
+			     GFP_KERNEL);
+	if (!mktme_map)
+		return -ENOMEM;
+
+	ret = register_key_type(&key_type_mktme);
+	if (!ret)
+		return ret;			/* SUCCESS */
+
+	kvfree(mktme_map);
+
+	return -ENOMEM;
+}
+
+late_initcall(init_mktme);
-- 
2.21.0



* [PATCHv2 25/59] keys/mktme: Preparse the MKTME key payload
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (23 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 24/59] keys/mktme: Introduce a Kernel Key Service for MKTME Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-08-05 11:58   ` Ben Boeckel
  2019-07-31 15:07 ` [PATCHv2 26/59] keys/mktme: Instantiate MKTME keys Kirill A. Shutemov
                   ` (33 subsequent siblings)
  58 siblings, 1 reply; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

It is a requirement of the Kernel Keys subsystem to provide a
preparse method that validates payloads before key instantiate
methods are called.

Verify that userspace provides valid MKTME options and prepare
the payload for use at key instantiate time.

Create a method to free the preparsed payload. The Kernel Key
subsystem will call that to clean up after the key is instantiated.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
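To make the accepted option strings concrete, here is a rough userspace
model of the preparse logic. It is a sketch under assumptions, not the
kernel implementation: parse_mktme_options(), has_prefix() and token_is()
are hypothetical names, plain string functions replace match_token(), and
'activated_algs' stands in for mktme_algs. The NO_ENCRYPT payload assumes
the 'any activated algorithm' definition from the PCONFIG patch earlier in
the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CMD_SET_KEY_RANDOM	1u
#define CMD_NO_ENCRYPT		3u
#define ALG_AES_XTS_128		0	/* bit index inside ENC_ALG */

enum opt_type { TYPE_NONE, TYPE_CPU, TYPE_NO_ENCRYPT };

static int has_prefix(const char *tok, size_t len, const char *prefix)
{
	size_t n = strlen(prefix);

	return len >= n && strncmp(tok, prefix, n) == 0;
}

static int token_is(const char *tok, size_t len, const char *want)
{
	return strlen(want) == len && strncmp(tok, want, len) == 0;
}

/* Returns 0 and fills *payload on success, -1 on any invalid option string. */
static int parse_mktme_options(const char *options, uint32_t activated_algs,
			       uint32_t *payload)
{
	enum opt_type type = TYPE_NONE;
	int seen_alg = 0;
	const char *p = options;

	for (;;) {
		size_t len;

		p += strspn(p, " \t");		/* skip separators */
		len = strcspn(p, " \t");	/* length of the next token */
		if (len == 0)
			break;

		if (has_prefix(p, len, "type=")) {
			if (type != TYPE_NONE)
				return -1;	/* option given twice */
			if (token_is(p, len, "type=cpu"))
				type = TYPE_CPU;
			else if (token_is(p, len, "type=no-encrypt"))
				type = TYPE_NO_ENCRYPT;
			else
				return -1;
		} else if (has_prefix(p, len, "algorithm=")) {
			if (seen_alg)
				return -1;
			if (!token_is(p, len, "algorithm=aes-xts-128"))
				return -1;
			if (!(activated_algs & (1u << ALG_AES_XTS_128)))
				return -1;	/* not activated on this platform */
			seen_alg = 1;
		} else {
			return -1;		/* unknown option */
		}
		p += len;
	}

	switch (type) {
	case TYPE_CPU:
		if (!seen_alg)
			return -1;		/* cpu keys need an algorithm */
		*payload = ((1u << ALG_AES_XTS_128) << 8) | CMD_SET_KEY_RANDOM;
		return 0;
	case TYPE_NO_ENCRYPT:
		/* lowest activated algorithm fills the ENC_ALG field */
		*payload = ((activated_algs & (~activated_algs + 1u)) << 8) |
			   CMD_NO_ENCRYPT;
		return 0;
	default:
		return -1;			/* no type given */
	}
}
```

In this model "type=cpu algorithm=aes-xts-128" yields 0x101 and
"type=no-encrypt" yields 0x103 when only AES-XTS-128 is activated, while a
missing type, a repeated option, or a cpu key without an algorithm is
rejected.
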
 include/keys/mktme-type.h  |  31 +++++++++
 security/keys/mktme_keys.c | 134 +++++++++++++++++++++++++++++++++++++
 2 files changed, 165 insertions(+)
 create mode 100644 include/keys/mktme-type.h

diff --git a/include/keys/mktme-type.h b/include/keys/mktme-type.h
new file mode 100644
index 000000000000..9dad92f17179
--- /dev/null
+++ b/include/keys/mktme-type.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/* Key service for Multi-KEY Total Memory Encryption */
+
+#ifndef _KEYS_MKTME_TYPE_H
+#define _KEYS_MKTME_TYPE_H
+
+#include <linux/key.h>
+
+enum mktme_alg {
+	MKTME_ALG_AES_XTS_128,
+};
+
+const char *const mktme_alg_names[] = {
+	[MKTME_ALG_AES_XTS_128]	= "aes-xts-128",
+};
+
+enum mktme_type {
+	MKTME_TYPE_ERROR = -1,
+	MKTME_TYPE_CPU,
+	MKTME_TYPE_NO_ENCRYPT,
+};
+
+const char *const mktme_type_names[] = {
+	[MKTME_TYPE_CPU]	= "cpu",
+	[MKTME_TYPE_NO_ENCRYPT]	= "no-encrypt",
+};
+
+extern struct key_type key_type_mktme;
+
+#endif /* _KEYS_MKTME_TYPE_H */
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index d262e0f348e4..fe119a155235 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -6,6 +6,10 @@
 #include <linux/key.h>
 #include <linux/key-type.h>
 #include <linux/mm.h>
+#include <linux/parser.h>
+#include <linux/string.h>
+#include <asm/intel_pconfig.h>
+#include <keys/mktme-type.h>
 #include <keys/user-type.h>
 
 #include "internal.h"
@@ -27,8 +31,138 @@ struct mktme_mapping {
 
 static struct mktme_mapping *mktme_map;
 
+enum mktme_opt_id {
+	OPT_ERROR,
+	OPT_TYPE,
+	OPT_ALGORITHM,
+};
+
+static const match_table_t mktme_token = {
+	{OPT_TYPE, "type=%s"},
+	{OPT_ALGORITHM, "algorithm=%s"},
+	{OPT_ERROR, NULL}
+};
+
+/* Make sure arguments are correct for the TYPE of key requested */
+static int mktme_check_options(u32 *payload, unsigned long token_mask,
+			       enum mktme_type type, enum mktme_alg alg)
+{
+	if (!token_mask)
+		return -EINVAL;
+
+	switch (type) {
+	case MKTME_TYPE_CPU:
+		if (test_bit(OPT_ALGORITHM, &token_mask))
+			*payload |= (1 << alg) << 8;
+		else
+			return -EINVAL;
+
+		*payload |= MKTME_KEYID_SET_KEY_RANDOM;
+		break;
+
+	case MKTME_TYPE_NO_ENCRYPT:
+		*payload |= MKTME_KEYID_NO_ENCRYPT;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/* Parse the options and store the key programming data in the payload. */
+static int mktme_get_options(char *options, u32 *payload)
+{
+	enum mktme_alg alg = MKTME_ALG_AES_XTS_128;
+	enum mktme_type type = MKTME_TYPE_ERROR;
+	substring_t args[MAX_OPT_ARGS];
+	unsigned long token_mask = 0;
+	char *p = options;
+	int token;
+
+	while ((p = strsep(&options, " \t"))) {
+		if (*p == '\0' || *p == ' ' || *p == '\t')
+			continue;
+		token = match_token(p, mktme_token, args);
+		if (token == OPT_ERROR)
+			return -EINVAL;
+		if (test_and_set_bit(token, &token_mask))
+			return -EINVAL;
+
+		switch (token) {
+		case OPT_TYPE:
+			type = match_string(mktme_type_names,
+					    ARRAY_SIZE(mktme_type_names),
+					    args[0].from);
+			if (type < 0)
+				return -EINVAL;
+			break;
+
+		case OPT_ALGORITHM:
+			/* Algorithm must be generally supported */
+			alg = match_string(mktme_alg_names,
+					   ARRAY_SIZE(mktme_alg_names),
+					   args[0].from);
+			if (alg < 0)
+				return -EINVAL;
+
+			/* Algorithm must be activated on this platform */
+			if (!(mktme_algs & (1 << alg)))
+				return -EINVAL;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+	}
+	return mktme_check_options(payload, token_mask, type, alg);
+}
+
+void mktme_free_preparsed_payload(struct key_preparsed_payload *prep)
+{
+	kzfree(prep->payload.data[0]);
+}
+
+/*
+ * Key Service Method to preparse a payload before a key is created.
+ * Check permissions and the options. Load the proposed key field
+ * data into the payload for use by the instantiate method.
+ */
+int mktme_preparse_payload(struct key_preparsed_payload *prep)
+{
+	size_t datalen = prep->datalen;
+	u32 *mktme_payload;
+	char *options;
+	int ret;
+
+	if (datalen <= 0 || datalen > 1024 || !prep->data)
+		return -EINVAL;
+
+	options = kmemdup_nul(prep->data, datalen, GFP_KERNEL);
+	if (!options)
+		return -ENOMEM;
+
+	mktme_payload = kzalloc(sizeof(*mktme_payload), GFP_KERNEL);
+	if (!mktme_payload) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	ret = mktme_get_options(options, mktme_payload);
+	if (ret < 0) {
+		kzfree(mktme_payload);
+		goto out;
+	}
+	prep->quotalen = sizeof(*mktme_payload);
+	prep->payload.data[0] = mktme_payload;
+out:
+	kzfree(options);
+	return ret;
+}
+
 struct key_type key_type_mktme = {
 	.name		= "mktme",
+	.preparse	= mktme_preparse_payload,
+	.free_preparse	= mktme_free_preparsed_payload,
 	.describe	= user_describe,
 };
 
-- 
2.21.0



* [PATCHv2 26/59] keys/mktme: Instantiate MKTME keys
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (24 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 25/59] keys/mktme: Preparse the MKTME key payload Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 27/59] keys/mktme: Destroy " Kirill A. Shutemov
                   ` (32 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Instantiate is a Kernel Key Service method invoked when a key is
added (add_key, request_key) by the user.

During instantiation, MKTME allocates an available hardware KeyID
and maps it to the Userspace Key.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index fe119a155235..beca852db01a 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -14,6 +14,7 @@
 
 #include "internal.h"
 
+static DEFINE_SPINLOCK(mktme_lock);
 static unsigned int mktme_available_keyids;  /* Free Hardware KeyIDs */
 
 enum mktme_keyid_state {
@@ -31,6 +32,24 @@ struct mktme_mapping {
 
 static struct mktme_mapping *mktme_map;
 
+int mktme_reserve_keyid(struct key *key)
+{
+	int i;
+
+	if (!mktme_available_keyids)
+		return 0;
+
+	for (i = 1; i <= mktme_nr_keyids(); i++) {
+		if (mktme_map[i].state == KEYID_AVAILABLE) {
+			mktme_map[i].state = KEYID_ASSIGNED;
+			mktme_map[i].key = key;
+			mktme_available_keyids--;
+			return i;
+		}
+	}
+	return 0;
+}
+
 enum mktme_opt_id {
 	OPT_ERROR,
 	OPT_TYPE,
@@ -43,6 +62,20 @@ static const match_table_t mktme_token = {
 	{OPT_ERROR, NULL}
 };
 
+/* Key Service Method to create a new key. Payload is preparsed. */
+int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
+{
+	unsigned long flags;
+	int keyid;
+
+	spin_lock_irqsave(&mktme_lock, flags);
+	keyid = mktme_reserve_keyid(key);
+	spin_unlock_irqrestore(&mktme_lock, flags);
+	if (!keyid)
+		return -ENOKEY;
+	return 0;
+}
+
 /* Make sure arguments are correct for the TYPE of key requested */
 static int mktme_check_options(u32 *payload, unsigned long token_mask,
 			       enum mktme_type type, enum mktme_alg alg)
@@ -163,6 +196,7 @@ struct key_type key_type_mktme = {
 	.name		= "mktme",
 	.preparse	= mktme_preparse_payload,
 	.free_preparse	= mktme_free_preparsed_payload,
+	.instantiate	= mktme_instantiate_key,
 	.describe	= user_describe,
 };
 
-- 
2.21.0



* [PATCHv2 27/59] keys/mktme: Destroy MKTME keys
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (25 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 26/59] keys/mktme: Instantiate MKTME keys Kirill A. Shutemov
@ 2019-07-31 15:07 ` " Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 28/59] keys/mktme: Move the MKTME payload into a cache aligned structure Kirill A. Shutemov
                   ` (31 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Destroy is a method invoked by the kernel key service when a
userspace key is being removed (invalidate, revoke, timeout).

During destroy, MKTME will return the hardware KeyID to the pool
of available KeyIDs.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
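The KeyID lifecycle (reserve on instantiate, release on destroy) can be
modelled in a few lines of userspace C. This is only a sketch: NR_KEYIDS is
a made-up count where the kernel would call mktme_nr_keyids(), a void
pointer stands in for struct key, and locking is left out, so callers would
have to serialize access the way mktme_lock does. Unlike the patch, the
sketch also clears the key pointer on release and checks the slot state in
the lookup; that is a choice made here to keep stale lookups out of the
example.

```c
#include <assert.h>
#include <stddef.h>

#define NR_KEYIDS 3	/* hypothetical; the kernel asks mktme_nr_keyids() */

enum keyid_state { KEYID_AVAILABLE, KEYID_ASSIGNED };

struct mapping {
	const void *key;	/* stands in for struct key * */
	enum keyid_state state;
};

/* Slot 0 is never handed out, so 0 can double as "no KeyID". */
static struct mapping map[NR_KEYIDS + 1];
static unsigned int available_keyids = NR_KEYIDS;

/* Returns the first free KeyID (1..NR_KEYIDS), or 0 if the pool is empty. */
static int reserve_keyid(const void *key)
{
	int i;

	if (!available_keyids)
		return 0;
	for (i = 1; i <= NR_KEYIDS; i++) {
		if (map[i].state == KEYID_AVAILABLE) {
			map[i].state = KEYID_ASSIGNED;
			map[i].key = key;
			available_keyids--;
			return i;
		}
	}
	return 0;
}

/* Returns the KeyID currently assigned to 'key', or 0 if there is none. */
static int keyid_from_key(const void *key)
{
	int i;

	for (i = 1; i <= NR_KEYIDS; i++) {
		if (map[i].state == KEYID_ASSIGNED && map[i].key == key)
			return i;
	}
	return 0;
}

/* Hands an assigned KeyID back to the pool. */
static void release_keyid(int keyid)
{
	assert(keyid >= 1 && keyid <= NR_KEYIDS);
	assert(map[keyid].state == KEYID_ASSIGNED);
	map[keyid].state = KEYID_AVAILABLE;
	map[keyid].key = NULL;	/* sketch-only: avoid stale lookups */
	available_keyids++;
}
```

Once all three KeyIDs are taken, a further reserve returns 0; releasing
KeyID 2 makes it the next one handed out.
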
 security/keys/mktme_keys.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index beca852db01a..10fcdbf5a08f 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -50,6 +50,23 @@ int mktme_reserve_keyid(struct key *key)
 	return 0;
 }
 
+static void mktme_release_keyid(int keyid)
+{
+	mktme_map[keyid].state = KEYID_AVAILABLE;
+	mktme_available_keyids++;
+}
+
+int mktme_keyid_from_key(struct key *key)
+{
+	int i;
+
+	for (i = 1; i <= mktme_nr_keyids(); i++) {
+		if (mktme_map[i].key == key)
+			return i;
+	}
+	return 0;
+}
+
 enum mktme_opt_id {
 	OPT_ERROR,
 	OPT_TYPE,
@@ -62,6 +79,17 @@ static const match_table_t mktme_token = {
 	{OPT_ERROR, NULL}
 };
 
+/* Key Service Method called when a Userspace Key is garbage collected. */
+static void mktme_destroy_key(struct key *key)
+{
+	int keyid = mktme_keyid_from_key(key);
+	unsigned long flags;
+
+	spin_lock_irqsave(&mktme_lock, flags);
+	mktme_release_keyid(keyid);
+	spin_unlock_irqrestore(&mktme_lock, flags);
+}
+
 /* Key Service Method to create a new key. Payload is preparsed. */
 int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 {
@@ -198,6 +226,7 @@ struct key_type key_type_mktme = {
 	.free_preparse	= mktme_free_preparsed_payload,
 	.instantiate	= mktme_instantiate_key,
 	.describe	= user_describe,
+	.destroy	= mktme_destroy_key,
 };
 
 static int __init init_mktme(void)
-- 
2.21.0



* [PATCHv2 28/59] keys/mktme: Move the MKTME payload into a cache aligned structure
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (26 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 27/59] keys/mktme: Destroy " Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 29/59] keys/mktme: Set up PCONFIG programming targets for MKTME keys Kirill A. Shutemov
                   ` (30 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

In preparation for programming the key into the hardware, move
the key payload into a cache aligned structure. This alignment
is a requirement of the MKTME hardware.

Use the slab allocator to have this structure readily available.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 10fcdbf5a08f..8ac75b1e6188 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -16,6 +16,7 @@
 
 static DEFINE_SPINLOCK(mktme_lock);
 static unsigned int mktme_available_keyids;  /* Free Hardware KeyIDs */
+static struct kmem_cache *mktme_prog_cache;  /* Hardware programming cache */
 
 enum mktme_keyid_state {
 	KEYID_AVAILABLE,	/* Available to be assigned */
@@ -79,6 +80,25 @@ static const match_table_t mktme_token = {
 	{OPT_ERROR, NULL}
 };
 
+/* Copy the payload to the HW programming structure and program this KeyID */
+static int mktme_program_keyid(int keyid, u32 payload)
+{
+	struct mktme_key_program *kprog = NULL;
+	int ret;
+
+	kprog = kmem_cache_zalloc(mktme_prog_cache, GFP_KERNEL);
+	if (!kprog)
+		return -ENOMEM;
+
+	/* Hardware programming requires a cache aligned struct */
+	kprog->keyid = keyid;
+	kprog->keyid_ctrl = payload;
+
+	ret = MKTME_PROG_SUCCESS;	/* Future programming call */
+	kmem_cache_free(mktme_prog_cache, kprog);
+	return ret;
+}
+
 /* Key Service Method called when a Userspace Key is garbage collected. */
 static void mktme_destroy_key(struct key *key)
 {
@@ -93,6 +113,7 @@ static void mktme_destroy_key(struct key *key)
 /* Key Service Method to create a new key. Payload is preparsed. */
 int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 {
+	u32 *payload = prep->payload.data[0];
 	unsigned long flags;
 	int keyid;
 
@@ -101,7 +122,14 @@ int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 	spin_unlock_irqrestore(&mktme_lock, flags);
 	if (!keyid)
 		return -ENOKEY;
-	return 0;
+
+	if (!mktme_program_keyid(keyid, *payload))
+		return MKTME_PROG_SUCCESS;
+
+	spin_lock_irqsave(&mktme_lock, flags);
+	mktme_release_keyid(keyid);
+	spin_unlock_irqrestore(&mktme_lock, flags);
+	return -ENOKEY;
 }
 
 /* Make sure arguments are correct for the TYPE of key requested */
@@ -245,10 +273,15 @@ static int __init init_mktme(void)
 	if (!mktme_map)
 		return -ENOMEM;
 
+	/* Used to program the hardware key tables */
+	mktme_prog_cache = KMEM_CACHE(mktme_key_program, SLAB_PANIC);
+	if (!mktme_prog_cache)
+		goto free_map;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
-
+free_map:
 	kvfree(mktme_map);
 
 	return -ENOMEM;
-- 
2.21.0



* [PATCHv2 29/59] keys/mktme: Set up PCONFIG programming targets for MKTME keys
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (27 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 28/59] keys/mktme: Move the MKTME payload into a cache aligned structure Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 30/59] keys/mktme: Program MKTME keys into the platform hardware Kirill A. Shutemov
                   ` (29 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The MKTME key service maintains the hardware key tables. These key tables
are package scoped per the MKTME hardware definition. This means that
each physical package on the system needs its key table programmed.

These physical packages are the targets of the new PCONFIG programming
command. So, introduce a PCONFIG targets bitmap as well as a CPU mask
that includes the lead CPUs capable of programming the targets.

The lead CPU mask will be used every time a new key is programmed into
the hardware.

Keep the PCONFIG targets bitmap around for future use during CPU
hotplug events.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
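The lead CPU selection boils down to "first online CPU seen per package".
A small userspace sketch of that idea follows. It assumes at most 64 CPUs
and 64 packages so that plain 64-bit masks can replace the kernel's cpumask
and bitmap types; pick_lead_cpus() and the package_of_cpu[] array (standing
in for topology_physical_package_id()) are hypothetical names for this note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns a mask with one bit set per package: the lowest-numbered CPU
 * that belongs to that package. package_of_cpu[cpu] is the package ID.
 */
static uint64_t pick_lead_cpus(const unsigned int *package_of_cpu,
			       size_t nr_cpus)
{
	uint64_t seen_packages = 0, lead_cpus = 0;
	size_t cpu;

	assert(nr_cpus <= 64);
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		uint64_t pkg_bit;

		assert(package_of_cpu[cpu] < 64);
		pkg_bit = UINT64_C(1) << package_of_cpu[cpu];
		if (!(seen_packages & pkg_bit)) {
			/* first CPU of this package becomes its lead CPU */
			seen_packages |= pkg_bit;
			lead_cpus |= UINT64_C(1) << cpu;
		}
	}
	return lead_cpus;
}
```

For CPUs 0..5 sitting in packages {0, 0, 1, 1, 0, 2}, the lead CPUs are 0, 2
and 5, so the returned mask is 0x25.
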
 security/keys/mktme_keys.c | 42 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 8ac75b1e6188..272bff8591b7 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -2,6 +2,7 @@
 
 /* Documentation/x86/mktme/ */
 
+#include <linux/cpu.h>
 #include <linux/init.h>
 #include <linux/key.h>
 #include <linux/key-type.h>
@@ -17,6 +18,8 @@
 static DEFINE_SPINLOCK(mktme_lock);
 static unsigned int mktme_available_keyids;  /* Free Hardware KeyIDs */
 static struct kmem_cache *mktme_prog_cache;  /* Hardware programming cache */
+static unsigned long *mktme_target_map;	     /* PCONFIG programming target */
+static cpumask_var_t mktme_leadcpus;	     /* One CPU per PCONFIG target */
 
 enum mktme_keyid_state {
 	KEYID_AVAILABLE,	/* Available to be assigned */
@@ -257,6 +260,33 @@ struct key_type key_type_mktme = {
 	.destroy	= mktme_destroy_key,
 };
 
+static void mktme_update_pconfig_targets(void)
+{
+	int cpu, target_id;
+
+	cpumask_clear(mktme_leadcpus);
+	bitmap_clear(mktme_target_map, 0, topology_max_packages());
+
+	for_each_online_cpu(cpu) {
+		target_id = topology_physical_package_id(cpu);
+		if (!__test_and_set_bit(target_id, mktme_target_map))
+			__cpumask_set_cpu(cpu, mktme_leadcpus);
+	}
+}
+
+static int mktme_alloc_pconfig_targets(void)
+{
+	if (!alloc_cpumask_var(&mktme_leadcpus, GFP_KERNEL))
+		return -ENOMEM;
+
+	mktme_target_map = bitmap_alloc(topology_max_packages(), GFP_KERNEL);
+	if (!mktme_target_map) {
+		free_cpumask_var(mktme_leadcpus);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
 static int __init init_mktme(void)
 {
 	int ret;
@@ -278,9 +308,21 @@ static int __init init_mktme(void)
 	if (!mktme_prog_cache)
 		goto free_map;
 
+	/* Hardware programming targets */
+	if (mktme_alloc_pconfig_targets())
+		goto free_cache;
+
+	/* Initialize first programming targets */
+	mktme_update_pconfig_targets();
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
+
+	free_cpumask_var(mktme_leadcpus);
+	bitmap_free(mktme_target_map);
+free_cache:
+	kmem_cache_destroy(mktme_prog_cache);
 free_map:
 	kvfree(mktme_map);
 
-- 
2.21.0



* [PATCHv2 30/59] keys/mktme: Program MKTME keys into the platform hardware
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (28 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 29/59] keys/mktme: Set up PCONFIG programming targets for MKTME keys Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 31/59] keys/mktme: Set up a percpu_ref_count for MKTME keys Kirill A. Shutemov
                   ` (28 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Finally, the keys are programmed into the hardware via each
lead CPU. Every package has to be programmed successfully.
There is no partial success allowed here.

Here a retry scheme is included for two errors that may succeed
on retry: MKTME_DEVICE_BUSY and MKTME_ENTROPY_ERROR.
However, it's not clear if even those errors should be retried
at this level. Perhaps they, too, should be returned to user space
for handling.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
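The all-or-nothing rule and the retry policy can be tried out in userspace
with the sketch below. It is an illustration under assumptions: the status
numbering is inferred from the order of the mktme_error[] table in this
patch, and aggregate_status(), program_with_retries() and the fake
busy_then_ok() programmer are hypothetical names that do not exist in the
kernel. A plain array indexed by package replaces the per-CPU status array.

```c
#include <assert.h>
#include <stddef.h>

enum prog_status {
	PROG_SUCCESS,		/* 0 */
	INVALID_PROG_CMD,	/* 1 */
	ENTROPY_ERROR,		/* 2 */
	INVALID_KEYID,		/* 3 */
	INVALID_ENC_ALG,	/* 4 */
	DEVICE_BUSY,		/* 5 */
};

static int is_retryable(int status)
{
	return status == ENTROPY_ERROR || status == DEVICE_BUSY;
}

/* Collapse per-package results: invalid parameter > busy > entropy. */
static int aggregate_status(const int *status, size_t n)
{
	size_t i;
	int busy = 0, failed = 0;

	for (i = 0; i < n; i++) {
		if (status[i] == PROG_SUCCESS)
			continue;
		failed = 1;
		if (!is_retryable(status[i]))
			return status[i];	/* invalid parameter wins */
		if (status[i] == DEVICE_BUSY)
			busy = 1;
	}
	if (!failed)
		return PROG_SUCCESS;
	return busy ? DEVICE_BUSY : ENTROPY_ERROR;
}

typedef void (*program_fn)(int *status, size_t n, void *ctx);

/* Program every package, retrying only while the error is retryable. */
static int program_with_retries(program_fn program, void *ctx,
				int *status, size_t n, int retries)
{
	int err = DEVICE_BUSY;	/* reported if no attempt is made */

	while (retries-- > 0) {
		program(status, n, ctx);
		err = aggregate_status(status, n);
		if (err == PROG_SUCCESS || !is_retryable(err))
			break;
	}
	return err;
}

/* Hypothetical fake hardware: package 0 is busy for *ctx more rounds. */
static void busy_then_ok(int *status, size_t n, void *ctx)
{
	int *busy_rounds = ctx;
	size_t i;

	for (i = 0; i < n; i++)
		status[i] = PROG_SUCCESS;
	if (n > 0 && *busy_rounds > 0) {
		status[0] = DEVICE_BUSY;
		(*busy_rounds)--;
	}
}
```

A package that stays busy for two rounds succeeds on the third attempt; one
that stays busy past the retry budget surfaces DEVICE_BUSY to the caller,
which is the case the commit message suggests might belong to user space.
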
 security/keys/mktme_keys.c | 92 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 91 insertions(+), 1 deletion(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 272bff8591b7..3c641f3ee794 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -83,6 +83,96 @@ static const match_table_t mktme_token = {
 	{OPT_ERROR, NULL}
 };
 
+struct mktme_hw_program_info {
+	struct mktme_key_program *key_program;
+	int *status;
+};
+
+struct mktme_err_table {
+	const char *msg;
+	bool retry;
+};
+
+static const struct mktme_err_table mktme_error[] = {
+/* MKTME_PROG_SUCCESS     */ {"KeyID was successfully programmed",   false},
+/* MKTME_INVALID_PROG_CMD */ {"Invalid KeyID programming command",   false},
+/* MKTME_ENTROPY_ERROR    */ {"Insufficient entropy",		      true},
+/* MKTME_INVALID_KEYID    */ {"KeyID not valid",		     false},
+/* MKTME_INVALID_ENC_ALG  */ {"Invalid encryption algorithm chosen", false},
+/* MKTME_DEVICE_BUSY      */ {"Failure to access key table",	      true},
+};
+
+static int mktme_parse_program_status(int status[])
+{
+	int cpu, sum = 0;
+
+	/* Success: all CPU(s) programmed all key table(s) */
+	for_each_cpu(cpu, mktme_leadcpus)
+		sum += status[cpu];
+	if (!sum)
+		return MKTME_PROG_SUCCESS;
+
+	/* Invalid Parameters: log the error and return the error. */
+	for_each_cpu(cpu, mktme_leadcpus) {
+		switch (status[cpu]) {
+		case MKTME_INVALID_KEYID:
+		case MKTME_INVALID_PROG_CMD:
+		case MKTME_INVALID_ENC_ALG:
+			pr_err("mktme: %s\n", mktme_error[status[cpu]].msg);
+			return status[cpu];
+
+		default:
+			break;
+		}
+	}
+	/*
+	 * Device Busy or Insufficient Entropy: do not log the
+	 * error. These will be retried and if retries (time or
+	 * count runs out) caller will log the error.
+	 */
+	for_each_cpu(cpu, mktme_leadcpus) {
+		if (status[cpu] == MKTME_DEVICE_BUSY)
+			return status[cpu];
+	}
+	return MKTME_ENTROPY_ERROR;
+}
+
+/* Program a single key using one CPU. */
+static void mktme_do_program(void *hw_program_info)
+{
+	struct mktme_hw_program_info *info = hw_program_info;
+	int cpu;
+
+	cpu = smp_processor_id();
+	info->status[cpu] = mktme_key_program(info->key_program);
+}
+
+static int mktme_program_all_keytables(struct mktme_key_program *key_program)
+{
+	struct mktme_hw_program_info info;
+	int err, retries = 10; /* Maybe users should handle retries */
+
+	info.key_program = key_program;
+	info.status = kcalloc(num_possible_cpus(), sizeof(info.status[0]),
+			      GFP_KERNEL);
+
+	while (retries--) {
+		get_online_cpus();
+		on_each_cpu_mask(mktme_leadcpus, mktme_do_program,
+				 &info, 1);
+		put_online_cpus();
+
+		err = mktme_parse_program_status(info.status);
+		if (!err)			   /* Success */
+			return err;
+		else if (!mktme_error[err].retry)  /* Error no retry */
+			return -ENOKEY;
+	}
+	/* Ran out of retries */
+	pr_err("mktme: %s\n", mktme_error[err].msg);
+	return err;
+}
+
 /* Copy the payload to the HW programming structure and program this KeyID */
 static int mktme_program_keyid(int keyid, u32 payload)
 {
@@ -97,7 +187,7 @@ static int mktme_program_keyid(int keyid, u32 payload)
 	kprog->keyid = keyid;
 	kprog->keyid_ctrl = payload;
 
-	ret = MKTME_PROG_SUCCESS;	/* Future programming call */
+	ret = mktme_program_all_keytables(kprog);
 	kmem_cache_free(mktme_prog_cache, kprog);
 	return ret;
 }
-- 
2.21.0



* [PATCHv2 31/59] keys/mktme: Set up a percpu_ref_count for MKTME keys
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (29 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 30/59] keys/mktme: Program MKTME keys into the platform hardware Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 32/59] keys/mktme: Clear the key programming from the MKTME hardware Kirill A. Shutemov
                   ` (27 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The MKTME key service needs to keep usage counts on the encryption
keys in order to know when it is safe to free a key for reuse.

A percpu_ref fits well here because the key service takes the
initial reference and typically holds it while the intermediary
references are taken and dropped (get/put). The intermediaries in
this case are encrypted VMAs.

Align the percpu_ref_init and percpu_ref_kill with the key service
instantiate and destroy methods respectively.
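
As a rough illustration of the lifecycle described above, here is a
userspace sketch that models one KeyID's reference count with a plain
counter. It is not the kernel's percpu_ref implementation; the struct
and the keyid_ref_* helpers are hypothetical names chosen for this
example only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one KeyID's reference count. */
struct keyid_ref {
	long count;	/* outstanding references, including the initial one */
	bool killed;	/* the key service has dropped its initial reference */
	bool released;	/* the release callback would have run */
};

/* Key instantiate: the key service takes the initial reference. */
static void keyid_ref_init(struct keyid_ref *r)
{
	r->count = 1;
	r->killed = false;
	r->released = false;
}

/* An encrypted VMA takes a reference; refused once the key is killed. */
static bool keyid_ref_tryget_live(struct keyid_ref *r)
{
	if (r->killed)
		return false;
	r->count++;
	return true;
}

/* Drop one reference; the last put marks the KeyID as released. */
static void keyid_ref_put(struct keyid_ref *r)
{
	assert(r->count > 0);
	if (--r->count == 0)
		r->released = true;
}

/* Key destroy: drop the initial reference and refuse new users. */
static void keyid_ref_kill(struct keyid_ref *r)
{
	assert(!r->killed);
	r->killed = true;
	keyid_ref_put(r);
}
```

In this model a KeyID killed while a VMA still holds a reference is
not released until that VMA's final put, which is the property the
key service relies on before a KeyID can be reused.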

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 39 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 3c641f3ee794..18cb57be5193 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -8,6 +8,7 @@
 #include <linux/key-type.h>
 #include <linux/mm.h>
 #include <linux/parser.h>
+#include <linux/percpu-refcount.h>
 #include <linux/string.h>
 #include <asm/intel_pconfig.h>
 #include <keys/mktme-type.h>
@@ -71,6 +72,26 @@ int mktme_keyid_from_key(struct key *key)
 	return 0;
 }
 
+struct percpu_ref *encrypt_count;
+void mktme_percpu_ref_release(struct percpu_ref *ref)
+{
+	unsigned long flags;
+	int keyid;
+
+	for (keyid = 1; keyid <= mktme_nr_keyids(); keyid++) {
+		if (&encrypt_count[keyid] == ref)
+			break;
+	}
+	if (&encrypt_count[keyid] != ref) {
+		pr_debug("%s: invalid ref counter\n", __func__);
+		return;
+	}
+	percpu_ref_exit(ref);
+	spin_lock_irqsave(&mktme_lock, flags);
+	mktme_release_keyid(keyid);
+	spin_unlock_irqrestore(&mktme_lock, flags);
+}
+
 enum mktme_opt_id {
 	OPT_ERROR,
 	OPT_TYPE,
@@ -199,8 +220,10 @@ static void mktme_destroy_key(struct key *key)
 	unsigned long flags;
 
 	spin_lock_irqsave(&mktme_lock, flags);
-	mktme_release_keyid(keyid);
+	mktme_map[keyid].key = NULL;
+	mktme_map[keyid].state = KEYID_REF_KILLED;
 	spin_unlock_irqrestore(&mktme_lock, flags);
+	percpu_ref_kill(&encrypt_count[keyid]);
 }
 
 /* Key Service Method to create a new key. Payload is preparsed. */
@@ -216,9 +239,15 @@ int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 	if (!keyid)
 		return -ENOKEY;
 
+	if (percpu_ref_init(&encrypt_count[keyid], mktme_percpu_ref_release,
+			    0, GFP_KERNEL))
+		goto err_out;
+
 	if (!mktme_program_keyid(keyid, *payload))
 		return MKTME_PROG_SUCCESS;
 
+	percpu_ref_exit(&encrypt_count[keyid]);
+err_out:
 	spin_lock_irqsave(&mktme_lock, flags);
 	mktme_release_keyid(keyid);
 	spin_unlock_irqrestore(&mktme_lock, flags);
@@ -405,10 +434,18 @@ static int __init init_mktme(void)
 	/* Initialize first programming targets */
 	mktme_update_pconfig_targets();
 
+	/* Reference counters to protect in use KeyIDs */
+	encrypt_count = kvcalloc(mktme_nr_keyids() + 1, sizeof(encrypt_count[0]),
+				 GFP_KERNEL);
+	if (!encrypt_count)
+		goto free_targets;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
 
+	kvfree(encrypt_count);
+free_targets:
 	free_cpumask_var(mktme_leadcpus);
 	bitmap_free(mktme_target_map);
 free_cache:
-- 
2.21.0



* [PATCHv2 32/59] keys/mktme: Clear the key programming from the MKTME hardware
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (30 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 31/59] keys/mktme: Set up a percpu_ref_count for MKTME keys Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 33/59] keys/mktme: Require CAP_SYS_RESOURCE capability for MKTME keys Kirill A. Shutemov
                   ` (26 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Send a request to the MKTME hardware to clear a previously
programmed key. This will be used when userspace keys are
destroyed and the key slot is no longer in use. No longer
in use means that the reference has been released, and its
usage count has returned to zero.

This clear command is not offered as an option to userspace,
since the key service can execute it automatically and safely
at the right time.
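
The deferred clear can be pictured as a pass over the KeyID table that
recycles only the slots whose last reference has gone away. The
following is a simplified userspace sketch of that pass, not the patch
itself: the slot_state enum, fake_hw_clear() and
recycle_released_slots() are hypothetical names, and the hardware
clear is a stub.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot states, loosely following the states in the patch. */
enum slot_state {
	SLOT_AVAILABLE,		/* free to be assigned */
	SLOT_IN_USE,		/* key programmed, references outstanding */
	SLOT_REF_KILLED,	/* key destroyed, users still draining */
	SLOT_REF_RELEASED,	/* last reference dropped, awaiting clear */
};

/* Stub standing in for the hardware clear request; always succeeds. */
static int fake_hw_clear(size_t slot)
{
	(void)slot;
	return 0;
}

/*
 * Clear and recycle every released slot. Index 0 is skipped, mirroring
 * the loops in the patch, which start at KeyID 1. As in the patch, a
 * failed clear does not stop the slot from being recycled.
 * Returns the number of slots made available again.
 */
static size_t recycle_released_slots(enum slot_state *slots, size_t nr,
				     int (*hw_clear)(size_t slot))
{
	size_t slot, recycled = 0;

	assert(slots != NULL && hw_clear != NULL);

	for (slot = 1; slot < nr; slot++) {
		if (slots[slot] != SLOT_REF_RELEASED)
			continue;
		(void)hw_clear(slot);
		slots[slot] = SLOT_AVAILABLE;
		recycled++;
	}
	return recycled;
}
```

Slots that are still in use, or killed but not yet released, are left
untouched, so a key is never cleared while a user still depends on it.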

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 18cb57be5193..1e2afcce7d85 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -72,6 +72,9 @@ int mktme_keyid_from_key(struct key *key)
 	return 0;
 }
 
+static void mktme_clear_hardware_keyid(struct work_struct *work);
+static DECLARE_WORK(mktme_clear_work, mktme_clear_hardware_keyid);
+
 struct percpu_ref *encrypt_count;
 void mktme_percpu_ref_release(struct percpu_ref *ref)
 {
@@ -88,8 +91,9 @@ void mktme_percpu_ref_release(struct percpu_ref *ref)
 	}
 	percpu_ref_exit(ref);
 	spin_lock_irqsave(&mktme_lock, flags);
-	mktme_release_keyid(keyid);
+	mktme_map[keyid].state = KEYID_REF_RELEASED;
 	spin_unlock_irqrestore(&mktme_lock, flags);
+	schedule_work(&mktme_clear_work);
 }
 
 enum mktme_opt_id {
@@ -213,6 +217,27 @@ static int mktme_program_keyid(int keyid, u32 payload)
 	return ret;
 }
 
+static void mktme_clear_hardware_keyid(struct work_struct *work)
+{
+	u32 clear_payload = MKTME_KEYID_CLEAR_KEY;
+	unsigned long flags;
+	int keyid, ret;
+
+	for (keyid = 1; keyid <= mktme_nr_keyids(); keyid++) {
+		if (mktme_map[keyid].state != KEYID_REF_RELEASED)
+			continue;
+
+		ret = mktme_program_keyid(keyid, clear_payload);
+		if (ret != MKTME_PROG_SUCCESS)
+			pr_debug("mktme: clear key failed [%s]\n",
+				 mktme_error[ret].msg);
+
+		spin_lock_irqsave(&mktme_lock, flags);
+		mktme_release_keyid(keyid);
+		spin_unlock_irqrestore(&mktme_lock, flags);
+	}
+}
+
 /* Key Service Method called when a Userspace Key is garbage collected. */
 static void mktme_destroy_key(struct key *key)
 {
-- 
2.21.0



* [PATCHv2 33/59] keys/mktme: Require CAP_SYS_RESOURCE capability for MKTME keys
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (31 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 32/59] keys/mktme: Clear the key programming from the MKTME hardware Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 34/59] acpi: Remove __init from acpi table parsing functions Kirill A. Shutemov
                   ` (25 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The MKTME key type uses capabilities to restrict the allocation
of keys to privileged users. CAP_SYS_RESOURCE is required, but
the broader capability of CAP_SYS_ADMIN is accepted.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 1e2afcce7d85..2d90cc83e5ce 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -2,6 +2,7 @@
 
 /* Documentation/x86/mktme/ */
 
+#include <linux/cred.h>
 #include <linux/cpu.h>
 #include <linux/init.h>
 #include <linux/key.h>
@@ -371,6 +372,9 @@ int mktme_preparse_payload(struct key_preparsed_payload *prep)
 	char *options;
 	int ret;
 
+	if (!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
 	if (datalen <= 0 || datalen > 1024 || !prep->data)
 		return -EINVAL;
 
-- 
2.21.0



* [PATCHv2 34/59] acpi: Remove __init from acpi table parsing functions
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (32 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 33/59] keys/mktme: Require CAP_SYS_RESOURCE capability for MKTME keys Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 35/59] acpi/hmat: Determine existence of an ACPI HMAT Kirill A. Shutemov
                   ` (24 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

ACPI table parsing functions are useful after init time.

For example, the MKTME (Multi-Key Total Memory Encryption) key
service will evaluate the ACPI HMAT table when the first key
creation request occurs.  This will happen after init time.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/acpi/tables.c | 10 +++++-----
 include/linux/acpi.h  |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index b32327759380..9d40af7f07fb 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -33,7 +33,7 @@ static char *mps_inti_flags_trigger[] = { "dfl", "edge", "res", "level" };
 
 static struct acpi_table_desc initial_tables[ACPI_MAX_TABLES] __initdata;
 
-static int acpi_apic_instance __initdata;
+static int acpi_apic_instance;
 
 enum acpi_subtable_type {
 	ACPI_SUBTABLE_COMMON,
@@ -49,7 +49,7 @@ struct acpi_subtable_entry {
  * Disable table checksum verification for the early stage due to the size
  * limitation of the current x86 early mapping implementation.
  */
-static bool acpi_verify_table_checksum __initdata = false;
+static bool acpi_verify_table_checksum = false;
 
 void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
 {
@@ -280,7 +280,7 @@ acpi_get_subtable_type(char *id)
  * On success returns sum of all matching entries for all proc handlers.
  * Otherwise, -ENODEV or -EINVAL is returned.
  */
-static int __init acpi_parse_entries_array(char *id, unsigned long table_size,
+static int acpi_parse_entries_array(char *id, unsigned long table_size,
 		struct acpi_table_header *table_header,
 		struct acpi_subtable_proc *proc, int proc_num,
 		unsigned int max_entries)
@@ -355,7 +355,7 @@ static int __init acpi_parse_entries_array(char *id, unsigned long table_size,
 	return errs ? -EINVAL : count;
 }
 
-int __init acpi_table_parse_entries_array(char *id,
+int acpi_table_parse_entries_array(char *id,
 			 unsigned long table_size,
 			 struct acpi_subtable_proc *proc, int proc_num,
 			 unsigned int max_entries)
@@ -386,7 +386,7 @@ int __init acpi_table_parse_entries_array(char *id,
 	return count;
 }
 
-int __init acpi_table_parse_entries(char *id,
+int acpi_table_parse_entries(char *id,
 			unsigned long table_size,
 			int entry_id,
 			acpi_tbl_entry_handler handler,
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 9426b9aaed86..fc1e7d4648bf 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -228,11 +228,11 @@ int acpi_numa_init (void);
 
 int acpi_table_init (void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
-int __init acpi_table_parse_entries(char *id, unsigned long table_size,
+int acpi_table_parse_entries(char *id, unsigned long table_size,
 			      int entry_id,
 			      acpi_tbl_entry_handler handler,
 			      unsigned int max_entries);
-int __init acpi_table_parse_entries_array(char *id, unsigned long table_size,
+int acpi_table_parse_entries_array(char *id, unsigned long table_size,
 			      struct acpi_subtable_proc *proc, int proc_num,
 			      unsigned int max_entries);
 int acpi_table_parse_madt(enum acpi_madt_type id,
-- 
2.21.0



* [PATCHv2 35/59] acpi/hmat: Determine existence of an ACPI HMAT
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (33 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 34/59] acpi: Remove __init from acpi table parsing functions Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 36/59] keys/mktme: Require ACPI HMAT to register the MKTME Key Service Kirill A. Shutemov
                   ` (23 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Platforms that need to confirm the presence of an HMAT table
can use this function, which simply reports the HMAT's existence.

This is added in support of the Multi-Key Total Memory Encryption
(MKTME), a feature on future Intel platforms. These platforms will
need to confirm an HMAT is present at init time.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/acpi/hmat/hmat.c | 13 +++++++++++++
 include/linux/acpi.h     |  4 ++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c
index 96b7d39a97c6..38e3341f569f 100644
--- a/drivers/acpi/hmat/hmat.c
+++ b/drivers/acpi/hmat/hmat.c
@@ -664,3 +664,16 @@ static __init int hmat_init(void)
 	return 0;
 }
 subsys_initcall(hmat_init);
+
+bool acpi_hmat_present(void)
+{
+	struct acpi_table_header *tbl;
+	acpi_status status;
+
+	status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	acpi_put_table(tbl);
+	return true;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index fc1e7d4648bf..d27f4d17dfb3 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1335,4 +1335,8 @@ acpi_platform_notify(struct device *dev, enum kobject_action action)
 }
 #endif
 
+#ifdef CONFIG_X86_INTEL_MKTME
+extern bool acpi_hmat_present(void);
+#endif /* CONFIG_X86_INTEL_MKTME */
+
 #endif	/*_LINUX_ACPI_H*/
-- 
2.21.0



* [PATCHv2 36/59] keys/mktme: Require ACPI HMAT to register the MKTME Key Service
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (34 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 35/59] acpi/hmat: Determine existence of an ACPI HMAT Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 37/59] acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME Kirill A. Shutemov
                   ` (22 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The ACPI HMAT will be used by the MKTME key service to identify
topologies that support the safe programming of encryption keys.
Those decisions will happen at key creation time and during
hotplug events.

To enable this, we at least need to have the ACPI HMAT present
at init time. If it's not present, do not register the type.

If the HMAT is not present, failure looks like this:
[ ] MKTME: Registration failed. ACPI HMAT not present.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 2d90cc83e5ce..6265b62801e9 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -2,6 +2,7 @@
 
 /* Documentation/x86/mktme/ */
 
+#include <linux/acpi.h>
 #include <linux/cred.h>
 #include <linux/cpu.h>
 #include <linux/init.h>
@@ -445,6 +446,12 @@ static int __init init_mktme(void)
 
 	mktme_available_keyids = mktme_nr_keyids();
 
+	/* Require an ACPI HMAT to identify MKTME safe topologies */
+	if (!acpi_hmat_present()) {
+		pr_warn("MKTME: Registration failed. ACPI HMAT not present.\n");
+		return -EINVAL;
+	}
+
 	/* Mapping of Userspace Keys to Hardware KeyIDs */
 	mktme_map = kvzalloc((sizeof(*mktme_map) * (mktme_nr_keyids() + 1)),
 			     GFP_KERNEL);
-- 
2.21.0



* [PATCHv2 37/59] acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (35 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 36/59] keys/mktme: Require ACPI HMAT to register the MKTME Key Service Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 38/59] keys/mktme: Do not allow key creation in unsafe topologies Kirill A. Shutemov
                   ` (21 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

MKTME, Multi-Key Total Memory Encryption, is a feature on Intel
platforms. The ACPI HMAT table can be used to verify that the
platform topology is safe for the usage of MKTME.

The kernel must be capable of programming every memory controller
on the platform. This means that there must be a CPU online in
the same proximity domain as each memory controller.
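
The rule above can be sketched in userspace as a check over a list of
memory proximity domains. This is only a model under stated
assumptions: CPUs are represented as bits in a 64-bit mask, and
struct mem_domain and topology_is_safe() are hypothetical names that
do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one memory proximity domain entry. */
struct mem_domain {
	bool processor_pd_valid;	/* entry names a processor domain */
	uint64_t cpus;			/* CPUs in that processor domain */
};

/*
 * The topology is safe only if every memory domain has a valid
 * processor domain with at least one CPU that is currently online.
 */
static bool topology_is_safe(const struct mem_domain *domains, size_t nr,
			     uint64_t online_cpus)
{
	size_t i;

	assert(domains != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (!domains[i].processor_pd_valid)
			return false;	/* memory-only package */
		if (!(domains[i].cpus & online_cpus))
			return false;	/* no online CPU can reach it */
	}
	return true;
}
```

A single domain that fails either test makes the whole topology
unsafe, because key tables must be programmed identically everywhere.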

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/acpi/hmat/hmat.c | 54 ++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h     |  1 +
 2 files changed, 55 insertions(+)

diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c
index 38e3341f569f..936a403c0694 100644
--- a/drivers/acpi/hmat/hmat.c
+++ b/drivers/acpi/hmat/hmat.c
@@ -677,3 +677,57 @@ bool acpi_hmat_present(void)
 	acpi_put_table(tbl);
 	return true;
 }
+
+static int mktme_parse_proximity_domains(union acpi_subtable_headers *header,
+					 const unsigned long end)
+{
+	struct acpi_hmat_proximity_domain *mar = (void *)header;
+	struct acpi_hmat_structure *hdr = (void *)header;
+
+	const struct cpumask *tmp_mask;
+
+	if (!hdr || hdr->type != ACPI_HMAT_TYPE_PROXIMITY)
+		return -EINVAL;
+
+	if (mar->header.length != sizeof(*mar)) {
+		pr_warn("MKTME: invalid header length in HMAT\n");
+		return -1;
+	}
+	/*
+	 * Require a valid processor proximity domain.
+	 * This will catch memory only physical packages with
+	 * no processor capable of programming the key table.
+	 */
+	if (!(mar->flags & ACPI_HMAT_PROCESSOR_PD_VALID)) {
+		pr_warn("MKTME: no valid processor proximity domain\n");
+		return -1;
+	}
+	/* Require an online CPU in the processor proximity domain. */
+	tmp_mask = cpumask_of_node(pxm_to_node(mar->processor_PD));
+	if (!cpumask_intersects(tmp_mask, cpu_online_mask)) {
+		pr_warn("MKTME: no online CPU in proximity domain\n");
+		return -1;
+	}
+	return 0;
+}
+
+/* Returns true if topology is safe for MKTME key creation */
+bool mktme_hmat_evaluate(void)
+{
+	struct acpi_table_header *tbl;
+	bool ret = true;
+	acpi_status status;
+
+	status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	if (acpi_table_parse_entries(ACPI_SIG_HMAT,
+				     sizeof(struct acpi_table_hmat),
+				     ACPI_HMAT_TYPE_PROXIMITY,
+				     mktme_parse_proximity_domains, 0) < 0) {
+		ret = false;
+	}
+	acpi_put_table(tbl);
+	return ret;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index d27f4d17dfb3..8854ae942e37 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1337,6 +1337,7 @@ acpi_platform_notify(struct device *dev, enum kobject_action action)
 
 #ifdef CONFIG_X86_INTEL_MKTME
 extern bool acpi_hmat_present(void);
+extern bool mktme_hmat_evaluate(void);
 #endif /* CONFIG_X86_INTEL_MKTME */
 
 #endif	/*_LINUX_ACPI_H*/
-- 
2.21.0



* [PATCHv2 38/59] keys/mktme: Do not allow key creation in unsafe topologies
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (36 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 37/59] acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 39/59] keys/mktme: Support CPU hotplug for MKTME key service Kirill A. Shutemov
                   ` (20 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

MKTME depends upon at least one online CPU capable of programming
each memory controller in the platform.

An unsafe topology for MKTME is a memory-only package or a package
with no online CPUs. Key creation in an unsafe topology will fail
with EINVAL, and a warning will be logged one time.
For example:
	[ ] MKTME: no online CPU in proximity domain
	[ ] MKTME: topology does not support key creation

These are recoverable errors. CPUs may be brought online that are
capable of programming a previously unprogrammable memory controller.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 6265b62801e9..70662e882674 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -23,6 +23,7 @@ static unsigned int mktme_available_keyids;  /* Free Hardware KeyIDs */
 static struct kmem_cache *mktme_prog_cache;  /* Hardware programming cache */
 static unsigned long *mktme_target_map;	     /* PCONFIG programming target */
 static cpumask_var_t mktme_leadcpus;	     /* One CPU per PCONFIG target */
+static bool mktme_allow_keys;		     /* HW topology supports keys */
 
 enum mktme_keyid_state {
 	KEYID_AVAILABLE,	/* Available to be assigned */
@@ -253,32 +254,55 @@ static void mktme_destroy_key(struct key *key)
 	percpu_ref_kill(&encrypt_count[keyid]);
 }
 
+static void mktme_update_pconfig_targets(void);
 /* Key Service Method to create a new key. Payload is preparsed. */
 int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 {
 	u32 *payload = prep->payload.data[0];
 	unsigned long flags;
+	int ret = -ENOKEY;
 	int keyid;
 
 	spin_lock_irqsave(&mktme_lock, flags);
+
+	/* Topology supports key creation */
+	if (mktme_allow_keys)
+		goto get_key;
+
+	/* Topology unknown, check it. */
+	if (!mktme_hmat_evaluate()) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* Keys are now allowed. Update the programming targets. */
+	mktme_update_pconfig_targets();
+	mktme_allow_keys = true;
+
+get_key:
 	keyid = mktme_reserve_keyid(key);
 	spin_unlock_irqrestore(&mktme_lock, flags);
 	if (!keyid)
-		return -ENOKEY;
+		goto out;
 
 	if (percpu_ref_init(&encrypt_count[keyid], mktme_percpu_ref_release,
 			    0, GFP_KERNEL))
-		goto err_out;
+		goto out_free_key;
 
-	if (!mktme_program_keyid(keyid, *payload))
-		return MKTME_PROG_SUCCESS;
+	ret = mktme_program_keyid(keyid, *payload);
+	if (ret == MKTME_PROG_SUCCESS)
+		goto out;
 
+	/* Key programming failed */
 	percpu_ref_exit(&encrypt_count[keyid]);
-err_out:
+
+out_free_key:
 	spin_lock_irqsave(&mktme_lock, flags);
 	mktme_release_keyid(keyid);
+out_unlock:
 	spin_unlock_irqrestore(&mktme_lock, flags);
-	return -ENOKEY;
+out:
+	return ret;
 }
 
 /* Make sure arguments are correct for the TYPE of key requested */
-- 
2.21.0



* [PATCHv2 39/59] keys/mktme: Support CPU hotplug for MKTME key service
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (37 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 38/59] keys/mktme: Do not allow key creation in unsafe topologies Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 40/59] keys/mktme: Block memory hotplug additions when MKTME is enabled Kirill A. Shutemov
                   ` (19 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The MKTME encryption hardware resides on each physical package.
The encryption hardware includes 'Key Tables' that must be
programmed identically across all physical packages in the
platform. Although every CPU in a package can program its key
table, the kernel uses one lead CPU per package for programming.

CPU Hotplug Teardown
--------------------
MKTME manages CPU hotplug teardown to make sure the ability to
program all packages is preserved when MKTME keys are present.

When MKTME keys are not currently programmed, simply allow
the teardown, and set "mktme_allow_keys" to false. This will
force a re-evaluation of the platform topology before the next
key creation. If this CPU teardown mattered, the MKTME key service
will report an error and fail to create the key. (The user can
online that CPU and try again.)

When MKTME keys are currently programmed, allow teardowns
of non-lead CPUs and of lead CPUs where another core sibling
CPU can take over as lead. Do not allow teardown of any
lead CPU that would render a hardware key table unreachable!
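
The teardown rules above reduce to a small decision function. The
sketch below models it in userspace, assuming CPUs are bits in a
64-bit mask; struct teardown_decision and decide_teardown() are
hypothetical names for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of evaluating one CPU teardown request. */
struct teardown_decision {
	bool allowed;		/* may the CPU go offline? */
	bool recheck_topology;	/* re-evaluate before the next key creation */
	int new_lead;		/* replacement lead CPU, or -1 if none needed */
};

/*
 * package_cpus holds the online CPUs that share a key table with 'cpu',
 * including 'cpu' itself.
 */
static struct teardown_decision
decide_teardown(bool keys_in_use, bool is_lead, uint64_t package_cpus,
		unsigned int cpu)
{
	struct teardown_decision d = {
		.allowed = true, .recheck_topology = false, .new_lead = -1,
	};
	uint64_t siblings;
	int i;

	assert(cpu < 64);

	/* No keys programmed: allow, but force a topology re-check. */
	if (!keys_in_use) {
		d.recheck_topology = true;
		return d;
	}
	/* Not a lead CPU: nothing depends on it. */
	if (!is_lead)
		return d;

	/* Lead CPU: hand the role to any remaining sibling. */
	siblings = package_cpus & ~(UINT64_C(1) << cpu);
	for (i = 0; i < 64; i++) {
		if (siblings & (UINT64_C(1) << i)) {
			d.new_lead = i;
			return d;
		}
	}
	/* Last CPU able to reach this key table: refuse the teardown. */
	d.allowed = false;
	return d;
}
```

Only the final case blocks the hotplug event; every other path lets
the CPU go offline, with or without a change of lead CPU.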

CPU Hotplug Startup
-------------------
CPUs coming online are of interest to the key service, but since
the service never needs to block a CPU startup event, nor does it
need to prepare for an onlining CPU, a callback is not implemented.

MKTME will catch the availability of the new CPU, if it is
needed, at the next key creation time. If keys are not allowed,
that new CPU will be part of the topology evaluation to determine
if keys should now be allowed.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 47 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 70662e882674..b042df73899d 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -460,9 +460,46 @@ static int mktme_alloc_pconfig_targets(void)
 	return 0;
 }
 
+static int mktme_cpu_teardown(unsigned int cpu)
+{
+	int new_leadcpu, ret = 0;
+	unsigned long flags;
+
+	/* Do not allow key programming during cpu hotplug event */
+	spin_lock_irqsave(&mktme_lock, flags);
+
+	/*
+	 * When no keys are in use, allow the teardown, and set
+	 * mktme_allow_keys to FALSE. That forces an evaluation
+	 * of the topology before the next key creation.
+	 */
+	if (mktme_available_keyids == mktme_nr_keyids()) {
+		mktme_allow_keys = false;
+		goto out;
+	}
+	/* Teardown CPU is not a lead CPU. Allow teardown. */
+	if (!cpumask_test_cpu(cpu, mktme_leadcpus))
+		goto out;
+
+	/* Teardown CPU is a lead CPU. Look for a new lead CPU. */
+	new_leadcpu = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+	if (new_leadcpu < nr_cpumask_bits) {
+		/* New lead CPU found. Update the programming mask */
+		__cpumask_clear_cpu(cpu, mktme_leadcpus);
+		__cpumask_set_cpu(new_leadcpu, mktme_leadcpus);
+	} else {
+		/* New lead CPU not found. Do not allow CPU teardown */
+		ret = -1;
+	}
+out:
+	spin_unlock_irqrestore(&mktme_lock, flags);
+	return ret;
+}
+
 static int __init init_mktme(void)
 {
-	int ret;
+	int ret, cpuhp;
 
 	/* Verify keys are present */
 	if (mktme_nr_keyids() < 1)
@@ -500,10 +537,18 @@ static int __init init_mktme(void)
 	if (!encrypt_count)
 		goto free_targets;
 
+	cpuhp = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+					  "keys/mktme_keys:online",
+					  NULL, mktme_cpu_teardown);
+	if (cpuhp < 0)
+		goto free_encrypt;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
 
+	cpuhp_remove_state_nocalls(cpuhp);
+free_encrypt:
 	kvfree(encrypt_count);
 free_targets:
 	free_cpumask_var(mktme_leadcpus);
-- 
2.21.0



* [PATCHv2 40/59] keys/mktme: Block memory hotplug additions when MKTME is enabled
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (38 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 39/59] keys/mktme: Support CPU hotplug for MKTME key service Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 41/59] mm: Generalize the mprotect implementation to support extensions Kirill A. Shutemov
                   ` (18 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Intel platforms supporting MKTME need the ability to evaluate
the memory topology before allowing new memory to go online.
That evaluation would determine if the kernel can program the
memory controller. Every memory controller needs to have a
CPU online, capable of programming its MKTME keys.

The kernel uses the ACPI HMAT at boot time to determine a safe
MKTME topology, but at run time, there is no update to the HMAT.
That run time support will come in the future with platform and
kernel support for the _HMA method.

Meanwhile, be safe, and do not allow any MEM_GOING_ONLINE events
when MKTME is enabled.
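
For illustration only, the veto can be modelled in user space as below.
This is a minimal sketch, not the kernel notifier API: every sketch_*
name is hypothetical, and the chain walk is simplified to "any
NOTIFY_BAD blocks the event".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's notifier return codes. */
enum { SKETCH_NOTIFY_OK, SKETCH_NOTIFY_BAD };

/* Hypothetical stand-ins for the memory hotplug actions. */
enum sketch_mem_action {
	SKETCH_MEM_GOING_ONLINE,
	SKETCH_MEM_ONLINE,
	SKETCH_MEM_GOING_OFFLINE,
};

typedef int (*sketch_notifier_fn)(enum sketch_mem_action action);

/* Same policy as the patch: veto onlining, allow everything else. */
static int sketch_mktme_memory_callback(enum sketch_mem_action action)
{
	if (action == SKETCH_MEM_GOING_ONLINE)
		return SKETCH_NOTIFY_BAD;
	return SKETCH_NOTIFY_OK;
}

/* The event proceeds only if no callback on the chain vetoes it. */
static bool sketch_event_allowed(const sketch_notifier_fn *chain, size_t n,
				 enum sketch_mem_action action)
{
	assert(chain != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (chain[i](action) == SKETCH_NOTIFY_BAD)
			return false;
	return true;
}
```

With only this callback registered, a GOING_ONLINE event is refused
while offlining is unaffected.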

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index b042df73899d..f804d780fc91 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -8,6 +8,7 @@
 #include <linux/init.h>
 #include <linux/key.h>
 #include <linux/key-type.h>
+#include <linux/memory.h>
 #include <linux/mm.h>
 #include <linux/parser.h>
 #include <linux/percpu-refcount.h>
@@ -497,6 +498,26 @@ static int mktme_cpu_teardown(unsigned int cpu)
 	return ret;
 }
 
+static int mktme_memory_callback(struct notifier_block *nb,
+				 unsigned long action, void *arg)
+{
+	/*
+	 * Do not allow the hot add of memory until run time
+	 * support of the ACPI HMAT is available via an _HMA
+	 * method. Without it, the new memory cannot be
+	 * evaluated to determine an MKTME safe topology.
+	 */
+	if (action == MEM_GOING_ONLINE)
+		return NOTIFY_BAD;
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block mktme_memory_nb = {
+	.notifier_call = mktme_memory_callback,
+	.priority = 99,				/* priority ? */
+};
+
 static int __init init_mktme(void)
 {
 	int ret, cpuhp;
@@ -543,10 +564,15 @@ static int __init init_mktme(void)
 	if (cpuhp < 0)
 		goto free_encrypt;
 
+	if (register_memory_notifier(&mktme_memory_nb))
+		goto remove_cpuhp;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
 
+	unregister_memory_notifier(&mktme_memory_nb);
+remove_cpuhp:
 	cpuhp_remove_state_nocalls(cpuhp);
 free_encrypt:
 	kvfree(encrypt_count);
-- 
2.21.0



* [PATCHv2 41/59] mm: Generalize the mprotect implementation to support extensions
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (39 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 40/59] keys/mktme: Block memory hotplug additions when MKTME is enabled Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 42/59] syscall/x86: Wire up a system call for MKTME encryption keys Kirill A. Shutemov
                   ` (17 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Today mprotect is implemented to support legacy mprotect behavior
plus an extension for memory protection keys. Make it more generic
so that it can support additional extensions in the future.

This is done in preparation for adding a new system call for memory
encryption keys. The intent is that the new encrypted mprotect will be
another extension to legacy mprotect.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/mprotect.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 82d7b194a918..4d55725228e3 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -35,6 +35,8 @@
 
 #include "internal.h"
 
+#define NO_KEY	-1
+
 static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long addr, unsigned long end, pgprot_t newprot,
 		int dirty_accountable, int prot_numa)
@@ -453,9 +455,9 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 }
 
 /*
- * pkey==-1 when doing a legacy mprotect()
+ * When pkey==NO_KEY we get legacy mprotect behavior here.
  */
-static int do_mprotect_pkey(unsigned long start, size_t len,
+static int do_mprotect_ext(unsigned long start, size_t len,
 		unsigned long prot, int pkey)
 {
 	unsigned long nstart, end, tmp, reqprot;
@@ -579,7 +581,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot)
 {
-	return do_mprotect_pkey(start, len, prot, -1);
+	return do_mprotect_ext(start, len, prot, NO_KEY);
 }
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -587,7 +589,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot, int, pkey)
 {
-	return do_mprotect_pkey(start, len, prot, pkey);
+	return do_mprotect_ext(start, len, prot, pkey);
 }
 
 SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
-- 
2.21.0



* [PATCHv2 42/59] syscall/x86: Wire up a system call for MKTME encryption keys
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (40 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 41/59] mm: Generalize the mprotect implementation to support extensions Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 43/59] x86/mm: Set KeyIDs in encrypted VMAs for MKTME Kirill A. Shutemov
                   ` (16 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

encrypt_mprotect() is a new system call to support memory encryption.

It takes the same parameters as legacy mprotect, plus an additional
key serial number that is mapped to an encryption keyid.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 include/linux/syscalls.h               | 2 ++
 include/uapi/asm-generic/unistd.h      | 4 +++-
 kernel/sys_ni.c                        | 2 ++
 5 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index c00019abd076..1b30cd007a6a 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -440,3 +440,4 @@
 433	i386	fspick			sys_fspick			__ia32_sys_fspick
 434	i386	pidfd_open		sys_pidfd_open			__ia32_sys_pidfd_open
 435	i386	clone3			sys_clone3			__ia32_sys_clone3
+436	i386	encrypt_mprotect	sys_encrypt_mprotect		__ia32_sys_encrypt_mprotect
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index c29976eca4a8..716d8a89159b 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -357,6 +357,7 @@
 433	common	fspick			__x64_sys_fspick
 434	common	pidfd_open		__x64_sys_pidfd_open
 435	common	clone3			__x64_sys_clone3/ptregs
+436	common	encrypt_mprotect	__x64_sys_encrypt_mprotect
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 88145da7d140..4494b1d9c85a 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1000,6 +1000,8 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags)
 asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
+asmlinkage long sys_encrypt_mprotect(unsigned long start, size_t len,
+				     unsigned long prot, key_serial_t serial);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 1be0e798e362..7c1cd13f6aaf 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -850,9 +850,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 #define __NR_clone3 435
 __SYSCALL(__NR_clone3, sys_clone3)
 #endif
+#define __NR_encrypt_mprotect 436
+__SYSCALL(__NR_encrypt_mprotect, sys_encrypt_mprotect)
 
 #undef __NR_syscalls
-#define __NR_syscalls 436
+#define __NR_syscalls 437
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 34b76895b81e..84c8c47cf9d6 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -349,6 +349,8 @@ COND_SYSCALL(pkey_mprotect);
 COND_SYSCALL(pkey_alloc);
 COND_SYSCALL(pkey_free);
 
+/* multi-key total memory encryption keys */
+COND_SYSCALL(encrypt_mprotect);
 
 /*
  * Architecture specific weak syscall entries.
-- 
2.21.0



* [PATCHv2 43/59] x86/mm: Set KeyIDs in encrypted VMAs for MKTME
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (41 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 42/59] syscall/x86: Wire up a system call for MKTME encryption keys Kirill A. Shutemov
@ 2019-07-31 15:07 ` Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 44/59] mm: Add the encrypt_mprotect() system call " Kirill A. Shutemov
                   ` (15 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

MKTME architecture requires the KeyID to be placed in PTE bits 51:46.
To create an encrypted VMA, place the KeyID in the upper bits of
vm_page_prot that matches the position of those PTE bits.

When the VMA is assigned a KeyID it is always considered a KeyID
change. The VMA is either going from not encrypted to encrypted,
or from encrypted with any KeyID to encrypted with any other KeyID.
To make the change safely, remove the user pages held by the VMA
and unlink the VMA's anonymous chain.
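
The KeyID placement described above can be sketched in user space as
follows. The shift (46) and width (6 bits) are taken from the "bits
51:46" statement in this message; in the kernel they come from
mktme_keyid_shift() and mktme_keyid_mask() at run time. The sketch_*
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the commit message: KeyID occupies PTE bits 51:46. */
#define SKETCH_KEYID_SHIFT 46
#define SKETCH_KEYID_BITS  6
#define SKETCH_KEYID_MASK \
	(((UINT64_C(1) << SKETCH_KEYID_BITS) - 1) << SKETCH_KEYID_SHIFT)

/* Clear the old KeyID bits, then install the new KeyID. */
static uint64_t sketch_set_keyid(uint64_t prot, unsigned int keyid)
{
	assert(keyid < (1u << SKETCH_KEYID_BITS));

	prot &= ~SKETCH_KEYID_MASK;
	prot |= (uint64_t)keyid << SKETCH_KEYID_SHIFT;
	return prot;
}

/* Extract the KeyID from a protection value. */
static unsigned int sketch_get_keyid(uint64_t prot)
{
	return (unsigned int)((prot & SKETCH_KEYID_MASK) >> SKETCH_KEYID_SHIFT);
}
```

Clearing the field before setting it is what makes a change from one
KeyID to another safe: no bits of the old KeyID survive.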

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  4 ++++
 arch/x86/mm/mktme.c          | 26 ++++++++++++++++++++++++++
 include/linux/mm.h           |  6 ++++++
 3 files changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index d26ada6b65f7..e8f7f80bb013 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -16,6 +16,10 @@ extern int __mktme_nr_keyids;
 extern int mktme_nr_keyids(void);
 extern unsigned int mktme_algs;
 
+/* Set the encryption keyid bits in a VMA */
+extern void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
+				unsigned long start, unsigned long end);
+
 DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
 static inline bool mktme_enabled(void)
 {
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index ed13967bb543..05bbf5058ade 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,5 +1,6 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
+#include <linux/rmap.h>
 #include <asm/mktme.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
@@ -71,6 +72,31 @@ int __vma_keyid(struct vm_area_struct *vma)
 	return (prot & mktme_keyid_mask()) >> mktme_keyid_shift();
 }
 
+/* Set the encryption keyid bits in a VMA */
+void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
+			  unsigned long start, unsigned long end)
+{
+	int oldkeyid = vma_keyid(vma);
+	pgprotval_t newprot;
+
+	/* Unmap pages with the old KeyID, if there are any. */
+	zap_page_range(vma, start, end - start);
+
+	if (oldkeyid == newkeyid)
+		return;
+
+	newprot = pgprot_val(vma->vm_page_prot);
+	newprot &= ~mktme_keyid_mask();
+	newprot |= (unsigned long)newkeyid << mktme_keyid_shift();
+	vma->vm_page_prot = __pgprot(newprot);
+
+	/*
+	 * The VMA doesn't have any inherited pages.
+	 * Start anon VMA tree from scratch.
+	 */
+	unlink_anon_vmas(vma);
+}
+
 /* Prepare page to be used for encryption. Called from page allocator. */
 void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3f9640f388ac..98a6d2bd66a6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2905,5 +2905,11 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+#ifndef CONFIG_X86_INTEL_MKTME
+static inline void mprotect_set_encrypt(struct vm_area_struct *vma,
+					int newkeyid,
+					unsigned long start,
+					unsigned long end) {}
+#endif /* CONFIG_X86_INTEL_MKTME */
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
-- 
2.21.0



* [PATCHv2 44/59] mm: Add the encrypt_mprotect() system call for MKTME
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (42 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 43/59] x86/mm: Set KeyIDs in encrypted VMAs for MKTME Kirill A. Shutemov
@ 2019-07-31 15:07 ` " Kirill A. Shutemov
  2019-07-31 15:07 ` [PATCHv2 45/59] x86/mm: Keep reference counts on hardware key usage " Kirill A. Shutemov
                   ` (14 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Implement memory encryption for MKTME (Multi-Key Total Memory
Encryption) with a new system call that is an extension of the
legacy mprotect() system call.

In encrypt_mprotect the caller must pass a handle to a previously
allocated and programmed MKTME encryption key. The key can be
obtained through the kernel key service type "mktme". The caller
must have KEY_NEED_VIEW permission on the key.

MKTME places an additional restriction on the protected data:
The length of the data must be page aligned. This is in addition
to the existing mprotect restriction that the addr must be page
aligned.
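
To make the difference from legacy mprotect() concrete: legacy
mprotect() rounds len up to a page boundary, while encrypt_mprotect()
as proposed here rejects an unaligned len. A small user-space sketch of
both checks follows; the helper names are hypothetical and page_size
is passed in rather than queried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if both start and len are page aligned. */
static bool sketch_encrypt_args_aligned(uintptr_t start, size_t len,
					size_t page_size)
{
	/* page_size must be a non-zero power of two */
	assert(page_size && (page_size & (page_size - 1)) == 0);

	size_t mask = page_size - 1;

	return (start & mask) == 0 && (len & mask) == 0;
}

/* Hypothetical helper: the round-up that legacy mprotect() applies to len. */
static size_t sketch_legacy_round_len(size_t len, size_t page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);

	size_t mask = page_size - 1;

	return (len + mask) & ~mask;
}
```

A caller could run the first check before issuing the system call to
avoid a predictable -EINVAL.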

encrypt_mprotect() will look up the hardware keyid for the given
userspace key. It will use previously defined helpers to insert
that keyid in the VMAs during legacy mprotect() execution.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/exec.c          |  4 +--
 include/linux/mm.h |  3 +-
 mm/mprotect.c      | 68 +++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index c71cbfe6826a..261e81b7e3a4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -756,8 +756,8 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	vm_flags |= mm->def_flags;
 	vm_flags |= VM_STACK_INCOMPLETE_SETUP;
 
-	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
-			vm_flags);
+	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, vm_flags,
+			     -1);
 	if (ret)
 		goto out_unlock;
 	BUG_ON(prev != vma);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 98a6d2bd66a6..8551b5ebdedf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1660,7 +1660,8 @@ extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long
 			      int dirty_accountable, int prot_numa);
 extern int mprotect_fixup(struct vm_area_struct *vma,
 			  struct vm_area_struct **pprev, unsigned long start,
-			  unsigned long end, unsigned long newflags);
+			  unsigned long end, unsigned long newflags,
+			  int newkeyid);
 
 /*
  * doesn't attempt to fault and will return short.
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 4d55725228e3..518d75582e7b 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -28,6 +28,7 @@
 #include <linux/ksm.h>
 #include <linux/uaccess.h>
 #include <linux/mm_inline.h>
+#include <linux/key.h>
 #include <asm/pgtable.h>
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
@@ -348,7 +349,8 @@ static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
 
 int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
-	unsigned long start, unsigned long end, unsigned long newflags)
+	       unsigned long start, unsigned long end, unsigned long newflags,
+	       int newkeyid)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long oldflags = vma->vm_flags;
@@ -358,7 +360,14 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	int error;
 	int dirty_accountable = 0;
 
-	if (newflags == oldflags) {
+	/*
+	 * Flags match and Keyids match or we have NO_KEY.
+	 * This _fixup is usually called from do_mprotect_ext() except
+	 * for one special case: caller fs/exec.c/setup_arg_pages()
+	 * In that case, newkeyid is passed as -1 (NO_KEY).
+	 */
+	if (newflags == oldflags &&
+	    (newkeyid == vma_keyid(vma) || newkeyid == NO_KEY)) {
 		*pprev = vma;
 		return 0;
 	}
@@ -424,6 +433,8 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	}
 
 success:
+	if (newkeyid != NO_KEY)
+		mprotect_set_encrypt(vma, newkeyid, start, end);
 	/*
 	 * vm_flags and vm_page_prot are protected by the mmap_sem
 	 * held in write mode.
@@ -455,10 +466,15 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 }
 
 /*
- * When pkey==NO_KEY we get legacy mprotect behavior here.
+ * do_mprotect_ext() supports the legacy mprotect behavior plus extensions
+ * for Protection Keys and Memory Encryption Keys. These extensions are
+ * mutually exclusive and the behavior is:
+ *	(pkey==NO_KEY && keyid==NO_KEY) ==> legacy mprotect
+ *	(pkey is valid)  ==> legacy mprotect plus Protection Key extensions
+ *	(keyid is valid) ==> legacy mprotect plus Encryption Key extensions
  */
 static int do_mprotect_ext(unsigned long start, size_t len,
-		unsigned long prot, int pkey)
+			   unsigned long prot, int pkey, int keyid)
 {
 	unsigned long nstart, end, tmp, reqprot;
 	struct vm_area_struct *vma, *prev;
@@ -556,7 +572,8 @@ static int do_mprotect_ext(unsigned long start, size_t len,
 		tmp = vma->vm_end;
 		if (tmp > end)
 			tmp = end;
-		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
+		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags,
+				       keyid);
 		if (error)
 			goto out;
 		nstart = tmp;
@@ -581,7 +598,7 @@ static int do_mprotect_ext(unsigned long start, size_t len,
 SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot)
 {
-	return do_mprotect_ext(start, len, prot, NO_KEY);
+	return do_mprotect_ext(start, len, prot, NO_KEY, NO_KEY);
 }
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -589,7 +606,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot, int, pkey)
 {
-	return do_mprotect_ext(start, len, prot, pkey);
+	return do_mprotect_ext(start, len, prot, pkey, NO_KEY);
 }
 
 SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
@@ -638,3 +655,40 @@ SYSCALL_DEFINE1(pkey_free, int, pkey)
 }
 
 #endif /* CONFIG_ARCH_HAS_PKEYS */
+
+#ifdef CONFIG_X86_INTEL_MKTME
+
+extern int mktme_keyid_from_key(struct key *key);
+
+SYSCALL_DEFINE4(encrypt_mprotect, unsigned long, start, size_t, len,
+		unsigned long, prot, key_serial_t, serial)
+{
+	key_ref_t key_ref;
+	struct key *key;
+	int ret, keyid;
+
+	/* MKTME restriction */
+	if (!PAGE_ALIGNED(len))
+		return -EINVAL;
+
+	/*
+	 * key_ref prevents the destruction of the key
+	 * while the memory encryption is being set up.
+	 */
+
+	key_ref = lookup_user_key(serial, 0, KEY_NEED_VIEW);
+	if (IS_ERR(key_ref))
+		return PTR_ERR(key_ref);
+
+	key = key_ref_to_ptr(key_ref);
+	keyid = mktme_keyid_from_key(key);
+	if (!keyid) {
+		key_ref_put(key_ref);
+		return -EINVAL;
+	}
+	ret = do_mprotect_ext(start, len, prot, NO_KEY, keyid);
+	key_ref_put(key_ref);
+	return ret;
+}
+
+#endif /* CONFIG_X86_INTEL_MKTME */
-- 
2.21.0



* [PATCHv2 45/59] x86/mm: Keep reference counts on hardware key usage for MKTME
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (43 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 44/59] mm: Add the encrypt_mprotect() system call " Kirill A. Shutemov
@ 2019-07-31 15:07 ` " Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 46/59] mm: Restrict MKTME memory encryption to anonymous VMAs Kirill A. Shutemov
                   ` (13 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:07 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

The MKTME (Multi-Key Total Memory Encryption) Key Service needs
a reference count on key usage. This reference count is used
to determine when a hardware encryption KeyID is no longer in use
and can be freed and reassigned to another Userspace Key.

The MKTME Key service does the percpu_ref_init and _kill.

Encrypted VMAs and encrypted pages are included in the reference
counts per keyid.
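
The accounting can be sketched in user space as below. Plain counters
stand in for the percpu_ref array, and all sketch_* names are
hypothetical. The point is that a VMA holds one reference, a page of
a given order holds 1 << order references, and the KeyID is idle only
when both kinds of reference are gone.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_KEYIDS 4	/* hypothetical table size, KeyID 0 included */

/* Plain counters stand in for the kernel's percpu_ref array. */
static long sketch_encrypt_count[SKETCH_NR_KEYIDS];

static void sketch_check_keyid(int keyid)
{
	assert(keyid >= 0 && keyid < SKETCH_NR_KEYIDS);
}

/* KeyID 0 means "not encrypted" and is never counted. */
static void sketch_vma_get(int keyid)
{
	sketch_check_keyid(keyid);
	if (keyid)
		sketch_encrypt_count[keyid]++;
}

static void sketch_vma_put(int keyid)
{
	sketch_check_keyid(keyid);
	if (keyid)
		sketch_encrypt_count[keyid]--;
}

/* One reference per 4K page, so a later split stays balanced. */
static void sketch_page_alloc(int keyid, unsigned int order)
{
	sketch_check_keyid(keyid);
	sketch_encrypt_count[keyid] += 1L << order;
}

static void sketch_page_free(int keyid, unsigned int order)
{
	sketch_check_keyid(keyid);
	sketch_encrypt_count[keyid] -= 1L << order;
}

static bool sketch_keyid_idle(int keyid)
{
	sketch_check_keyid(keyid);
	return sketch_encrypt_count[keyid] == 0;
}
```

An order-2 allocation that is later freed as four order-0 pages leaves
the count at zero, which is the case the per-4K accounting is meant to
cover.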

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  5 +++++
 arch/x86/mm/mktme.c          | 37 ++++++++++++++++++++++++++++++++++--
 include/linux/mm.h           |  2 ++
 kernel/fork.c                |  2 ++
 4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index e8f7f80bb013..a5f664d3805b 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -20,6 +20,11 @@ extern unsigned int mktme_algs;
 extern void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
 				unsigned long start, unsigned long end);
 
+/* MKTME encrypt_count for VMAs */
+extern struct percpu_ref *encrypt_count;
+extern void vma_get_encrypt_ref(struct vm_area_struct *vma);
+extern void vma_put_encrypt_ref(struct vm_area_struct *vma);
+
 DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
 static inline bool mktme_enabled(void)
 {
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 05bbf5058ade..17366d81c21b 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -84,11 +84,12 @@ void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
 
 	if (oldkeyid == newkeyid)
 		return;
-
+	vma_put_encrypt_ref(vma);
 	newprot = pgprot_val(vma->vm_page_prot);
 	newprot &= ~mktme_keyid_mask();
 	newprot |= (unsigned long)newkeyid << mktme_keyid_shift();
 	vma->vm_page_prot = __pgprot(newprot);
+	vma_get_encrypt_ref(vma);
 
 	/*
 	 * The VMA doesn't have any inherited pages.
@@ -97,6 +98,18 @@ void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
 	unlink_anon_vmas(vma);
 }
 
+void vma_get_encrypt_ref(struct vm_area_struct *vma)
+{
+	if (vma_keyid(vma))
+		percpu_ref_get(&encrypt_count[vma_keyid(vma)]);
+}
+
+void vma_put_encrypt_ref(struct vm_area_struct *vma)
+{
+	if (vma_keyid(vma))
+		percpu_ref_put(&encrypt_count[vma_keyid(vma)]);
+}
+
 /* Prepare page to be used for encryption. Called from page allocator. */
 void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
 {
@@ -137,6 +150,22 @@ void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
 
 		page++;
 	}
+
+	/*
+	 * Make sure the KeyID cannot be freed until the last page that
+	 * uses the KeyID is gone.
+	 *
+	 * This is required because the page may live longer than the VMA
+	 * it is mapped into (e.g. in the get_user_pages() case), so
+	 * refcounting per-VMA is not enough.
+	 *
+	 * Taking a reference per 4K page covers the case where the page
+	 * is split after allocation. free_encrypted_page() will balance
+	 * out the refcount even if the page was split and freed as a
+	 * bunch of 4K pages.
+	 */
+
+	percpu_ref_get_many(&encrypt_count[keyid], 1 << order);
 }
 
 /*
@@ -145,7 +174,9 @@ void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
  */
 void free_encrypted_page(struct page *page, int order)
 {
-	int i;
+	int i, keyid;
+
+	keyid = page_keyid(page);
 
 	/*
 	 * The hardware/CPU does not enforce coherency between mappings
@@ -177,6 +208,8 @@ void free_encrypted_page(struct page *page, int order)
 		lookup_page_ext(page)->keyid = 0;
 		page++;
 	}
+
+	percpu_ref_put_many(&encrypt_count[keyid], 1 << order);
 }
 
 static int sync_direct_mapping_pte(unsigned long keyid,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8551b5ebdedf..be27cb0cc0c7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2911,6 +2911,8 @@ static inline void mprotect_set_encrypt(struct vm_area_struct *vma,
 					int newkeyid,
 					unsigned long start,
 					unsigned long end) {}
+static inline void vma_get_encrypt_ref(struct vm_area_struct *vma) {}
+static inline void vma_put_encrypt_ref(struct vm_area_struct *vma) {}
 #endif /* CONFIG_X86_INTEL_MKTME */
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..00735092d370 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -349,12 +349,14 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	if (new) {
 		*new = *orig;
 		INIT_LIST_HEAD(&new->anon_vma_chain);
+		vma_get_encrypt_ref(new);
 	}
 	return new;
 }
 
 void vm_area_free(struct vm_area_struct *vma)
 {
+	vma_put_encrypt_ref(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
-- 
2.21.0



* [PATCHv2 46/59] mm: Restrict MKTME memory encryption to anonymous VMAs
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (44 preceding siblings ...)
  2019-07-31 15:07 ` [PATCHv2 45/59] x86/mm: Keep reference counts on hardware key usage " Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 47/59] kvm, x86, mmu: setup MKTME keyID to spte for given PFN Kirill A. Shutemov
                   ` (12 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Memory encryption is only supported for mappings that are ANONYMOUS.
Test the VMAs in an encrypt_mprotect() request to make sure they all
meet that requirement before encrypting any.

The encrypt_mprotect syscall will return -EINVAL and will not encrypt
any VMAs if this check fails.
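
The all-or-nothing check can be sketched in user space as below.
struct sketch_vma is a hypothetical stand-in for vm_area_struct that
keeps only the fields the walk needs. Like the patch, the walk always
tests the first VMA and then follows the list while a VMA starts
before the end of the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct vm_area_struct. */
struct sketch_vma {
	unsigned long start;
	unsigned long end;
	bool anonymous;
	struct sketch_vma *next;
};

/* True only if every VMA overlapping the request is anonymous. */
static bool sketch_range_supports_encryption(const struct sketch_vma *vma,
					     unsigned long end)
{
	assert(vma != NULL);

	do {
		if (!vma->anonymous)
			return false;
		vma = vma->next;
	} while (vma && vma->start < end);

	return true;
}
```

Running the check before any VMA is modified is what lets the system
call fail without leaving the range partially encrypted.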

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/mprotect.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 518d75582e7b..4b079e1b2d6f 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -347,6 +347,24 @@ static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
 	return walk_page_range(start, end, &prot_none_walk);
 }
 
+/*
+ * Encrypted mprotect is only supported on anonymous mappings.
+ * If this test fails on any single VMA, the entire mprotect
+ * request fails.
+ */
+static bool mem_supports_encryption(struct vm_area_struct *vma, unsigned long end)
+{
+	struct vm_area_struct *test_vma = vma;
+
+	do {
+		if (!vma_is_anonymous(test_vma))
+			return false;
+
+		test_vma = test_vma->vm_next;
+	} while (test_vma && test_vma->vm_start < end);
+	return true;
+}
+
 int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	       unsigned long start, unsigned long end, unsigned long newflags,
@@ -533,6 +551,12 @@ static int do_mprotect_ext(unsigned long start, size_t len,
 				goto out;
 		}
 	}
+
+	if (keyid > 0 && !mem_supports_encryption(vma, end)) {
+		error = -EINVAL;
+		goto out;
+	}
+
 	if (start > vma->vm_start)
 		prev = vma;
 
-- 
2.21.0



* [PATCHv2 47/59] kvm, x86, mmu: setup MKTME keyID to spte for given PFN
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (45 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 46/59] mm: Restrict MKTME memory encryption to anonymous VMAs Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-08-06 20:26   ` Lendacky, Thomas
  2019-07-31 15:08 ` [PATCHv2 48/59] iommu/vt-d: Support MKTME in DMA remapping Kirill A. Shutemov
                   ` (11 subsequent siblings)
  58 siblings, 1 reply; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Kai Huang <kai.huang@linux.intel.com>

Set the keyID in the SPTE, which will eventually be programmed into the
shadow MMU or EPT table, according to the page's associated keyID, so
that the guest uses the correct keyID to access guest memory.

Note that the current shadow_me_mask doesn't suit MKTME's needs: MKTME
has no fixed memory encryption mask, since the keyID can vary from 1 to
the maximum keyID. Therefore shadow_me_mask remains 0 for MKTME.
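
As a rough user-space sketch of the difference from a fixed mask, the
encryption bits below are derived from the keyID of the backing page
rather than from a global constant. The names are hypothetical, and
the shift of 46 is an assumption taken from the "PTE bits 51:46"
statement in patch 43; the kernel uses mktme_keyid_shift().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT       12
#define SKETCH_SPTE_KEYID_SHIFT 46	/* assumed, see lead-in */
#define SKETCH_SPTE_KEYID_BITS  6	/* assumed, bits 51:46 */

/* Build the address part of an SPTE with a per-page keyID. */
static uint64_t sketch_make_spte(uint64_t pfn, unsigned int page_keyid)
{
	assert(page_keyid < (1u << SKETCH_SPTE_KEYID_BITS));

	uint64_t spte = 0;

	/* Per-page encryption bits; keyID 0 contributes nothing. */
	spte |= (uint64_t)page_keyid << SKETCH_SPTE_KEYID_SHIFT;
	spte |= pfn << SKETCH_PAGE_SHIFT;
	return spte;
}
```

Two pages with different keyIDs produce different upper bits here,
which a single global mask cannot express.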

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8f72526e2f68..b8742e6219f6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2936,6 +2936,22 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 #define SET_SPTE_WRITE_PROTECTED_PT	BIT(0)
 #define SET_SPTE_NEED_REMOTE_TLB_FLUSH	BIT(1)
 
+static u64 get_phys_encryption_mask(kvm_pfn_t pfn)
+{
+#ifdef CONFIG_X86_INTEL_MKTME
+	struct page *page;
+
+	if (!pfn_valid(pfn))
+		return 0;
+
+	page = pfn_to_page(pfn);
+
+	return ((u64)page_keyid(page)) << mktme_keyid_shift();
+#else
+	return shadow_me_mask;
+#endif
+}
+
 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		    unsigned pte_access, int level,
 		    gfn_t gfn, kvm_pfn_t pfn, bool speculative,
@@ -2982,7 +2998,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		pte_access &= ~ACC_WRITE_MASK;
 
 	if (!kvm_is_mmio_pfn(pfn))
-		spte |= shadow_me_mask;
+		spte |= get_phys_encryption_mask(pfn);
 
 	spte |= (u64)pfn << PAGE_SHIFT;
 
-- 
2.21.0



* [PATCHv2 48/59] iommu/vt-d: Support MKTME in DMA remapping
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (46 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 47/59] kvm, x86, mmu: setup MKTME keyID to spte for given PFN Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 49/59] x86/mm: introduce common code for mem encryption Kirill A. Shutemov
                   ` (10 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Jacob Pan <jacob.jun.pan@linux.intel.com>

When MKTME is enabled, keyid is stored in the high order bits of physical
address. For DMA transactions targeting encrypted physical memory, keyid
must be included in the IOVA to physical address translation.

This patch adds the page keyid when setting up the IOMMU PTEs. In the
reverse direction, the keyid bits are cleared in the physical address
lookup. The mapping functions of both DMA ops and IOMMU ops are covered.
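
The two directions can be sketched in plain C as follows. This is only
an illustration under assumed constants (keyID bits 46..51, 4KiB VT-d
pages); the ex_* helpers are hypothetical stand-ins for
set_pte_mktme_keyid() and dma_pte_addr(), not the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example layout, not read from hardware. */
#define EX_VTD_PAGE_SHIFT  12
#define EX_VTD_PAGE_MASK   (~((UINT64_C(1) << EX_VTD_PAGE_SHIFT) - 1))
#define EX_KEYID_SHIFT     46
#define EX_KEYID_MAX       UINT64_C(0x3f)
#define EX_KEYID_MASK      (EX_KEYID_MAX << EX_KEYID_SHIFT)

/* Forward direction: drop any stale keyID bits, then set the page's keyID. */
static void ex_set_pte_keyid(uint64_t *pteval, uint64_t keyid)
{
	assert(keyid <= EX_KEYID_MAX);
	*pteval &= ~EX_KEYID_MASK;
	*pteval |= keyid << EX_KEYID_SHIFT;
}

/* Reverse direction: strip flag bits and keyID bits to get the address. */
static uint64_t ex_pte_addr(uint64_t pteval)
{
	return pteval & EX_VTD_PAGE_MASK & ~EX_KEYID_MASK;
}
```

Clearing before setting matters when an entry is rewritten with a
different keyID, otherwise bits of the old keyID would leak into the
new value.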

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 29 +++++++++++++++++++++++++++--
 include/linux/intel-iommu.h |  9 ++++++++-
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ac4172c02244..32d22872656b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -867,6 +867,28 @@ static void free_context_table(struct intel_iommu *iommu)
 	spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
+static inline void set_pte_mktme_keyid(unsigned long phys_pfn,
+		phys_addr_t *pteval)
+{
+	unsigned long keyid;
+
+	if (!pfn_valid(phys_pfn))
+		return;
+
+	keyid = page_keyid(pfn_to_page(phys_pfn));
+
+#ifdef CONFIG_X86_INTEL_MKTME
+	/*
+	 * When MKTME is enabled, set keyid in PTE such that DMA
+	 * remapping will include keyid in the translation from IOVA
+	 * to physical address. This applies to both user and kernel
+	 * allocated DMA memory.
+	 */
+	*pteval &= ~mktme_keyid_mask();
+	*pteval |= keyid << mktme_keyid_shift();
+#endif
+}
+
 static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 				      unsigned long pfn, int *target_level)
 {
@@ -893,7 +915,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 			break;
 
 		if (!dma_pte_present(pte)) {
-			uint64_t pteval;
+			phys_addr_t pteval;
 
 			tmp_page = alloc_pgtable_page(domain->nid);
 
@@ -901,7 +923,8 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 				return NULL;
 
 			domain_flush_cache(domain, tmp_page, VTD_PAGE_SIZE);
-			pteval = ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE;
+			pteval = (virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE;
+			set_pte_mktme_keyid(virt_to_dma_pfn(tmp_page), &pteval);
 			if (cmpxchg64(&pte->val, 0ULL, pteval))
 				/* Someone else set it while we were thinking; use theirs. */
 				free_pgtable_page(tmp_page);
@@ -2214,6 +2237,8 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 			}
 
 		}
+		set_pte_mktme_keyid(phys_pfn, &pteval);
+
 		/* We don't need lock here, nobody else
 		 * touches the iova range
 		 */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index f2ae8a006ff8..8fbb9353d5a6 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -22,6 +22,8 @@
 
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
+#include <asm/page.h>
+
 
 /*
  * VT-d hardware uses 4KiB page size regardless of host page size.
@@ -608,7 +610,12 @@ static inline void dma_clear_pte(struct dma_pte *pte)
 static inline u64 dma_pte_addr(struct dma_pte *pte)
 {
 #ifdef CONFIG_64BIT
-	return pte->val & VTD_PAGE_MASK;
+	u64 addr = pte->val;
+	addr &= VTD_PAGE_MASK;
+#ifdef CONFIG_X86_INTEL_MKTME
+	addr &= ~mktme_keyid_mask();
+#endif
+	return addr;
 #else
 	/* Must have a full atomic 64-bit read */
 	return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
-- 
2.21.0



* [PATCHv2 49/59] x86/mm: introduce common code for mem encryption
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (47 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 48/59] iommu/vt-d: Support MKTME in DMA remapping Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 50/59] x86/mm: Use common code for DMA memory encryption Kirill A. Shutemov
                   ` (9 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Jacob Pan <jacob.jun.pan@linux.intel.com>

Both Intel MKTME and AMD SME need to support DMA address translation
with encryption-related bits. This patch introduces common functions to
keep the generic DMA code abstracted from the vendor specifics.
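
The device-mask test in the common force_dma_unencrypted() can be
sketched in user space as below. This is a simplified model under
assumptions: the masks are passed in as plain integers, and
ex_min_not_zero() is a hypothetical stand-in for the kernel's
min_not_zero() macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smaller of two masks, where 0 means "not set". */
static uint64_t ex_min_not_zero(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * DMA must be unencrypted when the device cannot address every bit of
 * the encryption mask. enc_mask == 0 models "no encryption active".
 */
static bool ex_force_dma_unencrypted(uint64_t coherent_mask,
				     uint64_t bus_mask, uint64_t enc_mask)
{
	uint64_t dev_mask;

	if (enc_mask == 0)
		return false;

	dev_mask = ex_min_not_zero(coherent_mask, bus_mask);
	return (dev_mask & enc_mask) != enc_mask;
}

static void ex_force_dma_demo(void)
{
	uint64_t enc_mask = UINT64_C(0x3f) << 46; /* assumed keyID bits 46..51 */

	/* A 32-bit device cannot reach the keyID bits. */
	assert(ex_force_dma_unencrypted(UINT32_MAX, 0, enc_mask));
	/* A 64-bit device covers all of them. */
	assert(!ex_force_dma_unencrypted(UINT64_MAX, 0, enc_mask));
}
```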

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/Kconfig                 |  8 +++--
 arch/x86/mm/Makefile             |  1 +
 arch/x86/mm/mem_encrypt.c        | 30 ------------------
 arch/x86/mm/mem_encrypt_common.c | 52 ++++++++++++++++++++++++++++++++
 4 files changed, 59 insertions(+), 32 deletions(-)
 create mode 100644 arch/x86/mm/mem_encrypt_common.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2eb2867db5fa..f2cc88fe8ada 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1521,12 +1521,16 @@ config X86_CPA_STATISTICS
 config ARCH_HAS_MEM_ENCRYPT
 	def_bool y
 
+config X86_MEM_ENCRYPT_COMMON
+	select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+	select DYNAMIC_PHYSICAL_MASK
+	def_bool n
+
 config AMD_MEM_ENCRYPT
 	bool "AMD Secure Memory Encryption (SME) support"
 	depends on X86_64 && CPU_SUP_AMD
-	select DYNAMIC_PHYSICAL_MASK
 	select ARCH_USE_MEMREMAP_PROT
-	select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+	select X86_MEM_ENCRYPT_COMMON
 	---help---
 	  Say yes to enable support for the encryption of system memory.
 	  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 600d18691876..608e57cda784 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -55,3 +55,4 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
 
 obj-$(CONFIG_X86_INTEL_MKTME)	+= mktme.o
+obj-$(CONFIG_X86_MEM_ENCRYPT_COMMON)	+= mem_encrypt_common.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index fece30ca8b0c..e94e0a62ba92 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -15,10 +15,6 @@
 #include <linux/dma-direct.h>
 #include <linux/swiotlb.h>
 #include <linux/mem_encrypt.h>
-#include <linux/device.h>
-#include <linux/kernel.h>
-#include <linux/bitops.h>
-#include <linux/dma-mapping.h>
 
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
@@ -352,32 +348,6 @@ bool sev_active(void)
 }
 EXPORT_SYMBOL(sev_active);
 
-/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
-bool force_dma_unencrypted(struct device *dev)
-{
-	/*
-	 * For SEV, all DMA must be to unencrypted addresses.
-	 */
-	if (sev_active())
-		return true;
-
-	/*
-	 * For SME, all DMA must be to unencrypted addresses if the
-	 * device does not support DMA to addresses that include the
-	 * encryption mask.
-	 */
-	if (sme_active()) {
-		u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
-		u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
-						dev->bus_dma_mask);
-
-		if (dma_dev_mask <= dma_enc_mask)
-			return true;
-	}
-
-	return false;
-}
-
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_free_decrypted_mem(void)
 {
diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c
new file mode 100644
index 000000000000..c11d70151735
--- /dev/null
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -0,0 +1,52 @@
+#include <linux/mm.h>
+#include <linux/mem_encrypt.h>
+#include <linux/dma-mapping.h>
+#include <asm/mktme.h>
+
+/*
+ * Encryption bits need to be set and cleared for both Intel MKTME and
+ * AMD SME when converting between DMA address and physical address.
+ */
+dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr)
+{
+	unsigned long keyid;
+
+	if (sme_active())
+		return __sme_set(daddr);
+	keyid = page_keyid(pfn_to_page(__phys_to_pfn(paddr)));
+
+	return (daddr & ~mktme_keyid_mask()) | (keyid << mktme_keyid_shift());
+}
+
+phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
+{
+	if (sme_active())
+		return __sme_clr(paddr);
+
+	return paddr & ~mktme_keyid_mask();
+}
+
+/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
+bool force_dma_unencrypted(struct device *dev)
+{
+	u64 dma_enc_mask, dma_dev_mask;
+
+	/*
+	 * For SEV, all DMA must be to unencrypted addresses.
+	 */
+	if (sev_active())
+		return true;
+
+	/*
+	 * For SME and MKTME, all DMA must be to unencrypted addresses if the
+	 * device does not support DMA to addresses that include the encryption
+	 * mask.
+	 */
+	if (!sme_active() && !mktme_enabled())
+		return false;
+
+	dma_enc_mask = sme_me_mask | mktme_keyid_mask();
+	dma_dev_mask = min_not_zero(dev->coherent_dma_mask, dev->bus_dma_mask);
+
+	return (dma_dev_mask & dma_enc_mask) != dma_enc_mask;
+}
-- 
2.21.0



* [PATCHv2 50/59] x86/mm: Use common code for DMA memory encryption
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (48 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 49/59] x86/mm: introduce common code for mem encryption Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 51/59] x86/mm: Disable MKTME on incompatible platform configurations Kirill A. Shutemov
                   ` (8 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Jacob Pan <jacob.jun.pan@linux.intel.com>

Replace the sme_ code with the common x86 memory encryption code so that
Intel MKTME can be supported underneath the generic DMA code. The results
of dma_to_phys() and phys_to_dma() are modified at runtime by the memory
encryption code.
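
A minimal sketch of that runtime dispatch, assuming a fixed SME mask at
bit 47 and MKTME keyID bits 46..51 purely for illustration.
ex_page_keyid() is a hypothetical stand-in for page_keyid() that derives
a keyID from the PFN so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed example values, not taken from hardware. */
#define EX_PAGE_SHIFT   12
#define EX_KEYID_SHIFT  46
#define EX_KEYID_MASK   (UINT64_C(0x3f) << EX_KEYID_SHIFT)
#define EX_SME_MASK     (UINT64_C(1) << 47)

/* Hypothetical per-page keyID lookup: low two PFN bits. */
static uint64_t ex_page_keyid(uint64_t pfn)
{
	return pfn & 0x3;
}

/* SME applies one fixed mask; MKTME inserts the keyID of the backing page. */
static uint64_t ex_dma_set(uint64_t daddr, uint64_t paddr, bool sme)
{
	uint64_t keyid;

	if (sme)
		return daddr | EX_SME_MASK;

	keyid = ex_page_keyid(paddr >> EX_PAGE_SHIFT);
	return (daddr & ~EX_KEYID_MASK) | (keyid << EX_KEYID_SHIFT);
}

/* Reverse direction: remove whichever encryption bits apply. */
static uint64_t ex_dma_clear(uint64_t addr, bool sme)
{
	return sme ? (addr & ~EX_SME_MASK) : (addr & ~EX_KEYID_MASK);
}

static void ex_dma_demo(void)
{
	uint64_t d = ex_dma_set(0x5000, 0x5000, false); /* PFN 5, keyID 1 */

	assert(d == (UINT64_C(0x5000) | (UINT64_C(1) << EX_KEYID_SHIFT)));
	assert(ex_dma_clear(d, false) == 0x5000);
}
```

This also shows why the MKTME path needs the physical address as a
second argument: the bits to set depend on the page, not on a global
mask.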

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mem_encrypt.h | 29 +++++++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt_common.c   |  2 +-
 include/linux/dma-direct.h         |  4 ++--
 include/linux/mem_encrypt.h        | 23 ++++++++++-------------
 4 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 0c196c47d621..62a1493f389c 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -52,8 +52,19 @@ bool sev_active(void);
 
 #define __bss_decrypted __attribute__((__section__(".bss..decrypted")))
 
+/*
+ * The __sme_set() and __sme_clr() macros are useful for adding or removing
+ * the encryption mask from a value (e.g. when dealing with pagetable
+ * entries).
+ */
+#define __sme_set(x)		((x) | sme_me_mask)
+#define __sme_clr(x)		((x) & ~sme_me_mask)
+
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
+#define __sme_set(x)		(x)
+#define __sme_clr(x)		(x)
+
 #define sme_me_mask	0ULL
 
 static inline void __init sme_early_encrypt(resource_size_t paddr,
@@ -94,4 +105,22 @@ extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypte
 
 #endif	/* __ASSEMBLY__ */
 
+#ifdef CONFIG_X86_MEM_ENCRYPT_COMMON
+
+extern dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr);
+extern phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr);
+
+#else
+static inline dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr)
+{
+	return daddr;
+}
+
+static inline phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
+{
+	return paddr;
+}
+#endif /* CONFIG_X86_MEM_ENCRYPT_COMMON */
+
+
 #endif	/* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c
index c11d70151735..588d6ea45624 100644
--- a/arch/x86/mm/mem_encrypt_common.c
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -1,6 +1,6 @@
 #include <linux/mm.h>
-#include <linux/mem_encrypt.h>
 #include <linux/dma-mapping.h>
+#include <asm/mem_encrypt.h>
 #include <asm/mktme.h>
 
 /*
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index adf993a3bd58..6ce96b06c440 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -49,12 +49,12 @@ static inline bool force_dma_unencrypted(struct device *dev)
  */
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-	return __sme_set(__phys_to_dma(dev, paddr));
+	return __mem_encrypt_dma_set(__phys_to_dma(dev, paddr), paddr);
 }
 
 static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
-	return __sme_clr(__dma_to_phys(dev, daddr));
+	return __mem_encrypt_dma_clear(__dma_to_phys(dev, daddr));
 }
 
 u64 dma_direct_get_required_mask(struct device *dev);
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 470bd53a89df..88724aa7c065 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -23,6 +23,16 @@
 static inline bool sme_active(void) { return false; }
 static inline bool sev_active(void) { return false; }
 
+static inline dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr)
+{
+	return daddr;
+}
+
+static inline phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
+{
+	return paddr;
+}
+
 #endif	/* CONFIG_ARCH_HAS_MEM_ENCRYPT */
 
 static inline bool mem_encrypt_active(void)
@@ -35,19 +45,6 @@ static inline u64 sme_get_me_mask(void)
 	return sme_me_mask;
 }
 
-#ifdef CONFIG_AMD_MEM_ENCRYPT
-/*
- * The __sme_set() and __sme_clr() macros are useful for adding or removing
- * the encryption mask from a value (e.g. when dealing with pagetable
- * entries).
- */
-#define __sme_set(x)		((x) | sme_me_mask)
-#define __sme_clr(x)		((x) & ~sme_me_mask)
-#else
-#define __sme_set(x)		(x)
-#define __sme_clr(x)		(x)
-#endif
-
 #endif	/* __ASSEMBLY__ */
 
 #endif	/* __MEM_ENCRYPT_H__ */
-- 
2.21.0



* [PATCHv2 51/59] x86/mm: Disable MKTME on incompatible platform configurations
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (49 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 50/59] x86/mm: Use common code for DMA memory encryption Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 52/59] x86/mm: Disable MKTME if not all system memory supports encryption Kirill A. Shutemov
                   ` (7 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Icelake Server requires an additional check to make sure that MKTME usage
is safe on Linux.

The kernel needs a way to access encrypted memory. There are different
approaches to this: create a temporary mapping to access the page (using
the kmap() interface), or modify the kernel's direct mapping on allocation
of an encrypted page.

In order to minimize runtime overhead, the Linux MKTME implementation
uses multiple direct mappings, one per KeyID. The kernel uses the direct
mapping that is relevant for the page at the moment.

Icelake Server in some configurations doesn't allow a page to be mapped
with multiple KeyIDs at the same time, even if only one of the KeyIDs is
actively used. This conflicts with the Linux MKTME implementation.

The OS can check whether it's safe to map the same page with multiple
KeyIDs by examining bit 8 of MSR 0x6F. If the bit is set, we cannot safely
use MKTME on Linux.

The user can disable the Directory Mode in BIOS setup to get the
platform into Linux-compatible mode.
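
The bit test itself is small; a user-space sketch is shown here. The
64-bit status value is assumed to have been read from MSR 0x6F
elsewhere, and the ex_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 8 of the status MSR, as described in the commit message. */
#define EX_ALIASES_FORBIDDEN_BIT  8

/* True when the platform forbids mapping one page with several KeyIDs. */
static bool ex_mktme_aliases_forbidden(uint64_t status)
{
	return (status >> EX_ALIASES_FORBIDDEN_BIT) & 1;
}

static void ex_msr_demo(void)
{
	assert(ex_mktme_aliases_forbidden(UINT64_C(1) << 8));
	assert(!ex_mktme_aliases_forbidden(0));
}
```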

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/intel.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 9852580340b9..3583bea0a5b9 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,7 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -560,6 +561,16 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
 
 #define TME_ACTIVATE_CRYPTO_KNOWN_ALGS	TME_ACTIVATE_CRYPTO_AES_XTS_128
 
+#define MSR_ICX_MKTME_STATUS		0x6F
+#define MKTME_ALIASES_FORBIDDEN(x)	((x) & BIT(8))
+
+/* Need to check MSR_ICX_MKTME_STATUS for these CPUs */
+static const struct x86_cpu_id mktme_status_msr_ids[] = {
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ICELAKE_X		},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ICELAKE_XEON_D	},
+	{}
+};
+
 /* Values for mktme_status (SW only construct) */
 #define MKTME_ENABLED			0
 #define MKTME_DISABLED			1
@@ -593,6 +604,17 @@ static void detect_tme(struct cpuinfo_x86 *c)
 		return;
 	}
 
+	/* Icelake Server quirk: do not enable MKTME if aliases are forbidden */
+	if (x86_match_cpu(mktme_status_msr_ids)) {
+		u64 status;
+		rdmsrl(MSR_ICX_MKTME_STATUS, status);
+
+		if (MKTME_ALIASES_FORBIDDEN(status)) {
+			pr_err_once("x86/tme: Directory Mode is enabled in BIOS\n");
+			mktme_status = MKTME_DISABLED;
+		}
+	}
+
 	if (mktme_status != MKTME_UNINITIALIZED)
 		goto detect_keyid_bits;
 
-- 
2.21.0



* [PATCHv2 52/59] x86/mm: Disable MKTME if not all system memory supports encryption
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (50 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 51/59] x86/mm: Disable MKTME on incompatible platform configurations Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 53/59] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
                   ` (6 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

UEFI memory attribute EFI_MEMORY_CPU_CRYPTO indicates whether the memory
region supports encryption.

The kernel doesn't handle the situation where only part of the system
memory supports encryption.

Disable MKTME if not all system memory supports encryption.
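
The decision can be modeled in user space roughly as follows. This is a
simplified sketch: it leaves out the per-node overlap test, and the
is_ram field is a hypothetical stand-in for the e820 RAM lookup. The
attribute value is the one the patch adds to include/linux/efi.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_EFI_MEMORY_CPU_CRYPTO  UINT64_C(0x80000)

/* Reduced model of an EFI memory descriptor. */
struct ex_mem_desc {
	uint64_t phys_addr;
	uint64_t num_pages;
	uint64_t attribute;
	bool     is_ram;    /* assumption: result of a RAM-type lookup */
};

/*
 * Returns false as soon as one RAM range lacks the CPU_CRYPTO
 * attribute, which is the condition under which MKTME is disabled.
 */
static bool ex_all_ram_supports_encryption(const struct ex_mem_desc *descs,
					   size_t n)
{
	assert(descs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (descs[i].attribute & EX_EFI_MEMORY_CPU_CRYPTO)
			continue;
		if (!descs[i].is_ram)
			continue;
		return false;
	}
	return true;
}
```

Non-RAM ranges without the attribute (for example MMIO) do not count
against the platform in this model.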

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/mm/mktme.c        | 35 +++++++++++++++++++++++++++++++++++
 drivers/firmware/efi/efi.c | 25 +++++++++++++------------
 include/linux/efi.h        |  1 +
 3 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 17366d81c21b..4e00c244478b 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,9 +1,11 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
 #include <linux/rmap.h>
+#include <linux/efi.h>
 #include <asm/mktme.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
+#include <asm/e820/api.h>
 
 /* Mask to extract KeyID from physical address. */
 phys_addr_t __mktme_keyid_mask;
@@ -48,9 +50,42 @@ void mktme_disable(void)
 
 static bool need_page_mktme(void)
 {
+	int nid;
+
 	/* Make sure keyid doesn't collide with extended page flags */
 	BUILD_BUG_ON(__NR_PAGE_EXT_FLAGS > 16);
 
+	if (!mktme_nr_keyids())
+		return false;
+
+	for_each_node_state(nid, N_MEMORY) {
+		const efi_memory_desc_t *md;
+		unsigned long node_start, node_end;
+
+		node_start = node_start_pfn(nid) << PAGE_SHIFT;
+		node_end = node_end_pfn(nid) << PAGE_SHIFT;
+
+		for_each_efi_memory_desc(md) {
+			u64 efi_start = md->phys_addr;
+			u64 efi_end = md->phys_addr + PAGE_SIZE * md->num_pages;
+
+			if (md->attribute & EFI_MEMORY_CPU_CRYPTO)
+				continue;
+			if (efi_start > node_end)
+				continue;
+			if (efi_end  < node_start)
+				continue;
+			if (!e820__mapped_any(efi_start, efi_end, E820_TYPE_RAM))
+				continue;
+
+			pr_warn("Memory range %#llx-%#llx: doesn't support encryption\n",
+					efi_start, efi_end);
+			pr_warn("Disable MKTME\n");
+			mktme_disable();
+			break;
+		}
+	}
+
 	return !!mktme_nr_keyids();
 }
 
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index ad3b1f4866b3..fc19da5da3e8 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -852,25 +852,26 @@ char * __init efi_md_typeattr_format(char *buf, size_t size,
 	if (attr & ~(EFI_MEMORY_UC | EFI_MEMORY_WC | EFI_MEMORY_WT |
 		     EFI_MEMORY_WB | EFI_MEMORY_UCE | EFI_MEMORY_RO |
 		     EFI_MEMORY_WP | EFI_MEMORY_RP | EFI_MEMORY_XP |
-		     EFI_MEMORY_NV |
+		     EFI_MEMORY_NV | EFI_MEMORY_CPU_CRYPTO |
 		     EFI_MEMORY_RUNTIME | EFI_MEMORY_MORE_RELIABLE))
 		snprintf(pos, size, "|attr=0x%016llx]",
 			 (unsigned long long)attr);
 	else
 		snprintf(pos, size,
-			 "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
+			 "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
 			 attr & EFI_MEMORY_RUNTIME ? "RUN" : "",
 			 attr & EFI_MEMORY_MORE_RELIABLE ? "MR" : "",
-			 attr & EFI_MEMORY_NV      ? "NV"  : "",
-			 attr & EFI_MEMORY_XP      ? "XP"  : "",
-			 attr & EFI_MEMORY_RP      ? "RP"  : "",
-			 attr & EFI_MEMORY_WP      ? "WP"  : "",
-			 attr & EFI_MEMORY_RO      ? "RO"  : "",
-			 attr & EFI_MEMORY_UCE     ? "UCE" : "",
-			 attr & EFI_MEMORY_WB      ? "WB"  : "",
-			 attr & EFI_MEMORY_WT      ? "WT"  : "",
-			 attr & EFI_MEMORY_WC      ? "WC"  : "",
-			 attr & EFI_MEMORY_UC      ? "UC"  : "");
+			 attr & EFI_MEMORY_NV         ? "NV"  : "",
+			 attr & EFI_MEMORY_CPU_CRYPTO ? "CR"  : "",
+			 attr & EFI_MEMORY_XP         ? "XP"  : "",
+			 attr & EFI_MEMORY_RP         ? "RP"  : "",
+			 attr & EFI_MEMORY_WP         ? "WP"  : "",
+			 attr & EFI_MEMORY_RO         ? "RO"  : "",
+			 attr & EFI_MEMORY_UCE        ? "UCE" : "",
+			 attr & EFI_MEMORY_WB         ? "WB"  : "",
+			 attr & EFI_MEMORY_WT         ? "WT"  : "",
+			 attr & EFI_MEMORY_WC         ? "WC"  : "",
+			 attr & EFI_MEMORY_UC         ? "UC"  : "");
 	return buf;
 }
 
diff --git a/include/linux/efi.h b/include/linux/efi.h
index f87fabea4a85..4ac54a168ffe 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -112,6 +112,7 @@ typedef	struct {
 #define EFI_MEMORY_MORE_RELIABLE \
 				((u64)0x0000000000010000ULL)	/* higher reliability */
 #define EFI_MEMORY_RO		((u64)0x0000000000020000ULL)	/* read-only */
+#define EFI_MEMORY_CPU_CRYPTO 	((u64)0x0000000000080000ULL)	/* memory encryption supported */
 #define EFI_MEMORY_RUNTIME	((u64)0x8000000000000000ULL)	/* range requires runtime mapping */
 #define EFI_MEMORY_DESCRIPTOR_VERSION	1
 
-- 
2.21.0



* [PATCHv2 53/59] x86: Introduce CONFIG_X86_INTEL_MKTME
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (51 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 52/59] x86/mm: Disable MKTME if not all system memory supports encryption Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 54/59] x86/mktme: Overview of Multi-Key Total Memory Encryption Kirill A. Shutemov
                   ` (5 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A. Shutemov

Add a new config option to enable/disable Multi-Key Total Memory
Encryption support.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/Kconfig | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f2cc88fe8ada..d8551b612f3b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1550,6 +1550,25 @@ config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
 	  If set to N, then the encryption of system memory can be
 	  activated with the mem_encrypt=on command line option.
 
+config X86_INTEL_MKTME
+	bool "Intel Multi-Key Total Memory Encryption"
+	depends on X86_64 && CPU_SUP_INTEL && !KASAN
+	select X86_MEM_ENCRYPT_COMMON
+	select PAGE_EXTENSION
+	select KEYS
+	select ACPI_HMAT
+	---help---
+	  Say yes to enable support for Multi-Key Total Memory Encryption.
+	  This requires an Intel processor that has support of the feature.
+
+	  Multi-Key Total Memory Encryption (MKTME) is a technology that allows
+	  transparent memory encryption in upcoming Intel platforms.
+
+	  MKTME is built on top of TME. TME allows encryption of the entirety
+	  of system memory using a single key. MKTME allows having multiple
+	  encryption domains, each having own key -- different memory pages can
+	  be encrypted with different keys.
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
@@ -2220,7 +2239,7 @@ config RANDOMIZE_MEMORY
 
 config MEMORY_PHYSICAL_PADDING
 	hex "Physical memory mapping padding" if EXPERT
-	depends on RANDOMIZE_MEMORY
+	depends on RANDOMIZE_MEMORY || X86_INTEL_MKTME
 	default "0xa" if MEMORY_HOTPLUG
 	default "0x0"
 	range 0x1 0x40 if MEMORY_HOTPLUG
-- 
2.21.0



* [PATCHv2 54/59] x86/mktme: Overview of Multi-Key Total Memory Encryption
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (52 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 53/59] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 55/59] x86/mktme: Document the MKTME provided security mitigations Kirill A. Shutemov
                   ` (4 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Provide an overview of MKTME on Intel Platforms.
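
The overview added by this patch notes that 16 KeyID bits give at most
65535 keys on top of the TME key at KeyID-0, and that 6 bits give 63.
That arithmetic can be checked with a short sketch; the helper name is
hypothetical.

```c
#include <assert.h>

/*
 * Number of KeyIDs usable for MKTME keys with the given number of
 * KeyID bits. KeyID-0 is excluded because it selects the TME key.
 */
static unsigned int ex_nr_usable_keyids(unsigned int keyid_bits)
{
	assert(keyid_bits <= 16);
	return (1u << keyid_bits) - 1;
}
```

Each physical address bit repurposed as a KeyID bit doubles the KeyID
space and halves the addressable physical space, which is the trade-off
the overview mentions.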

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/index.rst                |  1 +
 Documentation/x86/mktme/index.rst          |  8 +++
 Documentation/x86/mktme/mktme_overview.rst | 57 ++++++++++++++++++++++
 3 files changed, 66 insertions(+)
 create mode 100644 Documentation/x86/mktme/index.rst
 create mode 100644 Documentation/x86/mktme/mktme_overview.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index af64c4bb4447..449bb6abeb0e 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -22,6 +22,7 @@ x86-specific Documentation
    intel_mpx
    intel-iommu
    intel_txt
+   mktme/index
    amd-memory-encryption
    pti
    mds
diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
new file mode 100644
index 000000000000..1614b52dd3e9
--- /dev/null
+++ b/Documentation/x86/mktme/index.rst
@@ -0,0 +1,8 @@
+
+=========================================
+Multi-Key Total Memory Encryption (MKTME)
+=========================================
+
+.. toctree::
+
+   mktme_overview
diff --git a/Documentation/x86/mktme/mktme_overview.rst b/Documentation/x86/mktme/mktme_overview.rst
new file mode 100644
index 000000000000..64c3268a508e
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_overview.rst
@@ -0,0 +1,57 @@
+Overview
+=========
+Multi-Key Total Memory Encryption (MKTME)[1] is a technology that
+allows transparent memory encryption in upcoming Intel platforms.
+It uses a new instruction (PCONFIG) for key setup and selects a
+key for individual pages by repurposing physical address bits in
+the page tables.
+
+Support for MKTME is added to the existing kernel keyring subsystem
+and via a new mprotect_encrypt() system call that can be used by
+applications to encrypt anonymous memory with keys obtained from
+the keyring.
+
+This architecture supports encrypting both normal, volatile DRAM
+and persistent memory.  However, persistent memory support is
+not included in the Linux kernel implementation at this time.
+(We anticipate adding that support next.)
+
+Hardware Background
+===================
+
+MKTME is built on top of an existing single-key technology called
+TME.  TME encrypts all system memory using a single key generated
+by the CPU on every boot of the system. TME provides mitigation
+against physical attacks, such as physically removing a DIMM or
+watching memory bus traffic.
+
+MKTME enables the use of multiple encryption keys[2], allowing
+selection of the encryption key per-page using the page tables.
+Encryption keys are programmed into each memory controller and
+the same set of keys is available to all entities on the system
+with access to that memory (all cores, DMA engines, etc...).
+
+MKTME inherits many of the mitigations against hardware attacks
+from TME.  Like TME, MKTME does not mitigate vulnerable or
+malicious operating systems or virtual machine managers.  MKTME
+offers additional mitigations when compared to TME.
+
+TME and MKTME use the AES encryption algorithm in the AES-XTS
+mode.  This mode, typically used for block-based storage devices,
+takes the physical address of the data into account when
+encrypting each block.  This ensures that the effective key is
+different for each block of memory. Moving encrypted content
+across physical addresses results in garbage on read, mitigating
+block-relocation attacks.  This property is the reason many of
+the discussed attacks require control of a shared physical page
+to be handed from the victim to the attacker.
+
+--
+1. https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
+2. The MKTME architecture supports up to 16 bits of KeyIDs, so a
+   maximum of 65535 keys on top of the “TME key” at KeyID-0.  The
+   first implementation is expected to support 6 bits, making 63
+   keys available to applications.  However, this is not guaranteed.
+   The number of available keys could be reduced if, for instance,
+   additional physical address space is desired over additional
+   KeyIDs.
-- 
2.21.0
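
The KeyID arithmetic in footnote 2 of the overview can be sketched in a few lines of C. This is only an illustration, not code from this series: the helper names are hypothetical, and placing the KeyID field in the topmost physical address bits (shift = physical address width minus KeyID width) is an assumption about the layout. The overview itself only says that physical address bits are repurposed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of KeyIDs left for applications.
 * KeyID-0 is reserved for the TME key, hence the "- 1".
 */
static inline uint32_t mktme_app_keys(unsigned int keyid_bits)
{
	/* The architecture supports up to 16 bits of KeyID. */
	assert(keyid_bits <= 16);
	return (1u << keyid_bits) - 1;
}

/*
 * Assumption: the KeyID field occupies the top keyid_bits of a
 * phys_bits-wide physical address.
 */
static inline unsigned int mktme_shift(unsigned int phys_bits,
				       unsigned int keyid_bits)
{
	assert(keyid_bits <= phys_bits);
	return phys_bits - keyid_bits;
}

/* Bits to OR into a page table entry to select the given KeyID. */
static inline uint64_t mktme_keyid_mask(uint32_t keyid, unsigned int shift)
{
	return (uint64_t)keyid << shift;
}
```

With 6 KeyID bits this yields 63 application keys, and with 16 bits 65535, matching the footnote. Every bit given to the KeyID field is a bit taken from the usable physical address space, which is the trade-off the footnote describes.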



* [PATCHv2 55/59] x86/mktme: Document the MKTME provided security mitigations
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (53 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 54/59] x86/mktme: Overview of Multi-Key Total Memory Encryption Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 56/59] x86/mktme: Document the MKTME kernel configuration requirements Kirill A. Shutemov
                   ` (3 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Describe the security benefits of Multi-Key Total Memory
Encryption (MKTME) over Total Memory Encryption (TME) alone.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/mktme/index.rst             |   1 +
 Documentation/x86/mktme/mktme_mitigations.rst | 151 ++++++++++++++++++
 2 files changed, 152 insertions(+)
 create mode 100644 Documentation/x86/mktme/mktme_mitigations.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index 1614b52dd3e9..a3a29577b013 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -6,3 +6,4 @@ Multi-Key Total Memory Encryption (MKTME)
 .. toctree::
 
    mktme_overview
+   mktme_mitigations
diff --git a/Documentation/x86/mktme/mktme_mitigations.rst b/Documentation/x86/mktme/mktme_mitigations.rst
new file mode 100644
index 000000000000..c593784851fb
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_mitigations.rst
@@ -0,0 +1,151 @@
+MKTME-Provided Mitigations
+==========================
+:Author: Dave Hansen <dave.hansen@intel.com>
+
+MKTME adds a few mitigations against attacks that are not
+mitigated when using TME alone.  The first set are mitigations
+against software attacks that are familiar today:
+
+ * Kernel Mapping Attacks: information disclosures that leverage
+   the kernel direct map are mitigated against disclosing user
+   data.
+ * Freed Data Leak Attacks: removing an encryption key from the
+   hardware mitigates future user information disclosure.
+
+The next set are attacks that depend on specialized hardware,
+such as an “evil DIMM” or a DDR interposer:
+
+ * Cross-Domain Replay Attack: data is captured from one domain
+   (guest) and replayed to another at a later time.
+ * Cross-Domain Capture and Delayed Compare Attack: data is
+   captured and later analyzed to discover secrets.
+ * Key Wear-out Attack: data is captured and analyzed in order
+   to weaken the AES encryption itself.
+
+More details on these attacks are below.
+
+Kernel Mapping Attacks
+----------------------
+Information disclosure vulnerabilities leverage the kernel direct
+map because many vulnerabilities involve manipulation of kernel
+data structures (examples: CVE-2017-7277, CVE-2017-9605).  We
+normally think of these bugs as leaking valuable *kernel* data,
+but they can leak application data when application pages are
+recycled for kernel use.
+
+With this MKTME implementation, there is a direct map created for
+each MKTME KeyID which is used whenever the kernel needs to
+access plaintext.  But, all kernel data structures are accessed
+via the direct map for KeyID-0.  Thus, memory reads which are not
+coordinated with the KeyID get garbage (for example, accessing
+KeyID-4 data with the KeyID-0 mapping).
+
+This means that if sensitive data encrypted using MKTME is leaked
+via the KeyID-0 direct map, ciphertext decrypted with the wrong
+key will be disclosed.  To disclose plaintext, an attacker must
+“pivot” to the correct direct mapping, which is non-trivial
+because there are no kernel data structures in the KeyID!=0
+direct mapping.
+
+Freed Data Leak Attack
+----------------------
+The kernel has a history of bugs around uninitialized data.
+Usually, we think of these bugs as leaking sensitive kernel data,
+but they can also be used to leak application secrets.
+
+MKTME can help mitigate the case where application secrets are leaked:
+
+ * App (or VM) places a secret in a page
+ * App exits or frees memory to kernel allocator
+ * Page added to allocator free list
+ * Attacker reallocates page to a purpose where it can read the page
+
+Now, imagine MKTME was in use on the memory being leaked.  The
+data can only be leaked as long as the key is programmed in the
+hardware.  If the key is de-programmed, like after all pages are
+freed after a guest is shut down, any future reads will just see
+ciphertext.
+
+Basically, the key is a convenient choke-point: you can be more
+confident that data encrypted with it is inaccessible once the
+key is removed.
+
+Cross-Domain Replay Attack
+--------------------------
+MKTME mitigates cross-domain replay attacks where an attacker
+replaces an encrypted block owned by one domain with a block
+owned by another domain.  MKTME does not prevent this replacement
+from occurring, but it does keep plaintext from being
+disclosed if the domains use different keys.
+
+With TME, the attack could be executed by:
+ * A victim places a secret in memory at a given physical address.
+   Note: AES-XTS restricts the attack to a single physical address
+   rather than allowing it across different physical addresses.
+ * Attacker captures the ciphertext of the victim’s secret.
+ * Later on, after the victim frees the physical address, the
+   attacker gains ownership of it.
+ * Attacker puts the ciphertext at the address and gets the secret
+   plaintext.
+
+With MKTME, the attacker and the victim presumably use different
+keys, so the attacker cannot successfully decrypt the old
+ciphertext.
+
+Cross-Domain Capture and Delayed Compare Attack
+-----------------------------------------------
+This is also referred to as a kind of dictionary attack.
+
+Similarly, MKTME protects against cross-domain capture-and-compare
+attacks.  Consider the following scenario:
+ * A victim places a secret in memory, at a known physical address
+ * Attacker captures victim’s ciphertext
+ * Attacker gains control of the target physical address, perhaps
+   after the victim’s VM is shut down or its memory reclaimed.
+ * Attacker computes and writes many possible plaintexts until new
+   ciphertext matches content captured previously.
+
+Secrets which have low (plaintext) entropy are more vulnerable to
+this attack because they reduce the number of possible plaintexts
+an attacker has to compute and write.
+
+The attack will not work if the attacker and victim use different
+keys.
+
+Key Wear-out Attack
+-------------------
+Repeated use of an encryption key might be used by an attacker to
+infer information about the key or the plaintext, weakening the
+encryption.  The higher the bandwidth of the encryption engine,
+the more vulnerable the key is to wear-out.  The MKTME memory
+encryption hardware works at the speed of the memory bus, which
+has high bandwidth.
+
+Such a weakness has been demonstrated[1] on a theoretical cipher
+with properties similar to those of AES-XTS.
+
+An attack would take the following steps:
+ * Victim system is using TME with AES-XTS-128
+ * Attacker repeatedly captures ciphertext/plaintext pairs (can
+   be performed with an online hardware attack like an interposer).
+ * Attacker compels repeated use of the key under attack for a
+   sustained time period without a system reboot[2].
+ * Attacker discovers a ciphertext collision (two plaintexts
+   translating to the same ciphertext)
+ * Attacker can induce controlled modifications to the targeted
+   plaintext by modifying the colliding ciphertext
+
+MKTME mitigates key wear-out in two ways:
+ * Keys can be rotated periodically to mitigate wear-out.  Since
+   TME keys are generated at boot, rotation of TME keys requires a
+   reboot.  In contrast, MKTME allows rotation while the system is
+   booted.  An application could implement a policy to rotate keys
+   at a frequency which is not feasible to attack.
+ * In the case that MKTME is used to encrypt two guests’ memory
+   with two different keys, an attack on one guest’s key would not
+   weaken the key used in the second guest.
+
+--
+1. http://web.cs.ucdavis.edu/~rogaway/papers/offsets.pdf
+2. This sustained time required for an attack could vary from days
+   to years depending on the attacker’s goals.
-- 
2.21.0



* [PATCHv2 56/59] x86/mktme: Document the MKTME kernel configuration requirements
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (54 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 55/59] x86/mktme: Document the MKTME provided security mitigations Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 57/59] x86/mktme: Document the MKTME Key Service API Kirill A. Shutemov
                   ` (2 subsequent siblings)
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/mktme/index.rst               | 1 +
 Documentation/x86/mktme/mktme_configuration.rst | 6 ++++++
 2 files changed, 7 insertions(+)
 create mode 100644 Documentation/x86/mktme/mktme_configuration.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index a3a29577b013..0f021cc4a2db 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -7,3 +7,4 @@ Multi-Key Total Memory Encryption (MKTME)
 
    mktme_overview
    mktme_mitigations
+   mktme_configuration
diff --git a/Documentation/x86/mktme/mktme_configuration.rst b/Documentation/x86/mktme/mktme_configuration.rst
new file mode 100644
index 000000000000..7d56596360cb
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_configuration.rst
@@ -0,0 +1,6 @@
+MKTME Configuration
+===================
+
+CONFIG_X86_INTEL_MKTME
+        MKTME is enabled by selecting CONFIG_X86_INTEL_MKTME on Intel
+        platforms supporting the MKTME feature.
-- 
2.21.0



* [PATCHv2 57/59] x86/mktme: Document the MKTME Key Service API
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (55 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 56/59] x86/mktme: Document the MKTME kernel configuration requirements Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-08-05 11:58   ` Ben Boeckel
  2019-07-31 15:08 ` [PATCHv2 58/59] x86/mktme: Document the MKTME API for anonymous memory encryption Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 59/59] x86/mktme: Demonstration program using the MKTME APIs Kirill A. Shutemov
  58 siblings, 1 reply; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/mktme/index.rst      |  1 +
 Documentation/x86/mktme/mktme_keys.rst | 61 ++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)
 create mode 100644 Documentation/x86/mktme/mktme_keys.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index 0f021cc4a2db..8cf2b7d62091 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -8,3 +8,4 @@ Multi-Key Total Memory Encryption (MKTME)
    mktme_overview
    mktme_mitigations
    mktme_configuration
+   mktme_keys
diff --git a/Documentation/x86/mktme/mktme_keys.rst b/Documentation/x86/mktme/mktme_keys.rst
new file mode 100644
index 000000000000..5d9125eb7950
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_keys.rst
@@ -0,0 +1,61 @@
+MKTME Key Service API
+=====================
+MKTME is a new key service type added to the Linux Kernel Key Service.
+
+The MKTME Key Service type is available when CONFIG_X86_INTEL_MKTME is
+enabled on Intel platforms that support the MKTME feature.
+
+The MKTME Key Service type manages the allocation of hardware encryption
+keys. Users can request an MKTME type key and then use that key to
+encrypt memory with the encrypt_mprotect() system call.
+
+Usage
+-----
+    When using the Kernel Key Service to request an *mktme* key,
+    specify the *payload* as follows:
+
+    type=
+        *cpu*	User requests a CPU generated encryption key.
+                The CPU generates and assigns an ephemeral key.
+
+        *no-encrypt*
+                 User requests that hardware does not encrypt
+                 memory when this key is in use.
+
+    algorithm=
+        When type=cpu the algorithm field must be *aes-xts-128*,
+        which is the only supported encryption algorithm.
+
+        When type=no-encrypt the algorithm field must not be
+        present in the payload.
+
+ERRORS
+------
+    In addition to the Errors returned from the Kernel Key Service,
+    add_key(2) or keyctl(1) commands, the MKTME Key Service type may
+    return the following errors:
+
+    EINVAL for any payload specification that does not match the
+           MKTME type payload as defined above.
+
+    EACCES for access denied. The MKTME key type uses capabilities
+           to restrict the allocation of keys to privileged users.
+           CAP_SYS_RESOURCE is required, but it will accept the
+           broader capability of CAP_SYS_ADMIN. See capabilities(7).
+
+    ENOKEY if a hardware key cannot be allocated. Additional error
+           messages will describe the hardware programming errors.
+
+EXAMPLES
+--------
+    Add a 'cpu' type key::
+
+        char \*options_CPU = "type=cpu algorithm=aes-xts-128";
+
+        key = add_key("mktme", "name", options_CPU, strlen(options_CPU),
+                      KEY_SPEC_THREAD_KEYRING);
+
+    Add a 'no-encrypt' type key::
+
+	key = add_key("mktme", "name", "no-encrypt", strlen(options_CPU),
+		      KEY_SPEC_THREAD_KEYRING);
-- 
2.21.0
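
The payload rules in the Usage section above can be modeled in user space before calling add_key(2). This is a minimal sketch under stated assumptions: mktme_payload_ok() and opt_eq() are hypothetical helpers that belong neither to this series nor to keyutils, and the kernel's own payload parser (patch 25/59) remains authoritative. Accepting the options in either order is an assumption, based on the demonstration program passing "algorithm=aes-xts-128 type=cpu".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the len-byte token at tok is exactly the string want. */
static bool opt_eq(const char *tok, size_t len, const char *want)
{
	return strlen(want) == len && strncmp(tok, want, len) == 0;
}

/*
 * Hypothetical pre-check mirroring the documented rules:
 *   type=cpu        requires algorithm=aes-xts-128
 *   type=no-encrypt must not carry an algorithm option
 * Unknown or repeated options are rejected.
 */
bool mktme_payload_ok(const char *payload)
{
	bool type_cpu = false, type_noenc = false, alg = false;
	const char *p = payload;

	while (p && *p) {
		size_t len;

		p += strspn(p, " \t");		/* skip separators */
		len = strcspn(p, " \t");	/* length of next token */
		if (!len)
			break;

		if (opt_eq(p, len, "type=cpu") && !type_cpu && !type_noenc)
			type_cpu = true;
		else if (opt_eq(p, len, "type=no-encrypt") &&
			 !type_cpu && !type_noenc)
			type_noenc = true;
		else if (opt_eq(p, len, "algorithm=aes-xts-128") && !alg)
			alg = true;
		else
			return false;

		p += len;
	}

	return (type_cpu && alg) || (type_noenc && !alg);
}
```

A caller could run this check first and only then pass the same string and its strlen() to add_key(), so that a malformed payload is caught before the kernel returns EINVAL.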



* [PATCHv2 58/59] x86/mktme: Document the MKTME API for anonymous memory encryption
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (56 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 57/59] x86/mktme: Document the MKTME Key Service API Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  2019-07-31 15:08 ` [PATCHv2 59/59] x86/mktme: Demonstration program using the MKTME APIs Kirill A. Shutemov
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/mktme/index.rst         |  1 +
 Documentation/x86/mktme/mktme_encrypt.rst | 56 +++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 Documentation/x86/mktme/mktme_encrypt.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index 8cf2b7d62091..ca3c76adc596 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -9,3 +9,4 @@ Multi-Key Total Memory Encryption (MKTME)
    mktme_mitigations
    mktme_configuration
    mktme_keys
+   mktme_encrypt
diff --git a/Documentation/x86/mktme/mktme_encrypt.rst b/Documentation/x86/mktme/mktme_encrypt.rst
new file mode 100644
index 000000000000..6dc8ae11f1cb
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_encrypt.rst
@@ -0,0 +1,56 @@
+MKTME API: system call encrypt_mprotect()
+=========================================
+
+Synopsis
+--------
+int encrypt_mprotect(void \*addr, size_t len, int prot, key_serial_t serial);
+
+Where *key_serial_t serial* is the serial number of a key allocated
+using the MKTME Key Service.
+
+Description
+-----------
+    encrypt_mprotect() encrypts the memory pages containing any part
+    of the address range in the interval specified by addr and len.
+
+    encrypt_mprotect() supports the legacy mprotect() behavior plus
+    the enabling of memory encryption. That means that in addition
+    to encrypting the memory, the protection flags will be updated
+    as requested in the call.
+
+    The *addr* and *len* must be aligned to a page boundary.
+
+    The caller must have *KEY_NEED_VIEW* permission on the key.
+
+    The memory that is to be protected must be mapped *ANONYMOUS*.
+
+Errors
+------
+    In addition to the errors returned by legacy mprotect(),
+    encrypt_mprotect() may return:
+
+    ENOKEY *serial* parameter does not represent a valid key.
+
+    EINVAL *len* parameter is not page aligned.
+
+    EACCES Caller does not have *KEY_NEED_VIEW* permission on the key.
+
+EXAMPLE
+--------
+  Allocate an MKTME Key (opts is "type=cpu algorithm=aes-xts-128")::
+        serial = add_key("mktme", "name", opts, strlen(opts), KEY_SPEC_USER_KEYRING);
+
+  Map ANONYMOUS memory::
+        ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+
+  Protect memory::
+        ret = syscall(SYS_encrypt_mprotect, ptr, size, PROT_READ|PROT_WRITE,
+                      serial);
+
+  Use the encrypted memory
+
+  Free memory::
+        ret = munmap(ptr, size);
+
+  Free the key resource::
+        ret = keyctl(KEYCTL_INVALIDATE, serial);
-- 
2.21.0
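
The page-alignment requirement on *addr* and *len* described above can be handled with two small helpers. This is a sketch only: page_aligned() and page_round_up() are hypothetical names, not part of this series, and both assume that the page size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if value is a multiple of page_size. */
static inline bool page_aligned(uintptr_t value, size_t page_size)
{
	/* Assumption: page_size is a non-zero power of two. */
	assert(page_size && (page_size & (page_size - 1)) == 0);
	return (value & (page_size - 1)) == 0;
}

/* Hypothetical helper: round len up to the next multiple of page_size. */
static inline size_t page_round_up(size_t len, size_t page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);
	return (len + page_size - 1) & ~(page_size - 1);
}
```

A caller would typically obtain the page size with sysconf(_SC_PAGESIZE), round the requested length with page_round_up() before mmap(), and pass that same length to encrypt_mprotect(). The pointer returned by mmap() is already page aligned.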



* [PATCHv2 59/59] x86/mktme: Demonstration program using the MKTME APIs
  2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
                   ` (57 preceding siblings ...)
  2019-07-31 15:08 ` [PATCHv2 58/59] x86/mktme: Document the MKTME API for anonymous memory encryption Kirill A. Shutemov
@ 2019-07-31 15:08 ` Kirill A. Shutemov
  58 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-07-31 15:08 UTC (permalink / raw)
  To: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

From: Alison Schofield <alison.schofield@intel.com>

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/mktme/index.rst      |  1 +
 Documentation/x86/mktme/mktme_demo.rst | 53 ++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)
 create mode 100644 Documentation/x86/mktme/mktme_demo.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index ca3c76adc596..3af322d13225 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -10,3 +10,4 @@ Multi-Key Total Memory Encryption (MKTME)
    mktme_configuration
    mktme_keys
    mktme_encrypt
+   mktme_demo
diff --git a/Documentation/x86/mktme/mktme_demo.rst b/Documentation/x86/mktme/mktme_demo.rst
new file mode 100644
index 000000000000..5af78617f887
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_demo.rst
@@ -0,0 +1,53 @@
+Demonstration Program using MKTME APIs
+======================================
+
+/* Compile with the keyutils library: cc -o mdemo mdemo.c -lkeyutils */
+
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <keyutils.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGE_SIZE)
+#define sys_encrypt_mprotect 434
+
+int main(void)
+{
+	char *options_CPU = "algorithm=aes-xts-128 type=cpu";
+	long size = PAGE_SIZE;
+	key_serial_t key;
+	void *ptra;
+	int ret;
+
+	/* Allocate an MKTME Key */
+	key = add_key("mktme", "testkey", options_CPU, strlen(options_CPU),
+		      KEY_SPEC_THREAD_KEYRING);
+
+	if (key == -1) {
+		printf("addkey FAILED\n");
+		return 1;
+	}
+	/* Map a page of ANONYMOUS memory; mmap() returns MAP_FAILED on error */
+	ptra = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+	if (ptra == MAP_FAILED) {
+		printf("failed to mmap\n");
+		goto inval_key;
+	}
+	/* Encrypt that page with the MKTME Key and make it accessible */
+	ret = syscall(sys_encrypt_mprotect, ptra, size, PROT_READ|PROT_WRITE, key);
+	if (ret)
+		printf("mprotect error [%d]\n", ret);
+
+	/* Enjoy that page of encrypted memory */
+
+	/* Free the memory */
+	ret = munmap(ptra, size);
+
+inval_key:
+	/* Free the Key */
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		printf("invalidate failed on key [%d]\n", key);
+}
-- 
2.21.0



* Re: [PATCHv2 25/59] keys/mktme: Preparse the MKTME key payload
  2019-07-31 15:07 ` [PATCHv2 25/59] keys/mktme: Preparse the MKTME key payload Kirill A. Shutemov
@ 2019-08-05 11:58   ` Ben Boeckel
  2019-08-05 20:31     ` Alison Schofield
  0 siblings, 1 reply; 68+ messages in thread
From: Ben Boeckel @ 2019-08-05 11:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

On Wed, Jul 31, 2019 at 18:07:39 +0300, Kirill A. Shutemov wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> +/* Make sure arguments are correct for the TYPE of key requested */
> +static int mktme_check_options(u32 *payload, unsigned long token_mask,
> +			       enum mktme_type type, enum mktme_alg alg)
> +{
> +	if (!token_mask)
> +		return -EINVAL;
> +
> +	switch (type) {
> +	case MKTME_TYPE_CPU:
> +		if (test_bit(OPT_ALGORITHM, &token_mask))
> +			*payload |= (1 << alg) << 8;
> +		else
> +			return -EINVAL;
> +
> +		*payload |= MKTME_KEYID_SET_KEY_RANDOM;
> +		break;
> +
> +	case MKTME_TYPE_NO_ENCRYPT:
> +		*payload |= MKTME_KEYID_NO_ENCRYPT;
> +		break;

The documentation states that for `type=no-encrypt`, algorithm must not
be specified at all. Where is that checked?

--Ben



* Re: [PATCHv2 57/59] x86/mktme: Document the MKTME Key Service API
  2019-07-31 15:08 ` [PATCHv2 57/59] x86/mktme: Document the MKTME Key Service API Kirill A. Shutemov
@ 2019-08-05 11:58   ` Ben Boeckel
  2019-08-05 20:44     ` Alison Schofield
  0 siblings, 1 reply; 68+ messages in thread
From: Ben Boeckel @ 2019-08-05 11:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

On Wed, Jul 31, 2019 at 18:08:11 +0300, Kirill A. Shutemov wrote:
> +	key = add_key("mktme", "name", "no-encrypt", strlen(options_CPU),
> +		      KEY_SPEC_THREAD_KEYRING);

Should this be `type=no-encrypt` here? Also, seems like copy/paste from
the `type=cpu` case for the `strlen` call.

--Ben



* Re: [PATCHv2 25/59] keys/mktme: Preparse the MKTME key payload
  2019-08-05 11:58   ` Ben Boeckel
@ 2019-08-05 20:31     ` Alison Schofield
  2019-08-13 13:06       ` Ben Boeckel
  0 siblings, 1 reply; 68+ messages in thread
From: Alison Schofield @ 2019-08-05 20:31 UTC (permalink / raw)
  To: Ben Boeckel
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells, Kees Cook, Dave Hansen,
	Kai Huang, Jacob Pan, linux-mm, kvm, keyrings, linux-kernel,
	Kirill A . Shutemov

On Mon, Aug 05, 2019 at 07:58:19AM -0400, Ben Boeckel wrote:
> On Wed, Jul 31, 2019 at 18:07:39 +0300, Kirill A. Shutemov wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > +/* Make sure arguments are correct for the TYPE of key requested */
> > +static int mktme_check_options(u32 *payload, unsigned long token_mask,
> > +			       enum mktme_type type, enum mktme_alg alg)
> > +{
> > +	if (!token_mask)
> > +		return -EINVAL;
> > +
> > +	switch (type) {
> > +	case MKTME_TYPE_CPU:
> > +		if (test_bit(OPT_ALGORITHM, &token_mask))
> > +			*payload |= (1 << alg) << 8;
> > +		else
> > +			return -EINVAL;
> > +
> > +		*payload |= MKTME_KEYID_SET_KEY_RANDOM;
> > +		break;
> > +
> > +	case MKTME_TYPE_NO_ENCRYPT:
		if (test_bit(OPT_ALGORITHM, &token_mask))
			return -EINVAL;
> > +		*payload |= MKTME_KEYID_NO_ENCRYPT;
> > +		break;
> 
> The documentation states that for `type=no-encrypt`, algorithm must not
> be specified at all. Where is that checked?
> 
> --Ben
It's not currently checked, but should be. 
I'll add it as shown above.
Thanks for the review,
Alison



* Re: [PATCHv2 57/59] x86/mktme: Document the MKTME Key Service API
  2019-08-05 11:58   ` Ben Boeckel
@ 2019-08-05 20:44     ` Alison Schofield
  2019-08-13 13:07       ` Ben Boeckel
  0 siblings, 1 reply; 68+ messages in thread
From: Alison Schofield @ 2019-08-05 20:44 UTC (permalink / raw)
  To: Ben Boeckel
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells, Kees Cook, Dave Hansen,
	Kai Huang, Jacob Pan, linux-mm, kvm, keyrings, linux-kernel,
	Kirill A . Shutemov

On Mon, Aug 05, 2019 at 07:58:37AM -0400, Ben Boeckel wrote:
> On Wed, Jul 31, 2019 at 18:08:11 +0300, Kirill A. Shutemov wrote:
> > +	key = add_key("mktme", "name", "no-encrypt", strlen(options_CPU),
> > +		      KEY_SPEC_THREAD_KEYRING);
> 
> Should this be `type=no-encrypt` here? Also, seems like copy/paste from
> the `type=cpu` case for the `strlen` call.
> 
> --Ben

Yes. Fixed up as follows:

	Add a 'no-encrypt' type key::

        char \*options_NOENCRYPT = "type=no-encrypt";

        key = add_key("mktme", "name", options_NOENCRYPT,
                      strlen(options_NOENCRYPT), KEY_SPEC_THREAD_KEYRING);

Thanks for the review,
Alison



* Re: [PATCHv2 47/59] kvm, x86, mmu: setup MKTME keyID to spte for given PFN
  2019-07-31 15:08 ` [PATCHv2 47/59] kvm, x86, mmu: setup MKTME keyID to spte for given PFN Kirill A. Shutemov
@ 2019-08-06 20:26   ` Lendacky, Thomas
  2019-08-07 14:28     ` Kirill A. Shutemov
  0 siblings, 1 reply; 68+ messages in thread
From: Lendacky, Thomas @ 2019-08-06 20:26 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells
  Cc: Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

On 7/31/19 10:08 AM, Kirill A. Shutemov wrote:
> From: Kai Huang <kai.huang@linux.intel.com>
> 
> Setup keyID to SPTE, which will be eventually programmed to shadow MMU
> or EPT table, according to page's associated keyID, so that guest is
> able to use correct keyID to access guest memory.
> 
> Note current shadow_me_mask doesn't suit MKTME's needs, since for MKTME
> there's no fixed memory encryption mask, but can vary from keyID 1 to
> maximum keyID, therefore shadow_me_mask remains 0 for MKTME.
> 
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kvm/mmu.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 8f72526e2f68..b8742e6219f6 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2936,6 +2936,22 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
>  #define SET_SPTE_WRITE_PROTECTED_PT	BIT(0)
>  #define SET_SPTE_NEED_REMOTE_TLB_FLUSH	BIT(1)
>  
> +static u64 get_phys_encryption_mask(kvm_pfn_t pfn)
> +{
> +#ifdef CONFIG_X86_INTEL_MKTME
> +	struct page *page;
> +
> +	if (!pfn_valid(pfn))
> +		return 0;
> +
> +	page = pfn_to_page(pfn);
> +
> +	return ((u64)page_keyid(page)) << mktme_keyid_shift();
> +#else
> +	return shadow_me_mask;
> +#endif
> +}

This patch breaks AMD virtualization (SVM) in general (non-SEV and SEV
guests) when SME is active. Shouldn't this be a run time, vs build time,
check for MKTME being active?

Thanks,
Tom

> +
>  static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		    unsigned pte_access, int level,
>  		    gfn_t gfn, kvm_pfn_t pfn, bool speculative,
> @@ -2982,7 +2998,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		pte_access &= ~ACC_WRITE_MASK;
>  
>  	if (!kvm_is_mmio_pfn(pfn))
> -		spte |= shadow_me_mask;
> +		spte |= get_phys_encryption_mask(pfn);
>  
>  	spte |= (u64)pfn << PAGE_SHIFT;
>  
> 


* Re: [PATCHv2 47/59] kvm, x86, mmu: setup MKTME keyID to spte for given PFN
  2019-08-06 20:26   ` Lendacky, Thomas
@ 2019-08-07 14:28     ` Kirill A. Shutemov
  0 siblings, 0 replies; 68+ messages in thread
From: Kirill A. Shutemov @ 2019-08-07 14:28 UTC (permalink / raw)
  To: Lendacky, Thomas
  Cc: Andrew Morton, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, David Howells,
	Kees Cook, Dave Hansen, Kai Huang, Jacob Pan, Alison Schofield,
	linux-mm, kvm, keyrings, linux-kernel, Kirill A . Shutemov

On Tue, Aug 06, 2019 at 08:26:52PM +0000, Lendacky, Thomas wrote:
> On 7/31/19 10:08 AM, Kirill A. Shutemov wrote:
> > From: Kai Huang <kai.huang@linux.intel.com>
> > 
> > Set the keyID in the SPTE, which is eventually programmed into the shadow
> > MMU or EPT table, according to the page's associated keyID, so that the
> > guest uses the correct keyID to access guest memory.
> > 
> > Note that the current shadow_me_mask doesn't suit MKTME's needs: MKTME has
> > no fixed memory encryption mask, because the keyID can vary from 1 to the
> > maximum keyID. Therefore shadow_me_mask remains 0 for MKTME.
> > 
> > Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/kvm/mmu.c | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index 8f72526e2f68..b8742e6219f6 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -2936,6 +2936,22 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
> >  #define SET_SPTE_WRITE_PROTECTED_PT	BIT(0)
> >  #define SET_SPTE_NEED_REMOTE_TLB_FLUSH	BIT(1)
> >  
> > +static u64 get_phys_encryption_mask(kvm_pfn_t pfn)
> > +{
> > +#ifdef CONFIG_X86_INTEL_MKTME
> > +	struct page *page;
> > +
> > +	if (!pfn_valid(pfn))
> > +		return 0;
> > +
> > +	page = pfn_to_page(pfn);
> > +
> > +	return ((u64)page_keyid(page)) << mktme_keyid_shift();
> > +#else
> > +	return shadow_me_mask;
> > +#endif
> > +}
> 
> This patch breaks AMD virtualization (SVM) in general (non-SEV and SEV
> guests) when SME is active. Shouldn't this be a run time, vs build time,
> check for MKTME being active?

Thanks, I've missed this.

This fixup should help:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 00d17bdfec0f..54931acf260e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2947,18 +2947,17 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 
 static u64 get_phys_encryption_mask(kvm_pfn_t pfn)
 {
-#ifdef CONFIG_X86_INTEL_MKTME
 	struct page *page;
 
+	if (!mktme_enabled())
+		return shadow_me_mask;
+
 	if (!pfn_valid(pfn))
 		return 0;
 
 	page = pfn_to_page(pfn);
 
 	return ((u64)page_keyid(page)) << mktme_keyid_shift();
-#else
-	return shadow_me_mask;
-#endif
 }
 
 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-- 
 Kirill A. Shutemov
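
To illustrate why the check has to happen at run time, here is a minimal
userspace sketch (not kernel code) of the selection logic in the fixup
above. The struct and its fields are hypothetical stand-ins for
mktme_enabled(), shadow_me_mask and mktme_keyid_shift(); they are
assumptions made for illustration, not the real kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state the kernel helper consults. */
struct enc_state {
	bool mktme_enabled;      /* models mktme_enabled() */
	uint64_t shadow_me_mask; /* models KVM's shadow_me_mask */
	unsigned int keyid_shift; /* models mktme_keyid_shift() */
};

/*
 * Models get_phys_encryption_mask() after the fixup: the decision is
 * made from run-time state, not from a build-time config option.
 */
uint64_t phys_encryption_mask(const struct enc_state *s,
			      bool pfn_is_valid, unsigned int keyid)
{
	assert(s != NULL);

	/* MKTME not active (e.g. AMD SME system): keep the fixed mask. */
	if (!s->mktme_enabled)
		return s->shadow_me_mask;

	/* No struct page behind the PFN, so no keyID to look up. */
	if (!pfn_is_valid)
		return 0;

	/* MKTME active: the mask is the page's keyID in the keyID bits. */
	return (uint64_t)keyid << s->keyid_shift;
}
```

With mktme_enabled false and shadow_me_mask set to some fixed bit (bit 47
is used in the checks below purely as an illustrative value), the function
returns that fixed mask regardless of the keyID argument. That is the case
the original #ifdef got wrong on a kernel built with
CONFIG_X86_INTEL_MKTME but running with SME active.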



* Re: [PATCHv2 25/59] keys/mktme: Preparse the MKTME key payload
  2019-08-05 20:31     ` Alison Schofield
@ 2019-08-13 13:06       ` Ben Boeckel
  0 siblings, 0 replies; 68+ messages in thread
From: Ben Boeckel @ 2019-08-13 13:06 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells, Kees Cook, Dave Hansen,
	Kai Huang, Jacob Pan, linux-mm, kvm, keyrings, linux-kernel,
	Kirill A . Shutemov

On Mon, Aug 05, 2019 at 13:31:02 -0700, Alison Schofield wrote:
> It's not currently checked, but should be. 
> I'll add it as shown above.
> Thanks for the review,

Thanks. Seeing how this works elsewhere now, feel free to add my review
with the proposed check to the new patch.

Reviewed-by: Ben Boeckel <mathstuf@gmail.com>

--Ben



* Re: [PATCHv2 57/59] x86/mktme: Document the MKTME Key Service API
  2019-08-05 20:44     ` Alison Schofield
@ 2019-08-13 13:07       ` Ben Boeckel
  0 siblings, 0 replies; 68+ messages in thread
From: Ben Boeckel @ 2019-08-13 13:07 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Kirill A. Shutemov, Andrew Morton, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, David Howells, Kees Cook, Dave Hansen,
	Kai Huang, Jacob Pan, linux-mm, kvm, keyrings, linux-kernel,
	Kirill A . Shutemov

On Mon, Aug 05, 2019 at 13:44:53 -0700, Alison Schofield wrote:
> Yes. Fixed up as follows:
> 
> 	Add a "no-encrypt" type key::
> 
>         char *options_NOENCRYPT = "type=no-encrypt";
> 
>         key = add_key("mktme", "name", options_NOENCRYPT,
>                       strlen(options_NOENCRYPT), KEY_SPEC_THREAD_KEYRING);

Thanks. Looks good to me.

Reviewed-by: Ben Boeckel <mathstuf@gmail.com>

--Ben
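
For readers who want to try the quoted documentation snippet outside the
kernel tree, the following is a rough sketch of the same call wrapped in a
function. It assumes a kernel carrying this patchset (the "mktme" key type
is not otherwise available, so the call is expected to fail elsewhere) and
uses the raw add_key system call via syscall(), since add_key() itself is
provided by libkeyutils rather than by the C library. The function name is
hypothetical.

```c
#include <assert.h>
#include <linux/keyctl.h>   /* KEY_SPEC_THREAD_KEYRING */
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>    /* SYS_add_key */
#include <unistd.h>         /* syscall() */

/* Payload string taken from the documentation fixup quoted above. */
const char options_noencrypt[] = "type=no-encrypt";

/*
 * Hypothetical helper: request a "no-encrypt" MKTME key on the calling
 * thread's keyring. Returns the key serial number on success, or -1 with
 * errno set on failure.
 */
long add_mktme_noencrypt_key(const char *description)
{
	assert(description != NULL);

	return syscall(SYS_add_key, "mktme", description,
		       options_noencrypt, strlen(options_noencrypt),
		       KEY_SPEC_THREAD_KEYRING);
}
```

A caller would check for a return value of -1 and inspect errno; per patch
33/59 in this series, creating MKTME keys requires CAP_SYS_RESOURCE.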



Thread overview: 68+ messages
2019-07-31 15:07 [PATCHv2 00/59] Intel MKTME enabling Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 01/59] mm: Do no merge VMAs with different encryption KeyIDs Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 02/59] mm: Add helpers to setup zero page mappings Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 03/59] mm/ksm: Do not merge pages with different KeyIDs Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 04/59] mm/page_alloc: Unify alloc_hugepage_vma() Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 05/59] mm/page_alloc: Handle allocation for encrypted memory Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 06/59] mm/khugepaged: Handle encrypted pages Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 07/59] x86/mm: Mask out KeyID bits from page table entry pfn Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 08/59] x86/mm: Introduce helpers to read number, shift and mask of KeyIDs Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 09/59] x86/mm: Store bitmask of the encryption algorithms supported by MKTME Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 10/59] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify() Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 11/59] x86/mm: Detect MKTME early Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 12/59] x86/mm: Add a helper to retrieve KeyID for a page Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 13/59] x86/mm: Add a helper to retrieve KeyID for a VMA Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 14/59] x86/mm: Add hooks to allocate and free encrypted pages Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 15/59] x86/mm: Map zero pages into encrypted mappings correctly Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 16/59] x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 17/59] x86/mm: Allow to disable MKTME after enumeration Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 18/59] x86/mm: Calculate direct mapping size Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 19/59] x86/mm: Implement syncing per-KeyID direct mappings Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 20/59] x86/mm: Handle encrypted memory in page_to_virt() and __pa() Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 21/59] mm/page_ext: Export lookup_page_ext() symbol Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 22/59] mm/rmap: Clear vma->anon_vma on unlink_anon_vmas() Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 23/59] x86/pconfig: Set an activated algorithm in all MKTME commands Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 24/59] keys/mktme: Introduce a Kernel Key Service for MKTME Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 25/59] keys/mktme: Preparse the MKTME key payload Kirill A. Shutemov
2019-08-05 11:58   ` Ben Boeckel
2019-08-05 20:31     ` Alison Schofield
2019-08-13 13:06       ` Ben Boeckel
2019-07-31 15:07 ` [PATCHv2 26/59] keys/mktme: Instantiate MKTME keys Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 27/59] keys/mktme: Destroy " Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 28/59] keys/mktme: Move the MKTME payload into a cache aligned structure Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 29/59] keys/mktme: Set up PCONFIG programming targets for MKTME keys Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 30/59] keys/mktme: Program MKTME keys into the platform hardware Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 31/59] keys/mktme: Set up a percpu_ref_count for MKTME keys Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 32/59] keys/mktme: Clear the key programming from the MKTME hardware Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 33/59] keys/mktme: Require CAP_SYS_RESOURCE capability for MKTME keys Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 34/59] acpi: Remove __init from acpi table parsing functions Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 35/59] acpi/hmat: Determine existence of an ACPI HMAT Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 36/59] keys/mktme: Require ACPI HMAT to register the MKTME Key Service Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 37/59] acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 38/59] keys/mktme: Do not allow key creation in unsafe topologies Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 39/59] keys/mktme: Support CPU hotplug for MKTME key service Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 40/59] keys/mktme: Block memory hotplug additions when MKTME is enabled Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 41/59] mm: Generalize the mprotect implementation to support extensions Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 42/59] syscall/x86: Wire up a system call for MKTME encryption keys Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 43/59] x86/mm: Set KeyIDs in encrypted VMAs for MKTME Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 44/59] mm: Add the encrypt_mprotect() system call " Kirill A. Shutemov
2019-07-31 15:07 ` [PATCHv2 45/59] x86/mm: Keep reference counts on hardware key usage " Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 46/59] mm: Restrict MKTME memory encryption to anonymous VMAs Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 47/59] kvm, x86, mmu: setup MKTME keyID to spte for given PFN Kirill A. Shutemov
2019-08-06 20:26   ` Lendacky, Thomas
2019-08-07 14:28     ` Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 48/59] iommu/vt-d: Support MKTME in DMA remapping Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 49/59] x86/mm: introduce common code for mem encryption Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 50/59] x86/mm: Use common code for DMA memory encryption Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 51/59] x86/mm: Disable MKTME on incompatible platform configurations Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 52/59] x86/mm: Disable MKTME if not all system memory supports encryption Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 53/59] x86: Introduce CONFIG_X86_INTEL_MKTME Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 54/59] x86/mktme: Overview of Multi-Key Total Memory Encryption Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 55/59] x86/mktme: Document the MKTME provided security mitigations Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 56/59] x86/mktme: Document the MKTME kernel configuration requirements Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 57/59] x86/mktme: Document the MKTME Key Service API Kirill A. Shutemov
2019-08-05 11:58   ` Ben Boeckel
2019-08-05 20:44     ` Alison Schofield
2019-08-13 13:07       ` Ben Boeckel
2019-07-31 15:08 ` [PATCHv2 58/59] x86/mktme: Document the MKTME API for anonymous memory encryption Kirill A. Shutemov
2019-07-31 15:08 ` [PATCHv2 59/59] x86/mktme: Demonstration program using the MKTME APIs Kirill A. Shutemov
