Linux-Security-Module Archive on lore.kernel.org
* [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
@ 2018-12-04  7:39 Alison Schofield
  2018-12-04  7:39 ` [RFC v2 01/13] x86/mktme: Document the MKTME APIs Alison Schofield
                   ` (15 more replies)
  0 siblings, 16 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

Hi Thomas, David,

Here is an updated RFC on the APIs to support MKTME
(Multi-Key Total Memory Encryption).

This RFC presents the 2 API additions to support the creation and
usage of memory encryption keys:
 1) Kernel Key Service type "mktme"
 2) System call encrypt_mprotect()

This patchset is built upon Kirill Shutemov's work for the core MKTME
support.

David: Please let me know if the changes made, based on your review,
are reasonable. I don't think that the new changes touch key service
specific areas (much).

Thomas: Please provide feedback on encrypt_mprotect(). If not a
review, then a direction check would be helpful.

I picked up a few more CCs this time from get_maintainer!

Thanks!
Alison


Changes in RFC2
Add a preparser to mktme key service. (dhowells)
Replace key serial no. with key struct pointer in mktme_map. (dhowells)
Remove patch that inserts a special MKTME case in keyctl revoke. (dhowells)
Updated key usage syntax in the documentation (Kai)
Replaced NO_PKEY, NO_KEYID with a single constant NO_KEY. (Jarkko)
Clarified comments in changelog and code. (Jarkko)
Add clear, no-encrypt, and update key support.
Add mktme_savekeys (Patch 12) to give kernel permission to save key data.
Add cpu hotplug support. (Patch 13)

Alison Schofield (13):
  x86/mktme: Document the MKTME APIs
  mm: Generalize the mprotect implementation to support extensions
  syscall/x86: Wire up a new system call for memory encryption keys
  x86/mm: Add helper functions for MKTME memory encryption keys
  x86/mm: Set KeyIDs in encrypted VMAs
  mm: Add the encrypt_mprotect() system call
  x86/mm: Add helpers for reference counting encrypted VMAs
  mm: Use reference counting for encrypted VMAs
  mm: Restrict memory encryption to anonymous VMA's
  keys/mktme: Add the MKTME Key Service type for memory encryption
  keys/mktme: Program memory encryption keys on a system wide basis
  keys/mktme: Save MKTME data if kernel cmdline parameter allows
  keys/mktme: Support CPU Hotplug for MKTME keys

 Documentation/admin-guide/kernel-parameters.rst |   1 +
 Documentation/admin-guide/kernel-parameters.txt |  11 +
 Documentation/x86/mktme/index.rst               |  11 +
 Documentation/x86/mktme/mktme_demo.rst          |  53 +++
 Documentation/x86/mktme/mktme_encrypt.rst       |  58 +++
 Documentation/x86/mktme/mktme_keys.rst          | 109 +++++
 Documentation/x86/mktme/mktme_overview.rst      |  60 +++
 arch/x86/Kconfig                                |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl          |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl          |   1 +
 arch/x86/include/asm/mktme.h                    |  25 +
 arch/x86/mm/mktme.c                             | 179 ++++++++
 fs/exec.c                                       |   4 +-
 include/keys/mktme-type.h                       |  41 ++
 include/linux/key.h                             |   2 +
 include/linux/mm.h                              |  11 +-
 include/linux/syscalls.h                        |   2 +
 include/uapi/asm-generic/unistd.h               |   4 +-
 kernel/fork.c                                   |   2 +
 kernel/sys_ni.c                                 |   2 +
 mm/mprotect.c                                   |  91 +++-
 security/keys/Kconfig                           |  11 +
 security/keys/Makefile                          |   1 +
 security/keys/mktme_keys.c                      | 580 ++++++++++++++++++++++++
 24 files changed, 1249 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/x86/mktme/index.rst
 create mode 100644 Documentation/x86/mktme/mktme_demo.rst
 create mode 100644 Documentation/x86/mktme/mktme_encrypt.rst
 create mode 100644 Documentation/x86/mktme/mktme_keys.rst
 create mode 100644 Documentation/x86/mktme/mktme_overview.rst
 create mode 100644 include/keys/mktme-type.h
 create mode 100644 security/keys/mktme_keys.c

-- 
2.14.1


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [RFC v2 01/13] x86/mktme: Document the MKTME APIs
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
@ 2018-12-04  7:39 ` Alison Schofield
  2018-12-05 18:11   ` Andy Lutomirski
  2018-12-06  8:04   ` Sakkinen, Jarkko
  2018-12-04  7:39 ` [RFC v2 02/13] mm: Generalize the mprotect implementation to support extensions Alison Schofield
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

This includes an overview, a section on each API (MKTME Keys and the
encrypt_mprotect() system call), and a demonstration program.

(Some of this info is destined for man pages.)

Change-Id: I34dc9ff1a1308c057ec4bb3e652c4d7ce6995606
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
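For illustration only, and not part of the patch: a minimal userspace
sketch of the payload rules that mktme_keys.rst describes. The names
mktme_payload_ok(), is_hex128(), tok_eq() and opt_value() are
hypothetical. The sketch also assumes that options are separated by
whitespace, that type= is mandatory, that algorithm= is required for
type=user and type=cpu, and that repeated options are rejected; the
kernel preparser may differ on any of these points.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

enum mktme_type { T_NONE, T_USER, T_CPU, T_CLEAR, T_NO_ENCRYPT };

/* True if s[0..len) is exactly 32 hexadecimal characters (128 bits). */
static bool is_hex128(const char *s, size_t len)
{
	size_t i;

	if (len != 32)
		return false;
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)s[i]))
			return false;
	return true;
}

/* True if the token s[0..len) equals the C string lit. */
static bool tok_eq(const char *s, size_t len, const char *lit)
{
	return strlen(lit) == len && memcmp(s, lit, len) == 0;
}

/* If the token starts with name ("opt="), return its non-empty value. */
static const char *opt_value(const char *tok, size_t len, const char *name,
			     size_t *vlen)
{
	size_t n = strlen(name);

	if (len <= n || memcmp(tok, name, n) != 0)
		return NULL;
	*vlen = len - n;
	return tok + n;
}

/* Hypothetical pre-check: does payload follow the documented rules? */
static bool mktme_payload_ok(const char *payload)
{
	enum mktme_type type = T_NONE;
	bool alg = false, key = false, tweak = false;
	const char *p = payload, *v;
	size_t len, vlen;

	assert(payload != NULL);
	for (;;) {
		p += strspn(p, " \t\n");	/* skip separators */
		len = strcspn(p, " \t\n");	/* length of the next token */
		if (len == 0)
			break;
		if ((v = opt_value(p, len, "type=", &vlen)) && type == T_NONE) {
			if (tok_eq(v, vlen, "user"))
				type = T_USER;
			else if (tok_eq(v, vlen, "cpu"))
				type = T_CPU;
			else if (tok_eq(v, vlen, "clear"))
				type = T_CLEAR;
			else if (tok_eq(v, vlen, "no-encrypt"))
				type = T_NO_ENCRYPT;
			else
				return false;
		} else if ((v = opt_value(p, len, "algorithm=", &vlen)) && !alg) {
			if (!tok_eq(v, vlen, "aes-xts-128"))
				return false;
			alg = true;
		} else if ((v = opt_value(p, len, "key=", &vlen)) && !key) {
			if (!is_hex128(v, vlen))
				return false;
			key = true;
		} else if ((v = opt_value(p, len, "tweak=", &vlen)) && !tweak) {
			if (!is_hex128(v, vlen))
				return false;
			tweak = true;
		} else {
			return false;	/* unknown or repeated option */
		}
		p += len;
	}

	switch (type) {
	case T_USER:
		return alg && key && tweak;
	case T_CPU:
		return alg;	/* key= and tweak= are optional entropy */
	case T_CLEAR:
	case T_NO_ENCRYPT:
		return !alg && !key && !tweak;
	default:
		return false;	/* assumed: type= is mandatory */
	}
}
```

A caller could run mktme_payload_ok() on a payload string before passing
it to add_key(2), so that an obviously malformed payload is caught
without a round trip to the kernel.
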
 Documentation/x86/mktme/index.rst          |  11 +++
 Documentation/x86/mktme/mktme_demo.rst     |  53 ++++++++++++++
 Documentation/x86/mktme/mktme_encrypt.rst  |  58 +++++++++++++++
 Documentation/x86/mktme/mktme_keys.rst     | 109 +++++++++++++++++++++++++++++
 Documentation/x86/mktme/mktme_overview.rst |  60 ++++++++++++++++
 5 files changed, 291 insertions(+)
 create mode 100644 Documentation/x86/mktme/index.rst
 create mode 100644 Documentation/x86/mktme/mktme_demo.rst
 create mode 100644 Documentation/x86/mktme/mktme_encrypt.rst
 create mode 100644 Documentation/x86/mktme/mktme_keys.rst
 create mode 100644 Documentation/x86/mktme/mktme_overview.rst

diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
new file mode 100644
index 000000000000..8c556d04cbc4
--- /dev/null
+++ b/Documentation/x86/mktme/index.rst
@@ -0,0 +1,11 @@
+
+=============================================
+Multi-Key Total Memory Encryption (MKTME) API
+=============================================
+
+.. toctree::
+
+   mktme_overview
+   mktme_keys
+   mktme_encrypt
+   mktme_demo
diff --git a/Documentation/x86/mktme/mktme_demo.rst b/Documentation/x86/mktme/mktme_demo.rst
new file mode 100644
index 000000000000..afd50772e65d
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_demo.rst
@@ -0,0 +1,53 @@
+Demonstration Program using MKTME APIs
+======================================
+
+/* Compile with the keyutils library: cc -o mdemo mdemo.c -lkeyutils */
+
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <keyutils.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGE_SIZE)
+#define sys_encrypt_mprotect 335
+
+int main(void)
+{
+	char *options_CPU = "algorithm=aes-xts-128 type=cpu";
+	long size = PAGE_SIZE;
+	key_serial_t key;
+	void *ptra;
+	int ret;
+
+	/* Allocate an MKTME Key */
+	key = add_key("mktme", "testkey", options_CPU, strlen(options_CPU),
+		      KEY_SPEC_THREAD_KEYRING);
+
+	if (key == -1) {
+		printf("addkey FAILED\n");
+		return 1;
+	}
+	/* Map a page of ANONYMOUS memory */
+	ptra = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+	if (ptra == MAP_FAILED) {
+		printf("failed to mmap\n");
+		goto inval_key;
+	}
+        /* Encrypt that page of memory with the MKTME Key */
+	ret = syscall(sys_encrypt_mprotect, ptra, size, PROT_NONE, key);
+	if (ret)
+		printf("mprotect error [%d]\n", ret);
+
+        /* Enjoy that page of encrypted memory */
+
+        /* Free the memory */
+	ret = munmap(ptra, size);
+
+inval_key:
+        /* Free the Key */
+	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+		printf("invalidate failed on key [%d]\n", key);
+}
diff --git a/Documentation/x86/mktme/mktme_encrypt.rst b/Documentation/x86/mktme/mktme_encrypt.rst
new file mode 100644
index 000000000000..ede5237183fc
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_encrypt.rst
@@ -0,0 +1,58 @@
+MKTME API: system call encrypt_mprotect()
+=========================================
+
+Synopsis
+--------
+int encrypt_mprotect(void \*addr, size_t len, int prot, key_serial_t serial);
+
+Where *key_serial_t serial* is the serial number of a key allocated
+using the MKTME Key Service.
+
+Description
+-----------
+    encrypt_mprotect() encrypts the memory pages containing any part
+    of the address range in the interval specified by addr and len.
+
+    encrypt_mprotect() supports the legacy mprotect() behavior plus
+    the enabling of memory encryption. That means that in addition
+    to encrypting the memory, the protection flags will be updated
+    as requested in the call.
+
+    The *addr* and *len* must be aligned to a page boundary.
+
+    The caller must have *KEY_NEED_VIEW* permission on the key.
+
+    The range of memory that is to be protected must be mapped as
+    *ANONYMOUS*.
+
+Errors
+------
+    In addition to the errors returned by legacy mprotect(),
+    encrypt_mprotect() may return:
+
+    ENOKEY *serial* parameter does not represent a valid key.
+
+    EINVAL *len* parameter is not page aligned.
+
+    EACCES Caller does not have *KEY_NEED_VIEW* permission on the key.
+
+EXAMPLE
+--------
+  Allocate an MKTME Key::
+        serial = add_key("mktme", "name", "type=cpu algorithm=aes-xts-128", 30, KEY_SPEC_USER_KEYRING);
+
+  Map ANONYMOUS memory::
+        ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+
+  Protect memory::
+        ret = syscall(SYS_encrypt_mprotect, ptr, size, PROT_READ|PROT_WRITE,
+                      serial);
+
+  Use the encrypted memory
+
+  Free memory::
+        ret = munmap(ptr, size);
+
+  Free the key resource::
+        ret = keyctl(KEYCTL_INVALIDATE, serial);
+
diff --git a/Documentation/x86/mktme/mktme_keys.rst b/Documentation/x86/mktme/mktme_keys.rst
new file mode 100644
index 000000000000..5837909b2c54
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_keys.rst
@@ -0,0 +1,109 @@
+MKTME Key Service API
+=====================
+MKTME is a new key service type added to the Linux Kernel Key Service.
+
+The MKTME Key Service type is available when CONFIG_X86_INTEL_MKTME is
+enabled on Intel platforms that support the MKTME feature.
+
+The MKTME Key Service type manages the allocation of hardware encryption
+keys. Users can request an MKTME type key and then use that key to
+encrypt memory with the encrypt_mprotect() system call.
+
+Usage
+-----
+    When using the Kernel Key Service to request an *mktme* key,
+    specify the *payload* as follows:
+
+    type=
+        *user*	User will supply the encryption key data. Use this
+                type to directly program a hardware encryption key.
+
+        *cpu*	User requests a CPU generated encryption key.
+                The CPU generates and assigns an ephemeral key.
+
+        *clear* User requests that a hardware encryption key be
+                cleared. This will clear the encryption key from
+                the hardware. On execution this hardware key gets
+                TME behavior.
+
+        *no-encrypt*
+                 User requests that hardware does not encrypt
+                 memory when this key is in use.
+
+    algorithm=
+        When type=user or type=cpu the algorithm field must be
+        *aes-xts-128*
+
+        When type=clear or type=no-encrypt the algorithm field
+        must not be present in the payload.
+
+    key=
+        When type=user the user must supply a 128 bit encryption
+        key as exactly 32 ASCII hexadecimal characters.
+
+	When type=cpu the user may optionally supply 128 bits of
+        entropy for the CPU generated encryption key in this field.
+        It must be exactly 32 ASCII hexadecimal characters.
+
+	When type=clear or type=no-encrypt this key field must
+        not be present in the payload.
+
+    tweak=
+	When type=user the user must supply a 128 bit tweak key
+        as exactly 32 ASCII hexadecimal characters.
+
+	When type=cpu the user may optionally supply 128 bits of
+        entropy for the CPU generated tweak key in this field. It
+        must be exactly 32 ASCII hexadecimal characters.
+
+        When type=clear or type=no-encrypt the tweak field must
+	not be present in the payload.
+
+ERRORS
+------
+    In addition to the Errors returned from the Kernel Key Service,
+    add_key(2) or keyctl(1) commands, the MKTME Key Service type may
+    return the following errors:
+
+    EINVAL for any payload specification that does not match the
+           MKTME type payload as defined above.
+    EACCES for access denied. MKTME key type uses capabilities to
+           restrict the allocation of keys. CAP_SYS_RESOURCE is
+           required, but it will accept the broader capability of
+           CAP_SYS_ADMIN.  See capabilities(7).
+
+    ENOKEY if a hardware key cannot be allocated. Additional error
+           messages will describe the hardware programming errors.
+
+EXAMPLES
+--------
+    Add a 'user' type key::
+
+        char *options_USER = "type=user "
+                             "algorithm=aes-xts-128 "
+                             "key=12345678912345671234567891234567 "
+                             "tweak=12345678912345671234567891234567";
+
+        key = add_key("mktme", "name", options_USER, strlen(options_USER),
+                      KEY_SPEC_THREAD_KEYRING);
+
+    Add a 'cpu' type key::
+
+        char *options_CPU = "type=cpu algorithm=aes-xts-128";
+
+        key = add_key("mktme", "name", options_CPU, strlen(options_CPU),
+                      KEY_SPEC_THREAD_KEYRING);
+
+    Update a key to 'clear' type::
+
+        Note: This has the effect of clearing out the previously programmed
+        encryption data in the hardware. Use this to clear the hardware slot
+        prior to invalidating the key.
+
+        ret = keyctl(KEYCTL_UPDATE, key, "type=clear", strlen("type=clear"));
+
+    Add a 'no-encrypt' type key::
+
+        key = add_key("mktme", "name", "type=no-encrypt", strlen("type=no-encrypt"),
+                      KEY_SPEC_THREAD_KEYRING);
+
diff --git a/Documentation/x86/mktme/mktme_overview.rst b/Documentation/x86/mktme/mktme_overview.rst
new file mode 100644
index 000000000000..cc2c4a8320e7
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_overview.rst
@@ -0,0 +1,60 @@
+Overview
+========
+MKTME (Multi-Key Total Memory Encryption) is a technology that allows
+memory encryption on Intel platforms. The main use case for the feature
+is virtual machine isolation. The API should apply to a wide range of
+use cases.
+
+Find the Intel Architecture Specification for MKTME here:
+https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
+
+The Encryption Process
+----------------------
+Userspace will see MKTME encryption as a two-step process.
+
+Step 1: Use the MKTME Key Service API to allocate an encryption key.
+
+Step 2: Use the encrypt_mprotect() system call to protect memory
+        with the encryption key obtained in Step 1.
+
+Definitions
+-----------
+Keys:	References to Keys in this document are to Userspace Keys.
+	These keys are requested by users and jointly managed by the
+	MKTME Key Service Type, and more broadly by the Kernel Key
+        Service of which MKTME is a part.
+
+	This document does not intend to document KKS, but only the
+	MKTME type of the KKS. The options of the KKS can be grouped
+	into 2 classes for purposes of understanding how MKTME operates
+	within the broader KKS.
+
+KeyIDs: References to KeyIDs in this document are to the hardware KeyID
+	slots that are available on Intel Platforms. A KeyID is a
+	numerical index into a software programmable slot in the Intel
+	hardware. Refer to the Intel specification linked above for
+	details on the implementation of MKTME in Intel platforms.
+
+Key<-->KeyID Mapping:
+	The MKTME Key Service maintains a mapping between Keys and KeyIDs.
+	This mapping is known only to the kernel. Userspace does not need
+	to know which hardware KeyID slot its Userspace Key has been
+	assigned.
+
+Configuration
+-------------
+
+CONFIG_X86_INTEL_MKTME
+        MKTME is enabled by selecting CONFIG_X86_INTEL_MKTME on Intel
+        platforms supporting the MKTME feature.
+
+mktme_savekeys
+        mktme_savekeys is a kernel cmdline parameter.
+
+        This parameter allows the kernel to save the user specified
+        MKTME key payload. Saving this payload means that the MKTME
+        Key Service can always allow the addition of new physical
+        packages. If the mktme_savekeys parameter is not present,
+        user key data will not be saved, and new physical packages
+        may only be added to the system if no user type MKTME keys
+        are in use.
-- 
2.14.1



* [RFC v2 02/13] mm: Generalize the mprotect implementation to support extensions
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
  2018-12-04  7:39 ` [RFC v2 01/13] x86/mktme: Document the MKTME APIs Alison Schofield
@ 2018-12-04  7:39 ` Alison Schofield
  2018-12-06  8:08   ` Sakkinen, Jarkko
  2018-12-04  7:39 ` [RFC v2 03/13] syscall/x86: Wire up a new system call for memory encryption keys Alison Schofield
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

Today mprotect is implemented to support legacy mprotect behavior
plus an extension for memory protection keys. Make it more generic
so that it can support additional extensions in the future.

This is done in preparation for adding a new system call for memory
encryption keys. The intent is that the new encrypted mprotect will be
another extension to legacy mprotect.

Change-Id: Ib09b9d1b605b12d0254d7fb4968dfcc8e3c79dd7
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/mprotect.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index df408956dccc..b57075e278fb 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -35,6 +35,8 @@
 
 #include "internal.h"
 
+#define NO_KEY	-1
+
 static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long addr, unsigned long end, pgprot_t newprot,
 		int dirty_accountable, int prot_numa)
@@ -451,9 +453,9 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 }
 
 /*
- * pkey==-1 when doing a legacy mprotect()
+ * When pkey==NO_KEY we get legacy mprotect behavior here.
  */
-static int do_mprotect_pkey(unsigned long start, size_t len,
+static int do_mprotect_ext(unsigned long start, size_t len,
 		unsigned long prot, int pkey)
 {
 	unsigned long nstart, end, tmp, reqprot;
@@ -577,7 +579,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot)
 {
-	return do_mprotect_pkey(start, len, prot, -1);
+	return do_mprotect_ext(start, len, prot, NO_KEY);
 }
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -585,7 +587,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot, int, pkey)
 {
-	return do_mprotect_pkey(start, len, prot, pkey);
+	return do_mprotect_ext(start, len, prot, pkey);
 }
 
 SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
-- 
2.14.1



* [RFC v2 03/13] syscall/x86: Wire up a new system call for memory encryption keys
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
  2018-12-04  7:39 ` [RFC v2 01/13] x86/mktme: Document the MKTME APIs Alison Schofield
  2018-12-04  7:39 ` [RFC v2 02/13] mm: Generalize the mprotect implementation to support extensions Alison Schofield
@ 2018-12-04  7:39 ` Alison Schofield
  2018-12-04  7:39 ` [RFC v2 04/13] x86/mm: Add helper functions for MKTME " Alison Schofield
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

encrypt_mprotect() is a new system call to support memory encryption.

It takes the same parameters as legacy mprotect, plus an additional
key serial number that is mapped to an encryption keyid.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 include/linux/syscalls.h               | 2 ++
 include/uapi/asm-generic/unistd.h      | 4 +++-
 kernel/sys_ni.c                        | 2 ++
 5 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 3cf7b533b3d1..f41ad857d5c6 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -398,3 +398,4 @@
 384	i386	arch_prctl		sys_arch_prctl			__ia32_compat_sys_arch_prctl
 385	i386	io_pgetevents		sys_io_pgetevents		__ia32_compat_sys_io_pgetevents
 386	i386	rseq			sys_rseq			__ia32_sys_rseq
+387	i386	encrypt_mprotect	sys_encrypt_mprotect		__ia32_sys_encrypt_mprotect
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index f0b1709a5ffb..cf2decfa6119 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -343,6 +343,7 @@
 332	common	statx			__x64_sys_statx
 333	common	io_pgetevents		__x64_sys_io_pgetevents
 334	common	rseq			__x64_sys_rseq
+335	common	encrypt_mprotect	__x64_sys_encrypt_mprotect
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 2ac3d13a915b..c728b47e9004 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -907,6 +907,8 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
 asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
 			 int flags, uint32_t sig);
+asmlinkage long sys_encrypt_mprotect(unsigned long start, size_t len,
+				     unsigned long prot, key_serial_t serial);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 538546edbfbd..696c222ebe40 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -738,9 +738,11 @@ __SYSCALL(__NR_statx,     sys_statx)
 __SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
 #define __NR_rseq 293
 __SYSCALL(__NR_rseq, sys_rseq)
+#define __NR_encrypt_mprotect 294
+__SYSCALL(__NR_encrypt_mprotect, sys_encrypt_mprotect)
 
 #undef __NR_syscalls
-#define __NR_syscalls 294
+#define __NR_syscalls 295
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index df556175be50..1b48f709c265 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -336,6 +336,8 @@ COND_SYSCALL(pkey_mprotect);
 COND_SYSCALL(pkey_alloc);
 COND_SYSCALL(pkey_free);
 
+/* multi-key total memory encryption keys */
+COND_SYSCALL(encrypt_mprotect);
 
 /*
  * Architecture specific weak syscall entries.
-- 
2.14.1



* [RFC v2 04/13] x86/mm: Add helper functions for MKTME memory encryption keys
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (2 preceding siblings ...)
  2018-12-04  7:39 ` [RFC v2 03/13] syscall/x86: Wire up a new system call for memory encryption keys Alison Schofield
@ 2018-12-04  7:39 ` " Alison Schofield
  2018-12-04  9:14   ` Peter Zijlstra
                     ` (2 more replies)
  2018-12-04  7:39 ` [RFC v2 05/13] x86/mm: Set KeyIDs in encrypted VMAs Alison Schofield
                   ` (11 subsequent siblings)
  15 siblings, 3 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

Define a global mapping structure to manage the mapping of userspace
Keys to hardware KeyIDs in MKTME (Multi-Key Total Memory Encryption).
Implement helper functions that access this mapping structure.

The helpers will be used by these MKTME APIs:
 >  Key Service API: security/keys/mktme_keys.c
 >  encrypt_mprotect() system call: mm/mprotect.c

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
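For illustration only, and not part of the patch: a minimal userspace
model of the Key<-->KeyID map that the mktme_map_* helpers in this patch
manage. The names struct keyid_map and map_*() are hypothetical, calloc()
stands in for kvzalloc(), and the mutex that protects the kernel map is
left out. As in the patch, slot 0 is never handed out, so 0 can double
as the "no free KeyID" and "key not mapped" return value.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model of the map; usable KeyIDs are 1..nr_keyids. */
struct keyid_map {
	unsigned int nr_keyids;		/* number of usable slots */
	unsigned int mapped_keyids;	/* number of slots currently in use */
	void *key[];			/* key[i] is the owner of KeyID i */
};

/* Allocate a zeroed map with room for slots 0..nr_keyids. */
static struct keyid_map *map_alloc(unsigned int nr_keyids)
{
	struct keyid_map *m;

	m = calloc(1, sizeof(*m) + sizeof(m->key[0]) * (nr_keyids + 1));
	if (m)
		m->nr_keyids = nr_keyids;
	return m;
}

/* Return the lowest free KeyID, or 0 when every slot is taken. */
static int map_get_free_keyid(const struct keyid_map *m)
{
	unsigned int i;

	for (i = 1; i <= m->nr_keyids; i++)
		if (!m->key[i])
			return (int)i;
	return 0;
}

/* Record key as the owner of keyid. */
static void map_set_keyid(struct keyid_map *m, int keyid, void *key)
{
	assert(keyid >= 1 && (unsigned int)keyid <= m->nr_keyids);
	m->key[keyid] = key;
	m->mapped_keyids++;
}

/* Release keyid so that it can be handed out again. */
static void map_free_keyid(struct keyid_map *m, int keyid)
{
	assert(keyid >= 1 && (unsigned int)keyid <= m->nr_keyids);
	m->key[keyid] = NULL;
	m->mapped_keyids--;
}

/* Reverse lookup: which KeyID belongs to key? 0 means "not mapped". */
static int map_keyid_from_key(const struct keyid_map *m, const void *key)
{
	unsigned int i;

	for (i = 1; i <= m->nr_keyids; i++)
		if (m->key[i] == key)
			return (int)i;
	return 0;
}
```

The reverse lookup is a linear scan, which seems reasonable as long as
the number of hardware KeyIDs stays small.
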
 arch/x86/include/asm/mktme.h | 12 ++++++
 arch/x86/mm/mktme.c          | 91 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index f05baa15e6f6..dbb49909d665 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -12,6 +12,18 @@ extern phys_addr_t mktme_keyid_mask;
 extern int mktme_nr_keyids;
 extern int mktme_keyid_shift;
 
+/* Manage mappings between hardware KeyIDs and userspace Keys */
+extern int mktme_map_alloc(void);
+extern void mktme_map_free(void);
+extern void mktme_map_lock(void);
+extern void mktme_map_unlock(void);
+extern int mktme_map_mapped_keyids(void);
+extern void mktme_map_set_keyid(int keyid, void *key);
+extern void mktme_map_free_keyid(int keyid);
+extern int mktme_map_keyid_from_key(void *key);
+extern void *mktme_map_key_from_keyid(int keyid);
+extern int mktme_map_get_free_keyid(void);
+
 DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
 static inline bool mktme_enabled(void)
 {
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index c81727540e7c..34224d4e3f45 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -40,6 +40,97 @@ int __vma_keyid(struct vm_area_struct *vma)
 	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
 }
 
+/*
+ * struct mktme_mapping and the mktme_map_* functions manage the mapping
+ * of userspace Keys to hardware KeyIDs. These are used by the MKTME Key
+ * Service API and the encrypt_mprotect() system call.
+ */
+
+struct mktme_mapping {
+	struct mutex	lock;		/* protect this map & HW state */
+	unsigned int	mapped_keyids;
+	void		*key[];
+};
+
+struct mktme_mapping *mktme_map;
+
+static inline long mktme_map_size(void)
+{
+	long size = 0;
+
+	size += sizeof(*mktme_map);
+	size += sizeof(mktme_map->key[0]) * (mktme_nr_keyids + 1);
+	return size;
+}
+
+int mktme_map_alloc(void)
+{
+	mktme_map = kvzalloc(mktme_map_size(), GFP_KERNEL);
+	if (!mktme_map)
+		return 0;
+	mutex_init(&mktme_map->lock);
+	return 1;
+}
+
+void mktme_map_free(void)
+{
+	kvfree(mktme_map);
+}
+
+void mktme_map_lock(void)
+{
+	mutex_lock(&mktme_map->lock);
+}
+
+void mktme_map_unlock(void)
+{
+	mutex_unlock(&mktme_map->lock);
+}
+
+int mktme_map_mapped_keyids(void)
+{
+	return mktme_map->mapped_keyids;
+}
+
+void mktme_map_set_keyid(int keyid, void *key)
+{
+	mktme_map->key[keyid] = key;
+	mktme_map->mapped_keyids++;
+}
+
+void mktme_map_free_keyid(int keyid)
+{
+	mktme_map->key[keyid] = 0;
+	mktme_map->mapped_keyids--;
+}
+
+int mktme_map_keyid_from_key(void *key)
+{
+	int i;
+
+	for (i = 1; i <= mktme_nr_keyids; i++)
+		if (mktme_map->key[i] == key)
+			return i;
+	return 0;
+}
+
+void *mktme_map_key_from_keyid(int keyid)
+{
+	return mktme_map->key[keyid];
+}
+
+int mktme_map_get_free_keyid(void)
+{
+	int i;
+
+	if (mktme_map->mapped_keyids < mktme_nr_keyids) {
+		for (i = 1; i <= mktme_nr_keyids; i++)
+			if (mktme_map->key[i] == 0)
+				return i;
+	}
+	return 0;
+}
+
 /* Prepare page to be used for encryption. Called from page allocator. */
 void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
 {
-- 
2.14.1



* [RFC v2 05/13] x86/mm: Set KeyIDs in encrypted VMAs
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (3 preceding siblings ...)
  2018-12-04  7:39 ` [RFC v2 04/13] x86/mm: Add helper functions for MKTME " Alison Schofield
@ 2018-12-04  7:39 ` Alison Schofield
  2018-12-06  8:37   ` Sakkinen, Jarkko
  2018-12-04  7:39 ` [RFC v2 06/13] mm: Add the encrypt_mprotect() system call Alison Schofield
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

MKTME architecture requires the KeyID to be placed in PTE bits 51:46.
To create an encrypted VMA, place the KeyID in the upper bits of
vm_page_prot that match the position of those PTE bits.

When the VMA is assigned a KeyID it is always considered a KeyID
change. The VMA is either going from not encrypted to encrypted,
or from encrypted with any KeyID to encrypted with any other KeyID.
To make the change safely, remove the user pages held by the VMA
and unlink the VMA's anonymous chain.

Change-Id: I676056525c49c8803898315a10b196ef5a5c5415
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
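For illustration only, and not part of the patch: a minimal sketch of
the mask-and-shift step that mprotect_set_encrypt() performs on
vm_page_prot. The constants KEYID_SHIFT and KEYID_BITS are assumptions
taken from the "PTE bits 51:46" statement in the changelog above; the
kernel uses the runtime values mktme_keyid_shift and mktme_keyid_mask
instead. The helper names prot_set_keyid() and prot_get_keyid() are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the changelog: the KeyID occupies PTE bits 51:46. */
#define KEYID_SHIFT	46
#define KEYID_BITS	6
#define KEYID_MASK	(((UINT64_C(1) << KEYID_BITS) - 1) << KEYID_SHIFT)

/* Return prot with its KeyID field replaced by keyid. */
static uint64_t prot_set_keyid(uint64_t prot, unsigned int keyid)
{
	assert(keyid < (1u << KEYID_BITS));
	prot &= ~KEYID_MASK;			/* drop the old KeyID */
	prot |= (uint64_t)keyid << KEYID_SHIFT;	/* install the new one */
	return prot;
}

/* Extract the KeyID field from prot. */
static unsigned int prot_get_keyid(uint64_t prot)
{
	return (unsigned int)((prot & KEYID_MASK) >> KEYID_SHIFT);
}
```

Clearing the field before OR-ing in the new value matters when the VMA
already carries a KeyID; the remaining protection bits are left as they
were.
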
 arch/x86/include/asm/mktme.h |  4 ++++
 arch/x86/mm/mktme.c          | 26 ++++++++++++++++++++++++++
 include/linux/mm.h           |  6 ++++++
 3 files changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index dbb49909d665..de3e529f3ab0 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -24,6 +24,10 @@ extern int mktme_map_keyid_from_key(void *key);
 extern void *mktme_map_key_from_keyid(int keyid);
 extern int mktme_map_get_free_keyid(void);
 
+/* Set the encryption keyid bits in a VMA */
+extern void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
+				unsigned long start, unsigned long end);
+
 DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
 static inline bool mktme_enabled(void)
 {
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 34224d4e3f45..e3fdf7b48173 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,5 +1,6 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
+#include <linux/rmap.h>
 #include <asm/mktme.h>
 #include <asm/set_memory.h>
 
@@ -131,6 +132,31 @@ int mktme_map_get_free_keyid(void)
 	return 0;
 }
 
+/* Set the encryption keyid bits in a VMA */
+void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
+			  unsigned long start, unsigned long end)
+{
+	int oldkeyid = vma_keyid(vma);
+	pgprotval_t newprot;
+
+	/* Unmap pages with old KeyID if there's any. */
+	zap_page_range(vma, start, end - start);
+
+	if (oldkeyid == newkeyid)
+		return;
+
+	newprot = pgprot_val(vma->vm_page_prot);
+	newprot &= ~mktme_keyid_mask;
+	newprot |= (unsigned long)newkeyid << mktme_keyid_shift;
+	vma->vm_page_prot = __pgprot(newprot);
+
+	/*
+	 * The VMA doesn't have any inherited pages.
+	 * Start anon VMA tree from scratch.
+	 */
+	unlink_anon_vmas(vma);
+}
+
 /* Prepare page to be used for encryption. Called from page allocator. */
 void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1309761bb6d0..e2d87e92ca74 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2806,5 +2806,11 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+#ifndef CONFIG_X86_INTEL_MKTME
+static inline void mprotect_set_encrypt(struct vm_area_struct *vma,
+					int newkeyid,
+					unsigned long start,
+					unsigned long end) {}
+#endif /* CONFIG_X86_INTEL_MKTME */
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
-- 
2.14.1



* [RFC v2 06/13] mm: Add the encrypt_mprotect() system call
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (4 preceding siblings ...)
  2018-12-04  7:39 ` [RFC v2 05/13] x86/mm: Set KeyIDs in encrypted VMAs Alison Schofield
@ 2018-12-04  7:39 ` Alison Schofield
  2018-12-06  8:38   ` Sakkinen, Jarkko
  2018-12-04  7:39 ` [RFC v2 07/13] x86/mm: Add helpers for reference counting encrypted VMAs Alison Schofield
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

Implement memory encryption with a new system call that is an
extension of the legacy mprotect() system call.

In encrypt_mprotect(), the caller must pass a handle to a previously
allocated and programmed encryption key. Validate the key and store
the keyid bits in the vm_page_prot for each VMA in the protection
range.
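
The order of checks in the new syscall can be modelled in userspace as
below. This is only a sketch: the key table, the serial numbers, and all
model_* names are invented for the example, and the -ENOKEY result for an
unknown serial is an assumption about what the key lookup returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ENOKEY
#define ENOKEY 126	/* fallback for systems without the Linux errno */
#endif

#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical key table: serial number -> hardware KeyID (0 = unmapped). */
struct model_key {
	int serial;
	int keyid;
};

static const struct model_key model_keys[] = {
	{ 1001, 1 },	/* key that has been programmed into KeyID 1 */
	{ 1002, 0 },	/* key that exists but has no KeyID */
};

static int model_last_keyid;	/* records what the "VMA update" was given */

/* Stand-in for applying the KeyID to every VMA in the range. */
static int model_apply(unsigned long start, size_t len, int keyid)
{
	(void)start;
	(void)len;
	assert(keyid > 0);	/* only reached with a valid KeyID */
	model_last_keyid = keyid;
	return 0;
}

static int model_encrypt_mprotect(unsigned long start, size_t len, int serial)
{
	const struct model_key *key = NULL;
	size_t i;

	if (len % MODEL_PAGE_SIZE)		/* 1) length must be page aligned */
		return -EINVAL;

	for (i = 0; i < sizeof(model_keys) / sizeof(model_keys[0]); i++)
		if (model_keys[i].serial == serial)
			key = &model_keys[i];
	if (!key)				/* 2) the key must exist */
		return -ENOKEY;
	if (!key->keyid)			/* 3) and must map to a KeyID */
		return -EINVAL;

	return model_apply(start, len, key->keyid);	/* 4) update the VMAs */
}
```

For example, model_encrypt_mprotect(0x1000, 4096, 1001) succeeds and hands
KeyID 1 to the update step, while serial 1002 is rejected before any VMA
is touched.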

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/exec.c           |  4 ++--
 include/linux/key.h |  2 ++
 include/linux/mm.h  |  3 ++-
 mm/mprotect.c       | 63 +++++++++++++++++++++++++++++++++++++++++++++++------
 4 files changed, 62 insertions(+), 10 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index fc281b738a98..a0946b23e2c5 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -752,8 +752,8 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	vm_flags |= mm->def_flags;
 	vm_flags |= VM_STACK_INCOMPLETE_SETUP;
 
-	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
-			vm_flags);
+	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, vm_flags,
+			     -1);
 	if (ret)
 		goto out_unlock;
 	BUG_ON(prev != vma);
diff --git a/include/linux/key.h b/include/linux/key.h
index e58ee10f6e58..fb8a7d5f6149 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -346,6 +346,8 @@ static inline key_serial_t key_serial(const struct key *key)
 
 extern void key_set_timeout(struct key *, unsigned);
 
+extern key_ref_t lookup_user_key(key_serial_t id, unsigned long lflags,
+				 key_perm_t perm);
 /*
  * The permissions required on a key that we're looking up.
  */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e2d87e92ca74..09182d78e7b7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1607,7 +1607,8 @@ extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long
 			      int dirty_accountable, int prot_numa);
 extern int mprotect_fixup(struct vm_area_struct *vma,
 			  struct vm_area_struct **pprev, unsigned long start,
-			  unsigned long end, unsigned long newflags);
+			  unsigned long end, unsigned long newflags,
+			  int newkeyid);
 
 /*
  * doesn't attempt to fault and will return short.
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b57075e278fb..ad8127dc9aac 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -28,6 +28,7 @@
 #include <linux/ksm.h>
 #include <linux/uaccess.h>
 #include <linux/mm_inline.h>
+#include <linux/key.h>
 #include <asm/pgtable.h>
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
@@ -346,7 +347,8 @@ static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
 
 int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
-	unsigned long start, unsigned long end, unsigned long newflags)
+	       unsigned long start, unsigned long end, unsigned long newflags,
+	       int newkeyid)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long oldflags = vma->vm_flags;
@@ -356,7 +358,14 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	int error;
 	int dirty_accountable = 0;
 
-	if (newflags == oldflags) {
+	/*
+	 * Flags match and Keyids match or we have NO_KEY.
+	 * This _fixup is usually called from do_mprotect_ext() except
+	 * for one special case: caller fs/exec.c/setup_arg_pages()
+	 * In that case, newkeyid is passed as -1 (NO_KEY).
+	 */
+	if (newflags == oldflags &&
+	    (newkeyid == vma_keyid(vma) || newkeyid == NO_KEY)) {
 		*pprev = vma;
 		return 0;
 	}
@@ -422,6 +431,8 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	}
 
 success:
+	if (newkeyid != NO_KEY)
+		mprotect_set_encrypt(vma, newkeyid, start, end);
 	/*
 	 * vm_flags and vm_page_prot are protected by the mmap_sem
 	 * held in write mode.
@@ -453,10 +464,15 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 }
 
 /*
- * When pkey==NO_KEY we get legacy mprotect behavior here.
+ * do_mprotect_ext() supports the legacy mprotect behavior plus extensions
+ * for Protection Keys and Memory Encryption Keys. These extensions are
+ * mutually exclusive and the behavior is:
+ *	(pkey==NO_KEY && keyid==NO_KEY) ==> legacy mprotect
+ *	(pkey is valid)  ==> legacy mprotect plus Protection Key extensions
+ *	(keyid is valid) ==> legacy mprotect plus Encryption Key extensions
  */
 static int do_mprotect_ext(unsigned long start, size_t len,
-		unsigned long prot, int pkey)
+			   unsigned long prot, int pkey, int keyid)
 {
 	unsigned long nstart, end, tmp, reqprot;
 	struct vm_area_struct *vma, *prev;
@@ -554,7 +570,8 @@ static int do_mprotect_ext(unsigned long start, size_t len,
 		tmp = vma->vm_end;
 		if (tmp > end)
 			tmp = end;
-		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
+		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags,
+				       keyid);
 		if (error)
 			goto out;
 		nstart = tmp;
@@ -579,7 +596,7 @@ static int do_mprotect_ext(unsigned long start, size_t len,
 SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot)
 {
-	return do_mprotect_ext(start, len, prot, NO_KEY);
+	return do_mprotect_ext(start, len, prot, NO_KEY, NO_KEY);
 }
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -587,7 +604,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot, int, pkey)
 {
-	return do_mprotect_ext(start, len, prot, pkey);
+	return do_mprotect_ext(start, len, prot, pkey, NO_KEY);
 }
 
 SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
@@ -636,3 +653,35 @@ SYSCALL_DEFINE1(pkey_free, int, pkey)
 }
 
 #endif /* CONFIG_ARCH_HAS_PKEYS */
+
+#ifdef CONFIG_X86_INTEL_MKTME
+
+SYSCALL_DEFINE4(encrypt_mprotect, unsigned long, start, size_t, len,
+		unsigned long, prot, key_serial_t, serial)
+{
+	key_ref_t key_ref;
+	struct key *key;
+	int ret, keyid;
+
+	if (!PAGE_ALIGNED(len))
+		return -EINVAL;
+
+	key_ref = lookup_user_key(serial, 0, KEY_NEED_VIEW);
+	if (IS_ERR(key_ref))
+		return PTR_ERR(key_ref);
+
+	key = key_ref_to_ptr(key_ref);
+	mktme_map_lock();
+	keyid = mktme_map_keyid_from_key(key);
+	if (!keyid) {
+		mktme_map_unlock();
+		key_ref_put(key_ref);
+		return -EINVAL;
+	}
+	ret = do_mprotect_ext(start, len, prot, NO_KEY, keyid);
+	mktme_map_unlock();
+	key_ref_put(key_ref);
+	return ret;
+}
+
+#endif /* CONFIG_X86_INTEL_MKTME */
-- 
2.14.1



* [RFC v2 07/13] x86/mm: Add helpers for reference counting encrypted VMAs
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (5 preceding siblings ...)
  2018-12-04  7:39 ` [RFC v2 06/13] mm: Add the encrypt_mprotect() system call Alison Schofield
@ 2018-12-04  7:39 ` Alison Schofield
  2018-12-04  8:58   ` Peter Zijlstra
  2018-12-04  7:39 ` [RFC v2 08/13] mm: Use reference counting for " Alison Schofield
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

In order to safely manage the usage of memory encryption keys, VMAs
using each KeyID need to be counted. This count allows the MKTME
(Multi-Key Total Memory Encryption) Key Service to know when the KeyID
resource is actually in use, or when it is idle and may be considered
for reuse.

Define a global refcount_t array and provide helper functions to
manipulate the counts.
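
A minimal userspace model of such a per-KeyID count array follows. It uses
plain unsigned counters rather than refcount_t, and it sizes the array as
nr_keyids + 1 so that it can be indexed directly by KeyIDs 1..nr_keyids;
both choices are assumptions of the sketch, as are all the keyid_* names.

```c
#include <assert.h>
#include <stdlib.h>

struct keyid_refs {
	unsigned int *count;	/* count[keyid] = current users of that KeyID */
	int nr_keyids;		/* valid KeyIDs are 1..nr_keyids */
};

/* Allocate zeroed counters; returns 0 on success, -1 on failure. */
static int keyid_refs_alloc(struct keyid_refs *r, int nr_keyids)
{
	r->count = calloc((size_t)nr_keyids + 1, sizeof(*r->count));
	if (!r->count)
		return -1;
	r->nr_keyids = nr_keyids;
	return 0;
}

static void keyid_refs_free(struct keyid_refs *r)
{
	free(r->count);
	r->count = NULL;
}

static unsigned int keyid_read(const struct keyid_refs *r, int keyid)
{
	assert(keyid > 0 && keyid <= r->nr_keyids);
	return r->count[keyid];
}

static void keyid_get(struct keyid_refs *r, int keyid)
{
	assert(keyid > 0 && keyid <= r->nr_keyids);
	r->count[keyid]++;
}

/* Drop one reference; returns 1 when the KeyID became idle, else 0. */
static int keyid_put(struct keyid_refs *r, int keyid)
{
	assert(keyid > 0 && keyid <= r->nr_keyids);
	assert(r->count[keyid] > 0);	/* unbalanced put is a bug */
	return --r->count[keyid] == 0;
}
```

The return value of keyid_put() is the point where a caller would release
the KeyID for reuse, which corresponds to the "idle" state described above.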

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h |  9 +++++++
 arch/x86/mm/mktme.c          | 58 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h           |  2 ++
 3 files changed, 69 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index de3e529f3ab0..22d52635562c 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -28,6 +28,15 @@ extern int mktme_map_get_free_keyid(void);
 extern void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
 				unsigned long start, unsigned long end);
 
+/* Manage the MKTME encrypt_count references */
+extern int mktme_alloc_encrypt_array(void);
+extern void mktme_free_encrypt_array(void);
+extern int mktme_read_encrypt_ref(int keyid);
+extern void vma_get_encrypt_ref(struct vm_area_struct *vma);
+extern void vma_put_encrypt_ref(struct vm_area_struct *vma);
+extern void key_get_encrypt_ref(int keyid);
+extern void key_put_encrypt_ref(int keyid);
+
 DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
 static inline bool mktme_enabled(void)
 {
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index e3fdf7b48173..facf08f9cb74 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -157,6 +157,64 @@ void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
 	unlink_anon_vmas(vma);
 }
 
+/*
+ *  Helper functions manage the encrypt_count[] array that counts
+ *  references on each MKTME hardware keyid. The gets & puts are
+ *  used in core mm code that allocates and frees VMAs. The alloc,
+ *  free, and read functions are used by the MKTME key service to
+ *  manage key allocation and programming.
+ */
+refcount_t *encrypt_count;
+
+int mktme_alloc_encrypt_array(void)
+{
+	encrypt_count = kvcalloc(mktme_nr_keyids, sizeof(refcount_t),
+				 GFP_KERNEL);
+	if (!encrypt_count)
+		return -ENOMEM;
+	return 0;
+}
+
+void mktme_free_encrypt_array(void)
+{
+	kvfree(encrypt_count);
+}
+
+int mktme_read_encrypt_ref(int keyid)
+{
+	return refcount_read(&encrypt_count[keyid]);
+}
+
+void vma_get_encrypt_ref(struct vm_area_struct *vma)
+{
+	if (vma_keyid(vma))
+		refcount_inc(&encrypt_count[vma_keyid(vma)]);
+}
+
+void vma_put_encrypt_ref(struct vm_area_struct *vma)
+{
+	if (vma_keyid(vma))
+		if (refcount_dec_and_test(&encrypt_count[vma_keyid(vma)])) {
+			mktme_map_lock();
+			mktme_map_free_keyid(vma_keyid(vma));
+			mktme_map_unlock();
+		}
+}
+
+void key_get_encrypt_ref(int keyid)
+{
+	refcount_inc(&encrypt_count[keyid]);
+}
+
+void key_put_encrypt_ref(int keyid)
+{
+	if (refcount_dec_and_test(&encrypt_count[keyid])) {
+		mktme_map_lock();
+		mktme_map_free_keyid(keyid);
+		mktme_map_unlock();
+	}
+}
+
 /* Prepare page to be used for encryption. Called from page allocator. */
 void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 09182d78e7b7..453d675dd116 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2812,6 +2812,8 @@ static inline void mprotect_set_encrypt(struct vm_area_struct *vma,
 					int newkeyid,
 					unsigned long start,
 					unsigned long end) {}
+static inline void vma_get_encrypt_ref(struct vm_area_struct *vma) {}
+static inline void vma_put_encrypt_ref(struct vm_area_struct *vma) {}
 #endif /* CONFIG_X86_INTEL_MKTME */
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
-- 
2.14.1



* [RFC v2 08/13] mm: Use reference counting for encrypted VMAs
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (6 preceding siblings ...)
  2018-12-04  7:39 ` [RFC v2 07/13] x86/mm: Add helpers for reference counting encrypted VMAs Alison Schofield
@ 2018-12-04  7:39 ` " Alison Schofield
  2018-12-04  7:39 ` [RFC v2 09/13] mm: Restrict memory encryption to anonymous VMA's Alison Schofield
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

The MKTME (Multi-Key Total Memory Encryption) Key Service needs
a reference count on encrypted VMAs. This reference count is used
to determine when a hardware encryption keyid is in use, which,
in turn, tells the key service what operations can be safely
performed with this keyid.

The approach is:
1) Increment/decrement the reference count during encrypt_mprotect()
system call for initial or updated encryption on a VMA.

2) Piggyback on the new vm_area_dup/free() helpers. If the VMAs being
duplicated or freed are encrypted, adjust the reference count.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/mm/mktme.c | 2 ++
 kernel/fork.c       | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index facf08f9cb74..55d34beb9b81 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -145,10 +145,12 @@ void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
 	if (oldkeyid == newkeyid)
 		return;
 
+	vma_put_encrypt_ref(vma);
 	newprot = pgprot_val(vma->vm_page_prot);
 	newprot &= ~mktme_keyid_mask;
 	newprot |= (unsigned long)newkeyid << mktme_keyid_shift;
 	vma->vm_page_prot = __pgprot(newprot);
+	vma_get_encrypt_ref(vma);
 
 	/*
 	 * The VMA doesn't have any inherited pages.
diff --git a/kernel/fork.c b/kernel/fork.c
index 07cddff89c7b..d12d27b50966 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -341,12 +341,14 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	if (new) {
 		*new = *orig;
 		INIT_LIST_HEAD(&new->anon_vma_chain);
+		vma_get_encrypt_ref(new);
 	}
 	return new;
 }
 
 void vm_area_free(struct vm_area_struct *vma)
 {
+	vma_put_encrypt_ref(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
-- 
2.14.1



* [RFC v2 09/13] mm: Restrict memory encryption to anonymous VMA's
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (7 preceding siblings ...)
  2018-12-04  7:39 ` [RFC v2 08/13] mm: Use reference counting for " Alison Schofield
@ 2018-12-04  7:39 ` Alison Schofield
  2018-12-04  9:10   ` Peter Zijlstra
  2018-12-04  7:39 ` [RFC v2 10/13] keys/mktme: Add the MKTME Key Service type for memory encryption Alison Schofield
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

Memory encryption is only supported for mappings that are ANONYMOUS.
Test the entire range of VMAs in an encrypt_mprotect() request to
make sure they all meet that requirement before encrypting any.

The encrypt_mprotect syscall will return -EINVAL and will not encrypt
any VMAs if this check fails.
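
The check-everything-first approach can be sketched in userspace as below.
struct model_vma and range_supports_encryption() are stand-ins invented
for this example; the sketch assumes, as the kernel caller guarantees,
that the first VMA passed in overlaps the requested range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a VMA: [start, end), sorted by address. */
struct model_vma {
	unsigned long start;
	unsigned long end;
	bool anonymous;
	struct model_vma *next;
};

/*
 * Return true only if every VMA from 'vma' up to address 'end' is
 * anonymous. Nothing is modified, so a failure leaves all VMAs as
 * they were.
 */
static bool range_supports_encryption(const struct model_vma *vma,
				      unsigned long end)
{
	assert(vma != NULL);	/* caller passes the first VMA in the range */

	do {
		if (!vma->anonymous)
			return false;
		vma = vma->next;
	} while (vma && vma->start < end);

	return true;
}
```

Running this pass before any VMA is changed is what makes the request
all-or-nothing: a file-backed VMA in the middle of the range fails the
call without leaving the earlier VMAs half converted.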

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/mprotect.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index ad8127dc9aac..f1c009409134 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -345,6 +345,24 @@ static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
 	return walk_page_range(start, end, &prot_none_walk);
 }
 
+/*
+ * Encrypted mprotect is only supported on anonymous mappings.
+ * All VMAs in the requested range must be anonymous. If this
+ * test fails on any single VMA, the entire mprotect request fails.
+ */
+bool mem_supports_encryption(struct vm_area_struct *vma, unsigned long end)
+{
+	struct vm_area_struct *test_vma = vma;
+
+	do {
+		if (!vma_is_anonymous(test_vma))
+			return false;
+
+		test_vma = test_vma->vm_next;
+	} while (test_vma && test_vma->vm_start < end);
+	return true;
+}
+
 int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	       unsigned long start, unsigned long end, unsigned long newflags,
@@ -531,6 +549,12 @@ static int do_mprotect_ext(unsigned long start, size_t len,
 				goto out;
 		}
 	}
+
+	if (keyid > 0 && !mem_supports_encryption(vma, end)) {
+		error = -EINVAL;
+		goto out;
+	}
+
 	if (start > vma->vm_start)
 		prev = vma;
 
-- 
2.14.1



* [RFC v2 10/13] keys/mktme: Add the MKTME Key Service type for memory encryption
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (8 preceding siblings ...)
  2018-12-04  7:39 ` [RFC v2 09/13] mm: Restrict memory encryption to anonymous VMA's Alison Schofield
@ 2018-12-04  7:39 ` Alison Schofield
  2018-12-06  8:51   ` Sakkinen, Jarkko
  2018-12-04  7:39 ` [RFC v2 11/13] keys/mktme: Program memory encryption keys on a system wide basis Alison Schofield
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

MKTME (Multi-Key Total Memory Encryption) is a technology that allows
transparent memory encryption on upcoming Intel platforms. MKTME will
support multiple encryption domains, each having its own key. The main
use case for the feature is virtual machine isolation. The API needs the
flexibility to work for a wide range of uses.

The MKTME key service type manages the addition and removal of the memory
encryption keys. It maps userspace keys to hardware KeyIDs. It programs
the hardware with the user-requested encryption options.

The only supported encryption algorithm is AES-XTS 128.
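
The option tokens this key type accepts (type=, key=, tweak=, algorithm=)
and the per-type requirements can be modelled in userspace as below. This
is a sketch, not the kernel parser: struct opts, parse_options(), and the
helpers are invented for the example, and it is stricter than hex2bin() in
that it insists on exactly 32 hex digits.

```c
#include <assert.h>
#include <string.h>

#define KEY_BYTES 16	/* 128 bits */

enum opt { OPT_TYPE, OPT_KEY, OPT_TWEAK, OPT_ALGORITHM, OPT_COUNT };

static const char *const opt_prefix[OPT_COUNT] = {
	"type=", "key=", "tweak=", "algorithm=",
};

struct opts {
	char type[16];
	unsigned char key[KEY_BYTES];
	unsigned char tweak[KEY_BYTES];
	unsigned int seen;	/* bitmask of options already parsed */
};

static int hex_val(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Decode exactly 2*n hex digits; returns 0 on success, -1 on error. */
static int hex_decode(const char *s, unsigned char *out, size_t n)
{
	size_t i;

	if (strlen(s) != 2 * n)
		return -1;
	for (i = 0; i < n; i++) {
		int hi = hex_val(s[2 * i]), lo = hex_val(s[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (unsigned char)(hi << 4 | lo);
	}
	return 0;
}

/* Parse space/tab separated name=value tokens. 'buf' is modified. */
static int parse_options(char *buf, struct opts *o)
{
	const unsigned int need_alg = 1u << OPT_ALGORITHM;
	const unsigned int need_keys = 1u << OPT_KEY | 1u << OPT_TWEAK;
	char *tok;

	assert(buf && o);
	memset(o, 0, sizeof(*o));

	for (tok = strtok(buf, " \t"); tok; tok = strtok(NULL, " \t")) {
		const char *val = NULL;
		int id;

		for (id = 0; id < OPT_COUNT; id++) {
			size_t plen = strlen(opt_prefix[id]);

			if (strncmp(tok, opt_prefix[id], plen) == 0) {
				val = tok + plen;
				break;
			}
		}
		if (!val || (o->seen & 1u << id))
			return -1;	/* unknown or repeated option */
		o->seen |= 1u << id;

		if (id == OPT_TYPE) {
			if (strlen(val) >= sizeof(o->type))
				return -1;
			strcpy(o->type, val);
		} else if (id == OPT_KEY) {
			if (hex_decode(val, o->key, KEY_BYTES))
				return -1;
		} else if (id == OPT_TWEAK) {
			if (hex_decode(val, o->tweak, KEY_BYTES))
				return -1;
		} else if (strcmp(val, "aes-xts-128") != 0) {
			return -1;	/* only one algorithm is known */
		}
	}

	/* Requirements per key type, following mktme_check_options(). */
	if (strcmp(o->type, "user") == 0)
		return (o->seen & (need_alg | need_keys)) ==
		       (need_alg | need_keys) ? 0 : -1;
	if (strcmp(o->type, "cpu") == 0)
		return (o->seen & need_alg) ? 0 : -1;
	if (strcmp(o->type, "clear") == 0 || strcmp(o->type, "no-encrypt") == 0)
		return 0;
	return -1;	/* missing or unknown type */
}
```

In this model a "user" key needs algorithm, key, and tweak; a "cpu" key
needs only the algorithm; "clear" and "no-encrypt" need nothing beyond the
type itself.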

The MKTME key service is half of the MKTME API level solution. It pairs
with a new memory encryption system call: encrypt_mprotect() that uses
the keys to encrypt memory.

See Documentation/x86/mktme/mktme.rst

Change-Id: Idda4af2beabb739c77719897affff183ee9fa716
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/Kconfig           |   1 +
 include/keys/mktme-type.h  |  41 ++++++
 security/keys/Kconfig      |  11 ++
 security/keys/Makefile     |   1 +
 security/keys/mktme_keys.c | 339 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 393 insertions(+)
 create mode 100644 include/keys/mktme-type.h
 create mode 100644 security/keys/mktme_keys.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7ac78e2856c7..c2e3bb5af077 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1531,6 +1531,7 @@ config X86_INTEL_MKTME
 	bool "Intel Multi-Key Total Memory Encryption"
 	select DYNAMIC_PHYSICAL_MASK
 	select PAGE_EXTENSION
+	select MKTME_KEYS
 	depends on X86_64 && CPU_SUP_INTEL
 	---help---
 	  Say yes to enable support for Multi-Key Total Memory Encryption.
diff --git a/include/keys/mktme-type.h b/include/keys/mktme-type.h
new file mode 100644
index 000000000000..c63c6568087f
--- /dev/null
+++ b/include/keys/mktme-type.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/* Key service for Multi-Key Total Memory Encryption */
+
+#ifndef _KEYS_MKTME_TYPE_H
+#define _KEYS_MKTME_TYPE_H
+
+#include <linux/key.h>
+
+/*
+ * The AES-XTS 128 encryption algorithm requires 128 bits for each
+ * user supplied data key and tweak key.
+ */
+#define MKTME_AES_XTS_SIZE	16	/* 16 bytes, 128 bits */
+
+enum mktme_alg {
+	MKTME_ALG_AES_XTS_128,
+};
+
+const char *const mktme_alg_names[] = {
+	[MKTME_ALG_AES_XTS_128]	= "aes-xts-128",
+};
+
+enum mktme_type {
+	MKTME_TYPE_ERROR = -1,
+	MKTME_TYPE_USER,
+	MKTME_TYPE_CPU,
+	MKTME_TYPE_CLEAR,
+	MKTME_TYPE_NO_ENCRYPT,
+};
+
+const char *const mktme_type_names[] = {
+	[MKTME_TYPE_USER]	= "user",
+	[MKTME_TYPE_CPU]	= "cpu",
+	[MKTME_TYPE_CLEAR]	= "clear",
+	[MKTME_TYPE_NO_ENCRYPT]	= "no-encrypt",
+};
+
+extern struct key_type key_type_mktme;
+
+#endif /* _KEYS_MKTME_TYPE_H */
diff --git a/security/keys/Kconfig b/security/keys/Kconfig
index 6462e6654ccf..c36972113e67 100644
--- a/security/keys/Kconfig
+++ b/security/keys/Kconfig
@@ -101,3 +101,14 @@ config KEY_DH_OPERATIONS
 	 in the kernel.
 
 	 If you are unsure as to whether this is required, answer N.
+
+config MKTME_KEYS
+	bool "Multi-Key Total Memory Encryption Keys"
+	depends on KEYS && X86_INTEL_MKTME
+	help
+	  This option provides support for Multi-Key Total Memory
+	  Encryption (MKTME) on Intel platforms offering the feature.
+	  MKTME allows userspace to manage the hardware encryption
+	  keys through the kernel key services.
+
+	  If you are unsure as to whether this is required, answer N.
diff --git a/security/keys/Makefile b/security/keys/Makefile
index 9cef54064f60..94c84f10a857 100644
--- a/security/keys/Makefile
+++ b/security/keys/Makefile
@@ -30,3 +30,4 @@ obj-$(CONFIG_ASYMMETRIC_KEY_TYPE) += keyctl_pkey.o
 obj-$(CONFIG_BIG_KEYS) += big_key.o
 obj-$(CONFIG_TRUSTED_KEYS) += trusted.o
 obj-$(CONFIG_ENCRYPTED_KEYS) += encrypted-keys/
+obj-$(CONFIG_MKTME_KEYS) += mktme_keys.o
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
new file mode 100644
index 000000000000..e615eb58e600
--- /dev/null
+++ b/security/keys/mktme_keys.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-3.0
+
+/* Documentation/x86/mktme/mktme_keys.rst */
+
+#include <linux/cred.h>
+#include <linux/cpu.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/key.h>
+#include <linux/key-type.h>
+#include <linux/init.h>
+#include <linux/parser.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <asm/intel_pconfig.h>
+#include <asm/mktme.h>
+#include <keys/mktme-type.h>
+#include <keys/user-type.h>
+
+#include "internal.h"
+
+struct kmem_cache *mktme_prog_cache;	/* Hardware programming cache */
+
+static const char * const mktme_program_err[] = {
+	"KeyID was successfully programmed",	/* 0 */
+	"Invalid KeyID programming command",	/* 1 */
+	"Insufficient entropy",			/* 2 */
+	"KeyID not valid",			/* 3 */
+	"Invalid encryption algorithm chosen",	/* 4 */
+	"Failure to access key table",		/* 5 */
+};
+
+enum mktme_opt_id {
+	OPT_ERROR = -1,
+	OPT_TYPE,
+	OPT_KEY,
+	OPT_TWEAK,
+	OPT_ALGORITHM,
+};
+
+static const match_table_t mktme_token = {
+	{OPT_TYPE, "type=%s"},
+	{OPT_KEY, "key=%s"},
+	{OPT_TWEAK, "tweak=%s"},
+	{OPT_ALGORITHM, "algorithm=%s"},
+	{OPT_ERROR, NULL}
+};
+
+struct mktme_payload {
+	u32		keyid_ctrl;	/* Command & Encryption Algorithm */
+	u8		data_key[MKTME_AES_XTS_SIZE];
+	u8		tweak_key[MKTME_AES_XTS_SIZE];
+};
+
+/* Key Service Method called when Key is garbage collected. */
+static void mktme_destroy_key(struct key *key)
+{
+	key_put_encrypt_ref(mktme_map_keyid_from_key(key));
+}
+
+/* Copy the payload to the HW programming structure and program this KeyID */
+static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
+{
+	struct mktme_key_program *kprog = NULL;
+	u8 kern_entropy[MKTME_AES_XTS_SIZE];
+	int i, ret;
+
+	kprog = kmem_cache_zalloc(mktme_prog_cache, GFP_KERNEL);
+	if (!kprog)
+		return -ENOMEM;
+
+	/* Hardware programming requires a cache-aligned struct */
+	kprog->keyid = keyid;
+	kprog->keyid_ctrl = payload->keyid_ctrl;
+	memcpy(kprog->key_field_1, payload->data_key, MKTME_AES_XTS_SIZE);
+	memcpy(kprog->key_field_2, payload->tweak_key, MKTME_AES_XTS_SIZE);
+
+	/* Strengthen the entropy fields for CPU generated keys */
+	if ((payload->keyid_ctrl & 0xff) == MKTME_KEYID_SET_KEY_RANDOM) {
+		get_random_bytes(&kern_entropy, sizeof(kern_entropy));
+		for (i = 0; i < (MKTME_AES_XTS_SIZE); i++) {
+			kprog->key_field_1[i] ^= kern_entropy[i];
+			kprog->key_field_2[i] ^= kern_entropy[i];
+		}
+	}
+	ret = mktme_key_program(kprog);
+	kmem_cache_free(mktme_prog_cache, kprog);
+	return ret;
+}
+
+/* Key Service Method to update an existing key. */
+static int mktme_update_key(struct key *key,
+			    struct key_preparsed_payload *prep)
+{
+	struct mktme_payload *payload = prep->payload.data[0];
+	int keyid, ref_count;
+	int ret;
+
+	mktme_map_lock();
+	keyid = mktme_map_keyid_from_key(key);
+	if (keyid <= 0)
+		return -EINVAL;
+	/*
+	 * ref_count will be at least one when we get here because the
+	 * key already exists. If ref_count is not > 1, it is safe to
+	 * update the key while holding the mktme_map_lock.
+	 */
+	ref_count = mktme_read_encrypt_ref(keyid);
+	if (ref_count > 1) {
+		pr_debug("mktme not updating keyid[%d] encrypt_count[%d]\n",
+			 keyid, ref_count);
+		return -EBUSY;
+	}
+	ret = mktme_program_keyid(keyid, payload);
+	if (ret != MKTME_PROG_SUCCESS) {
+		pr_debug("%s: %s\n", __func__, mktme_program_err[ret]);
+		ret = -ENOKEY;
+	}
+	mktme_map_unlock();
+	return ret;
+}
+
+/* Key Service Method to create a new key. Payload is preparsed. */
+int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
+{
+	struct mktme_payload *payload = prep->payload.data[0];
+	int keyid, ret;
+
+	mktme_map_lock();
+	keyid = mktme_map_get_free_keyid();
+	if (keyid == 0) {
+		ret = -ENOKEY;
+		goto out;
+	}
+	ret = mktme_program_keyid(keyid, payload);
+	if (ret != MKTME_PROG_SUCCESS) {
+		pr_debug("%s: %s\n", __func__, mktme_program_err[ret]);
+		ret = -ENOKEY;
+		goto out;
+	}
+	mktme_map_set_keyid(keyid, key);
+	key_get_encrypt_ref(keyid);
+out:
+	mktme_map_unlock();
+	return ret;
+}
+
+/* Verify the user provided the needed arguments for the TYPE of Key */
+static int mktme_check_options(struct mktme_payload *payload,
+			       unsigned long token_mask, enum mktme_type type)
+{
+	if (!token_mask)
+		return -EINVAL;
+
+	switch (type) {
+	case MKTME_TYPE_USER:
+		if (test_bit(OPT_ALGORITHM, &token_mask))
+			payload->keyid_ctrl |= MKTME_AES_XTS_128;
+		else
+			return -EINVAL;
+
+		if ((test_bit(OPT_KEY, &token_mask)) &&
+		    (test_bit(OPT_TWEAK, &token_mask)))
+			payload->keyid_ctrl |= MKTME_KEYID_SET_KEY_DIRECT;
+		else
+			return -EINVAL;
+		break;
+
+	case MKTME_TYPE_CPU:
+		if (test_bit(OPT_ALGORITHM, &token_mask))
+			payload->keyid_ctrl |= MKTME_AES_XTS_128;
+		else
+			return -EINVAL;
+
+		payload->keyid_ctrl |= MKTME_KEYID_SET_KEY_RANDOM;
+		break;
+
+	case MKTME_TYPE_CLEAR:
+		payload->keyid_ctrl |= MKTME_KEYID_CLEAR_KEY;
+		break;
+
+	case MKTME_TYPE_NO_ENCRYPT:
+		payload->keyid_ctrl |= MKTME_KEYID_NO_ENCRYPT;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/* Parse the options and store the key programming data in the payload. */
+static int mktme_get_options(char *options, struct mktme_payload *payload)
+{
+	enum mktme_type type = MKTME_TYPE_ERROR;
+	substring_t args[MAX_OPT_ARGS];
+	unsigned long token_mask = 0;
+	char *p = options;
+	int ret, token;
+
+	while ((p = strsep(&options, " \t"))) {
+		if (*p == '\0' || *p == ' ' || *p == '\t')
+			continue;
+		token = match_token(p, mktme_token, args);
+		if (test_and_set_bit(token, &token_mask))
+			return -EINVAL;
+
+		switch (token) {
+		case OPT_KEY:
+			ret = hex2bin(payload->data_key, args[0].from,
+				      MKTME_AES_XTS_SIZE);
+			if (ret < 0)
+				return -EINVAL;
+			break;
+
+		case OPT_TWEAK:
+			ret = hex2bin(payload->tweak_key, args[0].from,
+				      MKTME_AES_XTS_SIZE);
+			if (ret < 0)
+				return -EINVAL;
+			break;
+
+		case OPT_TYPE:
+			type = match_string(mktme_type_names,
+					    ARRAY_SIZE(mktme_type_names),
+					    args[0].from);
+			if (type < 0)
+				return -EINVAL;
+			break;
+
+		case OPT_ALGORITHM:
+			ret = match_string(mktme_alg_names,
+					   ARRAY_SIZE(mktme_alg_names),
+					   args[0].from);
+			if (ret < 0)
+				return -EINVAL;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+	}
+	return mktme_check_options(payload, token_mask, type);
+}
+
+void mktme_free_preparsed_key(struct key_preparsed_payload *prep)
+{
+	kzfree(prep->payload.data[0]);
+}
+
+/*
+ * Key Service Method to preparse a payload before a key is created.
+ * Check permissions and the options. Load the proposed key field
+ * data into the payload for use by instantiate and update methods.
+ */
+int mktme_preparse_key(struct key_preparsed_payload *prep)
+{
+	struct mktme_payload *mktme_payload;
+	size_t datalen = prep->datalen;
+	char *options;
+	int ret;
+
+	if (!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (datalen <= 0 || datalen > 1024 || !prep->data)
+		return -EINVAL;
+
+	options = kmemdup_nul(prep->data, datalen, GFP_KERNEL);
+	if (!options)
+		return -ENOMEM;
+
+	options[datalen] = '\0';
+
+	mktme_payload = kzalloc(sizeof(*mktme_payload), GFP_KERNEL);
+	if (!mktme_payload) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	ret = mktme_get_options(options, mktme_payload);
+	if (ret < 0)
+		goto out;
+
+	prep->quotalen = sizeof(*mktme_payload);
+	prep->payload.data[0] = mktme_payload;
+out:
+	kzfree(options);
+	return ret;
+}
+
+struct key_type key_type_mktme = {
+	.name		= "mktme",
+	.preparse	= mktme_preparse_key,
+	.free_preparse	= mktme_free_preparsed_key,
+	.instantiate	= mktme_instantiate_key,
+	.update		= mktme_update_key,
+	.describe	= user_describe,
+	.destroy	= mktme_destroy_key,
+};
+
+/*
+ * Allocate the global mktme_map structure based on the available keyids.
+ * Create a cache for the hardware structure. Initialize the encrypt_count
+ * array to track VMAs per keyid. Once all that succeeds, register the
+ * 'mktme' key type.
+ */
+static int __init init_mktme(void)
+{
+	int ret;
+
+	/* Verify keys are present */
+	if (!(mktme_nr_keyids > 0))
+		return -EINVAL;
+
+	if (!mktme_map_alloc())
+		return -ENOMEM;
+
+	mktme_prog_cache = KMEM_CACHE(mktme_key_program, SLAB_PANIC);
+	if (!mktme_prog_cache)
+		goto free_map;
+
+	if (mktme_alloc_encrypt_array() < 0)
+		goto free_cache;
+
+	ret = register_key_type(&key_type_mktme);
+	if (!ret)
+		return ret;			/* SUCCESS */
+
+	mktme_free_encrypt_array();
+free_cache:
+	kmem_cache_destroy(mktme_prog_cache);
+free_map:
+	mktme_map_free();
+
+	return -ENOMEM;
+}
+
+late_initcall(init_mktme);
-- 
2.14.1



* [RFC v2 11/13] keys/mktme: Program memory encryption keys on a system wide basis
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (9 preceding siblings ...)
  2018-12-04  7:39 ` [RFC v2 10/13] keys/mktme: Add the MKTME Key Service type for memory encryption Alison Schofield
@ 2018-12-04  7:39 ` Alison Schofield
  2018-12-04  9:21   ` Peter Zijlstra
  2018-12-04  7:39 ` [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows Alison Schofield
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

The kernel manages the MKTME (Multi-Key Total Memory Encryption) Keys
as a system wide single pool of keys. The hardware, however, manages
the keys on a per physical package basis. Each physical package
maintains a Key Table that all CPU's in that package share.

In order to maintain the consistent, system wide view that the kernel
requires, program all physical packages during a key program request.
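
For illustration only, the "program every package, report any failure" idea can be modeled in user space as below. This is a sketch under assumptions, not the kernel code: `program_all_packages()`, `program_pkg_fn`, `demo_program()`, and `PROG_SUCCESS` are hypothetical names. It also runs sequentially and keeps the first failing status, whereas the patch runs the callback on all lead CPUs through on_each_cpu_mask() and lets any failing CPU overwrite the shared status.

```c
#include <assert.h>
#include <stddef.h>

#define PROG_SUCCESS 0	/* assumed "no error" status, for illustration */

/* Hypothetical per-package programming hook; returns a status code. */
typedef int (*program_pkg_fn)(int lead_cpu, const void *key_program);

/*
 * Program the same key on every package through its lead CPU.
 * Every package is attempted; the first failing status is reported.
 */
int program_all_packages(const int *lead_cpus, size_t npkgs,
			 program_pkg_fn program, const void *key_program)
{
	int status = PROG_SUCCESS;

	assert(program != NULL);
	for (size_t i = 0; i < npkgs; i++) {
		int ret = program(lead_cpus[i], key_program);

		if (ret != PROG_SUCCESS && status == PROG_SUCCESS)
			status = ret;
	}
	return status;
}

/* Demo hook: pretend the package led by CPU 2 rejects the key. */
int demo_program(int lead_cpu, const void *key_program)
{
	(void)key_program;
	return lead_cpu == 2 ? 5 : PROG_SUCCESS;
}
```

Attempting every package even after a failure mirrors the broadcast nature of the real call; whether a partial failure should then be rolled back is a separate question this sketch does not address.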

Change-Id: I0ff46f37fde47a0305842baeb8ea600b6c568639
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 security/keys/mktme_keys.c | 61 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index e615eb58e600..7f113146acf2 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -21,6 +21,7 @@
 #include "internal.h"
 
 struct kmem_cache *mktme_prog_cache;	/* Hardware programming cache */
+cpumask_var_t mktme_leadcpus;		/* one cpu per pkg to program keys */
 
 static const char * const mktme_program_err[] = {
 	"KeyID was successfully programmed",	/* 0 */
@@ -59,6 +60,37 @@ static void mktme_destroy_key(struct key *key)
 	key_put_encrypt_ref(mktme_map_keyid_from_key(key));
 }
 
+struct mktme_hw_program_info {
+	struct mktme_key_program *key_program;
+	unsigned long status;
+};
+
+/* Program a KeyID on a single package. */
+static void mktme_program_package(void *hw_program_info)
+{
+	struct mktme_hw_program_info *info = hw_program_info;
+	int ret;
+
+	ret = mktme_key_program(info->key_program);
+	if (ret != MKTME_PROG_SUCCESS)
+		WRITE_ONCE(info->status, ret);
+}
+
+/* Program a KeyID across the entire system. */
+static int mktme_program_system(struct mktme_key_program *key_program,
+				cpumask_var_t mktme_cpumask)
+{
+	struct mktme_hw_program_info info = {
+		.key_program = key_program,
+		.status = MKTME_PROG_SUCCESS,
+	};
+	get_online_cpus();
+	on_each_cpu_mask(mktme_cpumask, mktme_program_package, &info, 1);
+	put_online_cpus();
+
+	return info.status;
+}
+
 /* Copy the payload to the HW programming structure and program this KeyID */
 static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
 {
@@ -84,7 +116,7 @@ static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
 			kprog->key_field_2[i] ^= kern_entropy[i];
 		}
 	}
-	ret = mktme_key_program(kprog);
+	ret = mktme_program_system(kprog, mktme_leadcpus);
 	kmem_cache_free(mktme_prog_cache, kprog);
 	return ret;
 }
@@ -299,6 +331,28 @@ struct key_type key_type_mktme = {
 	.destroy	= mktme_destroy_key,
 };
 
+static int mktme_build_leadcpus_mask(void)
+{
+	int online_cpu, mktme_cpu;
+	int online_pkgid, mktme_pkgid = -1;
+
+	if (!zalloc_cpumask_var(&mktme_leadcpus, GFP_KERNEL))
+		return -ENOMEM;
+
+	for_each_online_cpu(online_cpu) {
+		online_pkgid = topology_physical_package_id(online_cpu);
+
+		for_each_cpu(mktme_cpu, mktme_leadcpus) {
+			mktme_pkgid = topology_physical_package_id(mktme_cpu);
+			if (mktme_pkgid == online_pkgid)
+				break;
+		}
+		if (mktme_pkgid != online_pkgid)
+			cpumask_set_cpu(online_cpu, mktme_leadcpus);
+	}
+	return 0;
+}
+
 /*
  * Allocate the global mktme_map structure based on the available keyids.
  * Create a cache for the hardware structure. Initialize the encrypt_count
@@ -323,10 +377,15 @@ static int __init init_mktme(void)
 	if (mktme_alloc_encrypt_array() < 0)
 		goto free_cache;
 
+	if (mktme_build_leadcpus_mask() < 0)
+		goto free_array;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
 
+	free_cpumask_var(mktme_leadcpus);
+free_array:
 	mktme_free_encrypt_array();
 free_cache:
 	kmem_cache_destroy(mktme_prog_cache);
-- 
2.14.1



* [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (10 preceding siblings ...)
  2018-12-04  7:39 ` [RFC v2 11/13] keys/mktme: Program memory encryption keys on a system wide basis Alison Schofield
@ 2018-12-04  7:39 ` Alison Schofield
  2018-12-04  9:22   ` Peter Zijlstra
  2018-12-07  2:14   ` Huang, Kai
  2018-12-04  7:40 ` [RFC v2 13/13] keys/mktme: Support CPU Hotplug for MKTME keys Alison Schofield
                   ` (3 subsequent siblings)
  15 siblings, 2 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:39 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

MKTME (Multi-Key Total Memory Encryption) key payloads may include
data encryption keys, tweak keys, and additional entropy bits. These
are used to program the MKTME encryption hardware. By default, the
kernel destroys this payload data once the hardware is programmed.

However, in order to fully support CPU Hotplug, saving the key data
becomes important. The MKTME Key Service cannot allow a new physical
package to come online unless it can program the new package's Key Table
to match the Key Tables of all existing physical packages.

With CPU generated keys (a.k.a. random keys or ephemeral keys) the
saving of user key data is not an issue. The kernel and MKTME hardware
can generate strong encryption keys without recalling any user supplied
data.

With USER directed keys (a.k.a. user type) saving the key programming
data (data and tweak key) becomes an issue. The data and tweak keys
are required to program those keys on a new physical package.

In preparation for adding CPU hotplug support:

   Add an 'mktme_vault' where key data is stored.

   Add 'mktme_savekeys' kernel command line parameter that directs
   what key data can be stored. If it is not set, the kernel does not
   store the user's data key or tweak key.

   Add 'mktme_bitmap_user_type' to track when USER type keys are in
   use. If no USER type keys are currently in use, a physical package
   may be brought online, despite the absence of 'mktme_savekeys'.
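
For illustration only, the storage policy in the list above can be modeled in user space roughly as follows. This is a minimal sketch under assumptions: `struct payload_model`, `vault_store()`, and `KEY_BYTES` are hypothetical names, and the real payload layout and key size may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define KEY_BYTES 16	/* assumed key size, for illustration only */

/* Hypothetical stand-in for the key payload. */
struct payload_model {
	uint32_t ctrl;			/* key type and algorithm bits */
	uint8_t  data_key[KEY_BYTES];
	uint8_t  tweak_key[KEY_BYTES];
};

/*
 * Always keep the control field, so CPU generated keys can be
 * reprogrammed later. Keep the key material only when the
 * savekeys policy allows it.
 */
void vault_store(struct payload_model *slot,
		 const struct payload_model *payload, bool savekeys)
{
	assert(slot != NULL && payload != NULL);

	slot->ctrl = payload->ctrl;
	if (savekeys) {
		memcpy(slot->data_key, payload->data_key, KEY_BYTES);
		memcpy(slot->tweak_key, payload->tweak_key, KEY_BYTES);
	}
}
```

With savekeys false, the slot still records which kind of key occupies the KeyID, but the data and tweak keys stay zeroed.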

Change-Id: If57414862f1ac131dd97e29bf4f3937ac33777f6
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/admin-guide/kernel-parameters.rst |  1 +
 Documentation/admin-guide/kernel-parameters.txt | 11 +++++
 arch/x86/mm/mktme.c                             |  2 +
 security/keys/mktme_keys.c                      | 65 +++++++++++++++++++++++++
 4 files changed, 79 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index b8d0bc07ed0a..1b62b86d0666 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -120,6 +120,7 @@ parameter is applicable::
 			Documentation/m68k/kernel-options.txt.
 	MDA	MDA console support is enabled.
 	MIPS	MIPS architecture is enabled.
+	MKTME	Multi-Key Total Memory Encryption is enabled.
 	MOUSE	Appropriate mouse support is enabled.
 	MSI	Message Signaled Interrupts (PCI).
 	MTD	MTD (Memory Technology Device) support is enabled.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 81d1d5a74728..c777dbf0f75c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2497,6 +2497,17 @@
 			in the "bleeding edge" mini2440 support kernel at
 			http://repo.or.cz/w/linux-2.6/mini2440.git
 
+	mktme_savekeys  [X86, MKTME] When CONFIG_X86_INTEL_MKTME is set
+			this parameter allows the kernel to save the user
+			specified MKTME key payload. Saving this payload
+			means that the MKTME Key Service can always allow
+			the addition of new physical packages. If the
+			mktme_savekeys parameter is not present, user key
+			data will not be saved, and new physical packages
+			may only be added to the system if no user type
+			MKTME keys are in use.
+			See Documentation/x86/mktme.rst
+
 	mminit_loglevel=
 			[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
 			parameter allows control of the logging verbosity for
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 55d34beb9b81..f96f4f2884e8 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -99,10 +99,12 @@ void mktme_map_set_keyid(int keyid, void *key)
 	mktme_map->mapped_keyids++;
 }
 
+extern unsigned long *mktme_bitmap_user_type;
 void mktme_map_free_keyid(int keyid)
 {
 	mktme_map->key[keyid] = 0;
 	mktme_map->mapped_keyids--;
+	clear_bit(keyid, mktme_bitmap_user_type);
 }
 
 int mktme_map_keyid_from_key(void *key)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 7f113146acf2..e9c7d306cba1 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -23,6 +23,11 @@
 struct kmem_cache *mktme_prog_cache;	/* Hardware programming cache */
 cpumask_var_t mktme_leadcpus;		/* one cpu per pkg to program keys */
 
+/* Kernel command line parameter allows saving of user key payloads. */
+static bool mktme_savekeys;
+/* Track the existence of user type keys to make package hotplug decisions. */
+unsigned long *mktme_bitmap_user_type;
+
 static const char * const mktme_program_err[] = {
 	"KeyID was successfully programmed",	/* 0 */
 	"Invalid KeyID programming command",	/* 1 */
@@ -54,6 +59,9 @@ struct mktme_payload {
 	u8		tweak_key[MKTME_AES_XTS_SIZE];
 };
 
+/* Store keys in this vault if cmdline parameter mktme_savekeys allows */
+struct mktme_payload *mktme_vault;
+
 /* Key Service Method called when Key is garbage collected. */
 static void mktme_destroy_key(struct key *key)
 {
@@ -121,6 +129,23 @@ static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
 	return ret;
 }
 
+static void mktme_load_vault(int keyid, struct mktme_payload *payload)
+{
+	/*
+	 * Always save the control fields to program hotplugged
+	 * packages with RANDOM, CLEAR, or NO_ENCRYPT type keys.
+	 */
+	mktme_vault[keyid].keyid_ctrl = payload->keyid_ctrl;
+
+	/* Only save data and tweak keys if allowed */
+	if (mktme_savekeys) {
+		memcpy(mktme_vault[keyid].data_key, payload->data_key,
+		       MKTME_AES_XTS_SIZE);
+		memcpy(mktme_vault[keyid].tweak_key, payload->tweak_key,
+		       MKTME_AES_XTS_SIZE);
+	}
+}
+
 /* Key Service Method to update an existing key. */
 static int mktme_update_key(struct key *key,
 			    struct key_preparsed_payload *prep)
@@ -144,11 +169,23 @@ static int mktme_update_key(struct key *key,
 			 keyid, ref_count);
 		return -EBUSY;
 	}
+
+	/* Forget if key was user type. */
+	clear_bit(keyid, mktme_bitmap_user_type);
+
 	ret = mktme_program_keyid(keyid, payload);
 	if (ret != MKTME_PROG_SUCCESS) {
 		pr_debug("%s: %s\n", __func__, mktme_program_err[ret]);
 		ret = -ENOKEY;
+		goto out;
 	}
+
+	mktme_load_vault(keyid, payload);
+
+	/* Remember if this key is user type. */
+	if ((payload->keyid_ctrl & 0xff) == MKTME_KEYID_SET_KEY_DIRECT)
+		set_bit(keyid, mktme_bitmap_user_type);
+out:
 	mktme_map_unlock();
 	return ret;
 }
@@ -171,6 +208,13 @@ int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 		ret = -ENOKEY;
 		goto out;
 	}
+
+	mktme_load_vault(keyid, payload);
+
+	/* Remember if key is user type. */
+	if ((payload->keyid_ctrl & 0xff) == MKTME_KEYID_SET_KEY_DIRECT)
+		set_bit(keyid, mktme_bitmap_user_type);
+
 	mktme_map_set_keyid(keyid, key);
 	key_get_encrypt_ref(keyid);
 out:
@@ -380,10 +424,23 @@ static int __init init_mktme(void)
 	if (mktme_build_leadcpus_mask() < 0)
 		goto free_array;
 
+	mktme_bitmap_user_type = bitmap_zalloc(mktme_nr_keyids, GFP_KERNEL);
+	if (!mktme_bitmap_user_type)
+		goto free_mask;
+
+	mktme_vault = kzalloc(sizeof(mktme_vault[0]) * (mktme_nr_keyids + 1),
+			      GFP_KERNEL);
+	if (!mktme_vault)
+		goto free_bitmap;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
 
+	kfree(mktme_vault);
+free_bitmap:
+	bitmap_free(mktme_bitmap_user_type);
+free_mask:
 	free_cpumask_var(mktme_leadcpus);
 free_array:
 	mktme_free_encrypt_array();
@@ -396,3 +453,11 @@ static int __init init_mktme(void)
 }
 
 late_initcall(init_mktme);
+
+static int mktme_enable_savekeys(char *__unused)
+{
+	mktme_savekeys = true;
+	return 1;
+}
+__setup("mktme_savekeys", mktme_enable_savekeys);
+
-- 
2.14.1



* [RFC v2 13/13] keys/mktme: Support CPU Hotplug for MKTME keys
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (11 preceding siblings ...)
  2018-12-04  7:39 ` [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows Alison Schofield
@ 2018-12-04  7:40 ` Alison Schofield
  2018-12-04  9:28   ` Peter Zijlstra
  2018-12-04  9:31   ` Peter Zijlstra
  2018-12-04  9:25 ` [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Peter Zijlstra
                   ` (2 subsequent siblings)
  15 siblings, 2 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-04  7:40 UTC (permalink / raw)
  To: dhowells, tglx
  Cc: jmorris, mingo, hpa, bp, luto, peterz, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

The MKTME (Multi-Key Total Memory Encryption) hardware resides on each
physical package. The kernel maintains one Key Table on each physical
package and these Key Tables must all remain in sync.
(From here on, package means physical package.)

Although every CPU on that package has the ability to program the Key
Table, the kernel uses one 'lead' cpu per package to program the Key
Tables. Typically, keys are programmed one at a time, across all
packages, as the 'add key' requests come in from userspace.

Some CPU hotplug scenarios are handled quite simply:
>  Teardown a non lead CPU --> do nothing
>  Teardown a lead CPU --> pick a new lead CPU
>  Teardown a lead/last CPU of a package --> forget this package
>  Startup a CPU in a known package --> do nothing
>  Startup a CPU in a new package and no keys are programmed --> do nothing

Then there is the more interesting case for MKTME: a CPU is starting up
for a new package and keys are programmed on the existing packages. The Key
Table on the new package will need to be programmed to match the Key
Tables on all existing packages.

The issue is whether or not the Key Service has the information it needs
to program the new Key Table. To address this, a new kernel commandline
parameter 'mktme_savekeys' was introduced in a previous patch. It allows
the kernel to save the data needed to program keys, beyond their first
add key request.

When 'mktme_savekeys' is not present, new packages may still be added
if none of the currently programmed keys are USER type. This means that
CPU generated keys are an option for users not wanting to save key
data, but who also want to support the addition of new packages.
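
For illustration only, the startup cases described above reduce to a small decision function. The sketch below is a user-space model under assumptions: `struct hp_state`, `enum hotplug_verdict`, and `cpu_startup_verdict()` are hypothetical names, and the kernel callback gathers this state under the map lock rather than from a snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum hotplug_verdict {
	HP_NOTHING_TO_DO,	/* package already has a lead CPU */
	HP_ADD_LEAD,		/* new package, no keys programmed yet */
	HP_PROGRAM_THEN_ADD,	/* new package, keys can be replayed */
	HP_REFUSE,		/* new package, USER keys were not saved */
};

/* Hypothetical snapshot of the state the startup callback consults. */
struct hp_state {
	bool   package_has_lead;	/* lead CPU exists for this package */
	size_t mapped_keys;		/* KeyIDs currently programmed */
	size_t user_type_keys;		/* how many of those are USER type */
	bool   savekeys;		/* mktme_savekeys was given */
};

enum hotplug_verdict cpu_startup_verdict(const struct hp_state *s)
{
	assert(s != NULL);
	assert(s->user_type_keys <= s->mapped_keys);

	if (s->package_has_lead)
		return HP_NOTHING_TO_DO;
	if (s->mapped_keys == 0)
		return HP_ADD_LEAD;
	if (!s->savekeys && s->user_type_keys > 0)
		return HP_REFUSE;
	return HP_PROGRAM_THEN_ADD;
}
```

The only refusing case is a new package arriving while at least one USER type key is in use and its key material was not saved.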

Change-Id: I219192fc59dd9f433963c4959f33d7f013c9f73a
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 security/keys/mktme_keys.c | 135 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 126 insertions(+), 9 deletions(-)

diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index e9c7d306cba1..fb4d4061d2f3 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -86,21 +86,29 @@ static void mktme_program_package(void *hw_program_info)
 
 /* Program a KeyID across the entire system. */
 static int mktme_program_system(struct mktme_key_program *key_program,
-				cpumask_var_t mktme_cpumask)
+				cpumask_var_t mktme_cpumask, int hotplug)
 {
 	struct mktme_hw_program_info info = {
 		.key_program = key_program,
 		.status = MKTME_PROG_SUCCESS,
 	};
-	get_online_cpus();
-	on_each_cpu_mask(mktme_cpumask, mktme_program_package, &info, 1);
-	put_online_cpus();
+
+	if (!hotplug) {
+		get_online_cpus();
+		on_each_cpu_mask(mktme_cpumask, mktme_program_package,
+				 &info, 1);
+		put_online_cpus();
+	} else {
+		on_each_cpu_mask(mktme_cpumask, mktme_program_package,
+				 &info, 1);
+	}
 
 	return info.status;
 }
 
 /* Copy the payload to the HW programming structure and program this KeyID */
-static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
+static int mktme_program_keyid(int keyid, struct mktme_payload *payload,
+			       cpumask_var_t mask, int hotplug)
 {
 	struct mktme_key_program *kprog = NULL;
 	u8 kern_entropy[MKTME_AES_XTS_SIZE];
@@ -124,7 +132,7 @@ static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
 			kprog->key_field_2[i] ^= kern_entropy[i];
 		}
 	}
-	ret = mktme_program_system(kprog, mktme_leadcpus);
+	ret = mktme_program_system(kprog, mask, hotplug);
 	kmem_cache_free(mktme_prog_cache, kprog);
 	return ret;
 }
@@ -173,7 +181,7 @@ static int mktme_update_key(struct key *key,
 	/* Forget if key was user type. */
 	clear_bit(keyid, mktme_bitmap_user_type);
 
-	ret = mktme_program_keyid(keyid, payload);
+	ret = mktme_program_keyid(keyid, payload, mktme_leadcpus, 0);
 	if (ret != MKTME_PROG_SUCCESS) {
 		pr_debug("%s: %s\n", __func__, mktme_program_err[ret]);
 		ret = -ENOKEY;
@@ -202,7 +210,7 @@ int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
 		ret = -ENOKEY;
 		goto out;
 	}
-	ret = mktme_program_keyid(keyid, payload);
+	ret = mktme_program_keyid(keyid, payload, mktme_leadcpus, 0);
 	if (ret != MKTME_PROG_SUCCESS) {
 		pr_debug("%s: %s\n", __func__, mktme_program_err[ret]);
 		ret = -ENOKEY;
@@ -375,6 +383,10 @@ struct key_type key_type_mktme = {
 	.destroy	= mktme_destroy_key,
 };
 
+/*
+ * Build mktme_leadcpus mask to include one cpu per physical package.
+ * The mask is used to program the Key Table on each physical package.
+ */
 static int mktme_build_leadcpus_mask(void)
 {
 	int online_cpu, mktme_cpu;
@@ -397,6 +409,102 @@ static int mktme_build_leadcpus_mask(void)
 	return 0;
 }
 
+/* A new package's Key Table is programmed with data saved in mktme_vault. */
+static int mktme_program_new_package(cpumask_var_t mask)
+{
+	struct key *key;
+	int hotplug = 1;
+	int keyid, ret = 0;
+
+	/* When a KeyID slot is freed, its corresponding Key is 0 */
+	for (keyid = 1; keyid <= mktme_nr_keyids; keyid++) {
+		key = mktme_map_key_from_keyid(keyid);
+		if (!key)
+			continue;
+		/* If one key fails to program, fail the entire package. */
+		ret = mktme_program_keyid(keyid, &mktme_vault[keyid],
+					  mask, hotplug);
+		if (ret != MKTME_PROG_SUCCESS) {
+			pr_debug("%s: %s\n", __func__, mktme_program_err[ret]);
+			ret = -ENOKEY;
+			break;
+		}
+	}
+	return ret;
+}
+
+static int mktme_hotplug_cpu_startup(unsigned int cpu)
+{
+	int lead_cpu, ret = 0;
+	cpumask_var_t newmask;
+	int pkgid = topology_physical_package_id(cpu);
+
+	mktme_map_lock();
+
+	/* Nothing to do if a lead CPU exists for this package. */
+	for_each_cpu(lead_cpu, mktme_leadcpus)
+		if (topology_physical_package_id(lead_cpu) == pkgid)
+			goto out_unlock;
+
+	/* No keys to program. Just add the new lead CPU to mask. */
+	if (!mktme_map_mapped_keyids())
+		goto out_add_cpu;
+
+	/* Keys need to be programmed. Confirm programming can be done. */
+	if (!mktme_savekeys &&
+	    (bitmap_weight(mktme_bitmap_user_type, mktme_nr_keyids))) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
+	/* Program only this package's Key Table, not all Key Tables. */
+	if (!zalloc_cpumask_var(&newmask, GFP_KERNEL)) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+	cpumask_set_cpu(cpu, newmask);
+	ret = mktme_program_new_package(newmask);
+	if (ret < 0) {
+		free_cpumask_var(newmask);
+		goto out_unlock;
+	}
+
+	free_cpumask_var(newmask);
+out_add_cpu:
+	/* Make this cpu a lead cpu for all future Key programming requests. */
+	cpumask_set_cpu(cpu, mktme_leadcpus);
+out_unlock:
+	mktme_map_unlock();
+	return ret;
+}
+
+static int mktme_hotplug_cpu_teardown(unsigned int cpu)
+{
+	int pkgid, online_cpu;
+
+	mktme_map_lock();
+	/* Teardown cpu is not a lead cpu, nothing to do. */
+	if (!cpumask_test_and_clear_cpu(cpu, mktme_leadcpus))
+		goto out;
+	/*
+	 * Teardown cpu is a lead cpu. If the physical package
+	 * is still present, pick a new lead cpu. Beware: the
+	 * teardown cpu is still in the online_cpu mask. Do
+	 * not pick it again.
+	 */
+	pkgid = topology_physical_package_id(cpu);
+	for_each_online_cpu(online_cpu)
+		if (online_cpu != cpu &&
+		    pkgid == topology_physical_package_id(online_cpu)) {
+			cpumask_set_cpu(online_cpu, mktme_leadcpus);
+			break;
+	}
+out:
+	mktme_map_unlock();
+	/* Teardowns always succeed. */
+	return 0;
+}
+
 /*
  * Allocate the global mktme_map structure based on the available keyids.
  * Create a cache for the hardware structure. Initialize the encrypt_count
@@ -405,7 +513,7 @@ static int mktme_build_leadcpus_mask(void)
  */
 static int __init init_mktme(void)
 {
-	int ret;
+	int ret, cpuhp;
 
 	/* Verify keys are present */
 	if (!(mktme_nr_keyids > 0))
@@ -433,10 +541,19 @@ static int __init init_mktme(void)
 	if (!mktme_vault)
 		goto free_bitmap;
 
+	cpuhp = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+					  "keys/mktme_keys:online",
+					  mktme_hotplug_cpu_startup,
+					  mktme_hotplug_cpu_teardown);
+	if (cpuhp < 0)
+		goto free_vault;
+
 	ret = register_key_type(&key_type_mktme);
 	if (!ret)
 		return ret;			/* SUCCESS */
 
+	cpuhp_remove_state_nocalls(cpuhp);
+free_vault:
 	kfree(mktme_vault);
 free_bitmap:
 	bitmap_free(mktme_bitmap_user_type);
-- 
2.14.1



* Re: [RFC v2 07/13] x86/mm: Add helpers for reference counting encrypted VMAs
  2018-12-04  7:39 ` [RFC v2 07/13] x86/mm: Add helpers for reference counting encrypted VMAs Alison Schofield
@ 2018-12-04  8:58   ` Peter Zijlstra
  2018-12-05  5:28     ` Alison Schofield
  0 siblings, 1 reply; 87+ messages in thread
From: Peter Zijlstra @ 2018-12-04  8:58 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Mon, Dec 03, 2018 at 11:39:54PM -0800, Alison Schofield wrote:

> +void vma_put_encrypt_ref(struct vm_area_struct *vma)
> +{
> +	if (vma_keyid(vma))
> +		if (refcount_dec_and_test(&encrypt_count[vma_keyid(vma)])) {
> +			mktme_map_lock();
> +			mktme_map_free_keyid(vma_keyid(vma));
> +			mktme_map_unlock();
> +		}

This violates CodingStyle

> +}

> +void key_put_encrypt_ref(int keyid)
> +{
> +	if (refcount_dec_and_test(&encrypt_count[keyid])) {
> +		mktme_map_lock();

That smells like it wants to use refcount_dec_and_lock() instead.

> +		mktme_map_free_keyid(keyid);
> +		mktme_map_unlock();
> +	}
> +}

Also, if you write that like:

	if (!refcount_dec_and_lock(&encrypt_count[keyid], &lock))
		return;

you lose an indent level.
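
For illustration only, the dec-and-lock idiom and the early-return caller shape can be modeled in user space as below. This is a sketch under assumptions: it uses C11 atomics and a pthread mutex in place of refcount_t and the kernel lock, and `model_dec_and_lock()` and `model_put_ref()` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Drop one reference. Return true, with the lock held, only if
 * that was the last reference. Otherwise return false, unlocked.
 */
bool model_dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
	int old = atomic_load(count);

	assert(old > 0);
	/* Fast path: not the last reference, no lock needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return false;
	}
	/* Possibly the last reference: decide under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(count, 1) != 1) {
		pthread_mutex_unlock(lock);
		return false;
	}
	return true;
}

/* Caller shape from the review: early return, one less indent. */
void model_put_ref(atomic_int *count, pthread_mutex_t *lock, int *freed)
{
	if (!model_dec_and_lock(count, lock))
		return;

	(*freed)++;	/* stands in for freeing the KeyID slot */
	pthread_mutex_unlock(lock);
}
```

Taking the lock before the final decrement means no other thread can observe a zero count while the slot is still mapped.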


* Re: [RFC v2 09/13] mm: Restrict memory encryption to anonymous VMA's
  2018-12-04  7:39 ` [RFC v2 09/13] mm: Restrict memory encryption to anonymous VMA's Alison Schofield
@ 2018-12-04  9:10   ` Peter Zijlstra
  2018-12-05  5:30     ` Alison Schofield
  0 siblings, 1 reply; 87+ messages in thread
From: Peter Zijlstra @ 2018-12-04  9:10 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Mon, Dec 03, 2018 at 11:39:56PM -0800, Alison Schofield wrote:
> Memory encryption is only supported for mappings that are ANONYMOUS.
> Test the entire range of VMA's in an encrypt_mprotect() request to
> make sure they all meet that requirement before encrypting any.
> 
> The encrypt_mprotect syscall will return -EINVAL and will not encrypt
> any VMA's if this check fails.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

That SoB doesn't make sense; per the From you wrote the patch and signed
off on it, wth is Kirill's SoB doing there?

> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ad8127dc9aac..f1c009409134 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -345,6 +345,24 @@ static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
>  	return walk_page_range(start, end, &prot_none_walk);
>  }
>  
> +/*
> + * Encrypted mprotect is only supported on anonymous mappings.
> + * All VMA's in the requested range must be anonymous. If this
> + * test fails on any single VMA, the entire mprotect request fails.
> + */
> +bool mem_supports_encryption(struct vm_area_struct *vma, unsigned long end)

That's a 'weird' interface and cannot do what the comment says it should
do.

> +{
> +	struct vm_area_struct *test_vma = vma;

That variable is utterly pointless.

> +	do {
> +		if (!vma_is_anonymous(test_vma))
> +			return false;
> +
> +		test_vma = test_vma->vm_next;
> +	} while (test_vma && test_vma->vm_start < end);
> +	return true;
> +}


* Re: [RFC v2 04/13] x86/mm: Add helper functions for MKTME memory encryption keys
  2018-12-04  7:39 ` [RFC v2 04/13] x86/mm: Add helper functions for MKTME " Alison Schofield
@ 2018-12-04  9:14   ` Peter Zijlstra
  2018-12-05  5:49     ` Alison Schofield
  2018-12-04 15:35   ` Andy Lutomirski
  2018-12-06  8:31   ` Sakkinen, Jarkko
  2 siblings, 1 reply; 87+ messages in thread
From: Peter Zijlstra @ 2018-12-04  9:14 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Mon, Dec 03, 2018 at 11:39:51PM -0800, Alison Schofield wrote:
> +int mktme_map_keyid_from_key(void *key)
> +{
> +	int i;
> +
> +	for (i = 1; i <= mktme_nr_keyids; i++)
> +		if (mktme_map->key[i] == key)
> +			return i;

CodingStyle

> +	return 0;
> +}
> +int mktme_map_get_free_keyid(void)
> +{
> +	int i;
> +
> +	if (mktme_map->mapped_keyids < mktme_nr_keyids) {
> +		for (i = 1; i <= mktme_nr_keyids; i++)
> +			if (mktme_map->key[i] == 0)
> +				return i;

CodingStyle

> +	}
> +	return 0;
> +}


* Re: [RFC v2 11/13] keys/mktme: Program memory encryption keys on a system wide basis
  2018-12-04  7:39 ` [RFC v2 11/13] keys/mktme: Program memory encryption keys on a system wide basis Alison Schofield
@ 2018-12-04  9:21   ` Peter Zijlstra
  2018-12-04  9:50     ` Kirill A. Shutemov
  2018-12-05  5:43     ` Alison Schofield
  0 siblings, 2 replies; 87+ messages in thread
From: Peter Zijlstra @ 2018-12-04  9:21 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Mon, Dec 03, 2018 at 11:39:58PM -0800, Alison Schofield wrote:

> +struct mktme_hw_program_info {
> +	struct mktme_key_program *key_program;
> +	unsigned long status;
> +};
> +
> +/* Program a KeyID on a single package. */
> +static void mktme_program_package(void *hw_program_info)
> +{
> +	struct mktme_hw_program_info *info = hw_program_info;
> +	int ret;
> +
> +	ret = mktme_key_program(info->key_program);
> +	if (ret != MKTME_PROG_SUCCESS)
> +		WRITE_ONCE(info->status, ret);

What's the purpose of that WRITE_ONCE()?

> +}
> +
> +/* Program a KeyID across the entire system. */
> +static int mktme_program_system(struct mktme_key_program *key_program,
> +				cpumask_var_t mktme_cpumask)
> +{
> +	struct mktme_hw_program_info info = {
> +		.key_program = key_program,
> +		.status = MKTME_PROG_SUCCESS,
> +	};
> +	get_online_cpus();
> +	on_each_cpu_mask(mktme_cpumask, mktme_program_package, &info, 1);
> +	put_online_cpus();
> +
> +	return info.status;
> +}
> +
>  /* Copy the payload to the HW programming structure and program this KeyID */
>  static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
>  {
> @@ -84,7 +116,7 @@ static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
>  			kprog->key_field_2[i] ^= kern_entropy[i];
>  		}
>  	}
> -	ret = mktme_key_program(kprog);
> +	ret = mktme_program_system(kprog, mktme_leadcpus);
>  	kmem_cache_free(mktme_prog_cache, kprog);
>  	return ret;
>  }
> @@ -299,6 +331,28 @@ struct key_type key_type_mktme = {
>  	.destroy	= mktme_destroy_key,
>  };
>  
> +static int mktme_build_leadcpus_mask(void)
> +{
> +	int online_cpu, mktme_cpu;
> +	int online_pkgid, mktme_pkgid = -1;
> +
> +	if (!zalloc_cpumask_var(&mktme_leadcpus, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	for_each_online_cpu(online_cpu) {
> +		online_pkgid = topology_physical_package_id(online_cpu);
> +
> +		for_each_cpu(mktme_cpu, mktme_leadcpus) {
> +			mktme_pkgid = topology_physical_package_id(mktme_cpu);
> +			if (mktme_pkgid == online_pkgid)
> +				break;
> +		}
> +		if (mktme_pkgid != online_pkgid)
> +			cpumask_set_cpu(online_cpu, mktme_leadcpus);

Do you really need LOCK prefixed bit set here?

> +	}
> +	return 0;
> +}

How is that serialized and kept relevant in the face of hotplug?

Also, do you really need O(n^2) to find the first occurrence of a value
in an array?
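
For illustration only, a single pass with a per-package "seen" table picks one lead CPU per package without the nested loop. This is a user-space sketch under assumptions: `pick_lead_cpus()`, `pkg_of[]`, and `MAX_PKGS` are hypothetical, with `pkg_of[]` standing in for topology_physical_package_id() and package ids assumed to be small non-negative integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PKGS 64	/* assumed upper bound on package ids */

/*
 * pkg_of[cpu] is the package id of each online CPU.
 * lead[cpu] is set for the first CPU seen in each package.
 * Returns the number of lead CPUs, i.e. the number of packages.
 */
size_t pick_lead_cpus(const int *pkg_of, size_t ncpus, bool *lead)
{
	bool seen[MAX_PKGS] = { false };
	size_t nleads = 0;

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		int pkg = pkg_of[cpu];

		assert(pkg >= 0 && pkg < MAX_PKGS);
		lead[cpu] = !seen[pkg];
		if (lead[cpu]) {
			seen[pkg] = true;
			nleads++;
		}
	}
	return nleads;
}
```

Each CPU is visited once, so the cost is linear in the number of online CPUs. The sketch does not address the serialization against hotplug raised above.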


* Re: [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows
  2018-12-04  7:39 ` [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows Alison Schofield
@ 2018-12-04  9:22   ` Peter Zijlstra
  2018-12-07  2:14   ` Huang, Kai
  1 sibling, 0 replies; 87+ messages in thread
From: Peter Zijlstra @ 2018-12-04  9:22 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Mon, Dec 03, 2018 at 11:39:59PM -0800, Alison Schofield wrote:
> Change-Id: If57414862f1ac131dd97e29bf4f3937ac33777f6

Does not belong in patches..


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (12 preceding siblings ...)
  2018-12-04  7:40 ` [RFC v2 13/13] keys/mktme: Support CPU Hotplug for MKTME keys Alison Schofield
@ 2018-12-04  9:25 ` Peter Zijlstra
  2018-12-04  9:46   ` Kirill A. Shutemov
  2018-12-04 19:19 ` Andy Lutomirski
  2018-12-05 20:30 ` Sakkinen, Jarkko
  15 siblings, 1 reply; 87+ messages in thread
From: Peter Zijlstra @ 2018-12-04  9:25 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Mon, Dec 03, 2018 at 11:39:47PM -0800, Alison Schofield wrote:
> (Multi-Key Total Memory Encryption)

I think that MKTME is a horrible name, and doesn't appear to accurately
describe what it does either. Specifically the 'total' seems out of
place, it doesn't require all memory to be encrypted.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 13/13] keys/mktme: Support CPU Hotplug for MKTME keys
  2018-12-04  7:40 ` [RFC v2 13/13] keys/mktme: Support CPU Hotplug for MKTME keys Alison Schofield
@ 2018-12-04  9:28   ` Peter Zijlstra
  2018-12-05  5:32     ` Alison Schofield
  2018-12-04  9:31   ` Peter Zijlstra
  1 sibling, 1 reply; 87+ messages in thread
From: Peter Zijlstra @ 2018-12-04  9:28 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Mon, Dec 03, 2018 at 11:40:00PM -0800, Alison Schofield wrote:
> +	for_each_online_cpu(online_cpu)
> +		if (online_cpu != cpu &&
> +		    pkgid == topology_physical_package_id(online_cpu)) {
> +			cpumask_set_cpu(online_cpu, mktme_leadcpus);
> +			break;
> +	}

That's a capital offence right there.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 13/13] keys/mktme: Support CPU Hotplug for MKTME keys
  2018-12-04  7:40 ` [RFC v2 13/13] keys/mktme: Support CPU Hotplug for MKTME keys Alison Schofield
  2018-12-04  9:28   ` Peter Zijlstra
@ 2018-12-04  9:31   ` Peter Zijlstra
  2018-12-05  5:36     ` Alison Schofield
  1 sibling, 1 reply; 87+ messages in thread
From: Peter Zijlstra @ 2018-12-04  9:31 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Mon, Dec 03, 2018 at 11:40:00PM -0800, Alison Schofield wrote:
>  static int mktme_program_system(struct mktme_key_program *key_program,
> -				cpumask_var_t mktme_cpumask)
> +				cpumask_var_t mktme_cpumask, int hotplug)
>  {
>  	struct mktme_hw_program_info info = {
>  		.key_program = key_program,
>  		.status = MKTME_PROG_SUCCESS,
>  	};
> -	get_online_cpus();
> -	on_each_cpu_mask(mktme_cpumask, mktme_program_package, &info, 1);
> -	put_online_cpus();
> +
> +	if (!hotplug) {
> +		get_online_cpus();
> +		on_each_cpu_mask(mktme_cpumask, mktme_program_package,
> +				 &info, 1);
> +		put_online_cpus();
> +	} else {
> +		on_each_cpu_mask(mktme_cpumask, mktme_program_package,
> +				 &info, 1);
> +	}
>  
>  	return info.status;
>  }

That is pretty horrible; and I think easily avoided.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-04  9:25 ` [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Peter Zijlstra
@ 2018-12-04  9:46   ` Kirill A. Shutemov
  2018-12-05 20:32     ` Sakkinen, Jarkko
  0 siblings, 1 reply; 87+ messages in thread
From: Kirill A. Shutemov @ 2018-12-04  9:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alison Schofield, dhowells, tglx, jmorris, mingo, hpa, bp, luto,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Tue, Dec 04, 2018 at 09:25:50AM +0000, Peter Zijlstra wrote:
> On Mon, Dec 03, 2018 at 11:39:47PM -0800, Alison Schofield wrote:
> > (Multi-Key Total Memory Encryption)
> 
> I think that MKTME is a horrible name, and doesn't appear to accurately
> describe what it does either. Specifically the 'total' seems out of
> place, it doesn't require all memory to be encrypted.

MKTME implies TME. TME is enabled by the BIOS and encrypts all memory with
a CPU-generated key. MKTME allows using other keys, or disabling
encryption, for a page.

But yes, the name is not good.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 11/13] keys/mktme: Program memory encryption keys on a system wide basis
  2018-12-04  9:21   ` Peter Zijlstra
@ 2018-12-04  9:50     ` Kirill A. Shutemov
  2018-12-05  5:44       ` Alison Schofield
  2018-12-05  5:43     ` Alison Schofield
  1 sibling, 1 reply; 87+ messages in thread
From: Kirill A. Shutemov @ 2018-12-04  9:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alison Schofield, dhowells, tglx, jmorris, mingo, hpa, bp, luto,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Tue, Dec 04, 2018 at 09:21:45AM +0000, Peter Zijlstra wrote:
> On Mon, Dec 03, 2018 at 11:39:58PM -0800, Alison Schofield wrote:
> 
> > +struct mktme_hw_program_info {
> > +	struct mktme_key_program *key_program;
> > +	unsigned long status;
> > +};
> > +
> > +/* Program a KeyID on a single package. */
> > +static void mktme_program_package(void *hw_program_info)
> > +{
> > +	struct mktme_hw_program_info *info = hw_program_info;
> > +	int ret;
> > +
> > +	ret = mktme_key_program(info->key_program);
> > +	if (ret != MKTME_PROG_SUCCESS)
> > +		WRITE_ONCE(info->status, ret);
> 
> What's the purpose of that WRITE_ONCE()?

[I suggested the code to Alison.]

Yes, you're right. Simple assignment will do.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 04/13] x86/mm: Add helper functions for MKTME memory encryption keys
  2018-12-04  7:39 ` [RFC v2 04/13] x86/mm: Add helper functions for MKTME " Alison Schofield
  2018-12-04  9:14   ` Peter Zijlstra
@ 2018-12-04 15:35   ` Andy Lutomirski
  2018-12-05  5:52     ` Alison Schofield
  2018-12-06  8:31   ` Sakkinen, Jarkko
  2 siblings, 1 reply; 87+ messages in thread
From: Andy Lutomirski @ 2018-12-04 15:35 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, peterz,
	kirill.shutemov, dave.hansen, kai.huang, jun.nakajima,
	dan.j.williams, jarkko.sakkinen, keyrings, linux-security-module,
	linux-mm, x86



> On Dec 3, 2018, at 11:39 PM, Alison Schofield <alison.schofield@intel.com> wrote:
> 
> Define a global mapping structure to manage the mapping of userspace
> Keys to hardware KeyIDs in MKTME (Multi-Key Total Memory Encryption).
> Implement helper functions that access this mapping structure.
> 

Why is a key “void *”?  Who owns the memory?  Can a real type be used?


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (13 preceding siblings ...)
  2018-12-04  9:25 ` [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Peter Zijlstra
@ 2018-12-04 19:19 ` Andy Lutomirski
  2018-12-04 20:00   ` Andy Lutomirski
                     ` (2 more replies)
  2018-12-05 20:30 ` Sakkinen, Jarkko
  15 siblings, 3 replies; 87+ messages in thread
From: Andy Lutomirski @ 2018-12-04 19:19 UTC (permalink / raw)
  To: alison.schofield, Matthew Wilcox, Dan Williams
  Cc: David Howells, Thomas Gleixner, James Morris, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Andrew Lutomirski,
	Peter Zijlstra, Kirill A. Shutemov, Dave Hansen, kai.huang,
	Jun Nakajima, Sakkinen, Jarkko, keyrings, LSM List, Linux-MM,
	X86 ML

On Mon, Dec 3, 2018 at 11:37 PM Alison Schofield
<alison.schofield@intel.com> wrote:
>
> Hi Thomas, David,
>
> Here is an updated RFC on the API's to support MKTME.
> (Multi-Key Total Memory Encryption)
>
> This RFC presents the 2 API additions to support the creation and
> usage of memory encryption keys:
>  1) Kernel Key Service type "mktme"
>  2) System call encrypt_mprotect()
>
> This patchset is built upon Kirill Shutemov's work for the core MKTME
> support.
>
> David: Please let me know if the changes made, based on your review,
> are reasonable. I don't think that the new changes touch key service
> specific areas (much).
>
> Thomas: Please provide feedback on encrypt_mprotect(). If not a
> review, then a direction check would be helpful.
>

I'm not Thomas, but I think it's the wrong direction.  As it stands,
encrypt_mprotect() is an incomplete version of mprotect() (since it's
missing the protection key support), and it's also functionally just
MADV_DONTNEED.  In other words, the sole user-visible effect appears
to be that the existing pages are blown away.  The fact that it
changes the key in use doesn't seem terribly useful, since it's
anonymous memory, and the most secure choice is to use CPU-managed
keying, which appears to be the default anyway on TME systems.  It
also has totally unclear semantics WRT swap, and, off the top of my
head, it looks like it may have serious cache-coherency issues and
like swapping the pages might corrupt them, both because there are no
flushes and because the direct-map alias looks like it will use the
default key and therefore appear to contain the wrong data.

I would propose a very different direction: don't try to support MKTME
at all for anonymous memory, and instead figure out the important use
cases and support them directly.  The use cases that I can think of
off the top of my head are:

1. pmem.  This should probably use a very different API.

2. Some kind of VM hardening, where a VM's memory can be protected a
little tiny bit from the main kernel.  But I don't see why this is any
better than XPFO (eXclusive Page Frame Ownership), which brings to
mind:

The main implementation concern I have with this patch set is cache
coherency and handling of the direct map.  Unless I missed something,
you're not doing anything about the direct map, which means that you
have RW aliases of the same memory with different keys.  For use case
#2, this probably means that you need to either get rid of the direct
map and make get_user_pages() fail, or you need to change the key on
the direct map as well, probably using the pageattr.c code.

As for caching, as far as I can tell from reading the preliminary
docs, Intel's MKTME, much like AMD's SME, is basically invisible to
the hardware cache coherency mechanism.  So, if you modify a physical
address with one key (or SME-enable bit), and you read it with
another, you get garbage unless you flush.  And, if you modify memory
with one key then remap it with a different key without flushing in
the meantime, you risk corruption.  And, what's worse, if I'm reading
between the lines in the docs correctly, if you use PCONFIG to change
a key, you may need to do a bunch of cache flushing to ensure you get
reasonable effects.  (If you have dirty cache lines for some (PA, key)
and you PCONFIG to change the underlying key, you get different
results depending on whether the writeback happens before or after the
package doing the writeback notices the PCONFIG.)

Finally, if you're going to teach the kernel how to have some user
pages that aren't in the direct map, you've essentially done XPFO,
which is nifty but expensive.  And I think that doing this gets you
essentially all the benefit of MKTME for the non-pmem use case.  Why
exactly would any software want to use anything other than a
CPU-managed key for anything other than pmem?

--Andy

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-04 19:19 ` Andy Lutomirski
@ 2018-12-04 20:00   ` Andy Lutomirski
  2018-12-04 20:32     ` Dave Hansen
  2018-12-05 22:19   ` Sakkinen, Jarkko
  2018-12-05 23:49   ` Dave Hansen
  2 siblings, 1 reply; 87+ messages in thread
From: Andy Lutomirski @ 2018-12-04 20:00 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: alison.schofield, Matthew Wilcox, Dan Williams, David Howells,
	Thomas Gleixner, James Morris, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Kirill A. Shutemov, Dave Hansen,
	kai.huang, Jun Nakajima, Sakkinen, Jarkko, keyrings, LSM List,
	Linux-MM, X86 ML

On Tue, Dec 4, 2018 at 11:19 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Mon, Dec 3, 2018 at 11:37 PM Alison Schofield
> <alison.schofield@intel.com> wrote:
> >

> Finally, if you're going to teach the kernel how to have some user
> pages that aren't in the direct map, you've essentially done XPFO,
> which is nifty but expensive.  And I think that doing this gets you
> essentially all the benefit of MKTME for the non-pmem use case.  Why
> exactly would any software want to use anything other than a
> CPU-managed key for anything other than pmem?
>

Let me say this less abstractly.  Here's a somewhat concrete actual
proposal.  Make a new memfd_create() flag like MEMFD_ISOLATED.  The
semantics are that the underlying pages are made not-present in the
direct map when they're allocated (which is hideously slow, but so be
it), and that anything that tries to get_user_pages() the resulting
pages fails.  And then make sure we have all the required APIs so that
QEMU can still map this stuff into a VM.

If there is indeed a situation in which MKTME-ifying the memory adds
some value, then we can consider doing that.

And maybe we get fancy and encrypt this memory when it's swapped, but
maybe we should just encrypt everything when it's swapped.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-04 20:00   ` Andy Lutomirski
@ 2018-12-04 20:32     ` Dave Hansen
  0 siblings, 0 replies; 87+ messages in thread
From: Dave Hansen @ 2018-12-04 20:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: alison.schofield, Matthew Wilcox, Dan Williams, David Howells,
	Thomas Gleixner, James Morris, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Kirill A. Shutemov, kai.huang,
	Jun Nakajima, Sakkinen, Jarkko, keyrings, LSM List, Linux-MM,
	X86 ML

On 12/4/18 12:00 PM, Andy Lutomirski wrote:
> On Tue, Dec 4, 2018 at 11:19 AM Andy Lutomirski <luto@kernel.org> wrote:
>> On Mon, Dec 3, 2018 at 11:37 PM Alison Schofield <alison.schofield@intel.com> wrote:
>> Finally, if you're going to teach the kernel how to have some user
>> pages that aren't in the direct map, you've essentially done XPFO,
>> which is nifty but expensive.  And I think that doing this gets you
>> essentially all the benefit of MKTME for the non-pmem use case.  Why
>> exactly would any software want to use anything other than a
>> CPU-managed key for anything other than pmem?
> 
> Let me say this less abstractly.  Here's a somewhat concrete actual
> proposal.  Make a new memfd_create() flag like MEMFD_ISOLATED.  The
> semantics are that the underlying pages are made not-present in the
> direct map when they're allocated (which is hideously slow, but so be
> it), and that anything that tries to get_user_pages() the resulting
> pages fails.  And then make sure we have all the required APIs so that
> QEMU can still map this stuff into a VM.

I think we need get_user_pages().  We want direct I/O to work, *and* we
really want direct device assignment into VMs.

> And maybe we get fancy and encrypt this memory when it's swapped, but
> maybe we should just encrypt everything when it's swapped.

We decided long ago (and this should be in the patches somewhere) that
we wouldn't force memory to be encrypted in swap.  We would just
recommend it in the documentation as a best practice, especially when
using MKTME.

We can walk that back, of course, but that's what we're doing at the moment.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 07/13] x86/mm: Add helpers for reference counting encrypted VMAs
  2018-12-04  8:58   ` Peter Zijlstra
@ 2018-12-05  5:28     ` Alison Schofield
  0 siblings, 0 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-05  5:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Tue, Dec 04, 2018 at 09:58:35AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 03, 2018 at 11:39:54PM -0800, Alison Schofield wrote:
> 
> > +void vma_put_encrypt_ref(struct vm_area_struct *vma)
> > +{
> > +	if (vma_keyid(vma))
> > +		if (refcount_dec_and_test(&encrypt_count[vma_keyid(vma)])) {
> > +			mktme_map_lock();
> > +			mktme_map_free_keyid(vma_keyid(vma));
> > +			mktme_map_unlock();
> > +		}
> 
> This violates CodingStyle

Got it!
Will fix this and the other instances where you noticed poorly nested
if statements. 

> > +	if (refcount_dec_and_test(&encrypt_count[keyid])) {
> > +		mktme_map_lock();
> 
> That smells like it wants to use refcount_dec_and_lock() instead.
> 
> Also, if you write that like:
> 
> 	if (!refcount_dec_and_lock(&encrypt_count[keyid], &lock))
> 		return;
> 
> you lose an indent level.
Looks good! I need to make sure it's OK to switch to a spinlock to use
the *_lock functions.
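
A minimal userspace sketch of the dec-and-lock pattern Peter suggests, for
illustration only. It assumes a C11 atomic_flag spinlock as a stand-in for
a kernel spinlock_t; ref_dec_and_lock(), put_encrypt_ref(), map_lock and
freed_keyids are hypothetical names, not the kernel API or the patch code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock; a stand-in for the kernel's spinlock_t. */
typedef struct { atomic_flag held; } spin_t;

static void spin_lock(spin_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void spin_unlock(spin_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Drop one reference. Returns true, with the lock held, only when the
 * count reached zero; otherwise returns false without the lock.
 */
static bool ref_dec_and_lock(atomic_int *ref, spin_t *lock)
{
	int old = atomic_load(ref);

	assert(old >= 1);	/* the caller must hold a reference */

	/* Fast path: not the last reference, so no lock is needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(ref, &old, old - 1))
			return false;
	}

	/* Slow path: the count may hit zero, so decide under the lock. */
	spin_lock(lock);
	if (atomic_fetch_sub(ref, 1) != 1) {
		spin_unlock(lock);
		return false;
	}
	return true;
}

static spin_t map_lock = { ATOMIC_FLAG_INIT };
static int freed_keyids;	/* counts releases, for the example only */

static void put_encrypt_ref(atomic_int *ref)
{
	if (!ref_dec_and_lock(ref, &map_lock))
		return;

	freed_keyids++;		/* stands in for mktme_map_free_keyid() */
	spin_unlock(&map_lock);
}
```

The early return keeps the release path at one indent level, and the
zero test and the lock acquisition can no longer be separated by another
thread taking a new reference in between.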

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 09/13] mm: Restrict memory encryption to anonymous VMA's
  2018-12-04  9:10   ` Peter Zijlstra
@ 2018-12-05  5:30     ` Alison Schofield
  2018-12-05  9:07       ` Peter Zijlstra
  0 siblings, 1 reply; 87+ messages in thread
From: Alison Schofield @ 2018-12-05  5:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Tue, Dec 04, 2018 at 10:10:44AM +0100, Peter Zijlstra wrote:
> > + * Encrypted mprotect is only supported on anonymous mappings.
> > + * All VMA's in the requested range must be anonymous. If this
> > + * test fails on any single VMA, the entire mprotect request fails.
> > + */
> > +bool mem_supports_encryption(struct vm_area_struct *vma, unsigned long end)
> 
> That's a 'weird' interface and cannot do what the comment says it should
> do.

More please? With MKTME, only anonymous memory supports encryption.
Is it the naming that's weird, or you don't see it doing what it says?

> > +	struct vm_area_struct *test_vma = vma;
> 
> That variable is utterly pointless.
Got it. Will fix.

Thanks

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 13/13] keys/mktme: Support CPU Hotplug for MKTME keys
  2018-12-04  9:28   ` Peter Zijlstra
@ 2018-12-05  5:32     ` Alison Schofield
  0 siblings, 0 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-05  5:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Tue, Dec 04, 2018 at 10:28:41AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 03, 2018 at 11:40:00PM -0800, Alison Schofield wrote:
> > +	for_each_online_cpu(online_cpu)
> > +		if (online_cpu != cpu &&
> > +		    pkgid == topology_physical_package_id(online_cpu)) {
> > +			cpumask_set_cpu(online_cpu, mktme_leadcpus);
> > +			break;
> > +	}
> 
> That's a capital offence right there.
Got it!


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 13/13] keys/mktme: Support CPU Hotplug for MKTME keys
  2018-12-04  9:31   ` Peter Zijlstra
@ 2018-12-05  5:36     ` Alison Schofield
  0 siblings, 0 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-05  5:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Tue, Dec 04, 2018 at 10:31:16AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 03, 2018 at 11:40:00PM -0800, Alison Schofield wrote:
> >  static int mktme_program_system(struct mktme_key_program *key_program,
> > -				cpumask_var_t mktme_cpumask)
> > +				cpumask_var_t mktme_cpumask, int hotplug)
> >  {
> >  	struct mktme_hw_program_info info = {
> >  		.key_program = key_program,
> >  		.status = MKTME_PROG_SUCCESS,
> >  	};
> > -	get_online_cpus();
> > -	on_each_cpu_mask(mktme_cpumask, mktme_program_package, &info, 1);
> > -	put_online_cpus();
> > +
> > +	if (!hotplug) {
> > +		get_online_cpus();
> > +		on_each_cpu_mask(mktme_cpumask, mktme_program_package,
> > +				 &info, 1);
> > +		put_online_cpus();
> > +	} else {
> > +		on_each_cpu_mask(mktme_cpumask, mktme_program_package,
> > +				 &info, 1);
> > +	}
> >  
> >  	return info.status;
> >  }
> 
> That is pretty horrible; and I think easily avoided.
Agree it's ugly. Not sure we share the same reasoning. I realize that
the hotplug case is on the current cpu and so that whole
on_each_cpu_mask() call is not needed. mktme_program_package() can just
be called on the current cpu.

The ugliness that haunts me is that I wanted to reuse this code path,
and so I passed that 'hotplug' parameter along as a differentiator
between hotplug & 'typical' key programming. 
I'll rework this.
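
One possible shape for that rework, as a userspace sketch only: two thin
entry points share the per-package helper, so no 'hotplug' flag has to be
passed down. All names here (program_package(), program_system(),
program_new_package(), NR_PKGS, PROG_SUCCESS, programmed[]) are
hypothetical stand-ins, and the locking of the real code is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PKGS		4	/* assumed package count, illustration only */
#define PROG_SUCCESS	0	/* hypothetical status code */

struct program_info {
	int keyid;
	int status;
};

static int programmed[NR_PKGS];	/* stands in for per-package key tables */

/* Program one package; plays the role of mktme_program_package(). */
static void program_package(int pkg, struct program_info *info)
{
	assert(pkg >= 0 && pkg < NR_PKGS);
	programmed[pkg] = info->keyid;
}

/* Typical path: program every package selected in the mask. */
static int program_system(int keyid, const bool *pkg_mask)
{
	struct program_info info = { .keyid = keyid, .status = PROG_SUCCESS };

	for (int pkg = 0; pkg < NR_PKGS; pkg++) {
		if (pkg_mask[pkg])
			program_package(pkg, &info);
	}
	return info.status;
}

/* Hotplug path: only the package that just came online is programmed. */
static int program_new_package(int keyid, int pkg)
{
	struct program_info info = { .keyid = keyid, .status = PROG_SUCCESS };

	program_package(pkg, &info);
	return info.status;
}
```

With this split, each caller states its own context, and the helper does
not need to know which path it was reached from.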

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 11/13] keys/mktme: Program memory encryption keys on a system wide basis
  2018-12-04  9:21   ` Peter Zijlstra
  2018-12-04  9:50     ` Kirill A. Shutemov
@ 2018-12-05  5:43     ` Alison Schofield
  2018-12-05  9:10       ` Peter Zijlstra
  1 sibling, 1 reply; 87+ messages in thread
From: Alison Schofield @ 2018-12-05  5:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Tue, Dec 04, 2018 at 10:21:45AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 03, 2018 at 11:39:58PM -0800, Alison Schofield wrote:
> 
> > +static int mktme_build_leadcpus_mask(void)
> > +{
> > +	int online_cpu, mktme_cpu;
> > +	int online_pkgid, mktme_pkgid = -1;
> > +
> > +	if (!zalloc_cpumask_var(&mktme_leadcpus, GFP_KERNEL))
> > +		return -ENOMEM;
> > +
> > +	for_each_online_cpu(online_cpu) {
> > +		online_pkgid = topology_physical_package_id(online_cpu);
> > +
> > +		for_each_cpu(mktme_cpu, mktme_leadcpus) {
> > +			mktme_pkgid = topology_physical_package_id(mktme_cpu);
> > +			if (mktme_pkgid == online_pkgid)
> > +				break;
> > +		}
> > +		if (mktme_pkgid != online_pkgid)
> > +			cpumask_set_cpu(online_cpu, mktme_leadcpus);
> 
> Do you really need LOCK prefixed bit set here?
No. Changed to __cpumask_set_cpu(). Will check for other instances
where I can skip LOCK prefix.

> How is that serialized and kept relevant in the face of hotplug?
mktme_leadcpus is updated on hotplug startup and teardowns.

> Also, do you really need O(n^2) to find the first occurrence of a value
> in an array?
How about this O(n)?
	
	unsigned long *pkg_map;
	int cpu, pkgid;

	if (!zalloc_cpumask_var(&mktme_leadcpus, GFP_KERNEL))
		return -ENOMEM;

	pkg_map = bitmap_zalloc(topology_max_packages(), GFP_KERNEL);
	if (!pkg_map) {
		free_cpumask_var(mktme_leadcpus);
		return -ENOMEM;
	}
	for_each_online_cpu(cpu) {
		pkgid = topology_physical_package_id(cpu);
		if (!test_and_set_bit(pkgid, pkg_map))
			__cpumask_set_cpu(cpu, mktme_leadcpus);
	}
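
The same single-pass idea as a self-contained userspace sketch, for
illustration only. A plain bool array stands in for the package bitmap;
pick_lead_cpus(), pkg_of_cpu[] and MAX_PKGS are hypothetical names, and
the package ids are assumed to be small non-negative integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PKGS 64	/* assumed upper bound, for illustration only */

/*
 * Mark the first CPU seen in each package as that package's lead CPU.
 * pkg_of_cpu[i] is the package id of CPU i; lead[i] is set to true when
 * CPU i is a lead CPU. Returns the number of lead CPUs found.
 */
static size_t pick_lead_cpus(const int *pkg_of_cpu, size_t ncpus, bool *lead)
{
	bool seen[MAX_PKGS] = { false };
	size_t nlead = 0;

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		int pkg = pkg_of_cpu[cpu];

		assert(pkg >= 0 && pkg < MAX_PKGS);

		/* One lookup per CPU replaces the inner scan of lead CPUs. */
		lead[cpu] = !seen[pkg];
		if (lead[cpu]) {
			seen[pkg] = true;
			nlead++;
		}
	}
	return nlead;
}
```

Each CPU is visited once, so the cost is linear in the number of CPUs,
at the price of one temporary flag per possible package.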

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 11/13] keys/mktme: Program memory encryption keys on a system wide basis
  2018-12-04  9:50     ` Kirill A. Shutemov
@ 2018-12-05  5:44       ` Alison Schofield
  0 siblings, 0 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-05  5:44 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Peter Zijlstra, dhowells, tglx, jmorris, mingo, hpa, bp, luto,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Tue, Dec 04, 2018 at 12:50:09PM +0300, Kirill A. Shutemov wrote:
> On Tue, Dec 04, 2018 at 09:21:45AM +0000, Peter Zijlstra wrote:
> > On Mon, Dec 03, 2018 at 11:39:58PM -0800, Alison Schofield wrote:
> > 
> > > +struct mktme_hw_program_info {
> > > +	struct mktme_key_program *key_program;
> > > +	unsigned long status;
> > > +};
> > > +
> > > +/* Program a KeyID on a single package. */
> > > +static void mktme_program_package(void *hw_program_info)
> > > +{
> > > +	struct mktme_hw_program_info *info = hw_program_info;
> > > +	int ret;
> > > +
> > > +	ret = mktme_key_program(info->key_program);
> > > +	if (ret != MKTME_PROG_SUCCESS)
> > > +		WRITE_ONCE(info->status, ret);
> > 
> > What's the purpose of that WRITE_ONCE()?
> 
> [I suggested the code to Alison.]
> 
> Yes, you're right. Simple assignment will do.
Will do. Thanks!

> 
> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 04/13] x86/mm: Add helper functions for MKTME memory encryption keys
  2018-12-04  9:14   ` Peter Zijlstra
@ 2018-12-05  5:49     ` Alison Schofield
  0 siblings, 0 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-05  5:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Tue, Dec 04, 2018 at 10:14:34AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 03, 2018 at 11:39:51PM -0800, Alison Schofield wrote:
> 
> CodingStyle
> CodingStyle
>
Thanks Peter. I'll repair all the badly nested if statements.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 04/13] x86/mm: Add helper functions for MKTME memory encryption keys
  2018-12-04 15:35   ` Andy Lutomirski
@ 2018-12-05  5:52     ` Alison Schofield
  0 siblings, 0 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-05  5:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, peterz,
	kirill.shutemov, dave.hansen, kai.huang, jun.nakajima,
	dan.j.williams, jarkko.sakkinen, keyrings, linux-security-module,
	linux-mm, x86

On Tue, Dec 04, 2018 at 07:35:50AM -0800, Andy Lutomirski wrote:
> 
> 
> > On Dec 3, 2018, at 11:39 PM, Alison Schofield <alison.schofield@intel.com> wrote:
> > 
> > Define a global mapping structure to manage the mapping of userspace
> > Keys to hardware KeyIDs in MKTME (Multi-Key Total Memory Encryption).
> > Implement helper functions that access this mapping structure.
> > 
> 
> Why is a key “void *”?  Who owns the memory?  Can a real type be used?
>
It's of type "struct key" of the kernel key service.
Replacing void with 'struct key'.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 09/13] mm: Restrict memory encryption to anonymous VMA's
  2018-12-05  5:30     ` Alison Schofield
@ 2018-12-05  9:07       ` Peter Zijlstra
  0 siblings, 0 replies; 87+ messages in thread
From: Peter Zijlstra @ 2018-12-05  9:07 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Tue, Dec 04, 2018 at 09:30:20PM -0800, Alison Schofield wrote:
> On Tue, Dec 04, 2018 at 10:10:44AM +0100, Peter Zijlstra wrote:
> > > + * Encrypted mprotect is only supported on anonymous mappings.
> > > + * All VMA's in the requested range must be anonymous. If this
> > > + * test fails on any single VMA, the entire mprotect request fails.
> > > + */
> > > +bool mem_supports_encryption(struct vm_area_struct *vma, unsigned long end)
> > 
> > That's a 'weird' interface and cannot do what the comment says it should
> > do.
> 
> More please? With MKTME, only anonymous memory supports encryption.
> Is it the naming that's weird, or you don't see it doing what it says?

It's weird because you don't fully specify the range -- ie. it cannot
verify the vma argument. It is also weird because the start and end are
not of the same type -- or rather, there is no start at all.

So while the comment talks about a range, there is not in fact a range
(only the implied @start is somewhere inside @vma). The comment also
states all vmas in the range, but again, because of a lack of range
specification it cannot verify this statement.

Now, I don't necessarily object to the function and its implementation,
but that comment is just plain misleading.
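
To make the point concrete, here is a userspace sketch of a check whose
signature does carry the whole range, so the "all VMAs in the range"
claim becomes something the function can actually verify. struct region
and range_is_anonymous() are hypothetical names for illustration, not the
kernel's vm_area_struct or the patch's helper, and holes in the range are
deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a VMA: [start, end) plus an anonymous flag. */
struct region {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	bool anonymous;
};

/*
 * Return true only if every region overlapping [start, end) is anonymous.
 * Gaps between regions are not treated as errors in this sketch.
 */
static bool range_is_anonymous(const struct region *r, size_t n,
			       unsigned long start, unsigned long end)
{
	assert(start < end);

	for (size_t i = 0; i < n; i++) {
		/* Skip regions that do not overlap [start, end). */
		if (r[i].end <= start || r[i].start >= end)
			continue;
		if (!r[i].anonymous)
			return false;
	}
	return true;
}
```

Because both ends of the range are arguments of the same type, a caller
cannot pass a region and an end address that do not belong together.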

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 11/13] keys/mktme: Program memory encryption keys on a system wide basis
  2018-12-05  5:43     ` Alison Schofield
@ 2018-12-05  9:10       ` Peter Zijlstra
  2018-12-05 17:26         ` Alison Schofield
  0 siblings, 1 reply; 87+ messages in thread
From: Peter Zijlstra @ 2018-12-05  9:10 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Tue, Dec 04, 2018 at 09:43:53PM -0800, Alison Schofield wrote:
> On Tue, Dec 04, 2018 at 10:21:45AM +0100, Peter Zijlstra wrote:
> > On Mon, Dec 03, 2018 at 11:39:58PM -0800, Alison Schofield wrote:
> > 
> > > +static int mktme_build_leadcpus_mask(void)
> > > +{
> > > +	int online_cpu, mktme_cpu;
> > > +	int online_pkgid, mktme_pkgid = -1;
> > > +
> > > +	if (!zalloc_cpumask_var(&mktme_leadcpus, GFP_KERNEL))
> > > +		return -ENOMEM;
> > > +
> > > +	for_each_online_cpu(online_cpu) {
> > > +		online_pkgid = topology_physical_package_id(online_cpu);
> > > +
> > > +		for_each_cpu(mktme_cpu, mktme_leadcpus) {
> > > +			mktme_pkgid = topology_physical_package_id(mktme_cpu);
> > > +			if (mktme_pkgid == online_pkgid)
> > > +				break;
> > > +		}
> > > +		if (mktme_pkgid != online_pkgid)
> > > +			cpumask_set_cpu(online_cpu, mktme_leadcpus);
> > 
> > Do you really need LOCK prefixed bit set here?
> No. Changed to __cpumask_set_cpu(). Will check for other instances
> where I can skip LOCK prefix.
> 
> > How is that serialized and kept relevant in the face of hotplug?
> mktme_leadcpus is updated on hotplug startup and teardowns.

Not in this patch it is not. That is added in a subsequent patch, which
means that during bisection hotplug is utterly wrecked if you happen to
land between these patches, that is bad.

> > Also, do you really need O(n^2) to find the first occurrence of a value
> > in an array?

> How about this O(n)?
> 	
> 	unsigned long *pkg_map;
> 	int cpu, pkgid;
> 
> 	if (!zalloc_cpumask_var(&mktme_leadcpus, GFP_KERNEL))
> 		return -ENOMEM;
> 
> 	pkg_map = bitmap_zalloc(topology_max_packages(), GFP_KERNEL);
> 	if (!pkg_map) {
> 		free_cpumask_var(mktme_leadcpus);
> 		return -ENOMEM;
> 	}
> 	for_each_online_cpu(cpu) {
> 		pkgid = topology_physical_package_id(cpu);
> 		if (!test_and_set_bit(pkgid, pkg_map))

You again don't need that LOCK prefix here.

	__test_and_set_bit() :-)

> 			__cpumask_set_cpu(cpu, mktme_leadcpus);
> 	}

Right.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 11/13] keys/mktme: Program memory encryption keys on a system wide basis
  2018-12-05  9:10       ` Peter Zijlstra
@ 2018-12-05 17:26         ` Alison Schofield
  0 siblings, 0 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-05 17:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, kirill.shutemov,
	dave.hansen, kai.huang, jun.nakajima, dan.j.williams,
	jarkko.sakkinen, keyrings, linux-security-module, linux-mm, x86

On Wed, Dec 05, 2018 at 10:10:29AM +0100, Peter Zijlstra wrote:
> On Tue, Dec 04, 2018 at 09:43:53PM -0800, Alison Schofield wrote:
> > On Tue, Dec 04, 2018 at 10:21:45AM +0100, Peter Zijlstra wrote:
> > > On Mon, Dec 03, 2018 at 11:39:58PM -0800, Alison Schofield wrote:
> > 
> > > How is that serialized and kept relevant in the face of hotplug?
> > mktme_leadcpus is updated on hotplug startup and teardowns.
> 
> Not in this patch it is not. That is added in a subsequent patch, which
> means that during bisection hotplug is utterly wrecked if you happen to
> land between these patches. That is bad.
>
The Key Service support is split between 4 main patches (10-13), but
the dependencies go further back in the patchset.

If the bisect need outweighs any benefit from reviewing in pieces,
then these patches can be squashed to a single patch:

keys/mktme: Add the MKTME Key Service type for memory encryption
keys/mktme: Program memory encryption keys on a system wide basis
keys/mktme: Save MKTME data if kernel cmdline parameter allows
keys/mktme: Support CPU Hotplug for MKTME keys

Am I interpreting your point correctly?
Thanks,
Alison


* Re: [RFC v2 01/13] x86/mktme: Document the MKTME APIs
  2018-12-04  7:39 ` [RFC v2 01/13] x86/mktme: Document the MKTME APIs Alison Schofield
@ 2018-12-05 18:11   ` Andy Lutomirski
  2018-12-05 19:22     ` Alison Schofield
  2018-12-06  8:04   ` Sakkinen, Jarkko
  1 sibling, 1 reply; 87+ messages in thread
From: Andy Lutomirski @ 2018-12-05 18:11 UTC (permalink / raw)
  To: Alison Schofield
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, peterz,
	kirill.shutemov, dave.hansen, kai.huang, jun.nakajima,
	dan.j.williams, jarkko.sakkinen, keyrings, linux-security-module,
	linux-mm, x86



> On Dec 3, 2018, at 11:39 PM, Alison Schofield <alison.schofield@intel.com> wrote:

I realize you’re writing code to expose hardware behavior, but I’m not sure this
really makes sense in this context.

> .
> +
> +Usage
> +-----
> +    When using the Kernel Key Service to request an *mktme* key,
> +    specify the *payload* as follows:
> +
> +    type=
> +        *user*    User will supply the encryption key data. Use this
> +                type to directly program a hardware encryption key.
> +

I think that “user” probably makes sense as a “key service” key, but I don’t think it is at all useful for non-persistent memory.  Even if we take for granted that MKTME for anonymous memory is useful at all, “cpu” seems to be better in all respects.


Perhaps support for “user” should be tabled until there’s a design for how to use this for pmem?  I imagine it would look quite a bit like dm-crypt.  Advanced pmem filesystems could plausibly use different keys for different files, I suppose.

If “user” is dropped, I think a lot of the complexity goes away. Hotplug becomes automatic, right?

> +        *cpu*    User requests a CPU generated encryption key.

Okay, maybe, but it’s still unclear to me exactly what the intended benefit is.

> +                The CPU generates and assigns an ephemeral key.
> +
> +        *clear* User requests that a hardware encryption key be
> +                cleared. This will clear the encryption key from
> +                the hardware. On execution this hardware key gets
> +                TME behavior.
> +

Why is this a key type?  Shouldn’t the API to select a key just have an option to ask for no key to be used?

> +        *no-encrypt*
> +                 User requests that hardware does not encrypt
> +                 memory when this key is in use.

Same as above.  If there’s a performance benefit, then there could be a way to ask for cleartext memory.  Similarly, some pmem users may want a way to keep their pmem unencrypted.
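
To make the four payload types under discussion concrete, here is a
minimal userspace sketch of mapping a "type=" option to an enum. It is
not the parser from the patch series: the enum, the function name
mktme_parse_type() and the decision to look at nothing but "type=" are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; the series may represent these differently. */
enum mktme_type {
	MKTME_TYPE_INVALID = -1,
	MKTME_TYPE_USER,
	MKTME_TYPE_CPU,
	MKTME_TYPE_CLEAR,
	MKTME_TYPE_NO_ENCRYPT,
};

/* Map a payload of the form "type=<name>" to one of the types above. */
static enum mktme_type mktme_parse_type(const char *payload)
{
	static const struct {
		const char *name;
		enum mktme_type type;
	} table[] = {
		{ "user",       MKTME_TYPE_USER },
		{ "cpu",        MKTME_TYPE_CPU },
		{ "clear",      MKTME_TYPE_CLEAR },
		{ "no-encrypt", MKTME_TYPE_NO_ENCRYPT },
	};
	static const char prefix[] = "type=";
	const size_t plen = sizeof(prefix) - 1;

	assert(payload != NULL);
	if (strncmp(payload, prefix, plen) != 0)
		return MKTME_TYPE_INVALID;

	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(payload + plen, table[i].name) == 0)
			return table[i].type;

	return MKTME_TYPE_INVALID;
}
```

Seen this way, only "user" and "cpu" describe key material, while
"clear" and "no-encrypt" describe actions or modes, which is the
distinction the questions above are getting at.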

—Andy


* Re: [RFC v2 01/13] x86/mktme: Document the MKTME APIs
  2018-12-05 18:11   ` Andy Lutomirski
@ 2018-12-05 19:22     ` Alison Schofield
  2018-12-05 23:35       ` Andy Lutomirski
  0 siblings, 1 reply; 87+ messages in thread
From: Alison Schofield @ 2018-12-05 19:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: dhowells, tglx, jmorris, mingo, hpa, bp, luto, peterz,
	kirill.shutemov, dave.hansen, kai.huang, jun.nakajima,
	dan.j.williams, jarkko.sakkinen, keyrings, linux-security-module,
	linux-mm, x86

On Wed, Dec 05, 2018 at 10:11:18AM -0800, Andy Lutomirski wrote:
> 
> 
> > On Dec 3, 2018, at 11:39 PM, Alison Schofield <alison.schofield@intel.com> wrote:
> 
> I realize you’re writing code to expose hardware behavior, but I’m not sure this
> really makes sense in this context.

Your observation is accurate. The Usage defined here is very closely
aligned to the Intel MKTME Architecture spec. That's a starting point,
but not the ending point. We need to implement the feature set that
makes sense. More below...

> > +
> > +    type=
> > +        *user*    User will supply the encryption key data. Use this
> > +                type to directly program a hardware encryption key.
> > +
> 
> I think that “user” probably makes sense as a “key service” key, but I don’t think it is at all useful for non-persistent memory.  Even if we take for granted that MKTME for anonymous memory is useful at all, “cpu” seems to be better in all respects.
> 
> 
> Perhaps support for “user” should be tabled until there’s a design for how to use this for pmem?  I imagine it would look quite a bit like dm-crypt.  Advanced pmem filesystems could plausibly use different keys for different files, I suppose.
> 
> If “user” is dropped, I think a lot of the complexity goes away. Hotplug becomes automatic, right?

Dropping 'user' type removes a great deal of complexity.

Let me follow up in 2 ways:
1) Find out when MKTME support for pmem is required.
2) Go back to the requirements and get the justification for user
type.

> 
> > +        *cpu*    User requests a CPU generated encryption key.
> 
> Okay, maybe, but it’s still unclear to me exactly what the intended benefit is, though.
*cpu* is the RANDOM key generated by the cpu. If there were no other
options, then this would be default, and go away.

> > +        *clear* User requests that a hardware encryption key be
> > +                cleared. This will clear the encryption key from
> > +                the hardware. On execution this hardware key gets
> > +                TME behavior.
> > +
> 
> Why is this a key type?  Shouldn’t the API to select a key just have an option to ask for no key to be used?

The *clear* key has been requested in order to clear/erase the user's
key data that has been programmed into a hardware slot. User does not
want to leave a slot programmed with their encryption data when they
are done with it.

> > +        *no-encrypt*
> > +                 User requests that hardware does not encrypt
> > +                 memory when this key is in use.
> 
> Same as above.  If there’s a performance benefit, then there could be a way to ask for cleartext memory.  Similarly, some pmem users may want a way to keep their pmem unencrypted.

So, this is the way to ask for cleartext memory.
The entire system will be encrypted with the system wide TME Key.
A subset of that will be protected with MKTME Keys.
If the user wants no encryption, *no-encrypt* is the way to do it.

Alison
> 
> —Andy


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
                   ` (14 preceding siblings ...)
  2018-12-04 19:19 ` Andy Lutomirski
@ 2018-12-05 20:30 ` Sakkinen, Jarkko
  15 siblings, 0 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-05 20:30 UTC (permalink / raw)
  To: tglx, Schofield, Alison, dhowells
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, linux-mm,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	bp, Hansen, Dave, Nakajima, Jun

On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> Hi Thomas, David,
> 
> Here is an updated RFC on the API's to support MKTME.
> (Multi-Key Total Memory Encryption)
> 
> This RFC presents the 2 API additions to support the creation and
> usage of memory encryption keys:
>  1) Kernel Key Service type "mktme"
>  2) System call encrypt_mprotect()
> 
> This patchset is built upon Kirill Shutemov's work for the core MKTME
> support.

Please, explain what MKTME is right here.

No references, no explanations... Even with a reference, a short
summary would be really nice to have.

/Jarkko


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-04  9:46   ` Kirill A. Shutemov
@ 2018-12-05 20:32     ` Sakkinen, Jarkko
  2018-12-06 11:22       ` Kirill A. Shutemov
  0 siblings, 1 reply; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-05 20:32 UTC (permalink / raw)
  To: kirill.shutemov, peterz
  Cc: jmorris, Huang, Kai, keyrings, tglx, linux-mm, dhowells,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	bp, Hansen, Dave, Schofield, Alison, Nakajima, Jun

On Tue, 2018-12-04 at 12:46 +0300, Kirill A. Shutemov wrote:
> On Tue, Dec 04, 2018 at 09:25:50AM +0000, Peter Zijlstra wrote:
> > On Mon, Dec 03, 2018 at 11:39:47PM -0800, Alison Schofield wrote:
> > > (Multi-Key Total Memory Encryption)
> > 
> > I think that MKTME is a horrible name, and doesn't appear to accurately
> > describe what it does either. Specifically the 'total' seems out of
> > place, it doesn't require all memory to be encrypted.
> 
> MKTME implies TME. TME is enabled by BIOS and it encrypts all memory with
> a CPU-generated key. MKTME allows the use of other keys, or disabling
> encryption, for a page.

When you say "disable encryption to a page" does the encryption get
actually disabled or does the CPU just decrypt it transparently i.e.
what happens physically?

> But, yes, name is not good.

/Jarkko


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-04 19:19 ` Andy Lutomirski
  2018-12-04 20:00   ` Andy Lutomirski
@ 2018-12-05 22:19   ` Sakkinen, Jarkko
  2018-12-07  2:05     ` Huang, Kai
  2018-12-07 11:57     ` Kirill A. Shutemov
  2018-12-05 23:49   ` Dave Hansen
  2 siblings, 2 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-05 22:19 UTC (permalink / raw)
  To: Williams, Dan J, Schofield, Alison, luto, willy
  Cc: kirill.shutemov, jmorris, peterz, Huang, Kai, keyrings, tglx,
	linux-mm, dhowells, linux-security-module, x86, hpa, mingo, bp,
	Hansen, Dave, Nakajima, Jun

On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> I'm not Thomas, but I think it's the wrong direction.  As it stands,
> encrypt_mprotect() is an incomplete version of mprotect() (since it's
> missing the protection key support), and it's also functionally just
> MADV_DONTNEED.  In other words, the sole user-visible effect appears
> to be that the existing pages are blown away.  The fact that it
> changes the key in use doesn't seem terribly useful, since it's
> anonymous memory, and the most secure choice is to use CPU-managed
> keying, which appears to be the default anyway on TME systems.  It
> also has totally unclear semantics WRT swap, and, off the top of my
> head, it looks like it may have serious cache-coherency issues and
> like swapping the pages might corrupt them, both because there are no
> flushes and because the direct-map alias looks like it will use the
> default key and therefore appear to contain the wrong data.
> 
> I would propose a very different direction: don't try to support MKTME
> at all for anonymous memory, and instead figure out the important use
> cases and support them directly.  The use cases that I can think of
> off the top of my head are:
> 
> 1. pmem.  This should probably use a very different API.
> 
> 2. Some kind of VM hardening, where a VM's memory can be protected a
> little tiny bit from the main kernel.  But I don't see why this is any
> better than XPFO (eXclusive Page Frame Ownership), which brings to
> mind:

What is the threat model anyway for AMD and Intel technologies?

To me it looks like you can read, write and even replay
encrypted pages in both SME and TME.

/Jarkko


* Re: [RFC v2 01/13] x86/mktme: Document the MKTME APIs
  2018-12-05 19:22     ` Alison Schofield
@ 2018-12-05 23:35       ` Andy Lutomirski
  0 siblings, 0 replies; 87+ messages in thread
From: Andy Lutomirski @ 2018-12-05 23:35 UTC (permalink / raw)
  To: alison.schofield
  Cc: David Howells, Thomas Gleixner, James Morris, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Andrew Lutomirski,
	Peter Zijlstra, Kirill A. Shutemov, Dave Hansen, kai.huang,
	Jun Nakajima, Dan Williams, Sakkinen, Jarkko, keyrings, LSM List,
	Linux-MM, X86 ML

>> On Dec 5, 2018, at 11:22 AM, Alison Schofield <alison.schofield@intel.com> wrote:
>>
>> On Wed, Dec 05, 2018 at 10:11:18AM -0800, Andy Lutomirski wrote:
>>
>>
>>> On Dec 3, 2018, at 11:39 PM, Alison Schofield <alison.schofield@intel.com> wrote:
>>
>> I realize you’re writing code to expose hardware behavior, but I’m not sure this
>> really makes sense in this context.
>
> Your observation is accurate. The Usage defined here is very closely
> aligned to the Intel MKTME Architecture spec. That's a starting point,
> but not the ending point. We need to implement the feature set that
> makes sense. More below...
>
>>> +
>>> +    type=
>>> +        *user*    User will supply the encryption key data. Use this
>>> +                type to directly program a hardware encryption key.
>>> +
>>
>> I think that “user” probably makes sense as a “key service” key, but I don’t think it is at all useful for non-persistent memory.  Even if we take for granted that MKTME for anonymous memory is useful at all, “cpu” seems to be better in all respects.
>>
>>
>> Perhaps support for “user” should be tabled until there’s a design for how to use this for pmem?  I imagine it would look quite a bit like dm-crypt.  Advanced pmem filesystems could plausibly use different keys for different files, I suppose.
>>
>> If “user” is dropped, I think a lot of the complexity goes away. Hotplug becomes automatic, right?
>
> Dropping 'user' type removes a great deal of complexity.
>
> Let me follow up in 2 ways:
> 1) Find out when MKTME support for pmem is required.
> 2) Go back to the requirements and get the justification for user
> type.
>
>>
>>> +        *cpu*    User requests a CPU generated encryption key.
>>
>> Okay, maybe, but it’s still unclear to me exactly what the intended benefit is, though.
> *cpu* is the RANDOM key generated by the cpu. If there were no other
> options, then this would be default, and go away.
>
>>> +        *clear* User requests that a hardware encryption key be
>>> +                cleared. This will clear the encryption key from
>>> +                the hardware. On execution this hardware key gets
>>> +                TME behavior.
>>> +
>>
>> Why is this a key type?  Shouldn’t the API to select a key just have an option to ask for no key to be used?
>
> The *clear* key has been requested in order to clear/erase the user's
> key data that has been programmed into a hardware slot. User does not
> want to leave a slot programmed with their encryption data when they
> are done with it.

Can’t you just clear the key when the key is deleted by the user?
Asking the user to allocate a *new* key and hope that it somehow ends
up in the same spot seems like a poor design, especially if future
hardware gains support for key slot virtualization in some way that
makes the slot allocation more dynamic.

>
>>> +        *no-encrypt*
>>> +                 User requests that hardware does not encrypt
>>> +                 memory when this key is in use.
>>
>> Same as above.  If there’s a performance benefit, then there could be a way to ask for cleartext memory.  Similarly, some pmem users may want a way to keep their pmem unencrypted.
>
> So, this is the way to ask for cleartext memory.
> The entire system will be encrypted with the system wide TME Key.
> A subset of that will be protected with MKTME Keys.
> If user wants, no encrypt, this *no-encrypt* is the way to do it.
>

Understood.  I’m saying that having a *key* (in the add_key sense) for
it seems unnecessary.  Whatever the final API for controlling the use
of keys, adding an option to ask for clear text seems reasonable.
This actually seems more useful for anonymous memory than
CPU-generated keys are, IMO.

I do think that, before you invest too much time in perfecting the
series with the current design, you should identify the use cases,
make sure the use cases are valid, and figure out whether your API
design is appropriate.  After considerable head-scratching, I haven’t
thought of a reason that explicit CPU generated keys are any better
than the default TME key, at least in the absence of additional
hardware support for locking down what code can use what key.  The
sole exception is that a key can be removed, which is probably faster
than directly zeroing large amounts of data.

I understand that it would be very nice to say "hey, cloud customer,
your VM has all its memory encrypted with a key that is unique to your
VM", but that seems to be more or less just a platitude with no actual
effect.  Anyone who snoops the memory bus or steals a DIMM learns
nothing unless they also take control of the CPU and can replay all
the data into the CPU.  On the other hand, anyone who can get the CPU
to read from a given physical address (which seems like the most
likely threat) can just get the CPU to decrypt any tenant's data.  So,
for example, if someone manages to write a couple of words to the EPT
for one VM, then they can easily read another VM's data, MKTME or no
MKTME, because the memory controller has no clue which VM initiated
the access.

I suppose there's some smallish value in rotating the key every now
and then to make old data non-replayable, but an attack that
compromises the memory bus and only later compromises the CPU is a
strange threat model.


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-04 19:19 ` Andy Lutomirski
  2018-12-04 20:00   ` Andy Lutomirski
  2018-12-05 22:19   ` Sakkinen, Jarkko
@ 2018-12-05 23:49   ` Dave Hansen
  2018-12-06  1:09     ` Andy Lutomirski
  2 siblings, 1 reply; 87+ messages in thread
From: Dave Hansen @ 2018-12-05 23:49 UTC (permalink / raw)
  To: Andy Lutomirski, alison.schofield, Matthew Wilcox, Dan Williams
  Cc: David Howells, Thomas Gleixner, James Morris, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Kirill A. Shutemov, kai.huang, Jun Nakajima, Sakkinen, Jarkko,
	keyrings, LSM List, Linux-MM, X86 ML

On 12/4/18 11:19 AM, Andy Lutomirski wrote:
> I'm not Thomas, but I think it's the wrong direction.  As it stands,
> encrypt_mprotect() is an incomplete version of mprotect() (since it's
> missing the protection key support),

I thought about this when I added mprotect_pkey().  We start with:

	mprotect(addr, len, prot);

then

	mprotect_pkey(addr, len, prot, pkey);

then

	mprotect_pkey_encrypt(addr, len, prot, key);

That doesn't scale because we eventually have
mprotect_and_a_history_of_mm_features(). :)

What I was hoping to see was them do this (apologies for the horrible
indentation):

	addr = mmap(..., PROT_NONE);
	mprotect_pkey(   addr, len, PROT_NONE, pkey);
	mprotect_encrypt(addr, len, PROT_NONE, keyid);
	mprotect(        addr, len, real_prot);

The point is that you *can* stack these things and don't have to have an
mprotect_kitchen_sink() if you use PROT_NONE for intermediate
permissions during setup.
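
The stacking pattern can be tried today with the calls that already
exist. The sketch below is only an illustration: staged_map() is a
made-up helper, and the intermediate steps are left as a comment because
mprotect_encrypt() is a proposal in this thread, not an available
system call.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory with no access, then switch to the
 * requested protection in a final, separate step. Returns NULL on failure.
 */
static void *staged_map(size_t len, int real_prot)
{
	void *ptr;

	assert(len > 0);

	ptr = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (ptr == MAP_FAILED)
		return NULL;

	/*
	 * Intermediate attribute changes would go here while the range is
	 * still inaccessible, e.g. a protection-key step and the proposed
	 * (not yet existing) encryption-key step.
	 */

	if (mprotect(ptr, len, real_prot) != 0) {
		munmap(ptr, len);
		return NULL;
	}
	return ptr;
}
```

Because the range stays PROT_NONE until the last call, no thread can
touch it while it is only partially configured.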

> and it's also functionally just MADV_DONTNEED.  In other words, the
> sole user-visible effect appears to be that the existing pages are
> blown away.  The fact that it changes the key in use doesn't seem
> terribly useful, since it's anonymous memory,

It's functionally MADV_DONTNEED, plus a future promise that your writes
will never show up as plaintext on the DIMM.

We also haven't settled on the file-backed properties.  For file-backed,
my hope was that you could do:

	ptr = mmap(fd, size, prot);
	printf("ciphertext: %x\n", *ptr);
	mprotect_encrypt(ptr, len, prot, keyid);
	printf("plaintext: %x\n", *ptr);

> and the most secure choice is to use CPU-managed keying, which
> appears to be the default anyway on TME systems.  It also has totally
> unclear semantics WRT swap, and, off the top of my head, it looks
> like it may have serious cache-coherency issues and like swapping the
> pages might corrupt them, both because there are no flushes and
> because the direct-map alias looks like it will use the default key
> and therefore appear to contain the wrong data.

I think we fleshed this out on IRC a bit, but the other part of the
implementation is described here: https://lwn.net/Articles/758313/, and
contains a direct map per keyid.  When you do phys_to_virt() and
friends, you get the correct, decrypted view of the direct map which is
appropriate for the physical page.  And, yes, this has very
consequential security implications.

> I would propose a very different direction: don't try to support MKTME
> at all for anonymous memory, and instead figure out the important use
> cases and support them directly.  The use cases that I can think of
> off the top of my head are:
> 
> 1. pmem.  This should probably use a very different API.
> 
> 2. Some kind of VM hardening, where a VM's memory can be protected a
> little tiny bit from the main kernel.  But I don't see why this is any
> better than XPFO (eXclusive Page Frame Ownership), which brings to
> mind:

The XPFO approach is "fun", and would certainly be a way to keep the
direct map from being exploited to get access to plain-text mappings of
ciphertext.

But, it also has massive performance implications and we didn't
want to go there quite yet.

> The main implementation concern I have with this patch set is cache
> coherency and handling of the direct map.  Unless I missed something,
> you're not doing anything about the direct map, which means that you
> have RW aliases of the same memory with different keys.  For use case
> #2, this probably means that you need to either get rid of the direct
> map and make get_user_pages() fail, or you need to change the key on
> the direct map as well, probably using the pageattr.c code.

The current, public hardware spec has a description of what's required
to maintain cache coherency.  Basically, you can keep as many mappings
of a physical page as you want, but only write to one mapping at a time,
and clflush the old one when you want to write to a new one.

> As for caching, as far as I can tell from reading the preliminary
> docs, Intel's MKTME, much like AMD's SME, is basically invisible to
> the hardware cache coherency mechanism.  So, if you modify a physical
> address with one key (or SME-enable bit), and you read it with
> another, you get garbage unless you flush.  And, if you modify memory
> with one key then remap it with a different key without flushing in
> the meantime, you risk corruption.

Yes, all true (at least with respect to Intel's implementation).

> And, what's worse, if I'm reading
> between the lines in the docs correctly, if you use PCONFIG to change
> a key, you may need to do a bunch of cache flushing to ensure you get
> reasonable effects.  (If you have dirty cache lines for some (PA, key)
> and you PCONFIG to change the underlying key, you get different
> results depending on whether the writeback happens before or after the
> package doing the writeback notices the PCONFIG.)

We're not going to allow a key to be PCONFIG'd while there are any
physical pages still associated with it.  There are per-VMA refcounts
tied back to the keyid slots, IIRC.  So, before PCONFIG can happen, we
just need to make sure that all the VMAs are gone, all the pages are
freed, and all dirty cachelines have been clflushed.
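
As a rough model of that rule, the sketch below gates reprogramming of a
keyid slot on a use count. Everything in it is hypothetical (the
keyid_table struct, the helper names, the fixed slot count), and it
models only the counting: freeing the pages and flushing dirty
cachelines are separate steps that this toy does not attempt.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_KEYIDS 8 /* illustrative; real hardware reports its own count */

struct keyid_table {
	unsigned int users[NR_KEYIDS]; /* mappings still using each keyid */
};

/* Take a reference when a mapping starts using keyid. */
static void keyid_get(struct keyid_table *t, int keyid)
{
	assert(keyid >= 0 && keyid < NR_KEYIDS);
	t->users[keyid]++;
}

/* Drop a reference when a mapping that used keyid goes away. */
static void keyid_put(struct keyid_table *t, int keyid)
{
	assert(keyid >= 0 && keyid < NR_KEYIDS);
	assert(t->users[keyid] > 0);
	t->users[keyid]--;
}

/* A slot may be reprogrammed only once nothing references it. */
static bool keyid_may_reprogram(const struct keyid_table *t, int keyid)
{
	assert(keyid >= 0 && keyid < NR_KEYIDS);
	return t->users[keyid] == 0;
}
```

A long-lived pin such as the one get_user_pages() takes would have to
hold a reference too, otherwise the count reaches zero while the page is
still reachable.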

This is where get_user_pages() is our mortal enemy, though.  I hope we
got that right.  Kirill/Alison, we should chat about this one. :)

> Finally, if you're going to teach the kernel how to have some user
> pages that aren't in the direct map, you've essentially done XPFO,
> which is nifty but expensive.  And I think that doing this gets you
> essentially all the benefit of MKTME for the non-pmem use case.  Why
> exactly would any software want to use anything other than a
> CPU-managed key for anything other than pmem?

It is handy, for one, to let you "cluster" key usage.  If you have 5
Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
Coke one using the same key, you can boil it down to only 2 hardware
keyid slots that get used, and do this transparently.
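
A toy model of that clustering, assuming tenants are identified by a
name string: slot_for_tenant() and the slot_table struct are invented
for this illustration and say nothing about how the key service
actually tracks slots.

```c
#include <assert.h>
#include <string.h>

#define NR_SLOTS 4 /* illustrative number of hardware keyid slots */

struct slot_table {
	/* NULL means free; the table borrows the caller's string. */
	const char *owner[NR_SLOTS];
};

/*
 * Return the slot already assigned to this tenant, or claim a free one.
 * Returns -1 when every slot belongs to some other tenant.
 */
static int slot_for_tenant(struct slot_table *t, const char *tenant)
{
	int free_slot = -1;

	assert(tenant != NULL);
	for (int i = 0; i < NR_SLOTS; i++) {
		if (t->owner[i] == NULL) {
			if (free_slot < 0)
				free_slot = i;
		} else if (strcmp(t->owner[i], tenant) == 0) {
			return i; /* reuse: same tenant, same slot */
		}
	}
	if (free_slot >= 0)
		t->owner[free_slot] = tenant;
	return free_slot;
}
```

Ten VMs from two tenants end up consuming two slots, leaving the rest
free for other users.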

But, I think what you're implying is that the security properties of
user-supplied keys can only be *worse* than using CPU-generated keys
(assuming the CPU does a good job generating it).  So, why bother
allowing user-specified keys in the first place?

It's a good question and I don't have a solid answer for why folks want
this.  I'll find out.


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-05 23:49   ` Dave Hansen
@ 2018-12-06  1:09     ` Andy Lutomirski
  2018-12-06  1:25       ` Dan Williams
                         ` (2 more replies)
  0 siblings, 3 replies; 87+ messages in thread
From: Andy Lutomirski @ 2018-12-06  1:09 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Lutomirski, alison.schofield, Matthew Wilcox,
	Dan Williams, David Howells, Thomas Gleixner, James Morris,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Kirill A. Shutemov, kai.huang, Jun Nakajima, Sakkinen, Jarkko,
	keyrings, LSM List, Linux-MM, X86 ML

On Wed, Dec 5, 2018 at 3:49 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 12/4/18 11:19 AM, Andy Lutomirski wrote:
> > I'm not Thomas, but I think it's the wrong direction.  As it stands,
> > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > missing the protection key support),
>
> I thought about this when I added mprotect_pkey().  We start with:
>
>         mprotect(addr, len, prot);
>
> then
>
>         mprotect_pkey(addr, len, prot);
>
> then
>
>         mprotect_pkey_encrypt(addr, len, prot, key);
>
> That doesn't scale because we eventually have
> mprotect_and_a_history_of_mm_features(). :)
>
> What I was hoping to see was them do this (apologies for the horrible
> indentation):
>
>         ptr = mmap(..., PROT_NONE);
>         mprotect_pkey(   addr, len, PROT_NONE, pkey);
>         mprotect_encrypt(addr, len, PROT_NONE, keyid);
>         mprotect(        addr, len, real_prot);
>
> The point is that you *can* stack these things and don't have to have an
> mprotect_kitchen_sink() if you use PROT_NONE for intermediate
> permissions during setup.

Sure, but then why call it mprotect at all?  How about:

mmap(..., PROT_NONE);
mencrypt(..., keyid);
mprotect_pkey(...);

But wouldn't this be much nicer:

int fd = memfd_create(...);
memfd_set_tme_key(fd, keyid);  /* fails if len != 0 */
mmap(fd, ...);

>
> > and it's also functionally just MADV_DONTNEED.  In other words, the
> > sole user-visible effect appears to be that the existing pages are
> > blown away.  The fact that it changes the key in use doesn't seem
> > terribly useful, since it's anonymous memory,
>
> It's functionally MADV_DONTNEED, plus a future promise that your writes
> will never show up as plaintext on the DIMM.

But that's mostly vacuous.  If I read the docs right, MKTME systems
also support TME, so you *already* have that promise, unless the
firmware totally blew it.  If we want a boot option to have the kernel
use MKTME to forcibly encrypt everything regardless of what the TME
MSRs say, I'd be entirely on board.  Heck, the implementation would be
quite simple because we mostly reuse the SME code.

>
> We also haven't settled on the file-backed properties.  For file-backed,
> my hope was that you could do:
>
>         ptr = mmap(fd, size, prot);
>         printf("ciphertext: %x\n", *ptr);
>         mprotect_encrypt(ptr, len, prot, keyid);
>         printf("plaintext: %x\n", *ptr);

Why would you ever want the plaintext?  Also, how does this work on a
normal fs, where relocation of the file would cause the ciphertext to
get lost?  It really seems to me that it should look more like
dm-crypt where you encrypt a filesystem.  Maybe you'd just configure
the pmem device to be encrypted before you mount it, or you'd get a
new pmem-mktme device node instead.  This would also avoid some nasty
multiple-copies-of-the-direct-map issue, since you'd only ever have
one of them mapped.

>
> > The main implementation concern I have with this patch set is cache
> > coherency and handling of the direct map.  Unless I missed something,
> > you're not doing anything about the direct map, which means that you
> > have RW aliases of the same memory with different keys.  For use case
> > #2, this probably means that you need to either get rid of the direct
> > map and make get_user_pages() fail, or you need to change the key on
> > the direct map as well, probably using the pageattr.c code.
>
> The current, public hardware spec has a description of what's required
> to maintain cache coherency.  Basically, you can keep as many mappings
> of a physical page as you want, but only write to one mapping at a time,
> and clflush the old one when you want to write to a new one.

Surely you at least have to clflush the old mapping and then the new
mapping, since the new mapping could have been speculatively read.

> > Finally, if you're going to teach the kernel how to have some user
> > pages that aren't in the direct map, you've essentially done XPFO,
> > which is nifty but expensive.  And I think that doing this gets you
> > essentially all the benefit of MKTME for the non-pmem use case.  Why
> > exactly would any software want to use anything other than a
> > CPU-managed key for anything other than pmem?
>
> It is handy, for one, to let you "cluster" key usage.  If you have 5
> Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
> Coke one using the same key, you can boil it down to only 2 hardware
> keyid slots that get used, and do this transparently.

I understand this from a marketing perspective but not a security
perspective.  Say I'm Coke and you've sold me some VMs that are
"encrypted with a Coke-specific key and no other VMs get to use that
key."  I can't think of *any* not-exceedingly-contrived attack in
which this makes the slightest difference.  If Pepsi tries to attack
Coke without MKTME, then they'll either need to get the hypervisor to
leak Coke's data through the direct map or they'll have to find some
way to corrupt a page table or use something like L1TF to read from a
physical address Coke owns.  With MKTME, if they can read through the
host direct map, then they'll get Coke's cleartext, and if they can
corrupt a page table or use L1TF to read from your memory, they'll get
Coke's cleartext.

TME itself provides a ton of protection -- you can't just barge into
the datacenter, refrigerate the DIMMs, walk away with them, and read
off everyone's data.

Am I missing something?

>
> But, I think what you're implying is that the security properties of
> user-supplied keys can only be *worse* than using CPU-generated keys
> (assuming the CPU does a good job generating it).  So, why bother
> allowing user-specified keys in the first place?

That too :)


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-06  1:09     ` Andy Lutomirski
@ 2018-12-06  1:25       ` Dan Williams
  2018-12-06 15:39       ` Dave Hansen
  2018-12-07  1:55       ` Huang, Kai
  2 siblings, 0 replies; 87+ messages in thread
From: Dan Williams @ 2018-12-06  1:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Schofield, Alison, Matthew Wilcox, David Howells,
	Thomas Gleixner, James Morris, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Kirill A. Shutemov, Huang, Kai,
	Nakajima, Jun, Jarkko Sakkinen, keyrings, linux-security-module,
	Linux MM, X86 ML

[ only responding to the pmem side of things... ]

On Wed, Dec 5, 2018 at 5:09 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Wed, Dec 5, 2018 at 3:49 PM Dave Hansen <dave.hansen@intel.com> wrote:
[..]
> > We also haven't settled on the file-backed properties.  For file-backed,
> > my hope was that you could do:
> >
> >         ptr = mmap(fd, size, prot);
> >         printf("ciphertext: %x\n", *ptr);
> >         mprotect_encrypt(ptr, len, prot, keyid);
> >         printf("plaintext: %x\n", *ptr);
>
> Why would you ever want the plaintext?  Also, how does this work on a
> normal fs, where relocation of the file would cause the ciphertext to
> get lost?  It really seems to me that it should look more like
> dm-crypt where you encrypt a filesystem.  Maybe you'd just configure
> the pmem device to be encrypted before you mount it, or you'd get a
> new pmem-mktme device node instead.  This would also avoid some nasty
> multiple-copies-of-the-direct-map issue, since you'd only ever have
> one of them mapped.

Yes, this is really the only way it can work. Otherwise you need to
teach the filesystem that "these blocks can't move without the key
because encryption", and have an fs-feature flag to say "you can't
mount this legacy / encryption unaware filesystem from an older kernel
because we're not sure you'll move something and break the
encryption".

So pmem namespaces (volumes) would be encrypted providing something
similar to dm-crypt, although we're looking at following the lead of
the fscrypt key management scheme.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 01/13] x86/mktme: Document the MKTME APIs
  2018-12-04  7:39 ` [RFC v2 01/13] x86/mktme: Document the MKTME APIs Alison Schofield
  2018-12-05 18:11   ` Andy Lutomirski
@ 2018-12-06  8:04   ` Sakkinen, Jarkko
  1 sibling, 0 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-06  8:04 UTC (permalink / raw)
  To: tglx, Schofield, Alison, dhowells
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, linux-mm,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	bp, Hansen, Dave, Nakajima, Jun

I'll focus my remarks on the documentation for now, as I have lots of
catching up to do with TME :-) I'll give more feedback on the actual code
changes once v18 of the SGX patch set is out, the pull request for the
TPM 4.21 changes is out, and maybe a new version of this patch set has
been released.

Right now I have too much on my shoulders to go too deep into this patch set.

On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> This includes an overview, a section on each API: MKTME Keys and
> system call encrypt_mprotect(), and a demonstration program.
> 
> (Some of this info is destined for man pages.)
> 
> Change-Id: I34dc9ff1a1308c057ec4bb3e652c4d7ce6995606
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> ?

Not needed if this is for the most part written by you.

> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  Documentation/x86/mktme/index.rst          |  11 +++
>  Documentation/x86/mktme/mktme_demo.rst     |  53 ++++++++++++++
>  Documentation/x86/mktme/mktme_encrypt.rst  |  58 +++++++++++++++
>  Documentation/x86/mktme/mktme_keys.rst     | 109 +++++++++++++++++++++++++++++
>  Documentation/x86/mktme/mktme_overview.rst |  60 ++++++++++++++++
>  5 files changed, 291 insertions(+)
>  create mode 100644 Documentation/x86/mktme/index.rst
>  create mode 100644 Documentation/x86/mktme/mktme_demo.rst
>  create mode 100644 Documentation/x86/mktme/mktme_encrypt.rst
>  create mode 100644 Documentation/x86/mktme/mktme_keys.rst
>  create mode 100644 Documentation/x86/mktme/mktme_overview.rst
> 
> diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
> new file mode 100644
> index 000000000000..8c556d04cbc4
> --- /dev/null
> +++ b/Documentation/x86/mktme/index.rst
> @@ -0,0 +1,11 @@
> +

SPDX?

> +=============================================
> +Multi-Key Total Memory Encryption (MKTME) API
> +=============================================
> +
> +.. toctree::
> +
> +   mktme_overview
> +   mktme_keys
> +   mktme_encrypt
> +   mktme_demo
> diff --git a/Documentation/x86/mktme/mktme_demo.rst b/Documentation/x86/mktme/mktme_demo.rst
> new file mode 100644
> index 000000000000..afd50772e65d
> --- /dev/null
> +++ b/Documentation/x86/mktme/mktme_demo.rst
> @@ -0,0 +1,53 @@
> +Demonstration Program using MKTME API's
> +=======================================

It would probably be a better idea to put this into
tools/testing/selftests/x86 rather than making it part of the
documentation.

> +
> +/* Compile with the keyutils library: cc -o mdemo mdemo.c -lkeyutils */
> +
> +#include <sys/mman.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +#include <keyutils.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +#define PAGE_SIZE sysconf(_SC_PAGE_SIZE)
> +#define sys_encrypt_mprotect 335
> +
> +void main(void)
> +{
> +	char *options_CPU = "algorithm=aes-xts-128 type=cpu";
> +	long size = PAGE_SIZE;
> +        key_serial_t key;
> +	void *ptra;
> +	int ret;
> +
> +        /* Allocate an MKTME Key */
> +	key = add_key("mktme", "testkey", options_CPU, strlen(options_CPU),
> +                      KEY_SPEC_THREAD_KEYRING);
> +
> +	if (key == -1) {
> +		printf("addkey FAILED\n");
> +		return;
> +	}
> +        /* Map a page of ANONYMOUS memory */
> +	ptra = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> +	if (!ptra) {
> +		printf("failed to mmap");
> +		goto inval_key;
> +	}
> +        /* Encrypt that page of memory with the MKTME Key */
> +	ret = syscall(sys_encrypt_mprotect, ptra, size, PROT_NONE, key);
> +	if (ret)
> +		printf("mprotect error [%d]\n", ret);
> +
> +        /* Enjoy that page of encrypted memory */
> +
> +        /* Free the memory */
> +	ret = munmap(ptra, size);
> +
> +inval_key:
> +        /* Free the Key */
> +	if (keyctl(KEYCTL_INVALIDATE, key) == -1)
> +		printf("invalidate failed on key [%d]\n", key);

Would it make sense to print error messages to stderr?
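
A minimal sketch of what that could look like for the mmap step, assuming
nothing beyond POSIX. The helper name map_anon_page() is made up for
illustration and is not part of the patch. It also compares against
MAP_FAILED, since mmap() does not return NULL on failure:

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: map one anonymous page and
 * report any failure on stderr. Returns the mapping, or NULL on error.
 */
static void *map_anon_page(size_t *size_out)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *ptr;

	assert(size_out != NULL);
	if (page_size <= 0) {
		fprintf(stderr, "sysconf failed: %s\n", strerror(errno));
		return NULL;
	}

	ptr = mmap(NULL, (size_t)page_size, PROT_NONE,
		   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (ptr == MAP_FAILED) {	/* mmap() signals errors with MAP_FAILED */
		fprintf(stderr, "mmap failed: %s\n", strerror(errno));
		return NULL;
	}

	*size_out = (size_t)page_size;
	return ptr;
}
```

The same fprintf(stderr, ...) pattern would apply to the add_key and
keyctl failure paths in the demo.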

> +}
> diff --git a/Documentation/x86/mktme/mktme_encrypt.rst b/Documentation/x86/mktme/mktme_encrypt.rst
> new file mode 100644
> index 000000000000..ede5237183fc
> --- /dev/null
> +++ b/Documentation/x86/mktme/mktme_encrypt.rst
> @@ -0,0 +1,58 @@
> +MKTME API: system call encrypt_mprotect()
> +=========================================
> +
> +Synopsis
> +--------
> +int encrypt_mprotect(void \*addr, size_t len, int prot, key_serial_t serial);
> +
> +Where *key_serial_t serial* is the serial number of a key allocated
> +using the MKTME Key Service.

There is only one key service, i.e. the kernel keyring. This should be
rephrased somehow.

> +
> +Description
> +-----------
> +    encrypt_mprotect() encrypts the memory pages containing any part
> +    of the address range in the interval specified by addr and len.

What does it actually do? I don't think the syscall does any encryption,
does it? I'm not looking for SDM-level details, but a better
description of what it does would be nice.

> +
> +    encrypt_mprotect() supports the legacy mprotect() behavior plus
> +    the enabling of memory encryption. That means that in addition
> +    to encrypting the memory, the protection flags will be updated
> +    as requested in the call.

Ditto.

> +
> +    The *addr* and *len* must be aligned to a page boundary.
> +
> +    The caller must have *KEY_NEED_VIEW* permission on the key.

Maybe a more verbose description, especially since it is a must.

> +
> +    The range of memory that is to be protected must be mapped as
> +    *ANONYMOUS*.

Ditto.
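
On the page-alignment requirement quoted above, a caller could check its
arguments before issuing the call. This is only a sketch under the
assumption that both addr and len must be page multiples, as the quoted
text states; range_is_page_aligned() is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical caller-side check: true when both addr and len are
 * multiples of page_size. page_size must be a non-zero power of two.
 */
static bool range_is_page_aligned(const void *addr, size_t len,
				  size_t page_size)
{
	size_t mask = page_size - 1;

	assert(page_size != 0 && (page_size & mask) == 0);
	/* Any low bit set in either value means the range is misaligned. */
	return (((uintptr_t)addr | len) & mask) == 0;
}
```

A range that fails this check is what the Errors section below describes
as EINVAL.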

> +
> +Errors
> +------
> +    In addition to the Errors returned from legacy mprotect()
> +    encrypt_mprotect will return:
> +
> +    ENOKEY *serial* parameter does not represent a valid key.
> +
> +    EINVAL *len* parameter is not page aligned.
> +
> +    EACCES Caller does not have *KEY_NEED_VIEW* permission on the key.
> +
> +EXAMPLE
> +--------
> +  Allocate an MKTME Key::
> +        serial = add_key("mktme", "name", "type=cpu algorithm=aes-xts-128" @u
> +
> +  Map ANONYMOUS memory::
> +        ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> +
> +  Protect memory::
> +        ret = syscall(SYS_encrypt_mprotect, ptr, size, PROT_READ|PROT_WRITE,
> +                      serial);
> +
> +  Use the encrypted memory
> +
> +  Free memory::
> +        ret = munmap(ptr, size);
> +
> +  Free the key resource::
> +        ret = keyctl(KEYCTL_INVALIDATE, serial);
> +

Not really sure what purpose this example serves, as you already have an
example program...

Have you read the pkeys man page? It strikes a really good balance in
explaining how it is implemented and used (man pkeys, not man
pkey_mprotect()).

What if you suddenly change the key for a VMA? I guess the memory is then
corrupted? That is not documented here and should be.

I did not find the thing I was looking for most, i.e. some high-level
description of the threat model. Emphasis on high-level, as kernel
documentation is not a CVE database.

> diff --git a/Documentation/x86/mktme/mktme_keys.rst b/Documentation/x86/mktme/mktme_keys.rst
> new file mode 100644
> index 000000000000..5837909b2c54
> --- /dev/null
> +++ b/Documentation/x86/mktme/mktme_keys.rst
> @@ -0,0 +1,109 @@
> +MKTME Key Service API
> +=====================
> +MKTME is a new key service type added to the Linux Kernel Key Service.
> +
> +The MKTME Key Service type is available when CONFIG_X86_INTEL_MKTME is
> +turned on in Intel platforms that support the MKTME feature.
> +
> +The MKTME Key Service type manages the allocation of hardware encryption
> +keys. Users can request an MKTME type key and then use that key to
> +encrypt memory with the encrypt_mprotect() system call.
> +
> +Usage
> +-----
> +    When using the Kernel Key Service to request an *mktme* key,
> +    specify the *payload* as follows:
> +
> +    type=
> +        *user*	User will supply the encryption key data. Use this
> +                type to directly program a hardware encryption key.
> +
> +        *cpu*	User requests a CPU generated encryption key.
> +                The CPU generates and assigns an ephemeral key.

How are these implemented? Is there an opcode to request the CPU to
generate a key, or something else? What about the user key? Does the cpu
key ever leave the CPU package?

The user key sounds like a really bad idea at first sight, and maybe
leaving it out should be considered. What would be a legit use case for
it?

Are the keys per-process, or are they a global resource?

> +        *clear* User requests that a hardware encryption key be
> +                cleared. This will clear the encryption key from
> +                the hardware. On execution this hardware key gets
> +                TME behavior.
> +
> +        *no-encrypt*
> +                 User requests that hardware does not encrypt
> +                 memory when this key is in use.

Not sure about these with my current knowledge.

> +
> +    algorithm=
> +        When type=user or type=cpu the algorithm field must be
> +        *aes-xts-128*
> +
> +        When type=clear or type=no-encrypt the algorithm field
> +        must not be present in the payload.

This parameter must be removed, as it is a function of the other
parameters and nothing else, i.e. complexity without gain.
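
To illustrate, the algorithm could be derived from the type alone. The
sketch below assumes the rules quoted above (aes-xts-128 for user and
cpu, no algorithm for clear and no-encrypt). The enum mirrors the one in
patch 10, while mktme_implied_alg() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum mktme_type {
	MKTME_TYPE_ERROR = -1,
	MKTME_TYPE_USER,
	MKTME_TYPE_CPU,
	MKTME_TYPE_CLEAR,
	MKTME_TYPE_NO_ENCRYPT,
};

/*
 * Hypothetical helper: return the algorithm name implied by the key
 * type, or NULL when the type takes no algorithm.
 */
static const char *mktme_implied_alg(enum mktme_type type)
{
	switch (type) {
	case MKTME_TYPE_USER:
	case MKTME_TYPE_CPU:
		return "aes-xts-128";	/* only algorithm in this RFC */
	case MKTME_TYPE_CLEAR:
	case MKTME_TYPE_NO_ENCRYPT:
	default:
		return NULL;
	}
}
```

If a second algorithm ever shows up, the option could be reintroduced at
that point.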

> +	This document does not intend to document KKS, but only the
> +	MKTME type of the KKS. The options of the KKS can be grouped
> +	into 2 classes for purposes of understanding how MKTME operates
> +	within the broader KKS.

Maybe just delete this paragraph? I think it is just stating the
obvious.

I think you need this paragraph only because you have put this
document in the wrong place. A better path would be

Documentation/security/keys/mktme.rst.

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 02/13] mm: Generalize the mprotect implementation to support extensions
  2018-12-04  7:39 ` [RFC v2 02/13] mm: Generalize the mprotect implementation to support extensions Alison Schofield
@ 2018-12-06  8:08   ` Sakkinen, Jarkko
  0 siblings, 0 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-06  8:08 UTC (permalink / raw)
  To: tglx, Schofield, Alison, dhowells
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, linux-mm,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	bp, Hansen, Dave, Nakajima, Jun

On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> Today mprotect is implemented to support legacy mprotect behavior
> plus an extension for memory protection keys. Make it more generic
> so that it can support additional extensions in the future.
> 
> This is done in preparation for adding a new system call for memory
> encryption keys. The intent is that the new encrypted mprotect will be
> another extension to legacy mprotect.
> 
> Change-Id: Ib09b9d1b605b12d0254d7fb4968dfcc8e3c79dd7

What is this??

> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/mprotect.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index df408956dccc..b57075e278fb 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -35,6 +35,8 @@
>  
>  #include "internal.h"
>  
> +#define NO_KEY	-1
> +
>  static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>  		unsigned long addr, unsigned long end, pgprot_t newprot,
>  		int dirty_accountable, int prot_numa)
> @@ -451,9 +453,9 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
>  }
>  
>  /*
> - * pkey==-1 when doing a legacy mprotect()
> + * When pkey==NO_KEY we get legacy mprotect behavior here.
>   */
> -static int do_mprotect_pkey(unsigned long start, size_t len,
> +static int do_mprotect_ext(unsigned long start, size_t len,
>  		unsigned long prot, int pkey)
>  {
>  	unsigned long nstart, end, tmp, reqprot;
> @@ -577,7 +579,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
>  SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>  		unsigned long, prot)
>  {
> -	return do_mprotect_pkey(start, len, prot, -1);
> +	return do_mprotect_ext(start, len, prot, NO_KEY);
>  }
>  
>  #ifdef CONFIG_ARCH_HAS_PKEYS
> @@ -585,7 +587,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>  SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
>  		unsigned long, prot, int, pkey)
>  {
> -	return do_mprotect_pkey(start, len, prot, pkey);
> +	return do_mprotect_ext(start, len, prot, pkey);
>  }
>  
>  SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)

I would squash this into whichever patch requires it. This split makes
review more complex (IMHO).

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 04/13] x86/mm: Add helper functions for MKTME memory encryption keys
  2018-12-04  7:39 ` [RFC v2 04/13] x86/mm: Add helper functions for MKTME " Alison Schofield
  2018-12-04  9:14   ` Peter Zijlstra
  2018-12-04 15:35   ` Andy Lutomirski
@ 2018-12-06  8:31   ` Sakkinen, Jarkko
  2 siblings, 0 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-06  8:31 UTC (permalink / raw)
  To: tglx, Schofield, Alison, dhowells
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, linux-mm,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	bp, Hansen, Dave, Nakajima, Jun

On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> +extern int mktme_map_alloc(void);
> +extern void mktme_map_free(void);
> +extern void mktme_map_lock(void);
> +extern void mktme_map_unlock(void);
> +extern int mktme_map_mapped_keyids(void);
> +extern void mktme_map_set_keyid(int keyid, void *key);
> +extern void mktme_map_free_keyid(int keyid);
> +extern int mktme_map_keyid_from_key(void *key);
> +extern void *mktme_map_key_from_keyid(int keyid);
> +extern int mktme_map_get_free_keyid(void);

No need for the extern keyword on function declarations. It is
only needed for variable declarations.
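
A small standalone sketch of the difference, with made-up names
(nr_keyids, keyid_is_valid) that do not come from the patch:

```c
#include <assert.h>

/* Function declarations have external linkage by default: no extern. */
int keyid_is_valid(int keyid);

/* For an object, extern matters: this declares without defining. */
extern int nr_keyids;

/* The single definition, which would normally live in one .c file. */
int nr_keyids = 3;	/* arbitrary value for the example */

int keyid_is_valid(int keyid)
{
	return keyid > 0 && keyid <= nr_keyids;
}
```

Without extern, the nr_keyids line in a header would be a tentative
definition in every file that includes it.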

> +
>  DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
>  static inline bool mktme_enabled(void)
>  {
> diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> index c81727540e7c..34224d4e3f45 100644
> --- a/arch/x86/mm/mktme.c
> +++ b/arch/x86/mm/mktme.c
> @@ -40,6 +40,97 @@ int __vma_keyid(struct vm_area_struct *vma)
>  	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
>  }
>  
> +/*
> + * struct mktme_map and the mktme_map_* functions manage the mapping
> + * of userspace Keys to hardware KeyIDs. These are used by the MKTME Key

What are "userspace Keys" anyway, and why "Key" and not "key"?

> + * Service API and the encrypt_mprotect() system call.
> + */
> +
> +struct mktme_mapping {
> +	struct mutex	lock;		/* protect this map & HW state */
> +	unsigned int	mapped_keyids;
> +	void		*key[];
> +};

Personally, I prefer not to align struct fields (I do align enums
because there it makes more sense) as often you end up realigning
everything.

Documentation would bring more clarity. For example, what does key[]
contain, why is there a lock, and what does the mapped_keyids field
contain?

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 05/13] x86/mm: Set KeyIDs in encrypted VMAs
  2018-12-04  7:39 ` [RFC v2 05/13] x86/mm: Set KeyIDs in encrypted VMAs Alison Schofield
@ 2018-12-06  8:37   ` Sakkinen, Jarkko
  0 siblings, 0 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-06  8:37 UTC (permalink / raw)
  To: tglx, Schofield, Alison, dhowells
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, linux-mm,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	bp, Hansen, Dave, Nakajima, Jun

On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> MKTME architecture requires the KeyID to be placed in PTE bits 51:46.
> To create an encrypted VMA, place the KeyID in the upper bits of
> vm_page_prot that matches the position of those PTE bits.
> 
> When the VMA is assigned a KeyID it is always considered a KeyID
> change. The VMA is either going from not encrypted to encrypted,
> or from encrypted with any KeyID to encrypted with any other KeyID.
> To make the change safely, remove the user pages held by the VMA
> and unlink the VMA's anonymous chain.
> 
> Change-Id: I676056525c49c8803898315a10b196ef5a5c5415

Remove.

> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/asm/mktme.h |  4 ++++
>  arch/x86/mm/mktme.c          | 26 ++++++++++++++++++++++++++
>  include/linux/mm.h           |  6 ++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
> index dbb49909d665..de3e529f3ab0 100644
> --- a/arch/x86/include/asm/mktme.h
> +++ b/arch/x86/include/asm/mktme.h
> @@ -24,6 +24,10 @@ extern int mktme_map_keyid_from_key(void *key);
>  extern void *mktme_map_key_from_keyid(int keyid);
>  extern int mktme_map_get_free_keyid(void);
>  
> +/* Set the encryption keyid bits in a VMA */
> +extern void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
> +				unsigned long start, unsigned long end);
> +
>  DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
>  static inline bool mktme_enabled(void)
>  {
> diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
> index 34224d4e3f45..e3fdf7b48173 100644
> --- a/arch/x86/mm/mktme.c
> +++ b/arch/x86/mm/mktme.c
> @@ -1,5 +1,6 @@
>  #include <linux/mm.h>
>  #include <linux/highmem.h>
> +#include <linux/rmap.h>
>  #include <asm/mktme.h>
>  #include <asm/set_memory.h>
>  
> @@ -131,6 +132,31 @@ int mktme_map_get_free_keyid(void)
>  	return 0;
>  }
>  
> +/* Set the encryption keyid bits in a VMA */

Maybe a proper kdoc?
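
For instance, kernel-doc style could look like the sketch below. It is a
userspace stand-in, not the patch's function: set_keyid_bits() and the
two macros are hypothetical names, and the fixed 51:46 position is taken
from the changelog above, whereas the patch uses the runtime values
mktme_keyid_mask and mktme_keyid_shift.

```c
#include <assert.h>
#include <stdint.h>

#define KEYID_SHIFT	46	/* lowest KeyID bit, per the changelog */
#define KEYID_MASK	(UINT64_C(0x3f) << KEYID_SHIFT)	/* bits 51:46 */

/**
 * set_keyid_bits() - replace the KeyID field in a protection value
 * @prot: protection bits, with or without a KeyID already set
 * @keyid: new KeyID, must fit in the 6-bit field
 *
 * Return: @prot with bits 51:46 replaced by @keyid and all other bits
 * unchanged.
 */
static uint64_t set_keyid_bits(uint64_t prot, unsigned int keyid)
{
	assert(keyid <= 0x3f);
	prot &= ~KEYID_MASK;			/* drop the old KeyID */
	prot |= (uint64_t)keyid << KEYID_SHIFT;	/* install the new one */
	return prot;
}
```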

> +void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
> +			  unsigned long start, unsigned long end)
> +{
> +	int oldkeyid = vma_keyid(vma);
> +	pgprotval_t newprot;
> +
> +	/* Unmap pages with old KeyID if there's any. */
> +	zap_page_range(vma, start, end - start);
> +
> +	if (oldkeyid == newkeyid)
> +		return;
> +
> +	newprot = pgprot_val(vma->vm_page_prot);
> +	newprot &= ~mktme_keyid_mask;
> +	newprot |= (unsigned long)newkeyid << mktme_keyid_shift;
> +	vma->vm_page_prot = __pgprot(newprot);
> +
> +	/*

No empty comment line.

> +	 * The VMA doesn't have any inherited pages.
> +	 * Start anon VMA tree from scratch.
> +	 */
> +}
> +
>  /* Prepare page to be used for encryption. Called from page allocator. */
>  void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
>  {
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1309761bb6d0..e2d87e92ca74 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2806,5 +2806,11 @@ void __init setup_nr_node_ids(void);
>  static inline void setup_nr_node_ids(void) {}
>  #endif
>  
> +#ifndef CONFIG_X86_INTEL_MKTME
> +static inline void mprotect_set_encrypt(struct vm_area_struct *vma,
> +					int newkeyid,
> +					unsigned long start,
> +					unsigned long end) {}

Add a new line and

{
}


> +#endif /* CONFIG_X86_INTEL_MKTME */
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 06/13] mm: Add the encrypt_mprotect() system call
  2018-12-04  7:39 ` [RFC v2 06/13] mm: Add the encrypt_mprotect() system call Alison Schofield
@ 2018-12-06  8:38   ` Sakkinen, Jarkko
  0 siblings, 0 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-06  8:38 UTC (permalink / raw)
  To: tglx, Schofield, Alison, dhowells
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, linux-mm,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	bp, Hansen, Dave, Nakajima, Jun

On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> Implement memory encryption with a new system call that is an
> extension of the legacy mprotect() system call.
> 
> In encrypt_mprotect the caller must pass a handle to a previously
> allocated and programmed encryption key. Validate the key and store
> the keyid bits in the vm_page_prot for each VMA in the protection
> range.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Why don't you use NO_KEY in this patch?

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 10/13] keys/mktme: Add the MKTME Key Service type for memory encryption
  2018-12-04  7:39 ` [RFC v2 10/13] keys/mktme: Add the MKTME Key Service type for memory encryption Alison Schofield
@ 2018-12-06  8:51   ` Sakkinen, Jarkko
  2018-12-06  8:54     ` Sakkinen, Jarkko
  2018-12-06 15:11     ` Dave Hansen
  0 siblings, 2 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-06  8:51 UTC (permalink / raw)
  To: tglx, Schofield, Alison, dhowells
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, linux-mm,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	bp, Hansen, Dave, Nakajima, Jun

On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> MKTME (Multi-Key Total Memory Encryption) is a technology that allows
> transparent memory encryption in upcoming Intel platforms. MKTME will
> support multiple encryption domains, each having their own key. The main
> use case for the feature is virtual machine isolation. The API needs the
> flexibility to work for a wide range of uses.

Some, maybe brutal, honesty (apologies)...

I have never really grasped why either SME or TME would make
isolation any better. If you can break into the hypervisor, you'll
have these tools available:

1. Read page (in encrypted form).
2. Write page (for example replay as pages are not versioned).

with all the side-channel possibilities of course, since you can
control the VMs (on which core they execute etc.).
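
To make the replay point concrete, here is a toy model. It assumes only
that the encryption is deterministic per address and that pages carry no
version or integrity tag. toy_crypt() is an invented XOR pad, not
AES-XTS, and every name in it is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_WORDS 4

/* Toy stand-in for address-tweaked encryption. NOT AES-XTS, not secure. */
static uint64_t toy_crypt(uint64_t word, uint64_t key, uint64_t addr)
{
	/* The pad depends only on key and address, so XOR is its own inverse. */
	uint64_t pad = (key ^ addr) * UINT64_C(0x9e3779b97f4a7c15);

	return word ^ pad;
}

/* "DRAM" as a compromised hypervisor sees it: ciphertext only. */
static uint64_t dram[PAGE_WORDS];

static void guest_write(uint64_t key, unsigned int idx, uint64_t value)
{
	dram[idx] = toy_crypt(value, key, idx);
}

static uint64_t guest_read(uint64_t key, unsigned int idx)
{
	return toy_crypt(dram[idx], key, idx);
}

/* Returns 1 if replaying stale ciphertext goes unnoticed by the guest. */
static int replay_succeeds(void)
{
	const uint64_t key = UINT64_C(0x0123456789abcdef);
	uint64_t stale;

	guest_write(key, 0, 100);	/* guest stores an old value */
	stale = dram[0];		/* attacker copies the ciphertext */
	guest_write(key, 0, 5);		/* guest updates the value */
	dram[0] = stale;		/* attacker writes the old copy back */

	return guest_read(key, 0) == 100;
}
```

The attacker never learns the key or the plaintext, yet the guest reads
back the old value as if it were current.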

I've now seen the SME presentation three times and it always leaves
me with an empty feeling. This feels the same.

> The MKTME key service type manages the addition and removal of the memory
> encryption keys. It maps Userspace Keys to hardware KeyIDs. It programs
> the hardware with the user requested encryption options.
> 
> The only supported encryption algorithm is AES-XTS 128.
> 
> The MKTME key service is half of the MKTME API level solution. It pairs
> with a new memory encryption system call: encrypt_mprotect() that uses
> the keys to encrypt memory.
> 
> See Documentation/x86/mktme/mktme.rst
> 
> Change-Id: Idda4af2beabb739c77719897affff183ee9fa716
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/Kconfig           |   1 +
>  include/keys/mktme-type.h  |  41 ++++++
>  security/keys/Kconfig      |  11 ++
>  security/keys/Makefile     |   1 +
>  security/keys/mktme_keys.c | 339 +++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 393 insertions(+)
>  create mode 100644 include/keys/mktme-type.h
>  create mode 100644 security/keys/mktme_keys.c
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 7ac78e2856c7..c2e3bb5af077 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1531,6 +1531,7 @@ config X86_INTEL_MKTME
>  	bool "Intel Multi-Key Total Memory Encryption"
>  	select DYNAMIC_PHYSICAL_MASK
>  	select PAGE_EXTENSION
> +	select MKTME_KEYS
>  	depends on X86_64 && CPU_SUP_INTEL
>  	---help---
>  	  Say yes to enable support for Multi-Key Total Memory Encryption.
> diff --git a/include/keys/mktme-type.h b/include/keys/mktme-type.h
> new file mode 100644
> index 000000000000..c63c6568087f
> --- /dev/null
> +++ b/include/keys/mktme-type.h
> @@ -0,0 +1,41 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/* Key service for Multi-KEY Total Memory Encryption */
> +
> +#ifndef _KEYS_MKTME_TYPE_H
> +#define _KEYS_MKTME_TYPE_H
> +
> +#include <linux/key.h>
> +
> +/*
> + * The AES-XTS 128 encryption algorithm requires 128 bits for each
> + * user supplied data key and tweak key.
> + */
> +#define MKTME_AES_XTS_SIZE	16	/* 16 bytes, 128 bits */
> +
> +enum mktme_alg {
> +	MKTME_ALG_AES_XTS_128,
> +};
> +
> +const char *const mktme_alg_names[] = {
> +	[MKTME_ALG_AES_XTS_128]	= "aes-xts-128",
> +};
> +
> +enum mktme_type {
> +	MKTME_TYPE_ERROR = -1,
> +	MKTME_TYPE_USER,
> +	MKTME_TYPE_CPU,
> +	MKTME_TYPE_CLEAR,
> +	MKTME_TYPE_NO_ENCRYPT,
> +};
> +
> +const char *const mktme_type_names[] = {
> +	[MKTME_TYPE_USER]	= "user",
> +	[MKTME_TYPE_CPU]	= "cpu",
> +	[MKTME_TYPE_CLEAR]	= "clear",
> +	[MKTME_TYPE_NO_ENCRYPT]	= "no-encrypt",
> +};
> +
> +extern struct key_type key_type_mktme;
> +
> +#endif /* _KEYS_MKTME_TYPE_H */
> diff --git a/security/keys/Kconfig b/security/keys/Kconfig
> index 6462e6654ccf..c36972113e67 100644
> --- a/security/keys/Kconfig
> +++ b/security/keys/Kconfig
> @@ -101,3 +101,14 @@ config KEY_DH_OPERATIONS
>  	 in the kernel.
>  
>  	 If you are unsure as to whether this is required, answer N.
> +
> +config MKTME_KEYS
> +	bool "Multi-Key Total Memory Encryption Keys"
> +	depends on KEYS && X86_INTEL_MKTME
> +	help
> +	  This option provides support for Multi-Key Total Memory
> +	  Encryption (MKTME) on Intel platforms offering the feature.
> +	  MKTME allows userspace to manage the hardware encryption
> +	  keys through the kernel key services.
> +
> +	  If you are unsure as to whether this is required, answer N.
> diff --git a/security/keys/Makefile b/security/keys/Makefile
> index 9cef54064f60..94c84f10a857 100644
> --- a/security/keys/Makefile
> +++ b/security/keys/Makefile
> @@ -30,3 +30,4 @@ obj-$(CONFIG_ASYMMETRIC_KEY_TYPE) += keyctl_pkey.o
>  obj-$(CONFIG_BIG_KEYS) += big_key.o
>  obj-$(CONFIG_TRUSTED_KEYS) += trusted.o
>  obj-$(CONFIG_ENCRYPTED_KEYS) += encrypted-keys/
> +obj-$(CONFIG_MKTME_KEYS) += mktme_keys.o
> diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
> new file mode 100644
> index 000000000000..e615eb58e600
> --- /dev/null
> +++ b/security/keys/mktme_keys.c
> @@ -0,0 +1,339 @@
> +// SPDX-License-Identifier: GPL-3.0
> +
> +/* Documentation/x86/mktme/mktme_keys.rst */
> +
> +#include <linux/cred.h>
> +#include <linux/cpu.h>
> +#include <linux/err.h>
> +#include <linux/init.h>
> +#include <linux/key.h>
> +#include <linux/key-type.h>
> +#include <linux/init.h>
> +#include <linux/parser.h>
> +#include <linux/random.h>
> +#include <linux/slab.h>
> +#include <linux/string.h>
> +#include <asm/intel_pconfig.h>
> +#include <asm/mktme.h>
> +#include <keys/mktme-type.h>
> +#include <keys/user-type.h>
> +
> +#include "internal.h"
> +
> +struct kmem_cache *mktme_prog_cache;	/* Hardware programming cache */
> +
> +static const char * const mktme_program_err[] = {
> +	"KeyID was successfully programmed",	/* 0 */
> +	"Invalid KeyID programming command",	/* 1 */
> +	"Insufficient entropy",			/* 2 */
> +	"KeyID not valid",			/* 3 */
> +	"Invalid encryption algorithm chosen",	/* 4 */
> +	"Failure to access key table",		/* 5 */
> +};
> +
> +enum mktme_opt_id {
> +	OPT_ERROR = -1,
> +	OPT_TYPE,
> +	OPT_KEY,
> +	OPT_TWEAK,
> +	OPT_ALGORITHM,
> +};
> +
> +static const match_table_t mktme_token = {
> +	{OPT_TYPE, "type=%s"},
> +	{OPT_KEY, "key=%s"},
> +	{OPT_TWEAK, "tweak=%s"},
> +	{OPT_ALGORITHM, "algorithm=%s"},
> +	{OPT_ERROR, NULL}
> +};
> +
> +struct mktme_payload {
> +	u32		keyid_ctrl;	/* Command & Encryption Algorithm */
> +	u8		data_key[MKTME_AES_XTS_SIZE];
> +	u8		tweak_key[MKTME_AES_XTS_SIZE];
> +};
> +
> +/* Key Service Method called when Key is garbage collected. */
> +static void mktme_destroy_key(struct key *key)
> +{
> +	key_put_encrypt_ref(mktme_map_keyid_from_key(key));
> +}
> +
> +/* Copy the payload to the HW programming structure and program this KeyID */
> +static int mktme_program_keyid(int keyid, struct mktme_payload *payload)
> +{
> +	struct mktme_key_program *kprog = NULL;
> +	u8 kern_entropy[MKTME_AES_XTS_SIZE];
> +	int i, ret;
> +
> +	kprog = kmem_cache_zalloc(mktme_prog_cache, GFP_KERNEL);
> +	if (!kprog)
> +		return -ENOMEM;
> +
> +	/* Hardware programming requires cached aligned struct */
> +	kprog->keyid = keyid;
> +	kprog->keyid_ctrl = payload->keyid_ctrl;
> +	memcpy(kprog->key_field_1, payload->data_key, MKTME_AES_XTS_SIZE);
> +	memcpy(kprog->key_field_2, payload->tweak_key, MKTME_AES_XTS_SIZE);
> +
> +	/* Strengthen the entropy fields for CPU generated keys */
> +	if ((payload->keyid_ctrl & 0xff) == MKTME_KEYID_SET_KEY_RANDOM) {
> +		get_random_bytes(&kern_entropy, sizeof(kern_entropy));
> +		for (i = 0; i < (MKTME_AES_XTS_SIZE); i++) {
> +			kprog->key_field_1[i] ^= kern_entropy[i];
> +			kprog->key_field_2[i] ^= kern_entropy[i];
> +		}
> +	}
> +	ret = mktme_key_program(kprog);
> +	kmem_cache_free(mktme_prog_cache, kprog);
> +	return ret;
> +}
> +
> +/* Key Service Method to update an existing key. */
> +static int mktme_update_key(struct key *key,
> +			    struct key_preparsed_payload *prep)
> +{
> +	struct mktme_payload *payload = prep->payload.data[0];
> +	int keyid, ref_count;
> +	int ret;
> +
> +	mktme_map_lock();
> +	keyid = mktme_map_keyid_from_key(key);
> +	if (keyid <= 0) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	/*
> +	 * ref_count will be at least one when we get here because the
> +	 * key already exists. If ref_count is not > 1, it is safe to
> +	 * update the key while holding the mktme_map_lock.
> +	 */
> +	ref_count = mktme_read_encrypt_ref(keyid);
> +	if (ref_count > 1) {
> +		pr_debug("mktme not updating keyid[%d] encrypt_count[%d]\n",
> +			 keyid, ref_count);
> +		ret = -EBUSY;
> +		goto out;
> +	}
> +	ret = mktme_program_keyid(keyid, payload);
> +	if (ret != MKTME_PROG_SUCCESS) {
> +		pr_debug("%s: %s\n", __func__, mktme_program_err[ret]);
> +		ret = -ENOKEY;
> +	}
> +out:
> +	/* Every exit path must drop the map lock taken above */
> +	mktme_map_unlock();
> +	return ret;
> +}
> +
> +/* Key Service Method to create a new key. Payload is preparsed. */
> +int mktme_instantiate_key(struct key *key,
> +			  struct key_preparsed_payload *prep)
> +{
> +	struct mktme_payload *payload = prep->payload.data[0];
> +	int keyid, ret;
> +
> +	mktme_map_lock();
> +	keyid = mktme_map_get_free_keyid();
> +	if (keyid == 0) {
> +		ret = -ENOKEY;
> +		goto out;
> +	}
> +	ret = mktme_program_keyid(keyid, payload);
> +	if (ret != MKTME_PROG_SUCCESS) {
> +		pr_debug("%s: %s\n", __func__, mktme_program_err[ret]);
> +		ret = -ENOKEY;
> +		goto out;
> +	}
> +	mktme_map_set_keyid(keyid, key);
> +	key_get_encrypt_ref(keyid);
> +out:
> +	mktme_map_unlock();
> +	return ret;
> +}
> +
> +/* Verify the user provided the needed arguments for the TYPE of Key */
> +static int mktme_check_options(struct mktme_payload *payload,
> +			       unsigned long token_mask, enum mktme_type type)
> +{
> +	if (!token_mask)
> +		return -EINVAL;
> +
> +	switch (type) {
> +	case MKTME_TYPE_USER:
> +		if (test_bit(OPT_ALGORITHM, &token_mask))
> +			payload->keyid_ctrl |= MKTME_AES_XTS_128;
> +		else
> +			return -EINVAL;
> +
> +		if ((test_bit(OPT_KEY, &token_mask)) &&
> +		    (test_bit(OPT_TWEAK, &token_mask)))
> +			payload->keyid_ctrl |= MKTME_KEYID_SET_KEY_DIRECT;
> +		else
> +			return -EINVAL;
> +		break;
> +
> +	case MKTME_TYPE_CPU:
> +		if (test_bit(OPT_ALGORITHM, &token_mask))
> +			payload->keyid_ctrl |= MKTME_AES_XTS_128;
> +		else
> +			return -EINVAL;
> +
> +		payload->keyid_ctrl |= MKTME_KEYID_SET_KEY_RANDOM;
> +		break;
> +
> +	case MKTME_TYPE_CLEAR:
> +		payload->keyid_ctrl |= MKTME_KEYID_CLEAR_KEY;
> +		break;
> +
> +	case MKTME_TYPE_NO_ENCRYPT:
> +		payload->keyid_ctrl |= MKTME_KEYID_NO_ENCRYPT;
> +		break;
> +
> +	default:
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +/* Parse the options and store the key programming data in the payload. */
> +static int mktme_get_options(char *options, struct mktme_payload *payload)
> +{
> +	enum mktme_type type = MKTME_TYPE_ERROR;
> +	substring_t args[MAX_OPT_ARGS];
> +	unsigned long token_mask = 0;
> +	char *p = options;
> +	int ret, token;
> +
> +	while ((p = strsep(&options, " \t"))) {
> +		if (*p == '\0' || *p == ' ' || *p == '\t')
> +			continue;
> +		token = match_token(p, mktme_token, args);
> +		if (test_and_set_bit(token, &token_mask))
> +			return -EINVAL;
> +
> +		switch (token) {
> +		case OPT_KEY:
> +			ret = hex2bin(payload->data_key, args[0].from,
> +				      MKTME_AES_XTS_SIZE);
> +			if (ret < 0)
> +				return -EINVAL;
> +			break;
> +
> +		case OPT_TWEAK:
> +			ret = hex2bin(payload->tweak_key, args[0].from,
> +				      MKTME_AES_XTS_SIZE);
> +			if (ret < 0)
> +				return -EINVAL;
> +			break;
> +
> +		case OPT_TYPE:
> +			type = match_string(mktme_type_names,
> +					    ARRAY_SIZE(mktme_type_names),
> +					    args[0].from);
> +			if (type < 0)
> +				return -EINVAL;
> +			break;
> +
> +		case OPT_ALGORITHM:
> +			ret = match_string(mktme_alg_names,
> +					   ARRAY_SIZE(mktme_alg_names),
> +					   args[0].from);
> +			if (ret < 0)
> +				return -EINVAL;
> +			break;
> +
> +		default:
> +			return -EINVAL;
> +		}
> +	}
> +	return mktme_check_options(payload, token_mask, type);
> +}
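
To make the rules in mktme_get_options() and mktme_check_options()
easier to discuss, here is a minimal user-space sketch of the same
flow. The option prefixes ("key=", "tweak=", "type=", "algorithm=") and
the type strings are assumptions, since mktme_token and
mktme_type_names are not visible in this hunk. The sketch does not
validate the algorithm name or the hex data. It shows the duplicate
check via a bitmask and the per-type requirements: "user" needs
algorithm, key and tweak; "cpu" needs algorithm; "clear" and
"no-encrypt" only need some option to be present.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <string.h>

/* Option names and type strings below are assumed, not taken from the patch. */
enum opt { OPT_KEY, OPT_TWEAK, OPT_TYPE, OPT_ALGORITHM, OPT_COUNT };
enum ktype { T_ERROR = -1, T_USER, T_CPU, T_CLEAR, T_NO_ENCRYPT };

static const char *const opt_prefix[OPT_COUNT] = {
	[OPT_KEY] = "key=", [OPT_TWEAK] = "tweak=",
	[OPT_TYPE] = "type=", [OPT_ALGORITHM] = "algorithm=",
};
static const char *const type_names[] = { "user", "cpu", "clear", "no-encrypt" };

/* Mirror of the per-type rules in mktme_check_options(). */
static int check_options(unsigned long mask, enum ktype type)
{
	unsigned long alg = 1UL << OPT_ALGORITHM;
	unsigned long keys = (1UL << OPT_KEY) | (1UL << OPT_TWEAK);

	if (!mask)
		return -EINVAL;
	switch (type) {
	case T_USER:
		return ((mask & alg) && (mask & keys) == keys) ? 0 : -EINVAL;
	case T_CPU:
		return (mask & alg) ? 0 : -EINVAL;
	case T_CLEAR:
	case T_NO_ENCRYPT:
		return 0;
	default:
		return -EINVAL;
	}
}

/* Parse a writable, whitespace-separated option string. */
static int parse_options(char *options)
{
	enum ktype type = T_ERROR;
	unsigned long mask = 0;
	char *p;

	while ((p = strsep(&options, " \t")) != NULL) {
		int tok;

		if (*p == '\0')		/* empty field from repeated separators */
			continue;
		for (tok = 0; tok < OPT_COUNT; tok++)
			if (!strncmp(p, opt_prefix[tok], strlen(opt_prefix[tok])))
				break;
		if (tok == OPT_COUNT)		/* unknown option */
			return -EINVAL;
		if (mask & (1UL << tok))	/* option given twice */
			return -EINVAL;
		mask |= 1UL << tok;

		if (tok == OPT_TYPE) {
			const char *val = p + strlen(opt_prefix[tok]);

			for (int i = 0; i < 4; i++)
				if (!strcmp(val, type_names[i]))
					type = (enum ktype)i;
			if (type == T_ERROR)
				return -EINVAL;
		}
	}
	return check_options(mask, type);
}
```

One thing the sketch makes visible: a token returned by strsep() with
" \t" as delimiters can never start with a space or a tab, so only the
empty-string test is needed to skip repeated separators.
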
> +
> +void mktme_free_preparsed_key(struct key_preparsed_payload *prep)
> +{
> +	kzfree(prep->payload.data[0]);
> +}
> +
> +/*
> + * Key Service Method to preparse a payload before a key is created.
> + * Check permissions and the options. Load the proposed key field
> + * data into the payload for use by instantiate and update methods.
> + */
> +int mktme_preparse_key(struct key_preparsed_payload *prep)
> +{
> +	struct mktme_payload *mktme_payload;
> +	size_t datalen = prep->datalen;
> +	char *options;
> +	int ret;
> +
> +	if (!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
> +		return -EACCES;
> +
> +	if (datalen <= 0 || datalen > 1024 || !prep->data)
> +		return -EINVAL;
> +
> +	/* Copy exactly datalen bytes and NUL-terminate the copy */
> +	options = kmemdup_nul(prep->data, datalen, GFP_KERNEL);
> +	if (!options)
> +		return -ENOMEM;
> +
> +	mktme_payload = kzalloc(sizeof(*mktme_payload), GFP_KERNEL);
> +	if (!mktme_payload) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	ret = mktme_get_options(options, mktme_payload);
> +	if (ret < 0) {
> +		/* Not yet handed to prep, so free the payload here */
> +		kzfree(mktme_payload);
> +		goto out;
> +	}
> +
> +	prep->quotalen = sizeof(*mktme_payload);
> +	prep->payload.data[0] = mktme_payload;
> +out:
> +	kzfree(options);
> +	return ret;
> +}
> +
> +struct key_type key_type_mktme = {
> +	.name		= "mktme",
> +	.preparse	= mktme_preparse_key,
> +	.free_preparse	= mktme_free_preparsed_key,
> +	.instantiate	= mktme_instantiate_key,
> +	.update		= mktme_update_key,
> +	.describe	= user_describe,
> +	.destroy	= mktme_destroy_key,
> +};
> +
> +/*
> + * Allocate the global mktme_map structure based on the available keyids.
> + * Create a cache for the hardware structure. Initialize the encrypt_count
> + * array to track VMAs per keyid. Once all that succeeds, register the
> + * 'mktme' key type.
> + */
> +static int __init init_mktme(void)
> +{
> +	int ret;
> +
> +	/* Verify keys are present */
> +	if (!(mktme_nr_keyids > 0))
> +		return -EINVAL;
> +
> +	if (!mktme_map_alloc())
> +		return -ENOMEM;
> +
> +	mktme_prog_cache = KMEM_CACHE(mktme_key_program, SLAB_PANIC);
> +	if (!mktme_prog_cache)
> +		goto free_map;
> +
> +	if (mktme_alloc_encrypt_array() < 0)
> +		goto free_cache;
> +
> +	ret = register_key_type(&key_type_mktme);
> +	if (!ret)
> +		return ret;			/* SUCCESS */
> +
> +	mktme_free_encrypt_array();
> +free_cache:
> +	kmem_cache_destroy(mktme_prog_cache);
> +free_map:

Maybe err_free_cache and err_map (more than cosmetic)?

> +	mktme_map_free();
> +
> +	return -ENOMEM;
> +}
> +
> +late_initcall(init_mktme);
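
On the label naming above: a minimal user-space sketch of the unwind
order with err_-prefixed labels, in case it helps the discussion. The
three malloc()ed buffers and register_type() are hypothetical stand-ins
for the map, the cache, the encrypt array and register_key_type(). The
sketch also returns the error that actually occurred instead of a
blanket -ENOMEM, which the quoted function does not do when the
registration step fails.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resources standing in for the map, cache and encrypt array. */
static void *map, *cache, *array;

/* Hypothetical final step; `fail` lets a caller exercise the unwind path. */
static int register_type(int fail)
{
	return fail ? -EEXIST : 0;
}

static int init_sketch(int fail_register)
{
	int ret = -ENOMEM;

	map = malloc(64);
	if (!map)
		return ret;

	cache = malloc(64);
	if (!cache)
		goto err_free_map;

	array = malloc(64);
	if (!array)
		goto err_free_cache;

	ret = register_type(fail_register);
	if (ret)
		goto err_free_array;

	return 0;

	/* Unwind in reverse order of acquisition. */
err_free_array:
	free(array);
	array = NULL;
err_free_cache:
	free(cache);
	cache = NULL;
err_free_map:
	free(map);
	map = NULL;
	return ret;	/* propagate the real error, not a blanket -ENOMEM */
}
```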

As for the code change: overall, it looks good! But before spending
time on a detailed review or testing (once I get a chance to acquire
something to test it on), let's go through the documentation discussion.
Clearly there is a bunch of stuff that can be cut...

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 10/13] keys/mktme: Add the MKTME Key Service type for memory encryption
  2018-12-06  8:51   ` Sakkinen, Jarkko
@ 2018-12-06  8:54     ` Sakkinen, Jarkko
  2018-12-06 15:11     ` Dave Hansen
  1 sibling, 0 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-06  8:54 UTC (permalink / raw)
  To: tglx, Schofield, Alison, dhowells
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, linux-mm,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	bp, Hansen, Dave, Nakajima, Jun

On Thu, 2018-12-06 at 00:51 -0800, Jarkko Sakkinen wrote:
> On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> > MKTME (Multi-Key Total Memory Encryption) is a technology that allows
> > transparent memory encryption in upcoming Intel platforms. MKTME will
> > support multiple encryption domains, each having their own key. The main
> > use case for the feature is virtual machine isolation. The API needs the
> > flexibility to work for a wide range of uses.
> 
> Some, maybe brutal, honesty (apologies)...
> 
> > I have never really grasped why either SME or TME would make
> > isolation any better. If you can break into the hypervisor, you'll
> > have these tools available:
> 
> 1. Read page (in encrypted form).
> 2. Write page (for example replay as pages are not versioned).
> 
> with all the side-channel possibilities of course since you can
> control the VMs (in which core they execute etc.).
> 
> > I've now seen the SME presentation three times and it always leaves
> > me with an empty feeling. This feels the same.

I.e. you need to state very explicitly the scenario where this will
help. I'm not saying that this should resolve everything, but it must
resolve something.

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-05 20:32     ` Sakkinen, Jarkko
@ 2018-12-06 11:22       ` Kirill A. Shutemov
  2018-12-06 14:59         ` Dave Hansen
  2018-12-06 21:23         ` Sakkinen, Jarkko
  0 siblings, 2 replies; 87+ messages in thread
From: Kirill A. Shutemov @ 2018-12-06 11:22 UTC (permalink / raw)
  To: Sakkinen, Jarkko
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, tglx,
	linux-mm, dhowells, linux-security-module, Williams, Dan J, x86,
	hpa, mingo, luto, bp, Hansen, Dave, Schofield, Alison, Nakajima,
	Jun

On Wed, Dec 05, 2018 at 08:32:52PM +0000, Sakkinen, Jarkko wrote:
> On Tue, 2018-12-04 at 12:46 +0300, Kirill A. Shutemov wrote:
> > On Tue, Dec 04, 2018 at 09:25:50AM +0000, Peter Zijlstra wrote:
> > > On Mon, Dec 03, 2018 at 11:39:47PM -0800, Alison Schofield wrote:
> > > > (Multi-Key Total Memory Encryption)
> > > 
> > > I think that MKTME is a horrible name, and doesn't appear to accurately
> > > describe what it does either. Specifically the 'total' seems out of
> > > place, it doesn't require all memory to be encrypted.
> > 
> > > MKTME implies TME. TME is enabled by the BIOS and encrypts all memory
> > > with a CPU-generated key. MKTME allows using other keys, or disabling
> > > encryption, for a page.
> 
> When you say "disable encryption to a page" does the encryption get
> actually disabled or does the CPU just decrypt it transparently i.e.
> what happens physically?

Yes, it gets disabled. Physically. It overrides TME encryption.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-06 11:22       ` Kirill A. Shutemov
@ 2018-12-06 14:59         ` Dave Hansen
  2018-12-07 10:12           ` Huang, Kai
  2018-12-06 21:23         ` Sakkinen, Jarkko
  1 sibling, 1 reply; 87+ messages in thread
From: Dave Hansen @ 2018-12-06 14:59 UTC (permalink / raw)
  To: Kirill A. Shutemov, Sakkinen, Jarkko
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, tglx,
	linux-mm, dhowells, linux-security-module, Williams, Dan J, x86,
	hpa, mingo, luto, bp, Schofield, Alison, Nakajima, Jun

On 12/6/18 3:22 AM, Kirill A. Shutemov wrote:
>> When you say "disable encryption to a page" does the encryption get
>> actually disabled or does the CPU just decrypt it transparently i.e.
>> what happens physically?
> Yes, it gets disabled. Physically. It overrides TME encryption.

I know MKTME itself has a runtime overhead and we expect it to have a
performance impact in the low single digits.  Does TME have that
overhead?  Presumably MKTME plus no-encryption is not expected to have
the overhead.

We should probably mention that in the changelogs too.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 10/13] keys/mktme: Add the MKTME Key Service type for memory encryption
  2018-12-06  8:51   ` Sakkinen, Jarkko
  2018-12-06  8:54     ` Sakkinen, Jarkko
@ 2018-12-06 15:11     ` Dave Hansen
  2018-12-06 22:56       ` Sakkinen, Jarkko
  1 sibling, 1 reply; 87+ messages in thread
From: Dave Hansen @ 2018-12-06 15:11 UTC (permalink / raw)
  To: Sakkinen, Jarkko, tglx, Schofield, Alison, dhowells
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, linux-mm,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	bp, Nakajima, Jun

On 12/6/18 12:51 AM, Sakkinen, Jarkko wrote:
> On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
>> MKTME (Multi-Key Total Memory Encryption) is a technology that allows
>> transparent memory encryption in upcoming Intel platforms. MKTME will
>> support multiple encryption domains, each having their own key. The main
>> use case for the feature is virtual machine isolation. The API needs the
>> flexibility to work for a wide range of uses.
> Some, maybe brutal, honesty (apologies)...
> 
> I have never really grasped why either SME or TME would make
> isolation any better. If you can break into the hypervisor, you'll
> have these tools available:

For systems using MKTME, the hypervisor is within the "trust boundary".
From what I've read, it is a bit _more_ trusted than with AMD's scheme.

But, yes, if you can mount a successful arbitrary code execution attack
against the MKTME hypervisor, you can defeat MKTME's protections.  If
the kernel creates non-encrypted mappings of memory that's being
encrypted with MKTME, an arbitrary read primitive could also be very
valuable in defeating MKTME's protections.  That's why Andy is proposing
doing something like eXclusive-Page-Frame-Ownership (google XPFO).

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-06  1:09     ` Andy Lutomirski
  2018-12-06  1:25       ` Dan Williams
@ 2018-12-06 15:39       ` Dave Hansen
  2018-12-06 19:10         ` Andy Lutomirski
  2018-12-07  1:55       ` Huang, Kai
  2 siblings, 1 reply; 87+ messages in thread
From: Dave Hansen @ 2018-12-06 15:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: alison.schofield, Matthew Wilcox, Dan Williams, David Howells,
	Thomas Gleixner, James Morris, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Kirill A. Shutemov, kai.huang,
	Jun Nakajima, Sakkinen, Jarkko, keyrings, LSM List, Linux-MM,
	X86 ML

On 12/5/18 5:09 PM, Andy Lutomirski wrote:
> On Wed, Dec 5, 2018 at 3:49 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> What I was hoping to see was them do this (apologies for the horrible
>> indentation):
>>
>>         ptr = mmap(..., PROT_NONE);
>>         mprotect_pkey(   addr, len, PROT_NONE, pkey);
>>         mprotect_encrypt(addr, len, PROT_NONE, keyid);
>>         mprotect(        addr, len, real_prot);
>>
>> The point is that you *can* stack these things and don't have to have an
>> mprotect_kitchen_sink() if you use PROT_NONE for intermediate
>> permissions during setup.
> 
> Sure, but then why call it mprotect at all?  How about:
> 
> mmap(..., PROT_NONE);
> mencrypt(..., keyid);
> mprotect_pkey(...);

That would totally work too.  I just like the idea of a family of
mprotect() syscalls that do mprotect() plus one other thing.  What
you're proposing is totally equivalent where we have mprotect_pkey()
always being the *last* thing that gets called, plus a family of things
that we expect to get called on something that's probably PROT_NONE.
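
As a minimal sketch of that staging pattern, assuming nothing beyond
mmap() and mprotect(): the range is mapped PROT_NONE, the intermediate
step runs while it is still inaccessible, and the real permissions are
granted last. encrypt_step_stub() is a hypothetical placeholder for the
proposed encrypt call, which does not exist in any released kernel; it
only re-applies the protection so that the sketch runs anywhere.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical stand-in for the proposed encrypt_mprotect()/mprotect_encrypt()
 * call; the real syscall is not available, so this only re-applies `prot`.
 */
static int encrypt_step_stub(void *addr, size_t len, int prot, int keyid)
{
	(void)keyid;
	return mprotect(addr, len, prot);
}

/*
 * Map `len` bytes inaccessible, run the intermediate step(s) while the
 * range is still PROT_NONE, and only then grant the real permissions.
 * Returns the mapping, or NULL on failure.
 */
static void *staged_map(size_t len, int keyid, int real_prot)
{
	void *addr = mmap(NULL, len, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (addr == MAP_FAILED)
		return NULL;

	if (encrypt_step_stub(addr, len, PROT_NONE, keyid) != 0 ||
	    mprotect(addr, len, real_prot) != 0) {
		munmap(addr, len);
		return NULL;
	}
	return addr;
}
```

A pkey_mprotect(addr, len, PROT_NONE, pkey) call would slot in next to
the stub in the same way on hardware with protection keys.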

> But wouldn't this be much nicer:
> 
> int fd = memfd_create(...);
> memfd_set_tme_key(fd, keyid);  /* fails if len != 0 */
> mmap(fd, ...);

No. :)

One really big advantage with protection keys, or this implementation, is
that you don't have to implement an allocator.  You can use it with any
old malloc() as long as you own a whole page.

The pages also fundamentally *stay* anonymous in the VM and get all the
goodness that comes with that, like THP.

>>> and it's also functionally just MADV_DONTNEED.  In other words, the
>>> sole user-visible effect appears to be that the existing pages are
>>> blown away.  The fact that it changes the key in use doesn't seem
>>> terribly useful, since it's anonymous memory,
>>
>> It's functionally MADV_DONTNEED, plus a future promise that your writes
>> will never show up as plaintext on the DIMM.
> 
> But that's mostly vacuous.  If I read the docs right, MKTME systems
> also support TME, so you *already* have that promise, unless the
> firmware totally blew it.  If we want a boot option to have the kernel
> use MKTME to forcibly encrypt everything regardless of what the TME
> MSRs say, I'd be entirely on board.  Heck, the implementation would be
> quite simple because we mostly reuse the SME code.

Yeah, that's true.  I seem to always forget about the TME case! :)

"It's functionally MADV_DONTNEED, plus a future promise that your writes
will never be written to the DIMM with the TME key."

But, this gets us back to your very good question about what good this
does in the end.  What value does _that_ scheme provide over TME?  We're
admittedly weak on specific examples there, but I'm working on it.
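
For anyone who wants to see the MADV_DONTNEED half of that statement in
isolation, here is a minimal user-space sketch. It only demonstrates the
existing madvise() behaviour on private anonymous memory on Linux and
says nothing about keys; byte_after_dontneed() is a made-up helper name.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Fill a private anonymous mapping with 0xAA, drop it with MADV_DONTNEED,
 * and return the first byte read back afterwards (-1 on error).
 */
static int byte_after_dontneed(size_t len)
{
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int first;

	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAA, len);
	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}
	first = p[0];		/* faults in a fresh zero page on Linux */
	munmap(p, len);
	return first;
}
```

The old contents are gone after the call, which is the "existing pages
are blown away" effect being discussed.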

>>> the direct map as well, probably using the pageattr.c code.
>>
>> The current, public hardware spec has a description of what's required
>> to maintain cache coherency.  Basically, you can keep as many mappings
>> of a physical page as you want, but only write to one mapping at a time,
>> and clflush the old one when you want to write to a new one.
> 
> Surely you at least have to clflush the old mapping and then the new
> mapping, since the new mapping could have been speculatively read.

Nope.  The coherency is "fine" unless you have writeback of an older
cacheline that blows away newer data.  CPUs that support MKTME are
guaranteed to never do writeback of the lines that could be established
speculatively or from prefetching.

>>> Finally, if you're going to teach the kernel how to have some user
>>> pages that aren't in the direct map, you've essentially done XPFO,
>>> which is nifty but expensive.  And I think that doing this gets you
>>> essentially all the benefit of MKTME for the non-pmem use case.  Why
>>> exactly would any software want to use anything other than a
>>> CPU-managed key for anything other than pmem?
>>
>> It is handy, for one, to let you "cluster" key usage.  If you have 5
>> Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
>> Coke one using the same key, you can boil it down to only 2 hardware
>> keyid slots that get used, and do this transparently.
> 
> I understand this from a marketing perspective but not a security
> perspective.  Say I'm Coke and you've sold me some VMs that are
> "encrypted with a Coke-specific key and no other VMs get to use that
> key."  I can't think of *any* not-exceedingly-contrived attack in
> which this makes the slightest difference.  If Pepsi tries to attack
> Coke without MKTME, then they'll either need to get the hypervisor to
> leak Coke's data through the direct map or they'll have to find some
> way to corrupt a page table or use something like L1TF to read from a
> physical address Coke owns.  With MKTME, if they can read through the
> host direct map, then they'll get Coke's cleartext, and if they can
> corrupt a page table or use L1TF to read from your memory, they'll get
> Coke's cleartext.

The design definitely has the hypervisor in the trust boundary.  If the
hypervisor is evil, or if someone evil compromises the hypervisor, MKTME
obviously provides less protection.

I guess the question ends up being if this makes its protections weak
enough that we should not bother merging it in its current form.

I still have the homework assignment to go figure out why folks want the
protections as they stand.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-06 15:39       ` Dave Hansen
@ 2018-12-06 19:10         ` Andy Lutomirski
  2018-12-06 19:31           ` Dave Hansen
  0 siblings, 1 reply; 87+ messages in thread
From: Andy Lutomirski @ 2018-12-06 19:10 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Lutomirski, Alison Schofield, Matthew Wilcox,
	Dan Williams, David Howells, Thomas Gleixner, James Morris,
	Ingo Molnar, H. Peter Anvin, Borislav Petkov, Peter Zijlstra,
	Kirill A. Shutemov, kai.huang, Jun Nakajima, Sakkinen, Jarkko,
	keyrings, LSM List, Linux-MM, X86 ML

> On Dec 6, 2018, at 7:39 AM, Dave Hansen <dave.hansen@intel.com> wrote:

>>>> the direct map as well, probably using the pageattr.c code.
>>>
>>> The current, public hardware spec has a description of what's required
>>> to maintain cache coherency.  Basically, you can keep as many mappings
>>> of a physical page as you want, but only write to one mapping at a time,
>>> and clflush the old one when you want to write to a new one.
>>
>> Surely you at least have to clflush the old mapping and then the new
>> mapping, since the new mapping could have been speculatively read.
>
> Nope.  The coherency is "fine" unless you have writeback of an older
> cacheline that blows away newer data.  CPUs that support MKTME are
> guaranteed to never do writeback of the lines that could be established
> speculatively or from prefetching.

How is that sufficient?  Suppose I have some physical page mapped with
keys 1 and 2. #1 is logically live and I write to it.  Then I prefetch
or otherwise populate mapping 2 into the cache (in the S state,
presumably).  Now I clflush mapping 1 and read 2.  It contains garbage
in the cache, but the garbage in the cache is inconsistent with the
garbage in memory.  This can’t be a good thing, even if no writeback
occurs.

I suppose the right fix is to clflush the old mapping and then to zero
the new mapping.

>
>>>> Finally, if you're going to teach the kernel how to have some user
>>>> pages that aren't in the direct map, you've essentially done XPFO,
>>>> which is nifty but expensive.  And I think that doing this gets you
>>>> essentially all the benefit of MKTME for the non-pmem use case.  Why
>>>> exactly would any software want to use anything other than a
>>>> CPU-managed key for anything other than pmem?
>>>
>>> It is handy, for one, to let you "cluster" key usage.  If you have 5
>>> Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
>>> Coke one using the same key, you can boil it down to only 2 hardware
>>> keyid slots that get used, and do this transparently.
>>
>> I understand this from a marketing perspective but not a security
>> perspective.  Say I'm Coke and you've sold me some VMs that are
>> "encrypted with a Coke-specific key and no other VMs get to use that
>> key."  I can't think of *any* not-exceedingly-contrived attack in
>> which this makes the slightest difference.  If Pepsi tries to attack
>> Coke without MKTME, then they'll either need to get the hypervisor to
>> leak Coke's data through the direct map or they'll have to find some
>> way to corrupt a page table or use something like L1TF to read from a
>> physical address Coke owns.  With MKTME, if they can read through the
>> host direct map, then they'll get Coke's cleartext, and if they can
>> corrupt a page table or use L1TF to read from your memory, they'll get
>> Coke's cleartext.
>
> The design definitely has the hypervisor in the trust boundary.  If the
> hypervisor is evil, or if someone evil compromises the hypervisor, MKTME
> obviously provides less protection.
>
> I guess the question ends up being if this makes its protections weak
> enough that we should not bother merging it in its current form.

Indeed, but I’d ask another question too: I expect that MKTME is weak
enough that it will be improved, and without seeing the improvement,
it seems quite plausible that using the improvement will require
radically reworking the kernel implementation.

As a straw man, suppose we get a way to say “this key may only be
accessed through such-and-such VPID or by using a special new
restricted facility for the hypervisor to request access”.    Now we
have some degree of serious protection, but it doesn’t work, by
design, for anonymous memory.  Similarly, something that looks more
like AMD's SEV would be very very awkward to support with anything
like the current API proposal.

>
> I still have the homework assignment to go figure out why folks want the
> protections as they stand.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-06 19:10         ` Andy Lutomirski
@ 2018-12-06 19:31           ` Dave Hansen
  0 siblings, 0 replies; 87+ messages in thread
From: Dave Hansen @ 2018-12-06 19:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alison Schofield, Matthew Wilcox, Dan Williams, David Howells,
	Thomas Gleixner, James Morris, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, Kirill A. Shutemov, kai.huang,
	Jun Nakajima, Sakkinen, Jarkko, keyrings, LSM List, Linux-MM,
	X86 ML

On 12/6/18 11:10 AM, Andy Lutomirski wrote:
>> On Dec 6, 2018, at 7:39 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>> The coherency is "fine" unless you have writeback of an older
>> cacheline that blows away newer data.  CPUs that support MKTME are
>> guaranteed to never do writeback of the lines that could be established
>> speculatively or from prefetching.
> 
> How is that sufficient?  Suppose I have some physical page mapped with
> keys 1 and 2. #1 is logically live and I write to it.  Then I prefetch
> or otherwise populate mapping 2 into the cache (in the S state,
> presumably).  Now I clflush mapping 1 and read 2.  It contains garbage
> in the cache, but the garbage in the cache is inconsistent with the
> garbage in memory.  This can’t be a good thing, even if no writeback
> occurs.
> 
> I suppose the right fix is to clflush the old mapping and then to zero
> the new mapping.

Yep.  Practically, you need to write to the new mapping to give it any
meaning.  Those writes effectively blow away any previously cached,
garbage contents.

I think you're right, though, that the cached data might not be
_consistent_ with what is in memory.  It feels really dirty, but I can't
think of any problems that it actually causes.
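
To convince myself of the sequence, here is a toy model. It is a sketch
under heavy assumptions and not a description of the hardware: one
memory line, one cached alias per KeyID, "encryption" is an XOR with a
per-KeyID pad, and clean lines are never written back. All names are
made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define NKEYS 3

/* Toy "encryption": XOR with a per-KeyID pad. Real hardware uses AES-XTS. */
static const uint8_t pad[NKEYS] = { 0x00, 0x5a, 0xc3 };

struct line { uint8_t data; bool valid, dirty; };

struct toy {
	uint8_t mem;			/* ciphertext held in DRAM */
	struct line cache[NKEYS];	/* one cached alias per KeyID */
};

/* Load through the alias for `keyid`, filling the cache on a miss. */
static uint8_t toy_read(struct toy *t, int keyid)
{
	struct line *l = &t->cache[keyid];

	if (!l->valid) {
		l->data = t->mem ^ pad[keyid];
		l->valid = true;
		l->dirty = false;
	}
	return l->data;
}

/* Store through the alias for `keyid`; the line becomes dirty. */
static void toy_write(struct toy *t, int keyid, uint8_t v)
{
	t->cache[keyid] = (struct line){ .data = v, .valid = true, .dirty = true };
}

/* clflush: write back only if dirty, then invalidate. */
static void toy_clflush(struct toy *t, int keyid)
{
	struct line *l = &t->cache[keyid];

	if (l->valid && l->dirty)
		t->mem = l->data ^ pad[keyid];
	l->valid = false;
	l->dirty = false;
}

/* What a cold read through `keyid` would see in DRAM right now. */
static uint8_t toy_peek_mem(const struct toy *t, int keyid)
{
	return t->mem ^ pad[keyid];
}
```

In this model, writing through KeyID 1, prefetching through KeyID 2 and
then flushing KeyID 1 leaves the cached KeyID 2 line different from what
toy_peek_mem() reports for KeyID 2. Writing zero through KeyID 2 and
flushing it makes the two agree again, which matches the "clflush the
old mapping, then write the new one" ordering.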

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-06 11:22       ` Kirill A. Shutemov
  2018-12-06 14:59         ` Dave Hansen
@ 2018-12-06 21:23         ` Sakkinen, Jarkko
  2018-12-07 11:54           ` Kirill A. Shutemov
  1 sibling, 1 reply; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-06 21:23 UTC (permalink / raw)
  To: kirill
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, tglx,
	linux-mm, dhowells, linux-security-module, Williams, Dan J, x86,
	hpa, mingo, luto, bp, Hansen, Dave, Schofield, Alison, Nakajima,
	Jun

On Thu, 2018-12-06 at 14:22 +0300, Kirill A. Shutemov wrote:
> > When you say "disable encryption to a page" does the encryption get
> > actually disabled or does the CPU just decrypt it transparently i.e.
> > what happens physically?
> 
> Yes, it gets disabled. Physically. It overrides TME encryption.

OK, thanks for the confirmation. BTW, how big is the penalty for keeping
it always enabled? Or is there some other reason why that would not make
sense?

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 10/13] keys/mktme: Add the MKTME Key Service type for memory encryption
  2018-12-06 15:11     ` Dave Hansen
@ 2018-12-06 22:56       ` Sakkinen, Jarkko
  0 siblings, 0 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-06 22:56 UTC (permalink / raw)
  To: tglx, Schofield, Alison, dhowells, Hansen, Dave
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, linux-mm,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	bp, Nakajima, Jun

On Thu, 2018-12-06 at 07:11 -0800, Dave Hansen wrote:
> On 12/6/18 12:51 AM, Sakkinen, Jarkko wrote:
> > On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> > > MKTME (Multi-Key Total Memory Encryption) is a technology that allows
> > > transparent memory encryption in upcoming Intel platforms. MKTME will
> > > support multiple encryption domains, each having their own key. The main
> > > use case for the feature is virtual machine isolation. The API needs the
> > > flexibility to work for a wide range of uses.
> > Some, maybe brutal, honesty (apologies)...
> > 
> > I have never really grasped why either SME or TME would make
> > isolation any better. If you can break into the hypervisor, you'll
> > have these tools available:
> 
> For systems using MKTME, the hypervisor is within the "trust boundary".
> From what I've read, it is a bit _more_ trusted than with AMD's scheme.
> 
> But, yes, if you can mount a successful arbitrary code execution attack
> against the MKTME hypervisor, you can defeat MKTME's protections.  If
> the kernel creates non-encrypted mappings of memory that's being
> encrypted with MKTME, an arbitrary read primitive could also be very
> valuable in defeating MKTME's protections.  That's why Andy is proposing
> doing something like eXclusive-Page-Frame-Ownership (google XPFO).

Thanks, I was not aware of XPFO but I found a nice ~2 page article about it:

https://lwn.net/Articles/700647/

I think the performance hit is the necessary price to pay (if you want
something more opaque than just the usual "military grade security"). At
minimum, it should be an opt-in feature.

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-06  1:09     ` Andy Lutomirski
  2018-12-06  1:25       ` Dan Williams
  2018-12-06 15:39       ` Dave Hansen
@ 2018-12-07  1:55       ` Huang, Kai
  2018-12-07  4:23         ` Dave Hansen
  2018-12-07 23:53         ` Andy Lutomirski
  2 siblings, 2 replies; 87+ messages in thread
From: Huang, Kai @ 2018-12-07  1:55 UTC (permalink / raw)
  To: luto, Hansen, Dave
  Cc: kirill.shutemov, jmorris, peterz, keyrings, willy, tglx,
	linux-mm, dhowells, linux-security-module, Williams, Dan J, x86,
	hpa, mingo, Sakkinen, Jarkko, bp, Schofield, Alison, Nakajima,
	Jun


> 
> TME itself provides a ton of protection -- you can't just barge into
> the datacenter, refrigerate the DIMMs, walk away with them, and read
> off everyone's data.
> 
> Am I missing something?

I think we can make that assumption in most cases, but I think it's better not to make any
assumption at all. For example, a data center admin (or anyone else) who has physical access to the
servers may do something malicious. I am not an expert, but there should be other physical attack
methods besides the cold boot attack if a malicious employee can get physical access to a server
without being detected.

> 
> > 
> > But, I think what you're implying is that the security properties of
> > user-supplied keys can only be *worse* than using CPU-generated keys
> > (assuming the CPU does a good job generating it).  So, why bother
> > allowing user-specified keys in the first place?
> 
> That too :)

I think one use of a user-specified key is NVDIMM: the CPU key is gone after the machine
reboots, so if an NVDIMM is encrypted with the CPU key we cannot retrieve its contents after a
shutdown or reboot.

There are some other use cases that already require the tenant to send a key to the CSP. For
example, the VM image can be provided by the tenant and encrypted with the tenant's own key, and
the tenant needs to send that key to the CSP when asking the CSP to run the encrypted image. But
the tenant needs to trust the CSP in that case, which raises the question of why the tenant wants
to use its own image in the first place (I have to say I am not convinced of the value of this use
case myself). I think there are two levels of trust involved here: 1) the tenant needs to trust the
CSP anyway; 2) but the CSP needs to convince the tenant that it can be trusted, e.g. by proving
that it can prevent potential attacks from a malicious employee (i.e. by raising the bar with
MKTME).

Thanks,
-Kai

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-05 22:19   ` Sakkinen, Jarkko
@ 2018-12-07  2:05     ` Huang, Kai
  2018-12-07  6:48       ` Jarkko Sakkinen
  2018-12-07 11:57     ` Kirill A. Shutemov
  1 sibling, 1 reply; 87+ messages in thread
From: Huang, Kai @ 2018-12-07  2:05 UTC (permalink / raw)
  To: Williams, Dan J, Schofield, Alison, luto, Sakkinen, Jarkko, willy
  Cc: kirill.shutemov, jmorris, peterz, keyrings, tglx, linux-mm,
	dhowells, linux-security-module, x86, hpa, mingo, bp, Hansen,
	Dave, Nakajima, Jun

On Wed, 2018-12-05 at 22:19 +0000, Sakkinen, Jarkko wrote:
> On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> > I'm not Thomas, but I think it's the wrong direction.  As it stands,
> > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > missing the protection key support), and it's also functionally just
> > MADV_DONTNEED.  In other words, the sole user-visible effect appears
> > to be that the existing pages are blown away.  The fact that it
> > changes the key in use doesn't seem terribly useful, since it's
> > anonymous memory, and the most secure choice is to use CPU-managed
> > keying, which appears to be the default anyway on TME systems.  It
> > also has totally unclear semantics WRT swap, and, off the top of my
> > head, it looks like it may have serious cache-coherency issues and
> > like swapping the pages might corrupt them, both because there are no
> > flushes and because the direct-map alias looks like it will use the
> > default key and therefore appear to contain the wrong data.
> > 
> > I would propose a very different direction: don't try to support MKTME
> > at all for anonymous memory, and instead figure out the important use
> > cases and support them directly.  The use cases that I can think of
> > off the top of my head are:
> > 
> > 1. pmem.  This should probably use a very different API.
> > 
> > 2. Some kind of VM hardening, where a VM's memory can be protected a
> > little tiny bit from the main kernel.  But I don't see why this is any
> > better than XPO (eXclusive Page-frame Ownership), which brings to
> > mind:
> 
> What is the threat model anyway for AMD and Intel technologies?
> 
> For me it looks like that you can read, write and even replay 
> encrypted pages both in SME and TME. 

Right. Neither of them (including MKTME) prevents replay attacks. But in my understanding SEV doesn't
prevent replay attacks either, since it doesn't have integrity protection.

Thanks,
-Kai

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows
  2018-12-04  7:39 ` [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows Alison Schofield
  2018-12-04  9:22   ` Peter Zijlstra
@ 2018-12-07  2:14   ` Huang, Kai
  2018-12-07  3:42     ` Alison Schofield
                       ` (2 more replies)
  1 sibling, 3 replies; 87+ messages in thread
From: Huang, Kai @ 2018-12-07  2:14 UTC (permalink / raw)
  To: tglx, Schofield, Alison, dhowells
  Cc: kirill.shutemov, peterz, jmorris, keyrings, linux-mm,
	linux-security-module, Williams, Dan J, x86, hpa, mingo, luto,
	Sakkinen, Jarkko, bp, Hansen, Dave, Nakajima, Jun

On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> MKTME (Multi-Key Total Memory Encryption) key payloads may include
> data encryption keys, tweak keys, and additional entropy bits. These
> are used to program the MKTME encryption hardware. By default, the
> kernel destroys this payload data once the hardware is programmed.
> 
> However, in order to fully support CPU Hotplug, saving the key data
> becomes important. The MKTME Key Service cannot allow a new physical
> package to come online unless it can program the new packages Key Table
> to match the Key Tables of all existing physical packages.
> 
> With CPU generated keys (a.k.a. random keys or ephemeral keys) the
> saving of user key data is not an issue. The kernel and MKTME hardware
> can generate strong encryption keys without recalling any user supplied
> data.
> 
> With USER directed keys (a.k.a. user type) saving the key programming
> data (data and tweak key) becomes an issue. The data and tweak keys
> are required to program those keys on a new physical package.
> 
> In preparation for adding CPU hotplug support:
> 
>    Add an 'mktme_vault' where key data is stored.
> 
>    Add 'mktme_savekeys' kernel command line parameter that directs
>    what key data can be stored. If it is not set, kernel does not
>    store users data key or tweak key.
> 
>    Add 'mktme_bitmap_user_type' to track when USER type keys are in
>    use. If no USER type keys are currently in use, a physical package
>    may be brought online, despite the absence of 'mktme_savekeys'.

Overall, I am not sure whether saving the key is a good idea, since it breaks cold boot attack
protection IMHO. We need to trade off between supporting CPU hotplug and security. I am not sure whether
supporting CPU hotplug is that important, since for some other features such as SGX, we don't support
CPU hotplug anyway.
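
As a rough illustration of the policy described in the quoted changelog, the sketch below models the decision of whether a new physical package may come online. The function and parameter names are hypothetical and only mirror the wording of the changelog; this is not the code from the patch.

```python
def may_online_package(savekeys_enabled: bool, user_keys_in_use: int) -> bool:
    """Hypothetical model of the hotplug rule in the changelog.

    A new package must get a Key Table matching the existing packages.
    CPU-generated keys need no saved user data, so only USER type keys
    depend on the saved key data.
    """
    if savekeys_enabled:
        # Key data was kept (the cold boot exposure under discussion),
        # so USER type keys can be programmed on the new package.
        return True
    # Without saved key data, hotplug is only possible while no USER
    # type keys are in use.
    return user_keys_in_use == 0


# No saved keys, no USER keys in use: the package may come online.
no_user_keys = may_online_package(savekeys_enabled=False, user_keys_in_use=0)
# No saved keys, but USER keys exist: the package cannot be programmed.
user_keys_unsaved = may_online_package(savekeys_enabled=False, user_keys_in_use=3)
# Saved keys: hotplug works, at the cost of key data sitting in memory.
user_keys_saved = may_online_package(savekeys_enabled=True, user_keys_in_use=3)
print(no_user_keys, user_keys_unsaved, user_keys_saved)
```

The third case is the one that carries the security cost: hotplug is supported only because the key data is kept.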

Alternatively, we can choose to use per-socket keyIDs rather than programming keyIDs globally across
all sockets, so you don't have to save the key while still supporting CPU hotplug.

Thanks,
-Kai

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows
  2018-12-07  2:14   ` Huang, Kai
@ 2018-12-07  3:42     ` Alison Schofield
  2018-12-07  6:39     ` Jarkko Sakkinen
  2018-12-07 11:47     ` Kirill A. Shutemov
  2 siblings, 0 replies; 87+ messages in thread
From: Alison Schofield @ 2018-12-07  3:42 UTC (permalink / raw)
  To: Huang, Kai
  Cc: tglx, dhowells, kirill.shutemov, peterz, jmorris, keyrings,
	linux-mm, linux-security-module, Williams, Dan J, x86, hpa,
	mingo, luto, Sakkinen, Jarkko, bp, Hansen, Dave, Nakajima, Jun

On Thu, Dec 06, 2018 at 06:14:03PM -0800, Huang, Kai wrote:
> On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:

8< ------------

> >    Add an 'mktme_vault' where key data is stored.
> > 
> >    Add 'mktme_savekeys' kernel command line parameter that directs
> >    what key data can be stored. If it is not set, kernel does not
> >    store users data key or tweak key.
> > 
> >    Add 'mktme_bitmap_user_type' to track when USER type keys are in
> >    use. If no USER type keys are currently in use, a physical package
> >    may be brought online, despite the absence of 'mktme_savekeys'.
> 
> Overall, I am not sure whether saving key is good idea, since it breaks coldboot attack IMHO. We
> need to tradeoff between supporting CPU hotplug and security. I am not sure whether supporting CPU
> hotplug is that important, since for some other features such as SGX, we don't support CPU hotplug
> anyway.

Yes, saving the key data exposes it in a cold boot attack.

Here we have 2 conflicting requirements: do not save the data, and
support CPU hotplug. I don't think CPU hotplug support is budging!
If the risk of offering the mktme_savekeys option is too great,
then we can't have user type keys.
Is the mktme_savekeys option too risky to offer?
(That's not just a question for you Kai ;). I'll pursue too.)
> 
> Alternatively, we can choose to use per-socket keyID, but not to program keyID globally across all
> sockets, so you don't have to save key while still supporting CPU hotplug.

An alternative, with a lot of impact to the core Linux support for
MKTME.  I don't think we need to go there. I'll leave this thought for
Kirill or Dave to perhaps elaborate on.

Alison 
> 
> Thanks,
> -Kai

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-07  1:55       ` Huang, Kai
@ 2018-12-07  4:23         ` Dave Hansen
  2018-12-07 23:53         ` Andy Lutomirski
  1 sibling, 0 replies; 87+ messages in thread
From: Dave Hansen @ 2018-12-07  4:23 UTC (permalink / raw)
  To: Huang, Kai, luto
  Cc: kirill.shutemov, jmorris, peterz, keyrings, willy, tglx,
	linux-mm, dhowells, linux-security-module, Williams, Dan J, x86,
	hpa, mingo, Sakkinen, Jarkko, bp, Schofield, Alison, Nakajima,
	Jun

On 12/6/18 5:55 PM, Huang, Kai wrote:
> I think one usage of user-specified key is for NVDIMM, since CPU key
> will be gone after machine reboot, therefore if NVDIMM is encrypted
> by CPU key we are not able to retrieve it once shutdown/reboot, etc.

I think we all agree that the NVDIMM uses are really useful.

But, these patches don't implement that.  So, if NVDIMMs are the only
reasonable use case, we shouldn't merge these patches until we add
NVDIMM support.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows
  2018-12-07  2:14   ` Huang, Kai
  2018-12-07  3:42     ` Alison Schofield
@ 2018-12-07  6:39     ` Jarkko Sakkinen
  2018-12-07  6:45       ` Jarkko Sakkinen
  2018-12-07 11:47     ` Kirill A. Shutemov
  2 siblings, 1 reply; 87+ messages in thread
From: Jarkko Sakkinen @ 2018-12-07  6:39 UTC (permalink / raw)
  To: Huang, Kai
  Cc: tglx, Schofield, Alison, dhowells, kirill.shutemov, peterz,
	jmorris, keyrings, linux-mm, linux-security-module, Williams,
	Dan J, x86, hpa, mingo, luto, bp, Hansen, Dave, Nakajima, Jun

On Thu, Dec 06, 2018 at 06:14:03PM -0800, Huang, Kai wrote:
> On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> > MKTME (Multi-Key Total Memory Encryption) key payloads may include
> > data encryption keys, tweak keys, and additional entropy bits. These
> > are used to program the MKTME encryption hardware. By default, the
> > kernel destroys this payload data once the hardware is programmed.
> > 
> > However, in order to fully support CPU Hotplug, saving the key data
> > becomes important. The MKTME Key Service cannot allow a new physical
> > package to come online unless it can program the new packages Key Table
> > to match the Key Tables of all existing physical packages.
> > 
> > With CPU generated keys (a.k.a. random keys or ephemeral keys) the
> > saving of user key data is not an issue. The kernel and MKTME hardware
> > can generate strong encryption keys without recalling any user supplied
> > data.
> > 
> > With USER directed keys (a.k.a. user type) saving the key programming
> > data (data and tweak key) becomes an issue. The data and tweak keys
> > are required to program those keys on a new physical package.
> > 
> > In preparation for adding CPU hotplug support:
> > 
> >    Add an 'mktme_vault' where key data is stored.
> > 
> >    Add 'mktme_savekeys' kernel command line parameter that directs
> >    what key data can be stored. If it is not set, kernel does not
> >    store users data key or tweak key.
> > 
> >    Add 'mktme_bitmap_user_type' to track when USER type keys are in
> >    use. If no USER type keys are currently in use, a physical package
> >    may be brought online, despite the absence of 'mktme_savekeys'.
> 
> Overall, I am not sure whether saving key is good idea, since it
> breaks coldboot attack IMHO. We need to tradeoff between supporting
> CPU hotplug and security. I am not sure whether supporting CPU hotplug
> is that important, since for some other features such as SGX, we don't
> support CPU hotplug anyway.

What is the application for saving the key anyway?

With my current knowledge, I'm not even sure what is the application
for user provided keys.

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows
  2018-12-07  6:39     ` Jarkko Sakkinen
@ 2018-12-07  6:45       ` Jarkko Sakkinen
  0 siblings, 0 replies; 87+ messages in thread
From: Jarkko Sakkinen @ 2018-12-07  6:45 UTC (permalink / raw)
  To: Huang, Kai
  Cc: tglx, Schofield, Alison, dhowells, kirill.shutemov, peterz,
	jmorris, keyrings, linux-mm, linux-security-module, Williams,
	Dan J, x86, hpa, mingo, luto, bp, Hansen, Dave, Nakajima, Jun

On Thu, Dec 06, 2018 at 10:39:18PM -0800, Jarkko Sakkinen wrote:
> On Thu, Dec 06, 2018 at 06:14:03PM -0800, Huang, Kai wrote:
> > On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> > > MKTME (Multi-Key Total Memory Encryption) key payloads may include
> > > data encryption keys, tweak keys, and additional entropy bits. These
> > > are used to program the MKTME encryption hardware. By default, the
> > > kernel destroys this payload data once the hardware is programmed.
> > > 
> > > However, in order to fully support CPU Hotplug, saving the key data
> > > becomes important. The MKTME Key Service cannot allow a new physical
> > > package to come online unless it can program the new packages Key Table
> > > to match the Key Tables of all existing physical packages.
> > > 
> > > With CPU generated keys (a.k.a. random keys or ephemeral keys) the
> > > saving of user key data is not an issue. The kernel and MKTME hardware
> > > can generate strong encryption keys without recalling any user supplied
> > > data.
> > > 
> > > With USER directed keys (a.k.a. user type) saving the key programming
> > > data (data and tweak key) becomes an issue. The data and tweak keys
> > > are required to program those keys on a new physical package.
> > > 
> > > In preparation for adding CPU hotplug support:
> > > 
> > >    Add an 'mktme_vault' where key data is stored.
> > > 
> > >    Add 'mktme_savekeys' kernel command line parameter that directs
> > >    what key data can be stored. If it is not set, kernel does not
> > >    store users data key or tweak key.
> > > 
> > >    Add 'mktme_bitmap_user_type' to track when USER type keys are in
> > >    use. If no USER type keys are currently in use, a physical package
> > >    may be brought online, despite the absence of 'mktme_savekeys'.
> > 
> > Overall, I am not sure whether saving key is good idea, since it
> > breaks coldboot attack IMHO. We need to tradeoff between supporting
> > CPU hotplug and security. I am not sure whether supporting CPU hotplug
> > is that important, since for some other features such as SGX, we don't
> > support CPU hotplug anyway.
> 
> What is the application for saving the key anyway?
> 
> With my current knowledge, I'm not even sure what is the application
> for user provided keys.

Ugh, right of course, you need to save the key in order to support
hotplug.

Cold boot is like the main security use case for this (it would probably
be worth mentioning this in the documentation).

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-07  2:05     ` Huang, Kai
@ 2018-12-07  6:48       ` Jarkko Sakkinen
  0 siblings, 0 replies; 87+ messages in thread
From: Jarkko Sakkinen @ 2018-12-07  6:48 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Williams, Dan J, Schofield, Alison, luto, willy, kirill.shutemov,
	jmorris, peterz, keyrings, tglx, linux-mm, dhowells,
	linux-security-module, x86, hpa, mingo, bp, Hansen, Dave,
	Nakajima, Jun

On Thu, Dec 06, 2018 at 06:05:50PM -0800, Huang, Kai wrote:
> On Wed, 2018-12-05 at 22:19 +0000, Sakkinen, Jarkko wrote:
> > On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> > > I'm not Thomas, but I think it's the wrong direction.  As it stands,
> > > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > > missing the protection key support), and it's also functionally just
> > > MADV_DONTNEED.  In other words, the sole user-visible effect appears
> > > to be that the existing pages are blown away.  The fact that it
> > > changes the key in use doesn't seem terribly useful, since it's
> > > anonymous memory, and the most secure choice is to use CPU-managed
> > > keying, which appears to be the default anyway on TME systems.  It
> > > also has totally unclear semantics WRT swap, and, off the top of my
> > > head, it looks like it may have serious cache-coherency issues and
> > > like swapping the pages might corrupt them, both because there are no
> > > flushes and because the direct-map alias looks like it will use the
> > > default key and therefore appear to contain the wrong data.
> > > 
> > > I would propose a very different direction: don't try to support MKTME
> > > at all for anonymous memory, and instead figure out the important use
> > > cases and support them directly.  The use cases that I can think of
> > > off the top of my head are:
> > > 
> > > 1. pmem.  This should probably use a very different API.
> > > 
> > > 2. Some kind of VM hardening, where a VM's memory can be protected a
> > > little tiny bit from the main kernel.  But I don't see why this is any
> > > better than XPO (eXclusive Page-frame Ownership), which brings to
> > > mind:
> > 
> > What is the threat model anyway for AMD and Intel technologies?
> > 
> > For me it looks like that you can read, write and even replay 
> > encrypted pages both in SME and TME. 
> 
> Right. Neither of them (including MKTME) prevents replay attack. But
> in my understanding SEV doesn't prevent replay attack either since it
> doesn't have integrity protection.

Yep, it doesn't :-) That's why I've been wondering after seeing
presentations concerning SME and SEV what they are good for.

Cold boot attacks are definitely at least something where these
techs can help...

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-06 14:59         ` Dave Hansen
@ 2018-12-07 10:12           ` Huang, Kai
  0 siblings, 0 replies; 87+ messages in thread
From: Huang, Kai @ 2018-12-07 10:12 UTC (permalink / raw)
  To: kirill, Sakkinen, Jarkko, Hansen, Dave
  Cc: kirill.shutemov, peterz, jmorris, keyrings, tglx, linux-mm,
	dhowells, linux-security-module, Williams, Dan J, x86, hpa,
	mingo, luto, bp, Schofield, Alison, Nakajima, Jun

On Thu, 2018-12-06 at 06:59 -0800, Dave Hansen wrote:
> On 12/6/18 3:22 AM, Kirill A. Shutemov wrote:
> > > When you say "disable encryption to a page" does the encryption get
> > > actually disabled or does the CPU just decrypt it transparently i.e.
> > > what happens physically?
> > 
> > Yes, it gets disabled. Physically. It overrides TME encryption.
> 
> I know MKTME itself has a runtime overhead and we expect it to have a
> performance impact in the low single digits.  Does TME have that
> overhead?  Presumably MKTME plus no-encryption is not expected to have
> the overhead.
> 
> We should probably mention that in the changelogs too.
> 

I believe that in terms of hardware crypto overhead MKTME and TME should be the same (except for the
MKTME no-encrypt case?). But MKTME might have additional overhead from the software implementation in
the kernel?

Thanks,
-Kai

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows
  2018-12-07  2:14   ` Huang, Kai
  2018-12-07  3:42     ` Alison Schofield
  2018-12-07  6:39     ` Jarkko Sakkinen
@ 2018-12-07 11:47     ` Kirill A. Shutemov
  2 siblings, 0 replies; 87+ messages in thread
From: Kirill A. Shutemov @ 2018-12-07 11:47 UTC (permalink / raw)
  To: Huang, Kai
  Cc: tglx, Schofield, Alison, dhowells, kirill.shutemov, peterz,
	jmorris, keyrings, linux-mm, linux-security-module, Williams,
	Dan J, x86, hpa, mingo, luto, Sakkinen, Jarkko, bp, Hansen, Dave,
	Nakajima, Jun

On Fri, Dec 07, 2018 at 02:14:03AM +0000, Huang, Kai wrote:
> Alternatively, we can choose to use per-socket keyID, but not to program
> keyID globally across all sockets, so you don't have to save key while
> still supporting CPU hotplug.

The per-socket KeyID approach would make things more complex. For instance,
a KeyID on its own will not be enough to refer to a key. You will need a node
too. It will also require a way to track whether there's a KeyID on another
node for the key.

It also makes memory management less flexible: runtime migration of the
memory between nodes will be limited and it can hurt memory availability
for non-encrypted tasks too.

In general, I don't find per-socket KeyID handling very attractive. It
creates more problems than it solves.
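
A minimal sketch of the bookkeeping difference, assuming plain dictionaries stand in for the key tables. All names and table contents here are hypothetical, not taken from the patches.

```python
from typing import Dict, Optional, Tuple

# Global scheme: every package has the same table, so a KeyID alone
# identifies a key.
global_keyids: Dict[int, str] = {1: "key-A", 2: "key-B"}

# Per-socket scheme: the same KeyID number may name different keys on
# different nodes, so a lookup needs (node, KeyID).
per_node_keyids: Dict[Tuple[int, int], str] = {
    (0, 1): "key-A",
    (1, 1): "key-B",
    (1, 2): "key-A",
    (2, 1): "key-B",  # node 2 has no KeyID programmed for key-A
}


def keyid_for(key: str, node: int) -> Optional[int]:
    """Extra tracking: is there a KeyID for this key on the given node?"""
    for (n, keyid), k in per_node_keyids.items():
        if n == node and k == key:
            return keyid
    return None


def can_migrate(key: str, dst_node: int) -> bool:
    """Pages encrypted with `key` can only move to a node that has it."""
    return keyid_for(key, dst_node) is not None


print(keyid_for("key-A", 0), keyid_for("key-A", 1), keyid_for("key-A", 2))
print(can_migrate("key-A", 1), can_migrate("key-A", 2))
```

In this model, migrating a key-A page to node 2 is refused until a KeyID is programmed there, which is the kind of restriction on runtime migration described above.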

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-06 21:23         ` Sakkinen, Jarkko
@ 2018-12-07 11:54           ` Kirill A. Shutemov
  0 siblings, 0 replies; 87+ messages in thread
From: Kirill A. Shutemov @ 2018-12-07 11:54 UTC (permalink / raw)
  To: Sakkinen, Jarkko
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, tglx,
	linux-mm, dhowells, linux-security-module, Williams, Dan J, x86,
	hpa, mingo, luto, bp, Hansen, Dave, Schofield, Alison, Nakajima,
	Jun

On Thu, Dec 06, 2018 at 09:23:20PM +0000, Sakkinen, Jarkko wrote:
> On Thu, 2018-12-06 at 14:22 +0300, Kirill A. Shutemov wrote:
> > > When you say "disable encryption to a page" does the encryption get
> > > actually disabled or does the CPU just decrypt it transparently i.e.
> > > what happens physically?
> > 
> > Yes, it gets disabled. Physically. It overrides TME encryption.
> 
> OK, thanks for confirmation. BTW, how much is the penalty to keep it
> always enabled? Is it something that would not make sense for some
> other reasons?

We don't have any numbers to share at this point.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-05 22:19   ` Sakkinen, Jarkko
  2018-12-07  2:05     ` Huang, Kai
@ 2018-12-07 11:57     ` Kirill A. Shutemov
  2018-12-07 21:59       ` Sakkinen, Jarkko
  2018-12-07 23:35       ` Eric Rannaud
  1 sibling, 2 replies; 87+ messages in thread
From: Kirill A. Shutemov @ 2018-12-07 11:57 UTC (permalink / raw)
  To: Sakkinen, Jarkko
  Cc: Williams, Dan J, Schofield, Alison, luto, willy, kirill.shutemov,
	jmorris, peterz, Huang, Kai, keyrings, tglx, linux-mm, dhowells,
	linux-security-module, x86, hpa, mingo, bp, Hansen, Dave,
	Nakajima, Jun

On Wed, Dec 05, 2018 at 10:19:20PM +0000, Sakkinen, Jarkko wrote:
> On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> > I'm not Thomas, but I think it's the wrong direction.  As it stands,
> > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > missing the protection key support), and it's also functionally just
> > MADV_DONTNEED.  In other words, the sole user-visible effect appears
> > to be that the existing pages are blown away.  The fact that it
> > changes the key in use doesn't seem terribly useful, since it's
> > anonymous memory, and the most secure choice is to use CPU-managed
> > keying, which appears to be the default anyway on TME systems.  It
> > also has totally unclear semantics WRT swap, and, off the top of my
> > head, it looks like it may have serious cache-coherency issues and
> > like swapping the pages might corrupt them, both because there are no
> > flushes and because the direct-map alias looks like it will use the
> > default key and therefore appear to contain the wrong data.
> > 
> > I would propose a very different direction: don't try to support MKTME
> > at all for anonymous memory, and instead figure out the important use
> > cases and support them directly.  The use cases that I can think of
> > off the top of my head are:
> > 
> > 1. pmem.  This should probably use a very different API.
> > 
> > 2. Some kind of VM hardening, where a VM's memory can be protected a
> > little tiny bit from the main kernel.  But I don't see why this is any
> > better than XPO (eXclusive Page-frame Ownership), which brings to
> > mind:
> 
> What is the threat model anyway for AMD and Intel technologies?
> 
> For me it looks like that you can read, write and even replay 
> encrypted pages both in SME and TME. 

What replay attack are you talking about? MKTME uses AES-XTS with physical
address tweak. So the data is tied to the place in physical address space
and replacing one encrypted page with another encrypted page from
different address will produce garbage on decryption.
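
To illustrate the address-tweak property, here is a minimal sketch. It does not use AES-XTS: as an assumption for illustration only, it builds a toy tweakable block cipher (a 4-round Feistel network with HMAC-SHA256 as the round function) in which the physical address plays the role of the tweak. The names toy_encrypt and toy_decrypt are hypothetical.

```python
import hashlib
import hmac

BLOCK = 32          # toy block size in bytes: two 16-byte Feistel halves
HALF = BLOCK // 2


def _f(key: bytes, addr: int, rnd: int, half: bytes) -> bytes:
    """Round function: a PRF over (physical address, round, half block)."""
    msg = addr.to_bytes(8, "little") + bytes([rnd]) + half
    return hmac.new(key, msg, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, addr: int, block: bytes) -> bytes:
    """Encrypt one block; the physical address acts as the tweak."""
    assert len(block) == BLOCK
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, addr, rnd, right))
    return left + right


def toy_decrypt(key: bytes, addr: int, block: bytes) -> bytes:
    """Invert toy_encrypt for the same key and address."""
    assert len(block) == BLOCK
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, addr, rnd, left)), left
    return left + right


key = b"\x11" * 32
page = b"alice page data".ljust(BLOCK, b".")

ct_at_1000 = toy_encrypt(key, 0x1000, page)
ct_at_2000 = toy_encrypt(key, 0x2000, page)

# Read back in place: the plaintext is recovered.
same_place = toy_decrypt(key, 0x1000, ct_at_1000)
# Ciphertext copied to another address: decryption uses a different tweak.
moved = toy_decrypt(key, 0x2000, ct_at_1000)

print(ct_at_1000 != ct_at_2000, same_place == page, moved == page)
```

The sketch only covers ciphertext moved to a different address. It says nothing about ciphertext restored at the address it was taken from.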

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-07 11:57     ` Kirill A. Shutemov
@ 2018-12-07 21:59       ` Sakkinen, Jarkko
  2018-12-07 23:45         ` Sakkinen, Jarkko
  2018-12-07 23:35       ` Eric Rannaud
  1 sibling, 1 reply; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-07 21:59 UTC (permalink / raw)
  To: kirill
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, willy,
	tglx, linux-mm, dhowells, linux-security-module, Williams, Dan J,
	x86, hpa, mingo, luto, bp, Hansen, Dave, Schofield, Alison,
	Nakajima, Jun

On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > What is the threat model anyway for AMD and Intel technologies?
> > 
> > For me it looks like that you can read, write and even replay 
> > encrypted pages both in SME and TME. 
> 
> What replay attack are you talking about? MKTME uses AES-XTS with physical
> address tweak. So the data is tied to the place in physical address space and
> replacing one encrypted page with another encrypted page from different
> address will produce garbage on decryption.

Just trying to understand how this works.

So you use physical address like a nonce/version for the page and
thus prevent replay? Was not aware of this.

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-07 11:57     ` Kirill A. Shutemov
  2018-12-07 21:59       ` Sakkinen, Jarkko
@ 2018-12-07 23:35       ` Eric Rannaud
  1 sibling, 0 replies; 87+ messages in thread
From: Eric Rannaud @ 2018-12-07 23:35 UTC (permalink / raw)
  To: kirill
  Cc: jarkko.sakkinen, dan.j.williams, alison.schofield, luto, willy,
	kirill.shutemov, jmorris, peterz, kai.huang, keyrings, tglx,
	linux-mm, dhowells, linux-security-module, x86, hpa, mingo, bp,
	dave.hansen, jun.nakajima

On Fri, Dec 7, 2018 at 3:57 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > What is the threat model anyway for AMD and Intel technologies?
> >
> > For me it looks like that you can read, write and even replay
> > encrypted pages both in SME and TME.
>
> What replay attack are you talking about? MKTME uses AES-XTS with physical
> address tweak. So the data is tied to the place in physical address space
> and replacing one encrypted page with another encrypted page from
> different address will produce garbage on decryption.

What if you have some control over the physical addresses you write
the stolen encrypted page to? For instance, VM_Eve might manage to use
physical address space previously used by VM_Alice by getting the
hypervisor to move memory around (memory pressure, force other VMs out
via some type of DOS attack, etc.).

Say:
    C is VM_Alice's clear text at hwaddr
    E = mktme_encrypt(VM_Alice_key, hwaddr, C)
    Eve somehow stole the encrypted bits E

Eve would need to write the page E without further encryption, so that
the DRAM contains the original stolen bits E. If E were encrypted again
with VM_Eve's key, mktme_encrypt(VM_Eve_key, hwaddr, E) would be present
in the DRAM, which is not helpful. But with MKTME under the current
proposal VM_Eve can disable encryption for a given mapping, right? (See
also Note 1)

Eve gets the HV to move VM_Alice back over the same physical address,
Eve "somehow" gets VM_Alice to read that page and use its content
(which would likely be a use of uninitialized memory bug, from
VM_Alice's perspective) and you have a replay attack?

For TME, this doesn't work as you cannot partially disable encryption,
so if Eve tries to write the stolen encrypted bits E, even in the
"right place", they get encrypted again to tme_encrypt(hwaddr, E).
Upon decryption, VM_Alice will get E, not C.

Note 1: Actually, even if with MKTME you cannot disable encryption but
*if* Eve knows its own key, Eve can always write a preimage P that the
CPU encrypts to E for VM_Alice to read back and decrypt:
    P = mktme_decrypt(VM_Eve_key, hwaddr, E)

This is not possible with TME as Eve doesn't know the key used by the
CPU and cannot compute P.
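
The three cases above can be sketched with a toy model. This is an illustration under stated assumptions, not MKTME itself: mktme_encrypt/mktme_decrypt are stood in for by a hypothetical toy tweakable cipher (a 4-round Feistel network over HMAC-SHA256, tweaked by hwaddr), and "DRAM" is just a variable holding the stored bits.

```python
import hashlib
import hmac

BLOCK, HALF = 32, 16  # toy block: two 16-byte Feistel halves


def _f(key: bytes, addr: int, rnd: int, half: bytes) -> bytes:
    msg = addr.to_bytes(8, "little") + bytes([rnd]) + half
    return hmac.new(key, msg, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, addr: int, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, addr, rnd, right))
    return left + right


def toy_decrypt(key: bytes, addr: int, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, addr, rnd, left)), left
    return left + right


hwaddr = 0x4000
alice_key, eve_key, tme_key = b"A" * 32, b"E" * 32, b"T" * 32
C = b"alice clear text".ljust(BLOCK, b".")

# Case 1: MKTME with a no-encrypt mapping. Eve stores the stolen bits E
# unmodified, and Alice's key later decrypts them back to C.
E = toy_encrypt(alice_key, hwaddr, C)
dram = E
alice_reads_noencrypt = toy_decrypt(alice_key, hwaddr, dram)

# Case 2: TME with a single CPU key. Everything Eve writes is encrypted
# again, so a read returns the stolen bits, not C.
E_tme = toy_encrypt(tme_key, hwaddr, C)
dram = toy_encrypt(tme_key, hwaddr, E_tme)
alice_reads_tme = toy_decrypt(tme_key, hwaddr, dram)

# Case 3 (Note 1): Eve knows her own key and writes a preimage P that
# the engine encrypts to exactly E.
P = toy_decrypt(eve_key, hwaddr, E)
dram = toy_encrypt(eve_key, hwaddr, P)
alice_reads_preimage = toy_decrypt(alice_key, hwaddr, dram)

print(alice_reads_noencrypt == C, alice_reads_tme == C, alice_reads_preimage == C)
```

In this model the replay succeeds in cases 1 and 3 and fails in case 2, which matches the argument above. It still assumes Eve can get Alice's memory placed back at hwaddr.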

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-07 21:59       ` Sakkinen, Jarkko
@ 2018-12-07 23:45         ` Sakkinen, Jarkko
  2018-12-07 23:48           ` Andy Lutomirski
                             ` (2 more replies)
  0 siblings, 3 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-07 23:45 UTC (permalink / raw)
  To: kirill
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, willy,
	tglx, linux-mm, dhowells, linux-security-module, Williams, Dan J,
	x86, hpa, mingo, luto, bp, Hansen, Dave, Schofield, Alison,
	Nakajima, Jun

On Fri, 2018-12-07 at 13:59 -0800, Jarkko Sakkinen wrote:
> On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > > What is the threat model anyway for AMD and Intel technologies?
> > > 
> > > For me it looks like that you can read, write and even replay 
> > > encrypted pages both in SME and TME. 
> > 
> > What replay attack are you talking about? MKTME uses AES-XTS with physical
> > > address tweak. So the data is tied to the place in physical address space
> > > and replacing one encrypted page with another encrypted page from
> > > different address will produce garbage on decryption.
> 
> Just trying to understand how this works.
> 
> So you use physical address like a nonce/version for the page and
> thus prevent replay? Was not aware of this.

The brutal fact is that a physical address is an astronomical stretch
from a random value or increasing counter. Thus, it is fair to say that
MKTME provides only naive measures against replay attacks...
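
A small sketch of why an address is not a version. As an assumption for illustration, it uses a toy address-tweaked scheme (XOR with an HMAC-SHA256 keystream derived from the tweak), not AES-XTS, and the name toy_crypt is hypothetical.

```python
import hashlib
import hmac


def toy_crypt(key: bytes, tweak: bytes, data: bytes) -> bytes:
    """XOR data with a keystream derived from (key, tweak).

    Encryption and decryption are the same operation in this toy scheme.
    """
    assert len(data) <= 32
    stream = hmac.new(key, tweak, hashlib.sha256).digest()
    return bytes(d ^ s for d, s in zip(data, stream))


key = b"\x01" * 32
addr = (0x1000).to_bytes(8, "little")

# Tweak = physical address only. The attacker snapshots the old
# ciphertext, the owner later overwrites the page, and the attacker
# restores the snapshot at the same address.
old_ct = toy_crypt(key, addr, b"balance=1000")
new_ct = toy_crypt(key, addr, b"balance=0000")
replayed = toy_crypt(key, addr, old_ct)  # decrypts cleanly to stale data


# Tweak = address plus a per-write version counter (the "increasing
# counter" case). The stale ciphertext no longer matches the tweak.
def versioned(addr_bytes: bytes, version: int) -> bytes:
    return addr_bytes + version.to_bytes(8, "little")


old_ct_v1 = toy_crypt(key, versioned(addr, 1), b"balance=1000")
stale_read = toy_crypt(key, versioned(addr, 2), old_ct_v1)

print(replayed == b"balance=1000", stale_read == b"balance=1000")
```

With the address alone as the tweak, the stale page decrypts to valid old data. A counter would have to be stored and protected somewhere, which this sketch does not model.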

/Jarkko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-07 23:45         ` Sakkinen, Jarkko
@ 2018-12-07 23:48           ` Andy Lutomirski
  2018-12-08  1:33           ` Huang, Kai
  2018-12-12 15:31           ` Sakkinen, Jarkko
  2 siblings, 0 replies; 87+ messages in thread
From: Andy Lutomirski @ 2018-12-07 23:48 UTC (permalink / raw)
  To: Sakkinen, Jarkko
  Cc: Kirill A. Shutemov, Kirill A. Shutemov, Peter Zijlstra,
	James Morris, kai.huang, keyrings, Matthew Wilcox,
	Thomas Gleixner, Linux-MM, David Howells, LSM List, Dan Williams,
	X86 ML, H. Peter Anvin, Ingo Molnar, Andrew Lutomirski,
	Borislav Petkov, Dave Hansen, Alison Schofield, Jun Nakajima

On Fri, Dec 7, 2018 at 3:45 PM Sakkinen, Jarkko
<jarkko.sakkinen@intel.com> wrote:
>
> On Fri, 2018-12-07 at 13:59 -0800, Jarkko Sakkinen wrote:
> > On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > > > What is the threat model anyway for AMD and Intel technologies?
> > > >
> > > > For me it looks like that you can read, write and even replay
> > > > encrypted pages both in SME and TME.
> > >
> > > What replay attack are you talking about? MKTME uses AES-XTS with physical
> > > > address tweak. So the data is tied to the place in physical address space
> > > > and replacing one encrypted page with another encrypted page from
> > > > different address will produce garbage on decryption.
> >
> > Just trying to understand how this works.
> >
> > So you use physical address like a nonce/version for the page and
> > thus prevent replay? Was not aware of this.
>
> The brutal fact is that a physical address is an astronomical stretch
> from a random value or increasing counter. Thus, it is fair to say that
> MKTME provides only naive measures against replay attacks...
>

And this is potentially a big deal, since there are much simpler
replay attacks that can compromise the system.  For example, if I can
replay the contents of a page table, I can write to freed memory.

--Andy

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-07  1:55       ` Huang, Kai
  2018-12-07  4:23         ` Dave Hansen
@ 2018-12-07 23:53         ` Andy Lutomirski
  2018-12-08  1:11           ` Dave Hansen
  2018-12-08  2:07           ` Huang, Kai
  1 sibling, 2 replies; 87+ messages in thread
From: Andy Lutomirski @ 2018-12-07 23:53 UTC (permalink / raw)
  To: kai.huang
  Cc: Andrew Lutomirski, Dave Hansen, Kirill A. Shutemov, James Morris,
	Peter Zijlstra, keyrings, Matthew Wilcox, Thomas Gleixner,
	Linux-MM, David Howells, LSM List, Dan Williams, X86 ML,
	H. Peter Anvin, Ingo Molnar, Sakkinen, Jarkko, Borislav Petkov,
	Alison Schofield, Jun Nakajima

> On Dec 6, 2018, at 5:55 PM, Huang, Kai <kai.huang@intel.com> wrote:
>
>
>>
>> TME itself provides a ton of protection -- you can't just barge into
>> the datacenter, refrigerate the DIMMs, walk away with them, and read
>> off everyone's data.
>>
>> Am I missing something?
>
> I think we can make such an assumption in most cases, but I think it's better that we don't make
> any assumption at all. For example, the admin of a data center (or anyone) who has physical access
> to the servers may do something malicious. I am not an expert, but there should be other physical
> attack methods besides a cold boot attack, if a malicious employee can get physical access to a
> server without being detected.
>
>>
>>>
>>> But, I think what you're implying is that the security properties of
>>> user-supplied keys can only be *worse* than using CPU-generated keys
>>> (assuming the CPU does a good job generating it).  So, why bother
>>> allowing user-specified keys in the first place?
>>
>> That too :)
>
> I think one use of a user-specified key is for NVDIMM: since the CPU key is gone after a machine
> reboot, if the NVDIMM is encrypted with the CPU key we cannot retrieve the data after a
> shutdown/reboot, etc.
>
> There are some other use cases that already require the tenant to send a key to the CSP. For
> example, the VM image can be provided by the tenant and encrypted with the tenant's own key, and
> the tenant needs to send the key to the CSP when asking the CSP to run that encrypted image.


I can imagine a few reasons why one would want to encrypt one’s image.
For example, the CSP could issue a public key and state, or even
attest, that the key is wrapped and locked to particular PCRs of their
TPM or otherwise protected by an enclave that verifies that the key is
only used to decrypt the image for the benefit of a hypervisor.

I don’t see what MKTME has to do with this.  The only remotely
plausible way I can see to use MKTME for this is to have the
hypervisor load a TPM (or other enclave) protected key into an MKTME
user key slot and to load customer-provided ciphertext into the
corresponding physical memory (using an MKTME no-encrypt slot).  But
this has three major problems.  First, it's effectively just a fancy
way to avoid one AES pass over the data.  Second, a sensible scheme for
this type of VM image protection would use *authenticated* encryption
or at least verify a signature, which MKTME can't do.  The third
problem is the real show-stopper, though: this scheme requires that
the ciphertext go into predetermined physical addresses, which would
be a giant mess.
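
The second point can be made concrete with a minimal sketch of what
"authenticated" adds. This is not anything MKTME or the patchset
provides; it is a hypothetical encrypt-then-MAC wrapper around an
already-encrypted VM image, using HMAC-SHA256 from the Python standard
library. The function names and the 32-byte tag layout are assumptions
made for illustration only.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to an already-encrypted image (hypothetical helper)."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(mac_key: bytes, blob: bytes) -> bytes:
    """Return the ciphertext only if the tag verifies; raise ValueError otherwise."""
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # compare_digest compares in constant time to avoid leaking the tag
    if not hmac.compare_digest(tag, expected):
        raise ValueError("image failed authentication")
    return ciphertext


mac_key = b"m" * 32                      # placeholder key, illustration only
blob = seal(mac_key, b"encrypted-image-bytes")

# An untouched image is accepted.
accepted = open_sealed(mac_key, blob)

# Flipping a single bit is detected before anything is decrypted.
tampered = bytes([blob[0] ^ 1]) + blob[1:]
try:
    open_sealed(mac_key, tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

With encryption alone, as in the MKTME scheme discussed here, the
tampered image would simply decrypt to different bytes and nothing
would flag it.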


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-07 23:53         ` Andy Lutomirski
@ 2018-12-08  1:11           ` Dave Hansen
  2018-12-08  2:07           ` Huang, Kai
  1 sibling, 0 replies; 87+ messages in thread
From: Dave Hansen @ 2018-12-08  1:11 UTC (permalink / raw)
  To: Andy Lutomirski, kai.huang
  Cc: Kirill A. Shutemov, James Morris, Peter Zijlstra, keyrings,
	Matthew Wilcox, Thomas Gleixner, Linux-MM, David Howells,
	LSM List, Dan Williams, X86 ML, H. Peter Anvin, Ingo Molnar,
	Sakkinen, Jarkko, Borislav Petkov, Alison Schofield,
	Jun Nakajima

On 12/7/18 3:53 PM, Andy Lutomirski wrote:
> The third problem is the real show-stopper, though: this scheme
> requires that the ciphertext go into predetermined physical
> addresses, which would be a giant mess.

There's a more fundamental problem than that.  The tweak fed into the
actual AES-XTS operation is determined by the firmware, programmed into
the memory controller, and is not visible to software.  So, not only
would you need to put stuff at a fixed physical address, the tweaks can
change from boot-to-boot, so whatever you did would only be good for one
boot.


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-07 23:45         ` Sakkinen, Jarkko
  2018-12-07 23:48           ` Andy Lutomirski
@ 2018-12-08  1:33           ` Huang, Kai
  2018-12-08  3:53             ` Sakkinen, Jarkko
  2018-12-12 15:31           ` Sakkinen, Jarkko
  2 siblings, 1 reply; 87+ messages in thread
From: Huang, Kai @ 2018-12-08  1:33 UTC (permalink / raw)
  To: kirill, Sakkinen, Jarkko
  Cc: kirill.shutemov, peterz, jmorris, keyrings, willy, tglx,
	linux-mm, dhowells, linux-security-module, Williams, Dan J, x86,
	hpa, mingo, luto, bp, Hansen, Dave, Schofield, Alison, Nakajima,
	Jun

On Fri, 2018-12-07 at 23:45 +0000, Sakkinen, Jarkko wrote:
> On Fri, 2018-12-07 at 13:59 -0800, Jarkko Sakkinen wrote:
> > On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > > > What is the threat model anyway for AMD and Intel technologies?
> > > > 
> > > > For me it looks like that you can read, write and even replay 
> > > > encrypted pages both in SME and TME. 
> > > 
> > > > What replay attack are you talking about? MKTME uses AES-XTS with physical
> > > > address tweak. So the data is tied to the place in physical address space
> > > > and replacing one encrypted page with another encrypted page from different
> > > > address will produce garbage on decryption.
> > 
> > Just trying to understand how this works.
> > 
> > So you use physical address like a nonce/version for the page and
> > thus prevent replay? Was not aware of this.
> 
> The brutal fact is that a physical address is an astronomical stretch
> from a random value or increasing counter. Thus, it is fair to say that
> MKTME provides only naive measures against replay attacks...
> 
> /Jarkko

Currently there is no nonce protecting each cache line, so TME/MKTME cannot prevent the replay
attack you mentioned. MKTME currently provides only AES-XTS-128 encryption and nothing else. But,
as I said, if I understand correctly, even SEV has no integrity protection, so it cannot prevent
replay attacks either.
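
The difference between relocating ciphertext and replaying it can be
sketched as below. This is a toy model, not AES-XTS: the cipher is a
stand-in that XORs data with a SHA-256 keystream derived from the key
and the address, which is far weaker than XTS and is used only because
it needs nothing outside the Python standard library. The addresses,
key, and `memory` dictionary are all hypothetical.

```python
import hashlib


def keystream(key: bytes, addr: int, length: int) -> bytes:
    """Derive a keystream from the key and the physical address (the 'tweak')."""
    out = b""
    counter = 0
    while len(out) < length:
        block_input = key + addr.to_bytes(8, "little") + counter.to_bytes(4, "little")
        out += hashlib.sha256(block_input).digest()
        counter += 1
    return out[:length]


def crypt(key: bytes, addr: int, data: bytes) -> bytes:
    """XOR data with the address-dependent keystream (encrypts and decrypts)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, addr, len(data))))


key = b"k" * 16
memory = {}                      # address -> ciphertext, stands in for DRAM
ADDR_A, ADDR_B = 0x1000, 0x2000

OLD = b"balance=100     "
NEW = b"balance=0       "

memory[ADDR_A] = crypt(key, ADDR_A, OLD)
snapshot = memory[ADDR_A]        # attacker records the ciphertext

memory[ADDR_A] = crypt(key, ADDR_A, NEW)   # legitimate update

# Relocation: the old ciphertext is copied to a different address.
# The tweak no longer matches, so decryption yields garbage.
memory[ADDR_B] = snapshot
relocated = crypt(key, ADDR_B, memory[ADDR_B])

# Replay: the old ciphertext is written back to the SAME address.
# The tweak matches, so the stale value decrypts cleanly.
memory[ADDR_A] = snapshot
replayed = crypt(key, ADDR_A, memory[ADDR_A])
```

In this model `relocated` is unrelated to either plaintext, while
`replayed` is exactly the old value: an address-only tweak ties data to
a location, not to a point in time.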

Thanks,
-Kai


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-07 23:53         ` Andy Lutomirski
  2018-12-08  1:11           ` Dave Hansen
@ 2018-12-08  2:07           ` Huang, Kai
  1 sibling, 0 replies; 87+ messages in thread
From: Huang, Kai @ 2018-12-08  2:07 UTC (permalink / raw)
  To: luto
  Cc: kirill.shutemov, jmorris, peterz, keyrings, willy, linux-mm,
	tglx, dhowells, linux-security-module, Williams, Dan J, x86, hpa,
	mingo, Sakkinen, Jarkko, bp, Hansen, Dave, Schofield, Alison,
	Nakajima, Jun


> > There are some other use cases that already require the tenant to send a key to the CSP. For
> > example, the VM image can be provided by the tenant and encrypted with the tenant's own key, and
> > the tenant needs to send the key to the CSP when asking the CSP to run that encrypted image.
> 
> 
> I can imagine a few reasons why one would want to encrypt one’s image.
> For example, the CSP could issue a public key and state, or even
> attest, that the key is wrapped and locked to particular PCRs of their
> TPM or otherwise protected by an enclave that verifies that the key is
> only used to decrypt the image for the benefit of a hypervisor.

Right. I think that before the tenant releases the key to the CSP, it should always use an
attestation authority to verify the trustworthiness of the computer node. I can understand that
the key can be wrapped by the TPM before being sent to the CSP, but I need to catch up on the
enclave part.

The thing is, the fact that a computer node can be trusted doesn't mean it cannot be attacked, nor
does it mean it can prevent, e.g., a malicious admin from getting the tenant key, even in a
legitimate way. There are many SW components involved here. Anyway, this is not related to MKTME
itself, as you mention below. The point is that, since we have already seen that MKTME by itself
provides very weak security protection, we need to see whether MKTME has value from the point of
view of the whole use case (including all the things you mentioned above) -- we define the whole
use case, and we clearly state who/what should be inside the trust boundary, what we can prevent,
etc.

> 
> I don’t see what MKTME has to do with this. The only remotely
> plausible way I can see to use MKTME for this is to have the
> hypervisor load a TPM (or other enclave) protected key into an MKTME
> user key slot and to load customer-provided ciphertext into the
> corresponding physical memory (using an MKTME no-encrypt slot).  But
> this has three major problems.  First, it's effectively just a fancy
> way to avoid one AES pass over the data.  Second, a sensible scheme for
> this type of VM image protection would use *authenticated* encryption
> or at least verify a signature, which MKTME can't do.  The third
> problem is the real show-stopper, though: this scheme requires that
> the ciphertext go into predetermined physical addresses, which would
> be a giant mess.

My intention was to say that if we are already sending the key to the CSP, then we may prefer to
use that key for MKTME VM runtime protection as well. But, as you said, we may not get any real
security gain here compared to TME, so I agree we need to find one specific case to prove that.

Thanks,
-Kai


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-08  1:33           ` Huang, Kai
@ 2018-12-08  3:53             ` Sakkinen, Jarkko
  0 siblings, 0 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-08  3:53 UTC (permalink / raw)
  To: kirill, Huang, Kai
  Cc: kirill.shutemov, peterz, jmorris, keyrings, willy, tglx,
	linux-mm, dhowells, linux-security-module, Williams, Dan J, x86,
	hpa, mingo, luto, bp, Hansen, Dave, Schofield, Alison, Nakajima,
	Jun

On Sat, 2018-12-08 at 09:33 +0800, Huang, Kai wrote:
> Currently there's no nonce to protect cache line so TME/MKTME is not able to
> prevent replay attack you mentioned. Currently MKTME only involves AES-XTS-128
> encryption but nothing else. But like I said if I understand correctly even SEV
> doesn't have integrity protection so not able to prevent replay attack as well.

You're absolutely correct.

There's also a good paper on SEV subversion:

https://arxiv.org/pdf/1805.09604.pdf

I don't think this makes MKTME or SEV useless, but yeah, it is a
constraint that needs to be taken into consideration when finding the
best way to use these technologies in Linux.

/Jarkko


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-07 23:45         ` Sakkinen, Jarkko
  2018-12-07 23:48           ` Andy Lutomirski
  2018-12-08  1:33           ` Huang, Kai
@ 2018-12-12 15:31           ` Sakkinen, Jarkko
  2018-12-12 16:29             ` Andy Lutomirski
  2 siblings, 1 reply; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-12 15:31 UTC (permalink / raw)
  To: kirill
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, willy,
	tglx, linux-mm, dhowells, linux-security-module, Williams, Dan J,
	x86, hpa, mingo, luto, bp, Hansen, Dave, Schofield, Alison,
	Nakajima, Jun

On Fri, 2018-12-07 at 15:45 -0800, Jarkko Sakkinen wrote:
> The brutal fact is that a physical address is an astronomical stretch
> from a random value or increasing counter. Thus, it is fair to say that
> MKTME provides only naive measures against replay attacks...

I'll try to summarize how I understand the high-level security
model of MKTME (it would be a good idea to document it).

Assumptions:

1. The hypervisor has not been infiltrated.
2. The hypervisor does not leak secrets.

When (1) and (2) hold [1], we harden VMs in two different ways:

A. VMs cannot leak data to each other. Or can they, with L1TF when HT
   is enabled?
B. Protects against cold boot attacks.

Isn't this roughly what this is about, in a nutshell?

[1] XPFO could potentially be an opt-in feature that reduces the
    damage when either of these assumptions has been broken.

/Jarkko


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-12 15:31           ` Sakkinen, Jarkko
@ 2018-12-12 16:29             ` Andy Lutomirski
  2018-12-12 16:43               ` Sakkinen, Jarkko
  0 siblings, 1 reply; 87+ messages in thread
From: Andy Lutomirski @ 2018-12-12 16:29 UTC (permalink / raw)
  To: Sakkinen, Jarkko
  Cc: Kirill A. Shutemov, Kirill A. Shutemov, Peter Zijlstra,
	James Morris, kai.huang, keyrings, Matthew Wilcox,
	Thomas Gleixner, Linux-MM, David Howells, LSM List, Dan Williams,
	X86 ML, H. Peter Anvin, Ingo Molnar, Andrew Lutomirski,
	Borislav Petkov, Dave Hansen, Alison Schofield, Jun Nakajima

On Wed, Dec 12, 2018 at 7:31 AM Sakkinen, Jarkko
<jarkko.sakkinen@intel.com> wrote:
>
> On Fri, 2018-12-07 at 15:45 -0800, Jarkko Sakkinen wrote:
> > The brutal fact is that a physical address is an astronomical stretch
> > from a random value or increasing counter. Thus, it is fair to say that
> > MKTME provides only naive measures against replay attacks...
>
> I'll try to summarize how I understand the high-level security
> model of MKTME (it would be a good idea to document it).
>
> Assumptions:
>
> 1. The hypervisor has not been infiltrated.
> 2. The hypervisor does not leak secrets.
>
> When (1) and (2) hold [1], we harden VMs in two different ways:
>
> A. VMs cannot leak data to each other. Or can they, with L1TF when HT
>    is enabled?

I strongly suspect that, on L1TF-vulnerable CPUs, MKTME provides no
protection whatsoever.  It sounds like MKTME is implemented in the
memory controller -- as far as the rest of the CPU and the cache
hierarchy are concerned, the MKTME key selection bits are just part of
the physical address.  So an attack like L1TF that leaks a cacheline
that's selected by physical address will leak the cleartext if the key
selection bits are set correctly.

(I suppose that, if the attacker needs to brute-force the physical
address, then MKTME makes it a bit harder because the effective
physical address space is larger.)
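
A rough sketch of what "part of the physical address" means here,
assuming the key selection bits occupy the top of the physical address.
The widths below (52 physical address bits, 6 of them used for key
selection) are made-up example values, and the helper names are
hypothetical; real values depend on the platform.

```python
PHYS_ADDR_BITS = 52   # assumed total physical address width
KEYID_BITS = 6        # assumed number of key selection bits
KEYID_SHIFT = PHYS_ADDR_BITS - KEYID_BITS
KEYID_MASK = ((1 << KEYID_BITS) - 1) << KEYID_SHIFT


def with_keyid(phys_addr: int, keyid: int) -> int:
    """Return the address as the caches see it: key selection bits on top."""
    if not 0 <= keyid < (1 << KEYID_BITS):
        raise ValueError("keyid out of range")
    # Clear any existing key selection bits, then set the requested ones.
    return (phys_addr & ~KEYID_MASK) | (keyid << KEYID_SHIFT)


def split(addr: int) -> tuple[int, int]:
    """Split a full address into (keyid, address without key selection bits)."""
    return (addr & KEYID_MASK) >> KEYID_SHIFT, addr & ~KEYID_MASK


page = 0x1234000

# The same DRAM location under two keys looks like two unrelated
# physical addresses to everything above the memory controller.
alias_a = with_keyid(page, 3)
alias_b = with_keyid(page, 5)

# An attacker who does not know the key selection bits has to try
# every combination, which multiplies the search space.
candidates = [with_keyid(page, k) for k in range(1 << KEYID_BITS)]
```

In this model an attack that picks its victim cacheline by physical
address only needs to land on the right entry in `candidates`; the
encryption itself never gets in the way.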

> B. Protects against cold boot attacks.

TME does this, AFAIK.  MKTME does, too, unless the "user" mode is
used, in which case the protection is weaker.

>
> Isn't this roughly what this is about, in a nutshell?
>
> [1] XPFO could potentially be an opt-in feature that reduces the
>     damage when either of these assumptions has been broken.
>
> /Jarkko


* Re: [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME)
  2018-12-12 16:29             ` Andy Lutomirski
@ 2018-12-12 16:43               ` Sakkinen, Jarkko
  0 siblings, 0 replies; 87+ messages in thread
From: Sakkinen, Jarkko @ 2018-12-12 16:43 UTC (permalink / raw)
  To: luto
  Cc: kirill.shutemov, peterz, jmorris, Huang, Kai, keyrings, willy,
	tglx, linux-mm, dhowells, linux-security-module, Williams, Dan J,
	x86, hpa, mingo, kirill, bp, Hansen, Dave, Schofield, Alison,
	Nakajima, Jun

On Wed, 2018-12-12 at 08:29 -0800, Andy Lutomirski wrote:
> On Wed, Dec 12, 2018 at 7:31 AM Sakkinen, Jarkko
> <jarkko.sakkinen@intel.com> wrote:
> > On Fri, 2018-12-07 at 15:45 -0800, Jarkko Sakkinen wrote:
> > > The brutal fact is that a physical address is an astronomical stretch
> > > from a random value or increasing counter. Thus, it is fair to say that
> > > MKTME provides only naive measures against replay attacks...
> > 
> > I'll try to summarize how I understand the high-level security
> > model of MKTME (it would be a good idea to document it).
> > 
> > Assumptions:
> > 
> > 1. The hypervisor has not been infiltrated.
> > 2. The hypervisor does not leak secrets.
> > 
> > When (1) and (2) hold [1], we harden VMs in two different ways:
> > 
> > A. VMs cannot leak data to each other. Or can they, with L1TF when HT
> >    is enabled?
> 
> I strongly suspect that, on L1TF-vulnerable CPUs, MKTME provides no
> protection whatsoever.  It sounds like MKTME is implemented in the
> memory controller -- as far as the rest of the CPU and the cache
> hierarchy are concerned, the MKTME key selection bits are just part of
> the physical address.  So an attack like L1TF that leaks a cacheline
> that's selected by physical address will leak the cleartext if the key
> selection bits are set correctly.
> 
> (I suppose that, if the attacker needs to brute-force the physical
> address, then MKTME makes it a bit harder because the effective
> physical address space is larger.)
> 
> > B. Protects against cold boot attacks.
> 
> TME does this, AFAIK.  MKTME does, too, unless the "user" mode is
> used, in which case the protection is weaker.
> 
> > Isn't this roughly what this is about, in a nutshell?
> > 
> > [1] XPFO could potentially be an opt-in feature that reduces the
> >     damage when either of these assumptions has been broken.

This all should be summarized in the documentation (high-level model
and corner cases).

/Jarkko


Thread overview: 87+ messages
2018-12-04  7:39 [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Alison Schofield
2018-12-04  7:39 ` [RFC v2 01/13] x86/mktme: Document the MKTME APIs Alison Schofield
2018-12-05 18:11   ` Andy Lutomirski
2018-12-05 19:22     ` Alison Schofield
2018-12-05 23:35       ` Andy Lutomirski
2018-12-06  8:04   ` Sakkinen, Jarkko
2018-12-04  7:39 ` [RFC v2 02/13] mm: Generalize the mprotect implementation to support extensions Alison Schofield
2018-12-06  8:08   ` Sakkinen, Jarkko
2018-12-04  7:39 ` [RFC v2 03/13] syscall/x86: Wire up a new system call for memory encryption keys Alison Schofield
2018-12-04  7:39 ` [RFC v2 04/13] x86/mm: Add helper functions for MKTME " Alison Schofield
2018-12-04  9:14   ` Peter Zijlstra
2018-12-05  5:49     ` Alison Schofield
2018-12-04 15:35   ` Andy Lutomirski
2018-12-05  5:52     ` Alison Schofield
2018-12-06  8:31   ` Sakkinen, Jarkko
2018-12-04  7:39 ` [RFC v2 05/13] x86/mm: Set KeyIDs in encrypted VMAs Alison Schofield
2018-12-06  8:37   ` Sakkinen, Jarkko
2018-12-04  7:39 ` [RFC v2 06/13] mm: Add the encrypt_mprotect() system call Alison Schofield
2018-12-06  8:38   ` Sakkinen, Jarkko
2018-12-04  7:39 ` [RFC v2 07/13] x86/mm: Add helpers for reference counting encrypted VMAs Alison Schofield
2018-12-04  8:58   ` Peter Zijlstra
2018-12-05  5:28     ` Alison Schofield
2018-12-04  7:39 ` [RFC v2 08/13] mm: Use reference counting for " Alison Schofield
2018-12-04  7:39 ` [RFC v2 09/13] mm: Restrict memory encryption to anonymous VMA's Alison Schofield
2018-12-04  9:10   ` Peter Zijlstra
2018-12-05  5:30     ` Alison Schofield
2018-12-05  9:07       ` Peter Zijlstra
2018-12-04  7:39 ` [RFC v2 10/13] keys/mktme: Add the MKTME Key Service type for memory encryption Alison Schofield
2018-12-06  8:51   ` Sakkinen, Jarkko
2018-12-06  8:54     ` Sakkinen, Jarkko
2018-12-06 15:11     ` Dave Hansen
2018-12-06 22:56       ` Sakkinen, Jarkko
2018-12-04  7:39 ` [RFC v2 11/13] keys/mktme: Program memory encryption keys on a system wide basis Alison Schofield
2018-12-04  9:21   ` Peter Zijlstra
2018-12-04  9:50     ` Kirill A. Shutemov
2018-12-05  5:44       ` Alison Schofield
2018-12-05  5:43     ` Alison Schofield
2018-12-05  9:10       ` Peter Zijlstra
2018-12-05 17:26         ` Alison Schofield
2018-12-04  7:39 ` [RFC v2 12/13] keys/mktme: Save MKTME data if kernel cmdline parameter allows Alison Schofield
2018-12-04  9:22   ` Peter Zijlstra
2018-12-07  2:14   ` Huang, Kai
2018-12-07  3:42     ` Alison Schofield
2018-12-07  6:39     ` Jarkko Sakkinen
2018-12-07  6:45       ` Jarkko Sakkinen
2018-12-07 11:47     ` Kirill A. Shutemov
2018-12-04  7:40 ` [RFC v2 13/13] keys/mktme: Support CPU Hotplug for MKTME keys Alison Schofield
2018-12-04  9:28   ` Peter Zijlstra
2018-12-05  5:32     ` Alison Schofield
2018-12-04  9:31   ` Peter Zijlstra
2018-12-05  5:36     ` Alison Schofield
2018-12-04  9:25 ` [RFC v2 00/13] Multi-Key Total Memory Encryption API (MKTME) Peter Zijlstra
2018-12-04  9:46   ` Kirill A. Shutemov
2018-12-05 20:32     ` Sakkinen, Jarkko
2018-12-06 11:22       ` Kirill A. Shutemov
2018-12-06 14:59         ` Dave Hansen
2018-12-07 10:12           ` Huang, Kai
2018-12-06 21:23         ` Sakkinen, Jarkko
2018-12-07 11:54           ` Kirill A. Shutemov
2018-12-04 19:19 ` Andy Lutomirski
2018-12-04 20:00   ` Andy Lutomirski
2018-12-04 20:32     ` Dave Hansen
2018-12-05 22:19   ` Sakkinen, Jarkko
2018-12-07  2:05     ` Huang, Kai
2018-12-07  6:48       ` Jarkko Sakkinen
2018-12-07 11:57     ` Kirill A. Shutemov
2018-12-07 21:59       ` Sakkinen, Jarkko
2018-12-07 23:45         ` Sakkinen, Jarkko
2018-12-07 23:48           ` Andy Lutomirski
2018-12-08  1:33           ` Huang, Kai
2018-12-08  3:53             ` Sakkinen, Jarkko
2018-12-12 15:31           ` Sakkinen, Jarkko
2018-12-12 16:29             ` Andy Lutomirski
2018-12-12 16:43               ` Sakkinen, Jarkko
2018-12-07 23:35       ` Eric Rannaud
2018-12-05 23:49   ` Dave Hansen
2018-12-06  1:09     ` Andy Lutomirski
2018-12-06  1:25       ` Dan Williams
2018-12-06 15:39       ` Dave Hansen
2018-12-06 19:10         ` Andy Lutomirski
2018-12-06 19:31           ` Dave Hansen
2018-12-07  1:55       ` Huang, Kai
2018-12-07  4:23         ` Dave Hansen
2018-12-07 23:53         ` Andy Lutomirski
2018-12-08  1:11           ` Dave Hansen
2018-12-08  2:07           ` Huang, Kai
2018-12-05 20:30 ` Sakkinen, Jarkko
