linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization
@ 2019-07-27  5:51 Sean Christopherson
  2019-07-27  5:51 ` [RFC PATCH 01/21] x86/sgx: Add defines for SGX device minor numbers Sean Christopherson
                   ` (20 more replies)
  0 siblings, 21 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:51 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

This is an "early" RFC series for adding SGX virtualization to KVM.  SGX
virtualization (more specifically, EPC virtualization) is dependent on the
not-yet-merged SGX enabling series and so cannot be considered for
inclusion any time soon.

The primary goal of this RFC is to get feedback on the overall approach,
e.g. code location, uAPI changes, functionality, etc...  My hope is to
sort out any major issues sooner rather than later, so that if/when the
base SGX enabling is merged, virtualization support can quickly follow
suit.

That being said, nitpicking and bikeshedding is more than welcome :-)

This code applies on top of a slightly modified version of v21 of the SGX
enabling series[1].  The modifications on top of the SGX series are a few
minor bug fixes that are not related to SGX virtualization, but affect
code that is moved/modified by this series.  The full source for the
modified version of v21 can be found at:

 https://github.com/sean-jc/linux.git

under the tag:

  sgx-v21-ish

A corresponding Qemu RFC will (hopefully) follow next week, the Qemu
patches need a bit more cleanup...

[1] https://lkml.kernel.org/r/20190713170804.2340-1-jarkko.sakkinen@linux.intel.com

Sean Christopherson (21):
  x86/sgx: Add defines for SGX device minor numbers
  x86/sgx: Move bus registration and device init to common code
  x86/sgx: Move provisioning device to common code
  x86/sgx: Add /dev/sgx/virt_epc device to allocate "raw" EPC for VMs
  x86/sgx: Expose SGX architectural definitions to the kernel
  KVM: x86: Add SGX sub-features leaf to reverse CPUID table
  KVM: x86: Add WARN_ON_ONCE(index!=0) in __do_cpuid_ent
  KVM: x86: Add kvm_x86_ops hook to short circuit emulation
  KVM: VMX: Add basic handling of VM-Exit from SGX enclave
  KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for VMX/SGX
  KVM: x86: Export kvm_propagate_fault (as kvm_propagate_page_fault)
  KVM: x86: Define new #PF SGX error code bit
  x86/sgx: Move the intermediate EINIT helper into the driver
  x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions
  KVM: VMX: Edd emulation of SGX Launch Control LE hash MSRs
  KVM: VMX: Add handler for ENCLS[EINIT] to support SGX Launch Control
  KVM: x86: Invoke kvm_x86_ops->cpuid_update() after kvm_update_cpuid()
  KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC
  x86/sgx: Export sgx_set_attribute() for use by KVM
  KVM: x86: Add capability to grant VM access to privileged SGX
    attribute

 Documentation/virtual/kvm/api.txt             |  20 ++
 arch/x86/Kconfig                              |  13 +
 arch/x86/include/asm/kvm_host.h               |   8 +-
 arch/x86/include/asm/sgx.h                    |  17 +
 .../cpu/sgx/arch.h => include/asm/sgx_arch.h} |   1 +
 arch/x86/include/asm/vmx.h                    |   1 +
 arch/x86/include/uapi/asm/vmx.h               |   1 +
 arch/x86/kernel/cpu/sgx/Makefile              |   1 +
 arch/x86/kernel/cpu/sgx/driver/driver.h       |   3 +-
 arch/x86/kernel/cpu/sgx/driver/ioctl.c        |  40 ++-
 arch/x86/kernel/cpu/sgx/driver/main.c         |  73 +----
 arch/x86/kernel/cpu/sgx/encl.c                |   2 +-
 arch/x86/kernel/cpu/sgx/encls.h               |   2 +-
 arch/x86/kernel/cpu/sgx/main.c                | 141 ++++++--
 arch/x86/kernel/cpu/sgx/sgx.h                 |  16 +-
 arch/x86/kernel/cpu/sgx/virt.c                | 308 ++++++++++++++++++
 arch/x86/kernel/cpu/sgx/virt.h                |  14 +
 arch/x86/kvm/Makefile                         |   2 +
 arch/x86/kvm/cpuid.c                          | 135 ++++++--
 arch/x86/kvm/cpuid.h                          |  20 ++
 arch/x86/kvm/emulate.c                        |   1 +
 arch/x86/kvm/mmu.c                            |  12 -
 arch/x86/kvm/svm.c                            |  19 +-
 arch/x86/kvm/vmx/nested.c                     |  21 +-
 arch/x86/kvm/vmx/nested.h                     |   5 +
 arch/x86/kvm/vmx/sgx.c                        | 247 ++++++++++++++
 arch/x86/kvm/vmx/sgx.h                        |  11 +
 arch/x86/kvm/vmx/vmcs12.c                     |   1 +
 arch/x86/kvm/vmx/vmcs12.h                     |   4 +-
 arch/x86/kvm/vmx/vmx.c                        | 251 +++++++++++++-
 arch/x86/kvm/vmx/vmx.h                        |   6 +
 arch/x86/kvm/x86.c                            |  40 ++-
 arch/x86/kvm/x86.h                            |   5 -
 include/uapi/linux/kvm.h                      |   1 +
 tools/testing/selftests/x86/sgx/defines.h     |   2 +-
 35 files changed, 1234 insertions(+), 210 deletions(-)
 create mode 100644 arch/x86/include/asm/sgx.h
 rename arch/x86/{kernel/cpu/sgx/arch.h => include/asm/sgx_arch.h} (99%)
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.c
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.h
 create mode 100644 arch/x86/kvm/vmx/sgx.c
 create mode 100644 arch/x86/kvm/vmx/sgx.h

-- 
2.22.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH 01/21] x86/sgx: Add defines for SGX device minor numbers
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
@ 2019-07-27  5:51 ` Sean Christopherson
  2019-07-27  5:51 ` [RFC PATCH 02/21] x86/sgx: Move bus registration and device init to common code Sean Christopherson
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:51 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Add defines to track the minor numbers for each SGX device in
preparation for moving the helper code and provisioning device to the
common subsystem, and in preparation for adding a third device, i.e. a
virtual EPC device.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/driver.h | 1 -
 arch/x86/kernel/cpu/sgx/driver/main.c   | 9 +++++----
 arch/x86/kernel/cpu/sgx/sgx.h           | 4 ++++
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/driver.h b/arch/x86/kernel/cpu/sgx/driver/driver.h
index da60839b133a..6ce18c766a5a 100644
--- a/arch/x86/kernel/cpu/sgx/driver/driver.h
+++ b/arch/x86/kernel/cpu/sgx/driver/driver.h
@@ -15,7 +15,6 @@
 #include "../encls.h"
 #include "../sgx.h"
 
-#define SGX_DRV_NR_DEVICES	2
 #define SGX_EINIT_SPIN_COUNT	20
 #define SGX_EINIT_SLEEP_COUNT	50
 #define SGX_EINIT_SLEEP_TIME	20
diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index bb7f1932529f..a2506a49c95a 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -211,7 +211,7 @@ int __init sgx_drv_init(void)
 	if (ret)
 		return ret;
 
-	ret = alloc_chrdev_region(&sgx_devt, 0, SGX_DRV_NR_DEVICES, "sgx");
+	ret = alloc_chrdev_region(&sgx_devt, 0, SGX_MAX_NR_DEVICES, "sgx");
 	if (ret < 0)
 		goto err_bus;
 
@@ -238,12 +238,13 @@ int __init sgx_drv_init(void)
 	}
 
 	ret = sgx_dev_init("sgx/enclave", &sgx_encl_dev, &sgx_encl_cdev,
-			   &sgx_encl_fops, 0);
+			   &sgx_encl_fops, SGX_ENCL_DEV_MINOR);
 	if (ret)
 		goto err_chrdev_region;
 
 	ret = sgx_dev_init("sgx/provision", &sgx_provision_dev,
-			   &sgx_provision_cdev, &sgx_provision_fops, 1);
+			   &sgx_provision_cdev, &sgx_provision_fops,
+			   SGX_PROV_DEV_MINOR);
 	if (ret)
 		goto err_encl_dev;
 
@@ -277,7 +278,7 @@ int __init sgx_drv_init(void)
 	put_device(&sgx_encl_dev);
 
 err_chrdev_region:
-	unregister_chrdev_region(sgx_devt, SGX_DRV_NR_DEVICES);
+	unregister_chrdev_region(sgx_devt, SGX_MAX_NR_DEVICES);
 
 err_bus:
 	bus_unregister(&sgx_bus_type);
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index c9276d4b6ffe..4e2c3ce94f63 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -89,4 +89,8 @@ void sgx_free_page(struct sgx_epc_page *page);
 int sgx_einit(struct sgx_sigstruct *sigstruct, struct sgx_einittoken *token,
 	      struct sgx_epc_page *secs, u64 *lepubkeyhash);
 
+#define SGX_ENCL_DEV_MINOR	0
+#define SGX_PROV_DEV_MINOR	1
+#define SGX_MAX_NR_DEVICES	2
+
 #endif /* _X86_SGX_H */
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 02/21] x86/sgx: Move bus registration and device init to common code
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
  2019-07-27  5:51 ` [RFC PATCH 01/21] x86/sgx: Add defines for SGX device minor numbers Sean Christopherson
@ 2019-07-27  5:51 ` Sean Christopherson
  2019-07-27  5:51 ` [RFC PATCH 03/21] x86/sgx: Move provisioning device " Sean Christopherson
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:51 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Move the SGX bus registration and initialization into common code in
preparation for adding a virtual EPC device, which will reside outside
of the native SGX userspace driver.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/main.c | 48 +------------------------
 arch/x86/kernel/cpu/sgx/main.c        | 50 ++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h         |  4 +++
 3 files changed, 54 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index a2506a49c95a..d62bdc7ed4d9 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -158,42 +158,10 @@ const struct file_operations sgx_provision_fops = {
 	.owner			= THIS_MODULE,
 };
 
-static struct bus_type sgx_bus_type = {
-	.name	= "sgx",
-};
-
 static struct device sgx_encl_dev;
 static struct cdev sgx_encl_cdev;
 static struct device sgx_provision_dev;
 static struct cdev sgx_provision_cdev;
-static dev_t sgx_devt;
-
-static void sgx_dev_release(struct device *dev)
-{
-}
-
-static __init int sgx_dev_init(const char *name, struct device *dev,
-			       struct cdev *cdev,
-			       const struct file_operations *fops, int minor)
-{
-	int ret;
-
-	device_initialize(dev);
-
-	dev->bus = &sgx_bus_type;
-	dev->devt = MKDEV(MAJOR(sgx_devt), minor);
-	dev->release = sgx_dev_release;
-
-	ret = dev_set_name(dev, name);
-	if (ret) {
-		put_device(dev);
-		return ret;
-	}
-
-	cdev_init(cdev, fops);
-	cdev->owner = THIS_MODULE;
-	return 0;
-}
 
 int __init sgx_drv_init(void)
 {
@@ -207,14 +175,6 @@ int __init sgx_drv_init(void)
 		return -ENODEV;
 	}
 
-	ret = bus_register(&sgx_bus_type);
-	if (ret)
-		return ret;
-
-	ret = alloc_chrdev_region(&sgx_devt, 0, SGX_MAX_NR_DEVICES, "sgx");
-	if (ret < 0)
-		goto err_bus;
-
 	cpuid_count(SGX_CPUID, 0, &eax, &ebx, &ecx, &edx);
 	sgx_misc_reserved_mask = ~ebx | SGX_MISC_RESERVED_MASK;
 	sgx_encl_size_max_64 = 1ULL << ((edx >> 8) & 0xFF);
@@ -240,7 +200,7 @@ int __init sgx_drv_init(void)
 	ret = sgx_dev_init("sgx/enclave", &sgx_encl_dev, &sgx_encl_cdev,
 			   &sgx_encl_fops, SGX_ENCL_DEV_MINOR);
 	if (ret)
-		goto err_chrdev_region;
+		return ret;
 
 	ret = sgx_dev_init("sgx/provision", &sgx_provision_dev,
 			   &sgx_provision_cdev, &sgx_provision_fops,
@@ -277,11 +237,5 @@ int __init sgx_drv_init(void)
 err_encl_dev:
 	put_device(&sgx_encl_dev);
 
-err_chrdev_region:
-	unregister_chrdev_region(sgx_devt, SGX_MAX_NR_DEVICES);
-
-err_bus:
-	bus_unregister(&sgx_bus_type);
-
 	return ret;
 }
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index f790a03571c5..edbd465083c7 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
 // Copyright(c) 2016-17 Intel Corporation.
 
+#include <linux/cdev.h>
 #include <linux/freezer.h>
 #include <linux/highmem.h>
 #include <linux/kthread.h>
@@ -329,6 +330,39 @@ static __init int sgx_page_cache_init(void)
 	return 0;
 }
 
+static struct bus_type sgx_bus_type = {
+	.name	= "sgx",
+};
+static dev_t sgx_devt;
+
+static void sgx_dev_release(struct device *dev)
+{
+
+}
+
+__init int sgx_dev_init(const char *name, struct device *dev,
+			struct cdev *cdev, const struct file_operations *fops,
+			int minor)
+{
+	int ret;
+
+	device_initialize(dev);
+
+	dev->bus = &sgx_bus_type;
+	dev->devt = MKDEV(MAJOR(sgx_devt), minor);
+	dev->release = sgx_dev_release;
+
+	ret = dev_set_name(dev, name);
+	if (ret) {
+		put_device(dev);
+		return ret;
+	}
+
+	cdev_init(cdev, fops);
+	cdev->owner = THIS_MODULE;
+	return 0;
+}
+
 static __init int sgx_init(void)
 {
 	int ret;
@@ -344,12 +378,26 @@ static __init int sgx_init(void)
 	if (ret)
 		goto err_page_cache;
 
-	ret = sgx_drv_init();
+	ret = bus_register(&sgx_bus_type);
 	if (ret)
 		goto err_kthread;
 
+	ret = alloc_chrdev_region(&sgx_devt, 0, SGX_MAX_NR_DEVICES, "sgx");
+	if (ret < 0)
+		goto err_bus;
+
+	ret = sgx_drv_init();
+	if (ret)
+		goto err_chrdev_region;
+
 	return 0;
 
+err_chrdev_region:
+	unregister_chrdev_region(sgx_devt, SGX_MAX_NR_DEVICES);
+
+err_bus:
+	bus_unregister(&sgx_bus_type);
+
 err_kthread:
 	kthread_stop(ksgxswapd_tsk);
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4e2c3ce94f63..85b3674e1d43 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -93,4 +93,8 @@ int sgx_einit(struct sgx_sigstruct *sigstruct, struct sgx_einittoken *token,
 #define SGX_PROV_DEV_MINOR	1
 #define SGX_MAX_NR_DEVICES	2
 
+__init int sgx_dev_init(const char *name, struct device *dev,
+			struct cdev *cdev, const struct file_operations *fops,
+			int minor);
+
 #endif /* _X86_SGX_H */
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 03/21] x86/sgx: Move provisioning device to common code
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
  2019-07-27  5:51 ` [RFC PATCH 01/21] x86/sgx: Add defines for SGX device minor numbers Sean Christopherson
  2019-07-27  5:51 ` [RFC PATCH 02/21] x86/sgx: Move bus registration and device init to common code Sean Christopherson
@ 2019-07-27  5:51 ` Sean Christopherson
  2019-07-27  5:51 ` [RFC PATCH 04/21] x86/sgx: Add /dev/sgx/virt_epc device to allocate "raw" EPC for VMs Sean Christopherson
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:51 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Move the provisioning device to common code in preparation for adding
support for SGX virtualization.  The provisioning device will need to be
instantiated if the native SGX driver *or* the virtual EPC "driver" is
loaded.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 18 ++---------
 arch/x86/kernel/cpu/sgx/driver/main.c  | 24 +-------------
 arch/x86/kernel/cpu/sgx/main.c         | 44 +++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h          |  1 +
 4 files changed, 47 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index 89b3fb81c15b..b7aa06920d10 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -794,26 +794,12 @@ static long sgx_ioc_enclave_set_attribute(struct file *filep, void __user *arg)
 {
 	struct sgx_encl *encl = filep->private_data;
 	struct sgx_enclave_set_attribute params;
-	struct file *attribute_file;
-	int ret;
 
 	if (copy_from_user(&params, arg, sizeof(params)))
 		return -EFAULT;
 
-	attribute_file = fget(params.attribute_fd);
-	if (!attribute_file)
-		return -EINVAL;
-
-	if (attribute_file->f_op != &sgx_provision_fops) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	encl->allowed_attributes |= SGX_ATTR_PROVISIONKEY;
-
-out:
-	fput(attribute_file);
-	return ret;
+	return sgx_set_attribute(&encl->allowed_attributes,
+				 params.attribute_fd);
 }
 
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index d62bdc7ed4d9..1e107dd0d909 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -154,14 +154,8 @@ static const struct file_operations sgx_encl_fops = {
 	.get_unmapped_area	= sgx_get_unmapped_area,
 };
 
-const struct file_operations sgx_provision_fops = {
-	.owner			= THIS_MODULE,
-};
-
 static struct device sgx_encl_dev;
 static struct cdev sgx_encl_cdev;
-static struct device sgx_provision_dev;
-static struct cdev sgx_provision_cdev;
 
 int __init sgx_drv_init(void)
 {
@@ -202,38 +196,22 @@ int __init sgx_drv_init(void)
 	if (ret)
 		return ret;
 
-	ret = sgx_dev_init("sgx/provision", &sgx_provision_dev,
-			   &sgx_provision_cdev, &sgx_provision_fops,
-			   SGX_PROV_DEV_MINOR);
-	if (ret)
-		goto err_encl_dev;
-
 	sgx_encl_wq = alloc_workqueue("sgx-encl-wq",
 				      WQ_UNBOUND | WQ_FREEZABLE, 1);
 	if (!sgx_encl_wq) {
 		ret = -ENOMEM;
-		goto err_provision_dev;
+		goto err_encl_dev;
 	}
 
 	ret = cdev_device_add(&sgx_encl_cdev, &sgx_encl_dev);
 	if (ret)
 		goto err_encl_wq;
 
-	ret = cdev_device_add(&sgx_provision_cdev, &sgx_provision_dev);
-	if (ret)
-		goto err_encl_cdev;
-
 	return 0;
 
-err_encl_cdev:
-	cdev_device_del(&sgx_encl_cdev, &sgx_encl_dev);
-
 err_encl_wq:
 	destroy_workqueue(sgx_encl_wq);
 
-err_provision_dev:
-	put_device(&sgx_provision_dev);
-
 err_encl_dev:
 	put_device(&sgx_encl_dev);
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index edbd465083c7..9f4473597620 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -2,6 +2,7 @@
 // Copyright(c) 2016-17 Intel Corporation.
 
 #include <linux/cdev.h>
+#include <linux/file.h>
 #include <linux/freezer.h>
 #include <linux/highmem.h>
 #include <linux/kthread.h>
@@ -335,6 +336,31 @@ static struct bus_type sgx_bus_type = {
 };
 static dev_t sgx_devt;
 
+const struct file_operations sgx_provision_fops = {
+	.owner			= THIS_MODULE,
+};
+
+static struct device sgx_provision_dev;
+static struct cdev sgx_provision_cdev;
+
+int sgx_set_attribute(u64 *allowed_attributes, unsigned int attribute_fd)
+{
+	struct file *attribute_file;
+
+	attribute_file = fget(attribute_fd);
+	if (!attribute_file)
+		return -EINVAL;
+
+	if (attribute_file->f_op != &sgx_provision_fops) {
+		fput(attribute_file);
+		return -EINVAL;
+	}
+	fput(attribute_file);
+
+	*allowed_attributes |= SGX_ATTR_PROVISIONKEY;
+	return 0;
+}
+
 static void sgx_dev_release(struct device *dev)
 {
 
@@ -386,12 +412,28 @@ static __init int sgx_init(void)
 	if (ret < 0)
 		goto err_bus;
 
-	ret = sgx_drv_init();
+	ret = sgx_dev_init("sgx/provision", &sgx_provision_dev,
+			   &sgx_provision_cdev, &sgx_provision_fops,
+			   SGX_PROV_DEV_MINOR);
 	if (ret)
 		goto err_chrdev_region;
 
+	ret = cdev_device_add(&sgx_provision_cdev, &sgx_provision_dev);
+	if (ret)
+		goto err_provision_dev;
+
+	ret = sgx_drv_init();
+	if (ret)
+		goto err_provision_cdev;
+
 	return 0;
 
+err_provision_cdev:
+	cdev_device_del(&sgx_provision_cdev, &sgx_provision_dev);
+
+err_provision_dev:
+	put_device(&sgx_provision_dev);
+
 err_chrdev_region:
 	unregister_chrdev_region(sgx_devt, SGX_MAX_NR_DEVICES);
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 85b3674e1d43..a0af8849c7c3 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -96,5 +96,6 @@ int sgx_einit(struct sgx_sigstruct *sigstruct, struct sgx_einittoken *token,
 __init int sgx_dev_init(const char *name, struct device *dev,
 			struct cdev *cdev, const struct file_operations *fops,
 			int minor);
+int sgx_set_attribute(u64 *allowed_attributes, unsigned int attribute_fd);
 
 #endif /* _X86_SGX_H */
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 04/21] x86/sgx: Add /dev/sgx/virt_epc device to allocate "raw" EPC for VMs
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (2 preceding siblings ...)
  2019-07-27  5:51 ` [RFC PATCH 03/21] x86/sgx: Move provisioning device " Sean Christopherson
@ 2019-07-27  5:51 ` Sean Christopherson
  2019-07-27 17:44   ` Andy Lutomirski
  2019-07-27  5:51 ` [RFC PATCH 05/21] x86/sgx: Expose SGX architectural definitions to the kernel Sean Christopherson
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:51 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Add an SGX device to enable userspace to allocate EPC without an
associated enclave.  The intended and only known use case for direct EPC
allocation is to expose EPC to a KVM guest, hence the virt_epc moniker,
virt.{c,h} files and INTEL_SGX_VIRTUALIZATION Kconfig.

Although KVM is the end consumer of EPC, and will need hooks into the
virtual EPC management if oversubscription of EPC for guest is ever
supported (see below), implement direct access to EPC in the SGX
subsystem instead of in KVM.  Doing so has two major advantages:

  - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
    just another memory backend for guests.

  - EPC management is wholly contained in the SGX subsystem, e.g. SGX
    does not have to export any symbols, changes to reclaim flows don't
    need to be routed through KVM, SGX's dirty laundry doesn't have to
    get aired out for the world to see, and so on and so forth.

Oversubscription of EPC for KVM guests is not currently supported.  Due
to the complications of handling reclaim conflicts between guest and
host, KVM EPC oversubscription is expected to be at least an order of
magnitude more complex than basic support for SGX virtualization.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig                 |  10 ++
 arch/x86/kernel/cpu/sgx/Makefile |   1 +
 arch/x86/kernel/cpu/sgx/main.c   |   3 +
 arch/x86/kernel/cpu/sgx/sgx.h    |   3 +-
 arch/x86/kernel/cpu/sgx/virt.c   | 253 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/virt.h   |  14 ++
 6 files changed, 283 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.c
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 74ccb1bdea16..c1bdb9f85928 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1961,6 +1961,16 @@ config INTEL_SGX_DRIVER
 
 	  If unsure, say N.
 
+config INTEL_SGX_VIRTUALIZATION
+	bool "Intel SGX Virtualization"
+	depends on INTEL_SGX && KVM_INTEL
+	help
+	  Enabling support for SGX virtualization enables userspace to allocate
+	  "raw" EPC for the purpose of exposing EPC to a KVM guest, i.e. a
+	  virtual machine, via a device node (/dev/sgx/virt_epc by default).
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index e5d1e862969c..559fd0f9be50 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -1,2 +1,3 @@
 obj-y += encl.o encls.o main.o reclaim.o
 obj-$(CONFIG_INTEL_SGX_DRIVER) += driver/
+obj-$(CONFIG_INTEL_SGX_VIRTUALIZATION) += virt.o
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 9f4473597620..ead827371139 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -14,6 +14,7 @@
 #include "arch.h"
 #include "encls.h"
 #include "sgx.h"
+#include "virt.h"
 
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 int sgx_nr_epc_sections;
@@ -422,7 +423,9 @@ static __init int sgx_init(void)
 	if (ret)
 		goto err_provision_dev;
 
+	/* Success if the native *or* virtual driver initialized cleanly. */
 	ret = sgx_drv_init();
+	ret = sgx_virt_epc_init() ? ret : 0;
 	if (ret)
 		goto err_provision_cdev;
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index a0af8849c7c3..16cdb935aaa7 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -91,7 +91,8 @@ int sgx_einit(struct sgx_sigstruct *sigstruct, struct sgx_einittoken *token,
 
 #define SGX_ENCL_DEV_MINOR	0
 #define SGX_PROV_DEV_MINOR	1
-#define SGX_MAX_NR_DEVICES	2
+#define SGX_VIRT_DEV_MINOR	2
+#define SGX_MAX_NR_DEVICES	3
 
 __init int sgx_dev_init(const char *name, struct device *dev,
 			struct cdev *cdev, const struct file_operations *fops,
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
new file mode 100644
index 000000000000..79ee5917a4fc
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -0,0 +1,253 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/cdev.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <uapi/asm/sgx.h>
+
+#include "encls.h"
+#include "sgx.h"
+#include "virt.h"
+
+struct sgx_virt_epc_page {
+	struct sgx_epc_page *epc_page;
+};
+
+struct sgx_virt_epc {
+	struct radix_tree_root page_tree;
+	struct rw_semaphore lock;
+};
+
+static inline unsigned long sgx_virt_epc_calc_index(struct vm_area_struct *vma,
+						    unsigned long addr)
+{
+	return vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
+}
+
+static struct sgx_virt_epc_page *__sgx_virt_epc_fault(struct sgx_virt_epc *epc,
+						      struct vm_area_struct *vma,
+						      unsigned long addr)
+{
+	struct sgx_virt_epc_page *page;
+	struct sgx_epc_page *epc_page;
+	unsigned long index;
+	int ret;
+
+	index = sgx_virt_epc_calc_index(vma, addr);
+
+	page = radix_tree_lookup(&epc->page_tree, index);
+	if (page) {
+		if (page->epc_page)
+			return page;
+	} else {
+		page = kzalloc(sizeof(*page), GFP_KERNEL);
+		if (!page)
+			return ERR_PTR(-ENOMEM);
+
+		ret = radix_tree_insert(&epc->page_tree, index, page);
+		if (unlikely(ret)) {
+			kfree(page);
+			return ERR_PTR(ret);
+		}
+	}
+
+	epc_page = sgx_alloc_page(&epc, false);
+	if (IS_ERR(epc_page))
+		return ERR_CAST(epc_page);
+
+	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(epc_page->desc));
+	if (unlikely(ret != VM_FAULT_NOPAGE)) {
+		sgx_free_page(epc_page);
+		return ERR_PTR(-EFAULT);
+	}
+
+	page->epc_page = epc_page;
+
+	return page;
+}
+
+static vm_fault_t sgx_virt_epc_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_virt_epc *epc = (struct sgx_virt_epc *)vma->vm_private_data;
+	struct sgx_virt_epc_page *page;
+
+	down_write(&epc->lock);
+	page = __sgx_virt_epc_fault(epc, vma, vmf->address);
+	up_write(&epc->lock);
+
+	if (!IS_ERR(page) || signal_pending(current))
+		return VM_FAULT_NOPAGE;
+
+	if (PTR_ERR(page) == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
+		up_read(&vma->vm_mm->mmap_sem);
+		return VM_FAULT_RETRY;
+	}
+
+	return VM_FAULT_SIGBUS;
+}
+
+static struct sgx_virt_epc_page *sgx_virt_epc_get_page(struct sgx_virt_epc *epc,
+						       unsigned long index)
+{
+	struct sgx_virt_epc_page *page;
+
+	down_read(&epc->lock);
+	page = radix_tree_lookup(&epc->page_tree, index);
+	if (!page || !page->epc_page)
+		page = ERR_PTR(-EFAULT);
+	up_read(&epc->lock);
+
+	return page;
+}
+
+static int sgx_virt_epc_access(struct vm_area_struct *vma, unsigned long start,
+			       void *buf, int len, int write)
+{
+	/* EDBG{RD,WR} are naturally sized, i.e. always 8-byte on 64-bit. */
+	unsigned char data[sizeof(unsigned long)];
+	struct sgx_virt_epc_page *page;
+	struct sgx_virt_epc *epc;
+	unsigned long addr, index;
+	int offset, cnt, i;
+	int ret = 0;
+	void *p;
+
+	epc = vma->vm_private_data;
+
+	for (i = 0; i < len && !ret; i += cnt) {
+		addr = start + i;
+		if (i == 0 || PFN_DOWN(addr) != PFN_DOWN(addr - cnt))
+			index = sgx_virt_epc_calc_index(vma, addr);
+
+		page = sgx_virt_epc_get_page(epc, index);
+
+		/*
+		 * EDBG{RD,WR} require an active enclave, and given that VMM
+		 * EPC oversubscription isn't supported, a not-present EPC page
+		 * means the guest hasn't accessed the page and therefore can't
+		 * possibility have added the page to an enclave.
+		 */
+		if (IS_ERR(page))
+			return PTR_ERR(page);
+
+		offset = addr & (sizeof(unsigned long) - 1);
+		addr = ALIGN_DOWN(addr, sizeof(unsigned long));
+		cnt = min((int)sizeof(unsigned long) - offset, len - i);
+
+		p = sgx_epc_addr(page->epc_page) + (addr & ~PAGE_MASK);
+
+		/* EDBGRD for read, or to do RMW for a partial write. */
+		if (!write || cnt != sizeof(unsigned long))
+			ret = __edbgrd(p, (void *)data);
+
+		if (!ret) {
+			if (write) {
+				memcpy(data + offset, buf + i, cnt);
+				ret = __edbgwr(p, (void *)data);
+			} else {
+				memcpy(buf + i, data + offset, cnt);
+			}
+		}
+	}
+
+	if (ret)
+		return -EIO;
+	return i;
+}
+
+const struct vm_operations_struct sgx_virt_epc_vm_ops = {
+	.fault = sgx_virt_epc_fault,
+	.access = sgx_virt_epc_access,
+};
+
+static int sgx_virt_epc_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	vma->vm_ops = &sgx_virt_epc_vm_ops;
+	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP;
+	vma->vm_private_data = file->private_data;
+
+	return 0;
+}
+
+static int sgx_virt_epc_release(struct inode *inode, struct file *file)
+{
+	struct sgx_virt_epc *epc = file->private_data;
+	struct radix_tree_iter iter;
+	struct sgx_virt_epc_page *page;
+	void **slot;
+
+	LIST_HEAD(secs_pages);
+
+	radix_tree_for_each_slot(slot, &epc->page_tree, &iter, 0) {
+		page = *slot;
+		if (page->epc_page && __sgx_free_page(page->epc_page))
+			continue;
+		kfree(page);
+		radix_tree_delete(&epc->page_tree, iter.index);
+	}
+
+	/*
+	 * Because we don't track which pages are SECS pages, it's possible
+	 * for EREMOVE to fail, e.g. a SECS page can have children if a VM
+	 * shutdown unexpectedly.  Retry all failed pages after iterating
+	 * through the entire tree, at which point all children should be
+	 * removed and the SECS pages can be nuked as well.
+	 */
+	radix_tree_for_each_slot(slot, &epc->page_tree, &iter, 0) {
+		page = *slot;
+		if (!(WARN_ON(!page->epc_page)))
+			sgx_free_page(page->epc_page);
+		radix_tree_delete(&epc->page_tree, iter.index);
+	}
+
+	kfree(epc);
+
+	return 0;
+}
+
+static int sgx_virt_epc_open(struct inode *inode, struct file *file)
+{
+	struct sgx_virt_epc *epc;
+
+	epc = kzalloc(sizeof(struct sgx_virt_epc), GFP_KERNEL);
+	if (!epc)
+		return -ENOMEM;
+
+	init_rwsem(&epc->lock);
+	INIT_RADIX_TREE(&epc->page_tree, GFP_KERNEL);
+
+	file->private_data = epc;
+
+	return 0;
+}
+
+static const struct file_operations sgx_virt_epc_fops = {
+	.owner			= THIS_MODULE,
+	.open			= sgx_virt_epc_open,
+	.release		= sgx_virt_epc_release,
+	.mmap			= sgx_virt_epc_mmap,
+};
+
+static struct device sgx_virt_epc_dev;
+static struct cdev sgx_virt_epc_cdev;
+
+int __init sgx_virt_epc_init(void)
+{
+	int ret = sgx_dev_init("sgx/virt_epc", &sgx_virt_epc_dev,
+			       &sgx_virt_epc_cdev, &sgx_virt_epc_fops,
+			       SGX_VIRT_DEV_MINOR);
+	if (ret)
+		return ret;
+
+	ret = cdev_device_add(&sgx_virt_epc_cdev, &sgx_virt_epc_dev);
+	if (ret)
+		put_device(&sgx_virt_epc_dev);
+
+	return ret;
+}
diff --git a/arch/x86/kernel/cpu/sgx/virt.h b/arch/x86/kernel/cpu/sgx/virt.h
new file mode 100644
index 000000000000..436170412b98
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/virt.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+#ifndef _ASM_X86_SGX_VIRT_H
+#define _ASM_X86_SGX_VIRT_H
+
+#ifdef CONFIG_INTEL_SGX_VIRTUALIZATION
+int __init sgx_virt_epc_init(void);
+#else
+static inline int __init sgx_virt_epc_init(void)
+{
+	return -ENODEV;
+}
+#endif
+
+#endif /* _ASM_X86_SGX_VIRT_H */
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 05/21] x86/sgx: Expose SGX architectural definitions to the kernel
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (3 preceding siblings ...)
  2019-07-27  5:51 ` [RFC PATCH 04/21] x86/sgx: Add /dev/sgx/virt_epc device to allocate "raw" EPC for VMs Sean Christopherson
@ 2019-07-27  5:51 ` Sean Christopherson
  2019-07-27  5:51 ` [RFC PATCH 06/21] KVM: x86: Add SGX sub-features leaf to reverse CPUID table Sean Christopherson
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:51 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

KVM will use many of the architectural constants and structs to
virtualize SGX.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/{kernel/cpu/sgx/arch.h => include/asm/sgx_arch.h} | 0
 arch/x86/kernel/cpu/sgx/driver/driver.h                    | 2 +-
 arch/x86/kernel/cpu/sgx/encl.c                             | 2 +-
 arch/x86/kernel/cpu/sgx/encls.h                            | 2 +-
 arch/x86/kernel/cpu/sgx/main.c                             | 2 +-
 arch/x86/kernel/cpu/sgx/sgx.h                              | 3 +--
 tools/testing/selftests/x86/sgx/defines.h                  | 2 +-
 7 files changed, 6 insertions(+), 7 deletions(-)
 rename arch/x86/{kernel/cpu/sgx/arch.h => include/asm/sgx_arch.h} (100%)

diff --git a/arch/x86/kernel/cpu/sgx/arch.h b/arch/x86/include/asm/sgx_arch.h
similarity index 100%
rename from arch/x86/kernel/cpu/sgx/arch.h
rename to arch/x86/include/asm/sgx_arch.h
diff --git a/arch/x86/kernel/cpu/sgx/driver/driver.h b/arch/x86/kernel/cpu/sgx/driver/driver.h
index 6ce18c766a5a..4dc133f3c186 100644
--- a/arch/x86/kernel/cpu/sgx/driver/driver.h
+++ b/arch/x86/kernel/cpu/sgx/driver/driver.h
@@ -10,7 +10,7 @@
 #include <linux/sched.h>
 #include <linux/workqueue.h>
 #include <uapi/asm/sgx.h>
-#include "../arch.h"
+#include <asm/sgx_arch.h>
 #include "../encl.h"
 #include "../encls.h"
 #include "../sgx.h"
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 836c55d4352d..8549fd95f02d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -7,7 +7,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/suspend.h>
 #include <linux/sched/mm.h>
-#include "arch.h"
+#include <asm/sgx_arch.h>
 #include "encl.h"
 #include "encls.h"
 #include "sgx.h"
diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index aea3b9d09936..1b49c7419767 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -8,7 +8,7 @@
 #include <linux/rwsem.h>
 #include <linux/types.h>
 #include <asm/asm.h>
-#include "arch.h"
+#include <asm/sgx_arch.h>
 
 /**
  * ENCLS_FAULT_FLAG - flag signifying an ENCLS return code is a trapnr
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index ead827371139..532dd90e09e1 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -10,8 +10,8 @@
 #include <linux/ratelimit.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include <asm/sgx_arch.h>
 #include "driver/driver.h"
-#include "arch.h"
 #include "encls.h"
 #include "sgx.h"
 #include "virt.h"
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 16cdb935aaa7..748b1633d770 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -8,10 +8,9 @@
 #include <linux/rwsem.h>
 #include <linux/types.h>
 #include <asm/asm.h>
+#include <asm/sgx_arch.h>
 #include <uapi/asm/sgx_errno.h>
 
-#include "arch.h"
-
 struct sgx_epc_page {
 	unsigned long desc;
 	struct sgx_encl_page *owner;
diff --git a/tools/testing/selftests/x86/sgx/defines.h b/tools/testing/selftests/x86/sgx/defines.h
index 3ff73a9d9b93..ebc4c6cf57c4 100644
--- a/tools/testing/selftests/x86/sgx/defines.h
+++ b/tools/testing/selftests/x86/sgx/defines.h
@@ -33,7 +33,7 @@ typedef uint64_t u64;
 	(((~0ULL) - (1ULL << (l)) + 1) & \
 	 (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))
 
-#include "../../../../../arch/x86/kernel/cpu/sgx/arch.h"
+#include "../../../../../arch/x86/include/asm/sgx_arch.h"
 #include "../../../../../arch/x86/include/uapi/asm/sgx.h"
 
 #endif /* TYPES_H */
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 06/21] KVM: x86: Add SGX sub-features leaf to reverse CPUID table
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (4 preceding siblings ...)
  2019-07-27  5:51 ` [RFC PATCH 05/21] x86/sgx: Expose SGX architectural definitions to the kernel Sean Christopherson
@ 2019-07-27  5:51 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 07/21] KVM: x86: Add WARN_ON_ONCE(index!=0) in __do_cpuid_ent Sean Christopherson
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:51 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

CPUID_12_EAX is an Intel-defined feature bits leaf dedicated for SGX
that enumerates the SGX instruction sets that are supported by the
CPU, e.g. SGX1, SGX2, etc...

Since Linux only cares about two bits at this time (SGX1 and SGX2), the
SGX bits were relocated to to Linux-defined word 8, i.e. CPUID_LNX_3,
instead of adding a dedicated SGX word so as to conserve space.  But,
to make KVM's life simple, the bit numbers of the SGX features were
intentionally kept the same between the Intel-defined leaf and the
Linux-defined leaf.

Add build-time assertions to ensure X86_FEATURE_SGX{1,2} are at the
expected locations, and that KVM isn't trying to do a reverse CPUID
lookup on a non-SGX bit in CPUID_LNX_3.

Relocate bit() to cpuid.h where it belongs (it's NOT a generic bit
function) and add a beefy comment explaining what the hell it's doing.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/cpuid.h   | 20 ++++++++++++++++++++
 arch/x86/kvm/emulate.c |  1 +
 arch/x86/kvm/x86.h     |  5 -----
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index d78a61408243..aed49d639c3b 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -53,6 +53,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_7_ECX]         = {         7, 0, CPUID_ECX},
 	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
 	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
+	[CPUID_LNX_3]         = {      0x12, 0, CPUID_EAX},
 };
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
@@ -61,6 +62,7 @@ static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
 
 	BUILD_BUG_ON(x86_leaf >= ARRAY_SIZE(reverse_cpuid));
 	BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
+	BUILD_BUG_ON(x86_leaf == CPUID_LNX_3 && (x86_feature & 31) > 1);
 
 	return reverse_cpuid[x86_leaf];
 }
@@ -89,6 +91,24 @@ static __always_inline int *guest_cpuid_get_register(struct kvm_vcpu *vcpu, unsi
 	}
 }
 
+/*
+ * Retrieve the bit from an X86_FEATURE_* definition using a simple AND to
+ * isolate the bit number from the feature definition.  Note that this works
+ * only for features that are NOT scattered, i.e. the X86_FEATURE_* bit number
+ * must match the hardware-defined CPUID bit number.  The only exception to
+ * this rule is the SGX sub-features leaf, which is scattered but only in the
+ * sense that its bits are relocated from hardware-defined leaf 0x12.0.EAX to
+ * Linux defined word 8, but its bit numbers are maintained (KVM asserts this
+ * expectation at build time).
+ */
+static __always_inline u32 bit(unsigned x86_feature)
+{
+	BUILD_BUG_ON((X86_FEATURE_SGX1 & 31) != 0);
+	BUILD_BUG_ON((X86_FEATURE_SGX2 & 31) != 1);
+
+	return 1 << (x86_feature & 31);
+}
+
 static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu, unsigned x86_feature)
 {
 	int *reg;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4a387a235424..6ffe23febcd7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -29,6 +29,7 @@
 #include "tss.h"
 #include "mmu.h"
 #include "pmu.h"
+#include "cpuid.h"
 
 /*
  * Operand types
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a470ff0868c5..1e0c7b17effa 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -139,11 +139,6 @@ static inline int is_paging(struct kvm_vcpu *vcpu)
 	return likely(kvm_read_cr0_bits(vcpu, X86_CR0_PG));
 }
 
-static inline u32 bit(int bitno)
-{
-	return 1 << (bitno & 31);
-}
-
 static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
 {
 	return kvm_read_cr4_bits(vcpu, X86_CR4_LA57) ? 57 : 48;
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 07/21] KVM: x86: Add WARN_ON_ONCE(index!=0) in __do_cpuid_ent
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (5 preceding siblings ...)
  2019-07-27  5:51 ` [RFC PATCH 06/21] KVM: x86: Add SGX sub-features leaf to reverse CPUID table Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation Sean Christopherson
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Except for one outlier, function 7, all cases in __do_cpuid_ent and
its children assume that the index passed in is zero.  Furthermore,
the index is fully under KVM's control and all callers pass an index
of zero.  In other words, a non-zero index would indicate either a
bug in the caller or a new case that is expected to be handled.  WARN
and return an error on a non-zero index and remove the now unreachable
code in function 7 for handling a non-zero index.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/cpuid.c | 57 ++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 4992e7c99588..70e488951f25 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -410,6 +410,14 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
 		F(MD_CLEAR);
 
+	/*
+	 * The code below assumes index == 0, which simplifies handling leafs
+	 * with a dynamic number of sub-leafs.  The index is fully under KVM's
+	 * control, i.e. a non-zero value is a bug.
+	 */
+	if (WARN_ON_ONCE(index != 0))
+		return -EINVAL;
+
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
 
@@ -480,38 +488,31 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		entry->ecx = 0;
 		entry->edx = 0;
 		break;
-	case 7: {
+	case 7:
 		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 		/* Mask ebx against host capability word 9 */
-		if (index == 0) {
-			entry->ebx &= kvm_cpuid_7_0_ebx_x86_features;
-			cpuid_mask(&entry->ebx, CPUID_7_0_EBX);
-			// TSC_ADJUST is emulated
-			entry->ebx |= F(TSC_ADJUST);
-			entry->ecx &= kvm_cpuid_7_0_ecx_x86_features;
-			f_la57 = entry->ecx & F(LA57);
-			cpuid_mask(&entry->ecx, CPUID_7_ECX);
-			/* Set LA57 based on hardware capability. */
-			entry->ecx |= f_la57;
-			entry->ecx |= f_umip;
-			/* PKU is not yet implemented for shadow paging. */
-			if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
-				entry->ecx &= ~F(PKU);
-			entry->edx &= kvm_cpuid_7_0_edx_x86_features;
-			cpuid_mask(&entry->edx, CPUID_7_EDX);
-			/*
-			 * We emulate ARCH_CAPABILITIES in software even
-			 * if the host doesn't support it.
-			 */
-			entry->edx |= F(ARCH_CAPABILITIES);
-		} else {
-			entry->ebx = 0;
-			entry->ecx = 0;
-			entry->edx = 0;
-		}
+		entry->ebx &= kvm_cpuid_7_0_ebx_x86_features;
+		cpuid_mask(&entry->ebx, CPUID_7_0_EBX);
+		// TSC_ADJUST is emulated
+		entry->ebx |= F(TSC_ADJUST);
+		entry->ecx &= kvm_cpuid_7_0_ecx_x86_features;
+		f_la57 = entry->ecx & F(LA57);
+		cpuid_mask(&entry->ecx, CPUID_7_ECX);
+		/* Set LA57 based on hardware capability. */
+		entry->ecx |= f_la57;
+		entry->ecx |= f_umip;
+		/* PKU is not yet implemented for shadow paging. */
+		if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
+			entry->ecx &= ~F(PKU);
+		entry->edx &= kvm_cpuid_7_0_edx_x86_features;
+		cpuid_mask(&entry->edx, CPUID_7_EDX);
+		/*
+		 * We emulate ARCH_CAPABILITIES in software even
+		 * if the host doesn't support it.
+		 */
+		entry->edx |= F(ARCH_CAPABILITIES);
 		entry->eax = 0;
 		break;
-	}
 	case 9:
 		break;
 	case 0xa: { /* Architectural Performance Monitoring */
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (6 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 07/21] KVM: x86: Add WARN_ON_ONCE(index!=0) in __do_cpuid_ent Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27 17:38   ` Andy Lutomirski
  2019-07-30  3:08   ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 09/21] KVM: VMX: Add basic handling of VM-Exit from SGX enclave Sean Christopherson
                   ` (12 subsequent siblings)
  20 siblings, 2 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Similar to the existing AMD #NPF case where emulation of the current
instruction is not possible due to lack of information, virtualization
of Intel SGX will introduce a scenario where emulation is not possible
due to the VMExit occurring in an SGX enclave.  And again similar to
the AMD case, emulation can be initiated by kvm_mmu_page_fault(), i.e.
outside of the control of the vendor-specific code.

While the cause and architecturally visible behavior of the two cases
is different,  e.g. Intel SGX will inject a #UD whereas AMD #NPF is a
clean resume or complete shutdown, the impact on the common emulation
code is identical: KVM must stop emulation immediately and resume the
guest.

Replace the exisiting need_emulation_on_page_fault() with a more generic
is_emulatable() kvm_x86_ops callback, which is called unconditionally
by x86_emulate_instruction().

Query is_emulatable() in handle_ud() as well so that the
force_emulation_prefix code doesn't incorrectly modify RIP before
calling emulate_instruction() in the absurdly unlikely scenario that
we encounter forced emulation in conjunction with "do not emulate".
Do this for both Intel and AMD so that any future changes to AMD's
emulation logic take effect as expected for handle_ud().

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu.c              | 12 ------------
 arch/x86/kvm/svm.c              | 19 +++++++++++++++++--
 arch/x86/kvm/vmx/vmx.c          | 11 +++++------
 arch/x86/kvm/x86.c              |  9 ++++++++-
 5 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 26d1eb83f72a..1341d8390ebe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1198,7 +1198,7 @@ struct kvm_x86_ops {
 				   uint16_t *vmcs_version);
 	uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);
 
-	bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
+	bool (*is_emulatable)(struct kvm_vcpu *vcpu, void *insn, int insn_len);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 98f6e4f88b04..bf6952f8f330 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5412,18 +5412,6 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
 	if (!mmio_info_in_cache(vcpu, cr2, direct) && !is_guest_mode(vcpu))
 		emulation_type = EMULTYPE_ALLOW_RETRY;
 emulate:
-	/*
-	 * On AMD platforms, under certain conditions insn_len may be zero on #NPF.
-	 * This can happen if a guest gets a page-fault on data access but the HW
-	 * table walker is not able to read the instruction page (e.g instruction
-	 * page is not present in memory). In those cases we simply restart the
-	 * guest, with the exception of AMD Erratum 1096 which is unrecoverable.
-	 */
-	if (unlikely(insn && !insn_len)) {
-		if (!kvm_x86_ops->need_emulation_on_page_fault(vcpu))
-			return 1;
-	}
-
 	er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len);
 
 	switch (er) {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 48c865a4e5dd..0fb8b60eb136 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7115,10 +7115,25 @@ static int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 	return -ENODEV;
 }
 
-static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
+static bool svm_is_emulatable(struct kvm_vcpu *vcpu, void *insn, int insn_len)
 {
 	bool is_user, smap;
 
+	if (likely(!insn || insn_len))
+		return true;
+
+	/*
+	 * Under certain conditions insn_len may be zero on #NPF.  This can
+	 * happen if a guest gets a page-fault on data access but the HW table
+	 * walker is not able to read the instruction page (e.g instruction
+	 * page is not present in memory). In those cases we simply restart the
+	 * guest, with the exception of AMD Erratum 1096 which is unrecoverable.
+	 */
+	if (unlikely(insn && !insn_len)) {
+		if (!kvm_x86_ops->need_emulation_on_page_fault(vcpu))
+			return 1;
+	}
+
 	is_user = svm_get_cpl(vcpu) == 3;
 	smap = !kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
 
@@ -7279,7 +7294,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.nested_enable_evmcs = nested_enable_evmcs,
 	.nested_get_evmcs_version = nested_get_evmcs_version,
 
-	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
+	.is_emulatable = svm_is_emulatable,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d98eac371c0a..f48fc990ca6d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1458,6 +1458,10 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 	return 0;
 }
 
+static bool vmx_is_emulatable(struct kvm_vcpu *vcpu, void *insn, int insn_len)
+{
+	return true;
+}
 
 static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
@@ -7416,11 +7420,6 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static bool vmx_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
-{
-	return 0;
-}
-
 static __init int hardware_setup(void)
 {
 	unsigned long host_bndcfgs;
@@ -7723,7 +7722,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.set_nested_state = NULL,
 	.get_vmcs12_pages = NULL,
 	.nested_enable_evmcs = NULL,
-	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
+	.is_emulatable = vmx_is_emulatable,
 };
 
 static void vmx_cleanup_l1d_flush(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 63bb1ee8258e..afcc01a59421 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5274,6 +5274,9 @@ int handle_ud(struct kvm_vcpu *vcpu)
 	char sig[5]; /* ud2; .ascii "kvm" */
 	struct x86_exception e;
 
+	if (unlikely(!kvm_x86_ops->is_emulatable(vcpu, NULL, 0)))
+		return 1;
+
 	if (force_emulation_prefix &&
 	    kvm_read_guest_virt(vcpu, kvm_get_linear_rip(vcpu),
 				sig, sizeof(sig), &e) == 0 &&
@@ -6430,7 +6433,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 	int r;
 	struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
 	bool writeback = true;
-	bool write_fault_to_spt = vcpu->arch.write_fault_to_shadow_pgtable;
+	bool write_fault_to_spt;
+
+	if (unlikely(!kvm_x86_ops->is_emulatable(vcpu, insn, insn_len)))
+		return EMULATE_DONE;
 
 	vcpu->arch.l1tf_flush_l1d = true;
 
@@ -6438,6 +6444,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 	 * Clear write_fault_to_shadow_pgtable here to ensure it is
 	 * never reused.
 	 */
+	write_fault_to_spt = vcpu->arch.write_fault_to_shadow_pgtable;
 	vcpu->arch.write_fault_to_shadow_pgtable = false;
 	kvm_clear_exception_queue(vcpu);
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 09/21] KVM: VMX: Add basic handling of VM-Exit from SGX enclave
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (7 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 10/21] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for VMX/SGX Sean Christopherson
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Intel SGX adds a new CPL3-only execution environment referred to as an
"enclave".  To protect the secrets of an enclave, the CPU's state is
loaded with synthetic data when exiting the enclave (the enclave's state
is saved/restored via protected memory), and the RIP is set to a defined
exit value.  This behavior also applies to VMi-Exits from the enclave,
e.g. GUEST_RIP may not necessarily reflect the actual RIP that triggered
the VMExit.

To help a VMM recognize and handle exits from enclaves, SGX adds bits to
existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and
GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR.  Define the
new architectural bits and add a boolean to struct vcpu_vmx to cache
VMX_EXIT_REASON_FROM_ENCLAVE and clear the bit in exit_reason so that
checks against exit_reason do not need to account for SGX, e.g.
exit_reason == EXIT_REASON_EXCEPTION_NMI continues to work.

As for new behavior for VM-Exits from enclaves, KVM is for the most
part a passive observer of both bits, e.g. it needs to account for
the bits when propagating information to a nested VMM, but otherwise
doesn't need to act differently for VMExits from enclaves.

The one scenario that is impacted is emulation, which becomes impossible
since KVM does not have access to the RIP or instruction stream that
triggered the VMExit[2].  This is largely a non-issue as most
instructions that might trigger VM-Exit are designed to unconditionally
that may VM-Exit but do not #UD, KVM either never sets the exiting
control, e.g. PAUSE_EXITING[1], or sets it if and only if the feature is
not exposed to the guest in order to inject a #UD, e.g. RDRAND_EXITING.

But, because it is still possible for a guest to trigger emulation,
e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit
from an enclave.  This is architecturally accurate for instruction
VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable
to killing the VM.  In practice, only broken or particularly stupid
guests should ever encounter this behavior.

Add a WARN in skip_emulated_instruction to detect any attempt to
modify the guest's RIP during an SGX enclave VM-Exit as all such flows
should either be unreachable or must handle exits from enclaves before
getting to skip_emulated_instruction.

[1] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at
    CPL3, so we also don't need to worry about that interaction.

[2] Impossible for all practical purposes.  Not truly impossible
    since KVM could implement some form of para-virtualization scheme.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/vmx.h      |  1 +
 arch/x86/include/uapi/asm/vmx.h |  1 +
 arch/x86/kvm/vmx/nested.c       |  2 ++
 arch/x86/kvm/vmx/vmx.c          | 42 ++++++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/vmx.h          |  3 +++
 5 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index a39136b0d509..a62ac47d2006 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -364,6 +364,7 @@ enum vmcs_field {
 #define GUEST_INTR_STATE_MOV_SS		0x00000002
 #define GUEST_INTR_STATE_SMI		0x00000004
 #define GUEST_INTR_STATE_NMI		0x00000008
+#define GUEST_INTR_STATE_ENCLAVE_INTR	0x00000010
 
 /* GUEST_ACTIVITY_STATE flags */
 #define GUEST_ACTIVITY_ACTIVE		0
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index d213ec5c3766..501a35bd4cc7 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -27,6 +27,7 @@
 
 
 #define VMX_EXIT_REASONS_FAILED_VMENTRY         0x80000000
+#define VMX_EXIT_REASON_FROM_ENCLAVE		0x08000000
 
 #define EXIT_REASON_EXCEPTION_NMI       0
 #define EXIT_REASON_EXTERNAL_INTERRUPT  1
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 46af3a5e9209..fef4fb3e1aaa 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3523,6 +3523,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	/* update exit information fields: */
 
 	vmcs12->vm_exit_reason = exit_reason;
+	if (to_vmx(vcpu)->sgx_enclave_exit)
+		vmcs12->vm_exit_reason |= VMX_EXIT_REASON_FROM_ENCLAVE;
 	vmcs12->exit_qualification = exit_qualification;
 	vmcs12->vm_exit_intr_info = exit_intr_info;
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f48fc990ca6d..abcd2f7a36f5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1460,16 +1460,40 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 
 static bool vmx_is_emulatable(struct kvm_vcpu *vcpu, void *insn, int insn_len)
 {
+	if (unlikely(to_vmx(vcpu)->sgx_enclave_exit)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return false;
+	}
 	return true;
 }
 
 static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
+	u32 instr_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
 	unsigned long rip;
 
-	rip = kvm_rip_read(vcpu);
-	rip += vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
-	kvm_rip_write(vcpu, rip);
+	/*
+	 * Emulating an enclave instruction is not supported as KVM cannot
+	 * access the enclave's memory or its true RIP, e.g. the GUEST_RIP
+	 * field points at the exit point of the enclave, not the RIP that
+	 * actually triggered the VMExit.  But, because most instructions
+	 * that cause VM-Exit will #UD in an enclave, this conundrum is
+	 * handled for us.  There are a few exceptions, notably the debug
+	 * instructions INT1ICEBRK and INT3, as they are allowed in debug
+	 * enclaves and generate #DB/#BP as expected, which KVM might
+	 * intercept.  But again, the CPU does the dirty work and saves an
+	 * instr length of zero so that VMMs don't shoot themselves in the
+	 * foot.  WARN if we try to skip a non-zero length instruction after
+	 * a VMExit from an enclave.
+	 */
+	if (likely(instr_len)) {
+		WARN(to_vmx(vcpu)->sgx_enclave_exit,
+		     "KVM: skipping instruction after SGX enclave VM-Exit");
+
+		rip = kvm_rip_read(vcpu);
+		rip += instr_len;
+		kvm_rip_write(vcpu, rip);
+	}
 
 	/* skipping an emulated instruction also counts */
 	vmx_set_interrupt_shadow(vcpu, 0);
@@ -5104,6 +5128,9 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 {
 	gpa_t gpa;
 
+	if (!vmx_is_emulatable(vcpu, NULL, 0))
+		return 1;
+
 	/*
 	 * A nested guest cannot optimize MMIO vmexits, because we have an
 	 * nGPA here instead of the required GPA.
@@ -6537,6 +6564,15 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	vmx->idt_vectoring_info = 0;
 
 	vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
+
+	/*
+	 * Immediately cache and clear the "exit from SGX enclave" bit so that
+	 * KVM can directly compare exit_reason against the base exit reasons,
+	 * e.g. exit_reason == EXIT_REASON_EXCEPTION_NMI.
+	 */
+	vmx->sgx_enclave_exit =
+		(vmx->exit_reason & VMX_EXIT_REASON_FROM_ENCLAVE);
+	vmx->exit_reason &= ~VMX_EXIT_REASON_FROM_ENCLAVE;
 	if (vmx->fail || (vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
 		return;
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 61128b48c503..6d1b57e0337e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -247,6 +247,9 @@ struct vcpu_vmx {
 
 	bool req_immediate_exit;
 
+	/* Exit from SGX enclave */
+	bool sgx_enclave_exit;
+
 	/* Support for PML */
 #define PML_ENTITY_NUM		512
 	struct page *pml_pg;
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 10/21] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for VMX/SGX
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (8 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 09/21] KVM: VMX: Add basic handling of VM-Exit from SGX enclave Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 11/21] KVM: x86: Export kvm_propagate_fault (as kvm_propagate_page_fault) Sean Christopherson
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Support for SGX Launch Control requires KVM to trap and execute
ENCLS[ECREATE] and ENCLS[EINIT] on behalf of the guest, which requires
requires obtaining the GPA of a Secure Enclave Control Structure (SECS)
in order to get its corresponding HVA.

Because the SECS must reside in the Enclave Page Cache (EPC), copying
the SECS's data to a host-controlled buffer via existing exported
helpers is not a viable option as the EPC is not readable or writable
by the kernel.

Translating GVA->HVA for non-EPC pages is also desirable, as passing
user pointers directly to ECREATE and EINIT avoids having to copy pages
worth of data into the kernel.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index afcc01a59421..2b64bb854571 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5089,6 +5089,7 @@ gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 	return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read);
 
  gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva,
 				struct x86_exception *exception)
@@ -5105,6 +5106,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
 	access |= PFERR_WRITE_MASK;
 	return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_write);
 
 /* uses this to access any guest's mapped memory without checking CPL */
 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 11/21] KVM: x86: Export kvm_propagate_fault (as kvm_propagate_page_fault)
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (9 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 10/21] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for VMX/SGX Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 12/21] KVM: x86: Define new #PF SGX error code bit Sean Christopherson
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Support for SGX Launch Control requires KVM to trap and execute
ENCLS[EINIT] on behalf of the guest.  Interception of ENCLS leafs
occurs immediately after CPL0 checks, i.e. before any processing
of the leaf-specific operands.  As such, it's possible that KVM
could intercept an EINIT from L2 and encounter an EPT fault while
walking L1's EPT tables.  Rather than force EINIT through the
generic emulator, export kvm_propagate_fault() so that the EINIT
handler can inject the proper page fault.  Rename the function to
kvm_propagate_page_fault() to clarify that it is only for page
faults, and WARN if it is invoked with an exception other than
PF_VECTOR.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/x86.c              | 7 +++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1341d8390ebe..397d755bb353 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1357,6 +1357,7 @@ void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
+bool kvm_propagate_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
 int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			    gfn_t gfn, void *data, int offset, int len,
 			    u32 access);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2b64bb854571..ec92c5534336 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -587,8 +587,10 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 }
 EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
 
-static bool kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
+bool kvm_propagate_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 {
+	WARN_ON(fault->vector != PF_VECTOR);
+
 	if (mmu_is_nested(vcpu) && !fault->nested_page_fault)
 		vcpu->arch.nested_mmu.inject_page_fault(vcpu, fault);
 	else
@@ -596,6 +598,7 @@ static bool kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fau
 
 	return fault->nested_page_fault;
 }
+EXPORT_SYMBOL_GPL(kvm_propagate_page_fault);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
@@ -6089,7 +6092,7 @@ static bool inject_emulated_exception(struct kvm_vcpu *vcpu)
 {
 	struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
 	if (ctxt->exception.vector == PF_VECTOR)
-		return kvm_propagate_fault(vcpu, &ctxt->exception);
+		return kvm_propagate_page_fault(vcpu, &ctxt->exception);
 
 	if (ctxt->exception.error_code_valid)
 		kvm_queue_exception_e(vcpu, ctxt->exception.vector,
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 12/21] KVM: x86: Define new #PF SGX error code bit
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (10 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 11/21] KVM: x86: Export kvm_propagate_fault (as kvm_propagate_page_fault) Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 13/21] x86/sgx: Move the intermediate EINIT helper into the driver Sean Christopherson
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Page faults that are signaled by the SGX Enclave Page Cache Map (EPCM),
as opposed to the traditional IA32/EPT page tables, set an SGX bit in
the error code to indicate that the #PF was induced by SGX.  KVM will
need to emulate this behavior as part of its trap-and-execute-EINIT
scheme needed to virtualize SGX Launch Control, e.g. if EINIT itself
faults due to the EPC being zapped by hardware after suspend-resume.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 397d755bb353..103df8cbdd24 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -201,6 +201,7 @@ enum {
 #define PFERR_RSVD_BIT 3
 #define PFERR_FETCH_BIT 4
 #define PFERR_PK_BIT 5
+#define PFERR_SGX_BIT 15
 #define PFERR_GUEST_FINAL_BIT 32
 #define PFERR_GUEST_PAGE_BIT 33
 
@@ -210,6 +211,7 @@ enum {
 #define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
 #define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
 #define PFERR_PK_MASK (1U << PFERR_PK_BIT)
+#define PFERR_SGX_MASK (1U << PFERR_SGX_BIT)
 #define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
 #define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 13/21] x86/sgx: Move the intermediate EINIT helper into the driver
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (11 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 12/21] KVM: x86: Define new #PF SGX error code bit Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 14/21] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Sean Christopherson
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Providing sgx_einit() in the common SGX code was a bit premature.  The
thought was that the native SGX driver and KVM would be able to use a
common EINIT helper, but that may or may not hold true depending on
how KVM's implementation shakes out.  For example, KVM may want to pass
user pointers directly to EINIT in order to avoid copying large amounts
of data to in-kernel temp structures.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 21 +++++++++++--
 arch/x86/kernel/cpu/sgx/main.c         | 43 ++++++--------------------
 arch/x86/kernel/cpu/sgx/sgx.h          |  4 +--
 3 files changed, 30 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index b7aa06920d10..a1cb5f772363 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -658,6 +658,23 @@ static int sgx_get_key_hash(const void *modulus, void *hash)
 	return ret;
 }
 
+static int __sgx_einit(struct sgx_sigstruct *sigstruct,
+		       struct sgx_einittoken *token, struct sgx_epc_page *secs,
+		       u64 *lepubkeyhash)
+{
+	int ret;
+
+	preempt_disable();
+	sgx_update_lepubkeyhash_msrs(lepubkeyhash, false);
+	ret = __einit(sigstruct, token, sgx_epc_addr(secs));
+	if (ret == SGX_INVALID_EINITTOKEN) {
+		sgx_update_lepubkeyhash_msrs(lepubkeyhash, true);
+		ret = __einit(sigstruct, token, sgx_epc_addr(secs));
+	}
+	preempt_enable();
+	return ret;
+}
+
 static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 			 struct sgx_einittoken *token)
 {
@@ -686,8 +703,8 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 
 	for (i = 0; i < SGX_EINIT_SLEEP_COUNT; i++) {
 		for (j = 0; j < SGX_EINIT_SPIN_COUNT; j++) {
-			ret = sgx_einit(sigstruct, token, encl->secs.epc_page,
-					mrsigner);
+			ret = __sgx_einit(sigstruct, token,
+					  encl->secs.epc_page, mrsigner);
 			if (ret == SGX_UNMASKED_EVENT)
 				continue;
 			else
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 532dd90e09e1..542427c6ae9c 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -166,7 +166,15 @@ void sgx_free_page(struct sgx_epc_page *page)
 	WARN(ret > 0, "sgx: EREMOVE returned %d (0x%x)", ret, ret);
 }
 
-static void sgx_update_lepubkeyhash_msrs(u64 *lepubkeyhash, bool enforce)
+/**
+ * sgx_update_lepubkeyhash_msrs - Write the IA32_SGXLEPUBKEYHASHx MSRs
+ * @lepubkeyhash:	array of desired MSRs values
+ * @enforce:		force WRMSR regardless of cache status
+ *
+ * Write the IA32_SGXLEPUBKEYHASHx MSRs according to @lepubkeyhash if the
+ * last cached value doesn't match the desired value, or if @enforce is %true.
+ */
+void sgx_update_lepubkeyhash_msrs(u64 *lepubkeyhash, bool enforce)
 {
 	u64 *cache;
 	int i;
@@ -180,39 +188,6 @@ static void sgx_update_lepubkeyhash_msrs(u64 *lepubkeyhash, bool enforce)
 	}
 }
 
-/**
- * sgx_einit - initialize an enclave
- * @sigstruct:		a pointer a SIGSTRUCT
- * @token:		a pointer an EINITTOKEN (optional)
- * @secs:		a pointer a SECS
- * @lepubkeyhash:	the desired value for IA32_SGXLEPUBKEYHASHx MSRs
- *
- * Execute ENCLS[EINIT], writing the IA32_SGXLEPUBKEYHASHx MSRs according
- * to @lepubkeyhash (if possible and necessary).
- *
- * Return:
- *   0 on success,
- *   -errno or SGX error on failure
- */
-int sgx_einit(struct sgx_sigstruct *sigstruct, struct sgx_einittoken *token,
-	      struct sgx_epc_page *secs, u64 *lepubkeyhash)
-{
-	int ret;
-
-	if (!boot_cpu_has(X86_FEATURE_SGX_LC))
-		return __einit(sigstruct, token, sgx_epc_addr(secs));
-
-	preempt_disable();
-	sgx_update_lepubkeyhash_msrs(lepubkeyhash, false);
-	ret = __einit(sigstruct, token, sgx_epc_addr(secs));
-	if (ret == SGX_INVALID_EINITTOKEN) {
-		sgx_update_lepubkeyhash_msrs(lepubkeyhash, true);
-		ret = __einit(sigstruct, token, sgx_epc_addr(secs));
-	}
-	preempt_enable();
-	return ret;
-}
-
 static __init void sgx_free_epc_section(struct sgx_epc_section *section)
 {
 	struct sgx_epc_page *page;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 748b1633d770..3f3311024bd0 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -85,8 +85,8 @@ void sgx_reclaim_pages(void);
 struct sgx_epc_page *sgx_alloc_page(void *owner, bool reclaim);
 int __sgx_free_page(struct sgx_epc_page *page);
 void sgx_free_page(struct sgx_epc_page *page);
-int sgx_einit(struct sgx_sigstruct *sigstruct, struct sgx_einittoken *token,
-	      struct sgx_epc_page *secs, u64 *lepubkeyhash);
+
+void sgx_update_lepubkeyhash_msrs(u64 *lepubkeyhash, bool enforce);
 
 #define SGX_ENCL_DEV_MINOR	0
 #define SGX_PROV_DEV_MINOR	1
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 14/21] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (12 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 13/21] x86/sgx: Move the intermediate EINIT helper into the driver Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 15/21] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions Sean Christopherson
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Provide wrappers around __ecreate() and __einit() to export their
functionality for use by KVM without having to export a large amount of
SGX boilerplate code.  Intermediate helpers also shelter KVM from the
ugliness of overloading the ENCLS return value to encode multiple error
formats in a single int.

KVM will use the helpers to trap-and-execute ECREATE and EINIT as part
its SGX virtualization.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig               |  3 ++
 arch/x86/include/asm/sgx.h     | 15 ++++++++++
 arch/x86/kernel/cpu/sgx/virt.c | 55 ++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+)
 create mode 100644 arch/x86/include/asm/sgx.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c1bdb9f85928..8bbc6a30588d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1969,6 +1969,9 @@ config INTEL_SGX_VIRTUALIZATION
 	  "raw" EPC for the purpose of exposing EPC to a KVM guest, i.e. a
 	  virtual machine, via a device node (/dev/sgx/virt_epc by default).
 
+	  SGX virtualization also adds helpers that are used by KVM to trap
+	  and execute certain ENCLS instructions on behalf of a KVM guest.
+
 	  If unsure, say N.
 
 config EFI
diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
new file mode 100644
index 000000000000..f0f0176b8e2f
--- /dev/null
+++ b/arch/x86/include/asm/sgx.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_SGX_H
+#define _ASM_X86_SGX_H
+
+#include <linux/types.h>
+
+struct sgx_pageinfo;
+
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+int sgx_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs, int *trapnr);
+int sgx_einit(void __user *sigstruct, void __user *token,
+	      void __user *secs, u64 *lepubkeyhash, int *trapnr);
+#endif
+
+#endif /* _ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 79ee5917a4fc..9e5bf4450bf7 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -251,3 +251,58 @@ int __init sgx_virt_epc_init(void)
 
 	return ret;
 }
+
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+int sgx_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs, int *trapnr)
+{
+	int ret;
+
+	__uaccess_begin();
+	ret = __ecreate(pageinfo, (void *)secs);
+	__uaccess_end();
+
+	if (encls_faulted(ret)) {
+		*trapnr = ENCLS_TRAPNR(ret);
+		return -EFAULT;
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sgx_ecreate);
+
+static int __sgx_einit(void __user *sigstruct, void __user *token,
+		       void __user *secs)
+{
+	int ret;
+
+	__uaccess_begin();
+	ret =  __einit((void *)sigstruct, (void *)token, (void *)secs);
+	__uaccess_end();
+	return ret;
+}
+
+int sgx_einit(void __user *sigstruct, void __user *token,
+	      void __user *secs, u64 *lepubkeyhash, int *trapnr)
+{
+	int ret;
+
+	if (!boot_cpu_has(X86_FEATURE_SGX_LC)) {
+		ret = __sgx_einit(sigstruct, token, secs);
+	} else {
+		preempt_disable();
+		sgx_update_lepubkeyhash_msrs(lepubkeyhash, false);
+		ret = __sgx_einit(sigstruct, token, secs);
+		if (ret == SGX_INVALID_EINITTOKEN) {
+			sgx_update_lepubkeyhash_msrs(lepubkeyhash, true);
+			ret = __sgx_einit(sigstruct, token, secs);
+		}
+		preempt_enable();
+	}
+
+	if (encls_faulted(ret)) {
+		*trapnr = ENCLS_TRAPNR(ret);
+		return -EFAULT;
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sgx_einit);
+#endif
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 15/21] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (13 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 14/21] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 16/21] KVM: VMX: Edd emulation of SGX Launch Control LE hash MSRs Sean Christopherson
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

Userspace can restrict what bits can be set in MISCSELECT, ATTRIBUTES
and XFRM via CPUID.  Intercept ECREATE when any of the aforementioned
masks diverges from hardware in order to enforce the desired CPUID
model, i.e. inject #GP if the guest attempts to set a bit that hasn't
been enumerated as allowed-1 in CPUID.

Add the handler in a dedicated SGX file under the VMX sub-directory so
as to confine the ugliness of the SGX specific code (re-executing ENCLS
leafs is messy due to the need to follow pointers from structs, get EPC
pages, etc...) and to save compilation cycles when SGX functionality is
disabled in the kernel.  The ENCLS handlers will soon grow to ~300 lines
of code when Launch Control support is added, and in the distant future
could balloon significantly if/when EPC oversubscription is supported.

Actual usage of the handler will be added in a future patch, i.e. when
SGX virtualization is fully enabled.

Note, access to the PROVISIONKEY is not yet supported.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |   3 +
 arch/x86/include/asm/sgx_arch.h |   1 +
 arch/x86/kvm/Makefile           |   2 +
 arch/x86/kvm/vmx/sgx.c          | 223 ++++++++++++++++++++++++++++++++
 4 files changed, 229 insertions(+)
 create mode 100644 arch/x86/kvm/vmx/sgx.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 103df8cbdd24..27841a5d7851 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -928,6 +928,9 @@ struct kvm_arch {
 
 	bool guest_can_read_msr_platform_info;
 	bool exception_payload_enabled;
+
+	/* Guest can access the SGX PROVISIONKEY. */
+	bool sgx_provisioning_allowed;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/include/asm/sgx_arch.h b/arch/x86/include/asm/sgx_arch.h
index 39f731580ea8..e06f3ff717b4 100644
--- a/arch/x86/include/asm/sgx_arch.h
+++ b/arch/x86/include/asm/sgx_arch.h
@@ -8,6 +8,7 @@
 #ifndef _ASM_X86_SGX_ARCH_H
 #define _ASM_X86_SGX_ARCH_H
 
+#include <linux/bits.h>
 #include <linux/types.h>
 
 #define SGX_CPUID				0x12
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 31ecf7a76d5a..f919c3e6abd7 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -13,6 +13,8 @@ kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
 			   hyperv.o page_track.o debugfs.o
 
 kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o
+kvm-intel-$(CONFIG_INTEL_SGX_VIRTUALIZATION) += vmx/sgx.o
+
 kvm-amd-y		+= svm.o pmu_amd.o
 
 obj-$(CONFIG_KVM)	+= kvm.o
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
new file mode 100644
index 000000000000..5b08e7dcc3a3
--- /dev/null
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <asm/sgx.h>
+#include <asm/sgx_arch.h>
+
+#include "cpuid.h"
+#include "kvm_cache_regs.h"
+#include "vmx.h"
+#include "x86.h"
+
+/*
+ * ENCLS's memory operands use a fixed segment (DS) and a fixed
+ * address size based on the mode.  Related prefixes are ignored.
+ */
+static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
+			     int size, int alignment, gva_t *gva)
+{
+	struct kvm_segment s;
+	bool fault;
+
+	vmx_get_segment(vcpu, &s, VCPU_SREG_DS);
+
+	*gva = s.base + offset;
+
+	if (!IS_ALIGNED(*gva, alignment)) {
+		fault = true;
+	} else if (is_long_mode(vcpu)) {
+		fault = is_noncanonical_address(*gva, vcpu);
+	} else {
+		*gva &= 0xffffffff;
+		fault = (s.unusable) ||
+			(s.type != 2 && s.type != 3) ||
+			(*gva > s.limit) ||
+			((s.base != 0 || s.limit != 0xffffffff) &&
+			(((u64)*gva + size - 1) > s.limit + 1));
+	}
+	if (fault)
+		kvm_inject_gp(vcpu, 0);
+	return fault ? -EINVAL : 0;
+}
+
+static int sgx_read_gva(struct kvm_vcpu *vcpu, gva_t gva, void *data,
+			 unsigned int size)
+{
+	struct x86_exception ex;
+
+	if (kvm_read_guest_virt(vcpu, gva, data, size, &ex)) {
+		kvm_propagate_page_fault(vcpu, &ex);
+		return -EFAULT;
+	}
+	return 0;
+}
+
+static int sgx_read_hva(struct kvm_vcpu *vcpu, unsigned long hva, void *data,
+			unsigned int size)
+{
+	if (__copy_from_user(data, (void __user *)hva, size)) {
+		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+		vcpu->run->internal.ndata = 2;
+		vcpu->run->internal.data[0] = hva;
+		vcpu->run->internal.data[1] = size;
+		return -EFAULT;
+	}
+	return 0;
+}
+
+static int sgx_gva_to_hva(struct kvm_vcpu *vcpu, gva_t gva, bool write,
+			  unsigned long *hva)
+{
+	struct x86_exception ex;
+	gpa_t gpa;
+
+	if (write)
+		gpa = kvm_mmu_gva_to_gpa_write(vcpu, gva, &ex);
+	else
+		gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, &ex);
+
+	if (gpa == UNMAPPED_GVA) {
+		kvm_propagate_page_fault(vcpu, &ex);
+		return -EFAULT;
+	}
+
+	*hva = kvm_vcpu_gfn_to_hva(vcpu, PFN_DOWN(gpa));
+	if (kvm_is_error_hva(*hva)) {
+		ex.vector = PF_VECTOR;
+		ex.error_code = PFERR_PRESENT_MASK;
+		if (write)
+			ex.error_code |= PFERR_WRITE_MASK;
+		ex.address = gva;
+		ex.error_code_valid = true;
+		ex.nested_page_fault = false;
+		kvm_propagate_page_fault(vcpu, &ex);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int sgx_encls_postamble(struct kvm_vcpu *vcpu, int ret, int trapnr,
+			       gva_t gva)
+{
+	struct x86_exception ex;
+	unsigned long rflags;
+
+	if (ret == -EFAULT)
+		goto handle_fault;
+
+	rflags = vmx_get_rflags(vcpu) & ~(X86_EFLAGS_CF | X86_EFLAGS_PF |
+					  X86_EFLAGS_AF | X86_EFLAGS_SF |
+					  X86_EFLAGS_OF);
+	if (ret)
+		rflags |= X86_EFLAGS_ZF;
+	else
+		rflags &= ~X86_EFLAGS_ZF;
+	vmx_set_rflags(vcpu, rflags);
+
+	kvm_rax_write(vcpu, ret);
+	return kvm_skip_emulated_instruction(vcpu);
+
+handle_fault:
+	/*
+	 * A non-EPCM #PF indicates a bad userspace HVA.  This *should* check
+	 * for PFEC.SGX and not assume any #PF on SGX2 originated in the EPC,
+	 * but the error code isn't (yet) plumbed through the ENCLS helpers.
+	 */
+	if (trapnr == PF_VECTOR && !boot_cpu_has(X86_FEATURE_SGX2)) {
+		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+		vcpu->run->internal.ndata = 0;
+		return 0;
+	}
+
+	/*
+	 * If the guest thinks it's running on SGX2 hardware, inject an SGX
+	 * #PF if the fault matches an EPCM fault signature (#GP on SGX1,
+	 * #PF on SGX2).  The assumption is that EPCM faults are much more
+	 * likely than a bad userspace address.
+	 */
+	if ((trapnr == PF_VECTOR || !boot_cpu_has(X86_FEATURE_SGX2)) &&
+	    guest_cpuid_has(vcpu, X86_FEATURE_SGX2)) {
+		ex.vector = PF_VECTOR;
+		ex.error_code = PFERR_PRESENT_MASK | PFERR_WRITE_MASK |
+				PFERR_SGX_MASK;
+		ex.address = gva;
+		ex.error_code_valid = true;
+		ex.nested_page_fault = false;
+		kvm_inject_page_fault(vcpu, &ex);
+	} else {
+		kvm_inject_gp(vcpu, 0);
+	}
+	return 1;
+}
+
+int handle_encls_ecreate(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *sgx_12_0, *sgx_12_1;
+	unsigned long a_hva, m_hva, x_hva, secs_hva;
+	struct sgx_pageinfo pageinfo;
+	gva_t pageinfo_gva, secs_gva;
+	u64 attributes, xfrm;
+	int ret, trapnr;
+	u32 miscselect;
+
+	sgx_12_0 = kvm_find_cpuid_entry(vcpu, 0x12, 0);
+	sgx_12_1 = kvm_find_cpuid_entry(vcpu, 0x12, 1);
+	if (!sgx_12_0 || !sgx_12_1) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 32, 32, &pageinfo_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva))
+		return 1;
+
+	/*
+	 * Copy the PAGEINFO to local memory, its pointers need to be
+	 * translated, i.e. we need to do a deep copy/translate.
+	 */
+	if (sgx_read_gva(vcpu, pageinfo_gva, &pageinfo, sizeof(pageinfo)))
+		return 1;
+
+	/* Translate the SECINFO, SOURCE and SECS pointers from GVA to HVA. */
+	if (sgx_gva_to_hva(vcpu, pageinfo.metadata, false,
+			   (unsigned long *)&pageinfo.metadata) ||
+	    sgx_gva_to_hva(vcpu, pageinfo.contents, false,
+			   (unsigned long *)&pageinfo.contents) ||
+	    sgx_gva_to_hva(vcpu, secs_gva, true, &secs_hva))
+		return 1;
+
+	m_hva = pageinfo.contents + offsetof(struct sgx_secs, miscselect);
+	a_hva = pageinfo.contents + offsetof(struct sgx_secs, attributes);
+	x_hva = pageinfo.contents + offsetof(struct sgx_secs, xfrm);
+
+	/* Exit to userspace if copying from a host userspace address fails. */
+	if (sgx_read_hva(vcpu, m_hva, &miscselect, sizeof(miscselect)) ||
+	    sgx_read_hva(vcpu, a_hva, &attributes, sizeof(attributes)) ||
+	    sgx_read_hva(vcpu, x_hva, &xfrm, sizeof(xfrm)))
+		return 0;
+
+	/* Enforce restriction of access to the PROVISIONKEY. */
+	if (!vcpu->kvm->arch.sgx_provisioning_allowed &&
+	    (attributes & SGX_ATTR_PROVISIONKEY)) {
+		if (sgx_12_1->eax & SGX_ATTR_PROVISIONKEY)
+			pr_warn_once("KVM: SGX PROVISIONKEY advertised but not allowed\n");
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	/* Enforce CPUID restrictions on MISCSELECT, ATTRIBUTES and XFRM. */
+	if ((u32)miscselect & ~sgx_12_0->ebx ||
+	    (u32)attributes & ~sgx_12_1->eax ||
+	    (u32)(attributes >> 32) & ~sgx_12_1->ebx ||
+	    (u32)xfrm & ~sgx_12_1->ecx ||
+	    (u32)(xfrm >> 32) & ~sgx_12_1->edx) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	ret = sgx_ecreate(&pageinfo, (void __user *)secs_hva, &trapnr);
+
+	return sgx_encls_postamble(vcpu, ret, trapnr, secs_gva);
+}
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 16/21] KVM: VMX: Edd emulation of SGX Launch Control LE hash MSRs
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (14 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 15/21] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 17/21] KVM: VMX: Add handler for ENCLS[EINIT] to support SGX Launch Control Sean Christopherson
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

SGX Launch Control (LC) modifies the behavior of ENCLS[EINIT] to query
a set of user-controllable MSRs (Launch Enclave, a.k.a. LE, Hash MSRs)
when verifying the key used to sign an enclave.  On CPUs without LC
support, the public key hash of allowed LEs is hardwired into the CPU to
an Intel controlled key (the Intel key is also the reset value of the LE
hash MSRs).  Track the guest's desired hash and stuff it into hardware
when executing EINIT on behalf of the guest (in a future patch).

Note, KVM allows writes to the LE hash MSRs if IA32_FEATURE_CONTROL is
unlocked.  This is technically not arch behavior, but it's roughly
equivalent to the arch behavior of the MSRs being writable prior to
activating SGX[1].  Emulating SGX activation is feasible, but adds no
tangible benefits and would just create extra work for KVM and guest
firmware.

[1] SGX related bits in IA32_FEATURE_CONTROL cannot be set until SGX
    is activated, e.g. by firmware.  SGX activation is triggered by
    setting bit 0 in MSR 0x7a.  Until SGX is activated, the LE hash
    MSRs are writable, e.g. to allow firmware to lock down the LE
    root key with a non-Intel value.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h |  2 ++
 2 files changed, 44 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index abcd2f7a36f5..819c47fee157 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -390,6 +390,8 @@ static const struct kvm_vmx_segment_field {
 
 u64 host_efer;
 
+static u64 sgx_pubkey_hash[4] __ro_after_init;
+
 /*
  * Though SYSCALL is only supported in 64-bit mode on Intel CPUs, kvm
  * will emulate SYSCALL in legacy mode if the vendor string in guest
@@ -1740,6 +1742,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_FEATURE_CONTROL:
 		msr_info->data = vmx->msr_ia32_feature_control;
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
+			return 1;
+		msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
+			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!nested_vmx_allowed(vcpu))
 			return 1;
@@ -1953,6 +1962,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (msr_info->host_initiated && data == 0)
 			vmx_leave_nested(vcpu);
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		if (!msr_info->host_initiated &&
+		    (!guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC) ||
+		    ((vmx->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED) &&
+		    !(vmx->msr_ia32_feature_control & FEATURE_CONTROL_SGX_LE_WR))))
+			return 1;
+		vmx->msr_ia32_sgxlepubkeyhash
+			[msr_index - MSR_IA32_SGXLEPUBKEYHASH0] = data;
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!msr_info->host_initiated)
 			return 1; /* they are read-only */
@@ -6698,6 +6716,9 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	else
 		memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs));
 
+	memcpy(vmx->msr_ia32_sgxlepubkeyhash, sgx_pubkey_hash,
+	       sizeof(sgx_pubkey_hash));
+
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = -1ull;
 
@@ -7588,6 +7609,27 @@ static __init int hardware_setup(void)
 	if (!enable_ept || !cpu_has_vmx_intel_pt())
 		pt_mode = PT_MODE_SYSTEM;
 
+	/*
+	 * Use Intel's default value for Skylake hardware if Launch Control is
+	 * not supported, i.e. Intel's hash is hardcoded into silicon, or if
+	 * Launch Control is supported and enabled, i.e. mimic the reset value
+	 * and let the guest write the MSRs at will.  If Launch Control is
+	 * supported but disabled, then we have to use the current MSR values
+	 * as the MSRs the hash MSRs exist but are locked and not writable.
+	 */
+	if (boot_cpu_has(X86_FEATURE_SGX_LC) ||
+	    rdmsrl_safe(MSR_IA32_SGXLEPUBKEYHASH0, &sgx_pubkey_hash[0])) {
+		sgx_pubkey_hash[0] = 0xa6053e051270b7acULL;
+		sgx_pubkey_hash[1] = 0x6cfbe8ba8b3b413dULL;
+		sgx_pubkey_hash[2] = 0xc4916d99f2b3735dULL;
+		sgx_pubkey_hash[3] = 0xd4f8c05909f9bb3bULL;
+	} else {
+		/* MSR_IA32_SGXLEPUBKEYHASH0 is read above */
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH1, sgx_pubkey_hash[1]);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH2, sgx_pubkey_hash[2]);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH3, sgx_pubkey_hash[3]);
+	}
+
 	if (nested) {
 		nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
 					   vmx_capability.ept, enable_apicv);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 6d1b57e0337e..1519c6918190 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -272,6 +272,8 @@ struct vcpu_vmx {
 	 */
 	u64 msr_ia32_feature_control;
 	u64 msr_ia32_feature_control_valid_bits;
+	/* SGX Launch Control public key hash */
+	u64 msr_ia32_sgxlepubkeyhash[4];
 	u64 ept_pointer;
 
 	struct pt_desc pt_desc;
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 17/21] KVM: VMX: Add handler for ENCLS[EINIT] to support SGX Launch Control
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (15 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 16/21] KVM: VMX: Edd emulation of SGX Launch Control LE hash MSRs Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 18/21] KVM: x86: Invoke kvm_x86_ops->cpuid_update() after kvm_update_cpuid() Sean Christopherson
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

SGX Launch Control (LC) modifies the behavior of ENCLS[EINIT] to query
a set of user-controllable MSRs (Launch Enclave, a.k.a. LE, Hash MSRs)
when verifying the key used to sign an enclave.  On CPUs without LC
support, the public key hash of allowed LEs is hardwired into the CPU to
an Intel controlled key (the Intel key is also the reset value of the LE
hash MSRs).

When LC is enabled in the host, EINIT must be intercepted and executed
in the host using the guest's LE hash MSR value, even if the guest's
values are fixed to hardware default values.  The MSRs are not switched
on VM-Enter/VM-Exit as writing the MSRs is extraordinarily expensive,
e.g. each WRMSR is 4x slower than a regular WRMSR and on par with a full
VM-Enter -> VM-Exit transition.  Furthermore, as the MSRS aren't allowed
in the hardware-supported lists, i.e. would need to be manually read and
written.  On the other hand, EINIT takes tens of thousands of cycles to
execute (it's so slow that it's interruptible), i.e. the ~1k cycles of
overhead to trap-and-execute EINIT is unlikely to be noticed by the
guest, let alone impact the overall performance of SGX.

Actual usage of the handler will be added in a future patch, i.e. when
SGX virtualization is fully enabled.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/sgx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 5b08e7dcc3a3..2bcfa3b6c75e 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -221,3 +221,27 @@ int handle_encls_ecreate(struct kvm_vcpu *vcpu)
 
 	return sgx_encls_postamble(vcpu, ret, trapnr, secs_gva);
 }
+
+int handle_encls_einit(struct kvm_vcpu *vcpu)
+{
+	unsigned long sig_hva, secs_hva, token_hva;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	gva_t sig_gva, secs_gva, token_gva;
+	int ret, trapnr;
+
+	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 1808, 4096, &sig_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_rdx_read(vcpu), 304, 512, &token_gva))
+		return 1;
+
+	if (sgx_gva_to_hva(vcpu, sig_gva, false, &sig_hva) ||
+	    sgx_gva_to_hva(vcpu, secs_gva, true, &secs_hva) ||
+	    sgx_gva_to_hva(vcpu, token_gva, false, &token_hva))
+		return 1;
+
+	ret = sgx_einit((void __user *)sig_hva, (void __user *)token_hva,
+			(void __user *)secs_hva, vmx->msr_ia32_sgxlepubkeyhash,
+			&trapnr);
+
+	return sgx_encls_postamble(vcpu, ret, trapnr, secs_hva);
+}
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 18/21] KVM: x86: Invoke kvm_x86_ops->cpuid_update() after kvm_update_cpuid()
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (16 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 17/21] KVM: VMX: Add handler for ENCLS[EINIT] to support SGX Launch Control Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 19/21] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC Sean Christopherson
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

VMX's virtualization of SGX adds a lovely dependency on the guest's
supported xcr0, which is calculated in kvm_update_cpuid().  VMX must
toggled its interception of SGX instructions based on the supported
xcr0, i.e. kvm_x86_ops->cpuid_update() is certainly the correct
location for the dependent code.

kvm_update_cpuid() was originally added by commit 2acf923e38fb ("KVM:
VMX: Enable XSAVE/XRSTOR for guest").  There is no indication that its
placement after kvm_x86_ops->cpuid_update() was anything more than a
"new function at the end" decision.

Inspection of the current code reveals no dependency on kvm_x86_ops's
cpuid_update() in kvm_update_cpuid() or any of its helpers.

  - SVM's sole update is to conditionally clear X86_FEATURE_X2APIC.
    X86_FEATURE_X2APIC is only consumed by kvm_apic_set_state(), which
    is already called immediately prior to kvm_x86_ops->cpuid_update().

  - VMX updates only nested VMX MSRs, allowed FEATURE_CONTROL bits,
    and VMCS fields, e.g. secondary execution controls, none of which
    should bleed back into kvm_update_cpuid() barring an egregious
    dependency bug somewhere else.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/cpuid.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 70e488951f25..4c235af5318c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -222,8 +222,9 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
 	vcpu->arch.cpuid_nent = cpuid->nent;
 	cpuid_fix_nx_cap(vcpu);
 	kvm_apic_set_version(vcpu);
-	kvm_x86_ops->cpuid_update(vcpu);
 	r = kvm_update_cpuid(vcpu);
+	if (!r)
+		kvm_x86_ops->cpuid_update(vcpu);
 
 out:
 	vfree(cpuid_entries);
@@ -245,8 +246,9 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
 		goto out;
 	vcpu->arch.cpuid_nent = cpuid->nent;
 	kvm_apic_set_version(vcpu);
-	kvm_x86_ops->cpuid_update(vcpu);
 	r = kvm_update_cpuid(vcpu);
+	if (!r)
+		kvm_x86_ops->cpuid_update(vcpu);
 out:
 	return r;
 }
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 19/21] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (17 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 18/21] KVM: x86: Invoke kvm_x86_ops->cpuid_update() after kvm_update_cpuid() Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 20/21] x86/sgx: Export sgx_set_attribute() for use by KVM Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 21/21] KVM: x86: Add capability to grant VM access to privileged SGX attribute Sean Christopherson
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

SGX adds a basic support bit to CPUID(7, 0), and enumerates SGX
capabilities, e.g. EPC info, ENCLS leafs, etc..., in CPUID(0x12, *).
All SGX1 and SGX2 ENCLS leafs (supported in hardware) can be exposed
to the guest unconditionally.  All other ENCLS leafs (currently the
ENCLS_C leafs) and all ENCLV leafs currently cannot be exposed to the
guest.

Flexible Launch Control, a.k.a. SGX LC, allows software to control the
key that is used to verify the signer of an enclave.  Because SGX LC
impacts guest operation even if it's not exposed to the guest, i.e.
EINIT is affected by hardware's LE hash MSRs, SGX cannot be exposed to
the guest if the host supports LC without explicit LC support in KVM.
In other words, LC support is required to run on platforms with LC
enabled in the host, thus making exposure of SGX LC to the guest a
formality.

Access to the provision key is not supported in this patch.  Access to
the provision key is controlled via securityfs, a future patch will
plumb in the ability for the userspace hypervisor to grant a VM access
to the provision key by passing in an appropriate file descriptor.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/cpuid.c      |  72 +++++++++++++++++-
 arch/x86/kvm/vmx/nested.c |  19 ++++-
 arch/x86/kvm/vmx/nested.h |   5 ++
 arch/x86/kvm/vmx/sgx.h    |  11 +++
 arch/x86/kvm/vmx/vmcs12.c |   1 +
 arch/x86/kvm/vmx/vmcs12.h |   4 +-
 arch/x86/kvm/vmx/vmx.c    | 156 ++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/vmx.h    |   1 +
 8 files changed, 254 insertions(+), 15 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/sgx.h

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 4c235af5318c..73a0326a1968 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -18,6 +18,7 @@
 #include <asm/processor.h>
 #include <asm/user.h>
 #include <asm/fpu/xstate.h>
+#include <asm/sgx_arch.h>
 #include "cpuid.h"
 #include "lapic.h"
 #include "mmu.h"
@@ -117,6 +118,21 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 	if (best && (best->eax & (F(XSAVES) | F(XSAVEC))))
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
 
+	/*
+	 * Bits 127:0 of the allowed SECS.ATTRIBUTES (CPUID.0x12.0x1) enumerate
+	 * the supported XSAVE Feature Request Mask (XFRM), i.e. the enclave's
+	 * requested XCR0 value.  The enclave's XFRM must be a subset of XCRO
+	 * at the time of EENTER, thus adjust the allowed XFRM by the guest's
+	 * supported XCR0.  Similar to XCR0 handling, FP and SSE are forced to
+	 * '1' even on CPUs that don't support XSAVE.
+	 */
+	best = kvm_find_cpuid_entry(vcpu, 0x12, 0x1);
+	if (best) {
+		best->ecx &= vcpu->arch.guest_supported_xcr0 & 0xffffffff;
+		best->edx &= vcpu->arch.guest_supported_xcr0 >> 32;
+		best->ecx |= XFEATURE_MASK_FPSSE;
+	}
+
 	/*
 	 * The existing code assumes virtual address is 48-bit or 57-bit in the
 	 * canonical address checks; exit if it is ever changed.
@@ -393,7 +409,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) |
 		F(ADX) | F(SMAP) | F(AVX512IFMA) | F(AVX512F) | F(AVX512PF) |
 		F(AVX512ER) | F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(AVX512DQ) |
-		F(SHA_NI) | F(AVX512BW) | F(AVX512VL) | f_intel_pt;
+		F(SHA_NI) | F(AVX512BW) | F(AVX512VL) | f_intel_pt | F(SGX);
 
 	/* cpuid 0xD.1.eax */
 	const u32 kvm_cpuid_D_1_eax_x86_features =
@@ -404,7 +420,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
-		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
+		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | F(SGX_LC);
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
@@ -412,6 +428,23 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
 		F(MD_CLEAR);
 
+	/* cpuid 12.0.eax*/
+	const u32 kvm_cpuid_12_0_eax_x86_features =
+		F(SGX1) | F(SGX2) | 0 /* Reserved */ | 0 /* Reserved */ |
+		0 /* Reserved */ | 0 /* ENCLV */ | 0 /* ENCLS_C */;
+
+	/* cpuid 12.0.ebx*/
+	const u32 kvm_cpuid_12_0_ebx_sgx_features =
+		SGX_MISC_EXINFO;
+
+	/* cpuid 12.1.eax*/
+	const u32 kvm_cpuid_12_1_eax_sgx_features =
+		SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | 0 /* PROVISIONKEY */ |
+		SGX_ATTR_EINITTOKENKEY | SGX_ATTR_KSS;
+
+	/* cpuid 12.1.ebx*/
+	const u32 kvm_cpuid_12_1_ebx_sgx_features = 0;
+
 	/*
 	 * The code below assumes index == 0, which simplifies handling leafs
 	 * with a dynamic number of sub-leafs.  The index is fully under KVM's
@@ -433,7 +466,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	switch (function) {
 	case 0:
-		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
+		entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0x12));
 		break;
 	case 1:
 		entry->edx &= kvm_cpuid_1_edx_x86_features;
@@ -607,6 +640,39 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		}
 		break;
 	}
+	case 0x12:
+		/* Intel SGX */
+		if (!boot_cpu_has(X86_FEATURE_SGX)) {
+			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+			break;
+		}
+
+		/*
+		 * Index 0: Sub-features, MISCSELECT (a.k.a extended features)
+		 * and max enclave sizes.   The SGX sub-features and MISCSELECT
+		 * are restricted by KVM and x86_capabilities (like most
+		 * feature flags), while enclave size is unrestricted.
+		 */
+		entry->eax &= kvm_cpuid_12_0_eax_x86_features;
+		entry->ebx &= kvm_cpuid_12_0_ebx_sgx_features;
+		cpuid_mask(&entry->eax, CPUID_LNX_3);
+		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+
+		/*
+		 * Index 1: SECS.ATTRIBUTES.  ATTRIBUTES are restricted a la
+		 * feature flags.  Unconditionally advertise all supported
+		 * flags, even privileged attributes that require explicit
+		 * opt-in from userspace.  ATTRIBUTES.XFRM is not adjusted
+		 * as userspace is expected to derive it from supported XCR0.
+		 */
+		if (*nent >= maxnent)
+			goto out;
+		do_cpuid_1_ent(++entry, 0x12, 0x1);
+		entry->eax &= kvm_cpuid_12_1_eax_sgx_features;
+		entry->ebx &= kvm_cpuid_12_1_ebx_sgx_features;
+		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+		++*nent;
+		break;
 	/* Intel PT */
 	case 0x14: {
 		int t, times = entry->eax;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fef4fb3e1aaa..6d0e46584133 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2074,7 +2074,7 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 			vmcs_write64(APIC_ACCESS_ADDR, -1ull);
 
 		if (exec_control & SECONDARY_EXEC_ENCLS_EXITING)
-			vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);
+			vmx_write_encls_bitmap(&vmx->vcpu, vmcs12);
 
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
 	}
@@ -5014,6 +5014,20 @@ static bool nested_vmx_exit_handled_cr(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+static bool nested_vmx_exit_handled_encls(struct kvm_vcpu *vcpu,
+					  struct vmcs12 *vmcs12)
+{
+	u32 encls_leaf;
+
+	if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENCLS_EXITING))
+		return false;
+
+	encls_leaf = vcpu->arch.regs[VCPU_REGS_RAX];
+	if (encls_leaf > 62)
+		encls_leaf = 63;
+	return (vmcs12->encls_exiting_bitmap & (1ULL << encls_leaf));
+}
+
 static bool nested_vmx_exit_handled_vmcs_access(struct kvm_vcpu *vcpu,
 	struct vmcs12 *vmcs12, gpa_t bitmap)
 {
@@ -5212,8 +5226,7 @@ bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason)
 		/* VM functions are emulated through L2->L0 vmexits. */
 		return false;
 	case EXIT_REASON_ENCLS:
-		/* SGX is never exposed to L1 */
-		return false;
+		return nested_vmx_exit_handled_encls(vcpu, vmcs12);
 	default:
 		return true;
 	}
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index e847ff1019a2..527bab90703d 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -226,6 +226,11 @@ static inline bool nested_cpu_has_save_preemption_timer(struct vmcs12 *vmcs12)
 	    VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
 }
 
+static inline bool nested_cpu_has_encls_exit(struct vmcs12 *vmcs12)
+{
+	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENCLS_EXITING);
+}
+
 /*
  * In nested virtualization, check if L1 asked to exit on external interrupts.
  * For most existing hypervisors, this will always return true.
diff --git a/arch/x86/kvm/vmx/sgx.h b/arch/x86/kvm/vmx/sgx.h
new file mode 100644
index 000000000000..2087d5f70883
--- /dev/null
+++ b/arch/x86/kvm/vmx/sgx.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_SGX_H
+#define __KVM_X86_SGX_H
+
+#ifdef CONFIG_INTEL_SGX_VIRTUALIZATION
+int handle_encls_ecreate(struct kvm_vcpu *vcpu);
+int handle_encls_einit(struct kvm_vcpu *vcpu);
+#endif
+
+#endif /* __KVM_X86_SGX_H */
+
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 53dfb401316d..355020944f6c 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -50,6 +50,7 @@ const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD64(VMREAD_BITMAP, vmread_bitmap),
 	FIELD64(VMWRITE_BITMAP, vmwrite_bitmap),
 	FIELD64(XSS_EXIT_BITMAP, xss_exit_bitmap),
+	FIELD64(ENCLS_EXITING_BITMAP, encls_exiting_bitmap),
 	FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
 	FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
 	FIELD64(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl),
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 337718fc8a36..82a6ccd91ce5 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -69,7 +69,8 @@ struct __packed vmcs12 {
 	u64 vm_function_control;
 	u64 eptp_list_address;
 	u64 pml_address;
-	u64 padding64[3]; /* room for future expansion */
+	u64 encls_exiting_bitmap;
+	u64 padding64[2]; /* room for future expansion */
 	/*
 	 * To allow migration of L1 (complete with its L2 guests) between
 	 * machines of different natural widths (32 or 64 bit), we cannot have
@@ -259,6 +260,7 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(vm_function_control, 296);
 	CHECK_OFFSET(eptp_list_address, 304);
 	CHECK_OFFSET(pml_address, 312);
+	CHECK_OFFSET(encls_exiting_bitmap, 320);
 	CHECK_OFFSET(cr0_guest_host_mask, 344);
 	CHECK_OFFSET(cr4_guest_host_mask, 352);
 	CHECK_OFFSET(cr0_read_shadow, 360);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 819c47fee157..ed0366641587 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -41,6 +41,7 @@
 #include <asm/mce.h>
 #include <asm/mmu_context.h>
 #include <asm/mshyperv.h>
+#include <asm/sgx_arch.h>
 #include <asm/spec-ctrl.h>
 #include <asm/virtext.h>
 #include <asm/vmx.h>
@@ -55,6 +56,7 @@
 #include "nested.h"
 #include "ops.h"
 #include "pmu.h"
+#include "sgx.h"
 #include "trace.h"
 #include "vmcs.h"
 #include "vmcs12.h"
@@ -1961,6 +1963,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx->msr_ia32_feature_control = data;
 		if (msr_info->host_initiated && data == 0)
 			vmx_leave_nested(vcpu);
+
+		/* SGX may be enabled/disabled by guest's firmware */
+		vmx_write_encls_bitmap(vcpu, NULL);
 		break;
 	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
 		if (!msr_info->host_initiated &&
@@ -4030,6 +4035,15 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
 		}
 	}
 
+	if (cpu_has_vmx_encls_vmexit() && nested) {
+		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX))
+			vmx->nested.msrs.secondary_ctls_high |=
+				SECONDARY_EXEC_ENCLS_EXITING;
+		else
+			vmx->nested.msrs.secondary_ctls_high &=
+				~SECONDARY_EXEC_ENCLS_EXITING;
+	}
+
 	vmx->secondary_exec_control = exec_control;
 }
 
@@ -4145,8 +4159,7 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 		vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
 	}
 
-	if (cpu_has_vmx_encls_vmexit())
-		vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);
+	vmx_write_encls_bitmap(&vmx->vcpu, NULL);
 
 	if (pt_mode == PT_MODE_HOST_GUEST) {
 		memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc));
@@ -5501,14 +5514,48 @@ static int handle_vmx_instruction(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static inline bool sgx_enabled_in_guest_bios(struct kvm_vcpu *vcpu)
+{
+	const u64 bits = FEATURE_CONTROL_SGX_ENABLE | FEATURE_CONTROL_LOCKED;
+
+	return (to_vmx(vcpu)->msr_ia32_feature_control & bits) == bits;
+}
+
+static inline bool encls_leaf_enabled_in_guest(struct kvm_vcpu *vcpu, u32 leaf)
+{
+	if (!guest_cpuid_has(vcpu, X86_FEATURE_SGX))
+		return false;
+
+	if (leaf >= SGX_ECREATE && leaf <= SGX_ETRACK)
+		return guest_cpuid_has(vcpu, X86_FEATURE_SGX1);
+
+	if (leaf >= SGX_EAUG && leaf <= SGX_EMODT)
+		return guest_cpuid_has(vcpu, X86_FEATURE_SGX2);
+
+	return false;
+}
+
 static int handle_encls(struct kvm_vcpu *vcpu)
 {
-	/*
-	 * SGX virtualization is not yet supported.  There is no software
-	 * enable bit for SGX, so we have to trap ENCLS and inject a #UD
-	 * to prevent the guest from executing ENCLS.
-	 */
-	kvm_queue_exception(vcpu, UD_VECTOR);
+	u32 leaf = (u32)vcpu->arch.regs[VCPU_REGS_RAX];
+
+	if (!encls_leaf_enabled_in_guest(vcpu, leaf) ||
+	    !IS_ENABLED(CONFIG_INTEL_SGX_VIRTUALIZATION))
+		kvm_queue_exception(vcpu, UD_VECTOR);
+	else if (!sgx_enabled_in_guest_bios(vcpu))
+		kvm_inject_gp(vcpu, 0);
+	else {
+#ifdef CONFIG_INTEL_SGX_VIRTUALIZATION
+		if (leaf == SGX_ECREATE)
+			return handle_encls_ecreate(vcpu);
+		if (leaf == SGX_EINIT)
+			return handle_encls_einit(vcpu);
+#endif
+		WARN(1, "KVM: unexpected exit on ENCLS[%u]", leaf);
+		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+		vcpu->run->hw.hardware_exit_reason = EXIT_REASON_ENCLS;
+		return 0;
+	}
 	return 1;
 }
 
@@ -7005,6 +7052,85 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
+/*
+ * ECREATE must be intercepted to enforce MISCSELECT, ATTRIBUTES and XFRM
+ * restrictions if the guest's allowed-1 settings diverge from hardware.
+ */
+static inline bool vmx_intercept_encls_ecreate(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_INTEL_SGX_VIRTUALIZATION
+	struct kvm_cpuid_entry2 *guest_cpuid;
+	u32 eax, ebx, ecx, edx;
+
+	if (!vcpu->kvm->arch.sgx_provisioning_allowed)
+		return true;
+
+	guest_cpuid = kvm_find_cpuid_entry(vcpu, 0x12, 0);
+	if (!guest_cpuid)
+		return true;
+
+	cpuid_count(0x12, 0, &eax, &ebx, &ecx, &edx);
+	if (guest_cpuid->ebx != ebx)
+		return true;
+
+	guest_cpuid = kvm_find_cpuid_entry(vcpu, 0x12, 1);
+	if (!guest_cpuid)
+		return true;
+
+	cpuid_count(0x12, 1, &eax, &ebx, &ecx, &edx);
+	if (guest_cpuid->eax != eax || guest_cpuid->ebx != ebx ||
+	    guest_cpuid->ecx != ecx || guest_cpuid->edx != edx)
+		return true;
+#endif
+	return false;
+}
+
+void vmx_write_encls_bitmap(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
+{
+	/*
+	 * There is no software enable bit for SGX that is virtualized by
+	 * hardware, e.g. there's no CR4.SGXE, so when SGX is disabled in the
+	 * guest (either by the host or by the guest's BIOS) but enabled in the
+	 * host, trap all ENCLS leafs and inject #UD/#GP as needed to emulate
+	 * the expected system behavior for ENCLS.
+	 */
+	u64 bitmap = -1ull;
+
+	/* Nothing to do if hardware doesn't support SGX */
+	if (!cpu_has_vmx_encls_vmexit())
+		return;
+
+#ifdef CONFIG_INTEL_SGX_VIRTUALIZATION
+	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX) &&
+	    sgx_enabled_in_guest_bios(vcpu)) {
+		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX1)) {
+			bitmap &= ~GENMASK_ULL(SGX_ETRACK, SGX_ECREATE);
+			if (vmx_intercept_encls_ecreate(vcpu))
+				bitmap |= (1 << SGX_ECREATE);
+		}
+
+		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX2))
+			bitmap &= ~GENMASK_ULL(SGX_EMODT, SGX_EAUG);
+
+		/*
+		 * Trap and execute EINIT if launch control is enabled in the
+		 * host using the guest's values for launch control MSRs, even
+		 * if the guest's values are fixed to hardware default values.
+		 * The MSRs are not loaded/saved on VM-Enter/VM-Exit as writing
+		 * the MSRs is extraordinarily expensive.
+		 */
+		if (boot_cpu_has(X86_FEATURE_SGX_LC))
+			bitmap |= (1 << SGX_EINIT);
+
+		if (!vmcs12 && nested && is_guest_mode(vcpu))
+			vmcs12 = get_vmcs12(vcpu);
+		if (vmcs12 && nested_cpu_has_encls_exit(vmcs12))
+			bitmap |= vmcs12->encls_exiting_bitmap;
+	}
+#endif
+	vmcs_write64(ENCLS_EXITING_BITMAP, bitmap);
+}
+
 static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7029,6 +7155,20 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
 			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
 		update_intel_pt_cfg(vcpu);
+
+	vmx_write_encls_bitmap(vcpu, NULL);
+	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX))
+		vmx->msr_ia32_feature_control_valid_bits |=
+			FEATURE_CONTROL_SGX_ENABLE;
+	else
+		vmx->msr_ia32_feature_control_valid_bits &=
+			~FEATURE_CONTROL_SGX_ENABLE;
+	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
+		vmx->msr_ia32_feature_control_valid_bits |=
+			FEATURE_CONTROL_SGX_LE_WR;
+	else
+		vmx->msr_ia32_feature_control_valid_bits &=
+			~FEATURE_CONTROL_SGX_LE_WR;
 }
 
 static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 1519c6918190..78c81efe43ba 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -325,6 +325,7 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
 struct shared_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr);
 void pt_update_intercept_for_msr(struct vcpu_vmx *vmx);
 void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
+void vmx_write_encls_bitmap(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12);
 
 #define POSTED_INTR_ON  0
 #define POSTED_INTR_SN  1
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 20/21] x86/sgx: Export sgx_set_attribute() for use by KVM
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (18 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 19/21] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27  5:52 ` [RFC PATCH 21/21] KVM: x86: Add capability to grant VM access to privileged SGX attribute Sean Christopherson
  20 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

To prevent userspace from circumventing access to the PROVISIONKEY by
running an enclave in a VM, KVM will deny access to the PROVISIONKEY
unless userspace proves to KVM that it is allowed to access the key.
Export sgx_set_attribute() so that it may be used by KVM to verify an
SGX attribute file.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/sgx.h             | 2 ++
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 1 +
 arch/x86/kernel/cpu/sgx/main.c         | 1 +
 arch/x86/kernel/cpu/sgx/sgx.h          | 1 -
 4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index f0f0176b8e2f..65c9417d3a80 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -6,6 +6,8 @@
 
 struct sgx_pageinfo;
 
+int sgx_set_attribute(u64 *allowed_attributes, unsigned int attribute_fd);
+
 #if IS_ENABLED(CONFIG_KVM_INTEL)
 int sgx_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs, int *trapnr);
 int sgx_einit(void __user *sigstruct, void __user *token,
diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index a1cb5f772363..1b7a05cd9d02 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -2,6 +2,7 @@
 // Copyright(c) 2016-19 Intel Corporation.
 
 #include <asm/mman.h>
+#include <asm/sgx.h>
 #include <linux/mman.h>
 #include <linux/delay.h>
 #include <linux/file.h>
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 542427c6ae9c..68e5c704378a 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -336,6 +336,7 @@ int sgx_set_attribute(u64 *allowed_attributes, unsigned int attribute_fd)
 	*allowed_attributes |= SGX_ATTR_PROVISIONKEY;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(sgx_set_attribute);
 
 static void sgx_dev_release(struct device *dev)
 {
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 3f3311024bd0..fab12cc0e7c5 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -96,6 +96,5 @@ void sgx_update_lepubkeyhash_msrs(u64 *lepubkeyhash, bool enforce);
 __init int sgx_dev_init(const char *name, struct device *dev,
 			struct cdev *cdev, const struct file_operations *fops,
 			int minor);
-int sgx_set_attribute(u64 *allowed_attributes, unsigned int attribute_fd);
 
 #endif /* _X86_SGX_H */
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 21/21] KVM: x86: Add capability to grant VM access to privileged SGX attribute
  2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
                   ` (19 preceding siblings ...)
  2019-07-27  5:52 ` [RFC PATCH 20/21] x86/sgx: Export sgx_set_attribute() for use by KVM Sean Christopherson
@ 2019-07-27  5:52 ` Sean Christopherson
  2019-07-27 17:32   ` Andy Lutomirski
  20 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-07-27  5:52 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Sean Christopherson, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

The SGX subsystem restricts access to a subset of enclave attributes to
provide additional security for an uncompromised kernel, e.g. to prevent
malware from using the PROVISIONKEY to ensure its nodes are running
inside a geniune SGX enclave and/or to obtain a stable fingerprint.

To prevent userspace from circumventing such restrictions by running an
enclave in a VM, KVM restricts guest access to privileged attributes by
default.  Add a capability, KVM_CAP_SGX_ATTRIBUTE, that can be used by
userspace to grant a VM access to a priveleged attribute, with args[0]
holding a file handle to a valid SGX attribute file corresponding to
an attribute that is restricted by KVM (currently only PROVISIONKEY).

Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 Documentation/virtual/kvm/api.txt | 20 ++++++++++++++++++++
 arch/x86/kvm/cpuid.c              |  2 +-
 arch/x86/kvm/x86.c                | 22 ++++++++++++++++++++++
 include/uapi/linux/kvm.h          |  1 +
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 383b292966fa..b1c0ff4e9224 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -5013,6 +5013,26 @@ it hard or impossible to use it correctly.  The availability of
 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are fixed.
 Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT.
 
+7.19 KVM_CAP_SGX_ATTRIBUTE
+
+Architectures: x86
+Parameters: args[0] is a file handle of a SGX attribute file in securityfs
+Returns: 0 on success, -EINVAL if the file handle is invalid or if a requested
+	 attribute is not supported by KVM.
+
+The SGX subsystem restricts access to a subset of enclave attributes, e.g. the
+PROVISIONKEY, to provide additional security for an uncompromised kernel, e.g.
+to prevent malware from using the PROVISIONKEY to ensure its nodes are running
+inside a geniune SGX enclave and/or to obtain a stable system fingerprint.
+
+To prevent userspace from circumventing such restrictions by running an enclave
+in a VM, KVM prevents access to privileged attributes by default.  Userspace
+can use KVM_CAP_SGX_ATTRIBUTE to grant a VM access to a priveleged attribute.
+args[0] must hold a file handle to a valid SGX attribute file corresponding to
+an attribute that is supported/restricted by KVM (currently only PROVISIONKEY).
+
+See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
+
 8. Other capabilities.
 ----------------------
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 73a0326a1968..73af09edb2fa 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -439,7 +439,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 12.1.eax*/
 	const u32 kvm_cpuid_12_1_eax_sgx_features =
-		SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | 0 /* PROVISIONKEY */ |
+		SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | SGX_ATTR_PROVISIONKEY |
 		SGX_ATTR_EINITTOKENKEY | SGX_ATTR_KSS;
 
 	/* cpuid 12.1.ebx*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ec92c5534336..9144909d4a8e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -67,6 +67,8 @@
 #include <asm/mshyperv.h>
 #include <asm/hypervisor.h>
 #include <asm/intel_pt.h>
+#include <asm/sgx.h>
+#include <asm/sgx_arch.h>
 #include <clocksource/hyperv_timer.h>
 
 #define CREATE_TRACE_POINTS
@@ -3090,6 +3092,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_GET_MSR_FEATURES:
 	case KVM_CAP_MSR_PLATFORM_INFO:
 	case KVM_CAP_EXCEPTION_PAYLOAD:
+#ifdef CONFIG_INTEL_SGX_VIRTUALIZATION
+	case KVM_CAP_SGX_ATTRIBUTE:
+#endif
 		r = 1;
 		break;
 	case KVM_CAP_SYNC_REGS:
@@ -4626,6 +4631,23 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		kvm->arch.exception_payload_enabled = cap->args[0];
 		r = 0;
 		break;
+#ifdef CONFIG_INTEL_SGX_VIRTUALIZATION
+	case KVM_CAP_SGX_ATTRIBUTE: {
+		u64 allowed_attributes = 0;
+
+		r = sgx_set_attribute(&allowed_attributes, cap->args[0]);
+		if (r)
+			break;
+
+		/* KVM only supports the PROVISIONKEY privileged attribute. */
+		if ((allowed_attributes & SGX_ATTR_PROVISIONKEY) &&
+		    !(allowed_attributes & ~SGX_ATTR_PROVISIONKEY))
+			kvm->arch.sgx_provisioning_allowed = true;
+		else
+			r = -EINVAL;
+		break;
+	}
+#endif
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2fe12b40d503..b16708c2b6c9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -993,6 +993,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_SVE 170
 #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
+#define KVM_CAP_SGX_ATTRIBUTE 200
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 21/21] KVM: x86: Add capability to grant VM access to privileged SGX attribute
  2019-07-27  5:52 ` [RFC PATCH 21/21] KVM: x86: Add capability to grant VM access to privileged SGX attribute Sean Christopherson
@ 2019-07-27 17:32   ` Andy Lutomirski
  0 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2019-07-27 17:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	Jarkko Sakkinen, Joerg Roedel, H. Peter Anvin, kvm list, LKML,
	linux-sgx

On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> The SGX subsystem restricts access to a subset of enclave attributes to
> provide additional security for an uncompromised kernel, e.g. to prevent
> malware from using the PROVISIONKEY to ensure its nodes are running
> inside a geniune SGX enclave and/or to obtain a stable fingerprint.
>
> To prevent userspace from circumventing such restrictions by running an
> enclave in a VM, KVM restricts guest access to privileged attributes by
> default.  Add a capability, KVM_CAP_SGX_ATTRIBUTE, that can be used by
> userspace to grant a VM access to a priveleged attribute, with args[0]
> holding a file handle to a valid SGX attribute file corresponding to
> an attribute that is restricted by KVM (currently only PROVISIONKEY).

Looks good to me.  Thanks!

> +can use KVM_CAP_SGX_ATTRIBUTE to grant a VM access to a priveleged attribute.

Spelling.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation
  2019-07-27  5:52 ` [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation Sean Christopherson
@ 2019-07-27 17:38   ` Andy Lutomirski
  2019-07-30  2:49     ` Sean Christopherson
  2019-07-30  3:08   ` Sean Christopherson
  1 sibling, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2019-07-27 17:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	Jarkko Sakkinen, Joerg Roedel, H. Peter Anvin, kvm list, LKML,
	linux-sgx

On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Similar to the existing AMD #NPF case where emulation of the current
> instruction is not possible due to lack of information, virtualization
> of Intel SGX will introduce a scenario where emulation is not possible
> due to the VMExit occurring in an SGX enclave.  And again similar to
> the AMD case, emulation can be initiated by kvm_mmu_page_fault(), i.e.
> outside of the control of the vendor-specific code.
>
> While the cause and architecturally visible behavior of the two cases
> is different,  e.g. Intel SGX will inject a #UD whereas AMD #NPF is a
> clean resume or complete shutdown, the impact on the common emulation
> code is identical: KVM must stop emulation immediately and resume the
> guest.
>
> Replace the exisiting need_emulation_on_page_fault() with a more generic
> is_emulatable() kvm_x86_ops callback, which is called unconditionally
> by x86_emulate_instruction().
>

Having recently noticed that emulate_ud() is broken when the guest's
TF is set, I suppose I should ask: does your new code function
sensibly when TF is set?

Also, anyone want to fix that emulate_ud() bug?  The test case is merged now:

# ./tools/testing/selftests/x86/syscall_arg_fault_32
[RUN]    SYSENTER with invalid state
[OK]    Seems okay
[RUN]    SYSCALL with invalid state
[SKIP]    Illegal instruction
[RUN]    SYSENTER with TF and invalid state
[OK]    Seems okay
[RUN]    SYSCALL with TF and invalid state
[WARN]    Got stuck single-stepping -- you probably have a KVM bug

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 04/21] x86/sgx: Add /dev/sgx/virt_epc device to allocate "raw" EPC for VMs
  2019-07-27  5:51 ` [RFC PATCH 04/21] x86/sgx: Add /dev/sgx/virt_epc device to allocate "raw" EPC for VMs Sean Christopherson
@ 2019-07-27 17:44   ` Andy Lutomirski
  2019-07-29 17:05     ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2019-07-27 17:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	Jarkko Sakkinen, Joerg Roedel, H. Peter Anvin, kvm list, LKML,
	linux-sgx

On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Add an SGX device to enable userspace to allocate EPC without an
> associated enclave.  The intended and only known use case for direct EPC
> allocation is to expose EPC to a KVM guest, hence the virt_epc moniker,
> virt.{c,h} files and INTEL_SGX_VIRTUALIZATION Kconfig.
>
> Although KVM is the end consumer of EPC, and will need hooks into the
> virtual EPC management if oversubscription of EPC for guest is ever
> supported (see below), implement direct access to EPC in the SGX
> subsystem instead of in KVM.  Doing so has two major advantages:
>
>   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
>     just another memory backend for guests.

This is general grumbling more than useful feedback, but I wish there
was a way for KVM's userspace to add a memory region that is *not*
backed by a memory mapping.  For SGX, this would avoid the slightly
awkward situation where useless EPC pages are mapped by QEMU.  For
SEV, it would solve the really fairly awful situation where the SEV
pages are mapped *incoherently* for QEMU.  And even in the absence of
fancy hardware features, it would allow the guest to have secrets in
memory that are not exposed to wild reads, speculation attacks, etc
coming from QEMU.

I realize the implementation would be extremely intrusive, but it just
might make it a lot easier to do things like making SEV pages property
movable.  Similarly, I could see EPC oversubscription being less nasty
in this model.  For one thing, it would make it more straightforward
to keep track of exactly which VMs have a given EPC page mapped,
whereas right now this driver only really knows which host userspace
mm has the EPC page mapped.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 04/21] x86/sgx: Add /dev/sgx/virt_epc device to allocate "raw" EPC for VMs
  2019-07-27 17:44   ` Andy Lutomirski
@ 2019-07-29 17:05     ` Sean Christopherson
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-29 17:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	Jarkko Sakkinen, Joerg Roedel, H. Peter Anvin, kvm list, LKML,
	linux-sgx

On Sat, Jul 27, 2019 at 10:44:24AM -0700, Andy Lutomirski wrote:
> On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > Add an SGX device to enable userspace to allocate EPC without an
> > associated enclave.  The intended and only known use case for direct EPC
> > allocation is to expose EPC to a KVM guest, hence the virt_epc moniker,
> > virt.{c,h} files and INTEL_SGX_VIRTUALIZATION Kconfig.
> >
> > Although KVM is the end consumer of EPC, and will need hooks into the
> > virtual EPC management if oversubscription of EPC for guest is ever
> > supported (see below), implement direct access to EPC in the SGX
> > subsystem instead of in KVM.  Doing so has two major advantages:
> >
> >   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
> >     just another memory backend for guests.
> 
> This is general grumbling more than useful feedback, but I wish there
> was a way for KVM's userspace to add a memory region that is *not*
> backed by a memory mapping.  For SGX, this would avoid the slightly
> awkward situation where useless EPC pages are mapped by QEMU.  For
> SEV, it would solve the really fairly awful situation where the SEV
> pages are mapped *incoherently* for QEMU.  And even in the absence of
> fancy hardware features, it would allow the guest to have secrets in
> memory that are not exposed to wild reads, speculation attacks, etc
> coming from QEMU.
> 
> I realize the implementation would be extremely intrusive, but it just
> might make it a lot easier to do things like making SEV pages property
> movable.  Similarly, I could see EPC oversubscription being less nasty
> in this model.  For one thing, it would make it more straightforward
> to keep track of exactly which VMs have a given EPC page mapped,
> whereas right now this driver only really knows which host userspace
> mm has the EPC page mapped.

Host userspace vs VM doesn't add much, if any, complexity to EPC
oversubscription, especially since the problem of supporting multiple mm
structs needs to be solved for the native driver anyways.

The nastiness of oversubscribing a VM is primarily in dealing with
EBLOCK/ETRACK/EWB conflicts between guest and host.  The other nasty bit
is that it all but requires fancier VA page management, e.g. allocating a
dedicated VA slot for every EPC page doesn't scale when presenting a
multi-{GB,TB} EPC to a guest, especially since there's no guarantee the
guest will ever access EPC.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation
  2019-07-27 17:38   ` Andy Lutomirski
@ 2019-07-30  2:49     ` Sean Christopherson
  2019-08-16  0:47       ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-07-30  2:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	Jarkko Sakkinen, Joerg Roedel, H. Peter Anvin, kvm list, LKML,
	linux-sgx

On Sat, Jul 27, 2019 at 10:38:03AM -0700, Andy Lutomirski wrote:
> On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > Similar to the existing AMD #NPF case where emulation of the current
> > instruction is not possible due to lack of information, virtualization
> > of Intel SGX will introduce a scenario where emulation is not possible
> > due to the VMExit occurring in an SGX enclave.  And again similar to
> > the AMD case, emulation can be initiated by kvm_mmu_page_fault(), i.e.
> > outside of the control of the vendor-specific code.
> >
> > While the cause and architecturally visible behavior of the two cases
> > is different,  e.g. Intel SGX will inject a #UD whereas AMD #NPF is a
> > clean resume or complete shutdown, the impact on the common emulation
> > code is identical: KVM must stop emulation immediately and resume the
> > guest.
> >
> > Replace the exisiting need_emulation_on_page_fault() with a more generic
> > is_emulatable() kvm_x86_ops callback, which is called unconditionally
> > by x86_emulate_instruction().
> >
> 
> Having recently noticed that emulate_ud() is broken when the guest's
> TF is set, I suppose I should ask: does your new code function
> sensibly when TF is set?

Barring a VMX fault injection interaction I'm not thinking of, yes.  The
SGX reaction to the #UD VM-Exit is to inject a #UD and resume the guest,
pending breakpoints shouldn't be affected in any way (unless some other
part of KVM mucks with them, e.g. when guest single-stepping is enabled).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation
  2019-07-27  5:52 ` [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation Sean Christopherson
  2019-07-27 17:38   ` Andy Lutomirski
@ 2019-07-30  3:08   ` Sean Christopherson
  1 sibling, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-07-30  3:08 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Jarkko Sakkinen, Joerg Roedel
  Cc: H. Peter Anvin, kvm, linux-kernel, linux-sgx, Andy Lutomirski

On Fri, Jul 26, 2019 at 10:52:01PM -0700, Sean Christopherson wrote:
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 48c865a4e5dd..0fb8b60eb136 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7115,10 +7115,25 @@ static int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>  	return -ENODEV;
>  }
>  
> -static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
> +static bool svm_is_emulatable(struct kvm_vcpu *vcpu, void *insn, int insn_len)
>  {
>  	bool is_user, smap;
>  
> +	if (likely(!insn || insn_len))
> +		return true;
> +
> +	/*
> +	 * Under certain conditions insn_len may be zero on #NPF.  This can
> +	 * happen if a guest gets a page-fault on data access but the HW table
> +	 * walker is not able to read the instruction page (e.g instruction
> +	 * page is not present in memory). In those cases we simply restart the
> +	 * guest, with the exception of AMD Erratum 1096 which is unrecoverable.
> +	 */
> +	if (unlikely(insn && !insn_len)) {
> +		if (!kvm_x86_ops->need_emulation_on_page_fault(vcpu))
> +			return 1;
> +	}

Doh, obviously forgot to compile for SVM when rebasing.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation
  2019-07-30  2:49     ` Sean Christopherson
@ 2019-08-16  0:47       ` Andy Lutomirski
  2019-08-19 22:01         ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2019-08-16  0:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	Jarkko Sakkinen, Joerg Roedel, H. Peter Anvin, kvm list, LKML,
	linux-sgx



>> On Jul 29, 2019, at 7:49 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>> 
>> On Sat, Jul 27, 2019 at 10:38:03AM -0700, Andy Lutomirski wrote:
>> On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
>> <sean.j.christopherson@intel.com> wrote:
>>> 
>>> Similar to the existing AMD #NPF case where emulation of the current
>>> instruction is not possible due to lack of information, virtualization
>>> of Intel SGX will introduce a scenario where emulation is not possible
>>> due to the VMExit occurring in an SGX enclave.  And again similar to
>>> the AMD case, emulation can be initiated by kvm_mmu_page_fault(), i.e.
>>> outside of the control of the vendor-specific code.
>>> 
>>> While the cause and architecturally visible behavior of the two cases
>>> is different,  e.g. Intel SGX will inject a #UD whereas AMD #NPF is a
>>> clean resume or complete shutdown, the impact on the common emulation
>>> code is identical: KVM must stop emulation immediately and resume the
>>> guest.
>>> 
>>> Replace the exisiting need_emulation_on_page_fault() with a more generic
>>> is_emulatable() kvm_x86_ops callback, which is called unconditionally
>>> by x86_emulate_instruction().
>> 
>> Having recently noticed that emulate_ud() is broken when the guest's
>> TF is set, I suppose I should ask: does your new code function
>> sensibly when TF is set?
> 
> Barring a VMX fault injection interaction I'm not thinking of, yes.  The
> SGX reaction to the #UD VM-Exit is to inject a #UD and resume the guest,
> pending breakpoints shouldn't be affected in any way (unless some other
> part of KVM mucks with them, e.g. when guest single-stepping is enabled).

What I mean is: does the code actually do what you think it does if TF is set?  Right now, as I understand it, the KVM emulation code has a bug in which some emulated faults also inject #DB despite the fact that the instruction faulted, and the #DB seems to take precedence over the original fault.  This confuses the guest.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation
  2019-08-16  0:47       ` Andy Lutomirski
@ 2019-08-19 22:01         ` Sean Christopherson
  2019-08-20  1:34           ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-08-19 22:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	Jarkko Sakkinen, Joerg Roedel, H. Peter Anvin, kvm list, LKML,
	linux-sgx

On Thu, Aug 15, 2019 at 05:47:12PM -0700, Andy Lutomirski wrote:
> 
> 
> >> On Jul 29, 2019, at 7:49 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> >> 
> >> On Sat, Jul 27, 2019 at 10:38:03AM -0700, Andy Lutomirski wrote:
> >> On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
> >> <sean.j.christopherson@intel.com> wrote:
> >>> 
> >>> Similar to the existing AMD #NPF case where emulation of the current
> >>> instruction is not possible due to lack of information, virtualization
> >>> of Intel SGX will introduce a scenario where emulation is not possible
> >>> due to the VMExit occurring in an SGX enclave.  And again similar to
> >>> the AMD case, emulation can be initiated by kvm_mmu_page_fault(), i.e.
> >>> outside of the control of the vendor-specific code.
> >>> 
> >>> While the cause and architecturally visible behavior of the two cases
> >>> is different,  e.g. Intel SGX will inject a #UD whereas AMD #NPF is a
> >>> clean resume or complete shutdown, the impact on the common emulation
> >>> code is identical: KVM must stop emulation immediately and resume the
> >>> guest.
> >>> 
> >>> Replace the exisiting need_emulation_on_page_fault() with a more generic
> >>> is_emulatable() kvm_x86_ops callback, which is called unconditionally
> >>> by x86_emulate_instruction().
> >> 
> >> Having recently noticed that emulate_ud() is broken when the guest's
> >> TF is set, I suppose I should ask: does your new code function
> >> sensibly when TF is set?
> > 
> > Barring a VMX fault injection interaction I'm not thinking of, yes.  The
> > SGX reaction to the #UD VM-Exit is to inject a #UD and resume the guest,
> > pending breakpoints shouldn't be affected in any way (unless some other
> > part of KVM mucks with them, e.g. when guest single-stepping is enabled).
> 
> What I mean is: does the code actually do what you think it does if TF is
> set?  Right now, as I understand it, the KVM emulation code has a bug in
> which some emulated faults also inject #DB despite the fact that the
> instruction faulted, and the #DB seems to take precedence over the original
> fault.  This confuses the guest.

Yes.  The proposed change is to inject the #UD instead of calling into the
emulator, and by inspection I've verified that all code that injects a #DB
is either contained within the emulator or is mutually exclusive with an
intercepted #UD.  It's a qualified yes because I don't have an actual
testcase to verify my literacy.  I'll look into adding a test, either to
the selftest/x86/sgx or to kvm-unit-tests.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation
  2019-08-19 22:01         ` Sean Christopherson
@ 2019-08-20  1:34           ` Andy Lutomirski
  2019-08-20  1:41             ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2019-08-20  1:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	Jarkko Sakkinen, Joerg Roedel, H. Peter Anvin, kvm list, LKML,
	linux-sgx

On Mon, Aug 19, 2019 at 3:01 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Thu, Aug 15, 2019 at 05:47:12PM -0700, Andy Lutomirski wrote:
> >
> >
> > >> On Jul 29, 2019, at 7:49 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > >>
> > >> On Sat, Jul 27, 2019 at 10:38:03AM -0700, Andy Lutomirski wrote:
> > >> On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
> > >> <sean.j.christopherson@intel.com> wrote:
> > >>>
> > >>> Similar to the existing AMD #NPF case where emulation of the current
> > >>> instruction is not possible due to lack of information, virtualization
> > >>> of Intel SGX will introduce a scenario where emulation is not possible
> > >>> due to the VMExit occurring in an SGX enclave.  And again similar to
> > >>> the AMD case, emulation can be initiated by kvm_mmu_page_fault(), i.e.
> > >>> outside of the control of the vendor-specific code.
> > >>>
> > >>> While the cause and architecturally visible behavior of the two cases
> > >>> is different,  e.g. Intel SGX will inject a #UD whereas AMD #NPF is a
> > >>> clean resume or complete shutdown, the impact on the common emulation
> > >>> code is identical: KVM must stop emulation immediately and resume the
> > >>> guest.
> > >>>
> > >>> Replace the exisiting need_emulation_on_page_fault() with a more generic
> > >>> is_emulatable() kvm_x86_ops callback, which is called unconditionally
> > >>> by x86_emulate_instruction().
> > >>
> > >> Having recently noticed that emulate_ud() is broken when the guest's
> > >> TF is set, I suppose I should ask: does your new code function
> > >> sensibly when TF is set?
> > >
> > > Barring a VMX fault injection interaction I'm not thinking of, yes.  The
> > > SGX reaction to the #UD VM-Exit is to inject a #UD and resume the guest,
> > > pending breakpoints shouldn't be affected in any way (unless some other
> > > part of KVM mucks with them, e.g. when guest single-stepping is enabled).
> >
> > What I mean is: does the code actually do what you think it does if TF is
> > set?  Right now, as I understand it, the KVM emulation code has a bug in
> > which some emulated faults also inject #DB despite the fact that the
> > instruction faulted, and the #DB seems to take precedence over the original
> > fault.  This confuses the guest.
>
> Yes.  The proposed change is to inject the #UD instead of calling into the
> emulator, and by inspection I've verified that all code that injects a #DB
> is either contained within the emulator or is mutually exclusive with an
> intercepted #UD.  It's a qualified yes because I don't have an actual
> testcase to verify my literacy.  I'll look into adding a test, either to
> the selftest/x86/sgx or to kvm-unit-tests.

I wrote one, and it fails:

# ./tools/testing/selftests/x86/syscall_arg_fault_32
[RUN]    SYSENTER with invalid state
[OK]    Seems okay
[RUN]    SYSCALL with invalid state
[SKIP]    Illegal instruction
[RUN]    SYSENTER with TF and invalid state
[OK]    Seems okay
[RUN]    SYSCALL with TF and invalid state
[WARN]    Got stuck single-stepping -- you probably have a KVM bug

emulate_ud() is buggy.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation
  2019-08-20  1:34           ` Andy Lutomirski
@ 2019-08-20  1:41             ` Sean Christopherson
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-08-20  1:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	Jarkko Sakkinen, Joerg Roedel, H. Peter Anvin, kvm list, LKML,
	linux-sgx

On Mon, Aug 19, 2019 at 06:34:07PM -0700, Andy Lutomirski wrote:
> On Mon, Aug 19, 2019 at 3:01 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Thu, Aug 15, 2019 at 05:47:12PM -0700, Andy Lutomirski wrote:
> > >
> > >
> > > >> On Jul 29, 2019, at 7:49 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > > >>
> > > >> On Sat, Jul 27, 2019 at 10:38:03AM -0700, Andy Lutomirski wrote:
> > > >> On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
> > > >> <sean.j.christopherson@intel.com> wrote:
> > > >>>
> > > >>> Similar to the existing AMD #NPF case where emulation of the current
> > > >>> instruction is not possible due to lack of information, virtualization
> > > >>> of Intel SGX will introduce a scenario where emulation is not possible
> > > >>> due to the VMExit occurring in an SGX enclave.  And again similar to
> > > >>> the AMD case, emulation can be initiated by kvm_mmu_page_fault(), i.e.
> > > >>> outside of the control of the vendor-specific code.
> > > >>>
> > > >>> While the cause and architecturally visible behavior of the two cases
> > > >>> is different,  e.g. Intel SGX will inject a #UD whereas AMD #NPF is a
> > > >>> clean resume or complete shutdown, the impact on the common emulation
> > > >>> code is identical: KVM must stop emulation immediately and resume the
> > > >>> guest.
> > > >>>
> > > >>> Replace the exisiting need_emulation_on_page_fault() with a more generic
> > > >>> is_emulatable() kvm_x86_ops callback, which is called unconditionally
> > > >>> by x86_emulate_instruction().
> > > >>
> > > >> Having recently noticed that emulate_ud() is broken when the guest's
> > > >> TF is set, I suppose I should ask: does your new code function
> > > >> sensibly when TF is set?
> > > >
> > > > Barring a VMX fault injection interaction I'm not thinking of, yes.  The
> > > > SGX reaction to the #UD VM-Exit is to inject a #UD and resume the guest,
> > > > pending breakpoints shouldn't be affected in any way (unless some other
> > > > part of KVM mucks with them, e.g. when guest single-stepping is enabled).
> > >
> > > What I mean is: does the code actually do what you think it does if TF is
> > > set?  Right now, as I understand it, the KVM emulation code has a bug in
> > > which some emulated faults also inject #DB despite the fact that the
> > > instruction faulted, and the #DB seems to take precedence over the original
> > > fault.  This confuses the guest.
> >
> > Yes.  The proposed change is to inject the #UD instead of calling into the
> > emulator, and by inspection I've verified that all code that injects a #DB
> > is either contained within the emulator or is mutually exclusive with an
> > intercepted #UD.  It's a qualified yes because I don't have an actual
> > testcase to verify my literacy.  I'll look into adding a test, either to
> > the selftest/x86/sgx or to kvm-unit-tests.
> 
> I wrote one, and it fails:
> 
> # ./tools/testing/selftests/x86/syscall_arg_fault_32
> [RUN]    SYSENTER with invalid state
> [OK]    Seems okay
> [RUN]    SYSCALL with invalid state
> [SKIP]    Illegal instruction
> [RUN]    SYSENTER with TF and invalid state
> [OK]    Seems okay
> [RUN]    SYSCALL with TF and invalid state
> [WARN]    Got stuck single-stepping -- you probably have a KVM bug
> 
> emulate_ud() is buggy.

Heh, yeah, I meant for the SGX case, e.g. SYSCALL from inside an enclave
with RFLAGS.TF=1.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2019-08-20  1:41 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-27  5:51 [RFC PATCH 00/21] x86/sgx: KVM: Add SGX virtualization Sean Christopherson
2019-07-27  5:51 ` [RFC PATCH 01/21] x86/sgx: Add defines for SGX device minor numbers Sean Christopherson
2019-07-27  5:51 ` [RFC PATCH 02/21] x86/sgx: Move bus registration and device init to common code Sean Christopherson
2019-07-27  5:51 ` [RFC PATCH 03/21] x86/sgx: Move provisioning device " Sean Christopherson
2019-07-27  5:51 ` [RFC PATCH 04/21] x86/sgx: Add /dev/sgx/virt_epc device to allocate "raw" EPC for VMs Sean Christopherson
2019-07-27 17:44   ` Andy Lutomirski
2019-07-29 17:05     ` Sean Christopherson
2019-07-27  5:51 ` [RFC PATCH 05/21] x86/sgx: Expose SGX architectural definitions to the kernel Sean Christopherson
2019-07-27  5:51 ` [RFC PATCH 06/21] KVM: x86: Add SGX sub-features leaf to reverse CPUID table Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 07/21] KVM: x86: Add WARN_ON_ONCE(index!=0) in __do_cpuid_ent Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 08/21] KVM: x86: Add kvm_x86_ops hook to short circuit emulation Sean Christopherson
2019-07-27 17:38   ` Andy Lutomirski
2019-07-30  2:49     ` Sean Christopherson
2019-08-16  0:47       ` Andy Lutomirski
2019-08-19 22:01         ` Sean Christopherson
2019-08-20  1:34           ` Andy Lutomirski
2019-08-20  1:41             ` Sean Christopherson
2019-07-30  3:08   ` Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 09/21] KVM: VMX: Add basic handling of VM-Exit from SGX enclave Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 10/21] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for VMX/SGX Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 11/21] KVM: x86: Export kvm_propagate_fault (as kvm_propagate_page_fault) Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 12/21] KVM: x86: Define new #PF SGX error code bit Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 13/21] x86/sgx: Move the intermediate EINIT helper into the driver Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 14/21] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 15/21] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 16/21] KVM: VMX: Edd emulation of SGX Launch Control LE hash MSRs Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 17/21] KVM: VMX: Add handler for ENCLS[EINIT] to support SGX Launch Control Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 18/21] KVM: x86: Invoke kvm_x86_ops->cpuid_update() after kvm_update_cpuid() Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 19/21] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 20/21] x86/sgx: Export sgx_set_attribute() for use by KVM Sean Christopherson
2019-07-27  5:52 ` [RFC PATCH 21/21] KVM: x86: Add capability to grant VM access to privileged SGX attribute Sean Christopherson
2019-07-27 17:32   ` Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).