kvm.vger.kernel.org archive mirror
* [RFC PATCH 00/23] KVM SGX virtualization support
@ 2021-01-06  1:55 Kai Huang
  2021-01-06  1:55 ` [RFC PATCH 01/23] x86/sgx: Split out adding EPC page to free list to separate helper Kai Huang
                   ` (25 more replies)
  0 siblings, 26 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:55 UTC
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, jethro, b.thiel, mattson, joro, vkuznets,
	wanpengli, corbet

--- Disclaimer ---

These patches were originally written by Sean Christopherson while at Intel.
Now that Sean has left Intel, I (Kai) have taken over getting them upstream.
This series needs more review before it can be merged.  It is being posted
publicly and under RFC so Sean and others can review it. Maintainers can safely
ignore it for now.

------------------

Hi all,

This series adds KVM SGX virtualization support. The first 12 patches, whose
subjects start with x86/sgx or x86/cpu, are changes to the x86 and SGX
core/driver needed to support KVM SGX virtualization; the rest are patches to
the KVM subsystem.

Please help review this series. I'd also like to hear the proper way to merge
this series, since it contains changes to both the x86/SGX and KVM subsystems.
Any feedback is highly appreciated. Please let me know if I forgot to CC anyone,
or if anyone wants to be removed from CC. Thanks in advance!

This series is based on the latest tip tree's x86/sgx branch. You can also get
the code from the tip branch of the kvm-sgx repo on GitHub:

        https://github.com/intel/kvm-sgx.git tip

It also requires Qemu changes to create a VM with SGX support. You can find the
Qemu repo here:

	https://github.com/intel/qemu-sgx.git next

Please refer to README.md in the above qemu-sgx repo for details on how to
create a guest with SGX support. In the meantime, for quick reference, you can
use the command below to create an SGX guest:

	#qemu-system-x86_64 -smp 4 -m 2G -drive file=<your_vm_image>,if=virtio \
		-cpu host,+sgx_provisionkey \
		-sgx-epc id=epc1,memdev=mem1 \
		-object memory-backend-epc,id=mem1,size=64M,prealloc

Please note that the SGX-relevant part is:

		-cpu host,+sgx_provisionkey \
		-sgx-epc id=epc1,memdev=mem1 \
		-object memory-backend-epc,id=mem1,size=64M,prealloc

You can change the other parameters of your Qemu command as needed.

=========
KVM SGX virtualization Overview

- Virtual EPC

"Virtual EPC" is the EPC section exposed by KVM to the guest, so that SGX
software in the guest can discover it and use it to create SGX enclaves. KVM
exposes SGX to the guest via CPUID, and exposes one or more "virtual EPC"
sections to the guest. The size of each "virtual EPC" section is passed as a
Qemu parameter when creating the guest, and its base address is calculated
internally according to the guest's configuration.

To support virtual EPC, add a new misc device, /dev/sgx_virt_epc, to the SGX
core/driver to allow userspace (Qemu) to allocate "raw" EPC and use it as
"virtual EPC" for the guest. Unlike EPC allocated by the host SGX driver,
virtual EPC allocated via /dev/sgx_virt_epc has no associated enclave, and how
the guest uses virtual EPC is completely controlled by the guest's SGX
software.
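As a rough illustration of how a VMM might consume the device, the user-space
sketch below opens a virtual EPC device node and maps a range of it. The helper
name map_virt_epc() is hypothetical and not taken from Qemu; the MAP_SHARED
requirement comes from sgx_virt_epc_mmap() in patch 03, which rejects mappings
without VM_SHARED.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Qemu code): map `size` bytes of raw EPC from a
 * virtual EPC device node. Returns NULL on any failure.
 */
static void *map_virt_epc(const char *path, size_t size)
{
	int fd = open(path, O_RDWR);
	void *addr;

	if (fd < 0)
		return NULL;

	/* The driver refuses private mappings, so MAP_SHARED is required. */
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/* The mapping holds its own reference to the file, so the fd can go. */
	close(fd);

	return addr == MAP_FAILED ? NULL : addr;
}
```

On a host built with CONFIG_X86_SGX_VIRTUALIZATION, a call such as
map_virt_epc("/dev/sgx_virt_epc", 64 << 20) would roughly correspond to the
64M memory-backend-epc object in the Qemu command; with this series, EPC pages
are allocated on first fault rather than at mmap() time.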

Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
/dev/sgx_virt_epc rather than in KVM. Doing so has two major advantages:

  - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
    just another memory backend for guests.

  - EPC management is wholly contained in the SGX subsystem, e.g. SGX
    does not have to export any symbols, changes to reclaim flows don't
    need to be routed through KVM, SGX's dirty laundry doesn't have to
    get aired out for the world to see, and so on and so forth.

The virtual EPC allocated to guests is currently not reclaimable, because
reclaiming EPC from KVM guests is not yet supported. Due to the complications
of handling reclaim conflicts between guest and host, KVM EPC
oversubscription, which would allow the total virtual EPC size to exceed
physical EPC by reclaiming guests' EPC, is significantly more complex than
basic support for SGX virtualization.

- Support SGX virtualization without SGX Launch Control unlocked mode

Although the SGX driver requires SGX Launch Control unlocked mode to work, SGX
virtualization doesn't, since how enclaves are created is completely controlled
by the guest's SGX software, which is not necessarily Linux. Therefore, this
series allows KVM to expose SGX to the guest even if SGX Launch Control is in
locked mode or is not present at all. The goal of SGX virtualization, and of
virtualization in general, is to expose hardware features to the guest without
making assumptions about how the guest will use them. KVM should therefore
support SGX guests whenever the hardware is able to, to allow for more
potential use cases in cloud environments.
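The policy above can be sketched as a pure function over the relevant CPUID and
IA32_FEATURE_CONTROL bits. This is a simplified model, not the feat_ctl.c logic
in this series: it assumes SGX is usable when CPUID reports SGX and firmware has
locked IA32_FEATURE_CONTROL with SGX enabled, and that Launch Control is
"unlocked" when CPUID reports SGX_LC and the SGX_LC_ENABLED bit is set. The
struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions; macro names follow Linux's msr-index.h. */
#define CPUID_7_EBX_SGX		(1u << 2)	/* CPUID.(EAX=7,ECX=0):EBX[2] */
#define CPUID_7_ECX_SGX_LC	(1u << 30)	/* CPUID.(EAX=7,ECX=0):ECX[30] */
#define FEAT_CTL_LOCKED		(1ull << 0)
#define FEAT_CTL_SGX_LC_ENABLED	(1ull << 17)
#define FEAT_CTL_SGX_ENABLED	(1ull << 18)

/* Hypothetical result type: which SGX consumers may run on this host. */
struct sgx_policy {
	bool host_driver;	/* native SGX driver (needs LC unlocked) */
	bool virtualization;	/* KVM may expose SGX to guests */
};

static struct sgx_policy sgx_decide(uint32_t ebx7, uint32_t ecx7,
				    uint64_t feat_ctl)
{
	struct sgx_policy p = { false, false };
	bool sgx = (ebx7 & CPUID_7_EBX_SGX) &&
		   (feat_ctl & FEAT_CTL_LOCKED) &&
		   (feat_ctl & FEAT_CTL_SGX_ENABLED);

	if (!sgx)
		return p;

	/* Virtualization needs only SGX itself; LC state is the guest's business. */
	p.virtualization = true;
	p.host_driver = (ecx7 & CPUID_7_ECX_SGX_LC) &&
			(feat_ctl & FEAT_CTL_SGX_LC_ENABLED);
	return p;
}
```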

- Support exposing SGX2

For the same reason, SGX2 feature detection is added to the core SGX code to
allow KVM to expose SGX2 to the guest, even though the SGX driver currently
doesn't support SGX2, because SGX2 works fine in the guest without any
interaction with the host SGX driver.
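SGX1 and SGX2 are enumerated in EAX of CPUID leaf 0x12, subleaf 0, as bits 0 and
1 (matching X86_FEATURE_SGX1 and X86_FEATURE_SGX2 in patch 04). A minimal
decoding sketch follows; the masking in guest_sgx_leaf_eax() is a hypothetical
illustration of passing through only known bits, not KVM's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=0x12,ECX=0):EAX feature bits. */
#define SGX_CPUID_EAX_SGX1	(1u << 0)
#define SGX_CPUID_EAX_SGX2	(1u << 1)

struct sgx_leaf_caps {
	bool sgx1;
	bool sgx2;
};

static struct sgx_leaf_caps decode_sgx_leaf(uint32_t eax)
{
	struct sgx_leaf_caps c = {
		.sgx1 = eax & SGX_CPUID_EAX_SGX1,
		.sgx2 = eax & SGX_CPUID_EAX_SGX2,
	};
	return c;
}

/* Hypothetical: hide any EAX bits this model does not know about. */
static uint32_t guest_sgx_leaf_eax(uint32_t host_eax)
{
	return host_eax & (SGX_CPUID_EAX_SGX1 | SGX_CPUID_EAX_SGX2);
}
```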

- Restrict SGX guest access to provisioning key

To fully use SGX, a guest needs to be able to create a provisioning enclave.
However, the provisioning key is sensitive and is restricted by
/dev/sgx_provision in the host SGX driver, so KVM SGX virtualization follows
the same rule: a new KVM_CAP_SGX_ATTRIBUTE is added to the KVM uAPI, and the
guest can access the provisioning bit only if the userspace hypervisor (Qemu)
passes a file descriptor of /dev/sgx_provision to that capability when creating
the guest. This is enforced by making KVM trap the guest's ECREATE instruction
and check the provisioning bit in ECREATE's attributes.
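The ECREATE check can be modeled as a simple attribute test.
SGX_ATTR_PROVISIONKEY is bit 4 of SECS.ATTRIBUTES; the function name and the
provision_allowed flag, which stands in for the VM having been granted access
through KVM_CAP_SGX_ATTRIBUTE, are hypothetical. The actual ECREATE handler in
this series also enforces CPUID-based restrictions, which this sketch omits.

```c
#include <stdbool.h>
#include <stdint.h>

/* SECS.ATTRIBUTES bit that grants access to the provisioning key. */
#define SGX_ATTR_PROVISIONKEY	(1ull << 4)

/*
 * Hypothetical model: `provision_allowed` represents the VM having been
 * granted the attribute via KVM_CAP_SGX_ATTRIBUTE.
 */
static bool ecreate_attributes_ok(uint64_t attributes, bool provision_allowed)
{
	if ((attributes & SGX_ATTR_PROVISIONKEY) && !provision_allowed)
		return false;	/* caller would refuse this ECREATE */
	return true;
}
```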


Kai Huang (1):
  x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs

Sean Christopherson (22):
  x86/sgx: Split out adding EPC page to free list to separate helper
  x86/sgx: Add enum for SGX_CHILD_PRESENT error code
  x86/sgx: Introduce virtual EPC for use by KVM guests
  x86/cpufeatures: Add SGX1 and SGX2 sub-features
  x86/cpu/intel: Allow SGX virtualization without Launch Control support
  x86/sgx: Expose SGX architectural definitions to the kernel
  x86/sgx: Move ENCLS leaf definitions to sgx_arch.h
  x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT)
  x86/sgx: Add encls_faulted() helper
  x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  x86/sgx: Move provisioning device creation out of SGX driver
  KVM: VMX: Convert vcpu_vmx.exit_reason to a union
  KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX)
  KVM: x86: Define new #PF SGX error code bit
  KVM: x86: Add SGX feature leaf to reverse CPUID lookup
  KVM: VMX: Add basic handling of VM-Exit from SGX enclave
  KVM: VMX: Frame in ENCLS handler for SGX virtualization
  KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions
  KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs
  KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)
  KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC
  KVM: x86: Add capability to grant VM access to privileged SGX
    attribute

 Documentation/virt/kvm/api.rst                |  23 +
 arch/x86/Kconfig                              |  12 +
 arch/x86/include/asm/cpufeature.h             |   5 +-
 arch/x86/include/asm/cpufeatures.h            |   6 +-
 arch/x86/include/asm/disabled-features.h      |   7 +-
 arch/x86/include/asm/kvm_host.h               |   5 +
 arch/x86/include/asm/required-features.h      |   2 +-
 arch/x86/include/asm/sgx.h                    |  19 +
 .../cpu/sgx/arch.h => include/asm/sgx_arch.h} |  20 +
 arch/x86/include/asm/vmx.h                    |   1 +
 arch/x86/include/uapi/asm/vmx.h               |   1 +
 arch/x86/kernel/cpu/common.c                  |   4 +
 arch/x86/kernel/cpu/feat_ctl.c                |  50 +-
 arch/x86/kernel/cpu/sgx/Makefile              |   1 +
 arch/x86/kernel/cpu/sgx/driver.c              |  17 -
 arch/x86/kernel/cpu/sgx/encl.c                |   2 +-
 arch/x86/kernel/cpu/sgx/encls.h               |  29 +-
 arch/x86/kernel/cpu/sgx/ioctl.c               |  23 +-
 arch/x86/kernel/cpu/sgx/main.c                |  79 ++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   5 +-
 arch/x86/kernel/cpu/sgx/virt.c                | 318 ++++++++++++
 arch/x86/kernel/cpu/sgx/virt.h                |  14 +
 arch/x86/kvm/Makefile                         |   2 +
 arch/x86/kvm/cpuid.c                          |  58 ++-
 arch/x86/kvm/cpuid.h                          |   1 +
 arch/x86/kvm/vmx/nested.c                     |  70 ++-
 arch/x86/kvm/vmx/nested.h                     |   5 +
 arch/x86/kvm/vmx/sgx.c                        | 462 ++++++++++++++++++
 arch/x86/kvm/vmx/sgx.h                        |  34 ++
 arch/x86/kvm/vmx/vmcs12.c                     |   1 +
 arch/x86/kvm/vmx/vmcs12.h                     |   4 +-
 arch/x86/kvm/vmx/vmx.c                        | 171 +++++--
 arch/x86/kvm/vmx/vmx.h                        |  27 +-
 arch/x86/kvm/x86.c                            |  24 +
 include/uapi/linux/kvm.h                      |   1 +
 tools/testing/selftests/sgx/defines.h         |   2 +-
 36 files changed, 1366 insertions(+), 139 deletions(-)
 create mode 100644 arch/x86/include/asm/sgx.h
 rename arch/x86/{kernel/cpu/sgx/arch.h => include/asm/sgx_arch.h} (96%)
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.c
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.h
 create mode 100644 arch/x86/kvm/vmx/sgx.c
 create mode 100644 arch/x86/kvm/vmx/sgx.h

-- 
2.29.2



* [RFC PATCH 01/23] x86/sgx: Split out adding EPC page to free list to separate helper
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
@ 2021-01-06  1:55 ` Kai Huang
  2021-01-11 22:38   ` Jarkko Sakkinen
  2021-01-06  1:55 ` [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code Kai Huang
                   ` (24 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:55 UTC
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

SGX virtualization requires allocating "raw" EPC and using it as virtual
EPC for SGX guests.  Unlike EPC used by the SGX driver, virtual EPC doesn't
track how EPC pages are used in the VM, e.g. (de)construction of enclaves,
so it cannot guarantee that EREMOVE succeeds, e.g. it has no a priori
knowledge of which pages are SECS with non-zero child counts.

Split sgx_free_epc_page() into two parts so that the "add to free list"
part can be used by virtual EPC without having to modify the EREMOVE
logic in sgx_free_epc_page().

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 24 ++++++++++++++++++------
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index c519fc5f6948..95aad183bb65 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -594,15 +594,30 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 	return page;
 }
 
+/**
+ * __sgx_free_epc_page() - Free an EPC page
+ * @page:	pointer to a previously allocated EPC page
+ *
+ * Insert an EPC page back to the list of free pages.
+ */
+void __sgx_free_epc_page(struct sgx_epc_page *page)
+{
+	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
+
+	spin_lock(&section->lock);
+	list_add_tail(&page->list, &section->page_list);
+	section->free_cnt++;
+	spin_unlock(&section->lock);
+}
+
 /**
  * sgx_free_epc_page() - Free an EPC page
- * @page:	an EPC page
+ * @page:	pointer to a previously allocated EPC page
  *
  * Call EREMOVE for an EPC page and insert it back to the list of free pages.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
-	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
 	int ret;
 
 	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
@@ -611,10 +626,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
 		return;
 
-	spin_lock(&section->lock);
-	list_add_tail(&page->list, &section->page_list);
-	section->free_cnt++;
-	spin_unlock(&section->lock);
+	__sgx_free_epc_page(page);
 }
 
 static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 5fa42d143feb..4dddd81cbbc3 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -77,6 +77,7 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 }
 
 struct sgx_epc_page *__sgx_alloc_epc_page(void);
+void __sgx_free_epc_page(struct sgx_epc_page *page);
 void sgx_free_epc_page(struct sgx_epc_page *page);
 
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
-- 
2.29.2
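
The refactoring in patch 01 follows a common split: a locked helper that only
returns a page to the free list, and a wrapper that performs EREMOVE first. The
user-space model below mirrors that shape with a C11 atomic_flag spinlock in
place of spin_lock(). All names are hypothetical, EREMOVE is replaced by a
callback, and the free list is a singly linked list pushed at the head rather
than the kernel's list_add_tail().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct page {
	struct page *next;
	bool in_use;
};

struct section {
	atomic_flag lock;
	struct page *free_head;
	unsigned long free_cnt;
};

#define SECTION_INIT { ATOMIC_FLAG_INIT, NULL, 0 }

static void section_lock(struct section *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		; /* spin */
}

static void section_unlock(struct section *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Counterpart of __sgx_free_epc_page(): no EREMOVE, just requeue the page. */
static void free_page_nocheck(struct section *s, struct page *p)
{
	section_lock(s);
	p->in_use = false;
	p->next = s->free_head;
	s->free_head = p;
	s->free_cnt++;
	section_unlock(s);
}

/* Stand-in for EREMOVE: 0 on success, an SGX error code otherwise. */
typedef int (*eremove_fn)(struct page *p);

/* Counterpart of sgx_free_epc_page(): EREMOVE first, then reuse the helper. */
static bool free_page_checked(struct section *s, struct page *p,
			      eremove_fn eremove)
{
	if (eremove(p))
		return false;	/* the kernel WARNs and leaves the page off the list */
	free_page_nocheck(s, p);
	return true;
}

/* Example EREMOVE stubs for exercising the model. */
static int eremove_ok(struct page *p) { (void)p; return 0; }
static int eremove_child_present(struct page *p) { (void)p; return 13; }
```

A caller that already knows EREMOVE was handled elsewhere, as virtual EPC does,
calls free_page_nocheck() directly.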



* [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
  2021-01-06  1:55 ` [RFC PATCH 01/23] x86/sgx: Split out adding EPC page to free list to separate helper Kai Huang
@ 2021-01-06  1:55 ` Kai Huang
  2021-01-06 18:28   ` Dave Hansen
  2021-01-11 23:32   ` Jarkko Sakkinen
  2021-01-06  1:55 ` [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
                   ` (23 subsequent siblings)
  25 siblings, 2 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:55 UTC
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

SGX virtualization requires allocating "raw" EPC and using it as "virtual
EPC" for SGX guests.  Unlike EPC used by the SGX driver, virtual EPC doesn't
track how EPC pages are used in the VM, e.g. (de)construction of enclaves,
so it cannot guarantee that EREMOVE succeeds, e.g. it has no a priori
knowledge of which pages are SECS with non-zero child counts.

Add SGX_CHILD_PRESENT for use by SGX virtualization to assert that EREMOVE
failures are expected, but only due to SGX_CHILD_PRESENT.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/cpu/sgx/arch.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/arch.h b/arch/x86/kernel/cpu/sgx/arch.h
index dd7602c44c72..56b0f8ae3f92 100644
--- a/arch/x86/kernel/cpu/sgx/arch.h
+++ b/arch/x86/kernel/cpu/sgx/arch.h
@@ -26,12 +26,14 @@
  * enum sgx_return_code - The return code type for ENCLS, ENCLU and ENCLV
  * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
  *				been completed yet.
+ * %SGX_CHILD_PRESENT:		Enclave has child pages present in the EPC.
  * %SGX_INVALID_EINITTOKEN:	EINITTOKEN is invalid and enclave signer's
  *				public key does not match IA32_SGXLEPUBKEYHASH.
  * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was received
  */
 enum sgx_return_code {
 	SGX_NOT_TRACKED			= 11,
+	SGX_CHILD_PRESENT		= 13,
 	SGX_INVALID_EINITTOKEN		= 16,
 	SGX_UNMASKED_EVENT		= 128,
 };
-- 
2.29.2
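
SGX_CHILD_PRESENT (13) is the one EREMOVE failure that virtual EPC is expected
to tolerate, since the host cannot know which guest pages are SECS pages with
live children. A small classification sketch, using the enum values from this
patch and a hypothetical action type:

```c
/* Values from enum sgx_return_code in this patch. */
enum sgx_return_code {
	SGX_NOT_TRACKED		= 11,
	SGX_CHILD_PRESENT	= 13,
	SGX_INVALID_EINITTOKEN	= 16,
	SGX_UNMASKED_EVENT	= 128,
};

/* Hypothetical: what a virtual EPC release path might do with the result. */
enum virt_eremove_action {
	VEPC_PAGE_FREED,	/* EREMOVE succeeded; return page to free list */
	VEPC_RETRY_LATER,	/* SECS still has children; try again later */
	VEPC_UNEXPECTED,	/* anything else warrants a warning */
};

static enum virt_eremove_action classify_virt_eremove(int ret)
{
	if (ret == 0)
		return VEPC_PAGE_FREED;
	if (ret == SGX_CHILD_PRESENT)
		return VEPC_RETRY_LATER;
	return VEPC_UNEXPECTED;
}
```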



* [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
  2021-01-06  1:55 ` [RFC PATCH 01/23] x86/sgx: Split out adding EPC page to free list to separate helper Kai Huang
  2021-01-06  1:55 ` [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code Kai Huang
@ 2021-01-06  1:55 ` Kai Huang
  2021-01-06 19:35   ` Dave Hansen
  2021-01-11 23:38   ` Jarkko Sakkinen
  2021-01-06  1:55 ` [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features Kai Huang
                   ` (22 subsequent siblings)
  25 siblings, 2 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:55 UTC
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a misc device /dev/sgx_virt_epc to allow userspace to allocate "raw"
EPC without an associated enclave.  The intended and only known use case
for raw EPC allocation is to expose EPC to a KVM guest, hence the
virt_epc moniker, virt.{c,h} files and X86_SGX_VIRTUALIZATION Kconfig.

Modify sgx_init() to always try to initialize the virtual EPC driver, even
when the SGX driver is disabled because SGX Launch Control is in locked mode
or not present at all, since SGX virtualization allows exposing SGX to
guests that support non-LC configurations.

Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
/dev/sgx_virt_epc rather than in KVM. Doing so has two major advantages:

  - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
    just another memory backend for guests.

  - EPC management is wholly contained in the SGX subsystem, e.g. SGX
    does not have to export any symbols, changes to reclaim flows don't
    need to be routed through KVM, SGX's dirty laundry doesn't have to
    get aired out for the world to see, and so on and so forth.

The virtual EPC allocated to guests is currently not reclaimable, because
oversubscription of EPC for KVM guests is not currently supported. Due
to the complications of handling reclaim conflicts between guest and
host, KVM EPC oversubscription is significantly more complex than basic
support for SGX virtualization.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/Kconfig                 |  12 ++
 arch/x86/kernel/cpu/sgx/Makefile |   1 +
 arch/x86/kernel/cpu/sgx/main.c   |   5 +-
 arch/x86/kernel/cpu/sgx/virt.c   | 263 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/virt.h   |  14 ++
 5 files changed, 294 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.c
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 618d1aabccb8..a7318175509b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1947,6 +1947,18 @@ config X86_SGX
 
 	  If unsure, say N.
 
+config X86_SGX_VIRTUALIZATION
+	bool "Software Guard eXtensions (SGX) Virtualization"
+	depends on X86_SGX && KVM_INTEL
+	help
+
+	  Enables KVM guests to create SGX enclaves.
+
+	  This includes support to expose "raw" unreclaimable enclave memory to
+	  guests via a device node, e.g. /dev/sgx_virt_epc.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 91d3dc784a29..7a25bf63adfb 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -3,3 +3,4 @@ obj-y += \
 	encl.o \
 	ioctl.o \
 	main.o
+obj-$(CONFIG_X86_SGX_VIRTUALIZATION)	+= virt.o
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 95aad183bb65..02993a327a1f 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -9,9 +9,11 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include "arch.h"
 #include "driver.h"
 #include "encl.h"
 #include "encls.h"
+#include "virt.h"
 
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
@@ -726,7 +728,8 @@ static void __init sgx_init(void)
 	if (!sgx_page_reclaimer_init())
 		goto err_page_cache;
 
-	ret = sgx_drv_init();
+	/* Success if the native *or* virtual EPC driver initialized cleanly. */
+	ret = !!sgx_drv_init() & !!sgx_virt_epc_init();
 	if (ret)
 		goto err_kthread;
 
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
new file mode 100644
index 000000000000..d625551ccf25
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -0,0 +1,263 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  Copyright(c) 2016-20 Intel Corporation. */
+
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/xarray.h>
+#include <asm/sgx.h>
+#include <uapi/asm/sgx.h>
+
+#include "encls.h"
+#include "sgx.h"
+#include "virt.h"
+
+struct sgx_virt_epc {
+	struct xarray page_array;
+	struct mutex lock;
+	struct mm_struct *mm;
+};
+
+static struct mutex virt_epc_lock;
+static struct list_head virt_epc_zombie_pages;
+
+static int __sgx_virt_epc_fault(struct sgx_virt_epc *epc,
+				struct vm_area_struct *vma, unsigned long addr)
+{
+	struct sgx_epc_page *epc_page;
+	unsigned long index, pfn;
+	int ret;
+
+	/* epc->lock must already have been held */
+
+	/* Calculate index of EPC page in virtual EPC's page_array */
+	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
+
+	epc_page = xa_load(&epc->page_array, index);
+	if (epc_page)
+		return 0;
+
+	epc_page = sgx_alloc_epc_page(epc, false);
+	if (IS_ERR(epc_page))
+		return PTR_ERR(epc_page);
+
+	ret = xa_err(xa_store(&epc->page_array, index, epc_page, GFP_KERNEL));
+	if (ret)
+		goto err_free;
+
+	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
+
+	ret = vmf_insert_pfn(vma, addr, pfn);
+	if (ret != VM_FAULT_NOPAGE) {
+		ret = -EFAULT;
+		goto err_delete;
+	}
+
+	return 0;
+
+err_delete:
+	xa_erase(&epc->page_array, index);
+err_free:
+	sgx_free_epc_page(epc_page);
+	return ret;
+}
+
+static vm_fault_t sgx_virt_epc_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_virt_epc *epc = vma->vm_private_data;
+	int ret;
+
+	mutex_lock(&epc->lock);
+	ret = __sgx_virt_epc_fault(epc, vma, vmf->address);
+	mutex_unlock(&epc->lock);
+
+	if (!ret)
+		return VM_FAULT_NOPAGE;
+
+	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
+		mmap_read_unlock(vma->vm_mm);
+		return VM_FAULT_RETRY;
+	}
+
+	return VM_FAULT_SIGBUS;
+}
+
+static const struct vm_operations_struct sgx_virt_epc_vm_ops = {
+	.fault = sgx_virt_epc_fault,
+};
+
+static int sgx_virt_epc_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct sgx_virt_epc *epc = file->private_data;
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	/*
+	 * Don't allow mmap() from child after fork(), since child and parent
+	 * cannot map to the same EPC.
+	 */
+	if (vma->vm_mm != epc->mm)
+		return -EINVAL;
+
+	vma->vm_ops = &sgx_virt_epc_vm_ops;
+	/* Don't copy VMA in fork() */
+	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
+	vma->vm_private_data = file->private_data;
+
+	return 0;
+}
+
+static int sgx_virt_epc_free_page(struct sgx_epc_page *epc_page)
+{
+	int ret;
+
+	if (!epc_page)
+		return 0;
+
+	/*
+	 * Explicitly EREMOVE virtual EPC page. Virtual EPC is only used by
+	 * guest, and in normal condition guest should have done EREMOVE for
+	 * all EPC pages before they are freed here. But it's possible guest
+	 * is killed or crashed abnormally, in which case EREMOVE has not been
+	 * done. Do EREMOVE unconditionally here to cover both cases, because
+	 * it's not possible to tell whether guest has done EREMOVE, since
+	 * virtual EPC page status is not tracked. And it is fine to EREMOVE
+	 * EPC page multiple times.
+	 */
+	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
+	if (ret) {
+		/*
+		 * Only SGX_CHILD_PRESENT is expected, which is because of
+		 * EREMOVE-ing an SECS still with child, in which case it can
+		 * be handled by EREMOVE-ing the SECS again after all pages in
+		 * virtual EPC have been EREMOVE-ed. See comments below in
+		 * sgx_virt_epc_release().
+		 */
+		WARN_ON_ONCE(ret != SGX_CHILD_PRESENT);
+		return ret;
+	}
+
+	__sgx_free_epc_page(epc_page);
+	return 0;
+}
+
+static int sgx_virt_epc_release(struct inode *inode, struct file *file)
+{
+	struct sgx_virt_epc *epc = file->private_data;
+	struct sgx_epc_page *epc_page, *tmp, *entry;
+	unsigned long index;
+
+	LIST_HEAD(secs_pages);
+
+	mmdrop(epc->mm);
+
+	xa_for_each(&epc->page_array, index, entry) {
+		/*
+		 * Virtual EPC pages are not tracked, so it's possible for
+		 * EREMOVE to fail, e.g. because a SECS page still has children
+		 * if the guest was shut down unexpectedly. If so, leave the
+		 * page in the xarray and retry EREMOVE below.
+		 */
+		if (sgx_virt_epc_free_page(entry))
+			continue;
+
+		xa_erase(&epc->page_array, index);
+	}
+
+	/*
+	 * Retry all failed pages after iterating through the entire tree, at
+	 * which point all children should be removed and the SECS pages can be
+	 * nuked as well...unless userspace has exposed multiple instances of
+	 * virtual EPC to a single VM.
+	 */
+	xa_for_each(&epc->page_array, index, entry) {
+		epc_page = entry;
+		/*
+		 * Error here means that EREMOVE failed due to a SECS page
+		 * still has children in *another* EPC instance.  Put it on a
+		 * temporary SECS list which will be spliced to 'zombie page
+		 * list' and will be EREMOVE-ed again when freeing another
+		 * virtual EPC instance.
+		 */
+		if (sgx_virt_epc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+
+		xa_erase(&epc->page_array, index);
+	}
+
+	/*
+	 * Third time's a charm.  Try to EREMOVE zombie SECS pages from virtual
+	 * EPC instances that were previously released, i.e. free SECS pages
+	 * that were in limbo due to having children in *this* EPC instance.
+	 */
+	mutex_lock(&virt_epc_lock);
+	list_for_each_entry_safe(epc_page, tmp, &virt_epc_zombie_pages, list) {
+		/*
+		 * Speculatively remove the page from the list of zombies. If
+		 * the page is successfully EREMOVE'd, it will be added to the
+		 * list of free pages.  If EREMOVE fails, throw the page on the
+		 * local list, which will be spliced on at the end.
+		 */
+		list_del(&epc_page->list);
+
+		if (sgx_virt_epc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+	}
+
+	if (!list_empty(&secs_pages))
+		list_splice_tail(&secs_pages, &virt_epc_zombie_pages);
+	mutex_unlock(&virt_epc_lock);
+
+	kfree(epc);
+
+	return 0;
+}
+
+static int sgx_virt_epc_open(struct inode *inode, struct file *file)
+{
+	struct sgx_virt_epc *epc;
+
+	epc = kzalloc(sizeof(struct sgx_virt_epc), GFP_KERNEL);
+	if (!epc)
+		return -ENOMEM;
+	/*
+	 * Record current->mm in the virtual EPC. It is checked in
+	 * sgx_virt_epc_mmap() to prevent a child, in case of fork(), from
+	 * being able to mmap() the same virtual EPC pages.
+	 */
+	mmgrab(current->mm);
+	epc->mm = current->mm;
+	mutex_init(&epc->lock);
+	xa_init(&epc->page_array);
+
+	file->private_data = epc;
+
+	return 0;
+}
+
+static const struct file_operations sgx_virt_epc_fops = {
+	.owner			= THIS_MODULE,
+	.open			= sgx_virt_epc_open,
+	.release		= sgx_virt_epc_release,
+	.mmap			= sgx_virt_epc_mmap,
+};
+
+static struct miscdevice sgx_virt_epc_dev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "sgx_virt_epc",
+	.nodename = "sgx_virt_epc",
+	.fops = &sgx_virt_epc_fops,
+};
+
+int __init sgx_virt_epc_init(void)
+{
+	INIT_LIST_HEAD(&virt_epc_zombie_pages);
+	mutex_init(&virt_epc_lock);
+
+	return misc_register(&sgx_virt_epc_dev);
+}
diff --git a/arch/x86/kernel/cpu/sgx/virt.h b/arch/x86/kernel/cpu/sgx/virt.h
new file mode 100644
index 000000000000..e5434541a122
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/virt.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+#ifndef _ASM_X86_SGX_VIRT_H
+#define _ASM_X86_SGX_VIRT_H
+
+#ifdef CONFIG_X86_SGX_VIRTUALIZATION
+int __init sgx_virt_epc_init(void);
+#else
+static inline int __init sgx_virt_epc_init(void)
+{
+	return -ENODEV;
+}
+#endif
+
+#endif /* _ASM_X86_SGX_VIRT_H */
-- 
2.29.2
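
The retry strategy in sgx_virt_epc_release() can be checked with a small
user-space model. This is a sketch under simplifying assumptions: each page
records its owning SECS by array index, EREMOVE on a SECS with live children
returns SGX_CHILD_PRESENT, and EREMOVE on an already-removed page succeeds, as
the comment in sgx_virt_epc_free_page() notes. The zombie list shared across
instances is reduced to a returned count, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SGX_CHILD_PRESENT	13

/* Hypothetical model of one guest EPC page. */
struct vpage {
	int parent;	/* index of owning SECS in the same array, or -1 */
	int children;	/* live children (meaningful for SECS pages only) */
	bool live;	/* still part of an enclave */
	bool freed;	/* returned to the host free list */
};

/* Model of __eremove(): repeating EREMOVE on a dead page succeeds. */
static int model_eremove(struct vpage *pages, size_t i)
{
	struct vpage *p = &pages[i];

	if (!p->live)
		return 0;
	if (p->children > 0)
		return SGX_CHILD_PRESENT;
	p->live = false;
	if (p->parent >= 0)
		pages[p->parent].children--;
	return 0;
}

/* Two passes as in sgx_virt_epc_release(); returns pages left as zombies. */
static size_t model_release(struct vpage *pages, size_t n)
{
	size_t zombies = 0;

	/* Pass 1: children go away, SECS pages with children fail. */
	for (size_t i = 0; i < n; i++)
		if (!model_eremove(pages, i))
			pages[i].freed = true;

	/* Pass 2: retry failures; leftovers have children elsewhere. */
	for (size_t i = 0; i < n; i++) {
		if (pages[i].freed)
			continue;
		if (model_eremove(pages, i))
			zombies++;	/* kernel moves these to virt_epc_zombie_pages */
		else
			pages[i].freed = true;
	}
	return zombies;
}
```

A SECS listed before its children fails in the first pass and succeeds in the
second; a SECS whose children live in another virtual EPC instance remains a
zombie until that instance is released.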



* [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (2 preceding siblings ...)
  2021-01-06  1:55 ` [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
@ 2021-01-06  1:55 ` Kai Huang
  2021-01-06 19:39   ` Dave Hansen
                     ` (2 more replies)
  2021-01-06  1:55 ` [RFC PATCH 05/23] x86/cpu/intel: Allow SGX virtualization without Launch Control support Kai Huang
                   ` (21 subsequent siblings)
  25 siblings, 3 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:55 UTC
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a feature word to hold SGX features enumerated via CPUID.0x12.0x0,
along with flags for SGX1 and SGX2. As part of virtualizing SGX, KVM
needs to expose the SGX CPUID leaves to its guests. SGX1 and SGX2 need to
be in a dedicated feature word so that they can be queried via KVM's
reverse CPUID lookup to properly emulate the expected guest behavior.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
 [Kai: Also clear SGX1 and SGX2 bits in clear_sgx_caps().]
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/cpufeature.h        | 5 +++--
 arch/x86/include/asm/cpufeatures.h       | 6 +++++-
 arch/x86/include/asm/disabled-features.h | 7 ++++++-
 arch/x86/include/asm/required-features.h | 2 +-
 arch/x86/kernel/cpu/common.c             | 4 ++++
 arch/x86/kernel/cpu/feat_ctl.c           | 2 ++
 6 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 59bf91c57aa8..efbdba5170a3 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -30,6 +30,7 @@ enum cpuid_leafs
 	CPUID_7_ECX,
 	CPUID_8000_0007_EBX,
 	CPUID_7_EDX,
+	CPUID_12_EAX,
 };
 
 #ifdef CONFIG_X86_FEATURE_NAMES
@@ -89,7 +90,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
 	   CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) ||	\
 	   CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) ||	\
 	   REQUIRED_MASK_CHECK					  ||	\
-	   BUILD_BUG_ON_ZERO(NCAPINTS != 19))
+	   BUILD_BUG_ON_ZERO(NCAPINTS != 20))
 
 #define DISABLED_MASK_BIT_SET(feature_bit)				\
 	 ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK,  0, feature_bit) ||	\
@@ -112,7 +113,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
 	   CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) ||	\
 	   CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) ||	\
 	   DISABLED_MASK_CHECK					  ||	\
-	   BUILD_BUG_ON_ZERO(NCAPINTS != 19))
+	   BUILD_BUG_ON_ZERO(NCAPINTS != 20))
 
 #define cpu_has(c, bit)							\
 	(__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 :	\
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f5ef2d5b9231..62b58cda034a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -13,7 +13,7 @@
 /*
  * Defines x86 CPU feature bits
  */
-#define NCAPINTS			19	   /* N 32-bit words worth of info */
+#define NCAPINTS			20	   /* N 32-bit words worth of info */
 #define NBUGINTS			1	   /* N 32-bit bug flags */
 
 /*
@@ -383,6 +383,10 @@
 #define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
+/* Intel-defined SGX features, CPUID level 0x00000012:0 (EAX), word 19 */
+#define X86_FEATURE_SGX1		(19*32+ 0) /* SGX1 leaf functions */
+#define X86_FEATURE_SGX2		(19*32+ 1) /* SGX2 leaf functions */
+
 /*
  * BUG word(s)
  */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 7947cb1782da..dfb8bbf21e2f 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -28,12 +28,16 @@
 # define DISABLE_CYRIX_ARR	(1<<(X86_FEATURE_CYRIX_ARR & 31))
 # define DISABLE_CENTAUR_MCR	(1<<(X86_FEATURE_CENTAUR_MCR & 31))
 # define DISABLE_PCID		0
+# define DISABLE_SGX1		0
+# define DISABLE_SGX2		0
 #else
 # define DISABLE_VME		0
 # define DISABLE_K6_MTRR	0
 # define DISABLE_CYRIX_ARR	0
 # define DISABLE_CENTAUR_MCR	0
 # define DISABLE_PCID		(1<<(X86_FEATURE_PCID & 31))
+# define DISABLE_SGX1		(1<<(X86_FEATURE_SGX1 & 31))
+# define DISABLE_SGX2		(1<<(X86_FEATURE_SGX2 & 31))
 #endif /* CONFIG_X86_64 */
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
@@ -91,6 +95,7 @@
 			 DISABLE_ENQCMD)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
-#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
+#define DISABLED_MASK19	(DISABLE_SGX1|DISABLE_SGX2)
+#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
 
 #endif /* _ASM_X86_DISABLED_FEATURES_H */
diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h
index 3ff0d48469f2..6a02e04c90fb 100644
--- a/arch/x86/include/asm/required-features.h
+++ b/arch/x86/include/asm/required-features.h
@@ -101,6 +101,6 @@
 #define REQUIRED_MASK16	0
 #define REQUIRED_MASK17	0
 #define REQUIRED_MASK18	0
-#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
+#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
 
 #endif /* _ASM_X86_REQUIRED_FEATURES_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 35ad8480c464..8746499aa415 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -932,6 +932,10 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_capability[CPUID_D_1_EAX] = eax;
 	}
 
+	/* Additional Intel-defined SGX flags: level 0x00000012 */
+	if (c->cpuid_level >= 0x00000012)
+		c->x86_capability[CPUID_12_EAX] = cpuid_eax(0x00000012);
+
 	/* AMD-defined flags: level 0x80000001 */
 	eax = cpuid_eax(0x80000000);
 	c->extended_cpuid_level = eax;
diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index 3b1b01f2b248..4fcd57fdc682 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
 {
 	setup_clear_cpu_cap(X86_FEATURE_SGX);
 	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
+	setup_clear_cpu_cap(X86_FEATURE_SGX1);
+	setup_clear_cpu_cap(X86_FEATURE_SGX2);
 }
 
 static int __init nosgx(char *str)
-- 
2.29.2



* [RFC PATCH 05/23] x86/cpu/intel: Allow SGX virtualization without Launch Control support
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (3 preceding siblings ...)
  2021-01-06  1:55 ` [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features Kai Huang
@ 2021-01-06  1:55 ` Kai Huang
  2021-01-06 19:54   ` Dave Hansen
  2021-01-06  1:56 ` [RFC PATCH 06/23] x86/sgx: Expose SGX architectural definitions to the kernel Kai Huang
                   ` (20 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:55 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, jethro, b.thiel, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Allow SGX virtualization on systems without Launch Control support, i.e.
allow KVM to expose SGX to guests that support non-LC configurations.

Introduce clear_sgx_lc() to clear only the SGX_LC feature bit when SGX
Launch Control is locked by BIOS but SGX virtualization is enabled, which
prevents the SGX driver from being enabled.

Improve the error messages to distinguish three cases: 1) SGX is disabled
completely by BIOS; 2) SGX is disabled completely because SGX LC is locked
by BIOS and SGX virtualization is also disabled; 3) only the SGX driver is
disabled because SGX LC is locked by BIOS, but SGX virtualization is
enabled.
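The three cases can be modelled as a small decision function. This is a sketch under stated assumptions, not the kernel code: classify_sgx() and enum sgx_outcome are hypothetical names, the bios_* flags stand for the FEAT_CTL_SGX_ENABLED and FEAT_CTL_SGX_LC_ENABLED bits read back from MSR_IA32_FEAT_CTL, and the function stops at the first case that applies.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the checks at the update_sgx label. */
enum sgx_outcome {
	SGX_NOT_REQUESTED,	/* kernel never asked for SGX */
	SGX_OFF_BY_BIOS,	/* case 1: FEAT_CTL_SGX_ENABLED is clear */
	SGX_OFF_LC_LOCKED,	/* case 2: LC locked, no SGX virtualization */
	SGX_VIRT_ONLY,		/* case 3: LC locked, driver off, virtualization on */
	SGX_AS_REQUESTED,	/* LC writable: keep the computed capabilities */
};

/* Hypothetical classifier mirroring the intent of the patch. */
static enum sgx_outcome classify_sgx(bool bios_sgx_enabled, bool bios_lc_enabled,
				     bool enable_sgx_driver, bool enable_sgx_virt)
{
	if (!enable_sgx_driver && !enable_sgx_virt)
		return SGX_NOT_REQUESTED;

	/* Case 1: everything SGX-related gets cleared. */
	if (!bios_sgx_enabled)
		return SGX_OFF_BY_BIOS;

	if (!bios_lc_enabled) {
		/* Case 2: without virtualization there is no non-LC user. */
		if (!enable_sgx_virt)
			return SGX_OFF_LC_LOCKED;
		/* Case 3: only X86_FEATURE_SGX_LC is cleared. */
		return SGX_VIRT_ONLY;
	}

	return SGX_AS_REQUESTED;
}
```

For example, classify_sgx(true, false, true, true) yields SGX_VIRT_ONLY: Launch Control is locked, so the driver is disabled, but KVM can still expose SGX to a guest that supports non-LC configurations.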

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/cpu/feat_ctl.c | 48 +++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index 4fcd57fdc682..b07452b68538 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -101,6 +101,11 @@ static void clear_sgx_caps(void)
 	setup_clear_cpu_cap(X86_FEATURE_SGX2);
 }
 
+static void clear_sgx_lc(void)
+{
+	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
+}
+
 static int __init nosgx(char *str)
 {
 	clear_sgx_caps();
@@ -113,7 +118,7 @@ early_param("nosgx", nosgx);
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 {
 	bool tboot = tboot_enabled();
-	bool enable_sgx;
+	bool enable_sgx_virt, enable_sgx_driver;
 	u64 msr;
 
 	if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) {
@@ -123,12 +128,19 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 	}
 
 	/*
-	 * Enable SGX if and only if the kernel supports SGX and Launch Control
-	 * is supported, i.e. disable SGX if the LE hash MSRs can't be written.
+	 * Enable SGX if and only if the kernel supports SGX.  Require Launch
+	 * Control support if SGX virtualization is *not* supported, i.e.
+	 * disable SGX if the LE hash MSRs can't be written and SGX can't be
+	 * exposed to a KVM guest (which might support non-LC configurations).
 	 */
-	enable_sgx = cpu_has(c, X86_FEATURE_SGX) &&
-		     cpu_has(c, X86_FEATURE_SGX_LC) &&
-		     IS_ENABLED(CONFIG_X86_SGX);
+	enable_sgx_driver = cpu_has(c, X86_FEATURE_SGX) &&
+			    cpu_has(c, X86_FEATURE_SGX1) &&
+			    IS_ENABLED(CONFIG_X86_SGX) &&
+			    cpu_has(c, X86_FEATURE_SGX_LC);
+	enable_sgx_virt = cpu_has(c, X86_FEATURE_SGX) &&
+			  cpu_has(c, X86_FEATURE_SGX1) &&
+			  IS_ENABLED(CONFIG_X86_SGX) &&
+			  IS_ENABLED(CONFIG_X86_SGX_VIRTUALIZATION);
 
 	if (msr & FEAT_CTL_LOCKED)
 		goto update_caps;
@@ -151,8 +163,11 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 			msr |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX;
 	}
 
-	if (enable_sgx)
-		msr |= FEAT_CTL_SGX_ENABLED | FEAT_CTL_SGX_LC_ENABLED;
+	if (enable_sgx_driver || enable_sgx_virt) {
+		msr |= FEAT_CTL_SGX_ENABLED;
+		if (enable_sgx_driver)
+			msr |= FEAT_CTL_SGX_LC_ENABLED;
+	}
 
 	wrmsrl(MSR_IA32_FEAT_CTL, msr);
 
@@ -175,10 +190,19 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 	}
 
 update_sgx:
-	if (!(msr & FEAT_CTL_SGX_ENABLED) ||
-	    !(msr & FEAT_CTL_SGX_LC_ENABLED) || !enable_sgx) {
-		if (enable_sgx)
-			pr_err_once("SGX disabled by BIOS\n");
+	if (!(msr & FEAT_CTL_SGX_ENABLED)) {
+		if (enable_sgx_driver || enable_sgx_virt)
+			pr_err_once("SGX disabled by BIOS.\n");
 		clear_sgx_caps();
 	}
+	if (!(msr & FEAT_CTL_SGX_LC_ENABLED) &&
+	    (enable_sgx_driver || enable_sgx_virt)) {
+		if (!enable_sgx_virt) {
+			pr_err_once("SGX Launch Control is locked. Disable SGX.\n");
+			clear_sgx_caps();
+		} else if (enable_sgx_driver) {
+			pr_err_once("SGX Launch Control is locked. Disable SGX driver.\n");
+			clear_sgx_lc();
+		}
+	}
 }
-- 
2.29.2



* [RFC PATCH 06/23] x86/sgx: Expose SGX architectural definitions to the kernel
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (4 preceding siblings ...)
  2021-01-06  1:55 ` [RFC PATCH 05/23] x86/cpu/intel: Allow SGX virtualization without Launch Control support Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 07/23] x86/sgx: Move ENCLS leaf definitions to sgx_arch.h Kai Huang
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, jethro, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

KVM will use many of the architectural constants and structs to
virtualize SGX.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/{kernel/cpu/sgx/arch.h => include/asm/sgx_arch.h} | 0
 arch/x86/kernel/cpu/sgx/encl.c                             | 2 +-
 arch/x86/kernel/cpu/sgx/main.c                             | 2 +-
 arch/x86/kernel/cpu/sgx/sgx.h                              | 2 +-
 tools/testing/selftests/sgx/defines.h                      | 2 +-
 5 files changed, 4 insertions(+), 4 deletions(-)
 rename arch/x86/{kernel/cpu/sgx/arch.h => include/asm/sgx_arch.h} (100%)

diff --git a/arch/x86/kernel/cpu/sgx/arch.h b/arch/x86/include/asm/sgx_arch.h
similarity index 100%
rename from arch/x86/kernel/cpu/sgx/arch.h
rename to arch/x86/include/asm/sgx_arch.h
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index ee50a5010277..24bf1604326d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -7,7 +7,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/suspend.h>
 #include <linux/sched/mm.h>
-#include "arch.h"
+#include <asm/sgx_arch.h>
 #include "encl.h"
 #include "encls.h"
 #include "sgx.h"
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 02993a327a1f..9ad6ab6d4310 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -9,7 +9,7 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
-#include "arch.h"
+#include <asm/sgx_arch.h>
 #include "driver.h"
 #include "encl.h"
 #include "encls.h"
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4dddd81cbbc3..1a3312acbcd9 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -8,7 +8,7 @@
 #include <linux/rwsem.h>
 #include <linux/types.h>
 #include <asm/asm.h>
-#include "arch.h"
+#include <asm/sgx_arch.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt) "sgx: " fmt
diff --git a/tools/testing/selftests/sgx/defines.h b/tools/testing/selftests/sgx/defines.h
index 592c1ccf4576..4dd39a003f40 100644
--- a/tools/testing/selftests/sgx/defines.h
+++ b/tools/testing/selftests/sgx/defines.h
@@ -14,7 +14,7 @@
 #define __aligned(x) __attribute__((__aligned__(x)))
 #define __packed __attribute__((packed))
 
-#include "../../../../arch/x86/kernel/cpu/sgx/arch.h"
+#include "../../../../arch/x86/include/asm/sgx_arch.h"
 #include "../../../../arch/x86/include/asm/enclu.h"
 #include "../../../../arch/x86/include/uapi/asm/sgx.h"
 
-- 
2.29.2



* [RFC PATCH 07/23] x86/sgx: Move ENCLS leaf definitions to sgx_arch.h
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (5 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 06/23] x86/sgx: Expose SGX architectural definitions to the kernel Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 08/23] x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT) Kai Huang
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, jethro, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Move the ENCLS leaf definitions to sgx_arch.h, both because they are
architectural and so that KVM can use them.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/sgx_arch.h | 15 +++++++++++++++
 arch/x86/kernel/cpu/sgx/encls.h | 15 ---------------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/sgx_arch.h b/arch/x86/include/asm/sgx_arch.h
index 56b0f8ae3f92..38ef7ce3d3c7 100644
--- a/arch/x86/include/asm/sgx_arch.h
+++ b/arch/x86/include/asm/sgx_arch.h
@@ -22,6 +22,21 @@
 /* The bitmask for the EPC section type. */
 #define SGX_CPUID_EPC_MASK	GENMASK(3, 0)
 
+enum sgx_encls_function {
+	ECREATE	= 0x00,
+	EADD	= 0x01,
+	EINIT	= 0x02,
+	EREMOVE	= 0x03,
+	EDGBRD	= 0x04,
+	EDGBWR	= 0x05,
+	EEXTEND	= 0x06,
+	ELDU	= 0x08,
+	EBLOCK	= 0x09,
+	EPA	= 0x0A,
+	EWB	= 0x0B,
+	ETRACK	= 0x0C,
+};
+
 /**
  * enum sgx_return_code - The return code type for ENCLS, ENCLU and ENCLV
  * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index 443188fe7e70..be5c49689980 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -11,21 +11,6 @@
 #include <asm/traps.h>
 #include "sgx.h"
 
-enum sgx_encls_function {
-	ECREATE	= 0x00,
-	EADD	= 0x01,
-	EINIT	= 0x02,
-	EREMOVE	= 0x03,
-	EDGBRD	= 0x04,
-	EDGBWR	= 0x05,
-	EEXTEND	= 0x06,
-	ELDU	= 0x08,
-	EBLOCK	= 0x09,
-	EPA	= 0x0A,
-	EWB	= 0x0B,
-	ETRACK	= 0x0C,
-};
-
 /**
  * ENCLS_FAULT_FLAG - flag signifying an ENCLS return code is a trapnr
  *
-- 
2.29.2



* [RFC PATCH 08/23] x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT)
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (6 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 07/23] x86/sgx: Move ENCLS leaf definitions to sgx_arch.h Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 09/23] x86/sgx: Add encls_faulted() helper Kai Huang
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, jethro, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Define the ENCLS leaves that are available with SGX2, also referred to as
Enclave Dynamic Memory Management (EDMM).  KVM will use these leaves to
conditionally expose SGX2 capabilities to guests.
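One way a hypervisor could consume these leaf numbers is to build a per-guest mask of permitted ENCLS leaves, adding the SGX2 (EDMM) leaves only when SGX2 is exposed. This sketch is not taken from this series: guest_encls_leaf_mask() and encls_leaf_allowed() are hypothetical, SGX2 is assumed to require SGX1, and the SGX1 range 0x00-0x0C includes ELDB (0x07), which the kernel enum does not define.

```c
#include <stdbool.h>
#include <stdint.h>

/* Leaf numbers copied from enum sgx_encls_function in sgx_arch.h. */
enum sgx_encls_function {
	ECREATE = 0x00, EADD = 0x01, EINIT = 0x02, EREMOVE = 0x03,
	EDGBRD = 0x04, EDGBWR = 0x05, EEXTEND = 0x06, ELDU = 0x08,
	EBLOCK = 0x09, EPA = 0x0A, EWB = 0x0B, ETRACK = 0x0C,
	EAUG = 0x0D, EMODPR = 0x0E, EMODT = 0x0F,
};

/* Hypothetical: one bit per ENCLS leaf the guest may execute. */
static uint32_t guest_encls_leaf_mask(bool sgx1, bool sgx2)
{
	uint32_t mask = 0;

	/* SGX1: every leaf from ECREATE (0x00) through ETRACK (0x0C). */
	if (sgx1)
		mask |= (UINT32_C(1) << (ETRACK + 1)) - 1;

	/* SGX2 (EDMM) adds EAUG, EMODPR and EMODT on top of SGX1. */
	if (sgx1 && sgx2)
		mask |= (UINT32_C(1) << EAUG) | (UINT32_C(1) << EMODPR) |
			(UINT32_C(1) << EMODT);

	return mask;
}

static bool encls_leaf_allowed(uint32_t mask, unsigned int leaf)
{
	return leaf < 32 && (mask & (UINT32_C(1) << leaf));
}
```

With SGX1 only, the mask is 0x1FFF and EAUG is rejected; with SGX2 as well, it grows to 0xFFFF.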

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/sgx_arch.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/sgx_arch.h b/arch/x86/include/asm/sgx_arch.h
index 38ef7ce3d3c7..2323ded379d6 100644
--- a/arch/x86/include/asm/sgx_arch.h
+++ b/arch/x86/include/asm/sgx_arch.h
@@ -35,6 +35,9 @@ enum sgx_encls_function {
 	EPA	= 0x0A,
 	EWB	= 0x0B,
 	ETRACK	= 0x0C,
+	EAUG	= 0x0D,
+	EMODPR	= 0x0E,
+	EMODT	= 0x0F,
 };
 
 /**
-- 
2.29.2



* [RFC PATCH 09/23] x86/sgx: Add encls_faulted() helper
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (7 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 08/23] x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT) Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 10/23] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs Kai Huang
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a helper to extract the fault indicator from an encoded ENCLS return
value.  SGX virtualization will also need to detect ENCLS faults.
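The encoding that encls_faulted() inspects can be exercised outside the kernel. The sketch assumes ENCLS_FAULT_FLAG is 0x40000000 and that ENCLS_TRAPNR() masks it off, as in encls.h; encls_encode_fault() is a hypothetical stand-in for the exception fixup that produces the encoded value.

```c
#include <stdbool.h>

/* Assumed to match arch/x86/kernel/cpu/sgx/encls.h. */
#define ENCLS_FAULT_FLAG	0x40000000
#define ENCLS_TRAPNR(r)		((r) & ~ENCLS_FAULT_FLAG)
#define X86_TRAP_PF		14

/* Hypothetical: what the fixup hands back when ENCLS takes an exception. */
static int encls_encode_fault(int trapnr)
{
	return trapnr | ENCLS_FAULT_FLAG;
}

/* True only for the "exception" encoding, never for SGX error codes. */
static inline bool encls_faulted(int ret)
{
	return ret & ENCLS_FAULT_FLAG;
}

/* Faults other than #PF, and any non-zero SGX error code, count as failures. */
static inline bool encls_failed(int ret)
{
	if (encls_faulted(ret))
		return ENCLS_TRAPNR(ret) != X86_TRAP_PF;

	return !!ret;
}
```

A #GP (vector 13) is reported as both faulted and failed, a #PF as faulted but not failed, and a plain SGX error code as failed but not faulted.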

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/cpu/sgx/encls.h | 14 +++++++++++++-
 arch/x86/kernel/cpu/sgx/ioctl.c |  2 +-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index be5c49689980..55919a2b01b0 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -40,6 +40,18 @@
 	} while (0);							  \
 }
 
+/*
+ * encls_faulted() - Check if an ENCLS leaf faulted given an error code
+ * @ret		the return value of an ENCLS leaf function call
+ *
+ * Return:
+ *	%true if @ret indicates a fault, %false otherwise
+ */
+static inline bool encls_faulted(int ret)
+{
+	return ret & ENCLS_FAULT_FLAG;
+}
+
 /**
  * encls_failed() - Check if an ENCLS function failed
  * @ret:	the return value of an ENCLS function call
@@ -50,7 +62,7 @@
  */
 static inline bool encls_failed(int ret)
 {
-	if (ret & ENCLS_FAULT_FLAG)
+	if (encls_faulted(ret))
 		return ENCLS_TRAPNR(ret) != X86_TRAP_PF;
 
 	return !!ret;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 90a5caf76939..e5977752c7be 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -568,7 +568,7 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 		}
 	}
 
-	if (ret & ENCLS_FAULT_FLAG) {
+	if (encls_faulted(ret)) {
 		if (encls_failed(ret))
 			ENCLS_WARN(ret, "EINIT");
 
-- 
2.29.2



* [RFC PATCH 10/23] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (8 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 09/23] x86/sgx: Add encls_faulted() helper Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06 19:56   ` Dave Hansen
  2021-01-06  1:56 ` [RFC PATCH 11/23] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Kai Huang
                   ` (15 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, Kai Huang

Add a helper to update the SGX_LEPUBKEYHASHn MSRs.  SGX virtualization
also needs to update those MSRs, using the guest's "virtual"
SGX_LEPUBKEYHASHn values, before executing EINIT on behalf of the guest.
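A userspace model of the helper, assuming MSR_IA32_SGXLEPUBKEYHASH0 is 0x8c and replacing wrmsrl() with writes into a hypothetical array (fake_msrs) that stands for one CPU's MSRs. Because the real MSRs are per CPU, the kernel callers wrap the update and EINIT in preempt_disable()/preempt_enable() so both run on the same CPU; this single-"CPU" model cannot show that. struct fake_vcpu and prepare_guest_einit() are also hypothetical.

```c
#include <stdint.h>

#define MSR_IA32_SGXLEPUBKEYHASH0	0x8c	/* assumed MSR index */

/* Hypothetical stand-in for one CPU's MSR file. */
static uint64_t fake_msrs[0x100];

static void fake_wrmsrl(uint32_t msr, uint64_t val)
{
	fake_msrs[msr] = val;
}

/* Same loop as sgx_update_lepubkeyhash(): four consecutive hash MSRs. */
static void update_lepubkeyhash(const uint64_t *lepubkeyhash)
{
	int i;

	for (i = 0; i < 4; i++)
		fake_wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, lepubkeyhash[i]);
}

/* Hypothetical vCPU state holding the guest's "virtual" hash values. */
struct fake_vcpu {
	uint64_t lepubkeyhash[4];
};

/* Load the guest's values into the hardware MSRs right before EINIT. */
static void prepare_guest_einit(const struct fake_vcpu *vcpu)
{
	update_lepubkeyhash(vcpu->lepubkeyhash);
}
```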

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/cpu/sgx/ioctl.c | 5 ++---
 arch/x86/kernel/cpu/sgx/main.c  | 8 ++++++++
 arch/x86/kernel/cpu/sgx/sgx.h   | 2 ++
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index e5977752c7be..1bae754268d1 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -495,7 +495,7 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 			 void *token)
 {
 	u64 mrsigner[4];
-	int i, j, k;
+	int i, j;
 	void *addr;
 	int ret;
 
@@ -544,8 +544,7 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 
 			preempt_disable();
 
-			for (k = 0; k < 4; k++)
-				wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + k, mrsigner[k]);
+			sgx_update_lepubkeyhash(mrsigner);
 
 			ret = __einit(sigstruct, token, addr);
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 9ad6ab6d4310..fd77b5775bc4 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -714,6 +714,14 @@ static bool __init sgx_page_cache_init(void)
 	return true;
 }
 
+void sgx_update_lepubkeyhash(u64 *lepubkeyhash)
+{
+	int i;
+
+	for (i = 0; i < 4; i++)
+		wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, lepubkeyhash[i]);
+}
+
 static void __init sgx_init(void)
 {
 	int ret;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 1a3312acbcd9..ca741577fea5 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -84,4 +84,6 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 
+void sgx_update_lepubkeyhash(u64 *lepubkeyhash);
+
 #endif /* _X86_SGX_H */
-- 
2.29.2



* [RFC PATCH 11/23] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (9 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 10/23] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06 20:12   ` Dave Hansen
  2021-01-06  1:56 ` [RFC PATCH 12/23] x86/sgx: Move provisioning device creation out of SGX driver Kai Huang
                   ` (14 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Provide wrappers around __ecreate() and __einit() to hide the ugliness
of overloading the ENCLS return value to encode multiple error formats
in a single int.  KVM will trap-and-execute ECREATE and EINIT as part
of SGX virtualization, and on an exception, KVM needs the trapnr so that
it can inject the correct fault into the guest.
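The return convention of the wrappers can be sketched as follows: a fault is split into -EFAULT plus *trapnr, while 0 or a positive SGX error code is passed through, as sgx_virt_einit() does. decode_encls_result(), handle_einit_result() and enum guest_action are hypothetical illustrations of how a caller might react, and the constants are assumed to match encls.h.

```c
#include <errno.h>

#define ENCLS_FAULT_FLAG	0x40000000	/* assumed, as in encls.h */
#define ENCLS_TRAPNR(r)		((r) & ~ENCLS_FAULT_FLAG)

/* Split an encoded ENCLS result into an errno plus the trap number. */
static int decode_encls_result(int ret, int *trapnr)
{
	if (ret & ENCLS_FAULT_FLAG) {
		*trapnr = ENCLS_TRAPNR(ret);
		return -EFAULT;
	}
	/* 0 on success, otherwise a positive SGX error code. */
	return ret;
}

/* Hypothetical caller-side decision, not KVM's actual handler. */
enum guest_action { RESUME_GUEST, INJECT_FAULT, SET_GUEST_ERROR };

static enum guest_action handle_einit_result(int ret, int *trapnr)
{
	int r = decode_encls_result(ret, trapnr);

	if (r == -EFAULT)
		return INJECT_FAULT;	/* inject vector *trapnr into the guest */
	return r ? SET_GUEST_ERROR : RESUME_GUEST;
}
```

Keeping the trap number in a separate out-parameter means the caller never has to know about ENCLS_FAULT_FLAG.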

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
 [Kai: Use sgx_update_lepubkeyhash() to update pubkey hash MSRs.]
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/sgx.h     | 16 ++++++++++
 arch/x86/kernel/cpu/sgx/virt.c | 55 ++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)
 create mode 100644 arch/x86/include/asm/sgx.h

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
new file mode 100644
index 000000000000..0d643b985085
--- /dev/null
+++ b/arch/x86/include/asm/sgx.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_SGX_H
+#define _ASM_X86_SGX_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_X86_SGX_VIRTUALIZATION
+struct sgx_pageinfo;
+
+int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
+		     int *trapnr);
+int sgx_virt_einit(void __user *sigstruct, void __user *token,
+		   void __user *secs, u64 *lepubkeyhash, int *trapnr);
+#endif
+
+#endif /* _ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index d625551ccf25..4e9810ba9259 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -261,3 +261,58 @@ int __init sgx_virt_epc_init(void)
 
 	return misc_register(&sgx_virt_epc_dev);
 }
+
+int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
+		     int *trapnr)
+{
+	int ret;
+
+	__uaccess_begin();
+	ret = __ecreate(pageinfo, (void *)secs);
+	__uaccess_end();
+
+	if (encls_faulted(ret)) {
+		*trapnr = ENCLS_TRAPNR(ret);
+		return -EFAULT;
+	}
+
+	/* ECREATE doesn't return an error code, it faults or succeeds. */
+	WARN_ON_ONCE(ret);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sgx_virt_ecreate);
+
+static int __sgx_virt_einit(void __user *sigstruct, void __user *token,
+			    void __user *secs)
+{
+	int ret;
+
+	__uaccess_begin();
+	ret =  __einit((void *)sigstruct, (void *)token, (void *)secs);
+	__uaccess_end();
+	return ret;
+}
+
+int sgx_virt_einit(void __user *sigstruct, void __user *token,
+		   void __user *secs, u64 *lepubkeyhash, int *trapnr)
+{
+	int ret;
+
+	if (!boot_cpu_has(X86_FEATURE_SGX_LC)) {
+		ret = __sgx_virt_einit(sigstruct, token, secs);
+	} else {
+		preempt_disable();
+
+		sgx_update_lepubkeyhash(lepubkeyhash);
+
+		ret = __sgx_virt_einit(sigstruct, token, secs);
+		preempt_enable();
+	}
+
+	if (encls_faulted(ret)) {
+		*trapnr = ENCLS_TRAPNR(ret);
+		return -EFAULT;
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sgx_virt_einit);
-- 
2.29.2



* [RFC PATCH 12/23] x86/sgx: Move provisioning device creation out of SGX driver
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (10 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 11/23] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 13/23] KVM: VMX: Convert vcpu_vmx.exit_reason to a union Kai Huang
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Also extract sgx_set_attribute() from sgx_ioc_enclave_provision() and
export it as a symbol for KVM to use.

The provisioning key is sensitive.  The SGX driver only allows creating
an enclave that can access the provisioning key when the enclave creator
has permission to open /dev/sgx_provision.  The same should apply to VMs:
the provisioning key is platform specific, so an unrestricted VM could
also potentially compromise it.

Move provisioning device creation out of sgx_drv_init() and into
sgx_init() in preparation for adding SGX virtualization support.  This
way, even when the SGX driver is not enabled because Flexible Launch
Control is unavailable, SGX virtualization can still be enabled and can
use the provisioning device to restrict a VM's ability to access the
provisioning key.
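The check in sgx_set_attribute() boils down to an identity test on the file's file_operations pointer: holding an fd for /dev/sgx_provision is the capability. The model below replaces struct file, fget() and the fops with hypothetical stand-ins, omits the fput() reference drop, and assumes SGX_ATTR_PROVISIONKEY is bit 4 as in sgx_arch.h.

```c
#include <errno.h>
#include <stddef.h>

#define SGX_ATTR_PROVISIONKEY	(1UL << 4)	/* assumed, per sgx_arch.h */

/* Hypothetical stand-ins for struct file_operations and struct file. */
struct fake_fops { const char *name; };
struct fake_file { const struct fake_fops *f_op; };

static const struct fake_fops sgx_provision_fops = { "sgx_provision" };
static const struct fake_fops other_fops = { "other" };

/* Toy fd table standing in for fget(); NULL means "no such fd". */
static struct fake_file *fd_table[8];

static struct fake_file *fake_fget(unsigned int fd)
{
	return fd < 8 ? fd_table[fd] : NULL;
}

/* Grant PROVISIONKEY only if the fd refers to the provisioning device. */
static int set_attribute(unsigned long *allowed_attributes,
			 unsigned int attribute_fd)
{
	struct fake_file *file = fake_fget(attribute_fd);

	if (!file)
		return -EINVAL;

	/* Identity is checked via the fops pointer, not the path name. */
	if (file->f_op != &sgx_provision_fops)
		return -EINVAL;

	*allowed_attributes |= SGX_ATTR_PROVISIONKEY;
	return 0;
}
```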

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/sgx.h       |  3 +++
 arch/x86/kernel/cpu/sgx/driver.c | 17 ------------
 arch/x86/kernel/cpu/sgx/ioctl.c  | 16 ++----------
 arch/x86/kernel/cpu/sgx/main.c   | 44 +++++++++++++++++++++++++++++++-
 4 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 0d643b985085..795d724fab87 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -4,6 +4,9 @@
 
 #include <linux/types.h>
 
+int sgx_set_attribute(unsigned long *allowed_attributes,
+		      unsigned int attribute_fd);
+
 #ifdef CONFIG_X86_SGX_VIRTUALIZATION
 struct sgx_pageinfo;
 
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index f2eac41bb4ff..4f3241109bda 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -133,10 +133,6 @@ static const struct file_operations sgx_encl_fops = {
 	.get_unmapped_area	= sgx_get_unmapped_area,
 };
 
-const struct file_operations sgx_provision_fops = {
-	.owner			= THIS_MODULE,
-};
-
 static struct miscdevice sgx_dev_enclave = {
 	.minor = MISC_DYNAMIC_MINOR,
 	.name = "sgx_enclave",
@@ -144,13 +140,6 @@ static struct miscdevice sgx_dev_enclave = {
 	.fops = &sgx_encl_fops,
 };
 
-static struct miscdevice sgx_dev_provision = {
-	.minor = MISC_DYNAMIC_MINOR,
-	.name = "sgx_provision",
-	.nodename = "sgx_provision",
-	.fops = &sgx_provision_fops,
-};
-
 int __init sgx_drv_init(void)
 {
 	unsigned int eax, ebx, ecx, edx;
@@ -184,11 +173,5 @@ int __init sgx_drv_init(void)
 	if (ret)
 		return ret;
 
-	ret = misc_register(&sgx_dev_provision);
-	if (ret) {
-		misc_deregister(&sgx_dev_enclave);
-		return ret;
-	}
-
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 1bae754268d1..4714de12422d 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -2,6 +2,7 @@
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
 #include <asm/mman.h>
+#include <asm/sgx.h>
 #include <linux/mman.h>
 #include <linux/delay.h>
 #include <linux/file.h>
@@ -664,24 +665,11 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
 static long sgx_ioc_enclave_provision(struct sgx_encl *encl, void __user *arg)
 {
 	struct sgx_enclave_provision params;
-	struct file *file;
 
 	if (copy_from_user(&params, arg, sizeof(params)))
 		return -EFAULT;
 
-	file = fget(params.fd);
-	if (!file)
-		return -EINVAL;
-
-	if (file->f_op != &sgx_provision_fops) {
-		fput(file);
-		return -EINVAL;
-	}
-
-	encl->attributes_mask |= SGX_ATTR_PROVISIONKEY;
-
-	fput(file);
-	return 0;
+	return sgx_set_attribute(&encl->attributes_mask, params.fd);
 }
 
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index fd77b5775bc4..90659937950b 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1,15 +1,18 @@
 // SPDX-License-Identifier: GPL-2.0
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
+#include <linux/file.h>
 #include <linux/freezer.h>
 #include <linux/highmem.h>
 #include <linux/kthread.h>
+#include <linux/miscdevice.h>
 #include <linux/pagemap.h>
 #include <linux/ratelimit.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
 #include <asm/sgx_arch.h>
+#include <asm/sgx.h>
 #include "driver.h"
 #include "encl.h"
 #include "encls.h"
@@ -722,6 +725,38 @@ void sgx_update_lepubkeyhash(u64 *lepubkeyhash)
 		wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, lepubkeyhash[i]);
 }
 
+const struct file_operations sgx_provision_fops = {
+	.owner			= THIS_MODULE,
+};
+
+static struct miscdevice sgx_dev_provision = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "sgx_provision",
+	.nodename = "sgx_provision",
+	.fops = &sgx_provision_fops,
+};
+
+int sgx_set_attribute(unsigned long *allowed_attributes,
+		      unsigned int attribute_fd)
+{
+	struct file *file;
+
+	file = fget(attribute_fd);
+	if (!file)
+		return -EINVAL;
+
+	if (file->f_op != &sgx_provision_fops) {
+		fput(file);
+		return -EINVAL;
+	}
+
+	*allowed_attributes |= SGX_ATTR_PROVISIONKEY;
+
+	fput(file);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sgx_set_attribute);
+
 static void __init sgx_init(void)
 {
 	int ret;
@@ -736,13 +771,20 @@ static void __init sgx_init(void)
 	if (!sgx_page_reclaimer_init())
 		goto err_page_cache;
 
+	ret = misc_register(&sgx_dev_provision);
+	if (ret)
+		goto err_kthread;
+
 	/* Success if the native *or* virtual EPC driver initialized cleanly. */
 	ret = !!sgx_drv_init() & !!sgx_virt_epc_init();
 	if (ret)
-		goto err_kthread;
+		goto err_provision;
 
 	return;
 
+err_provision:
+	misc_deregister(&sgx_dev_provision);
+
 err_kthread:
 	kthread_stop(ksgxd_tsk);
 
-- 
2.29.2



* [RFC PATCH 13/23] KVM: VMX: Convert vcpu_vmx.exit_reason to a union
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (11 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 12/23] x86/sgx: Move provisioning device creation out of SGX driver Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 14/23] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX) Kai Huang
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, mattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Convert vcpu_vmx.exit_reason from a u32 to a union (of size u32).  The
full VM_EXIT_REASON field consists of a 16-bit basic exit reason in
bits 15:0 and single-bit modifiers in bits 31:16.

Historically, KVM has only had to worry about handling the "failed
VM-Entry" modifier, which could only be set in very specific flows and
required dedicated handling.  I.e. manually stripping the FAILED_VMENTRY
bit was a somewhat viable approach.  But even with only a single bit to
worry about, KVM has had several bugs related to comparing a basic exit
reason against the full exit reason stored in vcpu_vmx.

Upcoming Intel features, e.g. SGX, will add new modifier bits that can
be set on more or less any VM-Exit, as opposed to the significantly more
restricted FAILED_VMENTRY, i.e. correctly handling everything in one-off
flows isn't scalable.  Tracking the exit reason in a union forces code to
explicitly choose between consuming the full exit reason and the basic
exit reason, and is a convenient way to document and access the modifiers.
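The following hypothetical sketch illustrates the class of bug described above. Once a modifier bit is set, such as the SGX enclave-mode bit (bit 27, VMX_EXIT_REASONS_SGX_ENCLAVE_MODE, added later in this series), comparing the full value against a basic exit reason silently fails. Truncating to 16 bits, as the old "(u16)exit_reason" casts did, gives the intended result. The helper names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXIT_REASON_EXCEPTION_NMI          0
#define VMX_EXIT_REASONS_SGX_ENCLAVE_MODE  0x08000000u

/* Buggy pattern: compares a basic reason against the full 32-bit field. */
static bool demo_is_nmi_exit_buggy(uint32_t full)
{
	return full == EXIT_REASON_EXCEPTION_NMI;
}

/* Correct pattern: drop the modifier bits (31:16) before comparing. */
static bool demo_is_nmi_exit(uint32_t full)
{
	return (uint16_t)full == EXIT_REASON_EXCEPTION_NMI;
}
```

With the union, the second form becomes "exit_reason.basic == EXIT_REASON_EXCEPTION_NMI". The compiler then makes the choice visible at every call site instead of leaving it to a cast that is easy to forget.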

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 42 +++++++++++++++---------
 arch/x86/kvm/vmx/vmx.c    | 68 ++++++++++++++++++++-------------------
 arch/x86/kvm/vmx/vmx.h    | 25 +++++++++++++-
 3 files changed, 86 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 89af692deb7e..1ae147e2c8f2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3310,7 +3310,11 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	enum vm_entry_failure_code entry_failure_code;
 	bool evaluate_pending_interrupts;
-	u32 exit_reason, failed_index;
+	u32 failed_index;
+	union vmx_exit_reason exit_reason = {
+		.basic = -1,
+		.failed_vmentry = 1,
+	};
 
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
 		kvm_vcpu_flush_tlb_current(vcpu);
@@ -3362,7 +3366,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 
 		if (nested_vmx_check_guest_state(vcpu, vmcs12,
 						 &entry_failure_code)) {
-			exit_reason = EXIT_REASON_INVALID_STATE;
+			exit_reason.basic = EXIT_REASON_INVALID_STATE;
 			vmcs12->exit_qualification = entry_failure_code;
 			goto vmentry_fail_vmexit;
 		}
@@ -3373,7 +3377,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 		vcpu->arch.tsc_offset += vmcs12->tsc_offset;
 
 	if (prepare_vmcs02(vcpu, vmcs12, &entry_failure_code)) {
-		exit_reason = EXIT_REASON_INVALID_STATE;
+		exit_reason.basic = EXIT_REASON_INVALID_STATE;
 		vmcs12->exit_qualification = entry_failure_code;
 		goto vmentry_fail_vmexit_guest_mode;
 	}
@@ -3383,7 +3387,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 						   vmcs12->vm_entry_msr_load_addr,
 						   vmcs12->vm_entry_msr_load_count);
 		if (failed_index) {
-			exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
+			exit_reason.basic = EXIT_REASON_MSR_LOAD_FAIL;
 			vmcs12->exit_qualification = failed_index;
 			goto vmentry_fail_vmexit_guest_mode;
 		}
@@ -3451,7 +3455,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 		return NVMX_VMENTRY_VMEXIT;
 
 	load_vmcs12_host_state(vcpu, vmcs12);
-	vmcs12->vm_exit_reason = exit_reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
+	vmcs12->vm_exit_reason = exit_reason.full;
 	if (enable_shadow_vmcs || vmx->nested.hv_evmcs)
 		vmx->nested.need_vmcs12_to_shadow_sync = true;
 	return NVMX_VMENTRY_VMEXIT;
@@ -5512,7 +5516,12 @@ static int handle_vmfunc(struct kvm_vcpu *vcpu)
 	return kvm_skip_emulated_instruction(vcpu);
 
 fail:
-	nested_vmx_vmexit(vcpu, vmx->exit_reason,
+	/*
+	 * This is effectively a reflected VM-Exit, as opposed to a synthesized
+	 * nested VM-Exit.  Pass the original exit reason, i.e. don't hardcode
+	 * EXIT_REASON_VMFUNC as the exit reason.
+	 */
+	nested_vmx_vmexit(vcpu, vmx->exit_reason.full,
 			  vmx_get_intr_info(vcpu),
 			  vmx_get_exit_qual(vcpu));
 	return 1;
@@ -5580,7 +5589,8 @@ static bool nested_vmx_exit_handled_io(struct kvm_vcpu *vcpu,
  * MSR bitmap. This may be the case even when L0 doesn't use MSR bitmaps.
  */
 static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
-	struct vmcs12 *vmcs12, u32 exit_reason)
+					struct vmcs12 *vmcs12,
+					union vmx_exit_reason exit_reason)
 {
 	u32 msr_index = kvm_rcx_read(vcpu);
 	gpa_t bitmap;
@@ -5594,7 +5604,7 @@ static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
 	 * First we need to figure out which of the four to use:
 	 */
 	bitmap = vmcs12->msr_bitmap;
-	if (exit_reason == EXIT_REASON_MSR_WRITE)
+	if (exit_reason.basic == EXIT_REASON_MSR_WRITE)
 		bitmap += 2048;
 	if (msr_index >= 0xc0000000) {
 		msr_index -= 0xc0000000;
@@ -5731,11 +5741,12 @@ static bool nested_vmx_exit_handled_mtf(struct vmcs12 *vmcs12)
  * Return true if L0 wants to handle an exit from L2 regardless of whether or not
  * L1 wants the exit.  Only call this when in is_guest_mode (L2).
  */
-static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu, u32 exit_reason)
+static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu,
+				     union vmx_exit_reason exit_reason)
 {
 	u32 intr_info;
 
-	switch ((u16)exit_reason) {
+	switch (exit_reason.basic) {
 	case EXIT_REASON_EXCEPTION_NMI:
 		intr_info = vmx_get_intr_info(vcpu);
 		if (is_nmi(intr_info))
@@ -5791,12 +5802,13 @@ static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu, u32 exit_reason)
  * Return 1 if L1 wants to intercept an exit from L2.  Only call this when in
  * is_guest_mode (L2).
  */
-static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu, u32 exit_reason)
+static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu,
+				     union vmx_exit_reason exit_reason)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	u32 intr_info;
 
-	switch ((u16)exit_reason) {
+	switch (exit_reason.basic) {
 	case EXIT_REASON_EXCEPTION_NMI:
 		intr_info = vmx_get_intr_info(vcpu);
 		if (is_nmi(intr_info))
@@ -5915,7 +5927,7 @@ static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu, u32 exit_reason)
 bool nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u32 exit_reason = vmx->exit_reason;
+	union vmx_exit_reason exit_reason = vmx->exit_reason;
 	unsigned long exit_qual;
 	u32 exit_intr_info;
 
@@ -5934,7 +5946,7 @@ bool nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu)
 		goto reflect_vmexit;
 	}
 
-	trace_kvm_nested_vmexit(exit_reason, vcpu, KVM_ISA_VMX);
+	trace_kvm_nested_vmexit(exit_reason.full, vcpu, KVM_ISA_VMX);
 
 	/* If L0 (KVM) wants the exit, it trumps L1's desires. */
 	if (nested_vmx_l0_wants_exit(vcpu, exit_reason))
@@ -5960,7 +5972,7 @@ bool nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu)
 	exit_qual = vmx_get_exit_qual(vcpu);
 
 reflect_vmexit:
-	nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
+	nested_vmx_vmexit(vcpu, exit_reason.full, exit_intr_info, exit_qual);
 	return true;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 47b8357b9751..8b37812bbadc 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1578,7 +1578,7 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	 * i.e. we end up advancing IP with some random value.
 	 */
 	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
-	    to_vmx(vcpu)->exit_reason != EXIT_REASON_EPT_MISCONFIG) {
+	    to_vmx(vcpu)->exit_reason.basic != EXIT_REASON_EPT_MISCONFIG) {
 		orig_rip = kvm_rip_read(vcpu);
 		rip = orig_rip + vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
 #ifdef CONFIG_X86_64
@@ -5687,7 +5687,7 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	*info1 = vmx_get_exit_qual(vcpu);
-	if (!(vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
+	if (!vmx->exit_reason.failed_vmentry) {
 		*info2 = vmx->idt_vectoring_info;
 		*intr_info = vmx_get_intr_info(vcpu);
 		if (is_exception_with_error_code(*intr_info))
@@ -5931,8 +5931,9 @@ void dump_vmcs(void)
 static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u32 exit_reason = vmx->exit_reason;
+	union vmx_exit_reason exit_reason = vmx->exit_reason;
 	u32 vectoring_info = vmx->idt_vectoring_info;
+	u16 exit_handler_index;
 
 	/*
 	 * Flush logged GPAs PML buffer, this will make dirty_bitmap more
@@ -5974,11 +5975,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 			return 1;
 	}
 
-	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
+	if (exit_reason.failed_vmentry) {
 		dump_vmcs();
 		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
 		vcpu->run->fail_entry.hardware_entry_failure_reason
-			= exit_reason;
+			= exit_reason.full;
 		vcpu->run->fail_entry.cpu = vcpu->arch.last_vmentry_cpu;
 		return 0;
 	}
@@ -6000,18 +6001,18 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	 * will cause infinite loop.
 	 */
 	if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
-			(exit_reason != EXIT_REASON_EXCEPTION_NMI &&
-			exit_reason != EXIT_REASON_EPT_VIOLATION &&
-			exit_reason != EXIT_REASON_PML_FULL &&
-			exit_reason != EXIT_REASON_APIC_ACCESS &&
-			exit_reason != EXIT_REASON_TASK_SWITCH)) {
+	    (exit_reason.basic != EXIT_REASON_EXCEPTION_NMI &&
+	     exit_reason.basic != EXIT_REASON_EPT_VIOLATION &&
+	     exit_reason.basic != EXIT_REASON_PML_FULL &&
+	     exit_reason.basic != EXIT_REASON_APIC_ACCESS &&
+	     exit_reason.basic != EXIT_REASON_TASK_SWITCH)) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV;
 		vcpu->run->internal.ndata = 3;
 		vcpu->run->internal.data[0] = vectoring_info;
-		vcpu->run->internal.data[1] = exit_reason;
+		vcpu->run->internal.data[1] = exit_reason.full;
 		vcpu->run->internal.data[2] = vcpu->arch.exit_qualification;
-		if (exit_reason == EXIT_REASON_EPT_MISCONFIG) {
+		if (exit_reason.basic == EXIT_REASON_EPT_MISCONFIG) {
 			vcpu->run->internal.ndata++;
 			vcpu->run->internal.data[3] =
 				vmcs_read64(GUEST_PHYSICAL_ADDRESS);
@@ -6043,38 +6044,39 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	if (exit_fastpath != EXIT_FASTPATH_NONE)
 		return 1;
 
-	if (exit_reason >= kvm_vmx_max_exit_handlers)
+	if (exit_reason.basic >= kvm_vmx_max_exit_handlers)
 		goto unexpected_vmexit;
 #ifdef CONFIG_RETPOLINE
-	if (exit_reason == EXIT_REASON_MSR_WRITE)
+	if (exit_reason.basic == EXIT_REASON_MSR_WRITE)
 		return kvm_emulate_wrmsr(vcpu);
-	else if (exit_reason == EXIT_REASON_PREEMPTION_TIMER)
+	else if (exit_reason.basic == EXIT_REASON_PREEMPTION_TIMER)
 		return handle_preemption_timer(vcpu);
-	else if (exit_reason == EXIT_REASON_INTERRUPT_WINDOW)
+	else if (exit_reason.basic == EXIT_REASON_INTERRUPT_WINDOW)
 		return handle_interrupt_window(vcpu);
-	else if (exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
+	else if (exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
 		return handle_external_interrupt(vcpu);
-	else if (exit_reason == EXIT_REASON_HLT)
+	else if (exit_reason.basic == EXIT_REASON_HLT)
 		return kvm_emulate_halt(vcpu);
-	else if (exit_reason == EXIT_REASON_EPT_MISCONFIG)
+	else if (exit_reason.basic == EXIT_REASON_EPT_MISCONFIG)
 		return handle_ept_misconfig(vcpu);
 #endif
 
-	exit_reason = array_index_nospec(exit_reason,
-					 kvm_vmx_max_exit_handlers);
-	if (!kvm_vmx_exit_handlers[exit_reason])
+	exit_handler_index = array_index_nospec((u16)exit_reason.basic,
+						kvm_vmx_max_exit_handlers);
+	if (!kvm_vmx_exit_handlers[exit_handler_index])
 		goto unexpected_vmexit;
 
-	return kvm_vmx_exit_handlers[exit_reason](vcpu);
+	return kvm_vmx_exit_handlers[exit_handler_index](vcpu);
 
 unexpected_vmexit:
-	vcpu_unimpl(vcpu, "vmx: unexpected exit reason 0x%x\n", exit_reason);
+	vcpu_unimpl(vcpu, "vmx: unexpected exit reason 0x%x\n",
+		    exit_reason.full);
 	dump_vmcs();
 	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 	vcpu->run->internal.suberror =
 			KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
 	vcpu->run->internal.ndata = 2;
-	vcpu->run->internal.data[0] = exit_reason;
+	vcpu->run->internal.data[0] = exit_reason.full;
 	vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
 	return 0;
 }
@@ -6393,9 +6395,9 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	if (vmx->exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
+	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
 		handle_external_interrupt_irqoff(vcpu);
-	else if (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI)
+	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
 		handle_exception_nmi_irqoff(vmx);
 }
 
@@ -6583,7 +6585,7 @@ void noinstr vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
 
 static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 {
-	switch (to_vmx(vcpu)->exit_reason) {
+	switch (to_vmx(vcpu)->exit_reason.basic) {
 	case EXIT_REASON_MSR_WRITE:
 		return handle_fastpath_set_msr_irqoff(vcpu);
 	case EXIT_REASON_PREEMPTION_TIMER:
@@ -6782,17 +6784,17 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	vmx->idt_vectoring_info = 0;
 
 	if (unlikely(vmx->fail)) {
-		vmx->exit_reason = 0xdead;
+		vmx->exit_reason.full = 0xdead;
 		return EXIT_FASTPATH_NONE;
 	}
 
-	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
-	if (unlikely((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY))
+	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+	if (unlikely(vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
 		kvm_machine_check();
 
-	trace_kvm_exit(vmx->exit_reason, vcpu, KVM_ISA_VMX);
+	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);
 
-	if (unlikely(vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
+	if (unlikely(vmx->exit_reason.failed_vmentry))
 		return EXIT_FASTPATH_NONE;
 
 	vmx->loaded_vmcs->launched = 1;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index f6f66e5c6510..c8ad47ea8445 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -70,6 +70,29 @@ struct pt_desc {
 	struct pt_ctx guest;
 };
 
+union vmx_exit_reason {
+	struct {
+		u32	basic			: 16;
+		u32	reserved16		: 1;
+		u32	reserved17		: 1;
+		u32	reserved18		: 1;
+		u32	reserved19		: 1;
+		u32	reserved20		: 1;
+		u32	reserved21		: 1;
+		u32	reserved22		: 1;
+		u32	reserved23		: 1;
+		u32	reserved24		: 1;
+		u32	reserved25		: 1;
+		u32	reserved26		: 1;
+		u32	sgx_enclave_mode	: 1;
+		u32	smi_pending_mtf		: 1;
+		u32	smi_from_vmx_root	: 1;
+		u32	reserved30		: 1;
+		u32	failed_vmentry		: 1;
+	};
+	u32 full;
+};
+
 /*
  * The nested_vmx structure is part of vcpu_vmx, and holds information we need
  * for correct emulation of VMX (i.e., nested VMX) on this vcpu.
@@ -244,7 +267,7 @@ struct vcpu_vmx {
 	int vpid;
 	bool emulation_required;
 
-	u32 exit_reason;
+	union vmx_exit_reason exit_reason;
 
 	/* Posted interrupt descriptor */
 	struct pi_desc pi_desc;
-- 
2.29.2



* [RFC PATCH 14/23] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX)
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (12 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 13/23] KVM: VMX: Convert vcpu_vmx.exit_reason to a union Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 15/23] KVM: x86: Define new #PF SGX error code bit Kai Huang
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, mattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Export the gva_to_gpa() helpers for use by SGX virtualization when
executing ENCLS[ECREATE] and ENCLS[EINIT] on behalf of the guest.
To execute ECREATE and EINIT, KVM must obtain the GPA of the target
Secure Enclave Control Structure (SECS) in order to get its
corresponding HVA.

Because the SECS must reside in the Enclave Page Cache (EPC), copying
the SECS's data to a host-controlled buffer via existing exported
helpers is not a viable option as the EPC is not readable or writable
by the kernel.

SGX virtualization will also use gva_to_gpa() to obtain HVAs for
non-EPC pages in order to pass user pointers directly to ECREATE and
EINIT, which avoids having to copy pages' worth of data into the kernel.
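The exported helpers cover the GVA-to-GPA step. The GPA-to-HVA step that follows can be sketched as below, assuming a single flat memslot. The struct and function names are hypothetical. KVM's own helpers (e.g. gfn_to_hva()) also look up the right memslot and return an error HVA instead of 0.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)

/* Hypothetical, simplified stand-in for a KVM memslot. */
struct demo_memslot {
	uint64_t base_gfn;       /* first guest frame number covered */
	uint64_t npages;         /* number of guest pages covered */
	uint64_t userspace_addr; /* HVA backing base_gfn */
};

/*
 * Translate a GPA (e.g. one returned by kvm_mmu_gva_to_gpa_read()) into the
 * host virtual address that backs it.  Returns 0 if the slot does not cover
 * the GPA.
 */
static uint64_t demo_gpa_to_hva(const struct demo_memslot *slot, uint64_t gpa)
{
	uint64_t gfn = gpa >> DEMO_PAGE_SHIFT;

	if (gfn < slot->base_gfn || gfn >= slot->base_gfn + slot->npages)
		return 0;

	return slot->userspace_addr + (gfn - slot->base_gfn) * DEMO_PAGE_SIZE +
	       (gpa & (DEMO_PAGE_SIZE - 1));
}
```

Because the result is an ordinary userspace address, it can be handed directly to ENCLS as the source of the page data, so no bounce buffer is needed.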

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 078a39d489fe..c195494da0ea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5866,6 +5866,7 @@ gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
 	u32 access = (kvm_x86_ops.get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 	return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read);
 
  gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva,
 				struct x86_exception *exception)
@@ -5882,6 +5883,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
 	access |= PFERR_WRITE_MASK;
 	return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_write);
 
 /* uses this to access any guest's mapped memory without checking CPL */
 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
-- 
2.29.2



* [RFC PATCH 15/23] KVM: x86: Define new #PF SGX error code bit
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (13 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 14/23] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX) Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 16/23] KVM: x86: Add SGX feature leaf to reverse CPUID lookup Kai Huang
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, mattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Page faults that are signaled by the SGX Enclave Page Cache Map (EPCM),
as opposed to the traditional IA32/EPT page tables, set an SGX bit in
the error code to indicate that the #PF was induced by SGX.  KVM will
need to emulate this behavior as part of its trap-and-execute scheme for
virtualizing SGX Launch Control, e.g. to inject SGX-induced #PFs if
EINIT faults in the host, and to support live migration.
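The following sketch shows how the new bit combines with the existing error-code bits. Per the Intel SDM, the SGX flag is only reported together with P=1 (the paging structures allowed the access, but the EPCM did not). The masks mirror the definitions in kvm_host.h. The helper functions are hypothetical and do not show how KVM actually builds the error code it injects.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the PFERR_*_BIT definitions in kvm_host.h. */
#define PFERR_PRESENT_MASK (1U << 0)
#define PFERR_WRITE_MASK   (1U << 1)
#define PFERR_USER_MASK    (1U << 2)
#define PFERR_SGX_MASK     (1U << 15)

/*
 * Hypothetical helper: error code for a user-mode #PF that was signaled by
 * the EPCM rather than by the paging structures.
 */
static uint32_t demo_sgx_pf_error_code(bool write)
{
	uint32_t ec = PFERR_PRESENT_MASK | PFERR_USER_MASK | PFERR_SGX_MASK;

	if (write)
		ec |= PFERR_WRITE_MASK;
	return ec;
}

/* True if the #PF was induced by SGX access control. */
static bool demo_pf_is_sgx(uint32_t ec)
{
	return ec & PFERR_SGX_MASK;
}
```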

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 324ddd7fd0aa..b1cbcfff0265 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -216,6 +216,7 @@ enum x86_intercept_stage;
 #define PFERR_RSVD_BIT 3
 #define PFERR_FETCH_BIT 4
 #define PFERR_PK_BIT 5
+#define PFERR_SGX_BIT 15
 #define PFERR_GUEST_FINAL_BIT 32
 #define PFERR_GUEST_PAGE_BIT 33
 
@@ -225,6 +226,7 @@ enum x86_intercept_stage;
 #define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
 #define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
 #define PFERR_PK_MASK (1U << PFERR_PK_BIT)
+#define PFERR_SGX_MASK (1U << PFERR_SGX_BIT)
 #define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
 #define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
 
-- 
2.29.2



* [RFC PATCH 16/23] KVM: x86: Add SGX feature leaf to reverse CPUID lookup
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (14 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 15/23] KVM: x86: Define new #PF SGX error code bit Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 17/23] KVM: VMX: Add basic handling of VM-Exit from SGX enclave Kai Huang
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, mattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add SGX's sub-features leaf to the reverse CPUID lookup table in
preparation for adding SGX virtualization.
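The following hypothetical sketch shows what the reverse lookup provides. A feature is encoded as word * 32 + bit, as the X86_FEATURE_* constants are, and the table maps the word back to the CPUID function, sub-leaf and output register that report it. SGX1 and SGX2 are enumerated in CPUID.(EAX=12H,ECX=0):EAX bits 0 and 1. The DEMO_* names and word numbers are illustrative and are not KVM's.

```c
#include <stdint.h>

enum demo_cpuid_reg { DEMO_EAX, DEMO_EBX, DEMO_ECX, DEMO_EDX };

/* Hypothetical, trimmed-down analog of struct cpuid_reg. */
struct demo_cpuid_reg_desc {
	uint32_t function;
	uint32_t index;
	enum demo_cpuid_reg reg;
};

/* Hypothetical word numbers; only the SGX leaf entry matters here. */
enum { DEMO_CPUID_7_EDX, DEMO_CPUID_12_EAX, DEMO_NR_WORDS };

static const struct demo_cpuid_reg_desc demo_reverse_cpuid[DEMO_NR_WORDS] = {
	[DEMO_CPUID_7_EDX]  = {    7, 0, DEMO_EDX },
	[DEMO_CPUID_12_EAX] = { 0x12, 0, DEMO_EAX },
};

/* Features are encoded as word * 32 + bit, like X86_FEATURE_*. */
#define DEMO_FEATURE(word, bit) ((word) * 32 + (bit))
#define DEMO_FEATURE_SGX1 DEMO_FEATURE(DEMO_CPUID_12_EAX, 0)
#define DEMO_FEATURE_SGX2 DEMO_FEATURE(DEMO_CPUID_12_EAX, 1)

/* Which CPUID leaf/sub-leaf/register reports this feature? */
static const struct demo_cpuid_reg_desc *demo_lookup(unsigned int feature)
{
	return &demo_reverse_cpuid[feature / 32];
}

/* Which bit within that register? */
static uint32_t demo_feature_bit(unsigned int feature)
{
	return 1u << (feature % 32);
}
```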

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/cpuid.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index f7a6e8f83783..88b7f5db55b9 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -63,6 +63,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
 	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
 	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
+	[CPUID_12_EAX]        = {      0x12, 0, CPUID_EAX},
 };
 
 /*
-- 
2.29.2



* [RFC PATCH 17/23] KVM: VMX: Add basic handling of VM-Exit from SGX enclave
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (15 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 16/23] KVM: x86: Add SGX feature leaf to reverse CPUID lookup Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 18/23] KVM: VMX: Frame in ENCLS handler for SGX virtualization Kai Huang
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, mattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add support for handling VM-Exits that originate from a guest SGX
enclave.  In SGX, an "enclave" is a new CPL3-only execution environment,
wherein the CPU and memory state is protected by hardware to make the
state inaccessible to code running outside of the enclave.  When exiting
an enclave due to an asynchronous event (from the perspective of the
enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state
is automatically saved and scrubbed (the CPU loads synthetic state), and
then reloaded when re-entering the enclave.  E.g. after an
instruction-based VM-Exit from an enclave, vmcs.GUEST_RIP will not
contain the RIP of the enclave instruction that triggered the VM-Exit,
but will instead point to a RIP in the enclave's untrusted runtime (the
guest userspace code that coordinates entry/exit to/from the enclave).

To help a VMM recognize and handle exits from enclaves, SGX adds bits to
existing VMCS fields: bit 27 of VM_EXIT_REASON and bit 4 of
GUEST_INTERRUPTIBILITY_INFO.  Define the new architectural bits as
VMX_EXIT_REASONS_SGX_ENCLAVE_MODE and GUEST_INTR_STATE_ENCLAVE_INTR.  The
enclave-mode bit is already tracked by the sgx_enclave_mode field of
union vmx_exit_reason, so checks against the basic exit reason do not
need to account for SGX, e.g.
"if (exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)" continues to work.

KVM is largely a passive observer of the new bits, e.g. KVM needs to
account for the bits when propagating information to a nested VMM, but
otherwise doesn't need to act differently for the majority of VM-Exits
from enclaves.
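The following sketch shows how the two new bits can be tested from raw field values. The masks are copied from the definitions added by this patch. The helper names are hypothetical, and the sketch does not attempt to model the full architectural semantics of the enclave-interruption bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from this patch (asm/vmx.h and uapi/asm/vmx.h). */
#define VMX_EXIT_REASONS_SGX_ENCLAVE_MODE 0x08000000u
#define GUEST_INTR_STATE_ENCLAVE_INTR     0x00000010u

/* True if VM_EXIT_REASON reports an exit taken from enclave mode. */
static bool demo_exit_from_enclave(uint32_t exit_reason)
{
	return exit_reason & VMX_EXIT_REASONS_SGX_ENCLAVE_MODE;
}

/* True if GUEST_INTERRUPTIBILITY_INFO has the enclave-interruption bit. */
static bool demo_enclave_interruption(uint32_t interruptibility)
{
	return interruptibility & GUEST_INTR_STATE_ENCLAVE_INTR;
}

/* The basic exit reason (bits 15:0) is unaffected by the enclave bit. */
static uint16_t demo_basic_reason(uint32_t exit_reason)
{
	return exit_reason & 0xffff;
}
```

This is why KVM can stay "largely passive". Handlers keep dispatching on the basic reason, and only the few paths that care (nested propagation, emulation) consult the modifier.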

The one scenario that is directly impacted is emulation, which is for
all intents and purposes impossible[1] since KVM does not have access to
the RIP or instruction stream that triggered the VM-Exit.  The inability
to emulate is a non-issue for KVM, as most instructions that might
trigger VM-Exit unconditionally #UD in an enclave (before the VM-Exit
check).  For the few instructions that conditionally #UD, KVM either never
sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only
if the feature is not exposed to the guest in order to inject a #UD,
e.g. RDRAND_EXITING.

But, because it is still possible for a guest to trigger emulation,
e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit
from an enclave.  This is architecturally accurate for instruction
VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable
to killing the VM.  In practice, only broken or particularly stupid
guests should ever encounter this behavior.

Add a WARN in skip_emulated_instruction to detect any attempt to
modify the guest's RIP during an SGX enclave VM-Exit as all such flows
should either be unreachable or must handle exits from enclaves before
getting to skip_emulated_instruction.
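The following is a simplified, hypothetical model of the RIP-advance logic this patch gives skip_emulated_instruction(). Here instr_len stands in for VM_EXIT_INSTRUCTION_LEN and long_mode for whether the vCPU is in 64-bit mode. The *warn flag plays the role of the WARN. Interrupt-shadow handling and the EPT-misconfig special case are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the RIP-advance step.  Sets *warn when a non-zero
 * length is skipped on an exit from an enclave, mirroring the WARN.
 */
static uint64_t demo_skip_instruction(uint64_t rip, uint32_t instr_len,
				      bool from_enclave, bool long_mode,
				      bool *warn)
{
	*warn = false;

	/*
	 * A zero length (as the CPU saves for e.g. INT3 in a debug enclave)
	 * means there is nothing to skip.
	 */
	if (!instr_len)
		return rip;

	*warn = from_enclave;
	rip += instr_len;

	/* Outside 64-bit mode, only the low 32 bits of RIP are kept. */
	if (!long_mode)
		rip = (uint32_t)rip;
	return rip;
}
```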

[1] Impossible for all practical purposes.  Not truly impossible
    since KVM could implement some form of para-virtualization scheme.

[2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at
    CPL3, so we also don't need to worry about that interaction.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
 [Kai: Remove unlikely()s suggested by Dave Hansen.]
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/vmx.h      |  1 +
 arch/x86/include/uapi/asm/vmx.h |  1 +
 arch/x86/kvm/vmx/nested.c       |  2 ++
 arch/x86/kvm/vmx/vmx.c          | 38 +++++++++++++++++++++++++++++++--
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index f8ba5289ecb0..c6f028bac3ff 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -371,6 +371,7 @@ enum vmcs_field {
 #define GUEST_INTR_STATE_MOV_SS		0x00000002
 #define GUEST_INTR_STATE_SMI		0x00000004
 #define GUEST_INTR_STATE_NMI		0x00000008
+#define GUEST_INTR_STATE_ENCLAVE_INTR	0x00000010
 
 /* GUEST_ACTIVITY_STATE flags */
 #define GUEST_ACTIVITY_ACTIVE		0
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index b8ff9e8ac0d5..df6707a76a3d 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -27,6 +27,7 @@
 
 
 #define VMX_EXIT_REASONS_FAILED_VMENTRY         0x80000000
+#define VMX_EXIT_REASONS_SGX_ENCLAVE_MODE	0x08000000
 
 #define EXIT_REASON_EXCEPTION_NMI       0
 #define EXIT_REASON_EXTERNAL_INTERRUPT  1
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1ae147e2c8f2..f16d6c83eafa 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4100,6 +4100,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 {
 	/* update exit information fields: */
 	vmcs12->vm_exit_reason = vm_exit_reason;
+	if (to_vmx(vcpu)->exit_reason.sgx_enclave_mode)
+		vmcs12->vm_exit_reason |= VMX_EXIT_REASONS_SGX_ENCLAVE_MODE;
 	vmcs12->exit_qualification = exit_qualification;
 	vmcs12->vm_exit_intr_info = exit_intr_info;
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8b37812bbadc..bdc36f5e06b9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1562,12 +1562,18 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 
 static bool vmx_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn, int insn_len)
 {
+	if (to_vmx(vcpu)->exit_reason.sgx_enclave_mode) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return false;
+	}
 	return true;
 }
 
 static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
+	union vmx_exit_reason exit_reason = to_vmx(vcpu)->exit_reason;
 	unsigned long rip, orig_rip;
+	u32 instr_len;
 
 	/*
 	 * Using VMCS.VM_EXIT_INSTRUCTION_LEN on EPT misconfig depends on
@@ -1578,9 +1584,33 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	 * i.e. we end up advancing IP with some random value.
 	 */
 	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
-	    to_vmx(vcpu)->exit_reason.basic != EXIT_REASON_EPT_MISCONFIG) {
+	    exit_reason.basic != EXIT_REASON_EPT_MISCONFIG) {
+		instr_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+
+		/*
+		 * Emulating an enclave's instructions isn't supported as KVM
+		 * cannot access the enclave's memory or its true RIP, e.g. the
+		 * vmcs.GUEST_RIP points at the exit point of the enclave, not
+		 * the RIP that actually triggered the VM-Exit.  But, because
+		 * most instructions that cause VM-Exit will #UD in an enclave,
+		 * most instruction-based VM-Exits simply do not occur.
+		 *
+		 * There are a few exceptions, notably the debug instructions
+		 * INT1 (ICEBP) and INT3, as they are allowed in debug enclaves
+		 * and generate #DB/#BP as expected, which KVM might intercept.
+		 * But again, the CPU does the dirty work and saves an instr
+		 * length of zero so VMMs don't shoot themselves in the foot.
+		 * WARN if KVM tries to skip a non-zero length instruction on
+		 * a VM-Exit from an enclave.
+		 */
+		if (!instr_len)
+			goto rip_updated;
+
+		WARN(exit_reason.sgx_enclave_mode,
+		     "KVM: skipping instruction after SGX enclave VM-Exit");
+
 		orig_rip = kvm_rip_read(vcpu);
-		rip = orig_rip + vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+		rip = orig_rip + instr_len;
 #ifdef CONFIG_X86_64
 		/*
 		 * We need to mask out the high 32 bits of RIP if not in 64-bit
@@ -1596,6 +1626,7 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 			return 0;
 	}
 
+rip_updated:
 	/* skipping an emulated instruction also counts */
 	vmx_set_interrupt_shadow(vcpu, 0);
 
@@ -5361,6 +5392,9 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 {
 	gpa_t gpa;
 
+	if (!vmx_can_emulate_instruction(vcpu, NULL, 0))
+		return 1;
+
 	/*
 	 * A nested guest cannot optimize MMIO vmexits, because we have an
 	 * nGPA here instead of the required GPA.
-- 
2.29.2



* [RFC PATCH 18/23] KVM: VMX: Frame in ENCLS handler for SGX virtualization
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (16 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 17/23] KVM: VMX: Add basic handling of VM-Exit from SGX enclave Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 19/23] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions Kai Huang
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, mattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Introduce sgx.c and sgx.h, along with the framework for handling ENCLS
VM-Exits.  Add a bool, enable_sgx, that will eventually be wired up to a
module param to control whether or not SGX virtualization is enabled at
runtime.
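The following is a hypothetical userspace model of the decision handle_encls() makes in this patch. An ENCLS leaf the guest's CPUID does not enumerate gets #UD. If SGX is not enabled and locked in the guest's IA32_FEATURE_CONTROL, the guest gets #GP. Otherwise the exit is handled; at this stage of the series that means a WARN and an exit to userspace. The leaf numbers and FEAT_CTL bits match the kernel's definitions. The enable_sgx module-param check is folded into the sgx flag, and the struct and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* ENCLS leaf numbers (SGX1: ECREATE..ETRACK, SGX2: EAUG..EMODT). */
enum { ECREATE = 0x00, ETRACK = 0x0C, EAUG = 0x0D, EMODT = 0x0F };

#define FEAT_CTL_LOCKED      (1ULL << 0)
#define FEAT_CTL_SGX_ENABLED (1ULL << 18)

enum demo_encls_action { DEMO_INJECT_UD, DEMO_INJECT_GP, DEMO_HANDLE };

/* Hypothetical snapshot of the guest state the handler consults. */
struct demo_guest {
	bool sgx, sgx1, sgx2;     /* enable_sgx && guest CPUID bits */
	uint64_t feature_control; /* guest IA32_FEATURE_CONTROL */
};

static bool demo_leaf_enabled(const struct demo_guest *g, uint32_t leaf)
{
	if (!g->sgx)
		return false;
	if (leaf <= ETRACK)                /* ECREATE (0) .. ETRACK */
		return g->sgx1;
	if (leaf >= EAUG && leaf <= EMODT) /* SGX2 leafs */
		return g->sgx2;
	return false;
}

static enum demo_encls_action demo_classify_encls(const struct demo_guest *g,
						  uint32_t leaf)
{
	const uint64_t bits = FEAT_CTL_SGX_ENABLED | FEAT_CTL_LOCKED;

	if (!demo_leaf_enabled(g, leaf))
		return DEMO_INJECT_UD;
	if ((g->feature_control & bits) != bits)
		return DEMO_INJECT_GP;
	return DEMO_HANDLE;
}
```

Checking CPUID before IA32_FEATURE_CONTROL matches the order in handle_encls(): an unsupported leaf is #UD regardless of how the guest BIOS configured SGX.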

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/Makefile  |  2 ++
 arch/x86/kvm/vmx/sgx.c | 51 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/sgx.h | 15 +++++++++++++
 arch/x86/kvm/vmx/vmx.c |  9 +++++---
 4 files changed, 74 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/sgx.c
 create mode 100644 arch/x86/kvm/vmx/sgx.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index b804444e16d4..6b69230e9a29 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -20,6 +20,8 @@ kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
 
 kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
 			   vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
+kvm-intel-$(CONFIG_X86_SGX_VIRTUALIZATION)	+= vmx/sgx.o
+
 kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
 
 obj-$(CONFIG_KVM)	+= kvm.o
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
new file mode 100644
index 000000000000..693bf7735308
--- /dev/null
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  Copyright(c) 2016-20 Intel Corporation. */
+
+#include <asm/sgx.h>
+#include <asm/sgx_arch.h>
+
+#include "cpuid.h"
+#include "kvm_cache_regs.h"
+#include "sgx.h"
+#include "vmx.h"
+#include "x86.h"
+
+bool __read_mostly enable_sgx;
+
+static inline bool encls_leaf_enabled_in_guest(struct kvm_vcpu *vcpu, u32 leaf)
+{
+	if (!enable_sgx || !guest_cpuid_has(vcpu, X86_FEATURE_SGX))
+		return false;
+
+	if (leaf >= ECREATE && leaf <= ETRACK)
+		return guest_cpuid_has(vcpu, X86_FEATURE_SGX1);
+
+	if (leaf >= EAUG && leaf <= EMODT)
+		return guest_cpuid_has(vcpu, X86_FEATURE_SGX2);
+
+	return false;
+}
+
+static inline bool sgx_enabled_in_guest_bios(struct kvm_vcpu *vcpu)
+{
+	const u64 bits = FEAT_CTL_SGX_ENABLED | FEAT_CTL_LOCKED;
+
+	return (to_vmx(vcpu)->msr_ia32_feature_control & bits) == bits;
+}
+
+int handle_encls(struct kvm_vcpu *vcpu)
+{
+	u32 leaf = (u32)vcpu->arch.regs[VCPU_REGS_RAX];
+
+	if (!encls_leaf_enabled_in_guest(vcpu, leaf)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+	} else if (!sgx_enabled_in_guest_bios(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+	} else {
+		WARN(1, "KVM: unexpected exit on ENCLS[%u]", leaf);
+		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+		vcpu->run->hw.hardware_exit_reason = EXIT_REASON_ENCLS;
+		return 0;
+	}
+	return 1;
+}
diff --git a/arch/x86/kvm/vmx/sgx.h b/arch/x86/kvm/vmx/sgx.h
new file mode 100644
index 000000000000..647afc7546bf
--- /dev/null
+++ b/arch/x86/kvm/vmx/sgx.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_SGX_H
+#define __KVM_X86_SGX_H
+
+#include <linux/kvm_host.h>
+
+#ifdef CONFIG_X86_SGX_VIRTUALIZATION
+extern bool __read_mostly enable_sgx;
+
+int handle_encls(struct kvm_vcpu *vcpu);
+#else
+#define enable_sgx 0
+#endif
+
+#endif /* __KVM_X86_SGX_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bdc36f5e06b9..4bcb391fc2f5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -57,6 +57,7 @@
 #include "mmu.h"
 #include "nested.h"
 #include "pmu.h"
+#include "sgx.h"
 #include "trace.h"
 #include "vmcs.h"
 #include "vmcs12.h"
@@ -5643,16 +5644,18 @@ static int handle_vmx_instruction(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+#ifndef CONFIG_X86_SGX_VIRTUALIZATION
 static int handle_encls(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * SGX virtualization is not yet supported.  There is no software
-	 * enable bit for SGX, so we have to trap ENCLS and inject a #UD
-	 * to prevent the guest from executing ENCLS.
+	 * SGX virtualization is disabled.  There is no software enable bit for
+	 * SGX, so KVM intercepts all ENCLS leafs and injects a #UD to prevent
+	 * the guest from executing ENCLS (when SGX is supported by hardware).
 	 */
 	kvm_queue_exception(vcpu, UD_VECTOR);
 	return 1;
 }
+#endif /* CONFIG_X86_SGX_VIRTUALIZATION */
 
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
-- 
2.29.2


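The handle_encls() dispatch added above has three possible outcomes for an intercepted ENCLS. It injects #UD if the leaf is not enumerated to the guest or SGX virtualization is off. It injects #GP if the guest's virtual IA32_FEATURE_CONTROL has not both enabled and locked SGX. Otherwise the exit goes to a per-leaf handler; this patch has none yet, hence the WARN. The following is a minimal user-space model of that decision, not KVM code. The struct, enum and function names are hypothetical. The leaf numbers and FEAT_CTL bit positions are assumed to match the SDM and the kernel's msr-index.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ENCLS leaf numbers (subset), assumed to match the SDM. */
enum { ECREATE = 0x0, ETRACK = 0xC, EAUG = 0xD, EMODT = 0xF };

/* IA32_FEATURE_CONTROL bits checked by sgx_enabled_in_guest_bios(). */
#define FEAT_CTL_LOCKED      (1ULL << 0)
#define FEAT_CTL_SGX_ENABLED (1ULL << 18)

/* Hypothetical model of the guest-visible SGX configuration. */
struct guest_sgx_model {
	bool enable_sgx;   /* KVM-wide switch */
	bool cpuid_sgx;    /* CPUID.0x7.0:EBX.SGX */
	bool cpuid_sgx1;   /* CPUID.0x12.0:EAX.SGX1 */
	bool cpuid_sgx2;   /* CPUID.0x12.0:EAX.SGX2 */
	uint64_t feat_ctl; /* guest's virtual IA32_FEATURE_CONTROL */
};

enum encls_outcome { INJECT_UD, INJECT_GP, HANDLE_LEAF };

static bool leaf_enabled(const struct guest_sgx_model *g, uint32_t leaf)
{
	if (!g->enable_sgx || !g->cpuid_sgx)
		return false;
	if (leaf <= ETRACK)                  /* ECREATE (0) .. ETRACK: SGX1 */
		return g->cpuid_sgx1;
	if (leaf >= EAUG && leaf <= EMODT)   /* EAUG .. EMODT: SGX2 */
		return g->cpuid_sgx2;
	return false;                        /* unknown leaf */
}

enum encls_outcome encls_exit_outcome(const struct guest_sgx_model *g,
				      uint32_t leaf)
{
	const uint64_t bits = FEAT_CTL_SGX_ENABLED | FEAT_CTL_LOCKED;

	if (!leaf_enabled(g, leaf))
		return INJECT_UD;
	if ((g->feat_ctl & bits) != bits)
		return INJECT_GP;
	return HANDLE_LEAF;
}
```

Because the leaf check runs first, a guest without SGX in its CPUID sees #UD from ENCLS regardless of its IA32_FEATURE_CONTROL value.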

* [RFC PATCH 19/23] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (17 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 18/23] KVM: VMX: Frame in ENCLS handler for SGX virtualization Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 20/23] KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs Kai Huang
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, mattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add an ECREATE handler that will be used to intercept ECREATE for the
purpose of enforcing an enclave's MISCSELECT, ATTRIBUTES and XFRM, i.e.
to allow userspace to restrict SGX features via CPUID.  ECREATE will be
intercepted when any of the aforementioned masks diverges from hardware
in order to enforce the desired CPUID model, i.e. inject #GP if the
guest attempts to set a bit that hasn't been enumerated as allowed-1 in
CPUID.
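The allowed-1 rule can be modeled in isolation. The sketch below mirrors the checks that handle_encls_ecreate() (in the diff below) applies to the SECS fields it reads:

- MISCSELECT is checked against CPUID.0x12.0:EBX.
- The two 32-bit halves of ATTRIBUTES are checked against CPUID.0x12.1:EAX/EBX.
- The two 32-bit halves of XFRM are checked against CPUID.0x12.1:ECX/EDX.
- The enclave size is checked against the log2 limit in CPUID.0x12.0:EDX. Bits 7:0 apply normally, and bits 15:8 apply when ATTRIBUTES.MODE64BIT is set.

This is a user-space model with hypothetical names. It assumes SGX_ATTR_MODE64BIT is bit 2, as in the kernel's sgx_arch.h, and leaves out the PROVISIONKEY check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SGX_ATTR_MODE64BIT (1ULL << 2)

/* Hypothetical copy of the guest's CPUID leaf 0x12, sub-leafs 0 and 1. */
struct sgx_cpuid12 {
	uint32_t l0_ebx;                          /* allowed MISCSELECT */
	uint32_t l0_edx;                          /* max size log2: 7:0, 15:8 (64-bit) */
	uint32_t l1_eax, l1_ebx, l1_ecx, l1_edx;  /* allowed ATTRIBUTES[127:0] */
};

/* Returns true if ECREATE may proceed, false where KVM would inject #GP. */
bool ecreate_allowed(const struct sgx_cpuid12 *c, uint32_t miscselect,
		     uint64_t attributes, uint64_t xfrm, uint64_t size)
{
	uint8_t max_size_log2;

	/* Any bit set that CPUID does not enumerate as allowed-1 -> #GP. */
	if ((miscselect & ~c->l0_ebx) ||
	    ((uint32_t)attributes & ~c->l1_eax) ||
	    ((uint32_t)(attributes >> 32) & ~c->l1_ebx) ||
	    ((uint32_t)xfrm & ~c->l1_ecx) ||
	    ((uint32_t)(xfrm >> 32) & ~c->l1_edx))
		return false;

	/* The 8-bit truncation selects the relevant byte of EDX. */
	max_size_log2 = (attributes & SGX_ATTR_MODE64BIT) ?
			(uint8_t)(c->l0_edx >> 8) : (uint8_t)c->l0_edx;
	if (max_size_log2 >= 64)   /* avoid an undefined shift in this model */
		return true;
	return size < (1ULL << max_size_log2);
}
```

A false return corresponds to the cases where KVM injects #GP.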

Note, access to the PROVISIONKEY is not yet supported.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |   3 +
 arch/x86/kvm/vmx/sgx.c          | 243 ++++++++++++++++++++++++++++++++
 2 files changed, 246 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b1cbcfff0265..567b6fa02fb3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -996,6 +996,9 @@ struct kvm_arch {
 		struct msr_bitmap_range ranges[16];
 	} msr_filter;
 
+	/* Guest can access the SGX PROVISIONKEY. */
+	bool sgx_provisioning_allowed;
+
 	struct kvm_pmu_event_filter *pmu_event_filter;
 	struct task_struct *nx_lpage_recovery_thread;
 
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 693bf7735308..4281045318ac 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -12,6 +12,247 @@
 
 bool __read_mostly enable_sgx;
 
+/*
+ * ENCLS's memory operands use a fixed segment (DS) and a fixed
+ * address size based on the mode.  Related prefixes are ignored.
+ */
+static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
+			     int size, int alignment, gva_t *gva)
+{
+	struct kvm_segment s;
+	bool fault;
+
+	/* Skip vmcs.GUEST_DS retrieval for 64-bit mode to avoid VMREADs. */
+	*gva = offset;
+	if (!is_long_mode(vcpu)) {
+		vmx_get_segment(vcpu, &s, VCPU_SREG_DS);
+		*gva += s.base;
+	}
+
+	if (!IS_ALIGNED(*gva, alignment)) {
+		fault = true;
+	} else if (likely(is_long_mode(vcpu))) {
+		fault = is_noncanonical_address(*gva, vcpu);
+	} else {
+		*gva &= 0xffffffff;
+		fault = (s.unusable) ||
+			(s.type != 2 && s.type != 3) ||
+			(*gva > s.limit) ||
+			((s.base != 0 || s.limit != 0xffffffff) &&
+			(((u64)*gva + size - 1) > s.limit + 1));
+	}
+	if (fault)
+		kvm_inject_gp(vcpu, 0);
+	return fault ? -EINVAL : 0;
+}
+
+static void sgx_handle_emulation_failure(struct kvm_vcpu *vcpu, u64 addr,
+					 unsigned int size)
+{
+	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+	vcpu->run->internal.ndata = 2;
+	vcpu->run->internal.data[0] = addr;
+	vcpu->run->internal.data[1] = size;
+}
+
+static int sgx_read_hva(struct kvm_vcpu *vcpu, unsigned long hva, void *data,
+			unsigned int size)
+{
+	if (__copy_from_user(data, (void __user *)hva, size)) {
+		sgx_handle_emulation_failure(vcpu, hva, size);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int sgx_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t gva, bool write,
+			  gpa_t *gpa)
+{
+	struct x86_exception ex;
+
+	if (write)
+		*gpa = kvm_mmu_gva_to_gpa_write(vcpu, gva, &ex);
+	else
+		*gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, &ex);
+
+	if (*gpa == UNMAPPED_GVA) {
+		kvm_inject_emulated_page_fault(vcpu, &ex);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int sgx_gpa_to_hva(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long *hva)
+{
+	*hva = kvm_vcpu_gfn_to_hva(vcpu, PFN_DOWN(gpa));
+	if (kvm_is_error_hva(*hva)) {
+		sgx_handle_emulation_failure(vcpu, gpa, 1);
+		return -EFAULT;
+	}
+
+	*hva |= gpa & ~PAGE_MASK;
+
+	return 0;
+}
+
+static int sgx_inject_fault(struct kvm_vcpu *vcpu, gva_t gva, int trapnr)
+{
+	struct x86_exception ex;
+
+	/*
+	 * A non-EPCM #PF indicates a bad userspace HVA.  This *should* check
+	 * for PFEC.SGX and not assume any #PF on SGX2 originated in the EPC,
+	 * but the error code isn't (yet) plumbed through the ENCLS helpers.
+	 */
+	if (trapnr == PF_VECTOR && !boot_cpu_has(X86_FEATURE_SGX2)) {
+		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+		vcpu->run->internal.ndata = 0;
+		return 0;
+	}
+
+	/*
+	 * If the guest thinks it's running on SGX2 hardware, inject an SGX
+	 * #PF if the fault matches an EPCM fault signature (#GP on SGX1,
+	 * #PF on SGX2).  The assumption is that EPCM faults are much more
+	 * likely than a bad userspace address.
+	 */
+	if ((trapnr == PF_VECTOR || !boot_cpu_has(X86_FEATURE_SGX2)) &&
+	    guest_cpuid_has(vcpu, X86_FEATURE_SGX2)) {
+		memset(&ex, 0, sizeof(ex));
+		ex.vector = PF_VECTOR;
+		ex.error_code = PFERR_PRESENT_MASK | PFERR_WRITE_MASK |
+				PFERR_SGX_MASK;
+		ex.address = gva;
+		ex.error_code_valid = true;
+		ex.nested_page_fault = false;
+		kvm_inject_page_fault(vcpu, &ex);
+	} else {
+		kvm_inject_gp(vcpu, 0);
+	}
+	return 1;
+}
+
+static int handle_encls_ecreate(struct kvm_vcpu *vcpu)
+{
+	unsigned long a_hva, m_hva, x_hva, s_hva, secs_hva;
+	struct kvm_cpuid_entry2 *sgx_12_0, *sgx_12_1;
+	gpa_t metadata_gpa, contents_gpa, secs_gpa;
+	struct sgx_pageinfo pageinfo;
+	gva_t pageinfo_gva, secs_gva;
+	u64 attributes, xfrm, size;
+	struct x86_exception ex;
+	u8 max_size_log2;
+	u32 miscselect;
+	int trapnr, r;
+
+	sgx_12_0 = kvm_find_cpuid_entry(vcpu, 0x12, 0);
+	sgx_12_1 = kvm_find_cpuid_entry(vcpu, 0x12, 1);
+	if (!sgx_12_0 || !sgx_12_1) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 32, 32, &pageinfo_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva))
+		return 1;
+
+	/*
+	 * Copy the PAGEINFO to local memory; its pointers need to be
+	 * translated, i.e. KVM needs to do a deep copy/translate.
+	 */
+	r = kvm_read_guest_virt(vcpu, pageinfo_gva, &pageinfo,
+				sizeof(pageinfo), &ex);
+	if (r == X86EMUL_PROPAGATE_FAULT) {
+		kvm_inject_emulated_page_fault(vcpu, &ex);
+		return 1;
+	} else if (r != X86EMUL_CONTINUE) {
+		sgx_handle_emulation_failure(vcpu, pageinfo_gva, sizeof(pageinfo));
+		return 0;
+	}
+
+	/*
+	 * Verify alignment early.  This conveniently avoids having to worry
+	 * about page splits on userspace addresses.
+	 */
+	if (!IS_ALIGNED(pageinfo.metadata, 64) ||
+	    !IS_ALIGNED(pageinfo.contents, 4096)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	/*
+	 * Translate the SECINFO, SOURCE and SECS pointers from GVA to GPA.
+	 * Resume the guest on failure to inject a #PF.
+	 */
+	if (sgx_gva_to_gpa(vcpu, pageinfo.metadata, false, &metadata_gpa) ||
+	    sgx_gva_to_gpa(vcpu, pageinfo.contents, false, &contents_gpa) ||
+	    sgx_gva_to_gpa(vcpu, secs_gva, true, &secs_gpa))
+		return 1;
+
+	/*
+	 * ...and then to HVA.  The order of accesses isn't architectural, i.e.
+	 * KVM doesn't have to fully process one address at a time.  Exit to
+	 * userspace if a GPA is invalid.
+	 */
+	if (sgx_gpa_to_hva(vcpu, metadata_gpa,
+			   (unsigned long *)&pageinfo.metadata) ||
+	    sgx_gpa_to_hva(vcpu, contents_gpa,
+			   (unsigned long *)&pageinfo.contents) ||
+	    sgx_gpa_to_hva(vcpu, secs_gpa, &secs_hva))
+		return 0;
+
+	/*
+	 * Read out select portions of the input SECS to enforce userspace
+	 * restrictions on MISCSELECT, ATTRIBUTES, etc...  Note, 'contents' is
+	 * page aligned, i.e. no need to worry about page splits.
+	 */
+	m_hva = pageinfo.contents + offsetof(struct sgx_secs, miscselect);
+	a_hva = pageinfo.contents + offsetof(struct sgx_secs, attributes);
+	x_hva = pageinfo.contents + offsetof(struct sgx_secs, xfrm);
+	s_hva = pageinfo.contents + offsetof(struct sgx_secs, size);
+
+	/* Exit to userspace if copying from a host userspace address fails. */
+	if (sgx_read_hva(vcpu, m_hva, &miscselect, sizeof(miscselect)) ||
+	    sgx_read_hva(vcpu, a_hva, &attributes, sizeof(attributes)) ||
+	    sgx_read_hva(vcpu, x_hva, &xfrm, sizeof(xfrm)) ||
+	    sgx_read_hva(vcpu, s_hva, &size, sizeof(size)))
+		return 0;
+
+	/* Enforce restriction of access to the PROVISIONKEY. */
+	if (!vcpu->kvm->arch.sgx_provisioning_allowed &&
+	    (attributes & SGX_ATTR_PROVISIONKEY)) {
+		if (sgx_12_1->eax & SGX_ATTR_PROVISIONKEY)
+			pr_warn_once("KVM: SGX PROVISIONKEY advertised but not allowed\n");
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	/* Enforce CPUID restrictions on MISCSELECT, ATTRIBUTES and XFRM. */
+	if ((u32)miscselect & ~sgx_12_0->ebx ||
+	    (u32)attributes & ~sgx_12_1->eax ||
+	    (u32)(attributes >> 32) & ~sgx_12_1->ebx ||
+	    (u32)xfrm & ~sgx_12_1->ecx ||
+	    (u32)(xfrm >> 32) & ~sgx_12_1->edx) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	/* Enforce CPUID restriction on max enclave size. */
+	max_size_log2 = (attributes & SGX_ATTR_MODE64BIT) ? sgx_12_0->edx >> 8 :
+							    sgx_12_0->edx;
+	if (size >= BIT_ULL(max_size_log2)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	if (sgx_virt_ecreate(&pageinfo, (void __user *)secs_hva, &trapnr))
+		return sgx_inject_fault(vcpu, secs_gva, trapnr);
+
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
 static inline bool encls_leaf_enabled_in_guest(struct kvm_vcpu *vcpu, u32 leaf)
 {
 	if (!enable_sgx || !guest_cpuid_has(vcpu, X86_FEATURE_SGX))
@@ -42,6 +283,8 @@ int handle_encls(struct kvm_vcpu *vcpu)
 	} else if (!sgx_enabled_in_guest_bios(vcpu)) {
 		kvm_inject_gp(vcpu, 0);
 	} else {
+		if (leaf == ECREATE)
+			return handle_encls_ecreate(vcpu);
 		WARN(1, "KVM: unexpected exit on ENCLS[%u]", leaf);
 		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
 		vcpu->run->hw.hardware_exit_reason = EXIT_REASON_ENCLS;
-- 
2.29.2


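sgx_inject_fault() above chooses between three responses when ECREATE faults while executing on the guest's behalf. EINIT uses the same helper later in the series. Below is a small model of that choice. The booleans stand in for boot_cpu_has(X86_FEATURE_SGX2) and guest_cpuid_has(vcpu, X86_FEATURE_SGX2). The vector numbers are the architectural #GP (13) and #PF (14). The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define GP_VECTOR 13
#define PF_VECTOR 14

enum sgx_fault_action {
	EXIT_TO_USERSPACE, /* KVM_EXIT_INTERNAL_ERROR: assume a bad host address */
	INJECT_SGX_PF,     /* #PF with PFERR_SGX_MASK set in the error code */
	INJECT_GP,
};

enum sgx_fault_action sgx_fault_action(int trapnr, bool host_sgx2,
				       bool guest_sgx2)
{
	/* On SGX1 hosts EPCM faults surface as #GP, so a #PF means a bad HVA. */
	if (trapnr == PF_VECTOR && !host_sgx2)
		return EXIT_TO_USERSPACE;

	/* EPCM fault signature (#GP on SGX1, #PF on SGX2) -> SGX #PF for SGX2 guests. */
	if ((trapnr == PF_VECTOR || !host_sgx2) && guest_sgx2)
		return INJECT_SGX_PF;

	return INJECT_GP;
}
```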

* [RFC PATCH 20/23] KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (18 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 19/23] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 21/23] KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC) Kai Huang
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, mattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Emulate the four Launch Enclave public key hash MSRs (LE hash MSRs) that
exist on CPUs that support SGX Launch Control (LC).  SGX LC modifies the
behavior of ENCLS[EINIT] to use the LE hash MSRs when verifying the key
used to sign an enclave.  On CPUs without LC support, the LE hash is
hardwired in the CPU to an Intel-controlled key (the Intel key is also
the reset value of the LE hash MSRs).  Track the guest's desired hash so
that a future patch can stuff the hash into the hardware MSRs when
executing EINIT on behalf of the guest, when those MSRs are writable in
the host.

Note, KVM allows writes to the LE hash MSRs if IA32_FEATURE_CONTROL is
unlocked.  This is technically not architectural behavior, but it's
roughly equivalent to the arch behavior of the MSRs being writable prior
to activating SGX[1].  Emulating SGX activation is feasible, but adds no
tangible benefits and would just create extra work for KVM and guest
firmware.

[1] SGX-related bits in IA32_FEATURE_CONTROL cannot be set until SGX
    is activated, e.g. by firmware.  SGX activation is triggered by
    setting bit 0 in MSR 0x7a.  Until SGX is activated, the LE hash
    MSRs are writable, e.g. to allow firmware to lock down the LE
    root key with a non-Intel value.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/vmx/sgx.c | 35 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/sgx.h |  6 ++++++
 arch/x86/kvm/vmx/vmx.c | 20 ++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h |  2 ++
 4 files changed, 63 insertions(+)

diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 4281045318ac..6ad6a24c4e93 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -12,6 +12,9 @@
 
 bool __read_mostly enable_sgx;
 
+/* Initial value of guest's virtual SGX_LEPUBKEYHASHn MSRs */
+static u64 sgx_pubkey_hash[4] __ro_after_init;
+
 /*
  * ENCLS's memory operands use a fixed segment (DS) and a fixed
  * address size based on the mode.  Related prefixes are ignored.
@@ -292,3 +295,35 @@ int handle_encls(struct kvm_vcpu *vcpu)
 	}
 	return 1;
 }
+
+void setup_default_sgx_lepubkeyhash(void)
+{
+	/*
+	 * Use Intel's default value for Skylake hardware if Launch Control is
+	 * not supported, i.e. Intel's hash is hardcoded into silicon, or if
+	 * Launch Control is supported and enabled, i.e. mimic the reset value
+	 * and let the guest write the MSRs at will.  If Launch Control is
+	 * supported but disabled, then use the current MSR values as the hash
+	 * MSRs exist but are read-only (locked and not writable).
+	 */
+	if (!enable_sgx || boot_cpu_has(X86_FEATURE_SGX_LC) ||
+	    rdmsrl_safe(MSR_IA32_SGXLEPUBKEYHASH0, &sgx_pubkey_hash[0])) {
+		sgx_pubkey_hash[0] = 0xa6053e051270b7acULL;
+		sgx_pubkey_hash[1] = 0x6cfbe8ba8b3b413dULL;
+		sgx_pubkey_hash[2] = 0xc4916d99f2b3735dULL;
+		sgx_pubkey_hash[3] = 0xd4f8c05909f9bb3bULL;
+	} else {
+		/* MSR_IA32_SGXLEPUBKEYHASH0 is read above */
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH1, sgx_pubkey_hash[1]);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH2, sgx_pubkey_hash[2]);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH3, sgx_pubkey_hash[3]);
+	}
+}
+
+void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	memcpy(vmx->msr_ia32_sgxlepubkeyhash, sgx_pubkey_hash,
+	       sizeof(sgx_pubkey_hash));
+}
diff --git a/arch/x86/kvm/vmx/sgx.h b/arch/x86/kvm/vmx/sgx.h
index 647afc7546bf..05d774f62b7f 100644
--- a/arch/x86/kvm/vmx/sgx.h
+++ b/arch/x86/kvm/vmx/sgx.h
@@ -8,8 +8,14 @@
 extern bool __read_mostly enable_sgx;
 
 int handle_encls(struct kvm_vcpu *vcpu);
+
+void setup_default_sgx_lepubkeyhash(void);
+void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu);
 #else
 #define enable_sgx 0
+
+static inline void setup_default_sgx_lepubkeyhash(void) { }
+static inline void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu) { }
 #endif
 
 #endif /* __KVM_X86_SGX_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bcb391fc2f5..7a32e88882cb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1889,6 +1889,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_FEAT_CTL:
 		msr_info->data = vmx->msr_ia32_feature_control;
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
+			return 1;
+		msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
+			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!nested_vmx_allowed(vcpu))
 			return 1;
@@ -2155,6 +2162,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (msr_info->host_initiated && data == 0)
 			vmx_leave_nested(vcpu);
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		if (!msr_info->host_initiated &&
+		    (!guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC) ||
+		    ((vmx->msr_ia32_feature_control & FEAT_CTL_LOCKED) &&
+		    !(vmx->msr_ia32_feature_control & FEAT_CTL_SGX_LC_ENABLED))))
+			return 1;
+		vmx->msr_ia32_sgxlepubkeyhash
+			[msr_index - MSR_IA32_SGXLEPUBKEYHASH0] = data;
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!msr_info->host_initiated)
 			return 1; /* they are read-only */
@@ -6973,6 +6989,8 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
 	else
 		memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs));
 
+	vcpu_setup_sgx_lepubkeyhash(vcpu);
+
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = -1ull;
 
@@ -7912,6 +7930,8 @@ static __init int hardware_setup(void)
 	if (!enable_ept || !cpu_has_vmx_intel_pt())
 		pt_mode = PT_MODE_SYSTEM;
 
+	setup_default_sgx_lepubkeyhash();
+
 	if (nested) {
 		nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
 					   vmx_capability.ept);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c8ad47ea8445..b46cfafcfa44 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -299,6 +299,8 @@ struct vcpu_vmx {
 	 */
 	u64 msr_ia32_feature_control;
 	u64 msr_ia32_feature_control_valid_bits;
+	/* SGX Launch Control public key hash */
+	u64 msr_ia32_sgxlepubkeyhash[4];
 	u64 ept_pointer;
 
 	struct pt_desc pt_desc;
-- 
2.29.2


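The new vmx_set_msr() case accepts a guest write to SGX_LEPUBKEYHASH0..3 only if two conditions hold. The guest's CPUID must have SGX_LC. Its virtual IA32_FEATURE_CONTROL must also be either unlocked, or locked with SGX_LC_ENABLED set. Host-initiated writes, e.g. from userspace restoring state, are always accepted. Below is a user-space sketch of that rule. It assumes the architectural MSR numbers 0x8C..0x8F and the kernel's FEAT_CTL bit positions (LOCKED = bit 0, SGX_LC_ENABLED = bit 17). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_SGXLEPUBKEYHASH0 0x8c
#define MSR_IA32_SGXLEPUBKEYHASH3 0x8f
#define FEAT_CTL_LOCKED           (1ULL << 0)
#define FEAT_CTL_SGX_LC_ENABLED   (1ULL << 17)

/* Hypothetical per-vCPU state mirroring the new vcpu_vmx field. */
struct vcpu_model {
	bool cpuid_sgx_lc;
	uint64_t feat_ctl;          /* guest's virtual IA32_FEATURE_CONTROL */
	uint64_t lepubkeyhash[4];
};

/* Returns 0 if the write is accepted, 1 if it is rejected. */
int set_lehash_msr(struct vcpu_model *v, uint32_t index, uint64_t data,
		   bool host_initiated)
{
	if (index < MSR_IA32_SGXLEPUBKEYHASH0 ||
	    index > MSR_IA32_SGXLEPUBKEYHASH3)
		return 1;
	if (!host_initiated &&
	    (!v->cpuid_sgx_lc ||
	     ((v->feat_ctl & FEAT_CTL_LOCKED) &&
	      !(v->feat_ctl & FEAT_CTL_SGX_LC_ENABLED))))
		return 1;
	v->lepubkeyhash[index - MSR_IA32_SGXLEPUBKEYHASH0] = data;
	return 0;
}
```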

* [RFC PATCH 21/23] KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (19 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 20/23] KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:56 ` [RFC PATCH 22/23] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC Kai Huang
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, mattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a VM-Exit handler to trap-and-execute EINIT when SGX LC is enabled
in the host.  When SGX LC is enabled, the host kernel may rewrite the
hardware values at will, e.g. to launch enclaves with different signers,
thus KVM needs to intercept EINIT to ensure it is executed with the
correct LE hash (even if the guest sees a hardwired hash).

Switching the LE hash MSRs on VM-Enter/VM-Exit is not a viable option as
writing the MSRs is prohibitively expensive, e.g. on SKL hardware each
WRMSR is ~400 cycles.  And because EINIT takes tens of thousands of
cycles to execute, the ~1500 cycle overhead to trap-and-execute EINIT is
unlikely to be noticed by the guest, let alone impact its overall SGX
performance.
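As a rough cross-check, the figures above can be combined directly. The only added assumption is that all four LE hash MSRs would be rewritten in each direction: guest values on VM-Enter and host values on VM-Exit.

```latex
\begin{aligned}
\text{cost per direction} &\approx 4 \times 400 = 1600 \ \text{cycles} \\
\text{cost per VM-Enter/VM-Exit pair} &\approx 2 \times 1600 = 3200 \ \text{cycles} \\
\text{cost per trapped EINIT} &\approx 1500 \ \text{cycles} \ll \text{EINIT itself (tens of thousands of cycles)}
\end{aligned}
```

The switching cost would be paid on every transition. The trap-and-execute cost is paid only when the guest executes EINIT.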

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/vmx/sgx.c | 55 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 6ad6a24c4e93..979d0597e4ac 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -256,6 +256,59 @@ static int handle_encls_ecreate(struct kvm_vcpu *vcpu)
 	return kvm_skip_emulated_instruction(vcpu);
 }
 
+static int handle_encls_einit(struct kvm_vcpu *vcpu)
+{
+	unsigned long sig_hva, secs_hva, token_hva, rflags;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	gva_t sig_gva, secs_gva, token_gva;
+	gpa_t sig_gpa, secs_gpa, token_gpa;
+	int ret, trapnr;
+
+	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 1808, 4096, &sig_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_rdx_read(vcpu), 304, 512, &token_gva))
+		return 1;
+
+	/*
+	 * Translate the SIGSTRUCT, SECS and TOKEN pointers from GVA to GPA.
+	 * Resume the guest on failure to inject a #PF.
+	 */
+	if (sgx_gva_to_gpa(vcpu, sig_gva, false, &sig_gpa) ||
+	    sgx_gva_to_gpa(vcpu, secs_gva, true, &secs_gpa) ||
+	    sgx_gva_to_gpa(vcpu, token_gva, false, &token_gpa))
+		return 1;
+
+	/*
+	 * ...and then to HVA.  The order of accesses isn't architectural, i.e.
+	 * KVM doesn't have to fully process one address at a time.  Exit to
+	 * userspace if a GPA is invalid.  Note, all structures are aligned and
+	 * cannot split pages.
+	 */
+	if (sgx_gpa_to_hva(vcpu, sig_gpa, &sig_hva) ||
+	    sgx_gpa_to_hva(vcpu, secs_gpa, &secs_hva) ||
+	    sgx_gpa_to_hva(vcpu, token_gpa, &token_hva))
+		return 0;
+
+	ret = sgx_virt_einit((void __user *)sig_hva, (void __user *)token_hva,
+			     (void __user *)secs_hva,
+			     vmx->msr_ia32_sgxlepubkeyhash, &trapnr);
+
+	if (ret == -EFAULT)
+		return sgx_inject_fault(vcpu, secs_gva, trapnr);
+
+	rflags = vmx_get_rflags(vcpu) & ~(X86_EFLAGS_CF | X86_EFLAGS_PF |
+					  X86_EFLAGS_AF | X86_EFLAGS_SF |
+					  X86_EFLAGS_OF);
+	if (ret)
+		rflags |= X86_EFLAGS_ZF;
+	else
+		rflags &= ~X86_EFLAGS_ZF;
+	vmx_set_rflags(vcpu, rflags);
+
+	kvm_rax_write(vcpu, ret);
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
 static inline bool encls_leaf_enabled_in_guest(struct kvm_vcpu *vcpu, u32 leaf)
 {
 	if (!enable_sgx || !guest_cpuid_has(vcpu, X86_FEATURE_SGX))
@@ -288,6 +341,8 @@ int handle_encls(struct kvm_vcpu *vcpu)
 	} else {
 		if (leaf == ECREATE)
 			return handle_encls_ecreate(vcpu);
+		if (leaf == EINIT)
+			return handle_encls_einit(vcpu);
 		WARN(1, "KVM: unexpected exit on ENCLS[%u]", leaf);
 		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
 		vcpu->run->hw.hardware_exit_reason = EXIT_REASON_ENCLS;
-- 
2.29.2


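handle_encls_einit() reports the result the same way EINIT does architecturally. CF, PF, AF, SF and OF are cleared. ZF is set on failure and cleared on success. The error code (0 on success) is written to RAX. Below is a minimal sketch of the RFLAGS computation using the architectural bit positions. The function name is hypothetical. The -EFAULT path, which is handled separately via sgx_inject_fault(), is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural RFLAGS bit positions. */
#define X86_EFLAGS_CF (1ULL << 0)
#define X86_EFLAGS_PF (1ULL << 2)
#define X86_EFLAGS_AF (1ULL << 4)
#define X86_EFLAGS_ZF (1ULL << 6)
#define X86_EFLAGS_SF (1ULL << 7)
#define X86_EFLAGS_OF (1ULL << 11)

/* ret: 0 if EINIT succeeded, otherwise the error code placed in RAX. */
uint64_t einit_rflags(uint64_t rflags, int ret)
{
	rflags &= ~(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF |
		    X86_EFLAGS_SF | X86_EFLAGS_OF);
	if (ret)
		rflags |= X86_EFLAGS_ZF;   /* failure */
	else
		rflags &= ~X86_EFLAGS_ZF;  /* success */
	return rflags;
}
```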

* [RFC PATCH 22/23] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (20 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 21/23] KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC) Kai Huang
@ 2021-01-06  1:56 ` Kai Huang
  2021-01-06  1:58 ` [RFC PATCH 23/23] KVM: x86: Add capability to grant VM access to privileged SGX attribute Kai Huang
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:56 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, mattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Enable SGX virtualization now that KVM has the VM-Exit handlers needed
to trap-and-execute ENCLS to ensure correctness and/or enforce the CPU
model exposed to the guest.  Add a KVM module param, "sgx", to allow an
admin to disable SGX virtualization independent of the kernel.

When supported in hardware and the kernel, advertise SGX1, SGX2 and SGX
LC to userspace via CPUID and wire up the ENCLS_EXITING bitmap based on
the guest's SGX capabilities, i.e. to allow ENCLS to be executed in an
SGX-enabled guest.  With the exception of the provision key, all SGX
attribute bits may be exposed to the guest.  Guest access to the
provision key, which is controlled via securityfs, will be added in a
future patch.
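For nested guests, the patch decides whether to reflect an ENCLS exit from L2 to L1 by looking up the leaf in vmcs12's ENCLS-exiting bitmap. All leaf numbers above 62 share bit 63, as nested_vmx_exit_handled_encls() does in the diff below. Below is a self-contained sketch. The function name is hypothetical. RAX is truncated to 32 bits, as the u32 assignment in the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if L1 asked to intercept this ENCLS leaf.  Leafs 0..62 map
 * to their own bit; every larger leaf number maps to bit 63.
 */
bool l1_wants_encls_exit(bool l1_encls_exiting, uint64_t encls_exiting_bitmap,
			 uint64_t rax)
{
	uint32_t leaf = (uint32_t)rax;

	if (!l1_encls_exiting)
		return false;
	if (leaf > 62)
		leaf = 63;
	return encls_exiting_bitmap & (1ULL << leaf);
}
```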

Note, KVM does not yet support exposing ENCLS_C leafs or ENCLV leafs.
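The CPUID changes below restrict what a guest can request in SECS.ATTRIBUTES in two steps. First, KVM advertises only DEBUG, MODE64BIT, EINITTOKENKEY and KSS in CPUID.0x12.1:EAX and clears EBX; PROVISIONKEY is held back. Second, once userspace sets the guest's CPUID, KVM intersects the XFRM half (ECX/EDX) with the guest's supported XCR0 while forcing FP and SSE. Below is a user-space sketch of both steps. It assumes bit values from the kernel's sgx_arch.h and XFEATURE_MASK_FPSSE = 0x3. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SGX_ATTR_DEBUG         (1u << 1)
#define SGX_ATTR_MODE64BIT     (1u << 2)
#define SGX_ATTR_PROVISIONKEY  (1u << 4)
#define SGX_ATTR_EINITTOKENKEY (1u << 5)
#define SGX_ATTR_KSS           (1u << 7)
#define XFEATURE_MASK_FPSSE    0x3u   /* x87 (bit 0) | SSE (bit 1) */

/* Hypothetical stand-in for CPUID.(EAX=0x12, ECX=1). */
struct cpuid_12_1 { uint32_t eax, ebx, ecx, edx; };

/* What KVM advertises to userspace (cf. the 0x12 case in __do_cpuid_func). */
void advertise_sgx_attributes(struct cpuid_12_1 *e)
{
	e->eax &= SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT |
		  SGX_ATTR_EINITTOKENKEY | SGX_ATTR_KSS;
	e->ebx = 0;
}

/* Adjustment applied after userspace sets the guest CPUID. */
void adjust_sgx_xfrm(struct cpuid_12_1 *e, uint64_t guest_supported_xcr0)
{
	e->ecx &= (uint32_t)guest_supported_xcr0;
	e->edx &= (uint32_t)(guest_supported_xcr0 >> 32);
	e->ecx |= XFEATURE_MASK_FPSSE;
}
```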

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/cpuid.c      | 58 +++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/nested.c | 26 +++++++++++--
 arch/x86/kvm/vmx/nested.h |  5 +++
 arch/x86/kvm/vmx/sgx.c    | 80 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/sgx.h    | 13 +++++++
 arch/x86/kvm/vmx/vmcs12.c |  1 +
 arch/x86/kvm/vmx/vmcs12.h |  4 +-
 arch/x86/kvm/vmx/vmx.c    | 38 ++++++++++++++++++-
 8 files changed, 216 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 83637a2ff605..99e5e6f1a1ae 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -18,6 +18,7 @@
 #include <asm/processor.h>
 #include <asm/user.h>
 #include <asm/fpu/xstate.h>
+#include <asm/sgx_arch.h>
 #include "cpuid.h"
 #include "lapic.h"
 #include "mmu.h"
@@ -169,6 +170,21 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		vcpu->arch.guest_supported_xcr0 =
 			(best->eax | ((u64)best->edx << 32)) & supported_xcr0;
 
+	/*
+	 * Bits 127:64 of the allowed SECS.ATTRIBUTES (CPUID.0x12.0x1:ECX/EDX)
+	 * enumerate the supported XSAVE Feature Request Mask (XFRM), i.e. the
+	 * enclave's requested XCR0 value.  The enclave's XFRM must be a subset
+	 * of XCR0 at the time of EENTER, thus adjust the allowed XFRM by the
+	 * guest's supported XCR0.  Similar to XCR0 handling, FP and SSE are
+	 * forced to '1' even on CPUs that don't support XSAVE.
+	 */
+	best = kvm_find_cpuid_entry(vcpu, 0x12, 0x1);
+	if (best) {
+		best->ecx &= vcpu->arch.guest_supported_xcr0 & 0xffffffff;
+		best->edx &= vcpu->arch.guest_supported_xcr0 >> 32;
+		best->ecx |= XFEATURE_MASK_FPSSE;
+	}
+
 	kvm_update_pv_runtime(vcpu);
 
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
@@ -390,7 +406,7 @@ void kvm_set_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_mask(CPUID_7_0_EBX,
-		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
+		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
 		F(BMI2) | F(ERMS) | 0 /*INVPCID*/ | F(RTM) | 0 /*MPX*/ | F(RDSEED) |
 		F(ADX) | F(SMAP) | F(AVX512IFMA) | F(AVX512F) | F(AVX512PF) |
 		F(AVX512ER) | F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(AVX512DQ) |
@@ -401,7 +417,8 @@ void kvm_set_cpu_caps(void)
 		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
-		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/
+		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
+		F(SGX_LC)
 	);
 	/* Set LA57 based on hardware capability. */
 	if (cpuid_ecx(7) & F(LA57))
@@ -440,6 +457,11 @@ void kvm_set_cpu_caps(void)
 		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES)
 	);
 
+	kvm_cpu_cap_mask(CPUID_12_EAX,
+		F(SGX1) | F(SGX2) | 0 /* Reserved */ | 0 /* Reserved */ |
+		0 /* Reserved */ | 0 /* ENCLV */ | 0 /* ENCLS_C */
+	);
+
 	kvm_cpu_cap_mask(CPUID_8000_0001_ECX,
 		F(LAHF_LM) | F(CMP_LEGACY) | 0 /*SVM*/ | 0 /* ExtApicSpace */ |
 		F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
@@ -761,6 +783,38 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			entry->edx = 0;
 		}
 		break;
+	case 0x12:
+		/* Intel SGX */
+		if (!kvm_cpu_cap_has(X86_FEATURE_SGX)) {
+			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+			break;
+		}
+
+		/*
+		 * Index 0: Sub-features, MISCSELECT (a.k.a. extended features)
+		 * and max enclave sizes.  The SGX sub-features and MISCSELECT
+		 * are restricted by kernel and KVM capabilities (like most
+		 * feature flags), while enclave size is unrestricted.
+		 */
+		cpuid_entry_override(entry, CPUID_12_EAX);
+		entry->ebx &= SGX_MISC_EXINFO;
+
+		entry = do_host_cpuid(array, function, 1);
+		if (!entry)
+			goto out;
+
+		/*
+		 * Index 1: SECS.ATTRIBUTES.  ATTRIBUTES are restricted a la
+		 * feature flags.  Advertise all supported flags, including
+		 * privileged attributes that require explicit opt-in from
+		 * userspace.  ATTRIBUTES.XFRM is not adjusted as userspace is
+		 * expected to derive it from supported XCR0.
+		 */
+		entry->eax &= SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT |
+			      /* PROVISIONKEY | */ SGX_ATTR_EINITTOKENKEY |
+			      SGX_ATTR_KSS;
+		entry->ebx = 0;
+		break;
 	/* Intel PT */
 	case 0x14:
 		if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT)) {
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f16d6c83eafa..66fe2056559a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -11,6 +11,7 @@
 #include "mmu.h"
 #include "nested.h"
 #include "pmu.h"
+#include "sgx.h"
 #include "trace.h"
 #include "x86.h"
 
@@ -2318,6 +2319,9 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 		if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
 		    exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
 
+		if (exec_control & SECONDARY_EXEC_ENCLS_EXITING)
+			vmx_write_encls_bitmap(&vmx->vcpu, vmcs12);
+
 		secondary_exec_controls_set(vmx, exec_control);
 	}
 
@@ -5698,6 +5702,20 @@ static bool nested_vmx_exit_handled_cr(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+static bool nested_vmx_exit_handled_encls(struct kvm_vcpu *vcpu,
+					  struct vmcs12 *vmcs12)
+{
+	u32 encls_leaf;
+
+	if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENCLS_EXITING))
+		return false;
+
+	encls_leaf = kvm_rax_read(vcpu);
+	if (encls_leaf > 62)
+		encls_leaf = 63;
+	return vmcs12->encls_exiting_bitmap & BIT_ULL(encls_leaf);
+}
+
 static bool nested_vmx_exit_handled_vmcs_access(struct kvm_vcpu *vcpu,
 	struct vmcs12 *vmcs12, gpa_t bitmap)
 {
@@ -5791,9 +5809,6 @@ static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu,
 	case EXIT_REASON_VMFUNC:
 		/* VM functions are emulated through L2->L0 vmexits. */
 		return true;
-	case EXIT_REASON_ENCLS:
-		/* SGX is never exposed to L1 */
-		return true;
 	default:
 		break;
 	}
@@ -5917,6 +5932,8 @@ static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu,
 	case EXIT_REASON_TPAUSE:
 		return nested_cpu_has2(vmcs12,
 			SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE);
+	case EXIT_REASON_ENCLS:
+		return nested_vmx_exit_handled_encls(vcpu, vmcs12);
 	default:
 		return true;
 	}
@@ -6489,6 +6506,9 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		msrs->secondary_ctls_high |=
 			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 
+	if (enable_sgx)
+		msrs->secondary_ctls_high |= SECONDARY_EXEC_ENCLS_EXITING;
+
 	/* miscellaneous data */
 	rdmsr(MSR_IA32_VMX_MISC,
 		msrs->misc_low,
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 197148d76b8f..184418baeb3c 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -244,6 +244,11 @@ static inline bool nested_exit_on_intr(struct kvm_vcpu *vcpu)
 		PIN_BASED_EXT_INTR_MASK;
 }
 
+static inline bool nested_cpu_has_encls_exit(struct vmcs12 *vmcs12)
+{
+	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENCLS_EXITING);
+}
+
 /*
  * if fixed0[i] == 1: val[i] must be 1
  * if fixed1[i] == 0: val[i] must be 0
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 979d0597e4ac..62c3f3ec960b 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -6,11 +6,13 @@
 
 #include "cpuid.h"
 #include "kvm_cache_regs.h"
+#include "nested.h"
 #include "sgx.h"
 #include "vmx.h"
 #include "x86.h"
 
-bool __read_mostly enable_sgx;
+bool __read_mostly enable_sgx = 1;
+module_param_named(sgx, enable_sgx, bool, 0444);
 
 /* Initial value of guest's virtual SGX_LEPUBKEYHASHn MSRs */
 static u64 sgx_pubkey_hash[4] __ro_after_init;
@@ -382,3 +384,79 @@ void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu)
 	memcpy(vmx->msr_ia32_sgxlepubkeyhash, sgx_pubkey_hash,
 	       sizeof(sgx_pubkey_hash));
 }
+
+/*
+ * ECREATE must be intercepted to enforce MISCSELECT, ATTRIBUTES and XFRM
+ * restrictions if the guest's allowed-1 settings diverge from hardware.
+ */
+static bool sgx_intercept_encls_ecreate(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *guest_cpuid;
+	u32 eax, ebx, ecx, edx;
+
+	if (!vcpu->kvm->arch.sgx_provisioning_allowed)
+		return true;
+
+	guest_cpuid = kvm_find_cpuid_entry(vcpu, 0x12, 0);
+	if (!guest_cpuid)
+		return true;
+
+	cpuid_count(0x12, 0, &eax, &ebx, &ecx, &edx);
+	if (guest_cpuid->ebx != ebx || guest_cpuid->edx != edx)
+		return true;
+
+	guest_cpuid = kvm_find_cpuid_entry(vcpu, 0x12, 1);
+	if (!guest_cpuid)
+		return true;
+
+	cpuid_count(0x12, 1, &eax, &ebx, &ecx, &edx);
+	if (guest_cpuid->eax != eax || guest_cpuid->ebx != ebx ||
+	    guest_cpuid->ecx != ecx || guest_cpuid->edx != edx)
+		return true;
+
+	return false;
+}
+
+void vmx_write_encls_bitmap(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
+{
+	/*
+	 * There is no software enable bit for SGX that is virtualized by
+	 * hardware, e.g. there's no CR4.SGXE, so when SGX is disabled in the
+	 * guest (either by the host or by the guest's BIOS) but enabled in the
+	 * host, trap all ENCLS leafs and inject #UD/#GP as needed to emulate
+	 * the expected system behavior for ENCLS.
+	 */
+	u64 bitmap = -1ull;
+
+	/* Nothing to do if hardware doesn't support SGX */
+	if (!cpu_has_vmx_encls_vmexit())
+		return;
+
+	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX) &&
+	    sgx_enabled_in_guest_bios(vcpu)) {
+		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX1)) {
+			bitmap &= ~GENMASK_ULL(ETRACK, ECREATE);
+			if (sgx_intercept_encls_ecreate(vcpu))
+				bitmap |= (1 << ECREATE);
+		}
+
+		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX2))
+			bitmap &= ~GENMASK_ULL(EMODT, EAUG);
+
+		/*
+		 * Trap and execute EINIT if launch control is enabled in the
+		 * host using the guest's values for launch control MSRs, even
+		 * if the guest's values are fixed to hardware default values.
+		 * The MSRs are not loaded/saved on VM-Enter/VM-Exit as writing
+		 * the MSRs is extraordinarily expensive.
+		 */
+		if (boot_cpu_has(X86_FEATURE_SGX_LC))
+			bitmap |= (1 << EINIT);
+
+		if (!vmcs12 && is_guest_mode(vcpu))
+			vmcs12 = get_vmcs12(vcpu);
+		if (vmcs12 && nested_cpu_has_encls_exit(vmcs12))
+			bitmap |= vmcs12->encls_exiting_bitmap;
+	}
+	vmcs_write64(ENCLS_EXITING_BITMAP, bitmap);
+}
diff --git a/arch/x86/kvm/vmx/sgx.h b/arch/x86/kvm/vmx/sgx.h
index 05d774f62b7f..da570dc8e519 100644
--- a/arch/x86/kvm/vmx/sgx.h
+++ b/arch/x86/kvm/vmx/sgx.h
@@ -4,6 +4,9 @@
 
 #include <linux/kvm_host.h>
 
+#include "capabilities.h"
+#include "vmx_ops.h"
+
 #ifdef CONFIG_X86_SGX_VIRTUALIZATION
 extern bool __read_mostly enable_sgx;
 
@@ -11,11 +14,21 @@ int handle_encls(struct kvm_vcpu *vcpu);
 
 void setup_default_sgx_lepubkeyhash(void);
 void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu);
+
+void vmx_write_encls_bitmap(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12);
 #else
 #define enable_sgx 0
 
 static inline void setup_default_sgx_lepubkeyhash(void) { }
 static inline void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu) { }
+
+static inline void vmx_write_encls_bitmap(struct kvm_vcpu *vcpu,
+					  struct vmcs12 *vmcs12)
+{
+	/* Nothing to do if hardware doesn't support SGX */
+	if (cpu_has_vmx_encls_vmexit())
+		vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);
+}
 #endif
 
 #endif /* __KVM_X86_SGX_H */
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index c8e51c004f78..034adb6404dc 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -50,6 +50,7 @@ const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD64(VMREAD_BITMAP, vmread_bitmap),
 	FIELD64(VMWRITE_BITMAP, vmwrite_bitmap),
 	FIELD64(XSS_EXIT_BITMAP, xss_exit_bitmap),
+	FIELD64(ENCLS_EXITING_BITMAP, encls_exiting_bitmap),
 	FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
 	FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
 	FIELD64(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl),
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 80232daf00ff..13494956d0e9 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -69,7 +69,8 @@ struct __packed vmcs12 {
 	u64 vm_function_control;
 	u64 eptp_list_address;
 	u64 pml_address;
-	u64 padding64[3]; /* room for future expansion */
+	u64 encls_exiting_bitmap;
+	u64 padding64[2]; /* room for future expansion */
 	/*
 	 * To allow migration of L1 (complete with its L2 guests) between
 	 * machines of different natural widths (32 or 64 bit), we cannot have
@@ -256,6 +257,7 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(vm_function_control, 296);
 	CHECK_OFFSET(eptp_list_address, 304);
 	CHECK_OFFSET(pml_address, 312);
+	CHECK_OFFSET(encls_exiting_bitmap, 320);
 	CHECK_OFFSET(cr0_guest_host_mask, 344);
 	CHECK_OFFSET(cr4_guest_host_mask, 352);
 	CHECK_OFFSET(cr0_read_shadow, 360);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7a32e88882cb..4d18baa97764 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2161,6 +2161,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx->msr_ia32_feature_control = data;
 		if (msr_info->host_initiated && data == 0)
 			vmx_leave_nested(vcpu);
+
+		/* SGX may be enabled/disabled by guest's firmware */
+		vmx_write_encls_bitmap(vcpu, NULL);
 		break;
 	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
 		if (!msr_info->host_initiated &&
@@ -4320,6 +4323,15 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
 	vmx_adjust_sec_exec_control(vmx, &exec_control, waitpkg, WAITPKG,
 				    ENABLE_USR_WAIT_PAUSE, false);
 
+	if (cpu_has_vmx_encls_vmexit() && nested) {
+		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX))
+			vmx->nested.msrs.secondary_ctls_high |=
+				SECONDARY_EXEC_ENCLS_EXITING;
+		else
+			vmx->nested.msrs.secondary_ctls_high &=
+				~SECONDARY_EXEC_ENCLS_EXITING;
+	}
+
 	vmx->secondary_exec_control = exec_control;
 }
 
@@ -4419,8 +4431,7 @@ static void init_vmcs(struct vcpu_vmx *vmx)
 		vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
 	}
 
-	if (cpu_has_vmx_encls_vmexit())
-		vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);
+	vmx_write_encls_bitmap(&vmx->vcpu, NULL);
 
 	if (vmx_pt_mode_is_host_guest()) {
 		memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc));
@@ -7317,6 +7328,22 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	set_cr4_guest_host_mask(vmx);
 
+	vmx_write_encls_bitmap(vcpu, NULL);
+	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX))
+		vmx->msr_ia32_feature_control_valid_bits |= FEAT_CTL_SGX_ENABLED;
+	else
+		vmx->msr_ia32_feature_control_valid_bits &= ~FEAT_CTL_SGX_ENABLED;
+	/*
+	 * Only allow guest to write its virtual SGX_LEPUBKEYHASHn MSRs when
+	 * host is writable, otherwise it is meaningless.
+	 */
+	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
+		vmx->msr_ia32_feature_control_valid_bits |=
+			FEAT_CTL_SGX_LC_ENABLED;
+	else
+		vmx->msr_ia32_feature_control_valid_bits &=
+			~FEAT_CTL_SGX_LC_ENABLED;
+
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	update_exception_bitmap(vcpu);
 }
@@ -7337,6 +7364,13 @@ static __init void vmx_set_cpu_caps(void)
 	if (vmx_pt_mode_is_host_guest())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);
 
+	if (!enable_sgx) {
+		kvm_cpu_cap_clear(X86_FEATURE_SGX);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX_LC);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX1);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX2);
+	}
+
 	if (vmx_umip_emulated())
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
 
-- 
2.29.2
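The nested exit decision in nested_vmx_exit_handled_encls() reduces to a leaf-to-bit lookup in L1's ENCLS-exiting bitmap. The userspace C below models that lookup as an illustrative sketch. It is not kernel code, and the function name l1_wants_encls_exit() and its parameters are hypothetical. Following the patch, it assumes that leaves 0..62 each have their own bit and that every leaf of 63 or above is governed by bit 63.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of nested_vmx_exit_handled_encls():
 * returns true if L1 asked to see the ENCLS exit for leaf `eax`.
 */
static bool l1_wants_encls_exit(bool l1_enabled_encls_exiting,
                                uint64_t l1_encls_bitmap, uint32_t eax)
{
    uint32_t leaf = eax;

    /* L1 did not enable ENCLS exiting, so the exit is not reflected to it. */
    if (!l1_enabled_encls_exiting)
        return false;

    /* Leaves 63 and above all share bit 63 of the bitmap. */
    if (leaf > 62)
        leaf = 63;

    return (l1_encls_bitmap >> leaf) & 1;
}
```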

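vmx_write_encls_bitmap() starts from "trap everything" and then clears or sets bits according to the guest's SGX configuration. The userspace sketch below reproduces that composition so the resulting masks can be checked by hand. The struct encls_inputs and compute_encls_bitmap() names are hypothetical, and GENMASK_ULL is re-implemented locally because the kernel macro is not available in userspace. The leaf numbers are the architectural ENCLS encodings from the Intel SDM (ECREATE = 0x00, EINIT = 0x02, ETRACK = 0x0C, EAUG..EMODT = 0x0D..0x0F).

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of the architectural ENCLS leaf numbers (Intel SDM). */
enum encls_leaf {
    ECREATE = 0x00, EINIT = 0x02, ETRACK = 0x0C,
    EAUG = 0x0D, EMODPR = 0x0E, EMODT = 0x0F,
};

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l). */
#define GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Hypothetical snapshot of the inputs vmx_write_encls_bitmap() consults. */
struct encls_inputs {
    bool guest_sgx;          /* guest CPUID advertises SGX */
    bool bios_enabled;       /* guest FEATURE_CONTROL enables SGX */
    bool guest_sgx1;         /* guest CPUID advertises SGX1 */
    bool guest_sgx2;         /* guest CPUID advertises SGX2 */
    bool intercept_ecreate;  /* result of sgx_intercept_encls_ecreate() */
    bool host_lc;            /* host has SGX Launch Control */
    bool l2_active;          /* running L2 with ENCLS exiting enabled by L1 */
    uint64_t l1_bitmap;      /* vmcs12->encls_exiting_bitmap */
};

static uint64_t compute_encls_bitmap(const struct encls_inputs *in)
{
    uint64_t bitmap = ~0ULL;                     /* default: trap every leaf */

    /* SGX off in the guest: trap all leaves to emulate #UD/#GP. */
    if (!in->guest_sgx || !in->bios_enabled)
        return bitmap;

    if (in->guest_sgx1) {
        bitmap &= ~GENMASK_ULL(ETRACK, ECREATE); /* run SGX1 leaves natively */
        if (in->intercept_ecreate)
            bitmap |= 1ULL << ECREATE;           /* enforce CPUID limits */
    }
    if (in->guest_sgx2)
        bitmap &= ~GENMASK_ULL(EMODT, EAUG);     /* run SGX2 leaves natively */
    if (in->host_lc)
        bitmap |= 1ULL << EINIT;                 /* run EINIT with guest LE hash */
    if (in->l2_active)
        bitmap |= in->l1_bitmap;                 /* honor L1's own requests */
    return bitmap;
}
```

For example, a guest with SGX1 and SGX2 on a host with Launch Control ends up trapping only EINIT among leaves 0x00..0x0F, plus ECREATE if its CPUID diverges from the host's.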

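sgx_intercept_encls_ecreate() decides whether ECREATE can run natively. It can only do so when provisioning is allowed and the guest's CPUID.0x12 sub-leaves 0 and 1 match the host's. The sketch below restates that rule as a pure function. The struct cpuid_regs type and must_intercept_ecreate() are hypothetical, and a NULL pointer stands in for a missing guest CPUID entry. Matching the patch, only EBX and EDX of sub-leaf 0 are compared, while all four registers of sub-leaf 1 are compared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Hypothetical model of sgx_intercept_encls_ecreate(). */
static bool must_intercept_ecreate(bool provisioning_allowed,
                                   const struct cpuid_regs *guest0,
                                   const struct cpuid_regs *host0,
                                   const struct cpuid_regs *guest1,
                                   const struct cpuid_regs *host1)
{
    if (!provisioning_allowed || !guest0 || !guest1)
        return true;

    /* Sub-leaf 0: MISCSELECT (EBX) and max enclave sizes (EDX). */
    if (guest0->ebx != host0->ebx || guest0->edx != host0->edx)
        return true;

    /* Sub-leaf 1: SECS.ATTRIBUTES (EAX/EBX) and XFRM (ECX/EDX). */
    return guest1->eax != host1->eax || guest1->ebx != host1->ebx ||
           guest1->ecx != host1->ecx || guest1->edx != host1->edx;
}
```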

* [RFC PATCH 23/23] KVM: x86: Add capability to grant VM access to privileged SGX attribute
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (21 preceding siblings ...)
  2021-01-06  1:56 ` [RFC PATCH 22/23] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC Kai Huang
@ 2021-01-06  1:58 ` Kai Huang
  2021-01-06  2:22 ` [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  1:58 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, mattson, joro, vkuznets, wanpengli, corbet,
	Andy Lutomirski, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a capability, KVM_CAP_SGX_ATTRIBUTE, that can be used by userspace
to grant a VM access to a privileged attribute, with args[0] holding a
file handle to a valid SGX attribute file.

The SGX subsystem restricts access to a subset of enclave attributes to
provide additional security for an uncompromised kernel, e.g. to prevent
malware from using the PROVISIONKEY to ensure its nodes are running
inside a genuine SGX enclave and/or to obtain a stable fingerprint.

To prevent userspace from circumventing such restrictions by running an
enclave in a VM, KVM restricts guest access to privileged attributes by
default.

Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++
 arch/x86/kvm/cpuid.c           |  2 +-
 arch/x86/kvm/x86.c             | 22 ++++++++++++++++++++++
 include/uapi/linux/kvm.h       |  1 +
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e00a66d72372..edd650e8a87d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6014,6 +6014,29 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space
 can then handle to implement model specific MSR handling and/or user notifications
 to inform a user that an MSR was not handled.
 
+7.22 KVM_CAP_SGX_ATTRIBUTE
+--------------------------
+
+:Architectures: x86
+:Target: VM
+:Parameters: args[0] is a file handle of an SGX attribute file in securityfs
+:Returns: 0 on success, -EINVAL if the file handle is invalid or if a requested
+          attribute is not supported by KVM.
+
+KVM_CAP_SGX_ATTRIBUTE enables a userspace VMM to grant a VM access to one or
+more privileged enclave attributes.  args[0] must hold a file handle to a valid
+SGX attribute file corresponding to an attribute that is supported/restricted
+by KVM (currently only PROVISIONKEY).
+
+The SGX subsystem restricts access to a subset of enclave attributes to provide
+additional security for an uncompromised kernel, e.g. use of the PROVISIONKEY
+is restricted to deter malware from using the PROVISIONKEY to obtain a stable
+system fingerprint.  To prevent userspace from circumventing such restrictions
+by running an enclave in a VM, KVM prevents access to privileged attributes by
+default.
+
+See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 99e5e6f1a1ae..b53ba5570ee5 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -811,7 +811,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		 * expected to derive it from supported XCR0.
 		 */
 		entry->eax &= SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT |
-			      /* PROVISIONKEY | */ SGX_ATTR_EINITTOKENKEY |
+			      SGX_ATTR_PROVISIONKEY | SGX_ATTR_EINITTOKENKEY |
 			      SGX_ATTR_KSS;
 		entry->ebx &= 0;
 		break;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c195494da0ea..bfc44ca6b6ba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -74,6 +74,8 @@
 #include <asm/tlbflush.h>
 #include <asm/intel_pt.h>
 #include <asm/emulate_prefix.h>
+#include <asm/sgx.h>
+#include <asm/sgx_arch.h>
 #include <clocksource/hyperv_timer.h>
 
 #define CREATE_TRACE_POINTS
@@ -3739,6 +3741,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_X86_USER_SPACE_MSR:
 	case KVM_CAP_X86_MSR_FILTER:
 	case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
+#ifdef CONFIG_X86_SGX_VIRTUALIZATION
+	case KVM_CAP_SGX_ATTRIBUTE:
+#endif
 		r = 1;
 		break;
 	case KVM_CAP_SYNC_REGS:
@@ -5270,6 +5275,23 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		kvm->arch.user_space_msr_mask = cap->args[0];
 		r = 0;
 		break;
+#ifdef CONFIG_X86_SGX_VIRTUALIZATION
+	case KVM_CAP_SGX_ATTRIBUTE: {
+		unsigned long allowed_attributes = 0;
+
+		r = sgx_set_attribute(&allowed_attributes, cap->args[0]);
+		if (r)
+			break;
+
+		/* KVM only supports the PROVISIONKEY privileged attribute. */
+		if ((allowed_attributes & SGX_ATTR_PROVISIONKEY) &&
+		    !(allowed_attributes & ~SGX_ATTR_PROVISIONKEY))
+			kvm->arch.sgx_provisioning_allowed = true;
+		else
+			r = -EINVAL;
+		break;
+	}
+#endif
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ca41220b40b8..4053522ac191 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_X86_USER_SPACE_MSR 188
 #define KVM_CAP_X86_MSR_FILTER 189
 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
+#define KVM_CAP_SGX_ATTRIBUTE 200
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.29.2
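Patch 23 touches two bit masks. The first is the set of SECS.ATTRIBUTES bits that KVM advertises in CPUID.0x12.1:EAX, which now includes PROVISIONKEY. The second is the check that accepts a KVM_CAP_SGX_ATTRIBUTE grant only when it names PROVISIONKEY and nothing else. The userspace restatement below is illustrative only. The bit positions are the architectural ATTRIBUTES bits (using the kernel's SGX_ATTR_* names), while advertise_sgx_attributes() and cap_grants_provisioning() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural SECS.ATTRIBUTES bits, using the kernel's names. */
#define SGX_ATTR_DEBUG          (1ULL << 1)
#define SGX_ATTR_MODE64BIT      (1ULL << 2)
#define SGX_ATTR_PROVISIONKEY   (1ULL << 4)
#define SGX_ATTR_EINITTOKENKEY  (1ULL << 5)
#define SGX_ATTR_KSS            (1ULL << 7)

/* Mask applied to CPUID.0x12.1:EAX after this patch. */
#define KVM_SGX_ADVERTISED_ATTRS                                  \
    (SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | SGX_ATTR_PROVISIONKEY | \
     SGX_ATTR_EINITTOKENKEY | SGX_ATTR_KSS)

/* Hypothetical helper: what KVM would report for a given host EAX. */
static uint32_t advertise_sgx_attributes(uint32_t host_eax)
{
    return host_eax & (uint32_t)KVM_SGX_ADVERTISED_ATTRS;
}

/* Mirrors the KVM_CAP_SGX_ATTRIBUTE check: PROVISIONKEY and only that. */
static bool cap_grants_provisioning(uint64_t allowed_attributes)
{
    return (allowed_attributes & SGX_ATTR_PROVISIONKEY) &&
           !(allowed_attributes & ~SGX_ATTR_PROVISIONKEY);
}
```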

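On the userspace side, a VMM would enable the capability through KVM_ENABLE_CAP on the VM file descriptor and pass the attribute file descriptor in args[0]. The sketch below is a minimal illustration and not QEMU's code. vm_allow_sgx_attribute() is a hypothetical name, and the attribute file path is a parameter because the api.rst text refers to a securityfs file while the cover letter refers to /dev/sgx_provision. The fallback value 200 is only the number proposed by this RFC, and kernel headers that define the capability may use a different one. The sketch also assumes that KVM needs the attribute fd only for the duration of the ioctl, so it closes the fd afterwards.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

#ifndef KVM_CAP_SGX_ATTRIBUTE
#define KVM_CAP_SGX_ATTRIBUTE 200   /* value proposed by this RFC (assumption) */
#endif

/*
 * Hypothetical helper: grant the VM behind vm_fd access to the privileged
 * attribute backed by attr_path. Returns 0 or a negative errno.
 */
static int vm_allow_sgx_attribute(int vm_fd, const char *attr_path)
{
    struct kvm_enable_cap cap;
    int attr_fd, ret;

    attr_fd = open(attr_path, O_RDONLY);
    if (attr_fd < 0)
        return -errno;

    memset(&cap, 0, sizeof(cap));
    cap.cap = KVM_CAP_SGX_ATTRIBUTE;
    cap.args[0] = attr_fd;

    ret = ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    if (ret < 0)
        ret = -errno;          /* capture errno before close() can clobber it */
    close(attr_fd);
    return ret;
}
```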


* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (22 preceding siblings ...)
  2021-01-06  1:58 ` [RFC PATCH 23/23] KVM: x86: Add capability to grant VM access to privileged SGX attribute Kai Huang
@ 2021-01-06  2:22 ` Kai Huang
  2021-01-06 17:07 ` Dave Hansen
  2021-01-11 17:20 ` Jarkko Sakkinen
  25 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06  2:22 UTC (permalink / raw)
  To: linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, dave.hansen, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa, jethro, b.thiel, joro, vkuznets, wanpengli,
	corbet, jmattson

Sorry, I made a mistake when copying and pasting Jim's email address :(
I have removed the wrong address (mattson@google.com) and added the correct
one (jmattson@gmail.com). Apologies for the noise.

Thanks,
-Kai

On Wed, 2021-01-06 at 14:55 +1300, Kai Huang wrote:
> --- Disclaimer ---
> 
> These patches were originally written by Sean Christopherson while at Intel.
> Now that Sean has left Intel, I (Kai) have taken over getting them upstream.
> This series needs more review before it can be merged.  It is being posted
> publicly and under RFC so Sean and others can review it. Maintainers are safe
> ignoring it for now.
> 
> ------------------
> 
> Hi all,
> 
> This series adds KVM SGX virtualization support. The first 12 patches starting
> with x86/sgx or x86/cpu.. are necessary changes to x86 and SGX core/driver to
> support KVM SGX virtualization, while the rest are patches to KVM subsystem.
> 
> Please help to review this series. Also I'd like to hear what is the proper
> way to merge this series, since it contains change to both x86/SGX and KVM
> subsystem. Any feedback is highly appreciated. And please let me know if I
> forgot to CC anyone, or anyone wants to be removed from CC. Thanks in advance!
> 
> This series is based against latest tip tree's x86/sgx branch. You can also get
> the code from tip branch of kvm-sgx repo on github:
> 
>         https://github.com/intel/kvm-sgx.git tip
> 
> It also requires Qemu changes to create VM with SGX support. You can find Qemu
> repo here:
> 
> 	https://github.com/intel/qemu-sgx.git next
> 
> Please refer to README.md of above qemu-sgx repo for detail on how to create
> guest with SGX support. At meantime, for your quick reference you can use below
> command to create SGX guest:
> 
> 	#qemu-system-x86_64 -smp 4 -m 2G -drive file=<your_vm_image>,if=virtio \
> 		-cpu host,+sgx_provisionkey \
> 		-sgx-epc id=epc1,memdev=mem1 \
> 		-object memory-backend-epc,id=mem1,size=64M,prealloc
> 
> Please note that the SGX relevant part is:
> 
> 		-cpu host,+sgx_provisionkey \
> 		-sgx-epc id=epc1,memdev=mem1 \
> 		-object memory-backend-epc,id=mem1,size=64M,prealloc
> 
> And you can change other parameters of your qemu command based on your needs.
> 
> =========
> KVM SGX virtualization Overview
> 
> - Virtual EPC
> 
> "Virtual EPC" is the EPC section exposed by KVM to guest so SGX software in
> guest can discover it and use it to create SGX enclaves. KVM exposes SGX to 
> guest via CPUID, and exposes one or more "virtual EPC" sections for guest.
> The size of "virtual EPC" is passed as Qemu parameter when creating the
> guest, and the base address is calculated internally according to guest's
> configuration.
> 
> To support virtual EPC, add a new misc device /dev/sgx_virt_epc to SGX
> core/driver to allow userspace (Qemu) to allocate "raw" EPC, and use it as
> "virtual EPC" for guest. Obviously, unlike EPC allocated for host SGX driver,
> virtual EPC allocated via /dev/sgx_virt_epc doesn't have enclave associated,
> and how virtual EPC is used by guest is completely controlled by guest's SGX
> software.
> 
> Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> /dev/sgx_virt_epc rather than in KVM. Doing so has two major advantages:
> 
>   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
>     just another memory backend for guests.
> 
>   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
>     does not have to export any symbols, changes to reclaim flows don't
>     need to be routed through KVM, SGX's dirty laundry doesn't have to
>     get aired out for the world to see, and so on and so forth.
> 
> The virtual EPC allocated to guests is currently not reclaimable, due to
> reclaiming EPC from KVM guests is not currently supported. Due to the
> complications of handling reclaim conflicts between guest and host, KVM
> EPC oversubscription, which allows total virtual EPC size greater than
> physical EPC by being able to reclaiming guests' EPC, is significantly more
> complex than basic support for SGX virtualization.
> 
> - Support SGX virtualization without SGX Launch Control unlocked mode
> 
> Although SGX driver requires SGX Launch Control unlocked mode to work, SGX
> virtualization doesn't, since how enclave is created is completely controlled
> by guest SGX software, which is not necessarily linux. Therefore, this series
> allows KVM to expose SGX to guest even SGX Launch Control is in locked mode,
> or is not present at all. The reason is the goal of SGX virtualization, or
> virtualization in general, is to expose hardware feature to guest, but not to
> make assumption how guest will use it. Therefore, KVM should support SGX guest
> as long as hardware is able to, to have chance to support more potential use
> cases in cloud environment.
> 
> - Support exposing SGX2
> 
> Due to the same reason above, SGX2 feature detection is added to core SGX code
> to allow KVM to expose SGX2 to guest, even currently SGX driver doesn't support
> SGX2, because SGX2 can work just fine in guest w/o any interaction to host SGX
> driver.
> 
> - Restrict SGX guest access to provisioning key
> 
> To grant guest being able to fully use SGX, guest needs to be able to create
> provisioning enclave. However provisioning key is sensitive and is restricted by
> /dev/sgx_provision in host SGX driver, therefore KVM SGX virtualization follows
> the same role: a new KVM_CAP_SGX_ATTRIBUTE is added to KVM uAPI, and only file
> descriptor of /dev/sgx_provision is passed to that CAP by userspace hypervisor
> (Qemu) when creating the guest, it can access provisioning bit. This is done by
> making KVM trap ECREATE instruction from guest, and check the provisioning bit
> in ECREATE's attribute.
> 
> 
> Kai Huang (1):
>   x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs
> 
> Sean Christopherson (22):
>   x86/sgx: Split out adding EPC page to free list to separate helper
>   x86/sgx: Add enum for SGX_CHILD_PRESENT error code
>   x86/sgx: Introduce virtual EPC for use by KVM guests
>   x86/cpufeatures: Add SGX1 and SGX2 sub-features
>   x86/cpu/intel: Allow SGX virtualization without Launch Control support
>   x86/sgx: Expose SGX architectural definitions to the kernel
>   x86/sgx: Move ENCLS leaf definitions to sgx_arch.h
>   x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT)
>   x86/sgx: Add encls_faulted() helper
>   x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
>   x86/sgx: Move provisioning device creation out of SGX driver
>   KVM: VMX: Convert vcpu_vmx.exit_reason to a union
>   KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX)
>   KVM: x86: Define new #PF SGX error code bit
>   KVM: x86: Add SGX feature leaf to reverse CPUID lookup
>   KVM: VMX: Add basic handling of VM-Exit from SGX enclave
>   KVM: VMX: Frame in ENCLS handler for SGX virtualization
>   KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions
>   KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs
>   KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)
>   KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC
>   KVM: x86: Add capability to grant VM access to privileged SGX
>     attribute
> 
>  Documentation/virt/kvm/api.rst                |  23 +
>  arch/x86/Kconfig                              |  12 +
>  arch/x86/include/asm/cpufeature.h             |   5 +-
>  arch/x86/include/asm/cpufeatures.h            |   6 +-
>  arch/x86/include/asm/disabled-features.h      |   7 +-
>  arch/x86/include/asm/kvm_host.h               |   5 +
>  arch/x86/include/asm/required-features.h      |   2 +-
>  arch/x86/include/asm/sgx.h                    |  19 +
>  .../cpu/sgx/arch.h => include/asm/sgx_arch.h} |  20 +
>  arch/x86/include/asm/vmx.h                    |   1 +
>  arch/x86/include/uapi/asm/vmx.h               |   1 +
>  arch/x86/kernel/cpu/common.c                  |   4 +
>  arch/x86/kernel/cpu/feat_ctl.c                |  50 +-
>  arch/x86/kernel/cpu/sgx/Makefile              |   1 +
>  arch/x86/kernel/cpu/sgx/driver.c              |  17 -
>  arch/x86/kernel/cpu/sgx/encl.c                |   2 +-
>  arch/x86/kernel/cpu/sgx/encls.h               |  29 +-
>  arch/x86/kernel/cpu/sgx/ioctl.c               |  23 +-
>  arch/x86/kernel/cpu/sgx/main.c                |  79 ++-
>  arch/x86/kernel/cpu/sgx/sgx.h                 |   5 +-
>  arch/x86/kernel/cpu/sgx/virt.c                | 318 ++++++++++++
>  arch/x86/kernel/cpu/sgx/virt.h                |  14 +
>  arch/x86/kvm/Makefile                         |   2 +
>  arch/x86/kvm/cpuid.c                          |  58 ++-
>  arch/x86/kvm/cpuid.h                          |   1 +
>  arch/x86/kvm/vmx/nested.c                     |  70 ++-
>  arch/x86/kvm/vmx/nested.h                     |   5 +
>  arch/x86/kvm/vmx/sgx.c                        | 462 ++++++++++++++++++
>  arch/x86/kvm/vmx/sgx.h                        |  34 ++
>  arch/x86/kvm/vmx/vmcs12.c                     |   1 +
>  arch/x86/kvm/vmx/vmcs12.h                     |   4 +-
>  arch/x86/kvm/vmx/vmx.c                        | 171 +++++--
>  arch/x86/kvm/vmx/vmx.h                        |  27 +-
>  arch/x86/kvm/x86.c                            |  24 +
>  include/uapi/linux/kvm.h                      |   1 +
>  tools/testing/selftests/sgx/defines.h         |   2 +-
>  36 files changed, 1366 insertions(+), 139 deletions(-)
>  create mode 100644 arch/x86/include/asm/sgx.h
>  rename arch/x86/{kernel/cpu/sgx/arch.h => include/asm/sgx_arch.h} (96%)
>  create mode 100644 arch/x86/kernel/cpu/sgx/virt.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/virt.h
>  create mode 100644 arch/x86/kvm/vmx/sgx.c
>  create mode 100644 arch/x86/kvm/vmx/sgx.h
> 

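The overview above says that virtual EPC is handled as "just another memory backend". As a rough illustration, a VMM could obtain raw EPC by opening the virtual EPC device and mapping it shared. This is a hypothetical sketch, and map_virtual_epc() does not come from the series. It assumes that the device (named /dev/sgx_virt_epc in this RFC) supports a shared read/write mmap() at offset 0 and that `size` is page-aligned. QEMU's memory-backend-epc may do this differently.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `size` bytes of raw EPC from the device at
 * `path`. Returns the mapping, or NULL with *err set to a negative errno.
 */
static void *map_virtual_epc(const char *path, size_t size, int *err)
{
    void *p;
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        *err = -errno;
        return NULL;
    }

    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    *err = (p == MAP_FAILED) ? -errno : 0;  /* read errno before close() */
    close(fd);                              /* the mapping stays valid */
    return (p == MAP_FAILED) ? NULL : p;
}
```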



* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (23 preceding siblings ...)
  2021-01-06  2:22 ` [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
@ 2021-01-06 17:07 ` Dave Hansen
  2021-01-07  0:34   ` Kai Huang
  2021-01-11 17:20 ` Jarkko Sakkinen
  25 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 17:07 UTC (permalink / raw)
  To: Kai Huang, linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, haitao.huang, pbonzini, bp, tglx, mingo,
	hpa, jethro, b.thiel, mattson, joro, vkuznets, wanpengli, corbet

On 1/5/21 5:55 PM, Kai Huang wrote:
> - Virtual EPC
> 
> "Virtual EPC" is the EPC section exposed by KVM to guest so SGX software in
> guest can discover it and use it to create SGX enclaves. KVM exposes SGX to 
> guest via CPUID, and exposes one or more "virtual EPC" sections for guest.
> The size of "virtual EPC" is passed as Qemu parameter when creating the
> guest, and the base address is calcualted internally according to guest's

				^ calculated

> configuration.

This is not a great first paragraph to introduce me to this feature.

Please remind us what EPC *is*, then you can go and talk about why we
have to virtualize it, and how "virtual EPC" is different from normal
EPC.  For instance:

SGX enclave memory is special and is reserved specifically for enclave
use.  In bare-metal SGX enclaves, the kernel allocates enclave pages,
copies data into the pages with privileged instructions, then allows the
enclave to start.  In this scenario, only initialized pages already
assigned to an enclave are mapped to userspace.

In virtualized environments, the hypervisor still needs to do the
physical enclave page allocation.  The guest kernel is responsible for
the data copying (among other things).  This means that the job of
starting an enclave is now split between hypervisor and guest.

This series introduces a new misc device: /dev/sgx_virt_epc.  This
device allows the host to map *uninitialized* enclave memory into
userspace, which can then be passed into a guest.

While it might be *possible* to start a host-side enclave with
/dev/sgx_enclave and pass its memory into a guest, it would be wasteful
and convoluted.

> core/driver to allow userspace (Qemu) to allocate "raw" EPC, and use it as
> "virtual EPC" for guest. Obviously, unlike EPC allocated for host SGX driver,
> virtual EPC allocated via /dev/sgx_virt_epc doesn't have enclave associated,
> and how virtual EPC is used by guest is compeletely controlled by guest's SGX

					   ^ completely

Please run a spell checker on this thing.

> software.
> 
> Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> /dev/sgx_virt_epc rather than in KVM. Doing so has two major advantages:
> 
>   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
>     just another memory backend for guests.
> 
>   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
>     does not have to export any symbols, changes to reclaim flows don't
>     need to be routed through KVM, SGX's dirty laundry doesn't have to
>     get aired out for the world to see, and so on and so forth.
> 
> The virtual EPC allocated to guests is currently not reclaimable, due to
> reclaiming EPC from KVM guests is not currently supported. Due to the
> complications of handling reclaim conflicts between guest and host, KVM
> EPC oversubscription, which allows total virtual EPC size greater than
> physical EPC by being able to reclaiming guests' EPC, is significantly more
> complex than basic support for SGX virtualization.

It would also help here to remind the reader that enclave pages have a
special reclaim mechanism separate from normal page reclaim, and that
mechanism is disabled for these pages.

Does the *ABI* here preclude doing oversubscription in the future?

> - Support SGX virtualization without SGX Launch Control unlocked mode
> 
> Although SGX driver requires SGX Launch Control unlocked mode to work, SGX

Although the bare-metal SGX driver requires...

Also, didn't we call this "Flexible Launch Control"?

> virtualization doesn't, since how enclave is created is completely controlled
> by guest SGX software, which is not necessarily linux. Therefore, this series
> allows KVM to expose SGX to guest even SGX Launch Control is in locked mode,

... "expose SGX to guests even if" ...

> or is not present at all. The reason is the goal of SGX virtualization, or
> virtualization in general, is to expose hardware feature to guest, but not to
> make assumption how guest will use it. Therefore, KVM should support SGX guest
> as long as hardware is able to, to have chance to support more potential use
> cases in cloud environment.

This is kinda long-winded and misses a lot of important context.  How about:

SGX hardware supports two "launch control" modes to limit which enclaves
can run.  In the "locked" mode, the hardware prevents enclaves from
running unless they are blessed by a third party.  In the unlocked mode,
the kernel is in full control of which enclaves can run.  The bare-metal
SGX code refuses to launch enclaves unless it is in the unlocked mode.

This sgx_virt_epc driver does not have such a restriction.  This allows
guests which are OK with the locked mode to use SGX, even if the host
kernel refuses to.

> - Support exposing SGX2
> 
> Due to the same reason above, SGX2 feature detection is added to core SGX code
> to allow KVM to expose SGX2 to guest, even currently SGX driver doesn't support
> SGX2, because SGX2 can work just fine in guest w/o any interaction to host SGX
> driver.
> 
> - Restricit SGX guest access to provisioning key
> 
> To grant guest being able to fully use SGX, guest needs to be able to create
> provisioning enclave.

"enclave" or "enclaves"?

> However provisioning key is sensitive and is restricted by

	^ the

> /dev/sgx_provision in host SGX driver, therefore KVM SGX virtualization follows
> the same role: a new KVM_CAP_SGX_ATTRIBUTE is added to KVM uAPI, and only file
> descriptor of /dev/sgx_provision is passed to that CAP by usersppace hypervisor
> (Qemu) when creating the guest, it can access provisioning bit. This is done by
> making KVM trape ECREATE instruction from guest, and check the provisioning bit

		^ trap

> in ECREATE's attribute.

The grammar in that paragraph is really off to me.  Can you give it
another go?



* Re: [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code
  2021-01-06  1:55 ` [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code Kai Huang
@ 2021-01-06 18:28   ` Dave Hansen
  2021-01-06 21:40     ` Kai Huang
  2021-01-12  0:26     ` Jarkko Sakkinen
  2021-01-11 23:32   ` Jarkko Sakkinen
  1 sibling, 2 replies; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 18:28 UTC (permalink / raw)
  To: Kai Huang, linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, haitao.huang, pbonzini, bp, tglx, mingo, hpa

On 1/5/21 5:55 PM, Kai Huang wrote:
> Add SGX_CHILD_PRESENT for use by SGX virtualization to assert EREMOVE
> failures are expected, but only due to SGX_CHILD_PRESENT.

This dances around the fact that this is an architectural error-code.
Could that be explicit?  Maybe the subject should be:

	Add SGX_CHILD_PRESENT hardware error code


* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-06  1:55 ` [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
@ 2021-01-06 19:35   ` Dave Hansen
  2021-01-06 20:35     ` Sean Christopherson
  2021-01-07  1:42     ` Kai Huang
  2021-01-11 23:38   ` Jarkko Sakkinen
  1 sibling, 2 replies; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 19:35 UTC (permalink / raw)
  To: Kai Huang, linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, haitao.huang, pbonzini, bp, tglx, mingo, hpa

On 1/5/21 5:55 PM, Kai Huang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Add a misc device /dev/sgx_virt_epc to allow userspace to allocate "raw"
> EPC without an associated enclave.  The intended and only known use case
> for raw EPC allocation is to expose EPC to a KVM guest, hence the
> virt_epc moniker, virt.{c,h} files and X86_SGX_VIRTUALIZATION Kconfig.
> 
> Modify sgx_init() to always try to initialize virtual EPC driver, even
> when SGX driver is disabled due to SGX Launch Control is in locked mode,
> or not present at all, since SGX virtualization allows to expose SGX to
> guests that support non-LC configurations.

The grammar here is a bit off.  Here's a rewrite:

Modify sgx_init() to always try to initialize the virtual EPC driver,
even if the bare-metal SGX driver is disabled.  The bare-metal driver
might be disabled if SGX Launch Control is in locked mode, or not
supported in the hardware at all.  This allows (non-Linux) guests that
support non-LC configurations to use SGX.


> diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
> index 91d3dc784a29..7a25bf63adfb 100644
> --- a/arch/x86/kernel/cpu/sgx/Makefile
> +++ b/arch/x86/kernel/cpu/sgx/Makefile
> @@ -3,3 +3,4 @@ obj-y += \
>  	encl.o \
>  	ioctl.o \
>  	main.o
> +obj-$(CONFIG_X86_SGX_VIRTUALIZATION)	+= virt.o
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 95aad183bb65..02993a327a1f 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -9,9 +9,11 @@
>  #include <linux/sched/mm.h>
>  #include <linux/sched/signal.h>
>  #include <linux/slab.h>
> +#include "arch.h"
>  #include "driver.h"
>  #include "encl.h"
>  #include "encls.h"
> +#include "virt.h"
>  
>  struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
>  static int sgx_nr_epc_sections;
> @@ -726,7 +728,8 @@ static void __init sgx_init(void)
>  	if (!sgx_page_reclaimer_init())
>  		goto err_page_cache;
>  
> -	ret = sgx_drv_init();
> +	/* Success if the native *or* virtual EPC driver initialized cleanly. */
> +	ret = !!sgx_drv_init() & !!sgx_virt_epc_init();
>  	if (ret)
>  		goto err_kthread;

FWIW, I hate that conditional.  But, I tried to write it to be something
more sane and failed.
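A more explicit spelling is sketched below, modeled in userspace so the two forms can be compared directly. It assumes only what the patch comment says: SGX init succeeds if either sgx_drv_init() or sgx_virt_epc_init() succeeds. The function names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the check in sgx_init().  drv_ret and
 * virt_ret stand in for the return values of sgx_drv_init() and
 * sgx_virt_epc_init(): 0 on success, a negative errno on failure.  Both
 * init functions have to be called before the results are combined;
 * "sgx_drv_init() && sgx_virt_epc_init()" would short-circuit and skip
 * the virtual EPC init whenever the bare-metal init succeeds.
 */

/* The patch's form: nonzero only if both inits failed. */
static int sgx_init_failed_patch(int drv_ret, int virt_ret)
{
	return !!drv_ret & !!virt_ret;
}

/* Spelled-out form: SGX is usable if either driver came up. */
static int sgx_init_failed_explicit(int drv_ret, int virt_ret)
{
	bool drv_ok = (drv_ret == 0);
	bool virt_ok = (virt_ret == 0);

	return !drv_ok && !virt_ok;
}

/* Both forms agree for success and for a couple of sample errnos. */
static void check_forms_agree(void)
{
	static const int rets[] = { 0, -ENODEV, -ENOMEM };
	const unsigned int n = sizeof(rets) / sizeof(rets[0]);
	unsigned int i, j;

	for (i = 0; i < n; i++)
		for (j = 0; j < n; j++)
			assert(sgx_init_failed_patch(rets[i], rets[j]) ==
			       sgx_init_failed_explicit(rets[i], rets[j]));
}
```

In sgx_init() itself this would presumably become two assignments followed by "if (drv_ret && virt_ret) goto err_kthread;", which is longer but says what it means.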

> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> new file mode 100644
> index 000000000000..d625551ccf25
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -0,0 +1,263 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*  Copyright(c) 2016-20 Intel Corporation. */
> +
> +#include <linux/miscdevice.h>
> +#include <linux/mm.h>
> +#include <linux/mman.h>
> +#include <linux/sched/mm.h>
> +#include <linux/sched/signal.h>
> +#include <linux/slab.h>
> +#include <linux/xarray.h>
> +#include <asm/sgx.h>
> +#include <uapi/asm/sgx.h>
> +
> +#include "encls.h"
> +#include "sgx.h"
> +#include "virt.h"
> +
> +struct sgx_virt_epc {
> +	struct xarray page_array;
> +	struct mutex lock;
> +	struct mm_struct *mm;
> +};
> +
> +static struct mutex virt_epc_lock;
> +static struct list_head virt_epc_zombie_pages;

What does the lock protect?

What are zombie pages?

BTW, if zombies are SECS-only, shouldn't that be in the name rather than
"epc"?

> +static int __sgx_virt_epc_fault(struct sgx_virt_epc *epc,
> +				struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct sgx_epc_page *epc_page;
> +	unsigned long index, pfn;
> +	int ret;
> +
> +	/* epc->lock must already have been hold */

	/* epc->lock must already be held */

Wouldn't this be better as:

WARN_ON(!mutex_is_locked(&epc->lock));

?


> +	/* Calculate index of EPC page in virtual EPC's page_array */
> +	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
> +
> +	epc_page = xa_load(&epc->page_array, index);
> +	if (epc_page)
> +		return 0;
> +
> +	epc_page = sgx_alloc_epc_page(epc, false);
> +	if (IS_ERR(epc_page))
> +		return PTR_ERR(epc_page);
> +
> +	ret = xa_err(xa_store(&epc->page_array, index, epc_page, GFP_KERNEL));
> +	if (ret)
> +		goto err_free;
> +
> +	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
> +
> +	ret = vmf_insert_pfn(vma, addr, pfn);
> +	if (ret != VM_FAULT_NOPAGE) {
> +		ret = -EFAULT;
> +		goto err_delete;
> +	}
> +
> +	return 0;
> +
> +err_delete:
> +	xa_erase(&epc->page_array, index);
> +err_free:
> +	sgx_free_epc_page(epc_page);
> +	return ret;
> +}
> +
> +static vm_fault_t sgx_virt_epc_fault(struct vm_fault *vmf)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct sgx_virt_epc *epc = vma->vm_private_data;
> +	int ret;
> +
> +	mutex_lock(&epc->lock);
> +	ret = __sgx_virt_epc_fault(epc, vma, vmf->address);
> +	mutex_unlock(&epc->lock);
> +
> +	if (!ret)
> +		return VM_FAULT_NOPAGE;
> +
> +	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
> +		mmap_read_unlock(vma->vm_mm);
> +		return VM_FAULT_RETRY;
> +	}
> +
> +	return VM_FAULT_SIGBUS;
> +}
> +
> +const struct vm_operations_struct sgx_virt_epc_vm_ops = {
> +	.fault = sgx_virt_epc_fault,
> +};
> +
> +static int sgx_virt_epc_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct sgx_virt_epc *epc = file->private_data;
> +
> +	if (!(vma->vm_flags & VM_SHARED))
> +		return -EINVAL;
> +
> +	/*
> +	 * Don't allow mmap() from child after fork(), since child and parent
> +	 * cannot map to the same EPC.
> +	 */
> +	if (vma->vm_mm != epc->mm)
> +		return -EINVAL;

I mentioned this below, but I'm not buying this logic.  I know it would
be *bad*, but I don't see why the kernel needs to keep it from happening.

> +	vma->vm_ops = &sgx_virt_epc_vm_ops;
> +	/* Don't copy VMA in fork() */
> +	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
> +	vma->vm_private_data = file->private_data;
> +
> +	return 0;
> +}
> +
> +static int sgx_virt_epc_free_page(struct sgx_epc_page *epc_page)
> +{
> +	int ret;
> +
> +	if (!epc_page)
> +		return 0;

I always worry about these.  Why is passing NULL around OK?

> +	/*
> +	 * Explicitly EREMOVE virtual EPC page. Virtual EPC is only used by
> +	 * guest, and in normal condition guest should have done EREMOVE for
> +	 * all EPC pages before they are freed here. But it's possible guest
> +	 * is killed or crashed unnormally in which case EREMOVE has not been
	
				"abnormally"

I don't think "unnormally" is a word.  Also, this isn't just about
crashing or being killed.  The guest could simply have a bug.

> +	 * done. Do EREMOVE unconditionally here to cover both cases, because
> +	 * it's not possible to tell whether guest has done EREMOVE, since
> +	 * virtual EPC page status is not tracked. And it is fine to EREMOVE
> +	 * EPC page multiple times.
> +	 */

Surprise!  I dislike this comment.

	/*
	 * Take a previously guest-owned EPC page and return it to the
	 * general EPC page pool.
	 *
	 * Guests can not be trusted to have left this page in a good
	 * state, so run EREMOVE on the page unconditionally.  In the
	 * case that a guest properly EREMOVE'd this page, a
	 * superfluous EREMOVE is harmless.
	 */

> +	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
> +	if (ret) {
> +		/*
> +		 * Only SGX_CHILD_PRESENT is expected, which is because of
> +		 * EREMOVE-ing an SECS still with child, in which case it can
> +		 * be handled by EREMOVE-ing the SECS again after all pages in
> +		 * virtual EPC have been EREMOVE-ed. See comments in below in
> +		 * sgx_virt_epc_release().
> +		 */
> +		WARN_ON_ONCE(ret != SGX_CHILD_PRESENT);
> +		return ret;
> +	}

I find myself wondering what errors could cause the WARN_ON_ONCE() to be
hit.  The SDM indicates that it's only:

	SGX_ENCLAVE_ACT If there are still logical processors executing
			inside the enclave.

Should that be mentioned in the comment?

> +
> +	__sgx_free_epc_page(epc_page);
> +	return 0;
> +}
> +
> +static int sgx_virt_epc_release(struct inode *inode, struct file *file)
> +{
> +	struct sgx_virt_epc *epc = file->private_data;

FWIW, I hate the "struct sgx_virt_epc *epc" name.  "epc" here is really
an instance of a virtual EPC.

> +	struct sgx_epc_page *epc_page, *tmp, *entry;
> +	unsigned long index;
> +
> +	LIST_HEAD(secs_pages);
> +
> +	mmdrop(epc->mm);
> +
> +	xa_for_each(&epc->page_array, index, entry) {
> +		/*
> +		 * Virtual EPC pages are not tracked, so it's possible for
> +		 * EREMOVE to fail due to, e.g. a SECS page still has children
> +		 * if guest was shutdown unexpectedly. If it is the case, leave
> +		 * it in the xarray and retry EREMOVE below later.
> +		 */

I don't know what it is about the comments, but I cringe every time I
see an "i.e." or "e.g.".

I'd rewrite the comment as:

	/*
	 * Remove all normal, child pages.  sgx_virt_epc_free_page()
	 * will fail if EREMOVE fails, but this is OK and expected on
	 * SECS pages.  Those can only be EREMOVE'd *after* all their
	 * child pages. Retries below will clean them up.
	 */

> +		if (sgx_virt_epc_free_page(entry))
> +			continue;
> +
> +		xa_erase(&epc->page_array, index);
> +	}
> +
> +	/*
> +	 * Retry all failed pages after iterating through the entire tree, at
> +	 * which point all children should be removed and the SECS pages can be
> +	 * nuked as well...unless userspace has exposed multiple instance of
> +	 * virtual EPC to a single VM.
> +	 */

I'm just a comment grouch today I guess.  That's a horrible run-on
sentence.  Let's just state the goal of the loop in the comment above it:

	Retry EREMOVE'ing pages.  This will clean up any SECS pages that
	only had children in this 'epc' area.

> +	xa_for_each(&epc->page_array, index, entry) {
> +		epc_page = entry;

Then, talk about the error condition here:

> +		/*
> +		 * Error here means that EREMOVE failed due to a SECS page
> +		 * still has child on *another* EPC instance.  Put it to a
> +		 * temporary SECS list which will be spliced to 'zombie page
> +		 * list' and will be EREMOVE-ed again when freeing another
> +		 * virtual EPC instance.
> +		 */

Surprise, I've got another rewrite:

		/*
		 * An EREMOVE failure here means that the SECS page
		 * still has children.  But, since all children in this
		 * 'sgx_virt_epc' have been removed, the SECS page must
		 * have a child on another instance.
		 */

> +		if (sgx_virt_epc_free_page(epc_page))
> +			list_add_tail(&epc_page->list, &secs_pages);

Why move these over to &secs_pages here?  I think it's to avoid another
xa_for_each() below, but it's not clear.

> +		xa_erase(&epc->page_array, index);
> +	}
> +
> +	/*
> +	 * Third time's a charm.

This is confusing.  This section is *NOT* retrying a third time.  This
is a cute comment, but this loop is logically different from the two
tries above.  I say remove it.  In fact, I'd even concentrate the
comment here on explaining that this step is *TOTALLY* disconnected
from what happened above.

>		  Try to EREMOVE zombie SECS pages from virtual
> +	 * EPC instances that were previously released, i.e. free SECS pages
> +	 * that were in limbo due to having children in *this* EPC instance.
> +	 */

This is as close as this code gets to telling me what a zombie page is.
I don't think it gets close enough, or does it in the right spot.

I think it probably needs explicit discussion in the changelog.  I think
Sean explained this to me once, but I've forgotten by now.  The code
needs to be understandable without getting Sean on the phone anyway. :)

I'd probably just say:

	/*
	 * SECS pages are "pinned" by child pages, and unpinned once all
	 * children have been EREMOVE'd.  A child page in this instance
	 * may have pinned an SECS page encountered in an earlier
	 * release(), creating a zombie.  Since some children were
	 * EREMOVE'd above, try to EREMOVE all zombies in the hopes that
	 * one was unpinned.
	 */
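To make the zombie lifecycle concrete, here is a userspace model of the three passes in sgx_virt_epc_release(). It assumes EREMOVE on an SECS page fails only while the page still has live children (the SGX_CHILD_PRESENT case) and that plain child pages always EREMOVE cleanly. Every name here (model_eremove(), struct model_instance, the fixed-size arrays standing in for the xarray and the zombie list) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PAGES		16
#define MODEL_CHILD_PRESENT	1	/* hypothetical stand-in for SGX_CHILD_PRESENT */

struct model_page {
	bool in_use;		/* not yet EREMOVE'd */
	bool is_secs;
	int parent;		/* index of the parent SECS, or -1 */
	int nr_children;	/* live children (SECS pages only) */
};

/* One virtual EPC instance: the pages still in its page_array. */
struct model_instance {
	int idx[MAX_PAGES];
	int nr;
};

static struct model_page pages[MAX_PAGES];
static int nr_pages;
static int zombies[MAX_PAGES];	/* global zombie SECS list */
static int nr_zombies;

static int model_add_page(struct model_instance *inst, bool is_secs, int parent)
{
	int idx = nr_pages++;

	assert(idx < MAX_PAGES);
	pages[idx] = (struct model_page){ true, is_secs, parent, 0 };
	if (parent >= 0)
		pages[parent].nr_children++;
	inst->idx[inst->nr++] = idx;
	return idx;
}

/* EREMOVE model: an SECS page fails while it still has children. */
static int model_eremove(int idx)
{
	struct model_page *p = &pages[idx];

	if (p->is_secs && p->nr_children > 0)
		return MODEL_CHILD_PRESENT;
	if (p->parent >= 0)
		pages[p->parent].nr_children--;
	p->in_use = false;
	return 0;
}

static void model_release(struct model_instance *inst)
{
	int secs[MAX_PAGES], nr_secs = 0, nr_left = 0, i;

	/* Pass 1: EREMOVE everything; only pinned SECS pages survive. */
	for (i = 0; i < inst->nr; i++)
		if (model_eremove(inst->idx[i]))
			inst->idx[nr_left++] = inst->idx[i];
	inst->nr = nr_left;

	/* Pass 2: a failure now means a child lives in another instance. */
	for (i = 0; i < inst->nr; i++)
		if (model_eremove(inst->idx[i]))
			secs[nr_secs++] = inst->idx[i];
	inst->nr = 0;

	/* Pass 3: children removed above may have unpinned older zombies. */
	for (i = 0; i < nr_zombies; i++)
		if (model_eremove(zombies[i]))
			secs[nr_secs++] = zombies[i];

	/* Whatever is still pinned becomes (or stays) a zombie. */
	for (i = 0; i < nr_secs; i++)
		zombies[i] = secs[i];
	nr_zombies = nr_secs;
}
```

With an SECS page in instance A that has one child in A and one in B, releasing A leaves the SECS page on the zombie list, and releasing B later reaps it in the third pass.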

	
> +	mutex_lock(&virt_epc_lock);
> +	list_for_each_entry_safe(epc_page, tmp, &virt_epc_zombie_pages, list) {
> +		/*
> +		 * Speculatively remove the page from the list of zombies, if
> +		 * the page is successfully EREMOVE it will be added to the
> +		 * list of free pages.  If EREMOVE fails, throw the page on the
> +		 * local list, which will be spliced on at the end.
> +		 */
> +		list_del(&epc_page->list);
> +
> +		if (sgx_virt_epc_free_page(epc_page))
> +			list_add_tail(&epc_page->list, &secs_pages);

I don't get this.  Couldn't you do without the unconditional list_del()
and instead just do:

		if (!sgx_virt_epc_free_page(epc_page))
			list_del(&epc_page->list);

Or does the free() code clobber the list_head?  If that's the case,
maybe you should say that explicitly.
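A userspace sketch of why the order might matter. It assumes (this is not shown in the thread) that the free path behind sgx_virt_epc_free_page() reuses epc_page->list to queue the page on a free list. Under that assumption, freeing a page that is still linked on the zombie list relinks the same list_head, and the later list_del() then unlinks it from the free list instead of the zombie list. The list helpers are minimal re-implementations of the kernel's, and the model_* and reap_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del(struct list_head *entry)
{
	assert(entry->next && entry->prev);	/* catch a double list_del() */
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}

struct model_epc_page {
	struct list_head list;
};

/*
 * Model of sgx_virt_epc_free_page(), ASSUMING the underlying free reuses
 * page->list to queue the page on the free list.  eremove_ret is the
 * simulated EREMOVE result.
 */
static int model_free_page(struct model_epc_page *page,
			   struct list_head *free_list, int eremove_ret)
{
	if (eremove_ret)
		return eremove_ret;
	list_add_tail(&page->list, free_list);
	return 0;
}

/* Ordering used in the patch: unlink from the zombie list first. */
static void reap_unlink_first(struct model_epc_page *page,
			      struct list_head *free_list,
			      struct list_head *secs_pages, int eremove_ret)
{
	list_del(&page->list);
	if (model_free_page(page, free_list, eremove_ret))
		list_add_tail(&page->list, secs_pages);
}

/* Suggested ordering: unlink only after a successful free. */
static void reap_unlink_after(struct model_epc_page *page,
			      struct list_head *free_list, int eremove_ret)
{
	if (!model_free_page(page, free_list, eremove_ret))
		list_del(&page->list);
}
```

In the model, the unlink-first ordering leaves the page on the free list and the zombie list empty; the conditional ordering leaves the free list empty while the zombie list head still points at the page.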

> +	}
> +
> +	if (!list_empty(&secs_pages))
> +		list_splice_tail(&secs_pages, &virt_epc_zombie_pages);
> +	mutex_unlock(&virt_epc_lock);
> +
> +	kfree(epc);
> +
> +	return 0;
> +}
> +
> +static int sgx_virt_epc_open(struct inode *inode, struct file *file)
> +{
> +	struct sgx_virt_epc *epc;
> +
> +	epc = kzalloc(sizeof(struct sgx_virt_epc), GFP_KERNEL);
> +	if (!epc)
> +		return -ENOMEM;
> +	/*
> +	 * Keep the current->mm to virtual EPC. It will be checked in
> +	 * sgx_virt_epc_mmap() to prevent, in case of fork, child being
> +	 * able to mmap() to the same virtual EPC pages.
> +	 */
> +	mmgrab(current->mm);
> +	epc->mm = current->mm;
> +	mutex_init(&epc->lock);
> +	xa_init(&epc->page_array);
> +
> +	file->private_data = epc;
> +
> +	return 0;
> +}

I understand why this made sense for regular enclaves, but I'm having a
harder time here.  If you mmap(fd, MAP_SHARED), fork(), and then pass
that mapping through to two different guests, you get to hold the
pieces, just like if you did the same with normal memory.

Why does the kernel need to enforce this policy?
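For reference, the policy being questioned comes down to the two checks below, modeled in userspace with a hypothetical struct model_mm standing in for struct mm_struct. VM_DONTCOPY already keeps an existing mapping from being inherited across fork(); the mm comparison additionally rejects a new mmap() of the inherited fd from the child.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct: one per process. */
struct model_mm {
	int pid;
};

struct model_virt_epc {
	const struct model_mm *mm;	/* mm of the process that called open() */
};

/* Mirrors sgx_virt_epc_open(), minus the mmgrab() reference. */
static void model_open(struct model_virt_epc *epc,
		       const struct model_mm *current_mm)
{
	assert(current_mm != NULL);
	epc->mm = current_mm;
}

/* Mirrors the two checks at the top of sgx_virt_epc_mmap(). */
static int model_mmap(const struct model_virt_epc *epc,
		      const struct model_mm *vma_mm, bool vm_shared)
{
	if (!vm_shared)
		return -EINVAL;
	if (vma_mm != epc->mm)	/* e.g. a child mapping the fd after fork() */
		return -EINVAL;
	return 0;
}
```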


* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06  1:55 ` [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features Kai Huang
@ 2021-01-06 19:39   ` Dave Hansen
  2021-01-06 22:12     ` Kai Huang
  2021-01-06 22:15   ` Borislav Petkov
  2021-01-11 23:39   ` Jarkko Sakkinen
  2 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 19:39 UTC (permalink / raw)
  To: Kai Huang, linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, haitao.huang, pbonzini, bp, tglx, mingo, hpa

On 1/5/21 5:55 PM, Kai Huang wrote:
> --- a/arch/x86/kernel/cpu/feat_ctl.c
> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
>  {
>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
>  }

Logically, I think you want this *after* the "Allow SGX virtualization
without Launch Control support" patch.  As it stands, this will totally
disable SGX (including virtualization) if launch control is unavailable.


* Re: [RFC PATCH 05/23] x86/cpu/intel: Allow SGX virtualization without Launch Control support
  2021-01-06  1:55 ` [RFC PATCH 05/23] x86/cpu/intel: Allow SGX virtualization without Launch Control support Kai Huang
@ 2021-01-06 19:54   ` Dave Hansen
  2021-01-06 22:34     ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 19:54 UTC (permalink / raw)
  To: Kai Huang, linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, haitao.huang, pbonzini, bp, tglx, mingo,
	hpa, jethro, b.thiel

On 1/5/21 5:55 PM, Kai Huang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Allow SGX virtualization on systems without Launch Control support, i.e.
> allow KVM to expose SGX to guests that support non-LC configurations.

Context, please.

The kernel will currently disable all SGX support if the hardware does
not support launch control.  Make it more permissive to allow SGX
virtualization on systems without Launch Control support.  This will
allow KVM to expose SGX to guests that have less-strict requirements on
the availability of flexible launch control.

> Introduce clear_sgx_lc() to clear SGX_LC feature bit only if SGX Launch
> Control is locked by BIOS when SGX virtualization is enabled, to prevent
> SGX driver being enabled.

This is another run-on, and it makes it really hard to figure out what
it is trying to say.

> Improve error message to distinguish three cases: 1) SGX disabled
> completely by BIOS; 2) SGX disabled completely due to SGX LC is locked
> by BIOS, and SGX virtualization is also disabled; 3) Only SGX driver is
> disabled due to SGX LC is locked by BIOS, but SGX virtualization is
> enabled.

Editing for grammar and clarity again...

Improve error message to distinguish between three cases.  There are two
cases where SGX support is completely disabled:
1) SGX has been disabled completely by the BIOS
2) SGX LC is locked by the BIOS.  Bare-metal support is disabled because
   of LC unavailability.  SGX virtualization is unavailable (because of
   Kconfig).
One where it is partially available:
3) SGX LC is locked by the BIOS.  Bare-metal support is disabled because
   of LC unavailability.  SGX virtualization is supported.
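The three cases map onto a small decision table. Below is a sketch with hypothetical names that reports exactly one case per boot. Unlike the update_sgx: code in the patch, which evaluates the BIOS check and the Launch Control check independently, it returns as soon as case 1 is detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sgx_boot_case {
	SGX_CASE_UNSUPPORTED,	/* neither enable_sgx_* was set */
	SGX_CASE_BIOS_DISABLED,	/* 1) SGX disabled by the BIOS */
	SGX_CASE_ALL_DISABLED,	/* 2) LC locked, no SGX virtualization */
	SGX_CASE_VIRT_ONLY,	/* 3) LC locked, virtualization only */
	SGX_CASE_FULL,		/* LC unlocked, bare-metal driver usable */
};

/*
 * sgx_enabled/lc_enabled model the FEAT_CTL_SGX_ENABLED and
 * FEAT_CTL_SGX_LC_ENABLED bits; enable_driver/enable_virt model
 * enable_sgx_driver and enable_sgx_virt from the patch.
 */
static enum sgx_boot_case classify_sgx(bool sgx_enabled, bool lc_enabled,
				       bool enable_driver, bool enable_virt)
{
	if (!enable_driver && !enable_virt)
		return SGX_CASE_UNSUPPORTED;
	if (!sgx_enabled)
		return SGX_CASE_BIOS_DISABLED;	/* report case 1 alone */
	if (lc_enabled && enable_driver)
		return SGX_CASE_FULL;
	return enable_virt ? SGX_CASE_VIRT_ONLY : SGX_CASE_ALL_DISABLED;
}

/*
 * Message text taken from the patch's pr_err_once() calls.  The patch
 * only prints the case 3 message when the CPU supports Launch Control
 * (enable_sgx_driver), which this sketch does not track.
 */
static const char *sgx_case_msg(enum sgx_boot_case c)
{
	switch (c) {
	case SGX_CASE_BIOS_DISABLED:
		return "SGX disabled by BIOS.";
	case SGX_CASE_ALL_DISABLED:
		return "SGX Launch Control is locked. Disable SGX.";
	case SGX_CASE_VIRT_ONLY:
		return "SGX Launch Control is locked. Disable SGX driver.";
	default:
		return NULL;
	}
}
```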

> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Co-developed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kernel/cpu/feat_ctl.c | 48 +++++++++++++++++++++++++---------
>  1 file changed, 36 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
> index 4fcd57fdc682..b07452b68538 100644
> --- a/arch/x86/kernel/cpu/feat_ctl.c
> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> @@ -101,6 +101,11 @@ static void clear_sgx_caps(void)
>  	setup_clear_cpu_cap(X86_FEATURE_SGX2);
>  }
>  
> +static void clear_sgx_lc(void)
> +{
> +	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> +}
> +
>  static int __init nosgx(char *str)
>  {
>  	clear_sgx_caps();
> @@ -113,7 +118,7 @@ early_param("nosgx", nosgx);
>  void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
>  {
>  	bool tboot = tboot_enabled();
> -	bool enable_sgx;
> +	bool enable_sgx_virt, enable_sgx_driver;
>  	u64 msr;
>  
>  	if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) {
> @@ -123,12 +128,19 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
>  	}
>  
>  	/*
> -	 * Enable SGX if and only if the kernel supports SGX and Launch Control
> -	 * is supported, i.e. disable SGX if the LE hash MSRs can't be written.
> +	 * Enable SGX if and only if the kernel supports SGX.  Require Launch
> +	 * Control support if SGX virtualization is *not* supported, i.e.
> +	 * disable SGX if the LE hash MSRs can't be written and SGX can't be
> +	 * exposed to a KVM guest (which might support non-LC configurations).
>  	 */
> -	enable_sgx = cpu_has(c, X86_FEATURE_SGX) &&
> -		     cpu_has(c, X86_FEATURE_SGX_LC) &&
> -		     IS_ENABLED(CONFIG_X86_SGX);
> +	enable_sgx_driver = cpu_has(c, X86_FEATURE_SGX) &&
> +			    cpu_has(c, X86_FEATURE_SGX1) &&
> +			    IS_ENABLED(CONFIG_X86_SGX) &&
> +			    cpu_has(c, X86_FEATURE_SGX_LC);
> +	enable_sgx_virt = cpu_has(c, X86_FEATURE_SGX) &&
> +			  cpu_has(c, X86_FEATURE_SGX1) &&
> +			  IS_ENABLED(CONFIG_X86_SGX) &&
> +			  IS_ENABLED(CONFIG_X86_SGX_VIRTUALIZATION);

Don't we also need some runtime checks here?  What if we boot on
hardware that doesn't support KVM?
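One possible shape for such a runtime check, as a sketch: additionally gate enable_sgx_virt on VMX, on the assumption that without VMX there is no KVM guest to expose SGX to. The vmx term is hypothetical and not part of the patch, and whether VMX is actually enabled in FEAT_CTL is not modeled. The bit values are assumed to match FEAT_CTL_SGX_LC_ENABLED and FEAT_CTL_SGX_ENABLED in asm/msr-index.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match FEAT_CTL_SGX_LC_ENABLED and FEAT_CTL_SGX_ENABLED. */
#define MODEL_SGX_LC_ENABLED	(1ULL << 17)
#define MODEL_SGX_ENABLED	(1ULL << 18)

/* Hypothetical snapshot of CPUID bits and Kconfig options. */
struct model_caps {
	bool sgx, sgx1, sgx_lc;		/* X86_FEATURE_SGX, _SGX1, _SGX_LC */
	bool vmx;			/* hypothetical extra runtime check */
	bool cfg_sgx, cfg_sgx_virt;	/* CONFIG_X86_SGX, _SGX_VIRTUALIZATION */
};

/* Same expression as enable_sgx_driver in the patch. */
static bool model_enable_driver(const struct model_caps *c)
{
	return c->sgx && c->sgx1 && c->cfg_sgx && c->sgx_lc;
}

/* enable_sgx_virt from the patch, plus the hypothetical VMX requirement. */
static bool model_enable_virt(const struct model_caps *c)
{
	return c->sgx && c->sgx1 && c->cfg_sgx && c->cfg_sgx_virt && c->vmx;
}

/* SGX bits init_ia32_feat_ctl() would set in an unlocked FEAT_CTL MSR. */
static uint64_t model_sgx_msr_bits(const struct model_caps *c)
{
	bool drv = model_enable_driver(c);
	bool virt = model_enable_virt(c);
	uint64_t msr = 0;

	if (drv || virt) {
		msr |= MODEL_SGX_ENABLED;
		if (drv)
			msr |= MODEL_SGX_LC_ENABLED;
	}
	return msr;
}
```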

>  	if (msr & FEAT_CTL_LOCKED)
>  		goto update_caps;
> @@ -151,8 +163,11 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
>  			msr |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX;
>  	}
>  
> -	if (enable_sgx)
> -		msr |= FEAT_CTL_SGX_ENABLED | FEAT_CTL_SGX_LC_ENABLED;
> +	if (enable_sgx_driver || enable_sgx_virt) {
> +		msr |= FEAT_CTL_SGX_ENABLED;
> +		if (enable_sgx_driver)
> +			msr |= FEAT_CTL_SGX_LC_ENABLED;
> +	}
>  
>  	wrmsrl(MSR_IA32_FEAT_CTL, msr);
>  
> @@ -175,10 +190,19 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
>  	}
>  
>  update_sgx:
> -	if (!(msr & FEAT_CTL_SGX_ENABLED) ||
> -	    !(msr & FEAT_CTL_SGX_LC_ENABLED) || !enable_sgx) {
> -		if (enable_sgx)
> -			pr_err_once("SGX disabled by BIOS\n");
> +	if (!(msr & FEAT_CTL_SGX_ENABLED)) {
> +		if (enable_sgx_driver || enable_sgx_virt)
> +			pr_err_once("SGX disabled by BIOS.\n");
>  		clear_sgx_caps();
>  	}
> +	if (!(msr & FEAT_CTL_SGX_LC_ENABLED) &&
> +	    (enable_sgx_driver || enable_sgx_virt)) {
> +		if (!enable_sgx_virt) {
> +			pr_err_once("SGX Launch Control is locked. Disable SGX.\n");
> +			clear_sgx_caps();
> +		} else if (enable_sgx_driver) {
> +			pr_err_once("SGX Launch Control is locked. Disable SGX driver.\n");

Should we have an explicit message for enabling virtualization?  I'm not
sure how many people will understand that "SGX driver" actually doesn't
mean /dev/sgx_virt_epc.

> +			clear_sgx_lc();
> +		}
> +	}
>  }
> 



* Re: [RFC PATCH 10/23] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs
  2021-01-06  1:56 ` [RFC PATCH 10/23] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs Kai Huang
@ 2021-01-06 19:56   ` Dave Hansen
  0 siblings, 0 replies; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 19:56 UTC (permalink / raw)
  To: Kai Huang, linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, haitao.huang, pbonzini, bp, tglx, mingo, hpa

On 1/5/21 5:56 PM, Kai Huang wrote:
> Add a helper to update SGX_LEPUBKEYHASHn MSRs. SGX virtualization also
> needs to update those MSRs based on guest's "virtual" SGX_LEPUBKEYHASHn
> before EINIT from guest.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>

This one is fine from a core x86 perspective:

Acked-by: Dave Hansen <dave.hansen@intel.com>


* Re: [RFC PATCH 11/23] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-01-06  1:56 ` [RFC PATCH 11/23] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Kai Huang
@ 2021-01-06 20:12   ` Dave Hansen
  2021-01-06 21:04     ` Sean Christopherson
  0 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 20:12 UTC (permalink / raw)
  To: Kai Huang, linux-sgx, kvm, x86
  Cc: seanjc, jarkko, luto, haitao.huang, pbonzini, bp, tglx, mingo, hpa

On 1/5/21 5:56 PM, Kai Huang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Provide wrappers around __ecreate() and __einit() to hide the ugliness
> of overloading the ENCLS return value to encode multiple error formats
> in a single int.  KVM will trap-and-execute ECREATE and EINIT as part
> of SGX virtualization, and on an exception, KVM needs the trapnr so that
> it can inject the correct fault into the guest.

This is missing a bit of a step about how and why ECREATE needs to be
run in the host in the first place.

> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> new file mode 100644
> index 000000000000..0d643b985085
> --- /dev/null
> +++ b/arch/x86/include/asm/sgx.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_SGX_H
> +#define _ASM_X86_SGX_H
> +
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_X86_SGX_VIRTUALIZATION
> +struct sgx_pageinfo;
> +
> +int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
> +		     int *trapnr);
> +int sgx_virt_einit(void __user *sigstruct, void __user *token,
> +		   void __user *secs, u64 *lepubkeyhash, int *trapnr);
> +#endif
> +
> +#endif /* _ASM_X86_SGX_H */
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index d625551ccf25..4e9810ba9259 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -261,3 +261,58 @@ int __init sgx_virt_epc_init(void)
>  
>  	return misc_register(&sgx_virt_epc_dev);
>  }
> +
> +int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
> +		     int *trapnr)
> +{
> +	int ret;
> +
> +	__uaccess_begin();
> +	ret = __ecreate(pageinfo, (void *)secs);
> +	__uaccess_end();

The __uaccess_begin/end() worries me.  There are *very* few of these in
the kernel and it seems like something we want to use as sparingly as
possible.

Why don't we just use the kernel mapping for 'secs' and not have to deal
with stac/clac?

I'm also just generally worried about casting away an __user without
doing any checking.  How is that OK?

> +	if (encls_faulted(ret)) {
> +		*trapnr = ENCLS_TRAPNR(ret);
> +		return -EFAULT;
> +	}
> +
> +	/* ECREATE doesn't return an error code, it faults or succeeds. */
> +	WARN_ON_ONCE(ret);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(sgx_virt_ecreate);
> +
> +static int __sgx_virt_einit(void __user *sigstruct, void __user *token,
> +			    void __user *secs)
> +{
> +	int ret;
> +
> +	__uaccess_begin();
> +	ret =  __einit((void *)sigstruct, (void *)token, (void *)secs);
> +	__uaccess_end();
> +	return ret;
> +}

If casting away one __user wasn't good enough, we get three! :)

We need some more background in the changelog on why this is OK.

> +int sgx_virt_einit(void __user *sigstruct, void __user *token,
> +		   void __user *secs, u64 *lepubkeyhash, int *trapnr)
> +{
> +	int ret;
> +
> +	if (!boot_cpu_has(X86_FEATURE_SGX_LC)) {
> +		ret = __sgx_virt_einit(sigstruct, token, secs);
> +	} else {
> +		preempt_disable();
> +
> +		sgx_update_lepubkeyhash(lepubkeyhash);
> +
> +		ret = __sgx_virt_einit(sigstruct, token, secs);
> +		preempt_enable();
> +	}
> +
> +	if (encls_faulted(ret)) {
> +		*trapnr = ENCLS_TRAPNR(ret);
> +		return -EFAULT;
> +	}
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(sgx_virt_einit);
> 



* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-06 19:35   ` Dave Hansen
@ 2021-01-06 20:35     ` Sean Christopherson
  2021-01-07  0:47       ` Kai Huang
  2021-01-07  1:42     ` Kai Huang
  1 sibling, 1 reply; 111+ messages in thread
From: Sean Christopherson @ 2021-01-06 20:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kai Huang, linux-sgx, kvm, x86, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, Jan 06, 2021, Dave Hansen wrote:
> On 1/5/21 5:55 PM, Kai Huang wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 95aad183bb65..02993a327a1f 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -9,9 +9,11 @@
> >  #include <linux/sched/mm.h>
> >  #include <linux/sched/signal.h>
> >  #include <linux/slab.h>
> > +#include "arch.h"
> >  #include "driver.h"
> >  #include "encl.h"
> >  #include "encls.h"
> > +#include "virt.h"
> >  
> >  struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
> >  static int sgx_nr_epc_sections;
> > @@ -726,7 +728,8 @@ static void __init sgx_init(void)
> >  	if (!sgx_page_reclaimer_init())
> >  		goto err_page_cache;
> >  
> > -	ret = sgx_drv_init();
> > +	/* Success if the native *or* virtual EPC driver initialized cleanly. */
> > +	ret = !!sgx_drv_init() & !!sgx_virt_epc_init();
> >  	if (ret)
> >  		goto err_kthread;
> 
> FWIW, I hate that conditional.  But, I tried to write it to be something
> more sane and failed.

Heh, you're welcome :-D

> > diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> > new file mode 100644
> > index 000000000000..d625551ccf25
> > --- /dev/null
> > +++ b/arch/x86/kernel/cpu/sgx/virt.c
> > @@ -0,0 +1,263 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*  Copyright(c) 2016-20 Intel Corporation. */
> > +
> > +#include <linux/miscdevice.h>
> > +#include <linux/mm.h>
> > +#include <linux/mman.h>
> > +#include <linux/sched/mm.h>
> > +#include <linux/sched/signal.h>
> > +#include <linux/slab.h>
> > +#include <linux/xarray.h>
> > +#include <asm/sgx.h>
> > +#include <uapi/asm/sgx.h>
> > +
> > +#include "encls.h"
> > +#include "sgx.h"
> > +#include "virt.h"
> > +
> > +struct sgx_virt_epc {
> > +	struct xarray page_array;
> > +	struct mutex lock;
> > +	struct mm_struct *mm;
> > +};
> > +
> > +static struct mutex virt_epc_lock;
> > +static struct list_head virt_epc_zombie_pages;
> 
> What does the lock protect?

Effectively, the list of zombie SECS pages.  Not sure why I used a generic name.

> What are zombie pages?

My own terminology for SECS pages whose virtual EPC has been destroyed but can't
be reclaimed due to them having child EPC pages in other virtual EPCs.

> BTW, if zombies are SECS-only, shouldn't that be in the name rather than
> "epc"?

I used the virt_epc prefix/namespace to tag it as a global list.  I've no
argument against something like zombie_secs_pages.

> > +static int __sgx_virt_epc_fault(struct sgx_virt_epc *epc,
> > +				struct vm_area_struct *vma, unsigned long addr)
> > +{
> > +	struct sgx_epc_page *epc_page;
> > +	unsigned long index, pfn;
> > +	int ret;
> > +
> > +	/* epc->lock must already have been hold */
> 
> 	/* epc->lock must already be held */
> 
> Wouldn't this be better as:
> 
> WARN_ON(!mutex_is_locked(&epc->lock));
> 
> ?

Or just proper lockdep?
 
> > +	/* Calculate index of EPC page in virtual EPC's page_array */
> > +	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
> > +
> > +	epc_page = xa_load(&epc->page_array, index);
> > +	if (epc_page)
> > +		return 0;
> > +
> > +	epc_page = sgx_alloc_epc_page(epc, false);
> > +	if (IS_ERR(epc_page))
> > +		return PTR_ERR(epc_page);
> > +
> > +	ret = xa_err(xa_store(&epc->page_array, index, epc_page, GFP_KERNEL));
> > +	if (ret)
> > +		goto err_free;
> > +
> > +	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
> > +
> > +	ret = vmf_insert_pfn(vma, addr, pfn);
> > +	if (ret != VM_FAULT_NOPAGE) {
> > +		ret = -EFAULT;
> > +		goto err_delete;
> > +	}
> > +
> > +	return 0;
> > +
> > +err_delete:
> > +	xa_erase(&epc->page_array, index);
> > +err_free:
> > +	sgx_free_epc_page(epc_page);
> > +	return ret;
> > +}
> > +
> > +static vm_fault_t sgx_virt_epc_fault(struct vm_fault *vmf)
> > +{
> > +	struct vm_area_struct *vma = vmf->vma;
> > +	struct sgx_virt_epc *epc = vma->vm_private_data;
> > +	int ret;
> > +
> > +	mutex_lock(&epc->lock);
> > +	ret = __sgx_virt_epc_fault(epc, vma, vmf->address);
> > +	mutex_unlock(&epc->lock);
> > +
> > +	if (!ret)
> > +		return VM_FAULT_NOPAGE;
> > +
> > +	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
> > +		mmap_read_unlock(vma->vm_mm);
> > +		return VM_FAULT_RETRY;
> > +	}
> > +
> > +	return VM_FAULT_SIGBUS;
> > +}
> > +
> > +const struct vm_operations_struct sgx_virt_epc_vm_ops = {
> > +	.fault = sgx_virt_epc_fault,
> > +};
> > +
> > +static int sgx_virt_epc_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +	struct sgx_virt_epc *epc = file->private_data;
> > +
> > +	if (!(vma->vm_flags & VM_SHARED))
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * Don't allow mmap() from child after fork(), since child and parent
> > +	 * cannot map to the same EPC.
> > +	 */
> > +	if (vma->vm_mm != epc->mm)
> > +		return -EINVAL;
> 
> I mentioned this below, but I'm not buying this logic.  I know it would
> be *bad*, but I don't see why the kernel needs to keep it from happening.

There's no known use case (KVM doesn't support sharing a VM across multiple
mm structs), and supporting multiple mm structs is a nightmare; see the driver
for the amount of pain incurred.

And IIRC, supporting VMM (KVM) EPC oversubscription, which may or may not ever
happen, was borderline impossible if virtual EPC supported multiple mm structs, as
the interaction between KVM and virtual EPC is a disaster in that case.

> > +	vma->vm_ops = &sgx_virt_epc_vm_ops;
> > +	/* Don't copy VMA in fork() */
> > +	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
> > +	vma->vm_private_data = file->private_data;
> > +
> > +	return 0;
> > +}
> > +
> > +static int sgx_virt_epc_free_page(struct sgx_epc_page *epc_page)
> > +{
> > +	int ret;
> > +
> > +	if (!epc_page)
> > +		return 0;
> 
> I always worry about these.  Why is passing NULL around OK?

I suspect I did it to mimic kfree() behavior.  I don't _think_ the radix (now
xarray) usage will ever encounter a NULL entry.

> 
> > +	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
> > +	if (ret) {
> > +		/*
> > +		 * Only SGX_CHILD_PRESENT is expected, which is because of
> > +		 * EREMOVE-ing an SECS still with child, in which case it can
> > +		 * be handled by EREMOVE-ing the SECS again after all pages in
> > +		 * virtual EPC have been EREMOVE-ed. See comments in below in
> > +		 * sgx_virt_epc_release().
> > +		 */
> > +		WARN_ON_ONCE(ret != SGX_CHILD_PRESENT);
> > +		return ret;
> > +	}
> 
> I find myself wondering what errors could cause the WARN_ON_ONCE() to be
> hit.  The SDM indicates that it's only:
> 
> 	SGX_ENCLAVE_ACT If there are still logical processors executing
> 			inside the enclave.
> 
> Should that be mentioned in the comment?

And faults, which are also spliced into the return value by the ENCLS macros.
I do remember hitting this WARN when I broke things, though I can't remember
whether it was a fault or the SGX_ENCLAVE_ACT scenario.  Probably the latter?
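Here is a userspace sketch of how such a return value can be classified. It assumes the encoding used by the kernel's ENCLS wrappers: a fault sets a flag bit (0x40000000 here) and carries the exception vector in the low bits. It also takes SGX_CHILD_PRESENT as 13. The enum and function names are made up for illustration. Any result other than success or SGX_CHILD_PRESENT, including a fault, would hit the WARN_ON_ONCE().

```c
#include <assert.h>

/* Assumed to mirror the kernel's ENCLS fault encoding. */
#define ENCLS_FAULT_FLAG	0x40000000
#define IS_ENCLS_FAULT(r)	((r) & ENCLS_FAULT_FLAG)
#define ENCLS_TRAPNR(r)		((r) & ~ENCLS_FAULT_FLAG)

/* SDM error code value, assumed here for illustration. */
#define SGX_CHILD_PRESENT	13

enum eremove_outcome {
	EREMOVE_OK,		/* page removed */
	EREMOVE_CHILD_PRESENT,	/* expected: SECS still has children */
	EREMOVE_OTHER_ERROR,	/* e.g. SGX_ENCLAVE_ACT: would trip the WARN */
	EREMOVE_FAULT,		/* exception during ENCLS: would trip the WARN */
};

static enum eremove_outcome classify_eremove(int ret, int *trapnr)
{
	if (!ret)
		return EREMOVE_OK;
	if (IS_ENCLS_FAULT(ret)) {
		*trapnr = ENCLS_TRAPNR(ret);	/* exception vector */
		return EREMOVE_FAULT;
	}
	if (ret == SGX_CHILD_PRESENT)
		return EREMOVE_CHILD_PRESENT;
	return EREMOVE_OTHER_ERROR;
}
```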

> > +
> > +	__sgx_free_epc_page(epc_page);
> > +	return 0;
> > +}
> > +

...

> > +	xa_for_each(&epc->page_array, index, entry) {
> > +		epc_page = entry;
> 
> Then, talk about the error condition here:
> 
> > +		/*
> > +		 * Error here means that EREMOVE failed due to a SECS page
> > +		 * still has child on *another* EPC instance.  Put it to a
> > +		 * temporary SECS list which will be spliced to 'zombie page
> > +		 * list' and will be EREMOVE-ed again when freeing another
> > +		 * virtual EPC instance.
> > +		 */
> 
> Surprise, I've got another rewrite:
> 
> 		/*
> 		 * An EREMOVE failure here means that the SECS page
> 		 * still has children.  But, since all children in this
> 		 * 'sgx_virt_epc' have been removed, the SECS page must
> 		 * have a child on another instance.
> 		 */
> 
> > +		if (sgx_virt_epc_free_page(epc_page))
> > +			list_add_tail(&epc_page->list, &secs_pages);
> 
> Why move these over to &secs_list here?  I think it's to avoid another
> xa_for_each() below, but it's not clear.

Yes?  IIRC, the sole motivation is to make the list_splice_tail() operation as
short as possible while holding the global virt_epc_lock.
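This sketch shows why the local list helps, assuming the per-instance EREMOVE loop runs before virt_epc_lock is taken, as the quoted code suggests. The slow loop needs no global lock. The work left for the critical section is splicing the local list onto the global one, which is O(1) however many SECS pages failed. The list helpers are a minimal reimplementation modeled on <linux/list.h>, not the kernel's own. list_count() is an extra helper for checking the result.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list modeled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* O(1): move all entries of @list to the tail of @head, leave @list empty. */
static void list_splice_tail_init(struct list_head *list, struct list_head *head)
{
	if (list_empty(list))
		return;

	struct list_head *first = list->next, *last = list->prev;

	first->prev = head->prev;
	head->prev->next = first;
	last->next = head;
	head->prev = last;
	list_init(list);
}

static size_t list_count(const struct list_head *h)
{
	size_t n = 0;

	for (const struct list_head *p = h->next; p != h; p = p->next)
		n++;
	return n;
}
```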
 
> > +		xa_erase(&epc->page_array, index);
> > +	}
> > +

...
 	
> > +	mutex_lock(&virt_epc_lock);
> > +	list_for_each_entry_safe(epc_page, tmp, &virt_epc_zombie_pages, list) {
> > +		/*
> > +		 * Speculatively remove the page from the list of zombies, if
> > +		 * the page is successfully EREMOVE it will be added to the
> > +		 * list of free pages.  If EREMOVE fails, throw the page on the
> > +		 * local list, which will be spliced on at the end.
> > +		 */
> > +		list_del(&epc_page->list);
> > +
> > +		if (sgx_virt_epc_free_page(epc_page))
> > +			list_add_tail(&epc_page->list, &secs_pages);
> 
> I don't get this.  Couldn't you do without the unconditional list_del()
> and instead just do:
> 
> 		if (!sgx_virt_epc_free_page(epc_page))
> 			list_del(&epc_page->list);
> 
> Or does the free() code clobber the list_head?  If that's the case,
> maybe you should say that explicitly.

More or less.  EPC pages need to be removed from their list before freeing; once
a page is freed, it is owned by the allocator.  Deleting after freeing leads to
list corruption if a different thread allocates the page and adds it to a
different list.
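This userspace model shows the ordering problem. As Sean describes, it assumes a freed page's list_head is reused by the allocator right away (here a hypothetical free_list). page_sim, try_free() and reap() are illustrative names. Because the page is unlinked first, a failed free can safely put it on a separate list. A successful free never leaves a stale link on the zombie list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular list modeled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;	/* make stale use crash loudly */
}

/* Hypothetical EPC page; 'list' is first so the cast in reap() is valid. */
struct page_sim {
	struct list_head list;
	bool has_child;		/* models EREMOVE failing with SGX_CHILD_PRESENT */
};

/* Hypothetical allocator free list: freeing reuses page->list at once. */
static struct list_head free_list;

static int try_free(struct page_sim *p)
{
	if (p->has_child)
		return -1;
	list_add_tail(&p->list, &free_list);	/* allocator now owns p->list */
	return 0;
}

/* Unlink first, then free; pages that can't be freed go on @still_busy. */
static void reap(struct list_head *zombies, struct list_head *still_busy)
{
	while (!list_empty(zombies)) {
		struct page_sim *p = (struct page_sim *)zombies->next;

		list_del(&p->list);
		if (try_free(p))
			list_add_tail(&p->list, still_busy);
	}
}
```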
 
> > +	}
> > +
> > +	if (!list_empty(&secs_pages))
> > +		list_splice_tail(&secs_pages, &virt_epc_zombie_pages);
> > +	mutex_unlock(&virt_epc_lock);
> > +
> > +	kfree(epc);
> > +
> > +	return 0;
> > +}

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 11/23] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-01-06 20:12   ` Dave Hansen
@ 2021-01-06 21:04     ` Sean Christopherson
  2021-01-06 21:23       ` Dave Hansen
  0 siblings, 1 reply; 111+ messages in thread
From: Sean Christopherson @ 2021-01-06 21:04 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kai Huang, linux-sgx, kvm, x86, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, Jan 06, 2021, Dave Hansen wrote:
> On 1/5/21 5:56 PM, Kai Huang wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > Provide wrappers around __ecreate() and __einit() to hide the ugliness
> > of overloading the ENCLS return value to encode multiple error formats
> > in a single int.  KVM will trap-and-execute ECREATE and EINIT as part
> > of SGX virtualization, and on an exception, KVM needs the trapnr so that
> > it can inject the correct fault into the guest.
> 
> This is missing a bit of a step about how and why ECREATE needs to be
> run in the host in the first place.

There's (hopefully) good info in the KVM usage patch that can be borrowed:

  Add an ECREATE handler that will be used to intercept ECREATE for the
  purpose of enforcing an enclave's MISCSELECT, ATTRIBUTES and XFRM, i.e.
  to allow userspace to restrict SGX features via CPUID.  ECREATE will be
  intercepted when any of the aforementioned masks diverges from hardware
  in order to enforce the desired CPUID model, i.e. inject #GP if the
  guest attempts to set a bit that hasn't been enumerated as allowed-1 in
  CPUID.
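Here is a minimal sketch of the policy check that paragraph describes. A request is acceptable only if every bit it sets is allowed-1 in the CPUID model that userspace configured. The struct layouts are hypothetical and leave out the rest of the SECS. Only the mask arithmetic matters here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the SECS fields the interceptor inspects. */
struct secs_request {
	uint32_t miscselect;
	uint64_t attributes;
	uint64_t xfrm;
};

/* Allowed-1 masks as enumerated to the guest via CPUID (set by userspace). */
struct sgx_cpuid_model {
	uint32_t miscselect_allowed;
	uint64_t attributes_allowed;
	uint64_t xfrm_allowed;
};

/* False means "inject #GP instead of executing ECREATE". */
static bool ecreate_request_allowed(const struct secs_request *req,
				    const struct sgx_cpuid_model *m)
{
	return !(req->miscselect & ~m->miscselect_allowed) &&
	       !(req->attributes & ~m->attributes_allowed) &&
	       !(req->xfrm & ~m->xfrm_allowed);
}
```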
 
> > diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> > new file mode 100644
> > index 000000000000..0d643b985085
> > --- /dev/null
> > +++ b/arch/x86/include/asm/sgx.h
> > @@ -0,0 +1,16 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_X86_SGX_H
> > +#define _ASM_X86_SGX_H
> > +
> > +#include <linux/types.h>
> > +
> > +#ifdef CONFIG_X86_SGX_VIRTUALIZATION
> > +struct sgx_pageinfo;
> > +
> > +int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
> > +		     int *trapnr);
> > +int sgx_virt_einit(void __user *sigstruct, void __user *token,
> > +		   void __user *secs, u64 *lepubkeyhash, int *trapnr);
> > +#endif
> > +
> > +#endif /* _ASM_X86_SGX_H */
> > diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> > index d625551ccf25..4e9810ba9259 100644
> > --- a/arch/x86/kernel/cpu/sgx/virt.c
> > +++ b/arch/x86/kernel/cpu/sgx/virt.c
> > @@ -261,3 +261,58 @@ int __init sgx_virt_epc_init(void)
> >  
> >  	return misc_register(&sgx_virt_epc_dev);
> >  }
> > +
> > +int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
> > +		     int *trapnr)
> > +{
> > +	int ret;
> > +
> > +	__uaccess_begin();
> > +	ret = __ecreate(pageinfo, (void *)secs);
> > +	__uaccess_end();
> 
> The __uaccess_begin/end() worries me.  There are *very* few of these in
> the kernel and it seems like something we want to use as sparingly as
> possible.
> 
> Why don't we just use the kernel mapping for 'secs' and not have to deal
> with stac/clac?

The kernel mapping isn't readily available.  At this point, it's not even
guaranteed that @secs points at an EPC page.  Unlike the driver code, where the
EPC page is allocated on-demand by the kernel, the pointer here is userspace
(technically guest) controlled.  The caller (KVM) is responsible for ensuring
it's a valid userspace address, but the SGX/EPC specific checks are mostly
deferred to hardware.

It's also possible to either retrieve the existing kernel mapping or to generate
a new mapping by resolving the PFN; this is/was simpler.

> I'm also just generally worried about casting away an __user without
> doing any checking.  How is that OK?

Short answer, KVM validates the virtual addresses.

KVM validates the host virtual addresses (HVA) when creating a memslot (maps
GPA->HVA).  The HVAs that are passed to these helpers are generated/retrieved
by KVM translating GVA->GPA->HVA; the GPA->HVA stage ensures the address is in a
valid memslot, and thus a valid user address.

That being said, these aren't exactly fast operations, so adding access_ok()
checks is probably a good idea.
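The check under discussion would be roughly access_ok(secs, PAGE_SIZE) on each pointer before the __uaccess_begin() section. Below is a userspace sketch of what such a range check computes, written so that addr + size cannot overflow. SIM_USER_LIMIT is an assumption chosen to resemble the 4-level paging limit. It is not the kernel's limit in every configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed user address limit, for illustration only. */
#define SIM_USER_LIMIT	UINT64_C(0x00007ffffffff000)

/* True if [addr, addr + size) lies entirely below @limit. */
static bool user_range_ok(uint64_t addr, uint64_t size, uint64_t limit)
{
	/* Compare against limit - size to avoid overflow in addr + size. */
	return size <= limit && addr <= limit - size;
}
```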

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 11/23] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-01-06 21:04     ` Sean Christopherson
@ 2021-01-06 21:23       ` Dave Hansen
  2021-01-06 22:58         ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 21:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, linux-sgx, kvm, x86, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On 1/6/21 1:04 PM, Sean Christopherson wrote:
> On Wed, Jan 06, 2021, Dave Hansen wrote:
>> On 1/5/21 5:56 PM, Kai Huang wrote:
>>> From: Sean Christopherson <sean.j.christopherson@intel.com>
>>>
>>> Provide wrappers around __ecreate() and __einit() to hide the ugliness
>>> of overloading the ENCLS return value to encode multiple error formats
>>> in a single int.  KVM will trap-and-execute ECREATE and EINIT as part
>>> of SGX virtualization, and on an exception, KVM needs the trapnr so that
>>> it can inject the correct fault into the guest.
>>
>> This is missing a bit of a step about how and why ECREATE needs to be
>> run in the host in the first place.
> 
> There's (hopefully) good info in the KVM usage patch that can be borrowed:
> 
>   Add an ECREATE handler that will be used to intercept ECREATE for the
>   purpose of enforcing an enclave's MISCSELECT, ATTRIBUTES and XFRM, i.e.
>   to allow userspace to restrict SGX features via CPUID.  ECREATE will be
>   intercepted when any of the aforementioned masks diverges from hardware
>   in order to enforce the desired CPUID model, i.e. inject #GP if the
>   guest attempts to set a bit that hasn't been enumerated as allowed-1 in
>   CPUID.

OK, so in plain language: the bare-metal kernel must intercept ECREATE
to be able to impose policies on guests.  When it does this, the
bare-metal kernel runs ECREATE against the userspace mapping of the
virtualized EPC.
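This sketch makes the changelog's point about trapnr concrete by showing the wrapper's shape. The raw ENCLS return value mixes success, SGX error codes and faults. The wrapper hands KVM an errno-style result, plus the exception vector to inject into the guest. fake_raw_ecreate() is a hypothetical stand-in for __ecreate(). The flag value is assumed to match the kernel's ENCLS fault encoding.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to mirror the kernel's ENCLS fault encoding. */
#define ENCLS_FAULT_FLAG	0x40000000
#define ENCLS_TRAPNR(r)		((r) & ~ENCLS_FAULT_FLAG)

/* Hypothetical __ecreate() stand-in: 0, or ENCLS_FAULT_FLAG | vector. */
static int fake_raw_ecreate(int inject_vector)
{
	return inject_vector < 0 ? 0 : (ENCLS_FAULT_FLAG | inject_vector);
}

/* Caller gets a plain return value; the fault vector comes via @trapnr. */
static int virt_ecreate_sketch(int inject_vector, int *trapnr)
{
	int ret = fake_raw_ecreate(inject_vector);

	if (ret & ENCLS_FAULT_FLAG) {
		*trapnr = ENCLS_TRAPNR(ret);
		return -EFAULT;
	}
	return ret;
}
```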

>>> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
>>> new file mode 100644
>>> index 000000000000..0d643b985085
>>> --- /dev/null
>>> +++ b/arch/x86/include/asm/sgx.h
>>> @@ -0,0 +1,16 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +#ifndef _ASM_X86_SGX_H
>>> +#define _ASM_X86_SGX_H
>>> +
>>> +#include <linux/types.h>
>>> +
>>> +#ifdef CONFIG_X86_SGX_VIRTUALIZATION
>>> +struct sgx_pageinfo;
>>> +
>>> +int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
>>> +		     int *trapnr);
>>> +int sgx_virt_einit(void __user *sigstruct, void __user *token,
>>> +		   void __user *secs, u64 *lepubkeyhash, int *trapnr);
>>> +#endif
>>> +
>>> +#endif /* _ASM_X86_SGX_H */
>>> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
>>> index d625551ccf25..4e9810ba9259 100644
>>> --- a/arch/x86/kernel/cpu/sgx/virt.c
>>> +++ b/arch/x86/kernel/cpu/sgx/virt.c
>>> @@ -261,3 +261,58 @@ int __init sgx_virt_epc_init(void)
>>>  
>>>  	return misc_register(&sgx_virt_epc_dev);
>>>  }
>>> +
>>> +int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
>>> +		     int *trapnr)
>>> +{
>>> +	int ret;
>>> +
>>> +	__uaccess_begin();
>>> +	ret = __ecreate(pageinfo, (void *)secs);
>>> +	__uaccess_end();
>>
>> The __uaccess_begin/end() worries me.  There are *very* few of these in
>> the kernel and it seems like something we want to use as sparingly as
>> possible.
>>
>> Why don't we just use the kernel mapping for 'secs' and not have to deal
>> with stac/clac?
> 
> The kernel mapping isn't readily available. 

Oh, duh.  There's no kernel mapping for EPC... it's not RAM in the first
place.

> At this point, it's not even
> guaranteed that @secs points at an EPC page.  Unlike the driver code, where the
> EPC page is allocated on-demand by the kernel, the pointer here is userspace
> (technically guest) controlled.  The caller (KVM) is responsible for ensuring
> it's a valid userspace address, but the SGX/EPC specific checks are mostly
> deferred to hardware.

Ahh, got it.  Kai, could we get some of this into comments or the changelog?


>> I'm also just generally worried about casting away an __user without
>> doing any checking.  How is that OK?
> 
> Short answer, KVM validates the virtual addresses.
> 
> KVM validates the host virtual addresses (HVA) when creating a memslot (maps
> GPA->HVA).  The HVAs that are passed to these helpers are generated/retrieved
> by KVM translating GVA->GPA->HVA; the GPA->HVA stage ensures the address is in a
> valid memslot, and thus a valid user address.

There is something a *bit* unpalatable about having KVM fill an
'unsigned long' only to cast it to a (void __user *), then to cast it
back to a (void *) to pass it to the SGX inlines.

I guess sparse would catch us in the window that it is __user if someone
tried to dereference it.

Adding access_ok()'s sounds like a good idea to me.  Or, at *least*
commenting why they're not necessary.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code
  2021-01-06 18:28   ` Dave Hansen
@ 2021-01-06 21:40     ` Kai Huang
  2021-01-12  0:26     ` Jarkko Sakkinen
  1 sibling, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06 21:40 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, 6 Jan 2021 10:28:55 -0800 Dave Hansen wrote:
> On 1/5/21 5:55 PM, Kai Huang wrote:
> > Add SGX_CHILD_PRESENT for use by SGX virtualization to assert EREMOVE
> > failures are expected, but only due to SGX_CHILD_PRESENT.
> 
> This dances around the fact that this is an architectural error-code.
> Could that be explicit?  Maybe the subject should be:
> 
> 	Add SGX_CHILD_PRESENT hardware error code

Sure. I'll do that.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06 19:39   ` Dave Hansen
@ 2021-01-06 22:12     ` Kai Huang
  2021-01-06 22:21       ` Dave Hansen
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-06 22:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
> On 1/5/21 5:55 PM, Kai Huang wrote:
> > --- a/arch/x86/kernel/cpu/feat_ctl.c
> > +++ b/arch/x86/kernel/cpu/feat_ctl.c
> > @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
> >  {
> >  	setup_clear_cpu_cap(X86_FEATURE_SGX);
> >  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> > +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> > +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
> >  }
> 
> Logically, I think you want this *after* the "Allow SGX virtualization
> without Launch Control support" patch.  As it stands, this will totally
> disable SGX (including virtualization) if launch control is unavailable.

To me it is better to be here, since clear_sgx_caps(), which disables SGX
totally, should logically clear all SGX feature bits, regardless of the later
patch's behavior. So when new SGX bits are introduced, clear_sgx_caps() should
clear them too. Otherwise the logic of this patch (adding new SGX feature bits)
is not complete IMHO.

And actually, the later patch "Allow SGX virtualization without Launch Control
support" adds a new clear_sgx_lc(), which is called when LC is not available
but SGX virtualization is enabled, to make sure only the SGX_LC bit is cleared
in that case. So I don't quite understand why clearing SGX1 and SGX2 in
clear_sgx_caps() needs to come after the later patch.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06  1:55 ` [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features Kai Huang
  2021-01-06 19:39   ` Dave Hansen
@ 2021-01-06 22:15   ` Borislav Petkov
  2021-01-06 23:09     ` Kai Huang
  2021-01-11 23:39   ` Jarkko Sakkinen
  2 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2021-01-06 22:15 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, dave.hansen,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Wed, Jan 06, 2021 at 02:55:21PM +1300, Kai Huang wrote:
> +/* Intel-defined SGX features, CPUID level 0x00000012:0 (EAX), word 19 */
> +#define X86_FEATURE_SGX1		(19*32+ 0) /* SGX1 leaf functions */
> +#define X86_FEATURE_SGX2		(19*32+ 1) /* SGX2 leaf functions */

Is anything else from that leaf going to be added later? Bit 5 is
"supports ENCLV instruction leaves", 6 is ENCLS insn leaves... are those
going to be used in the kernel too eventually?

The rest of them are reserved in the SDM, which probably means internal only
for now.
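A note on decoding these defines: X86_FEATURE_* values are word * 32 + bit. SGX1 and SGX2 are therefore bits 0 and 1 of the CPUID.(EAX=0x12, ECX=0):EAX value kept in word 19. The sketch below tests a given EAX value rather than executing CPUID. FEATURE() and sgx_leaf_has() are illustrative helpers, not kernel API. If the ENCLV and extra ENCLS bits Boris mentions were ever added, they would be FEATURE(19, 5) and FEATURE(19, 6).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature numbers are (word * 32 + bit), as in cpufeatures.h. */
#define FEATURE(word, bit)	((word) * 32 + (bit))
#define X86_FEATURE_SGX1	FEATURE(19, 0)
#define X86_FEATURE_SGX2	FEATURE(19, 1)

static unsigned int feature_word(unsigned int f) { return f / 32; }
static unsigned int feature_bit(unsigned int f)  { return f % 32; }

/* Test a feature bit in a caller-supplied CPUID.(0x12, 0):EAX value. */
static bool sgx_leaf_has(uint32_t eax, unsigned int feature)
{
	return eax & (UINT32_C(1) << feature_bit(feature));
}
```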

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06 22:12     ` Kai Huang
@ 2021-01-06 22:21       ` Dave Hansen
  2021-01-06 22:56         ` Kai Huang
  2021-01-06 23:40         ` Kai Huang
  0 siblings, 2 replies; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 22:21 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On 1/6/21 2:12 PM, Kai Huang wrote:
> On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
>> On 1/5/21 5:55 PM, Kai Huang wrote:
>>> --- a/arch/x86/kernel/cpu/feat_ctl.c
>>> +++ b/arch/x86/kernel/cpu/feat_ctl.c
>>> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
>>>  {
>>>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
>>>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
>>> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
>>> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
>>>  }
>> Logically, I think you want this *after* the "Allow SGX virtualization
>> without Launch Control support" patch.  As it stands, this will totally
>> disable SGX (including virtualization) if launch control is unavailable.
> To me it is better to be here, since clear_sgx_caps(), which disables SGX
> totally, should logically clear all SGX feature bits, no matter later patch's
> behavior. So when new SGX bits are introduced, clear_sgx_caps() should clear
> them too. Otherwise the logic of this patch (adding new SGX feature bits) is
> not complete IMHO.
> 
> And actually in later patch "Allow SGX virtualization without Launch Control
> support", a new clear_sgx_lc() is added, and is called when LC is not
> available but SGX virtualization is enabled, to make sure only SGX_LC bit is
> cleared in this case. I don't quite understand why we need to clear SGX1 and
> SGX2 in clear_sgx_caps() after the later patch.

I was talking about patch ordering.  It could be argued that this goes
after the content of patch 05/23.  Please _consider_ changing the ordering.

If that doesn't work for some reason, please at least call out in the
changelog that it leaves a temporarily funky situation.



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 05/23] x86/cpu/intel: Allow SGX virtualization without Launch Control support
  2021-01-06 19:54   ` Dave Hansen
@ 2021-01-06 22:34     ` Kai Huang
  2021-01-06 22:38       ` Dave Hansen
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-06 22:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel

On Wed, 6 Jan 2021 11:54:52 -0800 Dave Hansen wrote:
> On 1/5/21 5:55 PM, Kai Huang wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > Allow SGX virtualization on systems without Launch Control support, i.e.
> > allow KVM to expose SGX to guests that support non-LC configurations.
> 
> Context, please.
> 
> The kernel will currently disable all SGX support if the hardware does
> not support launch control.  Make it more permissive to allow SGX
> virtualization on systems without Launch Control support.  This will
> allow KVM to expose SGX to guests that have less-strict requirements on
> the availability of flexible launch control.

OK. I'll add this.

> 
> > Introduce clear_sgx_lc() to clear SGX_LC feature bit only if SGX Launch
> > Control is locked by BIOS when SGX virtualization is enabled, to prevent
> > SGX driver being enabled.
> 
> This is another run-on, and it makes it really hard to figure out what
> it is trying to say.

How about just removing this paragraph? It is a bit too detailed anyway. We
can add a comment in the code instead.

> 
> > Improve error message to distinguish three cases: 1) SGX disabled
> > completely by BIOS; 2) SGX disabled completely due to SGX LC is locked
> > by BIOS, and SGX virtualization is also disabled; 3) Only SGX driver is
> > disabled due to SGX LC is locked by BIOS, but SGX virtualization is
> > enabled.
> 
> Editing for grammar and clarity again...
> 
> Improve error message to distinguish between three cases.  There are two
> cases where SGX support is completely disabled:
> 1) SGX has been disabled completely by the BIOS
> 2) SGX LC is locked by the BIOS.  Bare-metal support is disabled because
>    of LC unavailability.  SGX virtualization is unavailable (because of
>    Kconfig).
> One where it is partially available:
> 3) SGX LC is locked by the BIOS.  Bare-metal support is disabled because
>    of LC unavailability.  SGX virtualization is supported.

OK. Thanks for help here.
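The three cases can be modeled as a pure function of the two FEAT_CTL bits and the two enable flags from this patch. This is a simplification: it ignores the messages and the clear_sgx_caps()/clear_sgx_lc() side effects. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum sgx_mode {
	SGX_NONE,		/* kernel doesn't want SGX at all */
	SGX_OFF_BY_BIOS,	/* case 1 */
	SGX_OFF_LC_LOCKED,	/* case 2: LC locked, no virtualization */
	SGX_VIRT_ONLY,		/* case 3: LC locked, virtualization works */
	SGX_FULL,		/* driver and (if built) virtualization */
};

static enum sgx_mode sgx_mode_after_feat_ctl(bool msr_sgx_enabled,
					     bool msr_lc_enabled,
					     bool want_driver, bool want_virt)
{
	if (!want_driver && !want_virt)
		return SGX_NONE;
	if (!msr_sgx_enabled)
		return SGX_OFF_BY_BIOS;
	if (!msr_lc_enabled)
		return want_virt ? SGX_VIRT_ONLY : SGX_OFF_LC_LOCKED;
	return want_driver ? SGX_FULL : SGX_VIRT_ONLY;
}
```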

> 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kernel/cpu/feat_ctl.c | 48 +++++++++++++++++++++++++---------
> >  1 file changed, 36 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
> > index 4fcd57fdc682..b07452b68538 100644
> > --- a/arch/x86/kernel/cpu/feat_ctl.c
> > +++ b/arch/x86/kernel/cpu/feat_ctl.c
> > @@ -101,6 +101,11 @@ static void clear_sgx_caps(void)
> >  	setup_clear_cpu_cap(X86_FEATURE_SGX2);
> >  }
> >  
> > +static void clear_sgx_lc(void)
> > +{
> > +	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> > +}
> > +
> >  static int __init nosgx(char *str)
> >  {
> >  	clear_sgx_caps();
> > @@ -113,7 +118,7 @@ early_param("nosgx", nosgx);
> >  void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
> >  {
> >  	bool tboot = tboot_enabled();
> > -	bool enable_sgx;
> > +	bool enable_sgx_virt, enable_sgx_driver;
> >  	u64 msr;
> >  
> >  	if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) {
> > @@ -123,12 +128,19 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
> >  	}
> >  
> >  	/*
> > -	 * Enable SGX if and only if the kernel supports SGX and Launch Control
> > -	 * is supported, i.e. disable SGX if the LE hash MSRs can't be written.
> > +	 * Enable SGX if and only if the kernel supports SGX.  Require Launch
> > +	 * Control support if SGX virtualization is *not* supported, i.e.
> > +	 * disable SGX if the LE hash MSRs can't be written and SGX can't be
> > +	 * exposed to a KVM guest (which might support non-LC configurations).
> >  	 */
> > -	enable_sgx = cpu_has(c, X86_FEATURE_SGX) &&
> > -		     cpu_has(c, X86_FEATURE_SGX_LC) &&
> > -		     IS_ENABLED(CONFIG_X86_SGX);
> > +	enable_sgx_driver = cpu_has(c, X86_FEATURE_SGX) &&
> > +			    cpu_has(c, X86_FEATURE_SGX1) &&
> > +			    IS_ENABLED(CONFIG_X86_SGX) &&
> > +			    cpu_has(c, X86_FEATURE_SGX_LC);
> > +	enable_sgx_virt = cpu_has(c, X86_FEATURE_SGX) &&
> > +			  cpu_has(c, X86_FEATURE_SGX1) &&
> > +			  IS_ENABLED(CONFIG_X86_SGX) &&
> > +			  IS_ENABLED(CONFIG_X86_SGX_VIRTUALIZATION);
> 
> Don't we also need some runtime checks here?  What if we boot on
> hardware that doesn't support KVM?

Yeah, I kinda agree here. KVM will be available if X86_FEATURE_VMX is
available. I am OK with adding an additional check right after the 'update_sgx'
label:

update_sgx:
	if (!cpu_has(c, X86_FEATURE_VMX))
		enable_sgx_virt = false;

The rest of the logic should just work. If necessary, we can also add a message
saying SGX virtualization is disabled because VMX is not available.

Sean, what is your opinion?

> 
> >  	if (msr & FEAT_CTL_LOCKED)
> >  		goto update_caps;
> > @@ -151,8 +163,11 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
> >  			msr |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX;
> >  	}
> >  
> > -	if (enable_sgx)
> > -		msr |= FEAT_CTL_SGX_ENABLED | FEAT_CTL_SGX_LC_ENABLED;
> > +	if (enable_sgx_driver || enable_sgx_virt) {
> > +		msr |= FEAT_CTL_SGX_ENABLED;
> > +		if (enable_sgx_driver)
> > +			msr |= FEAT_CTL_SGX_LC_ENABLED;
> > +	}
> >  
> >  	wrmsrl(MSR_IA32_FEAT_CTL, msr);
> >  
> > @@ -175,10 +190,19 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
> >  	}
> >  
> >  update_sgx:
> > -	if (!(msr & FEAT_CTL_SGX_ENABLED) ||
> > -	    !(msr & FEAT_CTL_SGX_LC_ENABLED) || !enable_sgx) {
> > -		if (enable_sgx)
> > -			pr_err_once("SGX disabled by BIOS\n");
> > +	if (!(msr & FEAT_CTL_SGX_ENABLED)) {
> > +		if (enable_sgx_driver || enable_sgx_virt)
> > +			pr_err_once("SGX disabled by BIOS.\n");
> >  		clear_sgx_caps();
> >  	}
> > +	if (!(msr & FEAT_CTL_SGX_LC_ENABLED) &&
> > +	    (enable_sgx_driver || enable_sgx_virt)) {
> > +		if (!enable_sgx_virt) {
> > +			pr_err_once("SGX Launch Control is locked. Disable SGX.\n");
> > +			clear_sgx_caps();
> > +		} else if (enable_sgx_driver) {
> > +			pr_err_once("SGX Launch Control is locked. Disable SGX driver.\n");
> 
> Should we have an explicit message for enabling virtualization?  I'm not
> sure how many people will understand that "SGX driver" actually doesn't
> mean /dev/sgx_epc_virt.

OK. I'll add an explicit message for that. Let me see how I can refine this.

Thanks for comments.

> 
> > +			clear_sgx_lc();
> > +		}
> > +	}
> >  }
> > 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 05/23] x86/cpu/intel: Allow SGX virtualization without Launch Control support
  2021-01-06 22:34     ` Kai Huang
@ 2021-01-06 22:38       ` Dave Hansen
  0 siblings, 0 replies; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 22:38 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel

On 1/6/21 2:34 PM, Kai Huang wrote:
>>> Introduce clear_sgx_lc() to clear SGX_LC feature bit only if SGX Launch
>>> Control is locked by BIOS when SGX virtualization is enabled, to prevent
>>> SGX driver being enabled.
>> This is another run-on, and it makes it really hard to figure out what
>> it is trying to say.
> How about just removing this paragraph? It is a bit too detailed anyway. We
> can add a comment in the code instead.

Fine with me.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06 22:21       ` Dave Hansen
@ 2021-01-06 22:56         ` Kai Huang
  2021-01-06 23:19           ` Sean Christopherson
  2021-01-06 23:40         ` Kai Huang
  1 sibling, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-06 22:56 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, 6 Jan 2021 14:21:39 -0800 Dave Hansen wrote:
> On 1/6/21 2:12 PM, Kai Huang wrote:
> > On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
> >> On 1/5/21 5:55 PM, Kai Huang wrote:
> >>> --- a/arch/x86/kernel/cpu/feat_ctl.c
> >>> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> >>> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
> >>>  {
> >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
> >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
> >>>  }
> >> Logically, I think you want this *after* the "Allow SGX virtualization
> >> without Launch Control support" patch.  As it stands, this will totally
> >> disable SGX (including virtualization) if launch control is unavailable.
> > To me it is better to be here, since clear_sgx_caps(), which disables SGX
> > totally, should logically clear all SGX feature bits, no matter later patch's
> > behavior. So when new SGX bits are introduced, clear_sgx_caps() should clear
> > them too. Otherwise the logic of this patch (adding new SGX feature bits) is
> > not complete IMHO.
> > 
> > And actually in later patch "Allow SGX virtualization without Launch Control
> > support", a new clear_sgx_lc() is added, and is called when LC is not
> > available but SGX virtualization is enabled, to make sure only SGX_LC bit is
> > cleared in this case. I don't quite understand why we need to clear SGX1 and
> > SGX2 in clear_sgx_caps() after the later patch.
> 
> I was talking about patch ordering.  It could be argued that this goes
> after the content of patch 05/23.  Please _consider_ changing the ordering.
> 
> If that doesn't work for some reason, please at least call out in the
> changelog that it leaves a temporarily funky situation.
> 

The later patch currently uses the SGX1 bit, which is why this patch needs to
come before the later patch.

Sean,

I think it is OK to remove the SGX1 bit check from the later patch, since I have
never seen a machine with the SGX bit in CPUID but without SGX1. If we remove
that check, we can put this patch after the later patch.

Any comments here? If you are OK with it, I'll remove the SGX1 bit check from
the later patch and reorder the patches.








^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 11/23] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-01-06 21:23       ` Dave Hansen
@ 2021-01-06 22:58         ` Kai Huang
  0 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06 22:58 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sean Christopherson, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, bp, tglx, mingo, hpa

On Wed, 6 Jan 2021 13:23:37 -0800 Dave Hansen wrote:
> On 1/6/21 1:04 PM, Sean Christopherson wrote:
> > On Wed, Jan 06, 2021, Dave Hansen wrote:
> >> On 1/5/21 5:56 PM, Kai Huang wrote:
> >>> From: Sean Christopherson <sean.j.christopherson@intel.com>
> >>>
> >>> Provide wrappers around __ecreate() and __einit() to hide the ugliness
> >>> of overloading the ENCLS return value to encode multiple error formats
> >>> in a single int.  KVM will trap-and-execute ECREATE and EINIT as part
> >>> of SGX virtualization, and on an exception, KVM needs the trapnr so that
> >>> it can inject the correct fault into the guest.
> >>
> >> This is missing a bit of a step about how and why ECREATE needs to be
> >> run in the host in the first place.
> > 
> > There's (hopefully) good info in the KVM usage patch that can be borrowed:
> > 
> >   Add an ECREATE handler that will be used to intercept ECREATE for the
> >   purpose of enforcing and enclave's MISCSELECT, ATTRIBUTES and XFRM, i.e.
> >   to allow userspace to restrict SGX features via CPUID.  ECREATE will be
> >   intercepted when any of the aforementioned masks diverges from hardware
> >   in order to enforce the desired CPUID model, i.e. inject #GP if the
> >   guest attempts to set a bit that hasn't been enumerated as allowed-1 in
> >   CPUID.
> 
> OK, so in plain language: the bare-metal kernel must intercept ECREATE
> to be able to impose policies on guests.  When it does this, the
> bare-metal kernel runs ECREATE against the userspace mapping of the
> virtualized EPC.

Thanks. I'll add this to the commit message.
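As a rough illustration of the policy described above, the check on an intercepted ECREATE can be modeled as comparing the guest's requested SECS fields against the allowed-1 masks enumerated in the guest's CPUID. This is a simplified userspace sketch, not the KVM handler: the struct and function names are hypothetical, it only decides whether a #GP would be injected, and it ignores the real SECS layout and any bits that are architecturally required to be set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of SECS fields that the ECREATE policy cares about. */
struct secs_policy_fields {
	uint32_t miscselect;
	uint64_t attributes;
	uint64_t xfrm;
};

/* Hypothetical allowed-1 masks, as enumerated to the guest via CPUID leaf 0x12. */
struct guest_sgx_masks {
	uint32_t miscselect_allowed;
	uint64_t attributes_allowed;
	uint64_t xfrm_allowed;
};

/*
 * Returns true if the guest requested any bit that was not enumerated as
 * allowed-1, i.e. the case where the host would inject #GP instead of
 * executing ECREATE on the guest's behalf.
 */
static bool ecreate_needs_gp(const struct secs_policy_fields *req,
			     const struct guest_sgx_masks *allowed)
{
	if (req->miscselect & ~allowed->miscselect_allowed)
		return true;
	if (req->attributes & ~allowed->attributes_allowed)
		return true;
	if (req->xfrm & ~allowed->xfrm_allowed)
		return true;
	return false;
}
```

Checking each field independently keeps the model close to the wording "any of the aforementioned masks diverges from hardware"; when all masks match hardware, no interception is needed at all.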

> 
> >>> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> >>> new file mode 100644
> >>> index 000000000000..0d643b985085
> >>> --- /dev/null
> >>> +++ b/arch/x86/include/asm/sgx.h
> >>> @@ -0,0 +1,16 @@
> >>> +/* SPDX-License-Identifier: GPL-2.0 */
> >>> +#ifndef _ASM_X86_SGX_H
> >>> +#define _ASM_X86_SGX_H
> >>> +
> >>> +#include <linux/types.h>
> >>> +
> >>> +#ifdef CONFIG_X86_SGX_VIRTUALIZATION
> >>> +struct sgx_pageinfo;
> >>> +
> >>> +int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
> >>> +		     int *trapnr);
> >>> +int sgx_virt_einit(void __user *sigstruct, void __user *token,
> >>> +		   void __user *secs, u64 *lepubkeyhash, int *trapnr);
> >>> +#endif
> >>> +
> >>> +#endif /* _ASM_X86_SGX_H */
> >>> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> >>> index d625551ccf25..4e9810ba9259 100644
> >>> --- a/arch/x86/kernel/cpu/sgx/virt.c
> >>> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> >>> @@ -261,3 +261,58 @@ int __init sgx_virt_epc_init(void)
> >>>  
> >>>  	return misc_register(&sgx_virt_epc_dev);
> >>>  }
> >>> +
> >>> +int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
> >>> +		     int *trapnr)
> >>> +{
> >>> +	int ret;
> >>> +
> >>> +	__uaccess_begin();
> >>> +	ret = __ecreate(pageinfo, (void *)secs);
> >>> +	__uaccess_end();
> >>
> >> The __uaccess_begin/end() worries me.  There are *very* few of these in
> >> the kernel and it seems like something we want to use as sparingly as
> >> possible.
> >>
> >> Why don't we just use the kernel mapping for 'secs' and not have to deal
> >> with stac/clac?
> > 
> > The kernel mapping isn't readily available. 
> 
> Oh, duh.  There's no kernel mapping for EPC... it's not RAM in the first
> place.
> 
> > At this point, it's not even
> > guaranteed that @secs points at an EPC page.  Unlike the driver code, where the
> > EPC page is allocated on-demand by the kernel, the pointer here is userspace
> > (technically guest) controlled.  The caller (KVM) is responsible for ensuring
> > it's a valid userspace address, but the SGX/EPC specific checks are mostly
> > deferred to hardware.
> 
> Ahh, got it.  Kai, could we get some of this into comments or the changelog?

Yes, I'll add some of this to the comments.
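To make the "overloaded return value" from the patch description concrete: as far as I understand the SGX core code, the ENCLS wrappers return either an SGX error code or, when the instruction faults, the exception vector with a flag bit set. The userspace sketch below models how a wrapper like sgx_virt_ecreate() can split that single int into an errno-style return value plus a separate trapnr for KVM. The flag value 0x40000000 is an assumption meant to mirror ENCLS_FAULT_FLAG in the kernel's sgx/encls.h, and split_encls_ret() is a hypothetical name.

```c
#include <errno.h>

/* Assumed to mirror ENCLS_FAULT_FLAG from the kernel's sgx/encls.h. */
#define MODEL_ENCLS_FAULT_FLAG 0x40000000

/*
 * Hypothetical helper: decode a raw ENCLS-style return value.
 *  - 0: success
 *  - fault flag set: the instruction faulted; *trapnr receives the vector
 *    and -EFAULT is returned so the caller (KVM) can inject that fault
 *  - anything else: a positive SGX error code, passed through unchanged
 */
static int split_encls_ret(int raw, int *trapnr)
{
	if (raw & MODEL_ENCLS_FAULT_FLAG) {
		*trapnr = raw & ~MODEL_ENCLS_FAULT_FLAG;
		return -EFAULT;
	}
	return raw;
}
```

For example, a raw value of 0x40000000 | 13 decodes to -EFAULT with trapnr 13 (#GP), while a plain positive value is returned as an SGX error code and trapnr is left untouched.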


* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06 22:15   ` Borislav Petkov
@ 2021-01-06 23:09     ` Kai Huang
  2021-01-07  6:41       ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-06 23:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, dave.hansen,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Wed, 6 Jan 2021 23:15:27 +0100 Borislav Petkov wrote:
> On Wed, Jan 06, 2021 at 02:55:21PM +1300, Kai Huang wrote:
> > +/* Intel-defined SGX features, CPUID level 0x00000012:0 (EAX), word 19 */
> > +#define X86_FEATURE_SGX1		(19*32+ 0) /* SGX1 leaf functions */
> > +#define X86_FEATURE_SGX2		(19*32+ 1) /* SGX2 leaf functions */
> 
> Is anything else from that leaf going to be added later? Bit 5 is
> "supports ENCLV instruction leaves", 6 is ENCLS insn leaves... are those
> going to be used in the kernel too eventually?

Bits 5 and 6 are related to reclaiming EPC pages from SGX guests, and the
mechanism behind the two bits is only supposed to be used by KVM.

There's no urgent request to support them for now (especially given that
basic SGX virtualization is not upstream yet), but I don't know whether they
will need to be supported in the future.
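For reference, the bits under discussion live in CPUID.(EAX=0x12,ECX=0):EAX. A minimal decoding sketch follows; the bit positions are the ones named in this thread (bit 0 SGX1, bit 1 SGX2, bit 5 ENCLV leaves, bit 6 additional ENCLS leaves), while the struct and function names are illustrative only. It also shows how a feature number such as X86_FEATURE_SGX1, defined as (19*32+ 0), maps to a capability word and bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative decoding of CPUID.(EAX=0x12, ECX=0):EAX. */
struct sgx_leaf0_eax {
	bool sgx1;        /* bit 0: SGX1 leaf functions */
	bool sgx2;        /* bit 1: SGX2 leaf functions */
	bool enclv;       /* bit 5: ENCLV instruction leaves */
	bool encls_virt;  /* bit 6: additional ENCLS instruction leaves */
};

static struct sgx_leaf0_eax decode_sgx_leaf0_eax(uint32_t eax)
{
	/* Nonzero masked values convert to true when stored in a bool. */
	struct sgx_leaf0_eax f = {
		.sgx1       = eax & (1u << 0),
		.sgx2       = eax & (1u << 1),
		.enclv      = eax & (1u << 5),
		.encls_virt = eax & (1u << 6),
	};
	return f;
}

/* A kernel feature number like (19*32 + 1) is a (capability word, bit) pair. */
static unsigned int feature_word(unsigned int feature) { return feature / 32; }
static unsigned int feature_bit(unsigned int feature)  { return feature % 32; }
```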

> 
> Rest of them is reserved in the SDM which probably means internal only
> for now.


* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06 22:56         ` Kai Huang
@ 2021-01-06 23:19           ` Sean Christopherson
  2021-01-06 23:33             ` Dave Hansen
  2021-01-06 23:56             ` Kai Huang
  0 siblings, 2 replies; 111+ messages in thread
From: Sean Christopherson @ 2021-01-06 23:19 UTC (permalink / raw)
  To: Kai Huang
  Cc: Dave Hansen, linux-sgx, kvm, x86, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Thu, Jan 07, 2021, Kai Huang wrote:
> On Wed, 6 Jan 2021 14:21:39 -0800 Dave Hansen wrote:
> > On 1/6/21 2:12 PM, Kai Huang wrote:
> > > On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
> > >> On 1/5/21 5:55 PM, Kai Huang wrote:
> > >>> --- a/arch/x86/kernel/cpu/feat_ctl.c
> > >>> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> > >>> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
> > >>>  {
> > >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
> > >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> > >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> > >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
> > >>>  }
> > >> Logically, I think you want this *after* the "Allow SGX virtualization
> > >> without Launch Control support" patch.  As it stands, this will totally
> > >> disable SGX (including virtualization) if launch control is unavailable.
> > >>
> > > To me it is better to be here, since clear_sgx_caps(), which disables SGX
> > > totally, should logically clear all SGX feature bits, no matter later patch's
> > > behavior. So when new SGX bits are introduced, clear_sgx_caps() should clear
> > > them too. Otherwise the logic of this patch (adding new SGX feature bits) is
> > > not complete IMHO.
> > > 
> > > And actually in later patch "Allow SGX virtualization without Launch Control
> > > support", a new clear_sgx_lc() is added, and is called when LC is not
> > > available but SGX virtualization is enabled, to make sure only SGX_LC bit is
> > > cleared in this case. I don't quite understand why we need to clear SGX1 and
> > > SGX2 in clear_sgx_caps() after the later patch.
> > 
> > I was talking about patch ordering.  It could be argued that this goes
> > after the content of patch 05/23.  Please _consider_ changing the ordering.
> > 
> > If that doesn't work for some reason, please at least call out in the
> > changelog that it leaves a temporarily funky situation.
> > 
> 
> The later patch currently uses SGX1 bit, which is the reason that this patch
> needs be before later patch.
> 
> Sean,
> 
> I think it is OK to remove SGX1 bit check in later patch, since I have
> never seen a machine with SGX bit in CPUID, but w/o SGX1.

The SGX1 check is "needed" to handle the case where SGX is supported but was
soft-disabled, e.g. because software disabled a machine check bank by writing an
MCi_CTL MSR.

> If we remove SGX1 bit check in later, we can put this patch after the later
> patch.
> 
> Do you have comment here? If you are OK, I'll remove SGX1 bit check in later
> patch and reorder the patch.

Hmm, I'm not sure why the SGX driver was merged without explicitly checking for
SGX1 support.  I'm pretty sure we had an explicit SGX1 check in the driver path
at some point.  My guess is that the SGX1 change ended up in the KVM series
through a mishandled rebase.

Moving the check later won't break anything that's not already broken.  But,
arguably checking SGX1 is a bug fix of sorts, e.g. to guard against broken
firmware, and should go in as a standalone patch destined for stable.  The
kernel can't prevent SGX from being soft-disabled after boot, but IMO it should
cleanly handle the case where SGX was soft-disabled _before_ boot.
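The soft-disable handling described above can be sketched as an init-time check: if CPUID enumerates SGX but the SGX1 bit in leaf 0x12 is clear, treat SGX as unusable and clear every SGX capability. This is a userspace model with a plain bitmask standing in for the kernel's capability words; the flag names and sgx_init_check() are hypothetical, not the actual feat_ctl.c code.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's X86_FEATURE_SGX* capability bits. */
enum {
	CAP_SGX    = 1u << 0,
	CAP_SGX_LC = 1u << 1,
	CAP_SGX1   = 1u << 2,
	CAP_SGX2   = 1u << 3,
};

#define CAP_SGX_ALL (CAP_SGX | CAP_SGX_LC | CAP_SGX1 | CAP_SGX2)

/*
 * Model of an init-time sanity check: SGX enumerated without SGX1 means SGX
 * was soft-disabled before boot (e.g. after a machine check bank was
 * disabled), so all SGX capabilities are cleared rather than trusting the
 * SGX bit alone. Unrelated capability bits are preserved.
 */
static uint32_t sgx_init_check(uint32_t caps)
{
	if ((caps & CAP_SGX) && !(caps & CAP_SGX1))
		caps &= ~CAP_SGX_ALL;
	return caps;
}
```

This only covers the "soft-disabled before boot" case; as noted above, nothing at init time can guard against a soft-disable that happens later.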


* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06 23:19           ` Sean Christopherson
@ 2021-01-06 23:33             ` Dave Hansen
  2021-01-06 23:56             ` Kai Huang
  1 sibling, 0 replies; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 23:33 UTC (permalink / raw)
  To: Sean Christopherson, Kai Huang
  Cc: linux-sgx, kvm, x86, jarkko, luto, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa

On 1/6/21 3:19 PM, Sean Christopherson wrote:
>> If we remove SGX1 bit check in later, we can put this patch after the later
>> patch.
>>
>> Do you have comment here? If you are OK, I'll remove SGX1 bit check in later
>> patch and reorder the patch.
> Hmm, I'm not sure why the SGX driver was merged without explicitly checking for
> SGX1 support.  I'm pretty sure we had an explicit SGX1 check in the driver path
> at some point.  My guess is that the SGX1 change ended up in the KVM series
> through a mishandled rebase.

There was one, but I think it got removed when I asked that the
X86_FEATURE_SGX1/2 bits be removed.  I actually even mentioned checking
the CPUID leaf directly with cpuid...() in initialization.  But I missed
that it was never done.

It's not a practical problem, but I do agree we should fix it up for
5.10 stable.


* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06 22:21       ` Dave Hansen
  2021-01-06 22:56         ` Kai Huang
@ 2021-01-06 23:40         ` Kai Huang
  2021-01-06 23:43           ` Dave Hansen
  1 sibling, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-06 23:40 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, 6 Jan 2021 14:21:39 -0800 Dave Hansen wrote:
> On 1/6/21 2:12 PM, Kai Huang wrote:
> > On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
> >> On 1/5/21 5:55 PM, Kai Huang wrote:
> >>> --- a/arch/x86/kernel/cpu/feat_ctl.c
> >>> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> >>> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
> >>>  {
> >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
> >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
> >>>  }
> >> Logically, I think you want this *after* the "Allow SGX virtualization
> >> without Launch Control support" patch.  As it stands, this will totally
> >> disable SGX (including virtualization) if launch control is unavailable.
> > To me it is better to be here, since clear_sgx_caps(), which disables SGX
> > totally, should logically clear all SGX feature bits, no matter later patch's
> > behavior. So when new SGX bits are introduced, clear_sgx_caps() should clear
> > them too. Otherwise the logic of this patch (adding new SGX feature bits) is
> > not complete IMHO.
> > 
> > And actually in later patch "Allow SGX virtualization without Launch Control
> > support", a new clear_sgx_lc() is added, and is called when LC is not
> > available but SGX virtualization is enabled, to make sure only SGX_LC bit is
> > cleared in this case. I don't quite understand why we need to clear SGX1 and
> > SGX2 in clear_sgx_caps() after the later patch.
> 
> I was talking about patch ordering.  It could be argued that this goes
> after the content of patch 05/23.  Please _consider_ changing the ordering.
> 
> If that doesn't work for some reason, please at least call out in the
> changelog that it leaves a temporarily funky situation.
> 
> 

Hi Dave,

On second thought, if I understand you correctly, the "funky situation" you
are talking about is that, without the patch "Allow SGX virtualization without
Launch Control Support", SGX virtualization is also disabled if LC is not
available in hardware, while the previous patches (basically patch 3,
"Introduce virtual EPC for use by KVM guests") already treat SGX
virtualization as something that can be enabled?

If so, clearing the SGX1 and SGX2 bits before or after the "Allow SGX
virtualization without Launch Control support" patch makes no difference,
since KVM should always check the SGX bit first.

So would it be better to put "Allow SGX virtualization without Launch Control
Support" at the beginning of this series? If so, the X86_SGX_VIRTUALIZATION
Kconfig option needs to be in a separate patch at the very beginning.

Does the above make sense?
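The distinction drawn above between the two clearing paths can be modeled as follows: clear_sgx_caps() drops every SGX bit when SGX is unusable, whereas the later patch's clear_sgx_lc() drops only SGX_LC when Launch Control is unavailable but SGX virtualization is still enabled. This is a simplified bitmask sketch; every name prefixed with model_ is hypothetical, and it assumes, as stated above, that KVM checks the SGX bit before SGX1, which is why leftover SGX1/SGX2 bits cannot enable anything on their own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for X86_FEATURE_SGX, _SGX_LC, _SGX1 and _SGX2. */
#define MODEL_SGX    (1u << 0)
#define MODEL_SGX_LC (1u << 1)
#define MODEL_SGX1   (1u << 2)
#define MODEL_SGX2   (1u << 3)

/* SGX is unusable: clear every SGX-related bit. */
static uint32_t model_clear_sgx_caps(uint32_t caps)
{
	return caps & ~(MODEL_SGX | MODEL_SGX_LC | MODEL_SGX1 | MODEL_SGX2);
}

/* LC unavailable but virtualization allowed: clear only SGX_LC. */
static uint32_t model_clear_sgx_lc(uint32_t caps)
{
	return caps & ~MODEL_SGX_LC;
}

/* Model of KVM gating: the SGX bit is checked first, then SGX1. */
static bool model_kvm_can_expose_sgx(uint32_t caps)
{
	return (caps & MODEL_SGX) && (caps & MODEL_SGX1);
}

/*
 * Model of the feature-control decision: without LC, either keep SGX for
 * guests only (virtualization enabled) or disable SGX entirely.
 */
static uint32_t model_apply_lc_policy(uint32_t caps, bool lc_available,
				      bool virt_enabled)
{
	if (lc_available)
		return caps;
	return virt_enabled ? model_clear_sgx_lc(caps) : model_clear_sgx_caps(caps);
}
```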


* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06 23:40         ` Kai Huang
@ 2021-01-06 23:43           ` Dave Hansen
  2021-01-06 23:56             ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2021-01-06 23:43 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On 1/6/21 3:40 PM, Kai Huang wrote:
> So a better way is to put "Allow SGX virtualization without Launch Control
> Support" at the beginning of this series? If so, the Kconfig
> X86_SGX_VIRTUALIZATION needs to be in separate patch at the very beginning.
> 
> Does above make sense? 

I think it's worth trying.  No promises that anyone will like the end
result, but give it a shot and I'll take a look.


* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06 23:19           ` Sean Christopherson
  2021-01-06 23:33             ` Dave Hansen
@ 2021-01-06 23:56             ` Kai Huang
  1 sibling, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06 23:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dave Hansen, linux-sgx, kvm, x86, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, 6 Jan 2021 15:19:41 -0800 Sean Christopherson wrote:
> On Thu, Jan 07, 2021, Kai Huang wrote:
> > On Wed, 6 Jan 2021 14:21:39 -0800 Dave Hansen wrote:
> > > On 1/6/21 2:12 PM, Kai Huang wrote:
> > > > On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
> > > >> On 1/5/21 5:55 PM, Kai Huang wrote:
> > > >>> --- a/arch/x86/kernel/cpu/feat_ctl.c
> > > >>> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> > > >>> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
> > > >>>  {
> > > >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
> > > >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> > > >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> > > >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
> > > >>>  }
> > > >> Logically, I think you want this *after* the "Allow SGX virtualization
> > > >> without Launch Control support" patch.  As it stands, this will totally
> > > >> disable SGX (including virtualization) if launch control is unavailable.
> > > >>
> > > > To me it is better to be here, since clear_sgx_caps(), which disables SGX
> > > > totally, should logically clear all SGX feature bits, no matter later patch's
> > > > behavior. So when new SGX bits are introduced, clear_sgx_caps() should clear
> > > > them too. Otherwise the logic of this patch (adding new SGX feature bits) is
> > > > not complete IMHO.
> > > > 
> > > > And actually in later patch "Allow SGX virtualization without Launch Control
> > > > support", a new clear_sgx_lc() is added, and is called when LC is not
> > > > available but SGX virtualization is enabled, to make sure only SGX_LC bit is
> > > > cleared in this case. I don't quite understand why we need to clear SGX1 and
> > > > SGX2 in clear_sgx_caps() after the later patch.
> > > 
> > > I was talking about patch ordering.  It could be argued that this goes
> > > after the content of patch 05/23.  Please _consider_ changing the ordering.
> > > 
> > > If that doesn't work for some reason, please at least call out in the
> > > changelog that it leaves a temporarily funky situation.
> > > 
> > 
> > The later patch currently uses SGX1 bit, which is the reason that this patch
> > needs be before later patch.
> > 
> > Sean,
> > 
> > I think it is OK to remove SGX1 bit check in later patch, since I have
> > never seen a machine with SGX bit in CPUID, but w/o SGX1.
> 
> The SGX1 check is "needed" to handle the case where SGX is supported but was
> soft-disabled, e.g. because software disable a machine check bank by writing an
> MCi_CTL MSR.
> 
> > If we remove SGX1 bit check in later, we can put this patch after the later
> > patch.
> > 
> > Do you have comment here? If you are OK, I'll remove SGX1 bit check in later
> > patch and reorder the patch.
> 
> Hmm, I'm not sure why the SGX driver was merged without explicitly checking for
> SGX1 support.  I'm pretty sure we had an explicit SGX1 check in the driver path
> at some point.  My guess is that the SGX1 change ended up in the KVM series
> through a mishandled rebase.
> 
> Moving the check later won't break anything that's not already broken.  But,
> arguably checking SGX1 is a bug fix of sorts, e.g. to guard against broken
> firmware, and should go in as a standalone patch destined for stable.  The
> kernel can't prevent SGX from being soft-disabled after boot, but IMO it should
> cleanly handle the case where SGX was soft-disabled _before_ boot.

It seems I need to dig into some history. Thanks, Sean, for the info!


* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06 23:43           ` Dave Hansen
@ 2021-01-06 23:56             ` Kai Huang
  0 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-06 23:56 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, 6 Jan 2021 15:43:54 -0800 Dave Hansen wrote:
> On 1/6/21 3:40 PM, Kai Huang wrote:
> > So a better way is to put "Allow SGX virtualization without Launch Control
> > Support" at the beginning of this series? If so, the Kconfig
> > X86_SGX_VIRTUALIZATION needs to be in separate patch at the very beginning.
> > 
> > Does above make sense? 
> 
> I think it's worth trying.  No promises that anyone will like the end
> result, but give it a shot and I'll take a look.

OK, I'll try it in the next version. Thanks.


* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-06 17:07 ` Dave Hansen
@ 2021-01-07  0:34   ` Kai Huang
  2021-01-07  0:48     ` Dave Hansen
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-07  0:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, jmattson, joro,
	vkuznets, wanpengli, corbet

On Wed, 6 Jan 2021 09:07:13 -0800 Dave Hansen wrote:
> On 1/5/21 5:55 PM, Kai Huang wrote:
> > - Virtual EPC
> > 
> > "Virtual EPC" is the EPC section exposed by KVM to guest so SGX software in
> > guest can discover it and use it to create SGX enclaves. KVM exposes SGX to 
> > guest via CPUID, and exposes one or more "virtual EPC" sections for guest.
> > The size of "virtual EPC" is passed as Qemu parameter when creating the
> > guest, and the base address is calcualted internally according to guest's
> 
> 				^ calculated
> 
> > configuration.
> 
> This is not a great first paragraph to introduce me to this feature.
> 
> Please remind us what EPC *is*, then you can go and talk about why we
> have to virtualize it, and how "virtual EPC" is different from normal
> EPC.  For instance:
> 
> SGX enclave memory is special and is reserved specifically for enclave
> use.  In bare-metal SGX enclaves, the kernel allocates enclave pages,
> copies data into the pages with privileged instructions, then allows the
> enclave to start.  In this scenario, only initialized pages already
> assigned to an enclave are mapped to userspace.
> 
> In virtualized environments, the hypervisor still needs to do the
> physical enclave page allocation.  The guest kernel is responsible for
> the data copying (among other things).  This means that the job of
> starting an enclave is now split between hypervisor and guest.
> 
> This series introduces a new misc device: /dev/sgx_virt_epc.  This
> device allows the host to map *uninitialized* enclave memory into
> userspace, which can then be passed into a guest.
> 
> While it might be *possible* to start a host-side enclave with
> /dev/sgx_enclave and pass its memory into a guest, it would be wasteful
> and convoluted.

Thanks. I'll add this.

> 
> > core/driver to allow userspace (Qemu) to allocate "raw" EPC, and use it as
> > "virtual EPC" for guest. Obviously, unlike EPC allocated for host SGX driver,
> > virtual EPC allocated via /dev/sgx_virt_epc doesn't have enclave associated,
> > and how virtual EPC is used by guest is compeletely controlled by guest's SGX
> 
> 					   ^ completely
> 
> Please run a spell checker on this thing.

Yeah, will do. Thanks for the good suggestion.

> 
> > software.
> > 
> > Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> > /dev/sgx_virt_epc rather than in KVM. Doing so has two major advantages:
> > 
> >   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
> >     just another memory backend for guests.
> > 
> >   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
> >     does not have to export any symbols, changes to reclaim flows don't
> >     need to be routed through KVM, SGX's dirty laundry doesn't have to
> >     get aired out for the world to see, and so on and so forth.
> > 
> > The virtual EPC allocated to guests is currently not reclaimable, due to
> > reclaiming EPC from KVM guests is not currently supported. Due to the
> > complications of handling reclaim conflicts between guest and host, KVM
> > EPC oversubscription, which allows total virtual EPC size greater than
> > physical EPC by being able to reclaiming guests' EPC, is significantly more
> > complex than basic support for SGX virtualization.
> 
> It would also help here to remind the reader that enclave pages have a
> special reclaim mechanism separtae from normal page reclaim, and that
> mechanism is disabled for these pages.

OK.

> 
> Does the *ABI* here preclude doing oversubscription in the future?

Sorry, which *ABI* do you mean?

> 
> > - Support SGX virtualization without SGX Launch Control unlocked mode
> > 
> > Although SGX driver requires SGX Launch Control unlocked mode to work, SGX
> 
> Although the bare-metal SGX driver requires...

OK.

> 
> Also, didn't we call this "Flexible Launch Control"?

I am actually a little confused by all these terms. I don't think the spec
defines anything called "Flexible Launch Control", although I think everyone
knows what it means. I am just not sure whether the term is commonly used by
the community.

I think using FLC is fine if we only want to refer to unlocked mode. But if
you want to mention both, IMHO it would be better to say explicitly "LC
locked mode" and "LC unlocked mode", since technically there's a third case
where LC is not present at all.

> 
> > virtualization doesn't, since how enclave is created is completely controlled
> > by guest SGX software, which is not necessarily linux. Therefore, this series
> > allows KVM to expose SGX to guest even SGX Launch Control is in locked mode,
> 
> ... "expose SGX to guests even if" ...

Thanks.

> 
> > or is not present at all. The reason is the goal of SGX virtualization, or
> > virtualization in general, is to expose hardware feature to guest, but not to
> > make assumption how guest will use it. Therefore, KVM should support SGX guest
> > as long as hardware is able to, to have chance to support more potential use
> > cases in cloud environment.
> 
> This is kinda long-winded and misses a lot of important context.  How about:
> 
> SGX hardware supports two "launch control" modes to limit which enclaves
> can run.  In the "locked" mode, the hardware prevents enclaves from
> running unless they are blessed by a third party. 

or "by Intel".

> In the unlocked mode,
> the kernel is in full control of which enclaves can run.  The bare-metal
> SGX code refuses to launch enclaves unless it is in the unlocked mode.
> 
> This sgx_virt_epc driver does not have such a restriction.  This allows
> guests which are OK with the locked mode to use SGX, even if the host
> kernel refuses to.

Indeed better. Thanks a lot.
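The launch-control policy in the suggested wording above reduces to a small decision table. The sketch below assumes three hardware states (LC not present, LC present but locked, LC present and unlocked, including the "not present at all" case mentioned in this thread) and models only which consumer may use SGX; the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the SGX Launch Control states discussed here. */
enum lc_state {
	LC_NOT_PRESENT,  /* no Launch Control support at all */
	LC_LOCKED,       /* launch enclave pubkey hash MSRs are read-only */
	LC_UNLOCKED,     /* kernel may write the pubkey hash MSRs */
};

/* The bare-metal driver only runs enclaves when the kernel controls launch. */
static bool host_driver_allowed(bool sgx_supported, enum lc_state lc)
{
	return sgx_supported && lc == LC_UNLOCKED;
}

/* The virtual EPC path only needs SGX itself; the guest decides how to launch. */
static bool virt_epc_allowed(bool sgx_supported, enum lc_state lc)
{
	(void)lc;  /* intentionally ignored: no launch-control restriction */
	return sgx_supported;
}
```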

> 
> > - Support exposing SGX2
> > 
> > Due to the same reason above, SGX2 feature detection is added to core SGX code
> > to allow KVM to expose SGX2 to guest, even currently SGX driver doesn't support
> > SGX2, because SGX2 can work just fine in guest w/o any interaction to host SGX
> > driver.
> > 
> > - Restricit SGX guest access to provisioning key
> > 
> > To grant guest being able to fully use SGX, guest needs to be able to create
> > provisioning enclave.
> 
> "enclave" or "enclaves"?

I think it should be "enclave": inside one VM, there should only be one
provisioning enclave.

> 
> > However provisioning key is sensitive and is restricted by
> 
> 	^ the

Thanks.

> 
> > /dev/sgx_provision in host SGX driver, therefore KVM SGX virtualization follows
> > the same role: a new KVM_CAP_SGX_ATTRIBUTE is added to KVM uAPI, and only file
> > descriptor of /dev/sgx_provision is passed to that CAP by usersppace hypervisor
> > (Qemu) when creating the guest, it can access provisioning bit. This is done by
> > making KVM trape ECREATE instruction from guest, and check the provisioning bit
> 
> 		^ trap
> 
> > in ECREATE's attribute.
> 
> The grammar in that paragraph is really off to me.  Can you give it
> another go?

I'll refine it. Thanks a lot for the input.
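A rough model of the provisioning-key restriction described above: KVM traps the guest's ECREATE and refuses (with #GP) an enclave whose ATTRIBUTES request the PROVISIONKEY bit unless userspace granted that attribute by passing a /dev/sgx_provision file descriptor to KVM_CAP_SGX_ATTRIBUTE. Bit 4 for PROVISIONKEY is my understanding of SGX_ATTR_PROVISIONKEY in the kernel's SGX headers; the struct and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* SECS.ATTRIBUTES bit 4; assumed to match SGX_ATTR_PROVISIONKEY. */
#define MODEL_ATTR_PROVISIONKEY (1ull << 4)

/* Hypothetical per-VM state set when userspace enables the capability. */
struct vm_sgx_policy {
	bool provision_granted;  /* set only after a valid /dev/sgx_provision fd */
	uint64_t base_allowed;   /* other allowed-1 ATTRIBUTES bits from CPUID */
};

/* PROVISIONKEY is allowed only if explicitly granted, regardless of CPUID. */
static uint64_t allowed_attributes(const struct vm_sgx_policy *p)
{
	uint64_t allowed = p->base_allowed & ~MODEL_ATTR_PROVISIONKEY;

	if (p->provision_granted)
		allowed |= MODEL_ATTR_PROVISIONKEY;
	return allowed;
}

/* True if a trapped ECREATE with these ATTRIBUTES should get #GP. */
static bool ecreate_attr_denied(const struct vm_sgx_policy *p, uint64_t attributes)
{
	return (attributes & ~allowed_attributes(p)) != 0;
}
```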


* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-06 20:35     ` Sean Christopherson
@ 2021-01-07  0:47       ` Kai Huang
  2021-01-07  0:52         ` Dave Hansen
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-07  0:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dave Hansen, linux-sgx, kvm, x86, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa


> > > +static struct mutex virt_epc_lock;
> > > +static struct list_head virt_epc_zombie_pages;
> > 
> > What does the lock protect?
> 
> Effectively, the list of zombie SECS pages.  Not sure why I used a generic name.
> 
> > What are zombie pages?
> 
> My own terminology for SECS pages whose virtual EPC has been destroyed but can't
> be reclaimed due to them having child EPC pages in other virtual EPCs.
> 
> > BTW, if zombies are SECS-only, shouldn't that be in the name rather than
> > "epc"?
> 
> I used the virt_epc prefix/namespace to tag it as a global list.  I've no
> argument against something like zombie_secs_pages.

I'll rename the list to zombie_secs_pages and the lock to
zombie_secs_pages_lock.


[...]

> > > +static int sgx_virt_epc_free_page(struct sgx_epc_page *epc_page)
> > > +{
> > > +	int ret;
> > > +
> > > +	if (!epc_page)
> > > +		return 0;
> > 
> > I always worry about these.  Why is passing NULL around OK?
> 
> I suspect I did it to mimic kfree() behavior.  I don't _think_ the radix (now
> xarray) usage will ever encounter a NULL entry.

I'll remove the NULL page check.

> 
> > 
> > > +	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
> > > +	if (ret) {
> > > +		/*
> > > +		 * Only SGX_CHILD_PRESENT is expected, which is because of
> > > +		 * EREMOVE-ing an SECS still with child, in which case it can
> > > +		 * be handled by EREMOVE-ing the SECS again after all pages in
> > > +		 * virtual EPC have been EREMOVE-ed. See comments in below in
> > > +		 * sgx_virt_epc_release().
> > > +		 */
> > > +		WARN_ON_ONCE(ret != SGX_CHILD_PRESENT);
> > > +		return ret;
> > > +	}
> > 
> > I find myself wondering what errors could cause the WARN_ON_ONCE() to be
> > hit.  The SDM indicates that it's only:
> > 
> > 	SGX_ENCLAVE_ACT If there are still logical processors executing
> > 			inside the enclave.
> > 
> > Should that be mentioned in the comment?
> 
> And faults, which are also spliced into the return value by the ENCLS macros.
> I do remember hitting this WARN when I broke things, though I can't remember
> whether it was a fault or the SGX_ENCLAVE_ACT scenario.  Probably the latter?

I'll add a comment saying that no logical processor should still be running
inside the guest's enclave at this point. We cannot handle SGX_ENCLAVE_ACT
here anyway.
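The release-time ordering discussed in this subthread (EREMOVE everything, retry SECS pages that failed with SGX_CHILD_PRESENT once their children are gone, and park any SECS that still has children in other virtual EPC instances on a global zombie list) can be modeled in userspace as below. This is a sketch under simplifying assumptions: pages are plain structs, the child counts stand in for the hardware's tracking, the return values are placeholders rather than the architectural SGX codes, SGX_ENCLAVE_ACT is not modeled, and the lock that would protect the list is only mentioned in a comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder return values; not the architectural SGX error codes. */
#define MODEL_OK             0
#define MODEL_CHILD_PRESENT  1

struct model_page {
	bool is_secs;
	bool removed;
	int children;                   /* live child pages, SECS only */
	struct model_page *secs;        /* owning SECS for regular pages */
	struct model_page *next_zombie;
};

/* Stand-in for the global zombie_secs_pages list (a lock would protect it). */
static struct model_page *zombie_secs_pages;

/* Model of EREMOVE: an SECS with live children cannot be removed. */
static int model_eremove(struct model_page *p)
{
	if (p->is_secs && p->children > 0)
		return MODEL_CHILD_PRESENT;
	if (p->secs)
		p->secs->children--;
	p->removed = true;
	return MODEL_OK;
}

/* Model of releasing one virtual EPC instance holding @n pages. */
static void model_release(struct model_page **pages, size_t n)
{
	size_t i;

	/* Pass 1: remove everything; SECS pages with children fail for now. */
	for (i = 0; i < n; i++)
		model_eremove(pages[i]);

	/* Pass 2: retry SECS pages; survivors have children in other instances. */
	for (i = 0; i < n; i++) {
		struct model_page *p = pages[i];

		if (p->removed)
			continue;
		if (model_eremove(p) == MODEL_CHILD_PRESENT) {
			p->next_zombie = zombie_secs_pages;
			zombie_secs_pages = p;
		}
	}
}
```

In this model a zombie SECS would be retried later, once the other instance holding its children is released.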


* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-07  0:34   ` Kai Huang
@ 2021-01-07  0:48     ` Dave Hansen
  2021-01-07  1:50       ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2021-01-07  0:48 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, jmattson, joro,
	vkuznets, wanpengli, corbet

On 1/6/21 4:34 PM, Kai Huang wrote:
> On Wed, 6 Jan 2021 09:07:13 -0800 Dave Hansen wrote:
>> Does the *ABI* here preclude doing oversubscription in the future?
> 
> I am Sorry what *ABI* do you mean?

Oh boy.

https://en.wikipedia.org/wiki/Application_binary_interface

In your patch set that you are posting, /dev/sgx_virt_epc is a new
interface: a new ABI.  If we accept your contribution, programs will be
built around and expect Linux to support this ABI.  An ABI is a contract
between software written to use it and the kernel.  The kernel tries
*really* hard to never break its contracts with applications.

OK, now that we have that out of the way, I'll ask my question in
another way:

Your series adds some new interfaces, including /dev/sgx_virt_epc.  If
the kernel wants to add oversubscription in the future, will old binary
application users of /dev/sgx_virt_epc be able to support
oversubscription?  Or, would users of /dev/sgx_virt_epc need to change
to support oversubscription?

>> Also, didn't we call this "Flexible Launch Control"?
> 
> I am actually a little bit confused about all those terms here. I don't think
> from spec's perspective, there's such thing "Flexible Launch Control", but I
> think everyone knows what does it mean. But I am not sure whether it is
> commonly used by community. 
> 
> I think using FLC is fine if we only want to mention unlocked mode. But if you
> want to mention both, IMHO it would be better to specifically use LC locked
> mode and unlocked mode, since technically there's third case that LC is not
> present at all.

Could you go over the changelogs from Jarkko's patches and at least make
these consistent with those?


>>> or is not present at all. The reason is the goal of SGX virtualization, or
>>> virtualization in general, is to expose hardware feature to guest, but not to
>>> make assumption how guest will use it. Therefore, KVM should support SGX guest
>>> as long as hardware is able to, to have chance to support more potential use
>>> cases in cloud environment.
>>
>> This is kinda long-winded and misses a lot of important context.  How about:
>>
>> SGX hardware supports two "launch control" modes to limit which enclaves
>> can run.  In the "locked" mode, the hardware prevents enclaves from
>> running unless they are blessed by a third party. 
> 
> or "by Intel".

From what I understand, Intel had to bless the enclaves but the
architecture itself doesn't say "Intel must bless them".  But, yeah, in
practice, it had to be Intel.
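
For illustration, the three launch-control cases discussed above (locked,
unlocked, not present) can be told apart from two raw register values. The
minimal sketch below assumes the SDM bit positions that the kernel also uses:
CPUID.(EAX=7,ECX=0):ECX bit 30 for SGX Launch Control, and IA32_FEATURE_CONTROL
bit 17 (FEAT_CTL_SGX_LC_ENABLED) for making the launch-enclave hash MSRs
writable. The enum and classify_sgx_lc() are hypothetical names, and the sketch
ignores the case where firmware left IA32_FEATURE_CONTROL unlocked.

```c
#include <stdint.h>

/* Bit positions per the SDM, mirrored by the kernel's X86_FEATURE_SGX_LC
 * (CPUID.(EAX=7,ECX=0):ECX[30]) and FEAT_CTL_SGX_LC_ENABLED (MSR bit 17). */
#define CPUID_7_ECX_SGX_LC	(1u << 30)
#define FEAT_CTL_SGX_LC_ENABLED	(1ull << 17)

/* Hypothetical names for the three launch-control configurations. */
enum sgx_lc_mode {
	SGX_LC_NOT_PRESENT,	/* CPU has no launch control at all */
	SGX_LC_LOCKED,		/* LE hash MSRs read-only, launch enclave fixed */
	SGX_LC_UNLOCKED,	/* "Flexible Launch Control": OS can write the hash */
};

/*
 * Classify launch control from raw CPUID and IA32_FEATURE_CONTROL values.
 * Assumes firmware has already locked IA32_FEATURE_CONTROL.
 */
enum sgx_lc_mode classify_sgx_lc(uint32_t cpuid_7_ecx, uint64_t feat_ctl)
{
	if (!(cpuid_7_ecx & CPUID_7_ECX_SGX_LC))
		return SGX_LC_NOT_PRESENT;
	if (feat_ctl & FEAT_CTL_SGX_LC_ENABLED)
		return SGX_LC_UNLOCKED;
	return SGX_LC_LOCKED;
}
```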

>>> - Support exposing SGX2
>>>
>>> For the same reason as above, SGX2 feature detection is added to the core SGX
>>> code to allow KVM to expose SGX2 to the guest, even though the SGX driver
>>> currently doesn't support SGX2, because SGX2 works just fine in a guest
>>> without any interaction with the host SGX driver.
>>>
>>> - Restrict SGX guest access to provisioning key
>>>
>>> To allow the guest to fully use SGX, the guest needs to be able to create
>>> provisioning enclave.
>>
>> "enclave" or "enclaves"?
> 
> I think it should be "enclave"; inside one VM, there should only be one
> provisioning enclave.

This is where the language becomes important.  Is the provisioning
enclave a one-shot deal?  You create one per guest and can never create
another?  Or, can you restart it?  Can you architecturally have more
than one active at once?  Or, can you only create one once the first one
dies?

You'll write that sentence differently based on the answers.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-07  0:47       ` Kai Huang
@ 2021-01-07  0:52         ` Dave Hansen
  2021-01-07  1:38           ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2021-01-07  0:52 UTC (permalink / raw)
  To: Kai Huang, Sean Christopherson
  Cc: linux-sgx, kvm, x86, jarkko, luto, haitao.huang, pbonzini, bp,
	tglx, mingo, hpa

On 1/6/21 4:47 PM, Kai Huang wrote:
>>>> +	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
>>>> +	if (ret) {
>>>> +		/*
>>>> +		 * Only SGX_CHILD_PRESENT is expected, which is because of
>>>> +		 * EREMOVE-ing an SECS still with child, in which case it can
>>>> +		 * be handled by EREMOVE-ing the SECS again after all pages in
>>>> +		 * virtual EPC have been EREMOVE-ed. See comments in below in
>>>> +		 * sgx_virt_epc_release().
>>>> +		 */
>>>> +		WARN_ON_ONCE(ret != SGX_CHILD_PRESENT);
>>>> +		return ret;
>>>> +	}
>>> I find myself wondering what errors could cause the WARN_ON_ONCE() to be
>>> hit.  The SDM indicates that it's only:
>>>
>>> 	SGX_ENCLAVE_ACT If there are still logical processors executing
>>> 			inside the enclave.
>>>
>>> Should that be mentioned in the comment?
>> And faults, which are also spliced into the return value by the ENCLS macros.
>> I do remember hitting this WARN when I broke things, though I can't remember
>> whether it was a fault or the SGX_ENCLAVE_ACT scenario.  Probably the latter?
> I'll add a comment saying that there should be no active logical processor
> still running inside the guest's enclave. We cannot handle SGX_ENCLAVE_ACT
> here anyway.

One more thing...

Could we dump out the *actual* error code with a WARN(), please?  If we
see a warning, I'd rather not have to disassemble the instructions and
check against register values to see whether the error code was sane.
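
As a rough sketch of what printing the actual error code could involve, the
userspace model below tells apart the two kinds of values the ENCLS wrappers
hand back: an SGX error code from the instruction, or a trap number with a
fault flag OR-ed in. It assumes ENCLS_FAULT_FLAG = 0x40000000 as in the
kernel's encls.h and SGX_CHILD_PRESENT = 13 per the SDM; format_eremove_ret()
is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

#define ENCLS_FAULT_FLAG	0x40000000	/* OR-ed into the trap number on a fault */
#define SGX_CHILD_PRESENT	13		/* EREMOVE on an SECS that still has children */

/*
 * Describe an EREMOVE return value so a warning says what actually happened.
 * Returns the snprintf() result.
 */
int format_eremove_ret(int ret, char *buf, size_t len)
{
	if (ret & ENCLS_FAULT_FLAG)
		return snprintf(buf, len, "EREMOVE faulted: trap %d",
				ret & ~ENCLS_FAULT_FLAG);
	if (ret == SGX_CHILD_PRESENT)
		return snprintf(buf, len, "EREMOVE: SECS still has children");
	return snprintf(buf, len, "EREMOVE returned SGX error %d", ret);
}
```

In the kernel itself this would presumably reduce to something like
WARN_ONCE(ret != SGX_CHILD_PRESENT, "EREMOVE returned %d (0x%x)\n", ret, ret),
which lets the fault flag show up in the hex value.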

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-07  0:52         ` Dave Hansen
@ 2021-01-07  1:38           ` Kai Huang
  2021-01-07  5:00             ` Dave Hansen
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-07  1:38 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sean Christopherson, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, bp, tglx, mingo, hpa

On Wed, 6 Jan 2021 16:52:49 -0800 Dave Hansen wrote:
> On 1/6/21 4:47 PM, Kai Huang wrote:
> >>>> +	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
> >>>> +	if (ret) {
> >>>> +		/*
> >>>> +		 * Only SGX_CHILD_PRESENT is expected, which is because of
> >>>> +		 * EREMOVE-ing an SECS still with child, in which case it can
> >>>> +		 * be handled by EREMOVE-ing the SECS again after all pages in
> >>>> +		 * virtual EPC have been EREMOVE-ed. See comments in below in
> >>>> +		 * sgx_virt_epc_release().
> >>>> +		 */
> >>>> +		WARN_ON_ONCE(ret != SGX_CHILD_PRESENT);
> >>>> +		return ret;
> >>>> +	}
> >>> I find myself wondering what errors could cause the WARN_ON_ONCE() to be
> >>> hit.  The SDM indicates that it's only:
> >>>
> >>> 	SGX_ENCLAVE_ACT If there are still logical processors executing
> >>> 			inside the enclave.
> >>>
> >>> Should that be mentioned in the comment?
> >> And faults, which are also spliced into the return value by the ENCLS macros.
> >> I do remember hitting this WARN when I broke things, though I can't remember
> >> whether it was a fault or the SGX_ENCLAVE_ACT scenario.  Probably the latter?
> > I'll add a comment saying that there should be no active logical processor
> > still running inside the guest's enclave. We cannot handle SGX_ENCLAVE_ACT
> > here anyway.
> 
> One more thing...
> 
> Could we dump out the *actual* error code with a WARN(), please?  If we
> see a warning, I'd rather not have to disassemble the instructions and
> check against register values to see whether the error code was sane.

Sure. But WARN_ONCE() should be used, right, instead of WARN()?

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-06 19:35   ` Dave Hansen
  2021-01-06 20:35     ` Sean Christopherson
@ 2021-01-07  1:42     ` Kai Huang
  2021-01-07  5:02       ` Dave Hansen
  1 sibling, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-07  1:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, 6 Jan 2021 11:35:41 -0800 Dave Hansen wrote:
> On 1/5/21 5:55 PM, Kai Huang wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > Add a misc device /dev/sgx_virt_epc to allow userspace to allocate "raw"
> > EPC without an associated enclave.  The intended and only known use case
> > for raw EPC allocation is to expose EPC to a KVM guest, hence the
> > virt_epc moniker, virt.{c,h} files and X86_SGX_VIRTUALIZATION Kconfig.
> > 
> > Modify sgx_init() to always try to initialize virtual EPC driver, even
> > when SGX driver is disabled due to SGX Launch Control is in locked mode,
> > or not present at all, since SGX virtualization allows to expose SGX to
> > guests that support non-LC configurations.
> 
> The grammar here is a bit off.  Here's a rewrite:
> 
> Modify sgx_init() to always try to initialize the virtual EPC driver,
> even if the bare-metal SGX driver is disabled.  The bare-metal driver
> might be disabled if SGX Launch Control is in locked mode, or not
> supported in the hardware at all.  This allows (non-Linux) guests that
> support non-LC configurations to use SGX.

Thanks. I'll use yours, except I want to change "bare-metal driver might be
disabled.." to "bare-metal driver will be disabled..".

I'll also use all your comments mentioned in your reply to this patch.
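
The init flow in the rewritten changelog can be modelled roughly as follows:
the virtual EPC is set up even when the bare-metal driver refuses to load, and
SGX init fails only if neither comes up. This is a userspace sketch with
hypothetical stand-in functions (sgx_drv_init_model() and friends), not the
patch's code; in particular it assumes the driver's only precondition is
unlocked launch control.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of what the platform offers. */
struct sgx_platform {
	bool lc_unlocked;	/* bare-metal driver requires unlocked LC */
	bool epc_present;
};

static int sgx_drv_init_model(const struct sgx_platform *p)
{
	return p->lc_unlocked ? 0 : -ENODEV;
}

static int sgx_virt_epc_init_model(const struct sgx_platform *p)
{
	/* Virtual EPC only needs EPC; launch control is left to the guest. */
	return p->epc_present ? 0 : -ENODEV;
}

int sgx_init_model(const struct sgx_platform *p)
{
	int drv_ret = sgx_drv_init_model(p);
	int virt_ret = sgx_virt_epc_init_model(p);

	/* Fail only if neither the driver nor the virtual EPC came up. */
	if (drv_ret && virt_ret)
		return drv_ret;
	return 0;
}
```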

[...]

> > +
> > +static int sgx_virt_epc_release(struct inode *inode, struct file *file)
> > +{
> > +	struct sgx_virt_epc *epc = file->private_data;
> 
> FWIW, I hate the "struct sgx_virt_epc *epc" name.  "epc" here is really
> an instance
> 

How about "struct sgx_virt_epc *vepc" ?

[...]

> > +static int sgx_virt_epc_open(struct inode *inode, struct file *file)
> > +{
> > +	struct sgx_virt_epc *epc;
> > +
> > +	epc = kzalloc(sizeof(struct sgx_virt_epc), GFP_KERNEL);
> > +	if (!epc)
> > +		return -ENOMEM;
> > +	/*
> > +	 * Keep the current->mm to virtual EPC. It will be checked in
> > +	 * sgx_virt_epc_mmap() to prevent, in case of fork, child being
> > +	 * able to mmap() to the same virtual EPC pages.
> > +	 */
> > +	mmgrab(current->mm);
> > +	epc->mm = current->mm;
> > +	mutex_init(&epc->lock);
> > +	xa_init(&epc->page_array);
> > +
> > +	file->private_data = epc;
> > +
> > +	return 0;
> > +}
> 
> I understand why this made sense for regular enclaves, but I'm having a
> harder time here.  If you mmap(fd, MAP_SHARED), fork(), and then pass
> that mapping through to two different guests, you get to hold the
> pieces, just like if you did the same with normal memory.
> 
> Why does the kernel need to enforce this policy?

Does Sean's reply in another email satisfy you?

* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-07  0:48     ` Dave Hansen
@ 2021-01-07  1:50       ` Kai Huang
  2021-01-07 16:14         ` Sean Christopherson
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-07  1:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, jmattson, joro,
	vkuznets, wanpengli, corbet

On Wed, 6 Jan 2021 16:48:58 -0800 Dave Hansen wrote:
> On 1/6/21 4:34 PM, Kai Huang wrote:
> > On Wed, 6 Jan 2021 09:07:13 -0800 Dave Hansen wrote:
> >> Does the *ABI* here preclude doing oversubscription in the future?
> > 
> > I am sorry, what *ABI* do you mean?
> 
> Oh boy.
> 
> https://en.wikipedia.org/wiki/Application_binary_interface
> 
> In your patch set that you are posting, /dev/sgx_virt_epc is a new
> interface: a new ABI.  If we accept your contribution, programs will be
> built around and expect Linux to support this ABI.  An ABI is a contract
> between software written to use it and the kernel.  The kernel tries
> *really* hard to never break its contracts with applications.

Thanks.

> 
> OK, now that we have that out of the way, I'll ask my question in
> another way:
> 
> Your series adds some new interfaces, including /dev/sgx_virt_epc.  If
> the kernel wants to add oversubscription in the future, will old binary
> application users of /dev/sgx_virt_epc be able to support
> oversubscription?  Or, would users of /dev/sgx_virt_epc need to change
> to support oversubscription?

Oversubscription will be completely done in kernel/kvm, and will be
transparent to userspace, so it will not impact ABI.

> 
> >> Also, didn't we call this "Flexible Launch Control"?
> > 
> > I am actually a little bit confused about all those terms here. I don't think
> > there's such a thing as "Flexible Launch Control" from the spec's
> > perspective, but I think everyone knows what it means. I am just not sure
> > whether the term is commonly used by the community.
> > 
> > I think using FLC is fine if we only want to mention unlocked mode. But if you
> > want to mention both, IMHO it would be better to specifically say LC locked
> > mode and unlocked mode, since technically there's a third case where LC is
> > not present at all.
> 
> Could you go over the changelogs from Jarkko's patches and at least make
> these consistent with those?

I'll dig into them.

[...]

> >>> - Restrict SGX guest access to provisioning key
> >>>
> >>> To allow the guest to fully use SGX, the guest needs to be able to create
> >>> provisioning enclave.
> >>
> >> "enclave" or "enclaves"?
> > 
> > I think it should be "enclave"; inside one VM, there should only be one
> > provisioning enclave.
> 
> This is where the language becomes important.  Is the provisioning
> enclave a one-shot deal?  You create one per guest and can never create
> another?  Or, can you restart it?  Can you architecturally have more
> than one active at once?  Or, can you only create one once the first one
> dies?
> 
> You'll write that sentence differently based on the answers.
> 

I think I can just change it to "the guest needs to be able to access the
provisioning key". :)
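
Framed that way, the restriction reduces to an attribute check when the guest
executes ECREATE. A minimal sketch, assuming the SECS ATTRIBUTES layout from
the SDM (PROVISIONKEY is bit 4, as in the kernel's SGX_ATTR_PROVISIONKEY);
struct vm_sgx_policy and guest_ecreate_allowed() are hypothetical, and how
userspace grants the bit to a VM is left out:

```c
#include <stdbool.h>
#include <stdint.h>

#define SGX_ATTR_PROVISIONKEY	(1ull << 4)	/* enclave may EGETKEY the provisioning key */

/* Hypothetical per-VM policy: restricted attributes userspace has allowed. */
struct vm_sgx_policy {
	uint64_t allowed_attributes;
};

/* Return true if a guest ECREATE requesting @attributes may proceed. */
bool guest_ecreate_allowed(const struct vm_sgx_policy *p, uint64_t attributes)
{
	/* Attributes that need explicit permission before a guest may set them. */
	const uint64_t restricted = SGX_ATTR_PROVISIONKEY;

	return !(attributes & restricted & ~p->allowed_attributes);
}
```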

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-07  1:38           ` Kai Huang
@ 2021-01-07  5:00             ` Dave Hansen
  0 siblings, 0 replies; 111+ messages in thread
From: Dave Hansen @ 2021-01-07  5:00 UTC (permalink / raw)
  To: Kai Huang
  Cc: Sean Christopherson, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, bp, tglx, mingo, hpa

On 1/6/21 5:38 PM, Kai Huang wrote:
>> Could we dump out the *actual* error code with a WARN(), please?  If we
>> see a warning, I'd rather not have to disassemble the instructions and
>> check against register values to see whether the error code was sane.
> Sure. But WARN_ONCE() should be used, right, instead of WARN()?

Whatever will let you get a printf-format string out and only happens once.
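
For reference, the requested semantics (fire at most once per call site, with
a printf-style message) can be modelled in userspace as below. MY_WARN_ONCE is
a hypothetical stand-in: unlike the kernel's WARN_ONCE(cond, fmt, ...), it
prints no backtrace and does not taint anything. It relies on GNU statement
expressions and ##__VA_ARGS__, so it assumes GCC or Clang.

```c
#include <stdbool.h>
#include <stdio.h>

#define SGX_CHILD_PRESENT 13	/* the one EREMOVE error that is expected */

/* Number of messages actually printed, so the once-only behaviour is visible. */
static int warn_emitted;

/*
 * Evaluate @cond; the first time it is true at this call site, print the
 * formatted message. Yields the condition so it can be used in an if ().
 */
#define MY_WARN_ONCE(cond, fmt, ...) ({					\
	static bool once_done;						\
	bool once_ret = !!(cond);					\
	if (once_ret && !once_done) {					\
		once_done = true;					\
		warn_emitted++;						\
		fprintf(stderr, "WARNING: " fmt "\n", ##__VA_ARGS__);	\
	}								\
	once_ret;							\
})

/* Returns -1 for any unexpected EREMOVE result (warning only once), else 0. */
int check_eremove(int ret)
{
	if (MY_WARN_ONCE(ret && ret != SGX_CHILD_PRESENT,
			 "EREMOVE returned %d (0x%x)", ret, (unsigned int)ret))
		return -1;
	return 0;
}
```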

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-07  1:42     ` Kai Huang
@ 2021-01-07  5:02       ` Dave Hansen
  2021-01-15 14:07         ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2021-01-07  5:02 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On 1/6/21 5:42 PM, Kai Huang wrote:
>> I understand why this made sense for regular enclaves, but I'm having a
>> harder time here.  If you mmap(fd, MAP_SHARED), fork(), and then pass
>> that mapping through to two different guests, you get to hold the
>> pieces, just like if you did the same with normal memory.
>>
>> Why does the kernel need to enforce this policy?
> Does Sean's reply in another email satisfy you?

I'm not totally convinced.

Please give it a go in the changelog for the next one and try to
convince me that this is a good idea.  Focus on what the downsides will
be if the kernel does not enforce this policy.  What will break, and why
will it be bad?  Why is the kernel in the best position to thwart the
badness?
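
To make the question concrete, this is the policy described in the
sgx_virt_epc_open() comment, reduced to a userspace model: the file remembers
the mm that opened it, and mmap() from any other mm, such as a forked child's,
is refused. struct mm_model, vepc_open(), vepc_mmap() and the -EPERM return
value are assumptions for illustration; the patch's actual mmap() check is not
quoted in this thread.

```c
#include <errno.h>

/* Stand-in for struct mm_struct: only its identity (address) matters. */
struct mm_model { int id; };

struct vepc_model {
	const struct mm_model *owner;	/* mm that opened the file */
};

/* open(): record the caller's mm (the patch does mmgrab(current->mm)). */
void vepc_open(struct vepc_model *v, const struct mm_model *current_mm)
{
	v->owner = current_mm;
}

/* mmap(): refuse to map the same virtual EPC from another address space. */
int vepc_mmap(const struct vepc_model *v, const struct mm_model *current_mm)
{
	return v->owner == current_mm ? 0 : -EPERM;
}
```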

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06 23:09     ` Kai Huang
@ 2021-01-07  6:41       ` Borislav Petkov
  2021-01-08  2:00         ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2021-01-07  6:41 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, dave.hansen,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Thu, Jan 07, 2021 at 12:09:46PM +1300, Kai Huang wrote:
> There's no urgent request to support them for now (and given basic SGX
> virtualization is not in upstream), but I don't know whether they need to be
> supported in the future.

If that is the case, then wasting a whole leaf for two bits doesn't make
too much sense. And it looks like the kvm reverse lookup can be taught
to deal with composing that leaf dynamically when needed instead.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-07  1:50       ` Kai Huang
@ 2021-01-07 16:14         ` Sean Christopherson
  2021-01-08  2:16           ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Sean Christopherson @ 2021-01-07 16:14 UTC (permalink / raw)
  To: Kai Huang
  Cc: Dave Hansen, linux-sgx, kvm, x86, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, jmattson, joro,
	vkuznets, wanpengli, corbet

On Thu, Jan 07, 2021, Kai Huang wrote:
> On Wed, 6 Jan 2021 16:48:58 -0800 Dave Hansen wrote:
> > Your series adds some new interfaces, including /dev/sgx_virt_epc.  If
> > the kernel wants to add oversubscription in the future, will old binary
> > application users of /dev/sgx_virt_epc be able to support
> > oversubscription?  Or, would users of /dev/sgx_virt_epc need to change
> > to support oversubscription?
> 
> Oversubscription will be completely done in kernel/kvm, and will be
> transparent to userspace, so it will not impact ABI.

It's not transparent to userspace; odds are very good that userspace would want
to opt in/out of EPC reclaim for its VMs, e.g. for cases where it would be
preferable to fail to launch a VM than to degrade performance.

That being said, there are no anticipated /dev/sgx_virt_epc ABI changes to
support reclaim, as the ABI changes will be in KVM.  In the KVM oversubscription
POC, I added a KVM ioctl to allow enabling EPC reclaim/oversubscription.  That
ioctl took a fd for a /dev/sgx_virt_epc instance.

The reason for routing through KVM was to solve two dependency issues:

  - KVM needs a reference to the virt_epc instance to handle SGX_CONFLICT VM-Exits

  - The SGX subsystem needs to be able to translate GPAs to HVAs to retrieve the
    SECS for a page it is reclaiming.  That requires a KVM instance and helper
    function.

Routing the ioctl through KVM allows KVM to hand over a pointer to itself along
with a GPA->HVA helper, and the SGX subsystem in turn can hand back the virt_epc
instance resolved from the fd.

It would be possible to invert the flow, e.g. pass in a KVM fd to a new
/dev/sgx_virt_epc ioctl, but I suspect that would be kludgier, and in any case
it would be a new ioctl and so would not break existing users.
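
The handshake described above can be sketched as KVM registering itself plus a
GPA->HVA callback with the virtual EPC instance behind an fd, and getting that
instance back for SGX_CONFLICT handling. Every name here (kvm_model,
virt_epc_model, fd_table, kvm_enable_epc_reclaim()) is hypothetical and the fd
lookup is a toy array; the POC's real ioctl and helpers are not shown in this
thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct kvm_model;

/* Callback KVM provides so the SGX side can find a guest page (e.g. a SECS). */
typedef uintptr_t (*gpa_to_hva_fn)(struct kvm_model *kvm, uint64_t gpa);

struct virt_epc_model {
	struct kvm_model *kvm;		/* owning VM, set when reclaim is enabled */
	gpa_to_hva_fn gpa_to_hva;	/* used while reclaiming a guest's EPC */
};

struct kvm_model {
	uintptr_t hva_base;		/* toy linear GPA->HVA mapping */
	struct virt_epc_model *vepc;	/* needed to handle SGX_CONFLICT exits */
};

/* Toy stand-in for resolving a /dev/sgx_virt_epc file descriptor. */
#define MAX_FDS 8
static struct virt_epc_model *fd_table[MAX_FDS];

uintptr_t toy_gpa_to_hva(struct kvm_model *kvm, uint64_t gpa)
{
	return kvm->hva_base + (uintptr_t)gpa;
}

/* SGX side: resolve @fd, remember KVM and its helper, hand the instance back. */
struct virt_epc_model *virt_epc_attach_kvm(int fd, struct kvm_model *kvm,
					   gpa_to_hva_fn fn)
{
	if (fd < 0 || fd >= MAX_FDS || !fd_table[fd])
		return NULL;
	fd_table[fd]->kvm = kvm;
	fd_table[fd]->gpa_to_hva = fn;
	return fd_table[fd];
}

/* KVM side of a hypothetical "enable EPC reclaim" ioctl taking the fd. */
int kvm_enable_epc_reclaim(struct kvm_model *kvm, int fd, gpa_to_hva_fn fn)
{
	kvm->vepc = virt_epc_attach_kvm(fd, kvm, fn);
	return kvm->vepc ? 0 : -EBADF;
}
```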

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-07  6:41       ` Borislav Petkov
@ 2021-01-08  2:00         ` Kai Huang
  2021-01-08  5:10           ` Dave Hansen
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-08  2:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, dave.hansen,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Thu, 7 Jan 2021 07:41:25 +0100 Borislav Petkov wrote:
> On Thu, Jan 07, 2021 at 12:09:46PM +1300, Kai Huang wrote:
> > There's no urgent request to support them for now (and given basic SGX
> > virtualization is not in upstream), but I don't know whether they need to be
> > supported in the future.
> 
> If that is the case, then wasting a whole leaf for two bits doesn't make
> too much sense. And it looks like the kvm reverse lookup can be taught
> to deal with composing that leaf dynamically when needed instead.

I am not sure changing reverse lookup to handle dynamic leaves would be
acceptable. To me it is ugly, and I don't see at first glance how to do it. KVM
can query host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not
as straightforward as having X86_FEATURE_SGX1/2.

And as Sean pointed out, the SGX1 bit is also needed by both the SGX driver and
init_ia32_feat_ctl():

	https://www.spinics.net/lists/kvm/msg231973.html

So having it would make things easier.

And regarding the other bits of this leaf: 1) we cannot rule out the
possibility that bits 5 and 6 will be supported in the future; 2) I cannot say
more, but we cannot rule out the possibility that other bits will be
introduced in the future.
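
For a sense of how sparse the leaf currently is, the sketch below decodes
CPUID.(EAX=12H,ECX=0):EAX. The bit assignments are assumed from the SDM (bit 0
SGX1, bit 1 SGX2, bit 5 ENCLV leaves, bit 6 the ETRACKC/ERDINFO/ELDBC/ELDUC
ENCLS leaves) and anything else is reported as unknown. struct sgx_leaf_caps
and decode_sgx_leaf_eax() are hypothetical; on real hardware the raw EAX value
could come from GCC's __get_cpuid_count().

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of CPUID.(EAX=0x12, ECX=0):EAX, as assumed from the SDM. */
#define SGX_LEAF_SGX1		(1u << 0)
#define SGX_LEAF_SGX2		(1u << 1)
#define SGX_LEAF_ENCLV		(1u << 5)	/* EINCVIRTCHILD, EDECVIRTCHILD, ESETCONTEXT */
#define SGX_LEAF_ENCLS_C	(1u << 6)	/* ETRACKC, ERDINFO, ELDBC, ELDUC */

struct sgx_leaf_caps {
	bool sgx1, sgx2, enclv, encls_c;
	uint32_t unknown;	/* set bits this sketch does not know about */
};

struct sgx_leaf_caps decode_sgx_leaf_eax(uint32_t eax)
{
	const uint32_t known = SGX_LEAF_SGX1 | SGX_LEAF_SGX2 |
			       SGX_LEAF_ENCLV | SGX_LEAF_ENCLS_C;
	struct sgx_leaf_caps c = {
		.sgx1    = eax & SGX_LEAF_SGX1,
		.sgx2    = eax & SGX_LEAF_SGX2,
		.enclv   = eax & SGX_LEAF_ENCLV,
		.encls_c = eax & SGX_LEAF_ENCLS_C,
		.unknown = eax & ~known,
	};
	return c;
}
```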

Sean, what do you think?

> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-07 16:14         ` Sean Christopherson
@ 2021-01-08  2:16           ` Kai Huang
  0 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-08  2:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dave Hansen, linux-sgx, kvm, x86, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, jmattson, joro,
	vkuznets, wanpengli, corbet

On Thu, 7 Jan 2021 08:14:44 -0800 Sean Christopherson wrote:
> On Thu, Jan 07, 2021, Kai Huang wrote:
> > On Wed, 6 Jan 2021 16:48:58 -0800 Dave Hansen wrote:
> > > Your series adds some new interfaces, including /dev/sgx_virt_epc.  If
> > > the kernel wants to add oversubscription in the future, will old binary
> > > application users of /dev/sgx_virt_epc be able to support
> > > oversubscription?  Or, would users of /dev/sgx_virt_epc need to change
> > > to support oversubscription?
> > 
> > Oversubscription will be completely done in kernel/kvm, and will be
> > transparent to userspace, so it will not impact ABI.
> 
> It's not transparent to userspace; odds are very good that userspace would want
> to opt in/out of EPC reclaim for its VMs, e.g. for cases where it would be
> preferable to fail to launch a VM than to degrade performance.

It seems a reasonable use case, but I don't immediately see how it requires a
new ABI related to virtualization. For instance, the SGX driver should expose
sysfs saying how frequently EPC swapping happens (with KVM oversubscription, I
think the host SGX code should provide such info as a whole), and the cloud
admin can determine whether to launch a new VM.

Another argument is that, theoretically, the cloud admin may not know how EPC
will be used in the guest. The guest may potentially use very little EPC, in
which case creating a new VM won't hurt much, so if we want to upstream KVM
oversubscription one day, I am not sure we need to consider such a case.
 
> 
> That being said, there are no anticipated /dev/sgx_virt_epc ABI changes to
> support reclaim, as the ABI changes will be in KVM.  In the KVM oversubscription
> POC, I added a KVM ioctl to allow enabling EPC reclaim/oversubscription.  That
> ioctl took a fd for a /dev/sgx_virt_epc instance.

Adding an IOCTL to enable/disable oversubscription for a particular VM seems
use-case dependent, and I am not sure whether we need to support that if we
want to upstream oversubscription one day. To me, it makes sense to upstream
*basic* oversubscription (which just supports reclaiming EPC from a VM) first,
and then extend it if needed according to use cases.

Anyway, as you mentioned, oversubscription won't break the existing ABI.

> 
> The reason for routing through KVM was to solve two dependency issues:
> 
>   - KVM needs a reference to the virt_epc instance to handle SGX_CONFLICT VM-Exits
> 
>   - The SGX subsystem needs to be able to translate GPAs to HVAs to retrieve the
>     SECS for a page it is reclaiming.  That requires a KVM instance and helper
>     function.
> 
> Routing the ioctl through KVM allows KVM to hand over a pointer to itself along
> with a GPA->HVA helper, and the SGX subsystem in turn can hand back the virt_epc
> instance resolved from the fd.
> 
> It would be possible to invert the flow, e.g. pass in a KVM fd to a new
> /dev/sgx_virt_epc ioctl, but I suspect that would be kludgier, and in any case
> it would be a new ioctl and so would not break existing users.

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-08  2:00         ` Kai Huang
@ 2021-01-08  5:10           ` Dave Hansen
  2021-01-08  7:03             ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2021-01-08  5:10 UTC (permalink / raw)
  To: Kai Huang, Borislav Petkov
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, tglx, mingo, hpa

On 1/7/21 6:00 PM, Kai Huang wrote:
> On Thu, 7 Jan 2021 07:41:25 +0100 Borislav Petkov wrote:
>> On Thu, Jan 07, 2021 at 12:09:46PM +1300, Kai Huang wrote:
>>> There's no urgent request to support them for now (and given basic SGX
>>> virtualization is not in upstream), but I don't know whether they need to be
>>> supported in the future.
>>
>> If that is the case, then wasting a whole leaf for two bits doesn't make
>> too much sense. And it looks like the kvm reverse lookup can be taught
>> to deal with composing that leaf dynamically when needed instead.
> 
> I am not sure changing reverse lookup to handle dynamic leaves would be
> acceptable. To me it is ugly, and I don't see at first glance how to do it. KVM
> can query host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not
> as straightforward as having X86_FEATURE_SGX1/2.

So, Boris was pretty direct here.  Could you please go spend a bit of
time to see what it would take to make these dynamic?  You can check
what our (Intel) plans are for this leaf, but if it's going to remain
sparsely-used, we need to look into making the leaves a bit more dynamic.

> And regarding the other bits of this leaf: 1) we cannot rule out the
> possibility that bits 5 and 6 will be supported in the future; 2) I cannot say
> more, but we cannot rule out the possibility that other bits will be
> introduced in the future.

From the Intel side, let's go look at the features that are coming.  We
have a list of CPUID bits that have been dedicated to future CPU
features.  Let's see if 1 of these bits or 30 is coming.  I don't think
it's exactly a state secret approximately how many CPUID bits we *think*
will get used in this leaf.

We can't exactly put together a roadmap of bits, microarchitectures and
chip release dates.  But, we can at least say, "we have immediate plans
for most of the leaf" or "we don't plan to fill up the leaf any time soon."

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-08  5:10           ` Dave Hansen
@ 2021-01-08  7:03             ` Kai Huang
  2021-01-08  7:17               ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-08  7:03 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Borislav Petkov, linux-sgx, kvm, x86, seanjc, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Thu, 7 Jan 2021 21:10:29 -0800 Dave Hansen wrote:
> On 1/7/21 6:00 PM, Kai Huang wrote:
> > On Thu, 7 Jan 2021 07:41:25 +0100 Borislav Petkov wrote:
> >> On Thu, Jan 07, 2021 at 12:09:46PM +1300, Kai Huang wrote:
> >>> There's no urgent request to support them for now (and given basic SGX
> >>> virtualization is not in upstream), but I don't know whether they need to be
> >>> supported in the future.
> >>
> >> If that is the case, then wasting a whole leaf for two bits doesn't make
> >> too much sense. And it looks like the kvm reverse lookup can be taught
> >> to deal with composing that leaf dynamically when needed instead.
> > 
> > I am not sure changing reverse lookup to handle dynamic leaves would be
> > acceptable. To me it is ugly, and I don't see at first glance how to do it. KVM
> > can query host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not
> > as straightforward as having X86_FEATURE_SGX1/2.
> 
> So, Boris was pretty direct here.  Could you please go spend a bit of
> time to see what it would take to make these dynamic?  You can check
> what our (Intel) plans are for this leaf, but if it's going to remain
> sparsely-used, we need to look into making the leaves a bit more dynamic.

I don't think reverse lookup can be made dynamic, but like I said, if we don't
have X86_FEATURE_SGX1/2, KVM needs to query raw CPUID when dealing with SGX.

The purpose of reverse lookup is to simplify KVM by having one common helper
that checks whether the guest's CPUID has a particular hardware feature bit.
For instance, it changes guest_cpuid_has_xxx(cpuid) to guest_cpuid_has(cpuid,
X86_FEATURE_xxx), so KVM can get rid of a bunch of dedicated
guest_cpuid_has_xxx() helpers, one per feature, and just use X86_FEATURE_xxx
with one function. W/o X86_FEATURE_SGX1/2, KVM needs dedicated functions for
them and cannot use the common one. That is a drawback for KVM.
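
The reverse-lookup idea can be modelled in a few lines: a table maps each
hardware-defined feature word to the CPUID function, index and register it
mirrors, so one generic helper serves every feature constant. The word numbers,
the toy guest CPUID table and model_guest_cpuid_has() are assumptions for
illustration, not KVM's actual code (KVM rejects unmapped words at build time
rather than at run time).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum cpuid_reg_id { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

/* Where a feature word's bits come from in hardware CPUID. */
struct cpuid_reg {
	uint32_t function;
	uint32_t index;
	enum cpuid_reg_id reg;
};

/* Hypothetical word numbers: 9 for CPUID.7.0:EBX, 19 for CPUID.0x12.0:EAX. */
enum { WORD_7_0_EBX = 9, WORD_12_EAX = 19 };

#define FEAT_SGX	(WORD_7_0_EBX * 32 + 2)
#define FEAT_SGX1	(WORD_12_EAX * 32 + 0)
#define FEAT_SGX2	(WORD_12_EAX * 32 + 1)

static const struct cpuid_reg reverse_table[] = {
	[WORD_7_0_EBX] = { 7, 0, REG_EBX },
	[WORD_12_EAX]  = { 0x12, 0, REG_EAX },
};

/* One entry of a toy guest CPUID table. */
struct cpuid_entry {
	uint32_t function, index;
	uint32_t regs[4];	/* indexed by enum cpuid_reg_id */
};

/* Single generic helper instead of per-feature guest_cpuid_has_xxx() helpers. */
bool model_guest_cpuid_has(const struct cpuid_entry *e, size_t n, int feature)
{
	size_t word = (size_t)feature / 32;
	const struct cpuid_reg *r;

	/* Only hardware-defined words present in the table are valid. */
	if (word >= sizeof(reverse_table) / sizeof(reverse_table[0]) ||
	    reverse_table[word].function == 0)
		return false;
	r = &reverse_table[word];

	for (size_t i = 0; i < n; i++)
		if (e[i].function == r->function && e[i].index == r->index)
			return e[i].regs[r->reg] & (1u << (feature & 31));
	return false;
}
```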

Btw, one thing I forgot to say is that with X86_FEATURE_SGX1/2, "sgx1" and
"sgx2" will be in /proc/cpuinfo automatically. I think showing "sgx2" (and
other future SGX features) in /proc/cpuinfo is helpful. W/o
X86_FEATURE_SGX1/2, we need special handling if we want to show them in
/proc/cpuinfo.

That being said, if all this doesn't convince Boris and you guys, and Sean has
nothing to add here, I'll remove X86_FEATURE_SGX1/2 in the next version.

> 
> > And regarding the other bits of this leaf: 1) we cannot rule out the
> > possibility that bits 5 and 6 will be supported in the future; 2) I cannot say
> > more, but we cannot rule out the possibility that other bits will be
> > introduced in the future.
> 
> From the Intel side, let's go look at the features that are coming.  We
> have a list of CPUID bits that have been dedicated to future CPU
> features.  Let's see if 1 of these bits or 30 is coming.  I don't think
> it's exactly a state secret approximately how many CPUID bits we *think*
> will get used in this leaf.
> 
> We can't exactly put together a roadmap of bits, microarchitectures and
> chip release dates.  But, we can at least say, "we have immediate plans
> for most of the leaf" or "we don't plan to fill up the leaf any time soon."

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-08  7:03             ` Kai Huang
@ 2021-01-08  7:17               ` Borislav Petkov
  2021-01-08  8:06                 ` Kai Huang
  2021-01-08 23:55                 ` Sean Christopherson
  0 siblings, 2 replies; 111+ messages in thread
From: Borislav Petkov @ 2021-01-08  7:17 UTC (permalink / raw)
  To: Kai Huang
  Cc: Dave Hansen, linux-sgx, kvm, x86, seanjc, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Fri, Jan 08, 2021 at 08:03:50PM +1300, Kai Huang wrote:
> > > I am not sure changing reverse lookup to handle dynamic leaves would be
> > > acceptable. To me it is ugly, and I don't see at first glance how to do it. KVM
> > > can query host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not
> > > as straightforward as having X86_FEATURE_SGX1/2.
> > 
> > So, Boris was pretty direct here.  Could you please go spend a bit of
> > time to see what it would take to make these dynamic?  You can check
> > what our (Intel) plans are for this leaf, but if it's going to remain
> > sparsely-used, we need to look into making the leaves a bit more dynamic.
> 
> I don't think reverse lookup can be made dynamic, but like I said, if we don't
> have X86_FEATURE_SGX1/2, KVM needs to query raw CPUID when dealing with SGX.

How about before you go and say that "it is ugly" and "don't think can
be made" you actually go and *really* try it first? Because actually
trying is sometimes faster than trying to find arguments against it. :)

Because I just did it and unless I'm missing something obvious - I
haven't actually tested it - this is not ugly at all and in the long run
it will become one big switch-case, which is perfectly fine.

---
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 59bf91c57aa8..0bf5cb5441f8 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -30,6 +30,7 @@ enum cpuid_leafs
 	CPUID_7_ECX,
 	CPUID_8000_0007_EBX,
 	CPUID_7_EDX,
+	CPUID_12_EAX,	/* used only by KVM for now */
 };
 
 #ifdef CONFIG_X86_FEATURE_NAMES
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..1bc1ade64489 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -292,6 +292,8 @@
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
 #define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 #define X86_FEATURE_PER_THREAD_MBA	(11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
+#define X86_FEATURE_SGX1		(11*32+ 8) /* SGX1 leaf functions */
+#define X86_FEATURE_SGX2		(11*32+ 9) /* SGX2 leaf functions */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dc921d76e42e..33c53a7411a1 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -63,8 +63,27 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
 	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
 	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
+	[CPUID_12_EAX]        = {      0x12, 0, CPUID_EAX},
 };
 
+/*
+ * Map a synthetic X86_FEATURE bit definition to the corresponding bit in the
+ * hardware CPUID leaf.
+ */
+static int map_synthetic_leaf(int x86_feature)
+{
+	switch (x86_feature) {
+	case X86_FEATURE_SGX1:	return BIT(0);
+	case X86_FEATURE_SGX2:	return BIT(1);
+	default:
+		break;
+	}
+
+	WARN_ON_ONCE(1);
+
+	return 0;
+}
+
 /*
  * Reverse CPUID and its derivatives can only be used for hardware-defined
  * feature words, i.e. words whose bits directly correspond to a CPUID leaf.
@@ -78,7 +97,6 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
 	BUILD_BUG_ON(x86_leaf == CPUID_LNX_1);
 	BUILD_BUG_ON(x86_leaf == CPUID_LNX_2);
 	BUILD_BUG_ON(x86_leaf == CPUID_LNX_3);
-	BUILD_BUG_ON(x86_leaf == CPUID_LNX_4);
 	BUILD_BUG_ON(x86_leaf >= ARRAY_SIZE(reverse_cpuid));
 	BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
 }
@@ -91,8 +109,14 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
  */
 static __always_inline u32 __feature_bit(int x86_feature)
 {
-	reverse_cpuid_check(x86_feature / 32);
-	return 1 << (x86_feature & 31);
+	int leaf = x86_feature / 32;
+
+	reverse_cpuid_check(leaf);
+
+	if (leaf == CPUID_LNX_4)
+		return map_synthetic_leaf(x86_feature);
+
+	return BIT(x86_feature & 31);
 }
 
 #define feature_bit(name)  __feature_bit(X86_FEATURE_##name)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-08  7:17               ` Borislav Petkov
@ 2021-01-08  8:06                 ` Kai Huang
  2021-01-08  8:13                   ` Borislav Petkov
  2021-01-08 23:55                 ` Sean Christopherson
  1 sibling, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-08  8:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, linux-sgx, kvm, x86, seanjc, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Fri, 8 Jan 2021 08:17:22 +0100 Borislav Petkov wrote:
> On Fri, Jan 08, 2021 at 08:03:50PM +1300, Kai Huang wrote:
> > > > I am not sure changing reverse lookup to handle dynamic would be acceptable. To
> > > > me it is ugly, and I don't have a first glance on how to do it. KVM can query
> > > > host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not as
> > > > straightforward as having X86_FEATURE_SGX1/2.
> > > 
> > > So, Boris was pretty direct here.  Could you please go spend a bit of
> > > time to see what it would take to make these dynamic?  You can check
> > > what our (Intel) plans are for this leaf, but if it's going to remain
> > > sparsely-used, we need to look into making the leaves a bit more dynamic.
> > 
> > I don't think reverse lookup can be made dynamic, but like I said if we don't
> > have X86_FEATURE_SGX1/2, KVM needs to query raw CPUID when dealing with SGX.
> 
> How about before you go and say that "it is ugly" and "don't think can
> be made" you actually go and *really* try it first? Because actually
> trying is sometimes faster than trying to find arguments against it. :)

Thanks. Lesson learned :)

> 
> Because I just did it and unless I'm missing something obvious - I
> haven't actually tested it - this is not ugly at all and in the long run
> it will become one big switch-case, which is perfectly fine.

No offence, but using synthetic bits is a bit of a hack to me, given they are
actually hardware feature bits. And using a synthetic leaf in reverse lookup is
against current KVM code.

I'll try my own way in the next version, but thank you for the insight! :)

> 
> ---
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 59bf91c57aa8..0bf5cb5441f8 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -30,6 +30,7 @@ enum cpuid_leafs
>  	CPUID_7_ECX,
>  	CPUID_8000_0007_EBX,
>  	CPUID_7_EDX,
> +	CPUID_12_EAX,	/* used only by KVM for now */
>  };
>  
>  #ifdef CONFIG_X86_FEATURE_NAMES
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 84b887825f12..1bc1ade64489 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -292,6 +292,8 @@
>  #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
>  #define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
>  #define X86_FEATURE_PER_THREAD_MBA	(11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
> +#define X86_FEATURE_SGX1		(11*32+ 8) /* SGX1 leaf functions */
> +#define X86_FEATURE_SGX2		(11*32+ 9) /* SGX2 leaf functions */
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index dc921d76e42e..33c53a7411a1 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -63,8 +63,27 @@ static const struct cpuid_reg reverse_cpuid[] = {
>  	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
>  	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
>  	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
> +	[CPUID_12_EAX]        = {      0x12, 0, CPUID_EAX},
>  };
>  
> +/*
> + * Map a synthetic X86_FEATURE bit definition to the corresponding bit in the
> + * hardware CPUID leaf.
> + */
> +static int map_synthetic_leaf(int x86_feature)
> +{
> +	switch (x86_feature) {
> +	case X86_FEATURE_SGX1:	return BIT(0);
> +	case X86_FEATURE_SGX2:	return BIT(1);
> +	default:
> +		break;
> +	}
> +
> +	WARN_ON_ONCE(1);
> +
> +	return 0;
> +}
> +
>  /*
>   * Reverse CPUID and its derivatives can only be used for hardware-defined
>   * feature words, i.e. words whose bits directly correspond to a CPUID leaf.
> @@ -78,7 +97,6 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
>  	BUILD_BUG_ON(x86_leaf == CPUID_LNX_1);
>  	BUILD_BUG_ON(x86_leaf == CPUID_LNX_2);
>  	BUILD_BUG_ON(x86_leaf == CPUID_LNX_3);
> -	BUILD_BUG_ON(x86_leaf == CPUID_LNX_4);
>  	BUILD_BUG_ON(x86_leaf >= ARRAY_SIZE(reverse_cpuid));
>  	BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
>  }
> @@ -91,8 +109,14 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
>   */
>  static __always_inline u32 __feature_bit(int x86_feature)
>  {
> -	reverse_cpuid_check(x86_feature / 32);
> -	return 1 << (x86_feature & 31);
> +	int leaf = x86_feature / 32;
> +
> +	reverse_cpuid_check(leaf);
> +
> +	if (leaf == CPUID_LNX_4)
> +		return map_synthetic_leaf(x86_feature);
> +
> +	return BIT(x86_feature & 31);
>  }
>  
>  #define feature_bit(name)  __feature_bit(X86_FEATURE_##name)
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-08  8:06                 ` Kai Huang
@ 2021-01-08  8:13                   ` Borislav Petkov
  2021-01-08  9:00                     ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2021-01-08  8:13 UTC (permalink / raw)
  To: Kai Huang
  Cc: Dave Hansen, linux-sgx, kvm, x86, seanjc, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Fri, Jan 08, 2021 at 09:06:47PM +1300, Kai Huang wrote:
> No offence, but using synthetic bits is a bit of a hack to me, given
> they are actually hardware feature bits.

Why?

Perhaps you need to have a look at Documentation/x86/cpuinfo.rst first.

> And using a synthetic leaf in reverse lookup is against current KVM
> code.

You know how the kernel gets improved each day and old limitations are
not valid anymore?

> I'll try my own way in the next version, but thank you for the insight! :)

Feel free but remember to keep it simple. You can use mine too, if you
want to, as long as you attribute it with a Suggested-by or so.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-08  8:13                   ` Borislav Petkov
@ 2021-01-08  9:00                     ` Kai Huang
  0 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-08  9:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, linux-sgx, kvm, x86, seanjc, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Fri, 8 Jan 2021 09:13:14 +0100 Borislav Petkov wrote:
> On Fri, Jan 08, 2021 at 09:06:47PM +1300, Kai Huang wrote:
> > No offence, but using synthetic bits is a bit of a hack to me, given
> > they are actually hardware feature bits.
> 
> Why?
> 
> Perhaps you need to have a look at Documentation/x86/cpuinfo.rst first.

Will take a look. Thanks.

> 
> > And using a synthetic leaf in reverse lookup is against current KVM
> > code.
> 
> You know how the kernel gets improved each day and old limitations are
> not valid anymore?
> 
> > I'll try my own way in the next version, but thank you for the insight! :)
> 
> Feel free but remember to keep it simple. You can use mine too, if you
> want to, as long as you attribute it with a Suggested-by or so.

OK. Thanks. 

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-08  7:17               ` Borislav Petkov
  2021-01-08  8:06                 ` Kai Huang
@ 2021-01-08 23:55                 ` Sean Christopherson
  2021-01-09  0:35                   ` Borislav Petkov
  2021-01-09  1:19                   ` Borislav Petkov
  1 sibling, 2 replies; 111+ messages in thread
From: Sean Christopherson @ 2021-01-08 23:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Fri, Jan 08, 2021, Borislav Petkov wrote:
> On Fri, Jan 08, 2021 at 08:03:50PM +1300, Kai Huang wrote:
> > > > I am not sure changing reverse lookup to handle dynamic would be acceptable. To
> > > > me it is ugly, and I don't have a first glance on how to do it. KVM can query
> > > > host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not as
> > > > straightforward as having X86_FEATURE_SGX1/2.
> > > 
> > > So, Boris was pretty direct here.  Could you please go spend a bit of
> > > time to see what it would take to make these dynamic?  You can check
> > > what our (Intel) plans are for this leaf, but if it's going to remain
> > > sparsely-used, we need to look into making the leaves a bit more dynamic.

To be fair, this is the third time we've got conflicting, direct feedback on
this exact issue.  I do agree that it doesn't make sense to burn a whole word
for just two features, I guess I just feel like whining.

[*] https://lore.kernel.org/kvm/20180828102140.GA31102@nazgul.tnic/
[*] https://lore.kernel.org/linux-sgx/20190924162520.GJ19317@zn.tnic/

> > I don't think reverse lookup can be made dynamic, but like I said if we don't
> > have X86_FEATURE_SGX1/2, KVM needs to query raw CPUID when dealing with SGX.
> 
> How about before you go and say that "it is ugly" and "don't think can
> be made" you actually go and *really* try it first? Because actually
> trying is sometimes faster than trying to find arguments against it. :)
> 
> Because I just did it and unless I'm missing something obvious - I
> haven't actually tested it - this is not ugly at all and in the long run
> it will become one big switch-case, which is perfectly fine.
>
> ---
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 59bf91c57aa8..0bf5cb5441f8 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -30,6 +30,7 @@ enum cpuid_leafs
>  	CPUID_7_ECX,
>  	CPUID_8000_0007_EBX,
>  	CPUID_7_EDX,
> +	CPUID_12_EAX,	/* used only by KVM for now */
>  };
>  
>  #ifdef CONFIG_X86_FEATURE_NAMES
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 84b887825f12..1bc1ade64489 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -292,6 +292,8 @@
>  #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
>  #define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
>  #define X86_FEATURE_PER_THREAD_MBA	(11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
> +#define X86_FEATURE_SGX1		(11*32+ 8) /* SGX1 leaf functions */
> +#define X86_FEATURE_SGX2		(11*32+ 9) /* SGX2 leaf functions */
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index dc921d76e42e..33c53a7411a1 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -63,8 +63,27 @@ static const struct cpuid_reg reverse_cpuid[] = {
>  	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
>  	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
>  	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
> +	[CPUID_12_EAX]        = {      0x12, 0, CPUID_EAX},
>  };

As is, this won't build (if KVM uses the features) because KVM cares about where
the feature actually lives in Linux's words.  The addition of CPUID_12_EAX is
unnecessary, and the new entry in reverse_cpuid would need to be
s/CPUID_12_EAX/CPUID_LNX_4.
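
A simplified userspace model of the mismatch described above. The constants are hard-coded assumptions mirroring the values in Boris's patch (not kernel headers), and CPUID_12_EAX_SGX1_BIT is a hypothetical name for the architectural bit position. KVM derives the reverse_cpuid[] index as x86_feature / 32, so X86_FEATURE_SGX1 resolves to word 11 (CPUID_LNX_4) and bit 8, whereas the hardware reports SGX1 in CPUID.0x12:EAX bit 0. An entry stored at CPUID_12_EAX is therefore never consulted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, copied from the thread rather than from real headers. */
#define CPUID_LNX_4            11              /* Linux-defined word 11 */
#define X86_FEATURE_SGX1       (11 * 32 + 8)   /* scattered: word 11, bit 8 */
#define X86_FEATURE_SGX2       (11 * 32 + 9)   /* scattered: word 11, bit 9 */
#define CPUID_12_EAX_SGX1_BIT  0               /* hypothetical: hardware bit */

/* Word (reverse_cpuid[] index) that KVM computes for a feature. */
static unsigned int feature_word(unsigned int x86_feature)
{
	return x86_feature / 32;
}

/* Bit mask within that word, taken from the low five bits. */
static uint32_t feature_bit(unsigned int x86_feature)
{
	return UINT32_C(1) << (x86_feature & 31);
}
```

With these definitions the lookup for SGX1 lands in word 11 with mask 1 << 8, which is neither the CPUID_12_EAX slot nor the bit the hardware sets.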

That being said, I dislike this approach as it introduces fragility into KVM's
CPUID shenanigans.  E.g. fixing the above will make guest_cpuid_has() functional,
but kvm_cpu_cap_mask() and cpuid_entry_override() will not work as expected.

Here's a more involved approach that I believe will work (compile tested only)
and retains KVM's build magic.  The idea is to allocate a word in kvm_cpu_caps
for the hardware-defined, Linux-scattered features, and use boot_cpu_has() to
bridge the gap when populating kvm_cpu_caps.

Another alternative would be to have KVM use boot_cpu_has() for everything, and
omit the memcpy from boot_cpu_data.x86_capability -> kvm_cpu_caps.  That would
eliminate some of the special logic for scattered features, but it adds nearly
3k bytes to kvm_set_cpu_caps(), which is hard to stomach even though it's
effectively one-and-done code.
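
A rough userspace model of the translation step in the patch below. It assumes NCAPINTS is 19 and reuses the SGX values from this thread; KVM_X86_FEATURE_SGX1/2 stand in for the patch's __X86_FEATURE_SGX1/2, and the helper names are illustrative only. The scattered kernel bit is remapped into a KVM-only word placed right after the kernel's capability words, so both the word index and the bit line up with CPUID.0x12:EAX.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; not taken from real kernel headers. */
#define NCAPINTS          19              /* assumed number of kernel words */
#define X86_FEATURE_SGX1  (11 * 32 + 8)   /* Linux-scattered, word 11 bit 8 */
#define X86_FEATURE_SGX2  (11 * 32 + 9)   /* Linux-scattered, word 11 bit 9 */

/* KVM-only words live after the kernel's words, as in the patch. */
enum kvm_only_cpuid_leafs {
	CPUID_12_EAX = NCAPINTS,
	NR_KVM_CPU_CAPS,
};

#define X86_KVM_FEATURE(w, f)  ((w) * 32 + (f))
#define KVM_X86_FEATURE_SGX1   X86_KVM_FEATURE(CPUID_12_EAX, 0)
#define KVM_X86_FEATURE_SGX2   X86_KVM_FEATURE(CPUID_12_EAX, 1)

/* Map a scattered kernel feature to its hardware-aligned KVM position. */
static unsigned int feature_translate(unsigned int x86_feature)
{
	if (x86_feature == X86_FEATURE_SGX1)
		return KVM_X86_FEATURE_SGX1;
	if (x86_feature == X86_FEATURE_SGX2)
		return KVM_X86_FEATURE_SGX2;
	return x86_feature;	/* all other features are already aligned */
}

/* Word index in kvm_cpu_caps[] after translation. */
static unsigned int feature_leaf(unsigned int x86_feature)
{
	return feature_translate(x86_feature) / 32;
}

/* Bit mask within that word after translation. */
static uint32_t feature_bit(unsigned int x86_feature)
{
	return UINT32_C(1) << (feature_translate(x86_feature) & 31);
}
```

Features that are not scattered pass through unchanged, which is why the existing hardware-defined words keep working without any per-feature entries.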


From 6bdd61e23f1c0bd7519a3a6391c95cde5456f79d Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Fri, 8 Jan 2021 15:46:11 -0800
Subject: [PATCH] KVM: x86: Add support for reverse CPUID lookup of scattered
 features

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/cpufeatures.h |  2 ++
 arch/x86/kvm/cpuid.c               | 36 ++++++++++++++++++---
 arch/x86/kvm/cpuid.h               | 50 +++++++++++++++++++++++++++---
 3 files changed, 78 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 9f9e9511f7cd..2fe57736d644 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -291,6 +291,8 @@
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
 #define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 #define X86_FEATURE_PER_THREAD_MBA	(11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
+#define X86_FEATURE_SGX1		(11*32+ 8) /* SGX1 leafs */
+#define X86_FEATURE_SGX2		(11*32+ 9) /* SGX2 leafs */

 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 13036cf0b912..4e647524f302 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -28,7 +28,7 @@
  * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
  * aligned to sizeof(unsigned long) because it's not accessed via bitops.
  */
-u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
+u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 EXPORT_SYMBOL_GPL(kvm_cpu_caps);

 static u32 xstate_required_size(u64 xstate_bv, bool compacted)
@@ -53,6 +53,7 @@ static u32 xstate_required_size(u64 xstate_bv, bool compacted)
 }

 #define F feature_bit
+#define SF(name) (boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0)

 static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
 	struct kvm_cpuid_entry2 *entries, int nent, u32 function, u32 index)
@@ -331,13 +332,13 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
 	return r;
 }

-static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
+/* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
+static __always_inline void __kvm_cpu_cap_mask(enum cpuid_leafs leaf)
 {
 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
 	struct kvm_cpuid_entry2 entry;

 	reverse_cpuid_check(leaf);
-	kvm_cpu_caps[leaf] &= mask;

 	cpuid_count(cpuid.function, cpuid.index,
 		    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
@@ -345,6 +346,26 @@ static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
 	kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
 }

+static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
+{
+	/* Use the "init" variant for scattered leafs. */
+	BUILD_BUG_ON(leaf >= NCAPINTS);
+
+	kvm_cpu_caps[leaf] &= mask;
+
+	__kvm_cpu_cap_mask(leaf);
+}
+
+static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
+{
+	/* Use the "mask" variant for hardware-defined leafs. */
+	BUILD_BUG_ON(leaf < NCAPINTS);
+
+	kvm_cpu_caps[leaf] = mask;
+
+	__kvm_cpu_cap_mask(leaf);
+}
+
 void kvm_set_cpu_caps(void)
 {
 	unsigned int f_nx = is_efer_nx() ? F(NX) : 0;
@@ -355,12 +376,13 @@ void kvm_set_cpu_caps(void)
 	unsigned int f_gbpages = 0;
 	unsigned int f_lm = 0;
 #endif
+	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));

-	BUILD_BUG_ON(sizeof(kvm_cpu_caps) >
+	BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
 		     sizeof(boot_cpu_data.x86_capability));

 	memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
-	       sizeof(kvm_cpu_caps));
+	       sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)));

 	kvm_cpu_cap_mask(CPUID_1_ECX,
 		/*
@@ -503,6 +525,10 @@ void kvm_set_cpu_caps(void)
 		F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
 		F(PMM) | F(PMM_EN)
 	);
+
+	kvm_cpu_cap_init(CPUID_12_EAX,
+		SF(SGX1) | SF(SGX2)
+	);
 }
 EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dc921d76e42e..21f92d81d5a5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -7,7 +7,25 @@
 #include <asm/processor.h>
 #include <uapi/asm/kvm_para.h>

-extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
+/*
+ * Hardware-defined CPUID leafs that are scattered in the kernel, but need to
+ * be used directly by KVM.  Note, these word values conflict with the kernel's
+ * "bug" caps, but KVM doesn't use those.
+ */
+enum kvm_only_cpuid_leafs {
+	CPUID_12_EAX	 = NCAPINTS,
+	NR_KVM_CPU_CAPS,
+
+	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
+};
+
+#define X86_KVM_FEATURE(w, f)		((w)*32 + (f))
+
+/* Intel-defined SGX sub-features, CPUID level 0x12 (EAX). */
+#define __X86_FEATURE_SGX1		X86_KVM_FEATURE(CPUID_12_EAX, 0)
+#define __X86_FEATURE_SGX2		X86_KVM_FEATURE(CPUID_12_EAX, 1)
+
+extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 void kvm_set_cpu_caps(void);

 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
@@ -63,6 +81,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
 	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
 	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
+	[CPUID_12_EAX]        = {0x00000012, 0, CPUID_EAX},
 };

 /*
@@ -83,6 +102,25 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
 	BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
 }

+/*
+ * A handful of feature bits are scattered in the kernel's cpufeatures word,
+ * translate them to KVM features that align with the hardware definitions.
+ */
+static __always_inline u32 __feature_translate(int x86_feature)
+{
+	if (x86_feature == X86_FEATURE_SGX1)
+		return __X86_FEATURE_SGX1;
+	else if (x86_feature == X86_FEATURE_SGX2)
+		return __X86_FEATURE_SGX2;
+
+	return x86_feature;
+}
+
+static __always_inline u32 __feature_leaf(int x86_feature)
+{
+	return __feature_translate(x86_feature) / 32;
+}
+
 /*
  * Retrieve the bit mask from an X86_FEATURE_* definition.  Features contain
  * the hardware defined bit number (stored in bits 4:0) and a software defined
@@ -91,6 +129,8 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
  */
 static __always_inline u32 __feature_bit(int x86_feature)
 {
+	x86_feature = __feature_translate(x86_feature);
+
 	reverse_cpuid_check(x86_feature / 32);
 	return 1 << (x86_feature & 31);
 }
@@ -99,7 +139,7 @@ static __always_inline u32 __feature_bit(int x86_feature)

 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);

 	reverse_cpuid_check(x86_leaf);
 	return reverse_cpuid[x86_leaf];
@@ -291,7 +331,7 @@ static inline bool cpuid_fault_enabled(struct kvm_vcpu *vcpu)

 static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);

 	reverse_cpuid_check(x86_leaf);
 	kvm_cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
@@ -299,7 +339,7 @@ static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)

 static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);

 	reverse_cpuid_check(x86_leaf);
 	kvm_cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
@@ -307,7 +347,7 @@ static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)

 static __always_inline u32 kvm_cpu_cap_get(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);

 	reverse_cpuid_check(x86_leaf);
 	return kvm_cpu_caps[x86_leaf] & __feature_bit(x86_feature);
--
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-08 23:55                 ` Sean Christopherson
@ 2021-01-09  0:35                   ` Borislav Petkov
  2021-01-09  1:01                     ` Sean Christopherson
  2021-01-09  1:19                   ` Borislav Petkov
  1 sibling, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2021-01-09  0:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Fri, Jan 08, 2021 at 03:55:52PM -0800, Sean Christopherson wrote:
> To be fair, this is the third time we've got conflicting, direct feedback on
> this exact issue.  I do agree that it doesn't make sense to burn a whole word
> for just two features, I guess I just feel like whining.
> 
> [*] https://lore.kernel.org/kvm/20180828102140.GA31102@nazgul.tnic/
> [*] https://lore.kernel.org/linux-sgx/20190924162520.GJ19317@zn.tnic/

Well, sorry that I confused you guys but in hindsight we probably should
have stopped you right then and there from imposing kvm requirements on
the machinery behind *_cpu_has() and kvm should have been a regular user
of those interfaces like the rest of the kernel code - nothing more.

And if you'd like to do your own X86_FEATURE_* querying but then extend
it with its own functionality, then that should have been decoupled.

And I will look at your patch later when brain is actually awake but
I strongly feel that in order to avoid such situations in the future,
*_cpu_has() internal functionality should be separate from kvm's
respective CPUID leafs representation. For obvious reasons.

And if there should be some partial sharing - if that makes sense at all
- then that should be first agreed upon.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-09  0:35                   ` Borislav Petkov
@ 2021-01-09  1:01                     ` Sean Christopherson
  0 siblings, 0 replies; 111+ messages in thread
From: Sean Christopherson @ 2021-01-09  1:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Sat, Jan 09, 2021, Borislav Petkov wrote:
> On Fri, Jan 08, 2021 at 03:55:52PM -0800, Sean Christopherson wrote:
> > To be fair, this is the third time we've got conflicting, direct feedback on
> > this exact issue.  I do agree that it doesn't make sense to burn a whole word
> > for just two features, I guess I just feel like whining.
> > 
> > [*] https://lore.kernel.org/kvm/20180828102140.GA31102@nazgul.tnic/
> > [*] https://lore.kernel.org/linux-sgx/20190924162520.GJ19317@zn.tnic/
> 
> Well, sorry that I confused you guys but in hindsight we probably should
> have stopped you right then and there from imposing kvm requirements on
> the machinery behind *_cpu_has() and kvm should have been a regular user
> of those interfaces like the rest of the kernel code - nothing more.
> 
> And if you'd like to do your own X86_FEATURE_* querying but then extend
> it with its own functionality, then that should have been decoupled.
> 
> And I will look at your patch later when brain is actually awake but
> I strongly feel that in order to avoid such situations in the future,
> *_cpu_has() internal functionality should be separate from kvm's
> respective CPUID leafs representation. For obvious reasons.

I kinda agree, but I'd prefer not to fully decouple KVM's CPUID stuff.  The more
manual definitions/translations we have to create, the more likely it is that
we'll screw something up.

> And if there should be some partial sharing - if that makes sense at all
> - then that should be first agreed upon.

Assuming the code I wrote actually works, I think that gets KVM to the point
where handling scattered features isn't awful, which should eliminate most of
the friction.  KVM would still be relying on the internals of *_cpu_has(), but
there are quite a few build-time assertions that help keep things aligned.  And,
what's best for the kernel will be what's best for KVM the vast majority of the
time, e.g. I don't anticipate the kernel scattering densely populated words just
for giggles.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-08 23:55                 ` Sean Christopherson
  2021-01-09  0:35                   ` Borislav Petkov
@ 2021-01-09  1:19                   ` Borislav Petkov
  2021-01-11 17:54                     ` Sean Christopherson
  1 sibling, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2021-01-09  1:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Fri, Jan 08, 2021 at 03:55:52PM -0800, Sean Christopherson wrote:
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index dc921d76e42e..21f92d81d5a5 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -7,7 +7,25 @@
>  #include <asm/processor.h>
>  #include <uapi/asm/kvm_para.h>
> 
> -extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
> +/*
> + * Hardware-defined CPUID leafs that are scattered in the kernel, but need to
> + * be used directly by KVM.  Note, these word values conflict with the kernel's
> + * "bug" caps, but KVM doesn't use those.

This feels like another conflict waiting to happen if KVM decides to use
them at some point...

So let me get this straight: KVM wants to use X86_FEATURE_* which
means, those numbers must map to the respective words in its CPUID caps
representation kvm_cpu_caps, AFAICT.

Then, it wants the leafs to correspond to the hardware leafs layout so
that it can do:

	kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);

which comes straight from CPUID.

So lemme look at one word:

        kvm_cpu_cap_mask(CPUID_1_EDX,
                F(FPU) | F(VME) | F(DE) | F(PSE) |
                F(TSC) | F(MSR) | F(PAE) | F(MCE) |
		...


it would build the bitmask of the CPUID leaf using X86_FEATURE_* bits
and then mask it out with the hardware leaf read from CPUID.

But why?

Why doesn't it simply build those leafs in kvm_cpu_caps from the leafs
we've already queried?

Oh it does so a bit earlier:

        memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
               sizeof(kvm_cpu_caps));

and that kvm_cpu_cap_mask() call is to clear some bits in kvm_cpu_caps
which is a kvm-specific thing (not supported stuff etc).

But then why does kvm_cpu_cap_mask() do cpuid_count()? Didn't it just
read the bits from boot_cpu_data.x86_capability? And those bits we do
query and massage extensively during boot. So why does KVM need to
query CPUID again instead of using what we've already queried?

Maybe I'm missing something kvm-specific.
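
For reference, the order of operations described above can be modelled in a few lines of userspace C. This is a toy sketch with a two-word array and caller-supplied values, not KVM's code; set_caps() and NWORDS are hypothetical names, and the second AND stands in for the cpuid_count() read in kvm_cpu_cap_mask(). It only illustrates the sequence and does not answer why the second CPUID read is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NWORDS 2	/* toy size; the real arrays have NCAPINTS words */

/*
 * Toy model of kvm_set_cpu_caps(): copy the kernel's view
 * (boot_cpu_data.x86_capability), keep only what KVM supports,
 * then AND with the raw CPUID register value per word.
 */
static void set_caps(uint32_t caps[NWORDS],
		     const uint32_t kernel_caps[NWORDS],
		     const uint32_t kvm_mask[NWORDS],
		     const uint32_t raw_cpuid[NWORDS])
{
	memcpy(caps, kernel_caps, NWORDS * sizeof(caps[0]));

	for (int w = 0; w < NWORDS; w++) {
		caps[w] &= kvm_mask[w];		/* F(FPU) | F(VME) | ... */
		caps[w] &= raw_cpuid[w];	/* stands in for cpuid_count() */
	}
}
```

A bit survives only if it is set in all three inputs: the kernel's capability word, KVM's mask, and the raw CPUID register.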

In any case, this feels somewhat weird: you have *_cpu_has() on
baremetal abstracting almost completely from CPUID by collecting all
feature bits it needs into its own structure - x86_capability[] along
with accessors for it - and then you want to "abstract back" to CPUID
leafs from that interface. I wonder why.

Anyway, more questions tomorrow.

Gnight and good luck. :)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
                   ` (24 preceding siblings ...)
  2021-01-06 17:07 ` Dave Hansen
@ 2021-01-11 17:20 ` Jarkko Sakkinen
  2021-01-11 18:37   ` Sean Christopherson
  2021-01-12  1:14   ` Kai Huang
  25 siblings, 2 replies; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-11 17:20 UTC (permalink / raw)
  To: Kai Huang, linux-sgx, kvm, x86
  Cc: seanjc, luto, dave.hansen, haitao.huang, pbonzini, bp, tglx,
	mingo, hpa, jethro, b.thiel, mattson, joro, vkuznets, wanpengli,
	corbet

On Wed, 2021-01-06 at 14:55 +1300, Kai Huang wrote:
> --- Disclaimer ---
> 
> These patches were originally written by Sean Christopherson while at Intel.
> Now that Sean has left Intel, I (Kai) have taken over getting them upstream.
> This series needs more review before it can be merged.  It is being posted
> publicly and under RFC so Sean and others can review it. Maintainers are safe
> ignoring it for now.
> 
> ------------------
> 
> Hi all,
> 
> This series adds KVM SGX virtualization support. The first 12 patches starting
> with x86/sgx or x86/cpu.. are necessary changes to x86 and SGX core/driver to
> support KVM SGX virtualization, while the rest are patches to KVM subsystem.
> 
> Please help to review this series. Also I'd like to hear what is the proper
> way to merge this series, since it contains change to both x86/SGX and KVM
> subsystem. Any feedback is highly appreciated. And please let me know if I
> forgot to CC anyone, or anyone wants to be removed from CC. Thanks in advance!
> 
> This series is based against latest tip tree's x86/sgx branch. You can also get
> the code from tip branch of kvm-sgx repo on github:
> 
>         https://github.com/intel/kvm-sgx.git tip
> 
> It also requires Qemu changes to create VM with SGX support. You can find Qemu
> repo here:
> 
>         https://github.com/intel/qemu-sgx.git next
> 
> Please refer to README.md of the above qemu-sgx repo for details on how to create
> a guest with SGX support. In the meantime, for your quick reference you can use
> the following command to create an SGX guest:
> 
>         #qemu-system-x86_64 -smp 4 -m 2G -drive file=<your_vm_image>,if=virtio \
>                 -cpu host,+sgx_provisionkey \
>                 -sgx-epc id=epc1,memdev=mem1 \
>                 -object memory-backend-epc,id=mem1,size=64M,prealloc
> 
> Please note that the SGX-relevant part is:
> 
>                 -cpu host,+sgx_provisionkey \
>                 -sgx-epc id=epc1,memdev=mem1 \
>                 -object memory-backend-epc,id=mem1,size=64M,prealloc
> 
> And you can change other parameters of your qemu command based on your needs.

Thanks a lot for documenting these snippets in the cover letter. I'll dig them
up from lore once my environment is working.

I'm setting up an Arch-based test environment with an eye on this patch set
and the generic Linux keyring patches:

https://git.kernel.org/pub/scm/linux/kernel/git/jarkko/arch.git/

I still have some minor bits to adjust before I can start deploying it for SGX
testing. For this patch set I'll use two instances of it.

> =========
> KVM SGX virtualization Overview
> 
> - Virtual EPC
> 
> "Virtual EPC" is the EPC section exposed by KVM to guest so SGX software in
> guest can discover it and use it to create SGX enclaves. KVM exposes SGX to 

Virtual EPC is a representation of an EPC section. And there is no "the".

> guest via CPUID, and exposes one or more "virtual EPC" sections for guest.
> The size of "virtual EPC" is passed as Qemu parameter when creating the
> guest, and the base address is calculated internally according to guest's
> configuration.
> 
> To support virtual EPC, add a new misc device /dev/sgx_virt_epc to SGX
> core/driver to allow userspace (Qemu) to allocate "raw" EPC, and use it as
> "virtual EPC" for guest. Obviously, unlike EPC allocated for host SGX driver,
> virtual EPC allocated via /dev/sgx_virt_epc doesn't have an enclave associated,
> and how virtual EPC is used by guest is completely controlled by guest's SGX
> software.

I think that /dev/sgx_vepc would be a clear enough name for the device. The
text currently uses somewhat confusing "terminology" for this.

In some places virtual EPC is in quotes, and in other places it is not. I think
that you could consistently use the abbreviation vEPC (without quotation marks):

"
vEPC
====

Virtual EPC, shortened as vEPC, is a representation of ...
"

> Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> /dev/sgx_virt_epc rather than in KVM. Doing so has two major advantages:

Maybe you could remove "raw" from the text.

>   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
>     just another memory backend for guests.

Why is this an advantage? No objection, just a question.

>   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
>     does not have to export any symbols, changes to reclaim flows don't
>     need to be routed through KVM, SGX's dirty laundry doesn't have to
>     get aired out for the world to see, and so on and so forth.

No comments to this before understanding code changes better.

> The virtual EPC allocated to guests is currently not reclaimable, because
> reclaiming EPC from KVM guests is not currently supported. Due to the
> complications of handling reclaim conflicts between guest and host, KVM
> EPC oversubscription, which allows the total virtual EPC size to exceed
> physical EPC by reclaiming guests' EPC, is significantly more
> complex than basic support for SGX virtualization.

I think it should be front and center in the patch set description that
this patch set implements segmentation of EPC, not oversubscription. It should
be clear immediately. It's a core part of knowing "what I'm looking at".

> - Support SGX virtualization without SGX Launch Control unlocked mode
> 
> Although the SGX driver requires SGX Launch Control unlocked mode to work, SGX
> virtualization doesn't, since how enclaves are created is completely controlled
> by the guest's SGX software, which is not necessarily Linux. Therefore, this
> series allows KVM to expose SGX to a guest even if SGX Launch Control is in
> locked mode, or is not present at all. The reason is that the goal of SGX
> virtualization, or virtualization in general, is to expose hardware features to
> the guest, not to make assumptions about how the guest will use them. Therefore,
> KVM should support SGX guests as long as the hardware is able to, so that more
> potential use cases in cloud environments can be supported.

AFAIK the point of convergence on FLC was, and still is, that Linux never
enables SGX with locked MSRs.

And I don't understand: if it is not fine to allow locked SGX for a *process*,
why is it fine for a *virtual machine*? They have a lot in common.

I cannot remember off the top of my head: can the Intel SHA256 be read when
booted with unlocked MSRs? If that is the case, then you can still support
guests with that configuration.

Context-dependent guidelines also tend to trash the code big time. Also, for the
sake of a sane kernel code base, I would consider only supporting unlocked
MSRs.

/Jarkko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-09  1:19                   ` Borislav Petkov
@ 2021-01-11 17:54                     ` Sean Christopherson
  2021-01-11 19:09                       ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Sean Christopherson @ 2021-01-11 17:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Sat, Jan 09, 2021, Borislav Petkov wrote:
> On Fri, Jan 08, 2021 at 03:55:52PM -0800, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > index dc921d76e42e..21f92d81d5a5 100644
> > --- a/arch/x86/kvm/cpuid.h
> > +++ b/arch/x86/kvm/cpuid.h
> > @@ -7,7 +7,25 @@
> >  #include <asm/processor.h>
> >  #include <uapi/asm/kvm_para.h>
> > 
> > -extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
> > +/*
> > + * Hardware-defined CPUID leafs that are scattered in the kernel, but need to
> > + * be used directly by KVM.  Note, these word values conflict with the kernel's
> > + * "bug" caps, but KVM doesn't use those.
> 
> This feels like another conflict waiting to happen if KVM decides to use
> them at some point...

Yes, but KVM including the bug caps in kvm_cpu_caps is extremely unlikely, and
arguably flat out wrong.  Currently, kvm_cpu_caps includes only CPUID-based
features that can be exposed directly to the guest.  I could see a scenario where
KVM exposed "bug" capabilities to the guest via a paravirt interface, but I
would expect that KVM would either filter and expose the kernel's bug caps
without userspace input, or would add a KVM-defined paravirt CPUID leaf to
enumerate the caps and track _that_ in kvm_cpu_caps.

Anyways, I agree that overlapping the bug caps is a bit of unnecessary
cleverness.  I'm not opposed to incorporating NBUGINTS into KVM, but that would
mean explicitly pulling in even more x86_capability implementation details.

> So let me get this straight: KVM wants to use X86_FEATURE_* which
> means, those numbers must map to the respective words in its CPUID caps
> representation kvm_cpu_caps, AFAICT.

That part is deliberate and isn't a dependency so much as how things are
implemented.  The true dependency is on the bit offsets within each word.  The
kernel could completely rescramble the word numbering and KVM would chug along
happily.  What KVM won't play nice with is if the kernel broke up a hardware-
defined, gathered CPUID leaf/word into scattered features spread out amongst
multiple Linux-defined words.

> Then, it wants the leafs to correspond to the hardware leafs layout so
> that it can do:
> 
> 	kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
> 
> which comes straight from CPUID.
> 
> So lemme look at one word:
> 
>         kvm_cpu_cap_mask(CPUID_1_EDX,
>                 F(FPU) | F(VME) | F(DE) | F(PSE) |
>                 F(TSC) | F(MSR) | F(PAE) | F(MCE) |
> 		...
> 
> 
> it would build the bitmask of the CPUID leaf using X86_FEATURE_* bits
> and then mask it out with the hardware leaf read from CPUID.
> 
> But why?
> 
> Why doesn't it simply build those leafs in kvm_cpu_caps from the leafs
> we've already queried?
> 
> Oh it does so a bit earlier:
> 
>         memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
>                sizeof(kvm_cpu_caps));
> 
> and that kvm_cpu_cap_mask() call is to clear some bits in kvm_cpu_caps
> which is kvm-specific thing (not supported stuff etc).
> 
> But then why does kvm_cpu_cap_mask() do cpuid_count()? Didn't it just
> read the bits from boot_cpu_data.x86_capability? And those bits we do
> query and massage extensively during boot. So why does KVM need to
> query CPUID again instead of using what we've already queried?

It's mostly historical; before the kvm_cpu_caps concept was introduced, the code
had grown organically to include both boot_cpu_data and raw CPUID info.  The
vast, vast majority of the time, doing CPUID is likely redundant.  But, as noted
in commit d8577a4c238f ("KVM: x86: Do host CPUID at load time to mask KVM cpu
caps"), the code is quite cheap and runs once at KVM load.  My argument back
then was, and still is, that an extra bit of paranoia is justified since the
code and operations are quite nearly free.

This particular dependency can be broken, and quite easily at that.  Rather than
memcpy() boot_cpu_data.x86_capability, it's trivially easy to redefine the F()
macro to invoke boot_cpu_has(), which would allow dropping the memcpy().  The
big downside, and why I didn't post the code, is that doing so means every
feature routed through F() requires some form of BT+Jcc (or CMOVcc) sequence,
whereas the memcpy() approach allows the F() features to be encoded as a single
literal by the compiler.

From a latency perspective, the extra code is negligible.  The big issue is that
all those extra checks add 2k+ bytes of code.  Eliminating the memcpy() doesn't
actually break KVM's dependency on the bit offsets, so we'd be bloating kvm.ko
by a noticeable amount without providing substantial value.
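A minimal sketch of why the memcpy()-plus-mask approach lets F() compile down to plain literals. It assumes the kernel's encoding of feature numbers as word * 32 + bit; the word index and feature values below are illustrative stand-ins rather than the real X86_FEATURE_* definitions, and cap_mask() is a simplified, hypothetical version of kvm_cpu_cap_mask().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layout: feature number = word * 32 + bit, as in the kernel. */
#define NCAPINTS	2
#define CPUID_1_EDX	0	/* hypothetical word index */

#define X86_FEATURE_FPU	(CPUID_1_EDX * 32 + 0)
#define X86_FEATURE_VME	(CPUID_1_EDX * 32 + 1)
#define X86_FEATURE_DE	(CPUID_1_EDX * 32 + 2)
#define X86_FEATURE_PSE	(CPUID_1_EDX * 32 + 3)

/* Only the bit offset within the word matters, so F() is a constant. */
#define F(name)		(1u << (X86_FEATURE_##name & 31))

static uint32_t kvm_cpu_caps[NCAPINTS];

/* Start from what the kernel enumerated (and possibly disabled) at boot. */
static void init_caps(const uint32_t boot_caps[NCAPINTS])
{
	memcpy(kvm_cpu_caps, boot_caps, sizeof(kvm_cpu_caps));
}

/*
 * Keep only the features KVM knows how to virtualize, then AND with the
 * raw CPUID register as the cheap "extra paranoia" step.
 */
static void cap_mask(int leaf, uint32_t kvm_supported, uint32_t raw_cpuid_reg)
{
	assert(leaf >= 0 && leaf < NCAPINTS);
	kvm_cpu_caps[leaf] &= kvm_supported;
	kvm_cpu_caps[leaf] &= raw_cpuid_reg;
}
```

Because every F(FPU) | F(VME) | ... expression is a compile-time constant, the whole mask folds into one immediate; replacing F() with a boot_cpu_has() call per feature would turn each term into a runtime test, which is the code-size cost described above.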

And, this behavior is mostly opportunistic; the true justification/motivation
for taking a dependency on the X86_FEATURE_* bit offsets is for communication
with userspace, querying the guest CPU model, and runtime checks.

> Maybe I'm missing something kvm-specific.
> 
> In any case, this feels somewhat weird: you have *_cpu_has() on
> baremetal abstracting almost completely from CPUID by collecting all
> feature bits it needs into its own structure - x86_capability[] along
> with accessors for it - and then you want to "abstract back" to CPUID
> leafs from that interface. I wonder why.

It's effectively for communication with userspace.  Userspace, via ioctl(),
dictates the vCPU model to KVM, including the exact CPUID results.  To properly
virtualize/emulate the defined vCPU model, KVM must query the dictated CPUID
results to determine what features are supported, what guest operations
should fault, etc...  E.g. if the vCPU model, via CPUID, states that SMEP isn't
supported then KVM needs to inject a #GP if the guest attempts to set CR4.SMEP.
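The SMEP example boils down to a check like the one below. This is a hypothetical, heavily simplified sketch: struct vcpu_model and guest_cr4_write_ok() are illustrative names, not KVM's; only the architectural bit positions (CPUID.(EAX=7,ECX=0):EBX bit 7 for SMEP, CR4 bit 20 for CR4.SMEP) are real.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions. */
#define CPUID_7_0_EBX_SMEP	(1u << 7)	/* CPUID.(EAX=7,ECX=0):EBX[7] */
#define X86_CR4_SMEP		(1ull << 20)	/* CR4.SMEP */

/* Hypothetical, simplified vCPU model as dictated by userspace. */
struct vcpu_model {
	uint32_t cpuid_7_0_ebx;
};

/*
 * Returns true if the guest may load new_cr4; false means KVM should
 * inject #GP instead. Only the SMEP rule is modelled here.
 */
static bool guest_cr4_write_ok(const struct vcpu_model *m, uint64_t new_cr4)
{
	assert(m);
	if ((new_cr4 & X86_CR4_SMEP) && !(m->cpuid_7_0_ebx & CPUID_7_0_EBX_SMEP))
		return false;
	return true;
}
```

The point is that the lookup is keyed by the hardware-defined CPUID layout that userspace handed to KVM, which is why KVM wants its feature bits to line up with CPUID.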

KVM also uses the hardware-defined CPUID ABI to advertise which features are
supported by both hardware and KVM.  This is the kvm_cpu_cap stuff, where KVM
reads boot_cpu_data to see what features were enabled by the kernel.

It would be possible for KVM to break the dependency on X86_FEATURE_* bit
offsets by defining a translation layer, but I strongly feel that adding manual
translations will do more harm than good as it increases the odds of us botching
a translation or using the wrong feature flag, creates potential namespace
conflicts, etc...

* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-11 17:20 ` Jarkko Sakkinen
@ 2021-01-11 18:37   ` Sean Christopherson
  2021-01-12  1:58     ` Jarkko Sakkinen
  2021-01-12  1:14   ` Kai Huang
  1 sibling, 1 reply; 111+ messages in thread
From: Sean Christopherson @ 2021-01-11 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Kai Huang, linux-sgx, kvm, x86, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, mattson, joro,
	vkuznets, wanpengli, corbet

On Mon, Jan 11, 2021, Jarkko Sakkinen wrote:
> On Wed, 2021-01-06 at 14:55 +1300, Kai Huang wrote:
> >   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
> >     just another memory backend for guests.
> 
> Why is this an advantage? No objection, just a question.

There are zero KVM changes required to support exposing EPC to a guest.  KVM's
MMU is completely ignorant of what physical backing is used for any given host
virtual address.  KVM has to be aware of various VM_* flags, e.g. VM_PFNMAP and
VM_IO, but that code is arch agnostic and is quite isolated.
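From userspace's point of view, the vEPC is then just a file to mmap(). A minimal sketch, assuming the /dev/sgx_virt_epc node proposed in patch 03 and a shared read/write mapping; map_epc_backend() is a hypothetical helper, not QEMU's code. The returned address would be handed to KVM like any other guest RAM, e.g. as the userspace_addr of a KVM_SET_USER_MEMORY_REGION slot.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `size` bytes from a memory-backend device node. For SGX this would be
 * /dev/sgx_virt_epc (name from this RFC); returns NULL on any failure.
 */
static void *map_epc_backend(const char *path, size_t size)
{
	assert(path && size > 0);

	int fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;

	void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/* The mapping keeps its own reference to the file, so the fd can go. */
	close(fd);
	return addr == MAP_FAILED ? NULL : addr;
}
```

Nothing in this path is SGX-specific, which is the sense in which EPC becomes "just another memory backend" and KVM's uAPI is left untouched.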

> >   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
> >     does not have to export any symbols, changes to reclaim flows don't
> >     need to be routed through KVM, SGX's dirty laundry doesn't have to
> >     get aired out for the world to see, and so on and so forth.
> 
> No comments to this before understanding code changes better.
> 
> > The virtual EPC allocated to guests is currently not reclaimable, because
> > reclaiming EPC from KVM guests is not currently supported. Due to the
> > complications of handling reclaim conflicts between guest and host, KVM
> > EPC oversubscription, which allows the total virtual EPC size to exceed
> > physical EPC by reclaiming guests' EPC, is significantly more
> > complex than basic support for SGX virtualization.
> 
> I think it should be front and center in the patch set description that
> this patch set implements segmentation of EPC, not oversubscription. It should
> be clear immediately. It's a core part of knowing "what I'm looking at".

Technically, it doesn't implement segmentation of EPC.  It implements
non-reclaimable EPC allocation.  Even that is somewhat untrue as the EPC can be
forcefully reclaimed, but doing so will destroy the guest contents.

Userspace can oversubscribe the EPC to KVM guests, but it would need to kill,
migrate, or pause one or more VMs if the pool of physical EPC were exhausted.
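The "non-reclaimable, but oversubscribable by userspace" semantics can be modelled in a few lines. This sketch loosely follows __sgx_virt_epc_fault() from patch 03: the index calculation is taken from it, while the tiny fixed-size pool, the boolean array standing in for the xarray, and the -ENOMEM return are simplifying assumptions. Once the shared pool is empty, a guest's next first-touch fault fails, which is the point where userspace would have to kill, migrate, or pause a VM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define PAGE_SHIFT	12
#define VEPC_PAGES	8	/* hypothetical size of one guest's vEPC */

/* Hypothetical stand-in for the physical EPC pool; nothing is reclaimable. */
static unsigned int epc_free_pages = 4;

struct vepc_model {
	bool present[VEPC_PAGES];	/* plays the role of the page_array xarray */
};

/* Same shape as __sgx_virt_epc_fault(): allocate on first touch only. */
static int vepc_fault(struct vepc_model *v, unsigned long vm_pgoff,
		      unsigned long vm_start, unsigned long addr)
{
	unsigned long index = vm_pgoff + ((addr - vm_start) >> PAGE_SHIFT);

	assert(index < VEPC_PAGES);
	if (v->present[index])
		return 0;		/* already backed by an EPC page */
	if (epc_free_pages == 0)
		return -ENOMEM;		/* pool exhausted, no reclaim to fall back on */
	epc_free_pages--;
	v->present[index] = true;
	return 0;
}
```

Two guests whose vEPC sizes add up to more than the pool can coexist until they actually touch more pages than exist; after that, the only remedies are the userspace ones listed above.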

> > - Support SGX virtualization without SGX Launch Control unlocked mode
> > 
> > Although the SGX driver requires SGX Launch Control unlocked mode to work, SGX
> > virtualization doesn't, since how enclaves are created is completely controlled
> > by the guest's SGX software, which is not necessarily Linux. Therefore, this
> > series allows KVM to expose SGX to a guest even if SGX Launch Control is in
> > locked mode, or is not present at all. The reason is that the goal of SGX
> > virtualization, or virtualization in general, is to expose hardware features to
> > the guest, not to make assumptions about how the guest will use them. Therefore,
> > KVM should support SGX guests as long as the hardware is able to, so that more
> > potential use cases in cloud environments can be supported.
> 
> AFAIK the point of convergence on FLC was, and still is, that Linux never
> enables SGX with locked MSRs.
> 
> And I don't understand: if it is not fine to allow locked SGX for a *process*,
> why is it fine for a *virtual machine*? They have a lot in common.

Because it's a completely different OS/kernel.  If the user has a kernel that
supports locked SGX, then so be it.  There's no novel circumvention of the
kernel policy, e.g. the user could simply boot the non-upstream kernel directly,
and running an upstream kernel in the guest will not cause the kernel to support
SGX.

There are any number of things that are allowed in a KVM guest that are not
allowed in a bare metal process.

> I cannot remember off the top of my head: can the Intel SHA256 be read when
> booted with unlocked MSRs? If that is the case, then you can still support
> guests with that configuration.

No, it's not guaranteed to be readable as firmware could have already changed
the values in the MSRs.

> Context-dependent guidelines also tend to trash the code big time. Also, for the
> sake of a sane kernel code base, I would consider only supporting unlocked
> MSRs.

It's one line of code to teach the kernel driver not to load if the MSRs are
locked.  And IMO, that one line of code is a net positive as it makes it clear
in the driver itself that it chooses not to support locked MSRs, even if SGX
itself is fully enabled.
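The "one line" policy, combined with the sgx_init() change in patch 03, can be modelled as follows. struct platform and both init functions are hypothetical stand-ins for the real hardware checks; only the `!!a & !!b` combination is taken from the patch. The native driver refuses locked Launch Control, the virtual EPC does not care, and initialization fails only when both paths fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical view of the platform; not real kernel state. */
struct platform {
	bool sgx;		/* SGX supported and enabled by firmware */
	bool lc_unlocked;	/* Launch Control present and MSRs writable */
};

/* Native driver: the "one line" that refuses locked LE hash MSRs. */
static int drv_init(const struct platform *p)
{
	if (!p->sgx || !p->lc_unlocked)
		return -ENODEV;
	return 0;
}

/* Virtual EPC: the guest owns launch policy, so LC state is irrelevant. */
static int virt_epc_init(const struct platform *p)
{
	return p->sgx ? 0 : -ENODEV;
}

/* Same combination as patch 03: nonzero (failure) only if both fail. */
static int sgx_init_model(const struct platform *p)
{
	assert(p);
	return !!drv_init(p) & !!virt_epc_init(p);
}
```

With locked MSRs the host driver stays off while the vEPC still comes up, so the kernel's own policy is unchanged and the guest kernel applies its own.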

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-11 17:54                     ` Sean Christopherson
@ 2021-01-11 19:09                       ` Borislav Petkov
  2021-01-11 19:20                         ` Sean Christopherson
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2021-01-11 19:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Mon, Jan 11, 2021 at 09:54:17AM -0800, Sean Christopherson wrote:
> Yes, but KVM including the bug caps in kvm_cpu_caps is extremely unlikely, and
> arguably flat out wrong.  Currently, kvm_cpu_caps includes only CPUID-based
> features that can be exposed directly to the guest.  I could see a scenario where
> KVM exposed "bug" capabilities to the guest via a paravirt interface, but I
> would expect that KVM would either filter and expose the kernel's bug caps
> without userspace input, or would add a KVM-defined paravirt CPUID leaf to
> enumerate the caps and track _that_ in kvm_cpu_caps.
> 
> Anyways, I agree that overlapping the bug caps is a bit of unnecessary
> cleverness.  I'm not opposed to incorporating NBUGINTS into KVM, but that would
> mean explicitly pulling in even more x86_capability implementation details.

Also, the kernel (and kvm, being part of it :)) kinda tries to fix those
bugs and not expose them to the guest, so exposing a bug would probably
be only for testing purposes...

> That part is deliberate and isn't a dependency so much as how things are
> implemented.  The true dependency is on the bit offsets within each word. 

Right.

> The kernel could completely rescramble the word numbering and KVM
> would chug along happily. What KVM won't play nice with is if the
> kernel broke up a hardware-defined, gathered CPUID leaf/word into
> scattered features spread out amongst multiple Linux-defined words.

Yes, kvm wants the bits just as they are in the CPUID leafs from the hw.

> It's mostly historical; before the kvm_cpu_caps concept was introduced, the code
> had grown organically to include both boot_cpu_data and raw CPUID info.  The
> vast, vast majority of the time, doing CPUID is likely redundant.  But, as noted
> in commit d8577a4c238f ("KVM: x86: Do host CPUID at load time to mask KVM cpu
> caps"), the code is quite cheap and runs once at KVM load.  My argument back
> then was, and still is, that an extra bit of paranoia is justified since the
> code and operations are quite nearly free.

Ok.

> This particular dependency can be broken, and quite easily at that.  Rather than
> memcpy() boot_cpu_data.x86_capability, it's trivially easy to redefine the F()
> macro to invoke boot_cpu_has(), which would allow dropping the memcpy().  The
> big downside, and why I didn't post the code, is that doing so means every
> feature routed through F() requires some form of BT+Jcc (or CMOVcc) sequence,
> whereas the memcpy() approach allows the F() features to be encoded as a single
> literal by the compiler.
> 
> From a latency perspective, the extra code is negligible.  The big issue is that
> all those extra checks add 2k+ bytes of code.  Eliminating the memcpy() doesn't
> actually break KVM's dependency on the bit offsets, so we'd be bloating kvm.ko
> by a noticeable amount without providing substantial value.
> 
> And, this behavior is mostly opportunistic; the true justification/motivation
> for taking a dependency on the X86_FEATURE_* bit offsets is for communication
> with userspace, querying the guest CPU model, and runtime checks.

Ok, I guess we'll try to find a middle ground here and not let stuff
grow too ugly to live.

> It's effectively for communication with userspace.  Userspace, via ioctl(),
> dictates the vCPU model to KVM, including the exact CPUID results. 

And using the CPUID leafs with the exact bit positions is sort of an
"interface" there, I see.

> To properly
> virtualize/emulate the defined vCPU model, KVM must query the dictated CPUID
> results to determine what features are supported, what guest operations
> should fault, etc...  E.g. if the vCPU model, via CPUID, states that SMEP isn't
> supported then KVM needs to inject a #GP if the guest attempts to set CR4.SMEP.
> 
> KVM also uses the hardware-defined CPUID ABI to advertise which features are
> supported by both hardware and KVM.  This is the kvm_cpu_cap stuff, where KVM
> reads boot_cpu_data to see what features were enabled by the kernel.

Right.

> It would be possible for KVM to break the dependency on X86_FEATURE_* bit
> offsets by defining a translation layer, but I strongly feel that adding manual
> translations will do more harm than good as it increases the odds of us botching
> a translation or using the wrong feature flag, creates potential namespace
> conflicts, etc...

Ok, lemme see if we might encounter more issues down the road...

+enum kvm_only_cpuid_leafs {
+       CPUID_12_EAX     = NCAPINTS,
+       NR_KVM_CPU_CAPS,
+
+       NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
+};
+

What happens when we decide to allocate a separate leaf for CPUID_12_EAX
down the road?

You already do it here

Subject: [PATCH 04/13] x86/cpufeatures: Assign dedicated feature word for AMD mem encryption

for the AMD leaf.

I'm thinking this way around - from scattered to a hw one - should be ok
because that should work easily. The other way around, taking a hw leaf
and scattering it around x86_capability[] array elems would probably be
nasty but with your change that should work too.

Yah, I'm just hypothesizing here - I don't think this "other way around"
will ever happen...

Hmm, yap, I can cautiously say that with your change we should be ok...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-11 19:09                       ` Borislav Petkov
@ 2021-01-11 19:20                         ` Sean Christopherson
  2021-01-12  2:01                           ` Kai Huang
  2021-01-12 12:13                           ` Borislav Petkov
  0 siblings, 2 replies; 111+ messages in thread
From: Sean Christopherson @ 2021-01-11 19:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Mon, Jan 11, 2021, Borislav Petkov wrote:
> On Mon, Jan 11, 2021 at 09:54:17AM -0800, Sean Christopherson wrote:
> > It would be possible for KVM to break the dependency on X86_FEATURE_* bit
> > offsets by defining a translation layer, but I strongly feel that adding manual
> > translations will do more harm than good as it increases the odds of us botching
> > a translation or using the wrong feature flag, creates potential namespace
> > conflicts, etc...
> 
> Ok, lemme see if we might encounter more issues down the road...
> 
> +enum kvm_only_cpuid_leafs {
> +       CPUID_12_EAX     = NCAPINTS,
> +       NR_KVM_CPU_CAPS,
> +
> +       NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
> +};
> +
> 
> What happens when we decide to allocate a separate leaf for CPUID_12_EAX
> down the road?

Well, mechanically, that would generate a build failure if the kernel does the
obvious thing and names the 'enum cpuid_leafs' entry CPUID_12_EAX.  That would
be an obvious clue that KVM should be updated.

If the kernel named the enum entry something different, and we botched the code
review, KVM would continue to work, but would unnecessarily copy the bits it
cares about to its own word.   E.g. the boot_cpu_has() checks and translation to
__X86_FEATURE_* would still be valid.  As far as failure modes go, that's not
terrible.
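A compilable sketch of the KVM-only word and the scattered-to-hardware translation discussed here. It reuses the enum from the quoted hunk; the NCAPINTS value, the Linux-defined word holding the scattered SGX1/SGX2 bits, and feature_translate() are assumptions for illustration, while SGX1/SGX2 being bits 0 and 1 of CPUID.(EAX=0x12,ECX=0):EAX is architectural. If the kernel's own enum cpuid_leafs ever gained an enumerator named CPUID_12_EAX in the same scope, the C compiler would reject the duplicate, which is the build failure mentioned above.

```c
#include <assert.h>

/* Illustrative sizes; the real NCAPINTS and feature values differ. */
#define NCAPINTS		4
#define X86_FEATURE(w, b)	((w) * 32 + (b))

/* Hypothetical scattered kernel features living in a Linux-defined word. */
#define X86_FEATURE_SGX1	X86_FEATURE(3, 8)
#define X86_FEATURE_SGX2	X86_FEATURE(3, 9)

/* KVM-only words start right after the kernel's words. */
enum kvm_only_cpuid_leafs {
	CPUID_12_EAX	= NCAPINTS,
	NR_KVM_CPU_CAPS,

	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
};
static_assert(NKVMCAPINTS == 1, "exactly one KVM-only word in this sketch");

/* CPUID.(EAX=0x12,ECX=0):EAX bit 0 = SGX1, bit 1 = SGX2. */
#define KVM_X86_FEATURE_SGX1	X86_FEATURE(CPUID_12_EAX, 0)
#define KVM_X86_FEATURE_SGX2	X86_FEATURE(CPUID_12_EAX, 1)

/* Map a scattered kernel feature to its hardware-defined word and bit. */
static int feature_translate(int x86_feature)
{
	if (x86_feature == X86_FEATURE_SGX1)
		return KVM_X86_FEATURE_SGX1;
	if (x86_feature == X86_FEATURE_SGX2)
		return KVM_X86_FEATURE_SGX2;
	return x86_feature;	/* everything else is already hw-aligned */
}
```

If the kernel later gave leaf 0x12 EAX its own hardware-aligned word under a different name, the translation above would still return correct positions; KVM would merely keep a now-redundant private copy, matching the benign failure mode described above.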

> You already do it here
> 
> Subject: [PATCH 04/13] x86/cpufeatures: Assign dedicated feature word for AMD mem encryption
> 
> for the AMD leaf.
> 
> I'm thinking this way around - from scattered to a hw one - should be ok
> because that should work easily. The other way around, taking a hw leaf
> and scattering it around x86_capability[] array elems would probably be
> nasty but with your change that should work too.
> 
> Yah, I'm just hypothesizing here - I don't think this "other way around"
> will ever happen...
> 
> Hmm, yap, I can cautiously say that with your change we should be ok...
> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

* Re: [RFC PATCH 01/23] x86/sgx: Split out adding EPC page to free list to separate helper
  2021-01-06  1:55 ` [RFC PATCH 01/23] x86/sgx: Split out adding EPC page to free list to separate helper Kai Huang
@ 2021-01-11 22:38   ` Jarkko Sakkinen
  2021-01-12  0:19     ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-11 22:38 UTC (permalink / raw)
  To: Kai Huang, linux-sgx, kvm, x86
  Cc: seanjc, luto, dave.hansen, haitao.huang, pbonzini, bp, tglx, mingo, hpa

On Wed, 2021-01-06 at 14:55 +1300, Kai Huang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> SGX virtualization requires allocating "raw" EPC and using it as virtual
> EPC for SGX guest.  Unlike EPC used by SGX driver, virtual EPC doesn't
> track how EPC pages are used in VM, e.g. (de)construction of enclaves,
> so it cannot guarantee EREMOVE success, e.g. it doesn't have a priori
> knowledge of which pages are SECS with non-zero child counts.
> 
> Split sgx_free_page() into two parts so that the "add to free list"
> part can be used by virtual EPC without having to modify the EREMOVE
> logic in sgx_free_page().
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>

I have a better idea with the same outcome for KVM.

https://lore.kernel.org/linux-sgx/20210111223610.62261-1-jarkko@kernel.org/T/#t

/Jarkko


* Re: [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code
  2021-01-06  1:55 ` [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code Kai Huang
  2021-01-06 18:28   ` Dave Hansen
@ 2021-01-11 23:32   ` Jarkko Sakkinen
  2021-01-12  0:16     ` Kai Huang
  1 sibling, 1 reply; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-11 23:32 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, Jan 06, 2021 at 02:55:19PM +1300, Kai Huang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> SGX virtualization requires allocating "raw" EPC and using it as "virtual
> EPC" for SGX guest.  Unlike EPC used by SGX driver, virtual EPC doesn't
> track how EPC pages are used in VM, e.g. (de)construction of enclaves,
> so it cannot guarantee EREMOVE success, e.g. it doesn't have a priori
> knowledge of which pages are SECS with non-zero child counts.
> 
> Add SGX_CHILD_PRESENT for use by SGX virtualization to assert EREMOVE
> failures are expected, but only due to SGX_CHILD_PRESENT.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>

Acked-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko
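Why EREMOVE failures must be tolerated on a vEPC, and why SGX_CHILD_PRESENT is the only expected one, can be shown with a small host-side model. It is a sketch under assumptions: the page structure, eremove() and vepc_release() are hypothetical, the value 13 for SGX_CHILD_PRESENT is assumed to match the SGX error enum, and free_list_add() stands in for the free-list helper that patch 01 splits out of sgx_free_page(). Because the host cannot know which pages are SECS with live children, it retries the failures after a first pass; a SECS whose children live outside this vEPC is left behind.

```c
#include <assert.h>
#include <stdbool.h>

#define SGX_CHILD_PRESENT	13	/* assumed to match the SGX error enum */
#define NPAGES			4	/* hypothetical vEPC size, in pages */

/* Model of a vEPC page; the hardware knows these fields, the host does not. */
struct epc_page {
	bool secs;		/* page is a SECS */
	int parent;		/* index of the owning SECS, or -1 */
	int children;		/* live child pages, SECS only */
	bool on_free_list;
};

static struct epc_page pages[NPAGES];

/* Stand-in for EREMOVE: a SECS with live children cannot be removed. */
static int eremove(int i)
{
	if (pages[i].secs && pages[i].children > 0)
		return SGX_CHILD_PRESENT;
	if (pages[i].parent >= 0)
		pages[pages[i].parent].children--;
	return 0;
}

/* Plays the role of the "add to free list" helper split out in patch 01. */
static void free_list_add(int i)
{
	pages[i].on_free_list = true;
}

/*
 * Tear down one vEPC: pass 1 removes what it can, pass 2 retries SECS
 * pages whose children are now gone. Returns pages left unfreed.
 */
static int vepc_release(void)
{
	int retry[NPAGES], nretry = 0, leftover = 0;

	for (int i = 0; i < NPAGES; i++) {
		int ret = eremove(i);

		if (ret) {
			assert(ret == SGX_CHILD_PRESENT);	/* only expected failure */
			retry[nretry++] = i;
			continue;
		}
		free_list_add(i);
	}
	for (int j = 0; j < nretry; j++) {
		int ret = eremove(retry[j]);

		if (ret) {
			assert(ret == SGX_CHILD_PRESENT);
			leftover++;
			continue;
		}
		free_list_add(retry[j]);
	}
	return leftover;
}
```

Keeping EREMOVE handling in the caller and calling only the free-list half is what lets the virtual EPC reuse the allocator without the native driver's assumption that EREMOVE always succeeds.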

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-06  1:55 ` [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
  2021-01-06 19:35   ` Dave Hansen
@ 2021-01-11 23:38   ` Jarkko Sakkinen
  2021-01-12  0:56     ` Kai Huang
  1 sibling, 1 reply; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-11 23:38 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, Jan 06, 2021 at 02:55:20PM +1300, Kai Huang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Add a misc device /dev/sgx_virt_epc to allow userspace to allocate "raw"
> EPC without an associated enclave.  The intended and only known use case
> for raw EPC allocation is to expose EPC to a KVM guest, hence the
> virt_epc moniker, virt.{c,h} files and X86_SGX_VIRTUALIZATION Kconfig.
> 
> Modify sgx_init() to always try to initialize the virtual EPC driver, even
> when the SGX driver is disabled because SGX Launch Control is in locked mode
> or not present at all, since SGX virtualization allows exposing SGX to
> guests that support non-LC configurations.
> 
> Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> /dev/sgx_virt_epc rather than in KVM. Doing so has two major advantages:
> 
>   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
>     just another memory backend for guests.
> 
>   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
>     does not have to export any symbols, changes to reclaim flows don't
>     need to be routed through KVM, SGX's dirty laundry doesn't have to
>     get aired out for the world to see, and so on and so forth.
> 
> The virtual EPC allocated to guests is currently not reclaimable, because
> oversubscription of EPC for KVM guests is not currently supported. Due
> to the complications of handling reclaim conflicts between guest and
> host, KVM EPC oversubscription is significantly more complex than basic
> support for SGX virtualization.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Co-developed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>

The commit message does not describe the code changes. It should
have an understandable explanation of fops. There is nothing about
the implementation right now.

/Jarkko

> ---
>  arch/x86/Kconfig                 |  12 ++
>  arch/x86/kernel/cpu/sgx/Makefile |   1 +
>  arch/x86/kernel/cpu/sgx/main.c   |   5 +-
>  arch/x86/kernel/cpu/sgx/virt.c   | 263 +++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/virt.h   |  14 ++
>  5 files changed, 294 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/kernel/cpu/sgx/virt.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/virt.h
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 618d1aabccb8..a7318175509b 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1947,6 +1947,18 @@ config X86_SGX
>  
>  	  If unsure, say N.
>  
> +config X86_SGX_VIRTUALIZATION
> +	bool "Software Guard eXtensions (SGX) Virtualization"
> +	depends on X86_SGX && KVM_INTEL
> +	help
> +
> +	  Enables KVM guests to create SGX enclaves.
> +
> +	  This includes support to expose "raw" unreclaimable enclave memory to
> +	  guests via a device node, e.g. /dev/sgx_virt_epc.
> +
> +	  If unsure, say N.
> +
>  config EFI
>  	bool "EFI runtime service support"
>  	depends on ACPI
> diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
> index 91d3dc784a29..7a25bf63adfb 100644
> --- a/arch/x86/kernel/cpu/sgx/Makefile
> +++ b/arch/x86/kernel/cpu/sgx/Makefile
> @@ -3,3 +3,4 @@ obj-y += \
>  	encl.o \
>  	ioctl.o \
>  	main.o
> +obj-$(CONFIG_X86_SGX_VIRTUALIZATION)	+= virt.o
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 95aad183bb65..02993a327a1f 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -9,9 +9,11 @@
>  #include <linux/sched/mm.h>
>  #include <linux/sched/signal.h>
>  #include <linux/slab.h>
> +#include "arch.h"
>  #include "driver.h"
>  #include "encl.h"
>  #include "encls.h"
> +#include "virt.h"
>  
>  struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
>  static int sgx_nr_epc_sections;
> @@ -726,7 +728,8 @@ static void __init sgx_init(void)
>  	if (!sgx_page_reclaimer_init())
>  		goto err_page_cache;
>  
> -	ret = sgx_drv_init();
> +	/* Success if the native *or* virtual EPC driver initialized cleanly. */
> +	ret = !!sgx_drv_init() & !!sgx_virt_epc_init();
>  	if (ret)
>  		goto err_kthread;
>  
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> new file mode 100644
> index 000000000000..d625551ccf25
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -0,0 +1,263 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*  Copyright(c) 2016-20 Intel Corporation. */
> +
> +#include <linux/miscdevice.h>
> +#include <linux/mm.h>
> +#include <linux/mman.h>
> +#include <linux/sched/mm.h>
> +#include <linux/sched/signal.h>
> +#include <linux/slab.h>
> +#include <linux/xarray.h>
> +#include <asm/sgx.h>
> +#include <uapi/asm/sgx.h>
> +
> +#include "encls.h"
> +#include "sgx.h"
> +#include "virt.h"
> +
> +struct sgx_virt_epc {
> +	struct xarray page_array;
> +	struct mutex lock;
> +	struct mm_struct *mm;
> +};
> +
> +static struct mutex virt_epc_lock;
> +static struct list_head virt_epc_zombie_pages;
> +
> +static int __sgx_virt_epc_fault(struct sgx_virt_epc *epc,
> +				struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct sgx_epc_page *epc_page;
> +	unsigned long index, pfn;
> +	int ret;
> +
> +	/* epc->lock must already be held */
> +
> +	/* Calculate index of EPC page in virtual EPC's page_array */
> +	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
> +
> +	epc_page = xa_load(&epc->page_array, index);
> +	if (epc_page)
> +		return 0;
> +
> +	epc_page = sgx_alloc_epc_page(epc, false);
> +	if (IS_ERR(epc_page))
> +		return PTR_ERR(epc_page);
> +
> +	ret = xa_err(xa_store(&epc->page_array, index, epc_page, GFP_KERNEL));
> +	if (ret)
> +		goto err_free;
> +
> +	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
> +
> +	ret = vmf_insert_pfn(vma, addr, pfn);
> +	if (ret != VM_FAULT_NOPAGE) {
> +		ret = -EFAULT;
> +		goto err_delete;
> +	}
> +
> +	return 0;
> +
> +err_delete:
> +	xa_erase(&epc->page_array, index);
> +err_free:
> +	sgx_free_epc_page(epc_page);
> +	return ret;
> +}
> +
> +static vm_fault_t sgx_virt_epc_fault(struct vm_fault *vmf)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct sgx_virt_epc *epc = vma->vm_private_data;
> +	int ret;
> +
> +	mutex_lock(&epc->lock);
> +	ret = __sgx_virt_epc_fault(epc, vma, vmf->address);
> +	mutex_unlock(&epc->lock);
> +
> +	if (!ret)
> +		return VM_FAULT_NOPAGE;
> +
> +	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
> +		mmap_read_unlock(vma->vm_mm);
> +		return VM_FAULT_RETRY;
> +	}
> +
> +	return VM_FAULT_SIGBUS;
> +}
> +
> +static const struct vm_operations_struct sgx_virt_epc_vm_ops = {
> +	.fault = sgx_virt_epc_fault,
> +};
> +
> +static int sgx_virt_epc_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct sgx_virt_epc *epc = file->private_data;
> +
> +	if (!(vma->vm_flags & VM_SHARED))
> +		return -EINVAL;
> +
> +	/*
> +	 * Don't allow mmap() from child after fork(), since child and parent
> +	 * cannot map to the same EPC.
> +	 */
> +	if (vma->vm_mm != epc->mm)
> +		return -EINVAL;
> +
> +	vma->vm_ops = &sgx_virt_epc_vm_ops;
> +	/* Don't copy VMA in fork() */
> +	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
> +	vma->vm_private_data = file->private_data;
> +
> +	return 0;
> +}
> +
> +static int sgx_virt_epc_free_page(struct sgx_epc_page *epc_page)
> +{
> +	int ret;
> +
> +	if (!epc_page)
> +		return 0;
> +
> +	/*
> +	 * Explicitly EREMOVE virtual EPC page. Virtual EPC is only used by
> +	 * guest, and in normal condition guest should have done EREMOVE for
> +	 * all EPC pages before they are freed here. But it's possible guest
> +	 * is killed or crashed abnormally in which case EREMOVE has not been
> +	 * done. Do EREMOVE unconditionally here to cover both cases, because
> +	 * it's not possible to tell whether guest has done EREMOVE, since
> +	 * virtual EPC page status is not tracked. And it is fine to EREMOVE
> +	 * EPC page multiple times.
> +	 */
> +	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
> +	if (ret) {
> +		/*
> +		 * Only SGX_CHILD_PRESENT is expected, which is because of
> +		 * EREMOVE-ing an SECS still with child, in which case it can
> +		 * be handled by EREMOVE-ing the SECS again after all pages in
> +		 * virtual EPC have been EREMOVE-ed. See comments below in
> +		 * sgx_virt_epc_release().
> +		 */
> +		WARN_ON_ONCE(ret != SGX_CHILD_PRESENT);
> +		return ret;
> +	}
> +
> +	__sgx_free_epc_page(epc_page);
> +	return 0;
> +}
> +
> +static int sgx_virt_epc_release(struct inode *inode, struct file *file)
> +{
> +	struct sgx_virt_epc *epc = file->private_data;
> +	struct sgx_epc_page *epc_page, *tmp, *entry;
> +	unsigned long index;
> +
> +	LIST_HEAD(secs_pages);
> +
> +	mmdrop(epc->mm);
> +
> +	xa_for_each(&epc->page_array, index, entry) {
> +		/*
> +		 * Virtual EPC pages are not tracked, so it's possible for
> +		 * EREMOVE to fail due to, e.g. a SECS page still has children
> +		 * if guest was shutdown unexpectedly. If it is the case, leave
> +		 * it in the xarray and retry EREMOVE below later.
> +		 */
> +		if (sgx_virt_epc_free_page(entry))
> +			continue;
> +
> +		xa_erase(&epc->page_array, index);
> +	}
> +
> +	/*
> +	 * Retry all failed pages after iterating through the entire tree, at
> +	 * which point all children should be removed and the SECS pages can be
> +	 * nuked as well...unless userspace has exposed multiple instances of
> +	 * virtual EPC to a single VM.
> +	 */
> +	xa_for_each(&epc->page_array, index, entry) {
> +		epc_page = entry;
> +		/*
> +		 * Error here means that EREMOVE failed due to a SECS page
> +		 * still has child on *another* EPC instance.  Put it to a
> +		 * temporary SECS list which will be spliced to 'zombie page
> +		 * list' and will be EREMOVE-ed again when freeing another
> +		 * virtual EPC instance.
> +		 */
> +		if (sgx_virt_epc_free_page(epc_page))
> +			list_add_tail(&epc_page->list, &secs_pages);
> +
> +		xa_erase(&epc->page_array, index);
> +	}
> +
> +	/*
> +	 * Third time's a charm.  Try to EREMOVE zombie SECS pages from virtual
> +	 * EPC instances that were previously released, i.e. free SECS pages
> +	 * that were in limbo due to having children in *this* EPC instance.
> +	 */
> +	mutex_lock(&virt_epc_lock);
> +	list_for_each_entry_safe(epc_page, tmp, &virt_epc_zombie_pages, list) {
> +		/*
> +		 * Speculatively remove the page from the list of zombies, if
> +		 * the page is successfully EREMOVE it will be added to the
> +		 * list of free pages.  If EREMOVE fails, throw the page on the
> +		 * local list, which will be spliced on at the end.
> +		 */
> +		list_del(&epc_page->list);
> +
> +		if (sgx_virt_epc_free_page(epc_page))
> +			list_add_tail(&epc_page->list, &secs_pages);
> +	}
> +
> +	if (!list_empty(&secs_pages))
> +		list_splice_tail(&secs_pages, &virt_epc_zombie_pages);
> +	mutex_unlock(&virt_epc_lock);
> +
> +	kfree(epc);
> +
> +	return 0;
> +}
> +
> +static int sgx_virt_epc_open(struct inode *inode, struct file *file)
> +{
> +	struct sgx_virt_epc *epc;
> +
> +	epc = kzalloc(sizeof(struct sgx_virt_epc), GFP_KERNEL);
> +	if (!epc)
> +		return -ENOMEM;
> +	/*
> +	 * Keep the current->mm to virtual EPC. It will be checked in
> +	 * sgx_virt_epc_mmap() to prevent, in case of fork, child being
> +	 * able to mmap() to the same virtual EPC pages.
> +	 */
> +	mmgrab(current->mm);
> +	epc->mm = current->mm;
> +	mutex_init(&epc->lock);
> +	xa_init(&epc->page_array);
> +
> +	file->private_data = epc;
> +
> +	return 0;
> +}
> +
> +static const struct file_operations sgx_virt_epc_fops = {
> +	.owner			= THIS_MODULE,
> +	.open			= sgx_virt_epc_open,
> +	.release		= sgx_virt_epc_release,
> +	.mmap			= sgx_virt_epc_mmap,
> +};
> +
> +static struct miscdevice sgx_virt_epc_dev = {
> +	.minor = MISC_DYNAMIC_MINOR,
> +	.name = "sgx_virt_epc",
> +	.nodename = "sgx_virt_epc",
> +	.fops = &sgx_virt_epc_fops,
> +};
> +
> +int __init sgx_virt_epc_init(void)
> +{
> +	INIT_LIST_HEAD(&virt_epc_zombie_pages);
> +	mutex_init(&virt_epc_lock);
> +
> +	return misc_register(&sgx_virt_epc_dev);
> +}
> diff --git a/arch/x86/kernel/cpu/sgx/virt.h b/arch/x86/kernel/cpu/sgx/virt.h
> new file mode 100644
> index 000000000000..e5434541a122
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/sgx/virt.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> +#ifndef _ASM_X86_SGX_VIRT_H
> +#define _ASM_X86_SGX_VIRT_H
> +
> +#ifdef CONFIG_X86_SGX_VIRTUALIZATION
> +int __init sgx_virt_epc_init(void);
> +#else
> +static inline int __init sgx_virt_epc_init(void)
> +{
> +	return -ENODEV;
> +}
> +#endif
> +
> +#endif /* _ASM_X86_SGX_VIRT_H */
> -- 
> 2.29.2
> 
> 
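sgx_virt_epc_release() above relies on retry ordering rather than per-page tracking. EREMOVE on a SECS page fails with SGX_CHILD_PRESENT until all of its children are gone. Failed pages are therefore retried after the first full pass, and anything still stuck is parked on the global zombie list for a later release to retry. The following is a hypothetical userspace model of that ordering. model_page, model_eremove() and model_release() are invented names, and the return codes are placeholders rather than the architectural values. It only aims to show why the passes converge, and does not mirror the kernel code line by line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder results, NOT the architectural SGX error codes. */
enum model_eremove_ret { MODEL_OK = 0, MODEL_CHILD_PRESENT = 1 };

struct model_page {
	struct model_page *secs; /* parent SECS of a regular page; NULL for a SECS */
	int children;            /* live children; only meaningful for a SECS */
	bool removed;
};

/* Toy EREMOVE: a SECS with live children cannot be removed yet. */
static enum model_eremove_ret model_eremove(struct model_page *p)
{
	if (p->removed)
		return MODEL_OK; /* removing an already-free page again is harmless */
	if (!p->secs && p->children > 0)
		return MODEL_CHILD_PRESENT;
	if (p->secs) {
		assert(p->secs->children > 0);
		p->secs->children--;
	}
	p->removed = true;
	return MODEL_OK;
}

/*
 * Two passes over one instance, as in sgx_virt_epc_release(): the first pass
 * removes whatever it can (usually all children), the second retries the
 * failures. Returns how many pages are still stuck, i.e. would be parked on
 * the zombie list for a later release to retry.
 */
static size_t model_release(struct model_page **pages, size_t n)
{
	size_t stuck = 0;

	for (size_t i = 0; i < n; i++)
		if (model_eremove(pages[i]) == MODEL_OK)
			pages[i] = NULL;
	for (size_t i = 0; i < n; i++)
		if (pages[i] && model_eremove(pages[i]) != MODEL_OK)
			stuck++;
	return stuck;
}
```

The zombie list covers the case where a SECS lives in one instance and its child lives in another. A final model_eremove() call on such a SECS corresponds to the "third time's a charm" retry that runs when the other instance is released.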

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-06  1:55 ` [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features Kai Huang
  2021-01-06 19:39   ` Dave Hansen
  2021-01-06 22:15   ` Borislav Petkov
@ 2021-01-11 23:39   ` Jarkko Sakkinen
  2 siblings, 0 replies; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-11 23:39 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, Jan 06, 2021 at 02:55:21PM +1300, Kai Huang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Add a feature word to hold SGX features enumerated via CPUID.0x12.0x0,
> along with flags for SGX1 and SGX2. As part of virtualizing SGX, KVM
> needs to expose the SGX CPUID leafs to its guest. SGX1 and SGX2 need to
> be in a dedicated feature word so that they can be queried via KVM's
> reverse CPUID lookup to properly emulate the expected guest behavior.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>  [Kai: Also clear SGX1 and SGX2 bits in clear_sgx_caps().]
> Signed-off-by: Kai Huang <kai.huang@intel.com>

Acked-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko
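For reference, CPUID.(EAX=12H,ECX=0):EAX enumerates these sub-features, with bit 0 for SGX1 and bit 1 for SGX2. Leaf 12H is only meaningful when CPUID.(EAX=7,ECX=0):EBX bit 2 (SGX) is set. Below is a minimal userspace sketch for checking them. It assumes GCC or Clang's <cpuid.h> helpers on x86, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helpers, x86 only */
#endif

#define SGX_FEATURE_BIT (1u << 2) /* CPUID.(EAX=7,ECX=0):EBX[2] */
#define SGX1_BIT        (1u << 0) /* CPUID.(EAX=12H,ECX=0):EAX[0] */
#define SGX2_BIT        (1u << 1) /* CPUID.(EAX=12H,ECX=0):EAX[1] */

struct sgx_subfeatures {
	bool sgx1;
	bool sgx2;
};

/* Decode the SGX1/SGX2 bits from EAX of CPUID leaf 12H, sub-leaf 0. */
static struct sgx_subfeatures decode_sgx_leaf12_eax(uint32_t eax)
{
	struct sgx_subfeatures f = {
		.sgx1 = (eax & SGX1_BIT) != 0,
		.sgx2 = (eax & SGX2_BIT) != 0,
	};
	return f;
}

/* Returns false when SGX (or leaf 12H) is not enumerated on this CPU. */
static bool read_sgx_subfeatures(struct sgx_subfeatures *out)
{
	assert(out);
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) ||
	    !(ebx & SGX_FEATURE_BIT))
		return false;
	if (!__get_cpuid_count(0x12, 0, &eax, &ebx, &ecx, &edx))
		return false;
	*out = decode_sgx_leaf12_eax(eax);
	return true;
#else
	return false;
#endif
}
```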

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code
  2021-01-11 23:32   ` Jarkko Sakkinen
@ 2021-01-12  0:16     ` Kai Huang
  2021-01-12  1:46       ` Jarkko Sakkinen
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-12  0:16 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Tue, 12 Jan 2021 01:32:19 +0200 Jarkko Sakkinen wrote:
> On Wed, Jan 06, 2021 at 02:55:19PM +1300, Kai Huang wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > SGX virtualization requires allocating "raw" EPC and using it as "virtual
> > EPC" for SGX guest.  Unlike EPC used by SGX driver, virtual EPC doesn't
> > track how EPC pages are used in VM, e.g. (de)construction of enclaves,
> > so it cannot guarantee EREMOVE success, e.g. it doesn't have a priori
> > knowledge of which pages are SECS with non-zero child counts.
> > 
> > Add SGX_CHILD_PRESENT for use by SGX virtualization to assert EREMOVE
> > failures are expected, but only due to SGX_CHILD_PRESENT.
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> 
> Acked-by: Jarkko Sakkinen <jarkko@kernel.org>

Thanks Jarkko. 

Dave suggested changing the patch subject to explicitly call out the hardware
error code:
	Add SGX_CHILD_PRESENT hardware error code

I suppose this also works for you, and I can keep your Acked-by after I change
that in v2?

Thanks,
-Kai

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 01/23] x86/sgx: Split out adding EPC page to free list to separate helper
  2021-01-11 22:38   ` Jarkko Sakkinen
@ 2021-01-12  0:19     ` Kai Huang
  2021-01-12 21:45       ` Sean Christopherson
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-12  0:19 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Tue, 12 Jan 2021 00:38:40 +0200 Jarkko Sakkinen wrote:
> On Wed, 2021-01-06 at 14:55 +1300, Kai Huang wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > SGX virtualization requires allocating "raw" EPC and using it as virtual
> > EPC for SGX guest.  Unlike EPC used by SGX driver, virtual EPC doesn't
> > track how EPC pages are used in VM, e.g. (de)construction of enclaves,
> > so it cannot guarantee EREMOVE success, e.g. it doesn't have a priori
> > knowledge of which pages are SECS with non-zero child counts.
> > 
> > Split sgx_free_page() into two parts so that the "add to free list"
> > part can be used by virtual EPC without having to modify the EREMOVE
> > logic in sgx_free_page().
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> 
> I have a better idea with the same outcome for KVM.
> 
> https://lore.kernel.org/linux-sgx/20210111223610.62261-1-jarkko@kernel.org/T/#t

I agree that with your patch this one can be replaced. I'll include your patch
in the next version, and once it is upstreamed, it can be removed from my series.

Sean, please let me know if you have any objections.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code
  2021-01-06 18:28   ` Dave Hansen
  2021-01-06 21:40     ` Kai Huang
@ 2021-01-12  0:26     ` Jarkko Sakkinen
  1 sibling, 0 replies; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-12  0:26 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kai Huang, linux-sgx, kvm, x86, seanjc, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, Jan 06, 2021 at 10:28:55AM -0800, Dave Hansen wrote:
> On 1/5/21 5:55 PM, Kai Huang wrote:
> > Add SGX_CHILD_PRESENT for use by SGX virtualization to assert EREMOVE
> > failures are expected, but only due to SGX_CHILD_PRESENT.
> 
> This dances around the fact that this is an architectural error-code.
> Could that be explicit?  Maybe the subject should be:
> 
> 	Add SGX_CHILD_PRESENT hardware error code

Yeah, a valid point. Please, change this.

/Jarkko

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-11 23:38   ` Jarkko Sakkinen
@ 2021-01-12  0:56     ` Kai Huang
  2021-01-12  1:50       ` Jarkko Sakkinen
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-12  0:56 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Tue, 12 Jan 2021 01:38:23 +0200 Jarkko Sakkinen wrote:
> On Wed, Jan 06, 2021 at 02:55:20PM +1300, Kai Huang wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > Add a misc device /dev/sgx_virt_epc to allow userspace to allocate "raw"
> > EPC without an associated enclave.  The intended and only known use case
> > for raw EPC allocation is to expose EPC to a KVM guest, hence the
> > virt_epc moniker, virt.{c,h} files and X86_SGX_VIRTUALIZATION Kconfig.
> > 
> > Modify sgx_init() to always try to initialize the virtual EPC driver, even
> > when the SGX driver is disabled because SGX Launch Control is in locked
> > mode or not present at all, since SGX virtualization allows exposing SGX
> > to guests that support non-LC configurations.
> > 
> > Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> > /dev/sgx_virt_epc rather than in KVM. Doing so has two major advantages:
> > 
> >   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
> >     just another memory backend for guests.
> > 
> >   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
> >     does not have to export any symbols, changes to reclaim flows don't
> >     need to be routed through KVM, SGX's dirty laundry doesn't have to
> >     get aired out for the world to see, and so on and so forth.
> > 
> > The virtual EPC allocated to guests is currently not reclaimable, because
> > oversubscription of EPC for KVM guests is not yet supported. Due
> > to the complications of handling reclaim conflicts between guest and
> > host, KVM EPC oversubscription is significantly more complex than basic
> > support for SGX virtualization.
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> 
> The commit message does not describe the code changes. It should
> have an understandable explanation of fops. There is nothing about
> the implementation right now.

Thanks for the feedback. Does "understandable explanation of fops" mean I
should add one sentence saying, for instance: "the userspace hypervisor should
open /dev/sgx_virt_epc, use mmap() to get a valid address range, and then use
that address range to create a KVM memory region"?

Or should I include an example of how to use /dev/sgx_virt_epc in userspace,
such as the one below?

	fd = open("/dev/sgx_virt_epc", O_RDWR);
	void *addr = mmap(NULL, size, ..., fd);
	/* userspace hypervisor uses addr, size to create KVM memory slot */
	...
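Here is a slightly fuller sketch of the userspace side along the lines of the snippet above, under stated assumptions. It follows the two constraints visible in sgx_virt_epc_mmap(). First, the mapping must be MAP_SHARED. Second, it must be created from the same mm that opened the device, meaning the opening process and not a forked child. It also assumes page-granular sizes and uses a hypothetical helper name, map_virt_epc(). Handing the result to KVM, for example as the userspace_addr of a KVM_SET_USER_MEMORY_REGION slot, is left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `size` bytes of virtual EPC from `path`.
 * Returns MAP_FAILED on error.
 */
static void *map_virt_epc(const char *path, size_t size)
{
	void *addr;
	int fd;

	assert(size > 0 && size % 4096 == 0); /* assume page-granular sizes */

	fd = open(path, O_RDWR);
	if (fd < 0)
		return MAP_FAILED;

	/* MAP_SHARED is mandatory: the driver rejects mappings without VM_SHARED. */
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/* The mapping keeps its own reference to the file, so the fd can go. */
	close(fd);
	return addr;
}
```

In the patch, EPC pages are only allocated when sgx_virt_epc_fault() runs. A successful mmap() therefore does not by itself show that enough EPC is available, and an allocation failure may surface at first touch rather than at mmap() time.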

I dug into the SGX driver side to understand what I should add, but I don't see
a description of fops in the commit below either:

	commit 3fe0778edac8628637e2fd23835996523b1a3372
	Author: Jarkko Sakkinen <jarkko@kernel.org>
	Date:   Fri Nov 13 00:01:22 2020 +0200

    	    x86/sgx: Add an SGX misc driver interface


> 
> /Jarkko
> 
> > ---
> >  arch/x86/Kconfig                 |  12 ++
> >  arch/x86/kernel/cpu/sgx/Makefile |   1 +
> >  arch/x86/kernel/cpu/sgx/main.c   |   5 +-
> >  arch/x86/kernel/cpu/sgx/virt.c   | 263 +++++++++++++++++++++++++++++++
> >  arch/x86/kernel/cpu/sgx/virt.h   |  14 ++
> >  5 files changed, 294 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/x86/kernel/cpu/sgx/virt.c
> >  create mode 100644 arch/x86/kernel/cpu/sgx/virt.h
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 618d1aabccb8..a7318175509b 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1947,6 +1947,18 @@ config X86_SGX
> >  
> >  	  If unsure, say N.
> >  
> > +config X86_SGX_VIRTUALIZATION
> > +	bool "Software Guard eXtensions (SGX) Virtualization"
> > +	depends on X86_SGX && KVM_INTEL
> > +	help
> > +
> > +	  Enables KVM guests to create SGX enclaves.
> > +
> > +	  This includes support to expose "raw" unreclaimable enclave memory to
> > +	  guests via a device node, e.g. /dev/sgx_virt_epc.
> > +
> > +	  If unsure, say N.
> > +
> >  config EFI
> >  	bool "EFI runtime service support"
> >  	depends on ACPI
> > diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
> > index 91d3dc784a29..7a25bf63adfb 100644
> > --- a/arch/x86/kernel/cpu/sgx/Makefile
> > +++ b/arch/x86/kernel/cpu/sgx/Makefile
> > @@ -3,3 +3,4 @@ obj-y += \
> >  	encl.o \
> >  	ioctl.o \
> >  	main.o
> > +obj-$(CONFIG_X86_SGX_VIRTUALIZATION)	+= virt.o
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 95aad183bb65..02993a327a1f 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -9,9 +9,11 @@
> >  #include <linux/sched/mm.h>
> >  #include <linux/sched/signal.h>
> >  #include <linux/slab.h>
> > +#include "arch.h"
> >  #include "driver.h"
> >  #include "encl.h"
> >  #include "encls.h"
> > +#include "virt.h"
> >  
> >  struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
> >  static int sgx_nr_epc_sections;
> > @@ -726,7 +728,8 @@ static void __init sgx_init(void)
> >  	if (!sgx_page_reclaimer_init())
> >  		goto err_page_cache;
> >  
> > -	ret = sgx_drv_init();
> > +	/* Success if the native *or* virtual EPC driver initialized cleanly. */
> > +	ret = !!sgx_drv_init() & !!sgx_virt_epc_init();
> >  	if (ret)
> >  		goto err_kthread;
> >  
> > diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> > new file mode 100644
> > index 000000000000..d625551ccf25
> > --- /dev/null
> > +++ b/arch/x86/kernel/cpu/sgx/virt.c
> > @@ -0,0 +1,263 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*  Copyright(c) 2016-20 Intel Corporation. */
> > +
> > +#include <linux/miscdevice.h>
> > +#include <linux/mm.h>
> > +#include <linux/mman.h>
> > +#include <linux/sched/mm.h>
> > +#include <linux/sched/signal.h>
> > +#include <linux/slab.h>
> > +#include <linux/xarray.h>
> > +#include <asm/sgx.h>
> > +#include <uapi/asm/sgx.h>
> > +
> > +#include "encls.h"
> > +#include "sgx.h"
> > +#include "virt.h"
> > +
> > +struct sgx_virt_epc {
> > +	struct xarray page_array;
> > +	struct mutex lock;
> > +	struct mm_struct *mm;
> > +};
> > +
> > +static struct mutex virt_epc_lock;
> > +static struct list_head virt_epc_zombie_pages;
> > +
> > +static int __sgx_virt_epc_fault(struct sgx_virt_epc *epc,
> > +				struct vm_area_struct *vma, unsigned long addr)
> > +{
> > +	struct sgx_epc_page *epc_page;
> > +	unsigned long index, pfn;
> > +	int ret;
> > +
> > +	/* epc->lock must already be held */
> > +
> > +	/* Calculate index of EPC page in virtual EPC's page_array */
> > +	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
> > +
> > +	epc_page = xa_load(&epc->page_array, index);
> > +	if (epc_page)
> > +		return 0;
> > +
> > +	epc_page = sgx_alloc_epc_page(epc, false);
> > +	if (IS_ERR(epc_page))
> > +		return PTR_ERR(epc_page);
> > +
> > +	ret = xa_err(xa_store(&epc->page_array, index, epc_page, GFP_KERNEL));
> > +	if (ret)
> > +		goto err_free;
> > +
> > +	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
> > +
> > +	ret = vmf_insert_pfn(vma, addr, pfn);
> > +	if (ret != VM_FAULT_NOPAGE) {
> > +		ret = -EFAULT;
> > +		goto err_delete;
> > +	}
> > +
> > +	return 0;
> > +
> > +err_delete:
> > +	xa_erase(&epc->page_array, index);
> > +err_free:
> > +	sgx_free_epc_page(epc_page);
> > +	return ret;
> > +}
> > +
> > +static vm_fault_t sgx_virt_epc_fault(struct vm_fault *vmf)
> > +{
> > +	struct vm_area_struct *vma = vmf->vma;
> > +	struct sgx_virt_epc *epc = vma->vm_private_data;
> > +	int ret;
> > +
> > +	mutex_lock(&epc->lock);
> > +	ret = __sgx_virt_epc_fault(epc, vma, vmf->address);
> > +	mutex_unlock(&epc->lock);
> > +
> > +	if (!ret)
> > +		return VM_FAULT_NOPAGE;
> > +
> > +	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
> > +		mmap_read_unlock(vma->vm_mm);
> > +		return VM_FAULT_RETRY;
> > +	}
> > +
> > +	return VM_FAULT_SIGBUS;
> > +}
> > +
> > +static const struct vm_operations_struct sgx_virt_epc_vm_ops = {
> > +	.fault = sgx_virt_epc_fault,
> > +};
> > +
> > +static int sgx_virt_epc_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +	struct sgx_virt_epc *epc = file->private_data;
> > +
> > +	if (!(vma->vm_flags & VM_SHARED))
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * Don't allow mmap() from child after fork(), since child and parent
> > +	 * cannot map to the same EPC.
> > +	 */
> > +	if (vma->vm_mm != epc->mm)
> > +		return -EINVAL;
> > +
> > +	vma->vm_ops = &sgx_virt_epc_vm_ops;
> > +	/* Don't copy VMA in fork() */
> > +	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
> > +	vma->vm_private_data = file->private_data;
> > +
> > +	return 0;
> > +}
> > +
> > +static int sgx_virt_epc_free_page(struct sgx_epc_page *epc_page)
> > +{
> > +	int ret;
> > +
> > +	if (!epc_page)
> > +		return 0;
> > +
> > +	/*
> > +	 * Explicitly EREMOVE virtual EPC page. Virtual EPC is only used by
> > +	 * guest, and in normal condition guest should have done EREMOVE for
> > +	 * all EPC pages before they are freed here. But it's possible guest
> > +	 * is killed or crashed abnormally in which case EREMOVE has not been
> > +	 * done. Do EREMOVE unconditionally here to cover both cases, because
> > +	 * it's not possible to tell whether guest has done EREMOVE, since
> > +	 * virtual EPC page status is not tracked. And it is fine to EREMOVE
> > +	 * EPC page multiple times.
> > +	 */
> > +	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
> > +	if (ret) {
> > +		/*
> > +		 * Only SGX_CHILD_PRESENT is expected, which is because of
> > +		 * EREMOVE-ing an SECS still with child, in which case it can
> > +		 * be handled by EREMOVE-ing the SECS again after all pages in
> > +		 * virtual EPC have been EREMOVE-ed. See comments below in
> > +		 * sgx_virt_epc_release().
> > +		 */
> > +		WARN_ON_ONCE(ret != SGX_CHILD_PRESENT);
> > +		return ret;
> > +	}
> > +
> > +	__sgx_free_epc_page(epc_page);
> > +	return 0;
> > +}
> > +
> > +static int sgx_virt_epc_release(struct inode *inode, struct file *file)
> > +{
> > +	struct sgx_virt_epc *epc = file->private_data;
> > +	struct sgx_epc_page *epc_page, *tmp, *entry;
> > +	unsigned long index;
> > +
> > +	LIST_HEAD(secs_pages);
> > +
> > +	mmdrop(epc->mm);
> > +
> > +	xa_for_each(&epc->page_array, index, entry) {
> > +		/*
> > +		 * Virtual EPC pages are not tracked, so it's possible for
> > +		 * EREMOVE to fail due to, e.g. a SECS page still has children
> > +		 * if guest was shutdown unexpectedly. If it is the case, leave
> > +		 * it in the xarray and retry EREMOVE below later.
> > +		 */
> > +		if (sgx_virt_epc_free_page(entry))
> > +			continue;
> > +
> > +		xa_erase(&epc->page_array, index);
> > +	}
> > +
> > +	/*
> > +	 * Retry all failed pages after iterating through the entire tree, at
> > +	 * which point all children should be removed and the SECS pages can be
> > +	 * nuked as well...unless userspace has exposed multiple instances of
> > +	 * virtual EPC to a single VM.
> > +	 */
> > +	xa_for_each(&epc->page_array, index, entry) {
> > +		epc_page = entry;
> > +		/*
> > +		 * Error here means that EREMOVE failed due to a SECS page
> > +		 * still has child on *another* EPC instance.  Put it to a
> > +		 * temporary SECS list which will be spliced to 'zombie page
> > +		 * list' and will be EREMOVE-ed again when freeing another
> > +		 * virtual EPC instance.
> > +		 */
> > +		if (sgx_virt_epc_free_page(epc_page))
> > +			list_add_tail(&epc_page->list, &secs_pages);
> > +
> > +		xa_erase(&epc->page_array, index);
> > +	}
> > +
> > +	/*
> > +	 * Third time's a charm.  Try to EREMOVE zombie SECS pages from virtual
> > +	 * EPC instances that were previously released, i.e. free SECS pages
> > +	 * that were in limbo due to having children in *this* EPC instance.
> > +	 */
> > +	mutex_lock(&virt_epc_lock);
> > +	list_for_each_entry_safe(epc_page, tmp, &virt_epc_zombie_pages, list) {
> > +		/*
> > +		 * Speculatively remove the page from the list of zombies, if
> > +		 * the page is successfully EREMOVE it will be added to the
> > +		 * list of free pages.  If EREMOVE fails, throw the page on the
> > +		 * local list, which will be spliced on at the end.
> > +		 */
> > +		list_del(&epc_page->list);
> > +
> > +		if (sgx_virt_epc_free_page(epc_page))
> > +			list_add_tail(&epc_page->list, &secs_pages);
> > +	}
> > +
> > +	if (!list_empty(&secs_pages))
> > +		list_splice_tail(&secs_pages, &virt_epc_zombie_pages);
> > +	mutex_unlock(&virt_epc_lock);
> > +
> > +	kfree(epc);
> > +
> > +	return 0;
> > +}
> > +
> > +static int sgx_virt_epc_open(struct inode *inode, struct file *file)
> > +{
> > +	struct sgx_virt_epc *epc;
> > +
> > +	epc = kzalloc(sizeof(struct sgx_virt_epc), GFP_KERNEL);
> > +	if (!epc)
> > +		return -ENOMEM;
> > +	/*
> > +	 * Keep the current->mm to virtual EPC. It will be checked in
> > +	 * sgx_virt_epc_mmap() to prevent, in case of fork, child being
> > +	 * able to mmap() to the same virtual EPC pages.
> > +	 */
> > +	mmgrab(current->mm);
> > +	epc->mm = current->mm;
> > +	mutex_init(&epc->lock);
> > +	xa_init(&epc->page_array);
> > +
> > +	file->private_data = epc;
> > +
> > +	return 0;
> > +}
> > +
> > +static const struct file_operations sgx_virt_epc_fops = {
> > +	.owner			= THIS_MODULE,
> > +	.open			= sgx_virt_epc_open,
> > +	.release		= sgx_virt_epc_release,
> > +	.mmap			= sgx_virt_epc_mmap,
> > +};
> > +
> > +static struct miscdevice sgx_virt_epc_dev = {
> > +	.minor = MISC_DYNAMIC_MINOR,
> > +	.name = "sgx_virt_epc",
> > +	.nodename = "sgx_virt_epc",
> > +	.fops = &sgx_virt_epc_fops,
> > +};
> > +
> > +int __init sgx_virt_epc_init(void)
> > +{
> > +	INIT_LIST_HEAD(&virt_epc_zombie_pages);
> > +	mutex_init(&virt_epc_lock);
> > +
> > +	return misc_register(&sgx_virt_epc_dev);
> > +}
> > diff --git a/arch/x86/kernel/cpu/sgx/virt.h b/arch/x86/kernel/cpu/sgx/virt.h
> > new file mode 100644
> > index 000000000000..e5434541a122
> > --- /dev/null
> > +++ b/arch/x86/kernel/cpu/sgx/virt.h
> > @@ -0,0 +1,14 @@
> > +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> > +#ifndef _ASM_X86_SGX_VIRT_H
> > +#define _ASM_X86_SGX_VIRT_H
> > +
> > +#ifdef CONFIG_X86_SGX_VIRTUALIZATION
> > +int __init sgx_virt_epc_init(void);
> > +#else
> > +static inline int __init sgx_virt_epc_init(void)
> > +{
> > +	return -ENODEV;
> > +}
> > +#endif
> > +
> > +#endif /* _ASM_X86_SGX_VIRT_H */
> > -- 
> > 2.29.2
> > 
> > 
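The combined return value in sgx_init() is easy to misread. A small userspace model makes it concrete, using fake_drv_init() and fake_virt_epc_init() as hypothetical stand-ins that return 0 on success. The expression is non-zero only when both initializations fail. The bitwise & also matters, because both operands are always evaluated. With &&, evaluation would stop as soon as !!sgx_drv_init() evaluated to 0, so a successful native driver init would skip registering the virtual EPC device.

```c
#include <assert.h>

/* Hypothetical stand-ins for sgx_drv_init()/sgx_virt_epc_init(). */
static int native_ret, virt_ret;     /* 0 = success, negative errno = failure */
static int native_calls, virt_calls; /* how often each init was attempted */

static int fake_drv_init(void)
{
	native_calls++;
	return native_ret;
}

static int fake_virt_epc_init(void)
{
	virt_calls++;
	return virt_ret;
}

/* Same shape as the patch: 1 ("goto err_kthread") only if both inits fail. */
static int combined_init(void)
{
	return !!fake_drv_init() & !!fake_virt_epc_init();
}
```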

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-11 17:20 ` Jarkko Sakkinen
  2021-01-11 18:37   ` Sean Christopherson
@ 2021-01-12  1:14   ` Kai Huang
  2021-01-12  2:02     ` Jarkko Sakkinen
  1 sibling, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-12  1:14 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, mattson, joro,
	vkuznets, wanpengli, corbet

On Mon, 11 Jan 2021 19:20:48 +0200 Jarkko Sakkinen wrote:
> On Wed, 2021-01-06 at 14:55 +1300, Kai Huang wrote:
> > --- Disclaimer ---
> > 
> > These patches were originally written by Sean Christopherson while at Intel.
> > Now that Sean has left Intel, I (Kai) have taken over getting them upstream.
> > This series needs more review before it can be merged.  It is being posted
> > publicly and under RFC so Sean and others can review it. Maintainers are safe
> > ignoring it for now.
> > 
> > ------------------
> > 
> > Hi all,
> > 
> > This series adds KVM SGX virtualization support. The first 12 patches starting
> > with x86/sgx or x86/cpu.. are necessary changes to x86 and SGX core/driver to
> > support KVM SGX virtualization, while the rest are patches to KVM subsystem.
> > 
> > Please help to review this series. Also I'd like to hear what is the proper
> > way to merge this series, since it contains changes to both x86/SGX and KVM
> > subsystem. Any feedback is highly appreciated. And please let me know if I
> > forgot to CC anyone, or anyone wants to be removed from CC. Thanks in advance!
> > 
> > This series is based on the latest tip tree's x86/sgx branch. You can also get
> > the code from the tip branch of the kvm-sgx repo on GitHub:
> > 
> >         https://github.com/intel/kvm-sgx.git tip
> > 
> > It also requires Qemu changes to create a VM with SGX support. You can find the Qemu
> > repo here:
> > 
> >         https://github.com/intel/qemu-sgx.git next
> > 
> > Please refer to README.md of the above qemu-sgx repo for details on how to create
> > a guest with SGX support. In the meantime, for quick reference you can use the
> > command below to create an SGX guest:
> > 
> >         #qemu-system-x86_64 -smp 4 -m 2G -drive file=<your_vm_image>,if=virtio \
> >                 -cpu host,+sgx_provisionkey \
> >                 -sgx-epc id=epc1,memdev=mem1 \
> >                 -object memory-backend-epc,id=mem1,size=64M,prealloc
> > 
> > Please note that the SGX relevant part is:
> > 
> >                 -cpu host,+sgx_provisionkey \
> >                 -sgx-epc id=epc1,memdev=mem1 \
> >                 -object memory-backend-epc,id=mem1,size=64M,prealloc
> > 
> > And you can change other parameters of your qemu command based on your needs.
> 
> Thanks a lot for documenting these snippets in the cover letter. I'll dig these
> up from lore once my environment is working.
> 
> I'm setting up an Arch-based test environment with an eye on this patch set
> and the generic Linux keyring patches:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/jarkko/arch.git/
> 
> I still have some minor bits to adjust before I can start deploying it for SGX
> testing. For this patch set I'll use two instances of it.

Thanks. Please let me know if you need anything more.

> 
> > =========
> > KVM SGX virtualization Overview
> > 
> > - Virtual EPC
> > 
> > "Virtual EPC" is the EPC section exposed by KVM to guest so SGX software in
> > guest can discover it and use it to create SGX enclaves. KVM exposes SGX to 
> 
> Virtual EPC is a representation of an EPC section. And there is no "the".
> 
> > guest via CPUID, and exposes one or more "virtual EPC" sections to the guest.
> > The size of each "virtual EPC" section is passed as a Qemu parameter when
> > creating the guest, and the base address is calculated internally according
> > to the guest's configuration.
> > 
> > To support virtual EPC, add a new misc device /dev/sgx_virt_epc to the SGX
> > core/driver to allow userspace (Qemu) to allocate "raw" EPC and use it as
> > "virtual EPC" for the guest. Obviously, unlike EPC allocated for the host SGX
> > driver, virtual EPC allocated via /dev/sgx_virt_epc has no associated
> > enclave, and how virtual EPC is used by the guest is completely controlled by
> > the guest's SGX software.
> 
> I think that /dev/sgx_vepc would be a clear enough name for the device. The
> terminology in this text is currently a bit confusing.

/dev/sgx_virt_epc may be clearer from userspace's perspective: if people see
/dev/sgx_vepc, they may have to think about what it is, while with
/dev/sgx_virt_epc they may not.

But I don't have a strong objection here. Does anyone have anything to say here?

* Re: [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code
  2021-01-12  0:16     ` Kai Huang
@ 2021-01-12  1:46       ` Jarkko Sakkinen
  0 siblings, 0 replies; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-12  1:46 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Tue, Jan 12, 2021 at 01:16:53PM +1300, Kai Huang wrote:
> On Tue, 12 Jan 2021 01:32:19 +0200 Jarkko Sakkinen wrote:
> > On Wed, Jan 06, 2021 at 02:55:19PM +1300, Kai Huang wrote:
> > > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > > 
> > > SGX virtualization requires allocating "raw" EPC and using it as "virtual
> > > EPC" for the SGX guest.  Unlike EPC used by the SGX driver, virtual EPC doesn't
> > > track how EPC pages are used in the VM, e.g. (de)construction of enclaves,
> > > so it cannot guarantee EREMOVE success, e.g. it has no a priori
> > > knowledge of which pages are SECS with non-zero child counts.
> > > 
> > > Add SGX_CHILD_PRESENT for use by SGX virtualization to assert that EREMOVE
> > > failures are expected, but only due to SGX_CHILD_PRESENT.
> > > 
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > 
> > Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> 
> Thanks Jarkko. 
> 
> Dave suggested changing the patch subject to explicitly call out the hardware
> error code:
> 	Add SGX_CHILD_PRESENT hardware error code
> 
> I suppose this also works for you, and I can keep your Acked-by after I change
> that in v2?

Yeah, I agree with that.

/Jarkko
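A minimal sketch of why EREMOVE failures with SGX_CHILD_PRESENT are expected when freeing virtual EPC: the host cannot know which pages the guest turned into SECS pages, so a release path could EREMOVE every page once, tolerate SGX_CHILD_PRESENT, and retry the failures after all child pages are gone. The page model, fake_eremove() and release_virtual_epc() are hypothetical stand-ins, not kernel code, and the value 13 is assumed to match the SDM's SGX_CHILD_PRESENT error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match the SDM value of the SGX_CHILD_PRESENT error code. */
#define SGX_CHILD_PRESENT 13

/* Hypothetical model of an EPC page: the host only ever sees the result
 * of EREMOVE, never how the guest used the page. */
struct fake_epc_page {
	int parent;      /* index of the owning SECS page, or -1 */
	int child_count; /* non-zero only for SECS pages with live children */
	bool removed;
};

/* Stand-in for EREMOVE: fails while a SECS page still has children,
 * otherwise frees the page and drops its parent's child count. */
static int fake_eremove(struct fake_epc_page *pages, int idx)
{
	struct fake_epc_page *page = &pages[idx];

	if (page->child_count > 0)
		return SGX_CHILD_PRESENT;

	page->removed = true;
	if (page->parent >= 0)
		pages[page->parent].child_count--;
	return 0;
}

/* Two-pass release.  The first pass may only fail with SGX_CHILD_PRESENT;
 * once every child is gone, retrying the SECS pages must succeed.
 * Returns the number of pages that still could not be removed. */
static size_t release_virtual_epc(struct fake_epc_page *pages, size_t nr)
{
	size_t i, failed = 0;

	for (i = 0; i < nr; i++) {
		int ret = fake_eremove(pages, (int)i);

		/* Any other error would indicate a host bug. */
		assert(ret == 0 || ret == SGX_CHILD_PRESENT);
	}

	for (i = 0; i < nr; i++) {
		if (!pages[i].removed && fake_eremove(pages, (int)i) != 0)
			failed++;
	}
	return failed;
}
```

In this model a SECS page at index 0 fails the first pass, its two children at indices 1 and 2 are removed, and the second pass then frees the SECS.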

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-12  0:56     ` Kai Huang
@ 2021-01-12  1:50       ` Jarkko Sakkinen
  2021-01-12  2:03         ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-12  1:50 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Tue, Jan 12, 2021 at 01:56:54PM +1300, Kai Huang wrote:
> On Tue, 12 Jan 2021 01:38:23 +0200 Jarkko Sakkinen wrote:
> > On Wed, Jan 06, 2021 at 02:55:20PM +1300, Kai Huang wrote:
> > > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > > 
> > > Add a misc device /dev/sgx_virt_epc to allow userspace to allocate "raw"
> > > EPC without an associated enclave.  The intended and only known use case
> > > for raw EPC allocation is to expose EPC to a KVM guest, hence the
> > > virt_epc moniker, virt.{c,h} files and X86_SGX_VIRTUALIZATION Kconfig.
> > > 
> > > Modify sgx_init() to always try to initialize the virtual EPC driver, even
> > > when the SGX driver is disabled because SGX Launch Control is in locked mode
> > > or not present at all, since SGX virtualization allows exposing SGX to
> > > guests that support non-LC configurations.
> > > 
> > > Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> > > /dev/sgx_virt_epc rather than in KVM. Doing so has two major advantages:
> > > 
> > >   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
> > >     just another memory backend for guests.
> > > 
> > >   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
> > >     does not have to export any symbols, changes to reclaim flows don't
> > >     need to be routed through KVM, SGX's dirty laundry doesn't have to
> > >     get aired out for the world to see, and so on and so forth.
> > > 
> > > The virtual EPC allocated to guests is currently not reclaimable, because
> > > oversubscription of EPC for KVM guests is not currently supported. Due
> > > to the complications of handling reclaim conflicts between guest and
> > > host, KVM EPC oversubscription is significantly more complex than basic
> > > support for SGX virtualization.
> > > 
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > 
> > The commit message does not describe the code changes. It should
> > have an understandable explanation of fops. There is nothing about
> > the implementation right now.
> 
> Thanks for the feedback. Does "understandable explanation of fops" mean I
> should add one sentence saying, for instance: "the userspace hypervisor should
> open /dev/sgx_virt_epc, use mmap() to get a valid address range, and then use
> that address range to create a KVM memory region"?
> 
> Or should I include an example of how to use /dev/sgx_virt_epc in userspace,
> such as the one below?
> 
> 	fd = open("/dev/sgx_virt_epc", O_RDWR);
> 	void *addr = mmap(NULL, size, ..., fd);
> 	/* userspace hypervisor uses addr, size to create KVM memory slot */
> 	...

I would suggest just describing them in a few sentences. Just write
how you understand them in one paragraph.

/Jarkko
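A minimal userspace sketch of the flow described above, assuming the device is named /dev/sgx_virt_epc (as in this RFC; it may become /dev/sgx_vepc) and that it only accepts shared, read/write mappings. map_virtual_epc() is a hypothetical helper name, not part of any existing API.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map 'size' bytes of raw EPC from the virtual EPC device at 'path'.
 * Returns the mapping, or MAP_FAILED with errno set. */
static void *map_virtual_epc(const char *path, size_t size)
{
	void *addr;
	int fd, saved_errno;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return MAP_FAILED;

	/* Assumption: the device requires a shared read/write mapping. */
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/* The mapping holds its own reference to the file, so the fd can
	 * be closed; preserve errno from mmap() across close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return addr;
}
```

The returned address and size would then be passed as userspace_addr and memory_size in a struct kvm_userspace_memory_region to the KVM_SET_USER_MEMORY_REGION ioctl. From KVM's point of view this is just another memory backend, which is why no KVM uAPI change is needed.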

* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-11 18:37   ` Sean Christopherson
@ 2021-01-12  1:58     ` Jarkko Sakkinen
  0 siblings, 0 replies; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-12  1:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, linux-sgx, kvm, x86, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, mattson, joro,
	vkuznets, wanpengli, corbet

On Mon, Jan 11, 2021 at 10:37:05AM -0800, Sean Christopherson wrote:
> On Mon, Jan 11, 2021, Jarkko Sakkinen wrote:
> > On Wed, 2021-01-06 at 14:55 +1300, Kai Huang wrote:
> > >   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
> > >     just another memory backend for guests.
> > 
> > Why is this an advantage? No objection, just a question.
> 
> There are zero KVM changes required to support exposing EPC to a guest.  KVM's
> MMU is completely ignorant of what physical backing is used for any given host
> virtual address.  KVM has to be aware of various VM_* flags, e.g. VM_PFNMAP and
> VM_IO, but that code is arch agnostic and is quite isolated.

Right, thanks for explanation.

> > >   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
> > >     does not have to export any symbols, changes to reclaim flows don't
> > >     need to be routed through KVM, SGX's dirty laundry doesn't have to
> > >     get aired out for the world to see, and so on and so forth.
> > 
> > No comments to this before understanding code changes better.
> > 
> > > The virtual EPC allocated to guests is currently not reclaimable, because
> > > reclaiming EPC from KVM guests is not currently supported. Due to the
> > > complications of handling reclaim conflicts between guest and host, KVM
> > > EPC oversubscription, which allows the total virtual EPC size to exceed
> > > physical EPC by reclaiming guests' EPC, is significantly more
> > > complex than basic support for SGX virtualization.
> > 
> > I think it should be really in the center of the patch set description that
> > this patch set implements segmentation of EPC, not oversubscription. It should
> > be clear immediately. It's a core part of knowing "what I'm looking at".
> 
> Technically, it doesn't implement segmentation of EPC.  It implements
> non-reclaimable EPC allocation.  Even that is somewhat untrue as the EPC can be
> forcefully reclaimed, but doing so will destroy the guest contents.

In the SGX case, that isn't actually as bad a policy in high-stress
situations as it is with "normal" applications.  Runtimes must expect
the enclave to disappear at any point in time anyway...

> Userspace can oversubscribe the EPC to KVM guests, but it would need to kill,
> migrate, or pause one or more VMs if the pool of physical EPC were exhausted.

Right.
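To make the "non-reclaimable allocation" point concrete, here is a hypothetical accounting model: userspace may configure more virtual EPC than physically exists, but, assuming pages are backed lazily on guest faults, a fault fails once the physical pool is empty because nothing can be reclaimed. All structures and function names are illustrative, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the physical EPC pool. */
struct epc_pool {
	size_t total_pages;
	size_t free_pages;
};

/* Hypothetical per-VM virtual EPC section. */
struct vepc_section {
	size_t size_pages;   /* size configured by userspace */
	size_t backed_pages; /* pages currently backed by physical EPC */
};

/* Back one more page of 'sec' on a guest fault.  Virtual EPC is never
 * reclaimed, so an empty pool simply fails the fault. */
static int vepc_fault_one(struct epc_pool *pool, struct vepc_section *sec)
{
	assert(sec->backed_pages <= sec->size_pages);

	if (sec->backed_pages == sec->size_pages)
		return -EFAULT; /* access outside the section */
	if (pool->free_pages == 0)
		return -ENOMEM; /* no reclaim: nothing can be evicted */

	pool->free_pages--;
	sec->backed_pages++;
	return 0;
}

/* Tearing down a VM is the only way its EPC returns to the pool. */
static void vepc_release(struct epc_pool *pool, struct vepc_section *sec)
{
	pool->free_pages += sec->backed_pages;
	sec->backed_pages = 0;
}
```

With two 3-page sections on a 4-page pool, the configuration is oversubscribed; on -ENOMEM the VMM's only recourse is the one Sean lists, i.e. kill, migrate, or pause a VM so that vepc_release() returns its pages.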

> 
> > > - Support SGX virtualization without SGX Launch Control unlocked mode
> > > 
> > > Although the SGX driver requires SGX Launch Control unlocked mode to work, SGX
> > > virtualization doesn't, since how enclaves are created is completely controlled
> > > by the guest's SGX software, which is not necessarily Linux. Therefore, this series
> > > allows KVM to expose SGX to the guest even when SGX Launch Control is in locked
> > > mode, or is not present at all. The reason is that the goal of SGX virtualization,
> > > or virtualization in general, is to expose hardware features to the guest, not to
> > > make assumptions about how the guest will use them. Therefore, KVM should support
> > > SGX guests whenever the hardware is able to, so that more potential use cases in
> > > cloud environments can be supported.
> > 
> > AFAIK the convergence point with the FLC was, and is that Linux never enables
> > SGX with locked MSRs.
> > 
> > And I don't understand, if it is not fine to allow locked SGX for a *process*,
> > why is it fine for a *virtual machine*? They have a lot in common.
> 
> Because it's a completely different OS/kernel.  If the user has a kernel that
> supports locked SGX, then so be it.  There's no novel circumvention of the
> kernel policy, e.g. the user could simply boot the non-upstream kernel directly,
> and running an upstream kernel in the guest will not cause the kernel to support
> SGX.
> 
> There are any number of things that are allowed in a KVM guest that are not
> allowed in a bare metal process.

I buy this.

> > I cannot remember off the top of my head: could the Intel SHA256 be read when
> > booted with unlocked MSRs? If that is the case, then you can still support
> > guests with that configuration.
> 
> No, it's not guaranteed to be readable as firmware could have already changed
> the values in the MSRs.

Right.

> > Context-dependent guidelines tend to also trash code big time. Also, for the
> > sake of a sane kernel code base, I would consider only supporting unlocked
> > MSRs.
> 
> It's one line of code to teach the kernel driver not to load if the MSRs are
> locked.  And IMO, that one line of code is a net positive as it makes it clear
> in the driver itself that it chooses not to support locked MSRs, even if SGX
> itself is fully enabled.

Yup, I think this clears my concerns, thank you.

/Jarkko
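A simplified sketch of the init policy agreed on above and described in patch 03: the host enclave driver is gated on unlocked Launch Control (the "one line of code"), while the virtual EPC device comes up whenever SGX itself is enabled. The IA32_FEATURE_CONTROL bit positions (lock bit 0, SGX_LC enable bit 17, SGX enable bit 18) and the CPUID.(EAX=7,ECX=0):ECX[30] SGX_LC bit come from the SDM; sgx_init_policy() and struct sgx_init_result are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_FEATURE_CONTROL (MSR 0x3a) bits relevant to SGX. */
#define FEAT_CTL_LOCKED         (1ULL << 0)
#define FEAT_CTL_SGX_LC_ENABLED (1ULL << 17)
#define FEAT_CTL_SGX_ENABLED    (1ULL << 18)

/* Hypothetical summary of what sgx_init() would bring up. */
struct sgx_init_result {
	bool host_driver; /* enclave driver: needs writable LE hash MSRs */
	bool virt_epc;    /* virtual EPC device for KVM guests */
};

/*
 * Simplified: SGX is treated as usable only when FEATURE_CONTROL is
 * locked with the SGX enable bit set.  'has_flc' stands for the CPUID
 * SGX_LC (Flexible Launch Control) bit.
 */
static struct sgx_init_result sgx_init_policy(uint64_t feat_ctl, bool has_flc)
{
	struct sgx_init_result res = { .host_driver = false, .virt_epc = false };

	if (!(feat_ctl & FEAT_CTL_LOCKED) || !(feat_ctl & FEAT_CTL_SGX_ENABLED))
		return res;

	/* The host driver refuses to load with locked launch-enclave MSRs. */
	res.host_driver = has_flc && (feat_ctl & FEAT_CTL_SGX_LC_ENABLED);

	/* Virtual EPC does not depend on Launch Control at all. */
	res.virt_epc = true;
	return res;
}
```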

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-11 19:20                         ` Sean Christopherson
@ 2021-01-12  2:01                           ` Kai Huang
  2021-01-12 12:13                           ` Borislav Petkov
  1 sibling, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-12  2:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Mon, 11 Jan 2021 11:20:11 -0800 Sean Christopherson wrote:
> On Mon, Jan 11, 2021, Borislav Petkov wrote:
> > On Mon, Jan 11, 2021 at 09:54:17AM -0800, Sean Christopherson wrote:
> > > It would be possible for KVM to break the dependency on X86_FEATURE_* bit
> > > offsets by defining a translation layer, but I strongly feel that adding manual
> > > translations will do more harm than good as it increases the odds of us botching
> > > a translation or using the wrong feature flag, creates potential namespace
> > > conflicts, etc...
> > 
> > Ok, lemme see if we might encounter more issues down the road...
> > 
> > +enum kvm_only_cpuid_leafs {
> > +       CPUID_12_EAX     = NCAPINTS,
> > +       NR_KVM_CPU_CAPS,
> > +
> > +       NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
> > +};
> > +
> > 
> > What happens when we decide to allocate a separate leaf for CPUID_12_EAX
> > down the road?
> 
> Well, mechanically, that would generate a build failure if the kernel does the
> obvious thing and names the 'enum cpuid_leafs' entry CPUID_12_EAX.  That would
> be an obvious clue that KVM should be updated.
> 
> If the kernel named the enum entry something different, and we botched the code
> review, KVM would continue to work, but would unnecessarily copy the bits it
> cares about to its own word.   E.g. the boot_cpu_has() checks and translation to
> __X86_FEATURE_* would still be valid.  As far as failure modes go, that's not
> terrible.

Should we add a dedicated array, e.g. kvm_scattered_cpu_caps[], instead of using
the existing kvm_cpu_caps[NCAPINTS]? If so, this issue can be avoided?
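A self-contained sketch of how KVM-only words could sit after NCAPINTS, reusing the enum from the prototype quoted above plus a word * 32 + bit encoding. The truncated enum cpuid_leafs, the KVM_X86_FEATURE() macro and helper names, and the SGX1/SGX2 positions in CPUID.(EAX=0x12,ECX=0):EAX (bits 0 and 1) are illustrative assumptions; the real kernel enum has many more words.

```c
#include <stdint.h>

/* Hypothetical, truncated stand-in for the kernel's enum cpuid_leafs. */
enum cpuid_leafs {
	CPUID_1_EDX = 0,
	CPUID_1_ECX,
	CPUID_7_0_EBX,
	NCAPINTS,
};

/* KVM-only words are numbered after the kernel's words, so the
 * word * 32 + bit encoding never collides with an X86_FEATURE_* value. */
enum kvm_only_cpuid_leafs {
	CPUID_12_EAX = NCAPINTS,
	NR_KVM_CPU_CAPS,

	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
};

#define KVM_X86_FEATURE(w, f)  ((w) * 32 + (f))
#define KVM_X86_FEATURE_SGX1   KVM_X86_FEATURE(CPUID_12_EAX, 0)
#define KVM_X86_FEATURE_SGX2   KVM_X86_FEATURE(CPUID_12_EAX, 1)

/* One u32 per word, covering both kernel and KVM-only words. */
static uint32_t kvm_cpu_caps[NR_KVM_CPU_CAPS];

static void kvm_cpu_cap_set(unsigned int feature)
{
	kvm_cpu_caps[feature / 32] |= 1u << (feature % 32);
}

static int kvm_cpu_cap_has(unsigned int feature)
{
	return !!(kvm_cpu_caps[feature / 32] & (1u << (feature % 32)));
}
```

If the kernel later added its own CPUID_12_EAX to enum cpuid_leafs, the two enumerators would be redeclared in the same scope and the build would fail, which is the "obvious clue" Sean mentions; a differently named kernel entry would merely leave KVM copying the bits into its own word.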

> 
> > You do it already here
> > 
> > Subject: [PATCH 04/13] x86/cpufeatures: Assign dedicated feature word for AMD mem encryption
> > 
> > for the AMD leaf.
> > 
> > I'm thinking this way around - from scattered to a hw one - should be ok
> > because that should work easily. The other way around, taking a hw leaf
> > and scattering it around x86_capability[] array elems would probably be
> > nasty but with your change that should work too.
> > 
> > Yah, I'm just hypothesizing here - I don't think this "other way around"
> > will ever happen...
> > 
> > Hmm, yap, I can cautiously say that with your change we should be ok...
> > 
> > Thx.
> > 
> > -- 
> > Regards/Gruss,
> >     Boris.
> > 
> > https://people.kernel.org/tglx/notes-about-netiquette

* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-12  1:14   ` Kai Huang
@ 2021-01-12  2:02     ` Jarkko Sakkinen
  2021-01-12  2:07       ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-12  2:02 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, mattson, joro,
	vkuznets, wanpengli, corbet

On Tue, Jan 12, 2021 at 02:14:28PM +1300, Kai Huang wrote:
> On Mon, 11 Jan 2021 19:20:48 +0200 Jarkko Sakkinen wrote:
> > On Wed, 2021-01-06 at 14:55 +1300, Kai Huang wrote:
> > > --- Disclaimer ---
> > > 
> > > These patches were originally written by Sean Christopherson while at Intel.
> > > Now that Sean has left Intel, I (Kai) have taken over getting them upstream.
> > > This series needs more review before it can be merged.  It is being posted
> > > publicly and under RFC so Sean and others can review it. Maintainers are safe
> > > ignoring it for now.
> > > 
> > > ------------------
> > > 
> > > Hi all,
> > > 
> > > This series adds KVM SGX virtualization support. The first 12 patches, prefixed
> > > with x86/sgx or x86/cpu, are the changes to x86 and the SGX core/driver needed to
> > > support KVM SGX virtualization, while the rest are patches to the KVM subsystem.
> > > 
> > > Please help review this series. I'd also like to hear what the proper way to
> > > merge this series is, since it contains changes to both the x86/SGX and KVM
> > > subsystems. Any feedback is highly appreciated. And please let me know if I
> > > forgot to CC anyone, or if anyone wants to be removed from CC. Thanks in advance!
> > > 
> > > This series is based on the latest tip tree's x86/sgx branch. You can also get
> > > the code from the tip branch of the kvm-sgx repo on GitHub:
> > > 
> > >         https://github.com/intel/kvm-sgx.git tip
> > > 
> > > It also requires Qemu changes to create a VM with SGX support. You can find the
> > > Qemu repo here:
> > > 
> > >         https://github.com/intel/qemu-sgx.git next
> > > 
> > > Please refer to README.md in the above qemu-sgx repo for details on how to
> > > create a guest with SGX support. In the meantime, for quick reference, you can
> > > use the command below to create an SGX guest:
> > > 
> > >         #qemu-system-x86_64 -smp 4 -m 2G -drive file=<your_vm_image>,if=virtio \
> > >                 -cpu host,+sgx_provisionkey \
> > >                 -sgx-epc id=epc1,memdev=mem1 \
> > >                 -object memory-backend-epc,id=mem1,size=64M,prealloc
> > > 
> > > Please note that the SGX relevant part is:
> > > 
> > >                 -cpu host,+sgx_provisionkey \
> > >                 -sgx-epc id=epc1,memdev=mem1 \
> > >                 -object memory-backend-epc,id=mem1,size=64M,prealloc
> > > 
> > > And you can change other parameters of your qemu command based on your needs.
> > 
> > Thanks a lot for documenting these snippets in the cover letter. I'll dig these
> > up from lore once my environment is working.
> > 
> > I'm setting up an Arch-based test environment with an eye on this patch set
> > and the generic Linux keyring patches:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/jarkko/arch.git/
> > 
> > I still have some minor bits to adjust before I can start deploying it for SGX
> > testing. For this patch set I'll use two instances of it.
> 
> Thanks. Please let me know if you need anything more.
> 
> > 
> > > =========
> > > KVM SGX virtualization Overview
> > > 
> > > - Virtual EPC
> > > 
> > > "Virtual EPC" is the EPC section exposed by KVM to guest so SGX software in
> > > guest can discover it and use it to create SGX enclaves. KVM exposes SGX to 
> > 
> > Virtual EPC is a representation of an EPC section. And there is no "the".
> > 
> > > guest via CPUID, and exposes one or more "virtual EPC" sections to the guest.
> > > The size of each "virtual EPC" section is passed as a Qemu parameter when
> > > creating the guest, and the base address is calculated internally according
> > > to the guest's configuration.
> > > 
> > > To support virtual EPC, add a new misc device /dev/sgx_virt_epc to the SGX
> > > core/driver to allow userspace (Qemu) to allocate "raw" EPC and use it as
> > > "virtual EPC" for the guest. Obviously, unlike EPC allocated for the host SGX
> > > driver, virtual EPC allocated via /dev/sgx_virt_epc has no associated
> > > enclave, and how virtual EPC is used by the guest is completely controlled by
> > > the guest's SGX software.
> > 
> > I think that /dev/sgx_vepc would be a clear enough name for the device. The
> > terminology in this text is currently a bit confusing.
> 
> /dev/sgx_virt_epc may be clearer from userspace's perspective: if people see
> /dev/sgx_vepc, they may have to think about what it is, while with
> /dev/sgx_virt_epc they may not.
> 
> But I don't have a strong objection here. Does anyone have anything to say here?

It's already an abbreviation to start with, so why stop halfway?

Especially when the three remaining words have been shrunk to single
characters ('E', 'P' and 'C').

/Jarkko

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-12  1:50       ` Jarkko Sakkinen
@ 2021-01-12  2:03         ` Kai Huang
  0 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-12  2:03 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Tue, 12 Jan 2021 03:50:23 +0200 Jarkko Sakkinen wrote:
> On Tue, Jan 12, 2021 at 01:56:54PM +1300, Kai Huang wrote:
> > On Tue, 12 Jan 2021 01:38:23 +0200 Jarkko Sakkinen wrote:
> > > On Wed, Jan 06, 2021 at 02:55:20PM +1300, Kai Huang wrote:
> > > > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > > > 
> > > > Add a misc device /dev/sgx_virt_epc to allow userspace to allocate "raw"
> > > > EPC without an associated enclave.  The intended and only known use case
> > > > for raw EPC allocation is to expose EPC to a KVM guest, hence the
> > > > virt_epc moniker, virt.{c,h} files and X86_SGX_VIRTUALIZATION Kconfig.
> > > > 
> > > > Modify sgx_init() to always try to initialize the virtual EPC driver, even
> > > > when the SGX driver is disabled because SGX Launch Control is in locked mode
> > > > or not present at all, since SGX virtualization allows exposing SGX to
> > > > guests that support non-LC configurations.
> > > > 
> > > > Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> > > > /dev/sgx_virt_epc rather than in KVM. Doing so has two major advantages:
> > > > 
> > > >   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
> > > >     just another memory backend for guests.
> > > > 
> > > >   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
> > > >     does not have to export any symbols, changes to reclaim flows don't
> > > >     need to be routed through KVM, SGX's dirty laundry doesn't have to
> > > >     get aired out for the world to see, and so on and so forth.
> > > > 
> > > > The virtual EPC allocated to guests is currently not reclaimable, because
> > > > oversubscription of EPC for KVM guests is not currently supported. Due
> > > > to the complications of handling reclaim conflicts between guest and
> > > > host, KVM EPC oversubscription is significantly more complex than basic
> > > > support for SGX virtualization.
> > > > 
> > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > 
> > > The commit message does not describe the code changes. It should
> > > have an understandable explanation of fops. There is nothing about
> > > the implementation right now.
> > 
> > Thanks for the feedback. Does "understandable explanation of fops" mean I
> > should add one sentence saying, for instance: "the userspace hypervisor should
> > open /dev/sgx_virt_epc, use mmap() to get a valid address range, and then use
> > that address range to create a KVM memory region"?
> > 
> > Or should I include an example of how to use /dev/sgx_virt_epc in userspace,
> > such as the one below?
> > 
> > 	fd = open("/dev/sgx_virt_epc", O_RDWR);
> > 	void *addr = mmap(NULL, size, ..., fd);
> > 	/* userspace hypervisor uses addr, size to create KVM memory slot */
> > 	...
> 
> I would suggest just describing them in a few sentences. Just write
> how you understand them in one paragraph.

Will do. Thanks.


* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-12  2:02     ` Jarkko Sakkinen
@ 2021-01-12  2:07       ` Kai Huang
  2021-01-15 14:43         ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-12  2:07 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, mattson, joro,
	vkuznets, wanpengli, corbet


> > > > 
> > > > To support virtual EPC, add a new misc device /dev/sgx_virt_epc to the SGX
> > > > core/driver to allow userspace (Qemu) to allocate "raw" EPC and use it as
> > > > "virtual EPC" for the guest. Obviously, unlike EPC allocated for the host SGX
> > > > driver, virtual EPC allocated via /dev/sgx_virt_epc has no associated
> > > > enclave, and how virtual EPC is used by the guest is completely controlled by
> > > > the guest's SGX software.
> > > 
> > > I think that /dev/sgx_vepc would be a clear enough name for the device. The
> > > terminology in this text is currently a bit confusing.
> > 
> > /dev/sgx_virt_epc may be clearer from userspace's perspective: if people see
> > /dev/sgx_vepc, they may have to think about what it is, while with
> > /dev/sgx_virt_epc they may not.
> > 
> > But I don't have a strong objection here. Does anyone have anything to say here?
> 
> It's already an abbreviation to start with, so why stop halfway?
> 
> Especially when the three remaining words have been shrunk to single
> characters ('E', 'P' and 'C').
> 

I have expressed my opinion above, and as I said, I don't have a strong
objection here. I'll change it to /dev/sgx_vepc if no one opposes.

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-11 19:20                         ` Sean Christopherson
  2021-01-12  2:01                           ` Kai Huang
@ 2021-01-12 12:13                           ` Borislav Petkov
  2021-01-12 17:15                             ` Sean Christopherson
  1 sibling, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2021-01-12 12:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Mon, Jan 11, 2021 at 11:20:11AM -0800, Sean Christopherson wrote:
> Well, mechanically, that would generate a build failure if the kernel does the
> obvious thing and names the 'enum cpuid_leafs' entry CPUID_12_EAX.  That would
> be an obvious clue that KVM should be updated.

Then we need to properly document that whenever someone does that
change, someone needs to touch the proper places.

> If the kernel named the enum entry something different, and we botched the code
> review, KVM would continue to work, but would unnecessarily copy the bits it
> cares about to its own word.   E.g. the boot_cpu_has() checks and translation to
> __X86_FEATURE_* would still be valid.  As far as failure modes go, that's not
> terrible.

Right, which reminds me: with your prototype patch, we would have:

static __always_inline void __kvm_cpu_cap_mask(enum cpuid_leafs leaf)
{
        const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
        struct kvm_cpuid_entry2 entry;

        reverse_cpuid_check(leaf);

        cpuid_count(cpuid.function, cpuid.index,
                    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);

        kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
}

which does read CPUID from the hw, and kvm_cpu_caps[] already has the
copied bits from boot_cpu_data.x86_capability.

Now you said that reading the CPUID is mostly redundant but we're
paranoid so we do it anyway, just in case, so how about we remove the
copying of boot_cpu_data.x86_capability? That's one less dependency
on the baremetal implementation.

Practically, nothing changes for kvm because it will read CPUID which is
the canonical thing anyway. And this should simplify the deal more and
keep it simple(r).

Hmmm.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-12 12:13                           ` Borislav Petkov
@ 2021-01-12 17:15                             ` Sean Christopherson
  2021-01-12 17:51                               ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Sean Christopherson @ 2021-01-12 17:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Tue, Jan 12, 2021, Borislav Petkov wrote:
> On Mon, Jan 11, 2021 at 11:20:11AM -0800, Sean Christopherson wrote:
> > Well, mechanically, that would generate a build failure if the kernel does the
> > obvious thing and names the 'enum cpuid_leafs' entry CPUID_12_EAX.  That would
> > be an obvious clue that KVM should be updated.
> 
> Then we need to properly document that whenever someone does that
> change, someone needs to touch the proper places.
> 
> > If the kernel named the enum entry something different, and we botched the code
> > review, KVM would continue to work, but would unnecessarily copy the bits it
> > cares about to its own word.   E.g. the boot_cpu_has() checks and translation to
> > __X86_FEATURE_* would still be valid.  As far as failure modes go, that's not
> > terrible.
> 
> Right, which reminds me: with your prototype patch, we would have:
> 
> static __always_inline void __kvm_cpu_cap_mask(enum cpuid_leafs leaf)
> {
>         const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
>         struct kvm_cpuid_entry2 entry;
> 
>         reverse_cpuid_check(leaf);
> 
>         cpuid_count(cpuid.function, cpuid.index,
>                     &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
> 
>         kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
> }
> 
> which does read CPUID from the hw, and kvm_cpu_caps[] already has the
> copied bits from boot_cpu_data.x86_capability.
> 
> Now you said that reading the CPUID is mostly redundant but we're
> paranoid so we do it anyway, just in case, so how about we remove the
> copying of boot_cpu_data.x86_capability? That's one less dependency
> on the baremetal implementation.
>
> Practically, nothing changes for kvm because it will read CPUID which is
> the canonical thing anyway. And this should simplify the deal more and
> keep it simple(r).

We want the boot_cpu_data.x86_capability memcpy() so that KVM doesn't advertise
support for features that are intentionally disabled in the kernel, e.g. via
kernel params.  Except for a few special cases, e.g. LA57, KVM doesn't enable
features in the guest if they're disabled in the host, even if the features are
supported in hardware.

For some features, e.g. SMEP and SMAP, honoring boot_cpu_data is mostly about
respecting the kernel's wishes, i.e. barring hardware bugs, enabling such
features in the guest won't break anything.  But for other features, e.g. XSAVE
based features, enabling them in the guest without proper support in the host
will corrupt guest and/or host state.

So it's really the CPUID read that is (mostly) superfluous.
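A sketch of the order of operations described above for a single feature word: start from the kernel's own view (the memcpy() from boot_cpu_data.x86_capability, where features disabled via kernel parameters are already cleared), mask with what KVM supports, then AND with the raw CPUID register. kvm_cap_word() is a hypothetical helper, not the kernel function quoted earlier in the thread.

```c
#include <stdint.h>

/*
 * kernel_word: the host kernel's capability word for this leaf; features
 *              the kernel disabled (e.g. via kernel params) are already 0
 * kvm_mask:    features KVM knows how to virtualize for this leaf
 * raw_cpuid:   the register value read directly from hardware
 */
static uint32_t kvm_cap_word(uint32_t kernel_word, uint32_t kvm_mask,
			     uint32_t raw_cpuid)
{
	uint32_t caps = kernel_word; /* honor what the host kernel enabled */

	caps &= kvm_mask;            /* only what KVM supports */
	caps &= raw_cpuid;           /* paranoid, mostly redundant hw check */
	return caps;
}
```

Dropping the first step would let a feature the host kernel turned off (e.g. an XSAVE-based feature) leak to the guest whenever the hardware reports it. Dropping the last step normally changes nothing, which is why the CPUID read is the mostly superfluous part.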

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-12 17:15                             ` Sean Christopherson
@ 2021-01-12 17:51                               ` Borislav Petkov
  2021-01-12 21:07                                 ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2021-01-12 17:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Tue, Jan 12, 2021 at 09:15:52AM -0800, Sean Christopherson wrote:
> We want the boot_cpu_data.x86_capability memcpy() so that KVM doesn't advertise
> support for features that are intentionally disabled in the kernel, e.g. via
> kernel params.  Except for a few special cases, e.g. LA57, KVM doesn't enable
> features in the guest if they're disabled in the host, even if the features are
> supported in hardware.
> 
> For some features, e.g. SMEP and SMAP, honoring boot_cpu_data is mostly about
> respecting the kernel's wishes, i.e. barring hardware bugs, enabling such
> features in the guest won't break anything.  But for other features, e.g. XSAVE
> based features, enabling them in the guest without proper support in the host
> will corrupt guest and/or host state.

Ah ok, that is an important point.
 
> So it's really the CPUID read that is (mostly) superfluous.

Yeah, but that is cheap, as we established.

Ok then, I don't see anything that might be a problem and I guess we can
try that handling of scattered bits in kvm and see how far we'll get.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-12 17:51                               ` Borislav Petkov
@ 2021-01-12 21:07                                 ` Kai Huang
  2021-01-12 23:17                                   ` Sean Christopherson
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-12 21:07 UTC (permalink / raw)
  To: Borislav Petkov, Sean Christopherson
  Cc: Dave Hansen, linux-sgx, kvm, x86, jarkko, luto, haitao.huang,
	pbonzini, tglx, mingo, hpa

On Tue, 2021-01-12 at 18:51 +0100, Borislav Petkov wrote:
> On Tue, Jan 12, 2021 at 09:15:52AM -0800, Sean Christopherson wrote:
> > We want the boot_cpu_data.x86_capability memcpy() so that KVM doesn't advertise
> > support for features that are intentionally disabled in the kernel, e.g. via
> > kernel params.  Except for a few special cases, e.g. LA57, KVM doesn't enable
> > features in the guest if they're disabled in the host, even if the features are
> > supported in hardware.
> > 
> > For some features, e.g. SMEP and SMAP, honoring boot_cpu_data is mostly about
> > respecting the kernel's wishes, i.e. barring hardware bugs, enabling such
> > features in the guest won't break anything.  But for other features, e.g. XSAVE
> > based features, enabling them in the guest without proper support in the host
> > will corrupt guest and/or host state.
> 
> Ah ok, that is an important point.
>
> > So it's really the CPUID read that is (mostly) superfluous.
> 
> Yeah, but that is cheap, as we established.
> 
> Ok then, I don't see anything that might be a problem and I guess we can
> try that handling of scattered bits in kvm and see how far we'll get.

Hi Sean, Boris,

Thanks for all your feedback.

Sean,

Do you want to send me your patch (so that it carries your SoB), or do you want me to copy
& paste the code you posted in this series, plus Suggested-by you? Or how do you want
to proceed?

Also, wouldn't it be better to separate X86_FEATURE_SGX1/2 from the rest of the KVM changes?

And do you think adding a dedicated array, i.e. kvm_scattered_cpu_caps[], instead of using
the existing kvm_cpu_cap[NCAPINTS] would help solve the problem caused by adding a
new leaf to the x86 core (see my other reply in this thread)?

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 01/23] x86/sgx: Split out adding EPC page to free list to separate helper
  2021-01-12  0:19     ` Kai Huang
@ 2021-01-12 21:45       ` Sean Christopherson
  2021-01-13  1:15         ` Kai Huang
  2021-01-13 17:05         ` Jarkko Sakkinen
  0 siblings, 2 replies; 111+ messages in thread
From: Sean Christopherson @ 2021-01-12 21:45 UTC (permalink / raw)
  To: Kai Huang
  Cc: Jarkko Sakkinen, linux-sgx, kvm, x86, luto, dave.hansen,
	haitao.huang, pbonzini, bp, tglx, mingo, hpa

On Tue, Jan 12, 2021, Kai Huang wrote:
> On Tue, 12 Jan 2021 00:38:40 +0200 Jarkko Sakkinen wrote:
> > On Wed, 2021-01-06 at 14:55 +1300, Kai Huang wrote:
> > > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > > 
> > > SGX virtualization requires allocating "raw" EPC and using it as virtual
> > > EPC for SGX guests.  Unlike EPC used by the SGX driver, virtual EPC doesn't
> > > track how EPC pages are used in VM, e.g. (de)construction of enclaves,
> > > so it cannot guarantee EREMOVE success, e.g. it doesn't have a priori
> > > knowledge of which pages are SECS with non-zero child counts.
> > > 
> > > Split sgx_free_page() into two parts so that the "add to free list"
> > > part can be used by virtual EPC without having to modify the EREMOVE
> > > logic in sgx_free_page().
> > > 
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > 
> > I have a better idea with the same outcome for KVM.
> > 
> > https://lore.kernel.org/linux-sgx/20210111223610.62261-1-jarkko@kernel.org/T/#t
> 
> I agree that with your patch this one can be replaced. I'll include your patch in
> the next version, and once it is upstreamed, this one can be removed from my series.
> 
> Sean, please let me know if you have objection.

6 of one, half dozen of the other.  I liked not having to modify the existing
call sites, but it's your code.

Though on that topic, this snippet is wrong:

@@ -431,7 +443,8 @@ void sgx_encl_release(struct kref *ref)
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
 		list_del(&va_page->list);
-		sgx_free_epc_page(va_page->epc_page);
+		sgx_reset_epc_page(entry->epc_page);
+		sgx_free_epc_page(entry->epc_page);

s/entry/va_page in the new code.

P.S. I apparently hadn't been subscribed to linux-sgx and so didn't see those
     patches.  I'm now subscribed and can chime in as needed.
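To make the s/entry/va_page fix concrete, here is a hypothetical userspace model of the VA-page teardown loop with the substitution applied. The types and the reset_epc_page()/free_epc_page() helpers are simplified stand-ins, not the driver's real code; the assert in free_epc_page() encodes the assumption that a page must be reset (presumably the EREMOVE step) before it is put back on the free list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's types. */
struct epc_page { int reset; int freed; };
struct va_page { struct epc_page *epc_page; struct va_page *next; };

/* Assumed to model the EREMOVE half of the split. */
static void reset_epc_page(struct epc_page *p) { p->reset = 1; }

/* Models adding the page back to the free list. */
static void free_epc_page(struct epc_page *p)
{
	assert(p->reset); /* only valid after the page has been reset */
	p->freed = 1;
}

/* Drain the VA page list as the fixed hunk does: the page being released
 * is va_page->epc_page; there is no 'entry' variable in this loop. */
static int release_va_pages(struct va_page **head)
{
	int n = 0;

	while (*head) {
		struct va_page *va_page = *head;

		*head = va_page->next; /* models list_del() */
		reset_epc_page(va_page->epc_page);
		free_epc_page(va_page->epc_page);
		n++;
	}
	return n;
}
```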

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-12 21:07                                 ` Kai Huang
@ 2021-01-12 23:17                                   ` Sean Christopherson
  2021-01-13  1:05                                     ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Sean Christopherson @ 2021-01-12 23:17 UTC (permalink / raw)
  To: Kai Huang
  Cc: Borislav Petkov, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Wed, Jan 13, 2021, Kai Huang wrote:
> On Tue, 2021-01-12 at 18:51 +0100, Borislav Petkov wrote:
> > On Tue, Jan 12, 2021 at 09:15:52AM -0800, Sean Christopherson wrote:
> > > We want the boot_cpu_data.x86_capability memcpy() so that KVM doesn't advertise
> > > support for features that are intentionally disabled in the kernel, e.g. via
> > > kernel params.  Except for a few special cases, e.g. LA57, KVM doesn't enable
> > > features in the guest if they're disabled in the host, even if the features are
> > > supported in hardware.
> > > 
> > > For some features, e.g. SMEP and SMAP, honoring boot_cpu_data is mostly about
> > > respecting the kernel's wishes, i.e. barring hardware bugs, enabling such
> > > features in the guest won't break anything.  But for other features, e.g. XSAVE
> > > based features, enabling them in the guest without proper support in the host
> > > will corrupt guest and/or host state.
> > 
> > Ah ok, that is an important point.
> > 
> > > So it's really the CPUID read that is (mostly) superfluous.
> > 
> > Yeah, but that is cheap, as we established.
> > 
> > Ok then, I don't see anything that might be a problem and I guess we can
> > try that handling of scattered bits in kvm and see how far we'll get.
> 
> Hi Sean, Boris,
> 
> Thanks for all your feedback.
> 
> Sean,
> 
> Do you want to send me your patch (so that it carries your SoB), or do you want me to copy
> & paste the code you posted in this series, plus Suggested-by you? Or how do you want
> to proceed?
> 
> Also, wouldn't it be better to separate X86_FEATURE_SGX1/2 from the rest of the KVM changes?

Hmm, I'll split the changes into two proper patches and send them to you off list.

> And do you think adding a dedicated array, i.e. kvm_scattered_cpu_caps[], instead of using
> the existing kvm_cpu_cap[NCAPINTS] would help solve the problem caused by adding a
> new leaf to the x86 core (see my other reply in this thread)?

Probably not, because then we'd have to add new helpers to deal with the new
array, or change all the helpers to take the array as a pointer.  Blasting past
NCAPINTS is a little evil, but it does slot in nicely to the existing code.
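A rough sketch of what "blasting past NCAPINTS" could look like, assuming one capability array whose KVM-only words are numbered from NCAPINTS upward. The NCAPINTS value, CPUID_12_EAX, KVM_X86_FEATURE() and the cap_set()/cap_has() helpers are placeholders for illustration, not the actual KVM code; the SGX1/SGX2 bits follow the architectural layout of CPUID.(EAX=0x12,ECX=0):EAX (bits 0 and 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder value; the real NCAPINTS comes from the x86 core headers. */
#define NCAPINTS 19

/* KVM-only words start at NCAPINTS, so they share one array (and one set
 * of word*32+bit helpers) with the words the x86 core already defines. */
enum kvm_only_leaf {
	CPUID_12_EAX = NCAPINTS, /* CPUID.(EAX=0x12,ECX=0):EAX */
	NR_KVM_CAP_WORDS,
};

/* Feature number = word * 32 + bit, mirroring the X86_FEATURE_* encoding. */
#define KVM_X86_FEATURE(w, b) ((w) * 32 + (b))
#define KVM_X86_FEATURE_SGX1 KVM_X86_FEATURE(CPUID_12_EAX, 0)
#define KVM_X86_FEATURE_SGX2 KVM_X86_FEATURE(CPUID_12_EAX, 1)

static uint32_t kvm_caps[NR_KVM_CAP_WORDS];

static void cap_set(unsigned int feature)
{
	assert(feature / 32 < NR_KVM_CAP_WORDS);
	kvm_caps[feature / 32] |= 1u << (feature % 32);
}

static bool cap_has(unsigned int feature)
{
	assert(feature / 32 < NR_KVM_CAP_WORDS);
	return kvm_caps[feature / 32] & (1u << (feature % 32));
}
```

The appeal is that the existing helpers keep working unchanged; a separate kvm_scattered_cpu_caps[] would need its own helpers or a pointer parameter, which is the trade-off described above.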

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-01-12 23:17                                   ` Sean Christopherson
@ 2021-01-13  1:05                                     ` Kai Huang
  0 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-13  1:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, Dave Hansen, linux-sgx, kvm, x86, jarkko, luto,
	haitao.huang, pbonzini, tglx, mingo, hpa

On Tue, 2021-01-12 at 15:17 -0800, Sean Christopherson wrote:
> On Wed, Jan 13, 2021, Kai Huang wrote:
> > On Tue, 2021-01-12 at 18:51 +0100, Borislav Petkov wrote:
> > > On Tue, Jan 12, 2021 at 09:15:52AM -0800, Sean Christopherson wrote:
> > > > We want the boot_cpu_data.x86_capability memcpy() so that KVM doesn't advertise
> > > > support for features that are intentionally disabled in the kernel, e.g. via
> > > > kernel params.  Except for a few special cases, e.g. LA57, KVM doesn't enable
> > > > features in the guest if they're disabled in the host, even if the features are
> > > > supported in hardware.
> > > > 
> > > > For some features, e.g. SMEP and SMAP, honoring boot_cpu_data is mostly about
> > > > respecting the kernel's wishes, i.e. barring hardware bugs, enabling such
> > > > features in the guest won't break anything.  But for other features, e.g. XSAVE
> > > > based features, enabling them in the guest without proper support in the host
> > > > will corrupt guest and/or host state.
> > > 
> > > Ah ok, that is an important point.
> > > 
> > > > So it's really the CPUID read that is (mostly) superfluous.
> > > 
> > > Yeah, but that is cheap, as we established.
> > > 
> > > Ok then, I don't see anything that might be a problem and I guess we can
> > > try that handling of scattered bits in kvm and see how far we'll get.
> > 
> > Hi Sean, Boris,
> > 
> > Thanks for all your feedback.
> > 
> > Sean,
> > 
> > Do you want to send me your patch (so that it carries your SoB), or do you want me to copy
> > & paste the code you posted in this series, plus Suggested-by you? Or how do you want
> > to proceed?
> > 
> > Also, wouldn't it be better to separate X86_FEATURE_SGX1/2 from the rest of the KVM changes?
> 
> Hmm, I'll split the changes into two proper patches and send them to you off list.

Thanks.

> 
> > And do you think adding a dedicated array, i.e. kvm_scattered_cpu_caps[], instead of using
> > the existing kvm_cpu_cap[NCAPINTS] would help solve the problem caused by adding a
> > new leaf to the x86 core (see my other reply in this thread)?
> 
> Probably not, because then we'd have to add new helpers to deal with the new
> array, or change all the helpers to take the array as a pointer.  Blasting past
> NCAPINTS is a little evil, but it does slot in nicely to the existing code.

Sure. Thanks.



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 01/23] x86/sgx: Split out adding EPC page to free list to separate helper
  2021-01-12 21:45       ` Sean Christopherson
@ 2021-01-13  1:15         ` Kai Huang
  2021-01-13 17:05         ` Jarkko Sakkinen
  1 sibling, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-13  1:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jarkko Sakkinen, linux-sgx, kvm, x86, luto, dave.hansen,
	haitao.huang, pbonzini, bp, tglx, mingo, hpa

On Tue, 12 Jan 2021 13:45:24 -0800 Sean Christopherson wrote:
> On Tue, Jan 12, 2021, Kai Huang wrote:
> > On Tue, 12 Jan 2021 00:38:40 +0200 Jarkko Sakkinen wrote:
> > > On Wed, 2021-01-06 at 14:55 +1300, Kai Huang wrote:
> > > > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > > > 
> > > > SGX virtualization requires allocating "raw" EPC and using it as virtual
> > > > EPC for SGX guests.  Unlike EPC used by the SGX driver, virtual EPC doesn't
> > > > track how EPC pages are used in VM, e.g. (de)construction of enclaves,
> > > > so it cannot guarantee EREMOVE success, e.g. it doesn't have a priori
> > > > knowledge of which pages are SECS with non-zero child counts.
> > > > 
> > > > Split sgx_free_page() into two parts so that the "add to free list"
> > > > part can be used by virtual EPC without having to modify the EREMOVE
> > > > logic in sgx_free_page().
> > > > 
> > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > 
> > > I have a better idea with the same outcome for KVM.
> > > 
> > > https://lore.kernel.org/linux-sgx/20210111223610.62261-1-jarkko@kernel.org/T/#t
> > 
> > I agree that with your patch this one can be replaced. I'll include your patch in
> > the next version, and once it is upstreamed, this one can be removed from my series.
> > 
> > Sean, please let me know if you have objection.
> 
> 6 of one, half dozen of the other.  I liked not having to modify the existing
> call sites, but it's your code.
> 
> Though on that topic, this snippet is wrong:
> 
> @@ -431,7 +443,8 @@ void sgx_encl_release(struct kref *ref)
>  		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
>  					   list);
>  		list_del(&va_page->list);
> -		sgx_free_epc_page(va_page->epc_page);
> +		sgx_reset_epc_page(entry->epc_page);
> +		sgx_free_epc_page(entry->epc_page);
> 
> s/entry/va_page in the new code.
> 
> P.S. I apparently hadn't been subscribed to linux-sgx and so didn't see those
>      patches.  I'm now subscribed and can chime in as needed.

Thanks. I have also replied to Jarkko's v2 patch, and you should be able to see it
now.

If Jarkko's patch is eventually merged upstream, we can drop this patch. So
please comment on whether Jarkko's patch is reasonable, since I don't have
much history with the SGX driver and cannot immediately tell.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 01/23] x86/sgx: Split out adding EPC page to free list to separate helper
  2021-01-12 21:45       ` Sean Christopherson
  2021-01-13  1:15         ` Kai Huang
@ 2021-01-13 17:05         ` Jarkko Sakkinen
  1 sibling, 0 replies; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-13 17:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, linux-sgx, kvm, x86, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Tue, Jan 12, 2021 at 01:45:24PM -0800, Sean Christopherson wrote:
> On Tue, Jan 12, 2021, Kai Huang wrote:
> > On Tue, 12 Jan 2021 00:38:40 +0200 Jarkko Sakkinen wrote:
> > > On Wed, 2021-01-06 at 14:55 +1300, Kai Huang wrote:
> > > > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > > > 
> > > > SGX virtualization requires allocating "raw" EPC and using it as virtual
> > > > EPC for SGX guests.  Unlike EPC used by the SGX driver, virtual EPC doesn't
> > > > track how EPC pages are used in VM, e.g. (de)construction of enclaves,
> > > > so it cannot guarantee EREMOVE success, e.g. it doesn't have a priori
> > > > knowledge of which pages are SECS with non-zero child counts.
> > > > 
> > > > Split sgx_free_page() into two parts so that the "add to free list"
> > > > part can be used by virtual EPC without having to modify the EREMOVE
> > > > logic in sgx_free_page().
> > > > 
> > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > 
> > > I have a better idea with the same outcome for KVM.
> > > 
> > > https://lore.kernel.org/linux-sgx/20210111223610.62261-1-jarkko@kernel.org/T/#t
> > 
> > I agree that with your patch this one can be replaced. I'll include your patch in
> > the next version, and once it is upstreamed, this one can be removed from my series.
> > 
> > Sean, please let me know if you have objection.
> 
> 6 of one, half dozen of the other.  I liked not having to modify the existing
> call sites, but it's your code.

Only the call sites in sgx_encl_release() require a call to
sgx_reset_epc_page(). That's three in total.

> Though on that topic, this snippet is wrong:
> 
> @@ -431,7 +443,8 @@ void sgx_encl_release(struct kref *ref)
>  		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
>  					   list);
>  		list_del(&va_page->list);
> -		sgx_free_epc_page(va_page->epc_page);
> +		sgx_reset_epc_page(entry->epc_page);
> +		sgx_free_epc_page(entry->epc_page);

Thanks for the remark.

I checked why I did not catch this when running test_sgx: I had not specified
my local tree for BuildRoot (LINUX_OVERRIDE_SRCDIR) when building the test image.

> s/entry/va_page in the new code.
> 
> P.S. I apparently hadn't been subscribed to linux-sgx and so didn't see those
>      patches.  I'm now subscribed and can chime in as needed.

OK, great. I can also CC you directly on the next version.

/Jarkko

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-07  5:02       ` Dave Hansen
@ 2021-01-15 14:07         ` Kai Huang
  2021-01-15 15:39           ` Dave Hansen
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-15 14:07 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Wed, 6 Jan 2021 21:02:54 -0800 Dave Hansen wrote:
> On 1/6/21 5:42 PM, Kai Huang wrote:
> >> I understand why this made sense for regular enclaves, but I'm having a
> >> harder time here.  If you mmap(fd, MAP_SHARED), fork(), and then pass
> >> that mapping through to two different guests, you get to hold the
> >> pieces, just like if you did the same with normal memory.
> >>
> >> Why does the kernel need to enforce this policy?
> > Does Sean's reply in another email satisfy you?
> 
> I'm not totally convinced.
> 
> Please give it a go in the changelog for the next one and try to
> convince me that this is a good idea.  Focus on what the downsides will
> be if the kernel does not enforce this policy.  What will break, and why
> will it be bad?  Why is the kernel in the best position to thwart the
> badness?

Hi Dave, Sean,

Sorry for the late reply. I'd like to try again to reach consensus here.

From the virtual EPC's perspective, if we don't enforce this in the kernel, then
*theoretically* userspace can use fork() to make multiple VMs map the same
physical EPC, which could cause enclaves in all of those VMs to behave
abnormally. So from this perspective, I think it's better to enforce it in the
kernel so that only the first VM can use a given virtual EPC instance, because
EPC, by architectural design, cannot be shared.

But as Sean said, KVM doesn't support a VM spanning multiple mm structs. And if I
read the code correctly, KVM doesn't allow userspace to use fork() to create a
new VM. For instance, when creating a VM, KVM grabs current->mm and keeps it in
'struct kvm' for bookkeeping, and kvm_vcpu_ioctl() and kvm_device_ioctl() refuse
to work if kvm->mm doesn't equal current->mm. So in practice, I believe we
should have no problem here even without enforcing this in the kernel.
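A simplified model of the KVM behavior described above: the VM remembers the mm of the process that created it, and the vCPU/device ioctl paths refuse callers from any other mm. struct mm, struct vm and vm_ioctl_check() are hypothetical stand-ins rather than KVM's code; the -EIO value matches what kvm_vcpu_ioctl() returned around the time of this thread, but treat it as an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for struct mm_struct and struct kvm. */
struct mm { int id; };
struct vm { struct mm *mm; }; /* current->mm captured at VM creation */

/* Models the guard in kvm_vcpu_ioctl()/kvm_device_ioctl(): a caller whose
 * address space differs from the VM creator's (e.g. a forked child) is
 * refused. */
static long vm_ioctl_check(const struct vm *vm, const struct mm *current_mm)
{
	if (vm->mm != current_mm)
		return -EIO;
	return 0;
}
```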

Sean, please correct me if I am wrong.

Dave, if the above holds, do you think it is reasonable to keep current->mm in
epc->mm and enforce the check in sgx_virt_epc_mmap()?
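For comparison, the enforcement being asked about for sgx_virt_epc_mmap() amounts to "first mm wins". A hypothetical sketch: struct virt_epc, virt_epc_mmap_check() and the -EPERM return value are all assumptions for illustration, not code from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mm { int id; };              /* hypothetical stand-in */
struct virt_epc { struct mm *mm; }; /* hypothetical model of the epc->mm field */

/* The first mmap() records the caller's mm; any later mmap() from a
 * different mm is rejected. The error code is an assumption. */
static int virt_epc_mmap_check(struct virt_epc *epc, struct mm *current_mm)
{
	if (!epc->mm)
		epc->mm = current_mm;
	else if (epc->mm != current_mm)
		return -EPERM;
	return 0;
}
```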

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-12  2:07       ` Kai Huang
@ 2021-01-15 14:43         ` Kai Huang
  2021-01-16  9:31           ` Jarkko Sakkinen
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-15 14:43 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, jmattson, joro,
	vkuznets, wanpengli, corbet

On Tue, 2021-01-12 at 15:07 +1300, Kai Huang wrote:
> > > > > 
> > > > > To support virtual EPC, add a new misc device /dev/sgx_virt_epc to SGX
> > > > > core/driver to allow userspace (Qemu) to allocate "raw" EPC, and use it as
> > > > > "virtual EPC" for the guest. Obviously, unlike EPC allocated for the host
> > > > > SGX driver, virtual EPC allocated via /dev/sgx_virt_epc doesn't have an
> > > > > associated enclave, and how virtual EPC is used by the guest is completely
> > > > > controlled by the guest's SGX software.
> > > > 
> > > > I think that /dev/sgx_vepc would be a clear enough name for the device. This
> > > > text has now a bit confusing "terminology" related to this.
> > > 
> > > /dev/sgx_virt_epc may be clearer from userspace's perspective, for instance,
> > > if people see /dev/sgx_vepc, they may have to think about what it is,
> > > while /dev/sgx_virt_epc they may not.
> > > 
> > > But I don't have a strong objection here. Does anyone have anything to say here?
> > 
> > It's already an abbreviation to start with, so why stop halfway?
> > 
> > Especially when the three remaining words have been shrunk to single
> > characters ('E', 'P' and 'C').
> > 
> 
> I have expressed my opinion above. And as I said, I don't have a strong objection
> here. I'll change to /dev/sgx_vepc if no one opposes.

Hi Jarkko,

I am reluctant to change to /dev/sgx_vepc now, because 'sgx_virt_epc' appears in
many places in the code. For instance, there is 'struct sgx_virt_epc', and the
function names in sgx/virt.c all take the form sgx_virt_epc_xxx(). Changing only
the device node to /dev/sgx_vepc feels incomplete, but I really don't want to
rename so many 'sgx_virt_epc' occurrences to 'sgx_vepc'.

(Plus, I still think 'virt_epc' is more obvious than 'vepc' from userspace's
perspective.)

Does it make sense?
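Part of why the device name matters is that it becomes userspace ABI: a VMM hard-codes the node it opens. A minimal sketch of how a VMM might map raw EPC from the node proposed in this RFC, assuming a plain open() plus a shared read/write mmap(); map_virt_epc() is hypothetical, and the node name may well change.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Node name as proposed in this RFC; it may be renamed (e.g. sgx_vepc). */
#define VIRT_EPC_DEV "/dev/sgx_virt_epc"

/* Map 'size' bytes of raw EPC from the given node, roughly the way a VMM
 * might back a guest EPC section. Returns NULL on any failure. */
static void *map_virt_epc(const char *path, size_t size)
{
	void *p;
	int fd;

	if (size == 0 || size % 4096) /* EPC is managed in 4 KiB pages */
		return NULL;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping holds its own reference to the file */
	return p == MAP_FAILED ? NULL : p;
}
```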



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-15 14:07         ` Kai Huang
@ 2021-01-15 15:39           ` Dave Hansen
  2021-01-15 21:33             ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2021-01-15 15:39 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On 1/15/21 6:07 AM, Kai Huang wrote:
> From the virtual EPC's perspective, if we don't enforce this in the kernel, then
> *theoretically* userspace can use fork() to make multiple VMs map the same
> physical EPC, which could cause enclaves in all of those VMs to behave
> abnormally. So from this perspective, I think it's better to enforce it in the
> kernel so that only the first VM can use a given virtual EPC instance, because
> EPC, by architectural design, cannot be shared.
> 
> But as Sean said, KVM doesn't support a VM spanning multiple mm structs. And if I
> read the code correctly, KVM doesn't allow userspace to use fork() to create a
> new VM. For instance, when creating a VM, KVM grabs current->mm and keeps it in
> 'struct kvm' for bookkeeping, and kvm_vcpu_ioctl() and kvm_device_ioctl() refuse
> to work if kvm->mm doesn't equal current->mm. So in practice, I believe we
> should have no problem here even without enforcing this in the kernel.
> 
> Sean, please correct me if I am wrong.
> 
> Dave, if the above holds, do you think it is reasonable to keep current->mm in
> epc->mm and enforce the check in sgx_virt_epc_mmap()?

Everything you wrote above tells me the kernel should not be enforcing
the behavior.  You basically said that it's only a theoretical problem,
and only if someone goes and does something with KVM that nobody can do
today.

You've 100% convinced me that having the kernel enforce this is
*un*reasonable.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-15 15:39           ` Dave Hansen
@ 2021-01-15 21:33             ` Kai Huang
  2021-01-15 21:45               ` Sean Christopherson
  0 siblings, 1 reply; 111+ messages in thread
From: Kai Huang @ 2021-01-15 21:33 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, kvm, x86, seanjc, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Fri, 15 Jan 2021 07:39:44 -0800 Dave Hansen wrote:
> On 1/15/21 6:07 AM, Kai Huang wrote:
> > From the virtual EPC's perspective, if we don't enforce this in the kernel, then
> > *theoretically* userspace can use fork() to make multiple VMs map the same
> > physical EPC, which could cause enclaves in all of those VMs to behave
> > abnormally. So from this perspective, I think it's better to enforce it in the
> > kernel so that only the first VM can use a given virtual EPC instance, because
> > EPC, by architectural design, cannot be shared.
> > 
> > But as Sean said, KVM doesn't support a VM spanning multiple mm structs. And if I
> > read the code correctly, KVM doesn't allow userspace to use fork() to create a
> > new VM. For instance, when creating a VM, KVM grabs current->mm and keeps it in
> > 'struct kvm' for bookkeeping, and kvm_vcpu_ioctl() and kvm_device_ioctl() refuse
> > to work if kvm->mm doesn't equal current->mm. So in practice, I believe we
> > should have no problem here even without enforcing this in the kernel.
> > 
> > Sean, please correct me if I am wrong.
> > 
> > Dave, if the above holds, do you think it is reasonable to keep current->mm in
> > epc->mm and enforce the check in sgx_virt_epc_mmap()?
> 
> Everything you wrote above tells me the kernel should not be enforcing
> the behavior.  You basically said that it's only a theoretical problem,
> and only if someone goes and does something with KVM that nobody can do
> today.
> 
> You've 100% convinced me that having the kernel enforce this is
> *un*reasonable.

Sean, I'll remove epc->mm unless you have further objections.

Thanks to you both.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-15 21:33             ` Kai Huang
@ 2021-01-15 21:45               ` Sean Christopherson
  2021-01-15 22:30                 ` Kai Huang
  0 siblings, 1 reply; 111+ messages in thread
From: Sean Christopherson @ 2021-01-15 21:45 UTC (permalink / raw)
  To: Kai Huang
  Cc: Dave Hansen, linux-sgx, kvm, x86, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Sat, Jan 16, 2021, Kai Huang wrote:
> On Fri, 15 Jan 2021 07:39:44 -0800 Dave Hansen wrote:
> > On 1/15/21 6:07 AM, Kai Huang wrote:
> > > From the virtual EPC's perspective, if we don't enforce this in the kernel, then
> > > *theoretically* userspace can use fork() to make multiple VMs map the same
> > > physical EPC, which could cause enclaves in all of those VMs to behave
> > > abnormally. So from this perspective, I think it's better to enforce it in the
> > > kernel so that only the first VM can use a given virtual EPC instance, because
> > > EPC, by architectural design, cannot be shared.
> > > 
> > > But as Sean said, KVM doesn't support a VM spanning multiple mm structs. And if I
> > > read the code correctly, KVM doesn't allow userspace to use fork() to create a
> > > new VM. For instance, when creating a VM, KVM grabs current->mm and keeps it in
> > > 'struct kvm' for bookkeeping, and kvm_vcpu_ioctl() and kvm_device_ioctl() refuse
> > > to work if kvm->mm doesn't equal current->mm. So in practice, I believe we
> > > should have no problem here even without enforcing this in the kernel.
> > > 
> > > Sean, please correct me if I am wrong.
> > > 
> > > Dave, if the above holds, do you think it is reasonable to keep current->mm in
> > > epc->mm and enforce the check in sgx_virt_epc_mmap()?
> > 
> > Everything you wrote above tells me the kernel should not be enforcing
> > the behavior.  You basically said that it's only a theoretical problem,
> > and only if someone goes and does something with KVM that nobody can do
> > today.
> > 
> > You've 100% convinced me that having the kernel enforce this is
> > *un*reasonable.
> 
> Sean, I'll remove epc->mm unless you have further objections.

It's probably ok.  I guess worst case scenario, to avoid the mm tracking
nightmare for oversubscription, you could prevent attaching KVM to a virtual EPC
if there is already a mm associated with the EPC, or if there are already EPC
pages "in" the virt EPC.
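A hypothetical sketch of the fallback suggested above: refuse to attach a VM to a virtual EPC that already has an mm or already holds EPC pages, so an oversubscription implementation would only ever have to track a single mm. The field names, virt_epc_attach_vm() and the -EBUSY return value are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mm { int id; }; /* hypothetical stand-in */

struct virt_epc {
	struct mm *mm;          /* mm already associated with this virtual EPC, if any */
	unsigned long nr_pages; /* EPC pages currently backing it */
};

/* Attach only to a pristine virtual EPC; anything already tied to an mm
 * or already populated with EPC pages is refused. */
static int virt_epc_attach_vm(struct virt_epc *epc, struct mm *vm_mm)
{
	if (epc->mm || epc->nr_pages)
		return -EBUSY;
	epc->mm = vm_mm;
	return 0;
}
```

Note that this sketch also refuses a second attach from the same mm; a real implementation might choose to allow that case.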

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-01-15 21:45               ` Sean Christopherson
@ 2021-01-15 22:30                 ` Kai Huang
  0 siblings, 0 replies; 111+ messages in thread
From: Kai Huang @ 2021-01-15 22:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dave Hansen, linux-sgx, kvm, x86, jarkko, luto, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa

On Fri, 15 Jan 2021 13:45:21 -0800 Sean Christopherson wrote:
> On Sat, Jan 16, 2021, Kai Huang wrote:
> > On Fri, 15 Jan 2021 07:39:44 -0800 Dave Hansen wrote:
> > > On 1/15/21 6:07 AM, Kai Huang wrote:
> > > > From the virtual EPC's perspective, if we don't enforce this in the kernel, then
> > > > *theoretically* userspace can use fork() to make multiple VMs map the same
> > > > physical EPC, which could cause enclaves in all of those VMs to behave
> > > > abnormally. So from this perspective, I think it's better to enforce it in the
> > > > kernel so that only the first VM can use a given virtual EPC instance, because
> > > > EPC, by architectural design, cannot be shared.
> > > > 
> > > > But as Sean said, KVM doesn't support a VM spanning multiple mm structs. And if I
> > > > read the code correctly, KVM doesn't allow userspace to use fork() to create a
> > > > new VM. For instance, when creating a VM, KVM grabs current->mm and keeps it in
> > > > 'struct kvm' for bookkeeping, and kvm_vcpu_ioctl() and kvm_device_ioctl() refuse
> > > > to work if kvm->mm doesn't equal current->mm. So in practice, I believe we
> > > > should have no problem here even without enforcing this in the kernel.
> > > > 
> > > > Sean, please correct me if I am wrong.
> > > > 
> > > > Dave, if the above holds, do you think it is reasonable to keep current->mm in
> > > > epc->mm and enforce the check in sgx_virt_epc_mmap()?
> > > 
> > > Everything you wrote above tells me the kernel should not be enforcing
> > > the behavior.  You basically said that it's only a theoretical problem,
> > > and only if someone goes and does something with KVM that nobody can do
> > > today.
> > > 
> > > You've 100% convinced me that having the kernel enforce this is
> > > *un*reasonable.
> > 
> > Sean, I'll remove epc->mm unless you have further objections.
> 
> It's probably ok.  I guess worst case scenario, to avoid the mm tracking
> nightmare for oversubscription, you could prevent attaching KVM to a virtual EPC
> if there is already a mm associated with the EPC, or if there are already EPC
> pages "in" the virt EPC.

Since we are not 100% certain oversubscription will be upstreamed, I think it
makes sense to address this when we do it. For now, let's just drop it.

Makes sense? Thanks.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-15 14:43         ` Kai Huang
@ 2021-01-16  9:31           ` Jarkko Sakkinen
  2021-01-16  9:50             ` Jarkko Sakkinen
  0 siblings, 1 reply; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-16  9:31 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, jmattson, joro,
	vkuznets, wanpengli, corbet

On Sat, Jan 16, 2021 at 03:43:18AM +1300, Kai Huang wrote:
> On Tue, 2021-01-12 at 15:07 +1300, Kai Huang wrote:
> > > > > > 
> > > > > > To support virtual EPC, add a new misc device /dev/sgx_virt_epc to SGX
> > > > > > core/driver to allow userspace (Qemu) to allocate "raw" EPC, and use it as
> > > > > > "virtual EPC" for the guest. Obviously, unlike EPC allocated for the host
> > > > > > SGX driver, virtual EPC allocated via /dev/sgx_virt_epc doesn't have an
> > > > > > associated enclave, and how virtual EPC is used by the guest is completely
> > > > > > controlled by the guest's SGX software.
> > > > > 
> > > > > I think that /dev/sgx_vepc would be a clear enough name for the device. This
> > > > > text now has somewhat confusing "terminology" related to this.
> > > > 
> > > > /dev/sgx_virt_epc may be clearer from userspace's perspective. For instance,
> > > > if people see /dev/sgx_vepc, they may have to think about what it is,
> > > > while with /dev/sgx_virt_epc they may not.
> > > > 
> > > > But I don't have a strong objection here. Does anyone have anything to say here?
> > > 
> > > It's already an abbreviation to start with, so why leave it halfway?
> > > 
> > > Especially when the three remaining words have been shrunk to single
> > > characters ('E', 'P' and 'C').
> > > 
> > 
> > I have expressed my opinion above. And as I said, I don't have a strong objection
> > here. I'll change to /dev/sgx_vepc if no one opposes.
> 
> Hi Jarkko,
> 
> I am reluctant to change to /dev/sgx_vepc now, because there are lots of
> 'sgx_virt_epc' in the code.  For instance, 'struct sgx_virt_epc', and function names
> in sgx/virt.c are all sgx_virt_epc_xxx(), which has 'sgx_virt_epc' as prefix. I feel
> changing to /dev/sgx_vepc only is kinda incomplete, but I really don't want to change
> so many 'sgx_virt_epc' to 'sgx_vepc'. 
> 
> (Plus I still think 'virt_epc' is more obvious than 'vepc' from userspace's
> perspective.)
> 
> Does it make sense?

We can reconsider naming later on for sure, and maybe it's better to do
so. It's probably too early to define the final name.

As far as naming goes, I'm actually wondering whether this is usable outside of
KVM by any means? If not, then probably the best name for this device
would be sgx_kvm_epc. Better to be always as explicit as possible.

/Jarkko


* Re: [RFC PATCH 00/23] KVM SGX virtualization support
  2021-01-16  9:31           ` Jarkko Sakkinen
@ 2021-01-16  9:50             ` Jarkko Sakkinen
  0 siblings, 0 replies; 111+ messages in thread
From: Jarkko Sakkinen @ 2021-01-16  9:50 UTC (permalink / raw)
  To: Kai Huang
  Cc: linux-sgx, kvm, x86, seanjc, luto, dave.hansen, haitao.huang,
	pbonzini, bp, tglx, mingo, hpa, jethro, b.thiel, jmattson, joro,
	vkuznets, wanpengli, corbet

On Sat, Jan 16, 2021 at 11:31:49AM +0200, Jarkko Sakkinen wrote:
> On Sat, Jan 16, 2021 at 03:43:18AM +1300, Kai Huang wrote:
> > On Tue, 2021-01-12 at 15:07 +1300, Kai Huang wrote:
> > > > > > > 
> > > > > > > To support virtual EPC, add a new misc device /dev/sgx_virt_epc to SGX
> > > > > > > core/driver to allow userspace (Qemu) to allocate "raw" EPC, and use it as
> > > > > > > "virtual EPC" for guest. Obviously, unlike EPC allocated for host SGX
> > > > > > > driver, virtual EPC allocated via /dev/sgx_virt_epc doesn't have an
> > > > > > > enclave associated, and how virtual EPC is used by guest is completely
> > > > > > > controlled by guest's SGX software.
> > > > > > 
> > > > > > I think that /dev/sgx_vepc would be a clear enough name for the device. This
> > > > > > text now has somewhat confusing "terminology" related to this.
> > > > > 
> > > > > /dev/sgx_virt_epc may be clearer from userspace's perspective. For instance,
> > > > > if people see /dev/sgx_vepc, they may have to think about what it is,
> > > > > while with /dev/sgx_virt_epc they may not.
> > > > > 
> > > > > But I don't have a strong objection here. Does anyone have anything to say here?
> > > > 
> > > > It's already an abbreviation to start with, so why leave it halfway?
> > > > 
> > > > Especially when the three remaining words have been shrunk to single
> > > > characters ('E', 'P' and 'C').
> > > > 
> > > 
> > > I have expressed my opinion above. And as I said, I don't have a strong objection
> > > here. I'll change to /dev/sgx_vepc if no one opposes.
> > 
> > Hi Jarkko,
> > 
> > I am reluctant to change to /dev/sgx_vepc now, because there are lots of
> > 'sgx_virt_epc' in the code.  For instance, 'struct sgx_virt_epc', and function names
> > in sgx/virt.c are all sgx_virt_epc_xxx(), which has 'sgx_virt_epc' as prefix. I feel
> > changing to /dev/sgx_vepc only is kinda incomplete, but I really don't want to change
> > so many 'sgx_virt_epc' to 'sgx_vepc'. 
> > 
> > (Plus I still think 'virt_epc' is more obvious than 'vepc' from userspace's
> > perspective.)
> > 
> > Does it make sense?
> 
> We can reconsider naming later on for sure, and maybe it's better to do
> so. It's probably too early to define the final name.
> 
> As far as naming goes, I'm actually wondering whether this is usable outside of
> KVM by any means? If not, then probably the best name for this device
> would be sgx_kvm_epc. Better to be always as explicit as possible.

You can easily do such renames with git filter-branch over a patch set:

https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History

Having to rename something in too many places is not an argument.
Considering it too early is.

/Jarkko


end of thread, other threads:[~2021-01-16  9:51 UTC | newest]

Thread overview: 111+ messages
2021-01-06  1:55 [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
2021-01-06  1:55 ` [RFC PATCH 01/23] x86/sgx: Split out adding EPC page to free list to separate helper Kai Huang
2021-01-11 22:38   ` Jarkko Sakkinen
2021-01-12  0:19     ` Kai Huang
2021-01-12 21:45       ` Sean Christopherson
2021-01-13  1:15         ` Kai Huang
2021-01-13 17:05         ` Jarkko Sakkinen
2021-01-06  1:55 ` [RFC PATCH 02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code Kai Huang
2021-01-06 18:28   ` Dave Hansen
2021-01-06 21:40     ` Kai Huang
2021-01-12  0:26     ` Jarkko Sakkinen
2021-01-11 23:32   ` Jarkko Sakkinen
2021-01-12  0:16     ` Kai Huang
2021-01-12  1:46       ` Jarkko Sakkinen
2021-01-06  1:55 ` [RFC PATCH 03/23] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
2021-01-06 19:35   ` Dave Hansen
2021-01-06 20:35     ` Sean Christopherson
2021-01-07  0:47       ` Kai Huang
2021-01-07  0:52         ` Dave Hansen
2021-01-07  1:38           ` Kai Huang
2021-01-07  5:00             ` Dave Hansen
2021-01-07  1:42     ` Kai Huang
2021-01-07  5:02       ` Dave Hansen
2021-01-15 14:07         ` Kai Huang
2021-01-15 15:39           ` Dave Hansen
2021-01-15 21:33             ` Kai Huang
2021-01-15 21:45               ` Sean Christopherson
2021-01-15 22:30                 ` Kai Huang
2021-01-11 23:38   ` Jarkko Sakkinen
2021-01-12  0:56     ` Kai Huang
2021-01-12  1:50       ` Jarkko Sakkinen
2021-01-12  2:03         ` Kai Huang
2021-01-06  1:55 ` [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features Kai Huang
2021-01-06 19:39   ` Dave Hansen
2021-01-06 22:12     ` Kai Huang
2021-01-06 22:21       ` Dave Hansen
2021-01-06 22:56         ` Kai Huang
2021-01-06 23:19           ` Sean Christopherson
2021-01-06 23:33             ` Dave Hansen
2021-01-06 23:56             ` Kai Huang
2021-01-06 23:40         ` Kai Huang
2021-01-06 23:43           ` Dave Hansen
2021-01-06 23:56             ` Kai Huang
2021-01-06 22:15   ` Borislav Petkov
2021-01-06 23:09     ` Kai Huang
2021-01-07  6:41       ` Borislav Petkov
2021-01-08  2:00         ` Kai Huang
2021-01-08  5:10           ` Dave Hansen
2021-01-08  7:03             ` Kai Huang
2021-01-08  7:17               ` Borislav Petkov
2021-01-08  8:06                 ` Kai Huang
2021-01-08  8:13                   ` Borislav Petkov
2021-01-08  9:00                     ` Kai Huang
2021-01-08 23:55                 ` Sean Christopherson
2021-01-09  0:35                   ` Borislav Petkov
2021-01-09  1:01                     ` Sean Christopherson
2021-01-09  1:19                   ` Borislav Petkov
2021-01-11 17:54                     ` Sean Christopherson
2021-01-11 19:09                       ` Borislav Petkov
2021-01-11 19:20                         ` Sean Christopherson
2021-01-12  2:01                           ` Kai Huang
2021-01-12 12:13                           ` Borislav Petkov
2021-01-12 17:15                             ` Sean Christopherson
2021-01-12 17:51                               ` Borislav Petkov
2021-01-12 21:07                                 ` Kai Huang
2021-01-12 23:17                                   ` Sean Christopherson
2021-01-13  1:05                                     ` Kai Huang
2021-01-11 23:39   ` Jarkko Sakkinen
2021-01-06  1:55 ` [RFC PATCH 05/23] x86/cpu/intel: Allow SGX virtualization without Launch Control support Kai Huang
2021-01-06 19:54   ` Dave Hansen
2021-01-06 22:34     ` Kai Huang
2021-01-06 22:38       ` Dave Hansen
2021-01-06  1:56 ` [RFC PATCH 06/23] x86/sgx: Expose SGX architectural definitions to the kernel Kai Huang
2021-01-06  1:56 ` [RFC PATCH 07/23] x86/sgx: Move ENCLS leaf definitions to sgx_arch.h Kai Huang
2021-01-06  1:56 ` [RFC PATCH 08/23] x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT) Kai Huang
2021-01-06  1:56 ` [RFC PATCH 09/23] x86/sgx: Add encls_faulted() helper Kai Huang
2021-01-06  1:56 ` [RFC PATCH 10/23] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs Kai Huang
2021-01-06 19:56   ` Dave Hansen
2021-01-06  1:56 ` [RFC PATCH 11/23] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Kai Huang
2021-01-06 20:12   ` Dave Hansen
2021-01-06 21:04     ` Sean Christopherson
2021-01-06 21:23       ` Dave Hansen
2021-01-06 22:58         ` Kai Huang
2021-01-06  1:56 ` [RFC PATCH 12/23] x86/sgx: Move provisioning device creation out of SGX driver Kai Huang
2021-01-06  1:56 ` [RFC PATCH 13/23] KVM: VMX: Convert vcpu_vmx.exit_reason to a union Kai Huang
2021-01-06  1:56 ` [RFC PATCH 14/23] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX) Kai Huang
2021-01-06  1:56 ` [RFC PATCH 15/23] KVM: x86: Define new #PF SGX error code bit Kai Huang
2021-01-06  1:56 ` [RFC PATCH 16/23] KVM: x86: Add SGX feature leaf to reverse CPUID lookup Kai Huang
2021-01-06  1:56 ` [RFC PATCH 17/23] KVM: VMX: Add basic handling of VM-Exit from SGX enclave Kai Huang
2021-01-06  1:56 ` [RFC PATCH 18/23] KVM: VMX: Frame in ENCLS handler for SGX virtualization Kai Huang
2021-01-06  1:56 ` [RFC PATCH 19/23] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions Kai Huang
2021-01-06  1:56 ` [RFC PATCH 20/23] KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs Kai Huang
2021-01-06  1:56 ` [RFC PATCH 21/23] KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC) Kai Huang
2021-01-06  1:56 ` [RFC PATCH 22/23] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC Kai Huang
2021-01-06  1:58 ` [RFC PATCH 23/23] KVM: x86: Add capability to grant VM access to privileged SGX attribute Kai Huang
2021-01-06  2:22 ` [RFC PATCH 00/23] KVM SGX virtualization support Kai Huang
2021-01-06 17:07 ` Dave Hansen
2021-01-07  0:34   ` Kai Huang
2021-01-07  0:48     ` Dave Hansen
2021-01-07  1:50       ` Kai Huang
2021-01-07 16:14         ` Sean Christopherson
2021-01-08  2:16           ` Kai Huang
2021-01-11 17:20 ` Jarkko Sakkinen
2021-01-11 18:37   ` Sean Christopherson
2021-01-12  1:58     ` Jarkko Sakkinen
2021-01-12  1:14   ` Kai Huang
2021-01-12  2:02     ` Jarkko Sakkinen
2021-01-12  2:07       ` Kai Huang
2021-01-15 14:43         ` Kai Huang
2021-01-16  9:31           ` Jarkko Sakkinen
2021-01-16  9:50             ` Jarkko Sakkinen
