All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 01/25] x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
@ 2021-03-19  7:22 ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Kai Huang
  2021-03-19  7:22 ` [PATCH v3 02/25] x86/cpufeatures: Add SGX1 and SGX2 sub-features Kai Huang
                   ` (25 subsequent siblings)
  26 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:22 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

Move SGX_LC feature bit to CPUID dependency table to make clearing all
SGX feature bits easier. Also remove clear_sgx_caps() since it is just
a wrapper of setup_clear_cpu_cap(X86_FEATURE_SGX) now.

Suggested-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/cpu/cpuid-deps.c |  1 +
 arch/x86/kernel/cpu/feat_ctl.c   | 12 +++---------
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 42af31b64c2c..d40f8e0a54ce 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -72,6 +72,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_PER_THREAD_MBA,		X86_FEATURE_MBA       },
+	{ X86_FEATURE_SGX_LC,			X86_FEATURE_SGX	      },
 	{}
 };
 
diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index 3b1b01f2b248..27533a6e04fa 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -93,15 +93,9 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c)
 }
 #endif /* CONFIG_X86_VMX_FEATURE_NAMES */
 
-static void clear_sgx_caps(void)
-{
-	setup_clear_cpu_cap(X86_FEATURE_SGX);
-	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
-}
-
 static int __init nosgx(char *str)
 {
-	clear_sgx_caps();
+	setup_clear_cpu_cap(X86_FEATURE_SGX);
 
 	return 0;
 }
@@ -116,7 +110,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 
 	if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) {
 		clear_cpu_cap(c, X86_FEATURE_VMX);
-		clear_sgx_caps();
+		clear_cpu_cap(c, X86_FEATURE_SGX);
 		return;
 	}
 
@@ -177,6 +171,6 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 	    !(msr & FEAT_CTL_SGX_LC_ENABLED) || !enable_sgx) {
 		if (enable_sgx)
 			pr_err_once("SGX disabled by BIOS\n");
-		clear_sgx_caps();
+		clear_cpu_cap(c, X86_FEATURE_SGX);
 	}
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 02/25] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
  2021-03-19  7:22 ` [PATCH v3 01/25] x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit Kai Huang
@ 2021-03-19  7:22 ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  2021-03-19  7:22 ` [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() Kai Huang
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:22 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

From: Sean Christopherson <seanjc@google.com>

Add SGX1 and SGX2 feature flags, via CPUID.0x12.0x0.EAX, as scattered
features, since adding a new leaf for only two bits would be wasteful.
As part of virtualizing SGX, KVM will expose the SGX CPUID leafs to its
guest, and to do so correctly needs to query hardware and kernel support
for SGX1 and SGX2.

Suppress both SGX1 and SGX2 from /proc/cpuinfo. SGX1 basically means
SGX, and for SGX2 there is no concrete use case of using it in
/proc/cpuinfo.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c   | 2 ++
 arch/x86/kernel/cpu/scattered.c    | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cc96e26d69f7..1f918f5e0055 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -290,6 +290,8 @@
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
 #define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 #define X86_FEATURE_PER_THREAD_MBA	(11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
+#define X86_FEATURE_SGX1		(11*32+ 8) /* "" Basic SGX */
+#define X86_FEATURE_SGX2		(11*32+ 9) /* "" SGX Enclave Dynamic Memory Management (EDMM) */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index d40f8e0a54ce..defda61f372d 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -73,6 +73,8 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_PER_THREAD_MBA,		X86_FEATURE_MBA       },
 	{ X86_FEATURE_SGX_LC,			X86_FEATURE_SGX	      },
+	{ X86_FEATURE_SGX1,			X86_FEATURE_SGX       },
+	{ X86_FEATURE_SGX2,			X86_FEATURE_SGX1      },
 	{}
 };
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 972ec3bfa9c0..21d1f062895a 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -36,6 +36,8 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_CDP_L2,		CPUID_ECX,  2, 0x00000010, 2 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  3, 0x00000010, 0 },
 	{ X86_FEATURE_PER_THREAD_MBA,	CPUID_ECX,  0, 0x00000010, 3 },
+	{ X86_FEATURE_SGX1,		CPUID_EAX,  0, 0x00000012, 0 },
+	{ X86_FEATURE_SGX2,		CPUID_EAX,  1, 0x00000012, 0 },
 	{ X86_FEATURE_HW_PSTATE,	CPUID_EDX,  7, 0x80000007, 0 },
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
  2021-03-19  7:22 ` [PATCH v3 01/25] x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit Kai Huang
  2021-03-19  7:22 ` [PATCH v3 02/25] x86/cpufeatures: Add SGX1 and SGX2 sub-features Kai Huang
@ 2021-03-19  7:22 ` Kai Huang
  2021-03-22 18:16   ` Borislav Petkov
  2021-03-25  9:30   ` [PATCH v4 " Kai Huang
  2021-03-19  7:22 ` [PATCH v3 04/25] x86/sgx: Add SGX_CHILD_PRESENT hardware error code Kai Huang
                   ` (23 subsequent siblings)
  26 siblings, 2 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:22 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

EREMOVE takes a page and removes any association between that page and
an enclave.  It must be run on a page before it can be added into
another enclave.  Currently, EREMOVE is run as part of pages being freed
into the SGX page allocator.  It is not expected to fail.

KVM does not track how guest pages are used, which means that SGX
virtualization use of EREMOVE might fail.

Break out the EREMOVE call from the SGX page allocator.  This will allow
the SGX virtualization code to use the allocator directly.  (SGX/KVM
will also introduce a more permissive EREMOVE helper).

Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
more specific that it is used to free EPC page assigned to one enclave.
Explicitly give an message using WARN_ONCE() when EREMOVE fails, to call
out EPC page is leaked, and requires machine reboot to get leaked pages
back.

Replace sgx_free_epc_page() with sgx_encl_free_epc_page() in all call
sites.  No functional change is intended, except the new WARNING message
when EREMOVE fails.

Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v2->v3:

 - Changed to replace all call sites of sgx_free_epc_page() with
   sgx_encl_free_epc_page() to make this patch have no functional change,
   except a WARN() upon EREMOVE failure requested by Dave.
 - Rebased to latest tip/x86/sgx to resolve merge conflict with Jarkko's NUMA
   allocation.
 - Removed Jarkko as author. Added Jarkko's Acked-by.

v1->v2:

 - Merge original WARN() and pr_err_once() into one single WARN(), suggested
   by Sean.

---
 arch/x86/kernel/cpu/sgx/encl.c  | 40 ++++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/sgx/encl.h  |  1 +
 arch/x86/kernel/cpu/sgx/ioctl.c |  6 ++---
 arch/x86/kernel/cpu/sgx/main.c  | 14 +++++-------
 4 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7449ef33f081..e0fb0f121616 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -78,7 +78,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	ret = __sgx_encl_eldu(encl_page, epc_page, secs_page);
 	if (ret) {
-		sgx_free_epc_page(epc_page);
+		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(ret);
 	}
 
@@ -404,7 +404,7 @@ void sgx_encl_release(struct kref *ref)
 			if (sgx_unmark_page_reclaimable(entry->epc_page))
 				continue;
 
-			sgx_free_epc_page(entry->epc_page);
+			sgx_encl_free_epc_page(entry->epc_page);
 			encl->secs_child_cnt--;
 			entry->epc_page = NULL;
 		}
@@ -415,7 +415,7 @@ void sgx_encl_release(struct kref *ref)
 	xa_destroy(&encl->page_array);
 
 	if (!encl->secs_child_cnt && encl->secs.epc_page) {
-		sgx_free_epc_page(encl->secs.epc_page);
+		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 	}
 
@@ -423,7 +423,7 @@ void sgx_encl_release(struct kref *ref)
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
 		list_del(&va_page->list);
-		sgx_free_epc_page(va_page->epc_page);
+		sgx_encl_free_epc_page(va_page->epc_page);
 		kfree(va_page);
 	}
 
@@ -686,7 +686,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
 	ret = __epa(sgx_get_epc_virt_addr(epc_page));
 	if (ret) {
 		WARN_ONCE(1, "EPA returned %d (0x%x)", ret, ret);
-		sgx_free_epc_page(epc_page);
+		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(-EFAULT);
 	}
 
@@ -735,3 +735,33 @@ bool sgx_va_page_full(struct sgx_va_page *va_page)
 
 	return slot == SGX_VA_SLOT_COUNT;
 }
+
+/**
+ * sgx_encl_free_epc_page - free EPC page assigned to an enclave
+ * @page:	EPC page to be freed
+ *
+ * Free EPC page assigned to an enclave.  It does EREMOVE for the page, and
+ * only upon success, it puts the page back to free page list.  Otherwise, it
+ * gives a WARNING to indicate page is leaked, and require reboot to retrieve
+ * leaked pages.
+ */
+void sgx_encl_free_epc_page(struct sgx_epc_page *page)
+{
+	int ret;
+
+	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+
+	/*
+	 * Give a message to remind EPC page is leaked when EREMOVE fails,
+	 * and requires machine reboot to get leaked pages back. This can
+	 * be improved in future by adding stats of leaked pages, etc.
+	 */
+#define EREMOVE_ERROR_MESSAGE \
+	"EREMOVE returned %d (0x%x).  EPC page leaked.  Reboot required to retrieve leaked pages."
+	ret = __eremove(sgx_get_epc_virt_addr(page));
+	if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret))
+		return;
+#undef EREMOVE_ERROR_MESSAGE
+
+	sgx_free_epc_page(page);
+}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index d8d30ccbef4c..6e74f85b6264 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -115,5 +115,6 @@ struct sgx_epc_page *sgx_alloc_va_page(void);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
+void sgx_encl_free_epc_page(struct sgx_epc_page *page);
 
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 90a5caf76939..772b9c648cf1 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -47,7 +47,7 @@ static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
 	encl->page_cnt--;
 
 	if (va_page) {
-		sgx_free_epc_page(va_page->epc_page);
+		sgx_encl_free_epc_page(va_page->epc_page);
 		list_del(&va_page->list);
 		kfree(va_page);
 	}
@@ -117,7 +117,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	return 0;
 
 err_out:
-	sgx_free_epc_page(encl->secs.epc_page);
+	sgx_encl_free_epc_page(encl->secs.epc_page);
 	encl->secs.epc_page = NULL;
 
 err_out_backing:
@@ -365,7 +365,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 	mmap_read_unlock(current->mm);
 
 err_out_free:
-	sgx_free_epc_page(epc_page);
+	sgx_encl_free_epc_page(epc_page);
 	kfree(encl_page);
 
 	return ret;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 5c9c5e5fb1fb..6a734f484aa7 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -294,7 +294,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 
 		sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
 
-		sgx_free_epc_page(encl->secs.epc_page);
+		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 
 		sgx_encl_put_backing(&secs_backing, true);
@@ -609,19 +609,15 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  * sgx_free_epc_page() - Free an EPC page
  * @page:	an EPC page
  *
- * Call EREMOVE for an EPC page and insert it back to the list of free pages.
+ * Put the EPC page back to the list of free pages. It's the caller's
+ * responsibility to make sure that the page is in uninitialized state. In other
+ * words, do EREMOVE, EWB or whatever operation is necessary before calling
+ * this function.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
 	struct sgx_numa_node *node = section->node;
-	int ret;
-
-	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
-
-	ret = __eremove(sgx_get_epc_virt_addr(page));
-	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
-		return;
 
 	spin_lock(&node->lock);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 04/25] x86/sgx: Add SGX_CHILD_PRESENT hardware error code
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (2 preceding siblings ...)
  2021-03-19  7:22 ` [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() Kai Huang
@ 2021-03-19  7:22 ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  2021-03-19  7:22 ` [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
                   ` (22 subsequent siblings)
  26 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:22 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

SGX driver can accurately track how enclave pages are used.  This
enables SECS to be specifically targeted and EREMOVE'd only after all
child pages have been EREMOVE'd.  This ensures that SGX driver will
never encounter SGX_CHILD_PRESENT in normal operation.

Virtual EPC is different.  The host does not track how EPC pages are
used by the guest, so it cannot guarantee EREMOVE success.  It might,
for instance, encounter a SECS with a non-zero child count.

Add a definition of SGX_CHILD_PRESENT.  It will be used exclusively by
the SGX virtualization driver to handle recoverable EREMOVE errors when
saniziting EPC pages after they are freed.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/cpu/sgx/arch.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/arch.h b/arch/x86/kernel/cpu/sgx/arch.h
index dd7602c44c72..abf99bb71fdc 100644
--- a/arch/x86/kernel/cpu/sgx/arch.h
+++ b/arch/x86/kernel/cpu/sgx/arch.h
@@ -26,12 +26,14 @@
  * enum sgx_return_code - The return code type for ENCLS, ENCLU and ENCLV
  * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
  *				been completed yet.
+ * %SGX_CHILD_PRESENT		SECS has child pages present in the EPC.
  * %SGX_INVALID_EINITTOKEN:	EINITTOKEN is invalid and enclave signer's
  *				public key does not match IA32_SGXLEPUBKEYHASH.
  * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was received
  */
 enum sgx_return_code {
 	SGX_NOT_TRACKED			= 11,
+	SGX_CHILD_PRESENT		= 13,
 	SGX_INVALID_EINITTOKEN		= 16,
 	SGX_UNMASKED_EVENT		= 128,
 };
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (3 preceding siblings ...)
  2021-03-19  7:22 ` [PATCH v3 04/25] x86/sgx: Add SGX_CHILD_PRESENT hardware error code Kai Huang
@ 2021-03-19  7:22 ` Kai Huang
  2021-03-25  9:36   ` Kai Huang
                     ` (4 more replies)
  2021-03-19  7:22 ` [PATCH v3 06/25] x86/cpu/intel: Allow SGX virtualization without Launch Control support Kai Huang
                   ` (21 subsequent siblings)
  26 siblings, 5 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:22 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw" EPC
without an associated enclave.  The intended and only known use case for
raw EPC allocation is to expose EPC to a KVM guest, hence the 'vepc'
moniker, virt.{c,h} files and X86_SGX_KVM Kconfig.

SGX driver uses misc device /dev/sgx_enclave to support userspace to
create enclave.  Each file descriptor from opening /dev/sgx_enclave
represents an enclave.  Unlike SGX driver, KVM doesn't control how guest
uses EPC, therefore EPC allocated to KVM guest is not associated to an
enclave, and /dev/sgx_enclave is not suitable for allocating EPC for KVM
guest.

Having separate device nodes for SGX driver and KVM virtual EPC also
allows separate permission control for running host SGX enclaves and
KVM SGX guests.

To use /dev/sgx_vepc to allocate a virtual EPC instance with particular
size, the userspace hypervisor opens /dev/sgx_vepc, and uses mmap()
with the intended size to get an address range of virtual EPC.  Then
it may use the address range to create one KVM memory slot as virtual
EPC for guest.

Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
/dev/sgx_vepc rather than in KVM. Doing so has two major advantages:

  - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
    just another memory backend for guests.

  - EPC management is wholly contained in the SGX subsystem, e.g. SGX
    does not have to export any symbols, changes to reclaim flows don't
    need to be routed through KVM, SGX's dirty laundry doesn't have to
    get aired out for the world to see, and so on and so forth.

The virtual EPC pages allocated to guests are currently not reclaimable.
Reclaiming EPC page used by enclave requires a special reclaim mechanism
separate from normal page reclaim, and that mechanism is not supported
for virutal EPC pages.  Due to the complications of handling reclaim
conflicts between guest and host, reclaiming virtual EPC pages is
significantly more complex than basic support for SGX virtualization.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/Kconfig                 |  12 ++
 arch/x86/kernel/cpu/sgx/Makefile |   1 +
 arch/x86/kernel/cpu/sgx/sgx.h    |   9 ++
 arch/x86/kernel/cpu/sgx/virt.c   | 260 +++++++++++++++++++++++++++++++
 4 files changed, 282 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 35391e94bd22..007912f67a06 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1942,6 +1942,18 @@ config X86_SGX
 
 	  If unsure, say N.
 
+config X86_SGX_KVM
+	bool "Software Guard eXtensions (SGX) Virtualization"
+	depends on X86_SGX && KVM_INTEL
+	help
+
+	  Enables KVM guests to create SGX enclaves.
+
+	  This includes support to expose "raw" unreclaimable enclave memory to
+	  guests via a device node, e.g. /dev/sgx_vepc.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 91d3dc784a29..9c1656779b2a 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -3,3 +3,4 @@ obj-y += \
 	encl.o \
 	ioctl.o \
 	main.o
+obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 653af8ca1a25..a3aa00cb1ac4 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -80,4 +80,13 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 
+#ifdef CONFIG_X86_SGX_KVM
+int __init sgx_vepc_init(void);
+#else
+static inline int __init sgx_vepc_init(void)
+{
+	return -ENODEV;
+}
+#endif
+
 #endif /* _X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
new file mode 100644
index 000000000000..29d8d28b4695
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -0,0 +1,260 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Device driver to expose SGX enclave memory to KVM guests.
+ *
+ * Copyright(c) 2021 Intel Corporation.
+ */
+
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/xarray.h>
+#include <asm/sgx.h>
+#include <uapi/asm/sgx.h>
+
+#include "encls.h"
+#include "sgx.h"
+
+struct sgx_vepc {
+	struct xarray page_array;
+	struct mutex lock;
+};
+
+/*
+ * Temporary SECS pages that cannot be EREMOVE'd due to having child in other
+ * virtual EPC instances, and the lock to protect it.
+ */
+static struct mutex zombie_secs_pages_lock;
+static struct list_head zombie_secs_pages;
+
+static int __sgx_vepc_fault(struct sgx_vepc *vepc,
+			    struct vm_area_struct *vma, unsigned long addr)
+{
+	struct sgx_epc_page *epc_page;
+	unsigned long index, pfn;
+	int ret;
+
+	WARN_ON(!mutex_is_locked(&vepc->lock));
+
+	/* Calculate index of EPC page in virtual EPC's page_array */
+	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
+
+	epc_page = xa_load(&vepc->page_array, index);
+	if (epc_page)
+		return 0;
+
+	epc_page = sgx_alloc_epc_page(vepc, false);
+	if (IS_ERR(epc_page))
+		return PTR_ERR(epc_page);
+
+	ret = xa_err(xa_store(&vepc->page_array, index, epc_page, GFP_KERNEL));
+	if (ret)
+		goto err_free;
+
+	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
+
+	ret = vmf_insert_pfn(vma, addr, pfn);
+	if (ret != VM_FAULT_NOPAGE) {
+		ret = -EFAULT;
+		goto err_delete;
+	}
+
+	return 0;
+
+err_delete:
+	xa_erase(&vepc->page_array, index);
+err_free:
+	sgx_free_epc_page(epc_page);
+	return ret;
+}
+
+static vm_fault_t sgx_vepc_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_vepc *vepc = vma->vm_private_data;
+	int ret;
+
+	mutex_lock(&vepc->lock);
+	ret = __sgx_vepc_fault(vepc, vma, vmf->address);
+	mutex_unlock(&vepc->lock);
+
+	if (!ret)
+		return VM_FAULT_NOPAGE;
+
+	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
+		mmap_read_unlock(vma->vm_mm);
+		return VM_FAULT_RETRY;
+	}
+
+	return VM_FAULT_SIGBUS;
+}
+
+const struct vm_operations_struct sgx_vepc_vm_ops = {
+	.fault = sgx_vepc_fault,
+};
+
+static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct sgx_vepc *vepc = file->private_data;
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	vma->vm_ops = &sgx_vepc_vm_ops;
+	/* Don't copy VMA in fork() */
+	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
+	vma->vm_private_data = vepc;
+
+	return 0;
+}
+
+static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+{
+	int ret;
+
+	/*
+	 * Take a previously guest-owned EPC page and return it to the
+	 * general EPC page pool.
+	 *
+	 * Guests can not be trusted to have left this page in a good
+	 * state, so run EREMOVE on the page unconditionally.  In the
+	 * case that a guest properly EREMOVE'd this page, a superfluous
+	 * EREMOVE is harmless.
+	 */
+	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
+	if (ret) {
+		/*
+		 * Only SGX_CHILD_PRESENT is expected, which is because of
+		 * EREMOVE'ing an SECS still with child, in which case it can
+		 * be handled by EREMOVE'ing the SECS again after all pages in
+		 * virtual EPC have been EREMOVE'd. See comments in below in
+		 * sgx_vepc_release().
+		 *
+		 * The user of virtual EPC (KVM) needs to guarantee there's no
+		 * logical processor is still running in the enclave in guest,
+		 * otherwise EREMOVE will get SGX_ENCLAVE_ACT which cannot be
+		 * handled here.
+		 */
+		WARN_ONCE(ret != SGX_CHILD_PRESENT,
+			  "EREMOVE (EPC page 0x%lx): unexpected error: %d\n",
+			  sgx_get_epc_phys_addr(epc_page), ret);
+		return ret;
+	}
+
+	sgx_free_epc_page(epc_page);
+
+	return 0;
+}
+
+static int sgx_vepc_release(struct inode *inode, struct file *file)
+{
+	struct sgx_vepc *vepc = file->private_data;
+	struct sgx_epc_page *epc_page, *tmp, *entry;
+	unsigned long index;
+
+	LIST_HEAD(secs_pages);
+
+	xa_for_each(&vepc->page_array, index, entry) {
+		/*
+		 * Remove all normal, child pages.  sgx_vepc_free_page()
+		 * will fail if EREMOVE fails, but this is OK and expected on
+		 * SECS pages.  Those can only be EREMOVE'd *after* all their
+		 * child pages. Retries below will clean them up.
+		 */
+		if (sgx_vepc_free_page(entry))
+			continue;
+
+		xa_erase(&vepc->page_array, index);
+	}
+
+	/*
+	 * Retry EREMOVE'ing pages.  This will clean up any SECS pages that
+	 * only had children in this 'epc' area.
+	 */
+	xa_for_each(&vepc->page_array, index, entry) {
+		epc_page = entry;
+		/*
+		 * An EREMOVE failure here means that the SECS page still
+		 * has children.  But, since all children in this 'sgx_vepc'
+		 * have been removed, the SECS page must have a child on
+		 * another instance.
+		 */
+		if (sgx_vepc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+
+		xa_erase(&vepc->page_array, index);
+	}
+
+	/*
+	 * SECS pages are "pinned" by child pages, and unpinned once all
+	 * children have been EREMOVE'd.  A child page in this instance
+	 * may have pinned an SECS page encountered in an earlier release(),
+	 * creating a zombie.  Since some children were EREMOVE'd above,
+	 * try to EREMOVE all zombies in the hopes that one was unpinned.
+	 */
+	mutex_lock(&zombie_secs_pages_lock);
+	list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) {
+		/*
+		 * Speculatively remove the page from the list of zombies,
+		 * if the page is successfully EREMOVE it will be added to
+		 * the list of free pages.  If EREMOVE fails, throw the page
+		 * on the local list, which will be spliced on at the end.
+		 */
+		list_del(&epc_page->list);
+
+		if (sgx_vepc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+	}
+
+	if (!list_empty(&secs_pages))
+		list_splice_tail(&secs_pages, &zombie_secs_pages);
+	mutex_unlock(&zombie_secs_pages_lock);
+
+	kfree(vepc);
+
+	return 0;
+}
+
+static int sgx_vepc_open(struct inode *inode, struct file *file)
+{
+	struct sgx_vepc *vepc;
+
+	vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL);
+	if (!vepc)
+		return -ENOMEM;
+	mutex_init(&vepc->lock);
+	xa_init(&vepc->page_array);
+
+	file->private_data = vepc;
+
+	return 0;
+}
+
+static const struct file_operations sgx_vepc_fops = {
+	.owner		= THIS_MODULE,
+	.open		= sgx_vepc_open,
+	.release	= sgx_vepc_release,
+	.mmap		= sgx_vepc_mmap,
+};
+
+static struct miscdevice sgx_vepc_dev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "sgx_vepc",
+	.nodename = "sgx_vepc",
+	.fops = &sgx_vepc_fops,
+};
+
+int __init sgx_vepc_init(void)
+{
+	/* SGX virtualization requires KVM to work */
+	if (!boot_cpu_has(X86_FEATURE_VMX))
+		return -ENODEV;
+
+	INIT_LIST_HEAD(&zombie_secs_pages);
+	mutex_init(&zombie_secs_pages_lock);
+
+	return misc_register(&sgx_vepc_dev);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 06/25] x86/cpu/intel: Allow SGX virtualization without Launch Control support
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (4 preceding siblings ...)
  2021-03-19  7:22 ` [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
@ 2021-03-19  7:22 ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  2021-03-19  7:23 ` [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled Kai Huang
                   ` (20 subsequent siblings)
  26 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:22 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jethro, b.thiel, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

The kernel will currently disable all SGX support if the hardware does
not support launch control.  Make it more permissive to allow SGX
virtualization on systems without Launch Control support.  This will
allow KVM to expose SGX to guests that have less-strict requirements on
the availability of flexible launch control.

Improve error message to distinguish between three cases.  There are two
cases where SGX support is completely disabled:
1) SGX has been disabled completely by the BIOS
2) SGX LC is locked by the BIOS.  Bare-metal support is disabled because
   of LC unavailability.  SGX virtualization is unavailable (because of
   Kconfig).
One where it is partially available:
3) SGX LC is locked by the BIOS.  Bare-metal support is disabled because
   of LC unavailability.  SGX virtualization is supported.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v2->v3:
 - Fix nit: s/Faunch/Launch.

---
 arch/x86/kernel/cpu/feat_ctl.c | 59 +++++++++++++++++++++++++---------
 1 file changed, 44 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index 27533a6e04fa..da696eb4821a 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -104,8 +104,9 @@ early_param("nosgx", nosgx);
 
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 {
+	bool enable_sgx_kvm = false, enable_sgx_driver = false;
 	bool tboot = tboot_enabled();
-	bool enable_sgx;
+	bool enable_vmx;
 	u64 msr;
 
 	if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) {
@@ -114,13 +115,19 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 		return;
 	}
 
-	/*
-	 * Enable SGX if and only if the kernel supports SGX and Launch Control
-	 * is supported, i.e. disable SGX if the LE hash MSRs can't be written.
-	 */
-	enable_sgx = cpu_has(c, X86_FEATURE_SGX) &&
-		     cpu_has(c, X86_FEATURE_SGX_LC) &&
-		     IS_ENABLED(CONFIG_X86_SGX);
+	enable_vmx = cpu_has(c, X86_FEATURE_VMX) &&
+		     IS_ENABLED(CONFIG_KVM_INTEL);
+
+	if (cpu_has(c, X86_FEATURE_SGX) && IS_ENABLED(CONFIG_X86_SGX)) {
+		/*
+		 * Separate out SGX driver enabling from KVM.  This allows KVM
+		 * guests to use SGX even if the kernel SGX driver refuses to
+		 * use it.  This happens if flexible Launch Control is not
+		 * available.
+		 */
+		enable_sgx_driver = cpu_has(c, X86_FEATURE_SGX_LC);
+		enable_sgx_kvm = enable_vmx && IS_ENABLED(CONFIG_X86_SGX_KVM);
+	}
 
 	if (msr & FEAT_CTL_LOCKED)
 		goto update_caps;
@@ -136,15 +143,18 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 	 * i.e. KVM is enabled, to avoid unnecessarily adding an attack vector
 	 * for the kernel, e.g. using VMX to hide malicious code.
 	 */
-	if (cpu_has(c, X86_FEATURE_VMX) && IS_ENABLED(CONFIG_KVM_INTEL)) {
+	if (enable_vmx) {
 		msr |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
 
 		if (tboot)
 			msr |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX;
 	}
 
-	if (enable_sgx)
-		msr |= FEAT_CTL_SGX_ENABLED | FEAT_CTL_SGX_LC_ENABLED;
+	if (enable_sgx_kvm || enable_sgx_driver) {
+		msr |= FEAT_CTL_SGX_ENABLED;
+		if (enable_sgx_driver)
+			msr |= FEAT_CTL_SGX_LC_ENABLED;
+	}
 
 	wrmsrl(MSR_IA32_FEAT_CTL, msr);
 
@@ -167,10 +177,29 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 	}
 
 update_sgx:
-	if (!(msr & FEAT_CTL_SGX_ENABLED) ||
-	    !(msr & FEAT_CTL_SGX_LC_ENABLED) || !enable_sgx) {
-		if (enable_sgx)
-			pr_err_once("SGX disabled by BIOS\n");
+	if (!(msr & FEAT_CTL_SGX_ENABLED)) {
+		if (enable_sgx_kvm || enable_sgx_driver)
+			pr_err_once("SGX disabled by BIOS.\n");
 		clear_cpu_cap(c, X86_FEATURE_SGX);
+		return;
+	}
+
+	/*
+	 * VMX feature bit may be cleared due to being disabled in BIOS,
+	 * in which case SGX virtualization cannot be supported either.
+	 */
+	if (!cpu_has(c, X86_FEATURE_VMX) && enable_sgx_kvm) {
+		pr_err_once("SGX virtualization disabled due to lack of VMX.\n");
+		enable_sgx_kvm = 0;
+	}
+
+	if (!(msr & FEAT_CTL_SGX_LC_ENABLED) && enable_sgx_driver) {
+		if (!enable_sgx_kvm) {
+			pr_err_once("SGX Launch Control is locked. Disable SGX.\n");
+			clear_cpu_cap(c, X86_FEATURE_SGX);
+		} else {
+			pr_err_once("SGX Launch Control is locked. Support SGX virtualization only.\n");
+			clear_cpu_cap(c, X86_FEATURE_SGX_LC);
+		}
 	}
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (5 preceding siblings ...)
  2021-03-19  7:22 ` [PATCH v3 06/25] x86/cpu/intel: Allow SGX virtualization without Launch Control support Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-04-02  9:48   ` Borislav Petkov
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Kai Huang
  2021-03-19  7:23 ` [PATCH v3 08/25] x86/sgx: Expose SGX architectural definitions to the kernel Kai Huang
                   ` (19 subsequent siblings)
  26 siblings, 2 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, linux-sgx, x86
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

Modify sgx_init() to always try to initialize the virtual EPC driver,
even if the SGX driver is disabled.  The SGX driver might be disabled
if SGX Launch Control is in locked mode, or not supported in the
hardware at all.  This allows (non-Linux) guests that support non-LC
configurations to use SGX.

Acked-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 6a734f484aa7..b73114150ff8 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -743,7 +743,15 @@ static int __init sgx_init(void)
 		goto err_page_cache;
 	}
 
-	ret = sgx_drv_init();
+	/*
+	 * Always try to initialize the native *and* KVM drivers.
+	 * The KVM driver is less picky than the native one and
+	 * can function if the native one is not supported on the
+	 * current system or fails to initialize.
+	 *
+	 * Error out only if both fail to initialize.
+	 */
+	ret = !!sgx_drv_init() & !!sgx_vepc_init();
 	if (ret)
 		goto err_kthread;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 08/25] x86/sgx: Expose SGX architectural definitions to the kernel
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (6 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  2021-03-19  7:23 ` [PATCH v3 09/25] x86/sgx: Move ENCLS leaf definitions to sgx.h Kai Huang
                   ` (18 subsequent siblings)
  26 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, linux-sgx, x86
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Expose SGX architectural structures, as KVM will use many of the
architectural constants and structs to virtualize SGX.

Name the new header file as asm/sgx.h, rather than asm/sgx_arch.h, to
have single header to provide SGX facilities to share with other kernel
componments. Also update MAINTAINERS to include asm/sgx.h.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v1->v3:
 - Added MAINTAINERS file update to include new asm/sgx.h
 - Changed 'line' to 'comment' in the comment pointed out by Sean.

---
 MAINTAINERS                                   |  1 +
 .../cpu/sgx/arch.h => include/asm/sgx.h}      | 20 ++++++++++++++-----
 arch/x86/kernel/cpu/sgx/encl.c                |  2 +-
 arch/x86/kernel/cpu/sgx/sgx.h                 |  2 +-
 tools/testing/selftests/sgx/defines.h         |  2 +-
 5 files changed, 19 insertions(+), 8 deletions(-)
 rename arch/x86/{kernel/cpu/sgx/arch.h => include/asm/sgx.h} (95%)

diff --git a/MAINTAINERS b/MAINTAINERS
index aa84121c5611..0cb606aeba5e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9274,6 +9274,7 @@ Q:	https://patchwork.kernel.org/project/intel-sgx/list/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/sgx
 F:	Documentation/x86/sgx.rst
 F:	arch/x86/entry/vdso/vsgx.S
+F:	arch/x86/include/asm/sgx.h
 F:	arch/x86/include/uapi/asm/sgx.h
 F:	arch/x86/kernel/cpu/sgx/*
 F:	tools/testing/selftests/sgx/*
diff --git a/arch/x86/kernel/cpu/sgx/arch.h b/arch/x86/include/asm/sgx.h
similarity index 95%
rename from arch/x86/kernel/cpu/sgx/arch.h
rename to arch/x86/include/asm/sgx.h
index abf99bb71fdc..14bb5f7e221c 100644
--- a/arch/x86/kernel/cpu/sgx/arch.h
+++ b/arch/x86/include/asm/sgx.h
@@ -2,15 +2,20 @@
 /**
  * Copyright(c) 2016-20 Intel Corporation.
  *
- * Contains data structures defined by the SGX architecture.  Data structures
- * defined by the Linux software stack should not be placed here.
+ * Intel Software Guard Extensions (SGX) support.
  */
-#ifndef _ASM_X86_SGX_ARCH_H
-#define _ASM_X86_SGX_ARCH_H
+#ifndef _ASM_X86_SGX_H
+#define _ASM_X86_SGX_H
 
 #include <linux/bits.h>
 #include <linux/types.h>
 
+/*
+ * This file contains both data structures defined by SGX architecture and Linux
+ * defined software data structures and functions.  The two should not be mixed
+ * together for better readibility.  The architectural definitions come first.
+ */
+
 /* The SGX specific CPUID function. */
 #define SGX_CPUID		0x12
 /* EPC enumeration. */
@@ -337,4 +342,9 @@ struct sgx_sigstruct {
 
 #define SGX_LAUNCH_TOKEN_SIZE 304
 
-#endif /* _ASM_X86_SGX_ARCH_H */
+/*
+ * Do not put any hardware-defined SGX structure representations below this
+ * comment!
+ */
+
+#endif /* _ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index e0fb0f121616..f78cf880c332 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -7,7 +7,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/suspend.h>
 #include <linux/sched/mm.h>
-#include "arch.h"
+#include <asm/sgx.h>
 #include "encl.h"
 #include "encls.h"
 #include "sgx.h"
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index a3aa00cb1ac4..5086b240d269 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -8,7 +8,7 @@
 #include <linux/rwsem.h>
 #include <linux/types.h>
 #include <asm/asm.h>
-#include "arch.h"
+#include <asm/sgx.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt) "sgx: " fmt
diff --git a/tools/testing/selftests/sgx/defines.h b/tools/testing/selftests/sgx/defines.h
index 592c1ccf4576..0bd73428d2f3 100644
--- a/tools/testing/selftests/sgx/defines.h
+++ b/tools/testing/selftests/sgx/defines.h
@@ -14,7 +14,7 @@
 #define __aligned(x) __attribute__((__aligned__(x)))
 #define __packed __attribute__((packed))
 
-#include "../../../../arch/x86/kernel/cpu/sgx/arch.h"
+#include "../../../../arch/x86/include/asm/sgx.h"
 #include "../../../../arch/x86/include/asm/enclu.h"
 #include "../../../../arch/x86/include/uapi/asm/sgx.h"
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 09/25] x86/sgx: Move ENCLS leaf definitions to sgx.h
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (7 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 08/25] x86/sgx: Expose SGX architectural definitions to the kernel Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  2021-03-19  7:23 ` [PATCH v3 10/25] x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT) Kai Huang
                   ` (17 subsequent siblings)
  26 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, linux-sgx, x86
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Move the ENCLS leaf definitions to sgx.h so that they can be used by
KVM.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/sgx.h      | 15 +++++++++++++++
 arch/x86/kernel/cpu/sgx/encls.h | 15 ---------------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 14bb5f7e221c..34f44238d1d1 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -27,6 +27,21 @@
 /* The bitmask for the EPC section type. */
 #define SGX_CPUID_EPC_MASK	GENMASK(3, 0)
 
+enum sgx_encls_function {
+	ECREATE	= 0x00,
+	EADD	= 0x01,
+	EINIT	= 0x02,
+	EREMOVE	= 0x03,
+	EDGBRD	= 0x04,
+	EDGBWR	= 0x05,
+	EEXTEND	= 0x06,
+	ELDU	= 0x08,
+	EBLOCK	= 0x09,
+	EPA	= 0x0A,
+	EWB	= 0x0B,
+	ETRACK	= 0x0C,
+};
+
 /**
  * enum sgx_return_code - The return code type for ENCLS, ENCLU and ENCLV
  * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index 443188fe7e70..be5c49689980 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -11,21 +11,6 @@
 #include <asm/traps.h>
 #include "sgx.h"
 
-enum sgx_encls_function {
-	ECREATE	= 0x00,
-	EADD	= 0x01,
-	EINIT	= 0x02,
-	EREMOVE	= 0x03,
-	EDGBRD	= 0x04,
-	EDGBWR	= 0x05,
-	EEXTEND	= 0x06,
-	ELDU	= 0x08,
-	EBLOCK	= 0x09,
-	EPA	= 0x0A,
-	EWB	= 0x0B,
-	ETRACK	= 0x0C,
-};
-
 /**
  * ENCLS_FAULT_FLAG - flag signifying an ENCLS return code is a trapnr
  *
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 10/25] x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT)
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (8 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 09/25] x86/sgx: Move ENCLS leaf definitions to sgx.h Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  2021-03-19  7:23 ` [PATCH v3 11/25] x86/sgx: Add encls_faulted() helper Kai Huang
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, linux-sgx, x86
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Define the ENCLS leafs that are available with SGX2, also referred to as
Enclave Dynamic Memory Management (EDMM).  The leafs will be used by KVM
to conditionally expose SGX2 capabilities to guests.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/sgx.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 34f44238d1d1..3b025afec0a7 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -40,6 +40,9 @@ enum sgx_encls_function {
 	EPA	= 0x0A,
 	EWB	= 0x0B,
 	ETRACK	= 0x0C,
+	EAUG	= 0x0D,
+	EMODPR	= 0x0E,
+	EMODT	= 0x0F,
 };
 
 /**
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 11/25] x86/sgx: Add encls_faulted() helper
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (9 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 10/25] x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT) Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  2021-03-19  7:23 ` [PATCH v3 12/25] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs Kai Huang
                   ` (15 subsequent siblings)
  26 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, linux-sgx, x86
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a helper to extract the fault indicator from an encoded ENCLS return
value.  SGX virtualization will also need to detect ENCLS faults.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/cpu/sgx/encls.h | 15 ++++++++++++++-
 arch/x86/kernel/cpu/sgx/ioctl.c |  2 +-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index be5c49689980..9b204843b78d 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -40,6 +40,19 @@
 	} while (0);							  \
 }
 
+/*
+ * encls_faulted() - Check if an ENCLS leaf faulted given an error code
+ * @ret:	the return value of an ENCLS leaf function call
+ *
+ * Return:
+ * - true:	ENCLS leaf faulted.
+ * - false:	Otherwise.
+ */
+static inline bool encls_faulted(int ret)
+{
+	return ret & ENCLS_FAULT_FLAG;
+}
+
 /**
  * encls_failed() - Check if an ENCLS function failed
  * @ret:	the return value of an ENCLS function call
@@ -50,7 +63,7 @@
  */
 static inline bool encls_failed(int ret)
 {
-	if (ret & ENCLS_FAULT_FLAG)
+	if (encls_faulted(ret))
 		return ENCLS_TRAPNR(ret) != X86_TRAP_PF;
 
 	return !!ret;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 772b9c648cf1..dc18ced04ad8 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -568,7 +568,7 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 		}
 	}
 
-	if (ret & ENCLS_FAULT_FLAG) {
+	if (encls_faulted(ret)) {
 		if (encls_failed(ret))
 			ENCLS_WARN(ret, "EINIT");
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 12/25] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (10 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 11/25] x86/sgx: Add encls_faulted() helper Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Kai Huang
  2021-03-19  7:23 ` [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Kai Huang
                   ` (14 subsequent siblings)
  26 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, linux-sgx, x86
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

Add a helper to update SGX_LEPUBKEYHASHn MSRs.  SGX virtualization also
needs to update those MSRs based on guest's "virtual" SGX_LEPUBKEYHASHn
before EINIT from guest.

Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/cpu/sgx/ioctl.c |  5 ++---
 arch/x86/kernel/cpu/sgx/main.c  | 17 +++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h   |  2 ++
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index dc18ced04ad8..2a0aed446d6c 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -495,7 +495,7 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 			 void *token)
 {
 	u64 mrsigner[4];
-	int i, j, k;
+	int i, j;
 	void *addr;
 	int ret;
 
@@ -544,8 +544,7 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 
 			preempt_disable();
 
-			for (k = 0; k < 4; k++)
-				wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + k, mrsigner[k]);
+			sgx_update_lepubkeyhash(mrsigner);
 
 			ret = __einit(sigstruct, token, addr);
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index b73114150ff8..b95168427056 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -727,6 +727,23 @@ static bool __init sgx_page_cache_init(void)
 	return true;
 }
 
+
+/*
+ * Update the SGX_LEPUBKEYHASH MSRs to the values specified by caller.
+ * Bare-metal driver requires to update them to hash of enclave's signer
+ * before EINIT. KVM needs to update them to guest's virtual MSR values
+ * before doing EINIT from guest.
+ */
+void sgx_update_lepubkeyhash(u64 *lepubkeyhash)
+{
+	int i;
+
+	WARN_ON_ONCE(preemptible());
+
+	for (i = 0; i < 4; i++)
+		wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, lepubkeyhash[i]);
+}
+
 static int __init sgx_init(void)
 {
 	int ret;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 5086b240d269..f0f2a92bb8d0 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -89,4 +89,6 @@ static inline int __init sgx_vepc_init(void)
 }
 #endif
 
+void sgx_update_lepubkeyhash(u64 *lepubkeyhash);
+
 #endif /* _X86_SGX_H */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (11 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 12/25] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-04-05  9:07   ` Borislav Petkov
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  2021-03-19  7:23 ` [PATCH v3 14/25] x86/sgx: Move provisioning device creation out of SGX driver Kai Huang
                   ` (13 subsequent siblings)
  26 siblings, 2 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, linux-sgx, x86
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

The host kernel must intercept ECREATE to impose policies on guests, and
intercept EINIT to be able to write guest's virtual SGX_LEPUBKEYHASH MSR
values to hardware before running guest's EINIT so it can run correctly
according to hardware behavior.

Provide wrappers around __ecreate() and __einit() to hide the ugliness
of overloading the ENCLS return value to encode multiple error formats
in a single int.  KVM will trap-and-execute ECREATE and EINIT as part
of SGX virtualization, and reflect ENCLS execution result to guest by
setting up guest's GPRs, or on an exception, injecting the correct fault
based on return value of __ecreate() and __einit().

Use host userspace addresses (provided by KVM based on guest physical
address of ENCLS parameters) to execute ENCLS/EINIT when possible.
Accesses to both EPC and memory originating from ENCLS are subject to
segmentation and paging mechanisms.  It's also possible to generate
kernel mappings for ENCLS parameters by resolving PFN but using
__uaccess_xx() is simpler.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v2->v3:
 - Updated to use addr,size directly for access_ok()s.

---
 arch/x86/include/asm/sgx.h     |   7 +++
 arch/x86/kernel/cpu/sgx/virt.c | 109 +++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 3b025afec0a7..954042e04102 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -365,4 +365,11 @@ struct sgx_sigstruct {
  * comment!
  */
 
+#ifdef CONFIG_X86_SGX_KVM
+int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
+		     int *trapnr);
+int sgx_virt_einit(void __user *sigstruct, void __user *token,
+		   void __user *secs, u64 *lepubkeyhash, int *trapnr);
+#endif
+
 #endif /* _ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 29d8d28b4695..eaff7d72e47f 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -258,3 +258,112 @@ int __init sgx_vepc_init(void)
 
 	return misc_register(&sgx_vepc_dev);
 }
+
+/**
+ * sgx_virt_ecreate() - Run ECREATE on behalf of guest
+ * @pageinfo:	Pointer to PAGEINFO structure
+ * @secs:	Userspace pointer to SECS page
+ * @trapnr:	trap number injected to guest in case of ECREATE error
+ *
+ * Run ECREATE on behalf of guest after KVM traps ECREATE for the purpose
+ * of enforcing policies of guest's enclaves, and return the trap number
+ * which should be injected to guest in case of any ECREATE error.
+ *
+ * Return:
+ * - 0:		ECREATE was successful.
+ * - -EFAULT:	ECREATE returned error.
+ */
+int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
+		     int *trapnr)
+{
+	int ret;
+
+	/*
+	 * @secs is an untrusted, userspace-provided address.  It comes from
+	 * KVM and is assumed to be a valid pointer which points somewhere in
+	 * userspace.  This can fault and call SGX or other fault handlers when
+	 * userspace mapping @secs doesn't exist.
+	 *
+	 * Add a WARN() to make sure @secs is already valid userspace pointer
+	 * from caller (KVM), who should already have handled invalid pointer
+	 * case (for instance, made by malicious guest).  All other checks,
+	 * such as alignment of @secs, are deferred to ENCLS itself.
+	 */
+	WARN_ON_ONCE(!access_ok(secs, PAGE_SIZE));
+	__uaccess_begin();
+	ret = __ecreate(pageinfo, (void *)secs);
+	__uaccess_end();
+
+	if (encls_faulted(ret)) {
+		*trapnr = ENCLS_TRAPNR(ret);
+		return -EFAULT;
+	}
+
+	/* ECREATE doesn't return an error code, it faults or succeeds. */
+	WARN_ON_ONCE(ret);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sgx_virt_ecreate);
+
+static int __sgx_virt_einit(void __user *sigstruct, void __user *token,
+			    void __user *secs)
+{
+	int ret;
+
+	/*
+	 * Make sure all userspace pointers from caller (KVM) are valid.
+	 * All other checks deferred to ENCLS itself.  Also see comment
+	 * for @secs in sgx_virt_ecreate().
+	 */
+#define SGX_EINITTOKEN_SIZE	304
+	WARN_ON_ONCE(!access_ok(sigstruct, sizeof(struct sgx_sigstruct)) ||
+		     !access_ok(token, SGX_EINITTOKEN_SIZE) ||
+		     !access_ok(secs, PAGE_SIZE));
+	__uaccess_begin();
+	ret =  __einit((void *)sigstruct, (void *)token, (void *)secs);
+	__uaccess_end();
+
+	return ret;
+}
+
+/**
+ * sgx_virt_einit() - Run EINIT on behalf of guest
+ * @sigstruct:		Userspace pointer to SIGSTRUCT structure
+ * @token:		Userspace pointer to EINITTOKEN structure
+ * @secs:		Userspace pointer to SECS page
+ * @lepubkeyhash:	Pointer to guest's *virtual* SGX_LEPUBKEYHASH MSR values
+ * @trapnr:		trap number injected to guest in case of EINIT error
+ *
+ * Run EINIT on behalf of guest after KVM traps EINIT. If SGX_LC is available
+ * in host, SGX driver may rewrite the hardware values at wish, therefore KVM
+ * needs to update hardware values to guest's virtual MSR values in order to
+ * ensure EINIT is executed with expected hardware values.
+ *
+ * Return:
+ * - 0:		EINIT was successful.
+ * - -EFAULT:	EINIT returned error.
+ */
+int sgx_virt_einit(void __user *sigstruct, void __user *token,
+		   void __user *secs, u64 *lepubkeyhash, int *trapnr)
+{
+	int ret;
+
+	if (!boot_cpu_has(X86_FEATURE_SGX_LC)) {
+		ret = __sgx_virt_einit(sigstruct, token, secs);
+	} else {
+		preempt_disable();
+
+		sgx_update_lepubkeyhash(lepubkeyhash);
+
+		ret = __sgx_virt_einit(sigstruct, token, secs);
+		preempt_enable();
+	}
+
+	if (encls_faulted(ret)) {
+		*trapnr = ENCLS_TRAPNR(ret);
+		return -EFAULT;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sgx_virt_einit);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 14/25] x86/sgx: Move provisioning device creation out of SGX driver
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (12 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  2021-03-19  7:23 ` [PATCH v3 15/25] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX) Kai Huang
                   ` (12 subsequent siblings)
  26 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, linux-sgx, x86
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

And extract sgx_set_attribute() out of sgx_ioc_enclave_provision() and
export it as symbol for KVM to use.

Provisioning key is sensitive. SGX driver only allows to create enclave
which can access provisioning key when enclave creator has permission to
open /dev/sgx_provision.  It should apply to VM as well, as provisioning
key is platform specific, thus unrestricted VM can also potentially
compromise provisioning key.

Move provisioning device creation out of sgx_drv_init() to sgx_init() as
preparation for adding SGX virtualization support, so that even SGX
driver is not enabled due to flexible launch control is not available,
SGX virtualization can still be enabled, and use it to restrict VM's
capability of being able to access provisioning key.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/sgx.h       |  3 ++
 arch/x86/kernel/cpu/sgx/driver.c | 17 ----------
 arch/x86/kernel/cpu/sgx/ioctl.c  | 16 ++-------
 arch/x86/kernel/cpu/sgx/main.c   | 57 +++++++++++++++++++++++++++++++-
 4 files changed, 61 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 954042e04102..a16e2c9154a3 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -372,4 +372,7 @@ int sgx_virt_einit(void __user *sigstruct, void __user *token,
 		   void __user *secs, u64 *lepubkeyhash, int *trapnr);
 #endif
 
+int sgx_set_attribute(unsigned long *allowed_attributes,
+		      unsigned int attribute_fd);
+
 #endif /* _ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 8ce6d8371cfb..aa9b8b868867 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -136,10 +136,6 @@ static const struct file_operations sgx_encl_fops = {
 	.get_unmapped_area	= sgx_get_unmapped_area,
 };
 
-const struct file_operations sgx_provision_fops = {
-	.owner			= THIS_MODULE,
-};
-
 static struct miscdevice sgx_dev_enclave = {
 	.minor = MISC_DYNAMIC_MINOR,
 	.name = "sgx_enclave",
@@ -147,13 +143,6 @@ static struct miscdevice sgx_dev_enclave = {
 	.fops = &sgx_encl_fops,
 };
 
-static struct miscdevice sgx_dev_provision = {
-	.minor = MISC_DYNAMIC_MINOR,
-	.name = "sgx_provision",
-	.nodename = "sgx_provision",
-	.fops = &sgx_provision_fops,
-};
-
 int __init sgx_drv_init(void)
 {
 	unsigned int eax, ebx, ecx, edx;
@@ -187,11 +176,5 @@ int __init sgx_drv_init(void)
 	if (ret)
 		return ret;
 
-	ret = misc_register(&sgx_dev_provision);
-	if (ret) {
-		misc_deregister(&sgx_dev_enclave);
-		return ret;
-	}
-
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 2a0aed446d6c..7f573e37f4d4 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -2,6 +2,7 @@
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
 #include <asm/mman.h>
+#include <asm/sgx.h>
 #include <linux/mman.h>
 #include <linux/delay.h>
 #include <linux/file.h>
@@ -664,24 +665,11 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
 static long sgx_ioc_enclave_provision(struct sgx_encl *encl, void __user *arg)
 {
 	struct sgx_enclave_provision params;
-	struct file *file;
 
 	if (copy_from_user(&params, arg, sizeof(params)))
 		return -EFAULT;
 
-	file = fget(params.fd);
-	if (!file)
-		return -EINVAL;
-
-	if (file->f_op != &sgx_provision_fops) {
-		fput(file);
-		return -EINVAL;
-	}
-
-	encl->attributes_mask |= SGX_ATTR_PROVISIONKEY;
-
-	fput(file);
-	return 0;
+	return sgx_set_attribute(&encl->attributes_mask, params.fd);
 }
 
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index b95168427056..7105e34da530 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1,14 +1,17 @@
 // SPDX-License-Identifier: GPL-2.0
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
+#include <linux/file.h>
 #include <linux/freezer.h>
 #include <linux/highmem.h>
 #include <linux/kthread.h>
+#include <linux/miscdevice.h>
 #include <linux/pagemap.h>
 #include <linux/ratelimit.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include <asm/sgx.h>
 #include "driver.h"
 #include "encl.h"
 #include "encls.h"
@@ -744,6 +747,51 @@ void sgx_update_lepubkeyhash(u64 *lepubkeyhash)
 		wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, lepubkeyhash[i]);
 }
 
+const struct file_operations sgx_provision_fops = {
+	.owner			= THIS_MODULE,
+};
+
+static struct miscdevice sgx_dev_provision = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "sgx_provision",
+	.nodename = "sgx_provision",
+	.fops = &sgx_provision_fops,
+};
+
+/**
+ * sgx_set_attribute() - Update allowed attributes given file descriptor
+ * @allowed_attributes:		Pointer to allowed enclave attributes
+ * @attribute_fd:		File descriptor for specific attribute
+ *
+ * Append enclave attribute indicated by file descriptor to allowed
+ * attributes. Currently only SGX_ATTR_PROVISIONKEY indicated by
+ * /dev/sgx_provision is supported.
+ *
+ * Return:
+ * -0:		SGX_ATTR_PROVISIONKEY is appended to allowed_attributes
+ * -EINVAL:	Invalid, or not supported file descriptor
+ */
+int sgx_set_attribute(unsigned long *allowed_attributes,
+		      unsigned int attribute_fd)
+{
+	struct file *file;
+
+	file = fget(attribute_fd);
+	if (!file)
+		return -EINVAL;
+
+	if (file->f_op != &sgx_provision_fops) {
+		fput(file);
+		return -EINVAL;
+	}
+
+	*allowed_attributes |= SGX_ATTR_PROVISIONKEY;
+
+	fput(file);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sgx_set_attribute);
+
 static int __init sgx_init(void)
 {
 	int ret;
@@ -760,6 +808,10 @@ static int __init sgx_init(void)
 		goto err_page_cache;
 	}
 
+	ret = misc_register(&sgx_dev_provision);
+	if (ret)
+		goto err_kthread;
+
 	/*
 	 * Always try to initialize the native *and* KVM drivers.
 	 * The KVM driver is less picky than the native one and
@@ -770,10 +822,13 @@ static int __init sgx_init(void)
 	 */
 	ret = !!sgx_drv_init() & !!sgx_vepc_init();
 	if (ret)
-		goto err_kthread;
+		goto err_provision;
 
 	return 0;
 
+err_provision:
+	misc_deregister(&sgx_dev_provision);
+
 err_kthread:
 	kthread_stop(ksgxd_tsk);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 15/25] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX)
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (13 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 14/25] x86/sgx: Move provisioning device creation out of SGX driver Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-03-19  7:23 ` [PATCH v3 16/25] KVM: x86: Define new #PF SGX error code bit Kai Huang
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jmattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Export the gva_to_gpa() helpers for use by SGX virtualization when
executing ENCLS[ECREATE] and ENCLS[EINIT] on behalf of the guest.
To execute ECREATE and EINIT, KVM must obtain the GPA of the target
Secure Enclave Control Structure (SECS) in order to get its
corresponding HVA.

Because the SECS must reside in the Enclave Page Cache (EPC), copying
the SECS's data to a host-controlled buffer via existing exported
helpers is not a viable option as the EPC is not readable or writable
by the kernel.

SGX virtualization will also use gva_to_gpa() to obtain HVAs for
non-EPC pages in order to pass user pointers directly to ECREATE and
EINIT, which avoids having to copy pages worth of data into the kernel.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47e021bdcc94..d2da5abcf395 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5959,6 +5959,7 @@ gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
 	u32 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
 	return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read);
 
  gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva,
 				struct x86_exception *exception)
@@ -5975,6 +5976,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
 	access |= PFERR_WRITE_MASK;
 	return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_write);
 
 /* uses this to access any guest's mapped memory without checking CPL */
 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 16/25] KVM: x86: Define new #PF SGX error code bit
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (14 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 15/25] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX) Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-03-19  7:23 ` [PATCH v3 17/25] KVM: x86: Add support for reverse CPUID lookup of scattered features Kai Huang
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jmattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Page faults that are signaled by the SGX Enclave Page Cache Map (EPCM),
as opposed to the traditional IA32/EPT page tables, set an SGX bit in
the error code to indicate that the #PF was induced by SGX.  KVM will
need to emulate this behavior as part of its trap-and-execute scheme for
virtualizing SGX Launch Control, e.g. to inject SGX-induced #PFs if
EINIT faults in the host, and to support live migration.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9bc091ecaaeb..8b1c13056768 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -227,6 +227,7 @@ enum x86_intercept_stage;
 #define PFERR_RSVD_BIT 3
 #define PFERR_FETCH_BIT 4
 #define PFERR_PK_BIT 5
+#define PFERR_SGX_BIT 15
 #define PFERR_GUEST_FINAL_BIT 32
 #define PFERR_GUEST_PAGE_BIT 33
 
@@ -236,6 +237,7 @@ enum x86_intercept_stage;
 #define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
 #define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
 #define PFERR_PK_MASK (1U << PFERR_PK_BIT)
+#define PFERR_SGX_MASK (1U << PFERR_SGX_BIT)
 #define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
 #define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 17/25] KVM: x86: Add support for reverse CPUID lookup of scattered features
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (15 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 16/25] KVM: x86: Define new #PF SGX error code bit Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-03-19  7:23 ` [PATCH v3 18/25] KVM: x86: Add reverse-CPUID lookup support for scattered SGX features Kai Huang
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jmattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <seanjc@google.com>

Introduce a scheme that allows KVM's CPUID magic to support features
that are scattered in the kernel's feature words.  To advertise and/or
query guest support for CPUID-based features, KVM requires the bit
number of an X86_FEATURE_* to match the bit number in its associated
CPUID entry.  For scattered features, this does not hold true.

Add a framework to allow defining KVM-only words, stored in
kvm_cpu_caps after the shared kernel caps, that can be used to gather
the scattered feature bits by translating X86_FEATURE_* flags into their
KVM-defined feature.

Note, because reverse_cpuid_check() effectively forces kvm_cpu_caps
lookups to be resolved at compile time, there is no runtime cost for
translating from kernel-defined to kvm-defined features.

More details here:  https://lkml.kernel.org/r/X/jxCOLG+HUO4QlZ@google.com

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/cpuid.c | 32 +++++++++++++++++++++++++++-----
 arch/x86/kvm/cpuid.h | 39 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 6bd2f8b830e4..a0e7be9ed449 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -28,7 +28,7 @@
  * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
  * aligned to sizeof(unsigned long) because it's not accessed via bitops.
  */
-u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
+u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 EXPORT_SYMBOL_GPL(kvm_cpu_caps);
 
 static u32 xstate_required_size(u64 xstate_bv, bool compacted)
@@ -53,6 +53,7 @@ static u32 xstate_required_size(u64 xstate_bv, bool compacted)
 }
 
 #define F feature_bit
+#define SF(name) (boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0)
 
 static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
 	struct kvm_cpuid_entry2 *entries, int nent, u32 function, u32 index)
@@ -347,13 +348,13 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
 	return r;
 }
 
-static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
+/* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
+static __always_inline void __kvm_cpu_cap_mask(enum cpuid_leafs leaf)
 {
 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
 	struct kvm_cpuid_entry2 entry;
 
 	reverse_cpuid_check(leaf);
-	kvm_cpu_caps[leaf] &= mask;
 
 	cpuid_count(cpuid.function, cpuid.index,
 		    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
@@ -361,6 +362,26 @@ static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
 	kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
 }
 
+static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
+{
+	/* Use the "init" variant for scattered leafs. */
+	BUILD_BUG_ON(leaf >= NCAPINTS);
+
+	kvm_cpu_caps[leaf] &= mask;
+
+	__kvm_cpu_cap_mask(leaf);
+}
+
+static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
+{
+	/* Use the "mask" variant for hardwared-defined leafs. */
+	BUILD_BUG_ON(leaf < NCAPINTS);
+
+	kvm_cpu_caps[leaf] = mask;
+
+	__kvm_cpu_cap_mask(leaf);
+}
+
 void kvm_set_cpu_caps(void)
 {
 	unsigned int f_nx = is_efer_nx() ? F(NX) : 0;
@@ -371,12 +392,13 @@ void kvm_set_cpu_caps(void)
 	unsigned int f_gbpages = 0;
 	unsigned int f_lm = 0;
 #endif
+	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
 
-	BUILD_BUG_ON(sizeof(kvm_cpu_caps) >
+	BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
 		     sizeof(boot_cpu_data.x86_capability));
 
 	memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
-	       sizeof(kvm_cpu_caps));
+	       sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)));
 
 	kvm_cpu_cap_mask(CPUID_1_ECX,
 		/*
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 2a0c5064497f..8925a929186c 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -7,7 +7,20 @@
 #include <asm/processor.h>
 #include <uapi/asm/kvm_para.h>
 
-extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
+/*
+ * Hardware-defined CPUID leafs that are scattered in the kernel, but need to
+ * be directly used by KVM.  Note, these word values conflict with the kernel's
+ * "bug" caps, but KVM doesn't use those.
+ */
+enum kvm_only_cpuid_leafs {
+	NR_KVM_CPU_CAPS = NCAPINTS,
+
+	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
+};
+
+#define X86_KVM_FEATURE(w, f)		((w)*32 + (f))
+
+extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 void kvm_set_cpu_caps(void);
 
 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
@@ -100,6 +113,20 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
 	BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
 }
 
+/*
+ * Translate feature bits that are scattered in the kernel's cpufeatures word
+ * into KVM feature words that align with hardware's definitions.
+ */
+static __always_inline u32 __feature_translate(int x86_feature)
+{
+	return x86_feature;
+}
+
+static __always_inline u32 __feature_leaf(int x86_feature)
+{
+	return __feature_translate(x86_feature) / 32;
+}
+
 /*
  * Retrieve the bit mask from an X86_FEATURE_* definition.  Features contain
  * the hardware defined bit number (stored in bits 4:0) and a software defined
@@ -108,6 +135,8 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
  */
 static __always_inline u32 __feature_bit(int x86_feature)
 {
+	x86_feature = __feature_translate(x86_feature);
+
 	reverse_cpuid_check(x86_feature / 32);
 	return 1 << (x86_feature & 31);
 }
@@ -116,7 +145,7 @@ static __always_inline u32 __feature_bit(int x86_feature)
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
 	reverse_cpuid_check(x86_leaf);
 	return reverse_cpuid[x86_leaf];
@@ -308,7 +337,7 @@ static inline bool cpuid_fault_enabled(struct kvm_vcpu *vcpu)
 
 static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
 	reverse_cpuid_check(x86_leaf);
 	kvm_cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
@@ -316,7 +345,7 @@ static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)
 
 static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
 	reverse_cpuid_check(x86_leaf);
 	kvm_cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
@@ -324,7 +353,7 @@ static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)
 
 static __always_inline u32 kvm_cpu_cap_get(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
 	reverse_cpuid_check(x86_leaf);
 	return kvm_cpu_caps[x86_leaf] & __feature_bit(x86_feature);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 18/25] KVM: x86: Add reverse-CPUID lookup support for scattered SGX features
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (16 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 17/25] KVM: x86: Add support for reverse CPUID lookup of scattered features Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-03-19  7:23 ` [PATCH v3 19/25] KVM: VMX: Add basic handling of VM-Exit from SGX enclave Kai Huang
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jmattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <seanjc@google.com>

Define a new KVM-only feature word for advertising and querying SGX
sub-features in CPUID.0x12.0x0.EAX.  Because SGX1 and SGX2 are scattered
in the kernel's feature word, they need to be translated so that the
bit numbers match those of hardware.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/cpuid.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 8925a929186c..a175ff75bbbe 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -13,13 +13,18 @@
  * "bug" caps, but KVM doesn't use those.
  */
 enum kvm_only_cpuid_leafs {
-	NR_KVM_CPU_CAPS = NCAPINTS,
+	CPUID_12_EAX	 = NCAPINTS,
+	NR_KVM_CPU_CAPS,
 
 	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
 };
 
 #define X86_KVM_FEATURE(w, f)		((w)*32 + (f))
 
+/* Intel-defined SGX sub-features, CPUID level 0x12 (EAX). */
+#define __X86_FEATURE_SGX1		X86_KVM_FEATURE(CPUID_12_EAX, 0)
+#define __X86_FEATURE_SGX2		X86_KVM_FEATURE(CPUID_12_EAX, 1)
+
 extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 void kvm_set_cpu_caps(void);
 
@@ -93,6 +98,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
 	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
 	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
+	[CPUID_12_EAX]        = {0x00000012, 0, CPUID_EAX},
 };
 
 /*
@@ -119,6 +125,11 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
  */
 static __always_inline u32 __feature_translate(int x86_feature)
 {
+	if (x86_feature == X86_FEATURE_SGX1)
+		return __X86_FEATURE_SGX1;
+	else if (x86_feature == X86_FEATURE_SGX2)
+		return __X86_FEATURE_SGX2;
+
 	return x86_feature;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 19/25] KVM: VMX: Add basic handling of VM-Exit from SGX enclave
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (17 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 18/25] KVM: x86: Add reverse-CPUID lookup support for scattered SGX features Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-03-19  7:23 ` [PATCH v3 20/25] KVM: VMX: Frame in ENCLS handler for SGX virtualization Kai Huang
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jmattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add support for handling VM-Exits that originate from a guest SGX
enclave.  In SGX, an "enclave" is a new CPL3-only execution environment,
wherein the CPU and memory state is protected by hardware to make the
state inaccesible to code running outside of the enclave.  When exiting
an enclave due to an asynchronous event (from the perspective of the
enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state
is automatically saved and scrubbed (the CPU loads synthetic state), and
then reloaded when re-entering the enclave.  E.g. after an instruction
based VM-Exit from an enclave, vmcs.GUEST_RIP will not contain the RIP
of the enclave instruction that trigered VM-Exit, but will instead point
to a RIP in the enclave's untrusted runtime (the guest userspace code
that coordinates entry/exit to/from the enclave).

To help a VMM recognize and handle exits from enclaves, SGX adds bits to
existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and
GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR.  Define the
new architectural bits, and add a boolean to struct vcpu_vmx to cache
VMX_EXIT_REASON_FROM_ENCLAVE.  Clear the bit in exit_reason so that
checks against exit_reason do not need to account for SGX, e.g.
"if (exit_reason == EXIT_REASON_EXCEPTION_NMI)" continues to work.

KVM is a largely a passive observer of the new bits, e.g. KVM needs to
account for the bits when propagating information to a nested VMM, but
otherwise doesn't need to act differently for the majority of VM-Exits
from enclaves.

The one scenario that is directly impacted is emulation, which is for
all intents and purposes impossible[1] since KVM does not have access to
the RIP or instruction stream that triggered the VM-Exit.  The inability
to emulate is a non-issue for KVM, as most instructions that might
trigger VM-Exit unconditionally #UD in an enclave (before the VM-Exit
check.  For the few instruction that conditionally #UD, KVM either never
sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only
if the feature is not exposed to the guest in order to inject a #UD,
e.g. RDRAND_EXITING.

But, because it is still possible for a guest to trigger emulation,
e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit
from an enclave.  This is architecturally accurate for instruction
VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable
to killing the VM.  In practice, only broken or particularly stupid
guests should ever encounter this behavior.

Add a WARN in skip_emulated_instruction to detect any attempt to
modify the guest's RIP during an SGX enclave VM-Exit as all such flows
should either be unreachable or must handle exits from enclaves before
getting to skip_emulated_instruction.

[1] Impossible for all practical purposes.  Not truly impossible
    since KVM could implement some form of para-virtualization scheme.

[2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at
    CPL3, so we also don't need to worry about that interaction.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/vmx.h      |  1 +
 arch/x86/include/uapi/asm/vmx.h |  1 +
 arch/x86/kvm/vmx/nested.c       |  2 ++
 arch/x86/kvm/vmx/vmx.c          | 45 +++++++++++++++++++++++++++++++--
 4 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 358707f60d99..0ffaa3156a4e 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -373,6 +373,7 @@ enum vmcs_field {
 #define GUEST_INTR_STATE_MOV_SS		0x00000002
 #define GUEST_INTR_STATE_SMI		0x00000004
 #define GUEST_INTR_STATE_NMI		0x00000008
+#define GUEST_INTR_STATE_ENCLAVE_INTR	0x00000010
 
 /* GUEST_ACTIVITY_STATE flags */
 #define GUEST_ACTIVITY_ACTIVE		0
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index b8e650a985e3..946d761adbd3 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -27,6 +27,7 @@
 
 
 #define VMX_EXIT_REASONS_FAILED_VMENTRY         0x80000000
+#define VMX_EXIT_REASONS_SGX_ENCLAVE_MODE	0x08000000
 
 #define EXIT_REASON_EXCEPTION_NMI       0
 #define EXIT_REASON_EXTERNAL_INTERRUPT  1
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bcca0b80e0d0..28848e9f70e2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4105,6 +4105,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 {
 	/* update exit information fields: */
 	vmcs12->vm_exit_reason = vm_exit_reason;
+	if (to_vmx(vcpu)->exit_reason.enclave_mode)
+		vmcs12->vm_exit_reason |= VMX_EXIT_REASONS_SGX_ENCLAVE_MODE;
 	vmcs12->exit_qualification = exit_qualification;
 	vmcs12->vm_exit_intr_info = exit_intr_info;
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 32cf8287d4a7..9dd185a53a3e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1570,12 +1570,25 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 
 static bool vmx_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn, int insn_len)
 {
+	/*
+	 * Emulation of instructions in SGX enclaves is impossible as RIP does
+	 * not point  tthe failing instruction, and even if it did, the code
+	 * stream is inaccessible.  Inject #UD instead of exiting to userspace
+	 * so that guest userspace can't DoS the guest simply by triggering
+	 * emulation (enclaves are CPL3 only).
+	 */
+	if (to_vmx(vcpu)->exit_reason.enclave_mode) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return false;
+	}
 	return true;
 }
 
 static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
+	union vmx_exit_reason exit_reason = to_vmx(vcpu)->exit_reason;
 	unsigned long rip, orig_rip;
+	u32 instr_len;
 
 	/*
 	 * Using VMCS.VM_EXIT_INSTRUCTION_LEN on EPT misconfig depends on
@@ -1586,9 +1599,33 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	 * i.e. we end up advancing IP with some random value.
 	 */
 	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
-	    to_vmx(vcpu)->exit_reason.basic != EXIT_REASON_EPT_MISCONFIG) {
+	    exit_reason.basic != EXIT_REASON_EPT_MISCONFIG) {
+		instr_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+
+		/*
+		 * Emulating an enclave's instructions isn't supported as KVM
+		 * cannot access the enclave's memory or its true RIP, e.g. the
+		 * vmcs.GUEST_RIP points at the exit point of the enclave, not
+		 * the RIP that actually triggered the VM-Exit.  But, because
+		 * most instructions that cause VM-Exit will #UD in an enclave,
+		 * most instruction-based VM-Exits simply do not occur.
+		 *
+		 * There are a few exceptions, notably the debug instructions
+		 * INT1ICEBRK and INT3, as they are allowed in debug enclaves
+		 * and generate #DB/#BP as expected, which KVM might intercept.
+		 * But again, the CPU does the dirty work and saves an instr
+		 * length of zero so VMMs don't shoot themselves in the foot.
+		 * WARN if KVM tries to skip a non-zero length instruction on
+		 * a VM-Exit from an enclave.
+		 */
+		if (!instr_len)
+			goto rip_updated;
+
+		WARN(exit_reason.enclave_mode,
+		     "KVM: skipping instruction after SGX enclave VM-Exit");
+
 		orig_rip = kvm_rip_read(vcpu);
-		rip = orig_rip + vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+		rip = orig_rip + instr_len;
 #ifdef CONFIG_X86_64
 		/*
 		 * We need to mask out the high 32 bits of RIP if not in 64-bit
@@ -1604,6 +1641,7 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 			return 0;
 	}
 
+rip_updated:
 	/* skipping an emulated instruction also counts */
 	vmx_set_interrupt_shadow(vcpu, 0);
 
@@ -5384,6 +5422,9 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 {
 	gpa_t gpa;
 
+	if (!vmx_can_emulate_instruction(vcpu, NULL, 0))
+		return 1;
+
 	/*
 	 * A nested guest cannot optimize MMIO vmexits, because we have an
 	 * nGPA here instead of the required GPA.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 20/25] KVM: VMX: Frame in ENCLS handler for SGX virtualization
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (18 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 19/25] KVM: VMX: Add basic handling of VM-Exit from SGX enclave Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-03-19  7:23 ` [PATCH v3 21/25] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions Kai Huang
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jmattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Introduce sgx.c and sgx.h, along with the framework for handling ENCLS
VM-Exits.  Add a bool, enable_sgx, that will eventually be wired up to a
module param to control whether or not SGX virtualization is enabled at
runtime.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/Makefile  |  2 ++
 arch/x86/kvm/vmx/sgx.c | 50 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/sgx.h | 15 +++++++++++++
 arch/x86/kvm/vmx/vmx.c |  9 +++++---
 4 files changed, 73 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/sgx.c
 create mode 100644 arch/x86/kvm/vmx/sgx.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 1b4766fe1de2..87f514c36eae 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -23,6 +23,8 @@ kvm-$(CONFIG_KVM_XEN)	+= xen.o
 
 kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
 			   vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
+kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
+
 kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
 
 obj-$(CONFIG_KVM)	+= kvm.o
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
new file mode 100644
index 000000000000..f68adbe38750
--- /dev/null
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  Copyright(c) 2021 Intel Corporation. */
+
+#include <asm/sgx.h>
+
+#include "cpuid.h"
+#include "kvm_cache_regs.h"
+#include "sgx.h"
+#include "vmx.h"
+#include "x86.h"
+
+bool __read_mostly enable_sgx;
+
+static inline bool encls_leaf_enabled_in_guest(struct kvm_vcpu *vcpu, u32 leaf)
+{
+	if (!enable_sgx || !guest_cpuid_has(vcpu, X86_FEATURE_SGX))
+		return false;
+
+	if (leaf >= ECREATE && leaf <= ETRACK)
+		return guest_cpuid_has(vcpu, X86_FEATURE_SGX1);
+
+	if (leaf >= EAUG && leaf <= EMODT)
+		return guest_cpuid_has(vcpu, X86_FEATURE_SGX2);
+
+	return false;
+}
+
+static inline bool sgx_enabled_in_guest_bios(struct kvm_vcpu *vcpu)
+{
+	const u64 bits = FEAT_CTL_SGX_ENABLED | FEAT_CTL_LOCKED;
+
+	return (to_vmx(vcpu)->msr_ia32_feature_control & bits) == bits;
+}
+
+int handle_encls(struct kvm_vcpu *vcpu)
+{
+	u32 leaf = (u32)vcpu->arch.regs[VCPU_REGS_RAX];
+
+	if (!encls_leaf_enabled_in_guest(vcpu, leaf)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+	} else if (!sgx_enabled_in_guest_bios(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+	} else {
+		WARN(1, "KVM: unexpected exit on ENCLS[%u]", leaf);
+		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+		vcpu->run->hw.hardware_exit_reason = EXIT_REASON_ENCLS;
+		return 0;
+	}
+	return 1;
+}
diff --git a/arch/x86/kvm/vmx/sgx.h b/arch/x86/kvm/vmx/sgx.h
new file mode 100644
index 000000000000..6e17ecd4aca3
--- /dev/null
+++ b/arch/x86/kvm/vmx/sgx.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_SGX_H
+#define __KVM_X86_SGX_H
+
+#include <linux/kvm_host.h>
+
+#ifdef CONFIG_X86_SGX_KVM
+extern bool __read_mostly enable_sgx;
+
+int handle_encls(struct kvm_vcpu *vcpu);
+#else
+#define enable_sgx 0
+#endif
+
+#endif /* __KVM_X86_SGX_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9dd185a53a3e..ef668047a8f9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -57,6 +57,7 @@
 #include "mmu.h"
 #include "nested.h"
 #include "pmu.h"
+#include "sgx.h"
 #include "trace.h"
 #include "vmcs.h"
 #include "vmcs12.h"
@@ -5673,16 +5674,18 @@ static int handle_vmx_instruction(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+#ifndef CONFIG_X86_SGX_KVM
 static int handle_encls(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * SGX virtualization is not yet supported.  There is no software
-	 * enable bit for SGX, so we have to trap ENCLS and inject a #UD
-	 * to prevent the guest from executing ENCLS.
+	 * SGX virtualization is disabled.  There is no software enable bit for
+	 * SGX, so KVM intercepts all ENCLS leafs and injects a #UD to prevent
+	 * the guest from executing ENCLS (when SGX is supported by hardware).
 	 */
 	kvm_queue_exception(vcpu, UD_VECTOR);
 	return 1;
 }
+#endif /* CONFIG_X86_SGX_KVM */
 
 static int handle_bus_lock_vmexit(struct kvm_vcpu *vcpu)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 21/25] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (19 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 20/25] KVM: VMX: Frame in ENCLS handler for SGX virtualization Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-03-19  7:23 ` [PATCH v3 22/25] KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs Kai Huang
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jmattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add an ECREATE handler that will be used to intercept ECREATE for the
purpose of enforcing and enclave's MISCSELECT, ATTRIBUTES and XFRM, i.e.
to allow userspace to restrict SGX features via CPUID.  ECREATE will be
intercepted when any of the aforementioned masks diverges from hardware
in order to enforce the desired CPUID model, i.e. inject #GP if the
guest attempts to set a bit that hasn't been enumerated as allowed-1 in
CPUID.

Note, access to the PROVISIONKEY is not yet supported.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |   3 +
 arch/x86/kvm/vmx/sgx.c          | 263 ++++++++++++++++++++++++++++++++
 2 files changed, 266 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8b1c13056768..d6329ede0198 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1038,6 +1038,9 @@ struct kvm_arch {
 
 	bool bus_lock_detection_enabled;
 
+	/* Guest can access the SGX PROVISIONKEY. */
+	bool sgx_provisioning_allowed;
+
 	struct kvm_pmu_event_filter __rcu *pmu_event_filter;
 	struct task_struct *nx_lpage_recovery_thread;
 
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index f68adbe38750..cb7cc6174a84 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -11,6 +11,267 @@
 
 bool __read_mostly enable_sgx;
 
+/*
+ * ENCLS's memory operands use a fixed segment (DS) and a fixed
+ * address size based on the mode.  Related prefixes are ignored.
+ */
+static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
+			     int size, int alignment, gva_t *gva)
+{
+	struct kvm_segment s;
+	bool fault;
+
+	/* Skip vmcs.GUEST_DS retrieval for 64-bit mode to avoid VMREADs. */
+	*gva = offset;
+	if (!is_long_mode(vcpu)) {
+		vmx_get_segment(vcpu, &s, VCPU_SREG_DS);
+		*gva += s.base;
+	}
+
+	if (!IS_ALIGNED(*gva, alignment)) {
+		fault = true;
+	} else if (likely(is_long_mode(vcpu))) {
+		fault = is_noncanonical_address(*gva, vcpu);
+	} else {
+		*gva &= 0xffffffff;
+		fault = (s.unusable) ||
+			(s.type != 2 && s.type != 3) ||
+			(*gva > s.limit) ||
+			((s.base != 0 || s.limit != 0xffffffff) &&
+			(((u64)*gva + size - 1) > s.limit + 1));
+	}
+	if (fault)
+		kvm_inject_gp(vcpu, 0);
+	return fault ? -EINVAL : 0;
+}
+
+static void sgx_handle_emulation_failure(struct kvm_vcpu *vcpu, u64 addr,
+					 unsigned int size)
+{
+	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+	vcpu->run->internal.ndata = 2;
+	vcpu->run->internal.data[0] = addr;
+	vcpu->run->internal.data[1] = size;
+}
+
+static int sgx_read_hva(struct kvm_vcpu *vcpu, unsigned long hva, void *data,
+			unsigned int size)
+{
+	if (__copy_from_user(data, (void __user *)hva, size)) {
+		sgx_handle_emulation_failure(vcpu, hva, size);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int sgx_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t gva, bool write,
+			  gpa_t *gpa)
+{
+	struct x86_exception ex;
+
+	if (write)
+		*gpa = kvm_mmu_gva_to_gpa_write(vcpu, gva, &ex);
+	else
+		*gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, &ex);
+
+	if (*gpa == UNMAPPED_GVA) {
+		kvm_inject_emulated_page_fault(vcpu, &ex);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int sgx_gpa_to_hva(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long *hva)
+{
+	*hva = kvm_vcpu_gfn_to_hva(vcpu, PFN_DOWN(gpa));
+	if (kvm_is_error_hva(*hva)) {
+		sgx_handle_emulation_failure(vcpu, gpa, 1);
+		return -EFAULT;
+	}
+
+	*hva |= gpa & ~PAGE_MASK;
+
+	return 0;
+}
+
+static int sgx_inject_fault(struct kvm_vcpu *vcpu, gva_t gva, int trapnr)
+{
+	struct x86_exception ex;
+
+	/*
+	 * A non-EPCM #PF indicates a bad userspace HVA.  This *should* check
+	 * for PFEC.SGX and not assume any #PF on SGX2 originated in the EPC,
+	 * but the error code isn't (yet) plumbed through the ENCLS helpers.
+	 */
+	if (trapnr == PF_VECTOR && !boot_cpu_has(X86_FEATURE_SGX2)) {
+		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+		vcpu->run->internal.ndata = 0;
+		return 0;
+	}
+
+	/*
+	 * If the guest thinks it's running on SGX2 hardware, inject an SGX
+	 * #PF if the fault matches an EPCM fault signature (#GP on SGX1,
+	 * #PF on SGX2).  The assumption is that EPCM faults are much more
+	 * likely than a bad userspace address.
+	 */
+	if ((trapnr == PF_VECTOR || !boot_cpu_has(X86_FEATURE_SGX2)) &&
+	    guest_cpuid_has(vcpu, X86_FEATURE_SGX2)) {
+		memset(&ex, 0, sizeof(ex));
+		ex.vector = PF_VECTOR;
+		ex.error_code = PFERR_PRESENT_MASK | PFERR_WRITE_MASK |
+				PFERR_SGX_MASK;
+		ex.address = gva;
+		ex.error_code_valid = true;
+		ex.nested_page_fault = false;
+		kvm_inject_page_fault(vcpu, &ex);
+	} else {
+		kvm_inject_gp(vcpu, 0);
+	}
+	return 1;
+}
+
+static int __handle_encls_ecreate(struct kvm_vcpu *vcpu,
+				  struct sgx_pageinfo *pageinfo,
+				  unsigned long secs_hva,
+				  gva_t secs_gva)
+{
+	struct sgx_secs *contents = (struct sgx_secs *)pageinfo->contents;
+	struct kvm_cpuid_entry2 *sgx_12_0, *sgx_12_1;
+	u64 attributes, xfrm, size;
+	u32 miscselect;
+	u8 max_size_log2;
+	int trapnr;
+
+	sgx_12_0 = kvm_find_cpuid_entry(vcpu, 0x12, 0);
+	sgx_12_1 = kvm_find_cpuid_entry(vcpu, 0x12, 1);
+	if (!sgx_12_0 || !sgx_12_1) {
+		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+		vcpu->run->internal.ndata = 0;
+		return 0;
+	}
+
+	miscselect = contents->miscselect;
+	attributes = contents->attributes;
+	xfrm = contents->xfrm;
+	size = contents->size;
+
+	/* Enforce restriction of access to the PROVISIONKEY. */
+	if (!vcpu->kvm->arch.sgx_provisioning_allowed &&
+	    (attributes & SGX_ATTR_PROVISIONKEY)) {
+		if (sgx_12_1->eax & SGX_ATTR_PROVISIONKEY)
+			pr_warn_once("KVM: SGX PROVISIONKEY advertised but not allowed\n");
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	/* Enforce CPUID restrictions on MISCSELECT, ATTRIBUTES and XFRM. */
+	if ((u32)miscselect & ~sgx_12_0->ebx ||
+	    (u32)attributes & ~sgx_12_1->eax ||
+	    (u32)(attributes >> 32) & ~sgx_12_1->ebx ||
+	    (u32)xfrm & ~sgx_12_1->ecx ||
+	    (u32)(xfrm >> 32) & ~sgx_12_1->edx) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	/* Enforce CPUID restriction on max enclave size. */
+	max_size_log2 = (attributes & SGX_ATTR_MODE64BIT) ? sgx_12_0->edx >> 8 :
+							    sgx_12_0->edx;
+	if (size >= BIT_ULL(max_size_log2))
+		kvm_inject_gp(vcpu, 0);
+
+	if (sgx_virt_ecreate(pageinfo, (void __user *)secs_hva, &trapnr))
+		return sgx_inject_fault(vcpu, secs_gva, trapnr);
+
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
+static int handle_encls_ecreate(struct kvm_vcpu *vcpu)
+{
+	gva_t pageinfo_gva, secs_gva;
+	gva_t metadata_gva, contents_gva;
+	gpa_t metadata_gpa, contents_gpa, secs_gpa;
+	unsigned long metadata_hva, contents_hva, secs_hva;
+	struct sgx_pageinfo pageinfo;
+	struct sgx_secs *contents;
+	struct x86_exception ex;
+	int r;
+
+	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 32, 32, &pageinfo_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva))
+		return 1;
+
+	/*
+	 * Copy the PAGEINFO to local memory, its pointers need to be
+	 * translated, i.e. we need to do a deep copy/translate.
+	 */
+	r = kvm_read_guest_virt(vcpu, pageinfo_gva, &pageinfo,
+				sizeof(pageinfo), &ex);
+	if (r == X86EMUL_PROPAGATE_FAULT) {
+		kvm_inject_emulated_page_fault(vcpu, &ex);
+		return 1;
+	} else if (r != X86EMUL_CONTINUE) {
+		sgx_handle_emulation_failure(vcpu, pageinfo_gva,
+					     sizeof(pageinfo));
+		return 0;
+	}
+
+	if (sgx_get_encls_gva(vcpu, pageinfo.metadata, 64, 64, &metadata_gva) ||
+	    sgx_get_encls_gva(vcpu, pageinfo.contents, 4096, 4096,
+			      &contents_gva))
+		return 1;
+
+	/*
+	 * Translate the SECINFO, SOURCE and SECS pointers from GVA to GPA.
+	 * Resume the guest on failure to inject a #PF.
+	 */
+	if (sgx_gva_to_gpa(vcpu, metadata_gva, false, &metadata_gpa) ||
+	    sgx_gva_to_gpa(vcpu, contents_gva, false, &contents_gpa) ||
+	    sgx_gva_to_gpa(vcpu, secs_gva, true, &secs_gpa))
+		return 1;
+
+	/*
+	 * ...and then to HVA.  The order of accesses isn't architectural, i.e.
+	 * KVM doesn't have to fully process one address at a time.  Exit to
+	 * userspace if a GPA is invalid.
+	 */
+	if (sgx_gpa_to_hva(vcpu, metadata_gpa, &metadata_hva) ||
+	    sgx_gpa_to_hva(vcpu, contents_gpa, &contents_hva) ||
+	    sgx_gpa_to_hva(vcpu, secs_gpa, &secs_hva))
+		return 0;
+
+	/*
+	 * Copy contents into kernel memory to prevent TOCTOU attack. E.g. the
+	 * guest could do ECREATE w/ SECS.SGX_ATTR_PROVISIONKEY=0, and
+	 * simultaneously set SGX_ATTR_PROVISIONKEY to bypass the check to
+	 * enforce restriction of access to the PROVISIONKEY.
+	 */
+	contents = (struct sgx_secs *)__get_free_page(GFP_KERNEL);
+	if (!contents)
+		return -ENOMEM;
+
+	/* Exit to userspace if copying from a host userspace address fails. */
+	if (sgx_read_hva(vcpu, contents_hva, (void *)contents, PAGE_SIZE)) {
+		free_page((unsigned long)contents);
+		return 0;
+	}
+
+	pageinfo.metadata = metadata_hva;
+	pageinfo.contents = (u64)contents;
+
+	r = __handle_encls_ecreate(vcpu, &pageinfo, secs_hva, secs_gva);
+
+	free_page((unsigned long)contents);
+
+	return r;
+}
+
 static inline bool encls_leaf_enabled_in_guest(struct kvm_vcpu *vcpu, u32 leaf)
 {
 	if (!enable_sgx || !guest_cpuid_has(vcpu, X86_FEATURE_SGX))
@@ -41,6 +302,8 @@ int handle_encls(struct kvm_vcpu *vcpu)
 	} else if (!sgx_enabled_in_guest_bios(vcpu)) {
 		kvm_inject_gp(vcpu, 0);
 	} else {
+		if (leaf == ECREATE)
+			return handle_encls_ecreate(vcpu);
 		WARN(1, "KVM: unexpected exit on ENCLS[%u]", leaf);
 		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
 		vcpu->run->hw.hardware_exit_reason = EXIT_REASON_ENCLS;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 22/25] KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (20 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 21/25] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-03-19  7:23 ` [PATCH v3 23/25] KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC) Kai Huang
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jmattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Emulate the four Launch Enclave public key hash MSRs (LE hash MSRs) that
exist on CPUs that support SGX Launch Control (LC).  SGX LC modifies the
behavior of ENCLS[EINIT] to use the LE hash MSRs when verifying the key
used to sign an enclave.  On CPUs without LC support, the LE hash is
hardwired into the CPU to an Intel controlled key (the Intel key is also
the reset value of the LE hash MSRs). Track the guest's desired hash so
that a future patch can stuff the hash into the hardware MSRs when
executing EINIT on behalf of the guest, when those MSRs are writable in
host.

Note, KVM allows writes to the LE hash MSRs if IA32_FEATURE_CONTROL is
unlocked.  This is technically not architectural behavior, but it's
roughly equivalent to the arch behavior of the MSRs being writable prior
to activating SGX[1].  Emulating SGX activation is feasible, but adds no
tangible benefits and would just create extra work for KVM and guest
firmware.

[1] SGX related bits in IA32_FEATURE_CONTROL cannot be set until SGX
    is activated, e.g. by firmware.  SGX activation is triggered by
    setting bit 0 in MSR 0x7a.  Until SGX is activated, the LE hash
    MSRs are writable, e.g. to allow firmware to lock down the LE
    root key with a non-Intel value.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/vmx/sgx.c | 35 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/sgx.h |  6 ++++++
 arch/x86/kvm/vmx/vmx.c | 20 ++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h |  2 ++
 4 files changed, 63 insertions(+)

diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index cb7cc6174a84..be0429c3c04a 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -11,6 +11,9 @@
 
 bool __read_mostly enable_sgx;
 
+/* Initial value of guest's virtual SGX_LEPUBKEYHASHn MSRs */
+static u64 sgx_pubkey_hash[4] __ro_after_init;
+
 /*
  * ENCLS's memory operands use a fixed segment (DS) and a fixed
  * address size based on the mode.  Related prefixes are ignored.
@@ -311,3 +314,35 @@ int handle_encls(struct kvm_vcpu *vcpu)
 	}
 	return 1;
 }
+
+void setup_default_sgx_lepubkeyhash(void)
+{
+	/*
+	 * Use Intel's default value for Skylake hardware if Launch Control is
+	 * not supported, i.e. Intel's hash is hardcoded into silicon, or if
+	 * Launch Control is supported and enabled, i.e. mimic the reset value
+	 * and let the guest write the MSRs at will.  If Launch Control is
+	 * supported but disabled, then use the current MSR values as the hash
+	 * MSRs exist but are read-only (locked and not writable).
+	 */
+	if (!enable_sgx || boot_cpu_has(X86_FEATURE_SGX_LC) ||
+	    rdmsrl_safe(MSR_IA32_SGXLEPUBKEYHASH0, &sgx_pubkey_hash[0])) {
+		sgx_pubkey_hash[0] = 0xa6053e051270b7acULL;
+		sgx_pubkey_hash[1] = 0x6cfbe8ba8b3b413dULL;
+		sgx_pubkey_hash[2] = 0xc4916d99f2b3735dULL;
+		sgx_pubkey_hash[3] = 0xd4f8c05909f9bb3bULL;
+	} else {
+		/* MSR_IA32_SGXLEPUBKEYHASH0 is read above */
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH1, sgx_pubkey_hash[1]);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH2, sgx_pubkey_hash[2]);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH3, sgx_pubkey_hash[3]);
+	}
+}
+
+void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	memcpy(vmx->msr_ia32_sgxlepubkeyhash, sgx_pubkey_hash,
+	       sizeof(sgx_pubkey_hash));
+}
diff --git a/arch/x86/kvm/vmx/sgx.h b/arch/x86/kvm/vmx/sgx.h
index 6e17ecd4aca3..6502fa52c7e9 100644
--- a/arch/x86/kvm/vmx/sgx.h
+++ b/arch/x86/kvm/vmx/sgx.h
@@ -8,8 +8,14 @@
 extern bool __read_mostly enable_sgx;
 
 int handle_encls(struct kvm_vcpu *vcpu);
+
+void setup_default_sgx_lepubkeyhash(void);
+void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu);
 #else
 #define enable_sgx 0
+
+static inline void setup_default_sgx_lepubkeyhash(void) { }
+static inline void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu) { }
 #endif
 
 #endif /* __KVM_X86_SGX_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef668047a8f9..070460df5c3b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1904,6 +1904,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_FEAT_CTL:
 		msr_info->data = vmx->msr_ia32_feature_control;
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
+			return 1;
+		msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
+			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!nested_vmx_allowed(vcpu))
 			return 1;
@@ -2198,6 +2205,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (msr_info->host_initiated && data == 0)
 			vmx_leave_nested(vcpu);
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		if (!msr_info->host_initiated &&
+		    (!guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC) ||
+		    ((vmx->msr_ia32_feature_control & FEAT_CTL_LOCKED) &&
+		    !(vmx->msr_ia32_feature_control & FEAT_CTL_SGX_LC_ENABLED))))
+			return 1;
+		vmx->msr_ia32_sgxlepubkeyhash
+			[msr_index - MSR_IA32_SGXLEPUBKEYHASH0] = data;
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!msr_info->host_initiated)
 			return 1; /* they are read-only */
@@ -7020,6 +7036,8 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
 	else
 		memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs));
 
+	vcpu_setup_sgx_lepubkeyhash(vcpu);
+
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = -1ull;
 
@@ -7953,6 +7971,8 @@ static __init int hardware_setup(void)
 	if (!enable_ept || !cpu_has_vmx_intel_pt())
 		pt_mode = PT_MODE_SYSTEM;
 
+	setup_default_sgx_lepubkeyhash();
+
 	if (nested) {
 		nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
 					   vmx_capability.ept);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 89da5e1251f1..d0bf078b1087 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -325,6 +325,8 @@ struct vcpu_vmx {
 	 */
 	u64 msr_ia32_feature_control;
 	u64 msr_ia32_feature_control_valid_bits;
+	/* SGX Launch Control public key hash */
+	u64 msr_ia32_sgxlepubkeyhash[4];
 	u64 ept_pointer;
 
 	struct pt_desc pt_desc;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 23/25] KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (21 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 22/25] KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-03-19  7:23 ` [PATCH v3 24/25] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC Kai Huang
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jmattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a VM-Exit handler to trap-and-execute EINIT when SGX LC is enabled
in the host.  When SGX LC is enabled, the host kernel may rewrite the
hardware values at will, e.g. to launch enclaves with different signers,
thus KVM needs to intercept EINIT to ensure it is executed with the
correct LE hash (even if the guest sees a hardwired hash).

Switching the LE hash MSRs on VM-Enter/VM-Exit is not a viable option as
writing the MSRs is prohibitively expensive, e.g. on SKL hardware each
WRMSR is ~400 cycles.  And because EINIT takes tens of thousands of
cycles to execute, the ~1500 cycle overhead to trap-and-execute EINIT is
unlikely to be noticed by the guest, let alone impact its overall SGX
performance.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/vmx/sgx.c | 55 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index be0429c3c04a..9f0149b9d53f 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -275,6 +275,59 @@ static int handle_encls_ecreate(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+static int handle_encls_einit(struct kvm_vcpu *vcpu)
+{
+	unsigned long sig_hva, secs_hva, token_hva, rflags;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	gva_t sig_gva, secs_gva, token_gva;
+	gpa_t sig_gpa, secs_gpa, token_gpa;
+	int ret, trapnr;
+
+	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 1808, 4096, &sig_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_rdx_read(vcpu), 304, 512, &token_gva))
+		return 1;
+
+	/*
+	 * Translate the SIGSTRUCT, SECS and TOKEN pointers from GVA to GPA.
+	 * Resume the guest on failure to inject a #PF.
+	 */
+	if (sgx_gva_to_gpa(vcpu, sig_gva, false, &sig_gpa) ||
+	    sgx_gva_to_gpa(vcpu, secs_gva, true, &secs_gpa) ||
+	    sgx_gva_to_gpa(vcpu, token_gva, false, &token_gpa))
+		return 1;
+
+	/*
+	 * ...and then to HVA.  The order of accesses isn't architectural, i.e.
+	 * KVM doesn't have to fully process one address at a time.  Exit to
+	 * userspace if a GPA is invalid.  Note, all structures are aligned and
+	 * cannot split pages.
+	 */
+	if (sgx_gpa_to_hva(vcpu, sig_gpa, &sig_hva) ||
+	    sgx_gpa_to_hva(vcpu, secs_gpa, &secs_hva) ||
+	    sgx_gpa_to_hva(vcpu, token_gpa, &token_hva))
+		return 0;
+
+	ret = sgx_virt_einit((void __user *)sig_hva, (void __user *)token_hva,
+			     (void __user *)secs_hva,
+			     vmx->msr_ia32_sgxlepubkeyhash, &trapnr);
+
+	if (ret == -EFAULT)
+		return sgx_inject_fault(vcpu, secs_gva, trapnr);
+
+	rflags = vmx_get_rflags(vcpu) & ~(X86_EFLAGS_CF | X86_EFLAGS_PF |
+					  X86_EFLAGS_AF | X86_EFLAGS_SF |
+					  X86_EFLAGS_OF);
+	if (ret)
+		rflags |= X86_EFLAGS_ZF;
+	else
+		rflags &= ~X86_EFLAGS_ZF;
+	vmx_set_rflags(vcpu, rflags);
+
+	kvm_rax_write(vcpu, ret);
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
 static inline bool encls_leaf_enabled_in_guest(struct kvm_vcpu *vcpu, u32 leaf)
 {
 	if (!enable_sgx || !guest_cpuid_has(vcpu, X86_FEATURE_SGX))
@@ -307,6 +360,8 @@ int handle_encls(struct kvm_vcpu *vcpu)
 	} else {
 		if (leaf == ECREATE)
 			return handle_encls_ecreate(vcpu);
+		if (leaf == EINIT)
+			return handle_encls_einit(vcpu);
 		WARN(1, "KVM: unexpected exit on ENCLS[%u]", leaf);
 		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
 		vcpu->run->hw.hardware_exit_reason = EXIT_REASON_ENCLS;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 24/25] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (22 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 23/25] KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC) Kai Huang
@ 2021-03-19  7:23 ` Kai Huang
  2021-03-19  7:24 ` [PATCH v3 25/25] KVM: x86: Add capability to grant VM access to privileged SGX attribute Kai Huang
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:23 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jmattson, joro, vkuznets, wanpengli, Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Enable SGX virtualization now that KVM has the VM-Exit handlers needed
to trap-and-execute ENCLS to ensure correctness and/or enforce the CPU
model exposed to the guest.  Add a KVM module param, "sgx", to allow an
admin to disable SGX virtualization independent of the kernel.

When supported in hardware and the kernel, advertise SGX1, SGX2 and SGX
LC to userspace via CPUID and wire up the ENCLS_EXITING bitmap based on
the guest's SGX capabilities, i.e. to allow ENCLS to be executed in an
SGX-enabled guest.  With the exception of the provision key, all SGX
attribute bits may be exposed to the guest.  Guest access to the
provision key, which is controlled via securityfs, will be added in a
future patch.

Note, KVM does not yet support exposing ENCLS_C leafs or ENCLV leafs.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/cpuid.c      | 57 +++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/nested.c | 26 +++++++++++--
 arch/x86/kvm/vmx/nested.h |  5 +++
 arch/x86/kvm/vmx/sgx.c    | 80 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/sgx.h    | 13 +++++++
 arch/x86/kvm/vmx/vmcs12.c |  1 +
 arch/x86/kvm/vmx/vmcs12.h |  4 +-
 arch/x86/kvm/vmx/vmx.c    | 35 ++++++++++++++++-
 8 files changed, 212 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a0e7be9ed449..a0d45607b702 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -18,6 +18,7 @@
 #include <asm/processor.h>
 #include <asm/user.h>
 #include <asm/fpu/xstate.h>
+#include <asm/sgx.h>
 #include "cpuid.h"
 #include "lapic.h"
 #include "mmu.h"
@@ -171,6 +172,21 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		vcpu->arch.guest_supported_xcr0 =
 			(best->eax | ((u64)best->edx << 32)) & supported_xcr0;
 
+	/*
+	 * Bits 127:0 of the allowed SECS.ATTRIBUTES (CPUID.0x12.0x1) enumerate
+	 * the supported XSAVE Feature Request Mask (XFRM), i.e. the enclave's
+	 * requested XCR0 value.  The enclave's XFRM must be a subset of XCRO
+	 * at the time of EENTER, thus adjust the allowed XFRM by the guest's
+	 * supported XCR0.  Similar to XCR0 handling, FP and SSE are forced to
+	 * '1' even on CPUs that don't support XSAVE.
+	 */
+	best = kvm_find_cpuid_entry(vcpu, 0x12, 0x1);
+	if (best) {
+		best->ecx &= vcpu->arch.guest_supported_xcr0 & 0xffffffff;
+		best->edx &= vcpu->arch.guest_supported_xcr0 >> 32;
+		best->ecx |= XFEATURE_MASK_FPSSE;
+	}
+
 	kvm_update_pv_runtime(vcpu);
 
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
@@ -429,7 +445,7 @@ void kvm_set_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_mask(CPUID_7_0_EBX,
-		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
+		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
 		F(BMI2) | F(ERMS) | F(INVPCID) | F(RTM) | 0 /*MPX*/ | F(RDSEED) |
 		F(ADX) | F(SMAP) | F(AVX512IFMA) | F(AVX512F) | F(AVX512PF) |
 		F(AVX512ER) | F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(AVX512DQ) |
@@ -440,7 +456,8 @@ void kvm_set_cpu_caps(void)
 		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
-		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/
+		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
+		F(SGX_LC)
 	);
 	/* Set LA57 based on hardware capability. */
 	if (cpuid_ecx(7) & F(LA57))
@@ -479,6 +496,10 @@ void kvm_set_cpu_caps(void)
 		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES)
 	);
 
+	kvm_cpu_cap_init(CPUID_12_EAX,
+		SF(SGX1) | SF(SGX2)
+	);
+
 	kvm_cpu_cap_mask(CPUID_8000_0001_ECX,
 		F(LAHF_LM) | F(CMP_LEGACY) | 0 /*SVM*/ | 0 /* ExtApicSpace */ |
 		F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
@@ -800,6 +821,38 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			entry->edx = 0;
 		}
 		break;
+	case 0x12:
+		/* Intel SGX */
+		if (!kvm_cpu_cap_has(X86_FEATURE_SGX)) {
+			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+			break;
+		}
+
+		/*
+		 * Index 0: Sub-features, MISCSELECT (a.k.a extended features)
+		 * and max enclave sizes.   The SGX sub-features and MISCSELECT
+		 * are restricted by kernel and KVM capabilities (like most
+		 * feature flags), while enclave size is unrestricted.
+		 */
+		cpuid_entry_override(entry, CPUID_12_EAX);
+		entry->ebx &= SGX_MISC_EXINFO;
+
+		entry = do_host_cpuid(array, function, 1);
+		if (!entry)
+			goto out;
+
+		/*
+		 * Index 1: SECS.ATTRIBUTES.  ATTRIBUTES are restricted a la
+		 * feature flags.  Advertise all supported flags, including
+		 * privileged attributes that require explicit opt-in from
+		 * userspace.  ATTRIBUTES.XFRM is not adjusted as userspace is
+		 * expected to derive it from supported XCR0.
+		 */
+		entry->eax &= SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT |
+			      /* PROVISIONKEY | */ SGX_ATTR_EINITTOKENKEY |
+			      SGX_ATTR_KSS;
+		entry->ebx &= 0;
+		break;
 	/* Intel PT */
 	case 0x14:
 		if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT)) {
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 28848e9f70e2..ba8b7755cd12 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -11,6 +11,7 @@
 #include "mmu.h"
 #include "nested.h"
 #include "pmu.h"
+#include "sgx.h"
 #include "trace.h"
 #include "vmx.h"
 #include "x86.h"
@@ -2306,6 +2307,9 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 		if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
 		    exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
 
+		if (exec_control & SECONDARY_EXEC_ENCLS_EXITING)
+			vmx_write_encls_bitmap(&vmx->vcpu, vmcs12);
+
 		secondary_exec_controls_set(vmx, exec_control);
 	}
 
@@ -5707,6 +5711,20 @@ static bool nested_vmx_exit_handled_cr(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+static bool nested_vmx_exit_handled_encls(struct kvm_vcpu *vcpu,
+					  struct vmcs12 *vmcs12)
+{
+	u32 encls_leaf;
+
+	if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENCLS_EXITING))
+		return false;
+
+	encls_leaf = kvm_rax_read(vcpu);
+	if (encls_leaf > 62)
+		encls_leaf = 63;
+	return vmcs12->encls_exiting_bitmap & BIT_ULL(encls_leaf);
+}
+
 static bool nested_vmx_exit_handled_vmcs_access(struct kvm_vcpu *vcpu,
 	struct vmcs12 *vmcs12, gpa_t bitmap)
 {
@@ -5803,9 +5821,6 @@ static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu,
 	case EXIT_REASON_VMFUNC:
 		/* VM functions are emulated through L2->L0 vmexits. */
 		return true;
-	case EXIT_REASON_ENCLS:
-		/* SGX is never exposed to L1 */
-		return true;
 	default:
 		break;
 	}
@@ -5929,6 +5944,8 @@ static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu,
 	case EXIT_REASON_TPAUSE:
 		return nested_cpu_has2(vmcs12,
 			SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE);
+	case EXIT_REASON_ENCLS:
+		return nested_vmx_exit_handled_encls(vcpu, vmcs12);
 	default:
 		return true;
 	}
@@ -6504,6 +6521,9 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		msrs->secondary_ctls_high |=
 			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 
+	if (enable_sgx)
+		msrs->secondary_ctls_high |= SECONDARY_EXEC_ENCLS_EXITING;
+
 	/* miscellaneous data */
 	rdmsr(MSR_IA32_VMX_MISC,
 		msrs->misc_low,
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 197148d76b8f..184418baeb3c 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -244,6 +244,11 @@ static inline bool nested_exit_on_intr(struct kvm_vcpu *vcpu)
 		PIN_BASED_EXT_INTR_MASK;
 }
 
+static inline bool nested_cpu_has_encls_exit(struct vmcs12 *vmcs12)
+{
+	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENCLS_EXITING);
+}
+
 /*
  * if fixed0[i] == 1: val[i] must be 1
  * if fixed1[i] == 0: val[i] must be 0
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 9f0149b9d53f..dc43d90da49b 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -5,11 +5,13 @@
 
 #include "cpuid.h"
 #include "kvm_cache_regs.h"
+#include "nested.h"
 #include "sgx.h"
 #include "vmx.h"
 #include "x86.h"
 
-bool __read_mostly enable_sgx;
+bool __read_mostly enable_sgx = 1;
+module_param_named(sgx, enable_sgx, bool, 0444);
 
 /* Initial value of guest's virtual SGX_LEPUBKEYHASHn MSRs */
 static u64 sgx_pubkey_hash[4] __ro_after_init;
@@ -401,3 +403,79 @@ void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu)
 	memcpy(vmx->msr_ia32_sgxlepubkeyhash, sgx_pubkey_hash,
 	       sizeof(sgx_pubkey_hash));
 }
+
+/*
+ * ECREATE must be intercepted to enforce MISCSELECT, ATTRIBUTES and XFRM
+ * restrictions if the guest's allowed-1 settings diverge from hardware.
+ */
+static bool sgx_intercept_encls_ecreate(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *guest_cpuid;
+	u32 eax, ebx, ecx, edx;
+
+	if (!vcpu->kvm->arch.sgx_provisioning_allowed)
+		return true;
+
+	guest_cpuid = kvm_find_cpuid_entry(vcpu, 0x12, 0);
+	if (!guest_cpuid)
+		return true;
+
+	cpuid_count(0x12, 0, &eax, &ebx, &ecx, &edx);
+	if (guest_cpuid->ebx != ebx || guest_cpuid->edx != edx)
+		return true;
+
+	guest_cpuid = kvm_find_cpuid_entry(vcpu, 0x12, 1);
+	if (!guest_cpuid)
+		return true;
+
+	cpuid_count(0x12, 1, &eax, &ebx, &ecx, &edx);
+	if (guest_cpuid->eax != eax || guest_cpuid->ebx != ebx ||
+	    guest_cpuid->ecx != ecx || guest_cpuid->edx != edx)
+		return true;
+
+	return false;
+}
+
+void vmx_write_encls_bitmap(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
+{
+	/*
+	 * There is no software enable bit for SGX that is virtualized by
+	 * hardware, e.g. there's no CR4.SGXE, so when SGX is disabled in the
+	 * guest (either by the host or by the guest's BIOS) but enabled in the
+	 * host, trap all ENCLS leafs and inject #UD/#GP as needed to emulate
+	 * the expected system behavior for ENCLS.
+	 */
+	u64 bitmap = -1ull;
+
+	/* Nothing to do if hardware doesn't support SGX */
+	if (!cpu_has_vmx_encls_vmexit())
+		return;
+
+	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX) &&
+	    sgx_enabled_in_guest_bios(vcpu)) {
+		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX1)) {
+			bitmap &= ~GENMASK_ULL(ETRACK, ECREATE);
+			if (sgx_intercept_encls_ecreate(vcpu))
+				bitmap |= (1 << ECREATE);
+		}
+
+		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX2))
+			bitmap &= ~GENMASK_ULL(EMODT, EAUG);
+
+		/*
+		 * Trap and execute EINIT if launch control is enabled in the
+		 * host using the guest's values for launch control MSRs, even
+		 * if the guest's values are fixed to hardware default values.
+		 * The MSRs are not loaded/saved on VM-Enter/VM-Exit as writing
+		 * the MSRs is extraordinarily expensive.
+		 */
+		if (boot_cpu_has(X86_FEATURE_SGX_LC))
+			bitmap |= (1 << EINIT);
+
+		if (!vmcs12 && is_guest_mode(vcpu))
+			vmcs12 = get_vmcs12(vcpu);
+		if (vmcs12 && nested_cpu_has_encls_exit(vmcs12))
+			bitmap |= vmcs12->encls_exiting_bitmap;
+	}
+	vmcs_write64(ENCLS_EXITING_BITMAP, bitmap);
+}
diff --git a/arch/x86/kvm/vmx/sgx.h b/arch/x86/kvm/vmx/sgx.h
index 6502fa52c7e9..a400888b376d 100644
--- a/arch/x86/kvm/vmx/sgx.h
+++ b/arch/x86/kvm/vmx/sgx.h
@@ -4,6 +4,9 @@
 
 #include <linux/kvm_host.h>
 
+#include "capabilities.h"
+#include "vmx_ops.h"
+
 #ifdef CONFIG_X86_SGX_KVM
 extern bool __read_mostly enable_sgx;
 
@@ -11,11 +14,21 @@ int handle_encls(struct kvm_vcpu *vcpu);
 
 void setup_default_sgx_lepubkeyhash(void);
 void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu);
+
+void vmx_write_encls_bitmap(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12);
 #else
 #define enable_sgx 0
 
 static inline void setup_default_sgx_lepubkeyhash(void) { }
 static inline void vcpu_setup_sgx_lepubkeyhash(struct kvm_vcpu *vcpu) { }
+
+static inline void vmx_write_encls_bitmap(struct kvm_vcpu *vcpu,
+					  struct vmcs12 *vmcs12)
+{
+	/* Nothing to do if hardware doesn't support SGX */
+	if (cpu_has_vmx_encls_vmexit())
+		vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);
+}
 #endif
 
 #endif /* __KVM_X86_SGX_H */
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index c8e51c004f78..034adb6404dc 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -50,6 +50,7 @@ const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD64(VMREAD_BITMAP, vmread_bitmap),
 	FIELD64(VMWRITE_BITMAP, vmwrite_bitmap),
 	FIELD64(XSS_EXIT_BITMAP, xss_exit_bitmap),
+	FIELD64(ENCLS_EXITING_BITMAP, encls_exiting_bitmap),
 	FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
 	FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
 	FIELD64(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl),
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 80232daf00ff..13494956d0e9 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -69,7 +69,8 @@ struct __packed vmcs12 {
 	u64 vm_function_control;
 	u64 eptp_list_address;
 	u64 pml_address;
-	u64 padding64[3]; /* room for future expansion */
+	u64 encls_exiting_bitmap;
+	u64 padding64[2]; /* room for future expansion */
 	/*
 	 * To allow migration of L1 (complete with its L2 guests) between
 	 * machines of different natural widths (32 or 64 bit), we cannot have
@@ -256,6 +257,7 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(vm_function_control, 296);
 	CHECK_OFFSET(eptp_list_address, 304);
 	CHECK_OFFSET(pml_address, 312);
+	CHECK_OFFSET(encls_exiting_bitmap, 320);
 	CHECK_OFFSET(cr0_guest_host_mask, 344);
 	CHECK_OFFSET(cr4_guest_host_mask, 352);
 	CHECK_OFFSET(cr0_read_shadow, 360);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 070460df5c3b..0c2394f9020a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2204,6 +2204,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx->msr_ia32_feature_control = data;
 		if (msr_info->host_initiated && data == 0)
 			vmx_leave_nested(vcpu);
+
+		/* SGX may be enabled/disabled by guest's firmware */
+		vmx_write_encls_bitmap(vcpu, NULL);
 		break;
 	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
 		if (!msr_info->host_initiated &&
@@ -4366,6 +4369,15 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
 	if (!vcpu->kvm->arch.bus_lock_detection_enabled)
 		exec_control &= ~SECONDARY_EXEC_BUS_LOCK_DETECTION;
 
+	if (cpu_has_vmx_encls_vmexit() && nested) {
+		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX))
+			vmx->nested.msrs.secondary_ctls_high |=
+				SECONDARY_EXEC_ENCLS_EXITING;
+		else
+			vmx->nested.msrs.secondary_ctls_high &=
+				~SECONDARY_EXEC_ENCLS_EXITING;
+	}
+
 	vmx->secondary_exec_control = exec_control;
 }
 
@@ -4465,8 +4477,7 @@ static void init_vmcs(struct vcpu_vmx *vmx)
 		vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
 	}
 
-	if (cpu_has_vmx_encls_vmexit())
-		vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);
+	vmx_write_encls_bitmap(&vmx->vcpu, NULL);
 
 	if (vmx_pt_mode_is_host_guest()) {
 		memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc));
@@ -7364,6 +7375,19 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	set_cr4_guest_host_mask(vmx);
 
+	vmx_write_encls_bitmap(vcpu, NULL);
+	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX))
+		vmx->msr_ia32_feature_control_valid_bits |= FEAT_CTL_SGX_ENABLED;
+	else
+		vmx->msr_ia32_feature_control_valid_bits &= ~FEAT_CTL_SGX_ENABLED;
+
+	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
+		vmx->msr_ia32_feature_control_valid_bits |=
+			FEAT_CTL_SGX_LC_ENABLED;
+	else
+		vmx->msr_ia32_feature_control_valid_bits &=
+			~FEAT_CTL_SGX_LC_ENABLED;
+
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
 }
@@ -7384,6 +7408,13 @@ static __init void vmx_set_cpu_caps(void)
 	if (vmx_pt_mode_is_host_guest())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);
 
+	if (!enable_sgx) {
+		kvm_cpu_cap_clear(X86_FEATURE_SGX);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX_LC);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX1);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX2);
+	}
+
 	if (vmx_umip_emulated())
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 25/25] KVM: x86: Add capability to grant VM access to privileged SGX attribute
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (23 preceding siblings ...)
  2021-03-19  7:23 ` [PATCH v3 24/25] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC Kai Huang
@ 2021-03-19  7:24 ` Kai Huang
  2021-03-19 14:52 ` [PATCH v3 00/25] KVM SGX virtualization support Jarkko Sakkinen
  2021-03-26 22:46 ` Jarkko Sakkinen
  26 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:24 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jmattson, joro, vkuznets, wanpengli, corbet, Andy Lutomirski,
	Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a capability, KVM_CAP_SGX_ATTRIBUTE, that can be used by userspace
to grant a VM access to a priveleged attribute, with args[0] holding a
file handle to a valid SGX attribute file.

The SGX subsystem restricts access to a subset of enclave attributes to
provide additional security for an uncompromised kernel, e.g. to prevent
malware from using the PROVISIONKEY to ensure its nodes are running
inside a geniune SGX enclave and/or to obtain a stable fingerprint.

To prevent userspace from circumventing such restrictions by running an
enclave in a VM, KVM restricts guest access to privileged attributes by
default.

Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++
 arch/x86/kvm/cpuid.c           |  2 +-
 arch/x86/kvm/x86.c             | 21 +++++++++++++++++++++
 include/uapi/linux/kvm.h       |  1 +
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 38e327d4b479..ebb47e48d4f3 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6230,6 +6230,29 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between them.
 This capability can be used to check / enable 2nd DAWR feature provided
 by POWER10 processor.
 
+7.24 KVM_CAP_SGX_ATTRIBUTE
+----------------------
+
+:Architectures: x86
+:Target: VM
+:Parameters: args[0] is a file handle of a SGX attribute file in securityfs
+:Returns: 0 on success, -EINVAL if the file handle is invalid or if a requested
+          attribute is not supported by KVM.
+
+KVM_CAP_SGX_ATTRIBUTE enables a userspace VMM to grant a VM access to one or
+more priveleged enclave attributes.  args[0] must hold a file handle to a valid
+SGX attribute file corresponding to an attribute that is supported/restricted
+by KVM (currently only PROVISIONKEY).
+
+The SGX subsystem restricts access to a subset of enclave attributes to provide
+additional security for an uncompromised kernel, e.g. use of the PROVISIONKEY
+is restricted to deter malware from using the PROVISIONKEY to obtain a stable
+system fingerprint.  To prevent userspace from circumventing such restrictions
+by running an enclave in a VM, KVM prevents access to privileged attributes by
+default.
+
+See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a0d45607b702..6dc12d949f86 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -849,7 +849,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		 * expected to derive it from supported XCR0.
 		 */
 		entry->eax &= SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT |
-			      /* PROVISIONKEY | */ SGX_ATTR_EINITTOKENKEY |
+			      SGX_ATTR_PROVISIONKEY | SGX_ATTR_EINITTOKENKEY |
 			      SGX_ATTR_KSS;
 		entry->ebx &= 0;
 		break;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d2da5abcf395..81139e076380 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -75,6 +75,7 @@
 #include <asm/tlbflush.h>
 #include <asm/intel_pt.h>
 #include <asm/emulate_prefix.h>
+#include <asm/sgx.h>
 #include <clocksource/hyperv_timer.h>
 
 #define CREATE_TRACE_POINTS
@@ -3759,6 +3760,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_X86_USER_SPACE_MSR:
 	case KVM_CAP_X86_MSR_FILTER:
 	case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
+#ifdef CONFIG_X86_SGX_KVM
+	case KVM_CAP_SGX_ATTRIBUTE:
+#endif
 		r = 1;
 		break;
 #ifdef CONFIG_KVM_XEN
@@ -5345,6 +5349,23 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			kvm->arch.bus_lock_detection_enabled = true;
 		r = 0;
 		break;
+#ifdef CONFIG_X86_SGX_KVM
+	case KVM_CAP_SGX_ATTRIBUTE: {
+		unsigned long allowed_attributes = 0;
+
+		r = sgx_set_attribute(&allowed_attributes, cap->args[0]);
+		if (r)
+			break;
+
+		/* KVM only supports the PROVISIONKEY privileged attribute. */
+		if ((allowed_attributes & SGX_ATTR_PROVISIONKEY) &&
+		    !(allowed_attributes & ~SGX_ATTR_PROVISIONKEY))
+			kvm->arch.sgx_provisioning_allowed = true;
+		else
+			r = -EINVAL;
+		break;
+	}
+#endif
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6afee209620..7d8927e474f8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1078,6 +1078,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DIRTY_LOG_RING 192
 #define KVM_CAP_X86_BUS_LOCK_EXIT 193
 #define KVM_CAP_PPC_DAWR1 194
+#define KVM_CAP_SGX_ATTRIBUTE 195
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 00/25] KVM SGX virtualization support
@ 2021-03-19  7:29 Kai Huang
  2021-03-19  7:22 ` [PATCH v3 01/25] x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit Kai Huang
                   ` (26 more replies)
  0 siblings, 27 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-19  7:29 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jethro, b.thiel, jmattson, joro, vkuznets, wanpengli, corbet

This series adds KVM SGX virtualization support. The first 14 patches starting
with x86/sgx or x86/cpu.. are necessary changes to x86 and SGX core/driver to
support KVM SGX virtualization, while the rest are patches to KVM subsystem.

This series is based against latest tip/x86/sgx, which has Jarkko's NUMA
allocation support.

You can also get the code from upstream branch of kvm-sgx repo on github:

        https://github.com/intel/kvm-sgx.git upstream

It also requires Qemu changes to create VM with SGX support. You can find Qemu
repo here:

	https://github.com/intel/qemu-sgx.git upstream

Please refer to README.md of above qemu-sgx repo for detail on how to create
guest with SGX support. At meantime, for your quick reference you can use below
command to create SGX guest:

	#qemu-system-x86_64 -smp 4 -m 2G -drive file=<your_vm_image>,if=virtio \
		-cpu host,+sgx_provisionkey \
		-sgx-epc id=epc1,memdev=mem1 \
		-object memory-backend-epc,id=mem1,size=64M,prealloc

Please note that the SGX relevant part is:

		-cpu host,+sgx_provisionkey \
		-sgx-epc id=epc1,memdev=mem1 \
		-object memory-backend-epc,id=mem1,size=64M,prealloc

And you can change other parameters of your qemu command based on your needs.

=========
Changelog:

(Changelog here is for global changes. Please see each patch's changelog for
 changes made to specific patch.)

v2->v3:

 - No big change in design, structure of patch series, etc.
 - Rebased to lastest tip/x86/sgx, to resolve merge conflict of patch 3
   (x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()).
 - Addressed some Nit issues found by Sean in v2.
 - Also addressed some Nit issues reported by checkpatch.pl. Now there's no
   checkpatch issues.
 - Updated patch 3 (x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()):
   - Removed Jarkko from author, per request.
   - Changed to replace all call sites of sgx_free_epc_page() with new
     sgx_encl_free_epc_page(), to make this patch doesn't have functional
     changes (except a WARN upon EREMOVE failure requestd by Dave).
   - Rebased to tip/x86/sgx, which has Jarkko's NUMA allocation.
   - Added Jarkko's Acked-by.
 - Updated patch 8 (x86/sgx: Expose SGX architectural definitions to the
   kernel) to add MAINTAINER file update to include new introduced asm/sgx.h.
 - Updated patch 13 (x86/sgx: Add helpers to expose ECREATE and EINIT to KVM)
   to use addr and size directly in access_ok()s (which won't be triggered
   anyway).

v1->v2:

 - No big change in design, structural of patch series, etc.
 - Addressed Boris's comments regarding to suppressing both SGX1 and SGX2 in
   /proc/cpuinfo, and improvement in feat_ctl.c when enabling SGX (patch 2
   and 6).
 - Addressed Sean's comments for both x86 part patches and KVM patches (patch 3,
   5, 9, 12, 19, 21).
 - Addressed Dave's comments in RFC v6 series (patch 13).

RFC->v1:

 - Refined patch (x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()) to print
   error msg that EPC page is leaked when EREMOVE failed, requested by Dave.
 - Changelog history of all RFC series is removed in both this cover letter
   and each individual patch, since majority of x86 part patches already got
   Acked-by from Dave and Jarkko. And the changelogs are not quite useful from
   my perspective.

=========
KVM SGX virtualization Overview

- Virtual EPC

SGX enclave memory is special and is reserved specifically for enclave use.
In bare-metal SGX enclaves, the kernel allocates enclave pages, copies data
into the pages with privileged instructions, then allows the enclave to start.
In this scenario, only initialized pages already assigned to an enclave are
mapped to userspace.

In virtualized environments, the hypervisor still needs to do the physical
enclave page allocation.  The guest kernel is responsible for the data copying
(among other things).  This means that the job of starting an enclave is now
split between hypervisor and guest.

This series introduces a new misc device: /dev/sgx_vepc.  This device allows
the host to map *uninitialized* enclave memory into userspace, which can then
be passed into a guest.

While it might be *possible* to start a host-side enclave with /dev/sgx_enclave
and pass its memory into a guest, it would be wasteful and convoluted.

Implement the *raw* EPC allocation in the x86 core-SGX subsystem via
/dev/sgx_vepc rather than in KVM.  Doing so has two major advantages:

  - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
    just another memory backend for guests.

  - EPC management is wholly contained in the SGX subsystem, e.g. SGX
    does not have to export any symbols, changes to reclaim flows don't
    need to be routed through KVM, SGX's dirty laundry doesn't have to
    get aired out for the world to see, and so on and so forth.

The virtual EPC pages allocated to guests are currently not reclaimable.
Reclaiming EPC page used by enclave requires a special reclaim mechanism
separate from normal page reclaim, and that mechanism is not supported
for virutal EPC pages.  Due to the complications of handling reclaim
conflicts between guest and host, reclaiming virtual EPC pages is 
significantly more complex than basic support for SGX virtualization.

- Support SGX virtualization without SGX Flexible Launch Control

SGX hardware supports two "launch control" modes to limit which enclaves can
run.  In the "locked" mode, the hardware prevents enclaves from running unless
they are blessed by a third party.  In the unlocked mode, the kernel is in
full control of which enclaves can run.  The bare-metal SGX code refuses to
launch enclaves unless it is in the unlocked mode.

This sgx_virt_epc driver does not have such a restriction.  This allows guests
which are OK with the locked mode to use SGX, even if the host kernel refuses
to.

- Support exposing SGX2

Due to the same reason above, SGX2 feature detection is added to core SGX code
to allow KVM to expose SGX2 to guest, even currently SGX driver doesn't support
SGX2, because SGX2 can work just fine in guest w/o any interaction to host SGX
driver.

- Restricit SGX guest access to provisioning key

To grant guest being able to fully use SGX, guest needs to be able to access
provisioning key.  The provisioning key is sensitive, and accessing to it should
be restricted. In bare-metal driver, allowing enclave to access provisioning key
is restricted by being able to open /dev/sgx_provision.

Add a new KVM_CAP_SGX_ATTRIBUTE to KVM uAPI to extend above mechanism to KVM
guests as well.  When userspace hypervisor creates a new VM, the new cap is only
added to VM when userspace hypervisior is able to open /dev/sgx_provision,
following the same role as in bare-metal driver.  KVM then traps ECREATE from
guest, and only allows ECREATE with provisioning key bit to run when guest
supports KVM_CAP_SGX_ATTRIBUTE.


Kai Huang (4):
  x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit
  x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  x86/sgx: Initialize virtual EPC driver even when SGX driver is
    disabled
  x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs

Sean Christopherson (21):
  x86/cpufeatures: Add SGX1 and SGX2 sub-features
  x86/sgx: Add SGX_CHILD_PRESENT hardware error code
  x86/sgx: Introduce virtual EPC for use by KVM guests
  x86/cpu/intel: Allow SGX virtualization without Launch Control support
  x86/sgx: Expose SGX architectural definitions to the kernel
  x86/sgx: Move ENCLS leaf definitions to sgx.h
  x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT)
  x86/sgx: Add encls_faulted() helper
  x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  x86/sgx: Move provisioning device creation out of SGX driver
  KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX)
  KVM: x86: Define new #PF SGX error code bit
  KVM: x86: Add support for reverse CPUID lookup of scattered features
  KVM: x86: Add reverse-CPUID lookup support for scattered SGX features
  KVM: VMX: Add basic handling of VM-Exit from SGX enclave
  KVM: VMX: Frame in ENCLS handler for SGX virtualization
  KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions
  KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs
  KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)
  KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC
  KVM: x86: Add capability to grant VM access to privileged SGX
    attribute

 Documentation/virt/kvm/api.rst                |  23 +
 MAINTAINERS                                   |   1 +
 arch/x86/Kconfig                              |  12 +
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/kvm_host.h               |   5 +
 .../cpu/sgx/arch.h => include/asm/sgx.h}      |  50 +-
 arch/x86/include/asm/vmx.h                    |   1 +
 arch/x86/include/uapi/asm/vmx.h               |   1 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
 arch/x86/kernel/cpu/feat_ctl.c                |  71 ++-
 arch/x86/kernel/cpu/scattered.c               |   2 +
 arch/x86/kernel/cpu/sgx/Makefile              |   1 +
 arch/x86/kernel/cpu/sgx/driver.c              |  17 -
 arch/x86/kernel/cpu/sgx/encl.c                |  42 +-
 arch/x86/kernel/cpu/sgx/encl.h                |   1 +
 arch/x86/kernel/cpu/sgx/encls.h               |  30 +-
 arch/x86/kernel/cpu/sgx/ioctl.c               |  29 +-
 arch/x86/kernel/cpu/sgx/main.c                |  96 +++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |  13 +-
 arch/x86/kernel/cpu/sgx/virt.c                | 369 ++++++++++++++
 arch/x86/kvm/Makefile                         |   2 +
 arch/x86/kvm/cpuid.c                          |  89 +++-
 arch/x86/kvm/cpuid.h                          |  50 +-
 arch/x86/kvm/vmx/nested.c                     |  28 +-
 arch/x86/kvm/vmx/nested.h                     |   5 +
 arch/x86/kvm/vmx/sgx.c                        | 481 ++++++++++++++++++
 arch/x86/kvm/vmx/sgx.h                        |  34 ++
 arch/x86/kvm/vmx/vmcs12.c                     |   1 +
 arch/x86/kvm/vmx/vmcs12.h                     |   4 +-
 arch/x86/kvm/vmx/vmx.c                        | 109 +++-
 arch/x86/kvm/vmx/vmx.h                        |   2 +
 arch/x86/kvm/x86.c                            |  23 +
 include/uapi/linux/kvm.h                      |   1 +
 tools/testing/selftests/sgx/defines.h         |   2 +-
 34 files changed, 1476 insertions(+), 124 deletions(-)
 rename arch/x86/{kernel/cpu/sgx/arch.h => include/asm/sgx.h} (89%)
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.c
 create mode 100644 arch/x86/kvm/vmx/sgx.c
 create mode 100644 arch/x86/kvm/vmx/sgx.h

-- 
2.30.2


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 00/25] KVM SGX virtualization support
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (24 preceding siblings ...)
  2021-03-19  7:24 ` [PATCH v3 25/25] KVM: x86: Add capability to grant VM access to privileged SGX attribute Kai Huang
@ 2021-03-19 14:52 ` Jarkko Sakkinen
  2021-03-22 10:03   ` Kai Huang
  2021-03-26 22:46 ` Jarkko Sakkinen
  26 siblings, 1 reply; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-19 14:52 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jethro, b.thiel, jmattson, joro, vkuznets, wanpengli, corbet

On Fri, Mar 19, 2021 at 08:29:27PM +1300, Kai Huang wrote:
> This series adds KVM SGX virtualization support. The first 14 patches starting
> with x86/sgx or x86/cpu.. are necessary changes to x86 and SGX core/driver to
> support KVM SGX virtualization, while the rest are patches to KVM subsystem.
> 
> This series is based against latest tip/x86/sgx, which has Jarkko's NUMA
> allocation support.
> 
> You can also get the code from upstream branch of kvm-sgx repo on github:
> 
>         https://github.com/intel/kvm-sgx.git upstream
> 
> It also requires Qemu changes to create VM with SGX support. You can find Qemu
> repo here:
> 
> 	https://github.com/intel/qemu-sgx.git upstream
> 
> Please refer to README.md of above qemu-sgx repo for detail on how to create
> guest with SGX support. At meantime, for your quick reference you can use below
> command to create SGX guest:
> 
> 	#qemu-system-x86_64 -smp 4 -m 2G -drive file=<your_vm_image>,if=virtio \
> 		-cpu host,+sgx_provisionkey \
> 		-sgx-epc id=epc1,memdev=mem1 \
> 		-object memory-backend-epc,id=mem1,size=64M,prealloc
> 
> Please note that the SGX relevant part is:
> 
> 		-cpu host,+sgx_provisionkey \
> 		-sgx-epc id=epc1,memdev=mem1 \
> 		-object memory-backend-epc,id=mem1,size=64M,prealloc
> 
> And you can change other parameters of your qemu command based on your needs.
> 
> =========
> Changelog:
> 
> (Changelog here is for global changes. Please see each patch's changelog for
>  changes made to specific patch.)
> 
> v2->v3:
> 
>  - No big change in design, structure of patch series, etc.
>  - Rebased to lastest tip/x86/sgx, to resolve merge conflict of patch 3
>    (x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()).
>  - Addressed some Nit issues found by Sean in v2.
>  - Also addressed some Nit issues reported by checkpatch.pl. Now there's no
>    checkpatch issues.
>  - Updated patch 3 (x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()):
>    - Removed Jarkko from author, per request.
>    - Changed to replace all call sites of sgx_free_epc_page() with new
>      sgx_encl_free_epc_page(), to make this patch doesn't have functional
>      changes (except a WARN upon EREMOVE failure requestd by Dave).
>    - Rebased to tip/x86/sgx, which has Jarkko's NUMA allocation.
>    - Added Jarkko's Acked-by.
>  - Updated patch 8 (x86/sgx: Expose SGX architectural definitions to the
>    kernel) to add MAINTAINER file update to include new introduced asm/sgx.h.
>  - Updated patch 13 (x86/sgx: Add helpers to expose ECREATE and EINIT to KVM)
>    to use addr and size directly in access_ok()s (which won't be triggered
>    anyway).
> 
> v1->v2:
> 
>  - No big change in design, structural of patch series, etc.
>  - Addressed Boris's comments regarding to suppressing both SGX1 and SGX2 in
>    /proc/cpuinfo, and improvement in feat_ctl.c when enabling SGX (patch 2
>    and 6).
>  - Addressed Sean's comments for both x86 part patches and KVM patches (patch 3,
>    5, 9, 12, 19, 21).
>  - Addressed Dave's comments in RFC v6 series (patch 13).
> 
> RFC->v1:
> 
>  - Refined patch (x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()) to print
>    error msg that EPC page is leaked when EREMOVE failed, requested by Dave.
>  - Changelog history of all RFC series is removed in both this cover letter
>    and each individual patch, since majority of x86 part patches already got
>    Acked-by from Dave and Jarkko. And the changelogs are not quite useful from
>    my perspective.
> 
> =========
> KVM SGX virtualization Overview
> 
> - Virtual EPC
> 
> SGX enclave memory is special and is reserved specifically for enclave use.
> In bare-metal SGX enclaves, the kernel allocates enclave pages, copies data
> into the pages with privileged instructions, then allows the enclave to start.
> In this scenario, only initialized pages already assigned to an enclave are
> mapped to userspace.
> 
> In virtualized environments, the hypervisor still needs to do the physical
> enclave page allocation.  The guest kernel is responsible for the data copying
> (among other things).  This means that the job of starting an enclave is now
> split between hypervisor and guest.
> 
> This series introduces a new misc device: /dev/sgx_vepc.  This device allows
> the host to map *uninitialized* enclave memory into userspace, which can then
> be passed into a guest.
> 
> While it might be *possible* to start a host-side enclave with /dev/sgx_enclave
> and pass its memory into a guest, it would be wasteful and convoluted.
> 
> Implement the *raw* EPC allocation in the x86 core-SGX subsystem via
> /dev/sgx_vepc rather than in KVM.  Doing so has two major advantages:
> 
>   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
>     just another memory backend for guests.
> 
>   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
>     does not have to export any symbols, changes to reclaim flows don't
>     need to be routed through KVM, SGX's dirty laundry doesn't have to
>     get aired out for the world to see, and so on and so forth.
> 
> The virtual EPC pages allocated to guests are currently not reclaimable.
> Reclaiming EPC page used by enclave requires a special reclaim mechanism
> separate from normal page reclaim, and that mechanism is not supported
> for virutal EPC pages.  Due to the complications of handling reclaim
> conflicts between guest and host, reclaiming virtual EPC pages is 
> significantly more complex than basic support for SGX virtualization.
> 
> - Support SGX virtualization without SGX Flexible Launch Control
> 
> SGX hardware supports two "launch control" modes to limit which enclaves can
> run.  In the "locked" mode, the hardware prevents enclaves from running unless
> they are blessed by a third party.  In the unlocked mode, the kernel is in
> full control of which enclaves can run.  The bare-metal SGX code refuses to
> launch enclaves unless it is in the unlocked mode.
> 
> This sgx_virt_epc driver does not have such a restriction.  This allows guests
> which are OK with the locked mode to use SGX, even if the host kernel refuses
> to.
> 
> - Support exposing SGX2
> 
> Due to the same reason above, SGX2 feature detection is added to core SGX code
> to allow KVM to expose SGX2 to guest, even currently SGX driver doesn't support
> SGX2, because SGX2 can work just fine in guest w/o any interaction to host SGX
> driver.
> 
> - Restricit SGX guest access to provisioning key
> 
> To grant guest being able to fully use SGX, guest needs to be able to access
> provisioning key.  The provisioning key is sensitive, and accessing to it should
> be restricted. In bare-metal driver, allowing enclave to access provisioning key
> is restricted by being able to open /dev/sgx_provision.
> 
> Add a new KVM_CAP_SGX_ATTRIBUTE to KVM uAPI to extend above mechanism to KVM
> guests as well.  When userspace hypervisor creates a new VM, the new cap is only
> added to VM when userspace hypervisior is able to open /dev/sgx_provision,
> following the same role as in bare-metal driver.  KVM then traps ECREATE from
> guest, and only allows ECREATE with provisioning key bit to run when guest
> supports KVM_CAP_SGX_ATTRIBUTE.
> 
> 
> Kai Huang (4):
>   x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit
>   x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
>   x86/sgx: Initialize virtual EPC driver even when SGX driver is
>     disabled
>   x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs
> 
> Sean Christopherson (21):
>   x86/cpufeatures: Add SGX1 and SGX2 sub-features
>   x86/sgx: Add SGX_CHILD_PRESENT hardware error code
>   x86/sgx: Introduce virtual EPC for use by KVM guests
>   x86/cpu/intel: Allow SGX virtualization without Launch Control support
>   x86/sgx: Expose SGX architectural definitions to the kernel
>   x86/sgx: Move ENCLS leaf definitions to sgx.h
>   x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT)
>   x86/sgx: Add encls_faulted() helper
>   x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
>   x86/sgx: Move provisioning device creation out of SGX driver
>   KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX)
>   KVM: x86: Define new #PF SGX error code bit
>   KVM: x86: Add support for reverse CPUID lookup of scattered features
>   KVM: x86: Add reverse-CPUID lookup support for scattered SGX features
>   KVM: VMX: Add basic handling of VM-Exit from SGX enclave
>   KVM: VMX: Frame in ENCLS handler for SGX virtualization
>   KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions
>   KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs
>   KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)
>   KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC
>   KVM: x86: Add capability to grant VM access to privileged SGX
>     attribute
> 
>  Documentation/virt/kvm/api.rst                |  23 +
>  MAINTAINERS                                   |   1 +
>  arch/x86/Kconfig                              |  12 +
>  arch/x86/include/asm/cpufeatures.h            |   2 +
>  arch/x86/include/asm/kvm_host.h               |   5 +
>  .../cpu/sgx/arch.h => include/asm/sgx.h}      |  50 +-
>  arch/x86/include/asm/vmx.h                    |   1 +
>  arch/x86/include/uapi/asm/vmx.h               |   1 +
>  arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
>  arch/x86/kernel/cpu/feat_ctl.c                |  71 ++-
>  arch/x86/kernel/cpu/scattered.c               |   2 +
>  arch/x86/kernel/cpu/sgx/Makefile              |   1 +
>  arch/x86/kernel/cpu/sgx/driver.c              |  17 -
>  arch/x86/kernel/cpu/sgx/encl.c                |  42 +-
>  arch/x86/kernel/cpu/sgx/encl.h                |   1 +
>  arch/x86/kernel/cpu/sgx/encls.h               |  30 +-
>  arch/x86/kernel/cpu/sgx/ioctl.c               |  29 +-
>  arch/x86/kernel/cpu/sgx/main.c                |  96 +++-
>  arch/x86/kernel/cpu/sgx/sgx.h                 |  13 +-
>  arch/x86/kernel/cpu/sgx/virt.c                | 369 ++++++++++++++
>  arch/x86/kvm/Makefile                         |   2 +
>  arch/x86/kvm/cpuid.c                          |  89 +++-
>  arch/x86/kvm/cpuid.h                          |  50 +-
>  arch/x86/kvm/vmx/nested.c                     |  28 +-
>  arch/x86/kvm/vmx/nested.h                     |   5 +
>  arch/x86/kvm/vmx/sgx.c                        | 481 ++++++++++++++++++
>  arch/x86/kvm/vmx/sgx.h                        |  34 ++
>  arch/x86/kvm/vmx/vmcs12.c                     |   1 +
>  arch/x86/kvm/vmx/vmcs12.h                     |   4 +-
>  arch/x86/kvm/vmx/vmx.c                        | 109 +++-
>  arch/x86/kvm/vmx/vmx.h                        |   2 +
>  arch/x86/kvm/x86.c                            |  23 +
>  include/uapi/linux/kvm.h                      |   1 +
>  tools/testing/selftests/sgx/defines.h         |   2 +-
>  34 files changed, 1476 insertions(+), 124 deletions(-)
>  rename arch/x86/{kernel/cpu/sgx/arch.h => include/asm/sgx.h} (89%)
>  create mode 100644 arch/x86/kernel/cpu/sgx/virt.c
>  create mode 100644 arch/x86/kvm/vmx/sgx.c
>  create mode 100644 arch/x86/kvm/vmx/sgx.h
> 
> -- 
> 2.30.2
> 
> 

I just say add my ack to SGX specific patches where it is missing.
Good enough.

/Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 00/25] KVM SGX virtualization support
  2021-03-19 14:52 ` [PATCH v3 00/25] KVM SGX virtualization support Jarkko Sakkinen
@ 2021-03-22 10:03   ` Kai Huang
  2021-03-22 10:31     ` Borislav Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-22 10:03 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jethro, b.thiel, jmattson, joro, vkuznets, wanpengli, corbet

> 
> I just say add my ack to SGX specific patches where it is missing.
> Good enough.
> 
> /Jarkko

Thank you Jarkko!

Hi Boris,

If there's no other comments, should I send another version adding Jarkko's Acked-by
for the x86 SGX patches that don't have it (patch 2, 5, 6, 7, 8, 13 -- in which patch
2 and 6 are changes to generic x86)?



^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 00/25] KVM SGX virtualization support
  2021-03-22 10:03   ` Kai Huang
@ 2021-03-22 10:31     ` Borislav Petkov
  0 siblings, 0 replies; 134+ messages in thread
From: Borislav Petkov @ 2021-03-22 10:31 UTC (permalink / raw)
  To: Kai Huang
  Cc: Jarkko Sakkinen, kvm, x86, linux-sgx, linux-kernel, seanjc, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa, jethro, b.thiel, jmattson, joro, vkuznets, wanpengli,
	corbet

On Mon, Mar 22, 2021 at 11:03:28PM +1300, Kai Huang wrote:
> If there's no other comments, should I send another version

No need.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-19  7:22 ` [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() Kai Huang
@ 2021-03-22 18:16   ` Borislav Petkov
  2021-03-22 18:56     ` Sean Christopherson
  2021-03-25  9:30   ` [PATCH v4 " Kai Huang
  1 sibling, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-03-22 18:16 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, Mar 19, 2021 at 08:22:19PM +1300, Kai Huang wrote:
> +/**
> + * sgx_encl_free_epc_page - free EPC page assigned to an enclave
> + * @page:	EPC page to be freed
> + *
> + * Free EPC page assigned to an enclave.  It does EREMOVE for the page, and
> + * only upon success, it puts the page back to free page list.  Otherwise, it
> + * gives a WARNING to indicate page is leaked, and require reboot to retrieve
> + * leaked pages.
> + */
> +void sgx_encl_free_epc_page(struct sgx_epc_page *page)
> +{
> +	int ret;
> +
> +	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> +
> +	/*
> +	 * Give a message to remind EPC page is leaked when EREMOVE fails,
> +	 * and requires machine reboot to get leaked pages back. This can
> +	 * be improved in future by adding stats of leaked pages, etc.
> +	 */
> +#define EREMOVE_ERROR_MESSAGE \
> +	"EREMOVE returned %d (0x%x).  EPC page leaked.  Reboot required to retrieve leaked pages."

A reboot? Seriously? Why?

How are you going to explain to cloud people that they need to reboot
their fat server? The same cloud people who want to make sure Intel
supports late microcode loading no matter the effort just so to avoid
rebooting the machine.

But now all of a sudden, if they wanna have SGX enclaves in guests, they
need to get prepared for potential rebooting.

I sure hope I'm missing something...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 18:16   ` Borislav Petkov
@ 2021-03-22 18:56     ` Sean Christopherson
  2021-03-22 19:11       ` Paolo Bonzini
  2021-03-22 19:15       ` Borislav Petkov
  0 siblings, 2 replies; 134+ messages in thread
From: Sean Christopherson @ 2021-03-22 18:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Mon, Mar 22, 2021, Borislav Petkov wrote:
> On Fri, Mar 19, 2021 at 08:22:19PM +1300, Kai Huang wrote:
> > +/**
> > + * sgx_encl_free_epc_page - free EPC page assigned to an enclave
> > + * @page:	EPC page to be freed
> > + *
> > + * Free EPC page assigned to an enclave.  It does EREMOVE for the page, and
> > + * only upon success, it puts the page back to free page list.  Otherwise, it
> > + * gives a WARNING to indicate page is leaked, and require reboot to retrieve
> > + * leaked pages.
> > + */
> > +void sgx_encl_free_epc_page(struct sgx_epc_page *page)
> > +{
> > +	int ret;
> > +
> > +	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> > +
> > +	/*
> > +	 * Give a message to remind EPC page is leaked when EREMOVE fails,
> > +	 * and requires machine reboot to get leaked pages back. This can
> > +	 * be improved in future by adding stats of leaked pages, etc.
> > +	 */
> > +#define EREMOVE_ERROR_MESSAGE \
> > +	"EREMOVE returned %d (0x%x).  EPC page leaked.  Reboot required to retrieve leaked pages."
> 
> A reboot? Seriously? Why?
> 
> How are you going to explain to cloud people that they need to reboot
> their fat server? The same cloud people who want to make sure Intel
> supports late microcode loading no matter the effort just so to avoid
> rebooting the machine.
> 
> But now all of a sudden, if they wanna have SGX enclaves in guests, they
> need to get prepared for potential rebooting.

Not necessarily.  This can only trigger in the host, and thus require a host
reboot, if the host is also running enclaves.  If the CSP is not running
enclaves, or is running its enclaves in a separate VM, then this path cannot be
reached.

> I sure hope I'm missing something...

EREMOVE can only fail if there's a kernel or hardware bug (or a VMM bug if
running as a guest).  IME, nearly every kernel/KVM bug that I introduced that
led to EREMOVE failure was also quite fatal to SGX, i.e. this is just the canary
in the coal mine.

It's certainly possible to add more sophisticated error handling, e.g. through
the pages onto a list and periodically try to recover them.  But, since the vast
majority of bugs that cause EREMOVE failure are fatal to SGX, implementing
sophisticated handling is quite low on the list of priorities.

Dave wanted the "page leaked" error message so that it's abundantly clear that
the kernel is leaking pages on EREMOVE failure and that the WARN isn't "benign".

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 18:56     ` Sean Christopherson
@ 2021-03-22 19:11       ` Paolo Bonzini
  2021-03-22 20:43         ` Kai Huang
  2021-03-22 19:15       ` Borislav Petkov
  1 sibling, 1 reply; 134+ messages in thread
From: Paolo Bonzini @ 2021-03-22 19:11 UTC (permalink / raw)
  To: Sean Christopherson, Borislav Petkov
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, tglx, mingo, hpa

On 22/03/21 19:56, Sean Christopherson wrote:
> EREMOVE can only fail if there's a kernel or hardware bug (or a VMM bug if
> running as a guest).  IME, nearly every kernel/KVM bug that I introduced that
> led to EREMOVE failure was also quite fatal to SGX, i.e. this is just the canary
> in the coal mine.

That was my recollection as well from previous threads but, to be fair 
to Boris, the commit message is a lot more scary (and, which is what 
triggers me, puts the blame on KVM).  It just says "KVM does not track 
how guest pages are used, which means that SGX virtualization use of 
EREMOVE might fail".

Paolo


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 18:56     ` Sean Christopherson
  2021-03-22 19:11       ` Paolo Bonzini
@ 2021-03-22 19:15       ` Borislav Petkov
  2021-03-22 19:37         ` Sean Christopherson
  1 sibling, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-03-22 19:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Mon, Mar 22, 2021 at 11:56:37AM -0700, Sean Christopherson wrote:
> Not necessarily.  This can only trigger in the host, and thus require a host
> reboot, if the host is also running enclaves.  If the CSP is not running
> enclaves, or is running its enclaves in a separate VM, then this path cannot be
> reached.

That's what I meant. Rebooting guests is a lot easier, ofc.

Or are you saying, this can trigger *only* when they're running enclaves
on the *host* too?

> EREMOVE can only fail if there's a kernel or hardware bug (or a VMM bug if
> running as a guest). 

We get those on a daily basis.

> IME, nearly every kernel/KVM bug that I introduced that led to EREMOVE
> failure was also quite fatal to SGX, i.e. this is just the canary in
> the coal mine.
>
> It's certainly possible to add more sophisticated error handling, e.g. through
> the pages onto a list and periodically try to recover them.  But, since the vast
> majority of bugs that cause EREMOVE failure are fatal to SGX, implementing
> sophisticated handling is quite low on the list of priorities.
> 
> Dave wanted the "page leaked" error message so that it's abundantly clear that
> the kernel is leaking pages on EREMOVE failure and that the WARN isn't "benign".

So this sounds to me like this should BUG too eventually.

Or is this one of those "this should never happen" things so no one
should worry?

Whatever it is, if an admin sees this message in dmesg and doesn't get a
lengthy explanation what she/he is supposed to do, I don't think she/he
will be as relaxed.

Hell, people open bugs for correctable ECCs and are asking whether they
need to replace their hardware.

So let's play this out: put yourself in an admin's shoes and tell me how
should an admin react when she/he sees that?

Should the kernel probably also say: "Don't worry, you have enough
memory and what's a 4K, who cares? You'll reboot eventually."

Or should the kernel say "You need to reboot ASAP."

And so on...

So what is the scenario here and what kind of reaction is that message
supposed to cause, recovery action, blabla, the whole spiel?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 19:15       ` Borislav Petkov
@ 2021-03-22 19:37         ` Sean Christopherson
  2021-03-22 20:36           ` Kai Huang
  2021-03-22 21:06           ` Borislav Petkov
  0 siblings, 2 replies; 134+ messages in thread
From: Sean Christopherson @ 2021-03-22 19:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Mon, Mar 22, 2021, Borislav Petkov wrote:
> On Mon, Mar 22, 2021 at 11:56:37AM -0700, Sean Christopherson wrote:
> > Not necessarily.  This can only trigger in the host, and thus require a host
> > reboot, if the host is also running enclaves.  If the CSP is not running
> > enclaves, or is running its enclaves in a separate VM, then this path cannot be
> > reached.
> 
> That's what I meant. Rebooting guests is a lot easier, ofc.
> 
> Or are you saying, this can trigger *only* when they're running enclaves
> on the *host* too?

Yes.  Note, it's still true if you strike out the "too", KVM support is completely
orthogonal to this code.  The purpose of this patch is to separate out the EREMOVE
path used for host enclaves (/dev/sgx_enclave), because EPC virtualization for
KVM will have non-buggy scenarios where EREMOVE can fail.  But the virt EPC code
is designed to handle that gracefully.

> > EREMOVE can only fail if there's a kernel or hardware bug (or a VMM bug if
> > running as a guest). 
> 
> We get those on a daily basis.
> 
> > IME, nearly every kernel/KVM bug that I introduced that led to EREMOVE
> > failure was also quite fatal to SGX, i.e. this is just the canary in
> > the coal mine.
> >
> > It's certainly possible to add more sophisticated error handling, e.g. through
> > the pages onto a list and periodically try to recover them.  But, since the vast
> > majority of bugs that cause EREMOVE failure are fatal to SGX, implementing
> > sophisticated handling is quite low on the list of priorities.
> > 
> > Dave wanted the "page leaked" error message so that it's abundantly clear that
> > the kernel is leaking pages on EREMOVE failure and that the WARN isn't "benign".
> 
> So this sounds to me like this should BUG too eventually.
> 
> Or is this one of those "this should never happen" things so no one
> should worry?

Hmm.  I don't think it warrants BUG.  At worst, leaking EPC pages is fatal only
to SGX.  If the underlying bug caused other fallout, e.g. didn't release a lock,
then obviously that could be fatal to the kernel.  But I don't think there's
ever a case where SGX being unusuable would prevent the kernel from functioning.
 
> Whatever it is, if an admin sees this message in dmesg and doesn't get a
> lengthy explanation what she/he is supposed to do, I don't think she/he
> will be as relaxed.
> 
> Hell, people open bugs for correctable ECCs and are asking whether they
> need to replace their hardware.

LOL.

> So let's play this out: put yourself in an admin's shoes and tell me how
> should an admin react when she/he sees that?
> 
> Should the kernel probably also say: "Don't worry, you have enough
> memory and what's a 4K, who cares? You'll reboot eventually."
 
> Or should the kernel say "You need to reboot ASAP."
> 
> And so on...
> 
> So what is the scenario here and what kind of reaction is that message
> supposed to cause, recovery action, blabla, the whole spiel?

Probably something in between.  Odds are good SGX will eventually become
unusuable, e.g. either kernel SGX support is completely hosted, or it will soon
leak the majority of EPC pages.  Something like this?

  "EREMOVE returned %d (0x%x), kernel bug likely.  EPC page leaked, SGX may become unusuable.  Reboot recommended to continue using SGX."

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 19:37         ` Sean Christopherson
@ 2021-03-22 20:36           ` Kai Huang
  2021-03-22 21:06           ` Borislav Petkov
  1 sibling, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-22 20:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Mon, 22 Mar 2021 12:37:02 -0700 Sean Christopherson wrote:
> On Mon, Mar 22, 2021, Borislav Petkov wrote:
> > On Mon, Mar 22, 2021 at 11:56:37AM -0700, Sean Christopherson wrote:
> > > Not necessarily.  This can only trigger in the host, and thus require a host
> > > reboot, if the host is also running enclaves.  If the CSP is not running
> > > enclaves, or is running its enclaves in a separate VM, then this path cannot be
> > > reached.
> > 
> > That's what I meant. Rebooting guests is a lot easier, ofc.
> > 
> > Or are you saying, this can trigger *only* when they're running enclaves
> > on the *host* too?
> 
> Yes.  Note, it's still true if you strike out the "too", KVM support is completely
> orthogonal to this code.  The purpose of this patch is to separate out the EREMOVE
> path used for host enclaves (/dev/sgx_enclave), because EPC virtualization for
> KVM will have non-buggy scenarios where EREMOVE can fail.  But the virt EPC code
> is designed to handle that gracefully.
> 
> > > EREMOVE can only fail if there's a kernel or hardware bug (or a VMM bug if
> > > running as a guest). 
> > 
> > We get those on a daily basis.
> > 
> > > IME, nearly every kernel/KVM bug that I introduced that led to EREMOVE
> > > failure was also quite fatal to SGX, i.e. this is just the canary in
> > > the coal mine.
> > >
> > > It's certainly possible to add more sophisticated error handling, e.g. through
> > > the pages onto a list and periodically try to recover them.  But, since the vast
> > > majority of bugs that cause EREMOVE failure are fatal to SGX, implementing
> > > sophisticated handling is quite low on the list of priorities.
> > > 
> > > Dave wanted the "page leaked" error message so that it's abundantly clear that
> > > the kernel is leaking pages on EREMOVE failure and that the WARN isn't "benign".
> > 
> > So this sounds to me like this should BUG too eventually.
> > 
> > Or is this one of those "this should never happen" things so no one
> > should worry?
> 
> Hmm.  I don't think it warrants BUG.  At worst, leaking EPC pages is fatal only
> to SGX.  If the underlying bug caused other fallout, e.g. didn't release a lock,
> then obviously that could be fatal to the kernel.  But I don't think there's
> ever a case where SGX being unusuable would prevent the kernel from functioning.
>  
> > Whatever it is, if an admin sees this message in dmesg and doesn't get a
> > lengthy explanation what she/he is supposed to do, I don't think she/he
> > will be as relaxed.
> > 
> > Hell, people open bugs for correctable ECCs and are asking whether they
> > need to replace their hardware.
> 
> LOL.
> 
> > So let's play this out: put yourself in an admin's shoes and tell me how
> > should an admin react when she/he sees that?
> > 
> > Should the kernel probably also say: "Don't worry, you have enough
> > memory and what's a 4K, who cares? You'll reboot eventually."
>  
> > Or should the kernel say "You need to reboot ASAP."
> > 
> > And so on...
> > 
> > So what is the scenario here and what kind of reaction is that message
> > supposed to cause, recovery action, blabla, the whole spiel?
> 
> Probably something in between.  Odds are good SGX will eventually become
> unusuable, e.g. either kernel SGX support is completely hosted, or it will soon
> leak the majority of EPC pages.  Something like this?
> 
>   "EREMOVE returned %d (0x%x), kernel bug likely.  EPC page leaked, SGX may become unusuable.  Reboot recommended to continue using SGX."

Or perhaps just stick to old behavior in original sgx_free_epc_page()?

	ret = __eremove(sgx_get_epc_virt_addr(page));
	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
		return;

This code path is only used by host SGX driver, but not KVM. And this patch's
*main* intention is to break EREMOVE out of sgx_free_epc_page() so virtual EPC
code can use sgx_free_epc_page(). 

Improving the error msg can be a separate discussion and separate patch which
can be done in the future, and this has nothing to do with SGX virtualization
support.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 19:11       ` Paolo Bonzini
@ 2021-03-22 20:43         ` Kai Huang
  2021-03-23 16:40           ` Paolo Bonzini
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-22 20:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Borislav Petkov, kvm, x86, linux-sgx,
	linux-kernel, jarkko, luto, dave.hansen, rick.p.edgecombe,
	haitao.huang, tglx, mingo, hpa

On Mon, 22 Mar 2021 20:11:57 +0100 Paolo Bonzini wrote:
> On 22/03/21 19:56, Sean Christopherson wrote:
> > EREMOVE can only fail if there's a kernel or hardware bug (or a VMM bug if
> > running as a guest).  IME, nearly every kernel/KVM bug that I introduced that
> > led to EREMOVE failure was also quite fatal to SGX, i.e. this is just the canary
> > in the coal mine.
> 
> That was my recollection as well from previous threads but, to be fair 
> to Boris, the commit message is a lot more scary (and, which is what 
> triggers me, puts the blame on KVM).  It just says "KVM does not track 
> how guest pages are used, which means that SGX virtualization use of 
> EREMOVE might fail".

I don't see the commit msg being scary.  EREMOVE might fail but virtual EPC code
can handle that.  This is the reason to break out EREMOVE from original
sgx_free_epc_page(), so virtual EPC code can have its own logic of handling
EREMOVE failure.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 19:37         ` Sean Christopherson
  2021-03-22 20:36           ` Kai Huang
@ 2021-03-22 21:06           ` Borislav Petkov
  2021-03-22 22:06             ` Kai Huang
  2021-03-22 22:23             ` Kai Huang
  1 sibling, 2 replies; 134+ messages in thread
From: Borislav Petkov @ 2021-03-22 21:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Mon, Mar 22, 2021 at 12:37:02PM -0700, Sean Christopherson wrote:
> Yes.  Note, it's still true if you strike out the "too", KVM support is completely
> orthogonal to this code.  The purpose of this patch is to separate out the EREMOVE
> path used for host enclaves (/dev/sgx_enclave), because EPC virtualization for
> KVM will have non-buggy scenarios where EREMOVE can fail.  But the virt EPC code
> is designed to handle that gracefully.

"gracefully" as it won't leak EPC pages which would require a host reboot? That
leaking is done by host enclaves only?

> Hmm.  I don't think it warrants BUG.  At worst, leaking EPC pages is fatal only
> to SGX.

Fatal how? If it keeps leaking, at some point it won't have any pages
for EPC pages anymore?

Btw, I probably have seen this and forgotten again so pls remind me,
is the amount of pages available for SGX use static and limited by,
I believe BIOS, or can a leakage in EPC pages cause system memory
shortage?

> If the underlying bug caused other fallout, e.g. didn't release a
> lock, then obviously that could be fatal to the kernel. But I don't
> think there's ever a case where SGX being unusuable would prevent the
> kernel from functioning.

This kinda replies my question above but still...

> Probably something in between.  Odds are good SGX will eventually become
> unusuable, e.g. either kernel SGX support is completely hosted, or it will soon
> leak the majority of EPC pages.  Something like this?
> 
>   "EREMOVE returned %d (0x%x), kernel bug likely.  EPC page leaked, SGX may become unusuable.  Reboot recommended to continue using SGX."

So all this handwaving I'm doing is to provoke a proper response from
you guys as to how a EPC page leaking is supposed to be handled by the
users of the technology:

1. Issue a warning message and forget about it, eventual reboot

2. Really scary message to make users reboot sooner

3. Detect when host enclaves are run while guest enclaves are running
and issue a warning then.

4. Fall on knees and pray to not get sued by customers because their
enclaves are not working anymore.

....

Btw, 4. needs to be considered properly so that people can cover asses.

Oh and whatever we end up deciding, we should document that in
Documentation/... somewhere and point users to it in that warning
message where a longer treatise is explaining the whole deal properly.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 21:06           ` Borislav Petkov
@ 2021-03-22 22:06             ` Kai Huang
  2021-03-22 22:37               ` Borislav Petkov
  2021-03-22 22:23             ` Kai Huang
  1 sibling, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-22 22:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sean Christopherson, kvm, x86, linux-sgx, linux-kernel, jarkko,
	luto, dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini,
	tglx, mingo, hpa

On Mon, 22 Mar 2021 22:06:45 +0100 Borislav Petkov wrote:
> On Mon, Mar 22, 2021 at 12:37:02PM -0700, Sean Christopherson wrote:
> > Yes.  Note, it's still true if you strike out the "too", KVM support is completely
> > orthogonal to this code.  The purpose of this patch is to separate out the EREMOVE
> > path used for host enclaves (/dev/sgx_enclave), because EPC virtualization for
> > KVM will have non-buggy scenarios where EREMOVE can fail.  But the virt EPC code
> > is designed to handle that gracefully.
> 
> "gracefully" as it won't leak EPC pages which would require a host reboot? That
> leaking is done by host enclaves only?

This path is called by host SGX driver only, so yes this leaking is done by
host enclaves only.

This patch is purpose is to break EREMOVE out of sgx_free_epc_page() so virtual
EPC code can use sgx_free_epc_page(), and handle EREMOVE logic differently.
This patch doesn't have intention to bring functional change.  I changed the
error msg because Dave said it worth to give a message saying EPC page is
leaked, and I thought this minor change won't break anything.

Perpahps we can avoid changing error message but stick to existing SGX driver
behavior? 

> 
> > Hmm.  I don't think it warrants BUG.  At worst, leaking EPC pages is fatal only
> > to SGX.
> 
> Fatal how? If it keeps leaking, at some point it won't have any pages
> for EPC pages anymore?
> 
> Btw, I probably have seen this and forgotten again so pls remind me,
> is the amount of pages available for SGX use static and limited by,
> I believe BIOS, or can a leakage in EPC pages cause system memory
> shortage?
> 
> > If the underlying bug caused other fallout, e.g. didn't release a
> > lock, then obviously that could be fatal to the kernel. But I don't
> > think there's ever a case where SGX being unusuable would prevent the
> > kernel from functioning.
> 
> This kinda replies my question above but still...
> 
> > Probably something in between.  Odds are good SGX will eventually become
> > unusuable, e.g. either kernel SGX support is completely hosted, or it will soon
> > leak the majority of EPC pages.  Something like this?
> > 
> >   "EREMOVE returned %d (0x%x), kernel bug likely.  EPC page leaked, SGX may become unusuable.  Reboot recommended to continue using SGX."
> 
> So all this handwaving I'm doing is to provoke a proper response from
> you guys as to how a EPC page leaking is supposed to be handled by the
> users of the technology:
> 
> 1. Issue a warning message and forget about it, eventual reboot

This is the existing SGX driver behavior IMHO. It just gives a WARN() saying
EREMOVE failed with the error code printed.  The EPC page is leaked w/o any
message to user.  I can live with it, and it is existing code anyway.

btw, currently virtual EPC code (patch 5) handles in similar way: There's one
EREMOVE error is expected and virtual EPC code can handle, but for other
errors, it means kernel bug, and virtual EPC code gives a WARN(), and that EPC
page is leaked too:

+		WARN_ONCE(ret != SGX_CHILD_PRESENT,
+			  "EREMOVE (EPC page 0x%lx): unexpected error: %d\n",
+			  sgx_get_epc_phys_addr(epc_page), ret);
+		return ret;

So to me they are just WARN() to catch kernel bug.

> 
> 2. Really scary message to make users reboot sooner
> 
> 3. Detect when host enclaves are run while guest enclaves are running
> and issue a warning then.

This code path has nothing to do with guest enclaves.

> 
> 4. Fall on knees and pray to not get sued by customers because their
> enclaves are not working anymore.
> 
> ....
> 
> Btw, 4. needs to be considered properly so that people can cover asses.

If we are talking about CSPs being unable to provide correct services to
customers due to kernel bug, I think this is a bigger question but not
just related to SGX,  since other kernel bug can also cause similar problem, for
instance, VM or SGX process itself being killed.

> 
> Oh and whatever we end up deciding, we should document that in
> Documentation/... somewhere and point users to it in that warning
> message where a longer treatise is explaining the whole deal properly.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 21:06           ` Borislav Petkov
  2021-03-22 22:06             ` Kai Huang
@ 2021-03-22 22:23             ` Kai Huang
  1 sibling, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-22 22:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sean Christopherson, kvm, x86, linux-sgx, linux-kernel, jarkko,
	luto, dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini,
	tglx, mingo, hpa


> 
> Btw, I probably have seen this and forgotten again so pls remind me,
> is the amount of pages available for SGX use static and limited by,
> I believe BIOS, or can a leakage in EPC pages cause system memory
> shortage?
> 

Yes EPC size is fixed and configured in BIOS. Leaking EPC pages may cause EPC
shortage, but not system memory.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 22:06             ` Kai Huang
@ 2021-03-22 22:37               ` Borislav Petkov
  2021-03-22 23:16                 ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-03-22 22:37 UTC (permalink / raw)
  To: Kai Huang
  Cc: Sean Christopherson, kvm, x86, linux-sgx, linux-kernel, jarkko,
	luto, dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini,
	tglx, mingo, hpa

On Tue, Mar 23, 2021 at 11:06:43AM +1300, Kai Huang wrote:
> This path is called by host SGX driver only, so yes this leaking is done by
> host enclaves only.

Yes, so I was told.

> This patch is purpose is to break EREMOVE out of sgx_free_epc_page() so virtual
> EPC code can use sgx_free_epc_page(), and handle EREMOVE logic differently.
> This patch doesn't have intention to bring functional change.  I changed the
> error msg because Dave said it worth to give a message saying EPC page is
> leaked, and I thought this minor change won't break anything.

I read that already - you don't have to repeat it.

> btw, currently virtual EPC code (patch 5) handles in similar way: There's one
> EREMOVE error is expected and virtual EPC code can handle, but for other
> errors, it means kernel bug, and virtual EPC code gives a WARN(), and that EPC
> page is leaked too:
> 
> +		WARN_ONCE(ret != SGX_CHILD_PRESENT,
> +			  "EREMOVE (EPC page 0x%lx): unexpected error: %d\n",
> +			  sgx_get_epc_phys_addr(epc_page), ret);
> +		return ret;
> 
> So to me they are just WARN() to catch kernel bug.

You don't care about users, do you? Because when that error happens,
they won't come crying to you to fix it.

Lemme save you some trouble: we don't take half-baked code into the
kernel until stuff has been discussed and analyzed properly. So instead
of trying to downplay this, try answering my questions.

Here's another one: when does EREMOVE fail?

/me goes and looks it up

"The instruction fails if the operand is not properly aligned or does
not refer to an EPC page or the page is in use by another thread, or
other threads are running in the enclave to which the page belongs. In
addition the instruction fails if the operand refers to an SECS with
associations."

And I guess those conditions will become more in the future.

Now, let's play. I'm the cloud admin and you're cloud OS customer
support. I say:

"I got this scary error message while running enclaves on my server

"EREMOVE returned ... .  EPC page leaked.  Reboot required to retrieve leaked pages."

but I cannot reboot that machine because there are guests running on it
and I'm getting paid for those guests and I might get sued if I do?"

Your turn, go wild.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 22:37               ` Borislav Petkov
@ 2021-03-22 23:16                 ` Kai Huang
  2021-03-23 15:45                   ` Sean Christopherson
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-22 23:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sean Christopherson, kvm, x86, linux-sgx, linux-kernel, jarkko,
	luto, dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini,
	tglx, mingo, hpa

On Mon, 22 Mar 2021 23:37:26 +0100 Borislav Petkov wrote:
> On Tue, Mar 23, 2021 at 11:06:43AM +1300, Kai Huang wrote:
> > This path is called by host SGX driver only, so yes this leaking is done by
> > host enclaves only.
> 
> Yes, so I was told.
> 
> > This patch is purpose is to break EREMOVE out of sgx_free_epc_page() so virtual
> > EPC code can use sgx_free_epc_page(), and handle EREMOVE logic differently.
> > This patch doesn't have intention to bring functional change.  I changed the
> > error msg because Dave said it worth to give a message saying EPC page is
> > leaked, and I thought this minor change won't break anything.
> 
> I read that already - you don't have to repeat it.
> 
> > btw, currently virtual EPC code (patch 5) handles in similar way: There's one
> > EREMOVE error is expected and virtual EPC code can handle, but for other
> > errors, it means kernel bug, and virtual EPC code gives a WARN(), and that EPC
> > page is leaked too:
> > 
> > +		WARN_ONCE(ret != SGX_CHILD_PRESENT,
> > +			  "EREMOVE (EPC page 0x%lx): unexpected error: %d\n",
> > +			  sgx_get_epc_phys_addr(epc_page), ret);
> > +		return ret;
> > 
> > So to me they are just WARN() to catch kernel bug.
> 
> You don't care about users, do you? Because when that error happens,
> they won't come crying to you to fix it.
> 
> Lemme save you some trouble: we don't take half-baked code into the
> kernel until stuff has been discussed and analyzed properly. So instead
> of trying to downplay this, try answering my questions.
> 
> Here's another one: when does EREMOVE fail?
> 
> /me goes and looks it up
> 
> "The instruction fails if the operand is not properly aligned or does
> not refer to an EPC page or the page is in use by another thread, or
> other threads are running in the enclave to which the page belongs. In
> addition the instruction fails if the operand refers to an SECS with
> associations."
> 
> And I guess those conditions will become more in the future.
> 
> Now, let's play. I'm the cloud admin and you're cloud OS customer
> support. I say:
> 
> "I got this scary error message while running enclaves on my server
> 
> "EREMOVE returned ... .  EPC page leaked.  Reboot required to retrieve leaked pages."
> 
> but I cannot reboot that machine because there are guests running on it
> and I'm getting paid for those guests and I might get sued if I do?"
> 
> Your turn, go wild.

I suppose admin can migrate those VMs, and then engineers can analyse the root
cause of such failure, and then fix it.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 23:16                 ` Kai Huang
@ 2021-03-23 15:45                   ` Sean Christopherson
  2021-03-23 16:06                     ` Borislav Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Sean Christopherson @ 2021-03-23 15:45 UTC (permalink / raw)
  To: Kai Huang
  Cc: Borislav Petkov, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, Mar 23, 2021, Kai Huang wrote:
> On Mon, 22 Mar 2021 23:37:26 +0100 Borislav Petkov wrote:
> > "The instruction fails if the operand is not properly aligned or does
> > not refer to an EPC page or the page is in use by another thread, or
> > other threads are running in the enclave to which the page belongs. In
> > addition the instruction fails if the operand refers to an SECS with
> > associations."
> > 
> > And I guess those conditions will become more in the future.

Yep, IME these types of bugs rarely, if ever, lead to isolated failures.

> > Now, let's play. I'm the cloud admin and you're cloud OS customer
> > support. I say:
> > 
> > "I got this scary error message while running enclaves on my server
> > 
> > "EREMOVE returned ... .  EPC page leaked.  Reboot required to retrieve leaked pages."
> > 
> > but I cannot reboot that machine because there are guests running on it
> > and I'm getting paid for those guests and I might get sued if I do?"
> > 
> > Your turn, go wild.
> 
> I suppose admin can migrate those VMs, and then engineers can analyse the root
> cause of such failure, and then fix it.

That's more than likely what will happen, though there are a lot of "ifs" and
"buts" in any answer, e.g. things will go downhill fast if the majority of
systems in the fleet are running the buggy kernel and are triggering the error.

Practically speaking, "basic" deployments of SGX VMs will be insulated from
this bug.  KVM doesn't support EPC oversubscription, so even if all EPC is
exhausted, new VMs will fail to launch, but existing VMs will continue to chug
along with no ill effects.  There are again caveats, e.g. if EPC is being lazily
allocated for VMs, then running VMs will be affected if a VM starts using SGX
after the leak in the host occurs.  But, IMO doing lazy allocation _and_ running
enclaves in the host falls firmly into the "advanced" bucket; anyone going that
route had better do their homework to understand the various EPC interactions.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 15:45                   ` Sean Christopherson
@ 2021-03-23 16:06                     ` Borislav Petkov
  2021-03-23 16:21                       ` Sean Christopherson
                                         ` (2 more replies)
  0 siblings, 3 replies; 134+ messages in thread
From: Borislav Petkov @ 2021-03-23 16:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, Mar 23, 2021 at 03:45:14PM +0000, Sean Christopherson wrote:
> Practically speaking, "basic" deployments of SGX VMs will be insulated from
> this bug.  KVM doesn't support EPC oversubscription, so even if all EPC is
> exhausted, new VMs will fail to launch, but existing VMs will continue to chug
> along with no ill effects....

Ok, so it sounds to me like *at* *least* there should be some writeup in
Documentation/ explaining to the user what to do when she sees such an
EREMOVE failure, perhaps the gist of this thread and then possibly the
error message should point to that doc.

We will of course have to revisit when this hits the wild and people
start (or not) hitting this. But judging by past experience, if it is
there, we will hit it. Murphy says so.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 16:06                     ` Borislav Petkov
@ 2021-03-23 16:21                       ` Sean Christopherson
  2021-03-23 16:32                         ` Borislav Petkov
  2021-03-24  9:28                         ` Jarkko Sakkinen
  2021-03-23 16:38                       ` Paolo Bonzini
  2021-03-24  9:26                       ` Jarkko Sakkinen
  2 siblings, 2 replies; 134+ messages in thread
From: Sean Christopherson @ 2021-03-23 16:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, Mar 23, 2021, Borislav Petkov wrote:
> On Tue, Mar 23, 2021 at 03:45:14PM +0000, Sean Christopherson wrote:
> > Practically speaking, "basic" deployments of SGX VMs will be insulated from
> > this bug.  KVM doesn't support EPC oversubscription, so even if all EPC is
> > exhausted, new VMs will fail to launch, but existing VMs will continue to chug
> > along with no ill effects....
> 
> Ok, so it sounds to me like *at* *least* there should be some writeup in
> Documentation/ explaining to the user what to do when she sees such an
> EREMOVE failure, perhaps the gist of this thread and then possibly the
> error message should point to that doc.
> 
> We will of course have to revisit when this hits the wild and people
> start (or not) hitting this. But judging by past experience, if it is
> there, we will hit it. Murphy says so.

I like the idea of pointing at the documentation.  The documentation should
probably emphasize that something is very, very wrong.  E.g. if a kernel bug
triggers EREMOVE failure and isn't detected until the kernel is widely deployed
in a fleet, then the folks deploying the kernel probably _should_ be in all out
panic.  For this variety of bug to escape that far, it means there are huge
holes in test coverage, in both the kernel itself and in the infrasturcture of
whoever is rolling out their new kernel.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 16:21                       ` Sean Christopherson
@ 2021-03-23 16:32                         ` Borislav Petkov
  2021-03-23 16:51                           ` Sean Christopherson
  2021-03-24  9:38                           ` Kai Huang
  2021-03-24  9:28                         ` Jarkko Sakkinen
  1 sibling, 2 replies; 134+ messages in thread
From: Borislav Petkov @ 2021-03-23 16:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, Mar 23, 2021 at 04:21:47PM +0000, Sean Christopherson wrote:
> I like the idea of pointing at the documentation.  The documentation should
> probably emphasize that something is very, very wrong.

Yap, because no matter how we formulate the error message, it still ain't enough
and needs a longer explanation.

> E.g. if a kernel bug triggers EREMOVE failure and isn't detected until
> the kernel is widely deployed in a fleet, then the folks deploying the
> kernel probably _should_ be in all out panic. For this variety of bug
> to escape that far, it means there are huge holes in test coverage, in
> both the kernel itself and in the infrasturcture of whoever is rolling
> out their new kernel.

You sound just like someone who works at a company with a big fleet, oh
wait...

:-)

And yap, you big fleeted guys will more likely catch it but we do have
all these other customers who have a handful of servers only so they
probably won't be able to do such a wide coverage.

So I hope they'll appreciate this longer explanation about what to do
when they hit it. And normally I wouldn't even care but we almost never
tell people to reboot their boxes to fix sh*t - that's the other OS.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 16:06                     ` Borislav Petkov
  2021-03-23 16:21                       ` Sean Christopherson
@ 2021-03-23 16:38                       ` Paolo Bonzini
  2021-03-23 17:02                         ` Sean Christopherson
  2021-03-24  9:26                       ` Jarkko Sakkinen
  2 siblings, 1 reply; 134+ messages in thread
From: Paolo Bonzini @ 2021-03-23 16:38 UTC (permalink / raw)
  To: Borislav Petkov, Sean Christopherson
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, tglx, mingo, hpa

On 23/03/21 17:06, Borislav Petkov wrote:
>> Practically speaking, "basic" deployments of SGX VMs will be insulated from
>> this bug.  KVM doesn't support EPC oversubscription, so even if all EPC is
>> exhausted, new VMs will fail to launch, but existing VMs will continue to chug
>> along with no ill effects....
>
> Ok, so it sounds to me like*at*  *least*  there should be some writeup in
> Documentation/ explaining to the user what to do when she sees such an
> EREMOVE failure, perhaps the gist of this thread and then possibly the
> error message should point to that doc.

That's important, but it's even more important *to developers* that the 
commit message spells out why this would be a kernel bug more often than 
not.  I for one do not understand it, and I suspect I'm not alone.

Maybe (optimistically) once we see that explanation we decide that the 
documentation is not important.  Sean, Kai, can you explain it?

Paolo


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-22 20:43         ` Kai Huang
@ 2021-03-23 16:40           ` Paolo Bonzini
  0 siblings, 0 replies; 134+ messages in thread
From: Paolo Bonzini @ 2021-03-23 16:40 UTC (permalink / raw)
  To: Kai Huang
  Cc: Sean Christopherson, Borislav Petkov, kvm, x86, linux-sgx,
	linux-kernel, jarkko, luto, dave.hansen, rick.p.edgecombe,
	haitao.huang, tglx, mingo, hpa

On 22/03/21 21:43, Kai Huang wrote:
>> That was my recollection as well from previous threads but, to be fair
>> to Boris, the commit message is a lot more scary (and, which is what
>> triggers me, puts the blame on KVM).  It just says "KVM does not track
>> how guest pages are used, which means that SGX virtualization use of
>> EREMOVE might fail".
>
> I don't see the commit msg being scary.  EREMOVE might fail but virtual EPC code
> can handle that.  This is the reason to break out EREMOVE from original
> sgx_free_epc_page(), so virtual EPC code can have its own logic of handling
> EREMOVE failure.

I should explain what I mean by scary.

What you wrote above, "EREMOVE might fail but virtual EPC code can 
handle that" sounds fine.  But it doesn't say the failure mode, so it's 
hiding information.

What I would like to have, "EREMOVE might fail and will be leaked, but 
virtual EPC code will not crash and in any case there are much worse 
problems waiting to happen" is fine.  (It's even better with an 
explanation of the problems).

Your message however was in the middle: "EREMOVE might fail, virtual EPC 
code will not crash but the page will be leaked".  It gives the failure 
mode but not how the problem arises, and it is this combination that 
results in something scary-sounding.

Paolo


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 16:32                         ` Borislav Petkov
@ 2021-03-23 16:51                           ` Sean Christopherson
  2021-03-24  9:38                           ` Kai Huang
  1 sibling, 0 replies; 134+ messages in thread
From: Sean Christopherson @ 2021-03-23 16:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, Mar 23, 2021, Borislav Petkov wrote:
> On Tue, Mar 23, 2021 at 04:21:47PM +0000, Sean Christopherson wrote:
> > I like the idea of pointing at the documentation.  The documentation should
> > probably emphasize that something is very, very wrong.
> 
> Yap, because no matter how we formulate the error message, it still ain't enough
> and needs a longer explanation.
> 
> > E.g. if a kernel bug triggers EREMOVE failure and isn't detected until
> > the kernel is widely deployed in a fleet, then the folks deploying the
> > kernel probably _should_ be in all out panic. For this variety of bug
> > to escape that far, it means there are huge holes in test coverage, in
> > both the kernel itself and in the infrasturcture of whoever is rolling
> > out their new kernel.
> 
> You sound just like someone who works at a company with a big fleet, oh
> wait...
> 
> :-)
> 
> And yap, you big fleeted guys will more likely catch it but we do have
> all these other customers who have a handful of servers only so they
> probably won't be able to do such a wide coverage.

The size of the fleet shouldn't matter for this specific case.  This bug
requires the _host_ to be running enclaves, and obviously it also requires the
system to be running SGX-enabled guests as well.  In such a setup, the SGX
workload running in the host should be very well defined and understood, i.e.
testing should be a well-bounded problem to solve.

Running enclaves in both the host and guest should be uncommon in and of itself,
and for such setups, running _any_ SGX workloads in the host, let alone more
than 1 or 2 unique workloads, without ensuring guests are fully isolated is,
IMO, insane.

But yeah, what can happen, will happen.
 
> So I hope they'll appreciate this longer explanation about what to do
> when they hit it. And normally I wouldn't even care but we almost never
> tell people to reboot their boxes to fix sh*t - that's the other OS.
> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 16:38                       ` Paolo Bonzini
@ 2021-03-23 17:02                         ` Sean Christopherson
  2021-03-23 17:06                           ` Paolo Bonzini
  0 siblings, 1 reply; 134+ messages in thread
From: Sean Christopherson @ 2021-03-23 17:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Borislav Petkov, Kai Huang, kvm, x86, linux-sgx, linux-kernel,
	jarkko, luto, dave.hansen, rick.p.edgecombe, haitao.huang, tglx,
	mingo, hpa

On Tue, Mar 23, 2021, Paolo Bonzini wrote:
> On 23/03/21 17:06, Borislav Petkov wrote:
> > > Practically speaking, "basic" deployments of SGX VMs will be insulated from
> > > this bug.  KVM doesn't support EPC oversubscription, so even if all EPC is
> > > exhausted, new VMs will fail to launch, but existing VMs will continue to chug
> > > along with no ill effects....
> > 
> > Ok, so it sounds to me like*at*  *least*  there should be some writeup in
> > Documentation/ explaining to the user what to do when she sees such an
> > EREMOVE failure, perhaps the gist of this thread and then possibly the
> > error message should point to that doc.
> 
> That's important, but it's even more important *to developers* that the
> commit message spells out why this would be a kernel bug more often than
> not.  I for one do not understand it, and I suspect I'm not alone.
> 
> Maybe (optimistically) once we see that explanation we decide that the
> documentation is not important.  Sean, Kai, can you explain it?

Thought of a good analogy that can be used for the changelog and/or docs:

This is effectively a kernel use-after-free of EPC, and due to the way SGX works,
the bug is detected at freeing.  Rather than add the page back to the pool of
available EPC, the kernel intentionally leaks the page to avoid additional
errors in the future.

Does that help?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 17:02                         ` Sean Christopherson
@ 2021-03-23 17:06                           ` Paolo Bonzini
  2021-03-23 17:16                             ` Sean Christopherson
  2021-03-23 18:16                             ` Borislav Petkov
  0 siblings, 2 replies; 134+ messages in thread
From: Paolo Bonzini @ 2021-03-23 17:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, Kai Huang, kvm, x86, linux-sgx, linux-kernel,
	jarkko, luto, dave.hansen, rick.p.edgecombe, haitao.huang, tglx,
	mingo, hpa

On 23/03/21 18:02, Sean Christopherson wrote:
>> That's important, but it's even more important *to developers* that the
>> commit message spells out why this would be a kernel bug more often than
>> not.  I for one do not understand it, and I suspect I'm not alone.
>> 
>> Maybe (optimistically) once we see that explanation we decide that the
>> documentation is not important.  Sean, Kai, can you explain it?
>
> Thought of a good analogy that can be used for the changelog and/or docs:
> 
> This is effectively a kernel use-after-free of EPC, and due to the way SGX works,
> the bug is detected at freeing.  Rather than add the page back to the pool of
> available EPC, the kernel intentionally leaks the page to avoid additional
> errors in the future.
> 
> Does that help?

Very much, and for me this also settles the question of documentation. 
Borislav or Kai, can you add it to the commit message?

Paolo


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 17:06                           ` Paolo Bonzini
@ 2021-03-23 17:16                             ` Sean Christopherson
  2021-03-23 18:16                             ` Borislav Petkov
  1 sibling, 0 replies; 134+ messages in thread
From: Sean Christopherson @ 2021-03-23 17:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Borislav Petkov, Kai Huang, kvm, x86, linux-sgx, linux-kernel,
	jarkko, luto, dave.hansen, rick.p.edgecombe, haitao.huang, tglx,
	mingo, hpa

On Tue, Mar 23, 2021, Paolo Bonzini wrote:
> On 23/03/21 18:02, Sean Christopherson wrote:
> > > That's important, but it's even more important *to developers* that the
> > > commit message spells out why this would be a kernel bug more often than
> > > not.  I for one do not understand it, and I suspect I'm not alone.
> > > 
> > > Maybe (optimistically) once we see that explanation we decide that the
> > > documentation is not important.  Sean, Kai, can you explain it?
> > 
> > Thought of a good analogy that can be used for the changelog and/or docs:
> > 
> > This is effectively a kernel use-after-free of EPC, and due to the way SGX works,
> > the bug is detected at freeing.  Rather than add the page back to the pool of
> > available EPC, the kernel intentionally leaks the page to avoid additional
> > errors in the future.
> > 
> > Does that help?
> 
> Very much, and for me this also settles the question of documentation.
> Borislav or Kai, can you add it to the commit message?

One last thought.  This error/WARN doesn't guarantee that a conflict hasn't
already occurred, e.g. the EPC page was prematurely put back on the list and
already handed out to a second enclave.  In that case there will undoubtedly be
a slew of WARNs/errors leading up to this one, I just wanted to clarify that
intentionally leaking the page doesn't magically cure _all_ use-after-free or
double-use bugs.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 17:06                           ` Paolo Bonzini
  2021-03-23 17:16                             ` Sean Christopherson
@ 2021-03-23 18:16                             ` Borislav Petkov
  1 sibling, 0 replies; 134+ messages in thread
From: Borislav Petkov @ 2021-03-23 18:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Kai Huang, kvm, x86, linux-sgx,
	linux-kernel, jarkko, luto, dave.hansen, rick.p.edgecombe,
	haitao.huang, tglx, mingo, hpa

On Tue, Mar 23, 2021 at 06:06:19PM +0100, Paolo Bonzini wrote:
> Very much, and for me this also settles the question of documentation.
> Borislav or Kai, can you add it to the commit message?

Not only the commit message - that will become hard to find over time. I
believe Documentation/x86/sgx.rst is a good place to put the gist of it
in and refer to it in the warning message.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 16:06                     ` Borislav Petkov
  2021-03-23 16:21                       ` Sean Christopherson
  2021-03-23 16:38                       ` Paolo Bonzini
@ 2021-03-24  9:26                       ` Jarkko Sakkinen
  2 siblings, 0 replies; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-24  9:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sean Christopherson, Kai Huang, kvm, x86, linux-sgx,
	linux-kernel, luto, dave.hansen, rick.p.edgecombe, haitao.huang,
	pbonzini, tglx, mingo, hpa

On Tue, Mar 23, 2021 at 05:06:04PM +0100, Borislav Petkov wrote:
> On Tue, Mar 23, 2021 at 03:45:14PM +0000, Sean Christopherson wrote:
> > Practically speaking, "basic" deployments of SGX VMs will be insulated from
> > this bug.  KVM doesn't support EPC oversubscription, so even if all EPC is
> > exhausted, new VMs will fail to launch, but existing VMs will continue to chug
> > along with no ill effects....
> 
> Ok, so it sounds to me like *at* *least* there should be some writeup in
> Documentation/ explaining to the user what to do when she sees such an
> EREMOVE failure, perhaps the gist of this thread and then possibly the
> error message should point to that doc.
> 
> We will of course have to revisit when this hits the wild and people
> start (or not) hitting this. But judging by past experience, if it is
> there, we will hit it. Murphy says so.
> 
> Thx.

We had recently a steady flush of bug reports about a WARN() in tpm_tis
driver, from all levels of involvement with the kernel. Even people who
don't know what kernel documentation is, got their message through.

When a WARN() triggers anywhere in the kernel, what people tend to do is
that they go to the distro bugzilla, and the issue is quickly escalated
to the corresponding maintainer.

So, what is the part missing from the equation that should be documented
to the kernel documentation. This not a counter argument per se, I just
don't fully understand what is the missing piece that should be put there.

> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 

/Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 16:21                       ` Sean Christopherson
  2021-03-23 16:32                         ` Borislav Petkov
@ 2021-03-24  9:28                         ` Jarkko Sakkinen
  1 sibling, 0 replies; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-24  9:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, Kai Huang, kvm, x86, linux-sgx, linux-kernel,
	luto, dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini,
	tglx, mingo, hpa

On Tue, Mar 23, 2021 at 04:21:47PM +0000, Sean Christopherson wrote:
> On Tue, Mar 23, 2021, Borislav Petkov wrote:
> > On Tue, Mar 23, 2021 at 03:45:14PM +0000, Sean Christopherson wrote:
> > > Practically speaking, "basic" deployments of SGX VMs will be insulated from
> > > this bug.  KVM doesn't support EPC oversubscription, so even if all EPC is
> > > exhausted, new VMs will fail to launch, but existing VMs will continue to chug
> > > along with no ill effects....
> > 
> > Ok, so it sounds to me like *at* *least* there should be some writeup in
> > Documentation/ explaining to the user what to do when she sees such an
> > EREMOVE failure, perhaps the gist of this thread and then possibly the
> > error message should point to that doc.
> > 
> > We will of course have to revisit when this hits the wild and people
> > start (or not) hitting this. But judging by past experience, if it is
> > there, we will hit it. Murphy says so.
> 
> I like the idea of pointing at the documentation.  The documentation should
> probably emphasize that something is very, very wrong.  E.g. if a kernel bug
> triggers EREMOVE failure and isn't detected until the kernel is widely deployed
> in a fleet, then the folks deploying the kernel probably _should_ be in all out
> panic.  For this variety of bug to escape that far, it means there are huge
> holes in test coverage, in both the kernel itself and in the infrasturcture of
> whoever is rolling out their new kernel.

My own experience with WARN()'s has been so far, that the stack trace does
the job fairly well. It's commonly misintepreted same as oops.

/Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-23 16:32                         ` Borislav Petkov
  2021-03-23 16:51                           ` Sean Christopherson
@ 2021-03-24  9:38                           ` Kai Huang
  2021-03-24 10:09                             ` Paolo Bonzini
  1 sibling, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-24  9:38 UTC (permalink / raw)
  To: Borislav Petkov, Sean Christopherson
  Cc: kvm, x86, linux-sgx, linux-kernel, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, tglx, mingo, hpa

On Tue, 2021-03-23 at 17:32 +0100, Borislav Petkov wrote:
> On Tue, Mar 23, 2021 at 04:21:47PM +0000, Sean Christopherson wrote:
> > I like the idea of pointing at the documentation.  The documentation should
> > probably emphasize that something is very, very wrong.
> 
> Yap, because no matter how we formulate the error message, it still ain't enough
> and needs a longer explanation.
> 
> > E.g. if a kernel bug triggers EREMOVE failure and isn't detected until
> > the kernel is widely deployed in a fleet, then the folks deploying the
> > kernel probably _should_ be in all out panic. For this variety of bug
> > to escape that far, it means there are huge holes in test coverage, in
> > both the kernel itself and in the infrasturcture of whoever is rolling
> > out their new kernel.
> 
> You sound just like someone who works at a company with a big fleet, oh
> wait...
> 
> :-)
> 
> And yap, you big fleeted guys will more likely catch it but we do have
> all these other customers who have a handful of servers only so they
> probably won't be able to do such a wide coverage.
> 
> So I hope they'll appreciate this longer explanation about what to do
> when they hit it. And normally I wouldn't even care but we almost never
> tell people to reboot their boxes to fix sh*t - that's the other OS.
> 
> Thx.
> 

Hi Sean, Boris, Paolo,

Thanks for the discussion. I tried to digest all your conversations and
hopefully I have understood you correctly. I pasted the new patch here
(not full patch, but relevant part only). I modified the error msg, added
some writeup to Documentation/x86/sgx.rst, and put Sean's explanation of this
bug to the commit msg (per Paolo). I am terrible Documentation writer, so
please help to check and give comments. Thanks!

---
commit 1e297a535bcb4f51a08343c40207520017d85efe (HEAD)
Author: Kai Huang <kai.huang@intel.com>
Date:   Wed Jan 20 03:40:53 2021 +0200

    x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
    
    EREMOVE takes a page and removes any association between that page and
    an enclave.  It must be run on a page before it can be added into
    another enclave.  Currently, EREMOVE is run as part of pages being freed
    into the SGX page allocator.  It is not expected to fail.
    
    KVM does not track how guest pages are used, which means that SGX
    virtualization use of EREMOVE might fail.  Specifically, it is
    legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
    KVM guest, because KVM/kernel doesn't track SECS pages.
    
    Break out the EREMOVE call from the SGX page allocator.  This will allow
    the SGX virtualization code to use the allocator directly.  (SGX/KVM
    will also introduce a more permissive EREMOVE helper).
    
    Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
    more specific that it is used to free EPC page assigned host enclave.
    Replace sgx_free_epc_page() with sgx_encl_free_epc_page() in all call
    sites so there's no functional change.
    
    Improve error message when EREMOVE fails, and kernel is about to leak
    EPC page, which is likely a kernel bug.  This is effectively a kernel
    use-after-free of EPC, and due to the way SGX works, the bug is detected
    at freeing.  Rather than add the page back to the pool of available EPC,
    the kernel intentionally leaks the page to avoid additional errors in
    the future.
    
    Also add documentation to explain to user what is the bug and suggest
    user what to do when this bug happens, although extremely unlikely.
    
    Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Kai Huang <kai.huang@intel.com>

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index eaee1368b4fd..cb0428a8f4dd 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -209,3 +209,29 @@ An application may be loaded into a container enclave which is
specially
 configured with a library OS and run-time which permits the application to run.
 The enclave run-time and library OS work together to execute the application
 when a thread enters the enclave.
+
+Impact of Potential Kernel SGX Bugs
+===================================
+
+EPC leaks
+---------
+
+Although extremely unlikely, EPC leaks can happen if kernel SGX bug happens,
+when a WARNING with below message is shown in dmesg:
+
+"...EREMOVE returned ..., kernel bug likely.  EPC page leaked, SGX may become
+unusuable.  Please refer to Documentation/x86/sgx.rst for more information."
+
+This is effectively a kernel use-after-free of EPC, and due to the way SGX
+works, the bug is detected at freeing. Rather than add the page back to the pool
+of available EPC, the kernel intentionally leaks the page to avoid additional
+errors in the future.
+
+When this happens, kernel will likely soon leak majority of EPC pages, and SGX
+will likely become unusable. However while this may be fatal to SGX, other
+kernel functionalities are unlikely to be impacted, and should continue to work.
+
+As a result, when this happpens, user should stop running any new SGX workloads,
+(or just any new workloads), and migrate all valuable workloads, for instance,
+virtual machines, to other places. Although a machine reboot can recover all
+EPC, debugging and fixing this bug is appreciated.
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7449ef33f081..26c0987153de 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -78,7 +78,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page
*encl_page,
 
        ret = __sgx_encl_eldu(encl_page, epc_page, secs_page);
        if (ret) {
-               sgx_free_epc_page(epc_page);
+               sgx_encl_free_epc_page(epc_page);
                return ERR_PTR(ret);
        }
 
@@ -404,7 +404,7 @@ void sgx_encl_release(struct kref *ref)
                        if (sgx_unmark_page_reclaimable(entry->epc_page))
                                continue;
 
-                       sgx_free_epc_page(entry->epc_page);
+                       sgx_encl_free_epc_page(entry->epc_page);
                        encl->secs_child_cnt--;
                        entry->epc_page = NULL;
                }
@@ -415,7 +415,7 @@ void sgx_encl_release(struct kref *ref)
        xa_destroy(&encl->page_array);
 
        if (!encl->secs_child_cnt && encl->secs.epc_page) {
-               sgx_free_epc_page(encl->secs.epc_page);
+               sgx_encl_free_epc_page(encl->secs.epc_page);
                encl->secs.epc_page = NULL;
        }
 
@@ -423,7 +423,7 @@ void sgx_encl_release(struct kref *ref)
                va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
                                           list);
                list_del(&va_page->list);
-               sgx_free_epc_page(va_page->epc_page);
+               sgx_encl_free_epc_page(va_page->epc_page);
                kfree(va_page);
        }
 
@@ -686,7 +686,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
        ret = __epa(sgx_get_epc_virt_addr(epc_page));
        if (ret) {
                WARN_ONCE(1, "EPA returned %d (0x%x)", ret, ret);
-               sgx_free_epc_page(epc_page);
+               sgx_encl_free_epc_page(epc_page);
                return ERR_PTR(-EFAULT);
        }
 
@@ -735,3 +735,25 @@ bool sgx_va_page_full(struct sgx_va_page *va_page)
 
        return slot == SGX_VA_SLOT_COUNT;
 }
+
+/**
+ * sgx_encl_free_epc_page - free EPC page assigned to an enclave
+ * @page:      EPC page to be freed
+ *
+ * Free EPC page assigned to an enclave.  It does EREMOVE for the page, and
+ * only upon success, it puts the page back to free page list.  Otherwise, it
+ * gives a WARNING to indicate page is leaked, and require reboot to retrieve
+ * leaked pages.
+ */
+void sgx_encl_free_epc_page(struct sgx_epc_page *page)
+{
+       int ret;
+
+       WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+
+       ret = __eremove(sgx_get_epc_virt_addr(page));
+       if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret))
+               return;
+
+       sgx_free_epc_page(page);
+}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index d8d30ccbef4c..6e74f85b6264 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -115,5 +115,6 @@ struct sgx_epc_page *sgx_alloc_va_page(void);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
+void sgx_encl_free_epc_page(struct sgx_epc_page *page);
 
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 90a5caf76939..772b9c648cf1 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -47,7 +47,7 @@ static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page
*va_page)
        encl->page_cnt--;
 
        if (va_page) {
-               sgx_free_epc_page(va_page->epc_page);
+               sgx_encl_free_epc_page(va_page->epc_page);
                list_del(&va_page->list);
                kfree(va_page);
        }
@@ -117,7 +117,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs
*secs)
        return 0;
 
 err_out:
-       sgx_free_epc_page(encl->secs.epc_page);
+       sgx_encl_free_epc_page(encl->secs.epc_page);
        encl->secs.epc_page = NULL;
 
 err_out_backing:
@@ -365,7 +365,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
        mmap_read_unlock(current->mm);
 
 err_out_free:
-       sgx_free_epc_page(epc_page);
+       sgx_encl_free_epc_page(epc_page);
        kfree(encl_page);
 
        return ret;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 5c9c5e5fb1fb..6a734f484aa7 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -294,7 +294,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 
                sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
 
-               sgx_free_epc_page(encl->secs.epc_page);
+               sgx_encl_free_epc_page(encl->secs.epc_page);
                encl->secs.epc_page = NULL;
 
                sgx_encl_put_backing(&secs_backing, true);
@@ -609,19 +609,15 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  * sgx_free_epc_page() - Free an EPC page
  * @page:      an EPC page
  *
- * Call EREMOVE for an EPC page and insert it back to the list of free pages.
+ * Put the EPC page back to the list of free pages. It's the caller's
+ * responsibility to make sure that the page is in uninitialized state. In other
+ * words, do EREMOVE, EWB or whatever operation is necessary before calling
+ * this function.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
        struct sgx_epc_section *section = &sgx_epc_sections[page->section];
        struct sgx_numa_node *node = section->node;
-       int ret;
-
-       WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
-
-       ret = __eremove(sgx_get_epc_virt_addr(page));
-       if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
-               return;
 
        spin_lock(&node->lock);
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 653af8ca1a25..a66614f94538 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -13,6 +13,10 @@
 #undef pr_fmt
 #define pr_fmt(fmt) "sgx: " fmt
 
+/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
+#define EREMOVE_ERROR_MESSAGE \
+       "EREMOVE returned %d (0x%x), kernel bug likely.  EPC page leaked, SGX may become
unusuable.  Please refer to Documentation/x86/sgx.rst for more information."
+
 #define SGX_MAX_EPC_SECTIONS           8
 #define SGX_EEXTEND_BLOCK_SIZE         256
 #define SGX_NR_TO_SCAN                 16



^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-24  9:38                           ` Kai Huang
@ 2021-03-24 10:09                             ` Paolo Bonzini
  2021-03-24 10:48                               ` Kai Huang
  2021-03-24 23:23                               ` Kai Huang
  0 siblings, 2 replies; 134+ messages in thread
From: Paolo Bonzini @ 2021-03-24 10:09 UTC (permalink / raw)
  To: Kai Huang, Borislav Petkov, Sean Christopherson
  Cc: kvm, x86, linux-sgx, linux-kernel, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, tglx, mingo, hpa

On 24/03/21 10:38, Kai Huang wrote:
> Hi Sean, Boris, Paolo,
> 
> Thanks for the discussion. I tried to digest all your conversations and
> hopefully I have understood you correctly. I pasted the new patch here
> (not full patch, but relevant part only). I modified the error msg, added
> some writeup to Documentation/x86/sgx.rst, and put Sean's explanation of this
> bug to the commit msg (per Paolo). I am terrible Documentation writer, so
> please help to check and give comments. Thanks!

I have some phrasing suggestions below but that was actually pretty good.

> ---
> commit 1e297a535bcb4f51a08343c40207520017d85efe (HEAD)
> Author: Kai Huang <kai.huang@intel.com>
> Date:   Wed Jan 20 03:40:53 2021 +0200
> 
>      x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
>      
>      EREMOVE takes a page and removes any association between that page and
>      an enclave.  It must be run on a page before it can be added into
>      another enclave.  Currently, EREMOVE is run as part of pages being freed
>      into the SGX page allocator.  It is not expected to fail.
>      
>      KVM does not track how guest pages are used, which means that SGX
>      virtualization use of EREMOVE might fail.  Specifically, it is
>      legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
>      KVM guest, because KVM/kernel doesn't track SECS pages.
>
>      Break out the EREMOVE call from the SGX page allocator.  This will allow
>      the SGX virtualization code to use the allocator directly.  (SGX/KVM
>      will also introduce a more permissive EREMOVE helper).

Ok, I think I got the source of my confusion.  The part in parentheses
is the key.  It was not clear that KVM can deal with EREMOVE failures
*without printing the error*.  Good!

>      Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
>      more specific that it is used to free EPC page assigned host enclave.
>      Replace sgx_free_epc_page() with sgx_encl_free_epc_page() in all call
>      sites so there's no functional change.
>      
>      Improve error message when EREMOVE fails, and kernel is about to leak
>      EPC page, which is likely a kernel bug.  This is effectively a kernel
>      use-after-free of EPC, and due to the way SGX works, the bug is detected
>      at freeing.  Rather than add the page back to the pool of available EPC,
>      the kernel intentionally leaks the page to avoid additional errors in
>      the future.
>      
>      Also add documentation to explain to user what is the bug and suggest
>      user what to do when this bug happens, although extremely unlikely.

Rewritten:

EREMOVE takes a page and removes any association between that page and
an enclave.  It must be run on a page before it can be added into
another enclave.  Currently, EREMOVE is run as part of pages being freed
into the SGX page allocator.  It is not expected to fail, as it would
indicate a use-after-free of EPC.  Rather than add the page back to the
pool of available EPC, the kernel intentionally leaks the page to avoid
additional errors in the future.

However, KVM does not track how guest pages are used, which means that SGX
virtualization use of EREMOVE might fail.  Specifically, it is
legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
KVM guest, because KVM/kernel doesn't track SECS pages.

To allow SGX/KVM to introduce a more permissive EREMOVE helper and to
let the SGX virtualization code use the allocator directly,
break out the EREMOVE call from the SGX page allocator.  Rename the
original sgx_free_epc_page() to sgx_encl_free_epc_page(),
indicating that it is used to free EPC page assigned host enclave.
Replace sgx_free_epc_page() with sgx_encl_free_epc_page() in all call
sites so there's no functional change.

At the same time improve error message when EREMOVE fails, and add
documentation to explain to user what is the bug and suggest user what
to do when this bug happens, although extremely unlikely.

> +Although extremely unlikely, EPC leaks can happen if kernel SGX bug happens,
> +when a WARNING with below message is shown in dmesg:

Remove "Although extremely unlikely".

> +"...EREMOVE returned ..., kernel bug likely.  EPC page leaked, SGX may become
> +unusuable.  Please refer to Documentation/x86/sgx.rst for more information."
> +
> +This is effectively a kernel use-after-free of EPC, and due to the way SGX
> +works, the bug is detected at freeing. Rather than add the page back to the pool
> +of available EPC, the kernel intentionally leaks the page to avoid additional
> +errors in the future.
> +
> +When this happens, kernel will likely soon leak majority of EPC pages, and SGX
> +will likely become unusable. However while this may be fatal to SGX, other
> +kernel functionalities are unlikely to be impacted, and should continue to work.
> +
> +As a result, when this happpens, user should stop running any new SGX workloads,
> +(or just any new workloads), and migrate all valuable workloads, for instance,
> +virtual machines, to other places.

Remove everything starting with "for instance".

  Although a machine reboot can recover all
> +EPC, debugging and fixing this bug is appreciated.

Replace the second part with "the bug should be reported to the Linux developers".
The poor user is not expected to debug SGX. ;)

> +/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
> +#define EREMOVE_ERROR_MESSAGE \
> +       "EREMOVE returned %d (0x%x), kernel bug likely.  EPC page leaked, SGX may become
> unusuable.  Please refer to Documentation/x86/sgx.rst for more information."

Rewritten:

EREMOVE returned %d and an EPC page was leaked; SGX may become unusable.
This is a kernel bug, refer to Documentation/x86/sgx.rst for more information.

Also please split it across multiple lines.

Paolo


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-24 10:09                             ` Paolo Bonzini
@ 2021-03-24 10:48                               ` Kai Huang
  2021-03-24 11:24                                 ` Paolo Bonzini
  2021-03-24 23:23                               ` Kai Huang
  1 sibling, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-24 10:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, kvm, x86, linux-sgx,
	linux-kernel, jarkko, luto, dave.hansen, rick.p.edgecombe,
	haitao.huang, tglx, mingo, hpa

On Wed, 24 Mar 2021 11:09:20 +0100 Paolo Bonzini wrote:
> On 24/03/21 10:38, Kai Huang wrote:
> > Hi Sean, Boris, Paolo,
> > 
> > Thanks for the discussion. I tried to digest all your conversations and
> > hopefully I have understood you correctly. I pasted the new patch here
> > (not full patch, but relevant part only). I modified the error msg, added
> > some writeup to Documentation/x86/sgx.rst, and put Sean's explanation of this
> > bug to the commit msg (per Paolo). I am terrible Documentation writer, so
> > please help to check and give comments. Thanks!
> 
> I have some phrasing suggestions below but that was actually pretty good.
> 
> > ---
> > commit 1e297a535bcb4f51a08343c40207520017d85efe (HEAD)
> > Author: Kai Huang <kai.huang@intel.com>
> > Date:   Wed Jan 20 03:40:53 2021 +0200
> > 
> >      x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
> >      
> >      EREMOVE takes a page and removes any association between that page and
> >      an enclave.  It must be run on a page before it can be added into
> >      another enclave.  Currently, EREMOVE is run as part of pages being freed
> >      into the SGX page allocator.  It is not expected to fail.
> >      
> >      KVM does not track how guest pages are used, which means that SGX
> >      virtualization use of EREMOVE might fail.  Specifically, it is
> >      legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
> >      KVM guest, because KVM/kernel doesn't track SECS pages.
> >
> >      Break out the EREMOVE call from the SGX page allocator.  This will allow
> >      the SGX virtualization code to use the allocator directly.  (SGX/KVM
> >      will also introduce a more permissive EREMOVE helper).
> 
> Ok, I think I got the source of my confusion.  The part in parentheses
> is the key.  It was not clear that KVM can deal with EREMOVE failures
> *without printing the error*.  Good!

Yes the "will also introduce a more premissive EREMOVE helper" is done in patch
5 (x86/sgx: Introduce virtual EPC for use by KVM guests).

> 
> >      Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
> >      more specific that it is used to free EPC page assigned host enclave.
> >      Replace sgx_free_epc_page() with sgx_encl_free_epc_page() in all call
> >      sites so there's no functional change.
> >      
> >      Improve error message when EREMOVE fails, and kernel is about to leak
> >      EPC page, which is likely a kernel bug.  This is effectively a kernel
> >      use-after-free of EPC, and due to the way SGX works, the bug is detected
> >      at freeing.  Rather than add the page back to the pool of available EPC,
> >      the kernel intentionally leaks the page to avoid additional errors in
> >      the future.
> >      
> >      Also add documentation to explain to user what is the bug and suggest
> >      user what to do when this bug happens, although extremely unlikely.
> 
> Rewritten:
> 
> EREMOVE takes a page and removes any association between that page and
> an enclave.  It must be run on a page before it can be added into
> another enclave.  Currently, EREMOVE is run as part of pages being freed
> into the SGX page allocator.  It is not expected to fail, as it would
> indicate a use-after-free of EPC.  Rather than add the page back to the
> pool of available EPC, the kernel intentionally leaks the page to avoid
> additional errors in the future.
> 
> However, KVM does not track how guest pages are used, which means that SGX
> virtualization use of EREMOVE might fail.  Specifically, it is
> legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
> KVM guest, because KVM/kernel doesn't track SECS pages.
> 
> To allow SGX/KVM to introduce a more permissive EREMOVE helper and to
> let the SGX virtualization code use the allocator directly,
> break out the EREMOVE call from the SGX page allocator.  Rename the
> original sgx_free_epc_page() to sgx_encl_free_epc_page(),
> indicating that it is used to free EPC page assigned host enclave.
> Replace sgx_free_epc_page() with sgx_encl_free_epc_page() in all call
> sites so there's no functional change.
> 
> At the same time improve error message when EREMOVE fails, and add
> documentation to explain to user what is the bug and suggest user what
> to do when this bug happens, although extremely unlikely.

Thanks :)

> 
> > +Although extremely unlikely, EPC leaks can happen if kernel SGX bug happens,
> > +when a WARNING with below message is shown in dmesg:
> 
> Remove "Although extremely unlikely".

Will do.

> 
> > +"...EREMOVE returned ..., kernel bug likely.  EPC page leaked, SGX may become
> > +unusuable.  Please refer to Documentation/x86/sgx.rst for more information."
> > +
> > +This is effectively a kernel use-after-free of EPC, and due to the way SGX
> > +works, the bug is detected at freeing. Rather than add the page back to the pool
> > +of available EPC, the kernel intentionally leaks the page to avoid additional
> > +errors in the future.
> > +
> > +When this happens, kernel will likely soon leak majority of EPC pages, and SGX
> > +will likely become unusable. However while this may be fatal to SGX, other
> > +kernel functionalities are unlikely to be impacted, and should continue to work.
> > +
> > +As a result, when this happpens, user should stop running any new SGX workloads,
> > +(or just any new workloads), and migrate all valuable workloads, for instance,
> > +virtual machines, to other places.
> 
> Remove everything starting with "for instance".

Will do.

> 
>   Although a machine reboot can recover all
> > +EPC, debugging and fixing this bug is appreciated.
> 
> Replace the second part with "the bug should be reported to the Linux developers".
> The poor user is not expected to debug SGX. ;)

Hmm.. Makes sense :)

> 
> > +/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
> > +#define EREMOVE_ERROR_MESSAGE \
> > +       "EREMOVE returned %d (0x%x), kernel bug likely.  EPC page leaked, SGX may become
> > unusuable.  Please refer to Documentation/x86/sgx.rst for more information."
> 
> Rewritten:
> 
> EREMOVE returned %d and an EPC page was leaked; SGX may become unusable.
> This is a kernel bug, refer to Documentation/x86/sgx.rst for more information.

Fine to me, although this would have %d (0x%x) -> %d change in the code.

> 
> Also please split it across multiple lines.
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-24 10:48                               ` Kai Huang
@ 2021-03-24 11:24                                 ` Paolo Bonzini
  0 siblings, 0 replies; 134+ messages in thread
From: Paolo Bonzini @ 2021-03-24 11:24 UTC (permalink / raw)
  To: Kai Huang
  Cc: Borislav Petkov, Sean Christopherson, kvm, x86, linux-sgx,
	linux-kernel, jarkko, luto, dave.hansen, rick.p.edgecombe,
	haitao.huang, tglx, mingo, hpa

On 24/03/21 11:48, Kai Huang wrote:
>>> +/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
>>> +#define EREMOVE_ERROR_MESSAGE \
>>> +       "EREMOVE returned %d (0x%x), kernel bug likely.  EPC page leaked, SGX may become
>>> unusuable.  Please refer to Documentation/x86/sgx.rst for more information."
>> Rewritten:
>>
>> EREMOVE returned %d and an EPC page was leaked; SGX may become unusable.
>> This is a kernel bug, refer to Documentation/x86/sgx.rst for more information.
> Fine to me, although this would have %d (0x%x) -> %d change in the code.
> 

Yeah you can of course keep the 0x%x part.

Paolo


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-24 10:09                             ` Paolo Bonzini
  2021-03-24 10:48                               ` Kai Huang
@ 2021-03-24 23:23                               ` Kai Huang
  2021-03-24 23:39                                 ` Paolo Bonzini
  1 sibling, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-24 23:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, kvm, x86, linux-sgx,
	linux-kernel, jarkko, luto, dave.hansen, rick.p.edgecombe,
	haitao.huang, tglx, mingo, hpa


> 
> > +/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
> > +#define EREMOVE_ERROR_MESSAGE \
> > +       "EREMOVE returned %d (0x%x), kernel bug likely.  EPC page leaked, SGX may become
> > unusuable.  Please refer to Documentation/x86/sgx.rst for more information."
> 
> Rewritten:
> 
> EREMOVE returned %d and an EPC page was leaked; SGX may become unusable.
> This is a kernel bug, refer to Documentation/x86/sgx.rst for more information.
> 
> Also please split it across multiple lines.
> 
> Paolo
> 

Hi Boris/Paolo,

I changed to below (with slight modification on Paolo's):

/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
#define EREMOVE_ERROR_MESSAGE \ 
        "EREMOVE returned %d (0x%x) and an EPC page was leaked.  SGX may become unusuable.  " \
        "This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more information."

I got a checkpatch warning however:

WARNING: It's generally not useful to have the filename in the file
#60: FILE: Documentation/x86/sgx.rst:223:
+This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more

I suppose it is OK? Since the error msg is actually hard-coded in the code,
and in this document, IMHO we should explicitly call out what error message user
is supposed to see, when this bug happens, so that user can absolutely know
he/she is dealing with this particular issue.




^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-24 23:23                               ` Kai Huang
@ 2021-03-24 23:39                                 ` Paolo Bonzini
  2021-03-24 23:46                                   ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Paolo Bonzini @ 2021-03-24 23:39 UTC (permalink / raw)
  To: Kai Huang
  Cc: Borislav Petkov, Sean Christopherson, kvm, x86, linux-sgx,
	linux-kernel, jarkko, luto, dave.hansen, rick.p.edgecombe,
	haitao.huang, tglx, mingo, hpa

On 25/03/21 00:23, Kai Huang wrote:
> I changed to below (with slight modification on Paolo's):
> 
> /* Error message for EREMOVE failure, when kernel is about to leak EPC page */
> #define EREMOVE_ERROR_MESSAGE \
>          "EREMOVE returned %d (0x%x) and an EPC page was leaked.  SGX may become unusuable.  " \
>          "This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more information."
> 
> I got a checkpatch warning however:
> 
> WARNING: It's generally not useful to have the filename in the file
> #60: FILE: Documentation/x86/sgx.rst:223:
> +This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more

Yeah, this is more or less a false positive.

Paolo


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-24 23:39                                 ` Paolo Bonzini
@ 2021-03-24 23:46                                   ` Kai Huang
  2021-03-25  8:42                                     ` Borislav Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-24 23:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, kvm, x86, linux-sgx,
	linux-kernel, jarkko, luto, dave.hansen, rick.p.edgecombe,
	haitao.huang, tglx, mingo, hpa

On Thu, 25 Mar 2021 00:39:01 +0100 Paolo Bonzini wrote:
> On 25/03/21 00:23, Kai Huang wrote:
> > I changed to below (with slight modification on Paolo's):
> > 
> > /* Error message for EREMOVE failure, when kernel is about to leak EPC page */
> > #define EREMOVE_ERROR_MESSAGE \
> >          "EREMOVE returned %d (0x%x) and an EPC page was leaked.  SGX may become unusuable.  " \
> >          "This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more information."
> > 
> > I got a checkpatch warning however:
> > 
> > WARNING: It's generally not useful to have the filename in the file
> > #60: FILE: Documentation/x86/sgx.rst:223:
> > +This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more
> 
> Yeah, this is more or less a false positive.
> 
> Paolo
> 

Thanks.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-24 23:46                                   ` Kai Huang
@ 2021-03-25  8:42                                     ` Borislav Petkov
  2021-03-25  9:38                                       ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-03-25  8:42 UTC (permalink / raw)
  To: Kai Huang
  Cc: Paolo Bonzini, Sean Christopherson, kvm, x86, linux-sgx,
	linux-kernel, jarkko, luto, dave.hansen, rick.p.edgecombe,
	haitao.huang, tglx, mingo, hpa

... so you could send the final version of this patch as a reply to this
thread, now that everyone agrees, so that I can continue going through
the rest.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v4 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-19  7:22 ` [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() Kai Huang
  2021-03-22 18:16   ` Borislav Petkov
@ 2021-03-25  9:30   ` Kai Huang
  2021-03-26 19:48     ` Jarkko Sakkinen
  2021-04-07 10:03     ` [tip: x86/sgx] " tip-bot2 for Kai Huang
  1 sibling, 2 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-25  9:30 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

EREMOVE takes a page and removes any association between that page and
an enclave.  It must be run on a page before it can be added into
another enclave.  Currently, EREMOVE is run as part of pages being freed
into the SGX page allocator.  It is not expected to fail, as it would
indicate a use-after-free of EPC.  Rather than add the page back to the
pool of available EPC, the kernel intentionally leaks the page to avoid
additional errors in the future.

However, KVM does not track how guest pages are used, which means that
SGX virtualization use of EREMOVE might fail.  Specifically, it is
legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
KVM guest, because KVM/kernel doesn't track SECS pages.

To allow SGX/KVM to introduce a more permissive EREMOVE helper and to
let the SGX virtualization code use the allocator directly, break out
the EREMOVE call from the SGX page allocator.  Rename the original
sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that it is
used to free EPC page assigned host enclave. Replace sgx_free_epc_page()
with sgx_encl_free_epc_page() in all call sites so there's no functional
change.

At the same time improve error message when EREMOVE fails, and add
documentation to explain to user what is the bug and suggest user what
to do when this bug happens, although extremely unlikely.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 Documentation/x86/sgx.rst       | 27 +++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.c  | 32 +++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/sgx/encl.h  |  1 +
 arch/x86/kernel/cpu/sgx/ioctl.c |  6 +++---
 arch/x86/kernel/cpu/sgx/main.c  | 14 +++++---------
 arch/x86/kernel/cpu/sgx/sgx.h   |  5 +++++
 6 files changed, 68 insertions(+), 17 deletions(-)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index eaee1368b4fd..5ec7d17e65e0 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -209,3 +209,30 @@ An application may be loaded into a container enclave which is specially
 configured with a library OS and run-time which permits the application to run.
 The enclave run-time and library OS work together to execute the application
 when a thread enters the enclave.
+
+Impact of Potential Kernel SGX Bugs
+===================================
+
+EPC leaks
+---------
+
+EPC leaks can happen if kernel SGX bug happens, when a WARNING with below
+message is shown in dmesg:
+
+"...EREMOVE returned ... and an EPC page was leaked.  SGX may become unusuable.
+This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more
+information."
+
+This is effectively a kernel use-after-free of EPC, and due to the way SGX
+works, the bug is detected at freeing. Rather than add the page back to the pool
+of available EPC, the kernel intentionally leaks the page to avoid additional
+errors in the future.
+
+When this happens, kernel will likely soon leak majority of EPC pages, and SGX
+will likely become unusable. However while this may be fatal to SGX, other
+kernel functionalities are unlikely to be impacted, and should continue to work.
+
+As a result, when this happpens, user should stop running any new SGX workloads,
+(or just any new workloads), and migrate all valuable workloads. Although a
+machine reboot can recover all EPC, the bug should be reported to Linux
+developers.
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7449ef33f081..26c0987153de 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -78,7 +78,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	ret = __sgx_encl_eldu(encl_page, epc_page, secs_page);
 	if (ret) {
-		sgx_free_epc_page(epc_page);
+		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(ret);
 	}
 
@@ -404,7 +404,7 @@ void sgx_encl_release(struct kref *ref)
 			if (sgx_unmark_page_reclaimable(entry->epc_page))
 				continue;
 
-			sgx_free_epc_page(entry->epc_page);
+			sgx_encl_free_epc_page(entry->epc_page);
 			encl->secs_child_cnt--;
 			entry->epc_page = NULL;
 		}
@@ -415,7 +415,7 @@ void sgx_encl_release(struct kref *ref)
 	xa_destroy(&encl->page_array);
 
 	if (!encl->secs_child_cnt && encl->secs.epc_page) {
-		sgx_free_epc_page(encl->secs.epc_page);
+		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 	}
 
@@ -423,7 +423,7 @@ void sgx_encl_release(struct kref *ref)
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
 		list_del(&va_page->list);
-		sgx_free_epc_page(va_page->epc_page);
+		sgx_encl_free_epc_page(va_page->epc_page);
 		kfree(va_page);
 	}
 
@@ -686,7 +686,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
 	ret = __epa(sgx_get_epc_virt_addr(epc_page));
 	if (ret) {
 		WARN_ONCE(1, "EPA returned %d (0x%x)", ret, ret);
-		sgx_free_epc_page(epc_page);
+		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(-EFAULT);
 	}
 
@@ -735,3 +735,25 @@ bool sgx_va_page_full(struct sgx_va_page *va_page)
 
 	return slot == SGX_VA_SLOT_COUNT;
 }
+
+/**
+ * sgx_encl_free_epc_page - free EPC page assigned to an enclave
+ * @page:	EPC page to be freed
+ *
+ * Free EPC page assigned to an enclave.  It does EREMOVE for the page, and
+ * only upon success, it puts the page back to free page list.  Otherwise, it
+ * gives a WARNING to indicate page is leaked, and require reboot to retrieve
+ * leaked pages.
+ */
+void sgx_encl_free_epc_page(struct sgx_epc_page *page)
+{
+	int ret;
+
+	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+
+	ret = __eremove(sgx_get_epc_virt_addr(page));
+	if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret))
+		return;
+
+	sgx_free_epc_page(page);
+}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index d8d30ccbef4c..6e74f85b6264 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -115,5 +115,6 @@ struct sgx_epc_page *sgx_alloc_va_page(void);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
+void sgx_encl_free_epc_page(struct sgx_epc_page *page);
 
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 90a5caf76939..772b9c648cf1 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -47,7 +47,7 @@ static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
 	encl->page_cnt--;
 
 	if (va_page) {
-		sgx_free_epc_page(va_page->epc_page);
+		sgx_encl_free_epc_page(va_page->epc_page);
 		list_del(&va_page->list);
 		kfree(va_page);
 	}
@@ -117,7 +117,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	return 0;
 
 err_out:
-	sgx_free_epc_page(encl->secs.epc_page);
+	sgx_encl_free_epc_page(encl->secs.epc_page);
 	encl->secs.epc_page = NULL;
 
 err_out_backing:
@@ -365,7 +365,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 	mmap_read_unlock(current->mm);
 
 err_out_free:
-	sgx_free_epc_page(epc_page);
+	sgx_encl_free_epc_page(epc_page);
 	kfree(encl_page);
 
 	return ret;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 13a7599ce7d4..b227629b1e9c 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -294,7 +294,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 
 		sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
 
-		sgx_free_epc_page(encl->secs.epc_page);
+		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 
 		sgx_encl_put_backing(&secs_backing, true);
@@ -609,19 +609,15 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  * sgx_free_epc_page() - Free an EPC page
  * @page:	an EPC page
  *
- * Call EREMOVE for an EPC page and insert it back to the list of free pages.
+ * Put the EPC page back to the list of free pages. It's the caller's
+ * responsibility to make sure that the page is in uninitialized state. In other
+ * words, do EREMOVE, EWB or whatever operation is necessary before calling
+ * this function.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
 	struct sgx_numa_node *node = section->node;
-	int ret;
-
-	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
-
-	ret = __eremove(sgx_get_epc_virt_addr(page));
-	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
-		return;
 
 	spin_lock(&node->lock);
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 653af8ca1a25..6b21a165500e 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -13,6 +13,11 @@
 #undef pr_fmt
 #define pr_fmt(fmt) "sgx: " fmt
 
+/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
+#define EREMOVE_ERROR_MESSAGE \
+	"EREMOVE returned %d (0x%x) and an EPC page was leaked.  SGX may become unusuable.  " \
+	"This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more information."
+
 #define SGX_MAX_EPC_SECTIONS		8
 #define SGX_EEXTEND_BLOCK_SIZE		256
 #define SGX_NR_TO_SCAN			16
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-19  7:22 ` [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
@ 2021-03-25  9:36   ` Kai Huang
  2021-03-26 15:03   ` Borislav Petkov
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-25  9:36 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx,
	mingo, hpa


> +
> +static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
> +{
> +	int ret;
> +
> +	/*
> +	 * Take a previously guest-owned EPC page and return it to the
> +	 * general EPC page pool.
> +	 *
> +	 * Guests can not be trusted to have left this page in a good
> +	 * state, so run EREMOVE on the page unconditionally.  In the
> +	 * case that a guest properly EREMOVE'd this page, a superfluous
> +	 * EREMOVE is harmless.
> +	 */
> +	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
> +	if (ret) {
> +		/*
> +		 * Only SGX_CHILD_PRESENT is expected, which is because of
> +		 * EREMOVE'ing an SECS still with child, in which case it can
> +		 * be handled by EREMOVE'ing the SECS again after all pages in
> +		 * virtual EPC have been EREMOVE'd. See comments in below in
> +		 * sgx_vepc_release().
> +		 *
> +		 * The user of virtual EPC (KVM) needs to guarantee there's no
> +		 * logical processor is still running in the enclave in guest,
> +		 * otherwise EREMOVE will get SGX_ENCLAVE_ACT which cannot be
> +		 * handled here.
> +		 */
> +		WARN_ONCE(ret != SGX_CHILD_PRESENT,
> +			  "EREMOVE (EPC page 0x%lx): unexpected error: %d\n",
> +			  sgx_get_epc_phys_addr(epc_page), ret);

Hi Boris,

With the change to patch 3, I think perhaps this WARN_ONCE() should also be
changed to:

                WARN_ONCE(ret != SGX_CHILD_PRESENT, EREMOVE_ERROR_MESSAGE,
                                ret, ret);

> +		return ret;
> +	}
> +
> +	sgx_free_epc_page(epc_page);
> +
> +	return 0;
> +}
>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-25  8:42                                     ` Borislav Petkov
@ 2021-03-25  9:38                                       ` Kai Huang
  2021-03-25 16:52                                         ` Borislav Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-25  9:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paolo Bonzini, Sean Christopherson, kvm, x86, linux-sgx,
	linux-kernel, jarkko, luto, dave.hansen, rick.p.edgecombe,
	haitao.huang, tglx, mingo, hpa

On Thu, 25 Mar 2021 09:42:41 +0100 Borislav Petkov wrote:
> ... so you could send the final version of this patch as a reply to this
> thread, now that everyone agrees, so that I can continue going through
> the rest.
> 

I have sent it by replying to this patch.

[PATCH v4 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

Btw, with this patch being changed, I think there's a place in patch 5 should
also be changed. I have replied patch 5. Please take a look.

Thanks.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-25  9:38                                       ` Kai Huang
@ 2021-03-25 16:52                                         ` Borislav Petkov
  0 siblings, 0 replies; 134+ messages in thread
From: Borislav Petkov @ 2021-03-25 16:52 UTC (permalink / raw)
  To: Kai Huang
  Cc: Paolo Bonzini, Sean Christopherson, kvm, x86, linux-sgx,
	linux-kernel, jarkko, luto, dave.hansen, rick.p.edgecombe,
	haitao.huang, tglx, mingo, hpa

On Thu, Mar 25, 2021 at 10:38:13PM +1300, Kai Huang wrote:
> I have sent it by replying to this patch.
>
> [PATCH v4 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

Thanks, I've committed the below.

> Btw, with this patch being changed, I think there's a place in patch 5 should
> also be changed. I have replied patch 5. Please take a look.

Ok, thx.

---
From: Kai Huang <kai.huang@intel.com>
Date: Thu, 25 Mar 2021 22:30:57 +1300
Subject: [PATCH] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

EREMOVE takes a page and removes any association between that page and
an enclave. It must be run on a page before it can be added into another
enclave. Currently, EREMOVE is run as part of pages being freed into the
SGX page allocator. It is not expected to fail, as it would indicate a
use-after-free of EPC pages. Rather than add the page back to the pool
of available EPC pages, the kernel intentionally leaks the page to avoid
additional errors in the future.

However, KVM does not track how guest pages are used, which means that
SGX virtualization use of EREMOVE might fail. Specifically, it is
legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
KVM guest, because KVM/kernel doesn't track SECS pages.

To allow SGX/KVM to introduce a more permissive EREMOVE helper and
to let the SGX virtualization code use the allocator directly, break
out the EREMOVE call from the SGX page allocator. Rename the original
sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that
it is used to free an EPC page assigned to a host enclave. Replace
sgx_free_epc_page() with sgx_encl_free_epc_page() in all call sites so
there's no functional change.

At the same time, improve the error message when EREMOVE fails, and
add documentation to explain to the user what that failure means and
to suggest to the user what to do when this bug happens in the case it
happens.

 [ bp: Massage commit message, fix typos and sanitize text, simplify. ]

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210325093057.122834-1-kai.huang@intel.com
---
 Documentation/x86/sgx.rst       | 25 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.c  | 31 ++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/sgx/encl.h  |  1 +
 arch/x86/kernel/cpu/sgx/ioctl.c |  6 +++---
 arch/x86/kernel/cpu/sgx/main.c  | 14 +++++---------
 arch/x86/kernel/cpu/sgx/sgx.h   |  4 ++++
 6 files changed, 64 insertions(+), 17 deletions(-)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index eaee1368b4fd..f90076e67cde 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -209,3 +209,28 @@ An application may be loaded into a container enclave which is specially
 configured with a library OS and run-time which permits the application to run.
 The enclave run-time and library OS work together to execute the application
 when a thread enters the enclave.
+
+Impact of Potential Kernel SGX Bugs
+===================================
+
+EPC leaks
+---------
+
+When EPC page leaks happen, a WARNING like this is shown in dmesg:
+
+"EREMOVE returned ... and an EPC page was leaked.  SGX may become unusable..."
+
+This is effectively a kernel use-after-free of an EPC page, and due
+to the way SGX works, the bug is detected at freeing. Rather than
+adding the page back to the pool of available EPC pages, the kernel
+intentionally leaks the page to avoid additional errors in the future.
+
+When this happens, the kernel will likely soon leak more EPC pages, and
+SGX will likely become unusable because the memory available to SGX is
+limited. However, while this may be fatal to SGX, the rest of the kernel
+is unlikely to be impacted and should continue to work.
+
+As a result, when this happpens, user should stop running any new
+SGX workloads, (or just any new workloads), and migrate all valuable
+workloads. Although a machine reboot can recover all EPC memory, the bug
+should be reported to Linux developers.
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7449ef33f081..d25f2a245e1d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -78,7 +78,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	ret = __sgx_encl_eldu(encl_page, epc_page, secs_page);
 	if (ret) {
-		sgx_free_epc_page(epc_page);
+		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(ret);
 	}
 
@@ -404,7 +404,7 @@ void sgx_encl_release(struct kref *ref)
 			if (sgx_unmark_page_reclaimable(entry->epc_page))
 				continue;
 
-			sgx_free_epc_page(entry->epc_page);
+			sgx_encl_free_epc_page(entry->epc_page);
 			encl->secs_child_cnt--;
 			entry->epc_page = NULL;
 		}
@@ -415,7 +415,7 @@ void sgx_encl_release(struct kref *ref)
 	xa_destroy(&encl->page_array);
 
 	if (!encl->secs_child_cnt && encl->secs.epc_page) {
-		sgx_free_epc_page(encl->secs.epc_page);
+		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 	}
 
@@ -423,7 +423,7 @@ void sgx_encl_release(struct kref *ref)
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
 		list_del(&va_page->list);
-		sgx_free_epc_page(va_page->epc_page);
+		sgx_encl_free_epc_page(va_page->epc_page);
 		kfree(va_page);
 	}
 
@@ -686,7 +686,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
 	ret = __epa(sgx_get_epc_virt_addr(epc_page));
 	if (ret) {
 		WARN_ONCE(1, "EPA returned %d (0x%x)", ret, ret);
-		sgx_free_epc_page(epc_page);
+		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(-EFAULT);
 	}
 
@@ -735,3 +735,24 @@ bool sgx_va_page_full(struct sgx_va_page *va_page)
 
 	return slot == SGX_VA_SLOT_COUNT;
 }
+
+/**
+ * sgx_encl_free_epc_page - free an EPC page assigned to an enclave
+ * @page:	EPC page to be freed
+ *
+ * Free an EPC page assigned to an enclave. It does EREMOVE for the page, and
+ * only upon success, it puts the page back to free page list.  Otherwise, it
+ * gives a WARNING to indicate page is leaked.
+ */
+void sgx_encl_free_epc_page(struct sgx_epc_page *page)
+{
+	int ret;
+
+	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+
+	ret = __eremove(sgx_get_epc_virt_addr(page));
+	if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret))
+		return;
+
+	sgx_free_epc_page(page);
+}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index d8d30ccbef4c..6e74f85b6264 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -115,5 +115,6 @@ struct sgx_epc_page *sgx_alloc_va_page(void);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
+void sgx_encl_free_epc_page(struct sgx_epc_page *page);
 
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 2e10367ea66c..354e309fcdb7 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -47,7 +47,7 @@ static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
 	encl->page_cnt--;
 
 	if (va_page) {
-		sgx_free_epc_page(va_page->epc_page);
+		sgx_encl_free_epc_page(va_page->epc_page);
 		list_del(&va_page->list);
 		kfree(va_page);
 	}
@@ -117,7 +117,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	return 0;
 
 err_out:
-	sgx_free_epc_page(encl->secs.epc_page);
+	sgx_encl_free_epc_page(encl->secs.epc_page);
 	encl->secs.epc_page = NULL;
 
 err_out_backing:
@@ -365,7 +365,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 	mmap_read_unlock(current->mm);
 
 err_out_free:
-	sgx_free_epc_page(epc_page);
+	sgx_encl_free_epc_page(epc_page);
 	kfree(encl_page);
 
 	return ret;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 13a7599ce7d4..b227629b1e9c 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -294,7 +294,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 
 		sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
 
-		sgx_free_epc_page(encl->secs.epc_page);
+		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 
 		sgx_encl_put_backing(&secs_backing, true);
@@ -609,19 +609,15 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  * sgx_free_epc_page() - Free an EPC page
  * @page:	an EPC page
  *
- * Call EREMOVE for an EPC page and insert it back to the list of free pages.
+ * Put the EPC page back to the list of free pages. It's the caller's
+ * responsibility to make sure that the page is in uninitialized state. In other
+ * words, do EREMOVE, EWB or whatever operation is necessary before calling
+ * this function.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
 	struct sgx_numa_node *node = section->node;
-	int ret;
-
-	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
-
-	ret = __eremove(sgx_get_epc_virt_addr(page));
-	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
-		return;
 
 	spin_lock(&node->lock);
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 653af8ca1a25..4aa40c627819 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -13,6 +13,10 @@
 #undef pr_fmt
 #define pr_fmt(fmt) "sgx: " fmt
 
+#define EREMOVE_ERROR_MESSAGE \
+	"EREMOVE returned %d (0x%x) and an EPC page was leaked. SGX may become unusable. " \
+	"Refer to Documentation/x86/sgx.rst for more information."
+
 #define SGX_MAX_EPC_SECTIONS		8
 #define SGX_EEXTEND_BLOCK_SIZE		256
 #define SGX_NR_TO_SCAN			16
-- 
2.29.2


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-19  7:22 ` [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
  2021-03-25  9:36   ` Kai Huang
@ 2021-03-26 15:03   ` Borislav Petkov
  2021-03-26 15:17     ` Dave Hansen
  2021-03-31  1:10     ` Kai Huang
  2021-04-01  9:42   ` [PATCH v4 " Kai Huang
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 134+ messages in thread
From: Borislav Petkov @ 2021-03-26 15:03 UTC (permalink / raw)
  To: seanjc
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, Mar 19, 2021 at 08:22:21PM +1300, Kai Huang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw" EPC
> without an associated enclave.  The intended and only known use case for
> raw EPC allocation is to expose EPC to a KVM guest, hence the 'vepc'
> moniker, virt.{c,h} files and X86_SGX_KVM Kconfig.
> 
> SGX driver uses misc device /dev/sgx_enclave to support userspace to
> create enclave.  Each file descriptor from opening /dev/sgx_enclave
> represents an enclave.  Unlike SGX driver, KVM doesn't control how guest
> uses EPC, therefore EPC allocated to KVM guest is not associated to an
> enclave, and /dev/sgx_enclave is not suitable for allocating EPC for KVM
> guest.
> 
> Having separate device nodes for SGX driver and KVM virtual EPC also
> allows separate permission control for running host SGX enclaves and
> KVM SGX guests.

Hmm, just a question on the big picture here - that might've popped up
already:

So baremetal uses /dev/sgx_enclave and KVM uses /dev/sgx_vepc. Who's
deciding which of the two has priority?

Let's say all guests start using enclaves and baremetal cannot start any
new ones anymore due to no more memory. Are we ok with that?

What if baremetal creates a big fat enclave and starves guests all of a
sudden. Are we ok with that either?

In general, having two disjoint things give out SGX resources separately
sounds like trouble to me.

IOW, why don't all virt allocations go through /dev/sgx_enclave too, so
that you can have a single place to control all resource allocations?

> To use /dev/sgx_vepc to allocate a virtual EPC instance with particular
> size, the userspace hypervisor opens /dev/sgx_vepc, and uses mmap()
> with the intended size to get an address range of virtual EPC.  Then
> it may use the address range to create one KVM memory slot as virtual
> EPC for guest.
> 
> Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> /dev/sgx_vepc rather than in KVM. Doing so has two major advantages:
> 
>   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
>     just another memory backend for guests.
> 
>   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
>     does not have to export any symbols, changes to reclaim flows don't
>     need to be routed through KVM, SGX's dirty laundry doesn't have to
>     get aired out for the world to see,

Good one. :-)

> and so on and so forth.

> The virtual EPC pages allocated to guests are currently not reclaimable.
> Reclaiming EPC page used by enclave requires a special reclaim mechanism
> separate from normal page reclaim, and that mechanism is not supported
> for virutal EPC pages.  Due to the complications of handling reclaim
> conflicts between guest and host, reclaiming virtual EPC pages is
> significantly more complex than basic support for SGX virtualization.

What happens if someone in the future wants to change that? Someone
needs to write patches or there's a more fundamental stopper issue
involved?

As always, I might be missing something but that doesn't stop me from
being devil's advocate. :-)

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-26 15:03   ` Borislav Petkov
@ 2021-03-26 15:17     ` Dave Hansen
  2021-03-26 15:29       ` Borislav Petkov
  2021-03-31  1:10     ` Kai Huang
  1 sibling, 1 reply; 134+ messages in thread
From: Dave Hansen @ 2021-03-26 15:17 UTC (permalink / raw)
  To: Borislav Petkov, seanjc
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	rick.p.edgecombe, haitao.huang, pbonzini, tglx, mingo, hpa

On 3/26/21 8:03 AM, Borislav Petkov wrote:
> Let's say all guests start using enclaves and baremetal cannot start any
> new ones anymore due to no more memory. Are we ok with that?

Yes, for now.

> What if baremetal creates a big fat enclave and starves guests all of a
> sudden. Are we ok with that either?

Actually, the baremetal enclave will get a large chunk of its resources
reclaimed and stolen from it.  The guests will probably start and the
baremetal will probably thrash until its allocations fail and it is
killed because it couldn't allocate enclave memory in a page fault.

> In general, having two disjoint things give out SGX resources separately
> sounds like trouble to me.

Yes, it's trouble as-is.

We're working on a cgroup controller just for enclave pages that will
apply to guest use and bare metal.  It would have been nice to have up
front, but we're trying to do things incrementally.  A cgroup controller
should solve he vast majority of these issues where users are quarreling
about who gets enclave memory.

BTW, we probably should have laid this out up front in the original
merge, but the plans in order were roughly:

1. Core SGX functionality (merged into 5.11)
2. NUMA and KVM work
3. cgroup controller for enclave pages
4. EDMM support (lets you add/remove pages and change permissions while
   enclave runs.  Current enclaves are stuck with the same memory they
   start with)

After that, things become less clear.  There's some debate whether we
need to rework the VA pages (enclave swapping metadata to prevent
replay) or improve ability to reclaim guest pages.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-26 15:17     ` Dave Hansen
@ 2021-03-26 15:29       ` Borislav Petkov
  2021-03-26 15:35         ` Dave Hansen
  0 siblings, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-03-26 15:29 UTC (permalink / raw)
  To: Dave Hansen
  Cc: seanjc, Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko,
	luto, rick.p.edgecombe, haitao.huang, pbonzini, tglx, mingo, hpa

On Fri, Mar 26, 2021 at 08:17:38AM -0700, Dave Hansen wrote:
> We're working on a cgroup controller just for enclave pages that will
> apply to guest use and bare metal.  It would have been nice to have up
> front, but we're trying to do things incrementally.  A cgroup controller
> should solve he vast majority of these issues where users are quarreling
> about who gets enclave memory.

Maybe I'm missing something but why do you need a cgroup controller
instead of controlling that resource sharing in the sgx core? Or the
cgroup thing has additional functionality which is good to have anyway?

> BTW, we probably should have laid this out up front in the original
> merge, but the plans in order were roughly:
> 
> 1. Core SGX functionality (merged into 5.11)
> 2. NUMA and KVM work
> 3. cgroup controller for enclave pages
> 4. EDMM support (lets you add/remove pages and change permissions while
>    enclave runs.  Current enclaves are stuck with the same memory they
>    start with)

Oh yeah, that helps, thanks!

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-26 15:29       ` Borislav Petkov
@ 2021-03-26 15:35         ` Dave Hansen
  2021-03-26 17:02           ` Borislav Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Dave Hansen @ 2021-03-26 15:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: seanjc, Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko,
	luto, rick.p.edgecombe, haitao.huang, pbonzini, tglx, mingo, hpa

On 3/26/21 8:29 AM, Borislav Petkov wrote:
> On Fri, Mar 26, 2021 at 08:17:38AM -0700, Dave Hansen wrote:
>> We're working on a cgroup controller just for enclave pages that will
>> apply to guest use and bare metal.  It would have been nice to have up
>> front, but we're trying to do things incrementally.  A cgroup controller
>> should solve he vast majority of these issues where users are quarreling
>> about who gets enclave memory.
> Maybe I'm missing something but why do you need a cgroup controller
> instead of controlling that resource sharing in the sgx core? Or the
> cgroup thing has additional functionality which is good to have anyway?

We could do it in the SGX core, but I think what we end up with will end
up looking a lot like a cgroup controller.  It seems like overkill, but
I think there's enough infrastructure to leverage that it's simpler to
do it with cgroups versus anything else.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-26 15:35         ` Dave Hansen
@ 2021-03-26 17:02           ` Borislav Petkov
  0 siblings, 0 replies; 134+ messages in thread
From: Borislav Petkov @ 2021-03-26 17:02 UTC (permalink / raw)
  To: Dave Hansen
  Cc: seanjc, Kai Huang, kvm, x86, linux-sgx, linux-kernel, jarkko,
	luto, rick.p.edgecombe, haitao.huang, pbonzini, tglx, mingo, hpa

On Fri, Mar 26, 2021 at 08:35:34AM -0700, Dave Hansen wrote:
> We could do it in the SGX core, but I think what we end up with will end
> up looking a lot like a cgroup controller.  It seems like overkill, but
> I think there's enough infrastructure to leverage that it's simpler to
> do it with cgroups versus anything else.

Right.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v4 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-25  9:30   ` [PATCH v4 " Kai Huang
@ 2021-03-26 19:48     ` Jarkko Sakkinen
  2021-03-26 20:38       ` Kai Huang
  2021-03-26 21:39       ` Jarkko Sakkinen
  2021-04-07 10:03     ` [tip: x86/sgx] " tip-bot2 for Kai Huang
  1 sibling, 2 replies; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-26 19:48 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa

On Thu, Mar 25, 2021 at 10:30:57PM +1300, Kai Huang wrote:
> EREMOVE takes a page and removes any association between that page and
> an enclave.  It must be run on a page before it can be added into
> another enclave.  Currently, EREMOVE is run as part of pages being freed
> into the SGX page allocator.  It is not expected to fail, as it would
> indicate a use-after-free of EPC.  Rather than add the page back to the
> pool of available EPC, the kernel intentionally leaks the page to avoid
> additional errors in the future.
> 
> However, KVM does not track how guest pages are used, which means that
> SGX virtualization use of EREMOVE might fail.  Specifically, it is
> legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
> KVM guest, because KVM/kernel doesn't track SECS pages.
> 
> To allow SGX/KVM to introduce a more permissive EREMOVE helper and to
> let the SGX virtualization code use the allocator directly, break out
> the EREMOVE call from the SGX page allocator.  Rename the original
> sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that it is
> used to free EPC page assigned host enclave. Replace sgx_free_epc_page()
> with sgx_encl_free_epc_page() in all call sites so there's no functional
> change.
> 
> At the same time improve error message when EREMOVE fails, and add
> documentation to explain to user what is the bug and suggest user what
> to do when this bug happens, although extremely unlikely.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  Documentation/x86/sgx.rst       | 27 +++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/encl.c  | 32 +++++++++++++++++++++++++++-----
>  arch/x86/kernel/cpu/sgx/encl.h  |  1 +
>  arch/x86/kernel/cpu/sgx/ioctl.c |  6 +++---
>  arch/x86/kernel/cpu/sgx/main.c  | 14 +++++---------
>  arch/x86/kernel/cpu/sgx/sgx.h   |  5 +++++
>  6 files changed, 68 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> index eaee1368b4fd..5ec7d17e65e0 100644
> --- a/Documentation/x86/sgx.rst
> +++ b/Documentation/x86/sgx.rst
> @@ -209,3 +209,30 @@ An application may be loaded into a container enclave which is specially
>  configured with a library OS and run-time which permits the application to run.
>  The enclave run-time and library OS work together to execute the application
>  when a thread enters the enclave.
> +
> +Impact of Potential Kernel SGX Bugs
> +===================================
> +
> +EPC leaks
> +---------
> +
> +EPC leaks can happen if kernel SGX bug happens, when a WARNING with below
> +message is shown in dmesg:
> +
> +"...EREMOVE returned ... and an EPC page was leaked.  SGX may become unusuable.
> +This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more
> +information."
> +
> +This is effectively a kernel use-after-free of EPC, and due to the way SGX
> +works, the bug is detected at freeing. Rather than add the page back to the pool
> +of available EPC, the kernel intentionally leaks the page to avoid additional
> +errors in the future.
> +
> +When this happens, kernel will likely soon leak majority of EPC pages, and SGX
> +will likely become unusable. However while this may be fatal to SGX, other
> +kernel functionalities are unlikely to be impacted, and should continue to work.
> +
> +As a result, when this happpens, user should stop running any new SGX workloads,
> +(or just any new workloads), and migrate all valuable workloads. Although a
> +machine reboot can recover all EPC, the bug should be reported to Linux
> +developers.
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 7449ef33f081..26c0987153de 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -78,7 +78,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  
>  	ret = __sgx_encl_eldu(encl_page, epc_page, secs_page);
>  	if (ret) {
> -		sgx_free_epc_page(epc_page);
> +		sgx_encl_free_epc_page(epc_page);
>  		return ERR_PTR(ret);
>  	}
>  
> @@ -404,7 +404,7 @@ void sgx_encl_release(struct kref *ref)
>  			if (sgx_unmark_page_reclaimable(entry->epc_page))
>  				continue;
>  
> -			sgx_free_epc_page(entry->epc_page);
> +			sgx_encl_free_epc_page(entry->epc_page);
>  			encl->secs_child_cnt--;
>  			entry->epc_page = NULL;
>  		}
> @@ -415,7 +415,7 @@ void sgx_encl_release(struct kref *ref)
>  	xa_destroy(&encl->page_array);
>  
>  	if (!encl->secs_child_cnt && encl->secs.epc_page) {
> -		sgx_free_epc_page(encl->secs.epc_page);
> +		sgx_encl_free_epc_page(encl->secs.epc_page);
>  		encl->secs.epc_page = NULL;
>  	}
>  
> @@ -423,7 +423,7 @@ void sgx_encl_release(struct kref *ref)
>  		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
>  					   list);
>  		list_del(&va_page->list);
> -		sgx_free_epc_page(va_page->epc_page);
> +		sgx_encl_free_epc_page(va_page->epc_page);
>  		kfree(va_page);
>  	}
>  
> @@ -686,7 +686,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
>  	ret = __epa(sgx_get_epc_virt_addr(epc_page));
>  	if (ret) {
>  		WARN_ONCE(1, "EPA returned %d (0x%x)", ret, ret);
> -		sgx_free_epc_page(epc_page);
> +		sgx_encl_free_epc_page(epc_page);
>  		return ERR_PTR(-EFAULT);
>  	}
>  
> @@ -735,3 +735,25 @@ bool sgx_va_page_full(struct sgx_va_page *va_page)
>  
>  	return slot == SGX_VA_SLOT_COUNT;
>  }
> +
> +/**
> + * sgx_encl_free_epc_page - free EPC page assigned to an enclave
> + * @page:	EPC page to be freed
> + *
> + * Free EPC page assigned to an enclave.  It does EREMOVE for the page, and
> + * only upon success, it puts the page back to free page list.  Otherwise, it
> + * gives a WARNING to indicate page is leaked, and require reboot to retrieve
> + * leaked pages.
> + */
> +void sgx_encl_free_epc_page(struct sgx_epc_page *page)
> +{
> +	int ret;
> +
> +	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> +
> +	ret = __eremove(sgx_get_epc_virt_addr(page));
> +	if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret))
> +		return;
> +
> +	sgx_free_epc_page(page);
> +}
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index d8d30ccbef4c..6e74f85b6264 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -115,5 +115,6 @@ struct sgx_epc_page *sgx_alloc_va_page(void);
>  unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
>  void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
>  bool sgx_va_page_full(struct sgx_va_page *va_page);
> +void sgx_encl_free_epc_page(struct sgx_epc_page *page);
>  
>  #endif /* _X86_ENCL_H */
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 90a5caf76939..772b9c648cf1 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -47,7 +47,7 @@ static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
>  	encl->page_cnt--;
>  
>  	if (va_page) {
> -		sgx_free_epc_page(va_page->epc_page);
> +		sgx_encl_free_epc_page(va_page->epc_page);
>  		list_del(&va_page->list);
>  		kfree(va_page);
>  	}
> @@ -117,7 +117,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
>  	return 0;
>  
>  err_out:
> -	sgx_free_epc_page(encl->secs.epc_page);
> +	sgx_encl_free_epc_page(encl->secs.epc_page);
>  	encl->secs.epc_page = NULL;
>  
>  err_out_backing:
> @@ -365,7 +365,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
>  	mmap_read_unlock(current->mm);
>  
>  err_out_free:
> -	sgx_free_epc_page(epc_page);
> +	sgx_encl_free_epc_page(epc_page);
>  	kfree(encl_page);
>  
>  	return ret;
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 13a7599ce7d4..b227629b1e9c 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -294,7 +294,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>  
>  		sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
>  
> -		sgx_free_epc_page(encl->secs.epc_page);
> +		sgx_encl_free_epc_page(encl->secs.epc_page);
>  		encl->secs.epc_page = NULL;
>  
>  		sgx_encl_put_backing(&secs_backing, true);
> @@ -609,19 +609,15 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>   * sgx_free_epc_page() - Free an EPC page
>   * @page:	an EPC page
>   *
> - * Call EREMOVE for an EPC page and insert it back to the list of free pages.
> + * Put the EPC page back to the list of free pages. It's the caller's
> + * responsibility to make sure that the page is in uninitialized state. In other
> + * words, do EREMOVE, EWB or whatever operation is necessary before calling
> + * this function.
>   */
>  void sgx_free_epc_page(struct sgx_epc_page *page)
>  {
>  	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
>  	struct sgx_numa_node *node = section->node;
> -	int ret;
> -
> -	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> -
> -	ret = __eremove(sgx_get_epc_virt_addr(page));
> -	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
> -		return;
>  
>  	spin_lock(&node->lock);
>  
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 653af8ca1a25..6b21a165500e 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -13,6 +13,11 @@
>  #undef pr_fmt
>  #define pr_fmt(fmt) "sgx: " fmt
>  
> +/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
> +#define EREMOVE_ERROR_MESSAGE \
> +	"EREMOVE returned %d (0x%x) and an EPC page was leaked.  SGX may become unusuable.  " \
> +	"This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more information."


Why this needs to be here and not open coded where it is used?

> +
>  #define SGX_MAX_EPC_SECTIONS		8
>  #define SGX_EEXTEND_BLOCK_SIZE		256
>  #define SGX_NR_TO_SCAN			16
> -- 
> 2.30.2
> 
> 

/Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v4 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-26 19:48     ` Jarkko Sakkinen
@ 2021-03-26 20:38       ` Kai Huang
  2021-03-26 21:39       ` Jarkko Sakkinen
  1 sibling, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-26 20:38 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa


> > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> > index 653af8ca1a25..6b21a165500e 100644
> > --- a/arch/x86/kernel/cpu/sgx/sgx.h
> > +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> > @@ -13,6 +13,11 @@
> >  #undef pr_fmt
> >  #define pr_fmt(fmt) "sgx: " fmt
> >  
> > +/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
> > +#define EREMOVE_ERROR_MESSAGE \
> > +	"EREMOVE returned %d (0x%x) and an EPC page was leaked.  SGX may become unusuable.  " \
> > +	"This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more information."
> 
> 
> Why this needs to be here and not open coded where it is used?
> 
I want to use it in sgx/virt.c as well. Please see my reply to patch 5.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v4 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-26 19:48     ` Jarkko Sakkinen
  2021-03-26 20:38       ` Kai Huang
@ 2021-03-26 21:39       ` Jarkko Sakkinen
  1 sibling, 0 replies; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-26 21:39 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa

On Fri, Mar 26, 2021 at 09:48:48PM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 25, 2021 at 10:30:57PM +1300, Kai Huang wrote:
> > EREMOVE takes a page and removes any association between that page and
> > an enclave.  It must be run on a page before it can be added into
> > another enclave.  Currently, EREMOVE is run as part of pages being freed
> > into the SGX page allocator.  It is not expected to fail, as it would
> > indicate a use-after-free of EPC.  Rather than add the page back to the
> > pool of available EPC, the kernel intentionally leaks the page to avoid
> > additional errors in the future.
> > 
> > However, KVM does not track how guest pages are used, which means that
> > SGX virtualization use of EREMOVE might fail.  Specifically, it is
> > legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
> > KVM guest, because KVM/kernel doesn't track SECS pages.
> > 
> > To allow SGX/KVM to introduce a more permissive EREMOVE helper and to
> > let the SGX virtualization code use the allocator directly, break out
> > the EREMOVE call from the SGX page allocator.  Rename the original
> > sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that it is
> > used to free EPC page assigned host enclave. Replace sgx_free_epc_page()
> > with sgx_encl_free_epc_page() in all call sites so there's no functional
> > change.
> > 
> > At the same time improve error message when EREMOVE fails, and add
> > documentation to explain to user what is the bug and suggest user what
> > to do when this bug happens, although extremely unlikely.
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  Documentation/x86/sgx.rst       | 27 +++++++++++++++++++++++++++
> >  arch/x86/kernel/cpu/sgx/encl.c  | 32 +++++++++++++++++++++++++++-----
> >  arch/x86/kernel/cpu/sgx/encl.h  |  1 +
> >  arch/x86/kernel/cpu/sgx/ioctl.c |  6 +++---
> >  arch/x86/kernel/cpu/sgx/main.c  | 14 +++++---------
> >  arch/x86/kernel/cpu/sgx/sgx.h   |  5 +++++
> >  6 files changed, 68 insertions(+), 17 deletions(-)
> > 
> > diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> > index eaee1368b4fd..5ec7d17e65e0 100644
> > --- a/Documentation/x86/sgx.rst
> > +++ b/Documentation/x86/sgx.rst
> > @@ -209,3 +209,30 @@ An application may be loaded into a container enclave which is specially
> >  configured with a library OS and run-time which permits the application to run.
> >  The enclave run-time and library OS work together to execute the application
> >  when a thread enters the enclave.
> > +
> > +Impact of Potential Kernel SGX Bugs
> > +===================================
> > +
> > +EPC leaks
> > +---------
> > +
> > +EPC leaks can happen if kernel SGX bug happens, when a WARNING with below
> > +message is shown in dmesg:
> > +
> > +"...EREMOVE returned ... and an EPC page was leaked.  SGX may become unusuable.
> > +This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more
> > +information."
> > +
> > +This is effectively a kernel use-after-free of EPC, and due to the way SGX
> > +works, the bug is detected at freeing. Rather than add the page back to the pool
> > +of available EPC, the kernel intentionally leaks the page to avoid additional
> > +errors in the future.
> > +
> > +When this happens, kernel will likely soon leak majority of EPC pages, and SGX
> > +will likely become unusable. However while this may be fatal to SGX, other
> > +kernel functionalities are unlikely to be impacted, and should continue to work.
> > +
> > +As a result, when this happpens, user should stop running any new SGX workloads,
> > +(or just any new workloads), and migrate all valuable workloads. Although a
> > +machine reboot can recover all EPC, the bug should be reported to Linux
> > +developers.
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 7449ef33f081..26c0987153de 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -78,7 +78,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >  
> >  	ret = __sgx_encl_eldu(encl_page, epc_page, secs_page);
> >  	if (ret) {
> > -		sgx_free_epc_page(epc_page);
> > +		sgx_encl_free_epc_page(epc_page);
> >  		return ERR_PTR(ret);
> >  	}
> >  
> > @@ -404,7 +404,7 @@ void sgx_encl_release(struct kref *ref)
> >  			if (sgx_unmark_page_reclaimable(entry->epc_page))
> >  				continue;
> >  
> > -			sgx_free_epc_page(entry->epc_page);
> > +			sgx_encl_free_epc_page(entry->epc_page);
> >  			encl->secs_child_cnt--;
> >  			entry->epc_page = NULL;
> >  		}
> > @@ -415,7 +415,7 @@ void sgx_encl_release(struct kref *ref)
> >  	xa_destroy(&encl->page_array);
> >  
> >  	if (!encl->secs_child_cnt && encl->secs.epc_page) {
> > -		sgx_free_epc_page(encl->secs.epc_page);
> > +		sgx_encl_free_epc_page(encl->secs.epc_page);
> >  		encl->secs.epc_page = NULL;
> >  	}
> >  
> > @@ -423,7 +423,7 @@ void sgx_encl_release(struct kref *ref)
> >  		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
> >  					   list);
> >  		list_del(&va_page->list);
> > -		sgx_free_epc_page(va_page->epc_page);
> > +		sgx_encl_free_epc_page(va_page->epc_page);
> >  		kfree(va_page);
> >  	}
> >  
> > @@ -686,7 +686,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
> >  	ret = __epa(sgx_get_epc_virt_addr(epc_page));
> >  	if (ret) {
> >  		WARN_ONCE(1, "EPA returned %d (0x%x)", ret, ret);
> > -		sgx_free_epc_page(epc_page);
> > +		sgx_encl_free_epc_page(epc_page);
> >  		return ERR_PTR(-EFAULT);
> >  	}
> >  
> > @@ -735,3 +735,25 @@ bool sgx_va_page_full(struct sgx_va_page *va_page)
> >  
> >  	return slot == SGX_VA_SLOT_COUNT;
> >  }
> > +
> > +/**
> > + * sgx_encl_free_epc_page - free EPC page assigned to an enclave
> > + * @page:	EPC page to be freed
> > + *
> > + * Free EPC page assigned to an enclave.  It does EREMOVE for the page, and
> > + * only upon success, it puts the page back to free page list.  Otherwise, it
> > + * gives a WARNING to indicate page is leaked, and require reboot to retrieve
> > + * leaked pages.
> > + */
> > +void sgx_encl_free_epc_page(struct sgx_epc_page *page)
> > +{
> > +	int ret;
> > +
> > +	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> > +
> > +	ret = __eremove(sgx_get_epc_virt_addr(page));
> > +	if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret))
> > +		return;
> > +
> > +	sgx_free_epc_page(page);
> > +}
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> > index d8d30ccbef4c..6e74f85b6264 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > @@ -115,5 +115,6 @@ struct sgx_epc_page *sgx_alloc_va_page(void);
> >  unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
> >  void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
> >  bool sgx_va_page_full(struct sgx_va_page *va_page);
> > +void sgx_encl_free_epc_page(struct sgx_epc_page *page);
> >  
> >  #endif /* _X86_ENCL_H */
> > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> > index 90a5caf76939..772b9c648cf1 100644
> > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > @@ -47,7 +47,7 @@ static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
> >  	encl->page_cnt--;
> >  
> >  	if (va_page) {
> > -		sgx_free_epc_page(va_page->epc_page);
> > +		sgx_encl_free_epc_page(va_page->epc_page);
> >  		list_del(&va_page->list);
> >  		kfree(va_page);
> >  	}
> > @@ -117,7 +117,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
> >  	return 0;
> >  
> >  err_out:
> > -	sgx_free_epc_page(encl->secs.epc_page);
> > +	sgx_encl_free_epc_page(encl->secs.epc_page);
> >  	encl->secs.epc_page = NULL;
> >  
> >  err_out_backing:
> > @@ -365,7 +365,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
> >  	mmap_read_unlock(current->mm);
> >  
> >  err_out_free:
> > -	sgx_free_epc_page(epc_page);
> > +	sgx_encl_free_epc_page(epc_page);
> >  	kfree(encl_page);
> >  
> >  	return ret;
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 13a7599ce7d4..b227629b1e9c 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -294,7 +294,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
> >  
> >  		sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
> >  
> > -		sgx_free_epc_page(encl->secs.epc_page);
> > +		sgx_encl_free_epc_page(encl->secs.epc_page);
> >  		encl->secs.epc_page = NULL;
> >  
> >  		sgx_encl_put_backing(&secs_backing, true);
> > @@ -609,19 +609,15 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> >   * sgx_free_epc_page() - Free an EPC page
> >   * @page:	an EPC page
> >   *
> > - * Call EREMOVE for an EPC page and insert it back to the list of free pages.
> > + * Put the EPC page back to the list of free pages. It's the caller's
> > + * responsibility to make sure that the page is in uninitialized state. In other
> > + * words, do EREMOVE, EWB or whatever operation is necessary before calling
> > + * this function.
> >   */
> >  void sgx_free_epc_page(struct sgx_epc_page *page)
> >  {
> >  	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
> >  	struct sgx_numa_node *node = section->node;
> > -	int ret;
> > -
> > -	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> > -
> > -	ret = __eremove(sgx_get_epc_virt_addr(page));
> > -	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
> > -		return;
> >  
> >  	spin_lock(&node->lock);
> >  
> > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> > index 653af8ca1a25..6b21a165500e 100644
> > --- a/arch/x86/kernel/cpu/sgx/sgx.h
> > +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> > @@ -13,6 +13,11 @@
> >  #undef pr_fmt
> >  #define pr_fmt(fmt) "sgx: " fmt
> >  
> > +/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
> > +#define EREMOVE_ERROR_MESSAGE \
> > +	"EREMOVE returned %d (0x%x) and an EPC page was leaked.  SGX may become unusuable.  " \
> > +	"This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more information."
> 
> 
> Why this needs to be here and not open coded where it is used?
> 
> > +
> >  #define SGX_MAX_EPC_SECTIONS		8
> >  #define SGX_EEXTEND_BLOCK_SIZE		256
> >  #define SGX_NR_TO_SCAN			16
> > -- 
> > 2.30.2
> > 
> > 

Anyway,

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 00/25] KVM SGX virtualization support
  2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
                   ` (25 preceding siblings ...)
  2021-03-19 14:52 ` [PATCH v3 00/25] KVM SGX virtualization support Jarkko Sakkinen
@ 2021-03-26 22:46 ` Jarkko Sakkinen
  2021-03-28 21:01   ` Huang, Kai
  26 siblings, 1 reply; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-26 22:46 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	jethro, b.thiel, jmattson, joro, vkuznets, wanpengli, corbet

On Fri, Mar 19, 2021 at 08:29:27PM +1300, Kai Huang wrote:
> This series adds KVM SGX virtualization support. The first 14 patches starting
> with x86/sgx or x86/cpu.. are necessary changes to x86 and SGX core/driver to
> support KVM SGX virtualization, while the rest are patches to KVM subsystem.
> 
> This series is based against latest tip/x86/sgx, which has Jarkko's NUMA
> allocation support.
> 
> You can also get the code from upstream branch of kvm-sgx repo on github:
> 
>         https://github.com/intel/kvm-sgx.git upstream
> 
> It also requires Qemu changes to create VM with SGX support. You can find Qemu
> repo here:
> 
> 	https://github.com/intel/qemu-sgx.git upstream
> 
> Please refer to README.md of above qemu-sgx repo for detail on how to create
> guest with SGX support. At meantime, for your quick reference you can use below
> command to create SGX guest:
> 
> 	#qemu-system-x86_64 -smp 4 -m 2G -drive file=<your_vm_image>,if=virtio \
> 		-cpu host,+sgx_provisionkey \
> 		-sgx-epc id=epc1,memdev=mem1 \
> 		-object memory-backend-epc,id=mem1,size=64M,prealloc
> 
> Please note that the SGX relevant part is:
> 
> 		-cpu host,+sgx_provisionkey \
> 		-sgx-epc id=epc1,memdev=mem1 \
> 		-object memory-backend-epc,id=mem1,size=64M,prealloc
> 
> And you can change other parameters of your qemu command based on your needs.

Please also put tested-by from me to all patches (including pure KVM
patches):

Tested-by: Jarkko Sakkinen <jarkko@kernel.org>

I did the basic test, i.e. run selftest in a VM. I think that is
sufficient at this point.

/Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* RE: [PATCH v3 00/25] KVM SGX virtualization support
  2021-03-26 22:46 ` Jarkko Sakkinen
@ 2021-03-28 21:01   ` Huang, Kai
  2021-03-31 23:23     ` Jarkko Sakkinen
  0 siblings, 1 reply; 134+ messages in thread
From: Huang, Kai @ 2021-03-28 21:01 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, luto, Hansen, Dave,
	Edgecombe, Rick P, Huang, Haitao, pbonzini, bp, tglx, mingo, hpa,
	jethro, b.thiel, jmattson, joro, vkuznets, wanpengli, corbet

> On Fri, Mar 19, 2021 at 08:29:27PM +1300, Kai Huang wrote:
> > This series adds KVM SGX virtualization support. The first 14 patches
> > starting with x86/sgx or x86/cpu.. are necessary changes to x86 and
> > SGX core/driver to support KVM SGX virtualization, while the rest are patches
> to KVM subsystem.
> >
> > This series is based against latest tip/x86/sgx, which has Jarkko's
> > NUMA allocation support.
> >
> > You can also get the code from upstream branch of kvm-sgx repo on github:
> >
> >         https://github.com/intel/kvm-sgx.git upstream
> >
> > It also requires Qemu changes to create VM with SGX support. You can
> > find Qemu repo here:
> >
> > 	https://github.com/intel/qemu-sgx.git upstream
> >
> > Please refer to README.md of above qemu-sgx repo for detail on how to
> > create guest with SGX support. At meantime, for your quick reference
> > you can use below command to create SGX guest:
> >
> > 	#qemu-system-x86_64 -smp 4 -m 2G -drive
> file=<your_vm_image>,if=virtio \
> > 		-cpu host,+sgx_provisionkey \
> > 		-sgx-epc id=epc1,memdev=mem1 \
> > 		-object memory-backend-epc,id=mem1,size=64M,prealloc
> >
> > Please note that the SGX relevant part is:
> >
> > 		-cpu host,+sgx_provisionkey \
> > 		-sgx-epc id=epc1,memdev=mem1 \
> > 		-object memory-backend-epc,id=mem1,size=64M,prealloc
> >
> > And you can change other parameters of your qemu command based on your
> needs.
> 
> Please also put tested-by from me to all patches (including pure KVM
> patches):
> 
> Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
> 
> I did the basic test, i.e. run selftest in a VM. I think that is sufficient at this point.
> 

Thanks Jarkko for doing the test!

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-26 15:03   ` Borislav Petkov
  2021-03-26 15:17     ` Dave Hansen
@ 2021-03-31  1:10     ` Kai Huang
  2021-03-31  6:44       ` Boris Petkov
  1 sibling, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-31  1:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: seanjc, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, 26 Mar 2021 16:03:55 +0100 Borislav Petkov wrote:
> On Fri, Mar 19, 2021 at 08:22:21PM +1300, Kai Huang wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw" EPC
> > without an associated enclave.  The intended and only known use case for
> > raw EPC allocation is to expose EPC to a KVM guest, hence the 'vepc'
> > moniker, virt.{c,h} files and X86_SGX_KVM Kconfig.
> > 
> > SGX driver uses misc device /dev/sgx_enclave to support userspace to
> > create enclave.  Each file descriptor from opening /dev/sgx_enclave
> > represents an enclave.  Unlike SGX driver, KVM doesn't control how guest
> > uses EPC, therefore EPC allocated to KVM guest is not associated to an
> > enclave, and /dev/sgx_enclave is not suitable for allocating EPC for KVM
> > guest.
> > 
> > Having separate device nodes for SGX driver and KVM virtual EPC also
> > allows separate permission control for running host SGX enclaves and
> > KVM SGX guests.
> 
> Hmm, just a question on the big picture here - that might've popped up
> already:
> 
> So baremetal uses /dev/sgx_enclave and KVM uses /dev/sgx_vepc. Who's
> deciding which of the two has priority?

Hi Boris,

Sorry the late response (I saw Dave was replying. Thanks Dave :)).

Ultimately the admin, or the user decides, or the two don't have priority, from
EPC page allocation's perspective. SGX driver's EPC page reclaiming won't be
able to reclaim pages that have been allocated to KVM guests, and virtual EPC
fault handler won't try to reclaim page that has been allocated to host enclaves
either, when it tries to allocate EPC page.

For instance, in case of cloud, where KVM SGX is the main usage, SGX driver in
host either won't be used, or very minimal, specific and well-defined workloads
will be deployed in host (for instance, Quoting enclave and architecture
enclaves that are used for attestation). The admin will be aware of such EPC
allocation disjoint situation, and deploy host enclaves/KVM SGX guests
accordingly.

> 
> Let's say all guests start using enclaves and baremetal cannot start any
> new ones anymore due to no more memory. Are we ok with that?
> 
> What if baremetal creates a big fat enclave and starves guests all of a
> sudden. Are we ok with that either?

Yes to both above questions.

> 
> In general, having two disjoint things give out SGX resources separately
> sounds like trouble to me.
> 
> IOW, why don't all virt allocations go through /dev/sgx_enclave too, so
> that you can have a single place to control all resource allocations?

Overall, there are two reasons (also mentioned in the commit msg of this patch):

1) /dev/sgx_enclave, by its name, implies EPC pages allocated to it are
associated to an host enclave, so it is not suitable for virtual EPC, since EPC
allocated to KVM guest won't have an enclave associated. It's possible to
modify SGX driver (such as deferring 'struct sgx_encl' allocation from open to
CREATE_ENCLAVE ioctl, modifying majority code flows to cover both cases, etc),
but even with that, we'd still better to change /dev/sgx_enclave
to, for instance, /dev/sgx_epc, so it doesn't imply the fd opened from it is
an host enclave, but some raw EPC. However this is userspace ABI change. 
2) Having separate /dev/sgx_enclave, and /dev/sgx_vepc, allows admin to have
different permission control, if required.

So based on above reasons, we agreed it's better to have two device nodes.

Please see previous discussion for RFC v4:
https://lore.kernel.org/linux-sgx/c50ffb557166132cf73d0e838d3a5c1f653b28b7.camel@intel.com/

> 
> > To use /dev/sgx_vepc to allocate a virtual EPC instance with particular
> > size, the userspace hypervisor opens /dev/sgx_vepc, and uses mmap()
> > with the intended size to get an address range of virtual EPC.  Then
> > it may use the address range to create one KVM memory slot as virtual
> > EPC for guest.
> > 
> > Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> > /dev/sgx_vepc rather than in KVM. Doing so has two major advantages:
> > 
> >   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
> >     just another memory backend for guests.
> > 
> >   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
> >     does not have to export any symbols, changes to reclaim flows don't
> >     need to be routed through KVM, SGX's dirty laundry doesn't have to
> >     get aired out for the world to see,
> 
> Good one. :-)
> 
> > and so on and so forth.
> 
> > The virtual EPC pages allocated to guests are currently not reclaimable.
> > Reclaiming EPC page used by enclave requires a special reclaim mechanism
> > separate from normal page reclaim, and that mechanism is not supported
> > for virutal EPC pages.  Due to the complications of handling reclaim
> > conflicts between guest and host, reclaiming virtual EPC pages is
> > significantly more complex than basic support for SGX virtualization.
> 
> What happens if someone in the future wants to change that? Someone
> needs to write patches or there's a more fundamental stopper issue
> involved?

Sorry I am not following. Do you mean if someone wants to support "reclaiming
EPC page from KVM guests"? If so yes someone needs to write patches (we
internally have some, actually), but could you elaborate why there will be a
more fundamental stopper issue involved?



^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-31  1:10     ` Kai Huang
@ 2021-03-31  6:44       ` Boris Petkov
  2021-03-31  6:51         ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Boris Petkov @ 2021-03-31  6:44 UTC (permalink / raw)
  To: Kai Huang
  Cc: seanjc, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On March 31, 2021 3:10:32 AM GMT+02:00, Kai Huang <kai.huang@intel.com> wrote: 

> The admin will be aware of
>such EPC
>allocation disjoint situation, and deploy host enclaves/KVM SGX guests
>accordingly.

The admin will be aware because...

1) he's following our discussion?

2) he'll read the commit messages and hopefully understand?

3) we *actually* have documentation somewhere explaining how we envision that stuff to be used?

Or none of the above and he'll end up doing whatever and then he'll eventually figure out that we don't support that use case but he's doing it already anyway and we don't break userspace so we have to support it now and we're stuck somewhere between a rock and a hard place?

Hmm, I think we have enough misguided use cases as it is - don't need another one.

-- 
Sent from a small device: formatting sux and brevity is inevitable.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-31  6:44       ` Boris Petkov
@ 2021-03-31  6:51         ` Kai Huang
  2021-03-31  7:44           ` Boris Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-31  6:51 UTC (permalink / raw)
  To: Boris Petkov
  Cc: seanjc, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Wed, 31 Mar 2021 08:44:23 +0200 Boris Petkov wrote:
> On March 31, 2021 3:10:32 AM GMT+02:00, Kai Huang <kai.huang@intel.com> wrote: 
> 
> > The admin will be aware of
> >such EPC
> >allocation disjoint situation, and deploy host enclaves/KVM SGX guests
> >accordingly.
> 
> The admin will be aware because...
> 
> 1) he's following our discussion?
> 
> 2) he'll read the commit messages and hopefully understand?
> 
> 3) we *actually* have documentation somewhere explaining how we envision that stuff to be used?
> 
> Or none of the above and he'll end up doing whatever and then he'll eventually figure out that we don't support that use case but he's doing it already anyway and we don't break userspace so we have to support it now and we're stuck somewhere between a rock and a hard place?
> 
> Hmm, I think we have enough misguided use cases as it is - don't need another one.

How about adding explanation to Documentation/x86/sgx.rst?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-31  6:51         ` Kai Huang
@ 2021-03-31  7:44           ` Boris Petkov
  2021-03-31  8:53             ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Boris Petkov @ 2021-03-31  7:44 UTC (permalink / raw)
  To: Kai Huang
  Cc: seanjc, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On March 31, 2021 8:51:38 AM GMT+02:00, Kai Huang <kai.huang@intel.com> wrote:
>How about adding explanation to Documentation/x86/sgx.rst?

Sure, and then we should point users at it. The thing is also indexed by search engines so hopefully people will find it.

Thx.

-- 
Sent from a small device: formatting sux and brevity is inevitable.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-31  7:44           ` Boris Petkov
@ 2021-03-31  8:53             ` Kai Huang
  2021-03-31 12:20               ` Kai Huang
  2021-04-01  9:45               ` Kai Huang
  0 siblings, 2 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-31  8:53 UTC (permalink / raw)
  To: Boris Petkov
  Cc: seanjc, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Wed, 31 Mar 2021 09:44:39 +0200 Boris Petkov wrote:
> On March 31, 2021 8:51:38 AM GMT+02:00, Kai Huang <kai.huang@intel.com> wrote:
> >How about adding explanation to Documentation/x86/sgx.rst?
> 
> Sure, and then we should point users at it. The thing is also indexed by search engines so hopefully people will find it.

Thanks. Will do and send out new patch for review.


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-31  8:53             ` Kai Huang
@ 2021-03-31 12:20               ` Kai Huang
  2021-04-01 18:31                 ` Borislav Petkov
  2021-04-01  9:45               ` Kai Huang
  1 sibling, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-31 12:20 UTC (permalink / raw)
  To: Kai Huang
  Cc: Boris Petkov, seanjc, kvm, x86, linux-sgx, linux-kernel, jarkko,
	luto, dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini,
	tglx, mingo, hpa

On Wed, 31 Mar 2021 21:53:45 +1300 Kai Huang wrote:
> On Wed, 31 Mar 2021 09:44:39 +0200 Boris Petkov wrote:
> > On March 31, 2021 8:51:38 AM GMT+02:00, Kai Huang <kai.huang@intel.com> wrote:
> > >How about adding explanation to Documentation/x86/sgx.rst?
> > 
> > Sure, and then we should point users at it. The thing is also indexed by search engines so hopefully people will find it.
> 
> Thanks. Will do and send out new patch for review.
> 
Hi Boris,

Could you help to review whether below change is OK?

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index 5ec7d17e65e0..49a840718a4d 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -236,3 +236,19 @@ As a result, when this happpens, user should stop running
any new SGX workloads, (or just any new workloads), and migrate all valuable
workloads. Although a machine reboot can recover all EPC, the bug should be
reported to Linux developers.
+
+Virtual EPC
+===========
+
+Separated from SGX driver for creating and running enclaves in host, SGX core
+also supports virtual EPC driver to support KVM SGX virtualization. Unlike SGX
+driver, EPC page allocated via virtual EPC driver is "raw" EPC page and doesn't
+have specific enclave associated. This is because KVM doesn't track how guest
+uses EPC pages.
+
+As a result, SGX core page reclaimer doesn't support reclaiming EPC pages
+allocated to KVM guests via virtual EPC driver. If user wants to deploy both
+host SGX applications and KVM SGX guests on the same machine, user should
+reserve enough EPC (by taking out total virtual EPC size of all SGX VMs from
+physical EPC size) for host SGX applications so they can run with acceptable
+performance.

In my local, I have squashed above change to this patch, and also added below
paragraph to the commit message:

    Also add documenetation to explain what is virtual EPC, and suggest
    users should be aware of virtual EPC pages are not reclaimable and take
    this into account when deploying both host SGX applications and KVM SGX
    guests on the same machine.

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 00/25] KVM SGX virtualization support
  2021-03-28 21:01   ` Huang, Kai
@ 2021-03-31 23:23     ` Jarkko Sakkinen
  0 siblings, 0 replies; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-31 23:23 UTC (permalink / raw)
  To: Huang, Kai
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, luto, Hansen, Dave,
	Edgecombe, Rick P, Huang, Haitao, pbonzini, bp, tglx, mingo, hpa,
	jethro, b.thiel, jmattson, joro, vkuznets, wanpengli, corbet

On Sun, Mar 28, 2021 at 09:01:38PM +0000, Huang, Kai wrote:
> > On Fri, Mar 19, 2021 at 08:29:27PM +1300, Kai Huang wrote:
> > > This series adds KVM SGX virtualization support. The first 14 patches
> > > starting with x86/sgx or x86/cpu.. are necessary changes to x86 and
> > > SGX core/driver to support KVM SGX virtualization, while the rest are patches
> > to KVM subsystem.
> > >
> > > This series is based against latest tip/x86/sgx, which has Jarkko's
> > > NUMA allocation support.
> > >
> > > You can also get the code from upstream branch of kvm-sgx repo on github:
> > >
> > >         https://github.com/intel/kvm-sgx.git upstream
> > >
> > > It also requires Qemu changes to create VM with SGX support. You can
> > > find Qemu repo here:
> > >
> > > 	https://github.com/intel/qemu-sgx.git upstream
> > >
> > > Please refer to README.md of above qemu-sgx repo for detail on how to
> > > create guest with SGX support. At meantime, for your quick reference
> > > you can use below command to create SGX guest:
> > >
> > > 	#qemu-system-x86_64 -smp 4 -m 2G -drive
> > file=<your_vm_image>,if=virtio \
> > > 		-cpu host,+sgx_provisionkey \
> > > 		-sgx-epc id=epc1,memdev=mem1 \
> > > 		-object memory-backend-epc,id=mem1,size=64M,prealloc
> > >
> > > Please note that the SGX relevant part is:
> > >
> > > 		-cpu host,+sgx_provisionkey \
> > > 		-sgx-epc id=epc1,memdev=mem1 \
> > > 		-object memory-backend-epc,id=mem1,size=64M,prealloc
> > >
> > > And you can change other parameters of your qemu command based on your
> > needs.
> > 
> > Please also put tested-by from me to all patches (including pure KVM
> > patches):
> > 
> > Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
> > 
> > I did the basic test, i.e. run selftest in a VM. I think that is sufficient at this point.
> > 
> 
> Thanks Jarkko for doing the test!

Sure, have had it in my queue for some time :-)

/Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v4 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-19  7:22 ` [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
  2021-03-25  9:36   ` Kai Huang
  2021-03-26 15:03   ` Borislav Petkov
@ 2021-04-01  9:42   ` Kai Huang
  2021-04-05  9:01   ` [PATCH v3 " Borislav Petkov
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  4 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-04-01  9:42 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw" EPC
without an associated enclave.  The intended and only known use case for
raw EPC allocation is to expose EPC to a KVM guest, hence the 'vepc'
moniker, virt.{c,h} files and X86_SGX_KVM Kconfig.

SGX driver uses misc device /dev/sgx_enclave to support userspace to
create enclave.  Each file descriptor from opening /dev/sgx_enclave
represents an enclave.  Unlike SGX driver, KVM doesn't control how guest
uses EPC, therefore EPC allocated to KVM guest is not associated to an
enclave, and /dev/sgx_enclave is not suitable for allocating EPC for KVM
guest.

Having separate device nodes for SGX driver and KVM virtual EPC also
allows separate permission control for running host SGX enclaves and
KVM SGX guests.

To use /dev/sgx_vepc to allocate a virtual EPC instance with particular
size, the userspace hypervisor opens /dev/sgx_vepc, and uses mmap()
with the intended size to get an address range of virtual EPC.  Then
it may use the address range to create one KVM memory slot as virtual
EPC for guest.

Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
/dev/sgx_vepc rather than in KVM. Doing so has two major advantages:

  - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
    just another memory backend for guests.

  - EPC management is wholly contained in the SGX subsystem, e.g. SGX
    does not have to export any symbols, changes to reclaim flows don't
    need to be routed through KVM, SGX's dirty laundry doesn't have to
    get aired out for the world to see, and so on and so forth.

The virtual EPC pages allocated to guests are currently not reclaimable.
Reclaiming EPC page used by enclave requires a special reclaim mechanism
separate from normal page reclaim, and that mechanism is not supported
for virutal EPC pages.  Due to the complications of handling reclaim
conflicts between guest and host, reclaiming virtual EPC pages is
significantly more complex than basic support for SGX virtualization.

Also add documenetation to explain what is virtual EPC, and suggest
users should be aware of virtual EPC pages are not reclaimable and take
this into account when deploying both host SGX applications and KVM SGX
guests on the same machine.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v3->v4:
 - Added documentation to explain virtual EPC, and suggest user what to do if
   user wants to run both host SGX apps and KVM SGX guests, since EPC pages
   assigned to guest is not reclaimable.
 - Added one paragraph in commit msg to say Documentation has been updated.
 - Changed to use EREMOVE_ERROR_MESSAGE in sgx_vepc_free_page().

---
 Documentation/x86/sgx.rst        |  16 ++
 arch/x86/Kconfig                 |  12 ++
 arch/x86/kernel/cpu/sgx/Makefile |   1 +
 arch/x86/kernel/cpu/sgx/sgx.h    |   9 ++
 arch/x86/kernel/cpu/sgx/virt.c   | 259 +++++++++++++++++++++++++++++++
 5 files changed, 297 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.c

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index 5ec7d17e65e0..49a840718a4d 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -236,3 +236,19 @@ As a result, when this happpens, user should stop running any new SGX workloads,
 (or just any new workloads), and migrate all valuable workloads. Although a
 machine reboot can recover all EPC, the bug should be reported to Linux
 developers.
+
+Virtual EPC
+===========
+
+Separated from SGX driver for creating and running enclaves in host, SGX core
+also supports virtual EPC driver to support KVM SGX virtualization. Unlike SGX
+driver, EPC page allocated via virtual EPC driver is "raw" EPC page and doesn't
+have specific enclave associated. This is because KVM doesn't track how guest
+uses EPC pages.
+
+As a result, SGX core page reclaimer doesn't support reclaiming EPC pages
+allocated to KVM guests via virtual EPC driver. If user wants to deploy both
+host SGX applications and KVM SGX guests on the same machine, user should
+reserve enough EPC (by taking out total virtual EPC size of all SGX VMs from
+physical EPC size) for host SGX applications so they can run with acceptable
+performance.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 35391e94bd22..007912f67a06 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1942,6 +1942,18 @@ config X86_SGX
 
 	  If unsure, say N.
 
+config X86_SGX_KVM
+	bool "Software Guard eXtensions (SGX) Virtualization"
+	depends on X86_SGX && KVM_INTEL
+	help
+
+	  Enables KVM guests to create SGX enclaves.
+
+	  This includes support to expose "raw" unreclaimable enclave memory to
+	  guests via a device node, e.g. /dev/sgx_vepc.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 91d3dc784a29..9c1656779b2a 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -3,3 +3,4 @@ obj-y += \
 	encl.o \
 	ioctl.o \
 	main.o
+obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 6b21a165500e..165bd2596843 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -85,4 +85,13 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 
+#ifdef CONFIG_X86_SGX_KVM
+int __init sgx_vepc_init(void);
+#else
+static inline int __init sgx_vepc_init(void)
+{
+	return -ENODEV;
+}
+#endif
+
 #endif /* _X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
new file mode 100644
index 000000000000..42ff7dae09bb
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -0,0 +1,259 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Device driver to expose SGX enclave memory to KVM guests.
+ *
+ * Copyright(c) 2021 Intel Corporation.
+ */
+
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/xarray.h>
+#include <asm/sgx.h>
+#include <uapi/asm/sgx.h>
+
+#include "encls.h"
+#include "sgx.h"
+
+struct sgx_vepc {
+	struct xarray page_array;
+	struct mutex lock;
+};
+
+/*
+ * Temporary SECS pages that cannot be EREMOVE'd due to having child in other
+ * virtual EPC instances, and the lock to protect it.
+ */
+static struct mutex zombie_secs_pages_lock;
+static struct list_head zombie_secs_pages;
+
+static int __sgx_vepc_fault(struct sgx_vepc *vepc,
+			    struct vm_area_struct *vma, unsigned long addr)
+{
+	struct sgx_epc_page *epc_page;
+	unsigned long index, pfn;
+	int ret;
+
+	WARN_ON(!mutex_is_locked(&vepc->lock));
+
+	/* Calculate index of EPC page in virtual EPC's page_array */
+	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
+
+	epc_page = xa_load(&vepc->page_array, index);
+	if (epc_page)
+		return 0;
+
+	epc_page = sgx_alloc_epc_page(vepc, false);
+	if (IS_ERR(epc_page))
+		return PTR_ERR(epc_page);
+
+	ret = xa_err(xa_store(&vepc->page_array, index, epc_page, GFP_KERNEL));
+	if (ret)
+		goto err_free;
+
+	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
+
+	ret = vmf_insert_pfn(vma, addr, pfn);
+	if (ret != VM_FAULT_NOPAGE) {
+		ret = -EFAULT;
+		goto err_delete;
+	}
+
+	return 0;
+
+err_delete:
+	xa_erase(&vepc->page_array, index);
+err_free:
+	sgx_free_epc_page(epc_page);
+	return ret;
+}
+
+static vm_fault_t sgx_vepc_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_vepc *vepc = vma->vm_private_data;
+	int ret;
+
+	mutex_lock(&vepc->lock);
+	ret = __sgx_vepc_fault(vepc, vma, vmf->address);
+	mutex_unlock(&vepc->lock);
+
+	if (!ret)
+		return VM_FAULT_NOPAGE;
+
+	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
+		mmap_read_unlock(vma->vm_mm);
+		return VM_FAULT_RETRY;
+	}
+
+	return VM_FAULT_SIGBUS;
+}
+
+const struct vm_operations_struct sgx_vepc_vm_ops = {
+	.fault = sgx_vepc_fault,
+};
+
+static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct sgx_vepc *vepc = file->private_data;
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	vma->vm_ops = &sgx_vepc_vm_ops;
+	/* Don't copy VMA in fork() */
+	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
+	vma->vm_private_data = vepc;
+
+	return 0;
+}
+
+static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+{
+	int ret;
+
+	/*
+	 * Take a previously guest-owned EPC page and return it to the
+	 * general EPC page pool.
+	 *
+	 * Guests can not be trusted to have left this page in a good
+	 * state, so run EREMOVE on the page unconditionally.  In the
+	 * case that a guest properly EREMOVE'd this page, a superfluous
+	 * EREMOVE is harmless.
+	 */
+	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
+	if (ret) {
+		/*
+		 * Only SGX_CHILD_PRESENT is expected, which is because of
+		 * EREMOVE'ing an SECS still with child, in which case it can
+		 * be handled by EREMOVE'ing the SECS again after all pages in
+		 * virtual EPC have been EREMOVE'd. See comments in below in
+		 * sgx_vepc_release().
+		 *
+		 * The user of virtual EPC (KVM) needs to guarantee there's no
+		 * logical processor is still running in the enclave in guest,
+		 * otherwise EREMOVE will get SGX_ENCLAVE_ACT which cannot be
+		 * handled here.
+		 */
+		WARN_ONCE(ret != SGX_CHILD_PRESENT, EREMOVE_ERROR_MESSAGE,
+				ret, ret);
+		return ret;
+	}
+
+	sgx_free_epc_page(epc_page);
+
+	return 0;
+}
+
+static int sgx_vepc_release(struct inode *inode, struct file *file)
+{
+	struct sgx_vepc *vepc = file->private_data;
+	struct sgx_epc_page *epc_page, *tmp, *entry;
+	unsigned long index;
+
+	LIST_HEAD(secs_pages);
+
+	xa_for_each(&vepc->page_array, index, entry) {
+		/*
+		 * Remove all normal, child pages.  sgx_vepc_free_page()
+		 * will fail if EREMOVE fails, but this is OK and expected on
+		 * SECS pages.  Those can only be EREMOVE'd *after* all their
+		 * child pages. Retries below will clean them up.
+		 */
+		if (sgx_vepc_free_page(entry))
+			continue;
+
+		xa_erase(&vepc->page_array, index);
+	}
+
+	/*
+	 * Retry EREMOVE'ing pages.  This will clean up any SECS pages that
+	 * only had children in this 'epc' area.
+	 */
+	xa_for_each(&vepc->page_array, index, entry) {
+		epc_page = entry;
+		/*
+		 * An EREMOVE failure here means that the SECS page still
+		 * has children.  But, since all children in this 'sgx_vepc'
+		 * have been removed, the SECS page must have a child on
+		 * another instance.
+		 */
+		if (sgx_vepc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+
+		xa_erase(&vepc->page_array, index);
+	}
+
+	/*
+	 * SECS pages are "pinned" by child pages, and unpinned once all
+	 * children have been EREMOVE'd.  A child page in this instance
+	 * may have pinned an SECS page encountered in an earlier release(),
+	 * creating a zombie.  Since some children were EREMOVE'd above,
+	 * try to EREMOVE all zombies in the hopes that one was unpinned.
+	 */
+	mutex_lock(&zombie_secs_pages_lock);
+	list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) {
+		/*
+		 * Speculatively remove the page from the list of zombies,
+		 * if the page is successfully EREMOVE it will be added to
+		 * the list of free pages.  If EREMOVE fails, throw the page
+		 * on the local list, which will be spliced on at the end.
+		 */
+		list_del(&epc_page->list);
+
+		if (sgx_vepc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+	}
+
+	if (!list_empty(&secs_pages))
+		list_splice_tail(&secs_pages, &zombie_secs_pages);
+	mutex_unlock(&zombie_secs_pages_lock);
+
+	kfree(vepc);
+
+	return 0;
+}
+
+static int sgx_vepc_open(struct inode *inode, struct file *file)
+{
+	struct sgx_vepc *vepc;
+
+	vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL);
+	if (!vepc)
+		return -ENOMEM;
+	mutex_init(&vepc->lock);
+	xa_init(&vepc->page_array);
+
+	file->private_data = vepc;
+
+	return 0;
+}
+
+static const struct file_operations sgx_vepc_fops = {
+	.owner		= THIS_MODULE,
+	.open		= sgx_vepc_open,
+	.release	= sgx_vepc_release,
+	.mmap		= sgx_vepc_mmap,
+};
+
+static struct miscdevice sgx_vepc_dev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "sgx_vepc",
+	.nodename = "sgx_vepc",
+	.fops = &sgx_vepc_fops,
+};
+
+int __init sgx_vepc_init(void)
+{
+	/* SGX virtualization requires KVM to work */
+	if (!boot_cpu_has(X86_FEATURE_VMX))
+		return -ENODEV;
+
+	INIT_LIST_HEAD(&zombie_secs_pages);
+	mutex_init(&zombie_secs_pages_lock);
+
+	return misc_register(&sgx_vepc_dev);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-31  8:53             ` Kai Huang
  2021-03-31 12:20               ` Kai Huang
@ 2021-04-01  9:45               ` Kai Huang
  1 sibling, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-04-01  9:45 UTC (permalink / raw)
  To: Kai Huang
  Cc: Boris Petkov, seanjc, kvm, x86, linux-sgx, linux-kernel, jarkko,
	luto, dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini,
	tglx, mingo, hpa

On Wed, 31 Mar 2021 21:53:45 +1300 Kai Huang wrote:
> On Wed, 31 Mar 2021 09:44:39 +0200 Boris Petkov wrote:
> > On March 31, 2021 8:51:38 AM GMT+02:00, Kai Huang <kai.huang@intel.com> wrote:
> > >How about adding explanation to Documentation/x86/sgx.rst?
> > 
> > Sure, and then we should point users at it. The thing is also indexed by search engines so hopefully people will find it.
> 
> Thanks. Will do and send out new patch for review.
> 

Hi Boris,

I have sent out the updated patch, with documentation added. I also added
changelog in the patch. please help to take a look. Thanks!

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-31 12:20               ` Kai Huang
@ 2021-04-01 18:31                 ` Borislav Petkov
  2021-04-01 23:38                   ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-04-01 18:31 UTC (permalink / raw)
  To: Kai Huang
  Cc: seanjc, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Thu, Apr 01, 2021 at 01:20:39AM +1300, Kai Huang wrote:
> Could you help to review whether below change is OK?

I ended up applying this:

---
From: Sean Christopherson <sean.j.christopherson@intel.com>
Date: Fri, 19 Mar 2021 20:22:21 +1300
Subject: [PATCH] x86/sgx: Introduce virtual EPC for use by KVM guests

Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw"
Enclave Page Cache (EPC) without an associated enclave. The intended
and only known use case for raw EPC allocation is to expose EPC to a
KVM guest, hence the 'vepc' moniker, virt.{c,h} files and X86_SGX_KVM
Kconfig.

The SGX driver uses the misc device /dev/sgx_enclave to support
userspace in creating an enclave. Each file descriptor returned from
opening /dev/sgx_enclave represents an enclave. Unlike the SGX driver,
KVM doesn't control how the guest uses the EPC, therefore EPC allocated
to a KVM guest is not associated with an enclave, and /dev/sgx_enclave
is not suitable for allocating EPC for a KVM guest.

Having separate device nodes for the SGX driver and KVM virtual EPC also
allows separate permission control for running host SGX enclaves and KVM
SGX guests.

To use /dev/sgx_vepc to allocate a virtual EPC instance with particular
size, the hypervisor opens /dev/sgx_vepc, and uses mmap() with the
intended size to get an address range of virtual EPC. Then it may use
the address range to create one KVM memory slot as virtual EPC for
a guest.

Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
/dev/sgx_vepc rather than in KVM. Doing so has two major advantages:

  - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
    just another memory backend for guests.

  - EPC management is wholly contained in the SGX subsystem, e.g. SGX
    does not have to export any symbols, changes to reclaim flows don't
    need to be routed through KVM, SGX's dirty laundry doesn't have to
    get aired out for the world to see, and so on and so forth.

The virtual EPC pages allocated to guests are currently not reclaimable.
Reclaiming an EPC page used by enclave requires a special reclaim
mechanism separate from normal page reclaim, and that mechanism is not
supported for virutal EPC pages. Due to the complications of handling
reclaim conflicts between guest and host, reclaiming virtual EPC pages
is significantly more complex than basic support for SGX virtualization.

 [ bp:
   - Massage commit message and comments
   - use cpu_feature_enabled()
   - vertically align struct members init
   - massage Virtual EPC clarification text ]

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/0c38ced8c8e5a69872db4d6a1c0dabd01e07cad7.1616136308.git.kai.huang@intel.com
---
 Documentation/x86/sgx.rst        |  16 ++
 arch/x86/Kconfig                 |  12 ++
 arch/x86/kernel/cpu/sgx/Makefile |   1 +
 arch/x86/kernel/cpu/sgx/sgx.h    |   9 ++
 arch/x86/kernel/cpu/sgx/virt.c   | 259 +++++++++++++++++++++++++++++++
 5 files changed, 297 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.c

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index f90076e67cde..dd0ac96ff9ef 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -234,3 +234,19 @@ As a result, when this happpens, user should stop running any new
 SGX workloads, (or just any new workloads), and migrate all valuable
 workloads. Although a machine reboot can recover all EPC memory, the bug
 should be reported to Linux developers.
+
+
+Virtual EPC
+===========
+
+The implementation has also a virtual EPC driver to support SGX enclaves
+in guests. Unlike the SGX driver, an EPC page allocated by the virtual
+EPC driver doesn't have a specific enclave associated with it. This is
+because KVM doesn't track how a guest uses EPC pages.
+
+As a result, the SGX core page reclaimer doesn't support reclaiming EPC
+pages allocated to KVM guests through the virtual EPC driver. If the
+user wants to deploy SGX applications both on the host and in guests
+on the same machine, the user should reserve enough EPC (by taking out
+total virtual EPC size of all SGX VMs from the physical EPC size) for
+host SGX applications so they can run with acceptable performance.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 35391e94bd22..007912f67a06 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1942,6 +1942,18 @@ config X86_SGX
 
 	  If unsure, say N.
 
+config X86_SGX_KVM
+	bool "Software Guard eXtensions (SGX) Virtualization"
+	depends on X86_SGX && KVM_INTEL
+	help
+
+	  Enables KVM guests to create SGX enclaves.
+
+	  This includes support to expose "raw" unreclaimable enclave memory to
+	  guests via a device node, e.g. /dev/sgx_vepc.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 91d3dc784a29..9c1656779b2a 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -3,3 +3,4 @@ obj-y += \
 	encl.o \
 	ioctl.o \
 	main.o
+obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4aa40c627819..4854f3980edd 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -84,4 +84,13 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 
+#ifdef CONFIG_X86_SGX_KVM
+int __init sgx_vepc_init(void);
+#else
+static inline int __init sgx_vepc_init(void)
+{
+	return -ENODEV;
+}
+#endif
+
 #endif /* _X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
new file mode 100644
index 000000000000..259cc46ad78c
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -0,0 +1,259 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Device driver to expose SGX enclave memory to KVM guests.
+ *
+ * Copyright(c) 2021 Intel Corporation.
+ */
+
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/xarray.h>
+#include <asm/sgx.h>
+#include <uapi/asm/sgx.h>
+
+#include "encls.h"
+#include "sgx.h"
+
+struct sgx_vepc {
+	struct xarray page_array;
+	struct mutex lock;
+};
+
+/*
+ * Temporary SECS pages that cannot be EREMOVE'd due to having child in other
+ * virtual EPC instances, and the lock to protect it.
+ */
+static struct mutex zombie_secs_pages_lock;
+static struct list_head zombie_secs_pages;
+
+static int __sgx_vepc_fault(struct sgx_vepc *vepc,
+			    struct vm_area_struct *vma, unsigned long addr)
+{
+	struct sgx_epc_page *epc_page;
+	unsigned long index, pfn;
+	int ret;
+
+	WARN_ON(!mutex_is_locked(&vepc->lock));
+
+	/* Calculate index of EPC page in virtual EPC's page_array */
+	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
+
+	epc_page = xa_load(&vepc->page_array, index);
+	if (epc_page)
+		return 0;
+
+	epc_page = sgx_alloc_epc_page(vepc, false);
+	if (IS_ERR(epc_page))
+		return PTR_ERR(epc_page);
+
+	ret = xa_err(xa_store(&vepc->page_array, index, epc_page, GFP_KERNEL));
+	if (ret)
+		goto err_free;
+
+	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
+
+	ret = vmf_insert_pfn(vma, addr, pfn);
+	if (ret != VM_FAULT_NOPAGE) {
+		ret = -EFAULT;
+		goto err_delete;
+	}
+
+	return 0;
+
+err_delete:
+	xa_erase(&vepc->page_array, index);
+err_free:
+	sgx_free_epc_page(epc_page);
+	return ret;
+}
+
+static vm_fault_t sgx_vepc_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_vepc *vepc = vma->vm_private_data;
+	int ret;
+
+	mutex_lock(&vepc->lock);
+	ret = __sgx_vepc_fault(vepc, vma, vmf->address);
+	mutex_unlock(&vepc->lock);
+
+	if (!ret)
+		return VM_FAULT_NOPAGE;
+
+	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
+		mmap_read_unlock(vma->vm_mm);
+		return VM_FAULT_RETRY;
+	}
+
+	return VM_FAULT_SIGBUS;
+}
+
+const struct vm_operations_struct sgx_vepc_vm_ops = {
+	.fault = sgx_vepc_fault,
+};
+
+static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct sgx_vepc *vepc = file->private_data;
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	vma->vm_ops = &sgx_vepc_vm_ops;
+	/* Don't copy VMA in fork() */
+	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
+	vma->vm_private_data = vepc;
+
+	return 0;
+}
+
+static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+{
+	int ret;
+
+	/*
+	 * Take a previously guest-owned EPC page and return it to the
+	 * general EPC page pool.
+	 *
+	 * Guests can not be trusted to have left this page in a good
+	 * state, so run EREMOVE on the page unconditionally.  In the
+	 * case that a guest properly EREMOVE'd this page, a superfluous
+	 * EREMOVE is harmless.
+	 */
+	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
+	if (ret) {
+		/*
+		 * Only SGX_CHILD_PRESENT is expected, which is because of
+		 * EREMOVE'ing an SECS still with child, in which case it can
+		 * be handled by EREMOVE'ing the SECS again after all pages in
+		 * virtual EPC have been EREMOVE'd. See comments in below in
+		 * sgx_vepc_release().
+		 *
+		 * The user of virtual EPC (KVM) needs to guarantee there's no
+		 * logical processor is still running in the enclave in guest,
+		 * otherwise EREMOVE will get SGX_ENCLAVE_ACT which cannot be
+		 * handled here.
+		 */
+		WARN_ONCE(ret != SGX_CHILD_PRESENT, EREMOVE_ERROR_MESSAGE,
+			  ret, ret);
+		return ret;
+	}
+
+	sgx_free_epc_page(epc_page);
+
+	return 0;
+}
+
+static int sgx_vepc_release(struct inode *inode, struct file *file)
+{
+	struct sgx_vepc *vepc = file->private_data;
+	struct sgx_epc_page *epc_page, *tmp, *entry;
+	unsigned long index;
+
+	LIST_HEAD(secs_pages);
+
+	xa_for_each(&vepc->page_array, index, entry) {
+		/*
+		 * Remove all normal, child pages.  sgx_vepc_free_page()
+		 * will fail if EREMOVE fails, but this is OK and expected on
+		 * SECS pages.  Those can only be EREMOVE'd *after* all their
+		 * child pages. Retries below will clean them up.
+		 */
+		if (sgx_vepc_free_page(entry))
+			continue;
+
+		xa_erase(&vepc->page_array, index);
+	}
+
+	/*
+	 * Retry EREMOVE'ing pages.  This will clean up any SECS pages that
+	 * only had children in this 'epc' area.
+	 */
+	xa_for_each(&vepc->page_array, index, entry) {
+		epc_page = entry;
+		/*
+		 * An EREMOVE failure here means that the SECS page still
+		 * has children.  But, since all children in this 'sgx_vepc'
+		 * have been removed, the SECS page must have a child on
+		 * another instance.
+		 */
+		if (sgx_vepc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+
+		xa_erase(&vepc->page_array, index);
+	}
+
+	/*
+	 * SECS pages are "pinned" by child pages, and "unpinned" once all
+	 * children have been EREMOVE'd.  A child page in this instance
+	 * may have pinned an SECS page encountered in an earlier release(),
+	 * creating a zombie.  Since some children were EREMOVE'd above,
+	 * try to EREMOVE all zombies in the hopes that one was unpinned.
+	 */
+	mutex_lock(&zombie_secs_pages_lock);
+	list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) {
+		/*
+		 * Speculatively remove the page from the list of zombies,
+		 * if the page is successfully EREMOVE'd it will be added to
+		 * the list of free pages.  If EREMOVE fails, throw the page
+		 * on the local list, which will be spliced on at the end.
+		 */
+		list_del(&epc_page->list);
+
+		if (sgx_vepc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+	}
+
+	if (!list_empty(&secs_pages))
+		list_splice_tail(&secs_pages, &zombie_secs_pages);
+	mutex_unlock(&zombie_secs_pages_lock);
+
+	kfree(vepc);
+
+	return 0;
+}
+
+static int sgx_vepc_open(struct inode *inode, struct file *file)
+{
+	struct sgx_vepc *vepc;
+
+	vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL);
+	if (!vepc)
+		return -ENOMEM;
+	mutex_init(&vepc->lock);
+	xa_init(&vepc->page_array);
+
+	file->private_data = vepc;
+
+	return 0;
+}
+
+static const struct file_operations sgx_vepc_fops = {
+	.owner		= THIS_MODULE,
+	.open		= sgx_vepc_open,
+	.release	= sgx_vepc_release,
+	.mmap		= sgx_vepc_mmap,
+};
+
+static struct miscdevice sgx_vepc_dev = {
+	.minor		= MISC_DYNAMIC_MINOR,
+	.name		= "sgx_vepc",
+	.nodename	= "sgx_vepc",
+	.fops		= &sgx_vepc_fops,
+};
+
+int __init sgx_vepc_init(void)
+{
+	/* SGX virtualization requires KVM to work */
+	if (!cpu_feature_enabled(X86_FEATURE_VMX))
+		return -ENODEV;
+
+	INIT_LIST_HEAD(&zombie_secs_pages);
+	mutex_init(&zombie_secs_pages_lock);
+
+	return misc_register(&sgx_vepc_dev);
+}
-- 
2.29.2

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-04-01 18:31                 ` Borislav Petkov
@ 2021-04-01 23:38                   ` Kai Huang
  0 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-04-01 23:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: seanjc, kvm, x86, linux-sgx, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Thu, 1 Apr 2021 20:31:59 +0200 Borislav Petkov wrote:
> On Thu, Apr 01, 2021 at 01:20:39AM +1300, Kai Huang wrote:
> > Could you help to review whether below change is OK?
> 
> I ended up applying this:

Thank you!

> 
> ---
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> Date: Fri, 19 Mar 2021 20:22:21 +1300
> Subject: [PATCH] x86/sgx: Introduce virtual EPC for use by KVM guests
> 
> Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw"
> Enclave Page Cache (EPC) without an associated enclave. The intended
> and only known use case for raw EPC allocation is to expose EPC to a
> KVM guest, hence the 'vepc' moniker, virt.{c,h} files and X86_SGX_KVM
> Kconfig.
> 
> The SGX driver uses the misc device /dev/sgx_enclave to support
> userspace in creating an enclave. Each file descriptor returned from
> opening /dev/sgx_enclave represents an enclave. Unlike the SGX driver,
> KVM doesn't control how the guest uses the EPC, therefore EPC allocated
> to a KVM guest is not associated with an enclave, and /dev/sgx_enclave
> is not suitable for allocating EPC for a KVM guest.
> 
> Having separate device nodes for the SGX driver and KVM virtual EPC also
> allows separate permission control for running host SGX enclaves and KVM
> SGX guests.
> 
> To use /dev/sgx_vepc to allocate a virtual EPC instance with particular
> size, the hypervisor opens /dev/sgx_vepc, and uses mmap() with the
> intended size to get an address range of virtual EPC. Then it may use
> the address range to create one KVM memory slot as virtual EPC for
> a guest.
> 
> Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> /dev/sgx_vepc rather than in KVM. Doing so has two major advantages:
> 
>   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
>     just another memory backend for guests.
> 
>   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
>     does not have to export any symbols, changes to reclaim flows don't
>     need to be routed through KVM, SGX's dirty laundry doesn't have to
>     get aired out for the world to see, and so on and so forth.
> 
> The virtual EPC pages allocated to guests are currently not reclaimable.
> Reclaiming an EPC page used by enclave requires a special reclaim
> mechanism separate from normal page reclaim, and that mechanism is not
> supported for virutal EPC pages. Due to the complications of handling
> reclaim conflicts between guest and host, reclaiming virtual EPC pages
> is significantly more complex than basic support for SGX virtualization.
> 
>  [ bp:
>    - Massage commit message and comments
>    - use cpu_feature_enabled()
>    - vertically align struct members init
>    - massage Virtual EPC clarification text ]
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Co-developed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Acked-by: Dave Hansen <dave.hansen@intel.com>
> Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> Link: https://lkml.kernel.org/r/0c38ced8c8e5a69872db4d6a1c0dabd01e07cad7.1616136308.git.kai.huang@intel.com
> ---
>  Documentation/x86/sgx.rst        |  16 ++
>  arch/x86/Kconfig                 |  12 ++
>  arch/x86/kernel/cpu/sgx/Makefile |   1 +
>  arch/x86/kernel/cpu/sgx/sgx.h    |   9 ++
>  arch/x86/kernel/cpu/sgx/virt.c   | 259 +++++++++++++++++++++++++++++++
>  5 files changed, 297 insertions(+)
>  create mode 100644 arch/x86/kernel/cpu/sgx/virt.c
> 
> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> index f90076e67cde..dd0ac96ff9ef 100644
> --- a/Documentation/x86/sgx.rst
> +++ b/Documentation/x86/sgx.rst
> @@ -234,3 +234,19 @@ As a result, when this happpens, user should stop running any new
>  SGX workloads, (or just any new workloads), and migrate all valuable
>  workloads. Although a machine reboot can recover all EPC memory, the bug
>  should be reported to Linux developers.
> +
> +
> +Virtual EPC
> +===========
> +
> +The implementation has also a virtual EPC driver to support SGX enclaves
> +in guests. Unlike the SGX driver, an EPC page allocated by the virtual
> +EPC driver doesn't have a specific enclave associated with it. This is
> +because KVM doesn't track how a guest uses EPC pages.
> +
> +As a result, the SGX core page reclaimer doesn't support reclaiming EPC
> +pages allocated to KVM guests through the virtual EPC driver. If the
> +user wants to deploy SGX applications both on the host and in guests
> +on the same machine, the user should reserve enough EPC (by taking out
> +total virtual EPC size of all SGX VMs from the physical EPC size) for
> +host SGX applications so they can run with acceptable performance.
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 35391e94bd22..007912f67a06 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1942,6 +1942,18 @@ config X86_SGX
>  
>  	  If unsure, say N.
>  
> +config X86_SGX_KVM
> +	bool "Software Guard eXtensions (SGX) Virtualization"
> +	depends on X86_SGX && KVM_INTEL
> +	help
> +
> +	  Enables KVM guests to create SGX enclaves.
> +
> +	  This includes support to expose "raw" unreclaimable enclave memory to
> +	  guests via a device node, e.g. /dev/sgx_vepc.
> +
> +	  If unsure, say N.
> +
>  config EFI
>  	bool "EFI runtime service support"
>  	depends on ACPI
> diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
> index 91d3dc784a29..9c1656779b2a 100644
> --- a/arch/x86/kernel/cpu/sgx/Makefile
> +++ b/arch/x86/kernel/cpu/sgx/Makefile
> @@ -3,3 +3,4 @@ obj-y += \
>  	encl.o \
>  	ioctl.o \
>  	main.o
> +obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 4aa40c627819..4854f3980edd 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -84,4 +84,13 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
>  int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
>  struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
>  
> +#ifdef CONFIG_X86_SGX_KVM
> +int __init sgx_vepc_init(void);
> +#else
> +static inline int __init sgx_vepc_init(void)
> +{
> +	return -ENODEV;
> +}
> +#endif
> +
>  #endif /* _X86_SGX_H */
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> new file mode 100644
> index 000000000000..259cc46ad78c
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -0,0 +1,259 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Device driver to expose SGX enclave memory to KVM guests.
> + *
> + * Copyright(c) 2021 Intel Corporation.
> + */
> +
> +#include <linux/miscdevice.h>
> +#include <linux/mm.h>
> +#include <linux/mman.h>
> +#include <linux/sched/mm.h>
> +#include <linux/sched/signal.h>
> +#include <linux/slab.h>
> +#include <linux/xarray.h>
> +#include <asm/sgx.h>
> +#include <uapi/asm/sgx.h>
> +
> +#include "encls.h"
> +#include "sgx.h"
> +
> +struct sgx_vepc {
> +	struct xarray page_array;
> +	struct mutex lock;
> +};
> +
> +/*
> + * Temporary SECS pages that cannot be EREMOVE'd due to having child in other
> + * virtual EPC instances, and the lock to protect it.
> + */
> +static struct mutex zombie_secs_pages_lock;
> +static struct list_head zombie_secs_pages;
> +
> +static int __sgx_vepc_fault(struct sgx_vepc *vepc,
> +			    struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct sgx_epc_page *epc_page;
> +	unsigned long index, pfn;
> +	int ret;
> +
> +	WARN_ON(!mutex_is_locked(&vepc->lock));
> +
> +	/* Calculate index of EPC page in virtual EPC's page_array */
> +	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
> +
> +	epc_page = xa_load(&vepc->page_array, index);
> +	if (epc_page)
> +		return 0;
> +
> +	epc_page = sgx_alloc_epc_page(vepc, false);
> +	if (IS_ERR(epc_page))
> +		return PTR_ERR(epc_page);
> +
> +	ret = xa_err(xa_store(&vepc->page_array, index, epc_page, GFP_KERNEL));
> +	if (ret)
> +		goto err_free;
> +
> +	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
> +
> +	ret = vmf_insert_pfn(vma, addr, pfn);
> +	if (ret != VM_FAULT_NOPAGE) {
> +		ret = -EFAULT;
> +		goto err_delete;
> +	}
> +
> +	return 0;
> +
> +err_delete:
> +	xa_erase(&vepc->page_array, index);
> +err_free:
> +	sgx_free_epc_page(epc_page);
> +	return ret;
> +}
> +
> +static vm_fault_t sgx_vepc_fault(struct vm_fault *vmf)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct sgx_vepc *vepc = vma->vm_private_data;
> +	int ret;
> +
> +	mutex_lock(&vepc->lock);
> +	ret = __sgx_vepc_fault(vepc, vma, vmf->address);
> +	mutex_unlock(&vepc->lock);
> +
> +	if (!ret)
> +		return VM_FAULT_NOPAGE;
> +
> +	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
> +		mmap_read_unlock(vma->vm_mm);
> +		return VM_FAULT_RETRY;
> +	}
> +
> +	return VM_FAULT_SIGBUS;
> +}
> +
> +const struct vm_operations_struct sgx_vepc_vm_ops = {
> +	.fault = sgx_vepc_fault,
> +};
> +
> +static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct sgx_vepc *vepc = file->private_data;
> +
> +	if (!(vma->vm_flags & VM_SHARED))
> +		return -EINVAL;
> +
> +	vma->vm_ops = &sgx_vepc_vm_ops;
> +	/* Don't copy VMA in fork() */
> +	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
> +	vma->vm_private_data = vepc;
> +
> +	return 0;
> +}
> +
> +static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
> +{
> +	int ret;
> +
> +	/*
> +	 * Take a previously guest-owned EPC page and return it to the
> +	 * general EPC page pool.
> +	 *
> +	 * Guests can not be trusted to have left this page in a good
> +	 * state, so run EREMOVE on the page unconditionally.  In the
> +	 * case that a guest properly EREMOVE'd this page, a superfluous
> +	 * EREMOVE is harmless.
> +	 */
> +	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
> +	if (ret) {
> +		/*
> +		 * Only SGX_CHILD_PRESENT is expected, which is because of
> +		 * EREMOVE'ing an SECS still with child, in which case it can
> +		 * be handled by EREMOVE'ing the SECS again after all pages in
> +		 * virtual EPC have been EREMOVE'd. See comments in below in
> +		 * sgx_vepc_release().
> +		 *
> +		 * The user of virtual EPC (KVM) needs to guarantee there's no
> +		 * logical processor is still running in the enclave in guest,
> +		 * otherwise EREMOVE will get SGX_ENCLAVE_ACT which cannot be
> +		 * handled here.
> +		 */
> +		WARN_ONCE(ret != SGX_CHILD_PRESENT, EREMOVE_ERROR_MESSAGE,
> +			  ret, ret);
> +		return ret;
> +	}
> +
> +	sgx_free_epc_page(epc_page);
> +
> +	return 0;
> +}
> +
> +static int sgx_vepc_release(struct inode *inode, struct file *file)
> +{
> +	struct sgx_vepc *vepc = file->private_data;
> +	struct sgx_epc_page *epc_page, *tmp, *entry;
> +	unsigned long index;
> +
> +	LIST_HEAD(secs_pages);
> +
> +	xa_for_each(&vepc->page_array, index, entry) {
> +		/*
> +		 * Remove all normal, child pages.  sgx_vepc_free_page()
> +		 * will fail if EREMOVE fails, but this is OK and expected on
> +		 * SECS pages.  Those can only be EREMOVE'd *after* all their
> +		 * child pages. Retries below will clean them up.
> +		 */
> +		if (sgx_vepc_free_page(entry))
> +			continue;
> +
> +		xa_erase(&vepc->page_array, index);
> +	}
> +
> +	/*
> +	 * Retry EREMOVE'ing pages.  This will clean up any SECS pages that
> +	 * only had children in this 'epc' area.
> +	 */
> +	xa_for_each(&vepc->page_array, index, entry) {
> +		epc_page = entry;
> +		/*
> +		 * An EREMOVE failure here means that the SECS page still
> +		 * has children.  But, since all children in this 'sgx_vepc'
> +		 * have been removed, the SECS page must have a child on
> +		 * another instance.
> +		 */
> +		if (sgx_vepc_free_page(epc_page))
> +			list_add_tail(&epc_page->list, &secs_pages);
> +
> +		xa_erase(&vepc->page_array, index);
> +	}
> +
> +	/*
> +	 * SECS pages are "pinned" by child pages, and "unpinned" once all
> +	 * children have been EREMOVE'd.  A child page in this instance
> +	 * may have pinned an SECS page encountered in an earlier release(),
> +	 * creating a zombie.  Since some children were EREMOVE'd above,
> +	 * try to EREMOVE all zombies in the hopes that one was unpinned.
> +	 */
> +	mutex_lock(&zombie_secs_pages_lock);
> +	list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) {
> +		/*
> +		 * Speculatively remove the page from the list of zombies,
> +		 * if the page is successfully EREMOVE'd it will be added to
> +		 * the list of free pages.  If EREMOVE fails, throw the page
> +		 * on the local list, which will be spliced on at the end.
> +		 */
> +		list_del(&epc_page->list);
> +
> +		if (sgx_vepc_free_page(epc_page))
> +			list_add_tail(&epc_page->list, &secs_pages);
> +	}
> +
> +	if (!list_empty(&secs_pages))
> +		list_splice_tail(&secs_pages, &zombie_secs_pages);
> +	mutex_unlock(&zombie_secs_pages_lock);
> +
> +	kfree(vepc);
> +
> +	return 0;
> +}
> +
> +static int sgx_vepc_open(struct inode *inode, struct file *file)
> +{
> +	struct sgx_vepc *vepc;
> +
> +	vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL);
> +	if (!vepc)
> +		return -ENOMEM;
> +	mutex_init(&vepc->lock);
> +	xa_init(&vepc->page_array);
> +
> +	file->private_data = vepc;
> +
> +	return 0;
> +}
> +
> +static const struct file_operations sgx_vepc_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= sgx_vepc_open,
> +	.release	= sgx_vepc_release,
> +	.mmap		= sgx_vepc_mmap,
> +};
> +
> +static struct miscdevice sgx_vepc_dev = {
> +	.minor		= MISC_DYNAMIC_MINOR,
> +	.name		= "sgx_vepc",
> +	.nodename	= "sgx_vepc",
> +	.fops		= &sgx_vepc_fops,
> +};
> +
> +int __init sgx_vepc_init(void)
> +{
> +	/* SGX virtualization requires KVM to work */
> +	if (!cpu_feature_enabled(X86_FEATURE_VMX))
> +		return -ENODEV;
> +
> +	INIT_LIST_HEAD(&zombie_secs_pages);
> +	mutex_init(&zombie_secs_pages_lock);
> +
> +	return misc_register(&sgx_vepc_dev);
> +}
> -- 
> 2.29.2
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
  2021-03-19  7:23 ` [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled Kai Huang
@ 2021-04-02  9:48   ` Borislav Petkov
  2021-04-02 11:08     ` Kai Huang
  2021-04-02 15:42     ` Sean Christopherson
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Kai Huang
  1 sibling, 2 replies; 134+ messages in thread
From: Borislav Petkov @ 2021-04-02  9:48 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, Mar 19, 2021 at 08:23:02PM +1300, Kai Huang wrote:
> Modify sgx_init() to always try to initialize the virtual EPC driver,
> even if the SGX driver is disabled.  The SGX driver might be disabled
> if SGX Launch Control is in locked mode, or not supported in the
> hardware at all.  This allows (non-Linux) guests that support non-LC
> configurations to use SGX.
> 
> Acked-by: Dave Hansen <dave.hansen@intel.com>
> Reviewed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 6a734f484aa7..b73114150ff8 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -743,7 +743,15 @@ static int __init sgx_init(void)
>  		goto err_page_cache;
>  	}
>  
> -	ret = sgx_drv_init();
> +	/*
> +	 * Always try to initialize the native *and* KVM drivers.
> +	 * The KVM driver is less picky than the native one and
> +	 * can function if the native one is not supported on the
> +	 * current system or fails to initialize.
> +	 *
> +	 * Error out only if both fail to initialize.
> +	 */
> +	ret = !!sgx_drv_init() & !!sgx_vepc_init();

This is a silly way of writing:

        if (sgx_drv_init() && sgx_vepc_init())
                goto err_kthread;

methinks.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
  2021-04-02  9:48   ` Borislav Petkov
@ 2021-04-02 11:08     ` Kai Huang
  2021-04-02 11:22       ` Borislav Petkov
  2021-04-02 15:42     ` Sean Christopherson
  1 sibling, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-04-02 11:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, 2 Apr 2021 11:48:16 +0200 Borislav Petkov wrote:
> On Fri, Mar 19, 2021 at 08:23:02PM +1300, Kai Huang wrote:
> > Modify sgx_init() to always try to initialize the virtual EPC driver,
> > even if the SGX driver is disabled.  The SGX driver might be disabled
> > if SGX Launch Control is in locked mode, or not supported in the
> > hardware at all.  This allows (non-Linux) guests that support non-LC
> > configurations to use SGX.
> > 
> > Acked-by: Dave Hansen <dave.hansen@intel.com>
> > Reviewed-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/main.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 6a734f484aa7..b73114150ff8 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -743,7 +743,15 @@ static int __init sgx_init(void)
> >  		goto err_page_cache;
> >  	}
> >  
> > -	ret = sgx_drv_init();
> > +	/*
> > +	 * Always try to initialize the native *and* KVM drivers.
> > +	 * The KVM driver is less picky than the native one and
> > +	 * can function if the native one is not supported on the
> > +	 * current system or fails to initialize.
> > +	 *
> > +	 * Error out only if both fail to initialize.
> > +	 */
> > +	ret = !!sgx_drv_init() & !!sgx_vepc_init();
> 
> This is a silly way of writing:
> 
>         if (sgx_drv_init() && sgx_vepc_init())
>                 goto err_kthread;
> 
> methinks.

Works for me. Thanks.

Do you want me to send updated patch?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
  2021-04-02 11:08     ` Kai Huang
@ 2021-04-02 11:22       ` Borislav Petkov
  2021-04-02 11:38         ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-04-02 11:22 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Sat, Apr 03, 2021 at 12:08:10AM +1300, Kai Huang wrote:
> Do you want me to send updated patch?

No need. If I do, I'll ask kindly, otherwise you don't have to do
anything.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
  2021-04-02 11:22       ` Borislav Petkov
@ 2021-04-02 11:38         ` Kai Huang
  0 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-04-02 11:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, 2 Apr 2021 13:22:35 +0200 Borislav Petkov wrote:
> On Sat, Apr 03, 2021 at 12:08:10AM +1300, Kai Huang wrote:
> > Do you want me to send updated patch?
> 
> No need. If I do, I'll ask kindly, otherwise you don't have to do
> anything.
> 
I see. Thanks.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
  2021-04-02  9:48   ` Borislav Petkov
  2021-04-02 11:08     ` Kai Huang
@ 2021-04-02 15:42     ` Sean Christopherson
  2021-04-02 19:08       ` Kai Huang
  2021-04-02 19:19       ` Borislav Petkov
  1 sibling, 2 replies; 134+ messages in thread
From: Sean Christopherson @ 2021-04-02 15:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, kvm, linux-sgx, x86, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, Apr 02, 2021, Borislav Petkov wrote:
> On Fri, Mar 19, 2021 at 08:23:02PM +1300, Kai Huang wrote:
> > Modify sgx_init() to always try to initialize the virtual EPC driver,
> > even if the SGX driver is disabled.  The SGX driver might be disabled
> > if SGX Launch Control is in locked mode, or not supported in the
> > hardware at all.  This allows (non-Linux) guests that support non-LC
> > configurations to use SGX.
> > 
> > Acked-by: Dave Hansen <dave.hansen@intel.com>
> > Reviewed-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/main.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 6a734f484aa7..b73114150ff8 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -743,7 +743,15 @@ static int __init sgx_init(void)
> >  		goto err_page_cache;
> >  	}
> >  
> > -	ret = sgx_drv_init();
> > +	/*
> > +	 * Always try to initialize the native *and* KVM drivers.
> > +	 * The KVM driver is less picky than the native one and
> > +	 * can function if the native one is not supported on the
> > +	 * current system or fails to initialize.
> > +	 *
> > +	 * Error out only if both fail to initialize.
> > +	 */
> > +	ret = !!sgx_drv_init() & !!sgx_vepc_init();
> 
> This is a silly way of writing:
> 
>         if (sgx_drv_init() && sgx_vepc_init())
>                 goto err_kthread;
> 
> methinks.

Nope!  That's wrong, as sgx_epc_init() will not be called if sgx_drv_init()
succeeds.  And writing it as "if (sgx_drv_init() || sgx_vepc_init())" is also
wrong since that would kill SGX when one of the drivers is alive and well.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
  2021-04-02 15:42     ` Sean Christopherson
@ 2021-04-02 19:08       ` Kai Huang
  2021-04-02 19:19       ` Borislav Petkov
  1 sibling, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-04-02 19:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, kvm, linux-sgx, x86, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, 2 Apr 2021 15:42:51 +0000 Sean Christopherson wrote:
> On Fri, Apr 02, 2021, Borislav Petkov wrote:
> > On Fri, Mar 19, 2021 at 08:23:02PM +1300, Kai Huang wrote:
> > > Modify sgx_init() to always try to initialize the virtual EPC driver,
> > > even if the SGX driver is disabled.  The SGX driver might be disabled
> > > if SGX Launch Control is in locked mode, or not supported in the
> > > hardware at all.  This allows (non-Linux) guests that support non-LC
> > > configurations to use SGX.
> > > 
> > > Acked-by: Dave Hansen <dave.hansen@intel.com>
> > > Reviewed-by: Sean Christopherson <seanjc@google.com>
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > ---
> > >  arch/x86/kernel/cpu/sgx/main.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > index 6a734f484aa7..b73114150ff8 100644
> > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > @@ -743,7 +743,15 @@ static int __init sgx_init(void)
> > >  		goto err_page_cache;
> > >  	}
> > >  
> > > -	ret = sgx_drv_init();
> > > +	/*
> > > +	 * Always try to initialize the native *and* KVM drivers.
> > > +	 * The KVM driver is less picky than the native one and
> > > +	 * can function if the native one is not supported on the
> > > +	 * current system or fails to initialize.
> > > +	 *
> > > +	 * Error out only if both fail to initialize.
> > > +	 */
> > > +	ret = !!sgx_drv_init() & !!sgx_vepc_init();
> > 
> > This is a silly way of writing:
> > 
> >         if (sgx_drv_init() && sgx_vepc_init())
> >                 goto err_kthread;
> > 
> > methinks.
> 
> Nope!  That's wrong, as sgx_epc_init() will not be called if sgx_drv_init()
> succeeds.  And writing it as "if (sgx_drv_init() || sgx_vepc_init())" is also
> wrong since that would kill SGX when one of the drivers is alive and well.

Right. Thanks for pointing out.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
  2021-04-02 15:42     ` Sean Christopherson
  2021-04-02 19:08       ` Kai Huang
@ 2021-04-02 19:19       ` Borislav Petkov
  2021-04-02 19:30         ` Sean Christopherson
  1 sibling, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-04-02 19:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, kvm, linux-sgx, x86, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, Apr 02, 2021 at 03:42:51PM +0000, Sean Christopherson wrote:
> Nope!  That's wrong, as sgx_epc_init() will not be called if sgx_drv_init()
> succeeds.  And writing it as "if (sgx_drv_init() || sgx_vepc_init())" is also
> wrong since that would kill SGX when one of the drivers is alive and well.

Bah, right you are.

How about:

	/* Error out only if both fail to initialize. */
        ret = sgx_drv_init();

        if (sgx_vepc_init() && ret)
                goto err_kthread;

And yah, this looks strange so it needs the comment to explain what's
going on here.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
  2021-04-02 19:19       ` Borislav Petkov
@ 2021-04-02 19:30         ` Sean Christopherson
  2021-04-02 19:46           ` Borislav Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Sean Christopherson @ 2021-04-02 19:30 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, kvm, linux-sgx, x86, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, Apr 02, 2021, Borislav Petkov wrote:
> On Fri, Apr 02, 2021 at 03:42:51PM +0000, Sean Christopherson wrote:
> > Nope!  That's wrong, as sgx_epc_init() will not be called if sgx_drv_init()
> > succeeds.  And writing it as "if (sgx_drv_init() || sgx_vepc_init())" is also
> > wrong since that would kill SGX when one of the drivers is alive and well.
> 
> Bah, right you are.
> 
> How about:
> 
> 	/* Error out only if both fail to initialize. */
>         ret = sgx_drv_init();
> 
>         if (sgx_vepc_init() && ret)
>                 goto err_kthread;

Heh, that's what I had originally and used for literally years.  IIRC, I
suggested the "!! & !!" abomination after internal review complained about the
oddness of the above.

FWIW, I think the above is far less likely to be misread, even though I love the
cleverness of the bitwise AND.

> 
> And yah, this looks strange so it needs the comment to explain what's
> going on here.
> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
  2021-04-02 19:30         ` Sean Christopherson
@ 2021-04-02 19:46           ` Borislav Petkov
  0 siblings, 0 replies; 134+ messages in thread
From: Borislav Petkov @ 2021-04-02 19:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, kvm, linux-sgx, x86, linux-kernel, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, Apr 02, 2021 at 07:30:23PM +0000, Sean Christopherson wrote:
> Heh, that's what I had originally and used for literally years.  IIRC, I
> suggested the "!! & !!" abomination after internal review complained about the
> oddness of the above.

Whut?

> FWIW, I think the above is far less likely to be misread, even though I love the
> cleverness of the bitwise AND.

The problem with using bitwise operations here is that they don't belong
in a logical expression of this sort - you do those when you actually
work with bitmasks etc and not when you wanna check whether the functions
returned success or not.

Yeah, yeah, the bitwise thing gets you what you want and yadda yadda but
readability is important. That thing that keeps this code maintainable
years from now...

Also, your original suggestion is literally translating the comment in
code, while

	!!sgx_drv_init() & !!sgx_vepc_init()

especially with that bitwise-& in there, makes me go "say what now?!"

So yeah, you were right the first time.

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-19  7:22 ` [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
                     ` (2 preceding siblings ...)
  2021-04-01  9:42   ` [PATCH v4 " Kai Huang
@ 2021-04-05  9:01   ` Borislav Petkov
  2021-04-05 21:46     ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  4 siblings, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-04-05  9:01 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, Mar 19, 2021 at 08:22:21PM +1300, Kai Huang wrote:
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 35391e94bd22..007912f67a06 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1942,6 +1942,18 @@ config X86_SGX
>  
>  	  If unsure, say N.
>  
> +config X86_SGX_KVM
> +	bool "Software Guard eXtensions (SGX) Virtualization"
> +	depends on X86_SGX && KVM_INTEL
> +	help

It seems to me this would fit better under "Virtualization" because even
if I want to enable it, I have to go enable KVM_INTEL first and then
return back here to turn it on too.

And under "Virtualization" is where we enable all kinds of aspects which
belong to it.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-03-19  7:23 ` [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Kai Huang
@ 2021-04-05  9:07   ` Borislav Petkov
  2021-04-05 21:44     ` Kai Huang
  2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
  1 sibling, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-04-05  9:07 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Fri, Mar 19, 2021 at 08:23:08PM +1300, Kai Huang wrote:
> +	/*
> +	 * @secs is an untrusted, userspace-provided address.  It comes from
> +	 * KVM and is assumed to be a valid pointer which points somewhere in
> +	 * userspace.  This can fault and call SGX or other fault handlers when
> +	 * userspace mapping @secs doesn't exist.
> +	 *
> +	 * Add a WARN() to make sure @secs is already valid userspace pointer
> +	 * from caller (KVM), who should already have handled invalid pointer
> +	 * case (for instance, made by malicious guest).  All other checks,
> +	 * such as alignment of @secs, are deferred to ENCLS itself.
> +	 */
> +	WARN_ON_ONCE(!access_ok(secs, PAGE_SIZE));

So why do we continue here then? IOW:

diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index fdfc21263a95..497b06fc6f7f 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -270,7 +270,7 @@ int __init sgx_vepc_init(void)
  *
  * Return:
  * - 0:		ECREATE was successful.
- * - -EFAULT:	ECREATE returned error.
+ * - <0:	ECREATE returned error.
  */
 int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
 		     int *trapnr)
@@ -288,7 +288,9 @@ int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
 	 * case (for instance, made by malicious guest).  All other checks,
 	 * such as alignment of @secs, are deferred to ENCLS itself.
 	 */
-	WARN_ON_ONCE(!access_ok(secs, PAGE_SIZE));
+	if (WARN_ON_ONCE(!access_ok(secs, PAGE_SIZE)))
+		return -EINVAL;
+
 	__uaccess_begin();
 	ret = __ecreate(pageinfo, (void *)secs);
 	__uaccess_end();

> +	__uaccess_begin();
> +	ret = __ecreate(pageinfo, (void *)secs);
> +	__uaccess_end();
> +
> +	if (encls_faulted(ret)) {
> +		*trapnr = ENCLS_TRAPNR(ret);
> +		return -EFAULT;
> +	}
> +
> +	/* ECREATE doesn't return an error code, it faults or succeeds. */
> +	WARN_ON_ONCE(ret);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(sgx_virt_ecreate);
> +
> +static int __sgx_virt_einit(void __user *sigstruct, void __user *token,
> +			    void __user *secs)
> +{
> +	int ret;
> +
> +	/*
> +	 * Make sure all userspace pointers from caller (KVM) are valid.
> +	 * All other checks deferred to ENCLS itself.  Also see comment
> +	 * for @secs in sgx_virt_ecreate().
> +	 */
> +#define SGX_EINITTOKEN_SIZE	304
> +	WARN_ON_ONCE(!access_ok(sigstruct, sizeof(struct sgx_sigstruct)) ||
> +		     !access_ok(token, SGX_EINITTOKEN_SIZE) ||
> +		     !access_ok(secs, PAGE_SIZE));

Ditto.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-04-05  9:07   ` Borislav Petkov
@ 2021-04-05 21:44     ` Kai Huang
  2021-04-06  7:40       ` Borislav Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-04-05 21:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Mon, 5 Apr 2021 11:07:59 +0200 Borislav Petkov wrote:
> On Fri, Mar 19, 2021 at 08:23:08PM +1300, Kai Huang wrote:
> > +	/*
> > +	 * @secs is an untrusted, userspace-provided address.  It comes from
> > +	 * KVM and is assumed to be a valid pointer which points somewhere in
> > +	 * userspace.  This can fault and call SGX or other fault handlers when
> > +	 * userspace mapping @secs doesn't exist.
> > +	 *
> > +	 * Add a WARN() to make sure @secs is already valid userspace pointer
> > +	 * from caller (KVM), who should already have handled invalid pointer
> > +	 * case (for instance, made by malicious guest).  All other checks,
> > +	 * such as alignment of @secs, are deferred to ENCLS itself.
> > +	 */
> > +	WARN_ON_ONCE(!access_ok(secs, PAGE_SIZE));
> 
> So why do we continue here then? IOW:

The intention was to catch KVM bug, since KVM is the only caller, and in current
implementation KVM won't call this function if @secs is not a valid userspace
pointer. But yes we can also return here, but in this case an exception number
must also be specified to *trapnr so that KVM can inject to guest. It's not that
straightforward to decide which exception should we inject, but I think #GP
should be OK. Please see below.

> 
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index fdfc21263a95..497b06fc6f7f 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -270,7 +270,7 @@ int __init sgx_vepc_init(void)
>   *
>   * Return:
>   * - 0:		ECREATE was successful.
> - * - -EFAULT:	ECREATE returned error.
> + * - <0:	ECREATE returned error.
>   */
>  int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
>  		     int *trapnr)
> @@ -288,7 +288,9 @@ int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
>  	 * case (for instance, made by malicious guest).  All other checks,
>  	 * such as alignment of @secs, are deferred to ENCLS itself.
>  	 */
> -	WARN_ON_ONCE(!access_ok(secs, PAGE_SIZE));
> +	if (WARN_ON_ONCE(!access_ok(secs, PAGE_SIZE)))
> +		return -EINVAL;
> +

*trapnr should also be set to an exception before return -EINVAL. It's not
possible to get it from ENCLS_TRAPNR(ret) like below, since ENCLS hasn't been
run yet. I think it makes sense to just set it to #GP(X86_TRAP_GP), since
above error basically means SECS is not pointing to an valid EPC address, and
in such case #GP should happen based on SDM (SDM 40.3 ECREATE).

>  	__uaccess_begin();
>  	ret = __ecreate(pageinfo, (void *)secs);
>  	__uaccess_end();
> 
> > +	__uaccess_begin();
> > +	ret = __ecreate(pageinfo, (void *)secs);
> > +	__uaccess_end();
> > +
> > +	if (encls_faulted(ret)) {
> > +		*trapnr = ENCLS_TRAPNR(ret);
> > +		return -EFAULT;
> > +	}
> > +
> > +	/* ECREATE doesn't return an error code, it faults or succeeds. */
> > +	WARN_ON_ONCE(ret);
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(sgx_virt_ecreate);
> > +
> > +static int __sgx_virt_einit(void __user *sigstruct, void __user *token,
> > +			    void __user *secs)
> > +{
> > +	int ret;
> > +
> > +	/*
> > +	 * Make sure all userspace pointers from caller (KVM) are valid.
> > +	 * All other checks deferred to ENCLS itself.  Also see comment
> > +	 * for @secs in sgx_virt_ecreate().
> > +	 */
> > +#define SGX_EINITTOKEN_SIZE	304
> > +	WARN_ON_ONCE(!access_ok(sigstruct, sizeof(struct sgx_sigstruct)) ||
> > +		     !access_ok(token, SGX_EINITTOKEN_SIZE) ||
> > +		     !access_ok(secs, PAGE_SIZE));
> 
> Ditto.

The same as above, *trapnr should be set to X86_TRAP_GP before return
-EINVAL, although for sigstruct, token, they are just normal memory, but not
EPC.

Sean, do you have comments?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-04-05  9:01   ` [PATCH v3 " Borislav Petkov
@ 2021-04-05 21:46     ` Kai Huang
  2021-04-06  8:28       ` Borislav Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-04-05 21:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Mon, 5 Apr 2021 11:01:51 +0200 Borislav Petkov wrote:
> On Fri, Mar 19, 2021 at 08:22:21PM +1300, Kai Huang wrote:
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 35391e94bd22..007912f67a06 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1942,6 +1942,18 @@ config X86_SGX
> >  
> >  	  If unsure, say N.
> >  
> > +config X86_SGX_KVM
> > +	bool "Software Guard eXtensions (SGX) Virtualization"
> > +	depends on X86_SGX && KVM_INTEL
> > +	help
> 
> It seems to me this would fit better under "Virtualization" because even
> if I want to enable it, I have to go enable KVM_INTEL first and then
> return back here to turn it on too.
> 
> And under "Virtualization" is where we enable all kinds of aspects which
> belong to it.

Fine to me. Please let me know if you want me to resend patches. Thanks.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-04-05 21:44     ` Kai Huang
@ 2021-04-06  7:40       ` Borislav Petkov
  2021-04-06  8:59         ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-04-06  7:40 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, Apr 06, 2021 at 09:44:21AM +1200, Kai Huang wrote:
> The intention was to catch KVM bug, since KVM is the only caller, and in current
> implementation KVM won't call this function if @secs is not a valid userspace
> pointer. But yes we can also return here, but in this case an exception number
> must also be specified to *trapnr so that KVM can inject to guest. It's not that
> straightforward to decide which exception should we inject, but I think #GP
> should be OK. Please see below.

Why should you inject anything in that case?

AFAICT, you can handle the return value in __handle_encls_ecreate() and
inject only when the return value is EFAULT. If it is another negative
error value, you pass it back up to its caller, handle_encls_ecreate()
which returns other error values like -ENOMEM too. Which means, its
callchain can stomach negative values just fine.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-04-05 21:46     ` Kai Huang
@ 2021-04-06  8:28       ` Borislav Petkov
  2021-04-06  9:04         ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-04-06  8:28 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, Apr 06, 2021 at 09:46:34AM +1200, Kai Huang wrote:
> Fine to me. Please let me know if you want me to resend patches. Thanks.

Patch updated:

---
From: Sean Christopherson <sean.j.christopherson@intel.com>
Date: Fri, 19 Mar 2021 20:22:21 +1300
Subject: [PATCH] x86/sgx: Introduce virtual EPC for use by KVM guests

Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw"
Enclave Page Cache (EPC) without an associated enclave. The intended
and only known use case for raw EPC allocation is to expose EPC to a
KVM guest, hence the 'vepc' moniker, virt.{c,h} files and X86_SGX_KVM
Kconfig.

The SGX driver uses the misc device /dev/sgx_enclave to support
userspace in creating an enclave. Each file descriptor returned from
opening /dev/sgx_enclave represents an enclave. Unlike the SGX driver,
KVM doesn't control how the guest uses the EPC, therefore EPC allocated
to a KVM guest is not associated with an enclave, and /dev/sgx_enclave
is not suitable for allocating EPC for a KVM guest.

Having separate device nodes for the SGX driver and KVM virtual EPC also
allows separate permission control for running host SGX enclaves and KVM
SGX guests.

To use /dev/sgx_vepc to allocate a virtual EPC instance with particular
size, the hypervisor opens /dev/sgx_vepc, and uses mmap() with the
intended size to get an address range of virtual EPC. Then it may use
the address range to create one KVM memory slot as virtual EPC for
a guest.

Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
/dev/sgx_vepc rather than in KVM. Doing so has two major advantages:

  - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
    just another memory backend for guests.

  - EPC management is wholly contained in the SGX subsystem, e.g. SGX
    does not have to export any symbols, changes to reclaim flows don't
    need to be routed through KVM, SGX's dirty laundry doesn't have to
    get aired out for the world to see, and so on and so forth.

The virtual EPC pages allocated to guests are currently not reclaimable.
Reclaiming an EPC page used by enclave requires a special reclaim
mechanism separate from normal page reclaim, and that mechanism is not
supported for virutal EPC pages. Due to the complications of handling
reclaim conflicts between guest and host, reclaiming virtual EPC pages
is significantly more complex than basic support for SGX virtualization.

 [ bp:
   - Massage commit message and comments
   - use cpu_feature_enabled()
   - vertically align struct members init
   - massage Virtual EPC clarification text
   - move Kconfig prompt to Virtualization ]

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/0c38ced8c8e5a69872db4d6a1c0dabd01e07cad7.1616136308.git.kai.huang@intel.com
---
 Documentation/x86/sgx.rst        |  16 ++
 arch/x86/kernel/cpu/sgx/Makefile |   1 +
 arch/x86/kernel/cpu/sgx/sgx.h    |   9 ++
 arch/x86/kernel/cpu/sgx/virt.c   | 259 +++++++++++++++++++++++++++++++
 arch/x86/kvm/Kconfig             |  12 ++
 5 files changed, 297 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.c

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index f90076e67cde..dd0ac96ff9ef 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -234,3 +234,19 @@ As a result, when this happpens, user should stop running any new
 SGX workloads, (or just any new workloads), and migrate all valuable
 workloads. Although a machine reboot can recover all EPC memory, the bug
 should be reported to Linux developers.
+
+
+Virtual EPC
+===========
+
+The implementation has also a virtual EPC driver to support SGX enclaves
+in guests. Unlike the SGX driver, an EPC page allocated by the virtual
+EPC driver doesn't have a specific enclave associated with it. This is
+because KVM doesn't track how a guest uses EPC pages.
+
+As a result, the SGX core page reclaimer doesn't support reclaiming EPC
+pages allocated to KVM guests through the virtual EPC driver. If the
+user wants to deploy SGX applications both on the host and in guests
+on the same machine, the user should reserve enough EPC (by taking out
+total virtual EPC size of all SGX VMs from the physical EPC size) for
+host SGX applications so they can run with acceptable performance.
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 91d3dc784a29..9c1656779b2a 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -3,3 +3,4 @@ obj-y += \
 	encl.o \
 	ioctl.o \
 	main.o
+obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4aa40c627819..4854f3980edd 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -84,4 +84,13 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 
+#ifdef CONFIG_X86_SGX_KVM
+int __init sgx_vepc_init(void);
+#else
+static inline int __init sgx_vepc_init(void)
+{
+	return -ENODEV;
+}
+#endif
+
 #endif /* _X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
new file mode 100644
index 000000000000..259cc46ad78c
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -0,0 +1,259 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Device driver to expose SGX enclave memory to KVM guests.
+ *
+ * Copyright(c) 2021 Intel Corporation.
+ */
+
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/xarray.h>
+#include <asm/sgx.h>
+#include <uapi/asm/sgx.h>
+
+#include "encls.h"
+#include "sgx.h"
+
+struct sgx_vepc {
+	struct xarray page_array;
+	struct mutex lock;
+};
+
+/*
+ * Temporary SECS pages that cannot be EREMOVE'd due to having child in other
+ * virtual EPC instances, and the lock to protect it.
+ */
+static struct mutex zombie_secs_pages_lock;
+static struct list_head zombie_secs_pages;
+
+static int __sgx_vepc_fault(struct sgx_vepc *vepc,
+			    struct vm_area_struct *vma, unsigned long addr)
+{
+	struct sgx_epc_page *epc_page;
+	unsigned long index, pfn;
+	int ret;
+
+	WARN_ON(!mutex_is_locked(&vepc->lock));
+
+	/* Calculate index of EPC page in virtual EPC's page_array */
+	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
+
+	epc_page = xa_load(&vepc->page_array, index);
+	if (epc_page)
+		return 0;
+
+	epc_page = sgx_alloc_epc_page(vepc, false);
+	if (IS_ERR(epc_page))
+		return PTR_ERR(epc_page);
+
+	ret = xa_err(xa_store(&vepc->page_array, index, epc_page, GFP_KERNEL));
+	if (ret)
+		goto err_free;
+
+	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
+
+	ret = vmf_insert_pfn(vma, addr, pfn);
+	if (ret != VM_FAULT_NOPAGE) {
+		ret = -EFAULT;
+		goto err_delete;
+	}
+
+	return 0;
+
+err_delete:
+	xa_erase(&vepc->page_array, index);
+err_free:
+	sgx_free_epc_page(epc_page);
+	return ret;
+}
+
+static vm_fault_t sgx_vepc_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_vepc *vepc = vma->vm_private_data;
+	int ret;
+
+	mutex_lock(&vepc->lock);
+	ret = __sgx_vepc_fault(vepc, vma, vmf->address);
+	mutex_unlock(&vepc->lock);
+
+	if (!ret)
+		return VM_FAULT_NOPAGE;
+
+	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
+		mmap_read_unlock(vma->vm_mm);
+		return VM_FAULT_RETRY;
+	}
+
+	return VM_FAULT_SIGBUS;
+}
+
+const struct vm_operations_struct sgx_vepc_vm_ops = {
+	.fault = sgx_vepc_fault,
+};
+
+static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct sgx_vepc *vepc = file->private_data;
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	vma->vm_ops = &sgx_vepc_vm_ops;
+	/* Don't copy VMA in fork() */
+	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
+	vma->vm_private_data = vepc;
+
+	return 0;
+}
+
+static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+{
+	int ret;
+
+	/*
+	 * Take a previously guest-owned EPC page and return it to the
+	 * general EPC page pool.
+	 *
+	 * Guests can not be trusted to have left this page in a good
+	 * state, so run EREMOVE on the page unconditionally.  In the
+	 * case that a guest properly EREMOVE'd this page, a superfluous
+	 * EREMOVE is harmless.
+	 */
+	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
+	if (ret) {
+		/*
+		 * Only SGX_CHILD_PRESENT is expected, which is because of
+		 * EREMOVE'ing an SECS still with child, in which case it can
+		 * be handled by EREMOVE'ing the SECS again after all pages in
+		 * virtual EPC have been EREMOVE'd. See comments in below in
+		 * sgx_vepc_release().
+		 *
+		 * The user of virtual EPC (KVM) needs to guarantee there's no
+		 * logical processor is still running in the enclave in guest,
+		 * otherwise EREMOVE will get SGX_ENCLAVE_ACT which cannot be
+		 * handled here.
+		 */
+		WARN_ONCE(ret != SGX_CHILD_PRESENT, EREMOVE_ERROR_MESSAGE,
+			  ret, ret);
+		return ret;
+	}
+
+	sgx_free_epc_page(epc_page);
+
+	return 0;
+}
+
+static int sgx_vepc_release(struct inode *inode, struct file *file)
+{
+	struct sgx_vepc *vepc = file->private_data;
+	struct sgx_epc_page *epc_page, *tmp, *entry;
+	unsigned long index;
+
+	LIST_HEAD(secs_pages);
+
+	xa_for_each(&vepc->page_array, index, entry) {
+		/*
+		 * Remove all normal, child pages.  sgx_vepc_free_page()
+		 * will fail if EREMOVE fails, but this is OK and expected on
+		 * SECS pages.  Those can only be EREMOVE'd *after* all their
+		 * child pages. Retries below will clean them up.
+		 */
+		if (sgx_vepc_free_page(entry))
+			continue;
+
+		xa_erase(&vepc->page_array, index);
+	}
+
+	/*
+	 * Retry EREMOVE'ing pages.  This will clean up any SECS pages that
+	 * only had children in this 'epc' area.
+	 */
+	xa_for_each(&vepc->page_array, index, entry) {
+		epc_page = entry;
+		/*
+		 * An EREMOVE failure here means that the SECS page still
+		 * has children.  But, since all children in this 'sgx_vepc'
+		 * have been removed, the SECS page must have a child on
+		 * another instance.
+		 */
+		if (sgx_vepc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+
+		xa_erase(&vepc->page_array, index);
+	}
+
+	/*
+	 * SECS pages are "pinned" by child pages, and "unpinned" once all
+	 * children have been EREMOVE'd.  A child page in this instance
+	 * may have pinned an SECS page encountered in an earlier release(),
+	 * creating a zombie.  Since some children were EREMOVE'd above,
+	 * try to EREMOVE all zombies in the hopes that one was unpinned.
+	 */
+	mutex_lock(&zombie_secs_pages_lock);
+	list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) {
+		/*
+		 * Speculatively remove the page from the list of zombies,
+		 * if the page is successfully EREMOVE'd it will be added to
+		 * the list of free pages.  If EREMOVE fails, throw the page
+		 * on the local list, which will be spliced on at the end.
+		 */
+		list_del(&epc_page->list);
+
+		if (sgx_vepc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+	}
+
+	if (!list_empty(&secs_pages))
+		list_splice_tail(&secs_pages, &zombie_secs_pages);
+	mutex_unlock(&zombie_secs_pages_lock);
+
+	kfree(vepc);
+
+	return 0;
+}
+
+static int sgx_vepc_open(struct inode *inode, struct file *file)
+{
+	struct sgx_vepc *vepc;
+
+	vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL);
+	if (!vepc)
+		return -ENOMEM;
+	mutex_init(&vepc->lock);
+	xa_init(&vepc->page_array);
+
+	file->private_data = vepc;
+
+	return 0;
+}
+
+static const struct file_operations sgx_vepc_fops = {
+	.owner		= THIS_MODULE,
+	.open		= sgx_vepc_open,
+	.release	= sgx_vepc_release,
+	.mmap		= sgx_vepc_mmap,
+};
+
+static struct miscdevice sgx_vepc_dev = {
+	.minor		= MISC_DYNAMIC_MINOR,
+	.name		= "sgx_vepc",
+	.nodename	= "sgx_vepc",
+	.fops		= &sgx_vepc_fops,
+};
+
+int __init sgx_vepc_init(void)
+{
+	/* SGX virtualization requires KVM to work */
+	if (!cpu_feature_enabled(X86_FEATURE_VMX))
+		return -ENODEV;
+
+	INIT_LIST_HEAD(&zombie_secs_pages);
+	mutex_init(&zombie_secs_pages_lock);
+
+	return misc_register(&sgx_vepc_dev);
+}
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index a788d5120d4d..f6b93a35ce14 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -84,6 +84,18 @@ config KVM_INTEL
 	  To compile this as a module, choose M here: the module
 	  will be called kvm-intel.
 
+config X86_SGX_KVM
+	bool "Software Guard eXtensions (SGX) Virtualization"
+	depends on X86_SGX && KVM_INTEL
+	help
+
+	  Enables KVM guests to create SGX enclaves.
+
+	  This includes support to expose "raw" unreclaimable enclave memory to
+	  guests via a device node, e.g. /dev/sgx_vepc.
+
+	  If unsure, say N.
+
 config KVM_AMD
 	tristate "KVM for AMD processors support"
 	depends on KVM
-- 
2.29.2



-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-04-06  7:40       ` Borislav Petkov
@ 2021-04-06  8:59         ` Kai Huang
  2021-04-06  9:09           ` Borislav Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-04-06  8:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, 6 Apr 2021 09:40:38 +0200 Borislav Petkov wrote:
> On Tue, Apr 06, 2021 at 09:44:21AM +1200, Kai Huang wrote:
> > The intention was to catch KVM bug, since KVM is the only caller, and in current
> > implementation KVM won't call this function if @secs is not a valid userspace
> > pointer. But yes we can also return here, but in this case an exception number
> > must also be specified to *trapnr so that KVM can inject to guest. It's not that
> > straightforward to decide which exception should we inject, but I think #GP
> > should be OK. Please see below.
> 
> Why should you inject anything in that case?
> 
> AFAICT, you can handle the return value in __handle_encls_ecreate() and
> inject only when the return value is EFAULT. If it is another negative
> error value, you pass it back up to its caller, handle_encls_ecreate()
> which returns other error values like -ENOMEM too. Which means, its
> callchain can stomach negative values just fine.
> 

OK. My thinking was that, returning negative error value basically means guest
will be killed.  For the case access_ok() fails for @secs or other user
pointers, it seems killing guest is a little it overkill, but since this code's
purpose is to catch KVM bug, I think killing guest is also OK from this
perspective (like -ENOMEM case, it is kernel/kvm internal error). So yes I
guess we can make handle_encls_xx() to stomach negative values, and only inject
upon -EFAULT.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-04-06  8:28       ` Borislav Petkov
@ 2021-04-06  9:04         ` Kai Huang
  0 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-04-06  9:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, x86, linux-sgx, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, 6 Apr 2021 10:28:00 +0200 Borislav Petkov wrote:
> On Tue, Apr 06, 2021 at 09:46:34AM +1200, Kai Huang wrote:
> > Fine to me. Please let me know if you want me to resend patches. Thanks.
> 
> Patch updated:

Looks fine. Thank you!

> 
> ---
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> Date: Fri, 19 Mar 2021 20:22:21 +1300
> Subject: [PATCH] x86/sgx: Introduce virtual EPC for use by KVM guests
> 
> Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw"
> Enclave Page Cache (EPC) without an associated enclave. The intended
> and only known use case for raw EPC allocation is to expose EPC to a
> KVM guest, hence the 'vepc' moniker, virt.{c,h} files and X86_SGX_KVM
> Kconfig.
> 
> The SGX driver uses the misc device /dev/sgx_enclave to support
> userspace in creating an enclave. Each file descriptor returned from
> opening /dev/sgx_enclave represents an enclave. Unlike the SGX driver,
> KVM doesn't control how the guest uses the EPC, therefore EPC allocated
> to a KVM guest is not associated with an enclave, and /dev/sgx_enclave
> is not suitable for allocating EPC for a KVM guest.
> 
> Having separate device nodes for the SGX driver and KVM virtual EPC also
> allows separate permission control for running host SGX enclaves and KVM
> SGX guests.
> 
> To use /dev/sgx_vepc to allocate a virtual EPC instance with particular
> size, the hypervisor opens /dev/sgx_vepc, and uses mmap() with the
> intended size to get an address range of virtual EPC. Then it may use
> the address range to create one KVM memory slot as virtual EPC for
> a guest.
> 
> Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> /dev/sgx_vepc rather than in KVM. Doing so has two major advantages:
> 
>   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
>     just another memory backend for guests.
> 
>   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
>     does not have to export any symbols, changes to reclaim flows don't
>     need to be routed through KVM, SGX's dirty laundry doesn't have to
>     get aired out for the world to see, and so on and so forth.
> 
> The virtual EPC pages allocated to guests are currently not reclaimable.
> Reclaiming an EPC page used by enclave requires a special reclaim
> mechanism separate from normal page reclaim, and that mechanism is not
> supported for virutal EPC pages. Due to the complications of handling
> reclaim conflicts between guest and host, reclaiming virtual EPC pages
> is significantly more complex than basic support for SGX virtualization.
> 
>  [ bp:
>    - Massage commit message and comments
>    - use cpu_feature_enabled()
>    - vertically align struct members init
>    - massage Virtual EPC clarification text
>    - move Kconfig prompt to Virtualization ]
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Co-developed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Acked-by: Dave Hansen <dave.hansen@intel.com>
> Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> Link: https://lkml.kernel.org/r/0c38ced8c8e5a69872db4d6a1c0dabd01e07cad7.1616136308.git.kai.huang@intel.com
> ---
>  Documentation/x86/sgx.rst        |  16 ++
>  arch/x86/kernel/cpu/sgx/Makefile |   1 +
>  arch/x86/kernel/cpu/sgx/sgx.h    |   9 ++
>  arch/x86/kernel/cpu/sgx/virt.c   | 259 +++++++++++++++++++++++++++++++
>  arch/x86/kvm/Kconfig             |  12 ++
>  5 files changed, 297 insertions(+)
>  create mode 100644 arch/x86/kernel/cpu/sgx/virt.c
> 
> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> index f90076e67cde..dd0ac96ff9ef 100644
> --- a/Documentation/x86/sgx.rst
> +++ b/Documentation/x86/sgx.rst
> @@ -234,3 +234,19 @@ As a result, when this happpens, user should stop running any new
>  SGX workloads, (or just any new workloads), and migrate all valuable
>  workloads. Although a machine reboot can recover all EPC memory, the bug
>  should be reported to Linux developers.
> +
> +
> +Virtual EPC
> +===========
> +
> +The implementation has also a virtual EPC driver to support SGX enclaves
> +in guests. Unlike the SGX driver, an EPC page allocated by the virtual
> +EPC driver doesn't have a specific enclave associated with it. This is
> +because KVM doesn't track how a guest uses EPC pages.
> +
> +As a result, the SGX core page reclaimer doesn't support reclaiming EPC
> +pages allocated to KVM guests through the virtual EPC driver. If the
> +user wants to deploy SGX applications both on the host and in guests
> +on the same machine, the user should reserve enough EPC (by taking out
> +total virtual EPC size of all SGX VMs from the physical EPC size) for
> +host SGX applications so they can run with acceptable performance.
> diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
> index 91d3dc784a29..9c1656779b2a 100644
> --- a/arch/x86/kernel/cpu/sgx/Makefile
> +++ b/arch/x86/kernel/cpu/sgx/Makefile
> @@ -3,3 +3,4 @@ obj-y += \
>  	encl.o \
>  	ioctl.o \
>  	main.o
> +obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 4aa40c627819..4854f3980edd 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -84,4 +84,13 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
>  int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
>  struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
>  
> +#ifdef CONFIG_X86_SGX_KVM
> +int __init sgx_vepc_init(void);
> +#else
> +static inline int __init sgx_vepc_init(void)
> +{
> +	return -ENODEV;
> +}
> +#endif
> +
>  #endif /* _X86_SGX_H */
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> new file mode 100644
> index 000000000000..259cc46ad78c
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -0,0 +1,259 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Device driver to expose SGX enclave memory to KVM guests.
> + *
> + * Copyright(c) 2021 Intel Corporation.
> + */
> +
> +#include <linux/miscdevice.h>
> +#include <linux/mm.h>
> +#include <linux/mman.h>
> +#include <linux/sched/mm.h>
> +#include <linux/sched/signal.h>
> +#include <linux/slab.h>
> +#include <linux/xarray.h>
> +#include <asm/sgx.h>
> +#include <uapi/asm/sgx.h>
> +
> +#include "encls.h"
> +#include "sgx.h"
> +
> +struct sgx_vepc {
> +	struct xarray page_array;
> +	struct mutex lock;
> +};
> +
> +/*
> + * Temporary SECS pages that cannot be EREMOVE'd due to having child in other
> + * virtual EPC instances, and the lock to protect it.
> + */
> +static struct mutex zombie_secs_pages_lock;
> +static struct list_head zombie_secs_pages;
> +
> +static int __sgx_vepc_fault(struct sgx_vepc *vepc,
> +			    struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct sgx_epc_page *epc_page;
> +	unsigned long index, pfn;
> +	int ret;
> +
> +	WARN_ON(!mutex_is_locked(&vepc->lock));
> +
> +	/* Calculate index of EPC page in virtual EPC's page_array */
> +	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
> +
> +	epc_page = xa_load(&vepc->page_array, index);
> +	if (epc_page)
> +		return 0;
> +
> +	epc_page = sgx_alloc_epc_page(vepc, false);
> +	if (IS_ERR(epc_page))
> +		return PTR_ERR(epc_page);
> +
> +	ret = xa_err(xa_store(&vepc->page_array, index, epc_page, GFP_KERNEL));
> +	if (ret)
> +		goto err_free;
> +
> +	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
> +
> +	ret = vmf_insert_pfn(vma, addr, pfn);
> +	if (ret != VM_FAULT_NOPAGE) {
> +		ret = -EFAULT;
> +		goto err_delete;
> +	}
> +
> +	return 0;
> +
> +err_delete:
> +	xa_erase(&vepc->page_array, index);
> +err_free:
> +	sgx_free_epc_page(epc_page);
> +	return ret;
> +}
> +
> +static vm_fault_t sgx_vepc_fault(struct vm_fault *vmf)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct sgx_vepc *vepc = vma->vm_private_data;
> +	int ret;
> +
> +	mutex_lock(&vepc->lock);
> +	ret = __sgx_vepc_fault(vepc, vma, vmf->address);
> +	mutex_unlock(&vepc->lock);
> +
> +	if (!ret)
> +		return VM_FAULT_NOPAGE;
> +
> +	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
> +		mmap_read_unlock(vma->vm_mm);
> +		return VM_FAULT_RETRY;
> +	}
> +
> +	return VM_FAULT_SIGBUS;
> +}
> +
> +const struct vm_operations_struct sgx_vepc_vm_ops = {
> +	.fault = sgx_vepc_fault,
> +};
> +
> +static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct sgx_vepc *vepc = file->private_data;
> +
> +	if (!(vma->vm_flags & VM_SHARED))
> +		return -EINVAL;
> +
> +	vma->vm_ops = &sgx_vepc_vm_ops;
> +	/* Don't copy VMA in fork() */
> +	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
> +	vma->vm_private_data = vepc;
> +
> +	return 0;
> +}
> +
> +static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
> +{
> +	int ret;
> +
> +	/*
> +	 * Take a previously guest-owned EPC page and return it to the
> +	 * general EPC page pool.
> +	 *
> +	 * Guests can not be trusted to have left this page in a good
> +	 * state, so run EREMOVE on the page unconditionally.  In the
> +	 * case that a guest properly EREMOVE'd this page, a superfluous
> +	 * EREMOVE is harmless.
> +	 */
> +	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
> +	if (ret) {
> +		/*
> +		 * Only SGX_CHILD_PRESENT is expected, which is because of
> +		 * EREMOVE'ing an SECS still with child, in which case it can
> +		 * be handled by EREMOVE'ing the SECS again after all pages in
> +		 * virtual EPC have been EREMOVE'd. See comments in below in
> +		 * sgx_vepc_release().
> +		 *
> +		 * The user of virtual EPC (KVM) needs to guarantee there's no
> +		 * logical processor is still running in the enclave in guest,
> +		 * otherwise EREMOVE will get SGX_ENCLAVE_ACT which cannot be
> +		 * handled here.
> +		 */
> +		WARN_ONCE(ret != SGX_CHILD_PRESENT, EREMOVE_ERROR_MESSAGE,
> +			  ret, ret);
> +		return ret;
> +	}
> +
> +	sgx_free_epc_page(epc_page);
> +
> +	return 0;
> +}
> +
> +static int sgx_vepc_release(struct inode *inode, struct file *file)
> +{
> +	struct sgx_vepc *vepc = file->private_data;
> +	struct sgx_epc_page *epc_page, *tmp, *entry;
> +	unsigned long index;
> +
> +	LIST_HEAD(secs_pages);
> +
> +	xa_for_each(&vepc->page_array, index, entry) {
> +		/*
> +		 * Remove all normal, child pages.  sgx_vepc_free_page()
> +		 * will fail if EREMOVE fails, but this is OK and expected on
> +		 * SECS pages.  Those can only be EREMOVE'd *after* all their
> +		 * child pages. Retries below will clean them up.
> +		 */
> +		if (sgx_vepc_free_page(entry))
> +			continue;
> +
> +		xa_erase(&vepc->page_array, index);
> +	}
> +
> +	/*
> +	 * Retry EREMOVE'ing pages.  This will clean up any SECS pages that
> +	 * only had children in this 'epc' area.
> +	 */
> +	xa_for_each(&vepc->page_array, index, entry) {
> +		epc_page = entry;
> +		/*
> +		 * An EREMOVE failure here means that the SECS page still
> +		 * has children.  But, since all children in this 'sgx_vepc'
> +		 * have been removed, the SECS page must have a child on
> +		 * another instance.
> +		 */
> +		if (sgx_vepc_free_page(epc_page))
> +			list_add_tail(&epc_page->list, &secs_pages);
> +
> +		xa_erase(&vepc->page_array, index);
> +	}
> +
> +	/*
> +	 * SECS pages are "pinned" by child pages, and "unpinned" once all
> +	 * children have been EREMOVE'd.  A child page in this instance
> +	 * may have pinned an SECS page encountered in an earlier release(),
> +	 * creating a zombie.  Since some children were EREMOVE'd above,
> +	 * try to EREMOVE all zombies in the hopes that one was unpinned.
> +	 */
> +	mutex_lock(&zombie_secs_pages_lock);
> +	list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) {
> +		/*
> +		 * Speculatively remove the page from the list of zombies,
> +		 * if the page is successfully EREMOVE'd it will be added to
> +		 * the list of free pages.  If EREMOVE fails, throw the page
> +		 * on the local list, which will be spliced on at the end.
> +		 */
> +		list_del(&epc_page->list);
> +
> +		if (sgx_vepc_free_page(epc_page))
> +			list_add_tail(&epc_page->list, &secs_pages);
> +	}
> +
> +	if (!list_empty(&secs_pages))
> +		list_splice_tail(&secs_pages, &zombie_secs_pages);
> +	mutex_unlock(&zombie_secs_pages_lock);
> +
> +	kfree(vepc);
> +
> +	return 0;
> +}
> +
> +static int sgx_vepc_open(struct inode *inode, struct file *file)
> +{
> +	struct sgx_vepc *vepc;
> +
> +	vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL);
> +	if (!vepc)
> +		return -ENOMEM;
> +	mutex_init(&vepc->lock);
> +	xa_init(&vepc->page_array);
> +
> +	file->private_data = vepc;
> +
> +	return 0;
> +}
> +
> +static const struct file_operations sgx_vepc_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= sgx_vepc_open,
> +	.release	= sgx_vepc_release,
> +	.mmap		= sgx_vepc_mmap,
> +};
> +
> +static struct miscdevice sgx_vepc_dev = {
> +	.minor		= MISC_DYNAMIC_MINOR,
> +	.name		= "sgx_vepc",
> +	.nodename	= "sgx_vepc",
> +	.fops		= &sgx_vepc_fops,
> +};
> +
> +int __init sgx_vepc_init(void)
> +{
> +	/* SGX virtualization requires KVM to work */
> +	if (!cpu_feature_enabled(X86_FEATURE_VMX))
> +		return -ENODEV;
> +
> +	INIT_LIST_HEAD(&zombie_secs_pages);
> +	mutex_init(&zombie_secs_pages_lock);
> +
> +	return misc_register(&sgx_vepc_dev);
> +}
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index a788d5120d4d..f6b93a35ce14 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -84,6 +84,18 @@ config KVM_INTEL
>  	  To compile this as a module, choose M here: the module
>  	  will be called kvm-intel.
>  
> +config X86_SGX_KVM
> +	bool "Software Guard eXtensions (SGX) Virtualization"
> +	depends on X86_SGX && KVM_INTEL
> +	help
> +
> +	  Enables KVM guests to create SGX enclaves.
> +
> +	  This includes support to expose "raw" unreclaimable enclave memory to
> +	  guests via a device node, e.g. /dev/sgx_vepc.
> +
> +	  If unsure, say N.
> +
>  config KVM_AMD
>  	tristate "KVM for AMD processors support"
>  	depends on KVM
> -- 
> 2.29.2
> 
> 
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-04-06  8:59         ` Kai Huang
@ 2021-04-06  9:09           ` Borislav Petkov
  2021-04-06  9:24             ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-04-06  9:09 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, Apr 06, 2021 at 08:59:58PM +1200, Kai Huang wrote:
> OK. My thinking was that, returning negative error value basically means guest
> will be killed.

You need to define how you're going to handle invalid input from the
guest. If that guest is considered malicious, then sure, killing it
makes sense.

> For the case access_ok() fails for @secs or other user pointers, it
> seems killing guest is a little it overkill,

So don't kill it then - just don't allow it to create an enclave because
it is doing stupid crap.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-04-06  9:09           ` Borislav Petkov
@ 2021-04-06  9:24             ` Kai Huang
  2021-04-06  9:32               ` Borislav Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-04-06  9:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, 6 Apr 2021 11:09:01 +0200 Borislav Petkov wrote:
> On Tue, Apr 06, 2021 at 08:59:58PM +1200, Kai Huang wrote:
> > OK. My thinking was that, returning negative error value basically means guest
> > will be killed.
> 
> You need to define how you're going to handle invalid input from the
> guest. If that guest is considered malicious, then sure, killing it
> makes sense.

Such invalid input has already been handled in handle_encls_xx() before calling
the two helpers in this patch. KVM returns to Qemu and let it decide whether to
kill or not. The access_ok()s here are trying to catch KVM bug.

> 
> > For the case access_ok() fails for @secs or other user pointers, it
> > seems killing guest is a little it overkill,
> 
> So don't kill it then - just don't allow it to create an enclave because
> it is doing stupid crap.

If so we'd better inject an exception to guest (and return 1) in KVM so guest
can continue to run. Otherwise basically KVM will return to Qemu and let it
decide (and basically it will kill guest).

I think killing guest is also OK. KVM part patches needs to be updated, though,
anyway.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-04-06  9:24             ` Kai Huang
@ 2021-04-06  9:32               ` Borislav Petkov
  2021-04-06  9:41                 ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-04-06  9:32 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, Apr 06, 2021 at 09:24:24PM +1200, Kai Huang wrote:
> Such invalid input has already been handled in handle_encls_xx() before calling
> the two helpers in this patch. KVM returns to Qemu and let it decide whether to
> kill or not. The access_ok()s here are trying to catch KVM bug.

Whatever they try to do, you cannot continue creating an enclave using
invalid input, no matter whether you've warned or not. People do not
stare at dmesg all the time.

> If so we'd better inject an exception to guest (and return 1) in KVM so guest
> can continue to run. Otherwise basically KVM will return to Qemu and let it
> decide (and basically it will kill guest).
>
> I think killing guest is also OK. KVM part patches needs to be updated, though,
> anyway.

Ok, I'll make the changes and you can redo the KVM rest ontop.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-04-06  9:32               ` Borislav Petkov
@ 2021-04-06  9:41                 ` Kai Huang
  2021-04-06 17:08                   ` Borislav Petkov
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-04-06  9:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, 6 Apr 2021 11:32:11 +0200 Borislav Petkov wrote:
> On Tue, Apr 06, 2021 at 09:24:24PM +1200, Kai Huang wrote:
> > Such invalid input has already been handled in handle_encls_xx() before calling
> > the two helpers in this patch. KVM returns to Qemu and let it decide whether to
> > kill or not. The access_ok()s here are trying to catch KVM bug.
> 
> Whatever they try to do, you cannot continue creating an enclave using
> invalid input, no matter whether you've warned or not. People do not
> stare at dmesg all the time.
> 
> > If so we'd better inject an exception to guest (and return 1) in KVM so guest
> > can continue to run. Otherwise basically KVM will return to Qemu and let it
> > decide (and basically it will kill guest).
> >
> > I think killing guest is also OK. KVM part patches needs to be updated, though,
> > anyway.
> 
> Ok, I'll make the changes and you can redo the KVM rest ontop.
> 

Thank you!

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-04-06  9:41                 ` Kai Huang
@ 2021-04-06 17:08                   ` Borislav Petkov
  2021-04-06 20:33                     ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Borislav Petkov @ 2021-04-06 17:08 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, Apr 06, 2021 at 09:41:52PM +1200, Kai Huang wrote:
> > Ok, I'll make the changes and you can redo the KVM rest ontop.
> > 
> 
> Thank you!

I.e., something like this:

---
From: Sean Christopherson <sean.j.christopherson@intel.com>
Date: Fri, 19 Mar 2021 20:23:08 +1300
Subject: [PATCH] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM

The host kernel must intercept ECREATE to impose policies on guests, and
intercept EINIT to be able to write guest's virtual SGX_LEPUBKEYHASH MSR
values to hardware before running guest's EINIT so it can run correctly
according to hardware behavior.

Provide wrappers around __ecreate() and __einit() to hide the ugliness
of overloading the ENCLS return value to encode multiple error formats
in a single int.  KVM will trap-and-execute ECREATE and EINIT as part
of SGX virtualization, and reflect ENCLS execution result to guest by
setting up guest's GPRs, or on an exception, injecting the correct fault
based on return value of __ecreate() and __einit().

Use host userspace addresses (provided by KVM based on guest physical
address of ENCLS parameters) to execute ENCLS/EINIT when possible.
Accesses to both EPC and memory originating from ENCLS are subject to
segmentation and paging mechanisms.  It's also possible to generate
kernel mappings for ENCLS parameters by resolving PFN but using
__uaccess_xx() is simpler.

 [ bp: Return early if the __user memory accesses fail. ]

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20e09daf559aa5e9e680a0b4b5fba940f1bad86e.1616136308.git.kai.huang@intel.com
---
 arch/x86/include/asm/sgx.h     |   7 ++
 arch/x86/kernel/cpu/sgx/virt.c | 117 +++++++++++++++++++++++++++++++++
 2 files changed, 124 insertions(+)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 3b025afec0a7..954042e04102 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -365,4 +365,11 @@ struct sgx_sigstruct {
  * comment!
  */
 
+#ifdef CONFIG_X86_SGX_KVM
+int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
+		     int *trapnr);
+int sgx_virt_einit(void __user *sigstruct, void __user *token,
+		   void __user *secs, u64 *lepubkeyhash, int *trapnr);
+#endif
+
 #endif /* _ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 259cc46ad78c..7d221eac716a 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -257,3 +257,120 @@ int __init sgx_vepc_init(void)
 
 	return misc_register(&sgx_vepc_dev);
 }
+
+/**
+ * sgx_virt_ecreate() - Run ECREATE on behalf of guest
+ * @pageinfo:	Pointer to PAGEINFO structure
+ * @secs:	Userspace pointer to SECS page
+ * @trapnr:	trap number injected to guest in case of ECREATE error
+ *
+ * Run ECREATE on behalf of guest after KVM traps ECREATE for the purpose
+ * of enforcing policies of guest's enclaves, and return the trap number
+ * which should be injected to guest in case of any ECREATE error.
+ *
+ * Return:
+ * -  0:	ECREATE was successful.
+ * - <0:	on error.
+ */
+int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
+		     int *trapnr)
+{
+	int ret;
+
+	/*
+	 * @secs is an untrusted, userspace-provided address.  It comes from
+	 * KVM and is assumed to be a valid pointer which points somewhere in
+	 * userspace.  This can fault and call SGX or other fault handlers when
+	 * userspace mapping @secs doesn't exist.
+	 *
+	 * Add a WARN() to make sure @secs is already valid userspace pointer
+	 * from caller (KVM), who should already have handled invalid pointer
+	 * case (for instance, made by malicious guest).  All other checks,
+	 * such as alignment of @secs, are deferred to ENCLS itself.
+	 */
+	if (WARN_ON_ONCE(!access_ok(secs, PAGE_SIZE)))
+		return -EINVAL;
+
+	__uaccess_begin();
+	ret = __ecreate(pageinfo, (void *)secs);
+	__uaccess_end();
+
+	if (encls_faulted(ret)) {
+		*trapnr = ENCLS_TRAPNR(ret);
+		return -EFAULT;
+	}
+
+	/* ECREATE doesn't return an error code, it faults or succeeds. */
+	WARN_ON_ONCE(ret);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sgx_virt_ecreate);
+
+static int __sgx_virt_einit(void __user *sigstruct, void __user *token,
+			    void __user *secs)
+{
+	int ret;
+
+	/*
+	 * Make sure all userspace pointers from caller (KVM) are valid.
+	 * All other checks deferred to ENCLS itself.  Also see comment
+	 * for @secs in sgx_virt_ecreate().
+	 */
+#define SGX_EINITTOKEN_SIZE	304
+	if (WARN_ON_ONCE(!access_ok(sigstruct, sizeof(struct sgx_sigstruct)) ||
+			 !access_ok(token, SGX_EINITTOKEN_SIZE) ||
+			 !access_ok(secs, PAGE_SIZE)))
+		return -EINVAL;
+
+	__uaccess_begin();
+	ret = __einit((void *)sigstruct, (void *)token, (void *)secs);
+	__uaccess_end();
+
+	return ret;
+}
+
+/**
+ * sgx_virt_einit() - Run EINIT on behalf of guest
+ * @sigstruct:		Userspace pointer to SIGSTRUCT structure
+ * @token:		Userspace pointer to EINITTOKEN structure
+ * @secs:		Userspace pointer to SECS page
+ * @lepubkeyhash:	Pointer to guest's *virtual* SGX_LEPUBKEYHASH MSR values
+ * @trapnr:		trap number injected to guest in case of EINIT error
+ *
+ * Run EINIT on behalf of guest after KVM traps EINIT. If SGX_LC is available
+ * in host, SGX driver may rewrite the hardware values at wish, therefore KVM
+ * needs to update hardware values to guest's virtual MSR values in order to
+ * ensure EINIT is executed with expected hardware values.
+ *
+ * Return:
+ * -  0:	EINIT was successful.
+ * - <0:	on error.
+ */
+int sgx_virt_einit(void __user *sigstruct, void __user *token,
+		   void __user *secs, u64 *lepubkeyhash, int *trapnr)
+{
+	int ret;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SGX_LC)) {
+		ret = __sgx_virt_einit(sigstruct, token, secs);
+	} else {
+		preempt_disable();
+
+		sgx_update_lepubkeyhash(lepubkeyhash);
+
+		ret = __sgx_virt_einit(sigstruct, token, secs);
+		preempt_enable();
+	}
+
+	/* Propagate up the error from the WARN_ON_ONCE in __sgx_virt_einit() */
+	if (ret == -EINVAL)
+		return ret;
+
+	if (encls_faulted(ret)) {
+		*trapnr = ENCLS_TRAPNR(ret);
+		return -EFAULT;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sgx_virt_einit);
-- 
2.29.2

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-04-06 17:08                   ` Borislav Petkov
@ 2021-04-06 20:33                     ` Kai Huang
  0 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-04-06 20:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, linux-sgx, x86, linux-kernel, seanjc, jarkko, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, tglx,
	mingo, hpa

On Tue, 6 Apr 2021 19:08:58 +0200 Borislav Petkov wrote:
> On Tue, Apr 06, 2021 at 09:41:52PM +1200, Kai Huang wrote:
> > > Ok, I'll make the changes and you can redo the KVM rest ontop.
> > > 
> > 
> > Thank you!
> 
> I.e., something like this:

Looks good. I'll update KVM part patches based on this one. 

Thanks.

> 
> ---
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> Date: Fri, 19 Mar 2021 20:23:08 +1300
> Subject: [PATCH] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
> 
> The host kernel must intercept ECREATE to impose policies on guests, and
> intercept EINIT to be able to write guest's virtual SGX_LEPUBKEYHASH MSR
> values to hardware before running guest's EINIT so it can run correctly
> according to hardware behavior.
> 
> Provide wrappers around __ecreate() and __einit() to hide the ugliness
> of overloading the ENCLS return value to encode multiple error formats
> in a single int.  KVM will trap-and-execute ECREATE and EINIT as part
> of SGX virtualization, and reflect ENCLS execution result to guest by
> setting up guest's GPRs, or on an exception, injecting the correct fault
> based on return value of __ecreate() and __einit().
> 
> Use host userspace addresses (provided by KVM based on guest physical
> address of ENCLS parameters) to execute ENCLS/EINIT when possible.
> Accesses to both EPC and memory originating from ENCLS are subject to
> segmentation and paging mechanisms.  It's also possible to generate
> kernel mappings for ENCLS parameters by resolving PFN but using
> __uaccess_xx() is simpler.
> 
>  [ bp: Return early if the __user memory accesses fail. ]
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> Link: https://lkml.kernel.org/r/20e09daf559aa5e9e680a0b4b5fba940f1bad86e.1616136308.git.kai.huang@intel.com
> ---
>  arch/x86/include/asm/sgx.h     |   7 ++
>  arch/x86/kernel/cpu/sgx/virt.c | 117 +++++++++++++++++++++++++++++++++
>  2 files changed, 124 insertions(+)
> 
> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> index 3b025afec0a7..954042e04102 100644
> --- a/arch/x86/include/asm/sgx.h
> +++ b/arch/x86/include/asm/sgx.h
> @@ -365,4 +365,11 @@ struct sgx_sigstruct {
>   * comment!
>   */
>  
> +#ifdef CONFIG_X86_SGX_KVM
> +int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
> +		     int *trapnr);
> +int sgx_virt_einit(void __user *sigstruct, void __user *token,
> +		   void __user *secs, u64 *lepubkeyhash, int *trapnr);
> +#endif
> +
>  #endif /* _ASM_X86_SGX_H */
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index 259cc46ad78c..7d221eac716a 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -257,3 +257,120 @@ int __init sgx_vepc_init(void)
>  
>  	return misc_register(&sgx_vepc_dev);
>  }
> +
> +/**
> + * sgx_virt_ecreate() - Run ECREATE on behalf of guest
> + * @pageinfo:	Pointer to PAGEINFO structure
> + * @secs:	Userspace pointer to SECS page
> + * @trapnr:	trap number injected to guest in case of ECREATE error
> + *
> + * Run ECREATE on behalf of guest after KVM traps ECREATE for the purpose
> + * of enforcing policies of guest's enclaves, and return the trap number
> + * which should be injected to guest in case of any ECREATE error.
> + *
> + * Return:
> + * -  0:	ECREATE was successful.
> + * - <0:	on error.
> + */
> +int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
> +		     int *trapnr)
> +{
> +	int ret;
> +
> +	/*
> +	 * @secs is an untrusted, userspace-provided address.  It comes from
> +	 * KVM and is assumed to be a valid pointer which points somewhere in
> +	 * userspace.  This can fault and call SGX or other fault handlers when
> +	 * userspace mapping @secs doesn't exist.
> +	 *
> +	 * Add a WARN() to make sure @secs is already valid userspace pointer
> +	 * from caller (KVM), who should already have handled invalid pointer
> +	 * case (for instance, made by malicious guest).  All other checks,
> +	 * such as alignment of @secs, are deferred to ENCLS itself.
> +	 */
> +	if (WARN_ON_ONCE(!access_ok(secs, PAGE_SIZE)))
> +		return -EINVAL;
> +
> +	__uaccess_begin();
> +	ret = __ecreate(pageinfo, (void *)secs);
> +	__uaccess_end();
> +
> +	if (encls_faulted(ret)) {
> +		*trapnr = ENCLS_TRAPNR(ret);
> +		return -EFAULT;
> +	}
> +
> +	/* ECREATE doesn't return an error code, it faults or succeeds. */
> +	WARN_ON_ONCE(ret);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(sgx_virt_ecreate);
> +
> +static int __sgx_virt_einit(void __user *sigstruct, void __user *token,
> +			    void __user *secs)
> +{
> +	int ret;
> +
> +	/*
> +	 * Make sure all userspace pointers from caller (KVM) are valid.
> +	 * All other checks deferred to ENCLS itself.  Also see comment
> +	 * for @secs in sgx_virt_ecreate().
> +	 */
> +#define SGX_EINITTOKEN_SIZE	304
> +	if (WARN_ON_ONCE(!access_ok(sigstruct, sizeof(struct sgx_sigstruct)) ||
> +			 !access_ok(token, SGX_EINITTOKEN_SIZE) ||
> +			 !access_ok(secs, PAGE_SIZE)))
> +		return -EINVAL;
> +
> +	__uaccess_begin();
> +	ret = __einit((void *)sigstruct, (void *)token, (void *)secs);
> +	__uaccess_end();
> +
> +	return ret;
> +}
> +
> +/**
> + * sgx_virt_einit() - Run EINIT on behalf of guest
> + * @sigstruct:		Userspace pointer to SIGSTRUCT structure
> + * @token:		Userspace pointer to EINITTOKEN structure
> + * @secs:		Userspace pointer to SECS page
> + * @lepubkeyhash:	Pointer to guest's *virtual* SGX_LEPUBKEYHASH MSR values
> + * @trapnr:		trap number injected to guest in case of EINIT error
> + *
> + * Run EINIT on behalf of guest after KVM traps EINIT. If SGX_LC is available
> + * in host, SGX driver may rewrite the hardware values at wish, therefore KVM
> + * needs to update hardware values to guest's virtual MSR values in order to
> + * ensure EINIT is executed with expected hardware values.
> + *
> + * Return:
> + * -  0:	EINIT was successful.
> + * - <0:	on error.
> + */
> +int sgx_virt_einit(void __user *sigstruct, void __user *token,
> +		   void __user *secs, u64 *lepubkeyhash, int *trapnr)
> +{
> +	int ret;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SGX_LC)) {
> +		ret = __sgx_virt_einit(sigstruct, token, secs);
> +	} else {
> +		preempt_disable();
> +
> +		sgx_update_lepubkeyhash(lepubkeyhash);
> +
> +		ret = __sgx_virt_einit(sigstruct, token, secs);
> +		preempt_enable();
> +	}
> +
> +	/* Propagate up the error from the WARN_ON_ONCE in __sgx_virt_einit() */
> +	if (ret == -EINVAL)
> +		return ret;
> +
> +	if (encls_faulted(ret)) {
> +		*trapnr = ENCLS_TRAPNR(ret);
> +		return -EFAULT;
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(sgx_virt_einit);
> -- 
> 2.29.2
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/sgx: Move provisioning device creation out of SGX driver
  2021-03-19  7:23 ` [PATCH v3 14/25] x86/sgx: Move provisioning device creation out of SGX driver Kai Huang
@ 2021-04-07 10:03   ` tip-bot2 for Sean Christopherson
  0 siblings, 0 replies; 134+ messages in thread
From: tip-bot2 for Sean Christopherson @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sean Christopherson, Kai Huang, Borislav Petkov, Jarkko Sakkinen,
	Dave Hansen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     b3754e5d3da320af2bebb7a690002685c7f5c15c
Gitweb:        https://git.kernel.org/tip/b3754e5d3da320af2bebb7a690002685c7f5c15c
Author:        Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:23:09 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 06 Apr 2021 19:18:46 +02:00

x86/sgx: Move provisioning device creation out of SGX driver

And extract sgx_set_attribute() out of sgx_ioc_enclave_provision() and
export it as symbol for KVM to use.

The provisioning key is sensitive. The SGX driver only allows to create
an enclave which can access the provisioning key when the enclave
creator has permission to open /dev/sgx_provision. It should apply to
a VM as well, as the provisioning key is platform-specific, thus an
unrestricted VM can also potentially compromise the provisioning key.

Move the provisioning device creation out of sgx_drv_init() to
sgx_init() as a preparation for adding SGX virtualization support,
so that even if the SGX driver is not enabled due to flexible launch
control not being available, SGX virtualization can still be enabled,
and use it to restrict a VM's capability of being able to access the
provisioning key.

 [ bp: Massage commit message. ]

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/0f4d044d621561f26d5f4ef73e8dc6cd18cc7e79.1616136308.git.kai.huang@intel.com
---
 arch/x86/include/asm/sgx.h       |  3 ++-
 arch/x86/kernel/cpu/sgx/driver.c | 17 +---------
 arch/x86/kernel/cpu/sgx/ioctl.c  | 16 +--------
 arch/x86/kernel/cpu/sgx/main.c   | 57 ++++++++++++++++++++++++++++++-
 4 files changed, 61 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 954042e..a16e2c9 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -372,4 +372,7 @@ int sgx_virt_einit(void __user *sigstruct, void __user *token,
 		   void __user *secs, u64 *lepubkeyhash, int *trapnr);
 #endif
 
+int sgx_set_attribute(unsigned long *allowed_attributes,
+		      unsigned int attribute_fd);
+
 #endif /* _ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 8ce6d83..aa9b8b8 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -136,10 +136,6 @@ static const struct file_operations sgx_encl_fops = {
 	.get_unmapped_area	= sgx_get_unmapped_area,
 };
 
-const struct file_operations sgx_provision_fops = {
-	.owner			= THIS_MODULE,
-};
-
 static struct miscdevice sgx_dev_enclave = {
 	.minor = MISC_DYNAMIC_MINOR,
 	.name = "sgx_enclave",
@@ -147,13 +143,6 @@ static struct miscdevice sgx_dev_enclave = {
 	.fops = &sgx_encl_fops,
 };
 
-static struct miscdevice sgx_dev_provision = {
-	.minor = MISC_DYNAMIC_MINOR,
-	.name = "sgx_provision",
-	.nodename = "sgx_provision",
-	.fops = &sgx_provision_fops,
-};
-
 int __init sgx_drv_init(void)
 {
 	unsigned int eax, ebx, ecx, edx;
@@ -187,11 +176,5 @@ int __init sgx_drv_init(void)
 	if (ret)
 		return ret;
 
-	ret = misc_register(&sgx_dev_provision);
-	if (ret) {
-		misc_deregister(&sgx_dev_enclave);
-		return ret;
-	}
-
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 7be9c06..83df20e 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -2,6 +2,7 @@
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
 #include <asm/mman.h>
+#include <asm/sgx.h>
 #include <linux/mman.h>
 #include <linux/delay.h>
 #include <linux/file.h>
@@ -666,24 +667,11 @@ out:
 static long sgx_ioc_enclave_provision(struct sgx_encl *encl, void __user *arg)
 {
 	struct sgx_enclave_provision params;
-	struct file *file;
 
 	if (copy_from_user(&params, arg, sizeof(params)))
 		return -EFAULT;
 
-	file = fget(params.fd);
-	if (!file)
-		return -EINVAL;
-
-	if (file->f_op != &sgx_provision_fops) {
-		fput(file);
-		return -EINVAL;
-	}
-
-	encl->attributes_mask |= SGX_ATTR_PROVISIONKEY;
-
-	fput(file);
-	return 0;
+	return sgx_set_attribute(&encl->attributes_mask, params.fd);
 }
 
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 227f1e2..92cb11d 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1,14 +1,17 @@
 // SPDX-License-Identifier: GPL-2.0
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
+#include <linux/file.h>
 #include <linux/freezer.h>
 #include <linux/highmem.h>
 #include <linux/kthread.h>
+#include <linux/miscdevice.h>
 #include <linux/pagemap.h>
 #include <linux/ratelimit.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include <asm/sgx.h>
 #include "driver.h"
 #include "encl.h"
 #include "encls.h"
@@ -743,6 +746,51 @@ void sgx_update_lepubkeyhash(u64 *lepubkeyhash)
 		wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, lepubkeyhash[i]);
 }
 
+const struct file_operations sgx_provision_fops = {
+	.owner			= THIS_MODULE,
+};
+
+static struct miscdevice sgx_dev_provision = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "sgx_provision",
+	.nodename = "sgx_provision",
+	.fops = &sgx_provision_fops,
+};
+
+/**
+ * sgx_set_attribute() - Update allowed attributes given file descriptor
+ * @allowed_attributes:		Pointer to allowed enclave attributes
+ * @attribute_fd:		File descriptor for specific attribute
+ *
+ * Append enclave attribute indicated by file descriptor to allowed
+ * attributes. Currently only SGX_ATTR_PROVISIONKEY indicated by
+ * /dev/sgx_provision is supported.
+ *
+ * Return:
+ * -0:		SGX_ATTR_PROVISIONKEY is appended to allowed_attributes
+ * -EINVAL:	Invalid, or not supported file descriptor
+ */
+int sgx_set_attribute(unsigned long *allowed_attributes,
+		      unsigned int attribute_fd)
+{
+	struct file *file;
+
+	file = fget(attribute_fd);
+	if (!file)
+		return -EINVAL;
+
+	if (file->f_op != &sgx_provision_fops) {
+		fput(file);
+		return -EINVAL;
+	}
+
+	*allowed_attributes |= SGX_ATTR_PROVISIONKEY;
+
+	fput(file);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sgx_set_attribute);
+
 static int __init sgx_init(void)
 {
 	int ret;
@@ -759,6 +807,10 @@ static int __init sgx_init(void)
 		goto err_page_cache;
 	}
 
+	ret = misc_register(&sgx_dev_provision);
+	if (ret)
+		goto err_kthread;
+
 	/*
 	 * Always try to initialize the native *and* KVM drivers.
 	 * The KVM driver is less picky than the native one and
@@ -770,10 +822,13 @@ static int __init sgx_init(void)
 	ret = sgx_drv_init();
 
 	if (sgx_vepc_init() && ret)
-		goto err_kthread;
+		goto err_provision;
 
 	return 0;
 
+err_provision:
+	misc_deregister(&sgx_dev_provision);
+
 err_kthread:
 	kthread_stop(ksgxd_tsk);
 

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM
  2021-03-19  7:23 ` [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Kai Huang
  2021-04-05  9:07   ` Borislav Petkov
@ 2021-04-07 10:03   ` tip-bot2 for Sean Christopherson
  1 sibling, 0 replies; 134+ messages in thread
From: tip-bot2 for Sean Christopherson @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sean Christopherson, Kai Huang, Borislav Petkov, Jarkko Sakkinen,
	x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     d155030b1e7c0e448aab22a803f7a71ea2e117d7
Gitweb:        https://git.kernel.org/tip/d155030b1e7c0e448aab22a803f7a71ea2e117d7
Author:        Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:23:08 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 06 Apr 2021 19:18:27 +02:00

x86/sgx: Add helpers to expose ECREATE and EINIT to KVM

The host kernel must intercept ECREATE to impose policies on guests, and
intercept EINIT to be able to write guest's virtual SGX_LEPUBKEYHASH MSR
values to hardware before running guest's EINIT so it can run correctly
according to hardware behavior.

Provide wrappers around __ecreate() and __einit() to hide the ugliness
of overloading the ENCLS return value to encode multiple error formats
in a single int.  KVM will trap-and-execute ECREATE and EINIT as part
of SGX virtualization, and reflect ENCLS execution result to guest by
setting up guest's GPRs, or on an exception, injecting the correct fault
based on return value of __ecreate() and __einit().

Use host userspace addresses (provided by KVM based on guest physical
address of ENCLS parameters) to execute ENCLS/EINIT when possible.
Accesses to both EPC and memory originating from ENCLS are subject to
segmentation and paging mechanisms.  It's also possible to generate
kernel mappings for ENCLS parameters by resolving PFN but using
__uaccess_xx() is simpler.

 [ bp: Return early if the __user memory accesses fail, use
   cpu_feature_enabled(). ]

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20e09daf559aa5e9e680a0b4b5fba940f1bad86e.1616136308.git.kai.huang@intel.com
---
 arch/x86/include/asm/sgx.h     |   7 ++-
 arch/x86/kernel/cpu/sgx/virt.c | 117 ++++++++++++++++++++++++++++++++-
 2 files changed, 124 insertions(+)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 3b025af..954042e 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -365,4 +365,11 @@ struct sgx_sigstruct {
  * comment!
  */
 
+#ifdef CONFIG_X86_SGX_KVM
+int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
+		     int *trapnr);
+int sgx_virt_einit(void __user *sigstruct, void __user *token,
+		   void __user *secs, u64 *lepubkeyhash, int *trapnr);
+#endif
+
 #endif /* _ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 259cc46..7d221ea 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -257,3 +257,120 @@ int __init sgx_vepc_init(void)
 
 	return misc_register(&sgx_vepc_dev);
 }
+
+/**
+ * sgx_virt_ecreate() - Run ECREATE on behalf of guest
+ * @pageinfo:	Pointer to PAGEINFO structure
+ * @secs:	Userspace pointer to SECS page
+ * @trapnr:	trap number injected to guest in case of ECREATE error
+ *
+ * Run ECREATE on behalf of guest after KVM traps ECREATE for the purpose
+ * of enforcing policies of guest's enclaves, and return the trap number
+ * which should be injected to guest in case of any ECREATE error.
+ *
+ * Return:
+ * -  0:	ECREATE was successful.
+ * - <0:	on error.
+ */
+int sgx_virt_ecreate(struct sgx_pageinfo *pageinfo, void __user *secs,
+		     int *trapnr)
+{
+	int ret;
+
+	/*
+	 * @secs is an untrusted, userspace-provided address.  It comes from
+	 * KVM and is assumed to be a valid pointer which points somewhere in
+	 * userspace.  This can fault and call SGX or other fault handlers when
+	 * userspace mapping @secs doesn't exist.
+	 *
+	 * Add a WARN() to make sure @secs is already valid userspace pointer
+	 * from caller (KVM), who should already have handled invalid pointer
+	 * case (for instance, made by malicious guest).  All other checks,
+	 * such as alignment of @secs, are deferred to ENCLS itself.
+	 */
+	if (WARN_ON_ONCE(!access_ok(secs, PAGE_SIZE)))
+		return -EINVAL;
+
+	__uaccess_begin();
+	ret = __ecreate(pageinfo, (void *)secs);
+	__uaccess_end();
+
+	if (encls_faulted(ret)) {
+		*trapnr = ENCLS_TRAPNR(ret);
+		return -EFAULT;
+	}
+
+	/* ECREATE doesn't return an error code, it faults or succeeds. */
+	WARN_ON_ONCE(ret);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sgx_virt_ecreate);
+
+static int __sgx_virt_einit(void __user *sigstruct, void __user *token,
+			    void __user *secs)
+{
+	int ret;
+
+	/*
+	 * Make sure all userspace pointers from caller (KVM) are valid.
+	 * All other checks deferred to ENCLS itself.  Also see comment
+	 * for @secs in sgx_virt_ecreate().
+	 */
+#define SGX_EINITTOKEN_SIZE	304
+	if (WARN_ON_ONCE(!access_ok(sigstruct, sizeof(struct sgx_sigstruct)) ||
+			 !access_ok(token, SGX_EINITTOKEN_SIZE) ||
+			 !access_ok(secs, PAGE_SIZE)))
+		return -EINVAL;
+
+	__uaccess_begin();
+	ret = __einit((void *)sigstruct, (void *)token, (void *)secs);
+	__uaccess_end();
+
+	return ret;
+}
+
+/**
+ * sgx_virt_einit() - Run EINIT on behalf of guest
+ * @sigstruct:		Userspace pointer to SIGSTRUCT structure
+ * @token:		Userspace pointer to EINITTOKEN structure
+ * @secs:		Userspace pointer to SECS page
+ * @lepubkeyhash:	Pointer to guest's *virtual* SGX_LEPUBKEYHASH MSR values
+ * @trapnr:		trap number injected to guest in case of EINIT error
+ *
+ * Run EINIT on behalf of guest after KVM traps EINIT. If SGX_LC is available
+ * in host, SGX driver may rewrite the hardware values at wish, therefore KVM
+ * needs to update hardware values to guest's virtual MSR values in order to
+ * ensure EINIT is executed with expected hardware values.
+ *
+ * Return:
+ * -  0:	EINIT was successful.
+ * - <0:	on error.
+ */
+int sgx_virt_einit(void __user *sigstruct, void __user *token,
+		   void __user *secs, u64 *lepubkeyhash, int *trapnr)
+{
+	int ret;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SGX_LC)) {
+		ret = __sgx_virt_einit(sigstruct, token, secs);
+	} else {
+		preempt_disable();
+
+		sgx_update_lepubkeyhash(lepubkeyhash);
+
+		ret = __sgx_virt_einit(sigstruct, token, secs);
+		preempt_enable();
+	}
+
+	/* Propagate up the error from the WARN_ON_ONCE in __sgx_virt_einit() */
+	if (ret == -EINVAL)
+		return ret;
+
+	if (encls_faulted(ret)) {
+		*trapnr = ENCLS_TRAPNR(ret);
+		return -EFAULT;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sgx_virt_einit);

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs
  2021-03-19  7:23 ` [PATCH v3 12/25] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs Kai Huang
@ 2021-04-07 10:03   ` tip-bot2 for Kai Huang
  0 siblings, 0 replies; 134+ messages in thread
From: tip-bot2 for Kai Huang @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kai Huang, Borislav Petkov, Dave Hansen, Jarkko Sakkinen, x86,
	linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     73916b6a0c714258f9c2619408a66c6696a761a7
Gitweb:        https://git.kernel.org/tip/73916b6a0c714258f9c2619408a66c6696a761a7
Author:        Kai Huang <kai.huang@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:23:07 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 06 Apr 2021 09:43:42 +02:00

x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs

Add a helper to update SGX_LEPUBKEYHASHn MSRs.  SGX virtualization also
needs to update those MSRs based on guest's "virtual" SGX_LEPUBKEYHASHn
before EINIT from guest.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/dfb7cd39d4dd62ea27703b64afdd8bccb579f623.1616136308.git.kai.huang@intel.com
---
 arch/x86/kernel/cpu/sgx/ioctl.c |  5 ++---
 arch/x86/kernel/cpu/sgx/main.c  | 16 ++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h   |  2 ++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 11e3f96..7be9c06 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -495,7 +495,7 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 			 void *token)
 {
 	u64 mrsigner[4];
-	int i, j, k;
+	int i, j;
 	void *addr;
 	int ret;
 
@@ -544,8 +544,7 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 
 			preempt_disable();
 
-			for (k = 0; k < 4; k++)
-				wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + k, mrsigner[k]);
+			sgx_update_lepubkeyhash(mrsigner);
 
 			ret = __einit(sigstruct, token, addr);
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 1c8a228..227f1e2 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -727,6 +727,22 @@ static bool __init sgx_page_cache_init(void)
 	return true;
 }
 
+/*
+ * Update the SGX_LEPUBKEYHASH MSRs to the values specified by caller.
+ * Bare-metal driver requires to update them to hash of enclave's signer
+ * before EINIT. KVM needs to update them to guest's virtual MSR values
+ * before doing EINIT from guest.
+ */
+void sgx_update_lepubkeyhash(u64 *lepubkeyhash)
+{
+	int i;
+
+	WARN_ON_ONCE(preemptible());
+
+	for (i = 0; i < 4; i++)
+		wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, lepubkeyhash[i]);
+}
+
 static int __init sgx_init(void)
 {
 	int ret;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index e4cbc71..4628ace 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -93,4 +93,6 @@ static inline int __init sgx_vepc_init(void)
 }
 #endif
 
+void sgx_update_lepubkeyhash(u64 *lepubkeyhash);
+
 #endif /* _X86_SGX_H */

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/sgx: Add encls_faulted() helper
  2021-03-19  7:23 ` [PATCH v3 11/25] x86/sgx: Add encls_faulted() helper Kai Huang
@ 2021-04-07 10:03   ` tip-bot2 for Sean Christopherson
  0 siblings, 0 replies; 134+ messages in thread
From: tip-bot2 for Sean Christopherson @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sean Christopherson, Kai Huang, Borislav Petkov, Jarkko Sakkinen,
	Dave Hansen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     a67136b458e5e63822b19c35794451122fe2bf3e
Gitweb:        https://git.kernel.org/tip/a67136b458e5e63822b19c35794451122fe2bf3e
Author:        Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:23:06 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 06 Apr 2021 09:43:42 +02:00

x86/sgx: Add encls_faulted() helper

Add a helper to extract the fault indicator from an encoded ENCLS return
value.  SGX virtualization will also need to detect ENCLS faults.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/c1f955898110de2f669da536fc6cf62e003dff88.1616136308.git.kai.huang@intel.com
---
 arch/x86/kernel/cpu/sgx/encls.h | 15 ++++++++++++++-
 arch/x86/kernel/cpu/sgx/ioctl.c |  2 +-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index be5c496..9b20484 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -40,6 +40,19 @@
 	} while (0);							  \
 }
 
+/*
+ * encls_faulted() - Check if an ENCLS leaf faulted given an error code
+ * @ret:	the return value of an ENCLS leaf function call
+ *
+ * Return:
+ * - true:	ENCLS leaf faulted.
+ * - false:	Otherwise.
+ */
+static inline bool encls_faulted(int ret)
+{
+	return ret & ENCLS_FAULT_FLAG;
+}
+
 /**
  * encls_failed() - Check if an ENCLS function failed
  * @ret:	the return value of an ENCLS function call
@@ -50,7 +63,7 @@
  */
 static inline bool encls_failed(int ret)
 {
-	if (ret & ENCLS_FAULT_FLAG)
+	if (encls_faulted(ret))
 		return ENCLS_TRAPNR(ret) != X86_TRAP_PF;
 
 	return !!ret;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 354e309..11e3f96 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -568,7 +568,7 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 		}
 	}
 
-	if (ret & ENCLS_FAULT_FLAG) {
+	if (encls_faulted(ret)) {
 		if (encls_failed(ret))
 			ENCLS_WARN(ret, "EINIT");
 

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/sgx: Move ENCLS leaf definitions to sgx.h
  2021-03-19  7:23 ` [PATCH v3 09/25] x86/sgx: Move ENCLS leaf definitions to sgx.h Kai Huang
@ 2021-04-07 10:03   ` tip-bot2 for Sean Christopherson
  0 siblings, 0 replies; 134+ messages in thread
From: tip-bot2 for Sean Christopherson @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sean Christopherson, Kai Huang, Borislav Petkov, Jarkko Sakkinen,
	Dave Hansen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     9c55c78a73ce6e62a1d46ba6e4f242c23c29b812
Gitweb:        https://git.kernel.org/tip/9c55c78a73ce6e62a1d46ba6e4f242c23c29b812
Author:        Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:23:04 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 06 Apr 2021 09:43:41 +02:00

x86/sgx: Move ENCLS leaf definitions to sgx.h

Move the ENCLS leaf definitions to sgx.h so that they can be used by
KVM.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/2e6cd7c5c1ced620cfcd292c3c6c382827fde6b2.1616136308.git.kai.huang@intel.com
---
 arch/x86/include/asm/sgx.h      | 15 +++++++++++++++
 arch/x86/kernel/cpu/sgx/encls.h | 15 ---------------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 14bb5f7..34f4423 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -27,6 +27,21 @@
 /* The bitmask for the EPC section type. */
 #define SGX_CPUID_EPC_MASK	GENMASK(3, 0)
 
+enum sgx_encls_function {
+	ECREATE	= 0x00,
+	EADD	= 0x01,
+	EINIT	= 0x02,
+	EREMOVE	= 0x03,
+	EDGBRD	= 0x04,
+	EDGBWR	= 0x05,
+	EEXTEND	= 0x06,
+	ELDU	= 0x08,
+	EBLOCK	= 0x09,
+	EPA	= 0x0A,
+	EWB	= 0x0B,
+	ETRACK	= 0x0C,
+};
+
 /**
  * enum sgx_return_code - The return code type for ENCLS, ENCLU and ENCLV
  * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index 443188f..be5c496 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -11,21 +11,6 @@
 #include <asm/traps.h>
 #include "sgx.h"
 
-enum sgx_encls_function {
-	ECREATE	= 0x00,
-	EADD	= 0x01,
-	EINIT	= 0x02,
-	EREMOVE	= 0x03,
-	EDGBRD	= 0x04,
-	EDGBWR	= 0x05,
-	EEXTEND	= 0x06,
-	ELDU	= 0x08,
-	EBLOCK	= 0x09,
-	EPA	= 0x0A,
-	EWB	= 0x0B,
-	ETRACK	= 0x0C,
-};
-
 /**
  * ENCLS_FAULT_FLAG - flag signifying an ENCLS return code is a trapnr
  *

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT)
  2021-03-19  7:23 ` [PATCH v3 10/25] x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT) Kai Huang
@ 2021-04-07 10:03   ` tip-bot2 for Sean Christopherson
  0 siblings, 0 replies; 134+ messages in thread
From: tip-bot2 for Sean Christopherson @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sean Christopherson, Kai Huang, Borislav Petkov, Jarkko Sakkinen,
	Dave Hansen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     32ddda8e445df3de477db14d386fb3518042224a
Gitweb:        https://git.kernel.org/tip/32ddda8e445df3de477db14d386fb3518042224a
Author:        Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:23:05 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 06 Apr 2021 09:43:42 +02:00

x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT)

Define the ENCLS leafs that are available with SGX2, also referred to as
Enclave Dynamic Memory Management (EDMM).  The leafs will be used by KVM
to conditionally expose SGX2 capabilities to guests.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/5f0970c251ebcc6d5add132f0d750cc753b7060f.1616136308.git.kai.huang@intel.com
---
 arch/x86/include/asm/sgx.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 34f4423..3b025af 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -40,6 +40,9 @@ enum sgx_encls_function {
 	EPA	= 0x0A,
 	EWB	= 0x0B,
 	ETRACK	= 0x0C,
+	EAUG	= 0x0D,
+	EMODPR	= 0x0E,
+	EMODT	= 0x0F,
 };
 
 /**

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/sgx: Expose SGX architectural definitions to the kernel
  2021-03-19  7:23 ` [PATCH v3 08/25] x86/sgx: Expose SGX architectural definitions to the kernel Kai Huang
@ 2021-04-07 10:03   ` tip-bot2 for Sean Christopherson
  0 siblings, 0 replies; 134+ messages in thread
From: tip-bot2 for Sean Christopherson @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sean Christopherson, Kai Huang, Borislav Petkov, Jarkko Sakkinen,
	Dave Hansen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     8ca52cc38dc8fdcbdbd0c23eafb19db5e5f5c8d0
Gitweb:        https://git.kernel.org/tip/8ca52cc38dc8fdcbdbd0c23eafb19db5e5f5c8d0
Author:        Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:23:03 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 06 Apr 2021 09:43:41 +02:00

x86/sgx: Expose SGX architectural definitions to the kernel

Expose SGX architectural structures, as KVM will use many of the
architectural constants and structs to virtualize SGX.

Name the new header file as asm/sgx.h, rather than asm/sgx_arch.h, to
have single header to provide SGX facilities to share with other kernel
componments. Also update MAINTAINERS to include asm/sgx.h.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/6bf47acd91ab4d709e66ad1692c7803e4c9063a0.1616136308.git.kai.huang@intel.com
---
 MAINTAINERS                           |   1 +-
 arch/x86/include/asm/sgx.h            | 350 +++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/arch.h        | 340 +------------------------
 arch/x86/kernel/cpu/sgx/encl.c        |   2 +-
 arch/x86/kernel/cpu/sgx/sgx.h         |   2 +-
 tools/testing/selftests/sgx/defines.h |   2 +-
 6 files changed, 354 insertions(+), 343 deletions(-)
 create mode 100644 arch/x86/include/asm/sgx.h
 delete mode 100644 arch/x86/kernel/cpu/sgx/arch.h

diff --git a/MAINTAINERS b/MAINTAINERS
index aa84121..0cb606a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9274,6 +9274,7 @@ Q:	https://patchwork.kernel.org/project/intel-sgx/list/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/sgx
 F:	Documentation/x86/sgx.rst
 F:	arch/x86/entry/vdso/vsgx.S
+F:	arch/x86/include/asm/sgx.h
 F:	arch/x86/include/uapi/asm/sgx.h
 F:	arch/x86/kernel/cpu/sgx/*
 F:	tools/testing/selftests/sgx/*
diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
new file mode 100644
index 0000000..14bb5f7
--- /dev/null
+++ b/arch/x86/include/asm/sgx.h
@@ -0,0 +1,350 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/**
+ * Copyright(c) 2016-20 Intel Corporation.
+ *
+ * Intel Software Guard Extensions (SGX) support.
+ */
+#ifndef _ASM_X86_SGX_H
+#define _ASM_X86_SGX_H
+
+#include <linux/bits.h>
+#include <linux/types.h>
+
+/*
+ * This file contains both data structures defined by SGX architecture and Linux
+ * defined software data structures and functions.  The two should not be mixed
+ * together for better readibility.  The architectural definitions come first.
+ */
+
+/* The SGX specific CPUID function. */
+#define SGX_CPUID		0x12
+/* EPC enumeration. */
+#define SGX_CPUID_EPC		2
+/* An invalid EPC section, i.e. the end marker. */
+#define SGX_CPUID_EPC_INVALID	0x0
+/* A valid EPC section. */
+#define SGX_CPUID_EPC_SECTION	0x1
+/* The bitmask for the EPC section type. */
+#define SGX_CPUID_EPC_MASK	GENMASK(3, 0)
+
+/**
+ * enum sgx_return_code - The return code type for ENCLS, ENCLU and ENCLV
+ * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
+ *				been completed yet.
+ * %SGX_CHILD_PRESENT		SECS has child pages present in the EPC.
+ * %SGX_INVALID_EINITTOKEN:	EINITTOKEN is invalid and enclave signer's
+ *				public key does not match IA32_SGXLEPUBKEYHASH.
+ * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was received
+ */
+enum sgx_return_code {
+	SGX_NOT_TRACKED			= 11,
+	SGX_CHILD_PRESENT		= 13,
+	SGX_INVALID_EINITTOKEN		= 16,
+	SGX_UNMASKED_EVENT		= 128,
+};
+
+/* The modulus size for 3072-bit RSA keys. */
+#define SGX_MODULUS_SIZE 384
+
+/**
+ * enum sgx_miscselect - additional information to an SSA frame
+ * %SGX_MISC_EXINFO:	Report #PF or #GP to the SSA frame.
+ *
+ * Save State Area (SSA) is a stack inside the enclave used to store processor
+ * state when an exception or interrupt occurs. This enum defines additional
+ * information stored to an SSA frame.
+ */
+enum sgx_miscselect {
+	SGX_MISC_EXINFO		= BIT(0),
+};
+
+#define SGX_MISC_RESERVED_MASK	GENMASK_ULL(63, 1)
+
+#define SGX_SSA_GPRS_SIZE		184
+#define SGX_SSA_MISC_EXINFO_SIZE	16
+
+/**
+ * enum sgx_attributes - the attributes field in &struct sgx_secs
+ * %SGX_ATTR_INIT:		Enclave can be entered (is initialized).
+ * %SGX_ATTR_DEBUG:		Allow ENCLS(EDBGRD) and ENCLS(EDBGWR).
+ * %SGX_ATTR_MODE64BIT:		Tell that this a 64-bit enclave.
+ * %SGX_ATTR_PROVISIONKEY:      Allow to use provisioning keys for remote
+ *				attestation.
+ * %SGX_ATTR_KSS:		Allow to use key separation and sharing (KSS).
+ * %SGX_ATTR_EINITTOKENKEY:	Allow to use token signing key that is used to
+ *				sign cryptographic tokens that can be passed to
+ *				EINIT as an authorization to run an enclave.
+ */
+enum sgx_attribute {
+	SGX_ATTR_INIT		= BIT(0),
+	SGX_ATTR_DEBUG		= BIT(1),
+	SGX_ATTR_MODE64BIT	= BIT(2),
+	SGX_ATTR_PROVISIONKEY	= BIT(4),
+	SGX_ATTR_EINITTOKENKEY	= BIT(5),
+	SGX_ATTR_KSS		= BIT(7),
+};
+
+#define SGX_ATTR_RESERVED_MASK	(BIT_ULL(3) | BIT_ULL(6) | GENMASK_ULL(63, 8))
+
+/**
+ * struct sgx_secs - SGX Enclave Control Structure (SECS)
+ * @size:		size of the address space
+ * @base:		base address of the  address space
+ * @ssa_frame_size:	size of an SSA frame
+ * @miscselect:		additional information stored to an SSA frame
+ * @attributes:		attributes for enclave
+ * @xfrm:		XSave-Feature Request Mask (subset of XCR0)
+ * @mrenclave:		SHA256-hash of the enclave contents
+ * @mrsigner:		SHA256-hash of the public key used to sign the SIGSTRUCT
+ * @config_id:		a user-defined value that is used in key derivation
+ * @isv_prod_id:	a user-defined value that is used in key derivation
+ * @isv_svn:		a user-defined value that is used in key derivation
+ * @config_svn:		a user-defined value that is used in key derivation
+ *
+ * SGX Enclave Control Structure (SECS) is a special enclave page that is not
+ * visible in the address space. In fact, this structure defines the address
+ * range and other global attributes for the enclave and it is the first EPC
+ * page created for any enclave. It is moved from a temporary buffer to an EPC
+ * by the means of ENCLS[ECREATE] function.
+ */
+struct sgx_secs {
+	u64 size;
+	u64 base;
+	u32 ssa_frame_size;
+	u32 miscselect;
+	u8  reserved1[24];
+	u64 attributes;
+	u64 xfrm;
+	u32 mrenclave[8];
+	u8  reserved2[32];
+	u32 mrsigner[8];
+	u8  reserved3[32];
+	u32 config_id[16];
+	u16 isv_prod_id;
+	u16 isv_svn;
+	u16 config_svn;
+	u8  reserved4[3834];
+} __packed;
+
+/**
+ * enum sgx_tcs_flags - execution flags for TCS
+ * %SGX_TCS_DBGOPTIN:	If enabled allows single-stepping and breakpoints
+ *			inside an enclave. It is cleared by EADD but can
+ *			be set later with EDBGWR.
+ */
+enum sgx_tcs_flags {
+	SGX_TCS_DBGOPTIN	= 0x01,
+};
+
+#define SGX_TCS_RESERVED_MASK	GENMASK_ULL(63, 1)
+#define SGX_TCS_RESERVED_SIZE	4024
+
+/**
+ * struct sgx_tcs - Thread Control Structure (TCS)
+ * @state:		used to mark an entered TCS
+ * @flags:		execution flags (cleared by EADD)
+ * @ssa_offset:		SSA stack offset relative to the enclave base
+ * @ssa_index:		the current SSA frame index (cleard by EADD)
+ * @nr_ssa_frames:	the number of frame in the SSA stack
+ * @entry_offset:	entry point offset relative to the enclave base
+ * @exit_addr:		address outside the enclave to exit on an exception or
+ *			interrupt
+ * @fs_offset:		offset relative to the enclave base to become FS
+ *			segment inside the enclave
+ * @gs_offset:		offset relative to the enclave base to become GS
+ *			segment inside the enclave
+ * @fs_limit:		size to become a new FS-limit (only 32-bit enclaves)
+ * @gs_limit:		size to become a new GS-limit (only 32-bit enclaves)
+ *
+ * Thread Control Structure (TCS) is an enclave page visible in its address
+ * space that defines an entry point inside the enclave. A thread enters inside
+ * an enclave by supplying address of TCS to ENCLU(EENTER). A TCS can be entered
+ * by only one thread at a time.
+ */
+struct sgx_tcs {
+	u64 state;
+	u64 flags;
+	u64 ssa_offset;
+	u32 ssa_index;
+	u32 nr_ssa_frames;
+	u64 entry_offset;
+	u64 exit_addr;
+	u64 fs_offset;
+	u64 gs_offset;
+	u32 fs_limit;
+	u32 gs_limit;
+	u8  reserved[SGX_TCS_RESERVED_SIZE];
+} __packed;
+
+/**
+ * struct sgx_pageinfo - an enclave page descriptor
+ * @addr:	address of the enclave page
+ * @contents:	pointer to the page contents
+ * @metadata:	pointer either to a SECINFO or PCMD instance
+ * @secs:	address of the SECS page
+ */
+struct sgx_pageinfo {
+	u64 addr;
+	u64 contents;
+	u64 metadata;
+	u64 secs;
+} __packed __aligned(32);
+
+
+/**
+ * enum sgx_page_type - bits in the SECINFO flags defining the page type
+ * %SGX_PAGE_TYPE_SECS:	a SECS page
+ * %SGX_PAGE_TYPE_TCS:	a TCS page
+ * %SGX_PAGE_TYPE_REG:	a regular page
+ * %SGX_PAGE_TYPE_VA:	a VA page
+ * %SGX_PAGE_TYPE_TRIM:	a page in trimmed state
+ */
+enum sgx_page_type {
+	SGX_PAGE_TYPE_SECS,
+	SGX_PAGE_TYPE_TCS,
+	SGX_PAGE_TYPE_REG,
+	SGX_PAGE_TYPE_VA,
+	SGX_PAGE_TYPE_TRIM,
+};
+
+#define SGX_NR_PAGE_TYPES	5
+#define SGX_PAGE_TYPE_MASK	GENMASK(7, 0)
+
+/**
+ * enum sgx_secinfo_flags - the flags field in &struct sgx_secinfo
+ * %SGX_SECINFO_R:	allow read
+ * %SGX_SECINFO_W:	allow write
+ * %SGX_SECINFO_X:	allow execution
+ * %SGX_SECINFO_SECS:	a SECS page
+ * %SGX_SECINFO_TCS:	a TCS page
+ * %SGX_SECINFO_REG:	a regular page
+ * %SGX_SECINFO_VA:	a VA page
+ * %SGX_SECINFO_TRIM:	a page in trimmed state
+ */
+enum sgx_secinfo_flags {
+	SGX_SECINFO_R			= BIT(0),
+	SGX_SECINFO_W			= BIT(1),
+	SGX_SECINFO_X			= BIT(2),
+	SGX_SECINFO_SECS		= (SGX_PAGE_TYPE_SECS << 8),
+	SGX_SECINFO_TCS			= (SGX_PAGE_TYPE_TCS << 8),
+	SGX_SECINFO_REG			= (SGX_PAGE_TYPE_REG << 8),
+	SGX_SECINFO_VA			= (SGX_PAGE_TYPE_VA << 8),
+	SGX_SECINFO_TRIM		= (SGX_PAGE_TYPE_TRIM << 8),
+};
+
+#define SGX_SECINFO_PERMISSION_MASK	GENMASK_ULL(2, 0)
+#define SGX_SECINFO_PAGE_TYPE_MASK	(SGX_PAGE_TYPE_MASK << 8)
+#define SGX_SECINFO_RESERVED_MASK	~(SGX_SECINFO_PERMISSION_MASK | \
+					  SGX_SECINFO_PAGE_TYPE_MASK)
+
+/**
+ * struct sgx_secinfo - describes attributes of an EPC page
+ * @flags:	permissions and type
+ *
+ * Used together with ENCLS leaves that add or modify an EPC page to an
+ * enclave to define page permissions and type.
+ */
+struct sgx_secinfo {
+	u64 flags;
+	u8  reserved[56];
+} __packed __aligned(64);
+
+#define SGX_PCMD_RESERVED_SIZE 40
+
+/**
+ * struct sgx_pcmd - Paging Crypto Metadata (PCMD)
+ * @enclave_id:	enclave identifier
+ * @mac:	MAC over PCMD, page contents and isvsvn
+ *
+ * PCMD is stored for every swapped page to the regular memory. When ELDU loads
+ * the page back it recalculates the MAC by using a isvsvn number stored in a
+ * VA page. Together these two structures bring integrity and rollback
+ * protection.
+ */
+struct sgx_pcmd {
+	struct sgx_secinfo secinfo;
+	u64 enclave_id;
+	u8  reserved[SGX_PCMD_RESERVED_SIZE];
+	u8  mac[16];
+} __packed __aligned(128);
+
+#define SGX_SIGSTRUCT_RESERVED1_SIZE 84
+#define SGX_SIGSTRUCT_RESERVED2_SIZE 20
+#define SGX_SIGSTRUCT_RESERVED3_SIZE 32
+#define SGX_SIGSTRUCT_RESERVED4_SIZE 12
+
+/**
+ * struct sgx_sigstruct_header -  defines author of the enclave
+ * @header1:		constant byte string
+ * @vendor:		must be either 0x0000 or 0x8086
+ * @date:		YYYYMMDD in BCD
+ * @header2:		costant byte string
+ * @swdefined:		software defined value
+ */
+struct sgx_sigstruct_header {
+	u64 header1[2];
+	u32 vendor;
+	u32 date;
+	u64 header2[2];
+	u32 swdefined;
+	u8  reserved1[84];
+} __packed;
+
+/**
+ * struct sgx_sigstruct_body - defines contents of the enclave
+ * @miscselect:		additional information stored to an SSA frame
+ * @misc_mask:		required miscselect in SECS
+ * @attributes:		attributes for enclave
+ * @xfrm:		XSave-Feature Request Mask (subset of XCR0)
+ * @attributes_mask:	required attributes in SECS
+ * @xfrm_mask:		required XFRM in SECS
+ * @mrenclave:		SHA256-hash of the enclave contents
+ * @isvprodid:		a user-defined value that is used in key derivation
+ * @isvsvn:		a user-defined value that is used in key derivation
+ */
+struct sgx_sigstruct_body {
+	u32 miscselect;
+	u32 misc_mask;
+	u8  reserved2[20];
+	u64 attributes;
+	u64 xfrm;
+	u64 attributes_mask;
+	u64 xfrm_mask;
+	u8  mrenclave[32];
+	u8  reserved3[32];
+	u16 isvprodid;
+	u16 isvsvn;
+} __packed;
+
+/**
+ * struct sgx_sigstruct - an enclave signature
+ * @header:		defines author of the enclave
+ * @modulus:		the modulus of the public key
+ * @exponent:		the exponent of the public key
+ * @signature:		the signature calculated over the fields except modulus,
+ * @body:		defines contents of the enclave
+ * @q1:			a value used in RSA signature verification
+ * @q2:			a value used in RSA signature verification
+ *
+ * Header and body are the parts that are actual signed. The remaining fields
+ * define the signature of the enclave.
+ */
+struct sgx_sigstruct {
+	struct sgx_sigstruct_header header;
+	u8  modulus[SGX_MODULUS_SIZE];
+	u32 exponent;
+	u8  signature[SGX_MODULUS_SIZE];
+	struct sgx_sigstruct_body body;
+	u8  reserved4[12];
+	u8  q1[SGX_MODULUS_SIZE];
+	u8  q2[SGX_MODULUS_SIZE];
+} __packed;
+
+#define SGX_LAUNCH_TOKEN_SIZE 304
+
+/*
+ * Do not put any hardware-defined SGX structure representations below this
+ * comment!
+ */
+
+#endif /* _ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/arch.h b/arch/x86/kernel/cpu/sgx/arch.h
deleted file mode 100644
index abf99bb..0000000
--- a/arch/x86/kernel/cpu/sgx/arch.h
+++ /dev/null
@@ -1,340 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/**
- * Copyright(c) 2016-20 Intel Corporation.
- *
- * Contains data structures defined by the SGX architecture.  Data structures
- * defined by the Linux software stack should not be placed here.
- */
-#ifndef _ASM_X86_SGX_ARCH_H
-#define _ASM_X86_SGX_ARCH_H
-
-#include <linux/bits.h>
-#include <linux/types.h>
-
-/* The SGX specific CPUID function. */
-#define SGX_CPUID		0x12
-/* EPC enumeration. */
-#define SGX_CPUID_EPC		2
-/* An invalid EPC section, i.e. the end marker. */
-#define SGX_CPUID_EPC_INVALID	0x0
-/* A valid EPC section. */
-#define SGX_CPUID_EPC_SECTION	0x1
-/* The bitmask for the EPC section type. */
-#define SGX_CPUID_EPC_MASK	GENMASK(3, 0)
-
-/**
- * enum sgx_return_code - The return code type for ENCLS, ENCLU and ENCLV
- * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
- *				been completed yet.
- * %SGX_CHILD_PRESENT		SECS has child pages present in the EPC.
- * %SGX_INVALID_EINITTOKEN:	EINITTOKEN is invalid and enclave signer's
- *				public key does not match IA32_SGXLEPUBKEYHASH.
- * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was received
- */
-enum sgx_return_code {
-	SGX_NOT_TRACKED			= 11,
-	SGX_CHILD_PRESENT		= 13,
-	SGX_INVALID_EINITTOKEN		= 16,
-	SGX_UNMASKED_EVENT		= 128,
-};
-
-/* The modulus size for 3072-bit RSA keys. */
-#define SGX_MODULUS_SIZE 384
-
-/**
- * enum sgx_miscselect - additional information to an SSA frame
- * %SGX_MISC_EXINFO:	Report #PF or #GP to the SSA frame.
- *
- * Save State Area (SSA) is a stack inside the enclave used to store processor
- * state when an exception or interrupt occurs. This enum defines additional
- * information stored to an SSA frame.
- */
-enum sgx_miscselect {
-	SGX_MISC_EXINFO		= BIT(0),
-};
-
-#define SGX_MISC_RESERVED_MASK	GENMASK_ULL(63, 1)
-
-#define SGX_SSA_GPRS_SIZE		184
-#define SGX_SSA_MISC_EXINFO_SIZE	16
-
-/**
- * enum sgx_attributes - the attributes field in &struct sgx_secs
- * %SGX_ATTR_INIT:		Enclave can be entered (is initialized).
- * %SGX_ATTR_DEBUG:		Allow ENCLS(EDBGRD) and ENCLS(EDBGWR).
- * %SGX_ATTR_MODE64BIT:		Tell that this a 64-bit enclave.
- * %SGX_ATTR_PROVISIONKEY:      Allow to use provisioning keys for remote
- *				attestation.
- * %SGX_ATTR_KSS:		Allow to use key separation and sharing (KSS).
- * %SGX_ATTR_EINITTOKENKEY:	Allow to use token signing key that is used to
- *				sign cryptographic tokens that can be passed to
- *				EINIT as an authorization to run an enclave.
- */
-enum sgx_attribute {
-	SGX_ATTR_INIT		= BIT(0),
-	SGX_ATTR_DEBUG		= BIT(1),
-	SGX_ATTR_MODE64BIT	= BIT(2),
-	SGX_ATTR_PROVISIONKEY	= BIT(4),
-	SGX_ATTR_EINITTOKENKEY	= BIT(5),
-	SGX_ATTR_KSS		= BIT(7),
-};
-
-#define SGX_ATTR_RESERVED_MASK	(BIT_ULL(3) | BIT_ULL(6) | GENMASK_ULL(63, 8))
-
-/**
- * struct sgx_secs - SGX Enclave Control Structure (SECS)
- * @size:		size of the address space
- * @base:		base address of the  address space
- * @ssa_frame_size:	size of an SSA frame
- * @miscselect:		additional information stored to an SSA frame
- * @attributes:		attributes for enclave
- * @xfrm:		XSave-Feature Request Mask (subset of XCR0)
- * @mrenclave:		SHA256-hash of the enclave contents
- * @mrsigner:		SHA256-hash of the public key used to sign the SIGSTRUCT
- * @config_id:		a user-defined value that is used in key derivation
- * @isv_prod_id:	a user-defined value that is used in key derivation
- * @isv_svn:		a user-defined value that is used in key derivation
- * @config_svn:		a user-defined value that is used in key derivation
- *
- * SGX Enclave Control Structure (SECS) is a special enclave page that is not
- * visible in the address space. In fact, this structure defines the address
- * range and other global attributes for the enclave and it is the first EPC
- * page created for any enclave. It is moved from a temporary buffer to an EPC
- * by the means of ENCLS[ECREATE] function.
- */
-struct sgx_secs {
-	u64 size;
-	u64 base;
-	u32 ssa_frame_size;
-	u32 miscselect;
-	u8  reserved1[24];
-	u64 attributes;
-	u64 xfrm;
-	u32 mrenclave[8];
-	u8  reserved2[32];
-	u32 mrsigner[8];
-	u8  reserved3[32];
-	u32 config_id[16];
-	u16 isv_prod_id;
-	u16 isv_svn;
-	u16 config_svn;
-	u8  reserved4[3834];
-} __packed;
-
-/**
- * enum sgx_tcs_flags - execution flags for TCS
- * %SGX_TCS_DBGOPTIN:	If enabled allows single-stepping and breakpoints
- *			inside an enclave. It is cleared by EADD but can
- *			be set later with EDBGWR.
- */
-enum sgx_tcs_flags {
-	SGX_TCS_DBGOPTIN	= 0x01,
-};
-
-#define SGX_TCS_RESERVED_MASK	GENMASK_ULL(63, 1)
-#define SGX_TCS_RESERVED_SIZE	4024
-
-/**
- * struct sgx_tcs - Thread Control Structure (TCS)
- * @state:		used to mark an entered TCS
- * @flags:		execution flags (cleared by EADD)
- * @ssa_offset:		SSA stack offset relative to the enclave base
- * @ssa_index:		the current SSA frame index (cleard by EADD)
- * @nr_ssa_frames:	the number of frame in the SSA stack
- * @entry_offset:	entry point offset relative to the enclave base
- * @exit_addr:		address outside the enclave to exit on an exception or
- *			interrupt
- * @fs_offset:		offset relative to the enclave base to become FS
- *			segment inside the enclave
- * @gs_offset:		offset relative to the enclave base to become GS
- *			segment inside the enclave
- * @fs_limit:		size to become a new FS-limit (only 32-bit enclaves)
- * @gs_limit:		size to become a new GS-limit (only 32-bit enclaves)
- *
- * Thread Control Structure (TCS) is an enclave page visible in its address
- * space that defines an entry point inside the enclave. A thread enters inside
- * an enclave by supplying address of TCS to ENCLU(EENTER). A TCS can be entered
- * by only one thread at a time.
- */
-struct sgx_tcs {
-	u64 state;
-	u64 flags;
-	u64 ssa_offset;
-	u32 ssa_index;
-	u32 nr_ssa_frames;
-	u64 entry_offset;
-	u64 exit_addr;
-	u64 fs_offset;
-	u64 gs_offset;
-	u32 fs_limit;
-	u32 gs_limit;
-	u8  reserved[SGX_TCS_RESERVED_SIZE];
-} __packed;
-
-/**
- * struct sgx_pageinfo - an enclave page descriptor
- * @addr:	address of the enclave page
- * @contents:	pointer to the page contents
- * @metadata:	pointer either to a SECINFO or PCMD instance
- * @secs:	address of the SECS page
- */
-struct sgx_pageinfo {
-	u64 addr;
-	u64 contents;
-	u64 metadata;
-	u64 secs;
-} __packed __aligned(32);
-
-
-/**
- * enum sgx_page_type - bits in the SECINFO flags defining the page type
- * %SGX_PAGE_TYPE_SECS:	a SECS page
- * %SGX_PAGE_TYPE_TCS:	a TCS page
- * %SGX_PAGE_TYPE_REG:	a regular page
- * %SGX_PAGE_TYPE_VA:	a VA page
- * %SGX_PAGE_TYPE_TRIM:	a page in trimmed state
- */
-enum sgx_page_type {
-	SGX_PAGE_TYPE_SECS,
-	SGX_PAGE_TYPE_TCS,
-	SGX_PAGE_TYPE_REG,
-	SGX_PAGE_TYPE_VA,
-	SGX_PAGE_TYPE_TRIM,
-};
-
-#define SGX_NR_PAGE_TYPES	5
-#define SGX_PAGE_TYPE_MASK	GENMASK(7, 0)
-
-/**
- * enum sgx_secinfo_flags - the flags field in &struct sgx_secinfo
- * %SGX_SECINFO_R:	allow read
- * %SGX_SECINFO_W:	allow write
- * %SGX_SECINFO_X:	allow execution
- * %SGX_SECINFO_SECS:	a SECS page
- * %SGX_SECINFO_TCS:	a TCS page
- * %SGX_SECINFO_REG:	a regular page
- * %SGX_SECINFO_VA:	a VA page
- * %SGX_SECINFO_TRIM:	a page in trimmed state
- */
-enum sgx_secinfo_flags {
-	SGX_SECINFO_R			= BIT(0),
-	SGX_SECINFO_W			= BIT(1),
-	SGX_SECINFO_X			= BIT(2),
-	SGX_SECINFO_SECS		= (SGX_PAGE_TYPE_SECS << 8),
-	SGX_SECINFO_TCS			= (SGX_PAGE_TYPE_TCS << 8),
-	SGX_SECINFO_REG			= (SGX_PAGE_TYPE_REG << 8),
-	SGX_SECINFO_VA			= (SGX_PAGE_TYPE_VA << 8),
-	SGX_SECINFO_TRIM		= (SGX_PAGE_TYPE_TRIM << 8),
-};
-
-#define SGX_SECINFO_PERMISSION_MASK	GENMASK_ULL(2, 0)
-#define SGX_SECINFO_PAGE_TYPE_MASK	(SGX_PAGE_TYPE_MASK << 8)
-#define SGX_SECINFO_RESERVED_MASK	~(SGX_SECINFO_PERMISSION_MASK | \
-					  SGX_SECINFO_PAGE_TYPE_MASK)
-
-/**
- * struct sgx_secinfo - describes attributes of an EPC page
- * @flags:	permissions and type
- *
- * Used together with ENCLS leaves that add or modify an EPC page to an
- * enclave to define page permissions and type.
- */
-struct sgx_secinfo {
-	u64 flags;
-	u8  reserved[56];
-} __packed __aligned(64);
-
-#define SGX_PCMD_RESERVED_SIZE 40
-
-/**
- * struct sgx_pcmd - Paging Crypto Metadata (PCMD)
- * @enclave_id:	enclave identifier
- * @mac:	MAC over PCMD, page contents and isvsvn
- *
- * PCMD is stored for every swapped page to the regular memory. When ELDU loads
- * the page back it recalculates the MAC by using a isvsvn number stored in a
- * VA page. Together these two structures bring integrity and rollback
- * protection.
- */
-struct sgx_pcmd {
-	struct sgx_secinfo secinfo;
-	u64 enclave_id;
-	u8  reserved[SGX_PCMD_RESERVED_SIZE];
-	u8  mac[16];
-} __packed __aligned(128);
-
-#define SGX_SIGSTRUCT_RESERVED1_SIZE 84
-#define SGX_SIGSTRUCT_RESERVED2_SIZE 20
-#define SGX_SIGSTRUCT_RESERVED3_SIZE 32
-#define SGX_SIGSTRUCT_RESERVED4_SIZE 12
-
-/**
- * struct sgx_sigstruct_header -  defines author of the enclave
- * @header1:		constant byte string
- * @vendor:		must be either 0x0000 or 0x8086
- * @date:		YYYYMMDD in BCD
- * @header2:		costant byte string
- * @swdefined:		software defined value
- */
-struct sgx_sigstruct_header {
-	u64 header1[2];
-	u32 vendor;
-	u32 date;
-	u64 header2[2];
-	u32 swdefined;
-	u8  reserved1[84];
-} __packed;
-
-/**
- * struct sgx_sigstruct_body - defines contents of the enclave
- * @miscselect:		additional information stored to an SSA frame
- * @misc_mask:		required miscselect in SECS
- * @attributes:		attributes for enclave
- * @xfrm:		XSave-Feature Request Mask (subset of XCR0)
- * @attributes_mask:	required attributes in SECS
- * @xfrm_mask:		required XFRM in SECS
- * @mrenclave:		SHA256-hash of the enclave contents
- * @isvprodid:		a user-defined value that is used in key derivation
- * @isvsvn:		a user-defined value that is used in key derivation
- */
-struct sgx_sigstruct_body {
-	u32 miscselect;
-	u32 misc_mask;
-	u8  reserved2[20];
-	u64 attributes;
-	u64 xfrm;
-	u64 attributes_mask;
-	u64 xfrm_mask;
-	u8  mrenclave[32];
-	u8  reserved3[32];
-	u16 isvprodid;
-	u16 isvsvn;
-} __packed;
-
-/**
- * struct sgx_sigstruct - an enclave signature
- * @header:		defines author of the enclave
- * @modulus:		the modulus of the public key
- * @exponent:		the exponent of the public key
- * @signature:		the signature calculated over the fields except modulus,
- * @body:		defines contents of the enclave
- * @q1:			a value used in RSA signature verification
- * @q2:			a value used in RSA signature verification
- *
- * Header and body are the parts that are actual signed. The remaining fields
- * define the signature of the enclave.
- */
-struct sgx_sigstruct {
-	struct sgx_sigstruct_header header;
-	u8  modulus[SGX_MODULUS_SIZE];
-	u32 exponent;
-	u8  signature[SGX_MODULUS_SIZE];
-	struct sgx_sigstruct_body body;
-	u8  reserved4[12];
-	u8  q1[SGX_MODULUS_SIZE];
-	u8  q2[SGX_MODULUS_SIZE];
-} __packed;
-
-#define SGX_LAUNCH_TOKEN_SIZE 304
-
-#endif /* _ASM_X86_SGX_ARCH_H */
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index d25f2a2..3be2032 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -7,7 +7,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/suspend.h>
 #include <linux/sched/mm.h>
-#include "arch.h"
+#include <asm/sgx.h>
 #include "encl.h"
 #include "encls.h"
 #include "sgx.h"
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4854f39..e4cbc71 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -8,7 +8,7 @@
 #include <linux/rwsem.h>
 #include <linux/types.h>
 #include <asm/asm.h>
-#include "arch.h"
+#include <asm/sgx.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt) "sgx: " fmt
diff --git a/tools/testing/selftests/sgx/defines.h b/tools/testing/selftests/sgx/defines.h
index 592c1cc..0bd7342 100644
--- a/tools/testing/selftests/sgx/defines.h
+++ b/tools/testing/selftests/sgx/defines.h
@@ -14,7 +14,7 @@
 #define __aligned(x) __attribute__((__aligned__(x)))
 #define __packed __attribute__((packed))
 
-#include "../../../../arch/x86/kernel/cpu/sgx/arch.h"
+#include "../../../../arch/x86/include/asm/sgx.h"
 #include "../../../../arch/x86/include/asm/enclu.h"
 #include "../../../../arch/x86/include/uapi/asm/sgx.h"
 

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled
  2021-03-19  7:23 ` [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled Kai Huang
  2021-04-02  9:48   ` Borislav Petkov
@ 2021-04-07 10:03   ` tip-bot2 for Kai Huang
  1 sibling, 0 replies; 134+ messages in thread
From: tip-bot2 for Kai Huang @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kai Huang, Borislav Petkov, Sean Christopherson, Jarkko Sakkinen,
	Dave Hansen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     faa7d3e6f3b983a28bf0f88f82dcb1c162e61105
Gitweb:        https://git.kernel.org/tip/faa7d3e6f3b983a28bf0f88f82dcb1c162e61105
Author:        Kai Huang <kai.huang@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:23:02 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 06 Apr 2021 09:43:41 +02:00

x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled

Modify sgx_init() to always try to initialize the virtual EPC driver,
even if the SGX driver is disabled.  The SGX driver might be disabled
if SGX Launch Control is in locked mode, or not supported in the
hardware at all.  This allows (non-Linux) guests that support non-LC
configurations to use SGX.

 [ bp: De-silli-fy the test. ]

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/d35d17a02bbf8feef83a536cec8b43746d4ea557.1616136308.git.kai.huang@intel.com
---
 arch/x86/kernel/cpu/sgx/main.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index b227629..1c8a228 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -743,8 +743,17 @@ static int __init sgx_init(void)
 		goto err_page_cache;
 	}
 
+	/*
+	 * Always try to initialize the native *and* KVM drivers.
+	 * The KVM driver is less picky than the native one and
+	 * can function if the native one is not supported on the
+	 * current system or fails to initialize.
+	 *
+	 * Error out only if both fail to initialize.
+	 */
 	ret = sgx_drv_init();
-	if (ret)
+
+	if (sgx_vepc_init() && ret)
 		goto err_kthread;
 
 	return 0;

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/cpu/intel: Allow SGX virtualization without Launch Control support
  2021-03-19  7:22 ` [PATCH v3 06/25] x86/cpu/intel: Allow SGX virtualization without Launch Control support Kai Huang
@ 2021-04-07 10:03   ` tip-bot2 for Sean Christopherson
  0 siblings, 0 replies; 134+ messages in thread
From: tip-bot2 for Sean Christopherson @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sean Christopherson, Kai Huang, Borislav Petkov, Jarkko Sakkinen,
	Dave Hansen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     332bfc7becf479de8a55864cc5ed0024baea28aa
Gitweb:        https://git.kernel.org/tip/332bfc7becf479de8a55864cc5ed0024baea28aa
Author:        Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:22:58 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 06 Apr 2021 09:43:41 +02:00

x86/cpu/intel: Allow SGX virtualization without Launch Control support

The kernel will currently disable all SGX support if the hardware does
not support launch control.  Make it more permissive to allow SGX
virtualization on systems without Launch Control support.  This will
allow KVM to expose SGX to guests that have less-strict requirements on
the availability of flexible launch control.

Improve error message to distinguish between three cases.  There are two
cases where SGX support is completely disabled:
1) SGX has been disabled completely by the BIOS
2) SGX LC is locked by the BIOS.  Bare-metal support is disabled because
   of LC unavailability.  SGX virtualization is unavailable (because of
   Kconfig).
One where it is partially available:
3) SGX LC is locked by the BIOS.  Bare-metal support is disabled because
   of LC unavailability.  SGX virtualization is supported.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/b3329777076509b3b601550da288c8f3c406a865.1616136308.git.kai.huang@intel.com
---
 arch/x86/kernel/cpu/feat_ctl.c | 59 ++++++++++++++++++++++++---------
 1 file changed, 44 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index 27533a6..da696eb 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -104,8 +104,9 @@ early_param("nosgx", nosgx);
 
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 {
+	bool enable_sgx_kvm = false, enable_sgx_driver = false;
 	bool tboot = tboot_enabled();
-	bool enable_sgx;
+	bool enable_vmx;
 	u64 msr;
 
 	if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) {
@@ -114,13 +115,19 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 		return;
 	}
 
-	/*
-	 * Enable SGX if and only if the kernel supports SGX and Launch Control
-	 * is supported, i.e. disable SGX if the LE hash MSRs can't be written.
-	 */
-	enable_sgx = cpu_has(c, X86_FEATURE_SGX) &&
-		     cpu_has(c, X86_FEATURE_SGX_LC) &&
-		     IS_ENABLED(CONFIG_X86_SGX);
+	enable_vmx = cpu_has(c, X86_FEATURE_VMX) &&
+		     IS_ENABLED(CONFIG_KVM_INTEL);
+
+	if (cpu_has(c, X86_FEATURE_SGX) && IS_ENABLED(CONFIG_X86_SGX)) {
+		/*
+		 * Separate out SGX driver enabling from KVM.  This allows KVM
+		 * guests to use SGX even if the kernel SGX driver refuses to
+		 * use it.  This happens if flexible Launch Control is not
+		 * available.
+		 */
+		enable_sgx_driver = cpu_has(c, X86_FEATURE_SGX_LC);
+		enable_sgx_kvm = enable_vmx && IS_ENABLED(CONFIG_X86_SGX_KVM);
+	}
 
 	if (msr & FEAT_CTL_LOCKED)
 		goto update_caps;
@@ -136,15 +143,18 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 	 * i.e. KVM is enabled, to avoid unnecessarily adding an attack vector
 	 * for the kernel, e.g. using VMX to hide malicious code.
 	 */
-	if (cpu_has(c, X86_FEATURE_VMX) && IS_ENABLED(CONFIG_KVM_INTEL)) {
+	if (enable_vmx) {
 		msr |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
 
 		if (tboot)
 			msr |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX;
 	}
 
-	if (enable_sgx)
-		msr |= FEAT_CTL_SGX_ENABLED | FEAT_CTL_SGX_LC_ENABLED;
+	if (enable_sgx_kvm || enable_sgx_driver) {
+		msr |= FEAT_CTL_SGX_ENABLED;
+		if (enable_sgx_driver)
+			msr |= FEAT_CTL_SGX_LC_ENABLED;
+	}
 
 	wrmsrl(MSR_IA32_FEAT_CTL, msr);
 
@@ -167,10 +177,29 @@ update_caps:
 	}
 
 update_sgx:
-	if (!(msr & FEAT_CTL_SGX_ENABLED) ||
-	    !(msr & FEAT_CTL_SGX_LC_ENABLED) || !enable_sgx) {
-		if (enable_sgx)
-			pr_err_once("SGX disabled by BIOS\n");
+	if (!(msr & FEAT_CTL_SGX_ENABLED)) {
+		if (enable_sgx_kvm || enable_sgx_driver)
+			pr_err_once("SGX disabled by BIOS.\n");
 		clear_cpu_cap(c, X86_FEATURE_SGX);
+		return;
+	}
+
+	/*
+	 * VMX feature bit may be cleared due to being disabled in BIOS,
+	 * in which case SGX virtualization cannot be supported either.
+	 */
+	if (!cpu_has(c, X86_FEATURE_VMX) && enable_sgx_kvm) {
+		pr_err_once("SGX virtualization disabled due to lack of VMX.\n");
+		enable_sgx_kvm = 0;
+	}
+
+	if (!(msr & FEAT_CTL_SGX_LC_ENABLED) && enable_sgx_driver) {
+		if (!enable_sgx_kvm) {
+			pr_err_once("SGX Launch Control is locked. Disable SGX.\n");
+			clear_cpu_cap(c, X86_FEATURE_SGX);
+		} else {
+			pr_err_once("SGX Launch Control is locked. Support SGX virtualization only.\n");
+			clear_cpu_cap(c, X86_FEATURE_SGX_LC);
+		}
 	}
 }

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/sgx: Add SGX_CHILD_PRESENT hardware error code
  2021-03-19  7:22 ` [PATCH v3 04/25] x86/sgx: Add SGX_CHILD_PRESENT hardware error code Kai Huang
@ 2021-04-07 10:03   ` tip-bot2 for Sean Christopherson
  0 siblings, 0 replies; 134+ messages in thread
From: tip-bot2 for Sean Christopherson @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sean Christopherson, Kai Huang, Borislav Petkov, Dave Hansen,
	Jarkko Sakkinen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     231d3dbdda192e3b3c7b79f4c3b0616f6c7f31b7
Gitweb:        https://git.kernel.org/tip/231d3dbdda192e3b3c7b79f4c3b0616f6c7f31b7
Author:        Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:22:20 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Fri, 26 Mar 2021 22:51:36 +01:00

x86/sgx: Add SGX_CHILD_PRESENT hardware error code

SGX driver can accurately track how enclave pages are used.  This
enables SECS to be specifically targeted and EREMOVE'd only after all
child pages have been EREMOVE'd.  This ensures that SGX driver will
never encounter SGX_CHILD_PRESENT in normal operation.

Virtual EPC is different.  The host does not track how EPC pages are
used by the guest, so it cannot guarantee EREMOVE success.  It might,
for instance, encounter a SECS with a non-zero child count.

Add a definition of SGX_CHILD_PRESENT.  It will be used exclusively by
the SGX virtualization driver to handle recoverable EREMOVE errors when
saniziting EPC pages after they are freed.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/050b198e882afde7e6eba8e6a0d4da39161dbb5a.1616136308.git.kai.huang@intel.com
---
 arch/x86/kernel/cpu/sgx/arch.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/arch.h b/arch/x86/kernel/cpu/sgx/arch.h
index dd7602c..abf99bb 100644
--- a/arch/x86/kernel/cpu/sgx/arch.h
+++ b/arch/x86/kernel/cpu/sgx/arch.h
@@ -26,12 +26,14 @@
  * enum sgx_return_code - The return code type for ENCLS, ENCLU and ENCLV
  * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
  *				been completed yet.
+ * %SGX_CHILD_PRESENT		SECS has child pages present in the EPC.
  * %SGX_INVALID_EINITTOKEN:	EINITTOKEN is invalid and enclave signer's
  *				public key does not match IA32_SGXLEPUBKEYHASH.
  * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was received
  */
 enum sgx_return_code {
 	SGX_NOT_TRACKED			= 11,
+	SGX_CHILD_PRESENT		= 13,
 	SGX_INVALID_EINITTOKEN		= 16,
 	SGX_UNMASKED_EVENT		= 128,
 };

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/sgx: Introduce virtual EPC for use by KVM guests
  2021-03-19  7:22 ` [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
                     ` (3 preceding siblings ...)
  2021-04-05  9:01   ` [PATCH v3 " Borislav Petkov
@ 2021-04-07 10:03   ` tip-bot2 for Sean Christopherson
  4 siblings, 0 replies; 134+ messages in thread
From: tip-bot2 for Sean Christopherson @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sean Christopherson, Kai Huang, Borislav Petkov, Dave Hansen,
	Jarkko Sakkinen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     540745ddbc70eabdc7dbd3fcc00fe4fb17cd59ba
Gitweb:        https://git.kernel.org/tip/540745ddbc70eabdc7dbd3fcc00fe4fb17cd59ba
Author:        Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:22:21 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 06 Apr 2021 09:43:17 +02:00

x86/sgx: Introduce virtual EPC for use by KVM guests

Add a misc device /dev/sgx_vepc to allow userspace to allocate "raw"
Enclave Page Cache (EPC) without an associated enclave. The intended
and only known use case for raw EPC allocation is to expose EPC to a
KVM guest, hence the 'vepc' moniker, virt.{c,h} files and X86_SGX_KVM
Kconfig.

The SGX driver uses the misc device /dev/sgx_enclave to support
userspace in creating an enclave. Each file descriptor returned from
opening /dev/sgx_enclave represents an enclave. Unlike the SGX driver,
KVM doesn't control how the guest uses the EPC, therefore EPC allocated
to a KVM guest is not associated with an enclave, and /dev/sgx_enclave
is not suitable for allocating EPC for a KVM guest.

Having separate device nodes for the SGX driver and KVM virtual EPC also
allows separate permission control for running host SGX enclaves and KVM
SGX guests.

To use /dev/sgx_vepc to allocate a virtual EPC instance with particular
size, the hypervisor opens /dev/sgx_vepc, and uses mmap() with the
intended size to get an address range of virtual EPC. Then it may use
the address range to create one KVM memory slot as virtual EPC for
a guest.

Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
/dev/sgx_vepc rather than in KVM. Doing so has two major advantages:

  - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
    just another memory backend for guests.

  - EPC management is wholly contained in the SGX subsystem, e.g. SGX
    does not have to export any symbols, changes to reclaim flows don't
    need to be routed through KVM, SGX's dirty laundry doesn't have to
    get aired out for the world to see, and so on and so forth.

The virtual EPC pages allocated to guests are currently not reclaimable.
Reclaiming an EPC page used by enclave requires a special reclaim
mechanism separate from normal page reclaim, and that mechanism is not
supported for virutal EPC pages. Due to the complications of handling
reclaim conflicts between guest and host, reclaiming virtual EPC pages
is significantly more complex than basic support for SGX virtualization.

 [ bp:
   - Massage commit message and comments
   - use cpu_feature_enabled()
   - vertically align struct members init
   - massage Virtual EPC clarification text
   - move Kconfig prompt to Virtualization ]

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/0c38ced8c8e5a69872db4d6a1c0dabd01e07cad7.1616136308.git.kai.huang@intel.com
---
 Documentation/x86/sgx.rst        |  16 ++-
 arch/x86/kernel/cpu/sgx/Makefile |   1 +-
 arch/x86/kernel/cpu/sgx/sgx.h    |   9 +-
 arch/x86/kernel/cpu/sgx/virt.c   | 259 ++++++++++++++++++++++++++++++-
 arch/x86/kvm/Kconfig             |  12 +-
 5 files changed, 297 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/virt.c

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index f90076e..dd0ac96 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -234,3 +234,19 @@ As a result, when this happpens, user should stop running any new
 SGX workloads, (or just any new workloads), and migrate all valuable
 workloads. Although a machine reboot can recover all EPC memory, the bug
 should be reported to Linux developers.
+
+
+Virtual EPC
+===========
+
+The implementation has also a virtual EPC driver to support SGX enclaves
+in guests. Unlike the SGX driver, an EPC page allocated by the virtual
+EPC driver doesn't have a specific enclave associated with it. This is
+because KVM doesn't track how a guest uses EPC pages.
+
+As a result, the SGX core page reclaimer doesn't support reclaiming EPC
+pages allocated to KVM guests through the virtual EPC driver. If the
+user wants to deploy SGX applications both on the host and in guests
+on the same machine, the user should reserve enough EPC (by taking out
+total virtual EPC size of all SGX VMs from the physical EPC size) for
+host SGX applications so they can run with acceptable performance.
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 91d3dc7..9c16567 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -3,3 +3,4 @@ obj-y += \
 	encl.o \
 	ioctl.o \
 	main.o
+obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4aa40c6..4854f39 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -84,4 +84,13 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 
+#ifdef CONFIG_X86_SGX_KVM
+int __init sgx_vepc_init(void);
+#else
+static inline int __init sgx_vepc_init(void)
+{
+	return -ENODEV;
+}
+#endif
+
 #endif /* _X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
new file mode 100644
index 0000000..259cc46
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -0,0 +1,259 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Device driver to expose SGX enclave memory to KVM guests.
+ *
+ * Copyright(c) 2021 Intel Corporation.
+ */
+
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/xarray.h>
+#include <asm/sgx.h>
+#include <uapi/asm/sgx.h>
+
+#include "encls.h"
+#include "sgx.h"
+
+struct sgx_vepc {
+	struct xarray page_array;
+	struct mutex lock;
+};
+
+/*
+ * Temporary SECS pages that cannot be EREMOVE'd due to having child in other
+ * virtual EPC instances, and the lock to protect it.
+ */
+static struct mutex zombie_secs_pages_lock;
+static struct list_head zombie_secs_pages;
+
+static int __sgx_vepc_fault(struct sgx_vepc *vepc,
+			    struct vm_area_struct *vma, unsigned long addr)
+{
+	struct sgx_epc_page *epc_page;
+	unsigned long index, pfn;
+	int ret;
+
+	WARN_ON(!mutex_is_locked(&vepc->lock));
+
+	/* Calculate index of EPC page in virtual EPC's page_array */
+	index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
+
+	epc_page = xa_load(&vepc->page_array, index);
+	if (epc_page)
+		return 0;
+
+	epc_page = sgx_alloc_epc_page(vepc, false);
+	if (IS_ERR(epc_page))
+		return PTR_ERR(epc_page);
+
+	ret = xa_err(xa_store(&vepc->page_array, index, epc_page, GFP_KERNEL));
+	if (ret)
+		goto err_free;
+
+	pfn = PFN_DOWN(sgx_get_epc_phys_addr(epc_page));
+
+	ret = vmf_insert_pfn(vma, addr, pfn);
+	if (ret != VM_FAULT_NOPAGE) {
+		ret = -EFAULT;
+		goto err_delete;
+	}
+
+	return 0;
+
+err_delete:
+	xa_erase(&vepc->page_array, index);
+err_free:
+	sgx_free_epc_page(epc_page);
+	return ret;
+}
+
+static vm_fault_t sgx_vepc_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_vepc *vepc = vma->vm_private_data;
+	int ret;
+
+	mutex_lock(&vepc->lock);
+	ret = __sgx_vepc_fault(vepc, vma, vmf->address);
+	mutex_unlock(&vepc->lock);
+
+	if (!ret)
+		return VM_FAULT_NOPAGE;
+
+	if (ret == -EBUSY && (vmf->flags & FAULT_FLAG_ALLOW_RETRY)) {
+		mmap_read_unlock(vma->vm_mm);
+		return VM_FAULT_RETRY;
+	}
+
+	return VM_FAULT_SIGBUS;
+}
+
+const struct vm_operations_struct sgx_vepc_vm_ops = {
+	.fault = sgx_vepc_fault,
+};
+
+static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct sgx_vepc *vepc = file->private_data;
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	vma->vm_ops = &sgx_vepc_vm_ops;
+	/* Don't copy VMA in fork() */
+	vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
+	vma->vm_private_data = vepc;
+
+	return 0;
+}
+
+static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+{
+	int ret;
+
+	/*
+	 * Take a previously guest-owned EPC page and return it to the
+	 * general EPC page pool.
+	 *
+	 * Guests can not be trusted to have left this page in a good
+	 * state, so run EREMOVE on the page unconditionally.  In the
+	 * case that a guest properly EREMOVE'd this page, a superfluous
+	 * EREMOVE is harmless.
+	 */
+	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
+	if (ret) {
+		/*
+		 * Only SGX_CHILD_PRESENT is expected, which is because of
+		 * EREMOVE'ing an SECS still with child, in which case it can
+		 * be handled by EREMOVE'ing the SECS again after all pages in
+		 * virtual EPC have been EREMOVE'd. See comments in below in
+		 * sgx_vepc_release().
+		 *
+		 * The user of virtual EPC (KVM) needs to guarantee there's no
+		 * logical processor is still running in the enclave in guest,
+		 * otherwise EREMOVE will get SGX_ENCLAVE_ACT which cannot be
+		 * handled here.
+		 */
+		WARN_ONCE(ret != SGX_CHILD_PRESENT, EREMOVE_ERROR_MESSAGE,
+			  ret, ret);
+		return ret;
+	}
+
+	sgx_free_epc_page(epc_page);
+
+	return 0;
+}
+
+static int sgx_vepc_release(struct inode *inode, struct file *file)
+{
+	struct sgx_vepc *vepc = file->private_data;
+	struct sgx_epc_page *epc_page, *tmp, *entry;
+	unsigned long index;
+
+	LIST_HEAD(secs_pages);
+
+	xa_for_each(&vepc->page_array, index, entry) {
+		/*
+		 * Remove all normal, child pages.  sgx_vepc_free_page()
+		 * will fail if EREMOVE fails, but this is OK and expected on
+		 * SECS pages.  Those can only be EREMOVE'd *after* all their
+		 * child pages. Retries below will clean them up.
+		 */
+		if (sgx_vepc_free_page(entry))
+			continue;
+
+		xa_erase(&vepc->page_array, index);
+	}
+
+	/*
+	 * Retry EREMOVE'ing pages.  This will clean up any SECS pages that
+	 * only had children in this 'epc' area.
+	 */
+	xa_for_each(&vepc->page_array, index, entry) {
+		epc_page = entry;
+		/*
+		 * An EREMOVE failure here means that the SECS page still
+		 * has children.  But, since all children in this 'sgx_vepc'
+		 * have been removed, the SECS page must have a child on
+		 * another instance.
+		 */
+		if (sgx_vepc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+
+		xa_erase(&vepc->page_array, index);
+	}
+
+	/*
+	 * SECS pages are "pinned" by child pages, and "unpinned" once all
+	 * children have been EREMOVE'd.  A child page in this instance
+	 * may have pinned an SECS page encountered in an earlier release(),
+	 * creating a zombie.  Since some children were EREMOVE'd above,
+	 * try to EREMOVE all zombies in the hopes that one was unpinned.
+	 */
+	mutex_lock(&zombie_secs_pages_lock);
+	list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) {
+		/*
+		 * Speculatively remove the page from the list of zombies,
+		 * if the page is successfully EREMOVE'd it will be added to
+		 * the list of free pages.  If EREMOVE fails, throw the page
+		 * on the local list, which will be spliced on at the end.
+		 */
+		list_del(&epc_page->list);
+
+		if (sgx_vepc_free_page(epc_page))
+			list_add_tail(&epc_page->list, &secs_pages);
+	}
+
+	if (!list_empty(&secs_pages))
+		list_splice_tail(&secs_pages, &zombie_secs_pages);
+	mutex_unlock(&zombie_secs_pages_lock);
+
+	kfree(vepc);
+
+	return 0;
+}
+
+static int sgx_vepc_open(struct inode *inode, struct file *file)
+{
+	struct sgx_vepc *vepc;
+
+	vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL);
+	if (!vepc)
+		return -ENOMEM;
+	mutex_init(&vepc->lock);
+	xa_init(&vepc->page_array);
+
+	file->private_data = vepc;
+
+	return 0;
+}
+
+static const struct file_operations sgx_vepc_fops = {
+	.owner		= THIS_MODULE,
+	.open		= sgx_vepc_open,
+	.release	= sgx_vepc_release,
+	.mmap		= sgx_vepc_mmap,
+};
+
+static struct miscdevice sgx_vepc_dev = {
+	.minor		= MISC_DYNAMIC_MINOR,
+	.name		= "sgx_vepc",
+	.nodename	= "sgx_vepc",
+	.fops		= &sgx_vepc_fops,
+};
+
+int __init sgx_vepc_init(void)
+{
+	/* SGX virtualization requires KVM to work */
+	if (!cpu_feature_enabled(X86_FEATURE_VMX))
+		return -ENODEV;
+
+	INIT_LIST_HEAD(&zombie_secs_pages);
+	mutex_init(&zombie_secs_pages_lock);
+
+	return misc_register(&sgx_vepc_dev);
+}
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index a788d51..f6b93a3 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -84,6 +84,18 @@ config KVM_INTEL
 	  To compile this as a module, choose M here: the module
 	  will be called kvm-intel.
 
+config X86_SGX_KVM
+	bool "Software Guard eXtensions (SGX) Virtualization"
+	depends on X86_SGX && KVM_INTEL
+	help
+
+	  Enables KVM guests to create SGX enclaves.
+
+	  This includes support to expose "raw" unreclaimable enclave memory to
+	  guests via a device node, e.g. /dev/sgx_vepc.
+
+	  If unsure, say N.
+
 config KVM_AMD
 	tristate "KVM for AMD processors support"
 	depends on KVM

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-25  9:30   ` [PATCH v4 " Kai Huang
  2021-03-26 19:48     ` Jarkko Sakkinen
@ 2021-04-07 10:03     ` tip-bot2 for Kai Huang
  1 sibling, 0 replies; 134+ messages in thread
From: tip-bot2 for Kai Huang @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kai Huang, Borislav Petkov, Jarkko Sakkinen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     b0c7459be0670fabe080e30906ba9fe62df5e02c
Gitweb:        https://git.kernel.org/tip/b0c7459be0670fabe080e30906ba9fe62df5e02c
Author:        Kai Huang <kai.huang@intel.com>
AuthorDate:    Thu, 25 Mar 2021 22:30:57 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Fri, 26 Mar 2021 22:51:23 +01:00

x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

EREMOVE takes a page and removes any association between that page and
an enclave. It must be run on a page before it can be added into another
enclave. Currently, EREMOVE is run as part of pages being freed into the
SGX page allocator. It is not expected to fail, as it would indicate a
use-after-free of EPC pages. Rather than add the page back to the pool
of available EPC pages, the kernel intentionally leaks the page to avoid
additional errors in the future.

However, KVM does not track how guest pages are used, which means that
SGX virtualization use of EREMOVE might fail. Specifically, it is
legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
KVM guest, because KVM/kernel doesn't track SECS pages.

To allow SGX/KVM to introduce a more permissive EREMOVE helper and
to let the SGX virtualization code use the allocator directly, break
out the EREMOVE call from the SGX page allocator. Rename the original
sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that
it is used to free an EPC page assigned to a host enclave. Replace
sgx_free_epc_page() with sgx_encl_free_epc_page() in all call sites so
there's no functional change.

At the same time, improve the error message when EREMOVE fails, and
add documentation to explain to the user what that failure means and
to suggest to the user what to do when this bug happens in the case it
happens.

 [ bp: Massage commit message, fix typos and sanitize text, simplify. ]

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20210325093057.122834-1-kai.huang@intel.com
---
 Documentation/x86/sgx.rst       | 25 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.c  | 31 ++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/sgx/encl.h  |  1 +
 arch/x86/kernel/cpu/sgx/ioctl.c |  6 +++---
 arch/x86/kernel/cpu/sgx/main.c  | 14 +++++---------
 arch/x86/kernel/cpu/sgx/sgx.h   |  4 ++++
 6 files changed, 64 insertions(+), 17 deletions(-)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index eaee136..f90076e 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -209,3 +209,28 @@ An application may be loaded into a container enclave which is specially
 configured with a library OS and run-time which permits the application to run.
 The enclave run-time and library OS work together to execute the application
 when a thread enters the enclave.
+
+Impact of Potential Kernel SGX Bugs
+===================================
+
+EPC leaks
+---------
+
+When EPC page leaks happen, a WARNING like this is shown in dmesg:
+
+"EREMOVE returned ... and an EPC page was leaked.  SGX may become unusable..."
+
+This is effectively a kernel use-after-free of an EPC page, and due
+to the way SGX works, the bug is detected at freeing. Rather than
+adding the page back to the pool of available EPC pages, the kernel
+intentionally leaks the page to avoid additional errors in the future.
+
+When this happens, the kernel will likely soon leak more EPC pages, and
+SGX will likely become unusable because the memory available to SGX is
+limited. However, while this may be fatal to SGX, the rest of the kernel
+is unlikely to be impacted and should continue to work.
+
+As a result, when this happpens, user should stop running any new
+SGX workloads, (or just any new workloads), and migrate all valuable
+workloads. Although a machine reboot can recover all EPC memory, the bug
+should be reported to Linux developers.
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7449ef3..d25f2a2 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -78,7 +78,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	ret = __sgx_encl_eldu(encl_page, epc_page, secs_page);
 	if (ret) {
-		sgx_free_epc_page(epc_page);
+		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(ret);
 	}
 
@@ -404,7 +404,7 @@ void sgx_encl_release(struct kref *ref)
 			if (sgx_unmark_page_reclaimable(entry->epc_page))
 				continue;
 
-			sgx_free_epc_page(entry->epc_page);
+			sgx_encl_free_epc_page(entry->epc_page);
 			encl->secs_child_cnt--;
 			entry->epc_page = NULL;
 		}
@@ -415,7 +415,7 @@ void sgx_encl_release(struct kref *ref)
 	xa_destroy(&encl->page_array);
 
 	if (!encl->secs_child_cnt && encl->secs.epc_page) {
-		sgx_free_epc_page(encl->secs.epc_page);
+		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 	}
 
@@ -423,7 +423,7 @@ void sgx_encl_release(struct kref *ref)
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
 		list_del(&va_page->list);
-		sgx_free_epc_page(va_page->epc_page);
+		sgx_encl_free_epc_page(va_page->epc_page);
 		kfree(va_page);
 	}
 
@@ -686,7 +686,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
 	ret = __epa(sgx_get_epc_virt_addr(epc_page));
 	if (ret) {
 		WARN_ONCE(1, "EPA returned %d (0x%x)", ret, ret);
-		sgx_free_epc_page(epc_page);
+		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(-EFAULT);
 	}
 
@@ -735,3 +735,24 @@ bool sgx_va_page_full(struct sgx_va_page *va_page)
 
 	return slot == SGX_VA_SLOT_COUNT;
 }
+
+/**
+ * sgx_encl_free_epc_page - free an EPC page assigned to an enclave
+ * @page:	EPC page to be freed
+ *
+ * Free an EPC page assigned to an enclave. It does EREMOVE for the page, and
+ * only upon success, it puts the page back to free page list.  Otherwise, it
+ * gives a WARNING to indicate page is leaked.
+ */
+void sgx_encl_free_epc_page(struct sgx_epc_page *page)
+{
+	int ret;
+
+	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+
+	ret = __eremove(sgx_get_epc_virt_addr(page));
+	if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret))
+		return;
+
+	sgx_free_epc_page(page);
+}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index d8d30cc..6e74f85 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -115,5 +115,6 @@ struct sgx_epc_page *sgx_alloc_va_page(void);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
+void sgx_encl_free_epc_page(struct sgx_epc_page *page);
 
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 2e10367..354e309 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -47,7 +47,7 @@ static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
 	encl->page_cnt--;
 
 	if (va_page) {
-		sgx_free_epc_page(va_page->epc_page);
+		sgx_encl_free_epc_page(va_page->epc_page);
 		list_del(&va_page->list);
 		kfree(va_page);
 	}
@@ -117,7 +117,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	return 0;
 
 err_out:
-	sgx_free_epc_page(encl->secs.epc_page);
+	sgx_encl_free_epc_page(encl->secs.epc_page);
 	encl->secs.epc_page = NULL;
 
 err_out_backing:
@@ -365,7 +365,7 @@ err_out_unlock:
 	mmap_read_unlock(current->mm);
 
 err_out_free:
-	sgx_free_epc_page(epc_page);
+	sgx_encl_free_epc_page(epc_page);
 	kfree(encl_page);
 
 	return ret;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 13a7599..b227629 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -294,7 +294,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 
 		sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
 
-		sgx_free_epc_page(encl->secs.epc_page);
+		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 
 		sgx_encl_put_backing(&secs_backing, true);
@@ -609,19 +609,15 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  * sgx_free_epc_page() - Free an EPC page
  * @page:	an EPC page
  *
- * Call EREMOVE for an EPC page and insert it back to the list of free pages.
+ * Put the EPC page back to the list of free pages. It's the caller's
+ * responsibility to make sure that the page is in uninitialized state. In other
+ * words, do EREMOVE, EWB or whatever operation is necessary before calling
+ * this function.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
 	struct sgx_numa_node *node = section->node;
-	int ret;
-
-	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
-
-	ret = __eremove(sgx_get_epc_virt_addr(page));
-	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
-		return;
 
 	spin_lock(&node->lock);
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 653af8c..4aa40c6 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -13,6 +13,10 @@
 #undef pr_fmt
 #define pr_fmt(fmt) "sgx: " fmt
 
+#define EREMOVE_ERROR_MESSAGE \
+	"EREMOVE returned %d (0x%x) and an EPC page was leaked. SGX may become unusable. " \
+	"Refer to Documentation/x86/sgx.rst for more information."
+
 #define SGX_MAX_EPC_SECTIONS		8
 #define SGX_EEXTEND_BLOCK_SIZE		256
 #define SGX_NR_TO_SCAN			16

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit
  2021-03-19  7:22 ` [PATCH v3 01/25] x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit Kai Huang
@ 2021-04-07 10:03   ` tip-bot2 for Kai Huang
  0 siblings, 0 replies; 134+ messages in thread
From: tip-bot2 for Kai Huang @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sean Christopherson, Kai Huang, Borislav Petkov, Dave Hansen,
	Jarkko Sakkinen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     e9a15a40e857fc6ccfbb05fec7b184e9003057df
Gitweb:        https://git.kernel.org/tip/e9a15a40e857fc6ccfbb05fec7b184e9003057df
Author:        Kai Huang <kai.huang@intel.com>
AuthorDate:    Fri, 19 Mar 2021 20:22:17 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Thu, 25 Mar 2021 17:33:11 +01:00

x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit

Move SGX_LC feature bit to CPUID dependency table to make clearing all
SGX feature bits easier. Also remove clear_sgx_caps() since it is just
a wrapper of setup_clear_cpu_cap(X86_FEATURE_SGX) now.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/5d4220fd0a39f52af024d3fa166231c1d498dd10.1616136308.git.kai.huang@intel.com
---
 arch/x86/kernel/cpu/cpuid-deps.c |  1 +
 arch/x86/kernel/cpu/feat_ctl.c   | 12 +++---------
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 42af31b..d40f8e0 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -72,6 +72,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_PER_THREAD_MBA,		X86_FEATURE_MBA       },
+	{ X86_FEATURE_SGX_LC,			X86_FEATURE_SGX	      },
 	{}
 };
 
diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index 3b1b01f..27533a6 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -93,15 +93,9 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c)
 }
 #endif /* CONFIG_X86_VMX_FEATURE_NAMES */
 
-static void clear_sgx_caps(void)
-{
-	setup_clear_cpu_cap(X86_FEATURE_SGX);
-	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
-}
-
 static int __init nosgx(char *str)
 {
-	clear_sgx_caps();
+	setup_clear_cpu_cap(X86_FEATURE_SGX);
 
 	return 0;
 }
@@ -116,7 +110,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 
 	if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) {
 		clear_cpu_cap(c, X86_FEATURE_VMX);
-		clear_sgx_caps();
+		clear_cpu_cap(c, X86_FEATURE_SGX);
 		return;
 	}
 
@@ -177,6 +171,6 @@ update_sgx:
 	    !(msr & FEAT_CTL_SGX_LC_ENABLED) || !enable_sgx) {
 		if (enable_sgx)
 			pr_err_once("SGX disabled by BIOS\n");
-		clear_sgx_caps();
+		clear_cpu_cap(c, X86_FEATURE_SGX);
 	}
 }

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [tip: x86/sgx] x86/cpufeatures: Add SGX1 and SGX2 sub-features
  2021-03-19  7:22 ` [PATCH v3 02/25] x86/cpufeatures: Add SGX1 and SGX2 sub-features Kai Huang
@ 2021-04-07 10:03   ` tip-bot2 for Sean Christopherson
  0 siblings, 0 replies; 134+ messages in thread
From: tip-bot2 for Sean Christopherson @ 2021-04-07 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sean Christopherson, Kai Huang, Borislav Petkov, Dave Hansen,
	Jarkko Sakkinen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     b8921dccf3b25798409d35155b5d127085de72c2
Gitweb:        https://git.kernel.org/tip/b8921dccf3b25798409d35155b5d127085de72c2
Author:        Sean Christopherson <seanjc@google.com>
AuthorDate:    Fri, 19 Mar 2021 20:22:18 +13:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Thu, 25 Mar 2021 17:33:11 +01:00

x86/cpufeatures: Add SGX1 and SGX2 sub-features

Add SGX1 and SGX2 feature flags, via CPUID.0x12.0x0.EAX, as scattered
features, since adding a new leaf for only two bits would be wasteful.
As part of virtualizing SGX, KVM will expose the SGX CPUID leafs to its
guest, and to do so correctly needs to query hardware and kernel support
for SGX1 and SGX2.

Suppress both SGX1 and SGX2 from /proc/cpuinfo. SGX1 basically means
SGX, and for SGX2 there is no concrete use case of using it in
/proc/cpuinfo.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/d787827dbfca6b3210ac3e432e3ac1202727e786.1616136308.git.kai.huang@intel.com
---
 arch/x86/include/asm/cpufeatures.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c   | 2 ++
 arch/x86/kernel/cpu/scattered.c    | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cc96e26..1f918f5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -290,6 +290,8 @@
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
 #define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 #define X86_FEATURE_PER_THREAD_MBA	(11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
+#define X86_FEATURE_SGX1		(11*32+ 8) /* "" Basic SGX */
+#define X86_FEATURE_SGX2		(11*32+ 9) /* "" SGX Enclave Dynamic Memory Management (EDMM) */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index d40f8e0..defda61 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -73,6 +73,8 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_PER_THREAD_MBA,		X86_FEATURE_MBA       },
 	{ X86_FEATURE_SGX_LC,			X86_FEATURE_SGX	      },
+	{ X86_FEATURE_SGX1,			X86_FEATURE_SGX       },
+	{ X86_FEATURE_SGX2,			X86_FEATURE_SGX1      },
 	{}
 };
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 972ec3b..21d1f06 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -36,6 +36,8 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_CDP_L2,		CPUID_ECX,  2, 0x00000010, 2 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  3, 0x00000010, 0 },
 	{ X86_FEATURE_PER_THREAD_MBA,	CPUID_ECX,  0, 0x00000010, 3 },
+	{ X86_FEATURE_SGX1,		CPUID_EAX,  0, 0x00000012, 0 },
+	{ X86_FEATURE_SGX2,		CPUID_EAX,  1, 0x00000012, 0 },
 	{ X86_FEATURE_HW_PSTATE,	CPUID_EDX,  7, 0x80000007, 0 },
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-15 22:59               ` Jarkko Sakkinen
@ 2021-03-15 23:50                 ` Kai Huang
  0 siblings, 0 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-15 23:50 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Sean Christopherson, kvm, x86, linux-sgx, linux-kernel, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx,
	mingo, hpa

On Tue, 16 Mar 2021 00:59:31 +0200 Jarkko Sakkinen wrote:
> On Tue, Mar 16, 2021 at 09:29:34AM +1300, Kai Huang wrote:
> > On Mon, 15 Mar 2021 15:19:32 +0200 Jarkko Sakkinen wrote:
> > > On Mon, Mar 15, 2021 at 03:18:16PM +0200, Jarkko Sakkinen wrote:
> > > > On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote:
> > > > > On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote:
> > > > > > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote:
> > > > > > > On Thu, Mar 11, 2021, Kai Huang wrote:
> > > > > > > > From: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > > > 
> > > > > > > > EREMOVE takes a page and removes any association between that page and
> > > > > > > > an enclave.  It must be run on a page before it can be added into
> > > > > > > > another enclave.  Currently, EREMOVE is run as part of pages being freed
> > > > > > > > into the SGX page allocator.  It is not expected to fail.
> > > > > > > > 
> > > > > > > > KVM does not track how guest pages are used, which means that SGX
> > > > > > > > virtualization use of EREMOVE might fail.
> > > > > > > > 
> > > > > > > > Break out the EREMOVE call from the SGX page allocator.  This will allow
> > > > > > > > the SGX virtualization code to use the allocator directly.  (SGX/KVM
> > > > > > > > will also introduce a more permissive EREMOVE helper).
> > > > > > > > 
> > > > > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
> > > > > > > > more specific that it is used to free EPC page assigned to one enclave.
> > > > > > > > Print an error message when EREMOVE fails to explicitly call out EPC
> > > > > > > > page is leaked, and requires machine reboot to get leaked pages back.
> > > > > > > > 
> > > > > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > > > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > > > > > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > > > > > ---
> > > > > > > > v2->v3:
> > > > > > > > 
> > > > > > > >  - Fixed bug during copy/paste which results in SECS page and va pages are not
> > > > > > > >    correctly freed in sgx_encl_release() (sorry for the mistake).
> > > > > > > >  - Added Jarkko's Acked-by.
> > > > > > > 
> > > > > > > That Acked-by should either be dropped or moved above Co-developed-by to make
> > > > > > > checkpatch happy.
> > > > > > > 
> > > > > > > Reviewed-by: Sean Christopherson <seanjc@google.com>
> > > > > > 
> > > > > > Oops, my bad. Yup, ack should be removed.
> > > > > > 
> > > > > > /Jarkko
> > > > > 
> > > > > Hi Jarkko,
> > > > > 
> > > > > Your reply of your concern of this patch to the cover-letter
> > > > > 
> > > > > https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/
> > > > > 
> > > > > reminds me to do more sanity check of whether removing EREMOVE in
> > > > > sgx_free_epc_page() will impact other code path or not, and I think
> > > > > sgx_encl_release() is not the only place should be changed:
> > > > > 
> > > > > - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is
> > > > > called, the VA page can be already valid -- there are other failures can
> > > > > trigger sgx_encl_shrink().
> > > > 
> > > > You right about this, good catch.
> > > > 
> > > > Shrink needs to always do EREMOVE as grow has done EPA, which changes
> > > > EPC page state.
> > > > 
> > > > > - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:"
> > > > > label, since the EPC page can be already valid when error happened, i.e. when
> > > > > EEXTEND fails.
> > > > 
> > > > Yes, correct, good work!
> > > > 
> > > > > Other places should be OK per my check, but I'd prefer to just replacing all
> > > > > sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with
> > > > > one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA
> > > > > fails, in which case EREMOVE is not required for sure.
> > > > 
> > > > I would not unless they require it.
> > > > 
> > > > > Your idea, please?
> > > > > 
> > > > > Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me,
> > > > > because virtualization has a counterpart in sgx/virt.c too.
> > > > 
> > > > It does make sense to use sgx_free_epc_page() everywhere where it's
> > > > the right thing to call and here's why.
> > > > 
> > > > If there is some unrelated regression that causes EPC page not get
> > > > uninitialized when it actually should, doing extra EREMOVE could mask
> > > > those bugs. I.e. it can postpone a failure, which can make a bug harder
> > > > to backtrace.
> > > > 
> > > 
> > > I.e. even though it is true that for correctly working code extra EREMOVE
> > > is nil functionality, it could change semantics for buggy code.
> > 
> > Thanks for feedback. Sorry I am not sure if I understand you. So if we don't
> > want to bring functionality change, we need to replace sgx_free_epc_page() in
> > all call sites with sgx_encl_free_epc_page(). To me for this patch only, it's
> > better not to bring any functional change, so I intend to replace all (I now
> > consider even leaving sgx_alloc_va_page() out is not good idea in *this*
> > patch). 
> > 
> > Or do you just want to replace sgx_free_epc_page() with
> > sgx_encl_free_epc_page() in sgx_encl_shrink() and sgx_encl_add_page(), as I
> > pointed above? In this way there will be functional change in this patch, and
> > we need to explicitly explain  why leaving others out is OK in commit message.
> > 
> > To me I prefer the former.
> 
> The original purpose of this patch was exactly to remove EREMOVE
> sgx_free_epc_page() and call it explicitly where it is required. That's
> why I introduced sgx_reset_epc_page(). So the latter was actually the goal
> of this patch at least when I did it. Now this is something completely
> different.
> 
> So, I don't consider myself author of this patch in any possible way,
> because this is not what I intended.
> 
> To move forward, for the next patch set version, you should change the
> author field as yourself, and remove all my tags, and I will review it.
> So you can work out this with former approach if you wish.
> 
> I.e. my ack/nak/etc. apply to this patch because it's not my code.

OK. Thanks.


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-15 20:29             ` Kai Huang
  2021-03-15 22:59               ` Jarkko Sakkinen
@ 2021-03-15 23:11               ` Jarkko Sakkinen
  1 sibling, 0 replies; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-15 23:11 UTC (permalink / raw)
  To: Kai Huang
  Cc: Sean Christopherson, kvm, x86, linux-sgx, linux-kernel, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx,
	mingo, hpa

On Tue, Mar 16, 2021 at 09:29:34AM +1300, Kai Huang wrote:
> On Mon, 15 Mar 2021 15:19:32 +0200 Jarkko Sakkinen wrote:
> > On Mon, Mar 15, 2021 at 03:18:16PM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote:
> > > > On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote:
> > > > > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote:
> > > > > > On Thu, Mar 11, 2021, Kai Huang wrote:
> > > > > > > From: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > > 
> > > > > > > EREMOVE takes a page and removes any association between that page and
> > > > > > > an enclave.  It must be run on a page before it can be added into
> > > > > > > another enclave.  Currently, EREMOVE is run as part of pages being freed
> > > > > > > into the SGX page allocator.  It is not expected to fail.
> > > > > > > 
> > > > > > > KVM does not track how guest pages are used, which means that SGX
> > > > > > > virtualization use of EREMOVE might fail.
> > > > > > > 
> > > > > > > Break out the EREMOVE call from the SGX page allocator.  This will allow
> > > > > > > the SGX virtualization code to use the allocator directly.  (SGX/KVM
> > > > > > > will also introduce a more permissive EREMOVE helper).
> > > > > > > 
> > > > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
> > > > > > > more specific that it is used to free EPC page assigned to one enclave.
> > > > > > > Print an error message when EREMOVE fails to explicitly call out EPC
> > > > > > > page is leaked, and requires machine reboot to get leaked pages back.
> > > > > > > 
> > > > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > > > > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > > > > ---
> > > > > > > v2->v3:
> > > > > > > 
> > > > > > >  - Fixed bug during copy/paste which results in SECS page and va pages are not
> > > > > > >    correctly freed in sgx_encl_release() (sorry for the mistake).
> > > > > > >  - Added Jarkko's Acked-by.
> > > > > > 
> > > > > > That Acked-by should either be dropped or moved above Co-developed-by to make
> > > > > > checkpatch happy.
> > > > > > 
> > > > > > Reviewed-by: Sean Christopherson <seanjc@google.com>
> > > > > 
> > > > > Oops, my bad. Yup, ack should be removed.
> > > > > 
> > > > > /Jarkko
> > > > 
> > > > Hi Jarkko,
> > > > 
> > > > Your reply of your concern of this patch to the cover-letter
> > > > 
> > > > https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/
> > > > 
> > > > reminds me to do more sanity check of whether removing EREMOVE in
> > > > sgx_free_epc_page() will impact other code path or not, and I think
> > > > sgx_encl_release() is not the only place should be changed:
> > > > 
> > > > - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is
> > > > called, the VA page can be already valid -- there are other failures can
> > > > trigger sgx_encl_shrink().
> > > 
> > > You right about this, good catch.
> > > 
> > > Shrink needs to always do EREMOVE as grow has done EPA, which changes
> > > EPC page state.
> > > 
> > > > - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:"
> > > > label, since the EPC page can be already valid when error happened, i.e. when
> > > > EEXTEND fails.
> > > 
> > > Yes, correct, good work!
> > > 
> > > > Other places should be OK per my check, but I'd prefer to just replacing all
> > > > sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with
> > > > one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA
> > > > fails, in which case EREMOVE is not required for sure.
> > > 
> > > I would not unless they require it.
> > > 
> > > > Your idea, please?
> > > > 
> > > > Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me,
> > > > because virtualization has a counterpart in sgx/virt.c too.
> > > 
> > > It does make sense to use sgx_free_epc_page() everywhere where it's
> > > the right thing to call and here's why.
> > > 
> > > If there is some unrelated regression that causes EPC page not get
> > > uninitialized when it actually should, doing extra EREMOVE could mask
> > > those bugs. I.e. it can postpone a failure, which can make a bug harder
> > > to backtrace.
> > > 
> > 
> > I.e. even though it is true that for correctly working code extra EREMOVE
> > is nil functionality, it could change semantics for buggy code.
> 
> Thanks for feedback. Sorry I am not sure if I understand you. So if we don't
> want to bring functionality change, we need to replace sgx_free_epc_page() in
> all call sites with sgx_encl_free_epc_page(). To me for this patch only, it's
> better not to bring any functional change, so I intend to replace all (I now
> consider even leaving sgx_alloc_va_page() out is not good idea in *this*
> patch). 
> 
> Or do you just want to replace sgx_free_epc_page() with
> sgx_encl_free_epc_page() in sgx_encl_shrink() and sgx_encl_add_page(), as I
> pointed above? In this way there will be functional change in this patch, and
> we need to explicitly explain  why leaving others out is OK in commit message.
> 
> To me I prefer the former.

But yes, I'm cool with your preference and I do get your argument, I just
need to review it, and do not consider it as my patch :-)

/Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-15 20:29             ` Kai Huang
@ 2021-03-15 22:59               ` Jarkko Sakkinen
  2021-03-15 23:50                 ` Kai Huang
  2021-03-15 23:11               ` Jarkko Sakkinen
  1 sibling, 1 reply; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-15 22:59 UTC (permalink / raw)
  To: Kai Huang
  Cc: Sean Christopherson, kvm, x86, linux-sgx, linux-kernel, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx,
	mingo, hpa

On Tue, Mar 16, 2021 at 09:29:34AM +1300, Kai Huang wrote:
> On Mon, 15 Mar 2021 15:19:32 +0200 Jarkko Sakkinen wrote:
> > On Mon, Mar 15, 2021 at 03:18:16PM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote:
> > > > On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote:
> > > > > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote:
> > > > > > On Thu, Mar 11, 2021, Kai Huang wrote:
> > > > > > > From: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > > 
> > > > > > > EREMOVE takes a page and removes any association between that page and
> > > > > > > an enclave.  It must be run on a page before it can be added into
> > > > > > > another enclave.  Currently, EREMOVE is run as part of pages being freed
> > > > > > > into the SGX page allocator.  It is not expected to fail.
> > > > > > > 
> > > > > > > KVM does not track how guest pages are used, which means that SGX
> > > > > > > virtualization use of EREMOVE might fail.
> > > > > > > 
> > > > > > > Break out the EREMOVE call from the SGX page allocator.  This will allow
> > > > > > > the SGX virtualization code to use the allocator directly.  (SGX/KVM
> > > > > > > will also introduce a more permissive EREMOVE helper).
> > > > > > > 
> > > > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
> > > > > > > more specific that it is used to free EPC page assigned to one enclave.
> > > > > > > Print an error message when EREMOVE fails to explicitly call out EPC
> > > > > > > page is leaked, and requires machine reboot to get leaked pages back.
> > > > > > > 
> > > > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > > > > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > > > > ---
> > > > > > > v2->v3:
> > > > > > > 
> > > > > > >  - Fixed bug during copy/paste which results in SECS page and va pages are not
> > > > > > >    correctly freed in sgx_encl_release() (sorry for the mistake).
> > > > > > >  - Added Jarkko's Acked-by.
> > > > > > 
> > > > > > That Acked-by should either be dropped or moved above Co-developed-by to make
> > > > > > checkpatch happy.
> > > > > > 
> > > > > > Reviewed-by: Sean Christopherson <seanjc@google.com>
> > > > > 
> > > > > Oops, my bad. Yup, ack should be removed.
> > > > > 
> > > > > /Jarkko
> > > > 
> > > > Hi Jarkko,
> > > > 
> > > > Your reply of your concern of this patch to the cover-letter
> > > > 
> > > > https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/
> > > > 
> > > > reminds me to do more sanity check of whether removing EREMOVE in
> > > > sgx_free_epc_page() will impact other code path or not, and I think
> > > > sgx_encl_release() is not the only place should be changed:
> > > > 
> > > > - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is
> > > > called, the VA page can be already valid -- there are other failures can
> > > > trigger sgx_encl_shrink().
> > > 
> > > You right about this, good catch.
> > > 
> > > Shrink needs to always do EREMOVE as grow has done EPA, which changes
> > > EPC page state.
> > > 
> > > > - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:"
> > > > label, since the EPC page can be already valid when error happened, i.e. when
> > > > EEXTEND fails.
> > > 
> > > Yes, correct, good work!
> > > 
> > > > Other places should be OK per my check, but I'd prefer to just replacing all
> > > > sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with
> > > > one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA
> > > > fails, in which case EREMOVE is not required for sure.
> > > 
> > > I would not unless they require it.
> > > 
> > > > Your idea, please?
> > > > 
> > > > Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me,
> > > > because virtualization has a counterpart in sgx/virt.c too.
> > > 
> > > It does make sense to use sgx_free_epc_page() everywhere where it's
> > > the right thing to call and here's why.
> > > 
> > > If there is some unrelated regression that causes EPC page not get
> > > uninitialized when it actually should, doing extra EREMOVE could mask
> > > those bugs. I.e. it can postpone a failure, which can make a bug harder
> > > to backtrace.
> > > 
> > 
> > I.e. even though it is true that for correctly working code extra EREMOVE
> > is nil functionality, it could change semantics for buggy code.
> 
> Thanks for feedback. Sorry I am not sure if I understand you. So if we don't
> want to bring functionality change, we need to replace sgx_free_epc_page() in
> all call sites with sgx_encl_free_epc_page(). To me for this patch only, it's
> better not to bring any functional change, so I intend to replace all (I now
> consider even leaving sgx_alloc_va_page() out is not good idea in *this*
> patch). 
> 
> Or do you just want to replace sgx_free_epc_page() with
> sgx_encl_free_epc_page() in sgx_encl_shrink() and sgx_encl_add_page(), as I
> pointed above? In this way there will be functional change in this patch, and
> we need to explicitly explain  why leaving others out is OK in commit message.
> 
> To me I prefer the former.

The original purpose of this patch was exactly to remove EREMOVE
sgx_free_epc_page() and call it explicitly where it is required. That's
why I introduced sgx_reset_epc_page(). So the latter was actually the goal
of this patch at least when I did it. Now this is something completely
different.

So, I don't consider myself author of this patch in any possible way,
because this is not what I intended.

To move forward, for the next patch set version, you should change the
author field as yourself, and remove all my tags, and I will review it.
So you can work out this with former approach if you wish.

I.e. my ack/nak/etc. apply to this patch because it's not my code.

/Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-15 13:19           ` Jarkko Sakkinen
@ 2021-03-15 20:29             ` Kai Huang
  2021-03-15 22:59               ` Jarkko Sakkinen
  2021-03-15 23:11               ` Jarkko Sakkinen
  0 siblings, 2 replies; 134+ messages in thread
From: Kai Huang @ 2021-03-15 20:29 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Sean Christopherson, kvm, x86, linux-sgx, linux-kernel, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx,
	mingo, hpa

On Mon, 15 Mar 2021 15:19:32 +0200 Jarkko Sakkinen wrote:
> On Mon, Mar 15, 2021 at 03:18:16PM +0200, Jarkko Sakkinen wrote:
> > On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote:
> > > On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote:
> > > > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote:
> > > > > On Thu, Mar 11, 2021, Kai Huang wrote:
> > > > > > From: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > 
> > > > > > EREMOVE takes a page and removes any association between that page and
> > > > > > an enclave.  It must be run on a page before it can be added into
> > > > > > another enclave.  Currently, EREMOVE is run as part of pages being freed
> > > > > > into the SGX page allocator.  It is not expected to fail.
> > > > > > 
> > > > > > KVM does not track how guest pages are used, which means that SGX
> > > > > > virtualization use of EREMOVE might fail.
> > > > > > 
> > > > > > Break out the EREMOVE call from the SGX page allocator.  This will allow
> > > > > > the SGX virtualization code to use the allocator directly.  (SGX/KVM
> > > > > > will also introduce a more permissive EREMOVE helper).
> > > > > > 
> > > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
> > > > > > more specific that it is used to free EPC page assigned to one enclave.
> > > > > > Print an error message when EREMOVE fails to explicitly call out EPC
> > > > > > page is leaked, and requires machine reboot to get leaked pages back.
> > > > > > 
> > > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > > > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > > > ---
> > > > > > v2->v3:
> > > > > > 
> > > > > >  - Fixed bug during copy/paste which results in SECS page and va pages are not
> > > > > >    correctly freed in sgx_encl_release() (sorry for the mistake).
> > > > > >  - Added Jarkko's Acked-by.
> > > > > 
> > > > > That Acked-by should either be dropped or moved above Co-developed-by to make
> > > > > checkpatch happy.
> > > > > 
> > > > > Reviewed-by: Sean Christopherson <seanjc@google.com>
> > > > 
> > > > Oops, my bad. Yup, ack should be removed.
> > > > 
> > > > /Jarkko
> > > 
> > > Hi Jarkko,
> > > 
> > > Your reply of your concern of this patch to the cover-letter
> > > 
> > > https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/
> > > 
> > > reminds me to do more sanity check of whether removing EREMOVE in
> > > sgx_free_epc_page() will impact other code path or not, and I think
> > > sgx_encl_release() is not the only place should be changed:
> > > 
> > > - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is
> > > called, the VA page can be already valid -- there are other failures can
> > > trigger sgx_encl_shrink().
> > 
> > You right about this, good catch.
> > 
> > Shrink needs to always do EREMOVE as grow has done EPA, which changes
> > EPC page state.
> > 
> > > - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:"
> > > label, since the EPC page can be already valid when error happened, i.e. when
> > > EEXTEND fails.
> > 
> > Yes, correct, good work!
> > 
> > > Other places should be OK per my check, but I'd prefer to just replacing all
> > > sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with
> > > one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA
> > > fails, in which case EREMOVE is not required for sure.
> > 
> > I would not unless they require it.
> > 
> > > Your idea, please?
> > > 
> > > Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me,
> > > because virtualization has a counterpart in sgx/virt.c too.
> > 
> > It does make sense to use sgx_free_epc_page() everywhere where it's
> > the right thing to call and here's why.
> > 
> > If there is some unrelated regression that causes EPC page not get
> > uninitialized when it actually should, doing extra EREMOVE could mask
> > those bugs. I.e. it can postpone a failure, which can make a bug harder
> > to backtrace.
> > 
> 
> I.e. even though it is true that for correctly working code extra EREMOVE
> is nil functionality, it could change semantics for buggy code.

Thanks for feedback. Sorry I am not sure if I understand you. So if we don't
want to bring functionality change, we need to replace sgx_free_epc_page() in
all call sites with sgx_encl_free_epc_page(). To me for this patch only, it's
better not to bring any functional change, so I intend to replace all (I now
consider even leaving sgx_alloc_va_page() out is not good idea in *this*
patch). 

Or do you just want to replace sgx_free_epc_page() with
sgx_encl_free_epc_page() in sgx_encl_shrink() and sgx_encl_add_page(), as I
pointed above? In this way there will be functional change in this patch, and
we need to explicitly explain  why leaving others out is OK in commit message.

To me I prefer the former.



^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-15 13:18         ` Jarkko Sakkinen
@ 2021-03-15 13:19           ` Jarkko Sakkinen
  2021-03-15 20:29             ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-15 13:19 UTC (permalink / raw)
  To: Kai Huang
  Cc: Sean Christopherson, kvm, x86, linux-sgx, linux-kernel, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx,
	mingo, hpa

On Mon, Mar 15, 2021 at 03:18:16PM +0200, Jarkko Sakkinen wrote:
> On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote:
> > On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote:
> > > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote:
> > > > On Thu, Mar 11, 2021, Kai Huang wrote:
> > > > > From: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > 
> > > > > EREMOVE takes a page and removes any association between that page and
> > > > > an enclave.  It must be run on a page before it can be added into
> > > > > another enclave.  Currently, EREMOVE is run as part of pages being freed
> > > > > into the SGX page allocator.  It is not expected to fail.
> > > > > 
> > > > > KVM does not track how guest pages are used, which means that SGX
> > > > > virtualization use of EREMOVE might fail.
> > > > > 
> > > > > Break out the EREMOVE call from the SGX page allocator.  This will allow
> > > > > the SGX virtualization code to use the allocator directly.  (SGX/KVM
> > > > > will also introduce a more permissive EREMOVE helper).
> > > > > 
> > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
> > > > > more specific that it is used to free EPC page assigned to one enclave.
> > > > > Print an error message when EREMOVE fails to explicitly call out EPC
> > > > > page is leaked, and requires machine reboot to get leaked pages back.
> > > > > 
> > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > > ---
> > > > > v2->v3:
> > > > > 
> > > > >  - Fixed bug during copy/paste which results in SECS page and va pages are not
> > > > >    correctly freed in sgx_encl_release() (sorry for the mistake).
> > > > >  - Added Jarkko's Acked-by.
> > > > 
> > > > That Acked-by should either be dropped or moved above Co-developed-by to make
> > > > checkpatch happy.
> > > > 
> > > > Reviewed-by: Sean Christopherson <seanjc@google.com>
> > > 
> > > Oops, my bad. Yup, ack should be removed.
> > > 
> > > /Jarkko
> > 
> > Hi Jarkko,
> > 
> > Your reply of your concern of this patch to the cover-letter
> > 
> > https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/
> > 
> > reminds me to do more sanity check of whether removing EREMOVE in
> > sgx_free_epc_page() will impact other code path or not, and I think
> > sgx_encl_release() is not the only place should be changed:
> > 
> > - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is
> > called, the VA page can be already valid -- there are other failures can
> > trigger sgx_encl_shrink().
> 
> You right about this, good catch.
> 
> Shrink needs to always do EREMOVE as grow has done EPA, which changes
> EPC page state.
> 
> > - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:"
> > label, since the EPC page can be already valid when error happened, i.e. when
> > EEXTEND fails.
> 
> Yes, correct, good work!
> 
> > Other places should be OK per my check, but I'd prefer to just replacing all
> > sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with
> > one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA
> > fails, in which case EREMOVE is not required for sure.
> 
> I would not unless they require it.
> 
> > Your idea, please?
> > 
> > Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me,
> > because virtualization has a counterpart in sgx/virt.c too.
> 
> It does make sense to use sgx_free_epc_page() everywhere where it's
> the right thing to call and here's why.
> 
> If there is some unrelated regression that causes EPC page not get
> uninitialized when it actually should, doing extra EREMOVE could mask
> those bugs. I.e. it can postpone a failure, which can make a bug harder
> to backtrace.
> 

I.e. even though it is true that for correctly working code extra EREMOVE
is nil functionality, it could change semantics for buggy code.

/Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-15  7:12       ` Kai Huang
@ 2021-03-15 13:18         ` Jarkko Sakkinen
  2021-03-15 13:19           ` Jarkko Sakkinen
  0 siblings, 1 reply; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-15 13:18 UTC (permalink / raw)
  To: Kai Huang
  Cc: Sean Christopherson, kvm, x86, linux-sgx, linux-kernel, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx,
	mingo, hpa

On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote:
> On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote:
> > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote:
> > > On Thu, Mar 11, 2021, Kai Huang wrote:
> > > > From: Jarkko Sakkinen <jarkko@kernel.org>
> > > > 
> > > > EREMOVE takes a page and removes any association between that page and
> > > > an enclave.  It must be run on a page before it can be added into
> > > > another enclave.  Currently, EREMOVE is run as part of pages being freed
> > > > into the SGX page allocator.  It is not expected to fail.
> > > > 
> > > > KVM does not track how guest pages are used, which means that SGX
> > > > virtualization use of EREMOVE might fail.
> > > > 
> > > > Break out the EREMOVE call from the SGX page allocator.  This will allow
> > > > the SGX virtualization code to use the allocator directly.  (SGX/KVM
> > > > will also introduce a more permissive EREMOVE helper).
> > > > 
> > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
> > > > more specific that it is used to free EPC page assigned to one enclave.
> > > > Print an error message when EREMOVE fails to explicitly call out EPC
> > > > page is leaked, and requires machine reboot to get leaked pages back.
> > > > 
> > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > > ---
> > > > v2->v3:
> > > > 
> > > >  - Fixed bug during copy/paste which results in SECS page and va pages are not
> > > >    correctly freed in sgx_encl_release() (sorry for the mistake).
> > > >  - Added Jarkko's Acked-by.
> > > 
> > > That Acked-by should either be dropped or moved above Co-developed-by to make
> > > checkpatch happy.
> > > 
> > > Reviewed-by: Sean Christopherson <seanjc@google.com>
> > 
> > Oops, my bad. Yup, ack should be removed.
> > 
> > /Jarkko
> 
> Hi Jarkko,
> 
> Your reply of your concern of this patch to the cover-letter
> 
> https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/
> 
> reminds me to do more sanity check of whether removing EREMOVE in
> sgx_free_epc_page() will impact other code path or not, and I think
> sgx_encl_release() is not the only place should be changed:
> 
> - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is
> called, the VA page can be already valid -- there are other failures can
> trigger sgx_encl_shrink().

You right about this, good catch.

Shrink needs to always do EREMOVE as grow has done EPA, which changes
EPC page state.

> - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:"
> label, since the EPC page can be already valid when error happened, i.e. when
> EEXTEND fails.

Yes, correct, good work!

> Other places should be OK per my check, but I'd prefer to just replacing all
> sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with
> one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA
> fails, in which case EREMOVE is not required for sure.

I would not unless they require it.

> Your idea, please?
> 
> Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me,
> because virtualization has a counterpart in sgx/virt.c too.

It does make sense to use sgx_free_epc_page() everywhere where it's
the right thing to call and here's why.

If there is some unrelated regression that causes EPC page not get
uninitialized when it actually should, doing extra EREMOVE could mask
those bugs. I.e. it can postpone a failure, which can make a bug harder
to backtrace.

Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-13 10:45     ` Jarkko Sakkinen
@ 2021-03-15  7:12       ` Kai Huang
  2021-03-15 13:18         ` Jarkko Sakkinen
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-15  7:12 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Sean Christopherson, kvm, x86, linux-sgx, linux-kernel, luto,
	dave.hansen, rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx,
	mingo, hpa

On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote:
> On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote:
> > On Thu, Mar 11, 2021, Kai Huang wrote:
> > > From: Jarkko Sakkinen <jarkko@kernel.org>
> > > 
> > > EREMOVE takes a page and removes any association between that page and
> > > an enclave.  It must be run on a page before it can be added into
> > > another enclave.  Currently, EREMOVE is run as part of pages being freed
> > > into the SGX page allocator.  It is not expected to fail.
> > > 
> > > KVM does not track how guest pages are used, which means that SGX
> > > virtualization use of EREMOVE might fail.
> > > 
> > > Break out the EREMOVE call from the SGX page allocator.  This will allow
> > > the SGX virtualization code to use the allocator directly.  (SGX/KVM
> > > will also introduce a more permissive EREMOVE helper).
> > > 
> > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
> > > more specific that it is used to free EPC page assigned to one enclave.
> > > Print an error message when EREMOVE fails to explicitly call out EPC
> > > page is leaked, and requires machine reboot to get leaked pages back.
> > > 
> > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > ---
> > > v2->v3:
> > > 
> > >  - Fixed bug during copy/paste which results in SECS page and va pages are not
> > >    correctly freed in sgx_encl_release() (sorry for the mistake).
> > >  - Added Jarkko's Acked-by.
> > 
> > That Acked-by should either be dropped or moved above Co-developed-by to make
> > checkpatch happy.
> > 
> > Reviewed-by: Sean Christopherson <seanjc@google.com>
> 
> Oops, my bad. Yup, ack should be removed.
> 
> /Jarkko

Hi Jarkko,

Your reply of your concern of this patch to the cover-letter

https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/

reminds me to do more sanity check of whether removing EREMOVE in
sgx_free_epc_page() will impact other code path or not, and I think
sgx_encl_release() is not the only place should be changed:

- sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is
called, the VA page can be already valid -- there are other failures can
trigger sgx_encl_shrink().

- sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:"
label, since the EPC page can be already valid when error happened, i.e. when
EEXTEND fails.

Other places should be OK per my check, but I'd prefer to just replacing all
sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with
one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA
fails, in which case EREMOVE is not required for sure.

Your idea, please?

Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me,
because virtualization has a counterpart in sgx/virt.c too.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-12 21:21   ` Sean Christopherson
@ 2021-03-13 10:45     ` Jarkko Sakkinen
  2021-03-15  7:12       ` Kai Huang
  0 siblings, 1 reply; 134+ messages in thread
From: Jarkko Sakkinen @ 2021-03-13 10:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, kvm, x86, linux-sgx, linux-kernel, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa

On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote:
> On Thu, Mar 11, 2021, Kai Huang wrote:
> > From: Jarkko Sakkinen <jarkko@kernel.org>
> > 
> > EREMOVE takes a page and removes any association between that page and
> > an enclave.  It must be run on a page before it can be added into
> > another enclave.  Currently, EREMOVE is run as part of pages being freed
> > into the SGX page allocator.  It is not expected to fail.
> > 
> > KVM does not track how guest pages are used, which means that SGX
> > virtualization use of EREMOVE might fail.
> > 
> > Break out the EREMOVE call from the SGX page allocator.  This will allow
> > the SGX virtualization code to use the allocator directly.  (SGX/KVM
> > will also introduce a more permissive EREMOVE helper).
> > 
> > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
> > more specific that it is used to free EPC page assigned to one enclave.
> > Print an error message when EREMOVE fails to explicitly call out EPC
> > page is leaked, and requires machine reboot to get leaked pages back.
> > 
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > Co-developed-by: Kai Huang <kai.huang@intel.com>
> > Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> > v2->v3:
> > 
> >  - Fixed bug during copy/paste which results in SECS page and va pages are not
> >    correctly freed in sgx_encl_release() (sorry for the mistake).
> >  - Added Jarkko's Acked-by.
> 
> That Acked-by should either be dropped or moved above Co-developed-by to make
> checkpatch happy.
> 
> Reviewed-by: Sean Christopherson <seanjc@google.com>

Oops, my bad. Yup, ack should be removed.

/Jarkko

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-11  2:01 ` [PATCH v3 " Kai Huang
@ 2021-03-12 21:21   ` Sean Christopherson
  2021-03-13 10:45     ` Jarkko Sakkinen
  0 siblings, 1 reply; 134+ messages in thread
From: Sean Christopherson @ 2021-03-12 21:21 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, x86, linux-sgx, linux-kernel, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa

On Thu, Mar 11, 2021, Kai Huang wrote:
> From: Jarkko Sakkinen <jarkko@kernel.org>
> 
> EREMOVE takes a page and removes any association between that page and
> an enclave.  It must be run on a page before it can be added into
> another enclave.  Currently, EREMOVE is run as part of pages being freed
> into the SGX page allocator.  It is not expected to fail.
> 
> KVM does not track how guest pages are used, which means that SGX
> virtualization use of EREMOVE might fail.
> 
> Break out the EREMOVE call from the SGX page allocator.  This will allow
> the SGX virtualization code to use the allocator directly.  (SGX/KVM
> will also introduce a more permissive EREMOVE helper).
> 
> Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
> more specific that it is used to free EPC page assigned to one enclave.
> Print an error message when EREMOVE fails to explicitly call out EPC
> page is leaked, and requires machine reboot to get leaked pages back.
> 
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> Co-developed-by: Kai Huang <kai.huang@intel.com>
> Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
> v2->v3:
> 
>  - Fixed bug during copy/paste which results in SECS page and va pages are not
>    correctly freed in sgx_encl_release() (sorry for the mistake).
>  - Added Jarkko's Acked-by.

That Acked-by should either be dropped or moved above Co-developed-by to make
checkpatch happy.

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
  2021-03-09  1:39 [PATCH v2 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() Kai Huang
@ 2021-03-11  2:01 ` Kai Huang
  2021-03-12 21:21   ` Sean Christopherson
  0 siblings, 1 reply; 134+ messages in thread
From: Kai Huang @ 2021-03-11  2:01 UTC (permalink / raw)
  To: kvm, x86, linux-sgx
  Cc: linux-kernel, seanjc, jarkko, luto, dave.hansen,
	rick.p.edgecombe, haitao.huang, pbonzini, bp, tglx, mingo, hpa,
	Kai Huang

From: Jarkko Sakkinen <jarkko@kernel.org>

EREMOVE takes a page and removes any association between that page and
an enclave.  It must be run on a page before it can be added into
another enclave.  Currently, EREMOVE is run as part of pages being freed
into the SGX page allocator.  It is not expected to fail.

KVM does not track how guest pages are used, which means that SGX
virtualization use of EREMOVE might fail.

Break out the EREMOVE call from the SGX page allocator.  This will allow
the SGX virtualization code to use the allocator directly.  (SGX/KVM
will also introduce a more permissive EREMOVE helper).

Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
more specific that it is used to free EPC page assigned to one enclave.
Print an error message when EREMOVE fails to explicitly call out EPC
page is leaked, and requires machine reboot to get leaked pages back.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v2->v3:

 - Fixed bug during copy/paste which results in SECS page and va pages are not
   correctly freed in sgx_encl_release() (sorry for the mistake).
 - Added Jarkko's Acked-by.

v1->v2:

 - Changed to hide both SGX1 and SGX2 from /proc/cpuinfo, since no concrete
   use case, per Boris.
 - Refined commit msg to explain why to hide SGX1 and SGX2 in /proc/cpuinfo.

---
 arch/x86/kernel/cpu/sgx/encl.c | 27 ++++++++++++++++++++++++---
 arch/x86/kernel/cpu/sgx/main.c | 12 ++++--------
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7449ef33f081..c0b80c0853d8 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -381,6 +381,27 @@ const struct vm_operations_struct sgx_vm_ops = {
 	.access = sgx_vma_access,
 };
 
+static void sgx_encl_free_epc_page(struct sgx_epc_page *epc_page)
+{
+	int ret;
+
+	WARN_ON_ONCE(epc_page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+
+	/*
+	 * Give a message to remind EPC page is leaked when EREMOVE fails,
+	 * and requires machine reboot to get leaked pages back. This can
+	 * be improved in future by adding stats of leaked pages, etc.
+	 */
+#define EREMOVE_ERROR_MESSAGE \
+	"EREMOVE returned %d (0x%x).  EPC page leaked.  Reboot required to retrieve leaked pages."
+	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
+	if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret))
+		return;
+#undef EREMOVE_ERROR_MESSAGE
+
+	sgx_free_epc_page(epc_page);
+}
+
 /**
  * sgx_encl_release - Destroy an enclave instance
  * @kref:	address of a kref inside &sgx_encl
@@ -404,7 +425,7 @@ void sgx_encl_release(struct kref *ref)
 			if (sgx_unmark_page_reclaimable(entry->epc_page))
 				continue;
 
-			sgx_free_epc_page(entry->epc_page);
+			sgx_encl_free_epc_page(entry->epc_page);
 			encl->secs_child_cnt--;
 			entry->epc_page = NULL;
 		}
@@ -415,7 +436,7 @@ void sgx_encl_release(struct kref *ref)
 	xa_destroy(&encl->page_array);
 
 	if (!encl->secs_child_cnt && encl->secs.epc_page) {
-		sgx_free_epc_page(encl->secs.epc_page);
+		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 	}
 
@@ -423,7 +444,7 @@ void sgx_encl_release(struct kref *ref)
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
 		list_del(&va_page->list);
-		sgx_free_epc_page(va_page->epc_page);
+		sgx_encl_free_epc_page(va_page->epc_page);
 		kfree(va_page);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8df81a3ed945..44fe91a5bfb3 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -598,18 +598,14 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  * sgx_free_epc_page() - Free an EPC page
  * @page:	an EPC page
  *
- * Call EREMOVE for an EPC page and insert it back to the list of free pages.
+ * Put the EPC page back to the list of free pages. It's the caller's
+ * responsibility to make sure that the page is in uninitialized state. In other
+ * words, do EREMOVE, EWB or whatever operation is necessary before calling
+ * this function.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
-	int ret;
-
-	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
-
-	ret = __eremove(sgx_get_epc_virt_addr(page));
-	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
-		return;
 
 	spin_lock(&section->lock);
 	list_add_tail(&page->list, &section->page_list);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

end of thread, other threads:[~2021-04-07 10:05 UTC | newest]

Thread overview: 134+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-19  7:29 [PATCH v3 00/25] KVM SGX virtualization support Kai Huang
2021-03-19  7:22 ` [PATCH v3 01/25] x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Kai Huang
2021-03-19  7:22 ` [PATCH v3 02/25] x86/cpufeatures: Add SGX1 and SGX2 sub-features Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
2021-03-19  7:22 ` [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() Kai Huang
2021-03-22 18:16   ` Borislav Petkov
2021-03-22 18:56     ` Sean Christopherson
2021-03-22 19:11       ` Paolo Bonzini
2021-03-22 20:43         ` Kai Huang
2021-03-23 16:40           ` Paolo Bonzini
2021-03-22 19:15       ` Borislav Petkov
2021-03-22 19:37         ` Sean Christopherson
2021-03-22 20:36           ` Kai Huang
2021-03-22 21:06           ` Borislav Petkov
2021-03-22 22:06             ` Kai Huang
2021-03-22 22:37               ` Borislav Petkov
2021-03-22 23:16                 ` Kai Huang
2021-03-23 15:45                   ` Sean Christopherson
2021-03-23 16:06                     ` Borislav Petkov
2021-03-23 16:21                       ` Sean Christopherson
2021-03-23 16:32                         ` Borislav Petkov
2021-03-23 16:51                           ` Sean Christopherson
2021-03-24  9:38                           ` Kai Huang
2021-03-24 10:09                             ` Paolo Bonzini
2021-03-24 10:48                               ` Kai Huang
2021-03-24 11:24                                 ` Paolo Bonzini
2021-03-24 23:23                               ` Kai Huang
2021-03-24 23:39                                 ` Paolo Bonzini
2021-03-24 23:46                                   ` Kai Huang
2021-03-25  8:42                                     ` Borislav Petkov
2021-03-25  9:38                                       ` Kai Huang
2021-03-25 16:52                                         ` Borislav Petkov
2021-03-24  9:28                         ` Jarkko Sakkinen
2021-03-23 16:38                       ` Paolo Bonzini
2021-03-23 17:02                         ` Sean Christopherson
2021-03-23 17:06                           ` Paolo Bonzini
2021-03-23 17:16                             ` Sean Christopherson
2021-03-23 18:16                             ` Borislav Petkov
2021-03-24  9:26                       ` Jarkko Sakkinen
2021-03-22 22:23             ` Kai Huang
2021-03-25  9:30   ` [PATCH v4 " Kai Huang
2021-03-26 19:48     ` Jarkko Sakkinen
2021-03-26 20:38       ` Kai Huang
2021-03-26 21:39       ` Jarkko Sakkinen
2021-04-07 10:03     ` [tip: x86/sgx] " tip-bot2 for Kai Huang
2021-03-19  7:22 ` [PATCH v3 04/25] x86/sgx: Add SGX_CHILD_PRESENT hardware error code Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
2021-03-19  7:22 ` [PATCH v3 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests Kai Huang
2021-03-25  9:36   ` Kai Huang
2021-03-26 15:03   ` Borislav Petkov
2021-03-26 15:17     ` Dave Hansen
2021-03-26 15:29       ` Borislav Petkov
2021-03-26 15:35         ` Dave Hansen
2021-03-26 17:02           ` Borislav Petkov
2021-03-31  1:10     ` Kai Huang
2021-03-31  6:44       ` Boris Petkov
2021-03-31  6:51         ` Kai Huang
2021-03-31  7:44           ` Boris Petkov
2021-03-31  8:53             ` Kai Huang
2021-03-31 12:20               ` Kai Huang
2021-04-01 18:31                 ` Borislav Petkov
2021-04-01 23:38                   ` Kai Huang
2021-04-01  9:45               ` Kai Huang
2021-04-01  9:42   ` [PATCH v4 " Kai Huang
2021-04-05  9:01   ` [PATCH v3 " Borislav Petkov
2021-04-05 21:46     ` Kai Huang
2021-04-06  8:28       ` Borislav Petkov
2021-04-06  9:04         ` Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
2021-03-19  7:22 ` [PATCH v3 06/25] x86/cpu/intel: Allow SGX virtualization without Launch Control support Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
2021-03-19  7:23 ` [PATCH v3 07/25] x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled Kai Huang
2021-04-02  9:48   ` Borislav Petkov
2021-04-02 11:08     ` Kai Huang
2021-04-02 11:22       ` Borislav Petkov
2021-04-02 11:38         ` Kai Huang
2021-04-02 15:42     ` Sean Christopherson
2021-04-02 19:08       ` Kai Huang
2021-04-02 19:19       ` Borislav Petkov
2021-04-02 19:30         ` Sean Christopherson
2021-04-02 19:46           ` Borislav Petkov
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Kai Huang
2021-03-19  7:23 ` [PATCH v3 08/25] x86/sgx: Expose SGX architectural definitions to the kernel Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
2021-03-19  7:23 ` [PATCH v3 09/25] x86/sgx: Move ENCLS leaf definitions to sgx.h Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
2021-03-19  7:23 ` [PATCH v3 10/25] x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT) Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
2021-03-19  7:23 ` [PATCH v3 11/25] x86/sgx: Add encls_faulted() helper Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
2021-03-19  7:23 ` [PATCH v3 12/25] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Kai Huang
2021-03-19  7:23 ` [PATCH v3 13/25] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM Kai Huang
2021-04-05  9:07   ` Borislav Petkov
2021-04-05 21:44     ` Kai Huang
2021-04-06  7:40       ` Borislav Petkov
2021-04-06  8:59         ` Kai Huang
2021-04-06  9:09           ` Borislav Petkov
2021-04-06  9:24             ` Kai Huang
2021-04-06  9:32               ` Borislav Petkov
2021-04-06  9:41                 ` Kai Huang
2021-04-06 17:08                   ` Borislav Petkov
2021-04-06 20:33                     ` Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
2021-03-19  7:23 ` [PATCH v3 14/25] x86/sgx: Move provisioning device creation out of SGX driver Kai Huang
2021-04-07 10:03   ` [tip: x86/sgx] " tip-bot2 for Sean Christopherson
2021-03-19  7:23 ` [PATCH v3 15/25] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX) Kai Huang
2021-03-19  7:23 ` [PATCH v3 16/25] KVM: x86: Define new #PF SGX error code bit Kai Huang
2021-03-19  7:23 ` [PATCH v3 17/25] KVM: x86: Add support for reverse CPUID lookup of scattered features Kai Huang
2021-03-19  7:23 ` [PATCH v3 18/25] KVM: x86: Add reverse-CPUID lookup support for scattered SGX features Kai Huang
2021-03-19  7:23 ` [PATCH v3 19/25] KVM: VMX: Add basic handling of VM-Exit from SGX enclave Kai Huang
2021-03-19  7:23 ` [PATCH v3 20/25] KVM: VMX: Frame in ENCLS handler for SGX virtualization Kai Huang
2021-03-19  7:23 ` [PATCH v3 21/25] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions Kai Huang
2021-03-19  7:23 ` [PATCH v3 22/25] KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs Kai Huang
2021-03-19  7:23 ` [PATCH v3 23/25] KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC) Kai Huang
2021-03-19  7:23 ` [PATCH v3 24/25] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC Kai Huang
2021-03-19  7:24 ` [PATCH v3 25/25] KVM: x86: Add capability to grant VM access to privileged SGX attribute Kai Huang
2021-03-19 14:52 ` [PATCH v3 00/25] KVM SGX virtualization support Jarkko Sakkinen
2021-03-22 10:03   ` Kai Huang
2021-03-22 10:31     ` Borislav Petkov
2021-03-26 22:46 ` Jarkko Sakkinen
2021-03-28 21:01   ` Huang, Kai
2021-03-31 23:23     ` Jarkko Sakkinen
  -- strict thread matches above, loose matches on Subject: below --
2021-03-09  1:39 [PATCH v2 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() Kai Huang
2021-03-11  2:01 ` [PATCH v3 " Kai Huang
2021-03-12 21:21   ` Sean Christopherson
2021-03-13 10:45     ` Jarkko Sakkinen
2021-03-15  7:12       ` Kai Huang
2021-03-15 13:18         ` Jarkko Sakkinen
2021-03-15 13:19           ` Jarkko Sakkinen
2021-03-15 20:29             ` Kai Huang
2021-03-15 22:59               ` Jarkko Sakkinen
2021-03-15 23:50                 ` Kai Huang
2021-03-15 23:11               ` Jarkko Sakkinen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.