linux-sgx.vger.kernel.org archive mirror
* [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23
@ 2019-10-16 18:37 Sean Christopherson
  2019-10-16 18:37 ` [PATCH for_v23 v3 01/12] x86/sgx: Pass EADD the kernel's virtual address for the source page Sean Christopherson
                   ` (12 more replies)
  0 siblings, 13 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

As requested, the merge of the EPC and EADD bug fix series.  The headliners
are two critical bug fixes for a memory leak in sgx_encl_destroy() and a
livelock due to the EPC page free count getting corrupted.

Sean Christopherson (12):
  x86/sgx: Pass EADD the kernel's virtual address for the source page
  x86/sgx: Check the validity of the source page address for EADD
  x86/sgx: Fix EEXTEND error handling
  x86/sgx: Drop mmap_sem before EEXTENDing an enclave page
  x86/sgx: Remove redundant message from WARN on non-empty mm_list
  x86/sgx: Fix a memory leak in sgx_encl_destroy()
  x86/sgx: WARN on any non-zero return from __eremove()
  x86/sgx: WARN only once if EREMOVE fails
  x86/sgx: Split second half of sgx_free_page() to a separate helper
  x86/sgx: Use the post-reclaim variant of __sgx_free_page()
  x86/sgx: Don't update free page count if EPC section allocation fails
  x86/sgx: Reinstate per EPC section free page counts

 arch/x86/kernel/cpu/sgx/encl.c    | 29 ++++++++++-------
 arch/x86/kernel/cpu/sgx/ioctl.c   | 52 +++++++++++++++--------------
 arch/x86/kernel/cpu/sgx/main.c    | 54 ++++++++++++++++++++++---------
 arch/x86/kernel/cpu/sgx/reclaim.c |  8 ++---
 arch/x86/kernel/cpu/sgx/sgx.h     | 19 ++++++++++-
 5 files changed, 106 insertions(+), 56 deletions(-)

-- 
2.22.0



* [PATCH for_v23 v3 01/12] x86/sgx: Pass EADD the kernel's virtual address for the source page
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-18  9:57   ` Jarkko Sakkinen
  2019-10-16 18:37 ` [PATCH for_v23 v3 02/12] x86/sgx: Check the validity of the source page address for EADD Sean Christopherson
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

Use the kernel's virtual address to reference the source page when
EADDing a page to the enclave.  gup() "does not guarantee that the page
exists in the user mappings", i.e. EADD can still fault and deadlock
due to mmap_sem contention if it consumes the userspace address.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/ioctl.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index bb8fcc4f91e3..2dd0eceee111 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -328,16 +328,14 @@ static int __sgx_encl_add_page(struct sgx_encl *encl,
 	if (ret < 1)
 		return ret;
 
-	__uaccess_begin();
-
 	pginfo.secs = (unsigned long)sgx_epc_addr(encl->secs.epc_page);
 	pginfo.addr = SGX_ENCL_PAGE_ADDR(encl_page);
 	pginfo.metadata = (unsigned long)secinfo;
-	pginfo.contents = (unsigned long)src;
+	pginfo.contents = (unsigned long)kmap_atomic(src_page);
 
 	ret = __eadd(&pginfo, sgx_epc_addr(epc_page));
 
-	__uaccess_end();
+	kunmap_atomic((void *)pginfo.contents);
 	put_page(src_page);
 
 	return ret ? -EFAULT : 0;
-- 
2.22.0
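
For background, the pattern the patch switches to, shown as a minimal
standalone sketch.  The helper name and the memcpy() stand-in for
__eadd() are illustrative only, assuming a context like
__sgx_encl_add_page() with <linux/mm.h> and <linux/highmem.h>
available:

	/*
	 * Sketch: pin the user page, then hand the ENCLS leaf a kernel
	 * mapping so it cannot fault on a userspace address.
	 */
	static int copy_via_kernel_mapping(unsigned long src, void *dst)
	{
		struct page *src_page;
		void *kaddr;
		long ret;

		ret = get_user_pages(src, 1, 0, &src_page, NULL);
		if (ret < 1)
			return ret ? ret : -EFAULT;

		kaddr = kmap_atomic(src_page);	/* kernel virtual address */
		memcpy(dst, kaddr, PAGE_SIZE);	/* stand-in for __eadd() */
		kunmap_atomic(kaddr);

		put_page(src_page);
		return 0;
	}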



* [PATCH for_v23 v3 02/12] x86/sgx: Check the validity of the source page address for EADD
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
  2019-10-16 18:37 ` [PATCH for_v23 v3 01/12] x86/sgx: Pass EADD the kernel's virtual address for the source page Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-16 18:37 ` [PATCH for_v23 v3 03/12] x86/sgx: Fix EEXTEND error handling Sean Christopherson
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

Add an explicit access_ok() check on EADD's source page to avoid passing
garbage to gup().

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/ioctl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 2dd0eceee111..7d1b449bf771 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -498,6 +498,9 @@ static long sgx_ioc_enclave_add_page(struct sgx_encl *encl, void __user *arg)
 	    !IS_ALIGNED(addp.src, PAGE_SIZE))
 		return -EINVAL;
 
+	if (!access_ok(addp.src, PAGE_SIZE))
+		return -EFAULT;
+
 	if (addp.addr < encl->base || addp.addr - encl->base >= encl->size)
 		return -EINVAL;
 
-- 
2.22.0
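
The check itself is cheap.  As a sketch, the validation now performed
before gup() (the helper name is hypothetical):

	/*
	 * Sketch: reject unaligned or non-userspace source pointers up
	 * front so that gup() is never handed garbage.
	 */
	static int sgx_validate_src(unsigned long src)
	{
		if (!IS_ALIGNED(src, PAGE_SIZE))
			return -EINVAL;

		if (!access_ok((void __user *)src, PAGE_SIZE))
			return -EFAULT;

		return 0;
	}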



* [PATCH for_v23 v3 03/12] x86/sgx: Fix EEXTEND error handling
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
  2019-10-16 18:37 ` [PATCH for_v23 v3 01/12] x86/sgx: Pass EADD the kernel's virtual address for the source page Sean Christopherson
  2019-10-16 18:37 ` [PATCH for_v23 v3 02/12] x86/sgx: Check the validity of the source page address for EADD Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-18 10:42   ` Jarkko Sakkinen
  2019-10-16 18:37 ` [PATCH for_v23 v3 04/12] x86/sgx: Drop mmap_sem before EEXTENDing an enclave page Sean Christopherson
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

Rework EEXTEND error handling to fix issues related to destroying the
enclave in response to EEXTEND failure.  At the time of EEXTEND, the
page is already visible in the sense that it has been added to the
radix tree, and therefore will be processed by sgx_encl_destroy().  This
means the "add" part needs to be fully completed prior to invoking
sgx_encl_destroy() in order to avoid consuming half-baked state.

Move sgx_encl_destroy() to the call site of __sgx_encl_extend() so that
it is somewhat more obvious why the add needs to complete before doing
EEXTEND.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/ioctl.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 7d1b449bf771..4169ff3c81d8 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -351,18 +351,14 @@ static int __sgx_encl_extend(struct sgx_encl *encl,
 	for_each_set_bit(i, &mrmask, 16) {
 		ret = __eextend(sgx_epc_addr(encl->secs.epc_page),
 				sgx_epc_addr(epc_page) + (i * 0x100));
-		if (ret)
-			goto err_out;
+		if (ret) {
+			if (encls_failed(ret))
+				ENCLS_WARN(ret, "EEXTEND");
+			return -EFAULT;
+		}
 	}
 
 	return 0;
-
-err_out:
-	if (encls_failed(ret))
-		ENCLS_WARN(ret, "EEXTEND");
-
-	sgx_encl_destroy(encl);
-	return -EFAULT;
 }
 
 static int sgx_encl_add_page(struct sgx_encl *encl,
@@ -421,19 +417,24 @@ static int sgx_encl_add_page(struct sgx_encl *encl,
 	if (ret)
 		goto err_out;
 
-	ret = __sgx_encl_extend(encl, epc_page, addp->mrmask);
-	if (ret)
-		goto err_out;
-
+	/*
+	 * Complete the "add" before doing the "extend" so that the "add"
+	 * isn't in a half-baked state in the extremely unlikely scenario that
+	 * the enclave will be destroyed in response to EEXTEND failure.
+	 */
 	encl_page->encl = encl;
 	encl_page->epc_page = epc_page;
 	encl->secs_child_cnt++;
 
-	sgx_mark_page_reclaimable(encl_page->epc_page);
+	ret = __sgx_encl_extend(encl, epc_page, addp->mrmask);
+	if (ret)
+		sgx_encl_destroy(encl);
+	else
+		sgx_mark_page_reclaimable(encl_page->epc_page);
 
 	mutex_unlock(&encl->lock);
 	up_read(&current->mm->mmap_sem);
-	return 0;
+	return ret;
 
 err_out:
 	radix_tree_delete(&encl_page->encl->page_tree,
-- 
2.22.0
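
Condensed from the diff above, the resulting flow in
sgx_encl_add_page() (error paths elided):

	encl_page->encl = encl;		/* fully publish the page... */
	encl_page->epc_page = epc_page;
	encl->secs_child_cnt++;

	ret = __sgx_encl_extend(encl, epc_page, addp->mrmask);
	if (ret)			/* ...so destroy consumes only */
		sgx_encl_destroy(encl);	/* fully-formed state */
	else
		sgx_mark_page_reclaimable(encl_page->epc_page);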



* [PATCH for_v23 v3 04/12] x86/sgx: Drop mmap_sem before EEXTENDing an enclave page
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
                   ` (2 preceding siblings ...)
  2019-10-16 18:37 ` [PATCH for_v23 v3 03/12] x86/sgx: Fix EEXTEND error handling Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-18 10:04   ` Jarkko Sakkinen
  2019-10-16 18:37 ` [PATCH for_v23 v3 05/12] x86/sgx: Remove redundant message from WARN on non-empty mm_list Sean Christopherson
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

Drop mmap_sem, which needs to be held for read across EADD, prior to
doing EEXTEND on the newly added page to avoid holding mmap_sem for an
extended duration.  EEXTEND doesn't access user pages, so holding
encl->lock without mmap_sem is perfectly ok.  And EEXTEND is a _slow_
operation, to the point where it operates on 256-byte chunks
instead of 4k pages to keep the latency of a single instruction
reasonable.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/ioctl.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 4169ff3c81d8..7be3fdc846d7 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -409,11 +409,15 @@ static int sgx_encl_add_page(struct sgx_encl *encl,
 	 */
 	ret = radix_tree_insert(&encl->page_tree, PFN_DOWN(encl_page->desc),
 				encl_page);
-	if (ret)
+	if (ret) {
+		up_read(&current->mm->mmap_sem);
 		goto err_out_unlock;
+	}
 
 	ret = __sgx_encl_add_page(encl, encl_page, epc_page, secinfo,
 				  addp->src);
+	up_read(&current->mm->mmap_sem);
+
 	if (ret)
 		goto err_out;
 
@@ -433,7 +437,6 @@ static int sgx_encl_add_page(struct sgx_encl *encl,
 		sgx_mark_page_reclaimable(encl_page->epc_page);
 
 	mutex_unlock(&encl->lock);
-	up_read(&current->mm->mmap_sem);
 	return ret;
 
 err_out:
@@ -443,7 +446,6 @@ static int sgx_encl_add_page(struct sgx_encl *encl,
 err_out_unlock:
 	sgx_encl_shrink(encl, va_page);
 	mutex_unlock(&encl->lock);
-	up_read(&current->mm->mmap_sem);
 
 err_out_free:
 	sgx_free_page(epc_page);
-- 
2.22.0
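
The resulting lock pattern, as a standalone sketch.  The two locks
protect independent state, so releasing the outer mmap_sem before the
inner encl->lock is legal, just unconventional (hence the remark
downthread about unlock order):

	down_read(&current->mm->mmap_sem);
	mutex_lock(&encl->lock);

	/* EADD consumes user pages and therefore needs mmap_sem. */

	up_read(&current->mm->mmap_sem);	/* drop as early as possible... */

	/* EEXTEND is slow but touches no user pages; encl->lock suffices. */

	mutex_unlock(&encl->lock);		/* ...inner lock released last */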



* [PATCH for_v23 v3 05/12] x86/sgx: Remove redundant message from WARN on non-empty mm_list
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
                   ` (3 preceding siblings ...)
  2019-10-16 18:37 ` [PATCH for_v23 v3 04/12] x86/sgx: Drop mmap_sem before EEXTENDing an enclave page Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-18 12:08   ` Jarkko Sakkinen
  2019-10-16 18:37 ` [PATCH for_v23 v3 06/12] x86/sgx: Fix a memory leak in sgx_encl_destroy() Sean Christopherson
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

Use WARN_ON_ONCE() instead of WARN_ONCE() for detecting a non-empty
mm_list during enclave release; the "mm_list non-empty" message doesn't
provide any additional or helpful information.

Suggested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 28e29deaad8f..ae81cd7cd8a8 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -514,7 +514,7 @@ void sgx_encl_release(struct kref *ref)
 	if (encl->backing)
 		fput(encl->backing);
 
-	WARN_ONCE(!list_empty(&encl->mm_list), "mm_list non-empty");
+	WARN_ON_ONCE(!list_empty(&encl->mm_list));
 
 	/* Detect EPC page leaks. */
 	WARN_ON_ONCE(encl->secs_child_cnt);
-- 
2.22.0
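
Side by side, the two forms (a sketch, not part of the diff); the
string is the only difference, and the stack dump already identifies
the WARN site:

	WARN_ONCE(!list_empty(&encl->mm_list), "mm_list non-empty");
	WARN_ON_ONCE(!list_empty(&encl->mm_list));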



* [PATCH for_v23 v3 06/12] x86/sgx: Fix a memory leak in sgx_encl_destroy()
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
                   ` (4 preceding siblings ...)
  2019-10-16 18:37 ` [PATCH for_v23 v3 05/12] x86/sgx: Remove redundant message from WARN on non-empty mm_list Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-18 12:17   ` Jarkko Sakkinen
  2019-10-16 18:37 ` [PATCH for_v23 v3 07/12] x86/sgx: WARN on any non-zero return from __eremove() Sean Christopherson
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

Delete an enclave page's entry in the radix tree regardless of whether
or not it has an associated EPC page, and free the page itself when it's
deleted from the radix tree.

Don't free/delete anything if the page is held by the reclaimer, as the
reclaimer needs the page itself and the driver needs the radix entry to
re-process the entry during sgx_encl_release().

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index ae81cd7cd8a8..6e60520a939c 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -469,14 +469,19 @@ void sgx_encl_destroy(struct sgx_encl *encl)
 	radix_tree_for_each_slot(slot, &encl->page_tree, &iter, 0) {
 		entry = *slot;
 		if (entry->epc_page) {
-			if (!sgx_free_page(entry->epc_page)) {
-				encl->secs_child_cnt--;
-				entry->epc_page = NULL;
-			}
-
-			radix_tree_delete(&entry->encl->page_tree,
-					  PFN_DOWN(entry->desc));
+			/*
+			 * The page and its radix tree entry cannot be freed
+			 * if the page is being held by the reclaimer.
+			 */
+			if (sgx_free_page(entry->epc_page))
+				continue;
+			encl->secs_child_cnt--;
+			entry->epc_page = NULL;
 		}
+
+		radix_tree_delete(&entry->encl->page_tree,
+				  PFN_DOWN(entry->desc));
+		kfree(entry);
 	}
 
 	if (!encl->secs_child_cnt && encl->secs.epc_page) {
-- 
2.22.0



* [PATCH for_v23 v3 07/12] x86/sgx: WARN on any non-zero return from __eremove()
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
                   ` (5 preceding siblings ...)
  2019-10-16 18:37 ` [PATCH for_v23 v3 06/12] x86/sgx: Fix a memory leak in sgx_encl_destroy() Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-16 18:37 ` [PATCH for_v23 v3 08/12] x86/sgx: WARN only once if EREMOVE fails Sean Christopherson
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

WARN on any non-zero return from __eremove() to make it clear that any
kind of failure is unexpected.  Technically, warning on negative values
is unnecessary as the ENCLS helpers return SGX error codes, which are
currently all positive.  But the more precise check might be
misinterpreted as implying the negative values are expected/ok.

Note, prior to a recent change, warning only on positive values was
necessary to avoid a redundant double-WARN as the WARN resided outside
of what is now sgx_free_page() and so could consume -EBUSY.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index b66d5191cbaf..15965fd1f4a2 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -137,7 +137,7 @@ int sgx_free_page(struct sgx_epc_page *page)
 	spin_unlock(&sgx_active_page_list_lock);
 
 	ret = __eremove(sgx_epc_addr(page));
-	WARN(ret > 0, "EREMOVE returned %d (0x%x)", ret, ret);
+	WARN(ret, "EREMOVE returned %d (0x%x)", ret, ret);
 
 	spin_lock(&section->lock);
 	list_add_tail(&page->list, &section->page_list);
-- 
2.22.0



* [PATCH for_v23 v3 08/12] x86/sgx: WARN only once if EREMOVE fails
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
                   ` (6 preceding siblings ...)
  2019-10-16 18:37 ` [PATCH for_v23 v3 07/12] x86/sgx: WARN on any non-zero return from __eremove() Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-16 18:37 ` [PATCH for_v23 v3 09/12] x86/sgx: Split second half of sgx_free_page() to a separate helper Sean Christopherson
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

WARN only once if EREMOVE fails to avoid spamming the kernel log if a
catastrophic failure occurs.  Warning on every failure is helpful for
development, but is a bad idea for production code as EREMOVE rarely
fails just once...

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 15965fd1f4a2..718fd5590608 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -137,7 +137,7 @@ int sgx_free_page(struct sgx_epc_page *page)
 	spin_unlock(&sgx_active_page_list_lock);
 
 	ret = __eremove(sgx_epc_addr(page));
-	WARN(ret, "EREMOVE returned %d (0x%x)", ret, ret);
+	WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret);
 
 	spin_lock(&section->lock);
 	list_add_tail(&page->list, &section->page_list);
-- 
2.22.0



* [PATCH for_v23 v3 09/12] x86/sgx: Split second half of sgx_free_page() to a separate helper
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
                   ` (7 preceding siblings ...)
  2019-10-16 18:37 ` [PATCH for_v23 v3 08/12] x86/sgx: WARN only once if EREMOVE fails Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-18 10:06   ` Jarkko Sakkinen
  2019-10-16 18:37 ` [PATCH for_v23 v3 10/12] x86/sgx: Use the post-reclaim variant of __sgx_free_page() Sean Christopherson
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

Move the post-reclaim half of sgx_free_page() to a standalone helper so
that it can be used in flows where the page is known to be
non-reclaimable.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 44 ++++++++++++++++++++++++++--------
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 718fd5590608..083d9a589882 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -103,6 +103,39 @@ struct sgx_epc_page *sgx_alloc_page(void *owner, bool reclaim)
 	return entry;
 }
 
+
+/**
+ * __sgx_free_page() - Free an EPC page
+ * @page:	pointer to a previously allocated EPC page
+ *
+ * EREMOVE an EPC page and insert it back to the list of free pages. The page
+ * must not be reclaimable.
+ */
+void __sgx_free_page(struct sgx_epc_page *page)
+{
+	struct sgx_epc_section *section;
+	int ret;
+
+	/*
+	 * Don't take sgx_active_page_list_lock when asserting the page isn't
+	 * reclaimable; missing a WARN in the very rare case is preferable to
+	 * unnecessarily taking a global lock in the common case.
+	 */
+	WARN_ON_ONCE(page->desc & SGX_EPC_PAGE_RECLAIMABLE);
+
+	ret = __eremove(sgx_epc_addr(page));
+	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
+		return;
+
+	section = sgx_epc_section(page);
+
+	spin_lock(&section->lock);
+	list_add_tail(&page->list, &section->page_list);
+	sgx_nr_free_pages++;
+	spin_unlock(&section->lock);
+
+}
+
 /**
  * sgx_free_page() - Free an EPC page
 * @page:	pointer to a previously allocated EPC page
@@ -116,9 +149,6 @@ struct sgx_epc_page *sgx_alloc_page(void *owner, bool reclaim)
  */
 int sgx_free_page(struct sgx_epc_page *page)
 {
-	struct sgx_epc_section *section = sgx_epc_section(page);
-	int ret;
-
 	/*
 	 * Remove the page from the active list if necessary.  If the page
 	 * is actively being reclaimed, i.e. RECLAIMABLE is set but the
@@ -136,13 +166,7 @@ int sgx_free_page(struct sgx_epc_page *page)
 	}
 	spin_unlock(&sgx_active_page_list_lock);
 
-	ret = __eremove(sgx_epc_addr(page));
-	WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret);
-
-	spin_lock(&section->lock);
-	list_add_tail(&page->list, &section->page_list);
-	sgx_nr_free_pages++;
-	spin_unlock(&section->lock);
+	__sgx_free_page(page);
 
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 160a3c996ef6..87e375e8c25e 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -85,6 +85,7 @@ void sgx_reclaim_pages(void);
 
 struct sgx_epc_page *sgx_try_alloc_page(void);
 struct sgx_epc_page *sgx_alloc_page(void *owner, bool reclaim);
+void __sgx_free_page(struct sgx_epc_page *page);
 int sgx_free_page(struct sgx_epc_page *page);
 
 #endif /* _X86_SGX_H */
-- 
2.22.0
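
Usage contrast after the split, assembled as a sketch from call sites
elsewhere in this series:

	/*
	 * Page is known to never have been marked reclaimable, e.g. a
	 * VA page: skip the active-list bookkeeping entirely.
	 */
	__sgx_free_page(va_page->epc_page);

	/*
	 * Page may be tracked for reclaim: sgx_free_page() returns
	 * -EBUSY if the reclaimer currently holds the page, and the
	 * caller must cope with that.
	 */
	if (sgx_free_page(entry->epc_page))
		continue;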



* [PATCH for_v23 v3 10/12] x86/sgx: Use the post-reclaim variant of __sgx_free_page()
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
                   ` (8 preceding siblings ...)
  2019-10-16 18:37 ` [PATCH for_v23 v3 09/12] x86/sgx: Split second half of sgx_free_page() to a separate helper Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-16 18:37 ` [PATCH for_v23 v3 11/12] x86/sgx: Don't update free page count if EPC section allocation fails Sean Christopherson
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

Use __sgx_free_page() in all locations where the EPC page is supposed to
be unreclaimable so that a WARN fires if the page is unexpectedly marked
for reclaim.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c    | 8 ++++----
 arch/x86/kernel/cpu/sgx/ioctl.c   | 6 +++---
 arch/x86/kernel/cpu/sgx/reclaim.c | 4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 6e60520a939c..19af43826f9b 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -73,7 +73,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	ret = __sgx_encl_eldu(encl_page, epc_page, secs_page);
 	if (ret) {
-		sgx_free_page(epc_page);
+		__sgx_free_page(epc_page);
 		return ERR_PTR(ret);
 	}
 
@@ -485,7 +485,7 @@ void sgx_encl_destroy(struct sgx_encl *encl)
 	}
 
 	if (!encl->secs_child_cnt && encl->secs.epc_page) {
-		sgx_free_page(encl->secs.epc_page);
+		__sgx_free_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 	}
 
@@ -498,7 +498,7 @@ void sgx_encl_destroy(struct sgx_encl *encl)
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
 		list_del(&va_page->list);
-		sgx_free_page(va_page->epc_page);
+		__sgx_free_page(va_page->epc_page);
 		kfree(va_page);
 	}
 }
@@ -696,7 +696,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
 	ret = __epa(sgx_epc_addr(epc_page));
 	if (ret) {
 		WARN_ONCE(1, "EPA returned %d (0x%x)", ret, ret);
-		sgx_free_page(epc_page);
+		__sgx_free_page(epc_page);
 		return ERR_PTR(-EFAULT);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 7be3fdc846d7..07b3a9a1cda6 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -47,7 +47,7 @@ static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
 	encl->page_cnt--;
 
 	if (va_page) {
-		sgx_free_page(va_page->epc_page);
+		__sgx_free_page(va_page->epc_page);
 		list_del(&va_page->list);
 		kfree(va_page);
 	}
@@ -220,7 +220,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	return 0;
 
 err_out:
-	sgx_free_page(encl->secs.epc_page);
+	__sgx_free_page(encl->secs.epc_page);
 	encl->secs.epc_page = NULL;
 
 err_out_backing:
@@ -448,7 +448,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl,
 	mutex_unlock(&encl->lock);
 
 err_out_free:
-	sgx_free_page(epc_page);
+	__sgx_free_page(epc_page);
 	kfree(encl_page);
 
 	return ret;
diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
index 8143c9a20894..3f183dd0e653 100644
--- a/arch/x86/kernel/cpu/sgx/reclaim.c
+++ b/arch/x86/kernel/cpu/sgx/reclaim.c
@@ -323,7 +323,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 
 	if (!encl->secs_child_cnt) {
 		if (atomic_read(&encl->flags) & SGX_ENCL_DEAD) {
-			sgx_free_page(encl->secs.epc_page);
+			__sgx_free_page(encl->secs.epc_page);
 			encl->secs.epc_page = NULL;
 		} else if (atomic_read(&encl->flags) & SGX_ENCL_INITIALIZED) {
 			ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
@@ -331,7 +331,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 			if (!ret) {
 				sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
 
-				sgx_free_page(encl->secs.epc_page);
+				__sgx_free_page(encl->secs.epc_page);
 				encl->secs.epc_page = NULL;
 			}
 
-- 
2.22.0



* [PATCH for_v23 v3 11/12] x86/sgx: Don't update free page count if EPC section allocation fails
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
                   ` (9 preceding siblings ...)
  2019-10-16 18:37 ` [PATCH for_v23 v3 10/12] x86/sgx: Use the post-reclaim variant of __sgx_free_page() Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-16 18:37 ` [PATCH for_v23 v3 12/12] x86/sgx: Reinstate per EPC section free page counts Sean Christopherson
  2019-10-17 18:10 ` [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Jarkko Sakkinen
  12 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

Update the number of free pages only after an EPC section is fully
initialized, else the free page count will be left in a bogus state if
allocation fails.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 083d9a589882..6311aef10ec4 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -216,9 +216,10 @@ static bool __init sgx_alloc_epc_section(u64 addr, u64 size,
 
 		page->desc = (addr + (i << PAGE_SHIFT)) | index;
 		list_add_tail(&page->list, &section->unsanitized_page_list);
-		sgx_nr_free_pages++;
 	}
 
+	sgx_nr_free_pages += nr_pages;
+
 	return true;
 
 err_out:
-- 
2.22.0



* [PATCH for_v23 v3 12/12] x86/sgx: Reinstate per EPC section free page counts
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
                   ` (10 preceding siblings ...)
  2019-10-16 18:37 ` [PATCH for_v23 v3 11/12] x86/sgx: Don't update free page count if EPC section allocation fails Sean Christopherson
@ 2019-10-16 18:37 ` Sean Christopherson
  2019-10-18 12:49   ` Jarkko Sakkinen
  2019-10-17 18:10 ` [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Jarkko Sakkinen
  12 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-10-16 18:37 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

Track the free page count on a per EPC section basis so that the value
is properly protected by the section's spinlock.

As was pointed out when the change was proposed[*], using a global
non-atomic counter to track the number of free EPC pages is not safe.
The order of non-atomic reads and writes is not guaranteed, i.e.
concurrent RMW operations can write stale data.  This causes a variety
of bad behavior, e.g. livelocks because the free page count wraps and
causes the swap thread to stop reclaiming.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c    | 11 +++++------
 arch/x86/kernel/cpu/sgx/reclaim.c |  4 ++--
 arch/x86/kernel/cpu/sgx/sgx.h     | 18 +++++++++++++++++-
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 6311aef10ec4..efbb52e4ecad 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -13,18 +13,17 @@
 
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 int sgx_nr_epc_sections;
-unsigned long sgx_nr_free_pages;
 
 static struct sgx_epc_page *__sgx_try_alloc_page(struct sgx_epc_section *section)
 {
 	struct sgx_epc_page *page;
 
-	if (list_empty(&section->page_list))
+	if (!section->free_cnt)
 		return NULL;
 
 	page = list_first_entry(&section->page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
-	sgx_nr_free_pages--;
+	section->free_cnt--;
 	return page;
 }
 
@@ -97,7 +96,7 @@ struct sgx_epc_page *sgx_alloc_page(void *owner, bool reclaim)
 		schedule();
 	}
 
-	if (sgx_nr_free_pages < SGX_NR_LOW_PAGES)
+	if (!sgx_at_least_N_free_pages(SGX_NR_LOW_PAGES))
 		wake_up(&ksgxswapd_waitq);
 
 	return entry;
@@ -131,7 +130,7 @@ void __sgx_free_page(struct sgx_epc_page *page)
 
 	spin_lock(&section->lock);
 	list_add_tail(&page->list, &section->page_list);
-	sgx_nr_free_pages++;
+	section->free_cnt++;
 	spin_unlock(&section->lock);
 
 }
@@ -218,7 +217,7 @@ static bool __init sgx_alloc_epc_section(u64 addr, u64 size,
 		list_add_tail(&page->list, &section->unsanitized_page_list);
 	}
 
-	sgx_nr_free_pages += nr_pages;
+	section->free_cnt = nr_pages;
 
 	return true;
 
diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
index 3f183dd0e653..8619141f4bed 100644
--- a/arch/x86/kernel/cpu/sgx/reclaim.c
+++ b/arch/x86/kernel/cpu/sgx/reclaim.c
@@ -68,7 +68,7 @@ static void sgx_sanitize_section(struct sgx_epc_section *section)
 
 static inline bool sgx_should_reclaim(void)
 {
-	return sgx_nr_free_pages < SGX_NR_HIGH_PAGES &&
+	return !sgx_at_least_N_free_pages(SGX_NR_HIGH_PAGES) &&
 	       !list_empty(&sgx_active_page_list);
 }
 
@@ -430,7 +430,7 @@ void sgx_reclaim_pages(void)
 		section = sgx_epc_section(epc_page);
 		spin_lock(&section->lock);
 		list_add_tail(&epc_page->list, &section->page_list);
-		sgx_nr_free_pages++;
+		section->free_cnt++;
 		spin_unlock(&section->lock);
 	}
 }
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 87e375e8c25e..c7f0277299f6 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -30,6 +30,7 @@ struct sgx_epc_page {
 struct sgx_epc_section {
 	unsigned long pa;
 	void *va;
+	unsigned long free_cnt;
 	struct list_head page_list;
 	struct list_head unsanitized_page_list;
 	spinlock_t lock;
@@ -73,12 +74,27 @@ static inline void *sgx_epc_addr(struct sgx_epc_page *page)
 #define SGX_NR_HIGH_PAGES	64
 
 extern int sgx_nr_epc_sections;
-extern unsigned long sgx_nr_free_pages;
 extern struct task_struct *ksgxswapd_tsk;
 extern struct wait_queue_head(ksgxswapd_waitq);
 extern struct list_head sgx_active_page_list;
 extern spinlock_t sgx_active_page_list_lock;
 
+static inline bool sgx_at_least_N_free_pages(unsigned long threshold)
+{
+	struct sgx_epc_section *section;
+	unsigned long free_cnt = 0;
+	int i;
+
+	for (i = 0; i < sgx_nr_epc_sections; i++) {
+		section = &sgx_epc_sections[i];
+		free_cnt += section->free_cnt;
+		if (free_cnt >= threshold)
+			return true;
+	}
+
+	return false;
+}
+
 bool __init sgx_page_reclaimer_init(void);
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 void sgx_reclaim_pages(void);
-- 
2.22.0
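
The lost-update hazard described above is easy to reproduce outside
the kernel.  A self-contained userspace analogue (hypothetical demo
code, not from the driver; build with `cc -pthread demo.c`):

	/*
	 * Two threads do non-atomic increments of a shared counter, the
	 * userspace analogue of sgx_nr_free_pages++ racing across CPUs.
	 * The final value typically falls well short of 2 * N because
	 * the load and store halves of each RMW interleave -- the same
	 * corruption that wraps the free page count and livelocks the
	 * reclaimer.  volatile only forces per-iteration accesses; it
	 * provides no atomicity, which is the point.
	 */
	#include <pthread.h>
	#include <stdio.h>

	#define N 10000000UL

	static volatile unsigned long counter;

	static void *inc(void *arg)
	{
		for (unsigned long i = 0; i < N; i++)
			counter++;	/* non-atomic read-modify-write */
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;

		pthread_create(&a, NULL, inc, NULL);
		pthread_create(&b, NULL, inc, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		printf("expected %lu, got %lu\n", 2 * N, counter);
		return 0;
	}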



* Re: [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23
  2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
                   ` (11 preceding siblings ...)
  2019-10-16 18:37 ` [PATCH for_v23 v3 12/12] x86/sgx: Reinstate per EPC section free page counts Sean Christopherson
@ 2019-10-17 18:10 ` Jarkko Sakkinen
  2019-10-17 18:12   ` Jarkko Sakkinen
  2019-10-18 13:13   ` Jarkko Sakkinen
  12 siblings, 2 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-17 18:10 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Wed, Oct 16, 2019 at 11:37:33AM -0700, Sean Christopherson wrote:
> As requested, the merge of the EPC and EADD bug fix series.  The headliners
> are two critical bug fixes for a memory leak in sgx_encl_destroy() and a
> livelock due to the EPC page free count getting corrupted.

BTW, are you still planning to send multi-page EADD? I think it should
be done. It is obviously a superior design when combined with my small
twist where mrmask is calculated inside the kernel.

Not only does it provide multi-page EADD but you could always do one
ioctl per area. With mrmask in the parameters you end up needing two if the
user space is designed in a way that measurement is not a multiple of a
page.

If mrmask made sense in the parameters, then a single-page EADD
would be cool, but I think we want to leave it out completely.

/Jarkko
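
To make the idea concrete, one purely hypothetical shape the
multi-page parameters could take, with mrmask derived by the kernel
from a flag rather than passed in (nothing below is from the actual
driver):

	struct sgx_enclave_add_region {	/* hypothetical */
		__u64	src;		/* userspace source base address */
		__u64	offset;		/* destination offset in the enclave */
		__u64	length;		/* bytes to EADD, page aligned */
		__u64	secinfo;	/* pointer to struct sgx_secinfo */
		__u64	flags;		/* e.g. a "measure this region" bit */
	};

The kernel would loop EADD over length/PAGE_SIZE pages and, if the
measure flag is set, EEXTEND each page in full, so no per-256-byte
mask ever crosses the ABI.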


* Re: [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23
  2019-10-17 18:10 ` [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Jarkko Sakkinen
@ 2019-10-17 18:12   ` Jarkko Sakkinen
  2019-10-18 13:13   ` Jarkko Sakkinen
  1 sibling, 0 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-17 18:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Thu, Oct 17, 2019 at 09:10:54PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 16, 2019 at 11:37:33AM -0700, Sean Christopherson wrote:
> > As requested, the merge of the EPC and EADD bug fix series.  The headliners
> > are two critical bug fixes for a memory leak in sgx_encl_destroy() and a
> > livelock due to the EPC page free count getting corrupted.
> 
> BTW, are you still planning to send multi-page EADD? I think it should
> be done. It is obviously a superior design when combined with my small
> twist where mrmask is calculated inside the kernel.
> 
> Not only does it provide multi-page EADD but you could always do one
> ioctl per area. With mrmask in the parameters you end up needing two if the
> user space is designed in a way that measurement is not a multiple of a
> page.
> 
> If mrmask made sense in the parameters, then a single-page EADD
> would be cool, but I think we want to leave it out completely.

My earlier comments about doing a single-page EADD first were based on
an API where you provide mrmask.

/Jarkko


* Re: [PATCH for_v23 v3 01/12] x86/sgx: Pass EADD the kernel's virtual address for the source page
  2019-10-16 18:37 ` [PATCH for_v23 v3 01/12] x86/sgx: Pass EADD the kernel's virtual address for the source page Sean Christopherson
@ 2019-10-18  9:57   ` Jarkko Sakkinen
  2019-10-22  3:22     ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-18  9:57 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Wed, Oct 16, 2019 at 11:37:34AM -0700, Sean Christopherson wrote:
> Use the kernel's virtual address to reference the source page when
> EADDing a page to the enclave.  gup() "does not guarantee that the page
> exists in the user mappings", i.e. EADD can still fault and deadlock
> due to mmap_sem contention if it consumes the userspace address.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> +	kunmap_atomic((void *)pginfo.contents);

Remark: if not cast first to unsigned long, it will give a warning on
a 32-bit build.

/Jarkko


* Re: [PATCH for_v23 v3 04/12] x86/sgx: Drop mmap_sem before EEXTENDing an enclave page
  2019-10-16 18:37 ` [PATCH for_v23 v3 04/12] x86/sgx: Drop mmap_sem before EEXTENDing an enclave page Sean Christopherson
@ 2019-10-18 10:04   ` Jarkko Sakkinen
  0 siblings, 0 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-18 10:04 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Wed, Oct 16, 2019 at 11:37:37AM -0700, Sean Christopherson wrote:
> Drop mmap_sem, which needs to be held for read across EADD, prior to
> doing EEXTEND on the newly added page to avoid holding mmap_sem for an
> extended duration.  EEXTEND doesn't access user pages, so holding
> encl->lock without mmap_sem is perfectly ok.  And EEXTEND is a _slow_
> operation, to the point where it operates on 256-byte chunks
> instead of 4k pages to keep the latency of a single instruction
> reasonable.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Locks should always be freed in the opposite order for any piece of
code.

/Jarkko


* Re: [PATCH for_v23 v3 09/12] x86/sgx: Split second half of sgx_free_page() to a separate helper
  2019-10-16 18:37 ` [PATCH for_v23 v3 09/12] x86/sgx: Split second half of sgx_free_page() to a separate helper Sean Christopherson
@ 2019-10-18 10:06   ` Jarkko Sakkinen
  2019-10-22  3:36     ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-18 10:06 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Wed, Oct 16, 2019 at 11:37:42AM -0700, Sean Christopherson wrote:
> Move the post-reclaim half of sgx_free_page() to a standalone helper so
> that it can be used in flows where the page is known to be
> non-reclaimable.

The call sites where it is known to be reclaimable should handle the
error instead of creating call-site-specific versions of the function.

/Jarkko


* Re: [PATCH for_v23 v3 03/12] x86/sgx: Fix EEXTEND error handling
  2019-10-16 18:37 ` [PATCH for_v23 v3 03/12] x86/sgx: Fix EEXTEND error handling Sean Christopherson
@ 2019-10-18 10:42   ` Jarkko Sakkinen
  0 siblings, 0 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-18 10:42 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Wed, Oct 16, 2019 at 11:37:36AM -0700, Sean Christopherson wrote:
> Rework EEXTEND error handling to fix issues related to destroying the
> enclave in response to EEXTEND failure.  At the time of EEXTEND, the
> page is already visible in the sense that it has been added to the
> radix tree, and therefore will be processed by sgx_encl_destroy().  This
> means the "add" part needs to be fully completed prior to invoking
> sgx_encl_destroy() in order to avoid consuming half-baked state.
> 
> Move sgx_encl_destroy() to the call site of __sgx_encl_extend() so that
> it is somewhat more obvious why the add needs to complete before doing
> EEXTEND.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

First three (1, 2, 3) have been applied.

/Jarkko


* Re: [PATCH for_v23 v3 05/12] x86/sgx: Remove redundant message from WARN on non-empty mm_list
  2019-10-16 18:37 ` [PATCH for_v23 v3 05/12] x86/sgx: Remove redundant message from WARN on non-empty mm_list Sean Christopherson
@ 2019-10-18 12:08   ` Jarkko Sakkinen
  0 siblings, 0 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-18 12:08 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Wed, Oct 16, 2019 at 11:37:38AM -0700, Sean Christopherson wrote:
> Use WARN_ON_ONCE() instead of WARN_ONCE() for detecting a non-empty
> mm_list during enclave release; the "mm_list non-empty" message doesn't
> provide any additional or helpful information.
> 
> Suggested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Applied.

/Jarkko


* Re: [PATCH for_v23 v3 06/12] x86/sgx: Fix a memory leak in sgx_encl_destroy()
  2019-10-16 18:37 ` [PATCH for_v23 v3 06/12] x86/sgx: Fix a memory leak in sgx_encl_destroy() Sean Christopherson
@ 2019-10-18 12:17   ` Jarkko Sakkinen
  0 siblings, 0 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-18 12:17 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Wed, Oct 16, 2019 at 11:37:39AM -0700, Sean Christopherson wrote:
> Delete an enclave page's entry in the radix tree regardless of whether
> or not it has an associated EPC page, and free the page itself when it's
> deleted from the radix tree.
> 
> Don't free/delete anything if the page is held by the reclaimer, as the
> reclaimer needs the page itself and the driver needs the radix entry to
> re-process the entry during sgx_encl_release().
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Applied.

/Jarkko


* Re: [PATCH for_v23 v3 12/12] x86/sgx: Reinstate per EPC section free page counts
  2019-10-16 18:37 ` [PATCH for_v23 v3 12/12] x86/sgx: Reinstate per EPC section free page counts Sean Christopherson
@ 2019-10-18 12:49   ` Jarkko Sakkinen
  2019-10-18 12:55     ` Jarkko Sakkinen
  2019-10-18 14:30     ` Sean Christopherson
  0 siblings, 2 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-18 12:49 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Wed, Oct 16, 2019 at 11:37:45AM -0700, Sean Christopherson wrote:
> Track the free page count on a per EPC section basis so that the value
> is properly protected by the section's spinlock.
> 
> As was pointed out when the change was proposed[*], using a global
> non-atomic counter to track the number of free EPC pages is not safe.
> The order of non-atomic reads and writes is not guaranteed, i.e.
> concurrent RMW operations can write stale data.  This causes a variety
> of bad behavior, e.g. livelocks because the free page count wraps and
> causes the swap thread to stop reclaiming.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

What is the reason not to change it just to an atomic?

For debugging the global is useful because it could be exposed
as a sysfs file.

> ---
>  arch/x86/kernel/cpu/sgx/main.c    | 11 +++++------
>  arch/x86/kernel/cpu/sgx/reclaim.c |  4 ++--
>  arch/x86/kernel/cpu/sgx/sgx.h     | 18 +++++++++++++++++-
>  3 files changed, 24 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 6311aef10ec4..efbb52e4ecad 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -13,18 +13,17 @@
>  
>  struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
>  int sgx_nr_epc_sections;
> -unsigned long sgx_nr_free_pages;
>  
>  static struct sgx_epc_page *__sgx_try_alloc_page(struct sgx_epc_section *section)
>  {
>  	struct sgx_epc_page *page;
>  
> -	if (list_empty(&section->page_list))
> +	if (!section->free_cnt)
>  		return NULL;

 Why does this check need to be changed?

>  
>  	page = list_first_entry(&section->page_list, struct sgx_epc_page, list);
>  	list_del_init(&page->list);
> -	sgx_nr_free_pages--;
> +	section->free_cnt--;
>  	return page;
>  }
>  
> @@ -97,7 +96,7 @@ struct sgx_epc_page *sgx_alloc_page(void *owner, bool reclaim)
>  		schedule();
>  	}
>  
> -	if (sgx_nr_free_pages < SGX_NR_LOW_PAGES)
> +	if (!sgx_at_least_N_free_pages(SGX_NR_LOW_PAGES))
>  		wake_up(&ksgxswapd_waitq);
>  
>  	return entry;
> @@ -131,7 +130,7 @@ void __sgx_free_page(struct sgx_epc_page *page)
>  
>  	spin_lock(&section->lock);
>  	list_add_tail(&page->list, &section->page_list);
> -	sgx_nr_free_pages++;
> +	section->free_cnt++;
>  	spin_unlock(&section->lock);
>  
>  }
> @@ -218,7 +217,7 @@ static bool __init sgx_alloc_epc_section(u64 addr, u64 size,
>  		list_add_tail(&page->list, &section->unsanitized_page_list);
>  	}
>  
> -	sgx_nr_free_pages += nr_pages;
> +	section->free_cnt = nr_pages;
>  
>  	return true;
>  
> diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
> index 3f183dd0e653..8619141f4bed 100644
> --- a/arch/x86/kernel/cpu/sgx/reclaim.c
> +++ b/arch/x86/kernel/cpu/sgx/reclaim.c
> @@ -68,7 +68,7 @@ static void sgx_sanitize_section(struct sgx_epc_section *section)
>  
>  static inline bool sgx_should_reclaim(void)
>  {
> -	return sgx_nr_free_pages < SGX_NR_HIGH_PAGES &&
> +	return !sgx_at_least_N_free_pages(SGX_NR_HIGH_PAGES) &&
>  	       !list_empty(&sgx_active_page_list);
>  }
>  
> @@ -430,7 +430,7 @@ void sgx_reclaim_pages(void)
>  		section = sgx_epc_section(epc_page);
>  		spin_lock(&section->lock);
>  		list_add_tail(&epc_page->list, &section->page_list);
> -		sgx_nr_free_pages++;
> +		section->free_cnt++;
>  		spin_unlock(&section->lock);
>  	}
>  }
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 87e375e8c25e..c7f0277299f6 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -30,6 +30,7 @@ struct sgx_epc_page {
>  struct sgx_epc_section {
>  	unsigned long pa;
>  	void *va;
> +	unsigned long free_cnt;
>  	struct list_head page_list;
>  	struct list_head unsanitized_page_list;
>  	spinlock_t lock;
> @@ -73,12 +74,27 @@ static inline void *sgx_epc_addr(struct sgx_epc_page *page)
>  #define SGX_NR_HIGH_PAGES	64
>  
>  extern int sgx_nr_epc_sections;
> -extern unsigned long sgx_nr_free_pages;
>  extern struct task_struct *ksgxswapd_tsk;
>  extern struct wait_queue_head(ksgxswapd_waitq);
>  extern struct list_head sgx_active_page_list;
>  extern spinlock_t sgx_active_page_list_lock;
>  
> +static inline bool sgx_at_least_N_free_pages(unsigned long threshold)

There is an upper case letter in the function name, and the name is
also weird overall.

> +{
> +	struct sgx_epc_section *section;
> +	unsigned long free_cnt = 0;
> +	int i;
> +
> +	for (i = 0; i < sgx_nr_epc_sections; i++) {
> +		section = &sgx_epc_sections[i];
> +		free_cnt += section->free_cnt;
> +		if (free_cnt >= threshold)
> +			return true;
> +	}
> +
> +	return false;
> +}

The complexity does not pay off here. Better to revert back to this
if required:

unsigned long sgx_calc_free_cnt(void)
{
	struct sgx_epc_section *section;
	unsigned long free_cnt = 0;
	int i;

	for (i = 0; i < sgx_nr_epc_sections; i++) {
		section = &sgx_epc_sections[i];
		free_cnt += section->free_cnt;
	}

	return free_cnt;
}

/Jarkko
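
For reference, the atomic variant being weighed here would look
roughly like this (sketch only; whether a global atomic belongs in
the hot path is exactly the question on the table):

	static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);

	/* Allocation side, still under section->lock for the list. */
	atomic_long_dec(&sgx_nr_free_pages);

	/* Free side. */
	atomic_long_inc(&sgx_nr_free_pages);

	/* Lock-free readers get a coherent, if slightly stale, value. */
	if (atomic_long_read(&sgx_nr_free_pages) < SGX_NR_LOW_PAGES)
		wake_up(&ksgxswapd_waitq);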


* Re: [PATCH for_v23 v3 12/12] x86/sgx: Reinstate per EPC section free page counts
  2019-10-18 12:49   ` Jarkko Sakkinen
@ 2019-10-18 12:55     ` Jarkko Sakkinen
  2019-10-18 14:30     ` Sean Christopherson
  1 sibling, 0 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-18 12:55 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Fri, Oct 18, 2019 at 03:49:42PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 16, 2019 at 11:37:45AM -0700, Sean Christopherson wrote:
> > Track the free page count on a per EPC section basis so that the value
> > is properly protected by the section's spinlock.
> > 
> > As was pointed out when the change was proposed[*], using a global
> > non-atomic counter to track the number of free EPC pages is not safe.
> > The order of non-atomic reads and writes are not guaranteed, i.e.
> > concurrent RMW operats can write stale data.  This causes a variety
> > of bad behavior, e.g. livelocks because the free page count wraps and
> > causes the swap thread to stop reclaiming.
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> What is the reason not to change it just to an atomic?
> 
> For debugging the global is useful because it could be exposed
> as a sysfs file.

So the regression is that a counter update is a read + store (btw, it
was not described in the commit message what regression you are
speaking of, which means that I'm making a guess myself).

This means that changing the variable to atomic should be IMHO
sufficient.

/Jarkko


* Re: [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23
  2019-10-17 18:10 ` [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Jarkko Sakkinen
  2019-10-17 18:12   ` Jarkko Sakkinen
@ 2019-10-18 13:13   ` Jarkko Sakkinen
  1 sibling, 0 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-18 13:13 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Thu, Oct 17, 2019 at 09:10:54PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 16, 2019 at 11:37:33AM -0700, Sean Christopherson wrote:
> > As requested, the merge of the EPC and EADD bug fix series.  The headliners
> > are two critical bug fixes for a memory leak in sgx_encl_destroy() and a
> > livelock due to the EPC page free count getting corrupted.
> 
> BTW, are you still planning to send multi-page EADD? I think it should
> be done. It is obviously a superior design when combined with my small
> twist where mrmask is calculated inside the kernel.
> 
> Not only does it provide multi-page EADD but you could always do one
> ioctl per area. With mrmask in the parameters you end up needing two if the
> user space is designed in a way that measurement is not a multiple of a
> page.
> 
> If mrmask made sense in the parameters, then a single-page EADD
> would be cool, but I think we want to leave it out completely.

I applied the fixes that I saw were appropriate. You can check if
this has been sufficient and send a revised set of fixes if not.

/Jarkko


* Re: [PATCH for_v23 v3 12/12] x86/sgx: Reinstate per EPC section free page counts
  2019-10-18 12:49   ` Jarkko Sakkinen
  2019-10-18 12:55     ` Jarkko Sakkinen
@ 2019-10-18 14:30     ` Sean Christopherson
  2019-10-21 11:19       ` Jarkko Sakkinen
  1 sibling, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-10-18 14:30 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

On Fri, Oct 18, 2019 at 03:49:42PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 16, 2019 at 11:37:45AM -0700, Sean Christopherson wrote:
> > Track the free page count on a per EPC section basis so that the value
> > is properly protected by the section's spinlock.
> > 
> > As was pointed out when the change was proposed[*], using a global
> > non-atomic counter to track the number of free EPC pages is not safe.
> > The order of non-atomic reads and writes is not guaranteed, i.e.
> > concurrent RMW operations can write stale data.  This causes a variety
> > of bad behavior, e.g. livelocks because the free page count wraps and
> > causes the swap thread to stop reclaiming.
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> What is the reason not to change it just to an atomic?

The purpose of separate sections is to avoid bouncing locks and whatnot
across packages.  Adding a global atomic to the hotpath defeats that
purpose.

> For debugging the global is useful because it could be exposed
> as a sysfs file.

Can't the sysfs read helper aggregate the info?

> > ---
> >  arch/x86/kernel/cpu/sgx/main.c    | 11 +++++------
> >  arch/x86/kernel/cpu/sgx/reclaim.c |  4 ++--
> >  arch/x86/kernel/cpu/sgx/sgx.h     | 18 +++++++++++++++++-
> >  3 files changed, 24 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 6311aef10ec4..efbb52e4ecad 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -13,18 +13,17 @@
> >  
> >  struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
> >  int sgx_nr_epc_sections;
> > -unsigned long sgx_nr_free_pages;
> >  
> >  static struct sgx_epc_page *__sgx_try_alloc_page(struct sgx_epc_section *section)
> >  {
> >  	struct sgx_epc_page *page;
> >  
> > -	if (list_empty(&section->page_list))
> > +	if (!section->free_cnt)
> >  		return NULL;
> 
>  Why does this check need to be changed?

I was reverting to the previous behavior.

> >  
> >  	page = list_first_entry(&section->page_list, struct sgx_epc_page, list);
> >  	list_del_init(&page->list);
> > -	sgx_nr_free_pages--;
> > +	section->free_cnt--;
> >  	return page;
> >  }
> >  
> > @@ -97,7 +96,7 @@ struct sgx_epc_page *sgx_alloc_page(void *owner, bool reclaim)
> >  		schedule();
> >  	}
> >  
> > -	if (sgx_nr_free_pages < SGX_NR_LOW_PAGES)
> > +	if (!sgx_at_least_N_free_pages(SGX_NR_LOW_PAGES))
> >  		wake_up(&ksgxswapd_waitq);
> >  
> >  	return entry;
> > @@ -131,7 +130,7 @@ void __sgx_free_page(struct sgx_epc_page *page)
> >  
> >  	spin_lock(&section->lock);
> >  	list_add_tail(&page->list, &section->page_list);
> > -	sgx_nr_free_pages++;
> > +	section->free_cnt++;
> >  	spin_unlock(&section->lock);
> >  
> >  }
> > @@ -218,7 +217,7 @@ static bool __init sgx_alloc_epc_section(u64 addr, u64 size,
> >  		list_add_tail(&page->list, &section->unsanitized_page_list);
> >  	}
> >  
> > -	sgx_nr_free_pages += nr_pages;
> > +	section->free_cnt = nr_pages;
> >  
> >  	return true;
> >  
> > diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
> > index 3f183dd0e653..8619141f4bed 100644
> > --- a/arch/x86/kernel/cpu/sgx/reclaim.c
> > +++ b/arch/x86/kernel/cpu/sgx/reclaim.c
> > @@ -68,7 +68,7 @@ static void sgx_sanitize_section(struct sgx_epc_section *section)
> >  
> >  static inline bool sgx_should_reclaim(void)
> >  {
> > -	return sgx_nr_free_pages < SGX_NR_HIGH_PAGES &&
> > +	return !sgx_at_least_N_free_pages(SGX_NR_HIGH_PAGES) &&
> >  	       !list_empty(&sgx_active_page_list);
> >  }
> >  
> > @@ -430,7 +430,7 @@ void sgx_reclaim_pages(void)
> >  		section = sgx_epc_section(epc_page);
> >  		spin_lock(&section->lock);
> >  		list_add_tail(&epc_page->list, &section->page_list);
> > -		sgx_nr_free_pages++;
> > +		section->free_cnt++;
> >  		spin_unlock(&section->lock);
> >  	}
> >  }
> > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> > index 87e375e8c25e..c7f0277299f6 100644
> > --- a/arch/x86/kernel/cpu/sgx/sgx.h
> > +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> > @@ -30,6 +30,7 @@ struct sgx_epc_page {
> >  struct sgx_epc_section {
> >  	unsigned long pa;
> >  	void *va;
> > +	unsigned long free_cnt;
> >  	struct list_head page_list;
> >  	struct list_head unsanitized_page_list;
> >  	spinlock_t lock;
> > @@ -73,12 +74,27 @@ static inline void *sgx_epc_addr(struct sgx_epc_page *page)
> >  #define SGX_NR_HIGH_PAGES	64
> >  
> >  extern int sgx_nr_epc_sections;
> > -extern unsigned long sgx_nr_free_pages;
> >  extern struct task_struct *ksgxswapd_tsk;
> >  extern struct wait_queue_head(ksgxswapd_waitq);
> >  extern struct list_head sgx_active_page_list;
> >  extern spinlock_t sgx_active_page_list_lock;
> >  
> > +static inline bool sgx_at_least_N_free_pages(unsigned long threshold)
> 
> There is an upper case letter in the function name, and the name is
> also weird overall.

Ya, not a fan of the name either.

> > +{
> > +	struct sgx_epc_section *section;
> > +	unsigned long free_cnt = 0;
> > +	int i;
> > +
> > +	for (i = 0; i < sgx_nr_epc_sections; i++) {
> > +		section = &sgx_epc_sections[i];
> > +		free_cnt += section->free_cnt;
> > +		if (free_cnt >= threshold)
> > +			return true;
> > +	}
> > +
> > +	return false;
> > +}
> 
> The complexity does not pay off here. Better to revert back to this
> if required:

I'd prefer to optimize for the happy case, even if it's minor.  But I'm
fine going with the below code.

> unsigned long sgx_calc_free_cnt(void)
> {
> 	struct sgx_epc_section *section;
> 	unsigned long free_cnt = 0;
> 	int i;
> 
> 	for (i = 0; i < sgx_nr_epc_sections; i++) {
> 		section = &sgx_epc_sections[i];
> 		free_cnt += section->free_cnt;
> 	}
> 
> 	return free_cnt;
> }
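
For reference, a minimal sketch (untested, just to illustrate) of how the
call sites would then read with the aggregated count, e.g.
sgx_should_reclaim() in reclaim.c:

static inline bool sgx_should_reclaim(void)
{
	/* Aggregate across all EPC sections before comparing. */
	return sgx_calc_free_cnt() < SGX_NR_HIGH_PAGES &&
	       !list_empty(&sgx_active_page_list);
}

and the waitq kick in sgx_alloc_page():

	if (sgx_calc_free_cnt() < SGX_NR_LOW_PAGES)
		wake_up(&ksgxswapd_waitq);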
> 
> /Jarkko

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH for_v23 v3 12/12] x86/sgx: Reinstate per EPC section free page counts
  2019-10-18 14:30     ` Sean Christopherson
@ 2019-10-21 11:19       ` Jarkko Sakkinen
  2019-10-22 19:35         ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-21 11:19 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Fri, Oct 18, 2019 at 07:30:57AM -0700, Sean Christopherson wrote:
> On Fri, Oct 18, 2019 at 03:49:42PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Oct 16, 2019 at 11:37:45AM -0700, Sean Christopherson wrote:
> > > Track the free page count on a per EPC section basis so that the value
> > > is properly protected by the section's spinlock.
> > > 
> > > As was pointed out when the change was proposed[*], using a global
> > > non-atomic counter to track the number of free EPC pages is not safe.
> > > The order of non-atomic reads and writes is not guaranteed, i.e.
> > > concurrent RMW operations can write stale data.  This causes a variety
> > > of bad behavior, e.g. livelocks because the free page count wraps and
> > > causes the swap thread to stop reclaiming.
> > > 
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > What is the reason not change it just to atomic?
> 
> The purpose of separate sections is to avoid bouncing locks and whatnot
> across packages.  Adding a global atomic to the hotpath defeats that
> purpose.

I do get that, but it does not actually cause incorrect behaviour,
right? Not being atomic obviously does, because the READ part of the
READ+STORE can get re-ordered.

> Can't the sysfs read helper aggregate the info?

Sure.

/Jarkko

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH for_v23 v3 01/12] x86/sgx: Pass EADD the kernel's virtual address for the source page
  2019-10-18  9:57   ` Jarkko Sakkinen
@ 2019-10-22  3:22     ` Sean Christopherson
  2019-10-23 11:57       ` Jarkko Sakkinen
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-10-22  3:22 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

On Fri, Oct 18, 2019 at 12:57:13PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 16, 2019 at 11:37:34AM -0700, Sean Christopherson wrote:
> > Use the kernel's virtual address to reference the source page when
> > EADDing a page to the enclave.  gup() "does not guarantee that the page
> > exists in the user mappings", i.e. EADD can still fault and deadlock
> > due to mmap_sem contention if it consumes the userspace address.
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > +	kunmap_atomic((void *)pginfo.contents);
> 
> Remark: if not cast first to unsigned long, it will give a warning on
> 32-bit builds.

Do we care?  SGX is 64-bit only, and I don't see that changing anytime soon.
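
(For the record, in case we ever do care: pginfo.contents is a u64, so
the warning-free form on a 32-bit build would presumably need to bounce
through unsigned long first, i.e.

	kunmap_atomic((void *)(unsigned long)pginfo.contents);

since casting a 64-bit integer directly to a 32-bit pointer is what
trips the warning.)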

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH for_v23 v3 09/12] x86/sgx: Split second half of sgx_free_page() to a separate helper
  2019-10-18 10:06   ` Jarkko Sakkinen
@ 2019-10-22  3:36     ` Sean Christopherson
  2019-10-23 11:59       ` Jarkko Sakkinen
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-10-22  3:36 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

On Fri, Oct 18, 2019 at 01:06:55PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 16, 2019 at 11:37:42AM -0700, Sean Christopherson wrote:
> > Move the post-reclaim half of sgx_free_page() to a standalone helper so
> > that it can be used in flows where the page is known to be
> > non-reclaimable.
> 
> The call sites where it is known to be reclaimable should handle the
> error instead of creating call-site-specific versions of the function.

What if we completely split the function(s)?  The existing callers of
sgx_free_page() stay as is, there is one and only one "free_page()", we
don't take sgx_active_page_list_lock in most flows, and the one case
where failure is acceptable gets to do its thing.  I think this'd make
both of us happy.  E.g.:

/**
 * sgx_unmark_page_reclaimable() - Unmark a page as reclaimable
 * @page:	EPC page
 *
 * Clear the reclaimable flag from the page and remove the page from the active
 * page list.
 *
 * Return:
 *   0 on success,
 *   -EBUSY if a reclaim is in progress
 */
int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
{
	/*
	 * Remove the page from the active list if necessary.  If the page
	 * is actively being reclaimed, i.e. RECLAIMABLE is set but the
	 * page isn't on the active list, return -EBUSY as we can't free
	 * the page at this time since it is "owned" by the reclaimer.
	 */
	spin_lock(&sgx_active_page_list_lock);
	if (page->desc & SGX_EPC_PAGE_RECLAIMABLE) {
		if (list_empty(&page->list)) {
			spin_unlock(&sgx_active_page_list_lock);
			return -EBUSY;
		}
		list_del(&page->list);
		page->desc &= ~SGX_EPC_PAGE_RECLAIMABLE;
	}
	spin_unlock(&sgx_active_page_list_lock);

	return 0;
}

/**
 * sgx_free_page() - Free an EPC page
 * @page:	pointer a previously allocated EPC page
 *
 * EREMOVE an EPC page and insert it back to the list of free pages. The page
 * must not be reclaimable.
 */
void sgx_free_page(struct sgx_epc_page *page)
{
	struct sgx_epc_section *section;
	int ret;

	/*
	 * Don't take sgx_active_page_list_lock when asserting the page isn't
	 * reclaimable, missing a WARN in the very rare case is preferable to
	 * unnecessarily taking a global lock in the common case.
	 */
	WARN_ON_ONCE(page->desc & SGX_EPC_PAGE_RECLAIMABLE);

	ret = __eremove(sgx_epc_addr(page));
	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
		return;

	section = sgx_epc_section(page);

	spin_lock(&section->lock);
	list_add_tail(&page->list, &section->page_list);
	section->free_cnt++;
	spin_unlock(&section->lock);
}  
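
A call site that can race with the reclaimer would then look something
like this (hypothetical caller, e.g. on the enclave teardown path):

	if (!sgx_unmark_page_reclaimable(epc_page))
		sgx_free_page(epc_page);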


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH for_v23 v3 12/12] x86/sgx: Reinstate per EPC section free page counts
  2019-10-21 11:19       ` Jarkko Sakkinen
@ 2019-10-22 19:35         ` Sean Christopherson
  2019-10-23 12:02           ` Jarkko Sakkinen
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-10-22 19:35 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

On Mon, Oct 21, 2019 at 02:19:08PM +0300, Jarkko Sakkinen wrote:
> On Fri, Oct 18, 2019 at 07:30:57AM -0700, Sean Christopherson wrote:
> > On Fri, Oct 18, 2019 at 03:49:42PM +0300, Jarkko Sakkinen wrote:
> > > On Wed, Oct 16, 2019 at 11:37:45AM -0700, Sean Christopherson wrote:
> > > > Track the free page count on a per EPC section basis so that the value
> > > > is properly protected by the section's spinlock.
> > > > 
> > > > As was pointed out when the change was proposed[*], using a global
> > > > non-atomic counter to track the number of free EPC pages is not safe.
> > > > The order of non-atomic reads and writes is not guaranteed, i.e.
> > > > concurrent RMW operations can write stale data.  This causes a variety
> > > > of bad behavior, e.g. livelocks because the free page count wraps and
> > > > causes the swap thread to stop reclaiming.
> > > > 
> > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > 
> > > What is the reason not change it just to atomic?
> > 
> > The purpose of separate sections is to avoid bouncing locks and whatnot
> > across packages.  Adding a global atomic to the hotpath defeats that
> > purpose.
> 
> I do get that, but it does not actually cause incorrect behaviour,
> right? Not being atomic obviously does, because the READ part of the
> READ+STORE can get re-ordered.

Haven't tested yet, but it should be functionally correct.  I just don't
understand the motivation for the change to a global free count.  I get
that we don't have any NUMA awareness whatsoever, but if that's the
argument, why bother with the complexity of per-section tracking in the
first place?
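
To spell out the bug being fixed, a rough sketch of the lost update
with the old global, non-atomic counter (hypothetical interleaving):

	/*
	 * CPU0 (freeing)                   CPU1 (allocating)
	 * load  sgx_nr_free_pages == N
	 *                                  load  sgx_nr_free_pages == N
	 * store sgx_nr_free_pages = N + 1
	 *                                  store sgx_nr_free_pages = N - 1
	 *
	 * CPU0's increment is lost.  Lose enough increments and the
	 * count underflows/wraps, at which point the watermark checks
	 * are garbage and ksgxswapd stops reclaiming.
	 */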

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH for_v23 v3 01/12] x86/sgx: Pass EADD the kernel's virtual address for the source page
  2019-10-22  3:22     ` Sean Christopherson
@ 2019-10-23 11:57       ` Jarkko Sakkinen
  0 siblings, 0 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-23 11:57 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Mon, Oct 21, 2019 at 08:22:07PM -0700, Sean Christopherson wrote:
> On Fri, Oct 18, 2019 at 12:57:13PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Oct 16, 2019 at 11:37:34AM -0700, Sean Christopherson wrote:
> > > Use the kernel's virtual address to reference the source page when
> > > EADDing a page to the enclave.  gup() "does not guarantee that the page
> > > exists in the user mappings", i.e. EADD can still fault and deadlock
> > > due to mmap_sem contention if it consumes the userspace address.
> > > 
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > +	kunmap_atomic((void *)pginfo.contents);
> > 
> > Remark: if not cast first to unsigned long, it will give a warning on
> > 32-bit builds.
> 
> Do we care?  SGX is 64-bit only, and I don't see that changing anytime soon.

Nope, that's why I started it with "Remark" :-) It's merely a note to
myself.

/Jarkko

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH for_v23 v3 09/12] x86/sgx: Split second half of sgx_free_page() to a separate helper
  2019-10-22  3:36     ` Sean Christopherson
@ 2019-10-23 11:59       ` Jarkko Sakkinen
  0 siblings, 0 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-23 11:59 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Mon, Oct 21, 2019 at 08:36:17PM -0700, Sean Christopherson wrote:
> On Fri, Oct 18, 2019 at 01:06:55PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Oct 16, 2019 at 11:37:42AM -0700, Sean Christopherson wrote:
> > > Move the post-reclaim half of sgx_free_page() to a standalone helper so
> > > that it can be used in flows where the page is known to be
> > > non-reclaimable.
> > 
> > The call sites where it is known to be reclaimable should handle the
> > error instead of creating call-site-specific versions of the function.
> 
> What if we completely split the function(s)?  The existing callers of
> sgx_free_page() stay as is, there is one and only one "free_page()", we
> don't take sgx_active_page_list_lock in most flows, and the one case
> where failure is acceptable gets to do its thing.  I think this'd make
> both of us happy.  E.g.:

The split is a clean way to sort this out.

Makes sense also from a "symmetry" perspective: we don't mark pages
reclaimable in sgx_alloc_page().
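
I.e. with a (hypothetical) sgx_mark_page_reclaimable() counterpart, the
page lifecycle would pair up naturally:

	page = sgx_alloc_page(owner, true);
	/* ... EADD/initialize the page ... */
	sgx_mark_page_reclaimable(page);
	/* ... later, on teardown ... */
	if (!sgx_unmark_page_reclaimable(page))
		sgx_free_page(page);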

/Jarkko

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH for_v23 v3 12/12] x86/sgx: Reinstate per EPC section free page counts
  2019-10-22 19:35         ` Sean Christopherson
@ 2019-10-23 12:02           ` Jarkko Sakkinen
  0 siblings, 0 replies; 32+ messages in thread
From: Jarkko Sakkinen @ 2019-10-23 12:02 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Tue, Oct 22, 2019 at 12:35:30PM -0700, Sean Christopherson wrote:
> On Mon, Oct 21, 2019 at 02:19:08PM +0300, Jarkko Sakkinen wrote:
> > On Fri, Oct 18, 2019 at 07:30:57AM -0700, Sean Christopherson wrote:
> > > On Fri, Oct 18, 2019 at 03:49:42PM +0300, Jarkko Sakkinen wrote:
> > > > On Wed, Oct 16, 2019 at 11:37:45AM -0700, Sean Christopherson wrote:
> > > > > Track the free page count on a per EPC section basis so that the value
> > > > > is properly protected by the section's spinlock.
> > > > > 
> > > > > As was pointed out when the change was proposed[*], using a global
> > > > > non-atomic counter to track the number of free EPC pages is not safe.
> > > > > The order of non-atomic reads and writes is not guaranteed, i.e.
> > > > > concurrent RMW operations can write stale data.  This causes a variety
> > > > > of bad behavior, e.g. livelocks because the free page count wraps and
> > > > > causes the swap thread to stop reclaiming.
> > > > > 
> > > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > > 
> > > > What is the reason not change it just to atomic?
> > > 
> > > The purpose of separate sections is to avoid bouncing locks and whatnot
> > > across packages.  Adding a global atomic to the hotpath defeats that
> > > purpose.
> > 
> > I do get that, but it does not actually cause incorrect behaviour,
> > right? Not being atomic obviously does, because the READ part of the
> > READ+STORE can get re-ordered.
> 
> Haven't tested yet, but it should be functionally correct.  I just don't
> understand the motivation for the change to a global free count.  I get
> that we don't have any NUMA awareness whatsoever, but if that's the
> argument, why bother with the complexity of per-section tracking in the
> first place?

You are right in what you are saying. We can revert to the aggregation
code. I'm just making sure I get the exact point when it comes to the
concurrency issues.

I can take care of reverting it as I broke it.

/Jarkko

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2019-10-23 12:02 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-16 18:37 [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Sean Christopherson
2019-10-16 18:37 ` [PATCH for_v23 v3 01/12] x86/sgx: Pass EADD the kernel's virtual address for the source page Sean Christopherson
2019-10-18  9:57   ` Jarkko Sakkinen
2019-10-22  3:22     ` Sean Christopherson
2019-10-23 11:57       ` Jarkko Sakkinen
2019-10-16 18:37 ` [PATCH for_v23 v3 02/12] x86/sgx: Check the validity of the source page address for EADD Sean Christopherson
2019-10-16 18:37 ` [PATCH for_v23 v3 03/12] x86/sgx: Fix EEXTEND error handling Sean Christopherson
2019-10-18 10:42   ` Jarkko Sakkinen
2019-10-16 18:37 ` [PATCH for_v23 v3 04/12] x86/sgx: Drop mmap_sem before EEXTENDing an enclave page Sean Christopherson
2019-10-18 10:04   ` Jarkko Sakkinen
2019-10-16 18:37 ` [PATCH for_v23 v3 05/12] x86/sgx: Remove redundant message from WARN on non-emtpy mm_list Sean Christopherson
2019-10-18 12:08   ` Jarkko Sakkinen
2019-10-16 18:37 ` [PATCH for_v23 v3 06/12] x86/sgx: Fix a memory leak in sgx_encl_destroy() Sean Christopherson
2019-10-18 12:17   ` Jarkko Sakkinen
2019-10-16 18:37 ` [PATCH for_v23 v3 07/12] x86/sgx: WARN on any non-zero return from __eremove() Sean Christopherson
2019-10-16 18:37 ` [PATCH for_v23 v3 08/12] x86/sgx: WARN only once if EREMOVE fails Sean Christopherson
2019-10-16 18:37 ` [PATCH for_v23 v3 09/12] x86/sgx: Split second half of sgx_free_page() to a separate helper Sean Christopherson
2019-10-18 10:06   ` Jarkko Sakkinen
2019-10-22  3:36     ` Sean Christopherson
2019-10-23 11:59       ` Jarkko Sakkinen
2019-10-16 18:37 ` [PATCH for_v23 v3 10/12] x86/sgx: Use the post-reclaim variant of __sgx_free_page() Sean Christopherson
2019-10-16 18:37 ` [PATCH for_v23 v3 11/12] x86/sgx: Don't update free page count if EPC section allocation fails Sean Christopherson
2019-10-16 18:37 ` [PATCH for_v23 v3 12/12] x86/sgx: Reinstate per EPC section free page counts Sean Christopherson
2019-10-18 12:49   ` Jarkko Sakkinen
2019-10-18 12:55     ` Jarkko Sakkinen
2019-10-18 14:30     ` Sean Christopherson
2019-10-21 11:19       ` Jarkko Sakkinen
2019-10-22 19:35         ` Sean Christopherson
2019-10-23 12:02           ` Jarkko Sakkinen
2019-10-17 18:10 ` [PATCH for_v23 v3 00/12] x86/sgx: Bug fixes for v23 Jarkko Sakkinen
2019-10-17 18:12   ` Jarkko Sakkinen
2019-10-18 13:13   ` Jarkko Sakkinen
