All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 0/5] SGX shmem backing store issue
@ 2022-05-09 21:47 Reinette Chatre
  2022-05-09 21:47 ` [PATCH V2 1/5] x86/sgx: Disconnect backing page references from dirty status Reinette Chatre
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Reinette Chatre @ 2022-05-09 21:47 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

First version of series was submitted with incomplete fixes as RFC V1:
https://lore.kernel.org/linux-sgx/cover.1651171455.git.reinette.chatre@intel.com/

Changes since RFC V1:
- Remaining issue was root-caused with debugging help from Dave.
  Patch 4/5 is new and eliminates all occurences of the ENCLS[ELDU] related
  WARN.
- Drop "x86/sgx: Do not free backing memory on ENCLS[ELDU] failure" from
  series. ENCLS[ELDU] failure is not recoverable so it serves no purpose to
  keep the backing memory that could not be restored to the enclave.
- Patch 1/5 is new and refactors sgx_encl_put_backing() to only put
  references to pages and not also mark the pages as dirty. (Dave)
- Mark PCMD page dirty before (not after) writing data to it. (Dave)
- Patch 5/5 is new and adds debug code to WARN if PCMD page is ever
  found to contain data after it is truncated. (Dave)

Hi Everybody,

Haitao reported encountering the following WARN while stress testing SGX
with the SGX2 series [1] applied:

ELDU returned 1073741837 (0x4000000d)
WARNING: CPU: 72 PID: 24407 at arch/x86/kernel/cpu/sgx/encl.c:81 sgx_encl_eldu+0x3cf/0x400
...
Call Trace:
<TASK>
? xa_load+0x6e/0xa0
__sgx_encl_load_page+0x3d/0x80
sgx_encl_load_page_in_vma+0x4a/0x60
sgx_vma_fault+0x7f/0x3b0
? sysvec_call_function_single+0x66/0xd0
? asm_sysvec_call_function_single+0x12/0x20
__do_fault+0x39/0x110
__handle_mm_fault+0x1222/0x15a0
handle_mm_fault+0xdb/0x2c0
do_user_addr_fault+0x1d1/0x650
? exit_to_user_mode_prepare+0x45/0x170
exc_page_fault+0x76/0x170
? asm_exc_page_fault+0x8/0x30
asm_exc_page_fault+0x1e/0x30
...
</TASK>

ENCLS[ELDU] is returning a #GP when attempting to load an enclave
page from the backing store into the enclave.

Haitao's stress testing involves running two concurrent loops of the SGX2
selftests on a system with 4GB EPC memory. One of the tests is modified
to reduce the oversubscription heap size:

diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index d480c2dd2858..12008789325b 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -398,7 +398,7 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
 	 * Create enclave with additional heap that is as big as all
 	 * available physical SGX memory.
 	 */
-	total_mem = get_total_epc_mem();
+	total_mem = get_total_epc_mem()/16;
 	ASSERT_NE(total_mem, 0);
 	TH_LOG("Creating an enclave with %lu bytes heap may take a while ...",
 	       total_mem);

If the the test compiled with above snippet is renamed as "test_sgx_small"
and the original renamed as "test_sgx_large" the two concurrent loops are
run as follows:

(for i in $(seq 1 999); do echo "Iteration $i"; sudo ./test_sgx_large; done ) > log.large 2>&1
(for i in $(seq 1 9999); do echo "Iteration $i"; sudo ./test_sgx_small; done ) > log.small 2>&1

If the SGX driver is modified to always WARN when ENCLS[ELDU] encounters a #GP
(see below) then the WARN appears after a few iterations of "test_sgx_large"
and shows up throughout the testing.

diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index 99004b02e2ed..68c1dbc84ed3 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -18,7 +18,7 @@
 #define ENCLS_WARN(r, name) {						  \
 	do {								  \
 		int _r = (r);						  \
-		WARN_ONCE(_r, "%s returned %d (0x%x)\n", (name), _r, _r); \
+		WARN(_r, "%s returned %d (0x%x)\n", (name), _r, _r); \
 	} while (0);							  \
 }


I learned the following during investigation of the issue:
* Reverting commit 08999b2489b4 ("x86/sgx: Free backing memory after
  faulting the enclave page") resolves the issue. With that commit
  reverted the concurrent selftest loops can run to completion without
  encountering any WARNs.

* The issue is also resolved if only the calls (introduced by commit
  08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave
  page") ) to sgx_encl_truncate_backing_page() within __sgx_encl_eldu()
  are disabled.

* ENCLS[ELDU] faults with #GP when the provided PCMD data is all zeroes.

The fixes in this series address scenarios where the PCMD data in the
backing store may not be correct. There are no occurences of the
WARN when the stress testing is performed with this series applied.

Thank you very much

Reinette

[1] https://lore.kernel.org/lkml/cover.1649878359.git.reinette.chatre@intel.com/

Reinette Chatre (5):
  x86/sgx: Disconnect backing page references from dirty status
  x86/sgx: Mark PCMD page as dirty when modifying contents
  x86/sgx: Obtain backing storage page with enclave mutex held
  x86/sgx: Fix race between reclaimer and page fault handler
  x86/sgx: Ensure no data in PCMD page after truncate

 arch/x86/kernel/cpu/sgx/encl.c | 108 ++++++++++++++++++++++++++++++---
 arch/x86/kernel/cpu/sgx/encl.h |   2 +-
 arch/x86/kernel/cpu/sgx/main.c |  13 ++--
 3 files changed, 109 insertions(+), 14 deletions(-)


base-commit: 672c0c5173427e6b3e2a9bbb7be51ceeec78093a
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V2 1/5] x86/sgx: Disconnect backing page references from dirty status
  2022-05-09 21:47 [PATCH V2 0/5] SGX shmem backing store issue Reinette Chatre
@ 2022-05-09 21:47 ` Reinette Chatre
  2022-05-10 20:12   ` Dave Hansen
  2022-05-09 21:48 ` [PATCH V2 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents Reinette Chatre
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Reinette Chatre @ 2022-05-09 21:47 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

sgx_encl_put_backing() is used to release references to the
backing storage and, optionally, mark both backing store pages
as dirty.

Managing references and dirty status together has two consequences:
1) Both backing store pages are marked as dirty, even if only one of
   the backing store pages are changed.
2) Backing store pages are marked dirty after the important data
   has been written to it. This is against the custom of marking
   pages as dirty before important data are written to them.

Remove the option to sgx_encl_put_backing() to set the backing
pages as dirty and set the needed pages as dirty right before
receiving important data. This aligns the usage with the
existing custom and prepares the code for a following change
where only one of the backing pages need to be marked as dirty.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 10 ++--------
 arch/x86/kernel/cpu/sgx/encl.h |  2 +-
 arch/x86/kernel/cpu/sgx/main.c |  6 ++++--
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7c63a1911fae..398695a20605 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -94,7 +94,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	kunmap_atomic(pcmd_page);
 	kunmap_atomic((void *)(unsigned long)pginfo.contents);
 
-	sgx_encl_put_backing(&b, false);
+	sgx_encl_put_backing(&b);
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
@@ -645,15 +645,9 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 /**
  * sgx_encl_put_backing() - Unpin the backing storage
  * @backing:	data for accessing backing storage for the page
- * @do_write:	mark pages dirty
  */
-void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
+void sgx_encl_put_backing(struct sgx_backing *backing)
 {
-	if (do_write) {
-		set_page_dirty(backing->pcmd);
-		set_page_dirty(backing->contents);
-	}
-
 	put_page(backing->pcmd);
 	put_page(backing->contents);
 }
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..d44e7372151f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -107,7 +107,7 @@ void sgx_encl_release(struct kref *ref);
 int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
 int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 			 struct sgx_backing *backing);
-void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
+void sgx_encl_put_backing(struct sgx_backing *backing);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8e4bc6453d26..fad3d6c4756e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -190,6 +190,8 @@ static int __sgx_encl_ewb(struct sgx_epc_page *epc_page, void *va_slot,
 	pginfo.metadata = (unsigned long)kmap_atomic(backing->pcmd) +
 			  backing->pcmd_offset;
 
+	set_page_dirty(backing->pcmd);
+	set_page_dirty(backing->contents);
 	ret = __ewb(&pginfo, sgx_get_epc_virt_addr(epc_page), va_slot);
 
 	kunmap_atomic((void *)(unsigned long)(pginfo.metadata -
@@ -320,7 +322,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 
-		sgx_encl_put_backing(&secs_backing, true);
+		sgx_encl_put_backing(&secs_backing);
 	}
 
 out:
@@ -411,7 +413,7 @@ static void sgx_reclaim_pages(void)
 
 		encl_page = epc_page->owner;
 		sgx_reclaimer_write(epc_page, &backing[i]);
-		sgx_encl_put_backing(&backing[i], true);
+		sgx_encl_put_backing(&backing[i]);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V2 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents
  2022-05-09 21:47 [PATCH V2 0/5] SGX shmem backing store issue Reinette Chatre
  2022-05-09 21:47 ` [PATCH V2 1/5] x86/sgx: Disconnect backing page references from dirty status Reinette Chatre
@ 2022-05-09 21:48 ` Reinette Chatre
  2022-05-11 10:43   ` Jarkko Sakkinen
  2022-05-09 21:48 ` [PATCH V2 3/5] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Reinette Chatre @ 2022-05-09 21:48 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Recent commit 08999b2489b4 ("x86/sgx: Free backing memory
after faulting the enclave page") expanded __sgx_encl_eldu()
to clear an enclave page's PCMD (Paging Crypto MetaData)
from the PCMD page in the backing store after the enclave
page is restored to the enclave.

Since the PCMD page in the backing store is modified the page
should be marked as dirty to ensure the modified data is retained.

Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC v1:
- Do not set dirty bit on enclave page since it is not being
  written to and always will be discarded.  (Dave)

 arch/x86/kernel/cpu/sgx/encl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 398695a20605..2521d64e8bcf 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -83,6 +83,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 		ret = -EFAULT;
 	}
 
+	set_page_dirty(b.pcmd);
 	memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
 
 	/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V2 3/5] x86/sgx: Obtain backing storage page with enclave mutex held
  2022-05-09 21:47 [PATCH V2 0/5] SGX shmem backing store issue Reinette Chatre
  2022-05-09 21:47 ` [PATCH V2 1/5] x86/sgx: Disconnect backing page references from dirty status Reinette Chatre
  2022-05-09 21:48 ` [PATCH V2 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents Reinette Chatre
@ 2022-05-09 21:48 ` Reinette Chatre
  2022-05-11 11:13   ` Jarkko Sakkinen
  2022-05-09 21:48 ` [PATCH V2 4/5] x86/sgx: Fix race between reclaimer and page fault handler Reinette Chatre
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Reinette Chatre @ 2022-05-09 21:48 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
instruction faulting with a #GP.

The WARN is encountered when the reclaimer evicts a range of
pages from the enclave when the same pages are faulted back
right away.

The SGX backing storage is accessed on two paths: when there
are insufficient free pages in the EPC the reclaimer works
to move enclave pages to the backing storage and as enclaves
access pages that have been moved to the backing storage
they are retrieved from there as part of page fault handling.

An oversubscribed SGX system will often run the reclaimer and
page fault handler concurrently and needs to ensure that the
backing store is accessed safely between the reclaimer and
the page fault handler. This is not the case because the
reclaimer accesses the backing store without the enclave mutex
while the page fault handler accesses the backing store with
the enclave mutex.

Two scenarios are considered to describe the consequences of
the unsafe access:
(a) Scenario: Fault a page right after it was reclaimed.
    Consequence: The page is faulted by loading outdated data
    into the enclave using ENCLS[ELDU] that faults when it checks
    the MAC and PCMD data.
(b) Scenario: Fault a page while reclaiming another page that
    share a PCMD page.
    Consequence: A race between the reclaimer and page fault
    handler, the reclaimer attempting to access a PCMD at the
    same time it is truncated by the page fault handler. This
    could result in lost PCMD data. Data may still be
    lost if the reclaimer wins the race, this is addressed in
    the following patch.

The reclaimer accesses pages from the backing storage without
holding the enclave mutex and runs the risk of concurrently
accessing the backing storage with the page fault handler that
does access the backing storage with the enclave mutex held.

The two scenarios ((a) and (b)) are shown below.

In scenario (a), a page is written to the backing store
by the reclaimer and then immediately faulted back, before
the reclaimer is able to set the dirty bit of the page:

sgx_reclaim_pages() {                    sgx_vma_fault() {
...                                      ...
  sgx_reclaimer_write() {
    mutex_lock(&encl->lock);
    /* Write data to backing store */
    mutex_unlock(&encl->lock);
  }
                                         mutex_lock(&encl->lock);
                                         __sgx_encl_eldu() {
                                           ...
                                           /* Enclave backing store
                                            * page not released
                                            * nor marked dirty -
                                            * contents may not be
                                            * up to date.
                                            */
                                           sgx_encl_get_backing();
                                           ...
                                           /*
                                            * Enclave data restored
                                            * from backing store
                                            * and PCMD pages that
                                            * are not up to date.
                                            * ENCLS[ELDU] faults
                                            * because of MAC or PCMD
                                            * checking failure.
                                            */
                                           sgx_encl_put_backing();
                                         }
                                         ...
/* set page dirty */
sgx_encl_put_backing();
...
                                         mutex_unlock(&encl->lock);
}                                        }

In scenario (b) below a PCMD page is truncated from the backing
store after all its pages have been loaded in to the enclave
at the same time the PCMD page is loaded from the backing store
when one of its pages are reclaimed:

sgx_reclaim_pages() {              sgx_vma_fault() {
                                     ...
                                     mutex_lock(&encl->lock);
                                     ...
                                     __sgx_encl_eldu() {
                                       ...
                                       if (pcmd_page_empty) {
/*
 * EPC page being reclaimed              /*
 * shares a PCMD page with an             * PCMD page truncated
 * enclave page that is being             * while requested from
 * faulted in.                            * reclaimer.
 */                                       */
sgx_encl_get_backing()  <---------->      sgx_encl_truncate_backing_page()
                                        }
                                       mutex_unlock(&encl->lock);
}                                    }

In scenario (b) there is a race between the reclaimer and the page fault
handler when the reclaimer attempts to get access to the same PCMD page
that is being truncated. This could result in the reclaimer writing to
the PCMD page that is then truncated, causing the PCMD data to be lost,
or in a new PCMD page being allocated. The lost PCMD data may still occur
after protecting the backing store access with the mutex - this is fixed
in the next patch. By ensuring the backing store is accessed with the mutex
held the enclave page state can be made accurate with the
SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page
is in the process of being reclaimed.

Consistently protect the reclaimer's backing store access with the
enclave's mutex to ensure that it can safely run concurrently with the
page fault handler.

Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Reported-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index fad3d6c4756e..a60f8b2780fb 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	sgx_encl_ewb(epc_page, backing);
 	encl_page->epc_page = NULL;
 	encl->secs_child_cnt--;
+	sgx_encl_put_backing(backing);
 
 	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
 		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
@@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void)
 			goto skip;
 
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
+
+		mutex_lock(&encl_page->encl->lock);
 		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
-		if (ret)
+		if (ret) {
+			mutex_unlock(&encl_page->encl->lock);
 			goto skip;
+		}
 
-		mutex_lock(&encl_page->encl->lock);
 		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
 		mutex_unlock(&encl_page->encl->lock);
 		continue;
@@ -413,7 +417,6 @@ static void sgx_reclaim_pages(void)
 
 		encl_page = epc_page->owner;
 		sgx_reclaimer_write(epc_page, &backing[i]);
-		sgx_encl_put_backing(&backing[i]);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V2 4/5] x86/sgx: Fix race between reclaimer and page fault handler
  2022-05-09 21:47 [PATCH V2 0/5] SGX shmem backing store issue Reinette Chatre
                   ` (2 preceding siblings ...)
  2022-05-09 21:48 ` [PATCH V2 3/5] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
@ 2022-05-09 21:48 ` Reinette Chatre
  2022-05-10 21:55   ` Dave Hansen
  2022-05-09 21:48 ` [PATCH V2 5/5] x86/sgx: Ensure no data in PCMD page after truncate Reinette Chatre
  2022-05-11 15:00 ` [PATCH V2 0/5] SGX shmem backing store issue Haitao Huang
  5 siblings, 1 reply; 19+ messages in thread
From: Reinette Chatre @ 2022-05-09 21:48 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
instruction faulting with a #GP.

The WARN is encountered when the reclaimer evicts a range of
pages from the enclave when the same pages are faulted back right away.

Consider two enclave pages (ENCLAVE_A and ENCLAVE_B)
sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the
enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains
just one entry, that of ENCLAVE_B.

Scenario proceeds where ENCLAVE_A is being evicted from the enclave
while ENCLAVE_B is faulted in.

sgx_reclaim_pages() {

  ...

  /*
   * Reclaim ENCLAVE_A
   */
  mutex_lock(&encl->lock);
  /*
   * Get a reference to ENCLAVE_A's
   * shmem page where enclave page
   * encrypted data will be stored
   * as well as a reference to the
   * enclave page's PCMD data page,
   * PCMD_AB.
   * Release mutex before writing
   * any data to the shmem pages.
   */
  sgx_encl_get_backing(...);
  encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
  mutex_unlock(&encl->lock);

                                    /*
                                     * Fault ENCLAVE_B
                                     */

                                    sgx_vma_fault() {

                                      mutex_lock(&encl->lock);
                                      /*
                                       * Get reference to
                                       * ENCLAVE_B's shmem page
                                       * as well as PCMD_AB.
                                       */
                                      sgx_encl_get_backing(...)
                                     /*
                                      * Load page back into
                                      * enclave via ELDU.
                                      */
                                     /*
                                      * Release reference to
                                      * ENCLAVE_B' shmem page and
                                      * PCMD_AB.
                                      */
                                     sgx_encl_put_backing(...);
                                     /*
                                      * PCMD_AB is found empty so
                                      * it and ENCLAVE_B's shmem page
                                      * are truncated.
                                      */
                                     /* Truncate ENCLAVE_B backing page */
                                     sgx_encl_truncate_backing_page();
                                     /* Truncate PCMD_AB */
                                     sgx_encl_truncate_backing_page();

                                     mutex_unlock(&encl->lock);

                                     ...
                                     }
  mutex_lock(&encl->lock);
  encl_page->desc &=
       ~SGX_ENCL_PAGE_BEING_RECLAIMED;
  /*
  * Write encrypted contents of
  * ENCLAVE_A to ENCLAVE_A shmem
  * page and its PCMD data to
  * PCMD_AB.
  */
  sgx_encl_put_backing(...)

  /*
   * Reference to PCMD_AB is
   * dropped and it is truncated.
   * ENCLAVE_A's PCMD data is lost.
   */
  mutex_unlock(&encl->lock);
}

What happens next depends on whether it is ENCLAVE_A being faulted
in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting
with a #GP.

If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called
a new PCMD page is allocated and providing the empty PCMD data for
ENCLAVE_A would cause ENCLS[ELDU] to #GP

If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the
reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction
would #GP during its checks of the PCMD value and the WARN would be
encountered.

Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time
it obtains a reference to the backing store pages of an enclave page it
is in the process of reclaiming, fix the race by only truncating the PCMD
page after ensuring that no page sharing the PCMD page is in the process
of being reclaimed.

Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
Reported-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 90 +++++++++++++++++++++++++++++++++-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 2521d64e8bcf..d1d4e8572702 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,6 +12,87 @@
 #include "encls.h"
 #include "sgx.h"
 
+#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd))
+/*
+ * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to
+ * determine the page index associated with the first PCMD entry
+ * within a PCMD page.
+ */
+#define PCMD_FIRST_MASK GENMASK(4, 0)
+
+/**
+ * pcmd_page_in_use() - Query if any enclave page associated with a PCMD
+ *                      page is in process of being reclaimed.
+ * @encl:        Enclave to which PCMD page belongs
+ * @start_addr:  Address of enclave page using first entry within the PCMD page
+ *
+ * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
+ * stored. The PCMD data of a reclaimed enclave page contains enough
+ * information for the processor to verify the page at the time
+ * it is loaded back into the Enclave Page Cache (EPC).
+ *
+ * The backing storage to which enclave pages are reclaimed is laid out as
+ * follows:
+ * All enclave pages:SECS page:PCMD pages
+ *
+ * Each PCMD page contains the PCMD metadata of
+ * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
+ *
+ * A PCMD page can only be truncated if it is (a) empty, and (b) no enclave
+ * page sharing the PCMD page is in the process of being reclaimed.
+ *
+ * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
+ * intends to reclaim that enclave page - it means that the PCMD data
+ * associated with that enclave page is about to get some data and thus
+ * even if the PCMD page is empty, it should not be truncated.
+ *
+ * Return: 1 the PCMD page is in use, 0 if is not in use, -EFAULT on error
+ */
+static int pcmd_page_in_use(struct sgx_encl *encl,
+			    unsigned long start_addr)
+{
+	struct sgx_encl_page *entry;
+	unsigned long addr;
+	int reclaimed = 0;
+	int i;
+
+	/*
+	 * PCMD_FIRST_MASK is based on number of PCMD entries within
+	 * PCMD page being 32.
+	 */
+	BUILD_BUG_ON(PCMDS_PER_PAGE != 32);
+
+	for (i = 0; i < PCMDS_PER_PAGE; i++) {
+		addr = start_addr + i * PAGE_SIZE;
+
+		/*
+		 * Stop when reaching the SECS page - it does not
+		 * have a page_array entry and its reclaim is
+		 * started and completed with enclave mutex held so
+		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
+		 * flag.
+		 */
+		if (addr == encl->base + encl->size)
+			break;
+
+		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+		if (!entry)
+			return -EFAULT;
+
+		/*
+		 * VA page slot ID uses same bit as the flag so it is important
+		 * to ensure that the page is not already in backing store.
+		 */
+		if (entry->epc_page &&
+		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
+			reclaimed = 1;
+			break;
+		}
+	}
+
+	return reclaimed;
+}
+
 /*
  * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's
  * follow right after the EPC data in the backing storage. In addition to the
@@ -47,6 +128,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK;
 	struct sgx_encl *encl = encl_page->encl;
 	pgoff_t page_index, page_pcmd_off;
+	unsigned long pcmd_first_page;
 	struct sgx_pageinfo pginfo;
 	struct sgx_backing b;
 	bool pcmd_page_empty;
@@ -58,6 +140,12 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	else
 		page_index = PFN_DOWN(encl->size);
 
+	/*
+	 * Address of enclave page using the first entry within the
+	 * PCMD page.
+	 */
+	pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base;
+
 	page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
 
 	ret = sgx_encl_get_backing(encl, page_index, &b);
@@ -99,7 +187,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty)
+	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page))
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
 
 	return ret;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V2 5/5] x86/sgx: Ensure no data in PCMD page after truncate
  2022-05-09 21:47 [PATCH V2 0/5] SGX shmem backing store issue Reinette Chatre
                   ` (3 preceding siblings ...)
  2022-05-09 21:48 ` [PATCH V2 4/5] x86/sgx: Fix race between reclaimer and page fault handler Reinette Chatre
@ 2022-05-09 21:48 ` Reinette Chatre
  2022-05-11 10:36   ` Jarkko Sakkinen
  2022-05-11 15:00 ` [PATCH V2 0/5] SGX shmem backing store issue Haitao Huang
  5 siblings, 1 reply; 19+ messages in thread
From: Reinette Chatre @ 2022-05-09 21:48 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

A PCMD (Paging Crypto MetaData) page contains the PCMD
structures of enclave pages that have been encrypted and
moved to the shmem backing store. When all enclave pages
sharing a PCMD page are loaded in the enclave, there is no
need for the PCMD page and it can be truncated from the
backing store.

A few issues appeared around the truncation of PCMD pages. The
known issues have been addressed but the PCMD handling code could
be made more robust by loudly complaining if any new issue appears
in this area.

Add a check that will complain once with a WARN if the PCMD page is not
actually empty after it has been truncated. There should never be data
in the PCMD page at this point since it is always updated with the
enclave mutex held.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index d1d4e8572702..af972dbad965 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -183,12 +183,19 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	kunmap_atomic(pcmd_page);
 	kunmap_atomic((void *)(unsigned long)pginfo.contents);
 
+	get_page(b.pcmd);
 	sgx_encl_put_backing(&b);
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page))
+	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page)) {
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
+		pcmd_page = kmap_atomic(b.pcmd);
+		WARN_ON_ONCE(memchr_inv(pcmd_page, 0, PAGE_SIZE));
+		kunmap_atomic(pcmd_page);
+	}
+
+	put_page(b.pcmd);
 
 	return ret;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 1/5] x86/sgx: Disconnect backing page references from dirty status
  2022-05-09 21:47 ` [PATCH V2 1/5] x86/sgx: Disconnect backing page references from dirty status Reinette Chatre
@ 2022-05-10 20:12   ` Dave Hansen
  2022-05-10 23:00     ` Reinette Chatre
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2022-05-10 20:12 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 5/9/22 14:47, Reinette Chatre wrote:
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -190,6 +190,8 @@ static int __sgx_encl_ewb(struct sgx_epc_page *epc_page, void *va_slot,
>  	pginfo.metadata = (unsigned long)kmap_atomic(backing->pcmd) +
>  			  backing->pcmd_offset;
>  
> +	set_page_dirty(backing->pcmd);
> +	set_page_dirty(backing->contents);
>  	ret = __ewb(&pginfo, sgx_get_epc_virt_addr(epc_page), va_slot);

I think I may have led you astray on this.  It's typical to do:

	lock_page(page)
	set_page_dirty(backing->contents);
	// modify page
	unlock_page(page)

That's safe because laundering the page requires locking it.  But, the
page lock isn't held in this case.  So, it's possible to do:

	get_page(page) // page can be truncated, but not reclaimed
	set_page_dirty(page)
	// race: something launders page here
	__ewb(page)
	put_page(page)
	// page is reclaimed, __ewb() contents were thrown away

__shmem_rw() has a similar pattern to what __sgx_encl_ewb() uses.  It
doesn't hold a lock_page(), but does dirty the page *after* its memset()
writes to the page.

In other words, I think the page dirtying order in the previous (before
this patch) SGX code was correct.  The concept of this patch is correct,
but let's just change the dirtying order, please.

Could you also please add a cc: stable@ and a Link: to either this
message or the original message where I suggested this, like this
(temporary) commit has:

https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=5ee32d5d3f3cdb943b01992d2ffc5093b139d023

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 4/5] x86/sgx: Fix race between reclaimer and page fault handler
  2022-05-09 21:48 ` [PATCH V2 4/5] x86/sgx: Fix race between reclaimer and page fault handler Reinette Chatre
@ 2022-05-10 21:55   ` Dave Hansen
  2022-05-10 23:01     ` Reinette Chatre
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2022-05-10 21:55 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 5/9/22 14:48, Reinette Chatre wrote:
...
> +#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd))
> +/*
> + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to
> + * determine the page index associated with the first PCMD entry
> + * within a PCMD page.
> + */
> +#define PCMD_FIRST_MASK GENMASK(4, 0)
> +
> +/**
> + * pcmd_page_in_use() - Query if any enclave page associated with a PCMD
> + *                      page is in process of being reclaimed.

The name is a bit too generic for what it does.

> + * @encl:        Enclave to which PCMD page belongs
> + * @start_addr:  Address of enclave page using first entry within the PCMD page
> + *
> + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
> + * stored. The PCMD data of a reclaimed enclave page contains enough
> + * information for the processor to verify the page at the time
> + * it is loaded back into the Enclave Page Cache (EPC).
> + *
> + * The backing storage to which enclave pages are reclaimed is laid out as
> + * follows:
> + * All enclave pages:SECS page:PCMD pages
> + *
> + * Each PCMD page contains the PCMD metadata of
> + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
> + *
> + * A PCMD page can only be truncated if it is (a) empty, and (b) no enclave
> + * page sharing the PCMD page is in the process of being reclaimed.

Let's also point out that (b) is _because_ the page is about to become
non-empty.

> + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
> + * intends to reclaim that enclave page - it means that the PCMD data
> + * associated with that enclave page is about to get some data and thus
> + * even if the PCMD page is empty, it should not be truncated.
> + *
> + * Return: 1 the PCMD page is in use, 0 if is not in use, -EFAULT on error
> + */
> +static int pcmd_page_in_use(struct sgx_encl *encl,
> +			    unsigned long start_addr)
> +{
> +	struct sgx_encl_page *entry;
> +	unsigned long addr;
> +	int reclaimed = 0;
> +	int i;

Can 'addr' and 'entry' be declared within the loop instead?

> +
> +	/*
> +	 * PCMD_FIRST_MASK is based on number of PCMD entries within
> +	 * PCMD page being 32.
> +	 */
> +	BUILD_BUG_ON(PCMDS_PER_PAGE != 32);
> +
> +	for (i = 0; i < PCMDS_PER_PAGE; i++) {
> +		addr = start_addr + i * PAGE_SIZE;
> +
> +		/*
> +		 * Stop when reaching the SECS page - it does not
> +		 * have a page_array entry and its reclaim is
> +		 * started and completed with enclave mutex held so
> +		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
> +		 * flag.
> +		 */
> +		if (addr == encl->base + encl->size)
> +			break;
> +
> +		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
> +		if (!entry)
> +			return -EFAULT;

Can xa_load() return NULL if it simply can't find an 'encl_page' at
'addr'?  In that case, there would never be a PCMD entry for the page
since it doesn't exist.  Returning -EFAULT would be a
pcmd_page_in_use()==true condition, which doesn't seem accurate.

> +		/*
> +		 * VA page slot ID uses same bit as the flag so it is important
> +		 * to ensure that the page is not already in backing store.
> +		 */

Probably a patch for another day, but isn't this a bit silly?  Are we
short of bits in ->desc or something?

> +		if (entry->epc_page &&
> +		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
> +			reclaimed = 1;
> +			break;
> +		}
> +	}
> +
> +	return reclaimed;
> +}

Could we also please add a comment about encl->lock needing to be held
over this?  Without that, there would be all kinds of trouble.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 1/5] x86/sgx: Disconnect backing page references from dirty status
  2022-05-10 20:12   ` Dave Hansen
@ 2022-05-10 23:00     ` Reinette Chatre
  0 siblings, 0 replies; 19+ messages in thread
From: Reinette Chatre @ 2022-05-10 23:00 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 5/10/2022 1:12 PM, Dave Hansen wrote:
> On 5/9/22 14:47, Reinette Chatre wrote:
>> --- a/arch/x86/kernel/cpu/sgx/main.c
>> +++ b/arch/x86/kernel/cpu/sgx/main.c
>> @@ -190,6 +190,8 @@ static int __sgx_encl_ewb(struct sgx_epc_page *epc_page, void *va_slot,
>>  	pginfo.metadata = (unsigned long)kmap_atomic(backing->pcmd) +
>>  			  backing->pcmd_offset;
>>  
>> +	set_page_dirty(backing->pcmd);
>> +	set_page_dirty(backing->contents);
>>  	ret = __ewb(&pginfo, sgx_get_epc_virt_addr(epc_page), va_slot);
> 
> I think I may have led you astray on this.  It's typical to do:
> 
> 	lock_page(page)
> 	set_page_dirty(backing->contents);
> 	// modify page
> 	unlock_page(page)
> 
> That's safe because laundering the page requires locking it.  But, the
> page lock isn't held in this case.  So, it's possible to do:
> 
> 	get_page(page) // page can be truncated, but not reclaimed
> 	set_page_dirty(page)
> 	// race: something launders page here
> 	__ewb(page)
> 	put_page(page)
> 	// page is reclaimed, __ewb() contents were thrown away
> 
> __shmem_rw() has a similar pattern to what __sgx_encl_ewb() uses.  It
> doesn't hold a lock_page(), but does dirty the page *after* its memset()
> writes to the page.
> 
> In other words, I think the page dirtying order in the previous (before
> this patch) SGX code was correct.  The concept of this patch is correct,
> but let's just change the dirtying order, please.
> 
> Could you also please add a cc: stable@ and a Link: to either this
> message or the original message where I suggested this, like this
> (temporary) commit has:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=5ee32d5d3f3cdb943b01992d2ffc5093b139d023

Will do. Thank you very much.

Reinette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 4/5] x86/sgx: Fix race between reclaimer and page fault handler
  2022-05-10 21:55   ` Dave Hansen
@ 2022-05-10 23:01     ` Reinette Chatre
  0 siblings, 0 replies; 19+ messages in thread
From: Reinette Chatre @ 2022-05-10 23:01 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 5/10/2022 2:55 PM, Dave Hansen wrote:
> On 5/9/22 14:48, Reinette Chatre wrote:
> ...
>> +#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd))
>> +/*
>> + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to
>> + * determine the page index associated with the first PCMD entry
>> + * within a PCMD page.
>> + */
>> +#define PCMD_FIRST_MASK GENMASK(4, 0)
>> +
>> +/**
>> + * pcmd_page_in_use() - Query if any enclave page associated with a PCMD
>> + *                      page is in process of being reclaimed.
> 
> The name is a bit too generic for what it does.

This function is called after a PCMD page is determined to be empty and is
about to be truncated. The question this function needs to answer is, "is this
PCMD page in use?" - that is, even though it is empty, it cannot be truncated
since there is a reference to this page (specifically from the reclaimer) and
a reference is obtained with the intent to write data to the page.

For some other name options, how about:
reclaimer_has_pcmd_ref()
reclaimer_using_pcmd()

Do any of these look better to you?

> 
>> + * @encl:        Enclave to which PCMD page belongs
>> + * @start_addr:  Address of enclave page using first entry within the PCMD page
>> + *
>> + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
>> + * stored. The PCMD data of a reclaimed enclave page contains enough
>> + * information for the processor to verify the page at the time
>> + * it is loaded back into the Enclave Page Cache (EPC).
>> + *
>> + * The backing storage to which enclave pages are reclaimed is laid out as
>> + * follows:
>> + * All enclave pages:SECS page:PCMD pages
>> + *
>> + * Each PCMD page contains the PCMD metadata of
>> + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
>> + *
>> + * A PCMD page can only be truncated if it is (a) empty, and (b) no enclave
>> + * page sharing the PCMD page is in the process of being reclaimed.
> 
> Let's also point out that (b) is _because_ the page is about to become
> non-empty.

Thank you. I will emphasize that.

> 
>> + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
>> + * intends to reclaim that enclave page - it means that the PCMD data
>> + * associated with that enclave page is about to get some data and thus
>> + * even if the PCMD page is empty, it should not be truncated.
>> + *
>> + * Return: 1 the PCMD page is in use, 0 if is not in use, -EFAULT on error
>> + */
>> +static int pcmd_page_in_use(struct sgx_encl *encl,
>> +			    unsigned long start_addr)
>> +{
>> +	struct sgx_encl_page *entry;
>> +	unsigned long addr;
>> +	int reclaimed = 0;
>> +	int i;
> 
> Can 'addr' and 'entry' be declared within the loop instead?

Yes, will do.

> 
>> +
>> +	/*
>> +	 * PCMD_FIRST_MASK is based on number of PCMD entries within
>> +	 * PCMD page being 32.
>> +	 */
>> +	BUILD_BUG_ON(PCMDS_PER_PAGE != 32);
>> +
>> +	for (i = 0; i < PCMDS_PER_PAGE; i++) {
>> +		addr = start_addr + i * PAGE_SIZE;
>> +
>> +		/*
>> +		 * Stop when reaching the SECS page - it does not
>> +		 * have a page_array entry and its reclaim is
>> +		 * started and completed with enclave mutex held so
>> +		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
>> +		 * flag.
>> +		 */
>> +		if (addr == encl->base + encl->size)
>> +			break;
>> +
>> +		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
>> +		if (!entry)
>> +			return -EFAULT;
> 
> Can xa_load() return NULL if it simply can't find an 'encl_page' at
> 'addr'?  In that case, there would never be a PCMD entry for the page
> since it doesn't exist.  Returning -EFAULT would be a
> pcmd_page_in_use()==true condition, which doesn't seem accurate.

Thank you very much for catching this. This should be:

entry = xa_load(&encl->page_array, PFN_DOWN(addr));
if (!entry)
	continue;

> 
>> +		/*
>> +		 * VA page slot ID uses same bit as the flag so it is important
>> +		 * to ensure that the page is not already in backing store.
>> +		 */
> 
> Probably a patch for another day, but isn't this a bit silly?  Are we
> short of bits in ->desc or something?

I am not familiar with the history behind this. The VA slot information
do indeed take up quite a few bits though, leaving just three bits
available.

> 
>> +		if (entry->epc_page &&
>> +		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
>> +			reclaimed = 1;
>> +			break;
>> +		}
>> +	}
>> +
>> +	return reclaimed;
>> +}
> 
> Could we also please add a comment about encl->lock needing to be held
> over this?  Without that, there would be all kinds of trouble.

Will do. I will add a "Context:" portion to the function's kernel-doc.

Reinette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 5/5] x86/sgx: Ensure no data in PCMD page after truncate
  2022-05-09 21:48 ` [PATCH V2 5/5] x86/sgx: Ensure no data in PCMD page after truncate Reinette Chatre
@ 2022-05-11 10:36   ` Jarkko Sakkinen
  2022-05-11 18:02     ` Reinette Chatre
  0 siblings, 1 reply; 19+ messages in thread
From: Jarkko Sakkinen @ 2022-05-11 10:36 UTC (permalink / raw)
  To: Reinette Chatre; +Cc: dave.hansen, linux-sgx, haitao.huang

On Mon, May 09, 2022 at 02:48:03PM -0700, Reinette Chatre wrote:
> A PCMD (Paging Crypto MetaData) page contains the PCMD
> structures of enclave pages that have been encrypted and
> moved to the shmem backing store. When all enclave pages
> sharing a PCMD page are loaded in the enclave, there is no
> need for the PCMD page and it can be truncated from the
> backing store.
> 
> A few issues appeared around the truncation of PCMD pages. The
> known issues have been addressed but the PCMD handling code could
> be made more robust by loudly complaining if any new issue appears
> in this area.
> 
> Add a check that will complain once with a WARN if the PCMD page is not
> actually empty after it has been truncated. There should never be data
> in the PCMD page at this point since it is always updated with the
> enclave mutex held.
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/encl.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index d1d4e8572702..af972dbad965 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -183,12 +183,19 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	kunmap_atomic(pcmd_page);
>  	kunmap_atomic((void *)(unsigned long)pginfo.contents);
>  
> +	get_page(b.pcmd);
>  	sgx_encl_put_backing(&b);
>  
>  	sgx_encl_truncate_backing_page(encl, page_index);
>  
> -	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page))
> +	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page)) {
>  		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
> +		pcmd_page = kmap_atomic(b.pcmd);
> +		WARN_ON_ONCE(memchr_inv(pcmd_page, 0, PAGE_SIZE));

Is WARN necessary, or would it make more sense to use pr_warn()?

It would give a better chance to collect information if "panic_on_warn" is
set for the running kernel.

> +		kunmap_atomic(pcmd_page);
> +	}
> +
> +	put_page(b.pcmd);
>  
>  	return ret;
>  }
> -- 
> 2.25.1
> 

BR, Jarkko

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents
  2022-05-09 21:48 ` [PATCH V2 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents Reinette Chatre
@ 2022-05-11 10:43   ` Jarkko Sakkinen
  2022-05-11 18:01     ` Reinette Chatre
  0 siblings, 1 reply; 19+ messages in thread
From: Jarkko Sakkinen @ 2022-05-11 10:43 UTC (permalink / raw)
  To: Reinette Chatre; +Cc: dave.hansen, linux-sgx, haitao.huang

On Mon, May 09, 2022 at 02:48:00PM -0700, Reinette Chatre wrote:
> Recent commit 08999b2489b4 ("x86/sgx: Free backing memory
> after faulting the enclave page") expanded __sgx_encl_eldu()
> to clear an enclave page's PCMD (Paging Crypto MetaData)
> from the PCMD page in the backing store after the enclave
> page is restored to the enclave.
> 
> Since the PCMD page in the backing store is modified the page
> should be marked as dirty to ensure the modified data is retained.
> 
> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since RFC v1:
> - Do not set dirty bit on enclave page since it is not being
>   written to and always will be discarded.  (Dave)
> 
>  arch/x86/kernel/cpu/sgx/encl.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 398695a20605..2521d64e8bcf 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -83,6 +83,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  		ret = -EFAULT;
>  	}
>  
> +	set_page_dirty(b.pcmd);
>  	memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
>  
>  	/*
> -- 
> 2.25.1
> 

LGTM.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 3/5] x86/sgx: Obtain backing storage page with enclave mutex held
  2022-05-09 21:48 ` [PATCH V2 3/5] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
@ 2022-05-11 11:13   ` Jarkko Sakkinen
  2022-05-11 18:02     ` Reinette Chatre
  0 siblings, 1 reply; 19+ messages in thread
From: Jarkko Sakkinen @ 2022-05-11 11:13 UTC (permalink / raw)
  To: Reinette Chatre; +Cc: dave.hansen, linux-sgx, haitao.huang

On Mon, May 09, 2022 at 02:48:01PM -0700, Reinette Chatre wrote:
> Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
> instruction faulting with a #GP.
> 
> The WARN is encountered when the reclaimer evicts a range of
> pages from the enclave when the same pages are faulted back
> right away.
> 
> The SGX backing storage is accessed on two paths: when there
> are insufficient free pages in the EPC the reclaimer works
> to move enclave pages to the backing storage and as enclaves
> access pages that have been moved to the backing storage
> they are retrieved from there as part of page fault handling.
> 
> An oversubscribed SGX system will often run the reclaimer and
> page fault handler concurrently and needs to ensure that the
> backing store is accessed safely between the reclaimer and
> the page fault handler. This is not the case because the
> reclaimer accesses the backing store without the enclave mutex
> while the page fault handler accesses the backing store with
> the enclave mutex.
> 
> Two scenarios are considered to describe the consequences of
> the unsafe access:
> (a) Scenario: Fault a page right after it was reclaimed.
>     Consequence: The page is faulted by loading outdated data
>     into the enclave using ENCLS[ELDU] that faults when it checks
>     the MAC and PCMD data.
> (b) Scenario: Fault a page while reclaiming another page that
>     share a PCMD page.
>     Consequence: A race between the reclaimer and page fault
>     handler, the reclaimer attempting to access a PCMD at the
>     same time it is truncated by the page fault handler. This
>     could result in lost PCMD data. Data may still be
>     lost if the reclaimer wins the race, this is addressed in
>     the following patch.
> 
> The reclaimer accesses pages from the backing storage without
> holding the enclave mutex and runs the risk of concurrently
> accessing the backing storage with the page fault handler that
> does access the backing storage with the enclave mutex held.
> 
> The two scenarios ((a) and (b)) are shown below.
> 
> In scenario (a), a page is written to the backing store
> by the reclaimer and then immediately faulted back, before
> the reclaimer is able to set the dirty bit of the page:
> 
> sgx_reclaim_pages() {                    sgx_vma_fault() {
> ...                                      ...
>   sgx_reclaimer_write() {
>     mutex_lock(&encl->lock);
>     /* Write data to backing store */
>     mutex_unlock(&encl->lock);
>   }
>                                          mutex_lock(&encl->lock);
>                                          __sgx_encl_eldu() {
>                                            ...
>                                            /* Enclave backing store
>                                             * page not released
>                                             * nor marked dirty -
>                                             * contents may not be
>                                             * up to date.
>                                             */
>                                            sgx_encl_get_backing();
>                                            ...
>                                            /*
>                                             * Enclave data restored
>                                             * from backing store
>                                             * and PCMD pages that
>                                             * are not up to date.
>                                             * ENCLS[ELDU] faults
>                                             * because of MAC or PCMD
>                                             * checking failure.
>                                             */
>                                            sgx_encl_put_backing();
>                                          }
>                                          ...
> /* set page dirty */
> sgx_encl_put_backing();
> ...
>                                          mutex_unlock(&encl->lock);
> }                                        }
> 
> In scenario (b) below a PCMD page is truncated from the backing
> store after all its pages have been loaded in to the enclave
> at the same time the PCMD page is loaded from the backing store
> when one of its pages are reclaimed:
> 
> sgx_reclaim_pages() {              sgx_vma_fault() {
>                                      ...
>                                      mutex_lock(&encl->lock);
>                                      ...
>                                      __sgx_encl_eldu() {
>                                        ...
>                                        if (pcmd_page_empty) {
> /*
>  * EPC page being reclaimed              /*
>  * shares a PCMD page with an             * PCMD page truncated
>  * enclave page that is being             * while requested from
>  * faulted in.                            * reclaimer.
>  */                                       */
> sgx_encl_get_backing()  <---------->      sgx_encl_truncate_backing_page()
>                                         }
>                                        mutex_unlock(&encl->lock);
> }                                    }
> 
> In scenario (b) there is a race between the reclaimer and the page fault
> handler when the reclaimer attempts to get access to the same PCMD page
> that is being truncated. This could result in the reclaimer writing to
> the PCMD page that is then truncated, causing the PCMD data to be lost,
> or in a new PCMD page being allocated. The lost PCMD data may still occur
> after protecting the backing store access with the mutex - this is fixed
> in the next patch. By ensuring the backing store is accessed with the mutex
> held the enclave page state can be made accurate with the
> SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page
> is in the process of being reclaimed.
> 
> Consistently protect the reclaimer's backing store access with the
> enclave's mutex to ensure that it can safely run concurrently with the
> page fault handler.
> 
> Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
> Reported-by: Haitao Huang <haitao.huang@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index fad3d6c4756e..a60f8b2780fb 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>  	sgx_encl_ewb(epc_page, backing);
>  	encl_page->epc_page = NULL;
>  	encl->secs_child_cnt--;
> +	sgx_encl_put_backing(backing);
>  
>  	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
>  		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
> @@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void)
>  			goto skip;
>  
>  		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
> +
> +		mutex_lock(&encl_page->encl->lock);
>  		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
> -		if (ret)
> +		if (ret) {
> +			mutex_unlock(&encl_page->encl->lock);
>  			goto skip;
> +		}
>  
> -		mutex_lock(&encl_page->encl->lock);
>  		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
>  		mutex_unlock(&encl_page->encl->lock);
>  		continue;
> @@ -413,7 +417,6 @@ static void sgx_reclaim_pages(void)
>  
>  		encl_page = epc_page->owner;
>  		sgx_reclaimer_write(epc_page, &backing[i]);
> -		sgx_encl_put_backing(&backing[i]);
>  
>  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
>  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> -- 
> 2.25.1
> 

I get the locking part but why is the move of sgx_encl_put_backing
relevant?

BR, Jarkko

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 0/5] SGX shmem backing store issue
  2022-05-09 21:47 [PATCH V2 0/5] SGX shmem backing store issue Reinette Chatre
                   ` (4 preceding siblings ...)
  2022-05-09 21:48 ` [PATCH V2 5/5] x86/sgx: Ensure no data in PCMD page after truncate Reinette Chatre
@ 2022-05-11 15:00 ` Haitao Huang
  5 siblings, 0 replies; 19+ messages in thread
From: Haitao Huang @ 2022-05-11 15:00 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx, Reinette Chatre; +Cc: haitao.huang

On Mon, 09 May 2022 16:47:58 -0500, Reinette Chatre  
<reinette.chatre@intel.com> wrote:

> First version of series was submitted with incomplete fixes as RFC V1:
> https://lore.kernel.org/linux-sgx/cover.1651171455.git.reinette.chatre@intel.com/
>
> Changes since RFC V1:
> - Remaining issue was root-caused with debugging help from Dave.
>   Patch 4/5 is new and eliminates all occurences of the ENCLS[ELDU]  
> related
>   WARN.
> - Drop "x86/sgx: Do not free backing memory on ENCLS[ELDU] failure" from
>   series. ENCLS[ELDU] failure is not recoverable so it serves no purpose  
> to
>   keep the backing memory that could not be restored to the enclave.
> - Patch 1/5 is new and refactors sgx_encl_put_backing() to only put
>   references to pages and not also mark the pages as dirty. (Dave)
> - Mark PCMD page dirty before (not after) writing data to it. (Dave)
> - Patch 5/5 is new and adds debug code to WARN if PCMD page is ever
>   found to contain data after it is truncated. (Dave)
>
> Hi Everybody,
>
> Haitao reported encountering the following WARN while stress testing SGX
> with the SGX2 series [1] applied:
>
> ELDU returned 1073741837 (0x4000000d)
> WARNING: CPU: 72 PID: 24407 at arch/x86/kernel/cpu/sgx/encl.c:81  
> sgx_encl_eldu+0x3cf/0x400
> ...
> Call Trace:
> <TASK>
> ? xa_load+0x6e/0xa0
> __sgx_encl_load_page+0x3d/0x80
> sgx_encl_load_page_in_vma+0x4a/0x60
> sgx_vma_fault+0x7f/0x3b0
> ? sysvec_call_function_single+0x66/0xd0
> ? asm_sysvec_call_function_single+0x12/0x20
> __do_fault+0x39/0x110
> __handle_mm_fault+0x1222/0x15a0
> handle_mm_fault+0xdb/0x2c0
> do_user_addr_fault+0x1d1/0x650
> ? exit_to_user_mode_prepare+0x45/0x170
> exc_page_fault+0x76/0x170
> ? asm_exc_page_fault+0x8/0x30
> asm_exc_page_fault+0x1e/0x30
> ...
> </TASK>
>
> ENCLS[ELDU] is returning a #GP when attempting to load an enclave
> page from the backing store into the enclave.
>
> Haitao's stress testing involves running two concurrent loops of the SGX2
> selftests on a system with 4GB EPC memory. One of the tests is modified
> to reduce the oversubscription heap size:
>
> diff --git a/tools/testing/selftests/sgx/main.c  
> b/tools/testing/selftests/sgx/main.c
> index d480c2dd2858..12008789325b 100644
> --- a/tools/testing/selftests/sgx/main.c
> +++ b/tools/testing/selftests/sgx/main.c
> @@ -398,7 +398,7 @@ TEST_F_TIMEOUT(enclave,  
> unclobbered_vdso_oversubscribed_remove, 900)
>  	 * Create enclave with additional heap that is as big as all
>  	 * available physical SGX memory.
>  	 */
> -	total_mem = get_total_epc_mem();
> +	total_mem = get_total_epc_mem()/16;
>  	ASSERT_NE(total_mem, 0);
>  	TH_LOG("Creating an enclave with %lu bytes heap may take a while ...",
>  	       total_mem);
>
> If the the test compiled with above snippet is renamed as  
> "test_sgx_small"
> and the original renamed as "test_sgx_large" the two concurrent loops are
> run as follows:
>
> (for i in $(seq 1 999); do echo "Iteration $i"; sudo ./test_sgx_large;  
> done ) > log.large 2>&1
> (for i in $(seq 1 9999); do echo "Iteration $i"; sudo ./test_sgx_small;  
> done ) > log.small 2>&1
>
> If the SGX driver is modified to always WARN when ENCLS[ELDU] encounters  
> a #GP
> (see below) then the WARN appears after a few iterations of  
> "test_sgx_large"
> and shows up throughout the testing.
>
> diff --git a/arch/x86/kernel/cpu/sgx/encls.h  
> b/arch/x86/kernel/cpu/sgx/encls.h
> index 99004b02e2ed..68c1dbc84ed3 100644
> --- a/arch/x86/kernel/cpu/sgx/encls.h
> +++ b/arch/x86/kernel/cpu/sgx/encls.h
> @@ -18,7 +18,7 @@
>  #define ENCLS_WARN(r, name) {						  \
>  	do {								  \
>  		int _r = (r);						  \
> -		WARN_ONCE(_r, "%s returned %d (0x%x)\n", (name), _r, _r); \
> +		WARN(_r, "%s returned %d (0x%x)\n", (name), _r, _r); \
>  	} while (0);							  \
>  }
>
>
> I learned the following during investigation of the issue:
> * Reverting commit 08999b2489b4 ("x86/sgx: Free backing memory after
>   faulting the enclave page") resolves the issue. With that commit
>   reverted the concurrent selftest loops can run to completion without
>   encountering any WARNs.
>
> * The issue is also resolved if only the calls (introduced by commit
>   08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave
>   page") ) to sgx_encl_truncate_backing_page() within __sgx_encl_eldu()
>   are disabled.
>
> * ENCLS[ELDU] faults with #GP when the provided PCMD data is all zeroes.
>
> The fixes in this series address scenarios where the PCMD data in the
> backing store may not be correct. There are no occurences of the
> WARN when the stress testing is performed with this series applied.
>
> Thank you very much
>
> Reinette
>
> [1]  
> https://lore.kernel.org/lkml/cover.1649878359.git.reinette.chatre@intel.com/
>
> Reinette Chatre (5):
>   x86/sgx: Disconnect backing page references from dirty status
>   x86/sgx: Mark PCMD page as dirty when modifying contents
>   x86/sgx: Obtain backing storage page with enclave mutex held
>   x86/sgx: Fix race between reclaimer and page fault handler
>   x86/sgx: Ensure no data in PCMD page after truncate
>
>  arch/x86/kernel/cpu/sgx/encl.c | 108 ++++++++++++++++++++++++++++++---
>  arch/x86/kernel/cpu/sgx/encl.h |   2 +-
>  arch/x86/kernel/cpu/sgx/main.c |  13 ++--
>  3 files changed, 109 insertions(+), 14 deletions(-)
>
>
> base-commit: 672c0c5173427e6b3e2a9bbb7be51ceeec78093a

Tested-by: Haitao Huang <haitao.huang@intel.com>
Thanks
Haitao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents
  2022-05-11 10:43   ` Jarkko Sakkinen
@ 2022-05-11 18:01     ` Reinette Chatre
  2022-05-12 14:15       ` Jarkko Sakkinen
  0 siblings, 1 reply; 19+ messages in thread
From: Reinette Chatre @ 2022-05-11 18:01 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: dave.hansen, linux-sgx, haitao.huang

Hi Jarkko,

On 5/11/2022 3:43 AM, Jarkko Sakkinen wrote:
> On Mon, May 09, 2022 at 02:48:00PM -0700, Reinette Chatre wrote:
>> Recent commit 08999b2489b4 ("x86/sgx: Free backing memory
>> after faulting the enclave page") expanded __sgx_encl_eldu()
>> to clear an enclave page's PCMD (Paging Crypto MetaData)
>> from the PCMD page in the backing store after the enclave
>> page is restored to the enclave.
>>
>> Since the PCMD page in the backing store is modified the page
>> should be marked as dirty to ensure the modified data is retained.
>>
>> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>> Changes since RFC v1:
>> - Do not set dirty bit on enclave page since it is not being
>>   written to and always will be discarded.  (Dave)
>>
>>  arch/x86/kernel/cpu/sgx/encl.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
>> index 398695a20605..2521d64e8bcf 100644
>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>> @@ -83,6 +83,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>>  		ret = -EFAULT;
>>  	}
>>  
>> +	set_page_dirty(b.pcmd);
>>  	memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
>>  
>>  	/*
>> -- 
>> 2.25.1
>>
> 
> LGTM.
> 
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

Thank you very much.

Based on my understanding of Dave's feedback I do plan to move the
set_page_dirty() to be after the memset:
https://lore.kernel.org/linux-sgx/8922e48f-6646-c7cc-6393-7c78dcf23d23@intel.com/

Reinette


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 3/5] x86/sgx: Obtain backing storage page with enclave mutex held
  2022-05-11 11:13   ` Jarkko Sakkinen
@ 2022-05-11 18:02     ` Reinette Chatre
  2022-05-12 14:14       ` Jarkko Sakkinen
  0 siblings, 1 reply; 19+ messages in thread
From: Reinette Chatre @ 2022-05-11 18:02 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: dave.hansen, linux-sgx, haitao.huang

Hi Jarkko,

On 5/11/2022 4:13 AM, Jarkko Sakkinen wrote:
> On Mon, May 09, 2022 at 02:48:01PM -0700, Reinette Chatre wrote:

...

>> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
>> index fad3d6c4756e..a60f8b2780fb 100644
>> --- a/arch/x86/kernel/cpu/sgx/main.c
>> +++ b/arch/x86/kernel/cpu/sgx/main.c
>> @@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>>  	sgx_encl_ewb(epc_page, backing);
>>  	encl_page->epc_page = NULL;
>>  	encl->secs_child_cnt--;
>> +	sgx_encl_put_backing(backing);
>>  
>>  	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
>>  		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
>> @@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void)
>>  			goto skip;
>>  
>>  		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
>> +
>> +		mutex_lock(&encl_page->encl->lock);
>>  		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
>> -		if (ret)
>> +		if (ret) {
>> +			mutex_unlock(&encl_page->encl->lock);
>>  			goto skip;
>> +		}
>>  
>> -		mutex_lock(&encl_page->encl->lock);
>>  		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
>>  		mutex_unlock(&encl_page->encl->lock);
>>  		continue;
>> @@ -413,7 +417,6 @@ static void sgx_reclaim_pages(void)
>>  
>>  		encl_page = epc_page->owner;
>>  		sgx_reclaimer_write(epc_page, &backing[i]);
>> -		sgx_encl_put_backing(&backing[i]);
>>  
>>  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
>>  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
>> -- 
>> 2.25.1
>>
> 
> I get the locking part but why is the move of sgx_encl_put_backing
> relevant?

Moving sgx_encl_put_backing() accomplishes the locking goal.

Before the patch:

sgx_reclaim_pages() {
	...
	sgx_reclaimer_write() {
		mutex_lock(&encl->lock);
		...
		mutex_unlock(&encl->lock);
	}
	sgx_encl_put_backing(); /* Not protected by enclave mutex */
}

After the patch:

sgx_reclaim_pages() {
	...
	sgx_reclaimer_write() {
		mutex_lock(&encl->lock);
		...
			sgx_encl_put_backing(); /* Protected by enclave mutex */
		...
		mutex_unlock(&encl->lock);
	}

}

Even so, because of patch 1/1 the first scenario described in the
changelog is no longer valid since the page is marked as dirty
with the enclave mutex held. It may thus not be required
to call sgx_encl_put_backing() with enclave mutex held but it
remains important for sgx_encl_get_backing() to be called with
enclave mutex held since it ensures that SGX_ENCL_PAGE_BEING_RECLAIMED
can be used (in patch 4/5) to reliably reflect references to the
backing storage.
Considering that I would like to continue to consistently protect
sgx_encl_get_backing()/sgx_encl_put_backing() with the enclave mutex.

Reinette	






^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 5/5] x86/sgx: Ensure no data in PCMD page after truncate
  2022-05-11 10:36   ` Jarkko Sakkinen
@ 2022-05-11 18:02     ` Reinette Chatre
  0 siblings, 0 replies; 19+ messages in thread
From: Reinette Chatre @ 2022-05-11 18:02 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: dave.hansen, linux-sgx, haitao.huang

Hi Jarkko,

On 5/11/2022 3:36 AM, Jarkko Sakkinen wrote:
> On Mon, May 09, 2022 at 02:48:03PM -0700, Reinette Chatre wrote:
>> A PCMD (Paging Crypto MetaData) page contains the PCMD
>> structures of enclave pages that have been encrypted and
>> moved to the shmem backing store. When all enclave pages
>> sharing a PCMD page are loaded in the enclave, there is no
>> need for the PCMD page and it can be truncated from the
>> backing store.
>>
>> A few issues appeared around the truncation of PCMD pages. The
>> known issues have been addressed but the PCMD handling code could
>> be made more robust by loudly complaining if any new issue appears
>> in this area.
>>
>> Add a check that will complain once with a WARN if the PCMD page is not
>> actually empty after it has been truncated. There should never be data
>> in the PCMD page at this point since it is always updated with the
>> enclave mutex held.
>>
>> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>>  arch/x86/kernel/cpu/sgx/encl.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
>> index d1d4e8572702..af972dbad965 100644
>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>> @@ -183,12 +183,19 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>>  	kunmap_atomic(pcmd_page);
>>  	kunmap_atomic((void *)(unsigned long)pginfo.contents);
>>  
>> +	get_page(b.pcmd);
>>  	sgx_encl_put_backing(&b);
>>  
>>  	sgx_encl_truncate_backing_page(encl, page_index);
>>  
>> -	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page))
>> +	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page)) {
>>  		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
>> +		pcmd_page = kmap_atomic(b.pcmd);
>> +		WARN_ON_ONCE(memchr_inv(pcmd_page, 0, PAGE_SIZE));
> 
> Is WARN necessary, or would it make more sense to use pr_warn()?
> 

I will change it to pr_warn().

> It would give a better chance to collect information if "panic_on_warn" is
> set for the running kernel.
> 

Reinette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 3/5] x86/sgx: Obtain backing storage page with enclave mutex held
  2022-05-11 18:02     ` Reinette Chatre
@ 2022-05-12 14:14       ` Jarkko Sakkinen
  0 siblings, 0 replies; 19+ messages in thread
From: Jarkko Sakkinen @ 2022-05-12 14:14 UTC (permalink / raw)
  To: Reinette Chatre; +Cc: dave.hansen, linux-sgx, haitao.huang

On Wed, May 11, 2022 at 11:02:47AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 5/11/2022 4:13 AM, Jarkko Sakkinen wrote:
> > On Mon, May 09, 2022 at 02:48:01PM -0700, Reinette Chatre wrote:
> 
> ...
> 
> >> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> >> index fad3d6c4756e..a60f8b2780fb 100644
> >> --- a/arch/x86/kernel/cpu/sgx/main.c
> >> +++ b/arch/x86/kernel/cpu/sgx/main.c
> >> @@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
> >>  	sgx_encl_ewb(epc_page, backing);
> >>  	encl_page->epc_page = NULL;
> >>  	encl->secs_child_cnt--;
> >> +	sgx_encl_put_backing(backing);
> >>  
> >>  	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
> >>  		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
> >> @@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void)
> >>  			goto skip;
> >>  
> >>  		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
> >> +
> >> +		mutex_lock(&encl_page->encl->lock);
> >>  		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
> >> -		if (ret)
> >> +		if (ret) {
> >> +			mutex_unlock(&encl_page->encl->lock);
> >>  			goto skip;
> >> +		}
> >>  
> >> -		mutex_lock(&encl_page->encl->lock);
> >>  		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
> >>  		mutex_unlock(&encl_page->encl->lock);
> >>  		continue;
> >> @@ -413,7 +417,6 @@ static void sgx_reclaim_pages(void)
> >>  
> >>  		encl_page = epc_page->owner;
> >>  		sgx_reclaimer_write(epc_page, &backing[i]);
> >> -		sgx_encl_put_backing(&backing[i]);
> >>  
> >>  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
> >>  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> >> -- 
> >> 2.25.1
> >>
> > 
> > I get the locking part but why is the move of sgx_encl_put_backing
> > relevant?
> 
> Moving sgx_encl_put_backing() accomplishes the locking goal.
> 
> Before the patch:
> 
> sgx_reclaim_pages() {
> 	...
> 	sgx_reclaimer_write() {
> 		mutex_lock(&encl->lock);
> 		...
> 		mutex_unlock(&encl->lock);
> 	}
> 	sgx_encl_put_backing(); /* Not protected by enclave mutex */
> }
> 
> After the patch:
> 
> sgx_reclaim_pages() {
> 	...
> 	sgx_reclaimer_write() {
> 		mutex_lock(&encl->lock);
> 		...
> 			sgx_encl_put_backing(); /* Protected by enclave mutex */
> 		...
> 		mutex_unlock(&encl->lock);
> 	}
> 
> }

Right.

> Even so, because of patch 1/1 the first scenario described in the
> changelog is no longer valid since the page is marked as dirty
> with the enclave mutex held. It may thus not be required
> to call sgx_encl_put_backing() with enclave mutex held but it
> remains important for sgx_encl_get_backing() to be called with
> enclave mutex held since it ensures that SGX_ENCL_PAGE_BEING_RECLAIMED
> can be used (in patch 4/5) to reliably reflect references to the
> backing storage.
> Considering that I would like to continue to consistently protect
> sgx_encl_get_backing()/sgx_encl_put_backing() with the enclave mutex.

I fully agree with your conclusion.

> Reinette	

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

Also

Tested-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents
  2022-05-11 18:01     ` Reinette Chatre
@ 2022-05-12 14:15       ` Jarkko Sakkinen
  0 siblings, 0 replies; 19+ messages in thread
From: Jarkko Sakkinen @ 2022-05-12 14:15 UTC (permalink / raw)
  To: Reinette Chatre; +Cc: dave.hansen, linux-sgx, haitao.huang

On Wed, May 11, 2022 at 11:01:50AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 5/11/2022 3:43 AM, Jarkko Sakkinen wrote:
> > On Mon, May 09, 2022 at 02:48:00PM -0700, Reinette Chatre wrote:
> >> Recent commit 08999b2489b4 ("x86/sgx: Free backing memory
> >> after faulting the enclave page") expanded __sgx_encl_eldu()
> >> to clear an enclave page's PCMD (Paging Crypto MetaData)
> >> from the PCMD page in the backing store after the enclave
> >> page is restored to the enclave.
> >>
> >> Since the PCMD page in the backing store is modified the page
> >> should be marked as dirty to ensure the modified data is retained.
> >>
> >> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
> >> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> >> ---
> >> Changes since RFC v1:
> >> - Do not set dirty bit on enclave page since it is not being
> >>   written to and always will be discarded.  (Dave)
> >>
> >>  arch/x86/kernel/cpu/sgx/encl.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> >> index 398695a20605..2521d64e8bcf 100644
> >> --- a/arch/x86/kernel/cpu/sgx/encl.c
> >> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> >> @@ -83,6 +83,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >>  		ret = -EFAULT;
> >>  	}
> >>  
> >> +	set_page_dirty(b.pcmd);
> >>  	memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
> >>  
> >>  	/*
> >> -- 
> >> 2.25.1
> >>
> > 
> > LGTM.
> > 
> > Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> 
> Thank you very much.
> 
> Based on my understanding of Dave's feedback I do plan to move the
> set_page_dirty() to be after the memset:
> https://lore.kernel.org/linux-sgx/8922e48f-6646-c7cc-6393-7c78dcf23d23@intel.com/
> 
> Reinette

You can keep my tag, I'll compare the patches and complain if there was
anything else done :-) Do not expect that though (at all).

BR, Jarkko

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-05-12 14:14 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-09 21:47 [PATCH V2 0/5] SGX shmem backing store issue Reinette Chatre
2022-05-09 21:47 ` [PATCH V2 1/5] x86/sgx: Disconnect backing page references from dirty status Reinette Chatre
2022-05-10 20:12   ` Dave Hansen
2022-05-10 23:00     ` Reinette Chatre
2022-05-09 21:48 ` [PATCH V2 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents Reinette Chatre
2022-05-11 10:43   ` Jarkko Sakkinen
2022-05-11 18:01     ` Reinette Chatre
2022-05-12 14:15       ` Jarkko Sakkinen
2022-05-09 21:48 ` [PATCH V2 3/5] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
2022-05-11 11:13   ` Jarkko Sakkinen
2022-05-11 18:02     ` Reinette Chatre
2022-05-12 14:14       ` Jarkko Sakkinen
2022-05-09 21:48 ` [PATCH V2 4/5] x86/sgx: Fix race between reclaimer and page fault handler Reinette Chatre
2022-05-10 21:55   ` Dave Hansen
2022-05-10 23:01     ` Reinette Chatre
2022-05-09 21:48 ` [PATCH V2 5/5] x86/sgx: Ensure no data in PCMD page after truncate Reinette Chatre
2022-05-11 10:36   ` Jarkko Sakkinen
2022-05-11 18:02     ` Reinette Chatre
2022-05-11 15:00 ` [PATCH V2 0/5] SGX shmem backing store issue Haitao Huang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.