* [PATCH V3 0/5] SGX shmem backing store issue
@ 2022-05-12 21:50 Reinette Chatre
  2022-05-12 21:50 ` [PATCH V3 1/5] x86/sgx: Disconnect backing page references from dirty status Reinette Chatre
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Reinette Chatre @ 2022-05-12 21:50 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: haitao.huang, hpa, linux-kernel, stable

V2:
https://lore.kernel.org/linux-sgx/cover.1652131695.git.reinette.chatre@intel.com/

Changes since V2:
- Expand audience of series to include stable team, x86 maintainers, and lkml.
- Mark pages as dirty after receiving important data, not before. (Dave)
- Improve changelogs.
- Add Haitao and Jarkko's tags.
- Fix incorrect exit if address does not have enclave page associated. (Dave)
- See individual patches for detailed changes.

First version of series was submitted with incomplete fixes as RFC V1:
https://lore.kernel.org/linux-sgx/cover.1651171455.git.reinette.chatre@intel.com/

Changes since RFC V1:
- Remaining issue was root-caused with debugging help from Dave.
  Patch 4/5 is new and eliminates all occurrences of the ENCLS[ELDU]-related
  WARN.
- Drop "x86/sgx: Do not free backing memory on ENCLS[ELDU] failure" from
  series. ENCLS[ELDU] failure is not recoverable so it serves no purpose to
  keep the backing memory that could not be restored to the enclave.
- Patch 1/5 is new and refactors sgx_encl_put_backing() to only put
  references to pages and not also mark the pages as dirty. (Dave)
- Mark PCMD page dirty before (not after) writing data to it. (Dave)
- Patch 5/5 is new and adds debug code to WARN if PCMD page is ever
  found to contain data after it is truncated. (Dave)

== Cover Letter ==

Hi Everybody,

Haitao reported encountering the following WARN while stress testing SGX
with the SGX2 series [1] applied:

ELDU returned 1073741837 (0x4000000d)
WARNING: CPU: 72 PID: 24407 at arch/x86/kernel/cpu/sgx/encl.c:81 sgx_encl_eldu+0x3cf/0x400
...
Call Trace:
<TASK>
? xa_load+0x6e/0xa0
__sgx_encl_load_page+0x3d/0x80
sgx_encl_load_page_in_vma+0x4a/0x60
sgx_vma_fault+0x7f/0x3b0
? sysvec_call_function_single+0x66/0xd0
? asm_sysvec_call_function_single+0x12/0x20
__do_fault+0x39/0x110
__handle_mm_fault+0x1222/0x15a0
handle_mm_fault+0xdb/0x2c0
do_user_addr_fault+0x1d1/0x650
? exit_to_user_mode_prepare+0x45/0x170
exc_page_fault+0x76/0x170
? asm_exc_page_fault+0x8/0x30
asm_exc_page_fault+0x1e/0x30
...
</TASK>

ENCLS[ELDU] is returning a #GP when attempting to load an enclave
page from the backing store into the enclave.

Haitao's stress testing involves running two concurrent loops of the SGX2
selftests on a system with 4GB EPC memory. One of the tests is modified
to reduce the oversubscription heap size:

diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index d480c2dd2858..12008789325b 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -398,7 +398,7 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
 	 * Create enclave with additional heap that is as big as all
 	 * available physical SGX memory.
 	 */
-	total_mem = get_total_epc_mem();
+	total_mem = get_total_epc_mem()/16;
 	ASSERT_NE(total_mem, 0);
 	TH_LOG("Creating an enclave with %lu bytes heap may take a while ...",
 	       total_mem);

If the test compiled with the above snippet is renamed "test_sgx_small"
and the original renamed "test_sgx_large", the two concurrent loops are
run as follows:

(for i in $(seq 1 999); do echo "Iteration $i"; sudo ./test_sgx_large; done ) > log.large 2>&1
(for i in $(seq 1 9999); do echo "Iteration $i"; sudo ./test_sgx_small; done ) > log.small 2>&1

If the SGX driver is modified to always WARN when ENCLS[ELDU] encounters a #GP
(see below) then the WARN appears after a few iterations of "test_sgx_large"
and shows up throughout the testing.

diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index 99004b02e2ed..68c1dbc84ed3 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -18,7 +18,7 @@
 #define ENCLS_WARN(r, name) {						  \
 	do {								  \
 		int _r = (r);						  \
-		WARN_ONCE(_r, "%s returned %d (0x%x)\n", (name), _r, _r); \
+		WARN(_r, "%s returned %d (0x%x)\n", (name), _r, _r); \
 	} while (0);							  \
 }


I learned the following during investigation of the issue:
* Reverting commit 08999b2489b4 ("x86/sgx: Free backing memory after
  faulting the enclave page") resolves the issue. With that commit
  reverted the concurrent selftest loops can run to completion without
  encountering any WARNs.

* The issue is also resolved if only the calls (introduced by commit
  08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave
  page") ) to sgx_encl_truncate_backing_page() within __sgx_encl_eldu()
  are disabled.

* ENCLS[ELDU] faults with #GP when the provided PCMD data is all zeroes. It
  does so because of a check that is not documented in the SDM. In this
  check a page type of zero indicates an SECS page is being loaded into the
  enclave, but since the SECS data is not empty the instruction faults with
  #GP. (A rough sketch of the structures involved follows below.)

The fixes in this series address scenarios where the PCMD data in the
backing store may not be correct. While the SGX2 tests uncovered these
issues, the fixes are not related to SGX2 and apply on top of v5.18-rc6.
There are no occurrences of the WARN when the stress testing is performed
with this series applied.

Thank you very much

Reinette

[1] https://lore.kernel.org/lkml/cover.1649878359.git.reinette.chatre@intel.com/

Reinette Chatre (5):
  x86/sgx: Disconnect backing page references from dirty status
  x86/sgx: Mark PCMD page as dirty when modifying contents
  x86/sgx: Obtain backing storage page with enclave mutex held
  x86/sgx: Fix race between reclaimer and page fault handler
  x86/sgx: Ensure no data in PCMD page after truncate

 arch/x86/kernel/cpu/sgx/encl.c | 113 ++++++++++++++++++++++++++++++---
 arch/x86/kernel/cpu/sgx/encl.h |   2 +-
 arch/x86/kernel/cpu/sgx/main.c |  13 ++--
 3 files changed, 114 insertions(+), 14 deletions(-)

base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
-- 
2.25.1



* [PATCH V3 1/5] x86/sgx: Disconnect backing page references from dirty status
  2022-05-12 21:50 [PATCH V3 0/5] SGX shmem backing store issue Reinette Chatre
@ 2022-05-12 21:50 ` Reinette Chatre
  2022-05-13 14:39   ` Jarkko Sakkinen
  2022-05-12 21:50 ` [PATCH V3 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents Reinette Chatre
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Reinette Chatre @ 2022-05-12 21:50 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: haitao.huang, hpa, linux-kernel, stable

SGX uses shmem backing storage to store encrypted enclave pages
and their crypto metadata when enclave pages are moved out of
enclave memory. Two shmem backing storage pages are associated with
each enclave page - one backing page to contain the encrypted
enclave page data and one backing page (shared by a few
enclave pages) to contain the crypto metadata used by the
processor to verify the enclave page when it is loaded back into
the enclave.
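
For illustration, the backing store layout can be summarized with a helper
along these lines. This is a paraphrase of the existing
sgx_encl_get_backing_page_pcmd_offset() helper in
arch/x86/kernel/cpu/sgx/encl.c; the helper name, comments and arithmetic
below are a sketch, not the authoritative implementation:

/*
 * Sketch of the shmem backing store layout:
 *
 *   [ encrypted enclave pages ][ SECS page ][ PCMD entries ]
 *
 * The encrypted data of enclave page N is stored at backing page
 * index N, while its PCMD entry lives at a byte offset past the
 * SECS page, so PAGE_SIZE / sizeof(struct sgx_pcmd) enclave pages
 * share one PCMD backing page.
 */
static pgoff_t backing_pcmd_offset(struct sgx_encl *encl,
				   unsigned long page_index)
{
	/* End of the encrypted enclave pages plus the SECS page. */
	pgoff_t epc_end_off = encl->size + sizeof(struct sgx_secs);

	return epc_end_off + page_index * sizeof(struct sgx_pcmd);
}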

sgx_encl_put_backing() is used to release references to the
backing storage and, optionally, mark both backing store pages
as dirty.

Managing references and dirty status together in this way results
in both backing store pages being marked as dirty, even if only one
of them has changed.

Additionally, waiting until the page reference is dropped to set
the page dirty risks a race with the page fault handler that
may load outdated data into the enclave when a page is faulted
right after it is reclaimed.

Consider what happens if the reclaimer writes a page to the backing
store and the page is immediately faulted back, before the reclaimer
is able to set the dirty bit of the page:

sgx_reclaim_pages() {                    sgx_vma_fault() {
  ...
  sgx_encl_get_backing();
  ...                                      ...
  sgx_reclaimer_write() {
    mutex_lock(&encl->lock);
    /* Write data to backing store */
    mutex_unlock(&encl->lock);
  }
                                           mutex_lock(&encl->lock);
                                           __sgx_encl_eldu() {
                                             ...
                                             /*
                                              * Enclave backing store
                                              * page not released
                                              * nor marked dirty -
                                              * contents may not be
                                              * up to date.
                                              */
                                              sgx_encl_get_backing();
                                              ...
                                              /*
                                               * Enclave data restored
                                               * from backing store
                                               * and PCMD pages that
                                               * are not up to date.
                                               * ENCLS[ELDU] faults
                                               * because of MAC or PCMD
                                               * checking failure.
                                               */
                                               sgx_encl_put_backing();
                                            }
                                            ...
  /* set page dirty */
  sgx_encl_put_backing();
  ...
                                            mutex_unlock(&encl->lock);
}                                        }

Remove the option for sgx_encl_put_backing() to set the backing
pages as dirty and instead set the needed pages as dirty right after
receiving important data, while the enclave mutex is held. This ensures
that the page fault handler can get up-to-date data from a page, and
prepares the code for a following change where only one of the backing
pages needs to be marked as dirty.

Cc: stable@vger.kernel.org
Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/linux-sgx/8922e48f-6646-c7cc-6393-7c78dcf23d23@intel.com/
---
Changes since V2:
- Mark page as dirty after receiving important data, not before. (Dave)
- Add cc to stable and include link to discussion. (Dave)
- Update changelog and move changelog description of race between reclaimer
  and page fault handler from later in series to this patch.
- Add Haitao's Tested-by tag.

Changes since RFC v1:
- New patch.

 arch/x86/kernel/cpu/sgx/encl.c | 10 ++--------
 arch/x86/kernel/cpu/sgx/encl.h |  2 +-
 arch/x86/kernel/cpu/sgx/main.c |  6 ++++--
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7c63a1911fae..398695a20605 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -94,7 +94,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	kunmap_atomic(pcmd_page);
 	kunmap_atomic((void *)(unsigned long)pginfo.contents);
 
-	sgx_encl_put_backing(&b, false);
+	sgx_encl_put_backing(&b);
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
@@ -645,15 +645,9 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 /**
  * sgx_encl_put_backing() - Unpin the backing storage
  * @backing:	data for accessing backing storage for the page
- * @do_write:	mark pages dirty
  */
-void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
+void sgx_encl_put_backing(struct sgx_backing *backing)
 {
-	if (do_write) {
-		set_page_dirty(backing->pcmd);
-		set_page_dirty(backing->contents);
-	}
-
 	put_page(backing->pcmd);
 	put_page(backing->contents);
 }
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..d44e7372151f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -107,7 +107,7 @@ void sgx_encl_release(struct kref *ref);
 int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
 int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 			 struct sgx_backing *backing);
-void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
+void sgx_encl_put_backing(struct sgx_backing *backing);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8e4bc6453d26..e71df40a4f38 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -191,6 +191,8 @@ static int __sgx_encl_ewb(struct sgx_epc_page *epc_page, void *va_slot,
 			  backing->pcmd_offset;
 
 	ret = __ewb(&pginfo, sgx_get_epc_virt_addr(epc_page), va_slot);
+	set_page_dirty(backing->pcmd);
+	set_page_dirty(backing->contents);
 
 	kunmap_atomic((void *)(unsigned long)(pginfo.metadata -
 					      backing->pcmd_offset));
@@ -320,7 +322,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 
-		sgx_encl_put_backing(&secs_backing, true);
+		sgx_encl_put_backing(&secs_backing);
 	}
 
 out:
@@ -411,7 +413,7 @@ static void sgx_reclaim_pages(void)
 
 		encl_page = epc_page->owner;
 		sgx_reclaimer_write(epc_page, &backing[i]);
-		sgx_encl_put_backing(&backing[i], true);
+		sgx_encl_put_backing(&backing[i]);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
-- 
2.25.1



* [PATCH V3 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents
  2022-05-12 21:50 [PATCH V3 0/5] SGX shmem backing store issue Reinette Chatre
  2022-05-12 21:50 ` [PATCH V3 1/5] x86/sgx: Disconnect backing page references from dirty status Reinette Chatre
@ 2022-05-12 21:50 ` Reinette Chatre
  2022-05-12 21:50 ` [PATCH V3 3/5] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Reinette Chatre @ 2022-05-12 21:50 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: haitao.huang, hpa, linux-kernel, stable

Recent commit 08999b2489b4 ("x86/sgx: Free backing memory
after faulting the enclave page") expanded __sgx_encl_eldu()
to clear an enclave page's PCMD (Paging Crypto MetaData)
from the PCMD page in the backing store after the enclave
page is restored to the enclave.

Since the PCMD page in the backing store is modified, the page
should be marked as dirty to ensure the modified data is retained.

Cc: stable@vger.kernel.org
Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V2:
- Set page as dirty after receiving data, not before. (Dave)
- Add Jarkko's Reviewed-by tag.
- Add Haitao's Tested-by tag.

Changes since RFC v1:
- Do not set dirty bit on enclave page since it is not being
  written to and always will be discarded.  (Dave)

 arch/x86/kernel/cpu/sgx/encl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 398695a20605..5104a428b72c 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -84,6 +84,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	}
 
 	memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
+	set_page_dirty(b.pcmd);
 
 	/*
 	 * The area for the PCMD in the page was zeroed above.  Check if the
-- 
2.25.1



* [PATCH V3 3/5] x86/sgx: Obtain backing storage page with enclave mutex held
  2022-05-12 21:50 [PATCH V3 0/5] SGX shmem backing store issue Reinette Chatre
  2022-05-12 21:50 ` [PATCH V3 1/5] x86/sgx: Disconnect backing page references from dirty status Reinette Chatre
  2022-05-12 21:50 ` [PATCH V3 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents Reinette Chatre
@ 2022-05-12 21:50 ` Reinette Chatre
  2022-05-13 14:39   ` Jarkko Sakkinen
  2022-05-12 21:51 ` [PATCH V3 4/5] x86/sgx: Fix race between reclaimer and page fault handler Reinette Chatre
  2022-05-12 21:51 ` [PATCH V3 5/5] x86/sgx: Ensure no data in PCMD page after truncate Reinette Chatre
  4 siblings, 1 reply; 10+ messages in thread
From: Reinette Chatre @ 2022-05-12 21:50 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: haitao.huang, hpa, linux-kernel, stable

Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
instruction faulting with a #GP.

The WARN is encountered when the reclaimer evicts a range of
pages from the enclave while the same pages are faulted back
right away.

The SGX backing storage is accessed on two paths: when there
are insufficient free pages in the EPC the reclaimer works
to move enclave pages to the backing storage and as enclaves
access pages that have been moved to the backing storage
they are retrieved from there as part of page fault handling.

An oversubscribed SGX system will often run the reclaimer and
page fault handler concurrently and needs to ensure that the
backing store is accessed safely between the reclaimer and
the page fault handler. This is not the case because the
reclaimer accesses the backing store without the enclave mutex
while the page fault handler accesses the backing store with
the enclave mutex.

Consider the scenario where a page is faulted while a page sharing
a PCMD page with the faulted page is being reclaimed. The
consequence is a race between the reclaimer and the page fault
handler: the reclaimer attempts to access a PCMD page at the
same time that it is truncated by the page fault handler. This
could result in lost PCMD data. Data may still be
lost if the reclaimer wins the race; this is addressed in
the following patch.

The reclaimer accesses pages from the backing storage without
holding the enclave mutex and runs the risk of concurrently
accessing the backing storage with the page fault handler that
does access the backing storage with the enclave mutex held.

In the scenario below a PCMD page is truncated from the backing
store after all its pages have been loaded into the enclave, at the
same time that the PCMD page is loaded from the backing store when
one of its pages is reclaimed:

sgx_reclaim_pages() {              sgx_vma_fault() {
                                     ...
                                     mutex_lock(&encl->lock);
                                     ...
                                     __sgx_encl_eldu() {
                                       ...
                                       if (pcmd_page_empty) {
/*
 * EPC page being reclaimed              /*
 * shares a PCMD page with an             * PCMD page truncated
 * enclave page that is being             * while requested from
 * faulted in.                            * reclaimer.
 */                                       */
sgx_encl_get_backing()  <---------->      sgx_encl_truncate_backing_page()
                                        }
                                       mutex_unlock(&encl->lock);
}                                    }

In this scenario there is a race between the reclaimer and the page fault
handler when the reclaimer attempts to get access to the same PCMD page
that is being truncated. This could result in the reclaimer writing to
the PCMD page that is then truncated, causing the PCMD data to be lost,
or in a new PCMD page being allocated. The loss of PCMD data may still
occur after protecting the backing store access with the mutex - this is
fixed in the next patch. By ensuring the backing store is accessed with
the mutex held, the enclave page state can be kept accurate, with the
SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page
is in the process of being reclaimed.

Consistently protect the reclaimer's backing store access with the
enclave's mutex to ensure that it can safely run concurrently with the
page fault handler.

Cc: stable@vger.kernel.org
Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Reported-by: Haitao Huang <haitao.huang@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V2:
- Move description of "scenario a" to the new patch in series that marks
  page as dirty with enclave mutex held.
- Add Haitao's "Tested-by" tag.
- Add Jarkko's "Reviewed-by" and "Tested-by" tags.

 arch/x86/kernel/cpu/sgx/main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index e71df40a4f38..ab4ec54bbdd9 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	sgx_encl_ewb(epc_page, backing);
 	encl_page->epc_page = NULL;
 	encl->secs_child_cnt--;
+	sgx_encl_put_backing(backing);
 
 	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
 		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
@@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void)
 			goto skip;
 
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
+
+		mutex_lock(&encl_page->encl->lock);
 		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
-		if (ret)
+		if (ret) {
+			mutex_unlock(&encl_page->encl->lock);
 			goto skip;
+		}
 
-		mutex_lock(&encl_page->encl->lock);
 		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
 		mutex_unlock(&encl_page->encl->lock);
 		continue;
@@ -413,7 +417,6 @@ static void sgx_reclaim_pages(void)
 
 		encl_page = epc_page->owner;
 		sgx_reclaimer_write(epc_page, &backing[i]);
-		sgx_encl_put_backing(&backing[i]);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
-- 
2.25.1



* [PATCH V3 4/5] x86/sgx: Fix race between reclaimer and page fault handler
  2022-05-12 21:50 [PATCH V3 0/5] SGX shmem backing store issue Reinette Chatre
                   ` (2 preceding siblings ...)
  2022-05-12 21:50 ` [PATCH V3 3/5] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
@ 2022-05-12 21:51 ` Reinette Chatre
  2022-05-13 14:40   ` Jarkko Sakkinen
  2022-05-12 21:51 ` [PATCH V3 5/5] x86/sgx: Ensure no data in PCMD page after truncate Reinette Chatre
  4 siblings, 1 reply; 10+ messages in thread
From: Reinette Chatre @ 2022-05-12 21:51 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: haitao.huang, hpa, linux-kernel, stable

Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
instruction faulting with a #GP.

The WARN is encountered when the reclaimer evicts a range of
pages from the enclave while the same pages are faulted back right away.

Consider two enclave pages (ENCLAVE_A and ENCLAVE_B)
sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the
enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains
just one entry, that of ENCLAVE_B.

The scenario proceeds with ENCLAVE_A being evicted from the enclave
while ENCLAVE_B is faulted in.

sgx_reclaim_pages() {

  ...

  /*
   * Reclaim ENCLAVE_A
   */
  mutex_lock(&encl->lock);
  /*
   * Get a reference to ENCLAVE_A's
   * shmem page where enclave page
   * encrypted data will be stored
   * as well as a reference to the
   * enclave page's PCMD data page,
   * PCMD_AB.
   * Release mutex before writing
   * any data to the shmem pages.
   */
  sgx_encl_get_backing(...);
  encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
  mutex_unlock(&encl->lock);

                                    /*
                                     * Fault ENCLAVE_B
                                     */

                                    sgx_vma_fault() {

                                      mutex_lock(&encl->lock);
                                      /*
                                       * Get reference to
                                       * ENCLAVE_B's shmem page
                                       * as well as PCMD_AB.
                                       */
                                      sgx_encl_get_backing(...)
                                     /*
                                      * Load page back into
                                      * enclave via ELDU.
                                      */
                                     /*
                                      * Release reference to
                                      * ENCLAVE_B' shmem page and
                                      * PCMD_AB.
                                      */
                                     sgx_encl_put_backing(...);
                                     /*
                                      * PCMD_AB is found empty so
                                      * it and ENCLAVE_B's shmem page
                                      * are truncated.
                                      */
                                     /* Truncate ENCLAVE_B backing page */
                                     sgx_encl_truncate_backing_page();
                                     /* Truncate PCMD_AB */
                                     sgx_encl_truncate_backing_page();

                                     mutex_unlock(&encl->lock);

                                     ...
                                     }
  mutex_lock(&encl->lock);
  encl_page->desc &=
       ~SGX_ENCL_PAGE_BEING_RECLAIMED;
  /*
   * Write encrypted contents of
   * ENCLAVE_A to ENCLAVE_A shmem
   * page and its PCMD data to
   * PCMD_AB.
   */
  sgx_encl_put_backing(...)

  /*
   * Reference to PCMD_AB is
   * dropped and it is truncated.
   * ENCLAVE_A's PCMD data is lost.
   */
  mutex_unlock(&encl->lock);
}

What happens next depends on whether it is ENCLAVE_A being faulted
in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting
with a #GP.

If ENCLAVE_A is faulted, then at the time sgx_encl_get_backing() is called
a new PCMD page is allocated, and providing the empty PCMD data for
ENCLAVE_A would cause ENCLS[ELDU] to #GP.

If ENCLAVE_B is evicted first, a new PCMD_AB would be allocated by the
reclaimer, but later, when ENCLAVE_A is faulted, the ENCLS[ELDU]
instruction would #GP during its checks of the PCMD value and the WARN
would be encountered.

Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time
it obtains a reference to the backing store pages of an enclave page it
is in the process of reclaiming, fix the race by only truncating the PCMD
page after ensuring that no page sharing the PCMD page is in the process
of being reclaimed.

Cc: stable@vger.kernel.org
Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
Reported-by: Haitao Huang <haitao.huang@intel.com>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V2:
- Declare "addr" and "entry" within loop. (Dave)
- Fix incorrect error return when enclave page not found to belong
  to enclave. Continue search instead of returning failure. (Dave)
- Add Haitao's "Tested-by" tag.
- Rename pcmd_page_in_use() to reclaimer_writing_to_pcmd() to be less
  generic. (Dave)
- Improve function comments to be clear about it testing for PCMD
  page soon becoming non-empty, also add context info to kernel-doc
  to indicate that enclave mutex must be held. (Dave)

Changes since RFC v1:
- New patch.

 arch/x86/kernel/cpu/sgx/encl.c | 94 +++++++++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 5104a428b72c..243f3bd78145 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,6 +12,92 @@
 #include "encls.h"
 #include "sgx.h"
 
+#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd))
+/*
+ * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to
+ * determine the page index associated with the first PCMD entry
+ * within a PCMD page.
+ */
+#define PCMD_FIRST_MASK GENMASK(4, 0)
+
+/**
+ * reclaimer_writing_to_pcmd() - Query if any enclave page associated with
+ *                               a PCMD page is in process of being reclaimed.
+ * @encl:        Enclave to which PCMD page belongs
+ * @start_addr:  Address of enclave page using first entry within the PCMD page
+ *
+ * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
+ * stored. The PCMD data of a reclaimed enclave page contains enough
+ * information for the processor to verify the page at the time
+ * it is loaded back into the Enclave Page Cache (EPC).
+ *
+ * The backing storage to which enclave pages are reclaimed is laid out as
+ * follows:
+ * Encrypted enclave pages:SECS page:PCMD pages
+ *
+ * Each PCMD page contains the PCMD metadata of
+ * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
+ *
+ * A PCMD page can only be truncated if it is (a) empty, and (b) not in the
+ * process of getting data (and thus soon being non-empty). (b) is tested with
+ * a check if an enclave page sharing the PCMD page is in the process of being
+ * reclaimed.
+ *
+ * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
+ * intends to reclaim that enclave page - it means that the PCMD page
+ * associated with that enclave page is about to get some data and thus
+ * even if the PCMD page is empty, it should not be truncated.
+ *
+ * Context: Enclave mutex (&sgx_encl->lock) must be held.
+ * Return: 1 if the reclaimer is about to write to the PCMD page
+ *         0 if the reclaimer has no intention to write to the PCMD page
+ */
+static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
+				     unsigned long start_addr)
+{
+	int reclaimed = 0;
+	int i;
+
+	/*
+	 * PCMD_FIRST_MASK is based on number of PCMD entries within
+	 * PCMD page being 32.
+	 */
+	BUILD_BUG_ON(PCMDS_PER_PAGE != 32);
+
+	for (i = 0; i < PCMDS_PER_PAGE; i++) {
+		struct sgx_encl_page *entry;
+		unsigned long addr;
+
+		addr = start_addr + i * PAGE_SIZE;
+
+		/*
+		 * Stop when reaching the SECS page - it does not
+		 * have a page_array entry and its reclaim is
+		 * started and completed with enclave mutex held so
+		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
+		 * flag.
+		 */
+		if (addr == encl->base + encl->size)
+			break;
+
+		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+		if (!entry)
+			continue;
+
+		/*
+		 * VA page slot ID uses same bit as the flag so it is important
+		 * to ensure that the page is not already in backing store.
+		 */
+		if (entry->epc_page &&
+		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
+			reclaimed = 1;
+			break;
+		}
+	}
+
+	return reclaimed;
+}
+
 /*
  * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's
  * follow right after the EPC data in the backing storage. In addition to the
@@ -47,6 +133,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK;
 	struct sgx_encl *encl = encl_page->encl;
 	pgoff_t page_index, page_pcmd_off;
+	unsigned long pcmd_first_page;
 	struct sgx_pageinfo pginfo;
 	struct sgx_backing b;
 	bool pcmd_page_empty;
@@ -58,6 +145,11 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	else
 		page_index = PFN_DOWN(encl->size);
 
+	/*
+	 * Address of enclave page using the first entry within the PCMD page.
+	 */
+	pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base;
+
 	page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
 
 	ret = sgx_encl_get_backing(encl, page_index, &b);
@@ -99,7 +191,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty)
+	if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page))
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
 
 	return ret;
-- 
2.25.1



* [PATCH V3 5/5] x86/sgx: Ensure no data in PCMD page after truncate
  2022-05-12 21:50 [PATCH V3 0/5] SGX shmem backing store issue Reinette Chatre
                   ` (3 preceding siblings ...)
  2022-05-12 21:51 ` [PATCH V3 4/5] x86/sgx: Fix race between reclaimer and page fault handler Reinette Chatre
@ 2022-05-12 21:51 ` Reinette Chatre
  2022-05-13 14:39   ` Jarkko Sakkinen
  4 siblings, 1 reply; 10+ messages in thread
From: Reinette Chatre @ 2022-05-12 21:51 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: haitao.huang, hpa, linux-kernel, stable

A PCMD (Paging Crypto MetaData) page contains the PCMD
structures of enclave pages that have been encrypted and
moved to the shmem backing store. When all enclave pages
sharing a PCMD page are loaded in the enclave, there is no
need for the PCMD page and it can be truncated from the
backing store.

A few issues appeared around the truncation of PCMD pages. The
known issues have been addressed but the PCMD handling code could
be made more robust by loudly complaining if any new issue appears
in this area.

Add a check that will complain with a warning if the PCMD page is not
actually empty after it has been truncated. There should never be data
in the PCMD page at this point since it was just checked to be empty
and truncated with the enclave mutex held, and it is only updated with
the enclave mutex held.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V2:
- Change WARN_ON_ONCE() to pr_warn(). (Jarkko).
- Add Haitao's Tested-by tag.

Changes since RFC v1:
- New patch.

 arch/x86/kernel/cpu/sgx/encl.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 243f3bd78145..3c24e6124d95 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -187,12 +187,20 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	kunmap_atomic(pcmd_page);
 	kunmap_atomic((void *)(unsigned long)pginfo.contents);
 
+	get_page(b.pcmd);
 	sgx_encl_put_backing(&b);
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page))
+	if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page)) {
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
+		pcmd_page = kmap_atomic(b.pcmd);
+		if (memchr_inv(pcmd_page, 0, PAGE_SIZE))
+			pr_warn("PCMD page not empty after truncate.\n");
+		kunmap_atomic(pcmd_page);
+	}
+
+	put_page(b.pcmd);
 
 	return ret;
 }
-- 
2.25.1



* Re: [PATCH V3 1/5] x86/sgx: Disconnect backing page references from dirty status
  2022-05-12 21:50 ` [PATCH V3 1/5] x86/sgx: Disconnect backing page references from dirty status Reinette Chatre
@ 2022-05-13 14:39   ` Jarkko Sakkinen
  0 siblings, 0 replies; 10+ messages in thread
From: Jarkko Sakkinen @ 2022-05-13 14:39 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, haitao.huang,
	hpa, linux-kernel, stable

On Thu, May 12, 2022 at 02:50:57PM -0700, Reinette Chatre wrote:
> SGX uses shmem backing storage to store encrypted enclave pages
> and their crypto metadata when enclave pages are moved out of
> enclave memory. Two shmem backing storage pages are associated with
> each enclave page - one backing page to contain the encrypted
> enclave page data and one backing page (shared by a few
> enclave pages) to contain the crypto metadata used by the
> processor to verify the enclave page when it is loaded back into
> the enclave.
> 
> sgx_encl_put_backing() is used to release references to the
> backing storage and, optionally, mark both backing store pages
> as dirty.
> 
> Managing references and dirty status together in this way results
> in both backing store pages being marked as dirty, even if only one
> of them has changed.
> 
> Additionally, waiting until the page reference is dropped to set
> the page dirty risks a race with the page fault handler that
> may load outdated data into the enclave when a page is faulted
> right after it is reclaimed.
> 
> Consider what happens if the reclaimer writes a page to the backing
> store and the page is immediately faulted back, before the reclaimer
> is able to set the dirty bit of the page:
> 
> sgx_reclaim_pages() {                    sgx_vma_fault() {
>   ...
>   sgx_encl_get_backing();
>   ...                                      ...
>   sgx_reclaimer_write() {
>     mutex_lock(&encl->lock);
>     /* Write data to backing store */
>     mutex_unlock(&encl->lock);
>   }
>                                            mutex_lock(&encl->lock);
>                                            __sgx_encl_eldu() {
>                                              ...
>                                              /*
>                                               * Enclave backing store
>                                               * page not released
>                                               * nor marked dirty -
>                                               * contents may not be
>                                               * up to date.
>                                               */
>                                               sgx_encl_get_backing();
>                                               ...
>                                               /*
>                                                * Enclave data restored
>                                                * from backing store
>                                                * and PCMD pages that
>                                                * are not up to date.
>                                                * ENCLS[ELDU] faults
>                                                * because of MAC or PCMD
>                                                * checking failure.
>                                                */
>                                                sgx_encl_put_backing();
>                                             }
>                                             ...
>   /* set page dirty */
>   sgx_encl_put_backing();
>   ...
>                                             mutex_unlock(&encl->lock);
> }                                        }
> 
> Remove the option for sgx_encl_put_backing() to set the backing
> pages as dirty and instead set the needed pages as dirty right after
> receiving important data, while the enclave mutex is held. This ensures
> that the page fault handler can get up-to-date data from a page, and
> prepares the code for a following change where only one of the backing
> pages needs to be marked as dirty.
> 
> Cc: stable@vger.kernel.org
> Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Tested-by: Haitao Huang <haitao.huang@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> Link: https://lore.kernel.org/linux-sgx/8922e48f-6646-c7cc-6393-7c78dcf23d23@intel.com/
> ---
> Changes since V2:
> - Mark page as dirty after receiving important data, not before. (Dave)
> - Add cc to stable and include link to discussion. (Dave)
> - Update changelog and move changelog description of race between reclaimer
>   and page fault handler from later in series to this patch.
> - Add Haitao's Tested-by tag.
> 
> Changes since RFC v1:
> - New patch.
> 
>  arch/x86/kernel/cpu/sgx/encl.c | 10 ++--------
>  arch/x86/kernel/cpu/sgx/encl.h |  2 +-
>  arch/x86/kernel/cpu/sgx/main.c |  6 ++++--
>  3 files changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 7c63a1911fae..398695a20605 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -94,7 +94,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	kunmap_atomic(pcmd_page);
>  	kunmap_atomic((void *)(unsigned long)pginfo.contents);
>  
> -	sgx_encl_put_backing(&b, false);
> +	sgx_encl_put_backing(&b);
>  
>  	sgx_encl_truncate_backing_page(encl, page_index);
>  
> @@ -645,15 +645,9 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
>  /**
>   * sgx_encl_put_backing() - Unpin the backing storage
>   * @backing:	data for accessing backing storage for the page
> - * @do_write:	mark pages dirty
>   */
> -void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
> +void sgx_encl_put_backing(struct sgx_backing *backing)
>  {
> -	if (do_write) {
> -		set_page_dirty(backing->pcmd);
> -		set_page_dirty(backing->contents);
> -	}
> -
>  	put_page(backing->pcmd);
>  	put_page(backing->contents);
>  }
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index fec43ca65065..d44e7372151f 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -107,7 +107,7 @@ void sgx_encl_release(struct kref *ref);
>  int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
>  int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
>  			 struct sgx_backing *backing);
> -void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
> +void sgx_encl_put_backing(struct sgx_backing *backing);
>  int sgx_encl_test_and_clear_young(struct mm_struct *mm,
>  				  struct sgx_encl_page *page);
>  
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 8e4bc6453d26..e71df40a4f38 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -191,6 +191,8 @@ static int __sgx_encl_ewb(struct sgx_epc_page *epc_page, void *va_slot,
>  			  backing->pcmd_offset;
>  
>  	ret = __ewb(&pginfo, sgx_get_epc_virt_addr(epc_page), va_slot);
> +	set_page_dirty(backing->pcmd);
> +	set_page_dirty(backing->contents);
>  
>  	kunmap_atomic((void *)(unsigned long)(pginfo.metadata -
>  					      backing->pcmd_offset));
> @@ -320,7 +322,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>  		sgx_encl_free_epc_page(encl->secs.epc_page);
>  		encl->secs.epc_page = NULL;
>  
> -		sgx_encl_put_backing(&secs_backing, true);
> +		sgx_encl_put_backing(&secs_backing);
>  	}
>  
>  out:
> @@ -411,7 +413,7 @@ static void sgx_reclaim_pages(void)
>  
>  		encl_page = epc_page->owner;
>  		sgx_reclaimer_write(epc_page, &backing[i]);
> -		sgx_encl_put_backing(&backing[i], true);
> +		sgx_encl_put_backing(&backing[i]);
>  
>  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
>  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> -- 
> 2.25.1
> 

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko


* Re: [PATCH V3 5/5] x86/sgx: Ensure no data in PCMD page after truncate
  2022-05-12 21:51 ` [PATCH V3 5/5] x86/sgx: Ensure no data in PCMD page after truncate Reinette Chatre
@ 2022-05-13 14:39   ` Jarkko Sakkinen
  0 siblings, 0 replies; 10+ messages in thread
From: Jarkko Sakkinen @ 2022-05-13 14:39 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, haitao.huang,
	hpa, linux-kernel, stable

On Thu, May 12, 2022 at 02:51:01PM -0700, Reinette Chatre wrote:
> A PCMD (Paging Crypto MetaData) page contains the PCMD
> structures of enclave pages that have been encrypted and
> moved to the shmem backing store. When all enclave pages
> sharing a PCMD page are loaded in the enclave, there is no
> need for the PCMD page and it can be truncated from the
> backing store.
> 
> A few issues appeared around the truncation of PCMD pages. The
> known issues have been addressed but the PCMD handling code could
> be made more robust by loudly complaining if any new issue appears
> in this area.
> 
> Add a check that will complain with a warning if the PCMD page is not
> actually empty after it has been truncated. There should never be data
> in the PCMD page at this point since it was just checked to be empty
> and truncated with the enclave mutex held, and it is only updated with
> the enclave mutex held.
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Tested-by: Haitao Huang <haitao.huang@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V2:
> - Change WARN_ON_ONCE() to pr_warn(). (Jarkko).
> - Add Haitao's Tested-by tag.
> 
> Changes since RFC v1:
> - New patch.
> 
>  arch/x86/kernel/cpu/sgx/encl.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 243f3bd78145..3c24e6124d95 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -187,12 +187,20 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	kunmap_atomic(pcmd_page);
>  	kunmap_atomic((void *)(unsigned long)pginfo.contents);
>  
> +	get_page(b.pcmd);
>  	sgx_encl_put_backing(&b);
>  
>  	sgx_encl_truncate_backing_page(encl, page_index);
>  
> -	if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page))
> +	if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page)) {
>  		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
> +		pcmd_page = kmap_atomic(b.pcmd);
> +		if (memchr_inv(pcmd_page, 0, PAGE_SIZE))
> +			pr_warn("PCMD page not empty after truncate.\n");
> +		kunmap_atomic(pcmd_page);
> +	}
> +
> +	put_page(b.pcmd);
>  
>  	return ret;
>  }
> -- 
> 2.25.1
> 

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko


* Re: [PATCH V3 3/5] x86/sgx: Obtain backing storage page with enclave mutex held
  2022-05-12 21:50 ` [PATCH V3 3/5] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
@ 2022-05-13 14:39   ` Jarkko Sakkinen
  0 siblings, 0 replies; 10+ messages in thread
From: Jarkko Sakkinen @ 2022-05-13 14:39 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, haitao.huang,
	hpa, linux-kernel, stable

On Thu, May 12, 2022 at 02:50:59PM -0700, Reinette Chatre wrote:
> Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
> instruction faulting with a #GP.
> 
> The WARN is encountered when the reclaimer evicts a range of
> pages from the enclave when the same pages are faulted back
> right away.
> 
> The SGX backing storage is accessed on two paths: when there
> are insufficient free pages in the EPC the reclaimer works
> to move enclave pages to the backing storage and as enclaves
> access pages that have been moved to the backing storage
> they are retrieved from there as part of page fault handling.
> 
> An oversubscribed SGX system will often run the reclaimer and
> page fault handler concurrently and needs to ensure that the
> backing store is accessed safely between the reclaimer and
> the page fault handler. This is not the case because the
> reclaimer accesses the backing store without the enclave mutex
> while the page fault handler accesses the backing store with
> the enclave mutex.
> 
> Consider the scenario where a page is faulted while a page sharing
> a PCMD page with the faulted page is being reclaimed. The
> consequence is a race between the reclaimer and the page fault
> handler: the reclaimer attempts to access a PCMD page at the
> same time that it is truncated by the page fault handler. This
> could result in lost PCMD data. Data may still be
> lost if the reclaimer wins the race; this is addressed in
> the following patch.
> 
> The reclaimer accesses pages from the backing storage without
> holding the enclave mutex and runs the risk of concurrently
> accessing the backing storage with the page fault handler that
> does access the backing storage with the enclave mutex held.
> 
> In the scenario below a PCMD page is truncated from the backing
> store after all its pages have been loaded into the enclave, at
> the same time that the PCMD page is loaded from the backing store
> when one of its pages is reclaimed:
> 
> sgx_reclaim_pages() {              sgx_vma_fault() {
>                                      ...
>                                      mutex_lock(&encl->lock);
>                                      ...
>                                      __sgx_encl_eldu() {
>                                        ...
>                                        if (pcmd_page_empty) {
> /*
>  * EPC page being reclaimed              /*
>  * shares a PCMD page with an             * PCMD page truncated
>  * enclave page that is being             * while requested from
>  * faulted in.                            * reclaimer.
>  */                                       */
> sgx_encl_get_backing()  <---------->      sgx_encl_truncate_backing_page()
>                                         }
>                                        mutex_unlock(&encl->lock);
> }                                    }
> 
> In this scenario there is a race between the reclaimer and the page fault
> handler when the reclaimer attempts to get access to the same PCMD page
> that is being truncated. This could result in the reclaimer writing to
> the PCMD page that is then truncated, causing the PCMD data to be lost,
> or in a new PCMD page being allocated. The loss of PCMD data may still
> occur after protecting the backing store access with the mutex - this is
> fixed in the next patch. By ensuring the backing store is accessed with
> the mutex held, the enclave page state can be kept accurate, with the
> SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page
> is in the process of being reclaimed.
> 
> Consistently protect the reclaimer's backing store access with the
> enclave's mutex to ensure that it can safely run concurrently with the
> page fault handler.
> 
> Cc: stable@vger.kernel.org
> Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
> Reported-by: Haitao Huang <haitao.huang@intel.com>
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
> Tested-by: Haitao Huang <haitao.huang@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V2:
> - Move description of "scenario a" to the new patch in series that marks
>   page as dirty with enclave mutex held.
> - Add Haitao's "Tested-by" tag.
> - Add Jarkko's "Reviewed-by" and "Tested-by" tags.
> 
>  arch/x86/kernel/cpu/sgx/main.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index e71df40a4f38..ab4ec54bbdd9 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>  	sgx_encl_ewb(epc_page, backing);
>  	encl_page->epc_page = NULL;
>  	encl->secs_child_cnt--;
> +	sgx_encl_put_backing(backing);
>  
>  	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
>  		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
> @@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void)
>  			goto skip;
>  
>  		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
> +
> +		mutex_lock(&encl_page->encl->lock);
>  		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
> -		if (ret)
> +		if (ret) {
> +			mutex_unlock(&encl_page->encl->lock);
>  			goto skip;
> +		}
>  
> -		mutex_lock(&encl_page->encl->lock);
>  		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
>  		mutex_unlock(&encl_page->encl->lock);
>  		continue;
> @@ -413,7 +417,6 @@ static void sgx_reclaim_pages(void)
>  
>  		encl_page = epc_page->owner;
>  		sgx_reclaimer_write(epc_page, &backing[i]);
> -		sgx_encl_put_backing(&backing[i]);
>  
>  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
>  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> -- 
> 2.25.1
> 

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko


* Re: [PATCH V3 4/5] x86/sgx: Fix race between reclaimer and page fault handler
  2022-05-12 21:51 ` [PATCH V3 4/5] x86/sgx: Fix race between reclaimer and page fault handler Reinette Chatre
@ 2022-05-13 14:40   ` Jarkko Sakkinen
  0 siblings, 0 replies; 10+ messages in thread
From: Jarkko Sakkinen @ 2022-05-13 14:40 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, haitao.huang,
	hpa, linux-kernel, stable

On Thu, May 12, 2022 at 02:51:00PM -0700, Reinette Chatre wrote:
> Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
> instruction faulting with a #GP.
> 
> The WARN is encountered when the reclaimer evicts a range of
> pages from the enclave when the same pages are faulted back right away.
> 
> Consider two enclave pages (ENCLAVE_A and ENCLAVE_B)
> sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the
> enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains
> just one entry, that of ENCLAVE_B.
> 
> Scenario proceeds where ENCLAVE_A is being evicted from the enclave
> while ENCLAVE_B is faulted in.
> 
> sgx_reclaim_pages() {
> 
>   ...
> 
>   /*
>    * Reclaim ENCLAVE_A
>    */
>   mutex_lock(&encl->lock);
>   /*
>    * Get a reference to ENCLAVE_A's
>    * shmem page where enclave page
>    * encrypted data will be stored
>    * as well as a reference to the
>    * enclave page's PCMD data page,
>    * PCMD_AB.
>    * Release mutex before writing
>    * any data to the shmem pages.
>    */
>   sgx_encl_get_backing(...);
>   encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
>   mutex_unlock(&encl->lock);
> 
>                                     /*
>                                      * Fault ENCLAVE_B
>                                      */
> 
>                                     sgx_vma_fault() {
> 
>                                       mutex_lock(&encl->lock);
>                                       /*
>                                        * Get reference to
>                                        * ENCLAVE_B's shmem page
>                                        * as well as PCMD_AB.
>                                        */
>                                       sgx_encl_get_backing(...)
>                                      /*
>                                       * Load page back into
>                                       * enclave via ELDU.
>                                       */
>                                      /*
>                                       * Release reference to
>                                       * ENCLAVE_B' shmem page and
>                                       * PCMD_AB.
>                                       */
>                                      sgx_encl_put_backing(...);
>                                      /*
>                                       * PCMD_AB is found empty so
>                                       * it and ENCLAVE_B's shmem page
>                                       * are truncated.
>                                       */
>                                      /* Truncate ENCLAVE_B backing page */
>                                      sgx_encl_truncate_backing_page();
>                                      /* Truncate PCMD_AB */
>                                      sgx_encl_truncate_backing_page();
> 
>                                      mutex_unlock(&encl->lock);
> 
>                                      ...
>                                      }
>   mutex_lock(&encl->lock);
>   encl_page->desc &=
>        ~SGX_ENCL_PAGE_BEING_RECLAIMED;
>   /*
>   * Write encrypted contents of
>   * ENCLAVE_A to ENCLAVE_A shmem
>   * page and its PCMD data to
>   * PCMD_AB.
>   */
>   sgx_encl_put_backing(...)
> 
>   /*
>    * Reference to PCMD_AB is
>    * dropped and it is truncated.
>    * ENCLAVE_A's PCMD data is lost.
>    */
>   mutex_unlock(&encl->lock);
> }
> 
> What happens next depends on whether ENCLAVE_A is faulted back in or
> ENCLAVE_B is evicted again, but both paths end with ENCLS[ELDU] faulting
> with a #GP.
> 
> If ENCLAVE_A is faulted back in, then at the time sgx_encl_get_backing()
> is called a new, empty PCMD page is allocated, and providing that empty
> PCMD data for ENCLAVE_A causes ENCLS[ELDU] to fault with a #GP.
> 
> If ENCLAVE_B is evicted first, then the reclaimer allocates a new PCMD_AB,
> but later, when ENCLAVE_A is faulted back in, ENCLS[ELDU] faults with a
> #GP during its checks of the PCMD value and the WARN is encountered.
> 
> The reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time it obtains a
> reference to the backing store pages of the enclave page it is in the
> process of reclaiming. Fix the race by truncating the PCMD page only after
> ensuring that no enclave page sharing that PCMD page is in the process of
> being reclaimed.
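> 
> For illustration, a condensed view of how the pieces of the fix fit
> together; the authoritative code is reclaimer_writing_to_pcmd() and the
> final hunk of the diff below:
> 
>   /*
>    * After __sgx_encl_eldu() loads a page back into the enclave, an
>    * apparently empty PCMD page may only be truncated if none of the
>    * enclave pages sharing it is flagged SGX_ENCL_PAGE_BEING_RECLAIMED;
>    * otherwise the reclaimer is about to write that page's PCMD data and
>    * the backing page must be kept.
>    */
>   if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page))
>   	sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));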
> 
> Cc: stable@vger.kernel.org
> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
> Reported-by: Haitao Huang <haitao.huang@intel.com>
> Tested-by: Haitao Huang <haitao.huang@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V2:
> - Declare "addr" and "entry" within loop. (Dave)
> - Fix incorrect error return when enclave page not found to belong
>   to enclave. Continue search instead of returning failure. (Dave)
> - Add Haitao's "Tested-by" tag.
> - Rename pcmd_page_in_use() to reclaimer_writing_to_pcmd() to be less
>   generic. (Dave)
> - Improve function comments to be clear about it testing for PCMD
>   page soon becoming non-empty, also add context info to kernel-doc
>   to indicate that enclave mutex must be held. (Dave)
> 
> Changes since RFC v1:
> - New patch.
> 
>  arch/x86/kernel/cpu/sgx/encl.c | 94 +++++++++++++++++++++++++++++++++-
>  1 file changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 5104a428b72c..243f3bd78145 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -12,6 +12,92 @@
>  #include "encls.h"
>  #include "sgx.h"
>  
> +#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd))
> +/*
> + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to
> + * determine the page index associated with the first PCMD entry
> + * within a PCMD page.
> + */
> +#define PCMD_FIRST_MASK GENMASK(4, 0)
> +
> +/**
> + * reclaimer_writing_to_pcmd() - Query if any enclave page associated with
> + *                               a PCMD page is in process of being reclaimed.
> + * @encl:        Enclave to which PCMD page belongs
> + * @start_addr:  Address of enclave page using first entry within the PCMD page
> + *
> + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
> + * stored. The PCMD data of a reclaimed enclave page contains enough
> + * information for the processor to verify the page at the time
> + * it is loaded back into the Enclave Page Cache (EPC).
> + *
> + * The backing storage to which enclave pages are reclaimed is laid out as
> + * follows:
> + * Encrypted enclave pages:SECS page:PCMD pages
> + *
> + * Each PCMD page contains the PCMD metadata of
> + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
> + *
> + * A PCMD page can only be truncated if it is (a) empty, and (b) not in the
> + * process of getting data (and thus soon being non-empty). (b) is tested with
> + * a check if an enclave page sharing the PCMD page is in the process of being
> + * reclaimed.
> + *
> + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
> + * intends to reclaim that enclave page - it means that the PCMD page
> + * associated with that enclave page is about to get some data and thus
> + * even if the PCMD page is empty, it should not be truncated.
> + *
> + * Context: Enclave mutex (&sgx_encl->lock) must be held.
> + * Return: 1 if the reclaimer is about to write to the PCMD page
> + *         0 if the reclaimer has no intention to write to the PCMD page
> + */
> +static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
> +				     unsigned long start_addr)
> +{
> +	int reclaimed = 0;
> +	int i;
> +
> +	/*
> +	 * PCMD_FIRST_MASK is based on number of PCMD entries within
> +	 * PCMD page being 32.
> +	 */
> +	BUILD_BUG_ON(PCMDS_PER_PAGE != 32);
> +
> +	for (i = 0; i < PCMDS_PER_PAGE; i++) {
> +		struct sgx_encl_page *entry;
> +		unsigned long addr;
> +
> +		addr = start_addr + i * PAGE_SIZE;
> +
> +		/*
> +		 * Stop when reaching the SECS page - it does not
> +		 * have a page_array entry and its reclaim is
> +		 * started and completed with enclave mutex held so
> +		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
> +		 * flag.
> +		 */
> +		if (addr == encl->base + encl->size)
> +			break;
> +
> +		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
> +		if (!entry)
> +			continue;
> +
> +		/*
> +		 * VA page slot ID uses same bit as the flag so it is important
> +		 * to ensure that the page is not already in backing store.
> +		 */
> +		if (entry->epc_page &&
> +		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
> +			reclaimed = 1;
> +			break;
> +		}
> +	}
> +
> +	return reclaimed;
> +}
> +
>  /*
>   * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's
>   * follow right after the EPC data in the backing storage. In addition to the
> @@ -47,6 +133,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK;
>  	struct sgx_encl *encl = encl_page->encl;
>  	pgoff_t page_index, page_pcmd_off;
> +	unsigned long pcmd_first_page;
>  	struct sgx_pageinfo pginfo;
>  	struct sgx_backing b;
>  	bool pcmd_page_empty;
> @@ -58,6 +145,11 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	else
>  		page_index = PFN_DOWN(encl->size);
>  
> +	/*
> +	 * Address of enclave page using the first entry within the PCMD page.
> +	 */
> +	pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base;
> +
>  	page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
>  
>  	ret = sgx_encl_get_backing(encl, page_index, &b);
> @@ -99,7 +191,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  
>  	sgx_encl_truncate_backing_page(encl, page_index);
>  
> -	if (pcmd_page_empty)
> +	if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page))
>  		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
>  
>  	return ret;
> -- 
> 2.25.1
> 

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-05-13 14:44 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-12 21:50 [PATCH V3 0/5] SGX shmem backing store issue Reinette Chatre
2022-05-12 21:50 ` [PATCH V3 1/5] x86/sgx: Disconnect backing page references from dirty status Reinette Chatre
2022-05-13 14:39   ` Jarkko Sakkinen
2022-05-12 21:50 ` [PATCH V3 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents Reinette Chatre
2022-05-12 21:50 ` [PATCH V3 3/5] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
2022-05-13 14:39   ` Jarkko Sakkinen
2022-05-12 21:51 ` [PATCH V3 4/5] x86/sgx: Fix race between reclaimer and page fault handler Reinette Chatre
2022-05-13 14:40   ` Jarkko Sakkinen
2022-05-12 21:51 ` [PATCH V3 5/5] x86/sgx: Ensure no data in PCMD page after truncate Reinette Chatre
2022-05-13 14:39   ` Jarkko Sakkinen
