All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] SGX shmem backing store issue
@ 2022-04-28 20:11 Reinette Chatre
  2022-04-28 20:11 ` [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure Reinette Chatre
                   ` (7 more replies)
  0 siblings, 8 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-04-28 20:11 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Everybody,

Haitao reported encountering the following WARN while stress testing SGX
with the SGX2 series [1] applied:

ELDU returned 1073741837 (0x4000000d)
WARNING: CPU: 72 PID: 24407 at arch/x86/kernel/cpu/sgx/encl.c:81 sgx_encl_eldu+0x3cf/0x400
...
Call Trace:
<TASK>
? xa_load+0x6e/0xa0
__sgx_encl_load_page+0x3d/0x80
sgx_encl_load_page_in_vma+0x4a/0x60
sgx_vma_fault+0x7f/0x3b0
? sysvec_call_function_single+0x66/0xd0
? asm_sysvec_call_function_single+0x12/0x20
__do_fault+0x39/0x110
__handle_mm_fault+0x1222/0x15a0
handle_mm_fault+0xdb/0x2c0
do_user_addr_fault+0x1d1/0x650
? exit_to_user_mode_prepare+0x45/0x170
exc_page_fault+0x76/0x170
? asm_exc_page_fault+0x8/0x30
asm_exc_page_fault+0x1e/0x30
...
</TASK>

ENCLS[ELDU] is returning a #GP when attempting to load an enclave
page from the backing store into the enclave. This is likely because
of a MAC verification failure.

Haitao's stress testing involves running two concurrent loops of the SGX
selftests on a system with 4GB EPC memory. One of the tests is modified
to reduce the oversubscription heap size:

diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index d480c2dd2858..12008789325b 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -398,7 +398,7 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
 	 * Create enclave with additional heap that is as big as all
 	 * available physical SGX memory.
 	 */
-	total_mem = get_total_epc_mem();
+	total_mem = get_total_epc_mem()/16;
 	ASSERT_NE(total_mem, 0);
 	TH_LOG("Creating an enclave with %lu bytes heap may take a while ...",
 	       total_mem);

If the the test compiled with above snippet is renamed as "test_sgx_small"
and the original renamed as "test_sgx_large" the two concurrent loops are
run as follows:

(for i in $(seq 1 999); do echo "Iteration $i"; sudo ./test_sgx_large; done ) > log.large 2>&1
(for i in $(seq 1 9999); do echo "Iteration $i"; sudo ./test_sgx_small; done ) > log.small 2>&1

If the SGX driver is modified to always WARN when ENCLS[ELDU] encounters a #GP
(see below) then the WARN appears after a few iterations of "test_sgx_large"
and shows up throughout the testing.

diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index 99004b02e2ed..68c1dbc84ed3 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -18,7 +18,7 @@
 #define ENCLS_WARN(r, name) {						  \
 	do {								  \
 		int _r = (r);						  \
-		WARN_ONCE(_r, "%s returned %d (0x%x)\n", (name), _r, _r); \
+		WARN(_r, "%s returned %d (0x%x)\n", (name), _r, _r); \
 	} while (0);							  \
 }

I focused on investigating this issue the past two weeks and was able to
narrow down the cause but not yet fix the issue. Now I am out of ideas and
would really appreciate if you could provide guidance on how to proceed.

Here is what I have learned so far:
* Reverting commit 08999b2489b4 ("x86/sgx: Free backing memory after
  faulting the enclave page") resolves the issue. With that commit
  reverted the concurrent selftest loops can run to completion without
  encountering any WARNs.

* The issue is also resolved if only the calls (introduced by commit
  08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave
  page") ) to sgx_encl_truncate_backing_page() within __sgx_encl_eldu()
  are disabled.

* Two issues with commit 08999b2489b4 ("x86/sgx: Free backing memory
  after faulting the enclave page") are fixed with patch 1 and 2 in
  this series. These fixes do not resolve the WARN.

* There is a potential race between the reclaimer and the page fault
  handler while interacting with the backing store. This is addressed
  in patch 3, but it does not resolve the WARN.

* Lockdep did reveal a locking issue in SGX2 - the EAUG flow should not
  do direct reclaim since the page fault handler already has mmap_lock
  that the reclaimer will attempt to obtain. This will be fixed in the
  next version of SGX2 but that fix does not resolve the WARN.

* Since ENCLS[ELDU] returns #GP on MAC verification failure a possible
  cause could be if a problem occurs while loading the page from
  backing store.
  sgx_encl_get_backing_page() is used to both allocate backing storage
  in the ENCLS[EWB] flow as well as to lookup backing storage in the
  ENCLS[ELDU] flow.
  As part of the investigation the interface was split to ensure that
  backing storage is only allocated during the ENCLS[EWB] flow and only
  lookup occurs during the ENCLS[ELDU] flow. This can be found in
  patch 4.
  This change revealed that there are cases during ENCLS[ELDU] flow
  when the backing page is not present - attempting to lookup a
  backing page returns -ENOENT. Before patch 4 a new backing page
  would be allocated and thus trigger a #GP when ENCLS[ELDU] is
  attempted.
  This change is not a fix, the code path just fails earlier now, but
  it reveals a cause of the WARNs encountered (MAC verification will
  fail on a newly allocated page) but not why the pages cannot be found
  in the backing store. With this change the number of WARNs are
  reduced significantly but not completely. Even with this patch
  applied the WARN was encountered once while running the stress
  selftests.

Could you please advise on how to proceed? Do you perhaps have ideas
why the backing store could not contain a page when it is loaded back
during the ENCLS[ELDU] flow?
There is some interesting text in the comments within shmem_fault()
that hints that faulting pages should be avoided into holes as they
are being punched ... but I do not understand those details and
would really appreciate guidance on how to understand what is going on
here.

These SGX2 tests exercise the concurrent operation of reclaimer and page
fault handler and have a good track record of locating issues. Thanks
to it we already have:
8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
ac5d272a0ad0 ("x86/sgx: Fix free page accounting")
While this issue is triggered by the SGX2 tests I cannot find a connection
with the SGX2 code paths. I have made a few attempts to create an
equivalent test for SGX1 but have not yet had success to create a SGX1
test that can trigger this issue.

These patches are based on v5.18-rc4 with the SGX2 series applied but
do not touch SGX2 code paths. Patch 1,2, and 3 also apply cleanly to
v5.18-rc4 while patch 4 has a conflict in arch/x86/kernel/cpu/sgx/encl.h
due to the SGX2 declarations.

Your feedback will be greatly appreciated.
Thank you very much

Reinette

[1] https://lore.kernel.org/lkml/cover.1649878359.git.reinette.chatre@intel.com/

Reinette Chatre (4):
  x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  x86/sgx: Set dirty bit after modifying page contents
  x86/sgx: Obtain backing storage page with enclave mutex held
  x86/sgx: Do not allocate backing pages when loading from backing store

 arch/x86/kernel/cpu/sgx/encl.c  | 89 ++++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/sgx/encl.h  |  8 ++-
 arch/x86/kernel/cpu/sgx/ioctl.c |  1 +
 arch/x86/kernel/cpu/sgx/main.c  | 15 +++---
 4 files changed, 93 insertions(+), 20 deletions(-)

-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-04-28 20:11 [RFC PATCH 0/4] SGX shmem backing store issue Reinette Chatre
@ 2022-04-28 20:11 ` Reinette Chatre
  2022-04-28 21:30   ` Dave Hansen
  2022-04-28 20:11 ` [RFC PATCH 2/4] x86/sgx: Set dirty bit after modifying page contents Reinette Chatre
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-04-28 20:11 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Recent commit 08999b2489b4 ("x86/sgx: Free backing memory
after faulting the enclave page") frees the backing storage
after it becomes unneeded due to the data from it being
loaded back into the EPC (Enclave Page Cache).

The backing storage is freed after running ENCLS[ELDU],
whether ENCLS[ELDU] succeeded or not. If ENCLS[ELDU]
thus failed then the data within that page is lost.

Exit with error without removing the backing storage if
it could not be restored to the enclave.

Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 1a2cbe44b8d9..e5d2661800ac 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -81,6 +81,10 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 			ENCLS_WARN(ret, "ELDU");
 
 		ret = -EFAULT;
+		kunmap_atomic(pcmd_page);
+		kunmap_atomic((void *)(unsigned long)pginfo.contents);
+		sgx_encl_put_backing(&b, false);
+		return ret;
 	}
 
 	memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH 2/4] x86/sgx: Set dirty bit after modifying page contents
  2022-04-28 20:11 [RFC PATCH 0/4] SGX shmem backing store issue Reinette Chatre
  2022-04-28 20:11 ` [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure Reinette Chatre
@ 2022-04-28 20:11 ` Reinette Chatre
  2022-04-28 21:40   ` Dave Hansen
  2022-05-06 22:27   ` Jarkko Sakkinen
  2022-04-28 20:11 ` [RFC PATCH 3/4] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-04-28 20:11 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Recent commit 08999b2489b4 ("x86/sgx: Free backing memory
after faulting the enclave page") expanded __sgx_encl_eldu()
to clear an enclave page's PCMD (Paging Crypto MetaData)
from the PCMD page in the backing store after the enclave
page is restored to the enclave.

Since the PCMD page in the backing store is modified the page
should be set as dirty when releasing the reference to
ensure the modified data is retained.

Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index e5d2661800ac..e03f124ce772 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -98,7 +98,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	kunmap_atomic(pcmd_page);
 	kunmap_atomic((void *)(unsigned long)pginfo.contents);
 
-	sgx_encl_put_backing(&b, false);
+	sgx_encl_put_backing(&b, true);
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH 3/4] x86/sgx: Obtain backing storage page with enclave mutex held
  2022-04-28 20:11 [RFC PATCH 0/4] SGX shmem backing store issue Reinette Chatre
  2022-04-28 20:11 ` [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure Reinette Chatre
  2022-04-28 20:11 ` [RFC PATCH 2/4] x86/sgx: Set dirty bit after modifying page contents Reinette Chatre
@ 2022-04-28 20:11 ` Reinette Chatre
  2022-04-28 21:58   ` Dave Hansen
  2022-05-06 22:43   ` Jarkko Sakkinen
  2022-04-28 20:11 ` [RFC PATCH 4/4] x86/sgx: Do not allocate backing pages when loading from backing store Reinette Chatre
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-04-28 20:11 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

The SGX backing storage is accessed on two paths: when there
are insufficient enclave pages in the EPC the reclaimer works
to move enclave pages to the backing storage and as enclaves
access pages that have been moved to the backing storage
they are retrieved from there as part of page fault handling.

An oversubscribed SGX system will often run the reclaimer and
page fault handler concurrently and needs to ensure that the
backing store is accessed safely between the reclaimer and
the page fault handler. The scenarios to consider here are:
(a) faulting a page right after it was reclaimed,
(b) faulting a page and reclaiming another page that are
    sharing a PCMD page.

The reclaimer obtains pages from the backing storage without
holding the enclave mutex and runs the risk of concurrently
accessing the backing storage with the page fault handler that
does access the backing storage with the enclave mutex held.

In the scenario below a page is written to the backing store
by the reclaimer and then immediately faulted back, before
the reclaimer is able to set the dirty bit of the page:

sgx_reclaim_pages() {                    sgx_vma_fault() {
...                                      ...
/* write data to backing store */
sgx_reclaimer_write();
                                         mutex_lock(&encl->lock);
                                         __sgx_encl_eldu() {
                                         ...
                                         /* page not dirty -
                                          * contents may not be
                                          * up to date
                                         */
                                         sgx_encl_get_backing();
                                         ...
                                         }
                                         ...
/* set page dirty */
sgx_encl_put_backing();
...
                                         mutex_unlock(&encl->lock);
}                                        }

While it is not possible to concurrently reclaim and fault the same
enclave page the PCMD pages are shared between enclave pages
in the enclave and enclave pages in the backing store.
In the below scenario a PCMD page is truncated from the backing
store after all its pages have been loaded in to the enclave
at the same time the PCMD page is loaded from the backing store
when one of its pages are reclaimed:

sgx_reclaim_pages() {              sgx_vma_fault() {
                                     ...
                                     mutex_lock(&encl->lock);
                                     ...
                                     __sgx_encl_eldu() {
                                       ...
                                       if (pcmd_page_empty) {
/*
 * EPC page being reclaimed              /*
 * shares a PCMD page with an             * PCMD page truncated
 * enclave page that is being             * while requested from
 * faulted in.                            * reclaimer.
 */                                       */
sgx_encl_get_backing()  <---------->      sgx_encl_truncate_backing_page()
                                        }
}                                    }

Protect the reclaimer's backing store access with the enclave's mutex
to ensure that it can safely run concurrently with the page fault handler.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 0e8741a80cf3..ae79b8d6f645 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -252,6 +252,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	sgx_encl_ewb(epc_page, backing);
 	encl_page->epc_page = NULL;
 	encl->secs_child_cnt--;
+	sgx_encl_put_backing(backing, true);
 
 	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
 		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
@@ -323,11 +324,14 @@ static void sgx_reclaim_pages(void)
 			goto skip;
 
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
+
+		mutex_lock(&encl_page->encl->lock);
 		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
-		if (ret)
+		if (ret) {
+			mutex_unlock(&encl_page->encl->lock);
 			goto skip;
+		}
 
-		mutex_lock(&encl_page->encl->lock);
 		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
 		mutex_unlock(&encl_page->encl->lock);
 		continue;
@@ -355,7 +359,6 @@ static void sgx_reclaim_pages(void)
 
 		encl_page = epc_page->owner;
 		sgx_reclaimer_write(epc_page, &backing[i]);
-		sgx_encl_put_backing(&backing[i], true);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH 4/4] x86/sgx: Do not allocate backing pages when loading from backing store
  2022-04-28 20:11 [RFC PATCH 0/4] SGX shmem backing store issue Reinette Chatre
                   ` (2 preceding siblings ...)
  2022-04-28 20:11 ` [RFC PATCH 3/4] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
@ 2022-04-28 20:11 ` Reinette Chatre
  2022-04-28 21:12 ` [RFC PATCH 0/4] SGX shmem backing store issue Dave Hansen
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-04-28 20:11 UTC (permalink / raw)
  To: dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Preface:
This is intended as a debugging aid forming part of the investigation
into ENCLS[ELDU] returning #GP. This is not intended for inclusion.

Changelog:
The shmem backing store is used to (a) store encrypted enclave pages
when they are reclaimed from the enclave, and (b) load encrypted
enclave pages back into the enclave when they are accessed.

The same interface, sgx_encl_get_backing_page(), is used whether
a new backing page is needed to store an enclave page being
reclaimed or whether an existing backing page is loaded to reload
an enclave page to the EPC. This is because of this flow:
sgx_encl_get_backing_page()
        shmem_read_mapping_page_gfp()
                shmem_getpage_gfp(..., ..., ..., SGP_CACHE, ...)

With this interface used the backing pages are retrieved with the
SGP_CACHE flag that will automatically allocate a backing page if
it is not present.

In an effort to diagnose #GP ENCLS[ELDU] the interface is split
to ensure that when a backing page is expected to exist it is just
loaded (lookup), not allocated.

Replace sgx_encl_get_backing() with sgx_encl_lookup_backing() and
sgx_encl_alloc_backing() to distinguish whether a backing page needs
to be allocated or is expected to exist. sgx_encl_alloc_backing()
is used by the reclaimer during the ENCLS[EWB] flow and
sgx_encl_lookup_backing() is used in the ENCLS[ELDU] flow.

An IDA is used to keep track of PCMD page allocation to ensure these
pages are allocated once.

This patch revealed that there are scenarios where the backing store
does not contain a page expected to exist - sgx_encl_lookup_backing()
fails with -ENOENT. This would explain ENCLS[ELDU] returning #GP
since previously such a missing page would be allocated and thus
trigger a MAC verification failure.

Specifically, with the debugging included enabled an oversubscribe
stress test encounters the error:
sgx: sgx_encl_get_backing_page:847 fail 1 backing page with -2

The reason why the backing page is not present is not understood.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c  | 83 ++++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/sgx/encl.h  |  8 +++-
 arch/x86/kernel/cpu/sgx/ioctl.c |  1 +
 arch/x86/kernel/cpu/sgx/main.c  |  6 +--
 4 files changed, 82 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index e03f124ce772..22ed886dc825 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -60,7 +60,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
 
-	ret = sgx_encl_get_backing(encl, page_index, &b);
+	ret = sgx_encl_lookup_backing(encl, page_index, &b);
 	if (ret)
 		return ret;
 
@@ -102,8 +102,10 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty)
+	if (pcmd_page_empty) {
+		ida_free(&encl->pcmd_in_backing, PFN_DOWN(page_pcmd_off));
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
+	}
 
 	return ret;
 }
@@ -617,6 +619,7 @@ void sgx_encl_release(struct kref *ref)
 
 	if (encl->backing)
 		fput(encl->backing);
+	ida_destroy(&encl->pcmd_in_backing);
 
 	cleanup_srcu_struct(&encl->srcu);
 
@@ -807,17 +810,39 @@ const cpumask_t *sgx_encl_cpumask(struct sgx_encl *encl)
 }
 
 static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
-					      pgoff_t index)
+					      pgoff_t index, enum sgp_type sgp)
 {
 	struct inode *inode = encl->backing->f_path.dentry->d_inode;
-	struct address_space *mapping = inode->i_mapping;
-	gfp_t gfpmask = mapping_gfp_mask(mapping);
+	struct page *page = NULL;
+	int ret;
+
+	ret = shmem_getpage(inode, index, &page, sgp);
+	if (ret) {
+		pr_debug("%s:%d fail %d backing page with %d\n",
+			 __func__, __LINE__, sgp, ret);
+		return ERR_PTR(ret);
+	}
+
+	if (!page) {
+		pr_debug("%s:%d fail %d backing page with NULL page\n",
+			 __func__, __LINE__, sgp);
+		return ERR_PTR(-EFAULT);
+	}
 
-	return shmem_read_mapping_page_gfp(mapping, index, gfpmask);
+	if (PageHWPoison(page)) {
+		pr_debug("%s:%d fail %d backing page with poison page\n",
+			 __func__, __LINE__, sgp);
+		unlock_page(page);
+		put_page(page);
+		return ERR_PTR(-EIO);
+	}
+
+	unlock_page(page);
+	return page;
 }
 
 /**
- * sgx_encl_get_backing() - Pin the backing storage
+ * sgx_encl_alloc_backing() - Pin the backing storage
  * @encl:	an enclave pointer
  * @page_index:	enclave page index
  * @backing:	data for accessing backing storage for the page
@@ -829,18 +854,54 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
  *   0 on success,
  *   -errno otherwise.
  */
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
-			 struct sgx_backing *backing)
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+			   struct sgx_backing *backing)
+{
+	pgoff_t page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
+	pgoff_t pcmd_index = PFN_DOWN(page_pcmd_off);
+	struct page *contents;
+	struct page *pcmd;
+	int ret;
+
+	contents = sgx_encl_get_backing_page(encl, page_index, SGP_CACHE);
+	if (IS_ERR(contents))
+		return PTR_ERR(contents);
+
+	ret = ida_alloc_range(&encl->pcmd_in_backing, pcmd_index,
+			      pcmd_index, GFP_KERNEL);
+	if (ret == -ENOSPC) {
+		/* pcmd_index backing page already created, just look it up */
+		pcmd = sgx_encl_get_backing_page(encl, pcmd_index, SGP_NOALLOC);
+	} else if (ret >= 0) {
+		pcmd = sgx_encl_get_backing_page(encl, pcmd_index, SGP_CACHE);
+	} else {
+		pcmd = ERR_PTR(ret);
+	}
+	if (IS_ERR(pcmd)) {
+		put_page(contents);
+		return PTR_ERR(pcmd);
+	}
+
+	backing->page_index = page_index;
+	backing->contents = contents;
+	backing->pcmd = pcmd;
+	backing->pcmd_offset = page_pcmd_off & (PAGE_SIZE - 1);
+
+	return 0;
+}
+
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+			    struct sgx_backing *backing)
 {
 	pgoff_t page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
 	struct page *contents;
 	struct page *pcmd;
 
-	contents = sgx_encl_get_backing_page(encl, page_index);
+	contents = sgx_encl_get_backing_page(encl, page_index, SGP_NOALLOC);
 	if (IS_ERR(contents))
 		return PTR_ERR(contents);
 
-	pcmd = sgx_encl_get_backing_page(encl, PFN_DOWN(page_pcmd_off));
+	pcmd = sgx_encl_get_backing_page(encl, PFN_DOWN(page_pcmd_off), SGP_NOALLOC);
 	if (IS_ERR(pcmd)) {
 		put_page(contents);
 		return PTR_ERR(pcmd);
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 66adb8faec45..2a8d3bd3338f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -8,6 +8,7 @@
 #define _X86_ENCL_H
 
 #include <linux/cpumask.h>
+#include <linux/idr.h>
 #include <linux/kref.h>
 #include <linux/list.h>
 #include <linux/mm_types.h>
@@ -62,6 +63,7 @@ struct sgx_encl {
 
 	cpumask_t cpumask;
 	struct file *backing;
+	struct ida pcmd_in_backing;
 	struct kref refcount;
 	struct list_head va_pages;
 	unsigned long mm_list_version;
@@ -107,8 +109,10 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 void sgx_encl_release(struct kref *ref);
 int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
 const cpumask_t *sgx_encl_cpumask(struct sgx_encl *encl);
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
-			 struct sgx_backing *backing);
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+			   struct sgx_backing *backing);
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+			    struct sgx_backing *backing);
 void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 1c2f40b72551..94d3817b40ff 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -82,6 +82,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	}
 
 	encl->backing = backing;
+	ida_init(&encl->pcmd_in_backing);
 
 	secs_epc = sgx_alloc_epc_page(&encl->secs, true);
 	if (IS_ERR(secs_epc)) {
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index ae79b8d6f645..148ec695b1b3 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -255,8 +255,8 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	sgx_encl_put_backing(backing, true);
 
 	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
-		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
-					   &secs_backing);
+		ret = sgx_encl_alloc_backing(encl, PFN_DOWN(encl->size),
+					     &secs_backing);
 		if (ret)
 			goto out;
 
@@ -326,7 +326,7 @@ static void sgx_reclaim_pages(void)
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
 
 		mutex_lock(&encl_page->encl->lock);
-		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
+		ret = sgx_encl_alloc_backing(encl_page->encl, page_index, &backing[i]);
 		if (ret) {
 			mutex_unlock(&encl_page->encl->lock);
 			goto skip;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-04-28 20:11 [RFC PATCH 0/4] SGX shmem backing store issue Reinette Chatre
                   ` (3 preceding siblings ...)
  2022-04-28 20:11 ` [RFC PATCH 4/4] x86/sgx: Do not allocate backing pages when loading from backing store Reinette Chatre
@ 2022-04-28 21:12 ` Dave Hansen
  2022-04-29 18:50   ` Reinette Chatre
  2022-04-28 21:29 ` Dave Hansen
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 50+ messages in thread
From: Dave Hansen @ 2022-04-28 21:12 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 4/28/22 13:11, Reinette Chatre wrote:
> ELDU returned 1073741837 (0x4000000d)
> WARNING: CPU: 72 PID: 24407 at arch/x86/kernel/cpu/sgx/encl.c:81 sgx_encl_eldu+0x3cf/0x400
> ...
> Call Trace:
> <TASK>
> ? xa_load+0x6e/0xa0
> __sgx_encl_load_page+0x3d/0x80
> sgx_encl_load_page_in_vma+0x4a/0x60
> sgx_vma_fault+0x7f/0x3b0

First of all, thanks for all the work to narrow this down.

It sounds like there are probably at least two failure modes at play here:

	1. shmem_read_mapping_page_gfp() is called to retrieve an
	   existing page, but an empty one is allocated instead.  ELDU
	   fails on the empty page.  This one should be fixed by patch 	
	   4/4.
	2. shmem_read_mapping_page_gfp() actually finds a page, but it
	   still fails ELDU.

Is that right?

If so, I'd probably delve deeper into what the page and the PCMD look
like.  I usually go after these kinds of things with tracing.  I'd
probably dump some representation of the PCMD and page contents with
trace_printk().  Dump them when the at __sgx_encl_ewb() time, then also
dump them where the warning is being hit.  Pair the warning with a
tracing_off().

// A crude checksum:
u64 sum_page(u64 *page)
{
	u64 ret = 0
	int i;

	for (i = 0; i < PAGE_SIZE/sizeof(u64)); i++)
		ret += page[i];

	return ret;
}

Then, logically something like this:

	trace_printk("bad ELDU on shm page: %x sum: pcmd: %x %x...\n",
		page_to_pfn(shm_page), sum_page(page_kmap),
		&pcmd, ...);

Both at EWB time and ELDU time.  Let's see if the pages that are coming
out of shmem are the same as the ones that were put in.

When you hit the warning, tracing should turn itself off.  Then, you can
just grep through the trace for that same pfn.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-04-28 20:11 [RFC PATCH 0/4] SGX shmem backing store issue Reinette Chatre
                   ` (4 preceding siblings ...)
  2022-04-28 21:12 ` [RFC PATCH 0/4] SGX shmem backing store issue Dave Hansen
@ 2022-04-28 21:29 ` Dave Hansen
  2022-04-28 22:20   ` Reinette Chatre
  2022-05-04  6:40 ` Jarkko Sakkinen
  2022-05-05  6:09 ` Jarkko Sakkinen
  7 siblings, 1 reply; 50+ messages in thread
From: Dave Hansen @ 2022-04-28 21:29 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 4/28/22 13:11, Reinette Chatre wrote:
> ELDU returned 1073741837 (0x4000000d)

BTW, what does that decode to exactly?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-04-28 20:11 ` [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure Reinette Chatre
@ 2022-04-28 21:30   ` Dave Hansen
  2022-04-28 22:20     ` Reinette Chatre
  2022-05-06 22:09     ` Jarkko Sakkinen
  0 siblings, 2 replies; 50+ messages in thread
From: Dave Hansen @ 2022-04-28 21:30 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 4/28/22 13:11, Reinette Chatre wrote:
> 
> The backing storage is freed after running ENCLS[ELDU],
> whether ENCLS[ELDU] succeeded or not. If ENCLS[ELDU]
> thus failed then the data within that page is lost.
> 
> Exit with error without removing the backing storage if
> it could not be restored to the enclave.
> 
> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/encl.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 1a2cbe44b8d9..e5d2661800ac 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -81,6 +81,10 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  			ENCLS_WARN(ret, "ELDU");
>  
>  		ret = -EFAULT;
> +		kunmap_atomic(pcmd_page);
> +		kunmap_atomic((void *)(unsigned long)pginfo.contents);
> +		sgx_encl_put_backing(&b, false);
> +		return ret;
>  	}
>  
>  	memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));

Are there any transient, recoverable errors that can come back from
ELDU?  If so, this makes a lot of sense.  If not, then it doesn't make a
lot of sense to preserve the swapped-out content because they enclave is
going to die anyway.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 2/4] x86/sgx: Set dirty bit after modifying page contents
  2022-04-28 20:11 ` [RFC PATCH 2/4] x86/sgx: Set dirty bit after modifying page contents Reinette Chatre
@ 2022-04-28 21:40   ` Dave Hansen
  2022-04-28 22:41     ` Reinette Chatre
  2022-05-06 22:27   ` Jarkko Sakkinen
  1 sibling, 1 reply; 50+ messages in thread
From: Dave Hansen @ 2022-04-28 21:40 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 4/28/22 13:11, Reinette Chatre wrote:
> 
> Since the PCMD page in the backing store is modified the page
> should be set as dirty when releasing the reference to
> ensure the modified data is retained.
> 
> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/encl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index e5d2661800ac..e03f124ce772 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -98,7 +98,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	kunmap_atomic(pcmd_page);
>  	kunmap_atomic((void *)(unsigned long)pginfo.contents);
>  
> -	sgx_encl_put_backing(&b, false);
> +	sgx_encl_put_backing(&b, true);

I think you're on the right track here.  The concept is right.  The
memset() wrote fresh data into the b.pcmd page.  But, if it were clean
swap cache, it can be discarded again and the memset() might be lost.

I *think* all that would do in the end is leave us with a PCMD page that
will never be truncated because the page has non-zero PCMD data that
will never be used, the result of the thrown-away memset().

But, I'd rather this be done more directly and closer to the actual
dirtying of the page.  Perhaps:

+	set_page_dirty(b.pcmd);
        memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));


I don't think the "b.contents" page needs the same treatment because its
contents are always discarded in this path while some of the PCMD page's
contents need to be preserved.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 3/4] x86/sgx: Obtain backing storage page with enclave mutex held
  2022-04-28 20:11 ` [RFC PATCH 3/4] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
@ 2022-04-28 21:58   ` Dave Hansen
  2022-04-28 22:44     ` Reinette Chatre
  2022-05-06 22:43   ` Jarkko Sakkinen
  1 sibling, 1 reply; 50+ messages in thread
From: Dave Hansen @ 2022-04-28 21:58 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 4/28/22 13:11, Reinette Chatre wrote:
> @@ -252,6 +252,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>  	sgx_encl_ewb(epc_page, backing);
>  	encl_page->epc_page = NULL;
>  	encl->secs_child_cnt--;
> +	sgx_encl_put_backing(backing, true);

Could you also talk a bit about why this needed to move?  It's a bit
harder to make sense of the refcounting when the get and put are done in
different functions.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-04-28 21:29 ` Dave Hansen
@ 2022-04-28 22:20   ` Reinette Chatre
  0 siblings, 0 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-04-28 22:20 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 4/28/2022 2:29 PM, Dave Hansen wrote:
> On 4/28/22 13:11, Reinette Chatre wrote:
>> ELDU returned 1073741837 (0x4000000d)
> 
> BTW, what does that decode to exactly?

ENCLS instructions can fault as well as return SGX error
codes.

0x40000000 (SGX_ENCLS_FAULT_FLAG) indicates that this is
an ENCLS fault (as opposed to an error returned by the
ENCLS instruction) with 0xd (#GP) identifying the fault.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-04-28 21:30   ` Dave Hansen
@ 2022-04-28 22:20     ` Reinette Chatre
  2022-04-28 22:53       ` Dave Hansen
  2022-05-06 22:09     ` Jarkko Sakkinen
  1 sibling, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-04-28 22:20 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 4/28/2022 2:30 PM, Dave Hansen wrote:
> On 4/28/22 13:11, Reinette Chatre wrote:

> Are there any transient, recoverable errors that can come back from
> ELDU?  If so, this makes a lot of sense.  If not, then it doesn't make a
> lot of sense to preserve the swapped-out content because they enclave is
> going to die anyway.

Good point.

Theoretically ELDU could encounter a page fault while accessing the 
regions it needs to read from and write to. These faults are passed
through and the instruction would return with a #PF that is
propagated with the page fault handler returning SIGBUS.

Even so, this flow also impacts the SGX2 flows that need to load pages from
the backing store. In this case the kernel would pass it as an error
(-EFAULT) to the runtime but it would not result in the
enclave being killed. If it was a #PF that caused the issue then
perhaps theoretically the SGX2 instruction has a chance of succeeding
if the runtime attempts it again? 

Reinette


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 2/4] x86/sgx: Set dirty bit after modifying page contents
  2022-04-28 21:40   ` Dave Hansen
@ 2022-04-28 22:41     ` Reinette Chatre
  0 siblings, 0 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-04-28 22:41 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 4/28/2022 2:40 PM, Dave Hansen wrote:
> On 4/28/22 13:11, Reinette Chatre wrote:
>>
>> Since the PCMD page in the backing store is modified the page
>> should be set as dirty when releasing the reference to
>> ensure the modified data is retained.
>>
>> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>>  arch/x86/kernel/cpu/sgx/encl.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
>> index e5d2661800ac..e03f124ce772 100644
>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>> @@ -98,7 +98,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>>  	kunmap_atomic(pcmd_page);
>>  	kunmap_atomic((void *)(unsigned long)pginfo.contents);
>>  
>> -	sgx_encl_put_backing(&b, false);
>> +	sgx_encl_put_backing(&b, true);
> 
> I think you're on the right track here.  The concept is right.  The
> memset() wrote fresh data into the b.pcmd page.  But, if it were clean
> swap cache, it can be discarded again and the memset() might be lost.
> 
> I *think* all that would do in the end is leave us with a PCMD page that
> will never be truncated because the page has non-zero PCMD data that
> will never be used, the result of the thrown-away memset().

Thank you very much for the analysis. This explains why this change
did not impact the issue I am chasing.

> 
> But, I'd rather this be done more directly and closer to the actual
> dirtying of the page.  Perhaps:
> 
> +	set_page_dirty(b.pcmd);
>         memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
> 
> 

Will do (with comments to explain why this line was extracted from
the subsequent sgx_encl_put_backing() call). 


> I don't think the "b.contents" page needs the same treatment because its
> contents are always discarded in this path while some of the PCMD page's
> contents need to be preserved.

That's right. The consistent symmetrical API of get()/put() was appealing
to me.

Thank you very much.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 3/4] x86/sgx: Obtain backing storage page with enclave mutex held
  2022-04-28 21:58   ` Dave Hansen
@ 2022-04-28 22:44     ` Reinette Chatre
  0 siblings, 0 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-04-28 22:44 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 4/28/2022 2:58 PM, Dave Hansen wrote:
> On 4/28/22 13:11, Reinette Chatre wrote:
>> @@ -252,6 +252,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>>  	sgx_encl_ewb(epc_page, backing);
>>  	encl_page->epc_page = NULL;
>>  	encl->secs_child_cnt--;
>> +	sgx_encl_put_backing(backing, true);
> 
> Could you also talk a bit about why this needed to move?  It's a bit
> harder to make sense of the refcounting when the get and put are done in
> different functions.

This needed to move to address the following scenario described in the
changelog:

sgx_reclaim_pages() {                    sgx_vma_fault() {
...                                      ...
/* write data to backing store */
sgx_reclaimer_write();
                                         mutex_lock(&encl->lock);
                                         __sgx_encl_eldu() {
                                         ...
                                         /* page not dirty -
                                          * contents may not be
                                          * up to date
                                         */
                                         sgx_encl_get_backing();
                                         ...
                                         }
                                         ...
/* set page dirty */
sgx_encl_put_backing();
...
                                         mutex_unlock(&encl->lock);
}                                        }

Before this change the backing store page was modified within
sgx_reclaimer_write() that essentially does:

sgx_reclaimer_write() {
    mutex_lock(&encl->lock);
    /* write encrypted data to backing store */
    mutex_unlock(&encl->lock);
}

The reclaimer followed the sgx_reclaimer_write() call with a
call to sgx_encl_put_backing() where the pages have their
dirty bits set. sgx_encl_put_backing() was thus done without
the enclave mutex held. If that page is faulted in at that
time the page fault handler may thus attempt to load
the page from the backing store between its contents being
changed and it being marked as dirty.

After the change the page fault handler would not be able to
load the page from the backing store before it is marked as
dirty.

I can improve the flow in the changelog to be clear on the
reclaimer's mutex usage. Perhaps something like:

sgx_reclaim_pages() {                    sgx_vma_fault() {
...                                      ...

  sgx_reclaimer_write() {
     mutex_lock(&encl->lock);
     /* write to backing store */
     mutex_unlock(&encl->lock);
  }
                                         mutex_lock(&encl->lock);
                                         __sgx_encl_eldu() {
                                         ...
                                         /* page not dirty -
                                          * contents may not be
                                          * up to date
                                         */
                                         sgx_encl_get_backing();
                                         ...
                                         }
                                         ...
/* set page dirty */
sgx_encl_put_backing();
...
                                         mutex_unlock(&encl->lock);
}                                        }



Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-04-28 22:20     ` Reinette Chatre
@ 2022-04-28 22:53       ` Dave Hansen
  2022-04-28 23:49         ` Reinette Chatre
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Hansen @ 2022-04-28 22:53 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 4/28/22 15:20, Reinette Chatre wrote:
> Hi Dave,
> 
> On 4/28/2022 2:30 PM, Dave Hansen wrote:
>> On 4/28/22 13:11, Reinette Chatre wrote:
> 
>> Are there any transient, recoverable errors that can come back from
>> ELDU?  If so, this makes a lot of sense.  If not, then it doesn't make a
>> lot of sense to preserve the swapped-out content because they enclave is
>> going to die anyway.
> 
> Good point.
> 
> Theoretically ELDU could encounter a page fault while accessing the 
> regions it needs to read from and write to. These faults are passed
> through and the instruction would return with a #PF that is
> propagated with the page fault handler returning SIGBUS.

We don't have to worry about those, though, do we?  We're operating
entirely on kernel mappings that won't cause #PF.

> Even so, this flow also impacts the SGX2 flows that need to load pages from
> the backing store. In this case the kernel would pass it as an error
> (-EFAULT) to the runtime but it would not result in the
> enclave being killed. If it was a #PF that caused the issue then
> perhaps theoretically the SGX2 instruction has a chance of succeeding
> if the runtime attempts it again? 

How are the SGX2 flows different than what we have now?

I also looked a little deeper at this transient failure problem.  The
ELDU documentation also mentions a possible error code of:

	SGX_EPC_PAGE_CONFLICT

It *looks* like there can be conflicts on the SECS page as well as the
EPC page being explicitly accessed.  Is that a possible problem here?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-04-28 22:53       ` Dave Hansen
@ 2022-04-28 23:49         ` Reinette Chatre
  2022-05-03  2:01           ` Kai Huang
  2022-05-07 17:25           ` Jarkko Sakkinen
  0 siblings, 2 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-04-28 23:49 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 4/28/2022 3:53 PM, Dave Hansen wrote:
> On 4/28/22 15:20, Reinette Chatre wrote:
>> Hi Dave,
>>
>> On 4/28/2022 2:30 PM, Dave Hansen wrote:
>>> On 4/28/22 13:11, Reinette Chatre wrote:
>>
>>> Are there any transient, recoverable errors that can come back from
>>> ELDU?  If so, this makes a lot of sense.  If not, then it doesn't make a
>>> lot of sense to preserve the swapped-out content because they enclave is
>>> going to die anyway.
>>
>> Good point.
>>
>> Theoretically ELDU could encounter a page fault while accessing the 
>> regions it needs to read from and write to. These faults are passed
>> through and the instruction would return with a #PF that is
>> propagated with the page fault handler returning SIGBUS.
> 
> We don't have to worry about those, though, do we?  We're operating
> entirely on kernel mappings that won't cause #PF.

Indeed, yes, I do not see how an ELDU error or fault is recoverable.

> 
>> Even so, this flow also impacts the SGX2 flows that need to load pages from
>> the backing store. In this case the kernel would pass it as an error
>> (-EFAULT) to the runtime but it would not result in the
>> enclave being killed. If it was a #PF that caused the issue then
>> perhaps theoretically the SGX2 instruction has a chance of succeeding
>> if the runtime attempts it again? 
> 
> How are the SGX2 flows different than what we have now?

SGX2 uses the same flow as the page fault handler to load the
page into the enclave. The only difference is that the SGX2 flow removed the
VMA permission checks. See:
https://lore.kernel.org/lkml/db3a14f2d2df7678dec23375d48c96b603f8cfb5.1649878359.git.reinette.chatre@intel.com/

As per the trace printed in the WARN the issue being investigated is
triggered by the ELDU run as part of the page fault handler, not
the SGX2 flows.

> 
> I also looked a little deeper at this transient failure problem.  The
> ELDU documentation also mentions a possible error code of:
> 
> 	SGX_EPC_PAGE_CONFLICT
> 
> It *looks* like there can be conflicts on the SECS page as well as the
> EPC page being explicitly accessed.  Is that a possible problem here?

I went down this path myself. SGX_EPC_PAGE_CONFLICT is an error code
supported by newer ELDUC - the ELDU used in current code would indeed
#GP in this case. The SDM text describing ELDUC as "This leaf function
behaves like ELDU but with improved conflict handling for oversubscription"
really does seem relevant to the test that triggers this issue.

I stopped pursuing this because from what I understand if
SGX_EPC_PAGE_CONFLICT is encountered with commit 08999b2489b4 ("x86/sgx:
Free backing memory after faulting the enclave page") then it should
also be encountered without it. The issue is not present with
08999b2489b4 ("x86/sgx: Free backing memory after faulting the
enclave page") removed. I am thus currently investigating based on
the assumption that the #GP is encountered because of MAC
verification problem. I may be wrong here also and need more information
since the SDM documents two seemingly related errors:
#GP(0) -> If the instruction fails to verify MAC.
SGX_MAC_COMPARE_FAIL -> If the MAC check fails.


Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-04-28 21:12 ` [RFC PATCH 0/4] SGX shmem backing store issue Dave Hansen
@ 2022-04-29 18:50   ` Reinette Chatre
  2022-04-29 19:45     ` Dave Hansen
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-04-29 18:50 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 4/28/2022 2:12 PM, Dave Hansen wrote:
> On 4/28/22 13:11, Reinette Chatre wrote:
>> ELDU returned 1073741837 (0x4000000d)
>> WARNING: CPU: 72 PID: 24407 at arch/x86/kernel/cpu/sgx/encl.c:81 sgx_encl_eldu+0x3cf/0x400
>> ...
>> Call Trace:
>> <TASK>
>> ? xa_load+0x6e/0xa0
>> __sgx_encl_load_page+0x3d/0x80
>> sgx_encl_load_page_in_vma+0x4a/0x60
>> sgx_vma_fault+0x7f/0x3b0
> 
> First of all, thanks for all the work to narrow this down.
> 
> It sounds like there are probably at least two failure modes at play here:
> 
> 	1. shmem_read_mapping_page_gfp() is called to retrieve an
> 	   existing page, but an empty one is allocated instead.  ELDU
> 	   fails on the empty page.  This one should be fixed by patch 	
> 	   4/4.

I am not yet comfortable calling patch 4/4 a fix. From what I understand
hitting that error means that the kernel lost a page. This patch detects
it earlier so that we can avoid the ELDU #GP but it does still seem
an issue to hit that error in the first place.

> 	2. shmem_read_mapping_page_gfp() actually finds a page, but it
> 	   still fails ELDU.
> 
> Is that right?

Correct.

> 
> If so, I'd probably delve deeper into what the page and the PCMD look
> like.  I usually go after these kinds of things with tracing.  I'd
> probably dump some representation of the PCMD and page contents with
> trace_printk().  Dump them when the at __sgx_encl_ewb() time, then also
> dump them where the warning is being hit.  Pair the warning with a
> tracing_off().
> 
> // A crude checksum:
> u64 sum_page(u64 *page)
> {
> 	u64 ret = 0
> 	int i;
> 
> 	for (i = 0; i < PAGE_SIZE/sizeof(u64)); i++)
> 		ret += page[i];
> 
> 	return ret;
> }
> 
> Then, logically something like this:
> 
> 	trace_printk("bad ELDU on shm page: %x sum: pcmd: %x %x...\n",
> 		page_to_pfn(shm_page), sum_page(page_kmap),
> 		&pcmd, ...);
> 
> Both at EWB time and ELDU time.  Let's see if the pages that are coming
> out of shmem are the same as the ones that were put in.
> 
> When you hit the warning, tracing should turn itself off.  Then, you can
> just grep through the trace for that same pfn.

Thank you very much for this guidance.

I added trace_printk() to the EWB and ELDU flows as below and found an
interesting result.

EWB:
__sgx_encl_ewb() {
   ...
   trace_printk("EWB encl %px on SHM page: 0x%lx PCMD page: 0x%lx index %lu sum: 0x%llx \n",
                encl_page->encl, page_to_pfn(backing->contents), page_to_pfn(backing->pcmd),
                page_index, sum_page((u64 *)pginfo.contents));
   ...
}


ELDU:
__sgx_encl_eldu() {
  ...
  trace_printk("WARN encl %px ELDU on SHM page: 0x%lx PCMD page: 0x%lx index %lu sum: 0x%llx \n",
               encl, page_to_pfn(b.contents), page_to_pfn(b.pcmd), page_index, sum_page((u64 *)pginfo.contents));
  ...
}


Below is the last few entries of the trace buffer:
           ksgxd-806     [022] ...2.   857.259294: __sgx_encl_ewb: EWB encl ffff90a2a38d0000 on SHM page: 0x27906c PCMD page: 0x27906d index 168866 sum: 0x0
           ksgxd-806     [022] ...2.   857.259299: __sgx_encl_ewb: EWB encl ffff90a2a38d0000 on SHM page: 0x27906c PCMD page: 0x27906d index 168866 sum: 0x9f7dc28a239283bb
=>           ksgxd-806     [022] ...2.   857.259321: __sgx_encl_ewb: EWB encl ffff90a2a38d0000 on SHM page: 0x27906e PCMD page: 0x27906d index 168867 sum: 0xcfe97176b1e8c386
           ksgxd-806     [022] ...2.   857.259325: __sgx_encl_ewb: EWB encl ffff90a2a38d0000 on SHM page: 0x27906f PCMD page: 0x27906d index 168868 sum: 0xd1d012775db676d9
           ksgxd-806     [022] ...2.   857.259329: __sgx_encl_ewb: EWB encl ffff90a2a38d0000 on SHM page: 0x279070 PCMD page: 0x27906d index 168869 sum: 0x1c568b08bd31d7bb
           ksgxd-806     [022] ...2.   857.259333: __sgx_encl_ewb: EWB encl ffff90a2a38d0000 on SHM page: 0x279071 PCMD page: 0x27906d index 168870 sum: 0x4dba242d348982a7
=>  test_sgx_large-4092    [030] ...2.   857.259341: sgx_encl_eldu: WARN encl ffff90a2a38d0000 ELDU on SHM page: 0x27906e PCMD page: 0x279052 index 168867 sum: 0xcfe97176b1e8c386


It seems that in this scenario a page is evicted from the enclave and then almost immediately
faulted back in. At the time it is faulted back in the sum matches with the original
encrypted enclave page at the time it was evicted from the enclave.
The interesting part is that the enclave page's PFN matches with the PFN when it was
evicted, but the PCMD PFN does not match.

I did search the log and I cannot find the PCMD PFN printed during ELDU used by the
enclave at all.

I am not familiar with this area - is the PFN expected to be consistent? Do
you perhaps have an idea why the PFN of the PCMD page may have changed? This
is running with this series applied so the ELDU flow would do a lookup of the
PCMD page and not attempt to allocate a new one.

Any hints would be appreciated.

Thank you very much

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-04-29 18:50   ` Reinette Chatre
@ 2022-04-29 19:45     ` Dave Hansen
  2022-04-30  3:22       ` Reinette Chatre
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Hansen @ 2022-04-29 19:45 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

[-- Attachment #1: Type: text/plain, Size: 1519 bytes --]

On 4/29/22 11:50, Reinette Chatre wrote:
> I am not familiar with this area - is the PFN expected to be consistent? Do
> you perhaps have an idea why the PFN of the PCMD page may have changed? This
> is running with this series applied so the ELDU flow would do a lookup of the
> PCMD page and not attempt to allocate a new one.

First of all, cool!  This is really good progress!

It's possible for the PCMD shmem page to be swapped out and swapped back
in.  The pfn would be likely to change in that case.  But, that seems
highly unlikely in that short of a period of time.

I'd dump out two more things:

First, dump out page_pcmd_off, just like you're doing with page_index in
case page_pcmd_off itself is getting botched.  I looked but couldn't
find anything obvious.

Second, dump out the pfn in sgx_encl_truncate_backing_page().  It's
possible that something is getting overly-aggressive and zapping the
PCMD page too early.  That would be easy to explain with that PCMD
locking issue you discovered.  But, your patch should have fixed that issue.

For debugging, could you double-check that the PCMD page *is* empty
around sgx_encl_truncate_backing_page()?  If there's a race here you can
also enlarge the race window by adding an msleep() or a spin loop
somewhere after the memchr_inv().

You could also hold an extra reference on the PCMD page, maybe something
like the attached patch.  That will let you inspect the actual page
after it is *actually* truncated.  There should never be data in the
page there.

[-- Attachment #2: pcmd.diff --]
[-- Type: text/x-patch, Size: 773 bytes --]

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7c63a1911fae..309ffae43e0f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -91,15 +91,20 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	 */
 	pcmd_page_empty = !memchr_inv(pcmd_page, 0, PAGE_SIZE);
 
-	kunmap_atomic(pcmd_page);
 	kunmap_atomic((void *)(unsigned long)pginfo.contents);
 
+	get_page(b.pcmd);
 	sgx_encl_put_backing(&b, false);
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty)
+	if (pcmd_page_empty) {
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
+		WARN_ON(memchr_inv(pcmd_page, 0, PAGE_SIZE);
+	}
+
+	kunmap_atomic(pcmd_page);
+	put_page(pcmd_page);
 
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-04-29 19:45     ` Dave Hansen
@ 2022-04-30  3:22       ` Reinette Chatre
  2022-04-30 15:52         ` Reinette Chatre
  2022-05-02 14:36         ` Dave Hansen
  0 siblings, 2 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-04-30  3:22 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 4/29/2022 12:45 PM, Dave Hansen wrote:
> On 4/29/22 11:50, Reinette Chatre wrote:
>> I am not familiar with this area - is the PFN expected to be consistent? Do
>> you perhaps have an idea why the PFN of the PCMD page may have changed? This
>> is running with this series applied so the ELDU flow would do a lookup of the
>> PCMD page and not attempt to allocate a new one.
> 
> First of all, cool!  This is really good progress!
> 
> It's possible for the PCMD shmem page to be swapped out and swapped back
> in.  The pfn would be likely to change in that case.  But, that seems
> highly unlikely in that short of a period of time.
> 
> I'd dump out two more things:
> 
> First, dump out page_pcmd_off, just like you're doing with page_index in
> case page_pcmd_off itself is getting botched.  I looked but couldn't
> find anything obvious.
> 
> Second, dump out the pfn in sgx_encl_truncate_backing_page().  It's
> possible that something is getting overly-aggressive and zapping the
> PCMD page too early.  That would be easy to explain with that PCMD
> locking issue you discovered.  But, your patch should have fixed that issue.
> 
> For debugging, could you double-check that the PCMD page *is* empty
> around sgx_encl_truncate_backing_page()?  If there's a race here you can
> also enlarge the race window by adding an msleep() or a spin loop
> somewhere after the memchr_inv().
> 
> You could also hold an extra reference on the PCMD page, maybe something
> like the attached patch.  That will let you inspect the actual page
> after it is *actually* truncated.  There should never be data in the
> page there.


Thank you so much for pointing me in the right direction. With this
information I believe that I found a race condition. I did create a
fix for it and it has been running for more than an hour now without
a WARN _and_ without the -ENOENT error introduced in patch 4/4.

Please find below the description of the race and the fix.

Scenario is that the reclaimer is evicting a range of pages that
are being faulted right away.

Consider three enclave pages (ENCLAVE_A, ENCLAVE_B, and ENCLAVE_C)
sharing a PCMD page (PCMD_ABC). ENCLAVE_A and ENCLAVE_B are in the
enclave memory and ENCLAVE_C is in backing store. PCMD_ABC contains
just one entry, that of ENCLAVE_C.

Scenario proceeds where ENCLAVE_A is being evicted from the enclave
while ENCLAVE_C is faulted in.

Scenario is with this patch series applied.

sgx_reclaim_pages() {

  ...

  /*
   * Reclaim ENCLAVE_A
   */
  mutex_lock(&encl->lock);
  /*
   * Get a reference to ENCLAVE_A's
   * shmem page where enclave page
   * encrypted data will be stored
   * as well as a reference to the
   * enclave page's PCMD data page,
   * PCMD_ABC.
   */
  sgx_encl_alloc_backing(...);
  mutex_unlock(&encl->lock);

                                    /*
                                     * Fault ENCLAVE_C
                                     */

                                    sgx_vma_fault() {

                                      mutex_lock(&encl->lock);
                                      /*
                                       * Get reference to
                                       * ENCLAVE_C's shmem page
                                       * as well as PCMD_ABC.
                                       */
                                      sgx_encl_get_backing(...)
                                     /*
                                      * load page back into
                                      * enclave via ELDU
                                      */
                                     /*
                                      * Release reference to
                                      * ENCLAVE_C' shmem page and
                                      * PCMD_ABC.
                                      */
                                     sgx_encl_put_backing(...);
                                     /*
                                      * PCMD_ABC is found empty so
                                      * it and ENCLAVE_C's shmem page
                                      * are truncated.
                                      */
                                     /* Truncate ENCLAVE_C backing page */
                                     sgx_encl_truncate_backing_page();
                                     /* Truncate PCMD_ABC */
                                     sgx_encl_truncate_backing_page();

                                     mutex_unlock(&encl->lock);

                                     ...
                                     }
  mutex_lock(&encl->lock);
  /*
  * Write encrypted contents of
  * ENCLAVE_A to ENCLAVE_A shmem
  * page and its PCMD data to
  * PCMD_ABC.
  */

  sgx_encl_put_backing(...)

  /*
   * Reference to PCMD_ABC is
   * dropped and it is truncated.
   * ENCLAVE_A's PCMD data is lost.
   */
  mutex_unlock(&encl->lock);
}



What happens next depends on whether it is ENCLAVE_A being faulted
in or ENCLAVE_B (or ENCLAVE_C) being evicted.

If ENCLAVE_A is faulted then the error introduced in patch 4/4 would
be encountered since lookup of PCMD_ABC fails. Before patch 4/4 a new
PCMD page would be allocated and then ENCLS[ELDU] would WARN on the PCMD
data.
If ENCLAVE_B is evicted first then a new PCMD_ABC would be allocated but
later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction would #GP
during its checks of the PCMD value and the WARN would be encountered.

Below is the fix I am currently testing. It would not truncate the memory
if there is a reference to the PCMD page.

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 22ed886dc825..81c7bfc12b9b 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -98,11 +98,10 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	kunmap_atomic(pcmd_page);
 	kunmap_atomic((void *)(unsigned long)pginfo.contents);
 
-	sgx_encl_put_backing(&b, true);
-
+	put_page(b.contents);
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty) {
+	if (pcmd_page_empty && && put_page_testzero(b.pcmd)) {
 		ida_free(&encl->pcmd_in_backing, PFN_DOWN(page_pcmd_off));
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
 	}


I plan on letting the test keep running to make sure it really fixes the
issue. If all goes well patch 4/4 can be dropped and the fix included instead.

Thanks again for your guidance. It was tremendously helpful.

Reinette

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-04-30  3:22       ` Reinette Chatre
@ 2022-04-30 15:52         ` Reinette Chatre
  2022-05-02 14:36         ` Dave Hansen
  1 sibling, 0 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-04-30 15:52 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang



On 4/29/2022 8:22 PM, Reinette Chatre wrote:

> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 22ed886dc825..81c7bfc12b9b 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -98,11 +98,10 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	kunmap_atomic(pcmd_page);
>  	kunmap_atomic((void *)(unsigned long)pginfo.contents);
>  
> -	sgx_encl_put_backing(&b, true);
> -
> +	put_page(b.contents);
>  	sgx_encl_truncate_backing_page(encl, page_index);
>  
> -	if (pcmd_page_empty) {
> +	if (pcmd_page_empty && && put_page_testzero(b.pcmd)) {

These tests need to be swapped and instead be:

	if (put_page_testzero(b.pcmd) && pcmd_page_empty) {

I fixed the tests. Even with previous order the concurrent
selftests ran overnight without encountering a single error
or WARN. I increased the number of test iterations and will
run it over the weekend with the correct test order shown
above to ensure the reference to the PCMD page is always
dropped.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-04-30  3:22       ` Reinette Chatre
  2022-04-30 15:52         ` Reinette Chatre
@ 2022-05-02 14:36         ` Dave Hansen
  2022-05-02 17:11           ` Reinette Chatre
  1 sibling, 1 reply; 50+ messages in thread
From: Dave Hansen @ 2022-05-02 14:36 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 4/29/22 20:22, Reinette Chatre wrote:
> -	if (pcmd_page_empty) {
> +	if (pcmd_page_empty && && put_page_testzero(b.pcmd)) {
>  		ida_free(&encl->pcmd_in_backing, PFN_DOWN(page_pcmd_off));
>  		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
>  	}

Reinette, nice work on tracking this down!

One comment on the fix though:  Will this put_page_testzero() ever
actually trigger?  The page cache itself holds a reference, which is
released in this case via:

	shmem_truncate_range() ->
	truncate_inode_pages_range() ->
	truncate_inode_folio() ->
	filemap_remove_folio() ->
	filemap_free_folio() ->
	folio_put_refs()

So I think the lowest the page refcount can actually be at the point of
put_page_testzero() is 1.

I suspect the end result of that patch would actually just be that PCMD
pages are never freed at runtime, which would also work around the bug
you discovered.

No matter what we do here, we need to ensure that there is no race between:

        pcmd_page_empty = !memchr_inv(pcmd_page, 0, PAGE_SIZE);
and
	sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));

Holding the encl->lock over that section of code would work.  It also
would not be awful to add a special "truncation lock" which is taken
before putting data in a pcmd page and is taken over the
memchr()->truncate window as well.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-02 14:36         ` Dave Hansen
@ 2022-05-02 17:11           ` Reinette Chatre
  2022-05-02 21:33             ` Dave Hansen
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-05-02 17:11 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 5/2/2022 7:36 AM, Dave Hansen wrote:
> On 4/29/22 20:22, Reinette Chatre wrote:
>> -	if (pcmd_page_empty) {
>> +	if (pcmd_page_empty && && put_page_testzero(b.pcmd)) {
>>  		ida_free(&encl->pcmd_in_backing, PFN_DOWN(page_pcmd_off));
>>  		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
>>  	}
> 
> Reinette, nice work on tracking this down!
> 
> One comment on the fix though:  Will this put_page_testzero() ever
> actually trigger?  The page cache itself holds a reference, which is
> released in this case via:
> 
> 	shmem_truncate_range() ->
> 	truncate_inode_pages_range() ->
> 	truncate_inode_folio() ->
> 	filemap_remove_folio() ->
> 	filemap_free_folio() ->
> 	folio_put_refs()
> 
> So I think the lowest the page refcount can actually be at the point of
> put_page_testzero() is 1.

oh, I see, thank you for pointing that out. We then need something like
"put_page_testone()".

> 
> I suspect the end result of that patch would actually just be that PCMD
> pages are never freed at runtime, which would also work around the bug
> you discovered.
> 
> No matter what we do here, we need to ensure that there is no race between:
> 
>         pcmd_page_empty = !memchr_inv(pcmd_page, 0, PAGE_SIZE);
> and
> 	sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
> 
> Holding the encl->lock over that section of code would work.  It also
> would not be awful to add a special "truncation lock" which is taken
> before putting data in a pcmd page and is taken over the
> memchr()->truncate window as well.

&encl->lock is already held over that section of the code so there
never is a race between the empty check and the truncate call.

The problem as I see it is that at the time __sgx_encl_eldu() is called
by the page fault handler (with enclave mutex held), the reclaimer already
has a reference to the PCMD page (but of course does _not_ have the enclave
mutex). The reclaimer expects to write to the PCMD page later when it obtains
the enclave mutex again and expects it to be safe since it has a reference
to the PCMD page.

The page fault handler will find the PCMD page empty when it removes
the last entry and thus truncate the page.

When reclaimer runs again it writes to the just-truncated PCMD page and
when it releases its reference the page is removed and the just written
data is lost.

My goal was to prevent the page fault handler from doing a "is the PCMD
page empty" if the reclaimer has a reference to it. Even if the PCMD page
is empty when the page fault handler checks it the page is expected to
get data right when reclaimer can get the mutex back since the reclaimer
already has a reference to the page.

Regarding your other suggestion, to use a "truncation lock". SGX just
gets the page pointer from shmem every time it is needed so there is
no room as I see it to add metadata to a PCMD page. One possibility
that we could do is for the page fault handler to check if any enclave
page sharing the PCMD page is being reclaimed (has the SGX_ENCL_PAGE_BEING_RECLAIMED
flag set). The reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED for an enclave
page when it obtains the reference to its PCMD page and clears the flag
when it releases the reference.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-02 17:11           ` Reinette Chatre
@ 2022-05-02 21:33             ` Dave Hansen
  2022-05-04 22:13               ` Reinette Chatre
  2022-05-07 17:46               ` Jarkko Sakkinen
  0 siblings, 2 replies; 50+ messages in thread
From: Dave Hansen @ 2022-05-02 21:33 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 5/2/22 10:11, Reinette Chatre wrote:
> My goal was to prevent the page fault handler from doing a "is the PCMD
> page empty" if the reclaimer has a reference to it. Even if the PCMD page
> is empty when the page fault handler checks it the page is expected to
> get data right when reclaimer can get the mutex back since the reclaimer
> already has a reference to the page.

I think shmem_truncate_range() might be the wrong operation.  It
destroys data and, in the end, we don't want to destroy data.
Filesystems and the page cache already have nice ways to keep from
destroying data, we just need to use them.

First, I think set_page_dirty(pcmd_page) before the EWB is a good start.
 That's what filesystems do before important data that needs to be saved
goes into pages.

Second, I think we need behavior like POSIX_FADV_DONTNEED, not
FALLOC_FL_PUNCH_HOLE.  The DONTNEED operations end up in
mapping_evict_folio(), which has both page refcount *and* dirty page
checks.  That means that either elevating a refcount or set_page_dirty()
will thwart DONTNEED-like behavior.

There are two basic things we need to do:

1. Prevent page from being truncated around EWB time
2. Prevent unreferenced page with all zeros from staying in shmem
   forever or taking up swap space

On the EWB (write to PCMD side) I think something like this works:

	sgx_encl_get_backing()
		get_page(pcmd_page)

	...
	lock_page(pcmd_page);
	// check for truncation since sgx_encl_get_backing()
	if (pcmd_page->mapping != shmem)
		goto retry;
	 // double check this is OK under lock_page():
	set_page_dirty(pcmd_page);
	__sgx_encl_ewb();
	unlock_page(pcmd_page);

That's basically what filesystems do.  Get the page from the page cache,
lock it, then make sure it is consistent.  If not, retry.

On the "free" / read in (ELDU) side:

	// get pcmd_page ref
	lock_page(pcmd_page);
	// truncation is not a concern because that's only done
	// on the read-in side, here, where we hold encl->lock

	memset();
	if (!memchr_inv())
		// clear the way for DONTNEED:
		ClearPageDirty(pcmd_page);
	unlock_page(pcmd_page);
	// drop pcmd_page ref
	...
	POSIX_FADV_DONTNEED

There's one downside to this: it's _possible_ that an transient
get_page() would block POSIX_FADV_DONTNEED.  Then the zeroed page would
stick around forever, or at least until the next ELDU operation did
another memchr_inv().

I went looking around for some of those and could not find any that I
*know* apply to shmem.

This doesn't feel like a great solution; it's more complicated than I
would like.  Any other ideas?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-04-28 23:49         ` Reinette Chatre
@ 2022-05-03  2:01           ` Kai Huang
  2022-05-07 17:25           ` Jarkko Sakkinen
  1 sibling, 0 replies; 50+ messages in thread
From: Kai Huang @ 2022-05-03  2:01 UTC (permalink / raw)
  To: Reinette Chatre, Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On Thu, 2022-04-28 at 16:49 -0700, Reinette Chatre wrote:
> > 
> > I also looked a little deeper at this transient failure problem.  The
> > ELDU documentation also mentions a possible error code of:
> > 
> >  	SGX_EPC_PAGE_CONFLICT
> > 
> > It *looks* like there can be conflicts on the SECS page as well as the
> > EPC page being explicitly accessed.  Is that a possible problem here?
> 
> I went down this path myself. SGX_EPC_PAGE_CONFLICT is an error code
> supported by newer ELDUC - the ELDU used in current code would indeed
> #GP in this case. The SDM text describing ELDUC as "This leaf function
> behaves like ELDU but with improved conflict handling for oversubscription"
> really does seem relevant to the test that triggers this issue.

This new error code and the new leaf functions with "C" postfix (ELDUC, etc) are
introduced to support VMM oversubscription of EPC.  VMM oversubscription of EPC
runs independently with guest so theoretically when VMM is performing some
operation on EPC in one CPU, guest running in another CPU can touch the EPC
simultaneously.

The new "C" variants are supposed to be used by VMM when it supports VMM
oversubscription of EPC, so that the VMM can a ENCLS instruction error code,
rather than a #GP when this case happens.  At guest side, when ENCLS conflicting
happens, VMM will get a VM-exit so it can be handed by VMM, i.e. by letting the
guest to run the same ENCLS again.

For now the SGX driver doesn't need to use the "C" variant, nor should it expect
the new SGX_EPC_PAGE_CONFLICT error code, because the driver already needs to
serialize those ENCLS leaf functions which can not run concurrently.  I assume
this should apply to SGX2 support too.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-04-28 20:11 [RFC PATCH 0/4] SGX shmem backing store issue Reinette Chatre
                   ` (5 preceding siblings ...)
  2022-04-28 21:29 ` Dave Hansen
@ 2022-05-04  6:40 ` Jarkko Sakkinen
  2022-05-05  6:09 ` Jarkko Sakkinen
  7 siblings, 0 replies; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-04  6:40 UTC (permalink / raw)
  To: Reinette Chatre; +Cc: dave.hansen, linux-sgx, haitao.huang

On Thu, Apr 28, 2022 at 01:11:23PM -0700, Reinette Chatre wrote:
> Hi Everybody,
> 
> Haitao reported encountering the following WARN while stress testing SGX
> with the SGX2 series [1] applied:
> 
> ELDU returned 1073741837 (0x4000000d)
> WARNING: CPU: 72 PID: 24407 at arch/x86/kernel/cpu/sgx/encl.c:81 sgx_encl_eldu+0x3cf/0x400
> ...
> Call Trace:
> <TASK>
> ? xa_load+0x6e/0xa0
> __sgx_encl_load_page+0x3d/0x80
> sgx_encl_load_page_in_vma+0x4a/0x60
> sgx_vma_fault+0x7f/0x3b0
> ? sysvec_call_function_single+0x66/0xd0
> ? asm_sysvec_call_function_single+0x12/0x20
> __do_fault+0x39/0x110
> __handle_mm_fault+0x1222/0x15a0
> handle_mm_fault+0xdb/0x2c0
> do_user_addr_fault+0x1d1/0x650
> ? exit_to_user_mode_prepare+0x45/0x170
> exc_page_fault+0x76/0x170
> ? asm_exc_page_fault+0x8/0x30
> asm_exc_page_fault+0x1e/0x30
> ...
> </TASK>
> 
> ENCLS[ELDU] is returning a #GP when attempting to load an enclave
> page from the backing store into the enclave. This is likely because
> of a MAC verification failure.
> 
> Haitao's stress testing involves running two concurrent loops of the SGX
> selftests on a system with 4GB EPC memory. One of the tests is modified
> to reduce the oversubscription heap size:
> 
> diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
> index d480c2dd2858..12008789325b 100644
> --- a/tools/testing/selftests/sgx/main.c
> +++ b/tools/testing/selftests/sgx/main.c
> @@ -398,7 +398,7 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
>  	 * Create enclave with additional heap that is as big as all
>  	 * available physical SGX memory.
>  	 */
> -	total_mem = get_total_epc_mem();
> +	total_mem = get_total_epc_mem()/16;
>  	ASSERT_NE(total_mem, 0);
>  	TH_LOG("Creating an enclave with %lu bytes heap may take a while ...",
>  	       total_mem);
> 
> If the the test compiled with above snippet is renamed as "test_sgx_small"
> and the original renamed as "test_sgx_large" the two concurrent loops are
> run as follows:
> 
> (for i in $(seq 1 999); do echo "Iteration $i"; sudo ./test_sgx_large; done ) > log.large 2>&1
> (for i in $(seq 1 9999); do echo "Iteration $i"; sudo ./test_sgx_small; done ) > log.small 2>&1
> 
> If the SGX driver is modified to always WARN when ENCLS[ELDU] encounters a #GP
> (see below) then the WARN appears after a few iterations of "test_sgx_large"
> and shows up throughout the testing.
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
> index 99004b02e2ed..68c1dbc84ed3 100644
> --- a/arch/x86/kernel/cpu/sgx/encls.h
> +++ b/arch/x86/kernel/cpu/sgx/encls.h
> @@ -18,7 +18,7 @@
>  #define ENCLS_WARN(r, name) {						  \
>  	do {								  \
>  		int _r = (r);						  \
> -		WARN_ONCE(_r, "%s returned %d (0x%x)\n", (name), _r, _r); \
> +		WARN(_r, "%s returned %d (0x%x)\n", (name), _r, _r); \
>  	} while (0);							  \
>  }
> 
> I focused on investigating this issue the past two weeks and was able to
> narrow down the cause but not yet fix the issue. Now I am out of ideas and
> would really appreciate if you could provide guidance on how to proceed.
> 
> Here is what I have learned so far:
> * Reverting commit 08999b2489b4 ("x86/sgx: Free backing memory after
>   faulting the enclave page") resolves the issue. With that commit
>   reverted the concurrent selftest loops can run to completion without
>   encountering any WARNs.
> 
> * The issue is also resolved if only the calls (introduced by commit
>   08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave
>   page") ) to sgx_encl_truncate_backing_page() within __sgx_encl_eldu()
>   are disabled.
> 
> * Two issues with commit 08999b2489b4 ("x86/sgx: Free backing memory
>   after faulting the enclave page") are fixed with patch 1 and 2 in
>   this series. These fixes do not resolve the WARN.
> 
> * There is a potential race between the reclaimer and the page fault
>   handler while interacting with the backing store. This is addressed
>   in patch 3, but it does not resolve the WARN.
> 
> * Lockdep did reveal a locking issue in SGX2 - the EAUG flow should not
>   do direct reclaim since the page fault handler already has mmap_lock
>   that the reclaimer will attempt to obtain. This will be fixed in the
>   next version of SGX2 but that fix does not resolve the WARN.
> 
> * Since ENCLS[ELDU] returns #GP on MAC verification failure a possible
>   cause could be if a problem occurs while loading the page from
>   backing store.
>   sgx_encl_get_backing_page() is used to both allocate backing storage
>   in the ENCLS[EWB] flow as well as to lookup backing storage in the
>   ENCLS[ELDU] flow.
>   As part of the investigation the interface was split to ensure that
>   backing storage is only allocated during the ENCLS[EWB] flow and only
>   lookup occurs during the ENCLS[ELDU] flow. This can be found in
>   patch 4.
>   This change revealed that there are cases during ENCLS[ELDU] flow
>   when the backing page is not present - attempting to lookup a
>   backing page returns -ENOENT. Before patch 4 a new backing page
>   would be allocated and thus trigger a #GP when ENCLS[ELDU] is
>   attempted.
>   This change is not a fix, the code path just fails earlier now, but
>   it reveals a cause of the WARNs encountered (MAC verification will
>   fail on a newly allocated page) but not why the pages cannot be found
>   in the backing store. With this change the number of WARNs are
>   reduced significantly but not completely. Even with this patch
>   applied the WARN was encountered once while running the stress
>   selftests.
> 
> Could you please advise on how to proceed? Do you perhaps have ideas
> why the backing store could not contain a page when it is loaded back
> during the ENCLS[ELDU] flow?
> There is some interesting text in the comments within shmem_fault()
> that hints that faulting pages should be avoided into holes as they
> are being punched ... but I do not understand those details and
> would really appreciate guidance on how to understand what is going on
> here.
> 
> These SGX2 tests exercise the concurrent operation of reclaimer and page
> fault handler and have a good track record of locating issues. Thanks
> to it we already have:
> 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
> ac5d272a0ad0 ("x86/sgx: Fix free page accounting")
> While this issue is triggered by the SGX2 tests I cannot find a connection
> with the SGX2 code paths. I have made a few attempts to create an
> equivalent test for SGX1 but have not yet had success to create a SGX1
> test that can trigger this issue.
> 
> These patches are based on v5.18-rc4 with the SGX2 series applied but
> do not touch SGX2 code paths. Patch 1,2, and 3 also apply cleanly to
> v5.18-rc4 while patch 4 has a conflict in arch/x86/kernel/cpu/sgx/encl.h
> due to the SGX2 declarations.
> 
> Your feedback will be greatly appreciated.
> Thank you very much
> 
> Reinette

Hi, sorry for the response lag. I was on sick leave for the 2nd half of
last week. Looking into this.

BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-02 21:33             ` Dave Hansen
@ 2022-05-04 22:13               ` Reinette Chatre
  2022-05-04 22:58                 ` Dave Hansen
  2022-05-04 23:05                 ` Dave Hansen
  2022-05-07 17:46               ` Jarkko Sakkinen
  1 sibling, 2 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-05-04 22:13 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 5/2/2022 2:33 PM, Dave Hansen wrote:
> On 5/2/22 10:11, Reinette Chatre wrote:
>> My goal was to prevent the page fault handler from doing a "is the PCMD
>> page empty" if the reclaimer has a reference to it. Even if the PCMD page
>> is empty when the page fault handler checks it the page is expected to
>> get data right when reclaimer can get the mutex back since the reclaimer
>> already has a reference to the page.
> 
> I think shmem_truncate_range() might be the wrong operation.  It
> destroys data and, in the end, we don't want to destroy data.
> Filesystems and the page cache already have nice ways to keep from
> destroying data, we just need to use them.
> 
> First, I think set_page_dirty(pcmd_page) before the EWB is a good start.
>  That's what filesystems do before important data that needs to be saved
> goes into pages.

ok, at this time the reclaimer sets the dirty bit _after_ writing to the
shmem backing store. 

From what I understand we need to split sgx_encl_put_backing() and remove
the part where dirty bit is set and only use sgx_encl_put_backing()
to do the put_page() calls.

The motivation for this change is not exactly clear to me since in
the implementation the access to these pages are protected by the
enclave mutex.

> 
> Second, I think we need behavior like POSIX_FADV_DONTNEED, not
> FALLOC_FL_PUNCH_HOLE.  The DONTNEED operations end up in
> mapping_evict_folio(), which has both page refcount *and* dirty page
> checks.  That means that either elevating a refcount or set_page_dirty()
> will thwart DONTNEED-like behavior.
> 
> There are two basic things we need to do:
> 
> 1. Prevent page from being truncated around EWB time
> 2. Prevent unreferenced page with all zeros from staying in shmem
>    forever or taking up swap space
> 
> On the EWB (write to PCMD side) I think something like this works:
> 
> 	sgx_encl_get_backing()
> 		get_page(pcmd_page)
> 
> 	...
> 	lock_page(pcmd_page);
> 	// check for truncation since sgx_encl_get_backing()
> 	if (pcmd_page->mapping != shmem)
> 		goto retry;
> 	 // double check this is OK under lock_page():
> 	set_page_dirty(pcmd_page);
> 	__sgx_encl_ewb();
> 	unlock_page(pcmd_page);
> 
> That's basically what filesystems do.  Get the page from the page cache,
> lock it, then make sure it is consistent.  If not, retry.
> 
> On the "free" / read in (ELDU) side:
> 
> 	// get pcmd_page ref
> 	lock_page(pcmd_page);
> 	// truncation is not a concern because that's only done
> 	// on the read-in side, here, where we hold encl->lock
> 
> 	memset();
> 	if (!memchr_inv())
> 		// clear the way for DONTNEED:
> 		ClearPageDirty(pcmd_page);
> 	unlock_page(pcmd_page);
> 	// drop pcmd_page ref
> 	...
> 	POSIX_FADV_DONTNEED
> 
> There's one downside to this: it's _possible_ that an transient
> get_page() would block POSIX_FADV_DONTNEED.  Then the zeroed page would
> stick around forever, or at least until the next ELDU operation did
> another memchr_inv().
> 
> I went looking around for some of those and could not find any that I
> *know* apply to shmem.
> 
> This doesn't feel like a great solution; it's more complicated than I
> would like.  Any other ideas?

I am not familiar with the memory management and found the above a bit
intimidating. Noting that you are open to other ideas I 
attempted to contain the fix using existing SGX state. Note that the
reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time it obtains a
reference to the backing store pages of an enclave page it is in the
process of reclaiming. The fix only truncates the PCMD page
after ensuring that no page sharing the PCMD page is in the process of being
reclaimed.

I confirmed with this fix that PCMD pages are indeed being truncated -
but only if they are empty _and_ no page sharing the PCMD page is in
the process of being truncated.

With this fix the error printed by patch 4/4 are no longer encountered
and that patch is thus no longer required.

Temporarily I added the line of debugging that you can find in the patch
below and this line is printed while running the stress test. Here is a
snippet from the log file to give an idea how frequently this scenario
is encountered while running the stress test:

[  368.277362] sgx: pcmd_page_in_use:49 encl ffff9e76d8660000 addr 0x7f2545148000 desc 0x7f2545148008 (index 282952) being reclaimed
[  368.277400] sgx: pcmd_page_in_use:49 encl ffff9e76d8660000 addr 0x7f2545149000 desc 0x7f2545149008 (index 282953) being reclaimed
[  655.620465] sgx: pcmd_page_in_use:49 encl ffff9e774ce50000 addr 0x7fbe340f9000 desc 0x7fbe340f9008 (index 213241) being reclaimed
[  655.620520] sgx: pcmd_page_in_use:49 encl ffff9e774ce50000 addr 0x7fbe340fa000 desc 0x7fbe340fa008 (index 213242) being reclaimed
[  717.514992] sgx: pcmd_page_in_use:49 encl ffff9e780cd60000 addr 0x7f312c8cc000 desc 0x7f312c8cc008 (index 182476) being reclaimed
[  911.387007] sgx: pcmd_page_in_use:49 encl ffff9e77a6770000 addr 0x7f252fa2d000 desc 0x7f252fa2d008 (index 195117) being reclaimed
[  946.991400] sgx: pcmd_page_in_use:49 encl ffff9e76dd1a0000 addr 0x7fc210784000 desc 0x7fc210784008 (index 67460) being reclaimed
[  946.991479] sgx: pcmd_page_in_use:49 encl ffff9e76dd1a0000 addr 0x7fc210788000 desc 0x7fc210788008 (index 67464) being reclaimed
[ 1157.210449] sgx: pcmd_page_in_use:49 encl ffff9e76dc550000 addr 0x7f273158c000 desc 0x7f273158c008 (index 202124) being reclaimed
[ 1158.478397] sgx: pcmd_page_in_use:49 encl ffff9e76dc550000 addr 0x7f2743cac000 desc 0x7f2743cac008 (index 277676) being reclaimed
[ 1158.478451] sgx: pcmd_page_in_use:49 encl ffff9e76dc550000 addr 0x7f2743cad000 desc 0x7f2743cad008 (index 277677) being reclaimed
[ 1158.478467] sgx: pcmd_page_in_use:49 encl ffff9e76dc550000 addr 0x7f2743cae000 desc 0x7f2743cae008 (index 277678) being reclaimed
[ 1615.425623] sgx: pcmd_page_in_use:49 encl ffff9e774cee0000 addr 0x7efe1692c000 desc 0x7efe1692c008 (index 92460) being reclaimed
[ 1677.797446] sgx: pcmd_page_in_use:49 encl ffff9e77efc30000 addr 0x7feb3637f000 desc 0x7feb3637f008 (index 222079) being reclaimed
[ 1802.242423] sgx: pcmd_page_in_use:49 encl ffff9e76de190000 addr 0x7f5344fbf000 desc 0x7f5344fbf008 (index 282559) being reclaimed
[ 1887.397432] sgx: pcmd_page_in_use:49 encl ffff9e76d7960000 addr 0x7f0870ba8000 desc 0x7f0870ba8008 (index 461736) being reclaimed
[ 1994.519753] sgx: pcmd_page_in_use:49 encl ffff9e76d98d0000 addr 0x7f0531b43000 desc 0x7f0531b43008 (index 203587) being reclaimed
[ 2178.070802] sgx: pcmd_page_in_use:49 encl ffff9e775b400000 addr 0x7f1134149000 desc 0x7f1134149008 (index 213321) being reclaimed
[ 2349.632505] sgx: pcmd_page_in_use:49 encl ffff9e76ed450000 addr 0x7f944b94c000 desc 0x7f944b94c008 (index 309580) being reclaimed
[ 2349.632597] sgx: pcmd_page_in_use:49 encl ffff9e76ed450000 addr 0x7f944b94d000 desc 0x7f944b94d008 (index 309581) being reclaimed
[ 2349.632613] sgx: pcmd_page_in_use:49 encl ffff9e76ed450000 addr 0x7f944b94e000 desc 0x7f944b94e008 (index 309582) being reclaimed

---8<---

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 3051a26179bc..53bcaa153378 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,6 +12,76 @@
 #include "encls.h"
 #include "sgx.h"
 
+/**
+ * pcmd_page_in_use() - Query if any enclave page associated with a PCMD
+ *                      page is in process of being reclaimed.
+ * @encl:        Enclave to which PCMD page belongs
+ * @start_addr:  Address of enclave page using first entry within the PCMD page
+ *
+ * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
+ * stored. The PCMD data of a reclaimed enclave page contains enough
+ * information for the processor to verify the page at the time
+ * it is loaded back into the Enclave Page Cache (EPC).
+ *
+ * The backing storage to which enclave pages are reclaimed is laid out as
+ * follows:
+ * All enclave pages:SECS page:PCMD pages
+ *
+ * Each PCMD page contains the PCMD metadata of
+ * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
+ *
+ * A PCMD page can only be truncated if it is (a) empty, and (b) no enclave
+ * page sharing the PCMD page is in the process of being reclaimed.
+ *
+ * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
+ * intends to reclaim that enclave page - it means that the PCMD data
+ * associated with that enclave page is about to get some data and thus
+ * even if the PCMD page is empty, it should not be truncated.
+ *
+ * Return: 1 the PCMD page is in use, 0 if is not in use, -EFAULT on error
+ */
+static int pcmd_page_in_use(struct sgx_encl *encl,
+			    unsigned long start_addr)
+{
+	struct sgx_encl_page *entry;
+	unsigned long addr;
+	int reclaimed = 0;
+	int i;
+
+	for (i = 0; i < PAGE_SIZE/sizeof(struct sgx_pcmd); i++) {
+		addr = start_addr + i * PAGE_SIZE;
+
+		/*
+		 * Stop when reaching the SECS page - it does not
+		 * have a page_array entry and its reclaim is
+		 * started and completed with enclave mutex held so
+		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
+		 * flag.
+		 */
+		if (addr == encl->base + encl->size)
+			break;
+
+		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+		if (!entry)
+			return -EFAULT;
+
+		/*
+		 * VA page slot ID uses same bit as the flag so it is important
+		 * to ensure that the page is not already in backing store.
+		 */
+		if (entry->epc_page &&
+		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
+			pr_debug("%s:%d encl %px addr 0x%lx desc 0x%lx (index %lu) being reclaimed\n",
+				 __func__, __LINE__, encl, addr, entry->desc,
+				 PFN_DOWN(entry->desc - entry->encl->base));
+			reclaimed = 1;
+			break;
+		}
+	}
+
+	return reclaimed;
+}
+
 /*
  * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's
  * follow right after the EPC data in the backing storage. In addition to the
@@ -47,6 +117,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK;
 	struct sgx_encl *encl = encl_page->encl;
 	pgoff_t page_index, page_pcmd_off;
+	unsigned long pcmd_first_page;
 	struct sgx_pageinfo pginfo;
 	struct sgx_backing b;
 	bool pcmd_page_empty;
@@ -58,6 +129,12 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	else
 		page_index = PFN_DOWN(encl->size);
 
+	/*
+	 * Address of enclave page using the first entry within the
+	 * PCMD page.
+	 */
+	pcmd_first_page = PFN_PHYS(page_index & ~0x1FUL) + encl->base;
+
 	page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
 
 	ret = sgx_encl_lookup_backing(encl, page_index, &b);
@@ -103,7 +180,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty) {
+	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page)) {
 		ida_free(&encl->pcmd_in_backing, PFN_DOWN(page_pcmd_off));
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
 	}
---8<---

What do you think?

Reinette

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-04 22:13               ` Reinette Chatre
@ 2022-05-04 22:58                 ` Dave Hansen
  2022-05-04 23:36                   ` Reinette Chatre
  2022-05-04 23:05                 ` Dave Hansen
  1 sibling, 1 reply; 50+ messages in thread
From: Dave Hansen @ 2022-05-04 22:58 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 5/4/22 15:13, Reinette Chatre wrote:
> +		/*
> +		 * VA page slot ID uses same bit as the flag so it is important
> +		 * to ensure that the page is not already in backing store.
> +		 */
> +		if (entry->epc_page &&
> +		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
> +			pr_debug("%s:%d encl %px addr 0x%lx desc 0x%lx (index %lu) being reclaimed\n",
> +				 __func__, __LINE__, encl, addr, entry->desc,
> +				 PFN_DOWN(entry->desc - entry->encl->base));
> +			reclaimed = 1;
> +			break;
> +		}

I don't think this works as-is.  As it stands,
SGX_ENCL_PAGE_BEING_RECLAIMED is cleared *BEFORE* the EWB.  It's
possible for that check above to race with SGX_ENCL_PAGE_BEING_RECLAIMED
being cleared.  That would erroneously lead pcmd_page_in_use() to return
false *just* before data is written to the PCMD page.

static void sgx_encl_ewb()
{
        encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED;
	...
	__sgx_encl_ewb()
}

You might be able to move the ~SGX_ENCL_PAGE_BEING_RECLAIMED until after
the __sgx_encl_ewb().  But, at that point, the pcmd_page_empty is a bit
useless because it's stale.  I guess it might act as a performance
optimization because it allows avoiding the expensive pcmd_page_in_use()
check.

Also, one nit:

> +	/*
> +	 * Address of enclave page using the first entry within the
> +	 * PCMD page.
> +	 */
> +	pcmd_first_page = PFN_PHYS(page_index & ~0x1FUL) + encl->base;

That ~0x1FUL is pretty magic.  I assume that's 5 bits because there are
32 PCMDs per page.  This could probably be something resembling this
instead:

#define PCMDS_PER_PAGE (PAGE_SIZE/sizeof(struct sgx_pcmd))
#define PCMD_FIRST_MASK GENMASK(PCMDS_PER_PAGE, 0)


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-04 22:13               ` Reinette Chatre
  2022-05-04 22:58                 ` Dave Hansen
@ 2022-05-04 23:05                 ` Dave Hansen
  1 sibling, 0 replies; 50+ messages in thread
From: Dave Hansen @ 2022-05-04 23:05 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 5/4/22 15:13, Reinette Chatre wrote:
> -	if (pcmd_page_empty) {
> +	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page)) {
>  		ida_free(&encl->pcmd_in_backing, PFN_DOWN(page_pcmd_off));
>  		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
>  	}

One other thing.  The role of encl->lock here is very important.
Without it, two concurrent page faults could do their individual
memset(), each see !pcmd_page_empty, then decline to truncate the page.

Also, given the challenges here, I do think we should check the
pcmd_page after truncate to ensure it is still all zero's.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-04 22:58                 ` Dave Hansen
@ 2022-05-04 23:36                   ` Reinette Chatre
  2022-05-04 23:50                     ` Dave Hansen
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-05-04 23:36 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 5/4/2022 3:58 PM, Dave Hansen wrote:
> On 5/4/22 15:13, Reinette Chatre wrote:
>> +		/*
>> +		 * VA page slot ID uses same bit as the flag so it is important
>> +		 * to ensure that the page is not already in backing store.
>> +		 */
>> +		if (entry->epc_page &&
>> +		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
>> +			pr_debug("%s:%d encl %px addr 0x%lx desc 0x%lx (index %lu) being reclaimed\n",
>> +				 __func__, __LINE__, encl, addr, entry->desc,
>> +				 PFN_DOWN(entry->desc - entry->encl->base));
>> +			reclaimed = 1;
>> +			break;
>> +		}
> 
> I don't think this works as-is.  As it stands,
> SGX_ENCL_PAGE_BEING_RECLAIMED is cleared *BEFORE* the EWB.  It's
> possible for that check above to race with SGX_ENCL_PAGE_BEING_RECLAIMED
> being cleared.  That would erroneously lead pcmd_page_in_use() to return
> false *just* before data is written to the PCMD page.
> 
> static void sgx_encl_ewb()
> {
>         encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED;
> 	...
> 	__sgx_encl_ewb()
> }

You are correct that SGX_ENCL_PAGE_BEING_RECLAIMED is cleared before
EWB. Please do note though that clearing that bit and running EWB
are all done in one section with the enclave mutex held the entire time.

The page fault handler cannot check for SGX_ENCL_PAGE_BEING_RECLAIMED
until the reclaimer releases the mutex.

These checks are thus not able to race. Here is a higher level view:


    sgx_reclaim_pages() {

      ...

      /*
       * Reclaim ENCLAVE_A
       */
      mutex_lock(&encl->lock);
      /*
       * Get a reference to ENCLAVE_A's
       * shmem page where enclave page
       * encrypted data will be stored
       * as well as a reference to the
       * enclave page's PCMD data page,
       * PCMD_AB.
       * Release mutex before writing
       * any data to the shmem pages.
       */
      sgx_encl_get_backing(...);
      encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
      mutex_unlock(&encl->lock);

                                        /*
                                         * Fault ENCLAVE_B
                                         */

                                        sgx_vma_fault() {

                                          mutex_lock(&encl->lock);
                                          /*
                                           * Get reference to
                                           * ENCLAVE_B's shmem page
                                           * as well as PCMD_AB.
                                           */
                                          sgx_encl_get_backing(...)
                                         /*
                                          * Load page back into
                                          * enclave via ELDU.
                                          */
                                         /*
                                          * Release reference to
                                          * ENCLAVE_B' shmem page and
                                          * PCMD_AB.
                                          */
                                         sgx_encl_put_backing(...);
                                         /*
                                          * PCMD_AB is found empty so
                                          * it and ENCLAVE_B's shmem page
                                          * are truncated.
                                          */
                                         /* Truncate ENCLAVE_B backing page */
                                         sgx_encl_truncate_backing_page();

                                          /*
                                           * pcmd_page_empty indicates that
                                           * PCMD_AB is empty
                                           */
                                          pcmd_page_in_use() {
                                           /*
                                            * Check SGX_ENCL_PAGE_BEING_RECLAIMED
                                            * of pages sharing PCMD page.
                                            * Will find that ENCLAVE_A's 
                                            * SGX_ENCL_PAGE_BEING_RECLAIMED is
                                            * set and PCMD_AB page will NOT
                                            * be truncated.
                                            */
                                          }
                                         /* sgx_encl_truncate_backing_page(); => not run */

                                         mutex_unlock(&encl->lock);

                                         ...
                                         }
      mutex_lock(&encl->lock);
      encl_page->desc &=
           ~SGX_ENCL_PAGE_BEING_RECLAIMED;
      /*
      * Write encrypted contents of
      * ENCLAVE_A to ENCLAVE_A shmem
      * page and its PCMD data to
      * PCMD_AB.
      */
      sgx_encl_put_backing(...)

      /*
       * Reference to PCMD_AB is
       * dropped.
       */
      mutex_unlock(&encl->lock);
    }



> 
> You might be able to move the ~SGX_ENCL_PAGE_BEING_RECLAIMED until after
> the __sgx_encl_ewb().

This will not change much since all is done with mutex held. At the current
location that bit has served its purpose and the code that follows will
replace it with the VA slot information.

>  But, at that point, the pcmd_page_empty is a bit
> useless because it's stale. 

I cannot see how pcmd_page_empty would be stale - could you please elaborate
on the flow where you see this happening?

> I guess it might act as a performance
> optimization because it allows avoiding the expensive pcmd_page_in_use()
> check.
> 
> Also, one nit:
> 
>> +	/*
>> +	 * Address of enclave page using the first entry within the
>> +	 * PCMD page.
>> +	 */
>> +	pcmd_first_page = PFN_PHYS(page_index & ~0x1FUL) + encl->base;
> 
> That ~0x1FUL is pretty magic.  I assume that's 5 bits because there are
> 32 PCMDs per page.  This could probably be something resembling this
> instead:
> 
> #define PCMDS_PER_PAGE (PAGE_SIZE/sizeof(struct sgx_pcmd))
> #define PCMD_FIRST_MASK GENMASK(PCMDS_PER_PAGE, 0)
> 

Thank you, yes, that needs to improve.

> One other thing.  The role of encl->lock here is very important.
> Without it, two concurrent page faults could do their individual
> memset(), each see !pcmd_page_empty, then decline to truncate the page.

Your argument is not clear to me. Yes, this would be an issue (and
that will not be the only issue) if the enclave mutex is not held
by the page fault handler ... but the enclave mutex _is_ held by the
page fault handler.
 
> Also, given the challenges here, I do think we should check the
> pcmd_page after truncate to ensure it is still all zero's.

Could you please elaborate the challenges? I do not think it would
help the issue at hand at all to add a check since this is done with
the mutex held while the reclaimer has a reference to the page but no
mutex ... as long as the page fault handler has that mutex it can write
and check the PCMD page all it want, the new changes would occur to the
PCMD page when it (the page fault handler) releases the mutex and the
reclaimer attempts to use the page it has a reference to.

Reinette





^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-04 23:36                   ` Reinette Chatre
@ 2022-05-04 23:50                     ` Dave Hansen
  2022-05-05  0:08                       ` Reinette Chatre
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Hansen @ 2022-05-04 23:50 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

On 5/4/22 16:36, Reinette Chatre wrote:
> The page fault handler cannot check for SGX_ENCL_PAGE_BEING_RECLAIMED
> until the reclaimer releases the mutex.

Ahh, got it now.  I thought the mutex wasn't held on the reclaimer side.
 Thanks for the refresher.

>> One other thing.  The role of encl->lock here is very important.
>> Without it, two concurrent page faults could do their individual
>> memset(), each see !pcmd_page_empty, then decline to truncate the page.
> 
> Your argument is not clear to me. Yes, this would be an issue (and
> that will not be the only issue) if the enclave mutex is not held
> by the page fault handler ... but the enclave mutex _is_ held by the
> page fault handler.

It would just be nice to clearly state the locking logic in the eventual
changelog.

>> Also, given the challenges here, I do think we should check the
>> pcmd_page after truncate to ensure it is still all zero's.
> 
> Could you please elaborate the challenges? I do not think it would
> help the issue at hand at all to add a check since this is done with
> the mutex held while the reclaimer has a reference to the page but no
> mutex ... as long as the page fault handler has that mutex it can write
> and check the PCMD page all it want, the new changes would occur to the
> PCMD page when it (the page fault handler) releases the mutex and the
> reclaimer attempts to use the page it has a reference to.

I'm just asking for a debugging check that looks at the page *after* is
has been truncated to ensure it is zero, in case another bug creeps in here.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-04 23:50                     ` Dave Hansen
@ 2022-05-05  0:08                       ` Reinette Chatre
  0 siblings, 0 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-05-05  0:08 UTC (permalink / raw)
  To: Dave Hansen, dave.hansen, jarkko, linux-sgx; +Cc: haitao.huang

Hi Dave,

On 5/4/2022 4:50 PM, Dave Hansen wrote:
> On 5/4/22 16:36, Reinette Chatre wrote:
>> The page fault handler cannot check for SGX_ENCL_PAGE_BEING_RECLAIMED
>> until the reclaimer releases the mutex.
> 
> Ahh, got it now.  I thought the mutex wasn't held on the reclaimer side.
>  Thanks for the refresher.
> 
>>> One other thing.  The role of encl->lock here is very important.
>>> Without it, two concurrent page faults could do their individual
>>> memset(), each see !pcmd_page_empty, then decline to truncate the page.
>>
>> Your argument is not clear to me. Yes, this would be an issue (and
>> that will not be the only issue) if the enclave mutex is not held
>> by the page fault handler ... but the enclave mutex _is_ held by the
>> page fault handler.
> 
> It would just be nice to clearly state the locking logic in the eventual
> changelog.

Will do. Below is the changelog I am currently working on:

x86/sgx: Fix race between reclaimer and page fault handler

Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
instruction faulting with a #GP.

The WARN is encountered when the reclaimer evicts a range of
pages from the enclave when the same pages are faulted back right away.

Consider two enclave pages (ENCLAVE_A and ENCLAVE_B)
sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the
enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains
just one entry, that of ENCLAVE_B.

Scenario proceeds where ENCLAVE_A is being evicted from the enclave
while ENCLAVE_B is faulted in.

sgx_reclaim_pages() {

  ...

  /*
   * Reclaim ENCLAVE_A
   */
  mutex_lock(&encl->lock);
  /*
   * Get a reference to ENCLAVE_A's
   * shmem page where enclave page
   * encrypted data will be stored
   * as well as a reference to the
   * enclave page's PCMD data page,
   * PCMD_AB.
   * Release mutex before writing
   * any data to the shmem pages.
   */
  sgx_encl_get_backing(...);
  encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
  mutex_unlock(&encl->lock);

                                    /*
                                     * Fault ENCLAVE_B
                                     */

                                    sgx_vma_fault() {

                                      mutex_lock(&encl->lock);
                                      /*
                                       * Get reference to
                                       * ENCLAVE_B's shmem page
                                       * as well as PCMD_AB.
                                       */
                                      sgx_encl_get_backing(...)
                                     /*
                                      * Load page back into
                                      * enclave via ELDU.
                                      */
                                     /*
                                      * Release reference to
                                      * ENCLAVE_B' shmem page and
                                      * PCMD_AB.
                                      */
                                     sgx_encl_put_backing(...);
                                     /*
                                      * PCMD_AB is found empty so
                                      * it and ENCLAVE_B's shmem page
                                      * are truncated.
                                      */
                                     /* Truncate ENCLAVE_B backing page */
                                     sgx_encl_truncate_backing_page();
                                     /* Truncate PCMD_AB */
                                     sgx_encl_truncate_backing_page();

                                     mutex_unlock(&encl->lock);

                                     ...
                                     }
  mutex_lock(&encl->lock);
  encl_page->desc &=
       ~SGX_ENCL_PAGE_BEING_RECLAIMED;
  /*
  * Write encrypted contents of
  * ENCLAVE_A to ENCLAVE_A shmem
  * page and its PCMD data to
  * PCMD_AB.
  */
  sgx_encl_put_backing(...)

  /*
   * Reference to PCMD_AB is
   * dropped and it is truncated.
   * ENCLAVE_A's PCMD data is lost.
   */
  mutex_unlock(&encl->lock);
}

What happens next depends on whether it is ENCLAVE_A being faulted
in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting
with a #GP.

If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called
a new PCMD page is allocated and providing the empty PCMD data for
ENCLAVE_A would cause ENCLS[ELDU] to #GP

If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the
reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction
would #GP during its checks of the PCMD value and the WARN would be
encountered.

Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time
it obtains a reference to the backing store pages of an enclave page it is
in the process of reclaiming, fix the race by only truncating the PCMD page
after ensuring that no page sharing the PCMD page is in the process of being
reclaimed.

Reported-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>

> 
>>> Also, given the challenges here, I do think we should check the
>>> pcmd_page after truncate to ensure it is still all zero's.
>>
>> Could you please elaborate the challenges? I do not think it would
>> help the issue at hand at all to add a check since this is done with
>> the mutex held while the reclaimer has a reference to the page but no
>> mutex ... as long as the page fault handler has that mutex it can write
>> and check the PCMD page all it want, the new changes would occur to the
>> PCMD page when it (the page fault handler) releases the mutex and the
>> reclaimer attempts to use the page it has a reference to.
> 
> I'm just asking for a debugging check that looks at the page *after* is
> has been truncated to ensure it is zero, in case another bug creeps in here.

ok, I can add your patch from
https://lore.kernel.org/linux-sgx/faaaf70d-22b7-eaa5-4623-bbdf1012827a@intel.com/
to this series.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-04-28 20:11 [RFC PATCH 0/4] SGX shmem backing store issue Reinette Chatre
                   ` (6 preceding siblings ...)
  2022-05-04  6:40 ` Jarkko Sakkinen
@ 2022-05-05  6:09 ` Jarkko Sakkinen
  7 siblings, 0 replies; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-05  6:09 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, linux-sgx; +Cc: haitao.huang

On Thu, 2022-04-28 at 13:11 -0700, Reinette Chatre wrote:

> diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
> index d480c2dd2858..12008789325b 100644
> --- a/tools/testing/selftests/sgx/main.c
> +++ b/tools/testing/selftests/sgx/main.c
> @@ -398,7 +398,7 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
>          * Create enclave with additional heap that is as big as all
>          * available physical SGX memory.
>          */
> -       total_mem = get_total_epc_mem();
> +       total_mem = get_total_epc_mem()/16;
>         ASSERT_NE(total_mem, 0);
>         TH_LOG("Creating an enclave with %lu bytes heap may take a while ...",
>                total_mem);
> 
> If the the test compiled with above snippet is renamed as "test_sgx_small"
> and the original renamed as "test_sgx_large" the two concurrent loops are
> run as follows:
> 
> (for i in $(seq 1 999); do echo "Iteration $i"; sudo ./test_sgx_large; done ) > log.large 2>&1
> (for i in $(seq 1 9999); do echo "Iteration $i"; sudo ./test_sgx_small; done ) > log.small 2>&1

Setting up VM image to see if I could reproduce this locally.

I'll reduce the iterations to 64 and 640 to scale the test to 256 MB EPC size.

BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-04-28 21:30   ` Dave Hansen
  2022-04-28 22:20     ` Reinette Chatre
@ 2022-05-06 22:09     ` Jarkko Sakkinen
  1 sibling, 0 replies; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-06 22:09 UTC (permalink / raw)
  To: Dave Hansen, Reinette Chatre, dave.hansen, linux-sgx; +Cc: haitao.huang

On Thu, 2022-04-28 at 14:30 -0700, Dave Hansen wrote:
> On 4/28/22 13:11, Reinette Chatre wrote:
> > 
> > The backing storage is freed after running ENCLS[ELDU],
> > whether ENCLS[ELDU] succeeded or not. If ENCLS[ELDU]
> > thus failed then the data within that page is lost.
> > 
> > Exit with error without removing the backing storage if
> > it could not be restored to the enclave.
> > 
> > Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
> > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/encl.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 1a2cbe44b8d9..e5d2661800ac 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -81,6 +81,10 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >                         ENCLS_WARN(ret, "ELDU");
> >  
> >                 ret = -EFAULT;
> > +               kunmap_atomic(pcmd_page);
> > +               kunmap_atomic((void *)(unsigned long)pginfo.contents);
> > +               sgx_encl_put_backing(&b, false);
> > +               return ret;
> >         }
> >  
> >         memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
> 
> Are there any transient, recoverable errors that can come back from
> ELDU?  If so, this makes a lot of sense.  If not, then it doesn't make a
> lot of sense to preserve the swapped-out content because they enclave is
> going to die anyway.

Nope, it's pretty much game over then.

BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 2/4] x86/sgx: Set dirty bit after modifying page contents
  2022-04-28 20:11 ` [RFC PATCH 2/4] x86/sgx: Set dirty bit after modifying page contents Reinette Chatre
  2022-04-28 21:40   ` Dave Hansen
@ 2022-05-06 22:27   ` Jarkko Sakkinen
  2022-05-06 22:40     ` Reinette Chatre
  1 sibling, 1 reply; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-06 22:27 UTC (permalink / raw)
  To: Reinette Chatre, dave.hansen, linux-sgx; +Cc: haitao.huang

On Thu, 2022-04-28 at 13:11 -0700, Reinette Chatre wrote:
> Recent commit 08999b2489b4 ("x86/sgx: Free backing memory
> after faulting the enclave page") expanded __sgx_encl_eldu()
> to clear an enclave page's PCMD (Paging Crypto MetaData)
> from the PCMD page in the backing store after the enclave
> page is restored to the enclave.
> 
> Since the PCMD page in the backing store is modified the page
> should be set as dirty when releasing the reference to
> ensure the modified data is retained.
> 
> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/encl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index e5d2661800ac..e03f124ce772 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -98,7 +98,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>         kunmap_atomic(pcmd_page);
>         kunmap_atomic((void *)(unsigned long)pginfo.contents);
>  
> -       sgx_encl_put_backing(&b, false);
> +       sgx_encl_put_backing(&b, true);
>  
>         sgx_encl_truncate_backing_page(encl, page_index);
>  

So, the implementation is:

void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
{
	if (do_write) {
		set_page_dirty(backing->pcmd);
		set_page_dirty(backing->contents);
	}

	put_page(backing->pcmd);
	put_page(backing->contents);
}

And we only want to set dirty for PCMD part, right?

Thus, perhaps we should fix it instead by:

	set_page_dirty(backing->pcmd);
	put_page(backing->pcmd):
	put_page(backing->contents);
	
?	

I would not mind getting rid of that function anyway. It's kind
of bad wrapping IMHO.
	
BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 2/4] x86/sgx: Set dirty bit after modifying page contents
  2022-05-06 22:27   ` Jarkko Sakkinen
@ 2022-05-06 22:40     ` Reinette Chatre
  2022-05-07 18:01       ` Jarkko Sakkinen
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-05-06 22:40 UTC (permalink / raw)
  To: Jarkko Sakkinen, dave.hansen, linux-sgx; +Cc: haitao.huang

Hi Jarkko,

On 5/6/2022 3:27 PM, Jarkko Sakkinen wrote:
> On Thu, 2022-04-28 at 13:11 -0700, Reinette Chatre wrote:
>> Recent commit 08999b2489b4 ("x86/sgx: Free backing memory
>> after faulting the enclave page") expanded __sgx_encl_eldu()
>> to clear an enclave page's PCMD (Paging Crypto MetaData)
>> from the PCMD page in the backing store after the enclave
>> page is restored to the enclave.
>>
>> Since the PCMD page in the backing store is modified the page
>> should be set as dirty when releasing the reference to
>> ensure the modified data is retained.
>>
>> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>>  arch/x86/kernel/cpu/sgx/encl.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
>> index e5d2661800ac..e03f124ce772 100644
>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>> @@ -98,7 +98,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>>         kunmap_atomic(pcmd_page);
>>         kunmap_atomic((void *)(unsigned long)pginfo.contents);
>>  
>> -       sgx_encl_put_backing(&b, false);
>> +       sgx_encl_put_backing(&b, true);
>>  
>>         sgx_encl_truncate_backing_page(encl, page_index);
>>  
> 
> So, the implementation is:
> 
> void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
> {
> 	if (do_write) {
> 		set_page_dirty(backing->pcmd);
> 		set_page_dirty(backing->contents);
> 	}
> 
> 	put_page(backing->pcmd);
> 	put_page(backing->contents);
> }
> 
> And we only want to set dirty for PCMD part, right?
> 
> Thus, perhaps we should fix it instead by:
> 
> 	set_page_dirty(backing->pcmd);
> 	put_page(backing->pcmd):
> 	put_page(backing->contents);
> 	
> ?	
> 
> I would not mind getting rid of that function anyway. It's kind
> of bad wrapping IMHO.

Could we rather just change sgx_encl_put_backing() to be:

void sgx_encl_put_backing(struct sgx_backing *backing)
{
	put_page(backing->pcmd);
	put_page(backing->contents);
}

Two reasons:

1) Instead of getting rid of sgx_encl_put_backing() we can keep
   it so that its name reflects its work and its usage remains
   symmetrical to sgx_encl_get_backing() that will stay and
   the code thus continues to be clear on the page references.

2) By moving the set_page_dirty() out of that function the
   set_page_dirty() can be called _before_ writing data to
   the page, which is the right thing to do per:
   https://lore.kernel.org/linux-sgx/c057af3d-b7fb-34cd-0d75-989fca0e67fe@intel.com/

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 3/4] x86/sgx: Obtain backing storage page with enclave mutex held
  2022-04-28 20:11 ` [RFC PATCH 3/4] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
  2022-04-28 21:58   ` Dave Hansen
@ 2022-05-06 22:43   ` Jarkko Sakkinen
  1 sibling, 0 replies; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-06 22:43 UTC (permalink / raw)
  To: Reinette Chatre; +Cc: dave.hansen, linux-sgx, haitao.huang

On Thu, Apr 28, 2022 at 01:11:26PM -0700, Reinette Chatre wrote:
> The SGX backing storage is accessed on two paths: when there
> are insufficient enclave pages in the EPC the reclaimer works
> to move enclave pages to the backing storage and as enclaves
> access pages that have been moved to the backing storage
> they are retrieved from there as part of page fault handling.
> 
> An oversubscribed SGX system will often run the reclaimer and
> page fault handler concurrently and needs to ensure that the
> backing store is accessed safely between the reclaimer and
> the page fault handler. The scenarios to consider here are:
> (a) faulting a page right after it was reclaimed,
> (b) faulting a page and reclaiming another page that are
>     sharing a PCMD page.
> 
> The reclaimer obtains pages from the backing storage without
> holding the enclave mutex and runs the risk of concurrently
> accessing the backing storage with the page fault handler that
> does access the backing storage with the enclave mutex held.
> 
> In the scenario below a page is written to the backing store
> by the reclaimer and then immediately faulted back, before
> the reclaimer is able to set the dirty bit of the page:
> 
> sgx_reclaim_pages() {                    sgx_vma_fault() {
> ...                                      ...
> /* write data to backing store */
> sgx_reclaimer_write();
>                                          mutex_lock(&encl->lock);
>                                          __sgx_encl_eldu() {
>                                          ...
>                                          /* page not dirty -
>                                           * contents may not be
>                                           * up to date
>                                          */
>                                          sgx_encl_get_backing();
>                                          ...
>                                          }
>                                          ...
> /* set page dirty */
> sgx_encl_put_backing();
> ...
>                                          mutex_unlock(&encl->lock);
> }                                        }
> 
> While it is not possible to concurrently reclaim and fault the same
> enclave page the PCMD pages are shared between enclave pages
> in the enclave and enclave pages in the backing store.
> In the below scenario a PCMD page is truncated from the backing
> store after all its pages have been loaded in to the enclave
> at the same time the PCMD page is loaded from the backing store
> when one of its pages are reclaimed:
> 
> sgx_reclaim_pages() {              sgx_vma_fault() {
>                                      ...
>                                      mutex_lock(&encl->lock);
>                                      ...
>                                      __sgx_encl_eldu() {
>                                        ...
>                                        if (pcmd_page_empty) {
> /*
>  * EPC page being reclaimed              /*
>  * shares a PCMD page with an             * PCMD page truncated
>  * enclave page that is being             * while requested from
>  * faulted in.                            * reclaimer.
>  */                                       */
> sgx_encl_get_backing()  <---------->      sgx_encl_truncate_backing_page()
>                                         }
> }                                    }
> 
> Protect the reclaimer's backing store access with the enclave's mutex
> to ensure that it can safely run concurrently with the page fault handler.
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 0e8741a80cf3..ae79b8d6f645 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -252,6 +252,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>  	sgx_encl_ewb(epc_page, backing);
>  	encl_page->epc_page = NULL;
>  	encl->secs_child_cnt--;
> +	sgx_encl_put_backing(backing, true);
>  
>  	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
>  		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
> @@ -323,11 +324,14 @@ static void sgx_reclaim_pages(void)
>  			goto skip;
>  
>  		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
> +
> +		mutex_lock(&encl_page->encl->lock);
>  		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
> -		if (ret)
> +		if (ret) {
> +			mutex_unlock(&encl_page->encl->lock);
>  			goto skip;
> +		}
>  
> -		mutex_lock(&encl_page->encl->lock);
>  		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
>  		mutex_unlock(&encl_page->encl->lock);
>  		continue;
> @@ -355,7 +359,6 @@ static void sgx_reclaim_pages(void)
>  
>  		encl_page = epc_page->owner;
>  		sgx_reclaimer_write(epc_page, &backing[i]);
> -		sgx_encl_put_backing(&backing[i], true);
>  
>  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
>  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> -- 
> 2.25.1
> 

I fully agree with your fix but also there I would open code the required
statements from sgx_encL_put_backing(). Let's render that out overtime.
It masks more than helps.

BR, Jarkko



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-04-28 23:49         ` Reinette Chatre
  2022-05-03  2:01           ` Kai Huang
@ 2022-05-07 17:25           ` Jarkko Sakkinen
  2022-05-09 17:17             ` Reinette Chatre
  1 sibling, 1 reply; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-07 17:25 UTC (permalink / raw)
  To: Reinette Chatre; +Cc: Dave Hansen, dave.hansen, linux-sgx, haitao.huang

On Thu, Apr 28, 2022 at 04:49:00PM -0700, Reinette Chatre wrote:
> > I also looked a little deeper at this transient failure problem.  The
> > ELDU documentation also mentions a possible error code of:
> > 
> > 	SGX_EPC_PAGE_CONFLICT
> > 
> > It *looks* like there can be conflicts on the SECS page as well as the
> > EPC page being explicitly accessed.  Is that a possible problem here?
> 
> I went down this path myself. SGX_EPC_PAGE_CONFLICT is an error code
> supported by newer ELDUC - the ELDU used in current code would indeed
> #GP in this case. The SDM text describing ELDUC as "This leaf function
> behaves like ELDU but with improved conflict handling for oversubscription"
> really does seem relevant to the test that triggers this issue.
> 
> I stopped pursuing this because from what I understand if
> SGX_EPC_PAGE_CONFLICT is encountered with commit 08999b2489b4 ("x86/sgx:
> Free backing memory after faulting the enclave page") then it should
> also be encountered without it. The issue is not present with
> 08999b2489b4 ("x86/sgx: Free backing memory after faulting the
> enclave page") removed. I am thus currently investigating based on
> the assumption that the #GP is encountered because of MAC
> verification problem. I may be wrong here also and need more information
> since the SDM documents two seemingly related errors:
> #GP(0) -> If the instruction fails to verify MAC.
> SGX_MAC_COMPARE_FAIL -> If the MAC check fails.

This part puzzles me in the pseudo-code.

The version is read first:

TMP_VER := DS:RDX[63:0]; 

Then there's MAC calculation, comparison,  and finally this check:

(* Check version before committing *)
IF (DS:RDX ≠ 0) 
        THEN #GP(0); 
ELSE
        DS:RDX := TMP_VER;
FI; 

For me it is a mystery what does zero the slot and in what condition
it would be non-zero. Perhaps the #GP refers anyway to this check?

BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-02 21:33             ` Dave Hansen
  2022-05-04 22:13               ` Reinette Chatre
@ 2022-05-07 17:46               ` Jarkko Sakkinen
  2022-05-07 17:48                 ` Jarkko Sakkinen
  1 sibling, 1 reply; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-07 17:46 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Reinette Chatre, dave.hansen, linux-sgx, haitao.huang

On Mon, May 02, 2022 at 02:33:06PM -0700, Dave Hansen wrote:
> On 5/2/22 10:11, Reinette Chatre wrote:
> > My goal was to prevent the page fault handler from doing a "is the PCMD
> > page empty" if the reclaimer has a reference to it. Even if the PCMD page
> > is empty when the page fault handler checks it the page is expected to
> > get data right when reclaimer can get the mutex back since the reclaimer
> > already has a reference to the page.
> 
> I think shmem_truncate_range() might be the wrong operation.  It
> destroys data and, in the end, we don't want to destroy data.
> Filesystems and the page cache already have nice ways to keep from
> destroying data, we just need to use them.
> 
> First, I think set_page_dirty(pcmd_page) before the EWB is a good start.
>  That's what filesystems do before important data that needs to be saved
> goes into pages.
> 
> Second, I think we need behavior like POSIX_FADV_DONTNEED, not
> FALLOC_FL_PUNCH_HOLE.  The DONTNEED operations end up in
> mapping_evict_folio(), which has both page refcount *and* dirty page
> checks.  That means that either elevating a refcount or set_page_dirty()
> will thwart DONTNEED-like behavior.
> 
> There are two basic things we need to do:
> 
> 1. Prevent page from being truncated around EWB time
> 2. Prevent unreferenced page with all zeros from staying in shmem
>    forever or taking up swap space
> 
> On the EWB (write to PCMD side) I think something like this works:
> 
> 	sgx_encl_get_backing()
> 		get_page(pcmd_page)
> 
> 	...
> 	lock_page(pcmd_page);
> 	// check for truncation since sgx_encl_get_backing()
> 	if (pcmd_page->mapping != shmem)
> 		goto retry;
> 	 // double check this is OK under lock_page():
> 	set_page_dirty(pcmd_page);
> 	__sgx_encl_ewb();
> 	unlock_page(pcmd_page);
> 
> That's basically what filesystems do.  Get the page from the page cache,
> lock it, then make sure it is consistent.  If not, retry.
> 
> On the "free" / read in (ELDU) side:
> 
> 	// get pcmd_page ref
> 	lock_page(pcmd_page);
> 	// truncation is not a concern because that's only done
> 	// on the read-in side, here, where we hold encl->lock
> 
> 	memset();
> 	if (!memchr_inv())
> 		// clear the way for DONTNEED:
> 		ClearPageDirty(pcmd_page);
> 	unlock_page(pcmd_page);
> 	// drop pcmd_page ref
> 	...
> 	POSIX_FADV_DONTNEED
> 
> There's one downside to this: it's _possible_ that an transient
> get_page() would block POSIX_FADV_DONTNEED.  Then the zeroed page would
> stick around forever, or at least until the next ELDU operation did
> another memchr_inv().
> 
> I went looking around for some of those and could not find any that I
> *know* apply to shmem.
> 
> This doesn't feel like a great solution; it's more complicated than I
> would like.  Any other ideas?

If we could do both truncation and swapping in one side, i.e. in ksgxd,
that would simplify this process a lot. Then the whole synchronization
problem would not exist.

E.g. perhaps #PF handler could just zero PCMD and collect zeroed pages
indices to a list and ksgxd would truncate them.

BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-07 17:46               ` Jarkko Sakkinen
@ 2022-05-07 17:48                 ` Jarkko Sakkinen
  2022-05-09 17:09                   ` Reinette Chatre
  0 siblings, 1 reply; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-07 17:48 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Reinette Chatre, dave.hansen, linux-sgx, haitao.huang

On Sat, May 07, 2022 at 08:46:50PM +0300, Jarkko Sakkinen wrote:
> On Mon, May 02, 2022 at 02:33:06PM -0700, Dave Hansen wrote:
> > On 5/2/22 10:11, Reinette Chatre wrote:
> > > My goal was to prevent the page fault handler from doing a "is the PCMD
> > > page empty" if the reclaimer has a reference to it. Even if the PCMD page
> > > is empty when the page fault handler checks it the page is expected to
> > > get data right when reclaimer can get the mutex back since the reclaimer
> > > already has a reference to the page.
> > 
> > I think shmem_truncate_range() might be the wrong operation.  It
> > destroys data and, in the end, we don't want to destroy data.
> > Filesystems and the page cache already have nice ways to keep from
> > destroying data, we just need to use them.
> > 
> > First, I think set_page_dirty(pcmd_page) before the EWB is a good start.
> >  That's what filesystems do before important data that needs to be saved
> > goes into pages.
> > 
> > Second, I think we need behavior like POSIX_FADV_DONTNEED, not
> > FALLOC_FL_PUNCH_HOLE.  The DONTNEED operations end up in
> > mapping_evict_folio(), which has both page refcount *and* dirty page
> > checks.  That means that either elevating a refcount or set_page_dirty()
> > will thwart DONTNEED-like behavior.
> > 
> > There are two basic things we need to do:
> > 
> > 1. Prevent page from being truncated around EWB time
> > 2. Prevent unreferenced page with all zeros from staying in shmem
> >    forever or taking up swap space
> > 
> > On the EWB (write to PCMD side) I think something like this works:
> > 
> > 	sgx_encl_get_backing()
> > 		get_page(pcmd_page)
> > 
> > 	...
> > 	lock_page(pcmd_page);
> > 	// check for truncation since sgx_encl_get_backing()
> > 	if (pcmd_page->mapping != shmem)
> > 		goto retry;
> > 	 // double check this is OK under lock_page():
> > 	set_page_dirty(pcmd_page);
> > 	__sgx_encl_ewb();
> > 	unlock_page(pcmd_page);
> > 
> > That's basically what filesystems do.  Get the page from the page cache,
> > lock it, then make sure it is consistent.  If not, retry.
> > 
> > On the "free" / read in (ELDU) side:
> > 
> > 	// get pcmd_page ref
> > 	lock_page(pcmd_page);
> > 	// truncation is not a concern because that's only done
> > 	// on the read-in side, here, where we hold encl->lock
> > 
> > 	memset();
> > 	if (!memchr_inv())
> > 		// clear the way for DONTNEED:
> > 		ClearPageDirty(pcmd_page);
> > 	unlock_page(pcmd_page);
> > 	// drop pcmd_page ref
> > 	...
> > 	POSIX_FADV_DONTNEED
> > 
> > There's one downside to this: it's _possible_ that an transient
> > get_page() would block POSIX_FADV_DONTNEED.  Then the zeroed page would
> > stick around forever, or at least until the next ELDU operation did
> > another memchr_inv().
> > 
> > I went looking around for some of those and could not find any that I
> > *know* apply to shmem.
> > 
> > This doesn't feel like a great solution; it's more complicated than I
> > would like.  Any other ideas?
> 
> If we could do both truncation and swapping in one side, i.e. in ksgxd,
> that would simplify this process a lot. Then the whole synchronization
> problem would not exist.
> 
> E.g. perhaps #PF handler could just zero PCMD and collect zeroed pages
> indices to a list and ksgxd would truncate them.

I.e. instead of immediate response, go for lazy response that is taken 
care by ksgxd.

BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 2/4] x86/sgx: Set dirty bit after modifying page contents
  2022-05-06 22:40     ` Reinette Chatre
@ 2022-05-07 18:01       ` Jarkko Sakkinen
  0 siblings, 0 replies; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-07 18:01 UTC (permalink / raw)
  To: Reinette Chatre; +Cc: dave.hansen, linux-sgx, haitao.huang

On Fri, May 06, 2022 at 03:40:26PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 5/6/2022 3:27 PM, Jarkko Sakkinen wrote:
> > On Thu, 2022-04-28 at 13:11 -0700, Reinette Chatre wrote:
> >> Recent commit 08999b2489b4 ("x86/sgx: Free backing memory
> >> after faulting the enclave page") expanded __sgx_encl_eldu()
> >> to clear an enclave page's PCMD (Paging Crypto MetaData)
> >> from the PCMD page in the backing store after the enclave
> >> page is restored to the enclave.
> >>
> >> Since the PCMD page in the backing store is modified the page
> >> should be set as dirty when releasing the reference to
> >> ensure the modified data is retained.
> >>
> >> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
> >> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> >> ---
> >>  arch/x86/kernel/cpu/sgx/encl.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> >> index e5d2661800ac..e03f124ce772 100644
> >> --- a/arch/x86/kernel/cpu/sgx/encl.c
> >> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> >> @@ -98,7 +98,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >>         kunmap_atomic(pcmd_page);
> >>         kunmap_atomic((void *)(unsigned long)pginfo.contents);
> >>  
> >> -       sgx_encl_put_backing(&b, false);
> >> +       sgx_encl_put_backing(&b, true);
> >>  
> >>         sgx_encl_truncate_backing_page(encl, page_index);
> >>  
> > 
> > So, the implementation is:
> > 
> > void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
> > {
> > 	if (do_write) {
> > 		set_page_dirty(backing->pcmd);
> > 		set_page_dirty(backing->contents);
> > 	}
> > 
> > 	put_page(backing->pcmd);
> > 	put_page(backing->contents);
> > }
> > 
> > And we only want to set dirty for PCMD part, right?
> > 
> > Thus, perhaps we should fix it instead by:
> > 
> > 	set_page_dirty(backing->pcmd);
> > 	put_page(backing->pcmd):
> > 	put_page(backing->contents);
> > 	
> > ?	
> > 
> > I would not mind getting rid of that function anyway. It's kind
> > of bad wrapping IMHO.
> 
> Could we rather just change sgx_encl_put_backing() to be:
> 
> void sgx_encl_put_backing(struct sgx_backing *backing)
> {
> 	put_page(backing->pcmd);
> 	put_page(backing->contents);
> }
> 
> Two reasons:
> 
> 1) Instead of getting rid of sgx_encl_put_backing() we can keep
>    it so that its name reflects its work and its usage remains
>    symmetrical to sgx_encl_get_backing() that will stay and
>    the code thus continues to be clear on the page references.
> 
> 2) By moving the set_page_dirty() out of that function the
>    set_page_dirty() can be called _before_ writing data to
>    the page, which is the right thing to do per:
>    https://lore.kernel.org/linux-sgx/c057af3d-b7fb-34cd-0d75-989fca0e67fe@intel.com/


Yes, this does make sense to me. Important bit was to refactor
set_page_dirty() calls out but for sure put_page()'s can still
be wrapped.

As I responded to Dave just few moments ago, one option would be
*possibly* to do the truncation part in ksgxd but I hope we hope
we don't have to go to that because it would mean also extra
book-keeping...

Took a bit of effort but I finally got this reproduced with my
laptop with this CPU:

Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz

That's why I've had some radio silence in the meanwhile. In other
words, I should be now able to also to test the fix locally.

PS. I'm sorry if I've double-replied to this. Was not sure if
my first response left yesterday, and could not find it from
lore (and this does contain anyway stuff that was not incldued
into that).

BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-07 17:48                 ` Jarkko Sakkinen
@ 2022-05-09 17:09                   ` Reinette Chatre
  2022-05-10 22:28                     ` Jarkko Sakkinen
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-05-09 17:09 UTC (permalink / raw)
  To: Jarkko Sakkinen, Dave Hansen; +Cc: dave.hansen, linux-sgx, haitao.huang

Hi Jarkko,

On 5/7/2022 10:48 AM, Jarkko Sakkinen wrote:
> On Sat, May 07, 2022 at 08:46:50PM +0300, Jarkko Sakkinen wrote:
>> On Mon, May 02, 2022 at 02:33:06PM -0700, Dave Hansen wrote:
>>> On 5/2/22 10:11, Reinette Chatre wrote:
>>>> My goal was to prevent the page fault handler from doing a "is the PCMD
>>>> page empty" if the reclaimer has a reference to it. Even if the PCMD page
>>>> is empty when the page fault handler checks it the page is expected to
>>>> get data right when reclaimer can get the mutex back since the reclaimer
>>>> already has a reference to the page.
>>>
>>> I think shmem_truncate_range() might be the wrong operation.  It
>>> destroys data and, in the end, we don't want to destroy data.
>>> Filesystems and the page cache already have nice ways to keep from
>>> destroying data, we just need to use them.
>>>
>>> First, I think set_page_dirty(pcmd_page) before the EWB is a good start.
>>>  That's what filesystems do before important data that needs to be saved
>>> goes into pages.
>>>
>>> Second, I think we need behavior like POSIX_FADV_DONTNEED, not
>>> FALLOC_FL_PUNCH_HOLE.  The DONTNEED operations end up in
>>> mapping_evict_folio(), which has both page refcount *and* dirty page
>>> checks.  That means that either elevating a refcount or set_page_dirty()
>>> will thwart DONTNEED-like behavior.
>>>
>>> There are two basic things we need to do:
>>>
>>> 1. Prevent page from being truncated around EWB time
>>> 2. Prevent unreferenced page with all zeros from staying in shmem
>>>    forever or taking up swap space
>>>
>>> On the EWB (write to PCMD side) I think something like this works:
>>>
>>> 	sgx_encl_get_backing()
>>> 		get_page(pcmd_page)
>>>
>>> 	...
>>> 	lock_page(pcmd_page);
>>> 	// check for truncation since sgx_encl_get_backing()
>>> 	if (pcmd_page->mapping != shmem)
>>> 		goto retry;
>>> 	 // double check this is OK under lock_page():
>>> 	set_page_dirty(pcmd_page);
>>> 	__sgx_encl_ewb();
>>> 	unlock_page(pcmd_page);
>>>
>>> That's basically what filesystems do.  Get the page from the page cache,
>>> lock it, then make sure it is consistent.  If not, retry.
>>>
>>> On the "free" / read in (ELDU) side:
>>>
>>> 	// get pcmd_page ref
>>> 	lock_page(pcmd_page);
>>> 	// truncation is not a concern because that's only done
>>> 	// on the read-in side, here, where we hold encl->lock
>>>
>>> 	memset();
>>> 	if (!memchr_inv())
>>> 		// clear the way for DONTNEED:
>>> 		ClearPageDirty(pcmd_page);
>>> 	unlock_page(pcmd_page);
>>> 	// drop pcmd_page ref
>>> 	...
>>> 	POSIX_FADV_DONTNEED
>>>
>>> There's one downside to this: it's _possible_ that an transient
>>> get_page() would block POSIX_FADV_DONTNEED.  Then the zeroed page would
>>> stick around forever, or at least until the next ELDU operation did
>>> another memchr_inv().
>>>
>>> I went looking around for some of those and could not find any that I
>>> *know* apply to shmem.
>>>
>>> This doesn't feel like a great solution; it's more complicated than I
>>> would like.  Any other ideas?
>>
>> If we could do both truncation and swapping in one side, i.e. in ksgxd,
>> that would simplify this process a lot. Then the whole synchronization
>> problem would not exist.
>>
>> E.g. perhaps #PF handler could just zero PCMD and collect zeroed pages
>> indices to a list and ksgxd would truncate them.
> 
> I.e. instead of immediate response, go for lazy response that is taken 
> care by ksgxd.

Could you please elaborate how you envision this solution? From what I understand
there would be a per-enclave list that contains information about empty
PCMD pages intended to be truncated. The page fault handler adds pages
to this list and the reclaimer needs to remove pages from this list when
it writes to those pages and then do the actual truncation - but it is not
clear how the reclaimer will know when it can safely remove a page from the
list since it obtains PCMD page references in batches.

Did you get a chance to consider the fix proposed in
https://lore.kernel.org/linux-sgx/d4b52482-2dd0-d5f1-bda9-e1d97883298d@intel.com/ 

I understand that the email thread may have become hard to follow and I plan
to submit a new series today.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-05-07 17:25           ` Jarkko Sakkinen
@ 2022-05-09 17:17             ` Reinette Chatre
  2022-05-10  0:36               ` Kai Huang
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-05-09 17:17 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: Dave Hansen, dave.hansen, linux-sgx, haitao.huang

Hi Jarkko,

On 5/7/2022 10:25 AM, Jarkko Sakkinen wrote:
> On Thu, Apr 28, 2022 at 04:49:00PM -0700, Reinette Chatre wrote:
>>> I also looked a little deeper at this transient failure problem.  The
>>> ELDU documentation also mentions a possible error code of:
>>>
>>> 	SGX_EPC_PAGE_CONFLICT
>>>
>>> It *looks* like there can be conflicts on the SECS page as well as the
>>> EPC page being explicitly accessed.  Is that a possible problem here?
>>
>> I went down this path myself. SGX_EPC_PAGE_CONFLICT is an error code
>> supported by newer ELDUC - the ELDU used in current code would indeed
>> #GP in this case. The SDM text describing ELDUC as "This leaf function
>> behaves like ELDU but with improved conflict handling for oversubscription"
>> really does seem relevant to the test that triggers this issue.
>>
>> I stopped pursuing this because from what I understand if
>> SGX_EPC_PAGE_CONFLICT is encountered with commit 08999b2489b4 ("x86/sgx:
>> Free backing memory after faulting the enclave page") then it should
>> also be encountered without it. The issue is not present with
>> 08999b2489b4 ("x86/sgx: Free backing memory after faulting the
>> enclave page") removed. I am thus currently investigating based on
>> the assumption that the #GP is encountered because of MAC
>> verification problem. I may be wrong here also and need more information
>> since the SDM documents two seemingly related errors:
>> #GP(0) -> If the instruction fails to verify MAC.
>> SGX_MAC_COMPARE_FAIL -> If the MAC check fails.
> 
> This part puzzles me in the pseudo-code.
> 
> The version is read first:
> 
> TMP_VER := DS:RDX[63:0]; 
> 
> Then there's MAC calculation, comparison,  and finally this check:
> 
> (* Check version before committing *)
> IF (DS:RDX ≠ 0) 
>         THEN #GP(0); 
> ELSE
>         DS:RDX := TMP_VER;
> FI; 
> 
> For me it is a mystery what does zero the slot and in what condition
> it would be non-zero. Perhaps the #GP refers anyway to this check?

RDX contains the VA slot information and that appears to be correct
in these scenarios. The issue is the PCMD data pointed to by the
PAGEINFO.PCMD (link to PAGEINFO found in RBX) is all zeroes.

There are two scenarios under which the PCMD data could be zeroes. They
are documented in:
https://lore.kernel.org/linux-sgx/8157fa40-8d02-8819-e1b6-fd2d8863fb56@intel.com/
https://lore.kernel.org/linux-sgx/da387afc-e666-45d0-1e99-066d8c4aab03@intel.com/

I understand that context may be lost by pointing you to various emails
in this thread - I'll wrap up all learnings when I submit the new version
of this series today.


Reinette 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-05-09 17:17             ` Reinette Chatre
@ 2022-05-10  0:36               ` Kai Huang
  2022-05-11 10:26                 ` Jarkko Sakkinen
  0 siblings, 1 reply; 50+ messages in thread
From: Kai Huang @ 2022-05-10  0:36 UTC (permalink / raw)
  To: Reinette Chatre, Jarkko Sakkinen
  Cc: Dave Hansen, dave.hansen, linux-sgx, haitao.huang

On Mon, 2022-05-09 at 10:17 -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 5/7/2022 10:25 AM, Jarkko Sakkinen wrote:
> > On Thu, Apr 28, 2022 at 04:49:00PM -0700, Reinette Chatre wrote:
> > > > I also looked a little deeper at this transient failure problem.  The
> > > > ELDU documentation also mentions a possible error code of:
> > > > 
> > > > 	SGX_EPC_PAGE_CONFLICT
> > > > 
> > > > It *looks* like there can be conflicts on the SECS page as well as the
> > > > EPC page being explicitly accessed.  Is that a possible problem here?
> > > 
> > > I went down this path myself. SGX_EPC_PAGE_CONFLICT is an error code
> > > supported by newer ELDUC - the ELDU used in current code would indeed
> > > #GP in this case. The SDM text describing ELDUC as "This leaf function
> > > behaves like ELDU but with improved conflict handling for oversubscription"
> > > really does seem relevant to the test that triggers this issue.
> > > 
> > > I stopped pursuing this because from what I understand if
> > > SGX_EPC_PAGE_CONFLICT is encountered with commit 08999b2489b4 ("x86/sgx:
> > > Free backing memory after faulting the enclave page") then it should
> > > also be encountered without it. The issue is not present with
> > > 08999b2489b4 ("x86/sgx: Free backing memory after faulting the
> > > enclave page") removed. I am thus currently investigating based on
> > > the assumption that the #GP is encountered because of MAC
> > > verification problem. I may be wrong here also and need more information
> > > since the SDM documents two seemingly related errors:
> > > #GP(0) -> If the instruction fails to verify MAC.
> > > SGX_MAC_COMPARE_FAIL -> If the MAC check fails.
> > 
> > This part puzzles me in the pseudo-code.
> > 
> > The version is read first:
> > 
> > TMP_VER := DS:RDX[63:0]; 
> > 
> > Then there's MAC calculation, comparison,  and finally this check:
> > 
> > (* Check version before committing *)
> > IF (DS:RDX ≠ 0) 
> >         THEN #GP(0); 
> > ELSE
> >         DS:RDX := TMP_VER;
> > FI; 
> > 
> > For me it is a mystery what does zero the slot and in what condition
> > it would be non-zero. Perhaps the #GP refers anyway to this check?
> 
> RDX contains the VA slot information and that appears to be correct
> in these scenarios. The issue is the PCMD data pointed to by the
> PAGEINFO.PCMD (link to PAGEINFO found in RBX) is all zeroes.
> 
> There are two scenarios under which the PCMD data could be zeroes. They
> are documented in:
> https://lore.kernel.org/linux-sgx/8157fa40-8d02-8819-e1b6-fd2d8863fb56@intel.com/
> https://lore.kernel.org/linux-sgx/da387afc-e666-45d0-1e99-066d8c4aab03@intel.com/
> 
> I understand that context may be lost by pointing you to various emails
> in this thread - I'll wrap up all learnings when I submit the new version
> of this series today.
> 

Hi Reinette,

Regardless the root cause of this problem, I agree with Jarkko above pseudo-code
in the spec is quite confusing.  I can try to get it clarified from Intel
internally if you want.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-09 17:09                   ` Reinette Chatre
@ 2022-05-10 22:28                     ` Jarkko Sakkinen
  2022-05-11 17:23                       ` Reinette Chatre
  0 siblings, 1 reply; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-10 22:28 UTC (permalink / raw)
  To: Reinette Chatre; +Cc: Dave Hansen, dave.hansen, linux-sgx, haitao.huang

On Mon, May 09, 2022 at 10:09:24AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 5/7/2022 10:48 AM, Jarkko Sakkinen wrote:
> > On Sat, May 07, 2022 at 08:46:50PM +0300, Jarkko Sakkinen wrote:
> >> On Mon, May 02, 2022 at 02:33:06PM -0700, Dave Hansen wrote:
> >>> On 5/2/22 10:11, Reinette Chatre wrote:
> >>>> My goal was to prevent the page fault handler from doing a "is the PCMD
> >>>> page empty" if the reclaimer has a reference to it. Even if the PCMD page
> >>>> is empty when the page fault handler checks it the page is expected to
> >>>> get data right when reclaimer can get the mutex back since the reclaimer
> >>>> already has a reference to the page.
> >>>
> >>> I think shmem_truncate_range() might be the wrong operation.  It
> >>> destroys data and, in the end, we don't want to destroy data.
> >>> Filesystems and the page cache already have nice ways to keep from
> >>> destroying data, we just need to use them.
> >>>
> >>> First, I think set_page_dirty(pcmd_page) before the EWB is a good start.
> >>>  That's what filesystems do before important data that needs to be saved
> >>> goes into pages.
> >>>
> >>> Second, I think we need behavior like POSIX_FADV_DONTNEED, not
> >>> FALLOC_FL_PUNCH_HOLE.  The DONTNEED operations end up in
> >>> mapping_evict_folio(), which has both page refcount *and* dirty page
> >>> checks.  That means that either elevating a refcount or set_page_dirty()
> >>> will thwart DONTNEED-like behavior.
> >>>
> >>> There are two basic things we need to do:
> >>>
> >>> 1. Prevent page from being truncated around EWB time
> >>> 2. Prevent unreferenced page with all zeros from staying in shmem
> >>>    forever or taking up swap space
> >>>
> >>> On the EWB (write to PCMD side) I think something like this works:
> >>>
> >>> 	sgx_encl_get_backing()
> >>> 		get_page(pcmd_page)
> >>>
> >>> 	...
> >>> 	lock_page(pcmd_page);
> >>> 	// check for truncation since sgx_encl_get_backing()
> >>> 	if (pcmd_page->mapping != shmem)
> >>> 		goto retry;
> >>> 	 // double check this is OK under lock_page():
> >>> 	set_page_dirty(pcmd_page);
> >>> 	__sgx_encl_ewb();
> >>> 	unlock_page(pcmd_page);
> >>>
> >>> That's basically what filesystems do.  Get the page from the page cache,
> >>> lock it, then make sure it is consistent.  If not, retry.
> >>>
> >>> On the "free" / read in (ELDU) side:
> >>>
> >>> 	// get pcmd_page ref
> >>> 	lock_page(pcmd_page);
> >>> 	// truncation is not a concern because that's only done
> >>> 	// on the read-in side, here, where we hold encl->lock
> >>>
> >>> 	memset();
> >>> 	if (!memchr_inv())
> >>> 		// clear the way for DONTNEED:
> >>> 		ClearPageDirty(pcmd_page);
> >>> 	unlock_page(pcmd_page);
> >>> 	// drop pcmd_page ref
> >>> 	...
> >>> 	POSIX_FADV_DONTNEED
> >>>
> >>> There's one downside to this: it's _possible_ that an transient
> >>> get_page() would block POSIX_FADV_DONTNEED.  Then the zeroed page would
> >>> stick around forever, or at least until the next ELDU operation did
> >>> another memchr_inv().
> >>>
> >>> I went looking around for some of those and could not find any that I
> >>> *know* apply to shmem.
> >>>
> >>> This doesn't feel like a great solution; it's more complicated than I
> >>> would like.  Any other ideas?
> >>
> >> If we could do both truncation and swapping in one side, i.e. in ksgxd,
> >> that would simplify this process a lot. Then the whole synchronization
> >> problem would not exist.
> >>
> >> E.g. perhaps #PF handler could just zero PCMD and collect zeroed pages
> >> indices to a list and ksgxd would truncate them.
> > 
> > I.e. instead of immediate response, go for lazy response that is taken 
> > care by ksgxd.
> 
> Could you please elaborate how you envision this solution? From what I understand
> there would be a per-enclave list that contains information about empty
> PCMD pages intended to be truncated. The page fault handler adds pages
> to this list and the reclaimer needs to remove pages from this list when
> it writes to those pages and then do the actual truncation - but it is not
> clear how the reclaimer will know when it can safely remove a page from the
> list since it obtains PCMD page references in batches.
> 
> Did you get a chance to consider the fix proposed in
> https://lore.kernel.org/linux-sgx/d4b52482-2dd0-d5f1-bda9-e1d97883298d@intel.com/ 
> 
> I understand that the email thread may have become hard to follow and I plan
> to submit a new series today.

Let's just say that I came a bit late to the series, and should have read
the whole thread before responding to anything. As long as enclave lock is
kept on both sides things should be fine.

I think for bugs like these it would make sense to put them out early as
possible, e.g. I would be fine getting them from kernel bugzilla. Now there
there was two week latency on finding the issue, and making it public.
Unless there is something confidential, it would be best to get early
alert. I'm always ready to change my priorities to help to fix such issues.

BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-05-10  0:36               ` Kai Huang
@ 2022-05-11 10:26                 ` Jarkko Sakkinen
  2022-05-11 18:29                   ` Haitao Huang
  0 siblings, 1 reply; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-11 10:26 UTC (permalink / raw)
  To: Kai Huang
  Cc: Reinette Chatre, Dave Hansen, dave.hansen, linux-sgx, haitao.huang

On Tue, May 10, 2022 at 12:36:19PM +1200, Kai Huang wrote:
> On Mon, 2022-05-09 at 10:17 -0700, Reinette Chatre wrote:
> > Hi Jarkko,
> > 
> > On 5/7/2022 10:25 AM, Jarkko Sakkinen wrote:
> > > On Thu, Apr 28, 2022 at 04:49:00PM -0700, Reinette Chatre wrote:
> > > > > I also looked a little deeper at this transient failure problem.  The
> > > > > ELDU documentation also mentions a possible error code of:
> > > > > 
> > > > > 	SGX_EPC_PAGE_CONFLICT
> > > > > 
> > > > > It *looks* like there can be conflicts on the SECS page as well as the
> > > > > EPC page being explicitly accessed.  Is that a possible problem here?
> > > > 
> > > > I went down this path myself. SGX_EPC_PAGE_CONFLICT is an error code
> > > > supported by newer ELDUC - the ELDU used in current code would indeed
> > > > #GP in this case. The SDM text describing ELDUC as "This leaf function
> > > > behaves like ELDU but with improved conflict handling for oversubscription"
> > > > really does seem relevant to the test that triggers this issue.
> > > > 
> > > > I stopped pursuing this because from what I understand if
> > > > SGX_EPC_PAGE_CONFLICT is encountered with commit 08999b2489b4 ("x86/sgx:
> > > > Free backing memory after faulting the enclave page") then it should
> > > > also be encountered without it. The issue is not present with
> > > > 08999b2489b4 ("x86/sgx: Free backing memory after faulting the
> > > > enclave page") removed. I am thus currently investigating based on
> > > > the assumption that the #GP is encountered because of MAC
> > > > verification problem. I may be wrong here also and need more information
> > > > since the SDM documents two seemingly related errors:
> > > > #GP(0) -> If the instruction fails to verify MAC.
> > > > SGX_MAC_COMPARE_FAIL -> If the MAC check fails.
> > > 
> > > This part puzzles me in the pseudo-code.
> > > 
> > > The version is read first:
> > > 
> > > TMP_VER := DS:RDX[63:0]; 
> > > 
> > > Then there's MAC calculation, comparison,  and finally this check:
> > > 
> > > (* Check version before committing *)
> > > IF (DS:RDX ≠ 0) 
> > >         THEN #GP(0); 
> > > ELSE
> > >         DS:RDX := TMP_VER;
> > > FI; 
> > > 
> > > For me it is a mystery what does zero the slot and in what condition
> > > it would be non-zero. Perhaps the #GP refers anyway to this check?
> > 
> > RDX contains the VA slot information and that appears to be correct
> > in these scenarios. The issue is the PCMD data pointed to by the
> > PAGEINFO.PCMD (link to PAGEINFO found in RBX) is all zeroes.
> > 
> > There are two scenarios under which the PCMD data could be zeroes. They
> > are documented in:
> > https://lore.kernel.org/linux-sgx/8157fa40-8d02-8819-e1b6-fd2d8863fb56@intel.com/
> > https://lore.kernel.org/linux-sgx/da387afc-e666-45d0-1e99-066d8c4aab03@intel.com/
> > 
> > I understand that context may be lost by pointing you to various emails
> > in this thread - I'll wrap up all learnings when I submit the new version
> > of this series today.
> > 
> 
> Hi Reinette,
> 
> Regardless the root cause of this problem, I agree with Jarkko above pseudo-code
> in the spec is quite confusing.  I can try to get it clarified from Intel
> internally if you want.
 
It is :-) Yeah, it would be great if it could be made a bit more punctual!

BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-10 22:28                     ` Jarkko Sakkinen
@ 2022-05-11 17:23                       ` Reinette Chatre
  2022-05-12 14:10                         ` Jarkko Sakkinen
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-05-11 17:23 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: Dave Hansen, dave.hansen, linux-sgx, haitao.huang

Hi Jarkko,

On 5/10/2022 3:28 PM, Jarkko Sakkinen wrote:
> Let's just say that I came a bit late to the series, and should have read
> the whole thread before responding to anything. As long as enclave lock is
> kept on both sides things should be fine.

For the most part, yes. The remaining scenario is the case when the reclaimer
releases the enclave mutex while keeping a reference to the backing store pages.
By releasing the enclave mutex there is opportunity for page fault
handler to run and also operate on the backing store. Both the reclaimer
(after patch 3/4 in this series) and page fault handler operate on
the backing store with enclave mutex held but if that is done without
taking backing store references into account data could be lost. This is
addressed in the following series with:
https://lore.kernel.org/linux-sgx/d0ace4a1770ab8f4196bfeae06d0970ddb14ef01.1652131695.git.reinette.chatre@intel.com/


> 
> I think for bugs like these it would make sense to put them out early as
> possible, e.g. I would be fine getting them from kernel bugzilla. Now there
> there was two week latency on finding the issue, and making it public.
> Unless there is something confidential, it would be best to get early
> alert. I'm always ready to change my priorities to help to fix such issues.

I am sorry about this. The reason I first struggled with this by myself was
because it was made out to be an SGX2 issue. This was made worse when I was not
able to create an SGX1 test case that can trigger the issue. I thus lacked
evidence that it is an upstream issue and it took me a while to debug and
understand the issue to gain confidence that it is indeed an upstream issue.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-05-11 10:26                 ` Jarkko Sakkinen
@ 2022-05-11 18:29                   ` Haitao Huang
  2022-05-11 22:00                     ` Kai Huang
  2022-05-12 21:14                     ` Jarkko Sakkinen
  0 siblings, 2 replies; 50+ messages in thread
From: Haitao Huang @ 2022-05-11 18:29 UTC (permalink / raw)
  To: Kai Huang, Jarkko Sakkinen
  Cc: Reinette Chatre, Dave Hansen, dave.hansen, linux-sgx, haitao.huang

Hi Jarkko

On Wed, 11 May 2022 05:26:15 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Tue, May 10, 2022 at 12:36:19PM +1200, Kai Huang wrote:
>> On Mon, 2022-05-09 at 10:17 -0700, Reinette Chatre wrote:
>> > Hi Jarkko,
>> >
>> > On 5/7/2022 10:25 AM, Jarkko Sakkinen wrote:
>> > > On Thu, Apr 28, 2022 at 04:49:00PM -0700, Reinette Chatre wrote:
>> > > > > I also looked a little deeper at this transient failure  
>> problem.  The
>> > > > > ELDU documentation also mentions a possible error code of:
>> > > > >
>> > > > > 	SGX_EPC_PAGE_CONFLICT
>> > > > >
>> > > > > It *looks* like there can be conflicts on the SECS page as well  
>> as the
>> > > > > EPC page being explicitly accessed.  Is that a possible problem  
>> here?
>> > > >
>> > > > I went down this path myself. SGX_EPC_PAGE_CONFLICT is an error  
>> code
>> > > > supported by newer ELDUC - the ELDU used in current code would  
>> indeed
>> > > > #GP in this case. The SDM text describing ELDUC as "This leaf  
>> function
>> > > > behaves like ELDU but with improved conflict handling for  
>> oversubscription"
>> > > > really does seem relevant to the test that triggers this issue.
>> > > >
>> > > > I stopped pursuing this because from what I understand if
>> > > > SGX_EPC_PAGE_CONFLICT is encountered with commit 08999b2489b4  
>> ("x86/sgx:
>> > > > Free backing memory after faulting the enclave page") then it  
>> should
>> > > > also be encountered without it. The issue is not present with
>> > > > 08999b2489b4 ("x86/sgx: Free backing memory after faulting the
>> > > > enclave page") removed. I am thus currently investigating based on
>> > > > the assumption that the #GP is encountered because of MAC
>> > > > verification problem. I may be wrong here also and need more  
>> information
>> > > > since the SDM documents two seemingly related errors:
>> > > > #GP(0) -> If the instruction fails to verify MAC.
>> > > > SGX_MAC_COMPARE_FAIL -> If the MAC check fails.
>> > >
>> > > This part puzzles me in the pseudo-code.
>> > >
>> > > The version is read first:
>> > >
>> > > TMP_VER := DS:RDX[63:0];
>> > >
>> > > Then there's MAC calculation, comparison,  and finally this check:
>> > >
>> > > (* Check version before committing *)
>> > > IF (DS:RDX ≠ 0)
>> > >         THEN #GP(0);
>> > > ELSE
>> > >         DS:RDX := TMP_VER;
>> > > FI;
>> > >
>> > > For me it is a mystery what does zero the slot and in what condition
>> > > it would be non-zero. Perhaps the #GP refers anyway to this check?
>> >


We discussed this internally, and agree this part of pseudo code needs be  
corrected/clarified.

Here is what we think was going on when ELDU invoked with PCMD of all  
zeros: ELDU would check if the PCMD.SECINFO.FLAGS.PT is 0 which indicates  
that the page being loaded is a PT_SECS, and the PAGEINFO.SECS is not  
zero, then the instruction will #GP(0).  Thus, ELDU is behaving correctly  
– it is an omission in the SDM pseudocode.

The version checking code above also need be clarified because the VA slot  
would be cleared at this point and TMP_VER should be zero.

We will follow up with the architect once he is back from sabbatical and  
make needed adjustment for SDM.

Thanks
Haitao

>> > RDX contains the VA slot information and that appears to be correct
>> > in these scenarios. The issue is the PCMD data pointed to by the
>> > PAGEINFO.PCMD (link to PAGEINFO found in RBX) is all zeroes.
>> >
>> > There are two scenarios under which the PCMD data could be zeroes.  
>> They
>> > are documented in:
>> >  
>> https://lore.kernel.org/linux-sgx/8157fa40-8d02-8819-e1b6-fd2d8863fb56@intel.com/
>> >  
>> https://lore.kernel.org/linux-sgx/da387afc-e666-45d0-1e99-066d8c4aab03@intel.com/
>> >
>> > I understand that context may be lost by pointing you to various  
>> emails
>> > in this thread - I'll wrap up all learnings when I submit the new  
>> version
>> > of this series today.
>> >
>>
>> Hi Reinette,
>>
>> Regardless the root cause of this problem, I agree with Jarkko above  
>> pseudo-code
>> in the spec is quite confusing.  I can try to get it clarified from  
>> Intel
>> internally if you want.
> It is :-) Yeah, it would be great if it could be made a bit more  
> punctual!
>
> BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-05-11 18:29                   ` Haitao Huang
@ 2022-05-11 22:00                     ` Kai Huang
  2022-05-12 21:14                     ` Jarkko Sakkinen
  1 sibling, 0 replies; 50+ messages in thread
From: Kai Huang @ 2022-05-11 22:00 UTC (permalink / raw)
  To: Haitao Huang, Jarkko Sakkinen
  Cc: Reinette Chatre, Dave Hansen, dave.hansen, linux-sgx, haitao.huang


> > > > > 
> > > > > This part puzzles me in the pseudo-code.
> > > > > 
> > > > > The version is read first:
> > > > > 
> > > > > TMP_VER := DS:RDX[63:0];
> > > > > 
> > > > > Then there's MAC calculation, comparison,  and finally this check:
> > > > > 
> > > > > (* Check version before committing *)
> > > > > IF (DS:RDX ≠ 0)
> > > > >         THEN #GP(0);
> > > > > ELSE
> > > > >         DS:RDX := TMP_VER;
> > > > > FI;
> > > > > 
> > > > > For me it is a mystery what does zero the slot and in what condition
> > > > > it would be non-zero. Perhaps the #GP refers anyway to this check?
> > > > 
> 
> 
> We discussed this internally, and agree this part of pseudo code needs be  
> corrected/clarified.
> 
> Here is what we think was going on when ELDU invoked with PCMD of all  
> zeros: ELDU would check if the PCMD.SECINFO.FLAGS.PT is 0 which indicates  
> that the page being loaded is a PT_SECS, and the PAGEINFO.SECS is not  
> zero, then the instruction will #GP(0).  Thus, ELDU is behaving correctly  
> – it is an omission in the SDM pseudocode.
> 
> The version checking code above also need be clarified because the VA slot  
> would be cleared at this point and TMP_VER should be zero.

"VA slot would be cleared at this point" isn't accurate.  The VA slot itself is
still occupied at this point.  The original TMP_VER before the decryption is the
VA slot value stored by EWB, and after decryption it becomes 0, if the
decryption is correct.  The correct pseudo-code should be:

	IF (TMP_VER != 0)
		THEN #GP(0);
	ELSE
		DS:RDX	= TMP_VER;
	FI;

The check of TMP_VER against 0 is just an additional safe guard to make sure
decryption didn't fail.


-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 0/4] SGX shmem backing store issue
  2022-05-11 17:23                       ` Reinette Chatre
@ 2022-05-12 14:10                         ` Jarkko Sakkinen
  0 siblings, 0 replies; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-12 14:10 UTC (permalink / raw)
  To: Reinette Chatre; +Cc: Dave Hansen, dave.hansen, linux-sgx, haitao.huang

On Wed, May 11, 2022 at 10:23:29AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 5/10/2022 3:28 PM, Jarkko Sakkinen wrote:
> > Let's just say that I came a bit late to the series, and should have read
> > the whole thread before responding to anything. As long as enclave lock is
> > kept on both sides things should be fine.
> 
> For the most part, yes. The remaining scenario is the case when the reclaimer
> releases the enclave mutex while keeping a reference to the backing store pages.
> By releasing the enclave mutex there is opportunity for page fault
> handler to run and also operate on the backing store. Both the reclaimer
> (after patch 3/4 in this series) and page fault handler operate on
> the backing store with enclave mutex held but if that is done without
> taking backing store references into account data could be lost. This is
> addressed in the following series with:
> https://lore.kernel.org/linux-sgx/d0ace4a1770ab8f4196bfeae06d0970ddb14ef01.1652131695.git.reinette.chatre@intel.com/

Right, I get it now. Great job figuring all this out!

> > 
> > I think for bugs like these it would make sense to put them out early as
> > possible, e.g. I would be fine getting them from kernel bugzilla. Now there
> > there was two week latency on finding the issue, and making it public.
> > Unless there is something confidential, it would be best to get early
> > alert. I'm always ready to change my priorities to help to fix such issues.
> 
> I am sorry about this. The reason I first struggled with this by myself was
> because it was made out to be an SGX2 issue. This was made worse when I was not
> able to create an SGX1 test case that can trigger the issue. I thus lacked
> evidence that it is an upstream issue and it took me a while to debug and
> understand the issue to gain confidence that it is indeed an upstream issue.

No worries, just making the crash log available early on gives a chance
to investigate it :-) We can then figure out whether it is SGX2 issue or
not, or any other circumstance. There should be a low-barrier to do this.

E.g. I happened to get sick when the series was posted, and therefore
lagged on investigation. Giving the maximum amount of window here is super
important.

> Reinette

BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure
  2022-05-11 18:29                   ` Haitao Huang
  2022-05-11 22:00                     ` Kai Huang
@ 2022-05-12 21:14                     ` Jarkko Sakkinen
  1 sibling, 0 replies; 50+ messages in thread
From: Jarkko Sakkinen @ 2022-05-12 21:14 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Kai Huang, Reinette Chatre, Dave Hansen, dave.hansen, linux-sgx,
	haitao.huang

On Wed, May 11, 2022 at 01:29:59PM -0500, Haitao Huang wrote:
> Hi Jarkko
> 
> On Wed, 11 May 2022 05:26:15 -0500, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > On Tue, May 10, 2022 at 12:36:19PM +1200, Kai Huang wrote:
> > > On Mon, 2022-05-09 at 10:17 -0700, Reinette Chatre wrote:
> > > > Hi Jarkko,
> > > >
> > > > On 5/7/2022 10:25 AM, Jarkko Sakkinen wrote:
> > > > > On Thu, Apr 28, 2022 at 04:49:00PM -0700, Reinette Chatre wrote:
> > > > > > > I also looked a little deeper at this transient failure
> > > problem.  The
> > > > > > > ELDU documentation also mentions a possible error code of:
> > > > > > >
> > > > > > > 	SGX_EPC_PAGE_CONFLICT
> > > > > > >
> > > > > > > It *looks* like there can be conflicts on the SECS page as
> > > well as the
> > > > > > > EPC page being explicitly accessed.  Is that a possible
> > > problem here?
> > > > > >
> > > > > > I went down this path myself. SGX_EPC_PAGE_CONFLICT is an
> > > error code
> > > > > > supported by newer ELDUC - the ELDU used in current code would
> > > indeed
> > > > > > #GP in this case. The SDM text describing ELDUC as "This leaf
> > > function
> > > > > > behaves like ELDU but with improved conflict handling for
> > > oversubscription"
> > > > > > really does seem relevant to the test that triggers this issue.
> > > > > >
> > > > > > I stopped pursuing this because from what I understand if
> > > > > > SGX_EPC_PAGE_CONFLICT is encountered with commit 08999b2489b4
> > > ("x86/sgx:
> > > > > > Free backing memory after faulting the enclave page") then it
> > > should
> > > > > > also be encountered without it. The issue is not present with
> > > > > > 08999b2489b4 ("x86/sgx: Free backing memory after faulting the
> > > > > > enclave page") removed. I am thus currently investigating based on
> > > > > > the assumption that the #GP is encountered because of MAC
> > > > > > verification problem. I may be wrong here also and need more
> > > information
> > > > > > since the SDM documents two seemingly related errors:
> > > > > > #GP(0) -> If the instruction fails to verify MAC.
> > > > > > SGX_MAC_COMPARE_FAIL -> If the MAC check fails.
> > > > >
> > > > > This part puzzles me in the pseudo-code.
> > > > >
> > > > > The version is read first:
> > > > >
> > > > > TMP_VER := DS:RDX[63:0];
> > > > >
> > > > > Then there's MAC calculation, comparison,  and finally this check:
> > > > >
> > > > > (* Check version before committing *)
> > > > > IF (DS:RDX ≠ 0)
> > > > >         THEN #GP(0);
> > > > > ELSE
> > > > >         DS:RDX := TMP_VER;
> > > > > FI;
> > > > >
> > > > > For me it is a mystery what does zero the slot and in what condition
> > > > > it would be non-zero. Perhaps the #GP refers anyway to this check?
> > > >
> 
> 
> We discussed this internally, and agree this part of pseudo code needs be
> corrected/clarified.
> 
> Here is what we think was going on when ELDU invoked with PCMD of all zeros:
> ELDU would check if the PCMD.SECINFO.FLAGS.PT is 0 which indicates that the
> page being loaded is a PT_SECS, and the PAGEINFO.SECS is not zero, then the
> instruction will #GP(0).  Thus, ELDU is behaving correctly – it is an
> omission in the SDM pseudocode.
> 
> The version checking code above also need be clarified because the VA slot
> would be cleared at this point and TMP_VER should be zero.
> 
> We will follow up with the architect once he is back from sabbatical and
> make needed adjustment for SDM.

OK, great to hear, thank you!

BR, Jarkko

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2022-05-12 21:12 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-28 20:11 [RFC PATCH 0/4] SGX shmem backing store issue Reinette Chatre
2022-04-28 20:11 ` [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure Reinette Chatre
2022-04-28 21:30   ` Dave Hansen
2022-04-28 22:20     ` Reinette Chatre
2022-04-28 22:53       ` Dave Hansen
2022-04-28 23:49         ` Reinette Chatre
2022-05-03  2:01           ` Kai Huang
2022-05-07 17:25           ` Jarkko Sakkinen
2022-05-09 17:17             ` Reinette Chatre
2022-05-10  0:36               ` Kai Huang
2022-05-11 10:26                 ` Jarkko Sakkinen
2022-05-11 18:29                   ` Haitao Huang
2022-05-11 22:00                     ` Kai Huang
2022-05-12 21:14                     ` Jarkko Sakkinen
2022-05-06 22:09     ` Jarkko Sakkinen
2022-04-28 20:11 ` [RFC PATCH 2/4] x86/sgx: Set dirty bit after modifying page contents Reinette Chatre
2022-04-28 21:40   ` Dave Hansen
2022-04-28 22:41     ` Reinette Chatre
2022-05-06 22:27   ` Jarkko Sakkinen
2022-05-06 22:40     ` Reinette Chatre
2022-05-07 18:01       ` Jarkko Sakkinen
2022-04-28 20:11 ` [RFC PATCH 3/4] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
2022-04-28 21:58   ` Dave Hansen
2022-04-28 22:44     ` Reinette Chatre
2022-05-06 22:43   ` Jarkko Sakkinen
2022-04-28 20:11 ` [RFC PATCH 4/4] x86/sgx: Do not allocate backing pages when loading from backing store Reinette Chatre
2022-04-28 21:12 ` [RFC PATCH 0/4] SGX shmem backing store issue Dave Hansen
2022-04-29 18:50   ` Reinette Chatre
2022-04-29 19:45     ` Dave Hansen
2022-04-30  3:22       ` Reinette Chatre
2022-04-30 15:52         ` Reinette Chatre
2022-05-02 14:36         ` Dave Hansen
2022-05-02 17:11           ` Reinette Chatre
2022-05-02 21:33             ` Dave Hansen
2022-05-04 22:13               ` Reinette Chatre
2022-05-04 22:58                 ` Dave Hansen
2022-05-04 23:36                   ` Reinette Chatre
2022-05-04 23:50                     ` Dave Hansen
2022-05-05  0:08                       ` Reinette Chatre
2022-05-04 23:05                 ` Dave Hansen
2022-05-07 17:46               ` Jarkko Sakkinen
2022-05-07 17:48                 ` Jarkko Sakkinen
2022-05-09 17:09                   ` Reinette Chatre
2022-05-10 22:28                     ` Jarkko Sakkinen
2022-05-11 17:23                       ` Reinette Chatre
2022-05-12 14:10                         ` Jarkko Sakkinen
2022-04-28 21:29 ` Dave Hansen
2022-04-28 22:20   ` Reinette Chatre
2022-05-04  6:40 ` Jarkko Sakkinen
2022-05-05  6:09 ` Jarkko Sakkinen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.