Re: [RFC PATCH 0/4] SGX shmem backing store issue

From: Reinette Chatre <reinette.chatre@intel.com>
To: Dave Hansen <dave.hansen@intel.com>,
	<dave.hansen@linux.intel.com>, <jarkko@kernel.org>,
	<linux-sgx@vger.kernel.org>
Cc: <haitao.huang@intel.com>
Subject: Re: [RFC PATCH 0/4] SGX shmem backing store issue
Date: Fri, 29 Apr 2022 20:22:15 -0700	[thread overview]
Message-ID: <feedd67f-4b50-aac1-16de-a3ffce9208c6@intel.com> (raw)
In-Reply-To: <faaaf70d-22b7-eaa5-4623-bbdf1012827a@intel.com>

Hi Dave,

On 4/29/2022 12:45 PM, Dave Hansen wrote:
> On 4/29/22 11:50, Reinette Chatre wrote:
>> I am not familiar with this area - is the PFN expected to be consistent? Do
>> you perhaps have an idea why the PFN of the PCMD page may have changed? This
>> is running with this series applied so the ELDU flow would do a lookup of the
>> PCMD page and not attempt to allocate a new one.
> 
> First of all, cool!  This is really good progress!
> 
> It's possible for the PCMD shmem page to be swapped out and swapped back
> in.  The pfn would be likely to change in that case.  But, that seems
> highly unlikely in that short of a period of time.
> 
> I'd dump out two more things:
> 
> First, dump out page_pcmd_off, just like you're doing with page_index in
> case page_pcmd_off itself is getting botched.  I looked but couldn't
> find anything obvious.
> 
> Second, dump out the pfn in sgx_encl_truncate_backing_page().  It's
> possible that something is getting overly-aggressive and zapping the
> PCMD page too early.  That would be easy to explain with that PCMD
> locking issue you discovered.  But, your patch should have fixed that issue.
> 
> For debugging, could you double-check that the PCMD page *is* empty
> around sgx_encl_truncate_backing_page()?  If there's a race here you can
> also enlarge the race window by adding an msleep() or a spin loop
> somewhere after the memchr_inv().
> 
> You could also hold an extra reference on the PCMD page, maybe something
> like the attached patch.  That will let you inspect the actual page
> after it is *actually* truncated.  There should never be data in the
> page there.


Thank you so much for pointing me in the right direction. With this
information I believe that I found a race condition. I did create a
fix for it and it has been running for more than an hour now without
a WARN _and_ without the -ENOENT error introduced in patch 4/4.

Please find below the description of the race and the fix.

Scenario is that the reclaimer is evicting a range of pages that
are being faulted right away.

Consider three enclave pages (ENCLAVE_A, ENCLAVE_B, and ENCLAVE_C)
sharing a PCMD page (PCMD_ABC). ENCLAVE_A and ENCLAVE_B are in the
enclave memory and ENCLAVE_C is in backing store. PCMD_ABC contains
just one entry, that of ENCLAVE_C.

Scenario proceeds where ENCLAVE_A is being evicted from the enclave
while ENCLAVE_C is faulted in.

Scenario is with this patch series applied.

sgx_reclaim_pages() {

  ...

  /*
   * Reclaim ENCLAVE_A
   */
  mutex_lock(&encl->lock);
  /*
   * Get a reference to ENCLAVE_A's
   * shmem page where enclave page
   * encrypted data will be stored
   * as well as a reference to the
   * enclave page's PCMD data page,
   * PCMD_ABC.
   */
  sgx_encl_alloc_backing(...);
  mutex_unlock(&encl->lock);

                                    /*
                                     * Fault ENCLAVE_C
                                     */

                                    sgx_vma_fault() {

                                      mutex_lock(&encl->lock);
                                      /*
                                       * Get reference to
                                       * ENCLAVE_C's shmem page
                                       * as well as PCMD_ABC.
                                       */
                                      sgx_encl_get_backing(...)
                                     /*
                                      * load page back into
                                      * enclave via ELDU
                                      */
                                     /*
                                      * Release reference to
                                      * ENCLAVE_C' shmem page and
                                      * PCMD_ABC.
                                      */
                                     sgx_encl_put_backing(...);
                                     /*
                                      * PCMD_ABC is found empty so
                                      * it and ENCLAVE_C's shmem page
                                      * are truncated.
                                      */
                                     /* Truncate ENCLAVE_C backing page */
                                     sgx_encl_truncate_backing_page();
                                     /* Truncate PCMD_ABC */
                                     sgx_encl_truncate_backing_page();

                                     mutex_unlock(&encl->lock);

                                     ...
                                     }
  mutex_lock(&encl->lock);
  /*
  * Write encrypted contents of
  * ENCLAVE_A to ENCLAVE_A shmem
  * page and its PCMD data to
  * PCMD_ABC.
  */

  sgx_encl_put_backing(...)

  /*
   * Reference to PCMD_ABC is
   * dropped and it is truncated.
   * ENCLAVE_A's PCMD data is lost.
   */
  mutex_unlock(&encl->lock);
}



What happens next depends on whether it is ENCLAVE_A being faulted
in or ENCLAVE_B (or ENCLAVE_C) being evicted.

If ENCLAVE_A is faulted then the error introduced in patch 4/4 would
be encountered since lookup of PCMD_ABC fails. Before patch 4/4 a new
PCMD page would be allocated and then ENCLS[ELDU] would WARN on the PCMD
data.
If ENCLAVE_B is evicted first then a new PCMD_ABC would be allocated but
later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction would #GP
during its checks of the PCMD value and the WARN would be encountered.

Below is the fix I am currently testing. It would not truncate the memory
if there is a reference to the PCMD page.

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 22ed886dc825..81c7bfc12b9b 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -98,11 +98,10 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	kunmap_atomic(pcmd_page);
 	kunmap_atomic((void *)(unsigned long)pginfo.contents);
 
-	sgx_encl_put_backing(&b, true);
-
+	put_page(b.contents);
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty) {
+	if (pcmd_page_empty && && put_page_testzero(b.pcmd)) {
 		ida_free(&encl->pcmd_in_backing, PFN_DOWN(page_pcmd_off));
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
 	}


I plan on letting the test keep running to make sure it really fixes the
issue. If all goes well patch 4/4 can be dropped and the fix included instead.

Thanks again for your guidance. It was tremendously helpful.

Reinette