All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jarkko Sakkinen <jarkko@kernel.org>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: dave.hansen@linux.intel.com, tglx@linutronix.de, bp@alien8.de,
	luto@kernel.org, mingo@redhat.com, linux-sgx@vger.kernel.org,
	x86@kernel.org, haitao.huang@intel.com, hpa@zytor.com,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH V3 4/5] x86/sgx: Fix race between reclaimer and page fault handler
Date: Fri, 13 May 2022 17:40:14 +0300	[thread overview]
Message-ID: <Yn5tznK08SeWxd4S@iki.fi> (raw)
In-Reply-To: <ed20a5db516aa813873268e125680041ae11dfcf.1652389823.git.reinette.chatre@intel.com>

On Thu, May 12, 2022 at 02:51:00PM -0700, Reinette Chatre wrote:
> Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
> instruction faulting with a #GP.
> 
> The WARN is encountered when the reclaimer evicts a range of
> pages from the enclave when the same pages are faulted back right away.
> 
> Consider two enclave pages (ENCLAVE_A and ENCLAVE_B)
> sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the
> enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains
> just one entry, that of ENCLAVE_B.
> 
> Scenario proceeds where ENCLAVE_A is being evicted from the enclave
> while ENCLAVE_B is faulted in.
> 
> sgx_reclaim_pages() {
> 
>   ...
> 
>   /*
>    * Reclaim ENCLAVE_A
>    */
>   mutex_lock(&encl->lock);
>   /*
>    * Get a reference to ENCLAVE_A's
>    * shmem page where enclave page
>    * encrypted data will be stored
>    * as well as a reference to the
>    * enclave page's PCMD data page,
>    * PCMD_AB.
>    * Release mutex before writing
>    * any data to the shmem pages.
>    */
>   sgx_encl_get_backing(...);
>   encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
>   mutex_unlock(&encl->lock);
> 
>                                     /*
>                                      * Fault ENCLAVE_B
>                                      */
> 
>                                     sgx_vma_fault() {
> 
>                                       mutex_lock(&encl->lock);
>                                       /*
>                                        * Get reference to
>                                        * ENCLAVE_B's shmem page
>                                        * as well as PCMD_AB.
>                                        */
>                                       sgx_encl_get_backing(...)
>                                      /*
>                                       * Load page back into
>                                       * enclave via ELDU.
>                                       */
>                                      /*
>                                       * Release reference to
>                                       * ENCLAVE_B' shmem page and
>                                       * PCMD_AB.
>                                       */
>                                      sgx_encl_put_backing(...);
>                                      /*
>                                       * PCMD_AB is found empty so
>                                       * it and ENCLAVE_B's shmem page
>                                       * are truncated.
>                                       */
>                                      /* Truncate ENCLAVE_B backing page */
>                                      sgx_encl_truncate_backing_page();
>                                      /* Truncate PCMD_AB */
>                                      sgx_encl_truncate_backing_page();
> 
>                                      mutex_unlock(&encl->lock);
> 
>                                      ...
>                                      }
>   mutex_lock(&encl->lock);
>   encl_page->desc &=
>        ~SGX_ENCL_PAGE_BEING_RECLAIMED;
>   /*
>   * Write encrypted contents of
>   * ENCLAVE_A to ENCLAVE_A shmem
>   * page and its PCMD data to
>   * PCMD_AB.
>   */
>   sgx_encl_put_backing(...)
> 
>   /*
>    * Reference to PCMD_AB is
>    * dropped and it is truncated.
>    * ENCLAVE_A's PCMD data is lost.
>    */
>   mutex_unlock(&encl->lock);
> }
> 
> What happens next depends on whether it is ENCLAVE_A being faulted
> in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting
> with a #GP.
> 
> If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called
> a new PCMD page is allocated and providing the empty PCMD data for
> ENCLAVE_A would cause ENCLS[ELDU] to #GP
> 
> If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the
> reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction
> would #GP during its checks of the PCMD value and the WARN would be
> encountered.
> 
> Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time
> it obtains a reference to the backing store pages of an enclave page it
> is in the process of reclaiming, fix the race by only truncating the PCMD
> page after ensuring that no page sharing the PCMD page is in the process
> of being reclaimed.
> 
> Cc: stable@vger.kernel.org
> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
> Reported-by: Haitao Huang <haitao.huang@intel.com>
> Tested-by: Haitao Huang <haitao.huang@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V2:
> - Declare "addr" and "entry" within loop. (Dave)
> - Fix incorrect error return when enclave page not found to belong
>   to enclave. Continue search instead of returning failure. (Dave)
> - Add Haitao's "Tested-by" tag.
> - Rename pcmd_page_in_use() to reclaimer_writing_to_pcmd() to be less
>   generic. (Dave)
> - Improve function comments to be clear about it testing for PCMD
>   page soon becoming non-empty, also add context info to kernel-doc
>   to indicate that enclave mutex must be held. (Dave)
> 
> Changes since RFC v1:
> - New patch.
> 
>  arch/x86/kernel/cpu/sgx/encl.c | 94 +++++++++++++++++++++++++++++++++-
>  1 file changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 5104a428b72c..243f3bd78145 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -12,6 +12,92 @@
>  #include "encls.h"
>  #include "sgx.h"
>  
> +#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd))
> +/*
> + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to
> + * determine the page index associated with the first PCMD entry
> + * within a PCMD page.
> + */
> +#define PCMD_FIRST_MASK GENMASK(4, 0)
> +
> +/**
> + * reclaimer_writing_to_pcmd() - Query if any enclave page associated with
> + *                               a PCMD page is in process of being reclaimed.
> + * @encl:        Enclave to which PCMD page belongs
> + * @start_addr:  Address of enclave page using first entry within the PCMD page
> + *
> + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
> + * stored. The PCMD data of a reclaimed enclave page contains enough
> + * information for the processor to verify the page at the time
> + * it is loaded back into the Enclave Page Cache (EPC).
> + *
> + * The backing storage to which enclave pages are reclaimed is laid out as
> + * follows:
> + * Encrypted enclave pages:SECS page:PCMD pages
> + *
> + * Each PCMD page contains the PCMD metadata of
> + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
> + *
> + * A PCMD page can only be truncated if it is (a) empty, and (b) not in the
> + * process of getting data (and thus soon being non-empty). (b) is tested with
> + * a check if an enclave page sharing the PCMD page is in the process of being
> + * reclaimed.
> + *
> + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
> + * intends to reclaim that enclave page - it means that the PCMD page
> + * associated with that enclave page is about to get some data and thus
> + * even if the PCMD page is empty, it should not be truncated.
> + *
> + * Context: Enclave mutex (&sgx_encl->lock) must be held.
> + * Return: 1 if the reclaimer is about to write to the PCMD page
> + *         0 if the reclaimer has no intention to write to the PCMD page
> + */
> +static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
> +				     unsigned long start_addr)
> +{
> +	int reclaimed = 0;
> +	int i;
> +
> +	/*
> +	 * PCMD_FIRST_MASK is based on number of PCMD entries within
> +	 * PCMD page being 32.
> +	 */
> +	BUILD_BUG_ON(PCMDS_PER_PAGE != 32);
> +
> +	for (i = 0; i < PCMDS_PER_PAGE; i++) {
> +		struct sgx_encl_page *entry;
> +		unsigned long addr;
> +
> +		addr = start_addr + i * PAGE_SIZE;
> +
> +		/*
> +		 * Stop when reaching the SECS page - it does not
> +		 * have a page_array entry and its reclaim is
> +		 * started and completed with enclave mutex held so
> +		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
> +		 * flag.
> +		 */
> +		if (addr == encl->base + encl->size)
> +			break;
> +
> +		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
> +		if (!entry)
> +			continue;
> +
> +		/*
> +		 * VA page slot ID uses same bit as the flag so it is important
> +		 * to ensure that the page is not already in backing store.
> +		 */
> +		if (entry->epc_page &&
> +		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
> +			reclaimed = 1;
> +			break;
> +		}
> +	}
> +
> +	return reclaimed;
> +}
> +
>  /*
>   * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's
>   * follow right after the EPC data in the backing storage. In addition to the
> @@ -47,6 +133,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK;
>  	struct sgx_encl *encl = encl_page->encl;
>  	pgoff_t page_index, page_pcmd_off;
> +	unsigned long pcmd_first_page;
>  	struct sgx_pageinfo pginfo;
>  	struct sgx_backing b;
>  	bool pcmd_page_empty;
> @@ -58,6 +145,11 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	else
>  		page_index = PFN_DOWN(encl->size);
>  
> +	/*
> +	 * Address of enclave page using the first entry within the PCMD page.
> +	 */
> +	pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base;
> +
>  	page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
>  
>  	ret = sgx_encl_get_backing(encl, page_index, &b);
> @@ -99,7 +191,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  
>  	sgx_encl_truncate_backing_page(encl, page_index);
>  
> -	if (pcmd_page_empty)
> +	if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page))
>  		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
>  
>  	return ret;
> -- 
> 2.25.1
> 

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko

  reply	other threads:[~2022-05-13 14:44 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-12 21:50 [PATCH V3 0/5] SGX shmem backing store issue Reinette Chatre
2022-05-12 21:50 ` [PATCH V3 1/5] x86/sgx: Disconnect backing page references from dirty status Reinette Chatre
2022-05-13 14:39   ` Jarkko Sakkinen
2022-05-12 21:50 ` [PATCH V3 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents Reinette Chatre
2022-05-12 21:50 ` [PATCH V3 3/5] x86/sgx: Obtain backing storage page with enclave mutex held Reinette Chatre
2022-05-13 14:39   ` Jarkko Sakkinen
2022-05-12 21:51 ` [PATCH V3 4/5] x86/sgx: Fix race between reclaimer and page fault handler Reinette Chatre
2022-05-13 14:40   ` Jarkko Sakkinen [this message]
2022-05-12 21:51 ` [PATCH V3 5/5] x86/sgx: Ensure no data in PCMD page after truncate Reinette Chatre
2022-05-13 14:39   ` Jarkko Sakkinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yn5tznK08SeWxd4S@iki.fi \
    --to=jarkko@kernel.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=haitao.huang@intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=reinette.chatre@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.