linux-sgx.vger.kernel.org archive mirror
* [PATCH] x86/sgx: WARN once if EREMOVE fails when killing an enclave
@ 2019-10-08  4:13 Sean Christopherson
  2019-10-08  4:15 ` Sean Christopherson
  2019-10-09  0:04 ` Jarkko Sakkinen
  0 siblings, 2 replies; 6+ messages in thread
From: Sean Christopherson @ 2019-10-08  4:13 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

WARN if EREMOVE fails when destroying an enclave.  sgx_encl_release()
uses the non-WARN __sgx_free_page() when freeing pages as some pages may
be in the process of being reclaimed, i.e. are owned by the reclaimer.
But EREMOVE should never fail as sgx_encl_destroy() is only called when
the enclave cannot have active threads, e.g. prior to EINIT and when the
enclave is being released.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 54ca827e68a9..a6786e7ae40e 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -463,16 +463,23 @@ void sgx_encl_destroy(struct sgx_encl *encl)
 	struct sgx_encl_page *entry;
 	struct radix_tree_iter iter;
 	void **slot;
+	int r;
 
 	atomic_or(SGX_ENCL_DEAD, &encl->flags);
 
 	radix_tree_for_each_slot(slot, &encl->page_tree, &iter, 0) {
 		entry = *slot;
 		if (entry->epc_page) {
-			if (!__sgx_free_page(entry->epc_page)) {
+			/*
+			 * Freeing the page can fail if it's in the process of
+			 * being reclaimed (-EBUSY), but EREMOVE itself should
+			 * not fail at this point.
+			 */
+			r = __sgx_free_page(entry->epc_page);
+			WARN_ONCE(r > 0, "sgx: EREMOVE returned %d (0x%x)", r, r);
+			if (!r) {
 				encl->secs_child_cnt--;
 				entry->epc_page = NULL;
-
 			}
 
 			radix_tree_delete(&entry->encl->page_tree,
-- 
2.22.0
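
The return-value convention the patch relies on can be sketched in userspace as below. This is only an illustration based on the comment and the WARN_ONCE() condition in the diff: 0 means the page was freed, -EBUSY means the reclaimer owns the page, and a positive value is an EREMOVE error code. The names classify_free_result(), enum free_status, and warn_once() are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels for the outcomes described in the patch comment. */
enum free_status {
	FREE_DONE,          /* 0: the page was freed */
	FREE_RECLAIMING,    /* -EBUSY: the page is owned by the reclaimer */
	FREE_EREMOVE_ERROR, /* > 0: EREMOVE itself reported an error code */
	FREE_OTHER_ERROR,   /* any other negative value */
};

/* Map a return value onto the outcomes above (illustrative only). */
static enum free_status classify_free_result(int r)
{
	if (r == 0)
		return FREE_DONE;
	if (r == -EBUSY)
		return FREE_RECLAIMING;
	if (r > 0)
		return FREE_EREMOVE_ERROR;

	assert(r < 0);
	return FREE_OTHER_ERROR;
}

/*
 * Userspace stand-in, not the kernel macro: returns 1 only the first
 * time the condition holds, i.e. when a warning would be printed.
 */
static int warn_once(int cond)
{
	static int warned;

	if (cond && !warned) {
		warned = 1;
		return 1;
	}
	return 0;
}
```

Under this reading, only the FREE_EREMOVE_ERROR case is unexpected in sgx_encl_destroy(); FREE_RECLAIMING is tolerated because the reclaimer will finish with the page on its own.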



* Re: [PATCH] x86/sgx: WARN once if EREMOVE fails when killing an enclave
  2019-10-08  4:13 [PATCH] x86/sgx: WARN once if EREMOVE fails when killing an enclave Sean Christopherson
@ 2019-10-08  4:15 ` Sean Christopherson
  2019-10-09  0:04 ` Jarkko Sakkinen
  1 sibling, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2019-10-08  4:15 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

On Mon, Oct 07, 2019 at 09:13:34PM -0700, Sean Christopherson wrote:
> WARN if EREMOVE fails when destroying an enclave.  sgx_encl_release()
> uses the non-WARN __sgx_free_page() when freeing pages as some pages may
> be in the process of being reclaimed, i.e. are owned by the reclaimer.
> But EREMOVE should never fail as sgx_encl_destroy() is only called when
> the enclave cannot have active threads, e.g. prior to EINIT and when the
> enclave is being released.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/encl.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 54ca827e68a9..a6786e7ae40e 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -463,16 +463,23 @@ void sgx_encl_destroy(struct sgx_encl *encl)
>  	struct sgx_encl_page *entry;
>  	struct radix_tree_iter iter;
>  	void **slot;
> +	int r;
>  
>  	atomic_or(SGX_ENCL_DEAD, &encl->flags);
>  
>  	radix_tree_for_each_slot(slot, &encl->page_tree, &iter, 0) {
>  		entry = *slot;
>  		if (entry->epc_page) {
> -			if (!__sgx_free_page(entry->epc_page)) {
> +			/*
> +			 * Freeing the page can fail if it's in the process of
> +			 * being reclaimed (-EBUSY), but EREMOVE itself should
> +			 * not fail at this point.
> +			 */
> +			r = __sgx_free_page(entry->epc_page);
> +			WARN_ONCE(r > 0, "sgx: EREMOVE returned %d (0x%x)", r, r);
> +			if (!r) {
>  				encl->secs_child_cnt--;
>  				entry->epc_page = NULL;
> -
>  			}
>  
>  			radix_tree_delete(&entry->encl->page_tree,
> -- 
> 2.22.0

Intended for v23, forgot to tag the subject...


* Re: [PATCH] x86/sgx: WARN once if EREMOVE fails when killing an enclave
  2019-10-08  4:13 [PATCH] x86/sgx: WARN once if EREMOVE fails when killing an enclave Sean Christopherson
  2019-10-08  4:15 ` Sean Christopherson
@ 2019-10-09  0:04 ` Jarkko Sakkinen
  2019-10-10 18:35   ` Sean Christopherson
  1 sibling, 1 reply; 6+ messages in thread
From: Jarkko Sakkinen @ 2019-10-09  0:04 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-sgx

On Mon, Oct 07, 2019 at 09:13:34PM -0700, Sean Christopherson wrote:
> WARN if EREMOVE fails when destroying an enclave.  sgx_encl_release()
> uses the non-WARN __sgx_free_page() when freeing pages as some pages may
> be in the process of being reclaimed, i.e. are owned by the reclaimer.
> But EREMOVE should never fail as sgx_encl_destroy() is only called when
> the enclave cannot have active threads, e.g. prior to EINIT and when the
> enclave is being released.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

For me this concludes that I will manually convert all the call sites
to use __sgx_free_page() and add appropriate warnings. I agree with
Borislav's conclusions here.

/Jarkko


* Re: [PATCH] x86/sgx: WARN once if EREMOVE fails when killing an enclave
  2019-10-09  0:04 ` Jarkko Sakkinen
@ 2019-10-10 18:35   ` Sean Christopherson
  2019-10-10 18:56     ` Sean Christopherson
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2019-10-10 18:35 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

On Wed, Oct 09, 2019 at 03:04:50AM +0300, Jarkko Sakkinen wrote:
> On Mon, Oct 07, 2019 at 09:13:34PM -0700, Sean Christopherson wrote:
> > WARN if EREMOVE fails when destroying an enclave.  sgx_encl_release()
> > uses the non-WARN __sgx_free_page() when freeing pages as some pages may
> > be in the process of being reclaimed, i.e. are owned by the reclaimer.
> > But EREMOVE should never fail as sgx_encl_destroy() is only called when
> > the enclave cannot have active threads, e.g. prior to EINIT and when the
> > enclave is being released.
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> For me this concludes that I will manually convert all the call sites
> to use __sgx_free_page() and add appropriate warnings. I agree with
> Borislav's conclusions here.

Argh, now we have a bunch of call sites that can silently leak EPC pages,
and I'm seeing timeouts during testing that strongly suggest pages are
being leaked...


* Re: [PATCH] x86/sgx: WARN once if EREMOVE fails when killing an enclave
  2019-10-10 18:35   ` Sean Christopherson
@ 2019-10-10 18:56     ` Sean Christopherson
  2019-10-10 20:52       ` Sean Christopherson
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2019-10-10 18:56 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

On Thu, Oct 10, 2019 at 11:35:48AM -0700, Sean Christopherson wrote:
> On Wed, Oct 09, 2019 at 03:04:50AM +0300, Jarkko Sakkinen wrote:
> > On Mon, Oct 07, 2019 at 09:13:34PM -0700, Sean Christopherson wrote:
> > > WARN if EREMOVE fails when destroying an enclave.  sgx_encl_release()
> > > uses the non-WARN __sgx_free_page() when freeing pages as some pages may
> > > be in the process of being reclaimed, i.e. are owned by the reclaimer.
> > > But EREMOVE should never fail as sgx_encl_destroy() is only called when
> > > the enclave cannot have active threads, e.g. prior to EINIT and when the
> > > enclave is being released.
> > > 
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > For me this concludes that I will manually convert all the call sites
> > to use __sgx_free_page() and add appropriate warnings. I agree with
> > Borislav's conclusions here.
> 
> Argh, now we have a bunch of call sites that can silently leak EPC pages,
> and I'm seeing timeouts during testing that strongly suggest pages are
> being leaked...

Confirmed that we're leaking pages, but it's not related to the -EBUSY
case in sgx_free_page().  Debug in progress...

As to the sgx_free_page() thing, I think we can invert the old WARN logic
and make everyone happy.  I'll send a patch.


* Re: [PATCH] x86/sgx: WARN once if EREMOVE fails when killing an enclave
  2019-10-10 18:56     ` Sean Christopherson
@ 2019-10-10 20:52       ` Sean Christopherson
  0 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2019-10-10 20:52 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

On Thu, Oct 10, 2019 at 11:56:07AM -0700, Sean Christopherson wrote:
> On Thu, Oct 10, 2019 at 11:35:48AM -0700, Sean Christopherson wrote:
> > On Wed, Oct 09, 2019 at 03:04:50AM +0300, Jarkko Sakkinen wrote:
> > > On Mon, Oct 07, 2019 at 09:13:34PM -0700, Sean Christopherson wrote:
> > > > WARN if EREMOVE fails when destroying an enclave.  sgx_encl_release()
> > > > uses the non-WARN __sgx_free_page() when freeing pages as some pages may
> > > > be in the process of being reclaimed, i.e. are owned by the reclaimer.
> > > > But EREMOVE should never fail as sgx_encl_destroy() is only called when
> > > > the enclave cannot have active threads, e.g. prior to EINIT and when the
> > > > enclave is being released.
> > > > 
> > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > 
> > > For me this concludes that I will manually convert all the call sites
> > > to use __sgx_free_page() and add appropriate warnings. I agree with
> > > Borislav's conclusions here.
> > 
> > Argh, now we have a bunch of call sites that can silently leak EPC pages,
> > and I'm seeing timeouts during testing that strongly suggest pages are
> > being leaked...
> 
> Confirmed that we're leaking pages, but it's not related to the -EBUSY
> case in sgx_free_page().  Debug in progress...
> 
> As to the sgx_free_page() thing, I think we can invert the old WARN logic
> and make everyone happy.  I'll send a patch.

Figured out what's up.  I'm testing in a VM with multiple EPC sections.
Because of a change in v23[*], sgx_nr_free_pages is getting corrupted due
to non-atomic concurrent writes.  When it drops below 0 and wraps to a
high value the swap thread stops reclaiming and things grind to a halt.

[*] https://patchwork.kernel.org/patch/11146733/#22887361
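
The failure mode described above can be sketched in userspace C. This is a deterministic simulation and not the driver code: it assumes the free-page counter is an unsigned value and that reclaim is triggered by comparing it against a watermark. The helper names and the watermark parameter are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/*
 * Simulate two CPUs running a non-atomic "count++" whose load and
 * store steps interleave: both load the same value, so one update
 * is lost.
 */
static unsigned long lost_update_increment(unsigned long count)
{
	unsigned long cpu0 = count;	/* CPU 0 loads */
	unsigned long cpu1 = count;	/* CPU 1 loads the same value */

	count = cpu0 + 1;		/* CPU 0 stores */
	count = cpu1 + 1;		/* CPU 1 overwrites CPU 0's store */
	return count;
}

/* Decrementing an unsigned counter past zero wraps to a huge value. */
static unsigned long decrement(unsigned long count)
{
	return count - 1;
}

/* Hypothetical reclaim check: reclaim only when free pages run low. */
static int should_reclaim(unsigned long nr_free, unsigned long watermark)
{
	assert(watermark > 0);
	return nr_free < watermark;
}

/* With atomic read-modify-write operations both increments survive. */
static unsigned long atomic_two_increments(unsigned long start)
{
	atomic_ulong count;

	atomic_init(&count, start);
	atomic_fetch_add(&count, 1);
	atomic_fetch_add(&count, 1);
	return atomic_load(&count);
}
```

Once lost updates let the counter drift, a decrement from 0 wraps it to ULONG_MAX, should_reclaim() then always returns 0, and a reclaimer gated on that check never runs. Making the counter atomic, or serializing its writers with a lock, removes the lost-update window.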
