linux-sgx.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
@ 2020-12-16 13:49 Jarkko Sakkinen
  2021-01-04 20:22 ` Haitao Huang
  2021-01-05 14:57 ` Borislav Petkov
  0 siblings, 2 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-12-16 13:49 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, linux-sgx, Jarkko Sakkinen, Sean Christopherson,
	Haitao Huang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Jethro Beekman

Add synchronize_srcu_expedited() to sgx_encl_release() to catch a grace
period initiated by sgx_mmu_notifier_release().

A trivial example of a failing sequence with tasks A and B:

1. A: -> sgx_release()
2. B: -> sgx_mmu_notifier_release()
3. B: -> list_del_rcu()
3. A: -> sgx_encl_release()
4. A: -> cleanup_srcu_struct()

The loop in sgx_release() observes an empty list because B has removed its
entry in the middle, and calls cleanup_srcu_struct() before B has a chance
to calls synchronize_srcu().

Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Haitao Huang <haitao.huang@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v3: Fine-tuned tags, and added missing change log for v2.
v2: Switch to synchronize_srcu_expedited().
 arch/x86/kernel/cpu/sgx/encl.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index ee50a5010277..fe7256db6e73 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -438,6 +438,12 @@ void sgx_encl_release(struct kref *ref)
 	if (encl->backing)
 		fput(encl->backing);
 
+	/*
+	 * Each sgx_mmu_notifier_release() starts a grace period. Therefore, an
+	 * additional sync is required here.
+	 */
+	synchronize_srcu_expedited(&encl->srcu);
+
 	cleanup_srcu_struct(&encl->srcu);
 
 	WARN_ON_ONCE(!list_empty(&encl->mm_list));
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2020-12-16 13:49 [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release() Jarkko Sakkinen
@ 2021-01-04 20:22 ` Haitao Huang
  2021-01-11 23:41   ` Jarkko Sakkinen
  2021-01-05 14:57 ` Borislav Petkov
  1 sibling, 1 reply; 14+ messages in thread
From: Haitao Huang @ 2021-01-04 20:22 UTC (permalink / raw)
  To: x86, Jarkko Sakkinen
  Cc: linux-kernel, linux-sgx, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Jethro Beekman

On Wed, 16 Dec 2020 07:49:20 -0600, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> Add synchronize_srcu_expedited() to sgx_encl_release() to catch a grace
> period initiated by sgx_mmu_notifier_release().
>
> A trivial example of a failing sequence with tasks A and B:
>
> 1. A: -> sgx_release()
> 2. B: -> sgx_mmu_notifier_release()
> 3. B: -> list_del_rcu()
> 3. A: -> sgx_encl_release()
> 4. A: -> cleanup_srcu_struct()
>
> The loop in sgx_release() observes an empty list because B has removed  
> its
> entry in the middle, and calls cleanup_srcu_struct() before B has a  
> chance
> to calls synchronize_srcu().
>
> Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Suggested-by: Haitao Huang <haitao.huang@linux.intel.com>
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---
> v3: Fine-tuned tags, and added missing change log for v2.
> v2: Switch to synchronize_srcu_expedited().
>  arch/x86/kernel/cpu/sgx/encl.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c  
> b/arch/x86/kernel/cpu/sgx/encl.c
> index ee50a5010277..fe7256db6e73 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -438,6 +438,12 @@ void sgx_encl_release(struct kref *ref)
>  	if (encl->backing)
>  		fput(encl->backing);
> +	/*
> +	 * Each sgx_mmu_notifier_release() starts a grace period. Therefore, an
> +	 * additional sync is required here.
> +	 */
> +	synchronize_srcu_expedited(&encl->srcu);
> +
>  	cleanup_srcu_struct(&encl->srcu);
> 	WARN_ON_ONCE(!list_empty(&encl->mm_list));

Tested-by: Haitao Huang <haitao.huang@linux.intel.com>
Thanks
Haitao

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2020-12-16 13:49 [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release() Jarkko Sakkinen
  2021-01-04 20:22 ` Haitao Huang
@ 2021-01-05 14:57 ` Borislav Petkov
  2021-01-12  0:08   ` Jarkko Sakkinen
  1 sibling, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2021-01-05 14:57 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: x86, linux-kernel, linux-sgx, Sean Christopherson, Haitao Huang,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jethro Beekman

On Wed, Dec 16, 2020 at 03:49:20PM +0200, Jarkko Sakkinen wrote:
> Add synchronize_srcu_expedited() to sgx_encl_release() to catch a grace
> period initiated by sgx_mmu_notifier_release().
> 
> A trivial example of a failing sequence with tasks A and B:
> 
> 1. A: -> sgx_release()
> 2. B: -> sgx_mmu_notifier_release()
> 3. B: -> list_del_rcu()
> 3. A: -> sgx_encl_release()
> 4. A: -> cleanup_srcu_struct()
> 
> The loop in sgx_release() observes an empty list because B has removed its
> entry in the middle, and calls cleanup_srcu_struct() before B has a chance
> to calls synchronize_srcu().

Leading to what? NULL ptr?

https://lkml.kernel.org/r/X9e2jOWz1hfXVpQ5@google.com

already suggested that you should explain the bug better and add the
splat but I'm still missing that explanation.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2021-01-04 20:22 ` Haitao Huang
@ 2021-01-11 23:41   ` Jarkko Sakkinen
  0 siblings, 0 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2021-01-11 23:41 UTC (permalink / raw)
  To: Haitao Huang
  Cc: x86, linux-kernel, linux-sgx, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Jethro Beekman

On Mon, Jan 04, 2021 at 02:22:05PM -0600, Haitao Huang wrote:
> On Wed, 16 Dec 2020 07:49:20 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > Add synchronize_srcu_expedited() to sgx_encl_release() to catch a grace
> > period initiated by sgx_mmu_notifier_release().
> > 
> > A trivial example of a failing sequence with tasks A and B:
> > 
> > 1. A: -> sgx_release()
> > 2. B: -> sgx_mmu_notifier_release()
> > 3. B: -> list_del_rcu()
> > 3. A: -> sgx_encl_release()
> > 4. A: -> cleanup_srcu_struct()
> > 
> > The loop in sgx_release() observes an empty list because B has removed
> > its
> > entry in the middle, and calls cleanup_srcu_struct() before B has a
> > chance
> > to calls synchronize_srcu().
> > 
> > Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Suggested-by: Haitao Huang <haitao.huang@linux.intel.com>
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > ---
> > v3: Fine-tuned tags, and added missing change log for v2.
> > v2: Switch to synchronize_srcu_expedited().
> >  arch/x86/kernel/cpu/sgx/encl.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c
> > b/arch/x86/kernel/cpu/sgx/encl.c
> > index ee50a5010277..fe7256db6e73 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -438,6 +438,12 @@ void sgx_encl_release(struct kref *ref)
> >  	if (encl->backing)
> >  		fput(encl->backing);
> > +	/*
> > +	 * Each sgx_mmu_notifier_release() starts a grace period. Therefore, an
> > +	 * additional sync is required here.
> > +	 */
> > +	synchronize_srcu_expedited(&encl->srcu);
> > +
> >  	cleanup_srcu_struct(&encl->srcu);
> > 	WARN_ON_ONCE(!list_empty(&encl->mm_list));
> 
> Tested-by: Haitao Huang <haitao.huang@linux.intel.com>

Thanks.

/Jarkko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2021-01-05 14:57 ` Borislav Petkov
@ 2021-01-12  0:08   ` Jarkko Sakkinen
  2021-01-12 18:35     ` Borislav Petkov
  2021-01-14  4:42     ` Haitao Huang
  0 siblings, 2 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2021-01-12  0:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, linux-kernel, linux-sgx, Sean Christopherson, Haitao Huang,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jethro Beekman

On Tue, Jan 05, 2021 at 03:57:49PM +0100, Borislav Petkov wrote:
> On Wed, Dec 16, 2020 at 03:49:20PM +0200, Jarkko Sakkinen wrote:
> > Add synchronize_srcu_expedited() to sgx_encl_release() to catch a grace
> > period initiated by sgx_mmu_notifier_release().
> > 
> > A trivial example of a failing sequence with tasks A and B:
> > 
> > 1. A: -> sgx_release()
> > 2. B: -> sgx_mmu_notifier_release()
> > 3. B: -> list_del_rcu()
> > 3. A: -> sgx_encl_release()
> > 4. A: -> cleanup_srcu_struct()
> > 
> > The loop in sgx_release() observes an empty list because B has removed its
> > entry in the middle, and calls cleanup_srcu_struct() before B has a chance
> > to calls synchronize_srcu().
> 
> Leading to what? NULL ptr?
> 
> https://lkml.kernel.org/r/X9e2jOWz1hfXVpQ5@google.com
> 
> already suggested that you should explain the bug better and add the
> splat but I'm still missing that explanation.

OK, I'll try to explain it how I understand the issue.

Consider this loop in the VFS release hook (sgx_release):

	/*
	 * Drain the remaining mm_list entries. At this point the list contains
	 * entries for processes, which have closed the enclave file but have
	 * not exited yet. The processes, which have exited, are gone from the
	 * list by sgx_mmu_notifier_release().
	 */
	for ( ; ; )  {
		spin_lock(&encl->mm_lock);

		if (list_empty(&encl->mm_list)) {
			encl_mm = NULL;
		} else {
			encl_mm = list_first_entry(&encl->mm_list,
						   struct sgx_encl_mm, list);
			list_del_rcu(&encl_mm->list);
		}

		spin_unlock(&encl->mm_lock);

		/* The enclave is no longer mapped by any mm. */
		if (!encl_mm)
			break;

		synchronize_srcu(&encl->srcu);
		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
		kfree(encl_mm);
	}


At this point all processes have closed the enclave file, but that doesn't
mean that they all have exited yet.

Now, let's imagine that there is exactly one entry in the encl->mm_list.
and sgx_release() execution gets scheduled right after returning from
synchronize_srcu().

With some bad luck, some process comes and removes that last entry befoe
sgx_release() acquires mm_lock. The loop in sgx_release() just leaves

		/* The enclave is no longer mapped by any mm. */
		if (!encl_mm)
			break;

No synchronize_srcu().

After writing this, I think that the placement for synchronize_srcu()
in this patch is not best possible. It should be rather that the
above loop would also call synchronize_srcu() when leaving.

I.e. the code change would result:

	for ( ; ; )  {
		spin_lock(&encl->mm_lock);

		if (list_empty(&encl->mm_list)) {
			encl_mm = NULL;
		} else {
			encl_mm = list_first_entry(&encl->mm_list,
						   struct sgx_encl_mm, list);
			list_del_rcu(&encl_mm->list);
		}

		spin_unlock(&encl->mm_lock);

                /* 
                 * synchronize_srcu() is mandatory *even* when the list was
                 * empty, in order make sure that grace periods stays in
                 * sync even when another task took away the last entry
                 * (i.e. exiting process when it deletes its mm_list).
                 */
		synchronize_srcu(&encl->srcu);

		/* The enclave is no longer mapped by any mm. */
		if (!encl_mm)
			break;

		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
		kfree(encl_mm);
	}

What do you think? Does this start to make more sense now?
I don't have logs for this but the bug can be also reasoned.

/Jarkko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2021-01-12  0:08   ` Jarkko Sakkinen
@ 2021-01-12 18:35     ` Borislav Petkov
  2021-01-12 20:11       ` Paul E. McKenney
  2021-01-13 17:18       ` Jarkko Sakkinen
  2021-01-14  4:42     ` Haitao Huang
  1 sibling, 2 replies; 14+ messages in thread
From: Borislav Petkov @ 2021-01-12 18:35 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: x86, linux-kernel, linux-sgx, Sean Christopherson, Haitao Huang,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jethro Beekman,
	Paul E. McKenney

+ paulmck.

On Tue, Jan 12, 2021 at 02:08:10AM +0200, Jarkko Sakkinen wrote:
> On Tue, Jan 05, 2021 at 03:57:49PM +0100, Borislav Petkov wrote:
> > On Wed, Dec 16, 2020 at 03:49:20PM +0200, Jarkko Sakkinen wrote:
> > > Add synchronize_srcu_expedited() to sgx_encl_release() to catch a grace
> > > period initiated by sgx_mmu_notifier_release().
> > > 
> > > A trivial example of a failing sequence with tasks A and B:
> > > 
> > > 1. A: -> sgx_release()
> > > 2. B: -> sgx_mmu_notifier_release()
> > > 3. B: -> list_del_rcu()
> > > 3. A: -> sgx_encl_release()
> > > 4. A: -> cleanup_srcu_struct()
> > > 
> > > The loop in sgx_release() observes an empty list because B has removed its
> > > entry in the middle, and calls cleanup_srcu_struct() before B has a chance
> > > to calls synchronize_srcu().
> > 
> > Leading to what? NULL ptr?
> > 
> > https://lkml.kernel.org/r/X9e2jOWz1hfXVpQ5@google.com
> > 
> > already suggested that you should explain the bug better and add the
> > splat but I'm still missing that explanation.
> 
> OK, I'll try to explain it how I understand the issue.
> 
> Consider this loop in the VFS release hook (sgx_release):
> 
> 	/*
> 	 * Drain the remaining mm_list entries. At this point the list contains
> 	 * entries for processes, which have closed the enclave file but have
> 	 * not exited yet. The processes, which have exited, are gone from the
> 	 * list by sgx_mmu_notifier_release().
> 	 */
> 	for ( ; ; )  {
> 		spin_lock(&encl->mm_lock);
> 
> 		if (list_empty(&encl->mm_list)) {
> 			encl_mm = NULL;
> 		} else {
> 			encl_mm = list_first_entry(&encl->mm_list,
> 						   struct sgx_encl_mm, list);
> 			list_del_rcu(&encl_mm->list);
> 		}
> 
> 		spin_unlock(&encl->mm_lock);
> 
> 		/* The enclave is no longer mapped by any mm. */
> 		if (!encl_mm)
> 			break;
> 
> 		synchronize_srcu(&encl->srcu);
> 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> 		kfree(encl_mm);
> 	}
> 
> 
> At this point all processes have closed the enclave file, but that doesn't
> mean that they all have exited yet.
> 
> Now, let's imagine that there is exactly one entry in the encl->mm_list.
> and sgx_release() execution gets scheduled right after returning from
> synchronize_srcu().
> 
> With some bad luck, some process comes and removes that last entry befoe
> sgx_release() acquires mm_lock. The loop in sgx_release() just leaves
> 
> 		/* The enclave is no longer mapped by any mm. */
> 		if (!encl_mm)
> 			break;
> 
> No synchronize_srcu().
> 
> After writing this, I think that the placement for synchronize_srcu()
> in this patch is not best possible. It should be rather that the
> above loop would also call synchronize_srcu() when leaving.
> 
> I.e. the code change would result:
> 
> 	for ( ; ; )  {
> 		spin_lock(&encl->mm_lock);
> 
> 		if (list_empty(&encl->mm_list)) {
> 			encl_mm = NULL;
> 		} else {
> 			encl_mm = list_first_entry(&encl->mm_list,
> 						   struct sgx_encl_mm, list);
> 			list_del_rcu(&encl_mm->list);
> 		}
> 
> 		spin_unlock(&encl->mm_lock);
> 
>                 /* 
>                  * synchronize_srcu() is mandatory *even* when the list was
>                  * empty, in order make sure that grace periods stays in
>                  * sync even when another task took away the last entry
>                  * (i.e. exiting process when it deletes its mm_list).
>                  */
> 		synchronize_srcu(&encl->srcu);
> 
> 		/* The enclave is no longer mapped by any mm. */
> 		if (!encl_mm)
> 			break;
> 
> 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> 		kfree(encl_mm);
> 	}
> 
> What do you think? Does this start to make more sense now?
> I don't have logs for this but the bug can be also reasoned.

It does. Now you need to write it up in a detailed form so that it is
clear to readers months/years from now what exactly can happen. You can
use a two-column format like

	CPU A				CPU B

Bla
					Blu

This happens now here
					But this needs to happen there

and so on.

Also, from reading up a bit on this, Documentation/RCU/checklist.rst says

"Use of the expedited primitives should be restricted to rare
configuration-change operations that would not normally be undertaken
while a real-time workload is running."

so why are you using synchronize_srcu_expedited()? Grepping the tree
reveals only a couple of call sites only... but I've almost no clue of
RCU so lemme CC Paul.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2021-01-12 18:35     ` Borislav Petkov
@ 2021-01-12 20:11       ` Paul E. McKenney
  2021-01-13 17:18       ` Jarkko Sakkinen
  1 sibling, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-01-12 20:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jarkko Sakkinen, x86, linux-kernel, linux-sgx,
	Sean Christopherson, Haitao Huang, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Jethro Beekman

On Tue, Jan 12, 2021 at 07:35:50PM +0100, Borislav Petkov wrote:
> + paulmck.
> 
> On Tue, Jan 12, 2021 at 02:08:10AM +0200, Jarkko Sakkinen wrote:
> > On Tue, Jan 05, 2021 at 03:57:49PM +0100, Borislav Petkov wrote:
> > > On Wed, Dec 16, 2020 at 03:49:20PM +0200, Jarkko Sakkinen wrote:
> > > > Add synchronize_srcu_expedited() to sgx_encl_release() to catch a grace
> > > > period initiated by sgx_mmu_notifier_release().
> > > > 
> > > > A trivial example of a failing sequence with tasks A and B:
> > > > 
> > > > 1. A: -> sgx_release()
> > > > 2. B: -> sgx_mmu_notifier_release()
> > > > 3. B: -> list_del_rcu()
> > > > 3. A: -> sgx_encl_release()
> > > > 4. A: -> cleanup_srcu_struct()
> > > > 
> > > > The loop in sgx_release() observes an empty list because B has removed its
> > > > entry in the middle, and calls cleanup_srcu_struct() before B has a chance
> > > > to calls synchronize_srcu().
> > > 
> > > Leading to what? NULL ptr?
> > > 
> > > https://lkml.kernel.org/r/X9e2jOWz1hfXVpQ5@google.com
> > > 
> > > already suggested that you should explain the bug better and add the
> > > splat but I'm still missing that explanation.
> > 
> > OK, I'll try to explain it how I understand the issue.
> > 
> > Consider this loop in the VFS release hook (sgx_release):
> > 
> > 	/*
> > 	 * Drain the remaining mm_list entries. At this point the list contains
> > 	 * entries for processes, which have closed the enclave file but have
> > 	 * not exited yet. The processes, which have exited, are gone from the
> > 	 * list by sgx_mmu_notifier_release().
> > 	 */
> > 	for ( ; ; )  {
> > 		spin_lock(&encl->mm_lock);
> > 
> > 		if (list_empty(&encl->mm_list)) {
> > 			encl_mm = NULL;
> > 		} else {
> > 			encl_mm = list_first_entry(&encl->mm_list,
> > 						   struct sgx_encl_mm, list);
> > 			list_del_rcu(&encl_mm->list);
> > 		}
> > 
> > 		spin_unlock(&encl->mm_lock);
> > 
> > 		/* The enclave is no longer mapped by any mm. */
> > 		if (!encl_mm)
> > 			break;
> > 
> > 		synchronize_srcu(&encl->srcu);
> > 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> > 		kfree(encl_mm);
> > 	}
> > 
> > 
> > At this point all processes have closed the enclave file, but that doesn't
> > mean that they all have exited yet.
> > 
> > Now, let's imagine that there is exactly one entry in the encl->mm_list.
> > and sgx_release() execution gets scheduled right after returning from
> > synchronize_srcu().
> > 
> > With some bad luck, some process comes and removes that last entry befoe
> > sgx_release() acquires mm_lock. The loop in sgx_release() just leaves
> > 
> > 		/* The enclave is no longer mapped by any mm. */
> > 		if (!encl_mm)
> > 			break;
> > 
> > No synchronize_srcu().
> > 
> > After writing this, I think that the placement for synchronize_srcu()
> > in this patch is not best possible. It should be rather that the
> > above loop would also call synchronize_srcu() when leaving.
> > 
> > I.e. the code change would result:
> > 
> > 	for ( ; ; )  {
> > 		spin_lock(&encl->mm_lock);
> > 
> > 		if (list_empty(&encl->mm_list)) {
> > 			encl_mm = NULL;
> > 		} else {
> > 			encl_mm = list_first_entry(&encl->mm_list,
> > 						   struct sgx_encl_mm, list);
> > 			list_del_rcu(&encl_mm->list);
> > 		}
> > 
> > 		spin_unlock(&encl->mm_lock);
> > 
> >                 /* 
> >                  * synchronize_srcu() is mandatory *even* when the list was
> >                  * empty, in order make sure that grace periods stays in
> >                  * sync even when another task took away the last entry
> >                  * (i.e. exiting process when it deletes its mm_list).
> >                  */
> > 		synchronize_srcu(&encl->srcu);
> > 
> > 		/* The enclave is no longer mapped by any mm. */
> > 		if (!encl_mm)
> > 			break;
> > 
> > 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> > 		kfree(encl_mm);
> > 	}
> > 
> > What do you think? Does this start to make more sense now?
> > I don't have logs for this but the bug can be also reasoned.
> 
> It does. Now you need to write it up in a detailed form so that it is
> clear to readers months/years from now what exactly can happen. You can
> use a two-column format like
> 
> 	CPU A				CPU B
> 
> Bla
> 					Blu
> 
> This happens now here
> 					But this needs to happen there
> 
> and so on.
> 
> Also, from reading up a bit on this, Documentation/RCU/checklist.rst says
> 
> "Use of the expedited primitives should be restricted to rare
> configuration-change operations that would not normally be undertaken
> while a real-time workload is running."
> 
> so why are you using synchronize_srcu_expedited()? Grepping the tree
> reveals only a couple of call sites only... but I've almost no clue of
> RCU so lemme CC Paul.

The SRCU expedited grace periods are easier on real-time workloads than
synchronize_rcu_expedited(), but the SRCU variant still burns more CPU
time on a given grace period.  But either way, as the document says
further down "However, real-time workloads can use rcupdate.rcu_normal
kernel boot parameter to completely disable expedited grace periods,
though this might have performance implications."

So what are the performance implications in this case?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2021-01-12 18:35     ` Borislav Petkov
  2021-01-12 20:11       ` Paul E. McKenney
@ 2021-01-13 17:18       ` Jarkko Sakkinen
  2021-01-13 17:46         ` Paul E. McKenney
  1 sibling, 1 reply; 14+ messages in thread
From: Jarkko Sakkinen @ 2021-01-13 17:18 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, linux-kernel, linux-sgx, Sean Christopherson, Haitao Huang,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jethro Beekman,
	Paul E. McKenney

On Tue, Jan 12, 2021 at 07:35:50PM +0100, Borislav Petkov wrote:
> + paulmck.
> 
> On Tue, Jan 12, 2021 at 02:08:10AM +0200, Jarkko Sakkinen wrote:
> > On Tue, Jan 05, 2021 at 03:57:49PM +0100, Borislav Petkov wrote:
> > > On Wed, Dec 16, 2020 at 03:49:20PM +0200, Jarkko Sakkinen wrote:
> > > > Add synchronize_srcu_expedited() to sgx_encl_release() to catch a grace
> > > > period initiated by sgx_mmu_notifier_release().
> > > > 
> > > > A trivial example of a failing sequence with tasks A and B:
> > > > 
> > > > 1. A: -> sgx_release()
> > > > 2. B: -> sgx_mmu_notifier_release()
> > > > 3. B: -> list_del_rcu()
> > > > 3. A: -> sgx_encl_release()
> > > > 4. A: -> cleanup_srcu_struct()
> > > > 
> > > > The loop in sgx_release() observes an empty list because B has removed its
> > > > entry in the middle, and calls cleanup_srcu_struct() before B has a chance
> > > > to calls synchronize_srcu().
> > > 
> > > Leading to what? NULL ptr?
> > > 
> > > https://lkml.kernel.org/r/X9e2jOWz1hfXVpQ5@google.com
> > > 
> > > already suggested that you should explain the bug better and add the
> > > splat but I'm still missing that explanation.
> > 
> > OK, I'll try to explain it how I understand the issue.
> > 
> > Consider this loop in the VFS release hook (sgx_release):
> > 
> > 	/*
> > 	 * Drain the remaining mm_list entries. At this point the list contains
> > 	 * entries for processes, which have closed the enclave file but have
> > 	 * not exited yet. The processes, which have exited, are gone from the
> > 	 * list by sgx_mmu_notifier_release().
> > 	 */
> > 	for ( ; ; )  {
> > 		spin_lock(&encl->mm_lock);
> > 
> > 		if (list_empty(&encl->mm_list)) {
> > 			encl_mm = NULL;
> > 		} else {
> > 			encl_mm = list_first_entry(&encl->mm_list,
> > 						   struct sgx_encl_mm, list);
> > 			list_del_rcu(&encl_mm->list);
> > 		}
> > 
> > 		spin_unlock(&encl->mm_lock);
> > 
> > 		/* The enclave is no longer mapped by any mm. */
> > 		if (!encl_mm)
> > 			break;
> > 
> > 		synchronize_srcu(&encl->srcu);
> > 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> > 		kfree(encl_mm);
> > 	}
> > 
> > 
> > At this point all processes have closed the enclave file, but that doesn't
> > mean that they all have exited yet.
> > 
> > Now, let's imagine that there is exactly one entry in the encl->mm_list.
> > and sgx_release() execution gets scheduled right after returning from
> > synchronize_srcu().
> > 
> > With some bad luck, some process comes and removes that last entry befoe
> > sgx_release() acquires mm_lock. The loop in sgx_release() just leaves
> > 
> > 		/* The enclave is no longer mapped by any mm. */
> > 		if (!encl_mm)
> > 			break;
> > 
> > No synchronize_srcu().
> > 
> > After writing this, I think that the placement for synchronize_srcu()
> > in this patch is not best possible. It should be rather that the
> > above loop would also call synchronize_srcu() when leaving.
> > 
> > I.e. the code change would result:
> > 
> > 	for ( ; ; )  {
> > 		spin_lock(&encl->mm_lock);
> > 
> > 		if (list_empty(&encl->mm_list)) {
> > 			encl_mm = NULL;
> > 		} else {
> > 			encl_mm = list_first_entry(&encl->mm_list,
> > 						   struct sgx_encl_mm, list);
> > 			list_del_rcu(&encl_mm->list);
> > 		}
> > 
> > 		spin_unlock(&encl->mm_lock);
> > 
> >                 /* 
> >                  * synchronize_srcu() is mandatory *even* when the list was
> >                  * empty, in order make sure that grace periods stays in
> >                  * sync even when another task took away the last entry
> >                  * (i.e. exiting process when it deletes its mm_list).
> >                  */
> > 		synchronize_srcu(&encl->srcu);
> > 
> > 		/* The enclave is no longer mapped by any mm. */
> > 		if (!encl_mm)
> > 			break;
> > 
> > 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> > 		kfree(encl_mm);
> > 	}
> > 
> > What do you think? Does this start to make more sense now?
> > I don't have logs for this but the bug can be also reasoned.
> 
> It does. Now you need to write it up in a detailed form so that it is
> clear to readers months/years from now what exactly can happen. You can
> use a two-column format like
> 
> 	CPU A				CPU B
> 
> Bla
> 					Blu
> 
> This happens now here
> 					But this needs to happen there
> 
> and so on.
> 
> Also, from reading up a bit on this, Documentation/RCU/checklist.rst says
> 
> "Use of the expedited primitives should be restricted to rare
> configuration-change operations that would not normally be undertaken
> while a real-time workload is running."
> 
> so why are you using synchronize_srcu_expedited()? Grepping the tree
> reveals only a couple of call sites only... but I've almost no clue of
> RCU so lemme CC Paul.

It spun out of this discussion:

https://lore.kernel.org/linux-sgx/20201215213517.GA34761@kernel.org/raw

My reasoning was that this is not a common case. The main loop
that uses synchronize_srcu().

> 
> -- 
> Regards/Gruss,
>     Boris.

/Jarkko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2021-01-13 17:18       ` Jarkko Sakkinen
@ 2021-01-13 17:46         ` Paul E. McKenney
  2021-01-13 18:00           ` Borislav Petkov
  2021-01-15  1:49           ` Jarkko Sakkinen
  0 siblings, 2 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-01-13 17:46 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Borislav Petkov, x86, linux-kernel, linux-sgx,
	Sean Christopherson, Haitao Huang, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Jethro Beekman

On Wed, Jan 13, 2021 at 07:18:23PM +0200, Jarkko Sakkinen wrote:
> On Tue, Jan 12, 2021 at 07:35:50PM +0100, Borislav Petkov wrote:
> > + paulmck.
> > 
> > On Tue, Jan 12, 2021 at 02:08:10AM +0200, Jarkko Sakkinen wrote:
> > > On Tue, Jan 05, 2021 at 03:57:49PM +0100, Borislav Petkov wrote:
> > > > On Wed, Dec 16, 2020 at 03:49:20PM +0200, Jarkko Sakkinen wrote:
> > > > > Add synchronize_srcu_expedited() to sgx_encl_release() to catch a grace
> > > > > period initiated by sgx_mmu_notifier_release().
> > > > > 
> > > > > A trivial example of a failing sequence with tasks A and B:
> > > > > 
> > > > > 1. A: -> sgx_release()
> > > > > 2. B: -> sgx_mmu_notifier_release()
> > > > > 3. B: -> list_del_rcu()
> > > > > 3. A: -> sgx_encl_release()
> > > > > 4. A: -> cleanup_srcu_struct()
> > > > > 
> > > > > The loop in sgx_release() observes an empty list because B has removed its
> > > > > entry in the middle, and calls cleanup_srcu_struct() before B has a chance
> > > > > to calls synchronize_srcu().
> > > > 
> > > > Leading to what? NULL ptr?
> > > > 
> > > > https://lkml.kernel.org/r/X9e2jOWz1hfXVpQ5@google.com
> > > > 
> > > > already suggested that you should explain the bug better and add the
> > > > splat but I'm still missing that explanation.
> > > 
> > > OK, I'll try to explain it how I understand the issue.
> > > 
> > > Consider this loop in the VFS release hook (sgx_release):
> > > 
> > > 	/*
> > > 	 * Drain the remaining mm_list entries. At this point the list contains
> > > 	 * entries for processes, which have closed the enclave file but have
> > > 	 * not exited yet. The processes, which have exited, are gone from the
> > > 	 * list by sgx_mmu_notifier_release().
> > > 	 */
> > > 	for ( ; ; )  {
> > > 		spin_lock(&encl->mm_lock);
> > > 
> > > 		if (list_empty(&encl->mm_list)) {
> > > 			encl_mm = NULL;
> > > 		} else {
> > > 			encl_mm = list_first_entry(&encl->mm_list,
> > > 						   struct sgx_encl_mm, list);
> > > 			list_del_rcu(&encl_mm->list);
> > > 		}
> > > 
> > > 		spin_unlock(&encl->mm_lock);
> > > 
> > > 		/* The enclave is no longer mapped by any mm. */
> > > 		if (!encl_mm)
> > > 			break;
> > > 
> > > 		synchronize_srcu(&encl->srcu);
> > > 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> > > 		kfree(encl_mm);
> > > 	}
> > > 
> > > 
> > > At this point all processes have closed the enclave file, but that doesn't
> > > mean that they all have exited yet.
> > > 
> > > Now, let's imagine that there is exactly one entry in the encl->mm_list.
> > > and sgx_release() execution gets scheduled right after returning from
> > > synchronize_srcu().
> > > 
> > > With some bad luck, some process comes and removes that last entry befoe
> > > sgx_release() acquires mm_lock. The loop in sgx_release() just leaves
> > > 
> > > 		/* The enclave is no longer mapped by any mm. */
> > > 		if (!encl_mm)
> > > 			break;
> > > 
> > > No synchronize_srcu().
> > > 
> > > After writing this, I think that the placement for synchronize_srcu()
> > > in this patch is not best possible. It should be rather that the
> > > above loop would also call synchronize_srcu() when leaving.
> > > 
> > > I.e. the code change would result:
> > > 
> > > 	for ( ; ; )  {
> > > 		spin_lock(&encl->mm_lock);
> > > 
> > > 		if (list_empty(&encl->mm_list)) {
> > > 			encl_mm = NULL;
> > > 		} else {
> > > 			encl_mm = list_first_entry(&encl->mm_list,
> > > 						   struct sgx_encl_mm, list);
> > > 			list_del_rcu(&encl_mm->list);
> > > 		}
> > > 
> > > 		spin_unlock(&encl->mm_lock);
> > > 
> > >                 /* 
> > >                  * synchronize_srcu() is mandatory *even* when the list was
> > >                  * empty, in order make sure that grace periods stays in
> > >                  * sync even when another task took away the last entry
> > >                  * (i.e. exiting process when it deletes its mm_list).
> > >                  */
> > > 		synchronize_srcu(&encl->srcu);
> > > 
> > > 		/* The enclave is no longer mapped by any mm. */
> > > 		if (!encl_mm)
> > > 			break;
> > > 
> > > 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> > > 		kfree(encl_mm);
> > > 	}
> > > 
> > > What do you think? Does this start to make more sense now?
> > > I don't have logs for this but the bug can be also reasoned.
> > 
> > It does. Now you need to write it up in a detailed form so that it is
> > clear to readers months/years from now what exactly can happen. You can
> > use a two-column format like
> > 
> > 	CPU A				CPU B
> > 
> > Bla
> > 					Blu
> > 
> > This happens now here
> > 					But this needs to happen there
> > 
> > and so on.
> > 
> > Also, from reading up a bit on this, Documentation/RCU/checklist.rst says
> > 
> > "Use of the expedited primitives should be restricted to rare
> > configuration-change operations that would not normally be undertaken
> > while a real-time workload is running."
> > 
> > so why are you using synchronize_srcu_expedited()? Grepping the tree
> > reveals only a couple of call sites only... but I've almost no clue of
> > RCU so lemme CC Paul.
> 
> It spun out of this discussion:
> 
> https://lore.kernel.org/linux-sgx/20201215213517.GA34761@kernel.org/raw
> 
> My reasoning was that this is not a common case. The main loop
> that uses synchronize_srcu().

It seems to me that loading and unloading SGX enclaves qualifies as a
configuration operation, so use of synchronize_srcu_expedited() should be
just fine in that case.  This of course implies that SGX enclaves should
not be loaded or unloaded while an aggressive real-time application
is running.  Which might well be the case for other reasons.

So I believe synchronize_srcu_expedited() should be fine in this case.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2021-01-13 17:46         ` Paul E. McKenney
@ 2021-01-13 18:00           ` Borislav Petkov
  2021-01-13 18:22             ` Paul E. McKenney
  2021-01-15  1:49           ` Jarkko Sakkinen
  1 sibling, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2021-01-13 18:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jarkko Sakkinen, x86, linux-kernel, linux-sgx,
	Sean Christopherson, Haitao Huang, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Jethro Beekman

On Wed, Jan 13, 2021 at 09:46:02AM -0800, Paul E. McKenney wrote:

< Lemme trim that mail fat >

> It seems to me that loading and unloading SGX enclaves qualifies as a
> configuration operation, so use of synchronize_srcu_expedited() should be
> just fine in that case.  This of course implies that SGX enclaves should
> not be loaded or unloaded while an aggressive real-time application
> is running.  Which might well be the case for other reasons.

I believe RT and SGX should be orthogonal to each-other unless someone rolls out
of the woodwork, wanting to run realtime enclaves... Ewww.

> So I believe synchronize_srcu_expedited() should be fine in this case.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2021-01-13 18:00           ` Borislav Petkov
@ 2021-01-13 18:22             ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-01-13 18:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jarkko Sakkinen, x86, linux-kernel, linux-sgx,
	Sean Christopherson, Haitao Huang, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Jethro Beekman

On Wed, Jan 13, 2021 at 07:00:33PM +0100, Borislav Petkov wrote:
> On Wed, Jan 13, 2021 at 09:46:02AM -0800, Paul E. McKenney wrote:
> 
> < Lemme trim that mail fat >
> 
> > It seems to me that loading and unloading SGX enclaves qualifies as a
> > configuration operation, so use of synchronize_srcu_expedited() should be
> > just fine in that case.  This of course implies that SGX enclaves should
> > not be loaded or unloaded while an aggressive real-time application
> > is running.  Which might well be the case for other reasons.
> 
> I believe RT and SGX should be orthogonal to each-other unless someone rolls out
> of the woodwork, wanting to run realtime enclaves... Ewww.

I could speculate about an RT workload running on a system that also
ran non-realtime SGX workloads, but who knows?  Stranger things have
happened.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2021-01-12  0:08   ` Jarkko Sakkinen
  2021-01-12 18:35     ` Borislav Petkov
@ 2021-01-14  4:42     ` Haitao Huang
  2021-01-15  9:58       ` Jarkko Sakkinen
  1 sibling, 1 reply; 14+ messages in thread
From: Haitao Huang @ 2021-01-14  4:42 UTC (permalink / raw)
  To: Borislav Petkov, Jarkko Sakkinen
  Cc: x86, linux-kernel, linux-sgx, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jethro Beekman

On Mon, 11 Jan 2021 18:08:10 -0600, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Tue, Jan 05, 2021 at 03:57:49PM +0100, Borislav Petkov wrote:
>> On Wed, Dec 16, 2020 at 03:49:20PM +0200, Jarkko Sakkinen wrote:
>> > Add synchronize_srcu_expedited() to sgx_encl_release() to catch a  
>> grace
>> > period initiated by sgx_mmu_notifier_release().
>> >
>> > A trivial example of a failing sequence with tasks A and B:
>> >
>> > 1. A: -> sgx_release()
>> > 2. B: -> sgx_mmu_notifier_release()
>> > 3. B: -> list_del_rcu()
>> > 3. A: -> sgx_encl_release()
>> > 4. A: -> cleanup_srcu_struct()
>> >
>> > The loop in sgx_release() observes an empty list because B has  
>> removed its
>> > entry in the middle, and calls cleanup_srcu_struct() before B has a  
>> chance
>> > to calls synchronize_srcu().
>>
>> Leading to what? NULL ptr?
>>
>> https://lkml.kernel.org/r/X9e2jOWz1hfXVpQ5@google.com
>>
>> already suggested that you should explain the bug better and add the
>> splat but I'm still missing that explanation.
>
> OK, I'll try to explain it how I understand the issue.
>
> Consider this loop in the VFS release hook (sgx_release):
>
> 	/*
> 	 * Drain the remaining mm_list entries. At this point the list contains
> 	 * entries for processes, which have closed the enclave file but have
> 	 * not exited yet. The processes, which have exited, are gone from the
> 	 * list by sgx_mmu_notifier_release().
> 	 */
> 	for ( ; ; )  {
> 		spin_lock(&encl->mm_lock);
>
> 		if (list_empty(&encl->mm_list)) {
> 			encl_mm = NULL;
> 		} else {
> 			encl_mm = list_first_entry(&encl->mm_list,
> 						   struct sgx_encl_mm, list);
> 			list_del_rcu(&encl_mm->list);
> 		}
>
> 		spin_unlock(&encl->mm_lock);
>
> 		/* The enclave is no longer mapped by any mm. */
> 		if (!encl_mm)
> 			break;
>
> 		synchronize_srcu(&encl->srcu);
> 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> 		kfree(encl_mm);
> 	}
>
>
> At this point all processes have closed the enclave file, but that  
> doesn't
> mean that they all have exited yet.
>
> Now, let's imagine that there is exactly one entry in the encl->mm_list.
> and sgx_release() execution gets scheduled right after returning from
> synchronize_srcu().
>
> With some bad luck, some process comes and removes that last entry befoe
> sgx_release() acquires mm_lock. The loop in sgx_release() just leaves
>
> 		/* The enclave is no longer mapped by any mm. */
> 		if (!encl_mm)
> 			break;
>
> No synchronize_srcu().
>
> After writing this, I think that the placement for synchronize_srcu()
> in this patch is not best possible. It should be rather that the
> above loop would also call synchronize_srcu() when leaving.
>
> I.e. the code change would result:
>
> 	for ( ; ; )  {
> 		spin_lock(&encl->mm_lock);
>
> 		if (list_empty(&encl->mm_list)) {
> 			encl_mm = NULL;
> 		} else {
> 			encl_mm = list_first_entry(&encl->mm_list,
> 						   struct sgx_encl_mm, list);
> 			list_del_rcu(&encl_mm->list);
> 		}
>
> 		spin_unlock(&encl->mm_lock);
>
>                 /*
>                  * synchronize_srcu() is mandatory *even* when the list  
> was
>                  * empty, in order make sure that grace periods stays in
>                  * sync even when another task took away the last entry
>                  * (i.e. exiting process when it deletes its mm_list).
>                  */
> 		synchronize_srcu(&encl->srcu);
>
> 		/* The enclave is no longer mapped by any mm. */
> 		if (!encl_mm)
> 			break;
>
> 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> 		kfree(encl_mm);
> 	}
>
> What do you think? Does this start to make more sense now?
> I don't have logs for this but the bug can be also reasoned.
>
> /Jarkko

I did this experiment just now and find it runs much much slower than both  
original code and code with synchronize_srcu_expedited fix in this patch.
Haitao

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2021-01-13 17:46         ` Paul E. McKenney
  2021-01-13 18:00           ` Borislav Petkov
@ 2021-01-15  1:49           ` Jarkko Sakkinen
  1 sibling, 0 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2021-01-15  1:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Borislav Petkov, x86, linux-kernel, linux-sgx,
	Sean Christopherson, Haitao Huang, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Jethro Beekman

On Wed, Jan 13, 2021 at 09:46:02AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 13, 2021 at 07:18:23PM +0200, Jarkko Sakkinen wrote:
> > On Tue, Jan 12, 2021 at 07:35:50PM +0100, Borislav Petkov wrote:
> > > + paulmck.
> > > 
> > > On Tue, Jan 12, 2021 at 02:08:10AM +0200, Jarkko Sakkinen wrote:
> > > > On Tue, Jan 05, 2021 at 03:57:49PM +0100, Borislav Petkov wrote:
> > > > > On Wed, Dec 16, 2020 at 03:49:20PM +0200, Jarkko Sakkinen wrote:
> > > > > > Add synchronize_srcu_expedited() to sgx_encl_release() to catch a grace
> > > > > > period initiated by sgx_mmu_notifier_release().
> > > > > > 
> > > > > > A trivial example of a failing sequence with tasks A and B:
> > > > > > 
> > > > > > 1. A: -> sgx_release()
> > > > > > 2. B: -> sgx_mmu_notifier_release()
> > > > > > 3. B: -> list_del_rcu()
> > > > > > 3. A: -> sgx_encl_release()
> > > > > > 4. A: -> cleanup_srcu_struct()
> > > > > > 
> > > > > > The loop in sgx_release() observes an empty list because B has removed its
> > > > > > entry in the middle, and calls cleanup_srcu_struct() before B has a chance
> > > > > > to calls synchronize_srcu().
> > > > > 
> > > > > Leading to what? NULL ptr?
> > > > > 
> > > > > https://lkml.kernel.org/r/X9e2jOWz1hfXVpQ5@google.com
> > > > > 
> > > > > already suggested that you should explain the bug better and add the
> > > > > splat but I'm still missing that explanation.
> > > > 
> > > > OK, I'll try to explain it how I understand the issue.
> > > > 
> > > > Consider this loop in the VFS release hook (sgx_release):
> > > > 
> > > > 	/*
> > > > 	 * Drain the remaining mm_list entries. At this point the list contains
> > > > 	 * entries for processes, which have closed the enclave file but have
> > > > 	 * not exited yet. The processes, which have exited, are gone from the
> > > > 	 * list by sgx_mmu_notifier_release().
> > > > 	 */
> > > > 	for ( ; ; )  {
> > > > 		spin_lock(&encl->mm_lock);
> > > > 
> > > > 		if (list_empty(&encl->mm_list)) {
> > > > 			encl_mm = NULL;
> > > > 		} else {
> > > > 			encl_mm = list_first_entry(&encl->mm_list,
> > > > 						   struct sgx_encl_mm, list);
> > > > 			list_del_rcu(&encl_mm->list);
> > > > 		}
> > > > 
> > > > 		spin_unlock(&encl->mm_lock);
> > > > 
> > > > 		/* The enclave is no longer mapped by any mm. */
> > > > 		if (!encl_mm)
> > > > 			break;
> > > > 
> > > > 		synchronize_srcu(&encl->srcu);
> > > > 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> > > > 		kfree(encl_mm);
> > > > 	}
> > > > 
> > > > 
> > > > At this point all processes have closed the enclave file, but that doesn't
> > > > mean that they all have exited yet.
> > > > 
> > > > Now, let's imagine that there is exactly one entry in the encl->mm_list.
> > > > and sgx_release() execution gets scheduled right after returning from
> > > > synchronize_srcu().
> > > > 
> > > > With some bad luck, some process comes and removes that last entry befoe
> > > > sgx_release() acquires mm_lock. The loop in sgx_release() just leaves
> > > > 
> > > > 		/* The enclave is no longer mapped by any mm. */
> > > > 		if (!encl_mm)
> > > > 			break;
> > > > 
> > > > No synchronize_srcu().
> > > > 
> > > > After writing this, I think that the placement for synchronize_srcu()
> > > > in this patch is not best possible. It should be rather that the
> > > > above loop would also call synchronize_srcu() when leaving.
> > > > 
> > > > I.e. the code change would result:
> > > > 
> > > > 	for ( ; ; )  {
> > > > 		spin_lock(&encl->mm_lock);
> > > > 
> > > > 		if (list_empty(&encl->mm_list)) {
> > > > 			encl_mm = NULL;
> > > > 		} else {
> > > > 			encl_mm = list_first_entry(&encl->mm_list,
> > > > 						   struct sgx_encl_mm, list);
> > > > 			list_del_rcu(&encl_mm->list);
> > > > 		}
> > > > 
> > > > 		spin_unlock(&encl->mm_lock);
> > > > 
> > > >                 /* 
> > > >                  * synchronize_srcu() is mandatory *even* when the list was
> > > >                  * empty, in order make sure that grace periods stays in
> > > >                  * sync even when another task took away the last entry
> > > >                  * (i.e. exiting process when it deletes its mm_list).
> > > >                  */
> > > > 		synchronize_srcu(&encl->srcu);
> > > > 
> > > > 		/* The enclave is no longer mapped by any mm. */
> > > > 		if (!encl_mm)
> > > > 			break;
> > > > 
> > > > 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> > > > 		kfree(encl_mm);
> > > > 	}
> > > > 
> > > > What do you think? Does this start to make more sense now?
> > > > I don't have logs for this but the bug can be also reasoned.
> > > 
> > > It does. Now you need to write it up in a detailed form so that it is
> > > clear to readers months/years from now what exactly can happen. You can
> > > use a two-column format like
> > > 
> > > 	CPU A				CPU B
> > > 
> > > Bla
> > > 					Blu
> > > 
> > > This happens now here
> > > 					But this needs to happen there
> > > 
> > > and so on.
> > > 
> > > Also, from reading up a bit on this, Documentation/RCU/checklist.rst says
> > > 
> > > "Use of the expedited primitives should be restricted to rare
> > > configuration-change operations that would not normally be undertaken
> > > while a real-time workload is running."
> > > 
> > > so why are you using synchronize_srcu_expedited()? Grepping the tree
> > > reveals only a couple of call sites only... but I've almost no clue of
> > > RCU so lemme CC Paul.
> > 
> > It spun out of this discussion:
> > 
> > https://lore.kernel.org/linux-sgx/20201215213517.GA34761@kernel.org/raw
> > 
> > My reasoning was that this is not a common case. The main loop
> > that uses synchronize_srcu().
> 
> It seems to me that loading and unloading SGX enclaves qualifies as a
> configuration operation, so use of synchronize_srcu_expedited() should be
> just fine in that case.  This of course implies that SGX enclaves should
> not be loaded or unloaded while an aggressive real-time application
> is running.  Which might well be the case for other reasons.
> 
> So I believe synchronize_srcu_expedited() should be fine in this case.
> 
> 							Thanx, Paul

Thank you for explaining this in detail.

I'll leave it out of the bug fix, and reconsider as a separate patch. I
think it should be fine to use it here but is really out-of-scope for the
change.

/Jarkko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
  2021-01-14  4:42     ` Haitao Huang
@ 2021-01-15  9:58       ` Jarkko Sakkinen
  0 siblings, 0 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2021-01-15  9:58 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Borislav Petkov, x86, linux-kernel, linux-sgx,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Jethro Beekman

On Wed, Jan 13, 2021 at 10:42:12PM -0600, Haitao Huang wrote:
> On Mon, 11 Jan 2021 18:08:10 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > On Tue, Jan 05, 2021 at 03:57:49PM +0100, Borislav Petkov wrote:
> > > On Wed, Dec 16, 2020 at 03:49:20PM +0200, Jarkko Sakkinen wrote:
> > > > Add synchronize_srcu_expedited() to sgx_encl_release() to catch a
> > > grace
> > > > period initiated by sgx_mmu_notifier_release().
> > > >
> > > > A trivial example of a failing sequence with tasks A and B:
> > > >
> > > > 1. A: -> sgx_release()
> > > > 2. B: -> sgx_mmu_notifier_release()
> > > > 3. B: -> list_del_rcu()
> > > > 3. A: -> sgx_encl_release()
> > > > 4. A: -> cleanup_srcu_struct()
> > > >
> > > > The loop in sgx_release() observes an empty list because B has
> > > removed its
> > > > entry in the middle, and calls cleanup_srcu_struct() before B has
> > > a chance
> > > > to calls synchronize_srcu().
> > > 
> > > Leading to what? NULL ptr?
> > > 
> > > https://lkml.kernel.org/r/X9e2jOWz1hfXVpQ5@google.com
> > > 
> > > already suggested that you should explain the bug better and add the
> > > splat but I'm still missing that explanation.
> > 
> > OK, I'll try to explain it how I understand the issue.
> > 
> > Consider this loop in the VFS release hook (sgx_release):
> > 
> > 	/*
> > 	 * Drain the remaining mm_list entries. At this point the list contains
> > 	 * entries for processes, which have closed the enclave file but have
> > 	 * not exited yet. The processes, which have exited, are gone from the
> > 	 * list by sgx_mmu_notifier_release().
> > 	 */
> > 	for ( ; ; )  {
> > 		spin_lock(&encl->mm_lock);
> > 
> > 		if (list_empty(&encl->mm_list)) {
> > 			encl_mm = NULL;
> > 		} else {
> > 			encl_mm = list_first_entry(&encl->mm_list,
> > 						   struct sgx_encl_mm, list);
> > 			list_del_rcu(&encl_mm->list);
> > 		}
> > 
> > 		spin_unlock(&encl->mm_lock);
> > 
> > 		/* The enclave is no longer mapped by any mm. */
> > 		if (!encl_mm)
> > 			break;
> > 
> > 		synchronize_srcu(&encl->srcu);
> > 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> > 		kfree(encl_mm);
> > 	}
> > 
> > 
> > At this point all processes have closed the enclave file, but that
> > doesn't
> > mean that they all have exited yet.
> > 
> > Now, let's imagine that there is exactly one entry in the encl->mm_list.
> > and sgx_release() execution gets scheduled right after returning from
> > synchronize_srcu().
> > 
> > With some bad luck, some process comes and removes that last entry befoe
> > sgx_release() acquires mm_lock. The loop in sgx_release() just leaves
> > 
> > 		/* The enclave is no longer mapped by any mm. */
> > 		if (!encl_mm)
> > 			break;
> > 
> > No synchronize_srcu().
> > 
> > After writing this, I think that the placement for synchronize_srcu()
> > in this patch is not best possible. It should be rather that the
> > above loop would also call synchronize_srcu() when leaving.
> > 
> > I.e. the code change would result:
> > 
> > 	for ( ; ; )  {
> > 		spin_lock(&encl->mm_lock);
> > 
> > 		if (list_empty(&encl->mm_list)) {
> > 			encl_mm = NULL;
> > 		} else {
> > 			encl_mm = list_first_entry(&encl->mm_list,
> > 						   struct sgx_encl_mm, list);
> > 			list_del_rcu(&encl_mm->list);
> > 		}
> > 
> > 		spin_unlock(&encl->mm_lock);
> > 
> >                 /*
> >                  * synchronize_srcu() is mandatory *even* when the list
> > was
> >                  * empty, in order make sure that grace periods stays in
> >                  * sync even when another task took away the last entry
> >                  * (i.e. exiting process when it deletes its mm_list).
> >                  */
> > 		synchronize_srcu(&encl->srcu);
> > 
> > 		/* The enclave is no longer mapped by any mm. */
> > 		if (!encl_mm)
> > 			break;
> > 
> > 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> > 		kfree(encl_mm);
> > 	}
> > 
> > What do you think? Does this start to make more sense now?
> > I don't have logs for this but the bug can be also reasoned.
> > 
> > /Jarkko
> 
> I did this experiment just now and find it runs much much slower than both
> original code and code with synchronize_srcu_expedited fix in this patch.
> Haitao

Yeah, but using expedited is not really in the scope of a bug fix.

It's a change that needs to be considered separately.

And it's more complicated than that. It also needs data to back it
up. How we can test and compare?

/Jarkko

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-01-15  9:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-16 13:49 [PATCH v3] x86/sgx: Synchronize encl->srcu in sgx_encl_release() Jarkko Sakkinen
2021-01-04 20:22 ` Haitao Huang
2021-01-11 23:41   ` Jarkko Sakkinen
2021-01-05 14:57 ` Borislav Petkov
2021-01-12  0:08   ` Jarkko Sakkinen
2021-01-12 18:35     ` Borislav Petkov
2021-01-12 20:11       ` Paul E. McKenney
2021-01-13 17:18       ` Jarkko Sakkinen
2021-01-13 17:46         ` Paul E. McKenney
2021-01-13 18:00           ` Borislav Petkov
2021-01-13 18:22             ` Paul E. McKenney
2021-01-15  1:49           ` Jarkko Sakkinen
2021-01-14  4:42     ` Haitao Huang
2021-01-15  9:58       ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).