* [PATCH] x86/sgx: Silence softlockup detection when releasing large enclaves
@ 2022-01-18 19:14 Reinette Chatre
  2022-01-20 13:01 ` Jarkko Sakkinen
  0 siblings, 1 reply; 5+ messages in thread
From: Reinette Chatre @ 2022-01-18 19:14 UTC
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: linux-kernel, stable

Vijay reported that the "unclobbered_vdso_oversubscribed" selftest
triggers the softlockup detector.

Actual SGX systems have 128GB of enclave memory or more.  The
"unclobbered_vdso_oversubscribed" selftest creates one enclave which
consumes all of the enclave memory on the system. Tearing down such a
large enclave takes around a minute, most of it in the loop where
the EREMOVE instruction is applied to each individual 4k enclave page.

Spending one minute in a loop triggers the softlockup detector.

Add a cond_resched() to give other tasks a chance to run and placate
the softlockup detector.

Cc: stable@vger.kernel.org
Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Reported-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Softlockup message:
watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [test_sgx:11502]
Kernel panic - not syncing: softlockup: hung tasks
<snip>
sgx_encl_release+0x86/0x1c0
sgx_release+0x11c/0x130
__fput+0xb0/0x280
____fput+0xe/0x10
task_work_run+0x6c/0xc0
exit_to_user_mode_prepare+0x1eb/0x1f0
syscall_exit_to_user_mode+0x1d/0x50
do_syscall_64+0x46/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae

 arch/x86/kernel/cpu/sgx/encl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..ab2b79327a8a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -410,6 +410,7 @@ void sgx_encl_release(struct kref *ref)
 		}
 
 		kfree(entry);
+		cond_resched();
 	}
 
 	xa_destroy(&encl->page_array);
-- 
2.25.1
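
To put the numbers in the commit message in perspective, here is a rough back-of-the-envelope sketch in plain userspace C. It is not part of the patch. The helper names are made up for illustration, and the 20 second threshold used in the note below is an assumption (the usual default of twice the 10 second watchdog_thresh); the report above shows the detector firing at 22s.

```c
#include <assert.h>
#include <stdint.h>

/* Enclave pages are 4k, as stated in the commit message. */
#define ENCL_PAGE_SIZE 4096ULL

/* Number of 4k pages in an enclave of the given size (hypothetical helper). */
static uint64_t enclave_pages(uint64_t enclave_bytes)
{
	return enclave_bytes / ENCL_PAGE_SIZE;
}

/* Average time spent per page if the whole teardown takes total_secs. */
static uint64_t ns_per_page(uint64_t pages, uint64_t total_secs)
{
	assert(pages > 0);
	return total_secs * 1000000000ULL / pages;
}

/*
 * Number of pages handled before threshold_secs have elapsed, assuming
 * every page takes the same amount of time.
 */
static uint64_t pages_before_threshold(uint64_t pages, uint64_t total_secs,
				       uint64_t threshold_secs)
{
	assert(total_secs > 0);
	return pages * threshold_secs / total_secs;
}
```

With a 128GB enclave (taken as 128 << 30 bytes) and a one minute teardown, enclave_pages() gives 33554432 pages and ns_per_page() gives roughly 1788 ns per page. Under the assumed 20 second threshold, pages_before_threshold() comes out at about 11 million pages, so the loop would trip the detector roughly a third of the way through unless it yields.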



* Re: [PATCH] x86/sgx: Silence softlockup detection when releasing large enclaves
  2022-01-18 19:14 [PATCH] x86/sgx: Silence softlockup detection when releasing large enclaves Reinette Chatre
@ 2022-01-20 13:01 ` Jarkko Sakkinen
  2022-01-20 16:28   ` Reinette Chatre
  0 siblings, 1 reply; 5+ messages in thread
From: Jarkko Sakkinen @ 2022-01-20 13:01 UTC
  To: Reinette Chatre, dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: linux-kernel, stable

On Tue, 2022-01-18 at 11:14 -0800, Reinette Chatre wrote:
> Vijay reported that the "unclobbered_vdso_oversubscribed" selftest
> triggers the softlockup detector.
> 
> Actual SGX systems have 128GB of enclave memory or more.  The
> "unclobbered_vdso_oversubscribed" selftest creates one enclave which
> consumes all of the enclave memory on the system. Tearing down such a
> large enclave takes around a minute, most of it in the loop where
> the EREMOVE instruction is applied to each individual 4k enclave
> page.
> 
> Spending one minute in a loop triggers the softlockup detector.
> 
> Add a cond_resched() to give other tasks a chance to run and placate
> the softlockup detector.
> 
> Cc: stable@vger.kernel.org
> Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
> Reported-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Softlockup message:
> watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [test_sgx:11502]
> Kernel panic - not syncing: softlockup: hung tasks
> <snip>
> sgx_encl_release+0x86/0x1c0
> sgx_release+0x11c/0x130
> __fput+0xb0/0x280
> ____fput+0xe/0x10
> task_work_run+0x6c/0xc0
> exit_to_user_mode_prepare+0x1eb/0x1f0
> syscall_exit_to_user_mode+0x1d/0x50
> do_syscall_64+0x46/0xb0
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
>  arch/x86/kernel/cpu/sgx/encl.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c
> b/arch/x86/kernel/cpu/sgx/encl.c
> index 001808e3901c..ab2b79327a8a 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -410,6 +410,7 @@ void sgx_encl_release(struct kref *ref)
>                 }
>  
>                 kfree(entry);
> +               cond_resched();
>         }
>  
>         xa_destroy(&encl->page_array);

I'd add a comment, e.g.

/* Invoke scheduler to prevent soft lockups. */

Other than that makes sense.

BR, Jarkko



* Re: [PATCH] x86/sgx: Silence softlockup detection when releasing large enclaves
  2022-01-20 13:01 ` Jarkko Sakkinen
@ 2022-01-20 16:28   ` Reinette Chatre
  2022-01-26 14:29     ` Jarkko Sakkinen
  0 siblings, 1 reply; 5+ messages in thread
From: Reinette Chatre @ 2022-01-20 16:28 UTC
  To: Jarkko Sakkinen, dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: linux-kernel, stable

Hi Jarkko,

On 1/20/2022 5:01 AM, Jarkko Sakkinen wrote:
> On Tue, 2022-01-18 at 11:14 -0800, Reinette Chatre wrote:
>> Vijay reported that the "unclobbered_vdso_oversubscribed" selftest
>> triggers the softlockup detector.
>>
>> Actual SGX systems have 128GB of enclave memory or more.  The
>> "unclobbered_vdso_oversubscribed" selftest creates one enclave which
>> consumes all of the enclave memory on the system. Tearing down such a
>> large enclave takes around a minute, most of it in the loop where
>> the EREMOVE instruction is applied to each individual 4k enclave
>> page.
>>
>> Spending one minute in a loop triggers the softlockup detector.
>>
>> Add a cond_resched() to give other tasks a chance to run and placate
>> the softlockup detector.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
>> Reported-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>> Softlockup message:
>> watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [test_sgx:11502]
>> Kernel panic - not syncing: softlockup: hung tasks
>> <snip>
>> sgx_encl_release+0x86/0x1c0
>> sgx_release+0x11c/0x130
>> __fput+0xb0/0x280
>> ____fput+0xe/0x10
>> task_work_run+0x6c/0xc0
>> exit_to_user_mode_prepare+0x1eb/0x1f0
>> syscall_exit_to_user_mode+0x1d/0x50
>> do_syscall_64+0x46/0xb0
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>>  arch/x86/kernel/cpu/sgx/encl.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c
>> b/arch/x86/kernel/cpu/sgx/encl.c
>> index 001808e3901c..ab2b79327a8a 100644
>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>> @@ -410,6 +410,7 @@ void sgx_encl_release(struct kref *ref)
>>                 }
>>  
>>                 kfree(entry);
>> +               cond_resched();
>>         }
>>  
>>         xa_destroy(&encl->page_array);
> 
> I'd add a comment, e.g.
> 
> /* Invoke scheduler to prevent soft lockups. */

I could do that. I would like to point out though that there are already
six other usages of cond_resched() in the driver and it does indeed
seem to be the common pattern. When adding this comment to the now
seventh usage it would be the first comment documenting the usage of
cond_resched() in the driver.
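
For readers less familiar with the pattern under discussion, here is a minimal userspace sketch of a teardown loop that yields once per item, with the kind of comment Jarkko suggests. It is an analogue only: sched_yield() stands in for the kernel's cond_resched(), free() stands in for kfree(), and struct page_entry and release_entries() are hypothetical names, not the driver's code.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-page enclave entry. */
struct page_entry {
	unsigned long addr;
};

/*
 * Free every non-NULL entry in the array and return how many were freed.
 * The loop yields after each slot so that a very long array does not
 * monopolize the CPU.
 */
static size_t release_entries(struct page_entry **entries, size_t count)
{
	size_t freed = 0;

	assert(entries != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (entries[i]) {
			free(entries[i]);
			entries[i] = NULL;
			freed++;
		}

		/* Userspace stand-in for cond_resched(): let other tasks run. */
		sched_yield();
	}

	return freed;
}
```

The yield sits at the end of the loop body, mirroring where the patch places cond_resched() after kfree(entry), so it runs once per iteration regardless of what the iteration did.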

> 
> Other than that makes sense.

Thank you very much for taking a look.

Reinette


* Re: [PATCH] x86/sgx: Silence softlockup detection when releasing large enclaves
  2022-01-20 16:28   ` Reinette Chatre
@ 2022-01-26 14:29     ` Jarkko Sakkinen
  2022-01-26 14:30       ` Jarkko Sakkinen
  0 siblings, 1 reply; 5+ messages in thread
From: Jarkko Sakkinen @ 2022-01-26 14:29 UTC
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, linux-kernel, stable

On Thu, Jan 20, 2022 at 08:28:36AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 1/20/2022 5:01 AM, Jarkko Sakkinen wrote:
> > On Tue, 2022-01-18 at 11:14 -0800, Reinette Chatre wrote:
> >> Vijay reported that the "unclobbered_vdso_oversubscribed" selftest
> >> triggers the softlockup detector.
> >>
> >> Actual SGX systems have 128GB of enclave memory or more.  The
> >> "unclobbered_vdso_oversubscribed" selftest creates one enclave which
> >> consumes all of the enclave memory on the system. Tearing down such a
> >> large enclave takes around a minute, most of it in the loop where
> >> the EREMOVE instruction is applied to each individual 4k enclave
> >> page.
> >>
> >> Spending one minute in a loop triggers the softlockup detector.
> >>
> >> Add a cond_resched() to give other tasks a chance to run and placate
> >> the softlockup detector.
> >>
> >> Cc: stable@vger.kernel.org
> >> Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
> >> Reported-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
> >> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> >> ---
> >> Softlockup message:
> >> watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [test_sgx:11502]
> >> Kernel panic - not syncing: softlockup: hung tasks
> >> <snip>
> >> sgx_encl_release+0x86/0x1c0
> >> sgx_release+0x11c/0x130
> >> __fput+0xb0/0x280
> >> ____fput+0xe/0x10
> >> task_work_run+0x6c/0xc0
> >> exit_to_user_mode_prepare+0x1eb/0x1f0
> >> syscall_exit_to_user_mode+0x1d/0x50
> >> do_syscall_64+0x46/0xb0
> >> entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>
> >>  arch/x86/kernel/cpu/sgx/encl.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/arch/x86/kernel/cpu/sgx/encl.c
> >> b/arch/x86/kernel/cpu/sgx/encl.c
> >> index 001808e3901c..ab2b79327a8a 100644
> >> --- a/arch/x86/kernel/cpu/sgx/encl.c
> >> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> >> @@ -410,6 +410,7 @@ void sgx_encl_release(struct kref *ref)
> >>                 }
> >>  
> >>                 kfree(entry);
> >> +               cond_resched();
> >>         }
> >>  
> >>         xa_destroy(&encl->page_array);
> > 
> > I'd add a comment, e.g.
> > 
> > /* Invoke scheduler to prevent soft lockups. */
> 
> I could do that. I would like to point out though that there are already
> six other usages of cond_resched() in the driver and it does indeed
> seem to be the common pattern. When adding this comment to the now
> seventh usage it would be the first comment documenting the usage of
> cond_resched() in the driver.
> 
> > 
> > Other than that makes sense.
> 
> Thank you very much for taking a look.

Well, I believe inline comments can be introduced gradually. Since one was
missing here, a reminder makes sense.

/Jarkko


* Re: [PATCH] x86/sgx: Silence softlockup detection when releasing large enclaves
  2022-01-26 14:29     ` Jarkko Sakkinen
@ 2022-01-26 14:30       ` Jarkko Sakkinen
  0 siblings, 0 replies; 5+ messages in thread
From: Jarkko Sakkinen @ 2022-01-26 14:30 UTC
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, linux-kernel, stable

On Wed, Jan 26, 2022 at 04:29:12PM +0200, Jarkko Sakkinen wrote:
> On Thu, Jan 20, 2022 at 08:28:36AM -0800, Reinette Chatre wrote:
> > Hi Jarkko,
> > 
> > On 1/20/2022 5:01 AM, Jarkko Sakkinen wrote:
> > > On Tue, 2022-01-18 at 11:14 -0800, Reinette Chatre wrote:
> > >> Vijay reported that the "unclobbered_vdso_oversubscribed" selftest
> > >> triggers the softlockup detector.
> > >>
> > >> Actual SGX systems have 128GB of enclave memory or more.  The
> > >> "unclobbered_vdso_oversubscribed" selftest creates one enclave which
> > >> consumes all of the enclave memory on the system. Tearing down such a
> > >> large enclave takes around a minute, most of it in the loop where
> > >> the EREMOVE instruction is applied to each individual 4k enclave
> > >> page.
> > >>
> > >> Spending one minute in a loop triggers the softlockup detector.
> > >>
> > >> Add a cond_resched() to give other tasks a chance to run and placate
> > >> the softlockup detector.
> > >>
> > >> Cc: stable@vger.kernel.org
> > >> Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
> > >> Reported-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
> > >> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> > >> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > >> ---
> > >> Softlockup message:
> > >> watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [test_sgx:11502]
> > >> Kernel panic - not syncing: softlockup: hung tasks
> > >> <snip>
> > >> sgx_encl_release+0x86/0x1c0
> > >> sgx_release+0x11c/0x130
> > >> __fput+0xb0/0x280
> > >> ____fput+0xe/0x10
> > >> task_work_run+0x6c/0xc0
> > >> exit_to_user_mode_prepare+0x1eb/0x1f0
> > >> syscall_exit_to_user_mode+0x1d/0x50
> > >> do_syscall_64+0x46/0xb0
> > >> entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >>
> > >>  arch/x86/kernel/cpu/sgx/encl.c | 1 +
> > >>  1 file changed, 1 insertion(+)
> > >>
> > >> diff --git a/arch/x86/kernel/cpu/sgx/encl.c
> > >> b/arch/x86/kernel/cpu/sgx/encl.c
> > >> index 001808e3901c..ab2b79327a8a 100644
> > >> --- a/arch/x86/kernel/cpu/sgx/encl.c
> > >> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > >> @@ -410,6 +410,7 @@ void sgx_encl_release(struct kref *ref)
> > >>                 }
> > >>  
> > >>                 kfree(entry);
> > >> +               cond_resched();
> > >>         }
> > >>  
> > >>         xa_destroy(&encl->page_array);
> > > 
> > > I'd add a comment, e.g.
> > > 
> > > /* Invoke scheduler to prevent soft lockups. */
> > 
> > I could do that. I would like to point out though that there are already
> > six other usages of cond_resched() in the driver and it does indeed
> > seem to be the common pattern. When adding this comment to the now
> > seventh usage it would be the first comment documenting the usage of
> > cond_resched() in the driver.
> > 
> > > 
> > > Other than that makes sense.
> > 
> > Thank you very much for taking a look.
> 
> Well, I believe inline comments can be introduced gradually. Since one was
> missing here, a reminder makes sense.

E.g. there are a gazillion uses of kmalloc() in the kernel, but still not all
of them have a comment bound to them...

BR, Jarkko
