linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] KVM: SVM: Periodically schedule when unregistering regions on destroy
@ 2020-08-25 19:56 David Rientjes
  2020-09-11  7:57 ` David Rientjes
  2020-09-11 17:24 ` Paolo Bonzini
  0 siblings, 2 replies; 3+ messages in thread
From: David Rientjes @ 2020-08-25 19:56 UTC (permalink / raw)
  To: Tom Lendacky, Brijesh Singh, Paolo Bonzini, Sean Christopherson
  Cc: Joerg Roedel, linux-kernel, x86, kvm

There may be many encrypted regions that need to be unregistered when a
SEV VM is destroyed.  This can lead to soft lockups.  For example, on a
host running 4.15:

watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348]
CPU: 206 PID: 194348 Comm: t_virtual_machi
RIP: 0010:free_unref_page_list+0x105/0x170
...
Call Trace:
 [<0>] release_pages+0x159/0x3d0
 [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd]
 [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd]
 [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd]
 [<0>] kvm_arch_destroy_vm+0x47/0x200
 [<0>] kvm_put_kvm+0x1a8/0x2f0
 [<0>] kvm_vm_release+0x25/0x30
 [<0>] do_exit+0x335/0xc10
 [<0>] do_group_exit+0x3f/0xa0
 [<0>] get_signal+0x1bc/0x670
 [<0>] do_signal+0x31/0x130

Although the CLFLUSH is no longer issued on every encrypted region to be
unregistered, there are no other changes that can prevent soft lockups for
very large SEV VMs in the latest kernel.

Periodically schedule if necessary.  This still holds kvm->lock across the
resched, but since this only happens when the VM is destroyed this is
assumed to be acceptable.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/x86/kvm/svm/sev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1106,6 +1106,7 @@ void sev_vm_destroy(struct kvm *kvm)
 		list_for_each_safe(pos, q, head) {
 			__unregister_enc_region_locked(kvm,
 				list_entry(pos, struct enc_region, list));
+			cond_resched();
 		}
 	}
 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [patch] KVM: SVM: Periodically schedule when unregistering regions on destroy
  2020-08-25 19:56 [patch] KVM: SVM: Periodically schedule when unregistering regions on destroy David Rientjes
@ 2020-09-11  7:57 ` David Rientjes
  2020-09-11 17:24 ` Paolo Bonzini
  1 sibling, 0 replies; 3+ messages in thread
From: David Rientjes @ 2020-09-11  7:57 UTC (permalink / raw)
  To: Tom Lendacky, Brijesh Singh, Paolo Bonzini, Sean Christopherson
  Cc: Joerg Roedel, linux-kernel, x86, kvm

Paolo, ping?

On Tue, 25 Aug 2020, David Rientjes wrote:

> There may be many encrypted regions that need to be unregistered when a
> SEV VM is destroyed.  This can lead to soft lockups.  For example, on a
> host running 4.15:
> 
> watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348]
> CPU: 206 PID: 194348 Comm: t_virtual_machi
> RIP: 0010:free_unref_page_list+0x105/0x170
> ...
> Call Trace:
>  [<0>] release_pages+0x159/0x3d0
>  [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd]
>  [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd]
>  [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd]
>  [<0>] kvm_arch_destroy_vm+0x47/0x200
>  [<0>] kvm_put_kvm+0x1a8/0x2f0
>  [<0>] kvm_vm_release+0x25/0x30
>  [<0>] do_exit+0x335/0xc10
>  [<0>] do_group_exit+0x3f/0xa0
>  [<0>] get_signal+0x1bc/0x670
>  [<0>] do_signal+0x31/0x130
> 
> Although the CLFLUSH is no longer issued on every encrypted region to be
> unregistered, there are no other changes that can prevent soft lockups for
> very large SEV VMs in the latest kernel.
> 
> Periodically schedule if necessary.  This still holds kvm->lock across the
> resched, but since this only happens when the VM is destroyed this is
> assumed to be acceptable.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  arch/x86/kvm/svm/sev.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1106,6 +1106,7 @@ void sev_vm_destroy(struct kvm *kvm)
>  		list_for_each_safe(pos, q, head) {
>  			__unregister_enc_region_locked(kvm,
>  				list_entry(pos, struct enc_region, list));
> +			cond_resched();
>  		}
>  	}
>  
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [patch] KVM: SVM: Periodically schedule when unregistering regions on destroy
  2020-08-25 19:56 [patch] KVM: SVM: Periodically schedule when unregistering regions on destroy David Rientjes
  2020-09-11  7:57 ` David Rientjes
@ 2020-09-11 17:24 ` Paolo Bonzini
  1 sibling, 0 replies; 3+ messages in thread
From: Paolo Bonzini @ 2020-09-11 17:24 UTC (permalink / raw)
  To: David Rientjes, Tom Lendacky, Brijesh Singh, Sean Christopherson
  Cc: Joerg Roedel, linux-kernel, x86, kvm

On 25/08/20 21:56, David Rientjes wrote:
> There may be many encrypted regions that need to be unregistered when a
> SEV VM is destroyed.  This can lead to soft lockups.  For example, on a
> host running 4.15:
> 
> watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348]
> CPU: 206 PID: 194348 Comm: t_virtual_machi
> RIP: 0010:free_unref_page_list+0x105/0x170
> ...
> Call Trace:
>  [<0>] release_pages+0x159/0x3d0
>  [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd]
>  [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd]
>  [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd]
>  [<0>] kvm_arch_destroy_vm+0x47/0x200
>  [<0>] kvm_put_kvm+0x1a8/0x2f0
>  [<0>] kvm_vm_release+0x25/0x30
>  [<0>] do_exit+0x335/0xc10
>  [<0>] do_group_exit+0x3f/0xa0
>  [<0>] get_signal+0x1bc/0x670
>  [<0>] do_signal+0x31/0x130
> 
> Although the CLFLUSH is no longer issued on every encrypted region to be
> unregistered, there are no other changes that can prevent soft lockups for
> very large SEV VMs in the latest kernel.
> 
> Periodically schedule if necessary.  This still holds kvm->lock across the
> resched, but since this only happens when the VM is destroyed this is
> assumed to be acceptable.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  arch/x86/kvm/svm/sev.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1106,6 +1106,7 @@ void sev_vm_destroy(struct kvm *kvm)
>  		list_for_each_safe(pos, q, head) {
>  			__unregister_enc_region_locked(kvm,
>  				list_entry(pos, struct enc_region, list));
> +			cond_resched();
>  		}
>  	}
>  
> 

Queued, thanks.  Sorry for the delay.

I am currently on leave so I am going through the patches and queuing
them, but I will only push kvm/next and kvm/queue next week.  kvm/master
patches will be sent to Linus for the next -rc though.

Paolo


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-09-11 17:25 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-25 19:56 [patch] KVM: SVM: Periodically schedule when unregistering regions on destroy David Rientjes
2020-09-11  7:57 ` David Rientjes
2020-09-11 17:24 ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).