* [PATCH] KVM: x86: Move kzalloc out of atomic context on PREEMPT_RT
@ 2022-05-19  9:02 Yajun Deng
  2022-05-19 15:11 ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: Yajun Deng @ 2022-05-19  9:02 UTC (permalink / raw)
  To: pbonzini, seanjc, vkuznets, wanpengli, jmattson, joro, tglx,
	mingo, bp, dave.hansen, hpa
  Cc: x86, kvm, linux-kernel, Yajun Deng

The memory allocator is fully preemptible and therefore cannot
be invoked from truly atomic contexts.

See Documentation/locking/locktypes.rst (line: 470)

Add raw_spin_unlock() before memory allocation and raw_spin_lock()
after it.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
---
 arch/x86/kernel/kvm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index d0bb2b3fb305..8f8ec9bbd847 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -205,7 +205,9 @@ void kvm_async_pf_task_wake(u32 token)
 		 * async PF was not yet handled.
 		 * Add dummy entry for the token.
 		 */
-		n = kzalloc(sizeof(*n), GFP_ATOMIC);
+		raw_spin_unlock(&b->lock);
+		n = kzalloc(sizeof(*n), GFP_KERNEL);
+		raw_spin_lock(&b->lock);
 		if (!n) {
 			/*
 			 * Allocation failed! Busy wait while other cpu
-- 
2.25.1



* Re: [PATCH] KVM: x86: Move kzalloc out of atomic context on PREEMPT_RT
  2022-05-19  9:02 [PATCH] KVM: x86: Move kzalloc out of atomic context on PREEMPT_RT Yajun Deng
@ 2022-05-19 15:11 ` Sean Christopherson
  2022-05-20 13:41   ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Sean Christopherson @ 2022-05-19 15:11 UTC (permalink / raw)
  To: Yajun Deng
  Cc: pbonzini, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp,
	dave.hansen, hpa, x86, kvm, linux-kernel

"x86/kvm:" is the preferred shortlog scope for the guest side of things, "KVM: x86"
is for the host, i.e. for arch/x86/kvm.

On Thu, May 19, 2022, Yajun Deng wrote:
> The memory allocator is fully preemptible and therefore cannot
> be invoked from truly atomic contexts.
> 
> See Documentation/locking/locktypes.rst (line: 470)
> 
> Add raw_spin_unlock() before memory allocation and raw_spin_lock()
> after it.
> 
> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
> ---
>  arch/x86/kernel/kvm.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index d0bb2b3fb305..8f8ec9bbd847 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -205,7 +205,9 @@ void kvm_async_pf_task_wake(u32 token)
>  		 * async PF was not yet handled.
>  		 * Add dummy entry for the token.
>  		 */
> -		n = kzalloc(sizeof(*n), GFP_ATOMIC);
> +		raw_spin_unlock(&b->lock);
> +		n = kzalloc(sizeof(*n), GFP_KERNEL);
> +		raw_spin_lock(&b->lock);

This is flawed, if the async #PF is handled while the lock is dropped then this
will enqueue a second, duplicate entry and not call apf_task_wake_one() as it
should.  I.e. two entries will be leaked.

AFAICT, kfree() is safe to call under a raw spinlock, so this?  Compile tested
only...

--
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 19 May 2022 07:57:11 -0700
Subject: [PATCH] x86/kvm: Alloc dummy async #PF token outside of raw spinlock

Drop the raw spinlock in kvm_async_pf_task_wake() before allocating the
dummy async #PF token; the allocator is preemptible on PREEMPT_RT
kernels and must not be called from truly atomic contexts.

Opportunistically document why it's ok to loop on allocation failure,
i.e. why the function won't get stuck in an infinite loop.

Reported-by: Yajun Deng <yajun.deng@linux.dev>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/kvm.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index d0bb2b3fb305..5a4100896969 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -190,7 +190,7 @@ void kvm_async_pf_task_wake(u32 token)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
-	struct kvm_task_sleep_node *n;
+	struct kvm_task_sleep_node *n, *dummy = NULL;

 	if (token == ~0) {
 		apf_task_wake_all();
@@ -202,24 +202,34 @@ void kvm_async_pf_task_wake(u32 token)
 	n = _find_apf_task(b, token);
 	if (!n) {
 		/*
-		 * async PF was not yet handled.
-		 * Add dummy entry for the token.
+		 * Async #PF not yet handled, add a dummy entry for the token.
+		 * Allocating the token must be done outside of the raw lock
+		 * as the allocator is preemptible on PREEMPT_RT kernels.
 		 */
-		n = kzalloc(sizeof(*n), GFP_ATOMIC);
-		if (!n) {
-			/*
-			 * Allocation failed! Busy wait while other cpu
-			 * handles async PF.
-			 */
+		if (!dummy) {
 			raw_spin_unlock(&b->lock);
-			cpu_relax();
+			dummy = kzalloc(sizeof(*dummy), GFP_KERNEL);
+
+			/*
+			 * Continue looping on allocation failure, eventually
+			 * the async #PF will be handled and allocating a new
+			 * node will be unnecessary.
+			 */
+			if (!dummy)
+				cpu_relax();
+
+			/*
+			 * Recheck for async #PF completion before enqueueing
+			 * the dummy token to avoid duplicate list entries.
+			 */
 			goto again;
 		}
-		n->token = token;
-		n->cpu = smp_processor_id();
-		init_swait_queue_head(&n->wq);
-		hlist_add_head(&n->link, &b->list);
+		dummy->token = token;
+		dummy->cpu = smp_processor_id();
+		init_swait_queue_head(&dummy->wq);
+		hlist_add_head(&dummy->link, &b->list);
 	} else {
+		kfree(dummy);
 		apf_task_wake_one(n);
 	}
 	raw_spin_unlock(&b->lock);

base-commit: a3808d88461270c71d3fece5e51cc486ecdac7d0
--



* Re: [PATCH] KVM: x86: Move kzalloc out of atomic context on PREEMPT_RT
  2022-05-19 15:11 ` Sean Christopherson
@ 2022-05-20 13:41   ` Paolo Bonzini
  2022-05-20 14:49     ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2022-05-20 13:41 UTC (permalink / raw)
  To: Sean Christopherson, Yajun Deng
  Cc: vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp,
	dave.hansen, hpa, x86, kvm, linux-kernel

On 5/19/22 17:11, Sean Christopherson wrote:
> AFAICT, kfree() is safe to call under a raw spinlock, so this?  Compile tested
> only...

Moving the free outside the lock is simple enough that it's not worth
checking whether kfree() is safe inside it:

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6aa1241a80b7..f849f7c9fbf2 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -229,12 +229,15 @@ void kvm_async_pf_task_wake(u32 token)
  		dummy->cpu = smp_processor_id();
  		init_swait_queue_head(&dummy->wq);
  		hlist_add_head(&dummy->link, &b->list);
+		dummy = NULL;
  	} else {
-		kfree(dummy);
  		apf_task_wake_one(n);
  	}
  	raw_spin_unlock(&b->lock);
-	return;
+
+	/* A dummy token might be allocated and ultimately not used.  */
+	if (dummy)
+		kfree(dummy);
  }
  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);


I queued your patch with the above fixup.

Paolo
> --
> From: Sean Christopherson <seanjc@google.com>
> Date: Thu, 19 May 2022 07:57:11 -0700
> Subject: [PATCH] x86/kvm: Alloc dummy async #PF token outside of raw spinlock
> 
> Drop the raw spinlock in kvm_async_pf_task_wake() before allocating the
> dummy async #PF token; the allocator is preemptible on PREEMPT_RT
> kernels and must not be called from truly atomic contexts.
> 
> Opportunistically document why it's ok to loop on allocation failure,
> i.e. why the function won't get stuck in an infinite loop.
> 
> Reported-by: Yajun Deng <yajun.deng@linux.dev>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kernel/kvm.c | 38 ++++++++++++++++++++++++--------------
>   1 file changed, 24 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index d0bb2b3fb305..5a4100896969 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -190,7 +190,7 @@ void kvm_async_pf_task_wake(u32 token)
>   {
>   	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
>   	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
> -	struct kvm_task_sleep_node *n;
> +	struct kvm_task_sleep_node *n, *dummy = NULL;
> 
>   	if (token == ~0) {
>   		apf_task_wake_all();
> @@ -202,24 +202,34 @@ void kvm_async_pf_task_wake(u32 token)
>   	n = _find_apf_task(b, token);
>   	if (!n) {
>   		/*
> -		 * async PF was not yet handled.
> -		 * Add dummy entry for the token.
> +		 * Async #PF not yet handled, add a dummy entry for the token.
> +		 * Allocating the token must be done outside of the raw lock
> +		 * as the allocator is preemptible on PREEMPT_RT kernels.
>   		 */
> -		n = kzalloc(sizeof(*n), GFP_ATOMIC);
> -		if (!n) {
> -			/*
> -			 * Allocation failed! Busy wait while other cpu
> -			 * handles async PF.
> -			 */
> +		if (!dummy) {
>   			raw_spin_unlock(&b->lock);
> -			cpu_relax();
> +			dummy = kzalloc(sizeof(*dummy), GFP_KERNEL);
> +
> +			/*
> +			 * Continue looping on allocation failure, eventually
> +			 * the async #PF will be handled and allocating a new
> +			 * node will be unnecessary.
> +			 */
> +			if (!dummy)
> +				cpu_relax();
> +
> +			/*
> +			 * Recheck for async #PF completion before enqueueing
> +			 * the dummy token to avoid duplicate list entries.
> +			 */
>   			goto again;
>   		}
> -		n->token = token;
> -		n->cpu = smp_processor_id();
> -		init_swait_queue_head(&n->wq);
> -		hlist_add_head(&n->link, &b->list);
> +		dummy->token = token;
> +		dummy->cpu = smp_processor_id();
> +		init_swait_queue_head(&dummy->wq);
> +		hlist_add_head(&dummy->link, &b->list);
>   	} else {
> +		kfree(dummy);
>   		apf_task_wake_one(n);
>   	}
>   	raw_spin_unlock(&b->lock);
> 
> base-commit: a3808d88461270c71d3fece5e51cc486ecdac7d0
> --
> 



* Re: [PATCH] KVM: x86: Move kzalloc out of atomic context on PREEMPT_RT
  2022-05-20 13:41   ` Paolo Bonzini
@ 2022-05-20 14:49     ` Sean Christopherson
  0 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2022-05-20 14:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Yajun Deng, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp,
	dave.hansen, hpa, x86, kvm, linux-kernel

On Fri, May 20, 2022, Paolo Bonzini wrote:
> On 5/19/22 17:11, Sean Christopherson wrote:
> > AFAICT, kfree() is safe to call under a raw spinlock, so this?  Compile tested
> > only...
> 
> Freeing outside the lock is not complicated enough to check if it is:
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 6aa1241a80b7..f849f7c9fbf2 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -229,12 +229,15 @@ void kvm_async_pf_task_wake(u32 token)
>  		dummy->cpu = smp_processor_id();
>  		init_swait_queue_head(&dummy->wq);
>  		hlist_add_head(&dummy->link, &b->list);
> +		dummy = NULL;
>  	} else {
> -		kfree(dummy);
>  		apf_task_wake_one(n);
>  	}
>  	raw_spin_unlock(&b->lock);
> -	return;
> +
> +	/* A dummy token might be allocated and ultimately not used.  */
> +	if (dummy)
> +		kfree(dummy);
>  }
>  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);
> 
> 
> I queued your patch with the above fixup.

Ha, I wrote it exactly that way, then grepped around and found a few instances of
kfree() being called inside a raw spinlock, so changed it back :-)

100% agree it's not worth having to generate another patch if it turns out those
callers are wrong.

