Re: [PATCH] KVM: x86: Move kzalloc out of atomic context on PREEMPT_RT

From: Paolo Bonzini <pbonzini@redhat.com>
To: Sean Christopherson <seanjc@google.com>,
	Yajun Deng <yajun.deng@linux.dev>
Cc: vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com,
	joro@8bytes.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	x86@kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: x86: Move kzalloc out of atomic context on PREEMPT_RT
Date: Fri, 20 May 2022 15:41:18 +0200
Message-ID: <f7585471-43be-4b40-f398-dfd7dc937131@redhat.com>
In-Reply-To: <YoZeI6UeQbP3t1dF@google.com>

On 5/19/22 17:11, Sean Christopherson wrote:
> AFAICT, kfree() is safe to call under a raw spinlock, so this?  Compile tested
> only...

Freeing outside the lock is simple enough that it's not worth checking
whether kfree() really is safe under a raw spinlock:

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6aa1241a80b7..f849f7c9fbf2 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -229,12 +229,15 @@ void kvm_async_pf_task_wake(u32 token)
  		dummy->cpu = smp_processor_id();
  		init_swait_queue_head(&dummy->wq);
  		hlist_add_head(&dummy->link, &b->list);
+		dummy = NULL;
  	} else {
-		kfree(dummy);
  		apf_task_wake_one(n);
  	}
  	raw_spin_unlock(&b->lock);
-	return;
+
+	/* A dummy token might be allocated and ultimately not used.  */
+	if (dummy)
+		kfree(dummy);
  }
  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);


I queued your patch with the above fixup.

Paolo
> --
> From: Sean Christopherson <seanjc@google.com>
> Date: Thu, 19 May 2022 07:57:11 -0700
> Subject: [PATCH] x86/kvm: Alloc dummy async #PF token outside of raw spinlock
> 
> Drop the raw spinlock in kvm_async_pf_task_wake() before allocating the
> dummy async #PF token, as the allocator is preemptible on PREEMPT_RT
> kernels and must not be called from truly atomic contexts.
> 
> Opportunistically document why it's ok to loop on allocation failure,
> i.e. why the function won't get stuck in an infinite loop.
> 
> Reported-by: Yajun Deng <yajun.deng@linux.dev>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kernel/kvm.c | 38 ++++++++++++++++++++++++--------------
>   1 file changed, 24 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index d0bb2b3fb305..5a4100896969 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -190,7 +190,7 @@ void kvm_async_pf_task_wake(u32 token)
>   {
>   	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
>   	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
> -	struct kvm_task_sleep_node *n;
> +	struct kvm_task_sleep_node *n, *dummy = NULL;
> 
>   	if (token == ~0) {
>   		apf_task_wake_all();
> @@ -202,24 +202,34 @@ void kvm_async_pf_task_wake(u32 token)
>   	n = _find_apf_task(b, token);
>   	if (!n) {
>   		/*
> -		 * async PF was not yet handled.
> -		 * Add dummy entry for the token.
> +		 * Async #PF not yet handled, add a dummy entry for the token.
> +		 * Allocating the token must be done outside of the raw lock
> +		 * as the allocator is preemptible on PREEMPT_RT kernels.
>   		 */
> -		n = kzalloc(sizeof(*n), GFP_ATOMIC);
> -		if (!n) {
> -			/*
> -			 * Allocation failed! Busy wait while other cpu
> -			 * handles async PF.
> -			 */
> +		if (!dummy) {
>   			raw_spin_unlock(&b->lock);
> -			cpu_relax();
> +			dummy = kzalloc(sizeof(*dummy), GFP_KERNEL);
> +
> +			/*
> +			 * Continue looping on allocation failure, eventually
> +			 * the async #PF will be handled and allocating a new
> +			 * node will be unnecessary.
> +			 */
> +			if (!dummy)
> +				cpu_relax();
> +
> +			/*
> +			 * Recheck for async #PF completion before enqueueing
> +			 * the dummy token to avoid duplicate list entries.
> +			 */
>   			goto again;
>   		}
> -		n->token = token;
> -		n->cpu = smp_processor_id();
> -		init_swait_queue_head(&n->wq);
> -		hlist_add_head(&n->link, &b->list);
> +		dummy->token = token;
> +		dummy->cpu = smp_processor_id();
> +		init_swait_queue_head(&dummy->wq);
> +		hlist_add_head(&dummy->link, &b->list);
>   	} else {
> +		kfree(dummy);
>   		apf_task_wake_one(n);
>   	}
>   	raw_spin_unlock(&b->lock);
> 
> base-commit: a3808d88461270c71d3fece5e51cc486ecdac7d0
> --
>
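
For readers following along outside the kernel tree, the combined result of the patch and the fixup can be approximated in userspace. This is only a sketch under stated assumptions: `struct sleep_node`, `struct sleep_bucket`, `find_node()` and `wake_or_enqueue()` are hypothetical names, a pthread mutex stands in for the raw spinlock, and `calloc()`/`free()` stand in for `kzalloc()`/`kfree()`. It shows the three steps discussed above: allocate with the lock dropped, recheck after retaking the lock, and free an unused node only after the final unlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's sleep node and hash bucket. */
struct sleep_node {
	struct sleep_node *next;
	uint32_t token;
};

struct sleep_bucket {
	pthread_mutex_t lock;	/* stands in for the raw spinlock */
	struct sleep_node *head;
};

/* Must be called with b->lock held. */
static struct sleep_node *find_node(struct sleep_bucket *b, uint32_t token)
{
	for (struct sleep_node *n = b->head; n; n = n->next)
		if (n->token == token)
			return n;
	return NULL;
}

/*
 * Returns 1 if a dummy node was enqueued for the token, 0 if a node for
 * the token already existed (the "wake" path in the kernel code).
 */
static int wake_or_enqueue(struct sleep_bucket *b, uint32_t token)
{
	struct sleep_node *n, *dummy = NULL;
	int enqueued = 0;

	assert(b != NULL);
again:
	pthread_mutex_lock(&b->lock);
	n = find_node(b, token);
	if (!n) {
		if (!dummy) {
			/*
			 * Allocate with the lock dropped, then recheck the
			 * list. As in the patch, an allocation failure
			 * simply retries; here that only terminates if the
			 * allocation eventually succeeds or another thread
			 * adds the node.
			 */
			pthread_mutex_unlock(&b->lock);
			dummy = calloc(1, sizeof(*dummy));
			goto again;
		}
		dummy->token = token;
		dummy->next = b->head;
		b->head = dummy;
		dummy = NULL;	/* ownership moved to the list */
		enqueued = 1;
	}
	/* else: this is where the kernel code wakes the waiter. */
	pthread_mutex_unlock(&b->lock);

	/* A node may have been allocated and not used; free(NULL) is a no-op. */
	free(dummy);
	return enqueued;
}
```

Calling `wake_or_enqueue()` twice with the same token on an empty bucket enqueues a node the first time and takes the existing-node path the second time. Clearing `dummy` once the node is on the list is what lets the single `free()` after the unlock cover both the used and unused cases.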