kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Pei Zhang <pezhang@redhat.com>
Subject: Re: [patch 2/2 V2] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
Date: Fri, 7 May 2021 09:11:52 -0300	[thread overview]
Message-ID: <20210507121152.GA367281@fuller.cnet> (raw)
In-Reply-To: <YJRhMrxTrSDClwbQ@google.com>

Hi Sean,

On Thu, May 06, 2021 at 09:35:46PM +0000, Sean Christopherson wrote:
> On Thu, May 06, 2021, Marcelo Tosatti wrote:
> > Index: kvm/arch/x86/kvm/vmx/posted_intr.c
> > ===================================================================
> > --- kvm.orig/arch/x86/kvm/vmx/posted_intr.c
> > +++ kvm/arch/x86/kvm/vmx/posted_intr.c
> > @@ -114,7 +114,7 @@ static void __pi_post_block(struct kvm_v
> >  	} while (cmpxchg64(&pi_desc->control, old.control,
> >  			   new.control) != old.control);
> >  
> > -	if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
> > +	if (vcpu->pre_pcpu != -1) {
> >  		spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> >  		list_del(&vcpu->blocked_vcpu_list);
> >  		spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > @@ -135,20 +135,13 @@ static void __pi_post_block(struct kvm_v
> >   *   this case, return 1, otherwise, return 0.
> >   *
> >   */
> > -int pi_pre_block(struct kvm_vcpu *vcpu)
> > +static int __pi_pre_block(struct kvm_vcpu *vcpu)
> >  {
> >  	unsigned int dest;
> >  	struct pi_desc old, new;
> >  	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> >  
> > -	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > -		!irq_remapping_cap(IRQ_POSTING_CAP)  ||
> > -		!kvm_vcpu_apicv_active(vcpu))
> > -		return 0;
> > -
> > -	WARN_ON(irqs_disabled());
> > -	local_irq_disable();
> > -	if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> > +	if (vcpu->pre_pcpu == -1) {
> >  		vcpu->pre_pcpu = vcpu->cpu;
> >  		spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> >  		list_add_tail(&vcpu->blocked_vcpu_list,
> > @@ -188,12 +181,33 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
> >  	if (pi_test_on(pi_desc) == 1)
> >  		__pi_post_block(vcpu);
> >  
> > +	return (vcpu->pre_pcpu == -1);
> 
> Nothing checks the return of __pi_pre_block(), this can be dropped and the
> helper can be a void return.

Done.

> > +}
> > +
> > +int pi_pre_block(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +
> > +	vmx->in_blocked_section = true;
> > +
> > +	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > +		!irq_remapping_cap(IRQ_POSTING_CAP)  ||
> > +		!kvm_vcpu_apicv_active(vcpu))
> 
> Opportunistically fix the indentation?

Done.

> > +		return 0;
> > +
> > +	WARN_ON(irqs_disabled());
> > +	local_irq_disable();
> > +	__pi_pre_block(vcpu);
> >  	local_irq_enable();
> > +
> >  	return (vcpu->pre_pcpu == -1);
> >  }
> >  
> >  void pi_post_block(struct kvm_vcpu *vcpu)
> >  {
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +
> > +	vmx->in_blocked_section = false;
> >  	if (vcpu->pre_pcpu == -1)
> >  		return;
> >  
> > @@ -236,6 +250,52 @@ bool pi_has_pending_interrupt(struct kvm
> >  		(pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
> >  }
> >  
> > +static void pi_update_wakeup_vector(void *data)
> > +{
> > +	struct vcpu_vmx *vmx;
> > +	struct kvm_vcpu *vcpu = data;
> > +
> > +	vmx = to_vmx(vcpu);
> > +
> > +	/* race with pi_post_block ? */
> > +	if (vcpu->pre_pcpu != -1)
> 
> This seems wrong.  The funky code in __pi_pre_block() regarding pre_cpu muddies
> the waters, but I don't think it's safe to call __pi_pre_block() from a pCPU
> other than the pCPU that is associated with the vCPU.

From Intel's manual:

"29.6 POSTED-INTERRUPT PROCESSING

...

Use of the posted-interrupt descriptor differs from that of other
data structures that are referenced by pointers in a VMCS. There is a
general requirement that software ensure that each such data structure
is modified only when no logical processor with a current VMCS that
references it is in VMX non-root operation. That requirement does not
apply to the posted-interrupt descriptor. There is a requirement,
however, that such modifications be done using locked read-modify-write
instructions."

> If the vCPU is migrated after vmx_pi_start_assignment() grabs vcpu->cpu but
> before the IPI arrives (to run pi_update_wakeup_vector()), then it's possible
> that a different pCPU could be running __pi_pre_block() concurrently with this
> code.  If that happens, both pcPUs could see "vcpu->pre_cpu == -1" and corrupt
> the list due to a double list_add_tail.

Good point.

> The existing code is unnecessarily confusing, but unless I'm missing something,
> it's guaranteed to call pi_pre_block() from the pCPU that is associated with the
> pCPU, i.e. arguably it could/should use this_cpu_ptr(). 

Well problem is it might not exit kvm_vcpu_block(). However that can be
fixed.


>  Because the existing
> code grabs vcpu->cpu with IRQs disabled and is called only from KVM_RUN,
> vcpu->cpu is guaranteed to match the current pCPU since vcpu->cpu will be set to
> the current pCPU when the vCPU is scheduled in.
> 
> Assuming my analysis is correct (definitely not guaranteed), I'm struggling to
> come up with an elegant solution.  But, do we need an elegant solution?  E.g.
> can the start_assignment() hook simply kick all vCPUs with APICv active?
> 
> > +		return;
> > +
> > +	if (!vmx->in_blocked_section)
> > +		return;
> > +
> > +	__pi_pre_block(vcpu);
> > +}
> > +
> > +void vmx_pi_start_assignment(struct kvm *kvm, int device_count)
> > +{
> > +	struct kvm_vcpu *vcpu;
> > +	int i;
> > +
> > +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> > +		return;
> > +
> > +	/* only care about first device assignment */
> > +	if (device_count != 1)
> > +		return;
> > +
> > +	/* Update wakeup vector and add vcpu to blocked_vcpu_list */
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +		int pcpu;
> > +
> > +		if (!kvm_vcpu_apicv_active(vcpu))
> > +			continue;
> > +
> > +		preempt_disable();
> 
> Any reason not to do "cpu = get_cpu()"?  Might make sense to do that outside of
> the for-loop, too.

kvm_vcpu_kick seems cleaner, just need to add another arch
hook to allow kvm_vcpu_block() to return.

Thanks for the review! Will resend after testing.


      reply	other threads:[~2021-05-07 12:12 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-06 18:57 [patch 0/2] VMX: configure posted interrupt " Marcelo Tosatti
2021-05-06 18:57 ` [patch 1/2] KVM: x86: add start_assignment hook to kvm_x86_ops Marcelo Tosatti
2021-05-06 19:54   ` Sean Christopherson
2021-05-06 18:57 ` [patch 2/2] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device Marcelo Tosatti
2021-05-06 19:11   ` Marcelo Tosatti
2021-05-06 19:21   ` [patch 2/2 V2] " Marcelo Tosatti
2021-05-06 21:35     ` Sean Christopherson
2021-05-07 12:11       ` Marcelo Tosatti [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210507121152.GA367281@fuller.cnet \
    --to=mtosatti@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=pezhang@redhat.com \
    --cc=seanjc@google.com \
    --subject='Re: [patch 2/2 V2] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).