From: Marcelo Tosatti <mtosatti@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Pei Zhang <pezhang@redhat.com>
Subject: Re: [patch 2/2 V2] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
Date: Fri, 7 May 2021 09:11:52 -0300	[thread overview]
Message-ID: <20210507121152.GA367281@fuller.cnet> (raw)
In-Reply-To: <YJRhMrxTrSDClwbQ@google.com>

Hi Sean,

On Thu, May 06, 2021 at 09:35:46PM +0000, Sean Christopherson wrote:
> On Thu, May 06, 2021, Marcelo Tosatti wrote:
> > Index: kvm/arch/x86/kvm/vmx/posted_intr.c
> > ===================================================================
> > --- kvm.orig/arch/x86/kvm/vmx/posted_intr.c
> > +++ kvm/arch/x86/kvm/vmx/posted_intr.c
> > @@ -114,7 +114,7 @@ static void __pi_post_block(struct kvm_v
> >  	} while (cmpxchg64(&pi_desc->control, old.control,
> >  			   new.control) != old.control);
> >  
> > -	if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
> > +	if (vcpu->pre_pcpu != -1) {
> >  		spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> >  		list_del(&vcpu->blocked_vcpu_list);
> >  		spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > @@ -135,20 +135,13 @@ static void __pi_post_block(struct kvm_v
> >   *   this case, return 1, otherwise, return 0.
> >   *
> >   */
> > -int pi_pre_block(struct kvm_vcpu *vcpu)
> > +static int __pi_pre_block(struct kvm_vcpu *vcpu)
> >  {
> >  	unsigned int dest;
> >  	struct pi_desc old, new;
> >  	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> >  
> > -	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > -		!irq_remapping_cap(IRQ_POSTING_CAP)  ||
> > -		!kvm_vcpu_apicv_active(vcpu))
> > -		return 0;
> > -
> > -	WARN_ON(irqs_disabled());
> > -	local_irq_disable();
> > -	if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> > +	if (vcpu->pre_pcpu == -1) {
> >  		vcpu->pre_pcpu = vcpu->cpu;
> >  		spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> >  		list_add_tail(&vcpu->blocked_vcpu_list,
> > @@ -188,12 +181,33 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
> >  	if (pi_test_on(pi_desc) == 1)
> >  		__pi_post_block(vcpu);
> >  
> > +	return (vcpu->pre_pcpu == -1);
> 
> Nothing checks the return of __pi_pre_block(); this can be dropped and the
> helper can return void.

Done.
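
For reference, this is roughly what the helper ends up looking like (a sketch
only; the body is unchanged from the quoted hunk above, just the return value
is dropped):

static void __pi_pre_block(struct kvm_vcpu *vcpu)
{
	/* ... body as in the quoted hunk above ... */

	if (pi_test_on(pi_desc) == 1)
		__pi_post_block(vcpu);

	/* No return value: no caller checks it. */
}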

> > +}
> > +
> > +int pi_pre_block(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +
> > +	vmx->in_blocked_section = true;
> > +
> > +	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > +		!irq_remapping_cap(IRQ_POSTING_CAP)  ||
> > +		!kvm_vcpu_apicv_active(vcpu))
> 
> Opportunistically fix the indentation?

Done.

> > +		return 0;
> > +
> > +	WARN_ON(irqs_disabled());
> > +	local_irq_disable();
> > +	__pi_pre_block(vcpu);
> >  	local_irq_enable();
> > +
> >  	return (vcpu->pre_pcpu == -1);
> >  }
> >  
> >  void pi_post_block(struct kvm_vcpu *vcpu)
> >  {
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +
> > +	vmx->in_blocked_section = false;
> >  	if (vcpu->pre_pcpu == -1)
> >  		return;
> >  
> > @@ -236,6 +250,52 @@ bool pi_has_pending_interrupt(struct kvm
> >  		(pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
> >  }
> >  
> > +static void pi_update_wakeup_vector(void *data)
> > +{
> > +	struct vcpu_vmx *vmx;
> > +	struct kvm_vcpu *vcpu = data;
> > +
> > +	vmx = to_vmx(vcpu);
> > +
> > +	/* race with pi_post_block ? */
> > +	if (vcpu->pre_pcpu != -1)
> 
> This seems wrong.  The funky code in __pi_pre_block() regarding pre_pcpu muddies
> the waters, but I don't think it's safe to call __pi_pre_block() from a pCPU
> other than the pCPU that is associated with the vCPU.

From Intel's manual:

"29.6 POSTED-INTERRUPT PROCESSING

...

Use of the posted-interrupt descriptor differs from that of other
data structures that are referenced by pointers in a VMCS. There is a
general requirement that software ensure that each such data structure
is modified only when no logical processor with a current VMCS that
references it is in VMX non-root operation. That requirement does not
apply to the posted-interrupt descriptor. There is a requirement,
however, that such modifications be done using locked read-modify-write
instructions."

> If the vCPU is migrated after vmx_pi_start_assignment() grabs vcpu->cpu but
> before the IPI arrives (to run pi_update_wakeup_vector()), then it's possible
> that a different pCPU could be running __pi_pre_block() concurrently with this
> code.  If that happens, both pCPUs could see "vcpu->pre_pcpu == -1" and corrupt
> the list due to a double list_add_tail.

Good point.

> The existing code is unnecessarily confusing, but unless I'm missing something,
> it's guaranteed to call pi_pre_block() from the pCPU that is associated with the
> vCPU, i.e. arguably it could/should use this_cpu_ptr(). 

Well, the problem is that the vCPU might not exit kvm_vcpu_block(). However,
that can be fixed.


>  Because the existing
> code grabs vcpu->cpu with IRQs disabled and is called only from KVM_RUN,
> vcpu->cpu is guaranteed to match the current pCPU since vcpu->cpu will be set to
> the current pCPU when the vCPU is scheduled in.
> 
> Assuming my analysis is correct (definitely not guaranteed), I'm struggling to
> come up with an elegant solution.  But, do we need an elegant solution?  E.g.
> can the start_assignment() hook simply kick all vCPUs with APICv active?
> 
> > +		return;
> > +
> > +	if (!vmx->in_blocked_section)
> > +		return;
> > +
> > +	__pi_pre_block(vcpu);
> > +}
> > +
> > +void vmx_pi_start_assignment(struct kvm *kvm, int device_count)
> > +{
> > +	struct kvm_vcpu *vcpu;
> > +	int i;
> > +
> > +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> > +		return;
> > +
> > +	/* only care about first device assignment */
> > +	if (device_count != 1)
> > +		return;
> > +
> > +	/* Update wakeup vector and add vcpu to blocked_vcpu_list */
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +		int pcpu;
> > +
> > +		if (!kvm_vcpu_apicv_active(vcpu))
> > +			continue;
> > +
> > +		preempt_disable();
> 
> Any reason not to do "cpu = get_cpu()"?  Might make sense to do that outside of
> the for-loop, too.

kvm_vcpu_kick() seems cleaner; I just need to add another arch hook to allow
kvm_vcpu_block() to return.
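
Roughly along these lines (untested sketch of the direction only, not the
patch I'll send; the extra arch hook that makes kvm_vcpu_block() return after
the kick is not shown):

void vmx_pi_start_assignment(struct kvm *kvm, int device_count)
{
	struct kvm_vcpu *vcpu;
	int i;

	if (!irq_remapping_cap(IRQ_POSTING_CAP))
		return;

	/* Only the first assigned device changes the PI requirements. */
	if (device_count != 1)
		return;

	/*
	 * Kick every vCPU with APICv active so that a vCPU sitting in
	 * kvm_vcpu_block() re-runs pi_pre_block() itself, on its own pCPU,
	 * instead of having its descriptor and blocked-vCPU list updated
	 * remotely.
	 */
	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (!kvm_vcpu_apicv_active(vcpu))
			continue;

		kvm_vcpu_kick(vcpu);
	}
}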

Thanks for the review! Will resend after testing.


Thread overview: 8+ messages
2021-05-06 18:57 [patch 0/2] VMX: configure posted interrupt descriptor when assigning device Marcelo Tosatti
2021-05-06 18:57 ` [patch 1/2] KVM: x86: add start_assignment hook to kvm_x86_ops Marcelo Tosatti
2021-05-06 19:54   ` Sean Christopherson
2021-05-06 18:57 ` [patch 2/2] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device Marcelo Tosatti
2021-05-06 19:11   ` Marcelo Tosatti
2021-05-06 19:21   ` [patch 2/2 V2] " Marcelo Tosatti
2021-05-06 21:35     ` Sean Christopherson
2021-05-07 12:11       ` Marcelo Tosatti [this message]
