From: "Wu, Feng" <feng.wu@intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	"keir@xen.org" <keir@xen.org>,
	"george.dunlap@eu.citrix.com" <george.dunlap@eu.citrix.com>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
	"dario.faggioli@citrix.com" <dario.faggioli@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	"Wu, Feng" <feng.wu@intel.com>
Subject: Re: [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed
Date: Fri, 3 Jun 2016 05:12:32 +0000	[thread overview]
Message-ID: <E959C4978C3B6342920538CF579893F019705622@SHSMSX103.ccr.corp.intel.com> (raw)
In-Reply-To: <574D971302000078000F000F@prv-mh.provo.novell.com>



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, May 31, 2016 7:52 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: RE: [PATCH v2 1/4] VMX: Properly handle pi when all the assigned
> devices are removed
> 
> >>> On 31.05.16 at 12:22, <feng.wu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, May 27, 2016 9:43 PM
> >> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> > @@ -113,7 +113,19 @@ static void vmx_vcpu_block(struct vcpu *v)
> >> >  		&per_cpu(vmx_pi_blocking, v->processor).lock;
> >> >      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> >> >
> >> > -    spin_lock_irqsave(pi_blocking_list_lock, flags);
> >> > +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> >> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up) )
> >> > +    {
> >> > +        /*
> >> > +         * The vcpu is to be destroyed and it has already been removed
> >> > +         * from the per-CPU list if it is blocking, we shouldn't add
> >> > +         * new vCPU to the list.
> >> > +         */
> >>
> >> This comment appears to be stale wrt the implementation further
> >> down. But see there for whether it's the comment or the code
> >> that needs to change.
> >>
> >> > +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> >> > +        return;
> >> > +    }
> >> > +
> >> > +    spin_lock(pi_blocking_list_lock);
> >>
> >> There doesn't appear to be an active problem with this, but
> >> taking a global lock inside a per-vCPU one feels bad. Both here
> >> and in vmx_pi_blocking_cleanup() I can't see why the global
> >> lock alone won't do - you acquire it anyway.
> >
> > The purpose of 'v->arch.hvm_vmx.pi_hotplug_lock' is to make
> > sure things go right when vmx_pi_blocking_cleanup() and
> > vmx_vcpu_block() are running concurrently. However, the lock
> > 'pi_blocking_list_lock' in vmx_pi_blocking_cleanup() refers to
> > 'v->arch.hvm_vmx.pi_blocking.lock', while 'pi_blocking_list_lock'
> > in vmx_vcpu_block() is '&per_cpu(vmx_pi_blocking, v->processor).lock'.
> > These two can be different; please consider the following scenario:
> >
> > vmx_pi_blocking_cleanup() gets called while the vCPU is in the
> > blocking state and 'v->arch.hvm_vmx.pi_blocking.lock' points to the
> > per-cpu lock of pCPU0. When acquiring that lock in this function,
> > someone else wins the race (such as vmx_pi_do_resume() or
> > pi_wakeup_interrupt()), so we spin in vmx_pi_blocking_cleanup().
> > After the vCPU is woken up it runs on pCPU1, and some time later it
> > blocks again and enters vmx_vcpu_block() while the earlier
> > vmx_pi_blocking_cleanup() still hasn't finished its job. We then
> > acquire a different lock in vmx_vcpu_block() (pCPU1's this time)
> > from the one taken in vmx_pi_blocking_cleanup(). This can break
> > things, since we might still put the vCPU on the per-cpu blocking
> > list after the vCPU has been destroyed or the last assigned device
> > has been detached.
> 
> Makes sense. Then the next question is whether you really need
> to hold both locks at the same time.

I think we need to hold both. The two spinlocks have different purposes:
'v->arch.hvm_vmx.pi_hotplug_lock' protects against the races that can
occur during device hotplug or while the domain is being torn down,
while the other one provides the normal protection for accesses to the
blocking list.
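
For illustration only, here is a minimal user-space model of the
locking pattern in the patch: the per-vCPU lock is taken first, the
cleaned-up flag is checked, and only then is the (per-CPU) blocking
list lock taken. The pthread types, the model_* names and the
nr_entries counter are invented stand-ins, not Xen code; only the lock
nesting and the flag check mirror the hunk quoted above (mutex
initialisation is omitted for brevity):

    #include <pthread.h>
    #include <stdbool.h>

    struct blocking_list {
        pthread_mutex_t lock;   /* models per_cpu(vmx_pi_blocking, cpu).lock */
        int nr_entries;         /* stand-in for the actual list */
    };

    struct vcpu_model {
        pthread_mutex_t pi_hotplug_lock; /* models pi_hotplug_lock */
        bool pi_blocking_cleaned_up;     /* models the new per-vCPU flag */
        struct blocking_list *pcpu_list; /* list this vCPU would block on */
    };

    /* Models vmx_vcpu_block(): refuse to enqueue once cleanup has run. */
    static void model_vcpu_block(struct vcpu_model *v)
    {
        pthread_mutex_lock(&v->pi_hotplug_lock);
        if ( v->pi_blocking_cleaned_up )
        {
            pthread_mutex_unlock(&v->pi_hotplug_lock);
            return;              /* do not re-add a cleaned-up vCPU */
        }
        pthread_mutex_lock(&v->pcpu_list->lock);  /* list lock nested inside */
        v->pcpu_list->nr_entries++;
        pthread_mutex_unlock(&v->pcpu_list->lock);
        pthread_mutex_unlock(&v->pi_hotplug_lock);
    }

    /* Models vmx_pi_blocking_cleanup(): dequeue and mark the vCPU under
     * the same nesting, so a concurrent model_vcpu_block() either sees
     * the flag set or completes before the cleanup takes the locks. */
    static void model_blocking_cleanup(struct vcpu_model *v)
    {
        pthread_mutex_lock(&v->pi_hotplug_lock);
        pthread_mutex_lock(&v->pcpu_list->lock);
        if ( v->pcpu_list->nr_entries > 0 )
            v->pcpu_list->nr_entries--;
        v->pi_blocking_cleaned_up = true;
        pthread_mutex_unlock(&v->pcpu_list->lock);
        pthread_mutex_unlock(&v->pi_hotplug_lock);
    }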

> Please understand that I'd
> really prefer for this to have a simpler solution - the original code
> was already more complicated than one would really think it should
> be, and now it's getting worse. While I can't immediately suggest
> alternatives, it feels like the solution to the current problem may
> rather be simplification instead of making things even more
> complicated.

To be honest, comments like this are frustrating. If you have a
better solution, we can discuss it here. However, just saying the
current solution is too complicated is not a good way to speed up
the process.

> 
> >> >  void vmx_pi_hooks_deassign(struct domain *d)
> >> >  {
> >> > +    struct vcpu *v;
> >> > +
> >> >      if ( !iommu_intpost || !has_hvm_container_domain(d) )
> >> >          return;
> >> >
> >> > -    ASSERT(d->arch.hvm_domain.vmx.vcpu_block);
> >> > -
> >> > -    d->arch.hvm_domain.vmx.vcpu_block = NULL;
> >> > -    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
> >> > -    d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> >> > -    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
> >> > +    for_each_vcpu ( d, v )
> >> > +        vmx_pi_blocking_cleanup(v);
> >>
> >> If you keep the hooks in place, why is it relevant to do the cleanup
> >> here instead of just at domain death? As suggested before, if you
> >> did it there, you'd likely get away without adding the new per-vCPU
> >> flag.
> >
> > I don't quite understand this. The cleanup is added here to handle
> > the corner case where the last assigned device is detached from the
> > domain.
> 
> There's nothing to be cleaned up really if that de-assign isn't a
> result of the domain being brought down.

I mentioned this in [0/4] of this series. We _need_ to clean up
the blocking list when removing a device while the domain is
running, since vmx_vcpu_block() might be running at the same
time. Otherwise stale vCPUs can be left on the blocking list.
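
Concretely, the de-assign path has to sweep every vCPU of the domain,
which is what the for_each_vcpu() loop in the hunk quoted above does.
Continuing the toy model sketched earlier (domain_model and its fields
are invented stand-ins; only the "sweep all vCPUs on de-assign" idea
mirrors the vmx_pi_hooks_deassign() change):

    struct domain_model {
        struct vcpu_model *vcpu;    /* array of per-vCPU models */
        unsigned int nr_vcpus;
    };

    /* Models the de-assign path: pull every vCPU off its blocking list
     * and mark it, so a racing model_vcpu_block() cannot re-queue it. */
    static void model_pi_hooks_deassign(struct domain_model *d)
    {
        unsigned int i;

        for ( i = 0; i < d->nr_vcpus; i++ )
            model_blocking_cleanup(&d->vcpu[i]);
    }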

> 
> > Why do you think we don't need the per-vCPU flag? We need to set it
> > here to indicate that the vCPU has been cleaned up, and in
> > vmx_vcpu_block() we must not put vCPUs with this flag set onto the
> > per-cpu blocking list. Am I missing something?
> 
> When cleanup is done only at domain destruction time,

No.

Thanks,
Feng

> there's per-domain state already that can be checked instead of this
> per-vCPU flag.
> 
> >> Furthermore, if things remain the way they are now, a per-domain
> >> flag would suffice.
> >
> > vmx_pi_blocking_cleanup() is called from vmx_cpu_destroy(), which can
> > be called during domain destruction or from vcpu_initialise() if the
> > latter encounters an error. In that error case, using a per-domain
> > flag and setting it in vmx_pi_blocking_cleanup() would be problematic,
> > since it would affect the other vCPUs, which should keep running
> > normally.
> 
> I don't see how the error case of vcpu_initialise() can be problematic.
> Any such vCPU would never run, and hence never want to block.
> 
> >> And finally - do you really need to retain all four hooks? Since if not,
> >> one that gets zapped here could be used in place of such a per-
> >> domain flag.
> >
> > Can you elaborate a bit more on this? Thanks a lot!
> 
> I have a really hard time seeing what more I can say here. I dislike
> redundant information, and hence if the flag value can be deduced
> from some other field, I'd rather see that other field used instead
> of yet another flag getting added.
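
If I read the suggestion correctly, it would look something like the
hypothetical sketch below, again in the toy-model terms used earlier
rather than the real Xen types: if de-assign still clears one of the
PI hooks, the NULL hook pointer itself already encodes "PI bookkeeping
is off", and the block path could test it instead of a separate flag.

    struct pi_hooks_model {
        void (*vcpu_block)(struct vcpu_model *v); /* cleared on de-assign */
    };

    /* Hypothetical: gate blocking on the hook pointer, not on a flag. */
    static void model_block_via_hook(struct pi_hooks_model *hooks,
                                     struct vcpu_model *v)
    {
        if ( !hooks->vcpu_block )   /* hook already zapped: nothing to do */
            return;
        hooks->vcpu_block(v);
    }
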
> 
> >> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> >> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> >> > @@ -231,6 +231,9 @@ struct arch_vmx_struct {
> >> >       * pCPU and wakeup the related vCPU.
> >> >       */
> >> >      struct pi_blocking_vcpu pi_blocking;
> >> > +
> >> > +    spinlock_t            pi_hotplug_lock;
> >>
> >> Having been through all of the patch, I have a problem seeing the
> >> hotplug nature of this.
> >
> > This is related to assigned-device hotplug.
> 
> This reply means nothing to me. As said - I'm missing the connection
> between this name and the rest of the patch (where there is no
> further talk of hotplug afair).
> 
> Jan

