From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Wu, Feng" <feng.wu@intel.com>
Subject: Re: [RFC v1 08/15] Update IRTE according to guest
 interrupt config changes
Date: Thu, 2 Apr 2015 08:02:51 +0000
Message-ID: <E959C4978C3B6342920538CF579893F002487127@SHSMSX104.ccr.corp.intel.com>
References: <1427286717-4093-1-git-send-email-feng.wu@intel.com>
	<1427286717-4093-9-git-send-email-feng.wu@intel.com>
	<AADFC41AFE54684AB9EE6CBC0274A5D1261FB664@SHSMSX101.ccr.corp.intel.com>
	<E959C4978C3B6342920538CF579893F002486E78@SHSMSX104.ccr.corp.intel.com>
	<AADFC41AFE54684AB9EE6CBC0274A5D1261FB859@SHSMSX101.ccr.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D1261FB859@SHSMSX101.ccr.corp.intel.com>
Content-Language: en-US
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "Tian, Kevin" <kevin.tian@intel.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: "Zhang, Yang Z" <yang.z.zhang@intel.com>, "Wu, Feng" <feng.wu@intel.com>, "keir@xen.org" <keir@xen.org>, "JBeulich@suse.com" <JBeulich@suse.com>
List-Id: xen-devel@lists.xenproject.org


> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, April 02, 2015 2:50 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> Subject: RE: [RFC v1 08/15] Update IRTE according to guest interrupt config
> changes
> 
> > From: Wu, Feng
> > Sent: Thursday, April 02, 2015 2:21 PM
> >
> >
> >
> > > -----Original Message-----
> > > From: Tian, Kevin
> > > Sent: Thursday, April 02, 2015 1:52 PM
> > > To: Wu, Feng; xen-devel@lists.xen.org
> > > Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> > > Subject: RE: [RFC v1 08/15] Update IRTE according to guest interrupt config
> > > changes
> > >
> > > > From: Wu, Feng
> > > > Sent: Wednesday, March 25, 2015 8:32 PM
> > > >
> > > > When guest changes its interrupt configuration (such as, vector, etc.)
> > > > for direct-assigned devices, we need to update the associated IRTE
> > > > with the new guest vector, so external interrupts from the assigned
> > > > devices can be injected to guests without VM-Exit.
> > > >
> > > > For lowest-priority interrupts, we use vector-hashing mechamisn to find
> > > > the destination vCPU. This follows the hardware behavior, since modern
> > > > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> > > >
> > > > For multicase/broadcast vCPU, we cannot handle it via interrupt posting,
> > > > still use interrupt remapping.
> > > >
> > > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > > ---
> > > >  xen/drivers/passthrough/io.c | 77
> > > > +++++++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 76 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > > > index ae050df..1d9a132 100644
> > > > --- a/xen/drivers/passthrough/io.c
> > > > +++ b/xen/drivers/passthrough/io.c
> > > > @@ -26,6 +26,7 @@
> > > >  #include <asm/hvm/iommu.h>
> > > >  #include <asm/hvm/support.h>
> > > >  #include <xen/hvm/irq.h>
> > > > +#include <asm/io_apic.h>
> > > >
> > > >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> > > >
> > > > @@ -199,6 +200,61 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci
> > *dpci)
> > > >      xfree(dpci);
> > > >  }
> > > >
> > > > +/*
> > > > + * Here we handle the following cases:
> > > > + * - For lowest-priority interrupts, we find the destination vCPU from the
> > > > + *   guest vector using vector-hashing mechamisn and return true. This
> > > > follows
> > > > + *   the hardware behavior, since modern Intel CPUs use vector
> hashing
> > to
> > > > + *   handle the lowest-priority interrupt.
> > > > + * - Otherwise, for single destination interrupt, it is straightforward to
> > > > + *   find the destination vCPU and return true.
> > > > + * - For multicase/broadcast vCPU, we cannot handle it via interrupt
> > posting,
> > > > + *   so return false.
> > > > + */
> > > > +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> > > > +                                uint8_t dest_mode, uint8_t
> > > > deliver_mode,
> > > > +                                uint32_t gvec, struct vcpu
> > > > **dest_vcpu)
> > > > +{
> > > > +    struct vcpu *v, **dest_vcpu_array;
> > > > +    unsigned int dest_vcpu_num = 0;
> > > > +    int ret;
> > > > +
> > > > +    if ( deliver_mode == dest_LowestPrio )
> > > > +        dest_vcpu_array = xzalloc_array(struct vcpu *, d->max_vcpus);
> > > > +
> > > > +    for_each_vcpu ( d, v )
> > > > +    {
> > > > +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> > > > +                                dest_id, dest_mode) )
> > > > +            continue;
> > > > +
> > > > +        dest_vcpu_num++;
> > > > +
> > > > +        if ( deliver_mode == dest_LowestPrio )
> > > > +            dest_vcpu_array[dest_vcpu_num] = v;
> > > > +        else
> > > > +            *dest_vcpu = v;
> > > > +    }
> > > > +
> > > > +    if ( deliver_mode == dest_LowestPrio )
> > > > +    {
> > > > +        if (  dest_vcpu_num != 0 )
> > > > +        {
> > > > +            *dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
> > > > +            ret = 1;
> > > > +        }
> > > > +        else
> > > > +            ret = 0;
> > > > +
> > > > +        xfree(dest_vcpu_array);
> > > > +        return ret;
> > > > +    }
> > > > +    else if (  dest_vcpu_num == 1 )
> > > > +        return 1;
> > > > +    else
> > > > +        return 0;
> > > > +}
> > > > +
> > > >  int pt_irq_create_bind(
> > > >      struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
> > > >  {
> > > > @@ -257,7 +313,7 @@ int pt_irq_create_bind(
> > > >      {
> > > >      case PT_IRQ_TYPE_MSI:
> > > >      {
> > > > -        uint8_t dest, dest_mode;
> > > > +        uint8_t dest, dest_mode, deliver_mode;
> > > >          int dest_vcpu_id;
> > > >
> > > >          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> > > > @@ -330,11 +386,30 @@ int pt_irq_create_bind(
> > > >          /* Calculate dest_vcpu_id for MSI-type pirq migration. */
> > > >          dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
> > > >          dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> > > > +        deliver_mode = (pirq_dpci->gmsi.gflags >>
> > > > GFLAGS_SHIFT_DELIV_MODE) &
> > > > +                        VMSI_DELIV_MASK;
> > > >          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest,
> dest_mode);
> > > >          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> > > >          spin_unlock(&d->event_lock);
> > > >          if ( dest_vcpu_id >= 0 )
> > > >              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> > > > +
> > > > +        /* Use interrupt posting if it is supported */
> > > > +        if ( iommu_intpost )
> > > > +        {
> > > > +            struct vcpu *vcpu = NULL;
> > > > +
> > > > +            if ( !pi_find_dest_vcpu(d, dest, dest_mode, deliver_mode,
> > > > +                                    pirq_dpci->gmsi.gvec,
> > &vcpu) )
> > > > +                break;
> > > > +
> > >
> > > Is it possible this new pi_find_dest_vcpu will return a different target from
> > > earlier hvm_girq_des_2_vcpu_id? if yes it will cause tricky issues since
> > > earlier pirqs are migrated according to different policy. We need consolidate
> > > vcpu selection policies together to keep consistency.
> >
> > In my understanding, what you described above is the software way to
> deliver
> > the interrupts to vCPU, when posted-interrupt is used, interrupts are
> delivered
> > by hardware according to the settings in IRTE, hence those software path will
> > not get touched for these interrupts. So do we need to care about how
> > software
> > might migrate the interrupts here?
> 
> just curious why we can't use one policy for vcpu selection. if multicast
> handling is a difference, you may pass intpost as a parameter to use
> same function.
> 

Digging into hvm_girq_dest_2_vcpu_id, I find that hvm_girq_dest_2_vcpu_id()
is introduced by commit 023e3bc7, and it is just an optimization for interrupts
with single destination. For most case, the destination of a vCPU is determined
by vmsi_deliver().

> >
> > >
> > > and why failure to find dest_vcpu doesn't lead to an error but a break?
> >
> > We cannot post multicast/broadcast interrupts to a guest, and
> > pi_find_dest_vcpu() returns 0 when encountering a multicast/broadcast
> > interrupt, in that case, we still use interrupt remapping mechanism for it.
> 
> then you might handle postint first, and then if muticast or no intpost support
> then go to software style.

That is a good suggestion. I will think more about how the handle this better.

Thanks,
Feng

> 
> Thanks
> Kevin