Re: [PATCH 2/2] x86/hvm/viridian: Enable APIC assist enlightenment

From: Paul Durrant <Paul.Durrant@citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@citrix.com>,
	Jan Beulich <jbeulich@suse.com>,
	Ian Jackson <Ian.Jackson@citrix.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Keir (Xen.org)" <keir@xen.org>
Subject: Re: [PATCH 2/2] x86/hvm/viridian: Enable APIC assist enlightenment
Date: Wed, 16 Mar 2016 15:02:05 +0000	[thread overview]
Message-ID: <e8b535dfbed04196ab7f194c5be40539@AMSPEX02CL03.citrite.net> (raw)
In-Reply-To: <20160315231808.GC29285@char.us.oracle.com>

> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: 15 March 2016 23:18
> To: Paul Durrant
> Cc: xen-devel@lists.xenproject.org; Wei Liu; Stefano Stabellini; Andrew
> Cooper; Ian Jackson; Jan Beulich; Keir (Xen.org)
> Subject: Re: [Xen-devel] [PATCH 2/2] x86/hvm/viridian: Enable APIC assist
> enlightenment
> 
> On Tue, Mar 15, 2016 at 04:14:16PM +0000, Paul Durrant wrote:
> > This patch adds code to enable the APIC assist enlightenment which,
> > under certain conditions, means that the guest can avoid an EOI of
> > the local APIC and thereby avoid a VMEXIT.
> 
> I noticed you deleted the comment having the spec.
> 

That's because Microsoft killed the link and I couldn't find the spec online any more. But, thankfully, I've now found it again although it has actually taken me weeks to do so.

> Would it be better if:
> 
> 1) It was in the git commit description.
> 2) As part of the docs directory?
> 
> Oh and the link is broken. An the link it points to is
> broken too.
> 
> So no spec for this, eh?
> 

Now I've found it again I'll reference it.

> 
> 
> >
> > Use of the enlightenment by the hypervisor is under control of the
> > toolstack.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > Cc: Keir Fraser <keir@xen.org>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> >  docs/man/xl.cfg.pod.5              |  8 ++++++
> >  tools/libxl/libxl_dom.c            |  3 ++
> >  tools/libxl/libxl_types.idl        |  1 +
> >  xen/arch/x86/hvm/viridian.c        | 59
> +++++++++++++++++++++++++++++++-------
> >  xen/arch/x86/hvm/vlapic.c          | 58
> +++++++++++++++++++++++++++++++++----
> >  xen/include/asm-x86/hvm/viridian.h |  5 ++++
> >  xen/include/public/hvm/params.h    |  7 ++++-
> >  7 files changed, 124 insertions(+), 17 deletions(-)
> >
> > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> > index 56b1117..49acda7 100644
> > --- a/docs/man/xl.cfg.pod.5
> > +++ b/docs/man/xl.cfg.pod.5
> > @@ -1484,6 +1484,14 @@ This set incorporates use of hypercalls for
> remote TLB flushing.
> >  This enlightenment may improve performance of Windows guests running
> >  on hosts with higher levels of (physical) CPU contention.
> >
> > +=item B<apic_assist>
> > +
> > +This set incorporates use of the APIC assist page to avoid EOI of
> 
> What set? Option?
> 

Set of viridian enlightenments. Read the pre-amble to the section.

> > +the local APIC.
> > +This enlightenment may improve performance of Windows guests,
> 
> may? How does this square with Intel vAPIC?
> Could you mention that here? Or in the commit description?
> 

Yes, I'll mention that it's clearly somewhat pointless if posted interrupts are in operation and that it has no effect in that case.

> And can non-Windows guests use it too?

If they adhere to the viridian spec., yes.

> Or is this particular for Windows guests? IF so which ones?
> 
> > +particularly those running PV drivers that make use of per-vcpu
> > +event channel upcall vectors.
> 
> What kind of PV drivers? Anybody?
>

Anybody's drivers that care to make use of FIFO event channels and that HVM op.

> > +
> >  =item B<defaults>
> >
> >  This is a special value that enables the default set of groups, which
> > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> > index b825b98..ee75ad1 100644
> > --- a/tools/libxl/libxl_dom.c
> > +++ b/tools/libxl/libxl_dom.c
> > @@ -253,6 +253,9 @@ static int hvm_set_viridian_features(libxl__gc *gc,
> uint32_t domid,
> >      if (libxl_bitmap_test(&enlightenments,
> LIBXL_VIRIDIAN_ENLIGHTENMENT_HCALL_REMOTE_TLB_FLUSH))
> >          mask |= HVMPV_hcall_remote_tlb_flush;
> >
> > +    if (libxl_bitmap_test(&enlightenments,
> LIBXL_VIRIDIAN_ENLIGHTENMENT_APIC_ASSIST))
> > +        mask |= HVMPV_apic_assist;
> > +
> >      if (mask != 0 &&
> >          xc_hvm_param_set(CTX->xch,
> >                           domid,
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 632c009..e3be957 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -221,6 +221,7 @@ libxl_viridian_enlightenment =
> Enumeration("viridian_enlightenment", [
> >      (2, "time_ref_count"),
> >      (3, "reference_tsc"),
> >      (4, "hcall_remote_tlb_flush"),
> > +    (5, "apic_assist"),
> >      ])
> >
> >  libxl_hdtype = Enumeration("hdtype", [
> > diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c
> > index c779290..1f73691 100644
> > --- a/xen/arch/x86/hvm/viridian.c
> > +++ b/xen/arch/x86/hvm/viridian.c
> > @@ -221,16 +221,6 @@ static void initialize_apic_assist(struct vcpu *v)
> >      struct page_info *page = get_page_from_gfn(d, gmfn, NULL,
> P2M_ALLOC);
> >      void *va;
> >
> > -    /*
> > -     * We don't yet make use of the APIC assist page but by setting
> > -     * the CPUID3A_MSR_APIC_ACCESS bit in CPUID leaf 40000003 we are
> duty
> > -     * bound to support the MSR. We therefore do just enough to keep
> windows
> > -     * happy.
> > -     *
> > -     * See http://msdn.microsoft.com/en-
> us/library/ff538657%28VS.85%29.aspx for
> > -     * details of how Windows uses the page.
> > -     */
> > -
> >      if ( !page )
> >          return;
> >
> > @@ -251,6 +241,55 @@ static void initialize_apic_assist(struct vcpu *v)
> >
> >      v->arch.hvm_vcpu.viridian.apic_assist.page = page;
> >      v->arch.hvm_vcpu.viridian.apic_assist.va = va;
> > +    v->arch.hvm_vcpu.viridian.apic_assist.vector = -1;
> > +}
> > +
> > +void viridian_start_apic_assist(struct vcpu *v, int vector)
> > +{
> > +    void *va = v->arch.hvm_vcpu.viridian.apic_assist.va;
> > +
> > +    if ( !(viridian_feature_mask(v->domain) & HVMPV_apic_assist) ||
> > +         !va )
> > +        return;
> > +
> > +    /*
> > +     * If there is already an assist pending then something has gone
> > +     * wrong and the VM will most likely hang so force a crash now
> > +     * to make the problem clear.
> > +     */
> > +    if (v->arch.hvm_vcpu.viridian.apic_assist.vector >= 0)
> 
> I think you need spaces sprinkled here.
>

I do indeed.

> > +        domain_crash(v->domain);
> > +
> > +    v->arch.hvm_vcpu.viridian.apic_assist.vector = vector;
> > +    *(uint32_t *)va |= 1u;
> 
> Oh my. What does that do? Why not 0xBADF00D
> Is that prescriped in the spec?

The LSB is the only bit that has defined functionality. The next 31 bits are reserved and the rest of the page is undefined.

> 
> That looks quite unhealthy to do.
> 

Reading the spec. again I do see that the reserved bits are defined to be zero, so it looks like I can use = rather than |=.

> > +}
> > +
> > +bool_t viridian_complete_apic_assist(struct vcpu *v, int *vector)
> > +{
> > +    void *va = v->arch.hvm_vcpu.viridian.apic_assist.va;
> > +
> > +    if ( !(viridian_feature_mask(v->domain) & HVMPV_apic_assist) ||
> > +         !va )
> > +        return 0;
> > +
> > +    if ( *(uint32_t *)va & 1 )
> 
> Is that really part of the spec?
> 

Yes.

> > +        return 0; /* Interrupt not yet processed by the guest */
> > +
> > +    *vector = v->arch.hvm_vcpu.viridian.apic_assist.vector;
> > +    v->arch.hvm_vcpu.viridian.apic_assist.vector = -1;
> 
> Could you explain that bit please? Why the reset of the vector?
> Maybe it becomes clear later on.
> 
> /me grumbles about not spec.
> 
> > +    return 1;
> > +}
> > +
> > +void viridian_abort_apic_assist(struct vcpu *v)
> > +{
> > +    void *va = v->arch.hvm_vcpu.viridian.apic_assist.va;
> > +
> > +    if ( !(viridian_feature_mask(v->domain) & HVMPV_apic_assist) ||
> > +         !va )
> > +        return;
> 
> This is the third time I see that. Perhaps you can make a function
> out of it.
> 

Yes, or maybe a macro.

> > +
> > +    *(uint32_t *)va &= ~1u;
> > +    v->arch.hvm_vcpu.viridian.apic_assist.vector = -1;
> >  }
> >
> >  static void teardown_apic_assist(struct vcpu *v)
> > diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> > index 01a8430..aac4263 100644
> > --- a/xen/arch/x86/hvm/vlapic.c
> > +++ b/xen/arch/x86/hvm/vlapic.c
> > @@ -38,6 +38,7 @@
> >  #include <asm/hvm/support.h>
> >  #include <asm/hvm/vmx/vmx.h>
> >  #include <asm/hvm/nestedhvm.h>
> > +#include <asm/hvm/viridian.h>
> >  #include <public/hvm/ioreq.h>
> >  #include <public/hvm/params.h>
> >
> > @@ -95,6 +96,18 @@ static int vlapic_find_highest_vector(const void
> *bitmap)
> >      return (fls(word[word_offset*4]) - 1) + (word_offset * 32);
> >  }
> >
> > +static int vlapic_find_lowest_vector(const void *bitmap)
> > +{
> > +    const uint32_t *word = bitmap;
> > +    unsigned int word_offset;
> > +
> > +    /* Work forwards through the bitmap (first 32-bit word in every four).
> */
> 
> Would it be easier said:
> 
> Operate on chunks of 4 bytes?
> 

I was aping the comment in the 'find highest' function. I'll keep it as it is for consistency.

> > +    for ( word_offset = 0; word_offset < NR_VECTORS / 32; word_offset++)
> > +        if ( word[word_offset * 4] )
> > +            return (ffs(word[word_offset * 4]) - 1) + (word_offset * 32);
> > +
> > +    return -1;
> > +}
> >
> >  /*
> >   * IRR-specific bitmap update & search routines.
> > @@ -1157,7 +1170,7 @@ int vlapic_virtual_intr_delivery_enabled(void)
> >  int vlapic_has_pending_irq(struct vcpu *v)
> >  {
> >      struct vlapic *vlapic = vcpu_vlapic(v);
> > -    int irr, isr;
> > +    int irr, vector, isr;
> 
> vector = -1?
> 

No need. vector is not checked unless viridian_complete_apic_assist() returns 1, in which case it is guaranteed to be set.

  Paul

> >
> >      if ( !vlapic_enabled(vlapic) )
> >          return -1;
> > @@ -1170,10 +1183,27 @@ int vlapic_has_pending_irq(struct vcpu *v)
> >           !nestedhvm_vcpu_in_guestmode(v) )
> >          return irr;
> >
> > +    /*
> > +     * If APIC assist was used then there may have been no EOI so
> > +     * we need to clear the requisite bit from the ISR here, before
> > +     * comparing with the IRR.
> > +     */
> > +    if ( viridian_complete_apic_assist(v, &vector) &&
> > +         vector != -1 )
> > +        vlapic_clear_vector(vector, &vlapic->regs->data[APIC_ISR]);
> > +
> >      isr = vlapic_find_highest_isr(vlapic);
> >      isr = (isr != -1) ? isr : 0;
> >      if ( (isr & 0xf0) >= (irr & 0xf0) )
> > +    {
> > +        /*
> > +         * There's already a higher priority vector pending so
> > +         * we need to abort any previous APIC assist to ensure there
> > +         * is an EOI.
> > +         */
> > +        viridian_abort_apic_assist(v);
> >          return -1;
> > +    }
> >
> >      return irr;
> >  }
> > @@ -1181,13 +1211,29 @@ int vlapic_has_pending_irq(struct vcpu *v)
> >  int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool_t force_ack)
> >  {
> >      struct vlapic *vlapic = vcpu_vlapic(v);
> > +    int isr;
> >
> > -    if ( force_ack || !vlapic_virtual_intr_delivery_enabled() )
> > -    {
> > -        vlapic_set_vector(vector, &vlapic->regs->data[APIC_ISR]);
> > -        vlapic_clear_irr(vector, vlapic);
> > -    }
> > +    if ( !force_ack &&
> > +         vlapic_virtual_intr_delivery_enabled() )
> > +        return 1;
> > +
> > +    if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
> > +        goto done;
> > +
> > +    isr = vlapic_find_lowest_vector(&vlapic->regs->data[APIC_ISR]);
> > +    if ( isr >= 0 && isr < vector )
> > +        goto done;
> > +
> > +    /*
> > +     * This vector is edge triggered and there are no lower priority
> > +     * vectors pending, so we can use APIC assist to avoid exiting
> > +     * for EOI.
> > +     */
> > +    viridian_start_apic_assist(v, vector);
> >
> > +done:
> > +    vlapic_set_vector(vector, &vlapic->regs->data[APIC_ISR]);
> > +    vlapic_clear_irr(vector, vlapic);
> >      return 1;
> >  }
> >
> > diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm-
> x86/hvm/viridian.h
> > index c60c113..658d46a 100644
> > --- a/xen/include/asm-x86/hvm/viridian.h
> > +++ b/xen/include/asm-x86/hvm/viridian.h
> > @@ -25,6 +25,7 @@ struct viridian_vcpu
> >          union viridian_apic_assist msr;
> >          struct page_info *page;
> >          void *va;
> > +        int vector;
> >      } apic_assist;
> >      cpumask_var_t flush_cpumask;
> >  };
> > @@ -125,6 +126,10 @@ void viridian_time_ref_count_thaw(struct domain
> *d);
> >  int viridian_vcpu_init(struct vcpu *v);
> >  void viridian_vcpu_deinit(struct vcpu *v);
> >
> > +void viridian_start_apic_assist(struct vcpu *v, int vector);
> > +bool_t viridian_complete_apic_assist(struct vcpu *v, int *vector);
> > +void viridian_abort_apic_assist(struct vcpu *v);
> > +
> >  #endif /* __ASM_X86_HVM_VIRIDIAN_H__ */
> >
> >  /*
> > diff --git a/xen/include/public/hvm/params.h
> b/xen/include/public/hvm/params.h
> > index 73d4718..e69c72c 100644
> > --- a/xen/include/public/hvm/params.h
> > +++ b/xen/include/public/hvm/params.h
> > @@ -115,12 +115,17 @@
> >  #define _HVMPV_hcall_remote_tlb_flush 4
> >  #define HVMPV_hcall_remote_tlb_flush (1 <<
> _HVMPV_hcall_remote_tlb_flush)
> >
> > +/* Use APIC assist */
> > +#define _HVMPV_apic_assist 5
> > +#define HVMPV_apic_assist (1 << _HVMPV_apic_assist)
> > +
> >  #define HVMPV_feature_mask \
> >          (HVMPV_base_freq | \
> >           HVMPV_no_freq | \
> >           HVMPV_time_ref_count | \
> >           HVMPV_reference_tsc | \
> > -         HVMPV_hcall_remote_tlb_flush)
> > +         HVMPV_hcall_remote_tlb_flush | \
> > +         HVMPV_apic_assist)
> >
> >  #endif
> >
> > --
> > 2.1.4
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel