Re: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

From: "Tian, Kevin" <kevin.tian@intel.com>
To: "Xuquan (Euler)" <xuquan8@huawei.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: "yang.zhang.wz@gmail.com" <yang.zhang.wz@gmail.com>,
	"jbeulich@suse.com" <jbeulich@suse.com>,
	"George.Dunlap@eu.citrix.com" <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>
Subject: Re: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
Date: Mon, 12 Sep 2016 07:57:44 +0000	[thread overview]
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D18DED4739@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: <E0A769A898ADB6449596C41F51EF62C6AA3E8E@SZXEMI506-MBX.china.huawei.com>

> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
> Sent: Friday, September 09, 2016 11:02 AM
> 
> On August 30, 2016 1:58 PM, Tian Kevin < kevin.tian@intel.com > wrote:
> >> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
> >> Sent: Friday, August 19, 2016 8:59 PM
> >>
> >> When Xen apicv is enabled, wall clock time is faster on Windows7-32
> >> guest with high payload (with 2vCPU, captured from xentrace, in high
> >> payload, the count of IPI interrupt increases rapidly between these vCPUs).
> >>
> >> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
> >> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the
> >> IPI intrrupt is high priority than periodic timer interrupt. Xen
> >> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI)
> >> as a high priority and apicv (Virtual-Interrupt Delivery) delivers IPI
> >> interrupt within VMX non-root operation without a VM exit. Within VMX
> >> non-root operation, if periodic timer interrupt index of bit is set in
> >> VIRR and highest, the apicv delivers periodic timer interrupt within VMX
> >non-root operation as well.
> >>
> >> But in current code, if Xen doesn't update periodic timer interrupt
> >> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
> >> aware of this case to decrease the count (pending_intr_nr) of pending
> >> periodic timer interrupt, then Xen will deliver a periodic timer
> >> interrupt again. The guest receives more periodic timer interrupt.
> >>
> >> If the periodic timer interrut is delivered and not the highest
> >> priority, make Xen be aware of this case to decrease the count of
> >> pending periodic timer interrupt.
> >>
> >> Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
> >> Signed-off-by: Rongguang He <herongguang.he@huawei.com>
> >> Signed-off-by: Quan Xu <xuquan8@huawei.com>
> >> ---
> >> Why RFC:
> >> 1. I am not quite sure for other cases, such as nested case.
> >> 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including
> >external
> >>    interrupts, non-maskable interrupt system-management interrrupts,
> >exceptions
> >>    and VM exit) may occur before delivery of a periodic timer interrupt, the
> >periodic
> >>    timer interrupt may be lost when a coming periodic timer interrupt is
> >delivered.
> >>    Actually, and so current code is.
> >> ---
> >>  xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++-
> >>  1 file changed, 15 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> >> index 8fca08c..d3a034e 100644
> >> --- a/xen/arch/x86/hvm/vmx/intr.c
> >> +++ b/xen/arch/x86/hvm/vmx/intr.c
> >> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
> >>              __vmwrite(EOI_EXIT_BITMAP(i),
> >v->arch.hvm_vmx.eoi_exit_bitmap[i]);
> >>          }
> >>
> >> -        pt_intr_post(v, intack);
> >> +       /*
> >> +        * If the periodic timer interrut is delivered and not the highest
> >priority,
> >> +        * make Xen be aware of this case to decrease the count of pending
> >periodic
> >> +        * timer interrupt.
> >> +        */
> >> +        if ( pt_vector != -1 && intack.vector > pt_vector )
> >> +        {
> >> +            struct hvm_intack pt_intack;
> >> +
> >> +            pt_intack.vector = pt_vector;
> >> +            pt_intack.source = hvm_intsrc_lapic;
> >> +            pt_intr_post(v, pt_intack);
> >> +        }
> >> +        else
> >> +            pt_intr_post(v, intack);
> >>      }
> >
> >Here is my thought. w/o above patch one pending pt interrupt may result in
> >multiple injections of guest timer interrupt, as you identified, when pt_vector
> >happens to be not the highest pending vector. Then w/ your patch, multiple
> >pending pt interrupt instances may be combined as one injection of guest timer
> >interrupt. Especially intack.vector
> >>pt_vector doesn't mean pt_intr is eligible for injection, which might
> >be blocked due to other reasons. As Jan pointed out, this path is very fragile, so
> >better we can have a fix to sustain the original behavior with one pending pt
> >instance causing one actual injection.
> >
> >What about doing pt_intr_post in EOI-induced VM-exit instead? EOI exit bit for
> >pt_intr is already set to 1 which means every injection would incur an
> >EOI-induced VM-exit:
> >
> >       /*
> >        * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced
> >VM
> >        * exit, then pending periodic time interrups have the chance to be
> >injected
> >        * for compensation
> >        */
> >        if (pt_vector != -1)
> >            vmx_set_eoi_exit_bitmap(v, pt_vector);
> >
> >I don't think delaying post makes much difference here. Even we do post earlier
> >like your patch, further pending instances have to wait until current instance is
> >completed. So as long as post happens before EOI is completed, it should be
> >fine.
> 
> Kevin, I verify your suggestion with below modification. wall clock time is __still__ faster.
> I hope this modification is correct to your suggestion.
> 
> I __guess__ that the vm-entry is more frequent than PT interrupt,
> Specifically, if there is a PT interrupt pending, the count (pending_intr_nr) is 1..
> 
> before next PT interrupt coming, each PT interrupt injection may not incur an EOI-induced
> VM-exit directly, multiple PT interrupt inject for a pending PT interrupt,
> then multiple EOI-induced VM-exit incur with multiple pt_intr_post() to decrease the count
> (pending_intr_nr), but the count (pending_intr_nr) is still 1 before next PT interrupt coming,
> then only one pt_intr_post() is effective..

I don't quite understand your description here, but just for your patch see below...

> 
> Thanks for your time!!
> Quan
> 
> ======= modification ========
> 
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 1d5d287..cc247c3 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -433,6 +433,11 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
>  {
>      struct domain *d = vlapic_domain(vlapic);
> +    struct hvm_intack pt_intack;
> +
> +    pt_intack.vector = vector;
> +    pt_intack.source = hvm_intsrc_lapic;
> +    pt_intr_post(vlapic_vcpu(vlapic), pt_intack);
> 
>      if ( vlapic_test_and_clear_vector(vector, &vlapic->regs->data[APIC_TMR]) )
>          vioapic_update_EOI(d, vector);
> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> index 8fca08c..29d9bbf 100644
> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
>              clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>          }
> -
> -        pt_intr_post(v, intack);
>      }
>      else
>      {
> 

Because we update pt irq in every vmentry, there is a chance that 
already-injected instance (before EOI-induced exit happens) will 
incur another pending IRR setting if there is a VM-exit happens 
between HW virtual interrupt injection (vIRR->0, vISR->1) and 
EOI-induced exit (vISR->0), since pt_intr_post hasn't been invoked
yet. I guess this is the reason why you still see faster wallclock.

I think you need mark this pending_intr_post situation explicitly. 
Then pt_update_irq should skip such pt timer when pending_intr_post 
of that timer is true (otherwise the update is meaningless since 
previous one hasn't been posted yet). Then with your change to post 
in EOI-induced exit handler, it should work correctly to meet the goal 
- one virtual interrupt delivery for one pending pt intr...

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel