All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Tian, Kevin" <kevin.tian@intel.com>
To: "Xuquan (Euler)" <xuquan8@huawei.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: "yang.zhang.wz@gmail.com" <yang.zhang.wz@gmail.com>,
	"jbeulich@suse.com" <jbeulich@suse.com>,
	"George.Dunlap@eu.citrix.com" <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>
Subject: Re: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
Date: Mon, 12 Sep 2016 07:57:44 +0000	[thread overview]
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D18DED4739@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: <E0A769A898ADB6449596C41F51EF62C6AA3E8E@SZXEMI506-MBX.china.huawei.com>

> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
> Sent: Friday, September 09, 2016 11:02 AM
> 
> On August 30, 2016 1:58 PM, Tian Kevin < kevin.tian@intel.com > wrote:
> >> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
> >> Sent: Friday, August 19, 2016 8:59 PM
> >>
> >> When Xen apicv is enabled, wall clock time is faster on Windows7-32
> >> guest with high payload (with 2vCPU, captured from xentrace, in high
> >> payload, the count of IPI interrupt increases rapidly between these vCPUs).
> >>
> >> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
> >> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the
> >> IPI intrrupt is high priority than periodic timer interrupt. Xen
> >> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI)
> >> as a high priority and apicv (Virtual-Interrupt Delivery) delivers IPI
> >> interrupt within VMX non-root operation without a VM exit. Within VMX
> >> non-root operation, if periodic timer interrupt index of bit is set in
> >> VIRR and highest, the apicv delivers periodic timer interrupt within VMX
> >non-root operation as well.
> >>
> >> But in current code, if Xen doesn't update periodic timer interrupt
> >> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
> >> aware of this case to decrease the count (pending_intr_nr) of pending
> >> periodic timer interrupt, then Xen will deliver a periodic timer
> >> interrupt again. The guest receives more periodic timer interrupt.
> >>
> >> If the periodic timer interrut is delivered and not the highest
> >> priority, make Xen be aware of this case to decrease the count of
> >> pending periodic timer interrupt.
> >>
> >> Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
> >> Signed-off-by: Rongguang He <herongguang.he@huawei.com>
> >> Signed-off-by: Quan Xu <xuquan8@huawei.com>
> >> ---
> >> Why RFC:
> >> 1. I am not quite sure for other cases, such as nested case.
> >> 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including
> >external
> >>    interrupts, non-maskable interrupt system-management interrrupts,
> >exceptions
> >>    and VM exit) may occur before delivery of a periodic timer interrupt, the
> >periodic
> >>    timer interrupt may be lost when a coming periodic timer interrupt is
> >delivered.
> >>    Actually, and so current code is.
> >> ---
> >>  xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++-
> >>  1 file changed, 15 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> >> index 8fca08c..d3a034e 100644
> >> --- a/xen/arch/x86/hvm/vmx/intr.c
> >> +++ b/xen/arch/x86/hvm/vmx/intr.c
> >> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
> >>              __vmwrite(EOI_EXIT_BITMAP(i),
> >v->arch.hvm_vmx.eoi_exit_bitmap[i]);
> >>          }
> >>
> >> -        pt_intr_post(v, intack);
> >> +       /*
> >> +        * If the periodic timer interrut is delivered and not the highest
> >priority,
> >> +        * make Xen be aware of this case to decrease the count of pending
> >periodic
> >> +        * timer interrupt.
> >> +        */
> >> +        if ( pt_vector != -1 && intack.vector > pt_vector )
> >> +        {
> >> +            struct hvm_intack pt_intack;
> >> +
> >> +            pt_intack.vector = pt_vector;
> >> +            pt_intack.source = hvm_intsrc_lapic;
> >> +            pt_intr_post(v, pt_intack);
> >> +        }
> >> +        else
> >> +            pt_intr_post(v, intack);
> >>      }
> >
> >Here is my thought. w/o above patch one pending pt interrupt may result in
> >multiple injections of guest timer interrupt, as you identified, when pt_vector
> >happens to be not the highest pending vector. Then w/ your patch, multiple
> >pending pt interrupt instances may be combined as one injection of guest timer
> >interrupt. Especially intack.vector
> >>pt_vector doesn't mean pt_intr is eligible for injection, which might
> >be blocked due to other reasons. As Jan pointed out, this path is very fragile, so
> >better we can have a fix to sustain the original behavior with one pending pt
> >instance causing one actual injection.
> >
> >What about doing pt_intr_post in EOI-induced VM-exit instead? EOI exit bit for
> >pt_intr is already set to 1 which means every injection would incur an
> >EOI-induced VM-exit:
> >
> >       /*
> >        * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced
> >VM
> >        * exit, then pending periodic time interrups have the chance to be
> >injected
> >        * for compensation
> >        */
> >        if (pt_vector != -1)
> >            vmx_set_eoi_exit_bitmap(v, pt_vector);
> >
> >I don't think delaying post makes much difference here. Even we do post earlier
> >like your patch, further pending instances have to wait until current instance is
> >completed. So as long as post happens before EOI is completed, it should be
> >fine.
> 
> Kevin, I verify your suggestion with below modification. wall clock time is __still__ faster.
> I hope this modification is correct to your suggestion.
> 
> I __guess__ that the vm-entry is more frequent than PT interrupt,
> Specifically, if there is a PT interrupt pending, the count (pending_intr_nr) is 1..
> 
> before next PT interrupt coming, each PT interrupt injection may not incur an EOI-induced
> VM-exit directly, multiple PT interrupt inject for a pending PT interrupt,
> then multiple EOI-induced VM-exit incur with multiple pt_intr_post() to decrease the count
> (pending_intr_nr), but the count (pending_intr_nr) is still 1 before next PT interrupt coming,
> then only one pt_intr_post() is effective..

I don't quite understand your description here, but just for your patch see below...

> 
> Thanks for your time!!
> Quan
> 
> ======= modification ========
> 
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 1d5d287..cc247c3 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -433,6 +433,11 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
>  {
>      struct domain *d = vlapic_domain(vlapic);
> +    struct hvm_intack pt_intack;
> +
> +    pt_intack.vector = vector;
> +    pt_intack.source = hvm_intsrc_lapic;
> +    pt_intr_post(vlapic_vcpu(vlapic), pt_intack);
> 
>      if ( vlapic_test_and_clear_vector(vector, &vlapic->regs->data[APIC_TMR]) )
>          vioapic_update_EOI(d, vector);
> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> index 8fca08c..29d9bbf 100644
> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
>              clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>          }
> -
> -        pt_intr_post(v, intack);
>      }
>      else
>      {
> 

Because we update pt irq in every vmentry, there is a chance that 
already-injected instance (before EOI-induced exit happens) will 
incur another pending IRR setting if there is a VM-exit happens 
between HW virtual interrupt injection (vIRR->0, vISR->1) and 
EOI-induced exit (vISR->0), since pt_intr_post hasn't been invoked
yet. I guess this is the reason why you still see faster wallclock.

I think you need mark this pending_intr_post situation explicitly. 
Then pt_update_irq should skip such pt timer when pending_intr_post 
of that timer is true (otherwise the update is meaningless since 
previous one hasn't been posted yet). Then with your change to post 
in EOI-induced exit handler, it should work correctly to meet the goal 
- one virtual interrupt delivery for one pending pt intr...

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

  reply	other threads:[~2016-09-12  7:57 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-19 12:58 [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue Xuquan (Euler)
2016-08-22  9:38 ` Yang Zhang
2016-08-22 10:52   ` Xuquan (Euler)
2016-08-22 10:35 ` Jan Beulich
2016-08-22 11:40   ` Yang Zhang
2016-08-22 11:52     ` Xuquan (Euler)
2016-08-22 12:01     ` Jan Beulich
2016-08-22 11:41   ` Xuquan (Euler)
2016-08-22 12:04     ` Jan Beulich
2016-08-22 13:09       ` Yang Zhang
2016-08-22 13:18         ` Xuquan (Euler)
2016-08-22 14:57         ` Jan Beulich
2016-08-23  2:11           ` Yang Zhang
2016-08-22 14:02       ` Xuquan (Euler)
2016-08-22 14:07         ` Yang Zhang
2016-08-22 14:15         ` Yang Zhang
2016-08-22 15:11         ` Jan Beulich
2016-08-23  0:40           ` Xuquan (Euler)
2016-08-30  5:58 ` Tian, Kevin
2016-09-06  8:53   ` Xuquan (Euler)
2016-09-06  9:04     ` Jan Beulich
2016-09-06 11:22       ` Xuquan (Euler)
2016-09-06 13:20         ` Jan Beulich
2016-09-07  0:47           ` Xuquan (Euler)
2016-09-09  3:02   ` Xuquan (Euler)
2016-09-12  7:57     ` Tian, Kevin [this message]
2016-09-12  9:07       ` Xuquan (Euler)
2016-09-19  9:25         ` Tian, Kevin
2016-09-20  0:25           ` Xuquan (Euler)
2016-09-20  0:46             ` Tian, Kevin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AADFC41AFE54684AB9EE6CBC0274A5D18DED4739@SHSMSX101.ccr.corp.intel.com \
    --to=kevin.tian@intel.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=jun.nakajima@intel.com \
    --cc=xen-devel@lists.xen.org \
    --cc=xuquan8@huawei.com \
    --cc=yang.zhang.wz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.