xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
@ 2016-09-20 13:30 Xuquan (Euler)
  2016-09-23 15:33 ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Xuquan (Euler) @ 2016-09-20 13:30 UTC (permalink / raw)
  To: xen-devel
  Cc: yang.zhang.wz, Kevin Tian, jbeulich, George.Dunlap,
	Andrew Cooper, Hanweidong (Randy),
	Jiangyifei, Nakajima, Jun

[-- Attachment #1: Type: text/plain, Size: 4047 bytes --]

>From 97760602b5c94745e76ed78d23e8fdf9988d234e Mon Sep 17 00:00:00 2001
From: Quan Xu <xuquan8@huawei.com>
Date: Tue, 20 Sep 2016 21:12:54 +0800
Subject: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue

When Xen apicv is enabled, wall clock time is faster on Windows7-32
guest with high payload (with 2vCPU, captured from xentrace, in
high payload, the count of IPI interrupt increases rapidly between
these vCPUs).

If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
are both pending (index of bit set in vIRR), unfortunately, the IPI
intrrupt is high priority than periodic timer interrupt. Xen updates
IPI interrupt bit set in vIRR to guest interrupt status (RVI) as a high
priority and apicv (Virtual-Interrupt Delivery) delivers IPI interrupt
within VMX non-root operation without a VM-Exit. Within VMX non-root
operation, if periodic timer interrupt index of bit is set in vIRR and
highest, the apicv delivers periodic timer interrupt within VMX non-root
operation as well.

But in current code, if Xen doesn't update periodic timer interrupt bit
set in vIRR to guest interrupt status (RVI) directly, Xen is not aware
of this case to decrease the count (pending_intr_nr) of pending periodic
timer interrupt, then Xen will deliver a periodic timer interrupt again.

And that we update periodic timer interrupt in every VM-entry, there is
a chance that already-injected instance (before EOI-induced exit happens)
will incur another pending IRR setting if there is a VM-exit happens
between virtual interrupt injection (vIRR->0, vISR->1) and EOI-induced
exit (vISR->0), since pt_intr_post hasn't been invoked yet, then the
guest receives more periodic timer interrupt.

So change to pt_intr_post in EOI-induced exit handler and skip periodic
timer when it is not be completely consumed (irq_issued is ture).

Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Rongguang He <herongguang.he@huawei.com>
Signed-off-by: Quan Xu <xuquan8@huawei.com>

---
v2:
  -change to pt_intr_post in EOI-induced exit handler.
  -skip periodic timer when it is not be completely consumed
   (irq_issued is ture).
---
 xen/arch/x86/hvm/vlapic.c   | 6 ++++++
 xen/arch/x86/hvm/vmx/intr.c | 2 --
 xen/arch/x86/hvm/vpt.c      | 3 ++-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 1d5d287..f83d6ab 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
 void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
 {
     struct domain *d = vlapic_domain(vlapic);
+    struct vcpu *v = vlapic_vcpu(vlapic);
+    struct hvm_intack pt_intack;
+
+    pt_intack.vector = vector;
+    pt_intack.source = hvm_intsrc_lapic;
+    pt_intr_post(v, pt_intack);

     if ( vlapic_test_and_clear_vector(vector, &vlapic->regs->data[APIC_TMR]) )
         vioapic_update_EOI(d, vector);
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 8fca08c..29d9bbf 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -333,8 +333,6 @@ void vmx_intr_assist(void)
             clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
             __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
         }
-
-        pt_intr_post(v, intack);
     }
     else
     {
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 5c48fdb..a9da436 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v)
             }
             else
             {
-                if ( (pt->last_plt_gtime + pt->period) < max_lag )
+                if ( (pt->last_plt_gtime + pt->period) < max_lag &&
+                     !pt->irq_issued )
                 {
                     max_lag = pt->last_plt_gtime + pt->period;
                     earliest_pt = pt;
--
1.8.3.4


[-- Attachment #2: 0001-x86-apicv-fix-RTC-periodic-timer-and-apicv-issue.patch --]
[-- Type: application/octet-stream, Size: 3951 bytes --]

From 97760602b5c94745e76ed78d23e8fdf9988d234e Mon Sep 17 00:00:00 2001
From: Quan Xu <xuquan8@huawei.com>
Date: Tue, 20 Sep 2016 21:12:54 +0800
Subject: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue

When Xen apicv is enabled, wall clock time is faster on Windows7-32
guest with high payload (with 2vCPU, captured from xentrace, in
high payload, the count of IPI interrupt increases rapidly between
these vCPUs).

If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
are both pending (index of bit set in vIRR), unfortunately, the IPI
intrrupt is high priority than periodic timer interrupt. Xen updates
IPI interrupt bit set in vIRR to guest interrupt status (RVI) as a high
priority and apicv (Virtual-Interrupt Delivery) delivers IPI interrupt
within VMX non-root operation without a VM-Exit. Within VMX non-root
operation, if periodic timer interrupt index of bit is set in vIRR and
highest, the apicv delivers periodic timer interrupt within VMX non-root
operation as well.

But in current code, if Xen doesn't update periodic timer interrupt bit
set in vIRR to guest interrupt status (RVI) directly, Xen is not aware
of this case to decrease the count (pending_intr_nr) of pending periodic
timer interrupt, then Xen will deliver a periodic timer interrupt again.

And that we update periodic timer interrupt in every VM-entry, there is
a chance that already-injected instance (before EOI-induced exit happens)
will incur another pending IRR setting if there is a VM-exit happens
between virtual interrupt injection (vIRR->0, vISR->1) and EOI-induced
exit (vISR->0), since pt_intr_post hasn't been invoked yet, then the
guest receives more periodic timer interrupt.

So change to pt_intr_post in EOI-induced exit handler and skip periodic
timer when it is not be completely consumed (irq_issued is ture).

Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Rongguang He <herongguang.he@huawei.com>
Signed-off-by: Quan Xu <xuquan8@huawei.com>

---
v2:
  -change to pt_intr_post in EOI-induced exit handler.
  -skip periodic timer when it is not be completely consumed
   (irq_issued is ture).
---
 xen/arch/x86/hvm/vlapic.c   | 6 ++++++
 xen/arch/x86/hvm/vmx/intr.c | 2 --
 xen/arch/x86/hvm/vpt.c      | 3 ++-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 1d5d287..f83d6ab 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
 void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
 {
     struct domain *d = vlapic_domain(vlapic);
+    struct vcpu *v = vlapic_vcpu(vlapic);
+    struct hvm_intack pt_intack;
+
+    pt_intack.vector = vector;
+    pt_intack.source = hvm_intsrc_lapic;
+    pt_intr_post(v, pt_intack);
 
     if ( vlapic_test_and_clear_vector(vector, &vlapic->regs->data[APIC_TMR]) )
         vioapic_update_EOI(d, vector);
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 8fca08c..29d9bbf 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -333,8 +333,6 @@ void vmx_intr_assist(void)
             clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
             __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
         }
-
-        pt_intr_post(v, intack);
     }
     else
     {
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 5c48fdb..a9da436 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v)
             }
             else
             {
-                if ( (pt->last_plt_gtime + pt->period) < max_lag )
+                if ( (pt->last_plt_gtime + pt->period) < max_lag &&
+                     !pt->irq_issued )
                 {
                     max_lag = pt->last_plt_gtime + pt->period;
                     earliest_pt = pt;
-- 
1.8.3.4


[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-09-20 13:30 [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue Xuquan (Euler)
@ 2016-09-23 15:33 ` Jan Beulich
  2016-09-23 23:34   ` Tian, Kevin
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2016-09-23 15:33 UTC (permalink / raw)
  To: Xuquan (Euler), Tim Deegan
  Cc: yang.zhang.wz, Kevin Tian, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, xen-devel, Jiangyifei,
	Jun Nakajima

>>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
>  {
>      struct domain *d = vlapic_domain(vlapic);
> +    struct vcpu *v = vlapic_vcpu(vlapic);
> +    struct hvm_intack pt_intack;
> +
> +    pt_intack.vector = vector;
> +    pt_intack.source = hvm_intsrc_lapic;
> +    pt_intr_post(v, pt_intack);

This also sits on the EOI LAPIC register write path, i.e. the change
then also affects non-apicv environments.

Furthermore - don't we have the same problem as with v1 again
then? What prevents multiple EOIs to come here before the timer
interrupt actually gets handled? You'd then clear ->irq_issued
each time, rendering your change to pt_update_irq() ineffective.

In any event please use hvm_intack_lapic() instead of open
coding it (and then I don't think you need a local variable at all).

> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
>              clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>          }
> -
> -        pt_intr_post(v, intack);
>      }

I'll defer to the VMX maintainers to determine whether removing this
but not the other one is correct.

> --- a/xen/arch/x86/hvm/vpt.c
> +++ b/xen/arch/x86/hvm/vpt.c
> @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v)
>              }
>              else
>              {
> -                if ( (pt->last_plt_gtime + pt->period) < max_lag )
> +                if ( (pt->last_plt_gtime + pt->period) < max_lag &&
> +                     !pt->irq_issued )
>                  {
>                      max_lag = pt->last_plt_gtime + pt->period;
>                      earliest_pt = pt;

Looking at the rest of the code I really wonder why this check
hadn't been there from the beginning. Tim, do recall whether
this was intentional (as opposed to simply having been an
oversight)?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-09-23 15:33 ` Jan Beulich
@ 2016-09-23 23:34   ` Tian, Kevin
  2016-09-24  1:06     ` Xuquan (Euler)
  2016-09-26  6:35     ` Jan Beulich
  0 siblings, 2 replies; 23+ messages in thread
From: Tian, Kevin @ 2016-09-23 23:34 UTC (permalink / raw)
  To: Jan Beulich, Xuquan (Euler), Tim Deegan
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, xen-devel, Jiangyifei, Nakajima,
	Jun

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, September 23, 2016 11:34 PM
> 
> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
> > --- a/xen/arch/x86/hvm/vlapic.c
> > +++ b/xen/arch/x86/hvm/vlapic.c
> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
> >  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
> >  {
> >      struct domain *d = vlapic_domain(vlapic);
> > +    struct vcpu *v = vlapic_vcpu(vlapic);
> > +    struct hvm_intack pt_intack;
> > +
> > +    pt_intack.vector = vector;
> > +    pt_intack.source = hvm_intsrc_lapic;
> > +    pt_intr_post(v, pt_intack);
> 
> This also sits on the EOI LAPIC register write path, i.e. the change
> then also affects non-apicv environments.

The new logic should be entered only when EOI-induced exit happens.

> 
> Furthermore - don't we have the same problem as with v1 again
> then? What prevents multiple EOIs to come here before the timer
> interrupt actually gets handled? You'd then clear ->irq_issued
> each time, rendering your change to pt_update_irq() ineffective.

based on this patch, one irq_issued should cause only one EOI on
related pt vector and this EOI exit clears irq_issued then next EOI
would be seen only upon another injection when irq_issued is set
again... However there might be an issue if this pt vector is shared 
with another device interrupt, which although is not a typical case...

> 
> In any event please use hvm_intack_lapic() instead of open
> coding it (and then I don't think you need a local variable at all).
> 
> > --- a/xen/arch/x86/hvm/vmx/intr.c
> > +++ b/xen/arch/x86/hvm/vmx/intr.c
> > @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
> >              clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
> >              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
> >          }
> > -
> > -        pt_intr_post(v, intack);
> >      }
> 
> I'll defer to the VMX maintainers to determine whether removing this
> but not the other one is correct.

This is correct. The other one is for non-apicv scenario.

> 
> > --- a/xen/arch/x86/hvm/vpt.c
> > +++ b/xen/arch/x86/hvm/vpt.c
> > @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v)
> >              }
> >              else
> >              {
> > -                if ( (pt->last_plt_gtime + pt->period) < max_lag )
> > +                if ( (pt->last_plt_gtime + pt->period) < max_lag &&
> > +                     !pt->irq_issued )
> >                  {
> >                      max_lag = pt->last_plt_gtime + pt->period;
> >                      earliest_pt = pt;
> 
> Looking at the rest of the code I really wonder why this check
> hadn't been there from the beginning. Tim, do recall whether
> this was intentional (as opposed to simply having been an
> oversight)?
> 

Possibly simplify the emulation by relying on IRR/ISR to handling pending
situation?

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-09-23 23:34   ` Tian, Kevin
@ 2016-09-24  1:06     ` Xuquan (Euler)
  2016-09-26  6:39       ` Jan Beulich
  2016-09-26  6:35     ` Jan Beulich
  1 sibling, 1 reply; 23+ messages in thread
From: Xuquan (Euler) @ 2016-09-24  1:06 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Tim Deegan
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, xen-devel, Jiangyifei, Nakajima,
	Jun

On September 24, 2016 7:34 AM, Tian Kevin < kevin.tian@intel.com > wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, September 23, 2016 11:34 PM
>>
>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
>> > --- a/xen/arch/x86/hvm/vlapic.c
>> > +++ b/xen/arch/x86/hvm/vlapic.c
>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
>> >      struct domain *d = vlapic_domain(vlapic);
>> > +    struct vcpu *v = vlapic_vcpu(vlapic);
>> > +    struct hvm_intack pt_intack;
>> > +
>> > +    pt_intack.vector = vector;
>> > +    pt_intack.source = hvm_intsrc_lapic;
>> > +    pt_intr_post(v, pt_intack);
>>
>> This also sits on the EOI LAPIC register write path, i.e. the change
>> then also affects non-apicv environments.
>
>The new logic should be entered only when EOI-induced exit happens.
>

Yes, more that the EOI-induced exit is conditional, specifically, the bitmap is set by vmx_set_eoi_exit_bitmap().
Jan, what do you think? While I recall from v1 discussion, you have the same comment. We can dig it deep..

>>
>> Furthermore - don't we have the same problem as with v1 again then?
>> What prevents multiple EOIs to come here before the timer interrupt
>> actually gets handled? You'd then clear ->irq_issued each time,
>> rendering your change to pt_update_irq() ineffective.
>
>based on this patch, one irq_issued should cause only one EOI on related pt
>vector and this EOI exit clears irq_issued then next EOI would be seen only upon
>another injection when irq_issued is set again... However there might be an
>issue if this pt vector is shared with another device interrupt, which although is
>not a typical case...
>
Agreed.

>>
>> In any event please use hvm_intack_lapic() instead of open coding it
>> (and then I don't think you need a local variable at all).
>>

Indeed.

>> > --- a/xen/arch/x86/hvm/vmx/intr.c
>> > +++ b/xen/arch/x86/hvm/vmx/intr.c
>> > @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
>
>> >              clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
>> >              __vmwrite(EOI_EXIT_BITMAP(i),
>v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>> >          }
>> > -
>> > -        pt_intr_post(v, intack);
>> >      }
>>
>> I'll defer to the VMX maintainers to determine whether removing this
>> but not the other one is correct.
>
>This is correct. The other one is for non-apicv scenario.
>
>>
>> > --- a/xen/arch/x86/hvm/vpt.c
>> > +++ b/xen/arch/x86/hvm/vpt.c
>> > @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v)
>> >              }
>> >              else
>> >              {
>> > -                if ( (pt->last_plt_gtime + pt->period) < max_lag )
>> > +                if ( (pt->last_plt_gtime + pt->period) < max_lag &&
>> > +                     !pt->irq_issued )
>> >                  {
>> >                      max_lag = pt->last_plt_gtime + pt->period;
>> >                      earliest_pt = pt;
>>
>> Looking at the rest of the code I really wonder why this check hadn't
>> been there from the beginning. Tim, do recall whether this was
>> intentional (as opposed to simply having been an oversight)?
>>
>
>Possibly simplify the emulation by relying on IRR/ISR to handling pending
>situation?

I think IRR is enough. But there is a challenge that the pt interrupt is __not_only__ handled by LAPIC ( see the bottom of pt_update_irq() )..

Quan






_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-09-23 23:34   ` Tian, Kevin
  2016-09-24  1:06     ` Xuquan (Euler)
@ 2016-09-26  6:35     ` Jan Beulich
  2016-09-26 19:34       ` Tian, Kevin
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2016-09-26  6:35 UTC (permalink / raw)
  To: Xuquan (Euler), Kevin Tian
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Jun Nakajima

>>> On 24.09.16 at 01:34, <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, September 23, 2016 11:34 PM
>> 
>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
>> > --- a/xen/arch/x86/hvm/vlapic.c
>> > +++ b/xen/arch/x86/hvm/vlapic.c
>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>> >  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
>> >  {
>> >      struct domain *d = vlapic_domain(vlapic);
>> > +    struct vcpu *v = vlapic_vcpu(vlapic);
>> > +    struct hvm_intack pt_intack;
>> > +
>> > +    pt_intack.vector = vector;
>> > +    pt_intack.source = hvm_intsrc_lapic;
>> > +    pt_intr_post(v, pt_intack);
>> 
>> This also sits on the EOI LAPIC register write path, i.e. the change
>> then also affects non-apicv environments.
> 
> The new logic should be entered only when EOI-induced exit happens.

To me this reply reads ambiguously: Does "should" here mean the
patch needs further adjustment in your opinion, or that you think
the patch already does what is needed to ensure the new logic to
bet involved only in the EOI-induced exit case. To me it continues
to look like the former.

>> Furthermore - don't we have the same problem as with v1 again
>> then? What prevents multiple EOIs to come here before the timer
>> interrupt actually gets handled? You'd then clear ->irq_issued
>> each time, rendering your change to pt_update_irq() ineffective.
> 
> based on this patch, one irq_issued should cause only one EOI on
> related pt vector and this EOI exit clears irq_issued then next EOI
> would be seen only upon another injection when irq_issued is set
> again... However there might be an issue if this pt vector is shared 
> with another device interrupt, which although is not a typical case...

That's a common problem: I don't think we can consider only the
typical case. Or does hardware only deal with those, too?

And then the "should" here reads as ambiguously to me as the
earlier one, the more that you seem to consider EOIs for only the
one vector of interest, whereas my reply was specifically meant
to cover also EOIs for other vectors.

>> > --- a/xen/arch/x86/hvm/vmx/intr.c
>> > +++ b/xen/arch/x86/hvm/vmx/intr.c
>> > @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
>> >              clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
>> >              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>> >          }
>> > -
>> > -        pt_intr_post(v, intack);
>> >      }
>> 
>> I'll defer to the VMX maintainers to determine whether removing this
>> but not the other one is correct.
> 
> This is correct. The other one is for non-apicv scenario.

Sure, but the other path will also get used under certain conditions
even when apicv is in use.

>> > --- a/xen/arch/x86/hvm/vpt.c
>> > +++ b/xen/arch/x86/hvm/vpt.c
>> > @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v)
>> >              }
>> >              else
>> >              {
>> > -                if ( (pt->last_plt_gtime + pt->period) < max_lag )
>> > +                if ( (pt->last_plt_gtime + pt->period) < max_lag &&
>> > +                     !pt->irq_issued )
>> >                  {
>> >                      max_lag = pt->last_plt_gtime + pt->period;
>> >                      earliest_pt = pt;
>> 
>> Looking at the rest of the code I really wonder why this check
>> hadn't been there from the beginning. Tim, do recall whether
>> this was intentional (as opposed to simply having been an
>> oversight)?
> 
> Possibly simplify the emulation by relying on IRR/ISR to handling pending
> situation?

Again I'm suffering ambiguity here: Do you suggest a possible
explanation for why things are the way they are, or a possible
code adjustment to be done?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-09-24  1:06     ` Xuquan (Euler)
@ 2016-09-26  6:39       ` Jan Beulich
  2016-10-10  9:21         ` Xuquan (Quan Xu)
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2016-09-26  6:39 UTC (permalink / raw)
  To: Xuquan (Euler), Kevin Tian
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Jun Nakajima

>>> On 24.09.16 at 03:06, <xuquan8@huawei.com> wrote:
> On September 24, 2016 7:34 AM, Tian Kevin < kevin.tian@intel.com > wrote:
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> Sent: Friday, September 23, 2016 11:34 PM
>>>
>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
>>> > --- a/xen/arch/x86/hvm/vlapic.c
>>> > +++ b/xen/arch/x86/hvm/vlapic.c
>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
>>> >      struct domain *d = vlapic_domain(vlapic);
>>> > +    struct vcpu *v = vlapic_vcpu(vlapic);
>>> > +    struct hvm_intack pt_intack;
>>> > +
>>> > +    pt_intack.vector = vector;
>>> > +    pt_intack.source = hvm_intsrc_lapic;
>>> > +    pt_intr_post(v, pt_intack);
>>>
>>> This also sits on the EOI LAPIC register write path, i.e. the change
>>> then also affects non-apicv environments.
>>
>>The new logic should be entered only when EOI-induced exit happens.
>>
> 
> Yes, more that the EOI-induced exit is conditional, specifically, the bitmap 
> is set by vmx_set_eoi_exit_bitmap().
> Jan, what do you think? While I recall from v1 discussion, you have the same 
> comment. We can dig it deep..

See my reply to Kevin sent a minute ago. As I'm not sure what
Kevin means to state with several of his responses, I can't
properly respond for now. And then what you say doesn't
really address my concern - things being conditional elsewhere
doesn't mean we won't get here too in the non-apicv case, at
least not in a way that I can follow right away.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-09-26  6:35     ` Jan Beulich
@ 2016-09-26 19:34       ` Tian, Kevin
  0 siblings, 0 replies; 23+ messages in thread
From: Tian, Kevin @ 2016-09-26 19:34 UTC (permalink / raw)
  To: Jan Beulich, Xuquan (Euler)
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, September 26, 2016 2:35 PM
> 
> >>> On 24.09.16 at 01:34, <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, September 23, 2016 11:34 PM
> >>
> >> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
> >> > --- a/xen/arch/x86/hvm/vlapic.c
> >> > +++ b/xen/arch/x86/hvm/vlapic.c
> >> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
> >> >  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
> >> >  {
> >> >      struct domain *d = vlapic_domain(vlapic);
> >> > +    struct vcpu *v = vlapic_vcpu(vlapic);
> >> > +    struct hvm_intack pt_intack;
> >> > +
> >> > +    pt_intack.vector = vector;
> >> > +    pt_intack.source = hvm_intsrc_lapic;
> >> > +    pt_intr_post(v, pt_intack);
> >>
> >> This also sits on the EOI LAPIC register write path, i.e. the change
> >> then also affects non-apicv environments.
> >
> > The new logic should be entered only when EOI-induced exit happens.
> 
> To me this reply reads ambiguously: Does "should" here mean the
> patch needs further adjustment in your opinion, or that you think
> the patch already does what is needed to ensure the new logic to
> bet involved only in the EOI-induced exit case. To me it continues
> to look like the former.

yes the former. Possibly clearer if I directly reply to original Quan's
patch. :-)

> 
> >> Furthermore - don't we have the same problem as with v1 again
> >> then? What prevents multiple EOIs to come here before the timer
> >> interrupt actually gets handled? You'd then clear ->irq_issued
> >> each time, rendering your change to pt_update_irq() ineffective.
> >
> > based on this patch, one irq_issued should cause only one EOI on
> > related pt vector and this EOI exit clears irq_issued then next EOI
> > would be seen only upon another injection when irq_issued is set
> > again... However there might be an issue if this pt vector is shared
> > with another device interrupt, which although is not a typical case...
> 
> That's a common problem: I don't think we can consider only the
> typical case. Or does hardware only deal with those, too?

need more thinking here.

> 
> And then the "should" here reads as ambiguously to me as the
> earlier one, the more that you seem to consider EOIs for only the
> one vector of interest, whereas my reply was specifically meant
> to cover also EOIs for other vectors.

This "should" means the behavior after further adjustment

> 
> >> > --- a/xen/arch/x86/hvm/vmx/intr.c
> >> > +++ b/xen/arch/x86/hvm/vmx/intr.c
> >> > @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
> >> >              clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
> >> >              __vmwrite(EOI_EXIT_BITMAP(i),
> v->arch.hvm_vmx.eoi_exit_bitmap[i]);
> >> >          }
> >> > -
> >> > -        pt_intr_post(v, intack);
> >> >      }
> >>
> >> I'll defer to the VMX maintainers to determine whether removing this
> >> but not the other one is correct.
> >
> > This is correct. The other one is for non-apicv scenario.
> 
> Sure, but the other path will also get used under certain conditions
> even when apicv is in use.
> 
> >> > --- a/xen/arch/x86/hvm/vpt.c
> >> > +++ b/xen/arch/x86/hvm/vpt.c
> >> > @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v)
> >> >              }
> >> >              else
> >> >              {
> >> > -                if ( (pt->last_plt_gtime + pt->period) < max_lag )
> >> > +                if ( (pt->last_plt_gtime + pt->period) < max_lag &&
> >> > +                     !pt->irq_issued )
> >> >                  {
> >> >                      max_lag = pt->last_plt_gtime + pt->period;
> >> >                      earliest_pt = pt;
> >>
> >> Looking at the rest of the code I really wonder why this check
> >> hadn't been there from the beginning. Tim, do recall whether
> >> this was intentional (as opposed to simply having been an
> >> oversight)?
> >
> > Possibly simplify the emulation by relying on IRR/ISR to handling pending
> > situation?
> 
> Again I'm suffering ambiguity here: Do you suggest a possible
> explanation for why things are the way they are, or a possible
> code adjustment to be done?
> 

the former.

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-09-26  6:39       ` Jan Beulich
@ 2016-10-10  9:21         ` Xuquan (Quan Xu)
  2016-10-10  9:39           ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Xuquan (Quan Xu) @ 2016-10-10  9:21 UTC (permalink / raw)
  To: Jan Beulich, Kevin Tian
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Jun Nakajima

On September 26, 2016 2:39 PM, Jan Beulich < JBeulich@suse.com > wrote:
>>>> On 24.09.16 at 03:06, <xuquan8@huawei.com> wrote:
>> On September 24, 2016 7:34 AM, Tian Kevin < kevin.tian@intel.com > wrote:
>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>> Sent: Friday, September 23, 2016 11:34 PM
>>>>
>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
>>>> > --- a/xen/arch/x86/hvm/vlapic.c
>>>> > +++ b/xen/arch/x86/hvm/vlapic.c
>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>>>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
>>>> >      struct domain *d = vlapic_domain(vlapic);
>>>> > +    struct vcpu *v = vlapic_vcpu(vlapic);
>>>> > +    struct hvm_intack pt_intack;
>>>> > +
>>>> > +    pt_intack.vector = vector;
>>>> > +    pt_intack.source = hvm_intsrc_lapic;
>>>> > +    pt_intr_post(v, pt_intack);
>>>>
>>>> This also sits on the EOI LAPIC register write path, i.e. the change
>>>> then also affects non-apicv environments.
>>>
>>>The new logic should be entered only when EOI-induced exit happens.
>>>
>>
>> Yes, more that the EOI-induced exit is conditional, specifically, the
>> bitmap is set by vmx_set_eoi_exit_bitmap().
>> Jan, what do you think? While I recall from v1 discussion, you have
>> the same comment. We can dig it deep..
>
>See my reply to Kevin sent a minute ago. As I'm not sure what Kevin means to
>state with several of his responses, I can't properly respond for now. And then
>what you say doesn't really address my concern - things being conditional
>elsewhere doesn't mean we won't get here too in the non-apicv case, at least
>not in a way that I can follow right away.
>

Jan, any idea now? 
Quan




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-10  9:21         ` Xuquan (Quan Xu)
@ 2016-10-10  9:39           ` Jan Beulich
  2016-10-10 10:49             ` Xuquan (Quan Xu)
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2016-10-10  9:39 UTC (permalink / raw)
  To: Xuquan (Quan Xu)
  Cc: yang.zhang.wz, Kevin Tian, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Jun Nakajima

>>> On 10.10.16 at 11:21, <xuquan8@huawei.com> wrote:
> On September 26, 2016 2:39 PM, Jan Beulich < JBeulich@suse.com > wrote:
>>>>> On 24.09.16 at 03:06, <xuquan8@huawei.com> wrote:
>>> On September 24, 2016 7:34 AM, Tian Kevin < kevin.tian@intel.com > wrote:
>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>> Sent: Friday, September 23, 2016 11:34 PM
>>>>>
>>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
>>>>> > --- a/xen/arch/x86/hvm/vlapic.c
>>>>> > +++ b/xen/arch/x86/hvm/vlapic.c
>>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>>>>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
>>>>> >      struct domain *d = vlapic_domain(vlapic);
>>>>> > +    struct vcpu *v = vlapic_vcpu(vlapic);
>>>>> > +    struct hvm_intack pt_intack;
>>>>> > +
>>>>> > +    pt_intack.vector = vector;
>>>>> > +    pt_intack.source = hvm_intsrc_lapic;
>>>>> > +    pt_intr_post(v, pt_intack);
>>>>>
>>>>> This also sits on the EOI LAPIC register write path, i.e. the change
>>>>> then also affects non-apicv environments.
>>>>
>>>>The new logic should be entered only when EOI-induced exit happens.
>>>>
>>>
>>> Yes, more that the EOI-induced exit is conditional, specifically, the
>>> bitmap is set by vmx_set_eoi_exit_bitmap().
>>> Jan, what do you think? While I recall from v1 discussion, you have
>>> the same comment. We can dig it deep..
>>
>>See my reply to Kevin sent a minute ago. As I'm not sure what Kevin means to
>>state with several of his responses, I can't properly respond for now. And then
>>what you say doesn't really address my concern - things being conditional
>>elsewhere doesn't mean we won't get here too in the non-apicv case, at least
>>not in a way that I can follow right away.
> 
> Jan, any idea now? 

I don't think there was anything left open on the other sub-thread;
if there is, please point out specific aspects which are still unclear.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-10  9:39           ` Jan Beulich
@ 2016-10-10 10:49             ` Xuquan (Quan Xu)
  2016-10-11  7:48               ` Tian, Kevin
  0 siblings, 1 reply; 23+ messages in thread
From: Xuquan (Quan Xu) @ 2016-10-10 10:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: yang.zhang.wz, Kevin Tian, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Jun Nakajima

On October 10, 2016 5:40 PM, Jan Beulich < JBeulich@suse.com > wrote:
>>>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
>>>>>> > --- a/xen/arch/x86/hvm/vlapic.c
>>>>>> > +++ b/xen/arch/x86/hvm/vlapic.c
>>>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>>>>>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
>>>>>> >      struct domain *d = vlapic_domain(vlapic);
>>>>>> > +    struct vcpu *v = vlapic_vcpu(vlapic);
>>>>>> > +    struct hvm_intack pt_intack;
>>>>>> > +
>>>>>> > +    pt_intack.vector = vector;
>>>>>> > +    pt_intack.source = hvm_intsrc_lapic;
>>>>>> > +    pt_intr_post(v, pt_intack);
>>>>>>
>>>>>> This also sits on the EOI LAPIC register write path, i.e. the
>>>>>> change then also affects non-apicv environments.
>>>>>
>>>>>The new logic should be entered only when EOI-induced exit happens.
>>>>>
>>>>
>>>> Yes, more that the EOI-induced exit is conditional, specifically,
>>>> the bitmap is set by vmx_set_eoi_exit_bitmap().
>>>> Jan, what do you think? While I recall from v1 discussion, you have
>>>> the same comment. We can dig it deep..
>>>
>>>See my reply to Kevin sent a minute ago. As I'm not sure what Kevin
>>>means to state with several of his responses, I can't properly respond
>>>for now. And then what you say doesn't really address my concern -
>>>things being conditional elsewhere doesn't mean we won't get here too
>>>in the non-apicv case, at least not in a way that I can follow right away.
>>
>> Jan, any idea now?
>
>I don't think there was anything left open on the other sub-thread; if there is,
>please point out specific aspects which are still unclear.
>

Sorry, I overlooked the other sub-thread after holiday(10.1-10.7)..
I will read it again..

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-10 10:49             ` Xuquan (Quan Xu)
@ 2016-10-11  7:48               ` Tian, Kevin
  2016-10-11 11:12                 ` Xuquan (Quan Xu)
  2016-10-17  9:27                 ` Xuquan (Quan Xu)
  0 siblings, 2 replies; 23+ messages in thread
From: Tian, Kevin @ 2016-10-11  7:48 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Jan Beulich
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> Sent: Monday, October 10, 2016 6:49 PM
> 
> On October 10, 2016 5:40 PM, Jan Beulich < JBeulich@suse.com > wrote:
> >>>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
> >>>>>> > --- a/xen/arch/x86/hvm/vlapic.c
> >>>>>> > +++ b/xen/arch/x86/hvm/vlapic.c
> >>>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
> >>>>>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
> >>>>>> >      struct domain *d = vlapic_domain(vlapic);
> >>>>>> > +    struct vcpu *v = vlapic_vcpu(vlapic);
> >>>>>> > +    struct hvm_intack pt_intack;
> >>>>>> > +
> >>>>>> > +    pt_intack.vector = vector;
> >>>>>> > +    pt_intack.source = hvm_intsrc_lapic;
> >>>>>> > +    pt_intr_post(v, pt_intack);
> >>>>>>
> >>>>>> This also sits on the EOI LAPIC register write path, i.e. the
> >>>>>> change then also affects non-apicv environments.
> >>>>>
> >>>>>The new logic should be entered only when EOI-induced exit happens.
> >>>>>
> >>>>
> >>>> Yes, more that the EOI-induced exit is conditional, specifically,
> >>>> the bitmap is set by vmx_set_eoi_exit_bitmap().
> >>>> Jan, what do you think? While I recall from v1 discussion, you have
> >>>> the same comment. We can dig it deep..
> >>>
> >>>See my reply to Kevin sent a minute ago. As I'm not sure what Kevin
> >>>means to state with several of his responses, I can't properly respond
> >>>for now. And then what you say doesn't really address my concern -
> >>>things being conditional elsewhere doesn't mean we won't get here too
> >>>in the non-apicv case, at least not in a way that I can follow right away.
> >>
> >> Jan, any idea now?
> >
> >I don't think there was anything left open on the other sub-thread; if there is,
> >please point out specific aspects which are still unclear.
> >
> 
> Sorry, I overlooked the other sub-thread after holiday(10.1-10.7)..
> I will read it again..
> 
> Quan

Is there any discussion after 10.1? I didn't see it.

Back to the main open before holiday - multiple EOIs may come
to clear irq_issued before guest actually handles the very vpt
injection (possible if vpt vector is shared with other sources). I
don't see a good solution on that open... :/

We've discussed various options which all fail in one or another 
place - either miss an injection, or incur undesired injections. 
Possibly we should consider another direction - fall back to 
non-apicv path when we see vpt vector pending but it's not the
highest one.

Original condition to enter virtual intr delivery:
	    else if ( cpu_has_vmx_virtual_intr_delivery &&
              intack.source != hvm_intsrc_pic &&
              intack.source != hvm_intsrc_vector )

now new condition:
	    else if ( cpu_has_vmx_virtual_intr_delivery &&
              intack.source != hvm_intsrc_pic &&
              intack.source != hvm_intsrc_vector &&
			(pt_vector == -1 || intack.vector == pt_vector) )

Thoughts?

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-11  7:48               ` Tian, Kevin
@ 2016-10-11 11:12                 ` Xuquan (Quan Xu)
  2016-10-17  9:27                 ` Xuquan (Quan Xu)
  1 sibling, 0 replies; 23+ messages in thread
From: Xuquan (Quan Xu) @ 2016-10-11 11:12 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

On October 11, 2016 3:49 PM, Tian, Kevin <Kevin.tian@intel.com>
>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> Sent: Monday, October 10, 2016 6:49 PM
>>
>> On October 10, 2016 5:40 PM, Jan Beulich < JBeulich@suse.com > wrote:
>> >>>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
>> >>>>>> > --- a/xen/arch/x86/hvm/vlapic.c
>> >>>>>> > +++ b/xen/arch/x86/hvm/vlapic.c
>> >>>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic
>> >>>>>> > *vlapic) void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
>> >>>>>> >      struct domain *d = vlapic_domain(vlapic);
>> >>>>>> > +    struct vcpu *v = vlapic_vcpu(vlapic);
>> >>>>>> > +    struct hvm_intack pt_intack;
>> >>>>>> > +
>> >>>>>> > +    pt_intack.vector = vector;
>> >>>>>> > +    pt_intack.source = hvm_intsrc_lapic;
>> >>>>>> > +    pt_intr_post(v, pt_intack);
>> >>>>>>
>> >>>>>> This also sits on the EOI LAPIC register write path, i.e. the
>> >>>>>> change then also affects non-apicv environments.
>> >>>>>
>> >>>>>The new logic should be entered only when EOI-induced exit happens.
>> >>>>>
>> >>>>
>> >>>> Yes, more that the EOI-induced exit is conditional, specifically,
>> >>>> the bitmap is set by vmx_set_eoi_exit_bitmap().
>> >>>> Jan, what do you think? While I recall from v1 discussion, you
>> >>>> have the same comment. We can dig it deep..
>> >>>
>> >>>See my reply to Kevin sent a minute ago. As I'm not sure what Kevin
>> >>>means to state with several of his responses, I can't properly
>> >>>respond for now. And then what you say doesn't really address my
>> >>>concern - things being conditional elsewhere doesn't mean we won't
>> >>>get here too in the non-apicv case, at least not in a way that I can follow
>right away.
>> >>
>> >> Jan, any idea now?
>> >
>> >I don't think there was anything left open on the other sub-thread;
>> >if there is, please point out specific aspects which are still unclear.
>> >
>>
>> Sorry, I overlooked the other sub-thread after holiday(10.1-10.7)..
>> I will read it again..
>>
>> Quan
>
>Is there any discussion after 10.1? I didn't see it.
>
>Back to the main open before holiday - multiple EOIs may come to clear
>irq_issued before guest actually handles the very vpt injection (possible if vpt
>vector is shared with other sources). I don't see a good solution on that open... :/
>
>We've discussed various options which all fail in one or another place - either
>miss an injection, or incur undesired injections.
>Possibly we should consider another direction - fall back to non-apicv path when
>we see vpt vector pending but it's not the highest one.
>
>Original condition to enter virtual intr delivery:
>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
>              intack.source != hvm_intsrc_pic &&
>              intack.source != hvm_intsrc_vector )
>
>now new condition:
>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
>              intack.source != hvm_intsrc_pic &&
>              intack.source != hvm_intsrc_vector &&
>			(pt_vector == -1 || intack.vector == pt_vector) )
>
>Thoughts?
>
Kevin, 
When I try to fix it as your suggestion, I cannot boot the guest, with below message(from xl dmesg):


(d1) HVM Loader
(d1) Detected Xen v4.8-unstable
(d1) Xenbus rings @0xfeffc000, event channel 1
(d1) System requested SeaBIOS
(d1) CPU speed is 2394 MHz
(d1) Relocating guest memory for lowmem MMIO space disabled
(XEN) irq.c:275: Dom1 PCI link 0 changed 0 -> 5
(d1) PCI-ISA link 0 routed to IRQ5
(XEN) irq.c:275: Dom1 PCI link 1 changed 0 -> 10
(d1) PCI-ISA link 1 routed to IRQ10
(XEN) irq.c:275: Dom1 PCI link 2 changed 0 -> 11
(d1) PCI-ISA link 2 routed to IRQ11
(XEN) irq.c:275: Dom1 PCI link 3 changed 0 -> 5
(d1) PCI-ISA link 3 routed to IRQ5
(d1) pci dev 01:3 INTA->IRQ10
(d1) pci dev 02:0 INTA->IRQ11
(d1) RAM in high memory; setting high_mem resource base to 20f800000
(d1) pci dev 03:0 bar 10 size 002000000: 0f0000008
(d1) pci dev 02:0 bar 14 size 001000000: 0f2000008
(d1) pci dev 03:0 bar 30 size 000010000: 0f3000000
(d1) pci dev 03:0 bar 14 size 000001000: 0f3010000
(d1) pci dev 02:0 bar 10 size 000000100: 00000c001
(d1) pci dev 01:1 bar 20 size 000000010: 00000c101
(d1) Multiprocessor initialisation:
(d1)  - CPU0 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
(d1)  - CPU1 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
(d1) Testing HVM environment:
(d1)  - REP INSB across page boundaries ... passed
(d1)  - GS base MSRs and SWAPGS ... passed
(d1) Passed 2 of 2 tests
(d1) Writing SMBIOS tables ...
(d1) Loading SeaBIOS ...
(d1) Creating MP tables ...
(d1) Loading ACPI ...
(d1) vm86 TSS at fc00a300
(d1) BIOS map:
(d1)  10000-100e3: Scratch space
(d1)  c0000-fffff: Main BIOS
(d1) E820 table:
(d1)  [00]: 00000000:00000000 - 00000000:000a0000: RAM
(d1)  HOLE: 00000000:000a0000 - 00000000:000c0000
(d1)  [01]: 00000000:000c0000 - 00000000:00100000: RESERVED
(d1)  [02]: 00000000:00100000 - 00000000:f0000000: RAM
(d1)  HOLE: 00000000:f0000000 - 00000000:fc000000
(d1)  [03]: 00000000:fc000000 - 00000001:00000000: RESERVED
(d1)  [04]: 00000001:00000000 - 00000002:0f800000: RAM
(d1) Invoking SeaBIOS ...
(d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e)
(d1) BUILD: gcc: (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973] binutils: (GNU
(d1) Binutils; SUSE Linux Enterprise 11) 2.23.1
(d1)
(d1) Found Xen hypervisor signature at 40000000
(d1) Running on QEMU (i440fx)
(d1) xen: copy e820...
(d1) Relocating init from 0x000d8fa0 to 0xeffabc40 (size 82736)
(d1) Found 6 PCI devices (max PCI bus is 00)
(d1) Allocated Xen hypercall page at effff000
(d1) Detected Xen v4.8-unstable
(d1) xen: copy BIOS tables...
(d1) Copying SMBIOS entry point from 0x00010020 to 0x000f5b60
(d1) Copying MPTABLE from 0xfc001170/fc001180 to 0x000f5a60
(d1) Copying PIR from 0x00010040 to 0x000f59e0
(d1) Copying ACPI RSDP from 0x000100c0 to 0x000f59b0
(d1) Using pmtimer, ioport 0xb008
(d1) Scan for VGA option rom
(d1) Running option rom at c000:0003
(XEN) stdvga.c:174:d1v0 entering stdvga mode
(d1) pmm call arg1=0
(d1) Turning on vga text mode console
(d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e)
(d1) Machine UUID 59e20ef4-565a-49cb-9559-cde6d391cdf4
(d1) All threads complete.
(d1) Found 0 lpt ports
(d1) Found 0 serial ports
(d1) ATA controller 1 at 1f0/3f4/0 (irq 14 dev 9)
(d1) ATA controller 2 at 170/374/0 (irq 15 dev 9)
(d1) PS2 keyboard initialized
(d1) ata0-0: QEMU HARDDISK ATA-7 Hard-Disk (30720 MiBytes)
(d1) Searching bootorder for: /pci@i0cf8/*@1,1/drive@0/disk@0
(d1) All threads complete.
(d1) Scan for option roms
(d1)
(d1) Press ESC for boot menu.
(d1)
(d1) Searching bootorder for: HALT
(d1) drive 0x000f5940: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 s=62914561
(d1) Space available for UMB: c9800-ec800, f5380-f5940
(d1) Returned 258048 bytes of ZoneHigh
(d1) e820 map has 7 items:
(d1)   0: 0000000000000000 - 000000000009fc00 = 1 RAM
(d1)   1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED
(d1)   2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
(d1)   3: 0000000000100000 - 00000000effff000 = 1 RAM
(d1)   4: 00000000effff000 - 00000000f0000000 = 2 RESERVED
(d1)   5: 00000000fc000000 - 0000000100000000 = 2 RESERVED
(d1)   6: 0000000100000000 - 000000020f800000 = 1 RAM
(d1) enter handle_19:
(d1)   NULL
(d1) Booting from Hard Disk...
(d1) Booting from 0000:7c00
(XEN) stdvga.c:179:d1v0 leaving stdvga mode
(XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0).
(XEN) ************* VMCS Area **************
(XEN) *** Guest State ***
(XEN) CR0: actual=0x0000000080010031, shadow=0x0000000080010031, gh_mask=ffffffffffffffff
(XEN) CR4: actual=0x00000000000426f8, shadow=0x00000000000406b8, gh_mask=ffffffffffffffff
(XEN) CR3 = 0x0000000000185000
(XEN) PDPTE0 = 0x0000000000186001  PDPTE1 = 0x0000000000187001
(XEN) PDPTE2 = 0x0000000000188001  PDPTE3 = 0x0000000000189001
(XEN) RSP = 0x000000008ce53a6c (0x000000008ce53a6c)  RIP = 0x000000008161dd21 (0x000000008161dd21)
(XEN) RFLAGS=0x00200046 (0x00200046)  DR7 = 0x0000000000000400
(XEN) Sysenter RSP=000000008078b000 CS:RIP=0008:000000008168c0c0
(XEN)        sel  attr  limit   base
(XEN)   CS: 0008 0c09b ffffffff 0000000000000000
(XEN)   DS: 0023 0c0f3 ffffffff 0000000000000000
(XEN)   SS: 0010 0c093 ffffffff 0000000000000000
(XEN)   ES: 0023 0c0f3 ffffffff 0000000000000000
(XEN)   FS: 0030 04093 00003748 0000000081779c00
(XEN)   GS: 0000 1c000 ffffffff 0000000000000000
(XEN) GDTR:            000003ff 0000000081553000
(XEN) LDTR: 0000 1c000 ffffffff 0000000000000000
(XEN) IDTR:            000007ff 0000000081553400
(XEN)   TR: 0028 0008b 000020ab 00000000801ad000
(XEN) EFER = 0x0000000000000000  PAT = 0x0007010600070106
(XEN) PreemptionTimer = 0x00000000  SM Base = 0x00000000
(XEN) DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
(XEN) Interruptibility = 00000000  ActivityState = 00000000
(XEN) InterruptStatus = d100
(XEN) *** Host State ***
(XEN) RIP = 0xffff82d0801ff560 (vmx_asm_vmexit_handler)  RSP = 0xffff83187e20ff90
(XEN) CS=e008 SS=0000 DS=0000 ES=0000 FS=0000 GS=0000 TR=e040
(XEN) FSBase=0000000000000000 GSBase=0000000000000000 TRBase=ffff830839dfec00
(XEN) GDTBase=ffff83187e36f000 IDTBase=ffff83187e37b000
(XEN) CR0=0000000080050033 CR3=0000000821f6e000 CR4=00000000001526e0
(XEN) Sysenter RSP=ffff83187e20ffc0 CS:RIP=e008:ffff82d080244a00
(XEN) EFER = 0x0000000000000000  PAT = 0x0000050100070406
(XEN) *** Control State ***
(XEN) PinBased=000000bf CPUBased=b6a065fa SecondaryExec=0000576b
(XEN) EntryControls=000051ff ExitControls=000fefff
(XEN) ExceptionBitmap=00060002 PFECmask=00000000 PFECmatch=00000000
(XEN) VMEntry: intr_info=800000e1 errcode=00000000 ilen=00000000
(XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000006
(XEN)         reason=80000021 qualification=0000000000000000
(XEN) IDTVectoring: info=00000000 errcode=00000000
(XEN) TSC Offset = 0xfffd7fc7c2ca5a1c  TSC Multiplier = 0x0000000000000000
(XEN) TPR Threshold = 0x00  PostedIntrVec = 0xf4
(XEN) EPT pointer = 0x0000000821f8501e  EPTP index = 0x0000
(XEN) PLE Gap=00000080 Window=00001000
(XEN) Virtual processor ID = 0x5f7a VMfunc controls = 0000000000000000
(XEN) **************************************
(XEN) domain_crash called from vmx.c:3111
(XEN) Domain 1 (vcpu#0) crashed on cpu#12:
(XEN) ----[ Xen-4.8-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    12
(XEN) RIP:    0008:[<000000008161dd21>]
(XEN) RFLAGS: 0000000000200046   CONTEXT: hvm guest (d1v0)
(XEN) rax: 00000000000c0000   rbx: 000000000000000a   rcx: 00000000000c00d1
(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 000000008ce53acc
(XEN) rbp: 000000008ce53a6c   rsp: 000000008ce53a6c   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000080010031   cr4: 00000000000406b8
(XEN) cr3: 0000000000185000   cr2: 000000008cc09000
(XEN) ds: 0023   es: 0023   fs: 0030   gs: 0000   ss: 0010   cs: 0008
(XEN) HVM2 save: CPU
(XEN) HVM2 save: PIC
(XEN) HVM2 save: IOAPIC
(XEN) HVM2 save: LAPIC
(XEN) HVM2 save: LAPIC_REGS
(XEN) HVM2 save: PCI_IRQ
(XEN) HVM2 save: ISA_IRQ
(XEN) HVM2 save: PCI_LINK
(XEN) HVM2 save: PIT
(XEN) HVM2 save: RTC
(XEN) HVM2 save: HPET
(XEN) HVM2 save: PMTIMER
(XEN) HVM2 save: MTRR
(XEN) HVM2 save: VIRIDIAN_DOMAIN
(XEN) HVM2 save: CPU_XSAVE
(XEN) HVM2 save: VIRIDIAN_VCPU
(XEN) HVM2 save: VMCE_VCPU
(XEN) HVM2 save: TSC_ADJUST
(XEN) HVM2 restore: CPU 0


Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-11  7:48               ` Tian, Kevin
  2016-10-11 11:12                 ` Xuquan (Quan Xu)
@ 2016-10-17  9:27                 ` Xuquan (Quan Xu)
  2016-10-21  8:43                   ` Tian, Kevin
  2016-10-24  7:02                   ` Tian, Kevin
  1 sibling, 2 replies; 23+ messages in thread
From: Xuquan (Quan Xu) @ 2016-10-17  9:27 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Tian, Kevin, Jan Beulich
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

On October 11, 2016 7:11 PM, Xuquan < xuquan8@huawei.com > wrote:
>On October 11, 2016 3:49 PM, Tian, Kevin <Kevin.tian@intel.com>
>>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>>> Sent: Monday, October 10, 2016 6:49 PM
>>>
>>> On October 10, 2016 5:40 PM, Jan Beulich < JBeulich@suse.com > wrote:
>>> >>>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
>>> >>>>>> > --- a/xen/arch/x86/hvm/vlapic.c
>>> >>>>>> > +++ b/xen/arch/x86/hvm/vlapic.c
>>> >>>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic
>>> >>>>>> > *vlapic) void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
>>> >>>>>> >      struct domain *d = vlapic_domain(vlapic);
>>> >>>>>> > +    struct vcpu *v = vlapic_vcpu(vlapic);
>>> >>>>>> > +    struct hvm_intack pt_intack;
>>> >>>>>> > +
>>> >>>>>> > +    pt_intack.vector = vector;
>>> >>>>>> > +    pt_intack.source = hvm_intsrc_lapic;
>>> >>>>>> > +    pt_intr_post(v, pt_intack);
>>> >>>>>>
>>> >>>>>> This also sits on the EOI LAPIC register write path, i.e. the
>>> >>>>>> change then also affects non-apicv environments.
>>> >>>>>
>>> >>>>>The new logic should be entered only when EOI-induced exit happens.
>>> >>>>>
>>> >>>>
>>> >>>> Yes, more that the EOI-induced exit is conditional,
>>> >>>> specifically, the bitmap is set by vmx_set_eoi_exit_bitmap().
>>> >>>> Jan, what do you think? While I recall from v1 discussion, you
>>> >>>> have the same comment. We can dig it deep..
>>> >>>
>>> >>>See my reply to Kevin sent a minute ago. As I'm not sure what
>>> >>>Kevin means to state with several of his responses, I can't
>>> >>>properly respond for now. And then what you say doesn't really
>>> >>>address my concern - things being conditional elsewhere doesn't
>>> >>>mean we won't get here too in the non-apicv case, at least not in
>>> >>>a way that I can follow
>>right away.
>>> >>
>>> >> Jan, any idea now?
>>> >
>>> >I don't think there was anything left open on the other sub-thread;
>>> >if there is, please point out specific aspects which are still unclear.
>>> >
>>>
>>> Sorry, I overlooked the other sub-thread after holiday(10.1-10.7)..
>>> I will read it again..
>>>
>>> Quan
>>
>>Is there any discussion after 10.1? I didn't see it.
>>
>>Back to the main open before holiday - multiple EOIs may come to clear
>>irq_issued before guest actually handles the very vpt injection
>>(possible if vpt vector is shared with other sources). I don't see a
>>good solution on that open... :/
>>
>>We've discussed various options which all fail in one or another place
>>- either miss an injection, or incur undesired injections.
>>Possibly we should consider another direction - fall back to non-apicv
>>path when we see vpt vector pending but it's not the highest one.
>>
>>Original condition to enter virtual intr delivery:
>>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
>>              intack.source != hvm_intsrc_pic &&
>>              intack.source != hvm_intsrc_vector )
>>
>>now new condition:
>>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
>>              intack.source != hvm_intsrc_pic &&
>>              intack.source != hvm_intsrc_vector &&
>>			(pt_vector == -1 || intack.vector == pt_vector) )
>>
>>Thoughts?
>>
>Kevin,
>When I try to fix it as your suggestion, I cannot boot the guest, with below
>message(from xl dmesg):

with Kevin's patch, the hypervisor always calls ' vmx_inject_extint() -> __vmx_inject_exception()' to inject exception, then vm-entry on loop..
the interrupt (PT or IPI, or others) can't deliver to guest..

and so far, we suppress MSR-based APIC suggestion when having APIC-V by http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec62a2818e64390ac2ccfbd74e6b7
so I think we couldn't fallback to non-apicv dynamically here..

Quan



>(d1) HVM Loader
>(d1) Detected Xen v4.8-unstable
>(d1) Xenbus rings @0xfeffc000, event channel 1
>(d1) System requested SeaBIOS
>(d1) CPU speed is 2394 MHz
>(d1) Relocating guest memory for lowmem MMIO space disabled
>(XEN) irq.c:275: Dom1 PCI link 0 changed 0 -> 5
>(d1) PCI-ISA link 0 routed to IRQ5
>(XEN) irq.c:275: Dom1 PCI link 1 changed 0 -> 10
>(d1) PCI-ISA link 1 routed to IRQ10
>(XEN) irq.c:275: Dom1 PCI link 2 changed 0 -> 11
>(d1) PCI-ISA link 2 routed to IRQ11
>(XEN) irq.c:275: Dom1 PCI link 3 changed 0 -> 5
>(d1) PCI-ISA link 3 routed to IRQ5
>(d1) pci dev 01:3 INTA->IRQ10
>(d1) pci dev 02:0 INTA->IRQ11
>(d1) RAM in high memory; setting high_mem resource base to 20f800000
>(d1) pci dev 03:0 bar 10 size 002000000: 0f0000008
>(d1) pci dev 02:0 bar 14 size 001000000: 0f2000008
>(d1) pci dev 03:0 bar 30 size 000010000: 0f3000000
>(d1) pci dev 03:0 bar 14 size 000001000: 0f3010000
>(d1) pci dev 02:0 bar 10 size 000000100: 00000c001
>(d1) pci dev 01:1 bar 20 size 000000010: 00000c101
>(d1) Multiprocessor initialisation:
>(d1)  - CPU0 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
>(d1)  - CPU1 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
>(d1) Testing HVM environment:
>(d1)  - REP INSB across page boundaries ... passed
>(d1)  - GS base MSRs and SWAPGS ... passed
>(d1) Passed 2 of 2 tests
>(d1) Writing SMBIOS tables ...
>(d1) Loading SeaBIOS ...
>(d1) Creating MP tables ...
>(d1) Loading ACPI ...
>(d1) vm86 TSS at fc00a300
>(d1) BIOS map:
>(d1)  10000-100e3: Scratch space
>(d1)  c0000-fffff: Main BIOS
>(d1) E820 table:
>(d1)  [00]: 00000000:00000000 - 00000000:000a0000: RAM
>(d1)  HOLE: 00000000:000a0000 - 00000000:000c0000
>(d1)  [01]: 00000000:000c0000 - 00000000:00100000: RESERVED
>(d1)  [02]: 00000000:00100000 - 00000000:f0000000: RAM
>(d1)  HOLE: 00000000:f0000000 - 00000000:fc000000
>(d1)  [03]: 00000000:fc000000 - 00000001:00000000: RESERVED
>(d1)  [04]: 00000001:00000000 - 00000002:0f800000: RAM
>(d1) Invoking SeaBIOS ...
>(d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e)
>(d1) BUILD: gcc: (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973] binutils:
>(GNU
>(d1) Binutils; SUSE Linux Enterprise 11) 2.23.1
>(d1)
>(d1) Found Xen hypervisor signature at 40000000
>(d1) Running on QEMU (i440fx)
>(d1) xen: copy e820...
>(d1) Relocating init from 0x000d8fa0 to 0xeffabc40 (size 82736)
>(d1) Found 6 PCI devices (max PCI bus is 00)
>(d1) Allocated Xen hypercall page at effff000
>(d1) Detected Xen v4.8-unstable
>(d1) xen: copy BIOS tables...
>(d1) Copying SMBIOS entry point from 0x00010020 to 0x000f5b60
>(d1) Copying MPTABLE from 0xfc001170/fc001180 to 0x000f5a60
>(d1) Copying PIR from 0x00010040 to 0x000f59e0
>(d1) Copying ACPI RSDP from 0x000100c0 to 0x000f59b0
>(d1) Using pmtimer, ioport 0xb008
>(d1) Scan for VGA option rom
>(d1) Running option rom at c000:0003
>(XEN) stdvga.c:174:d1v0 entering stdvga mode
>(d1) pmm call arg1=0
>(d1) Turning on vga text mode console
>(d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e)
>(d1) Machine UUID 59e20ef4-565a-49cb-9559-cde6d391cdf4
>(d1) All threads complete.
>(d1) Found 0 lpt ports
>(d1) Found 0 serial ports
>(d1) ATA controller 1 at 1f0/3f4/0 (irq 14 dev 9)
>(d1) ATA controller 2 at 170/374/0 (irq 15 dev 9)
>(d1) PS2 keyboard initialized
>(d1) ata0-0: QEMU HARDDISK ATA-7 Hard-Disk (30720 MiBytes)
>(d1) Searching bootorder for: /pci@i0cf8/*@1,1/drive@0/disk@0
>(d1) All threads complete.
>(d1) Scan for option roms
>(d1)
>(d1) Press ESC for boot menu.
>(d1)
>(d1) Searching bootorder for: HALT
>(d1) drive 0x000f5940: PCHS=16383/16/63 translation=lba LCHS=1024/255/63
>s=62914561
>(d1) Space available for UMB: c9800-ec800, f5380-f5940
>(d1) Returned 258048 bytes of ZoneHigh
>(d1) e820 map has 7 items:
>(d1)   0: 0000000000000000 - 000000000009fc00 = 1 RAM
>(d1)   1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED
>(d1)   2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
>(d1)   3: 0000000000100000 - 00000000effff000 = 1 RAM
>(d1)   4: 00000000effff000 - 00000000f0000000 = 2 RESERVED
>(d1)   5: 00000000fc000000 - 0000000100000000 = 2 RESERVED
>(d1)   6: 0000000100000000 - 000000020f800000 = 1 RAM
>(d1) enter handle_19:
>(d1)   NULL
>(d1) Booting from Hard Disk...
>(d1) Booting from 0000:7c00
>(XEN) stdvga.c:179:d1v0 leaving stdvga mode
>(XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0).
>(XEN) ************* VMCS Area **************
>(XEN) *** Guest State ***
>(XEN) CR0: actual=0x0000000080010031, shadow=0x0000000080010031,
>gh_mask=ffffffffffffffff
>(XEN) CR4: actual=0x00000000000426f8, shadow=0x00000000000406b8,
>gh_mask=ffffffffffffffff
>(XEN) CR3 = 0x0000000000185000
>(XEN) PDPTE0 = 0x0000000000186001  PDPTE1 = 0x0000000000187001
>(XEN) PDPTE2 = 0x0000000000188001  PDPTE3 = 0x0000000000189001
>(XEN) RSP = 0x000000008ce53a6c (0x000000008ce53a6c)  RIP =
>0x000000008161dd21 (0x000000008161dd21)
>(XEN) RFLAGS=0x00200046 (0x00200046)  DR7 = 0x0000000000000400
>(XEN) Sysenter RSP=000000008078b000 CS:RIP=0008:000000008168c0c0
>(XEN)        sel  attr  limit   base
>(XEN)   CS: 0008 0c09b ffffffff 0000000000000000
>(XEN)   DS: 0023 0c0f3 ffffffff 0000000000000000
>(XEN)   SS: 0010 0c093 ffffffff 0000000000000000
>(XEN)   ES: 0023 0c0f3 ffffffff 0000000000000000
>(XEN)   FS: 0030 04093 00003748 0000000081779c00
>(XEN)   GS: 0000 1c000 ffffffff 0000000000000000
>(XEN) GDTR:            000003ff 0000000081553000
>(XEN) LDTR: 0000 1c000 ffffffff 0000000000000000
>(XEN) IDTR:            000007ff 0000000081553400
>(XEN)   TR: 0028 0008b 000020ab 00000000801ad000
>(XEN) EFER = 0x0000000000000000  PAT = 0x0007010600070106
>(XEN) PreemptionTimer = 0x00000000  SM Base = 0x00000000
>(XEN) DebugCtl = 0x0000000000000000  DebugExceptions =
>0x0000000000000000
>(XEN) Interruptibility = 00000000  ActivityState = 00000000
>(XEN) InterruptStatus = d100
>(XEN) *** Host State ***
>(XEN) RIP = 0xffff82d0801ff560 (vmx_asm_vmexit_handler)  RSP =
>0xffff83187e20ff90
>(XEN) CS=e008 SS=0000 DS=0000 ES=0000 FS=0000 GS=0000 TR=e040
>(XEN) FSBase=0000000000000000 GSBase=0000000000000000
>TRBase=ffff830839dfec00
>(XEN) GDTBase=ffff83187e36f000 IDTBase=ffff83187e37b000
>(XEN) CR0=0000000080050033 CR3=0000000821f6e000
>CR4=00000000001526e0
>(XEN) Sysenter RSP=ffff83187e20ffc0 CS:RIP=e008:ffff82d080244a00
>(XEN) EFER = 0x0000000000000000  PAT = 0x0000050100070406
>(XEN) *** Control State ***
>(XEN) PinBased=000000bf CPUBased=b6a065fa SecondaryExec=0000576b
>(XEN) EntryControls=000051ff ExitControls=000fefff
>(XEN) ExceptionBitmap=00060002 PFECmask=00000000 PFECmatch=00000000
>(XEN) VMEntry: intr_info=800000e1 errcode=00000000 ilen=00000000
>(XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000006
>(XEN)         reason=80000021 qualification=0000000000000000
>(XEN) IDTVectoring: info=00000000 errcode=00000000
>(XEN) TSC Offset = 0xfffd7fc7c2ca5a1c  TSC Multiplier = 0x0000000000000000
>(XEN) TPR Threshold = 0x00  PostedIntrVec = 0xf4
>(XEN) EPT pointer = 0x0000000821f8501e  EPTP index = 0x0000
>(XEN) PLE Gap=00000080 Window=00001000
>(XEN) Virtual processor ID = 0x5f7a VMfunc controls = 0000000000000000
>(XEN) **************************************
>(XEN) domain_crash called from vmx.c:3111
>(XEN) Domain 1 (vcpu#0) crashed on cpu#12:
>(XEN) ----[ Xen-4.8-unstable  x86_64  debug=y   Not tainted ]----
>(XEN) CPU:    12
>(XEN) RIP:    0008:[<000000008161dd21>]
>(XEN) RFLAGS: 0000000000200046   CONTEXT: hvm guest (d1v0)
>(XEN) rax: 00000000000c0000   rbx: 000000000000000a   rcx:
>00000000000c00d1
>(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi:
>000000008ce53acc
>(XEN) rbp: 000000008ce53a6c   rsp: 000000008ce53a6c   r8:
>0000000000000000
>(XEN) r9:  0000000000000000   r10: 0000000000000000   r11:
>0000000000000000
>(XEN) r12: 0000000000000000   r13: 0000000000000000   r14:
>0000000000000000
>(XEN) r15: 0000000000000000   cr0: 0000000080010031   cr4:
>00000000000406b8
>(XEN) cr3: 0000000000185000   cr2: 000000008cc09000
>(XEN) ds: 0023   es: 0023   fs: 0030   gs: 0000   ss: 0010   cs: 0008
>(XEN) HVM2 save: CPU
>(XEN) HVM2 save: PIC
>(XEN) HVM2 save: IOAPIC
>(XEN) HVM2 save: LAPIC
>(XEN) HVM2 save: LAPIC_REGS
>(XEN) HVM2 save: PCI_IRQ
>(XEN) HVM2 save: ISA_IRQ
>(XEN) HVM2 save: PCI_LINK
>(XEN) HVM2 save: PIT
>(XEN) HVM2 save: RTC
>(XEN) HVM2 save: HPET
>(XEN) HVM2 save: PMTIMER
>(XEN) HVM2 save: MTRR
>(XEN) HVM2 save: VIRIDIAN_DOMAIN
>(XEN) HVM2 save: CPU_XSAVE
>(XEN) HVM2 save: VIRIDIAN_VCPU
>(XEN) HVM2 save: VMCE_VCPU
>(XEN) HVM2 save: TSC_ADJUST
>(XEN) HVM2 restore: CPU 0
>
>
>Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-17  9:27                 ` Xuquan (Quan Xu)
@ 2016-10-21  8:43                   ` Tian, Kevin
  2016-10-24  7:02                   ` Tian, Kevin
  1 sibling, 0 replies; 23+ messages in thread
From: Tian, Kevin @ 2016-10-21  8:43 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Jan Beulich
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

Sorry I haven't got on this thread yet, but want you know it's on my list 
(hopefully will respond next week).

> -----Original Message-----
> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> Sent: Monday, October 17, 2016 5:28 PM
> To: Xuquan (Quan Xu); Tian, Kevin; Jan Beulich
> Cc: Andrew Cooper; George.Dunlap@eu.citrix.com; yang.zhang.wz@gmail.com;
> Hanweidong (Randy); Jiangyifei; Nakajima, Jun; xen-devel@lists.xen.org;
> konrad.wilk@oracle.com; Tim Deegan
> Subject: RE: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
> 
> On October 11, 2016 7:11 PM, Xuquan < xuquan8@huawei.com > wrote:
> >On October 11, 2016 3:49 PM, Tian, Kevin <Kevin.tian@intel.com>
> >>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> >>> Sent: Monday, October 10, 2016 6:49 PM
> >>>
> >>> On October 10, 2016 5:40 PM, Jan Beulich < JBeulich@suse.com > wrote:
> >>> >>>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote:
> >>> >>>>>> > --- a/xen/arch/x86/hvm/vlapic.c
> >>> >>>>>> > +++ b/xen/arch/x86/hvm/vlapic.c
> >>> >>>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic
> >>> >>>>>> > *vlapic) void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
> >>> >>>>>> >      struct domain *d = vlapic_domain(vlapic);
> >>> >>>>>> > +    struct vcpu *v = vlapic_vcpu(vlapic);
> >>> >>>>>> > +    struct hvm_intack pt_intack;
> >>> >>>>>> > +
> >>> >>>>>> > +    pt_intack.vector = vector;
> >>> >>>>>> > +    pt_intack.source = hvm_intsrc_lapic;
> >>> >>>>>> > +    pt_intr_post(v, pt_intack);
> >>> >>>>>>
> >>> >>>>>> This also sits on the EOI LAPIC register write path, i.e. the
> >>> >>>>>> change then also affects non-apicv environments.
> >>> >>>>>
> >>> >>>>>The new logic should be entered only when EOI-induced exit happens.
> >>> >>>>>
> >>> >>>>
> >>> >>>> Yes, more that the EOI-induced exit is conditional,
> >>> >>>> specifically, the bitmap is set by vmx_set_eoi_exit_bitmap().
> >>> >>>> Jan, what do you think? While I recall from v1 discussion, you
> >>> >>>> have the same comment. We can dig it deep..
> >>> >>>
> >>> >>>See my reply to Kevin sent a minute ago. As I'm not sure what
> >>> >>>Kevin means to state with several of his responses, I can't
> >>> >>>properly respond for now. And then what you say doesn't really
> >>> >>>address my concern - things being conditional elsewhere doesn't
> >>> >>>mean we won't get here too in the non-apicv case, at least not in
> >>> >>>a way that I can follow
> >>right away.
> >>> >>
> >>> >> Jan, any idea now?
> >>> >
> >>> >I don't think there was anything left open on the other sub-thread;
> >>> >if there is, please point out specific aspects which are still unclear.
> >>> >
> >>>
> >>> Sorry, I overlooked the other sub-thread after holiday(10.1-10.7)..
> >>> I will read it again..
> >>>
> >>> Quan
> >>
> >>Is there any discussion after 10.1? I didn't see it.
> >>
> >>Back to the main open before holiday - multiple EOIs may come to clear
> >>irq_issued before guest actually handles the very vpt injection
> >>(possible if vpt vector is shared with other sources). I don't see a
> >>good solution on that open... :/
> >>
> >>We've discussed various options which all fail in one or another place
> >>- either miss an injection, or incur undesired injections.
> >>Possibly we should consider another direction - fall back to non-apicv
> >>path when we see vpt vector pending but it's not the highest one.
> >>
> >>Original condition to enter virtual intr delivery:
> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
> >>              intack.source != hvm_intsrc_pic &&
> >>              intack.source != hvm_intsrc_vector )
> >>
> >>now new condition:
> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
> >>              intack.source != hvm_intsrc_pic &&
> >>              intack.source != hvm_intsrc_vector &&
> >>			(pt_vector == -1 || intack.vector == pt_vector) )
> >>
> >>Thoughts?
> >>
> >Kevin,
> >When I try to fix it as your suggestion, I cannot boot the guest, with below
> >message(from xl dmesg):
> 
> with Kevin's patch, the hypervisor always calls ' vmx_inject_extint() ->
> __vmx_inject_exception()' to inject exception, then vm-entry on loop..
> the interrupt (PT or IPI, or others) can't deliver to guest..
> 
> and so far, we suppress MSR-based APIC suggestion when having APIC-V by
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec62a2818e643
> 90ac2ccfbd74e6b7
> so I think we couldn't fallback to non-apicv dynamically here..
> 
> Quan
> 
> 
> 
> >(d1) HVM Loader
> >(d1) Detected Xen v4.8-unstable
> >(d1) Xenbus rings @0xfeffc000, event channel 1
> >(d1) System requested SeaBIOS
> >(d1) CPU speed is 2394 MHz
> >(d1) Relocating guest memory for lowmem MMIO space disabled
> >(XEN) irq.c:275: Dom1 PCI link 0 changed 0 -> 5
> >(d1) PCI-ISA link 0 routed to IRQ5
> >(XEN) irq.c:275: Dom1 PCI link 1 changed 0 -> 10
> >(d1) PCI-ISA link 1 routed to IRQ10
> >(XEN) irq.c:275: Dom1 PCI link 2 changed 0 -> 11
> >(d1) PCI-ISA link 2 routed to IRQ11
> >(XEN) irq.c:275: Dom1 PCI link 3 changed 0 -> 5
> >(d1) PCI-ISA link 3 routed to IRQ5
> >(d1) pci dev 01:3 INTA->IRQ10
> >(d1) pci dev 02:0 INTA->IRQ11
> >(d1) RAM in high memory; setting high_mem resource base to 20f800000
> >(d1) pci dev 03:0 bar 10 size 002000000: 0f0000008
> >(d1) pci dev 02:0 bar 14 size 001000000: 0f2000008
> >(d1) pci dev 03:0 bar 30 size 000010000: 0f3000000
> >(d1) pci dev 03:0 bar 14 size 000001000: 0f3010000
> >(d1) pci dev 02:0 bar 10 size 000000100: 00000c001
> >(d1) pci dev 01:1 bar 20 size 000000010: 00000c101
> >(d1) Multiprocessor initialisation:
> >(d1)  - CPU0 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
> >(d1)  - CPU1 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
> >(d1) Testing HVM environment:
> >(d1)  - REP INSB across page boundaries ... passed
> >(d1)  - GS base MSRs and SWAPGS ... passed
> >(d1) Passed 2 of 2 tests
> >(d1) Writing SMBIOS tables ...
> >(d1) Loading SeaBIOS ...
> >(d1) Creating MP tables ...
> >(d1) Loading ACPI ...
> >(d1) vm86 TSS at fc00a300
> >(d1) BIOS map:
> >(d1)  10000-100e3: Scratch space
> >(d1)  c0000-fffff: Main BIOS
> >(d1) E820 table:
> >(d1)  [00]: 00000000:00000000 - 00000000:000a0000: RAM
> >(d1)  HOLE: 00000000:000a0000 - 00000000:000c0000
> >(d1)  [01]: 00000000:000c0000 - 00000000:00100000: RESERVED
> >(d1)  [02]: 00000000:00100000 - 00000000:f0000000: RAM
> >(d1)  HOLE: 00000000:f0000000 - 00000000:fc000000
> >(d1)  [03]: 00000000:fc000000 - 00000001:00000000: RESERVED
> >(d1)  [04]: 00000001:00000000 - 00000002:0f800000: RAM
> >(d1) Invoking SeaBIOS ...
> >(d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e)
> >(d1) BUILD: gcc: (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973] binutils:
> >(GNU
> >(d1) Binutils; SUSE Linux Enterprise 11) 2.23.1
> >(d1)
> >(d1) Found Xen hypervisor signature at 40000000
> >(d1) Running on QEMU (i440fx)
> >(d1) xen: copy e820...
> >(d1) Relocating init from 0x000d8fa0 to 0xeffabc40 (size 82736)
> >(d1) Found 6 PCI devices (max PCI bus is 00)
> >(d1) Allocated Xen hypercall page at effff000
> >(d1) Detected Xen v4.8-unstable
> >(d1) xen: copy BIOS tables...
> >(d1) Copying SMBIOS entry point from 0x00010020 to 0x000f5b60
> >(d1) Copying MPTABLE from 0xfc001170/fc001180 to 0x000f5a60
> >(d1) Copying PIR from 0x00010040 to 0x000f59e0
> >(d1) Copying ACPI RSDP from 0x000100c0 to 0x000f59b0
> >(d1) Using pmtimer, ioport 0xb008
> >(d1) Scan for VGA option rom
> >(d1) Running option rom at c000:0003
> >(XEN) stdvga.c:174:d1v0 entering stdvga mode
> >(d1) pmm call arg1=0
> >(d1) Turning on vga text mode console
> >(d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e)
> >(d1) Machine UUID 59e20ef4-565a-49cb-9559-cde6d391cdf4
> >(d1) All threads complete.
> >(d1) Found 0 lpt ports
> >(d1) Found 0 serial ports
> >(d1) ATA controller 1 at 1f0/3f4/0 (irq 14 dev 9)
> >(d1) ATA controller 2 at 170/374/0 (irq 15 dev 9)
> >(d1) PS2 keyboard initialized
> >(d1) ata0-0: QEMU HARDDISK ATA-7 Hard-Disk (30720 MiBytes)
> >(d1) Searching bootorder for: /pci@i0cf8/*@1,1/drive@0/disk@0
> >(d1) All threads complete.
> >(d1) Scan for option roms
> >(d1)
> >(d1) Press ESC for boot menu.
> >(d1)
> >(d1) Searching bootorder for: HALT
> >(d1) drive 0x000f5940: PCHS=16383/16/63 translation=lba LCHS=1024/255/63
> >s=62914561
> >(d1) Space available for UMB: c9800-ec800, f5380-f5940
> >(d1) Returned 258048 bytes of ZoneHigh
> >(d1) e820 map has 7 items:
> >(d1)   0: 0000000000000000 - 000000000009fc00 = 1 RAM
> >(d1)   1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED
> >(d1)   2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
> >(d1)   3: 0000000000100000 - 00000000effff000 = 1 RAM
> >(d1)   4: 00000000effff000 - 00000000f0000000 = 2 RESERVED
> >(d1)   5: 00000000fc000000 - 0000000100000000 = 2 RESERVED
> >(d1)   6: 0000000100000000 - 000000020f800000 = 1 RAM
> >(d1) enter handle_19:
> >(d1)   NULL
> >(d1) Booting from Hard Disk...
> >(d1) Booting from 0000:7c00
> >(XEN) stdvga.c:179:d1v0 leaving stdvga mode
> >(XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0).
> >(XEN) ************* VMCS Area **************
> >(XEN) *** Guest State ***
> >(XEN) CR0: actual=0x0000000080010031, shadow=0x0000000080010031,
> >gh_mask=ffffffffffffffff
> >(XEN) CR4: actual=0x00000000000426f8, shadow=0x00000000000406b8,
> >gh_mask=ffffffffffffffff
> >(XEN) CR3 = 0x0000000000185000
> >(XEN) PDPTE0 = 0x0000000000186001  PDPTE1 = 0x0000000000187001
> >(XEN) PDPTE2 = 0x0000000000188001  PDPTE3 = 0x0000000000189001
> >(XEN) RSP = 0x000000008ce53a6c (0x000000008ce53a6c)  RIP =
> >0x000000008161dd21 (0x000000008161dd21)
> >(XEN) RFLAGS=0x00200046 (0x00200046)  DR7 = 0x0000000000000400
> >(XEN) Sysenter RSP=000000008078b000 CS:RIP=0008:000000008168c0c0
> >(XEN)        sel  attr  limit   base
> >(XEN)   CS: 0008 0c09b ffffffff 0000000000000000
> >(XEN)   DS: 0023 0c0f3 ffffffff 0000000000000000
> >(XEN)   SS: 0010 0c093 ffffffff 0000000000000000
> >(XEN)   ES: 0023 0c0f3 ffffffff 0000000000000000
> >(XEN)   FS: 0030 04093 00003748 0000000081779c00
> >(XEN)   GS: 0000 1c000 ffffffff 0000000000000000
> >(XEN) GDTR:            000003ff 0000000081553000
> >(XEN) LDTR: 0000 1c000 ffffffff 0000000000000000
> >(XEN) IDTR:            000007ff 0000000081553400
> >(XEN)   TR: 0028 0008b 000020ab 00000000801ad000
> >(XEN) EFER = 0x0000000000000000  PAT = 0x0007010600070106
> >(XEN) PreemptionTimer = 0x00000000  SM Base = 0x00000000
> >(XEN) DebugCtl = 0x0000000000000000  DebugExceptions =
> >0x0000000000000000
> >(XEN) Interruptibility = 00000000  ActivityState = 00000000
> >(XEN) InterruptStatus = d100
> >(XEN) *** Host State ***
> >(XEN) RIP = 0xffff82d0801ff560 (vmx_asm_vmexit_handler)  RSP =
> >0xffff83187e20ff90
> >(XEN) CS=e008 SS=0000 DS=0000 ES=0000 FS=0000 GS=0000 TR=e040
> >(XEN) FSBase=0000000000000000 GSBase=0000000000000000
> >TRBase=ffff830839dfec00
> >(XEN) GDTBase=ffff83187e36f000 IDTBase=ffff83187e37b000
> >(XEN) CR0=0000000080050033 CR3=0000000821f6e000
> >CR4=00000000001526e0
> >(XEN) Sysenter RSP=ffff83187e20ffc0 CS:RIP=e008:ffff82d080244a00
> >(XEN) EFER = 0x0000000000000000  PAT = 0x0000050100070406
> >(XEN) *** Control State ***
> >(XEN) PinBased=000000bf CPUBased=b6a065fa SecondaryExec=0000576b
> >(XEN) EntryControls=000051ff ExitControls=000fefff
> >(XEN) ExceptionBitmap=00060002 PFECmask=00000000 PFECmatch=00000000
> >(XEN) VMEntry: intr_info=800000e1 errcode=00000000 ilen=00000000
> >(XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000006
> >(XEN)         reason=80000021 qualification=0000000000000000
> >(XEN) IDTVectoring: info=00000000 errcode=00000000
> >(XEN) TSC Offset = 0xfffd7fc7c2ca5a1c  TSC Multiplier = 0x0000000000000000
> >(XEN) TPR Threshold = 0x00  PostedIntrVec = 0xf4
> >(XEN) EPT pointer = 0x0000000821f8501e  EPTP index = 0x0000
> >(XEN) PLE Gap=00000080 Window=00001000
> >(XEN) Virtual processor ID = 0x5f7a VMfunc controls = 0000000000000000
> >(XEN) **************************************
> >(XEN) domain_crash called from vmx.c:3111
> >(XEN) Domain 1 (vcpu#0) crashed on cpu#12:
> >(XEN) ----[ Xen-4.8-unstable  x86_64  debug=y   Not tainted ]----
> >(XEN) CPU:    12
> >(XEN) RIP:    0008:[<000000008161dd21>]
> >(XEN) RFLAGS: 0000000000200046   CONTEXT: hvm guest (d1v0)
> >(XEN) rax: 00000000000c0000   rbx: 000000000000000a   rcx:
> >00000000000c00d1
> >(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi:
> >000000008ce53acc
> >(XEN) rbp: 000000008ce53a6c   rsp: 000000008ce53a6c   r8:
> >0000000000000000
> >(XEN) r9:  0000000000000000   r10: 0000000000000000   r11:
> >0000000000000000
> >(XEN) r12: 0000000000000000   r13: 0000000000000000   r14:
> >0000000000000000
> >(XEN) r15: 0000000000000000   cr0: 0000000080010031   cr4:
> >00000000000406b8
> >(XEN) cr3: 0000000000185000   cr2: 000000008cc09000
> >(XEN) ds: 0023   es: 0023   fs: 0030   gs: 0000   ss: 0010   cs: 0008
> >(XEN) HVM2 save: CPU
> >(XEN) HVM2 save: PIC
> >(XEN) HVM2 save: IOAPIC
> >(XEN) HVM2 save: LAPIC
> >(XEN) HVM2 save: LAPIC_REGS
> >(XEN) HVM2 save: PCI_IRQ
> >(XEN) HVM2 save: ISA_IRQ
> >(XEN) HVM2 save: PCI_LINK
> >(XEN) HVM2 save: PIT
> >(XEN) HVM2 save: RTC
> >(XEN) HVM2 save: HPET
> >(XEN) HVM2 save: PMTIMER
> >(XEN) HVM2 save: MTRR
> >(XEN) HVM2 save: VIRIDIAN_DOMAIN
> >(XEN) HVM2 save: CPU_XSAVE
> >(XEN) HVM2 save: VIRIDIAN_VCPU
> >(XEN) HVM2 save: VMCE_VCPU
> >(XEN) HVM2 save: TSC_ADJUST
> >(XEN) HVM2 restore: CPU 0
> >
> >
> >Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-17  9:27                 ` Xuquan (Quan Xu)
  2016-10-21  8:43                   ` Tian, Kevin
@ 2016-10-24  7:02                   ` Tian, Kevin
  2016-10-25  8:36                     ` Xuquan (Quan Xu)
  2016-10-25  9:01                     ` Xuquan (Quan Xu)
  1 sibling, 2 replies; 23+ messages in thread
From: Tian, Kevin @ 2016-10-24  7:02 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Jan Beulich
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> Sent: Monday, October 17, 2016 5:28 PM
> 
> >>
> >>Back to the main open before holiday - multiple EOIs may come to clear
> >>irq_issued before guest actually handles the very vpt injection
> >>(possible if vpt vector is shared with other sources). I don't see a
> >>good solution on that open... :/
> >>
> >>We've discussed various options which all fail in one or another place
> >>- either miss an injection, or incur undesired injections.
> >>Possibly we should consider another direction - fall back to non-apicv
> >>path when we see vpt vector pending but it's not the highest one.
> >>
> >>Original condition to enter virtual intr delivery:
> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
> >>              intack.source != hvm_intsrc_pic &&
> >>              intack.source != hvm_intsrc_vector )
> >>
> >>now new condition:
> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
> >>              intack.source != hvm_intsrc_pic &&
> >>              intack.source != hvm_intsrc_vector &&
> >>			(pt_vector == -1 || intack.vector == pt_vector) )
> >>
> >>Thoughts?
> >>
> >Kevin,
> >When I try to fix it as your suggestion, I cannot boot the guest, with below
> >message(from xl dmesg):
> 
> with Kevin's patch, the hypervisor always calls ' vmx_inject_extint() ->
> __vmx_inject_exception()' to inject exception, then vm-entry on loop..
> the interrupt (PT or IPI, or others) can't deliver to guest..
> 
> and so far, we suppress MSR-based APIC suggestion when having APIC-V by
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec62a2818e643
> 90ac2ccfbd74e6b7
> so I think we couldn't fallback to non-apicv dynamically here..
> 

What about setting EOI exit bitmap for intack.vector when it's higher
than pending pt_vector? This way we can guarantee there's always a
chance to post pt_vector when pt_vector becomes the highest one...

(of course you need make later pt_intr_post conditionally then, only
when intack.vector==pt_vector)

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-24  7:02                   ` Tian, Kevin
@ 2016-10-25  8:36                     ` Xuquan (Quan Xu)
  2016-10-25 13:01                       ` Konrad Rzeszutek Wilk
  2016-10-26  5:19                       ` Tian, Kevin
  2016-10-25  9:01                     ` Xuquan (Quan Xu)
  1 sibling, 2 replies; 23+ messages in thread
From: Xuquan (Quan Xu) @ 2016-10-25  8:36 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

On October 24, 2016 3:02 PM, Tian, Kevin wrote:
>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> Sent: Monday, October 17, 2016 5:28 PM
>>
>> >>
>> >>Back to the main open before holiday - multiple EOIs may come to
>> >>clear irq_issued before guest actually handles the very vpt
>> >>injection (possible if vpt vector is shared with other sources). I
>> >>don't see a good solution on that open... :/
>> >>
>> >>We've discussed various options which all fail in one or another
>> >>place
>> >>- either miss an injection, or incur undesired injections.
>> >>Possibly we should consider another direction - fall back to
>> >>non-apicv path when we see vpt vector pending but it's not the highest one.
>> >>
>> >>Original condition to enter virtual intr delivery:
>> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
>> >>              intack.source != hvm_intsrc_pic &&
>> >>              intack.source != hvm_intsrc_vector )
>> >>
>> >>now new condition:
>> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
>> >>              intack.source != hvm_intsrc_pic &&
>> >>              intack.source != hvm_intsrc_vector &&
>> >>			(pt_vector == -1 || intack.vector == pt_vector) )
>> >>
>> >>Thoughts?
>> >>
>> >Kevin,
>> >When I try to fix it as your suggestion, I cannot boot the guest,
>> >with below message(from xl dmesg):
>>
>> with Kevin's patch, the hypervisor always calls ' vmx_inject_extint()
>> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop..
>> the interrupt (PT or IPI, or others) can't deliver to guest..
>>
>> and so far, we suppress MSR-based APIC suggestion when having APIC-V
>> by
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec6
>> 2a2818e643
>> 90ac2ccfbd74e6b7
>> so I think we couldn't fallback to non-apicv dynamically here..
>>
>
>What about setting EOI exit bitmap for intack.vector when it's higher than
>pending pt_vector? This way we can guarantee there's always a chance to post
>pt_vector when pt_vector becomes the highest one...
>
>(of course you need make later pt_intr_post conditionally then, only when
>intack.vector==pt_vector)

Kevin, thanks for your positive reply. I have returned back that server (Intel(R) Xeon(R) CPU E5-2620 v3), so I can't
Verify it on time.

By my understanding, ''Virtual-Interrupt Delivery'' and "EOI Virtualization" were independent of each other.
Even we set setting EOI exit bitmap here to cause EOI-induced VM exit, we still can't guarantee the PT interrupt is delivered to guest
 from  the highest one delivered      to      the highest one EOI-induced VM exit..

I am afraid we could find a clean solution based on current implementation (kvm is ok).. and apicv results in decreased application performance for some windows guest.

So I suggest to configure ' msr_based_apic=1' / 'apicv = 0' to enable viridian for windows guest.. this is a workaround..

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-24  7:02                   ` Tian, Kevin
  2016-10-25  8:36                     ` Xuquan (Quan Xu)
@ 2016-10-25  9:01                     ` Xuquan (Quan Xu)
  1 sibling, 0 replies; 23+ messages in thread
From: Xuquan (Quan Xu) @ 2016-10-25  9:01 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Tian, Kevin, Jan Beulich
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

On October 25, 2016 4:36 PM, Quan Xu wrote:
>I am afraid we could find a clean solution based on current implementation (kvm

Sorry, a typo.. 

s/could/could not/

>is ok).. and apicv results in decreased application performance for some windows
>guest.
>
>So I suggest to configure ' msr_based_apic=1' / 'apicv = 0' to enable viridian for
>windows guest.. this is a workaround..
>
>Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-25  8:36                     ` Xuquan (Quan Xu)
@ 2016-10-25 13:01                       ` Konrad Rzeszutek Wilk
  2016-10-26  8:42                         ` Xuquan (Quan Xu)
  2016-10-26  5:19                       ` Tian, Kevin
  1 sibling, 1 reply; 23+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-10-25 13:01 UTC (permalink / raw)
  To: Xuquan (Quan Xu)
  Cc: yang.zhang.wz, Tian, Kevin, Nakajima, Jun, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Jan Beulich

On Tue, Oct 25, 2016 at 08:36:23AM +0000, Xuquan (Quan Xu) wrote:
> On October 24, 2016 3:02 PM, Tian, Kevin wrote:
> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> >> Sent: Monday, October 17, 2016 5:28 PM
> >>
> >> >>
> >> >>Back to the main open before holiday - multiple EOIs may come to
> >> >>clear irq_issued before guest actually handles the very vpt
> >> >>injection (possible if vpt vector is shared with other sources). I
> >> >>don't see a good solution on that open... :/
> >> >>
> >> >>We've discussed various options which all fail in one or another
> >> >>place
> >> >>- either miss an injection, or incur undesired injections.
> >> >>Possibly we should consider another direction - fall back to
> >> >>non-apicv path when we see vpt vector pending but it's not the highest one.
> >> >>
> >> >>Original condition to enter virtual intr delivery:
> >> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
> >> >>              intack.source != hvm_intsrc_pic &&
> >> >>              intack.source != hvm_intsrc_vector )
> >> >>
> >> >>now new condition:
> >> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
> >> >>              intack.source != hvm_intsrc_pic &&
> >> >>              intack.source != hvm_intsrc_vector &&
> >> >>			(pt_vector == -1 || intack.vector == pt_vector) )
> >> >>
> >> >>Thoughts?
> >> >>
> >> >Kevin,
> >> >When I try to fix it as your suggestion, I cannot boot the guest,
> >> >with below message(from xl dmesg):
> >>
> >> with Kevin's patch, the hypervisor always calls ' vmx_inject_extint()
> >> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop..
> >> the interrupt (PT or IPI, or others) can't deliver to guest..
> >>
> >> and so far, we suppress MSR-based APIC suggestion when having APIC-V
> >> by
> >> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec6
> >> 2a2818e643
> >> 90ac2ccfbd74e6b7
> >> so I think we couldn't fallback to non-apicv dynamically here..
> >>
> >
> >What about setting EOI exit bitmap for intack.vector when it's higher than
> >pending pt_vector? This way we can guarantee there's always a chance to post
> >pt_vector when pt_vector becomes the highest one...
> >
> >(of course you need make later pt_intr_post conditionally then, only when
> >intack.vector==pt_vector)
> 
> Kevin, thanks for your positive reply. I have returned back that server (Intel(R) Xeon(R) CPU E5-2620 v3), so I can't
> Verify it on time.
> 
> By my understanding, ''Virtual-Interrupt Delivery'' and "EOI Virtualization" were independent of each other.
> Even we set setting EOI exit bitmap here to cause EOI-induced VM exit, we still can't guarantee the PT interrupt is delivered to guest
>  from  the highest one delivered      to      the highest one EOI-induced VM exit..
> 
> I am afraid we could find a clean solution based on current implementation (kvm is ok).. and apicv results in decreased application performance for some windows guest.
> 

Could you describe what the KVM implementation solution did that made it
clean? I am curious whether the concept could be put in Xen?

> So I suggest to configure ' msr_based_apic=1' / 'apicv = 0' to enable viridian for windows guest.. this is a workaround..
> 
> Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-25  8:36                     ` Xuquan (Quan Xu)
  2016-10-25 13:01                       ` Konrad Rzeszutek Wilk
@ 2016-10-26  5:19                       ` Tian, Kevin
  2016-10-26  8:38                         ` Xuquan (Quan Xu)
  1 sibling, 1 reply; 23+ messages in thread
From: Tian, Kevin @ 2016-10-26  5:19 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Jan Beulich
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> Sent: Tuesday, October 25, 2016 4:36 PM
> 
> On October 24, 2016 3:02 PM, Tian, Kevin wrote:
> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> >> Sent: Monday, October 17, 2016 5:28 PM
> >>
> >> >>
> >> >>Back to the main open before holiday - multiple EOIs may come to
> >> >>clear irq_issued before guest actually handles the very vpt
> >> >>injection (possible if vpt vector is shared with other sources). I
> >> >>don't see a good solution on that open... :/
> >> >>
> >> >>We've discussed various options which all fail in one or another
> >> >>place
> >> >>- either miss an injection, or incur undesired injections.
> >> >>Possibly we should consider another direction - fall back to
> >> >>non-apicv path when we see vpt vector pending but it's not the highest one.
> >> >>
> >> >>Original condition to enter virtual intr delivery:
> >> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
> >> >>              intack.source != hvm_intsrc_pic &&
> >> >>              intack.source != hvm_intsrc_vector )
> >> >>
> >> >>now new condition:
> >> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
> >> >>              intack.source != hvm_intsrc_pic &&
> >> >>              intack.source != hvm_intsrc_vector &&
> >> >>			(pt_vector == -1 || intack.vector == pt_vector) )
> >> >>
> >> >>Thoughts?
> >> >>
> >> >Kevin,
> >> >When I try to fix it as your suggestion, I cannot boot the guest,
> >> >with below message(from xl dmesg):
> >>
> >> with Kevin's patch, the hypervisor always calls ' vmx_inject_extint()
> >> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop..
> >> the interrupt (PT or IPI, or others) can't deliver to guest..
> >>
> >> and so far, we suppress MSR-based APIC suggestion when having APIC-V
> >> by
> >> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec6
> >> 2a2818e643
> >> 90ac2ccfbd74e6b7
> >> so I think we couldn't fallback to non-apicv dynamically here..
> >>
> >
> >What about setting EOI exit bitmap for intack.vector when it's higher than
> >pending pt_vector? This way we can guarantee there's always a chance to post
> >pt_vector when pt_vector becomes the highest one...
> >
> >(of course you need make later pt_intr_post conditionally then, only when
> >intack.vector==pt_vector)
> 
> Kevin, thanks for your positive reply. I have returned back that server (Intel(R) Xeon(R)
> CPU E5-2620 v3), so I can't
> Verify it on time.
> 
> By my understanding, ''Virtual-Interrupt Delivery'' and "EOI Virtualization" were
> independent of each other.
> Even we set setting EOI exit bitmap here to cause EOI-induced VM exit, we still can't
> guarantee the PT interrupt is delivered to guest
>  from  the highest one delivered      to      the highest one EOI-induced VM exit..

Can you elaborate why you think it doesn't work? I didn't get your point
here. The idea here is that given above situation occurs - multiple pending
vectors but pt_vector is not the highest - set EOI exit bitmap for highest
vector. Then once guest EOI to the highest vector, a VM-exit happens and
then if pt_vector happens to be the next highest vector, you have chance
to pt_intr_post before resuming to the guest. 

EOI exit bitmap anyway is required for vpt to work correctly...

> 
> I am afraid we could find a clean solution based on current implementation (kvm is ok)..
> and apicv results in decreased application performance for some windows guest.
> 
> So I suggest to configure ' msr_based_apic=1' / 'apicv = 0' to enable viridian for windows
> guest.. this is a workaround..
> 
> Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-26  5:19                       ` Tian, Kevin
@ 2016-10-26  8:38                         ` Xuquan (Quan Xu)
  2016-10-26  9:35                           ` Tian, Kevin
  0 siblings, 1 reply; 23+ messages in thread
From: Xuquan (Quan Xu) @ 2016-10-26  8:38 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

On October 26, 2016 1:20 PM, Tian, Kevin wrote:
>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> Sent: Tuesday, October 25, 2016 4:36 PM
>>
>> On October 24, 2016 3:02 PM, Tian, Kevin wrote:
>> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> >> Sent: Monday, October 17, 2016 5:28 PM
>> >>
>> >> >>
>> >> >>Back to the main open before holiday - multiple EOIs may come to
>> >> >>clear irq_issued before guest actually handles the very vpt
>> >> >>injection (possible if vpt vector is shared with other sources).
>> >> >>I don't see a good solution on that open... :/
>> >> >>
>> >> >>We've discussed various options which all fail in one or another
>> >> >>place
>> >> >>- either miss an injection, or incur undesired injections.
>> >> >>Possibly we should consider another direction - fall back to
>> >> >>non-apicv path when we see vpt vector pending but it's not the highest
>one.
>> >> >>
>> >> >>Original condition to enter virtual intr delivery:
>> >> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
>> >> >>              intack.source != hvm_intsrc_pic &&
>> >> >>              intack.source != hvm_intsrc_vector )
>> >> >>
>> >> >>now new condition:
>> >> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
>> >> >>              intack.source != hvm_intsrc_pic &&
>> >> >>              intack.source != hvm_intsrc_vector &&
>> >> >>			(pt_vector == -1 || intack.vector == pt_vector) )
>> >> >>
>> >> >>Thoughts?
>> >> >>
>> >> >Kevin,
>> >> >When I try to fix it as your suggestion, I cannot boot the guest,
>> >> >with below message(from xl dmesg):
>> >>
>> >> with Kevin's patch, the hypervisor always calls '
>> >> vmx_inject_extint()
>> >> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop..
>> >> the interrupt (PT or IPI, or others) can't deliver to guest..
>> >>
>> >> and so far, we suppress MSR-based APIC suggestion when having
>> >> APIC-V by
>> >> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824
>> >> ec6
>> >> 2a2818e643
>> >> 90ac2ccfbd74e6b7
>> >> so I think we couldn't fallback to non-apicv dynamically here..
>> >>
>> >
>> >What about setting EOI exit bitmap for intack.vector when it's higher
>> >than pending pt_vector? This way we can guarantee there's always a
>> >chance to post pt_vector when pt_vector becomes the highest one...
>> >
>> >(of course you need make later pt_intr_post conditionally then, only
>> >when
>> >intack.vector==pt_vector)
>>
>> Kevin, thanks for your positive reply. I have returned back that
>> server (Intel(R) Xeon(R) CPU E5-2620 v3), so I can't Verify it on
>> time.
>>
>> By my understanding, ''Virtual-Interrupt Delivery'' and "EOI
>> Virtualization" were independent of each other.
>> Even we set setting EOI exit bitmap here to cause EOI-induced VM exit,
>> we still can't guarantee the PT interrupt is delivered to guest
>>  from  the highest one delivered      to      the highest one
>EOI-induced VM exit..
>
>Can you elaborate why you think it doesn't work? I didn't get your point here.
>The idea here is that given above situation occurs - multiple pending vectors but
>pt_vector is not the highest - set EOI exit bitmap for highest vector.

I understood your suggestion.

>Then once
>guest EOI to the highest vector, a VM-exit happens and then if pt_vector happens
>to be the next highest vector, you have chance to pt_intr_post before resuming
>to the guest.
>

The gap may be the count (pending_intr_nr) of pending periodic timer interrupt..


The highest vector is consumed as below:

a). vIRR to RVI (software, vmx_intr_assist() )
b). RVI to SVI .etc, then deliver interrupt with Vector through IDT .. (hardware, Virtual-Interrupt Delivery )
c). EOI (hardware, EOI virtualization).. if we set EOI exit bitmap for highest vector, THEN cause EOI-induced VM exit..

if this is serial execution for each pending interrupt, your suggestion is working.

But at step b), after delivered the highest vector, __hardware__ may continue to deliver vpt (if highest) to RVI and deliver it to guest as below " Virtual-Interrupt Delivery ",
and without decreasing the count (pending_intr_nr) of pending periodic timer interrupt..

(( if Xen doesn't update periodic timer interrupt bit set
in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
case to decrease the count (pending_intr_nr) of pending periodic timer interrupt,
then Xen will deliver a periodic timer interrupt again))

..
Virtual-Interrupt Delivery >>>:
Vector ← RVI;
VISR[Vector] ← 1;
SVI ← Vector;
VPPR ← Vector & F0H;
VIRR[Vector] ← 0;
IF any bits set in VIRR
THEN RVI ← highest index of bit set in VIRR (___if vpt is the highest index___)
ELSE RVI ← 0;
FI;
deliver interrupt with Vector through IDT;
cease recognition of any pending virtual interrupt;
<<<<

I think this is still the case, one pending vpt with multiple delivery.



Sorry for my bad description!!
Quan

>EOI exit bitmap anyway is required for vpt to work correctly...
>
>>
>> I am afraid we could find a clean solution based on current implementation
>(kvm is ok)..
>> and apicv results in decreased application performance for some windows
>guest.
>>
>> So I suggest to configure ' msr_based_apic=1' / 'apicv = 0' to enable
>> viridian for windows guest.. this is a workaround..
>>
>> Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-25 13:01                       ` Konrad Rzeszutek Wilk
@ 2016-10-26  8:42                         ` Xuquan (Quan Xu)
  0 siblings, 0 replies; 23+ messages in thread
From: Xuquan (Quan Xu) @ 2016-10-26  8:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: yang.zhang.wz, Tian, Kevin, Nakajima, Jun, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Jan Beulich

On October 25, 2016 9:01 PM, Konrad Rzeszutek Wilk < konrad.wilk@oracle.com > wrote:
>>
>
>Could you describe what the KVM implementation solution did that made it clean?
>I am curious whether the concept could be put in Xen?

Konrad, I am still learning this part. I will update it later.
Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-26  8:38                         ` Xuquan (Quan Xu)
@ 2016-10-26  9:35                           ` Tian, Kevin
  2016-10-26 12:02                             ` Xuquan (Quan Xu)
  0 siblings, 1 reply; 23+ messages in thread
From: Tian, Kevin @ 2016-10-26  9:35 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Jan Beulich
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> Sent: Wednesday, October 26, 2016 4:39 PM
> 
> On October 26, 2016 1:20 PM, Tian, Kevin wrote:
> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> >> Sent: Tuesday, October 25, 2016 4:36 PM
> >>
> >> On October 24, 2016 3:02 PM, Tian, Kevin wrote:
> >> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> >> >> Sent: Monday, October 17, 2016 5:28 PM
> >> >>
> >> >> >>
> >> >> >>Back to the main open before holiday - multiple EOIs may come to
> >> >> >>clear irq_issued before guest actually handles the very vpt
> >> >> >>injection (possible if vpt vector is shared with other sources).
> >> >> >>I don't see a good solution on that open... :/
> >> >> >>
> >> >> >>We've discussed various options which all fail in one or another
> >> >> >>place
> >> >> >>- either miss an injection, or incur undesired injections.
> >> >> >>Possibly we should consider another direction - fall back to
> >> >> >>non-apicv path when we see vpt vector pending but it's not the highest
> >one.
> >> >> >>
> >> >> >>Original condition to enter virtual intr delivery:
> >> >> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
> >> >> >>              intack.source != hvm_intsrc_pic &&
> >> >> >>              intack.source != hvm_intsrc_vector )
> >> >> >>
> >> >> >>now new condition:
> >> >> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
> >> >> >>              intack.source != hvm_intsrc_pic &&
> >> >> >>              intack.source != hvm_intsrc_vector &&
> >> >> >>			(pt_vector == -1 || intack.vector == pt_vector) )
> >> >> >>
> >> >> >>Thoughts?
> >> >> >>
> >> >> >Kevin,
> >> >> >When I try to fix it as your suggestion, I cannot boot the guest,
> >> >> >with below message(from xl dmesg):
> >> >>
> >> >> with Kevin's patch, the hypervisor always calls '
> >> >> vmx_inject_extint()
> >> >> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop..
> >> >> the interrupt (PT or IPI, or others) can't deliver to guest..
> >> >>
> >> >> and so far, we suppress MSR-based APIC suggestion when having
> >> >> APIC-V by
> >> >> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824
> >> >> ec6
> >> >> 2a2818e643
> >> >> 90ac2ccfbd74e6b7
> >> >> so I think we couldn't fallback to non-apicv dynamically here..
> >> >>
> >> >
> >> >What about setting EOI exit bitmap for intack.vector when it's higher
> >> >than pending pt_vector? This way we can guarantee there's always a
> >> >chance to post pt_vector when pt_vector becomes the highest one...
> >> >
> >> >(of course you need make later pt_intr_post conditionally then, only
> >> >when
> >> >intack.vector==pt_vector)
> >>
> >> Kevin, thanks for your positive reply. I have returned back that
> >> server (Intel(R) Xeon(R) CPU E5-2620 v3), so I can't Verify it on
> >> time.
> >>
> >> By my understanding, ''Virtual-Interrupt Delivery'' and "EOI
> >> Virtualization" were independent of each other.
> >> Even we set setting EOI exit bitmap here to cause EOI-induced VM exit,
> >> we still can't guarantee the PT interrupt is delivered to guest
> >>  from  the highest one delivered      to      the highest one
> >EOI-induced VM exit..
> >
> >Can you elaborate why you think it doesn't work? I didn't get your point here.
> >The idea here is that given above situation occurs - multiple pending vectors but
> >pt_vector is not the highest - set EOI exit bitmap for highest vector.
> 
> I understood your suggestion.
> 
> >Then once
> >guest EOI to the highest vector, a VM-exit happens and then if pt_vector happens
> >to be the next highest vector, you have chance to pt_intr_post before resuming
> >to the guest.
> >
> 
> The gap may be the count (pending_intr_nr) of pending periodic timer interrupt..
> 
> 
> The highest vector is consumed as below:
> 
> a). vIRR to RVI (software, vmx_intr_assist() )
> b). RVI to SVI .etc, then deliver interrupt with Vector through IDT .. (hardware,
> Virtual-Interrupt Delivery )
> c). EOI (hardware, EOI virtualization).. if we set EOI exit bitmap for highest vector, THEN
> cause EOI-induced VM exit..
> 
> if this is serial execution for each pending interrupt, your suggestion is working.
> 
> But at step b), after delivered the highest vector, __hardware__ may continue to deliver
> vpt (if highest) to RVI and deliver it to guest as below " Virtual-Interrupt Delivery ",
> and without decreasing the count (pending_intr_nr) of pending periodic timer interrupt..
> 
> (( if Xen doesn't update periodic timer interrupt bit set
> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
> case to decrease the count (pending_intr_nr) of pending periodic timer interrupt,
> then Xen will deliver a periodic timer interrupt again))
> 
> ..
> Virtual-Interrupt Delivery >>>:
> Vector ← RVI;
> VISR[Vector] ← 1;
> SVI ← Vector;
> VPPR ← Vector & F0H;
> VIRR[Vector] ← 0;
> IF any bits set in VIRR
> THEN RVI ← highest index of bit set in VIRR (___if vpt is the highest index___)

updating RVI doesn't mean delivering the virtual interrupt. There is no difference
from how software updates that field before resuming to guest.

> ELSE RVI ← 0;
> FI;
> deliver interrupt with Vector through IDT;
> cease recognition of any pending virtual interrupt;

please note recognition is ceased here.

If you check 29.2.1 Evaluation of pending virtual interrupts, virtual interrupt
is evaluated at VM entry, TPR virtualization, EOI virtualization, self-IPI 
virtualization, and posted-interrupt processing. and evaluation is as below:

IF “interrupt-window exiting” is 0 AND
RVI[7:4] > VPPR[7:4] (see Section 29.1.1 for definition of VPPR)
THEN recognize a pending virtual interrupt;
ELSE
do not recognize a pending virtual interrupt;
FI;

Even an evaluation is triggered when highest vector is being handled
(before EOI), RVI (with pt_vector) is smaller than VPPR (which is the
current highest vector). So no injection of pt_vector will happen. Then
later when highest vector is EOI-ed, a VM-exit happens immediately...


Thanks
Kevin


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue
  2016-10-26  9:35                           ` Tian, Kevin
@ 2016-10-26 12:02                             ` Xuquan (Quan Xu)
  0 siblings, 0 replies; 23+ messages in thread
From: Xuquan (Quan Xu) @ 2016-10-26 12:02 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich
  Cc: yang.zhang.wz, Hanweidong (Randy),
	George.Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jiangyifei,
	Nakajima, Jun

On October 26, 2016 5:35 PM, Tian, Kevin wrote:
>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> Sent: Wednesday, October 26, 2016 4:39 PM
>>
>> On October 26, 2016 1:20 PM, Tian, Kevin wrote:
>> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> >> Sent: Tuesday, October 25, 2016 4:36 PM
>> >>
>> >> On October 24, 2016 3:02 PM, Tian, Kevin wrote:
>> >> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> >> >> Sent: Monday, October 17, 2016 5:28 PM
>> >> >>
>> >> >> >>
>> >> >> >>Back to the main open before holiday - multiple EOIs may come
>> >> >> >>to clear irq_issued before guest actually handles the very vpt
>> >> >> >>injection (possible if vpt vector is shared with other sources).
>> >> >> >>I don't see a good solution on that open... :/
>> >> >> >>
>> >> >> >>We've discussed various options which all fail in one or
>> >> >> >>another place
>> >> >> >>- either miss an injection, or incur undesired injections.
>> >> >> >>Possibly we should consider another direction - fall back to
>> >> >> >>non-apicv path when we see vpt vector pending but it's not the
>> >> >> >>highest
>> >one.
>> >> >> >>
>> >> >> >>Original condition to enter virtual intr delivery:
>> >> >> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
>> >> >> >>              intack.source != hvm_intsrc_pic &&
>> >> >> >>              intack.source != hvm_intsrc_vector )
>> >> >> >>
>> >> >> >>now new condition:
>> >> >> >>	    else if ( cpu_has_vmx_virtual_intr_delivery &&
>> >> >> >>              intack.source != hvm_intsrc_pic &&
>> >> >> >>              intack.source != hvm_intsrc_vector &&
>> >> >> >>			(pt_vector == -1 || intack.vector == pt_vector) )
>> >> >> >>
>> >> >> >>Thoughts?
>> >> >> >>
>> >> >> >Kevin,
>> >> >> >When I try to fix it as your suggestion, I cannot boot the
>> >> >> >guest, with below message(from xl dmesg):
>> >> >>
>> >> >> with Kevin's patch, the hypervisor always calls '
>> >> >> vmx_inject_extint()
>> >> >> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop..
>> >> >> the interrupt (PT or IPI, or others) can't deliver to guest..
>> >> >>
>> >> >> and so far, we suppress MSR-based APIC suggestion when having
>> >> >> APIC-V by
>> >> >> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b
>> >> >> 824
>> >> >> ec6
>> >> >> 2a2818e643
>> >> >> 90ac2ccfbd74e6b7
>> >> >> so I think we couldn't fallback to non-apicv dynamically here..
>> >> >>
>> >> >
>> >> >What about setting EOI exit bitmap for intack.vector when it's
>> >> >higher than pending pt_vector? This way we can guarantee there's
>> >> >always a chance to post pt_vector when pt_vector becomes the highest
>one...
>> >> >
>> >> >(of course you need make later pt_intr_post conditionally then,
>> >> >only when
>> >> >intack.vector==pt_vector)
>> >>
>> >> Kevin, thanks for your positive reply. I have returned back that
>> >> server (Intel(R) Xeon(R) CPU E5-2620 v3), so I can't Verify it on
>> >> time.
>> >>
>> >> By my understanding, ''Virtual-Interrupt Delivery'' and "EOI
>> >> Virtualization" were independent of each other.
>> >> Even we set setting EOI exit bitmap here to cause EOI-induced VM
>> >> exit, we still can't guarantee the PT interrupt is delivered to guest
>> >>  from  the highest one delivered      to      the highest one
>> >EOI-induced VM exit..
>> >
>> >Can you elaborate why you think it doesn't work? I didn't get your point here.
>> >The idea here is that given above situation occurs - multiple pending
>> >vectors but pt_vector is not the highest - set EOI exit bitmap for highest
>vector.
>>
>> I understood your suggestion.
>>
>> >Then once
>> >guest EOI to the highest vector, a VM-exit happens and then if
>> >pt_vector happens to be the next highest vector, you have chance to
>> >pt_intr_post before resuming to the guest.
>> >
>>
>> The gap may be the count (pending_intr_nr) of pending periodic timer
>interrupt..
>>
>>
>> The highest vector is consumed as below:
>>
>> a). vIRR to RVI (software, vmx_intr_assist() ) b). RVI to SVI .etc,
>> then deliver interrupt with Vector through IDT .. (hardware,
>> Virtual-Interrupt Delivery ) c). EOI (hardware, EOI virtualization)..
>> if we set EOI exit bitmap for highest vector, THEN cause EOI-induced
>> VM exit..
>>
>> if this is serial execution for each pending interrupt, your suggestion is working.
>>
>> But at step b), after delivered the highest vector, __hardware__ may
>> continue to deliver vpt (if highest) to RVI and deliver it to guest as
>> below " Virtual-Interrupt Delivery ", and without decreasing the count
>(pending_intr_nr) of pending periodic timer interrupt..
>>
>> (( if Xen doesn't update periodic timer interrupt bit set in VIRR to
>> guest interrupt status (RVI) directly, Xen is not aware of this case
>> to decrease the count (pending_intr_nr) of pending periodic timer
>> interrupt, then Xen will deliver a periodic timer interrupt again))
>>
>> ..
>> Virtual-Interrupt Delivery >>>:
>> Vector ← RVI;
>> VISR[Vector] ← 1;
>> SVI ← Vector;
>> VPPR ← Vector & F0H;
>> VIRR[Vector] ← 0;
>> IF any bits set in VIRR
>> THEN RVI ← highest index of bit set in VIRR (___if vpt is the highest
>> index___)
>
>updating RVI doesn't mean delivering the virtual interrupt. There is no difference
>from how software updates that field before resuming to guest.
>
>> ELSE RVI ← 0;
>> FI;
>> deliver interrupt with Vector through IDT; cease recognition of any
>> pending virtual interrupt;
>
>please note recognition is ceased here.
>
>If you check 29.2.1 Evaluation of pending virtual interrupts, virtual interrupt is
>evaluated at VM entry, TPR virtualization, EOI virtualization, self-IPI virtualization,
>and posted-interrupt processing. and evaluation is as below:
>
>IF “interrupt-window exiting” is 0 AND
>RVI[7:4] > VPPR[7:4] (see Section 29.1.1 for definition of VPPR) THEN recognize a
>pending virtual interrupt; ELSE do not recognize a pending virtual interrupt; FI;
>
>Even an evaluation is triggered when highest vector is being handled (before EOI),
>RVI (with pt_vector) is smaller than VPPR (which is the current highest vector). So
>no injection of pt_vector will happen. Then later when highest vector is EOI-ed, a
>VM-exit happens immediately...
>
>


Cool, you are right. I will update/verify patch as your suggestion when I get back that server later.
Thank you very much!!


Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-10-26 12:02 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-20 13:30 [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue Xuquan (Euler)
2016-09-23 15:33 ` Jan Beulich
2016-09-23 23:34   ` Tian, Kevin
2016-09-24  1:06     ` Xuquan (Euler)
2016-09-26  6:39       ` Jan Beulich
2016-10-10  9:21         ` Xuquan (Quan Xu)
2016-10-10  9:39           ` Jan Beulich
2016-10-10 10:49             ` Xuquan (Quan Xu)
2016-10-11  7:48               ` Tian, Kevin
2016-10-11 11:12                 ` Xuquan (Quan Xu)
2016-10-17  9:27                 ` Xuquan (Quan Xu)
2016-10-21  8:43                   ` Tian, Kevin
2016-10-24  7:02                   ` Tian, Kevin
2016-10-25  8:36                     ` Xuquan (Quan Xu)
2016-10-25 13:01                       ` Konrad Rzeszutek Wilk
2016-10-26  8:42                         ` Xuquan (Quan Xu)
2016-10-26  5:19                       ` Tian, Kevin
2016-10-26  8:38                         ` Xuquan (Quan Xu)
2016-10-26  9:35                           ` Tian, Kevin
2016-10-26 12:02                             ` Xuquan (Quan Xu)
2016-10-25  9:01                     ` Xuquan (Quan Xu)
2016-09-26  6:35     ` Jan Beulich
2016-09-26 19:34       ` Tian, Kevin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).