All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
@ 2017-02-06 23:32 Chao Gao
  2017-02-07  6:48 ` Xuquan (Quan Xu)
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Chao Gao @ 2017-02-06 23:32 UTC (permalink / raw)
  To: xen-devel
  Cc: tianyu.lan, Kevin Tian, Jan Beulich, Andrew Cooper, xuquan8,
	Jun Nakajima

Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
added a assertion that intack.vector is the highest priority vector. But
according to the osstest, the assertion failed sometimes. More discussion can
be found in the thread
(https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).

The assertion failure is hard to reproduce. In order to root cause issue, this
patch is to add logs to dump PIR and vIRR when failure takes place. It should
be reverted once the root cause is found.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 xen/arch/x86/hvm/vmx/intr.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 24e4505..5e5b37a 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -23,6 +23,7 @@
 #include <xen/errno.h>
 #include <xen/trace.h>
 #include <xen/event.h>
+#include <asm/apicdef.h>
 #include <asm/current.h>
 #include <asm/cpufeature.h>
 #include <asm/processor.h>
@@ -318,6 +319,48 @@ void vmx_intr_assist(void)
         */
         if ( pt_vector != -1 )
         {
+            /*
+             * We assert that intack.vector is the highest priority vector for
+             * only an interrupt from vlapic can reach this point and the
+             * highest vector is chosen in hvm_vcpu_has_pending_irq().
+             * But, in fact, the assertion failed sometimes. It is suspected
+             * that PIR is not synced to vIRR which makes pt_vector is left in
+             * PIR. In order to verify this suspicion, dump some information
+             * when the assertion fails.
+             */
+            if ( unlikely(intack.vector < pt_vector) )
+            {
+                struct vlapic *vlapic;
+                struct pi_desc *pi_desc;
+                uint32_t *word;
+                int i;
+
+                printk("Assertion failed on %pv. Some info are below.\n",
+                       current);
+                printk("intack info: vector 0x%2x, source %x\n", intack.vector,
+                       intack.source);
+                printk("pt_vector info: vector 0x%2x\n", pt_vector);
+
+                vlapic = vcpu_vlapic(v);
+                if ( vlapic && vlapic->regs->data )
+                {
+                    word = (void *)&vlapic->regs->data[APIC_IRR];
+                    printk("vIRR:");
+                    for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- )
+                        printk(" %08x", word[i*4]);
+                    printk("\n");
+                }
+
+                pi_desc = &v->arch.hvm_vmx.pi_desc;
+                if ( pi_desc && pi_desc->pir )
+                {
+                    word = (void *)&pi_desc->pir;
+                    printk(" PIR:");
+                    for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- )
+                        printk(" %08x", word[i]);
+                    printk("\n");
+                }
+            }
             ASSERT(intack.vector >= pt_vector);
             vmx_set_eoi_exit_bitmap(v, intack.vector);
         }
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-07 10:04   ` Jan Beulich
@ 2017-02-07  6:32     ` Chao Gao
  2017-02-07 13:46       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Chao Gao @ 2017-02-07  6:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tianyu.lan, Xuquan (Quan Xu),
	Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
>> 
>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>> {
>> 	/* This may race with setting of irr in __apic_accept_irq() and
>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>> 	 * will cause vmexit immediately and the value will be recalculated
>> 	 * on the next vmentry.
>> 	 */
>> ...
>> }
>> 
>> I am afraid, there may be a similar race with setting of vIRR..
>
>I think this is unrelated: If another interrupt came in, the highest
>set bit in vIRR can only increase. Plus pt_update_irq(), before
>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>highest priority interrupt should still have a vector >= pt_vector.

I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
when the assertion failed. In the process of debugging pt_update_irq() 
when I was working on another feature, I noticed that 0x30 is always 
the vector of IRQ2. I suspect that the source of the periodic timer interrupt
is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
one in hvm_isa_irq_assert() and the other in pt_irq_vector().
If guest changed the content in viopaic entry between the two read
(ie. change the vector from 0x30 to 0x38), the assertion would fail.
Do you think it is a reasonable explanation?

Thanks,
chao
>
>Jan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-06 23:32 [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT() Chao Gao
@ 2017-02-07  6:48 ` Xuquan (Quan Xu)
  2017-02-07 10:04   ` Jan Beulich
  2017-02-07  9:51 ` Jan Beulich
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Xuquan (Quan Xu) @ 2017-02-07  6:48 UTC (permalink / raw)
  To: Chao Gao, xen-devel
  Cc: tianyu.lan, Andrew Cooper, Kevin Tian, Jun Nakajima, Jan Beulich

On February 07, 2017 7:33 AM, Chao Gao wrote:
>Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue")
>has added a assertion that intack.vector is the highest priority vector. But
>according to the osstest, the assertion failed sometimes. More discussion
>can be found in the thread
>(https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.ht
>ml).
>
>The assertion failure is hard to reproduce. In order to root cause issue, this
>patch is to add logs to dump PIR and vIRR when failure takes place. It
>should be reverted once the root cause is found.
>
>Signed-off-by: Chao Gao <chao.gao@intel.com>
>---
> xen/arch/x86/hvm/vmx/intr.c | 43
>+++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
>diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>index 24e4505..5e5b37a 100644
>--- a/xen/arch/x86/hvm/vmx/intr.c
>+++ b/xen/arch/x86/hvm/vmx/intr.c
>@@ -23,6 +23,7 @@
> #include <xen/errno.h>
> #include <xen/trace.h>
> #include <xen/event.h>
>+#include <asm/apicdef.h>
> #include <asm/current.h>
> #include <asm/cpufeature.h>
> #include <asm/processor.h>
>@@ -318,6 +319,48 @@ void vmx_intr_assist(void)
>         */
>         if ( pt_vector != -1 )
>         {
>+            /*
>+             * We assert that intack.vector is the highest priority vector
>for
>+             * only an interrupt from vlapic can reach this point and
>the
>+             * highest vector is chosen in hvm_vcpu_has_pending_irq().
>+             * But, in fact, the assertion failed sometimes. It is
>suspected
>+             * that PIR is not synced to vIRR which makes pt_vector is
>left in
>+             * PIR. In order to verify this suspicion, dump some
>information
>+             * when the assertion fails.
>+             */

Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,

int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
{
	/* This may race with setting of irr in __apic_accept_irq() and
	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
	 * will cause vmexit immediately and the value will be recalculated
	 * on the next vmentry.
	 */
...
}

I am afraid, there may be a similar race with setting of vIRR..

Quan




>+            if ( unlikely(intack.vector < pt_vector) )
>+            {
>+                struct vlapic *vlapic;
>+                struct pi_desc *pi_desc;
>+                uint32_t *word;
>+                int i;
>+
>+                printk("Assertion failed on %pv. Some info are
>below.\n",
>+                       current);
>+                printk("intack info: vector 0x%2x, source %x\n",
>intack.vector,
>+                       intack.source);
>+                printk("pt_vector info: vector 0x%2x\n", pt_vector);
>+
>+                vlapic = vcpu_vlapic(v);
>+                if ( vlapic && vlapic->regs->data )
>+                {
>+                    word = (void *)&vlapic->regs->data[APIC_IRR];
>+                    printk("vIRR:");
>+                    for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- )
>+                        printk(" %08x", word[i*4]);
>+                    printk("\n");
>+                }
>+
>+                pi_desc = &v->arch.hvm_vmx.pi_desc;
>+                if ( pi_desc && pi_desc->pir )
>+                {
>+                    word = (void *)&pi_desc->pir;
>+                    printk(" PIR:");
>+                    for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- )
>+                        printk(" %08x", word[i]);
>+                    printk("\n");
>+                }
>+            }
>             ASSERT(intack.vector >= pt_vector);
>             vmx_set_eoi_exit_bitmap(v, intack.vector);
>         }
>--
>1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-07 13:46       ` Jan Beulich
@ 2017-02-07  7:28         ` Chao Gao
  2017-02-07 16:18           ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Chao Gao @ 2017-02-07  7:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tianyu.lan, Xuquan (Quan Xu),
	Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote:
>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>>>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
>>>> 
>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>>>> {
>>>> 	/* This may race with setting of irr in __apic_accept_irq() and
>>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>>>> 	 * will cause vmexit immediately and the value will be recalculated
>>>> 	 * on the next vmentry.
>>>> 	 */
>>>> ...
>>>> }
>>>> 
>>>> I am afraid, there may be a similar race with setting of vIRR..
>>>
>>>I think this is unrelated: If another interrupt came in, the highest
>>>set bit in vIRR can only increase. Plus pt_update_irq(), before
>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>>>highest priority interrupt should still have a vector >= pt_vector.
>> 
>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
>> when the assertion failed. In the process of debugging pt_update_irq() 
>> when I was working on another feature, I noticed that 0x30 is always 
>> the vector of IRQ2. I suspect that the source of the periodic timer interrupt
>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
>> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
>> If guest changed the content in viopaic entry between the two read
>> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
>> Do you think it is a reasonable explanation?
>
>IRQ2? If this was going via the PIC, that would be impossible, as
>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
>IO-APIC (and we follow this model - see tools/libacpi/build.c). How

Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC.

>did you determine that 0x30 is the vector of IRQ2?

By monitoring the write operation to IO-APIC redirection entry.
From experimental data, linux kernel always writes vector 0x30 to
redirection entry of pin2 of IO-APIC.

Thank,
chao
>
>Jan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-06 23:32 [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT() Chao Gao
  2017-02-07  6:48 ` Xuquan (Quan Xu)
@ 2017-02-07  9:51 ` Jan Beulich
  2017-02-07 10:04   ` Wei Liu
  2017-02-10  9:36 ` Jan Beulich
  2017-02-14 10:51 ` Jan Beulich
  3 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2017-02-07  9:51 UTC (permalink / raw)
  To: Wei Liu, Ian Jackson, Chao Gao
  Cc: tianyu.lan, xuquan8, Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> added a assertion that intack.vector is the highest priority vector. But
> according to the osstest, the assertion failed sometimes. More discussion can
> be found in the thread
> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> 
> The assertion failure is hard to reproduce. In order to root cause issue, this
> patch is to add logs to dump PIR and vIRR when failure takes place. It should
> be reverted once the root cause is found.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>

If this wasn't just a temporary thing, I'd have some comments, but
it's good enough (perhaps with a little massaging when committing)
for this purpose.

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Ian, Wei, is there a way to automatically scan osstest logs for
occurrences of the extra output, or does someone need to look
through the logs of every spurious failure?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-07  6:48 ` Xuquan (Quan Xu)
@ 2017-02-07 10:04   ` Jan Beulich
  2017-02-07  6:32     ` Chao Gao
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2017-02-07 10:04 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Chao Gao, xen-devel
  Cc: tianyu.lan, Andrew Cooper, Kevin Tian, Jun Nakajima

>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
> 
> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
> {
> 	/* This may race with setting of irr in __apic_accept_irq() and
> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
> 	 * will cause vmexit immediately and the value will be recalculated
> 	 * on the next vmentry.
> 	 */
> ...
> }
> 
> I am afraid, there may be a similar race with setting of vIRR..

I think this is unrelated: If another interrupt came in, the highest
set bit in vIRR can only increase. Plus pt_update_irq(), before
returning, calls vlapic_set_irq(), which ought to result in pt_vector's
vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
highest priority interrupt should still have a vector >= pt_vector.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-07  9:51 ` Jan Beulich
@ 2017-02-07 10:04   ` Wei Liu
  2017-02-08  7:49     ` Tian, Kevin
  0 siblings, 1 reply; 21+ messages in thread
From: Wei Liu @ 2017-02-07 10:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tianyu.lan, xuquan8, Wei Liu, Andrew Cooper, Ian Jackson,
	Kevin Tian, Jun Nakajima, xen-devel, Chao Gao

On Tue, Feb 07, 2017 at 02:51:54AM -0700, Jan Beulich wrote:
> >>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> > added a assertion that intack.vector is the highest priority vector. But
> > according to the osstest, the assertion failed sometimes. More discussion can
> > be found in the thread
> > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> > 
> > The assertion failure is hard to reproduce. In order to root cause issue, this
> > patch is to add logs to dump PIR and vIRR when failure takes place. It should
> > be reverted once the root cause is found.
> > 
> > Signed-off-by: Chao Gao <chao.gao@intel.com>
> 
> If this wasn't just a temporary thing, I'd have some comments, but
> it's good enough (perhaps with a little massaging when committing)
> for this purpose.
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Ian, Wei, is there a way to automatically scan osstest logs for
> occurrences of the extra output, or does someone need to look
> through the logs of every spurious failure?
> 

I don't think there is automatically scanning in place.

> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-07  6:32     ` Chao Gao
@ 2017-02-07 13:46       ` Jan Beulich
  2017-02-07  7:28         ` Chao Gao
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2017-02-07 13:46 UTC (permalink / raw)
  To: Chao Gao
  Cc: tianyu.lan, Xuquan (Quan Xu),
	Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
>>> 
>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>>> {
>>> 	/* This may race with setting of irr in __apic_accept_irq() and
>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>>> 	 * will cause vmexit immediately and the value will be recalculated
>>> 	 * on the next vmentry.
>>> 	 */
>>> ...
>>> }
>>> 
>>> I am afraid, there may be a similar race with setting of vIRR..
>>
>>I think this is unrelated: If another interrupt came in, the highest
>>set bit in vIRR can only increase. Plus pt_update_irq(), before
>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>>highest priority interrupt should still have a vector >= pt_vector.
> 
> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
> when the assertion failed. In the process of debugging pt_update_irq() 
> when I was working on another feature, I noticed that 0x30 is always 
> the vector of IRQ2. I suspect that the source of the periodic timer interrupt
> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
> If guest changed the content in viopaic entry between the two read
> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
> Do you think it is a reasonable explanation?

IRQ2? If this was going via the PIC, that would be impossible, as
that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
IO-APIC (and we follow this model - see tools/libacpi/build.c). How
did you determine that 0x30 is the vector of IRQ2?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-07  7:28         ` Chao Gao
@ 2017-02-07 16:18           ` Jan Beulich
  2017-02-08  0:12             ` Chao Gao
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2017-02-07 16:18 UTC (permalink / raw)
  To: Chao Gao
  Cc: tianyu.lan, Xuquan (Quan Xu),
	Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote:
> On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote:
>>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
>>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>>>>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
>>>>> 
>>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>>>>> {
>>>>> 	/* This may race with setting of irr in __apic_accept_irq() and
>>>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>>>>> 	 * will cause vmexit immediately and the value will be recalculated
>>>>> 	 * on the next vmentry.
>>>>> 	 */
>>>>> ...
>>>>> }
>>>>> 
>>>>> I am afraid, there may be a similar race with setting of vIRR..
>>>>
>>>>I think this is unrelated: If another interrupt came in, the highest
>>>>set bit in vIRR can only increase. Plus pt_update_irq(), before
>>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>>>>highest priority interrupt should still have a vector >= pt_vector.
>>> 
>>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
>>> when the assertion failed. In the process of debugging pt_update_irq() 
>>> when I was working on another feature, I noticed that 0x30 is always 
>>> the vector of IRQ2. I suspect that the source of the periodic timer 
> interrupt
>>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
>>> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
>>> If guest changed the content in viopaic entry between the two read
>>> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
>>> Do you think it is a reasonable explanation?
>>
>>IRQ2? If this was going via the PIC, that would be impossible, as
>>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
>>IO-APIC (and we follow this model - see tools/libacpi/build.c). How
> 
> Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC.
> 
>>did you determine that 0x30 is the vector of IRQ2?
> 
> By monitoring the write operation to IO-APIC redirection entry.
> From experimental data, linux kernel always writes vector 0x30 to
> redirection entry of pin2 of IO-APIC.

But if it goes through the IO-APIC, it'll also go through the LAPIC
(and in the ExtInt case you wouldn't be able to read a valid vector
from the RTE).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-07 16:18           ` Jan Beulich
@ 2017-02-08  0:12             ` Chao Gao
  2017-02-08  8:06               ` Tian, Kevin
  0 siblings, 1 reply; 21+ messages in thread
From: Chao Gao @ 2017-02-08  0:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tianyu.lan, Xuquan (Quan Xu),
	Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

On Tue, Feb 07, 2017 at 09:18:56AM -0700, Jan Beulich wrote:
>>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote:
>> On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote:
>>>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
>>>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>>>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>>>>>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
>>>>>> 
>>>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>>>>>> {
>>>>>> 	/* This may race with setting of irr in __apic_accept_irq() and
>>>>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>>>>>> 	 * will cause vmexit immediately and the value will be recalculated
>>>>>> 	 * on the next vmentry.
>>>>>> 	 */
>>>>>> ...
>>>>>> }
>>>>>> 
>>>>>> I am afraid, there may be a similar race with setting of vIRR..
>>>>>
>>>>>I think this is unrelated: If another interrupt came in, the highest
>>>>>set bit in vIRR can only increase. Plus pt_update_irq(), before
>>>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>>>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>>>>>highest priority interrupt should still have a vector >= pt_vector.
>>>> 
>>>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
>>>> when the assertion failed. In the process of debugging pt_update_irq() 
>>>> when I was working on another feature, I noticed that 0x30 is always 
>>>> the vector of IRQ2. I suspect that the source of the periodic timer 
>> interrupt
>>>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
>>>> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
>>>> If guest changed the content in viopaic entry between the two read
>>>> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
>>>> Do you think it is a reasonable explanation?
>>>
>>>IRQ2? If this was going via the PIC, that would be impossible, as
>>>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
>>>IO-APIC (and we follow this model - see tools/libacpi/build.c). How
>> 
>> Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC.
>> 
>>>did you determine that 0x30 is the vector of IRQ2?
>> 
>> By monitoring the write operation to IO-APIC redirection entry.
>> From experimental data, linux kernel always writes vector 0x30 to
>> redirection entry of pin2 of IO-APIC.
>
>But if it goes through the IO-APIC, it'll also go through the LAPIC
>(and in the ExtInt case you wouldn't be able to read a valid vector
>from the RTE).

Yes. If a pt interrupt goes through IO-APIC, pt_update_irq() will read
the RTE twice. One is in hvm_isa_irq_assert() to determine which bit
in vIRR should be set. The other is in pt_irq_vector() to get the bit
we have set and return it. Is it possible that the return vector is
different from the bit set in vIRR of LAPIC, since guest has changed
RTE of IO-APIC pin2.

thanks,
chao
>
>Jan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-08  8:06               ` Tian, Kevin
@ 2017-02-08  1:30                 ` Chao Gao
  0 siblings, 0 replies; 21+ messages in thread
From: Chao Gao @ 2017-02-08  1:30 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Lan, Tianyu, Xuquan (Quan Xu),
	Jan Beulich, Andrew Cooper, Nakajima, Jun, xen-devel

On Wed, Feb 08, 2017 at 04:06:53PM +0800, Tian, Kevin wrote:
>> From: Gao, Chao
>> Sent: Wednesday, February 08, 2017 8:12 AM
>> 
>> On Tue, Feb 07, 2017 at 09:18:56AM -0700, Jan Beulich wrote:
>> >>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote:
>> >> On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote:
>> >>>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
>> >>>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>> >>>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>> >>>>>> Some comment from QEMU/KVM code, in
>> <linux-kernel>/arch/x86/kvm/lapic.c,
>> >>>>>>
>> >>>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>> >>>>>> {
>> >>>>>> 	/* This may race with setting of irr in __apic_accept_irq() and
>> >>>>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>> >>>>>> 	 * will cause vmexit immediately and the value will be recalculated
>> >>>>>> 	 * on the next vmentry.
>> >>>>>> 	 */
>> >>>>>> ...
>> >>>>>> }
>> >>>>>>
>> >>>>>> I am afraid, there may be a similar race with setting of vIRR..
>> >>>>>
>> >>>>>I think this is unrelated: If another interrupt came in, the highest
>> >>>>>set bit in vIRR can only increase. Plus pt_update_irq(), before
>> >>>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>> >>>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>> >>>>>highest priority interrupt should still have a vector >= pt_vector.
>> >>>>
>> >>>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
>> >>>> when the assertion failed. In the process of debugging pt_update_irq()
>> >>>> when I was working on another feature, I noticed that 0x30 is always
>> >>>> the vector of IRQ2. I suspect that the source of the periodic timer
>> >> interrupt
>> >>>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
>> >>>> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
>> >>>> If guest changed the content in viopaic entry between the two read
>> >>>> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
>> >>>> Do you think it is a reasonable explanation?
>> >>>
>> >>>IRQ2? If this was going via the PIC, that would be impossible, as
>> >>>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
>> >>>IO-APIC (and we follow this model - see tools/libacpi/build.c). How
>> >>
>> >> Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC.
>> >>
>> >>>did you determine that 0x30 is the vector of IRQ2?
>> >>
>> >> By monitoring the write operation to IO-APIC redirection entry.
>> >> From experimental data, linux kernel always writes vector 0x30 to
>> >> redirection entry of pin2 of IO-APIC.
>> >
>> >But if it goes through the IO-APIC, it'll also go through the LAPIC
>> >(and in the ExtInt case you wouldn't be able to read a valid vector
>> >from the RTE).
>> 
>> Yes. If a pt interrupt goes through IO-APIC, pt_update_irq() will read
>> the RTE twice. One is in hvm_isa_irq_assert() to determine which bit
>> in vIRR should be set. The other is in pt_irq_vector() to get the bit
>> we have set and return it. Is it possible that the return vector is
>> different from the bit set in vIRR of LAPIC, since guest has changed
>> RTE of IO-APIC pin2.
>> 
>
>In theory it could happen. There is no lock to prevent vIOAPIC RTE
>being modified between above two operations. But I'm not sure
>whether it's the reason for this bug. My feeling is that vector for IRQ0
>should be fixed one. Did you capture such vector change in your side?

I agree. I don't see guest changes the vector for IRQ0 in hundreds of tests
(simply create and shutdown guest).

Thanks
chao
>
>Thanks
>Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-07 10:04   ` Wei Liu
@ 2017-02-08  7:49     ` Tian, Kevin
  2017-02-10  9:34       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2017-02-08  7:49 UTC (permalink / raw)
  To: Wei Liu, Jan Beulich
  Cc: Lan, Tianyu, xuquan8, Andrew Cooper, Ian Jackson, Nakajima, Jun,
	xen-devel, Gao, Chao

> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: Tuesday, February 07, 2017 6:05 PM
> 
> On Tue, Feb 07, 2017 at 02:51:54AM -0700, Jan Beulich wrote:
> > >>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> > > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> > > added a assertion that intack.vector is the highest priority vector. But
> > > according to the osstest, the assertion failed sometimes. More discussion can
> > > be found in the thread
> > > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> > >
> > > The assertion failure is hard to reproduce. In order to root cause issue, this
> > > patch is to add logs to dump PIR and vIRR when failure takes place. It should
> > > be reverted once the root cause is found.
> > >
> > > Signed-off-by: Chao Gao <chao.gao@intel.com>
> >
> > If this wasn't just a temporary thing, I'd have some comments, but
> > it's good enough (perhaps with a little massaging when committing)
> > for this purpose.
> >
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >
> > Ian, Wei, is there a way to automatically scan osstest logs for
> > occurrences of the extra output, or does someone need to look
> > through the logs of every spurious failure?
> >
> 
> I don't think there is automatically scanning in place.

Curious how this issue was initially caught? Would same practice make
sure next fail catching our eye?

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-08  0:12             ` Chao Gao
@ 2017-02-08  8:06               ` Tian, Kevin
  2017-02-08  1:30                 ` Chao Gao
  0 siblings, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2017-02-08  8:06 UTC (permalink / raw)
  To: Gao, Chao, Jan Beulich
  Cc: Lan, Tianyu, Andrew Cooper, Xuquan (Quan Xu), Nakajima, Jun, xen-devel

> From: Gao, Chao
> Sent: Wednesday, February 08, 2017 8:12 AM
> 
> On Tue, Feb 07, 2017 at 09:18:56AM -0700, Jan Beulich wrote:
> >>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote:
> >> On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote:
> >>>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
> >>>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
> >>>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
> >>>>>> Some comment from QEMU/KVM code, in
> <linux-kernel>/arch/x86/kvm/lapic.c,
> >>>>>>
> >>>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
> >>>>>> {
> >>>>>> 	/* This may race with setting of irr in __apic_accept_irq() and
> >>>>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
> >>>>>> 	 * will cause vmexit immediately and the value will be recalculated
> >>>>>> 	 * on the next vmentry.
> >>>>>> 	 */
> >>>>>> ...
> >>>>>> }
> >>>>>>
> >>>>>> I am afraid, there may be a similar race with setting of vIRR..
> >>>>>
> >>>>>I think this is unrelated: If another interrupt came in, the highest
> >>>>>set bit in vIRR can only increase. Plus pt_update_irq(), before
> >>>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
> >>>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
> >>>>>highest priority interrupt should still have a vector >= pt_vector.
> >>>>
> >>>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
> >>>> when the assertion failed. In the process of debugging pt_update_irq()
> >>>> when I was working on another feature, I noticed that 0x30 is always
> >>>> the vector of IRQ2. I suspect that the source of the periodic timer
> >> interrupt
> >>>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
> >>>> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
> >>>> If guest changed the content in viopaic entry between the two read
> >>>> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
> >>>> Do you think it is a reasonable explanation?
> >>>
> >>>IRQ2? If this was going via the PIC, that would be impossible, as
> >>>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
> >>>IO-APIC (and we follow this model - see tools/libacpi/build.c). How
> >>
> >> Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC.
> >>
> >>>did you determine that 0x30 is the vector of IRQ2?
> >>
> >> By monitoring the write operation to IO-APIC redirection entry.
> >> From experimental data, linux kernel always writes vector 0x30 to
> >> redirection entry of pin2 of IO-APIC.
> >
> >But if it goes through the IO-APIC, it'll also go through the LAPIC
> >(and in the ExtInt case you wouldn't be able to read a valid vector
> >from the RTE).
> 
> Yes. If a pt interrupt goes through IO-APIC, pt_update_irq() will read
> the RTE twice. One is in hvm_isa_irq_assert() to determine which bit
> in vIRR should be set. The other is in pt_irq_vector() to get the bit
> we have set and return it. Is it possible that the return vector is
> different from the bit set in vIRR of LAPIC, since guest has changed
> RTE of IO-APIC pin2.
> 

In theory it could happen. There is no lock to prevent vIOAPIC RTE
being modified between above two operations. But I'm not sure
whether it's the reason for this bug. My feeling is that vector for IRQ0
should be fixed one. Did you capture such vector change in your side?

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-08  7:49     ` Tian, Kevin
@ 2017-02-10  9:34       ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2017-02-10  9:34 UTC (permalink / raw)
  To: Kevin Tian
  Cc: Tianyu Lan, xuquan8, Wei Liu, Andrew Cooper, Ian Jackson,
	Jun Nakajima, xen-devel, Chao Gao

>>> On 08.02.17 at 08:49, <kevin.tian@intel.com> wrote:
> Curious how this issue was initially caught? Would same practice make
> sure next fail catching our eye?

I guess Andrew just happened to look at the logs of a spurious
failure of some test.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-06 23:32 [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT() Chao Gao
  2017-02-07  6:48 ` Xuquan (Quan Xu)
  2017-02-07  9:51 ` Jan Beulich
@ 2017-02-10  9:36 ` Jan Beulich
  2017-02-13  2:48   ` Tian, Kevin
  2017-02-14 10:51 ` Jan Beulich
  3 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2017-02-10  9:36 UTC (permalink / raw)
  To: Chao Gao, Jun Nakajima, Kevin Tian
  Cc: tianyu.lan, Andrew Cooper, xuquan8, xen-devel

>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> added a assertion that intack.vector is the highest priority vector. But
> according to the osstest, the assertion failed sometimes. More discussion can
> be found in the thread
> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> 
> The assertion failure is hard to reproduce. In order to root cause issue, this
> patch is to add logs to dump PIR and vIRR when failure takes place. It should
> be reverted once the root cause is found.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>

Jun, Kevin - can you ack this? Or was Chao expected to make any
changes? I'd like to see this go in rather sooner than later, so we
can get it back out well before 4.9 is going to settle.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-10  9:36 ` Jan Beulich
@ 2017-02-13  2:48   ` Tian, Kevin
  0 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2017-02-13  2:48 UTC (permalink / raw)
  To: Jan Beulich, Gao, Chao, Nakajima, Jun
  Cc: Lan, Tianyu, Andrew Cooper, xuquan8, xen-devel

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, February 10, 2017 5:36 PM
> 
> >>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> > added a assertion that intack.vector is the highest priority vector. But
> > according to the osstest, the assertion failed sometimes. More discussion can
> > be found in the thread
> > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> >
> > The assertion failure is hard to reproduce. In order to root cause issue, this
> > patch is to add logs to dump PIR and vIRR when failure takes place. It should
> > be reverted once the root cause is found.
> >
> > Signed-off-by: Chao Gao <chao.gao@intel.com>
> 
> Jun, Kevin - can you ack this? Or was Chao expected to make any
> changes? I'd like to see this go in rather sooner than later, so we
> can get it back out well before 4.9 is going to settle.
> 

My bad... thought I've acked but seems distracted by later discussions.

Acked-by: Kevin Tian <kevin.tian@intel.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-06 23:32 [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT() Chao Gao
                   ` (2 preceding siblings ...)
  2017-02-10  9:36 ` Jan Beulich
@ 2017-02-14 10:51 ` Jan Beulich
  2017-02-14 12:12   ` Julien Grall
  2017-05-04  9:47   ` Julien Grall
  3 siblings, 2 replies; 21+ messages in thread
From: Jan Beulich @ 2017-02-14 10:51 UTC (permalink / raw)
  To: Julien Grall
  Cc: tianyu.lan, xuquan8, Andrew Cooper, Kevin Tian, Jun Nakajima,
	xen-devel, Chao Gao

>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> added a assertion that intack.vector is the highest priority vector. But
> according to the osstest, the assertion failed sometimes. More discussion can
> be found in the thread
> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> 
> The assertion failure is hard to reproduce. In order to root cause issue, this
> patch is to add logs to dump PIR and vIRR when failure takes place. It should
> be reverted once the root cause is found.

Julien, could you add a note on the 4.9 tracking list for us to not
forget to revert this (or at least add NDEBUG conditionals) the
latest in the RC phase?

Thanks, Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-14 10:51 ` Jan Beulich
@ 2017-02-14 12:12   ` Julien Grall
  2017-02-14 14:37     ` Julien Grall
  2017-05-04  9:47   ` Julien Grall
  1 sibling, 1 reply; 21+ messages in thread
From: Julien Grall @ 2017-02-14 12:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tianyu.lan, xuquan8, Andrew Cooper, Kevin Tian, Jun Nakajima,
	xen-devel, Chao Gao

Hi Jan,

On 02/14/2017 10:51 AM, Jan Beulich wrote:
>>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
>> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
>> added a assertion that intack.vector is the highest priority vector. But
>> according to the osstest, the assertion failed sometimes. More discussion can
>> be found in the thread
>> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
>>
>> The assertion failure is hard to reproduce. In order to root cause issue, this
>> patch is to add logs to dump PIR and vIRR when failure takes place. It should
>> be reverted once the root cause is found.
>
> Julien, could you add a note on the 4.9 tracking list for us to not
> forget to revert this (or at least add NDEBUG conditionals) the
> latest in the RC phase?

I have added a note in the 4.9 tracking list.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-14 12:12   ` Julien Grall
@ 2017-02-14 14:37     ` Julien Grall
  0 siblings, 0 replies; 21+ messages in thread
From: Julien Grall @ 2017-02-14 14:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tianyu.lan, xuquan8, Andrew Cooper, Kevin Tian, Jun Nakajima,
	xen-devel, nd, Chao Gao

(Resending the e-mail due to SMTP issue)

On 02/14/2017 12:12 PM, Julien Grall wrote:
> Hi Jan,
>
> On 02/14/2017 10:51 AM, Jan Beulich wrote:
>>>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
>>> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv
>>> issue") has
>>> added a assertion that intack.vector is the highest priority vector. But
>>> according to the osstest, the assertion failed sometimes. More
>>> discussion can
>>> be found in the thread
>>> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
>>>
>>>
>>> The assertion failure is hard to reproduce. In order to root cause
>>> issue, this
>>> patch is to add logs to dump PIR and vIRR when failure takes place.
>>> It should
>>> be reverted once the root cause is found.
>>
>> Julien, could you add a note on the 4.9 tracking list for us to not
>> forget to revert this (or at least add NDEBUG conditionals) the
>> latest in the RC phase?
>
> I have added a note in the 4.9 tracking list.
>
> Cheers,
>

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-02-14 10:51 ` Jan Beulich
  2017-02-14 12:12   ` Julien Grall
@ 2017-05-04  9:47   ` Julien Grall
  2017-05-04 10:20     ` Jan Beulich
  1 sibling, 1 reply; 21+ messages in thread
From: Julien Grall @ 2017-05-04  9:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tianyu.lan, xuquan8, Andrew Cooper, Kevin Tian, Jun Nakajima,
	xen-devel, Chao Gao

Hi,

On 14/02/17 10:51, Jan Beulich wrote:
>>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
>> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
>> added a assertion that intack.vector is the highest priority vector. But
>> according to the osstest, the assertion failed sometimes. More discussion can
>> be found in the thread
>> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
>>
>> The assertion failure is hard to reproduce. In order to root cause issue, this
>> patch is to add logs to dump PIR and vIRR when failure takes place. It should
>> be reverted once the root cause is found.
>
> Julien, could you add a note on the 4.9 tracking list for us to not
> forget to revert this (or at least add NDEBUG conditionals) the
> latest in the RC phase?

Bumping this discussion as we are in the RC phase. Was the root cause 
found and we can revert the patch? Or shall we surround the code with 
NDEBUG?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT()
  2017-05-04  9:47   ` Julien Grall
@ 2017-05-04 10:20     ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2017-05-04 10:20 UTC (permalink / raw)
  To: Julien Grall
  Cc: tianyu.lan, xuquan8, Andrew Cooper, Kevin Tian, Jun Nakajima,
	xen-devel, Chao Gao

>>> On 04.05.17 at 11:47, <julien.grall@arm.com> wrote:
> On 14/02/17 10:51, Jan Beulich wrote:
>>>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
>>> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
>>> added a assertion that intack.vector is the highest priority vector. But
>>> according to the osstest, the assertion failed sometimes. More discussion can
>>> be found in the thread
>>> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
>>>
>>> The assertion failure is hard to reproduce. In order to root cause issue, this
>>> patch is to add logs to dump PIR and vIRR when failure takes place. It should
>>> be reverted once the root cause is found.
>>
>> Julien, could you add a note on the 4.9 tracking list for us to not
>> forget to revert this (or at least add NDEBUG conditionals) the
>> latest in the RC phase?
> 
> Bumping this discussion as we are in the RC phase. Was the root cause 
> found and we can revert the patch?

Sadly not, so far.

> Or shall we surround the code with NDEBUG?

Yes, I'll submit a patch.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-05-04 10:20 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-06 23:32 [PATCH] X86/vmx: Dump PIR and vIRR before ASSERT() Chao Gao
2017-02-07  6:48 ` Xuquan (Quan Xu)
2017-02-07 10:04   ` Jan Beulich
2017-02-07  6:32     ` Chao Gao
2017-02-07 13:46       ` Jan Beulich
2017-02-07  7:28         ` Chao Gao
2017-02-07 16:18           ` Jan Beulich
2017-02-08  0:12             ` Chao Gao
2017-02-08  8:06               ` Tian, Kevin
2017-02-08  1:30                 ` Chao Gao
2017-02-07  9:51 ` Jan Beulich
2017-02-07 10:04   ` Wei Liu
2017-02-08  7:49     ` Tian, Kevin
2017-02-10  9:34       ` Jan Beulich
2017-02-10  9:36 ` Jan Beulich
2017-02-13  2:48   ` Tian, Kevin
2017-02-14 10:51 ` Jan Beulich
2017-02-14 12:12   ` Julien Grall
2017-02-14 14:37     ` Julien Grall
2017-05-04  9:47   ` Julien Grall
2017-05-04 10:20     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.