xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* guest being crashed from viridian_start_apic_assist()
@ 2016-06-03  9:04 Jan Beulich
  2016-06-03  9:39 ` Paul Durrant
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-06-03  9:04 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

Paul,

to test a guest ACPI change I went through all Windows versions that
I have installed anywhere, to find 32-bit Win7 dying (the other
variants I tried worked fine), albeit retrying a couple of times I then
didn't see this a second time. Is this something you've observed
elsewhere? The APIC assist logic isn't that complicated, and I can't
really spot any race in there...

Thanks, Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: guest being crashed from viridian_start_apic_assist()
  2016-06-03  9:04 guest being crashed from viridian_start_apic_assist() Jan Beulich
@ 2016-06-03  9:39 ` Paul Durrant
  2016-06-03  9:59   ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Durrant @ 2016-06-03  9:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 03 June 2016 10:04
> To: Paul Durrant
> Cc: xen-devel
> Subject: guest being crashed from viridian_start_apic_assist()
> 
> Paul,
> 
> to test a guest ACPI change I went through all Windows versions that
> I have installed anywhere, to find 32-bit Win7 dying (the other
> variants I tried worked fine), albeit retrying a couple of times I then
> didn't see this a second time. Is this something you've observed
> elsewhere? The APIC assist logic isn't that complicated, and I can't
> really spot any race in there...
> 

Hi Jan,

How did it die?

I've not seen any problems in dev testing and I usually use win7 32-bit. APIC assist has not yet been enabled by default in XenServer builds though so it has not had automated test exposure yet... I should really get round to turning it on now that 7.0 has shipped.

  Paul

> Thanks, Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: guest being crashed from viridian_start_apic_assist()
  2016-06-03  9:39 ` Paul Durrant
@ 2016-06-03  9:59   ` Jan Beulich
  2016-06-03 12:11     ` Paul Durrant
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-06-03  9:59 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 03.06.16 at 11:39, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 03 June 2016 10:04
>> 
>> to test a guest ACPI change I went through all Windows versions that
>> I have installed anywhere, to find 32-bit Win7 dying (the other
>> variants I tried worked fine), albeit retrying a couple of times I then
>> didn't see this a second time. Is this something you've observed
>> elsewhere? The APIC assist logic isn't that complicated, and I can't
>> really spot any race in there...
> 
> How did it die?

From the single domain_crash() in that function. No other information
was available (apart from the register dump, which doesn't look like
it would be very useful for analysis here).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: guest being crashed from viridian_start_apic_assist()
  2016-06-03  9:59   ` Jan Beulich
@ 2016-06-03 12:11     ` Paul Durrant
  2016-06-03 13:05       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Durrant @ 2016-06-03 12:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 03 June 2016 10:59
> To: Paul Durrant
> Cc: xen-devel
> Subject: RE: guest being crashed from viridian_start_apic_assist()
> 
> >>> On 03.06.16 at 11:39, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 03 June 2016 10:04
> >>
> >> to test a guest ACPI change I went through all Windows versions that
> >> I have installed anywhere, to find 32-bit Win7 dying (the other
> >> variants I tried worked fine), albeit retrying a couple of times I then
> >> didn't see this a second time. Is this something you've observed
> >> elsewhere? The APIC assist logic isn't that complicated, and I can't
> >> really spot any race in there...
> >
> > How did it die?
> 
> From the single domain_crash() in that function. No other information
> was available (apart from the register dump, which doesn't look like
> it would be very useful for analysis here).
> 

There is an inconsistency between the test to abort a previous assist, which is based on:

    irr = vlapic_find_highest_irr(vlapic);

    isr = vlapic_find_highest_isr(vlapic);
    if ( (isr & 0xf0) >= (irr & 0xf0) )

and the test to start a new one:

    isr = vlapic_find_lowest_vector(&vlapic->regs->data[APIC_ISR]);
    if ( isr >= 0 && isr < vector )

I suspect that may be the problem you hit.

  Paul

> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: guest being crashed from viridian_start_apic_assist()
  2016-06-03 12:11     ` Paul Durrant
@ 2016-06-03 13:05       ` Jan Beulich
  2016-06-03 13:31         ` Paul Durrant
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-06-03 13:05 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 03.06.16 at 14:11, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 03 June 2016 10:59
>> To: Paul Durrant
>> Cc: xen-devel
>> Subject: RE: guest being crashed from viridian_start_apic_assist()
>> 
>> >>> On 03.06.16 at 11:39, <Paul.Durrant@citrix.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> Sent: 03 June 2016 10:04
>> >>
>> >> to test a guest ACPI change I went through all Windows versions that
>> >> I have installed anywhere, to find 32-bit Win7 dying (the other
>> >> variants I tried worked fine), albeit retrying a couple of times I then
>> >> didn't see this a second time. Is this something you've observed
>> >> elsewhere? The APIC assist logic isn't that complicated, and I can't
>> >> really spot any race in there...
>> >
>> > How did it die?
>> 
>> From the single domain_crash() in that function. No other information
>> was available (apart from the register dump, which doesn't look like
>> it would be very useful for analysis here).
>> 
> 
> There is an inconsistency between the test to abort a previous assist, which 
> is based on:
> 
>     irr = vlapic_find_highest_irr(vlapic);
> 
>     isr = vlapic_find_highest_isr(vlapic);
>     if ( (isr & 0xf0) >= (irr & 0xf0) )
> 
> and the test to start a new one:
> 
>     isr = vlapic_find_lowest_vector(&vlapic->regs->data[APIC_ISR]);
>     if ( isr >= 0 && isr < vector )
> 
> I suspect that may be the problem you hit.

I don't think so. Of the second if(), the right side could actually be
dropped, as the logic in vlapic_has_pending_irq() guarantees
(vector & 0xf0) > (highest_isr & 0xf0), i.e. vector > highest_isr
and hence necessarily vector > lowest_isr. That is, if
vlapic_has_pending_irq() finds _some_ ISR bit set, it must be one
below vector. Which means viridian_start_apic_assist() gets called
solely when there's _no_ other ISR bit set.

What I find more problematic looking at those functions (but
unrelated to the problem here afaict) is the
vlapic_virtual_intr_delivery_enabled() related logic and its
interaction with the Viridian APIC assist (leaving aside nested
mode for the moment): vlapic_has_pending_irq() won't do
anything with the APIC assist in that case, but if force_ack is
true in vlapic_ack_pending_irq() there will be an interaction.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: guest being crashed from viridian_start_apic_assist()
  2016-06-03 13:05       ` Jan Beulich
@ 2016-06-03 13:31         ` Paul Durrant
  2016-06-03 14:37           ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Durrant @ 2016-06-03 13:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 03 June 2016 14:06
> To: Paul Durrant
> Cc: xen-devel
> Subject: RE: guest being crashed from viridian_start_apic_assist()
> 
> >>> On 03.06.16 at 14:11, <Paul.Durrant@citrix.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 03 June 2016 10:59
> >> To: Paul Durrant
> >> Cc: xen-devel
> >> Subject: RE: guest being crashed from viridian_start_apic_assist()
> >>
> >> >>> On 03.06.16 at 11:39, <Paul.Durrant@citrix.com> wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> Sent: 03 June 2016 10:04
> >> >>
> >> >> to test a guest ACPI change I went through all Windows versions that
> >> >> I have installed anywhere, to find 32-bit Win7 dying (the other
> >> >> variants I tried worked fine), albeit retrying a couple of times I then
> >> >> didn't see this a second time. Is this something you've observed
> >> >> elsewhere? The APIC assist logic isn't that complicated, and I can't
> >> >> really spot any race in there...
> >> >
> >> > How did it die?
> >>
> >> From the single domain_crash() in that function. No other information
> >> was available (apart from the register dump, which doesn't look like
> >> it would be very useful for analysis here).
> >>
> >
> > There is an inconsistency between the test to abort a previous assist, which
> > is based on:
> >
> >     irr = vlapic_find_highest_irr(vlapic);
> >
> >     isr = vlapic_find_highest_isr(vlapic);
> >     if ( (isr & 0xf0) >= (irr & 0xf0) )
> >
> > and the test to start a new one:
> >
> >     isr = vlapic_find_lowest_vector(&vlapic->regs->data[APIC_ISR]);
> >     if ( isr >= 0 && isr < vector )
> >
> > I suspect that may be the problem you hit.
> 
> I don't think so. Of the second if(), the right side could actually be
> dropped, as the logic in vlapic_has_pending_irq() guarantees
> (vector & 0xf0) > (highest_isr & 0xf0), i.e. vector > highest_isr
> and hence necessarily vector > lowest_isr. That is, if
> vlapic_has_pending_irq() finds _some_ ISR bit set, it must be one
> below vector. Which means viridian_start_apic_assist() gets called
> solely when there's _no_ other ISR bit set.
> 

Yes, that test does appear to be unnecessary.

> What I find more problematic looking at those functions (but
> unrelated to the problem here afaict) is the
> vlapic_virtual_intr_delivery_enabled() related logic and its
> interaction with the Viridian APIC assist (leaving aside nested
> mode for the moment): vlapic_has_pending_irq() won't do
> anything with the APIC assist in that case, but if force_ack is
> true in vlapic_ack_pending_irq() there will be an interaction.
> 

...and the interaction would indeed be the crash you saw, I think, because you could start an APIC assist when vlapic_virtual_intr_delivery_enabled() is true, but never complete it because vlapic_has_pending_irq() bails before doing so. Thus, attempting the next APIC assist would lead to a domain_crash().

  Paul

> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: guest being crashed from viridian_start_apic_assist()
  2016-06-03 13:31         ` Paul Durrant
@ 2016-06-03 14:37           ` Jan Beulich
  2016-06-03 14:44             ` Paul Durrant
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-06-03 14:37 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 03.06.16 at 15:31, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 03 June 2016 14:06
>> 
>> What I find more problematic looking at those functions (but
>> unrelated to the problem here afaict) is the
>> vlapic_virtual_intr_delivery_enabled() related logic and its
>> interaction with the Viridian APIC assist (leaving aside nested
>> mode for the moment): vlapic_has_pending_irq() won't do
>> anything with the APIC assist in that case, but if force_ack is
>> true in vlapic_ack_pending_irq() there will be an interaction.
> 
> ...and the interaction would indeed be the crash you saw, I think, because 
> you could start an APIC assist when vlapic_virtual_intr_delivery_enabled() is 
> true, but never complete it because vlapic_has_pending_irq() bails before 
> doing so. Thus, attempting the next APIC assist would lead to a 
> domain_crash().

With one caveat - this was on a Nehalem, which doesn't have that
capability. Hence me saying "unrelated"...

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: guest being crashed from viridian_start_apic_assist()
  2016-06-03 14:37           ` Jan Beulich
@ 2016-06-03 14:44             ` Paul Durrant
  2016-06-03 15:01               ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Durrant @ 2016-06-03 14:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 03 June 2016 15:38
> To: Paul Durrant
> Cc: xen-devel
> Subject: RE: guest being crashed from viridian_start_apic_assist()
> 
> >>> On 03.06.16 at 15:31, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 03 June 2016 14:06
> >>
> >> What I find more problematic looking at those functions (but
> >> unrelated to the problem here afaict) is the
> >> vlapic_virtual_intr_delivery_enabled() related logic and its
> >> interaction with the Viridian APIC assist (leaving aside nested
> >> mode for the moment): vlapic_has_pending_irq() won't do
> >> anything with the APIC assist in that case, but if force_ack is
> >> true in vlapic_ack_pending_irq() there will be an interaction.
> >
> > ...and the interaction would indeed be the crash you saw, I think, because
> > you could start an APIC assist when vlapic_virtual_intr_delivery_enabled()
> is
> > true, but never complete it because vlapic_has_pending_irq() bails before
> > doing so. Thus, attempting the next APIC assist would lead to a
> > domain_crash().
> 
> With one caveat - this was on a Nehalem, which doesn't have that
> capability. Hence me saying "unrelated"...
> 

Ah, I see.

  Paul

> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: guest being crashed from viridian_start_apic_assist()
  2016-06-03 14:44             ` Paul Durrant
@ 2016-06-03 15:01               ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2016-06-03 15:01 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 03.06.16 at 16:44, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 03 June 2016 15:38
>> To: Paul Durrant
>> Cc: xen-devel
>> Subject: RE: guest being crashed from viridian_start_apic_assist()
>> 
>> >>> On 03.06.16 at 15:31, <Paul.Durrant@citrix.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> Sent: 03 June 2016 14:06
>> >>
>> >> What I find more problematic looking at those functions (but
>> >> unrelated to the problem here afaict) is the
>> >> vlapic_virtual_intr_delivery_enabled() related logic and its
>> >> interaction with the Viridian APIC assist (leaving aside nested
>> >> mode for the moment): vlapic_has_pending_irq() won't do
>> >> anything with the APIC assist in that case, but if force_ack is
>> >> true in vlapic_ack_pending_irq() there will be an interaction.
>> >
>> > ...and the interaction would indeed be the crash you saw, I think, because
>> > you could start an APIC assist when vlapic_virtual_intr_delivery_enabled()
>> is
>> > true, but never complete it because vlapic_has_pending_irq() bails before
>> > doing so. Thus, attempting the next APIC assist would lead to a
>> > domain_crash().
>> 
>> With one caveat - this was on a Nehalem, which doesn't have that
>> capability. Hence me saying "unrelated"...
> 
> Ah, I see.

And btw., that force_ack flag also gets set to true only by vVMX
code (i.e. I wouldn't be surprised at all if something was broken
there).

The bad thing then is - it looks like we still have no real explanation
for that one time guest crash.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-06-03 15:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-03  9:04 guest being crashed from viridian_start_apic_assist() Jan Beulich
2016-06-03  9:39 ` Paul Durrant
2016-06-03  9:59   ` Jan Beulich
2016-06-03 12:11     ` Paul Durrant
2016-06-03 13:05       ` Jan Beulich
2016-06-03 13:31         ` Paul Durrant
2016-06-03 14:37           ` Jan Beulich
2016-06-03 14:44             ` Paul Durrant
2016-06-03 15:01               ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).