* HPET stack overflow, and general problems with do_IRQ()
@ 2013-08-15 20:21 Andrew Cooper
From: Andrew Cooper @ 2013-08-15 20:21 UTC (permalink / raw)
  To: Jan Beulich, Tim Deegan, Keir Fraser; +Cc: Xen-devel List

Hello,

I have finally managed to get a full stack dump from affected hardware.

The logs can be found here (including hypervisor with debugging symbols):

http://xenbits.xen.org/people/andrewcoop/hpet-overflow-full-stackdump.tar.gz

The interesting log file is xen.pcpu0.stack.log

By my count (grepping for e008 as CS), there are 8 exception frames
on the Xen stack (all stack page 6)

However, because of the early ack() at the LAPIC, and disabling of
interrupts, the vectors (in order of interrupts arriving) are

c1, 99, b1, b9, a9, a1, 91, 89

These 8 interrupts take a little more than half the available stack,
while the bottom half of the stack seems to be a vmentry which failed
because of a hap pagefault.

One "solution" to the problem would be to extend the Xen
PRIMARY_STACK_SIZE to 3 pages rather than 2, but that is hardly a good
thing to do.


I think that the fundamental problem is the early ack and re-enabling of
interrupts.  We have servers where 150 VMs using PCI passthrough are
starting to run out of available entries in the IDTs.  While unlikely,
it would be possible to encounter a situation with 40 nested interrupts,
at which point there is a real danger of trashing the compat sysenter
trampoline, located at the base of stack page 3.  With just a few more,
MCEs and NMIs would end up walking over the main stack.
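
As a rough check of the numbers above, the following Python sketch
estimates how far down the stack a run of nested interrupts would reach.
It assumes 4 KiB pages, an 8-page per-CPU stack (pages 0 to 7) with the
2-page primary stack at the top, an empty stack when the first interrupt
arrives, and about 512 bytes per nesting level (a lower bound taken from
8 interrupts filling a little more than half of the 2-page primary
stack).  The names and the layout are assumptions for illustration, not
taken from the Xen source.

```python
PAGE_SIZE = 4096           # assumed x86 page size in bytes
STACK_PAGES = 8            # assumed per-CPU stack size, pages 0..7
PRIMARY_STACK_PAGES = 2    # primary stack size mentioned in this mail

# 8 nested interrupts filled roughly half of the primary stack,
# so each nesting level costs at least this many bytes.
BYTES_PER_LEVEL = (PRIMARY_STACK_PAGES * PAGE_SIZE // 2) // 8


def lowest_page_touched(nested: int) -> int:
    """Index of the lowest stack page reached by `nested` interrupts,
    starting from an empty stack that grows down from the top of page 7."""
    top = STACK_PAGES * PAGE_SIZE
    lowest_byte = top - nested * BYTES_PER_LEVEL
    return lowest_byte // PAGE_SIZE


if __name__ == "__main__":
    for n in (8, 16, 40, 41):
        print(n, "nested interrupts reach page", lowest_page_touched(n))
```

Under these assumptions 40 levels consume exactly five pages, which
lands on the base of page 3.  The real figure depends on how much stack
is already in use when the first interrupt arrives and on the actual
per-level cost.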

While I hate to suggest this, the only sensible solution without edge
cases is to never enable interrupts in do_IRQ().  I suppose a slightly
less extreme solution could be to promote the TPR to 0xe0 and re-enable
interrupts, so high-priority processing can still occur?
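
To illustrate the TPR idea: the local APIC only delivers a fixed
interrupt whose priority class (the top four bits of the vector) is
strictly greater than the current priority threshold.  The Python sketch
below is a simplified model which assumes nothing is in service, so the
TPR alone sets that threshold.  `deliverable` is a hypothetical helper,
not a Xen function.

```python
def priority_class(vector: int) -> int:
    """Priority class of an 8-bit vector or TPR value: bits 7:4."""
    return (vector >> 4) & 0xF


def deliverable(vector: int, tpr: int) -> bool:
    """True if `vector` would be delivered with the given TPR,
    assuming no interrupt is currently in service."""
    return priority_class(vector) > priority_class(tpr)


# Vectors seen on the stack, in order of arrival.
OBSERVED_VECTORS = [0xC1, 0x99, 0xB1, 0xB9, 0xA9, 0xA1, 0x91, 0x89]

if __name__ == "__main__":
    blocked = [hex(v) for v in OBSERVED_VECTORS if not deliverable(v, 0xE0)]
    print("held back with TPR=0xe0:", blocked)
    print("0xf0 still delivered:", deliverable(0xF0, 0xE0))
```

In this model a TPR of 0xe0 holds back every vector below 0xf0, which
includes all eight vectors listed earlier, while vectors 0xf0 to 0xff
can still interrupt.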

Thoughts/comments?

Unfortunately, I am out of the office now until Monday 26th, with
limited internet access during that time (although I will still have
internet access tomorrow morning).  I will check emails when I can, but I
don't expect to be able to make timely contributions to the above
discussion during this time.

~Andrew


* Re: HPET stack overflow, and general problems with do_IRQ()
@ 2013-08-16  7:53 ` Jan Beulich
From: Jan Beulich @ 2013-08-16  7:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Keir Fraser, Xen-devel List

>>> On 15.08.13 at 22:21, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Hello,
> 
> I have finally managed to get a full stack dump from affected hardware.
> 
> The logs can be found here (including hypervisor with debugging symbols):
> 
> http://xenbits.xen.org/people/andrewcoop/hpet-overflow-full-stackdump.tar.gz 
> 
> The interesting log file is xen.pcpu0.stack.log
> 
> By my count (grepping for e008 as CS), there are 8 exception frames
> on the Xen stack (all stack page 6)
> 
> However, because of the early ack() at the LAPIC, and disabling of
> interrupts, the vectors (in order of interrupts arriving) are
> 
> c1, 99, b1, b9, a9, a1, 91, 89

So these are all HPET interrupts as it seems to me. You said the
box just has 8 of them, so the fundamental problem is not the
general handling of interrupts that you talk about below, but the
fact that _all_ these channels are bound to CPU0: That's an
insane side effect of the way channel management works when
there are (potentially) more CPUs than channels. So _I_ think
this is what needs fixing.

That's even more so because the above sequence would be impossible
for guest interrupts (which don't get EOI-ed immediately, and
interrupts don't get re-enabled on that path either). Hence in the
discussion here we only need to be concerned with interrupts that
Xen uses for itself: timer, console, IOMMU, and HPET. Out of these,
timer and console - going through the IO-APIC - are safe from this
because of how io_apic.c implements the ->ack()/->end() pairs.
Both IOMMU implementations ack their IRQs in the LAPIC only in
->end(). And that's what I suggested switching HPET to as well. And
contrary to what I said about this earlier, disabling interrupts in
the ->end() handler isn't even necessary, as it already gets called
with them disabled.
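
The difference between acking early and acking in ->end() can be
sketched with a toy model.  The Python below is only an illustration
under stated assumptions: each interrupt arrives while all earlier
handlers are still running with interrupts enabled (the worst case),
and a vector that has not been EOI-ed blocks every vector of the same
or a lower priority class.  `max_nesting` is a hypothetical helper and
does not mirror any Xen code.

```python
def priority_class(vector: int) -> int:
    """Priority class of an 8-bit vector: bits 7:4."""
    return (vector >> 4) & 0xF


def max_nesting(arrivals, early_eoi: bool) -> int:
    """Worst-case handler nesting depth for vectors arriving in order."""
    on_stack = []       # handlers currently occupying stack space
    not_eoied = []      # vectors still marked in service at the LAPIC
    depth = 0
    for vector in arrivals:
        if any(priority_class(vector) <= priority_class(v) for v in not_eoied):
            continue    # stays pending until the blocking vector is EOI-ed
        on_stack.append(vector)
        if not early_eoi:
            not_eoied.append(vector)    # EOI deferred to ->end()
        depth = max(depth, len(on_stack))
    return depth


OBSERVED = [0xC1, 0x99, 0xB1, 0xB9, 0xA9, 0xA1, 0x91, 0x89]

if __name__ == "__main__":
    print("early EOI:", max_nesting(OBSERVED, early_eoi=True))
    print("EOI in ->end():", max_nesting(OBSERVED, early_eoi=False))
    print("EOI in ->end(), ascending arrival:",
          max_nesting(sorted(OBSERVED), early_eoi=False))
```

With an early EOI all eight vectors nest in this model.  With the EOI
deferred, the observed arrival order cannot nest at all, and even the
least favourable order is limited by the number of distinct priority
classes among the vectors.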

So we have two possible fixes to the HPET, either of which is
very likely to deal with the problem on its own.

Jan


* Re: HPET stack overflow, and general problems with do_IRQ()
@ 2013-08-16 15:34   ` Keir Fraser
From: Keir Fraser @ 2013-08-16 15:34 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: Tim Deegan, Xen-devel List

On 16/08/2013 08:53, "Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 15.08.13 at 22:21, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> Hello,
>> 
>> I have finally managed to get a full stack dump from affected hardware.
>> 
>> The logs can be found here (including hypervisor with debugging symbols):
>> 
>> http://xenbits.xen.org/people/andrewcoop/hpet-overflow-full-stackdump.tar.gz
>> 
>> The interesting log file is xen.pcpu0.stack.log
>> 
>> By my count (grepping for e008 as CS), there are 8 exception frames
>> on the Xen stack (all stack page 6)
>> 
>> However, because of the early ack() at the LAPIC, and disabling of
>> interrupts, the vectors (in order of interrupts arriving) are
>> 
>> c1, 99, b1, b9, a9, a1, 91, 89
> 
> So these are all HPET interrupts as it seems to me. You said the
> box just has 8 of them, so the fundamental problem is not the
> general handling of interrupts that you talk about below, but the
> fact that _all_ these channels are bound to CPU0: That's an
> insane side effect of the way channel management works when
> there are (potentially) more CPUs than channels. So _I_ think
> this is what needs fixing.
> 
> That's even more so because the above sequence would be impossible
> for guest interrupts (which don't get EOI-ed immediately, and
> interrupts don't get re-enabled on that path either). Hence in the
> discussion here we only need to be concerned with interrupts that
> Xen uses for itself: timer, console, IOMMU, and HPET. Out of these,
> timer and console - going through the IO-APIC - are safe from this
> because of how io_apic.c implements the ->ack()/->end() pairs.
> Both IOMMU implementations ack their IRQs in the LAPIC only in
> ->end(). And that's what I suggested switching HPET to as well. And
> contrary to what I said about this earlier, disabling interrupts in
> the ->end() handler isn't even necessary, as it already gets called
> with them disabled.
> 
> So we have two possible fixes to the HPET, either of which is
> very likely to deal with the problem on its own.

Additionally, with per-vcpu stacks we could have a larger per-cpu irq stack.
It would be easier to grow that without 'wasting' memory. Although I think
Jan's arguments above do make sense.

 -- Keir

> Jan
> 
