* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
[not found] <4E7B4768.8060103@canonical.com>
@ 2011-09-22 17:44 ` Stefano Stabellini
2011-09-30 9:13 ` Stefan Bader
0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-09-22 17:44 UTC (permalink / raw)
To: Stefan Bader; +Cc: xen-devel, Stefano Stabellini
On Thu, 22 Sep 2011, Stefan Bader wrote:
> On 22.09.2011 13:58, Stefan Bader wrote:
> > On 22.09.2011 12:30, Stefano Stabellini wrote:
> >> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>> On 21.09.2011 15:31, Stefano Stabellini wrote:
> >>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
> >>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
> >>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
> >>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
> >>>>> there are messages about tx timeouts.
> >>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
> >>>>> correctly.
> >>>>>
> >>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
> >>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
> >>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
> >>>>> at least not level.
> >>>>
> >>>
> >>>> Does the e1000 emulated card work correctly?
> >>>
> >>> Yes, that one seems to work ok.
> >>>
> >>>> What happens if you disable interrupt remapping (see patch below)?
> >>>
> >>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
> >>> still works. Both then using IOAPIC-fasteoi.
> >>>
> >>
> >> That means there must be another subtle bug in Xen in interrupt
> >> remapping that only affects 8139cp emulation
> >>
> > Right, or to be complete:
> > - e1000: ok
> > - 8139cp: unstable (setup is possible)
> > - ne2k_pci: not working (tx problems from the beginning)
> >
> > The behaviour feels a bit like interrupts may get lost if occurring at a higher
> > rate. Why this affects various drivers differently is a bit weird.
> >>
>
> This is mainly speculating... Quite a while back there was this patch to events:
>
> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date: Fri Aug 20 19:10:01 2010 -0700
>
> xen: handle events as edge-triggered
>
> The commit message stated that Xen events are logically edge triggered. So PV
> events were changed to be handled as edge interrupts. Would that not mean that
> for xen-pirq-apic being using events this would apply the same and those should
> be apic-edge instead of level?
That commit is referring to the internal way Linux handles these events,
which look like normal interrupts to the Linux irq subsystem. It is not
related to the way actual events are delivered from Xen to Linux, so it
shouldn't matter here.
I would add lots of printk's in:
xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
xen/arch/x86/hvm/irq.c:assert_irq
xen/arch/x86/hvm/irq.c:assert_gsi
to find out why Xen is not injecting those interrupts.
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-09-22 17:44 ` Re: Still struggling with HVM: tx timeouts on emulated nics Stefano Stabellini
@ 2011-09-30 9:13 ` Stefan Bader
2011-09-30 14:09 ` Stefano Stabellini
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-09-30 9:13 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel
On 22.09.2011 19:44, Stefano Stabellini wrote:
> On Thu, 22 Sep 2011, Stefan Bader wrote:
>> On 22.09.2011 13:58, Stefan Bader wrote:
>>> On 22.09.2011 12:30, Stefano Stabellini wrote:
>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>>> On 21.09.2011 15:31, Stefano Stabellini wrote:
>>>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
>>>>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
>>>>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
>>>>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
>>>>>>> there are messages about tx timeouts.
>>>>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
>>>>>>> correctly.
>>>>>>>
>>>>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
>>>>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
>>>>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
>>>>>>> at least not level.
>>>>>>
>>>>>
>>>>>> Does the e1000 emulated card work correctly?
>>>>>
>>>>> Yes, that one seems to work ok.
>>>>>
>>>>>> What happens if you disable interrupt remapping (see patch below)?
>>>>>
>>>>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
>>>>> still works. Both then using IOAPIC-fasteoi.
>>>>>
>>>>
>>>> That means there must be another subtle bug in Xen in interrupt
>>>> remapping that only affects 8139cp emulation
>>>>
>>> Right, or to be complete:
>>> - e1000: ok
>>> - 8139cp: unstable (setup is possible)
>>> - ne2k_pci: not working (tx problems from the beginning)
>>>
>>> The behaviour feels a bit like interrupts may get lost if occurring at a higher
>>> rate. Why this affects various drivers differently is a bit weird.
>>>>
>>
>> This is mainly speculating... Quite a while back there was this patch to events:
>>
>> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
>> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>> Date: Fri Aug 20 19:10:01 2010 -0700
>>
>> xen: handle events as edge-triggered
>>
>> The commit message stated that Xen events are logically edge triggered. So PV
>> events were changed to be handled as edge interrupts. Would that not mean that
>> for xen-pirq-apic being using events this would apply the same and those should
>> be apic-edge instead of level?
>
> That commit is referring to the internal way Linux handles these event,
> that look like normal interrupt to the Linux irq subsystem. It is not
> related to the way actual events are delivered from Xen to Linux, so it
> shouldn't matter here.
>
> I would add lots of printk's in:
>
> xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
> xen/arch/x86/hvm/irq.c:assert_irq
> xen/arch/x86/hvm/irq.c:assert_gsi
>
> to find out why xen is not injecting those interrupts
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
It took quite a bit of time, but at least I now have some hopefully useful
information. In general, whenever an interrupt is asserted, the hypervisor
runs through this:
  __hvm_pci_intx_assert:
    when assert count was 0 before incrementing
      call assert_gsi
        call send_guest_pirq (when hvm uses pirq)
The send_guest_pirq chain contains a call to evtchn_set_pending, which as one
of its first actions tests whether evtchn_pending in the shared_info is already
set. If that is the case, the call immediately returns 1.
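To make the early-return behaviour described above concrete, here is a minimal stand-alone C model. It is not Xen code: `toy_shared_info`, `toy_evtchn_set_pending` and `toy_evtchn_clear_pending` are hypothetical names, and a single `unsigned long` stands in for the real pending bitmap.

```c
#include <assert.h>
#include <limits.h>

/* Toy stand-in for the pending bitmap kept in shared_info (assumption:
 * one unsigned long is enough for this illustration). */
struct toy_shared_info {
    unsigned long evtchn_pending;   /* one bit per event channel port */
};

/*
 * Models the check described above: returns 1 if the port was already
 * pending (nothing new is signalled), 0 if the bit was newly set.
 */
int toy_evtchn_set_pending(struct toy_shared_info *s, unsigned int port)
{
    unsigned long mask;

    assert(port < CHAR_BIT * sizeof(s->evtchn_pending));
    mask = 1UL << port;
    if (s->evtchn_pending & mask)
        return 1;                   /* already pending: second event is merged */
    s->evtchn_pending |= mask;
    return 0;
}

/* Models the guest clearing the pending bit while handling the event. */
void toy_evtchn_clear_pending(struct toy_shared_info *s, unsigned int port)
{
    assert(port < CHAR_BIT * sizeof(s->evtchn_pending));
    s->evtchn_pending &= ~(1UL << port);
}
```

In this model a second set on the same port before the guest clears the bit returns 1, which corresponds to the return code seen in the printk output.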
Adding printks to call_assert_gsi, I noticed that
- When things stop working, the last call to send_guest_pirq returned 1.
- But the stall does not happen every time the return code is 1.
- e1000 also has cases where send_guest_pirq returns 1, but they happen much
less often than with the 8139cp.
Usually every intx_assert is followed by an intx_deassert call. When the stall
occurs, this does not happen. At this point I had some trouble understanding where
this intx_deassert is actually triggered. With an added WARN_ON the stack traces
seem odd, like this:
(XEN) [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130
(XEN) [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60
(XEN) [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60
(XEN) [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220
(XEN) [<ffff82c48017dba6>] copy_from_user+0x26/0x90
(XEN) [<ffff82c4801f9446>] do_iret+0xb6/0x1a0
(XEN) [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d
I am not really sure how one gets from do_update_descriptor to do_hvm_op, and the
only thing in there which does the deassert is some irq level setting.
The guest actually does not do much on EOI (contrary to what I had been assuming).
Since domain_pirq_to_irq maps to 0 for emuirqs, the call to
PHYSDEVOP_irq_status_query will hit the following and not set the flag for
needing EOI.
    irq_status_query.flags = 0;
    if ( is_hvm_domain(v->domain) &&
         domain_pirq_to_irq(v->domain, irq) <= 0 )
    {
        ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
        break;
    }
So all the guest does is clear evtchn_pending in the pirq EOI function. I
fail to understand what actually makes the hvm_pci_intx_deassert calls, but
from the way the fasteoi code in the guest appears to work, there seems to be
some gap between calling the handler and the eoi function. So from what I see,
I would assume the following:
  dom0                                     domU
  - intx_assert (count 0->1)
    - send_guest_pirq = 0
      (evtchn_pending = 1)
                                           - upcall starts fasteoi handler
  - something does intx_deassert
    (count 1->0)
  - intx_assert (count 0->1)
    - send_guest_pirq = 1
      (evtchn_pending still set)
                                           - handler->eoi sets evtchn to 0 but
                                             otherwise does nothing
  - there is no intx_deassert, so even
    when another intx_assert would happen
    (which does not seem to be the case)
    no further send_guest_pirq would be
    called.
Unfortunately I am missing some details on the inner workings here. Generally I
wonder whether not setting the needsEOI flag for those pirqs is simply the
problem. But it could also be intentional...
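The assumed sequence can be replayed in a small stand-alone C model. This is only a sketch of the hypothesis, not Xen code: `toy_line` and the `toy_*` functions are hypothetical names, and the guest EOI is modelled as doing nothing except clearing the pending flag, as observed above.

```c
#include <assert.h>

struct toy_line {
    unsigned int assert_count;  /* models gsi_assert_count[gsi] */
    int pending;                /* models the evtchn_pending bit */
    unsigned int delivered;     /* sends that newly set the pending bit */
};

/* Returns 1 if the event was already pending, 0 if it was newly set. */
int toy_send_guest_pirq(struct toy_line *l)
{
    if (l->pending)
        return 1;
    l->pending = 1;
    l->delivered++;
    return 0;
}

/*
 * Device model raises the line. Only a 0->1 transition of the assert
 * count tries to notify the guest. Returns the send result, or -1 if
 * no send was attempted.
 */
int toy_intx_assert(struct toy_line *l)
{
    if (l->assert_count++ == 0)
        return toy_send_guest_pirq(l);
    return -1;
}

/* Device model lowers the line. */
void toy_intx_deassert(struct toy_line *l)
{
    assert(l->assert_count > 0);
    l->assert_count--;
}

/* Guest EOI as observed: clears the pending bit and does nothing else. */
void toy_guest_eoi(struct toy_line *l)
{
    l->pending = 0;
}
```

Replaying assert, deassert, assert, guest EOI on this model leaves the line asserted with the pending bit clear and only one delivered notification. A further assert no longer reaches the send path, which matches the stall.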
-Stefan
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-09-30 9:13 ` Stefan Bader
@ 2011-09-30 14:09 ` Stefano Stabellini
2011-09-30 16:06 ` Stefan Bader
0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-09-30 14:09 UTC (permalink / raw)
To: Stefan Bader; +Cc: xen-devel, Stefano Stabellini
On Fri, 30 Sep 2011, Stefan Bader wrote:
> On 22.09.2011 19:44, Stefano Stabellini wrote:
> > On Thu, 22 Sep 2011, Stefan Bader wrote:
> >> On 22.09.2011 13:58, Stefan Bader wrote:
> >>> On 22.09.2011 12:30, Stefano Stabellini wrote:
> >>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>> On 21.09.2011 15:31, Stefano Stabellini wrote:
> >>>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
> >>>>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
> >>>>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
> >>>>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
> >>>>>>> there are messages about tx timeouts.
> >>>>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
> >>>>>>> correctly.
> >>>>>>>
> >>>>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
> >>>>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
> >>>>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
> >>>>>>> at least not level.
> >>>>>>
> >>>>>
> >>>>>> Does the e1000 emulated card work correctly?
> >>>>>
> >>>>> Yes, that one seems to work ok.
> >>>>>
> >>>>>> What happens if you disable interrupt remapping (see patch below)?
> >>>>>
> >>>>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
> >>>>> still works. Both then using IOAPIC-fasteoi.
> >>>>>
> >>>>
> >>>> That means there must be another subtle bug in Xen in interrupt
> > >>>> remapping that only affects 8139cp emulation
> >>>>
> >>> Right, or to be complete:
> >>> - e1000: ok
> >>> - 8139cp: unstable (setup is possible)
> >>> - ne2k_pci: not working (tx problems from the beginning)
> >>>
> >>> The behaviour feels a bit like interrupts may get lost if occurring at a higher
> >>> rate. Why this affects various drivers differently is a bit weird.
> >>>>
> >>
> >> This is mainly speculating... Quite a while back there was this patch to events:
> >>
> >> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
> >> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> >> Date: Fri Aug 20 19:10:01 2010 -0700
> >>
> >> xen: handle events as edge-triggered
> >>
> >> The commit message stated that Xen events are logically edge triggered. So PV
> >> events were changed to be handled as edge interrupts. Would that not mean that
> >> for xen-pirq-apic being using events this would apply the same and those should
> >> be apic-edge instead of level?
> >
> > That commit is referring to the internal way Linux handles these event,
> > that look like normal interrupt to the Linux irq subsystem. It is not
> > related to the way actual events are delivered from Xen to Linux, so it
> > shouldn't matter here.
> >
> > I would add lots of printk's in:
> >
> > xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
> > xen/arch/x86/hvm/irq.c:assert_irq
> > xen/arch/x86/hvm/irq.c:assert_gsi
> >
> > to find out why xen is not injecting those interrupts
> >
>
> It took quite a bit of time but at least I got some hopefully useful information
> now. So in general, whenever an interrupt is asserted,
> the hypervisor runs through this:
>
> __hvm_pci_intx_assert:
> when assert count was 0 before incrementing
> call assert_gsi
> call send_guest_pirq (when hvm uses pirq)
>
> In the send_guest_pirq chain is a call to evtchn_set_pending which tests as one
> of the first actions whether evtchn_pending in the shared_info is set. If that
> is the case the call immediately returns with 1.
>
> Adding printks to call_assert_gsi, I noticed that
> - When things stop working, the last call to send_guest_pirq returned 1.
> - But not every time the return code is one, the stall happens.
> - e1000 also has cases where send_guest_pirq returns 1 but they happen much
> less often (than using the 8139cp).
>
> Usually every intx_assert has a intx_deassert call that follows. when the stall
> occurs, this does not happen. Right here I got some troubles to understand where
> this intx_deassert is actually triggered. With an added WARN_ON the stack traces
> seem odd, like this:
>
> (XEN) [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130
> (XEN) [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60
> (XEN) [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60
> (XEN) [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220
> (XEN) [<ffff82c48017dba6>] copy_from_user+0x26/0x90
> (XEN) [<ffff82c4801f9446>] do_iret+0xb6/0x1a0
> (XEN) [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d
>
> Not really sure how one gets from do_update_descriptor to do_hvm_op and the only
> thing in there which does the deassert is some irq level setting.
>
> Actually the guest does not really do much do EOI (which I had been assuming).
> But since domain_pirq_to_irq maps to 0 for emuirqs, the call to
> PHYSDEVOP_irq_status_query will hit the following and not set the flag for
> needing EOI.
>
> irq_status_query.flags = 0;
> if ( is_hvm_domain(v->domain) &&
> domain_pirq_to_irq(v->domain, irq) <= 0 )
> {
> ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
> break;
> }
>
> So all the guest is doing is to clear evtchn_pending in the pirq EOI function. I
> fail to understand what actually is doing the hvm_pci_intx_deassert calls but
> the way the fasteoi code in the guest looks to be working, there seems to be
> some gap between calling the handler and the eoi function... So from what I see,
> I would assume the following:
>
> dom0 domU
> - intx_assert (count 0->1)
> - send_guest_pirq = 0
> (evtchn_pending = 1)
> - upcall starts fasteoi handler
> - something does intx_deassert
> (count 1->0)
> - intx_assert (count 0->1)
> - send_guest_pirq = 1
> (evtchn_pending still set)
> - handler->eoi sets evtchn to 0 but
> otherwise does nothing
> - there is no intx_deassert, so even
> when another intx_assert would happen
> (which does not seem to be the case)
> no further send_guest_pirq would be
> called.
>
> Unfortunately I do miss some details on the inner working here. Generally I
> wonder whether not setting the needsEOI flag for those pirqs just is the
> problem. But it also could be intentional...
Thanks for the very detailed analysis.
It seems to me that the problem is this: if the interrupt is level
triggered, then when the guest issues an EOI we should be
reinjecting the interrupt if it has been raised a second time in
the meantime. However, this doesn't happen if the interrupt has been
remapped onto an event channel. In that case the guest is not even going
to issue an EOI at all.
So I wrote a patch to force the guest to issue EOIs even on remapped
irqs; in the hypercall handler we check whether we need to reinject the
interrupt and if that is the case we set the corresponding event channel
pending.
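The reinjection idea can be sketched in isolation as follows. This is a simplified stand-alone model, not the patch itself: `toy_irq_state`, `toy_set_pending`, `toy_physdev_eoi` and `TOY_NR_ISAIRQS` are hypothetical names, and the value 16 for the ISA boundary is an assumption made for the illustration.

```c
#include <assert.h>

/* Assumption for this sketch: GSIs below this value are ISA IRQs and are
 * treated as edge triggered, so they are never reinjected. */
#define TOY_NR_ISAIRQS 16

struct toy_irq_state {
    unsigned int gsi_assert_count;  /* how often the line is currently asserted */
    int evtchn_pending;             /* models the event channel pending bit */
};

/* Returns 1 if the event was already pending, 0 if it was newly set. */
int toy_set_pending(struct toy_irq_state *s)
{
    assert(s);
    if (s->evtchn_pending)
        return 1;
    s->evtchn_pending = 1;
    return 0;
}

/*
 * Hypervisor side of the EOI hypercall in this model: if the line is a
 * level interrupt and is still asserted, notify the guest again.
 * Returns 1 if a notification was re-sent, 0 otherwise.
 */
int toy_physdev_eoi(struct toy_irq_state *s, unsigned int gsi)
{
    assert(s);
    if (gsi >= TOY_NR_ISAIRQS && s->gsi_assert_count > 0) {
        toy_set_pending(s);
        return 1;
    }
    return 0;
}
```

With this check, a guest that clears the pending bit while the line is still asserted gets the bit set again on EOI instead of waiting for an assert that never reaches the send path.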
Could you please try the patch I appended? I haven't been able to reproduce
your problem so I am not really sure if it works.
diff -r e042fb60e0ee xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c Thu Sep 29 11:23:01 2011 +0000
+++ b/xen/arch/x86/physdev.c Fri Sep 30 14:01:46 2011 +0000
@@ -11,6 +11,7 @@
#include <asm/current.h>
#include <asm/io_apic.h>
#include <asm/msi.h>
+#include <asm/hvm/irq.h>
#include <asm/hypercall.h>
#include <public/xen.h>
#include <public/physdev.h>
@@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
if ( !is_hvm_domain(v->domain) ||
domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
pirq_guest_eoi(pirq);
+ if ( is_hvm_domain(v->domain) &&
+ domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
+ {
+ struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
+ int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
+
+ /* if this is a level irq and count > 0, send another
+ * notification */
+ if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
+ && hvm_irq->gsi_assert_count[gsi] )
+ send_guest_pirq(v->domain, pirq);
+ }
spin_unlock(&v->domain->event_lock);
ret = 0;
break;
@@ -327,12 +340,6 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
if ( (irq < 0) || (irq >= v->domain->nr_pirqs) )
break;
irq_status_query.flags = 0;
- if ( is_hvm_domain(v->domain) &&
- domain_pirq_to_irq(v->domain, irq) <= 0 )
- {
- ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
- break;
- }
/*
* Even edge-triggered or message-based IRQs can need masking from
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-09-30 14:09 ` Stefano Stabellini
@ 2011-09-30 16:06 ` Stefan Bader
2011-09-30 17:59 ` Stefan Bader
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-09-30 16:06 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel
On 30.09.2011 16:09, Stefano Stabellini wrote:
> @@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
> if ( !is_hvm_domain(v->domain) ||
> domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
> pirq_guest_eoi(pirq);
> + if ( is_hvm_domain(v->domain) &&
> + domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
> + {
> + struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
> + int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
> +
> + /* if this is a level irq and count > 0, send another
> + * notification */
> + if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
> + && hvm_irq->gsi_assert_count[gsi] )
> + send_guest_pirq(v->domain, pirq);
> + }
> spin_unlock(&v->domain->event_lock);
> ret = 0;
> break;
This hunk looks substantially different from my 4.1.1 based code. There is no
spin_lock acquired. I am not sure whether that could also be a reason for the
different behaviour. I'll add that spinlock as well.
    case PHYSDEVOP_eoi: {
        struct physdev_eoi eoi;
        ret = -EFAULT;
        if ( copy_from_guest(&eoi, arg, 1) != 0 )
            break;
        ret = -EINVAL;
        if ( eoi.irq >= v->domain->nr_pirqs )
            break;
        if ( v->domain->arch.pirq_eoi_map )
            evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
        if ( !is_hvm_domain(v->domain) ||
             domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
            ret = pirq_guest_eoi(v->domain, eoi.irq);
        else
            ret = 0;
        break;
    }
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-09-30 16:06 ` Stefan Bader
@ 2011-09-30 17:59 ` Stefan Bader
2011-10-03 17:24 ` Stefano Stabellini
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-09-30 17:59 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel
On 30.09.2011 18:06, Stefan Bader wrote:
> On 30.09.2011 16:09, Stefano Stabellini wrote:
>> @@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>> if ( !is_hvm_domain(v->domain) ||
>> domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>> pirq_guest_eoi(pirq);
>> + if ( is_hvm_domain(v->domain) &&
>> + domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
>> + {
>> + struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
>> + int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
>> +
>> + /* if this is a level irq and count > 0, send another
>> + * notification */
>> + if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
>> + && hvm_irq->gsi_assert_count[gsi] )
>> + send_guest_pirq(v->domain, pirq);
>> + }
>> spin_unlock(&v->domain->event_lock);
>> ret = 0;
>> break;
>
> This hunk looks substantially different from my 4.1.1 based code. There is no
> spin_lock acquired. Not sure that could be a reason for the different behaviour,
> too. I'll add that spinlock too.
>
> case PHYSDEVOP_eoi: {
> struct physdev_eoi eoi;
> ret = -EFAULT;
> if ( copy_from_guest(&eoi, arg, 1) != 0 )
> break;
> ret = -EINVAL;
> if ( eoi.irq >= v->domain->nr_pirqs )
> break;
> if ( v->domain->arch.pirq_eoi_map )
> evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
> if ( !is_hvm_domain(v->domain) ||
> domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
> ret = pirq_guest_eoi(v->domain, eoi.irq);
> else
> ret = 0;
> break;
> }
>
Ok, so I modified that hunk to:
        spin_lock(&v->domain->event_lock);
        if ( v->domain->arch.pirq_eoi_map )
            evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
        if ( !is_hvm_domain(v->domain) ||
             domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
            pirq_guest_eoi(v->domain, eoi.irq);
        if ( is_hvm_domain(v->domain) &&
             domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
        {
            struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
            /* if this is a level irq and count > 0, send another
             * notification */
            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
                 && hvm_irq->gsi_assert_count[gsi] ) {
                printk("re-send event for gsi%i\n", gsi);
                send_guest_pirq(v->domain, eoi.irq);
            }
        }
        spin_unlock(&v->domain->event_lock);
        ret = 0;
Also, I did not completely remove the section that would return the status
without setting needsEOI. I just changed the if condition to be <0 instead of
<=0 (I knew from the tests that the mapping was always 0, and maybe the <0 check
could be useful for something).
    irq_status_query.flags = 0;
    if ( is_hvm_domain(v->domain) &&
         domain_pirq_to_irq(v->domain, irq) < 0 )
    {
        ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
        break;
    }
With that, a quick test shows both the re-sends being done sometimes and the domU
doing EOIs, and there is no apparent stall. I did the same quick test with the e1000
emulated NIC and that still seems ok. Those were not very thorough tests, but
otherwise I would at least have observed a stall pretty quickly.
-Stefan
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-09-30 17:59 ` Stefan Bader
@ 2011-10-03 17:24 ` Stefano Stabellini
2011-10-03 18:13 ` Stefano Stabellini
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-03 17:24 UTC (permalink / raw)
To: Stefan Bader; +Cc: xen-devel, Stefano Stabellini
On Fri, 30 Sep 2011, Stefan Bader wrote:
> Also I did not completely remove the section that would return the status
> without setting needsEOI. I just changed the if condition to be <0 instead of
> <=0 (I knew from the tests that the mapping was always 0 and maybe the <0 check
> could be useful for something.
>
> irq_status_query.flags = 0;
> if ( is_hvm_domain(v->domain) &&
> domain_pirq_to_irq(v->domain, irq) < 0 )
> {
> ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
> break;
> }
>
You need to remove the entire test because we want to receive
notifications in all cases.
> With that a quick test shows both the re-sends done sometimes and the domU doing
> EOIs. And there is no stall apparent. Did the same quick test with the e1000
> emulated NIC and that still seems ok. Those were not very thorough tests but at
> least I would have observed a stall pretty quick otherwise.
I am glad it fixes the problem for you.
I am going to send a different patch upstream for Xen 4.2, because I
would also like it to cover the very unlikely scenario in which a PV
guest (like dom0 or a PV guest with PCI passthrough) is losing level
interrupts because when Xen tries to set the corresponding event channel
pending the bit is already set. The codebase is different enough that
making the same change on 4.1 is non-trivial. I am appending the new
patch to this email; it would be great if you could test it. You just
need a 4.2 hypervisor, not the entire system. You should be able to
perform the test updating only xen.gz.
If you have trouble with xen-unstable.hg tip, try changeset 23843.
---
diff -r bf533533046c xen/arch/x86/hvm/irq.c
--- a/xen/arch/x86/hvm/irq.c Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/hvm/irq.c Mon Oct 03 16:54:51 2011 +0000
@@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
if ( hvm_domain_use_pirq(d, pirq) )
{
- send_guest_pirq(d, pirq);
+ if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
+ pirq->lost++;
return;
}
vioapic_irq_positive_edge(d, ioapic_gsi);
@@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
{
struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
unsigned int gsi, link, isa_irq;
+ struct pirq *pirq;
ASSERT((device <= 31) && (intx <= 3));
@@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
gsi = hvm_pci_intx_gsi(device, intx);
if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
assert_gsi(d, gsi);
+ else {
+ pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
+ if ( hvm_domain_use_pirq(d, pirq) )
+ pirq->lost++;
+ }
link = hvm_pci_intx_link(device, intx);
isa_irq = hvm_irq->pci_link.route[link];
diff -r bf533533046c xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/irq.c Mon Oct 03 16:54:51 2011 +0000
@@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
!test_and_set_bool(pirq->masked) )
action->in_flight++;
if ( !hvm_do_IRQ_dpci(d, pirq) )
- send_guest_pirq(d, pirq);
+ {
+ if ( send_guest_pirq(d, pirq) &&
+ action->ack_type == ACKTYPE_EOI )
+ pirq->lost++;
+ }
}
if ( action->ack_type != ACKTYPE_NONE )
diff -r bf533533046c xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/physdev.c Mon Oct 03 16:54:51 2011 +0000
@@ -11,6 +11,7 @@
#include <asm/current.h>
#include <asm/io_apic.h>
#include <asm/msi.h>
+#include <asm/hvm/irq.h>
#include <asm/hypercall.h>
#include <public/xen.h>
#include <public/physdev.h>
@@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
if ( !is_hvm_domain(v->domain) ||
domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
pirq_guest_eoi(pirq);
+ if ( pirq->lost > 0) {
+ if ( !send_guest_pirq(v->domain, pirq) )
+ pirq->lost--;
+ }
spin_unlock(&v->domain->event_lock);
ret = 0;
break;
@@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
break;
irq_status_query.flags = 0;
if ( is_hvm_domain(v->domain) &&
- domain_pirq_to_irq(v->domain, irq) <= 0 )
+ domain_pirq_to_irq(v->domain, irq) <= 0 &&
+ domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
{
- ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+ ret = -EINVAL;
break;
}
diff -r bf533533046c xen/include/xen/irq.h
--- a/xen/include/xen/irq.h Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/include/xen/irq.h Mon Oct 03 16:54:51 2011 +0000
@@ -146,6 +146,7 @@ struct pirq {
int pirq;
u16 evtchn;
bool_t masked;
+ u32 lost;
struct rcu_head rcu_head;
struct arch_pirq arch;
};
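The bookkeeping this patch adds with the `lost` field can be illustrated with a small stand-alone model. It is a sketch only: `toy_pirq`, `toy_send`, `toy_raise` and `toy_eoi` are hypothetical names, and the model leaves out the checks that restrict the counter to level interrupts.

```c
#include <assert.h>

struct toy_pirq {
    int pending;          /* models the event channel pending bit */
    unsigned int lost;    /* models the new lost-notification counter */
};

/* Returns 1 if the event was already pending, 0 if it was newly set. */
int toy_send(struct toy_pirq *p)
{
    assert(p);
    if (p->pending)
        return 1;
    p->pending = 1;
    return 0;
}

/* Raise path: remember the notification if the bit was already set. */
void toy_raise(struct toy_pirq *p)
{
    if (toy_send(p))
        p->lost++;
}

/*
 * EOI path: retry one remembered notification and only decrement the
 * counter if the retry actually set the pending bit.
 */
void toy_eoi(struct toy_pirq *p)
{
    assert(p);
    if (p->lost > 0 && !toy_send(p))
        p->lost--;
}
```

In this model an EOI that arrives while the bit is still set leaves the counter untouched, so the notification is retried on a later EOI rather than dropped.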
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-10-03 17:24 ` Stefano Stabellini
@ 2011-10-03 18:13 ` Stefano Stabellini
2011-10-04 10:07 ` Andrew Cooper
2011-10-05 16:10 ` Stefan Bader
2011-10-27 10:37 ` [PATCH] xen: do not loose level interrupt notifications Stefano Stabellini
2 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-03 18:13 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel, Jan Beulich, Stefan Bader
CC'ing Jan, who is probably going to have an opinion on this.
Let me add a bit of background: Stefan found out that PV on HVM guests
could lose level interrupts coming from emulated devices. Looking
through the code I realized that we need to add some logic to inject a
pirq in the guest if a level interrupt has been raised while the guest
is servicing the first one.
While this is all very specific to interrupt remapping and emulated
devices, I realized that something similar could happen even with dom0
or other PV guests with PCI passthrough:
1) the device raises a level interrupt and xen injects it into the
guest;
2) the guest is temporarily stuck: it does not ack it or eoi it;
3) the xen timer kicks in and eois the interrupt;
4) the device thinks it is all fine and sends a second interrupt;
5) Xen fails to inject the second interrupt into the guest because the
guest still has the event channel pending bit set;
at this point the guest loses the second interrupt notification, which
is not supposed to happen with level interrupts, and I think it might
cause problems with some devices.
Jan, do you think we should try to handle this case, or is it too
unlikely?
In any case we need to handle the PV on HVM remapping bug, which because
of the way interrupts are emulated is much more likely to happen...
On Mon, 3 Oct 2011, Stefano Stabellini wrote:
> On Fri, 30 Sep 2011, Stefan Bader wrote:
> > Also I did not completely remove the section that would return the status
> > without setting needsEOI. I just changed the if condition to be <0 instead of
> > <=0 (I knew from the tests that the mapping was always 0 and maybe the <0 check
> > could be useful for something.
> >
> > irq_status_query.flags = 0;
> > if ( is_hvm_domain(v->domain) &&
> > domain_pirq_to_irq(v->domain, irq) < 0 )
> > {
> > ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
> > break;
> > }
> >
>
> You need to remove the entire test because we want to receive
> notifications in all cases.
>
>
> > With that a quick test shows both the re-sends done sometimes and the domU doing
> > EOIs. And there is no stall apparent. Did the same quick test with the e1000
> > emulated NIC and that still seems ok. Those were not very thorough tests but at
> > least I would have observed a stall pretty quick otherwise.
>
> I am glad it fixes the problem for you.
>
> I am going to send a different patch upstream for Xen 4.2, because I
> would also like it to cover the very unlikely scenario in which a PV
> guest (like dom0 or a PV guest with PCI passthrough) is losing level
> interrupts because when Xen tries to set the corresponding event channel
> pending the bit is already set. The codebase is different enough that
> making the same change on 4.1 is non-trivial. I am appending the new
> patch to this email, it would be great if you could test it. You just
> need a 4.2 hypervisor, not the entire system. You should be able to
> perform the test updating only xen.gz.
> If you have trouble with xen-unstable.hg tip, try changeset 23843.
>
> ---
>
>
> diff -r bf533533046c xen/arch/x86/hvm/irq.c
> --- a/xen/arch/x86/hvm/irq.c Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/hvm/irq.c Mon Oct 03 16:54:51 2011 +0000
> @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
>
> if ( hvm_domain_use_pirq(d, pirq) )
> {
> - send_guest_pirq(d, pirq);
> + if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
> + pirq->lost++;
> return;
> }
> vioapic_irq_positive_edge(d, ioapic_gsi);
> @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
> {
> struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
> unsigned int gsi, link, isa_irq;
> + struct pirq *pirq;
>
> ASSERT((device <= 31) && (intx <= 3));
>
> @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
> gsi = hvm_pci_intx_gsi(device, intx);
> if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
> assert_gsi(d, gsi);
> + else {
> + pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
> + if ( hvm_domain_use_pirq(d, pirq) )
> + pirq->lost++;
> + }
>
> link = hvm_pci_intx_link(device, intx);
> isa_irq = hvm_irq->pci_link.route[link];
> diff -r bf533533046c xen/arch/x86/irq.c
> --- a/xen/arch/x86/irq.c Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/irq.c Mon Oct 03 16:54:51 2011 +0000
> @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
> !test_and_set_bool(pirq->masked) )
> action->in_flight++;
> if ( !hvm_do_IRQ_dpci(d, pirq) )
> - send_guest_pirq(d, pirq);
> + {
> + if ( send_guest_pirq(d, pirq) &&
> + action->ack_type == ACKTYPE_EOI )
> + pirq->lost++;
> + }
> }
>
> if ( action->ack_type != ACKTYPE_NONE )
> diff -r bf533533046c xen/arch/x86/physdev.c
> --- a/xen/arch/x86/physdev.c Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/physdev.c Mon Oct 03 16:54:51 2011 +0000
> @@ -11,6 +11,7 @@
> #include <asm/current.h>
> #include <asm/io_apic.h>
> #include <asm/msi.h>
> +#include <asm/hvm/irq.h>
> #include <asm/hypercall.h>
> #include <public/xen.h>
> #include <public/physdev.h>
> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
> if ( !is_hvm_domain(v->domain) ||
> domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
> pirq_guest_eoi(pirq);
> + if ( pirq->lost > 0) {
> + if ( !send_guest_pirq(v->domain, pirq) )
> + pirq->lost--;
> + }
> spin_unlock(&v->domain->event_lock);
> ret = 0;
> break;
> @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
> break;
> irq_status_query.flags = 0;
> if ( is_hvm_domain(v->domain) &&
> - domain_pirq_to_irq(v->domain, irq) <= 0 )
> + domain_pirq_to_irq(v->domain, irq) <= 0 &&
> + domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
> {
> - ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
> + ret = -EINVAL;
> break;
> }
>
> diff -r bf533533046c xen/include/xen/irq.h
> --- a/xen/include/xen/irq.h Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/include/xen/irq.h Mon Oct 03 16:54:51 2011 +0000
> @@ -146,6 +146,7 @@ struct pirq {
> int pirq;
> u16 evtchn;
> bool_t masked;
> + u32 lost;
> struct rcu_head rcu_head;
> struct arch_pirq arch;
> };
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-10-03 18:13 ` Stefano Stabellini
@ 2011-10-04 10:07 ` Andrew Cooper
2011-10-04 14:13 ` Stefano Stabellini
0 siblings, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2011-10-04 10:07 UTC (permalink / raw)
To: xen-devel
On 03/10/11 19:13, Stefano Stabellini wrote:
> CC'ing Jan, that probably is going to have an opinion on this.
>
> Let me add a bit of background: Stefan found out that PV on HVM guests
> could loose level interrupts coming from emulated devices. Looking
> through the code I realized that we need to add some logic to inject a
> pirq in the guest if a level interrupt has been raised while the guest
> is servicing the first one.
> While this is all very specific to interrupt remapping and emulated
> devices, I realized that something similar could happen even with dom0
> or other PV guests with PCI passthrough:
>
> 1) the device raises a level interrupt and xen injects it into the
> guest;
>
> 2) the guest is temporarely stuck: it does not ack it or eoi it;
>
> 3) the xen timer kicks in and eois the interrupt;
>
> 4) the device thinks it is all fine and sends a second interrupt;
>
> 5) Xen fails to inject the second interrupt into the guest because the
> guest has still the event channel pending bit set;
>
> at this point the guest looses the second interrupt notification, that
> is not supposed to happen with level interrupts and I think it might
> cause problems with some devices.
>
> Jan, do you think we should try to handle this case, or is it too
> unlikely?
I am not certain whether this is relevant, but the ICH10 IO-APIC
documentation indicated that early EOI'ing of a line level interrupt
should not have this effect. Specifically, it states that EOI'ing a
line level interrupt whose line is still asserted will cause the
interrupt to be "re-raised" from the IO-APIC. It uses this to assert
that it is fine to use multiple IO-APIC entries with the same vector,
with a broadcast of the vector number alone to EOI the interrupt.
In this case, while Xen sees two interrupts, from the device's point of
view only one has happened.
In the case where the device has dropped its line level interrupt of its
own accord, I would agree that the current Xen behavior is wrong.
I can't offhand think of a good reason why this would occur.
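To make the two cases concrete, here is a minimal sketch of one level-triggered IO-APIC pin. It is a toy model, not chipset or Xen code: the `toy_*` names are invented for illustration, and the only hardware concept it relies on is the Remote IRR bit, which is set when a level interrupt is delivered and cleared by the EOI.

```c
#include <stdbool.h>

/* Toy model of a single level-triggered IO-APIC pin. Illustrative only. */
struct toy_ioapic_pin {
    bool line_asserted;     /* current state of the device's interrupt line */
    bool remote_irr;        /* set on delivery, cleared by EOI */
    unsigned int delivered; /* interrupts handed to the CPU */
};

/* A level interrupt is delivered whenever the line is high and no
 * earlier delivery is still waiting for its EOI. */
static void toy_deliver_if_needed(struct toy_ioapic_pin *p)
{
    if (p->line_asserted && !p->remote_irr) {
        p->remote_irr = true;
        p->delivered++;
    }
}

/* The device changes the level of its line. */
static void toy_set_line(struct toy_ioapic_pin *p, bool level)
{
    p->line_asserted = level;
    toy_deliver_if_needed(p);
}

/* EOI clears Remote IRR; if the line is still high the interrupt is
 * re-raised, which is the behaviour described above. */
static void toy_eoi(struct toy_ioapic_pin *p)
{
    p->remote_irr = false;
    toy_deliver_if_needed(p);
}

/* Returns how many deliveries one device assertion produces when the EOI
 * comes early, depending on whether the device still holds the line. */
static unsigned int toy_deliveries_after_early_eoi(bool device_still_asserting)
{
    struct toy_ioapic_pin pin = { false, false, 0 };

    toy_set_line(&pin, true);          /* device asserts its line */
    if (!device_still_asserting)
        toy_set_line(&pin, false);     /* device drops the line by itself */
    toy_eoi(&pin);                     /* early EOI, e.g. from a timeout */
    return pin.delivered;
}
```

With the line still asserted the early EOI produces a second delivery for the same device event; with the line dropped there is only one, so a later reassertion is a genuinely new interrupt.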
I know it is not helpful in this case, but as a rule of thumb, line
level interrupts should not be used with Xen. The average response time
on an unloaded system is ~30ms, ranging from 5 to 150ms. (Measured on a
Dell R710 with Xen 4.1.0 and a 2.6.32 dom0, over 2 weeks of debugging
another line level interrupt bug.)
~Andrew
> In any case we need to handle the PV on HVM remapping bug, that because
> of the way interrupts are emulated is much more likely to happen...
>
>
> On Mon, 3 Oct 2011, Stefano Stabellini wrote:
>> On Fri, 30 Sep 2011, Stefan Bader wrote:
>>> Also I did not completely remove the section that would return the status
>>> without setting needsEOI. I just changed the if condition to be <0 instead of
>>> <=0 (I knew from the tests that the mapping was always 0 and maybe the <0 check
>>> could be useful for something.
>>>
>>> irq_status_query.flags = 0;
>>> if ( is_hvm_domain(v->domain) &&
>>> domain_pirq_to_irq(v->domain, irq) < 0 )
>>> {
>>> ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
>>> break;
>>> }
>>>
>> You need to remove the entire test because we want to receive
>> notifications in all cases.
>>
>>
>>> With that a quick test shows both the re-sends done sometimes and the domU doing
>>> EOIs. And there is no stall apparent. Did the same quick test with the e1000
>>> emulated NIC and that still seems ok. Those were not very thorough tests but at
>>> least I would have observed a stall pretty quick otherwise.
>> I am glad it fixes the problem for you.
>>
>> I am going to send a different patch upstream for Xen 4.2, because I
>> would also like it to cover the very unlikely scenario in which a PV
>> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
>> interrupts because when Xen tries to set the corresponding event channel
>> pending the bit is alreay set. The codebase is different enough that
>> making the same change on 4.1 is non-trivial. I am appending the new
>> patch to this email, it would be great if you could test it. You just
>> need a 4.2 hypervisor, not the entire system. You should be able to
>> perform the test updating only xen.gz.
>> If you have trouble if xen-unstable.hg tip, try changeset 23843.
>>
>> ---
>>
>>
>> diff -r bf533533046c xen/arch/x86/hvm/irq.c
>> --- a/xen/arch/x86/hvm/irq.c Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/hvm/irq.c Mon Oct 03 16:54:51 2011 +0000
>> @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
>>
>> if ( hvm_domain_use_pirq(d, pirq) )
>> {
>> - send_guest_pirq(d, pirq);
>> + if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
>> + pirq->lost++;
>> return;
>> }
>> vioapic_irq_positive_edge(d, ioapic_gsi);
>> @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
>> {
>> struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
>> unsigned int gsi, link, isa_irq;
>> + struct pirq *pirq;
>>
>> ASSERT((device <= 31) && (intx <= 3));
>>
>> @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
>> gsi = hvm_pci_intx_gsi(device, intx);
>> if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
>> assert_gsi(d, gsi);
>> + else {
>> + pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
>> + if ( hvm_domain_use_pirq(d, pirq) )
>> + pirq->lost++;
>> + }
>>
>> link = hvm_pci_intx_link(device, intx);
>> isa_irq = hvm_irq->pci_link.route[link];
>> diff -r bf533533046c xen/arch/x86/irq.c
>> --- a/xen/arch/x86/irq.c Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/irq.c Mon Oct 03 16:54:51 2011 +0000
>> @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
>> !test_and_set_bool(pirq->masked) )
>> action->in_flight++;
>> if ( !hvm_do_IRQ_dpci(d, pirq) )
>> - send_guest_pirq(d, pirq);
>> + {
>> + if ( send_guest_pirq(d, pirq) &&
>> + action->ack_type == ACKTYPE_EOI )
>> + pirq->lost++;
>> + }
>> }
>>
>> if ( action->ack_type != ACKTYPE_NONE )
>> diff -r bf533533046c xen/arch/x86/physdev.c
>> --- a/xen/arch/x86/physdev.c Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/physdev.c Mon Oct 03 16:54:51 2011 +0000
>> @@ -11,6 +11,7 @@
>> #include <asm/current.h>
>> #include <asm/io_apic.h>
>> #include <asm/msi.h>
>> +#include <asm/hvm/irq.h>
>> #include <asm/hypercall.h>
>> #include <public/xen.h>
>> #include <public/physdev.h>
>> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>> if ( !is_hvm_domain(v->domain) ||
>> domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>> pirq_guest_eoi(pirq);
>> + if ( pirq->lost > 0) {
>> + if ( !send_guest_pirq(v->domain, pirq) )
>> + pirq->lost--;
>> + }
>> spin_unlock(&v->domain->event_lock);
>> ret = 0;
>> break;
>> @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>> break;
>> irq_status_query.flags = 0;
>> if ( is_hvm_domain(v->domain) &&
>> - domain_pirq_to_irq(v->domain, irq) <= 0 )
>> + domain_pirq_to_irq(v->domain, irq) <= 0 &&
>> + domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
>> {
>> - ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
>> + ret = -EINVAL;
>> break;
>> }
>>
>> diff -r bf533533046c xen/include/xen/irq.h
>> --- a/xen/include/xen/irq.h Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/include/xen/irq.h Mon Oct 03 16:54:51 2011 +0000
>> @@ -146,6 +146,7 @@ struct pirq {
>> int pirq;
>> u16 evtchn;
>> bool_t masked;
>> + u32 lost;
>> struct rcu_head rcu_head;
>> struct arch_pirq arch;
>> };
>>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-10-04 10:07 ` Andrew Cooper
@ 2011-10-04 14:13 ` Stefano Stabellini
0 siblings, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-04 14:13 UTC (permalink / raw)
To: Andrew Cooper; +Cc: xen-devel
On Tue, 4 Oct 2011, Andrew Cooper wrote:
> On 03/10/11 19:13, Stefano Stabellini wrote:
> > CC'ing Jan, that probably is going to have an opinion on this.
> >
> > Let me add a bit of background: Stefan found out that PV on HVM guests
> > could loose level interrupts coming from emulated devices. Looking
> > through the code I realized that we need to add some logic to inject a
> > pirq in the guest if a level interrupt has been raised while the guest
> > is servicing the first one.
> > While this is all very specific to interrupt remapping and emulated
> > devices, I realized that something similar could happen even with dom0
> > or other PV guests with PCI passthrough:
> >
> > 1) the device raises a level interrupt and xen injects it into the
> > guest;
> >
> > 2) the guest is temporarely stuck: it does not ack it or eoi it;
> >
> > 3) the xen timer kicks in and eois the interrupt;
> >
> > 4) the device thinks it is all fine and sends a second interrupt;
> >
> > 5) Xen fails to inject the second interrupt into the guest because the
> > guest has still the event channel pending bit set;
> >
> > at this point the guest looses the second interrupt notification, that
> > is not supposed to happen with level interrupts and I think it might
> > cause problems with some devices.
> >
> > Jan, do you think we should try to handle this case, or is it too
> > unlikely?
>
> I am not certain whether this is relevant, but the ICH10 IO-APIC
> documentation indicated that early EOI'ing of a line level interrupt
> should not have this effect. Specifically, it states that EOI'ing a
> line level interrupt whos line is still asserted will cause the
> interrupt to be "re-raised" from the IO-APIC. It uses this to assert
> that it is fine to use multiple IO-APIC entries with the same vector,
> with a broadcast of vector number alone to EOI the interrupt.
>
> In this case, while Xen sees two interrupts, from the devices point of
> view, only I has happened.
>
> In the case where the device has dropped its line level interrupt of its
> own accord, then I would agree that the current Xen behavior is wrong.
> I cant offhand think of a good reason why this would occur.
I think this scenario is actually possible. It is certainly happening
with qemu's emulated devices.
This patch would take care of re-injecting the interrupts both when the
device deasserts and reasserts the interrupt while the guest hasn't
cleared the pending bit yet, and when a PV on HVM guest EOIs the
interrupt too early.
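The re-injection idea can be sketched in isolation as follows. This is a simplified model, not the patch itself: the `toy_*` names are hypothetical, and the convention that the send function returns nonzero when the pending bit was already set is inferred from how the patch uses the return value of `send_guest_pirq`.

```c
#include <stdbool.h>

/* Toy stand-in for a pirq with a "lost" counter. Illustrative only. */
struct toy_pirq {
    bool pending;        /* stands in for the event channel pending bit */
    unsigned int lost;   /* notifications that could not be injected */
    unsigned int seen;   /* notifications the guest has handled */
};

/* Assumed convention: returns nonzero if the pending bit was already set. */
static int toy_send_guest_pirq(struct toy_pirq *p)
{
    int was_pending = p->pending;
    p->pending = true;
    return was_pending;
}

/* A level interrupt is raised: remember it if it could not be injected. */
static void toy_raise_level(struct toy_pirq *p)
{
    if (toy_send_guest_pirq(p))
        p->lost++;
}

/* The guest handles the event and EOIs it; the EOI path replays one lost
 * notification, and only forgets it once the injection succeeded. */
static void toy_guest_handle_and_eoi(struct toy_pirq *p)
{
    if (!p->pending)
        return;
    p->pending = false;
    p->seen++;
    if (p->lost > 0 && !toy_send_guest_pirq(p))
        p->lost--;
}

/* Two interrupts raised back to back, then the guest runs until idle.
 * Returns the number of notifications the guest handled. */
static unsigned int toy_replay_with_counter(void)
{
    struct toy_pirq p = { false, 0, 0 };

    toy_raise_level(&p);            /* injected normally */
    toy_raise_level(&p);            /* bit already set: counted as lost */
    toy_guest_handle_and_eoi(&p);   /* handles #1, EOI re-injects #2 */
    toy_guest_handle_and_eoi(&p);   /* handles the replayed notification */
    return p.seen;
}
```

In this model both interrupts reach the guest, at the cost of one counter per pirq. Decrementing only after a successful send keeps the notification queued if the bit happens to be set again at EOI time.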
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-10-03 17:24 ` Stefano Stabellini
2011-10-03 18:13 ` Stefano Stabellini
@ 2011-10-05 16:10 ` Stefan Bader
2011-10-06 10:12 ` Stefano Stabellini
2011-10-27 10:37 ` [PATCH] xen: do not loose level interrupt notifications Stefano Stabellini
2 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-10-05 16:10 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel
[-- Attachment #1: Type: text/plain, Size: 6824 bytes --]
On 03.10.2011 19:24, Stefano Stabellini wrote:
> I am going to send a different patch upstream for Xen 4.2, because I
> would also like it to cover the very unlikely scenario in which a PV
> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
> interrupts because when Xen tries to set the corresponding event channel
> pending the bit is alreay set. The codebase is different enough that
> making the same change on 4.1 is non-trivial. I am appending the new
> patch to this email, it would be great if you could test it. You just
> need a 4.2 hypervisor, not the entire system. You should be able to
> perform the test updating only xen.gz.
> If you have trouble if xen-unstable.hg tip, try changeset 23843.
Hi Stefano,
currently my problem is that, with our next release being close, I don't
have much time to move to another hypervisor (and tests there may or may
not be useful, given the substantial changes beside this one).
But I think I got a usable backport of your change to 4.1.1 (do you think
it looks ok?) and have given it a quick test, which seems to be ok...
One drawback is that I don't have a setup that uses passthrough, so
that path is not tested. I think I did see (with a debugging version) that the
lost count was incremented and decremented in dom0, though.
-Stefan
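The main structural difference in the backport is that the counter lives in a per-domain array indexed by pirq rather than in a per-pirq structure. A rough user-space sketch of that layout follows; `toy_domain` and its helpers are hypothetical names, and `calloc`/`free` stand in for the hypervisor's allocator.

```c
#include <stdlib.h>

/* Toy per-domain state: one lost counter per pirq. Illustrative only. */
struct toy_domain {
    unsigned int nr_pirqs;
    int *pirq_lost;          /* zero-initialised array of nr_pirqs counters */
};

/* Allocate the counter array; returns 0 on success, -1 on failure. */
static int toy_domain_init(struct toy_domain *d, unsigned int nr_pirqs)
{
    d->nr_pirqs = nr_pirqs;
    d->pirq_lost = calloc(nr_pirqs, sizeof(*d->pirq_lost));
    return d->pirq_lost ? 0 : -1;
}

/* Release the array; safe to call on a partially initialised domain. */
static void toy_domain_destroy(struct toy_domain *d)
{
    free(d->pirq_lost);
    d->pirq_lost = NULL;
}

/* Record one lost notification. Returns the new count, or -1 if the array
 * is missing or the index is out of range. */
static int toy_note_lost(struct toy_domain *d, unsigned int pirq)
{
    if (!d->pirq_lost || pirq >= d->nr_pirqs)
        return -1;
    return ++d->pirq_lost[pirq];
}
```

The NULL check in `toy_note_lost` mirrors the defensive `d->arch.pirq_lost` tests in the patch below; the range check is an addition of this sketch.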
---
Index: xen-4.1.1/xen/arch/x86/domain.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/domain.c 2011-10-05 15:03:19.405815293 +0200
+++ xen-4.1.1/xen/arch/x86/domain.c 2011-10-05 15:09:59.781816622 +0200
@@ -514,6 +514,12 @@ int arch_domain_create(struct domain *d,
memset(d->arch.pirq_irq, 0,
d->nr_pirqs * sizeof(*d->arch.pirq_irq));
+ d->arch.pirq_lost = xmalloc_array(int, d->nr_pirqs);
+ if ( !d->arch.pirq_lost)
+ goto fail;
+ memset(d->arch.pirq_lost, 0,
+ d->nr_pirqs * sizeof(*d->arch.pirq_lost));
+
d->arch.irq_pirq = xmalloc_array(int, nr_irqs);
if ( !d->arch.irq_pirq )
goto fail;
@@ -575,6 +581,7 @@ int arch_domain_create(struct domain *d,
fail:
d->is_dying = DOMDYING_dead;
vmce_destroy_msr(d);
+ xfree(d->arch.pirq_lost);
xfree(d->arch.pirq_irq);
xfree(d->arch.irq_pirq);
xfree(d->arch.pirq_emuirq);
@@ -628,6 +635,7 @@ void arch_domain_destroy(struct domain *
#endif
free_xenheap_page(d->shared_info);
+ xfree(d->arch.pirq_lost);
xfree(d->arch.pirq_irq);
xfree(d->arch.irq_pirq);
xfree(d->arch.pirq_emuirq);
Index: xen-4.1.1/xen/arch/x86/hvm/irq.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/hvm/irq.c 2011-10-05 15:14:35.441815292 +0200
+++ xen-4.1.1/xen/arch/x86/hvm/irq.c 2011-10-05 17:55:43.603986605 +0200
@@ -33,7 +33,9 @@ static void assert_gsi(struct domain *d,
int pirq = domain_emuirq_to_pirq(d, ioapic_gsi);
if ( hvm_domain_use_pirq(d, pirq) )
{
- send_guest_pirq(d, pirq);
+ if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
+ if (d->arch.pirq_lost)
+ d->arch.pirq_lost[pirq]++;
return;
}
vioapic_irq_positive_edge(d, ioapic_gsi);
@@ -67,6 +69,12 @@ static void __hvm_pci_intx_assert(
gsi = hvm_pci_intx_gsi(device, intx);
if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
assert_gsi(d, gsi);
+ else {
+ int pirq = domain_emuirq_to_pirq(d, gsi);
+
+ if ( hvm_domain_use_pirq(d, pirq) && d->arch.pirq_lost)
+ d->arch.pirq_lost[pirq]++;
+ }
link = hvm_pci_intx_link(device, intx);
isa_irq = hvm_irq->pci_link.route[link];
Index: xen-4.1.1/xen/arch/x86/irq.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/irq.c 2011-10-05 15:26:58.477815292 +0200
+++ xen-4.1.1/xen/arch/x86/irq.c 2011-10-05 17:56:23.191986535 +0200
@@ -888,10 +888,13 @@ static void __do_IRQ_guest(int irq)
desc->status |= IRQ_INPROGRESS; /* cleared during hvm eoi */
}
}
- else if ( send_guest_pirq(d, pirq) &&
- (action->ack_type == ACKTYPE_NONE) )
- {
- already_pending++;
+ else {
+ if ( send_guest_pirq(d, pirq) ) {
+ if ( action->ack_type == ACKTYPE_EOI && d->arch.pirq_lost)
+ d->arch.pirq_lost[pirq]++;
+ else if ( action->ack_type == ACKTYPE_NONE )
+ already_pending++;
+ }
}
}
Index: xen-4.1.1/xen/arch/x86/physdev.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/physdev.c 2011-10-05 15:36:14.545815292 +0200
+++ xen-4.1.1/xen/arch/x86/physdev.c 2011-10-05 17:57:06.055986460 +0200
@@ -261,13 +261,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
ret = -EINVAL;
if ( eoi.irq >= v->domain->nr_pirqs )
break;
+ spin_lock(&v->domain->event_lock);
if ( v->domain->arch.pirq_eoi_map )
evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
if ( !is_hvm_domain(v->domain) ||
domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
- ret = pirq_guest_eoi(v->domain, eoi.irq);
- else
- ret = 0;
+ pirq_guest_eoi(v->domain, eoi.irq);
+ if ( v->domain->arch.pirq_lost && v->domain->arch.pirq_lost[eoi.irq]) {
+ if ( !send_guest_pirq(v->domain, eoi.irq) )
+ v->domain->arch.pirq_lost[eoi.irq]--;
+ }
+ ret = 0;
+ spin_unlock(&v->domain->event_lock);
break;
}
@@ -323,9 +328,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
break;
irq_status_query.flags = 0;
if ( is_hvm_domain(v->domain) &&
- domain_pirq_to_irq(v->domain, irq) <= 0 )
+ domain_pirq_to_irq(v->domain, irq) <= 0 &&
+ domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
{
- ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+ ret = -EINVAL;
break;
}
Index: xen-4.1.1/xen/include/asm-x86/domain.h
===================================================================
--- xen-4.1.1.orig/xen/include/asm-x86/domain.h 2011-10-05 15:10:11.709815293 +0200
+++ xen-4.1.1/xen/include/asm-x86/domain.h 2011-10-05 15:12:46.237815276 +0200
@@ -312,6 +312,9 @@ struct arch_domain
(possibly other cases in the future */
uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */
uint64_t vtsc_usercount; /* not used for hvm */
+
+ /* Protected by d->event_lock, count of lost pirqs */
+ int *pirq_lost;
} __cacheline_aligned;
#define has_arch_pdevs(d) (!list_empty(&(d)->arch.pdev_list))
[-- Attachment #2: xen-backport-pirq-lost.patch --]
[-- Type: text/x-diff, Size: 5500 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-10-05 16:10 ` Stefan Bader
@ 2011-10-06 10:12 ` Stefano Stabellini
2011-10-06 12:16 ` Stefan Bader
0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-06 10:12 UTC (permalink / raw)
To: Stefan Bader; +Cc: xen-devel, Stefano Stabellini
On Wed, 5 Oct 2011, Stefan Bader wrote:
> On 03.10.2011 19:24, Stefano Stabellini wrote:
> > I am going to send a different patch upstream for Xen 4.2, because I
> > would also like it to cover the very unlikely scenario in which a PV
> > guest (like dom0 or a PV guest with PCI passthrough) is loosing level
> > interrupts because when Xen tries to set the corresponding event channel
> > pending the bit is alreay set. The codebase is different enough that
> > making the same change on 4.1 is non-trivial. I am appending the new
> > patch to this email, it would be great if you could test it. You just
> > need a 4.2 hypervisor, not the entire system. You should be able to
> > perform the test updating only xen.gz.
> > If you have trouble if xen-unstable.hg tip, try changeset 23843.
>
> Hi Stefano,
>
> currently I would have the problem that I don't have too much time to move to
> another hypervisor (tests may or may not be useful there with substantial
> changes beside this one) with our next release being close.
> But I think I got a usable backport of your change to 4.1.1 (you think it looks
> ok?) and have given that a quick test which seems to be ok...
> Though one drawback is that I don't have a setup which would use passthrough, so
> that path is not tested. I think I did see (with a debugging version) that the
> lost count was incremented and decremented in dom0, though.
>
Honestly if you have to commit to a backport for your package right now,
I would go for the previous version, because it is simpler and less
likely to introduce regressions.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-10-06 10:12 ` Stefano Stabellini
@ 2011-10-06 12:16 ` Stefan Bader
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Bader @ 2011-10-06 12:16 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel
On 06.10.2011 12:12, Stefano Stabellini wrote:
> On Wed, 5 Oct 2011, Stefan Bader wrote:
>> On 03.10.2011 19:24, Stefano Stabellini wrote:
>>> I am going to send a different patch upstream for Xen 4.2, because I
>>> would also like it to cover the very unlikely scenario in which a PV
>>> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
>>> interrupts because when Xen tries to set the corresponding event channel
>>> pending the bit is alreay set. The codebase is different enough that
>>> making the same change on 4.1 is non-trivial. I am appending the new
>>> patch to this email, it would be great if you could test it. You just
>>> need a 4.2 hypervisor, not the entire system. You should be able to
>>> perform the test updating only xen.gz.
>>> If you have trouble if xen-unstable.hg tip, try changeset 23843.
>>
>> Hi Stefano,
>>
>> currently I would have the problem that I don't have too much time to move to
>> another hypervisor (tests may or may not be useful there with substantial
>> changes beside this one) with our next release being close.
>> But I think I got a usable backport of your change to 4.1.1 (you think it looks
>> ok?) and have given that a quick test which seems to be ok...
>> Though one drawback is that I don't have a setup which would use passthrough, so
>> that path is not tested. I think I did see (with a debugging version) that the
>> lost count was incremented and decremented in dom0, though.
>>
>
> Honestly if you have to commit to a backport for your package right now,
> I would go for the previous version, because it is simpler and less
> likely to introduce regressions.
Agreed. Well, at least I hope that, since the backport seemed to fix the issue I
saw in 4.1.1, it will give you some more confidence in the 4.2 version, with
the drawback that the passthrough path has not been tested.
-Stefan
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] xen: do not loose level interrupt notifications
2011-10-03 17:24 ` Stefano Stabellini
2011-10-03 18:13 ` Stefano Stabellini
2011-10-05 16:10 ` Stefan Bader
@ 2011-10-27 10:37 ` Stefano Stabellini
2011-10-27 11:18 ` Keir Fraser
2 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-27 10:37 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel, Jan Beulich, Stefan Bader
PV on HVM guests can lose level interrupts coming from emulated
devices: we are missing code to retry injecting a pirq into the guest if
it corresponds to a level interrupt and the interrupt has been raised
while the guest is servicing the first one.
The same thing could also happen with PV guests, including dom0, even
though it is much less likely. In the case of PV guests the scenario would
be the following:
1) a device raises a level interrupt and Xen injects it into the
guest;
2) the guest is temporarily stuck: it does not ack it or EOI it;
3) the Xen timer kicks in and EOIs the interrupt;
4) the device thinks it is all fine and sends a second interrupt;
5) Xen fails to inject the second interrupt into the guest because the
guest still has the event channel pending bit set;
at this point the guest loses the second interrupt notification, which
is not supposed to happen with level interrupts and might cause
problems with some devices.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
diff -r bf533533046c xen/arch/x86/hvm/irq.c
--- a/xen/arch/x86/hvm/irq.c Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/hvm/irq.c Mon Oct 03 16:54:51 2011 +0000
@@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
if ( hvm_domain_use_pirq(d, pirq) )
{
- send_guest_pirq(d, pirq);
+ if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
+ pirq->lost++;
return;
}
vioapic_irq_positive_edge(d, ioapic_gsi);
@@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
{
struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
unsigned int gsi, link, isa_irq;
+ struct pirq *pirq;
ASSERT((device <= 31) && (intx <= 3));
@@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
gsi = hvm_pci_intx_gsi(device, intx);
if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
assert_gsi(d, gsi);
+ else {
+ pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
+ if ( hvm_domain_use_pirq(d, pirq) )
+ pirq->lost++;
+ }
link = hvm_pci_intx_link(device, intx);
isa_irq = hvm_irq->pci_link.route[link];
diff -r bf533533046c xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/irq.c Mon Oct 03 16:54:51 2011 +0000
@@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
!test_and_set_bool(pirq->masked) )
action->in_flight++;
if ( !hvm_do_IRQ_dpci(d, pirq) )
- send_guest_pirq(d, pirq);
+ {
+ if ( send_guest_pirq(d, pirq) &&
+ action->ack_type == ACKTYPE_EOI )
+ pirq->lost++;
+ }
}
if ( action->ack_type != ACKTYPE_NONE )
diff -r bf533533046c xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/physdev.c Mon Oct 03 16:54:51 2011 +0000
@@ -11,6 +11,7 @@
#include <asm/current.h>
#include <asm/io_apic.h>
#include <asm/msi.h>
+#include <asm/hvm/irq.h>
#include <asm/hypercall.h>
#include <public/xen.h>
#include <public/physdev.h>
@@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
if ( !is_hvm_domain(v->domain) ||
domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
pirq_guest_eoi(pirq);
+ if ( pirq->lost > 0) {
+ if ( !send_guest_pirq(v->domain, pirq) )
+ pirq->lost--;
+ }
spin_unlock(&v->domain->event_lock);
ret = 0;
break;
@@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
break;
irq_status_query.flags = 0;
if ( is_hvm_domain(v->domain) &&
- domain_pirq_to_irq(v->domain, irq) <= 0 )
+ domain_pirq_to_irq(v->domain, irq) <= 0 &&
+ domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
{
- ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+ ret = -EINVAL;
break;
}
diff -r bf533533046c xen/include/xen/irq.h
--- a/xen/include/xen/irq.h Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/include/xen/irq.h Mon Oct 03 16:54:51 2011 +0000
@@ -146,6 +146,7 @@ struct pirq {
int pirq;
u16 evtchn;
bool_t masked;
+ u32 lost;
struct rcu_head rcu_head;
struct arch_pirq arch;
};
* Re: [PATCH] xen: do not loose level interrupt notifications
2011-10-27 10:37 ` [PATCH] xen: do not loose level interrupt notifications Stefano Stabellini
@ 2011-10-27 11:18 ` Keir Fraser
2011-10-27 11:42 ` Stefano Stabellini
0 siblings, 1 reply; 20+ messages in thread
From: Keir Fraser @ 2011-10-27 11:18 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel, Stefan Bader, Jan Beulich
On 27/10/2011 11:37, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com>
wrote:
> PV on HVM guests can lose level interrupts coming from emulated
> devices: we are missing code to retry to inject a pirq in the guest if
> it corresponds to a level interrupt and the interrupt has been raised
> while the guest is servicing the first one.
>
> The same thing could also happen with PV guests, including dom0, even
> though it is much more unlikely. In case of PV guests the scenario would
> be the following:
>
> 1) a device raises a level interrupt and xen injects it into the
> guest;
>
> 2) the guest is temporarily stuck: it does not ack it or eoi it;
>
> 3) the xen timer kicks in and eois the interrupt;
>
> 4) the device thinks it is all fine and sends a second interrupt;
>
> 5) Xen fails to inject the second interrupt into the guest because the
> guest has still the event channel pending bit set;
>
> at this point the guest loses the second interrupt notification, which
> is not supposed to happen with level interrupts and it might cause
> problems with some devices.
You can't really lose a level-triggered interrupt. In step (4) the device
isn't really actively involved in sending another interrupt -- it never
deasserted its INTx line, and nor will it until the guest's ISR quenches the
interrupt at the device. If the guest misses such an interrupt, and doesn't
execute the relevant ISR when it should, then another interrupt will simply
be raised by the interrupt controller when the guest does finally EOI the
interrupt. Because the device is *still* asserting the line.
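The behaviour described above can be sketched as a toy C model of a single level-triggered line. This is only an illustration under stated assumptions: the struct and function names are hypothetical and none of this is Xen or interrupt-controller code. The point it shows is that an EOI with the line still asserted leads to another delivery, so the notification is not lost.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one level-triggered interrupt line (hypothetical names). */
struct level_line {
    bool asserted;          /* the device is still driving its INTx line */
    bool in_service;        /* delivered to the guest, no EOI seen yet */
    unsigned int delivered; /* number of deliveries so far */
};

/* Deliver an interrupt if the line is asserted and none is in service. */
static void line_evaluate(struct level_line *l)
{
    if (l->asserted && !l->in_service) {
        l->in_service = true;
        l->delivered++;
    }
}

/* The device raises its line; it stays raised until quenched. */
static void device_assert(struct level_line *l)
{
    l->asserted = true;
    line_evaluate(l);
}

/* The guest's ISR quenches the interrupt at the device. */
static void device_quench(struct level_line *l)
{
    l->asserted = false;
}

/* The guest EOIs; if the line is still asserted it is raised again. */
static void guest_eoi(struct level_line *l)
{
    assert(l->in_service);
    l->in_service = false;
    line_evaluate(l);
}
```

In this model, calling guest_eoi() without a prior device_quench() bumps the delivery count again, whereas quenching first leaves the line idle after the EOI.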
Well, that's the PV case anyway. I don't see any problem with our handling
of the PV case.
Is PV-HVM so different?
-- Keir
* Re: [PATCH] xen: do not loose level interrupt notifications
2011-10-27 11:18 ` Keir Fraser
@ 2011-10-27 11:42 ` Stefano Stabellini
2011-10-27 12:17 ` Keir Fraser
0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-27 11:42 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel, Stefan Bader, Jan Beulich, Stefano Stabellini
On Thu, 27 Oct 2011, Keir Fraser wrote:
> On 27/10/2011 11:37, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com>
> wrote:
>
> > PV on HVM guests can lose level interrupts coming from emulated
> > devices: we are missing code to retry to inject a pirq in the guest if
> > it corresponds to a level interrupt and the interrupt has been raised
> > while the guest is servicing the first one.
> >
> > The same thing could also happen with PV guests, including dom0, even
> > though it is much more unlikely. In case of PV guests the scenario would
> > be the following:
> >
> > 1) a device raises a level interrupt and xen injects it into the
> > guest;
> >
> > 2) the guest is temporarily stuck: it does not ack it or eoi it;
> >
> > 3) the xen timer kicks in and eois the interrupt;
> >
> > 4) the device thinks it is all fine and sends a second interrupt;
> >
> > 5) Xen fails to inject the second interrupt into the guest because the
> > guest has still the event channel pending bit set;
> >
> > at this point the guest loses the second interrupt notification, which
> > is not supposed to happen with level interrupts and it might cause
> > problems with some devices.
>
> You can't really lose a level-triggered interrupt. In step (4) the device
> isn't really actively involved in sending another interrupt -- it never
> deasserted its INTx line, and nor will it until the guest's ISR quenches the
> interrupt at the device. If the guest misses such an interrupt, and doesn't
> execute the relevant ISR when it should, then another interrupt will simply
> be raised by the interrupt controller when the guest does finally EOI the
> interrupt. Because the device is *still* asserting the line.
>
> Well, that's the PV case anyway. I don't see any problem with our handling
> of the PV case.
OK, you convinced me.
> Is PV-HVM so different?
Yes, it is. In the PV on HVM case we need to reassert an emulated
interrupt if the guest EOIs it without quenching the interrupt in the
ISR.
We are doing it already in the emulated code path but we are not doing
it for interrupts that have been remapped into pirqs.
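As a rough illustration of the missing step, here is a toy C model of the decision taken when the guest EOIs a remapped emulated interrupt. It is a sketch only: the toy_ names and constants are hypothetical stand-ins, the assert counter merely mirrors the role of gsi_assert_count, and GSIs below the ISA limit are treated as edge-triggered as an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_ISAIRQS 16 /* GSIs below this are treated as edge-triggered */
#define TOY_NR_GSIS    48

/* Hypothetical per-domain state for emulated interrupt lines. */
struct toy_irq_state {
    unsigned int assert_count[TOY_NR_GSIS];  /* emulated INTx assertions */
    unsigned int notifications[TOY_NR_GSIS]; /* event notifications sent */
};

/* An emulated device asserts its line; notify only on the 0 -> 1 edge. */
static void toy_intx_assert(struct toy_irq_state *s, unsigned int gsi)
{
    assert(gsi < TOY_NR_GSIS);
    if (s->assert_count[gsi]++ == 0)
        s->notifications[gsi]++;
}

/* An emulated device deasserts its line. */
static void toy_intx_deassert(struct toy_irq_state *s, unsigned int gsi)
{
    assert(gsi < TOY_NR_GSIS && s->assert_count[gsi] > 0);
    s->assert_count[gsi]--;
}

/*
 * The guest EOIs the remapped interrupt. If it is a level interrupt and
 * the line is still asserted, send another notification.
 * Returns true when a notification was re-sent.
 */
static bool toy_guest_eoi(struct toy_irq_state *s, unsigned int gsi)
{
    assert(gsi < TOY_NR_GSIS);
    if (gsi >= TOY_NR_ISAIRQS && s->assert_count[gsi] > 0) {
        s->notifications[gsi]++;
        return true;
    }
    return false;
}
```

The model keeps the re-injection check in the EOI path so that no separate "lost" counter is needed: the assert count alone says whether the line is still raised.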
That said, if we don't care about the PV case we can simplify the patch.
I am appending a new one that only takes care of the PV on HVM case.
---
xen: re-inject emulated level pirqs in PV on HVM guests if still asserted
PV on HVM guests can lose level interrupts coming from emulated devices
if they have been remapped onto event channels. The reason is that we
are missing the code to inject a pirq again into the guest when the guest
EOIs it, if it corresponds to an emulated level interrupt and the
interrupt is still asserted.
Fix this issue and also return an error when the guest tries to get the
irq_status of a non-existent pirq.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
diff -r c681dd5aecf3 xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c Tue Oct 25 19:22:09 2011 +0100
+++ b/xen/arch/x86/physdev.c Thu Oct 27 11:30:55 2011 +0000
@@ -11,6 +11,7 @@
#include <asm/current.h>
#include <asm/io_apic.h>
#include <asm/msi.h>
+#include <asm/hvm/irq.h>
#include <asm/hypercall.h>
#include <public/xen.h>
#include <public/physdev.h>
@@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
if ( !is_hvm_domain(v->domain) ||
domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
pirq_guest_eoi(pirq);
+ if ( is_hvm_domain(v->domain) &&
+ domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
+ {
+ struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
+ int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
+
+ /* if this is a level irq and count > 0, send another
+ * notification */
+ if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
+ && hvm_irq->gsi_assert_count[gsi] )
+ send_guest_pirq(v->domain, pirq);
+ }
spin_unlock(&v->domain->event_lock);
ret = 0;
break;
@@ -328,9 +341,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
break;
irq_status_query.flags = 0;
if ( is_hvm_domain(v->domain) &&
- domain_pirq_to_irq(v->domain, irq) <= 0 )
+ domain_pirq_to_irq(v->domain, irq) <= 0 &&
+ domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
{
- ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+ ret = -EINVAL;
break;
}
* Re: [PATCH] xen: do not loose level interrupt notifications
2011-10-27 11:42 ` Stefano Stabellini
@ 2011-10-27 12:17 ` Keir Fraser
0 siblings, 0 replies; 20+ messages in thread
From: Keir Fraser @ 2011-10-27 12:17 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel, Stefan Bader, Jan Beulich
On 27/10/2011 12:42, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com>
wrote:
>> Is PV-HVM so different?
>
> Yes, it is. In the PV on HVM case we need to reassert an emulated
> interrupt if the guest EOIs it without quenching the interrupt in the
> ISR.
> We are doing it already in the emulated code path but we are not doing
> it for interrupts that have been remapped into pirqs.
>
> That said, if we don't care about the PV case we can simplify the patch,
> I am appending a new one that only takes care of the PV on HVM case.
Ah yes, when we are *emulating* an INTx line, either for an emulated device
or because we are doing MSI-INTx emulation, we do have to remember to
reassert. That makes sense.
-- Keir
* Still struggling with HVM: tx timeouts on emulated nics
@ 2011-09-21 13:03 Stefan Bader
2011-09-21 13:31 ` Stefano Stabellini
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-09-21 13:03 UTC (permalink / raw)
To: xen-devel, Stefano Stabellini
[-- Attachment #1: Type: text/plain, Size: 1282 bytes --]
This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
gets configured via dhcp. And initial pings also get routed and done correctly.
But slightly higher traffic (like checking for updates) hangs. And after a while
there are messages about tx timeouts.
The ne2k_pci type nic almost immediately has those issues and never comes up
correctly.
I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
this should look, but both NICs get configured with level, low IRQs. Disk emulation
seems to be OK, but that seems to use IO-APIC-edge. And any other IRQs seem to be
at least not level.
Btw, what exactly is the difference between xen-pirq-ioapic and IO-APIC?
Another problem came up recently, though that may just be me doing the wrong
thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated
devices. xen-blkfront is a module in my case and I thought I had once been able
to use that by removing the unplug arg and making the blkfront driver load. But
when I recently tried, the module loaded but no disks appeared... Again, not sure
whether I just forgot how to do that right or whether that was different when
still using a 4.1.0 hypervisor...
-Stefan
[-- Attachment #2: hvm-nic-tx-dmesg.txt --]
[-- Type: text/plain, Size: 43114 bytes --]
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.0.0-11-server (buildd@crested) (gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu2) ) #18-Ubuntu SMP Wed Sep 14 01:20:37 UTC 2011 (Ubuntu 3.0.0-11.18-server 3.0.4)
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.0.0-11-server root=/dev/mapper/oneiric--server64-root ro debug apic=debug xen_emul_unplug=unnecessary
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009e000 (usable)
[ 0.000000] BIOS-e820: 000000000009e000 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000007f800000 (usable)
[ 0.000000] BIOS-e820: 00000000fc000000 - 0000000100000000 (reserved)
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI 2.4 present.
[ 0.000000] DMI: Xen HVM domU, BIOS 4.1.1 09/01/2011
[ 0.000000] Hypervisor detected: Xen HVM
[ 0.000000] Xen version 4.1.
[ 0.000000] Xen Platform PCI: I/O protocol version 1
[ 0.000000] HVMOP_pagetable_dying not supported
[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[ 0.000000] No AGP bridge found
[ 0.000000] last_pfn = 0x7f800 max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: write-back
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF write-combining
[ 0.000000] C0000-FFFFF write-back
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 0000F0000000 mask FFFFF8000000 uncachable
[ 0.000000] 1 base 0000F8000000 mask FFFFFC000000 uncachable
[ 0.000000] 2 disabled
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] TOM2: 0000000820000000 aka 33280M
[ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[ 0.000000] Scan SMP from ffff880000000000 for 1024 bytes.
[ 0.000000] Scan SMP from ffff88000009fc00 for 1024 bytes.
[ 0.000000] Scan SMP from ffff8800000f0000 for 65536 bytes.
[ 0.000000] found SMP MP-table at [ffff8800000fbcb0] fbcb0
[ 0.000000] mpc: fbba0-fbcac
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 20480
[ 0.000000] init_memory_mapping: 0000000000000000-000000007f800000
[ 0.000000] 0000000000 - 007f800000 page 2M
[ 0.000000] kernel direct mapping tables up to 7f800000 @ 7f7fd000-7f800000
[ 0.000000] RAMDISK: 364d4000 - 37262000
[ 0.000000] ACPI: RSDP 00000000000ea020 00024 (v02 Xen)
[ 0.000000] ACPI: XSDT 00000000fc0134f0 0003C (v01 Xen HVM 00000000 HVML 00000000)
[ 0.000000] ACPI: FACP 00000000fc0132d0 000F4 (v04 Xen HVM 00000000 HVML 00000000)
[ 0.000000] ACPI: DSDT 00000000fc003440 0FE05 (v02 Xen HVM 00000000 INTL 20100528)
[ 0.000000] ACPI: FACS 00000000fc003400 00040
[ 0.000000] ACPI: APIC 00000000fc0133d0 000D8 (v02 Xen HVM 00000000 HVML 00000000)
[ 0.000000] ACPI: HPET 00000000fc0134b0 00038 (v01 Xen HVM 00000000 HVML 00000000)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] mapped APIC to ffffffffff5fc000 ( fee00000)
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at 0000000000000000-000000007f800000
[ 0.000000] Initmem setup node 0 0000000000000000-000000007f800000
[ 0.000000] NODE_DATA [000000007f7f8000 - 000000007f7fcfff]
[ 0.000000] [ffffea0000000000-ffffea0001bfffff] PMD -> [ffff88007ce00000-ffff88007e9fffff] on node 0
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000010 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal empty
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 0: 0x00000010 -> 0x0000009e
[ 0.000000] 0: 0x00000100 -> 0x0007f800
[ 0.000000] On node 0 totalpages: 522126
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 5 pages reserved
[ 0.000000] DMA zone: 3921 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 7084 pages used for memmap
[ 0.000000] DMA32 zone: 511060 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] mapped APIC to ffffffffff5fc000 ( fee00000)
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x08] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x0a] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x0c] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x0e] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x10] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x12] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x14] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x16] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x18] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x1a] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x1c] disabled)
[ 0.000000] ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-47
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 00, APIC ID 1, APIC INT 02
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 low level)
[ 0.000000] Int: type 0, pol 3, trig 3, bus 00, IRQ 05, APIC ID 1, APIC INT 05
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 low level)
[ 0.000000] Int: type 0, pol 3, trig 3, bus 00, IRQ 0a, APIC ID 1, APIC INT 0a
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 low level)
[ 0.000000] Int: type 0, pol 3, trig 3, bus 00, IRQ 0b, APIC ID 1, APIC INT 0b
[ 0.000000] Int: type 0, pol 3, trig 3, bus 00, IRQ 09, APIC ID 1, APIC INT 09
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 01, APIC ID 1, APIC INT 01
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 03, APIC ID 1, APIC INT 03
[ 0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 04, APIC ID 1, APIC INT 04
[ 0.000000] ACPI: IRQ5 used by override.
[ 0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 06, APIC ID 1, APIC INT 06
[ 0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 07, APIC ID 1, APIC INT 07
[ 0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 08, APIC ID 1, APIC INT 08
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] ACPI: IRQ10 used by override.
[ 0.000000] ACPI: IRQ11 used by override.
[ 0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0c, APIC ID 1, APIC INT 0c
[ 0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0d, APIC ID 1, APIC INT 0d
[ 0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0e, APIC ID 1, APIC INT 0e
[ 0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0f, APIC ID 1, APIC INT 0f
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] SMP: Allowing 15 CPUs, 11 hotplug CPUs
[ 0.000000] mapped IOAPIC to ffffffffff5fb000 (fec00000)
[ 0.000000] nr_irqs_gsi: 64
[ 0.000000] PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
[ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
[ 0.000000] PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
[ 0.000000] Allocating PCI resources starting at 7f800000 (gap: 7f800000:7c800000)
[ 0.000000] Booting paravirtualized kernel on Xen HVM
[ 0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:15 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 27 pages/cpu @ffff88007f400000 s79616 r8192 d22784 u131072
[ 0.000000] pcpu-alloc: s79616 r8192 d22784 u131072 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 --
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 514981
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.0.0-11-server root=/dev/mapper/oneiric--server64-root ro debug apic=debug xen_emul_unplug=unnecessary
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Calgary: detecting Calgary via BIOS EBDA area
[ 0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[ 0.000000] Memory: 2028616k/2088960k available (6186k kernel code, 456k absent, 59888k reserved, 6936k data, 900k init)
[ 0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=15, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:16640 nr_irqs:1208 16
[ 0.000000] Xen HVM callback vector for event delivery is enabled
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] allocated 16777216 bytes of page_cgroup
[ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[ 0.000000] hpet clockevent registered
[ 0.000000] Detected 2000.196 MHz processor.
[ 0.000000] Marking TSC unstable due to TSCs unsynchronized
[ 0.020000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4000.39 BogoMIPS (lpj=20001960)
[ 0.020000] pid_max: default: 32768 minimum: 301
[ 0.020000] Security Framework initialized
[ 0.020000] AppArmor: AppArmor initialized
[ 0.020000] Yama: becoming mindful.
[ 0.020000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 0.020000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.020000] Mount-cache hash table entries: 256
[ 0.020000] Initializing cgroup subsys cpuacct
[ 0.020000] Initializing cgroup subsys memory
[ 0.020000] Initializing cgroup subsys devices
[ 0.020000] Initializing cgroup subsys freezer
[ 0.020000] Initializing cgroup subsys net_cls
[ 0.020000] Initializing cgroup subsys blkio
[ 0.020000] Initializing cgroup subsys perf_event
[ 0.020000] tseg: 00dff00000
[ 0.020000] CPU: Physical Processor ID: 0
[ 0.020000] CPU: Processor Core ID: 0
[ 0.020000] mce: CPU supports 6 MCE banks
[ 0.020000] using AMD E400 aware idle routine
[ 0.020000] ACPI: Core revision 20110413
[ 0.020671] ftrace: allocating 26000 entries in 102 pages
[ 0.040103] Getting VERSION: 50014
[ 0.040118] Getting VERSION: 50014
[ 0.040126] Getting ID: 0
[ 0.040133] Getting ID: ff000000
[ 0.040141] Getting LVT0: 10700
[ 0.040147] Getting LVT1: 400
[ 0.041801] x2apic not enabled, IRQ remapping init failed
[ 0.041807] Switched APIC routing to physical flat.
[ 0.041873] masked ExtINT on CPU#0
[ 0.043550] ENABLING IO-APIC IRQs
[ 0.043553] init IO_APIC IRQs
[ 0.043556] apic 1 pin 0 not connected
[ 0.043567] IOAPIC[0]: Set routing entry (1-1 -> 0x31 -> IRQ 1 Mode:0 Active:0)
[ 0.043594] IOAPIC[0]: Set routing entry (1-2 -> 0x30 -> IRQ 0 Mode:0 Active:0)
[ 0.050003] IOAPIC[0]: Set routing entry (1-3 -> 0x33 -> IRQ 3 Mode:0 Active:0)
[ 0.050029] IOAPIC[0]: Set routing entry (1-4 -> 0x34 -> IRQ 4 Mode:0 Active:0)
[ 0.050054] IOAPIC[0]: Set routing entry (1-5 -> 0x35 -> IRQ 5 Mode:1 Active:1)
[ 0.050079] IOAPIC[0]: Set routing entry (1-6 -> 0x36 -> IRQ 6 Mode:0 Active:0)
[ 0.050104] IOAPIC[0]: Set routing entry (1-7 -> 0x37 -> IRQ 7 Mode:0 Active:0)
[ 0.050129] IOAPIC[0]: Set routing entry (1-8 -> 0x38 -> IRQ 8 Mode:0 Active:0)
[ 0.050154] IOAPIC[0]: Set routing entry (1-9 -> 0x39 -> IRQ 9 Mode:1 Active:1)
[ 0.050179] IOAPIC[0]: Set routing entry (1-10 -> 0x3a -> IRQ 10 Mode:1 Active:1)
[ 0.050204] IOAPIC[0]: Set routing entry (1-11 -> 0x3b -> IRQ 11 Mode:1 Active:1)
[ 0.050229] IOAPIC[0]: Set routing entry (1-12 -> 0x3c -> IRQ 12 Mode:0 Active:0)
[ 0.050254] IOAPIC[0]: Set routing entry (1-13 -> 0x3d -> IRQ 13 Mode:0 Active:0)
[ 0.050279] IOAPIC[0]: Set routing entry (1-14 -> 0x3e -> IRQ 14 Mode:0 Active:0)
[ 0.050304] IOAPIC[0]: Set routing entry (1-15 -> 0x3f -> IRQ 15 Mode:0 Active:0)
[ 0.050328] apic 1 pin 16 not connected
[ 0.050332] apic 1 pin 17 not connected
[ 0.050335] apic 1 pin 18 not connected
[ 0.050338] apic 1 pin 19 not connected
[ 0.050341] apic 1 pin 20 not connected
[ 0.050345] apic 1 pin 21 not connected
[ 0.050348] apic 1 pin 22 not connected
[ 0.050351] apic 1 pin 23 not connected
[ 0.050354] apic 1 pin 24 not connected
[ 0.050358] apic 1 pin 25 not connected
[ 0.050361] apic 1 pin 26 not connected
[ 0.050364] apic 1 pin 27 not connected
[ 0.050367] apic 1 pin 28 not connected
[ 0.050370] apic 1 pin 29 not connected
[ 0.050374] apic 1 pin 30 not connected
[ 0.050377] apic 1 pin 31 not connected
[ 0.050380] apic 1 pin 32 not connected
[ 0.050383] apic 1 pin 33 not connected
[ 0.050387] apic 1 pin 34 not connected
[ 0.050390] apic 1 pin 35 not connected
[ 0.050393] apic 1 pin 36 not connected
[ 0.050396] apic 1 pin 37 not connected
[ 0.050399] apic 1 pin 38 not connected
[ 0.050403] apic 1 pin 39 not connected
[ 0.050406] apic 1 pin 40 not connected
[ 0.050409] apic 1 pin 41 not connected
[ 0.050412] apic 1 pin 42 not connected
[ 0.050416] apic 1 pin 43 not connected
[ 0.050419] apic 1 pin 44 not connected
[ 0.050422] apic 1 pin 45 not connected
[ 0.050425] apic 1 pin 46 not connected
[ 0.050428] apic 1 pin 47 not connected
[ 0.050566] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=0 pin2=0
[ 0.151653] CPU0: AMD Opteron(tm) Processor 6128 stepping 01
[ 0.151670] Xen: using vcpuop timer interface
[ 0.151686] installing Xen timer for CPU 0
[ 0.151766] Performance Events: Broken PMU hardware detected, using software events only.
[ 0.152552] Booting Node 0, Processors #1
[ 0.152556] smpboot cpu 1: start_ip = 99000
[ 0.020000] masked ExtINT on CPU#1
[ 0.310459] installing Xen timer for CPU 1
[ 0.310616] #2
[ 0.310624] smpboot cpu 2: start_ip = 99000
[ 0.020000] masked ExtINT on CPU#2
[ 0.470391] installing Xen timer for CPU 2
[ 0.470506] #3
[ 0.470511] smpboot cpu 3: start_ip = 99000
[ 0.020000] masked ExtINT on CPU#3
[ 0.630436] Brought up 4 CPUs
[ 0.630431] installing Xen timer for CPU 3
[ 0.630454] Total of 4 processors activated (16072.03 BogoMIPS).
[ 0.631461] devtmpfs: initialized
[ 0.631639] print_constraints: dummy:
[ 0.631669] Time: 12:04:45 Date: 09/21/11
[ 0.631719] NET: Registered protocol family 16
[ 0.631836] Trying to unpack rootfs image as initramfs...
[ 0.653188] Extended Config Space enabled on 0 nodes
[ 0.653251] ACPI: bus type pci registered
[ 0.653633] PCI: Using configuration type 1 for base access
[ 0.653638] PCI: Using configuration type 1 for extended access
[ 0.654361] bio: create slab <bio-0> at 0
[ 0.654361] ACPI: EC: Look up EC in DSDT
[ 0.663724] ACPI: Interpreter enabled
[ 0.663731] ACPI: (supports S0 S3 S4 S5)
[ 0.663750] ACPI: Using IOAPIC for interrupt routing
[ 0.778431] ACPI: No dock devices found.
[ 0.778438] HEST: Table not found.
[ 0.778443] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 0.778513] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 0.778634] pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7]
[ 0.778640] pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff]
[ 0.778645] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
[ 0.778651] pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xfbffffff]
[ 0.778829] pci 0000:00:00.0: [8086:1237] type 0 class 0x000600
[ 0.780379] pci 0000:00:01.0: [8086:7000] type 0 class 0x000601
[ 0.782861] pci 0000:00:01.1: [8086:7010] type 0 class 0x000101
[ 0.784385] pci 0000:00:01.1: reg 20: [io 0xc300-0xc30f]
[ 0.785463] pci 0000:00:01.3: [8086:7113] type 0 class 0x000680
[ 0.785515] * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
[ 0.785517] * this clock source is slow. Consider trying other clock sources
[ 0.787717] pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
[ 0.788460] pci 0000:00:02.0: [1013:00b8] type 0 class 0x000300
[ 0.788835] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
[ 0.789128] pci 0000:00:02.0: reg 14: [mem 0xf3000000-0xf3000fff]
[ 0.790919] pci 0000:00:03.0: [5853:0001] type 0 class 0x00ff80
[ 0.791419] pci 0000:00:03.0: reg 10: [io 0xc000-0xc0ff]
[ 0.791747] pci 0000:00:03.0: reg 14: [mem 0xf2000000-0xf2ffffff pref]
[ 0.793759] pci 0000:00:04.0: [10ec:8139] type 0 class 0x000200
[ 0.794209] pci 0000:00:04.0: reg 10: [io 0xc100-0xc1ff]
[ 0.794538] pci 0000:00:04.0: reg 14: [mem 0xf3001000-0xf30010ff]
[ 0.796559] pci 0000:00:05.0: [10ec:8029] type 0 class 0x000200
[ 0.797009] pci 0000:00:05.0: reg 10: [io 0xc200-0xc2ff]
[ 0.800179] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 0.800570] pci0000:00: Requesting ACPI _OSC control (0x1d)
[ 0.800577] pci0000:00: ACPI _OSC request failed (AE_NOT_FOUND), returned control mask: 0x1d
[ 0.800582] ACPI _OSC control for PCIe not granted, disabling ASPM
[ 0.935017] Freeing initrd memory: 13880k freed
[ 0.986014] ACPI: PCI Interrupt Link [LNKA] (IRQs *5 10 11)
[ 0.986234] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[ 0.986432] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[ 0.986626] ACPI: PCI Interrupt Link [LNKD] (IRQs *5 10 11)
[ 0.986762] xen/balloon: Initialising balloon driver.
[ 0.986768] last_pfn = 0x7f800 max_arch_pfn = 0x400000000
[ 0.986778] xen-balloon: Initialising balloon driver.
[ 0.986905] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.986913] vgaarb: loaded
[ 0.986916] vgaarb: bridge control possible 0000:00:02.0
[ 0.987107] SCSI subsystem initialized
[ 0.987136] libata version 3.00 loaded.
[ 0.987136] usbcore: registered new interface driver usbfs
[ 0.987136] usbcore: registered new interface driver hub
[ 0.987136] usbcore: registered new device driver usb
[ 0.987136] PCI: Using ACPI for IRQ routing
[ 0.987136] PCI: pci_cache_line_size set to 64 bytes
[ 0.987136] reserve RAM buffer: 000000000009e000 - 000000000009ffff
[ 0.987136] reserve RAM buffer: 000000007f800000 - 000000007fffffff
[ 0.987136] NetLabel: Initializing
[ 0.987136] NetLabel: domain hash size = 128
[ 0.987136] NetLabel: protocols = UNLABELED CIPSOv4
[ 0.987136] NetLabel: unlabeled traffic allowed by default
[ 0.987136]
[ 0.987136] printing PIC contents
[ 0.987136] ... PIC IMR: ffff
[ 0.987136] ... PIC IRR: 0001
[ 0.987136] ... PIC ISR: 0000
[ 0.987136] ... PIC ELCR: 0c20
[ 0.987136] printing local APIC contents on CPU#0/0:
[ 0.987136] ... APIC ID: 00000000 (0)
[ 0.987136] ... APIC VERSION: 00050014
[ 0.987136] ... APIC TASKPRI: 00000000 (00)
[ 0.987136] ... APIC PROCPRI: 00000000
[ 0.987136] ... APIC LDR: 01000000
[ 0.987136] ... APIC DFR: ffffffff
[ 0.987136] ... APIC SPIV: 000001ff
[ 0.987136] ... APIC ISR field:
[ 0.987136] 0000000000000000000000000000000000000000000000000000000000000000
[ 0.987136] ... APIC TMR field:
[ 0.987136] 0000000000000000000000000000000000000000000000000000000000000000
[ 0.987136] ... APIC IRR field:
[ 0.987136] 0000000000000000000000000000000000000000000000000000000000000000
[ 0.987136] ... APIC ESR: 00000000
[ 0.987136] ... APIC ICR: 00000699
[ 0.987136] ... APIC ICR2: 06000000
[ 0.987136] ... APIC LVTT: 00010000
[ 0.987136] ... APIC LVTPC: 00010000
[ 0.987136] ... APIC LVT0: 00010700
[ 0.987136] ... APIC LVT1: 00000400
[ 0.987136] ... APIC LVTERR: 000000fe
[ 0.987136] ... APIC TMICT: 00000000
[ 0.987136] ... APIC TMCCT: 00000000
[ 0.987136] ... APIC TDCR: 00000000
[ 0.987136]
[ 0.987136] number of MP IRQ sources: 15.
[ 0.987136] number of IO-APIC #1 registers: 48.
[ 0.987136] testing the IO APIC.......................
[ 0.987136]
[ 0.987136] IO APIC #1......
[ 0.987136] .... register #00: 00000000
[ 0.987136] ....... : physical APIC id: 00
[ 0.987136] ....... : Delivery Type: 0
[ 0.987136] ....... : LTS : 0
[ 0.987136] .... register #01: 002F0011
[ 0.987136] ....... : max redirection entries: 002F
[ 0.987136] ....... : PRQ implemented: 0
[ 0.987136] ....... : IO APIC version: 0011
[ 0.987136] .... register #02: 00000000
[ 0.987136] ....... : arbitration: 00
[ 0.987136] .... IRQ redirection table:
[ 0.987136] NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
[ 0.987136] 00 000 1 0 0 0 0 0 0 00
[ 0.987136] 01 000 0 0 0 0 0 0 0 31
[ 0.987136] 02 000 0 0 0 0 0 0 0 30
[ 0.987136] 03 000 0 0 0 0 0 0 0 33
[ 0.987136] 04 000 0 0 0 0 0 0 0 34
[ 0.987136] 05 000 1 1 0 1 0 0 0 35
[ 0.987136] 06 000 0 0 0 0 0 0 0 36
[ 0.987136] 07 000 0 0 0 0 0 0 0 37
[ 0.987136] 08 000 0 0 0 0 0 0 0 38
[ 0.987136] 09 000 0 1 0 1 0 0 0 39
[ 0.987136] 0a 000 1 1 0 1 0 0 0 3A
[ 0.987136] 0b 000 1 1 0 1 0 0 0 3B
[ 0.987136] 0c 000 0 0 0 0 0 0 0 3C
[ 0.987136] 0d 000 0 0 0 0 0 0 0 3D
[ 0.987136] 0e 000 0 0 0 0 0 0 0 3E
[ 0.987136] 0f 000 0 0 0 0 0 0 0 3F
[ 0.987136] 10 000 1 0 0 0 0 0 0 00
[ 0.987136] 11 000 1 0 0 0 0 0 0 00
[ 0.987136] 12 000 1 0 0 0 0 0 0 00
[ 0.987136] 13 000 1 0 0 0 0 0 0 00
[ 0.987136] 14 000 1 0 0 0 0 0 0 00
[ 0.987136] 15 000 1 0 0 0 0 0 0 00
[ 0.987136] 16 000 1 0 0 0 0 0 0 00
[ 0.987136] 17 000 1 0 0 0 0 0 0 00
[ 0.987136] 18 000 1 0 0 0 0 0 0 00
[ 0.987136] 19 000 1 0 0 0 0 0 0 00
[ 0.987136] 1a 000 1 0 0 0 0 0 0 00
[ 0.987136] 1b 000 1 0 0 0 0 0 0 00
[ 0.987136] 1c 000 1 0 0 0 0 0 0 00
[ 0.987136] 1d 000 1 0 0 0 0 0 0 00
[ 0.987136] 1e 000 1 0 0 0 0 0 0 00
[ 0.987136] 1f 000 1 0 0 0 0 0 0 00
[ 0.987136] 20 000 1 0 0 0 0 0 0 00
[ 0.987136] 21 000 1 0 0 0 0 0 0 00
[ 0.987136] 22 000 1 0 0 0 0 0 0 00
[ 0.987136] 23 000 1 0 0 0 0 0 0 00
[ 0.987136] 24 000 1 0 0 0 0 0 0 00
[ 0.987136] 25 000 1 0 0 0 0 0 0 00
[ 0.987136] 26 000 1 0 0 0 0 0 0 00
[ 0.987136] 27 000 1 0 0 0 0 0 0 00
[ 0.987136] 28 000 1 0 0 0 0 0 0 00
[ 0.987136] 29 000 1 0 0 0 0 0 0 00
[ 0.987136] 2a 000 1 0 0 0 0 0 0 00
[ 0.987136] 2b 000 1 0 0 0 0 0 0 00
[ 0.987136] 2c 000 1 0 0 0 0 0 0 00
[ 0.987136] 2d 000 1 0 0 0 0 0 0 00
[ 0.987136] 2e 000 1 0 0 0 0 0 0 00
[ 0.990006] 2f 000 1 0 0 0 0 0 0 00
[ 0.990012] IRQ to pin mappings:
[ 0.990015] IRQ0 -> 0:2
[ 0.990019] IRQ1 -> 0:1
[ 0.990023] IRQ3 -> 0:3
[ 0.990026] IRQ4 -> 0:4
[ 0.990030] IRQ5 -> 0:5
[ 0.990033] IRQ6 -> 0:6
[ 0.990037] IRQ7 -> 0:7
[ 0.990040] IRQ8 -> 0:8
[ 0.990044] IRQ9 -> 0:9
[ 0.990048] IRQ10 -> 0:10
[ 0.990051] IRQ11 -> 0:11
[ 0.990055] IRQ12 -> 0:12
[ 0.990059] IRQ13 -> 0:13
[ 0.990062] IRQ14 -> 0:14
[ 0.990066] IRQ15 -> 0:15
[ 0.990071] .................................... done.
[ 0.990087] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[ 0.990108] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.990115] hpet0: 3 comparators, 64-bit 62.500000 MHz counter
[ 1.020042] Switching to clocksource xen
[ 1.020106] Switched to NOHz mode on CPU #2
[ 1.020419] Switched to NOHz mode on CPU #1
[ 1.026147] AppArmor: AppArmor Filesystem Enabled
[ 1.026182] pnp: PnP ACPI init
[ 1.026197] ACPI: bus type pnp registered
[ 1.026228] pnp 00:00: [mem 0x00000000-0x0009ffff]
[ 1.026267] system 00:00: [mem 0x00000000-0x0009ffff] could not be reserved
[ 1.026274] system 00:00: Plug and Play ACPI device, IDs PNP0c02 (active)
[ 1.026366] pnp 00:01: [bus 00-ff]
[ 1.026370] pnp 00:01: [io 0x0cf8-0x0cff]
[ 1.026374] pnp 00:01: [io 0x0000-0x0cf7 window]
[ 1.026378] pnp 00:01: [io 0x0d00-0xffff window]
[ 1.026382] pnp 00:01: [mem 0x000a0000-0x000bffff window]
[ 1.026387] pnp 00:01: [mem 0xf0000000-0xfbffffff window]
[ 1.026421] pnp 00:01: Plug and Play ACPI device, IDs PNP0a03 (active)
[ 1.026432] pnp 00:02: [io 0x10c0-0x1141]
[ 1.026436] pnp 00:02: [io 0xb044-0xb047]
[ 1.026463] system 00:02: [io 0x10c0-0x1141] has been reserved
[ 1.026468] system 00:02: [io 0xb044-0xb047] has been reserved
[ 1.026473] system 00:02: Plug and Play ACPI device, IDs PNP0c02 (active)
[ 1.026497] pnp 00:03: [mem 0xfed00000-0xfed003ff]
[ 1.026518] pnp 00:03: Plug and Play ACPI device, IDs PNP0103 (active)
[ 1.026538] pnp 00:04: [io 0x0010-0x001f]
[ 1.026542] pnp 00:04: [io 0x0022-0x002d]
[ 1.026545] pnp 00:04: [io 0x0030-0x003f]
[ 1.026549] pnp 00:04: [io 0x0044-0x005f]
[ 1.026552] pnp 00:04: [io 0x0062-0x0063]
[ 1.026556] pnp 00:04: [io 0x0065-0x006f]
[ 1.026559] pnp 00:04: [io 0x0072-0x007f]
[ 1.026563] pnp 00:04: [io 0x0080]
[ 1.026566] pnp 00:04: [io 0x0084-0x0086]
[ 1.026569] pnp 00:04: [io 0x0088]
[ 1.026573] pnp 00:04: [io 0x008c-0x008e]
[ 1.026576] pnp 00:04: [io 0x0090-0x009f]
[ 1.026580] pnp 00:04: [io 0x00a2-0x00bd]
[ 1.026583] pnp 00:04: [io 0x00e0-0x00ef]
[ 1.026587] pnp 00:04: [io 0x08a0-0x08a3]
[ 1.026590] pnp 00:04: [io 0x0cc0-0x0ccf]
[ 1.026594] pnp 00:04: [io 0x04d0-0x04d1]
[ 1.026629] system 00:04: [io 0x08a0-0x08a3] has been reserved
[ 1.026635] system 00:04: [io 0x0cc0-0x0ccf] has been reserved
[ 1.026639] system 00:04: [io 0x04d0-0x04d1] has been reserved
[ 1.026645] system 00:04: Plug and Play ACPI device, IDs PNP0c02 (active)
[ 1.026659] pnp 00:05: [dma 4]
[ 1.026662] pnp 00:05: [io 0x0000-0x000f]
[ 1.026666] pnp 00:05: [io 0x0081-0x0083]
[ 1.026671] pnp 00:05: [io 0x0087]
[ 1.026674] pnp 00:05: [io 0x0089-0x008b]
[ 1.026678] pnp 00:05: [io 0x008f]
[ 1.026681] pnp 00:05: [io 0x00c0-0x00df]
[ 1.026684] pnp 00:05: [io 0x0480-0x048f]
[ 1.026713] pnp 00:05: Plug and Play ACPI device, IDs PNP0200 (active)
[ 1.026725] pnp 00:06: [io 0x0070-0x0071]
[ 1.026745] xen: --> irq=8, pirq=16
[ 1.026750] pnp 00:06: [irq 8]
[ 1.026771] pnp 00:06: Plug and Play ACPI device, IDs PNP0b00 (active)
[ 1.026781] pnp 00:07: [io 0x0061]
[ 1.026803] pnp 00:07: Plug and Play ACPI device, IDs PNP0800 (active)
[ 1.026825] xen: --> irq=12, pirq=17
[ 1.026829] pnp 00:08: [irq 12]
[ 1.026850] pnp 00:08: Plug and Play ACPI device, IDs PNP0f13 (active)
[ 1.026866] pnp 00:09: [io 0x0060]
[ 1.026869] pnp 00:09: [io 0x0064]
[ 1.026878] xen: --> irq=1, pirq=18
[ 1.026881] pnp 00:09: [irq 1]
[ 1.026905] pnp 00:09: Plug and Play ACPI device, IDs PNP0303 PNP030b (active)
[ 1.026922] pnp 00:0a: [io 0x03f0-0x03f5]
[ 1.026926] pnp 00:0a: [io 0x03f7]
[ 1.026934] xen: --> irq=6, pirq=19
[ 1.026938] pnp 00:0a: [irq 6]
[ 1.026941] pnp 00:0a: [dma 2]
[ 1.026962] pnp 00:0a: Plug and Play ACPI device, IDs PNP0700 (active)
[ 1.026984] pnp 00:0b: [io 0x03f8-0x03ff]
[ 1.026992] xen: --> irq=4, pirq=20
[ 1.026996] pnp 00:0b: [irq 4]
[ 1.027018] pnp 00:0b: Plug and Play ACPI device, IDs PNP0501 (active)
[ 1.027048] pnp 00:0c: [io 0x0378-0x037f]
[ 1.027056] xen: --> irq=7, pirq=21
[ 1.027060] pnp 00:0c: [irq 7]
[ 1.027084] pnp 00:0c: Plug and Play ACPI device, IDs PNP0400 (active)
[ 1.029095] Switched to NOHz mode on CPU #0
[ 1.029911] Switched to NOHz mode on CPU #3
[ 1.098728] pnp: PnP ACPI: found 13 devices
[ 1.098736] ACPI: ACPI bus type pnp unregistered
[ 1.104769] PCI: max bus depth: 0 pci_try_num: 1
[ 1.104783] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7]
[ 1.104789] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff]
[ 1.104794] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff]
[ 1.104799] pci_bus 0000:00: resource 7 [mem 0xf0000000-0xfbffffff]
[ 1.104853] NET: Registered protocol family 2
[ 1.105001] IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 1.105753] TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
[ 1.107366] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[ 1.107785] TCP: Hash tables configured (established 262144 bind 65536)
[ 1.107790] TCP reno registered
[ 1.107801] UDP hash table entries: 1024 (order: 3, 32768 bytes)
[ 1.107829] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
[ 1.108003] NET: Registered protocol family 1
[ 1.108019] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 1.108153] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[ 1.108392] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 1.108920] pci 0000:00:02.0: Boot video device
[ 1.109529] PCI: CLS 0 bytes, default 64
[ 1.110111] audit: initializing netlink socket (disabled)
[ 1.110125] type=2000 audit(1316606687.451:1): initialized
[ 1.138892] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 1.152615] VFS: Disk quotas dquot_6.5.2
[ 1.152689] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 1.153476] fuse init (API version 7.16)
[ 1.153577] msgmni has been set to 3989
[ 1.154960] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[ 1.155114] io scheduler noop registered
[ 1.155120] io scheduler deadline registered (default)
[ 1.155179] io scheduler cfq registered
[ 1.155285] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[ 1.155313] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[ 1.155381] efifb: probing for efifb
[ 1.155497] efifb: framebuffer at 0xf0000000, mapped to 0xffffc90000900000, using 1408k, total 1408k
[ 1.155504] efifb: mode is 800x600x24, linelength=2400, pages=1
[ 1.155508] efifb: scrolling: redraw
[ 1.155512] efifb: Truecolor: size=0:8:8:8, shift=0:16:8:0
[ 1.183380] Console: switching to colour frame buffer device 100x37
[ 1.210619] fb0: EFI VGA frame buffer device
[ 1.211058] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[ 1.211668] ACPI: Power Button [PWRF]
[ 1.212013] input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
[ 1.212617] ACPI: Sleep Button [SLPF]
[ 1.212939] ACPI: acpi_idle registered with cpuidle
[ 1.291734] ERST: Table is not found!
[ 1.292124] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[ 1.339997] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 2.236900] 00:0b: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 2.238060] Linux agpgart interface v0.103
[ 2.239591] brd: module loaded
[ 2.257113] loop: module loaded
[ 2.273373] ata_piix 0000:00:01.1: version 2.13
[ 2.289886] ata_piix 0000:00:01.1: setting latency timer to 64
[ 2.306932] scsi0 : ata_piix
[ 2.323107] scsi1 : ata_piix
[ 2.338841] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc300 irq 14
[ 2.354925] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc308 irq 15
[ 2.355237] Fixed MDIO Bus: probed
[ 2.355259] PPP generic driver version 2.4.2
[ 2.355300] tun: Universal TUN/TAP device driver, 1.6
[ 2.355302] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[ 2.355380] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 2.355395] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 2.355405] uhci_hcd: USB Universal Host Controller Interface driver
[ 2.355471] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
[ 2.358680] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 2.358688] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 2.358743] mousedev: PS/2 mouse device common for all mice
[ 2.358928] rtc_cmos 00:06: rtc core: registered rtc_cmos as rtc0
[ 2.358978] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[ 2.359079] device-mapper: uevent: version 1.0.3
[ 2.359144] device-mapper: ioctl: 4.20.0-ioctl (2011-02-02) initialised: dm-devel@redhat.com
[ 2.359152] cpuidle: using governor ladder
[ 2.359154] cpuidle: using governor menu
[ 2.359156] EFI Variables Facility v0.08 2004-May-17
[ 2.359406] TCP cubic registered
[ 2.359526] NET: Registered protocol family 10
[ 2.360127] NET: Registered protocol family 17
[ 2.360144] Registering the dns_resolver key type
[ 2.360239] PM: Hibernation image not present or could not be loaded.
[ 2.360251] registered taskstats version 1
[ 2.367547] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
[ 2.374868] Magic number: 11:381:77
[ 2.512454] ata1.01: NODEV after polling detection
[ 2.522271] ata1.00: ATA-7: QEMU HARDDISK, 0.10.2, max UDMA/100
[ 2.522275] ata1.00: 16384000 sectors, multi 16: LBA48
[ 2.524043] ata1.00: configured for MWDMA2
[ 2.524272] scsi 0:0:0:0: Direct-Access ATA QEMU HARDDISK 0.10 PQ: 0 ANSI: 5
[ 2.524424] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 2.524447] sd 0:0:0:0: [sda] 16384000 512-byte logical blocks: (8.38 GB/7.81 GiB)
[ 2.524545] sd 0:0:0:0: [sda] Write Protect is off
[ 2.524549] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 2.524634] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2.544562] sda: sda1 sda2 < sda5 >
[ 2.818026] rtc_cmos 00:06: setting system clock to 2011-09-21 12:04:46 UTC (1316606686)
[ 2.818427] sd 0:0:0:0: [sda] Attached SCSI disk
[ 3.097595] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[ 3.097598] EDD information not available.
[ 3.165041] Freeing unused kernel memory: 900k freed
[ 3.187512] Write protecting the kernel read-only data: 12288k
[ 3.216820] Freeing unused kernel memory: 1988k freed
[ 3.243501] Freeing unused kernel memory: 1372k freed
[ 3.312423] udevd[95]: starting version 173
[ 3.349204] ne2k-pci.c:v1.03 9/22/2003 D. Becker/P. Gortmaker
[ 3.375292] xen: --> irq=36, pirq=22
[ 3.397463] ne2k-pci 0000:00:05.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
[ 3.420185] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
[ 3.423679] eth0: RealTek RTL-8029 found at 0xc200, IRQ 36, 00:16:3e:11:10:14.
[ 3.461751] xen: --> irq=32, pirq=23
[ 3.481957] 8139cp 0000:00:04.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32
[ 3.573231] 8139cp 0000:00:04.0: eth1: RTL-8139C+ at 0xffffc9000031c000, 00:16:3e:11:10:04, IRQ 32
[ 3.619031] 8139cp 0000:00:04.0: setting latency timer to 64
[ 3.653525] 8139too: 8139too Fast Ethernet driver 0.9.28
[ 3.653924] FDC 0 is a S82078B
[ 4.348933] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[ 6.576003] udevd[329]: starting version 173
[ 6.587679] Adding 1044476k swap on /dev/mapper/oneiric--server64-swap_1. Priority:-1 extents:1 across:1044476k
[ 6.607657] piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
[ 6.647843] xen map irq failed -22
[ 6.647850] xen-platform-pci 0000:00:03.0: PCI INT A: failed to register GSI
[ 6.647861] xen-platform-pci: probe of 0000:00:03.0 failed with error -1
[ 6.692222] udevd[347]: renamed network interface eth1 to rename3
[ 6.694388] lp: driver loaded but no devices found
[ 6.763849] parport_pc 00:0c: reported by Plug and Play ACPI
[ 6.791506] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[ 6.828582] udevd[348]: renamed network interface eth0 to eth1
[ 6.831007] type=1400 audit(1316606690.953:2): apparmor="STATUS" operation="profile_load" name="/sbin/dhclient" pid=496 comm="apparmor_parser"
[ 6.832850] type=1400 audit(1316606690.953:3): apparmor="STATUS" operation="profile_replace" name="/sbin/dhclient" pid=503 comm="apparmor_parser"
[ 6.833442] type=1400 audit(1316606690.953:4): apparmor="STATUS" operation="profile_load" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=503 comm="apparmor_parser"
[ 6.833811] type=1400 audit(1316606690.953:5): apparmor="STATUS" operation="profile_load" name="/usr/lib/connman/scripts/dhclient-script" pid=503 comm="apparmor_parser"
[ 6.836616] type=1400 audit(1316606690.953:6): apparmor="STATUS" operation="profile_replace" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=496 comm="apparmor_parser"
[ 6.840340] type=1400 audit(1316606690.963:7): apparmor="STATUS" operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" pid=496 comm="apparmor_parser"
[ 6.860412] udevd[347]: renamed network interface rename3 to eth0
[ 6.875981] lp0: using parport0 (interrupt-driven).
[ 6.878190] ppdev: user-space parallel port driver
[ 6.911760] 8139cp 0000:00:04.0: eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
[ 7.010846] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
[ 7.165200] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
[ 7.523428] type=1400 audit(1316606691.643:8): apparmor="STATUS" operation="profile_load" name="/usr/sbin/tcpdump" pid=696 comm="apparmor_parser"
[ 7.523642] type=1400 audit(1316606691.643:9): apparmor="STATUS" operation="profile_replace" name="/sbin/dhclient" pid=695 comm="apparmor_parser"
[ 7.524156] type=1400 audit(1316606691.643:10): apparmor="STATUS" operation="profile_replace" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=695 comm="apparmor_parser"
[ 7.524504] type=1400 audit(1316606691.643:11): apparmor="STATUS" operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" pid=695 comm="apparmor_parser"
[ 17.200108] eth0: no IPv6 routers present
[ 332.040096] ------------[ cut here ]------------
[ 332.040108] WARNING: at /build/buildd/linux-3.0.0/net/sched/sch_generic.c:255 dev_watchdog+0x25a/0x270()
[ 332.040111] Hardware name: HVM domU
[ 332.040114] NETDEV WATCHDOG: eth1 (ne2k-pci): transmit queue 0 timed out
[ 332.040116] Modules linked in: ppdev parport_pc psmouse serio_raw xen_platform_pci lp i2c_piix4 parport 8139too floppy 8139cp ne2k_pci 8390
[ 332.040131] Pid: 0, comm: swapper Not tainted 3.0.0-11-server #18-Ubuntu
[ 332.040133] Call Trace:
[ 332.040136] <IRQ> [<ffffffff8105e80f>] warn_slowpath_common+0x7f/0xc0
[ 332.040146] [<ffffffff8105e906>] warn_slowpath_fmt+0x46/0x50
[ 332.040151] [<ffffffff81091b92>] ? clockevents_program_event+0x62/0xa0
[ 332.040155] [<ffffffff81507e8a>] dev_watchdog+0x25a/0x270
[ 332.040158] [<ffffffff81507c30>] ? qdisc_reset+0x50/0x50
[ 332.040162] [<ffffffff81507c30>] ? qdisc_reset+0x50/0x50
[ 332.040167] [<ffffffff8106d2c6>] call_timer_fn+0x46/0x160
[ 332.040171] [<ffffffff81507c30>] ? qdisc_reset+0x50/0x50
[ 332.040174] [<ffffffff8106ebf2>] run_timer_softirq+0x132/0x2a0
[ 332.040179] [<ffffffff810076fc>] ? xen_timer_interrupt+0x2c/0x40
[ 332.040183] [<ffffffff81065db8>] __do_softirq+0xa8/0x210
[ 332.040189] [<ffffffff81607d1c>] call_softirq+0x1c/0x30
[ 332.040192] [<ffffffff8100c295>] do_softirq+0x65/0xa0
[ 332.040195] [<ffffffff8106619e>] irq_exit+0x8e/0xb0
[ 332.040200] [<ffffffff81385cf5>] xen_evtchn_do_upcall+0x35/0x50
[ 332.040204] [<ffffffff81607e43>] xen_hvm_callback_vector+0x13/0x20
[ 332.040206] <EOI> [<ffffffff81031c6b>] ? native_safe_halt+0xb/0x10
[ 332.040214] [<ffffffff810859d8>] ? hrtimer_start+0x18/0x20
[ 332.040218] [<ffffffff81012165>] default_idle+0x45/0x120
[ 332.040222] [<ffffffff8101229d>] amd_e400_idle+0x5d/0x120
[ 332.040225] [<ffffffff8100920b>] cpu_idle+0xab/0x100
[ 332.040230] [<ffffffff815c8b4e>] rest_init+0x72/0x74
[ 332.040235] [<ffffffff81ce6c2b>] start_kernel+0x3d4/0x3df
[ 332.040239] [<ffffffff81ce6388>] x86_64_start_reservations+0x132/0x136
[ 332.040243] [<ffffffff81ce6140>] ? early_idt_handlers+0x140/0x140
[ 332.040247] [<ffffffff81ce6459>] x86_64_start_kernel+0xcd/0xdc
[ 332.040249] ---[ end trace ad0f6614f5a7b7a3 ]---
[ 332.040349] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x2, t=44.
[ 333.240212] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x2, t=22.
[ 337.240207] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=21.
[ 341.040211] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=71.
[ 342.040103] eth1: no IPv6 routers present
[ 342.040183] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x2, t=100.
[ 352.040213] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=71.
[ 365.040191] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=71.
[ 403.040208] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=70.
[ 441.040219] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=70.
[ 467.040212] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=70.
[ 495.040209] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=70.
[ 522.040210] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=70.
[ 550.040205] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=70.
[ 578.040227] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=70.
[ 605.040218] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=70.
[ 631.040208] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=70.
[ 1801.000294] 8139cp 0000:00:04.0: eth0: Transmit timeout, status d 3b 15 80ff
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Still struggling with HVM: tx timeouts on emulated nics
2011-09-21 13:03 Still struggling with HVM: tx timeouts on emulated nics Stefan Bader
@ 2011-09-21 13:31 ` Stefano Stabellini
2011-09-21 15:34 ` Stefan Bader
0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-09-21 13:31 UTC (permalink / raw)
To: Stefan Bader; +Cc: xen-devel, Stabellini
On Wed, 21 Sep 2011, Stefan Bader wrote:
> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
> gets configured via dhcp. And initial pings also get routed and done correctly.
> But slightly higher traffic (like checking for updates) hangs. And after a while
> there are messages about tx timeouts.
> The ne2k_pci type nic almost immediately has those issues and never comes up
> correctly.
>
> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
> this should be but both nics get configured with level,low IRQs. Disk emulation
> seems to be ok but that seems to use IO-APIC-edge. And any other IRQs seem to be
> at least not level.
Does the e1000 emulated card work correctly?
What happens if you disable interrupt remapping (see patch below)?

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 1017c7b..472a58b 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -354,8 +354,7 @@ int __init pci_xen_init(void)
 
 int __init pci_xen_hvm_init(void)
 {
-	if (!xen_feature(XENFEAT_hvm_pirqs))
-		return 0;
+	return 0;
 
 #ifdef CONFIG_ACPI
 	/*

> Btw, what exactly is the difference between xen-pirq-ioapic and IO-APIC?
xen-pirq-ioapic interrupts are interrupts that have been remapped onto
event channels.
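
To see which guest IRQs went through this remapping, the "xen: --> irq=N, pirq=M" lines in the attached dmesg can be collected mechanically. The following is a minimal Python sketch, not part of the original mail. It assumes that each such line marks an IRQ that was bound to a pirq, and the helper names (remapped_irqs, pci_irqs) are made up for illustration. The sample lines are copied from the guest log in this thread.

```python
import re

# Lines copied from the guest dmesg attached to this thread.
DMESG = """\
[ 1.026745] xen: --> irq=8, pirq=16
[ 3.375292] xen: --> irq=36, pirq=22
[ 3.397463] ne2k-pci 0000:00:05.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
[ 3.461751] xen: --> irq=32, pirq=23
[ 3.481957] 8139cp 0000:00:04.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32
"""

PIRQ_RE = re.compile(r"xen: --> irq=(\d+), pirq=(\d+)")
GSI_RE = re.compile(
    r"\] (\S+) \S+: PCI INT \w -> GSI (\d+) \((\w+), (\w+)\) -> IRQ (\d+)"
)


def remapped_irqs(dmesg):
    """Return {irq: pirq} for every 'xen: --> irq=..., pirq=...' line."""
    return {int(irq): int(pirq) for irq, pirq in PIRQ_RE.findall(dmesg)}


def pci_irqs(dmesg):
    """Return {irq: (driver, trigger, polarity)} from 'PCI INT' lines."""
    return {
        int(irq): (driver, trigger, polarity)
        for driver, _gsi, trigger, polarity, irq in GSI_RE.findall(dmesg)
    }


# Report which PCI devices sit on an IRQ that was bound to a pirq.
pirqs = remapped_irqs(DMESG)
for irq, (driver, trigger, polarity) in sorted(pci_irqs(DMESG).items()):
    state = "pirq %d" % pirqs[irq] if irq in pirqs else "not remapped"
    print("IRQ %d (%s, %s/%s): %s" % (irq, driver, trigger, polarity, state))
```

Feeding the full dmesg output into the same two functions instead of the excerpt should list every device that ended up on a remapped IRQ.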
> Another problem came up recently though that may just be me doing the wrong
> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated
> devices. xen-blkfront is a module in my case and I thought I once had been able
> to use that by removing the unplug arg and making the blkfront driver load. But
> when I recently tried the module loaded but no disks appeared... Again, not sure
> I just forgot how to do that right or that was different when using a 4.1.0
> hypervisor still...
xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
older hypervisors that didn't support the unplug protocol and had other
ways to cope with multiple drivers accessing the same devices.
You can use xen_emul_unplug=never to prevent any unplug but you won't
get any PV interfaces.
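
As a rough summary of the cases mentioned in this thread, here is a minimal Python sketch that looks up xen_emul_unplug on a kernel command line and reports the effect. It is only an illustration: the descriptions paraphrase what is said in this thread (the default case is the behaviour Stefan reports), the function names are made up, and the command lines in the usage note are hypothetical. The kernel accepts other values that are not discussed here.

```python
# Effects per value, paraphrased from this thread only.
EFFECTS = {
    "never": "no unplug at all: emulated devices stay, no PV interfaces",
    "unnecessary": "skip the unplug but still allow PV interfaces "
                   "(intended for older hypervisors without the unplug protocol)",
    None: "default: emulated devices are unplugged in favour of PV drivers",
}


def unplug_option(cmdline):
    """Return the value of xen_emul_unplug= on a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "xen_emul_unplug" and sep:
            return value
    return None


def describe(cmdline):
    """Map the option value to the behaviour described in this thread."""
    value = unplug_option(cmdline)
    return EFFECTS.get(value, "value %r is not covered in this thread" % value)
```

For example, describe("root=/dev/sda1 ro xen_emul_unplug=never") returns the "never" entry, and a command line without the option falls through to the default entry. On a running guest the real command line could be read from /proc/cmdline.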
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-09-21 13:31 ` Stefano Stabellini
@ 2011-09-21 15:34 ` Stefan Bader
2011-09-22 10:30 ` Stefano Stabellini
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-09-21 15:34 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel, Stefano
On 21.09.2011 15:31, Stefano Stabellini wrote:
> On Wed, 21 Sep 2011, Stefan Bader wrote:
>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
>> gets configured via dhcp. And initial pings also get routed and done correctly.
>> But slightly higher traffic (like checking for updates) hangs. And after a while
>> there are messages about tx timeouts.
>> The ne2k_pci type nic almost immediately has those issues and never comes up
>> correctly.
>>
>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
>> this should be but both nics get configured with level,low IRQs. Disk emulation
>> seems to be ok but that seems to use IO-APIC-edge. And any other IRQs seem to be
>> at least not level.
>
> Does the e1000 emulated card work correctly?
Yes, that one seems to work ok.
> What happens if you disable interrupt remapping (see patch below)?
8139cp seems to work correctly now (much higher irq stats as well) and e1000
still works. Both are then using IO-APIC-fasteoi.
>
>
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index 1017c7b..472a58b 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -354,8 +354,7 @@ int __init pci_xen_init(void)
>
> int __init pci_xen_hvm_init(void)
> {
> - if (!xen_feature(XENFEAT_hvm_pirqs))
> - return 0;
> + return 0;
>
> #ifdef CONFIG_ACPI
> /*
>
>
>> Btw, what exactly is the difference between xen-pirq-ioapic and IO-APIC?
>
> xen-pirq-ioapic interrupts are interrupts that have been remapped onto
> event channels
Ah ok, thanks.
>
>
>> Another problem came up recently though that may just be me doing the wrong
>> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated
>> devices. xen-blkfront is a module in my case and I thought I once had been able
>> to use that by removing the unplug arg and making the blkfront driver load. But
>> when I recently tried the module loaded but no disks appeared... Again, not sure
>> I just forgot how to do that right or that was different when using a 4.1.0
>> hypervisor still...
>
> xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
> older hypervisors that didn't support the unplug protocol and had other
> ways to cope with multiple drivers accessing the same devices.
> You can use xen_emul_unplug=never to prevent any unplug but you won't
> get any PV interfaces.
Hm, odd. Somehow I thought that I had been using pv interfaces that way when the
interrupts for the emulated IDE were broken.
This is a bit suboptimal atm: without any option, and with a kernel compiled with
the platform pci and pv drivers (as modules), booting in HVM mode makes the kernel
decide that having both is no use and unplug the emulated devices. Which then
leaves you with ... none.
-Stefan
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-09-21 15:34 ` Stefan Bader
@ 2011-09-22 10:30 ` Stefano Stabellini
2011-09-22 11:58 ` Stefan Bader
0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-09-22 10:30 UTC (permalink / raw)
To: Stefan Bader; +Cc: xen-devel, Stefano, Stefano Stabellini
On Wed, 21 Sep 2011, Stefan Bader wrote:
> On 21.09.2011 15:31, Stefano Stabellini wrote:
> > On Wed, 21 Sep 2011, Stefan Bader wrote:
> >> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
> >> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
> >> gets configured via dhcp. And initial pings also get routed and done correctly.
> >> But slightly higher traffic (like checking for updates) hangs. And after a while
> >> there are messages about tx timeouts.
> >> The ne2k_pci type nic almost immediately has those issues and never comes up
> >> correctly.
> >>
> >> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
> >> this should be but both nics get configured with level,low IRQs. Disk emulation
> >> seems to be ok but that seems to use IO-APIC-edge. And any other IRQs seem to be
> >> at least not level.
> >
>
> > Does the e1000 emulated card work correctly?
>
> Yes, that one seems to work ok.
>
> > What happens if you disable interrupt remapping (see patch below)?
>
> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
> still works. Both then using IOAPIC-fasteoi.
>
That means there must be another subtle bug in Xen's interrupt
remapping that only affects 8139cp emulation.
> >> Another problem came up recently though that may just be me doing the wrong
> >> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated
> >> devices. xen-blkfront is a module in my case and I thought I once had been able
> >> to use that by removing the unplug arg and making the blkfront driver load. But
> >> when I recently tried the module loaded but no disks appeared... Again, not sure
> >> I just forgot how to do that right or that was different when using a 4.1.0
> >> hypervisor still...
> >
> > xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
> > older hypervisors that didn't support the unplug protocol and had other
> > ways to cope with multiple drivers accessing the same devices.
> > You can use xen_emul_unplug=never to prevent any unplug but you won't
> > get any PV interfaces.
>
> Hm, odd. Somehow I thought that I had been using pv interfaces that way when the
> interrupts for the emulated ide was broken.
> A bit suboptimal atm, because without any option and a kernel compiled with the
> platform pci and pv drivers (as modules) booting in HVM mode the kernel decides
> that having both is no use and unplugs the emulated devices. Which then leaves
> you with ... none.
In theory you would have the PV frontend modules in the initrd.
On the other hand, having both the emulated and the PV driver active for
the same disk can easily cause data corruption on your drive.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-09-22 10:30 ` Stefano Stabellini
@ 2011-09-22 11:58 ` Stefan Bader
2011-09-22 14:32 ` Stefan Bader
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-09-22 11:58 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel, Stefano
On 22.09.2011 12:30, Stefano Stabellini wrote:
> On Wed, 21 Sep 2011, Stefan Bader wrote:
>> On 21.09.2011 15:31, Stefano Stabellini wrote:
>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
>>>> there are messages about tx timeouts.
>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
>>>> correctly.
>>>>
>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
>>>> seems to be ok but that seems to use IO-APIC-edge. And any other IRQs seem to be
>>>> at least not level.
>>>
>>
>>> Does the e1000 emulated card work correctly?
>>
>> Yes, that one seems to work ok.
>>
>>> What happens if you disable interrupt remapping (see patch below)?
>>
>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
>> still works. Both then using IOAPIC-fasteoi.
>>
>
> That means there must be another subtle bug in Xen's interrupt
> remapping that only affects 8139cp emulation.
>
Right, or to be complete:
- e1000: ok
- 8139cp: unstable (setup is possible)
- ne2k_pci: not working (tx problems from the beginning)
The behaviour feels a bit like interrupts may get lost if occurring at a higher
rate. Why this affects various drivers differently is a bit weird.
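
The ne2k-pci "Tx timed out, lost interrupt?" lines in the guest log carry the NIC's TSR and ISR values, which can be decoded to check the lost-interrupt reading. The following is a minimal Python sketch, not part of the original mail. It assumes the standard DP8390 bit layout (the same one the Linux 8390 driver header uses) and lists only the better-known bits. The helper names are made up, and the sample line is copied from the log.

```python
import re

# Assumed DP8390 interrupt status register (ISR) bits.
ISR_BITS = {
    0x01: "PRX",  # packet received
    0x02: "PTX",  # packet transmitted
    0x04: "RXE",  # receive error
    0x08: "TXE",  # transmit error
    0x10: "OVW",  # receive ring overwrite
    0x20: "CNT",  # tally counter overflow
    0x40: "RDC",  # remote DMA complete
    0x80: "RST",  # reset status
}

# Assumed DP8390 transmit status register (TSR) bits.
TSR_BITS = {
    0x01: "PTX",  # transmit completed without error
    0x04: "COL",  # collision during transmit
    0x08: "ABT",  # aborted after excessive collisions
    0x10: "CRS",  # carrier sense lost
    0x20: "FU",   # FIFO underrun
    0x40: "CDH",  # collision-detect heartbeat failure
    0x80: "OWC",  # out-of-window collision
}

TIMEOUT_RE = re.compile(
    r"Tx timed out, lost interrupt\? TSR=(0x[0-9a-fA-F]+), ISR=(0x[0-9a-fA-F]+)"
)


def decode(value, names):
    """Return the names of all bits that are set in value."""
    return [name for bit, name in sorted(names.items()) if value & bit]


def explain(line):
    """Return (tsr_flags, isr_flags) for a Tx timeout line, or None."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    tsr = int(match.group(1), 16)
    isr = int(match.group(2), 16)
    return decode(tsr, TSR_BITS), decode(isr, ISR_BITS)


line = "[ 332.040349] eth1: Tx timed out, lost interrupt? TSR=0x1, ISR=0x2, t=44."
print(explain(line))
```

Under these assumptions, TSR=0x1 with ISR=0x2 would mean the emulated card finished the transmit and still has the transmit-complete bit pending while the driver's interrupt handler never ran. That would fit the lost-interrupt theory, but it does not show where the interrupt went missing.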
>
>>>> Another problem came up recently though that may just be me doing the wrong
>>>> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated
>>>> devices. xen-blkfront is a module in my case and I thought I once had been able
>>>> to use that by removing the unplug arg and making the blkfront driver load. But
>>>> when I recently tried the module loaded but no disks appeared... Again, not sure
>>>> I just forgot how to do that right or that was different when using a 4.1.0
>>>> hypervisor still...
>>>
>>> xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
>>> older hypervisors that didn't support the unplug protocol and had other
>>> ways to cope with multiple drivers accessing the same devices.
>>> You can use xen_emul_unplug=never to prevent any unplug but you won't
>>> get any PV interfaces.
>>
>> Hm, odd. Somehow I thought that I had been using pv interfaces that way when the
>> interrupts for the emulated ide was broken.
>> A bit suboptimal atm, because without any option and a kernel compiled with the
>> platform pci and pv drivers (as modules) booting in HVM mode the kernel decides
>> that having both is no use and unplugs the emulated devices. Which then leaves
>> you with ... none.
>
> In theory you would have the PV frontend modules in the initrd.
> On the other hand having both can easily cause data corruptions on your
> drive.
They _are_ in the initrd. And the boot rightfully drops to a maintenance shell
right now (without any argument and the emulated devices unplugged). And
"modprobe xen-blkfront" loads the module but it does _not_ detect any pv device.
-Stefan
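
To keep the options straight, the behaviour described so far in this thread
can be written down as a small lookup table. This is only a summary sketch:
the names POLICY and unplug_policy and the tuple layout are made up for
illustration, it is not the kernel's logic, and values not mentioned in this
thread are not modelled.

```python
# Summary of what this thread says about xen_emul_unplug; not the kernel's code.
# Each entry maps an option value to
# (emulated_devices_kept, pv_interfaces_usable).
POLICY = {
    None: (False, True),          # no option: emulated devices get unplugged
    "unnecessary": (True, True),  # emulated devices stay, PV interfaces allowed
    "never": (True, False),       # nothing is unplugged, no PV interfaces
}


def unplug_policy(option=None):
    """Return (emulated_devices_kept, pv_interfaces_usable) for an option value."""
    try:
        return POLICY[option]
    except KeyError:
        raise ValueError(f"value not discussed in this thread: {option!r}") from None


for value in (None, "unnecessary", "never"):
    print(value, unplug_policy(value))
```

The first row is the case reported above: the emulated devices are gone and
PV devices should be usable, yet xen-blkfront does not detect any.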
* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
2011-09-22 11:58 ` Stefan Bader
@ 2011-09-22 14:32 ` Stefan Bader
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Bader @ 2011-09-22 14:32 UTC (permalink / raw)
To: xen-devel
On 22.09.2011 13:58, Stefan Bader wrote:
> On 22.09.2011 12:30, Stefano Stabellini wrote:
>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>> On 21.09.2011 15:31, Stefano Stabellini wrote:
>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
>>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
>>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
>>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
>>>>> there are messages about tx timeouts.
>>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
>>>>> correctly.
>>>>>
>>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
>>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
>>>>> seems to be ok but that seems to use IO-APIC-edge. And any other IRQs seem to be
>>>>> at least not level.
>>>>
>>>
>>>> Does the e1000 emulated card work correctly?
>>>
>>> Yes, that one seems to work ok.
>>>
>>>> What happens if you disable interrupt remapping (see patch below)?
>>>
>>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
>>> still works. Both then using IOAPIC-fasteoi.
>>>
>>
>> That means there must be another subtle bug in Xen in interrupt
>> remapping that only affects 8139cp emulation
>>
> Right, or to be complete:
> - e1000: ok
> - 8139cp: unstable (setup is possible)
> - ne2k_pci: not working (tx problems from the beginning)
>
> The behaviour feels a bit like interrupts may get lost if occurring at a higher
> rate. Why this affects various drivers differently is a bit weird.
>>
This is mainly speculation... Quite a while back there was this patch to events:

commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Fri Aug 20 19:10:01 2010 -0700

    xen: handle events as edge-triggered

The commit message stated that Xen events are logically edge-triggered, so PV
events were changed to be handled as edge interrupts. Would that not mean that
the same applies to xen-pirq-apic, since it uses events as well, and that those
interrupts should be apic-edge instead of level?
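
To make the suspicion about lost interrupts concrete, here is a toy model of
a single notification bit in front of a device that keeps its line asserted
while it has work queued. It is not Xen or kernel code: the function name
simulate, the step names and the one-bit model are all assumptions made up
for illustration. It only shows how a notification can go missing when a
still-asserted line is reported on its rising edge alone.

```python
def simulate(steps, mode):
    """Replay `steps` against a toy interrupt line; return (handled, stuck).

    mode is "edge" or "level". Steps:
      "dev"   - the device queues one packet and asserts its line
      "begin" - the guest handler starts, if a notification is pending
      "end"   - the running handler finishes and signals EOI
    """
    if mode not in ("edge", "level"):
        raise ValueError(mode)
    queue = 0        # packets waiting in the device; line asserted if > 0
    pending = False  # the single notification bit seen by the guest
    taken = None     # packets the running handler saw at "begin"
    handled = 0
    for step in steps:
        if step == "dev":
            was_asserted = queue > 0
            queue += 1
            # Edge: notify only on the 0 -> 1 transition of the line.
            # Level: an asserted line always counts as a notification.
            if mode == "level" or not was_asserted:
                pending = True
        elif step == "begin":
            if pending and taken is None:
                pending = False
                taken = queue
        elif step == "end":
            if taken is not None:
                queue -= taken
                handled += taken
                taken = None
                # Level: at EOI a still-asserted line is delivered again.
                if mode == "level" and queue > 0:
                    pending = True
        else:
            raise ValueError(step)
    return handled, queue


# A second packet arrives while the handler for the first is still running.
race = ["dev", "begin", "dev", "end", "begin", "end"]
print("edge: ", simulate(race, "edge"))
print("level:", simulate(race, "level"))
```

In this model the edge variant handles one packet and leaves one stuck with
no notification pending, while the level variant handles both. Whether the
real tx timeouts come from a race like this is still speculation.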
>>>>> Another problem came up recently though that may just be me doing the wrong
>>>>> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated
>>>>> devices. xen-blkfront is a module in my case and I thought I once had been able
>>>>> to use that by removing the unplug arg and making the blkfront driver load. But
>>>>> when I recently tried the module loaded but no disks appeared... Again, not sure
>>>>> I just forgot how to do that right or that was different when using a 4.1.0
>>>>> hypervisor still...
>>>>
>>>> xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
>>>> older hypervisors that didn't support the unplug protocol and had other
>>>> ways to cope with multiple drivers accessing the same devices.
>>>> You can use xen_emul_unplug=never to prevent any unplug but you won't
>>>> get any PV interfaces.
>>>
>>> Hm, odd. Somehow I thought that I had been using pv interfaces that way when the
>>> interrupts for the emulated IDE were broken.
>>> A bit suboptimal atm, because without any option and a kernel compiled with the
>>> platform pci and pv drivers (as modules) booting in HVM mode the kernel decides
>>> that having both is no use and unplugs the emulated devices. Which then leaves
>>> you with ... none.
>>
>> In theory you would have the PV frontend modules in the initrd.
>> On the other hand having both can easily cause data corruptions on your
>> drive.
>
> They _are_ in the initrd. And the boot rightfully drops to a maintenance shell
> right now (without any argument and the emulated devices unplugged). And
> "modprobe xen-blkfront" loads the module but it does _not_ detect any pv device.
>
> -Stefan
>
end of thread, other threads:[~2011-10-27 12:17 UTC | newest]
Thread overview: 20+ messages
[not found] <4E7B4768.8060103@canonical.com>
2011-09-22 17:44 ` Re: Still struggling with HVM: tx timeouts on emulated nics Stefano Stabellini
2011-09-30 9:13 ` Stefan Bader
2011-09-30 14:09 ` Stefano Stabellini
2011-09-30 16:06 ` Stefan Bader
2011-09-30 17:59 ` Stefan Bader
2011-10-03 17:24 ` Stefano Stabellini
2011-10-03 18:13 ` Stefano Stabellini
2011-10-04 10:07 ` Andrew Cooper
2011-10-04 14:13 ` Stefano Stabellini
2011-10-05 16:10 ` Stefan Bader
2011-10-06 10:12 ` Stefano Stabellini
2011-10-06 12:16 ` Stefan Bader
2011-10-27 10:37 ` [PATCH] xen: do not loose level interrupt notifications Stefano Stabellini
2011-10-27 11:18 ` Keir Fraser
2011-10-27 11:42 ` Stefano Stabellini
2011-10-27 12:17 ` Keir Fraser
2011-09-21 13:03 Still struggling with HVM: tx timeouts on emulated nics Stefan Bader
2011-09-21 13:31 ` Stefano Stabellini
2011-09-21 15:34 ` Stefan Bader
2011-09-22 10:30 ` Stefano Stabellini
2011-09-22 11:58 ` Stefan Bader
2011-09-22 14:32 ` Stefan Bader