Re: Re: Still struggling with HVM: tx timeouts on emulated nics

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
       [not found] <4E7B4768.8060103@canonical.com>
@ 2011-09-22 17:44 ` Stefano Stabellini
  2011-09-30  9:13   ` Stefan Bader
  0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-09-22 17:44 UTC (permalink / raw)
  To: Stefan Bader; +Cc: xen-devel, Stefano Stabellini

On Thu, 22 Sep 2011, Stefan Bader wrote:
> On 22.09.2011 13:58, Stefan Bader wrote:
> > On 22.09.2011 12:30, Stefano Stabellini wrote:
> >> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>> On 21.09.2011 15:31, Stefano Stabellini wrote:
> >>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
> >>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
> >>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
> >>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
> >>>>> there are messages about tx timeouts.
> >>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
> >>>>> correctly.
> >>>>>
> >>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
> >>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
> >>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
> >>>>> at least not level.
> >>>>
> >>>
> >>>> Does the e1000 emulated card work correctly?
> >>>
> >>> Yes, that one seems to work ok.
> >>>
> >>>> What happens if you disable interrupt remapping (see patch below)?
> >>>
> >>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
> >>> still works. Both then using IOAPIC-fasteoi.
> >>>
> >>
> >> That means there must be another subtle bug in Xen in interrupt
> >> remapping that only affects 8139p emulation
> >>
> > Right, or to be complete:
> > - e1000: ok
> > - 8139cp: unstable (setup is possible)
> > - ne2k_pci: not working (tx problems from the beginning)
> > 
> > The behaviour feels a bit like interrupts may get lost if occurring at a higher
> > rate. Why this affects various drivers differently is a bit weird.
> >>
> 
> This is mainly speculating... Quite a while back there was this patch to events:
> 
> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date:   Fri Aug 20 19:10:01 2010 -0700
> 
>     xen: handle events as edge-triggered
> 
> The commit message stated that Xen events are logically edge triggered. So PV
> events were changed to be handled as edge interrupts. Would that not mean that
> for xen-pirq-apic being using events this would apply the same and those should
> be apic-edge instead of level?

That commit is referring to the internal way Linux handles these event,
that look like normal interrupt to the Linux irq subsystem. It is not
related to the way actual events are delivered from Xen to Linux, so it
shouldn't matter here.

I would add lots of printk's in:

xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
xen/arch/x86/hvm/irq.c:assert_irq
xen/arch/x86/hvm/irq.c:assert_gsi

to find out why xen is not injecting those interrupts

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-09-22 17:44 ` Re: Still struggling with HVM: tx timeouts on emulated nics Stefano Stabellini
@ 2011-09-30  9:13   ` Stefan Bader
  2011-09-30 14:09     ` Stefano Stabellini
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-09-30  9:13 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel

On 22.09.2011 19:44, Stefano Stabellini wrote:
> On Thu, 22 Sep 2011, Stefan Bader wrote:
>> On 22.09.2011 13:58, Stefan Bader wrote:
>>> On 22.09.2011 12:30, Stefano Stabellini wrote:
>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>>> On 21.09.2011 15:31, Stefano Stabellini wrote:
>>>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
>>>>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
>>>>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
>>>>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
>>>>>>> there are messages about tx timeouts.
>>>>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
>>>>>>> correctly.
>>>>>>>
>>>>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
>>>>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
>>>>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
>>>>>>> at least not level.
>>>>>>
>>>>>
>>>>>> Does the e1000 emulated card work correctly?
>>>>>
>>>>> Yes, that one seems to work ok.
>>>>>
>>>>>> What happens if you disable interrupt remapping (see patch below)?
>>>>>
>>>>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
>>>>> still works. Both then using IOAPIC-fasteoi.
>>>>>
>>>>
>>>> That means there must be another subtle bug in Xen in interrupt
>>>> remapping that only affects 8139p emulation
>>>>
>>> Right, or to be complete:
>>> - e1000: ok
>>> - 8139cp: unstable (setup is possible)
>>> - ne2k_pci: not working (tx problems from the beginning)
>>>
>>> The behaviour feels a bit like interrupts may get lost if occurring at a higher
>>> rate. Why this affects various drivers differently is a bit weird.
>>>>
>>
>> This is mainly speculating... Quite a while back there was this patch to events:
>>
>> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
>> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>> Date:   Fri Aug 20 19:10:01 2010 -0700
>>
>>     xen: handle events as edge-triggered
>>
>> The commit message stated that Xen events are logically edge triggered. So PV
>> events were changed to be handled as edge interrupts. Would that not mean that
>> for xen-pirq-apic being using events this would apply the same and those should
>> be apic-edge instead of level?
> 
> That commit is referring to the internal way Linux handles these event,
> that look like normal interrupt to the Linux irq subsystem. It is not
> related to the way actual events are delivered from Xen to Linux, so it
> shouldn't matter here.
> 
> I would add lots of printk's in:
> 
> xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
> xen/arch/x86/hvm/irq.c:assert_irq
> xen/arch/x86/hvm/irq.c:assert_gsi
> 
> to find out why xen is not injecting those interrupts
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

It took quite a bit of time but at least I got some hopefully useful information
now. So in general, whenever an interrupt is asserted,
the hypervisor runs through this:

__hvm_pci_intx_assert:
  when assert count was 0 before incrementing
    call assert_gsi
      call send_guest_pirq (when hvm uses pirq)

In the send_guest_pirq chain is a call to evtchn_set_pending which tests as one
of the first actions whether evtchn_pending in the shared_info is set. If that
is the case the call immediately returns with 1.

Adding printks to call_assert_gsi, I noticed that
- When things stop working, the last call to send_guest_pirq returned 1.
- But not every time the return code is one, the stall happens.
- e1000 also has cases where send_guest_pirq returns 1 but they happen much
  less often (than using the 8139cp).

Usually every intx_assert has a intx_deassert call that follows. when the stall
occurs, this does not happen. Right here I got some troubles to understand where
this intx_deassert is actually triggered. With an added WARN_ON the stack traces
seem odd, like this:

(XEN)    [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130
(XEN)    [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60
(XEN)    [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60
(XEN)    [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220
(XEN)    [<ffff82c48017dba6>] copy_from_user+0x26/0x90
(XEN)    [<ffff82c4801f9446>] do_iret+0xb6/0x1a0
(XEN)    [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d

Not really sure how one gets from do_update_descriptor to do_hvm_op and the only
thing in there which does the deassert is some irq level setting.

Actually the guest does not really do much do EOI (which I had been assuming).
But since domain_pirq_to_irq maps to 0 for emuirqs, the call to
PHYSDEVOP_irq_status_query will hit the following and not set the flag for
needing EOI.

        irq_status_query.flags = 0;
        if ( is_hvm_domain(v->domain) &&
             domain_pirq_to_irq(v->domain, irq) <= 0 )
        {
            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
            break;
        }

So all the guest is doing is to clear evtchn_pending in the pirq EOI function. I
fail to understand what actually is doing the hvm_pci_intx_deassert calls but
the way the fasteoi code in the guest looks to be working, there seems to be
some gap between calling the handler and the eoi function... So from what I see,
I would assume the following:

dom0                                     domU
- intx_assert (count 0->1)
- send_guest_pirq = 0
  (evtchn_pending = 1)
                                         - upcall starts fasteoi handler
- something does intx_deassert
  (count 1->0)
- intx_assert (count 0->1)
- send_guest_pirq = 1
  (evtchn_pending still set)
                                         - handler->eoi sets evtchn to 0 but
                                           otherwise does nothing
- there is no intx_deassert, so even
  when another intx_assert would happen
  (which does not seem to be the case)
  no further send_guest_pirq would be
  called.

Unfortunately I do miss some details on the inner working here. Generally I
wonder whether not setting the needsEOI flag for those pirqs just is the
problem. But it also could be intentional...

-Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-09-30  9:13   ` Stefan Bader
@ 2011-09-30 14:09     ` Stefano Stabellini
  2011-09-30 16:06       ` Stefan Bader
  0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-09-30 14:09 UTC (permalink / raw)
  To: Stefan Bader; +Cc: xen-devel, Stefano Stabellini

On Fri, 30 Sep 2011, Stefan Bader wrote:
> On 22.09.2011 19:44, Stefano Stabellini wrote:
> > On Thu, 22 Sep 2011, Stefan Bader wrote:
> >> On 22.09.2011 13:58, Stefan Bader wrote:
> >>> On 22.09.2011 12:30, Stefano Stabellini wrote:
> >>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>> On 21.09.2011 15:31, Stefano Stabellini wrote:
> >>>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
> >>>>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
> >>>>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
> >>>>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
> >>>>>>> there are messages about tx timeouts.
> >>>>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
> >>>>>>> correctly.
> >>>>>>>
> >>>>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
> >>>>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
> >>>>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
> >>>>>>> at least not level.
> >>>>>>
> >>>>>
> >>>>>> Does the e1000 emulated card work correctly?
> >>>>>
> >>>>> Yes, that one seems to work ok.
> >>>>>
> >>>>>> What happens if you disable interrupt remapping (see patch below)?
> >>>>>
> >>>>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
> >>>>> still works. Both then using IOAPIC-fasteoi.
> >>>>>
> >>>>
> >>>> That means there must be another subtle bug in Xen in interrupt
> >>>> remapping that only affects 8139p emulation
> >>>>
> >>> Right, or to be complete:
> >>> - e1000: ok
> >>> - 8139cp: unstable (setup is possible)
> >>> - ne2k_pci: not working (tx problems from the beginning)
> >>>
> >>> The behaviour feels a bit like interrupts may get lost if occurring at a higher
> >>> rate. Why this affects various drivers differently is a bit weird.
> >>>>
> >>
> >> This is mainly speculating... Quite a while back there was this patch to events:
> >>
> >> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
> >> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> >> Date:   Fri Aug 20 19:10:01 2010 -0700
> >>
> >>     xen: handle events as edge-triggered
> >>
> >> The commit message stated that Xen events are logically edge triggered. So PV
> >> events were changed to be handled as edge interrupts. Would that not mean that
> >> for xen-pirq-apic being using events this would apply the same and those should
> >> be apic-edge instead of level?
> > 
> > That commit is referring to the internal way Linux handles these event,
> > that look like normal interrupt to the Linux irq subsystem. It is not
> > related to the way actual events are delivered from Xen to Linux, so it
> > shouldn't matter here.
> > 
> > I would add lots of printk's in:
> > 
> > xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
> > xen/arch/x86/hvm/irq.c:assert_irq
> > xen/arch/x86/hvm/irq.c:assert_gsi
> > 
> > to find out why xen is not injecting those interrupts
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> 
> It took quite a bit of time but at least I got some hopefully useful information
> now. So in general, whenever an interrupt is asserted,
> the hypervisor runs through this:
> 
> __hvm_pci_intx_assert:
>   when assert count was 0 before incrementing
>     call assert_gsi
>       call send_guest_pirq (when hvm uses pirq)
> 
> In the send_guest_pirq chain is a call to evtchn_set_pending which tests as one
> of the first actions whether evtchn_pending in the shared_info is set. If that
> is the case the call immediately returns with 1.
> 
> Adding printks to call_assert_gsi, I noticed that
> - When things stop working, the last call to send_guest_pirq returned 1.
> - But not every time the return code is one, the stall happens.
> - e1000 also has cases where send_guest_pirq returns 1 but they happen much
>   less often (than using the 8139cp).
> 
> Usually every intx_assert has a intx_deassert call that follows. when the stall
> occurs, this does not happen. Right here I got some troubles to understand where
> this intx_deassert is actually triggered. With an added WARN_ON the stack traces
> seem odd, like this:
> 
> (XEN)    [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130
> (XEN)    [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60
> (XEN)    [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60
> (XEN)    [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220
> (XEN)    [<ffff82c48017dba6>] copy_from_user+0x26/0x90
> (XEN)    [<ffff82c4801f9446>] do_iret+0xb6/0x1a0
> (XEN)    [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d
> 
> Not really sure how one gets from do_update_descriptor to do_hvm_op and the only
> thing in there which does the deassert is some irq level setting.
> 
> Actually the guest does not really do much do EOI (which I had been assuming).
> But since domain_pirq_to_irq maps to 0 for emuirqs, the call to
> PHYSDEVOP_irq_status_query will hit the following and not set the flag for
> needing EOI.
> 
>         irq_status_query.flags = 0;
>         if ( is_hvm_domain(v->domain) &&
>              domain_pirq_to_irq(v->domain, irq) <= 0 )
>         {
>             ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
>             break;
>         }
> 
> So all the guest is doing is to clear evtchn_pending in the pirq EOI function. I
> fail to understand what actually is doing the hvm_pci_intx_deassert calls but
> the way the fasteoi code in the guest looks to be working, there seems to be
> some gap between calling the handler and the eoi function... So from what I see,
> I would assume the following:
> 
> dom0                                     domU
> - intx_assert (count 0->1)
> - send_guest_pirq = 0
>   (evtchn_pending = 1)
>                                          - upcall starts fasteoi handler
> - something does intx_deassert
>   (count 1->0)
> - intx_assert (count 0->1)
> - send_guest_pirq = 1
>   (evtchn_pending still set)
>                                          - handler->eoi sets evtchn to 0 but
>                                            otherwise does nothing
> - there is no intx_deassert, so even
>   when another intx_assert would happen
>   (which does not seem to be the case)
>   no further send_guest_pirq would be
>   called.
> 
> Unfortunately I do miss some details on the inner working here. Generally I
> wonder whether not setting the needsEOI flag for those pirqs just is the
> problem. But it also could be intentional...
 
Thanks for the very detailed analysis.
It seems to me that the problem is that if the interrupt is a level
triggered interrupt when the guest issues an EOI we should be
reinjecting the interrupt again if it has been issued a second time in
the meantime. However this doesn't happen if the interrupt has been
remapped onto an even channel. In that case the guest is not even going
to issue an EOI at all.
So I wrote a patch to force the guest to issue EOIs even on remapped
irqs; in the hypercall handler we check whether we need to reinject the
interrupt and if that is the case we set the corresponding event channel
pending.
Could you please try the patch I appended? I haven't been able to reproduce
your problem so I am not really sure if it works.



diff -r e042fb60e0ee xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Thu Sep 29 11:23:01 2011 +0000
+++ b/xen/arch/x86/physdev.c	Fri Sep 30 14:01:46 2011 +0000
@@ -11,6 +11,7 @@
 #include <asm/current.h>
 #include <asm/io_apic.h>
 #include <asm/msi.h>
+#include <asm/hvm/irq.h>
 #include <asm/hypercall.h>
 #include <public/xen.h>
 #include <public/physdev.h>
@@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
+        if ( is_hvm_domain(v->domain) &&
+                domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
+        {
+            struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
+            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
+
+            /* if this is a level irq and count > 0, send another
+             * notification */ 
+            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
+                    && hvm_irq->gsi_assert_count[gsi] )
+                send_guest_pirq(v->domain, pirq);
+        }
         spin_unlock(&v->domain->event_lock);
         ret = 0;
         break;
@@ -327,12 +340,6 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( (irq < 0) || (irq >= v->domain->nr_pirqs) )
             break;
         irq_status_query.flags = 0;
-        if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
-        {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
-            break;
-        }
 
         /*
          * Even edge-triggered or message-based IRQs can need masking from

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-09-30 14:09     ` Stefano Stabellini
@ 2011-09-30 16:06       ` Stefan Bader
  2011-09-30 17:59         ` Stefan Bader
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-09-30 16:06 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel

On 30.09.2011 16:09, Stefano Stabellini wrote:
> @@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>          if ( !is_hvm_domain(v->domain) ||
>               domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>              pirq_guest_eoi(pirq);
> +        if ( is_hvm_domain(v->domain) &&
> +                domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
> +        {
> +            struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
> +            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
> +
> +            /* if this is a level irq and count > 0, send another
> +             * notification */ 
> +            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
> +                    && hvm_irq->gsi_assert_count[gsi] )
> +                send_guest_pirq(v->domain, pirq);
> +        }
>          spin_unlock(&v->domain->event_lock);
>          ret = 0;
>          break;

This hunk looks substantially different from my 4.1.1 based code. There is no
spin_lock acquired. Not sure that could be a reason for the different behaviour,
too. I'll add that spinlock too.

    case PHYSDEVOP_eoi: {
        struct physdev_eoi eoi;
        ret = -EFAULT;
        if ( copy_from_guest(&eoi, arg, 1) != 0 )
            break;
        ret = -EINVAL;
        if ( eoi.irq >= v->domain->nr_pirqs )
            break;
        if ( v->domain->arch.pirq_eoi_map )
            evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
        if ( !is_hvm_domain(v->domain) ||
             domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
            ret = pirq_guest_eoi(v->domain, eoi.irq);
        else
            ret = 0;
        break;
    }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-09-30 16:06       ` Stefan Bader
@ 2011-09-30 17:59         ` Stefan Bader
  2011-10-03 17:24           ` Stefano Stabellini
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-09-30 17:59 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel

On 30.09.2011 18:06, Stefan Bader wrote:
> On 30.09.2011 16:09, Stefano Stabellini wrote:
>> @@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>>          if ( !is_hvm_domain(v->domain) ||
>>               domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>>              pirq_guest_eoi(pirq);
>> +        if ( is_hvm_domain(v->domain) &&
>> +                domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
>> +        {
>> +            struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
>> +            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
>> +
>> +            /* if this is a level irq and count > 0, send another
>> +             * notification */ 
>> +            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
>> +                    && hvm_irq->gsi_assert_count[gsi] )
>> +                send_guest_pirq(v->domain, pirq);
>> +        }
>>          spin_unlock(&v->domain->event_lock);
>>          ret = 0;
>>          break;
> 
> This hunk looks substantially different from my 4.1.1 based code. There is no
> spin_lock acquired. Not sure that could be a reason for the different behaviour,
> too. I'll add that spinlock too.
> 
>     case PHYSDEVOP_eoi: {
>         struct physdev_eoi eoi;
>         ret = -EFAULT;
>         if ( copy_from_guest(&eoi, arg, 1) != 0 )
>             break;
>         ret = -EINVAL;
>         if ( eoi.irq >= v->domain->nr_pirqs )
>             break;
>         if ( v->domain->arch.pirq_eoi_map )
>             evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
>         if ( !is_hvm_domain(v->domain) ||
>              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>             ret = pirq_guest_eoi(v->domain, eoi.irq);
>         else
>             ret = 0;
>         break;
>     }
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

Ok, so I had been modifying that hunk to

        spin_lock(&v->domain->event_lock);
        if ( v->domain->arch.pirq_eoi_map )
            evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
        if ( !is_hvm_domain(v->domain) ||
             domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
            pirq_guest_eoi(v->domain, eoi.irq);
        if ( is_hvm_domain(v->domain) &&
                domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
        {
            struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);

            /* if this is a level irq and count > 0, send another
             * notification */
            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
                    && hvm_irq->gsi_assert_count[gsi] ) {
                printk("re-send event for gsi%i\n", gsi);
                send_guest_pirq(v->domain, eoi.irq);
           }
        }
        spin_unlock(&v->domain->event_lock);
        ret = 0;

Also I did not completely remove the section that would return the status
without setting needsEOI. I just changed the if condition to be <0 instead of
<=0 (I knew from the tests that the mapping was always 0 and maybe the <0 check
could be useful for something.

        irq_status_query.flags = 0;
        if ( is_hvm_domain(v->domain) &&
             domain_pirq_to_irq(v->domain, irq) < 0 )
        {
            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
            break;
        }

With that a quick test shows both the re-sends done sometimes and the domU doing
EOIs. And there is no stall apparent. Did the same quick test with the e1000
emulated NIC and that still seems ok. Those were not very thorough tests but at
least I would have observed a stall pretty quick otherwise.

-Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-09-30 17:59         ` Stefan Bader
@ 2011-10-03 17:24           ` Stefano Stabellini
  2011-10-03 18:13             ` Stefano Stabellini
                               ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-03 17:24 UTC (permalink / raw)
  To: Stefan Bader; +Cc: xen-devel, Stefano Stabellini

On Fri, 30 Sep 2011, Stefan Bader wrote:
> Also I did not completely remove the section that would return the status
> without setting needsEOI. I just changed the if condition to be <0 instead of
> <=0 (I knew from the tests that the mapping was always 0 and maybe the <0 check
> could be useful for something.
> 
>         irq_status_query.flags = 0;
>         if ( is_hvm_domain(v->domain) &&
>              domain_pirq_to_irq(v->domain, irq) < 0 )
>         {
>             ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
>             break;
>         }
> 

You need to remove the entire test because we want to receive
notifications in all cases.


> With that a quick test shows both the re-sends done sometimes and the domU doing
> EOIs. And there is no stall apparent. Did the same quick test with the e1000
> emulated NIC and that still seems ok. Those were not very thorough tests but at
> least I would have observed a stall pretty quick otherwise.

I am glad it fixes the problem for you.

I am going to send a different patch upstream for Xen 4.2, because I
would also like it to cover the very unlikely scenario in which a PV
guest (like dom0 or a PV guest with PCI passthrough) is loosing level
interrupts because when Xen tries to set the corresponding event channel
pending the bit is alreay set. The codebase is different enough that
making the same change on 4.1 is non-trivial. I am appending the new
patch to this email, it would be great if you could test it. You just
need a 4.2 hypervisor, not the entire system. You should be able to
perform the test updating only xen.gz.
If you have trouble if xen-unstable.hg tip, try changeset 23843.

---


diff -r bf533533046c xen/arch/x86/hvm/irq.c
--- a/xen/arch/x86/hvm/irq.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/hvm/irq.c	Mon Oct 03 16:54:51 2011 +0000
@@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
 
     if ( hvm_domain_use_pirq(d, pirq) )
     {
-        send_guest_pirq(d, pirq);
+        if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
+            pirq->lost++;
         return;
     }
     vioapic_irq_positive_edge(d, ioapic_gsi);
@@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
 {
     struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
     unsigned int gsi, link, isa_irq;
+    struct pirq *pirq;
 
     ASSERT((device <= 31) && (intx <= 3));
 
@@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
     gsi = hvm_pci_intx_gsi(device, intx);
     if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
         assert_gsi(d, gsi);
+    else {
+        pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
+        if ( hvm_domain_use_pirq(d, pirq) )
+            pirq->lost++;
+    }
 
     link    = hvm_pci_intx_link(device, intx);
     isa_irq = hvm_irq->pci_link.route[link];
diff -r bf533533046c xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/irq.c	Mon Oct 03 16:54:51 2011 +0000
@@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
              !test_and_set_bool(pirq->masked) )
             action->in_flight++;
         if ( !hvm_do_IRQ_dpci(d, pirq) )
-            send_guest_pirq(d, pirq);
+        {
+            if ( send_guest_pirq(d, pirq) &&
+                    action->ack_type == ACKTYPE_EOI )
+                pirq->lost++;
+        }
     }
 
     if ( action->ack_type != ACKTYPE_NONE )
diff -r bf533533046c xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/physdev.c	Mon Oct 03 16:54:51 2011 +0000
@@ -11,6 +11,7 @@
 #include <asm/current.h>
 #include <asm/io_apic.h>
 #include <asm/msi.h>
+#include <asm/hvm/irq.h>
 #include <asm/hypercall.h>
 #include <public/xen.h>
 #include <public/physdev.h>
@@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
+        if ( pirq->lost > 0) {
+            if ( !send_guest_pirq(v->domain, pirq) )
+                pirq->lost--;
+        }
         spin_unlock(&v->domain->event_lock);
         ret = 0;
         break;
@@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
             break;
         irq_status_query.flags = 0;
         if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
+             domain_pirq_to_irq(v->domain, irq) <= 0 &&
+             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
         {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+            ret = -EINVAL;
             break;
         }
 
diff -r bf533533046c xen/include/xen/irq.h
--- a/xen/include/xen/irq.h	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/include/xen/irq.h	Mon Oct 03 16:54:51 2011 +0000
@@ -146,6 +146,7 @@ struct pirq {
     int pirq;
     u16 evtchn;
     bool_t masked;
+    u32 lost;
     struct rcu_head rcu_head;
     struct arch_pirq arch;
 };

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-10-03 17:24           ` Stefano Stabellini
@ 2011-10-03 18:13             ` Stefano Stabellini
  2011-10-04 10:07               ` Andrew Cooper
  2011-10-05 16:10             ` Stefan Bader
  2011-10-27 10:37             ` [PATCH] xen: do not loose level interrupt notifications Stefano Stabellini
  2 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-03 18:13 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Jan Beulich, Stefan Bader

CC'ing Jan, that probably is going to have an opinion on this.

Let me add a bit of background: Stefan found out that PV on HVM guests
could loose level interrupts coming from emulated devices. Looking
through the code I realized that we need to add some logic to inject a
pirq in the guest if a level interrupt has been raised while the guest
is servicing the first one.
While this is all very specific to interrupt remapping and emulated
devices, I realized that something similar could happen even with dom0
or other PV guests with PCI passthrough:

1) the device raises a level interrupt and xen injects it into the
guest;

2) the guest is temporarely stuck: it does not ack it or eoi it;

3) the xen timer kicks in and eois the interrupt;

4) the device thinks it is all fine and sends a second interrupt;

5) Xen fails to inject the second interrupt into the guest because the
guest has still the event channel pending bit set;

at this point the guest looses the second interrupt notification, that
is not supposed to happen with level interrupts and I think it might
cause problems with some devices.

Jan, do you think we should try to handle this case, or is it too
unlikely?

In any case we need to handle the PV on HVM remapping bug, that because
of the way interrupts are emulated is much more likely to happen...


On Mon, 3 Oct 2011, Stefano Stabellini wrote:
> On Fri, 30 Sep 2011, Stefan Bader wrote:
> > Also I did not completely remove the section that would return the status
> > without setting needsEOI. I just changed the if condition to be <0 instead of
> > <=0 (I knew from the tests that the mapping was always 0 and maybe the <0 check
> > could be useful for something.
> > 
> >         irq_status_query.flags = 0;
> >         if ( is_hvm_domain(v->domain) &&
> >              domain_pirq_to_irq(v->domain, irq) < 0 )
> >         {
> >             ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
> >             break;
> >         }
> > 
> 
> You need to remove the entire test because we want to receive
> notifications in all cases.
> 
> 
> > With that a quick test shows both the re-sends done sometimes and the domU doing
> > EOIs. And there is no stall apparent. Did the same quick test with the e1000
> > emulated NIC and that still seems ok. Those were not very thorough tests but at
> > least I would have observed a stall pretty quick otherwise.
> 
> I am glad it fixes the problem for you.
> 
> I am going to send a different patch upstream for Xen 4.2, because I
> would also like it to cover the very unlikely scenario in which a PV
> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
> interrupts because when Xen tries to set the corresponding event channel
> pending the bit is alreay set. The codebase is different enough that
> making the same change on 4.1 is non-trivial. I am appending the new
> patch to this email, it would be great if you could test it. You just
> need a 4.2 hypervisor, not the entire system. You should be able to
> perform the test updating only xen.gz.
> If you have trouble if xen-unstable.hg tip, try changeset 23843.
> 
> ---
> 
> 
> diff -r bf533533046c xen/arch/x86/hvm/irq.c
> --- a/xen/arch/x86/hvm/irq.c	Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/hvm/irq.c	Mon Oct 03 16:54:51 2011 +0000
> @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
>  
>      if ( hvm_domain_use_pirq(d, pirq) )
>      {
> -        send_guest_pirq(d, pirq);
> +        if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
> +            pirq->lost++;
>          return;
>      }
>      vioapic_irq_positive_edge(d, ioapic_gsi);
> @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
>  {
>      struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
>      unsigned int gsi, link, isa_irq;
> +    struct pirq *pirq;
>  
>      ASSERT((device <= 31) && (intx <= 3));
>  
> @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
>      gsi = hvm_pci_intx_gsi(device, intx);
>      if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
>          assert_gsi(d, gsi);
> +    else {
> +        pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
> +        if ( hvm_domain_use_pirq(d, pirq) )
> +            pirq->lost++;
> +    }
>  
>      link    = hvm_pci_intx_link(device, intx);
>      isa_irq = hvm_irq->pci_link.route[link];
> diff -r bf533533046c xen/arch/x86/irq.c
> --- a/xen/arch/x86/irq.c	Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/irq.c	Mon Oct 03 16:54:51 2011 +0000
> @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
>               !test_and_set_bool(pirq->masked) )
>              action->in_flight++;
>          if ( !hvm_do_IRQ_dpci(d, pirq) )
> -            send_guest_pirq(d, pirq);
> +        {
> +            if ( send_guest_pirq(d, pirq) &&
> +                    action->ack_type == ACKTYPE_EOI )
> +                pirq->lost++;
> +        }
>      }
>  
>      if ( action->ack_type != ACKTYPE_NONE )
> diff -r bf533533046c xen/arch/x86/physdev.c
> --- a/xen/arch/x86/physdev.c	Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/physdev.c	Mon Oct 03 16:54:51 2011 +0000
> @@ -11,6 +11,7 @@
>  #include <asm/current.h>
>  #include <asm/io_apic.h>
>  #include <asm/msi.h>
> +#include <asm/hvm/irq.h>
>  #include <asm/hypercall.h>
>  #include <public/xen.h>
>  #include <public/physdev.h>
> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>          if ( !is_hvm_domain(v->domain) ||
>               domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>              pirq_guest_eoi(pirq);
> +        if ( pirq->lost > 0) {
> +            if ( !send_guest_pirq(v->domain, pirq) )
> +                pirq->lost--;
> +        }
>          spin_unlock(&v->domain->event_lock);
>          ret = 0;
>          break;
> @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>              break;
>          irq_status_query.flags = 0;
>          if ( is_hvm_domain(v->domain) &&
> -             domain_pirq_to_irq(v->domain, irq) <= 0 )
> +             domain_pirq_to_irq(v->domain, irq) <= 0 &&
> +             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
>          {
> -            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
> +            ret = -EINVAL;
>              break;
>          }
>  
> diff -r bf533533046c xen/include/xen/irq.h
> --- a/xen/include/xen/irq.h	Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/include/xen/irq.h	Mon Oct 03 16:54:51 2011 +0000
> @@ -146,6 +146,7 @@ struct pirq {
>      int pirq;
>      u16 evtchn;
>      bool_t masked;
> +    u32 lost;
>      struct rcu_head rcu_head;
>      struct arch_pirq arch;
>  };
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-10-03 18:13             ` Stefano Stabellini
@ 2011-10-04 10:07               ` Andrew Cooper
  2011-10-04 14:13                 ` Stefano Stabellini
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2011-10-04 10:07 UTC (permalink / raw)
  To: xen-devel

On 03/10/11 19:13, Stefano Stabellini wrote:
> CC'ing Jan, that probably is going to have an opinion on this.
>
> Let me add a bit of background: Stefan found out that PV on HVM guests
> could loose level interrupts coming from emulated devices. Looking
> through the code I realized that we need to add some logic to inject a
> pirq in the guest if a level interrupt has been raised while the guest
> is servicing the first one.
> While this is all very specific to interrupt remapping and emulated
> devices, I realized that something similar could happen even with dom0
> or other PV guests with PCI passthrough:
>
> 1) the device raises a level interrupt and xen injects it into the
> guest;
>
> 2) the guest is temporarely stuck: it does not ack it or eoi it;
>
> 3) the xen timer kicks in and eois the interrupt;
>
> 4) the device thinks it is all fine and sends a second interrupt;
>
> 5) Xen fails to inject the second interrupt into the guest because the
> guest has still the event channel pending bit set;
>
> at this point the guest looses the second interrupt notification, that
> is not supposed to happen with level interrupts and I think it might
> cause problems with some devices.
>
> Jan, do you think we should try to handle this case, or is it too
> unlikely?

I am not certain whether this is relevant, but the ICH10 IO-APIC
documentation indicated that early EOI'ing of a line level interrupt
should not have this effect.  Specifically, it states that EOI'ing a
line level interrupt whos line is still asserted will cause the
interrupt to be "re-raised" from the IO-APIC.  It uses this to assert
that it is fine to use multiple IO-APIC entries with the same vector,
with a broadcast of vector number alone to EOI the interrupt.

In this case, while Xen sees two interrupts, from the devices point of
view, only I has happened.

In the case where the device has dropped its line level interrupt of its
own accord, then I would agree that the current Xen behavior is wrong. 
I cant offhand think of a good reason why this would occur.

I know it is not helpful in this case, but as a rule of thumb, line
level interrupts should not be used with Xen.  The average response time
on an unloaded system is ~30ms, ranging from 5 to 150.  (On a sample set
of a Dell R710, Xen 4.1.0 and 2.6.32 dom0, over 2 weeks of debugging
another line level interrupt bug).

~Andrew

> In any case we need to handle the PV on HVM remapping bug, that because
> of the way interrupts are emulated is much more likely to happen...
>
>
> On Mon, 3 Oct 2011, Stefano Stabellini wrote:
>> On Fri, 30 Sep 2011, Stefan Bader wrote:
>>> Also I did not completely remove the section that would return the status
>>> without setting needsEOI. I just changed the if condition to be <0 instead of
>>> <=0 (I knew from the tests that the mapping was always 0 and maybe the <0 check
>>> could be useful for something.
>>>
>>>         irq_status_query.flags = 0;
>>>         if ( is_hvm_domain(v->domain) &&
>>>              domain_pirq_to_irq(v->domain, irq) < 0 )
>>>         {
>>>             ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
>>>             break;
>>>         }
>>>
>> You need to remove the entire test because we want to receive
>> notifications in all cases.
>>
>>
>>> With that a quick test shows both the re-sends done sometimes and the domU doing
>>> EOIs. And there is no stall apparent. Did the same quick test with the e1000
>>> emulated NIC and that still seems ok. Those were not very thorough tests but at
>>> least I would have observed a stall pretty quick otherwise.
>> I am glad it fixes the problem for you.
>>
>> I am going to send a different patch upstream for Xen 4.2, because I
>> would also like it to cover the very unlikely scenario in which a PV
>> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
>> interrupts because when Xen tries to set the corresponding event channel
>> pending the bit is alreay set. The codebase is different enough that
>> making the same change on 4.1 is non-trivial. I am appending the new
>> patch to this email, it would be great if you could test it. You just
>> need a 4.2 hypervisor, not the entire system. You should be able to
>> perform the test updating only xen.gz.
>> If you have trouble if xen-unstable.hg tip, try changeset 23843.
>>
>> ---
>>
>>
>> diff -r bf533533046c xen/arch/x86/hvm/irq.c
>> --- a/xen/arch/x86/hvm/irq.c	Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/hvm/irq.c	Mon Oct 03 16:54:51 2011 +0000
>> @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
>>  
>>      if ( hvm_domain_use_pirq(d, pirq) )
>>      {
>> -        send_guest_pirq(d, pirq);
>> +        if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
>> +            pirq->lost++;
>>          return;
>>      }
>>      vioapic_irq_positive_edge(d, ioapic_gsi);
>> @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
>>  {
>>      struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
>>      unsigned int gsi, link, isa_irq;
>> +    struct pirq *pirq;
>>  
>>      ASSERT((device <= 31) && (intx <= 3));
>>  
>> @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
>>      gsi = hvm_pci_intx_gsi(device, intx);
>>      if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
>>          assert_gsi(d, gsi);
>> +    else {
>> +        pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
>> +        if ( hvm_domain_use_pirq(d, pirq) )
>> +            pirq->lost++;
>> +    }
>>  
>>      link    = hvm_pci_intx_link(device, intx);
>>      isa_irq = hvm_irq->pci_link.route[link];
>> diff -r bf533533046c xen/arch/x86/irq.c
>> --- a/xen/arch/x86/irq.c	Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/irq.c	Mon Oct 03 16:54:51 2011 +0000
>> @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
>>               !test_and_set_bool(pirq->masked) )
>>              action->in_flight++;
>>          if ( !hvm_do_IRQ_dpci(d, pirq) )
>> -            send_guest_pirq(d, pirq);
>> +        {
>> +            if ( send_guest_pirq(d, pirq) &&
>> +                    action->ack_type == ACKTYPE_EOI )
>> +                pirq->lost++;
>> +        }
>>      }
>>  
>>      if ( action->ack_type != ACKTYPE_NONE )
>> diff -r bf533533046c xen/arch/x86/physdev.c
>> --- a/xen/arch/x86/physdev.c	Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/physdev.c	Mon Oct 03 16:54:51 2011 +0000
>> @@ -11,6 +11,7 @@
>>  #include <asm/current.h>
>>  #include <asm/io_apic.h>
>>  #include <asm/msi.h>
>> +#include <asm/hvm/irq.h>
>>  #include <asm/hypercall.h>
>>  #include <public/xen.h>
>>  #include <public/physdev.h>
>> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>>          if ( !is_hvm_domain(v->domain) ||
>>               domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>>              pirq_guest_eoi(pirq);
>> +        if ( pirq->lost > 0) {
>> +            if ( !send_guest_pirq(v->domain, pirq) )
>> +                pirq->lost--;
>> +        }
>>          spin_unlock(&v->domain->event_lock);
>>          ret = 0;
>>          break;
>> @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>>              break;
>>          irq_status_query.flags = 0;
>>          if ( is_hvm_domain(v->domain) &&
>> -             domain_pirq_to_irq(v->domain, irq) <= 0 )
>> +             domain_pirq_to_irq(v->domain, irq) <= 0 &&
>> +             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
>>          {
>> -            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
>> +            ret = -EINVAL;
>>              break;
>>          }
>>  
>> diff -r bf533533046c xen/include/xen/irq.h
>> --- a/xen/include/xen/irq.h	Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/include/xen/irq.h	Mon Oct 03 16:54:51 2011 +0000
>> @@ -146,6 +146,7 @@ struct pirq {
>>      int pirq;
>>      u16 evtchn;
>>      bool_t masked;
>> +    u32 lost;
>>      struct rcu_head rcu_head;
>>      struct arch_pirq arch;
>>  };
>>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-10-04 10:07               ` Andrew Cooper
@ 2011-10-04 14:13                 ` Stefano Stabellini
  0 siblings, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-04 14:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Tue, 4 Oct 2011, Andrew Cooper wrote:
> On 03/10/11 19:13, Stefano Stabellini wrote:
> > CC'ing Jan, that probably is going to have an opinion on this.
> >
> > Let me add a bit of background: Stefan found out that PV on HVM guests
> > could loose level interrupts coming from emulated devices. Looking
> > through the code I realized that we need to add some logic to inject a
> > pirq in the guest if a level interrupt has been raised while the guest
> > is servicing the first one.
> > While this is all very specific to interrupt remapping and emulated
> > devices, I realized that something similar could happen even with dom0
> > or other PV guests with PCI passthrough:
> >
> > 1) the device raises a level interrupt and xen injects it into the
> > guest;
> >
> > 2) the guest is temporarely stuck: it does not ack it or eoi it;
> >
> > 3) the xen timer kicks in and eois the interrupt;
> >
> > 4) the device thinks it is all fine and sends a second interrupt;
> >
> > 5) Xen fails to inject the second interrupt into the guest because the
> > guest has still the event channel pending bit set;
> >
> > at this point the guest looses the second interrupt notification, that
> > is not supposed to happen with level interrupts and I think it might
> > cause problems with some devices.
> >
> > Jan, do you think we should try to handle this case, or is it too
> > unlikely?
> 
> I am not certain whether this is relevant, but the ICH10 IO-APIC
> documentation indicated that early EOI'ing of a line level interrupt
> should not have this effect.  Specifically, it states that EOI'ing a
> line level interrupt whos line is still asserted will cause the
> interrupt to be "re-raised" from the IO-APIC.  It uses this to assert
> that it is fine to use multiple IO-APIC entries with the same vector,
> with a broadcast of vector number alone to EOI the interrupt.
> 
> In this case, while Xen sees two interrupts, from the devices point of
> view, only I has happened.
> 
> In the case where the device has dropped its line level interrupt of its
> own accord, then I would agree that the current Xen behavior is wrong. 
> I cant offhand think of a good reason why this would occur.

I think this scenario is actually possible. It is certainly happening
with qemu's emulated devices.
This patch would take care of re-injecting the interrupts both in the
case of the device deasserting and reasserting the interrupt while the
guest hasn't cleared the pending bit yet and in case a PV on HVM guest
eois the interrupt too early.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-10-03 17:24           ` Stefano Stabellini
  2011-10-03 18:13             ` Stefano Stabellini
@ 2011-10-05 16:10             ` Stefan Bader
  2011-10-06 10:12               ` Stefano Stabellini
  2011-10-27 10:37             ` [PATCH] xen: do not loose level interrupt notifications Stefano Stabellini
  2 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-10-05 16:10 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 6824 bytes --]

On 03.10.2011 19:24, Stefano Stabellini wrote:
> I am going to send a different patch upstream for Xen 4.2, because I
> would also like it to cover the very unlikely scenario in which a PV
> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
> interrupts because when Xen tries to set the corresponding event channel
> pending the bit is alreay set. The codebase is different enough that
> making the same change on 4.1 is non-trivial. I am appending the new
> patch to this email, it would be great if you could test it. You just
> need a 4.2 hypervisor, not the entire system. You should be able to
> perform the test updating only xen.gz.
> If you have trouble if xen-unstable.hg tip, try changeset 23843.

Hi Stefano,

currently I would have the problem that I don't have too much time to move to
another hypervisor (tests may or may not be useful there with substantial
changes beside this one) with our next release being close.
But I think I got a usable backport of your change to 4.1.1 (you think it looks
ok?) and have given that a quick test which seems to be ok...
Though one drawback is that I don't have a setup which would use passthrough, so
that path is not tested. I think I did see (with a debugging version) that the
lost count was incremented and decremented in dom0, though.

-Stefan

---

Index: xen-4.1.1/xen/arch/x86/domain.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/domain.c	2011-10-05 15:03:19.405815293 +0200
+++ xen-4.1.1/xen/arch/x86/domain.c	2011-10-05 15:09:59.781816622 +0200
@@ -514,6 +514,12 @@ int arch_domain_create(struct domain *d,
         memset(d->arch.pirq_irq, 0,
                d->nr_pirqs * sizeof(*d->arch.pirq_irq));

+        d->arch.pirq_lost = xmalloc_array(int, d->nr_pirqs);
+        if ( !d->arch.pirq_lost)
+            goto fail;
+        memset(d->arch.pirq_lost, 0,
+               d->nr_pirqs * sizeof(*d->arch.pirq_lost));
+
         d->arch.irq_pirq = xmalloc_array(int, nr_irqs);
         if ( !d->arch.irq_pirq )
             goto fail;
@@ -575,6 +581,7 @@ int arch_domain_create(struct domain *d,
  fail:
     d->is_dying = DOMDYING_dead;
     vmce_destroy_msr(d);
+    xfree(d->arch.pirq_lost);
     xfree(d->arch.pirq_irq);
     xfree(d->arch.irq_pirq);
     xfree(d->arch.pirq_emuirq);
@@ -628,6 +635,7 @@ void arch_domain_destroy(struct domain *
 #endif

     free_xenheap_page(d->shared_info);
+    xfree(d->arch.pirq_lost);
     xfree(d->arch.pirq_irq);
     xfree(d->arch.irq_pirq);
     xfree(d->arch.pirq_emuirq);
Index: xen-4.1.1/xen/arch/x86/hvm/irq.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/hvm/irq.c	2011-10-05 15:14:35.441815292 +0200
+++ xen-4.1.1/xen/arch/x86/hvm/irq.c	2011-10-05 17:55:43.603986605 +0200
@@ -33,7 +33,9 @@ static void assert_gsi(struct domain *d,
     int pirq = domain_emuirq_to_pirq(d, ioapic_gsi);
     if ( hvm_domain_use_pirq(d, pirq) )
     {
-        send_guest_pirq(d, pirq);
+        if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
+            if (d->arch.pirq_lost)
+                d->arch.pirq_lost[pirq]++;
         return;
     }
     vioapic_irq_positive_edge(d, ioapic_gsi);
@@ -67,6 +69,12 @@ static void __hvm_pci_intx_assert(
     gsi = hvm_pci_intx_gsi(device, intx);
     if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
         assert_gsi(d, gsi);
+    else {
+        int pirq = domain_emuirq_to_pirq(d, gsi);
+
+        if ( hvm_domain_use_pirq(d, pirq) && d->arch.pirq_lost)
+            d->arch.pirq_lost[pirq]++;
+    }

     link    = hvm_pci_intx_link(device, intx);
     isa_irq = hvm_irq->pci_link.route[link];
Index: xen-4.1.1/xen/arch/x86/irq.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/irq.c	2011-10-05 15:26:58.477815292 +0200
+++ xen-4.1.1/xen/arch/x86/irq.c	2011-10-05 17:56:23.191986535 +0200
@@ -888,10 +888,13 @@ static void __do_IRQ_guest(int irq)
                 desc->status |= IRQ_INPROGRESS; /* cleared during hvm eoi */
             }
         }
-        else if ( send_guest_pirq(d, pirq) &&
-                  (action->ack_type == ACKTYPE_NONE) )
-        {
-            already_pending++;
+        else {
+            if ( send_guest_pirq(d, pirq) ) {
+	        if ( action->ack_type == ACKTYPE_EOI && d->arch.pirq_lost)
+                    d->arch.pirq_lost[pirq]++;
+                else if ( action->ack_type == ACKTYPE_NONE )
+                    already_pending++;
+            }
         }
     }

Index: xen-4.1.1/xen/arch/x86/physdev.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/physdev.c	2011-10-05 15:36:14.545815292 +0200
+++ xen-4.1.1/xen/arch/x86/physdev.c	2011-10-05 17:57:06.055986460 +0200
@@ -261,13 +261,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         ret = -EINVAL;
         if ( eoi.irq >= v->domain->nr_pirqs )
             break;
+        spin_lock(&v->domain->event_lock);
         if ( v->domain->arch.pirq_eoi_map )
             evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
-            ret = pirq_guest_eoi(v->domain, eoi.irq);
-        else
-            ret = 0;
+            pirq_guest_eoi(v->domain, eoi.irq);
+        if ( v->domain->arch.pirq_lost && v->domain->arch.pirq_lost[eoi.irq]) {
+            if ( !send_guest_pirq(v->domain, eoi.irq) )
+                v->domain->arch.pirq_lost[eoi.irq]--;
+        }
+        ret = 0;
+        spin_unlock(&v->domain->event_lock);
         break;
     }

@@ -323,9 +328,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
             break;
         irq_status_query.flags = 0;
         if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
+             domain_pirq_to_irq(v->domain, irq) <= 0 &&
+             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
         {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+            ret = -EINVAL;
             break;
         }

Index: xen-4.1.1/xen/include/asm-x86/domain.h
===================================================================
--- xen-4.1.1.orig/xen/include/asm-x86/domain.h	2011-10-05 15:10:11.709815293 +0200
+++ xen-4.1.1/xen/include/asm-x86/domain.h	2011-10-05 15:12:46.237815276 +0200
@@ -312,6 +312,9 @@ struct arch_domain
                                 (possibly other cases in the future */
     uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */
     uint64_t vtsc_usercount; /* not used for hvm */
+
+    /* Protected by d->event_lock, count of lost pirqs */
+    int *pirq_lost;
 } __cacheline_aligned;

 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))

[-- Attachment #2: xen-backport-pirq-lost.patch --]
[-- Type: text/x-diff, Size: 5500 bytes --]

Index: xen-4.1.1/xen/arch/x86/domain.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/domain.c	2011-10-05 15:03:19.405815293 +0200
+++ xen-4.1.1/xen/arch/x86/domain.c	2011-10-05 15:09:59.781816622 +0200
@@ -514,6 +514,12 @@ int arch_domain_create(struct domain *d,
         memset(d->arch.pirq_irq, 0,
                d->nr_pirqs * sizeof(*d->arch.pirq_irq));
 
+        d->arch.pirq_lost = xmalloc_array(int, d->nr_pirqs);
+        if ( !d->arch.pirq_lost)
+            goto fail;
+        memset(d->arch.pirq_lost, 0,
+               d->nr_pirqs * sizeof(*d->arch.pirq_lost));
+
         d->arch.irq_pirq = xmalloc_array(int, nr_irqs);
         if ( !d->arch.irq_pirq )
             goto fail;
@@ -575,6 +581,7 @@ int arch_domain_create(struct domain *d,
  fail:
     d->is_dying = DOMDYING_dead;
     vmce_destroy_msr(d);
+    xfree(d->arch.pirq_lost);
     xfree(d->arch.pirq_irq);
     xfree(d->arch.irq_pirq);
     xfree(d->arch.pirq_emuirq);
@@ -628,6 +635,7 @@ void arch_domain_destroy(struct domain *
 #endif
 
     free_xenheap_page(d->shared_info);
+    xfree(d->arch.pirq_lost);
     xfree(d->arch.pirq_irq);
     xfree(d->arch.irq_pirq);
     xfree(d->arch.pirq_emuirq);
Index: xen-4.1.1/xen/arch/x86/hvm/irq.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/hvm/irq.c	2011-10-05 15:14:35.441815292 +0200
+++ xen-4.1.1/xen/arch/x86/hvm/irq.c	2011-10-05 17:55:43.603986605 +0200
@@ -33,7 +33,9 @@ static void assert_gsi(struct domain *d,
     int pirq = domain_emuirq_to_pirq(d, ioapic_gsi);
     if ( hvm_domain_use_pirq(d, pirq) )
     {
-        send_guest_pirq(d, pirq);
+        if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
+            if (d->arch.pirq_lost)
+                d->arch.pirq_lost[pirq]++;
         return;
     }
     vioapic_irq_positive_edge(d, ioapic_gsi);
@@ -67,6 +69,12 @@ static void __hvm_pci_intx_assert(
     gsi = hvm_pci_intx_gsi(device, intx);
     if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
         assert_gsi(d, gsi);
+    else {
+        int pirq = domain_emuirq_to_pirq(d, gsi);
+
+        if ( hvm_domain_use_pirq(d, pirq) && d->arch.pirq_lost)
+            d->arch.pirq_lost[pirq]++;
+    }
 
     link    = hvm_pci_intx_link(device, intx);
     isa_irq = hvm_irq->pci_link.route[link];
Index: xen-4.1.1/xen/arch/x86/irq.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/irq.c	2011-10-05 15:26:58.477815292 +0200
+++ xen-4.1.1/xen/arch/x86/irq.c	2011-10-05 17:56:23.191986535 +0200
@@ -888,10 +888,13 @@ static void __do_IRQ_guest(int irq)
                 desc->status |= IRQ_INPROGRESS; /* cleared during hvm eoi */
             }
         }
-        else if ( send_guest_pirq(d, pirq) &&
-                  (action->ack_type == ACKTYPE_NONE) )
-        {
-            already_pending++;
+        else {
+            if ( send_guest_pirq(d, pirq) ) {
+	        if ( action->ack_type == ACKTYPE_EOI && d->arch.pirq_lost)
+                    d->arch.pirq_lost[pirq]++;
+                else if ( action->ack_type == ACKTYPE_NONE )
+                    already_pending++;
+            }
         }
     }
 
Index: xen-4.1.1/xen/arch/x86/physdev.c
===================================================================
--- xen-4.1.1.orig/xen/arch/x86/physdev.c	2011-10-05 15:36:14.545815292 +0200
+++ xen-4.1.1/xen/arch/x86/physdev.c	2011-10-05 17:57:06.055986460 +0200
@@ -261,13 +261,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         ret = -EINVAL;
         if ( eoi.irq >= v->domain->nr_pirqs )
             break;
+        spin_lock(&v->domain->event_lock);
         if ( v->domain->arch.pirq_eoi_map )
             evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
-            ret = pirq_guest_eoi(v->domain, eoi.irq);
-        else
-            ret = 0;
+            pirq_guest_eoi(v->domain, eoi.irq);
+        if ( v->domain->arch.pirq_lost && v->domain->arch.pirq_lost[eoi.irq]) {
+            if ( !send_guest_pirq(v->domain, eoi.irq) )
+                v->domain->arch.pirq_lost[eoi.irq]--;
+        }
+        ret = 0;
+        spin_unlock(&v->domain->event_lock);
         break;
     }
 
@@ -323,9 +328,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
             break;
         irq_status_query.flags = 0;
         if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
+             domain_pirq_to_irq(v->domain, irq) <= 0 &&
+             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
         {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+            ret = -EINVAL;
             break;
         }
 
Index: xen-4.1.1/xen/include/asm-x86/domain.h
===================================================================
--- xen-4.1.1.orig/xen/include/asm-x86/domain.h	2011-10-05 15:10:11.709815293 +0200
+++ xen-4.1.1/xen/include/asm-x86/domain.h	2011-10-05 15:12:46.237815276 +0200
@@ -312,6 +312,9 @@ struct arch_domain
                                 (possibly other cases in the future */
     uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */
     uint64_t vtsc_usercount; /* not used for hvm */
+
+    /* Protected by d->event_lock, count of lost pirqs */
+    int *pirq_lost;
 } __cacheline_aligned;
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-10-05 16:10             ` Stefan Bader
@ 2011-10-06 10:12               ` Stefano Stabellini
  2011-10-06 12:16                 ` Stefan Bader
  0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-06 10:12 UTC (permalink / raw)
  To: Stefan Bader; +Cc: xen-devel, Stefano Stabellini

On Wed, 5 Oct 2011, Stefan Bader wrote:
> On 03.10.2011 19:24, Stefano Stabellini wrote:
> > I am going to send a different patch upstream for Xen 4.2, because I
> > would also like it to cover the very unlikely scenario in which a PV
> > guest (like dom0 or a PV guest with PCI passthrough) is loosing level
> > interrupts because when Xen tries to set the corresponding event channel
> > pending the bit is alreay set. The codebase is different enough that
> > making the same change on 4.1 is non-trivial. I am appending the new
> > patch to this email, it would be great if you could test it. You just
> > need a 4.2 hypervisor, not the entire system. You should be able to
> > perform the test updating only xen.gz.
> > If you have trouble if xen-unstable.hg tip, try changeset 23843.
> 
> Hi Stefano,
> 
> currently I would have the problem that I don't have too much time to move to
> another hypervisor (tests may or may not be useful there with substantial
> changes beside this one) with our next release being close.
> But I think I got a usable backport of your change to 4.1.1 (you think it looks
> ok?) and have given that a quick test which seems to be ok...
> Though one drawback is that I don't have a setup which would use passthrough, so
> that path is not tested. I think I did see (with a debugging version) that the
> lost count was incremented and decremented in dom0, though.
> 

Honestly if you have to commit to a backport for your package right now,
I would go for the previous version, because it is simpler and less
likely to introduce regressions.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-10-06 10:12               ` Stefano Stabellini
@ 2011-10-06 12:16                 ` Stefan Bader
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Bader @ 2011-10-06 12:16 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel

On 06.10.2011 12:12, Stefano Stabellini wrote:
> On Wed, 5 Oct 2011, Stefan Bader wrote:
>> On 03.10.2011 19:24, Stefano Stabellini wrote:
>>> I am going to send a different patch upstream for Xen 4.2, because I
>>> would also like it to cover the very unlikely scenario in which a PV
>>> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
>>> interrupts because when Xen tries to set the corresponding event channel
>>> pending the bit is alreay set. The codebase is different enough that
>>> making the same change on 4.1 is non-trivial. I am appending the new
>>> patch to this email, it would be great if you could test it. You just
>>> need a 4.2 hypervisor, not the entire system. You should be able to
>>> perform the test updating only xen.gz.
>>> If you have trouble if xen-unstable.hg tip, try changeset 23843.
>>
>> Hi Stefano,
>>
>> currently I would have the problem that I don't have too much time to move to
>> another hypervisor (tests may or may not be useful there with substantial
>> changes beside this one) with our next release being close.
>> But I think I got a usable backport of your change to 4.1.1 (you think it looks
>> ok?) and have given that a quick test which seems to be ok...
>> Though one drawback is that I don't have a setup which would use passthrough, so
>> that path is not tested. I think I did see (with a debugging version) that the
>> lost count was incremented and decremented in dom0, though.
>>
> 
> Honestly if you have to commit to a backport for your package right now,
> I would go for the previous version, because it is simpler and less
> likely to introduce regressions.

Agreed. Well at least I hope that since that backport seemed to fix the issue I
saw in 4.1.1 it will give some more confidence for you on the 4.2 version. With
the drawback of the passthrough not being tested.

-Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] xen: do not loose level interrupt notifications
  2011-10-03 17:24           ` Stefano Stabellini
  2011-10-03 18:13             ` Stefano Stabellini
  2011-10-05 16:10             ` Stefan Bader
@ 2011-10-27 10:37             ` Stefano Stabellini
  2011-10-27 11:18               ` Keir Fraser
  2 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-27 10:37 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Jan Beulich, Stefan Bader

PV on HVM guests can loose level interrupts coming from emulated
devices: we are missing code to retry to inject a pirq in the guest if
it corresponds to a level interrupt and the interrupt has been raised
while the guest is servicing the first one.

The same thing could also happen with PV guests, including dom0, even
though it is much more unlikely. In case of PV guests the scenario would
be the following:

1) a device raises a level interrupt and xen injects it into the
guest;

2) the guest is temporarely stuck: it does not ack it or eoi it;

3) the xen timer kicks in and eois the interrupt;

4) the device thinks it is all fine and sends a second interrupt;

5) Xen fails to inject the second interrupt into the guest because the
guest has still the event channel pending bit set;

at this point the guest looses the second interrupt notification, that
is not supposed to happen with level interrupts and it might cause
problems with some devices.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

diff -r bf533533046c xen/arch/x86/hvm/irq.c
--- a/xen/arch/x86/hvm/irq.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/hvm/irq.c	Mon Oct 03 16:54:51 2011 +0000
@@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
 
     if ( hvm_domain_use_pirq(d, pirq) )
     {
-        send_guest_pirq(d, pirq);
+        if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
+            pirq->lost++;
         return;
     }
     vioapic_irq_positive_edge(d, ioapic_gsi);
@@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
 {
     struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
     unsigned int gsi, link, isa_irq;
+    struct pirq *pirq;
 
     ASSERT((device <= 31) && (intx <= 3));
 
@@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
     gsi = hvm_pci_intx_gsi(device, intx);
     if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
         assert_gsi(d, gsi);
+    else {
+        pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
+        if ( hvm_domain_use_pirq(d, pirq) )
+            pirq->lost++;
+    }
 
     link    = hvm_pci_intx_link(device, intx);
     isa_irq = hvm_irq->pci_link.route[link];
diff -r bf533533046c xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/irq.c	Mon Oct 03 16:54:51 2011 +0000
@@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
              !test_and_set_bool(pirq->masked) )
             action->in_flight++;
         if ( !hvm_do_IRQ_dpci(d, pirq) )
-            send_guest_pirq(d, pirq);
+        {
+            if ( send_guest_pirq(d, pirq) &&
+                    action->ack_type == ACKTYPE_EOI )
+                pirq->lost++;
+        }
     }
 
     if ( action->ack_type != ACKTYPE_NONE )
diff -r bf533533046c xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/physdev.c	Mon Oct 03 16:54:51 2011 +0000
@@ -11,6 +11,7 @@
 #include <asm/current.h>
 #include <asm/io_apic.h>
 #include <asm/msi.h>
+#include <asm/hvm/irq.h>
 #include <asm/hypercall.h>
 #include <public/xen.h>
 #include <public/physdev.h>
@@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
+        if ( pirq->lost > 0) {
+            if ( !send_guest_pirq(v->domain, pirq) )
+                pirq->lost--;
+        }
         spin_unlock(&v->domain->event_lock);
         ret = 0;
         break;
@@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
             break;
         irq_status_query.flags = 0;
         if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
+             domain_pirq_to_irq(v->domain, irq) <= 0 &&
+             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
         {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+            ret = -EINVAL;
             break;
         }
 
diff -r bf533533046c xen/include/xen/irq.h
--- a/xen/include/xen/irq.h	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/include/xen/irq.h	Mon Oct 03 16:54:51 2011 +0000
@@ -146,6 +146,7 @@ struct pirq {
     int pirq;
     u16 evtchn;
     bool_t masked;
+    u32 lost;
     struct rcu_head rcu_head;
     struct arch_pirq arch;
 };

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] xen: do not loose level interrupt notifications
  2011-10-27 10:37             ` [PATCH] xen: do not loose level interrupt notifications Stefano Stabellini
@ 2011-10-27 11:18               ` Keir Fraser
  2011-10-27 11:42                 ` Stefano Stabellini
  0 siblings, 1 reply; 20+ messages in thread
From: Keir Fraser @ 2011-10-27 11:18 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefan Bader, Jan Beulich

On 27/10/2011 11:37, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com>
wrote:

> PV on HVM guests can loose level interrupts coming from emulated
> devices: we are missing code to retry to inject a pirq in the guest if
> it corresponds to a level interrupt and the interrupt has been raised
> while the guest is servicing the first one.
> 
> The same thing could also happen with PV guests, including dom0, even
> though it is much more unlikely. In case of PV guests the scenario would
> be the following:
> 
> 1) a device raises a level interrupt and xen injects it into the
> guest;
> 
> 2) the guest is temporarely stuck: it does not ack it or eoi it;
> 
> 3) the xen timer kicks in and eois the interrupt;
> 
> 4) the device thinks it is all fine and sends a second interrupt;
> 
> 5) Xen fails to inject the second interrupt into the guest because the
> guest has still the event channel pending bit set;
> 
> at this point the guest looses the second interrupt notification, that
> is not supposed to happen with level interrupts and it might cause
> problems with some devices.

You can't really lose a level-triggered interrupt. In step (4) the device
isn't really actively involved in sending another interrupt -- it never
deasserted its INTx line, and nor will it until the guest's ISR quenches the
interrupt at the device. If the guest misses such an interrupt, and doesn't
execute the relevant ISR when it should, then another interrupt will simply
be raised by the interrupt controller when the guest does finally EOI the
interrupt. Because the device is *still* asserting the line.

Well, that's the PV case anyway. I don't see any problem with our handling
of the PV case.

Is PV-HVM so different?

 -- Keir

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] xen: do not loose level interrupt notifications
  2011-10-27 11:18               ` Keir Fraser
@ 2011-10-27 11:42                 ` Stefano Stabellini
  2011-10-27 12:17                   ` Keir Fraser
  0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-10-27 11:42 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Stefan Bader, Jan Beulich, Stefano Stabellini

On Thu, 27 Oct 2011, Keir Fraser wrote:
> On 27/10/2011 11:37, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com>
> wrote:
> 
> > PV on HVM guests can loose level interrupts coming from emulated
> > devices: we are missing code to retry to inject a pirq in the guest if
> > it corresponds to a level interrupt and the interrupt has been raised
> > while the guest is servicing the first one.
> > 
> > The same thing could also happen with PV guests, including dom0, even
> > though it is much more unlikely. In case of PV guests the scenario would
> > be the following:
> > 
> > 1) a device raises a level interrupt and xen injects it into the
> > guest;
> > 
> > 2) the guest is temporarely stuck: it does not ack it or eoi it;
> > 
> > 3) the xen timer kicks in and eois the interrupt;
> > 
> > 4) the device thinks it is all fine and sends a second interrupt;
> > 
> > 5) Xen fails to inject the second interrupt into the guest because the
> > guest has still the event channel pending bit set;
> > 
> > at this point the guest looses the second interrupt notification, that
> > is not supposed to happen with level interrupts and it might cause
> > problems with some devices.
> 
> You can't really lose a level-triggered interrupt. In step (4) the device
> isn't really actively involved in sending another interrupt -- it never
> deasserted its INTx line, and nor will it until the guest's ISR quenches the
> interrupt at the device. If the guest misses such an interrupt, and doesn't
> execute the relevant ISR when it should, then another interrupt will simply
> be raised by the interrupt controller when the guest does finally EOI the
> interrupt. Because the device is *still* asserting the line.
> 
> Well, that's the PV case anyway. I don't see any problem with our handling
> of the PV case.

OK, you convinced me.


> Is PV-HVM so different?

Yes, it is. In the PV on HVM case we need to reassert an emulated
interrupt if the guest EOIs it without quenching the interrupt in the
ISR.
We are doing it already in the emulated code path but we are not doing
it for interrupts that have been remapped into pirqs.

That said, if we don't care about the PV case we can simplify the patch,
I am appending a new one that only takes care of the PV on HVM case.

---


xen: re-inject emulated level pirqs in PV on HVM guests if still asserted

PV on HVM guests can loose level interrupts coming from emulated devices
if they have been remapped onto event channels.  The reason is that we
are missing the code to inject a pirq again in the guest when the guest
EOIs it, if it corresponds to an emulated level interrupt and the
interrupt is still asserted.

Fix this issue and also return error when the guest tries to get the
irq_status of a non-existing pirq.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

diff -r c681dd5aecf3 xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Tue Oct 25 19:22:09 2011 +0100
+++ b/xen/arch/x86/physdev.c	Thu Oct 27 11:30:55 2011 +0000
@@ -11,6 +11,7 @@
 #include <asm/current.h>
 #include <asm/io_apic.h>
 #include <asm/msi.h>
+#include <asm/hvm/irq.h>
 #include <asm/hypercall.h>
 #include <public/xen.h>
 #include <public/physdev.h>
@@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
+        if ( is_hvm_domain(v->domain) &&
+                domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
+        {
+            struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
+            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
+
+            /* if this is a level irq and count > 0, send another
+             * notification */ 
+            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
+                    && hvm_irq->gsi_assert_count[gsi] )
+                send_guest_pirq(v->domain, pirq);
+        }
         spin_unlock(&v->domain->event_lock);
         ret = 0;
         break;
@@ -328,9 +341,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
             break;
         irq_status_query.flags = 0;
         if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
+             domain_pirq_to_irq(v->domain, irq) <= 0 &&
+             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
         {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+            ret = -EINVAL;
             break;
         }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] xen: do not loose level interrupt notifications
  2011-10-27 11:42                 ` Stefano Stabellini
@ 2011-10-27 12:17                   ` Keir Fraser
  0 siblings, 0 replies; 20+ messages in thread
From: Keir Fraser @ 2011-10-27 12:17 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefan Bader, Jan Beulich

On 27/10/2011 12:42, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com>
wrote:

>> Is PV-HVM so different?
> 
> Yes, it is. In the PV on HVM case we need to reassert an emulated
> interrupt if the guest EOIs it without quenching the interrupt in the
> ISR.
> We are doing it already in the emulated code path but we are not doing
> it for interrupts that have been remapped into pirqs.
> 
> That said, if we don't care about the PV case we can simplify the patch,
> I am appending a new one that only takes care of the PV on HVM case.

Ah yes, when we are *emulating* an INTx line, either for an emulated device
or because we are doing MSI-INTx emulation, we do have to remember to
reassert. That makes sense.

 -- Keir

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-09-22 11:58       ` Stefan Bader
@ 2011-09-22 14:32         ` Stefan Bader
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Bader @ 2011-09-22 14:32 UTC (permalink / raw)
  To: xen-devel

On 22.09.2011 13:58, Stefan Bader wrote:
> On 22.09.2011 12:30, Stefano Stabellini wrote:
>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>> On 21.09.2011 15:31, Stefano Stabellini wrote:
>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
>>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
>>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
>>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
>>>>> there are messages about tx timeouts.
>>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
>>>>> correctly.
>>>>>
>>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
>>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
>>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
>>>>> at least not level.
>>>>
>>>
>>>> Does the e1000 emulated card work correctly?
>>>
>>> Yes, that one seems to work ok.
>>>
>>>> What happens if you disable interrupt remapping (see patch below)?
>>>
>>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
>>> still works. Both then using IOAPIC-fasteoi.
>>>
>>
>> That means there must be another subtle bug in Xen in interrupt
>> remapping that only affects 8139p emulation
>>
> Right, or to be complete:
> - e1000: ok
> - 8139cp: unstable (setup is possible)
> - ne2k_pci: not working (tx problems from the beginning)
> 
> The behaviour feels a bit like interrupts may get lost if occurring at a higher
> rate. Why this affects various drivers differently is a bit weird.
>>

This is mainly speculating... Quite a while back there was this patch to events:

commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Fri Aug 20 19:10:01 2010 -0700

    xen: handle events as edge-triggered

The commit message stated that Xen events are logically edge triggered. So PV
events were changed to be handled as edge interrupts. Would that not mean that
for xen-pirq-apic being using events this would apply the same and those should
be apic-edge instead of level?

>>>>> Another problem came up recently though that may just be me doing the wrong
>>>>> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated
>>>>> devices. xen-blkfront is a module in my case and I thought I once had been able
>>>>> to use that by removing the unplug arg and making the blkfront driver load. But
>>>>> when I recently tried the module loaded but no disks appeared... Again, not sure
>>>>> I just forgot how to do that right or that was different when using a 4.1.0
>>>>> hypervisor still...
>>>>  
>>>> xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
>>>> older hypervisors that didn't support the unplug protocol and had other
>>>> ways to cope with multiple drivers accessing the same devices.
>>>> You can use xen_emul_unplug=never to prevent any unplug but you won't
>>>> get any PV interfaces.
>>>
>>> Hm, odd. Somehow I thought that I had been using pv interfaces that way when the
>>> interrupts for the emulated ide was broken.
>>> A bit suboptimal atm, because without any option and a kernel compiled with the
>>> platform pci and pv drivers (as modules) booting in HVM mode the kernel decides
>>> that having both is no use and unplugs the emulated devices. Which then leaves
>>> you with ... none.
>>
>> In theory you would have the PV frontend modules in the initrd.
>> On the other hand having both can easily cause data corruptions on your
>> drive.
> 
> They _are_ in the initrd. And the boot rightfully drops to a maintenance shell
> right now (without any argument and the emulated devices unplugged). And
> "modprobe xen-blkfront" loads the module but it does _not_ detect any pv device.
> 
> -Stefan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-09-22 10:30     ` Stefano Stabellini
@ 2011-09-22 11:58       ` Stefan Bader
  2011-09-22 14:32         ` Stefan Bader
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-09-22 11:58 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano

On 22.09.2011 12:30, Stefano Stabellini wrote:
> On Wed, 21 Sep 2011, Stefan Bader wrote:
>> On 21.09.2011 15:31, Stefano Stabellini wrote:
>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
>>>> there are messages about tx timeouts.
>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
>>>> correctly.
>>>>
>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
>>>> at least not level.
>>>
>>
>>> Does the e1000 emulated card work correctly?
>>
>> Yes, that one seems to work ok.
>>
>>> What happens if you disable interrupt remapping (see patch below)?
>>
>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
>> still works. Both then using IOAPIC-fasteoi.
>>
> 
> That means there must be another subtle bug in Xen in interrupt
> remapping that only affects 8139p emulation
> 
Right, or to be complete:
- e1000: ok
- 8139cp: unstable (setup is possible)
- ne2k_pci: not working (tx problems from the beginning)

The behaviour feels a bit like interrupts may get lost if occurring at a higher
rate. Why this affects various drivers differently is a bit weird.
> 
>>>> Another problem came up recently though that may just be me doing the wrong
>>>> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated
>>>> devices. xen-blkfront is a module in my case and I thought I once had been able
>>>> to use that by removing the unplug arg and making the blkfront driver load. But
>>>> when I recently tried the module loaded but no disks appeared... Again, not sure
>>>> I just forgot how to do that right or that was different when using a 4.1.0
>>>> hypervisor still...
>>>  
>>> xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
>>> older hypervisors that didn't support the unplug protocol and had other
>>> ways to cope with multiple drivers accessing the same devices.
>>> You can use xen_emul_unplug=never to prevent any unplug but you won't
>>> get any PV interfaces.
>>
>> Hm, odd. Somehow I thought that I had been using pv interfaces that way when the
>> interrupts for the emulated ide was broken.
>> A bit suboptimal atm, because without any option and a kernel compiled with the
>> platform pci and pv drivers (as modules) booting in HVM mode the kernel decides
>> that having both is no use and unplugs the emulated devices. Which then leaves
>> you with ... none.
> 
> In theory you would have the PV frontend modules in the initrd.
> On the other hand having both can easily cause data corruptions on your
> drive.

They _are_ in the initrd. And the boot rightfully drops to a maintenance shell
right now (without any argument and the emulated devices unplugged). And
"modprobe xen-blkfront" loads the module but it does _not_ detect any pv device.

-Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-09-21 15:34   ` Stefan Bader
@ 2011-09-22 10:30     ` Stefano Stabellini
  2011-09-22 11:58       ` Stefan Bader
  0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2011-09-22 10:30 UTC (permalink / raw)
  To: Stefan Bader; +Cc: xen-devel, Stefano, Stefano Stabellini

On Wed, 21 Sep 2011, Stefan Bader wrote:
> On 21.09.2011 15:31, Stefano Stabellini wrote:
> > On Wed, 21 Sep 2011, Stefan Bader wrote:
> >> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
> >> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
> >> gets configured via dhcp. And initial pings also get routed and done correctly.
> >> But slightly higher traffic (like checking for updates) hangs. And after a while
> >> there are messages about tx timeouts.
> >> The ne2k_pci type nic almost immediately has those issues and never comes up
> >> correctly.
> >>
> >> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
> >> this should be but both nics get configured with level,low IRQs. Disk emulation
> >> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
> >> at least not level.
> > 
> 
> > Does the e1000 emulated card work correctly?
> 
> Yes, that one seems to work ok.
> 
> > What happens if you disable interrupt remapping (see patch below)?
> 
> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
> still works. Both then using IOAPIC-fasteoi.
> 

That means there must be another subtle bug in Xen in interrupt
remapping that only affects 8139p emulation


> >> Another problem came up recently though that may just be me doing the wrong
> >> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated
> >> devices. xen-blkfront is a module in my case and I thought I once had been able
> >> to use that by removing the unplug arg and making the blkfront driver load. But
> >> when I recently tried the module loaded but no disks appeared... Again, not sure
> >> I just forgot how to do that right or that was different when using a 4.1.0
> >> hypervisor still...
> >  
> > xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
> > older hypervisors that didn't support the unplug protocol and had other
> > ways to cope with multiple drivers accessing the same devices.
> > You can use xen_emul_unplug=never to prevent any unplug but you won't
> > get any PV interfaces.
> 
> Hm, odd. Somehow I thought that I had been using pv interfaces that way when the
> interrupts for the emulated ide was broken.
> A bit suboptimal atm, because without any option and a kernel compiled with the
> platform pci and pv drivers (as modules) booting in HVM mode the kernel decides
> that having both is no use and unplugs the emulated devices. Which then leaves
> you with ... none.

In theory you would have the PV frontend modules in the initrd.
On the other hand having both can easily cause data corruptions on your
drive.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: Still struggling with HVM: tx timeouts on emulated nics
  2011-09-21 13:31 ` Stefano Stabellini
@ 2011-09-21 15:34   ` Stefan Bader
  2011-09-22 10:30     ` Stefano Stabellini
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Bader @ 2011-09-21 15:34 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano

On 21.09.2011 15:31, Stefano Stabellini wrote:
> On Wed, 21 Sep 2011, Stefan Bader wrote:
>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
>> gets configured via dhcp. And initial pings also get routed and done correctly.
>> But slightly higher traffic (like checking for updates) hangs. And after a while
>> there are messages about tx timeouts.
>> The ne2k_pci type nic almost immediately has those issues and never comes up
>> correctly.
>>
>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
>> this should be but both nics get configured with level,low IRQs. Disk emulation
>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
>> at least not level.
> 

> Does the e1000 emulated card work correctly?

Yes, that one seems to work ok.

> What happens if you disable interrupt remapping (see patch below)?

8139cp seems to work correctly now (much higher irq stats as well) and e1000
still works. Both then using IOAPIC-fasteoi.

> 
> 
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index 1017c7b..472a58b 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -354,8 +354,7 @@ int __init pci_xen_init(void)
>  
>  int __init pci_xen_hvm_init(void)
>  {
> -	if (!xen_feature(XENFEAT_hvm_pirqs))
> -		return 0;
> +	return 0;
>  
>  #ifdef CONFIG_ACPI
>  	/*
> 
> 
>> Btw, what exactly is the difference between xen-pirq-ioapic and IO-APIC?
> 
> xen-pirq-ioapic interrupts are interrupts that have been remapped onto
> event channels

Ah ok, thanks.

> 
> 
>> Another problem came up recently though that may just be me doing the wrong
>> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated
>> devices. xen-blkfront is a module in my case and I thought I once had been able
>> to use that by removing the unplug arg and making the blkfront driver load. But
>> when I recently tried the module loaded but no disks appeared... Again, not sure
>> I just forgot how to do that right or that was different when using a 4.1.0
>> hypervisor still...
>  
> xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
> older hypervisors that didn't support the unplug protocol and had other
> ways to cope with multiple drivers accessing the same devices.
> You can use xen_emul_unplug=never to prevent any unplug but you won't
> get any PV interfaces.

Hm, odd. Somehow I thought that I had been using pv interfaces that way when the
interrupts for the emulated ide was broken.
A bit suboptimal atm, because without any option and a kernel compiled with the
platform pci and pv drivers (as modules) booting in HVM mode the kernel decides
that having both is no use and unplugs the emulated devices. Which then leaves
you with ... none.

-Stefan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2011-10-27 12:17 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <4E7B4768.8060103@canonical.com>
2011-09-22 17:44 ` Re: Still struggling with HVM: tx timeouts on emulated nics Stefano Stabellini
2011-09-30  9:13   ` Stefan Bader
2011-09-30 14:09     ` Stefano Stabellini
2011-09-30 16:06       ` Stefan Bader
2011-09-30 17:59         ` Stefan Bader
2011-10-03 17:24           ` Stefano Stabellini
2011-10-03 18:13             ` Stefano Stabellini
2011-10-04 10:07               ` Andrew Cooper
2011-10-04 14:13                 ` Stefano Stabellini
2011-10-05 16:10             ` Stefan Bader
2011-10-06 10:12               ` Stefano Stabellini
2011-10-06 12:16                 ` Stefan Bader
2011-10-27 10:37             ` [PATCH] xen: do not loose level interrupt notifications Stefano Stabellini
2011-10-27 11:18               ` Keir Fraser
2011-10-27 11:42                 ` Stefano Stabellini
2011-10-27 12:17                   ` Keir Fraser
2011-09-21 13:03 Still struggling with HVM: tx timeouts on emulated nics Stefan Bader
2011-09-21 13:31 ` Stefano Stabellini
2011-09-21 15:34   ` Stefan Bader
2011-09-22 10:30     ` Stefano Stabellini
2011-09-22 11:58       ` Stefan Bader
2011-09-22 14:32         ` Stefan Bader

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.