Re: Re: Still struggling with HVM: tx timeouts on emulated nics

From: Stefan Bader <stefan.bader@canonical.com>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: xen-devel@lists.xensource.com
Subject: Re: Re: Still struggling with HVM: tx timeouts on emulated nics
Date: Fri, 30 Sep 2011 11:13:32 +0200
Message-ID: <4E85883C.7030808@canonical.com>
In-Reply-To: <alpine.DEB.2.00.1109221838370.8700@kaball-desktop>

On 22.09.2011 19:44, Stefano Stabellini wrote:
> On Thu, 22 Sep 2011, Stefan Bader wrote:
>> On 22.09.2011 13:58, Stefan Bader wrote:
>>> On 22.09.2011 12:30, Stefano Stabellini wrote:
>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>>> On 21.09.2011 15:31, Stefano Stabellini wrote:
>>>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
>>>>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
>>>>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
>>>>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
>>>>>>> there are messages about tx timeouts.
>>>>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
>>>>>>> correctly.
>>>>>>>
>>>>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
>>>>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
>>>>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
>>>>>>> at least not level.
>>>>>>
>>>>>
>>>>>> Does the e1000 emulated card work correctly?
>>>>>
>>>>> Yes, that one seems to work ok.
>>>>>
>>>>>> What happens if you disable interrupt remapping (see patch below)?
>>>>>
>>>>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
>>>>> still works. Both then using IOAPIC-fasteoi.
>>>>>
>>>>
>>>> That means there must be another subtle bug in Xen in interrupt
>>>> remapping that only affects 8139p emulation
>>>>
>>> Right, or to be complete:
>>> - e1000: ok
>>> - 8139cp: unstable (setup is possible)
>>> - ne2k_pci: not working (tx problems from the beginning)
>>>
>>> The behaviour feels a bit like interrupts may get lost if occurring at a higher
>>> rate. Why this affects various drivers differently is a bit weird.
>>>>
>>
>> This is mainly speculating... Quite a while back there was this patch to events:
>>
>> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
>> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>> Date:   Fri Aug 20 19:10:01 2010 -0700
>>
>>     xen: handle events as edge-triggered
>>
>> The commit message stated that Xen events are logically edge triggered. So PV
>> events were changed to be handled as edge interrupts. Would that not mean that
>> for xen-pirq-apic being using events this would apply the same and those should
>> be apic-edge instead of level?
> 
> That commit is referring to the internal way Linux handles these event,
> that look like normal interrupt to the Linux irq subsystem. It is not
> related to the way actual events are delivered from Xen to Linux, so it
> shouldn't matter here.
> 
> I would add lots of printk's in:
> 
> xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
> xen/arch/x86/hvm/irq.c:assert_irq
> xen/arch/x86/hvm/irq.c:assert_gsi
> 
> to find out why xen is not injecting those interrupts
> 

It took quite a bit of time but at least I got some hopefully useful information
now. So in general, whenever an interrupt is asserted,
the hypervisor runs through this:

__hvm_pci_intx_assert:
  when assert count was 0 before incrementing
    call assert_gsi
      call send_guest_pirq (when hvm uses pirq)

In the send_guest_pirq chain is a call to evtchn_set_pending which tests as one
of the first actions whether evtchn_pending in the shared_info is set. If that
is the case the call immediately returns with 1.
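
To make that early return concrete, here is a minimal user-space sketch. It is
only a model under stated assumptions, not the Xen code: the name
model_evtchn_set_pending and the single 64-bit word standing in for
evtchn_pending are made up for illustration, and the real function uses an
atomic test-and-set before it continues with the delivery.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the early test in evtchn_set_pending.
 * 'pending' models one 64-bit word of evtchn_pending; atomicity and the
 * rest of the delivery path are deliberately left out.
 */
static int model_evtchn_set_pending(uint64_t *pending, unsigned int port)
{
    uint64_t bit;

    assert(port < 64);          /* the model covers a single word only */
    bit = (uint64_t)1 << port;

    if (*pending & bit)
        return 1;               /* already pending: nothing new is signalled */

    *pending |= bit;
    return 0;                   /* newly pending: delivery would continue here */
}
```

Calling it twice for the same port without the guest clearing the bit in
between gives 0 and then 1, which is the return code pattern seen with the
printks.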

Adding printks to call_assert_gsi, I noticed that
- When things stop working, the last call to send_guest_pirq returned 1.
- But the stall does not happen every time the return code is 1.
- e1000 also has cases where send_guest_pirq returns 1, but they happen much
  less often than with the 8139cp.

Usually every intx_assert is followed by an intx_deassert call. When the stall
occurs, this does not happen. At this point I had some trouble understanding
where this intx_deassert is actually triggered. With an added WARN_ON the stack
traces seem odd, like this:

(XEN)    [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130
(XEN)    [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60
(XEN)    [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60
(XEN)    [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220
(XEN)    [<ffff82c48017dba6>] copy_from_user+0x26/0x90
(XEN)    [<ffff82c4801f9446>] do_iret+0xb6/0x1a0
(XEN)    [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d

Not really sure how one gets from do_update_descriptor to do_hvm_op and the only
thing in there which does the deassert is some irq level setting.

Actually the guest does not really do much on EOI (I had been assuming it did).
Since domain_pirq_to_irq maps to 0 for emuirqs, the call to
PHYSDEVOP_irq_status_query will hit the following and not set the flag for
needing EOI.

        irq_status_query.flags = 0;
        if ( is_hvm_domain(v->domain) &&
             domain_pirq_to_irq(v->domain, irq) <= 0 )
        {
            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
            break;
        }
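
The effect of that early exit can be sketched as a small stand-alone model.
Everything named model_* or MODEL_* below is hypothetical, and the
phys_needs_eoi parameter is an assumption that simply stands in for whatever
the remainder of the real handler would decide for a physical irq.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IRQSTAT_NEEDS_EOI (1u << 0)   /* stand-in for the needs-EOI flag */

struct model_irq_status_query {
    uint32_t irq;
    uint32_t flags;
};

/* Hypervisor side: mirrors the shape of the excerpt above. */
static void model_irq_status_query(struct model_irq_status_query *q,
                                   bool is_hvm, int mapped_irq,
                                   bool phys_needs_eoi)
{
    q->flags = 0;
    if (is_hvm && mapped_irq <= 0)
        return;                             /* emuirq: flags stay 0 */
    if (phys_needs_eoi)
        q->flags |= MODEL_IRQSTAT_NEEDS_EOI;
}

/* Guest side: would the EOI path notify the hypervisor at all? */
static bool model_guest_needs_eoi(const struct model_irq_status_query *q)
{
    return (q->flags & MODEL_IRQSTAT_NEEDS_EOI) != 0;
}
```

With is_hvm set and mapped_irq at 0, as for the emulated nics, the guest side
always ends up with "no EOI needed", whatever the last parameter says.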

So all the guest does in the pirq EOI function is clear evtchn_pending. I
fail to understand what is actually making the hvm_pci_intx_deassert calls, but
from the way the fasteoi code in the guest appears to work, there seems to be
some gap between calling the handler and the eoi function... So from what I see,
I would assume the following:

dom0                                     domU
- intx_assert (count 0->1)
- send_guest_pirq = 0
  (evtchn_pending = 1)
                                         - upcall starts fasteoi handler
- something does intx_deassert
  (count 1->0)
- intx_assert (count 0->1)
- send_guest_pirq = 1
  (evtchn_pending still set)
                                         - handler->eoi sets evtchn to 0 but
                                           otherwise does nothing
- there is no intx_deassert, so even
  when another intx_assert would happen
  (which does not seem to be the case)
  no further send_guest_pirq would be
  called.
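
The sequence above can be replayed with a toy state machine. This is a sketch
of my assumption only: the stall_* names are invented, the struct reduces the
hypervisor state to an assert count and one pending bit, and none of it is
taken from the Xen sources.

```c
#include <assert.h>
#include <stdbool.h>

struct stall_model {
    unsigned int assert_count;  /* INTx assert count for the line */
    bool evtchn_pending;        /* pending bit as seen in shared_info */
    unsigned int upcalls;       /* events that actually reached the guest */
};

/* Returns -1 if nothing was sent (line already asserted), otherwise the
 * modelled send_guest_pirq result: 0 = delivered, 1 = was already pending. */
static int stall_intx_assert(struct stall_model *m)
{
    if (m->assert_count++ != 0)
        return -1;
    if (m->evtchn_pending)
        return 1;
    m->evtchn_pending = true;
    m->upcalls++;
    return 0;
}

static void stall_intx_deassert(struct stall_model *m)
{
    assert(m->assert_count > 0);
    m->assert_count--;
}

/* Guest eoi as described: clears the pending bit and does nothing else. */
static void stall_guest_eoi(struct stall_model *m)
{
    m->evtchn_pending = false;
}

/* Replays the timeline and returns how many upcalls the guest received. */
static unsigned int stall_replay(void)
{
    struct stall_model m = { 0, false, 0 };

    stall_intx_assert(&m);      /* count 0->1, delivered */
    stall_intx_deassert(&m);    /* count 1->0 while the handler still runs */
    stall_intx_assert(&m);      /* count 0->1, but pending is still set */
    stall_guest_eoi(&m);        /* pending cleared, second event is lost */
    stall_intx_assert(&m);      /* count 1->2, nothing is sent any more */
    return m.upcalls;
}
```

In this model three asserts from the device result in a single upcall, and the
line stays asserted with the pending bit clear, which matches the stall as far
as I can observe it.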

Unfortunately I am missing some details of the inner workings here. Generally I
wonder whether not setting the needsEOI flag for those pirqs is simply the
problem. But it could also be intentional...

-Stefan