From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefano Stabellini
Subject: Re: Re: Still struggling with HVM: tx timeouts on emulated nics
Date: Thu, 22 Sep 2011 18:44:31 +0100
References: <4E7B4768.8060103@canonical.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
In-Reply-To: <4E7B4768.8060103@canonical.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Stefan Bader
Cc: xen-devel@lists.xensource.com, Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

On Thu, 22 Sep 2011, Stefan Bader wrote:
> On 22.09.2011 13:58, Stefan Bader wrote:
> > On 22.09.2011 12:30, Stefano Stabellini wrote:
> >> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>> On 21.09.2011 15:31, Stefano Stabellini wrote:
> >>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
> >>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
> >>>>> gets configured via dhcp. And initial pings also get routed and done correctly.
> >>>>> But slightly higher traffic (like checking for updates) hangs. And after a while
> >>>>> there are messages about tx timeouts.
> >>>>> The ne2k_pci type nic almost immediately has those issues and never comes up
> >>>>> correctly.
> >>>>>
> >>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
> >>>>> this should be but both nics get configured with level,low IRQs. Disk emulation
> >>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
> >>>>> at least not level.
> >>>>
> >>>
> >>>> Does the e1000 emulated card work correctly?
> >>>
> >>> Yes, that one seems to work ok.
> >>>
> >>>> What happens if you disable interrupt remapping (see patch below)?
> >>>
> >>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000
> >>> still works.
> >>> Both then using IOAPIC-fasteoi.
> >>>
> >>
> >> That means there must be another subtle bug in Xen in interrupt
> >> remapping that only affects 8139p emulation
> >>
> > Right, or to be complete:
> > - e1000: ok
> > - 8139cp: unstable (setup is possible)
> > - ne2k_pci: not working (tx problems from the beginning)
> >
> > The behaviour feels a bit like interrupts may get lost if occurring at a higher
> > rate. Why this affects various drivers differently is a bit weird.
> >>
>
> This is mainly speculating... Quite a while back there was this patch to events:
>
> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
> Author: Jeremy Fitzhardinge
> Date: Fri Aug 20 19:10:01 2010 -0700
>
>     xen: handle events as edge-triggered
>
> The commit message stated that Xen events are logically edge triggered. So PV
> events were changed to be handled as edge interrupts. Would that not mean that
> for xen-pirq-apic being using events this would apply the same and those should
> be apic-edge instead of level?

That commit refers to the way Linux handles these events internally: to
the Linux irq subsystem they look like normal interrupts.
It is not related to the way actual events are delivered from Xen to
Linux, so it shouldn't matter here.

I would add lots of printk's in:

xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
xen/arch/x86/hvm/irq.c:assert_irq
xen/arch/x86/hvm/irq.c:assert_gsi

to find out why Xen is not injecting those interrupts.
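
To make the printk suggestion a bit more concrete, here is a minimal
userspace sketch of the kind of bookkeeping such tracing could report.
It is not Xen code: struct irq_trace, NR_GSI, TRACE and the model_*
functions are hypothetical names made up for this illustration, and
TRACE only stands in for the hypervisor's printk. The sketch assumes
that an interrupt is injected only when a line goes from deasserted to
asserted, which may or may not match what the three functions above
really do. That is exactly what the real printk's would have to confirm.

```c
#include <assert.h>
#include <stdio.h>

#define NR_GSI 48 /* hypothetical table size for this model only */

/* Hypothetical per-guest counters; not a Xen data structure. */
struct irq_trace {
    unsigned int asserts[NR_GSI];    /* times a device raised the line */
    unsigned int injected[NR_GSI];   /* times the guest got an interrupt */
    unsigned int line_level[NR_GSI]; /* outstanding asserts on the line */
};

/* Stand-in for printk: every call site passes at least one argument. */
#define TRACE(fmt, ...) fprintf(stderr, "irqtrace: " fmt "\n", __VA_ARGS__)

/*
 * Model of asserting a line. Assumption: only the 0 -> 1 transition
 * injects. Returns 1 if an interrupt was injected, 0 otherwise.
 */
static int model_assert(struct irq_trace *t, unsigned int gsi)
{
    assert(gsi < NR_GSI);
    t->asserts[gsi]++;
    if (t->line_level[gsi]++ != 0) {
        TRACE("gsi %u already asserted (level %u), nothing injected",
              gsi, t->line_level[gsi]);
        return 0;
    }
    t->injected[gsi]++;
    TRACE("gsi %u went 0 -> 1, injecting (total %u)",
          gsi, t->injected[gsi]);
    return 1;
}

/* Model of deasserting a line; logs an unbalanced deassert. */
static void model_deassert(struct irq_trace *t, unsigned int gsi)
{
    assert(gsi < NR_GSI);
    if (t->line_level[gsi] == 0) {
        TRACE("gsi %u deasserted while already low", gsi);
        return;
    }
    t->line_level[gsi]--;
    TRACE("gsi %u deasserted (level now %u)", gsi, t->line_level[gsi]);
}

/* Asserts that did not result in an injection for this line. */
static unsigned int model_suppressed(const struct irq_trace *t,
                                     unsigned int gsi)
{
    assert(gsi < NR_GSI);
    return t->asserts[gsi] - t->injected[gsi];
}
```

Comparing the number of asserts with the number of injections per line,
for the 8139cp and for the e1000, would show whether the asserts arrive
but get swallowed, or never arrive at all.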