From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefano Stabellini
Subject: Re: Re: Still struggling with HVM: tx timeouts on emulated nics
Date: Tue, 4 Oct 2011 15:13:03 +0100
References: <4E7B4768.8060103@canonical.com> <4E85883C.7030808@canonical.com> <4E85E8E8.2020702@canonical.com> <4E860382.7040108@canonical.com> <4E8ADAD3.9050404@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
In-Reply-To: <4E8ADAD3.9050404@citrix.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Andrew Cooper
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

On Tue, 4 Oct 2011, Andrew Cooper wrote:
> On 03/10/11 19:13, Stefano Stabellini wrote:
> > CC'ing Jan, who is probably going to have an opinion on this.
> >
> > Let me add a bit of background: Stefan found out that PV on HVM guests
> > could lose level interrupts coming from emulated devices. Looking
> > through the code I realized that we need to add some logic to inject a
> > pirq in the guest if a level interrupt has been raised while the guest
> > is servicing the first one.
> >
> > While this is all very specific to interrupt remapping and emulated
> > devices, I realized that something similar could happen even with dom0
> > or other PV guests with PCI passthrough:
> >
> > 1) the device raises a level interrupt and Xen injects it into the
> > guest;
> >
> > 2) the guest is temporarily stuck: it does not ack it or eoi it;
> >
> > 3) the Xen timer kicks in and eois the interrupt;
> >
> > 4) the device thinks it is all fine and sends a second interrupt;
> >
> > 5) Xen fails to inject the second interrupt into the guest because the
> > guest still has the event channel pending bit set;
> >
> > at this point the guest loses the second interrupt notification, which
> > is not supposed to happen with level interrupts and I think it might
> > cause problems with some devices.
> >
> > Jan, do you think we should try to handle this case, or is it too
> > unlikely?
>
> I am not certain whether this is relevant, but the ICH10 IO-APIC
> documentation indicated that early EOI'ing of a line level interrupt
> should not have this effect. Specifically, it states that EOI'ing a
> line level interrupt whose line is still asserted will cause the
> interrupt to be "re-raised" from the IO-APIC. It uses this to assert
> that it is fine to use multiple IO-APIC entries with the same vector,
> with a broadcast of vector number alone to EOI the interrupt.
>
> In this case, while Xen sees two interrupts, from the device's point of
> view, only one has happened.
>
> In the case where the device has dropped its line level interrupt of its
> own accord, then I would agree that the current Xen behavior is wrong.
> I can't offhand think of a good reason why this would occur.

I think this scenario is actually possible. It is certainly happening
with qemu's emulated devices.
This patch would take care of re-injecting the interrupt in both cases:
when the device deasserts and reasserts the interrupt while the guest
has not yet cleared the pending bit, and when a PV on HVM guest eois the
interrupt too early.
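
To make the lost-notification case from steps 1) to 5) concrete, here is
a toy model in Python. It is only a sketch and not the patch itself: the
names ToyGuest, ToyXen and run_scenario are made up for illustration,
the guest is reduced to a single event channel pending bit, and the
"reinject" flag is an assumed stand-in for the kind of logic discussed
above (remember that an interrupt arrived while the pending bit was set,
and deliver it once the guest clears the bit).

```python
class ToyGuest:
    """Hypothetical guest reduced to one event channel pending bit."""

    def __init__(self):
        self.pending = False     # event channel pending bit
        self.notifications = 0   # notifications the guest actually received


class ToyXen:
    """Hypothetical hypervisor side; not real Xen code."""

    def __init__(self, guest, reinject):
        self.guest = guest
        self.reinject = reinject  # assumed fix: remember missed interrupts
        self.missed = False       # interrupt arrived while pending bit was set

    def device_raises_interrupt(self):
        if self.guest.pending:
            # Step 5: the pending bit is still set, so the guest gets
            # no new notification for this interrupt.
            self.missed = True
            return
        self.guest.pending = True
        self.guest.notifications += 1

    def timer_eoi(self):
        # Step 3: the timer EOIs on behalf of the stuck guest.
        # The guest's pending bit is deliberately left untouched.
        pass

    def guest_clears_pending(self):
        self.guest.pending = False
        if self.reinject and self.missed:
            # Assumed fix: deliver the interrupt that was missed earlier.
            self.missed = False
            self.device_raises_interrupt()


def run_scenario(reinject):
    guest = ToyGuest()
    xen = ToyXen(guest, reinject)
    xen.device_raises_interrupt()   # step 1
    # step 2: the guest is stuck and neither acks nor EOIs
    xen.timer_eoi()                 # step 3
    xen.device_raises_interrupt()   # steps 4 and 5
    xen.guest_clears_pending()      # the guest finally makes progress
    return guest.notifications


if __name__ == "__main__":
    print("without re-injection:", run_scenario(False))
    print("with re-injection:", run_scenario(True))
```

In this model the device raises two interrupts. Without re-injection
the guest is notified once; with re-injection it is notified twice.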