From: Jeremy Fitzhardinge
Subject: Re: radeon in dom0/ivtv in domU: irq 16 nobody cared
Date: Thu, 08 Apr 2010 11:45:37 -0700
Message-ID: <4BBE2451.7090600@goop.org>
References: <20100408001916.GA10840@phenom.dumpdata.com> <20100408173700.GB26343@phenom.dumpdata.com>
In-Reply-To: <20100408173700.GB26343@phenom.dumpdata.com>
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xensource.com, Mark Hurenkamp
List-Id: xen-devel@lists.xenproject.org

On 04/08/2010 10:37 AM, Konrad Rzeszutek Wilk wrote:
>> Yes,
>>
>> Please e-mail your full serial log output, your cat /proc/interrupts,
>> and 'lspci -vvv' output. This is to say, for both Dom0 and DomU.
>>
> I think I am able to reproduce this with one device (in DomU) that shares the IRQ
> (17) with another device that is in Dom0. In Dom0 I get:
>

For the "nobody cared" message to trigger, there must either have been
no interrupt handlers at all, or they all returned IRQ_NONE.

So in theory, if irq 17 has an active driver on it, then its irq handler
should see the interrupt, poke the device, go "huh, nothing for me to
do, must be a spurious interrupt from something else sharing the irq",
and I guess return IRQ_NONE.

So what stops this? If the irq isn't being shared with anything in
dom0, we should be careful not to even map the interrupt into dom0
(though I suspect we only ever map, never unmap, interrupts).

But if the interrupt is being shared, I think we need a proxy interrupt
handler installed by pciback (pcistub?) to absorb apparently spurious
interrupts, which always returns IRQ_HANDLED (and perhaps has some of
its own screaming interrupt logic in case something has gone awry)?

Or if not that, what?

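To make the reasoning above concrete, here is a minimal userspace sketch
(plain C, not kernel code) of a shared interrupt line. It assumes a
simplified "nobody cared" detector: if nearly every interrupt in a
window goes unclaimed, the line is disabled. All names (irq_line,
irq_line_add, irq_line_deliver, dom0_driver_handler, proxy_handler,
storm_disables_line) and the WINDOW and UNHANDLED_LIMIT values are
hypothetical and chosen for illustration; only IRQ_NONE and IRQ_HANDLED
come from the discussion. It is one way to picture the proxy-handler
idea, not an existing pciback implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the two return values discussed above. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

#define MAX_HANDLERS    4
#define WINDOW          1000  /* illustrative: interrupts per check window */
#define UNHANDLED_LIMIT 990   /* illustrative: unclaimed count that trips */

/* Hypothetical model of one shared interrupt line. */
struct irq_line {
    irq_handler_t handlers[MAX_HANDLERS];
    size_t nr_handlers;
    unsigned int count;      /* interrupts seen in the current window */
    unsigned int unhandled;  /* of those, how many nobody claimed */
    bool disabled;
};

/* A dom0 driver whose device did not raise the interrupt: it pokes its
 * device, finds nothing to do, and reports IRQ_NONE. */
irqreturn_t dom0_driver_handler(int irq, void *dev_id)
{
    (void)irq; (void)dev_id;
    return IRQ_NONE;
}

/* The proposed proxy: claims every interrupt on behalf of the device
 * that has been passed through to the guest. */
irqreturn_t proxy_handler(int irq, void *dev_id)
{
    (void)irq; (void)dev_id;
    return IRQ_HANDLED;
}

void irq_line_add(struct irq_line *line, irq_handler_t handler)
{
    assert(line->nr_handlers < MAX_HANDLERS);
    line->handlers[line->nr_handlers++] = handler;
}

/* Call every handler on the line; if none claims the interrupt, count
 * it as unhandled. At the end of each window, disable the line if
 * almost nothing was claimed. */
void irq_line_deliver(struct irq_line *line, int irq)
{
    irqreturn_t ret = IRQ_NONE;

    if (line->disabled)
        return;
    for (size_t i = 0; i < line->nr_handlers; i++)
        if (line->handlers[i](irq, NULL) == IRQ_HANDLED)
            ret = IRQ_HANDLED;
    if (ret == IRQ_NONE)
        line->unhandled++;
    if (++line->count < WINDOW)
        return;
    if (line->unhandled > UNHANDLED_LIMIT)
        line->disabled = true;  /* the "Disabling IRQ" case */
    line->count = 0;
    line->unhandled = 0;
}

/* Simulate a burst of interrupts that all originate from the guest's
 * device, and report whether the line ended up disabled. */
bool storm_disables_line(bool with_proxy)
{
    struct irq_line line = {0};

    irq_line_add(&line, dom0_driver_handler);
    if (with_proxy)
        irq_line_add(&line, proxy_handler);
    for (int i = 0; i < 10 * WINDOW; i++)
        irq_line_deliver(&line, 17);
    return line.disabled;
}
```

In this model, storm_disables_line(false) ends with the line disabled,
while storm_disables_line(true) leaves it enabled. The obvious cost is
that a proxy which always claims the interrupt also hides a genuinely
stuck line, hence the thought about giving it screaming-interrupt logic
of its own.
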
How has this problem been avoided before?

> -sh-3.1#
> -sh-3.1# [ 2349.534294] irq 17: nobody cared (try booting with the
> "irqpoll" option)
> [ 2349.534477] Pid: 0, comm: swapper Not tainted 2Trace:
> [ 2349.534728] [] __report_bad_irq+0x54/0xe2
> [ 2349.534887] [] note_interrupt+0x24d/0x2b8
> [ 2349.535019] [] handle_level_irq+0xef/0x17b
> [ 2349.535151] [] xen_evtchn_do_upcall+0x156/0x254
> [ 2349.535282] []
> xen_do_hypervisor_callback+0x1e/0x30
> [ 2349.535282] [] ?
> hypercall_page+0x3aa/0x1000
> [ 2349.535282] [] ? hypercall_page+0x3aa/0x1000
> [ 2349.535282] [] ? hypercall_page+0x3aa/0x1000
> [ 2349.535282] [] ? xen_safe_halt+0x1e/0x3d
> [ 2349.535282] [] ? xen_idle+0x10b/0x130
> [ 2349.535282] [] ? cpu_idle+0x167/0x1d5
> [ 2349.535282] [] ? rest_init+0xb5/0xbe
> [ 2349.535282] [] ? start_kernel+0x777/0x78a
> [ 2349.535282] [] ?
> x86_64_start_reservations+0x111/0x11c
> [ 2349.535282] [] ? xen_start_kernel+0x678/0x686
> [ 2349.535282] handlers:
> [ 2349.535282] [] (lpfc_sli_intr_handler+0x0/0x22a
> [lpfc])
> [ 2349.535282] [] (tg3_interrupt_tagged+0x0/0xe6
> [tg3])
> [ 2349.535282] Disabling IRQ #17
> [ 2382.845061] lpfc 0000:05:04.0: 0:0459 Adapter heartbeat failure,
> taking this port offline.
> [ 2397.052375] device-mapper: multipath: Failing path 8:0.
> [ 2397.053041] ata3: lost interrupt (Status 0x50)
> [ 2397.053275] [ 2398.054372] device-mapper: multipath: Failing path
> 8:16.
> [ 2398.055179] ata4: lost interrupt (Status 0x50)
> [ 2398.055413][ 2447.701115] ata3: lost interrupt (Status 0x50)
> [ 2447.701389] sd 2:0:0:0: [sda] Unhandled error code
> [ 2447.701515] sd 2:0
>
>
> .. and it also kills the ata_piix controller which is not on the same
> IRQ (??)
>

That's very strange, but I suspect there's a lot of mysterious magic
around piix ide controller interrupts relating to backwards compat, etc.

    J