Re: i915_driver_irq_handler: irq 42: nobody cared

linux-kernel.vger.kernel.org archive mirror

* Re: i915_driver_irq_handler: irq 42: nobody cared
       [not found] <4F717CE3.4040206@suse.cz>
@ 2012-03-27  8:42 ` Jiri Slaby
  2012-03-30  9:59   ` Jiri Slaby
       [not found] ` <20120327085749.GE4276@phenom.ffwll.local>
  1 sibling, 1 reply; 32+ messages in thread
From: Jiri Slaby @ 2012-03-27  8:42 UTC (permalink / raw)
  To: Keith Packard; +Cc: dri-devel, Chris Wilson, Jiri Slaby, LKML

On 03/27/2012 10:40 AM, Jiri Slaby wrote:
> Hi,
> 
> I'm getting spurious interrupts leading to disabling the interrupt:
>  42:    1916853    2471662   PCI-MSI-edge      i915@pci:0000:00:02.0
> 
> The message:
> irq 42: nobody cared (try booting with the "irqpoll" option)
> Pid: 20716, comm: virtuoso-t Not tainted 3.3.0-next-20120326_64+ #1673
> 
> It is not new, but now I can reproduce it more-or-less reliably after an
> hour or so. It usually happens when playing a game using wine.
> 
> Do you want me to dump some registers when IRQ_NONE is returned from the
> ISR? As this is MSI, nobody else can sit there.

Also lspci:
00:02.0 VGA compatible controller [0300]: Intel Corporation 82G33/G31 Express Integrated Graphics Controller [8086:29c2] (rev 02) (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation 82G33/G31 Express Integrated Graphics Controller [8086:29c2]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 42
        Region 0: Memory at feb80000 (32-bit, non-prefetchable) [size=512K]
        Region 1: I/O ports at ec00 [size=8]
        Region 2: Memory at d0000000 (32-bit, prefetchable) [size=256M]
        Region 3: Memory at fea00000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee0300c  Data: 4179
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: i915
00: 86 80 c2 29 07 04 90 00 02 00 00 03 00 00 00 00
10: 00 00 b8 fe 01 ec 00 00 08 00 00 d0 00 00 a0 fe
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 c2 29
30: 00 00 00 00 90 00 00 00 00 00 00 00 05 01 00 00
40: 09 00 0b 01 00 00 00 00 01 00 00 00 00 00 00 00
50: 00 00 30 02 c9 03 00 00 00 00 00 00 00 00 80 af
60: 00 00 02 02 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 05 d0 01 00 0c 30 e0 fe 79 41 00 00 00 00 00 00
a0: 11 11 00 00 00 00 06 03 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 01 00 22 00 00 00 00 00 00 00 00 00 00 01 02 00
e0: 00 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00
f0: 10 00 00 00 00 00 00 00 90 0f 03 00 e4 e0 5b af

> thanks,
-- js suse labs

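The "nobody cared" message comes from the kernel's spurious-interrupt detector, which counts how often the handlers on a line return IRQ_NONE. Below is a minimal user-space sketch of that accounting, assuming the thresholds used in kernel/irq/spurious.c around 3.3 (a window of 100,000 interrupts, with the line disabled when more than 99,900 of them went unhandled). The names `irq_model` and `note_interrupt_model` are hypothetical; the real code also implements irqpoll, misrouted-IRQ recovery and a timing check that restarts the unhandled count after a quiet period, all omitted here.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of per-IRQ spurious accounting.
 * Assumed thresholds: 100000-interrupt window, more than 99900
 * unhandled disables the line. Timing checks and irqpoll are omitted. */
#define WINDOW_SIZE      100000u
#define UNHANDLED_LIMIT   99900u

enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct irq_model {
    unsigned int irq_count;      /* interrupts seen in the current window */
    unsigned int irqs_unhandled; /* of those, how many returned IRQ_NONE  */
    bool disabled;               /* set once "nobody cared" would fire    */
};

/* Account for one interrupt. Returns true on the call that disables the
 * line, i.e. where the kernel would print "irq N: nobody cared". */
static bool note_interrupt_model(struct irq_model *d, enum model_irqreturn ret)
{
    if (d->disabled)
        return false;             /* a disabled line delivers nothing */

    if (ret == MODEL_IRQ_NONE)
        d->irqs_unhandled++;

    if (++d->irq_count < WINDOW_SIZE)
        return false;             /* window not complete yet */

    d->irq_count = 0;
    if (d->irqs_unhandled > UNHANDLED_LIMIT) {
        d->disabled = true;
        return true;
    }
    d->irqs_unhandled = 0;        /* start a fresh window */
    return false;
}
```

On a non-shared MSI vector such as irq 42 here, every IRQ_NONE is charged to the i915 handler alone, so a handler that almost never claims its interrupts is enough to trip the check.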

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: i915_driver_irq_handler: irq 42: nobody cared
       [not found] ` <20120327085749.GE4276@phenom.ffwll.local>
@ 2012-03-27 10:54   ` Jiri Slaby
  0 siblings, 0 replies; 32+ messages in thread
From: Jiri Slaby @ 2012-03-27 10:54 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jiri Slaby, Keith Packard, dri-devel, Linux kernel mailing list

On 03/27/2012 10:57 AM, Daniel Vetter wrote:
> And please mind the guy with bad memory and tell us which chip you have
> again?

Where's that? In xorg.log:
https://bugs.freedesktop.org/attachment.cgi?id=58771
?

(II) intel(0): Integrated Graphics Chipset: Intel(R) G33
(--) intel(0): Chipset: "G33"

thanks,
-- 
js
suse labs


* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-03-27  8:42 ` i915_driver_irq_handler: irq 42: nobody cared Jiri Slaby
@ 2012-03-30  9:59   ` Jiri Slaby
  2012-03-30 10:45     ` Chris Wilson
  0 siblings, 1 reply; 32+ messages in thread
From: Jiri Slaby @ 2012-03-30  9:59 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Keith Packard, dri-devel, Chris Wilson, LKML, daniel

On 03/27/2012 10:42 AM, Jiri Slaby wrote:
> On 03/27/2012 10:40 AM, Jiri Slaby wrote:
>> Hi,
>>
>> I'm getting spurious interrupts leading to disabling the interrupt:
>>  42:    1916853    2471662   PCI-MSI-edge      i915@pci:0000:00:02.0
>>
>> The message:
>> irq 42: nobody cared (try booting with the "irqpoll" option)
>> Pid: 20716, comm: virtuoso-t Not tainted 3.3.0-next-20120326_64+ #1673
>>
>> It is not new, but now I can reproduce it more-or-less reliably after an
>> hour or so. It usually happens when playing a game using wine.
>>
>> Do you want me to dump some registers when IRQ_NONE is returned from the
>> ISR? As this is MSI, nobody else can sit there.

The handler *constantly* returns IRQ_NONE.

With this patch:
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -28,6 +28,7 @@

 #include <linux/sysrq.h>
 #include <linux/slab.h>
+#include <linux/ratelimit.h>
 #include "drmP.h"
 #include "drm.h"
 #include "i915_drm.h"
@@ -1416,6 +1417,14 @@ static irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS)
                iir = new_iir;
        }

+       if (ret == IRQ_NONE && printk_ratelimit()) {
+               printk(KERN_DEBUG "%s:", __func__);
+               for_each_pipe(pipe) {
+                       printk(KERN_CONT " %d=%.8x", pipe,
+                                       pipe_stats[pipe]);
+               }
+       }
+
        return ret;
 }

And I get:
[ 3572.968581] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3572.977472] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3576.224839] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3576.243558] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3576.384912] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3576.403462] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3577.464100] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3577.477383] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3577.829016] i915_driver_irq_handler: 0=00020000 1=00000000
[ 3577.830093] i915_driver_irq_handler: 0=00020000 1=00000000
[ 3578.013015] i915_driver_irq_handler: 12 callbacks suppressed

I don't know what to dump more, because iir is obviously zero too. What
other sources of interrupts are on the (G33) chip?

-- js suse labs



* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-03-30  9:59   ` Jiri Slaby
@ 2012-03-30 10:45     ` Chris Wilson
  2012-03-30 12:11       ` Jiri Slaby
  2012-04-09 17:11       ` Jesse Barnes
  0 siblings, 2 replies; 32+ messages in thread
From: Chris Wilson @ 2012-03-30 10:45 UTC (permalink / raw)
  To: Jiri Slaby, Jiri Slaby; +Cc: Keith Packard, dri-devel, LKML, daniel

On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
> I don't know what to dump more, because iir is obviously zero too. What
> other sources of interrupts are on the (G33) chip?

IIR is the master interrupt, with chained secondary interrupt statuses.
If IIR is 0, the interrupt wasn't raised by the GPU.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

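Chris's point and Jiri's objection can be illustrated with a small user-space model of a master/secondary status scheme: IIR is the master, the per-pipe PIPESTAT registers are chained behind it, and (per the discussion of 05eff845 in this thread) the handler looks at PIPESTAT even when IIR reads zero. This is a hypothetical sketch (`fake_gpu` and `model_irq_handler` are invented names), assuming, as the i915 handler of that era appears to, that only bits 15:0 and bit 31 of PIPESTAT are status bits while the other upper bits are enables. Under that assumption a value such as 0x00020000 carries no pending status, which would be consistent with the handler still returning IRQ_NONE for the 0=00020000 lines in Jiri's log.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PIPES 2
/* Assumption: status bits are 15:0 plus bit 31; other upper bits enable. */
#define PIPESTAT_STATUS_MASK 0x8000ffffu

/* Hypothetical register snapshot; not the real i915 layout or accessors. */
struct fake_gpu {
    uint32_t iir;                  /* master interrupt identity register */
    uint32_t pipestat[NUM_PIPES];  /* secondary per-pipe status regs     */
};

/* Returns true ("IRQ_HANDLED") if the master or any secondary status had
 * pending bits. Pending status bits are acknowledged (cleared), as
 * write-1-to-clear hardware would do; enable bits are left alone. */
static bool model_irq_handler(struct fake_gpu *gpu)
{
    bool handled = gpu->iir != 0;

    /* Check the secondary statuses even when IIR is zero, to cover the
     * race where a pipestat event fires after the secondaries were
     * processed but before IIR was reset. */
    for (int pipe = 0; pipe < NUM_PIPES; pipe++) {
        uint32_t pending = gpu->pipestat[pipe] & PIPESTAT_STATUS_MASK;
        if (pending) {
            gpu->pipestat[pipe] &= ~pending;   /* acknowledge */
            handled = true;
        }
    }

    gpu->iir = 0;                              /* acknowledge the master */
    return handled;
}
```

In this model the handler can only report "not mine" when neither IIR nor any PIPESTAT status bit is set, so any remaining subordinate status register is the next thing worth dumping.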

* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-03-30 10:45     ` Chris Wilson
@ 2012-03-30 12:11       ` Jiri Slaby
  2012-03-30 12:24         ` Chris Wilson
  2012-04-09 17:11       ` Jesse Barnes
  1 sibling, 1 reply; 32+ messages in thread
From: Jiri Slaby @ 2012-03-30 12:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Jiri Slaby, Keith Packard, dri-devel, LKML, daniel

On 03/30/2012 12:45 PM, Chris Wilson wrote:
> On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
>> I don't know what to dump more, because iir is obviously zero too. What
>> other sources of interrupts are on the (G33) chip?
> 
> IIR is the master interrupt, with chained secondary interrupt statuses.
> If IIR is 0, the interrupt wasn't raised by the GPU.

This does not make sense, the handler does something different. Even if
IIR is 0, it still takes a look at pipe stats.

And this is MSI, so there can be no other source of the interrupt.
(Except broken IRQ routing.) I may try to boot with MSIs off if you
think it's important.

thanks,
-- js suse labs




* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-03-30 12:11       ` Jiri Slaby
@ 2012-03-30 12:24         ` Chris Wilson
  2012-04-06 21:31           ` i915_driver_irq_handler: irq 42: nobody cared [generic IRQ handling broken?] Jiri Slaby
  0 siblings, 1 reply; 32+ messages in thread
From: Chris Wilson @ 2012-03-30 12:24 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Jiri Slaby, Keith Packard, dri-devel, LKML, daniel

On Fri, 30 Mar 2012 14:11:47 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
> On 03/30/2012 12:45 PM, Chris Wilson wrote:
> > On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
> >> I don't know what to dump more, because iir is obviously zero too. What
> >> other sources of interrupts are on the (G33) chip?
> > 
> > IIR is the master interrupt, with chained secondary interrupt statuses.
> > If IIR is 0, the interrupt wasn't raised by the GPU.
> 
> This does not make sense, the handler does something different. Even if
> IIR is 0, it still takes a look at pipe stats.

That was introduced in 05eff845a28499762075d3a72e238a31f4d2407c to close
a race where the pipestat triggered an interrupt after we processed the
secondary registers and before resetting the primary.

But the basic premise that we should only enter the interrupt handler
with IIR!=0 holds (presuming non-shared interrupt lines such as MSI).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: i915_driver_irq_handler: irq 42: nobody cared [generic IRQ handling broken?]
  2012-03-30 12:24         ` Chris Wilson
@ 2012-04-06 21:31           ` Jiri Slaby
  2012-04-06 22:40             ` Thomas Gleixner
                               ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Jiri Slaby @ 2012-04-06 21:31 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Jiri Slaby, Keith Packard, dri-devel, LKML, daniel, Thomas Gleixner

On 03/30/2012 02:24 PM, Chris Wilson wrote:
> On Fri, 30 Mar 2012 14:11:47 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
>> On 03/30/2012 12:45 PM, Chris Wilson wrote:
>>> On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
>>>> I don't know what to dump more, because iir is obviously zero too. What
>>>> other sources of interrupts are on the (G33) chip?
>>>
>>> IIR is the master interrupt, with chained secondary interrupt statuses.
>>> If IIR is 0, the interrupt wasn't raised by the GPU.
>>
>> This does not make sense, the handler does something different. Even if
>> IIR is 0, it still takes a look at pipe stats.
> 
> That was introduced in 05eff845a28499762075d3a72e238a31f4d2407c to close
> a race where the pipestat triggered an interrupt after we processed the
> secondary registers and before resetting the primary.
> 
> But the basic premise that we should only enter the interrupt handler
> with IIR!=0 holds (presuming non-shared interrupt lines such as MSI).

Ok, this behavior is definitely new. I get several "nobody cared" about
this interrupt a week. This never used to happen. And something weird
emerges in /proc/interrupts when this happens:
 42:    1003292    1212890   PCI-MSI-edge      �s����:0000:00:02.0
instead of
 42:    1006715    1218472   PCI-MSI-edge      i915@pci:0000:00:02.0

It very much looks like the generic IRQ handling code is broken. Like it
frees/corrupts irq_desc and then as well calls random handlers.

Suspend/resume cycle helps in this case and "i915@pci:0000:00:02.0" is
back in /proc/interrupts as can be seen above.

Running 3.3.0-next-20120326_64+ now.

thanks,
-- 
js
suse labs



* Re: i915_driver_irq_handler: irq 42: nobody cared [generic IRQ handling broken?]
  2012-04-06 21:31           ` i915_driver_irq_handler: irq 42: nobody cared [generic IRQ handling broken?] Jiri Slaby
@ 2012-04-06 22:40             ` Thomas Gleixner
  2012-04-09 17:12               ` Jesse Barnes
  2012-04-10  8:44               ` Jiri Slaby
  2012-04-10  8:50             ` Daniel Vetter
  2012-04-10  8:52             ` i915_driver_irq_handler: irq 42: nobody cared Jiri Slaby
  2 siblings, 2 replies; 32+ messages in thread
From: Thomas Gleixner @ 2012-04-06 22:40 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Chris Wilson, Jiri Slaby, Keith Packard, dri-devel, LKML, daniel

On Fri, 6 Apr 2012, Jiri Slaby wrote:

> On 03/30/2012 02:24 PM, Chris Wilson wrote:
> > On Fri, 30 Mar 2012 14:11:47 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
> >> On 03/30/2012 12:45 PM, Chris Wilson wrote:
> >>> On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
> >>>> I don't know what to dump more, because iir is obviously zero too. What
> >>>> other sources of interrupts are on the (G33) chip?
> >>>
> >>> IIR is the master interrupt, with chained secondary interrupt statuses.
> >>> If IIR is 0, the interrupt wasn't raised by the GPU.
> >>
> >> This does not make sense, the handler does something different. Even if
> >> IIR is 0, it still takes a look at pipe stats.
> > 
> > That was introduced in 05eff845a28499762075d3a72e238a31f4d2407c to close
> > a race where the pipestat triggered an interrupt after we processed the
> > secondary registers and before resetting the primary.
> > 
> > But the basic premise that we should only enter the interrupt handler
> > with IIR!=0 holds (presuming non-shared interrupt lines such as MSI).
> 
> Ok, this behavior is definitely new. I get several "nobody cared" about
> this interrupt a week. This never used to happen. And something weird
> emerges in /proc/interrupts when this happens:
>  42:    1003292    1212890   PCI-MSI-edge      ???s????????????:0000:00:02.0
> instead of
>  42:    1006715    1218472   PCI-MSI-edge      i915@pci:0000:00:02.0
> 
> It very much looks like the generic IRQ handling code is broken. Like it
> frees/corrupts irq_desc and ...

OMG, your problem analyzing skills are amazing.

If irq_desc had been freed, then it wouldn't print the numbers
and the irq type. And irq_desc is not corrupted either, otherwise the
whole thing would explode in your face.

The printout of the name is done via action->name. The irq action
merely holds a pointer to the device name string, which is handed over
with request_irq. So you are saying that the core code corrupts the
memory which was handed in via a pointer by the driver?

So now that's really an amazing core feature:

It corrupts the memory with weird characters and still keeps the
PCI bus number correct. So it not only corrupts memory it also moves
the PCI part of the string a few characters to the end.

If the pointer in the irq action had been corrupted, then you
would see a few weird characters and then the full string, not a
random thing which is half correct and shifted by a few bytes.

The pointer which is handed in is dev->devname, which gets allocated
and filled in drm_pci_set_busid().

> ... then as well calls random handlers.

Which random handlers would be called? The core code only calls
handlers which are associated with a particular interrupt. And only
when that particular interrupt is raised and not because the CPU pulls
interrupt events out of thin air.

And it calls the stupid i915 handler and not something else, otherwise
you would not observe the IIR=0 printk or whatever you put there for
debugging.

> Suspend/resume cycle helps in this case and "i915@pci:0000:00:02.0" is
> back in /proc/interrupts as can be seen above.

That's proving what? That the irq core code magically restores the
correct string, right? And probably it stops calling random handlers
as well. Brilliant deduction.

You know what? suspend calls free_irq() via i915_drm_freeze() ->
drm_irq_uninstall() and the resume code calls request_irq() again.
free_irq() removes the action and request_irq installs it fresh.

So now the interesting part is that free_irq() checks the dev_id
cookie for a match, which is also stored in the irq action. So we are
dealing with a magic corrupt only action->name and action->handler
problem. Pretty realistic.

What the heck makes you assume that the irq core code is broken?  Core
code, which works on a gazillion of machines and different device
drivers and does not corrupt anything except that i915 thingy?

Come on, you need to provide better evidence than weird ass guessing.

If you're still convinced that the irq core is messing with your
device string, then simply hand in a NULL pointer when requesting the
interrupt. That will make the core code explode nicely when it tries
to modify that memory.

Thanks,

	tglx

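Thomas's argument rests on the irq core storing the name pointer handed to request_irq() rather than a copy of the string. A hypothetical user-space sketch of that relationship (`fake_irqaction`, `fake_request_irq`, `fake_free_irq` and `make_devname` are invented for illustration): anything that overwrites the driver-owned buffer changes what /proc/interrupts prints, while the stored pointer and the dev_id cookie that free_irq() matches on stay intact, and a later request_irq() simply installs whatever pointer the driver passes at that point. The "%s@pci:%04x:%02x:%02x.%d" layout is inferred from the string in the report, not copied from drm_pci_set_busid().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct irqaction. Only
 * the pointers handed to request_irq() are stored; the name string is
 * NOT copied. */
struct fake_irqaction {
    const char *name;   /* what /proc/interrupts would print     */
    void *dev_id;       /* cookie that free_irq() has to match   */
    int installed;
};

static void fake_request_irq(struct fake_irqaction *a, const char *name,
                             void *dev_id)
{
    a->name = name;     /* pointer kept as-is, no strdup() */
    a->dev_id = dev_id;
    a->installed = 1;
}

/* Returns 0 on success, -1 if no installed action matches dev_id. */
static int fake_free_irq(struct fake_irqaction *a, void *dev_id)
{
    if (!a->installed || a->dev_id != dev_id)
        return -1;
    a->installed = 0;
    a->name = NULL;
    return 0;
}

/* Build a DRM-style device name such as "i915@pci:0000:00:02.0" in a
 * driver-owned heap buffer (layout inferred, see above). */
static char *make_devname(const char *drv, int domain, int bus,
                          int slot, int func)
{
    char *buf = malloc(64);
    if (buf)
        snprintf(buf, 64, "%s@pci:%04x:%02x:%02x.%d",
                 drv, domain, bus, slot, func);
    return buf;
}
```

In this model a garbled name can only come from the memory behind the pointer changing, which is where Thomas points with dev->devname.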

* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-03-30 10:45     ` Chris Wilson
  2012-03-30 12:11       ` Jiri Slaby
@ 2012-04-09 17:11       ` Jesse Barnes
  2012-04-10  8:47         ` Jiri Slaby
  1 sibling, 1 reply; 32+ messages in thread
From: Jesse Barnes @ 2012-04-09 17:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Jiri Slaby, Jiri Slaby, LKML, dri-devel

[-- Attachment #1: Type: text/plain, Size: 698 bytes --]

On Fri, 30 Mar 2012 11:45:43 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
> > I don't know what to dump more, because iir is obviously zero too. What
> > other sources of interrupts are on the (G33) chip?
> 
> IIR is the master interrupt, with chained secondary interrupt statuses.
> If IIR is 0, the interrupt wasn't raised by the GPU.

I've actually seen cases where one of the PIPE*STAT regs is stuck, and
even if IIR is 0 we still get interrupts... Jiri can you verify the
PIPE*STAT regs have bits set, maybe one or more we don't check for?

-- 
Jesse Barnes, Intel Open Source Technology Center

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: i915_driver_irq_handler: irq 42: nobody cared [generic IRQ handling broken?]
  2012-04-06 22:40             ` Thomas Gleixner
@ 2012-04-09 17:12               ` Jesse Barnes
  2012-04-09 17:52                 ` Dave Airlie
  2012-04-10  8:44               ` Jiri Slaby
  1 sibling, 1 reply; 32+ messages in thread
From: Jesse Barnes @ 2012-04-09 17:12 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Jiri Slaby, Jiri Slaby, LKML, dri-devel, airlied


On Sat, 7 Apr 2012 00:40:28 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:
> You know what? suspend calls free_irq() via i915_drm_freeze() ->
> drm_irq_uninstall() and the resume code calls request_irq() again.
> free_irq() removes the action and request_irq installs it fresh.

Yeah this is a known issue with the DRM code, I thought Dave had a
fix queued a long time ago though...  Dave?

-- 
Jesse Barnes, Intel Open Source Technology Center



* Re: i915_driver_irq_handler: irq 42: nobody cared [generic IRQ handling broken?]
  2012-04-09 17:12               ` Jesse Barnes
@ 2012-04-09 17:52                 ` Dave Airlie
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Airlie @ 2012-04-09 17:52 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Thomas Gleixner, Jiri Slaby, Jiri Slaby, dri-devel, LKML

>> You know what? suspend calls free_irq() via i915_drm_freeze() ->
>> drm_irq_uninstall() and the resume code calls request_irq() again.
>> free_irq() removes the action and request_irq installs it fresh.
>
> Yeah this is a known issue with the DRM code, I thought Dave had a
> fix queued a long time ago though...  Dave?

/me doesn't remember seeing one but maybe this one?

http://lists.freedesktop.org/archives/dri-devel/2011-August/013407.html

probably fell down a hole.

Dave.


* Re: i915_driver_irq_handler: irq 42: nobody cared [generic IRQ handling broken?]
  2012-04-06 22:40             ` Thomas Gleixner
  2012-04-09 17:12               ` Jesse Barnes
@ 2012-04-10  8:44               ` Jiri Slaby
  1 sibling, 0 replies; 32+ messages in thread
From: Jiri Slaby @ 2012-04-10  8:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Chris Wilson, Jiri Slaby, Keith Packard, dri-devel, LKML, daniel

On 04/07/2012 12:40 AM, Thomas Gleixner wrote:
> On Fri, 6 Apr 2012, Jiri Slaby wrote:
>> It very much looks like the generic IRQ handling code is broken. Like it
>> frees/corrupts irq_desc and ...
> 
> OMG, your problem analyzing skills are amazing.

Hehe, no I did *no* analysis. I stand here as a bug reporter.

> What the heck makes you assume that the irq core code is broken?  Core
> code, which works on a gazillion of machines and different device
> drivers and does not corrupt anything except that i915 thingy?

Note that this is a -next regression, and one involving i915 graphics.
That combination definitely doesn't run on a gazillion machines.

> If you're still convinced that the irq core is messing with your
> device string,

Nope, thanks for the input.

-- 
js
suse labs




* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-09 17:11       ` Jesse Barnes
@ 2012-04-10  8:47         ` Jiri Slaby
  2012-04-10  8:58           ` Daniel Vetter
  2012-04-10 16:26           ` Jesse Barnes
  0 siblings, 2 replies; 32+ messages in thread
From: Jiri Slaby @ 2012-04-10  8:47 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Chris Wilson, Jiri Slaby, LKML, dri-devel

On 04/09/2012 07:11 PM, Jesse Barnes wrote:
> On Fri, 30 Mar 2012 11:45:43 +0100 Chris Wilson
> <chris@chris-wilson.co.uk> wrote:
> 
>> On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz>
>> wrote:
>>> I don't know what to dump more, because iir is obviously zero
>>> too. What other sources of interrupts are on the (G33) chip?
>> 
>> IIR is the master interrupt, with chained secondary interrupt
>> statuses. If IIR is 0, the interrupt wasn't raised by the GPU.
> 
> I've actually seen cases where one of the PIPE*STAT regs is stuck,
> and even if IIR is 0 we still get interrupts... Jiri can you verify
> the PIPE*STAT regs have bits set, maybe one or more we don't check
> for?

Note that I already attached their contents... This is what is in them
(pipes 0 and 1):
[ 3572.968581] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3572.977472] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3576.224839] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3576.243558] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3576.384912] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3576.403462] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3577.464100] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3577.477383] i915_driver_irq_handler: 0=00000000 1=00000000
[ 3577.829016] i915_driver_irq_handler: 0=00020000 1=00000000
[ 3577.830093] i915_driver_irq_handler: 0=00020000 1=00000000

I.e. the handler is called when IIR=0 and both pipe stats are 0.

The stats are dumped this way:
@@ -1416,6 +1417,14 @@ static irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS)
                iir = new_iir;
        }

+       if (ret == IRQ_NONE && printk_ratelimit()) {
+               printk(KERN_DEBUG "%s:", __func__);
+               for_each_pipe(pipe) {
+                       printk(KERN_CONT " %d=%.8x", pipe,
+                                       pipe_stats[pipe]);
+               }
+       }
+
        return ret;
 }

thanks,
-- 
js
suse labs



* Re: i915_driver_irq_handler: irq 42: nobody cared [generic IRQ handling broken?]
  2012-04-06 21:31           ` i915_driver_irq_handler: irq 42: nobody cared [generic IRQ handling broken?] Jiri Slaby
  2012-04-06 22:40             ` Thomas Gleixner
@ 2012-04-10  8:50             ` Daniel Vetter
  2012-04-10  8:52             ` i915_driver_irq_handler: irq 42: nobody cared Jiri Slaby
  2 siblings, 0 replies; 32+ messages in thread
From: Daniel Vetter @ 2012-04-10  8:50 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Chris Wilson, Jiri Slaby, Keith Packard, dri-devel, LKML,
	Thomas Gleixner

On Fri, Apr 6, 2012 at 23:31, Jiri Slaby <jslaby@suse.cz> wrote:
>> That was introduced in 05eff845a28499762075d3a72e238a31f4d2407c to close
>> a race where the pipestat triggered an interrupt after we processed the
>> secondary registers and before resetting the primary.
>>
>> But the basic premise that we should only enter the interrupt handler
>> with IIR!=0 holds (presuming non-shared interrupt lines such as MSI).
>
> Ok, this behavior is definitely new. I get several "nobody cared" about
> this interrupt a week. This never used to happen. And something weird
> emerges in /proc/interrupts when this happens:
>  42:    1003292    1212890   PCI-MSI-edge      �s����:0000:00:02.0
> instead of
>  42:    1006715    1218472   PCI-MSI-edge      i915@pci:0000:00:02.0

This looks ugly. Can you try to reproduce on 3.4-rc2? That should
contain everything that -next currently contains drm/i915-wise. If it
still happens there, please bisect it.

Also please check whether any of the subordinate interrupt regs
(pipestat) is stuck and might cause these interrupts as Jesse
suggested.

Thanks, Daniel
-- 
Daniel Vetter
daniel.vetter@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch


* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-06 21:31           ` i915_driver_irq_handler: irq 42: nobody cared [generic IRQ handling broken?] Jiri Slaby
  2012-04-06 22:40             ` Thomas Gleixner
  2012-04-10  8:50             ` Daniel Vetter
@ 2012-04-10  8:52             ` Jiri Slaby
  2012-04-10 16:50               ` Marcin Slusarz
  2 siblings, 1 reply; 32+ messages in thread
From: Jiri Slaby @ 2012-04-10  8:52 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Chris Wilson, Keith Packard, dri-devel, LKML, daniel, Jesse Barnes

On 04/06/2012 11:31 PM, Jiri Slaby wrote:
> On 03/30/2012 02:24 PM, Chris Wilson wrote:
>> On Fri, 30 Mar 2012 14:11:47 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
>>> On 03/30/2012 12:45 PM, Chris Wilson wrote:
>>>> On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
>>>>> I don't know what to dump more, because iir is obviously zero too. What
>>>>> other sources of interrupts are on the (G33) chip?
>>>>
>>>> IIR is the master interrupt, with chained secondary interrupt statuses.
>>>> If IIR is 0, the interrupt wasn't raised by the GPU.
>>>
>>> This does not make sense, the handler does something different. Even if
>>> IIR is 0, it still takes a look at pipe stats.
>>
>> That was introduced in 05eff845a28499762075d3a72e238a31f4d2407c to close
>> a race where the pipestat triggered an interrupt after we processed the
>> secondary registers and before resetting the primary.
>>
>> But the basic premise that we should only enter the interrupt handler
>> with IIR!=0 holds (presuming non-shared interrupt lines such as MSI).
> 
> Ok, this behavior is definitely new. I get several "nobody cared" about
> this interrupt a week. This never used to happen. And something weird
> emerges in /proc/interrupts when this happens:
>  42:    1003292    1212890   PCI-MSI-edge      �s����:0000:00:02.0
> instead of
>  42:    1006715    1218472   PCI-MSI-edge      i915@pci:0000:00:02.0

See the difference of drm_device->devname:

Before:
20 34 32 3a 20 20 20 20  31 34 30 35 34 36 32 20  | 42:    1405462 |
20 20 20 31 37 32 38 33  30 32 20 20 20 50 43 49  |   1728302   PCI|
2d 4d 53 49 2d 65 64 67  65 20 20 20 20 20 20 69  |-MSI-edge      i|
39 31 35 40 70 63 69 3a  30 30 30 30 3a 30 30 3a  |915@pci:0000:00:|
30 32 2e 30 0a                                    |02.0.|

After:
20 34 32 3a 20 20 20 20  31 30 30 33 32 39 32 20  | 42:    1003292 |
20 20 20 31 32 31 32 38  39 30 20 20 20 50 43 49  |   1212890   PCI|
2d 4d 53 49 2d 65 64 67  65 20 20 20 20 20 20 ef  |-MSI-edge      .|
bf bd 73 ef bf bd ef bf  bd ef bf bd ef bf bd 3a  |..s............:|
30 30 30 30 3a 30 30 3a  30 32 2e 30 0a           |0000:00:02.0.|

Any idea what the "ef bf bd" pattern could be? And who *shifts* the
"0000:00:02.0" string?

thanks,

-- 
js
suse labs

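For reference, "ef bf bd" is the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER, which UTF-8-aware tools commonly substitute for bytes that are not valid UTF-8. A minimal C sketch that decodes the three-byte form and counts replacement characters in the "After" bytes (`utf8_decode3` and `count_replacement_chars` are hypothetical helpers, and this is not a general UTF-8 decoder). It only shows that the original bytes were replaced somewhere before the dump was taken; it says nothing about what those bytes were.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) into a
 * code point. Returns -1 if the bytes are not of that form. */
static int32_t utf8_decode3(const unsigned char b[3])
{
    if ((b[0] & 0xF0) != 0xE0 || (b[1] & 0xC0) != 0x80 ||
        (b[2] & 0xC0) != 0x80)
        return -1;
    return ((int32_t)(b[0] & 0x0F) << 12) |
           ((int32_t)(b[1] & 0x3F) << 6) |
           (int32_t)(b[2] & 0x3F);
}

/* Count U+FFFD replacement characters in a byte buffer, stepping over
 * each match and advancing one byte otherwise. */
static size_t count_replacement_chars(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    size_t i = 0;

    while (i + 2 < len) {
        if (utf8_decode3(buf + i) == 0xFFFD) {
            n++;
            i += 3;
        } else {
            i++;
        }
    }
    return n;
}
```

Run over the bytes between the spaces and the ':' in the "After" dump, this counts five replacement characters around the single 's', matching the garbled name shown in /proc/interrupts.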


* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-10  8:47         ` Jiri Slaby
@ 2012-04-10  8:58           ` Daniel Vetter
  2012-04-10  9:48             ` Jiri Slaby
  2012-04-10 16:26           ` Jesse Barnes
  1 sibling, 1 reply; 32+ messages in thread
From: Daniel Vetter @ 2012-04-10  8:58 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Jesse Barnes, LKML, Jiri Slaby, dri-devel

On Tue, Apr 10, 2012 at 10:47:49AM +0200, Jiri Slaby wrote:
> On 04/09/2012 07:11 PM, Jesse Barnes wrote:
> > On Fri, 30 Mar 2012 11:45:43 +0100 Chris Wilson
> > <chris@chris-wilson.co.uk> wrote:
> > 
> >> On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz>
> >> wrote:
> >>> I don't know what to dump more, because iir is obviously zero
> >>> too. What other sources of interrupts are on the (G33) chip?
> >> 
> >> IIR is the master interrupt, with chained secondary interrupt
> >> statuses. If IIR is 0, the interrupt wasn't raised by the GPU.
> > 
> > I've actually seen cases where one of the PIPE*STAT regs is stuck,
> > and even if IIR is 0 we still get interrupts... Jiri can you verify
> > the PIPE*STAT regs have bits set, maybe one or more we don't check
> > for?
> 
> Note that I already attached their contents... This is what is in them
> (pipes 0 and 1):
> [ 3572.968581] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3572.977472] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3576.224839] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3576.243558] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3576.384912] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3576.403462] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3577.464100] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3577.477383] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3577.829016] i915_driver_irq_handler: 0=00020000 1=00000000
> [ 3577.830093] i915_driver_irq_handler: 0=00020000 1=00000000
> 
> I.e. the handler is called when IIR=0 and both pipe stats are 0.

Hm, can you also dump the PORT_HOTPLUG_STAT register? That's the only
other subordinate interrupt source left.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-10  8:58           ` Daniel Vetter
@ 2012-04-10  9:48             ` Jiri Slaby
  0 siblings, 0 replies; 32+ messages in thread
From: Jiri Slaby @ 2012-04-10  9:48 UTC (permalink / raw)
  To: daniel, Jesse Barnes, LKML, dri-devel, Jiri Slaby, Chris Wilson

On 04/10/2012 10:58 AM, Daniel Vetter wrote:
> On Tue, Apr 10, 2012 at 10:47:49AM +0200, Jiri Slaby wrote:
>> On 04/09/2012 07:11 PM, Jesse Barnes wrote:
>>> On Fri, 30 Mar 2012 11:45:43 +0100 Chris Wilson
>>> <chris@chris-wilson.co.uk> wrote:
>>>
>>>> On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz>
>>>> wrote:
>>>>> I don't know what to dump more, because iir is obviously zero
>>>>> too. What other sources of interrupts are on the (G33) chip?
>>>>
>>>> IIR is the master interrupt, with chained secondary interrupt
>>>> statuses. If IIR is 0, the interrupt wasn't raised by the GPU.
>>>
>>> I've actually seen cases where one of the PIPE*STAT regs is stuck,
>>> and even if IIR is 0 we still get interrupts... Jiri can you verify
>>> the PIPE*STAT regs have bits set, maybe one or more we don't check
>>> for?
>>
>> Note that I already attached their contents... This is what is in them
>> (pipes 0 and 1):
>> [ 3572.968581] i915_driver_irq_handler: 0=00000000 1=00000000
>> [ 3572.977472] i915_driver_irq_handler: 0=00000000 1=00000000
>> [ 3576.224839] i915_driver_irq_handler: 0=00000000 1=00000000
>> [ 3576.243558] i915_driver_irq_handler: 0=00000000 1=00000000
>> [ 3576.384912] i915_driver_irq_handler: 0=00000000 1=00000000
>> [ 3576.403462] i915_driver_irq_handler: 0=00000000 1=00000000
>> [ 3577.464100] i915_driver_irq_handler: 0=00000000 1=00000000
>> [ 3577.477383] i915_driver_irq_handler: 0=00000000 1=00000000
>> [ 3577.829016] i915_driver_irq_handler: 0=00020000 1=00000000
>> [ 3577.830093] i915_driver_irq_handler: 0=00020000 1=00000000
>>
>> I.e. the handler is called when IIR=0 and both pipe stats are 0.
> 
> Hm, can you also dump the PORT_HOTPLUG_STAT register? That's the only
> other subordinate interrupt source left.

It's always 0x300:
i915_driver_irq_handler: HP=00000300 0=00000000 1=00000000
i915_driver_irq_handler: HP=00000300 0=00000000 1=00000000
i915_driver_irq_handler: HP=00000300 0=00000000 1=00000000
i915_driver_irq_handler: HP=00000300 0=00000000 1=00000000
i915_driver_irq_handler: HP=00000300 0=00000000 1=00000000
i915_driver_irq_handler: HP=00000300 0=00000000 1=00000000
i915_driver_irq_handler: HP=00000300 0=00000000 1=00000000
i915_driver_irq_handler: HP=00000300 0=00000000 1=00000000

thanks,
-- 
js
suse labs



* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-10  8:47         ` Jiri Slaby
  2012-04-10  8:58           ` Daniel Vetter
@ 2012-04-10 16:26           ` Jesse Barnes
  2012-04-10 18:11             ` Jiri Slaby
  1 sibling, 1 reply; 32+ messages in thread
From: Jesse Barnes @ 2012-04-10 16:26 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Chris Wilson, Jiri Slaby, LKML, dri-devel


On Tue, 10 Apr 2012 10:47:49 +0200
Jiri Slaby <jslaby@suse.cz> wrote:

> On 04/09/2012 07:11 PM, Jesse Barnes wrote:
> > On Fri, 30 Mar 2012 11:45:43 +0100 Chris Wilson
> > <chris@chris-wilson.co.uk> wrote:
> > 
> >> On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz>
> >> wrote:
> >>> I don't know what to dump more, because iir is obviously zero
> >>> too. What other sources of interrupts are on the (G33) chip?
> >> 
> >> IIR is the master interrupt, with chained secondary interrupt
> >> statuses. If IIR is 0, the interrupt wasn't raised by the GPU.
> > 
> > I've actually seen cases where one of the PIPE*STAT regs is stuck,
> > and even if IIR is 0 we still get interrupts... Jiri can you verify
> > the PIPE*STAT regs have bits set, maybe one or more we don't check
> > for?
> 
> Note that I already attached their contents... This is what is in them
> (pipes 0 and 1):
> [ 3572.968581] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3572.977472] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3576.224839] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3576.243558] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3576.384912] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3576.403462] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3577.464100] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3577.477383] i915_driver_irq_handler: 0=00000000 1=00000000
> [ 3577.829016] i915_driver_irq_handler: 0=00020000 1=00000000
> [ 3577.830093] i915_driver_irq_handler: 0=00020000 1=00000000
> 
> I.e. the handler is called when IIR=0 and both pipe stats are 0.

Oh sorry missed the PIPE*STAT, I thought it was IMR or something, I
should have read more closely.

So port hotplug is always reporting that port C has a hotplug interrupt
though...  If you write 0x3 back to it does the interrupt stop?

-- 
Jesse Barnes, Intel Open Source Technology Center



* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-10  8:52             ` i915_driver_irq_handler: irq 42: nobody cared Jiri Slaby
@ 2012-04-10 16:50               ` Marcin Slusarz
  0 siblings, 0 replies; 32+ messages in thread
From: Marcin Slusarz @ 2012-04-10 16:50 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Jiri Slaby, Chris Wilson, Keith Packard, dri-devel, LKML, daniel,
	Jesse Barnes

On Tue, Apr 10, 2012 at 10:52:06AM +0200, Jiri Slaby wrote:
> On 04/06/2012 11:31 PM, Jiri Slaby wrote:
> > On 03/30/2012 02:24 PM, Chris Wilson wrote:
> >> On Fri, 30 Mar 2012 14:11:47 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
> >>> On 03/30/2012 12:45 PM, Chris Wilson wrote:
> >>>> On Fri, 30 Mar 2012 11:59:28 +0200, Jiri Slaby <jslaby@suse.cz> wrote:
> >>>>> I don't know what to dump more, because iir is obviously zero too. What
> >>>>> other sources of interrupts are on the (G33) chip?
> >>>>
> >>>> IIR is the master interrupt, with chained secondary interrupt statuses.
> >>>> If IIR is 0, the interrupt wasn't raised by the GPU.
> >>>
> >>> This does not make sense, the handler does something different. Even if
> >>> IIR is 0, it still takes a look at pipe stats.
> >>
> >> That was introduced in 05eff845a28499762075d3a72e238a31f4d2407c to close
> >> a race where the pipestat triggered an interrupt after we processed the
> >> secondary registers and before resetting the primary.
> >>
> >> But the basic premise that we should only enter the interrupt handler
> >> with IIR!=0 holds (presuming non-shared interrupt lines such as MSI).
> > 
> > Ok, this behavior is definitely new. I get several "nobody cared" about
> > this interrupt a week. This never used to happen. And something weird
> > emerges in /proc/interrupts when this happens:
> >  42:    1003292    1212890   PCI-MSI-edge      �s����:0000:00:02.0
> > instead of
> >  42:    1006715    1218472   PCI-MSI-edge      i915@pci:0000:00:02.0
> 
> See the difference in drm_device->devname:
> 
> Before:
> 20 34 32 3a 20 20 20 20  31 34 30 35 34 36 32 20  | 42:    1405462 |
> 20 20 20 31 37 32 38 33  30 32 20 20 20 50 43 49  |   1728302   PCI|
> 2d 4d 53 49 2d 65 64 67  65 20 20 20 20 20 20 69  |-MSI-edge      i|
> 39 31 35 40 70 63 69 3a  30 30 30 30 3a 30 30 3a  |915@pci:0000:00:|
> 30 32 2e 30 0a                                    |02.0.|
> 
> After:
> 20 34 32 3a 20 20 20 20  31 30 30 33 32 39 32 20  | 42:    1003292 |
> 20 20 20 31 32 31 32 38  39 30 20 20 20 50 43 49  |   1212890   PCI|
> 2d 4d 53 49 2d 65 64 67  65 20 20 20 20 20 20 ef  |-MSI-edge      .|
> bf bd 73 ef bf bd ef bf  bd ef bf bd ef bf bd 3a  |..s............:|
> 30 30 30 30 3a 30 30 3a  30 32 2e 30 0a           |0000:00:02.0.|
> 
> Any idea what the "ef bf bd" pattern could be? And who *shifts* the
> "0000:00:02.0" string?
> 

Maybe this patch will help catch it:

---
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index cf85155..2f9717c 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -69,7 +69,7 @@ static void
 drm_unset_busid(struct drm_device *dev,
 		struct drm_master *master)
 {
-	kfree(dev->devname);
+	free_pages((unsigned long)dev->devname, 0);
 	dev->devname = NULL;
 
 	kfree(master->unique);
diff --git a/drivers/gpu/drm/drm_pci.c b/drivers/gpu/drm/drm_pci.c
index 13f3d93..d788b78 100644
--- a/drivers/gpu/drm/drm_pci.c
+++ b/drivers/gpu/drm/drm_pci.c
@@ -177,9 +177,7 @@ int drm_pci_set_busid(struct drm_device *dev, struct drm_master *master)
 	} else
 		master->unique_len = len;
 
-	dev->devname =
-		kmalloc(strlen(pdriver->name) +
-			master->unique_len + 2, GFP_KERNEL);
+	dev->devname = (void *)__get_free_pages(GFP_KERNEL, 0);
 
 	if (dev->devname == NULL) {
 		ret = -ENOMEM;
@@ -188,6 +186,7 @@ int drm_pci_set_busid(struct drm_device *dev, struct drm_master *master)
 
 	sprintf(dev->devname, "%s@%s", pdriver->name,
 		master->unique);
+	set_memory_ro((unsigned long)dev->devname, 1);
 
 	return 0;
 err:
@@ -217,8 +216,7 @@ int drm_pci_set_unique(struct drm_device *dev,
 	master->unique[master->unique_len] = '\0';
 
 	bus_name = dev->driver->bus->get_name(dev);
-	dev->devname = kmalloc(strlen(bus_name) +
-			       strlen(master->unique) + 2, GFP_KERNEL);
+	dev->devname = (void *)__get_free_pages(GFP_KERNEL, 0);
 	if (!dev->devname) {
 		ret = -ENOMEM;
 		goto err;
@@ -226,6 +224,7 @@ int drm_pci_set_unique(struct drm_device *dev,
 
 	sprintf(dev->devname, "%s@%s", bus_name,
 		master->unique);
+	set_memory_ro((unsigned long)dev->devname, 1);
 
 	/* Return error if the busid submitted doesn't match the device's actual
 	 * busid.
diff --git a/drivers/gpu/drm/drm_platform.c b/drivers/gpu/drm/drm_platform.c
index 82431dc..aa0acec 100644
--- a/drivers/gpu/drm/drm_platform.c
+++ b/drivers/gpu/drm/drm_platform.c
@@ -148,9 +148,7 @@ static int drm_platform_set_busid(struct drm_device *dev, struct drm_master *mas
 		goto err;
 	}
 
-	dev->devname =
-		kmalloc(strlen(dev->platformdev->name) +
-			master->unique_len + 2, GFP_KERNEL);
+	dev->devname = (void *)__get_free_pages(GFP_KERNEL, 0);
 
 	if (dev->devname == NULL) {
 		ret = -ENOMEM;
@@ -159,6 +157,8 @@ static int drm_platform_set_busid(struct drm_device *dev, struct drm_master *mas
 
 	sprintf(dev->devname, "%s@%s", dev->platformdev->name,
 		master->unique);
+	set_memory_ro((unsigned long)dev->devname, 1);
+
 	return 0;
 err:
 	return ret;
diff --git a/drivers/gpu/drm/drm_stub.c b/drivers/gpu/drm/drm_stub.c
index aa454f8..4f53c0f 100644
--- a/drivers/gpu/drm/drm_stub.c
+++ b/drivers/gpu/drm/drm_stub.c
@@ -187,7 +187,7 @@ static void drm_master_destroy(struct kref *kref)
 		master->unique_len = 0;
 	}
 
-	kfree(dev->devname);
+	free_pages((unsigned long)dev->devname, 0);
 	dev->devname = NULL;
 
 	list_for_each_entry_safe(pt, next, &master->magicfree, head) {
@@ -494,7 +494,7 @@ void drm_put_dev(struct drm_device *dev)
 
 	list_del(&dev->driver_item);
 	if (dev->devname) {
-		kfree(dev->devname);
+		free_pages((unsigned long)dev->devname, 0);
 		dev->devname = NULL;
 	}
 	kfree(dev);

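The idea behind this patch (give devname a page of its own and make that page read-only, so that whoever scribbles over the string faults at the offending write instead of corrupting it silently) can be tried out in userspace with POSIX mmap() and mprotect(). This is an analogue, not the kernel code: __get_free_pages() and set_memory_ro() are replaced by mmap() and mprotect(), and alloc_ro_string()/free_ro_string() are hypothetical names:

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);
	return sz > 0 ? (size_t)sz : 4096;
}

/* Build "a@b" on a private page, then drop write permission so any later
 * store to it faults (SIGSEGV) at the offending instruction. */
static char *alloc_ro_string(const char *a, const char *b)
{
	size_t page = page_size();
	char *p;

	if (strlen(a) + strlen(b) + 2 > page)
		return NULL;

	p = mmap(NULL, page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	snprintf(p, page, "%s@%s", a, b);

	if (mprotect(p, page, PROT_READ) != 0) {
		munmap(p, page);
		return NULL;
	}
	return p;
}

/* munmap() works on read-only mappings, so PROT_WRITE need not be restored. */
static void free_ro_string(char *p)
{
	if (p)
		munmap(p, page_size());
}
```

After s = alloc_ro_string("i915", "pci:0000:00:02.0"), reads of s behave normally, while a stray write such as s[0] = 'x' should raise SIGSEGV, and the resulting backtrace or core dump then points at the culprit.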

* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-10 16:26           ` Jesse Barnes
@ 2012-04-10 18:11             ` Jiri Slaby
  2012-04-10 18:34               ` Jesse Barnes
  0 siblings, 1 reply; 32+ messages in thread
From: Jiri Slaby @ 2012-04-10 18:11 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Chris Wilson, Jiri Slaby, LKML, dri-devel, daniel

On 04/10/2012 06:26 PM, Jesse Barnes wrote:
> So port hotplug is always reporting that port C has a hotplug
> interrupt though...  If you write 0x3 back to it does the interrupt
> stop?

I'm not sure I got it right. This doesn't help:
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1416,6 +1416,17 @@ static irqreturn_t
i915_driver_irq_handler(DRM_IRQ_ARGS)
                iir = new_iir;
        }

+       if (ret == IRQ_NONE) {
+               u32 hp = I915_READ(PORT_HOTPLUG_STAT);
+               if (hp) {
+                       I915_WRITE(PORT_HOTPLUG_STAT, hp);
+                       I915_READ(PORT_HOTPLUG_STAT);
+               }
+
+               if (printk_ratelimit())
+                       printk(KERN_DEBUG "%s: %.8x\n", __func__, hp);
+
+       }

        return ret;
 }

thanks,
-- 
js
suse labs




* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-10 18:11             ` Jiri Slaby
@ 2012-04-10 18:34               ` Jesse Barnes
  2012-04-10 19:52                 ` Jiri Slaby
  2012-04-11  6:29                 ` Michel Dänzer
  0 siblings, 2 replies; 32+ messages in thread
From: Jesse Barnes @ 2012-04-10 18:34 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Chris Wilson, Jiri Slaby, LKML, dri-devel, daniel


On Tue, 10 Apr 2012 20:11:29 +0200
Jiri Slaby <jslaby@suse.cz> wrote:

> On 04/10/2012 06:26 PM, Jesse Barnes wrote:
> > So port hotplug is always reporting that port C has a hotplug
> > interrupt though...  If you write 0x3 back to it does the interrupt
> > stop?
> 
> I'm not sure I got it right. This doesn't help:
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1416,6 +1416,17 @@ static irqreturn_t
> i915_driver_irq_handler(DRM_IRQ_ARGS)
>                 iir = new_iir;
>         }
> 
> +       if (ret == IRQ_NONE) {
> +               u32 hp = I915_READ(PORT_HOTPLUG_STAT);
> +               if (hp) {
> +                       I915_WRITE(PORT_HOTPLUG_STAT, hp);
> +                       I915_READ(PORT_HOTPLUG_STAT);
> +               }
> +
> +               if (printk_ratelimit())
> +                       printk(KERN_DEBUG "%s: %.8x\n", __func__, hp);
> +
> +       }
> 
>         return ret;
>  }

Yeah that looks right, you still get 0x300?

You could try masking hotplug interrupts altogether.

Also, just to sanity check things, can you look at the output of "lspci
-s 02.0 -vvv -xxx" and see if the "INTx" field is + or -?  If it's +,
then the interrupt is definitely coming from an un-acked IRQ source on
the gfx device.  If it's INTx-, it means something in one of the upper
MSI layers isn't getting handled right.
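For reading the flag without lspci: the INTx+/- flag on the Status line is bit 3 (Interrupt Status) of the 16-bit PCI Status register at config-space offset 0x06, and DisINTx on the Control line is bit 10 (Interrupt Disable) of the Command register at offset 0x04 (PCI_STATUS_INTERRUPT and PCI_COMMAND_INTX_DISABLE in linux/pci_regs.h). Below is a small decoder over raw little-endian config bytes, such as those printed by "lspci -xxx". The helper and macro names are hypothetical:

```c
#include <stdint.h>

#define CFG_COMMAND      0x04       /* 16-bit Command register */
#define CFG_STATUS       0x06       /* 16-bit Status register */
#define CMD_INTX_DISABLE (1u << 10) /* lspci "DisINTx" */
#define STATUS_INTX      (1u << 3)  /* lspci "INTx" (Interrupt Status) */

/* PCI config space is little-endian. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* 1 if the function reports an interrupt condition (Status bit 3). This
 * reflects the device's internal state even when INTx signalling is
 * disabled, which is why it is useful while MSI is in use. */
static int intx_pending(const uint8_t *cfg)
{
	return (cfg_read16(cfg, CFG_STATUS) & STATUS_INTX) != 0;
}

/* 1 if legacy INTx signalling is disabled in the Command register. */
static int intx_disabled(const uint8_t *cfg)
{
	return (cfg_read16(cfg, CFG_COMMAND) & CMD_INTX_DISABLE) != 0;
}
```

The lspci flags quoted in this thread (I/O+ Mem+ BusMaster+ ... DisINTx+, and Cap+ FastB2B+ DEVSEL=fast ... INTx-) correspond to Command 0x0407 and Status 0x0090. For those values, intx_disabled() returns 1 and intx_pending() returns 0.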

-- 
Jesse Barnes, Intel Open Source Technology Center



* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-10 18:34               ` Jesse Barnes
@ 2012-04-10 19:52                 ` Jiri Slaby
  2012-04-10 20:32                   ` Daniel Vetter
  2012-04-11  6:29                 ` Michel Dänzer
  1 sibling, 1 reply; 32+ messages in thread
From: Jiri Slaby @ 2012-04-10 19:52 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Chris Wilson, Jiri Slaby, LKML, dri-devel, daniel

On 04/10/2012 08:34 PM, Jesse Barnes wrote:
> On Tue, 10 Apr 2012 20:11:29 +0200 Jiri Slaby <jslaby@suse.cz>
> wrote:
> 
>> On 04/10/2012 06:26 PM, Jesse Barnes wrote:
>>> So port hotplug is always reporting that port C has a hotplug 
>>> interrupt though...  If you write 0x3 back to it does the
>>> interrupt stop?
>> 
>> I'm not sure I got it right. This doesn't help:
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -1416,6 +1416,17 @@ static irqreturn_t
>> i915_driver_irq_handler(DRM_IRQ_ARGS)
>>                 iir = new_iir;
>>         }
>> 
>> +       if (ret == IRQ_NONE) {
>> +               u32 hp = I915_READ(PORT_HOTPLUG_STAT);
>> +               if (hp) {
>> +                       I915_WRITE(PORT_HOTPLUG_STAT, hp);
>> +                       I915_READ(PORT_HOTPLUG_STAT);
>> +               }
>> +
>> +               if (printk_ratelimit())
>> +                       printk(KERN_DEBUG "%s: %.8x\n", __func__, hp);
>> +
>> +       }
>> 
>>         return ret;
>>  }
> 
> Yeah that looks right, you still get 0x300?

Yes.

> You could try masking hotplug interrupts altogether.

This doesn't help:
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2049,7 +2051,7 @@ static int i915_driver_irq_postinstall(struct
drm_device *dev)
        I915_WRITE(IER, enable_mask);
        POSTING_READ(IER);

-       if (I915_HAS_HOTPLUG(dev)) {
+       if (0 && I915_HAS_HOTPLUG(dev)) {
                u32 hotplug_en = I915_READ(PORT_HOTPLUG_EN);

                /* Note HDMI and DP share bits */


> Also, just to sanity check things, can you look at the output of
> "lspci -s 02.0 -vvv -xxx" and see if the "INTx" field is + or -?
> If it's +, then the interrupt is definitely coming from an un-acked
> IRQ source on the gfx device.  If it's INTx-, it means something in
> one of the upper MSI layers isn't getting handled right.

Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-

I tried 3.2 and 3.3. Although the spurious interrupts were always
there, they occurred about an order of magnitude less often (15 vs. 300
after X starts). So I bisected that and it led to a commit which
fixes bad tiling for me:
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-jiri&id=79710e6ccabdac80c65cd13b944695ecc3e42a9d

thanks,
-- 
js
suse labs




* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-10 19:52                 ` Jiri Slaby
@ 2012-04-10 20:32                   ` Daniel Vetter
  2012-04-10 20:34                     ` Jesse Barnes
  0 siblings, 1 reply; 32+ messages in thread
From: Daniel Vetter @ 2012-04-10 20:32 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Jesse Barnes, Chris Wilson, Jiri Slaby, LKML, dri-devel, daniel

On Tue, Apr 10, 2012 at 09:52:40PM +0200, Jiri Slaby wrote:
> On 04/10/2012 08:34 PM, Jesse Barnes wrote:
> > On Tue, 10 Apr 2012 20:11:29 +0200 Jiri Slaby <jslaby@suse.cz>
> > wrote:
> > 
> >> On 04/10/2012 06:26 PM, Jesse Barnes wrote:
> >>> So port hotplug is always reporting that port C has a hotplug 
> >>> interrupt though...  If you write 0x3 back to it does the
> >>> interrupt stop?
> >> 
> >> I'm not sure I got it right. This doesn't help:
> >> --- a/drivers/gpu/drm/i915/i915_irq.c
> >> +++ b/drivers/gpu/drm/i915/i915_irq.c
> >> @@ -1416,6 +1416,17 @@ static irqreturn_t
> >> i915_driver_irq_handler(DRM_IRQ_ARGS)
> >>                 iir = new_iir;
> >>         }
> >> 
> >> +       if (ret == IRQ_NONE) {
> >> +               u32 hp = I915_READ(PORT_HOTPLUG_STAT);
> >> +               if (hp) {
> >> +                       I915_WRITE(PORT_HOTPLUG_STAT, hp);
> >> +                       I915_READ(PORT_HOTPLUG_STAT);
> >> +               }
> >> +
> >> +               if (printk_ratelimit())
> >> +                       printk(KERN_DEBUG "%s: %.8x\n", __func__, hp);
> >> +
> >> +       }
> >> 
> >>         return ret;
> >>  }
> > 
> > Yeah that looks right, you still get 0x300?
> 
> Yes.
> 
> > You could try masking hotplug interrupts altogether.
> 
> This doesn't help:
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2049,7 +2051,7 @@ static int i915_driver_irq_postinstall(struct
> drm_device *dev)
>         I915_WRITE(IER, enable_mask);
>         POSTING_READ(IER);
> 
> -       if (I915_HAS_HOTPLUG(dev)) {
> +       if (0 && I915_HAS_HOTPLUG(dev)) {
>                 u32 hotplug_en = I915_READ(PORT_HOTPLUG_EN);
> 
>                 /* Note HDMI and DP share bits */
> 
> 
> > Also, just to sanity check things, can you look at the output of
> > "lspci -s 02.0 -vvv -xxx" and see if the "INTx" field is + or -?
> > If it's +, then the interrupt is definitely coming from an un-acked
> > IRQ source on the gfx device.  If it's INTx-, it means something in
> > one of the upper MSI layers isn't getting handled right.
> 
> Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
> 
> I tried 3.2 and 3.3. Although the spurious interrupts were always
> there, they occurred about an order of magnitude less often (15 vs. 300
> after X starts). So I bisected that and it led to a commit which
> fixes bad tiling for me:
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-jiri&id=79710e6ccabdac80c65cd13b944695ecc3e42a9d

Pipelined fencing is pretty much just broken and we'll completely rip it
out in 3.5. Does this also happen with 3.4-rc2?
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-10 20:32                   ` Daniel Vetter
@ 2012-04-10 20:34                     ` Jesse Barnes
  2012-04-11 10:40                       ` Daniel Vetter
  0 siblings, 1 reply; 32+ messages in thread
From: Jesse Barnes @ 2012-04-10 20:34 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Jiri Slaby, Chris Wilson, Jiri Slaby, LKML, dri-devel


On Tue, 10 Apr 2012 22:32:12 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Apr 10, 2012 at 09:52:40PM +0200, Jiri Slaby wrote:
> > Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > 
> > I tried 3.2 and 3.3. Although the spurious interrupts were always
> > there, they occurred about an order of magnitude less often (15 vs. 300
> > after X starts). So I bisected that and it led to a commit which
> > fixes bad tiling for me:
> > http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-jiri&id=79710e6ccabdac80c65cd13b944695ecc3e42a9d
> 
> Pipelined fencing is pretty much just broken and we'll completely rip it
> out in 3.5. Does this also happen with 3.4-rc2?

Does INTx- stay that way?  Or does it frequently read INTx+ if you
sample it a lot?  If it stays as INTx-, then something other than the
GPU is getting stuck (though it's possible this could be related to
pipelined fencing, if the fences are programmed to point at some funky
memory space).
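Sampling the flag repeatedly, as asked here, does not require running lspci in a loop. The Status register can be re-read from the device's sysfs config file instead (for this device presumably /sys/bus/pci/devices/0000:00:02.0/config, whose first 64 bytes are readable without root). A minimal sketch; count_intx_samples() is a hypothetical helper:

```c
#include <stdio.h>

/* Read the PCI Status register (offset 0x06) from a config-space file n
 * times and count how often bit 3 (Interrupt Status, lspci "INTx+") is
 * set. The file is reopened for every sample so each one is a fresh
 * read. Returns -1 on I/O error. */
static int count_intx_samples(const char *path, int n)
{
	int hits = 0;

	for (int i = 0; i < n; i++) {
		unsigned char st[2];
		FILE *f = fopen(path, "rb");

		if (!f)
			return -1;
		if (fseek(f, 0x06, SEEK_SET) != 0 ||
		    fread(st, 1, sizeof st, f) != sizeof st) {
			fclose(f);
			return -1;
		}
		fclose(f);

		if ((st[0] | (st[1] << 8)) & 0x08)
			hits++;
	}
	return hits;
}
```

If a long run, say 100000 samples taken while the spurious interrupts are occurring, returns 0, that would support the reading that the GPU never reports a pending interrupt condition, and that the problem lies above it, as suggested here.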

-- 
Jesse Barnes, Intel Open Source Technology Center



* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-10 18:34               ` Jesse Barnes
  2012-04-10 19:52                 ` Jiri Slaby
@ 2012-04-11  6:29                 ` Michel Dänzer
  2012-04-11 16:03                   ` Jesse Barnes
  1 sibling, 1 reply; 32+ messages in thread
From: Michel Dänzer @ 2012-04-11  6:29 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Jiri Slaby, LKML, Jiri Slaby, dri-devel

On Tue, 2012-04-10 at 11:34 -0700, Jesse Barnes wrote: 
> On Tue, 10 Apr 2012 20:11:29 +0200
> Jiri Slaby <jslaby@suse.cz> wrote:
> 
> > On 04/10/2012 06:26 PM, Jesse Barnes wrote:
> > > So port hotplug is always reporting that port C has a hotplug
> > > interrupt though...  If you write 0x3 back to it does the interrupt
> > > stop?
> > 
> > I'm not sure I got it right. This doesn't help:
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -1416,6 +1416,17 @@ static irqreturn_t
> > i915_driver_irq_handler(DRM_IRQ_ARGS)
> >                 iir = new_iir;
> >         }
> > 
> > +       if (ret == IRQ_NONE) {
> > +               u32 hp = I915_READ(PORT_HOTPLUG_STAT);
> > +               if (hp) {
> > +                       I915_WRITE(PORT_HOTPLUG_STAT, hp);
> > +                       I915_READ(PORT_HOTPLUG_STAT);
> > +               }
> > +
> > +               if (printk_ratelimit())
> > +                       printk(KERN_DEBUG "%s: %.8x\n", __func__, hp);
> > +
> > +       }
> > 
> >         return ret;
> >  }
> 
> Yeah that looks right, you still get 0x300?

You said 'If you write 0x3 back' above, but this code writes 0x300.
Which is right?


-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer


* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-10 20:34                     ` Jesse Barnes
@ 2012-04-11 10:40                       ` Daniel Vetter
  2012-05-03 19:56                         ` Jiri Slaby
  0 siblings, 1 reply; 32+ messages in thread
From: Daniel Vetter @ 2012-04-11 10:40 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Daniel Vetter, Jiri Slaby, Chris Wilson, Jiri Slaby, LKML, dri-devel

On Tue, Apr 10, 2012 at 01:34:11PM -0700, Jesse Barnes wrote:
> On Tue, 10 Apr 2012 22:32:12 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Tue, Apr 10, 2012 at 09:52:40PM +0200, Jiri Slaby wrote:
> > > Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
> > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > 
> > > I tried 3.2 and 3.3. Although the spurious interrupts were always
> > > there, they occurred about an order of magnitude less often (15 vs. 300
> > > after X starts). So I bisected that and it led to a commit which
> > > fixes bad tiling for me:
> > > http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-jiri&id=79710e6ccabdac80c65cd13b944695ecc3e42a9d
> > 
> > Pipelined fencing is pretty much just broken and we'll completely rip it
> > out in 3.5. Does this also happen with 3.4-rc2?
> 
> Does INTx- stay that way?  Or does it frequently read INTx+ if you
> sample it a lot?  If it stays as INTx-, then something other than the
> GPU is getting stuck (though it's possible this could be related to
> pipelined fencing, if the fences are programmed to point at some funky
> memory space).

Shot in the dark, let's disable msi a bit. Can you try the below patch?


diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 785f67f..249d5fe 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -2071,6 +2071,7 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 	else if (IS_GEN5(dev))
 		i915_ironlake_get_mem_freq(dev);
 
+#if 0
 	/* On the 945G/GM, the chipset reports the MSI capability on the
 	 * integrated graphics even though the support isn't actually there
 	 * according to the published specs.  It doesn't appear to function
@@ -2084,6 +2085,7 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 	 */
 	if (!IS_I945G(dev) && !IS_I945GM(dev))
 		pci_enable_msi(dev->pdev);
+#endif
 
 	spin_lock_init(&dev_priv->gt_lock);
 	spin_lock_init(&dev_priv->irq_lock);
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-11  6:29                 ` Michel Dänzer
@ 2012-04-11 16:03                   ` Jesse Barnes
  0 siblings, 0 replies; 32+ messages in thread
From: Jesse Barnes @ 2012-04-11 16:03 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: Jiri Slaby, LKML, Jiri Slaby, dri-devel


On Wed, 11 Apr 2012 08:29:22 +0200
Michel Dänzer <michel@daenzer.net> wrote:

> On Tue, 2012-04-10 at 11:34 -0700, Jesse Barnes wrote: 
> > On Tue, 10 Apr 2012 20:11:29 +0200
> > Jiri Slaby <jslaby@suse.cz> wrote:
> > 
> > > On 04/10/2012 06:26 PM, Jesse Barnes wrote:
> > > > So port hotplug is always reporting that port C has a hotplug
> > > > interrupt though...  If you write 0x3 back to it does the interrupt
> > > > stop?
> > > 
> > > I'm not sure I got it right. This doesn't help:
> > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > @@ -1416,6 +1416,17 @@ static irqreturn_t
> > > i915_driver_irq_handler(DRM_IRQ_ARGS)
> > >                 iir = new_iir;
> > >         }
> > > 
> > > +       if (ret == IRQ_NONE) {
> > > +               u32 hp = I915_READ(PORT_HOTPLUG_STAT);
> > > +               if (hp) {
> > > +                       I915_WRITE(PORT_HOTPLUG_STAT, hp);
> > > +                       I915_READ(PORT_HOTPLUG_STAT);
> > > +               }
> > > +
> > > +               if (printk_ratelimit())
> > > +                       printk(KERN_DEBUG "%s: %.8x\n", __func__, hp);
> > > +
> > > +       }
> > > 
> > >         return ret;
> > >  }
> > 
> > Yeah that looks right, you still get 0x300?
> 
> You said 'If you write 0x3 back' above, but this code writes 0x300.
> Which is right?

0x300 is right; the bits are status bits with write-1-to-clear
semantics.  But it looks like this one is just stuck high (probably
because port C isn't actually wired up fully).
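The following toy model of the write-1-to-clear behaviour described here makes Michel's question concrete. The value written back has to use the same bit positions as the status that was read. Writing 0x3 would therefore touch bits 0 and 1 and leave the 0x300 bits (8 and 9) alone. The stuck field is an assumption that stands in for a bit the hardware keeps re-asserting. The names are hypothetical, and this is not the actual register logic:

```c
#include <stdint.h>

/* Toy model of a write-1-to-clear (W1C) status register. */
struct w1c_reg {
	uint32_t value; /* current status bits */
	uint32_t stuck; /* assumption: bits the hardware keeps asserting */
};

/* A 1 in 'val' clears that status bit and a 0 leaves it unchanged;
 * any stuck bits are re-asserted immediately. */
static void w1c_write(struct w1c_reg *r, uint32_t val)
{
	r->value &= ~val;
	r->value |= r->stuck;
}

static uint32_t w1c_read(const struct w1c_reg *r)
{
	return r->value;
}
```

Writing back exactly what was read, as Jiri's debug patch does with PORT_HOTPLUG_STAT, clears an ordinary status bit. In this model, a stuck bit reads back as 0x300 immediately, which matches what Jiri kept seeing.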

-- 
Jesse Barnes, Intel Open Source Technology Center



* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-04-11 10:40                       ` Daniel Vetter
@ 2012-05-03 19:56                         ` Jiri Slaby
  2012-05-03 21:15                           ` Daniel Vetter
  0 siblings, 1 reply; 32+ messages in thread
From: Jiri Slaby @ 2012-05-03 19:56 UTC (permalink / raw)
  To: Jesse Barnes, Chris Wilson, Jiri Slaby, LKML, dri-devel

On 04/11/2012 12:40 PM, Daniel Vetter wrote:
> On Tue, Apr 10, 2012 at 01:34:11PM -0700, Jesse Barnes wrote:
>> On Tue, 10 Apr 2012 22:32:12 +0200
>> Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>>> On Tue, Apr 10, 2012 at 09:52:40PM +0200, Jiri Slaby wrote:
>>>> Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>
>>>> I tried 3.2 and 3.3. Although the spurious interrupts were always
>>>> there, they occurred about an order of magnitude less often (15 vs. 300
>>>> after X starts). So I bisected that and it led to a commit which
>>>> fixes bad tiling for me:
>>>> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-jiri&id=79710e6ccabdac80c65cd13b944695ecc3e42a9d
>>>
>>> Pipelined fencing is pretty much just broken and we'll completely rip it
>>> out in 3.5. Does this also happen with 3.4-rc2?
>>
>> Does INTx- stay that way?  Or does it frequently read INTx+ if you
>> sample it a lot?  If it stays as INTx-, then something other than the
>> GPU is getting stuck (though it's possible this could be related to
>> pipelined fencing, if the fences are programmed to point at some funky
>> memory space).

Hi, and sorry for the delay. It stays INTx-. And I tested that with the
patch removing fencing.

> Shot in the dark, let's disable msi a bit. Can you try the below patch?

Yeah, no IRQ_NONE at the end of i915_driver_irq_handler now. So MSI is
busted, either in the card, the chipset or the kernel. Any idea how to
find out?

thanks,
-- 
js
suse labs




* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-05-03 19:56                         ` Jiri Slaby
@ 2012-05-03 21:15                           ` Daniel Vetter
  2012-05-03 21:16                             ` Jiri Slaby
  0 siblings, 1 reply; 32+ messages in thread
From: Daniel Vetter @ 2012-05-03 21:15 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Jesse Barnes, Chris Wilson, Jiri Slaby, LKML, dri-devel

On Thu, May 03, 2012 at 09:56:08PM +0200, Jiri Slaby wrote:
> On 04/11/2012 12:40 PM, Daniel Vetter wrote:
> > On Tue, Apr 10, 2012 at 01:34:11PM -0700, Jesse Barnes wrote:
> >> On Tue, 10 Apr 2012 22:32:12 +0200
> >> Daniel Vetter <daniel@ffwll.ch> wrote:
> >>
> >>> On Tue, Apr 10, 2012 at 09:52:40PM +0200, Jiri Slaby wrote:
> >>>> Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
> >>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>
> >>>> I tried 3.2 and 3.3. Although the spurious interrupts were always
> >>>> there, they occurred an order of magnitude less often (15 vs. 300
> >>>> after X starts). So I bisected that, and it led to a commit which
> >>>> fixes bad tiling for me:
> >>>> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-jiri&id=79710e6ccabdac80c65cd13b944695ecc3e42a9d
> >>>
> >>> Pipelined fencing is pretty much just broken and we'll completely rip it
> >>> out in 3.5. Does this also happen with 3.4-rc2?
> >>
> >> Does INTx- stay that way?  Or does it frequently read INTx+ if you
> >> sample it a lot?  If it stays as INTx-, then something other than the
> >> GPU is getting stuck (though it's possible this could be related to
> >> pipelined fencing, if the fences are programmed to point at some funky
> >> memory space).
> 
> Hi, and sorry for the delay. It stays INTx-, and I tested that with the
> patch removing fencing.
> 
> > Shot in the dark, let's disable msi a bit. Can you try the below patch?
> 
> Yeah, no IRQ_NONE at the end of i915_driver_irq_handler now. So MSI is
> busted, either in the card, the chipset or the kernel. Any idea how to
> find out?

OK, so MSI is busted. Can you please paste lspci -nn for your Intel GPU?
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-05-03 21:15                           ` Daniel Vetter
@ 2012-05-03 21:16                             ` Jiri Slaby
  2012-05-03 21:54                               ` Jesse Barnes
  0 siblings, 1 reply; 32+ messages in thread
From: Jiri Slaby @ 2012-05-03 21:16 UTC (permalink / raw)
  To: Jesse Barnes, Chris Wilson, Jiri Slaby, LKML, dri-devel

On 05/03/2012 11:15 PM, Daniel Vetter wrote:
>>> Shot in the dark, let's disable msi a bit. Can you try the below patch?
>>
>> Yeah, no IRQ_NONE at the end of i915_driver_irq_handler now. So MSI is
>> busted, either in the card, the chipset or the kernel. Any idea how to
>> find out?
> 
> OK, so MSI is busted. Can you please paste lspci -nn for your Intel GPU?

Sure:
00:02.0 VGA compatible controller [0300]: Intel Corporation 82G33/G31
Express Integrated Graphics Controller [8086:29c2] (rev 02)

thanks,
-- 
js
suse labs




* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-05-03 21:16                             ` Jiri Slaby
@ 2012-05-03 21:54                               ` Jesse Barnes
  2012-05-03 23:15                                 ` Ben Widawsky
  0 siblings, 1 reply; 32+ messages in thread
From: Jesse Barnes @ 2012-05-03 21:54 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Chris Wilson, Jiri Slaby, LKML, dri-devel

On Thu, 03 May 2012 23:16:02 +0200
Jiri Slaby <jslaby@suse.cz> wrote:

> On 05/03/2012 11:15 PM, Daniel Vetter wrote:
> >>> Shot in the dark, let's disable msi a bit. Can you try the below patch?
> >>
> >> Yeah, no IRQ_NONE at the end of i915_driver_irq_handler now. So MSI is
> >> busted, either in the card, the chipset or the kernel. Any idea how to
> >> find out?
> > 
> > OK, so MSI is busted. Can you please paste lspci -nn for your Intel GPU?
> 
> Sure:
> 00:02.0 VGA compatible controller [0300]: Intel Corporation 82G33/G31
> Express Integrated Graphics Controller [8086:29c2] (rev 02)

OK, never mind about the INTx-; now I'm not sure whether it means anything
in an MSI context (the spec doesn't require it, but I thought our
devices would toggle it if they were sending interrupts).

But since line-level interrupts work for you, I guess it's OK to blacklist
your chipset for MSI until we poke some hw folks internally about this.
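Jesse's suggestion, keeping this chipset on line-level interrupts by blacklisting it for MSI, could take the shape of a PCI-ID lookup consulted before MSI is enabled. The sketch below is hypothetical: the table, msi_denied() and should_use_msi() are illustrative names rather than i915 code, and the only entry is the 8086:29c2 device from Jiri's lspci -nn output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Hypothetical blacklist of devices whose MSI delivery is not trusted. */
static const struct pci_id msi_denylist[] = {
    { 0x8086, 0x29c2 }, /* 82G33/G31 IGD, reported in this thread */
};

static bool msi_denied(uint16_t vendor, uint16_t device)
{
    /* A vendor ID of 0xffff means the config read hit no device. */
    assert(vendor != 0xffff);
    for (size_t i = 0; i < sizeof msi_denylist / sizeof msi_denylist[0]; i++)
        if (msi_denylist[i].vendor == vendor &&
            msi_denylist[i].device == device)
            return true;
    return false;
}

/* Use MSI only if the device advertises it and is not blacklisted;
 * otherwise fall back to the legacy INTx line. */
static bool should_use_msi(uint16_t vendor, uint16_t device, bool has_msi_cap)
{
    return has_msi_cap && !msi_denied(vendor, device);
}
```

In a kernel driver, a check like this would presumably sit just before the pci_enable_msi() call, so that blacklisted devices simply keep their legacy interrupt line.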

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: i915_driver_irq_handler: irq 42: nobody cared
  2012-05-03 21:54                               ` Jesse Barnes
@ 2012-05-03 23:15                                 ` Ben Widawsky
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Widawsky @ 2012-05-03 23:15 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Jiri Slaby, LKML, Jiri Slaby, dri-devel

On Thu, 3 May 2012 14:54:22 -0700
Jesse Barnes <jbarnes@virtuousgeek.org> wrote:

> On Thu, 03 May 2012 23:16:02 +0200
> Jiri Slaby <jslaby@suse.cz> wrote:
> 
> > On 05/03/2012 11:15 PM, Daniel Vetter wrote:
> > >>> Shot in the dark, let's disable msi a bit. Can you try the below patch?
> > >>
> > >> Yeah, no IRQ_NONE at the end of i915_driver_irq_handler now. So MSI is
> > >> busted, either in the card, the chipset or the kernel. Any idea how to
> > >> find out?
> > > 
> > > OK, so MSI is busted. Can you please paste lspci -nn for your Intel GPU?
> > 
> > Sure:
> > 00:02.0 VGA compatible controller [0300]: Intel Corporation 82G33/G31
> > Express Integrated Graphics Controller [8086:29c2] (rev 02)
> 
> OK, never mind about the INTx-; now I'm not sure whether it means anything
> in an MSI context (the spec doesn't require it, but I thought our
> devices would toggle it if they were sending interrupts).
> 
> But since line-level interrupts work for you, I guess it's OK to blacklist
> your chipset for MSI until we poke some hw folks internally about this.
> 
> Thanks,

I occasionally see a missed IRQ on 16 (which is my USB), but it has only
started showing up in fairly recent dinq kernels (I haven't tried Linus'),
and I've been using this laptop for over a year.


-- 
Ben Widawsky, Intel Open Source Technology Center


end of thread

Thread overview: 32+ messages
     [not found] <4F717CE3.4040206@suse.cz>
2012-03-27  8:42 ` i915_driver_irq_handler: irq 42: nobody cared Jiri Slaby
2012-03-30  9:59   ` Jiri Slaby
2012-03-30 10:45     ` Chris Wilson
2012-03-30 12:11       ` Jiri Slaby
2012-03-30 12:24         ` Chris Wilson
2012-04-06 21:31           ` i915_driver_irq_handler: irq 42: nobody cared [generic IRQ handling broken?] Jiri Slaby
2012-04-06 22:40             ` Thomas Gleixner
2012-04-09 17:12               ` Jesse Barnes
2012-04-09 17:52                 ` Dave Airlie
2012-04-10  8:44               ` Jiri Slaby
2012-04-10  8:50             ` Daniel Vetter
2012-04-10  8:52             ` i915_driver_irq_handler: irq 42: nobody cared Jiri Slaby
2012-04-10 16:50               ` Marcin Slusarz
2012-04-09 17:11       ` Jesse Barnes
2012-04-10  8:47         ` Jiri Slaby
2012-04-10  8:58           ` Daniel Vetter
2012-04-10  9:48             ` Jiri Slaby
2012-04-10 16:26           ` Jesse Barnes
2012-04-10 18:11             ` Jiri Slaby
2012-04-10 18:34               ` Jesse Barnes
2012-04-10 19:52                 ` Jiri Slaby
2012-04-10 20:32                   ` Daniel Vetter
2012-04-10 20:34                     ` Jesse Barnes
2012-04-11 10:40                       ` Daniel Vetter
2012-05-03 19:56                         ` Jiri Slaby
2012-05-03 21:15                           ` Daniel Vetter
2012-05-03 21:16                             ` Jiri Slaby
2012-05-03 21:54                               ` Jesse Barnes
2012-05-03 23:15                                 ` Ben Widawsky
2012-04-11  6:29                 ` Michel Dänzer
2012-04-11 16:03                   ` Jesse Barnes
     [not found] ` <20120327085749.GE4276@phenom.ffwll.local>
2012-03-27 10:54   ` Jiri Slaby
