All of lore.kernel.org
 help / color / mirror / Atom feed
* i915 irq storm mitigation in 3.10
@ 2013-07-06 21:41 Jan Niggemann
  2013-07-08 20:03 ` Daniel Vetter
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Niggemann @ 2013-07-06 21:41 UTC (permalink / raw)
  To: intel-gfx

Hi list,

I'm not sure, but I think the recent patch to keep i915 systems from 
locking up / slowing down in case of irq storms didn't fully help.
I compiled 3.10, but every now and then, my system has the same 
symptoms as before.

grep storm /var/log/messages
Jul  5 21:55:43 muretop kernel: [  324.767375] [drm] HPD interrupt 
storm detected on connector DP-3: switching from hotplug detection to 
polling
Jul  6 23:05:57 muretop kernel: [  200.372208] [drm] HPD interrupt 
storm detected on connector HDMI-A-1: switching from hotplug detection 
to polling
Jul  6 23:07:51 muretop kernel: [  313.752039] [drm] HPD interrupt 
storm detected on connector DP-3: switching from hotplug detection to 
pol

I notice up to 50% CPU from kworker/u8:0 when my system slows down.

I'm new to kernel development, but if you tell me what to do, I'm very 
willing to help diagnosing this.

Regards
jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-06 21:41 i915 irq storm mitigation in 3.10 Jan Niggemann
@ 2013-07-08 20:03 ` Daniel Vetter
  2013-07-21 20:23   ` Jan Niggemann
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Vetter @ 2013-07-08 20:03 UTC (permalink / raw)
  To: Jan Niggemann; +Cc: intel-gfx

On Sat, Jul 06, 2013 at 11:41:07PM +0200, Jan Niggemann wrote:
> Hi list,
> 
> I'm not sure, but I think the recent patch to keep i915 systems from
> locking up / slowing down in case of irq storms didn't fully help.
> I compiled 3.10, but every now and then, my system has the same
> symptoms as before.
> 
> grep storm /var/log/messages
> Jul  5 21:55:43 muretop kernel: [  324.767375] [drm] HPD interrupt
> storm detected on connector DP-3: switching from hotplug detection
> to polling
> Jul  6 23:05:57 muretop kernel: [  200.372208] [drm] HPD interrupt
> storm detected on connector HDMI-A-1: switching from hotplug
> detection to polling
> Jul  6 23:07:51 muretop kernel: [  313.752039] [drm] HPD interrupt
> storm detected on connector DP-3: switching from hotplug detection
> to pol
> 
> I notice up to 50% CPU from kworker/u8:0 when my system slows down.
> 
> I'm new to kernel development, but if you tell me what to do, I'm
> very willing to help diagnosing this.

Seems to work as advertised, once we detect a storm we kill the hotplug
interrupt. Note that since on many systems these storms only happen in
certain conditions (e.g. specific battery charger state) we'll re-enable
the interrupts every 2 minutes to check whether the storm has passed
(there's not really any other way).

But every time this happens we only let through a few interrupts, so this
shouldn't affect you badly. Can you please check whether those slowdowns
line up with 2 minute intervalls?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-08 20:03 ` Daniel Vetter
@ 2013-07-21 20:23   ` Jan Niggemann
  2013-07-21 20:43     ` Daniel Vetter
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Niggemann @ 2013-07-21 20:23 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

Hi list,

I was busy, but this is important so I report my findings:

Am 08.07.2013 22:03, schrieb Daniel Vetter:
> On Sat, Jul 06, 2013 at 11:41:07PM +0200, Jan Niggemann wrote:
>> Hi list,
>>
>> I'm not sure, but I think the recent patch to keep i915 systems from
>> locking up / slowing down in case of irq storms didn't fully help.
>> I compiled 3.10, but every now and then, my system has the same
>> symptoms as before.
>>
>> grep storm /var/log/messages
>> Jul  5 21:55:43 muretop kernel: [  324.767375] [drm] HPD interrupt
>> storm detected on connector DP-3: switching from hotplug detection
>> to polling
>> Jul  6 23:05:57 muretop kernel: [  200.372208] [drm] HPD interrupt
>> storm detected on connector HDMI-A-1: switching from hotplug
>> detection to polling
>> Jul  6 23:07:51 muretop kernel: [  313.752039] [drm] HPD interrupt
>> storm detected on connector DP-3: switching from hotplug detection
>> to pol
>>
>> I notice up to 50% CPU from kworker/u8:0 when my system slows down.
>>
>> I'm new to kernel development, but if you tell me what to do, I'm
>> very willing to help diagnosing this.
>
> Seems to work as advertised, once we detect a storm we kill the 
> hotplug
> interrupt. Note that since on many systems these storms only happen 
> in
> certain conditions (e.g. specific battery charger state) we'll 
> re-enable
> the interrupts every 2 minutes to check whether the storm has passed
> (there's not really any other way).
OK, I didn't know it worked that way.

> But every time this happens we only let through a few interrupts, so 
> this
> shouldn't affect you badly. Can you please check whether those 
> slowdowns
> line up with 2 minute intervalls?
I observed these slowdowns for a couple of weeks now. On my machine, 
they only happen once, some minutes after a cold boot.
They last for a minute or two, and then they are gone.
I'd have guessed that the storm detection kicks in pretty quickly after 
a storm is detected and that it would go unnoticed.

Another thing I have noticed: All storm-related messages are about DP-3 
and HDMI-A-1, but my T400 doesn't have either a HDMI out nor a 
DisplayPort connector...

I returned to 3.2.0 with drm_kms_helper.poll=0 for the time being, so 
this doesn't affect me badly.
Please tell me if and how I can further diagnose this.

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-21 20:23   ` Jan Niggemann
@ 2013-07-21 20:43     ` Daniel Vetter
  2013-07-22  6:14       ` Jan Niggemann
  2013-07-22  8:04       ` Egbert Eich
  0 siblings, 2 replies; 17+ messages in thread
From: Daniel Vetter @ 2013-07-21 20:43 UTC (permalink / raw)
  To: Jan Niggemann; +Cc: Egbert Eich, intel-gfx

On Sun, Jul 21, 2013 at 10:23 PM, Jan Niggemann <jn@hz6.de> wrote:
>> But every time this happens we only let through a few interrupts, so this
>> shouldn't affect you badly. Can you please check whether those slowdowns
>> line up with 2 minute intervalls?
>
> I observed these slowdowns for a couple of weeks now. On my machine, they
> only happen once, some minutes after a cold boot.
> They last for a minute or two, and then they are gone.
> I'd have guessed that the storm detection kicks in pretty quickly after a
> storm is detected and that it would go unnoticed.

Hm, that sounds like something doesn't quite work as expected. We
should kill things once we get 5 interrupts or so in 1 second. So if
it's bad enough that it slows your machine down it really should only
be barely noticeable.

Can you please boot with drm.debug=0xe added to your kernel cmdline,
do a cold boot (so that we get a storm) and then once everything is
quiet grab the full kernel log (you probably have to grab it from the
disk, dmesg has likely overflown) and upload it somewhere (since it's
probably too big for the mailing lists)? Please also enable printk
timestamps for your kernel so that we can create a timeline of
everything going on.

That log should give us a starting point, but I guess we need to run
some debug patches past you to figure out what's going on in your
system exactly.

> Another thing I have noticed: All storm-related messages are about DP-3 and
> HDMI-A-1, but my T400 doesn't have either a HDMI out nor a DisplayPort
> connector...

We'll only enable ports (and interrupts for them) if we think they're
there. Could it be that there's a docking station for your machine
available which would have these ports?

> I returned to 3.2.0 with drm_kms_helper.poll=0 for the time being, so this
> doesn't affect me badly.
> Please tell me if and how I can further diagnose this.

That's not good, we need to fix this. Also adding Egbert, maybe he has
seen something like this in the wild already, after all he's
implemented the storm handling code.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-21 20:43     ` Daniel Vetter
@ 2013-07-22  6:14       ` Jan Niggemann
  2013-07-22  6:26         ` Daniel Vetter
  2013-07-22  8:04       ` Egbert Eich
  1 sibling, 1 reply; 17+ messages in thread
From: Jan Niggemann @ 2013-07-22  6:14 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Egbert Eich, intel-gfx

Hi all,

Am 21.07.2013 22:43, schrieb Daniel Vetter:
> Can you please boot with drm.debug=0xe added to your kernel cmdline,
> do a cold boot (so that we get a storm) and then once everything is
> quiet grab the full kernel log (you probably have to grab it from the
> disk, dmesg has likely overflown) and upload it somewhere (since it's
> probably too big for the mailing lists)? Please also enable printk
> timestamps for your kernel so that we can create a timeline of
> everything going on.
I hope I did it correctly, apending both printk.time=1 and 
drm.debug=0xe to my kernel command line:
http://files.hz6.de/kern_20130722.log

> That log should give us a starting point, but I guess we need to run
> some debug patches past you to figure out what's going on in your
> system exactly.
Did the drm.debug option work, it's not very chatty?

>> Another thing I have noticed: All storm-related messages are about 
>> DP-3 and
>> HDMI-A-1, but my T400 doesn't have either a HDMI out nor a 
>> DisplayPort
>> connector...
> We'll only enable ports (and interrupts for them) if we think they're
> there. Could it be that there's a docking station for your machine
> available which would have these ports?
Hmm, yes, I think so.

Thank you all for helping diagnosing this issue!

Regards
jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-22  6:14       ` Jan Niggemann
@ 2013-07-22  6:26         ` Daniel Vetter
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Vetter @ 2013-07-22  6:26 UTC (permalink / raw)
  To: Jan Niggemann; +Cc: Egbert Eich, intel-gfx

On Mon, Jul 22, 2013 at 8:14 AM, Jan Niggemann <jn@hz6.de> wrote:
> Am 21.07.2013 22:43, schrieb Daniel Vetter:
>
>> Can you please boot with drm.debug=0xe added to your kernel cmdline,
>> do a cold boot (so that we get a storm) and then once everything is
>> quiet grab the full kernel log (you probably have to grab it from the
>> disk, dmesg has likely overflown) and upload it somewhere (since it's
>> probably too big for the mailing lists)? Please also enable printk
>> timestamps for your kernel so that we can create a timeline of
>> everything going on.
>
> I hope I did it correctly, apending both printk.time=1 and drm.debug=0xe to
> my kernel command line:
> http://files.hz6.de/kern_20130722.log
>
>
>> That log should give us a starting point, but I guess we need to run
>> some debug patches past you to figure out what's going on in your
>> system exactly.
>
> Did the drm.debug option work, it's not very chatty?

Nope, that didn't work out somehow, it should be _much_ more chatty,
even without a hpd irq storm. Usually the kernel prints the
commandline in the log, but I didn't see that here either ... Can you
please try to figure out what's going on here?
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-21 20:43     ` Daniel Vetter
  2013-07-22  6:14       ` Jan Niggemann
@ 2013-07-22  8:04       ` Egbert Eich
  2013-07-22 19:28         ` Jan Niggemann
  1 sibling, 1 reply; 17+ messages in thread
From: Egbert Eich @ 2013-07-22  8:04 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Egbert Eich, intel-gfx, Jan Niggemann

Daniel Vetter writes:
 > On Sun, Jul 21, 2013 at 10:23 PM, Jan Niggemann <jn@hz6.de> wrote:
 > >> But every time this happens we only let through a few interrupts, so this
 > >> shouldn't affect you badly. Can you please check whether those slowdowns
 > >> line up with 2 minute intervalls?
 > >
 > > I observed these slowdowns for a couple of weeks now. On my machine, they
 > > only happen once, some minutes after a cold boot.
 > > They last for a minute or two, and then they are gone.
 > > I'd have guessed that the storm detection kicks in pretty quickly after a
 > > storm is detected and that it would go unnoticed.
 > 
 > Hm, that sounds like something doesn't quite work as expected. We
 > should kill things once we get 5 interrupts or so in 1 second. So if
 > it's bad enough that it slows your machine down it really should only
 > be barely noticeable.
 > 

The logs show that the disable mechanism got triggered, so there was
a storm that got detected.
The respective message is generated by the worker, everything up to 
there (detection and marking disabled) seems to be fine.
I bet we are still getting interrupts but the respective bit in 
hpd_event_bits doesn't get set any more. Since we unconditionally 
queue the worker on interrupt there is surprise it is so busy.

Then this points to the call to hpd_irq_setup() in intel_hpd_irq_handler()
not doing what is expected, ie masking out the stormy interrupt.
Could it be that we can't mask/disable an interrupt before ACKing
it?

@Jan, could you also specify what hardware you are using (ie give us
an output of lspci -n)?


Cheers,
	Egbert.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-22  8:04       ` Egbert Eich
@ 2013-07-22 19:28         ` Jan Niggemann
  2013-07-22 20:36           ` Egbert Eich
  2013-07-23 11:26           ` Egbert Eich
  0 siblings, 2 replies; 17+ messages in thread
From: Jan Niggemann @ 2013-07-22 19:28 UTC (permalink / raw)
  To: Egbert Eich; +Cc: intel-gfx

Egbert, Daniel, others,

Am 22.07.2013 10:04, schrieb Egbert Eich:
> Daniel Vetter writes:
>  > On Sun, Jul 21, 2013 at 10:23 PM, Jan Niggemann <jn@hz6.de> wrote:
>  > >> But every time this happens we only let through a few
> interrupts, so this
>  > >> shouldn't affect you badly. Can you please check whether those
> slowdowns
>  > >> line up with 2 minute intervalls?
>  > >
>  > > I observed these slowdowns for a couple of weeks now. On my
> machine, they
>  > > only happen once, some minutes after a cold boot.
>  > > They last for a minute or two, and then they are gone.
>  > > I'd have guessed that the storm detection kicks in pretty
> quickly after a
>  > > storm is detected and that it would go unnoticed.
>  >
>  > Hm, that sounds like something doesn't quite work as expected. We
>  > should kill things once we get 5 interrupts or so in 1 second. So 
> if
>  > it's bad enough that it slows your machine down it really should 
> only
>  > be barely noticeable.
>  >
>
> The logs show that the disable mechanism got triggered, so there was
> a storm that got detected.
> The respective message is generated by the worker, everything up to
> there (detection and marking disabled) seems to be fine.
> I bet we are still getting interrupts but the respective bit in
> hpd_event_bits doesn't get set any more. Since we unconditionally
> queue the worker on interrupt there is surprise it is so busy.
>
> Then this points to the call to hpd_irq_setup() in 
> intel_hpd_irq_handler()
> not doing what is expected, ie masking out the stormy interrupt.
> Could it be that we can't mask/disable an interrupt before ACKing
> it?
>
> @Jan, could you also specify what hardware you are using (ie give us
> an output of lspci -n)?
It's a Lenovo ThinkPad T400, the model is 7434-AG2.
root@muretop:~# lspci -n
00:00.0 0600: 8086:2a40 (rev 07)
00:02.0 0300: 8086:2a42 (rev 07)
00:02.1 0380: 8086:2a43 (rev 07)
00:03.0 0780: 8086:2a44 (rev 07)
00:19.0 0200: 8086:10f5 (rev 03)
00:1a.0 0c03: 8086:2937 (rev 03)
00:1a.1 0c03: 8086:2938 (rev 03)
00:1a.2 0c03: 8086:2939 (rev 03)
00:1a.7 0c03: 8086:293c (rev 03)
00:1b.0 0403: 8086:293e (rev 03)
00:1c.0 0604: 8086:2940 (rev 03)
00:1c.1 0604: 8086:2942 (rev 03)
00:1c.3 0604: 8086:2946 (rev 03)
00:1c.4 0604: 8086:2948 (rev 03)
00:1d.0 0c03: 8086:2934 (rev 03)
00:1d.1 0c03: 8086:2935 (rev 03)
00:1d.2 0c03: 8086:2936 (rev 03)
00:1d.7 0c03: 8086:293a (rev 03)
00:1e.0 0604: 8086:2448 (rev 93)
00:1f.0 0601: 8086:2917 (rev 03)
00:1f.2 0106: 8086:2929 (rev 03)
00:1f.3 0c05: 8086:2930 (rev 03)
03:00.0 0280: 8086:4237
15:00.0 0607: 1180:0476 (rev ba)

As to the log: I messed up the kernel parameters this morning... was 
out of coffee this morning and my 1,5y daughter played around me :-)

Here's my kernel log with drm.debug and printk.time enabled:
Uncompressed (22M): http://files.hz6.de/kern_20130722.log
bzip2'd (some 600 KB): http://files.hz6.de/kern_20130722.log.bz2

Regards
jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-22 19:28         ` Jan Niggemann
@ 2013-07-22 20:36           ` Egbert Eich
  2013-07-23  8:17             ` Daniel Vetter
  2013-07-23 11:26           ` Egbert Eich
  1 sibling, 1 reply; 17+ messages in thread
From: Egbert Eich @ 2013-07-22 20:36 UTC (permalink / raw)
  To: Jan Niggemann; +Cc: Egbert Eich, intel-gfx

Jan Niggemann writes:
 > Egbert, Daniel, others,
 > It's a Lenovo ThinkPad T400, the model is 7434-AG2.
 > root@muretop:~# lspci -n
 > 00:00.0 0600: 8086:2a40 (rev 07)
 > 00:02.0 0300: 8086:2a42 (rev 07)

Ok, this is a gm45.

 > 
 > As to the log: I messed up the kernel parameters this morning... was 
 > out of coffee this morning and my 1,5y daughter played around me :-)
 > 
 > Here's my kernel log with drm.debug and printk.time enabled:
 > Uncompressed (22M): http://files.hz6.de/kern_20130722.log

These logs show that interrupts are still coming thru although they
should be disabled.

Could you try if the patch below makes any difference?

Cheers,
	Egbert.

>From 70dae32e99799d15ddcedd5853514215624a8289 Mon Sep 17 00:00:00 2001
From: Egbert Eich <eich@suse.de>
Date: Mon, 22 Jul 2013 22:33:36 +0200
Subject: [PATCH] drm/i915: Make sure PORT_HOTPLUG_EN is written

Add posting read to make sure PORT_HOTPLUG_EN is written in
i915_hpd_irq_setup().

Signed-off-by: Egbert Eich <eich@suse.de>
---
 drivers/gpu/drm/i915/i915_irq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f708e4e..e43d809 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2802,6 +2802,7 @@ static void i915_hpd_irq_setup(struct drm_device *dev)
 
 		/* Ignore TV since it's buggy */
 		I915_WRITE(PORT_HOTPLUG_EN, hotplug_en);
+		POSTING_READ(PORT_HOTPLUG_EN);
 	}
 }
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-22 20:36           ` Egbert Eich
@ 2013-07-23  8:17             ` Daniel Vetter
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Vetter @ 2013-07-23  8:17 UTC (permalink / raw)
  To: Egbert Eich; +Cc: intel-gfx, Jan Niggemann

On Mon, Jul 22, 2013 at 10:36 PM, Egbert Eich <eich@suse.com> wrote:
> Jan Niggemann writes:
>  > Egbert, Daniel, others,
>  > It's a Lenovo ThinkPad T400, the model is 7434-AG2.
>  > root@muretop:~# lspci -n
>  > 00:00.0 0600: 8086:2a40 (rev 07)
>  > 00:02.0 0300: 8086:2a42 (rev 07)
>
> Ok, this is a gm45.
>
>  >
>  > As to the log: I messed up the kernel parameters this morning... was
>  > out of coffee this morning and my 1,5y daughter played around me :-)
>  >
>  > Here's my kernel log with drm.debug and printk.time enabled:
>  > Uncompressed (22M): http://files.hz6.de/kern_20130722.log
>
> These logs show that interrupts are still coming thru although they
> should be disabled.

Hm, in my reading of that logfile it looks like the irq disabling
works, every time we detect a storm there's no noise for 2 minutes on
that pin. But it seems to take the code an awful long time to notice a
storm, sometimes it even takes 2-3 minutes ... And we seem to get a
lot of hpd events per second, so this should trigger much quicker. No
idea yet what's going on here.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-22 19:28         ` Jan Niggemann
  2013-07-22 20:36           ` Egbert Eich
@ 2013-07-23 11:26           ` Egbert Eich
  2013-07-24 20:52             ` Jan Niggemann
  1 sibling, 1 reply; 17+ messages in thread
From: Egbert Eich @ 2013-07-23 11:26 UTC (permalink / raw)
  To: Jan Niggemann; +Cc: Egbert Eich, intel-gfx


Hi Jan -

Jan Niggemann writes:
 > 
 > As to the log: I messed up the kernel parameters this morning... was 
 > out of coffee this morning and my 1,5y daughter played around me :-)
 > 
 > Here's my kernel log with drm.debug and printk.time enabled:
 > Uncompressed (22M): http://files.hz6.de/kern_20130722.log
 > bzip2'd (some 600 KB): http://files.hz6.de/kern_20130722.log.bz2

I've looked at the logs a bit more. Here's a patch adding some more
debug information. Would you please apply this to your 3.10 kernel
and generate a log file the same way as you did before.
The driver will be even more chatty - but I don't expect any problems
from this.

Cheers,
	Egbert.

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e43d809..46bb77c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -919,6 +919,11 @@ static inline void intel_hpd_irq_handler(struct drm_device *dev,
 	spin_lock(&dev_priv->irq_lock);
 	for (i = 1; i < HPD_NUM_PINS; i++) {
 
+		if (IS_G4X(dev) && (hpd[i] & hotplug_trigger) &&
+		    dev_priv->hpd_stats[i].hpd_mark != HPD_ENABLED)
+			DRM_DEBUG_KMS("Received HPD intterupt although disabled\n",
+				      I915_READ(PORT_HOTPLUG_EN));
+
 		if (!(hpd[i] & hotplug_trigger) ||
 		    dev_priv->hpd_stats[i].hpd_mark != HPD_ENABLED)
 			continue;
@@ -929,6 +934,8 @@ static inline void intel_hpd_irq_handler(struct drm_device *dev,
 				   + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD))) {
 			dev_priv->hpd_stats[i].hpd_last_jiffies = jiffies;
 			dev_priv->hpd_stats[i].hpd_cnt = 0;
+			DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - jiffies: %d\n", i,
+				      dev_priv->hpd_stats[i].hpd_last_jiffies);
 		} else if (dev_priv->hpd_stats[i].hpd_cnt > HPD_STORM_THRESHOLD) {
 			dev_priv->hpd_stats[i].hpd_mark = HPD_MARK_DISABLED;
 			dev_priv->hpd_event_bits &= ~(1 << i);
@@ -936,6 +943,8 @@ static inline void intel_hpd_irq_handler(struct drm_device *dev,
 			storm_detected = true;
 		} else {
 			dev_priv->hpd_stats[i].hpd_cnt++;
+			DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: %d\n", i,
+				      dev_priv->hpd_stats[i].hpd_cnt);
 		}
 	}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-23 11:26           ` Egbert Eich
@ 2013-07-24 20:52             ` Jan Niggemann
  2013-07-25  7:50               ` Egbert Eich
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Niggemann @ 2013-07-24 20:52 UTC (permalink / raw)
  To: Egbert Eich; +Cc: intel-gfx

Hi Egbert,

Am 23.07.2013 13:26, schrieb Egbert Eich:
> I've looked at the logs a bit more. Here's a patch adding some more
> debug information. Would you please apply this to your 3.10 kernel
> and generate a log file the same way as you did before.
> The driver will be even more chatty - but I don't expect any problems
> from this.

> diff --git a/drivers/gpu/drm/i915/i915_irq.c
> b/drivers/gpu/drm/i915/i915_irq.c
> index e43d809..46bb77c 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -919,6 +919,11 @@ static inline void intel_hpd_irq_handler(struct
> drm_device *dev,
>  	spin_lock(&dev_priv->irq_lock);
>  	for (i = 1; i < HPD_NUM_PINS; i++) {
>
> +		if (IS_G4X(dev) && (hpd[i] & hotplug_trigger) &&
> +		    dev_priv->hpd_stats[i].hpd_mark != HPD_ENABLED)
> +			DRM_DEBUG_KMS("Received HPD intterupt although disabled\n",
> +				      I915_READ(PORT_HOTPLUG_EN));
> +
>  		if (!(hpd[i] & hotplug_trigger) ||
>  		    dev_priv->hpd_stats[i].hpd_mark != HPD_ENABLED)
>  			continue;
> @@ -929,6 +934,8 @@ static inline void intel_hpd_irq_handler(struct
> drm_device *dev,
>  				   + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD))) {
>  			dev_priv->hpd_stats[i].hpd_last_jiffies = jiffies;
>  			dev_priv->hpd_stats[i].hpd_cnt = 0;
> +			DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - jiffies: %d\n", 
> i,
> +				      dev_priv->hpd_stats[i].hpd_last_jiffies);
>  		} else if (dev_priv->hpd_stats[i].hpd_cnt > HPD_STORM_THRESHOLD) {
>  			dev_priv->hpd_stats[i].hpd_mark = HPD_MARK_DISABLED;
>  			dev_priv->hpd_event_bits &= ~(1 << i);
> @@ -936,6 +943,8 @@ static inline void intel_hpd_irq_handler(struct
> drm_device *dev,
>  			storm_detected = true;
>  		} else {
>  			dev_priv->hpd_stats[i].hpd_cnt++;
> +			DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: %d\n", i,
> +				      dev_priv->hpd_stats[i].hpd_cnt);
>  		}
>  	}
I applied this patch (but not the one you sent on Monday), recompiled 
and logged:
uncompressed (8,2M) http://files.hz6.de/kern_20130724.log
bz2 (288 KB) http://files.hz6.de/kern_20130724.log.bz2

Hope this helps,
Regards
jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-24 20:52             ` Jan Niggemann
@ 2013-07-25  7:50               ` Egbert Eich
  2013-07-25  9:58                 ` Daniel Vetter
  0 siblings, 1 reply; 17+ messages in thread
From: Egbert Eich @ 2013-07-25  7:50 UTC (permalink / raw)
  To: Jan Niggemann; +Cc: Egbert Eich, intel-gfx


Hi Jan!

Jan Niggemann writes:
 > Hi Egbert,

[...]

Thanks for generating the logs! Hope you had a nice birthday dinner :)

 > 
 > I applied this patch (but not the one you sent on Monday), recompiled 
 > and logged:
 > uncompressed (8,2M) http://files.hz6.de/kern_20130724.log
 > bz2 (288 KB) http://files.hz6.de/kern_20130724.log.bz2

These logs show clearly that we are seeing interrupts which
should be disabled.
Could it be that we we have either the status or enable bits
mixed up? Unfortunately the publically available docs for GEN4
don't show the HOTPLUG_EN and HOTPLUG_STAT registers.

Daniel, could you please help me out here?

Thanks a lot!

Cheers,
	Egbert.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-25  7:50               ` Egbert Eich
@ 2013-07-25  9:58                 ` Daniel Vetter
  2013-07-25 11:18                   ` Egbert Eich
  2013-07-25 21:23                   ` Jan Niggemann
  0 siblings, 2 replies; 17+ messages in thread
From: Daniel Vetter @ 2013-07-25  9:58 UTC (permalink / raw)
  To: Egbert Eich; +Cc: intel-gfx, Jan Niggemann

On Thu, Jul 25, 2013 at 9:50 AM, Egbert Eich <eich@suse.com> wrote:
>  > I applied this patch (but not the one you sent on Monday), recompiled
>  > and logged:
>  > uncompressed (8,2M) http://files.hz6.de/kern_20130724.log
>  > bz2 (288 KB) http://files.hz6.de/kern_20130724.log.bz2
>
> These logs show clearly that we are seeing interrupts which
> should be disabled.
> Could it be that we we have either the status or enable bits
> mixed up? Unfortunately the publically available docs for GEN4
> don't show the HOTPLUG_EN and HOTPLUG_STAT registers.
>
> Daniel, could you please help me out here?

I've checked the docs again for gm45 and we seem to have the right
values. But on early gen4 (i.e. i965g/gm) the Bspec has been wrong
about these, so I wouldn't be surprised at all if they're wrong for
the digital ports on gm45, too.

Iirc we've already had a case like that, but there was no real
conclusion (and atm I can't find the bug). Can you pls try the below
patch (on top of Egbert's debug stuff)?

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 6caa748..2d4c884 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1925,9 +1925,9 @@

 #define PORT_HOTPLUG_STAT      (dev_priv->info->display_mmio_offset + 0x61114)
 /* HDMI/DP bits are gen4+ */
-#define   PORTB_HOTPLUG_LIVE_STATUS               (1 << 29)
+#define   PORTD_HOTPLUG_LIVE_STATUS               (1 << 29)
 #define   PORTC_HOTPLUG_LIVE_STATUS               (1 << 28)
-#define   PORTD_HOTPLUG_LIVE_STATUS               (1 << 27)
+#define   PORTB_HOTPLUG_LIVE_STATUS               (1 << 27)
 #define   PORTD_HOTPLUG_INT_STATUS             (3 << 21)
 #define   PORTC_HOTPLUG_INT_STATUS             (3 << 19)
 #define   PORTB_HOTPLUG_INT_STATUS             (3 << 17)


Egbert I think your debug patch has fairly useful information for
debugging the storm code in general, can you please submit a patch
against drm-intel-nightly?

Thanks, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-25  9:58                 ` Daniel Vetter
@ 2013-07-25 11:18                   ` Egbert Eich
  2013-07-25 21:23                   ` Jan Niggemann
  1 sibling, 0 replies; 17+ messages in thread
From: Egbert Eich @ 2013-07-25 11:18 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Egbert Eich, intel-gfx, Jan Niggemann

Daniel Vetter writes:
 > 
 > I've checked the docs again for gm45 and we seem to have the right
 > values. But on early gen4 (i.e. i965g/gm) the Bspec has been wrong
 > about these, so I wouldn't be surprised at all if they're wrong for
 > the digital ports on gm45, too.
 > 
 > Iirc we've already had a case like that, but there was no real
 > conclusion (and atm I can't find the bug). Can you pls try the below
 > patch (on top of Egbert's debug stuff)?
 > 
If this doesn't help, lets disable all the digital ports together for
testing.

 > 
 > 
 > Egbert I think your debug patch has fairly useful information for
 > debugging the storm code in general, can you please submit a patch
 > against drm-intel-nightly?
 > 

Will do :)

Cheers,
	Egbert.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-25  9:58                 ` Daniel Vetter
  2013-07-25 11:18                   ` Egbert Eich
@ 2013-07-25 21:23                   ` Jan Niggemann
  2013-07-26  5:11                     ` Egbert Eich
  1 sibling, 1 reply; 17+ messages in thread
From: Jan Niggemann @ 2013-07-25 21:23 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Egbert Eich, intel-gfx

Hi Daniel,

Am 25.07.2013 11:58, schrieb Daniel Vetter:
> Can you pls try the below
> patch (on top of Egbert's debug stuff)?
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h
> b/drivers/gpu/drm/i915/i915_reg.h
> index 6caa748..2d4c884 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -1925,9 +1925,9 @@
>
>  #define PORT_HOTPLUG_STAT      (dev_priv->info->display_mmio_offset
> + 0x61114)
>  /* HDMI/DP bits are gen4+ */
> -#define   PORTB_HOTPLUG_LIVE_STATUS               (1 << 29)
> +#define   PORTD_HOTPLUG_LIVE_STATUS               (1 << 29)
>  #define   PORTC_HOTPLUG_LIVE_STATUS               (1 << 28)
> -#define   PORTD_HOTPLUG_LIVE_STATUS               (1 << 27)
> +#define   PORTB_HOTPLUG_LIVE_STATUS               (1 << 27)
>  #define   PORTD_HOTPLUG_INT_STATUS             (3 << 21)
>  #define   PORTC_HOTPLUG_INT_STATUS             (3 << 19)
>  #define   PORTB_HOTPLUG_INT_STATUS             (3 << 17)
I did, here are the logs:
364K http://files.hz6.de/kern_20130724_2.log
I don't understand what exactly this patch does, but I noticed:
- much less drm debug info, resulting in a much smaller log
- no more noticeable lag on my system (even though a storm was 
detected).
I double checked the latter and the lag seems indeed to be gone...

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: i915 irq storm mitigation in 3.10
  2013-07-25 21:23                   ` Jan Niggemann
@ 2013-07-26  5:11                     ` Egbert Eich
  0 siblings, 0 replies; 17+ messages in thread
From: Egbert Eich @ 2013-07-26  5:11 UTC (permalink / raw)
  To: Jan Niggemann; +Cc: Egbert Eich, intel-gfx

Jan Niggemann writes:
 > Hi Daniel,
 > 
 > Am 25.07.2013 11:58, schrieb Daniel Vetter:
 > > Can you pls try the below
 > > patch (on top of Egbert's debug stuff)?
 > >
 > > diff --git a/drivers/gpu/drm/i915/i915_reg.h
 > > b/drivers/gpu/drm/i915/i915_reg.h
 > > index 6caa748..2d4c884 100644
 > > --- a/drivers/gpu/drm/i915/i915_reg.h
 > > +++ b/drivers/gpu/drm/i915/i915_reg.h
 > > @@ -1925,9 +1925,9 @@
 > >
 > >  #define PORT_HOTPLUG_STAT      (dev_priv->info->display_mmio_offset
 > > + 0x61114)
 > >  /* HDMI/DP bits are gen4+ */
 > > -#define   PORTB_HOTPLUG_LIVE_STATUS               (1 << 29)
 > > +#define   PORTD_HOTPLUG_LIVE_STATUS               (1 << 29)
 > >  #define   PORTC_HOTPLUG_LIVE_STATUS               (1 << 28)
 > > -#define   PORTD_HOTPLUG_LIVE_STATUS               (1 << 27)
 > > +#define   PORTB_HOTPLUG_LIVE_STATUS               (1 << 27)
 > >  #define   PORTD_HOTPLUG_INT_STATUS             (3 << 21)
 > >  #define   PORTC_HOTPLUG_INT_STATUS             (3 << 19)
 > >  #define   PORTB_HOTPLUG_INT_STATUS             (3 << 17)
 > I did, here are the logs:
 > 364K http://files.hz6.de/kern_20130724_2.log

I get a 'forbidden' when I try to access this.

 > I don't understand what exactly this patch does, but I noticed:
 > - much less drm debug info, resulting in a much smaller log
 > - no more noticeable lag on my system (even though a storm was 
 > detected).

Ok, so this indicates that the storm detection works now :)

 > I double checked the latter and the lag seems indeed to be gone...

Before we actually masked out individual events it was irrelevant
if the bits were mixed up. Now we need to be correct on these.

Jan, thanks for your help!

Cheers,
	Egbert.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-07-26  5:11 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-06 21:41 i915 irq storm mitigation in 3.10 Jan Niggemann
2013-07-08 20:03 ` Daniel Vetter
2013-07-21 20:23   ` Jan Niggemann
2013-07-21 20:43     ` Daniel Vetter
2013-07-22  6:14       ` Jan Niggemann
2013-07-22  6:26         ` Daniel Vetter
2013-07-22  8:04       ` Egbert Eich
2013-07-22 19:28         ` Jan Niggemann
2013-07-22 20:36           ` Egbert Eich
2013-07-23  8:17             ` Daniel Vetter
2013-07-23 11:26           ` Egbert Eich
2013-07-24 20:52             ` Jan Niggemann
2013-07-25  7:50               ` Egbert Eich
2013-07-25  9:58                 ` Daniel Vetter
2013-07-25 11:18                   ` Egbert Eich
2013-07-25 21:23                   ` Jan Niggemann
2013-07-26  5:11                     ` Egbert Eich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.