linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] drm/i915: HPD IRQ storm detection fixes
@ 2018-10-25 22:26 Lyude Paul
  2018-10-25 22:26 ` [PATCH 1/2] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST Lyude Paul
  2018-10-25 22:26 ` [PATCH 2/2] drm/i915: Add short HPD IRQ storm detection for non-MST systems Lyude Paul
  0 siblings, 2 replies; 6+ messages in thread
From: Lyude Paul @ 2018-10-25 22:26 UTC (permalink / raw)
  To: intel-gfx
  Cc: David Airlie, Rodrigo Vivi, linux-kernel, dri-devel, Jani Nikula,
	Joonas Lahtinen

---- IMPORTANT -----
As a note: I have not had the customer who this second patch is for test
this yet, I'm sending this ahead of time to make sure this is something
that isn't too crazy for upstream to accept. I'm not planning on pushing
this after review until I've verified this actually fixes their
problems.
--------------------

This series contains a fix for a problem which is very difficult to
reproduce under normal circumstances without specialized testing
hardware, along with a fix that seems to be required for some especially
rebellious GM45 laptops.

Lyude Paul (2):
  drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST
  drm/i915: Add short HPD IRQ storm detection for non-MST systems

 drivers/gpu/drm/i915/i915_debugfs.c  | 89 +++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_drv.h      | 15 +++--
 drivers/gpu/drm/i915/i915_irq.c      | 14 ++++-
 drivers/gpu/drm/i915/intel_hotplug.c | 87 ++++++++++++++++-----------
 4 files changed, 152 insertions(+), 53 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST
  2018-10-25 22:26 [PATCH 0/2] drm/i915: HPD IRQ storm detection fixes Lyude Paul
@ 2018-10-25 22:26 ` Lyude Paul
  2018-10-26 17:38   ` Ville Syrjälä
  2018-10-25 22:26 ` [PATCH 2/2] drm/i915: Add short HPD IRQ storm detection for non-MST systems Lyude Paul
  1 sibling, 1 reply; 6+ messages in thread
From: Lyude Paul @ 2018-10-25 22:26 UTC (permalink / raw)
  To: intel-gfx
  Cc: Ville Syrjälä,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, David Airlie,
	dri-devel, linux-kernel

Turns out that if you trigger an HPD storm on a system that has an MST
topology connected to it, you'll end up causing the kernel to eventually
hit a NULL deref:

[  332.339041] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ec
[  332.340906] PGD 0 P4D 0
[  332.342750] Oops: 0000 [#1] SMP PTI
[  332.344579] CPU: 2 PID: 25 Comm: kworker/2:0 Kdump: loaded Tainted: G           O      4.18.0-rc3short-hpd-storm+ #2
[  332.346453] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
[  332.348361] Workqueue: events intel_hpd_irq_storm_reenable_work [i915]
[  332.350301] RIP: 0010:intel_hpd_irq_storm_reenable_work.cold.3+0x2f/0x86 [i915]
[  332.352213] Code: 00 00 ba e8 00 00 00 48 c7 c6 c0 aa 5f a0 48 c7 c7 d0 73 62 a0 4c 89 c1 4c 89 04 24 e8 7f f5 af e0 4c 8b 04 24 44 89 f8 29 e8 <41> 39 80 ec 00 00 00 0f 85 43 13 fc ff 41 0f b6 86 b8 04 00 00 41
[  332.354286] RSP: 0018:ffffc90000147e48 EFLAGS: 00010006
[  332.356344] RAX: 0000000000000005 RBX: ffff8802c226c9d4 RCX: 0000000000000006
[  332.358404] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88032dc95570
[  332.360466] RBP: 0000000000000005 R08: 0000000000000000 R09: ffff88031b3dc840
[  332.362528] R10: 0000000000000000 R11: 000000031a069602 R12: ffff8802c226ca20
[  332.364575] R13: ffff8802c2268000 R14: ffff880310661000 R15: 000000000000000a
[  332.366615] FS:  0000000000000000(0000) GS:ffff88032dc80000(0000) knlGS:0000000000000000
[  332.368658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  332.370690] CR2: 00000000000000ec CR3: 000000000200a003 CR4: 00000000003606e0
[  332.372724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  332.374773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  332.376798] Call Trace:
[  332.378809]  process_one_work+0x1a1/0x350
[  332.380806]  worker_thread+0x30/0x380
[  332.382777]  ? wq_update_unbound_numa+0x10/0x10
[  332.384772]  kthread+0x112/0x130
[  332.386740]  ? kthread_create_worker_on_cpu+0x70/0x70
[  332.388706]  ret_from_fork+0x35/0x40
[  332.390651] Modules linked in: i915(O) vfat fat joydev btusb btrtl btbcm btintel bluetooth ecdh_generic iTCO_wdt wmi_bmof i2c_algo_bit drm_kms_helper intel_rapl syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt coretemp fb_sys_fops crc32_pclmul drm psmouse pcspkr mei_me mei i2c_i801 lpc_ich mfd_core i2c_core tpm_tis tpm_tis_core thinkpad_acpi wmi tpm rfkill video crc32c_intel serio_raw ehci_pci xhci_pci ehci_hcd xhci_hcd [last unloaded: i915]
[  332.394963] CR2: 00000000000000ec

This appears to be due to the fact that with an MST topology, not all
intel_connector structs will have ->encoder set. So, fix this by
skipping connectors without encoders in
intel_hpd_irq_storm_reenable_work().

For those wondering, this bug was found on accident while simulating HPD
storms using a Chamelium connected to a ThinkPad T450s (Broadwell).

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
index 648a13c6043c..7d21aac10d16 100644
--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -228,7 +228,8 @@ static void intel_hpd_irq_storm_reenable_work(struct work_struct *work)
 		drm_for_each_connector_iter(connector, &conn_iter) {
 			struct intel_connector *intel_connector = to_intel_connector(connector);
 
-			if (intel_connector->encoder->hpd_pin == pin) {
+			if (intel_connector->encoder &&
+			    intel_connector->encoder->hpd_pin == pin) {
 				if (connector->polled != intel_connector->polled)
 					DRM_DEBUG_DRIVER("Reenabling HPD on connector %s\n",
 							 connector->name);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] drm/i915: Add short HPD IRQ storm detection for non-MST systems
  2018-10-25 22:26 [PATCH 0/2] drm/i915: HPD IRQ storm detection fixes Lyude Paul
  2018-10-25 22:26 ` [PATCH 1/2] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST Lyude Paul
@ 2018-10-25 22:26 ` Lyude Paul
  2018-10-26 17:52   ` Ville Syrjälä
  1 sibling, 1 reply; 6+ messages in thread
From: Lyude Paul @ 2018-10-25 22:26 UTC (permalink / raw)
  To: intel-gfx
  Cc: Ville Syrjälä,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, David Airlie,
	dri-devel, linux-kernel

Unfortunately, it seems that the HPD IRQ storm problem from the early
days of Intel GPUs was never entirely solved, only mostly. Within the
last couple of days, I got a bug report from one of our customers who
had been having issues with their machine suddenly booting up very
slowly after having updated. The amount of time it took to boot went
from around 30 seconds, to over 6 minutes consistently.

After some investigation, I discovered that i915 was reporting massive
amounts of short HPD IRQ spam on this system from the DisplayPort port,
despite there not being anything actually connected. The symptoms would
start with one "long" HPD IRQ being detected at boot:

[    1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00440000, dig 0x00440000, pins 0x000000a0
[    1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long
[    1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 5 - cnt: 0
[    1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long
[    1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 7 - cnt: 0
[    1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long
[    1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long
…

followed by constant short IRQs afterwards:

[    1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status updated from unknown to disconnected
[    1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) received hotplug event.
[    1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3]
[    1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x71450085
[    1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080

The customer's system in question has a GM45 GPU, which is apparently
well known for hotplugging storms.

So, workaround this impressively broken hardware by detecting short IRQ
storms using a separate threshold and count, one that is much higher
then the threshold for long IRQs. We default to a threshold of 50 short
pulses within the timespan of a second, which should amount to about
100ms of constant pulsing. This should be a good middle ground between
avoiding detecting false IRQ storms, while still catching short IRQ
storms quickly enough to minimize the amount of time we'll stutter every
time hotplugging gets re-enabled and immediately disabled by another
short IRQ storm.

And just to be extra safe: we don't enable this by default on systems
with MST support. There's too high of a chance of MST support triggering
storm detection, and systems that are new enough to support MST are a
lot less likely to have issues with IRQ storms anyway.

As a note: this patch was tested using a ThinkPad T450s and a Chamelium
to simulate the short IRQ storms.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  | 89 +++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_drv.h      | 15 +++--
 drivers/gpu/drm/i915/i915_irq.c      | 14 ++++-
 drivers/gpu/drm/i915/intel_hotplug.c | 84 ++++++++++++++++----------
 4 files changed, 150 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b4744a68cd88..84e89fbd55fb 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4566,21 +4566,24 @@ static const struct file_operations i915_forcewake_fops = {
 	.release = i915_forcewake_release,
 };
 
-static int i915_hpd_storm_ctl_show(struct seq_file *m, void *data)
+static int i915_hpd_storm_show(struct seq_file *m, bool long_hpd)
 {
 	struct drm_i915_private *dev_priv = m->private;
 	struct i915_hotplug *hotplug = &dev_priv->hotplug;
+	const unsigned int threshold = long_hpd ?
+		hotplug->long_hpd_storm_threshold :
+		hotplug->short_hpd_storm_threshold;
 
-	seq_printf(m, "Threshold: %d\n", hotplug->hpd_storm_threshold);
+	seq_printf(m, "Threshold: %d\n", threshold);
 	seq_printf(m, "Detected: %s\n",
 		   yesno(delayed_work_pending(&hotplug->reenable_work)));
 
 	return 0;
 }
 
-static ssize_t i915_hpd_storm_ctl_write(struct file *file,
-					const char __user *ubuf, size_t len,
-					loff_t *offp)
+static ssize_t i915_hpd_storm_write(struct file *file,
+				    const char __user *ubuf, size_t len,
+				    loff_t *offp, bool long_hpd)
 {
 	struct seq_file *m = file->private_data;
 	struct drm_i915_private *dev_priv = m->private;
@@ -4589,6 +4592,7 @@ static ssize_t i915_hpd_storm_ctl_write(struct file *file,
 	int i;
 	char *newline;
 	char tmp[16];
+	const char *name = long_hpd ? "long" : "short";
 
 	if (len >= sizeof(tmp))
 		return -EINVAL;
@@ -4603,22 +4607,36 @@ static ssize_t i915_hpd_storm_ctl_write(struct file *file,
 	if (newline)
 		*newline = '\0';
 
-	if (strcmp(tmp, "reset") == 0)
-		new_threshold = HPD_STORM_DEFAULT_THRESHOLD;
-	else if (kstrtouint(tmp, 10, &new_threshold) != 0)
+	if (strcmp(tmp, "reset") == 0) {
+		if (long_hpd)
+			new_threshold = HPD_STORM_DEFAULT_THRESHOLD_LONG;
+		else if (!HAS_DP_MST(dev_priv))
+			new_threshold = HPD_STORM_DEFAULT_THRESHOLD_SHORT;
+		else
+			new_threshold = 0;
+	} else if (kstrtouint(tmp, 10, &new_threshold) != 0) {
 		return -EINVAL;
+	}
 
 	if (new_threshold > 0)
-		DRM_DEBUG_KMS("Setting HPD storm detection threshold to %d\n",
-			      new_threshold);
+		DRM_DEBUG_KMS("Setting %s HPD storm detection threshold to %d\n",
+			      name, new_threshold);
 	else
-		DRM_DEBUG_KMS("Disabling HPD storm detection\n");
+		DRM_DEBUG_KMS("Disabling %s HPD storm detection\n", name);
 
 	spin_lock_irq(&dev_priv->irq_lock);
-	hotplug->hpd_storm_threshold = new_threshold;
+	if (long_hpd)
+		hotplug->long_hpd_storm_threshold = new_threshold;
+	else
+		hotplug->short_hpd_storm_threshold = new_threshold;
+
 	/* Reset the HPD storm stats so we don't accidentally trigger a storm */
-	for_each_hpd_pin(i)
-		hotplug->stats[i].count = 0;
+	for_each_hpd_pin(i) {
+		if (long_hpd)
+			hotplug->stats[i].long_irq.count = 0;
+		else
+			hotplug->stats[i].short_irq.count = 0;
+	}
 	spin_unlock_irq(&dev_priv->irq_lock);
 
 	/* Re-enable hpd immediately if we were in an irq storm */
@@ -4627,6 +4645,18 @@ static ssize_t i915_hpd_storm_ctl_write(struct file *file,
 	return len;
 }
 
+static ssize_t
+i915_hpd_storm_ctl_write(struct file *file, const char __user *ubuf,
+			 size_t len, loff_t *offp)
+{
+	return i915_hpd_storm_write(file, ubuf, len, offp, true);
+}
+
+static int i915_hpd_storm_ctl_show(struct seq_file *m, void *data)
+{
+	return i915_hpd_storm_show(m, true);
+}
+
 static int i915_hpd_storm_ctl_open(struct inode *inode, struct file *file)
 {
 	return single_open(file, i915_hpd_storm_ctl_show, inode->i_private);
@@ -4638,7 +4668,35 @@ static const struct file_operations i915_hpd_storm_ctl_fops = {
 	.read = seq_read,
 	.llseek = seq_lseek,
 	.release = single_release,
-	.write = i915_hpd_storm_ctl_write
+	.write = i915_hpd_storm_ctl_write,
+};
+
+static ssize_t
+i915_short_hpd_storm_ctl_write(struct file *file, const char __user *ubuf,
+			       size_t len, loff_t *offp)
+{
+	return i915_hpd_storm_write(file, ubuf, len, offp, false);
+}
+
+static int i915_short_hpd_storm_ctl_show(struct seq_file *m, void *data)
+{
+	return i915_hpd_storm_show(m, false);
+}
+
+static int
+i915_short_hpd_storm_ctl_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, i915_short_hpd_storm_ctl_show,
+			   inode->i_private);
+}
+
+static const struct file_operations i915_short_hpd_storm_ctl_fops = {
+	.owner = THIS_MODULE,
+	.open = i915_short_hpd_storm_ctl_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+	.write = i915_short_hpd_storm_ctl_write,
 };
 
 static int i915_drrs_ctl_set(void *data, u64 val)
@@ -4818,6 +4876,7 @@ static const struct i915_debugfs_files {
 	{"i915_guc_log_level", &i915_guc_log_level_fops},
 	{"i915_guc_log_relay", &i915_guc_log_relay_fops},
 	{"i915_hpd_storm_ctl", &i915_hpd_storm_ctl_fops},
+	{"i915_short_hpd_storm_ctl", &i915_short_hpd_storm_ctl_fops},
 	{"i915_ipc_status", &i915_ipc_status_fops},
 	{"i915_drrs_ctl", &i915_drrs_ctl_fops},
 	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 808204a7ca7c..ef38baf50299 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -282,14 +282,20 @@ enum hpd_pin {
 #define for_each_hpd_pin(__pin) \
 	for ((__pin) = (HPD_NONE + 1); (__pin) < HPD_NUM_PINS; (__pin)++)
 
-#define HPD_STORM_DEFAULT_THRESHOLD 5
+#define HPD_STORM_DEFAULT_THRESHOLD_LONG	5
+#define HPD_STORM_DEFAULT_THRESHOLD_SHORT	50
+
+struct intel_hpd_irq_stats {
+	unsigned long last_jiffies;
+	int count;
+};
 
 struct i915_hotplug {
 	struct work_struct hotplug_work;
 
 	struct {
-		unsigned long last_jiffies;
-		int count;
+		struct intel_hpd_irq_stats long_irq;
+		struct intel_hpd_irq_stats short_irq;
 		enum {
 			HPD_ENABLED = 0,
 			HPD_DISABLED = 1,
@@ -306,7 +312,8 @@ struct i915_hotplug {
 	struct work_struct poll_init_work;
 	bool poll_enabled;
 
-	unsigned int hpd_storm_threshold;
+	unsigned int long_hpd_storm_threshold;
+	unsigned int short_hpd_storm_threshold;
 
 	/*
 	 * if we get a HPD irq from DP and a HPD irq from non-DP
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 10f28a2ee2e6..28a937f3e8e0 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -4842,7 +4842,19 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
 		dev_priv->display_irqs_enabled = false;
 
-	dev_priv->hotplug.hpd_storm_threshold = HPD_STORM_DEFAULT_THRESHOLD;
+	dev_priv->hotplug.long_hpd_storm_threshold =
+		HPD_STORM_DEFAULT_THRESHOLD_LONG;
+
+	/* If we have MST support, we want to avoid doing short HPD IRQ storm
+	 * detection, as short HPD storms will occur as a natural part of
+	 * sideband messaging with MST.
+	 * On older platforms however, IRQ storms can occur with both long and
+	 * short pulses, as seen on some G4x systems.
+	 */
+	if (!HAS_DP_MST(dev_priv)) {
+		dev_priv->hotplug.short_hpd_storm_threshold =
+			HPD_STORM_DEFAULT_THRESHOLD_SHORT;
+	}
 
 	dev->driver->get_vblank_timestamp = drm_calc_vbltimestamp_from_scanoutpos;
 	dev->driver->get_scanout_position = i915_get_crtc_scanoutpos;
diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
index 7d21aac10d16..74798c920909 100644
--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -118,41 +118,64 @@ enum hpd_pin intel_hpd_pin_default(struct drm_i915_private *dev_priv,
  * @dev_priv: private driver data pointer
  * @pin: the pin to gather stats on
  *
- * Gather stats about HPD irqs from the specified @pin, and detect irq
+ * Gather stats about HPD IRQs from the specified @pin, and detect IRQ
  * storms. Only the pin specific stats and state are changed, the caller is
  * responsible for further action.
  *
- * The number of irqs that are allowed within @HPD_STORM_DETECT_PERIOD is
- * stored in @dev_priv->hotplug.hpd_storm_threshold which defaults to
- * @HPD_STORM_DEFAULT_THRESHOLD. If this threshold is exceeded, it's
- * considered an irq storm and the irq state is set to @HPD_MARK_DISABLED.
+ * IRQ storm detection can happen for both long and short IRQs. The number of
+ * short or long IRQs that are allowed within @HPD_STORM_DETECT_PERIOD are
+ * stored in @dev_priv->hotplug.long_hpd_storm_threshold and
+ * @dev_priv->hotplug.short_hpd_storm_threshold respectively. For long IRQs,
+ * the threshold defaults to @HPD_STORM_DEFAULT_THRESHOLD_LONG on all systems.
+ * For short IRQs however, the default threshold varies. On systems that are
+ * too old to support MST, the threshold will default to
+ * @HPD_STORM_DEFAULT_THRESHOLD_SHORT. Systems which do support MST will not
+ * have short IRQ storm detection enabled by default, as doing so would likely
+ * interfere with the short IRQ bursts caused by sideband transactions.
+ * This is fine, as systems new enough to support MST are unlikely to rely on
+ * HPD storm protection as much as older systems may.
  *
- * The HPD threshold can be controlled through i915_hpd_storm_ctl in debugfs,
- * and should only be adjusted for automated hotplug testing.
+ * The HPD threshold can be controlled through i915_hpd_storm_ctl in debugfs
+ * for long IRQs, and i915_short_hpd_storm_ctl for short IRQs. This should
+ * only ever be adjusted for automated hotplug testing.
  *
  * Return true if an irq storm was detected on @pin.
  */
 static bool intel_hpd_irq_storm_detect(struct drm_i915_private *dev_priv,
-				       enum hpd_pin pin)
+				       enum hpd_pin pin, bool long_hpd)
 {
-	unsigned long start = dev_priv->hotplug.stats[pin].last_jiffies;
-	unsigned long end = start + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD);
-	const int threshold = dev_priv->hotplug.hpd_storm_threshold;
+	struct intel_hpd_irq_stats *stats;
+	const char *name;
+	unsigned long start, end;
+	int threshold;
 	bool storm = false;
 
+	if (long_hpd) {
+		stats = &dev_priv->hotplug.stats[pin].long_irq;
+		threshold = dev_priv->hotplug.long_hpd_storm_threshold;
+		name = "long";
+	} else {
+		stats = &dev_priv->hotplug.stats[pin].short_irq;
+		threshold = dev_priv->hotplug.short_hpd_storm_threshold;
+		name = "short";
+	}
+	start = stats->last_jiffies;
+	end = start + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD);
+
 	if (!time_in_range(jiffies, start, end)) {
-		dev_priv->hotplug.stats[pin].last_jiffies = jiffies;
-		dev_priv->hotplug.stats[pin].count = 0;
-		DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: 0\n", pin);
-	} else if (dev_priv->hotplug.stats[pin].count > threshold &&
-		   threshold) {
+		stats->last_jiffies = jiffies;
+		stats->count = 0;
+		DRM_DEBUG_KMS("Received %s HPD interrupt on PIN %d - cnt: 0\n",
+			      name, pin);
+	} else if (stats->count > threshold && threshold) {
 		dev_priv->hotplug.stats[pin].state = HPD_MARK_DISABLED;
-		DRM_DEBUG_KMS("HPD interrupt storm detected on PIN %d\n", pin);
+		DRM_DEBUG_KMS("Detected %s HPD interrupt storm on PIN %d\n",
+			      name, pin);
 		storm = true;
 	} else {
-		dev_priv->hotplug.stats[pin].count++;
-		DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: %d\n", pin,
-			      dev_priv->hotplug.stats[pin].count);
+		stats->count++;
+		DRM_DEBUG_KMS("Received %s HPD interrupt on PIN %d - cnt: %d\n",
+			      name, pin, stats->count);
 	}
 
 	return storm;
@@ -404,28 +427,24 @@ void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
 	for_each_intel_encoder(&dev_priv->drm, encoder) {
 		enum hpd_pin pin = encoder->hpd_pin;
 		bool has_hpd_pulse = intel_encoder_has_hpd_pulse(encoder);
+		bool long_hpd = true;
 
 		if (!(BIT(pin) & pin_mask))
 			continue;
 
 		if (has_hpd_pulse) {
-			bool long_hpd = long_mask & BIT(pin);
 			enum port port = encoder->port;
 
+			long_hpd = !!(long_mask & BIT(pin));
+
 			DRM_DEBUG_DRIVER("digital hpd port %c - %s\n", port_name(port),
 					 long_hpd ? "long" : "short");
-			/*
-			 * For long HPD pulses we want to have the digital queue happen,
-			 * but we still want HPD storm detection to function.
-			 */
 			queue_dig = true;
-			if (long_hpd) {
+			if (long_hpd)
 				dev_priv->hotplug.long_port_mask |= (1 << port);
-			} else {
-				/* for short HPD just trigger the digital queue */
+			else
 				dev_priv->hotplug.short_port_mask |= (1 << port);
-				continue;
-			}
+
 		}
 
 		if (dev_priv->hotplug.stats[pin].state == HPD_DISABLED) {
@@ -448,7 +467,7 @@ void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
 			queue_hp = true;
 		}
 
-		if (intel_hpd_irq_storm_detect(dev_priv, pin)) {
+		if (intel_hpd_irq_storm_detect(dev_priv, pin, long_hpd)) {
 			dev_priv->hotplug.event_bits &= ~BIT(pin);
 			storm_detected = true;
 		}
@@ -489,7 +508,8 @@ void intel_hpd_init(struct drm_i915_private *dev_priv)
 	int i;
 
 	for_each_hpd_pin(i) {
-		dev_priv->hotplug.stats[i].count = 0;
+		dev_priv->hotplug.stats[i].long_irq.count = 0;
+		dev_priv->hotplug.stats[i].short_irq.count = 0;
 		dev_priv->hotplug.stats[i].state = HPD_ENABLED;
 	}
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST
  2018-10-25 22:26 ` [PATCH 1/2] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST Lyude Paul
@ 2018-10-26 17:38   ` Ville Syrjälä
  0 siblings, 0 replies; 6+ messages in thread
From: Ville Syrjälä @ 2018-10-26 17:38 UTC (permalink / raw)
  To: Lyude Paul
  Cc: intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, dri-devel, linux-kernel

On Thu, Oct 25, 2018 at 06:26:56PM -0400, Lyude Paul wrote:
> Turns out that if you trigger an HPD storm on a system that has an MST
> topology connected to it, you'll end up causing the kernel to eventually
> hit a NULL deref:
> 
> [  332.339041] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ec
> [  332.340906] PGD 0 P4D 0
> [  332.342750] Oops: 0000 [#1] SMP PTI
> [  332.344579] CPU: 2 PID: 25 Comm: kworker/2:0 Kdump: loaded Tainted: G           O      4.18.0-rc3short-hpd-storm+ #2
> [  332.346453] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
> [  332.348361] Workqueue: events intel_hpd_irq_storm_reenable_work [i915]
> [  332.350301] RIP: 0010:intel_hpd_irq_storm_reenable_work.cold.3+0x2f/0x86 [i915]
> [  332.352213] Code: 00 00 ba e8 00 00 00 48 c7 c6 c0 aa 5f a0 48 c7 c7 d0 73 62 a0 4c 89 c1 4c 89 04 24 e8 7f f5 af e0 4c 8b 04 24 44 89 f8 29 e8 <41> 39 80 ec 00 00 00 0f 85 43 13 fc ff 41 0f b6 86 b8 04 00 00 41
> [  332.354286] RSP: 0018:ffffc90000147e48 EFLAGS: 00010006
> [  332.356344] RAX: 0000000000000005 RBX: ffff8802c226c9d4 RCX: 0000000000000006
> [  332.358404] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88032dc95570
> [  332.360466] RBP: 0000000000000005 R08: 0000000000000000 R09: ffff88031b3dc840
> [  332.362528] R10: 0000000000000000 R11: 000000031a069602 R12: ffff8802c226ca20
> [  332.364575] R13: ffff8802c2268000 R14: ffff880310661000 R15: 000000000000000a
> [  332.366615] FS:  0000000000000000(0000) GS:ffff88032dc80000(0000) knlGS:0000000000000000
> [  332.368658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  332.370690] CR2: 00000000000000ec CR3: 000000000200a003 CR4: 00000000003606e0
> [  332.372724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  332.374773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  332.376798] Call Trace:
> [  332.378809]  process_one_work+0x1a1/0x350
> [  332.380806]  worker_thread+0x30/0x380
> [  332.382777]  ? wq_update_unbound_numa+0x10/0x10
> [  332.384772]  kthread+0x112/0x130
> [  332.386740]  ? kthread_create_worker_on_cpu+0x70/0x70
> [  332.388706]  ret_from_fork+0x35/0x40
> [  332.390651] Modules linked in: i915(O) vfat fat joydev btusb btrtl btbcm btintel bluetooth ecdh_generic iTCO_wdt wmi_bmof i2c_algo_bit drm_kms_helper intel_rapl syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt coretemp fb_sys_fops crc32_pclmul drm psmouse pcspkr mei_me mei i2c_i801 lpc_ich mfd_core i2c_core tpm_tis tpm_tis_core thinkpad_acpi wmi tpm rfkill video crc32c_intel serio_raw ehci_pci xhci_pci ehci_hcd xhci_hcd [last unloaded: i915]
> [  332.394963] CR2: 00000000000000ec
> 
> This appears to be due to the fact that with an MST topology, not all
> intel_connector structs will have ->encoder set. So, fix this by
> skipping connectors without encoders in
> intel_hpd_irq_storm_reenable_work().
> 
> For those wondering, this bug was found on accident while simulating HPD
> storms using a Chamelium connected to a ThinkPad T450s (Broadwell).
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_hotplug.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
> index 648a13c6043c..7d21aac10d16 100644
> --- a/drivers/gpu/drm/i915/intel_hotplug.c
> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
> @@ -228,7 +228,8 @@ static void intel_hpd_irq_storm_reenable_work(struct work_struct *work)
>  		drm_for_each_connector_iter(connector, &conn_iter) {
>  			struct intel_connector *intel_connector = to_intel_connector(connector);
>  
> -			if (intel_connector->encoder->hpd_pin == pin) {
> +			if (intel_connector->encoder &&
> +			    intel_connector->encoder->hpd_pin == pin) {

Hmm. Yeah, with MST that assignment happens at enable time. I guess we
currently don't clear that pointer ever, so this seems safe. If we start
to clear it this becomes racy.

Maybe it would be better to just skip MST connectors entirely? Ah,
i915_hpd_poll_init_work() already used ->mst_port for just that. So
probabloy better follow the same example here.

Hmm. That seems a bit racy as well though. We'd need to
reorder intel_dp_add_mst_connector() to assign ->mst_port
before connector_init() to prevent that race.

>  				if (connector->polled != intel_connector->polled)
>  					DRM_DEBUG_DRIVER("Reenabling HPD on connector %s\n",
>  							 connector->name);
> -- 
> 2.17.2

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] drm/i915: Add short HPD IRQ storm detection for non-MST systems
  2018-10-25 22:26 ` [PATCH 2/2] drm/i915: Add short HPD IRQ storm detection for non-MST systems Lyude Paul
@ 2018-10-26 17:52   ` Ville Syrjälä
  2018-10-26 17:57     ` Lyude Paul
  0 siblings, 1 reply; 6+ messages in thread
From: Ville Syrjälä @ 2018-10-26 17:52 UTC (permalink / raw)
  To: Lyude Paul
  Cc: intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, dri-devel, linux-kernel

On Thu, Oct 25, 2018 at 06:26:57PM -0400, Lyude Paul wrote:
> Unfortunately, it seems that the HPD IRQ storm problem from the early
> days of Intel GPUs was never entirely solved, only mostly. Within the
> last couple of days, I got a bug report from one of our customers who
> had been having issues with their machine suddenly booting up very
> slowly after having updated. The amount of time it took to boot went
> from around 30 seconds, to over 6 minutes consistently.
> 
> After some investigation, I discovered that i915 was reporting massive
> amounts of short HPD IRQ spam on this system from the DisplayPort port,
> despite there not being anything actually connected. The symptoms would
> start with one "long" HPD IRQ being detected at boot:
> 
> [    1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00440000, dig 0x00440000, pins 0x000000a0
> [    1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long
> [    1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 5 - cnt: 0
> [    1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long
> [    1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 7 - cnt: 0
> [    1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long
> [    1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long
> …
> 
> followed by constant short IRQs afterwards:
> 
> [    1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status updated from unknown to disconnected
> [    1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) received hotplug event.
> [    1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3]
> [    1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
> [    1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [    1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
> [    1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [    1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x71450085
> [    1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
> [    1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [    1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
> [    1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [    1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
> 
> The customer's system in question has a GM45 GPU, which is apparently
> well known for hotplugging storms.
> 
> So, workaround this impressively broken hardware by detecting short IRQ
> storms using a separate threshold and count, one that is much higher
> then the threshold for long IRQs. We default to a threshold of 50 short
> pulses within the timespan of a second, which should amount to about
> 100ms of constant pulsing. This should be a good middle ground between
> avoiding detecting false IRQ storms, while still catching short IRQ
> storms quickly enough to minimize the amount of time we'll stutter every
> time hotplugging gets re-enabled and immediately disabled by another
> short IRQ storm.
> 
> And just to be extra safe: we don't enable this by default on systems
> with MST support. There's too high of a chance of MST support triggering
> storm detection, and systems that are new enough to support MST are a
> lot less likely to have issues with IRQ storms anyway.
> 
> As a note: this patch was tested using a ThinkPad T450s and a Chamelium
> to simulate the short IRQ storms.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c  | 89 +++++++++++++++++++++++-----
>  drivers/gpu/drm/i915/i915_drv.h      | 15 +++--
>  drivers/gpu/drm/i915/i915_irq.c      | 14 ++++-
>  drivers/gpu/drm/i915/intel_hotplug.c | 84 ++++++++++++++++----------
>  4 files changed, 150 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index b4744a68cd88..84e89fbd55fb 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4566,21 +4566,24 @@ static const struct file_operations i915_forcewake_fops = {
>  	.release = i915_forcewake_release,
>  };
>  
> -static int i915_hpd_storm_ctl_show(struct seq_file *m, void *data)
> +static int i915_hpd_storm_show(struct seq_file *m, bool long_hpd)
>  {
>  	struct drm_i915_private *dev_priv = m->private;
>  	struct i915_hotplug *hotplug = &dev_priv->hotplug;
> +	const unsigned int threshold = long_hpd ?
> +		hotplug->long_hpd_storm_threshold :
> +		hotplug->short_hpd_storm_threshold;
>  
> -	seq_printf(m, "Threshold: %d\n", hotplug->hpd_storm_threshold);
> +	seq_printf(m, "Threshold: %d\n", threshold);
>  	seq_printf(m, "Detected: %s\n",
>  		   yesno(delayed_work_pending(&hotplug->reenable_work)));
>  
>  	return 0;
>  }
>  
> -static ssize_t i915_hpd_storm_ctl_write(struct file *file,
> -					const char __user *ubuf, size_t len,
> -					loff_t *offp)
> +static ssize_t i915_hpd_storm_write(struct file *file,
> +				    const char __user *ubuf, size_t len,
> +				    loff_t *offp, bool long_hpd)
>  {
>  	struct seq_file *m = file->private_data;
>  	struct drm_i915_private *dev_priv = m->private;
> @@ -4589,6 +4592,7 @@ static ssize_t i915_hpd_storm_ctl_write(struct file *file,
>  	int i;
>  	char *newline;
>  	char tmp[16];
> +	const char *name = long_hpd ? "long" : "short";
>  
>  	if (len >= sizeof(tmp))
>  		return -EINVAL;
> @@ -4603,22 +4607,36 @@ static ssize_t i915_hpd_storm_ctl_write(struct file *file,
>  	if (newline)
>  		*newline = '\0';
>  
> -	if (strcmp(tmp, "reset") == 0)
> -		new_threshold = HPD_STORM_DEFAULT_THRESHOLD;
> -	else if (kstrtouint(tmp, 10, &new_threshold) != 0)
> +	if (strcmp(tmp, "reset") == 0) {
> +		if (long_hpd)
> +			new_threshold = HPD_STORM_DEFAULT_THRESHOLD_LONG;
> +		else if (!HAS_DP_MST(dev_priv))
> +			new_threshold = HPD_STORM_DEFAULT_THRESHOLD_SHORT;
> +		else
> +			new_threshold = 0;
> +	} else if (kstrtouint(tmp, 10, &new_threshold) != 0) {
>  		return -EINVAL;
> +	}
>  
>  	if (new_threshold > 0)
> -		DRM_DEBUG_KMS("Setting HPD storm detection threshold to %d\n",
> -			      new_threshold);
> +		DRM_DEBUG_KMS("Setting %s HPD storm detection threshold to %d\n",
> +			      name, new_threshold);
>  	else
> -		DRM_DEBUG_KMS("Disabling HPD storm detection\n");
> +		DRM_DEBUG_KMS("Disabling %s HPD storm detection\n", name);
>  
>  	spin_lock_irq(&dev_priv->irq_lock);
> -	hotplug->hpd_storm_threshold = new_threshold;
> +	if (long_hpd)
> +		hotplug->long_hpd_storm_threshold = new_threshold;
> +	else
> +		hotplug->short_hpd_storm_threshold = new_threshold;
> +
>  	/* Reset the HPD storm stats so we don't accidentally trigger a storm */
> -	for_each_hpd_pin(i)
> -		hotplug->stats[i].count = 0;
> +	for_each_hpd_pin(i) {
> +		if (long_hpd)
> +			hotplug->stats[i].long_irq.count = 0;
> +		else
> +			hotplug->stats[i].short_irq.count = 0;
> +	}
>  	spin_unlock_irq(&dev_priv->irq_lock);
>  
>  	/* Re-enable hpd immediately if we were in an irq storm */
> @@ -4627,6 +4645,18 @@ static ssize_t i915_hpd_storm_ctl_write(struct file *file,
>  	return len;
>  }
>  
> +static ssize_t
> +i915_hpd_storm_ctl_write(struct file *file, const char __user *ubuf,
> +			 size_t len, loff_t *offp)
> +{
> +	return i915_hpd_storm_write(file, ubuf, len, offp, true);
> +}
> +
> +static int i915_hpd_storm_ctl_show(struct seq_file *m, void *data)
> +{
> +	return i915_hpd_storm_show(m, true);
> +}
> +
>  static int i915_hpd_storm_ctl_open(struct inode *inode, struct file *file)
>  {
>  	return single_open(file, i915_hpd_storm_ctl_show, inode->i_private);
> @@ -4638,7 +4668,35 @@ static const struct file_operations i915_hpd_storm_ctl_fops = {
>  	.read = seq_read,
>  	.llseek = seq_lseek,
>  	.release = single_release,
> -	.write = i915_hpd_storm_ctl_write
> +	.write = i915_hpd_storm_ctl_write,
> +};
> +
> +static ssize_t
> +i915_short_hpd_storm_ctl_write(struct file *file, const char __user *ubuf,
> +			       size_t len, loff_t *offp)
> +{
> +	return i915_hpd_storm_write(file, ubuf, len, offp, false);
> +}
> +
> +static int i915_short_hpd_storm_ctl_show(struct seq_file *m, void *data)
> +{
> +	return i915_hpd_storm_show(m, false);
> +}
> +
> +static int
> +i915_short_hpd_storm_ctl_open(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, i915_short_hpd_storm_ctl_show,
> +			   inode->i_private);
> +}
> +
> +static const struct file_operations i915_short_hpd_storm_ctl_fops = {
> +	.owner = THIS_MODULE,
> +	.open = i915_short_hpd_storm_ctl_open,
> +	.read = seq_read,
> +	.llseek = seq_lseek,
> +	.release = single_release,
> +	.write = i915_short_hpd_storm_ctl_write,
>  };
>  
>  static int i915_drrs_ctl_set(void *data, u64 val)
> @@ -4818,6 +4876,7 @@ static const struct i915_debugfs_files {
>  	{"i915_guc_log_level", &i915_guc_log_level_fops},
>  	{"i915_guc_log_relay", &i915_guc_log_relay_fops},
>  	{"i915_hpd_storm_ctl", &i915_hpd_storm_ctl_fops},
> +	{"i915_short_hpd_storm_ctl", &i915_short_hpd_storm_ctl_fops},
>  	{"i915_ipc_status", &i915_ipc_status_fops},
>  	{"i915_drrs_ctl", &i915_drrs_ctl_fops},
>  	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops}
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 808204a7ca7c..ef38baf50299 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -282,14 +282,20 @@ enum hpd_pin {
>  #define for_each_hpd_pin(__pin) \
>  	for ((__pin) = (HPD_NONE + 1); (__pin) < HPD_NUM_PINS; (__pin)++)
>  
> -#define HPD_STORM_DEFAULT_THRESHOLD 5
> +#define HPD_STORM_DEFAULT_THRESHOLD_LONG	5
> +#define HPD_STORM_DEFAULT_THRESHOLD_SHORT	50

Hmm. Maybe we could stick to the one counter and make each
long hpd contribute 10 and each short hpd contribute 1
(or whatever seems appropriate)?

> +
> +struct intel_hpd_irq_stats {
> +	unsigned long last_jiffies;
> +	int count;
> +};
>  
>  struct i915_hotplug {
>  	struct work_struct hotplug_work;
>  
>  	struct {
> -		unsigned long last_jiffies;
> -		int count;
> +		struct intel_hpd_irq_stats long_irq;
> +		struct intel_hpd_irq_stats short_irq;
>  		enum {
>  			HPD_ENABLED = 0,
>  			HPD_DISABLED = 1,
> @@ -306,7 +312,8 @@ struct i915_hotplug {
>  	struct work_struct poll_init_work;
>  	bool poll_enabled;
>  
> -	unsigned int hpd_storm_threshold;
> +	unsigned int long_hpd_storm_threshold;
> +	unsigned int short_hpd_storm_threshold;
>  
>  	/*
>  	 * if we get a HPD irq from DP and a HPD irq from non-DP
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 10f28a2ee2e6..28a937f3e8e0 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -4842,7 +4842,19 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
>  	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
>  		dev_priv->display_irqs_enabled = false;
>  
> -	dev_priv->hotplug.hpd_storm_threshold = HPD_STORM_DEFAULT_THRESHOLD;
> +	dev_priv->hotplug.long_hpd_storm_threshold =
> +		HPD_STORM_DEFAULT_THRESHOLD_LONG;
> +
> +	/* If we have MST support, we want to avoid doing short HPD IRQ storm
> +	 * detection, as short HPD storms will occur as a natural part of
> +	 * sideband messaging with MST.
> +	 * On older platforms however, IRQ storms can occur with both long and
> +	 * short pulses, as seen on some G4x systems.
> +	 */
> +	if (!HAS_DP_MST(dev_priv)) {
> +		dev_priv->hotplug.short_hpd_storm_threshold =
> +			HPD_STORM_DEFAULT_THRESHOLD_SHORT;
> +	}
>  
>  	dev->driver->get_vblank_timestamp = drm_calc_vbltimestamp_from_scanoutpos;
>  	dev->driver->get_scanout_position = i915_get_crtc_scanoutpos;
> diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
> index 7d21aac10d16..74798c920909 100644
> --- a/drivers/gpu/drm/i915/intel_hotplug.c
> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
> @@ -118,41 +118,64 @@ enum hpd_pin intel_hpd_pin_default(struct drm_i915_private *dev_priv,
>   * @dev_priv: private driver data pointer
>   * @pin: the pin to gather stats on
>   *
> - * Gather stats about HPD irqs from the specified @pin, and detect irq
> + * Gather stats about HPD IRQs from the specified @pin, and detect IRQ
>   * storms. Only the pin specific stats and state are changed, the caller is
>   * responsible for further action.
>   *
> - * The number of irqs that are allowed within @HPD_STORM_DETECT_PERIOD is
> - * stored in @dev_priv->hotplug.hpd_storm_threshold which defaults to
> - * @HPD_STORM_DEFAULT_THRESHOLD. If this threshold is exceeded, it's
> - * considered an irq storm and the irq state is set to @HPD_MARK_DISABLED.
> + * IRQ storm detection can happen for both long and short IRQs. The number of
> + * short or long IRQs that are allowed within @HPD_STORM_DETECT_PERIOD are
> + * stored in @dev_priv->hotplug.long_hpd_storm_threshold and
> + * @dev_priv->hotplug.short_hpd_storm_threshold respectively. For long IRQs,
> + * the threshold defaults to @HPD_STORM_DEFAULT_THRESHOLD_LONG on all systems.
> + * For short IRQs however, the default threshold varies. On systems that are
> + * too old to support MST, the threshold will default to
> + * @HPD_STORM_DEFAULT_THRESHOLD_SHORT. Systems which do support MST will not
> + * have short IRQ storm detection enabled by default, as doing so would likely
> + * interfere with the short IRQ bursts caused by sideband transactions.
> + * This is fine, as systems new enough to support MST are unlikely to rely on
> + * HPD storm protection as much as older systems may.
>   *
> - * The HPD threshold can be controlled through i915_hpd_storm_ctl in debugfs,
> - * and should only be adjusted for automated hotplug testing.
> + * The HPD threshold can be controlled through i915_hpd_storm_ctl in debugfs
> + * for long IRQs, and i915_short_hpd_storm_ctl for short IRQs. This should
> + * only ever be adjusted for automated hotplug testing.
>   *
>   * Return true if an irq storm was detected on @pin.
>   */
>  static bool intel_hpd_irq_storm_detect(struct drm_i915_private *dev_priv,
> -				       enum hpd_pin pin)
> +				       enum hpd_pin pin, bool long_hpd)
>  {
> -	unsigned long start = dev_priv->hotplug.stats[pin].last_jiffies;
> -	unsigned long end = start + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD);
> -	const int threshold = dev_priv->hotplug.hpd_storm_threshold;
> +	struct intel_hpd_irq_stats *stats;
> +	const char *name;
> +	unsigned long start, end;
> +	int threshold;
>  	bool storm = false;
>  
> +	if (long_hpd) {
> +		stats = &dev_priv->hotplug.stats[pin].long_irq;
> +		threshold = dev_priv->hotplug.long_hpd_storm_threshold;
> +		name = "long";
> +	} else {
> +		stats = &dev_priv->hotplug.stats[pin].short_irq;
> +		threshold = dev_priv->hotplug.short_hpd_storm_threshold;
> +		name = "short";
> +	}
> +	start = stats->last_jiffies;
> +	end = start + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD);
> +
>  	if (!time_in_range(jiffies, start, end)) {
> -		dev_priv->hotplug.stats[pin].last_jiffies = jiffies;
> -		dev_priv->hotplug.stats[pin].count = 0;
> -		DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: 0\n", pin);
> -	} else if (dev_priv->hotplug.stats[pin].count > threshold &&
> -		   threshold) {
> +		stats->last_jiffies = jiffies;
> +		stats->count = 0;
> +		DRM_DEBUG_KMS("Received %s HPD interrupt on PIN %d - cnt: 0\n",
> +			      name, pin);
> +	} else if (stats->count > threshold && threshold) {
>  		dev_priv->hotplug.stats[pin].state = HPD_MARK_DISABLED;
> -		DRM_DEBUG_KMS("HPD interrupt storm detected on PIN %d\n", pin);
> +		DRM_DEBUG_KMS("Detected %s HPD interrupt storm on PIN %d\n",
> +			      name, pin);
>  		storm = true;
>  	} else {
> -		dev_priv->hotplug.stats[pin].count++;
> -		DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: %d\n", pin,
> -			      dev_priv->hotplug.stats[pin].count);
> +		stats->count++;
> +		DRM_DEBUG_KMS("Received %s HPD interrupt on PIN %d - cnt: %d\n",
> +			      name, pin, stats->count);
>  	}
>  
>  	return storm;
> @@ -404,28 +427,24 @@ void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
>  	for_each_intel_encoder(&dev_priv->drm, encoder) {
>  		enum hpd_pin pin = encoder->hpd_pin;
>  		bool has_hpd_pulse = intel_encoder_has_hpd_pulse(encoder);
> +		bool long_hpd = true;
>  
>  		if (!(BIT(pin) & pin_mask))
>  			continue;
>  
>  		if (has_hpd_pulse) {
> -			bool long_hpd = long_mask & BIT(pin);
>  			enum port port = encoder->port;
>  
> +			long_hpd = !!(long_mask & BIT(pin));
> +
>  			DRM_DEBUG_DRIVER("digital hpd port %c - %s\n", port_name(port),
>  					 long_hpd ? "long" : "short");
> -			/*
> -			 * For long HPD pulses we want to have the digital queue happen,
> -			 * but we still want HPD storm detection to function.
> -			 */
>  			queue_dig = true;
> -			if (long_hpd) {
> +			if (long_hpd)
>  				dev_priv->hotplug.long_port_mask |= (1 << port);
> -			} else {
> -				/* for short HPD just trigger the digital queue */
> +			else
>  				dev_priv->hotplug.short_port_mask |= (1 << port);
> -				continue;
> -			}
> +
>  		}
>  
>  		if (dev_priv->hotplug.stats[pin].state == HPD_DISABLED) {
> @@ -448,7 +467,7 @@ void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
>  			queue_hp = true;
>  		}
>  
> -		if (intel_hpd_irq_storm_detect(dev_priv, pin)) {
> +		if (intel_hpd_irq_storm_detect(dev_priv, pin, long_hpd)) {
>  			dev_priv->hotplug.event_bits &= ~BIT(pin);
>  			storm_detected = true;
>  		}
> @@ -489,7 +508,8 @@ void intel_hpd_init(struct drm_i915_private *dev_priv)
>  	int i;
>  
>  	for_each_hpd_pin(i) {
> -		dev_priv->hotplug.stats[i].count = 0;
> +		dev_priv->hotplug.stats[i].long_irq.count = 0;
> +		dev_priv->hotplug.stats[i].short_irq.count = 0;
>  		dev_priv->hotplug.stats[i].state = HPD_ENABLED;
>  	}
>  
> -- 
> 2.17.2

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] drm/i915: Add short HPD IRQ storm detection for non-MST systems
  2018-10-26 17:52   ` Ville Syrjälä
@ 2018-10-26 17:57     ` Lyude Paul
  0 siblings, 0 replies; 6+ messages in thread
From: Lyude Paul @ 2018-10-26 17:57 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, dri-devel, linux-kernel

On Fri, 2018-10-26 at 20:52 +0300, Ville Syrjälä wrote:
> On Thu, Oct 25, 2018 at 06:26:57PM -0400, Lyude Paul wrote:
> > Unfortunately, it seems that the HPD IRQ storm problem from the early
> > days of Intel GPUs was never entirely solved, only mostly. Within the
> > last couple of days, I got a bug report from one of our customers who
> > had been having issues with their machine suddenly booting up very
> > slowly after having updated. The amount of time it took to boot went
> > from around 30 seconds, to over 6 minutes consistently.
> > 
> > After some investigation, I discovered that i915 was reporting massive
> > amounts of short HPD IRQ spam on this system from the DisplayPort port,
> > despite there not being anything actually connected. The symptoms would
> > start with one "long" HPD IRQ being detected at boot:
> > 
> > [    1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received,
> > stat 0x00440000, dig 0x00440000, pins 0x000000a0
> > [    1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B -
> > long
> > [    1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt
> > on PIN 5 - cnt: 0
> > [    1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D -
> > long
> > [    1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt
> > on PIN 7 - cnt: 0
> > [    1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B -
> > long
> > [    1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D -
> > long
> > …
> > 
> > followed by constant short IRQs afterwards:
> > 
> > [    1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1]
> > status updated from unknown to disconnected
> > [    1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7)
> > received hotplug event.
> > [    1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3]
> > [    1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received,
> > stat 0x00200000, dig 0x00200000, pins 0x00000080
> > [    1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D -
> > short
> > [    1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received,
> > stat 0x00200000, dig 0x00200000, pins 0x00000080
> > [    1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D -
> > short
> > [    1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status
> > 0x71450085
> > [    1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received,
> > stat 0x00200000, dig 0x00200000, pins 0x00000080
> > [    1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D -
> > short
> > [    1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received,
> > stat 0x00200000, dig 0x00200000, pins 0x00000080
> > [    1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D -
> > short
> > [    1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received,
> > stat 0x00200000, dig 0x00200000, pins 0x00000080
> > 
> > The customer's system in question has a GM45 GPU, which is apparently
> > well known for hotplugging storms.
> > 
> > So, workaround this impressively broken hardware by detecting short IRQ
> > storms using a separate threshold and count, one that is much higher
> > then the threshold for long IRQs. We default to a threshold of 50 short
> > pulses within the timespan of a second, which should amount to about
> > 100ms of constant pulsing. This should be a good middle ground between
> > avoiding detecting false IRQ storms, while still catching short IRQ
> > storms quickly enough to minimize the amount of time we'll stutter every
> > time hotplugging gets re-enabled and immediately disabled by another
> > short IRQ storm.
> > 
> > And just to be extra safe: we don't enable this by default on systems
> > with MST support. There's too high of a chance of MST support triggering
> > storm detection, and systems that are new enough to support MST are a
> > lot less likely to have issues with IRQ storms anyway.
> > 
> > As a note: this patch was tested using a ThinkPad T450s and a Chamelium
> > to simulate the short IRQ storms.
> > 
> > Signed-off-by: Lyude Paul <lyude@redhat.com>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c  | 89 +++++++++++++++++++++++-----
> >  drivers/gpu/drm/i915/i915_drv.h      | 15 +++--
> >  drivers/gpu/drm/i915/i915_irq.c      | 14 ++++-
> >  drivers/gpu/drm/i915/intel_hotplug.c | 84 ++++++++++++++++----------
> >  4 files changed, 150 insertions(+), 52 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c
> > b/drivers/gpu/drm/i915/i915_debugfs.c
> > index b4744a68cd88..84e89fbd55fb 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -4566,21 +4566,24 @@ static const struct file_operations
> > i915_forcewake_fops = {
> >  	.release = i915_forcewake_release,
> >  };
> >  
> > -static int i915_hpd_storm_ctl_show(struct seq_file *m, void *data)
> > +static int i915_hpd_storm_show(struct seq_file *m, bool long_hpd)
> >  {
> >  	struct drm_i915_private *dev_priv = m->private;
> >  	struct i915_hotplug *hotplug = &dev_priv->hotplug;
> > +	const unsigned int threshold = long_hpd ?
> > +		hotplug->long_hpd_storm_threshold :
> > +		hotplug->short_hpd_storm_threshold;
> >  
> > -	seq_printf(m, "Threshold: %d\n", hotplug->hpd_storm_threshold);
> > +	seq_printf(m, "Threshold: %d\n", threshold);
> >  	seq_printf(m, "Detected: %s\n",
> >  		   yesno(delayed_work_pending(&hotplug->reenable_work)));
> >  
> >  	return 0;
> >  }
> >  
> > -static ssize_t i915_hpd_storm_ctl_write(struct file *file,
> > -					const char __user *ubuf, size_t len,
> > -					loff_t *offp)
> > +static ssize_t i915_hpd_storm_write(struct file *file,
> > +				    const char __user *ubuf, size_t len,
> > +				    loff_t *offp, bool long_hpd)
> >  {
> >  	struct seq_file *m = file->private_data;
> >  	struct drm_i915_private *dev_priv = m->private;
> > @@ -4589,6 +4592,7 @@ static ssize_t i915_hpd_storm_ctl_write(struct file
> > *file,
> >  	int i;
> >  	char *newline;
> >  	char tmp[16];
> > +	const char *name = long_hpd ? "long" : "short";
> >  
> >  	if (len >= sizeof(tmp))
> >  		return -EINVAL;
> > @@ -4603,22 +4607,36 @@ static ssize_t i915_hpd_storm_ctl_write(struct
> > file *file,
> >  	if (newline)
> >  		*newline = '\0';
> >  
> > -	if (strcmp(tmp, "reset") == 0)
> > -		new_threshold = HPD_STORM_DEFAULT_THRESHOLD;
> > -	else if (kstrtouint(tmp, 10, &new_threshold) != 0)
> > +	if (strcmp(tmp, "reset") == 0) {
> > +		if (long_hpd)
> > +			new_threshold = HPD_STORM_DEFAULT_THRESHOLD_LONG;
> > +		else if (!HAS_DP_MST(dev_priv))
> > +			new_threshold = HPD_STORM_DEFAULT_THRESHOLD_SHORT;
> > +		else
> > +			new_threshold = 0;
> > +	} else if (kstrtouint(tmp, 10, &new_threshold) != 0) {
> >  		return -EINVAL;
> > +	}
> >  
> >  	if (new_threshold > 0)
> > -		DRM_DEBUG_KMS("Setting HPD storm detection threshold to %d\n",
> > -			      new_threshold);
> > +		DRM_DEBUG_KMS("Setting %s HPD storm detection threshold to
> > %d\n",
> > +			      name, new_threshold);
> >  	else
> > -		DRM_DEBUG_KMS("Disabling HPD storm detection\n");
> > +		DRM_DEBUG_KMS("Disabling %s HPD storm detection\n", name);
> >  
> >  	spin_lock_irq(&dev_priv->irq_lock);
> > -	hotplug->hpd_storm_threshold = new_threshold;
> > +	if (long_hpd)
> > +		hotplug->long_hpd_storm_threshold = new_threshold;
> > +	else
> > +		hotplug->short_hpd_storm_threshold = new_threshold;
> > +
> >  	/* Reset the HPD storm stats so we don't accidentally trigger a storm
> > */
> > -	for_each_hpd_pin(i)
> > -		hotplug->stats[i].count = 0;
> > +	for_each_hpd_pin(i) {
> > +		if (long_hpd)
> > +			hotplug->stats[i].long_irq.count = 0;
> > +		else
> > +			hotplug->stats[i].short_irq.count = 0;
> > +	}
> >  	spin_unlock_irq(&dev_priv->irq_lock);
> >  
> >  	/* Re-enable hpd immediately if we were in an irq storm */
> > @@ -4627,6 +4645,18 @@ static ssize_t i915_hpd_storm_ctl_write(struct file
> > *file,
> >  	return len;
> >  }
> >  
> > +static ssize_t
> > +i915_hpd_storm_ctl_write(struct file *file, const char __user *ubuf,
> > +			 size_t len, loff_t *offp)
> > +{
> > +	return i915_hpd_storm_write(file, ubuf, len, offp, true);
> > +}
> > +
> > +static int i915_hpd_storm_ctl_show(struct seq_file *m, void *data)
> > +{
> > +	return i915_hpd_storm_show(m, true);
> > +}
> > +
> >  static int i915_hpd_storm_ctl_open(struct inode *inode, struct file
> > *file)
> >  {
> >  	return single_open(file, i915_hpd_storm_ctl_show, inode->i_private);
> > @@ -4638,7 +4668,35 @@ static const struct file_operations
> > i915_hpd_storm_ctl_fops = {
> >  	.read = seq_read,
> >  	.llseek = seq_lseek,
> >  	.release = single_release,
> > -	.write = i915_hpd_storm_ctl_write
> > +	.write = i915_hpd_storm_ctl_write,
> > +};
> > +
> > +static ssize_t
> > +i915_short_hpd_storm_ctl_write(struct file *file, const char __user
> > *ubuf,
> > +			       size_t len, loff_t *offp)
> > +{
> > +	return i915_hpd_storm_write(file, ubuf, len, offp, false);
> > +}
> > +
> > +static int i915_short_hpd_storm_ctl_show(struct seq_file *m, void *data)
> > +{
> > +	return i915_hpd_storm_show(m, false);
> > +}
> > +
> > +static int
> > +i915_short_hpd_storm_ctl_open(struct inode *inode, struct file *file)
> > +{
> > +	return single_open(file, i915_short_hpd_storm_ctl_show,
> > +			   inode->i_private);
> > +}
> > +
> > +static const struct file_operations i915_short_hpd_storm_ctl_fops = {
> > +	.owner = THIS_MODULE,
> > +	.open = i915_short_hpd_storm_ctl_open,
> > +	.read = seq_read,
> > +	.llseek = seq_lseek,
> > +	.release = single_release,
> > +	.write = i915_short_hpd_storm_ctl_write,
> >  };
> >  
> >  static int i915_drrs_ctl_set(void *data, u64 val)
> > @@ -4818,6 +4876,7 @@ static const struct i915_debugfs_files {
> >  	{"i915_guc_log_level", &i915_guc_log_level_fops},
> >  	{"i915_guc_log_relay", &i915_guc_log_relay_fops},
> >  	{"i915_hpd_storm_ctl", &i915_hpd_storm_ctl_fops},
> > +	{"i915_short_hpd_storm_ctl", &i915_short_hpd_storm_ctl_fops},
> >  	{"i915_ipc_status", &i915_ipc_status_fops},
> >  	{"i915_drrs_ctl", &i915_drrs_ctl_fops},
> >  	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops}
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index 808204a7ca7c..ef38baf50299 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -282,14 +282,20 @@ enum hpd_pin {
> >  #define for_each_hpd_pin(__pin) \
> >  	for ((__pin) = (HPD_NONE + 1); (__pin) < HPD_NUM_PINS; (__pin)++)
> >  
> > -#define HPD_STORM_DEFAULT_THRESHOLD 5
> > +#define HPD_STORM_DEFAULT_THRESHOLD_LONG	5
> > +#define HPD_STORM_DEFAULT_THRESHOLD_SHORT	50
> 
> Hmm. Maybe we could stick to the one counter and make each
> long hpd contribute 10 and each short hpd contribute 1
> (or whatever seems appropriate)?

I'm fine with that; will rework the patch series in just a little bit
> 
> > +
> > +struct intel_hpd_irq_stats {
> > +	unsigned long last_jiffies;
> > +	int count;
> > +};
> >  
> >  struct i915_hotplug {
> >  	struct work_struct hotplug_work;
> >  
> >  	struct {
> > -		unsigned long last_jiffies;
> > -		int count;
> > +		struct intel_hpd_irq_stats long_irq;
> > +		struct intel_hpd_irq_stats short_irq;
> >  		enum {
> >  			HPD_ENABLED = 0,
> >  			HPD_DISABLED = 1,
> > @@ -306,7 +312,8 @@ struct i915_hotplug {
> >  	struct work_struct poll_init_work;
> >  	bool poll_enabled;
> >  
> > -	unsigned int hpd_storm_threshold;
> > +	unsigned int long_hpd_storm_threshold;
> > +	unsigned int short_hpd_storm_threshold;
> >  
> >  	/*
> >  	 * if we get a HPD irq from DP and a HPD irq from non-DP
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c
> > b/drivers/gpu/drm/i915/i915_irq.c
> > index 10f28a2ee2e6..28a937f3e8e0 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -4842,7 +4842,19 @@ void intel_irq_init(struct drm_i915_private
> > *dev_priv)
> >  	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
> >  		dev_priv->display_irqs_enabled = false;
> >  
> > -	dev_priv->hotplug.hpd_storm_threshold = HPD_STORM_DEFAULT_THRESHOLD;
> > +	dev_priv->hotplug.long_hpd_storm_threshold =
> > +		HPD_STORM_DEFAULT_THRESHOLD_LONG;
> > +
> > +	/* If we have MST support, we want to avoid doing short HPD IRQ storm
> > +	 * detection, as short HPD storms will occur as a natural part of
> > +	 * sideband messaging with MST.
> > +	 * On older platforms however, IRQ storms can occur with both long and
> > +	 * short pulses, as seen on some G4x systems.
> > +	 */
> > +	if (!HAS_DP_MST(dev_priv)) {
> > +		dev_priv->hotplug.short_hpd_storm_threshold =
> > +			HPD_STORM_DEFAULT_THRESHOLD_SHORT;
> > +	}
> >  
> >  	dev->driver->get_vblank_timestamp =
> > drm_calc_vbltimestamp_from_scanoutpos;
> >  	dev->driver->get_scanout_position = i915_get_crtc_scanoutpos;
> > diff --git a/drivers/gpu/drm/i915/intel_hotplug.c
> > b/drivers/gpu/drm/i915/intel_hotplug.c
> > index 7d21aac10d16..74798c920909 100644
> > --- a/drivers/gpu/drm/i915/intel_hotplug.c
> > +++ b/drivers/gpu/drm/i915/intel_hotplug.c
> > @@ -118,41 +118,64 @@ enum hpd_pin intel_hpd_pin_default(struct
> > drm_i915_private *dev_priv,
> >   * @dev_priv: private driver data pointer
> >   * @pin: the pin to gather stats on
> >   *
> > - * Gather stats about HPD irqs from the specified @pin, and detect irq
> > + * Gather stats about HPD IRQs from the specified @pin, and detect IRQ
> >   * storms. Only the pin specific stats and state are changed, the caller
> > is
> >   * responsible for further action.
> >   *
> > - * The number of irqs that are allowed within @HPD_STORM_DETECT_PERIOD is
> > - * stored in @dev_priv->hotplug.hpd_storm_threshold which defaults to
> > - * @HPD_STORM_DEFAULT_THRESHOLD. If this threshold is exceeded, it's
> > - * considered an irq storm and the irq state is set to
> > @HPD_MARK_DISABLED.
> > + * IRQ storm detection can happen for both long and short IRQs. The
> > number of
> > + * short or long IRQs that are allowed within @HPD_STORM_DETECT_PERIOD
> > are
> > + * stored in @dev_priv->hotplug.long_hpd_storm_threshold and
> > + * @dev_priv->hotplug.short_hpd_storm_threshold respectively. For long
> > IRQs,
> > + * the threshold defaults to @HPD_STORM_DEFAULT_THRESHOLD_LONG on all
> > systems.
> > + * For short IRQs however, the default threshold varies. On systems that
> > are
> > + * too old to support MST, the threshold will default to
> > + * @HPD_STORM_DEFAULT_THRESHOLD_SHORT. Systems which do support MST will
> > not
> > + * have short IRQ storm detection enabled by default, as doing so would
> > likely
> > + * interfere with the short IRQ bursts caused by sideband transactions.
> > + * This is fine, as systems new enough to support MST are unlikely to
> > rely on
> > + * HPD storm protection as much as older systems may.
> >   *
> > - * The HPD threshold can be controlled through i915_hpd_storm_ctl in
> > debugfs,
> > - * and should only be adjusted for automated hotplug testing.
> > + * The HPD threshold can be controlled through i915_hpd_storm_ctl in
> > debugfs
> > + * for long IRQs, and i915_short_hpd_storm_ctl for short IRQs. This
> > should
> > + * only ever be adjusted for automated hotplug testing.
> >   *
> >   * Return true if an irq storm was detected on @pin.
> >   */
> >  static bool intel_hpd_irq_storm_detect(struct drm_i915_private *dev_priv,
> > -				       enum hpd_pin pin)
> > +				       enum hpd_pin pin, bool long_hpd)
> >  {
> > -	unsigned long start = dev_priv->hotplug.stats[pin].last_jiffies;
> > -	unsigned long end = start + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD);
> > -	const int threshold = dev_priv->hotplug.hpd_storm_threshold;
> > +	struct intel_hpd_irq_stats *stats;
> > +	const char *name;
> > +	unsigned long start, end;
> > +	int threshold;
> >  	bool storm = false;
> >  
> > +	if (long_hpd) {
> > +		stats = &dev_priv->hotplug.stats[pin].long_irq;
> > +		threshold = dev_priv->hotplug.long_hpd_storm_threshold;
> > +		name = "long";
> > +	} else {
> > +		stats = &dev_priv->hotplug.stats[pin].short_irq;
> > +		threshold = dev_priv->hotplug.short_hpd_storm_threshold;
> > +		name = "short";
> > +	}
> > +	start = stats->last_jiffies;
> > +	end = start + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD);
> > +
> >  	if (!time_in_range(jiffies, start, end)) {
> > -		dev_priv->hotplug.stats[pin].last_jiffies = jiffies;
> > -		dev_priv->hotplug.stats[pin].count = 0;
> > -		DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: 0\n",
> > pin);
> > -	} else if (dev_priv->hotplug.stats[pin].count > threshold &&
> > -		   threshold) {
> > +		stats->last_jiffies = jiffies;
> > +		stats->count = 0;
> > +		DRM_DEBUG_KMS("Received %s HPD interrupt on PIN %d - cnt:
> > 0\n",
> > +			      name, pin);
> > +	} else if (stats->count > threshold && threshold) {
> >  		dev_priv->hotplug.stats[pin].state = HPD_MARK_DISABLED;
> > -		DRM_DEBUG_KMS("HPD interrupt storm detected on PIN %d\n",
> > pin);
> > +		DRM_DEBUG_KMS("Detected %s HPD interrupt storm on PIN %d\n",
> > +			      name, pin);
> >  		storm = true;
> >  	} else {
> > -		dev_priv->hotplug.stats[pin].count++;
> > -		DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: %d\n",
> > pin,
> > -			      dev_priv->hotplug.stats[pin].count);
> > +		stats->count++;
> > +		DRM_DEBUG_KMS("Received %s HPD interrupt on PIN %d - cnt:
> > %d\n",
> > +			      name, pin, stats->count);
> >  	}
> >  
> >  	return storm;
> > @@ -404,28 +427,24 @@ void intel_hpd_irq_handler(struct drm_i915_private
> > *dev_priv,
> >  	for_each_intel_encoder(&dev_priv->drm, encoder) {
> >  		enum hpd_pin pin = encoder->hpd_pin;
> >  		bool has_hpd_pulse = intel_encoder_has_hpd_pulse(encoder);
> > +		bool long_hpd = true;
> >  
> >  		if (!(BIT(pin) & pin_mask))
> >  			continue;
> >  
> >  		if (has_hpd_pulse) {
> > -			bool long_hpd = long_mask & BIT(pin);
> >  			enum port port = encoder->port;
> >  
> > +			long_hpd = !!(long_mask & BIT(pin));
> > +
> >  			DRM_DEBUG_DRIVER("digital hpd port %c - %s\n",
> > port_name(port),
> >  					 long_hpd ? "long" : "short");
> > -			/*
> > -			 * For long HPD pulses we want to have the digital
> > queue happen,
> > -			 * but we still want HPD storm detection to function.
> > -			 */
> >  			queue_dig = true;
> > -			if (long_hpd) {
> > +			if (long_hpd)
> >  				dev_priv->hotplug.long_port_mask |= (1 <<
> > port);
> > -			} else {
> > -				/* for short HPD just trigger the digital
> > queue */
> > +			else
> >  				dev_priv->hotplug.short_port_mask |= (1 <<
> > port);
> > -				continue;
> > -			}
> > +
> >  		}
> >  
> >  		if (dev_priv->hotplug.stats[pin].state == HPD_DISABLED) {
> > @@ -448,7 +467,7 @@ void intel_hpd_irq_handler(struct drm_i915_private
> > *dev_priv,
> >  			queue_hp = true;
> >  		}
> >  
> > -		if (intel_hpd_irq_storm_detect(dev_priv, pin)) {
> > +		if (intel_hpd_irq_storm_detect(dev_priv, pin, long_hpd)) {
> >  			dev_priv->hotplug.event_bits &= ~BIT(pin);
> >  			storm_detected = true;
> >  		}
> > @@ -489,7 +508,8 @@ void intel_hpd_init(struct drm_i915_private *dev_priv)
> >  	int i;
> >  
> >  	for_each_hpd_pin(i) {
> > -		dev_priv->hotplug.stats[i].count = 0;
> > +		dev_priv->hotplug.stats[i].long_irq.count = 0;
> > +		dev_priv->hotplug.stats[i].short_irq.count = 0;
> >  		dev_priv->hotplug.stats[i].state = HPD_ENABLED;
> >  	}
> >  
> > -- 
> > 2.17.2
> 
> 
-- 
Cheers,
	Lyude Paul


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-10-26 17:57 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-25 22:26 [PATCH 0/2] drm/i915: HPD IRQ storm detection fixes Lyude Paul
2018-10-25 22:26 ` [PATCH 1/2] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST Lyude Paul
2018-10-26 17:38   ` Ville Syrjälä
2018-10-25 22:26 ` [PATCH 2/2] drm/i915: Add short HPD IRQ storm detection for non-MST systems Lyude Paul
2018-10-26 17:52   ` Ville Syrjälä
2018-10-26 17:57     ` Lyude Paul

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).