linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] drm/i915: HPD IRQ storm detection fixes
@ 2018-10-26 22:49 Lyude Paul
  2018-10-26 22:49 ` [PATCH v2 1/3] drm/i915: Fix possible race in intel_dp_add_mst_connector() Lyude Paul
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Lyude Paul @ 2018-10-26 22:49 UTC (permalink / raw)
  To: intel-gfx
  Cc: David Airlie, Rodrigo Vivi, linux-kernel, dri-devel, Jani Nikula,
	Joonas Lahtinen

---- IMPORTANT -----
As a note: I have not had the customer who this second patch is for test
this yet, I'm sending this ahead of time to make sure this is something
that isn't too crazy for upstream to accept. I'm not planning on pushing
this after review until I've verified this actually fixes their
problems.
--------------------

This series contains a fix for a problem which is very difficult to
reproduce under normal circumstances without specialized testing
hardware, along with a fix that seems to be required for some especially
rebellious GM45 laptops.

Lyude Paul (3):
  drm/i915: Fix possible race in intel_dp_add_mst_connector()
  drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST
  drm/i915: Add short HPD IRQ storm detection for non-MST systems

 drivers/gpu/drm/i915/i915_debugfs.c  | 74 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h      |  5 +-
 drivers/gpu/drm/i915/i915_irq.c      |  7 +++
 drivers/gpu/drm/i915/intel_dp_mst.c  |  8 +--
 drivers/gpu/drm/i915/intel_hotplug.c | 51 +++++++++++--------
 5 files changed, 120 insertions(+), 25 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/3] drm/i915: Fix possible race in intel_dp_add_mst_connector()
  2018-10-26 22:49 [PATCH v2 0/3] drm/i915: HPD IRQ storm detection fixes Lyude Paul
@ 2018-10-26 22:49 ` Lyude Paul
  2018-10-26 22:49 ` [PATCH v2 2/3] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST Lyude Paul
  2018-10-26 22:49 ` [PATCH v2 3/3] drm/i915: Add short HPD IRQ storm detection for non-MST systems Lyude Paul
  2 siblings, 0 replies; 7+ messages in thread
From: Lyude Paul @ 2018-10-26 22:49 UTC (permalink / raw)
  To: intel-gfx
  Cc: Ville Syrjälä,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, David Airlie,
	dri-devel, linux-kernel

This hasn't caused any issues yet that I'm aware of, but as Ville
Syrjälä pointed out - we need to make sure that
intel_connector->mst_port is set before initializing MST connectors,
since in theory we could potentially check intel_connector->mst_port in
i915_hpd_poll_init_work() after registering the connector but before
having written it's value.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_dp_mst.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
index 77920f1a3da1..026d6365cff1 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -448,6 +448,10 @@ static struct drm_connector *intel_dp_add_mst_connector(struct drm_dp_mst_topolo
 	if (!intel_connector)
 		return NULL;
 
+	intel_connector->get_hw_state = intel_dp_mst_get_hw_state;
+	intel_connector->mst_port = intel_dp;
+	intel_connector->port = port;
+
 	connector = &intel_connector->base;
 	ret = drm_connector_init(dev, connector, &intel_dp_mst_connector_funcs,
 				 DRM_MODE_CONNECTOR_DisplayPort);
@@ -458,10 +462,6 @@ static struct drm_connector *intel_dp_add_mst_connector(struct drm_dp_mst_topolo
 
 	drm_connector_helper_add(connector, &intel_dp_mst_connector_helper_funcs);
 
-	intel_connector->get_hw_state = intel_dp_mst_get_hw_state;
-	intel_connector->mst_port = intel_dp;
-	intel_connector->port = port;
-
 	for_each_pipe(dev_priv, pipe) {
 		struct drm_encoder *enc =
 			&intel_dp->mst_encoders[pipe]->base.base;
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/3] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST
  2018-10-26 22:49 [PATCH v2 0/3] drm/i915: HPD IRQ storm detection fixes Lyude Paul
  2018-10-26 22:49 ` [PATCH v2 1/3] drm/i915: Fix possible race in intel_dp_add_mst_connector() Lyude Paul
@ 2018-10-26 22:49 ` Lyude Paul
  2018-11-02 20:41   ` Ville Syrjälä
  2018-10-26 22:49 ` [PATCH v2 3/3] drm/i915: Add short HPD IRQ storm detection for non-MST systems Lyude Paul
  2 siblings, 1 reply; 7+ messages in thread
From: Lyude Paul @ 2018-10-26 22:49 UTC (permalink / raw)
  To: intel-gfx
  Cc: Ville Syrjälä,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, David Airlie,
	dri-devel, linux-kernel

Turns out that if you trigger an HPD storm on a system that has an MST
topology connected to it, you'll end up causing the kernel to eventually
hit a NULL deref:

[  332.339041] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ec
[  332.340906] PGD 0 P4D 0
[  332.342750] Oops: 0000 [#1] SMP PTI
[  332.344579] CPU: 2 PID: 25 Comm: kworker/2:0 Kdump: loaded Tainted: G           O      4.18.0-rc3short-hpd-storm+ #2
[  332.346453] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
[  332.348361] Workqueue: events intel_hpd_irq_storm_reenable_work [i915]
[  332.350301] RIP: 0010:intel_hpd_irq_storm_reenable_work.cold.3+0x2f/0x86 [i915]
[  332.352213] Code: 00 00 ba e8 00 00 00 48 c7 c6 c0 aa 5f a0 48 c7 c7 d0 73 62 a0 4c 89 c1 4c 89 04 24 e8 7f f5 af e0 4c 8b 04 24 44 89 f8 29 e8 <41> 39 80 ec 00 00 00 0f 85 43 13 fc ff 41 0f b6 86 b8 04 00 00 41
[  332.354286] RSP: 0018:ffffc90000147e48 EFLAGS: 00010006
[  332.356344] RAX: 0000000000000005 RBX: ffff8802c226c9d4 RCX: 0000000000000006
[  332.358404] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88032dc95570
[  332.360466] RBP: 0000000000000005 R08: 0000000000000000 R09: ffff88031b3dc840
[  332.362528] R10: 0000000000000000 R11: 000000031a069602 R12: ffff8802c226ca20
[  332.364575] R13: ffff8802c2268000 R14: ffff880310661000 R15: 000000000000000a
[  332.366615] FS:  0000000000000000(0000) GS:ffff88032dc80000(0000) knlGS:0000000000000000
[  332.368658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  332.370690] CR2: 00000000000000ec CR3: 000000000200a003 CR4: 00000000003606e0
[  332.372724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  332.374773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  332.376798] Call Trace:
[  332.378809]  process_one_work+0x1a1/0x350
[  332.380806]  worker_thread+0x30/0x380
[  332.382777]  ? wq_update_unbound_numa+0x10/0x10
[  332.384772]  kthread+0x112/0x130
[  332.386740]  ? kthread_create_worker_on_cpu+0x70/0x70
[  332.388706]  ret_from_fork+0x35/0x40
[  332.390651] Modules linked in: i915(O) vfat fat joydev btusb btrtl btbcm btintel bluetooth ecdh_generic iTCO_wdt wmi_bmof i2c_algo_bit drm_kms_helper intel_rapl syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt coretemp fb_sys_fops crc32_pclmul drm psmouse pcspkr mei_me mei i2c_i801 lpc_ich mfd_core i2c_core tpm_tis tpm_tis_core thinkpad_acpi wmi tpm rfkill video crc32c_intel serio_raw ehci_pci xhci_pci ehci_hcd xhci_hcd [last unloaded: i915]
[  332.394963] CR2: 00000000000000ec

This appears to be due to the fact that with an MST topology, not all
intel_connector structs will have ->encoder set. So, fix this by
skipping connectors without encoders in
intel_hpd_irq_storm_reenable_work().

For those wondering, this bug was found on accident while simulating HPD
storms using a Chamelium connected to a ThinkPad T450s (Broadwell).

Changes since v1:
- Check intel_connector->mst_port instead of intel_connector->encoder

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_hotplug.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
index 648a13c6043c..8326900a311e 100644
--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -228,7 +228,9 @@ static void intel_hpd_irq_storm_reenable_work(struct work_struct *work)
 		drm_for_each_connector_iter(connector, &conn_iter) {
 			struct intel_connector *intel_connector = to_intel_connector(connector);
 
-			if (intel_connector->encoder->hpd_pin == pin) {
+			/* Don't check MST ports, they don't have pins */
+			if (!intel_connector->mst_port &&
+			    intel_connector->encoder->hpd_pin == pin) {
 				if (connector->polled != intel_connector->polled)
 					DRM_DEBUG_DRIVER("Reenabling HPD on connector %s\n",
 							 connector->name);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 3/3] drm/i915: Add short HPD IRQ storm detection for non-MST systems
  2018-10-26 22:49 [PATCH v2 0/3] drm/i915: HPD IRQ storm detection fixes Lyude Paul
  2018-10-26 22:49 ` [PATCH v2 1/3] drm/i915: Fix possible race in intel_dp_add_mst_connector() Lyude Paul
  2018-10-26 22:49 ` [PATCH v2 2/3] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST Lyude Paul
@ 2018-10-26 22:49 ` Lyude Paul
  2018-10-27  5:50   ` [Intel-gfx] " kbuild test robot
  2018-11-02 19:27   ` Ville Syrjälä
  2 siblings, 2 replies; 7+ messages in thread
From: Lyude Paul @ 2018-10-26 22:49 UTC (permalink / raw)
  To: intel-gfx
  Cc: Ville Syrjälä,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, David Airlie,
	dri-devel, linux-kernel

Unfortunately, it seems that the HPD IRQ storm problem from the early
days of Intel GPUs was never entirely solved, only mostly. Within the
last couple of days, I got a bug report from one of our customers who
had been having issues with their machine suddenly booting up very
slowly after having updated. The amount of time it took to boot went
from around 30 seconds, to over 6 minutes consistently.

After some investigation, I discovered that i915 was reporting massive
amounts of short HPD IRQ spam on this system from the DisplayPort port,
despite there not being anything actually connected. The symptoms would
start with one "long" HPD IRQ being detected at boot:

[    1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00440000, dig 0x00440000, pins 0x000000a0
[    1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long
[    1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 5 - cnt: 0
[    1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long
[    1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 7 - cnt: 0
[    1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long
[    1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long
…

followed by constant short IRQs afterwards:

[    1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status updated from unknown to disconnected
[    1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) received hotplug event.
[    1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3]
[    1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x71450085
[    1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080

The customer's system in question has a GM45 GPU, which is apparently
well known for hotplugging storms.

So, workaround this impressively broken hardware by changing the default
HPD storm threshold from 5 to 50. Then, make long IRQs count for 10, and
short IRQs count for 1. This makes it so that 5 long IRQs will trigger
an HPD storm, and on systems with short HPD storm detection 50 short
IRQs will trigger an HPD storm. 50 short IRQs amounts to 100ms of
constant pulsing, which seems like a good middleground between being too
sensitive and not being sensitive enough (which would cause visible
stutters in userspace every time a storm occurs).

And just to be extra safe: we don't enable this by default on systems
with MST support. There's too high of a chance of MST support triggering
storm detection, and systems that are new enough to support MST are a
lot less likely to have issues with IRQ storms anyway.

As a note: this patch was tested using a ThinkPad T450s and a Chamelium
to simulate the short IRQ storms.

Changes since v1:
- Don't use two separate thresholds, just make long IRQs count for 10
  each and short IRQs count for 1. This simplifies the code a bit
  - Ville Syrjälä

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  | 74 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h      |  5 +-
 drivers/gpu/drm/i915/i915_irq.c      |  7 +++
 drivers/gpu/drm/i915/intel_hotplug.c | 47 +++++++++++-------
 4 files changed, 113 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b4744a68cd88..1595b8565875 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4641,6 +4641,79 @@ static const struct file_operations i915_hpd_storm_ctl_fops = {
 	.write = i915_hpd_storm_ctl_write
 };
 
+static int i915_hpd_short_storm_ctl_show(struct seq_file *m, void *data)
+{
+	struct drm_i915_private *dev_priv = m->private;
+
+	seq_printf(m, "Enabled: %s\n",
+		   yesno(dev_priv->hotplug.hpd_short_storm_enabled));
+
+	return 0;
+}
+
+static int
+i915_hpd_short_storm_ctl_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, i915_hpd_short_storm_ctl_show,
+			   inode->i_private);
+}
+
+static ssize_t i915_hpd_short_storm_ctl_write(struct file *file,
+					      const char __user *ubuf,
+					      size_t len, loff_t *offp)
+{
+	struct seq_file *m = file->private_data;
+	struct drm_i915_private *dev_priv = m->private;
+	struct i915_hotplug *hotplug = &dev_priv->hotplug;
+	char *newline;
+	char tmp[16];
+	int i;
+	bool new_state;
+
+	if (len >= sizeof(tmp))
+		return -EINVAL;
+
+	if (copy_from_user(tmp, ubuf, len))
+		return -EFAULT;
+
+	tmp[len] = '\0';
+
+	/* Strip newline, if any */
+	newline = strchr(tmp, '\n');
+	if (newline)
+		*newline = '\0';
+
+	/* Reset to the "default" state for this system */
+	if (strcmp(tmp, "reset") == 0)
+		new_state = !HAS_DP_MST(dev_priv);
+	else if (kstrtobool(tmp, &new_state) != 0)
+		return -EINVAL;
+
+	DRM_DEBUG_KMS("%sabling HPD short storm detection\n",
+		      new_state ? "En" : "Dis");
+
+	spin_lock_irq(&dev_priv->irq_lock);
+	hotplug->hpd_short_storm_enabled = new_state;
+	/* Reset the HPD storm stats so we don't accidentally trigger a storm */
+	for_each_hpd_pin(i)
+		hotplug->stats[i].count = 0;
+	spin_unlock_irq(&dev_priv->irq_lock);
+
+	/* Re-enable hpd immediately if we were in an irq storm */
+	flush_delayed_work(&dev_priv->hotplug.reenable_work);
+
+	return len;
+}
+
+static const struct file_operations i915_hpd_short_storm_ctl_fops = {
+	.owner = THIS_MODULE,
+	.open = i915_hpd_short_storm_ctl_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+	.write = i915_hpd_short_storm_ctl_write,
+};
+
 static int i915_drrs_ctl_set(void *data, u64 val)
 {
 	struct drm_i915_private *dev_priv = data;
@@ -4818,6 +4891,7 @@ static const struct i915_debugfs_files {
 	{"i915_guc_log_level", &i915_guc_log_level_fops},
 	{"i915_guc_log_relay", &i915_guc_log_relay_fops},
 	{"i915_hpd_storm_ctl", &i915_hpd_storm_ctl_fops},
+	{"i915_hpd_short_storm_ctl", &i915_hpd_short_storm_ctl_fops},
 	{"i915_ipc_status", &i915_ipc_status_fops},
 	{"i915_drrs_ctl", &i915_drrs_ctl_fops},
 	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 808204a7ca7c..a378e0fd2022 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -282,7 +282,8 @@ enum hpd_pin {
 #define for_each_hpd_pin(__pin) \
 	for ((__pin) = (HPD_NONE + 1); (__pin) < HPD_NUM_PINS; (__pin)++)
 
-#define HPD_STORM_DEFAULT_THRESHOLD 5
+/* Threshold == 5 for long IRQs, 50 for short */
+#define HPD_STORM_DEFAULT_THRESHOLD 50
 
 struct i915_hotplug {
 	struct work_struct hotplug_work;
@@ -307,6 +308,8 @@ struct i915_hotplug {
 	bool poll_enabled;
 
 	unsigned int hpd_storm_threshold;
+	/* Whether or not to count short HPD IRQs in HPD storms */
+	u8 hpd_short_storm_enabled;
 
 	/*
 	 * if we get a HPD irq from DP and a HPD irq from non-DP
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 10f28a2ee2e6..57970b3a22df 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -4843,6 +4843,13 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 		dev_priv->display_irqs_enabled = false;
 
 	dev_priv->hotplug.hpd_storm_threshold = HPD_STORM_DEFAULT_THRESHOLD;
+	/* If we have MST support, we want to avoid doing short HPD IRQ storm
+	 * detection, as short HPD storms will occur as a natural part of
+	 * sideband messaging with MST.
+	 * On older platforms however, IRQ storms can occur with both long and
+	 * short pulses, as seen on some G4x systems.
+	 */
+	dev_priv->hotplug.hpd_short_storm_enabled = !HAS_DP_MST(dev_priv);
 
 	dev->driver->get_vblank_timestamp = drm_calc_vbltimestamp_from_scanoutpos;
 	dev->driver->get_scanout_position = i915_get_crtc_scanoutpos;
diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
index 8326900a311e..aa6f52e723f0 100644
--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -114,32 +114,45 @@ enum hpd_pin intel_hpd_pin_default(struct drm_i915_private *dev_priv,
 #define HPD_STORM_REENABLE_DELAY	(2 * 60 * 1000)
 
 /**
- * intel_hpd_irq_storm_detect - gather stats and detect HPD irq storm on a pin
+ * intel_hpd_irq_storm_detect - gather stats and detect HPD IRQ storm on a pin
  * @dev_priv: private driver data pointer
  * @pin: the pin to gather stats on
  *
- * Gather stats about HPD irqs from the specified @pin, and detect irq
+ * Gather stats about HPD IRQs from the specified @pin, and detect IRQ
  * storms. Only the pin specific stats and state are changed, the caller is
  * responsible for further action.
  *
- * The number of irqs that are allowed within @HPD_STORM_DETECT_PERIOD is
+ * The number of IRQs that are allowed within @HPD_STORM_DETECT_PERIOD is
  * stored in @dev_priv->hotplug.hpd_storm_threshold which defaults to
- * @HPD_STORM_DEFAULT_THRESHOLD. If this threshold is exceeded, it's
- * considered an irq storm and the irq state is set to @HPD_MARK_DISABLED.
+ * @HPD_STORM_DEFAULT_THRESHOLD. Long IRQs count as +10 to this threshold, and
+ * short IRQs count as +1. If this threshold is exceeded, it's considered an
+ * IRQ storm and the IRQ state is set to @HPD_MARK_DISABLED.
+ *
+ * By default, most systems will only count long IRQs towards
+ * &dev_priv->hotplug.hpd_storm_threshold. However, some older systems also
+ * suffer from short IRQ storms and must also track these. Because short IRQ
+ * storms are naturally caused by sideband interactions with DP MST devices,
+ * short IRQ detection is only enabled for systems without DP MST support.
+ * Systems which are new enough to support DP MST are far less likely to
+ * suffer from IRQ storms at all, so this is fine.
  *
  * The HPD threshold can be controlled through i915_hpd_storm_ctl in debugfs,
  * and should only be adjusted for automated hotplug testing.
  *
- * Return true if an irq storm was detected on @pin.
+ * Return true if an IRQ storm was detected on @pin.
  */
 static bool intel_hpd_irq_storm_detect(struct drm_i915_private *dev_priv,
-				       enum hpd_pin pin)
+				       enum hpd_pin pin, bool long_hpd)
 {
 	unsigned long start = dev_priv->hotplug.stats[pin].last_jiffies;
 	unsigned long end = start + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD);
 	const int threshold = dev_priv->hotplug.hpd_storm_threshold;
+	const int increment = long_hpd ? 10 : 1;
 	bool storm = false;
 
+	if (!long_hpd && !dev_priv->hotplug.hpd_short_storm_enabled)
+		return false;
+
 	if (!time_in_range(jiffies, start, end)) {
 		dev_priv->hotplug.stats[pin].last_jiffies = jiffies;
 		dev_priv->hotplug.stats[pin].count = 0;
@@ -150,7 +163,7 @@ static bool intel_hpd_irq_storm_detect(struct drm_i915_private *dev_priv,
 		DRM_DEBUG_KMS("HPD interrupt storm detected on PIN %d\n", pin);
 		storm = true;
 	} else {
-		dev_priv->hotplug.stats[pin].count++;
+		dev_priv->hotplug.stats[pin].count += increment;
 		DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: %d\n", pin,
 			      dev_priv->hotplug.stats[pin].count);
 	}
@@ -405,28 +418,24 @@ void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
 	for_each_intel_encoder(&dev_priv->drm, encoder) {
 		enum hpd_pin pin = encoder->hpd_pin;
 		bool has_hpd_pulse = intel_encoder_has_hpd_pulse(encoder);
+		bool long_hpd = true;
 
 		if (!(BIT(pin) & pin_mask))
 			continue;
 
 		if (has_hpd_pulse) {
-			bool long_hpd = long_mask & BIT(pin);
 			enum port port = encoder->port;
 
+			long_hpd = !!(long_mask & BIT(pin));
+
 			DRM_DEBUG_DRIVER("digital hpd port %c - %s\n", port_name(port),
 					 long_hpd ? "long" : "short");
-			/*
-			 * For long HPD pulses we want to have the digital queue happen,
-			 * but we still want HPD storm detection to function.
-			 */
 			queue_dig = true;
-			if (long_hpd) {
+			if (long_hpd)
 				dev_priv->hotplug.long_port_mask |= (1 << port);
-			} else {
-				/* for short HPD just trigger the digital queue */
+			else
 				dev_priv->hotplug.short_port_mask |= (1 << port);
-				continue;
-			}
+
 		}
 
 		if (dev_priv->hotplug.stats[pin].state == HPD_DISABLED) {
@@ -449,7 +458,7 @@ void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
 			queue_hp = true;
 		}
 
-		if (intel_hpd_irq_storm_detect(dev_priv, pin)) {
+		if (intel_hpd_irq_storm_detect(dev_priv, pin, long_hpd)) {
 			dev_priv->hotplug.event_bits &= ~BIT(pin);
 			storm_detected = true;
 		}
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Intel-gfx] [PATCH v2 3/3] drm/i915: Add short HPD IRQ storm detection for non-MST systems
  2018-10-26 22:49 ` [PATCH v2 3/3] drm/i915: Add short HPD IRQ storm detection for non-MST systems Lyude Paul
@ 2018-10-27  5:50   ` kbuild test robot
  2018-11-02 19:27   ` Ville Syrjälä
  1 sibling, 0 replies; 7+ messages in thread
From: kbuild test robot @ 2018-10-27  5:50 UTC (permalink / raw)
  To: Lyude Paul
  Cc: kbuild-all, intel-gfx, David Airlie, linux-kernel, dri-devel,
	Rodrigo Vivi

[-- Attachment #1: Type: text/plain, Size: 25665 bytes --]

Hi Lyude,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on v4.19 next-20181019]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Lyude-Paul/drm-i915-HPD-IRQ-storm-detection-fixes/20181027-085424
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   include/net/mac80211.h:977: warning: Function parameter or member 'status.rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'status.ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'status.ampdu_ack_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'status.ampdu_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'status.antenna' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'status.tx_time' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'status.status_driver_data' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'driver_rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'pad' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'rate_driver_data' not described in 'ieee80211_tx_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.chain_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.filtered' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_failed' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_count' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.lost_packets' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_tdls_pkt_time' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_retries' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_failed' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.ack_signal_filled' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.avg_ack_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.packets' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.bytes' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.last_rate' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.msdu' not described in 'sta_info'
   include/linux/dma-buf.h:304: warning: Function parameter or member 'cb_excl.cb' not described in 'dma_buf'
   include/linux/dma-buf.h:304: warning: Function parameter or member 'cb_excl.poll' not described in 'dma_buf'
   include/linux/dma-buf.h:304: warning: Function parameter or member 'cb_excl.active' not described in 'dma_buf'
   include/linux/dma-buf.h:304: warning: Function parameter or member 'cb_shared.cb' not described in 'dma_buf'
   include/linux/dma-buf.h:304: warning: Function parameter or member 'cb_shared.poll' not described in 'dma_buf'
   include/linux/dma-buf.h:304: warning: Function parameter or member 'cb_shared.active' not described in 'dma_buf'
   include/linux/dma-fence-array.h:54: warning: Function parameter or member 'work' not described in 'dma_fence_array'
   include/linux/gpio/driver.h:142: warning: Function parameter or member 'request_key' not described in 'gpio_irq_chip'
   include/linux/iio/hw-consumer.h:1: warning: no structured comments found
   include/linux/input/sparse-keymap.h:46: warning: Function parameter or member 'sw' not described in 'key_entry'
   drivers/pci/pci.c:218: warning: Excess function parameter 'p' description in 'pci_dev_str_match_path'
   include/linux/regulator/driver.h:227: warning: Function parameter or member 'resume' not described in 'regulator_ops'
   drivers/regulator/core.c:4479: warning: Excess function parameter 'state' description in 'regulator_suspend'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw0' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw1' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw2' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw3' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.eadm' not described in 'irb'
   drivers/slimbus/stream.c:1: warning: no structured comments found
   drivers/target/target_core_device.c:1: warning: no structured comments found
   drivers/usb/typec/bus.c:1: warning: no structured comments found
   drivers/usb/typec/class.c:1: warning: no structured comments found
   include/linux/w1.h:281: warning: Function parameter or member 'of_match_table' not described in 'w1_family'
   fs/direct-io.c:257: warning: Excess function parameter 'offset' description in 'dio_complete'
   fs/file_table.c:1: warning: no structured comments found
   fs/libfs.c:477: warning: Excess function parameter 'available' description in 'simple_write_end'
   fs/posix_acl.c:646: warning: Function parameter or member 'inode' not described in 'posix_acl_update_mode'
   fs/posix_acl.c:646: warning: Function parameter or member 'mode_p' not described in 'posix_acl_update_mode'
   fs/posix_acl.c:646: warning: Function parameter or member 'acl' not described in 'posix_acl_update_mode'
   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c:183: warning: Function parameter or member 'blockable' not described in 'amdgpu_mn_read_lock'
   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c:254: warning: Function parameter or member 'blockable' not described in 'amdgpu_mn_invalidate_range_start_gfx'
   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c:302: warning: Function parameter or member 'blockable' not described in 'amdgpu_mn_invalidate_range_start_hsa'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:382: warning: cannot understand function prototype: 'struct amdgpu_vm_pt_cursor '
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:383: warning: cannot understand function prototype: 'struct amdgpu_vm_pt_cursor '
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:554: warning: Function parameter or member 'adev' not described in 'for_each_amdgpu_vm_pt_leaf'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:554: warning: Function parameter or member 'vm' not described in 'for_each_amdgpu_vm_pt_leaf'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:554: warning: Function parameter or member 'start' not described in 'for_each_amdgpu_vm_pt_leaf'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:554: warning: Function parameter or member 'end' not described in 'for_each_amdgpu_vm_pt_leaf'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:554: warning: Function parameter or member 'cursor' not described in 'for_each_amdgpu_vm_pt_leaf'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:602: warning: Function parameter or member 'adev' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:602: warning: Function parameter or member 'vm' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:602: warning: Function parameter or member 'cursor' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:602: warning: Function parameter or member 'entry' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:847: warning: Function parameter or member 'level' not described in 'amdgpu_vm_bo_param'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1355: warning: Function parameter or member 'params' not described in 'amdgpu_vm_update_func'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1355: warning: Function parameter or member 'bo' not described in 'amdgpu_vm_update_func'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1355: warning: Function parameter or member 'pe' not described in 'amdgpu_vm_update_func'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1355: warning: Function parameter or member 'addr' not described in 'amdgpu_vm_update_func'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1355: warning: Function parameter or member 'count' not described in 'amdgpu_vm_update_func'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1355: warning: Function parameter or member 'incr' not described in 'amdgpu_vm_update_func'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1355: warning: Function parameter or member 'flags' not described in 'amdgpu_vm_update_func'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1522: warning: Function parameter or member 'params' not described in 'amdgpu_vm_update_huge'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1522: warning: Function parameter or member 'bo' not described in 'amdgpu_vm_update_huge'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1522: warning: Function parameter or member 'level' not described in 'amdgpu_vm_update_huge'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1522: warning: Function parameter or member 'pe' not described in 'amdgpu_vm_update_huge'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1522: warning: Function parameter or member 'addr' not described in 'amdgpu_vm_update_huge'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1522: warning: Function parameter or member 'count' not described in 'amdgpu_vm_update_huge'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1522: warning: Function parameter or member 'incr' not described in 'amdgpu_vm_update_huge'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1522: warning: Function parameter or member 'flags' not described in 'amdgpu_vm_update_huge'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:3095: warning: Function parameter or member 'pasid' not described in 'amdgpu_vm_make_compute'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_pin' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_unpin' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_res_obj' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_get_sg_table' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_import_sg_table' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_vmap' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_vunmap' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_mmap' not described in 'drm_driver'
   include/drm/drm_mode_config.h:869: warning: Function parameter or member 'quirk_addfb_prefer_xbgr_30bpp' not described in 'drm_mode_config'
   drivers/gpu/drm/drm_fourcc.c:112: warning: Function parameter or member 'dev' not described in 'drm_driver_legacy_fb_format'
   drivers/gpu/drm/drm_fourcc.c:112: warning: Excess function parameter 'native' description in 'drm_driver_legacy_fb_format'
>> drivers/gpu/drm/i915/intel_hotplug.c:147: warning: Function parameter or member 'long_hpd' not described in 'intel_hpd_irq_storm_detect'
   drivers/gpu/drm/i915/i915_vma.h:49: warning: cannot understand function prototype: 'struct i915_vma '
   drivers/gpu/drm/i915/i915_vma.h:1: warning: no structured comments found
   drivers/gpu/drm/i915/intel_guc_fwif.h:554: warning: cannot understand function prototype: 'struct guc_log_buffer_state '
   drivers/gpu/drm/i915/i915_trace.h:1: warning: no structured comments found
   include/linux/skbuff.h:860: warning: Function parameter or member 'dev_scratch' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'list' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'ip_defrag_offset' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'skb_mstamp' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member '__cloned_offset' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'head_frag' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member '__pkt_type_offset' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'encapsulation' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'encap_hdr_csum' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'csum_valid' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'csum_complete_sw' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'csum_level' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'inner_protocol_type' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'remcsum_offload' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'offload_fwd_mark' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'offload_mr_fwd_mark' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'sender_cpu' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'reserved_tailroom' not described in 'sk_buff'
   include/linux/skbuff.h:860: warning: Function parameter or member 'inner_ipproto' not described in 'sk_buff'
   include/net/sock.h:238: warning: Function parameter or member 'skc_addrpair' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_portpair' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_ipv6only' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_net_refcnt' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_v6_daddr' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_v6_rcv_saddr' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_cookie' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_listener' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_tw_dr' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_rcv_wnd' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_tw_rcv_nxt' not described in 'sock_common'
   include/net/sock.h:509: warning: Function parameter or member 'sk_backlog.rmem_alloc' not described in 'sock'
   include/net/sock.h:509: warning: Function parameter or member 'sk_backlog.len' not described in 'sock'
   include/net/sock.h:509: warning: Function parameter or member 'sk_backlog.head' not described in 'sock'
   include/net/sock.h:509: warning: Function parameter or member 'sk_backlog.tail' not described in 'sock'
   include/net/sock.h:509: warning: Function parameter or member 'sk_wq_raw' not described in 'sock'
   include/net/sock.h:509: warning: Function parameter or member 'tcp_rtx_queue' not described in 'sock'
   include/net/sock.h:509: warning: Function parameter or member 'sk_route_forced_caps' not described in 'sock'
   include/net/sock.h:509: warning: Function parameter or member 'sk_txtime_report_errors' not described in 'sock'
   include/net/sock.h:509: warning: Function parameter or member 'sk_validate_xmit_skb' not described in 'sock'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'adj_list.upper' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'adj_list.lower' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'gso_partial_features' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'switchdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'l3mdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'xfrmdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'tlsdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'name_assign_type' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'ieee802154_ptr' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'mpls_ptr' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'xdp_prog' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'gro_flush_timeout' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'nf_hooks_ingress' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member '____cacheline_aligned_in_smp' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'qdisc_hash' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'xps_cpus_map' not described in 'net_device'
   include/linux/netdevice.h:2018: warning: Function parameter or member 'xps_rxqs_map' not described in 'net_device'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(advertising' not described in 'phylink_link_state'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising' not described in 'phylink_link_state'
   sound/soc/soc-core.c:2918: warning: Excess function parameter 'legacy_dai_naming' description in 'snd_soc_register_dais'
   Documentation/admin-guide/cgroup-v2.rst:1485: WARNING: Block quote ends without a blank line; unexpected unindent.
   Documentation/admin-guide/cgroup-v2.rst:1487: WARNING: Block quote ends without a blank line; unexpected unindent.
   Documentation/admin-guide/cgroup-v2.rst:1488: WARNING: Block quote ends without a blank line; unexpected unindent.
   Documentation/core-api/boot-time-mm.rst:78: ERROR: Error in "kernel-doc" directive:
   unknown option: "nodocs".

vim +147 drivers/gpu/drm/i915/intel_hotplug.c

77913b39 Jani Nikula 2015-06-18  115  
77913b39 Jani Nikula 2015-06-18  116  /**
beb5d089 Lyude Paul  2018-10-26  117   * intel_hpd_irq_storm_detect - gather stats and detect HPD IRQ storm on a pin
77913b39 Jani Nikula 2015-06-18  118   * @dev_priv: private driver data pointer
77913b39 Jani Nikula 2015-06-18  119   * @pin: the pin to gather stats on
77913b39 Jani Nikula 2015-06-18  120   *
beb5d089 Lyude Paul  2018-10-26  121   * Gather stats about HPD IRQs from the specified @pin, and detect IRQ
77913b39 Jani Nikula 2015-06-18  122   * storms. Only the pin specific stats and state are changed, the caller is
77913b39 Jani Nikula 2015-06-18  123   * responsible for further action.
77913b39 Jani Nikula 2015-06-18  124   *
beb5d089 Lyude Paul  2018-10-26  125   * The number of IRQs that are allowed within @HPD_STORM_DETECT_PERIOD is
317eaa95 Lyude       2017-02-03  126   * stored in @dev_priv->hotplug.hpd_storm_threshold which defaults to
beb5d089 Lyude Paul  2018-10-26  127   * @HPD_STORM_DEFAULT_THRESHOLD. Long IRQs count as +10 to this threshold, and
beb5d089 Lyude Paul  2018-10-26  128   * short IRQs count as +1. If this threshold is exceeded, it's considered an
beb5d089 Lyude Paul  2018-10-26  129   * IRQ storm and the IRQ state is set to @HPD_MARK_DISABLED.
beb5d089 Lyude Paul  2018-10-26  130   *
beb5d089 Lyude Paul  2018-10-26  131   * By default, most systems will only count long IRQs towards
beb5d089 Lyude Paul  2018-10-26  132   * &dev_priv->hotplug.hpd_storm_threshold. However, some older systems also
beb5d089 Lyude Paul  2018-10-26  133   * suffer from short IRQ storms and must also track these. Because short IRQ
beb5d089 Lyude Paul  2018-10-26  134   * storms are naturally caused by sideband interactions with DP MST devices,
beb5d089 Lyude Paul  2018-10-26  135   * short IRQ detection is only enabled for systems without DP MST support.
beb5d089 Lyude Paul  2018-10-26  136   * Systems which are new enough to support DP MST are far less likely to
beb5d089 Lyude Paul  2018-10-26  137   * suffer from IRQ storms at all, so this is fine.
317eaa95 Lyude       2017-02-03  138   *
317eaa95 Lyude       2017-02-03  139   * The HPD threshold can be controlled through i915_hpd_storm_ctl in debugfs,
317eaa95 Lyude       2017-02-03  140   * and should only be adjusted for automated hotplug testing.
77913b39 Jani Nikula 2015-06-18  141   *
beb5d089 Lyude Paul  2018-10-26  142   * Return true if an IRQ storm was detected on @pin.
77913b39 Jani Nikula 2015-06-18  143   */
77913b39 Jani Nikula 2015-06-18  144  static bool intel_hpd_irq_storm_detect(struct drm_i915_private *dev_priv,
beb5d089 Lyude Paul  2018-10-26  145  				       enum hpd_pin pin, bool long_hpd)
77913b39 Jani Nikula 2015-06-18  146  {
77913b39 Jani Nikula 2015-06-18 @147  	unsigned long start = dev_priv->hotplug.stats[pin].last_jiffies;
77913b39 Jani Nikula 2015-06-18  148  	unsigned long end = start + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD);
317eaa95 Lyude       2017-02-03  149  	const int threshold = dev_priv->hotplug.hpd_storm_threshold;
beb5d089 Lyude Paul  2018-10-26  150  	const int increment = long_hpd ? 10 : 1;
77913b39 Jani Nikula 2015-06-18  151  	bool storm = false;
77913b39 Jani Nikula 2015-06-18  152  
beb5d089 Lyude Paul  2018-10-26  153  	if (!long_hpd && !dev_priv->hotplug.hpd_short_storm_enabled)
beb5d089 Lyude Paul  2018-10-26  154  		return false;
beb5d089 Lyude Paul  2018-10-26  155  
77913b39 Jani Nikula 2015-06-18  156  	if (!time_in_range(jiffies, start, end)) {
77913b39 Jani Nikula 2015-06-18  157  		dev_priv->hotplug.stats[pin].last_jiffies = jiffies;
77913b39 Jani Nikula 2015-06-18  158  		dev_priv->hotplug.stats[pin].count = 0;
77913b39 Jani Nikula 2015-06-18  159  		DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: 0\n", pin);
317eaa95 Lyude       2017-02-03  160  	} else if (dev_priv->hotplug.stats[pin].count > threshold &&
317eaa95 Lyude       2017-02-03  161  		   threshold) {
77913b39 Jani Nikula 2015-06-18  162  		dev_priv->hotplug.stats[pin].state = HPD_MARK_DISABLED;
77913b39 Jani Nikula 2015-06-18  163  		DRM_DEBUG_KMS("HPD interrupt storm detected on PIN %d\n", pin);
77913b39 Jani Nikula 2015-06-18  164  		storm = true;
77913b39 Jani Nikula 2015-06-18  165  	} else {
beb5d089 Lyude Paul  2018-10-26  166  		dev_priv->hotplug.stats[pin].count += increment;
77913b39 Jani Nikula 2015-06-18  167  		DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: %d\n", pin,
77913b39 Jani Nikula 2015-06-18  168  			      dev_priv->hotplug.stats[pin].count);
77913b39 Jani Nikula 2015-06-18  169  	}
77913b39 Jani Nikula 2015-06-18  170  
77913b39 Jani Nikula 2015-06-18  171  	return storm;
77913b39 Jani Nikula 2015-06-18  172  }
77913b39 Jani Nikula 2015-06-18  173  

:::::: The code at line 147 was first introduced by commit
:::::: 77913b39addfaa836929815515ff55cea1142b66 drm/i915: move generic hotplug code into new intel_hotplug.c file

:::::: TO: Jani Nikula <jani.nikula@intel.com>
:::::: CC: Daniel Vetter <daniel.vetter@ffwll.ch>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6570 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 3/3] drm/i915: Add short HPD IRQ storm detection for non-MST systems
  2018-10-26 22:49 ` [PATCH v2 3/3] drm/i915: Add short HPD IRQ storm detection for non-MST systems Lyude Paul
  2018-10-27  5:50   ` [Intel-gfx] " kbuild test robot
@ 2018-11-02 19:27   ` Ville Syrjälä
  1 sibling, 0 replies; 7+ messages in thread
From: Ville Syrjälä @ 2018-11-02 19:27 UTC (permalink / raw)
  To: Lyude Paul
  Cc: intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, dri-devel, linux-kernel

On Fri, Oct 26, 2018 at 06:49:09PM -0400, Lyude Paul wrote:
> Unfortunately, it seems that the HPD IRQ storm problem from the early
> days of Intel GPUs was never entirely solved, only mostly. Within the
> last couple of days, I got a bug report from one of our customers who
> had been having issues with their machine suddenly booting up very
> slowly after having updated. The amount of time it took to boot went
> from around 30 seconds, to over 6 minutes consistently.
> 
> After some investigation, I discovered that i915 was reporting massive
> amounts of short HPD IRQ spam on this system from the DisplayPort port,
> despite there not being anything actually connected. The symptoms would
> start with one "long" HPD IRQ being detected at boot:
> 
> [    1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00440000, dig 0x00440000, pins 0x000000a0
> [    1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long
> [    1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 5 - cnt: 0
> [    1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long
> [    1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 7 - cnt: 0
> [    1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long
> [    1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long
> …
> 
> followed by constant short IRQs afterwards:
> 
> [    1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status updated from unknown to disconnected
> [    1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) received hotplug event.
> [    1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3]
> [    1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
> [    1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [    1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
> [    1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [    1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x71450085
> [    1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
> [    1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [    1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
> [    1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [    1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
> 
> The customer's system in question has a GM45 GPU, which is apparently
> well known for hotplugging storms.
> 
> So, workaround this impressively broken hardware by changing the default
> HPD storm threshold from 5 to 50. Then, make long IRQs count for 10, and
> short IRQs count for 1. This makes it so that 5 long IRQs will trigger
> an HPD storm, and on systems with short HPD storm detection 50 short
> IRQs will trigger an HPD storm. 50 short IRQs amounts to 100ms of
> constant pulsing, which seems like a good middleground between being too
> sensitive and not being sensitive enough (which would cause visible
> stutters in userspace every time a storm occurs).
> 
> And just to be extra safe: we don't enable this by default on systems
> with MST support. There's too high of a chance of MST support triggering
> storm detection, and systems that are new enough to support MST are a
> lot less likely to have issues with IRQ storms anyway.
> 
> As a note: this patch was tested using a ThinkPad T450s and a Chamelium
> to simulate the short IRQ storms.
> 
> Changes since v1:
> - Don't use two separate thresholds, just make long IRQs count for 10
>   each and short IRQs count for 1. This simplifies the code a bit
>   - Ville Syrjälä
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c  | 74 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_drv.h      |  5 +-
>  drivers/gpu/drm/i915/i915_irq.c      |  7 +++
>  drivers/gpu/drm/i915/intel_hotplug.c | 47 +++++++++++-------
>  4 files changed, 113 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index b4744a68cd88..1595b8565875 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4641,6 +4641,79 @@ static const struct file_operations i915_hpd_storm_ctl_fops = {
>  	.write = i915_hpd_storm_ctl_write
>  };
>  
> +static int i915_hpd_short_storm_ctl_show(struct seq_file *m, void *data)
> +{
> +	struct drm_i915_private *dev_priv = m->private;
> +
> +	seq_printf(m, "Enabled: %s\n",
> +		   yesno(dev_priv->hotplug.hpd_short_storm_enabled));
> +
> +	return 0;
> +}
> +
> +static int
> +i915_hpd_short_storm_ctl_open(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, i915_hpd_short_storm_ctl_show,
> +			   inode->i_private);
> +}
> +
> +static ssize_t i915_hpd_short_storm_ctl_write(struct file *file,
> +					      const char __user *ubuf,
> +					      size_t len, loff_t *offp)
> +{
> +	struct seq_file *m = file->private_data;
> +	struct drm_i915_private *dev_priv = m->private;
> +	struct i915_hotplug *hotplug = &dev_priv->hotplug;
> +	char *newline;
> +	char tmp[16];
> +	int i;
> +	bool new_state;
> +
> +	if (len >= sizeof(tmp))
> +		return -EINVAL;
> +
> +	if (copy_from_user(tmp, ubuf, len))
> +		return -EFAULT;
> +
> +	tmp[len] = '\0';
> +
> +	/* Strip newline, if any */
> +	newline = strchr(tmp, '\n');
> +	if (newline)
> +		*newline = '\0';
> +
> +	/* Reset to the "default" state for this system */
> +	if (strcmp(tmp, "reset") == 0)
> +		new_state = !HAS_DP_MST(dev_priv);
> +	else if (kstrtobool(tmp, &new_state) != 0)
> +		return -EINVAL;
> +
> +	DRM_DEBUG_KMS("%sabling HPD short storm detection\n",
> +		      new_state ? "En" : "Dis");
> +
> +	spin_lock_irq(&dev_priv->irq_lock);
> +	hotplug->hpd_short_storm_enabled = new_state;
> +	/* Reset the HPD storm stats so we don't accidentally trigger a storm */
> +	for_each_hpd_pin(i)
> +		hotplug->stats[i].count = 0;
> +	spin_unlock_irq(&dev_priv->irq_lock);
> +
> +	/* Re-enable hpd immediately if we were in an irq storm */
> +	flush_delayed_work(&dev_priv->hotplug.reenable_work);
> +
> +	return len;
> +}
> +
> +static const struct file_operations i915_hpd_short_storm_ctl_fops = {
> +	.owner = THIS_MODULE,
> +	.open = i915_hpd_short_storm_ctl_open,
> +	.read = seq_read,
> +	.llseek = seq_lseek,
> +	.release = single_release,
> +	.write = i915_hpd_short_storm_ctl_write,
> +};
> +
>  static int i915_drrs_ctl_set(void *data, u64 val)
>  {
>  	struct drm_i915_private *dev_priv = data;
> @@ -4818,6 +4891,7 @@ static const struct i915_debugfs_files {
>  	{"i915_guc_log_level", &i915_guc_log_level_fops},
>  	{"i915_guc_log_relay", &i915_guc_log_relay_fops},
>  	{"i915_hpd_storm_ctl", &i915_hpd_storm_ctl_fops},
> +	{"i915_hpd_short_storm_ctl", &i915_hpd_short_storm_ctl_fops},
>  	{"i915_ipc_status", &i915_ipc_status_fops},
>  	{"i915_drrs_ctl", &i915_drrs_ctl_fops},
>  	{"i915_edp_psr_debug", &i915_edp_psr_debug_fops}
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 808204a7ca7c..a378e0fd2022 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -282,7 +282,8 @@ enum hpd_pin {
>  #define for_each_hpd_pin(__pin) \
>  	for ((__pin) = (HPD_NONE + 1); (__pin) < HPD_NUM_PINS; (__pin)++)
>  
> -#define HPD_STORM_DEFAULT_THRESHOLD 5
> +/* Threshold == 5 for long IRQs, 50 for short */
> +#define HPD_STORM_DEFAULT_THRESHOLD 50
>  
>  struct i915_hotplug {
>  	struct work_struct hotplug_work;
> @@ -307,6 +308,8 @@ struct i915_hotplug {
>  	bool poll_enabled;
>  
>  	unsigned int hpd_storm_threshold;
> +	/* Whether or not to count short HPD IRQs in HPD storms */
> +	u8 hpd_short_storm_enabled;
>  
>  	/*
>  	 * if we get a HPD irq from DP and a HPD irq from non-DP
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 10f28a2ee2e6..57970b3a22df 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -4843,6 +4843,13 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
>  		dev_priv->display_irqs_enabled = false;
>  
>  	dev_priv->hotplug.hpd_storm_threshold = HPD_STORM_DEFAULT_THRESHOLD;
> +	/* If we have MST support, we want to avoid doing short HPD IRQ storm
> +	 * detection, as short HPD storms will occur as a natural part of
> +	 * sideband messaging with MST.
> +	 * On older platforms however, IRQ storms can occur with both long and
> +	 * short pulses, as seen on some G4x systems.
> +	 */
> +	dev_priv->hotplug.hpd_short_storm_enabled = !HAS_DP_MST(dev_priv);
>  
>  	dev->driver->get_vblank_timestamp = drm_calc_vbltimestamp_from_scanoutpos;
>  	dev->driver->get_scanout_position = i915_get_crtc_scanoutpos;
> diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
> index 8326900a311e..aa6f52e723f0 100644
> --- a/drivers/gpu/drm/i915/intel_hotplug.c
> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
> @@ -114,32 +114,45 @@ enum hpd_pin intel_hpd_pin_default(struct drm_i915_private *dev_priv,
>  #define HPD_STORM_REENABLE_DELAY	(2 * 60 * 1000)
>  
>  /**
> - * intel_hpd_irq_storm_detect - gather stats and detect HPD irq storm on a pin
> + * intel_hpd_irq_storm_detect - gather stats and detect HPD IRQ storm on a pin
>   * @dev_priv: private driver data pointer
>   * @pin: the pin to gather stats on
>   *
> - * Gather stats about HPD irqs from the specified @pin, and detect irq
> + * Gather stats about HPD IRQs from the specified @pin, and detect IRQ
>   * storms. Only the pin specific stats and state are changed, the caller is
>   * responsible for further action.
>   *
> - * The number of irqs that are allowed within @HPD_STORM_DETECT_PERIOD is
> + * The number of IRQs that are allowed within @HPD_STORM_DETECT_PERIOD is
>   * stored in @dev_priv->hotplug.hpd_storm_threshold which defaults to
> - * @HPD_STORM_DEFAULT_THRESHOLD. If this threshold is exceeded, it's
> - * considered an irq storm and the irq state is set to @HPD_MARK_DISABLED.
> + * @HPD_STORM_DEFAULT_THRESHOLD. Long IRQs count as +10 to this threshold, and
> + * short IRQs count as +1. If this threshold is exceeded, it's considered an
> + * IRQ storm and the IRQ state is set to @HPD_MARK_DISABLED.
> + *
> + * By default, most systems will only count long IRQs towards
> + * &dev_priv->hotplug.hpd_storm_threshold. However, some older systems also
> + * suffer from short IRQ storms and must also track these. Because short IRQ
> + * storms are naturally caused by sideband interactions with DP MST devices,
> + * short IRQ detection is only enabled for systems without DP MST support.
> + * Systems which are new enough to support DP MST are far less likely to
> + * suffer from IRQ storms at all, so this is fine.
>   *
>   * The HPD threshold can be controlled through i915_hpd_storm_ctl in debugfs,
>   * and should only be adjusted for automated hotplug testing.
>   *
> - * Return true if an irq storm was detected on @pin.
> + * Return true if an IRQ storm was detected on @pin.
>   */
>  static bool intel_hpd_irq_storm_detect(struct drm_i915_private *dev_priv,
> -				       enum hpd_pin pin)
> +				       enum hpd_pin pin, bool long_hpd)
>  {
>  	unsigned long start = dev_priv->hotplug.stats[pin].last_jiffies;
>  	unsigned long end = start + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD);
>  	const int threshold = dev_priv->hotplug.hpd_storm_threshold;
> +	const int increment = long_hpd ? 10 : 1;
>  	bool storm = false;
>  
> +	if (!long_hpd && !dev_priv->hotplug.hpd_short_storm_enabled)
> +		return false;
> +
>  	if (!time_in_range(jiffies, start, end)) {
>  		dev_priv->hotplug.stats[pin].last_jiffies = jiffies;
>  		dev_priv->hotplug.stats[pin].count = 0;
> @@ -150,7 +163,7 @@ static bool intel_hpd_irq_storm_detect(struct drm_i915_private *dev_priv,
>  		DRM_DEBUG_KMS("HPD interrupt storm detected on PIN %d\n", pin);
>  		storm = true;
>  	} else {
> -		dev_priv->hotplug.stats[pin].count++;
> +		dev_priv->hotplug.stats[pin].count += increment;
>  		DRM_DEBUG_KMS("Received HPD interrupt on PIN %d - cnt: %d\n", pin,
>  			      dev_priv->hotplug.stats[pin].count);
>  	}
> @@ -405,28 +418,24 @@ void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
>  	for_each_intel_encoder(&dev_priv->drm, encoder) {
>  		enum hpd_pin pin = encoder->hpd_pin;
>  		bool has_hpd_pulse = intel_encoder_has_hpd_pulse(encoder);
> +		bool long_hpd = true;
>  
>  		if (!(BIT(pin) & pin_mask))
>  			continue;
>  
>  		if (has_hpd_pulse) {
> -			bool long_hpd = long_mask & BIT(pin);
>  			enum port port = encoder->port;
>  
> +			long_hpd = !!(long_mask & BIT(pin));
> +
>  			DRM_DEBUG_DRIVER("digital hpd port %c - %s\n", port_name(port),
>  					 long_hpd ? "long" : "short");
> -			/*
> -			 * For long HPD pulses we want to have the digital queue happen,
> -			 * but we still want HPD storm detection to function.
> -			 */
>  			queue_dig = true;
> -			if (long_hpd) {
> +			if (long_hpd)
>  				dev_priv->hotplug.long_port_mask |= (1 << port);
> -			} else {
> -				/* for short HPD just trigger the digital queue */
> +			else
>  				dev_priv->hotplug.short_port_mask |= (1 << port);
> -				continue;
> -			}
> +
>  		}
>  
>  		if (dev_priv->hotplug.stats[pin].state == HPD_DISABLED) {
> @@ -449,7 +458,7 @@ void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
>  			queue_hp = true;
>  		}
>  
> -		if (intel_hpd_irq_storm_detect(dev_priv, pin)) {
> +		if (intel_hpd_irq_storm_detect(dev_priv, pin, long_hpd)) {
>  			dev_priv->hotplug.event_bits &= ~BIT(pin);
>  			storm_detected = true;
>  		}

Looks like we won't actually disable the hpd irq anywhere. Even the
current long hpd storm handling seems a bit confused as it will
call .hpd_irq_setup() from intel_hpd_irq_handler() but actually
marking the pins as disabled won't happen until
i915_hotplug_work_func(). So the interrupt won't actually get disabled
until the second time through intel_hpd_irq_handler() after we detected
a storm.

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/3] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST
  2018-10-26 22:49 ` [PATCH v2 2/3] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST Lyude Paul
@ 2018-11-02 20:41   ` Ville Syrjälä
  0 siblings, 0 replies; 7+ messages in thread
From: Ville Syrjälä @ 2018-11-02 20:41 UTC (permalink / raw)
  To: Lyude Paul
  Cc: intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, dri-devel, linux-kernel

On Fri, Oct 26, 2018 at 06:49:08PM -0400, Lyude Paul wrote:
> Turns out that if you trigger an HPD storm on a system that has an MST
> topology connected to it, you'll end up causing the kernel to eventually
> hit a NULL deref:
> 
> [  332.339041] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ec
> [  332.340906] PGD 0 P4D 0
> [  332.342750] Oops: 0000 [#1] SMP PTI
> [  332.344579] CPU: 2 PID: 25 Comm: kworker/2:0 Kdump: loaded Tainted: G           O      4.18.0-rc3short-hpd-storm+ #2
> [  332.346453] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
> [  332.348361] Workqueue: events intel_hpd_irq_storm_reenable_work [i915]
> [  332.350301] RIP: 0010:intel_hpd_irq_storm_reenable_work.cold.3+0x2f/0x86 [i915]
> [  332.352213] Code: 00 00 ba e8 00 00 00 48 c7 c6 c0 aa 5f a0 48 c7 c7 d0 73 62 a0 4c 89 c1 4c 89 04 24 e8 7f f5 af e0 4c 8b 04 24 44 89 f8 29 e8 <41> 39 80 ec 00 00 00 0f 85 43 13 fc ff 41 0f b6 86 b8 04 00 00 41
> [  332.354286] RSP: 0018:ffffc90000147e48 EFLAGS: 00010006
> [  332.356344] RAX: 0000000000000005 RBX: ffff8802c226c9d4 RCX: 0000000000000006
> [  332.358404] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88032dc95570
> [  332.360466] RBP: 0000000000000005 R08: 0000000000000000 R09: ffff88031b3dc840
> [  332.362528] R10: 0000000000000000 R11: 000000031a069602 R12: ffff8802c226ca20
> [  332.364575] R13: ffff8802c2268000 R14: ffff880310661000 R15: 000000000000000a
> [  332.366615] FS:  0000000000000000(0000) GS:ffff88032dc80000(0000) knlGS:0000000000000000
> [  332.368658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  332.370690] CR2: 00000000000000ec CR3: 000000000200a003 CR4: 00000000003606e0
> [  332.372724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  332.374773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  332.376798] Call Trace:
> [  332.378809]  process_one_work+0x1a1/0x350
> [  332.380806]  worker_thread+0x30/0x380
> [  332.382777]  ? wq_update_unbound_numa+0x10/0x10
> [  332.384772]  kthread+0x112/0x130
> [  332.386740]  ? kthread_create_worker_on_cpu+0x70/0x70
> [  332.388706]  ret_from_fork+0x35/0x40
> [  332.390651] Modules linked in: i915(O) vfat fat joydev btusb btrtl btbcm btintel bluetooth ecdh_generic iTCO_wdt wmi_bmof i2c_algo_bit drm_kms_helper intel_rapl syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt coretemp fb_sys_fops crc32_pclmul drm psmouse pcspkr mei_me mei i2c_i801 lpc_ich mfd_core i2c_core tpm_tis tpm_tis_core thinkpad_acpi wmi tpm rfkill video crc32c_intel serio_raw ehci_pci xhci_pci ehci_hcd xhci_hcd [last unloaded: i915]
> [  332.394963] CR2: 00000000000000ec
> 
> This appears to be due to the fact that with an MST topology, not all
> intel_connector structs will have ->encoder set. So, fix this by
> skipping connectors without encoders in
> intel_hpd_irq_storm_reenable_work().
> 
> For those wondering, this bug was found on accident while simulating HPD
> storms using a Chamelium connected to a ThinkPad T450s (Broadwell).
> 
> Changes since v1:
> - Check intel_connector->mst_port instead of intel_connector->encoder
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>

1 and 2 are
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/intel_hotplug.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
> index 648a13c6043c..8326900a311e 100644
> --- a/drivers/gpu/drm/i915/intel_hotplug.c
> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
> @@ -228,7 +228,9 @@ static void intel_hpd_irq_storm_reenable_work(struct work_struct *work)
>  		drm_for_each_connector_iter(connector, &conn_iter) {
>  			struct intel_connector *intel_connector = to_intel_connector(connector);
>  
> -			if (intel_connector->encoder->hpd_pin == pin) {
> +			/* Don't check MST ports, they don't have pins */
> +			if (!intel_connector->mst_port &&
> +			    intel_connector->encoder->hpd_pin == pin) {
>  				if (connector->polled != intel_connector->polled)
>  					DRM_DEBUG_DRIVER("Reenabling HPD on connector %s\n",
>  							 connector->name);
> -- 
> 2.17.2

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-11-02 20:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-26 22:49 [PATCH v2 0/3] drm/i915: HPD IRQ storm detection fixes Lyude Paul
2018-10-26 22:49 ` [PATCH v2 1/3] drm/i915: Fix possible race in intel_dp_add_mst_connector() Lyude Paul
2018-10-26 22:49 ` [PATCH v2 2/3] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST Lyude Paul
2018-11-02 20:41   ` Ville Syrjälä
2018-10-26 22:49 ` [PATCH v2 3/3] drm/i915: Add short HPD IRQ storm detection for non-MST systems Lyude Paul
2018-10-27  5:50   ` [Intel-gfx] " kbuild test robot
2018-11-02 19:27   ` Ville Syrjälä

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).