[1/2] drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST
diff mbox series

Message ID 20181025222657.9241-2-lyude@redhat.com
State Superseded
Headers show
Series
  • drm/i915: HPD IRQ storm detection fixes
Related show

Commit Message

Lyude Paul Oct. 25, 2018, 10:26 p.m. UTC
Turns out that if you trigger an HPD storm on a system that has an MST
topology connected to it, you'll end up causing the kernel to eventually
hit a NULL deref:

[  332.339041] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ec
[  332.340906] PGD 0 P4D 0
[  332.342750] Oops: 0000 [#1] SMP PTI
[  332.344579] CPU: 2 PID: 25 Comm: kworker/2:0 Kdump: loaded Tainted: G           O      4.18.0-rc3short-hpd-storm+ #2
[  332.346453] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
[  332.348361] Workqueue: events intel_hpd_irq_storm_reenable_work [i915]
[  332.350301] RIP: 0010:intel_hpd_irq_storm_reenable_work.cold.3+0x2f/0x86 [i915]
[  332.352213] Code: 00 00 ba e8 00 00 00 48 c7 c6 c0 aa 5f a0 48 c7 c7 d0 73 62 a0 4c 89 c1 4c 89 04 24 e8 7f f5 af e0 4c 8b 04 24 44 89 f8 29 e8 <41> 39 80 ec 00 00 00 0f 85 43 13 fc ff 41 0f b6 86 b8 04 00 00 41
[  332.354286] RSP: 0018:ffffc90000147e48 EFLAGS: 00010006
[  332.356344] RAX: 0000000000000005 RBX: ffff8802c226c9d4 RCX: 0000000000000006
[  332.358404] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88032dc95570
[  332.360466] RBP: 0000000000000005 R08: 0000000000000000 R09: ffff88031b3dc840
[  332.362528] R10: 0000000000000000 R11: 000000031a069602 R12: ffff8802c226ca20
[  332.364575] R13: ffff8802c2268000 R14: ffff880310661000 R15: 000000000000000a
[  332.366615] FS:  0000000000000000(0000) GS:ffff88032dc80000(0000) knlGS:0000000000000000
[  332.368658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  332.370690] CR2: 00000000000000ec CR3: 000000000200a003 CR4: 00000000003606e0
[  332.372724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  332.374773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  332.376798] Call Trace:
[  332.378809]  process_one_work+0x1a1/0x350
[  332.380806]  worker_thread+0x30/0x380
[  332.382777]  ? wq_update_unbound_numa+0x10/0x10
[  332.384772]  kthread+0x112/0x130
[  332.386740]  ? kthread_create_worker_on_cpu+0x70/0x70
[  332.388706]  ret_from_fork+0x35/0x40
[  332.390651] Modules linked in: i915(O) vfat fat joydev btusb btrtl btbcm btintel bluetooth ecdh_generic iTCO_wdt wmi_bmof i2c_algo_bit drm_kms_helper intel_rapl syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt coretemp fb_sys_fops crc32_pclmul drm psmouse pcspkr mei_me mei i2c_i801 lpc_ich mfd_core i2c_core tpm_tis tpm_tis_core thinkpad_acpi wmi tpm rfkill video crc32c_intel serio_raw ehci_pci xhci_pci ehci_hcd xhci_hcd [last unloaded: i915]
[  332.394963] CR2: 00000000000000ec

This appears to be due to the fact that with an MST topology, not all
intel_connector structs will have ->encoder set. So, fix this by
skipping connectors without encoders in
intel_hpd_irq_storm_reenable_work().

For those wondering, this bug was found on accident while simulating HPD
storms using a Chamelium connected to a ThinkPad T450s (Broadwell).

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Ville Syrjälä Oct. 26, 2018, 5:38 p.m. UTC | #1
On Thu, Oct 25, 2018 at 06:26:56PM -0400, Lyude Paul wrote:
> Turns out that if you trigger an HPD storm on a system that has an MST
> topology connected to it, you'll end up causing the kernel to eventually
> hit a NULL deref:
> 
> [  332.339041] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ec
> [  332.340906] PGD 0 P4D 0
> [  332.342750] Oops: 0000 [#1] SMP PTI
> [  332.344579] CPU: 2 PID: 25 Comm: kworker/2:0 Kdump: loaded Tainted: G           O      4.18.0-rc3short-hpd-storm+ #2
> [  332.346453] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
> [  332.348361] Workqueue: events intel_hpd_irq_storm_reenable_work [i915]
> [  332.350301] RIP: 0010:intel_hpd_irq_storm_reenable_work.cold.3+0x2f/0x86 [i915]
> [  332.352213] Code: 00 00 ba e8 00 00 00 48 c7 c6 c0 aa 5f a0 48 c7 c7 d0 73 62 a0 4c 89 c1 4c 89 04 24 e8 7f f5 af e0 4c 8b 04 24 44 89 f8 29 e8 <41> 39 80 ec 00 00 00 0f 85 43 13 fc ff 41 0f b6 86 b8 04 00 00 41
> [  332.354286] RSP: 0018:ffffc90000147e48 EFLAGS: 00010006
> [  332.356344] RAX: 0000000000000005 RBX: ffff8802c226c9d4 RCX: 0000000000000006
> [  332.358404] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88032dc95570
> [  332.360466] RBP: 0000000000000005 R08: 0000000000000000 R09: ffff88031b3dc840
> [  332.362528] R10: 0000000000000000 R11: 000000031a069602 R12: ffff8802c226ca20
> [  332.364575] R13: ffff8802c2268000 R14: ffff880310661000 R15: 000000000000000a
> [  332.366615] FS:  0000000000000000(0000) GS:ffff88032dc80000(0000) knlGS:0000000000000000
> [  332.368658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  332.370690] CR2: 00000000000000ec CR3: 000000000200a003 CR4: 00000000003606e0
> [  332.372724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  332.374773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  332.376798] Call Trace:
> [  332.378809]  process_one_work+0x1a1/0x350
> [  332.380806]  worker_thread+0x30/0x380
> [  332.382777]  ? wq_update_unbound_numa+0x10/0x10
> [  332.384772]  kthread+0x112/0x130
> [  332.386740]  ? kthread_create_worker_on_cpu+0x70/0x70
> [  332.388706]  ret_from_fork+0x35/0x40
> [  332.390651] Modules linked in: i915(O) vfat fat joydev btusb btrtl btbcm btintel bluetooth ecdh_generic iTCO_wdt wmi_bmof i2c_algo_bit drm_kms_helper intel_rapl syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt coretemp fb_sys_fops crc32_pclmul drm psmouse pcspkr mei_me mei i2c_i801 lpc_ich mfd_core i2c_core tpm_tis tpm_tis_core thinkpad_acpi wmi tpm rfkill video crc32c_intel serio_raw ehci_pci xhci_pci ehci_hcd xhci_hcd [last unloaded: i915]
> [  332.394963] CR2: 00000000000000ec
> 
> This appears to be due to the fact that with an MST topology, not all
> intel_connector structs will have ->encoder set. So, fix this by
> skipping connectors without encoders in
> intel_hpd_irq_storm_reenable_work().
> 
> For those wondering, this bug was found on accident while simulating HPD
> storms using a Chamelium connected to a ThinkPad T450s (Broadwell).
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_hotplug.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
> index 648a13c6043c..7d21aac10d16 100644
> --- a/drivers/gpu/drm/i915/intel_hotplug.c
> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
> @@ -228,7 +228,8 @@ static void intel_hpd_irq_storm_reenable_work(struct work_struct *work)
>  		drm_for_each_connector_iter(connector, &conn_iter) {
>  			struct intel_connector *intel_connector = to_intel_connector(connector);
>  
> -			if (intel_connector->encoder->hpd_pin == pin) {
> +			if (intel_connector->encoder &&
> +			    intel_connector->encoder->hpd_pin == pin) {

Hmm. Yeah, with MST that assignment happens at enable time. I guess we
currently don't clear that pointer ever, so this seems safe. If we start
to clear it this becomes racy.

Maybe it would be better to just skip MST connectors entirely? Ah,
i915_hpd_poll_init_work() already used ->mst_port for just that. So
probabloy better follow the same example here.

Hmm. That seems a bit racy as well though. We'd need to
reorder intel_dp_add_mst_connector() to assign ->mst_port
before connector_init() to prevent that race.

>  				if (connector->polled != intel_connector->polled)
>  					DRM_DEBUG_DRIVER("Reenabling HPD on connector %s\n",
>  							 connector->name);
> -- 
> 2.17.2

Patch
diff mbox series

diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
index 648a13c6043c..7d21aac10d16 100644
--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -228,7 +228,8 @@  static void intel_hpd_irq_storm_reenable_work(struct work_struct *work)
 		drm_for_each_connector_iter(connector, &conn_iter) {
 			struct intel_connector *intel_connector = to_intel_connector(connector);
 
-			if (intel_connector->encoder->hpd_pin == pin) {
+			if (intel_connector->encoder &&
+			    intel_connector->encoder->hpd_pin == pin) {
 				if (connector->polled != intel_connector->polled)
 					DRM_DEBUG_DRIVER("Reenabling HPD on connector %s\n",
 							 connector->name);