linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs
@ 2018-08-07 20:38 Lyude Paul
  2018-08-07 20:38 ` [PATCH v5 01/13] drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement Lyude Paul
                   ` (12 more replies)
  0 siblings, 13 replies; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:38 UTC (permalink / raw)
  To: nouveau
  Cc: David Airlie, linux-kernel, dri-devel, Ben Skeggs,
	Maarten Lankhorst, Daniel Vetter, Ilia Mirkin,
	Ville Syrjälä,
	Lyude Paul

This is the latest version of https://patchwork.freedesktop.org/series/46815/

I moved everything out of fb_helper and back into nouveau, since as far
as I can tell the other drivers already handle this themselves.

Lyude Paul (13):
  drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement
  drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()
  drm/nouveau: Remove useless poll_enable() call in
    switcheroo_set_state()
  drm/nouveau: Remove useless poll_disable() call in
    switcheroo_set_state()
  drm/nouveau: Remove useless poll_enable() call in drm_load()
  drm/nouveau: Call optimus_dsm functions after nouveau_do_suspend()
  drm/nouveau: Add missing unroll functions in nouveau_do_suspend()
  drm/nouveau: Don't ignore nouveau_do_suspend() ret value
  drm/nouveau: Fix deadlock with fb_helper by inhibiting its HPD
  drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()
  drm/nouveau: Respond to HPDs by probing one conn at a time
  drm/nouveau: Fix deadlocks in nouveau_connector_detect()
  drm/nouveau: Call pm_runtime_get_noresume() from hpd handlers

 drivers/gpu/drm/nouveau/dispnv50/disp.c     |  2 +-
 drivers/gpu/drm/nouveau/nouveau_connector.c | 60 +++++++++++---
 drivers/gpu/drm/nouveau/nouveau_connector.h |  1 +
 drivers/gpu/drm/nouveau/nouveau_display.c   | 55 +++++++++----
 drivers/gpu/drm/nouveau/nouveau_drm.c       | 23 ++++--
 drivers/gpu/drm/nouveau/nouveau_fbcon.c     | 90 +++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_fbcon.h     |  9 +++
 drivers/gpu/drm/nouveau/nouveau_vga.c       |  2 -
 8 files changed, 209 insertions(+), 33 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v5 01/13] drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
@ 2018-08-07 20:38 ` Lyude Paul
  2018-08-07 20:38 ` [PATCH v5 02/13] drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend() Lyude Paul
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:38 UTC (permalink / raw)
  To: nouveau
  Cc: Lukas Wunner, Peter Ujfalusi, stable, Ben Skeggs, David Airlie,
	dri-devel, linux-kernel

Turns out this part is my fault for not noticing when reviewing
9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling"). Currently
we call drm_kms_helper_poll_enable() from nouveau_display_hpd_work().
This makes basically no sense, however, because it means we're calling
drm_kms_helper_poll_enable() every time we schedule the hotplug
detection work. It also goes against the advice in
drm_kms_helper_poll_enable()'s documentation:

 Note that calls to enable and disable polling must be strictly ordered,
 which is automatically the case when they're only called from
 suspend/resume callbacks.

Of course, hotplugs can't really be ordered. They could even happen
immediately after we called drm_kms_helper_poll_disable() in
nouveau_display_fini(), which can lead to all sorts of issues.

Additionally, enabling polling /after/ we call
drm_helper_hpd_irq_event() could also mean that we'd miss a hotplug
event anyway, since drm_helper_hpd_irq_event() won't bother trying to
probe connectors so long as polling is disabled.
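
The ordering problem can be sketched with a toy userspace model (the
names below are illustrative stand-ins, not the real DRM helpers): the
event handler only probes while polling is enabled, so a hotplug that
fires before the enable call is silently dropped.

```c
#include <assert.h>
#include <stdbool.h>

static bool poll_enabled;
static int probes_run;

static void poll_enable(void)  { poll_enabled = true; }
static void poll_disable(void) { poll_enabled = false; }

/* Returns true if the hotplug was probed, false if it was dropped. */
static bool hpd_irq_event(void)
{
	if (!poll_enabled)
		return false;	/* lost: nothing will re-probe this event */
	probes_run++;
	return true;
}
```

Enabling polling once, up front (as nouveau_display_init() does after
this patch), avoids the window in which events are dropped.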

So, simply move this back into nouveau_display_init() again. The race
condition that both of these patches attempted to work around has
already been fixed properly in

  d61a5c106351 ("drm/nouveau: Fix deadlock on runtime suspend")

Fixes: 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling")
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index ec7861457b84..1d36ab5d4796 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -355,8 +355,6 @@ nouveau_display_hpd_work(struct work_struct *work)
 	pm_runtime_get_sync(drm->dev->dev);
 
 	drm_helper_hpd_irq_event(drm->dev);
-	/* enable polling for external displays */
-	drm_kms_helper_poll_enable(drm->dev);
 
 	pm_runtime_mark_last_busy(drm->dev->dev);
 	pm_runtime_put_sync(drm->dev->dev);
@@ -411,6 +409,11 @@ nouveau_display_init(struct drm_device *dev)
 	if (ret)
 		return ret;
 
+	/* enable connector detection and polling for connectors without HPD
+	 * support
+	 */
+	drm_kms_helper_poll_enable(dev);
+
 	/* enable hotplug interrupts */
 	drm_connector_list_iter_begin(dev, &conn_iter);
 	nouveau_for_each_non_mst_connector_iter(connector, &conn_iter) {
-- 
2.17.1



* [PATCH v5 02/13] drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
  2018-08-07 20:38 ` [PATCH v5 01/13] drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement Lyude Paul
@ 2018-08-07 20:38 ` Lyude Paul
  2018-08-07 20:39 ` [PATCH v5 03/13] drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state() Lyude Paul
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:38 UTC (permalink / raw)
  To: nouveau
  Cc: stable, Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie,
	dri-devel, linux-kernel

Since actual hotplug notifications don't get disabled until
nouveau_display_fini() is called, all this will do is cause any hotplugs
that happen between this drm_kms_helper_poll_disable() call and the
actual hotplug disablement to potentially be dropped if ACPI isn't
around to help us.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index c7ec86d6c3c9..5fdc1fbe2ee5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -835,7 +835,6 @@ nouveau_pmops_runtime_suspend(struct device *dev)
 		return -EBUSY;
 	}
 
-	drm_kms_helper_poll_disable(drm_dev);
 	nouveau_switcheroo_optimus_dsm();
 	ret = nouveau_do_suspend(drm_dev, true);
 	pci_save_state(pdev);
-- 
2.17.1



* [PATCH v5 03/13] drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state()
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
  2018-08-07 20:38 ` [PATCH v5 01/13] drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement Lyude Paul
  2018-08-07 20:38 ` [PATCH v5 02/13] drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend() Lyude Paul
@ 2018-08-07 20:39 ` Lyude Paul
  2018-08-07 20:39 ` [PATCH v5 04/13] drm/nouveau: Remove useless poll_disable() " Lyude Paul
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:39 UTC (permalink / raw)
  To: nouveau
  Cc: Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie, dri-devel,
	linux-kernel

This doesn't do anything: drm_kms_helper_poll_enable() is already called
in
nouveau_pmops_resume()->nouveau_display_resume()->nouveau_display_init().

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
[omitting stable Cc, this likely doesn't fix any real bugs]
---
 drivers/gpu/drm/nouveau/nouveau_vga.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_vga.c b/drivers/gpu/drm/nouveau/nouveau_vga.c
index 3da5a4305aa4..09b1d8151881 100644
--- a/drivers/gpu/drm/nouveau/nouveau_vga.c
+++ b/drivers/gpu/drm/nouveau/nouveau_vga.c
@@ -46,7 +46,6 @@ nouveau_switcheroo_set_state(struct pci_dev *pdev,
 		pr_err("VGA switcheroo: switched nouveau on\n");
 		dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
 		nouveau_pmops_resume(&pdev->dev);
-		drm_kms_helper_poll_enable(dev);
 		dev->switch_power_state = DRM_SWITCH_POWER_ON;
 	} else {
 		pr_err("VGA switcheroo: switched nouveau off\n");
-- 
2.17.1



* [PATCH v5 04/13] drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state()
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
                   ` (2 preceding siblings ...)
  2018-08-07 20:39 ` [PATCH v5 03/13] drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state() Lyude Paul
@ 2018-08-07 20:39 ` Lyude Paul
  2018-08-07 20:39 ` [PATCH v5 05/13] drm/nouveau: Remove useless poll_enable() call in drm_load() Lyude Paul
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:39 UTC (permalink / raw)
  To: nouveau
  Cc: Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie, dri-devel,
	linux-kernel

This won't do anything except potentially make us miss hotplugs, since
we already call drm_kms_helper_poll_disable() in
nouveau_pmops_suspend()->nouveau_display_suspend()->nouveau_display_fini().

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
[omitting stable Cc, this likely doesn't fix anything important enough
for that]
---
 drivers/gpu/drm/nouveau/nouveau_vga.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_vga.c b/drivers/gpu/drm/nouveau/nouveau_vga.c
index 09b1d8151881..8f1ce4833230 100644
--- a/drivers/gpu/drm/nouveau/nouveau_vga.c
+++ b/drivers/gpu/drm/nouveau/nouveau_vga.c
@@ -50,7 +50,6 @@ nouveau_switcheroo_set_state(struct pci_dev *pdev,
 	} else {
 		pr_err("VGA switcheroo: switched nouveau off\n");
 		dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
-		drm_kms_helper_poll_disable(dev);
 		nouveau_switcheroo_optimus_dsm();
 		nouveau_pmops_suspend(&pdev->dev);
 		dev->switch_power_state = DRM_SWITCH_POWER_OFF;
-- 
2.17.1



* [PATCH v5 05/13] drm/nouveau: Remove useless poll_enable() call in drm_load()
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
                   ` (3 preceding siblings ...)
  2018-08-07 20:39 ` [PATCH v5 04/13] drm/nouveau: Remove useless poll_disable() " Lyude Paul
@ 2018-08-07 20:39 ` Lyude Paul
  2018-08-11  9:45   ` [Nouveau] " Karol Herbst
  2018-08-07 20:39 ` [PATCH v5 06/13] drm/nouveau: Call optimus_dsm functions after nouveau_do_suspend() Lyude Paul
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:39 UTC (permalink / raw)
  To: nouveau
  Cc: Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie, dri-devel,
	linux-kernel

Again, this doesn't do anything: drm_kms_helper_poll_enable() will have
already been called in nouveau_display_init().

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
[omitting stable Cc, this probably doesn't fix any real bugs]
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 5fdc1fbe2ee5..04f704b77a3c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -592,10 +592,8 @@ nouveau_drm_load(struct drm_device *dev, unsigned long flags)
 		pm_runtime_allow(dev->dev);
 		pm_runtime_mark_last_busy(dev->dev);
 		pm_runtime_put(dev->dev);
-	} else {
-		/* enable polling for external displays */
-		drm_kms_helper_poll_enable(dev);
 	}
+
 	return 0;
 
 fail_dispinit:
-- 
2.17.1



* [PATCH v5 06/13] drm/nouveau: Call optimus_dsm functions after nouveau_do_suspend()
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
                   ` (4 preceding siblings ...)
  2018-08-07 20:39 ` [PATCH v5 05/13] drm/nouveau: Remove useless poll_enable() call in drm_load() Lyude Paul
@ 2018-08-07 20:39 ` Lyude Paul
  2018-08-10 21:35   ` [Nouveau] " Karol Herbst
  2018-08-07 20:39 ` [PATCH v5 07/13] drm/nouveau: Add missing unroll functions in nouveau_do_suspend() Lyude Paul
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:39 UTC (permalink / raw)
  To: nouveau
  Cc: stable, Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie,
	dri-devel, linux-kernel

It's not necessary to call these before we call nouveau_do_suspend().
Ideally, we shouldn't have to call them here at all, but getting that to
work will take a bit more effort. So for the time being, just move this
call after nouveau_do_suspend(), which lets us easily abort the runtime
suspend process when nouveau_do_suspend() fails.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 04f704b77a3c..5ea8fe992484 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -833,8 +833,8 @@ nouveau_pmops_runtime_suspend(struct device *dev)
 		return -EBUSY;
 	}
 
-	nouveau_switcheroo_optimus_dsm();
 	ret = nouveau_do_suspend(drm_dev, true);
+	nouveau_switcheroo_optimus_dsm();
 	pci_save_state(pdev);
 	pci_disable_device(pdev);
 	pci_ignore_hotplug(pdev);
-- 
2.17.1



* [PATCH v5 07/13] drm/nouveau: Add missing unroll functions in nouveau_do_suspend()
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
                   ` (5 preceding siblings ...)
  2018-08-07 20:39 ` [PATCH v5 06/13] drm/nouveau: Call optimus_dsm functions after nouveau_do_suspend() Lyude Paul
@ 2018-08-07 20:39 ` Lyude Paul
  2018-08-10 21:36   ` [Nouveau] " Karol Herbst
  2018-08-07 20:39 ` [PATCH v5 08/13] drm/nouveau: Don't ignore nouveau_do_suspend() ret value Lyude Paul
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:39 UTC (permalink / raw)
  To: nouveau
  Cc: stable, Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie,
	dri-devel, linux-kernel

Currently, it appears that nouveau_do_suspend() forgets to roll back
suspending fbcon and suspending the device LEDs. We also currently skip
the entire rollback process if nouveau_display_suspend() fails. So, fix
that.
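
The rollback this patch adds follows the usual kernel goto-unwind idiom,
which can be sketched in a minimal userspace model (function names are
illustrative, not nouveau's): a step that fails rolls back every step
completed before it, in reverse order.

```c
#include <assert.h>

static int fbcon_suspended, display_suspended;

static void fbcon_set_suspend(int state) { fbcon_suspended = state; }

static int display_suspend(int should_fail)
{
	if (should_fail)
		return -1;
	display_suspended = 1;
	return 0;
}

static int do_suspend(int fail_display)
{
	int ret;

	fbcon_set_suspend(1);

	ret = display_suspend(fail_display);
	if (ret)
		goto fail_fbcon;

	return 0;

fail_fbcon:
	/* undo the fbcon suspend we already performed */
	fbcon_set_suspend(0);
	return ret;
}
```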

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 5ea8fe992484..db56e9b6b6af 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -674,10 +674,11 @@ nouveau_do_suspend(struct drm_device *dev, bool runtime)
 	if (dev->mode_config.num_crtc) {
 		NV_DEBUG(drm, "suspending console...\n");
 		nouveau_fbcon_set_suspend(dev, 1);
+
 		NV_DEBUG(drm, "suspending display...\n");
 		ret = nouveau_display_suspend(dev, runtime);
 		if (ret)
-			return ret;
+			goto fail_fbcon;
 	}
 
 	NV_DEBUG(drm, "evicting buffers...\n");
@@ -719,7 +720,14 @@ nouveau_do_suspend(struct drm_device *dev, bool runtime)
 	if (dev->mode_config.num_crtc) {
 		NV_DEBUG(drm, "resuming display...\n");
 		nouveau_display_resume(dev, runtime);
+
+fail_fbcon:
+		NV_DEBUG(drm, "resuming console...\n");
+		nouveau_fbcon_set_suspend(dev, 0);
 	}
+
+	nouveau_led_resume(dev);
+
 	return ret;
 }
 
-- 
2.17.1



* [PATCH v5 08/13] drm/nouveau: Don't ignore nouveau_do_suspend() ret value
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
                   ` (6 preceding siblings ...)
  2018-08-07 20:39 ` [PATCH v5 07/13] drm/nouveau: Add missing unroll functions in nouveau_do_suspend() Lyude Paul
@ 2018-08-07 20:39 ` Lyude Paul
  2018-08-10 21:37   ` [Nouveau] " Karol Herbst
  2018-08-07 20:39 ` [PATCH v5 09/13] drm/nouveau: Fix deadlock with fb_helper by inhibiting its HPD Lyude Paul
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:39 UTC (permalink / raw)
  To: nouveau
  Cc: stable, Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie,
	dri-devel, linux-kernel

Otherwise, how will we roll everything back?

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index db56e9b6b6af..f79b435368ec 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -842,6 +842,9 @@ nouveau_pmops_runtime_suspend(struct device *dev)
 	}
 
 	ret = nouveau_do_suspend(drm_dev, true);
+	if (ret)
+		return ret;
+
 	nouveau_switcheroo_optimus_dsm();
 	pci_save_state(pdev);
 	pci_disable_device(pdev);
-- 
2.17.1



* [PATCH v5 09/13] drm/nouveau: Fix deadlock with fb_helper by inhibiting its HPD
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
                   ` (7 preceding siblings ...)
  2018-08-07 20:39 ` [PATCH v5 08/13] drm/nouveau: Don't ignore nouveau_do_suspend() ret value Lyude Paul
@ 2018-08-07 20:39 ` Lyude Paul
  2018-08-07 20:39 ` [PATCH v5 10/13] drm/nouveau: Use pm_runtime_get_noresume() in connector_detect() Lyude Paul
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:39 UTC (permalink / raw)
  To: nouveau
  Cc: stable, Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie,
	Daniel Vetter, Ville Syrjälä,
	Archit Taneja, Ilia Mirkin, dri-devel, linux-kernel

I'm sure I don't need to tell you that fb_helper's locking is a mess.
That being said, fb_helper's locking mess can seriously complicate the
runtime suspend/resume operations of drivers because it can invoke
atomic commits and connector probing from anywhere that calls
drm_fb_helper_hotplug_event(). Since most drivers use
drm_fb_helper_output_poll_changed() as their output_poll_changed
handler, this can happen in every single context that can fire off a
hotplug event. An example:

[  246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds.
[  246.673398]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.676527] kworker/4:0     D    0    37      2 0x80000000
[  246.677580] Workqueue: events output_poll_execute [drm_kms_helper]
[  246.678704] Call Trace:
[  246.679753]  __schedule+0x322/0xaf0
[  246.680916]  schedule+0x33/0x90
[  246.681924]  schedule_preempt_disabled+0x15/0x20
[  246.683023]  __mutex_lock+0x569/0x9a0
[  246.684035]  ? kobject_uevent_env+0x117/0x7b0
[  246.685132]  ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.686179]  mutex_lock_nested+0x1b/0x20
[  246.687278]  ? mutex_lock_nested+0x1b/0x20
[  246.688307]  drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.689420]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.690462]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.691570]  output_poll_execute+0x198/0x1c0 [drm_kms_helper]
[  246.692611]  process_one_work+0x231/0x620
[  246.693725]  worker_thread+0x214/0x3a0
[  246.694756]  kthread+0x12b/0x150
[  246.695856]  ? wq_pool_ids_show+0x140/0x140
[  246.696888]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.697998]  ret_from_fork+0x3a/0x50
[  246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds.
[  246.700153]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.702278] kworker/0:1     D    0    60      2 0x80000000
[  246.703293] Workqueue: pm pm_runtime_work
[  246.704393] Call Trace:
[  246.705403]  __schedule+0x322/0xaf0
[  246.706439]  ? wait_for_completion+0x104/0x190
[  246.707393]  schedule+0x33/0x90
[  246.708375]  schedule_timeout+0x3a5/0x590
[  246.709289]  ? mark_held_locks+0x58/0x80
[  246.710208]  ? _raw_spin_unlock_irq+0x2c/0x40
[  246.711222]  ? wait_for_completion+0x104/0x190
[  246.712134]  ? trace_hardirqs_on_caller+0xf4/0x190
[  246.713094]  ? wait_for_completion+0x104/0x190
[  246.713964]  wait_for_completion+0x12c/0x190
[  246.714895]  ? wake_up_q+0x80/0x80
[  246.715727]  ? get_work_pool+0x90/0x90
[  246.716649]  flush_work+0x1c9/0x280
[  246.717483]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  246.718442]  __cancel_work_timer+0x146/0x1d0
[  246.719247]  cancel_delayed_work_sync+0x13/0x20
[  246.720043]  drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper]
[  246.721123]  nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau]
[  246.721897]  pci_pm_runtime_suspend+0x6b/0x190
[  246.722825]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.723737]  __rpm_callback+0x7a/0x1d0
[  246.724721]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.725607]  rpm_callback+0x24/0x80
[  246.726553]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.727376]  rpm_suspend+0x142/0x6b0
[  246.728185]  pm_runtime_work+0x97/0xc0
[  246.728938]  process_one_work+0x231/0x620
[  246.729796]  worker_thread+0x44/0x3a0
[  246.730614]  kthread+0x12b/0x150
[  246.731395]  ? wq_pool_ids_show+0x140/0x140
[  246.732202]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.732878]  ret_from_fork+0x3a/0x50
[  246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds.
[  246.734587]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.736113] kworker/4:2     D    0   422      2 0x80000080
[  246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[  246.737665] Call Trace:
[  246.738490]  __schedule+0x322/0xaf0
[  246.739250]  schedule+0x33/0x90
[  246.739908]  rpm_resume+0x19c/0x850
[  246.740750]  ? finish_wait+0x90/0x90
[  246.741541]  __pm_runtime_resume+0x4e/0x90
[  246.742370]  nv50_disp_atomic_commit+0x31/0x210 [nouveau]
[  246.743124]  drm_atomic_commit+0x4a/0x50 [drm]
[  246.743775]  restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper]
[  246.744603]  restore_fbdev_mode+0x31/0x140 [drm_kms_helper]
[  246.745373]  drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper]
[  246.746220]  drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
[  246.746884]  drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper]
[  246.747675]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.748544]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.749439]  nv50_mstm_hotplug+0x15/0x20 [nouveau]
[  246.750111]  drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper]
[  246.750764]  drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper]
[  246.751602]  drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper]
[  246.752314]  process_one_work+0x231/0x620
[  246.752979]  worker_thread+0x44/0x3a0
[  246.753838]  kthread+0x12b/0x150
[  246.754619]  ? wq_pool_ids_show+0x140/0x140
[  246.755386]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.756162]  ret_from_fork+0x3a/0x50
[  246.756847]
           Showing all locks held in the system:
[  246.758261] 3 locks held by kworker/4:0/37:
[  246.759016]  #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.759856]  #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.760670]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.761516] 2 locks held by kworker/0:1/60:
[  246.762274]  #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.762982]  #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.763890] 1 lock held by khungtaskd/64:
[  246.764664]  #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  246.765588] 5 locks held by kworker/4:2/422:
[  246.766440]  #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.767390]  #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.768154]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper]
[  246.768966]  #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper]
[  246.769921]  #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm]
[  246.770839] 1 lock held by dmesg/1038:
[  246.771739] 2 locks held by zsh/1172:
[  246.772650]  #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  246.773680]  #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870

[  246.775522] =============================================

I've tried a couple of solutions so far, but it seems that right now the
simplest one is to just prevent fb_helper from handling hotplugs while
we're runtime suspending, as this eliminates any possibility that we
might race with fb_helper. We can just "resume" hotplug handling for
fbdev when we runtime resume, and fire off a hotplug event if anything
happened while we were suspended. Additionally, we take care to abort
the runtime suspend process if we can't take the new fb_helper hotplug
lock before fb_helper starts doing something.
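
The deferral scheme (mirroring the hotplug_suspended/hotplug_waiting
flags this patch adds) can be modeled in userspace roughly as follows;
this is a sketch of the idea under assumed names, not the driver code
itself:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct fbdev_model {
	pthread_mutex_t lock;     /* stands in for fbcon->hotplug_lock */
	bool hotplug_suspended;
	bool hotplug_waiting;
	int hotplugs_handled;
};

static void hotplug_event(struct fbdev_model *f)
{
	pthread_mutex_lock(&f->lock);
	if (f->hotplug_suspended)
		f->hotplug_waiting = true;   /* defer until resume */
	else
		f->hotplugs_handled++;       /* would probe/commit here */
	pthread_mutex_unlock(&f->lock);
}

static bool hotplug_suspend(struct fbdev_model *f)
{
	/* trylock models aborting runtime suspend if fb_helper is busy */
	if (pthread_mutex_trylock(&f->lock) != 0)
		return false;
	f->hotplug_suspended = true;
	pthread_mutex_unlock(&f->lock);
	return true;
}

static void hotplug_resume(struct fbdev_model *f)
{
	pthread_mutex_lock(&f->lock);
	f->hotplug_suspended = false;
	if (f->hotplug_waiting) {
		f->hotplug_waiting = false;
		f->hotplugs_handled++;       /* replay the deferred event */
	}
	pthread_mutex_unlock(&f->lock);
}
```

While suspended, an event only sets the waiting flag; resume clears the
suspension and replays at most one deferred hotplug.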

Changes since v4:
 - Add nouveau_fbcon_hotplugged_in_suspend() to work around the deadlock
   condition that Lukas described
 - Just move all of this out of drm_fb_helper. It seems that other DRM
   drivers have already figured out other workarounds for this. If other
   drivers do end up needing this in the future, we can just move this
   back into drm_fb_helper again.

Changes since v3:
- Actually check if fb_helper is NULL in both new helpers
- Actually check drm_fbdev_emulation in both new helpers
- Don't fire off a fb_helper hotplug unconditionally; only do it if
  the following conditions are true (as otherwise, calling this in the
  wrong spot will cause Bad Things to happen):
  - fb_helper hotplug handling was actually inhibited previously
  - fb_helper actually has a delayed hotplug pending
  - fb_helper is actually bound
  - fb_helper is actually initialized
- Add __must_check to drm_fb_helper_suspend_hotplug(). There's no
  situation where a driver would actually want to use this without
  checking the return value, so enforce that
- Rewrite and clarify the documentation for both helpers.
- Make sure to return true in the drm_fb_helper_suspend_hotplug() stub
  that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION
  isn't enabled
- Actually grab the toplevel fb_helper lock in
  drm_fb_helper_resume_hotplug(), since it's possible other activity
  (such as a hotplug) could be going on at the same time the driver
  calls drm_fb_helper_resume_hotplug(). We need this to check whether or
  not drm_fb_helper_hotplug_event() needs to be called anyway

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c   |  2 +-
 drivers/gpu/drm/nouveau/nouveau_display.c | 48 +++++++++---
 drivers/gpu/drm/nouveau/nouveau_drm.c     |  3 +
 drivers/gpu/drm/nouveau/nouveau_fbcon.c   | 90 +++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_fbcon.h   |  9 +++
 5 files changed, 139 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 8b522a9b12f6..a0772389ed90 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2049,7 +2049,7 @@ nv50_disp_atomic_state_alloc(struct drm_device *dev)
 static const struct drm_mode_config_funcs
 nv50_disp_func = {
 	.fb_create = nouveau_user_framebuffer_create,
-	.output_poll_changed = drm_fb_helper_output_poll_changed,
+	.output_poll_changed = nouveau_fbcon_output_poll_changed,
 	.atomic_check = nv50_disp_atomic_check,
 	.atomic_commit = nv50_disp_atomic_commit,
 	.atomic_state_alloc = nv50_disp_atomic_state_alloc,
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 1d36ab5d4796..7229887f7cbf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -293,7 +293,7 @@ nouveau_user_framebuffer_create(struct drm_device *dev,
 
 static const struct drm_mode_config_funcs nouveau_mode_config_funcs = {
 	.fb_create = nouveau_user_framebuffer_create,
-	.output_poll_changed = drm_fb_helper_output_poll_changed,
+	.output_poll_changed = nouveau_fbcon_output_poll_changed,
 };
 
 
@@ -608,25 +608,49 @@ nouveau_display_destroy(struct drm_device *dev)
 int
 nouveau_display_suspend(struct drm_device *dev, bool runtime)
 {
+	struct nouveau_drm *drm = nouveau_drm(dev);
 	struct nouveau_display *disp = nouveau_display(dev);
 	struct drm_crtc *crtc;
+	bool using_atomic = drm_drv_uses_atomic_modeset(dev);
+	int ret;
 
-	if (drm_drv_uses_atomic_modeset(dev)) {
-		if (!runtime) {
-			disp->suspend = drm_atomic_helper_suspend(dev);
-			if (IS_ERR(disp->suspend)) {
-				int ret = PTR_ERR(disp->suspend);
-				disp->suspend = NULL;
-				return ret;
-			}
+	if (!runtime && using_atomic) {
+		disp->suspend = drm_atomic_helper_suspend(dev);
+		if (IS_ERR(disp->suspend)) {
+			int ret = PTR_ERR(disp->suspend);
+			disp->suspend = NULL;
+			return ret;
 		}
-
-		nouveau_display_fini(dev, true);
-		return 0;
 	}
 
 	nouveau_display_fini(dev, true);
 
+	if (runtime && nouveau_fbcon_hotplugged_in_suspend(drm->fbcon)) {
+		NV_DEBUG(drm, "interrupted by delayed fb_helper hotplug\n");
+
+		/* We don't need to run the whole nouveau_display_resume()
+		 * process here, just undo what we did in
+		 * nouveau_display_fini()
+		 */
+		NV_DEBUG(drm, "resuming display...\n");
+		ret = nouveau_display_init(dev);
+		if (WARN_ON(ret)) {
+			NV_ERROR(drm, "Failed to bring back display!\n");
+			return ret;
+		}
+
+		return -EBUSY;
+	}
+
+	/* Any hotplugs received after this point will be picked up by ACPI,
+	 * which will wake us up properly before actually trying to handle the
+	 * hotplug.
+	 */
+
+	if (using_atomic)
+		return 0;
+
+	/* The rest of this is for legacy modesetting only */
 	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
 		struct nouveau_framebuffer *nouveau_fb;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index f79b435368ec..296b1d63f054 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -841,6 +841,9 @@ nouveau_pmops_runtime_suspend(struct device *dev)
 		return -EBUSY;
 	}
 
+	if (!nouveau_fbcon_hotplug_suspend(nouveau_drm(drm_dev)->fbcon))
+		return -EBUSY;
+
 	ret = nouveau_do_suspend(drm_dev, true);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
index 85c1f10bc2b6..67677d60f07e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
@@ -466,6 +466,7 @@ nouveau_fbcon_set_suspend_work(struct work_struct *work)
 	console_unlock();
 
 	if (state == FBINFO_STATE_RUNNING) {
+		nouveau_fbcon_hotplug_resume(drm->fbcon);
 		pm_runtime_mark_last_busy(drm->dev->dev);
 		pm_runtime_put_sync(drm->dev->dev);
 	}
@@ -487,6 +488,94 @@ nouveau_fbcon_set_suspend(struct drm_device *dev, int state)
 	schedule_work(&drm->fbcon_work);
 }
 
+void
+nouveau_fbcon_output_poll_changed(struct drm_device *dev)
+{
+	struct nouveau_drm *drm = nouveau_drm(dev);
+	struct nouveau_fbdev *fbcon = drm->fbcon;
+
+	if (!fbcon)
+		return;
+
+	mutex_lock(&fbcon->hotplug_lock);
+
+	if (fbcon->hotplug_suspended) {
+		fbcon->hotplug_waiting = true;
+	} else {
+		pm_runtime_get_sync(drm->dev->dev);
+
+		drm_fb_helper_hotplug_event(&fbcon->helper);
+
+		pm_runtime_mark_last_busy(drm->dev->dev);
+		pm_runtime_put_autosuspend(drm->dev->dev);
+	}
+
+	mutex_unlock(&fbcon->hotplug_lock);
+}
+
+bool __must_check
+nouveau_fbcon_hotplug_suspend(struct nouveau_fbdev *fbcon)
+{
+	int ret;
+
+	if (!fbcon)
+		return true;
+
+	/* We can't block on fbcon if it's performing connector probes, as
+	 * it will need to grab a runtime PM reference to the GPU and deadlock
+	 * us if we do. So, just abort if it's busy (that's what we want
+	 * anyway)
+	 */
+	ret = mutex_trylock(&fbcon->hotplug_lock);
+	if (!ret)
+		return false;
+	fbcon->hotplug_suspended = true;
+	mutex_unlock(&fbcon->hotplug_lock);
+
+	return true;
+}
+
+void
+nouveau_fbcon_hotplug_resume(struct nouveau_fbdev *fbcon)
+{
+	struct drm_device *dev;
+
+	if (!fbcon)
+		return;
+	dev = fbcon->helper.dev;
+
+	mutex_lock(&fbcon->hotplug_lock);
+
+	fbcon->hotplug_suspended = false;
+	if (fbcon->hotplug_waiting) {
+		fbcon->hotplug_waiting = false;
+
+		pm_runtime_get_sync(dev->dev);
+
+		drm_fb_helper_hotplug_event(&fbcon->helper);
+
+		pm_runtime_mark_last_busy(dev->dev);
+		pm_runtime_put_autosuspend(dev->dev);
+	}
+
+	mutex_unlock(&fbcon->hotplug_lock);
+}
+
+bool
+nouveau_fbcon_hotplugged_in_suspend(struct nouveau_fbdev *fbcon)
+{
+	bool hotplug;
+
+	if (!fbcon)
+		return false;
+
+	mutex_lock(&fbcon->hotplug_lock);
+	hotplug = fbcon->hotplug_suspended && fbcon->hotplug_waiting;
+	mutex_unlock(&fbcon->hotplug_lock);
+
+	return hotplug;
+}
+
 int
 nouveau_fbcon_init(struct drm_device *dev)
 {
@@ -505,6 +594,7 @@ nouveau_fbcon_init(struct drm_device *dev)
 
 	drm->fbcon = fbcon;
 	INIT_WORK(&drm->fbcon_work, nouveau_fbcon_set_suspend_work);
+	mutex_init(&fbcon->hotplug_lock);
 
 	drm_fb_helper_prepare(dev, &fbcon->helper, &nouveau_fbcon_helper_funcs);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.h b/drivers/gpu/drm/nouveau/nouveau_fbcon.h
index a6f192ea3fa6..a939628bf7f7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.h
@@ -41,6 +41,10 @@ struct nouveau_fbdev {
 	struct nvif_object gdi;
 	struct nvif_object blit;
 	struct nvif_object twod;
+
+	struct mutex hotplug_lock;
+	bool hotplug_suspended;
+	bool hotplug_waiting;
 };
 
 void nouveau_fbcon_restore(void);
@@ -68,6 +72,11 @@ void nouveau_fbcon_set_suspend(struct drm_device *dev, int state);
 void nouveau_fbcon_accel_save_disable(struct drm_device *dev);
 void nouveau_fbcon_accel_restore(struct drm_device *dev);
 
+void nouveau_fbcon_output_poll_changed(struct drm_device *dev);
+bool __must_check nouveau_fbcon_hotplug_suspend(struct nouveau_fbdev *fbcon);
+void nouveau_fbcon_hotplug_resume(struct nouveau_fbdev *fbcon);
+bool nouveau_fbcon_hotplugged_in_suspend(struct nouveau_fbdev *fbcon);
+
 extern int nouveau_nofbaccel;
 
 #endif /* __NV50_FBCON_H__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 10/13] drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
                   ` (8 preceding siblings ...)
  2018-08-07 20:39 ` [PATCH v5 09/13] drm/nouveau: Fix deadlock with fb_helper by inhibiting it's HPD Lyude Paul
@ 2018-08-07 20:39 ` Lyude Paul
  2018-08-10 21:39   ` [Nouveau] " Karol Herbst
  2018-08-07 20:39 ` [PATCH v5 11/13] drm/nouveau: Respond to HPDs by probing one conn at a time Lyude Paul
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:39 UTC (permalink / raw)
  To: nouveau
  Cc: stable, Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie,
	dri-devel, linux-kernel

It's true we can't resume the device from poll workers in
nouveau_connector_detect(). We can however, prevent the autosuspend
timer from elapsing immediately if it hasn't already without risking any
sort of deadlock with the runtime suspend/resume operations. So do that
instead of entirely avoiding grabbing a power reference.
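
The difference between the two acquisition paths can be sketched with a
toy userspace model (all names here are hypothetical stand-ins, not the
real runtime-PM implementation):

```c
#include <assert.h>

/*
 * Toy userspace model of the two runtime-PM acquisition paths.
 * All names are made up; this is not the kernel API.
 */
struct toy_dev {
	int usage_count;	/* models dev->power.usage_count */
	int suspended;		/* 1 = runtime suspended */
};

/* get_sync: take a reference and resume the device.  In the kernel
 * this can block, and deadlocks if called from a context the suspend
 * path waits on (like the poll worker). */
static void toy_pm_get_sync(struct toy_dev *d)
{
	d->usage_count++;
	d->suspended = 0;
}

/* get_noresume: only take the reference.  The device's power state is
 * left alone, but the nonzero usage count stops the autosuspend timer
 * from suspending it underneath us. */
static void toy_pm_get_noresume(struct toy_dev *d)
{
	d->usage_count++;
}

/* put_autosuspend: drop the reference; once it hits zero the device
 * is free to suspend again (modeled here as suspending immediately). */
static void toy_pm_put_autosuspend(struct toy_dev *d)
{
	if (--d->usage_count == 0)
		d->suspended = 1;
}
```

Since polling only runs while the device is runtime active, the
non-resuming reference is sufficient there and cannot block.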

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
---
 drivers/gpu/drm/nouveau/nouveau_connector.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 2a45b4c2ceb0..010d6db14cba 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -572,12 +572,16 @@ nouveau_connector_detect(struct drm_connector *connector, bool force)
 		nv_connector->edid = NULL;
 	}
 
-	/* Outputs are only polled while runtime active, so acquiring a
-	 * runtime PM ref here is unnecessary (and would deadlock upon
-	 * runtime suspend because it waits for polling to finish).
+	/* Outputs are only polled while runtime active, so resuming the
+	 * device here is unnecessary (and would deadlock upon runtime suspend
+	 * because it waits for polling to finish). We do however, want to
+	 * prevent the autosuspend timer from elapsing during this operation
+	 * if possible.
 	 */
-	if (!drm_kms_helper_is_poll_worker()) {
-		ret = pm_runtime_get_sync(connector->dev->dev);
+	if (drm_kms_helper_is_poll_worker()) {
+		pm_runtime_get_noresume(dev->dev);
+	} else {
+		ret = pm_runtime_get_sync(dev->dev);
 		if (ret < 0 && ret != -EACCES)
 			return conn_status;
 	}
@@ -655,10 +659,8 @@ nouveau_connector_detect(struct drm_connector *connector, bool force)
 
  out:
 
-	if (!drm_kms_helper_is_poll_worker()) {
-		pm_runtime_mark_last_busy(connector->dev->dev);
-		pm_runtime_put_autosuspend(connector->dev->dev);
-	}
+	pm_runtime_mark_last_busy(dev->dev);
+	pm_runtime_put_autosuspend(dev->dev);
 
 	return conn_status;
 }
-- 
2.17.1


* [PATCH v5 11/13] drm/nouveau: Respond to HPDs by probing one conn at a time
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
                   ` (9 preceding siblings ...)
  2018-08-07 20:39 ` [PATCH v5 10/13] drm/nouveau: Use pm_runtime_get_noresume() in connector_detect() Lyude Paul
@ 2018-08-07 20:39 ` Lyude Paul
  2018-08-07 20:39 ` [PATCH v5 12/13] drm/nouveau: Fix deadlocks in nouveau_connector_detect() Lyude Paul
  2018-08-07 20:39 ` [PATCH v5 13/13] drm/nouveau: Call pm_runtime_get_noresume() from hpd handlers Lyude Paul
  12 siblings, 0 replies; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:39 UTC (permalink / raw)
  To: nouveau
  Cc: stable, Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie,
	dri-devel, linux-kernel

There isn't actually any reason we need to call drm_helper_hpd_irq_event()
from our hotplug handler, as we already know which connector the hotplug
event was fired for. We're also going to need to avoid needlessly probing
all connectors from hotplug handlers anyway, so that we can track when
nouveau_connector_detect() is being called from the context of its own
connector's hotplug handler in order to fix the next deadlocking issue.
This is (slightly) faster anyway!
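
The idea can be sketched with a toy model (made-up names, just for
illustration): re-detect only the signaled connector and report an event
only if its status actually changed, instead of walking every connector
the way drm_helper_hpd_irq_event() does.

```c
#include <assert.h>

/* Toy sketch of probing just the connector an HPD fired for.
 * All names are hypothetical. */
enum toy_status { TOY_DISCONNECTED, TOY_CONNECTED };

struct toy_conn {
	enum toy_status status;
	int probes;	/* how many times this connector was probed */
};

/* Re-detect only the signaled connector; return 1 if its status
 * changed and a hotplug event should be sent to userspace. */
static int toy_hotplug_probe(struct toy_conn *conn, enum toy_status detected)
{
	enum toy_status old = conn->status;

	conn->probes++;
	conn->status = detected;
	return old != conn->status;
}
```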

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
---
 drivers/gpu/drm/nouveau/nouveau_connector.c | 28 ++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 010d6db14cba..9714e09f17db 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -1114,6 +1114,32 @@ nouveau_connector_funcs_lvds = {
 	.atomic_get_property = nouveau_conn_atomic_get_property,
 };
 
+static void
+nouveau_connector_hotplug_probe(struct nouveau_connector *nv_conn)
+{
+	struct drm_modeset_acquire_ctx ctx;
+	struct drm_connector *conn = &nv_conn->base;
+	enum drm_connector_status old_status;
+	struct drm_device *dev = conn->dev;
+	bool changed;
+
+	mutex_lock(&dev->mode_config.mutex);
+
+	drm_modeset_acquire_init(&ctx, 0);
+	drm_modeset_lock(&dev->mode_config.connection_mutex, &ctx);
+
+	old_status = conn->status;
+	conn->status = drm_helper_probe_detect(conn, &ctx, true);
+	changed = old_status != conn->status;
+
+	drm_modeset_drop_locks(&ctx);
+	drm_modeset_acquire_fini(&ctx);
+	mutex_unlock(&dev->mode_config.mutex);
+
+	if (changed)
+		drm_kms_helper_hotplug_event(dev);
+}
+
 static int
 nouveau_connector_hotplug(struct nvif_notify *notify)
 {
@@ -1138,7 +1164,7 @@ nouveau_connector_hotplug(struct nvif_notify *notify)
 				nv50_mstm_remove(nv_encoder->dp.mstm);
 		}
 
-		drm_helper_hpd_irq_event(connector->dev);
+		nouveau_connector_hotplug_probe(nv_connector);
 	}
 
 	return NVIF_NOTIFY_KEEP;
-- 
2.17.1


* [PATCH v5 12/13] drm/nouveau: Fix deadlocks in nouveau_connector_detect()
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
                   ` (10 preceding siblings ...)
  2018-08-07 20:39 ` [PATCH v5 11/13] drm/nouveau: Respond to HPDs by probing one conn at a time Lyude Paul
@ 2018-08-07 20:39 ` Lyude Paul
  2018-08-07 20:39 ` [PATCH v5 13/13] drm/nouveau: Call pm_runtime_get_noresume() from hpd handlers Lyude Paul
  12 siblings, 0 replies; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:39 UTC (permalink / raw)
  To: nouveau
  Cc: stable, Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie,
	dri-devel, linux-kernel

When we disable hotplugging on the GPU, we need to be able to
synchronize with each connector's hotplug interrupt handler before the
interrupt is finally disabled. This can be a problem however, since
nouveau_connector_detect() currently grabs a runtime power reference
when handling connector probing. This will deadlock the runtime suspend
handler like so:

[  861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds.
[  861.483290]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.486332] kworker/0:2     D    0    61      2 0x80000000
[  861.487044] Workqueue: events nouveau_display_hpd_work [nouveau]
[  861.487737] Call Trace:
[  861.488394]  __schedule+0x322/0xaf0
[  861.489070]  schedule+0x33/0x90
[  861.489744]  rpm_resume+0x19c/0x850
[  861.490392]  ? finish_wait+0x90/0x90
[  861.491068]  __pm_runtime_resume+0x4e/0x90
[  861.491753]  nouveau_display_hpd_work+0x22/0x60 [nouveau]
[  861.492416]  process_one_work+0x231/0x620
[  861.493068]  worker_thread+0x44/0x3a0
[  861.493722]  kthread+0x12b/0x150
[  861.494342]  ? wq_pool_ids_show+0x140/0x140
[  861.494991]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.495648]  ret_from_fork+0x3a/0x50
[  861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds.
[  861.496968]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.498341] kworker/6:2     D    0   320      2 0x80000080
[  861.499045] Workqueue: pm pm_runtime_work
[  861.499739] Call Trace:
[  861.500428]  __schedule+0x322/0xaf0
[  861.501134]  ? wait_for_completion+0x104/0x190
[  861.501851]  schedule+0x33/0x90
[  861.502564]  schedule_timeout+0x3a5/0x590
[  861.503284]  ? mark_held_locks+0x58/0x80
[  861.503988]  ? _raw_spin_unlock_irq+0x2c/0x40
[  861.504710]  ? wait_for_completion+0x104/0x190
[  861.505417]  ? trace_hardirqs_on_caller+0xf4/0x190
[  861.506136]  ? wait_for_completion+0x104/0x190
[  861.506845]  wait_for_completion+0x12c/0x190
[  861.507555]  ? wake_up_q+0x80/0x80
[  861.508268]  flush_work+0x1c9/0x280
[  861.508990]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  861.509735]  nvif_notify_put+0xb1/0xc0 [nouveau]
[  861.510482]  nouveau_display_fini+0xbd/0x170 [nouveau]
[  861.511241]  nouveau_display_suspend+0x67/0x120 [nouveau]
[  861.511969]  nouveau_do_suspend+0x5e/0x2d0 [nouveau]
[  861.512715]  nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau]
[  861.513435]  pci_pm_runtime_suspend+0x6b/0x180
[  861.514165]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.514897]  __rpm_callback+0x7a/0x1d0
[  861.515618]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.516313]  rpm_callback+0x24/0x80
[  861.517027]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.517741]  rpm_suspend+0x142/0x6b0
[  861.518449]  pm_runtime_work+0x97/0xc0
[  861.519144]  process_one_work+0x231/0x620
[  861.519831]  worker_thread+0x44/0x3a0
[  861.520522]  kthread+0x12b/0x150
[  861.521220]  ? wq_pool_ids_show+0x140/0x140
[  861.521925]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.522622]  ret_from_fork+0x3a/0x50
[  861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds.
[  861.523977]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.525349] kworker/6:0     D    0  1329      2 0x80000000
[  861.526073] Workqueue: events nvif_notify_work [nouveau]
[  861.526751] Call Trace:
[  861.527411]  __schedule+0x322/0xaf0
[  861.528089]  schedule+0x33/0x90
[  861.528758]  rpm_resume+0x19c/0x850
[  861.529399]  ? finish_wait+0x90/0x90
[  861.530073]  __pm_runtime_resume+0x4e/0x90
[  861.530798]  nouveau_connector_detect+0x7e/0x510 [nouveau]
[  861.531459]  ? ww_mutex_lock+0x47/0x80
[  861.532097]  ? ww_mutex_lock+0x47/0x80
[  861.532819]  ? drm_modeset_lock+0x88/0x130 [drm]
[  861.533481]  drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper]
[  861.534127]  drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper]
[  861.534940]  nouveau_connector_hotplug+0x98/0x120 [nouveau]
[  861.535556]  nvif_notify_work+0x2d/0xb0 [nouveau]
[  861.536221]  process_one_work+0x231/0x620
[  861.536994]  worker_thread+0x44/0x3a0
[  861.537757]  kthread+0x12b/0x150
[  861.538463]  ? wq_pool_ids_show+0x140/0x140
[  861.539102]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.539815]  ret_from_fork+0x3a/0x50
[  861.540521]
               Showing all locks held in the system:
[  861.541696] 2 locks held by kworker/0:2/61:
[  861.542406]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543071]  #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543814] 1 lock held by khungtaskd/64:
[  861.544535]  #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  861.545160] 3 locks held by kworker/6:2/320:
[  861.545896]  #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.546702]  #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.547443]  #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau]
[  861.548146] 1 lock held by dmesg/983:
[  861.548889] 2 locks held by zsh/1250:
[  861.549605]  #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  861.550393]  #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870
[  861.551122] 6 locks held by kworker/6:0/1329:
[  861.551957]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.552765]  #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620
[  861.553582]  #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper]
[  861.554357]  #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper]
[  861.555227]  #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper]
[  861.556133]  #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm]

[  861.557864] =============================================

[  861.559507] NMI backtrace for cpu 2
[  861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018
[  861.561948] Call Trace:
[  861.562757]  dump_stack+0x8e/0xd3
[  861.563516]  nmi_cpu_backtrace.cold.3+0x14/0x5a
[  861.564269]  ? lapic_can_unplug_cpu.cold.27+0x42/0x42
[  861.565029]  nmi_trigger_cpumask_backtrace+0xa1/0xae
[  861.565789]  arch_trigger_cpumask_backtrace+0x19/0x20
[  861.566558]  watchdog+0x316/0x580
[  861.567355]  kthread+0x12b/0x150
[  861.568114]  ? reset_hung_task_detector+0x20/0x20
[  861.568863]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.569598]  ret_from_fork+0x3a/0x50
[  861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7:
[  861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120
[  861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120
[  861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120
[  861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120
[  861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120
[  861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120
[  861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120
[  861.572428] Kernel panic - not syncing: hung_task: blocked tasks

So, fix this with a new trick: store a pointer to the task_struct
currently executing our hotplug handler in the nouveau_connector
structure, then avoid attempting to runtime resume the device when we
know that we're just running from the context of our own hotplug
interrupt handler. Since hpd interrupts are only enabled while the
device is runtime active, this should be totally safe.
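
A single-threaded toy model of the trick (hypothetical names;
`current_task` stands in for the kernel's `current`):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the hpd_task trick.  All names are made up. */
struct toy_task { int id; };

struct toy_connector {
	struct toy_task *hpd_task; /* task in our hotplug handler, or NULL */
	int blocking_resumes;	   /* times detect() had to resume the GPU */
};

static struct toy_task *current_task;

/* Only perform a blocking resume when we are NOT being re-entered from
 * our own hotplug handler; HPD is only delivered while runtime active,
 * so a non-resuming reference suffices in that case. */
static void toy_detect(struct toy_connector *conn)
{
	if (conn->hpd_task == current_task)
		return;			/* would be pm_runtime_get_noresume() */
	conn->blocking_resumes++;	/* would be pm_runtime_get_sync() */
}

static void toy_hotplug_handler(struct toy_connector *conn)
{
	conn->hpd_task = current_task;	/* mark which task runs the handler */
	toy_detect(conn);		/* detect called from our own handler */
	conn->hpd_task = NULL;
}
```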

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
---
 drivers/gpu/drm/nouveau/nouveau_connector.c | 16 ++++++++++------
 drivers/gpu/drm/nouveau/nouveau_connector.h |  1 +
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 9714e09f17db..8409c3f2c3a1 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -572,13 +572,14 @@ nouveau_connector_detect(struct drm_connector *connector, bool force)
 		nv_connector->edid = NULL;
 	}
 
-	/* Outputs are only polled while runtime active, so resuming the
-	 * device here is unnecessary (and would deadlock upon runtime suspend
-	 * because it waits for polling to finish). We do however, want to
-	 * prevent the autosuspend timer from elapsing during this operation
-	 * if possible.
+	/* Output polling and HPD only happens while we're runtime active, so
+	 * resuming the device here is unnecessary (and would deadlock upon
+	 * runtime suspend because it waits for polling to finish). We do
+	 * however, want to prevent the autosuspend timer from elapsing during
+	 * this operation if possible.
 	 */
-	if (drm_kms_helper_is_poll_worker()) {
+	if (drm_kms_helper_is_poll_worker() ||
+	    nv_connector->hpd_task == current) {
 		pm_runtime_get_noresume(dev->dev);
 	} else {
 		ret = pm_runtime_get_sync(dev->dev);
@@ -1151,6 +1152,8 @@ nouveau_connector_hotplug(struct nvif_notify *notify)
 	const char *name = connector->name;
 	struct nouveau_encoder *nv_encoder;
 
+	nv_connector->hpd_task = current;
+
 	if (rep->mask & NVIF_NOTIFY_CONN_V0_IRQ) {
 		NV_DEBUG(drm, "service %s\n", name);
 		if ((nv_encoder = find_encoder(connector, DCB_OUTPUT_DP)))
@@ -1167,6 +1170,7 @@ nouveau_connector_hotplug(struct nvif_notify *notify)
 		nouveau_connector_hotplug_probe(nv_connector);
 	}
 
+	nv_connector->hpd_task = NULL;
 	return NVIF_NOTIFY_KEEP;
 }
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.h b/drivers/gpu/drm/nouveau/nouveau_connector.h
index 2d9d35a146a4..1964e682ba13 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.h
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.h
@@ -45,6 +45,7 @@ struct nouveau_connector {
 	u8 *dcb;
 
 	struct nvif_notify hpd;
+	struct task_struct *hpd_task;
 
 	struct drm_dp_aux aux;
 
-- 
2.17.1


* [PATCH v5 13/13] drm/nouveau: Call pm_runtime_get_noresume() from hpd handlers
  2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
                   ` (11 preceding siblings ...)
  2018-08-07 20:39 ` [PATCH v5 12/13] drm/nouveau: Fix deadlocks in nouveau_connector_detect() Lyude Paul
@ 2018-08-07 20:39 ` Lyude Paul
  2018-08-11  9:35   ` [Nouveau] " Karol Herbst
  12 siblings, 1 reply; 20+ messages in thread
From: Lyude Paul @ 2018-08-07 20:39 UTC (permalink / raw)
  To: nouveau
  Cc: stable, Lukas Wunner, Karol Herbst, Ben Skeggs, David Airlie,
	dri-devel, linux-kernel

We can't and don't need to try resuming the device from our hotplug
handlers, but hotplug events are generally something we'd like to keep
the device awake for whenever possible. So, grab a PM ref safely in our
hotplug handlers using pm_runtime_get_noresume() and mark the device as
busy once we're finished.
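
The bracketing pattern can be sketched in a toy model (hypothetical
names; not the real runtime-PM API): take a non-resuming reference at
handler entry, then restart the autosuspend countdown and drop the
reference on exit.

```c
#include <assert.h>

/* Toy model of bracketing a handler with a non-resuming PM reference.
 * All names are made up. */
struct toy_pm {
	int usage_count;
	int last_busy;	/* models the autosuspend timestamp */
};

static void toy_get_noresume(struct toy_pm *pm)
{
	pm->usage_count++;	/* reference without waking the device */
}

static void toy_mark_last_busy(struct toy_pm *pm, int now)
{
	pm->last_busy = now;	/* restart the autosuspend countdown */
}

static void toy_put_autosuspend(struct toy_pm *pm)
{
	pm->usage_count--;
}

/* A handler that must not block on a resume, but still keeps the
 * device marked busy for as long as it runs. */
static void toy_hotplug(struct toy_pm *pm, int now)
{
	toy_get_noresume(pm);
	/* ... service the hotplug event ... */
	toy_mark_last_busy(pm, now);
	toy_put_autosuspend(pm);
}
```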

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Karol Herbst <karolherbst@gmail.com>
---
 drivers/gpu/drm/nouveau/nouveau_connector.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 8409c3f2c3a1..5a8e8c1ad647 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -1152,6 +1152,11 @@ nouveau_connector_hotplug(struct nvif_notify *notify)
 	const char *name = connector->name;
 	struct nouveau_encoder *nv_encoder;
 
+	/* Resuming the device here isn't possible; but the suspend PM ops
+	 * will wait for us to finish our work before disabling us so this
+	 * should be enough
+	 */
+	pm_runtime_get_noresume(drm->dev->dev);
 	nv_connector->hpd_task = current;
 
 	if (rep->mask & NVIF_NOTIFY_CONN_V0_IRQ) {
@@ -1171,6 +1176,9 @@ nouveau_connector_hotplug(struct nvif_notify *notify)
 	}
 
 	nv_connector->hpd_task = NULL;
+
+	pm_runtime_mark_last_busy(drm->dev->dev);
+	pm_runtime_put_autosuspend(drm->dev->dev);
 	return NVIF_NOTIFY_KEEP;
 }
 
-- 
2.17.1


* Re: [Nouveau] [PATCH v5 06/13] drm/nouveau: Call optimus_dsm functions after nouveau_do_suspend()
  2018-08-07 20:39 ` [PATCH v5 06/13] drm/nouveau: Call optimus_dsm functions after nouveau_do_suspend() Lyude Paul
@ 2018-08-10 21:35   ` Karol Herbst
  0 siblings, 0 replies; 20+ messages in thread
From: Karol Herbst @ 2018-08-10 21:35 UTC (permalink / raw)
  To: Lyude Paul; +Cc: nouveau, David Airlie, dri-devel, LKML, stable, Ben Skeggs

Reviewed-by: Karol Herbst <kherbst@redhat.com>

On Tue, Aug 7, 2018 at 10:39 PM, Lyude Paul <lyude@redhat.com> wrote:
> It's not necessary to call these before we call nouveau_do_suspend().
> Ideally; we shouldn't have to call this at all here, but getting that to
> work will take a bit more effort. So for the time being, just move this
> call after nouveau_do_suspend() to allow us to easily be able to abort
> the runtime suspend process in nouveau_do_suspend()
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: stable@vger.kernel.org
> Cc: Lukas Wunner <lukas@wunner.de>
> Cc: Karol Herbst <karolherbst@gmail.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 04f704b77a3c..5ea8fe992484 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -833,8 +833,8 @@ nouveau_pmops_runtime_suspend(struct device *dev)
>                 return -EBUSY;
>         }
>
> -       nouveau_switcheroo_optimus_dsm();
>         ret = nouveau_do_suspend(drm_dev, true);
> +       nouveau_switcheroo_optimus_dsm();
>         pci_save_state(pdev);
>         pci_disable_device(pdev);
>         pci_ignore_hotplug(pdev);
> --
> 2.17.1
>
> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau

* Re: [Nouveau] [PATCH v5 07/13] drm/nouveau: Add missing unroll functions in nouveau_do_suspend()
  2018-08-07 20:39 ` [PATCH v5 07/13] drm/nouveau: Add missing unroll functions in nouveau_do_suspend() Lyude Paul
@ 2018-08-10 21:36   ` Karol Herbst
  0 siblings, 0 replies; 20+ messages in thread
From: Karol Herbst @ 2018-08-10 21:36 UTC (permalink / raw)
  To: Lyude Paul; +Cc: nouveau, David Airlie, dri-devel, LKML, stable, Ben Skeggs

Reviewed-by: Karol Herbst <kherbst@redhat.com>

On Tue, Aug 7, 2018 at 10:39 PM, Lyude Paul <lyude@redhat.com> wrote:
> Currently, it appears that nouveau_do_suspend() forgets to roll back
> suspending fbcon and suspending the device LEDs. We also currently skip
> the entire rollback process if nouveau_display_suspend() fails. So, fix
> that.
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: stable@vger.kernel.org
> Cc: Lukas Wunner <lukas@wunner.de>
> Cc: Karol Herbst <karolherbst@gmail.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 5ea8fe992484..db56e9b6b6af 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -674,10 +674,11 @@ nouveau_do_suspend(struct drm_device *dev, bool runtime)
>         if (dev->mode_config.num_crtc) {
>                 NV_DEBUG(drm, "suspending console...\n");
>                 nouveau_fbcon_set_suspend(dev, 1);
> +
>                 NV_DEBUG(drm, "suspending display...\n");
>                 ret = nouveau_display_suspend(dev, runtime);
>                 if (ret)
> -                       return ret;
> +                       goto fail_fbcon;
>         }
>
>         NV_DEBUG(drm, "evicting buffers...\n");
> @@ -719,7 +720,14 @@ nouveau_do_suspend(struct drm_device *dev, bool runtime)
>         if (dev->mode_config.num_crtc) {
>                 NV_DEBUG(drm, "resuming display...\n");
>                 nouveau_display_resume(dev, runtime);
> +
> +fail_fbcon:
> +               NV_DEBUG(drm, "resuming console...\n");
> +               nouveau_fbcon_set_suspend(dev, 0);
>         }
> +
> +       nouveau_led_resume(dev);
> +
>         return ret;
>  }
>
> --
> 2.17.1
>

* Re: [Nouveau] [PATCH v5 08/13] drm/nouveau: Don't ignore nouveau_do_suspend() ret value
  2018-08-07 20:39 ` [PATCH v5 08/13] drm/nouveau: Don't ignore nouveau_do_suspend() ret value Lyude Paul
@ 2018-08-10 21:37   ` Karol Herbst
  0 siblings, 0 replies; 20+ messages in thread
From: Karol Herbst @ 2018-08-10 21:37 UTC (permalink / raw)
  To: Lyude Paul; +Cc: nouveau, David Airlie, dri-devel, LKML, stable, Ben Skeggs

Reviewed-by: Karol Herbst <kherbst@redhat.com>

On Tue, Aug 7, 2018 at 10:39 PM, Lyude Paul <lyude@redhat.com> wrote:
> Otherwise, how will we roll everything back?
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: stable@vger.kernel.org
> Cc: Lukas Wunner <lukas@wunner.de>
> Cc: Karol Herbst <karolherbst@gmail.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index db56e9b6b6af..f79b435368ec 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -842,6 +842,9 @@ nouveau_pmops_runtime_suspend(struct device *dev)
>         }
>
>         ret = nouveau_do_suspend(drm_dev, true);
> +       if (ret)
> +               return ret;
> +
>         nouveau_switcheroo_optimus_dsm();
>         pci_save_state(pdev);
>         pci_disable_device(pdev);
> --
> 2.17.1
>

* Re: [Nouveau] [PATCH v5 10/13] drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()
  2018-08-07 20:39 ` [PATCH v5 10/13] drm/nouveau: Use pm_runtime_get_noresume() in connector_detect() Lyude Paul
@ 2018-08-10 21:39   ` Karol Herbst
  0 siblings, 0 replies; 20+ messages in thread
From: Karol Herbst @ 2018-08-10 21:39 UTC (permalink / raw)
  To: Lyude Paul; +Cc: nouveau, David Airlie, dri-devel, LKML, stable, Ben Skeggs

Reviewed-by: Karol Herbst <kherbst@redhat.com>

On Tue, Aug 7, 2018 at 10:39 PM, Lyude Paul <lyude@redhat.com> wrote:
> It's true we can't resume the device from poll workers in
> nouveau_connector_detect(). We can however, prevent the autosuspend
> timer from elapsing immediately if it hasn't already without risking any
> sort of deadlock with the runtime suspend/resume operations. So do that
> instead of entirely avoiding grabbing a power reference.
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: stable@vger.kernel.org
> Cc: Lukas Wunner <lukas@wunner.de>
> Cc: Karol Herbst <karolherbst@gmail.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_connector.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
> index 2a45b4c2ceb0..010d6db14cba 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_connector.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
> @@ -572,12 +572,16 @@ nouveau_connector_detect(struct drm_connector *connector, bool force)
>                 nv_connector->edid = NULL;
>         }
>
> -       /* Outputs are only polled while runtime active, so acquiring a
> -        * runtime PM ref here is unnecessary (and would deadlock upon
> -        * runtime suspend because it waits for polling to finish).
> +       /* Outputs are only polled while runtime active, so resuming the
> +        * device here is unnecessary (and would deadlock upon runtime suspend
> +        * because it waits for polling to finish). We do, however, want to
> +        * prevent the autosuspend timer from elapsing during this operation
> +        * if possible.
>          */
> -       if (!drm_kms_helper_is_poll_worker()) {
> -               ret = pm_runtime_get_sync(connector->dev->dev);
> +       if (drm_kms_helper_is_poll_worker()) {
> +               pm_runtime_get_noresume(dev->dev);
> +       } else {
> +               ret = pm_runtime_get_sync(dev->dev);
>                 if (ret < 0 && ret != -EACCES)
>                         return conn_status;
>         }
> @@ -655,10 +659,8 @@ nouveau_connector_detect(struct drm_connector *connector, bool force)
>
>   out:
>
> -       if (!drm_kms_helper_is_poll_worker()) {
> -               pm_runtime_mark_last_busy(connector->dev->dev);
> -               pm_runtime_put_autosuspend(connector->dev->dev);
> -       }
> +       pm_runtime_mark_last_busy(dev->dev);
> +       pm_runtime_put_autosuspend(dev->dev);
>
>         return conn_status;
>  }
> --
> 2.17.1
>

* Re: [Nouveau] [PATCH v5 13/13] drm/nouveau: Call pm_runtime_get_noresume() from hpd handlers
  2018-08-07 20:39 ` [PATCH v5 13/13] drm/nouveau: Call pm_runtime_get_noresume() from hpd handlers Lyude Paul
@ 2018-08-11  9:35   ` Karol Herbst
  0 siblings, 0 replies; 20+ messages in thread
From: Karol Herbst @ 2018-08-11  9:35 UTC (permalink / raw)
  To: Lyude Paul; +Cc: nouveau, David Airlie, dri-devel, LKML, stable, Ben Skeggs

On Tue, Aug 7, 2018 at 10:39 PM, Lyude Paul <lyude@redhat.com> wrote:
> We can't and don't need to try resuming the device from our hotplug
> handlers, but hotplug events are generally something we'd like to keep
> the device awake for whenever possible. So, grab a PM ref safely in our
> hotplug handlers using pm_runtime_get_noresume() and mark the device as
> busy once we're finished.
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: stable@vger.kernel.org
> Cc: Lukas Wunner <lukas@wunner.de>
> Cc: Karol Herbst <karolherbst@gmail.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_connector.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
> index 8409c3f2c3a1..5a8e8c1ad647 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_connector.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
> @@ -1152,6 +1152,11 @@ nouveau_connector_hotplug(struct nvif_notify *notify)
>         const char *name = connector->name;
>         struct nouveau_encoder *nv_encoder;
>
> +       /* Resuming the device here isn't possible, but the suspend PM ops
> +        * will wait for us to finish our work before disabling us, so this
> +        * should be enough.
> +        */
> +       pm_runtime_get_noresume(drm->dev->dev);

What happens when pm_runtime_get_noresume() is called and the device is
already suspended? Do we have to take care of that here as well (e.g. by
simply aborting), or is it fine as it is?

>         nv_connector->hpd_task = current;
>
>         if (rep->mask & NVIF_NOTIFY_CONN_V0_IRQ) {
> @@ -1171,6 +1176,9 @@ nouveau_connector_hotplug(struct nvif_notify *notify)
>         }
>
>         nv_connector->hpd_task = NULL;
> +
> +       pm_runtime_mark_last_busy(drm->dev->dev);
> +       pm_runtime_put_autosuspend(drm->dev->dev);
>         return NVIF_NOTIFY_KEEP;
>  }
>
> --
> 2.17.1
>

* Re: [Nouveau] [PATCH v5 05/13] drm/nouveau: Remove useless poll_enable() call in drm_load()
  2018-08-07 20:39 ` [PATCH v5 05/13] drm/nouveau: Remove useless poll_enable() call in drm_load() Lyude Paul
@ 2018-08-11  9:45   ` Karol Herbst
  0 siblings, 0 replies; 20+ messages in thread
From: Karol Herbst @ 2018-08-11  9:45 UTC (permalink / raw)
  To: Lyude Paul; +Cc: nouveau, David Airlie, LKML, dri-devel, Ben Skeggs

Patches 1-5 are Acked-by: Karol Herbst <kherbst@redhat.com>

On Tue, Aug 7, 2018 at 10:39 PM, Lyude Paul <lyude@redhat.com> wrote:
> Again, this doesn't do anything: drm_kms_helper_poll_enable() will have
> already been called in nouveau_display_init().
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Lukas Wunner <lukas@wunner.de>
> Cc: Karol Herbst <karolherbst@gmail.com>
> [omitting stable Cc, this probably doesn't fix any real bugs]
> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 5fdc1fbe2ee5..04f704b77a3c 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -592,10 +592,8 @@ nouveau_drm_load(struct drm_device *dev, unsigned long flags)
>                 pm_runtime_allow(dev->dev);
>                 pm_runtime_mark_last_busy(dev->dev);
>                 pm_runtime_put(dev->dev);
> -       } else {
> -               /* enable polling for external displays */
> -               drm_kms_helper_poll_enable(dev);
>         }
> +
>         return 0;
>
>  fail_dispinit:
> --
> 2.17.1
>

end of thread, other threads:[~2018-08-11  9:46 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-07 20:38 [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs Lyude Paul
2018-08-07 20:38 ` [PATCH v5 01/13] drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement Lyude Paul
2018-08-07 20:38 ` [PATCH v5 02/13] drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend() Lyude Paul
2018-08-07 20:39 ` [PATCH v5 03/13] drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state() Lyude Paul
2018-08-07 20:39 ` [PATCH v5 04/13] drm/nouveau: Remove useless poll_disable() " Lyude Paul
2018-08-07 20:39 ` [PATCH v5 05/13] drm/nouveau: Remove useless poll_enable() call in drm_load() Lyude Paul
2018-08-11  9:45   ` [Nouveau] " Karol Herbst
2018-08-07 20:39 ` [PATCH v5 06/13] drm/nouveau: Call optimus_dsm functions after nouveau_do_suspend() Lyude Paul
2018-08-10 21:35   ` [Nouveau] " Karol Herbst
2018-08-07 20:39 ` [PATCH v5 07/13] drm/nouveau: Add missing unroll functions in nouveau_do_suspend() Lyude Paul
2018-08-10 21:36   ` [Nouveau] " Karol Herbst
2018-08-07 20:39 ` [PATCH v5 08/13] drm/nouveau: Don't ignore nouveau_do_suspend() ret value Lyude Paul
2018-08-10 21:37   ` [Nouveau] " Karol Herbst
2018-08-07 20:39 ` [PATCH v5 09/13] drm/nouveau: Fix deadlock with fb_helper by inhibiting it's HPD Lyude Paul
2018-08-07 20:39 ` [PATCH v5 10/13] drm/nouveau: Use pm_runtime_get_noresume() in connector_detect() Lyude Paul
2018-08-10 21:39   ` [Nouveau] " Karol Herbst
2018-08-07 20:39 ` [PATCH v5 11/13] drm/nouveau: Respond to HPDs by probing one conn at a time Lyude Paul
2018-08-07 20:39 ` [PATCH v5 12/13] drm/nouveau: Fix deadlocks in nouveau_connector_detect() Lyude Paul
2018-08-07 20:39 ` [PATCH v5 13/13] drm/nouveau: Call pm_runtime_get_noresume() from hpd handlers Lyude Paul
2018-08-11  9:35   ` [Nouveau] " Karol Herbst
