[PATCH 0/3] drm/dp_mst: Fix regressions from new atomic VCPI helpers

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] drm/dp_mst: Fix regressions from new atomic VCPI helpers
@ 2019-01-29 18:39 Lyude Paul
  2019-01-29 18:39   ` Lyude Paul
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Lyude Paul @ 2019-01-29 18:39 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Daniel Vetter, David Airlie, Maxime Ripard, Maarten Lankhorst,
	linux-kernel, Sean Paul, Rodrigo Vivi, Jani Nikula,
	Joonas Lahtinen

This fixes the extra issues I discovered upstream after the introduction
of my rework of the atomic VCPI helpers that occur during
suspend/resume.

Lyude Paul (3):
  drm/dp_mst: Fix unbalanced malloc ref in drm_dp_mst_deallocate_vcpi()
  drm/atomic: Add drm_atomic_state->suspend_or_resume
  drm/i915: Always allocate VCPI during system resume

 drivers/gpu/drm/drm_atomic_helper.c   | 16 ++++++++--
 drivers/gpu/drm/drm_dp_mst_topology.c | 44 ++++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_display.c  |  4 +--
 drivers/gpu/drm/i915/intel_dp_mst.c   |  9 ++++--
 include/drm/drm_atomic.h              | 11 +++++++
 include/drm/drm_atomic_helper.h       |  3 +-
 6 files changed, 71 insertions(+), 16 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] drm/dp_mst: Fix unbalanced malloc ref in drm_dp_mst_deallocate_vcpi()
  2019-01-29 18:39 [PATCH 0/3] drm/dp_mst: Fix regressions from new atomic VCPI helpers Lyude Paul
@ 2019-01-29 18:39   ` Lyude Paul
  2019-01-29 18:39   ` Lyude Paul
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Lyude Paul @ 2019-01-29 18:39 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Daniel Vetter, Maarten Lankhorst, Maxime Ripard, Sean Paul,
	David Airlie, linux-kernel

In drm_dp_mst_deallocate_vcpi(), we currently unconditionally call
drm_dp_mst_put_port_malloc() on the port that's passed to us, even if we
never successfully allocated VCPI to it. This is contrary to what we do
in drm_dp_mst_allocate_vcpi(), where we only call
drm_dp_mst_get_port_malloc() on the passed port if we successfully
allocated VCPI to it.

As a result, if drm_dp_mst_allocate_vcpi() fails during a modeset and
another successive modeset calls drm_dp_mst_deallocate_vcpi() we will
end up dropping someone else's malloc reference to the port. Example:

[  962.309260] ==================================================================
[  962.309290] BUG: KASAN: use-after-free in drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309296] Read of size 4 at addr ffff888416c30004 by task kworker/0:1H/500

[  962.309308] CPU: 0 PID: 500 Comm: kworker/0:1H Tainted: G        W  O      5.0.0-rc2Lyude-Test+ #1
[  962.309313] Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
[  962.309428] Workqueue: events_highpri intel_atomic_cleanup_work [i915]
[  962.309434] Call Trace:
[  962.309452]  dump_stack+0xad/0x150
[  962.309462]  ? dump_stack_print_info.cold.0+0x1b/0x1b
[  962.309472]  ? kmsg_dump_rewind_nolock+0xd9/0xd9
[  962.309504]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309515]  print_address_description+0x6c/0x23c
[  962.309542]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309568]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309577]  kasan_report.cold.3+0x1a/0x32
[  962.309605]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309631]  drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309658]  ? drm_dp_mst_put_mstb_malloc+0x180/0x180 [drm_kms_helper]
[  962.309687]  drm_dp_mst_destroy_state+0xcd/0x120 [drm_kms_helper]
[  962.309745]  drm_atomic_state_default_clear+0x6ee/0xcc0 [drm]
[  962.309864]  intel_atomic_state_clear+0xe/0x80 [i915]
[  962.309928]  __drm_atomic_state_free+0x35/0xd0 [drm]
[  962.310044]  intel_atomic_cleanup_work+0x56/0x70 [i915]
[  962.310057]  process_one_work+0x884/0x1400
[  962.310067]  ? drain_workqueue+0x5a0/0x5a0
[  962.310075]  ? __schedule+0x87f/0x1e80
[  962.310086]  ? __sched_text_start+0x8/0x8
[  962.310095]  ? run_rebalance_domains+0x400/0x400
[  962.310110]  ? deref_stack_reg+0xb4/0x120
[  962.310117]  ? __read_once_size_nocheck.constprop.7+0x10/0x10
[  962.310124]  ? worker_enter_idle+0x47f/0x6a0
[  962.310134]  ? schedule+0xd7/0x2e0
[  962.310141]  ? __schedule+0x1e80/0x1e80
[  962.310148]  ? _raw_spin_lock_irq+0x9f/0x130
[  962.310155]  ? _raw_write_unlock_irqrestore+0x110/0x110
[  962.310164]  worker_thread+0x196/0x11e0
[  962.310175]  ? set_load_weight+0x2e0/0x2e0
[  962.310181]  ? __switch_to_asm+0x34/0x70
[  962.310187]  ? __switch_to_asm+0x40/0x70
[  962.310194]  ? process_one_work+0x1400/0x1400
[  962.310199]  ? __switch_to_asm+0x40/0x70
[  962.310205]  ? __switch_to_asm+0x34/0x70
[  962.310211]  ? __switch_to_asm+0x34/0x70
[  962.310216]  ? __switch_to_asm+0x40/0x70
[  962.310221]  ? __switch_to_asm+0x34/0x70
[  962.310226]  ? __switch_to_asm+0x40/0x70
[  962.310231]  ? __switch_to_asm+0x34/0x70
[  962.310236]  ? __switch_to_asm+0x40/0x70
[  962.310242]  ? syscall_return_via_sysret+0xf/0x7f
[  962.310248]  ? __switch_to_asm+0x34/0x70
[  962.310253]  ? __switch_to_asm+0x40/0x70
[  962.310258]  ? __switch_to_asm+0x34/0x70
[  962.310263]  ? __switch_to_asm+0x40/0x70
[  962.310268]  ? __switch_to_asm+0x34/0x70
[  962.310273]  ? __switch_to_asm+0x40/0x70
[  962.310281]  ? __schedule+0x87f/0x1e80
[  962.310292]  ? __sched_text_start+0x8/0x8
[  962.310300]  ? save_stack+0x8c/0xb0
[  962.310308]  ? __kasan_kmalloc.constprop.6+0xc6/0xd0
[  962.310313]  ? kthread+0x98/0x3a0
[  962.310318]  ? ret_from_fork+0x35/0x40
[  962.310334]  ? __wake_up_common+0x178/0x6f0
[  962.310343]  ? _raw_spin_lock_irqsave+0xa4/0x140
[  962.310349]  ? __lock_text_start+0x8/0x8
[  962.310355]  ? _raw_write_lock_irqsave+0x70/0x130
[  962.310360]  ? __lock_text_start+0x8/0x8
[  962.310371]  ? process_one_work+0x1400/0x1400
[  962.310376]  kthread+0x2e2/0x3a0
[  962.310383]  ? kthread_create_on_node+0xc0/0xc0
[  962.310389]  ret_from_fork+0x35/0x40

[  962.310401] Allocated by task 1462:
[  962.310410]  __kasan_kmalloc.constprop.6+0xc6/0xd0
[  962.310437]  drm_dp_add_port+0xd60/0x1960 [drm_kms_helper]
[  962.310464]  drm_dp_send_link_address+0x4b0/0x770 [drm_kms_helper]
[  962.310491]  drm_dp_check_and_send_link_address+0x197/0x1f0 [drm_kms_helper]
[  962.310515]  drm_dp_mst_link_probe_work+0x2b6/0x330 [drm_kms_helper]
[  962.310522]  process_one_work+0x884/0x1400
[  962.310529]  worker_thread+0x196/0x11e0
[  962.310533]  kthread+0x2e2/0x3a0
[  962.310538]  ret_from_fork+0x35/0x40

[  962.310543] Freed by task 500:
[  962.310550]  __kasan_slab_free+0x133/0x180
[  962.310555]  kfree+0x92/0x1a0
[  962.310581]  drm_dp_mst_put_port_malloc+0x14d/0x180 [drm_kms_helper]
[  962.310693]  intel_connector_destroy+0xb2/0xe0 [i915]
[  962.310747]  drm_mode_object_put.part.0+0x12b/0x1a0 [drm]
[  962.310802]  drm_atomic_state_default_clear+0x1f2/0xcc0 [drm]
[  962.310916]  intel_atomic_state_clear+0xe/0x80 [i915]
[  962.310972]  __drm_atomic_state_free+0x35/0xd0 [drm]
[  962.311083]  intel_atomic_cleanup_work+0x56/0x70 [i915]
[  962.311092]  process_one_work+0x884/0x1400
[  962.311098]  worker_thread+0x196/0x11e0
[  962.311103]  kthread+0x2e2/0x3a0
[  962.311108]  ret_from_fork+0x35/0x40

[  962.311116] The buggy address belongs to the object at ffff888416c30000
                which belongs to the cache kmalloc-2k of size 2048
[  962.311122] The buggy address is located 4 bytes inside of
                2048-byte region [ffff888416c30000, ffff888416c30800)
[  962.311124] The buggy address belongs to the page:
[  962.311132] page:ffffea00105b0c00 count:1 mapcount:0 mapping:ffff88841d003040 index:0x0 compound_mapcount: 0
[  962.311142] flags: 0x8000000000010200(slab|head)
[  962.311152] raw: 8000000000010200 dead000000000100 dead000000000200 ffff88841d003040
[  962.311159] raw: 0000000000000000 00000000000f000f 00000001ffffffff 0000000000000000
[  962.311162] page dumped because: kasan: bad access detected

So, bail early if drm_dp_mst_deallocate_vcpi() is called on a port with
no VCPI allocation. Additionally, clean up the surrounding kerneldoc
while we're at it since the port is assumed to be kept around because
the DRM driver is expected to hold a malloc reference to it, not just
us.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations")
Cc: Daniel Vetter <daniel@ffwll.ch>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 2552a27362a0..8ba0e5b368da 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -3246,15 +3246,13 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
 /**
  * drm_dp_mst_deallocate_vcpi() - deallocate a VCPI
  * @mgr: manager for this port
- * @port: unverified port to deallocate vcpi for
+ * @port: port to deallocate vcpi for
  */
 void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
 				struct drm_dp_mst_port *port)
 {
-	/*
-	 * A port with VCPI will remain allocated until it's VCPI is
-	 * released, no verified ref needed
-	 */
+	if (!port->vcpi.vcpi)
+		return;
 
 	drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
 	port->vcpi.num_slots = 0;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 1/3] drm/dp_mst: Fix unbalanced malloc ref in drm_dp_mst_deallocate_vcpi()
@ 2019-01-29 18:39   ` Lyude Paul
  0 siblings, 0 replies; 15+ messages in thread
From: Lyude Paul @ 2019-01-29 18:39 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Maxime Ripard, linux-kernel, David Airlie, Sean Paul

In drm_dp_mst_deallocate_vcpi(), we currently unconditionally call
drm_dp_mst_put_port_malloc() on the port that's passed to us, even if we
never successfully allocated VCPI to it. This is contrary to what we do
in drm_dp_mst_allocate_vcpi(), where we only call
drm_dp_mst_get_port_malloc() on the passed port if we successfully
allocated VCPI to it.

As a result, if drm_dp_mst_allocate_vcpi() fails during a modeset and
another successive modeset calls drm_dp_mst_deallocate_vcpi() we will
end up dropping someone else's malloc reference to the port. Example:

[  962.309260] ==================================================================
[  962.309290] BUG: KASAN: use-after-free in drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309296] Read of size 4 at addr ffff888416c30004 by task kworker/0:1H/500

[  962.309308] CPU: 0 PID: 500 Comm: kworker/0:1H Tainted: G        W  O      5.0.0-rc2Lyude-Test+ #1
[  962.309313] Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
[  962.309428] Workqueue: events_highpri intel_atomic_cleanup_work [i915]
[  962.309434] Call Trace:
[  962.309452]  dump_stack+0xad/0x150
[  962.309462]  ? dump_stack_print_info.cold.0+0x1b/0x1b
[  962.309472]  ? kmsg_dump_rewind_nolock+0xd9/0xd9
[  962.309504]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309515]  print_address_description+0x6c/0x23c
[  962.309542]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309568]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309577]  kasan_report.cold.3+0x1a/0x32
[  962.309605]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309631]  drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
[  962.309658]  ? drm_dp_mst_put_mstb_malloc+0x180/0x180 [drm_kms_helper]
[  962.309687]  drm_dp_mst_destroy_state+0xcd/0x120 [drm_kms_helper]
[  962.309745]  drm_atomic_state_default_clear+0x6ee/0xcc0 [drm]
[  962.309864]  intel_atomic_state_clear+0xe/0x80 [i915]
[  962.309928]  __drm_atomic_state_free+0x35/0xd0 [drm]
[  962.310044]  intel_atomic_cleanup_work+0x56/0x70 [i915]
[  962.310057]  process_one_work+0x884/0x1400
[  962.310067]  ? drain_workqueue+0x5a0/0x5a0
[  962.310075]  ? __schedule+0x87f/0x1e80
[  962.310086]  ? __sched_text_start+0x8/0x8
[  962.310095]  ? run_rebalance_domains+0x400/0x400
[  962.310110]  ? deref_stack_reg+0xb4/0x120
[  962.310117]  ? __read_once_size_nocheck.constprop.7+0x10/0x10
[  962.310124]  ? worker_enter_idle+0x47f/0x6a0
[  962.310134]  ? schedule+0xd7/0x2e0
[  962.310141]  ? __schedule+0x1e80/0x1e80
[  962.310148]  ? _raw_spin_lock_irq+0x9f/0x130
[  962.310155]  ? _raw_write_unlock_irqrestore+0x110/0x110
[  962.310164]  worker_thread+0x196/0x11e0
[  962.310175]  ? set_load_weight+0x2e0/0x2e0
[  962.310181]  ? __switch_to_asm+0x34/0x70
[  962.310187]  ? __switch_to_asm+0x40/0x70
[  962.310194]  ? process_one_work+0x1400/0x1400
[  962.310199]  ? __switch_to_asm+0x40/0x70
[  962.310205]  ? __switch_to_asm+0x34/0x70
[  962.310211]  ? __switch_to_asm+0x34/0x70
[  962.310216]  ? __switch_to_asm+0x40/0x70
[  962.310221]  ? __switch_to_asm+0x34/0x70
[  962.310226]  ? __switch_to_asm+0x40/0x70
[  962.310231]  ? __switch_to_asm+0x34/0x70
[  962.310236]  ? __switch_to_asm+0x40/0x70
[  962.310242]  ? syscall_return_via_sysret+0xf/0x7f
[  962.310248]  ? __switch_to_asm+0x34/0x70
[  962.310253]  ? __switch_to_asm+0x40/0x70
[  962.310258]  ? __switch_to_asm+0x34/0x70
[  962.310263]  ? __switch_to_asm+0x40/0x70
[  962.310268]  ? __switch_to_asm+0x34/0x70
[  962.310273]  ? __switch_to_asm+0x40/0x70
[  962.310281]  ? __schedule+0x87f/0x1e80
[  962.310292]  ? __sched_text_start+0x8/0x8
[  962.310300]  ? save_stack+0x8c/0xb0
[  962.310308]  ? __kasan_kmalloc.constprop.6+0xc6/0xd0
[  962.310313]  ? kthread+0x98/0x3a0
[  962.310318]  ? ret_from_fork+0x35/0x40
[  962.310334]  ? __wake_up_common+0x178/0x6f0
[  962.310343]  ? _raw_spin_lock_irqsave+0xa4/0x140
[  962.310349]  ? __lock_text_start+0x8/0x8
[  962.310355]  ? _raw_write_lock_irqsave+0x70/0x130
[  962.310360]  ? __lock_text_start+0x8/0x8
[  962.310371]  ? process_one_work+0x1400/0x1400
[  962.310376]  kthread+0x2e2/0x3a0
[  962.310383]  ? kthread_create_on_node+0xc0/0xc0
[  962.310389]  ret_from_fork+0x35/0x40

[  962.310401] Allocated by task 1462:
[  962.310410]  __kasan_kmalloc.constprop.6+0xc6/0xd0
[  962.310437]  drm_dp_add_port+0xd60/0x1960 [drm_kms_helper]
[  962.310464]  drm_dp_send_link_address+0x4b0/0x770 [drm_kms_helper]
[  962.310491]  drm_dp_check_and_send_link_address+0x197/0x1f0 [drm_kms_helper]
[  962.310515]  drm_dp_mst_link_probe_work+0x2b6/0x330 [drm_kms_helper]
[  962.310522]  process_one_work+0x884/0x1400
[  962.310529]  worker_thread+0x196/0x11e0
[  962.310533]  kthread+0x2e2/0x3a0
[  962.310538]  ret_from_fork+0x35/0x40

[  962.310543] Freed by task 500:
[  962.310550]  __kasan_slab_free+0x133/0x180
[  962.310555]  kfree+0x92/0x1a0
[  962.310581]  drm_dp_mst_put_port_malloc+0x14d/0x180 [drm_kms_helper]
[  962.310693]  intel_connector_destroy+0xb2/0xe0 [i915]
[  962.310747]  drm_mode_object_put.part.0+0x12b/0x1a0 [drm]
[  962.310802]  drm_atomic_state_default_clear+0x1f2/0xcc0 [drm]
[  962.310916]  intel_atomic_state_clear+0xe/0x80 [i915]
[  962.310972]  __drm_atomic_state_free+0x35/0xd0 [drm]
[  962.311083]  intel_atomic_cleanup_work+0x56/0x70 [i915]
[  962.311092]  process_one_work+0x884/0x1400
[  962.311098]  worker_thread+0x196/0x11e0
[  962.311103]  kthread+0x2e2/0x3a0
[  962.311108]  ret_from_fork+0x35/0x40

[  962.311116] The buggy address belongs to the object at ffff888416c30000
                which belongs to the cache kmalloc-2k of size 2048
[  962.311122] The buggy address is located 4 bytes inside of
                2048-byte region [ffff888416c30000, ffff888416c30800)
[  962.311124] The buggy address belongs to the page:
[  962.311132] page:ffffea00105b0c00 count:1 mapcount:0 mapping:ffff88841d003040 index:0x0 compound_mapcount: 0
[  962.311142] flags: 0x8000000000010200(slab|head)
[  962.311152] raw: 8000000000010200 dead000000000100 dead000000000200 ffff88841d003040
[  962.311159] raw: 0000000000000000 00000000000f000f 00000001ffffffff 0000000000000000
[  962.311162] page dumped because: kasan: bad access detected

So, bail early if drm_dp_mst_deallocate_vcpi() is called on a port with
no VCPI allocation. Additionally, clean up the surrounding kerneldoc
while we're at it since the port is assumed to be kept around because
the DRM driver is expected to hold a malloc reference to it, not just
us.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations")
Cc: Daniel Vetter <daniel@ffwll.ch>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 2552a27362a0..8ba0e5b368da 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -3246,15 +3246,13 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
 /**
  * drm_dp_mst_deallocate_vcpi() - deallocate a VCPI
  * @mgr: manager for this port
- * @port: unverified port to deallocate vcpi for
+ * @port: port to deallocate vcpi for
  */
 void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
 				struct drm_dp_mst_port *port)
 {
-	/*
-	 * A port with VCPI will remain allocated until it's VCPI is
-	 * released, no verified ref needed
-	 */
+	if (!port->vcpi.vcpi)
+		return;
 
 	drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
 	port->vcpi.num_slots = 0;
-- 
2.20.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] drm/atomic: Add drm_atomic_state->suspend_or_resume
  2019-01-29 18:39 [PATCH 0/3] drm/dp_mst: Fix regressions from new atomic VCPI helpers Lyude Paul
@ 2019-01-29 18:39   ` Lyude Paul
  2019-01-29 18:39   ` Lyude Paul
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Lyude Paul @ 2019-01-29 18:39 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Daniel Vetter, Maarten Lankhorst, Maxime Ripard, Sean Paul,
	David Airlie, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	linux-kernel

Since

commit 39b50c603878 ("drm/atomic_helper: Stop modesets on unregistered
connectors harder")

We've been failing atomic checks if they try to enable new displays on
unregistered connectors. This is fine except for the one situation that
breaks atomic assumptions: suspend/resume. If a connector is
unregistered before we attempt to restore the atomic state, something we
end up failing the atomic check that happens when trying to restore the
state during resume.

Normally this would be OK: we try our best to make sure that the atomic
state pre-suspend can be restored post-suspend, but failures at that
point usually don't cause problems. That is of course, until we
introduced the new atomic MST VCPI helpers:

[drm:drm_atomic_helper_check_modeset [drm_kms_helper]] [CRTC:65:pipe B] active changed
[drm:drm_atomic_helper_check_modeset [drm_kms_helper]] Updating routing for [CONNECTOR:123:DP-5]
[drm:drm_atomic_helper_check_modeset [drm_kms_helper]] Disabling [CONNECTOR:123:DP-5]
[drm:drm_atomic_get_private_obj_state [drm]] Added new private object 0000000025844636 state 000000009fd2899a to 000000003a13d7b8
WARNING: CPU: 6 PID: 1070 at drivers/gpu/drm/drm_dp_mst_topology.c:3153 drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper]
Modules linked in: fuse vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic joydev iTCO_wdt i915(O) wmi_bmof intel_rapl btusb btrtl x86_pkg_temp_thermal btbcm btintel coretemp i2c_algo_bit drm_kms_helper(O) crc32_pclmul snd_hda_intel syscopyarea sysfillrect snd_hda_codec sysimgblt snd_hda_core bluetooth fb_sys_fops snd_pcm pcspkr drm(O) psmouse snd_timer mei_me ecdh_generic i2c_i801 mei i2c_core ucsi_acpi typec_ucsi typec wmi thinkpad_acpi ledtrig_audio snd soundcore tpm_tis rfkill tpm_tis_core video tpm acpi_pad pcc_cpufreq uas usb_storage crc32c_intel nvme serio_raw xhci_pci nvme_core xhci_hcd
CPU: 6 PID: 1070 Comm: gnome-shell Tainted: G        W  O      5.0.0-rc2Lyude-Test+ #1
Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
RIP: 0010:drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper]
Code: 00 4c 39 6d f0 74 49 48 8d 7b 10 48 89 f9 48 c1 e9 03 42 80 3c 21 00 0f 85 d2 00 00 00 48 8b 6b 10 48 8d 5d f0 49 39 ee 75 c5 <0f> 0b 48 c7 c7 c0 78 b3 a0 48 89 c2 4c 89 ee e8 03 6c aa ff b8 ea
RSP: 0018:ffff88841235f268 EFLAGS: 00010246
RAX: ffff88841bf12ab0 RBX: ffff88841bf12aa8 RCX: 1ffff110837e2557
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffed108246bde0
RBP: ffff88841bf12ab8 R08: ffffed1083db3c93 R09: ffffed1083db3c92
R10: ffffed1083db3c92 R11: ffff88841ed9e497 R12: ffff888419555d80
R13: ffff8883bc499100 R14: ffff88841bf12ab8 R15: 0000000000000000
FS:  00007f16fbd4cd00(0000) GS:ffff88841ed80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1687c9f000 CR3: 00000003ba3cc003 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 drm_atomic_helper_check_modeset+0xf21/0x2f50 [drm_kms_helper]
 ? drm_atomic_helper_commit_modeset_enables+0xa90/0xa90 [drm_kms_helper]
 ? __printk_safe_exit+0x10/0x10
 ? save_stack+0x8c/0xb0
 ? vprintk_func+0x96/0x1bf
 ? __printk_safe_exit+0x10/0x10
 intel_atomic_check+0x234/0x4750 [i915]
 ? printk+0x9f/0xc5
 ? kmsg_dump_rewind_nolock+0xd9/0xd9
 ? _raw_spin_lock_irqsave+0xa4/0x140
 ? drm_atomic_check_only+0xb1/0x28b0 [drm]
 ? drm_dbg+0x186/0x1b0 [drm]
 ? drm_dev_dbg+0x200/0x200 [drm]
 ? intel_link_compute_m_n+0xb0/0xb0 [i915]
 ? drm_mode_put_tile_group+0x20/0x20 [drm]
 ? skl_plane_format_mod_supported+0x17f/0x1b0 [i915]
 ? drm_plane_check_pixel_format+0x14a/0x310 [drm]
 drm_atomic_check_only+0x13c4/0x28b0 [drm]
 ? drm_state_info+0x220/0x220 [drm]
 ? drm_atomic_helper_disable_plane+0x1d0/0x1d0 [drm_kms_helper]
 ? pick_single_encoder_for_connector+0xe0/0xe0 [drm_kms_helper]
 ? kasan_unpoison_shadow+0x35/0x40
 drm_atomic_commit+0x3b/0x100 [drm]
 drm_atomic_helper_set_config+0xd5/0x100 [drm_kms_helper]
 drm_mode_setcrtc+0x636/0x1660 [drm]
 ? vprintk_func+0x96/0x1bf
 ? drm_dev_dbg+0x200/0x200 [drm]
 ? drm_mode_getcrtc+0x790/0x790 [drm]
 ? printk+0x9f/0xc5
 ? mutex_unlock+0x1d/0x40
 ? drm_mode_addfb2+0x2e9/0x3a0 [drm]
 ? rcu_sync_dtor+0x2e0/0x2e0
 ? drm_dbg+0x186/0x1b0 [drm]
 ? set_page_dirty+0x271/0x4d0
 drm_ioctl_kernel+0x203/0x290 [drm]
 ? drm_mode_getcrtc+0x790/0x790 [drm]
 ? drm_setversion+0x7f0/0x7f0 [drm]
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x34/0x70
 drm_ioctl+0x445/0x950 [drm]
 ? drm_mode_getcrtc+0x790/0x790 [drm]
 ? drm_getunique+0x220/0x220 [drm]
 ? expand_files.part.10+0x920/0x920
 do_vfs_ioctl+0x1a1/0x13d0
 ? ioctl_preallocate+0x2b0/0x2b0
 ? __fget_light+0x2d6/0x390
 ? schedule+0xd7/0x2e0
 ? fget_raw+0x10/0x10
 ? apic_timer_interrupt+0xa/0x20
 ? apic_timer_interrupt+0xa/0x20
 ? rcu_cleanup_dead_rnp+0x2c0/0x2c0
 ksys_ioctl+0x60/0x90
 __x64_sys_ioctl+0x6f/0xb0
 do_syscall_64+0x136/0x440
 ? syscall_return_slowpath+0x2d0/0x2d0
 ? do_page_fault+0x89/0x330
 ? __do_page_fault+0x9c0/0x9c0
 ? prepare_exit_to_usermode+0x188/0x200
 ? perf_trace_sys_enter+0x1090/0x1090
 ? __x64_sys_sigaltstack+0x280/0x280
 ? __put_user_4+0x1c/0x30
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f16ff89a09b
Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff001232b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fff001232f0 RCX: 00007f16ff89a09b
RDX: 00007fff001232f0 RSI: 00000000c06864a2 RDI: 000000000000000b
RBP: 00007fff001232f0 R08: 0000000000000000 R09: 000055a79d484460
R10: 000055a79d44e770 R11: 0000000000000246 R12: 00000000c06864a2
R13: 000000000000000b R14: 0000000000000000 R15: 000055a79d44e770
WARNING: CPU: 6 PID: 1070 at drivers/gpu/drm/drm_dp_mst_topology.c:3153 drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper]
---[ end trace d536c05c13c83be2 ]---
[drm:drm_dp_atomic_release_vcpi_slots [drm_kms_helper]] *ERROR* no VCPI for [MST PORT:00000000f9e2b143] found in mst state 000000009fd2899a

This appears to be happening because we destroy the VCPI allocations
when disabling all connected displays while suspending, and those VCPI
allocations don't get restored on resume due to failing to restore the
atomic state.

So, fix this by introducing the suspending option to
drm_atomic_helper_duplicate_state() and use that to indicate in the
atomic state that it's being used for suspending or resuming the system,
and thus needs to be fixed up by the driver. We can then use the new
state->suspend_or_resume hook to fixup the atomic VCPI helpers so they
merely save and restore the VCPI allocations for such states instead of
performing error checking on them. This fixes the warnings during
suspend/resume when a topology is removed from the system, and allows
us to once again restore the atomic state on resume without any issues.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations")
Cc: Daniel Vetter <daniel@ffwll.ch>
---
 drivers/gpu/drm/drm_atomic_helper.c   | 16 +++++++++---
 drivers/gpu/drm/drm_dp_mst_topology.c | 36 ++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_display.c  |  4 +--
 include/drm/drm_atomic.h              | 11 ++++++++
 include/drm/drm_atomic_helper.h       |  3 ++-
 5 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 6fe2303fccd9..7d23dafdffbc 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -332,8 +332,15 @@ update_connector_routing(struct drm_atomic_state *state,
 	 * about is ensuring that userspace can't do anything but shut off the
 	 * display on a connector that was destroyed after its been notified,
 	 * not before.
+	 *
+	 * Additionally, we also want to ignore connector registration when
+	 * we're trying to restore an atomic state during system resume since
+	 * there's a chance the connector may have been destroyed during the
+	 * process, but it's better to ignore that then cause
+	 * drm_atomic_helper_resume() to fail.
 	 */
-	if (drm_connector_is_unregistered(connector) && crtc_state->active) {
+	if (!state->suspend_or_resume &&
+	    drm_connector_is_unregistered(connector) && crtc_state->active) {
 		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
 				 connector->base.id, connector->name);
 		return -EINVAL;
@@ -3144,6 +3151,7 @@ EXPORT_SYMBOL(drm_atomic_helper_shutdown);
  * drm_atomic_helper_duplicate_state - duplicate an atomic state object
  * @dev: DRM device
  * @ctx: lock acquisition context
+ * @suspending: If this state is being duplicated as part of system suspend
  *
  * Makes a copy of the current atomic state by looping over all objects and
  * duplicating their respective states. This is used for example by suspend/
@@ -3166,7 +3174,8 @@ EXPORT_SYMBOL(drm_atomic_helper_shutdown);
  */
 struct drm_atomic_state *
 drm_atomic_helper_duplicate_state(struct drm_device *dev,
-				  struct drm_modeset_acquire_ctx *ctx)
+				  struct drm_modeset_acquire_ctx *ctx,
+				  bool suspending)
 {
 	struct drm_atomic_state *state;
 	struct drm_connector *conn;
@@ -3180,6 +3189,7 @@ drm_atomic_helper_duplicate_state(struct drm_device *dev,
 		return ERR_PTR(-ENOMEM);
 
 	state->acquire_ctx = ctx;
+	state->suspend_or_resume = suspending;
 
 	drm_for_each_crtc(crtc, dev) {
 		struct drm_crtc_state *crtc_state;
@@ -3263,7 +3273,7 @@ struct drm_atomic_state *drm_atomic_helper_suspend(struct drm_device *dev)
 
 	DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, 0, err);
 
-	state = drm_atomic_helper_duplicate_state(dev, &ctx);
+	state = drm_atomic_helper_duplicate_state(dev, &ctx, true);
 	if (IS_ERR(state))
 		goto unlock;
 
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 8ba0e5b368da..371010f8d42e 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -3032,6 +3032,10 @@ static int drm_dp_init_vcpi(struct drm_dp_mst_topology_mgr *mgr,
  * @port as needed. It is not OK however, to call this function and
  * drm_dp_atomic_release_vcpi_slots() in the same atomic check phase.
  *
+ * When &drm_atomic_state.suspend_or_resume is set to %true%, this function
+ * will not perform any error checking and will instead simply retrieve the
+ * previously recorded VCPI allocation that was present before system suspend.
+ *
  * See also:
  * drm_dp_atomic_release_vcpi_slots()
  * drm_dp_mst_atomic_check()
@@ -3052,9 +3056,11 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
 	if (IS_ERR(topology_state))
 		return PTR_ERR(topology_state);
 
-	port = drm_dp_mst_topology_get_port_validated(mgr, port);
-	if (port == NULL)
-		return -EINVAL;
+	if (!state->suspend_or_resume) {
+		port = drm_dp_mst_topology_get_port_validated(mgr, port);
+		if (!port)
+			return -EINVAL;
+	}
 
 	/* Find the current allocation for this port, if any */
 	list_for_each_entry(pos, &topology_state->vcpis, next) {
@@ -3062,6 +3068,19 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
 			vcpi = pos;
 			prev_slots = vcpi->vcpi;
 
+			/*
+			 * When resuming, we just want to restore the previous
+			 * VCPI without doing error checking
+			 */
+			if (state->suspend_or_resume) {
+				DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] [MST PORT:%p] restoring VCPI of %d\n",
+						 port->connector->base.id,
+						 port->connector->name,
+						 port, prev_slots);
+
+				return prev_slots;
+			}
+
 			/*
 			 * This should never happen, unless the driver tries
 			 * releasing and allocating the same VCPI allocation,
@@ -3124,6 +3143,11 @@ EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
  * drm_dp_atomic_find_vcpi_slots() on the same @port in a single atomic check
  * phase.
  *
+ * When &drm_atomic_state.suspend_or_resume is set, this function will not
+ * modify the VCPI allocations in &drm_dp_mst_topology_state.vcpis, so that
+ * those VCPI allocations may be restored as-is during system resume. In this
+ * scenario, this function will always return 0.
+ *
  * See also:
  * drm_dp_atomic_find_vcpi_slots()
  * drm_dp_mst_atomic_check()
@@ -3140,6 +3164,12 @@ int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
 	struct drm_dp_vcpi_allocation *pos;
 	bool found = false;
 
+	/* During suspend, just keep our VCPI allocations as-is in the atomic
+	 * state so they can be restored as-is at resume
+	 */
+	if (state->suspend_or_resume)
+		return 0;
+
 	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
 	if (IS_ERR(topology_state))
 		return PTR_ERR(topology_state);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 4a282532af81..4c70f1531119 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3824,7 +3824,7 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
 	 * Disabling the crtcs gracefully seems nicer. Also the
 	 * g33 docs say we should at least disable all the planes.
 	 */
-	state = drm_atomic_helper_duplicate_state(dev, ctx);
+	state = drm_atomic_helper_duplicate_state(dev, ctx, false);
 	if (IS_ERR(state)) {
 		ret = PTR_ERR(state);
 		DRM_ERROR("Duplicating state failed with %i\n", ret);
@@ -15043,7 +15043,7 @@ static void sanitize_watermarks(struct drm_device *dev)
 		goto fail;
 	}
 
-	state = drm_atomic_helper_duplicate_state(dev, &ctx);
+	state = drm_atomic_helper_duplicate_state(dev, &ctx, false);
 	if (WARN_ON(IS_ERR(state)))
 		goto fail;
 
diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h
index 811b4a92568f..0897f51dcd62 100644
--- a/include/drm/drm_atomic.h
+++ b/include/drm/drm_atomic.h
@@ -329,6 +329,17 @@ struct drm_atomic_state {
 	bool allow_modeset : 1;
 	bool legacy_cursor_update : 1;
 	bool async_update : 1;
+	/**
+	 * @suspend_or_resume:
+	 *
+	 * Indicates whether or not this atomic state is being saved or
+	 * committed as part of suspending or resuming the system
+	 * respectively. Drivers and atomic helpers hould use this to fixup
+	 * normal state inconsistencies that might arise between system
+	 * suspend and resume, so as to ensure that restoring the atomic state
+	 * at resume never fails.
+	 */
+	bool suspend_or_resume : 1;
 	struct __drm_planes_state *planes;
 	struct __drm_crtcs_state *crtcs;
 	int num_connector;
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
index 58214be3bf3d..359f557b6898 100644
--- a/include/drm/drm_atomic_helper.h
+++ b/include/drm/drm_atomic_helper.h
@@ -129,7 +129,8 @@ int drm_atomic_helper_disable_all(struct drm_device *dev,
 void drm_atomic_helper_shutdown(struct drm_device *dev);
 struct drm_atomic_state *
 drm_atomic_helper_duplicate_state(struct drm_device *dev,
-				  struct drm_modeset_acquire_ctx *ctx);
+				  struct drm_modeset_acquire_ctx *ctx,
+				  bool suspending);
 struct drm_atomic_state *drm_atomic_helper_suspend(struct drm_device *dev);
 int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state,
 					      struct drm_modeset_acquire_ctx *ctx);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] drm/atomic: Add drm_atomic_state->suspend_or_resume
@ 2019-01-29 18:39   ` Lyude Paul
  0 siblings, 0 replies; 15+ messages in thread
From: Lyude Paul @ 2019-01-29 18:39 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Maxime Ripard, linux-kernel, David Airlie

Since

commit 39b50c603878 ("drm/atomic_helper: Stop modesets on unregistered
connectors harder")

We've been failing atomic checks if they try to enable new displays on
unregistered connectors. This is fine except for the one situation that
breaks atomic assumptions: suspend/resume. If a connector is
unregistered before we attempt to restore the atomic state, something we
end up failing the atomic check that happens when trying to restore the
state during resume.

Normally this would be OK: we try our best to make sure that the atomic
state pre-suspend can be restored post-suspend, but failures at that
point usually don't cause problems. That is of course, until we
introduced the new atomic MST VCPI helpers:

[drm:drm_atomic_helper_check_modeset [drm_kms_helper]] [CRTC:65:pipe B] active changed
[drm:drm_atomic_helper_check_modeset [drm_kms_helper]] Updating routing for [CONNECTOR:123:DP-5]
[drm:drm_atomic_helper_check_modeset [drm_kms_helper]] Disabling [CONNECTOR:123:DP-5]
[drm:drm_atomic_get_private_obj_state [drm]] Added new private object 0000000025844636 state 000000009fd2899a to 000000003a13d7b8
WARNING: CPU: 6 PID: 1070 at drivers/gpu/drm/drm_dp_mst_topology.c:3153 drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper]
Modules linked in: fuse vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic joydev iTCO_wdt i915(O) wmi_bmof intel_rapl btusb btrtl x86_pkg_temp_thermal btbcm btintel coretemp i2c_algo_bit drm_kms_helper(O) crc32_pclmul snd_hda_intel syscopyarea sysfillrect snd_hda_codec sysimgblt snd_hda_core bluetooth fb_sys_fops snd_pcm pcspkr drm(O) psmouse snd_timer mei_me ecdh_generic i2c_i801 mei i2c_core ucsi_acpi typec_ucsi typec wmi thinkpad_acpi ledtrig_audio snd soundcore tpm_tis rfkill tpm_tis_core video tpm acpi_pad pcc_cpufreq uas usb_storage crc32c_intel nvme serio_raw xhci_pci nvme_core xhci_hcd
CPU: 6 PID: 1070 Comm: gnome-shell Tainted: G        W  O      5.0.0-rc2Lyude-Test+ #1
Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
RIP: 0010:drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper]
Code: 00 4c 39 6d f0 74 49 48 8d 7b 10 48 89 f9 48 c1 e9 03 42 80 3c 21 00 0f 85 d2 00 00 00 48 8b 6b 10 48 8d 5d f0 49 39 ee 75 c5 <0f> 0b 48 c7 c7 c0 78 b3 a0 48 89 c2 4c 89 ee e8 03 6c aa ff b8 ea
RSP: 0018:ffff88841235f268 EFLAGS: 00010246
RAX: ffff88841bf12ab0 RBX: ffff88841bf12aa8 RCX: 1ffff110837e2557
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffed108246bde0
RBP: ffff88841bf12ab8 R08: ffffed1083db3c93 R09: ffffed1083db3c92
R10: ffffed1083db3c92 R11: ffff88841ed9e497 R12: ffff888419555d80
R13: ffff8883bc499100 R14: ffff88841bf12ab8 R15: 0000000000000000
FS:  00007f16fbd4cd00(0000) GS:ffff88841ed80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1687c9f000 CR3: 00000003ba3cc003 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 drm_atomic_helper_check_modeset+0xf21/0x2f50 [drm_kms_helper]
 ? drm_atomic_helper_commit_modeset_enables+0xa90/0xa90 [drm_kms_helper]
 ? __printk_safe_exit+0x10/0x10
 ? save_stack+0x8c/0xb0
 ? vprintk_func+0x96/0x1bf
 ? __printk_safe_exit+0x10/0x10
 intel_atomic_check+0x234/0x4750 [i915]
 ? printk+0x9f/0xc5
 ? kmsg_dump_rewind_nolock+0xd9/0xd9
 ? _raw_spin_lock_irqsave+0xa4/0x140
 ? drm_atomic_check_only+0xb1/0x28b0 [drm]
 ? drm_dbg+0x186/0x1b0 [drm]
 ? drm_dev_dbg+0x200/0x200 [drm]
 ? intel_link_compute_m_n+0xb0/0xb0 [i915]
 ? drm_mode_put_tile_group+0x20/0x20 [drm]
 ? skl_plane_format_mod_supported+0x17f/0x1b0 [i915]
 ? drm_plane_check_pixel_format+0x14a/0x310 [drm]
 drm_atomic_check_only+0x13c4/0x28b0 [drm]
 ? drm_state_info+0x220/0x220 [drm]
 ? drm_atomic_helper_disable_plane+0x1d0/0x1d0 [drm_kms_helper]
 ? pick_single_encoder_for_connector+0xe0/0xe0 [drm_kms_helper]
 ? kasan_unpoison_shadow+0x35/0x40
 drm_atomic_commit+0x3b/0x100 [drm]
 drm_atomic_helper_set_config+0xd5/0x100 [drm_kms_helper]
 drm_mode_setcrtc+0x636/0x1660 [drm]
 ? vprintk_func+0x96/0x1bf
 ? drm_dev_dbg+0x200/0x200 [drm]
 ? drm_mode_getcrtc+0x790/0x790 [drm]
 ? printk+0x9f/0xc5
 ? mutex_unlock+0x1d/0x40
 ? drm_mode_addfb2+0x2e9/0x3a0 [drm]
 ? rcu_sync_dtor+0x2e0/0x2e0
 ? drm_dbg+0x186/0x1b0 [drm]
 ? set_page_dirty+0x271/0x4d0
 drm_ioctl_kernel+0x203/0x290 [drm]
 ? drm_mode_getcrtc+0x790/0x790 [drm]
 ? drm_setversion+0x7f0/0x7f0 [drm]
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x34/0x70
 drm_ioctl+0x445/0x950 [drm]
 ? drm_mode_getcrtc+0x790/0x790 [drm]
 ? drm_getunique+0x220/0x220 [drm]
 ? expand_files.part.10+0x920/0x920
 do_vfs_ioctl+0x1a1/0x13d0
 ? ioctl_preallocate+0x2b0/0x2b0
 ? __fget_light+0x2d6/0x390
 ? schedule+0xd7/0x2e0
 ? fget_raw+0x10/0x10
 ? apic_timer_interrupt+0xa/0x20
 ? apic_timer_interrupt+0xa/0x20
 ? rcu_cleanup_dead_rnp+0x2c0/0x2c0
 ksys_ioctl+0x60/0x90
 __x64_sys_ioctl+0x6f/0xb0
 do_syscall_64+0x136/0x440
 ? syscall_return_slowpath+0x2d0/0x2d0
 ? do_page_fault+0x89/0x330
 ? __do_page_fault+0x9c0/0x9c0
 ? prepare_exit_to_usermode+0x188/0x200
 ? perf_trace_sys_enter+0x1090/0x1090
 ? __x64_sys_sigaltstack+0x280/0x280
 ? __put_user_4+0x1c/0x30
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f16ff89a09b
Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff001232b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fff001232f0 RCX: 00007f16ff89a09b
RDX: 00007fff001232f0 RSI: 00000000c06864a2 RDI: 000000000000000b
RBP: 00007fff001232f0 R08: 0000000000000000 R09: 000055a79d484460
R10: 000055a79d44e770 R11: 0000000000000246 R12: 00000000c06864a2
R13: 000000000000000b R14: 0000000000000000 R15: 000055a79d44e770
WARNING: CPU: 6 PID: 1070 at drivers/gpu/drm/drm_dp_mst_topology.c:3153 drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper]
---[ end trace d536c05c13c83be2 ]---
[drm:drm_dp_atomic_release_vcpi_slots [drm_kms_helper]] *ERROR* no VCPI for [MST PORT:00000000f9e2b143] found in mst state 000000009fd2899a

This appears to be happening because we destroy the VCPI allocations
when disabling all connected displays while suspending, and those VCPI
allocations don't get restored on resume due to failing to restore the
atomic state.

So, fix this by introducing the suspending option to
drm_atomic_helper_duplicate_state() and use that to indicate in the
atomic state that it's being used for suspending or resuming the system,
and thus needs to be fixed up by the driver. We can then use the new
state->suspend_or_resume hook to fixup the atomic VCPI helpers so they
merely save and restore the VCPI allocations for such states instead of
performing error checking on them. This fixes the warnings during
suspend/resume when a topology is removed from the system, and allows
us to once again restore the atomic state on resume without any issues.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations")
Cc: Daniel Vetter <daniel@ffwll.ch>
---
 drivers/gpu/drm/drm_atomic_helper.c   | 16 +++++++++---
 drivers/gpu/drm/drm_dp_mst_topology.c | 36 ++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_display.c  |  4 +--
 include/drm/drm_atomic.h              | 11 ++++++++
 include/drm/drm_atomic_helper.h       |  3 ++-
 5 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 6fe2303fccd9..7d23dafdffbc 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -332,8 +332,15 @@ update_connector_routing(struct drm_atomic_state *state,
 	 * about is ensuring that userspace can't do anything but shut off the
 	 * display on a connector that was destroyed after its been notified,
 	 * not before.
+	 *
+	 * Additionally, we also want to ignore connector registration when
+	 * we're trying to restore an atomic state during system resume since
+	 * there's a chance the connector may have been destroyed during the
+	 * process, but it's better to ignore that then cause
+	 * drm_atomic_helper_resume() to fail.
 	 */
-	if (drm_connector_is_unregistered(connector) && crtc_state->active) {
+	if (!state->suspend_or_resume &&
+	    drm_connector_is_unregistered(connector) && crtc_state->active) {
 		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
 				 connector->base.id, connector->name);
 		return -EINVAL;
@@ -3144,6 +3151,7 @@ EXPORT_SYMBOL(drm_atomic_helper_shutdown);
  * drm_atomic_helper_duplicate_state - duplicate an atomic state object
  * @dev: DRM device
  * @ctx: lock acquisition context
+ * @suspending: If this state is being duplicated as part of system suspend
  *
  * Makes a copy of the current atomic state by looping over all objects and
  * duplicating their respective states. This is used for example by suspend/
@@ -3166,7 +3174,8 @@ EXPORT_SYMBOL(drm_atomic_helper_shutdown);
  */
 struct drm_atomic_state *
 drm_atomic_helper_duplicate_state(struct drm_device *dev,
-				  struct drm_modeset_acquire_ctx *ctx)
+				  struct drm_modeset_acquire_ctx *ctx,
+				  bool suspending)
 {
 	struct drm_atomic_state *state;
 	struct drm_connector *conn;
@@ -3180,6 +3189,7 @@ drm_atomic_helper_duplicate_state(struct drm_device *dev,
 		return ERR_PTR(-ENOMEM);
 
 	state->acquire_ctx = ctx;
+	state->suspend_or_resume = suspending;
 
 	drm_for_each_crtc(crtc, dev) {
 		struct drm_crtc_state *crtc_state;
@@ -3263,7 +3273,7 @@ struct drm_atomic_state *drm_atomic_helper_suspend(struct drm_device *dev)
 
 	DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, 0, err);
 
-	state = drm_atomic_helper_duplicate_state(dev, &ctx);
+	state = drm_atomic_helper_duplicate_state(dev, &ctx, true);
 	if (IS_ERR(state))
 		goto unlock;
 
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 8ba0e5b368da..371010f8d42e 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -3032,6 +3032,10 @@ static int drm_dp_init_vcpi(struct drm_dp_mst_topology_mgr *mgr,
  * @port as needed. It is not OK however, to call this function and
  * drm_dp_atomic_release_vcpi_slots() in the same atomic check phase.
  *
+ * When &drm_atomic_state.suspend_or_resume is set to %true%, this function
+ * will not perform any error checking and will instead simply retrieve the
+ * previously recorded VCPI allocation that was present before system suspend.
+ *
  * See also:
  * drm_dp_atomic_release_vcpi_slots()
  * drm_dp_mst_atomic_check()
@@ -3052,9 +3056,11 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
 	if (IS_ERR(topology_state))
 		return PTR_ERR(topology_state);
 
-	port = drm_dp_mst_topology_get_port_validated(mgr, port);
-	if (port == NULL)
-		return -EINVAL;
+	if (!state->suspend_or_resume) {
+		port = drm_dp_mst_topology_get_port_validated(mgr, port);
+		if (!port)
+			return -EINVAL;
+	}
 
 	/* Find the current allocation for this port, if any */
 	list_for_each_entry(pos, &topology_state->vcpis, next) {
@@ -3062,6 +3068,19 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
 			vcpi = pos;
 			prev_slots = vcpi->vcpi;
 
+			/*
+			 * When resuming, we just want to restore the previous
+			 * VCPI without doing error checking
+			 */
+			if (state->suspend_or_resume) {
+				DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] [MST PORT:%p] restoring VCPI of %d\n",
+						 port->connector->base.id,
+						 port->connector->name,
+						 port, prev_slots);
+
+				return prev_slots;
+			}
+
 			/*
 			 * This should never happen, unless the driver tries
 			 * releasing and allocating the same VCPI allocation,
@@ -3124,6 +3143,11 @@ EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
  * drm_dp_atomic_find_vcpi_slots() on the same @port in a single atomic check
  * phase.
  *
+ * When &drm_atomic_state.suspend_or_resume is set, this function will not
+ * modify the VCPI allocations in &drm_dp_mst_topology_state.vcpis, so that
+ * those VCPI allocations may be restored as-is during system resume. In this
+ * scenario, this function will always return 0.
+ *
  * See also:
  * drm_dp_atomic_find_vcpi_slots()
  * drm_dp_mst_atomic_check()
@@ -3140,6 +3164,12 @@ int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
 	struct drm_dp_vcpi_allocation *pos;
 	bool found = false;
 
+	/* During suspend, just keep our VCPI allocations as-is in the atomic
+	 * state so they can be restored as-is at resume
+	 */
+	if (state->suspend_or_resume)
+		return 0;
+
 	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
 	if (IS_ERR(topology_state))
 		return PTR_ERR(topology_state);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 4a282532af81..4c70f1531119 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3824,7 +3824,7 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
 	 * Disabling the crtcs gracefully seems nicer. Also the
 	 * g33 docs say we should at least disable all the planes.
 	 */
-	state = drm_atomic_helper_duplicate_state(dev, ctx);
+	state = drm_atomic_helper_duplicate_state(dev, ctx, false);
 	if (IS_ERR(state)) {
 		ret = PTR_ERR(state);
 		DRM_ERROR("Duplicating state failed with %i\n", ret);
@@ -15043,7 +15043,7 @@ static void sanitize_watermarks(struct drm_device *dev)
 		goto fail;
 	}
 
-	state = drm_atomic_helper_duplicate_state(dev, &ctx);
+	state = drm_atomic_helper_duplicate_state(dev, &ctx, false);
 	if (WARN_ON(IS_ERR(state)))
 		goto fail;
 
diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h
index 811b4a92568f..0897f51dcd62 100644
--- a/include/drm/drm_atomic.h
+++ b/include/drm/drm_atomic.h
@@ -329,6 +329,17 @@ struct drm_atomic_state {
 	bool allow_modeset : 1;
 	bool legacy_cursor_update : 1;
 	bool async_update : 1;
+	/**
+	 * @suspend_or_resume:
+	 *
+	 * Indicates whether or not this atomic state is being saved or
+	 * committed as part of suspending or resuming the system
+	 * respectively. Drivers and atomic helpers hould use this to fixup
+	 * normal state inconsistencies that might arise between system
+	 * suspend and resume, so as to ensure that restoring the atomic state
+	 * at resume never fails.
+	 */
+	bool suspend_or_resume : 1;
 	struct __drm_planes_state *planes;
 	struct __drm_crtcs_state *crtcs;
 	int num_connector;
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
index 58214be3bf3d..359f557b6898 100644
--- a/include/drm/drm_atomic_helper.h
+++ b/include/drm/drm_atomic_helper.h
@@ -129,7 +129,8 @@ int drm_atomic_helper_disable_all(struct drm_device *dev,
 void drm_atomic_helper_shutdown(struct drm_device *dev);
 struct drm_atomic_state *
 drm_atomic_helper_duplicate_state(struct drm_device *dev,
-				  struct drm_modeset_acquire_ctx *ctx);
+				  struct drm_modeset_acquire_ctx *ctx,
+				  bool suspending);
 struct drm_atomic_state *drm_atomic_helper_suspend(struct drm_device *dev);
 int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state,
 					      struct drm_modeset_acquire_ctx *ctx);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] drm/i915: Always allocate VCPI during system resume
  2019-01-29 18:39 [PATCH 0/3] drm/dp_mst: Fix regressions from new atomic VCPI helpers Lyude Paul
@ 2019-01-29 18:39   ` Lyude Paul
  2019-01-29 18:39   ` Lyude Paul
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Lyude Paul @ 2019-01-29 18:39 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Daniel Vetter, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, linux-kernel

Since there's a chance MST topologies can be removed while the system is
in suspend, there's also a chance that the connectors in the atomic
state which we try to restore on system resume will have been
unregistered if they were part of an MST topology that was removed
mid-suspend. In such situations, we'll currently skip recalculating the
VCPI. Normally this would be safe since we don't expect to be allowed to
enable displays on unregistered connections, but since we need to try
our best to restore even partially invalid atomic states on system
resume we end up introducing the chance that we try enabling a display
on a now-unregistered connector. This of course results on a blow up
during system resume:

[ 1894.177788] [drm:pipe_config_err [i915]] *ERROR* mismatch in dp_m_n (expected tu 0 gmch 1730150/8388608 link 288358/1048576, or tu 0 gmch 0/0 link 0/0, found tu 64, gmch 1730150/8388608 link 288358/1048576)
[ 1894.177795] ------------[ cut here ]------------
[ 1894.177799] pipe state doesn't match!
[ 1894.177905] WARNING: CPU: 2 PID: 1894 at drivers/gpu/drm/i915/intel_display.c:12292 intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
[ 1894.177909] Modules linked in: i915(O) drm_kms_helper(O) drm(O) vfat fat joydev iTCO_wdt wmi_bmof btusb btrtl btbcm intel_rapl btintel i2c_algo_bit x86_pkg_temp_thermal bluetooth syscopyarea coretemp sysfillrect sysimgblt crc32_pclmul fb_sys_fops ucsi_acpi psmouse ecdh_generic pcspkr typec_ucsi i2c_i801 typec mei_me mei i2c_core wmi thinkpad_acpi ledtrig_audio snd soundcore rfkill tpm_tis tpm_tis_core video pcc_cpufreq tpm acpi_pad uas usb_storage crc32c_intel nvme serio_raw nvme_core xhci_pci xhci_hcd [last unloaded: drm]
[ 1894.177990] CPU: 2 PID: 1894 Comm: kworker/u16:8 Tainted: G           O      5.0.0-rc4Lyude-Test+ #9
[ 1894.177994] Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
[ 1894.178005] Workqueue: events_unbound async_run_entry_fn
[ 1894.178079] RIP: 0010:intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
[ 1894.178085] Code: ba 01 00 00 00 4c 89 4c 24 10 e8 3d 46 01 00 4c 8b 4c 24 10 e9 a4 fb ff ff e8 b2 18 bd e0 0f 0b e9 a5 fd ff ff e8 a6 18 bd e0 <0f> 0b e9 f1 f9 ff ff e8 9a 18 bd e0 0f 0b 0f b6 44 24 18 e9 23 f9
[ 1894.178090] RSP: 0000:ffffc90000553c08 EFLAGS: 00010286
[ 1894.178096] RAX: 0000000000000000 RBX: ffff888459edc000 RCX: 0000000000000006
[ 1894.178101] RDX: 0000000000000007 RSI: ffff8884591e2f90 RDI: ffff88845e295690
[ 1894.178105] RBP: ffff888454e27000 R08: 0000000000000000 R09: 0000000000000000
[ 1894.178108] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888454e22000
[ 1894.178112] R13: ffff888454e21000 R14: ffff8883ca680750 R15: ffff8883ca680758
[ 1894.178118] FS:  0000000000000000(0000) GS:ffff88845e280000(0000) knlGS:0000000000000000
[ 1894.178123] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1894.178127] CR2: 0000000000000000 CR3: 0000000002214001 CR4: 00000000003606e0
[ 1894.178131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1894.178135] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1894.178139] Call Trace:
[ 1894.178224]  intel_atomic_commit+0x240/0x320 [i915]
[ 1894.178251]  drm_atomic_helper_commit_duplicated_state+0xc9/0xe0 [drm_kms_helper]
[ 1894.178321]  __intel_display_resume+0x82/0xd0 [i915]
[ 1894.178391]  intel_display_resume+0xcf/0x120 [i915]
[ 1894.178453]  i915_drm_resume+0xb1/0x100 [i915]
[ 1894.178465]  ? pci_pm_suspend_late+0x30/0x30
[ 1894.178473]  dpm_run_callback+0x70/0x210
[ 1894.178484]  device_resume+0xae/0x1f0
[ 1894.178493]  async_resume+0x19/0x30
[ 1894.178502]  async_run_entry_fn+0x39/0x160
[ 1894.178513]  process_one_work+0x23c/0x590
[ 1894.178529]  worker_thread+0x30/0x380
[ 1894.178539]  ? rescuer_thread+0x340/0x340
[ 1894.178545]  kthread+0x112/0x130
[ 1894.178552]  ? kthread_create_on_node+0x60/0x60
[ 1894.178563]  ret_from_fork+0x3a/0x50
[ 1894.178582] irq event stamp: 76782
[ 1894.178591] hardirqs last  enabled at (76781): [<ffffffff8112c135>] vprintk_emit+0x2c5/0x2d0
[ 1894.178600] hardirqs last disabled at (76782): [<ffffffff81001ae8>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 1894.178609] softirqs last  enabled at (76734): [<ffffffff81c0035d>] __do_softirq+0x35d/0x412
[ 1894.178618] softirqs last disabled at (76727): [<ffffffff810bf830>] irq_exit+0xe0/0xf0
[ 1894.178687] WARNING: CPU: 2 PID: 1894 at drivers/gpu/drm/i915/intel_display.c:12292 intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
[ 1894.178692] ---[ end trace 0df08c0b9a67376e ]---

So, fix this by using the new drm_atomic_state.suspend_or_resume hook in
order to force us to retrieve a VCPI allocation when restoring a pipe's
atomic state. This is safe, since drm_dp_atomic_find_vcpi_slots() will
just return the VCPI allocation we had pre-suspend.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations")
---
 drivers/gpu/drm/i915/intel_dp_mst.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
index cdb83d294cdd..04c9a16fdc39 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -80,8 +80,13 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 	mst_pbn = drm_dp_calc_pbn_mode(adjusted_mode->crtc_clock, bpp);
 	pipe_config->pbn = mst_pbn;
 
-	/* Zombie connectors can't have VCPI slots */
-	if (!drm_connector_is_unregistered(connector)) {
+	/*
+	 * Zombie connectors can't have VCPI slots and won't be turned on,
+	 * except during resume where we must make sure we restore the
+	 * previous VCPI allocations
+	 */
+	if (!drm_connector_is_unregistered(connector) ||
+	    state->suspend_or_resume) {
 		slots = drm_dp_atomic_find_vcpi_slots(state,
 						      &intel_dp->mst_mgr,
 						      port,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] drm/i915: Always allocate VCPI during system resume
@ 2019-01-29 18:39   ` Lyude Paul
  0 siblings, 0 replies; 15+ messages in thread
From: Lyude Paul @ 2019-01-29 18:39 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: David Airlie, linux-kernel, Rodrigo Vivi

Since there's a chance MST topologies can be removed while the system is
in suspend, there's also a chance that the connectors in the atomic
state which we try to restore on system resume will have been
unregistered if they were part of an MST topology that was removed
mid-suspend. In such situations, we'll currently skip recalculating the
VCPI. Normally this would be safe since we don't expect to be allowed to
enable displays on unregistered connections, but since we need to try
our best to restore even partially invalid atomic states on system
resume we end up introducing the chance that we try enabling a display
on a now-unregistered connector. This of course results on a blow up
during system resume:

[ 1894.177788] [drm:pipe_config_err [i915]] *ERROR* mismatch in dp_m_n (expected tu 0 gmch 1730150/8388608 link 288358/1048576, or tu 0 gmch 0/0 link 0/0, found tu 64, gmch 1730150/8388608 link 288358/1048576)
[ 1894.177795] ------------[ cut here ]------------
[ 1894.177799] pipe state doesn't match!
[ 1894.177905] WARNING: CPU: 2 PID: 1894 at drivers/gpu/drm/i915/intel_display.c:12292 intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
[ 1894.177909] Modules linked in: i915(O) drm_kms_helper(O) drm(O) vfat fat joydev iTCO_wdt wmi_bmof btusb btrtl btbcm intel_rapl btintel i2c_algo_bit x86_pkg_temp_thermal bluetooth syscopyarea coretemp sysfillrect sysimgblt crc32_pclmul fb_sys_fops ucsi_acpi psmouse ecdh_generic pcspkr typec_ucsi i2c_i801 typec mei_me mei i2c_core wmi thinkpad_acpi ledtrig_audio snd soundcore rfkill tpm_tis tpm_tis_core video pcc_cpufreq tpm acpi_pad uas usb_storage crc32c_intel nvme serio_raw nvme_core xhci_pci xhci_hcd [last unloaded: drm]
[ 1894.177990] CPU: 2 PID: 1894 Comm: kworker/u16:8 Tainted: G           O      5.0.0-rc4Lyude-Test+ #9
[ 1894.177994] Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
[ 1894.178005] Workqueue: events_unbound async_run_entry_fn
[ 1894.178079] RIP: 0010:intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
[ 1894.178085] Code: ba 01 00 00 00 4c 89 4c 24 10 e8 3d 46 01 00 4c 8b 4c 24 10 e9 a4 fb ff ff e8 b2 18 bd e0 0f 0b e9 a5 fd ff ff e8 a6 18 bd e0 <0f> 0b e9 f1 f9 ff ff e8 9a 18 bd e0 0f 0b 0f b6 44 24 18 e9 23 f9
[ 1894.178090] RSP: 0000:ffffc90000553c08 EFLAGS: 00010286
[ 1894.178096] RAX: 0000000000000000 RBX: ffff888459edc000 RCX: 0000000000000006
[ 1894.178101] RDX: 0000000000000007 RSI: ffff8884591e2f90 RDI: ffff88845e295690
[ 1894.178105] RBP: ffff888454e27000 R08: 0000000000000000 R09: 0000000000000000
[ 1894.178108] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888454e22000
[ 1894.178112] R13: ffff888454e21000 R14: ffff8883ca680750 R15: ffff8883ca680758
[ 1894.178118] FS:  0000000000000000(0000) GS:ffff88845e280000(0000) knlGS:0000000000000000
[ 1894.178123] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1894.178127] CR2: 0000000000000000 CR3: 0000000002214001 CR4: 00000000003606e0
[ 1894.178131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1894.178135] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1894.178139] Call Trace:
[ 1894.178224]  intel_atomic_commit+0x240/0x320 [i915]
[ 1894.178251]  drm_atomic_helper_commit_duplicated_state+0xc9/0xe0 [drm_kms_helper]
[ 1894.178321]  __intel_display_resume+0x82/0xd0 [i915]
[ 1894.178391]  intel_display_resume+0xcf/0x120 [i915]
[ 1894.178453]  i915_drm_resume+0xb1/0x100 [i915]
[ 1894.178465]  ? pci_pm_suspend_late+0x30/0x30
[ 1894.178473]  dpm_run_callback+0x70/0x210
[ 1894.178484]  device_resume+0xae/0x1f0
[ 1894.178493]  async_resume+0x19/0x30
[ 1894.178502]  async_run_entry_fn+0x39/0x160
[ 1894.178513]  process_one_work+0x23c/0x590
[ 1894.178529]  worker_thread+0x30/0x380
[ 1894.178539]  ? rescuer_thread+0x340/0x340
[ 1894.178545]  kthread+0x112/0x130
[ 1894.178552]  ? kthread_create_on_node+0x60/0x60
[ 1894.178563]  ret_from_fork+0x3a/0x50
[ 1894.178582] irq event stamp: 76782
[ 1894.178591] hardirqs last  enabled at (76781): [<ffffffff8112c135>] vprintk_emit+0x2c5/0x2d0
[ 1894.178600] hardirqs last disabled at (76782): [<ffffffff81001ae8>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 1894.178609] softirqs last  enabled at (76734): [<ffffffff81c0035d>] __do_softirq+0x35d/0x412
[ 1894.178618] softirqs last disabled at (76727): [<ffffffff810bf830>] irq_exit+0xe0/0xf0
[ 1894.178687] WARNING: CPU: 2 PID: 1894 at drivers/gpu/drm/i915/intel_display.c:12292 intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
[ 1894.178692] ---[ end trace 0df08c0b9a67376e ]---

So, fix this by using the new drm_atomic_state.suspend_or_resume hook in
order to force us to retrieve a VCPI allocation when restoring a pipe's
atomic state. This is safe, since drm_dp_atomic_find_vcpi_slots() will
just return the VCPI allocation we had pre-suspend.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations")
---
 drivers/gpu/drm/i915/intel_dp_mst.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
index cdb83d294cdd..04c9a16fdc39 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -80,8 +80,13 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 	mst_pbn = drm_dp_calc_pbn_mode(adjusted_mode->crtc_clock, bpp);
 	pipe_config->pbn = mst_pbn;
 
-	/* Zombie connectors can't have VCPI slots */
-	if (!drm_connector_is_unregistered(connector)) {
+	/*
+	 * Zombie connectors can't have VCPI slots and won't be turned on,
+	 * except during resume where we must make sure we restore the
+	 * previous VCPI allocations
+	 */
+	if (!drm_connector_is_unregistered(connector) ||
+	    state->suspend_or_resume) {
 		slots = drm_dp_atomic_find_vcpi_slots(state,
 						      &intel_dp->mst_mgr,
 						      port,
-- 
2.20.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for drm/dp_mst: Fix regressions from new atomic VCPI helpers
  2019-01-29 18:39 [PATCH 0/3] drm/dp_mst: Fix regressions from new atomic VCPI helpers Lyude Paul
                   ` (2 preceding siblings ...)
  2019-01-29 18:39   ` Lyude Paul
@ 2019-01-29 19:16 ` Patchwork
  2019-01-29 19:37 ` ✓ Fi.CI.BAT: success " Patchwork
  2019-01-29 22:03 ` ✓ Fi.CI.IGT: " Patchwork
  5 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2019-01-29 19:16 UTC (permalink / raw)
  To: Lyude Paul; +Cc: intel-gfx

== Series Details ==

Series: drm/dp_mst: Fix regressions from new atomic VCPI helpers
URL   : https://patchwork.freedesktop.org/series/55933/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
5eace8601b7f drm/dp_mst: Fix unbalanced malloc ref in drm_dp_mst_deallocate_vcpi()
e8ee56a00aab drm/atomic: Add drm_atomic_state->suspend_or_resume
-:23: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#23: 
[drm:drm_atomic_helper_check_modeset [drm_kms_helper]] [CRTC:65:pipe B] active changed

-:309: WARNING:BOOL_BITFIELD: Avoid using bool as bitfield.  Prefer bool bitfields as unsigned int or u<8|16|32>
#309: FILE: include/drm/drm_atomic.h:342:
+	bool suspend_or_resume : 1;

total: 0 errors, 2 warnings, 0 checks, 155 lines checked
8650b15d482b drm/i915: Always allocate VCPI during system resume

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* ✓ Fi.CI.BAT: success for drm/dp_mst: Fix regressions from new atomic VCPI helpers
  2019-01-29 18:39 [PATCH 0/3] drm/dp_mst: Fix regressions from new atomic VCPI helpers Lyude Paul
                   ` (3 preceding siblings ...)
  2019-01-29 19:16 ` ✗ Fi.CI.CHECKPATCH: warning for drm/dp_mst: Fix regressions from new atomic VCPI helpers Patchwork
@ 2019-01-29 19:37 ` Patchwork
  2019-01-29 22:03 ` ✓ Fi.CI.IGT: " Patchwork
  5 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2019-01-29 19:37 UTC (permalink / raw)
  To: Lyude Paul; +Cc: intel-gfx

== Series Details ==

Series: drm/dp_mst: Fix regressions from new atomic VCPI helpers
URL   : https://patchwork.freedesktop.org/series/55933/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5501 -> Patchwork_12078
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/55933/revisions/1/mbox/

Known issues
------------

  Here are the changes found in Patchwork_12078 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_busy@basic-flip-a:
    - fi-gdg-551:         NOTRUN -> FAIL [fdo#103182] +1

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103191] / [fdo#107362]

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
    - fi-blb-e6850:       PASS -> INCOMPLETE [fdo#107718]

  
#### Possible fixes ####

  * igt@i915_module_load@reload-with-fault-injection:
    - fi-kbl-7567u:       DMESG-WARN [fdo#105602] / [fdo#108529] -> PASS +1

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-kbl-7567u:       DMESG-FAIL [fdo#105079] -> PASS

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-b-frame-sequence:
    - fi-byt-clapper:     FAIL [fdo#103191] / [fdo#107362] -> PASS +1

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-c:
    - fi-kbl-7567u:       {SKIP} [fdo#109271] -> PASS +33

  * igt@pm_rpm@module-reload:
    - fi-kbl-7567u:       DMESG-WARN [fdo#108529] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#105079]: https://bugs.freedesktop.org/show_bug.cgi?id=105079
  [fdo#105602]: https://bugs.freedesktop.org/show_bug.cgi?id=105602
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108529]: https://bugs.freedesktop.org/show_bug.cgi?id=108529
  [fdo#108915]: https://bugs.freedesktop.org/show_bug.cgi?id=108915
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278


Participating hosts (46 -> 41)
------------------------------

  Additional (2): fi-icl-y fi-gdg-551 
  Missing    (7): fi-kbl-soraka fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-bdw-samus 


Build changes
-------------

    * Linux: CI_DRM_5501 -> Patchwork_12078

  CI_DRM_5501: 5aac0b69f3a3085b9b8a9924cb19cb94f7860dbe @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4798: 998e0a4aedf10fb5f7c271018cd80d874668bf55 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12078: 8650b15d482b21cca0073b796f131ff26f19bdba @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

8650b15d482b drm/i915: Always allocate VCPI during system resume
e8ee56a00aab drm/atomic: Add drm_atomic_state->suspend_or_resume
5eace8601b7f drm/dp_mst: Fix unbalanced malloc ref in drm_dp_mst_deallocate_vcpi()

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12078/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] drm/atomic: Add drm_atomic_state->suspend_or_resume
  2019-01-29 18:39   ` Lyude Paul
  (?)
@ 2019-01-29 20:30   ` Daniel Vetter
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2019-01-29 20:30 UTC (permalink / raw)
  To: Lyude Paul
  Cc: intel-gfx, dri-devel, Daniel Vetter, Maarten Lankhorst,
	Maxime Ripard, Sean Paul, David Airlie, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, linux-kernel

On Tue, Jan 29, 2019 at 01:39:26PM -0500, Lyude Paul wrote:
> Since
> 
> commit 39b50c603878 ("drm/atomic_helper: Stop modesets on unregistered
> connectors harder")
> 
> We've been failing atomic checks if they try to enable new displays on
> unregistered connectors. This is fine except for the one situation that
> breaks atomic assumptions: suspend/resume. If a connector is
> unregistered before we attempt to restore the atomic state, something we
> end up failing the atomic check that happens when trying to restore the
> state during resume.
> 
> Normally this would be OK: we try our best to make sure that the atomic
> state pre-suspend can be restored post-suspend, but failures at that
> point usually don't cause problems. That is of course, until we
> introduced the new atomic MST VCPI helpers:
> 
> [drm:drm_atomic_helper_check_modeset [drm_kms_helper]] [CRTC:65:pipe B] active changed
> [drm:drm_atomic_helper_check_modeset [drm_kms_helper]] Updating routing for [CONNECTOR:123:DP-5]
> [drm:drm_atomic_helper_check_modeset [drm_kms_helper]] Disabling [CONNECTOR:123:DP-5]
> [drm:drm_atomic_get_private_obj_state [drm]] Added new private object 0000000025844636 state 000000009fd2899a to 000000003a13d7b8
> WARNING: CPU: 6 PID: 1070 at drivers/gpu/drm/drm_dp_mst_topology.c:3153 drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper]
> Modules linked in: fuse vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic joydev iTCO_wdt i915(O) wmi_bmof intel_rapl btusb btrtl x86_pkg_temp_thermal btbcm btintel coretemp i2c_algo_bit drm_kms_helper(O) crc32_pclmul snd_hda_intel syscopyarea sysfillrect snd_hda_codec sysimgblt snd_hda_core bluetooth fb_sys_fops snd_pcm pcspkr drm(O) psmouse snd_timer mei_me ecdh_generic i2c_i801 mei i2c_core ucsi_acpi typec_ucsi typec wmi thinkpad_acpi ledtrig_audio snd soundcore tpm_tis rfkill tpm_tis_core video tpm acpi_pad pcc_cpufreq uas usb_storage crc32c_intel nvme serio_raw xhci_pci nvme_core xhci_hcd
> CPU: 6 PID: 1070 Comm: gnome-shell Tainted: G        W  O      5.0.0-rc2Lyude-Test+ #1
> Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
> RIP: 0010:drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper]
> Code: 00 4c 39 6d f0 74 49 48 8d 7b 10 48 89 f9 48 c1 e9 03 42 80 3c 21 00 0f 85 d2 00 00 00 48 8b 6b 10 48 8d 5d f0 49 39 ee 75 c5 <0f> 0b 48 c7 c7 c0 78 b3 a0 48 89 c2 4c 89 ee e8 03 6c aa ff b8 ea
> RSP: 0018:ffff88841235f268 EFLAGS: 00010246
> RAX: ffff88841bf12ab0 RBX: ffff88841bf12aa8 RCX: 1ffff110837e2557
> RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffed108246bde0
> RBP: ffff88841bf12ab8 R08: ffffed1083db3c93 R09: ffffed1083db3c92
> R10: ffffed1083db3c92 R11: ffff88841ed9e497 R12: ffff888419555d80
> R13: ffff8883bc499100 R14: ffff88841bf12ab8 R15: 0000000000000000
> FS:  00007f16fbd4cd00(0000) GS:ffff88841ed80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1687c9f000 CR3: 00000003ba3cc003 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  drm_atomic_helper_check_modeset+0xf21/0x2f50 [drm_kms_helper]
>  ? drm_atomic_helper_commit_modeset_enables+0xa90/0xa90 [drm_kms_helper]
>  ? __printk_safe_exit+0x10/0x10
>  ? save_stack+0x8c/0xb0
>  ? vprintk_func+0x96/0x1bf
>  ? __printk_safe_exit+0x10/0x10
>  intel_atomic_check+0x234/0x4750 [i915]
>  ? printk+0x9f/0xc5
>  ? kmsg_dump_rewind_nolock+0xd9/0xd9
>  ? _raw_spin_lock_irqsave+0xa4/0x140
>  ? drm_atomic_check_only+0xb1/0x28b0 [drm]
>  ? drm_dbg+0x186/0x1b0 [drm]
>  ? drm_dev_dbg+0x200/0x200 [drm]
>  ? intel_link_compute_m_n+0xb0/0xb0 [i915]
>  ? drm_mode_put_tile_group+0x20/0x20 [drm]
>  ? skl_plane_format_mod_supported+0x17f/0x1b0 [i915]
>  ? drm_plane_check_pixel_format+0x14a/0x310 [drm]
>  drm_atomic_check_only+0x13c4/0x28b0 [drm]
>  ? drm_state_info+0x220/0x220 [drm]
>  ? drm_atomic_helper_disable_plane+0x1d0/0x1d0 [drm_kms_helper]
>  ? pick_single_encoder_for_connector+0xe0/0xe0 [drm_kms_helper]
>  ? kasan_unpoison_shadow+0x35/0x40
>  drm_atomic_commit+0x3b/0x100 [drm]
>  drm_atomic_helper_set_config+0xd5/0x100 [drm_kms_helper]
>  drm_mode_setcrtc+0x636/0x1660 [drm]
>  ? vprintk_func+0x96/0x1bf
>  ? drm_dev_dbg+0x200/0x200 [drm]
>  ? drm_mode_getcrtc+0x790/0x790 [drm]
>  ? printk+0x9f/0xc5
>  ? mutex_unlock+0x1d/0x40
>  ? drm_mode_addfb2+0x2e9/0x3a0 [drm]
>  ? rcu_sync_dtor+0x2e0/0x2e0
>  ? drm_dbg+0x186/0x1b0 [drm]
>  ? set_page_dirty+0x271/0x4d0
>  drm_ioctl_kernel+0x203/0x290 [drm]
>  ? drm_mode_getcrtc+0x790/0x790 [drm]
>  ? drm_setversion+0x7f0/0x7f0 [drm]
>  ? __switch_to_asm+0x34/0x70
>  ? __switch_to_asm+0x34/0x70
>  drm_ioctl+0x445/0x950 [drm]
>  ? drm_mode_getcrtc+0x790/0x790 [drm]
>  ? drm_getunique+0x220/0x220 [drm]
>  ? expand_files.part.10+0x920/0x920
>  do_vfs_ioctl+0x1a1/0x13d0
>  ? ioctl_preallocate+0x2b0/0x2b0
>  ? __fget_light+0x2d6/0x390
>  ? schedule+0xd7/0x2e0
>  ? fget_raw+0x10/0x10
>  ? apic_timer_interrupt+0xa/0x20
>  ? apic_timer_interrupt+0xa/0x20
>  ? rcu_cleanup_dead_rnp+0x2c0/0x2c0
>  ksys_ioctl+0x60/0x90
>  __x64_sys_ioctl+0x6f/0xb0
>  do_syscall_64+0x136/0x440
>  ? syscall_return_slowpath+0x2d0/0x2d0
>  ? do_page_fault+0x89/0x330
>  ? __do_page_fault+0x9c0/0x9c0
>  ? prepare_exit_to_usermode+0x188/0x200
>  ? perf_trace_sys_enter+0x1090/0x1090
>  ? __x64_sys_sigaltstack+0x280/0x280
>  ? __put_user_4+0x1c/0x30
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f16ff89a09b
> Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007fff001232b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007fff001232f0 RCX: 00007f16ff89a09b
> RDX: 00007fff001232f0 RSI: 00000000c06864a2 RDI: 000000000000000b
> RBP: 00007fff001232f0 R08: 0000000000000000 R09: 000055a79d484460
> R10: 000055a79d44e770 R11: 0000000000000246 R12: 00000000c06864a2
> R13: 000000000000000b R14: 0000000000000000 R15: 000055a79d44e770
> WARNING: CPU: 6 PID: 1070 at drivers/gpu/drm/drm_dp_mst_topology.c:3153 drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper]
> ---[ end trace d536c05c13c83be2 ]---
> [drm:drm_dp_atomic_release_vcpi_slots [drm_kms_helper]] *ERROR* no VCPI for [MST PORT:00000000f9e2b143] found in mst state 000000009fd2899a
> 
> This appears to be happening because we destroy the VCPI allocations
> when disabling all connected displays while suspending, and those VCPI
> allocations don't get restored on resume due to failing to restore the
> atomic state.
> 
> So, fix this by introducing the suspending option to
> drm_atomic_helper_duplicate_state() and use that to indicate in the
> atomic state that it's being used for suspending or resuming the system,
> and thus needs to be fixed up by the driver. We can then use the new
> state->suspend_or_resume hook to fixup the atomic VCPI helpers so they
> merely save and restore the VCPI allocations for such states instead of
> performing error checking on them. This fixes the warnings during
> suspend/resume when a topology is removed from the system, and allows
> us to once again restore the atomic state on resume without any issues.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations")
> Cc: Daniel Vetter <daniel@ffwll.ch>
> ---
>  drivers/gpu/drm/drm_atomic_helper.c   | 16 +++++++++---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 36 ++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/intel_display.c  |  4 +--
>  include/drm/drm_atomic.h              | 11 ++++++++
>  include/drm/drm_atomic_helper.h       |  3 ++-
>  5 files changed, 61 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 6fe2303fccd9..7d23dafdffbc 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -332,8 +332,15 @@ update_connector_routing(struct drm_atomic_state *state,
>  	 * about is ensuring that userspace can't do anything but shut off the
>  	 * display on a connector that was destroyed after its been notified,
>  	 * not before.
> +	 *
> +	 * Additionally, we also want to ignore connector registration when
> +	 * we're trying to restore an atomic state during system resume since
> +	 * there's a chance the connector may have been destroyed during the
> +	 * process, but it's better to ignore that then cause
> +	 * drm_atomic_helper_resume() to fail.
>  	 */
> -	if (drm_connector_is_unregistered(connector) && crtc_state->active) {
> +	if (!state->suspend_or_resume &&
> +	    drm_connector_is_unregistered(connector) && crtc_state->active) {
>  		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
>  				 connector->base.id, connector->name);
>  		return -EINVAL;
> @@ -3144,6 +3151,7 @@ EXPORT_SYMBOL(drm_atomic_helper_shutdown);
>   * drm_atomic_helper_duplicate_state - duplicate an atomic state object
>   * @dev: DRM device
>   * @ctx: lock acquisition context
> + * @suspending: If this state is being duplicated as part of system suspend
>   *
>   * Makes a copy of the current atomic state by looping over all objects and
>   * duplicating their respective states. This is used for example by suspend/
> @@ -3166,7 +3174,8 @@ EXPORT_SYMBOL(drm_atomic_helper_shutdown);
>   */
>  struct drm_atomic_state *
>  drm_atomic_helper_duplicate_state(struct drm_device *dev,
> -				  struct drm_modeset_acquire_ctx *ctx)
> +				  struct drm_modeset_acquire_ctx *ctx,
> +				  bool suspending)

The duplicated state in the gpu reset code also wants suspending = true.
If you race a gpu hang with a hotunplug, we don't want blow up either. A
gpu reset which requires the display state reset too essentially looks
like a suspend/resume operation (the hw is completely off after the
reset).

For the watermark optimizer: That's run at boot-up, so right now 0 mst
connectors, and the setting doesn't matter. But if we ever get around to
fast-booting mst, then again I think blowing up when we race the unplug
with the watermark cleanup is probably not good.

tldr; I don't think we need this parameter. state->suspend_or_resume
becomes a bit misleading then, I'd call it state->duplicated. I think
that's also more accurately describing why we need special code: The issue
isn't resuming, it's trying to re-apply an atomic state that doesn't quite
fit to reality anymore.

Aside from the naming/api bikeshed looks good to me.
-Daniel

>  {
>  	struct drm_atomic_state *state;
>  	struct drm_connector *conn;
> @@ -3180,6 +3189,7 @@ drm_atomic_helper_duplicate_state(struct drm_device *dev,
>  		return ERR_PTR(-ENOMEM);
>  
>  	state->acquire_ctx = ctx;
> +	state->suspend_or_resume = suspending;
>  
>  	drm_for_each_crtc(crtc, dev) {
>  		struct drm_crtc_state *crtc_state;
> @@ -3263,7 +3273,7 @@ struct drm_atomic_state *drm_atomic_helper_suspend(struct drm_device *dev)
>  
>  	DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, 0, err);
>  
> -	state = drm_atomic_helper_duplicate_state(dev, &ctx);
> +	state = drm_atomic_helper_duplicate_state(dev, &ctx, true);
>  	if (IS_ERR(state))
>  		goto unlock;
>  
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 8ba0e5b368da..371010f8d42e 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -3032,6 +3032,10 @@ static int drm_dp_init_vcpi(struct drm_dp_mst_topology_mgr *mgr,
>   * @port as needed. It is not OK however, to call this function and
>   * drm_dp_atomic_release_vcpi_slots() in the same atomic check phase.
>   *
> + * When &drm_atomic_state.suspend_or_resume is set to %true%, this function
> + * will not perform any error checking and will instead simply retrieve the
> + * previously recorded VCPI allocation that was present before system suspend.
> + *
>   * See also:
>   * drm_dp_atomic_release_vcpi_slots()
>   * drm_dp_mst_atomic_check()
> @@ -3052,9 +3056,11 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
>  	if (IS_ERR(topology_state))
>  		return PTR_ERR(topology_state);
>  
> -	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> -	if (port == NULL)
> -		return -EINVAL;
> +	if (!state->suspend_or_resume) {
> +		port = drm_dp_mst_topology_get_port_validated(mgr, port);
> +		if (!port)
> +			return -EINVAL;
> +	}
>  
>  	/* Find the current allocation for this port, if any */
>  	list_for_each_entry(pos, &topology_state->vcpis, next) {
> @@ -3062,6 +3068,19 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
>  			vcpi = pos;
>  			prev_slots = vcpi->vcpi;
>  
> +			/*
> +			 * When resuming, we just want to restore the previous
> +			 * VCPI without doing error checking
> +			 */
> +			if (state->suspend_or_resume) {
> +				DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] [MST PORT:%p] restoring VCPI of %d\n",
> +						 port->connector->base.id,
> +						 port->connector->name,
> +						 port, prev_slots);
> +
> +				return prev_slots;
> +			}
> +
>  			/*
>  			 * This should never happen, unless the driver tries
>  			 * releasing and allocating the same VCPI allocation,
> @@ -3124,6 +3143,11 @@ EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
>   * drm_dp_atomic_find_vcpi_slots() on the same @port in a single atomic check
>   * phase.
>   *
> + * When &drm_atomic_state.suspend_or_resume is set, this function will not
> + * modify the VCPI allocations in &drm_dp_mst_topology_state.vcpis, so that
> + * those VCPI allocations may be restored as-is during system resume. In this
> + * scenario, this function will always return 0.
> + *
>   * See also:
>   * drm_dp_atomic_find_vcpi_slots()
>   * drm_dp_mst_atomic_check()
> @@ -3140,6 +3164,12 @@ int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
>  	struct drm_dp_vcpi_allocation *pos;
>  	bool found = false;
>  
> +	/* During suspend, just keep our VCPI allocations as-is in the atomic
> +	 * state so they can be restored as-is at resume
> +	 */
> +	if (state->suspend_or_resume)
> +		return 0;
> +
>  	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
>  	if (IS_ERR(topology_state))
>  		return PTR_ERR(topology_state);
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 4a282532af81..4c70f1531119 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -3824,7 +3824,7 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
>  	 * Disabling the crtcs gracefully seems nicer. Also the
>  	 * g33 docs say we should at least disable all the planes.
>  	 */
> -	state = drm_atomic_helper_duplicate_state(dev, ctx);
> +	state = drm_atomic_helper_duplicate_state(dev, ctx, false);
>  	if (IS_ERR(state)) {
>  		ret = PTR_ERR(state);
>  		DRM_ERROR("Duplicating state failed with %i\n", ret);
> @@ -15043,7 +15043,7 @@ static void sanitize_watermarks(struct drm_device *dev)
>  		goto fail;
>  	}
>  
> -	state = drm_atomic_helper_duplicate_state(dev, &ctx);
> +	state = drm_atomic_helper_duplicate_state(dev, &ctx, false);
>  	if (WARN_ON(IS_ERR(state)))
>  		goto fail;
>  
> diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h
> index 811b4a92568f..0897f51dcd62 100644
> --- a/include/drm/drm_atomic.h
> +++ b/include/drm/drm_atomic.h
> @@ -329,6 +329,17 @@ struct drm_atomic_state {
>  	bool allow_modeset : 1;
>  	bool legacy_cursor_update : 1;
>  	bool async_update : 1;
> +	/**
> +	 * @suspend_or_resume:
> +	 *
> +	 * Indicates whether or not this atomic state is being saved or
> +	 * committed as part of suspending or resuming the system
> +	 * respectively. Drivers and atomic helpers hould use this to fixup
> +	 * normal state inconsistencies that might arise between system
> +	 * suspend and resume, so as to ensure that restoring the atomic state
> +	 * at resume never fails.
> +	 */
> +	bool suspend_or_resume : 1;
>  	struct __drm_planes_state *planes;
>  	struct __drm_crtcs_state *crtcs;
>  	int num_connector;
> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> index 58214be3bf3d..359f557b6898 100644
> --- a/include/drm/drm_atomic_helper.h
> +++ b/include/drm/drm_atomic_helper.h
> @@ -129,7 +129,8 @@ int drm_atomic_helper_disable_all(struct drm_device *dev,
>  void drm_atomic_helper_shutdown(struct drm_device *dev);
>  struct drm_atomic_state *
>  drm_atomic_helper_duplicate_state(struct drm_device *dev,
> -				  struct drm_modeset_acquire_ctx *ctx);
> +				  struct drm_modeset_acquire_ctx *ctx,
> +				  bool suspending);
>  struct drm_atomic_state *drm_atomic_helper_suspend(struct drm_device *dev);
>  int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state,
>  					      struct drm_modeset_acquire_ctx *ctx);
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] drm/i915: Always allocate VCPI during system resume
  2019-01-29 18:39   ` Lyude Paul
@ 2019-01-29 20:32     ` Daniel Vetter
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2019-01-29 20:32 UTC (permalink / raw)
  To: Lyude Paul
  Cc: intel-gfx, dri-devel, Daniel Vetter, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, David Airlie, linux-kernel

On Tue, Jan 29, 2019 at 01:39:27PM -0500, Lyude Paul wrote:
> Since there's a chance MST topologies can be removed while the system is
> in suspend, there's also a chance that the connectors in the atomic
> state which we try to restore on system resume will have been
> unregistered if they were part of an MST topology that was removed
> mid-suspend. In such situations, we'll currently skip recalculating the
> VCPI. Normally this would be safe since we don't expect to be allowed to
> enable displays on unregistered connections, but since we need to try
> our best to restore even partially invalid atomic states on system
> resume we end up introducing the chance that we try enabling a display
> on a now-unregistered connector. This of course results on a blow up
> during system resume:
> 
> [ 1894.177788] [drm:pipe_config_err [i915]] *ERROR* mismatch in dp_m_n (expected tu 0 gmch 1730150/8388608 link 288358/1048576, or tu 0 gmch 0/0 link 0/0, found tu 64, gmch 1730150/8388608 link 288358/1048576)
> [ 1894.177795] ------------[ cut here ]------------
> [ 1894.177799] pipe state doesn't match!
> [ 1894.177905] WARNING: CPU: 2 PID: 1894 at drivers/gpu/drm/i915/intel_display.c:12292 intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
> [ 1894.177909] Modules linked in: i915(O) drm_kms_helper(O) drm(O) vfat fat joydev iTCO_wdt wmi_bmof btusb btrtl btbcm intel_rapl btintel i2c_algo_bit x86_pkg_temp_thermal bluetooth syscopyarea coretemp sysfillrect sysimgblt crc32_pclmul fb_sys_fops ucsi_acpi psmouse ecdh_generic pcspkr typec_ucsi i2c_i801 typec mei_me mei i2c_core wmi thinkpad_acpi ledtrig_audio snd soundcore rfkill tpm_tis tpm_tis_core video pcc_cpufreq tpm acpi_pad uas usb_storage crc32c_intel nvme serio_raw nvme_core xhci_pci xhci_hcd [last unloaded: drm]
> [ 1894.177990] CPU: 2 PID: 1894 Comm: kworker/u16:8 Tainted: G           O      5.0.0-rc4Lyude-Test+ #9
> [ 1894.177994] Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
> [ 1894.178005] Workqueue: events_unbound async_run_entry_fn
> [ 1894.178079] RIP: 0010:intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
> [ 1894.178085] Code: ba 01 00 00 00 4c 89 4c 24 10 e8 3d 46 01 00 4c 8b 4c 24 10 e9 a4 fb ff ff e8 b2 18 bd e0 0f 0b e9 a5 fd ff ff e8 a6 18 bd e0 <0f> 0b e9 f1 f9 ff ff e8 9a 18 bd e0 0f 0b 0f b6 44 24 18 e9 23 f9
> [ 1894.178090] RSP: 0000:ffffc90000553c08 EFLAGS: 00010286
> [ 1894.178096] RAX: 0000000000000000 RBX: ffff888459edc000 RCX: 0000000000000006
> [ 1894.178101] RDX: 0000000000000007 RSI: ffff8884591e2f90 RDI: ffff88845e295690
> [ 1894.178105] RBP: ffff888454e27000 R08: 0000000000000000 R09: 0000000000000000
> [ 1894.178108] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888454e22000
> [ 1894.178112] R13: ffff888454e21000 R14: ffff8883ca680750 R15: ffff8883ca680758
> [ 1894.178118] FS:  0000000000000000(0000) GS:ffff88845e280000(0000) knlGS:0000000000000000
> [ 1894.178123] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1894.178127] CR2: 0000000000000000 CR3: 0000000002214001 CR4: 00000000003606e0
> [ 1894.178131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1894.178135] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1894.178139] Call Trace:
> [ 1894.178224]  intel_atomic_commit+0x240/0x320 [i915]
> [ 1894.178251]  drm_atomic_helper_commit_duplicated_state+0xc9/0xe0 [drm_kms_helper]
> [ 1894.178321]  __intel_display_resume+0x82/0xd0 [i915]
> [ 1894.178391]  intel_display_resume+0xcf/0x120 [i915]
> [ 1894.178453]  i915_drm_resume+0xb1/0x100 [i915]
> [ 1894.178465]  ? pci_pm_suspend_late+0x30/0x30
> [ 1894.178473]  dpm_run_callback+0x70/0x210
> [ 1894.178484]  device_resume+0xae/0x1f0
> [ 1894.178493]  async_resume+0x19/0x30
> [ 1894.178502]  async_run_entry_fn+0x39/0x160
> [ 1894.178513]  process_one_work+0x23c/0x590
> [ 1894.178529]  worker_thread+0x30/0x380
> [ 1894.178539]  ? rescuer_thread+0x340/0x340
> [ 1894.178545]  kthread+0x112/0x130
> [ 1894.178552]  ? kthread_create_on_node+0x60/0x60
> [ 1894.178563]  ret_from_fork+0x3a/0x50
> [ 1894.178582] irq event stamp: 76782
> [ 1894.178591] hardirqs last  enabled at (76781): [<ffffffff8112c135>] vprintk_emit+0x2c5/0x2d0
> [ 1894.178600] hardirqs last disabled at (76782): [<ffffffff81001ae8>] trace_hardirqs_off_thunk+0x1a/0x1c
> [ 1894.178609] softirqs last  enabled at (76734): [<ffffffff81c0035d>] __do_softirq+0x35d/0x412
> [ 1894.178618] softirqs last disabled at (76727): [<ffffffff810bf830>] irq_exit+0xe0/0xf0
> [ 1894.178687] WARNING: CPU: 2 PID: 1894 at drivers/gpu/drm/i915/intel_display.c:12292 intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
> [ 1894.178692] ---[ end trace 0df08c0b9a67376e ]---
> 
> So, fix this by using the new drm_atomic_state.suspend_or_resume hook in
> order to force us to retrieve a VCPI allocation when restoring a pipe's
> atomic state. This is safe, since drm_dp_atomic_find_vcpi_slots() will
> just return the VCPI allocation we had pre-suspend.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations")
> ---
>  drivers/gpu/drm/i915/intel_dp_mst.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
> index cdb83d294cdd..04c9a16fdc39 100644
> --- a/drivers/gpu/drm/i915/intel_dp_mst.c
> +++ b/drivers/gpu/drm/i915/intel_dp_mst.c
> @@ -80,8 +80,13 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
>  	mst_pbn = drm_dp_calc_pbn_mode(adjusted_mode->crtc_clock, bpp);
>  	pipe_config->pbn = mst_pbn;
>  
> -	/* Zombie connectors can't have VCPI slots */
> -	if (!drm_connector_is_unregistered(connector)) {
> +	/*
> +	 * Zombie connectors can't have VCPI slots and won't be turned on,
> +	 * except during resume where we must make sure we restore the
> +	 * previous VCPI allocations
> +	 */
> +	if (!drm_connector_is_unregistered(connector) ||

Hm, could we push the !drm_connector_is_unregistered check into
drm_dp_atomic_find_vcpi_slots? Then we could also push the
state->duplicated check there, and these special cases would be in the
same library.

From a code logic pov looks correct I think, I couldn't poke any holes int
this idea at least. I guess we'll find out :-)
-Daniel

> +	    state->suspend_or_resume) {
>  		slots = drm_dp_atomic_find_vcpi_slots(state,
>  						      &intel_dp->mst_mgr,
>  						      port,
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] drm/i915: Always allocate VCPI during system resume
@ 2019-01-29 20:32     ` Daniel Vetter
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2019-01-29 20:32 UTC (permalink / raw)
  To: Lyude Paul; +Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Rodrigo Vivi

On Tue, Jan 29, 2019 at 01:39:27PM -0500, Lyude Paul wrote:
> Since there's a chance MST topologies can be removed while the system is
> in suspend, there's also a chance that the connectors in the atomic
> state which we try to restore on system resume will have been
> unregistered if they were part of an MST topology that was removed
> mid-suspend. In such situations, we'll currently skip recalculating the
> VCPI. Normally this would be safe since we don't expect to be allowed to
> enable displays on unregistered connections, but since we need to try
> our best to restore even partially invalid atomic states on system
> resume we end up introducing the chance that we try enabling a display
> on a now-unregistered connector. This of course results on a blow up
> during system resume:
> 
> [ 1894.177788] [drm:pipe_config_err [i915]] *ERROR* mismatch in dp_m_n (expected tu 0 gmch 1730150/8388608 link 288358/1048576, or tu 0 gmch 0/0 link 0/0, found tu 64, gmch 1730150/8388608 link 288358/1048576)
> [ 1894.177795] ------------[ cut here ]------------
> [ 1894.177799] pipe state doesn't match!
> [ 1894.177905] WARNING: CPU: 2 PID: 1894 at drivers/gpu/drm/i915/intel_display.c:12292 intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
> [ 1894.177909] Modules linked in: i915(O) drm_kms_helper(O) drm(O) vfat fat joydev iTCO_wdt wmi_bmof btusb btrtl btbcm intel_rapl btintel i2c_algo_bit x86_pkg_temp_thermal bluetooth syscopyarea coretemp sysfillrect sysimgblt crc32_pclmul fb_sys_fops ucsi_acpi psmouse ecdh_generic pcspkr typec_ucsi i2c_i801 typec mei_me mei i2c_core wmi thinkpad_acpi ledtrig_audio snd soundcore rfkill tpm_tis tpm_tis_core video pcc_cpufreq tpm acpi_pad uas usb_storage crc32c_intel nvme serio_raw nvme_core xhci_pci xhci_hcd [last unloaded: drm]
> [ 1894.177990] CPU: 2 PID: 1894 Comm: kworker/u16:8 Tainted: G           O      5.0.0-rc4Lyude-Test+ #9
> [ 1894.177994] Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
> [ 1894.178005] Workqueue: events_unbound async_run_entry_fn
> [ 1894.178079] RIP: 0010:intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
> [ 1894.178085] Code: ba 01 00 00 00 4c 89 4c 24 10 e8 3d 46 01 00 4c 8b 4c 24 10 e9 a4 fb ff ff e8 b2 18 bd e0 0f 0b e9 a5 fd ff ff e8 a6 18 bd e0 <0f> 0b e9 f1 f9 ff ff e8 9a 18 bd e0 0f 0b 0f b6 44 24 18 e9 23 f9
> [ 1894.178090] RSP: 0000:ffffc90000553c08 EFLAGS: 00010286
> [ 1894.178096] RAX: 0000000000000000 RBX: ffff888459edc000 RCX: 0000000000000006
> [ 1894.178101] RDX: 0000000000000007 RSI: ffff8884591e2f90 RDI: ffff88845e295690
> [ 1894.178105] RBP: ffff888454e27000 R08: 0000000000000000 R09: 0000000000000000
> [ 1894.178108] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888454e22000
> [ 1894.178112] R13: ffff888454e21000 R14: ffff8883ca680750 R15: ffff8883ca680758
> [ 1894.178118] FS:  0000000000000000(0000) GS:ffff88845e280000(0000) knlGS:0000000000000000
> [ 1894.178123] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1894.178127] CR2: 0000000000000000 CR3: 0000000002214001 CR4: 00000000003606e0
> [ 1894.178131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1894.178135] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1894.178139] Call Trace:
> [ 1894.178224]  intel_atomic_commit+0x240/0x320 [i915]
> [ 1894.178251]  drm_atomic_helper_commit_duplicated_state+0xc9/0xe0 [drm_kms_helper]
> [ 1894.178321]  __intel_display_resume+0x82/0xd0 [i915]
> [ 1894.178391]  intel_display_resume+0xcf/0x120 [i915]
> [ 1894.178453]  i915_drm_resume+0xb1/0x100 [i915]
> [ 1894.178465]  ? pci_pm_suspend_late+0x30/0x30
> [ 1894.178473]  dpm_run_callback+0x70/0x210
> [ 1894.178484]  device_resume+0xae/0x1f0
> [ 1894.178493]  async_resume+0x19/0x30
> [ 1894.178502]  async_run_entry_fn+0x39/0x160
> [ 1894.178513]  process_one_work+0x23c/0x590
> [ 1894.178529]  worker_thread+0x30/0x380
> [ 1894.178539]  ? rescuer_thread+0x340/0x340
> [ 1894.178545]  kthread+0x112/0x130
> [ 1894.178552]  ? kthread_create_on_node+0x60/0x60
> [ 1894.178563]  ret_from_fork+0x3a/0x50
> [ 1894.178582] irq event stamp: 76782
> [ 1894.178591] hardirqs last  enabled at (76781): [<ffffffff8112c135>] vprintk_emit+0x2c5/0x2d0
> [ 1894.178600] hardirqs last disabled at (76782): [<ffffffff81001ae8>] trace_hardirqs_off_thunk+0x1a/0x1c
> [ 1894.178609] softirqs last  enabled at (76734): [<ffffffff81c0035d>] __do_softirq+0x35d/0x412
> [ 1894.178618] softirqs last disabled at (76727): [<ffffffff810bf830>] irq_exit+0xe0/0xf0
> [ 1894.178687] WARNING: CPU: 2 PID: 1894 at drivers/gpu/drm/i915/intel_display.c:12292 intel_atomic_commit_tail+0xd5e/0xdd0 [i915]
> [ 1894.178692] ---[ end trace 0df08c0b9a67376e ]---
> 
> So, fix this by using the new drm_atomic_state.suspend_or_resume hook in
> order to force us to retrieve a VCPI allocation when restoring a pipe's
> atomic state. This is safe, since drm_dp_atomic_find_vcpi_slots() will
> just return the VCPI allocation we had pre-suspend.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations")
> ---
>  drivers/gpu/drm/i915/intel_dp_mst.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
> index cdb83d294cdd..04c9a16fdc39 100644
> --- a/drivers/gpu/drm/i915/intel_dp_mst.c
> +++ b/drivers/gpu/drm/i915/intel_dp_mst.c
> @@ -80,8 +80,13 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
>  	mst_pbn = drm_dp_calc_pbn_mode(adjusted_mode->crtc_clock, bpp);
>  	pipe_config->pbn = mst_pbn;
>  
> -	/* Zombie connectors can't have VCPI slots */
> -	if (!drm_connector_is_unregistered(connector)) {
> +	/*
> +	 * Zombie connectors can't have VCPI slots and won't be turned on,
> +	 * except during resume where we must make sure we restore the
> +	 * previous VCPI allocations
> +	 */
> +	if (!drm_connector_is_unregistered(connector) ||

Hm, could we push the !drm_connector_is_unregistered check into
drm_dp_atomic_find_vcpi_slots? Then we could also push the
state->duplicated check there, and these special cases would be in the
same library.

From a code logic pov looks correct I think, I couldn't poke any holes int
this idea at least. I guess we'll find out :-)
-Daniel

> +	    state->suspend_or_resume) {
>  		slots = drm_dp_atomic_find_vcpi_slots(state,
>  						      &intel_dp->mst_mgr,
>  						      port,
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] drm/dp_mst: Fix unbalanced malloc ref in drm_dp_mst_deallocate_vcpi()
  2019-01-29 18:39   ` Lyude Paul
@ 2019-01-29 20:36     ` Daniel Vetter
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2019-01-29 20:36 UTC (permalink / raw)
  To: Lyude Paul
  Cc: intel-gfx, dri-devel, Daniel Vetter, Maarten Lankhorst,
	Maxime Ripard, Sean Paul, David Airlie, linux-kernel

On Tue, Jan 29, 2019 at 01:39:25PM -0500, Lyude Paul wrote:
> In drm_dp_mst_deallocate_vcpi(), we currently unconditionally call
> drm_dp_mst_put_port_malloc() on the port that's passed to us, even if we
> never successfully allocated VCPI to it. This is contrary to what we do
> in drm_dp_mst_allocate_vcpi(), where we only call
> drm_dp_mst_get_port_malloc() on the passed port if we successfully
> allocated VCPI to it.
> 
> As a result, if drm_dp_mst_allocate_vcpi() fails during a modeset and
> another successive modeset calls drm_dp_mst_deallocate_vcpi() we will
> end up dropping someone else's malloc reference to the port. Example:
> 
> [  962.309260] ==================================================================
> [  962.309290] BUG: KASAN: use-after-free in drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309296] Read of size 4 at addr ffff888416c30004 by task kworker/0:1H/500
> 
> [  962.309308] CPU: 0 PID: 500 Comm: kworker/0:1H Tainted: G        W  O      5.0.0-rc2Lyude-Test+ #1
> [  962.309313] Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
> [  962.309428] Workqueue: events_highpri intel_atomic_cleanup_work [i915]
> [  962.309434] Call Trace:
> [  962.309452]  dump_stack+0xad/0x150
> [  962.309462]  ? dump_stack_print_info.cold.0+0x1b/0x1b
> [  962.309472]  ? kmsg_dump_rewind_nolock+0xd9/0xd9
> [  962.309504]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309515]  print_address_description+0x6c/0x23c
> [  962.309542]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309568]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309577]  kasan_report.cold.3+0x1a/0x32
> [  962.309605]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309631]  drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309658]  ? drm_dp_mst_put_mstb_malloc+0x180/0x180 [drm_kms_helper]
> [  962.309687]  drm_dp_mst_destroy_state+0xcd/0x120 [drm_kms_helper]
> [  962.309745]  drm_atomic_state_default_clear+0x6ee/0xcc0 [drm]
> [  962.309864]  intel_atomic_state_clear+0xe/0x80 [i915]
> [  962.309928]  __drm_atomic_state_free+0x35/0xd0 [drm]
> [  962.310044]  intel_atomic_cleanup_work+0x56/0x70 [i915]
> [  962.310057]  process_one_work+0x884/0x1400
> [  962.310067]  ? drain_workqueue+0x5a0/0x5a0
> [  962.310075]  ? __schedule+0x87f/0x1e80
> [  962.310086]  ? __sched_text_start+0x8/0x8
> [  962.310095]  ? run_rebalance_domains+0x400/0x400
> [  962.310110]  ? deref_stack_reg+0xb4/0x120
> [  962.310117]  ? __read_once_size_nocheck.constprop.7+0x10/0x10
> [  962.310124]  ? worker_enter_idle+0x47f/0x6a0
> [  962.310134]  ? schedule+0xd7/0x2e0
> [  962.310141]  ? __schedule+0x1e80/0x1e80
> [  962.310148]  ? _raw_spin_lock_irq+0x9f/0x130
> [  962.310155]  ? _raw_write_unlock_irqrestore+0x110/0x110
> [  962.310164]  worker_thread+0x196/0x11e0
> [  962.310175]  ? set_load_weight+0x2e0/0x2e0
> [  962.310181]  ? __switch_to_asm+0x34/0x70
> [  962.310187]  ? __switch_to_asm+0x40/0x70
> [  962.310194]  ? process_one_work+0x1400/0x1400
> [  962.310199]  ? __switch_to_asm+0x40/0x70
> [  962.310205]  ? __switch_to_asm+0x34/0x70
> [  962.310211]  ? __switch_to_asm+0x34/0x70
> [  962.310216]  ? __switch_to_asm+0x40/0x70
> [  962.310221]  ? __switch_to_asm+0x34/0x70
> [  962.310226]  ? __switch_to_asm+0x40/0x70
> [  962.310231]  ? __switch_to_asm+0x34/0x70
> [  962.310236]  ? __switch_to_asm+0x40/0x70
> [  962.310242]  ? syscall_return_via_sysret+0xf/0x7f
> [  962.310248]  ? __switch_to_asm+0x34/0x70
> [  962.310253]  ? __switch_to_asm+0x40/0x70
> [  962.310258]  ? __switch_to_asm+0x34/0x70
> [  962.310263]  ? __switch_to_asm+0x40/0x70
> [  962.310268]  ? __switch_to_asm+0x34/0x70
> [  962.310273]  ? __switch_to_asm+0x40/0x70
> [  962.310281]  ? __schedule+0x87f/0x1e80
> [  962.310292]  ? __sched_text_start+0x8/0x8
> [  962.310300]  ? save_stack+0x8c/0xb0
> [  962.310308]  ? __kasan_kmalloc.constprop.6+0xc6/0xd0
> [  962.310313]  ? kthread+0x98/0x3a0
> [  962.310318]  ? ret_from_fork+0x35/0x40
> [  962.310334]  ? __wake_up_common+0x178/0x6f0
> [  962.310343]  ? _raw_spin_lock_irqsave+0xa4/0x140
> [  962.310349]  ? __lock_text_start+0x8/0x8
> [  962.310355]  ? _raw_write_lock_irqsave+0x70/0x130
> [  962.310360]  ? __lock_text_start+0x8/0x8
> [  962.310371]  ? process_one_work+0x1400/0x1400
> [  962.310376]  kthread+0x2e2/0x3a0
> [  962.310383]  ? kthread_create_on_node+0xc0/0xc0
> [  962.310389]  ret_from_fork+0x35/0x40
> 
> [  962.310401] Allocated by task 1462:
> [  962.310410]  __kasan_kmalloc.constprop.6+0xc6/0xd0
> [  962.310437]  drm_dp_add_port+0xd60/0x1960 [drm_kms_helper]
> [  962.310464]  drm_dp_send_link_address+0x4b0/0x770 [drm_kms_helper]
> [  962.310491]  drm_dp_check_and_send_link_address+0x197/0x1f0 [drm_kms_helper]
> [  962.310515]  drm_dp_mst_link_probe_work+0x2b6/0x330 [drm_kms_helper]
> [  962.310522]  process_one_work+0x884/0x1400
> [  962.310529]  worker_thread+0x196/0x11e0
> [  962.310533]  kthread+0x2e2/0x3a0
> [  962.310538]  ret_from_fork+0x35/0x40
> 
> [  962.310543] Freed by task 500:
> [  962.310550]  __kasan_slab_free+0x133/0x180
> [  962.310555]  kfree+0x92/0x1a0
> [  962.310581]  drm_dp_mst_put_port_malloc+0x14d/0x180 [drm_kms_helper]
> [  962.310693]  intel_connector_destroy+0xb2/0xe0 [i915]
> [  962.310747]  drm_mode_object_put.part.0+0x12b/0x1a0 [drm]
> [  962.310802]  drm_atomic_state_default_clear+0x1f2/0xcc0 [drm]
> [  962.310916]  intel_atomic_state_clear+0xe/0x80 [i915]
> [  962.310972]  __drm_atomic_state_free+0x35/0xd0 [drm]
> [  962.311083]  intel_atomic_cleanup_work+0x56/0x70 [i915]
> [  962.311092]  process_one_work+0x884/0x1400
> [  962.311098]  worker_thread+0x196/0x11e0
> [  962.311103]  kthread+0x2e2/0x3a0
> [  962.311108]  ret_from_fork+0x35/0x40
> 
> [  962.311116] The buggy address belongs to the object at ffff888416c30000
>                 which belongs to the cache kmalloc-2k of size 2048
> [  962.311122] The buggy address is located 4 bytes inside of
>                 2048-byte region [ffff888416c30000, ffff888416c30800)
> [  962.311124] The buggy address belongs to the page:
> [  962.311132] page:ffffea00105b0c00 count:1 mapcount:0 mapping:ffff88841d003040 index:0x0 compound_mapcount: 0
> [  962.311142] flags: 0x8000000000010200(slab|head)
> [  962.311152] raw: 8000000000010200 dead000000000100 dead000000000200 ffff88841d003040
> [  962.311159] raw: 0000000000000000 00000000000f000f 00000001ffffffff 0000000000000000
> [  962.311162] page dumped because: kasan: bad access detected
> 
> So, bail early if drm_dp_mst_deallocate_vcpi() is called on a port with
> no VCPI allocation. Additionally, clean up the surrounding kerneldoc
> while we're at it since the port is assumed to be kept around because
> the DRM driver is expected to hold a malloc reference to it, not just
> us.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations")
> Cc: Daniel Vetter <daniel@ffwll.ch>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 2552a27362a0..8ba0e5b368da 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -3246,15 +3246,13 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
>  /**
>   * drm_dp_mst_deallocate_vcpi() - deallocate a VCPI
>   * @mgr: manager for this port
> - * @port: unverified port to deallocate vcpi for
> + * @port: port to deallocate vcpi for

Maybe add:

"This can be called unconditionally, whether drm_dp_mst_allocate_vcpi
succeeded or not."

Just to clarify this a bit more.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

>   */
>  void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
>  				struct drm_dp_mst_port *port)
>  {
> -	/*
> -	 * A port with VCPI will remain allocated until it's VCPI is
> -	 * released, no verified ref needed
> -	 */
> +	if (!port->vcpi.vcpi)
> +		return;
>  
>  	drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
>  	port->vcpi.num_slots = 0;
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] drm/dp_mst: Fix unbalanced malloc ref in drm_dp_mst_deallocate_vcpi()
@ 2019-01-29 20:36     ` Daniel Vetter
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2019-01-29 20:36 UTC (permalink / raw)
  To: Lyude Paul
  Cc: Maxime Ripard, intel-gfx, linux-kernel, dri-devel, David Airlie

On Tue, Jan 29, 2019 at 01:39:25PM -0500, Lyude Paul wrote:
> In drm_dp_mst_deallocate_vcpi(), we currently unconditionally call
> drm_dp_mst_put_port_malloc() on the port that's passed to us, even if we
> never successfully allocated VCPI to it. This is contrary to what we do
> in drm_dp_mst_allocate_vcpi(), where we only call
> drm_dp_mst_get_port_malloc() on the passed port if we successfully
> allocated VCPI to it.
> 
> As a result, if drm_dp_mst_allocate_vcpi() fails during a modeset and
> another successive modeset calls drm_dp_mst_deallocate_vcpi() we will
> end up dropping someone else's malloc reference to the port. Example:
> 
> [  962.309260] ==================================================================
> [  962.309290] BUG: KASAN: use-after-free in drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309296] Read of size 4 at addr ffff888416c30004 by task kworker/0:1H/500
> 
> [  962.309308] CPU: 0 PID: 500 Comm: kworker/0:1H Tainted: G        W  O      5.0.0-rc2Lyude-Test+ #1
> [  962.309313] Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
> [  962.309428] Workqueue: events_highpri intel_atomic_cleanup_work [i915]
> [  962.309434] Call Trace:
> [  962.309452]  dump_stack+0xad/0x150
> [  962.309462]  ? dump_stack_print_info.cold.0+0x1b/0x1b
> [  962.309472]  ? kmsg_dump_rewind_nolock+0xd9/0xd9
> [  962.309504]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309515]  print_address_description+0x6c/0x23c
> [  962.309542]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309568]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309577]  kasan_report.cold.3+0x1a/0x32
> [  962.309605]  ? drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309631]  drm_dp_mst_put_port_malloc+0x72/0x180 [drm_kms_helper]
> [  962.309658]  ? drm_dp_mst_put_mstb_malloc+0x180/0x180 [drm_kms_helper]
> [  962.309687]  drm_dp_mst_destroy_state+0xcd/0x120 [drm_kms_helper]
> [  962.309745]  drm_atomic_state_default_clear+0x6ee/0xcc0 [drm]
> [  962.309864]  intel_atomic_state_clear+0xe/0x80 [i915]
> [  962.309928]  __drm_atomic_state_free+0x35/0xd0 [drm]
> [  962.310044]  intel_atomic_cleanup_work+0x56/0x70 [i915]
> [  962.310057]  process_one_work+0x884/0x1400
> [  962.310067]  ? drain_workqueue+0x5a0/0x5a0
> [  962.310075]  ? __schedule+0x87f/0x1e80
> [  962.310086]  ? __sched_text_start+0x8/0x8
> [  962.310095]  ? run_rebalance_domains+0x400/0x400
> [  962.310110]  ? deref_stack_reg+0xb4/0x120
> [  962.310117]  ? __read_once_size_nocheck.constprop.7+0x10/0x10
> [  962.310124]  ? worker_enter_idle+0x47f/0x6a0
> [  962.310134]  ? schedule+0xd7/0x2e0
> [  962.310141]  ? __schedule+0x1e80/0x1e80
> [  962.310148]  ? _raw_spin_lock_irq+0x9f/0x130
> [  962.310155]  ? _raw_write_unlock_irqrestore+0x110/0x110
> [  962.310164]  worker_thread+0x196/0x11e0
> [  962.310175]  ? set_load_weight+0x2e0/0x2e0
> [  962.310181]  ? __switch_to_asm+0x34/0x70
> [  962.310187]  ? __switch_to_asm+0x40/0x70
> [  962.310194]  ? process_one_work+0x1400/0x1400
> [  962.310199]  ? __switch_to_asm+0x40/0x70
> [  962.310205]  ? __switch_to_asm+0x34/0x70
> [  962.310211]  ? __switch_to_asm+0x34/0x70
> [  962.310216]  ? __switch_to_asm+0x40/0x70
> [  962.310221]  ? __switch_to_asm+0x34/0x70
> [  962.310226]  ? __switch_to_asm+0x40/0x70
> [  962.310231]  ? __switch_to_asm+0x34/0x70
> [  962.310236]  ? __switch_to_asm+0x40/0x70
> [  962.310242]  ? syscall_return_via_sysret+0xf/0x7f
> [  962.310248]  ? __switch_to_asm+0x34/0x70
> [  962.310253]  ? __switch_to_asm+0x40/0x70
> [  962.310258]  ? __switch_to_asm+0x34/0x70
> [  962.310263]  ? __switch_to_asm+0x40/0x70
> [  962.310268]  ? __switch_to_asm+0x34/0x70
> [  962.310273]  ? __switch_to_asm+0x40/0x70
> [  962.310281]  ? __schedule+0x87f/0x1e80
> [  962.310292]  ? __sched_text_start+0x8/0x8
> [  962.310300]  ? save_stack+0x8c/0xb0
> [  962.310308]  ? __kasan_kmalloc.constprop.6+0xc6/0xd0
> [  962.310313]  ? kthread+0x98/0x3a0
> [  962.310318]  ? ret_from_fork+0x35/0x40
> [  962.310334]  ? __wake_up_common+0x178/0x6f0
> [  962.310343]  ? _raw_spin_lock_irqsave+0xa4/0x140
> [  962.310349]  ? __lock_text_start+0x8/0x8
> [  962.310355]  ? _raw_write_lock_irqsave+0x70/0x130
> [  962.310360]  ? __lock_text_start+0x8/0x8
> [  962.310371]  ? process_one_work+0x1400/0x1400
> [  962.310376]  kthread+0x2e2/0x3a0
> [  962.310383]  ? kthread_create_on_node+0xc0/0xc0
> [  962.310389]  ret_from_fork+0x35/0x40
> 
> [  962.310401] Allocated by task 1462:
> [  962.310410]  __kasan_kmalloc.constprop.6+0xc6/0xd0
> [  962.310437]  drm_dp_add_port+0xd60/0x1960 [drm_kms_helper]
> [  962.310464]  drm_dp_send_link_address+0x4b0/0x770 [drm_kms_helper]
> [  962.310491]  drm_dp_check_and_send_link_address+0x197/0x1f0 [drm_kms_helper]
> [  962.310515]  drm_dp_mst_link_probe_work+0x2b6/0x330 [drm_kms_helper]
> [  962.310522]  process_one_work+0x884/0x1400
> [  962.310529]  worker_thread+0x196/0x11e0
> [  962.310533]  kthread+0x2e2/0x3a0
> [  962.310538]  ret_from_fork+0x35/0x40
> 
> [  962.310543] Freed by task 500:
> [  962.310550]  __kasan_slab_free+0x133/0x180
> [  962.310555]  kfree+0x92/0x1a0
> [  962.310581]  drm_dp_mst_put_port_malloc+0x14d/0x180 [drm_kms_helper]
> [  962.310693]  intel_connector_destroy+0xb2/0xe0 [i915]
> [  962.310747]  drm_mode_object_put.part.0+0x12b/0x1a0 [drm]
> [  962.310802]  drm_atomic_state_default_clear+0x1f2/0xcc0 [drm]
> [  962.310916]  intel_atomic_state_clear+0xe/0x80 [i915]
> [  962.310972]  __drm_atomic_state_free+0x35/0xd0 [drm]
> [  962.311083]  intel_atomic_cleanup_work+0x56/0x70 [i915]
> [  962.311092]  process_one_work+0x884/0x1400
> [  962.311098]  worker_thread+0x196/0x11e0
> [  962.311103]  kthread+0x2e2/0x3a0
> [  962.311108]  ret_from_fork+0x35/0x40
> 
> [  962.311116] The buggy address belongs to the object at ffff888416c30000
>                 which belongs to the cache kmalloc-2k of size 2048
> [  962.311122] The buggy address is located 4 bytes inside of
>                 2048-byte region [ffff888416c30000, ffff888416c30800)
> [  962.311124] The buggy address belongs to the page:
> [  962.311132] page:ffffea00105b0c00 count:1 mapcount:0 mapping:ffff88841d003040 index:0x0 compound_mapcount: 0
> [  962.311142] flags: 0x8000000000010200(slab|head)
> [  962.311152] raw: 8000000000010200 dead000000000100 dead000000000200 ffff88841d003040
> [  962.311159] raw: 0000000000000000 00000000000f000f 00000001ffffffff 0000000000000000
> [  962.311162] page dumped because: kasan: bad access detected
> 
> So, bail early if drm_dp_mst_deallocate_vcpi() is called on a port with
> no VCPI allocation. Additionally, clean up the surrounding kerneldoc
> while we're at it since the port is assumed to be kept around because
> the DRM driver is expected to hold a malloc reference to it, not just
> us.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations")
> Cc: Daniel Vetter <daniel@ffwll.ch>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 2552a27362a0..8ba0e5b368da 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -3246,15 +3246,13 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
>  /**
>   * drm_dp_mst_deallocate_vcpi() - deallocate a VCPI
>   * @mgr: manager for this port
> - * @port: unverified port to deallocate vcpi for
> + * @port: port to deallocate vcpi for

Maybe add:

"This can be called unconditionally, whether drm_dp_mst_allocate_vcpi
succeeded or not."

Just to clarify this a bit more.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

>   */
>  void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
>  				struct drm_dp_mst_port *port)
>  {
> -	/*
> -	 * A port with VCPI will remain allocated until it's VCPI is
> -	 * released, no verified ref needed
> -	 */
> +	if (!port->vcpi.vcpi)
> +		return;
>  
>  	drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
>  	port->vcpi.num_slots = 0;
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* ✓ Fi.CI.IGT: success for drm/dp_mst: Fix regressions from new atomic VCPI helpers
  2019-01-29 18:39 [PATCH 0/3] drm/dp_mst: Fix regressions from new atomic VCPI helpers Lyude Paul
                   ` (4 preceding siblings ...)
  2019-01-29 19:37 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-01-29 22:03 ` Patchwork
  5 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2019-01-29 22:03 UTC (permalink / raw)
  To: Lyude Paul; +Cc: intel-gfx

== Series Details ==

Series: drm/dp_mst: Fix regressions from new atomic VCPI helpers
URL   : https://patchwork.freedesktop.org/series/55933/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5501_full -> Patchwork_12078_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_12078_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_eio@wait-wedge-1us:
    - shard-glk:          PASS -> FAIL [fdo#105957]

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-b:
    - shard-snb:          NOTRUN -> DMESG-WARN [fdo#107956]

  * igt@kms_ccs@pipe-b-crc-sprite-planes-basic:
    - shard-apl:          PASS -> FAIL [fdo#106510] / [fdo#108145]

  * igt@kms_color@pipe-c-legacy-gamma:
    - shard-apl:          PASS -> FAIL [fdo#104782] +1

  * igt@kms_cursor_crc@cursor-128x128-random:
    - shard-apl:          PASS -> FAIL [fdo#103232] +3

  * igt@kms_cursor_crc@cursor-64x21-onscreen:
    - shard-glk:          PASS -> FAIL [fdo#103232] +1

  * igt@kms_cursor_crc@cursor-64x64-suspend:
    - shard-apl:          PASS -> FAIL [fdo#103191] / [fdo#103232]

  * igt@kms_plane@pixel-format-pipe-c-planes-source-clamping:
    - shard-apl:          PASS -> FAIL [fdo#108948]

  * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-max:
    - shard-apl:          PASS -> FAIL [fdo#108145]

  * igt@kms_plane_multiple@atomic-pipe-a-tiling-none:
    - shard-glk:          PASS -> FAIL [fdo#103166]

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-none:
    - shard-apl:          PASS -> FAIL [fdo#103166] +2

  * igt@kms_setmode@basic:
    - shard-kbl:          PASS -> FAIL [fdo#99912]

  
#### Possible fixes ####

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-c:
    - shard-hsw:          DMESG-WARN [fdo#107956] -> PASS

  * igt@kms_cursor_crc@cursor-256x256-random:
    - shard-glk:          FAIL [fdo#103232] -> PASS +1

  * igt@kms_cursor_crc@cursor-256x256-suspend:
    - shard-apl:          FAIL [fdo#103191] / [fdo#103232] -> PASS

  * igt@kms_cursor_crc@cursor-256x85-random:
    - shard-apl:          FAIL [fdo#103232] -> PASS +3

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-glk:          FAIL [fdo#102887] / [fdo#105363] -> PASS

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-none:
    - shard-glk:          FAIL [fdo#103166] -> PASS +2

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-x:
    - shard-apl:          FAIL [fdo#103166] -> PASS

  * igt@kms_setmode@basic:
    - shard-apl:          FAIL [fdo#99912] -> PASS

  * igt@prime_busy@hang-bsd:
    - shard-hsw:          FAIL [fdo#108807] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#102887]: https://bugs.freedesktop.org/show_bug.cgi?id=102887
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#104782]: https://bugs.freedesktop.org/show_bug.cgi?id=104782
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#105957]: https://bugs.freedesktop.org/show_bug.cgi?id=105957
  [fdo#106510]: https://bugs.freedesktop.org/show_bug.cgi?id=106510
  [fdo#107956]: https://bugs.freedesktop.org/show_bug.cgi?id=107956
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108807]: https://bugs.freedesktop.org/show_bug.cgi?id=108807
  [fdo#108948]: https://bugs.freedesktop.org/show_bug.cgi?id=108948
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912


Participating hosts (7 -> 5)
------------------------------

  Missing    (2): shard-skl shard-iclb 


Build changes
-------------

    * Linux: CI_DRM_5501 -> Patchwork_12078

  CI_DRM_5501: 5aac0b69f3a3085b9b8a9924cb19cb94f7860dbe @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4798: 998e0a4aedf10fb5f7c271018cd80d874668bf55 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12078: 8650b15d482b21cca0073b796f131ff26f19bdba @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12078/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-01-29 22:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-29 18:39 [PATCH 0/3] drm/dp_mst: Fix regressions from new atomic VCPI helpers Lyude Paul
2019-01-29 18:39 ` [PATCH 1/3] drm/dp_mst: Fix unbalanced malloc ref in drm_dp_mst_deallocate_vcpi() Lyude Paul
2019-01-29 18:39   ` Lyude Paul
2019-01-29 20:36   ` Daniel Vetter
2019-01-29 20:36     ` Daniel Vetter
2019-01-29 18:39 ` [PATCH 2/3] drm/atomic: Add drm_atomic_state->suspend_or_resume Lyude Paul
2019-01-29 18:39   ` Lyude Paul
2019-01-29 20:30   ` Daniel Vetter
2019-01-29 18:39 ` [PATCH 3/3] drm/i915: Always allocate VCPI during system resume Lyude Paul
2019-01-29 18:39   ` Lyude Paul
2019-01-29 20:32   ` Daniel Vetter
2019-01-29 20:32     ` Daniel Vetter
2019-01-29 19:16 ` ✗ Fi.CI.CHECKPATCH: warning for drm/dp_mst: Fix regressions from new atomic VCPI helpers Patchwork
2019-01-29 19:37 ` ✓ Fi.CI.BAT: success " Patchwork
2019-01-29 22:03 ` ✓ Fi.CI.IGT: " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.