linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup
@ 2018-12-14  1:25 Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 01/15] drm/dp_mst: Remove bogus conditional in drm_dp_update_payload_part1() Lyude Paul
                   ` (14 more replies)
  0 siblings, 15 replies; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	David Airlie, Maxime Ripard, Maarten Lankhorst, linux-kernel,
	Sean Paul, linux-doc, Jonathan Corbet, Lyude Paul, Rodrigo Vivi,
	Jani Nikula, Joonas Lahtinen, Sean Paul, Karol Herbst,
	Ben Skeggs, Ilia Mirkin

This is a WIP version of the series I've been working on for a while now
to get all of the atomic DRM drivers in the tree to use the atomic MST
helpers, and to make the atomic MST helpers actually idempotent. Turns
out it's a lot more difficult to do that without also fixing how port
and branch device refcounting works so that it actually makes sense,
since the current upstream implementation requires a ton of magic in the
atomic helpers to work around properly and in many situations just plain
doesn't work as intended.

This patch series is starting to get bigger, and since there's still a
few bits here and there regarding the new refcount implementation that I
haven't quite decided on yet I figured I should get an opinion from
everyone else.

Currently I've got a couple of thoughts on how I could improve this
further:

* Get rid of drm_dp_mst_get_*_validated() entirely - I'm 90% sure that
  with the new refcounting scheme we might not actually need port
  validation at all anymore, assuming we make the use of malloc references
  in all of the DRM drivers. Either way, I don't think validation was ever
  actually a concept that worked: without malloc references, the port or
  branch device that's being passed to drm_dp_mst_get_*_validated()
  could be freed which also in turn means that that the stale pointer
  could in theory have gotten reused for a new port and thus-cause us to
  consider a freed port validated.
* Get rid of drm_dp_mst_get_vcpi_slots() - with malloc references, I
  don't think there's any use for this either
* Get rid of drm_dp_mst_reset_vcpi_slots() - I think the only time this
  function ever made sense was with port validation? Honestly, I wonder
  if we ever needed this at all...

Note: I haven't applied some of the comments from the reviews for the
series this is based off of:

drm/dp_mst: Improve VCPI helpers, use in nouveau
https://patchwork.freedesktop.org/series/51414/

This is just getting put on the ML so I can get some feedback on this.

Lyude Paul (15):
  drm/dp_mst: Remove bogus conditional in drm_dp_update_payload_part1()
  drm/dp_mst: Refactor drm_dp_update_payload_part1()
  drm/dp_mst: Introduce new refcounting scheme for mstbs and ports
  drm/dp_mst: Stop releasing VCPI when removing ports from topology
  drm/dp_mst: Fix payload deallocation on hotplugs using malloc refs
  drm/i915: Keep malloc references to MST ports
  drm/nouveau: Remove bogus cleanup in nv50_mstm_add_connector()
  drm/nouveau: Remove unnecessary VCPI checks in nv50_msto_cleanup()
  drm/nouveau: Fix potential use-after-frees for MSTCs
  drm/nouveau: Stop unsetting mstc->port, use malloc refs
  drm/nouveau: Grab payload lock in nv50_msto_payload()
  drm/dp_mst: Add some atomic state iterator macros
  drm/dp_mst: Start tracking per-port VCPI allocations
  drm/dp_mst: Check payload count in drm_dp_mst_atomic_check()
  drm/nouveau: Use atomic VCPI helpers for MST

 .../gpu/dp-mst/topology-figure-1.dot          |  31 +
 .../gpu/dp-mst/topology-figure-2.dot          |  37 +
 .../gpu/dp-mst/topology-figure-3.dot          |  40 +
 Documentation/gpu/drm-kms-helpers.rst         | 125 ++-
 drivers/gpu/drm/drm_dp_mst_topology.c         | 910 ++++++++++++++----
 drivers/gpu/drm/i915/intel_connector.c        |   4 +
 drivers/gpu/drm/i915/intel_display.c          |   4 +
 drivers/gpu/drm/i915/intel_dp_mst.c           |  66 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c       |  94 +-
 include/drm/drm_dp_mst_helper.h               | 139 ++-
 10 files changed, 1178 insertions(+), 272 deletions(-)
 create mode 100644 Documentation/gpu/dp-mst/topology-figure-1.dot
 create mode 100644 Documentation/gpu/dp-mst/topology-figure-2.dot
 create mode 100644 Documentation/gpu/dp-mst/topology-figure-3.dot

-- 
2.19.2


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [WIP PATCH 01/15] drm/dp_mst: Remove bogus conditional in drm_dp_update_payload_part1()
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  8:42   ` Daniel Vetter
  2018-12-14  1:25 ` [WIP PATCH 02/15] drm/dp_mst: Refactor drm_dp_update_payload_part1() Lyude Paul
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo, Juston Li,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	linux-kernel

There's no reason we need this, it's just confusing looking.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Juston Li <juston.li@intel.com>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index ad0fb6d003be..9b1b5c9b1fa0 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -1896,9 +1896,7 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
 			req_payload.num_slots = 0;
 		}
 
-		if (mgr->payloads[i].start_slot != req_payload.start_slot) {
-			mgr->payloads[i].start_slot = req_payload.start_slot;
-		}
+		mgr->payloads[i].start_slot = req_payload.start_slot;
 		/* work out what is required to happen with this payload */
 		if (mgr->payloads[i].num_slots != req_payload.num_slots) {
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 02/15] drm/dp_mst: Refactor drm_dp_update_payload_part1()
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 01/15] drm/dp_mst: Remove bogus conditional in drm_dp_update_payload_part1() Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  8:47   ` Daniel Vetter
  2018-12-14  1:25 ` [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports Lyude Paul
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo, Juston Li,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	linux-kernel

There should be no functional changes here

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Juston Li <juston.li@intel.com>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 71 ++++++++++++++++-----------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 9b1b5c9b1fa0..2ab16c9e6243 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -1879,39 +1879,48 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
 
 	mutex_lock(&mgr->payload_lock);
 	for (i = 0; i < mgr->max_payloads; i++) {
+		struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
+		struct drm_dp_payload *payload = &mgr->payloads[i];
+
 		/* solve the current payloads - compare to the hw ones
 		   - update the hw view */
 		req_payload.start_slot = cur_slots;
-		if (mgr->proposed_vcpis[i]) {
-			port = container_of(mgr->proposed_vcpis[i], struct drm_dp_mst_port, vcpi);
+		if (vcpi) {
+			port = container_of(vcpi, struct drm_dp_mst_port,
+					    vcpi);
 			port = drm_dp_get_validated_port_ref(mgr, port);
 			if (!port) {
 				mutex_unlock(&mgr->payload_lock);
 				return -EINVAL;
 			}
-			req_payload.num_slots = mgr->proposed_vcpis[i]->num_slots;
-			req_payload.vcpi = mgr->proposed_vcpis[i]->vcpi;
+			req_payload.num_slots = vcpi->num_slots;
+			req_payload.vcpi = vcpi->vcpi;
 		} else {
 			port = NULL;
 			req_payload.num_slots = 0;
 		}
 
-		mgr->payloads[i].start_slot = req_payload.start_slot;
+		payload->start_slot = req_payload.start_slot;
 		/* work out what is required to happen with this payload */
-		if (mgr->payloads[i].num_slots != req_payload.num_slots) {
+		if (payload->num_slots != req_payload.num_slots) {
 
 			/* need to push an update for this payload */
 			if (req_payload.num_slots) {
-				drm_dp_create_payload_step1(mgr, mgr->proposed_vcpis[i]->vcpi, &req_payload);
-				mgr->payloads[i].num_slots = req_payload.num_slots;
-				mgr->payloads[i].vcpi = req_payload.vcpi;
-			} else if (mgr->payloads[i].num_slots) {
-				mgr->payloads[i].num_slots = 0;
-				drm_dp_destroy_payload_step1(mgr, port, mgr->payloads[i].vcpi, &mgr->payloads[i]);
-				req_payload.payload_state = mgr->payloads[i].payload_state;
-				mgr->payloads[i].start_slot = 0;
+				drm_dp_create_payload_step1(mgr, vcpi->vcpi,
+							    &req_payload);
+				payload->num_slots = req_payload.num_slots;
+				payload->vcpi = req_payload.vcpi;
+
+			} else if (payload->num_slots) {
+				payload->num_slots = 0;
+				drm_dp_destroy_payload_step1(mgr, port,
+							     payload->vcpi,
+							     payload);
+				req_payload.payload_state =
+					payload->payload_state;
+				payload->start_slot = 0;
 			}
-			mgr->payloads[i].payload_state = req_payload.payload_state;
+			payload->payload_state = req_payload.payload_state;
 		}
 		cur_slots += req_payload.num_slots;
 
@@ -1920,22 +1929,26 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
 	}
 
 	for (i = 0; i < mgr->max_payloads; i++) {
-		if (mgr->payloads[i].payload_state == DP_PAYLOAD_DELETE_LOCAL) {
-			DRM_DEBUG_KMS("removing payload %d\n", i);
-			for (j = i; j < mgr->max_payloads - 1; j++) {
-				memcpy(&mgr->payloads[j], &mgr->payloads[j + 1], sizeof(struct drm_dp_payload));
-				mgr->proposed_vcpis[j] = mgr->proposed_vcpis[j + 1];
-				if (mgr->proposed_vcpis[j] && mgr->proposed_vcpis[j]->num_slots) {
-					set_bit(j + 1, &mgr->payload_mask);
-				} else {
-					clear_bit(j + 1, &mgr->payload_mask);
-				}
-			}
-			memset(&mgr->payloads[mgr->max_payloads - 1], 0, sizeof(struct drm_dp_payload));
-			mgr->proposed_vcpis[mgr->max_payloads - 1] = NULL;
-			clear_bit(mgr->max_payloads, &mgr->payload_mask);
+		if (mgr->payloads[i].payload_state != DP_PAYLOAD_DELETE_LOCAL)
+			continue;
+
+		DRM_DEBUG_KMS("removing payload %d\n", i);
+		for (j = i; j < mgr->max_payloads - 1; j++) {
+			mgr->payloads[j] = mgr->payloads[j + 1];
+			mgr->proposed_vcpis[j] = mgr->proposed_vcpis[j + 1];
 
+			if (mgr->proposed_vcpis[j] &&
+			    mgr->proposed_vcpis[j]->num_slots) {
+				set_bit(j + 1, &mgr->payload_mask);
+			} else {
+				clear_bit(j + 1, &mgr->payload_mask);
+			}
 		}
+
+		memset(&mgr->payloads[mgr->max_payloads - 1], 0,
+		       sizeof(struct drm_dp_payload));
+		mgr->proposed_vcpis[mgr->max_payloads - 1] = NULL;
+		clear_bit(mgr->max_payloads, &mgr->payload_mask);
 	}
 	mutex_unlock(&mgr->payload_lock);
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 01/15] drm/dp_mst: Remove bogus conditional in drm_dp_update_payload_part1() Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 02/15] drm/dp_mst: Refactor drm_dp_update_payload_part1() Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  9:29   ` Daniel Vetter
  2018-12-14  1:25 ` [WIP PATCH 04/15] drm/dp_mst: Stop releasing VCPI when removing ports from topology Lyude Paul
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	David Airlie, Juston Li, Maarten Lankhorst, Maxime Ripard,
	Sean Paul, David Airlie, Jonathan Corbet, linux-doc,
	linux-kernel

The current way of handling refcounting in the DP MST helpers is really
confusing and probably just plain wrong because it's been hacked up many
times over the years without anyone actually going over the code and
seeing if things could be simplified.

To the best of my understanding, the current scheme works like this:
drm_dp_mst_port and drm_dp_mst_branch both have a single refcount. When
this refcount hits 0 for either of the two, they're removed from the
topology state, but not immediately freed. Both ports and branch devices
will reinitialize their kref once it's hit 0 before actually destroying
themselves. The intended purpose behind this is so that we can avoid
problems like not being able to free a remote payload that might still
be active, due to us having removed all of the port/branch device
structures in memory, as per:

91a25e463130 ("drm/dp/mst: deallocate payload on port destruction")

Which may have worked, but then it caused use-after-free errors. Being
new to MST at the time, I tried fixing it;

263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()")

But, that was broken: both drm_dp_mst_port and drm_dp_mst_branch structs
are validated in almost every DP MST helper function. Simply put, this
means we go through the topology and try to see if the given
drm_dp_mst_branch or drm_dp_mst_port is still attached to something
before trying to use it in order to avoid dereferencing freed memory
(something that has happened a LOT in the past with this library).
Because of this it doesn't actually matter whether or not we keep keep
the ports and branches around in memory as that's not enough, because
any function that validates the branches and ports passed to it will
still reject them anyway since they're no longer in the topology
structure. So, use-after-free errors were fixed but payload deallocation
was completely broken.

Two years later, AMD informed me about this issue and I attempted to
come up with a temporary fix, pending a long-overdue cleanup of this
library:

c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")

But then that introduced use-after-free errors, so I quickly reverted
it:

9765635b3075 ("Revert "drm/dp_mst: Skip validating ports during destruction, just ref"")

And in the process, learned that there is just no simple fix for this:
the design is just broken. Unfortuntely, the usage of these helpers are
quite broken as well. Some drivers like i915 have been smart enough to
avoid accessing any kind of information from MST port structures, but
others like nouveau have assumed, understandably so, that
drm_dp_mst_port structures are normal and can just be accessed at any
time without worrying about use-after-free errors.

After a lot of discussion, me and Daniel Vetter came up with a better
idea to replace all of this.

To summarize, since this is documented far more indepth in the
documentation this patch introduces, we make it so that drm_dp_mst_port
and drm_dp_mst_branch structures have two different classes of
refcounts: topology_kref, and malloc_kref. topology_kref corresponds to
the lifetime of the given drm_dp_mst_port or drm_dp_mst_branch in it's
given topology. Once it hits zero, any associated connectors are removed
and the branch or port can no longer be validated. malloc_kref
corresponds to the lifetime of the memory allocation for the actual
structure, and will always be non-zero so long as the topology_kref is
non-zero. This gives us a way to allow callers to hold onto port and
branch device structures past their topology lifetime, and dramatically
simplifies the lifetimes of both structures. This also finally fixes the
port deallocation problem, properly.

Additionally: since this now means that we can keep ports and branch
devices allocated in memory for however long we need, we no longer need
a significant amount of the port validation that we currently do.

Additionally, there is one last scenario that this fixes, which couldn't
have been fixed properly beforehand:

- CPU1 unrefs port from topology (refcount 1->0)
- CPU2 refs port in topology(refcount 0->1)

Since we now can guarantee memory safety for ports and branches
as-needed, we also can make our main reference counting functions fix
this problem by using kref_get_unless_zero() internally so that topology
refcounts can only ever reach 0 once.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@redhat.com>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Juston Li <juston.li@intel.com>
---
 .../gpu/dp-mst/topology-figure-1.dot          |  31 ++
 .../gpu/dp-mst/topology-figure-2.dot          |  37 ++
 .../gpu/dp-mst/topology-figure-3.dot          |  40 ++
 Documentation/gpu/drm-kms-helpers.rst         | 125 ++++-
 drivers/gpu/drm/drm_dp_mst_topology.c         | 512 +++++++++++++-----
 include/drm/drm_dp_mst_helper.h               |  19 +-
 6 files changed, 637 insertions(+), 127 deletions(-)
 create mode 100644 Documentation/gpu/dp-mst/topology-figure-1.dot
 create mode 100644 Documentation/gpu/dp-mst/topology-figure-2.dot
 create mode 100644 Documentation/gpu/dp-mst/topology-figure-3.dot

diff --git a/Documentation/gpu/dp-mst/topology-figure-1.dot b/Documentation/gpu/dp-mst/topology-figure-1.dot
new file mode 100644
index 000000000000..fb83789e0a3e
--- /dev/null
+++ b/Documentation/gpu/dp-mst/topology-figure-1.dot
@@ -0,0 +1,31 @@
+digraph T {
+    /* Topology references */
+    node [shape=oval];
+    mstb1 -> {port1, port2};
+    port1 -> mstb2;
+    port2 -> mstb3 -> {port3, port4};
+    port3 -> mstb4;
+
+    /* Malloc references */
+    edge [style=dashed];
+    mstb4 -> port3;
+    {port4, port3} -> mstb3;
+    mstb3 -> port2;
+    mstb2 -> port1;
+    {port1, port2} -> mstb1;
+
+    edge [dir=back];
+    node [style=filled;shape=box;fillcolor=lightblue];
+    port1 -> "Payload #1";
+    port3 -> "Payload #2";
+
+    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
+    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
+    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
+    mstb4 [label="MSTB #4";style=filled;fillcolor=palegreen];
+
+    port1 [label="Port #1"];
+    port2 [label="Port #2"];
+    port3 [label="Port #3"];
+    port4 [label="Port #4"];
+}
diff --git a/Documentation/gpu/dp-mst/topology-figure-2.dot b/Documentation/gpu/dp-mst/topology-figure-2.dot
new file mode 100644
index 000000000000..eebce560be40
--- /dev/null
+++ b/Documentation/gpu/dp-mst/topology-figure-2.dot
@@ -0,0 +1,37 @@
+digraph T {
+    /* Topology references */
+    node [shape=oval];
+
+    mstb1 -> {port1, port2};
+    port1 -> mstb2;
+    edge [color=red];
+    port2 -> mstb3 -> {port3, port4};
+    port3 -> mstb4;
+    edge [color=""];
+
+    /* Malloc references */
+    edge [style=dashed];
+    port3 -> mstb3;
+    mstb3 -> port2;
+    mstb2 -> port1;
+    {port1, port2} -> mstb1;
+    edge [color=red];
+    mstb4 -> port3;
+    port4 -> mstb3;
+    edge [color=""];
+
+    edge [dir=back];
+    node [style=filled;shape=box;fillcolor=lightblue];
+    port1 -> "Payload #1";
+    port3 -> "Payload #2";
+
+    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
+    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
+    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
+    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
+
+    port1 [label="Port #1"];
+    port2 [label="Port #2"];
+    port3 [label="Port #3"];
+    port4 [label="Port #4";style=filled;fillcolor=grey];
+}
diff --git a/Documentation/gpu/dp-mst/topology-figure-3.dot b/Documentation/gpu/dp-mst/topology-figure-3.dot
new file mode 100644
index 000000000000..9bf28d87144c
--- /dev/null
+++ b/Documentation/gpu/dp-mst/topology-figure-3.dot
@@ -0,0 +1,40 @@
+digraph T {
+    /* Topology references */
+    node [shape=oval];
+
+    mstb1 -> {port1, port2};
+    port1 -> mstb2;
+    edge [color=grey];
+    port2 -> mstb3 -> {port3, port4};
+    port3 -> mstb4;
+    edge [color=""];
+
+    /* Malloc references */
+    edge [style=dashed];
+    port3 -> mstb3 [penwidth=3];
+    mstb3 -> port2 [penwidth=3];
+    mstb2 -> port1;
+    {port1, port2} -> mstb1;
+    edge [color=grey];
+    mstb4 -> port3;
+    port4 -> mstb3;
+    edge [color=""];
+
+    edge [dir=back];
+    node [style=filled;shape=box;fillcolor=lightblue];
+    port1 -> payload1;
+    port3 -> payload2 [penwidth=3];
+
+    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
+    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
+    mstb3 [label="MSTB #3";penwidth=3;style=filled;fillcolor=palegreen];
+    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
+
+    port1 [label="Port #1"];
+    port2 [label="Port #2";penwidth=3];
+    port3 [label="Port #3";penwidth=3];
+    port4 [label="Port #4";style=filled;fillcolor=grey];
+
+    payload1 [label="Payload #1"];
+    payload2 [label="Payload #2";penwidth=3];
+}
diff --git a/Documentation/gpu/drm-kms-helpers.rst b/Documentation/gpu/drm-kms-helpers.rst
index b422eb8edf16..c0f994c2c72f 100644
--- a/Documentation/gpu/drm-kms-helpers.rst
+++ b/Documentation/gpu/drm-kms-helpers.rst
@@ -208,8 +208,11 @@ Display Port Dual Mode Adaptor Helper Functions Reference
 .. kernel-doc:: drivers/gpu/drm/drm_dp_dual_mode_helper.c
    :export:
 
-Display Port MST Helper Functions Reference
-===========================================
+Display Port MST Helpers
+========================
+
+Functions Reference
+-------------------
 
 .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
    :doc: dp mst helper
@@ -220,6 +223,124 @@ Display Port MST Helper Functions Reference
 .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
    :export:
 
+Branch device and port refcounting
+----------------------------------
+
+Overview
+~~~~~~~~
+
+The refcounting schemes for :c:type:`struct drm_dp_mst_branch` and
+:c:type:`struct drm_dp_mst_port` are somewhat unusual. Both ports and branch
+devices have two different kinds of refcounts: topology refcounts, and malloc
+refcounts.
+
+Topology refcount overview
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Topology refcounts are not exposed to drivers, and are handled internally by the
+DP MST helpers. The helpers use them in order to prevent the in-memory topology
+state from being changed in the middle of critical operations like changing the
+internal state of payload allocations. This means each branch and port will be
+considered to be connected to the rest of the topology until it's topology
+refcount reaches zero. Additionally, for ports this means that their associated
+:c:type:`struct drm_connector` will stay registered with userspace until the
+port's refcount reaches 0.
+
+
+Topology refcount functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The DP MST helpers use the following functions to manage topology refcounts:
+
+.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
+   :functions: drm_dp_mst_topology_get_port drm_dp_mst_topology_put_port
+               drm_dp_mst_topology_ref_port drm_dp_mst_topology_get_mstb
+               drm_dp_mst_topology_put_mstb drm_dp_mst_topology_ref_mstb
+
+Malloc refcount overview
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Malloc references are used to keep a :c:type:`struct drm_dp_mst_port` or
+:c:type:`struct drm_dp_mst_branch` allocated even after all of its topology
+references have been dropped, so that the driver or MST helpers can safely
+access each branch's last known state before it was disconnected from the
+topology. When the malloc refcount of a port or branch reaches 0, the memory
+allocation containing the :c:type:`struct drm_dp_mst_branch` or :c:type:`struct
+drm_dp_mst_port` respectively will be freed.
+
+Malloc refcounts for ports
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For :c:type:`struct drm_dp_mst_port`, malloc refcounts are exposed to drivers
+through the following functions:
+
+.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
+   :functions: drm_dp_mst_get_port_malloc drm_dp_mst_put_port_malloc
+
+Malloc refcounts for branch devices
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For :c:type:`struct drm_dp_mst_branch`, malloc refcounts are not currently
+exposed to drivers. As of writing this documentation, there are no drivers that
+have a usecase for accessing :c:type:`struct drm_dp_mst_branch` outside of the
+MST helpers. Exposing this API to drivers in a race-free manner would take more
+tweaking of the refcounting scheme, however patches are welcome provided there
+is a legitimate driver usecase for this.
+
+Internally, malloc refcounts for :c:type:`struct drm_dp_mst_branch` are managed
+by the DP MST core through the following functions:
+
+.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
+   :functions: drm_dp_mst_get_mstb_malloc drm_dp_mst_put_mstb_malloc
+
+Refcount relationships in a topology
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Let's take a look at why the relationship between topology and malloc refcounts
+is designed the way it is.
+
+.. kernel-figure:: dp-mst/topology-figure-1.dot
+
+   An example of topology and malloc refs in a DP MST topology with two active
+   payloads. Topology refcount increments are indicated by solid lines, and
+   malloc refcount increments are indicated by dashed lines. Each starts from
+   the branch which incremented the refcount, and ends at the branch to which
+   the refcount belongs to.
+
+As you can see in figure 1, every branch increments the topology
+refcount of it's children, and increments the malloc refcount of it's parent.
+Additionally, every payload increments the malloc refcount of it's assigned port
+by 1.
+
+So, what would happen if MSTB #3 from the above figure was unplugged from the
+system, but the driver hadn't yet removed payload #2 from port #3? The topology
+would start to look like figure 2.
+
+.. kernel-figure:: dp-mst/topology-figure-2.dot
+
+   Ports and branch devices which have been released from memory are colored
+   grey, and references which have been removed are colored red.
+
+Whenever a port or branch device's topology refcount reaches zero, it will
+decrement the topology refcounts of all its children, the malloc refcount of its
+parent, and finally its own malloc refcount. For MSTB #4 and port #4, this means
+they both have been disconnected from the topology and freed from memory. But,
+because payload #2 is still holding a reference to port #3, port #3 is removed
+from the topology but it's :c:type:`struct drm_dp_mst_port` is still accessible
+from memory. This also means port #3 has not yet decremented the malloc refcount
+of MSTB #3, so it's :c:type:`struct drm_dp_mst_branch` will also stay allocated
+in memory until port #3's malloc refcount reaches 0.
+
+This relationship is necessary because in order to release payload #2, we
+need to be able to figure out the last relative of port #3 that's still
+connected to the topology. In this case, we would travel up the topology as
+shown in figure 3.
+
+.. kernel-figure:: dp-mst/topology-figure-3.dot
+
+And finally, remove payload #2 by communicating with port #2 through sideband
+transactions.
+
 MIPI DSI Helper Functions Reference
 ===================================
 
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 2ab16c9e6243..c196fb580beb 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -46,7 +46,7 @@ static bool dump_dp_payload_table(struct drm_dp_mst_topology_mgr *mgr,
 				  char *buf);
 static int test_calc_pbn_mode(void);
 
-static void drm_dp_put_port(struct drm_dp_mst_port *port);
+static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port);
 
 static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
 				     int id,
@@ -850,46 +850,120 @@ static struct drm_dp_mst_branch *drm_dp_add_mst_branch_device(u8 lct, u8 *rad)
 	if (lct > 1)
 		memcpy(mstb->rad, rad, lct / 2);
 	INIT_LIST_HEAD(&mstb->ports);
-	kref_init(&mstb->kref);
+	kref_init(&mstb->topology_kref);
+	kref_init(&mstb->malloc_kref);
 	return mstb;
 }
 
 static void drm_dp_free_mst_port(struct kref *kref);
+static void drm_dp_free_mst_branch_device(struct kref *kref);
+
+/**
+ * drm_dp_mst_get_mstb_malloc() - Increment the malloc refcount of a branch
+ * device
+ * @mstb: The &struct drm_dp_mst_branch to increment the malloc refcount of
+ *
+ * Increments @mstb.malloc_kref. When @mstb.malloc_kref reaches 0, the memory
+ * allocation for @mstb will be released and @mstb may no longer be used.
+ *
+ * Any malloc references acquired with this function must be released when
+ * they are no longer being used by calling drm_dp_mst_put_mstb_malloc().
+ *
+ * See also: drm_dp_mst_put_mstb_malloc()
+ */
+static void
+drm_dp_mst_get_mstb_malloc(struct drm_dp_mst_branch *mstb)
+{
+	kref_get(&mstb->malloc_kref);
+	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref));
+}
+
+/**
+ * drm_dp_mst_put_mstb_malloc() - Decrement the malloc refcount of a branch
+ * device
+ * @mstb: The &struct drm_dp_mst_branch to decrement the malloc refcount of
+ *
+ * Decrements @mstb.malloc_kref. When @mstb.malloc_kref reaches 0, the memory
+ * allocation for @mstb will be released and @mstb may no longer be used.
+ *
+ * See also: drm_dp_mst_get_mstb_malloc()
+ */
+static void
+drm_dp_mst_put_mstb_malloc(struct drm_dp_mst_branch *mstb)
+{
+	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref)-1);
+	kref_put(&mstb->malloc_kref, drm_dp_free_mst_branch_device);
+}
+
+/**
+ * drm_dp_mst_get_port_malloc() - Increment the malloc refcount of an MST port
+ * @port: The &struct drm_dp_mst_port to increment the malloc refcount of
+ *
+ * Increments @port.malloc_kref. When @port.malloc_kref reaches 0, the memory
+ * allocation for @port will be released and @port may no longer be used.
+ *
+ * Because @port could potentially be freed at any time by the DP MST helpers
+ * if @port.malloc_kref reaches 0, including during a call to this function,
+ * drivers that which to make use of &struct drm_dp_mst_port should ensure
+ * that they grab at least one main malloc reference to their MST ports in
+ * &drm_dp_mst_topology_cbs.add_connector. This callback is called before
+ * there is any chance for @port.malloc_kref to reach 0.
+ *
+ * Any malloc references acquired with this function must be released when
+ * they are no longer being used by calling drm_dp_mst_put_port_malloc().
+ *
+ * See also: drm_dp_mst_put_port_malloc()
+ */
+void
+drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port)
+{
+	kref_get(&port->malloc_kref);
+	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref));
+}
+EXPORT_SYMBOL(drm_dp_mst_get_port_malloc);
+
+/**
+ * drm_dp_mst_put_port_malloc() - Decrement the malloc refcount of an MST port
+ * @port: The &struct drm_dp_mst_port to decrement the malloc refcount of
+ *
+ * Decrements @port.malloc_kref. When @port.malloc_kref reaches 0, the memory
+ * allocation for @port will be released and @port may no longer be used.
+ *
+ * See also: drm_dp_mst_get_port_malloc()
+ */
+void
+drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port)
+{
+	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref)-1);
+	kref_put(&port->malloc_kref, drm_dp_free_mst_port);
+}
+EXPORT_SYMBOL(drm_dp_mst_put_port_malloc);
 
 static void drm_dp_free_mst_branch_device(struct kref *kref)
 {
-	struct drm_dp_mst_branch *mstb = container_of(kref, struct drm_dp_mst_branch, kref);
-	if (mstb->port_parent) {
-		if (list_empty(&mstb->port_parent->next))
-			kref_put(&mstb->port_parent->kref, drm_dp_free_mst_port);
-	}
+	struct drm_dp_mst_branch *mstb =
+		container_of(kref, struct drm_dp_mst_branch, malloc_kref);
+
+	if (mstb->port_parent)
+		drm_dp_mst_put_port_malloc(mstb->port_parent);
+
 	kfree(mstb);
 }
 
 static void drm_dp_destroy_mst_branch_device(struct kref *kref)
 {
-	struct drm_dp_mst_branch *mstb = container_of(kref, struct drm_dp_mst_branch, kref);
+	struct drm_dp_mst_branch *mstb =
+		container_of(kref, struct drm_dp_mst_branch, topology_kref);
+	struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
 	struct drm_dp_mst_port *port, *tmp;
 	bool wake_tx = false;
 
-	/*
-	 * init kref again to be used by ports to remove mst branch when it is
-	 * not needed anymore
-	 */
-	kref_init(kref);
-
-	if (mstb->port_parent && list_empty(&mstb->port_parent->next))
-		kref_get(&mstb->port_parent->kref);
-
-	/*
-	 * destroy all ports - don't need lock
-	 * as there are no more references to the mst branch
-	 * device at this point.
-	 */
+	mutex_lock(&mgr->lock);
 	list_for_each_entry_safe(port, tmp, &mstb->ports, next) {
 		list_del(&port->next);
-		drm_dp_put_port(port);
+		drm_dp_mst_topology_put_port(port);
 	}
+	mutex_unlock(&mgr->lock);
 
 	/* drop any tx slots msg */
 	mutex_lock(&mstb->mgr->qlock);
@@ -908,14 +982,82 @@ static void drm_dp_destroy_mst_branch_device(struct kref *kref)
 	if (wake_tx)
 		wake_up_all(&mstb->mgr->tx_waitq);
 
-	kref_put(kref, drm_dp_free_mst_branch_device);
+	drm_dp_mst_put_mstb_malloc(mstb);
 }
 
-static void drm_dp_put_mst_branch_device(struct drm_dp_mst_branch *mstb)
+/**
+ * drm_dp_mst_topology_get_mstb() - Increment the topology refcount of a
+ * branch device unless its zero
+ * @mstb: &struct drm_dp_mst_branch to increment the topology refcount of
+ *
+ * Attempts to grab a topology reference to @mstb, if it hasn't yet been
+ * removed from the topology (e.g. @mstb.topology_kref has reached 0).
+ *
+ * Any topology references acquired with this function must be released when
+ * they are no longer being used by calling drm_dp_mst_topology_put_mstb().
+ *
+ * See also:
+ * drm_dp_mst_topology_ref_mstb()
+ * drm_dp_mst_topology_get_mstb()
+ *
+ * Returns:
+ * * 1: A topology reference was grabbed successfully
+ * * 0: @port is no longer in the topology, no reference was grabbed
+ */
+static int __must_check
+drm_dp_mst_topology_get_mstb(struct drm_dp_mst_branch *mstb)
 {
-	kref_put(&mstb->kref, drm_dp_destroy_mst_branch_device);
+	int ret = kref_get_unless_zero(&mstb->topology_kref);
+
+	if (ret)
+		DRM_DEBUG("mstb %p (%d)\n", mstb,
+			  kref_read(&mstb->topology_kref));
+
+	return ret;
+}
+
+/**
+ * drm_dp_mst_topology_ref_mstb() - Increment the topology refcount of a
+ * branch device
+ * @mstb: The &struct drm_dp_mst_branch to increment the topology refcount of
+ *
+ * Increments @mstb.topology_refcount without checking whether or not it's
+ * already reached 0. This is only valid to use in scenarios where you are
+ * already guaranteed to have at least one active topology reference to @mstb.
+ * Otherwise, drm_dp_mst_topology_get_mstb() should be used.
+ *
+ * Any topology references acquired with this function must be released when
+ * they are no longer being used by calling drm_dp_mst_topology_put_mstb().
+ *
+ * See also:
+ * drm_dp_mst_topology_get_mstb()
+ * drm_dp_mst_topology_put_mstb()
+ */
+static void
+drm_dp_mst_topology_ref_mstb(struct drm_dp_mst_branch *mstb)
+{
+	kref_get(&mstb->topology_kref);
+	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->topology_kref));
 }
 
+/**
+ * drm_dp_mst_topology_put_mstb() - release a topology reference to a branch
+ * device
+ * @mstb: The &struct drm_dp_mst_branch to release the topology reference from
+ *
+ * Releases a topology reference from @mstb by decrementing
+ * @mstb.topology_kref.
+ *
+ * See also:
+ * drm_dp_mst_topology_get_mstb()
+ * drm_dp_mst_topology_ref_mstb()
+ */
+static void
+drm_dp_mst_topology_put_mstb(struct drm_dp_mst_branch *mstb)
+{
+	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->topology_kref)-1);
+	kref_put(&mstb->topology_kref, drm_dp_destroy_mst_branch_device);
+}
 
 static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int old_pdt)
 {
@@ -930,14 +1072,15 @@ static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int old_pdt)
 	case DP_PEER_DEVICE_MST_BRANCHING:
 		mstb = port->mstb;
 		port->mstb = NULL;
-		drm_dp_put_mst_branch_device(mstb);
+		drm_dp_mst_topology_put_mstb(mstb);
 		break;
 	}
 }
 
 static void drm_dp_destroy_port(struct kref *kref)
 {
-	struct drm_dp_mst_port *port = container_of(kref, struct drm_dp_mst_port, kref);
+	struct drm_dp_mst_port *port =
+		container_of(kref, struct drm_dp_mst_port, topology_kref);
 	struct drm_dp_mst_topology_mgr *mgr = port->mgr;
 
 	if (!port->input) {
@@ -956,7 +1099,6 @@ static void drm_dp_destroy_port(struct kref *kref)
 			 * from an EDID retrieval */
 
 			mutex_lock(&mgr->destroy_connector_lock);
-			kref_get(&port->parent->kref);
 			list_add(&port->next, &mgr->destroy_connector_list);
 			mutex_unlock(&mgr->destroy_connector_lock);
 			schedule_work(&mgr->destroy_connector_work);
@@ -967,25 +1109,93 @@ static void drm_dp_destroy_port(struct kref *kref)
 		drm_dp_port_teardown_pdt(port, port->pdt);
 		port->pdt = DP_PEER_DEVICE_NONE;
 	}
-	kfree(port);
+	drm_dp_mst_put_port_malloc(port);
 }
 
-static void drm_dp_put_port(struct drm_dp_mst_port *port)
+/**
+ * drm_dp_mst_topology_get_port() - Increment the topology refcount of a
+ * port unless its zero
+ * @port: &struct drm_dp_mst_port to increment the topology refcount of
+ *
+ * Attempts to grab a topology reference to @port, if it hasn't yet been
+ * removed from the topology (e.g. @port.topology_kref has reached 0).
+ *
+ * Any topology references acquired with this function must be released when
+ * they are no longer being used by calling drm_dp_mst_topology_put_port().
+ *
+ * See also:
+ * drm_dp_mst_topology_ref_port()
+ * drm_dp_mst_topology_put_port()
+ *
+ * Returns:
+ * * 1: A topology reference was grabbed successfully
+ * * 0: @port is no longer in the topology, no reference was grabbed
+ */
+static int __must_check
+drm_dp_mst_topology_get_port(struct drm_dp_mst_port *port)
 {
-	kref_put(&port->kref, drm_dp_destroy_port);
+	int ret = kref_get_unless_zero(&port->topology_kref);
+
+	if (ret)
+		DRM_DEBUG("port %p (%d)\n", port,
+			  kref_read(&port->topology_kref));
+
+	return ret;
 }
 
-static struct drm_dp_mst_branch *drm_dp_mst_get_validated_mstb_ref_locked(struct drm_dp_mst_branch *mstb, struct drm_dp_mst_branch *to_find)
+/**
+ * drm_dp_mst_topology_ref_port() - Increment the topology refcount of a port
+ * @port: The &struct drm_dp_mst_port to increment the topology refcount of
+ *
+ * Increments @port.topology_refcount without checking whether or not it's
+ * already reached 0. This is only valid to use in scenarios where you are
+ * already guaranteed to have at least one active topology reference to @port.
+ * Otherwise, drm_dp_mst_topology_get_port() should be used.
+ *
+ * Any topology references acquired with this function must be released when
+ * they are no longer being used by calling drm_dp_mst_topology_put_port().
+ *
+ * See also:
+ * drm_dp_mst_topology_get_port()
+ * drm_dp_mst_topology_put_port()
+ */
+static void drm_dp_mst_topology_ref_port(struct drm_dp_mst_port *port)
+{
+	kref_get(&port->topology_kref);
+	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->topology_kref));
+}
+
+/**
+ * drm_dp_mst_topology_put_port() - release a topology reference to a port
+ * @port: The &struct drm_dp_mst_port to release the topology reference from
+ *
+ * Releases a topology reference from @port by decrementing
+ * @port.topology_kref.
+ *
+ * See also:
+ * drm_dp_mst_topology_get_port()
+ * drm_dp_mst_topology_ref_port()
+ */
+static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port)
+{
+	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->topology_kref)-1);
+	kref_put(&port->topology_kref, drm_dp_destroy_port);
+}
+
+static struct drm_dp_mst_branch *
+drm_dp_mst_topology_get_mstb_validated_locked(struct drm_dp_mst_branch *mstb,
+					      struct drm_dp_mst_branch *to_find)
 {
 	struct drm_dp_mst_port *port;
 	struct drm_dp_mst_branch *rmstb;
-	if (to_find == mstb) {
-		kref_get(&mstb->kref);
+
+	if (to_find == mstb)
 		return mstb;
-	}
+
 	list_for_each_entry(port, &mstb->ports, next) {
 		if (port->mstb) {
-			rmstb = drm_dp_mst_get_validated_mstb_ref_locked(port->mstb, to_find);
+			rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
+			    port->mstb, to_find);
 			if (rmstb)
 				return rmstb;
 		}
@@ -993,27 +1203,37 @@ static struct drm_dp_mst_branch *drm_dp_mst_get_validated_mstb_ref_locked(struct
 	return NULL;
 }
 
-static struct drm_dp_mst_branch *drm_dp_get_validated_mstb_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_branch *mstb)
+static struct drm_dp_mst_branch *
+drm_dp_mst_topology_get_mstb_validated(struct drm_dp_mst_topology_mgr *mgr,
+				       struct drm_dp_mst_branch *mstb)
 {
 	struct drm_dp_mst_branch *rmstb = NULL;
+
 	mutex_lock(&mgr->lock);
-	if (mgr->mst_primary)
-		rmstb = drm_dp_mst_get_validated_mstb_ref_locked(mgr->mst_primary, mstb);
+	if (mgr->mst_primary) {
+		rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
+		    mgr->mst_primary, mstb);
+
+		if (rmstb && !drm_dp_mst_topology_get_mstb(rmstb))
+			rmstb = NULL;
+	}
 	mutex_unlock(&mgr->lock);
 	return rmstb;
 }
 
-static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_branch *mstb, struct drm_dp_mst_port *to_find)
+static struct drm_dp_mst_port *
+drm_dp_mst_topology_get_port_validated_locked(struct drm_dp_mst_branch *mstb,
+					      struct drm_dp_mst_port *to_find)
 {
 	struct drm_dp_mst_port *port, *mport;
 
 	list_for_each_entry(port, &mstb->ports, next) {
-		if (port == to_find) {
-			kref_get(&port->kref);
+		if (port == to_find)
 			return port;
-		}
+
 		if (port->mstb) {
-			mport = drm_dp_mst_get_port_ref_locked(port->mstb, to_find);
+			mport = drm_dp_mst_topology_get_port_validated_locked(
+			    port->mstb, to_find);
 			if (mport)
 				return mport;
 		}
@@ -1021,12 +1241,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
 	return NULL;
 }
 
-static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
+static struct drm_dp_mst_port *
+drm_dp_mst_topology_get_port_validated(struct drm_dp_mst_topology_mgr *mgr,
+				       struct drm_dp_mst_port *port)
 {
 	struct drm_dp_mst_port *rport = NULL;
+
 	mutex_lock(&mgr->lock);
-	if (mgr->mst_primary)
-		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
+	if (mgr->mst_primary) {
+		rport = drm_dp_mst_topology_get_port_validated_locked(
+		    mgr->mst_primary, port);
+
+		if (rport && !drm_dp_mst_topology_get_port(rport))
+			rport = NULL;
+	}
 	mutex_unlock(&mgr->lock);
 	return rport;
 }
@@ -1034,11 +1262,12 @@ static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_t
 static struct drm_dp_mst_port *drm_dp_get_port(struct drm_dp_mst_branch *mstb, u8 port_num)
 {
 	struct drm_dp_mst_port *port;
+	int ret;
 
 	list_for_each_entry(port, &mstb->ports, next) {
 		if (port->port_num == port_num) {
-			kref_get(&port->kref);
-			return port;
+			ret = drm_dp_mst_topology_get_port(port);
+			return ret ? port : NULL;
 		}
 	}
 
@@ -1087,6 +1316,11 @@ static bool drm_dp_port_setup_pdt(struct drm_dp_mst_port *port)
 		if (port->mstb) {
 			port->mstb->mgr = port->mgr;
 			port->mstb->port_parent = port;
+			/*
+			 * Make sure this port's memory allocation stays
+			 * around until it's child MSTB releases it
+			 */
+			drm_dp_mst_get_port_malloc(port);
 
 			send_link = true;
 		}
@@ -1147,17 +1381,26 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
 	bool created = false;
 	int old_pdt = 0;
 	int old_ddps = 0;
+
 	port = drm_dp_get_port(mstb, port_msg->port_number);
 	if (!port) {
 		port = kzalloc(sizeof(*port), GFP_KERNEL);
 		if (!port)
 			return;
-		kref_init(&port->kref);
+		kref_init(&port->topology_kref);
+		kref_init(&port->malloc_kref);
 		port->parent = mstb;
 		port->port_num = port_msg->port_number;
 		port->mgr = mstb->mgr;
 		port->aux.name = "DPMST";
 		port->aux.dev = dev->dev;
+
+		/*
+		 * Make sure the memory allocation for our parent branch stays
+		 * around until our own memory allocation is released
+		 */
+		drm_dp_mst_get_mstb_malloc(mstb);
+
 		created = true;
 	} else {
 		old_pdt = port->pdt;
@@ -1177,7 +1420,7 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
 	   for this list */
 	if (created) {
 		mutex_lock(&mstb->mgr->lock);
-		kref_get(&port->kref);
+		drm_dp_mst_topology_ref_port(port);
 		list_add(&port->next, &mstb->ports);
 		mutex_unlock(&mstb->mgr->lock);
 	}
@@ -1202,17 +1445,21 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
 	if (created && !port->input) {
 		char proppath[255];
 
-		build_mst_prop_path(mstb, port->port_num, proppath, sizeof(proppath));
-		port->connector = (*mstb->mgr->cbs->add_connector)(mstb->mgr, port, proppath);
+		build_mst_prop_path(mstb, port->port_num, proppath,
+				    sizeof(proppath));
+		port->connector = (*mstb->mgr->cbs->add_connector)(mstb->mgr,
+								   port,
+								   proppath);
 		if (!port->connector) {
 			/* remove it from the port list */
 			mutex_lock(&mstb->mgr->lock);
 			list_del(&port->next);
 			mutex_unlock(&mstb->mgr->lock);
 			/* drop port list reference */
-			drm_dp_put_port(port);
+			drm_dp_mst_topology_put_port(port);
 			goto out;
 		}
+
 		if ((port->pdt == DP_PEER_DEVICE_DP_LEGACY_CONV ||
 		     port->pdt == DP_PEER_DEVICE_SST_SINK) &&
 		    port->port_num >= DP_MST_LOGICAL_PORT_0) {
@@ -1224,7 +1471,7 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
 
 out:
 	/* put reference to this port */
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_port(port);
 }
 
 static void drm_dp_update_port(struct drm_dp_mst_branch *mstb,
@@ -1259,7 +1506,7 @@ static void drm_dp_update_port(struct drm_dp_mst_branch *mstb,
 			dowork = true;
 	}
 
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_port(port);
 	if (dowork)
 		queue_work(system_long_wq, &mstb->mgr->work);
 
@@ -1270,7 +1517,7 @@ static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_
 {
 	struct drm_dp_mst_branch *mstb;
 	struct drm_dp_mst_port *port;
-	int i;
+	int i, ret;
 	/* find the port by iterating down */
 
 	mutex_lock(&mgr->lock);
@@ -1295,7 +1542,9 @@ static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_
 			}
 		}
 	}
-	kref_get(&mstb->kref);
+	ret = drm_dp_mst_topology_get_mstb(mstb);
+	if (!ret)
+		mstb = NULL;
 out:
 	mutex_unlock(&mgr->lock);
 	return mstb;
@@ -1325,19 +1574,22 @@ static struct drm_dp_mst_branch *get_mst_branch_device_by_guid_helper(
 	return NULL;
 }
 
-static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device_by_guid(
-	struct drm_dp_mst_topology_mgr *mgr,
-	uint8_t *guid)
+static struct drm_dp_mst_branch *
+drm_dp_get_mst_branch_device_by_guid(struct drm_dp_mst_topology_mgr *mgr,
+				     uint8_t *guid)
 {
 	struct drm_dp_mst_branch *mstb;
+	int ret;
 
 	/* find the port by iterating down */
 	mutex_lock(&mgr->lock);
 
 	mstb = get_mst_branch_device_by_guid_helper(mgr->mst_primary, guid);
-
-	if (mstb)
-		kref_get(&mstb->kref);
+	if (mstb) {
+		ret = drm_dp_mst_topology_get_mstb(mstb);
+		if (!ret)
+			mstb = NULL;
+	}
 
 	mutex_unlock(&mgr->lock);
 	return mstb;
@@ -1362,10 +1614,10 @@ static void drm_dp_check_and_send_link_address(struct drm_dp_mst_topology_mgr *m
 			drm_dp_send_enum_path_resources(mgr, mstb, port);
 
 		if (port->mstb) {
-			mstb_child = drm_dp_get_validated_mstb_ref(mgr, port->mstb);
+			mstb_child = drm_dp_mst_topology_get_mstb_validated(mgr, port->mstb);
 			if (mstb_child) {
 				drm_dp_check_and_send_link_address(mgr, mstb_child);
-				drm_dp_put_mst_branch_device(mstb_child);
+				drm_dp_mst_topology_put_mstb(mstb_child);
 			}
 		}
 	}
@@ -1375,16 +1627,19 @@ static void drm_dp_mst_link_probe_work(struct work_struct *work)
 {
 	struct drm_dp_mst_topology_mgr *mgr = container_of(work, struct drm_dp_mst_topology_mgr, work);
 	struct drm_dp_mst_branch *mstb;
+	int ret;
 
 	mutex_lock(&mgr->lock);
 	mstb = mgr->mst_primary;
 	if (mstb) {
-		kref_get(&mstb->kref);
+		ret = drm_dp_mst_topology_get_mstb(mstb);
+		if (!ret)
+			mstb = NULL;
 	}
 	mutex_unlock(&mgr->lock);
 	if (mstb) {
 		drm_dp_check_and_send_link_address(mgr, mstb);
-		drm_dp_put_mst_branch_device(mstb);
+		drm_dp_mst_topology_put_mstb(mstb);
 	}
 }
 
@@ -1695,22 +1950,32 @@ static struct drm_dp_mst_port *drm_dp_get_last_connected_port_to_mstb(struct drm
 	return drm_dp_get_last_connected_port_to_mstb(mstb->port_parent->parent);
 }
 
-static struct drm_dp_mst_branch *drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr *mgr,
-									 struct drm_dp_mst_branch *mstb,
-									 int *port_num)
+static struct drm_dp_mst_branch *
+drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr *mgr,
+					struct drm_dp_mst_branch *mstb,
+					int *port_num)
 {
 	struct drm_dp_mst_branch *rmstb = NULL;
 	struct drm_dp_mst_port *found_port;
+
 	mutex_lock(&mgr->lock);
-	if (mgr->mst_primary) {
+	if (!mgr->mst_primary)
+		goto out;
+
+	do {
 		found_port = drm_dp_get_last_connected_port_to_mstb(mstb);
+		if (!found_port)
+			break;
 
-		if (found_port) {
+		if (drm_dp_mst_topology_get_mstb(found_port->parent)) {
 			rmstb = found_port->parent;
-			kref_get(&rmstb->kref);
 			*port_num = found_port->port_num;
+		} else {
+			/* Search again, starting from this parent */
+			mstb = found_port->parent;
 		}
-	}
+	} while (!rmstb);
+out:
 	mutex_unlock(&mgr->lock);
 	return rmstb;
 }
@@ -1726,17 +1991,19 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
 	u8 sinks[DRM_DP_MAX_SDP_STREAMS];
 	int i;
 
-	port = drm_dp_get_validated_port_ref(mgr, port);
+	port = drm_dp_mst_topology_get_port_validated(mgr, port);
 	if (!port)
 		return -EINVAL;
 
 	port_num = port->port_num;
-	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
+	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
 	if (!mstb) {
-		mstb = drm_dp_get_last_connected_port_and_mstb(mgr, port->parent, &port_num);
+		mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
+							       port->parent,
+							       &port_num);
 
 		if (!mstb) {
-			drm_dp_put_port(port);
+			drm_dp_mst_topology_put_port(port);
 			return -EINVAL;
 		}
 	}
@@ -1766,8 +2033,8 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
 	}
 	kfree(txmsg);
 fail_put:
-	drm_dp_put_mst_branch_device(mstb);
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_mstb(mstb);
+	drm_dp_mst_topology_put_port(port);
 	return ret;
 }
 
@@ -1777,13 +2044,13 @@ int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
 	struct drm_dp_sideband_msg_tx *txmsg;
 	int len, ret;
 
-	port = drm_dp_get_validated_port_ref(mgr, port);
+	port = drm_dp_mst_topology_get_port_validated(mgr, port);
 	if (!port)
 		return -EINVAL;
 
 	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
 	if (!txmsg) {
-		drm_dp_put_port(port);
+		drm_dp_mst_topology_put_port(port);
 		return -ENOMEM;
 	}
 
@@ -1799,7 +2066,7 @@ int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
 			ret = 0;
 	}
 	kfree(txmsg);
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_port(port);
 
 	return ret;
 }
@@ -1888,7 +2155,8 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
 		if (vcpi) {
 			port = container_of(vcpi, struct drm_dp_mst_port,
 					    vcpi);
-			port = drm_dp_get_validated_port_ref(mgr, port);
+			port = drm_dp_mst_topology_get_port_validated(mgr,
+								      port);
 			if (!port) {
 				mutex_unlock(&mgr->payload_lock);
 				return -EINVAL;
@@ -1925,7 +2193,7 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
 		cur_slots += req_payload.num_slots;
 
 		if (port)
-			drm_dp_put_port(port);
+			drm_dp_mst_topology_put_port(port);
 	}
 
 	for (i = 0; i < mgr->max_payloads; i++) {
@@ -2024,7 +2292,7 @@ static int drm_dp_send_dpcd_write(struct drm_dp_mst_topology_mgr *mgr,
 	struct drm_dp_sideband_msg_tx *txmsg;
 	struct drm_dp_mst_branch *mstb;
 
-	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
+	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
 	if (!mstb)
 		return -EINVAL;
 
@@ -2048,7 +2316,7 @@ static int drm_dp_send_dpcd_write(struct drm_dp_mst_topology_mgr *mgr,
 	}
 	kfree(txmsg);
 fail_put:
-	drm_dp_put_mst_branch_device(mstb);
+	drm_dp_mst_topology_put_mstb(mstb);
 	return ret;
 }
 
@@ -2158,7 +2426,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
 
 		/* give this the main reference */
 		mgr->mst_primary = mstb;
-		kref_get(&mgr->mst_primary->kref);
+		drm_dp_mst_topology_ref_mstb(mgr->mst_primary);
 
 		ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
 							 DP_MST_EN | DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC);
@@ -2192,7 +2460,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
 out_unlock:
 	mutex_unlock(&mgr->lock);
 	if (mstb)
-		drm_dp_put_mst_branch_device(mstb);
+		drm_dp_mst_topology_put_mstb(mstb);
 	return ret;
 
 }
@@ -2357,7 +2625,7 @@ static int drm_dp_mst_handle_down_rep(struct drm_dp_mst_topology_mgr *mgr)
 			       mgr->down_rep_recv.initial_hdr.lct,
 				      mgr->down_rep_recv.initial_hdr.rad[0],
 				      mgr->down_rep_recv.msg[0]);
-			drm_dp_put_mst_branch_device(mstb);
+			drm_dp_mst_topology_put_mstb(mstb);
 			memset(&mgr->down_rep_recv, 0, sizeof(struct drm_dp_sideband_msg_rx));
 			return 0;
 		}
@@ -2368,7 +2636,7 @@ static int drm_dp_mst_handle_down_rep(struct drm_dp_mst_topology_mgr *mgr)
 		}
 
 		memset(&mgr->down_rep_recv, 0, sizeof(struct drm_dp_sideband_msg_rx));
-		drm_dp_put_mst_branch_device(mstb);
+		drm_dp_mst_topology_put_mstb(mstb);
 
 		mutex_lock(&mgr->qlock);
 		txmsg->state = DRM_DP_SIDEBAND_TX_RX;
@@ -2441,7 +2709,7 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
 		}
 
 		if (mstb)
-			drm_dp_put_mst_branch_device(mstb);
+			drm_dp_mst_topology_put_mstb(mstb);
 
 		memset(&mgr->up_req_recv, 0, sizeof(struct drm_dp_sideband_msg_rx));
 	}
@@ -2501,7 +2769,7 @@ enum drm_connector_status drm_dp_mst_detect_port(struct drm_connector *connector
 	enum drm_connector_status status = connector_status_disconnected;
 
 	/* we need to search for the port in the mgr in case its gone */
-	port = drm_dp_get_validated_port_ref(mgr, port);
+	port = drm_dp_mst_topology_get_port_validated(mgr, port);
 	if (!port)
 		return connector_status_disconnected;
 
@@ -2526,7 +2794,7 @@ enum drm_connector_status drm_dp_mst_detect_port(struct drm_connector *connector
 		break;
 	}
 out:
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_port(port);
 	return status;
 }
 EXPORT_SYMBOL(drm_dp_mst_detect_port);
@@ -2543,11 +2811,11 @@ bool drm_dp_mst_port_has_audio(struct drm_dp_mst_topology_mgr *mgr,
 {
 	bool ret = false;
 
-	port = drm_dp_get_validated_port_ref(mgr, port);
+	port = drm_dp_mst_topology_get_port_validated(mgr, port);
 	if (!port)
 		return ret;
 	ret = port->has_audio;
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_port(port);
 	return ret;
 }
 EXPORT_SYMBOL(drm_dp_mst_port_has_audio);
@@ -2567,7 +2835,7 @@ struct edid *drm_dp_mst_get_edid(struct drm_connector *connector, struct drm_dp_
 	struct edid *edid = NULL;
 
 	/* we need to search for the port in the mgr in case its gone */
-	port = drm_dp_get_validated_port_ref(mgr, port);
+	port = drm_dp_mst_topology_get_port_validated(mgr, port);
 	if (!port)
 		return NULL;
 
@@ -2578,7 +2846,7 @@ struct edid *drm_dp_mst_get_edid(struct drm_connector *connector, struct drm_dp_
 		drm_connector_set_tile_property(connector);
 	}
 	port->has_audio = drm_detect_monitor_audio(edid);
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_port(port);
 	return edid;
 }
 EXPORT_SYMBOL(drm_dp_mst_get_edid);
@@ -2649,7 +2917,7 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
 	if (IS_ERR(topology_state))
 		return PTR_ERR(topology_state);
 
-	port = drm_dp_get_validated_port_ref(mgr, port);
+	port = drm_dp_mst_topology_get_port_validated(mgr, port);
 	if (port == NULL)
 		return -EINVAL;
 	req_slots = DIV_ROUND_UP(pbn, mgr->pbn_div);
@@ -2657,14 +2925,14 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
 			req_slots, topology_state->avail_slots);
 
 	if (req_slots > topology_state->avail_slots) {
-		drm_dp_put_port(port);
+		drm_dp_mst_topology_put_port(port);
 		return -ENOSPC;
 	}
 
 	topology_state->avail_slots -= req_slots;
 	DRM_DEBUG_KMS("vcpi slots avail=%d", topology_state->avail_slots);
 
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_port(port);
 	return req_slots;
 }
 EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
@@ -2715,7 +2983,7 @@ bool drm_dp_mst_allocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
 {
 	int ret;
 
-	port = drm_dp_get_validated_port_ref(mgr, port);
+	port = drm_dp_mst_topology_get_port_validated(mgr, port);
 	if (!port)
 		return false;
 
@@ -2725,7 +2993,7 @@ bool drm_dp_mst_allocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
 	if (port->vcpi.vcpi > 0) {
 		DRM_DEBUG_KMS("payload: vcpi %d already allocated for pbn %d - requested pbn %d\n", port->vcpi.vcpi, port->vcpi.pbn, pbn);
 		if (pbn == port->vcpi.pbn) {
-			drm_dp_put_port(port);
+			drm_dp_mst_topology_put_port(port);
 			return true;
 		}
 	}
@@ -2733,13 +3001,13 @@ bool drm_dp_mst_allocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
 	ret = drm_dp_init_vcpi(mgr, &port->vcpi, pbn, slots);
 	if (ret) {
 		DRM_DEBUG_KMS("failed to init vcpi slots=%d max=63 ret=%d\n",
-				DIV_ROUND_UP(pbn, mgr->pbn_div), ret);
+			      DIV_ROUND_UP(pbn, mgr->pbn_div), ret);
 		goto out;
 	}
 	DRM_DEBUG_KMS("initing vcpi for pbn=%d slots=%d\n",
-			pbn, port->vcpi.num_slots);
+		      pbn, port->vcpi.num_slots);
 
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_port(port);
 	return true;
 out:
 	return false;
@@ -2749,12 +3017,12 @@ EXPORT_SYMBOL(drm_dp_mst_allocate_vcpi);
 int drm_dp_mst_get_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
 {
 	int slots = 0;
-	port = drm_dp_get_validated_port_ref(mgr, port);
+	port = drm_dp_mst_topology_get_port_validated(mgr, port);
 	if (!port)
 		return slots;
 
 	slots = port->vcpi.num_slots;
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_port(port);
 	return slots;
 }
 EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
@@ -2768,11 +3036,11 @@ EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
  */
 void drm_dp_mst_reset_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
 {
-	port = drm_dp_get_validated_port_ref(mgr, port);
+	port = drm_dp_mst_topology_get_port_validated(mgr, port);
 	if (!port)
 		return;
 	port->vcpi.num_slots = 0;
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_port(port);
 }
 EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
 
@@ -2781,9 +3049,10 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
  * @mgr: manager for this port
  * @port: unverified port to deallocate vcpi for
  */
-void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
+void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
+				struct drm_dp_mst_port *port)
 {
-	port = drm_dp_get_validated_port_ref(mgr, port);
+	port = drm_dp_mst_topology_get_port_validated(mgr, port);
 	if (!port)
 		return;
 
@@ -2792,7 +3061,7 @@ void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr, struct drm_
 	port->vcpi.pbn = 0;
 	port->vcpi.aligned_pbn = 0;
 	port->vcpi.vcpi = 0;
-	drm_dp_put_port(port);
+	drm_dp_mst_topology_put_port(port);
 }
 EXPORT_SYMBOL(drm_dp_mst_deallocate_vcpi);
 
@@ -3078,8 +3347,10 @@ static void drm_dp_tx_work(struct work_struct *work)
 
 static void drm_dp_free_mst_port(struct kref *kref)
 {
-	struct drm_dp_mst_port *port = container_of(kref, struct drm_dp_mst_port, kref);
-	kref_put(&port->parent->kref, drm_dp_free_mst_branch_device);
+	struct drm_dp_mst_port *port =
+		container_of(kref, struct drm_dp_mst_port, malloc_kref);
+
+	drm_dp_mst_put_mstb_malloc(port->parent);
 	kfree(port);
 }
 
@@ -3103,7 +3374,6 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
 		list_del(&port->next);
 		mutex_unlock(&mgr->destroy_connector_lock);
 
-		kref_init(&port->kref);
 		INIT_LIST_HEAD(&port->next);
 
 		mgr->cbs->destroy_connector(mgr, port->connector);
@@ -3117,7 +3387,7 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
 			drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
 		}
 
-		kref_put(&port->kref, drm_dp_free_mst_port);
+		drm_dp_mst_put_port_malloc(port);
 		send_hotplug = true;
 	}
 	if (send_hotplug)
@@ -3292,7 +3562,7 @@ static int drm_dp_mst_i2c_xfer(struct i2c_adapter *adapter, struct i2c_msg *msgs
 	struct drm_dp_sideband_msg_tx *txmsg = NULL;
 	int ret;
 
-	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
+	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
 	if (!mstb)
 		return -EREMOTEIO;
 
@@ -3342,7 +3612,7 @@ static int drm_dp_mst_i2c_xfer(struct i2c_adapter *adapter, struct i2c_msg *msgs
 	}
 out:
 	kfree(txmsg);
-	drm_dp_put_mst_branch_device(mstb);
+	drm_dp_mst_topology_put_mstb(mstb);
 	return ret;
 }
 
diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
index 371cc2816477..50643a39765d 100644
--- a/include/drm/drm_dp_mst_helper.h
+++ b/include/drm/drm_dp_mst_helper.h
@@ -44,7 +44,10 @@ struct drm_dp_vcpi {
 
 /**
  * struct drm_dp_mst_port - MST port
- * @kref: reference count for this port.
+ * @topology_kref: refcount for this port's lifetime in the topology, only the
+ * DP MST helpers should need to touch this
+ * @malloc_kref: refcount for the memory allocation containing this structure.
+ * See drm_dp_mst_get_port_malloc() and drm_dp_mst_put_port_malloc().
  * @port_num: port number
  * @input: if this port is an input port.
  * @mcs: message capability status - DP 1.2 spec.
@@ -67,7 +70,8 @@ struct drm_dp_vcpi {
  * in the MST topology.
  */
 struct drm_dp_mst_port {
-	struct kref kref;
+	struct kref topology_kref;
+	struct kref malloc_kref;
 
 	u8 port_num;
 	bool input;
@@ -102,7 +106,10 @@ struct drm_dp_mst_port {
 
 /**
  * struct drm_dp_mst_branch - MST branch device.
- * @kref: reference count for this port.
+ * @topology_kref: refcount for this branch device's lifetime in the topology,
+ * only the DP MST helpers should need to touch this
+ * @malloc_kref: refcount for the memory allocation containing this structure.
+ * See drm_dp_mst_get_mstb_malloc() and drm_dp_mst_put_mstb_malloc().
  * @rad: Relative Address to talk to this branch device.
  * @lct: Link count total to talk to this branch device.
  * @num_ports: number of ports on the branch.
@@ -121,7 +128,8 @@ struct drm_dp_mst_port {
  * to downstream port of parent branches.
  */
 struct drm_dp_mst_branch {
-	struct kref kref;
+	struct kref topology_kref;
+	struct kref malloc_kref;
 	u8 rad[8];
 	u8 lct;
 	int num_ports;
@@ -626,4 +634,7 @@ int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
 int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
 				 struct drm_dp_mst_port *port, bool power_up);
 
+void drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port);
+void drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port);
+
 #endif
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 04/15] drm/dp_mst: Stop releasing VCPI when removing ports from topology
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (2 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  9:40   ` Daniel Vetter
  2018-12-14  1:25 ` [WIP PATCH 05/15] drm/dp_mst: Fix payload deallocation on hotplugs using malloc refs Lyude Paul
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	linux-kernel

This has never actually worked, and isn't needed anyway: the driver's
always going to try to deallocate VCPI when it tears down the display
that the VCPI belongs to.

Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index c196fb580beb..ae9d019af9f2 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -1084,8 +1084,6 @@ static void drm_dp_destroy_port(struct kref *kref)
 	struct drm_dp_mst_topology_mgr *mgr = port->mgr;
 
 	if (!port->input) {
-		port->vcpi.num_slots = 0;
-
 		kfree(port->cached_edid);
 
 		/*
@@ -3381,12 +3379,6 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
 		drm_dp_port_teardown_pdt(port, port->pdt);
 		port->pdt = DP_PEER_DEVICE_NONE;
 
-		if (!port->input && port->vcpi.vcpi > 0) {
-			drm_dp_mst_reset_vcpi_slots(mgr, port);
-			drm_dp_update_payload_part1(mgr);
-			drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
-		}
-
 		drm_dp_mst_put_port_malloc(port);
 		send_hotplug = true;
 	}
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 05/15] drm/dp_mst: Fix payload deallocation on hotplugs using malloc refs
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (3 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 04/15] drm/dp_mst: Stop releasing VCPI when removing ports from topology Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  9:38   ` Daniel Vetter
  2018-12-14  1:25 ` [WIP PATCH 06/15] drm/i915: Keep malloc references to MST ports Lyude Paul
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	linux-kernel

Up until now, freeing payloads on remote MST hubs that just had ports
removed has almost never worked because we've been relying on port
validation in order to stop us from accessing ports that have already
been freed from memory, but ports which need their payloads released due
to being removed will never be a valid part of the topology after
they've been removed.

Since we've introduced malloc refs, we can replace all of the validation
logic in payload helpers which are used for deallocation with some
well-placed malloc krefs. This ensures that regardless of whether or not
the ports are still valid and in the topology, any port which has an
allocated payload will remain allocated in memory until it's payloads
have been removed - finally allowing us to actually release said
payloads correctly.

Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 54 +++++++++++++++------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index ae9d019af9f2..93f08bfd2ab3 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -1989,10 +1989,6 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
 	u8 sinks[DRM_DP_MAX_SDP_STREAMS];
 	int i;
 
-	port = drm_dp_mst_topology_get_port_validated(mgr, port);
-	if (!port)
-		return -EINVAL;
-
 	port_num = port->port_num;
 	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
 	if (!mstb) {
@@ -2000,10 +1996,8 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
 							       port->parent,
 							       &port_num);
 
-		if (!mstb) {
-			drm_dp_mst_topology_put_port(port);
+		if (!mstb)
 			return -EINVAL;
-		}
 	}
 
 	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
@@ -2032,7 +2026,6 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
 	kfree(txmsg);
 fail_put:
 	drm_dp_mst_topology_put_mstb(mstb);
-	drm_dp_mst_topology_put_port(port);
 	return ret;
 }
 
@@ -2137,15 +2130,16 @@ static int drm_dp_destroy_payload_step2(struct drm_dp_mst_topology_mgr *mgr,
  */
 int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
 {
-	int i, j;
-	int cur_slots = 1;
 	struct drm_dp_payload req_payload;
 	struct drm_dp_mst_port *port;
+	int i, j;
+	int cur_slots = 1;
 
 	mutex_lock(&mgr->payload_lock);
 	for (i = 0; i < mgr->max_payloads; i++) {
 		struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
 		struct drm_dp_payload *payload = &mgr->payloads[i];
+		bool put_port = false;
 
 		/* solve the current payloads - compare to the hw ones
 		   - update the hw view */
@@ -2153,12 +2147,20 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
 		if (vcpi) {
 			port = container_of(vcpi, struct drm_dp_mst_port,
 					    vcpi);
-			port = drm_dp_mst_topology_get_port_validated(mgr,
-								      port);
-			if (!port) {
-				mutex_unlock(&mgr->payload_lock);
-				return -EINVAL;
+
+			/* Validated ports don't matter if we're releasing
+			 * VCPI
+			 */
+			if (vcpi->num_slots) {
+				port = drm_dp_mst_topology_get_port_validated(
+				    mgr, port);
+				if (!port) {
+					mutex_unlock(&mgr->payload_lock);
+					return -EINVAL;
+				}
+				put_port = true;
 			}
+
 			req_payload.num_slots = vcpi->num_slots;
 			req_payload.vcpi = vcpi->vcpi;
 		} else {
@@ -2190,7 +2192,7 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
 		}
 		cur_slots += req_payload.num_slots;
 
-		if (port)
+		if (put_port)
 			drm_dp_mst_topology_put_port(port);
 	}
 
@@ -3005,6 +3007,8 @@ bool drm_dp_mst_allocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
 	DRM_DEBUG_KMS("initing vcpi for pbn=%d slots=%d\n",
 		      pbn, port->vcpi.num_slots);
 
+	/* Keep port allocated until it's payload has been removed */
+	drm_dp_mst_get_port_malloc(port);
 	drm_dp_mst_topology_put_port(port);
 	return true;
 out:
@@ -3034,11 +3038,12 @@ EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
  */
 void drm_dp_mst_reset_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
 {
-	port = drm_dp_mst_topology_get_port_validated(mgr, port);
-	if (!port)
-		return;
+	/*
+	 * A port with VCPI will remain allocated until it's VCPI is
+	 * released, no verified ref needed
+	 */
+
 	port->vcpi.num_slots = 0;
-	drm_dp_mst_topology_put_port(port);
 }
 EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
 
@@ -3050,16 +3055,17 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
 void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
 				struct drm_dp_mst_port *port)
 {
-	port = drm_dp_mst_topology_get_port_validated(mgr, port);
-	if (!port)
-		return;
+	/*
+	 * A port with VCPI will remain allocated until it's VCPI is
+	 * released, no verified ref needed
+	 */
 
 	drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
 	port->vcpi.num_slots = 0;
 	port->vcpi.pbn = 0;
 	port->vcpi.aligned_pbn = 0;
 	port->vcpi.vcpi = 0;
-	drm_dp_mst_topology_put_port(port);
+	drm_dp_mst_put_port_malloc(port);
 }
 EXPORT_SYMBOL(drm_dp_mst_deallocate_vcpi);
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 06/15] drm/i915: Keep malloc references to MST ports
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (4 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 05/15] drm/dp_mst: Fix payload deallocation on hotplugs using malloc refs Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  9:32   ` Daniel Vetter
  2018-12-14  1:25 ` [WIP PATCH 07/15] drm/nouveau: Remove bogus cleanup in nv50_mstm_add_connector() Lyude Paul
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, David Airlie,
	linux-kernel

So that the ports stay around until we've destroyed the connectors, in
order to ensure that we don't pass an invalid pointer to any MST helpers
once we introduce the new MST VCPI helpers.

Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/i915/intel_connector.c | 4 ++++
 drivers/gpu/drm/i915/intel_dp_mst.c    | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_connector.c b/drivers/gpu/drm/i915/intel_connector.c
index 18e370f607bc..37d2c644f4b8 100644
--- a/drivers/gpu/drm/i915/intel_connector.c
+++ b/drivers/gpu/drm/i915/intel_connector.c
@@ -95,6 +95,10 @@ void intel_connector_destroy(struct drm_connector *connector)
 	intel_panel_fini(&intel_connector->panel);
 
 	drm_connector_cleanup(connector);
+
+	if (intel_connector->port)
+		drm_dp_mst_put_port_malloc(intel_connector->port);
+
 	kfree(connector);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
index f05427b74e34..4d6ced34d465 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -484,6 +484,8 @@ static struct drm_connector *intel_dp_add_mst_connector(struct drm_dp_mst_topolo
 	if (ret)
 		goto err;
 
+	drm_dp_mst_get_port_malloc(port);
+
 	return connector;
 
 err:
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 07/15] drm/nouveau: Remove bogus cleanup in nv50_mstm_add_connector()
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (5 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 06/15] drm/i915: Keep malloc references to MST ports Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 08/15] drm/nouveau: Remove unnecessary VCPI checks in nv50_msto_cleanup() Lyude Paul
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Ben Skeggs, David Airlie, Ilia Mirkin, Sean Paul, linux-kernel

Trying to destroy the connector using mstc->connector.funcs->destroy()
if connector initialization fails is wrong: there is no possible
codepath in nv50_mstc_new where nv50_mstm_add_connector() would return
<0 and mstc would be non-NULL.

Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 26af45785939..641252208e67 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -1099,11 +1099,8 @@ nv50_mstm_add_connector(struct drm_dp_mst_topology_mgr *mgr,
 	int ret;
 
 	ret = nv50_mstc_new(mstm, port, path, &mstc);
-	if (ret) {
-		if (mstc)
-			mstc->connector.funcs->destroy(&mstc->connector);
+	if (ret)
 		return NULL;
-	}
 
 	return &mstc->connector;
 }
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 08/15] drm/nouveau: Remove unnecessary VCPI checks in nv50_msto_cleanup()
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (6 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 07/15] drm/nouveau: Remove bogus cleanup in nv50_mstm_add_connector() Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 09/15] drm/nouveau: Fix potential use-after-frees for MSTCs Lyude Paul
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Ben Skeggs, David Airlie, Sean Paul, Karol Herbst, Ilia Mirkin,
	linux-kernel

There is no need to look at the port's VCPI allocation before calling
drm_dp_mst_deallocate_vcpi(), as we already have msto->disabled to let
us avoid cleaning up an msto more then once. The DP MST core will never
call drm_dp_mst_deallocate_vcpi() on it's own, which is presumably what
these checks are meant to protect against.

More importantly though, we're about to stop clearing mstc->port in the
next commit, which means if we could potentially hit a use-after-free
error if we tried to check mstc->port->vcpi here. So to make life easier
for anyone who bisects this code in the future, use msto->disabled
instead to check whether or not we need to deallocate VCPI instead.

Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 641252208e67..0f7d72518604 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -704,14 +704,17 @@ nv50_msto_cleanup(struct nv50_msto *msto)
 	struct nv50_mstc *mstc = msto->mstc;
 	struct nv50_mstm *mstm = mstc->mstm;
 
+	if (!msto->disabled)
+		return;
+
 	NV_ATOMIC(drm, "%s: msto cleanup\n", msto->encoder.name);
-	if (mstc->port && mstc->port->vcpi.vcpi > 0 && !nv50_msto_payload(msto))
+
+	if (mstc->port)
 		drm_dp_mst_deallocate_vcpi(&mstm->mgr, mstc->port);
-	if (msto->disabled) {
-		msto->mstc = NULL;
-		msto->head = NULL;
-		msto->disabled = false;
-	}
+
+	msto->mstc = NULL;
+	msto->head = NULL;
+	msto->disabled = false;
 }
 
 static void
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 09/15] drm/nouveau: Fix potential use-after-frees for MSTCs
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (7 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 08/15] drm/nouveau: Remove unnecessary VCPI checks in nv50_msto_cleanup() Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 10/15] drm/nouveau: Stop unsetting mstc->port, use malloc refs Lyude Paul
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Ben Skeggs, David Airlie, Sean Paul, Karol Herbst, Ilia Mirkin,
	linux-kernel

Now that we finally have a sane way to keep port allocations around, use
it to fix the potential unchecked ->port accesses that nouveau makes by
making sure we keep the mst port allocated for as long as it's
drm_connector is accessible.

Additionally, now that we've guaranteed that mstc->port is allocated for
as long as we keep mstc around we can remove the connector registration
checks for codepaths which release payloads, allowing us to release
payloads on active topologies properly. These registration checks were
only required before in order to avoid situations where mstc->port could
technically be pointing at freed memory.

Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 0f7d72518604..982054bbcc8b 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -964,7 +964,11 @@ static void
 nv50_mstc_destroy(struct drm_connector *connector)
 {
 	struct nv50_mstc *mstc = nv50_mstc(connector);
+
 	drm_connector_cleanup(&mstc->connector);
+	if (mstc->port)
+		drm_dp_mst_put_port_malloc(mstc->port);
+
 	kfree(mstc);
 }
 
@@ -1012,6 +1016,7 @@ nv50_mstc_new(struct nv50_mstm *mstm, struct drm_dp_mst_port *port,
 	drm_object_attach_property(&mstc->connector.base, dev->mode_config.path_property, 0);
 	drm_object_attach_property(&mstc->connector.base, dev->mode_config.tile_property, 0);
 	drm_connector_set_path_property(&mstc->connector, path);
+	drm_dp_mst_get_port_malloc(port);
 	return 0;
 }
 
@@ -1077,6 +1082,7 @@ nv50_mstm_destroy_connector(struct drm_dp_mst_topology_mgr *mgr,
 	drm_fb_helper_remove_one_connector(&drm->fbcon->helper, &mstc->connector);
 
 	drm_modeset_lock(&drm->dev->mode_config.connection_mutex, NULL);
+	drm_dp_mst_put_port_malloc(mstc->port);
 	mstc->port = NULL;
 	drm_modeset_unlock(&drm->dev->mode_config.connection_mutex);
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 10/15] drm/nouveau: Stop unsetting mstc->port, use malloc refs
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (8 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 09/15] drm/nouveau: Fix potential use-after-frees for MSTCs Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 11/15] drm/nouveau: Grab payload lock in nv50_msto_payload() Lyude Paul
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Ben Skeggs, David Airlie, Ilia Mirkin, Sean Paul, linux-kernel

Same as we did for i915, but for nouveau this time. Additionally, we
grab a malloc reference to the port that lasts for the entire lifetime
of nv50_mstc, which gives us the guarantee that mstc->port will always
point to valid memory for as long as the mstc stays around.

Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 982054bbcc8b..157d208d37b5 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -709,8 +709,7 @@ nv50_msto_cleanup(struct nv50_msto *msto)
 
 	NV_ATOMIC(drm, "%s: msto cleanup\n", msto->encoder.name);
 
-	if (mstc->port)
-		drm_dp_mst_deallocate_vcpi(&mstm->mgr, mstc->port);
+	drm_dp_mst_deallocate_vcpi(&mstm->mgr, mstc->port);
 
 	msto->mstc = NULL;
 	msto->head = NULL;
@@ -735,7 +734,7 @@ nv50_msto_prepare(struct nv50_msto *msto)
 	};
 
 	NV_ATOMIC(drm, "%s: msto prepare\n", msto->encoder.name);
-	if (mstc->port && mstc->port->vcpi.vcpi > 0) {
+	if (mstc->port->vcpi.vcpi > 0) {
 		struct drm_dp_payload *payload = nv50_msto_payload(msto);
 		if (payload) {
 			args.vcpi.start_slot = payload->start_slot;
@@ -832,8 +831,7 @@ nv50_msto_disable(struct drm_encoder *encoder)
 	struct nv50_mstc *mstc = msto->mstc;
 	struct nv50_mstm *mstm = mstc->mstm;
 
-	if (mstc->port)
-		drm_dp_mst_reset_vcpi_slots(&mstm->mgr, mstc->port);
+	drm_dp_mst_reset_vcpi_slots(&mstm->mgr, mstc->port);
 
 	mstm->outp->update(mstm->outp, msto->head->base.index, NULL, 0, 0);
 	mstm->modified = true;
@@ -945,7 +943,7 @@ nv50_mstc_detect(struct drm_connector *connector, bool force)
 	enum drm_connector_status conn_status;
 	int ret;
 
-	if (!mstc->port)
+	if (drm_connector_is_unregistered(connector))
 		return connector_status_disconnected;
 
 	ret = pm_runtime_get_sync(connector->dev->dev);
@@ -966,8 +964,7 @@ nv50_mstc_destroy(struct drm_connector *connector)
 	struct nv50_mstc *mstc = nv50_mstc(connector);
 
 	drm_connector_cleanup(&mstc->connector);
-	if (mstc->port)
-		drm_dp_mst_put_port_malloc(mstc->port);
+	drm_dp_mst_put_port_malloc(mstc->port);
 
 	kfree(mstc);
 }
@@ -1081,11 +1078,6 @@ nv50_mstm_destroy_connector(struct drm_dp_mst_topology_mgr *mgr,
 
 	drm_fb_helper_remove_one_connector(&drm->fbcon->helper, &mstc->connector);
 
-	drm_modeset_lock(&drm->dev->mode_config.connection_mutex, NULL);
-	drm_dp_mst_put_port_malloc(mstc->port);
-	mstc->port = NULL;
-	drm_modeset_unlock(&drm->dev->mode_config.connection_mutex);
-
 	drm_connector_put(&mstc->connector);
 }
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 11/15] drm/nouveau: Grab payload lock in nv50_msto_payload()
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (9 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 10/15] drm/nouveau: Stop unsetting mstc->port, use malloc refs Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 12/15] drm/dp_mst: Add some atomic state iterator macros Lyude Paul
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Ben Skeggs, David Airlie, Ilia Mirkin, Karol Herbst,
	linux-kernel

Going through the currently programmed payloads isn't safe without
holding mgr->payload_lock, so actually do that and warn if anyone tries
calling nv50_msto_payload() in the future without grabbing the right
locks.

Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 157d208d37b5..67f7bf97e5d9 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -680,6 +680,8 @@ nv50_msto_payload(struct nv50_msto *msto)
 	struct nv50_mstm *mstm = mstc->mstm;
 	int vcpi = mstc->port->vcpi.vcpi, i;
 
+	WARN_ON(!mutex_is_locked(&mstm->mgr.payload_lock));
+
 	NV_ATOMIC(drm, "%s: vcpi %d\n", msto->encoder.name, vcpi);
 	for (i = 0; i < mstm->mgr.max_payloads; i++) {
 		struct drm_dp_payload *payload = &mstm->mgr.payloads[i];
@@ -733,6 +735,8 @@ nv50_msto_prepare(struct nv50_msto *msto)
 			       (0x0100 << msto->head->base.index),
 	};
 
+	mutex_lock(&mstm->mgr.payload_lock);
+
 	NV_ATOMIC(drm, "%s: msto prepare\n", msto->encoder.name);
 	if (mstc->port->vcpi.vcpi > 0) {
 		struct drm_dp_payload *payload = nv50_msto_payload(msto);
@@ -748,7 +752,9 @@ nv50_msto_prepare(struct nv50_msto *msto)
 		  msto->encoder.name, msto->head->base.base.name,
 		  args.vcpi.start_slot, args.vcpi.num_slots,
 		  args.vcpi.pbn, args.vcpi.aligned_pbn);
+
 	nvif_mthd(&drm->display->disp.object, 0, &args, sizeof(args));
+	mutex_unlock(&mstm->mgr.payload_lock);
 }
 
 static int
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 12/15] drm/dp_mst: Add some atomic state iterator macros
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (10 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 11/15] drm/nouveau: Grab payload lock in nv50_msto_payload() Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 13/15] drm/dp_mst: Start tracking per-port VCPI allocations Lyude Paul
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	linux-kernel

Changes since v6:
 - Move EXPORT_SYMBOL() for drm_dp_mst_topology_state_funcs to this
   commit
 - Document __drm_dp_mst_state_iter_get() and note that it shouldn't be
   called directly

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/drm_dp_mst_topology.c |  5 +-
 include/drm/drm_dp_mst_helper.h       | 96 +++++++++++++++++++++++++++
 2 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 93f08bfd2ab3..8d94c8943ac7 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -3415,10 +3415,11 @@ static void drm_dp_mst_destroy_state(struct drm_private_obj *obj,
 	kfree(mst_state);
 }
 
-static const struct drm_private_state_funcs mst_state_funcs = {
+const struct drm_private_state_funcs drm_dp_mst_topology_state_funcs = {
 	.atomic_duplicate_state = drm_dp_mst_duplicate_state,
 	.atomic_destroy_state = drm_dp_mst_destroy_state,
 };
+EXPORT_SYMBOL(drm_dp_mst_topology_state_funcs);
 
 /**
  * drm_atomic_get_mst_topology_state: get MST topology state
@@ -3502,7 +3503,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr,
 
 	drm_atomic_private_obj_init(dev, &mgr->base,
 				    &mst_state->base,
-				    &mst_state_funcs);
+				    &drm_dp_mst_topology_state_funcs);
 
 	return 0;
 }
diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
index 50643a39765d..b8a8d75410d0 100644
--- a/include/drm/drm_dp_mst_helper.h
+++ b/include/drm/drm_dp_mst_helper.h
@@ -637,4 +637,100 @@ int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
 void drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port);
 void drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port);
 
+extern const struct drm_private_state_funcs drm_dp_mst_topology_state_funcs;
+
+/**
+ * __drm_dp_mst_state_iter_get - private atomic state iterator function for
+ * macro-internal use
+ * @state: &struct drm_atomic_state pointer
+ * @mgr: pointer to the &struct drm_dp_mst_topology_mgr iteration cursor
+ * @old_state: optional pointer to the old &struct drm_dp_mst_topology_state
+ * iteration cursor
+ * @new_state: optional pointer to the new &struct drm_dp_mst_topology_state
+ * iteration cursor
+ * @i: int iteration cursor, for macro-internal use
+ *
+ * Used by for_each_oldnew_mst_mgr_in_state(),
+ * for_each_old_mst_mgr_in_state(), and for_each_new_mst_mgr_in_state(). Don't
+ * call this directly.
+ *
+ * Returns:
+ * True if the current &struct drm_private_obj is a &struct
+ * drm_dp_mst_topology_mgr, false otherwise.
+ */
+static inline bool
+__drm_dp_mst_state_iter_get(struct drm_atomic_state *state,
+			    struct drm_dp_mst_topology_mgr **mgr,
+			    struct drm_dp_mst_topology_state **old_state,
+			    struct drm_dp_mst_topology_state **new_state,
+			    int i)
+{
+	struct __drm_private_objs_state *objs_state = &state->private_objs[i];
+
+	if (objs_state->ptr->funcs != &drm_dp_mst_topology_state_funcs)
+		return false;
+
+	*mgr = to_dp_mst_topology_mgr(objs_state->ptr);
+	if (old_state)
+		*old_state = to_dp_mst_topology_state(objs_state->old_state);
+	if (new_state)
+		*new_state = to_dp_mst_topology_state(objs_state->new_state);
+
+	return true;
+}
+
+/**
+ * for_each_oldnew_mst_mgr_in_state - iterate over all DP MST topology
+ * managers in an atomic update
+ * @__state: &struct drm_atomic_state pointer
+ * @mgr: &struct drm_dp_mst_topology_mgr iteration cursor
+ * @old_state: &struct drm_dp_mst_topology_state iteration cursor for the old
+ * state
+ * @new_state: &struct drm_dp_mst_topology_state iteration cursor for the new
+ * state
+ * @__i: int iteration cursor, for macro-internal use
+ *
+ * This iterates over all DRM DP MST topology managers in an atomic update,
+ * tracking both old and new state. This is useful in places where the state
+ * delta needs to be considered, for example in atomic check functions.
+ */
+#define for_each_oldnew_mst_mgr_in_state(__state, mgr, old_state, new_state, __i) \
+	for ((__i) = 0; (__i) < (__state)->num_private_objs; (__i)++) \
+		for_each_if(__drm_dp_mst_state_iter_get((__state), &(mgr), &(old_state), &(new_state), (__i)))
+
+/**
+ * for_each_old_mst_mgr_in_state - iterate over all DP MST topology managers
+ * in an atomic update
+ * @__state: &struct drm_atomic_state pointer
+ * @mgr: &struct drm_dp_mst_topology_mgr iteration cursor
+ * @old_state: &struct drm_dp_mst_topology_state iteration cursor for the old
+ * state
+ * @__i: int iteration cursor, for macro-internal use
+ *
+ * This iterates over all DRM DP MST topology managers in an atomic update,
+ * tracking only the old state. This is useful in disable functions, where we
+ * need the old state the hardware is still in.
+ */
+#define for_each_old_mst_mgr_in_state(__state, mgr, old_state, __i) \
+	for ((__i) = 0; (__i) < (__state)->num_private_objs; (__i)++) \
+		for_each_if(__drm_dp_mst_state_iter_get((__state), &(mgr), &(old_state), NULL, (__i)))
+
+/**
+ * for_each_new_mst_mgr_in_state - iterate over all DP MST topology managers
+ * in an atomic update
+ * @__state: &struct drm_atomic_state pointer
+ * @mgr: &struct drm_dp_mst_topology_mgr iteration cursor
+ * @new_state: &struct drm_dp_mst_topology_state iteration cursor for the new
+ * state
+ * @__i: int iteration cursor, for macro-internal use
+ *
+ * This iterates over all DRM DP MST topology managers in an atomic update,
+ * tracking only the new state. This is useful in enable functions, where we
+ * need the new state the hardware should be in when the atomic commit
+ * operation has completed.
+ */
+#define for_each_new_mst_mgr_in_state(__state, mgr, new_state, __i) \
+	for ((__i) = 0; (__i) < (__state)->num_private_objs; (__i)++) \
+		for_each_if(__drm_dp_mst_state_iter_get((__state), &(mgr), NULL, &(new_state), (__i)))
+
 #endif
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 13/15] drm/dp_mst: Start tracking per-port VCPI allocations
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (11 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 12/15] drm/dp_mst: Add some atomic state iterator macros Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14 16:37   ` Daniel Vetter
  2018-12-14  1:25 ` [WIP PATCH 14/15] drm/dp_mst: Check payload count in drm_dp_mst_atomic_check() Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 15/15] drm/nouveau: Use atomic VCPI helpers for MST Lyude Paul
  14 siblings, 1 reply; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard, Sean Paul,
	David Airlie, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	linux-kernel

There has been a TODO waiting for quite a long time in
drm_dp_mst_topology.c:

	/* We cannot rely on port->vcpi.num_slots to update
	 * topology_state->avail_slots as the port may not exist if the parent
	 * branch device was unplugged. This should be fixed by tracking
	 * per-port slot allocation in drm_dp_mst_topology_state instead of
	 * depending on the caller to tell us how many slots to release.
	 */

That's not the only reason we should fix this: forcing the driver to
track the VCPI allocations throughout a state's atomic check is
error prone, because it means that extra care has to be taken with the
order that drm_dp_atomic_find_vcpi_slots() and
drm_dp_atomic_release_vcpi_slots() are called in in order to ensure
idempotency. Currently the only driver actually using these helpers,
i915, doesn't even do this correctly: multiple ->best_encoder() checks
with i915's current implementation would not be idempotent and would
over-allocate VCPI slots, something I learned trying to implement
fallback retraining in MST.

So: simplify this whole mess, and teach drm_dp_atomic_find_vcpi_slots()
and drm_dp_atomic_release_vcpi_slots() to track the VCPI allocations for
each port. This allows us to ensure idempotency without having to rely
on the driver as much. Additionally: the driver doesn't need to do any
kind of VCPI slot tracking anymore if it doesn't need it for it's own
internal state.

Additionally; this adds a new drm_dp_mst_atomic_check() helper which
must be used by atomic drivers to perform validity checks for the new
VCPI allocations incurred by a state.

Also: update the documentation and make it more obvious that these
/must/ be called by /all/ atomic drivers supporting MST.

Changes since v6:
 - Keep a kref to all of the ports we have allocations on. This required
   a good bit of changing to when we call drm_dp_find_vcpi_slots(),
   mainly that we need to ensure that we only redo VCPI allocations on
   actual mode or CRTC changes, not crtc_state->active changes.
   Additionally, we no longer take the registration of the DRM connector
   for each port into account because so long as we have a kref to the
   port in the new or previous atomic state, the connector will stay
   registered.
 - Use the small changes to drm_dp_put_port() to add even more error
   checking to make misusage of the helpers more obvious. I added this
   after having to chase down various use-after-free conditions that
   started popping up from the new helpers so no one else has to
   troubleshoot that.
 - Move some accidental DRM_DEBUG_KMS() calls to DRM_DEBUG_ATOMIC()
 - Update documentation again, note that find/release() should both not be
   called on the same port in a single atomic check phase (but multiple
   calls to one or the other is OK)

Changes since v4:
 - Don't skip the atomic checks for VCPI allocations if no new VCPI
   allocations happen in a state. This makes the next change I'm about
   to list here a lot easier to implement.
 - Don't ignore VCPI allocations on destroyed ports, instead ensure that
   when ports are destroyed and still have VCPI allocations in the
   topology state, the only state changes allowed are releasing said
   ports' VCPI. This prevents a state with a mix of VCPI allocations
   from destroyed ports, and allocations from valid ports.

Changes since v3:
 - Don't release VCPI allocations in the topology state immediately in
   drm_dp_atomic_release_vcpi_slots(), instead mark them as 0 and skip
   over them in drm_dp_mst_duplicate_state(). This makes it so
   drm_dp_atomic_release_vcpi_slots() is still idempotent while also
   throwing warnings if the driver messes up it's book keeping and tries
   to release VCPI slots on a port that doesn't have any pre-existing
   VCPI allocation - danvet
 - Change mst_state/state in some debugging messages to "mst state"

Changes since v2:
 - Use kmemdup() for duplicating MST state - danvet
 - Move port validation out of duplicate state callback - danvet
 - Handle looping through MST topology states in
   drm_dp_mst_atomic_check() so the driver doesn't have to do it
 - Fix documentation in drm_dp_atomic_find_vcpi_slots()
 - Move the atomic check for each individual topology state into it's
   own function, reduces indenting
 - Don't consider "stale" MST ports when calculating the bandwidth
   requirements. This is needed because originally we relied on the
   state duplication functions to prune any stale ports from the new
   state, which would prevent us from incorrectly considering their
   bandwidth requirements alongside legitimate new payloads.
 - Add function references in drm_dp_atomic_release_vcpi_slots() - danvet
 - Annotate atomic VCPI and atomic check functions with __must_check
   - danvet

Changes since v1:
 - Don't use the now-removed ->atomic_check() for private objects hook,
   just give drivers a function to call themselves

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 271 +++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_display.c  |   4 +
 drivers/gpu/drm/i915/intel_dp_mst.c   |  64 +++---
 include/drm/drm_dp_mst_helper.h       |  24 ++-
 4 files changed, 298 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 8d94c8943ac7..b9374c981a5b 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -2897,21 +2897,42 @@ static int drm_dp_init_vcpi(struct drm_dp_mst_topology_mgr *mgr,
 }
 
 /**
- * drm_dp_atomic_find_vcpi_slots() - Find and add vcpi slots to the state
+ * drm_dp_atomic_find_vcpi_slots() - Find and add VCPI slots to the state
  * @state: global atomic state
  * @mgr: MST topology manager for the port
  * @port: port to find vcpi slots for
  * @pbn: bandwidth required for the mode in PBN
  *
- * RETURNS:
- * Total slots in the atomic state assigned for this port or error
+ * Allocates VCPI slots to @port, replacing any previous VCPI allocations it
+ * may have had. Any atomic drivers which support MST must call this function
+ * in their &drm_encoder_helper_funcs.atomic_check() callback to change the
+ * current VCPI allocation for the new state, but only when
+ * &drm_crtc_state.mode_changed or &drm_crtc_state.connectors_changed is set
+ * to ensure compatibility with userspace applications that still use the
+ * legacy modesetting UAPI.
+ *
+ * Allocations set by this function are not checked against the bandwidth
+ * restraints of @mgr until the driver calls drm_dp_mst_atomic_check().
+ *
+ * Additionally, it is OK to call this function multiple times on the same
+ * @port as needed. It is not OK however, to call this function and
+ * drm_dp_atomic_release_vcpi_slots() in the same atomic check phase.
+ *
+ * See also:
+ * drm_dp_atomic_release_vcpi_slots()
+ * drm_dp_mst_atomic_check()
+ *
+ * Returns:
+ * Total slots in the atomic state assigned for this port, or a negative error
+ * code if the port no longer exists
  */
 int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
 				  struct drm_dp_mst_topology_mgr *mgr,
 				  struct drm_dp_mst_port *port, int pbn)
 {
 	struct drm_dp_mst_topology_state *topology_state;
-	int req_slots;
+	struct drm_dp_vcpi_allocation *pos, *vcpi = NULL;
+	int prev_slots, req_slots, ret;
 
 	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
 	if (IS_ERR(topology_state))
@@ -2920,20 +2941,56 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
 	port = drm_dp_mst_topology_get_port_validated(mgr, port);
 	if (port == NULL)
 		return -EINVAL;
-	req_slots = DIV_ROUND_UP(pbn, mgr->pbn_div);
-	DRM_DEBUG_KMS("vcpi slots req=%d, avail=%d\n",
-			req_slots, topology_state->avail_slots);
 
-	if (req_slots > topology_state->avail_slots) {
-		drm_dp_mst_topology_put_port(port);
-		return -ENOSPC;
+	topology_state->vcpi_allocated = true;
+
+	/* Find the current allocation for this port, if any */
+	list_for_each_entry(pos, &topology_state->vcpis, next) {
+		if (pos->port == port) {
+			vcpi = pos;
+			prev_slots = vcpi->vcpi;
+
+			/*
+			 * This should never happen, unless the driver tries
+			 * releasing and allocating the same VCPI allocation,
+			 * which is an error
+			 */
+			if (WARN_ON(!prev_slots)) {
+				DRM_ERROR("cannot allocate and release VCPI on [MST PORT:%p] in the same state\n",
+					  port);
+				return -EINVAL;
+			}
+
+			break;
+		}
 	}
+	if (!vcpi)
+		prev_slots = 0;
+
+	req_slots = DIV_ROUND_UP(pbn, mgr->pbn_div);
+
+	DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] [MST PORT:%p] VCPI %d -> %d\n",
+			 port->connector->base.id, port->connector->name,
+			 port, prev_slots, req_slots);
 
-	topology_state->avail_slots -= req_slots;
-	DRM_DEBUG_KMS("vcpi slots avail=%d", topology_state->avail_slots);
+	/* Add the new allocation to the state */
+	if (!vcpi) {
+		vcpi = kzalloc(sizeof(*vcpi), GFP_KERNEL);
+		if (!vcpi) {
+			ret = -ENOMEM;
+			goto out;
+		}
 
+		drm_dp_mst_get_port_malloc(port);
+		vcpi->port = port;
+		list_add(&vcpi->next, &topology_state->vcpis);
+	}
+	vcpi->vcpi = req_slots;
+
+	ret = req_slots;
+out:
 	drm_dp_mst_topology_put_port(port);
-	return req_slots;
+	return ret;
 }
 EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
 
@@ -2941,31 +2998,57 @@ EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
  * drm_dp_atomic_release_vcpi_slots() - Release allocated vcpi slots
  * @state: global atomic state
  * @mgr: MST topology manager for the port
- * @slots: number of vcpi slots to release
+ * @port: The port to release the VCPI slots from
  *
- * RETURNS:
- * 0 if @slots were added back to &drm_dp_mst_topology_state->avail_slots or
- * negative error code
+ * Releases any VCPI slots that have been allocated to a port in the atomic
+ * state. Any atomic drivers which support MST must call this function in
+ * their &drm_connector_helper_funcs.atomic_check() callback when the
+ * connector will no longer have VCPI allocated (e.g. because it's CRTC was
+ * removed) when it had VCPI allocated in the previous atomic state.
+ *
+ * It is OK to call this even if @port has been removed from the system.
+ * Additionally, it is OK to call this function multiple times on the same
+ * @port as needed. It is not OK however, to call this function and
+ * drm_dp_atomic_find_vcpi_slots() on the same @port in a single atomic check
+ * phase.
+ *
+ * See also:
+ * drm_dp_atomic_find_vcpi_slots()
+ * drm_dp_mst_atomic_check()
+ *
+ * Returns:
+ * 0 if all slots for this port were added back to
+ * &drm_dp_mst_topology_state.avail_slots or negative error code
  */
 int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
 				     struct drm_dp_mst_topology_mgr *mgr,
-				     int slots)
+				     struct drm_dp_mst_port *port)
 {
 	struct drm_dp_mst_topology_state *topology_state;
+	struct drm_dp_vcpi_allocation *pos;
+	bool found = false;
 
 	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
 	if (IS_ERR(topology_state))
 		return PTR_ERR(topology_state);
 
-	/* We cannot rely on port->vcpi.num_slots to update
-	 * topology_state->avail_slots as the port may not exist if the parent
-	 * branch device was unplugged. This should be fixed by tracking
-	 * per-port slot allocation in drm_dp_mst_topology_state instead of
-	 * depending on the caller to tell us how many slots to release.
-	 */
-	topology_state->avail_slots += slots;
-	DRM_DEBUG_KMS("vcpi slots released=%d, avail=%d\n",
-			slots, topology_state->avail_slots);
+	list_for_each_entry(pos, &topology_state->vcpis, next) {
+		if (pos->port == port) {
+			found = true;
+			break;
+		}
+	}
+	if (WARN_ON(!found)) {
+		DRM_ERROR("no VCPI for [MST PORT:%p] found in mst state %p\n",
+			  port, &topology_state->base);
+		return -EINVAL;
+	}
+
+	DRM_DEBUG_ATOMIC("[MST PORT:%p] VCPI %d -> 0\n", port, pos->vcpi);
+	if (pos->vcpi) {
+		drm_dp_mst_put_port_malloc(port);
+		pos->vcpi = 0;
+	}
 
 	return 0;
 }
@@ -3395,15 +3478,42 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
 static struct drm_private_state *
 drm_dp_mst_duplicate_state(struct drm_private_obj *obj)
 {
-	struct drm_dp_mst_topology_state *state;
+	struct drm_dp_mst_topology_state *state, *old_state =
+		to_dp_mst_topology_state(obj->state);
+	struct drm_dp_vcpi_allocation *pos, *vcpi;
 
-	state = kmemdup(obj->state, sizeof(*state), GFP_KERNEL);
+	state = kmemdup(old_state, sizeof(*state), GFP_KERNEL);
 	if (!state)
 		return NULL;
 
 	__drm_atomic_helper_private_obj_duplicate_state(obj, &state->base);
 
+	INIT_LIST_HEAD(&state->vcpis);
+	state->vcpi_allocated = false;
+
+	list_for_each_entry(pos, &old_state->vcpis, next) {
+		/* Prune leftover freed VCPI allocations */
+		if (!pos->vcpi)
+			continue;
+
+		vcpi = kmemdup(pos, sizeof(*vcpi), GFP_KERNEL);
+		if (!vcpi)
+			goto fail;
+
+		drm_dp_mst_get_port_malloc(vcpi->port);
+		list_add(&vcpi->next, &state->vcpis);
+	}
+
 	return &state->base;
+
+fail:
+	list_for_each_entry_safe(pos, vcpi, &state->vcpis, next) {
+		drm_dp_mst_put_port_malloc(pos->port);
+		kfree(pos);
+	}
+	kfree(state);
+
+	return NULL;
 }
 
 static void drm_dp_mst_destroy_state(struct drm_private_obj *obj,
@@ -3411,10 +3521,109 @@ static void drm_dp_mst_destroy_state(struct drm_private_obj *obj,
 {
 	struct drm_dp_mst_topology_state *mst_state =
 		to_dp_mst_topology_state(state);
+	struct drm_dp_vcpi_allocation *pos, *tmp;
+
+	list_for_each_entry_safe(pos, tmp, &mst_state->vcpis, next) {
+		/* We only keep references to ports with non-zero VCPIs */
+		if (pos->vcpi)
+			drm_dp_mst_put_port_malloc(pos->port);
+		kfree(pos);
+	}
 
 	kfree(mst_state);
 }
 
+static inline int
+drm_dp_mst_atomic_check_topology_state(struct drm_dp_mst_topology_mgr *mgr,
+				       struct drm_dp_mst_topology_state *mst_state)
+{
+	struct drm_dp_vcpi_allocation *vcpi;
+	int avail_slots = 63, ret;
+
+	/* There's no possible scenario where releasing VCPI or keeping it the
+	 * same would make the state invalid
+	 */
+	if (!mst_state->vcpi_allocated) {
+		DRM_DEBUG_ATOMIC("[MST MGR:%p] state %p allocates no VCPI, check skipped\n",
+				 mgr, &mst_state->base);
+		return 0;
+	}
+
+	list_for_each_entry(vcpi, &mst_state->vcpis, next) {
+		/* Releasing VCPI is always OK-even if the port is gone */
+		if (!vcpi->vcpi) {
+			DRM_DEBUG_ATOMIC("[MST PORT:%p] releases all VCPI slots\n",
+					 vcpi->port);
+			continue;
+		}
+
+		if (!drm_dp_mst_topology_get_port(vcpi->port)) {
+			DRM_DEBUG_ATOMIC("[MST PORT:%p] is gone but still has VCPI, cannot add new VCPI\n",
+					 vcpi->port);
+			return -EINVAL;
+		}
+
+		DRM_DEBUG_ATOMIC("[MST PORT:%p] requires %d vcpi slots\n",
+				 vcpi->port, vcpi->vcpi);
+
+		avail_slots -= vcpi->vcpi;
+		if (avail_slots < 0) {
+			DRM_DEBUG_ATOMIC("[MST PORT:%p] not enough VCPI slots in mst state %p (avail=%d)\n",
+					 vcpi->port, mst_state,
+					 avail_slots + vcpi->vcpi);
+			ret = -ENOSPC;
+			goto port_fail;
+		}
+
+		drm_dp_mst_topology_put_port(vcpi->port);
+	}
+	DRM_DEBUG_ATOMIC("[MST MGR:%p] mst state %p VCPI avail=%d used=%d\n",
+			 mgr, mst_state, avail_slots,
+			 63 - avail_slots);
+
+	return 0;
+port_fail:
+	drm_dp_mst_topology_put_port(vcpi->port);
+	return ret;
+}
+
+/**
+ * drm_dp_mst_atomic_check - Check that the new state of an MST topology in an
+ * atomic update is valid
+ * @state: Pointer to the new &struct drm_dp_mst_topology_state
+ *
+ * Checks the given topology state for an atomic update to ensure that it's
+ * valid. This includes checking whether there's enough bandwidth to support
+ * the new VCPI allocations in the atomic update.
+ *
+ * Any atomic drivers supporting DP MST must make sure to call this after
+ * checking the rest of their state in their
+ * &drm_mode_config_funcs.atomic_check() callback.
+ *
+ * See also:
+ * drm_dp_atomic_find_vcpi_slots()
+ * drm_dp_atomic_release_vcpi_slots()
+ *
+ * Returns:
+ *
+ * 0 if the new state is valid, negative error code otherwise.
+ */
+int drm_dp_mst_atomic_check(struct drm_atomic_state *state)
+{
+	struct drm_dp_mst_topology_mgr *mgr;
+	struct drm_dp_mst_topology_state *mst_state;
+	int i, ret = 0;
+
+	for_each_new_mst_mgr_in_state(state, mgr, mst_state, i) {
+		ret = drm_dp_mst_atomic_check_topology_state(mgr, mst_state);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(drm_dp_mst_atomic_check);
+
 const struct drm_private_state_funcs drm_dp_mst_topology_state_funcs = {
 	.atomic_duplicate_state = drm_dp_mst_duplicate_state,
 	.atomic_destroy_state = drm_dp_mst_destroy_state,
@@ -3497,9 +3706,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr,
 		return -ENOMEM;
 
 	mst_state->mgr = mgr;
-
-	/* max. time slots - one slot for MTP header */
-	mst_state->avail_slots = 63;
+	INIT_LIST_HEAD(&mst_state->vcpis);
 
 	drm_atomic_private_obj_init(dev, &mgr->base,
 				    &mst_state->base,
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 2c3f3f68d506..868cb897f629 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12690,6 +12690,10 @@ static int intel_atomic_check(struct drm_device *dev,
 				       "[modeset]" : "[fastset]");
 	}
 
+	ret = drm_dp_mst_atomic_check(state);
+	if (ret)
+		return ret;
+
 	if (any_ms) {
 		ret = intel_modeset_checks(state);
 
diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
index 4d6ced34d465..6754d7922a34 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -41,8 +41,13 @@ static bool intel_dp_mst_compute_config(struct intel_encoder *encoder,
 	struct drm_connector *connector = conn_state->connector;
 	void *port = to_intel_connector(connector)->port;
 	struct drm_atomic_state *state = pipe_config->base.state;
+	struct drm_crtc *crtc = pipe_config->base.crtc;
+	struct drm_crtc_state *old_crtc_state =
+		drm_atomic_get_old_crtc_state(state, crtc);
+	struct drm_crtc_state *new_crtc_state = &pipe_config->base;
 	int bpp;
-	int lane_count, slots = 0;
+	int lane_count, slots =
+		to_intel_crtc_state(old_crtc_state)->dp_m_n.tu;
 	const struct drm_display_mode *adjusted_mode = &pipe_config->base.adjusted_mode;
 	int mst_pbn;
 	bool constant_n = drm_dp_has_quirk(&intel_dp->desc,
@@ -77,8 +82,12 @@ static bool intel_dp_mst_compute_config(struct intel_encoder *encoder,
 	mst_pbn = drm_dp_calc_pbn_mode(adjusted_mode->crtc_clock, bpp);
 	pipe_config->pbn = mst_pbn;
 
-	/* Zombie connectors can't have VCPI slots */
-	if (!drm_connector_is_unregistered(connector)) {
+	/* Only change VCPI allocation on actual mode changes, to prevent us
+	 * from trying to allocate VCPI to ports that no longer exist when we
+	 * may just be trying to disable DPMS on them
+	 */
+	if (new_crtc_state->mode_changed ||
+	    new_crtc_state->connectors_changed) {
 		slots = drm_dp_atomic_find_vcpi_slots(state,
 						      &intel_dp->mst_mgr,
 						      port,
@@ -107,36 +116,39 @@ static bool intel_dp_mst_compute_config(struct intel_encoder *encoder,
 	return true;
 }
 
-static int intel_dp_mst_atomic_check(struct drm_connector *connector,
-		struct drm_connector_state *new_conn_state)
+static int
+intel_dp_mst_atomic_check(struct drm_connector *connector,
+			  struct drm_connector_state *new_conn_state)
 {
+	struct intel_connector *intel_connector =
+		to_intel_connector(connector);
+	struct drm_dp_mst_topology_mgr *mgr =
+		&intel_connector->mst_port->mst_mgr;
+	struct drm_dp_mst_port *port = intel_connector->port;
 	struct drm_atomic_state *state = new_conn_state->state;
-	struct drm_connector_state *old_conn_state;
-	struct drm_crtc *old_crtc;
-	struct drm_crtc_state *crtc_state;
-	int slots, ret = 0;
+	struct drm_connector_state *old_conn_state =
+		drm_atomic_get_old_connector_state(state, connector);
+	struct drm_crtc *old_crtc = old_conn_state->crtc,
+			*new_crtc = new_conn_state->crtc;
+	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
 
-	old_conn_state = drm_atomic_get_old_connector_state(state, connector);
-	old_crtc = old_conn_state->crtc;
 	if (!old_crtc)
-		return ret;
+		return 0;
 
-	crtc_state = drm_atomic_get_new_crtc_state(state, old_crtc);
-	slots = to_intel_crtc_state(crtc_state)->dp_m_n.tu;
-	if (drm_atomic_crtc_needs_modeset(crtc_state) && slots > 0) {
-		struct drm_dp_mst_topology_mgr *mgr;
-		struct drm_encoder *old_encoder;
+	old_crtc_state = drm_atomic_get_old_crtc_state(state, old_crtc);
+	if (!old_crtc_state ||
+	    !to_intel_crtc_state(old_crtc_state)->dp_m_n.tu)
+		return 0;
 
-		old_encoder = old_conn_state->best_encoder;
-		mgr = &enc_to_mst(old_encoder)->primary->dp.mst_mgr;
+	/* If we switched CRTCs, clear the previous one's allocation */
+	new_crtc_state = drm_atomic_get_new_crtc_state(state, old_crtc);
+	if (new_crtc_state->connectors_changed && !new_crtc_state->enable)
+		to_intel_crtc_state(new_crtc_state)->dp_m_n.tu = 0;
 
-		ret = drm_dp_atomic_release_vcpi_slots(state, mgr, slots);
-		if (ret)
-			DRM_DEBUG_KMS("failed releasing %d vcpi slots:%d\n", slots, ret);
-		else
-			to_intel_crtc_state(crtc_state)->dp_m_n.tu = 0;
-	}
-	return ret;
+	if (new_crtc)
+		return 0;
+
+	return drm_dp_atomic_release_vcpi_slots(state, mgr, port);
 }
 
 static void intel_mst_disable_dp(struct intel_encoder *encoder,
diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
index b8a8d75410d0..79e3f9c730a5 100644
--- a/include/drm/drm_dp_mst_helper.h
+++ b/include/drm/drm_dp_mst_helper.h
@@ -412,9 +412,16 @@ struct drm_dp_payload {
 
 #define to_dp_mst_topology_state(x) container_of(x, struct drm_dp_mst_topology_state, base)
 
+struct drm_dp_vcpi_allocation {
+	struct drm_dp_mst_port *port;
+	int vcpi;
+	struct list_head next;
+};
+
 struct drm_dp_mst_topology_state {
 	struct drm_private_state base;
-	int avail_slots;
+	struct list_head vcpis;
+	u8 vcpi_allocated;
 	struct drm_dp_mst_topology_mgr *mgr;
 };
 
@@ -625,14 +632,17 @@ void drm_dp_mst_topology_mgr_suspend(struct drm_dp_mst_topology_mgr *mgr);
 int drm_dp_mst_topology_mgr_resume(struct drm_dp_mst_topology_mgr *mgr);
 struct drm_dp_mst_topology_state *drm_atomic_get_mst_topology_state(struct drm_atomic_state *state,
 								    struct drm_dp_mst_topology_mgr *mgr);
-int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
-				  struct drm_dp_mst_topology_mgr *mgr,
-				  struct drm_dp_mst_port *port, int pbn);
-int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
-				     struct drm_dp_mst_topology_mgr *mgr,
-				     int slots);
+int __must_check
+drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
+			      struct drm_dp_mst_topology_mgr *mgr,
+			      struct drm_dp_mst_port *port, int pbn);
+int __must_check
+drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
+				 struct drm_dp_mst_topology_mgr *mgr,
+				 struct drm_dp_mst_port *port);
 int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
 				 struct drm_dp_mst_port *port, bool power_up);
+int __must_check drm_dp_mst_atomic_check(struct drm_atomic_state *state);
 
 void drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port);
 void drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 14/15] drm/dp_mst: Check payload count in drm_dp_mst_atomic_check()
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (12 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 13/15] drm/dp_mst: Start tracking per-port VCPI allocations Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  2018-12-14  1:25 ` [WIP PATCH 15/15] drm/nouveau: Use atomic VCPI helpers for MST Lyude Paul
  14 siblings, 0 replies; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	linux-kernel

It occurred to me that we never actually check this! So let's start
doing that.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index b9374c981a5b..ebffb834f5d6 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -3538,7 +3538,7 @@ drm_dp_mst_atomic_check_topology_state(struct drm_dp_mst_topology_mgr *mgr,
 				       struct drm_dp_mst_topology_state *mst_state)
 {
 	struct drm_dp_vcpi_allocation *vcpi;
-	int avail_slots = 63, ret;
+	int avail_slots = 63, payload_count = 0, ret;
 
 	/* There's no possible scenario where releasing VCPI or keeping it the
 	 * same would make the state invalid
@@ -3575,6 +3575,13 @@ drm_dp_mst_atomic_check_topology_state(struct drm_dp_mst_topology_mgr *mgr,
 			goto port_fail;
 		}
 
+		if (++payload_count > mgr->max_payloads) {
+			DRM_DEBUG_ATOMIC("[MST MGR:%p] state %p has too many payloads (max=%d)\n",
+					 mgr, mst_state, mgr->max_payloads);
+			ret = -EINVAL;
+			goto port_fail;
+		}
+
 		drm_dp_mst_topology_put_port(vcpi->port);
 	}
 	DRM_DEBUG_ATOMIC("[MST MGR:%p] mst state %p VCPI avail=%d used=%d\n",
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [WIP PATCH 15/15] drm/nouveau: Use atomic VCPI helpers for MST
  2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
                   ` (13 preceding siblings ...)
  2018-12-14  1:25 ` [WIP PATCH 14/15] drm/dp_mst: Check payload count in drm_dp_mst_atomic_check() Lyude Paul
@ 2018-12-14  1:25 ` Lyude Paul
  14 siblings, 0 replies; 31+ messages in thread
From: Lyude Paul @ 2018-12-14  1:25 UTC (permalink / raw)
  To: dri-devel, nouveau, intel-gfx, amd-gfx
  Cc: Daniel Vetter, Dave Airlie, Harry Wentland, Jerry Zuo,
	Daniel Vetter, Ben Skeggs, David Airlie, Karol Herbst, Sean Paul,
	Ilia Mirkin, linux-kernel

Currently, nouveau uses the yolo method of setting up MST displays: it
uses the old VCPI helpers (drm_dp_find_vcpi_slots()) for computing the
display configuration. These helpers don't take care to make sure they
take a reference to the mstb port that they're checking, and
additionally don't actually check whether or not the topology still has
enough bandwidth to provide the VCPI tokens required.

So, drop usage of the old helpers and move entirely over to the atomic
helpers.

Changes since v5:
 - Update nv50_msto_atomic_check() and nv50_mstc_atomic_check() to the
   new requirements for drm_dp_atomic_find_vcpi_slots() and
   drm_dp_atomic_release_vcpi_slots()

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 52 ++++++++++++++++++++++---
 1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 67f7bf97e5d9..df696008d205 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -762,16 +762,22 @@ nv50_msto_atomic_check(struct drm_encoder *encoder,
 		       struct drm_crtc_state *crtc_state,
 		       struct drm_connector_state *conn_state)
 {
-	struct nv50_mstc *mstc = nv50_mstc(conn_state->connector);
+	struct drm_atomic_state *state = crtc_state->state;
+	struct drm_connector *connector = conn_state->connector;
+	struct nv50_mstc *mstc = nv50_mstc(connector);
 	struct nv50_mstm *mstm = mstc->mstm;
-	int bpp = conn_state->connector->display_info.bpc * 3;
+	int bpp = connector->display_info.bpc * 3;
 	int slots;
 
-	mstc->pbn = drm_dp_calc_pbn_mode(crtc_state->adjusted_mode.clock, bpp);
+	mstc->pbn = drm_dp_calc_pbn_mode(crtc_state->adjusted_mode.clock,
+					 bpp);
 
-	slots = drm_dp_find_vcpi_slots(&mstm->mgr, mstc->pbn);
-	if (slots < 0)
-		return slots;
+	if (crtc_state->connectors_changed || crtc_state->mode_changed) {
+		slots = drm_dp_atomic_find_vcpi_slots(state, &mstm->mgr,
+						      mstc->port, mstc->pbn);
+		if (slots < 0)
+			return slots;
+	}
 
 	return nv50_outp_atomic_check_view(encoder, crtc_state, conn_state,
 					   mstc->native);
@@ -934,12 +940,42 @@ nv50_mstc_get_modes(struct drm_connector *connector)
 	return ret;
 }
 
+static int
+nv50_mstc_atomic_check(struct drm_connector *connector,
+		       struct drm_connector_state *new_conn_state)
+{
+	struct drm_atomic_state *state = new_conn_state->state;
+	struct nv50_mstc *mstc = nv50_mstc(connector);
+	struct drm_dp_mst_topology_mgr *mgr = &mstc->mstm->mgr;
+	struct drm_connector_state *old_conn_state =
+		drm_atomic_get_old_connector_state(state, connector);
+	struct drm_crtc_state *old_crtc_state;
+	struct drm_crtc *new_crtc = new_conn_state->crtc,
+			*old_crtc = old_conn_state->crtc;
+
+	if (!old_crtc)
+		return 0;
+
+	old_crtc_state = drm_atomic_get_old_crtc_state(state, old_crtc);
+	if (!old_crtc_state || !old_crtc_state->enable)
+		return 0;
+
+	if (new_crtc)
+		return 0;
+
+	/* This connector will be left without an enabled CRTC, so its VCPI
+	 * must be released here
+	 */
+	return drm_dp_atomic_release_vcpi_slots(state, mgr, mstc->port);
+}
+
 static const struct drm_connector_helper_funcs
 nv50_mstc_help = {
 	.get_modes = nv50_mstc_get_modes,
 	.mode_valid = nv50_mstc_mode_valid,
 	.best_encoder = nv50_mstc_best_encoder,
 	.atomic_best_encoder = nv50_mstc_atomic_best_encoder,
+	.atomic_check = nv50_mstc_atomic_check,
 };
 
 static enum drm_connector_status
@@ -2121,6 +2157,10 @@ nv50_disp_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
 			return ret;
 	}
 
+	ret = drm_dp_mst_atomic_check(state);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 01/15] drm/dp_mst: Remove bogus conditional in drm_dp_update_payload_part1()
  2018-12-14  1:25 ` [WIP PATCH 01/15] drm/dp_mst: Remove bogus conditional in drm_dp_update_payload_part1() Lyude Paul
@ 2018-12-14  8:42   ` Daniel Vetter
  2018-12-17 20:09     ` Wentland, Harry
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Vetter @ 2018-12-14  8:42 UTC (permalink / raw)
  To: Lyude Paul
  Cc: dri-devel, nouveau, intel-gfx, amd-gfx, Daniel Vetter,
	Dave Airlie, Harry Wentland, Jerry Zuo, Juston Li,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	linux-kernel

On Thu, Dec 13, 2018 at 08:25:30PM -0500, Lyude Paul wrote:
> There's no reason we need this, it's just confusing looking.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Juston Li <juston.li@intel.com>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index ad0fb6d003be..9b1b5c9b1fa0 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1896,9 +1896,7 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>  			req_payload.num_slots = 0;
>  		}
>  
> -		if (mgr->payloads[i].start_slot != req_payload.start_slot) {
> -			mgr->payloads[i].start_slot = req_payload.start_slot;
> -		}
> +		mgr->payloads[i].start_slot = req_payload.start_slot;

Entertaining!

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

>  		/* work out what is required to happen with this payload */
>  		if (mgr->payloads[i].num_slots != req_payload.num_slots) {
>  
> -- 
> 2.19.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 02/15] drm/dp_mst: Refactor drm_dp_update_payload_part1()
  2018-12-14  1:25 ` [WIP PATCH 02/15] drm/dp_mst: Refactor drm_dp_update_payload_part1() Lyude Paul
@ 2018-12-14  8:47   ` Daniel Vetter
  2018-12-17 20:27     ` Wentland, Harry
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Vetter @ 2018-12-14  8:47 UTC (permalink / raw)
  To: Lyude Paul
  Cc: dri-devel, nouveau, intel-gfx, amd-gfx, Daniel Vetter,
	Dave Airlie, Harry Wentland, Jerry Zuo, Juston Li,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	linux-kernel

On Thu, Dec 13, 2018 at 08:25:31PM -0500, Lyude Paul wrote:
> There should be no functional changes here

Would be good to explain what you did refactor here, instead of me trying
to reconstruct it from the patch. Especially pre-coffee that helps :-)
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Juston Li <juston.li@intel.com>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 71 ++++++++++++++++-----------
>  1 file changed, 42 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 9b1b5c9b1fa0..2ab16c9e6243 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1879,39 +1879,48 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>  
>  	mutex_lock(&mgr->payload_lock);
>  	for (i = 0; i < mgr->max_payloads; i++) {
> +		struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
> +		struct drm_dp_payload *payload = &mgr->payloads[i];
> +
>  		/* solve the current payloads - compare to the hw ones
>  		   - update the hw view */
>  		req_payload.start_slot = cur_slots;
> -		if (mgr->proposed_vcpis[i]) {
> -			port = container_of(mgr->proposed_vcpis[i], struct drm_dp_mst_port, vcpi);
> +		if (vcpi) {
> +			port = container_of(vcpi, struct drm_dp_mst_port,
> +					    vcpi);
>  			port = drm_dp_get_validated_port_ref(mgr, port);
>  			if (!port) {
>  				mutex_unlock(&mgr->payload_lock);
>  				return -EINVAL;
>  			}
> -			req_payload.num_slots = mgr->proposed_vcpis[i]->num_slots;
> -			req_payload.vcpi = mgr->proposed_vcpis[i]->vcpi;
> +			req_payload.num_slots = vcpi->num_slots;
> +			req_payload.vcpi = vcpi->vcpi;
>  		} else {
>  			port = NULL;
>  			req_payload.num_slots = 0;
>  		}
>  
> -		mgr->payloads[i].start_slot = req_payload.start_slot;
> +		payload->start_slot = req_payload.start_slot;
>  		/* work out what is required to happen with this payload */
> -		if (mgr->payloads[i].num_slots != req_payload.num_slots) {
> +		if (payload->num_slots != req_payload.num_slots) {
>  
>  			/* need to push an update for this payload */
>  			if (req_payload.num_slots) {
> -				drm_dp_create_payload_step1(mgr, mgr->proposed_vcpis[i]->vcpi, &req_payload);
> -				mgr->payloads[i].num_slots = req_payload.num_slots;
> -				mgr->payloads[i].vcpi = req_payload.vcpi;
> -			} else if (mgr->payloads[i].num_slots) {
> -				mgr->payloads[i].num_slots = 0;
> -				drm_dp_destroy_payload_step1(mgr, port, mgr->payloads[i].vcpi, &mgr->payloads[i]);
> -				req_payload.payload_state = mgr->payloads[i].payload_state;
> -				mgr->payloads[i].start_slot = 0;
> +				drm_dp_create_payload_step1(mgr, vcpi->vcpi,
> +							    &req_payload);
> +				payload->num_slots = req_payload.num_slots;
> +				payload->vcpi = req_payload.vcpi;
> +
> +			} else if (payload->num_slots) {
> +				payload->num_slots = 0;
> +				drm_dp_destroy_payload_step1(mgr, port,
> +							     payload->vcpi,
> +							     payload);
> +				req_payload.payload_state =
> +					payload->payload_state;
> +				payload->start_slot = 0;
>  			}
> -			mgr->payloads[i].payload_state = req_payload.payload_state;
> +			payload->payload_state = req_payload.payload_state;
>  		}
>  		cur_slots += req_payload.num_slots;
>  
> @@ -1920,22 +1929,26 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>  	}
>  
>  	for (i = 0; i < mgr->max_payloads; i++) {
> -		if (mgr->payloads[i].payload_state == DP_PAYLOAD_DELETE_LOCAL) {
> -			DRM_DEBUG_KMS("removing payload %d\n", i);
> -			for (j = i; j < mgr->max_payloads - 1; j++) {
> -				memcpy(&mgr->payloads[j], &mgr->payloads[j + 1], sizeof(struct drm_dp_payload));
> -				mgr->proposed_vcpis[j] = mgr->proposed_vcpis[j + 1];
> -				if (mgr->proposed_vcpis[j] && mgr->proposed_vcpis[j]->num_slots) {
> -					set_bit(j + 1, &mgr->payload_mask);
> -				} else {
> -					clear_bit(j + 1, &mgr->payload_mask);
> -				}
> -			}
> -			memset(&mgr->payloads[mgr->max_payloads - 1], 0, sizeof(struct drm_dp_payload));
> -			mgr->proposed_vcpis[mgr->max_payloads - 1] = NULL;
> -			clear_bit(mgr->max_payloads, &mgr->payload_mask);
> +		if (mgr->payloads[i].payload_state != DP_PAYLOAD_DELETE_LOCAL)
> +			continue;
> +
> +		DRM_DEBUG_KMS("removing payload %d\n", i);
> +		for (j = i; j < mgr->max_payloads - 1; j++) {
> +			mgr->payloads[j] = mgr->payloads[j + 1];
> +			mgr->proposed_vcpis[j] = mgr->proposed_vcpis[j + 1];
>  
> +			if (mgr->proposed_vcpis[j] &&
> +			    mgr->proposed_vcpis[j]->num_slots) {
> +				set_bit(j + 1, &mgr->payload_mask);
> +			} else {
> +				clear_bit(j + 1, &mgr->payload_mask);
> +			}
>  		}
> +
> +		memset(&mgr->payloads[mgr->max_payloads - 1], 0,
> +		       sizeof(struct drm_dp_payload));
> +		mgr->proposed_vcpis[mgr->max_payloads - 1] = NULL;
> +		clear_bit(mgr->max_payloads, &mgr->payload_mask);

With the commit message improved to mention
- Add local variables in the first loop
- Early continue to reduce 1 indent in the 2nd loop

this is Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

>  	}
>  	mutex_unlock(&mgr->payload_lock);
>  
> -- 
> 2.19.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports
  2018-12-14  1:25 ` [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports Lyude Paul
@ 2018-12-14  9:29   ` Daniel Vetter
  2018-12-18 21:27     ` Lyude Paul
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Vetter @ 2018-12-14  9:29 UTC (permalink / raw)
  To: Lyude Paul
  Cc: dri-devel, nouveau, intel-gfx, amd-gfx, Daniel Vetter,
	Dave Airlie, Harry Wentland, Jerry Zuo, David Airlie, Juston Li,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	Jonathan Corbet, linux-doc, linux-kernel

On Thu, Dec 13, 2018 at 08:25:32PM -0500, Lyude Paul wrote:
> The current way of handling refcounting in the DP MST helpers is really
> confusing and probably just plain wrong because it's been hacked up many
> times over the years without anyone actually going over the code and
> seeing if things could be simplified.
> 
> To the best of my understanding, the current scheme works like this:
> drm_dp_mst_port and drm_dp_mst_branch both have a single refcount. When
> this refcount hits 0 for either of the two, they're removed from the
> topology state, but not immediately freed. Both ports and branch devices
> will reinitialize their kref once it's hit 0 before actually destroying
> themselves. The intended purpose behind this is so that we can avoid
> problems like not being able to free a remote payload that might still
> be active, due to us having removed all of the port/branch device
> structures in memory, as per:
> 
> 91a25e463130 ("drm/dp/mst: deallocate payload on port destruction")
> 
> Which may have worked, but then it caused use-after-free errors. Being
> new to MST at the time, I tried fixing it;
> 
> 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()")
> 
> But, that was broken: both drm_dp_mst_port and drm_dp_mst_branch structs
> are validated in almost every DP MST helper function. Simply put, this
> means we go through the topology and try to see if the given
> drm_dp_mst_branch or drm_dp_mst_port is still attached to something
> before trying to use it in order to avoid dereferencing freed memory
> (something that has happened a LOT in the past with this library).
> Because of this it doesn't actually matter whether or not we keep keep
> the ports and branches around in memory as that's not enough, because
> any function that validates the branches and ports passed to it will
> still reject them anyway since they're no longer in the topology
> structure. So, use-after-free errors were fixed but payload deallocation
> was completely broken.
> 
> Two years later, AMD informed me about this issue and I attempted to
> come up with a temporary fix, pending a long-overdue cleanup of this
> library:
> 
> c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")
> 
> But then that introduced use-after-free errors, so I quickly reverted
> it:
> 
> 9765635b3075 ("Revert "drm/dp_mst: Skip validating ports during destruction, just ref"")
> 
> And in the process, learned that there is just no simple fix for this:
> the design is just broken. Unfortuntely, the usage of these helpers are
> quite broken as well. Some drivers like i915 have been smart enough to
> avoid accessing any kind of information from MST port structures, but
> others like nouveau have assumed, understandably so, that
> drm_dp_mst_port structures are normal and can just be accessed at any
> time without worrying about use-after-free errors.
> 
> After a lot of discussion, me and Daniel Vetter came up with a better
> idea to replace all of this.
> 
> To summarize, since this is documented far more indepth in the
> documentation this patch introduces, we make it so that drm_dp_mst_port
> and drm_dp_mst_branch structures have two different classes of
> refcounts: topology_kref, and malloc_kref. topology_kref corresponds to
> the lifetime of the given drm_dp_mst_port or drm_dp_mst_branch in it's
> given topology. Once it hits zero, any associated connectors are removed
> and the branch or port can no longer be validated. malloc_kref
> corresponds to the lifetime of the memory allocation for the actual
> structure, and will always be non-zero so long as the topology_kref is
> non-zero. This gives us a way to allow callers to hold onto port and
> branch device structures past their topology lifetime, and dramatically
> simplifies the lifetimes of both structures. This also finally fixes the
> port deallocation problem, properly.
> 
> Additionally: since this now means that we can keep ports and branch
> devices allocated in memory for however long we need, we no longer need
> a significant amount of the port validation that we currently do.
> 
> Additionally, there is one last scenario that this fixes, which couldn't
> have been fixed properly beforehand:
> 
> - CPU1 unrefs port from topology (refcount 1->0)
> - CPU2 refs port in topology(refcount 0->1)
> 
> Since we now can guarantee memory safety for ports and branches
> as-needed, we also can make our main reference counting functions fix
> this problem by using kref_get_unless_zero() internally so that topology
> refcounts can only ever reach 0 once.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: David Airlie <airlied@redhat.com>
> Cc: Jerry Zuo <Jerry.Zuo@amd.com>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Juston Li <juston.li@intel.com>
> ---
>  .../gpu/dp-mst/topology-figure-1.dot          |  31 ++
>  .../gpu/dp-mst/topology-figure-2.dot          |  37 ++
>  .../gpu/dp-mst/topology-figure-3.dot          |  40 ++
>  Documentation/gpu/drm-kms-helpers.rst         | 125 ++++-
>  drivers/gpu/drm/drm_dp_mst_topology.c         | 512 +++++++++++++-----
>  include/drm/drm_dp_mst_helper.h               |  19 +-
>  6 files changed, 637 insertions(+), 127 deletions(-)
>  create mode 100644 Documentation/gpu/dp-mst/topology-figure-1.dot
>  create mode 100644 Documentation/gpu/dp-mst/topology-figure-2.dot
>  create mode 100644 Documentation/gpu/dp-mst/topology-figure-3.dot

Yay, docs, and pretty ones at that! Awesome stuff :-)

> 
> diff --git a/Documentation/gpu/dp-mst/topology-figure-1.dot b/Documentation/gpu/dp-mst/topology-figure-1.dot
> new file mode 100644
> index 000000000000..fb83789e0a3e
> --- /dev/null
> +++ b/Documentation/gpu/dp-mst/topology-figure-1.dot
> @@ -0,0 +1,31 @@
> +digraph T {
> +    /* Topology references */
> +    node [shape=oval];
> +    mstb1 -> {port1, port2};
> +    port1 -> mstb2;
> +    port2 -> mstb3 -> {port3, port4};
> +    port3 -> mstb4;
> +
> +    /* Malloc references */
> +    edge [style=dashed];
> +    mstb4 -> port3;
> +    {port4, port3} -> mstb3;
> +    mstb3 -> port2;
> +    mstb2 -> port1;
> +    {port1, port2} -> mstb1;
> +
> +    edge [dir=back];
> +    node [style=filled;shape=box;fillcolor=lightblue];
> +    port1 -> "Payload #1";
> +    port3 -> "Payload #2";
> +
> +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> +    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
> +    mstb4 [label="MSTB #4";style=filled;fillcolor=palegreen];
> +
> +    port1 [label="Port #1"];
> +    port2 [label="Port #2"];
> +    port3 [label="Port #3"];
> +    port4 [label="Port #4"];
> +}
> diff --git a/Documentation/gpu/dp-mst/topology-figure-2.dot b/Documentation/gpu/dp-mst/topology-figure-2.dot
> new file mode 100644
> index 000000000000..eebce560be40
> --- /dev/null
> +++ b/Documentation/gpu/dp-mst/topology-figure-2.dot
> @@ -0,0 +1,37 @@
> +digraph T {
> +    /* Topology references */
> +    node [shape=oval];
> +
> +    mstb1 -> {port1, port2};
> +    port1 -> mstb2;
> +    edge [color=red];
> +    port2 -> mstb3 -> {port3, port4};
> +    port3 -> mstb4;
> +    edge [color=""];
> +
> +    /* Malloc references */
> +    edge [style=dashed];
> +    port3 -> mstb3;
> +    mstb3 -> port2;
> +    mstb2 -> port1;
> +    {port1, port2} -> mstb1;
> +    edge [color=red];
> +    mstb4 -> port3;
> +    port4 -> mstb3;
> +    edge [color=""];
> +
> +    edge [dir=back];
> +    node [style=filled;shape=box;fillcolor=lightblue];
> +    port1 -> "Payload #1";
> +    port3 -> "Payload #2";
> +
> +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> +    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
> +    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
> +
> +    port1 [label="Port #1"];
> +    port2 [label="Port #2"];
> +    port3 [label="Port #3"];
> +    port4 [label="Port #4";style=filled;fillcolor=grey];
> +}
> diff --git a/Documentation/gpu/dp-mst/topology-figure-3.dot b/Documentation/gpu/dp-mst/topology-figure-3.dot
> new file mode 100644
> index 000000000000..9bf28d87144c
> --- /dev/null
> +++ b/Documentation/gpu/dp-mst/topology-figure-3.dot
> @@ -0,0 +1,40 @@
> +digraph T {
> +    /* Topology references */
> +    node [shape=oval];
> +
> +    mstb1 -> {port1, port2};
> +    port1 -> mstb2;
> +    edge [color=grey];
> +    port2 -> mstb3 -> {port3, port4};
> +    port3 -> mstb4;
> +    edge [color=""];
> +
> +    /* Malloc references */
> +    edge [style=dashed];
> +    port3 -> mstb3 [penwidth=3];
> +    mstb3 -> port2 [penwidth=3];
> +    mstb2 -> port1;
> +    {port1, port2} -> mstb1;
> +    edge [color=grey];
> +    mstb4 -> port3;
> +    port4 -> mstb3;
> +    edge [color=""];
> +
> +    edge [dir=back];
> +    node [style=filled;shape=box;fillcolor=lightblue];
> +    port1 -> payload1;
> +    port3 -> payload2 [penwidth=3];
> +
> +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> +    mstb3 [label="MSTB #3";penwidth=3;style=filled;fillcolor=palegreen];
> +    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
> +
> +    port1 [label="Port #1"];
> +    port2 [label="Port #2";penwidth=3];
> +    port3 [label="Port #3";penwidth=3];
> +    port4 [label="Port #4";style=filled;fillcolor=grey];
> +
> +    payload1 [label="Payload #1"];
> +    payload2 [label="Payload #2";penwidth=3];
> +}
> diff --git a/Documentation/gpu/drm-kms-helpers.rst b/Documentation/gpu/drm-kms-helpers.rst
> index b422eb8edf16..c0f994c2c72f 100644
> --- a/Documentation/gpu/drm-kms-helpers.rst
> +++ b/Documentation/gpu/drm-kms-helpers.rst
> @@ -208,8 +208,11 @@ Display Port Dual Mode Adaptor Helper Functions Reference
>  .. kernel-doc:: drivers/gpu/drm/drm_dp_dual_mode_helper.c
>     :export:
>  
> -Display Port MST Helper Functions Reference
> -===========================================
> +Display Port MST Helpers
> +========================
> +
> +Functions Reference
> +-------------------
>  
>  .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
>     :doc: dp mst helper
> @@ -220,6 +223,124 @@ Display Port MST Helper Functions Reference
>  .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
>     :export:
>  
> +Branch device and port refcounting
> +----------------------------------

I generally try to put the long-form explanations before the function
references. Since usually the references completely drown out everything
else and make it harder to spot the important overview stuff.


> +
> +Overview
> +~~~~~~~~
> +
> +The refcounting schemes for :c:type:`struct drm_dp_mst_branch` and
> +:c:type:`struct drm_dp_mst_port` are somewhat unusual. Both ports and branch
> +devices have two different kinds of refcounts: topology refcounts, and malloc
> +refcounts.
> +
> +Topology refcount overview
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Topology refcounts are not exposed to drivers, and are handled internally by the
> +DP MST helpers. The helpers use them in order to prevent the in-memory topology
> +state from being changed in the middle of critical operations like changing the
> +internal state of payload allocations. This means each branch and port will be
> +considered to be connected to the rest of the topology until it's topology
> +refcount reaches zero. Additionally, for ports this means that their associated
> +:c:type:`struct drm_connector` will stay registered with userspace until the
> +port's refcount reaches 0.
> +
> +
> +Topology refcount functions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The DP MST helpers use the following functions to manage topology refcounts:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> +   :functions: drm_dp_mst_topology_get_port drm_dp_mst_topology_put_port
> +               drm_dp_mst_topology_ref_port drm_dp_mst_topology_get_mstb
> +               drm_dp_mst_topology_put_mstb drm_dp_mst_topology_ref_mstb
> +
> +Malloc refcount overview
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Malloc references are used to keep a :c:type:`struct drm_dp_mst_port` or
> +:c:type:`struct drm_dp_mst_branch` allocated even after all of its topology
> +references have been dropped, so that the driver or MST helpers can safely
> +access each branch's last known state before it was disconnected from the
> +topology. When the malloc refcount of a port or branch reaches 0, the memory
> +allocation containing the :c:type:`struct drm_dp_mst_branch` or :c:type:`struct
> +drm_dp_mst_port` respectively will be freed.
> +
> +Malloc refcounts for ports
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For :c:type:`struct drm_dp_mst_port`, malloc refcounts are exposed to drivers
> +through the following functions:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> +   :functions: drm_dp_mst_get_port_malloc drm_dp_mst_put_port_malloc
> +
> +Malloc refcounts for branch devices
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For :c:type:`struct drm_dp_mst_branch`, malloc refcounts are not currently
> +exposed to drivers. As of writing this documentation, there are no drivers that
> +have a usecase for accessing :c:type:`struct drm_dp_mst_branch` outside of the
> +MST helpers. Exposing this API to drivers in a race-free manner would take more
> +tweaking of the refcounting scheme, however patches are welcome provided there
> +is a legitimate driver usecase for this.
> +
> +Internally, malloc refcounts for :c:type:`struct drm_dp_mst_branch` are managed
> +by the DP MST core through the following functions:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> +   :functions: drm_dp_mst_get_mstb_malloc drm_dp_mst_put_mstb_malloc
> +
> +Refcount relationships in a topology
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Let's take a look at why the relationship between topology and malloc refcounts
> +is designed the way it is.
> +
> +.. kernel-figure:: dp-mst/topology-figure-1.dot
> +
> +   An example of topology and malloc refs in a DP MST topology with two active
> +   payloads. Topology refcount increments are indicated by solid lines, and
> +   malloc refcount increments are indicated by dashed lines. Each starts from
> +   the branch which incremented the refcount, and ends at the branch to which
> +   the refcount belongs to.
> +
> +As you can see in figure 1, every branch increments the topology
> +refcount of it's children, and increments the malloc refcount of it's parent.
> +Additionally, every payload increments the malloc refcount of it's assigned port
> +by 1.
> +
> +So, what would happen if MSTB #3 from the above figure was unplugged from the
> +system, but the driver hadn't yet removed payload #2 from port #3? The topology
> +would start to look like figure 2.
> +
> +.. kernel-figure:: dp-mst/topology-figure-2.dot
> +
> +   Ports and branch devices which have been released from memory are colored
> +   grey, and references which have been removed are colored red.
> +
> +Whenever a port or branch device's topology refcount reaches zero, it will
> +decrement the topology refcounts of all its children, the malloc refcount of its
> +parent, and finally its own malloc refcount. For MSTB #4 and port #4, this means
> +they both have been disconnected from the topology and freed from memory. But,
> +because payload #2 is still holding a reference to port #3, port #3 is removed
> +from the topology but it's :c:type:`struct drm_dp_mst_port` is still accessible
> +from memory. This also means port #3 has not yet decremented the malloc refcount
> +of MSTB #3, so it's :c:type:`struct drm_dp_mst_branch` will also stay allocated
> +in memory until port #3's malloc refcount reaches 0.
> +
> +This relationship is necessary because in order to release payload #2, we
> +need to be able to figure out the last relative of port #3 that's still
> +connected to the topology. In this case, we would travel up the topology as
> +shown in figure 3.
> +
> +.. kernel-figure:: dp-mst/topology-figure-3.dot
> +
> +And finally, remove payload #2 by communicating with port #2 through sideband
> +transactions.

(Blind guess, I haven't looked ahead in the series yet)

I assume that drivers also want to hold a malloc reference from their
connector, until that connector is destroyed completed (and we hence know
it released all its vcpi and other stuff and really doesn't need the port
anymore). Could we integrated that into these neat graphs too? Answering
the "so how does this integrate into my driver?" question is imo the most
important part for core api docs.

Another one: Any reason for not putting this right into the code as a DOC:
section? Ime moving docs as close as possible to the code improves the
odds it's kept up-to-date. The only overview texts I've left in the .rst
is the stuff that describes overall concepts (e.g. how all the kms objects
fit together).

All the sphinx/rst syntax should carry over 1:1, except in kerneldoc you
also can benefit from the abbreviated reference syntax from kerneldoc.

Anyway, really great stuff.

> +
>  MIPI DSI Helper Functions Reference
>  ===================================
>  
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 2ab16c9e6243..c196fb580beb 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -46,7 +46,7 @@ static bool dump_dp_payload_table(struct drm_dp_mst_topology_mgr *mgr,
>  				  char *buf);
>  static int test_calc_pbn_mode(void);
>  
> -static void drm_dp_put_port(struct drm_dp_mst_port *port);
> +static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port);
>  
>  static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
>  				     int id,
> @@ -850,46 +850,120 @@ static struct drm_dp_mst_branch *drm_dp_add_mst_branch_device(u8 lct, u8 *rad)
>  	if (lct > 1)
>  		memcpy(mstb->rad, rad, lct / 2);
>  	INIT_LIST_HEAD(&mstb->ports);
> -	kref_init(&mstb->kref);
> +	kref_init(&mstb->topology_kref);
> +	kref_init(&mstb->malloc_kref);
>  	return mstb;
>  }
>  
>  static void drm_dp_free_mst_port(struct kref *kref);
> +static void drm_dp_free_mst_branch_device(struct kref *kref);

I'd move the functions around, forward declarations for static functions
is a bit silly

> +
> +/**
> + * drm_dp_mst_get_mstb_malloc() - Increment the malloc refcount of a branch
> + * device
> + * @mstb: The &struct drm_dp_mst_branch to increment the malloc refcount of
> + *
> + * Increments @mstb.malloc_kref. When @mstb.malloc_kref reaches 0, the memory

s/@/&/ for structure member references. @ references to parameters/members
in the same kerneldoc type only. With & you'll get a nice link, @ is just
markup (and yes & with a member unfortunately doesn't link to the member,
only the overall structure).

Similarly below.

> + * allocation for @mstb will be released and @mstb may no longer be used.
> + *
> + * Any malloc references acquired with this function must be released when
> + * they are no longer being used by calling drm_dp_mst_put_mstb_malloc().

I'd dropped "when they are no longer being used", and the line below too.
Short docs are better generally because attention span of readers.

> + *
> + * See also: drm_dp_mst_put_mstb_malloc()
> + */
> +static void
> +drm_dp_mst_get_mstb_malloc(struct drm_dp_mst_branch *mstb)
> +{
> +	kref_get(&mstb->malloc_kref);
> +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref));
> +}
> +
> +/**
> + * drm_dp_mst_put_mstb_malloc() - Decrement the malloc refcount of a branch
> + * device
> + * @mstb: The &struct drm_dp_mst_branch to decrement the malloc refcount of
> + *
> + * Decrements @mstb.malloc_kref. When @mstb.malloc_kref reaches 0, the memory
> + * allocation for @mstb will be released and @mstb may no longer be used.
> + *
> + * See also: drm_dp_mst_get_mstb_malloc()
> + */
> +static void
> +drm_dp_mst_put_mstb_malloc(struct drm_dp_mst_branch *mstb)
> +{
> +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref)-1);
> +	kref_put(&mstb->malloc_kref, drm_dp_free_mst_branch_device);
> +}
> +
> +/**
> + * drm_dp_mst_get_port_malloc() - Increment the malloc refcount of an MST port
> + * @port: The &struct drm_dp_mst_port to increment the malloc refcount of
> + *
> + * Increments @port.malloc_kref. When @port.malloc_kref reaches 0, the memory
> + * allocation for @port will be released and @port may no longer be used.
> + *
> + * Because @port could potentially be freed at any time by the DP MST helpers
> + * if @port.malloc_kref reaches 0, including during a call to this function,
> + * drivers that which to make use of &struct drm_dp_mst_port should ensure
> + * that they grab at least one main malloc reference to their MST ports in
> + * &drm_dp_mst_topology_cbs.add_connector. This callback is called before
> + * there is any chance for @port.malloc_kref to reach 0.
> + *
> + * Any malloc references acquired with this function must be released when
> + * they are no longer being used by calling drm_dp_mst_put_port_malloc().
> + *
> + * See also: drm_dp_mst_put_port_malloc()

Same reduction as with mstb_malloc version.

> + */
> +void
> +drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port)
> +{
> +	kref_get(&port->malloc_kref);
> +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref));
> +}
> +EXPORT_SYMBOL(drm_dp_mst_get_port_malloc);
> +
> +/**
> + * drm_dp_mst_put_port_malloc() - Decrement the malloc refcount of an MST port
> + * @port: The &struct drm_dp_mst_port to decrement the malloc refcount of
> + *
> + * Decrements @port.malloc_kref. When @port.malloc_kref reaches 0, the memory
> + * allocation for @port will be released and @port may no longer be used.
> + *
> + * See also: drm_dp_mst_get_port_malloc()
> + */
> +void
> +drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port)
> +{
> +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref)-1);
> +	kref_put(&port->malloc_kref, drm_dp_free_mst_port);
> +}
> +EXPORT_SYMBOL(drm_dp_mst_put_port_malloc);
>  
>  static void drm_dp_free_mst_branch_device(struct kref *kref)
>  {
> -	struct drm_dp_mst_branch *mstb = container_of(kref, struct drm_dp_mst_branch, kref);
> -	if (mstb->port_parent) {
> -		if (list_empty(&mstb->port_parent->next))
> -			kref_put(&mstb->port_parent->kref, drm_dp_free_mst_port);
> -	}
> +	struct drm_dp_mst_branch *mstb =
> +		container_of(kref, struct drm_dp_mst_branch, malloc_kref);
> +
> +	if (mstb->port_parent)
> +		drm_dp_mst_put_port_malloc(mstb->port_parent);
> +
>  	kfree(mstb);
>  }
>  
>  static void drm_dp_destroy_mst_branch_device(struct kref *kref)
>  {
> -	struct drm_dp_mst_branch *mstb = container_of(kref, struct drm_dp_mst_branch, kref);
> +	struct drm_dp_mst_branch *mstb =
> +		container_of(kref, struct drm_dp_mst_branch, topology_kref);
> +	struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
>  	struct drm_dp_mst_port *port, *tmp;
>  	bool wake_tx = false;
>  
> -	/*
> -	 * init kref again to be used by ports to remove mst branch when it is
> -	 * not needed anymore
> -	 */
> -	kref_init(kref);
> -
> -	if (mstb->port_parent && list_empty(&mstb->port_parent->next))
> -		kref_get(&mstb->port_parent->kref);
> -
> -	/*
> -	 * destroy all ports - don't need lock
> -	 * as there are no more references to the mst branch
> -	 * device at this point.
> -	 */
> +	mutex_lock(&mgr->lock);
>  	list_for_each_entry_safe(port, tmp, &mstb->ports, next) {
>  		list_del(&port->next);
> -		drm_dp_put_port(port);
> +		drm_dp_mst_topology_put_port(port);
>  	}
> +	mutex_unlock(&mgr->lock);

Would be nice to split this out (to highlight the bugfix more), but
because of the kref_init() hack not really feasible I think :-/
>  
>  	/* drop any tx slots msg */
>  	mutex_lock(&mstb->mgr->qlock);
> @@ -908,14 +982,82 @@ static void drm_dp_destroy_mst_branch_device(struct kref *kref)
>  	if (wake_tx)
>  		wake_up_all(&mstb->mgr->tx_waitq);
>  
> -	kref_put(kref, drm_dp_free_mst_branch_device);
> +	drm_dp_mst_put_mstb_malloc(mstb);
>  }
>  
> -static void drm_dp_put_mst_branch_device(struct drm_dp_mst_branch *mstb)
> +/**
> + * drm_dp_mst_topology_get_mstb() - Increment the topology refcount of a
> + * branch device unless its zero
> + * @mstb: &struct drm_dp_mst_branch to increment the topology refcount of
> + *
> + * Attempts to grab a topology reference to @mstb, if it hasn't yet been
> + * removed from the topology (e.g. @mstb.topology_kref has reached 0).
> + *
> + * Any topology references acquired with this function must be released when
> + * they are no longer being used by calling drm_dp_mst_topology_put_mstb().

I'd explain the relationship with malloc_kref a bit here:

- topology ref implies a malloc ref, hence you can call get_mstb_malloc
  with only holding a topology ref (might be better to explain this in the
  get_mstb_malloc kerneldoc, since it also applies to the unconditional
  kref_get below)
- malloc_ref is enough to call this function, but then it can fail

> + *
> + * See also:
> + * drm_dp_mst_topology_ref_mstb()

I'd write out when you should use this one instead:

"If you already have a topology reference you should use other_function()
instead."

> + * drm_dp_mst_topology_get_mstb()

This is this function itself :-)

> + *
> + * Returns:
> + * * 1: A topology reference was grabbed successfully
> + * * 0: @port is no longer in the topology, no reference was grabbed
> + */
> +static int __must_check
> +drm_dp_mst_topology_get_mstb(struct drm_dp_mst_branch *mstb)

Hm if you both want a kref_get and a kref_get_unless_zero then we need
better naming. topology_get_mstb should be the unconditional kref_get, the
conditional kref_get_unless_zero needs some indication that it could fail.
We need some verb that indicates that instead of "get":
- "validate" since we've used that one already
- "lookup" that's used by all the drm_mode_object lookup functions, feels
  a bit misleading
- "try_get"

>  {
> -	kref_put(&mstb->kref, drm_dp_destroy_mst_branch_device);
> +	int ret = kref_get_unless_zero(&mstb->topology_kref);
> +
> +	if (ret)
> +		DRM_DEBUG("mstb %p (%d)\n", mstb,
> +			  kref_read(&mstb->topology_kref));
> +
> +	return ret;
> +}
> +
> +/**
> + * drm_dp_mst_topology_ref_mstb() - Increment the topology refcount of a
> + * branch device
> + * @mstb: The &struct drm_dp_mst_branch to increment the topology refcount of
> + *
> + * Increments @mstb.topology_refcount without checking whether or not it's
> + * already reached 0. This is only valid to use in scenarios where you are
> + * already guaranteed to have at least one active topology reference to @mstb.
> + * Otherwise, drm_dp_mst_topology_get_mstb() should be used.

s/should/must/  (or my English understanding is off, afaiui "should" isn't
a strict requirement per rfc2119)

> + *
> + * Any topology references acquired with this function must be released when
> + * they are no longer being used by calling drm_dp_mst_topology_put_mstb().
> + *
> + * See also:
> + * drm_dp_mst_topology_get_mstb()
> + * drm_dp_mst_topology_put_mstb()
> + */
> +static void
> +drm_dp_mst_topology_ref_mstb(struct drm_dp_mst_branch *mstb)
> +{

Should we have a WARN_ON(refcount == 0) here?

> +	kref_get(&mstb->topology_kref);
> +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->topology_kref));
>  }
>  
> +/**
> + * drm_dp_mst_topology_put_mstb() - release a topology reference to a branch
> + * device
> + * @mstb: The &struct drm_dp_mst_branch to release the topology reference from
> + *
> + * Releases a topology reference from @mstb by decrementing
> + * @mstb.topology_kref.
> + *
> + * See also:
> + * drm_dp_mst_topology_get_mstb()
> + * drm_dp_mst_topology_ref_mstb()
> + */
> +static void
> +drm_dp_mst_topology_put_mstb(struct drm_dp_mst_branch *mstb)
> +{
> +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->topology_kref)-1);
> +	kref_put(&mstb->topology_kref, drm_dp_destroy_mst_branch_device);
> +}
>  
>  static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int old_pdt)
>  {
> @@ -930,14 +1072,15 @@ static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int old_pdt)
>  	case DP_PEER_DEVICE_MST_BRANCHING:
>  		mstb = port->mstb;
>  		port->mstb = NULL;
> -		drm_dp_put_mst_branch_device(mstb);
> +		drm_dp_mst_topology_put_mstb(mstb);
>  		break;
>  	}
>  }
>  
>  static void drm_dp_destroy_port(struct kref *kref)
>  {
> -	struct drm_dp_mst_port *port = container_of(kref, struct drm_dp_mst_port, kref);
> +	struct drm_dp_mst_port *port =
> +		container_of(kref, struct drm_dp_mst_port, topology_kref);
>  	struct drm_dp_mst_topology_mgr *mgr = port->mgr;
>  
>  	if (!port->input) {
> @@ -956,7 +1099,6 @@ static void drm_dp_destroy_port(struct kref *kref)
>  			 * from an EDID retrieval */
>  
>  			mutex_lock(&mgr->destroy_connector_lock);
> -			kref_get(&port->parent->kref);
>  			list_add(&port->next, &mgr->destroy_connector_list);
>  			mutex_unlock(&mgr->destroy_connector_lock);
>  			schedule_work(&mgr->destroy_connector_work);
> @@ -967,25 +1109,93 @@ static void drm_dp_destroy_port(struct kref *kref)
>  		drm_dp_port_teardown_pdt(port, port->pdt);
>  		port->pdt = DP_PEER_DEVICE_NONE;
>  	}
> -	kfree(port);
> +	drm_dp_mst_put_port_malloc(port);
>  }
>  
> -static void drm_dp_put_port(struct drm_dp_mst_port *port)
> +/**
> + * drm_dp_mst_topology_get_port() - Increment the topology refcount of a
> + * port unless its zero
> + * @port: &struct drm_dp_mst_port to increment the topology refcount of
> + *
> + * Attempts to grab a topology reference to @port, if it hasn't yet been
> + * removed from the topology (e.g. @port.topology_kref has reached 0).
> + *
> + * Any topology references acquired with this function must be released when
> + * they are no longer being used by calling drm_dp_mst_topology_put_port().
> + *
> + * See also:
> + * drm_dp_mst_topology_ref_port()
> + * drm_dp_mst_topology_put_port()
> + *
> + * Returns:
> + * * 1: A topology reference was grabbed successfully
> + * * 0: @port is no longer in the topology, no reference was grabbed
> + */
> +static int __must_check
> +drm_dp_mst_topology_get_port(struct drm_dp_mst_port *port)
>  {
> -	kref_put(&port->kref, drm_dp_destroy_port);
> +	int ret = kref_get_unless_zero(&port->topology_kref);
> +
> +	if (ret)
> +		DRM_DEBUG("port %p (%d)\n", port,
> +			  kref_read(&port->topology_kref));
> +
> +	return ret;
>  }
>  
> -static struct drm_dp_mst_branch *drm_dp_mst_get_validated_mstb_ref_locked(struct drm_dp_mst_branch *mstb, struct drm_dp_mst_branch *to_find)
> +/**
> + * drm_dp_mst_topology_ref_port() - Increment the topology refcount of a port
> + * @port: The &struct drm_dp_mst_port to increment the topology refcount of
> + *
> + * Increments @port.topology_refcount without checking whether or not it's
> + * already reached 0. This is only valid to use in scenarios where you are
> + * already guaranteed to have at least one active topology reference to @port.
> + * Otherwise, drm_dp_mst_topology_get_port() should be used.
> + *
> + * Any topology references acquired with this function must be released when
> + * they are no longer being used by calling drm_dp_mst_topology_put_port().
> + *
> + * See also:
> + * drm_dp_mst_topology_get_port()
> + * drm_dp_mst_topology_put_port()
> + */
> +static void drm_dp_mst_topology_ref_port(struct drm_dp_mst_port *port)
> +{
> +	kref_get(&port->topology_kref);
> +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->topology_kref));
> +}
> +
> +/**
> + * drm_dp_mst_topology_put_port() - release a topology reference to a port
> + * @port: The &struct drm_dp_mst_port to release the topology reference from
> + *
> + * Releases a topology reference from @port by decrementing
> + * @port.topology_kref.
> + *
> + * See also:
> + * drm_dp_mst_topology_get_port()
> + * drm_dp_mst_topology_ref_port()
> + */
> +static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port)
> +{
> +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->topology_kref)-1);
> +	kref_put(&port->topology_kref, drm_dp_destroy_port);
> +}
> +
> +static struct drm_dp_mst_branch *
> +drm_dp_mst_topology_get_mstb_validated_locked(struct drm_dp_mst_branch *mstb,
> +					      struct drm_dp_mst_branch *to_find)
>  {
>  	struct drm_dp_mst_port *port;
>  	struct drm_dp_mst_branch *rmstb;
> -	if (to_find == mstb) {
> -		kref_get(&mstb->kref);
> +
> +	if (to_find == mstb)
>  		return mstb;
> -	}
> +
>  	list_for_each_entry(port, &mstb->ports, next) {
>  		if (port->mstb) {
> -			rmstb = drm_dp_mst_get_validated_mstb_ref_locked(port->mstb, to_find);
> +			rmstb = drm_dp_mst_topology_get_mstb_validated_locked(

I think a prep patch which just renames the current get_validated/put
functions to the new names would be really good. Then this patch here with
the new stuff.


> +			    port->mstb, to_find);
>  			if (rmstb)
>  				return rmstb;
>  		}
> @@ -993,27 +1203,37 @@ static struct drm_dp_mst_branch *drm_dp_mst_get_validated_mstb_ref_locked(struct
>  	return NULL;
>  }
>  
> -static struct drm_dp_mst_branch *drm_dp_get_validated_mstb_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_branch *mstb)
> +static struct drm_dp_mst_branch *
> +drm_dp_mst_topology_get_mstb_validated(struct drm_dp_mst_topology_mgr *mgr,
> +				       struct drm_dp_mst_branch *mstb)
>  {
>  	struct drm_dp_mst_branch *rmstb = NULL;
> +
>  	mutex_lock(&mgr->lock);
> -	if (mgr->mst_primary)
> -		rmstb = drm_dp_mst_get_validated_mstb_ref_locked(mgr->mst_primary, mstb);
> +	if (mgr->mst_primary) {
> +		rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
> +		    mgr->mst_primary, mstb);
> +
> +		if (rmstb && !drm_dp_mst_topology_get_mstb(rmstb))
> +			rmstb = NULL;
> +	}
>  	mutex_unlock(&mgr->lock);
>  	return rmstb;
>  }
>  
> -static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_branch *mstb, struct drm_dp_mst_port *to_find)
> +static struct drm_dp_mst_port *
> +drm_dp_mst_topology_get_port_validated_locked(struct drm_dp_mst_branch *mstb,
> +					      struct drm_dp_mst_port *to_find)
>  {
>  	struct drm_dp_mst_port *port, *mport;
>  
>  	list_for_each_entry(port, &mstb->ports, next) {
> -		if (port == to_find) {
> -			kref_get(&port->kref);
> +		if (port == to_find)
>  			return port;
> -		}
> +
>  		if (port->mstb) {
> -			mport = drm_dp_mst_get_port_ref_locked(port->mstb, to_find);
> +			mport = drm_dp_mst_topology_get_port_validated_locked(
> +			    port->mstb, to_find);
>  			if (mport)
>  				return mport;
>  		}
> @@ -1021,12 +1241,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
>  	return NULL;
>  }
>  
> -static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
> +static struct drm_dp_mst_port *
> +drm_dp_mst_topology_get_port_validated(struct drm_dp_mst_topology_mgr *mgr,
> +				       struct drm_dp_mst_port *port)
>  {
>  	struct drm_dp_mst_port *rport = NULL;
> +
>  	mutex_lock(&mgr->lock);
> -	if (mgr->mst_primary)
> -		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
> +	if (mgr->mst_primary) {
> +		rport = drm_dp_mst_topology_get_port_validated_locked(
> +		    mgr->mst_primary, port);
> +
> +		if (rport && !drm_dp_mst_topology_get_port(rport))
> +			rport = NULL;
> +	}
>  	mutex_unlock(&mgr->lock);
>  	return rport;
>  }
> @@ -1034,11 +1262,12 @@ static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_t
>  static struct drm_dp_mst_port *drm_dp_get_port(struct drm_dp_mst_branch *mstb, u8 port_num)
>  {
>  	struct drm_dp_mst_port *port;
> +	int ret;
>  
>  	list_for_each_entry(port, &mstb->ports, next) {
>  		if (port->port_num == port_num) {
> -			kref_get(&port->kref);
> -			return port;
> +			ret = drm_dp_mst_topology_get_port(port);
> +			return ret ? port : NULL;
>  		}
>  	}
>  
> @@ -1087,6 +1316,11 @@ static bool drm_dp_port_setup_pdt(struct drm_dp_mst_port *port)
>  		if (port->mstb) {
>  			port->mstb->mgr = port->mgr;
>  			port->mstb->port_parent = port;
> +			/*
> +			 * Make sure this port's memory allocation stays
> +			 * around until it's child MSTB releases it
> +			 */
> +			drm_dp_mst_get_port_malloc(port);
>  
>  			send_link = true;
>  		}
> @@ -1147,17 +1381,26 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
>  	bool created = false;
>  	int old_pdt = 0;
>  	int old_ddps = 0;
> +
>  	port = drm_dp_get_port(mstb, port_msg->port_number);
>  	if (!port) {
>  		port = kzalloc(sizeof(*port), GFP_KERNEL);
>  		if (!port)
>  			return;
> -		kref_init(&port->kref);
> +		kref_init(&port->topology_kref);
> +		kref_init(&port->malloc_kref);
>  		port->parent = mstb;
>  		port->port_num = port_msg->port_number;
>  		port->mgr = mstb->mgr;
>  		port->aux.name = "DPMST";
>  		port->aux.dev = dev->dev;
> +
> +		/*
> +		 * Make sure the memory allocation for our parent branch stays
> +		 * around until our own memory allocation is released
> +		 */
> +		drm_dp_mst_get_mstb_malloc(mstb);
> +
>  		created = true;
>  	} else {
>  		old_pdt = port->pdt;
> @@ -1177,7 +1420,7 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
>  	   for this list */
>  	if (created) {
>  		mutex_lock(&mstb->mgr->lock);
> -		kref_get(&port->kref);
> +		drm_dp_mst_topology_ref_port(port);
>  		list_add(&port->next, &mstb->ports);
>  		mutex_unlock(&mstb->mgr->lock);
>  	}
> @@ -1202,17 +1445,21 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
>  	if (created && !port->input) {
>  		char proppath[255];
>  
> -		build_mst_prop_path(mstb, port->port_num, proppath, sizeof(proppath));
> -		port->connector = (*mstb->mgr->cbs->add_connector)(mstb->mgr, port, proppath);
> +		build_mst_prop_path(mstb, port->port_num, proppath,
> +				    sizeof(proppath));
> +		port->connector = (*mstb->mgr->cbs->add_connector)(mstb->mgr,
> +								   port,
> +								   proppath);
>  		if (!port->connector) {
>  			/* remove it from the port list */
>  			mutex_lock(&mstb->mgr->lock);
>  			list_del(&port->next);
>  			mutex_unlock(&mstb->mgr->lock);
>  			/* drop port list reference */
> -			drm_dp_put_port(port);
> +			drm_dp_mst_topology_put_port(port);
>  			goto out;
>  		}
> +
>  		if ((port->pdt == DP_PEER_DEVICE_DP_LEGACY_CONV ||
>  		     port->pdt == DP_PEER_DEVICE_SST_SINK) &&
>  		    port->port_num >= DP_MST_LOGICAL_PORT_0) {
> @@ -1224,7 +1471,7 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
>  
>  out:
>  	/* put reference to this port */
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_port(port);
>  }
>  
>  static void drm_dp_update_port(struct drm_dp_mst_branch *mstb,
> @@ -1259,7 +1506,7 @@ static void drm_dp_update_port(struct drm_dp_mst_branch *mstb,
>  			dowork = true;
>  	}
>  
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_port(port);
>  	if (dowork)
>  		queue_work(system_long_wq, &mstb->mgr->work);
>  
> @@ -1270,7 +1517,7 @@ static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_
>  {
>  	struct drm_dp_mst_branch *mstb;
>  	struct drm_dp_mst_port *port;
> -	int i;
> +	int i, ret;
>  	/* find the port by iterating down */
>  
>  	mutex_lock(&mgr->lock);
> @@ -1295,7 +1542,9 @@ static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_
>  			}
>  		}
>  	}
> -	kref_get(&mstb->kref);
> +	ret = drm_dp_mst_topology_get_mstb(mstb);
> +	if (!ret)
> +		mstb = NULL;
>  out:
>  	mutex_unlock(&mgr->lock);
>  	return mstb;
> @@ -1325,19 +1574,22 @@ static struct drm_dp_mst_branch *get_mst_branch_device_by_guid_helper(
>  	return NULL;
>  }
>  
> -static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device_by_guid(
> -	struct drm_dp_mst_topology_mgr *mgr,
> -	uint8_t *guid)
> +static struct drm_dp_mst_branch *
> +drm_dp_get_mst_branch_device_by_guid(struct drm_dp_mst_topology_mgr *mgr,
> +				     uint8_t *guid)
>  {
>  	struct drm_dp_mst_branch *mstb;
> +	int ret;
>  
>  	/* find the port by iterating down */
>  	mutex_lock(&mgr->lock);
>  
>  	mstb = get_mst_branch_device_by_guid_helper(mgr->mst_primary, guid);
> -
> -	if (mstb)
> -		kref_get(&mstb->kref);
> +	if (mstb) {
> +		ret = drm_dp_mst_topology_get_mstb(mstb);
> +		if (!ret)
> +			mstb = NULL;
> +	}
>  
>  	mutex_unlock(&mgr->lock);
>  	return mstb;
> @@ -1362,10 +1614,10 @@ static void drm_dp_check_and_send_link_address(struct drm_dp_mst_topology_mgr *m
>  			drm_dp_send_enum_path_resources(mgr, mstb, port);
>  
>  		if (port->mstb) {
> -			mstb_child = drm_dp_get_validated_mstb_ref(mgr, port->mstb);
> +			mstb_child = drm_dp_mst_topology_get_mstb_validated(mgr, port->mstb);
>  			if (mstb_child) {
>  				drm_dp_check_and_send_link_address(mgr, mstb_child);
> -				drm_dp_put_mst_branch_device(mstb_child);
> +				drm_dp_mst_topology_put_mstb(mstb_child);
>  			}
>  		}
>  	}
> @@ -1375,16 +1627,19 @@ static void drm_dp_mst_link_probe_work(struct work_struct *work)
>  {
>  	struct drm_dp_mst_topology_mgr *mgr = container_of(work, struct drm_dp_mst_topology_mgr, work);
>  	struct drm_dp_mst_branch *mstb;
> +	int ret;
>  
>  	mutex_lock(&mgr->lock);
>  	mstb = mgr->mst_primary;
>  	if (mstb) {
> -		kref_get(&mstb->kref);
> +		ret = drm_dp_mst_topology_get_mstb(mstb);
> +		if (!ret)
> +			mstb = NULL;
>  	}
>  	mutex_unlock(&mgr->lock);
>  	if (mstb) {
>  		drm_dp_check_and_send_link_address(mgr, mstb);
> -		drm_dp_put_mst_branch_device(mstb);
> +		drm_dp_mst_topology_put_mstb(mstb);
>  	}
>  }
>  
> @@ -1695,22 +1950,32 @@ static struct drm_dp_mst_port *drm_dp_get_last_connected_port_to_mstb(struct drm
>  	return drm_dp_get_last_connected_port_to_mstb(mstb->port_parent->parent);
>  }
>  
> -static struct drm_dp_mst_branch *drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr *mgr,
> -									 struct drm_dp_mst_branch *mstb,
> -									 int *port_num)
> +static struct drm_dp_mst_branch *
> +drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr *mgr,
> +					struct drm_dp_mst_branch *mstb,
> +					int *port_num)
>  {
>  	struct drm_dp_mst_branch *rmstb = NULL;
>  	struct drm_dp_mst_port *found_port;
> +
>  	mutex_lock(&mgr->lock);
> -	if (mgr->mst_primary) {
> +	if (!mgr->mst_primary)
> +		goto out;
> +
> +	do {
>  		found_port = drm_dp_get_last_connected_port_to_mstb(mstb);
> +		if (!found_port)
> +			break;
>  
> -		if (found_port) {
> +		if (drm_dp_mst_topology_get_mstb(found_port->parent)) {
>  			rmstb = found_port->parent;
> -			kref_get(&rmstb->kref);
>  			*port_num = found_port->port_num;
> +		} else {
> +			/* Search again, starting from this parent */
> +			mstb = found_port->parent;
>  		}
> -	}
> +	} while (!rmstb);

Hm, is this a bugfix of validating the entire chain? Afaiui the new
topology_get still validates the entire chain, so I'm a bit confused what
this does here.

> +out:
>  	mutex_unlock(&mgr->lock);
>  	return rmstb;
>  }
> @@ -1726,17 +1991,19 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
>  	u8 sinks[DRM_DP_MAX_SDP_STREAMS];
>  	int i;
>  
> -	port = drm_dp_get_validated_port_ref(mgr, port);
> +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
>  	if (!port)
>  		return -EINVAL;
>  
>  	port_num = port->port_num;
> -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
>  	if (!mstb) {
> -		mstb = drm_dp_get_last_connected_port_and_mstb(mgr, port->parent, &port_num);
> +		mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> +							       port->parent,
> +							       &port_num);
>  
>  		if (!mstb) {
> -			drm_dp_put_port(port);
> +			drm_dp_mst_topology_put_port(port);
>  			return -EINVAL;
>  		}
>  	}
> @@ -1766,8 +2033,8 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
>  	}
>  	kfree(txmsg);
>  fail_put:
> -	drm_dp_put_mst_branch_device(mstb);
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_mstb(mstb);
> +	drm_dp_mst_topology_put_port(port);
>  	return ret;
>  }
>  
> @@ -1777,13 +2044,13 @@ int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
>  	struct drm_dp_sideband_msg_tx *txmsg;
>  	int len, ret;
>  
> -	port = drm_dp_get_validated_port_ref(mgr, port);
> +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
>  	if (!port)
>  		return -EINVAL;
>  
>  	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
>  	if (!txmsg) {
> -		drm_dp_put_port(port);
> +		drm_dp_mst_topology_put_port(port);
>  		return -ENOMEM;
>  	}
>  
> @@ -1799,7 +2066,7 @@ int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
>  			ret = 0;
>  	}
>  	kfree(txmsg);
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_port(port);
>  
>  	return ret;
>  }
> @@ -1888,7 +2155,8 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>  		if (vcpi) {
>  			port = container_of(vcpi, struct drm_dp_mst_port,
>  					    vcpi);
> -			port = drm_dp_get_validated_port_ref(mgr, port);
> +			port = drm_dp_mst_topology_get_port_validated(mgr,
> +								      port);
>  			if (!port) {
>  				mutex_unlock(&mgr->payload_lock);
>  				return -EINVAL;
> @@ -1925,7 +2193,7 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>  		cur_slots += req_payload.num_slots;
>  
>  		if (port)
> -			drm_dp_put_port(port);
> +			drm_dp_mst_topology_put_port(port);
>  	}
>  
>  	for (i = 0; i < mgr->max_payloads; i++) {
> @@ -2024,7 +2292,7 @@ static int drm_dp_send_dpcd_write(struct drm_dp_mst_topology_mgr *mgr,
>  	struct drm_dp_sideband_msg_tx *txmsg;
>  	struct drm_dp_mst_branch *mstb;
>  
> -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
>  	if (!mstb)
>  		return -EINVAL;
>  
> @@ -2048,7 +2316,7 @@ static int drm_dp_send_dpcd_write(struct drm_dp_mst_topology_mgr *mgr,
>  	}
>  	kfree(txmsg);
>  fail_put:
> -	drm_dp_put_mst_branch_device(mstb);
> +	drm_dp_mst_topology_put_mstb(mstb);
>  	return ret;
>  }
>  
> @@ -2158,7 +2426,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
>  
>  		/* give this the main reference */
>  		mgr->mst_primary = mstb;
> -		kref_get(&mgr->mst_primary->kref);
> +		drm_dp_mst_topology_ref_mstb(mgr->mst_primary);
>  
>  		ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
>  							 DP_MST_EN | DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC);
> @@ -2192,7 +2460,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
>  out_unlock:
>  	mutex_unlock(&mgr->lock);
>  	if (mstb)
> -		drm_dp_put_mst_branch_device(mstb);
> +		drm_dp_mst_topology_put_mstb(mstb);
>  	return ret;
>  
>  }
> @@ -2357,7 +2625,7 @@ static int drm_dp_mst_handle_down_rep(struct drm_dp_mst_topology_mgr *mgr)
>  			       mgr->down_rep_recv.initial_hdr.lct,
>  				      mgr->down_rep_recv.initial_hdr.rad[0],
>  				      mgr->down_rep_recv.msg[0]);
> -			drm_dp_put_mst_branch_device(mstb);
> +			drm_dp_mst_topology_put_mstb(mstb);
>  			memset(&mgr->down_rep_recv, 0, sizeof(struct drm_dp_sideband_msg_rx));
>  			return 0;
>  		}
> @@ -2368,7 +2636,7 @@ static int drm_dp_mst_handle_down_rep(struct drm_dp_mst_topology_mgr *mgr)
>  		}
>  
>  		memset(&mgr->down_rep_recv, 0, sizeof(struct drm_dp_sideband_msg_rx));
> -		drm_dp_put_mst_branch_device(mstb);
> +		drm_dp_mst_topology_put_mstb(mstb);
>  
>  		mutex_lock(&mgr->qlock);
>  		txmsg->state = DRM_DP_SIDEBAND_TX_RX;
> @@ -2441,7 +2709,7 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
>  		}
>  
>  		if (mstb)
> -			drm_dp_put_mst_branch_device(mstb);
> +			drm_dp_mst_topology_put_mstb(mstb);
>  
>  		memset(&mgr->up_req_recv, 0, sizeof(struct drm_dp_sideband_msg_rx));
>  	}
> @@ -2501,7 +2769,7 @@ enum drm_connector_status drm_dp_mst_detect_port(struct drm_connector *connector
>  	enum drm_connector_status status = connector_status_disconnected;
>  
>  	/* we need to search for the port in the mgr in case its gone */
> -	port = drm_dp_get_validated_port_ref(mgr, port);
> +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
>  	if (!port)
>  		return connector_status_disconnected;
>  
> @@ -2526,7 +2794,7 @@ enum drm_connector_status drm_dp_mst_detect_port(struct drm_connector *connector
>  		break;
>  	}
>  out:
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_port(port);
>  	return status;
>  }
>  EXPORT_SYMBOL(drm_dp_mst_detect_port);
> @@ -2543,11 +2811,11 @@ bool drm_dp_mst_port_has_audio(struct drm_dp_mst_topology_mgr *mgr,
>  {
>  	bool ret = false;
>  
> -	port = drm_dp_get_validated_port_ref(mgr, port);
> +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
>  	if (!port)
>  		return ret;
>  	ret = port->has_audio;
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_port(port);
>  	return ret;
>  }
>  EXPORT_SYMBOL(drm_dp_mst_port_has_audio);
> @@ -2567,7 +2835,7 @@ struct edid *drm_dp_mst_get_edid(struct drm_connector *connector, struct drm_dp_
>  	struct edid *edid = NULL;
>  
>  	/* we need to search for the port in the mgr in case its gone */
> -	port = drm_dp_get_validated_port_ref(mgr, port);
> +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
>  	if (!port)
>  		return NULL;
>  
> @@ -2578,7 +2846,7 @@ struct edid *drm_dp_mst_get_edid(struct drm_connector *connector, struct drm_dp_
>  		drm_connector_set_tile_property(connector);
>  	}
>  	port->has_audio = drm_detect_monitor_audio(edid);
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_port(port);
>  	return edid;
>  }
>  EXPORT_SYMBOL(drm_dp_mst_get_edid);
> @@ -2649,7 +2917,7 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
>  	if (IS_ERR(topology_state))
>  		return PTR_ERR(topology_state);
>  
> -	port = drm_dp_get_validated_port_ref(mgr, port);
> +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
>  	if (port == NULL)
>  		return -EINVAL;
>  	req_slots = DIV_ROUND_UP(pbn, mgr->pbn_div);
> @@ -2657,14 +2925,14 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
>  			req_slots, topology_state->avail_slots);
>  
>  	if (req_slots > topology_state->avail_slots) {
> -		drm_dp_put_port(port);
> +		drm_dp_mst_topology_put_port(port);
>  		return -ENOSPC;
>  	}
>  
>  	topology_state->avail_slots -= req_slots;
>  	DRM_DEBUG_KMS("vcpi slots avail=%d", topology_state->avail_slots);
>  
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_port(port);
>  	return req_slots;
>  }
>  EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
> @@ -2715,7 +2983,7 @@ bool drm_dp_mst_allocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
>  {
>  	int ret;
>  
> -	port = drm_dp_get_validated_port_ref(mgr, port);
> +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
>  	if (!port)
>  		return false;
>  
> @@ -2725,7 +2993,7 @@ bool drm_dp_mst_allocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
>  	if (port->vcpi.vcpi > 0) {
>  		DRM_DEBUG_KMS("payload: vcpi %d already allocated for pbn %d - requested pbn %d\n", port->vcpi.vcpi, port->vcpi.pbn, pbn);
>  		if (pbn == port->vcpi.pbn) {
> -			drm_dp_put_port(port);
> +			drm_dp_mst_topology_put_port(port);
>  			return true;
>  		}
>  	}
> @@ -2733,13 +3001,13 @@ bool drm_dp_mst_allocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
>  	ret = drm_dp_init_vcpi(mgr, &port->vcpi, pbn, slots);
>  	if (ret) {
>  		DRM_DEBUG_KMS("failed to init vcpi slots=%d max=63 ret=%d\n",
> -				DIV_ROUND_UP(pbn, mgr->pbn_div), ret);
> +			      DIV_ROUND_UP(pbn, mgr->pbn_div), ret);
>  		goto out;
>  	}
>  	DRM_DEBUG_KMS("initing vcpi for pbn=%d slots=%d\n",
> -			pbn, port->vcpi.num_slots);
> +		      pbn, port->vcpi.num_slots);
>  
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_port(port);
>  	return true;
>  out:
>  	return false;
> @@ -2749,12 +3017,12 @@ EXPORT_SYMBOL(drm_dp_mst_allocate_vcpi);
>  int drm_dp_mst_get_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
>  {
>  	int slots = 0;
> -	port = drm_dp_get_validated_port_ref(mgr, port);
> +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
>  	if (!port)
>  		return slots;
>  
>  	slots = port->vcpi.num_slots;
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_port(port);
>  	return slots;
>  }
>  EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
> @@ -2768,11 +3036,11 @@ EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
>   */
>  void drm_dp_mst_reset_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
>  {
> -	port = drm_dp_get_validated_port_ref(mgr, port);
> +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
>  	if (!port)
>  		return;
>  	port->vcpi.num_slots = 0;
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_port(port);
>  }
>  EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
>  
> @@ -2781,9 +3049,10 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
>   * @mgr: manager for this port
>   * @port: unverified port to deallocate vcpi for
>   */
> -void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
> +void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
> +				struct drm_dp_mst_port *port)
>  {
> -	port = drm_dp_get_validated_port_ref(mgr, port);
> +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
>  	if (!port)
>  		return;
>  
> @@ -2792,7 +3061,7 @@ void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr, struct drm_
>  	port->vcpi.pbn = 0;
>  	port->vcpi.aligned_pbn = 0;
>  	port->vcpi.vcpi = 0;
> -	drm_dp_put_port(port);
> +	drm_dp_mst_topology_put_port(port);
>  }
>  EXPORT_SYMBOL(drm_dp_mst_deallocate_vcpi);
>  
> @@ -3078,8 +3347,10 @@ static void drm_dp_tx_work(struct work_struct *work)
>  
>  static void drm_dp_free_mst_port(struct kref *kref)
>  {
> -	struct drm_dp_mst_port *port = container_of(kref, struct drm_dp_mst_port, kref);
> -	kref_put(&port->parent->kref, drm_dp_free_mst_branch_device);
> +	struct drm_dp_mst_port *port =
> +		container_of(kref, struct drm_dp_mst_port, malloc_kref);
> +
> +	drm_dp_mst_put_mstb_malloc(port->parent);
>  	kfree(port);
>  }
>  
> @@ -3103,7 +3374,6 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
>  		list_del(&port->next);
>  		mutex_unlock(&mgr->destroy_connector_lock);
>  
> -		kref_init(&port->kref);
>  		INIT_LIST_HEAD(&port->next);
>  
>  		mgr->cbs->destroy_connector(mgr, port->connector);
> @@ -3117,7 +3387,7 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
>  			drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
>  		}
>  
> -		kref_put(&port->kref, drm_dp_free_mst_port);
> +		drm_dp_mst_put_port_malloc(port);
>  		send_hotplug = true;
>  	}
>  	if (send_hotplug)
> @@ -3292,7 +3562,7 @@ static int drm_dp_mst_i2c_xfer(struct i2c_adapter *adapter, struct i2c_msg *msgs
>  	struct drm_dp_sideband_msg_tx *txmsg = NULL;
>  	int ret;
>  
> -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
>  	if (!mstb)
>  		return -EREMOTEIO;
>  
> @@ -3342,7 +3612,7 @@ static int drm_dp_mst_i2c_xfer(struct i2c_adapter *adapter, struct i2c_msg *msgs
>  	}
>  out:
>  	kfree(txmsg);
> -	drm_dp_put_mst_branch_device(mstb);
> +	drm_dp_mst_topology_put_mstb(mstb);
>  	return ret;
>  }
>  
> diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
> index 371cc2816477..50643a39765d 100644
> --- a/include/drm/drm_dp_mst_helper.h
> +++ b/include/drm/drm_dp_mst_helper.h
> @@ -44,7 +44,10 @@ struct drm_dp_vcpi {
>  
>  /**
>   * struct drm_dp_mst_port - MST port
> - * @kref: reference count for this port.
> + * @topology_kref: refcount for this port's lifetime in the topology, only the
> + * DP MST helpers should need to touch this
> + * @malloc_kref: refcount for the memory allocation containing this structure.
> + * See drm_dp_mst_get_port_malloc() and drm_dp_mst_put_port_malloc().
>   * @port_num: port number
>   * @input: if this port is an input port.
>   * @mcs: message capability status - DP 1.2 spec.
> @@ -67,7 +70,8 @@ struct drm_dp_vcpi {
>   * in the MST topology.
>   */
>  struct drm_dp_mst_port {
> -	struct kref kref;
> +	struct kref topology_kref;
> +	struct kref malloc_kref;

I'd to inline member kerneldoc here (you can mix&match, so no need to
rewrite them all) and spend a few words reference the family of get/put
functions. Same for mstb below.

>  
>  	u8 port_num;
>  	bool input;
> @@ -102,7 +106,10 @@ struct drm_dp_mst_port {
>  
>  /**
>   * struct drm_dp_mst_branch - MST branch device.
> - * @kref: reference count for this port.
> + * @topology_kref: refcount for this branch device's lifetime in the topology,
> + * only the DP MST helpers should need to touch this
> + * @malloc_kref: refcount for the memory allocation containing this structure.
> + * See drm_dp_mst_get_mstb_malloc() and drm_dp_mst_put_mstb_malloc().
>   * @rad: Relative Address to talk to this branch device.
>   * @lct: Link count total to talk to this branch device.
>   * @num_ports: number of ports on the branch.
> @@ -121,7 +128,8 @@ struct drm_dp_mst_port {
>   * to downstream port of parent branches.
>   */
>  struct drm_dp_mst_branch {
> -	struct kref kref;
> +	struct kref topology_kref;
> +	struct kref malloc_kref;
>  	u8 rad[8];
>  	u8 lct;
>  	int num_ports;
> @@ -626,4 +634,7 @@ int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
>  int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
>  				 struct drm_dp_mst_port *port, bool power_up);
>  
> +void drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port);
> +void drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port);
> +
>  #endif
> -- 
> 2.19.2

I really like. Mostly concentrated on looking at the docs. Also still
need to apply it and build the docs, so I can appreciate the DOT graphs.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 06/15] drm/i915: Keep malloc references to MST ports
  2018-12-14  1:25 ` [WIP PATCH 06/15] drm/i915: Keep malloc references to MST ports Lyude Paul
@ 2018-12-14  9:32   ` Daniel Vetter
  2018-12-18 21:52     ` Lyude Paul
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Vetter @ 2018-12-14  9:32 UTC (permalink / raw)
  To: Lyude Paul
  Cc: dri-devel, nouveau, intel-gfx, amd-gfx, Daniel Vetter,
	Dave Airlie, Harry Wentland, Jerry Zuo, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, David Airlie, linux-kernel

On Thu, Dec 13, 2018 at 08:25:35PM -0500, Lyude Paul wrote:
> So that the ports stay around until we've destroyed the connectors, in
> order to ensure that we don't pass an invalid pointer to any MST helpers
> once we introduce the new MST VCPI helpers.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> ---
>  drivers/gpu/drm/i915/intel_connector.c | 4 ++++
>  drivers/gpu/drm/i915/intel_dp_mst.c    | 2 ++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_connector.c b/drivers/gpu/drm/i915/intel_connector.c
> index 18e370f607bc..37d2c644f4b8 100644
> --- a/drivers/gpu/drm/i915/intel_connector.c
> +++ b/drivers/gpu/drm/i915/intel_connector.c
> @@ -95,6 +95,10 @@ void intel_connector_destroy(struct drm_connector *connector)
>  	intel_panel_fini(&intel_connector->panel);
>  
>  	drm_connector_cleanup(connector);
> +
> +	if (intel_connector->port)

We set this as the first thing in intel_dp_add_mst_connector, so this
check isn't doing anything.

> +		drm_dp_mst_put_port_malloc(intel_connector->port);
> +
>  	kfree(connector);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
> index f05427b74e34..4d6ced34d465 100644
> --- a/drivers/gpu/drm/i915/intel_dp_mst.c
> +++ b/drivers/gpu/drm/i915/intel_dp_mst.c
> @@ -484,6 +484,8 @@ static struct drm_connector *intel_dp_add_mst_connector(struct drm_dp_mst_topolo
>  	if (ret)
>  		goto err;
>  
> +	drm_dp_mst_get_port_malloc(port);

Needs to be moved up where we assing intel_connector->port, or it'll
underflow on cleanup on error paths.

> +
>  	return connector;
>  
>  err:
> -- 
> 2.19.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 05/15] drm/dp_mst: Fix payload deallocation on hotplugs using malloc refs
  2018-12-14  1:25 ` [WIP PATCH 05/15] drm/dp_mst: Fix payload deallocation on hotplugs using malloc refs Lyude Paul
@ 2018-12-14  9:38   ` Daniel Vetter
  2018-12-18 22:02     ` Lyude Paul
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Vetter @ 2018-12-14  9:38 UTC (permalink / raw)
  To: Lyude Paul
  Cc: dri-devel, nouveau, intel-gfx, amd-gfx, Daniel Vetter,
	Dave Airlie, Harry Wentland, Jerry Zuo, Maarten Lankhorst,
	Maxime Ripard, Sean Paul, David Airlie, linux-kernel

On Thu, Dec 13, 2018 at 08:25:34PM -0500, Lyude Paul wrote:
> Up until now, freeing payloads on remote MST hubs that just had ports
> removed has almost never worked because we've been relying on port
> validation in order to stop us from accessing ports that have already
> been freed from memory, but ports which need their payloads released due
> to being removed will never be a valid part of the topology after
> they've been removed.
> 
> Since we've introduced malloc refs, we can replace all of the validation
> logic in payload helpers which are used for deallocation with some
> well-placed malloc krefs. This ensures that regardless of whether or not
> the ports are still valid and in the topology, any port which has an
> allocated payload will remain allocated in memory until it's payloads
> have been removed - finally allowing us to actually release said
> payloads correctly.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>

I think with this we can also remove the int return value (that everyone
ignored except for some debug output) from drm_dp_update_payload_part1/2.
Follow-up cleanup patch ofc.

This looks good.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 54 +++++++++++++++------------
>  1 file changed, 30 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index ae9d019af9f2..93f08bfd2ab3 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1989,10 +1989,6 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
>  	u8 sinks[DRM_DP_MAX_SDP_STREAMS];
>  	int i;
>  
> -	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> -	if (!port)
> -		return -EINVAL;
> -
>  	port_num = port->port_num;
>  	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
>  	if (!mstb) {
> @@ -2000,10 +1996,8 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
>  							       port->parent,
>  							       &port_num);
>  
> -		if (!mstb) {
> -			drm_dp_mst_topology_put_port(port);
> +		if (!mstb)
>  			return -EINVAL;
> -		}
>  	}
>  
>  	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> @@ -2032,7 +2026,6 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
>  	kfree(txmsg);
>  fail_put:
>  	drm_dp_mst_topology_put_mstb(mstb);
> -	drm_dp_mst_topology_put_port(port);
>  	return ret;
>  }
>  
> @@ -2137,15 +2130,16 @@ static int drm_dp_destroy_payload_step2(struct drm_dp_mst_topology_mgr *mgr,
>   */
>  int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>  {
> -	int i, j;
> -	int cur_slots = 1;
>  	struct drm_dp_payload req_payload;
>  	struct drm_dp_mst_port *port;
> +	int i, j;
> +	int cur_slots = 1;
>  
>  	mutex_lock(&mgr->payload_lock);
>  	for (i = 0; i < mgr->max_payloads; i++) {
>  		struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
>  		struct drm_dp_payload *payload = &mgr->payloads[i];
> +		bool put_port = false;
>  
>  		/* solve the current payloads - compare to the hw ones
>  		   - update the hw view */
> @@ -2153,12 +2147,20 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>  		if (vcpi) {
>  			port = container_of(vcpi, struct drm_dp_mst_port,
>  					    vcpi);
> -			port = drm_dp_mst_topology_get_port_validated(mgr,
> -								      port);
> -			if (!port) {
> -				mutex_unlock(&mgr->payload_lock);
> -				return -EINVAL;
> +
> +			/* Validated ports don't matter if we're releasing
> +			 * VCPI
> +			 */
> +			if (vcpi->num_slots) {
> +				port = drm_dp_mst_topology_get_port_validated(
> +				    mgr, port);
> +				if (!port) {
> +					mutex_unlock(&mgr->payload_lock);
> +					return -EINVAL;
> +				}
> +				put_port = true;
>  			}
> +
>  			req_payload.num_slots = vcpi->num_slots;
>  			req_payload.vcpi = vcpi->vcpi;
>  		} else {
> @@ -2190,7 +2192,7 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>  		}
>  		cur_slots += req_payload.num_slots;
>  
> -		if (port)
> +		if (put_port)
>  			drm_dp_mst_topology_put_port(port);
>  	}
>  
> @@ -3005,6 +3007,8 @@ bool drm_dp_mst_allocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
>  	DRM_DEBUG_KMS("initing vcpi for pbn=%d slots=%d\n",
>  		      pbn, port->vcpi.num_slots);
>  
> +	/* Keep port allocated until it's payload has been removed */
> +	drm_dp_mst_get_port_malloc(port);
>  	drm_dp_mst_topology_put_port(port);
>  	return true;
>  out:
> @@ -3034,11 +3038,12 @@ EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
>   */
>  void drm_dp_mst_reset_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
>  {
> -	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> -	if (!port)
> -		return;
> +	/*
> +	 * A port with VCPI will remain allocated until it's VCPI is
> +	 * released, no verified ref needed
> +	 */
> +
>  	port->vcpi.num_slots = 0;
> -	drm_dp_mst_topology_put_port(port);
>  }
>  EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
>  
> @@ -3050,16 +3055,17 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
>  void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
>  				struct drm_dp_mst_port *port)
>  {
> -	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> -	if (!port)
> -		return;
> +	/*
> +	 * A port with VCPI will remain allocated until it's VCPI is
> +	 * released, no verified ref needed
> +	 */
>  
>  	drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
>  	port->vcpi.num_slots = 0;
>  	port->vcpi.pbn = 0;
>  	port->vcpi.aligned_pbn = 0;
>  	port->vcpi.vcpi = 0;
> -	drm_dp_mst_topology_put_port(port);
> +	drm_dp_mst_put_port_malloc(port);
>  }
>  EXPORT_SYMBOL(drm_dp_mst_deallocate_vcpi);
>  
> -- 
> 2.19.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 04/15] drm/dp_mst: Stop releasing VCPI when removing ports from topology
  2018-12-14  1:25 ` [WIP PATCH 04/15] drm/dp_mst: Stop releasing VCPI when removing ports from topology Lyude Paul
@ 2018-12-14  9:40   ` Daniel Vetter
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2018-12-14  9:40 UTC (permalink / raw)
  To: Lyude Paul
  Cc: dri-devel, nouveau, intel-gfx, amd-gfx, Daniel Vetter,
	Dave Airlie, Harry Wentland, Jerry Zuo, Maarten Lankhorst,
	Maxime Ripard, Sean Paul, David Airlie, linux-kernel

On Thu, Dec 13, 2018 at 08:25:33PM -0500, Lyude Paul wrote:
> This has never actually worked, and isn't needed anyway: the driver's
> always going to try to deallocate VCPI when it tears down the display
> that the VCPI belongs to.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 8 --------
>  1 file changed, 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index c196fb580beb..ae9d019af9f2 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1084,8 +1084,6 @@ static void drm_dp_destroy_port(struct kref *kref)
>  	struct drm_dp_mst_topology_mgr *mgr = port->mgr;
>  
>  	if (!port->input) {
> -		port->vcpi.num_slots = 0;
> -
>  		kfree(port->cached_edid);
>  
>  		/*
> @@ -3381,12 +3379,6 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
>  		drm_dp_port_teardown_pdt(port, port->pdt);
>  		port->pdt = DP_PEER_DEVICE_NONE;
>  
> -		if (!port->input && port->vcpi.vcpi > 0) {
> -			drm_dp_mst_reset_vcpi_slots(mgr, port);
> -			drm_dp_update_payload_part1(mgr);
> -			drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
> -		}
> -
>  		drm_dp_mst_put_port_malloc(port);
>  		send_hotplug = true;
>  	}
> -- 
> 2.19.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 13/15] drm/dp_mst: Start tracking per-port VCPI allocations
  2018-12-14  1:25 ` [WIP PATCH 13/15] drm/dp_mst: Start tracking per-port VCPI allocations Lyude Paul
@ 2018-12-14 16:37   ` Daniel Vetter
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2018-12-14 16:37 UTC (permalink / raw)
  To: Lyude Paul
  Cc: dri-devel, nouveau, intel-gfx, amd-gfx, Daniel Vetter,
	Dave Airlie, Harry Wentland, Jerry Zuo, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, linux-kernel

On Thu, Dec 13, 2018 at 08:25:42PM -0500, Lyude Paul wrote:
> There has been a TODO waiting for quite a long time in
> drm_dp_mst_topology.c:
> 
> 	/* We cannot rely on port->vcpi.num_slots to update
> 	 * topology_state->avail_slots as the port may not exist if the parent
> 	 * branch device was unplugged. This should be fixed by tracking
> 	 * per-port slot allocation in drm_dp_mst_topology_state instead of
> 	 * depending on the caller to tell us how many slots to release.
> 	 */
> 
> That's not the only reason we should fix this: forcing the driver to
> track the VCPI allocations throughout a state's atomic check is
> error prone, because it means that extra care has to be taken with the
> order that drm_dp_atomic_find_vcpi_slots() and
> drm_dp_atomic_release_vcpi_slots() are called in in order to ensure
> idempotency. Currently the only driver actually using these helpers,
> i915, doesn't even do this correctly: multiple ->best_encoder() checks
> with i915's current implementation would not be idempotent and would
> over-allocate VCPI slots, something I learned trying to implement
> fallback retraining in MST.
> 
> So: simplify this whole mess, and teach drm_dp_atomic_find_vcpi_slots()
> and drm_dp_atomic_release_vcpi_slots() to track the VCPI allocations for
> each port. This allows us to ensure idempotency without having to rely
> on the driver as much. Additionally: the driver doesn't need to do any
> kind of VCPI slot tracking anymore if it doesn't need it for it's own
> internal state.
> 
> Additionally; this adds a new drm_dp_mst_atomic_check() helper which
> must be used by atomic drivers to perform validity checks for the new
> VCPI allocations incurred by a state.
> 
> Also: update the documentation and make it more obvious that these
> /must/ be called by /all/ atomic drivers supporting MST.
> 
> Changes since v6:
>  - Keep a kref to all of the ports we have allocations on. This required
>    a good bit of changing to when we call drm_dp_find_vcpi_slots(),
>    mainly that we need to ensure that we only redo VCPI allocations on
>    actual mode or CRTC changes, not crtc_state->active changes.
>    Additionally, we no longer take the registration of the DRM connector
>    for each port into account because so long as we have a kref to the
>    port in the new or previous atomic state, the connector will stay
>    registered.
>  - Use the small changes to drm_dp_put_port() to add even more error
>    checking to make misusage of the helpers more obvious. I added this
>    after having to chase down various use-after-free conditions that
>    started popping up from the new helpers so no one else has to
>    troubleshoot that.
>  - Move some accidental DRM_DEBUG_KMS() calls to DRM_DEBUG_ATOMIC()
>  - Update documentation again, note that find/release() should both not be
>    called on the same port in a single atomic check phase (but multiple
>    calls to one or the other is OK)
> 
> Changes since v4:
>  - Don't skip the atomic checks for VCPI allocations if no new VCPI
>    allocations happen in a state. This makes the next change I'm about
>    to list here a lot easier to implement.
>  - Don't ignore VCPI allocations on destroyed ports, instead ensure that
>    when ports are destroyed and still have VCPI allocations in the
>    topology state, the only state changes allowed are releasing said
>    ports' VCPI. This prevents a state with a mix of VCPI allocations
>    from destroyed ports, and allocations from valid ports.
> 
> Changes since v3:
>  - Don't release VCPI allocations in the topology state immediately in
>    drm_dp_atomic_release_vcpi_slots(), instead mark them as 0 and skip
>    over them in drm_dp_mst_duplicate_state(). This makes it so
>    drm_dp_atomic_release_vcpi_slots() is still idempotent while also
>    throwing warnings if the driver messes up it's book keeping and tries
>    to release VCPI slots on a port that doesn't have any pre-existing
>    VCPI allocation - danvet
>  - Change mst_state/state in some debugging messages to "mst state"
> 
> Changes since v2:
>  - Use kmemdup() for duplicating MST state - danvet
>  - Move port validation out of duplicate state callback - danvet
>  - Handle looping through MST topology states in
>    drm_dp_mst_atomic_check() so the driver doesn't have to do it
>  - Fix documentation in drm_dp_atomic_find_vcpi_slots()
>  - Move the atomic check for each individual topology state into it's
>    own function, reduces indenting
>  - Don't consider "stale" MST ports when calculating the bandwidth
>    requirements. This is needed because originally we relied on the
>    state duplication functions to prune any stale ports from the new
>    state, which would prevent us from incorrectly considering their
>    bandwidth requirements alongside legitimate new payloads.
>  - Add function references in drm_dp_atomic_release_vcpi_slots() - danvet
>  - Annotate atomic VCPI and atomic check functions with __must_check
>    - danvet
> 
> Changes since v1:
>  - Don't use the now-removed ->atomic_check() for private objects hook,
>    just give drivers a function to call themselves
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 271 +++++++++++++++++++++++---
>  drivers/gpu/drm/i915/intel_display.c  |   4 +
>  drivers/gpu/drm/i915/intel_dp_mst.c   |  64 +++---
>  include/drm/drm_dp_mst_helper.h       |  24 ++-
>  4 files changed, 298 insertions(+), 65 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 8d94c8943ac7..b9374c981a5b 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -2897,21 +2897,42 @@ static int drm_dp_init_vcpi(struct drm_dp_mst_topology_mgr *mgr,
>  }
>  
>  /**
> - * drm_dp_atomic_find_vcpi_slots() - Find and add vcpi slots to the state
> + * drm_dp_atomic_find_vcpi_slots() - Find and add VCPI slots to the state
>   * @state: global atomic state
>   * @mgr: MST topology manager for the port
>   * @port: port to find vcpi slots for
>   * @pbn: bandwidth required for the mode in PBN
>   *
> - * RETURNS:
> - * Total slots in the atomic state assigned for this port or error
> + * Allocates VCPI slots to @port, replacing any previous VCPI allocations it
> + * may have had. Any atomic drivers which support MST must call this function
> + * in their &drm_encoder_helper_funcs.atomic_check() callback to change the
> + * current VCPI allocation for the new state, but only when
> + * &drm_crtc_state.mode_changed or &drm_crtc_state.connectors_changed is set
> + * to ensure compatibility with userspace applications that still use the
> + * legacy modesetting UAPI.
> + *
> + * Allocations set by this function are not checked against the bandwidth
> + * restraints of @mgr until the driver calls drm_dp_mst_atomic_check().
> + *
> + * Additionally, it is OK to call this function multiple times on the same
> + * @port as needed. It is not OK however, to call this function and
> + * drm_dp_atomic_release_vcpi_slots() in the same atomic check phase.
> + *
> + * See also:
> + * drm_dp_atomic_release_vcpi_slots()
> + * drm_dp_mst_atomic_check()
> + *
> + * Returns:
> + * Total slots in the atomic state assigned for this port, or a negative error
> + * code if the port no longer exists
>   */
>  int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
>  				  struct drm_dp_mst_topology_mgr *mgr,
>  				  struct drm_dp_mst_port *port, int pbn)
>  {
>  	struct drm_dp_mst_topology_state *topology_state;
> -	int req_slots;
> +	struct drm_dp_vcpi_allocation *pos, *vcpi = NULL;
> +	int prev_slots, req_slots, ret;
>  
>  	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
>  	if (IS_ERR(topology_state))
> @@ -2920,20 +2941,56 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
>  	port = drm_dp_mst_topology_get_port_validated(mgr, port);
>  	if (port == NULL)
>  		return -EINVAL;
> -	req_slots = DIV_ROUND_UP(pbn, mgr->pbn_div);
> -	DRM_DEBUG_KMS("vcpi slots req=%d, avail=%d\n",
> -			req_slots, topology_state->avail_slots);
>  
> -	if (req_slots > topology_state->avail_slots) {
> -		drm_dp_mst_topology_put_port(port);
> -		return -ENOSPC;
> +	topology_state->vcpi_allocated = true;
> +
> +	/* Find the current allocation for this port, if any */
> +	list_for_each_entry(pos, &topology_state->vcpis, next) {
> +		if (pos->port == port) {
> +			vcpi = pos;
> +			prev_slots = vcpi->vcpi;
> +
> +			/*
> +			 * This should never happen, unless the driver tries
> +			 * releasing and allocating the same VCPI allocation,
> +			 * which is an error
> +			 */
> +			if (WARN_ON(!prev_slots)) {
> +				DRM_ERROR("cannot allocate and release VCPI on [MST PORT:%p] in the same state\n",
> +					  port);
> +				return -EINVAL;
> +			}
> +
> +			break;
> +		}
>  	}
> +	if (!vcpi)
> +		prev_slots = 0;
> +
> +	req_slots = DIV_ROUND_UP(pbn, mgr->pbn_div);
> +
> +	DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] [MST PORT:%p] VCPI %d -> %d\n",
> +			 port->connector->base.id, port->connector->name,
> +			 port, prev_slots, req_slots);
>  
> -	topology_state->avail_slots -= req_slots;
> -	DRM_DEBUG_KMS("vcpi slots avail=%d", topology_state->avail_slots);
> +	/* Add the new allocation to the state */
> +	if (!vcpi) {
> +		vcpi = kzalloc(sizeof(*vcpi), GFP_KERNEL);
> +		if (!vcpi) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
>  
> +		drm_dp_mst_get_port_malloc(port);
> +		vcpi->port = port;
> +		list_add(&vcpi->next, &topology_state->vcpis);
> +	}
> +	vcpi->vcpi = req_slots;
> +
> +	ret = req_slots;
> +out:
>  	drm_dp_mst_topology_put_port(port);
> -	return req_slots;
> +	return ret;
>  }
>  EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
>  
> @@ -2941,31 +2998,57 @@ EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
>   * drm_dp_atomic_release_vcpi_slots() - Release allocated vcpi slots
>   * @state: global atomic state
>   * @mgr: MST topology manager for the port
> - * @slots: number of vcpi slots to release
> + * @port: The port to release the VCPI slots from
>   *
> - * RETURNS:
> - * 0 if @slots were added back to &drm_dp_mst_topology_state->avail_slots or
> - * negative error code
> + * Releases any VCPI slots that have been allocated to a port in the atomic
> + * state. Any atomic drivers which support MST must call this function in
> + * their &drm_connector_helper_funcs.atomic_check() callback when the
> + * connector will no longer have VCPI allocated (e.g. because it's CRTC was
> + * removed) when it had VCPI allocated in the previous atomic state.
> + *
> + * It is OK to call this even if @port has been removed from the system.
> + * Additionally, it is OK to call this function multiple times on the same
> + * @port as needed. It is not OK however, to call this function and
> + * drm_dp_atomic_find_vcpi_slots() on the same @port in a single atomic check
> + * phase.
> + *
> + * See also:
> + * drm_dp_atomic_find_vcpi_slots()
> + * drm_dp_mst_atomic_check()
> + *
> + * Returns:
> + * 0 if all slots for this port were added back to
> + * &drm_dp_mst_topology_state.avail_slots or negative error code
>   */
>  int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
>  				     struct drm_dp_mst_topology_mgr *mgr,
> -				     int slots)
> +				     struct drm_dp_mst_port *port)
>  {
>  	struct drm_dp_mst_topology_state *topology_state;
> +	struct drm_dp_vcpi_allocation *pos;
> +	bool found = false;
>  
>  	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
>  	if (IS_ERR(topology_state))
>  		return PTR_ERR(topology_state);
>  
> -	/* We cannot rely on port->vcpi.num_slots to update
> -	 * topology_state->avail_slots as the port may not exist if the parent
> -	 * branch device was unplugged. This should be fixed by tracking
> -	 * per-port slot allocation in drm_dp_mst_topology_state instead of
> -	 * depending on the caller to tell us how many slots to release.
> -	 */
> -	topology_state->avail_slots += slots;
> -	DRM_DEBUG_KMS("vcpi slots released=%d, avail=%d\n",
> -			slots, topology_state->avail_slots);
> +	list_for_each_entry(pos, &topology_state->vcpis, next) {
> +		if (pos->port == port) {
> +			found = true;
> +			break;
> +		}
> +	}
> +	if (WARN_ON(!found)) {
> +		DRM_ERROR("no VCPI for [MST PORT:%p] found in mst state %p\n",
> +			  port, &topology_state->base);
> +		return -EINVAL;
> +	}
> +
> +	DRM_DEBUG_ATOMIC("[MST PORT:%p] VCPI %d -> 0\n", port, pos->vcpi);
> +	if (pos->vcpi) {
> +		drm_dp_mst_put_port_malloc(port);
> +		pos->vcpi = 0;
> +	}
>  
>  	return 0;
>  }
> @@ -3395,15 +3478,42 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
>  static struct drm_private_state *
>  drm_dp_mst_duplicate_state(struct drm_private_obj *obj)
>  {
> -	struct drm_dp_mst_topology_state *state;
> +	struct drm_dp_mst_topology_state *state, *old_state =
> +		to_dp_mst_topology_state(obj->state);
> +	struct drm_dp_vcpi_allocation *pos, *vcpi;
>  
> -	state = kmemdup(obj->state, sizeof(*state), GFP_KERNEL);
> +	state = kmemdup(old_state, sizeof(*state), GFP_KERNEL);
>  	if (!state)
>  		return NULL;
>  
>  	__drm_atomic_helper_private_obj_duplicate_state(obj, &state->base);
>  
> +	INIT_LIST_HEAD(&state->vcpis);
> +	state->vcpi_allocated = false;
> +
> +	list_for_each_entry(pos, &old_state->vcpis, next) {
> +		/* Prune leftover freed VCPI allocations */
> +		if (!pos->vcpi)
> +			continue;
> +
> +		vcpi = kmemdup(pos, sizeof(*vcpi), GFP_KERNEL);
> +		if (!vcpi)
> +			goto fail;
> +
> +		drm_dp_mst_get_port_malloc(vcpi->port);
> +		list_add(&vcpi->next, &state->vcpis);
> +	}
> +
>  	return &state->base;
> +
> +fail:
> +	list_for_each_entry_safe(pos, vcpi, &state->vcpis, next) {
> +		drm_dp_mst_put_port_malloc(pos->port);
> +		kfree(pos);
> +	}
> +	kfree(state);
> +
> +	return NULL;
>  }
>  
>  static void drm_dp_mst_destroy_state(struct drm_private_obj *obj,
> @@ -3411,10 +3521,109 @@ static void drm_dp_mst_destroy_state(struct drm_private_obj *obj,
>  {
>  	struct drm_dp_mst_topology_state *mst_state =
>  		to_dp_mst_topology_state(state);
> +	struct drm_dp_vcpi_allocation *pos, *tmp;
> +
> +	list_for_each_entry_safe(pos, tmp, &mst_state->vcpis, next) {
> +		/* We only keep references to ports with non-zero VCPIs */
> +		if (pos->vcpi)
> +			drm_dp_mst_put_port_malloc(pos->port);
> +		kfree(pos);
> +	}
>  
>  	kfree(mst_state);
>  }
>  
> +static inline int
> +drm_dp_mst_atomic_check_topology_state(struct drm_dp_mst_topology_mgr *mgr,
> +				       struct drm_dp_mst_topology_state *mst_state)
> +{
> +	struct drm_dp_vcpi_allocation *vcpi;
> +	int avail_slots = 63, ret;
> +
> +	/* There's no possible scenario where releasing VCPI or keeping it the
> +	 * same would make the state invalid
> +	 */
> +	if (!mst_state->vcpi_allocated) {
> +		DRM_DEBUG_ATOMIC("[MST MGR:%p] state %p allocates no VCPI, check skipped\n",
> +				 mgr, &mst_state->base);
> +		return 0;
> +	}
> +
> +	list_for_each_entry(vcpi, &mst_state->vcpis, next) {
> +		/* Releasing VCPI is always OK-even if the port is gone */
> +		if (!vcpi->vcpi) {
> +			DRM_DEBUG_ATOMIC("[MST PORT:%p] releases all VCPI slots\n",
> +					 vcpi->port);
> +			continue;
> +		}
> +
> +		if (!drm_dp_mst_topology_get_port(vcpi->port)) {

I think this could blow up. Scenario:
- 2 ports, both plugged in in the same dp mst connector
- both get unplugged
- userspace shuts down the first output (because it's legacy userspace
  it's not going to do both together, or maybe the hotplugs happened close
  together but not at the same time)
- first port is shut down (so passes here), but second port still has it's
  vcpi allocation and would cause a -EINVAL here.

I think we can remove this topology_get check here, and assume that that
drm_connector_is_unregistered earlier will catch anything bad. Or
alternatively check whether this port already had vcpi allocation (through
some vpci_changed bool in the vcpi allocation structure), but that seems a
bit too complicated tbh.

Brain is in w/e mode already, so no full detailed review (and I think we
need to converge on the get/put bikeshed first), but looks good to me
overall.

Cheers, Daniel

> +			DRM_DEBUG_ATOMIC("[MST PORT:%p] is gone but still has VCPI, cannot add new VCPI\n",
> +					 vcpi->port);
> +			return -EINVAL;
> +		}
> +
> +		DRM_DEBUG_ATOMIC("[MST PORT:%p] requires %d vcpi slots\n",
> +				 vcpi->port, vcpi->vcpi);
> +
> +		avail_slots -= vcpi->vcpi;
> +		if (avail_slots < 0) {
> +			DRM_DEBUG_ATOMIC("[MST PORT:%p] not enough VCPI slots in mst state %p (avail=%d)\n",
> +					 vcpi->port, mst_state,
> +					 avail_slots + vcpi->vcpi);
> +			ret = -ENOSPC;
> +			goto port_fail;
> +		}
> +
> +		drm_dp_mst_topology_put_port(vcpi->port);
> +	}
> +	DRM_DEBUG_ATOMIC("[MST MGR:%p] mst state %p VCPI avail=%d used=%d\n",
> +			 mgr, mst_state, avail_slots,
> +			 63 - avail_slots);
> +
> +	return 0;
> +port_fail:
> +	drm_dp_mst_topology_put_port(vcpi->port);
> +	return ret;
> +}
> +
> +/**
> + * drm_dp_mst_atomic_check - Check that the new state of an MST topology in an
> + * atomic update is valid
> + * @state: Pointer to the new &struct drm_dp_mst_topology_state
> + *
> + * Checks the given topology state for an atomic update to ensure that it's
> + * valid. This includes checking whether there's enough bandwidth to support
> + * the new VCPI allocations in the atomic update.
> + *
> + * Any atomic drivers supporting DP MST must make sure to call this after
> + * checking the rest of their state in their
> + * &drm_mode_config_funcs.atomic_check() callback.
> + *
> + * See also:
> + * drm_dp_atomic_find_vcpi_slots()
> + * drm_dp_atomic_release_vcpi_slots()
> + *
> + * Returns:
> + *
> + * 0 if the new state is valid, negative error code otherwise.
> + */
> +int drm_dp_mst_atomic_check(struct drm_atomic_state *state)
> +{
> +	struct drm_dp_mst_topology_mgr *mgr;
> +	struct drm_dp_mst_topology_state *mst_state;
> +	int i, ret = 0;
> +
> +	for_each_new_mst_mgr_in_state(state, mgr, mst_state, i) {
> +		ret = drm_dp_mst_atomic_check_topology_state(mgr, mst_state);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_dp_mst_atomic_check);
> +
>  const struct drm_private_state_funcs drm_dp_mst_topology_state_funcs = {
>  	.atomic_duplicate_state = drm_dp_mst_duplicate_state,
>  	.atomic_destroy_state = drm_dp_mst_destroy_state,
> @@ -3497,9 +3706,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr,
>  		return -ENOMEM;
>  
>  	mst_state->mgr = mgr;
> -
> -	/* max. time slots - one slot for MTP header */
> -	mst_state->avail_slots = 63;
> +	INIT_LIST_HEAD(&mst_state->vcpis);
>  
>  	drm_atomic_private_obj_init(dev, &mgr->base,
>  				    &mst_state->base,
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 2c3f3f68d506..868cb897f629 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -12690,6 +12690,10 @@ static int intel_atomic_check(struct drm_device *dev,
>  				       "[modeset]" : "[fastset]");
>  	}
>  
> +	ret = drm_dp_mst_atomic_check(state);
> +	if (ret)
> +		return ret;
> +
>  	if (any_ms) {
>  		ret = intel_modeset_checks(state);
>  
> diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
> index 4d6ced34d465..6754d7922a34 100644
> --- a/drivers/gpu/drm/i915/intel_dp_mst.c
> +++ b/drivers/gpu/drm/i915/intel_dp_mst.c
> @@ -41,8 +41,13 @@ static bool intel_dp_mst_compute_config(struct intel_encoder *encoder,
>  	struct drm_connector *connector = conn_state->connector;
>  	void *port = to_intel_connector(connector)->port;
>  	struct drm_atomic_state *state = pipe_config->base.state;
> +	struct drm_crtc *crtc = pipe_config->base.crtc;
> +	struct drm_crtc_state *old_crtc_state =
> +		drm_atomic_get_old_crtc_state(state, crtc);
> +	struct drm_crtc_state *new_crtc_state = &pipe_config->base;
>  	int bpp;
> -	int lane_count, slots = 0;
> +	int lane_count, slots =
> +		to_intel_crtc_state(old_crtc_state)->dp_m_n.tu;
>  	const struct drm_display_mode *adjusted_mode = &pipe_config->base.adjusted_mode;
>  	int mst_pbn;
>  	bool constant_n = drm_dp_has_quirk(&intel_dp->desc,
> @@ -77,8 +82,12 @@ static bool intel_dp_mst_compute_config(struct intel_encoder *encoder,
>  	mst_pbn = drm_dp_calc_pbn_mode(adjusted_mode->crtc_clock, bpp);
>  	pipe_config->pbn = mst_pbn;
>  
> -	/* Zombie connectors can't have VCPI slots */
> -	if (!drm_connector_is_unregistered(connector)) {
> +	/* Only change VCPI allocation on actual mode changes, to prevent us
> +	 * from trying to allocate VCPI to ports that no longer exist when we
> +	 * may just be trying to disable DPMS on them
> +	 */
> +	if (new_crtc_state->mode_changed ||
> +	    new_crtc_state->connectors_changed) {
>  		slots = drm_dp_atomic_find_vcpi_slots(state,
>  						      &intel_dp->mst_mgr,
>  						      port,
> @@ -107,36 +116,39 @@ static bool intel_dp_mst_compute_config(struct intel_encoder *encoder,
>  	return true;
>  }
>  
> -static int intel_dp_mst_atomic_check(struct drm_connector *connector,
> -		struct drm_connector_state *new_conn_state)
> +static int
> +intel_dp_mst_atomic_check(struct drm_connector *connector,
> +			  struct drm_connector_state *new_conn_state)
>  {
> +	struct intel_connector *intel_connector =
> +		to_intel_connector(connector);
> +	struct drm_dp_mst_topology_mgr *mgr =
> +		&intel_connector->mst_port->mst_mgr;
> +	struct drm_dp_mst_port *port = intel_connector->port;
>  	struct drm_atomic_state *state = new_conn_state->state;
> -	struct drm_connector_state *old_conn_state;
> -	struct drm_crtc *old_crtc;
> -	struct drm_crtc_state *crtc_state;
> -	int slots, ret = 0;
> +	struct drm_connector_state *old_conn_state =
> +		drm_atomic_get_old_connector_state(state, connector);
> +	struct drm_crtc *old_crtc = old_conn_state->crtc,
> +			*new_crtc = new_conn_state->crtc;
> +	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
>  
> -	old_conn_state = drm_atomic_get_old_connector_state(state, connector);
> -	old_crtc = old_conn_state->crtc;
>  	if (!old_crtc)
> -		return ret;
> +		return 0;
>  
> -	crtc_state = drm_atomic_get_new_crtc_state(state, old_crtc);
> -	slots = to_intel_crtc_state(crtc_state)->dp_m_n.tu;
> -	if (drm_atomic_crtc_needs_modeset(crtc_state) && slots > 0) {
> -		struct drm_dp_mst_topology_mgr *mgr;
> -		struct drm_encoder *old_encoder;
> +	old_crtc_state = drm_atomic_get_old_crtc_state(state, old_crtc);
> +	if (!old_crtc_state ||
> +	    !to_intel_crtc_state(old_crtc_state)->dp_m_n.tu)
> +		return 0;
>  
> -		old_encoder = old_conn_state->best_encoder;
> -		mgr = &enc_to_mst(old_encoder)->primary->dp.mst_mgr;
> +	/* If we switched CRTCs, clear the previous one's allocation */
> +	new_crtc_state = drm_atomic_get_new_crtc_state(state, old_crtc);
> +	if (new_crtc_state->connectors_changed && !new_crtc_state->enable)
> +		to_intel_crtc_state(new_crtc_state)->dp_m_n.tu = 0;
>  
> -		ret = drm_dp_atomic_release_vcpi_slots(state, mgr, slots);
> -		if (ret)
> -			DRM_DEBUG_KMS("failed releasing %d vcpi slots:%d\n", slots, ret);
> -		else
> -			to_intel_crtc_state(crtc_state)->dp_m_n.tu = 0;
> -	}
> -	return ret;
> +	if (new_crtc)
> +		return 0;
> +
> +	return drm_dp_atomic_release_vcpi_slots(state, mgr, port);
>  }
>  
>  static void intel_mst_disable_dp(struct intel_encoder *encoder,
> diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
> index b8a8d75410d0..79e3f9c730a5 100644
> --- a/include/drm/drm_dp_mst_helper.h
> +++ b/include/drm/drm_dp_mst_helper.h
> @@ -412,9 +412,16 @@ struct drm_dp_payload {
>  
>  #define to_dp_mst_topology_state(x) container_of(x, struct drm_dp_mst_topology_state, base)
>  
> +struct drm_dp_vcpi_allocation {
> +	struct drm_dp_mst_port *port;
> +	int vcpi;
> +	struct list_head next;
> +};
> +
>  struct drm_dp_mst_topology_state {
>  	struct drm_private_state base;
> -	int avail_slots;
> +	struct list_head vcpis;
> +	u8 vcpi_allocated;
>  	struct drm_dp_mst_topology_mgr *mgr;
>  };
>  
> @@ -625,14 +632,17 @@ void drm_dp_mst_topology_mgr_suspend(struct drm_dp_mst_topology_mgr *mgr);
>  int drm_dp_mst_topology_mgr_resume(struct drm_dp_mst_topology_mgr *mgr);
>  struct drm_dp_mst_topology_state *drm_atomic_get_mst_topology_state(struct drm_atomic_state *state,
>  								    struct drm_dp_mst_topology_mgr *mgr);
> -int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
> -				  struct drm_dp_mst_topology_mgr *mgr,
> -				  struct drm_dp_mst_port *port, int pbn);
> -int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
> -				     struct drm_dp_mst_topology_mgr *mgr,
> -				     int slots);
> +int __must_check
> +drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
> +			      struct drm_dp_mst_topology_mgr *mgr,
> +			      struct drm_dp_mst_port *port, int pbn);
> +int __must_check
> +drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
> +				 struct drm_dp_mst_topology_mgr *mgr,
> +				 struct drm_dp_mst_port *port);
>  int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
>  				 struct drm_dp_mst_port *port, bool power_up);
> +int __must_check drm_dp_mst_atomic_check(struct drm_atomic_state *state);
>  
>  void drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port);
>  void drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port);
> -- 
> 2.19.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 01/15] drm/dp_mst: Remove bogus conditional in drm_dp_update_payload_part1()
  2018-12-14  8:42   ` Daniel Vetter
@ 2018-12-17 20:09     ` Wentland, Harry
  0 siblings, 0 replies; 31+ messages in thread
From: Wentland, Harry @ 2018-12-17 20:09 UTC (permalink / raw)
  To: Lyude Paul, dri-devel, nouveau, intel-gfx, amd-gfx, Dave Airlie,
	Zuo, Jerry, Juston Li, Maarten Lankhorst, Maxime Ripard,
	Sean Paul, David Airlie, linux-kernel



On 2018-12-14 3:42 a.m., Daniel Vetter wrote:
> On Thu, Dec 13, 2018 at 08:25:30PM -0500, Lyude Paul wrote:
>> There's no reason we need this, it's just confusing looking.
>>
>> Signed-off-by: Lyude Paul <lyude@redhat.com>
>> Cc: Juston Li <juston.li@intel.com>
>> ---
>>  drivers/gpu/drm/drm_dp_mst_topology.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
>> index ad0fb6d003be..9b1b5c9b1fa0 100644
>> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
>> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
>> @@ -1896,9 +1896,7 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>>  			req_payload.num_slots = 0;
>>  		}
>>  
>> -		if (mgr->payloads[i].start_slot != req_payload.start_slot) {
>> -			mgr->payloads[i].start_slot = req_payload.start_slot;
>> -		}
>> +		mgr->payloads[i].start_slot = req_payload.start_slot;
> 
> Entertaining!
> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 

Reviewed-by: Harry Wentland <harry.wentland@amd.com>

Harry

>>  		/* work out what is required to happen with this payload */
>>  		if (mgr->payloads[i].num_slots != req_payload.num_slots) {
>>  
>> -- 
>> 2.19.2
>>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 02/15] drm/dp_mst: Refactor drm_dp_update_payload_part1()
  2018-12-14  8:47   ` Daniel Vetter
@ 2018-12-17 20:27     ` Wentland, Harry
  0 siblings, 0 replies; 31+ messages in thread
From: Wentland, Harry @ 2018-12-17 20:27 UTC (permalink / raw)
  To: Lyude Paul, dri-devel, nouveau, intel-gfx, amd-gfx, Dave Airlie,
	Zuo, Jerry, Juston Li, Maarten Lankhorst, Maxime Ripard,
	Sean Paul, David Airlie, linux-kernel

On 2018-12-14 3:47 a.m., Daniel Vetter wrote:
> On Thu, Dec 13, 2018 at 08:25:31PM -0500, Lyude Paul wrote:
>> There should be no functional changes here
> 
> Would be good to explain what you did refactor here, instead of me trying
> to reconstruct it from the patch. Especially pre-coffee that helps :-)

I concur. Something like "use local variables to improve readability".

With that fixed this is
Reviewed-by: Harry Wentland <harry.wentland@amd.com>

Harry

>>
>> Signed-off-by: Lyude Paul <lyude@redhat.com>
>> Cc: Juston Li <juston.li@intel.com>
>> ---
>>  drivers/gpu/drm/drm_dp_mst_topology.c | 71 ++++++++++++++++-----------
>>  1 file changed, 42 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
>> index 9b1b5c9b1fa0..2ab16c9e6243 100644
>> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
>> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
>> @@ -1879,39 +1879,48 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>>  
>>  	mutex_lock(&mgr->payload_lock);
>>  	for (i = 0; i < mgr->max_payloads; i++) {
>> +		struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
>> +		struct drm_dp_payload *payload = &mgr->payloads[i];
>> +
>>  		/* solve the current payloads - compare to the hw ones
>>  		   - update the hw view */
>>  		req_payload.start_slot = cur_slots;
>> -		if (mgr->proposed_vcpis[i]) {
>> -			port = container_of(mgr->proposed_vcpis[i], struct drm_dp_mst_port, vcpi);
>> +		if (vcpi) {
>> +			port = container_of(vcpi, struct drm_dp_mst_port,
>> +					    vcpi);
>>  			port = drm_dp_get_validated_port_ref(mgr, port);
>>  			if (!port) {
>>  				mutex_unlock(&mgr->payload_lock);
>>  				return -EINVAL;
>>  			}
>> -			req_payload.num_slots = mgr->proposed_vcpis[i]->num_slots;
>> -			req_payload.vcpi = mgr->proposed_vcpis[i]->vcpi;
>> +			req_payload.num_slots = vcpi->num_slots;
>> +			req_payload.vcpi = vcpi->vcpi;
>>  		} else {
>>  			port = NULL;
>>  			req_payload.num_slots = 0;
>>  		}
>>  
>> -		mgr->payloads[i].start_slot = req_payload.start_slot;
>> +		payload->start_slot = req_payload.start_slot;
>>  		/* work out what is required to happen with this payload */
>> -		if (mgr->payloads[i].num_slots != req_payload.num_slots) {
>> +		if (payload->num_slots != req_payload.num_slots) {
>>  
>>  			/* need to push an update for this payload */
>>  			if (req_payload.num_slots) {
>> -				drm_dp_create_payload_step1(mgr, mgr->proposed_vcpis[i]->vcpi, &req_payload);
>> -				mgr->payloads[i].num_slots = req_payload.num_slots;
>> -				mgr->payloads[i].vcpi = req_payload.vcpi;
>> -			} else if (mgr->payloads[i].num_slots) {
>> -				mgr->payloads[i].num_slots = 0;
>> -				drm_dp_destroy_payload_step1(mgr, port, mgr->payloads[i].vcpi, &mgr->payloads[i]);
>> -				req_payload.payload_state = mgr->payloads[i].payload_state;
>> -				mgr->payloads[i].start_slot = 0;
>> +				drm_dp_create_payload_step1(mgr, vcpi->vcpi,
>> +							    &req_payload);
>> +				payload->num_slots = req_payload.num_slots;
>> +				payload->vcpi = req_payload.vcpi;
>> +
>> +			} else if (payload->num_slots) {
>> +				payload->num_slots = 0;
>> +				drm_dp_destroy_payload_step1(mgr, port,
>> +							     payload->vcpi,
>> +							     payload);
>> +				req_payload.payload_state =
>> +					payload->payload_state;
>> +				payload->start_slot = 0;
>>  			}
>> -			mgr->payloads[i].payload_state = req_payload.payload_state;
>> +			payload->payload_state = req_payload.payload_state;
>>  		}
>>  		cur_slots += req_payload.num_slots;
>>  
>> @@ -1920,22 +1929,26 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>>  	}
>>  
>>  	for (i = 0; i < mgr->max_payloads; i++) {
>> -		if (mgr->payloads[i].payload_state == DP_PAYLOAD_DELETE_LOCAL) {
>> -			DRM_DEBUG_KMS("removing payload %d\n", i);
>> -			for (j = i; j < mgr->max_payloads - 1; j++) {
>> -				memcpy(&mgr->payloads[j], &mgr->payloads[j + 1], sizeof(struct drm_dp_payload));
>> -				mgr->proposed_vcpis[j] = mgr->proposed_vcpis[j + 1];
>> -				if (mgr->proposed_vcpis[j] && mgr->proposed_vcpis[j]->num_slots) {
>> -					set_bit(j + 1, &mgr->payload_mask);
>> -				} else {
>> -					clear_bit(j + 1, &mgr->payload_mask);
>> -				}
>> -			}
>> -			memset(&mgr->payloads[mgr->max_payloads - 1], 0, sizeof(struct drm_dp_payload));
>> -			mgr->proposed_vcpis[mgr->max_payloads - 1] = NULL;
>> -			clear_bit(mgr->max_payloads, &mgr->payload_mask);
>> +		if (mgr->payloads[i].payload_state != DP_PAYLOAD_DELETE_LOCAL)
>> +			continue;
>> +
>> +		DRM_DEBUG_KMS("removing payload %d\n", i);
>> +		for (j = i; j < mgr->max_payloads - 1; j++) {
>> +			mgr->payloads[j] = mgr->payloads[j + 1];
>> +			mgr->proposed_vcpis[j] = mgr->proposed_vcpis[j + 1];
>>  
>> +			if (mgr->proposed_vcpis[j] &&
>> +			    mgr->proposed_vcpis[j]->num_slots) {
>> +				set_bit(j + 1, &mgr->payload_mask);
>> +			} else {
>> +				clear_bit(j + 1, &mgr->payload_mask);
>> +			}
>>  		}
>> +
>> +		memset(&mgr->payloads[mgr->max_payloads - 1], 0,
>> +		       sizeof(struct drm_dp_payload));
>> +		mgr->proposed_vcpis[mgr->max_payloads - 1] = NULL;
>> +		clear_bit(mgr->max_payloads, &mgr->payload_mask);
> 
> With the commit message improved to mention
> - Add local variables in the first loop
> - Early continue to reduce 1 indent in the 2nd loop
> 
> this is Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
>>  	}
>>  	mutex_unlock(&mgr->payload_lock);
>>  
>> -- 
>> 2.19.2
>>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports
  2018-12-14  9:29   ` Daniel Vetter
@ 2018-12-18 21:27     ` Lyude Paul
  2018-12-19 12:48       ` Daniel Vetter
  0 siblings, 1 reply; 31+ messages in thread
From: Lyude Paul @ 2018-12-18 21:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, nouveau, intel-gfx, amd-gfx, Dave Airlie,
	Harry Wentland, Jerry Zuo, David Airlie, Juston Li,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	Jonathan Corbet, linux-doc, linux-kernel

On Fri, 2018-12-14 at 10:29 +0100, Daniel Vetter wrote:
> On Thu, Dec 13, 2018 at 08:25:32PM -0500, Lyude Paul wrote:
> > The current way of handling refcounting in the DP MST helpers is really
> > confusing and probably just plain wrong because it's been hacked up many
> > times over the years without anyone actually going over the code and
> > seeing if things could be simplified.
> > 
> > To the best of my understanding, the current scheme works like this:
> > drm_dp_mst_port and drm_dp_mst_branch both have a single refcount. When
> > this refcount hits 0 for either of the two, they're removed from the
> > topology state, but not immediately freed. Both ports and branch devices
> > will reinitialize their kref once it's hit 0 before actually destroying
> > themselves. The intended purpose behind this is so that we can avoid
> > problems like not being able to free a remote payload that might still
> > be active, due to us having removed all of the port/branch device
> > structures in memory, as per:
> > 
> > 91a25e463130 ("drm/dp/mst: deallocate payload on port destruction")
> > 
> > Which may have worked, but then it caused use-after-free errors. Being
> > new to MST at the time, I tried fixing it;
> > 
> > 263efde31f97 ("drm/dp/mst: Get validated port ref in
> > drm_dp_update_payload_part1()")
> > 
> > But, that was broken: both drm_dp_mst_port and drm_dp_mst_branch structs
> > are validated in almost every DP MST helper function. Simply put, this
> > means we go through the topology and try to see if the given
> > drm_dp_mst_branch or drm_dp_mst_port is still attached to something
> > before trying to use it in order to avoid dereferencing freed memory
> > (something that has happened a LOT in the past with this library).
> > Because of this it doesn't actually matter whether or not we keep keep
> > the ports and branches around in memory as that's not enough, because
> > any function that validates the branches and ports passed to it will
> > still reject them anyway since they're no longer in the topology
> > structure. So, use-after-free errors were fixed but payload deallocation
> > was completely broken.
> > 
> > Two years later, AMD informed me about this issue and I attempted to
> > come up with a temporary fix, pending a long-overdue cleanup of this
> > library:
> > 
> > c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just
> > ref")
> > 
> > But then that introduced use-after-free errors, so I quickly reverted
> > it:
> > 
> > 9765635b3075 ("Revert "drm/dp_mst: Skip validating ports during
> > destruction, just ref"")
> > 
> > And in the process, learned that there is just no simple fix for this:
> > the design is just broken. Unfortuntely, the usage of these helpers are
> > quite broken as well. Some drivers like i915 have been smart enough to
> > avoid accessing any kind of information from MST port structures, but
> > others like nouveau have assumed, understandably so, that
> > drm_dp_mst_port structures are normal and can just be accessed at any
> > time without worrying about use-after-free errors.
> > 
> > After a lot of discussion, me and Daniel Vetter came up with a better
> > idea to replace all of this.
> > 
> > To summarize, since this is documented far more indepth in the
> > documentation this patch introduces, we make it so that drm_dp_mst_port
> > and drm_dp_mst_branch structures have two different classes of
> > refcounts: topology_kref, and malloc_kref. topology_kref corresponds to
> > the lifetime of the given drm_dp_mst_port or drm_dp_mst_branch in it's
> > given topology. Once it hits zero, any associated connectors are removed
> > and the branch or port can no longer be validated. malloc_kref
> > corresponds to the lifetime of the memory allocation for the actual
> > structure, and will always be non-zero so long as the topology_kref is
> > non-zero. This gives us a way to allow callers to hold onto port and
> > branch device structures past their topology lifetime, and dramatically
> > simplifies the lifetimes of both structures. This also finally fixes the
> > port deallocation problem, properly.
> > 
> > Additionally: since this now means that we can keep ports and branch
> > devices allocated in memory for however long we need, we no longer need
> > a significant amount of the port validation that we currently do.
> > 
> > Additionally, there is one last scenario that this fixes, which couldn't
> > have been fixed properly beforehand:
> > 
> > - CPU1 unrefs port from topology (refcount 1->0)
> > - CPU2 refs port in topology(refcount 0->1)
> > 
> > Since we now can guarantee memory safety for ports and branches
> > as-needed, we also can make our main reference counting functions fix
> > this problem by using kref_get_unless_zero() internally so that topology
> > refcounts can only ever reach 0 once.
> > 
> > Signed-off-by: Lyude Paul <lyude@redhat.com>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: David Airlie <airlied@redhat.com>
> > Cc: Jerry Zuo <Jerry.Zuo@amd.com>
> > Cc: Harry Wentland <harry.wentland@amd.com>
> > Cc: Juston Li <juston.li@intel.com>
> > ---
> >  .../gpu/dp-mst/topology-figure-1.dot          |  31 ++
> >  .../gpu/dp-mst/topology-figure-2.dot          |  37 ++
> >  .../gpu/dp-mst/topology-figure-3.dot          |  40 ++
> >  Documentation/gpu/drm-kms-helpers.rst         | 125 ++++-
> >  drivers/gpu/drm/drm_dp_mst_topology.c         | 512 +++++++++++++-----
> >  include/drm/drm_dp_mst_helper.h               |  19 +-
> >  6 files changed, 637 insertions(+), 127 deletions(-)
> >  create mode 100644 Documentation/gpu/dp-mst/topology-figure-1.dot
> >  create mode 100644 Documentation/gpu/dp-mst/topology-figure-2.dot
> >  create mode 100644 Documentation/gpu/dp-mst/topology-figure-3.dot
> 
> Yay, docs, and pretty ones at that! Awesome stuff :-)
> 
> > diff --git a/Documentation/gpu/dp-mst/topology-figure-1.dot
> > b/Documentation/gpu/dp-mst/topology-figure-1.dot
> > new file mode 100644
> > index 000000000000..fb83789e0a3e
> > --- /dev/null
> > +++ b/Documentation/gpu/dp-mst/topology-figure-1.dot
> > @@ -0,0 +1,31 @@
> > +digraph T {
> > +    /* Topology references */
> > +    node [shape=oval];
> > +    mstb1 -> {port1, port2};
> > +    port1 -> mstb2;
> > +    port2 -> mstb3 -> {port3, port4};
> > +    port3 -> mstb4;
> > +
> > +    /* Malloc references */
> > +    edge [style=dashed];
> > +    mstb4 -> port3;
> > +    {port4, port3} -> mstb3;
> > +    mstb3 -> port2;
> > +    mstb2 -> port1;
> > +    {port1, port2} -> mstb1;
> > +
> > +    edge [dir=back];
> > +    node [style=filled;shape=box;fillcolor=lightblue];
> > +    port1 -> "Payload #1";
> > +    port3 -> "Payload #2";
> > +
> > +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> > +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> > +    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
> > +    mstb4 [label="MSTB #4";style=filled;fillcolor=palegreen];
> > +
> > +    port1 [label="Port #1"];
> > +    port2 [label="Port #2"];
> > +    port3 [label="Port #3"];
> > +    port4 [label="Port #4"];
> > +}
> > diff --git a/Documentation/gpu/dp-mst/topology-figure-2.dot
> > b/Documentation/gpu/dp-mst/topology-figure-2.dot
> > new file mode 100644
> > index 000000000000..eebce560be40
> > --- /dev/null
> > +++ b/Documentation/gpu/dp-mst/topology-figure-2.dot
> > @@ -0,0 +1,37 @@
> > +digraph T {
> > +    /* Topology references */
> > +    node [shape=oval];
> > +
> > +    mstb1 -> {port1, port2};
> > +    port1 -> mstb2;
> > +    edge [color=red];
> > +    port2 -> mstb3 -> {port3, port4};
> > +    port3 -> mstb4;
> > +    edge [color=""];
> > +
> > +    /* Malloc references */
> > +    edge [style=dashed];
> > +    port3 -> mstb3;
> > +    mstb3 -> port2;
> > +    mstb2 -> port1;
> > +    {port1, port2} -> mstb1;
> > +    edge [color=red];
> > +    mstb4 -> port3;
> > +    port4 -> mstb3;
> > +    edge [color=""];
> > +
> > +    edge [dir=back];
> > +    node [style=filled;shape=box;fillcolor=lightblue];
> > +    port1 -> "Payload #1";
> > +    port3 -> "Payload #2";
> > +
> > +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> > +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> > +    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
> > +    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
> > +
> > +    port1 [label="Port #1"];
> > +    port2 [label="Port #2"];
> > +    port3 [label="Port #3"];
> > +    port4 [label="Port #4";style=filled;fillcolor=grey];
> > +}
> > diff --git a/Documentation/gpu/dp-mst/topology-figure-3.dot
> > b/Documentation/gpu/dp-mst/topology-figure-3.dot
> > new file mode 100644
> > index 000000000000..9bf28d87144c
> > --- /dev/null
> > +++ b/Documentation/gpu/dp-mst/topology-figure-3.dot
> > @@ -0,0 +1,40 @@
> > +digraph T {
> > +    /* Topology references */
> > +    node [shape=oval];
> > +
> > +    mstb1 -> {port1, port2};
> > +    port1 -> mstb2;
> > +    edge [color=grey];
> > +    port2 -> mstb3 -> {port3, port4};
> > +    port3 -> mstb4;
> > +    edge [color=""];
> > +
> > +    /* Malloc references */
> > +    edge [style=dashed];
> > +    port3 -> mstb3 [penwidth=3];
> > +    mstb3 -> port2 [penwidth=3];
> > +    mstb2 -> port1;
> > +    {port1, port2} -> mstb1;
> > +    edge [color=grey];
> > +    mstb4 -> port3;
> > +    port4 -> mstb3;
> > +    edge [color=""];
> > +
> > +    edge [dir=back];
> > +    node [style=filled;shape=box;fillcolor=lightblue];
> > +    port1 -> payload1;
> > +    port3 -> payload2 [penwidth=3];
> > +
> > +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> > +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> > +    mstb3 [label="MSTB #3";penwidth=3;style=filled;fillcolor=palegreen];
> > +    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
> > +
> > +    port1 [label="Port #1"];
> > +    port2 [label="Port #2";penwidth=3];
> > +    port3 [label="Port #3";penwidth=3];
> > +    port4 [label="Port #4";style=filled;fillcolor=grey];
> > +
> > +    payload1 [label="Payload #1"];
> > +    payload2 [label="Payload #2";penwidth=3];
> > +}
> > diff --git a/Documentation/gpu/drm-kms-helpers.rst
> > b/Documentation/gpu/drm-kms-helpers.rst
> > index b422eb8edf16..c0f994c2c72f 100644
> > --- a/Documentation/gpu/drm-kms-helpers.rst
> > +++ b/Documentation/gpu/drm-kms-helpers.rst
> > @@ -208,8 +208,11 @@ Display Port Dual Mode Adaptor Helper Functions
> > Reference
> >  .. kernel-doc:: drivers/gpu/drm/drm_dp_dual_mode_helper.c
> >     :export:
> >  
> > -Display Port MST Helper Functions Reference
> > -===========================================
> > +Display Port MST Helpers
> > +========================
> > +
> > +Functions Reference
> > +-------------------
> >  
> >  .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> >     :doc: dp mst helper
> > @@ -220,6 +223,124 @@ Display Port MST Helper Functions Reference
> >  .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> >     :export:
> >  
> > +Branch device and port refcounting
> > +----------------------------------
> 
> I generally try to put the long-form explanations before the function
> references. Since usually the references completely drown out everything
> else and make it harder to spot the important overview stuff.
> 
> 
> > +
> > +Overview
> > +~~~~~~~~
> > +
> > +The refcounting schemes for :c:type:`struct drm_dp_mst_branch` and
> > +:c:type:`struct drm_dp_mst_port` are somewhat unusual. Both ports and
> > branch
> > +devices have two different kinds of refcounts: topology refcounts, and
> > malloc
> > +refcounts.
> > +
> > +Topology refcount overview
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Topology refcounts are not exposed to drivers, and are handled internally
> > by the
> > +DP MST helpers. The helpers use them in order to prevent the in-memory
> > topology
> > +state from being changed in the middle of critical operations like
> > changing the
> > +internal state of payload allocations. This means each branch and port
> > will be
> > +considered to be connected to the rest of the topology until it's
> > topology
> > +refcount reaches zero. Additionally, for ports this means that their
> > associated
> > +:c:type:`struct drm_connector` will stay registered with userspace until
> > the
> > +port's refcount reaches 0.
> > +
> > +
> > +Topology refcount functions
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The DP MST helpers use the following functions to manage topology
> > refcounts:
> > +
> > +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > +   :functions: drm_dp_mst_topology_get_port drm_dp_mst_topology_put_port
> > +               drm_dp_mst_topology_ref_port drm_dp_mst_topology_get_mstb
> > +               drm_dp_mst_topology_put_mstb drm_dp_mst_topology_ref_mstb
> > +
> > +Malloc refcount overview
> > +~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Malloc references are used to keep a :c:type:`struct drm_dp_mst_port` or
> > +:c:type:`struct drm_dp_mst_branch` allocated even after all of its
> > topology
> > +references have been dropped, so that the driver or MST helpers can
> > safely
> > +access each branch's last known state before it was disconnected from the
> > +topology. When the malloc refcount of a port or branch reaches 0, the
> > memory
> > +allocation containing the :c:type:`struct drm_dp_mst_branch` or
> > :c:type:`struct
> > +drm_dp_mst_port` respectively will be freed.
> > +
> > +Malloc refcounts for ports
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +For :c:type:`struct drm_dp_mst_port`, malloc refcounts are exposed to
> > drivers
> > +through the following functions:
> > +
> > +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > +   :functions: drm_dp_mst_get_port_malloc drm_dp_mst_put_port_malloc
> > +
> > +Malloc refcounts for branch devices
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +For :c:type:`struct drm_dp_mst_branch`, malloc refcounts are not
> > currently
> > +exposed to drivers. As of writing this documentation, there are no
> > drivers that
> > +have a usecase for accessing :c:type:`struct drm_dp_mst_branch` outside
> > of the
> > +MST helpers. Exposing this API to drivers in a race-free manner would
> > take more
> > +tweaking of the refcounting scheme, however patches are welcome provided
> > there
> > +is a legitimate driver usecase for this.
> > +
> > +Internally, malloc refcounts for :c:type:`struct drm_dp_mst_branch` are
> > managed
> > +by the DP MST core through the following functions:
> > +
> > +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > +   :functions: drm_dp_mst_get_mstb_malloc drm_dp_mst_put_mstb_malloc
> > +
> > +Refcount relationships in a topology
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Let's take a look at why the relationship between topology and malloc
> > refcounts
> > +is designed the way it is.
> > +
> > +.. kernel-figure:: dp-mst/topology-figure-1.dot
> > +
> > +   An example of topology and malloc refs in a DP MST topology with two
> > active
> > +   payloads. Topology refcount increments are indicated by solid lines,
> > and
> > +   malloc refcount increments are indicated by dashed lines. Each starts
> > from
> > +   the branch which incremented the refcount, and ends at the branch to
> > which
> > +   the refcount belongs to.
> > +
> > +As you can see in figure 1, every branch increments the topology
> > +refcount of it's children, and increments the malloc refcount of it's
> > parent.
> > +Additionally, every payload increments the malloc refcount of it's
> > assigned port
> > +by 1.
> > +
> > +So, what would happen if MSTB #3 from the above figure was unplugged from
> > the
> > +system, but the driver hadn't yet removed payload #2 from port #3? The
> > topology
> > +would start to look like figure 2.
> > +
> > +.. kernel-figure:: dp-mst/topology-figure-2.dot
> > +
> > +   Ports and branch devices which have been released from memory are
> > colored
> > +   grey, and references which have been removed are colored red.
> > +
> > +Whenever a port or branch device's topology refcount reaches zero, it
> > will
> > +decrement the topology refcounts of all its children, the malloc refcount
> > of its
> > +parent, and finally its own malloc refcount. For MSTB #4 and port #4,
> > this means
> > +they both have been disconnected from the topology and freed from memory.
> > But,
> > +because payload #2 is still holding a reference to port #3, port #3 is
> > removed
> > +from the topology but it's :c:type:`struct drm_dp_mst_port` is still
> > accessible
> > +from memory. This also means port #3 has not yet decremented the malloc
> > refcount
> > +of MSTB #3, so it's :c:type:`struct drm_dp_mst_branch` will also stay
> > allocated
> > +in memory until port #3's malloc refcount reaches 0.
> > +
> > +This relationship is necessary because in order to release payload #2, we
> > +need to be able to figure out the last relative of port #3 that's still
> > +connected to the topology. In this case, we would travel up the topology
> > as
> > +shown in figure 3.
> > +
> > +.. kernel-figure:: dp-mst/topology-figure-3.dot
> > +
> > +And finally, remove payload #2 by communicating with port #2 through
> > sideband
> > +transactions.
> 
> (Blind guess, I haven't looked ahead in the series yet)
> 
> I assume that drivers also want to hold a malloc reference from their
> connector, until that connector is destroyed completed (and we hence know
> it released all its vcpi and other stuff and really doesn't need the port
> anymore). Could we integrated that into these neat graphs too? Answering
> the "so how does this integrate into my driver?" question is imo the most
> important part for core api docs.
> 
> Another one: Any reason for not putting this right into the code as a DOC:
> section? Ime moving docs as close as possible to the code improves the
> odds it's kept up-to-date. The only overview texts I've left in the .rst
> is the stuff that describes overall concepts (e.g. how all the kms objects
> fit together).
> 
> All the sphinx/rst syntax should carry over 1:1, except in kerneldoc you
> also can benefit from the abbreviated reference syntax from kerneldoc.
> 
> Anyway, really great stuff.
> 
> > +
> >  MIPI DSI Helper Functions Reference
> >  ===================================
> >  
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 2ab16c9e6243..c196fb580beb 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -46,7 +46,7 @@ static bool dump_dp_payload_table(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  				  char *buf);
> >  static int test_calc_pbn_mode(void);
> >  
> > -static void drm_dp_put_port(struct drm_dp_mst_port *port);
> > +static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port);
> >  
> >  static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
> >  				     int id,
> > @@ -850,46 +850,120 @@ static struct drm_dp_mst_branch
> > *drm_dp_add_mst_branch_device(u8 lct, u8 *rad)
> >  	if (lct > 1)
> >  		memcpy(mstb->rad, rad, lct / 2);
> >  	INIT_LIST_HEAD(&mstb->ports);
> > -	kref_init(&mstb->kref);
> > +	kref_init(&mstb->topology_kref);
> > +	kref_init(&mstb->malloc_kref);
> >  	return mstb;
> >  }
> >  
> >  static void drm_dp_free_mst_port(struct kref *kref);
> > +static void drm_dp_free_mst_branch_device(struct kref *kref);
> 
> I'd move the functions around, forward declarations for static functions
> is a bit silly
> 
> > +
> > +/**
> > + * drm_dp_mst_get_mstb_malloc() - Increment the malloc refcount of a
> > branch
> > + * device
> > + * @mstb: The &struct drm_dp_mst_branch to increment the malloc refcount
> > of
> > + *
> > + * Increments @mstb.malloc_kref. When @mstb.malloc_kref reaches 0, the
> > memory
> 
> s/@/&/ for structure member references. @ references to parameters/members
> in the same kerneldoc type only. With & you'll get a nice link, @ is just
> markup (and yes & with a member unfortunately doesn't link to the member,
> only the overall structure).
> 
> Similarly below.
> 
> > + * allocation for @mstb will be released and @mstb may no longer be used.
> > + *
> > + * Any malloc references acquired with this function must be released
> > when
> > + * they are no longer being used by calling drm_dp_mst_put_mstb_malloc().
> 
> I'd dropped "when they are no longer being used", and the line below too.
> Short docs are better generally because attention span of readers.
> 
> > + *
> > + * See also: drm_dp_mst_put_mstb_malloc()
> > + */
> > +static void
> > +drm_dp_mst_get_mstb_malloc(struct drm_dp_mst_branch *mstb)
> > +{
> > +	kref_get(&mstb->malloc_kref);
> > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref));
> > +}
> > +
> > +/**
> > + * drm_dp_mst_put_mstb_malloc() - Decrement the malloc refcount of a
> > branch
> > + * device
> > + * @mstb: The &struct drm_dp_mst_branch to decrement the malloc refcount
> > of
> > + *
> > + * Decrements @mstb.malloc_kref. When @mstb.malloc_kref reaches 0, the
> > memory
> > + * allocation for @mstb will be released and @mstb may no longer be used.
> > + *
> > + * See also: drm_dp_mst_get_mstb_malloc()
> > + */
> > +static void
> > +drm_dp_mst_put_mstb_malloc(struct drm_dp_mst_branch *mstb)
> > +{
> > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref)-1);
> > +	kref_put(&mstb->malloc_kref, drm_dp_free_mst_branch_device);
> > +}
> > +
> > +/**
> > + * drm_dp_mst_get_port_malloc() - Increment the malloc refcount of an MST
> > port
> > + * @port: The &struct drm_dp_mst_port to increment the malloc refcount of
> > + *
> > + * Increments @port.malloc_kref. When @port.malloc_kref reaches 0, the
> > memory
> > + * allocation for @port will be released and @port may no longer be used.
> > + *
> > + * Because @port could potentially be freed at any time by the DP MST
> > helpers
> > + * if @port.malloc_kref reaches 0, including during a call to this
> > function,
> > + * drivers that which to make use of &struct drm_dp_mst_port should
> > ensure
> > + * that they grab at least one main malloc reference to their MST ports
> > in
> > + * &drm_dp_mst_topology_cbs.add_connector. This callback is called before
> > + * there is any chance for @port.malloc_kref to reach 0.
> > + *
> > + * Any malloc references acquired with this function must be released
> > when
> > + * they are no longer being used by calling drm_dp_mst_put_port_malloc().
> > + *
> > + * See also: drm_dp_mst_put_port_malloc()
> 
> Same reduction as with mstb_malloc version.
> 
> > + */
> > +void
> > +drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port)
> > +{
> > +	kref_get(&port->malloc_kref);
> > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref));
> > +}
> > +EXPORT_SYMBOL(drm_dp_mst_get_port_malloc);
> > +
> > +/**
> > + * drm_dp_mst_put_port_malloc() - Decrement the malloc refcount of an MST
> > port
> > + * @port: The &struct drm_dp_mst_port to decrement the malloc refcount of
> > + *
> > + * Decrements @port.malloc_kref. When @port.malloc_kref reaches 0, the
> > memory
> > + * allocation for @port will be released and @port may no longer be used.
> > + *
> > + * See also: drm_dp_mst_get_port_malloc()
> > + */
> > +void
> > +drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port)
> > +{
> > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref)-1);
> > +	kref_put(&port->malloc_kref, drm_dp_free_mst_port);
> > +}
> > +EXPORT_SYMBOL(drm_dp_mst_put_port_malloc);
> >  
> >  static void drm_dp_free_mst_branch_device(struct kref *kref)
> >  {
> > -	struct drm_dp_mst_branch *mstb = container_of(kref, struct
> > drm_dp_mst_branch, kref);
> > -	if (mstb->port_parent) {
> > -		if (list_empty(&mstb->port_parent->next))
> > -			kref_put(&mstb->port_parent->kref,
> > drm_dp_free_mst_port);
> > -	}
> > +	struct drm_dp_mst_branch *mstb =
> > +		container_of(kref, struct drm_dp_mst_branch, malloc_kref);
> > +
> > +	if (mstb->port_parent)
> > +		drm_dp_mst_put_port_malloc(mstb->port_parent);
> > +
> >  	kfree(mstb);
> >  }
> >  
> >  static void drm_dp_destroy_mst_branch_device(struct kref *kref)
> >  {
> > -	struct drm_dp_mst_branch *mstb = container_of(kref, struct
> > drm_dp_mst_branch, kref);
> > +	struct drm_dp_mst_branch *mstb =
> > +		container_of(kref, struct drm_dp_mst_branch, topology_kref);
> > +	struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
> >  	struct drm_dp_mst_port *port, *tmp;
> >  	bool wake_tx = false;
> >  
> > -	/*
> > -	 * init kref again to be used by ports to remove mst branch when it is
> > -	 * not needed anymore
> > -	 */
> > -	kref_init(kref);
> > -
> > -	if (mstb->port_parent && list_empty(&mstb->port_parent->next))
> > -		kref_get(&mstb->port_parent->kref);
> > -
> > -	/*
> > -	 * destroy all ports - don't need lock
> > -	 * as there are no more references to the mst branch
> > -	 * device at this point.
> > -	 */
> > +	mutex_lock(&mgr->lock);
> >  	list_for_each_entry_safe(port, tmp, &mstb->ports, next) {
> >  		list_del(&port->next);
> > -		drm_dp_put_port(port);
> > +		drm_dp_mst_topology_put_port(port);
> >  	}
> > +	mutex_unlock(&mgr->lock);
> 
> Would be nice to split this out (to highlight the bugfix more), but
> because of the kref_init() hack not really feasible I think :-/
> >  
> >  	/* drop any tx slots msg */
> >  	mutex_lock(&mstb->mgr->qlock);
> > @@ -908,14 +982,82 @@ static void drm_dp_destroy_mst_branch_device(struct
> > kref *kref)
> >  	if (wake_tx)
> >  		wake_up_all(&mstb->mgr->tx_waitq);
> >  
> > -	kref_put(kref, drm_dp_free_mst_branch_device);
> > +	drm_dp_mst_put_mstb_malloc(mstb);
> >  }
> >  
> > -static void drm_dp_put_mst_branch_device(struct drm_dp_mst_branch *mstb)
> > +/**
> > + * drm_dp_mst_topology_get_mstb() - Increment the topology refcount of a
> > + * branch device unless its zero
> > + * @mstb: &struct drm_dp_mst_branch to increment the topology refcount of
> > + *
> > + * Attempts to grab a topology reference to @mstb, if it hasn't yet been
> > + * removed from the topology (e.g. @mstb.topology_kref has reached 0).
> > + *
> > + * Any topology references acquired with this function must be released
> > when
> > + * they are no longer being used by calling
> > drm_dp_mst_topology_put_mstb().
> 
> I'd explain the relationship with malloc_kref a bit here:
> 
> - topology ref implies a malloc ref, hence you can call get_mstb_malloc
>   with only holding a topology ref (might be better to explain this in the
>   get_mstb_malloc kerneldoc, since it also applies to the unconditional
>   kref_get below)
> - malloc_ref is enough to call this function, but then it can fail
> 
> > + *
> > + * See also:
> > + * drm_dp_mst_topology_ref_mstb()
> 
> I'd write out when you should use this one instead:
> 
> "If you already have a topology reference you should use other_function()
> instead."
> 
> > + * drm_dp_mst_topology_get_mstb()
> 
> This is this function itself :-)
> 
> > + *
> > + * Returns:
> > + * * 1: A topology reference was grabbed successfully
> > + * * 0: @port is no longer in the topology, no reference was grabbed
> > + */
> > +static int __must_check
> > +drm_dp_mst_topology_get_mstb(struct drm_dp_mst_branch *mstb)
> 
> Hm if you both want a kref_get and a kref_get_unless_zero then we need
> better naming. topology_get_mstb should be the unconditional kref_get, the
> conditional kref_get_unless_zero needs some indication that it could fail.
> We need some verb that indicates that instead of "get":
> - "validate" since we've used that one already
> - "lookup" that's used by all the drm_mode_object lookup functions, feels
>   a bit misleading
> - "try_get"
> 
> >  {
> > -	kref_put(&mstb->kref, drm_dp_destroy_mst_branch_device);
> > +	int ret = kref_get_unless_zero(&mstb->topology_kref);
> > +
> > +	if (ret)
> > +		DRM_DEBUG("mstb %p (%d)\n", mstb,
> > +			  kref_read(&mstb->topology_kref));
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * drm_dp_mst_topology_ref_mstb() - Increment the topology refcount of a
> > + * branch device
> > + * @mstb: The &struct drm_dp_mst_branch to increment the topology
> > refcount of
> > + *
> > + * Increments @mstb.topology_refcount without checking whether or not
> > it's
> > + * already reached 0. This is only valid to use in scenarios where you
> > are
> > + * already guaranteed to have at least one active topology reference to
> > @mstb.
> > + * Otherwise, drm_dp_mst_topology_get_mstb() should be used.
> 
> s/should/must/  (or my English understanding is off, afaiui "should" isn't
> a strict requirement per rfc2119)
> 
> > + *
> > + * Any topology references acquired with this function must be released
> > when
> > + * they are no longer being used by calling
> > drm_dp_mst_topology_put_mstb().
> > + *
> > + * See also:
> > + * drm_dp_mst_topology_get_mstb()
> > + * drm_dp_mst_topology_put_mstb()
> > + */
> > +static void
> > +drm_dp_mst_topology_ref_mstb(struct drm_dp_mst_branch *mstb)
> > +{
> 
> Should we have a WARN_ON(refcount == 0) here?
> 
> > +	kref_get(&mstb->topology_kref);
> > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->topology_kref));
> >  }
> >  
> > +/**
> > + * drm_dp_mst_topology_put_mstb() - release a topology reference to a
> > branch
> > + * device
> > + * @mstb: The &struct drm_dp_mst_branch to release the topology reference
> > from
> > + *
> > + * Releases a topology reference from @mstb by decrementing
> > + * @mstb.topology_kref.
> > + *
> > + * See also:
> > + * drm_dp_mst_topology_get_mstb()
> > + * drm_dp_mst_topology_ref_mstb()
> > + */
> > +static void
> > +drm_dp_mst_topology_put_mstb(struct drm_dp_mst_branch *mstb)
> > +{
> > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->topology_kref)-1);
> > +	kref_put(&mstb->topology_kref, drm_dp_destroy_mst_branch_device);
> > +}
> >  
> >  static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int
> > old_pdt)
> >  {
> > @@ -930,14 +1072,15 @@ static void drm_dp_port_teardown_pdt(struct
> > drm_dp_mst_port *port, int old_pdt)
> >  	case DP_PEER_DEVICE_MST_BRANCHING:
> >  		mstb = port->mstb;
> >  		port->mstb = NULL;
> > -		drm_dp_put_mst_branch_device(mstb);
> > +		drm_dp_mst_topology_put_mstb(mstb);
> >  		break;
> >  	}
> >  }
> >  
> >  static void drm_dp_destroy_port(struct kref *kref)
> >  {
> > -	struct drm_dp_mst_port *port = container_of(kref, struct
> > drm_dp_mst_port, kref);
> > +	struct drm_dp_mst_port *port =
> > +		container_of(kref, struct drm_dp_mst_port, topology_kref);
> >  	struct drm_dp_mst_topology_mgr *mgr = port->mgr;
> >  
> >  	if (!port->input) {
> > @@ -956,7 +1099,6 @@ static void drm_dp_destroy_port(struct kref *kref)
> >  			 * from an EDID retrieval */
> >  
> >  			mutex_lock(&mgr->destroy_connector_lock);
> > -			kref_get(&port->parent->kref);
> >  			list_add(&port->next, &mgr->destroy_connector_list);
> >  			mutex_unlock(&mgr->destroy_connector_lock);
> >  			schedule_work(&mgr->destroy_connector_work);
> > @@ -967,25 +1109,93 @@ static void drm_dp_destroy_port(struct kref *kref)
> >  		drm_dp_port_teardown_pdt(port, port->pdt);
> >  		port->pdt = DP_PEER_DEVICE_NONE;
> >  	}
> > -	kfree(port);
> > +	drm_dp_mst_put_port_malloc(port);
> >  }
> >  
> > -static void drm_dp_put_port(struct drm_dp_mst_port *port)
> > +/**
> > + * drm_dp_mst_topology_get_port() - Increment the topology refcount of a
> > + * port unless its zero
> > + * @port: &struct drm_dp_mst_port to increment the topology refcount of
> > + *
> > + * Attempts to grab a topology reference to @port, if it hasn't yet been
> > + * removed from the topology (e.g. @port.topology_kref has reached 0).
> > + *
> > + * Any topology references acquired with this function must be released
> > when
> > + * they are no longer being used by calling
> > drm_dp_mst_topology_put_port().
> > + *
> > + * See also:
> > + * drm_dp_mst_topology_ref_port()
> > + * drm_dp_mst_topology_put_port()
> > + *
> > + * Returns:
> > + * * 1: A topology reference was grabbed successfully
> > + * * 0: @port is no longer in the topology, no reference was grabbed
> > + */
> > +static int __must_check
> > +drm_dp_mst_topology_get_port(struct drm_dp_mst_port *port)
> >  {
> > -	kref_put(&port->kref, drm_dp_destroy_port);
> > +	int ret = kref_get_unless_zero(&port->topology_kref);
> > +
> > +	if (ret)
> > +		DRM_DEBUG("port %p (%d)\n", port,
> > +			  kref_read(&port->topology_kref));
> > +
> > +	return ret;
> >  }
> >  
> > -static struct drm_dp_mst_branch
> > *drm_dp_mst_get_validated_mstb_ref_locked(struct drm_dp_mst_branch *mstb,
> > struct drm_dp_mst_branch *to_find)
> > +/**
> > + * drm_dp_mst_topology_ref_port() - Increment the topology refcount of a
> > port
> > + * @port: The &struct drm_dp_mst_port to increment the topology refcount
> > of
> > + *
> > + * Increments @port.topology_refcount without checking whether or not
> > it's
> > + * already reached 0. This is only valid to use in scenarios where you
> > are
> > + * already guaranteed to have at least one active topology reference to
> > @port.
> > + * Otherwise, drm_dp_mst_topology_get_port() should be used.
> > + *
> > + * Any topology references acquired with this function must be released
> > when
> > + * they are no longer being used by calling
> > drm_dp_mst_topology_put_port().
> > + *
> > + * See also:
> > + * drm_dp_mst_topology_get_port()
> > + * drm_dp_mst_topology_put_port()
> > + */
> > +static void drm_dp_mst_topology_ref_port(struct drm_dp_mst_port *port)
> > +{
> > +	kref_get(&port->topology_kref);
> > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->topology_kref));
> > +}
> > +
> > +/**
> > + * drm_dp_mst_topology_put_port() - release a topology reference to a
> > port
> > + * @port: The &struct drm_dp_mst_port to release the topology reference
> > from
> > + *
> > + * Releases a topology reference from @port by decrementing
> > + * @port.topology_kref.
> > + *
> > + * See also:
> > + * drm_dp_mst_topology_get_port()
> > + * drm_dp_mst_topology_ref_port()
> > + */
> > +static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port)
> > +{
> > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->topology_kref)-1);
> > +	kref_put(&port->topology_kref, drm_dp_destroy_port);
> > +}
> > +
> > +static struct drm_dp_mst_branch *
> > +drm_dp_mst_topology_get_mstb_validated_locked(struct drm_dp_mst_branch
> > *mstb,
> > +					      struct drm_dp_mst_branch
> > *to_find)
> >  {
> >  	struct drm_dp_mst_port *port;
> >  	struct drm_dp_mst_branch *rmstb;
> > -	if (to_find == mstb) {
> > -		kref_get(&mstb->kref);
> > +
> > +	if (to_find == mstb)
> >  		return mstb;
> > -	}
> > +
> >  	list_for_each_entry(port, &mstb->ports, next) {
> >  		if (port->mstb) {
> > -			rmstb = drm_dp_mst_get_validated_mstb_ref_locked(port-
> > >mstb, to_find);
> > +			rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
> 
> I think a prep patch which just renames the current get_validated/put
> functions to the new names would be really good. Then this patch here with
> the new stuff.
> 
> 
> > +			    port->mstb, to_find);
> >  			if (rmstb)
> >  				return rmstb;
> >  		}
> > @@ -993,27 +1203,37 @@ static struct drm_dp_mst_branch
> > *drm_dp_mst_get_validated_mstb_ref_locked(struct
> >  	return NULL;
> >  }
> >  
> > -static struct drm_dp_mst_branch *drm_dp_get_validated_mstb_ref(struct
> > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_branch *mstb)
> > +static struct drm_dp_mst_branch *
> > +drm_dp_mst_topology_get_mstb_validated(struct drm_dp_mst_topology_mgr
> > *mgr,
> > +				       struct drm_dp_mst_branch *mstb)
> >  {
> >  	struct drm_dp_mst_branch *rmstb = NULL;
> > +
> >  	mutex_lock(&mgr->lock);
> > -	if (mgr->mst_primary)
> > -		rmstb = drm_dp_mst_get_validated_mstb_ref_locked(mgr-
> > >mst_primary, mstb);
> > +	if (mgr->mst_primary) {
> > +		rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
> > +		    mgr->mst_primary, mstb);
> > +
> > +		if (rmstb && !drm_dp_mst_topology_get_mstb(rmstb))
> > +			rmstb = NULL;
> > +	}
> >  	mutex_unlock(&mgr->lock);
> >  	return rmstb;
> >  }
> >  
> > -static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct
> > drm_dp_mst_branch *mstb, struct drm_dp_mst_port *to_find)
> > +static struct drm_dp_mst_port *
> > +drm_dp_mst_topology_get_port_validated_locked(struct drm_dp_mst_branch
> > *mstb,
> > +					      struct drm_dp_mst_port *to_find)
> >  {
> >  	struct drm_dp_mst_port *port, *mport;
> >  
> >  	list_for_each_entry(port, &mstb->ports, next) {
> > -		if (port == to_find) {
> > -			kref_get(&port->kref);
> > +		if (port == to_find)
> >  			return port;
> > -		}
> > +
> >  		if (port->mstb) {
> > -			mport = drm_dp_mst_get_port_ref_locked(port->mstb,
> > to_find);
> > +			mport = drm_dp_mst_topology_get_port_validated_locked(
> > +			    port->mstb, to_find);
> >  			if (mport)
> >  				return mport;
> >  		}
> > @@ -1021,12 +1241,20 @@ static struct drm_dp_mst_port
> > *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
> >  	return NULL;
> >  }
> >  
> > -static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct
> > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
> > +static struct drm_dp_mst_port *
> > +drm_dp_mst_topology_get_port_validated(struct drm_dp_mst_topology_mgr
> > *mgr,
> > +				       struct drm_dp_mst_port *port)
> >  {
> >  	struct drm_dp_mst_port *rport = NULL;
> > +
> >  	mutex_lock(&mgr->lock);
> > -	if (mgr->mst_primary)
> > -		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
> > port);
> > +	if (mgr->mst_primary) {
> > +		rport = drm_dp_mst_topology_get_port_validated_locked(
> > +		    mgr->mst_primary, port);
> > +
> > +		if (rport && !drm_dp_mst_topology_get_port(rport))
> > +			rport = NULL;
> > +	}
> >  	mutex_unlock(&mgr->lock);
> >  	return rport;
> >  }
> > @@ -1034,11 +1262,12 @@ static struct drm_dp_mst_port
> > *drm_dp_get_validated_port_ref(struct drm_dp_mst_t
> >  static struct drm_dp_mst_port *drm_dp_get_port(struct drm_dp_mst_branch
> > *mstb, u8 port_num)
> >  {
> >  	struct drm_dp_mst_port *port;
> > +	int ret;
> >  
> >  	list_for_each_entry(port, &mstb->ports, next) {
> >  		if (port->port_num == port_num) {
> > -			kref_get(&port->kref);
> > -			return port;
> > +			ret = drm_dp_mst_topology_get_port(port);
> > +			return ret ? port : NULL;
> >  		}
> >  	}
> >  
> > @@ -1087,6 +1316,11 @@ static bool drm_dp_port_setup_pdt(struct
> > drm_dp_mst_port *port)
> >  		if (port->mstb) {
> >  			port->mstb->mgr = port->mgr;
> >  			port->mstb->port_parent = port;
> > +			/*
> > +			 * Make sure this port's memory allocation stays
> > +			 * around until it's child MSTB releases it
> > +			 */
> > +			drm_dp_mst_get_port_malloc(port);
> >  
> >  			send_link = true;
> >  		}
> > @@ -1147,17 +1381,26 @@ static void drm_dp_add_port(struct
> > drm_dp_mst_branch *mstb,
> >  	bool created = false;
> >  	int old_pdt = 0;
> >  	int old_ddps = 0;
> > +
> >  	port = drm_dp_get_port(mstb, port_msg->port_number);
> >  	if (!port) {
> >  		port = kzalloc(sizeof(*port), GFP_KERNEL);
> >  		if (!port)
> >  			return;
> > -		kref_init(&port->kref);
> > +		kref_init(&port->topology_kref);
> > +		kref_init(&port->malloc_kref);
> >  		port->parent = mstb;
> >  		port->port_num = port_msg->port_number;
> >  		port->mgr = mstb->mgr;
> >  		port->aux.name = "DPMST";
> >  		port->aux.dev = dev->dev;
> > +
> > +		/*
> > +		 * Make sure the memory allocation for our parent branch stays
> > +		 * around until our own memory allocation is released
> > +		 */
> > +		drm_dp_mst_get_mstb_malloc(mstb);
> > +
> >  		created = true;
> >  	} else {
> >  		old_pdt = port->pdt;
> > @@ -1177,7 +1420,7 @@ static void drm_dp_add_port(struct drm_dp_mst_branch
> > *mstb,
> >  	   for this list */
> >  	if (created) {
> >  		mutex_lock(&mstb->mgr->lock);
> > -		kref_get(&port->kref);
> > +		drm_dp_mst_topology_ref_port(port);
> >  		list_add(&port->next, &mstb->ports);
> >  		mutex_unlock(&mstb->mgr->lock);
> >  	}
> > @@ -1202,17 +1445,21 @@ static void drm_dp_add_port(struct
> > drm_dp_mst_branch *mstb,
> >  	if (created && !port->input) {
> >  		char proppath[255];
> >  
> > -		build_mst_prop_path(mstb, port->port_num, proppath,
> > sizeof(proppath));
> > -		port->connector = (*mstb->mgr->cbs->add_connector)(mstb->mgr,
> > port, proppath);
> > +		build_mst_prop_path(mstb, port->port_num, proppath,
> > +				    sizeof(proppath));
> > +		port->connector = (*mstb->mgr->cbs->add_connector)(mstb->mgr,
> > +								   port,
> > +								   proppath);
> >  		if (!port->connector) {
> >  			/* remove it from the port list */
> >  			mutex_lock(&mstb->mgr->lock);
> >  			list_del(&port->next);
> >  			mutex_unlock(&mstb->mgr->lock);
> >  			/* drop port list reference */
> > -			drm_dp_put_port(port);
> > +			drm_dp_mst_topology_put_port(port);
> >  			goto out;
> >  		}
> > +
> >  		if ((port->pdt == DP_PEER_DEVICE_DP_LEGACY_CONV ||
> >  		     port->pdt == DP_PEER_DEVICE_SST_SINK) &&
> >  		    port->port_num >= DP_MST_LOGICAL_PORT_0) {
> > @@ -1224,7 +1471,7 @@ static void drm_dp_add_port(struct drm_dp_mst_branch
> > *mstb,
> >  
> >  out:
> >  	/* put reference to this port */
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_port(port);
> >  }
> >  
> >  static void drm_dp_update_port(struct drm_dp_mst_branch *mstb,
> > @@ -1259,7 +1506,7 @@ static void drm_dp_update_port(struct
> > drm_dp_mst_branch *mstb,
> >  			dowork = true;
> >  	}
> >  
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_port(port);
> >  	if (dowork)
> >  		queue_work(system_long_wq, &mstb->mgr->work);
> >  
> > @@ -1270,7 +1517,7 @@ static struct drm_dp_mst_branch
> > *drm_dp_get_mst_branch_device(struct drm_dp_mst_
> >  {
> >  	struct drm_dp_mst_branch *mstb;
> >  	struct drm_dp_mst_port *port;
> > -	int i;
> > +	int i, ret;
> >  	/* find the port by iterating down */
> >  
> >  	mutex_lock(&mgr->lock);
> > @@ -1295,7 +1542,9 @@ static struct drm_dp_mst_branch
> > *drm_dp_get_mst_branch_device(struct drm_dp_mst_
> >  			}
> >  		}
> >  	}
> > -	kref_get(&mstb->kref);
> > +	ret = drm_dp_mst_topology_get_mstb(mstb);
> > +	if (!ret)
> > +		mstb = NULL;
> >  out:
> >  	mutex_unlock(&mgr->lock);
> >  	return mstb;
> > @@ -1325,19 +1574,22 @@ static struct drm_dp_mst_branch
> > *get_mst_branch_device_by_guid_helper(
> >  	return NULL;
> >  }
> >  
> > -static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device_by_guid(
> > -	struct drm_dp_mst_topology_mgr *mgr,
> > -	uint8_t *guid)
> > +static struct drm_dp_mst_branch *
> > +drm_dp_get_mst_branch_device_by_guid(struct drm_dp_mst_topology_mgr *mgr,
> > +				     uint8_t *guid)
> >  {
> >  	struct drm_dp_mst_branch *mstb;
> > +	int ret;
> >  
> >  	/* find the port by iterating down */
> >  	mutex_lock(&mgr->lock);
> >  
> >  	mstb = get_mst_branch_device_by_guid_helper(mgr->mst_primary, guid);
> > -
> > -	if (mstb)
> > -		kref_get(&mstb->kref);
> > +	if (mstb) {
> > +		ret = drm_dp_mst_topology_get_mstb(mstb);
> > +		if (!ret)
> > +			mstb = NULL;
> > +	}
> >  
> >  	mutex_unlock(&mgr->lock);
> >  	return mstb;
> > @@ -1362,10 +1614,10 @@ static void
> > drm_dp_check_and_send_link_address(struct drm_dp_mst_topology_mgr *m
> >  			drm_dp_send_enum_path_resources(mgr, mstb, port);
> >  
> >  		if (port->mstb) {
> > -			mstb_child = drm_dp_get_validated_mstb_ref(mgr, port-
> > >mstb);
> > +			mstb_child =
> > drm_dp_mst_topology_get_mstb_validated(mgr, port->mstb);
> >  			if (mstb_child) {
> >  				drm_dp_check_and_send_link_address(mgr,
> > mstb_child);
> > -				drm_dp_put_mst_branch_device(mstb_child);
> > +				drm_dp_mst_topology_put_mstb(mstb_child);
> >  			}
> >  		}
> >  	}
> > @@ -1375,16 +1627,19 @@ static void drm_dp_mst_link_probe_work(struct
> > work_struct *work)
> >  {
> >  	struct drm_dp_mst_topology_mgr *mgr = container_of(work, struct
> > drm_dp_mst_topology_mgr, work);
> >  	struct drm_dp_mst_branch *mstb;
> > +	int ret;
> >  
> >  	mutex_lock(&mgr->lock);
> >  	mstb = mgr->mst_primary;
> >  	if (mstb) {
> > -		kref_get(&mstb->kref);
> > +		ret = drm_dp_mst_topology_get_mstb(mstb);
> > +		if (!ret)
> > +			mstb = NULL;
> >  	}
> >  	mutex_unlock(&mgr->lock);
> >  	if (mstb) {
> >  		drm_dp_check_and_send_link_address(mgr, mstb);
> > -		drm_dp_put_mst_branch_device(mstb);
> > +		drm_dp_mst_topology_put_mstb(mstb);
> >  	}
> >  }
> >  
> > @@ -1695,22 +1950,32 @@ static struct drm_dp_mst_port
> > *drm_dp_get_last_connected_port_to_mstb(struct drm
> >  	return drm_dp_get_last_connected_port_to_mstb(mstb->port_parent-
> > >parent);
> >  }
> >  
> > -static struct drm_dp_mst_branch
> > *drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr
> > *mgr,
> > -									 struc
> > t drm_dp_mst_branch *mstb,
> > -									 int
> > *port_num)
> > +static struct drm_dp_mst_branch *
> > +drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr
> > *mgr,
> > +					struct drm_dp_mst_branch *mstb,
> > +					int *port_num)
> >  {
> >  	struct drm_dp_mst_branch *rmstb = NULL;
> >  	struct drm_dp_mst_port *found_port;
> > +
> >  	mutex_lock(&mgr->lock);
> > -	if (mgr->mst_primary) {
> > +	if (!mgr->mst_primary)
> > +		goto out;
> > +
> > +	do {
> >  		found_port = drm_dp_get_last_connected_port_to_mstb(mstb);
> > +		if (!found_port)
> > +			break;
> >  
> > -		if (found_port) {
> > +		if (drm_dp_mst_topology_get_mstb(found_port->parent)) {
> >  			rmstb = found_port->parent;
> > -			kref_get(&rmstb->kref);
> >  			*port_num = found_port->port_num;
> > +		} else {
> > +			/* Search again, starting from this parent */
> > +			mstb = found_port->parent;
> >  		}
> > -	}
> > +	} while (!rmstb);
> 
> Hm, is this a bugfix of validating the entire chain? Afaiui the new
> topology_get still validates the entire chain, so I'm a bit confused what
> this does here.

JFYI: I'm assuming you meant the old get_validated() functions. I mentioned in
the cover letter for this series that I wasn't sure if we still needed them,
but on closer inspection I think we still do since they perform the actual
validation of the whole topology chain. drm_dp_mst_topology_get_(port|mstb)()
just increments the topology refcount safely.

The change you're seeing here is because since we didn't use
kref_get_unless_zero() before, we'd just go up the topology path above mstb(),
then kref the first thing we find that we think is still connected to the
topology (I honestly don't know how/if this ever worked), then give it a kref
and return it. Now that we use kref_get_unless_zero(), we have to deal with
the fact that the kref could fail, which would happen if we just retrieved a
parent mstb or port that is also disconnected from the topology. So, the only
way to do that is to find what we think is the last connected mstb, check if
it actually is, then restart the search from that mstb if the kref failed and
it's not connected to the topology.

That being said, I've been wondering about figuring out spots like this where
we probably also need to follow that up with "also make sure all of the
parents of this 'connected' topology device are also valid", since it's quite
possible we could run into a scenario like this:

Step 1:
MSTB 1
  |- Port 1
  |- Port 2
  |- Port 3
     |- MSTB 2 ← (just unplugged, top refcount == 0)
        |- Port 4 ← (also unplugged, but top refcount not updated yet)
        |- Port 5 ← (same thing ^)
        |- Port 6 ← (same thing ^)
           |- MSTB 3
              |- Port 7
              |- Port 8
              |- Port 9
                 ^
      drm_dp_get_last_connected_port_to_mstb()
      travels up to Port 6, assumes Port 6 is valid because it's top refcount
      != 0

Now that I type all of that out though, I think we could also fix that fairly
easily by instead just adding a topology_state mutex, and adding a variable to
denote whether or not a port is actually still part of a topology.

Maybe that also means we should come up with a different name for
topology_refcount, resource_refcount maybe?
> 

> > +out:
> >  	mutex_unlock(&mgr->lock);
> >  	return rmstb;
> >  }
> > @@ -1726,17 +1991,19 @@ static int drm_dp_payload_send_msg(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	u8 sinks[DRM_DP_MAX_SDP_STREAMS];
> >  	int i;
> >  
> > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> >  	if (!port)
> >  		return -EINVAL;
> >  
> >  	port_num = port->port_num;
> > -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> > +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
> >  	if (!mstb) {
> > -		mstb = drm_dp_get_last_connected_port_and_mstb(mgr, port-
> > >parent, &port_num);
> > +		mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> > +							       port->parent,
> > +							       &port_num);
> >  
> >  		if (!mstb) {
> > -			drm_dp_put_port(port);
> > +			drm_dp_mst_topology_put_port(port);
> >  			return -EINVAL;
> >  		}
> >  	}
> > @@ -1766,8 +2033,8 @@ static int drm_dp_payload_send_msg(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	}
> >  	kfree(txmsg);
> >  fail_put:
> > -	drm_dp_put_mst_branch_device(mstb);
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_mstb(mstb);
> > +	drm_dp_mst_topology_put_port(port);
> >  	return ret;
> >  }
> >  
> > @@ -1777,13 +2044,13 @@ int drm_dp_send_power_updown_phy(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	struct drm_dp_sideband_msg_tx *txmsg;
> >  	int len, ret;
> >  
> > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> >  	if (!port)
> >  		return -EINVAL;
> >  
> >  	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> >  	if (!txmsg) {
> > -		drm_dp_put_port(port);
> > +		drm_dp_mst_topology_put_port(port);
> >  		return -ENOMEM;
> >  	}
> >  
> > @@ -1799,7 +2066,7 @@ int drm_dp_send_power_updown_phy(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  			ret = 0;
> >  	}
> >  	kfree(txmsg);
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_port(port);
> >  
> >  	return ret;
> >  }
> > @@ -1888,7 +2155,8 @@ int drm_dp_update_payload_part1(struct
> > drm_dp_mst_topology_mgr *mgr)
> >  		if (vcpi) {
> >  			port = container_of(vcpi, struct drm_dp_mst_port,
> >  					    vcpi);
> > -			port = drm_dp_get_validated_port_ref(mgr, port);
> > +			port = drm_dp_mst_topology_get_port_validated(mgr,
> > +								      port);
> >  			if (!port) {
> >  				mutex_unlock(&mgr->payload_lock);
> >  				return -EINVAL;
> > @@ -1925,7 +2193,7 @@ int drm_dp_update_payload_part1(struct
> > drm_dp_mst_topology_mgr *mgr)
> >  		cur_slots += req_payload.num_slots;
> >  
> >  		if (port)
> > -			drm_dp_put_port(port);
> > +			drm_dp_mst_topology_put_port(port);
> >  	}
> >  
> >  	for (i = 0; i < mgr->max_payloads; i++) {
> > @@ -2024,7 +2292,7 @@ static int drm_dp_send_dpcd_write(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	struct drm_dp_sideband_msg_tx *txmsg;
> >  	struct drm_dp_mst_branch *mstb;
> >  
> > -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> > +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
> >  	if (!mstb)
> >  		return -EINVAL;
> >  
> > @@ -2048,7 +2316,7 @@ static int drm_dp_send_dpcd_write(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	}
> >  	kfree(txmsg);
> >  fail_put:
> > -	drm_dp_put_mst_branch_device(mstb);
> > +	drm_dp_mst_topology_put_mstb(mstb);
> >  	return ret;
> >  }
> >  
> > @@ -2158,7 +2426,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct
> > drm_dp_mst_topology_mgr *mgr, bool ms
> >  
> >  		/* give this the main reference */
> >  		mgr->mst_primary = mstb;
> > -		kref_get(&mgr->mst_primary->kref);
> > +		drm_dp_mst_topology_ref_mstb(mgr->mst_primary);
> >  
> >  		ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
> >  							 DP_MST_EN |
> > DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC);
> > @@ -2192,7 +2460,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct
> > drm_dp_mst_topology_mgr *mgr, bool ms
> >  out_unlock:
> >  	mutex_unlock(&mgr->lock);
> >  	if (mstb)
> > -		drm_dp_put_mst_branch_device(mstb);
> > +		drm_dp_mst_topology_put_mstb(mstb);
> >  	return ret;
> >  
> >  }
> > @@ -2357,7 +2625,7 @@ static int drm_dp_mst_handle_down_rep(struct
> > drm_dp_mst_topology_mgr *mgr)
> >  			       mgr->down_rep_recv.initial_hdr.lct,
> >  				      mgr->down_rep_recv.initial_hdr.rad[0],
> >  				      mgr->down_rep_recv.msg[0]);
> > -			drm_dp_put_mst_branch_device(mstb);
> > +			drm_dp_mst_topology_put_mstb(mstb);
> >  			memset(&mgr->down_rep_recv, 0, sizeof(struct
> > drm_dp_sideband_msg_rx));
> >  			return 0;
> >  		}
> > @@ -2368,7 +2636,7 @@ static int drm_dp_mst_handle_down_rep(struct
> > drm_dp_mst_topology_mgr *mgr)
> >  		}
> >  
> >  		memset(&mgr->down_rep_recv, 0, sizeof(struct
> > drm_dp_sideband_msg_rx));
> > -		drm_dp_put_mst_branch_device(mstb);
> > +		drm_dp_mst_topology_put_mstb(mstb);
> >  
> >  		mutex_lock(&mgr->qlock);
> >  		txmsg->state = DRM_DP_SIDEBAND_TX_RX;
> > @@ -2441,7 +2709,7 @@ static int drm_dp_mst_handle_up_req(struct
> > drm_dp_mst_topology_mgr *mgr)
> >  		}
> >  
> >  		if (mstb)
> > -			drm_dp_put_mst_branch_device(mstb);
> > +			drm_dp_mst_topology_put_mstb(mstb);
> >  
> >  		memset(&mgr->up_req_recv, 0, sizeof(struct
> > drm_dp_sideband_msg_rx));
> >  	}
> > @@ -2501,7 +2769,7 @@ enum drm_connector_status
> > drm_dp_mst_detect_port(struct drm_connector *connector
> >  	enum drm_connector_status status = connector_status_disconnected;
> >  
> >  	/* we need to search for the port in the mgr in case its gone */
> > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> >  	if (!port)
> >  		return connector_status_disconnected;
> >  
> > @@ -2526,7 +2794,7 @@ enum drm_connector_status
> > drm_dp_mst_detect_port(struct drm_connector *connector
> >  		break;
> >  	}
> >  out:
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_port(port);
> >  	return status;
> >  }
> >  EXPORT_SYMBOL(drm_dp_mst_detect_port);
> > @@ -2543,11 +2811,11 @@ bool drm_dp_mst_port_has_audio(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  {
> >  	bool ret = false;
> >  
> > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> >  	if (!port)
> >  		return ret;
> >  	ret = port->has_audio;
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_port(port);
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL(drm_dp_mst_port_has_audio);
> > @@ -2567,7 +2835,7 @@ struct edid *drm_dp_mst_get_edid(struct
> > drm_connector *connector, struct drm_dp_
> >  	struct edid *edid = NULL;
> >  
> >  	/* we need to search for the port in the mgr in case its gone */
> > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> >  	if (!port)
> >  		return NULL;
> >  
> > @@ -2578,7 +2846,7 @@ struct edid *drm_dp_mst_get_edid(struct
> > drm_connector *connector, struct drm_dp_
> >  		drm_connector_set_tile_property(connector);
> >  	}
> >  	port->has_audio = drm_detect_monitor_audio(edid);
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_port(port);
> >  	return edid;
> >  }
> >  EXPORT_SYMBOL(drm_dp_mst_get_edid);
> > @@ -2649,7 +2917,7 @@ int drm_dp_atomic_find_vcpi_slots(struct
> > drm_atomic_state *state,
> >  	if (IS_ERR(topology_state))
> >  		return PTR_ERR(topology_state);
> >  
> > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> >  	if (port == NULL)
> >  		return -EINVAL;
> >  	req_slots = DIV_ROUND_UP(pbn, mgr->pbn_div);
> > @@ -2657,14 +2925,14 @@ int drm_dp_atomic_find_vcpi_slots(struct
> > drm_atomic_state *state,
> >  			req_slots, topology_state->avail_slots);
> >  
> >  	if (req_slots > topology_state->avail_slots) {
> > -		drm_dp_put_port(port);
> > +		drm_dp_mst_topology_put_port(port);
> >  		return -ENOSPC;
> >  	}
> >  
> >  	topology_state->avail_slots -= req_slots;
> >  	DRM_DEBUG_KMS("vcpi slots avail=%d", topology_state->avail_slots);
> >  
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_port(port);
> >  	return req_slots;
> >  }
> >  EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
> > @@ -2715,7 +2983,7 @@ bool drm_dp_mst_allocate_vcpi(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  {
> >  	int ret;
> >  
> > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> >  	if (!port)
> >  		return false;
> >  
> > @@ -2725,7 +2993,7 @@ bool drm_dp_mst_allocate_vcpi(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	if (port->vcpi.vcpi > 0) {
> >  		DRM_DEBUG_KMS("payload: vcpi %d already allocated for pbn %d -
> > requested pbn %d\n", port->vcpi.vcpi, port->vcpi.pbn, pbn);
> >  		if (pbn == port->vcpi.pbn) {
> > -			drm_dp_put_port(port);
> > +			drm_dp_mst_topology_put_port(port);
> >  			return true;
> >  		}
> >  	}
> > @@ -2733,13 +3001,13 @@ bool drm_dp_mst_allocate_vcpi(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	ret = drm_dp_init_vcpi(mgr, &port->vcpi, pbn, slots);
> >  	if (ret) {
> >  		DRM_DEBUG_KMS("failed to init vcpi slots=%d max=63 ret=%d\n",
> > -				DIV_ROUND_UP(pbn, mgr->pbn_div), ret);
> > +			      DIV_ROUND_UP(pbn, mgr->pbn_div), ret);
> >  		goto out;
> >  	}
> >  	DRM_DEBUG_KMS("initing vcpi for pbn=%d slots=%d\n",
> > -			pbn, port->vcpi.num_slots);
> > +		      pbn, port->vcpi.num_slots);
> >  
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_port(port);
> >  	return true;
> >  out:
> >  	return false;
> > @@ -2749,12 +3017,12 @@ EXPORT_SYMBOL(drm_dp_mst_allocate_vcpi);
> >  int drm_dp_mst_get_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr, struct
> > drm_dp_mst_port *port)
> >  {
> >  	int slots = 0;
> > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> >  	if (!port)
> >  		return slots;
> >  
> >  	slots = port->vcpi.num_slots;
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_port(port);
> >  	return slots;
> >  }
> >  EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
> > @@ -2768,11 +3036,11 @@ EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
> >   */
> >  void drm_dp_mst_reset_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr,
> > struct drm_dp_mst_port *port)
> >  {
> > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> >  	if (!port)
> >  		return;
> >  	port->vcpi.num_slots = 0;
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_port(port);
> >  }
> >  EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> >  
> > @@ -2781,9 +3049,10 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> >   * @mgr: manager for this port
> >   * @port: unverified port to deallocate vcpi for
> >   */
> > -void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
> > struct drm_dp_mst_port *port)
> > +void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
> > +				struct drm_dp_mst_port *port)
> >  {
> > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> >  	if (!port)
> >  		return;
> >  
> > @@ -2792,7 +3061,7 @@ void drm_dp_mst_deallocate_vcpi(struct
> > drm_dp_mst_topology_mgr *mgr, struct drm_
> >  	port->vcpi.pbn = 0;
> >  	port->vcpi.aligned_pbn = 0;
> >  	port->vcpi.vcpi = 0;
> > -	drm_dp_put_port(port);
> > +	drm_dp_mst_topology_put_port(port);
> >  }
> >  EXPORT_SYMBOL(drm_dp_mst_deallocate_vcpi);
> >  
> > @@ -3078,8 +3347,10 @@ static void drm_dp_tx_work(struct work_struct
> > *work)
> >  
> >  static void drm_dp_free_mst_port(struct kref *kref)
> >  {
> > -	struct drm_dp_mst_port *port = container_of(kref, struct
> > drm_dp_mst_port, kref);
> > -	kref_put(&port->parent->kref, drm_dp_free_mst_branch_device);
> > +	struct drm_dp_mst_port *port =
> > +		container_of(kref, struct drm_dp_mst_port, malloc_kref);
> > +
> > +	drm_dp_mst_put_mstb_malloc(port->parent);
> >  	kfree(port);
> >  }
> >  
> > @@ -3103,7 +3374,6 @@ static void drm_dp_destroy_connector_work(struct
> > work_struct *work)
> >  		list_del(&port->next);
> >  		mutex_unlock(&mgr->destroy_connector_lock);
> >  
> > -		kref_init(&port->kref);
> >  		INIT_LIST_HEAD(&port->next);
> >  
> >  		mgr->cbs->destroy_connector(mgr, port->connector);
> > @@ -3117,7 +3387,7 @@ static void drm_dp_destroy_connector_work(struct
> > work_struct *work)
> >  			drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
> >  		}
> >  
> > -		kref_put(&port->kref, drm_dp_free_mst_port);
> > +		drm_dp_mst_put_port_malloc(port);
> >  		send_hotplug = true;
> >  	}
> >  	if (send_hotplug)
> > @@ -3292,7 +3562,7 @@ static int drm_dp_mst_i2c_xfer(struct i2c_adapter
> > *adapter, struct i2c_msg *msgs
> >  	struct drm_dp_sideband_msg_tx *txmsg = NULL;
> >  	int ret;
> >  
> > -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> > +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
> >  	if (!mstb)
> >  		return -EREMOTEIO;
> >  
> > @@ -3342,7 +3612,7 @@ static int drm_dp_mst_i2c_xfer(struct i2c_adapter
> > *adapter, struct i2c_msg *msgs
> >  	}
> >  out:
> >  	kfree(txmsg);
> > -	drm_dp_put_mst_branch_device(mstb);
> > +	drm_dp_mst_topology_put_mstb(mstb);
> >  	return ret;
> >  }
> >  
> > diff --git a/include/drm/drm_dp_mst_helper.h
> > b/include/drm/drm_dp_mst_helper.h
> > index 371cc2816477..50643a39765d 100644
> > --- a/include/drm/drm_dp_mst_helper.h
> > +++ b/include/drm/drm_dp_mst_helper.h
> > @@ -44,7 +44,10 @@ struct drm_dp_vcpi {
> >  
> >  /**
> >   * struct drm_dp_mst_port - MST port
> > - * @kref: reference count for this port.
> > + * @topology_kref: refcount for this port's lifetime in the topology,
> > only the
> > + * DP MST helpers should need to touch this
> > + * @malloc_kref: refcount for the memory allocation containing this
> > structure.
> > + * See drm_dp_mst_get_port_malloc() and drm_dp_mst_put_port_malloc().
> >   * @port_num: port number
> >   * @input: if this port is an input port.
> >   * @mcs: message capability status - DP 1.2 spec.
> > @@ -67,7 +70,8 @@ struct drm_dp_vcpi {
> >   * in the MST topology.
> >   */
> >  struct drm_dp_mst_port {
> > -	struct kref kref;
> > +	struct kref topology_kref;
> > +	struct kref malloc_kref;
> 
> I'd to inline member kerneldoc here (you can mix&match, so no need to
> rewrite them all) and spend a few words reference the family of get/put
> functions. Same for mstb below.
> 
> >  
> >  	u8 port_num;
> >  	bool input;
> > @@ -102,7 +106,10 @@ struct drm_dp_mst_port {
> >  
> >  /**
> >   * struct drm_dp_mst_branch - MST branch device.
> > - * @kref: reference count for this port.
> > + * @topology_kref: refcount for this branch device's lifetime in the
> > topology,
> > + * only the DP MST helpers should need to touch this
> > + * @malloc_kref: refcount for the memory allocation containing this
> > structure.
> > + * See drm_dp_mst_get_mstb_malloc() and drm_dp_mst_put_mstb_malloc().
> >   * @rad: Relative Address to talk to this branch device.
> >   * @lct: Link count total to talk to this branch device.
> >   * @num_ports: number of ports on the branch.
> > @@ -121,7 +128,8 @@ struct drm_dp_mst_port {
> >   * to downstream port of parent branches.
> >   */
> >  struct drm_dp_mst_branch {
> > -	struct kref kref;
> > +	struct kref topology_kref;
> > +	struct kref malloc_kref;
> >  	u8 rad[8];
> >  	u8 lct;
> >  	int num_ports;
> > @@ -626,4 +634,7 @@ int drm_dp_atomic_release_vcpi_slots(struct
> > drm_atomic_state *state,
> >  int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
> >  				 struct drm_dp_mst_port *port, bool power_up);
> >  
> > +void drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port);
> > +void drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port);
> > +
> >  #endif
> > -- 
> > 2.19.2
> 
> I really like. Mostly concentrated on looking at the docs. Also still
> need to apply it and build the docs, so I can appreciate the DOT graphs.
> -Daniel
-- 
Cheers,
	Lyude Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 06/15] drm/i915: Keep malloc references to MST ports
  2018-12-14  9:32   ` Daniel Vetter
@ 2018-12-18 21:52     ` Lyude Paul
  2018-12-19 13:20       ` Daniel Vetter
  0 siblings, 1 reply; 31+ messages in thread
From: Lyude Paul @ 2018-12-18 21:52 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, nouveau, intel-gfx, amd-gfx, Dave Airlie,
	Harry Wentland, Jerry Zuo, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, linux-kernel

On Fri, 2018-12-14 at 10:32 +0100, Daniel Vetter wrote:
> On Thu, Dec 13, 2018 at 08:25:35PM -0500, Lyude Paul wrote:
> > So that the ports stay around until we've destroyed the connectors, in
> > order to ensure that we don't pass an invalid pointer to any MST helpers
> > once we introduce the new MST VCPI helpers.
> > 
> > Signed-off-by: Lyude Paul <lyude@redhat.com>
> > ---
> >  drivers/gpu/drm/i915/intel_connector.c | 4 ++++
> >  drivers/gpu/drm/i915/intel_dp_mst.c    | 2 ++
> >  2 files changed, 6 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_connector.c
> > b/drivers/gpu/drm/i915/intel_connector.c
> > index 18e370f607bc..37d2c644f4b8 100644
> > --- a/drivers/gpu/drm/i915/intel_connector.c
> > +++ b/drivers/gpu/drm/i915/intel_connector.c
> > @@ -95,6 +95,10 @@ void intel_connector_destroy(struct drm_connector
> > *connector)
> >  	intel_panel_fini(&intel_connector->panel);
> >  
> >  	drm_connector_cleanup(connector);
> > +
> > +	if (intel_connector->port)
> 
> We set this as the first thing in intel_dp_add_mst_connector, so this
> check isn't doing anything.

I think this comment might be a mistake? intel_connector_destroy() is shared
by all of the intel connectors, so some may not have a value in
intel_connector->port. I can move it to it's own destroy callback for MST if
you would prefer though.
> 
> > +		drm_dp_mst_put_port_malloc(intel_connector->port);
> > +
> >  	kfree(connector);
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c
> > b/drivers/gpu/drm/i915/intel_dp_mst.c
> > index f05427b74e34..4d6ced34d465 100644
> > --- a/drivers/gpu/drm/i915/intel_dp_mst.c
> > +++ b/drivers/gpu/drm/i915/intel_dp_mst.c
> > @@ -484,6 +484,8 @@ static struct drm_connector
> > *intel_dp_add_mst_connector(struct drm_dp_mst_topolo
> >  	if (ret)
> >  		goto err;
> >  
> > +	drm_dp_mst_get_port_malloc(port);
> 
> Needs to be moved up where we assing intel_connector->port, or it'll
> underflow on cleanup on error paths.
> 
> > +
> >  	return connector;
> >  
> >  err:
> > -- 
> > 2.19.2
> > 
-- 
Cheers,
	Lyude Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 05/15] drm/dp_mst: Fix payload deallocation on hotplugs using malloc refs
  2018-12-14  9:38   ` Daniel Vetter
@ 2018-12-18 22:02     ` Lyude Paul
  0 siblings, 0 replies; 31+ messages in thread
From: Lyude Paul @ 2018-12-18 22:02 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, nouveau, intel-gfx, amd-gfx, Dave Airlie,
	Harry Wentland, Jerry Zuo, Maarten Lankhorst, Maxime Ripard,
	Sean Paul, David Airlie, linux-kernel

On Fri, 2018-12-14 at 10:38 +0100, Daniel Vetter wrote:
> On Thu, Dec 13, 2018 at 08:25:34PM -0500, Lyude Paul wrote:
> > Up until now, freeing payloads on remote MST hubs that just had ports
> > removed has almost never worked because we've been relying on port
> > validation in order to stop us from accessing ports that have already
> > been freed from memory, but ports which need their payloads released due
> > to being removed will never be a valid part of the topology after
> > they've been removed.
> > 
> > Since we've introduced malloc refs, we can replace all of the validation
> > logic in payload helpers which are used for deallocation with some
> > well-placed malloc krefs. This ensures that regardless of whether or not
> > the ports are still valid and in the topology, any port which has an
> > allocated payload will remain allocated in memory until it's payloads
> > have been removed - finally allowing us to actually release said
> > payloads correctly.
> > 
> > Signed-off-by: Lyude Paul <lyude@redhat.com>
> 
> I think with this we can also remove the int return value (that everyone
> ignored except for some debug output) from drm_dp_update_payload_part1/2.
> Follow-up cleanup patch ofc.
A note as well-I'm not sure we want to remove that yet, it might actually be a
better idea to just teach drivers to actually look at the return values so
they can take action if it fails. Of course, such actions wouldn't involve
actually changing the atomic state that userspace set, but moreso knowing
which parts of updating the payload succeeded or not so if a downstream branch
was disconnected mid-atomic commit we have enough information to know what
steps we need to skip when userspace later unsets the mode on said branch.
> 
> This looks good.
> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/gpu/drm/drm_dp_mst_topology.c | 54 +++++++++++++++------------
> >  1 file changed, 30 insertions(+), 24 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index ae9d019af9f2..93f08bfd2ab3 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -1989,10 +1989,6 @@ static int drm_dp_payload_send_msg(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	u8 sinks[DRM_DP_MAX_SDP_STREAMS];
> >  	int i;
> >  
> > -	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > -	if (!port)
> > -		return -EINVAL;
> > -
> >  	port_num = port->port_num;
> >  	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
> >  	if (!mstb) {
> > @@ -2000,10 +1996,8 @@ static int drm_dp_payload_send_msg(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  							       port->parent,
> >  							       &port_num);
> >  
> > -		if (!mstb) {
> > -			drm_dp_mst_topology_put_port(port);
> > +		if (!mstb)
> >  			return -EINVAL;
> > -		}
> >  	}
> >  
> >  	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> > @@ -2032,7 +2026,6 @@ static int drm_dp_payload_send_msg(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	kfree(txmsg);
> >  fail_put:
> >  	drm_dp_mst_topology_put_mstb(mstb);
> > -	drm_dp_mst_topology_put_port(port);
> >  	return ret;
> >  }
> >  
> > @@ -2137,15 +2130,16 @@ static int drm_dp_destroy_payload_step2(struct
> > drm_dp_mst_topology_mgr *mgr,
> >   */
> >  int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
> >  {
> > -	int i, j;
> > -	int cur_slots = 1;
> >  	struct drm_dp_payload req_payload;
> >  	struct drm_dp_mst_port *port;
> > +	int i, j;
> > +	int cur_slots = 1;
> >  
> >  	mutex_lock(&mgr->payload_lock);
> >  	for (i = 0; i < mgr->max_payloads; i++) {
> >  		struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
> >  		struct drm_dp_payload *payload = &mgr->payloads[i];
> > +		bool put_port = false;
> >  
> >  		/* solve the current payloads - compare to the hw ones
> >  		   - update the hw view */
> > @@ -2153,12 +2147,20 @@ int drm_dp_update_payload_part1(struct
> > drm_dp_mst_topology_mgr *mgr)
> >  		if (vcpi) {
> >  			port = container_of(vcpi, struct drm_dp_mst_port,
> >  					    vcpi);
> > -			port = drm_dp_mst_topology_get_port_validated(mgr,
> > -								      port);
> > -			if (!port) {
> > -				mutex_unlock(&mgr->payload_lock);
> > -				return -EINVAL;
> > +
> > +			/* Validated ports don't matter if we're releasing
> > +			 * VCPI
> > +			 */
> > +			if (vcpi->num_slots) {
> > +				port = drm_dp_mst_topology_get_port_validated(
> > +				    mgr, port);
> > +				if (!port) {
> > +					mutex_unlock(&mgr->payload_lock);
> > +					return -EINVAL;
> > +				}
> > +				put_port = true;
> >  			}
> > +
> >  			req_payload.num_slots = vcpi->num_slots;
> >  			req_payload.vcpi = vcpi->vcpi;
> >  		} else {
> > @@ -2190,7 +2192,7 @@ int drm_dp_update_payload_part1(struct
> > drm_dp_mst_topology_mgr *mgr)
> >  		}
> >  		cur_slots += req_payload.num_slots;
> >  
> > -		if (port)
> > +		if (put_port)
> >  			drm_dp_mst_topology_put_port(port);
> >  	}
> >  
> > @@ -3005,6 +3007,8 @@ bool drm_dp_mst_allocate_vcpi(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	DRM_DEBUG_KMS("initing vcpi for pbn=%d slots=%d\n",
> >  		      pbn, port->vcpi.num_slots);
> >  
> > +	/* Keep port allocated until it's payload has been removed */
> > +	drm_dp_mst_get_port_malloc(port);
> >  	drm_dp_mst_topology_put_port(port);
> >  	return true;
> >  out:
> > @@ -3034,11 +3038,12 @@ EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
> >   */
> >  void drm_dp_mst_reset_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr,
> > struct drm_dp_mst_port *port)
> >  {
> > -	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > -	if (!port)
> > -		return;
> > +	/*
> > +	 * A port with VCPI will remain allocated until it's VCPI is
> > +	 * released, no verified ref needed
> > +	 */
> > +
> >  	port->vcpi.num_slots = 0;
> > -	drm_dp_mst_topology_put_port(port);
> >  }
> >  EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> >  
> > @@ -3050,16 +3055,17 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> >  void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
> >  				struct drm_dp_mst_port *port)
> >  {
> > -	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > -	if (!port)
> > -		return;
> > +	/*
> > +	 * A port with VCPI will remain allocated until it's VCPI is
> > +	 * released, no verified ref needed
> > +	 */
> >  
> >  	drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
> >  	port->vcpi.num_slots = 0;
> >  	port->vcpi.pbn = 0;
> >  	port->vcpi.aligned_pbn = 0;
> >  	port->vcpi.vcpi = 0;
> > -	drm_dp_mst_topology_put_port(port);
> > +	drm_dp_mst_put_port_malloc(port);
> >  }
> >  EXPORT_SYMBOL(drm_dp_mst_deallocate_vcpi);
> >  
> > -- 
> > 2.19.2
> > 
-- 
Cheers,
	Lyude Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports
  2018-12-18 21:27     ` Lyude Paul
@ 2018-12-19 12:48       ` Daniel Vetter
  2018-12-19 18:36         ` Lyude Paul
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Vetter @ 2018-12-19 12:48 UTC (permalink / raw)
  To: Lyude Paul
  Cc: Daniel Vetter, dri-devel, nouveau, intel-gfx, amd-gfx,
	Dave Airlie, Harry Wentland, Jerry Zuo, David Airlie, Juston Li,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	Jonathan Corbet, linux-doc, linux-kernel

On Tue, Dec 18, 2018 at 04:27:58PM -0500, Lyude Paul wrote:
> On Fri, 2018-12-14 at 10:29 +0100, Daniel Vetter wrote:
> > On Thu, Dec 13, 2018 at 08:25:32PM -0500, Lyude Paul wrote:
> > > The current way of handling refcounting in the DP MST helpers is really
> > > confusing and probably just plain wrong because it's been hacked up many
> > > times over the years without anyone actually going over the code and
> > > seeing if things could be simplified.
> > > 
> > > To the best of my understanding, the current scheme works like this:
> > > drm_dp_mst_port and drm_dp_mst_branch both have a single refcount. When
> > > this refcount hits 0 for either of the two, they're removed from the
> > > topology state, but not immediately freed. Both ports and branch devices
> > > will reinitialize their kref once it's hit 0 before actually destroying
> > > themselves. The intended purpose behind this is so that we can avoid
> > > problems like not being able to free a remote payload that might still
> > > be active, due to us having removed all of the port/branch device
> > > structures in memory, as per:
> > > 
> > > 91a25e463130 ("drm/dp/mst: deallocate payload on port destruction")
> > > 
> > > Which may have worked, but then it caused use-after-free errors. Being
> > > new to MST at the time, I tried fixing it;
> > > 
> > > 263efde31f97 ("drm/dp/mst: Get validated port ref in
> > > drm_dp_update_payload_part1()")
> > > 
> > > But, that was broken: both drm_dp_mst_port and drm_dp_mst_branch structs
> > > are validated in almost every DP MST helper function. Simply put, this
> > > means we go through the topology and try to see if the given
> > > drm_dp_mst_branch or drm_dp_mst_port is still attached to something
> > > before trying to use it in order to avoid dereferencing freed memory
> > > (something that has happened a LOT in the past with this library).
> > > Because of this it doesn't actually matter whether or not we keep keep
> > > the ports and branches around in memory as that's not enough, because
> > > any function that validates the branches and ports passed to it will
> > > still reject them anyway since they're no longer in the topology
> > > structure. So, use-after-free errors were fixed but payload deallocation
> > > was completely broken.
> > > 
> > > Two years later, AMD informed me about this issue and I attempted to
> > > come up with a temporary fix, pending a long-overdue cleanup of this
> > > library:
> > > 
> > > c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just
> > > ref")
> > > 
> > > But then that introduced use-after-free errors, so I quickly reverted
> > > it:
> > > 
> > > 9765635b3075 ("Revert "drm/dp_mst: Skip validating ports during
> > > destruction, just ref"")
> > > 
> > > And in the process, learned that there is just no simple fix for this:
> > > the design is just broken. Unfortuntely, the usage of these helpers are
> > > quite broken as well. Some drivers like i915 have been smart enough to
> > > avoid accessing any kind of information from MST port structures, but
> > > others like nouveau have assumed, understandably so, that
> > > drm_dp_mst_port structures are normal and can just be accessed at any
> > > time without worrying about use-after-free errors.
> > > 
> > > After a lot of discussion, me and Daniel Vetter came up with a better
> > > idea to replace all of this.
> > > 
> > > To summarize, since this is documented far more indepth in the
> > > documentation this patch introduces, we make it so that drm_dp_mst_port
> > > and drm_dp_mst_branch structures have two different classes of
> > > refcounts: topology_kref, and malloc_kref. topology_kref corresponds to
> > > the lifetime of the given drm_dp_mst_port or drm_dp_mst_branch in it's
> > > given topology. Once it hits zero, any associated connectors are removed
> > > and the branch or port can no longer be validated. malloc_kref
> > > corresponds to the lifetime of the memory allocation for the actual
> > > structure, and will always be non-zero so long as the topology_kref is
> > > non-zero. This gives us a way to allow callers to hold onto port and
> > > branch device structures past their topology lifetime, and dramatically
> > > simplifies the lifetimes of both structures. This also finally fixes the
> > > port deallocation problem, properly.
> > > 
> > > Additionally: since this now means that we can keep ports and branch
> > > devices allocated in memory for however long we need, we no longer need
> > > a significant amount of the port validation that we currently do.
> > > 
> > > Additionally, there is one last scenario that this fixes, which couldn't
> > > have been fixed properly beforehand:
> > > 
> > > - CPU1 unrefs port from topology (refcount 1->0)
> > > - CPU2 refs port in topology(refcount 0->1)
> > > 
> > > Since we now can guarantee memory safety for ports and branches
> > > as-needed, we also can make our main reference counting functions fix
> > > this problem by using kref_get_unless_zero() internally so that topology
> > > refcounts can only ever reach 0 once.
> > > 
> > > Signed-off-by: Lyude Paul <lyude@redhat.com>
> > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > Cc: David Airlie <airlied@redhat.com>
> > > Cc: Jerry Zuo <Jerry.Zuo@amd.com>
> > > Cc: Harry Wentland <harry.wentland@amd.com>
> > > Cc: Juston Li <juston.li@intel.com>
> > > ---
> > >  .../gpu/dp-mst/topology-figure-1.dot          |  31 ++
> > >  .../gpu/dp-mst/topology-figure-2.dot          |  37 ++
> > >  .../gpu/dp-mst/topology-figure-3.dot          |  40 ++
> > >  Documentation/gpu/drm-kms-helpers.rst         | 125 ++++-
> > >  drivers/gpu/drm/drm_dp_mst_topology.c         | 512 +++++++++++++-----
> > >  include/drm/drm_dp_mst_helper.h               |  19 +-
> > >  6 files changed, 637 insertions(+), 127 deletions(-)
> > >  create mode 100644 Documentation/gpu/dp-mst/topology-figure-1.dot
> > >  create mode 100644 Documentation/gpu/dp-mst/topology-figure-2.dot
> > >  create mode 100644 Documentation/gpu/dp-mst/topology-figure-3.dot
> > 
> > Yay, docs, and pretty ones at that! Awesome stuff :-)
> > 
> > > diff --git a/Documentation/gpu/dp-mst/topology-figure-1.dot
> > > b/Documentation/gpu/dp-mst/topology-figure-1.dot
> > > new file mode 100644
> > > index 000000000000..fb83789e0a3e
> > > --- /dev/null
> > > +++ b/Documentation/gpu/dp-mst/topology-figure-1.dot
> > > @@ -0,0 +1,31 @@
> > > +digraph T {
> > > +    /* Topology references */
> > > +    node [shape=oval];
> > > +    mstb1 -> {port1, port2};
> > > +    port1 -> mstb2;
> > > +    port2 -> mstb3 -> {port3, port4};
> > > +    port3 -> mstb4;
> > > +
> > > +    /* Malloc references */
> > > +    edge [style=dashed];
> > > +    mstb4 -> port3;
> > > +    {port4, port3} -> mstb3;
> > > +    mstb3 -> port2;
> > > +    mstb2 -> port1;
> > > +    {port1, port2} -> mstb1;
> > > +
> > > +    edge [dir=back];
> > > +    node [style=filled;shape=box;fillcolor=lightblue];
> > > +    port1 -> "Payload #1";
> > > +    port3 -> "Payload #2";
> > > +
> > > +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> > > +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> > > +    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
> > > +    mstb4 [label="MSTB #4";style=filled;fillcolor=palegreen];
> > > +
> > > +    port1 [label="Port #1"];
> > > +    port2 [label="Port #2"];
> > > +    port3 [label="Port #3"];
> > > +    port4 [label="Port #4"];
> > > +}
> > > diff --git a/Documentation/gpu/dp-mst/topology-figure-2.dot
> > > b/Documentation/gpu/dp-mst/topology-figure-2.dot
> > > new file mode 100644
> > > index 000000000000..eebce560be40
> > > --- /dev/null
> > > +++ b/Documentation/gpu/dp-mst/topology-figure-2.dot
> > > @@ -0,0 +1,37 @@
> > > +digraph T {
> > > +    /* Topology references */
> > > +    node [shape=oval];
> > > +
> > > +    mstb1 -> {port1, port2};
> > > +    port1 -> mstb2;
> > > +    edge [color=red];
> > > +    port2 -> mstb3 -> {port3, port4};
> > > +    port3 -> mstb4;
> > > +    edge [color=""];
> > > +
> > > +    /* Malloc references */
> > > +    edge [style=dashed];
> > > +    port3 -> mstb3;
> > > +    mstb3 -> port2;
> > > +    mstb2 -> port1;
> > > +    {port1, port2} -> mstb1;
> > > +    edge [color=red];
> > > +    mstb4 -> port3;
> > > +    port4 -> mstb3;
> > > +    edge [color=""];
> > > +
> > > +    edge [dir=back];
> > > +    node [style=filled;shape=box;fillcolor=lightblue];
> > > +    port1 -> "Payload #1";
> > > +    port3 -> "Payload #2";
> > > +
> > > +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> > > +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> > > +    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
> > > +    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
> > > +
> > > +    port1 [label="Port #1"];
> > > +    port2 [label="Port #2"];
> > > +    port3 [label="Port #3"];
> > > +    port4 [label="Port #4";style=filled;fillcolor=grey];
> > > +}
> > > diff --git a/Documentation/gpu/dp-mst/topology-figure-3.dot
> > > b/Documentation/gpu/dp-mst/topology-figure-3.dot
> > > new file mode 100644
> > > index 000000000000..9bf28d87144c
> > > --- /dev/null
> > > +++ b/Documentation/gpu/dp-mst/topology-figure-3.dot
> > > @@ -0,0 +1,40 @@
> > > +digraph T {
> > > +    /* Topology references */
> > > +    node [shape=oval];
> > > +
> > > +    mstb1 -> {port1, port2};
> > > +    port1 -> mstb2;
> > > +    edge [color=grey];
> > > +    port2 -> mstb3 -> {port3, port4};
> > > +    port3 -> mstb4;
> > > +    edge [color=""];
> > > +
> > > +    /* Malloc references */
> > > +    edge [style=dashed];
> > > +    port3 -> mstb3 [penwidth=3];
> > > +    mstb3 -> port2 [penwidth=3];
> > > +    mstb2 -> port1;
> > > +    {port1, port2} -> mstb1;
> > > +    edge [color=grey];
> > > +    mstb4 -> port3;
> > > +    port4 -> mstb3;
> > > +    edge [color=""];
> > > +
> > > +    edge [dir=back];
> > > +    node [style=filled;shape=box;fillcolor=lightblue];
> > > +    port1 -> payload1;
> > > +    port3 -> payload2 [penwidth=3];
> > > +
> > > +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> > > +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> > > +    mstb3 [label="MSTB #3";penwidth=3;style=filled;fillcolor=palegreen];
> > > +    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
> > > +
> > > +    port1 [label="Port #1"];
> > > +    port2 [label="Port #2";penwidth=3];
> > > +    port3 [label="Port #3";penwidth=3];
> > > +    port4 [label="Port #4";style=filled;fillcolor=grey];
> > > +
> > > +    payload1 [label="Payload #1"];
> > > +    payload2 [label="Payload #2";penwidth=3];
> > > +}
> > > diff --git a/Documentation/gpu/drm-kms-helpers.rst
> > > b/Documentation/gpu/drm-kms-helpers.rst
> > > index b422eb8edf16..c0f994c2c72f 100644
> > > --- a/Documentation/gpu/drm-kms-helpers.rst
> > > +++ b/Documentation/gpu/drm-kms-helpers.rst
> > > @@ -208,8 +208,11 @@ Display Port Dual Mode Adaptor Helper Functions
> > > Reference
> > >  .. kernel-doc:: drivers/gpu/drm/drm_dp_dual_mode_helper.c
> > >     :export:
> > >  
> > > -Display Port MST Helper Functions Reference
> > > -===========================================
> > > +Display Port MST Helpers
> > > +========================
> > > +
> > > +Functions Reference
> > > +-------------------
> > >  
> > >  .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > >     :doc: dp mst helper
> > > @@ -220,6 +223,124 @@ Display Port MST Helper Functions Reference
> > >  .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > >     :export:
> > >  
> > > +Branch device and port refcounting
> > > +----------------------------------
> > 
> > I generally try to put the long-form explanations before the function
> > references. Since usually the references completely drown out everything
> > else and make it harder to spot the important overview stuff.
> > 
> > 
> > > +
> > > +Overview
> > > +~~~~~~~~
> > > +
> > > +The refcounting schemes for :c:type:`struct drm_dp_mst_branch` and
> > > +:c:type:`struct drm_dp_mst_port` are somewhat unusual. Both ports and
> > > branch
> > > +devices have two different kinds of refcounts: topology refcounts, and
> > > malloc
> > > +refcounts.
> > > +
> > > +Topology refcount overview
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Topology refcounts are not exposed to drivers, and are handled internally
> > > by the
> > > +DP MST helpers. The helpers use them in order to prevent the in-memory
> > > topology
> > > +state from being changed in the middle of critical operations like
> > > changing the
> > > +internal state of payload allocations. This means each branch and port
> > > will be
> > > +considered to be connected to the rest of the topology until it's
> > > topology
> > > +refcount reaches zero. Additionally, for ports this means that their
> > > associated
> > > +:c:type:`struct drm_connector` will stay registered with userspace until
> > > the
> > > +port's refcount reaches 0.
> > > +
> > > +
> > > +Topology refcount functions
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +The DP MST helpers use the following functions to manage topology
> > > refcounts:
> > > +
> > > +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > > +   :functions: drm_dp_mst_topology_get_port drm_dp_mst_topology_put_port
> > > +               drm_dp_mst_topology_ref_port drm_dp_mst_topology_get_mstb
> > > +               drm_dp_mst_topology_put_mstb drm_dp_mst_topology_ref_mstb
> > > +
> > > +Malloc refcount overview
> > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Malloc references are used to keep a :c:type:`struct drm_dp_mst_port` or
> > > +:c:type:`struct drm_dp_mst_branch` allocated even after all of its
> > > topology
> > > +references have been dropped, so that the driver or MST helpers can
> > > safely
> > > +access each branch's last known state before it was disconnected from the
> > > +topology. When the malloc refcount of a port or branch reaches 0, the
> > > memory
> > > +allocation containing the :c:type:`struct drm_dp_mst_branch` or
> > > :c:type:`struct
> > > +drm_dp_mst_port` respectively will be freed.
> > > +
> > > +Malloc refcounts for ports
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +For :c:type:`struct drm_dp_mst_port`, malloc refcounts are exposed to
> > > drivers
> > > +through the following functions:
> > > +
> > > +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > > +   :functions: drm_dp_mst_get_port_malloc drm_dp_mst_put_port_malloc
> > > +
> > > +Malloc refcounts for branch devices
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +For :c:type:`struct drm_dp_mst_branch`, malloc refcounts are not
> > > currently
> > > +exposed to drivers. As of writing this documentation, there are no
> > > drivers that
> > > +have a usecase for accessing :c:type:`struct drm_dp_mst_branch` outside
> > > of the
> > > +MST helpers. Exposing this API to drivers in a race-free manner would
> > > take more
> > > +tweaking of the refcounting scheme, however patches are welcome provided
> > > there
> > > +is a legitimate driver usecase for this.
> > > +
> > > +Internally, malloc refcounts for :c:type:`struct drm_dp_mst_branch` are
> > > managed
> > > +by the DP MST core through the following functions:
> > > +
> > > +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > > +   :functions: drm_dp_mst_get_mstb_malloc drm_dp_mst_put_mstb_malloc
> > > +
> > > +Refcount relationships in a topology
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Let's take a look at why the relationship between topology and malloc
> > > refcounts
> > > +is designed the way it is.
> > > +
> > > +.. kernel-figure:: dp-mst/topology-figure-1.dot
> > > +
> > > +   An example of topology and malloc refs in a DP MST topology with two
> > > active
> > > +   payloads. Topology refcount increments are indicated by solid lines,
> > > and
> > > +   malloc refcount increments are indicated by dashed lines. Each starts
> > > from
> > > +   the branch which incremented the refcount, and ends at the branch to
> > > which
> > > +   the refcount belongs to.
> > > +
> > > +As you can see in figure 1, every branch increments the topology
> > > +refcount of it's children, and increments the malloc refcount of it's
> > > parent.
> > > +Additionally, every payload increments the malloc refcount of it's
> > > assigned port
> > > +by 1.
> > > +
> > > +So, what would happen if MSTB #3 from the above figure was unplugged from
> > > the
> > > +system, but the driver hadn't yet removed payload #2 from port #3? The
> > > topology
> > > +would start to look like figure 2.
> > > +
> > > +.. kernel-figure:: dp-mst/topology-figure-2.dot
> > > +
> > > +   Ports and branch devices which have been released from memory are
> > > colored
> > > +   grey, and references which have been removed are colored red.
> > > +
> > > +Whenever a port or branch device's topology refcount reaches zero, it
> > > will
> > > +decrement the topology refcounts of all its children, the malloc refcount
> > > of its
> > > +parent, and finally its own malloc refcount. For MSTB #4 and port #4,
> > > this means
> > > +they both have been disconnected from the topology and freed from memory.
> > > But,
> > > +because payload #2 is still holding a reference to port #3, port #3 is
> > > removed
> > > +from the topology but it's :c:type:`struct drm_dp_mst_port` is still
> > > accessible
> > > +from memory. This also means port #3 has not yet decremented the malloc
> > > refcount
> > > +of MSTB #3, so it's :c:type:`struct drm_dp_mst_branch` will also stay
> > > allocated
> > > +in memory until port #3's malloc refcount reaches 0.
> > > +
> > > +This relationship is necessary because in order to release payload #2, we
> > > +need to be able to figure out the last relative of port #3 that's still
> > > +connected to the topology. In this case, we would travel up the topology
> > > as
> > > +shown in figure 3.
> > > +
> > > +.. kernel-figure:: dp-mst/topology-figure-3.dot
> > > +
> > > +And finally, remove payload #2 by communicating with port #2 through
> > > sideband
> > > +transactions.
> > 
> > (Blind guess, I haven't looked ahead in the series yet)
> > 
> > I assume that drivers also want to hold a malloc reference from their
> > connector, until that connector is destroyed completed (and we hence know
> > it released all its vcpi and other stuff and really doesn't need the port
> > anymore). Could we integrated that into these neat graphs too? Answering
> > the "so how does this integrate into my driver?" question is imo the most
> > important part for core api docs.
> > 
> > Another one: Any reason for not putting this right into the code as a DOC:
> > section? Ime moving docs as close as possible to the code improves the
> > odds it's kept up-to-date. The only overview texts I've left in the .rst
> > is the stuff that describes overall concepts (e.g. how all the kms objects
> > fit together).
> > 
> > All the sphinx/rst syntax should carry over 1:1, except in kerneldoc you
> > also can benefit from the abbreviated reference syntax from kerneldoc.
> > 
> > Anyway, really great stuff.
> > 
> > > +
> > >  MIPI DSI Helper Functions Reference
> > >  ===================================
> > >  
> > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > index 2ab16c9e6243..c196fb580beb 100644
> > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > @@ -46,7 +46,7 @@ static bool dump_dp_payload_table(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  				  char *buf);
> > >  static int test_calc_pbn_mode(void);
> > >  
> > > -static void drm_dp_put_port(struct drm_dp_mst_port *port);
> > > +static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port);
> > >  
> > >  static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
> > >  				     int id,
> > > @@ -850,46 +850,120 @@ static struct drm_dp_mst_branch
> > > *drm_dp_add_mst_branch_device(u8 lct, u8 *rad)
> > >  	if (lct > 1)
> > >  		memcpy(mstb->rad, rad, lct / 2);
> > >  	INIT_LIST_HEAD(&mstb->ports);
> > > -	kref_init(&mstb->kref);
> > > +	kref_init(&mstb->topology_kref);
> > > +	kref_init(&mstb->malloc_kref);
> > >  	return mstb;
> > >  }
> > >  
> > >  static void drm_dp_free_mst_port(struct kref *kref);
> > > +static void drm_dp_free_mst_branch_device(struct kref *kref);
> > 
> > I'd move the functions around, forward declarations for static functions
> > is a bit silly
> > 
> > > +
> > > +/**
> > > + * drm_dp_mst_get_mstb_malloc() - Increment the malloc refcount of a
> > > branch
> > > + * device
> > > + * @mstb: The &struct drm_dp_mst_branch to increment the malloc refcount
> > > of
> > > + *
> > > + * Increments @mstb.malloc_kref. When @mstb.malloc_kref reaches 0, the
> > > memory
> > 
> > s/@/&/ for structure member references. @ references to parameters/members
> > in the same kerneldoc type only. With & you'll get a nice link, @ is just
> > markup (and yes & with a member unfortunately doesn't link to the member,
> > only the overall structure).
> > 
> > Similarly below.
> > 
> > > + * allocation for @mstb will be released and @mstb may no longer be used.
> > > + *
> > > + * Any malloc references acquired with this function must be released
> > > when
> > > + * they are no longer being used by calling drm_dp_mst_put_mstb_malloc().
> > 
> > I'd dropped "when they are no longer being used", and the line below too.
> > Short docs are better generally because attention span of readers.
> > 
> > > + *
> > > + * See also: drm_dp_mst_put_mstb_malloc()
> > > + */
> > > +static void
> > > +drm_dp_mst_get_mstb_malloc(struct drm_dp_mst_branch *mstb)
> > > +{
> > > +	kref_get(&mstb->malloc_kref);
> > > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref));
> > > +}
> > > +
> > > +/**
> > > + * drm_dp_mst_put_mstb_malloc() - Decrement the malloc refcount of a
> > > branch
> > > + * device
> > > + * @mstb: The &struct drm_dp_mst_branch to decrement the malloc refcount
> > > of
> > > + *
> > > + * Decrements @mstb.malloc_kref. When @mstb.malloc_kref reaches 0, the
> > > memory
> > > + * allocation for @mstb will be released and @mstb may no longer be used.
> > > + *
> > > + * See also: drm_dp_mst_get_mstb_malloc()
> > > + */
> > > +static void
> > > +drm_dp_mst_put_mstb_malloc(struct drm_dp_mst_branch *mstb)
> > > +{
> > > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref)-1);
> > > +	kref_put(&mstb->malloc_kref, drm_dp_free_mst_branch_device);
> > > +}
> > > +
> > > +/**
> > > + * drm_dp_mst_get_port_malloc() - Increment the malloc refcount of an MST
> > > port
> > > + * @port: The &struct drm_dp_mst_port to increment the malloc refcount of
> > > + *
> > > + * Increments @port.malloc_kref. When @port.malloc_kref reaches 0, the
> > > memory
> > > + * allocation for @port will be released and @port may no longer be used.
> > > + *
> > > + * Because @port could potentially be freed at any time by the DP MST
> > > helpers
> > > + * if @port.malloc_kref reaches 0, including during a call to this
> > > function,
> > > + * drivers that which to make use of &struct drm_dp_mst_port should
> > > ensure
> > > + * that they grab at least one main malloc reference to their MST ports
> > > in
> > > + * &drm_dp_mst_topology_cbs.add_connector. This callback is called before
> > > + * there is any chance for @port.malloc_kref to reach 0.
> > > + *
> > > + * Any malloc references acquired with this function must be released
> > > when
> > > + * they are no longer being used by calling drm_dp_mst_put_port_malloc().
> > > + *
> > > + * See also: drm_dp_mst_put_port_malloc()
> > 
> > Same reduction as with mstb_malloc version.
> > 
> > > + */
> > > +void
> > > +drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port)
> > > +{
> > > +	kref_get(&port->malloc_kref);
> > > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref));
> > > +}
> > > +EXPORT_SYMBOL(drm_dp_mst_get_port_malloc);
> > > +
> > > +/**
> > > + * drm_dp_mst_put_port_malloc() - Decrement the malloc refcount of an MST
> > > port
> > > + * @port: The &struct drm_dp_mst_port to decrement the malloc refcount of
> > > + *
> > > + * Decrements @port.malloc_kref. When @port.malloc_kref reaches 0, the
> > > memory
> > > + * allocation for @port will be released and @port may no longer be used.
> > > + *
> > > + * See also: drm_dp_mst_get_port_malloc()
> > > + */
> > > +void
> > > +drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port)
> > > +{
> > > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref)-1);
> > > +	kref_put(&port->malloc_kref, drm_dp_free_mst_port);
> > > +}
> > > +EXPORT_SYMBOL(drm_dp_mst_put_port_malloc);
> > >  
> > >  static void drm_dp_free_mst_branch_device(struct kref *kref)
> > >  {
> > > -	struct drm_dp_mst_branch *mstb = container_of(kref, struct
> > > drm_dp_mst_branch, kref);
> > > -	if (mstb->port_parent) {
> > > -		if (list_empty(&mstb->port_parent->next))
> > > -			kref_put(&mstb->port_parent->kref,
> > > drm_dp_free_mst_port);
> > > -	}
> > > +	struct drm_dp_mst_branch *mstb =
> > > +		container_of(kref, struct drm_dp_mst_branch, malloc_kref);
> > > +
> > > +	if (mstb->port_parent)
> > > +		drm_dp_mst_put_port_malloc(mstb->port_parent);
> > > +
> > >  	kfree(mstb);
> > >  }
> > >  
> > >  static void drm_dp_destroy_mst_branch_device(struct kref *kref)
> > >  {
> > > -	struct drm_dp_mst_branch *mstb = container_of(kref, struct
> > > drm_dp_mst_branch, kref);
> > > +	struct drm_dp_mst_branch *mstb =
> > > +		container_of(kref, struct drm_dp_mst_branch, topology_kref);
> > > +	struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
> > >  	struct drm_dp_mst_port *port, *tmp;
> > >  	bool wake_tx = false;
> > >  
> > > -	/*
> > > -	 * init kref again to be used by ports to remove mst branch when it is
> > > -	 * not needed anymore
> > > -	 */
> > > -	kref_init(kref);
> > > -
> > > -	if (mstb->port_parent && list_empty(&mstb->port_parent->next))
> > > -		kref_get(&mstb->port_parent->kref);
> > > -
> > > -	/*
> > > -	 * destroy all ports - don't need lock
> > > -	 * as there are no more references to the mst branch
> > > -	 * device at this point.
> > > -	 */
> > > +	mutex_lock(&mgr->lock);
> > >  	list_for_each_entry_safe(port, tmp, &mstb->ports, next) {
> > >  		list_del(&port->next);
> > > -		drm_dp_put_port(port);
> > > +		drm_dp_mst_topology_put_port(port);
> > >  	}
> > > +	mutex_unlock(&mgr->lock);
> > 
> > Would be nice to split this out (to highlight the bugfix more), but
> > because of the kref_init() hack not really feasible I think :-/
> > >  
> > >  	/* drop any tx slots msg */
> > >  	mutex_lock(&mstb->mgr->qlock);
> > > @@ -908,14 +982,82 @@ static void drm_dp_destroy_mst_branch_device(struct
> > > kref *kref)
> > >  	if (wake_tx)
> > >  		wake_up_all(&mstb->mgr->tx_waitq);
> > >  
> > > -	kref_put(kref, drm_dp_free_mst_branch_device);
> > > +	drm_dp_mst_put_mstb_malloc(mstb);
> > >  }
> > >  
> > > -static void drm_dp_put_mst_branch_device(struct drm_dp_mst_branch *mstb)
> > > +/**
> > > + * drm_dp_mst_topology_get_mstb() - Increment the topology refcount of a
> > > + * branch device unless its zero
> > > + * @mstb: &struct drm_dp_mst_branch to increment the topology refcount of
> > > + *
> > > + * Attempts to grab a topology reference to @mstb, if it hasn't yet been
> > > + * removed from the topology (e.g. @mstb.topology_kref has reached 0).
> > > + *
> > > + * Any topology references acquired with this function must be released
> > > when
> > > + * they are no longer being used by calling
> > > drm_dp_mst_topology_put_mstb().
> > 
> > I'd explain the relationship with malloc_kref a bit here:
> > 
> > - topology ref implies a malloc ref, hence you can call get_mstb_malloc
> >   with only holding a topology ref (might be better to explain this in the
> >   get_mstb_malloc kerneldoc, since it also applies to the unconditional
> >   kref_get below)
> > - malloc_ref is enough to call this function, but then it can fail
> > 
> > > + *
> > > + * See also:
> > > + * drm_dp_mst_topology_ref_mstb()
> > 
> > I'd write out when you should use this one instead:
> > 
> > "If you already have a topology reference you should use other_function()
> > instead."
> > 
> > > + * drm_dp_mst_topology_get_mstb()
> > 
> > This is this function itself :-)
> > 
> > > + *
> > > + * Returns:
> > > + * * 1: A topology reference was grabbed successfully
> > > + * * 0: @port is no longer in the topology, no reference was grabbed
> > > + */
> > > +static int __must_check
> > > +drm_dp_mst_topology_get_mstb(struct drm_dp_mst_branch *mstb)
> > 
> > Hm if you both want a kref_get and a kref_get_unless_zero then we need
> > better naming. topology_get_mstb should be the unconditional kref_get, the
> > conditional kref_get_unless_zero needs some indication that it could fail.
> > We need some verb that indicates that instead of "get":
> > - "validate" since we've used that one already
> > - "lookup" that's used by all the drm_mode_object lookup functions, feels
> >   a bit misleading
> > - "try_get"
> > 
> > >  {
> > > -	kref_put(&mstb->kref, drm_dp_destroy_mst_branch_device);
> > > +	int ret = kref_get_unless_zero(&mstb->topology_kref);
> > > +
> > > +	if (ret)
> > > +		DRM_DEBUG("mstb %p (%d)\n", mstb,
> > > +			  kref_read(&mstb->topology_kref));
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +/**
> > > + * drm_dp_mst_topology_ref_mstb() - Increment the topology refcount of a
> > > + * branch device
> > > + * @mstb: The &struct drm_dp_mst_branch to increment the topology
> > > refcount of
> > > + *
> > > + * Increments @mstb.topology_refcount without checking whether or not
> > > it's
> > > + * already reached 0. This is only valid to use in scenarios where you
> > > are
> > > + * already guaranteed to have at least one active topology reference to
> > > @mstb.
> > > + * Otherwise, drm_dp_mst_topology_get_mstb() should be used.
> > 
> > s/should/must/  (or my English understanding is off, afaiui "should" isn't
> > a strict requirement per rfc2119)
> > 
> > > + *
> > > + * Any topology references acquired with this function must be released
> > > when
> > > + * they are no longer being used by calling
> > > drm_dp_mst_topology_put_mstb().
> > > + *
> > > + * See also:
> > > + * drm_dp_mst_topology_get_mstb()
> > > + * drm_dp_mst_topology_put_mstb()
> > > + */
> > > +static void
> > > +drm_dp_mst_topology_ref_mstb(struct drm_dp_mst_branch *mstb)
> > > +{
> > 
> > Should we have a WARN_ON(refcount == 0) here?
> > 
> > > +	kref_get(&mstb->topology_kref);
> > > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->topology_kref));
> > >  }
> > >  
> > > +/**
> > > + * drm_dp_mst_topology_put_mstb() - release a topology reference to a
> > > branch
> > > + * device
> > > + * @mstb: The &struct drm_dp_mst_branch to release the topology reference
> > > from
> > > + *
> > > + * Releases a topology reference from @mstb by decrementing
> > > + * @mstb.topology_kref.
> > > + *
> > > + * See also:
> > > + * drm_dp_mst_topology_get_mstb()
> > > + * drm_dp_mst_topology_ref_mstb()
> > > + */
> > > +static void
> > > +drm_dp_mst_topology_put_mstb(struct drm_dp_mst_branch *mstb)
> > > +{
> > > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->topology_kref)-1);
> > > +	kref_put(&mstb->topology_kref, drm_dp_destroy_mst_branch_device);
> > > +}
> > >  
> > >  static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int
> > > old_pdt)
> > >  {
> > > @@ -930,14 +1072,15 @@ static void drm_dp_port_teardown_pdt(struct
> > > drm_dp_mst_port *port, int old_pdt)
> > >  	case DP_PEER_DEVICE_MST_BRANCHING:
> > >  		mstb = port->mstb;
> > >  		port->mstb = NULL;
> > > -		drm_dp_put_mst_branch_device(mstb);
> > > +		drm_dp_mst_topology_put_mstb(mstb);
> > >  		break;
> > >  	}
> > >  }
> > >  
> > >  static void drm_dp_destroy_port(struct kref *kref)
> > >  {
> > > -	struct drm_dp_mst_port *port = container_of(kref, struct
> > > drm_dp_mst_port, kref);
> > > +	struct drm_dp_mst_port *port =
> > > +		container_of(kref, struct drm_dp_mst_port, topology_kref);
> > >  	struct drm_dp_mst_topology_mgr *mgr = port->mgr;
> > >  
> > >  	if (!port->input) {
> > > @@ -956,7 +1099,6 @@ static void drm_dp_destroy_port(struct kref *kref)
> > >  			 * from an EDID retrieval */
> > >  
> > >  			mutex_lock(&mgr->destroy_connector_lock);
> > > -			kref_get(&port->parent->kref);
> > >  			list_add(&port->next, &mgr->destroy_connector_list);
> > >  			mutex_unlock(&mgr->destroy_connector_lock);
> > >  			schedule_work(&mgr->destroy_connector_work);
> > > @@ -967,25 +1109,93 @@ static void drm_dp_destroy_port(struct kref *kref)
> > >  		drm_dp_port_teardown_pdt(port, port->pdt);
> > >  		port->pdt = DP_PEER_DEVICE_NONE;
> > >  	}
> > > -	kfree(port);
> > > +	drm_dp_mst_put_port_malloc(port);
> > >  }
> > >  
> > > -static void drm_dp_put_port(struct drm_dp_mst_port *port)
> > > +/**
> > > + * drm_dp_mst_topology_get_port() - Increment the topology refcount of a
> > > + * port unless its zero
> > > + * @port: &struct drm_dp_mst_port to increment the topology refcount of
> > > + *
> > > + * Attempts to grab a topology reference to @port, if it hasn't yet been
> > > + * removed from the topology (e.g. @port.topology_kref has reached 0).
> > > + *
> > > + * Any topology references acquired with this function must be released
> > > when
> > > + * they are no longer being used by calling
> > > drm_dp_mst_topology_put_port().
> > > + *
> > > + * See also:
> > > + * drm_dp_mst_topology_ref_port()
> > > + * drm_dp_mst_topology_put_port()
> > > + *
> > > + * Returns:
> > > + * * 1: A topology reference was grabbed successfully
> > > + * * 0: @port is no longer in the topology, no reference was grabbed
> > > + */
> > > +static int __must_check
> > > +drm_dp_mst_topology_get_port(struct drm_dp_mst_port *port)
> > >  {
> > > -	kref_put(&port->kref, drm_dp_destroy_port);
> > > +	int ret = kref_get_unless_zero(&port->topology_kref);
> > > +
> > > +	if (ret)
> > > +		DRM_DEBUG("port %p (%d)\n", port,
> > > +			  kref_read(&port->topology_kref));
> > > +
> > > +	return ret;
> > >  }
> > >  
> > > -static struct drm_dp_mst_branch
> > > *drm_dp_mst_get_validated_mstb_ref_locked(struct drm_dp_mst_branch *mstb,
> > > struct drm_dp_mst_branch *to_find)
> > > +/**
> > > + * drm_dp_mst_topology_ref_port() - Increment the topology refcount of a
> > > port
> > > + * @port: The &struct drm_dp_mst_port to increment the topology refcount
> > > of
> > > + *
> > > + * Increments @port.topology_refcount without checking whether or not
> > > it's
> > > + * already reached 0. This is only valid to use in scenarios where you
> > > are
> > > + * already guaranteed to have at least one active topology reference to
> > > @port.
> > > + * Otherwise, drm_dp_mst_topology_get_port() should be used.
> > > + *
> > > + * Any topology references acquired with this function must be released
> > > when
> > > + * they are no longer being used by calling
> > > drm_dp_mst_topology_put_port().
> > > + *
> > > + * See also:
> > > + * drm_dp_mst_topology_get_port()
> > > + * drm_dp_mst_topology_put_port()
> > > + */
> > > +static void drm_dp_mst_topology_ref_port(struct drm_dp_mst_port *port)
> > > +{
> > > +	kref_get(&port->topology_kref);
> > > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->topology_kref));
> > > +}
> > > +
> > > +/**
> > > + * drm_dp_mst_topology_put_port() - release a topology reference to a
> > > port
> > > + * @port: The &struct drm_dp_mst_port to release the topology reference
> > > from
> > > + *
> > > + * Releases a topology reference from @port by decrementing
> > > + * @port.topology_kref.
> > > + *
> > > + * See also:
> > > + * drm_dp_mst_topology_get_port()
> > > + * drm_dp_mst_topology_ref_port()
> > > + */
> > > +static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port)
> > > +{
> > > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->topology_kref)-1);
> > > +	kref_put(&port->topology_kref, drm_dp_destroy_port);
> > > +}
> > > +
> > > +static struct drm_dp_mst_branch *
> > > +drm_dp_mst_topology_get_mstb_validated_locked(struct drm_dp_mst_branch
> > > *mstb,
> > > +					      struct drm_dp_mst_branch
> > > *to_find)
> > >  {
> > >  	struct drm_dp_mst_port *port;
> > >  	struct drm_dp_mst_branch *rmstb;
> > > -	if (to_find == mstb) {
> > > -		kref_get(&mstb->kref);
> > > +
> > > +	if (to_find == mstb)
> > >  		return mstb;
> > > -	}
> > > +
> > >  	list_for_each_entry(port, &mstb->ports, next) {
> > >  		if (port->mstb) {
> > > -			rmstb = drm_dp_mst_get_validated_mstb_ref_locked(port-
> > > >mstb, to_find);
> > > +			rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
> > 
> > I think a prep patch which just renames the current get_validated/put
> > functions to the new names would be really good. Then this patch here with
> > the new stuff.
> > 
> > 
> > > +			    port->mstb, to_find);
> > >  			if (rmstb)
> > >  				return rmstb;
> > >  		}
> > > @@ -993,27 +1203,37 @@ static struct drm_dp_mst_branch
> > > *drm_dp_mst_get_validated_mstb_ref_locked(struct
> > >  	return NULL;
> > >  }
> > >  
> > > -static struct drm_dp_mst_branch *drm_dp_get_validated_mstb_ref(struct
> > > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_branch *mstb)
> > > +static struct drm_dp_mst_branch *
> > > +drm_dp_mst_topology_get_mstb_validated(struct drm_dp_mst_topology_mgr
> > > *mgr,
> > > +				       struct drm_dp_mst_branch *mstb)
> > >  {
> > >  	struct drm_dp_mst_branch *rmstb = NULL;
> > > +
> > >  	mutex_lock(&mgr->lock);
> > > -	if (mgr->mst_primary)
> > > -		rmstb = drm_dp_mst_get_validated_mstb_ref_locked(mgr-
> > > >mst_primary, mstb);
> > > +	if (mgr->mst_primary) {
> > > +		rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
> > > +		    mgr->mst_primary, mstb);
> > > +
> > > +		if (rmstb && !drm_dp_mst_topology_get_mstb(rmstb))
> > > +			rmstb = NULL;
> > > +	}
> > >  	mutex_unlock(&mgr->lock);
> > >  	return rmstb;
> > >  }
> > >  
> > > -static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct
> > > drm_dp_mst_branch *mstb, struct drm_dp_mst_port *to_find)
> > > +static struct drm_dp_mst_port *
> > > +drm_dp_mst_topology_get_port_validated_locked(struct drm_dp_mst_branch
> > > *mstb,
> > > +					      struct drm_dp_mst_port *to_find)
> > >  {
> > >  	struct drm_dp_mst_port *port, *mport;
> > >  
> > >  	list_for_each_entry(port, &mstb->ports, next) {
> > > -		if (port == to_find) {
> > > -			kref_get(&port->kref);
> > > +		if (port == to_find)
> > >  			return port;
> > > -		}
> > > +
> > >  		if (port->mstb) {
> > > -			mport = drm_dp_mst_get_port_ref_locked(port->mstb,
> > > to_find);
> > > +			mport = drm_dp_mst_topology_get_port_validated_locked(
> > > +			    port->mstb, to_find);
> > >  			if (mport)
> > >  				return mport;
> > >  		}
> > > @@ -1021,12 +1241,20 @@ static struct drm_dp_mst_port
> > > *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
> > >  	return NULL;
> > >  }
> > >  
> > > -static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct
> > > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
> > > +static struct drm_dp_mst_port *
> > > +drm_dp_mst_topology_get_port_validated(struct drm_dp_mst_topology_mgr
> > > *mgr,
> > > +				       struct drm_dp_mst_port *port)
> > >  {
> > >  	struct drm_dp_mst_port *rport = NULL;
> > > +
> > >  	mutex_lock(&mgr->lock);
> > > -	if (mgr->mst_primary)
> > > -		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
> > > port);
> > > +	if (mgr->mst_primary) {
> > > +		rport = drm_dp_mst_topology_get_port_validated_locked(
> > > +		    mgr->mst_primary, port);
> > > +
> > > +		if (rport && !drm_dp_mst_topology_get_port(rport))
> > > +			rport = NULL;
> > > +	}
> > >  	mutex_unlock(&mgr->lock);
> > >  	return rport;
> > >  }
> > > @@ -1034,11 +1262,12 @@ static struct drm_dp_mst_port
> > > *drm_dp_get_validated_port_ref(struct drm_dp_mst_t
> > >  static struct drm_dp_mst_port *drm_dp_get_port(struct drm_dp_mst_branch
> > > *mstb, u8 port_num)
> > >  {
> > >  	struct drm_dp_mst_port *port;
> > > +	int ret;
> > >  
> > >  	list_for_each_entry(port, &mstb->ports, next) {
> > >  		if (port->port_num == port_num) {
> > > -			kref_get(&port->kref);
> > > -			return port;
> > > +			ret = drm_dp_mst_topology_get_port(port);
> > > +			return ret ? port : NULL;
> > >  		}
> > >  	}
> > >  
> > > @@ -1087,6 +1316,11 @@ static bool drm_dp_port_setup_pdt(struct
> > > drm_dp_mst_port *port)
> > >  		if (port->mstb) {
> > >  			port->mstb->mgr = port->mgr;
> > >  			port->mstb->port_parent = port;
> > > +			/*
> > > +			 * Make sure this port's memory allocation stays
> > > +			 * around until it's child MSTB releases it
> > > +			 */
> > > +			drm_dp_mst_get_port_malloc(port);
> > >  
> > >  			send_link = true;
> > >  		}
> > > @@ -1147,17 +1381,26 @@ static void drm_dp_add_port(struct
> > > drm_dp_mst_branch *mstb,
> > >  	bool created = false;
> > >  	int old_pdt = 0;
> > >  	int old_ddps = 0;
> > > +
> > >  	port = drm_dp_get_port(mstb, port_msg->port_number);
> > >  	if (!port) {
> > >  		port = kzalloc(sizeof(*port), GFP_KERNEL);
> > >  		if (!port)
> > >  			return;
> > > -		kref_init(&port->kref);
> > > +		kref_init(&port->topology_kref);
> > > +		kref_init(&port->malloc_kref);
> > >  		port->parent = mstb;
> > >  		port->port_num = port_msg->port_number;
> > >  		port->mgr = mstb->mgr;
> > >  		port->aux.name = "DPMST";
> > >  		port->aux.dev = dev->dev;
> > > +
> > > +		/*
> > > +		 * Make sure the memory allocation for our parent branch stays
> > > +		 * around until our own memory allocation is released
> > > +		 */
> > > +		drm_dp_mst_get_mstb_malloc(mstb);
> > > +
> > >  		created = true;
> > >  	} else {
> > >  		old_pdt = port->pdt;
> > > @@ -1177,7 +1420,7 @@ static void drm_dp_add_port(struct drm_dp_mst_branch
> > > *mstb,
> > >  	   for this list */
> > >  	if (created) {
> > >  		mutex_lock(&mstb->mgr->lock);
> > > -		kref_get(&port->kref);
> > > +		drm_dp_mst_topology_ref_port(port);
> > >  		list_add(&port->next, &mstb->ports);
> > >  		mutex_unlock(&mstb->mgr->lock);
> > >  	}
> > > @@ -1202,17 +1445,21 @@ static void drm_dp_add_port(struct
> > > drm_dp_mst_branch *mstb,
> > >  	if (created && !port->input) {
> > >  		char proppath[255];
> > >  
> > > -		build_mst_prop_path(mstb, port->port_num, proppath,
> > > sizeof(proppath));
> > > -		port->connector = (*mstb->mgr->cbs->add_connector)(mstb->mgr,
> > > port, proppath);
> > > +		build_mst_prop_path(mstb, port->port_num, proppath,
> > > +				    sizeof(proppath));
> > > +		port->connector = (*mstb->mgr->cbs->add_connector)(mstb->mgr,
> > > +								   port,
> > > +								   proppath);
> > >  		if (!port->connector) {
> > >  			/* remove it from the port list */
> > >  			mutex_lock(&mstb->mgr->lock);
> > >  			list_del(&port->next);
> > >  			mutex_unlock(&mstb->mgr->lock);
> > >  			/* drop port list reference */
> > > -			drm_dp_put_port(port);
> > > +			drm_dp_mst_topology_put_port(port);
> > >  			goto out;
> > >  		}
> > > +
> > >  		if ((port->pdt == DP_PEER_DEVICE_DP_LEGACY_CONV ||
> > >  		     port->pdt == DP_PEER_DEVICE_SST_SINK) &&
> > >  		    port->port_num >= DP_MST_LOGICAL_PORT_0) {
> > > @@ -1224,7 +1471,7 @@ static void drm_dp_add_port(struct drm_dp_mst_branch
> > > *mstb,
> > >  
> > >  out:
> > >  	/* put reference to this port */
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  }
> > >  
> > >  static void drm_dp_update_port(struct drm_dp_mst_branch *mstb,
> > > @@ -1259,7 +1506,7 @@ static void drm_dp_update_port(struct
> > > drm_dp_mst_branch *mstb,
> > >  			dowork = true;
> > >  	}
> > >  
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  	if (dowork)
> > >  		queue_work(system_long_wq, &mstb->mgr->work);
> > >  
> > > @@ -1270,7 +1517,7 @@ static struct drm_dp_mst_branch
> > > *drm_dp_get_mst_branch_device(struct drm_dp_mst_
> > >  {
> > >  	struct drm_dp_mst_branch *mstb;
> > >  	struct drm_dp_mst_port *port;
> > > -	int i;
> > > +	int i, ret;
> > >  	/* find the port by iterating down */
> > >  
> > >  	mutex_lock(&mgr->lock);
> > > @@ -1295,7 +1542,9 @@ static struct drm_dp_mst_branch
> > > *drm_dp_get_mst_branch_device(struct drm_dp_mst_
> > >  			}
> > >  		}
> > >  	}
> > > -	kref_get(&mstb->kref);
> > > +	ret = drm_dp_mst_topology_get_mstb(mstb);
> > > +	if (!ret)
> > > +		mstb = NULL;
> > >  out:
> > >  	mutex_unlock(&mgr->lock);
> > >  	return mstb;
> > > @@ -1325,19 +1574,22 @@ static struct drm_dp_mst_branch
> > > *get_mst_branch_device_by_guid_helper(
> > >  	return NULL;
> > >  }
> > >  
> > > -static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device_by_guid(
> > > -	struct drm_dp_mst_topology_mgr *mgr,
> > > -	uint8_t *guid)
> > > +static struct drm_dp_mst_branch *
> > > +drm_dp_get_mst_branch_device_by_guid(struct drm_dp_mst_topology_mgr *mgr,
> > > +				     uint8_t *guid)
> > >  {
> > >  	struct drm_dp_mst_branch *mstb;
> > > +	int ret;
> > >  
> > >  	/* find the port by iterating down */
> > >  	mutex_lock(&mgr->lock);
> > >  
> > >  	mstb = get_mst_branch_device_by_guid_helper(mgr->mst_primary, guid);
> > > -
> > > -	if (mstb)
> > > -		kref_get(&mstb->kref);
> > > +	if (mstb) {
> > > +		ret = drm_dp_mst_topology_get_mstb(mstb);
> > > +		if (!ret)
> > > +			mstb = NULL;
> > > +	}
> > >  
> > >  	mutex_unlock(&mgr->lock);
> > >  	return mstb;
> > > @@ -1362,10 +1614,10 @@ static void
> > > drm_dp_check_and_send_link_address(struct drm_dp_mst_topology_mgr *m
> > >  			drm_dp_send_enum_path_resources(mgr, mstb, port);
> > >  
> > >  		if (port->mstb) {
> > > -			mstb_child = drm_dp_get_validated_mstb_ref(mgr, port-
> > > >mstb);
> > > +			mstb_child =
> > > drm_dp_mst_topology_get_mstb_validated(mgr, port->mstb);
> > >  			if (mstb_child) {
> > >  				drm_dp_check_and_send_link_address(mgr,
> > > mstb_child);
> > > -				drm_dp_put_mst_branch_device(mstb_child);
> > > +				drm_dp_mst_topology_put_mstb(mstb_child);
> > >  			}
> > >  		}
> > >  	}
> > > @@ -1375,16 +1627,19 @@ static void drm_dp_mst_link_probe_work(struct
> > > work_struct *work)
> > >  {
> > >  	struct drm_dp_mst_topology_mgr *mgr = container_of(work, struct
> > > drm_dp_mst_topology_mgr, work);
> > >  	struct drm_dp_mst_branch *mstb;
> > > +	int ret;
> > >  
> > >  	mutex_lock(&mgr->lock);
> > >  	mstb = mgr->mst_primary;
> > >  	if (mstb) {
> > > -		kref_get(&mstb->kref);
> > > +		ret = drm_dp_mst_topology_get_mstb(mstb);
> > > +		if (!ret)
> > > +			mstb = NULL;
> > >  	}
> > >  	mutex_unlock(&mgr->lock);
> > >  	if (mstb) {
> > >  		drm_dp_check_and_send_link_address(mgr, mstb);
> > > -		drm_dp_put_mst_branch_device(mstb);
> > > +		drm_dp_mst_topology_put_mstb(mstb);
> > >  	}
> > >  }
> > >  
> > > @@ -1695,22 +1950,32 @@ static struct drm_dp_mst_port
> > > *drm_dp_get_last_connected_port_to_mstb(struct drm
> > >  	return drm_dp_get_last_connected_port_to_mstb(mstb->port_parent-
> > > >parent);
> > >  }
> > >  
> > > -static struct drm_dp_mst_branch
> > > *drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr
> > > *mgr,
> > > -									 struc
> > > t drm_dp_mst_branch *mstb,
> > > -									 int
> > > *port_num)
> > > +static struct drm_dp_mst_branch *
> > > +drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr
> > > *mgr,
> > > +					struct drm_dp_mst_branch *mstb,
> > > +					int *port_num)
> > >  {
> > >  	struct drm_dp_mst_branch *rmstb = NULL;
> > >  	struct drm_dp_mst_port *found_port;
> > > +
> > >  	mutex_lock(&mgr->lock);
> > > -	if (mgr->mst_primary) {
> > > +	if (!mgr->mst_primary)
> > > +		goto out;
> > > +
> > > +	do {
> > >  		found_port = drm_dp_get_last_connected_port_to_mstb(mstb);
> > > +		if (!found_port)
> > > +			break;
> > >  
> > > -		if (found_port) {
> > > +		if (drm_dp_mst_topology_get_mstb(found_port->parent)) {
> > >  			rmstb = found_port->parent;
> > > -			kref_get(&rmstb->kref);
> > >  			*port_num = found_port->port_num;
> > > +		} else {
> > > +			/* Search again, starting from this parent */
> > > +			mstb = found_port->parent;
> > >  		}
> > > -	}
> > > +	} while (!rmstb);
> > 
> > Hm, is this a bugfix of validating the entire chain? Afaiui the new
> > topology_get still validates the entire chain, so I'm a bit confused what
> > this does here.
> 
> JFYI: I'm assuming you meant the old get_validated() functions. I mentioned in
> the cover letter for this series that I wasn't sure if we still needed them,
> but on closer inspection I think we still do since they perform the actual
> validation of the whole topology chain. drm_dp_mst_topology_get_(port|mstb)()
> just increments the topology refcount safely.

Yeah I mixed this up with the old get_validated, but this is only used in
payload_send_mgs.

> The change you're seeing here is because since we didn't use
> kref_get_unless_zero() before, we'd just go up the topology path above mstb(),
> then kref the first thing we find that we think is still connected to the
> topology (I honestly don't know how/if this ever worked), then give it a kref
> and return it. Now that we use kref_get_unless_zero(), we have to deal with
> the fact that the kref could fail, which would happen if we just retrieved a
> parent mstb or port that is also disconnected from the topology. So, the only
> way to do that is to find what we think is the last connected mstb, check if
> it actually is, then restart the search from that mstb if the kref failed and
> it's not connected to the topology.
> 
> That being said, I've been wondering about figuring out spots like this where
> we probably also need to follow that up with "also make sure all of the
> parents of this 'connected' topology device are also valid", since it's quite
> possible we could run into a scenario like this:
> 
> Step 1:
> MSTB 1
>   |- Port 1
>   |- Port 2
>   |- Port 3
>      |- MSTB 2 ← (just unplugged, top refcount == 0)
>         |- Port 4 ← (also unplugged, but top refcount not updated yet)
>         |- Port 5 ← (same thing ^)
>         |- Port 6 ← (same thing ^)
>            |- MSTB 3
>               |- Port 7
>               |- Port 8
>               |- Port 9
>                  ^
>       drm_dp_get_last_connected_port_to_mstb()
>       travels up to Port 6, assumes Port 6 is valid because it's top refcount
>       != 0
> 
> Now that I type all of that out though, I think we could also fix that fairly
> easily by instead just adding a topology_state mutex, and adding a variable to
> denote whether or not a port is actually still part of a topology.
> 
> Maybe that also means we should come up with a different name for
> topology_refcount, resource_refcount maybe?

Hm. Fixing refcounting/lifetime issues with locking is ime a bad idea.
I'm also not sure we fundamentally need to fix this, since even if we make
the internal representation race-free, the real world can still get
unplugged whenever it feels like. So fundamentally the payload_send_msg
function needs to be able to deal with races.

So not sure whether (or why) we need your change here.

Can we instead just fail if we've raced and give up? Not sure why we need
to send out the message still if the recipient is gone ...
-Daniel

> > 
> 
> > > +out:
> > >  	mutex_unlock(&mgr->lock);
> > >  	return rmstb;
> > >  }
> > > @@ -1726,17 +1991,19 @@ static int drm_dp_payload_send_msg(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  	u8 sinks[DRM_DP_MAX_SDP_STREAMS];
> > >  	int i;
> > >  
> > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > >  	if (!port)
> > >  		return -EINVAL;
> > >  
> > >  	port_num = port->port_num;
> > > -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> > > +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
> > >  	if (!mstb) {
> > > -		mstb = drm_dp_get_last_connected_port_and_mstb(mgr, port-
> > > >parent, &port_num);
> > > +		mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> > > +							       port->parent,
> > > +							       &port_num);
> > >  
> > >  		if (!mstb) {
> > > -			drm_dp_put_port(port);
> > > +			drm_dp_mst_topology_put_port(port);
> > >  			return -EINVAL;
> > >  		}
> > >  	}
> > > @@ -1766,8 +2033,8 @@ static int drm_dp_payload_send_msg(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  	}
> > >  	kfree(txmsg);
> > >  fail_put:
> > > -	drm_dp_put_mst_branch_device(mstb);
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_mstb(mstb);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  	return ret;
> > >  }
> > >  
> > > @@ -1777,13 +2044,13 @@ int drm_dp_send_power_updown_phy(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  	struct drm_dp_sideband_msg_tx *txmsg;
> > >  	int len, ret;
> > >  
> > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > >  	if (!port)
> > >  		return -EINVAL;
> > >  
> > >  	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> > >  	if (!txmsg) {
> > > -		drm_dp_put_port(port);
> > > +		drm_dp_mst_topology_put_port(port);
> > >  		return -ENOMEM;
> > >  	}
> > >  
> > > @@ -1799,7 +2066,7 @@ int drm_dp_send_power_updown_phy(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  			ret = 0;
> > >  	}
> > >  	kfree(txmsg);
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  
> > >  	return ret;
> > >  }
> > > @@ -1888,7 +2155,8 @@ int drm_dp_update_payload_part1(struct
> > > drm_dp_mst_topology_mgr *mgr)
> > >  		if (vcpi) {
> > >  			port = container_of(vcpi, struct drm_dp_mst_port,
> > >  					    vcpi);
> > > -			port = drm_dp_get_validated_port_ref(mgr, port);
> > > +			port = drm_dp_mst_topology_get_port_validated(mgr,
> > > +								      port);
> > >  			if (!port) {
> > >  				mutex_unlock(&mgr->payload_lock);
> > >  				return -EINVAL;
> > > @@ -1925,7 +2193,7 @@ int drm_dp_update_payload_part1(struct
> > > drm_dp_mst_topology_mgr *mgr)
> > >  		cur_slots += req_payload.num_slots;
> > >  
> > >  		if (port)
> > > -			drm_dp_put_port(port);
> > > +			drm_dp_mst_topology_put_port(port);
> > >  	}
> > >  
> > >  	for (i = 0; i < mgr->max_payloads; i++) {
> > > @@ -2024,7 +2292,7 @@ static int drm_dp_send_dpcd_write(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  	struct drm_dp_sideband_msg_tx *txmsg;
> > >  	struct drm_dp_mst_branch *mstb;
> > >  
> > > -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> > > +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
> > >  	if (!mstb)
> > >  		return -EINVAL;
> > >  
> > > @@ -2048,7 +2316,7 @@ static int drm_dp_send_dpcd_write(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  	}
> > >  	kfree(txmsg);
> > >  fail_put:
> > > -	drm_dp_put_mst_branch_device(mstb);
> > > +	drm_dp_mst_topology_put_mstb(mstb);
> > >  	return ret;
> > >  }
> > >  
> > > @@ -2158,7 +2426,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct
> > > drm_dp_mst_topology_mgr *mgr, bool ms
> > >  
> > >  		/* give this the main reference */
> > >  		mgr->mst_primary = mstb;
> > > -		kref_get(&mgr->mst_primary->kref);
> > > +		drm_dp_mst_topology_ref_mstb(mgr->mst_primary);
> > >  
> > >  		ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
> > >  							 DP_MST_EN |
> > > DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC);
> > > @@ -2192,7 +2460,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct
> > > drm_dp_mst_topology_mgr *mgr, bool ms
> > >  out_unlock:
> > >  	mutex_unlock(&mgr->lock);
> > >  	if (mstb)
> > > -		drm_dp_put_mst_branch_device(mstb);
> > > +		drm_dp_mst_topology_put_mstb(mstb);
> > >  	return ret;
> > >  
> > >  }
> > > @@ -2357,7 +2625,7 @@ static int drm_dp_mst_handle_down_rep(struct
> > > drm_dp_mst_topology_mgr *mgr)
> > >  			       mgr->down_rep_recv.initial_hdr.lct,
> > >  				      mgr->down_rep_recv.initial_hdr.rad[0],
> > >  				      mgr->down_rep_recv.msg[0]);
> > > -			drm_dp_put_mst_branch_device(mstb);
> > > +			drm_dp_mst_topology_put_mstb(mstb);
> > >  			memset(&mgr->down_rep_recv, 0, sizeof(struct
> > > drm_dp_sideband_msg_rx));
> > >  			return 0;
> > >  		}
> > > @@ -2368,7 +2636,7 @@ static int drm_dp_mst_handle_down_rep(struct
> > > drm_dp_mst_topology_mgr *mgr)
> > >  		}
> > >  
> > >  		memset(&mgr->down_rep_recv, 0, sizeof(struct
> > > drm_dp_sideband_msg_rx));
> > > -		drm_dp_put_mst_branch_device(mstb);
> > > +		drm_dp_mst_topology_put_mstb(mstb);
> > >  
> > >  		mutex_lock(&mgr->qlock);
> > >  		txmsg->state = DRM_DP_SIDEBAND_TX_RX;
> > > @@ -2441,7 +2709,7 @@ static int drm_dp_mst_handle_up_req(struct
> > > drm_dp_mst_topology_mgr *mgr)
> > >  		}
> > >  
> > >  		if (mstb)
> > > -			drm_dp_put_mst_branch_device(mstb);
> > > +			drm_dp_mst_topology_put_mstb(mstb);
> > >  
> > >  		memset(&mgr->up_req_recv, 0, sizeof(struct
> > > drm_dp_sideband_msg_rx));
> > >  	}
> > > @@ -2501,7 +2769,7 @@ enum drm_connector_status
> > > drm_dp_mst_detect_port(struct drm_connector *connector
> > >  	enum drm_connector_status status = connector_status_disconnected;
> > >  
> > >  	/* we need to search for the port in the mgr in case its gone */
> > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > >  	if (!port)
> > >  		return connector_status_disconnected;
> > >  
> > > @@ -2526,7 +2794,7 @@ enum drm_connector_status
> > > drm_dp_mst_detect_port(struct drm_connector *connector
> > >  		break;
> > >  	}
> > >  out:
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  	return status;
> > >  }
> > >  EXPORT_SYMBOL(drm_dp_mst_detect_port);
> > > @@ -2543,11 +2811,11 @@ bool drm_dp_mst_port_has_audio(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  {
> > >  	bool ret = false;
> > >  
> > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > >  	if (!port)
> > >  		return ret;
> > >  	ret = port->has_audio;
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  	return ret;
> > >  }
> > >  EXPORT_SYMBOL(drm_dp_mst_port_has_audio);
> > > @@ -2567,7 +2835,7 @@ struct edid *drm_dp_mst_get_edid(struct
> > > drm_connector *connector, struct drm_dp_
> > >  	struct edid *edid = NULL;
> > >  
> > >  	/* we need to search for the port in the mgr in case its gone */
> > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > >  	if (!port)
> > >  		return NULL;
> > >  
> > > @@ -2578,7 +2846,7 @@ struct edid *drm_dp_mst_get_edid(struct
> > > drm_connector *connector, struct drm_dp_
> > >  		drm_connector_set_tile_property(connector);
> > >  	}
> > >  	port->has_audio = drm_detect_monitor_audio(edid);
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  	return edid;
> > >  }
> > >  EXPORT_SYMBOL(drm_dp_mst_get_edid);
> > > @@ -2649,7 +2917,7 @@ int drm_dp_atomic_find_vcpi_slots(struct
> > > drm_atomic_state *state,
> > >  	if (IS_ERR(topology_state))
> > >  		return PTR_ERR(topology_state);
> > >  
> > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > >  	if (port == NULL)
> > >  		return -EINVAL;
> > >  	req_slots = DIV_ROUND_UP(pbn, mgr->pbn_div);
> > > @@ -2657,14 +2925,14 @@ int drm_dp_atomic_find_vcpi_slots(struct
> > > drm_atomic_state *state,
> > >  			req_slots, topology_state->avail_slots);
> > >  
> > >  	if (req_slots > topology_state->avail_slots) {
> > > -		drm_dp_put_port(port);
> > > +		drm_dp_mst_topology_put_port(port);
> > >  		return -ENOSPC;
> > >  	}
> > >  
> > >  	topology_state->avail_slots -= req_slots;
> > >  	DRM_DEBUG_KMS("vcpi slots avail=%d", topology_state->avail_slots);
> > >  
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  	return req_slots;
> > >  }
> > >  EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
> > > @@ -2715,7 +2983,7 @@ bool drm_dp_mst_allocate_vcpi(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  {
> > >  	int ret;
> > >  
> > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > >  	if (!port)
> > >  		return false;
> > >  
> > > @@ -2725,7 +2993,7 @@ bool drm_dp_mst_allocate_vcpi(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  	if (port->vcpi.vcpi > 0) {
> > >  		DRM_DEBUG_KMS("payload: vcpi %d already allocated for pbn %d -
> > > requested pbn %d\n", port->vcpi.vcpi, port->vcpi.pbn, pbn);
> > >  		if (pbn == port->vcpi.pbn) {
> > > -			drm_dp_put_port(port);
> > > +			drm_dp_mst_topology_put_port(port);
> > >  			return true;
> > >  		}
> > >  	}
> > > @@ -2733,13 +3001,13 @@ bool drm_dp_mst_allocate_vcpi(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  	ret = drm_dp_init_vcpi(mgr, &port->vcpi, pbn, slots);
> > >  	if (ret) {
> > >  		DRM_DEBUG_KMS("failed to init vcpi slots=%d max=63 ret=%d\n",
> > > -				DIV_ROUND_UP(pbn, mgr->pbn_div), ret);
> > > +			      DIV_ROUND_UP(pbn, mgr->pbn_div), ret);
> > >  		goto out;
> > >  	}
> > >  	DRM_DEBUG_KMS("initing vcpi for pbn=%d slots=%d\n",
> > > -			pbn, port->vcpi.num_slots);
> > > +		      pbn, port->vcpi.num_slots);
> > >  
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  	return true;
> > >  out:
> > >  	return false;
> > > @@ -2749,12 +3017,12 @@ EXPORT_SYMBOL(drm_dp_mst_allocate_vcpi);
> > >  int drm_dp_mst_get_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr, struct
> > > drm_dp_mst_port *port)
> > >  {
> > >  	int slots = 0;
> > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > >  	if (!port)
> > >  		return slots;
> > >  
> > >  	slots = port->vcpi.num_slots;
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  	return slots;
> > >  }
> > >  EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
> > > @@ -2768,11 +3036,11 @@ EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
> > >   */
> > >  void drm_dp_mst_reset_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr,
> > > struct drm_dp_mst_port *port)
> > >  {
> > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > >  	if (!port)
> > >  		return;
> > >  	port->vcpi.num_slots = 0;
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  }
> > >  EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> > >  
> > > @@ -2781,9 +3049,10 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> > >   * @mgr: manager for this port
> > >   * @port: unverified port to deallocate vcpi for
> > >   */
> > > -void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
> > > struct drm_dp_mst_port *port)
> > > +void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
> > > +				struct drm_dp_mst_port *port)
> > >  {
> > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > >  	if (!port)
> > >  		return;
> > >  
> > > @@ -2792,7 +3061,7 @@ void drm_dp_mst_deallocate_vcpi(struct
> > > drm_dp_mst_topology_mgr *mgr, struct drm_
> > >  	port->vcpi.pbn = 0;
> > >  	port->vcpi.aligned_pbn = 0;
> > >  	port->vcpi.vcpi = 0;
> > > -	drm_dp_put_port(port);
> > > +	drm_dp_mst_topology_put_port(port);
> > >  }
> > >  EXPORT_SYMBOL(drm_dp_mst_deallocate_vcpi);
> > >  
> > > @@ -3078,8 +3347,10 @@ static void drm_dp_tx_work(struct work_struct
> > > *work)
> > >  
> > >  static void drm_dp_free_mst_port(struct kref *kref)
> > >  {
> > > -	struct drm_dp_mst_port *port = container_of(kref, struct
> > > drm_dp_mst_port, kref);
> > > -	kref_put(&port->parent->kref, drm_dp_free_mst_branch_device);
> > > +	struct drm_dp_mst_port *port =
> > > +		container_of(kref, struct drm_dp_mst_port, malloc_kref);
> > > +
> > > +	drm_dp_mst_put_mstb_malloc(port->parent);
> > >  	kfree(port);
> > >  }
> > >  
> > > @@ -3103,7 +3374,6 @@ static void drm_dp_destroy_connector_work(struct
> > > work_struct *work)
> > >  		list_del(&port->next);
> > >  		mutex_unlock(&mgr->destroy_connector_lock);
> > >  
> > > -		kref_init(&port->kref);
> > >  		INIT_LIST_HEAD(&port->next);
> > >  
> > >  		mgr->cbs->destroy_connector(mgr, port->connector);
> > > @@ -3117,7 +3387,7 @@ static void drm_dp_destroy_connector_work(struct
> > > work_struct *work)
> > >  			drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
> > >  		}
> > >  
> > > -		kref_put(&port->kref, drm_dp_free_mst_port);
> > > +		drm_dp_mst_put_port_malloc(port);
> > >  		send_hotplug = true;
> > >  	}
> > >  	if (send_hotplug)
> > > @@ -3292,7 +3562,7 @@ static int drm_dp_mst_i2c_xfer(struct i2c_adapter
> > > *adapter, struct i2c_msg *msgs
> > >  	struct drm_dp_sideband_msg_tx *txmsg = NULL;
> > >  	int ret;
> > >  
> > > -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> > > +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
> > >  	if (!mstb)
> > >  		return -EREMOTEIO;
> > >  
> > > @@ -3342,7 +3612,7 @@ static int drm_dp_mst_i2c_xfer(struct i2c_adapter
> > > *adapter, struct i2c_msg *msgs
> > >  	}
> > >  out:
> > >  	kfree(txmsg);
> > > -	drm_dp_put_mst_branch_device(mstb);
> > > +	drm_dp_mst_topology_put_mstb(mstb);
> > >  	return ret;
> > >  }
> > >  
> > > diff --git a/include/drm/drm_dp_mst_helper.h
> > > b/include/drm/drm_dp_mst_helper.h
> > > index 371cc2816477..50643a39765d 100644
> > > --- a/include/drm/drm_dp_mst_helper.h
> > > +++ b/include/drm/drm_dp_mst_helper.h
> > > @@ -44,7 +44,10 @@ struct drm_dp_vcpi {
> > >  
> > >  /**
> > >   * struct drm_dp_mst_port - MST port
> > > - * @kref: reference count for this port.
> > > + * @topology_kref: refcount for this port's lifetime in the topology,
> > > only the
> > > + * DP MST helpers should need to touch this
> > > + * @malloc_kref: refcount for the memory allocation containing this
> > > structure.
> > > + * See drm_dp_mst_get_port_malloc() and drm_dp_mst_put_port_malloc().
> > >   * @port_num: port number
> > >   * @input: if this port is an input port.
> > >   * @mcs: message capability status - DP 1.2 spec.
> > > @@ -67,7 +70,8 @@ struct drm_dp_vcpi {
> > >   * in the MST topology.
> > >   */
> > >  struct drm_dp_mst_port {
> > > -	struct kref kref;
> > > +	struct kref topology_kref;
> > > +	struct kref malloc_kref;
> > 
> > I'd to inline member kerneldoc here (you can mix&match, so no need to
> > rewrite them all) and spend a few words reference the family of get/put
> > functions. Same for mstb below.
> > 
> > >  
> > >  	u8 port_num;
> > >  	bool input;
> > > @@ -102,7 +106,10 @@ struct drm_dp_mst_port {
> > >  
> > >  /**
> > >   * struct drm_dp_mst_branch - MST branch device.
> > > - * @kref: reference count for this port.
> > > + * @topology_kref: refcount for this branch device's lifetime in the
> > > topology,
> > > + * only the DP MST helpers should need to touch this
> > > + * @malloc_kref: refcount for the memory allocation containing this
> > > structure.
> > > + * See drm_dp_mst_get_mstb_malloc() and drm_dp_mst_put_mstb_malloc().
> > >   * @rad: Relative Address to talk to this branch device.
> > >   * @lct: Link count total to talk to this branch device.
> > >   * @num_ports: number of ports on the branch.
> > > @@ -121,7 +128,8 @@ struct drm_dp_mst_port {
> > >   * to downstream port of parent branches.
> > >   */
> > >  struct drm_dp_mst_branch {
> > > -	struct kref kref;
> > > +	struct kref topology_kref;
> > > +	struct kref malloc_kref;
> > >  	u8 rad[8];
> > >  	u8 lct;
> > >  	int num_ports;
> > > @@ -626,4 +634,7 @@ int drm_dp_atomic_release_vcpi_slots(struct
> > > drm_atomic_state *state,
> > >  int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
> > >  				 struct drm_dp_mst_port *port, bool power_up);
> > >  
> > > +void drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port);
> > > +void drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port);
> > > +
> > >  #endif
> > > -- 
> > > 2.19.2
> > 
> > I really like. Mostly concentrated on looking at the docs. Also still
> > need to apply it and build the docs, so I can appreciate the DOT graphs.
> > -Daniel
> -- 
> Cheers,
> 	Lyude Paul
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 06/15] drm/i915: Keep malloc references to MST ports
  2018-12-18 21:52     ` Lyude Paul
@ 2018-12-19 13:20       ` Daniel Vetter
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2018-12-19 13:20 UTC (permalink / raw)
  To: Lyude Paul
  Cc: Daniel Vetter, dri-devel, nouveau, intel-gfx, amd-gfx,
	Dave Airlie, Harry Wentland, Jerry Zuo, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, David Airlie, linux-kernel

On Tue, Dec 18, 2018 at 04:52:24PM -0500, Lyude Paul wrote:
> On Fri, 2018-12-14 at 10:32 +0100, Daniel Vetter wrote:
> > On Thu, Dec 13, 2018 at 08:25:35PM -0500, Lyude Paul wrote:
> > > So that the ports stay around until we've destroyed the connectors, in
> > > order to ensure that we don't pass an invalid pointer to any MST helpers
> > > once we introduce the new MST VCPI helpers.
> > > 
> > > Signed-off-by: Lyude Paul <lyude@redhat.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_connector.c | 4 ++++
> > >  drivers/gpu/drm/i915/intel_dp_mst.c    | 2 ++
> > >  2 files changed, 6 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_connector.c
> > > b/drivers/gpu/drm/i915/intel_connector.c
> > > index 18e370f607bc..37d2c644f4b8 100644
> > > --- a/drivers/gpu/drm/i915/intel_connector.c
> > > +++ b/drivers/gpu/drm/i915/intel_connector.c
> > > @@ -95,6 +95,10 @@ void intel_connector_destroy(struct drm_connector
> > > *connector)
> > >  	intel_panel_fini(&intel_connector->panel);
> > >  
> > >  	drm_connector_cleanup(connector);
> > > +
> > > +	if (intel_connector->port)
> > 
> > We set this as the first thing in intel_dp_add_mst_connector, so this
> > check isn't doing anything.
> 
> I think this comment might be a mistake? intel_connector_destroy() is shared
> by all of the intel connectors, so some may not have a value in
> intel_connector->port. I can move it to it's own destroy callback for MST if
> you would prefer though.

Oops, didn't look at the patch carefully enough. I think a
intel_dp_mst_connector_destroy would be useful in this case. But a bit a
bikeshed, so up to you.
-Daniel

> > 
> > > +		drm_dp_mst_put_port_malloc(intel_connector->port);
> > > +
> > >  	kfree(connector);
> > >  }
> > >  
> > > diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c
> > > b/drivers/gpu/drm/i915/intel_dp_mst.c
> > > index f05427b74e34..4d6ced34d465 100644
> > > --- a/drivers/gpu/drm/i915/intel_dp_mst.c
> > > +++ b/drivers/gpu/drm/i915/intel_dp_mst.c
> > > @@ -484,6 +484,8 @@ static struct drm_connector
> > > *intel_dp_add_mst_connector(struct drm_dp_mst_topolo
> > >  	if (ret)
> > >  		goto err;
> > >  
> > > +	drm_dp_mst_get_port_malloc(port);
> > 
> > Needs to be moved up where we assing intel_connector->port, or it'll
> > underflow on cleanup on error paths.
> > 
> > > +
> > >  	return connector;
> > >  
> > >  err:
> > > -- 
> > > 2.19.2
> > > 
> -- 
> Cheers,
> 	Lyude Paul
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports
  2018-12-19 12:48       ` Daniel Vetter
@ 2018-12-19 18:36         ` Lyude Paul
  0 siblings, 0 replies; 31+ messages in thread
From: Lyude Paul @ 2018-12-19 18:36 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, nouveau, intel-gfx, amd-gfx, Dave Airlie,
	Harry Wentland, Jerry Zuo, David Airlie, Juston Li,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, David Airlie,
	Jonathan Corbet, linux-doc, linux-kernel

On Wed, 2018-12-19 at 13:48 +0100, Daniel Vetter wrote:
> On Tue, Dec 18, 2018 at 04:27:58PM -0500, Lyude Paul wrote:
> > On Fri, 2018-12-14 at 10:29 +0100, Daniel Vetter wrote:
> > > On Thu, Dec 13, 2018 at 08:25:32PM -0500, Lyude Paul wrote:
> > > > The current way of handling refcounting in the DP MST helpers is
> > > > really
> > > > confusing and probably just plain wrong because it's been hacked up
> > > > many
> > > > times over the years without anyone actually going over the code and
> > > > seeing if things could be simplified.
> > > > 
> > > > To the best of my understanding, the current scheme works like this:
> > > > drm_dp_mst_port and drm_dp_mst_branch both have a single refcount.
> > > > When
> > > > this refcount hits 0 for either of the two, they're removed from the
> > > > topology state, but not immediately freed. Both ports and branch
> > > > devices
> > > > will reinitialize their kref once it's hit 0 before actually
> > > > destroying
> > > > themselves. The intended purpose behind this is so that we can avoid
> > > > problems like not being able to free a remote payload that might still
> > > > be active, due to us having removed all of the port/branch device
> > > > structures in memory, as per:
> > > > 
> > > > 91a25e463130 ("drm/dp/mst: deallocate payload on port destruction")
> > > > 
> > > > Which may have worked, but then it caused use-after-free errors. Being
> > > > new to MST at the time, I tried fixing it;
> > > > 
> > > > 263efde31f97 ("drm/dp/mst: Get validated port ref in
> > > > drm_dp_update_payload_part1()")
> > > > 
> > > > But, that was broken: both drm_dp_mst_port and drm_dp_mst_branch
> > > > structs
> > > > are validated in almost every DP MST helper function. Simply put, this
> > > > means we go through the topology and try to see if the given
> > > > drm_dp_mst_branch or drm_dp_mst_port is still attached to something
> > > > before trying to use it in order to avoid dereferencing freed memory
> > > > (something that has happened a LOT in the past with this library).
> > > > Because of this it doesn't actually matter whether or not we keep keep
> > > > the ports and branches around in memory as that's not enough, because
> > > > any function that validates the branches and ports passed to it will
> > > > still reject them anyway since they're no longer in the topology
> > > > structure. So, use-after-free errors were fixed but payload
> > > > deallocation
> > > > was completely broken.
> > > > 
> > > > Two years later, AMD informed me about this issue and I attempted to
> > > > come up with a temporary fix, pending a long-overdue cleanup of this
> > > > library:
> > > > 
> > > > c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction,
> > > > just
> > > > ref")
> > > > 
> > > > But then that introduced use-after-free errors, so I quickly reverted
> > > > it:
> > > > 
> > > > 9765635b3075 ("Revert "drm/dp_mst: Skip validating ports during
> > > > destruction, just ref"")
> > > > 
> > > > And in the process, learned that there is just no simple fix for this:
> > > > the design is just broken. Unfortuntely, the usage of these helpers
> > > > are
> > > > quite broken as well. Some drivers like i915 have been smart enough to
> > > > avoid accessing any kind of information from MST port structures, but
> > > > others like nouveau have assumed, understandably so, that
> > > > drm_dp_mst_port structures are normal and can just be accessed at any
> > > > time without worrying about use-after-free errors.
> > > > 
> > > > After a lot of discussion, me and Daniel Vetter came up with a better
> > > > idea to replace all of this.
> > > > 
> > > > To summarize, since this is documented far more indepth in the
> > > > documentation this patch introduces, we make it so that
> > > > drm_dp_mst_port
> > > > and drm_dp_mst_branch structures have two different classes of
> > > > refcounts: topology_kref, and malloc_kref. topology_kref corresponds
> > > > to
> > > > the lifetime of the given drm_dp_mst_port or drm_dp_mst_branch in it's
> > > > given topology. Once it hits zero, any associated connectors are
> > > > removed
> > > > and the branch or port can no longer be validated. malloc_kref
> > > > corresponds to the lifetime of the memory allocation for the actual
> > > > structure, and will always be non-zero so long as the topology_kref is
> > > > non-zero. This gives us a way to allow callers to hold onto port and
> > > > branch device structures past their topology lifetime, and
> > > > dramatically
> > > > simplifies the lifetimes of both structures. This also finally fixes
> > > > the
> > > > port deallocation problem, properly.
> > > > 
> > > > Additionally: since this now means that we can keep ports and branch
> > > > devices allocated in memory for however long we need, we no longer
> > > > need
> > > > a significant amount of the port validation that we currently do.
> > > > 
> > > > Additionally, there is one last scenario that this fixes, which
> > > > couldn't
> > > > have been fixed properly beforehand:
> > > > 
> > > > - CPU1 unrefs port from topology (refcount 1->0)
> > > > - CPU2 refs port in topology(refcount 0->1)
> > > > 
> > > > Since we now can guarantee memory safety for ports and branches
> > > > as-needed, we also can make our main reference counting functions fix
> > > > this problem by using kref_get_unless_zero() internally so that
> > > > topology
> > > > refcounts can only ever reach 0 once.
> > > > 
> > > > Signed-off-by: Lyude Paul <lyude@redhat.com>
> > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > > Cc: David Airlie <airlied@redhat.com>
> > > > Cc: Jerry Zuo <Jerry.Zuo@amd.com>
> > > > Cc: Harry Wentland <harry.wentland@amd.com>
> > > > Cc: Juston Li <juston.li@intel.com>
> > > > ---
> > > >  .../gpu/dp-mst/topology-figure-1.dot          |  31 ++
> > > >  .../gpu/dp-mst/topology-figure-2.dot          |  37 ++
> > > >  .../gpu/dp-mst/topology-figure-3.dot          |  40 ++
> > > >  Documentation/gpu/drm-kms-helpers.rst         | 125 ++++-
> > > >  drivers/gpu/drm/drm_dp_mst_topology.c         | 512 +++++++++++++--
> > > > ---
> > > >  include/drm/drm_dp_mst_helper.h               |  19 +-
> > > >  6 files changed, 637 insertions(+), 127 deletions(-)
> > > >  create mode 100644 Documentation/gpu/dp-mst/topology-figure-1.dot
> > > >  create mode 100644 Documentation/gpu/dp-mst/topology-figure-2.dot
> > > >  create mode 100644 Documentation/gpu/dp-mst/topology-figure-3.dot
> > > 
> > > Yay, docs, and pretty ones at that! Awesome stuff :-)
> > > 
> > > > diff --git a/Documentation/gpu/dp-mst/topology-figure-1.dot
> > > > b/Documentation/gpu/dp-mst/topology-figure-1.dot
> > > > new file mode 100644
> > > > index 000000000000..fb83789e0a3e
> > > > --- /dev/null
> > > > +++ b/Documentation/gpu/dp-mst/topology-figure-1.dot
> > > > @@ -0,0 +1,31 @@
> > > > +digraph T {
> > > > +    /* Topology references */
> > > > +    node [shape=oval];
> > > > +    mstb1 -> {port1, port2};
> > > > +    port1 -> mstb2;
> > > > +    port2 -> mstb3 -> {port3, port4};
> > > > +    port3 -> mstb4;
> > > > +
> > > > +    /* Malloc references */
> > > > +    edge [style=dashed];
> > > > +    mstb4 -> port3;
> > > > +    {port4, port3} -> mstb3;
> > > > +    mstb3 -> port2;
> > > > +    mstb2 -> port1;
> > > > +    {port1, port2} -> mstb1;
> > > > +
> > > > +    edge [dir=back];
> > > > +    node [style=filled;shape=box;fillcolor=lightblue];
> > > > +    port1 -> "Payload #1";
> > > > +    port3 -> "Payload #2";
> > > > +
> > > > +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> > > > +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> > > > +    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
> > > > +    mstb4 [label="MSTB #4";style=filled;fillcolor=palegreen];
> > > > +
> > > > +    port1 [label="Port #1"];
> > > > +    port2 [label="Port #2"];
> > > > +    port3 [label="Port #3"];
> > > > +    port4 [label="Port #4"];
> > > > +}
> > > > diff --git a/Documentation/gpu/dp-mst/topology-figure-2.dot
> > > > b/Documentation/gpu/dp-mst/topology-figure-2.dot
> > > > new file mode 100644
> > > > index 000000000000..eebce560be40
> > > > --- /dev/null
> > > > +++ b/Documentation/gpu/dp-mst/topology-figure-2.dot
> > > > @@ -0,0 +1,37 @@
> > > > +digraph T {
> > > > +    /* Topology references */
> > > > +    node [shape=oval];
> > > > +
> > > > +    mstb1 -> {port1, port2};
> > > > +    port1 -> mstb2;
> > > > +    edge [color=red];
> > > > +    port2 -> mstb3 -> {port3, port4};
> > > > +    port3 -> mstb4;
> > > > +    edge [color=""];
> > > > +
> > > > +    /* Malloc references */
> > > > +    edge [style=dashed];
> > > > +    port3 -> mstb3;
> > > > +    mstb3 -> port2;
> > > > +    mstb2 -> port1;
> > > > +    {port1, port2} -> mstb1;
> > > > +    edge [color=red];
> > > > +    mstb4 -> port3;
> > > > +    port4 -> mstb3;
> > > > +    edge [color=""];
> > > > +
> > > > +    edge [dir=back];
> > > > +    node [style=filled;shape=box;fillcolor=lightblue];
> > > > +    port1 -> "Payload #1";
> > > > +    port3 -> "Payload #2";
> > > > +
> > > > +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> > > > +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> > > > +    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
> > > > +    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
> > > > +
> > > > +    port1 [label="Port #1"];
> > > > +    port2 [label="Port #2"];
> > > > +    port3 [label="Port #3"];
> > > > +    port4 [label="Port #4";style=filled;fillcolor=grey];
> > > > +}
> > > > diff --git a/Documentation/gpu/dp-mst/topology-figure-3.dot
> > > > b/Documentation/gpu/dp-mst/topology-figure-3.dot
> > > > new file mode 100644
> > > > index 000000000000..9bf28d87144c
> > > > --- /dev/null
> > > > +++ b/Documentation/gpu/dp-mst/topology-figure-3.dot
> > > > @@ -0,0 +1,40 @@
> > > > +digraph T {
> > > > +    /* Topology references */
> > > > +    node [shape=oval];
> > > > +
> > > > +    mstb1 -> {port1, port2};
> > > > +    port1 -> mstb2;
> > > > +    edge [color=grey];
> > > > +    port2 -> mstb3 -> {port3, port4};
> > > > +    port3 -> mstb4;
> > > > +    edge [color=""];
> > > > +
> > > > +    /* Malloc references */
> > > > +    edge [style=dashed];
> > > > +    port3 -> mstb3 [penwidth=3];
> > > > +    mstb3 -> port2 [penwidth=3];
> > > > +    mstb2 -> port1;
> > > > +    {port1, port2} -> mstb1;
> > > > +    edge [color=grey];
> > > > +    mstb4 -> port3;
> > > > +    port4 -> mstb3;
> > > > +    edge [color=""];
> > > > +
> > > > +    edge [dir=back];
> > > > +    node [style=filled;shape=box;fillcolor=lightblue];
> > > > +    port1 -> payload1;
> > > > +    port3 -> payload2 [penwidth=3];
> > > > +
> > > > +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> > > > +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> > > > +    mstb3 [label="MSTB
> > > > #3";penwidth=3;style=filled;fillcolor=palegreen];
> > > > +    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
> > > > +
> > > > +    port1 [label="Port #1"];
> > > > +    port2 [label="Port #2";penwidth=3];
> > > > +    port3 [label="Port #3";penwidth=3];
> > > > +    port4 [label="Port #4";style=filled;fillcolor=grey];
> > > > +
> > > > +    payload1 [label="Payload #1"];
> > > > +    payload2 [label="Payload #2";penwidth=3];
> > > > +}
> > > > diff --git a/Documentation/gpu/drm-kms-helpers.rst
> > > > b/Documentation/gpu/drm-kms-helpers.rst
> > > > index b422eb8edf16..c0f994c2c72f 100644
> > > > --- a/Documentation/gpu/drm-kms-helpers.rst
> > > > +++ b/Documentation/gpu/drm-kms-helpers.rst
> > > > @@ -208,8 +208,11 @@ Display Port Dual Mode Adaptor Helper Functions
> > > > Reference
> > > >  .. kernel-doc:: drivers/gpu/drm/drm_dp_dual_mode_helper.c
> > > >     :export:
> > > >  
> > > > -Display Port MST Helper Functions Reference
> > > > -===========================================
> > > > +Display Port MST Helpers
> > > > +========================
> > > > +
> > > > +Functions Reference
> > > > +-------------------
> > > >  
> > > >  .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > > >     :doc: dp mst helper
> > > > @@ -220,6 +223,124 @@ Display Port MST Helper Functions Reference
> > > >  .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > > >     :export:
> > > >  
> > > > +Branch device and port refcounting
> > > > +----------------------------------
> > > 
> > > I generally try to put the long-form explanations before the function
> > > references. Since usually the references completely drown out everything
> > > else and make it harder to spot the important overview stuff.
> > > 
> > > 
> > > > +
> > > > +Overview
> > > > +~~~~~~~~
> > > > +
> > > > +The refcounting schemes for :c:type:`struct drm_dp_mst_branch` and
> > > > +:c:type:`struct drm_dp_mst_port` are somewhat unusual. Both ports and
> > > > branch
> > > > +devices have two different kinds of refcounts: topology refcounts,
> > > > and
> > > > malloc
> > > > +refcounts.
> > > > +
> > > > +Topology refcount overview
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +Topology refcounts are not exposed to drivers, and are handled
> > > > internally
> > > > by the
> > > > +DP MST helpers. The helpers use them in order to prevent the in-
> > > > memory
> > > > topology
> > > > +state from being changed in the middle of critical operations like
> > > > changing the
> > > > +internal state of payload allocations. This means each branch and
> > > > port
> > > > will be
> > > > +considered to be connected to the rest of the topology until it's
> > > > topology
> > > > +refcount reaches zero. Additionally, for ports this means that their
> > > > associated
> > > > +:c:type:`struct drm_connector` will stay registered with userspace
> > > > until
> > > > the
> > > > +port's refcount reaches 0.
> > > > +
> > > > +
> > > > +Topology refcount functions
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +The DP MST helpers use the following functions to manage topology
> > > > refcounts:
> > > > +
> > > > +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > > > +   :functions: drm_dp_mst_topology_get_port
> > > > drm_dp_mst_topology_put_port
> > > > +               drm_dp_mst_topology_ref_port
> > > > drm_dp_mst_topology_get_mstb
> > > > +               drm_dp_mst_topology_put_mstb
> > > > drm_dp_mst_topology_ref_mstb
> > > > +
> > > > +Malloc refcount overview
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +Malloc references are used to keep a :c:type:`struct drm_dp_mst_port`
> > > > or
> > > > +:c:type:`struct drm_dp_mst_branch` allocated even after all of its
> > > > topology
> > > > +references have been dropped, so that the driver or MST helpers can
> > > > safely
> > > > +access each branch's last known state before it was disconnected from
> > > > the
> > > > +topology. When the malloc refcount of a port or branch reaches 0, the
> > > > memory
> > > > +allocation containing the :c:type:`struct drm_dp_mst_branch` or
> > > > :c:type:`struct
> > > > +drm_dp_mst_port` respectively will be freed.
> > > > +
> > > > +Malloc refcounts for ports
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +For :c:type:`struct drm_dp_mst_port`, malloc refcounts are exposed to
> > > > drivers
> > > > +through the following functions:
> > > > +
> > > > +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > > > +   :functions: drm_dp_mst_get_port_malloc drm_dp_mst_put_port_malloc
> > > > +
> > > > +Malloc refcounts for branch devices
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +For :c:type:`struct drm_dp_mst_branch`, malloc refcounts are not
> > > > currently
> > > > +exposed to drivers. As of writing this documentation, there are no
> > > > drivers that
> > > > +have a usecase for accessing :c:type:`struct drm_dp_mst_branch`
> > > > outside
> > > > of the
> > > > +MST helpers. Exposing this API to drivers in a race-free manner would
> > > > take more
> > > > +tweaking of the refcounting scheme, however patches are welcome
> > > > provided
> > > > there
> > > > +is a legitimate driver usecase for this.
> > > > +
> > > > +Internally, malloc refcounts for :c:type:`struct drm_dp_mst_branch`
> > > > are
> > > > managed
> > > > +by the DP MST core through the following functions:
> > > > +
> > > > +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> > > > +   :functions: drm_dp_mst_get_mstb_malloc drm_dp_mst_put_mstb_malloc
> > > > +
> > > > +Refcount relationships in a topology
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +Let's take a look at why the relationship between topology and malloc
> > > > refcounts
> > > > +is designed the way it is.
> > > > +
> > > > +.. kernel-figure:: dp-mst/topology-figure-1.dot
> > > > +
> > > > +   An example of topology and malloc refs in a DP MST topology with
> > > > two
> > > > active
> > > > +   payloads. Topology refcount increments are indicated by solid
> > > > lines,
> > > > and
> > > > +   malloc refcount increments are indicated by dashed lines. Each
> > > > starts
> > > > from
> > > > +   the branch which incremented the refcount, and ends at the branch
> > > > to
> > > > which
> > > > +   the refcount belongs to.
> > > > +
> > > > +As you can see in figure 1, every branch increments the topology
> > > > +refcount of it's children, and increments the malloc refcount of it's
> > > > parent.
> > > > +Additionally, every payload increments the malloc refcount of it's
> > > > assigned port
> > > > +by 1.
> > > > +
> > > > +So, what would happen if MSTB #3 from the above figure was unplugged
> > > > from
> > > > the
> > > > +system, but the driver hadn't yet removed payload #2 from port #3?
> > > > The
> > > > topology
> > > > +would start to look like figure 2.
> > > > +
> > > > +.. kernel-figure:: dp-mst/topology-figure-2.dot
> > > > +
> > > > +   Ports and branch devices which have been released from memory are
> > > > colored
> > > > +   grey, and references which have been removed are colored red.
> > > > +
> > > > +Whenever a port or branch device's topology refcount reaches zero, it
> > > > will
> > > > +decrement the topology refcounts of all its children, the malloc
> > > > refcount
> > > > of its
> > > > +parent, and finally its own malloc refcount. For MSTB #4 and port #4,
> > > > this means
> > > > +they both have been disconnected from the topology and freed from
> > > > memory.
> > > > But,
> > > > +because payload #2 is still holding a reference to port #3, port #3
> > > > is
> > > > removed
> > > > +from the topology but it's :c:type:`struct drm_dp_mst_port` is still
> > > > accessible
> > > > +from memory. This also means port #3 has not yet decremented the
> > > > malloc
> > > > refcount
> > > > +of MSTB #3, so it's :c:type:`struct drm_dp_mst_branch` will also stay
> > > > allocated
> > > > +in memory until port #3's malloc refcount reaches 0.
> > > > +
> > > > +This relationship is necessary because in order to release payload
> > > > #2, we
> > > > +need to be able to figure out the last relative of port #3 that's
> > > > still
> > > > +connected to the topology. In this case, we would travel up the
> > > > topology
> > > > as
> > > > +shown in figure 3.
> > > > +
> > > > +.. kernel-figure:: dp-mst/topology-figure-3.dot
> > > > +
> > > > +And finally, remove payload #2 by communicating with port #2 through
> > > > sideband
> > > > +transactions.
> > > 
> > > (Blind guess, I haven't looked ahead in the series yet)
> > > 
> > > I assume that drivers also want to hold a malloc reference from their
> > > connector, until that connector is destroyed completed (and we hence
> > > know
> > > it released all its vcpi and other stuff and really doesn't need the
> > > port
> > > anymore). Could we integrated that into these neat graphs too? Answering
> > > the "so how does this integrate into my driver?" question is imo the
> > > most
> > > important part for core api docs.
> > > 
> > > Another one: Any reason for not putting this right into the code as a
> > > DOC:
> > > section? Ime moving docs as close as possible to the code improves the
> > > odds it's kept up-to-date. The only overview texts I've left in the .rst
> > > is the stuff that describes overall concepts (e.g. how all the kms
> > > objects
> > > fit together).
> > > 
> > > All the sphinx/rst syntax should carry over 1:1, except in kerneldoc you
> > > also can benefit from the abbreviated reference syntax from kerneldoc.
> > > 
> > > Anyway, really great stuff.
> > > 
> > > > +
> > > >  MIPI DSI Helper Functions Reference
> > > >  ===================================
> > > >  
> > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > index 2ab16c9e6243..c196fb580beb 100644
> > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > @@ -46,7 +46,7 @@ static bool dump_dp_payload_table(struct
> > > > drm_dp_mst_topology_mgr *mgr,
> > > >  				  char *buf);
> > > >  static int test_calc_pbn_mode(void);
> > > >  
> > > > -static void drm_dp_put_port(struct drm_dp_mst_port *port);
> > > > +static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port
> > > > *port);
> > > >  
> > > >  static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr
> > > > *mgr,
> > > >  				     int id,
> > > > @@ -850,46 +850,120 @@ static struct drm_dp_mst_branch
> > > > *drm_dp_add_mst_branch_device(u8 lct, u8 *rad)
> > > >  	if (lct > 1)
> > > >  		memcpy(mstb->rad, rad, lct / 2);
> > > >  	INIT_LIST_HEAD(&mstb->ports);
> > > > -	kref_init(&mstb->kref);
> > > > +	kref_init(&mstb->topology_kref);
> > > > +	kref_init(&mstb->malloc_kref);
> > > >  	return mstb;
> > > >  }
> > > >  
> > > >  static void drm_dp_free_mst_port(struct kref *kref);
> > > > +static void drm_dp_free_mst_branch_device(struct kref *kref);
> > > 
> > > I'd move the functions around, forward declarations for static functions
> > > is a bit silly
> > > 
> > > > +
> > > > +/**
> > > > + * drm_dp_mst_get_mstb_malloc() - Increment the malloc refcount of a
> > > > branch
> > > > + * device
> > > > + * @mstb: The &struct drm_dp_mst_branch to increment the malloc
> > > > refcount
> > > > of
> > > > + *
> > > > + * Increments @mstb.malloc_kref. When @mstb.malloc_kref reaches 0,
> > > > the
> > > > memory
> > > 
> > > s/@/&/ for structure member references. @ references to
> > > parameters/members
> > > in the same kerneldoc type only. With & you'll get a nice link, @ is
> > > just
> > > markup (and yes & with a member unfortunately doesn't link to the
> > > member,
> > > only the overall structure).
> > > 
> > > Similarly below.
> > > 
> > > > + * allocation for @mstb will be released and @mstb may no longer be
> > > > used.
> > > > + *
> > > > + * Any malloc references acquired with this function must be released
> > > > when
> > > > + * they are no longer being used by calling
> > > > drm_dp_mst_put_mstb_malloc().
> > > 
> > > I'd dropped "when they are no longer being used", and the line below
> > > too.
> > > Short docs are better generally because attention span of readers.
> > > 
> > > > + *
> > > > + * See also: drm_dp_mst_put_mstb_malloc()
> > > > + */
> > > > +static void
> > > > +drm_dp_mst_get_mstb_malloc(struct drm_dp_mst_branch *mstb)
> > > > +{
> > > > +	kref_get(&mstb->malloc_kref);
> > > > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb-
> > > > >malloc_kref));
> > > > +}
> > > > +
> > > > +/**
> > > > + * drm_dp_mst_put_mstb_malloc() - Decrement the malloc refcount of a
> > > > branch
> > > > + * device
> > > > + * @mstb: The &struct drm_dp_mst_branch to decrement the malloc
> > > > refcount
> > > > of
> > > > + *
> > > > + * Decrements @mstb.malloc_kref. When @mstb.malloc_kref reaches 0,
> > > > the
> > > > memory
> > > > + * allocation for @mstb will be released and @mstb may no longer be
> > > > used.
> > > > + *
> > > > + * See also: drm_dp_mst_get_mstb_malloc()
> > > > + */
> > > > +static void
> > > > +drm_dp_mst_put_mstb_malloc(struct drm_dp_mst_branch *mstb)
> > > > +{
> > > > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb-
> > > > >malloc_kref)-1);
> > > > +	kref_put(&mstb->malloc_kref, drm_dp_free_mst_branch_device);
> > > > +}
> > > > +
> > > > +/**
> > > > + * drm_dp_mst_get_port_malloc() - Increment the malloc refcount of an
> > > > MST
> > > > port
> > > > + * @port: The &struct drm_dp_mst_port to increment the malloc
> > > > refcount of
> > > > + *
> > > > + * Increments @port.malloc_kref. When @port.malloc_kref reaches 0,
> > > > the
> > > > memory
> > > > + * allocation for @port will be released and @port may no longer be
> > > > used.
> > > > + *
> > > > + * Because @port could potentially be freed at any time by the DP MST
> > > > helpers
> > > > + * if @port.malloc_kref reaches 0, including during a call to this
> > > > function,
> > > > + * drivers that which to make use of &struct drm_dp_mst_port should
> > > > ensure
> > > > + * that they grab at least one main malloc reference to their MST
> > > > ports
> > > > in
> > > > + * &drm_dp_mst_topology_cbs.add_connector. This callback is called
> > > > before
> > > > + * there is any chance for @port.malloc_kref to reach 0.
> > > > + *
> > > > + * Any malloc references acquired with this function must be released
> > > > when
> > > > + * they are no longer being used by calling
> > > > drm_dp_mst_put_port_malloc().
> > > > + *
> > > > + * See also: drm_dp_mst_put_port_malloc()
> > > 
> > > Same reduction as with mstb_malloc version.
> > > 
> > > > + */
> > > > +void
> > > > +drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port)
> > > > +{
> > > > +	kref_get(&port->malloc_kref);
> > > > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port-
> > > > >malloc_kref));
> > > > +}
> > > > +EXPORT_SYMBOL(drm_dp_mst_get_port_malloc);
> > > > +
> > > > +/**
> > > > + * drm_dp_mst_put_port_malloc() - Decrement the malloc refcount of an
> > > > MST
> > > > port
> > > > + * @port: The &struct drm_dp_mst_port to decrement the malloc
> > > > refcount of
> > > > + *
> > > > + * Decrements @port.malloc_kref. When @port.malloc_kref reaches 0,
> > > > the
> > > > memory
> > > > + * allocation for @port will be released and @port may no longer be
> > > > used.
> > > > + *
> > > > + * See also: drm_dp_mst_get_port_malloc()
> > > > + */
> > > > +void
> > > > +drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port)
> > > > +{
> > > > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port-
> > > > >malloc_kref)-1);
> > > > +	kref_put(&port->malloc_kref, drm_dp_free_mst_port);
> > > > +}
> > > > +EXPORT_SYMBOL(drm_dp_mst_put_port_malloc);
> > > >  
> > > >  static void drm_dp_free_mst_branch_device(struct kref *kref)
> > > >  {
> > > > -	struct drm_dp_mst_branch *mstb = container_of(kref, struct
> > > > drm_dp_mst_branch, kref);
> > > > -	if (mstb->port_parent) {
> > > > -		if (list_empty(&mstb->port_parent->next))
> > > > -			kref_put(&mstb->port_parent->kref,
> > > > drm_dp_free_mst_port);
> > > > -	}
> > > > +	struct drm_dp_mst_branch *mstb =
> > > > +		container_of(kref, struct drm_dp_mst_branch,
> > > > malloc_kref);
> > > > +
> > > > +	if (mstb->port_parent)
> > > > +		drm_dp_mst_put_port_malloc(mstb->port_parent);
> > > > +
> > > >  	kfree(mstb);
> > > >  }
> > > >  
> > > >  static void drm_dp_destroy_mst_branch_device(struct kref *kref)
> > > >  {
> > > > -	struct drm_dp_mst_branch *mstb = container_of(kref, struct
> > > > drm_dp_mst_branch, kref);
> > > > +	struct drm_dp_mst_branch *mstb =
> > > > +		container_of(kref, struct drm_dp_mst_branch,
> > > > topology_kref);
> > > > +	struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
> > > >  	struct drm_dp_mst_port *port, *tmp;
> > > >  	bool wake_tx = false;
> > > >  
> > > > -	/*
> > > > -	 * init kref again to be used by ports to remove mst branch
> > > > when it is
> > > > -	 * not needed anymore
> > > > -	 */
> > > > -	kref_init(kref);
> > > > -
> > > > -	if (mstb->port_parent && list_empty(&mstb->port_parent->next))
> > > > -		kref_get(&mstb->port_parent->kref);
> > > > -
> > > > -	/*
> > > > -	 * destroy all ports - don't need lock
> > > > -	 * as there are no more references to the mst branch
> > > > -	 * device at this point.
> > > > -	 */
> > > > +	mutex_lock(&mgr->lock);
> > > >  	list_for_each_entry_safe(port, tmp, &mstb->ports, next) {
> > > >  		list_del(&port->next);
> > > > -		drm_dp_put_port(port);
> > > > +		drm_dp_mst_topology_put_port(port);
> > > >  	}
> > > > +	mutex_unlock(&mgr->lock);
> > > 
> > > Would be nice to split this out (to highlight the bugfix more), but
> > > because of the kref_init() hack not really feasible I think :-/
> > > >  
> > > >  	/* drop any tx slots msg */
> > > >  	mutex_lock(&mstb->mgr->qlock);
> > > > @@ -908,14 +982,82 @@ static void
> > > > drm_dp_destroy_mst_branch_device(struct
> > > > kref *kref)
> > > >  	if (wake_tx)
> > > >  		wake_up_all(&mstb->mgr->tx_waitq);
> > > >  
> > > > -	kref_put(kref, drm_dp_free_mst_branch_device);
> > > > +	drm_dp_mst_put_mstb_malloc(mstb);
> > > >  }
> > > >  
> > > > -static void drm_dp_put_mst_branch_device(struct drm_dp_mst_branch
> > > > *mstb)
> > > > +/**
> > > > + * drm_dp_mst_topology_get_mstb() - Increment the topology refcount
> > > > of a
> > > > + * branch device unless its zero
> > > > + * @mstb: &struct drm_dp_mst_branch to increment the topology
> > > > refcount of
> > > > + *
> > > > + * Attempts to grab a topology reference to @mstb, if it hasn't yet
> > > > been
> > > > + * removed from the topology (e.g. @mstb.topology_kref has reached
> > > > 0).
> > > > + *
> > > > + * Any topology references acquired with this function must be
> > > > released
> > > > when
> > > > + * they are no longer being used by calling
> > > > drm_dp_mst_topology_put_mstb().
> > > 
> > > I'd explain the relationship with malloc_kref a bit here:
> > > 
> > > - topology ref implies a malloc ref, hence you can call get_mstb_malloc
> > >   with only holding a topology ref (might be better to explain this in
> > > the
> > >   get_mstb_malloc kerneldoc, since it also applies to the unconditional
> > >   kref_get below)
> > > - malloc_ref is enough to call this function, but then it can fail
> > > 
> > > > + *
> > > > + * See also:
> > > > + * drm_dp_mst_topology_ref_mstb()
> > > 
> > > I'd write out when you should use this one instead:
> > > 
> > > "If you already have a topology reference you should use
> > > other_function()
> > > instead."
> > > 
> > > > + * drm_dp_mst_topology_get_mstb()
> > > 
> > > This is this function itself :-)
> > > 
> > > > + *
> > > > + * Returns:
> > > > + * * 1: A topology reference was grabbed successfully
> > > > + * * 0: @port is no longer in the topology, no reference was grabbed
> > > > + */
> > > > +static int __must_check
> > > > +drm_dp_mst_topology_get_mstb(struct drm_dp_mst_branch *mstb)
> > > 
> > > Hm if you both want a kref_get and a kref_get_unless_zero then we need
> > > better naming. topology_get_mstb should be the unconditional kref_get,
> > > the
> > > conditional kref_get_unless_zero needs some indication that it could
> > > fail.
> > > We need some verb that indicates that instead of "get":
> > > - "validate" since we've used that one already
> > > - "lookup" that's used by all the drm_mode_object lookup functions,
> > > feels
> > >   a bit misleading
> > > - "try_get"
> > > 
> > > >  {
> > > > -	kref_put(&mstb->kref, drm_dp_destroy_mst_branch_device);
> > > > +	int ret = kref_get_unless_zero(&mstb->topology_kref);
> > > > +
> > > > +	if (ret)
> > > > +		DRM_DEBUG("mstb %p (%d)\n", mstb,
> > > > +			  kref_read(&mstb->topology_kref));
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +/**
> > > > + * drm_dp_mst_topology_ref_mstb() - Increment the topology refcount
> > > > of a
> > > > + * branch device
> > > > + * @mstb: The &struct drm_dp_mst_branch to increment the topology
> > > > refcount of
> > > > + *
> > > > + * Increments @mstb.topology_refcount without checking whether or not
> > > > it's
> > > > + * already reached 0. This is only valid to use in scenarios where
> > > > you
> > > > are
> > > > + * already guaranteed to have at least one active topology reference
> > > > to
> > > > @mstb.
> > > > + * Otherwise, drm_dp_mst_topology_get_mstb() should be used.
> > > 
> > > s/should/must/  (or my English understanding is off, afaiui "should"
> > > isn't
> > > a strict requirement per rfc2119)
> > > 
> > > > + *
> > > > + * Any topology references acquired with this function must be
> > > > released
> > > > when
> > > > + * they are no longer being used by calling
> > > > drm_dp_mst_topology_put_mstb().
> > > > + *
> > > > + * See also:
> > > > + * drm_dp_mst_topology_get_mstb()
> > > > + * drm_dp_mst_topology_put_mstb()
> > > > + */
> > > > +static void
> > > > +drm_dp_mst_topology_ref_mstb(struct drm_dp_mst_branch *mstb)
> > > > +{
> > > 
> > > Should we have a WARN_ON(refcount == 0) here?
> > > 
> > > > +	kref_get(&mstb->topology_kref);
> > > > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb-
> > > > >topology_kref));
> > > >  }
> > > >  
> > > > +/**
> > > > + * drm_dp_mst_topology_put_mstb() - release a topology reference to a
> > > > branch
> > > > + * device
> > > > + * @mstb: The &struct drm_dp_mst_branch to release the topology
> > > > reference
> > > > from
> > > > + *
> > > > + * Releases a topology reference from @mstb by decrementing
> > > > + * @mstb.topology_kref.
> > > > + *
> > > > + * See also:
> > > > + * drm_dp_mst_topology_get_mstb()
> > > > + * drm_dp_mst_topology_ref_mstb()
> > > > + */
> > > > +static void
> > > > +drm_dp_mst_topology_put_mstb(struct drm_dp_mst_branch *mstb)
> > > > +{
> > > > +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb-
> > > > >topology_kref)-1);
> > > > +	kref_put(&mstb->topology_kref,
> > > > drm_dp_destroy_mst_branch_device);
> > > > +}
> > > >  
> > > >  static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port,
> > > > int
> > > > old_pdt)
> > > >  {
> > > > @@ -930,14 +1072,15 @@ static void drm_dp_port_teardown_pdt(struct
> > > > drm_dp_mst_port *port, int old_pdt)
> > > >  	case DP_PEER_DEVICE_MST_BRANCHING:
> > > >  		mstb = port->mstb;
> > > >  		port->mstb = NULL;
> > > > -		drm_dp_put_mst_branch_device(mstb);
> > > > +		drm_dp_mst_topology_put_mstb(mstb);
> > > >  		break;
> > > >  	}
> > > >  }
> > > >  
> > > >  static void drm_dp_destroy_port(struct kref *kref)
> > > >  {
> > > > -	struct drm_dp_mst_port *port = container_of(kref, struct
> > > > drm_dp_mst_port, kref);
> > > > +	struct drm_dp_mst_port *port =
> > > > +		container_of(kref, struct drm_dp_mst_port,
> > > > topology_kref);
> > > >  	struct drm_dp_mst_topology_mgr *mgr = port->mgr;
> > > >  
> > > >  	if (!port->input) {
> > > > @@ -956,7 +1099,6 @@ static void drm_dp_destroy_port(struct kref
> > > > *kref)
> > > >  			 * from an EDID retrieval */
> > > >  
> > > >  			mutex_lock(&mgr->destroy_connector_lock);
> > > > -			kref_get(&port->parent->kref);
> > > >  			list_add(&port->next, &mgr-
> > > > >destroy_connector_list);
> > > >  			mutex_unlock(&mgr->destroy_connector_lock);
> > > >  			schedule_work(&mgr->destroy_connector_work);
> > > > @@ -967,25 +1109,93 @@ static void drm_dp_destroy_port(struct kref
> > > > *kref)
> > > >  		drm_dp_port_teardown_pdt(port, port->pdt);
> > > >  		port->pdt = DP_PEER_DEVICE_NONE;
> > > >  	}
> > > > -	kfree(port);
> > > > +	drm_dp_mst_put_port_malloc(port);
> > > >  }
> > > >  
> > > > -static void drm_dp_put_port(struct drm_dp_mst_port *port)
> > > > +/**
> > > > + * drm_dp_mst_topology_get_port() - Increment the topology refcount
> > > > of a
> > > > + * port unless its zero
> > > > + * @port: &struct drm_dp_mst_port to increment the topology refcount
> > > > of
> > > > + *
> > > > + * Attempts to grab a topology reference to @port, if it hasn't yet
> > > > been
> > > > + * removed from the topology (e.g. @port.topology_kref has reached
> > > > 0).
> > > > + *
> > > > + * Any topology references acquired with this function must be
> > > > released
> > > > when
> > > > + * they are no longer being used by calling
> > > > drm_dp_mst_topology_put_port().
> > > > + *
> > > > + * See also:
> > > > + * drm_dp_mst_topology_ref_port()
> > > > + * drm_dp_mst_topology_put_port()
> > > > + *
> > > > + * Returns:
> > > > + * * 1: A topology reference was grabbed successfully
> > > > + * * 0: @port is no longer in the topology, no reference was grabbed
> > > > + */
> > > > +static int __must_check
> > > > +drm_dp_mst_topology_get_port(struct drm_dp_mst_port *port)
> > > >  {
> > > > -	kref_put(&port->kref, drm_dp_destroy_port);
> > > > +	int ret = kref_get_unless_zero(&port->topology_kref);
> > > > +
> > > > +	if (ret)
> > > > +		DRM_DEBUG("port %p (%d)\n", port,
> > > > +			  kref_read(&port->topology_kref));
> > > > +
> > > > +	return ret;
> > > >  }
> > > >  
> > > > -static struct drm_dp_mst_branch
> > > > *drm_dp_mst_get_validated_mstb_ref_locked(struct drm_dp_mst_branch
> > > > *mstb,
> > > > struct drm_dp_mst_branch *to_find)
> > > > +/**
> > > > + * drm_dp_mst_topology_ref_port() - Increment the topology refcount
> > > > of a
> > > > port
> > > > + * @port: The &struct drm_dp_mst_port to increment the topology
> > > > refcount
> > > > of
> > > > + *
> > > > + * Increments @port.topology_refcount without checking whether or not
> > > > it's
> > > > + * already reached 0. This is only valid to use in scenarios where
> > > > you
> > > > are
> > > > + * already guaranteed to have at least one active topology reference
> > > > to
> > > > @port.
> > > > + * Otherwise, drm_dp_mst_topology_get_port() should be used.
> > > > + *
> > > > + * Any topology references acquired with this function must be
> > > > released
> > > > when
> > > > + * they are no longer being used by calling
> > > > drm_dp_mst_topology_put_port().
> > > > + *
> > > > + * See also:
> > > > + * drm_dp_mst_topology_get_port()
> > > > + * drm_dp_mst_topology_put_port()
> > > > + */
> > > > +static void drm_dp_mst_topology_ref_port(struct drm_dp_mst_port
> > > > *port)
> > > > +{
> > > > +	kref_get(&port->topology_kref);
> > > > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port-
> > > > >topology_kref));
> > > > +}
> > > > +
> > > > +/**
> > > > + * drm_dp_mst_topology_put_port() - release a topology reference to a
> > > > port
> > > > + * @port: The &struct drm_dp_mst_port to release the topology
> > > > reference
> > > > from
> > > > + *
> > > > + * Releases a topology reference from @port by decrementing
> > > > + * @port.topology_kref.
> > > > + *
> > > > + * See also:
> > > > + * drm_dp_mst_topology_get_port()
> > > > + * drm_dp_mst_topology_ref_port()
> > > > + */
> > > > +static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port
> > > > *port)
> > > > +{
> > > > +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port-
> > > > >topology_kref)-1);
> > > > +	kref_put(&port->topology_kref, drm_dp_destroy_port);
> > > > +}
> > > > +
> > > > +static struct drm_dp_mst_branch *
> > > > +drm_dp_mst_topology_get_mstb_validated_locked(struct
> > > > drm_dp_mst_branch
> > > > *mstb,
> > > > +					      struct drm_dp_mst_branch
> > > > *to_find)
> > > >  {
> > > >  	struct drm_dp_mst_port *port;
> > > >  	struct drm_dp_mst_branch *rmstb;
> > > > -	if (to_find == mstb) {
> > > > -		kref_get(&mstb->kref);
> > > > +
> > > > +	if (to_find == mstb)
> > > >  		return mstb;
> > > > -	}
> > > > +
> > > >  	list_for_each_entry(port, &mstb->ports, next) {
> > > >  		if (port->mstb) {
> > > > -			rmstb =
> > > > drm_dp_mst_get_validated_mstb_ref_locked(port-
> > > > > mstb, to_find);
> > > > +			rmstb =
> > > > drm_dp_mst_topology_get_mstb_validated_locked(
> > > 
> > > I think a prep patch which just renames the current get_validated/put
> > > functions to the new names would be really good. Then this patch here
> > > with
> > > the new stuff.
> > > 
> > > 
> > > > +			    port->mstb, to_find);
> > > >  			if (rmstb)
> > > >  				return rmstb;
> > > >  		}
> > > > @@ -993,27 +1203,37 @@ static struct drm_dp_mst_branch
> > > > *drm_dp_mst_get_validated_mstb_ref_locked(struct
> > > >  	return NULL;
> > > >  }
> > > >  
> > > > -static struct drm_dp_mst_branch *drm_dp_get_validated_mstb_ref(struct
> > > > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_branch *mstb)
> > > > +static struct drm_dp_mst_branch *
> > > > +drm_dp_mst_topology_get_mstb_validated(struct drm_dp_mst_topology_mgr
> > > > *mgr,
> > > > +				       struct drm_dp_mst_branch *mstb)
> > > >  {
> > > >  	struct drm_dp_mst_branch *rmstb = NULL;
> > > > +
> > > >  	mutex_lock(&mgr->lock);
> > > > -	if (mgr->mst_primary)
> > > > -		rmstb = drm_dp_mst_get_validated_mstb_ref_locked(mgr-
> > > > > mst_primary, mstb);
> > > > +	if (mgr->mst_primary) {
> > > > +		rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
> > > > +		    mgr->mst_primary, mstb);
> > > > +
> > > > +		if (rmstb && !drm_dp_mst_topology_get_mstb(rmstb))
> > > > +			rmstb = NULL;
> > > > +	}
> > > >  	mutex_unlock(&mgr->lock);
> > > >  	return rmstb;
> > > >  }
> > > >  
> > > > -static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct
> > > > drm_dp_mst_branch *mstb, struct drm_dp_mst_port *to_find)
> > > > +static struct drm_dp_mst_port *
> > > > +drm_dp_mst_topology_get_port_validated_locked(struct
> > > > drm_dp_mst_branch
> > > > *mstb,
> > > > +					      struct drm_dp_mst_port
> > > > *to_find)
> > > >  {
> > > >  	struct drm_dp_mst_port *port, *mport;
> > > >  
> > > >  	list_for_each_entry(port, &mstb->ports, next) {
> > > > -		if (port == to_find) {
> > > > -			kref_get(&port->kref);
> > > > +		if (port == to_find)
> > > >  			return port;
> > > > -		}
> > > > +
> > > >  		if (port->mstb) {
> > > > -			mport = drm_dp_mst_get_port_ref_locked(port-
> > > > >mstb,
> > > > to_find);
> > > > +			mport =
> > > > drm_dp_mst_topology_get_port_validated_locked(
> > > > +			    port->mstb, to_find);
> > > >  			if (mport)
> > > >  				return mport;
> > > >  		}
> > > > @@ -1021,12 +1241,20 @@ static struct drm_dp_mst_port
> > > > *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
> > > >  	return NULL;
> > > >  }
> > > >  
> > > > -static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct
> > > > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
> > > > +static struct drm_dp_mst_port *
> > > > +drm_dp_mst_topology_get_port_validated(struct drm_dp_mst_topology_mgr
> > > > *mgr,
> > > > +				       struct drm_dp_mst_port *port)
> > > >  {
> > > >  	struct drm_dp_mst_port *rport = NULL;
> > > > +
> > > >  	mutex_lock(&mgr->lock);
> > > > -	if (mgr->mst_primary)
> > > > -		rport = drm_dp_mst_get_port_ref_locked(mgr-
> > > > >mst_primary,
> > > > port);
> > > > +	if (mgr->mst_primary) {
> > > > +		rport = drm_dp_mst_topology_get_port_validated_locked(
> > > > +		    mgr->mst_primary, port);
> > > > +
> > > > +		if (rport && !drm_dp_mst_topology_get_port(rport))
> > > > +			rport = NULL;
> > > > +	}
> > > >  	mutex_unlock(&mgr->lock);
> > > >  	return rport;
> > > >  }
> > > > @@ -1034,11 +1262,12 @@ static struct drm_dp_mst_port
> > > > *drm_dp_get_validated_port_ref(struct drm_dp_mst_t
> > > >  static struct drm_dp_mst_port *drm_dp_get_port(struct
> > > > drm_dp_mst_branch
> > > > *mstb, u8 port_num)
> > > >  {
> > > >  	struct drm_dp_mst_port *port;
> > > > +	int ret;
> > > >  
> > > >  	list_for_each_entry(port, &mstb->ports, next) {
> > > >  		if (port->port_num == port_num) {
> > > > -			kref_get(&port->kref);
> > > > -			return port;
> > > > +			ret = drm_dp_mst_topology_get_port(port);
> > > > +			return ret ? port : NULL;
> > > >  		}
> > > >  	}
> > > >  
> > > > @@ -1087,6 +1316,11 @@ static bool drm_dp_port_setup_pdt(struct
> > > > drm_dp_mst_port *port)
> > > >  		if (port->mstb) {
> > > >  			port->mstb->mgr = port->mgr;
> > > >  			port->mstb->port_parent = port;
> > > > +			/*
> > > > +			 * Make sure this port's memory allocation
> > > > stays
> > > > +			 * around until it's child MSTB releases it
> > > > +			 */
> > > > +			drm_dp_mst_get_port_malloc(port);
> > > >  
> > > >  			send_link = true;
> > > >  		}
> > > > @@ -1147,17 +1381,26 @@ static void drm_dp_add_port(struct
> > > > drm_dp_mst_branch *mstb,
> > > >  	bool created = false;
> > > >  	int old_pdt = 0;
> > > >  	int old_ddps = 0;
> > > > +
> > > >  	port = drm_dp_get_port(mstb, port_msg->port_number);
> > > >  	if (!port) {
> > > >  		port = kzalloc(sizeof(*port), GFP_KERNEL);
> > > >  		if (!port)
> > > >  			return;
> > > > -		kref_init(&port->kref);
> > > > +		kref_init(&port->topology_kref);
> > > > +		kref_init(&port->malloc_kref);
> > > >  		port->parent = mstb;
> > > >  		port->port_num = port_msg->port_number;
> > > >  		port->mgr = mstb->mgr;
> > > >  		port->aux.name = "DPMST";
> > > >  		port->aux.dev = dev->dev;
> > > > +
> > > > +		/*
> > > > +		 * Make sure the memory allocation for our parent
> > > > branch stays
> > > > +		 * around until our own memory allocation is released
> > > > +		 */
> > > > +		drm_dp_mst_get_mstb_malloc(mstb);
> > > > +
> > > >  		created = true;
> > > >  	} else {
> > > >  		old_pdt = port->pdt;
> > > > @@ -1177,7 +1420,7 @@ static void drm_dp_add_port(struct
> > > > drm_dp_mst_branch
> > > > *mstb,
> > > >  	   for this list */
> > > >  	if (created) {
> > > >  		mutex_lock(&mstb->mgr->lock);
> > > > -		kref_get(&port->kref);
> > > > +		drm_dp_mst_topology_ref_port(port);
> > > >  		list_add(&port->next, &mstb->ports);
> > > >  		mutex_unlock(&mstb->mgr->lock);
> > > >  	}
> > > > @@ -1202,17 +1445,21 @@ static void drm_dp_add_port(struct
> > > > drm_dp_mst_branch *mstb,
> > > >  	if (created && !port->input) {
> > > >  		char proppath[255];
> > > >  
> > > > -		build_mst_prop_path(mstb, port->port_num, proppath,
> > > > sizeof(proppath));
> > > > -		port->connector = (*mstb->mgr->cbs-
> > > > >add_connector)(mstb->mgr,
> > > > port, proppath);
> > > > +		build_mst_prop_path(mstb, port->port_num, proppath,
> > > > +				    sizeof(proppath));
> > > > +		port->connector = (*mstb->mgr->cbs-
> > > > >add_connector)(mstb->mgr,
> > > > +								   por
> > > > t,
> > > > +								   pro
> > > > ppath);
> > > >  		if (!port->connector) {
> > > >  			/* remove it from the port list */
> > > >  			mutex_lock(&mstb->mgr->lock);
> > > >  			list_del(&port->next);
> > > >  			mutex_unlock(&mstb->mgr->lock);
> > > >  			/* drop port list reference */
> > > > -			drm_dp_put_port(port);
> > > > +			drm_dp_mst_topology_put_port(port);
> > > >  			goto out;
> > > >  		}
> > > > +
> > > >  		if ((port->pdt == DP_PEER_DEVICE_DP_LEGACY_CONV ||
> > > >  		     port->pdt == DP_PEER_DEVICE_SST_SINK) &&
> > > >  		    port->port_num >= DP_MST_LOGICAL_PORT_0) {
> > > > @@ -1224,7 +1471,7 @@ static void drm_dp_add_port(struct
> > > > drm_dp_mst_branch
> > > > *mstb,
> > > >  
> > > >  out:
> > > >  	/* put reference to this port */
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  }
> > > >  
> > > >  static void drm_dp_update_port(struct drm_dp_mst_branch *mstb,
> > > > @@ -1259,7 +1506,7 @@ static void drm_dp_update_port(struct
> > > > drm_dp_mst_branch *mstb,
> > > >  			dowork = true;
> > > >  	}
> > > >  
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  	if (dowork)
> > > >  		queue_work(system_long_wq, &mstb->mgr->work);
> > > >  
> > > > @@ -1270,7 +1517,7 @@ static struct drm_dp_mst_branch
> > > > *drm_dp_get_mst_branch_device(struct drm_dp_mst_
> > > >  {
> > > >  	struct drm_dp_mst_branch *mstb;
> > > >  	struct drm_dp_mst_port *port;
> > > > -	int i;
> > > > +	int i, ret;
> > > >  	/* find the port by iterating down */
> > > >  
> > > >  	mutex_lock(&mgr->lock);
> > > > @@ -1295,7 +1542,9 @@ static struct drm_dp_mst_branch
> > > > *drm_dp_get_mst_branch_device(struct drm_dp_mst_
> > > >  			}
> > > >  		}
> > > >  	}
> > > > -	kref_get(&mstb->kref);
> > > > +	ret = drm_dp_mst_topology_get_mstb(mstb);
> > > > +	if (!ret)
> > > > +		mstb = NULL;
> > > >  out:
> > > >  	mutex_unlock(&mgr->lock);
> > > >  	return mstb;
> > > > @@ -1325,19 +1574,22 @@ static struct drm_dp_mst_branch
> > > > *get_mst_branch_device_by_guid_helper(
> > > >  	return NULL;
> > > >  }
> > > >  
> > > > -static struct drm_dp_mst_branch
> > > > *drm_dp_get_mst_branch_device_by_guid(
> > > > -	struct drm_dp_mst_topology_mgr *mgr,
> > > > -	uint8_t *guid)
> > > > +static struct drm_dp_mst_branch *
> > > > +drm_dp_get_mst_branch_device_by_guid(struct drm_dp_mst_topology_mgr
> > > > *mgr,
> > > > +				     uint8_t *guid)
> > > >  {
> > > >  	struct drm_dp_mst_branch *mstb;
> > > > +	int ret;
> > > >  
> > > >  	/* find the port by iterating down */
> > > >  	mutex_lock(&mgr->lock);
> > > >  
> > > >  	mstb = get_mst_branch_device_by_guid_helper(mgr->mst_primary,
> > > > guid);
> > > > -
> > > > -	if (mstb)
> > > > -		kref_get(&mstb->kref);
> > > > +	if (mstb) {
> > > > +		ret = drm_dp_mst_topology_get_mstb(mstb);
> > > > +		if (!ret)
> > > > +			mstb = NULL;
> > > > +	}
> > > >  
> > > >  	mutex_unlock(&mgr->lock);
> > > >  	return mstb;
> > > > @@ -1362,10 +1614,10 @@ static void
> > > > drm_dp_check_and_send_link_address(struct drm_dp_mst_topology_mgr *m
> > > >  			drm_dp_send_enum_path_resources(mgr, mstb,
> > > > port);
> > > >  
> > > >  		if (port->mstb) {
> > > > -			mstb_child =
> > > > drm_dp_get_validated_mstb_ref(mgr, port-
> > > > > mstb);
> > > > +			mstb_child =
> > > > drm_dp_mst_topology_get_mstb_validated(mgr, port->mstb);
> > > >  			if (mstb_child) {
> > > >  				drm_dp_check_and_send_link_address(mgr
> > > > ,
> > > > mstb_child);
> > > > -				drm_dp_put_mst_branch_device(mstb_chil
> > > > d);
> > > > +				drm_dp_mst_topology_put_mstb(mstb_chil
> > > > d);
> > > >  			}
> > > >  		}
> > > >  	}
> > > > @@ -1375,16 +1627,19 @@ static void drm_dp_mst_link_probe_work(struct
> > > > work_struct *work)
> > > >  {
> > > >  	struct drm_dp_mst_topology_mgr *mgr = container_of(work,
> > > > struct
> > > > drm_dp_mst_topology_mgr, work);
> > > >  	struct drm_dp_mst_branch *mstb;
> > > > +	int ret;
> > > >  
> > > >  	mutex_lock(&mgr->lock);
> > > >  	mstb = mgr->mst_primary;
> > > >  	if (mstb) {
> > > > -		kref_get(&mstb->kref);
> > > > +		ret = drm_dp_mst_topology_get_mstb(mstb);
> > > > +		if (!ret)
> > > > +			mstb = NULL;
> > > >  	}
> > > >  	mutex_unlock(&mgr->lock);
> > > >  	if (mstb) {
> > > >  		drm_dp_check_and_send_link_address(mgr, mstb);
> > > > -		drm_dp_put_mst_branch_device(mstb);
> > > > +		drm_dp_mst_topology_put_mstb(mstb);
> > > >  	}
> > > >  }
> > > >  
> > > > @@ -1695,22 +1950,32 @@ static struct drm_dp_mst_port
> > > > *drm_dp_get_last_connected_port_to_mstb(struct drm
> > > >  	return drm_dp_get_last_connected_port_to_mstb(mstb-
> > > > >port_parent-
> > > > > parent);
> > > >  }
> > > >  
> > > > -static struct drm_dp_mst_branch
> > > > *drm_dp_get_last_connected_port_and_mstb(struct
> > > > drm_dp_mst_topology_mgr
> > > > *mgr,
> > > > -									
> > > >  struc
> > > > t drm_dp_mst_branch *mstb,
> > > > -									
> > > >  int
> > > > *port_num)
> > > > +static struct drm_dp_mst_branch *
> > > > +drm_dp_get_last_connected_port_and_mstb(struct
> > > > drm_dp_mst_topology_mgr
> > > > *mgr,
> > > > +					struct drm_dp_mst_branch
> > > > *mstb,
> > > > +					int *port_num)
> > > >  {
> > > >  	struct drm_dp_mst_branch *rmstb = NULL;
> > > >  	struct drm_dp_mst_port *found_port;
> > > > +
> > > >  	mutex_lock(&mgr->lock);
> > > > -	if (mgr->mst_primary) {
> > > > +	if (!mgr->mst_primary)
> > > > +		goto out;
> > > > +
> > > > +	do {
> > > >  		found_port =
> > > > drm_dp_get_last_connected_port_to_mstb(mstb);
> > > > +		if (!found_port)
> > > > +			break;
> > > >  
> > > > -		if (found_port) {
> > > > +		if (drm_dp_mst_topology_get_mstb(found_port->parent))
> > > > {
> > > >  			rmstb = found_port->parent;
> > > > -			kref_get(&rmstb->kref);
> > > >  			*port_num = found_port->port_num;
> > > > +		} else {
> > > > +			/* Search again, starting from this parent */
> > > > +			mstb = found_port->parent;
> > > >  		}
> > > > -	}
> > > > +	} while (!rmstb);
> > > 
> > > Hm, is this a bugfix of validating the entire chain? Afaiui the new
> > > topology_get still validates the entire chain, so I'm a bit confused
> > > what
> > > this does here.
> > 
> > JFYI: I'm assuming you meant the old get_validated() functions. I
> > mentioned in
> > the cover letter for this series that I wasn't sure if we still needed
> > them,
> > but on closer inspection I think we still do since they perform the actual
> > validation of the whole topology chain.
> > drm_dp_mst_topology_get_(port|mstb)()
> > just increments the topology refcount safely.
> 
> Yeah I mixed this up with the old get_validated, but this is only used in
> payload_send_mgs.
> 
> > The change you're seeing here is because since we didn't use
> > kref_get_unless_zero() before, we'd just go up the topology path above
> > mstb(),
> > then kref the first thing we find that we think is still connected to the
> > topology (I honestly don't know how/if this ever worked), then give it a
> > kref
> > and return it. Now that we use kref_get_unless_zero(), we have to deal
> > with
> > the fact that the kref could fail, which would happen if we just retrieved
> > a
> > parent mstb or port that is also disconnected from the topology. So, the
> > only
> > way to do that is to find what we think is the last connected mstb, check
> > if
> > it actually is, then restart the search from that mstb if the kref failed
> > and
> > it's not connected to the topology.
> > 
> > That being said, I've been wondering about figuring out spots like this
> > where
> > we probably also need to follow that up with "also make sure all of the
> > parents of this 'connected' topology device are also valid", since it's
> > quite
> > possible we could run into a scenario like this:
> > 
> > Step 1:
> > MSTB 1
> >   |- Port 1
> >   |- Port 2
> >   |- Port 3
> >      |- MSTB 2 ← (just unplugged, top refcount == 0)
> >         |- Port 4 ← (also unplugged, but top refcount not updated yet)
> >         |- Port 5 ← (same thing ^)
> >         |- Port 6 ← (same thing ^)
> >            |- MSTB 3
> >               |- Port 7
> >               |- Port 8
> >               |- Port 9
> >                  ^
> >       drm_dp_get_last_connected_port_to_mstb()
> >       travels up to Port 6, assumes Port 6 is valid because it's top
> > refcount
> >       != 0
> > 
> > Now that I type all of that out though, I think we could also fix that
> > fairly
> > easily by instead just adding a topology_state mutex, and adding a
> > variable to
> > denote whether or not a port is actually still part of a topology.
> > 
> > Maybe that also means we should come up with a different name for
> > topology_refcount, resource_refcount maybe?
> 
> Hm. Fixing refcounting/lifetime issues with locking is ime a bad idea.
> I'm also not sure we fundamentally need to fix this, since even if we make
> the internal representation race-free, the real world can still get
> unplugged whenever it feels like. So fundamentally the payload_send_msg
> function needs to be able to deal with races.
> 
> So not sure whether (or why) we need your change here.
> 
> Can we instead just fail if we've raced and give up? Not sure why we need
> to send out the message still if the recipient is gone ...
I think you are right actually, I thought a lot more about this and I
think there is always going to be some inevitable racing.

As for whether or not we can just fail if we've raced: we do need to
send a message out if the recepient isn't there so long as there's still
a part of the topology active, because payload allocations actually go
down the whole tree. For example, this is what a payload allocation on a
deeply nested MSTB actually should look like in the real world

  MSTB #1
  |- Port #1 (payload allocation forwards payload #1 down to MSTB #2)
     |- MSTB #2 (payload allocation forwards payload #1 down to Port #4)
        |- Port #3
        |- Port #4 (payload allocation forwards payload #1 down to MSTB #3)
           |- MSTB #3 (Payload allocation for payload #1 goes to sink)
  |- Port #2

When we setup virtual channels, MSTB #1 actually does the job of
allocating a payload on MSTB #2, which does the job of allocating a
payload on MSTB #3. If MSTB #3 is removed, MSTB #1 won't actually
change any payload allocations down the topology until we tell it to.
Hence why we need to be able to actually figure out what the last
connected port was. I've confirmed this as well: AMD has pointed some
bugs out to me that I've been able to reproduce where the problems with
payload deallocation start causing downstream sinks to flicker randomly
due to leaking payloads.

So long story short: we do actually need payload messages to be as
accurate as possible with this. I think we could theoretically get
around the inherent raciness I described above by just doing something
like this:

* Find the last living relative of MSTB #3 (let's pretend it's port #4)
* Try sending a payload message to port #4, while at the same time port
  #4 is removed from the topology in the real world
* When the payload message times out, check if the root of the topology
  (MSTB #1) is still connected and if so, redo the search starting from
  port #4
* Rinse and repeat, until either the topology itself is removed, we
  reach the root of the topology and get no response, or we manage to
  get a response.

We could probably use similar approaches for most of the cases where
being accurate to the real world state of the topology ends up being
important, and possibly even supplement it by querying the actual state
of the hub on failure to ensure we stay in sync.

All of that being said: I think the approach I described above can
probably be saved for another patch series. For now though, we will need
the changes in drm_dp_get_last_connected_port_to_mstb() to at least make
payload deallocation work outside of some very edge cases. I'm fine with
splitting it into another patch though if you think that'd be better
> -Daniel
> 
> > > > +out:
> > > >  	mutex_unlock(&mgr->lock);
> > > >  	return rmstb;
> > > >  }
> > > > @@ -1726,17 +1991,19 @@ static int drm_dp_payload_send_msg(struct
> > > > drm_dp_mst_topology_mgr *mgr,
> > > >  	u8 sinks[DRM_DP_MAX_SDP_STREAMS];
> > > >  	int i;
> > > >  
> > > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > > >  	if (!port)
> > > >  		return -EINVAL;
> > > >  
> > > >  	port_num = port->port_num;
> > > > -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> > > > +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port-
> > > > >parent);
> > > >  	if (!mstb) {
> > > > -		mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> > > > port-
> > > > > parent, &port_num);
> > > > +		mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> > > > +							       port-
> > > > >parent,
> > > > +							       &port_n
> > > > um);
> > > >  
> > > >  		if (!mstb) {
> > > > -			drm_dp_put_port(port);
> > > > +			drm_dp_mst_topology_put_port(port);
> > > >  			return -EINVAL;
> > > >  		}
> > > >  	}
> > > > @@ -1766,8 +2033,8 @@ static int drm_dp_payload_send_msg(struct
> > > > drm_dp_mst_topology_mgr *mgr,
> > > >  	}
> > > >  	kfree(txmsg);
> > > >  fail_put:
> > > > -	drm_dp_put_mst_branch_device(mstb);
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_mstb(mstb);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  	return ret;
> > > >  }
> > > >  
> > > > @@ -1777,13 +2044,13 @@ int drm_dp_send_power_updown_phy(struct
> > > > drm_dp_mst_topology_mgr *mgr,
> > > >  	struct drm_dp_sideband_msg_tx *txmsg;
> > > >  	int len, ret;
> > > >  
> > > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > > >  	if (!port)
> > > >  		return -EINVAL;
> > > >  
> > > >  	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> > > >  	if (!txmsg) {
> > > > -		drm_dp_put_port(port);
> > > > +		drm_dp_mst_topology_put_port(port);
> > > >  		return -ENOMEM;
> > > >  	}
> > > >  
> > > > @@ -1799,7 +2066,7 @@ int drm_dp_send_power_updown_phy(struct
> > > > drm_dp_mst_topology_mgr *mgr,
> > > >  			ret = 0;
> > > >  	}
> > > >  	kfree(txmsg);
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  
> > > >  	return ret;
> > > >  }
> > > > @@ -1888,7 +2155,8 @@ int drm_dp_update_payload_part1(struct
> > > > drm_dp_mst_topology_mgr *mgr)
> > > >  		if (vcpi) {
> > > >  			port = container_of(vcpi, struct
> > > > drm_dp_mst_port,
> > > >  					    vcpi);
> > > > -			port = drm_dp_get_validated_port_ref(mgr,
> > > > port);
> > > > +			port =
> > > > drm_dp_mst_topology_get_port_validated(mgr,
> > > > +								      
> > > > port);
> > > >  			if (!port) {
> > > >  				mutex_unlock(&mgr->payload_lock);
> > > >  				return -EINVAL;
> > > > @@ -1925,7 +2193,7 @@ int drm_dp_update_payload_part1(struct
> > > > drm_dp_mst_topology_mgr *mgr)
> > > >  		cur_slots += req_payload.num_slots;
> > > >  
> > > >  		if (port)
> > > > -			drm_dp_put_port(port);
> > > > +			drm_dp_mst_topology_put_port(port);
> > > >  	}
> > > >  
> > > >  	for (i = 0; i < mgr->max_payloads; i++) {
> > > > @@ -2024,7 +2292,7 @@ static int drm_dp_send_dpcd_write(struct
> > > > drm_dp_mst_topology_mgr *mgr,
> > > >  	struct drm_dp_sideband_msg_tx *txmsg;
> > > >  	struct drm_dp_mst_branch *mstb;
> > > >  
> > > > -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> > > > +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port-
> > > > >parent);
> > > >  	if (!mstb)
> > > >  		return -EINVAL;
> > > >  
> > > > @@ -2048,7 +2316,7 @@ static int drm_dp_send_dpcd_write(struct
> > > > drm_dp_mst_topology_mgr *mgr,
> > > >  	}
> > > >  	kfree(txmsg);
> > > >  fail_put:
> > > > -	drm_dp_put_mst_branch_device(mstb);
> > > > +	drm_dp_mst_topology_put_mstb(mstb);
> > > >  	return ret;
> > > >  }
> > > >  
> > > > @@ -2158,7 +2426,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct
> > > > drm_dp_mst_topology_mgr *mgr, bool ms
> > > >  
> > > >  		/* give this the main reference */
> > > >  		mgr->mst_primary = mstb;
> > > > -		kref_get(&mgr->mst_primary->kref);
> > > > +		drm_dp_mst_topology_ref_mstb(mgr->mst_primary);
> > > >  
> > > >  		ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
> > > >  							 DP_MST_EN |
> > > > DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC);
> > > > @@ -2192,7 +2460,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct
> > > > drm_dp_mst_topology_mgr *mgr, bool ms
> > > >  out_unlock:
> > > >  	mutex_unlock(&mgr->lock);
> > > >  	if (mstb)
> > > > -		drm_dp_put_mst_branch_device(mstb);
> > > > +		drm_dp_mst_topology_put_mstb(mstb);
> > > >  	return ret;
> > > >  
> > > >  }
> > > > @@ -2357,7 +2625,7 @@ static int drm_dp_mst_handle_down_rep(struct
> > > > drm_dp_mst_topology_mgr *mgr)
> > > >  			       mgr->down_rep_recv.initial_hdr.lct,
> > > >  				      mgr-
> > > > >down_rep_recv.initial_hdr.rad[0],
> > > >  				      mgr->down_rep_recv.msg[0]);
> > > > -			drm_dp_put_mst_branch_device(mstb);
> > > > +			drm_dp_mst_topology_put_mstb(mstb);
> > > >  			memset(&mgr->down_rep_recv, 0, sizeof(struct
> > > > drm_dp_sideband_msg_rx));
> > > >  			return 0;
> > > >  		}
> > > > @@ -2368,7 +2636,7 @@ static int drm_dp_mst_handle_down_rep(struct
> > > > drm_dp_mst_topology_mgr *mgr)
> > > >  		}
> > > >  
> > > >  		memset(&mgr->down_rep_recv, 0, sizeof(struct
> > > > drm_dp_sideband_msg_rx));
> > > > -		drm_dp_put_mst_branch_device(mstb);
> > > > +		drm_dp_mst_topology_put_mstb(mstb);
> > > >  
> > > >  		mutex_lock(&mgr->qlock);
> > > >  		txmsg->state = DRM_DP_SIDEBAND_TX_RX;
> > > > @@ -2441,7 +2709,7 @@ static int drm_dp_mst_handle_up_req(struct
> > > > drm_dp_mst_topology_mgr *mgr)
> > > >  		}
> > > >  
> > > >  		if (mstb)
> > > > -			drm_dp_put_mst_branch_device(mstb);
> > > > +			drm_dp_mst_topology_put_mstb(mstb);
> > > >  
> > > >  		memset(&mgr->up_req_recv, 0, sizeof(struct
> > > > drm_dp_sideband_msg_rx));
> > > >  	}
> > > > @@ -2501,7 +2769,7 @@ enum drm_connector_status
> > > > drm_dp_mst_detect_port(struct drm_connector *connector
> > > >  	enum drm_connector_status status =
> > > > connector_status_disconnected;
> > > >  
> > > >  	/* we need to search for the port in the mgr in case its gone
> > > > */
> > > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > > >  	if (!port)
> > > >  		return connector_status_disconnected;
> > > >  
> > > > @@ -2526,7 +2794,7 @@ enum drm_connector_status
> > > > drm_dp_mst_detect_port(struct drm_connector *connector
> > > >  		break;
> > > >  	}
> > > >  out:
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  	return status;
> > > >  }
> > > >  EXPORT_SYMBOL(drm_dp_mst_detect_port);
> > > > @@ -2543,11 +2811,11 @@ bool drm_dp_mst_port_has_audio(struct
> > > > drm_dp_mst_topology_mgr *mgr,
> > > >  {
> > > >  	bool ret = false;
> > > >  
> > > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > > >  	if (!port)
> > > >  		return ret;
> > > >  	ret = port->has_audio;
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  	return ret;
> > > >  }
> > > >  EXPORT_SYMBOL(drm_dp_mst_port_has_audio);
> > > > @@ -2567,7 +2835,7 @@ struct edid *drm_dp_mst_get_edid(struct
> > > > drm_connector *connector, struct drm_dp_
> > > >  	struct edid *edid = NULL;
> > > >  
> > > >  	/* we need to search for the port in the mgr in case its gone
> > > > */
> > > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > > >  	if (!port)
> > > >  		return NULL;
> > > >  
> > > > @@ -2578,7 +2846,7 @@ struct edid *drm_dp_mst_get_edid(struct
> > > > drm_connector *connector, struct drm_dp_
> > > >  		drm_connector_set_tile_property(connector);
> > > >  	}
> > > >  	port->has_audio = drm_detect_monitor_audio(edid);
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  	return edid;
> > > >  }
> > > >  EXPORT_SYMBOL(drm_dp_mst_get_edid);
> > > > @@ -2649,7 +2917,7 @@ int drm_dp_atomic_find_vcpi_slots(struct
> > > > drm_atomic_state *state,
> > > >  	if (IS_ERR(topology_state))
> > > >  		return PTR_ERR(topology_state);
> > > >  
> > > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > > >  	if (port == NULL)
> > > >  		return -EINVAL;
> > > >  	req_slots = DIV_ROUND_UP(pbn, mgr->pbn_div);
> > > > @@ -2657,14 +2925,14 @@ int drm_dp_atomic_find_vcpi_slots(struct
> > > > drm_atomic_state *state,
> > > >  			req_slots, topology_state->avail_slots);
> > > >  
> > > >  	if (req_slots > topology_state->avail_slots) {
> > > > -		drm_dp_put_port(port);
> > > > +		drm_dp_mst_topology_put_port(port);
> > > >  		return -ENOSPC;
> > > >  	}
> > > >  
> > > >  	topology_state->avail_slots -= req_slots;
> > > >  	DRM_DEBUG_KMS("vcpi slots avail=%d", topology_state-
> > > > >avail_slots);
> > > >  
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  	return req_slots;
> > > >  }
> > > >  EXPORT_SYMBOL(drm_dp_atomic_find_vcpi_slots);
> > > > @@ -2715,7 +2983,7 @@ bool drm_dp_mst_allocate_vcpi(struct
> > > > drm_dp_mst_topology_mgr *mgr,
> > > >  {
> > > >  	int ret;
> > > >  
> > > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > > >  	if (!port)
> > > >  		return false;
> > > >  
> > > > @@ -2725,7 +2993,7 @@ bool drm_dp_mst_allocate_vcpi(struct
> > > > drm_dp_mst_topology_mgr *mgr,
> > > >  	if (port->vcpi.vcpi > 0) {
> > > >  		DRM_DEBUG_KMS("payload: vcpi %d already allocated for
> > > > pbn %d -
> > > > requested pbn %d\n", port->vcpi.vcpi, port->vcpi.pbn, pbn);
> > > >  		if (pbn == port->vcpi.pbn) {
> > > > -			drm_dp_put_port(port);
> > > > +			drm_dp_mst_topology_put_port(port);
> > > >  			return true;
> > > >  		}
> > > >  	}
> > > > @@ -2733,13 +3001,13 @@ bool drm_dp_mst_allocate_vcpi(struct
> > > > drm_dp_mst_topology_mgr *mgr,
> > > >  	ret = drm_dp_init_vcpi(mgr, &port->vcpi, pbn, slots);
> > > >  	if (ret) {
> > > >  		DRM_DEBUG_KMS("failed to init vcpi slots=%d max=63
> > > > ret=%d\n",
> > > > -				DIV_ROUND_UP(pbn, mgr->pbn_div), ret);
> > > > +			      DIV_ROUND_UP(pbn, mgr->pbn_div), ret);
> > > >  		goto out;
> > > >  	}
> > > >  	DRM_DEBUG_KMS("initing vcpi for pbn=%d slots=%d\n",
> > > > -			pbn, port->vcpi.num_slots);
> > > > +		      pbn, port->vcpi.num_slots);
> > > >  
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  	return true;
> > > >  out:
> > > >  	return false;
> > > > @@ -2749,12 +3017,12 @@ EXPORT_SYMBOL(drm_dp_mst_allocate_vcpi);
> > > >  int drm_dp_mst_get_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr,
> > > > struct
> > > > drm_dp_mst_port *port)
> > > >  {
> > > >  	int slots = 0;
> > > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > > >  	if (!port)
> > > >  		return slots;
> > > >  
> > > >  	slots = port->vcpi.num_slots;
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  	return slots;
> > > >  }
> > > >  EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
> > > > @@ -2768,11 +3036,11 @@ EXPORT_SYMBOL(drm_dp_mst_get_vcpi_slots);
> > > >   */
> > > >  void drm_dp_mst_reset_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr,
> > > > struct drm_dp_mst_port *port)
> > > >  {
> > > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > > >  	if (!port)
> > > >  		return;
> > > >  	port->vcpi.num_slots = 0;
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  }
> > > >  EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> > > >  
> > > > @@ -2781,9 +3049,10 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> > > >   * @mgr: manager for this port
> > > >   * @port: unverified port to deallocate vcpi for
> > > >   */
> > > > -void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
> > > > struct drm_dp_mst_port *port)
> > > > +void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
> > > > +				struct drm_dp_mst_port *port)
> > > >  {
> > > > -	port = drm_dp_get_validated_port_ref(mgr, port);
> > > > +	port = drm_dp_mst_topology_get_port_validated(mgr, port);
> > > >  	if (!port)
> > > >  		return;
> > > >  
> > > > @@ -2792,7 +3061,7 @@ void drm_dp_mst_deallocate_vcpi(struct
> > > > drm_dp_mst_topology_mgr *mgr, struct drm_
> > > >  	port->vcpi.pbn = 0;
> > > >  	port->vcpi.aligned_pbn = 0;
> > > >  	port->vcpi.vcpi = 0;
> > > > -	drm_dp_put_port(port);
> > > > +	drm_dp_mst_topology_put_port(port);
> > > >  }
> > > >  EXPORT_SYMBOL(drm_dp_mst_deallocate_vcpi);
> > > >  
> > > > @@ -3078,8 +3347,10 @@ static void drm_dp_tx_work(struct work_struct
> > > > *work)
> > > >  
> > > >  static void drm_dp_free_mst_port(struct kref *kref)
> > > >  {
> > > > -	struct drm_dp_mst_port *port = container_of(kref, struct
> > > > drm_dp_mst_port, kref);
> > > > -	kref_put(&port->parent->kref, drm_dp_free_mst_branch_device);
> > > > +	struct drm_dp_mst_port *port =
> > > > +		container_of(kref, struct drm_dp_mst_port,
> > > > malloc_kref);
> > > > +
> > > > +	drm_dp_mst_put_mstb_malloc(port->parent);
> > > >  	kfree(port);
> > > >  }
> > > >  
> > > > @@ -3103,7 +3374,6 @@ static void drm_dp_destroy_connector_work(struct
> > > > work_struct *work)
> > > >  		list_del(&port->next);
> > > >  		mutex_unlock(&mgr->destroy_connector_lock);
> > > >  
> > > > -		kref_init(&port->kref);
> > > >  		INIT_LIST_HEAD(&port->next);
> > > >  
> > > >  		mgr->cbs->destroy_connector(mgr, port->connector);
> > > > @@ -3117,7 +3387,7 @@ static void drm_dp_destroy_connector_work(struct
> > > > work_struct *work)
> > > >  			drm_dp_mst_put_payload_id(mgr, port-
> > > > >vcpi.vcpi);
> > > >  		}
> > > >  
> > > > -		kref_put(&port->kref, drm_dp_free_mst_port);
> > > > +		drm_dp_mst_put_port_malloc(port);
> > > >  		send_hotplug = true;
> > > >  	}
> > > >  	if (send_hotplug)
> > > > @@ -3292,7 +3562,7 @@ static int drm_dp_mst_i2c_xfer(struct
> > > > i2c_adapter
> > > > *adapter, struct i2c_msg *msgs
> > > >  	struct drm_dp_sideband_msg_tx *txmsg = NULL;
> > > >  	int ret;
> > > >  
> > > > -	mstb = drm_dp_get_validated_mstb_ref(mgr, port->parent);
> > > > +	mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port-
> > > > >parent);
> > > >  	if (!mstb)
> > > >  		return -EREMOTEIO;
> > > >  
> > > > @@ -3342,7 +3612,7 @@ static int drm_dp_mst_i2c_xfer(struct
> > > > i2c_adapter
> > > > *adapter, struct i2c_msg *msgs
> > > >  	}
> > > >  out:
> > > >  	kfree(txmsg);
> > > > -	drm_dp_put_mst_branch_device(mstb);
> > > > +	drm_dp_mst_topology_put_mstb(mstb);
> > > >  	return ret;
> > > >  }
> > > >  
> > > > diff --git a/include/drm/drm_dp_mst_helper.h
> > > > b/include/drm/drm_dp_mst_helper.h
> > > > index 371cc2816477..50643a39765d 100644
> > > > --- a/include/drm/drm_dp_mst_helper.h
> > > > +++ b/include/drm/drm_dp_mst_helper.h
> > > > @@ -44,7 +44,10 @@ struct drm_dp_vcpi {
> > > >  
> > > >  /**
> > > >   * struct drm_dp_mst_port - MST port
> > > > - * @kref: reference count for this port.
> > > > + * @topology_kref: refcount for this port's lifetime in the topology,
> > > > only the
> > > > + * DP MST helpers should need to touch this
> > > > + * @malloc_kref: refcount for the memory allocation containing this
> > > > structure.
> > > > + * See drm_dp_mst_get_port_malloc() and drm_dp_mst_put_port_malloc().
> > > >   * @port_num: port number
> > > >   * @input: if this port is an input port.
> > > >   * @mcs: message capability status - DP 1.2 spec.
> > > > @@ -67,7 +70,8 @@ struct drm_dp_vcpi {
> > > >   * in the MST topology.
> > > >   */
> > > >  struct drm_dp_mst_port {
> > > > -	struct kref kref;
> > > > +	struct kref topology_kref;
> > > > +	struct kref malloc_kref;
> > > 
> > > I'd to inline member kerneldoc here (you can mix&match, so no need to
> > > rewrite them all) and spend a few words reference the family of get/put
> > > functions. Same for mstb below.
> > > 
> > > >  
> > > >  	u8 port_num;
> > > >  	bool input;
> > > > @@ -102,7 +106,10 @@ struct drm_dp_mst_port {
> > > >  
> > > >  /**
> > > >   * struct drm_dp_mst_branch - MST branch device.
> > > > - * @kref: reference count for this port.
> > > > + * @topology_kref: refcount for this branch device's lifetime in the
> > > > topology,
> > > > + * only the DP MST helpers should need to touch this
> > > > + * @malloc_kref: refcount for the memory allocation containing this
> > > > structure.
> > > > + * See drm_dp_mst_get_mstb_malloc() and drm_dp_mst_put_mstb_malloc().
> > > >   * @rad: Relative Address to talk to this branch device.
> > > >   * @lct: Link count total to talk to this branch device.
> > > >   * @num_ports: number of ports on the branch.
> > > > @@ -121,7 +128,8 @@ struct drm_dp_mst_port {
> > > >   * to downstream port of parent branches.
> > > >   */
> > > >  struct drm_dp_mst_branch {
> > > > -	struct kref kref;
> > > > +	struct kref topology_kref;
> > > > +	struct kref malloc_kref;
> > > >  	u8 rad[8];
> > > >  	u8 lct;
> > > >  	int num_ports;
> > > > @@ -626,4 +634,7 @@ int drm_dp_atomic_release_vcpi_slots(struct
> > > > drm_atomic_state *state,
> > > >  int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
> > > >  				 struct drm_dp_mst_port *port, bool
> > > > power_up);
> > > >  
> > > > +void drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port);
> > > > +void drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port);
> > > > +
> > > >  #endif
> > > > -- 
> > > > 2.19.2
> > > 
> > > I really like. Mostly concentrated on looking at the docs. Also still
> > > need to apply it and build the docs, so I can appreciate the DOT graphs.
> > > -Daniel
> > -- 
> > Cheers,
> > 	Lyude Paul
> > 
-- 
Cheers,
	Lyude Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-12-19 18:37 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-14  1:25 [WIP PATCH 00/15] MST refcounting/atomic helpers cleanup Lyude Paul
2018-12-14  1:25 ` [WIP PATCH 01/15] drm/dp_mst: Remove bogus conditional in drm_dp_update_payload_part1() Lyude Paul
2018-12-14  8:42   ` Daniel Vetter
2018-12-17 20:09     ` Wentland, Harry
2018-12-14  1:25 ` [WIP PATCH 02/15] drm/dp_mst: Refactor drm_dp_update_payload_part1() Lyude Paul
2018-12-14  8:47   ` Daniel Vetter
2018-12-17 20:27     ` Wentland, Harry
2018-12-14  1:25 ` [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports Lyude Paul
2018-12-14  9:29   ` Daniel Vetter
2018-12-18 21:27     ` Lyude Paul
2018-12-19 12:48       ` Daniel Vetter
2018-12-19 18:36         ` Lyude Paul
2018-12-14  1:25 ` [WIP PATCH 04/15] drm/dp_mst: Stop releasing VCPI when removing ports from topology Lyude Paul
2018-12-14  9:40   ` Daniel Vetter
2018-12-14  1:25 ` [WIP PATCH 05/15] drm/dp_mst: Fix payload deallocation on hotplugs using malloc refs Lyude Paul
2018-12-14  9:38   ` Daniel Vetter
2018-12-18 22:02     ` Lyude Paul
2018-12-14  1:25 ` [WIP PATCH 06/15] drm/i915: Keep malloc references to MST ports Lyude Paul
2018-12-14  9:32   ` Daniel Vetter
2018-12-18 21:52     ` Lyude Paul
2018-12-19 13:20       ` Daniel Vetter
2018-12-14  1:25 ` [WIP PATCH 07/15] drm/nouveau: Remove bogus cleanup in nv50_mstm_add_connector() Lyude Paul
2018-12-14  1:25 ` [WIP PATCH 08/15] drm/nouveau: Remove unnecessary VCPI checks in nv50_msto_cleanup() Lyude Paul
2018-12-14  1:25 ` [WIP PATCH 09/15] drm/nouveau: Fix potential use-after-frees for MSTCs Lyude Paul
2018-12-14  1:25 ` [WIP PATCH 10/15] drm/nouveau: Stop unsetting mstc->port, use malloc refs Lyude Paul
2018-12-14  1:25 ` [WIP PATCH 11/15] drm/nouveau: Grab payload lock in nv50_msto_payload() Lyude Paul
2018-12-14  1:25 ` [WIP PATCH 12/15] drm/dp_mst: Add some atomic state iterator macros Lyude Paul
2018-12-14  1:25 ` [WIP PATCH 13/15] drm/dp_mst: Start tracking per-port VCPI allocations Lyude Paul
2018-12-14 16:37   ` Daniel Vetter
2018-12-14  1:25 ` [WIP PATCH 14/15] drm/dp_mst: Check payload count in drm_dp_mst_atomic_check() Lyude Paul
2018-12-14  1:25 ` [WIP PATCH 15/15] drm/nouveau: Use atomic VCPI helpers for MST Lyude Paul

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).