All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
@ 2024-03-15 17:09 sunpeng.li
  2024-03-15 17:09 ` [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode sunpeng.li
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: sunpeng.li @ 2024-03-15 17:09 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Robert Mader, Pekka Paalanen, Sean Paul,
	Simon Ser, Shashank Sharma, Harry Wentland, Sebastian Wick,
	Leo Li

From: Leo Li <sunpeng.li@amd.com>

These patches aim to make the amdgpgu KMS driver play nicer with compositors
when building multi-plane scanout configurations. They do so by:

1. Making cursor behavior more sensible.
2. Allowing placement of DRM OVERLAY planes underneath the PRIMARY plane for
   'underlay' configurations (perhaps more of a RFC, see below).

Please see the commit messages for details.


For #2, the simplest way to accomplish this was to increase the value of the
immutable zpos property for the PRIMARY plane. This allowed OVERLAY planes with
a mutable zpos range of (0-254) to be positioned underneath the PRIMARY for an
underlay scanout configuration.

Technically speaking, DCN hardware does not have a concept of primary or overlay
planes - there are simply 4 general purpose hardware pipes that can be maped in
any configuration. So the immutable zpos restriction on the PRIMARY plane is
kind of arbitrary; it can have a mutable range of (0-254) just like the
OVERLAYs. The distinction between PRIMARY and OVERLAY planes is also somewhat
arbitrary. We can interpret PRIMARY as the first plane that should be enabled on
a CRTC, but beyond that, it doesn't mean much for amdgpu.

Therefore, I'm curious about how compositors devs understand KMS planes and
their zpos properties, and how we would like to use them. It isn't clear to me
how compositors wish to interpret and use the DRM zpos property, or
differentiate between OVERLAY and PRIMARY planes, when it comes to setting up
multi-plane scanout.

Ultimately, what I'd like to answer is "What can we do on the KMS driver and DRM
plane API side, that can make building multi-plane scanout configurations easier
for compositors?" I'm hoping we can converge on something, whether that be
updating the existing documentation to better define the usage, or update the
API to provide support for something that is lacking.

Thanks,
Leo


Some links to provide context and details:
* What is underlay?: https://gitlab.freedesktop.org/emersion/libliftoff/-/issues/76
* Discussion on how to implement underlay on Weston: https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2325164

Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Chao Guo <chao.guo@nxp.com>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Vikas Korjani <Vikas.Korjani@amd.com>
Cc: Robert Mader <robert.mader@posteo.de>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Sean Paul <sean@poorly.run>
Cc: Simon Ser <contact@emersion.fr>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>

Leo Li (2):
  drm/amd/display: Introduce overlay cursor mode
  drm/amd/display: Move PRIMARY plane zpos higher

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 405 ++++++++++++++++--
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
 .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  28 +-
 4 files changed, 391 insertions(+), 50 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode
  2024-03-15 17:09 [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible sunpeng.li
@ 2024-03-15 17:09 ` sunpeng.li
  2024-03-16  8:38   ` kernel test robot
                     ` (3 more replies)
  2024-03-15 17:09 ` [PATCH 2/2] drm/amd/display: Move PRIMARY plane zpos higher sunpeng.li
  2024-03-28 14:33 ` [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible Pekka Paalanen
  2 siblings, 4 replies; 27+ messages in thread
From: sunpeng.li @ 2024-03-15 17:09 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Robert Mader, Pekka Paalanen, Sean Paul,
	Simon Ser, Shashank Sharma, Harry Wentland, Sebastian Wick,
	Leo Li

From: Leo Li <sunpeng.li@amd.com>

[Why]

DCN is the display hardware for amdgpu. DRM planes are backed by DCN
hardware pipes, which carry pixel data from one end (memory), to the
other (output encoder).

Each DCN pipe has the ability to blend in a cursor early on in the
pipeline. In other words, there are no dedicated cursor planes in DCN,
which makes cursor behavior somewhat unintuitive for compositors.

For example, if the cursor is in RGB format, but the top-most DRM plane
is in YUV format, DCN will not be able to blend them. Because of this,
amdgpu_dm rejects all configurations where a cursor needs to be enabled
on top of a YUV formatted plane.

From a compositor's perspective, when computing an allocation for
hardware plane offloading, this cursor-on-yuv configuration result in an
atomic test failure. Since the failure reason is not obvious at all,
compositors will likely fall back to full rendering, which is not ideal.

Instead, amdgpu_dm can try to accommodate the cursor-on-yuv
configuration by opportunistically reserving a separate DCN pipe just
for the cursor. We can refer to this as "overlay cursor mode". It is
contrasted with "native cursor mode", where the native DCN per-pipe
cursor is used.

[How]

On each crtc, compute whether the cursor plane should be enabled in
overlay mode (which currently, is iff the immediate plane below has a
YUV format). If it is, mark the CRTC as requesting overlay cursor mode.

During DC validation, attempt to enable a separate DCN pipe for the
cursor if it's in overlay mode. If that fails, or if no overlay mode is
requested, then fallback to native mode.

Signed-off-by: Leo Li <sunpeng.li@amd.com>
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 309 +++++++++++++++---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
 .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  13 +-
 4 files changed, 288 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 21a61454c878..09ab330aed17 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -8359,8 +8359,19 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 	 * Disable the cursor first if we're disabling all the planes.
 	 * It'll remain on the screen after the planes are re-enabled
 	 * if we don't.
+	 *
+	 * If the cursor is transitioning from native to overlay mode, the
+	 * native cursor needs to be disabled first.
 	 */
-	if (acrtc_state->active_planes == 0)
+	if (acrtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE &&
+	    dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
+		struct dc_cursor_position cursor_position = {0};
+		dc_stream_set_cursor_position(acrtc_state->stream,
+					      &cursor_position);
+	}
+
+	if (acrtc_state->active_planes == 0 &&
+	    dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE)
 		amdgpu_dm_commit_cursors(state);
 
 	/* update planes when needed */
@@ -8374,7 +8385,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 		struct dm_plane_state *dm_new_plane_state = to_dm_plane_state(new_plane_state);
 
 		/* Cursor plane is handled after stream updates */
-		if (plane->type == DRM_PLANE_TYPE_CURSOR) {
+		if (plane->type == DRM_PLANE_TYPE_CURSOR &&
+		    acrtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
 			if ((fb && crtc == pcrtc) ||
 			    (old_plane_state->fb && old_plane_state->crtc == pcrtc))
 				cursor_update = true;
@@ -8727,7 +8739,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 	 * This avoids redundant programming in the case where we're going
 	 * to be disabling a single plane - those pipes are being disabled.
 	 */
-	if (acrtc_state->active_planes)
+	if (acrtc_state->active_planes &&
+	    acrtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE)
 		amdgpu_dm_commit_cursors(state);
 
 cleanup:
@@ -10039,7 +10052,8 @@ static bool should_reset_plane(struct drm_atomic_state *state,
 {
 	struct drm_plane *other;
 	struct drm_plane_state *old_other_state, *new_other_state;
-	struct drm_crtc_state *new_crtc_state;
+	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
+	struct dm_crtc_state *old_dm_crtc_state, *new_dm_crtc_state;
 	struct amdgpu_device *adev = drm_to_adev(plane->dev);
 	int i;
 
@@ -10061,10 +10075,24 @@ static bool should_reset_plane(struct drm_atomic_state *state,
 
 	new_crtc_state =
 		drm_atomic_get_new_crtc_state(state, new_plane_state->crtc);
+	old_crtc_state =
+		drm_atomic_get_old_crtc_state(state, old_plane_state->crtc);
 
 	if (!new_crtc_state)
 		return true;
 
+	/*
+	 * A change in cursor mode means a new dc pipe needs to be acquired or
+	 * released from the state
+	 */
+	old_dm_crtc_state = to_dm_crtc_state(old_crtc_state);
+	new_dm_crtc_state = to_dm_crtc_state(new_crtc_state);
+	if (plane->type == DRM_PLANE_TYPE_CURSOR &&
+	    old_dm_crtc_state != NULL &&
+	    old_dm_crtc_state->cursor_mode != new_dm_crtc_state->cursor_mode) {
+		return true;
+	}
+
 	/* CRTC Degamma changes currently require us to recreate planes. */
 	if (new_crtc_state->color_mgmt_changed)
 		return true;
@@ -10216,6 +10244,68 @@ static int dm_check_cursor_fb(struct amdgpu_crtc *new_acrtc,
 	return 0;
 }
 
+/*
+ * Helper function for checking the cursor in native mode
+ */
+static int dm_check_native_cursor_state(struct drm_crtc *new_plane_crtc,
+					struct drm_plane *plane,
+					struct drm_plane_state *new_plane_state,
+					bool enable)
+{
+
+	struct amdgpu_crtc *new_acrtc;
+	int ret;
+
+	if (!enable || !new_plane_crtc ||
+	    drm_atomic_plane_disabling(plane->state, new_plane_state))
+		return 0;
+
+	new_acrtc = to_amdgpu_crtc(new_plane_crtc);
+
+	if (new_plane_state->src_x != 0 || new_plane_state->src_y != 0) {
+		DRM_DEBUG_ATOMIC("Cropping not supported for cursor plane\n");
+		return -EINVAL;
+	}
+
+	if (new_plane_state->fb) {
+		ret = dm_check_cursor_fb(new_acrtc, new_plane_state,
+						new_plane_state->fb);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static bool dm_should_update_native_cursor(struct drm_atomic_state *state,
+					   struct drm_crtc *old_plane_crtc,
+					   struct drm_crtc *new_plane_crtc,
+					   bool enable)
+{
+	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
+	struct dm_crtc_state *dm_old_crtc_state, *dm_new_crtc_state;
+
+	if (!enable) {
+		if (old_plane_crtc == NULL)
+			return true;
+
+		old_crtc_state = drm_atomic_get_old_crtc_state(
+			state, old_plane_crtc);
+		dm_old_crtc_state = to_dm_crtc_state(old_crtc_state);
+
+		return dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE;
+	} else {
+		if (new_plane_crtc == NULL)
+			return true;
+
+		new_crtc_state = drm_atomic_get_new_crtc_state(
+			state, new_plane_crtc);
+		dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
+
+		return dm_new_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE;
+	}
+}
+
 static int dm_update_plane_state(struct dc *dc,
 				 struct drm_atomic_state *state,
 				 struct drm_plane *plane,
@@ -10231,8 +10321,7 @@ static int dm_update_plane_state(struct dc *dc,
 	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
 	struct dm_crtc_state *dm_new_crtc_state, *dm_old_crtc_state;
 	struct dm_plane_state *dm_new_plane_state, *dm_old_plane_state;
-	struct amdgpu_crtc *new_acrtc;
-	bool needs_reset;
+	bool needs_reset, update_native_cursor;
 	int ret = 0;
 
 
@@ -10241,24 +10330,16 @@ static int dm_update_plane_state(struct dc *dc,
 	dm_new_plane_state = to_dm_plane_state(new_plane_state);
 	dm_old_plane_state = to_dm_plane_state(old_plane_state);
 
-	if (plane->type == DRM_PLANE_TYPE_CURSOR) {
-		if (!enable || !new_plane_crtc ||
-			drm_atomic_plane_disabling(plane->state, new_plane_state))
-			return 0;
-
-		new_acrtc = to_amdgpu_crtc(new_plane_crtc);
-
-		if (new_plane_state->src_x != 0 || new_plane_state->src_y != 0) {
-			DRM_DEBUG_ATOMIC("Cropping not supported for cursor plane\n");
-			return -EINVAL;
-		}
+	update_native_cursor = dm_should_update_native_cursor(state,
+							      old_plane_crtc,
+							      new_plane_crtc,
+							      enable);
 
-		if (new_plane_state->fb) {
-			ret = dm_check_cursor_fb(new_acrtc, new_plane_state,
-						 new_plane_state->fb);
-			if (ret)
-				return ret;
-		}
+	if (plane->type == DRM_PLANE_TYPE_CURSOR && update_native_cursor) {
+		ret = dm_check_native_cursor_state(new_plane_crtc, plane,
+					            new_plane_state, enable);
+		if (ret)
+			return ret;
 
 		return 0;
 	}
@@ -10285,16 +10366,17 @@ static int dm_update_plane_state(struct dc *dc,
 				plane->base.id, old_plane_crtc->base.id);
 
 		ret = dm_atomic_get_state(state, &dm_state);
-		if (ret)
-			return ret;
+		if (ret) {
+			goto out;
+		}
 
 		if (!dc_state_remove_plane(
 				dc,
 				dm_old_crtc_state->stream,
 				dm_old_plane_state->dc_state,
 				dm_state->context)) {
-
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 
 		if (dm_old_plane_state->dc_state)
@@ -10323,21 +10405,16 @@ static int dm_update_plane_state(struct dc *dc,
 			return 0;
 
 		ret = amdgpu_dm_plane_helper_check_state(new_plane_state, new_crtc_state);
-		if (ret)
-			return ret;
+		if (ret) {
+			goto out;
+		}
 
 		WARN_ON(dm_new_plane_state->dc_state);
 
 		dc_new_plane_state = dc_create_plane_state(dc);
-		if (!dc_new_plane_state)
-			return -ENOMEM;
-
-		/* Block top most plane from being a video plane */
-		if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
-			if (amdgpu_dm_plane_is_video_format(new_plane_state->fb->format->format) && *is_top_most_overlay)
-				return -EINVAL;
-
-			*is_top_most_overlay = false;
+		if (!dc_new_plane_state) {
+			ret = -ENOMEM;
+			goto out;
 		}
 
 		DRM_DEBUG_ATOMIC("Enabling DRM plane: %d on DRM crtc %d\n",
@@ -10350,13 +10427,13 @@ static int dm_update_plane_state(struct dc *dc,
 			new_crtc_state);
 		if (ret) {
 			dc_plane_state_release(dc_new_plane_state);
-			return ret;
+			goto out;
 		}
 
 		ret = dm_atomic_get_state(state, &dm_state);
 		if (ret) {
 			dc_plane_state_release(dc_new_plane_state);
-			return ret;
+			goto out;
 		}
 
 		/*
@@ -10373,7 +10450,8 @@ static int dm_update_plane_state(struct dc *dc,
 				dm_state->context)) {
 
 			dc_plane_state_release(dc_new_plane_state);
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 
 		dm_new_plane_state->dc_state = dc_new_plane_state;
@@ -10388,6 +10466,16 @@ static int dm_update_plane_state(struct dc *dc,
 		*lock_and_validation_needed = true;
 	}
 
+out:
+	/* If cursor overlay failed, attempt fallback to native mode */
+	if (ret == -EINVAL && plane->type == DRM_PLANE_TYPE_CURSOR) {
+		ret = dm_check_native_cursor_state(new_plane_crtc, plane,
+						    new_plane_state, enable);
+		if (ret) {
+			return ret;
+		}
+		dm_new_crtc_state->cursor_mode = DM_CURSOR_NATIVE_MODE;
+	}
 
 	return ret;
 }
@@ -10544,6 +10632,126 @@ static int add_affected_mst_dsc_crtcs(struct drm_atomic_state *state, struct drm
 	return drm_dp_mst_add_affected_dsc_crtcs(state, &aconnector->mst_root->mst_mgr);
 }
 
+/**
+ * DOC: Cursor Modes - Native vs Overlay
+ *
+ * In native mode, the cursor uses a integrated cursor pipe within each DCN hw
+ * plane. It does not require a dedicated hw plane to enable, but it is
+ * subjected to the same z-order and scaling as the hw plane. It also has format
+ * restrictions, a RGB cursor in native mode cannot be enabled within a non-RGB
+ * hw plane.
+ *
+ * In overlay mode, the cursor uses a separate DCN hw plane, and thus has its
+ * own scaling and z-pos. It also has no blending restrictions. It lends to a
+ * cursor behavior more akin to a DRM client's expectations. However, it does
+ * occupy an extra DCN plane, and therefore will only be used if a DCN plane is
+ * available.
+*/
+
+/**
+ * Set whether the cursor should be enabled in native mode, or overlay mode, on
+ * the dm_crtc_state.
+ *
+ * The cursor should be enabled in overlay mode if the immediate underlying
+ * plane contains a video format.
+ *
+ * Since zpos info is required, drm_atomic_normalize_zpos must be called before
+ * calling this function.
+*/
+static int dm_crtc_set_cursor_mode(struct drm_atomic_state *state,
+				    struct dm_crtc_state *dm_crtc_state)
+{
+	struct drm_plane_state *plane_state, *old_plane_state, *target_plane_state;
+	struct drm_crtc_state *crtc_state = &dm_crtc_state->base;
+	struct drm_plane *plane;
+	bool consider_mode_change = false;
+	bool cursor_changed = false;
+	unsigned int target_zpos;
+	unsigned int cursor_zpos;
+	int i;
+
+	/*
+	 * Cursor mode can change if a plane's format changes, is
+	 * enabled/disabled, or z-order changes.
+	 */
+	for_each_oldnew_plane_in_state(state, plane, old_plane_state, plane_state, i) {
+
+		/* Only care about planes on this CRTC */
+		if ((drm_plane_mask(plane) & crtc_state->plane_mask) == 0)
+			continue;
+
+		if (plane->type == DRM_PLANE_TYPE_CURSOR)
+			cursor_changed = true;
+
+		if (drm_atomic_plane_enabling(old_plane_state, plane_state) ||
+		    drm_atomic_plane_disabling(old_plane_state, plane_state) ||
+		    old_plane_state->fb->format != plane_state->fb->format) {
+			consider_mode_change = true;
+			break;
+		}
+	}
+
+	if (!consider_mode_change && !crtc_state->zpos_changed) {
+		return 0;
+	}
+
+	/*
+	 * If no cursor change on this CRTC, and not enabled on this CRTC, then
+	 * no need to set cursor mode. This avoids needlessly locking the cursor
+	 * state.
+	 */
+	if (!cursor_changed &&
+	    !(drm_plane_mask(crtc_state->crtc->cursor) & crtc_state->plane_mask)) {
+		return 0;
+	}
+
+	plane_state = drm_atomic_get_plane_state(state,
+						 crtc_state->crtc->cursor);
+	if (IS_ERR(plane_state))
+		return PTR_ERR(plane_state);
+
+	/* Cursor is disabled */
+	if (!plane_state->fb)
+		return 0;
+
+	cursor_zpos = plane_state->normalized_zpos;
+
+	/* Get enabled plane immediately below cursor. */
+	target_plane_state = NULL;
+	target_zpos = 0;
+	drm_for_each_plane_mask(plane, state->dev, crtc_state->plane_mask) {
+		if (plane->type == DRM_PLANE_TYPE_CURSOR)
+			continue;
+
+		plane_state = drm_atomic_get_plane_state(state, plane);
+		if (IS_ERR(plane_state))
+			return PTR_ERR(plane_state);
+
+		if (!plane_state->fb ||
+		    plane_state->normalized_zpos >= cursor_zpos)
+			continue;
+
+		if (plane_state->normalized_zpos >= target_zpos) {
+			target_zpos = plane_state->normalized_zpos;
+			target_plane_state = plane_state;
+		}
+	}
+
+	/* Nothing below cursor - use overlay mode */
+	if (target_plane_state == NULL) {
+		dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
+		return 0;
+	}
+
+	if (amdgpu_dm_plane_is_video_format(target_plane_state->fb->format->format)) {
+		dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
+	} else {
+		dm_crtc_state->cursor_mode = DM_CURSOR_NATIVE_MODE;
+	}
+
+	return 0;
+}
+
 /**
  * amdgpu_dm_atomic_check() - Atomic check implementation for AMDgpu DM.
  *
@@ -10713,6 +10921,20 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
 		goto fail;
 	}
 
+	/*
+	 * Determine whether cursors on each CRTC should be enabled in native or
+	 * overlay mode.
+	 */
+	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
+		dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
+
+		ret = dm_crtc_set_cursor_mode(state, dm_new_crtc_state);
+		if (ret) {
+			drm_dbg(dev, "Failed to determine cursor mode\n");
+			goto fail;
+		}
+	}
+
 	/* Remove exiting planes if they are modified */
 	for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, new_plane_state, i) {
 		if (old_plane_state->fb && new_plane_state->fb &&
@@ -10793,6 +11015,11 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
 
 	/* Check cursor planes scaling */
 	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
+		/* Overlay cusor does not need scaling check */
+		dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
+		if (dm_new_crtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE)
+			continue;
+
 		ret = dm_check_crtc_cursor(state, crtc, new_crtc_state);
 		if (ret) {
 			DRM_DEBUG_DRIVER("dm_check_crtc_cursor() failed\n");
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index 09519b7abf67..b8d39fdd1e09 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -822,6 +822,11 @@ struct dm_plane_state {
 	enum amdgpu_transfer_function blend_tf;
 };
 
+enum amdgpu_dm_cursor_mode {
+	DM_CURSOR_NATIVE_MODE = 0,
+	DM_CURSOR_OVERLAY_MODE,
+};
+
 struct dm_crtc_state {
 	struct drm_crtc_state base;
 	struct dc_stream_state *stream;
@@ -852,6 +857,8 @@ struct dm_crtc_state {
 	 * encoding.
 	 */
 	enum amdgpu_transfer_function regamma_tf;
+
+	enum amdgpu_dm_cursor_mode cursor_mode;
 };
 
 #define to_dm_crtc_state(x) container_of(x, struct dm_crtc_state, base)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
index e23a0a276e33..67aea1d2feb9 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
@@ -304,6 +304,7 @@ static struct drm_crtc_state *amdgpu_dm_crtc_duplicate_state(struct drm_crtc *cr
 	state->regamma_tf = cur->regamma_tf;
 	state->crc_skip_count = cur->crc_skip_count;
 	state->mpo_requested = cur->mpo_requested;
+	state->cursor_mode = cur->cursor_mode;
 	/* TODO Duplicate dc_stream after objects are stream object is flattened */
 
 	return &state->base;
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 8a4c40b4c27e..ed1fc01f1649 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -104,7 +104,7 @@ void amdgpu_dm_plane_fill_blending_from_plane_state(const struct drm_plane_state
 	*global_alpha = false;
 	*global_alpha_value = 0xff;
 
-	if (plane_state->plane->type != DRM_PLANE_TYPE_OVERLAY)
+	if (plane_state->plane->type == DRM_PLANE_TYPE_PRIMARY)
 		return;
 
 	if (plane_state->pixel_blend_mode == DRM_MODE_BLEND_PREMULTI ||
@@ -1175,10 +1175,21 @@ static int amdgpu_dm_plane_atomic_check(struct drm_plane *plane,
 static int amdgpu_dm_plane_atomic_async_check(struct drm_plane *plane,
 					      struct drm_atomic_state *state)
 {
+	struct drm_crtc_state *new_crtc_state;
+	struct drm_plane_state *new_plane_state;
+	struct dm_crtc_state *dm_new_crtc_state;
+
 	/* Only support async updates on cursor planes. */
 	if (plane->type != DRM_PLANE_TYPE_CURSOR)
 		return -EINVAL;
 
+	new_plane_state = drm_atomic_get_new_plane_state(state, plane);
+	new_crtc_state = drm_atomic_get_new_crtc_state(state, new_plane_state->crtc);
+	dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
+	/* Reject overlay cursors for now*/
+	if (dm_new_crtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE)
+		return -EINVAL;
+
 	return 0;
 }
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] drm/amd/display: Move PRIMARY plane zpos higher
  2024-03-15 17:09 [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible sunpeng.li
  2024-03-15 17:09 ` [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode sunpeng.li
@ 2024-03-15 17:09 ` sunpeng.li
  2024-03-21 21:36   ` Harry Wentland
  2024-03-28 15:20   ` Pekka Paalanen
  2024-03-28 14:33 ` [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible Pekka Paalanen
  2 siblings, 2 replies; 27+ messages in thread
From: sunpeng.li @ 2024-03-15 17:09 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Robert Mader, Pekka Paalanen, Sean Paul,
	Simon Ser, Shashank Sharma, Harry Wentland, Sebastian Wick,
	Leo Li

From: Leo Li <sunpeng.li@amd.com>

[Why]

Compositors have different ways of assigning surfaces to DRM planes for
render offloading. It may decide between various strategies: overlay,
underlay, or a mix of both

One way for compositors to implement the underlay strategy is to assign
a higher zpos to the DRM_PRIMARY plane than the DRM_OVERLAY planes,
effectively turning the DRM_OVERLAY plane into an underlay plane.

Today, amdgpu attaches an immutable zpos of 0 to the DRM_PRIMARY plane.
This however, is an arbitrary restriction. DCN pipes are general
purpose, and can be arranged in any z-order. To support compositors
using this allocation scheme, we can set a non-zero immutable zpos for
the PRIMARY, allowing the placement of OVERLAYS (mutable zpos range
0-254) beneath the PRIMARY.

[How]

Assign a zpos = #no of OVERLAY planes to the PRIMARY plane. Then, clean
up any assumptions in the driver of PRIMARY plane having the lowest
zpos.

Signed-off-by: Leo Li <sunpeng.li@amd.com>
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 96 ++++++++++++++++++-
 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 17 +++-
 2 files changed, 104 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 09ab330aed17..01b00f587701 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -80,6 +80,7 @@
 #include <linux/firmware.h>
 #include <linux/component.h>
 #include <linux/dmi.h>
+#include <linux/sort.h>
 
 #include <drm/display/drm_dp_mst_helper.h>
 #include <drm/display/drm_hdmi_helper.h>
@@ -369,6 +370,20 @@ static inline void reverse_planes_order(struct dc_surface_update *array_of_surfa
 		swap(array_of_surface_update[i], array_of_surface_update[j]);
 }
 
+/*
+ * DC will program planes with their z-order determined by their ordering
+ * in the dc_surface_updates array. This comparator is used to sort them
+ * by descending zpos.
+ */
+static int dm_plane_layer_index_cmp(const void *a, const void *b)
+{
+	const struct dc_surface_update *sa = (struct dc_surface_update *)a;
+	const struct dc_surface_update *sb = (struct dc_surface_update *)b;
+
+	/* Sort by descending dc_plane layer_index (i.e. normalized_zpos) */
+	return sb->surface->layer_index - sa->surface->layer_index;
+}
+
 /**
  * update_planes_and_stream_adapter() - Send planes to be updated in DC
  *
@@ -393,7 +408,8 @@ static inline bool update_planes_and_stream_adapter(struct dc *dc,
 						    struct dc_stream_update *stream_update,
 						    struct dc_surface_update *array_of_surface_update)
 {
-	reverse_planes_order(array_of_surface_update, planes_count);
+	sort(array_of_surface_update, planes_count,
+	     sizeof(*array_of_surface_update), dm_plane_layer_index_cmp, NULL);
 
 	/*
 	 * Previous frame finished and HW is ready for optimization.
@@ -9363,6 +9379,8 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
 		for (j = 0; j < status->plane_count; j++)
 			dummy_updates[j].surface = status->plane_states[0];
 
+		sort(dummy_updates, status->plane_count,
+		     sizeof(*dummy_updates), dm_plane_layer_index_cmp, NULL);
 
 		mutex_lock(&dm->dc_lock);
 		dc_update_planes_and_stream(dm->dc,
@@ -10097,6 +10115,17 @@ static bool should_reset_plane(struct drm_atomic_state *state,
 	if (new_crtc_state->color_mgmt_changed)
 		return true;
 
+	/*
+	 * On zpos change, planes need to be reordered by removing and re-adding
+	 * them one by one to the dc state, in order of descending zpos.
+	 *
+	 * TODO: We can likely skip bandwidth validation if the only thing that
+	 * changed about the plane was it'z z-ordering.
+	 */
+	if (new_crtc_state->zpos_changed) {
+		return true;
+	}
+
 	if (drm_atomic_crtc_needs_modeset(new_crtc_state))
 		return true;
 
@@ -10509,6 +10538,65 @@ dm_get_plane_scale(struct drm_plane_state *plane_state,
 	*out_plane_scale_h = plane_state->crtc_h * 1000 / plane_src_h;
 }
 
+/*
+ * The normalized_zpos value cannot be used by this iterator directly. It's only
+ * calculated for enabled planes, potentially causing normalized_zpos collisions
+ * between enabled/disabled planes in the atomic state. We need a unique value
+ * so that the iterator will not generate the same object twice, or loop
+ * indefinitely.
+ */
+static inline struct __drm_planes_state *__get_next_zpos(
+	struct drm_atomic_state *state,
+	struct __drm_planes_state *prev)
+{
+	unsigned int highest_zpos = 0, prev_zpos = 256;
+	uint32_t highest_id = 0, prev_id = UINT_MAX;
+	struct drm_plane_state *new_plane_state;
+	struct drm_plane *plane;
+	int i, highest_i = -1;
+
+	if (prev != NULL) {
+		prev_zpos = prev->new_state->zpos;
+		prev_id = prev->ptr->base.id;
+	}
+
+	for_each_new_plane_in_state(state, plane, new_plane_state, i) {
+		/* Skip planes with higher zpos than the previously returned */
+		if (new_plane_state->zpos > prev_zpos ||
+		    (new_plane_state->zpos == prev_zpos &&
+		     plane->base.id >= prev_id))
+			continue;
+
+		/* Save the index of the plane with highest zpos */
+		if (new_plane_state->zpos > highest_zpos ||
+		    (new_plane_state->zpos == highest_zpos &&
+		     plane->base.id > highest_id)) {
+			highest_zpos = new_plane_state->zpos;
+			highest_id = plane->base.id;
+			highest_i = i;
+		}
+	}
+
+	if (highest_i < 0)
+		return NULL;
+
+	return &state->planes[highest_i];
+}
+
+/*
+ * Use the uniqueness of the plane's (zpos, drm obj ID) combination to iterate
+ * by descending zpos, as read from the new plane state. This is the same
+ * ordering as defined by drm_atomic_normalize_zpos().
+ */
+#define for_each_oldnew_plane_in_decending_zpos(__state, plane, old_plane_state, new_plane_state) \
+	for (struct __drm_planes_state *__i = __get_next_zpos((__state), NULL); \
+	     __i != NULL; __i = __get_next_zpos((__state), __i))		\
+		for_each_if (((plane) = __i->ptr,			\
+			      (void)(plane) /* Only to avoid unused-but-set-variable warning */, \
+			      (old_plane_state) = __i->old_state,	\
+			      (new_plane_state) = __i->new_state, 1))
+
+
 static int dm_check_crtc_cursor(struct drm_atomic_state *state,
 				struct drm_crtc *crtc,
 				struct drm_crtc_state *new_crtc_state)
@@ -10571,7 +10659,7 @@ static int dm_check_crtc_cursor(struct drm_atomic_state *state,
 	if (i)
 		return i;
 
-	for_each_new_plane_in_state_reverse(state, underlying, new_underlying_state, i) {
+	for_each_oldnew_plane_in_decending_zpos(state, underlying, old_plane_state, new_underlying_state) {
 		/* Narrow down to non-cursor planes on the same CRTC as the cursor */
 		if (new_underlying_state->crtc != crtc || underlying == crtc->cursor)
 			continue;
@@ -10936,7 +11024,7 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
 	}
 
 	/* Remove exiting planes if they are modified */
-	for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, new_plane_state, i) {
+	for_each_oldnew_plane_in_decending_zpos(state, plane, old_plane_state, new_plane_state) {
 		if (old_plane_state->fb && new_plane_state->fb &&
 		    get_mem_type(old_plane_state->fb) !=
 		    get_mem_type(new_plane_state->fb))
@@ -10981,7 +11069,7 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
 	}
 
 	/* Add new/modified planes */
-	for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, new_plane_state, i) {
+	for_each_oldnew_plane_in_decending_zpos(state, plane, old_plane_state, new_plane_state) {
 		ret = dm_update_plane_state(dc, state, plane,
 					    old_plane_state,
 					    new_plane_state,
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index ed1fc01f1649..787c0dcdd1ea 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -104,8 +104,6 @@ void amdgpu_dm_plane_fill_blending_from_plane_state(const struct drm_plane_state
 	*global_alpha = false;
 	*global_alpha_value = 0xff;
 
-	if (plane_state->plane->type == DRM_PLANE_TYPE_PRIMARY)
-		return;
 
 	if (plane_state->pixel_blend_mode == DRM_MODE_BLEND_PREMULTI ||
 		plane_state->pixel_blend_mode == DRM_MODE_BLEND_COVERAGE) {
@@ -1686,6 +1684,7 @@ int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm,
 	int res = -EPERM;
 	unsigned int supported_rotations;
 	uint64_t *modifiers = NULL;
+	unsigned int primary_zpos = dm->dc->caps.max_slave_planes;
 
 	num_formats = amdgpu_dm_plane_get_plane_formats(plane, plane_cap, formats,
 							ARRAY_SIZE(formats));
@@ -1715,10 +1714,18 @@ int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm,
 	}
 
 	if (plane->type == DRM_PLANE_TYPE_PRIMARY) {
-		drm_plane_create_zpos_immutable_property(plane, 0);
+		/*
+		 * Allow OVERLAY planes to be used as underlays by assigning an
+		 * immutable zpos = # of OVERLAY planes to the PRIMARY plane.
+		 */
+		drm_plane_create_zpos_immutable_property(plane, primary_zpos);
 	} else if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
-		unsigned int zpos = 1 + drm_plane_index(plane);
-		drm_plane_create_zpos_property(plane, zpos, 1, 254);
+		/*
+		 * OVERLAY planes can be below or above the PRIMARY, but cannot
+		 * be above the CURSOR plane.
+		 */
+		unsigned int zpos = primary_zpos + 1 + drm_plane_index(plane);
+		drm_plane_create_zpos_property(plane, zpos, 0, 254);
 	} else if (plane->type == DRM_PLANE_TYPE_CURSOR) {
 		drm_plane_create_zpos_immutable_property(plane, 255);
 	}
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode
  2024-03-15 17:09 ` [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode sunpeng.li
@ 2024-03-16  8:38   ` kernel test robot
  2024-03-21 21:39   ` Harry Wentland
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2024-03-16  8:38 UTC (permalink / raw)
  To: sunpeng.li, dri-devel, amd-gfx
  Cc: oe-kbuild-all, Joshua Ashton, Michel Dänzer, Chao Guo,
	Xaver Hugl, Vikas Korjani, Robert Mader, Pekka Paalanen,
	Sean Paul, Simon Ser, Shashank Sharma, Harry Wentland,
	Sebastian Wick, Leo Li

Hi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on drm/drm-next drm-exynos/exynos-drm-next drm-intel/for-linux-next drm-intel/for-linux-next-fixes drm-tip/drm-tip linus/master v6.8 next-20240315]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/sunpeng-li-amd-com/drm-amd-display-Introduce-overlay-cursor-mode/20240316-011404
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20240315170959.165505-2-sunpeng.li%40amd.com
patch subject: [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode
config: loongarch-defconfig (https://download.01.org/0day-ci/archive/20240316/202403161600.6KspdesJ-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240316/202403161600.6KspdesJ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202403161600.6KspdesJ-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:10639: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Set whether the cursor should be enabled in native mode, or overlay mode, on


vim +10639 drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c

 10637	
 10638	/**
 10639	 * Set whether the cursor should be enabled in native mode, or overlay mode, on
 10640	 * the dm_crtc_state.
 10641	 *
 10642	 * The cursor should be enabled in overlay mode if the immediate underlying
 10643	 * plane contains a video format.
 10644	 *
 10645	 * Since zpos info is required, drm_atomic_normalize_zpos must be called before
 10646	 * calling this function.
 10647	*/
 10648	static int dm_crtc_set_cursor_mode(struct drm_atomic_state *state,
 10649					    struct dm_crtc_state *dm_crtc_state)
 10650	{
 10651		struct drm_plane_state *plane_state, *old_plane_state, *target_plane_state;
 10652		struct drm_crtc_state *crtc_state = &dm_crtc_state->base;
 10653		struct drm_plane *plane;
 10654		bool consider_mode_change = false;
 10655		bool cursor_changed = false;
 10656		unsigned int target_zpos;
 10657		unsigned int cursor_zpos;
 10658		int i;
 10659	
 10660		/*
 10661		 * Cursor mode can change if a plane's format changes, is
 10662		 * enabled/disabled, or z-order changes.
 10663		 */
 10664		for_each_oldnew_plane_in_state(state, plane, old_plane_state, plane_state, i) {
 10665	
 10666			/* Only care about planes on this CRTC */
 10667			if ((drm_plane_mask(plane) & crtc_state->plane_mask) == 0)
 10668				continue;
 10669	
 10670			if (plane->type == DRM_PLANE_TYPE_CURSOR)
 10671				cursor_changed = true;
 10672	
 10673			if (drm_atomic_plane_enabling(old_plane_state, plane_state) ||
 10674			    drm_atomic_plane_disabling(old_plane_state, plane_state) ||
 10675			    old_plane_state->fb->format != plane_state->fb->format) {
 10676				consider_mode_change = true;
 10677				break;
 10678			}
 10679		}
 10680	
 10681		if (!consider_mode_change && !crtc_state->zpos_changed) {
 10682			return 0;
 10683		}
 10684	
 10685		/*
 10686		 * If no cursor change on this CRTC, and not enabled on this CRTC, then
 10687		 * no need to set cursor mode. This avoids needlessly locking the cursor
 10688		 * state.
 10689		 */
 10690		if (!cursor_changed &&
 10691		    !(drm_plane_mask(crtc_state->crtc->cursor) & crtc_state->plane_mask)) {
 10692			return 0;
 10693		}
 10694	
 10695		plane_state = drm_atomic_get_plane_state(state,
 10696							 crtc_state->crtc->cursor);
 10697		if (IS_ERR(plane_state))
 10698			return PTR_ERR(plane_state);
 10699	
 10700		/* Cursor is disabled */
 10701		if (!plane_state->fb)
 10702			return 0;
 10703	
 10704		cursor_zpos = plane_state->normalized_zpos;
 10705	
 10706		/* Get enabled plane immediately below cursor. */
 10707		target_plane_state = NULL;
 10708		target_zpos = 0;
 10709		drm_for_each_plane_mask(plane, state->dev, crtc_state->plane_mask) {
 10710			if (plane->type == DRM_PLANE_TYPE_CURSOR)
 10711				continue;
 10712	
 10713			plane_state = drm_atomic_get_plane_state(state, plane);
 10714			if (IS_ERR(plane_state))
 10715				return PTR_ERR(plane_state);
 10716	
 10717			if (!plane_state->fb ||
 10718			    plane_state->normalized_zpos >= cursor_zpos)
 10719				continue;
 10720	
 10721			if (plane_state->normalized_zpos >= target_zpos) {
 10722				target_zpos = plane_state->normalized_zpos;
 10723				target_plane_state = plane_state;
 10724			}
 10725		}
 10726	
 10727		/* Nothing below cursor - use overlay mode */
 10728		if (target_plane_state == NULL) {
 10729			dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
 10730			return 0;
 10731		}
 10732	
 10733		if (amdgpu_dm_plane_is_video_format(target_plane_state->fb->format->format)) {
 10734			dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
 10735		} else {
 10736			dm_crtc_state->cursor_mode = DM_CURSOR_NATIVE_MODE;
 10737		}
 10738	
 10739		return 0;
 10740	}
 10741	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] drm/amd/display: Move PRIMARY plane zpos higher
  2024-03-15 17:09 ` [PATCH 2/2] drm/amd/display: Move PRIMARY plane zpos higher sunpeng.li
@ 2024-03-21 21:36   ` Harry Wentland
  2024-03-28 15:20   ` Pekka Paalanen
  1 sibling, 0 replies; 27+ messages in thread
From: Harry Wentland @ 2024-03-21 21:36 UTC (permalink / raw)
  To: sunpeng.li, dri-devel, amd-gfx
  Cc: Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Robert Mader, Pekka Paalanen, Sean Paul,
	Simon Ser, Shashank Sharma, Sebastian Wick



On 2024-03-15 13:09, sunpeng.li@amd.com wrote:
> From: Leo Li <sunpeng.li@amd.com>
> 
> [Why]
> 
> Compositors have different ways of assigning surfaces to DRM planes for
> render offloading. It may decide between various strategies: overlay,
> underlay, or a mix of both
> 
> One way for compositors to implement the underlay strategy is to assign
> a higher zpos to the DRM_PRIMARY plane than the DRM_OVERLAY planes,
> effectively turning the DRM_OVERLAY plane into an underlay plane.
> 
> Today, amdgpu attaches an immutable zpos of 0 to the DRM_PRIMARY plane.
> This however, is an arbitrary restriction. DCN pipes are general
> purpose, and can be arranged in any z-order. To support compositors
> using this allocation scheme, we can set a non-zero immutable zpos for
> the PRIMARY, allowing the placement of OVERLAYS (mutable zpos range
> 0-254) beneath the PRIMARY.
> 
> [How]
> 
> Assign a zpos = #no of OVERLAY planes to the PRIMARY plane. Then, clean
> up any assumptions in the driver of PRIMARY plane having the lowest
> zpos.
> 
> Signed-off-by: Leo Li <sunpeng.li@amd.com>

With the typo mentioned below fixes this is
Reviewed-by: Harry Wentland <harry.wentland@amd.com>

Before merging we should run a full promotion test (especially the IGT
tests) on it as this could break things in subtle ways.

Harry

> ---
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 96 ++++++++++++++++++-
>  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 17 +++-
>  2 files changed, 104 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 09ab330aed17..01b00f587701 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -80,6 +80,7 @@
>  #include <linux/firmware.h>
>  #include <linux/component.h>
>  #include <linux/dmi.h>
> +#include <linux/sort.h>
>  
>  #include <drm/display/drm_dp_mst_helper.h>
>  #include <drm/display/drm_hdmi_helper.h>
> @@ -369,6 +370,20 @@ static inline void reverse_planes_order(struct dc_surface_update *array_of_surfa
>  		swap(array_of_surface_update[i], array_of_surface_update[j]);
>  }
>  
> +/*
> + * DC will program planes with their z-order determined by their ordering
> + * in the dc_surface_updates array. This comparator is used to sort them
> + * by descending zpos.
> + */
> +static int dm_plane_layer_index_cmp(const void *a, const void *b)
> +{
> +	const struct dc_surface_update *sa = (struct dc_surface_update *)a;
> +	const struct dc_surface_update *sb = (struct dc_surface_update *)b;
> +
> +	/* Sort by descending dc_plane layer_index (i.e. normalized_zpos) */
> +	return sb->surface->layer_index - sa->surface->layer_index;
> +}
> +
>  /**
>   * update_planes_and_stream_adapter() - Send planes to be updated in DC
>   *
> @@ -393,7 +408,8 @@ static inline bool update_planes_and_stream_adapter(struct dc *dc,
>  						    struct dc_stream_update *stream_update,
>  						    struct dc_surface_update *array_of_surface_update)
>  {
> -	reverse_planes_order(array_of_surface_update, planes_count);
> +	sort(array_of_surface_update, planes_count,
> +	     sizeof(*array_of_surface_update), dm_plane_layer_index_cmp, NULL);
>  
>  	/*
>  	 * Previous frame finished and HW is ready for optimization.
> @@ -9363,6 +9379,8 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
>  		for (j = 0; j < status->plane_count; j++)
>  			dummy_updates[j].surface = status->plane_states[0];
>  
> +		sort(dummy_updates, status->plane_count,
> +		     sizeof(*dummy_updates), dm_plane_layer_index_cmp, NULL);
>  
>  		mutex_lock(&dm->dc_lock);
>  		dc_update_planes_and_stream(dm->dc,
> @@ -10097,6 +10115,17 @@ static bool should_reset_plane(struct drm_atomic_state *state,
>  	if (new_crtc_state->color_mgmt_changed)
>  		return true;
>  
> +	/*
> +	 * On zpos change, planes need to be reordered by removing and re-adding
> +	 * them one by one to the dc state, in order of descending zpos.
> +	 *
> +	 * TODO: We can likely skip bandwidth validation if the only thing that
> +	 * changed about the plane was it'z z-ordering.
> +	 */
> +	if (new_crtc_state->zpos_changed) {
> +		return true;
> +	}
> +
>  	if (drm_atomic_crtc_needs_modeset(new_crtc_state))
>  		return true;
>  
> @@ -10509,6 +10538,65 @@ dm_get_plane_scale(struct drm_plane_state *plane_state,
>  	*out_plane_scale_h = plane_state->crtc_h * 1000 / plane_src_h;
>  }
>  
> +/*
> + * The normalized_zpos value cannot be used by this iterator directly. It's only
> + * calculated for enabled planes, potentially causing normalized_zpos collisions
> + * between enabled/disabled planes in the atomic state. We need a unique value
> + * so that the iterator will not generate the same object twice, or loop
> + * indefinitely.
> + */
> +static inline struct __drm_planes_state *__get_next_zpos(
> +	struct drm_atomic_state *state,
> +	struct __drm_planes_state *prev)
> +{
> +	unsigned int highest_zpos = 0, prev_zpos = 256;
> +	uint32_t highest_id = 0, prev_id = UINT_MAX;
> +	struct drm_plane_state *new_plane_state;
> +	struct drm_plane *plane;
> +	int i, highest_i = -1;
> +
> +	if (prev != NULL) {
> +		prev_zpos = prev->new_state->zpos;
> +		prev_id = prev->ptr->base.id;
> +	}
> +
> +	for_each_new_plane_in_state(state, plane, new_plane_state, i) {
> +		/* Skip planes with higher zpos than the previously returned */
> +		if (new_plane_state->zpos > prev_zpos ||
> +		    (new_plane_state->zpos == prev_zpos &&
> +		     plane->base.id >= prev_id))
> +			continue;
> +
> +		/* Save the index of the plane with highest zpos */
> +		if (new_plane_state->zpos > highest_zpos ||
> +		    (new_plane_state->zpos == highest_zpos &&
> +		     plane->base.id > highest_id)) {
> +			highest_zpos = new_plane_state->zpos;
> +			highest_id = plane->base.id;
> +			highest_i = i;
> +		}
> +	}
> +
> +	if (highest_i < 0)
> +		return NULL;
> +
> +	return &state->planes[highest_i];
> +}
> +
> +/*
> + * Use the uniqueness of the plane's (zpos, drm obj ID) combination to iterate
> + * by descending zpos, as read from the new plane state. This is the same
> + * ordering as defined by drm_atomic_normalize_zpos().
> + */
> +#define for_each_oldnew_plane_in_decending_zpos(__state, plane, old_plane_state, new_plane_state) \

"decending" > "descending"

Harry

> +	for (struct __drm_planes_state *__i = __get_next_zpos((__state), NULL); \
> +	     __i != NULL; __i = __get_next_zpos((__state), __i))		\
> +		for_each_if (((plane) = __i->ptr,			\
> +			      (void)(plane) /* Only to avoid unused-but-set-variable warning */, \
> +			      (old_plane_state) = __i->old_state,	\
> +			      (new_plane_state) = __i->new_state, 1))
> +
> +
>  static int dm_check_crtc_cursor(struct drm_atomic_state *state,
>  				struct drm_crtc *crtc,
>  				struct drm_crtc_state *new_crtc_state)
> @@ -10571,7 +10659,7 @@ static int dm_check_crtc_cursor(struct drm_atomic_state *state,
>  	if (i)
>  		return i;
>  
> -	for_each_new_plane_in_state_reverse(state, underlying, new_underlying_state, i) {
> +	for_each_oldnew_plane_in_decending_zpos(state, underlying, old_plane_state, new_underlying_state) {
>  		/* Narrow down to non-cursor planes on the same CRTC as the cursor */
>  		if (new_underlying_state->crtc != crtc || underlying == crtc->cursor)
>  			continue;
> @@ -10936,7 +11024,7 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
>  	}
>  
>  	/* Remove exiting planes if they are modified */
> -	for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, new_plane_state, i) {
> +	for_each_oldnew_plane_in_decending_zpos(state, plane, old_plane_state, new_plane_state) {
>  		if (old_plane_state->fb && new_plane_state->fb &&
>  		    get_mem_type(old_plane_state->fb) !=
>  		    get_mem_type(new_plane_state->fb))
> @@ -10981,7 +11069,7 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
>  	}
>  
>  	/* Add new/modified planes */
> -	for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, new_plane_state, i) {
> +	for_each_oldnew_plane_in_decending_zpos(state, plane, old_plane_state, new_plane_state) {
>  		ret = dm_update_plane_state(dc, state, plane,
>  					    old_plane_state,
>  					    new_plane_state,
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> index ed1fc01f1649..787c0dcdd1ea 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> @@ -104,8 +104,6 @@ void amdgpu_dm_plane_fill_blending_from_plane_state(const struct drm_plane_state
>  	*global_alpha = false;
>  	*global_alpha_value = 0xff;
>  
> -	if (plane_state->plane->type == DRM_PLANE_TYPE_PRIMARY)
> -		return;
>  
>  	if (plane_state->pixel_blend_mode == DRM_MODE_BLEND_PREMULTI ||
>  		plane_state->pixel_blend_mode == DRM_MODE_BLEND_COVERAGE) {
> @@ -1686,6 +1684,7 @@ int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm,
>  	int res = -EPERM;
>  	unsigned int supported_rotations;
>  	uint64_t *modifiers = NULL;
> +	unsigned int primary_zpos = dm->dc->caps.max_slave_planes;
>  
>  	num_formats = amdgpu_dm_plane_get_plane_formats(plane, plane_cap, formats,
>  							ARRAY_SIZE(formats));
> @@ -1715,10 +1714,18 @@ int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm,
>  	}
>  
>  	if (plane->type == DRM_PLANE_TYPE_PRIMARY) {
> -		drm_plane_create_zpos_immutable_property(plane, 0);
> +		/*
> +		 * Allow OVERLAY planes to be used as underlays by assigning an
> +		 * immutable zpos = # of OVERLAY planes to the PRIMARY plane.
> +		 */
> +		drm_plane_create_zpos_immutable_property(plane, primary_zpos);
>  	} else if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
> -		unsigned int zpos = 1 + drm_plane_index(plane);
> -		drm_plane_create_zpos_property(plane, zpos, 1, 254);
> +		/*
> +		 * OVERLAY planes can be below or above the PRIMARY, but cannot
> +		 * be above the CURSOR plane.
> +		 */
> +		unsigned int zpos = primary_zpos + 1 + drm_plane_index(plane);
> +		drm_plane_create_zpos_property(plane, zpos, 0, 254);
>  	} else if (plane->type == DRM_PLANE_TYPE_CURSOR) {
>  		drm_plane_create_zpos_immutable_property(plane, 255);
>  	}


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode
  2024-03-15 17:09 ` [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode sunpeng.li
  2024-03-16  8:38   ` kernel test robot
@ 2024-03-21 21:39   ` Harry Wentland
  2024-03-28 15:16   ` Pekka Paalanen
  2024-03-28 15:48   ` Robert Mader
  3 siblings, 0 replies; 27+ messages in thread
From: Harry Wentland @ 2024-03-21 21:39 UTC (permalink / raw)
  To: sunpeng.li, dri-devel, amd-gfx
  Cc: Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Robert Mader, Pekka Paalanen, Sean Paul,
	Simon Ser, Shashank Sharma, Sebastian Wick



On 2024-03-15 13:09, sunpeng.li@amd.com wrote:
> From: Leo Li <sunpeng.li@amd.com>
> 
> [Why]
> 
> DCN is the display hardware for amdgpu. DRM planes are backed by DCN
> hardware pipes, which carry pixel data from one end (memory), to the
> other (output encoder).
> 
> Each DCN pipe has the ability to blend in a cursor early on in the
> pipeline. In other words, there are no dedicated cursor planes in DCN,
> which makes cursor behavior somewhat unintuitive for compositors.
> 
> For example, if the cursor is in RGB format, but the top-most DRM plane
> is in YUV format, DCN will not be able to blend them. Because of this,
> amdgpu_dm rejects all configurations where a cursor needs to be enabled
> on top of a YUV formatted plane.
> 
> From a compositor's perspective, when computing an allocation for
> hardware plane offloading, this cursor-on-yuv configuration result in an
> atomic test failure. Since the failure reason is not obvious at all,
> compositors will likely fall back to full rendering, which is not ideal.
> 
> Instead, amdgpu_dm can try to accommodate the cursor-on-yuv
> configuration by opportunistically reserving a separate DCN pipe just
> for the cursor. We can refer to this as "overlay cursor mode". It is
> contrasted with "native cursor mode", where the native DCN per-pipe
> cursor is used.
> 
> [How]
> 
> On each crtc, compute whether the cursor plane should be enabled in
> overlay mode (which currently, is iff the immediate plane below has a
> YUV format). If it is, mark the CRTC as requesting overlay cursor mode.
> 
> During DC validation, attempt to enable a separate DCN pipe for the
> cursor if it's in overlay mode. If that fails, or if no overlay mode is
> requested, then fallback to native mode.
> 
> Signed-off-by: Leo Li <sunpeng.li@amd.com>

We should run this through our usual testing cycle, and I need to go over
it a bit more closely than I have, but I like it, so it's
Acked-by: Harry Wentland <harry.wentland@amd.com>

Harry

> ---
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 309 +++++++++++++++---
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
>  .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
>  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  13 +-
>  4 files changed, 288 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 21a61454c878..09ab330aed17 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -8359,8 +8359,19 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>  	 * Disable the cursor first if we're disabling all the planes.
>  	 * It'll remain on the screen after the planes are re-enabled
>  	 * if we don't.
> +	 *
> +	 * If the cursor is transitioning from native to overlay mode, the
> +	 * native cursor needs to be disabled first.
>  	 */
> -	if (acrtc_state->active_planes == 0)
> +	if (acrtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE &&
> +	    dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
> +		struct dc_cursor_position cursor_position = {0};
> +		dc_stream_set_cursor_position(acrtc_state->stream,
> +					      &cursor_position);
> +	}
> +
> +	if (acrtc_state->active_planes == 0 &&
> +	    dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE)
>  		amdgpu_dm_commit_cursors(state);
>  
>  	/* update planes when needed */
> @@ -8374,7 +8385,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>  		struct dm_plane_state *dm_new_plane_state = to_dm_plane_state(new_plane_state);
>  
>  		/* Cursor plane is handled after stream updates */
> -		if (plane->type == DRM_PLANE_TYPE_CURSOR) {
> +		if (plane->type == DRM_PLANE_TYPE_CURSOR &&
> +		    acrtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
>  			if ((fb && crtc == pcrtc) ||
>  			    (old_plane_state->fb && old_plane_state->crtc == pcrtc))
>  				cursor_update = true;
> @@ -8727,7 +8739,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>  	 * This avoids redundant programming in the case where we're going
>  	 * to be disabling a single plane - those pipes are being disabled.
>  	 */
> -	if (acrtc_state->active_planes)
> +	if (acrtc_state->active_planes &&
> +	    acrtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE)
>  		amdgpu_dm_commit_cursors(state);
>  
>  cleanup:
> @@ -10039,7 +10052,8 @@ static bool should_reset_plane(struct drm_atomic_state *state,
>  {
>  	struct drm_plane *other;
>  	struct drm_plane_state *old_other_state, *new_other_state;
> -	struct drm_crtc_state *new_crtc_state;
> +	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
> +	struct dm_crtc_state *old_dm_crtc_state, *new_dm_crtc_state;
>  	struct amdgpu_device *adev = drm_to_adev(plane->dev);
>  	int i;
>  
> @@ -10061,10 +10075,24 @@ static bool should_reset_plane(struct drm_atomic_state *state,
>  
>  	new_crtc_state =
>  		drm_atomic_get_new_crtc_state(state, new_plane_state->crtc);
> +	old_crtc_state =
> +		drm_atomic_get_old_crtc_state(state, old_plane_state->crtc);
>  
>  	if (!new_crtc_state)
>  		return true;
>  
> +	/*
> +	 * A change in cursor mode means a new dc pipe needs to be acquired or
> +	 * released from the state
> +	 */
> +	old_dm_crtc_state = to_dm_crtc_state(old_crtc_state);
> +	new_dm_crtc_state = to_dm_crtc_state(new_crtc_state);
> +	if (plane->type == DRM_PLANE_TYPE_CURSOR &&
> +	    old_dm_crtc_state != NULL &&
> +	    old_dm_crtc_state->cursor_mode != new_dm_crtc_state->cursor_mode) {
> +		return true;
> +	}
> +
>  	/* CRTC Degamma changes currently require us to recreate planes. */
>  	if (new_crtc_state->color_mgmt_changed)
>  		return true;
> @@ -10216,6 +10244,68 @@ static int dm_check_cursor_fb(struct amdgpu_crtc *new_acrtc,
>  	return 0;
>  }
>  
> +/*
> + * Helper function for checking the cursor in native mode
> + */
> +static int dm_check_native_cursor_state(struct drm_crtc *new_plane_crtc,
> +					struct drm_plane *plane,
> +					struct drm_plane_state *new_plane_state,
> +					bool enable)
> +{
> +
> +	struct amdgpu_crtc *new_acrtc;
> +	int ret;
> +
> +	if (!enable || !new_plane_crtc ||
> +	    drm_atomic_plane_disabling(plane->state, new_plane_state))
> +		return 0;
> +
> +	new_acrtc = to_amdgpu_crtc(new_plane_crtc);
> +
> +	if (new_plane_state->src_x != 0 || new_plane_state->src_y != 0) {
> +		DRM_DEBUG_ATOMIC("Cropping not supported for cursor plane\n");
> +		return -EINVAL;
> +	}
> +
> +	if (new_plane_state->fb) {
> +		ret = dm_check_cursor_fb(new_acrtc, new_plane_state,
> +						new_plane_state->fb);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static bool dm_should_update_native_cursor(struct drm_atomic_state *state,
> +					   struct drm_crtc *old_plane_crtc,
> +					   struct drm_crtc *new_plane_crtc,
> +					   bool enable)
> +{
> +	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
> +	struct dm_crtc_state *dm_old_crtc_state, *dm_new_crtc_state;
> +
> +	if (!enable) {
> +		if (old_plane_crtc == NULL)
> +			return true;
> +
> +		old_crtc_state = drm_atomic_get_old_crtc_state(
> +			state, old_plane_crtc);
> +		dm_old_crtc_state = to_dm_crtc_state(old_crtc_state);
> +
> +		return dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE;
> +	} else {
> +		if (new_plane_crtc == NULL)
> +			return true;
> +
> +		new_crtc_state = drm_atomic_get_new_crtc_state(
> +			state, new_plane_crtc);
> +		dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
> +
> +		return dm_new_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE;
> +	}
> +}
> +
>  static int dm_update_plane_state(struct dc *dc,
>  				 struct drm_atomic_state *state,
>  				 struct drm_plane *plane,
> @@ -10231,8 +10321,7 @@ static int dm_update_plane_state(struct dc *dc,
>  	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
>  	struct dm_crtc_state *dm_new_crtc_state, *dm_old_crtc_state;
>  	struct dm_plane_state *dm_new_plane_state, *dm_old_plane_state;
> -	struct amdgpu_crtc *new_acrtc;
> -	bool needs_reset;
> +	bool needs_reset, update_native_cursor;
>  	int ret = 0;
>  
>  
> @@ -10241,24 +10330,16 @@ static int dm_update_plane_state(struct dc *dc,
>  	dm_new_plane_state = to_dm_plane_state(new_plane_state);
>  	dm_old_plane_state = to_dm_plane_state(old_plane_state);
>  
> -	if (plane->type == DRM_PLANE_TYPE_CURSOR) {
> -		if (!enable || !new_plane_crtc ||
> -			drm_atomic_plane_disabling(plane->state, new_plane_state))
> -			return 0;
> -
> -		new_acrtc = to_amdgpu_crtc(new_plane_crtc);
> -
> -		if (new_plane_state->src_x != 0 || new_plane_state->src_y != 0) {
> -			DRM_DEBUG_ATOMIC("Cropping not supported for cursor plane\n");
> -			return -EINVAL;
> -		}
> +	update_native_cursor = dm_should_update_native_cursor(state,
> +							      old_plane_crtc,
> +							      new_plane_crtc,
> +							      enable);
>  
> -		if (new_plane_state->fb) {
> -			ret = dm_check_cursor_fb(new_acrtc, new_plane_state,
> -						 new_plane_state->fb);
> -			if (ret)
> -				return ret;
> -		}
> +	if (plane->type == DRM_PLANE_TYPE_CURSOR && update_native_cursor) {
> +		ret = dm_check_native_cursor_state(new_plane_crtc, plane,
> +					            new_plane_state, enable);
> +		if (ret)
> +			return ret;
>  
>  		return 0;
>  	}
> @@ -10285,16 +10366,17 @@ static int dm_update_plane_state(struct dc *dc,
>  				plane->base.id, old_plane_crtc->base.id);
>  
>  		ret = dm_atomic_get_state(state, &dm_state);
> -		if (ret)
> -			return ret;
> +		if (ret) {
> +			goto out;
> +		}
>  
>  		if (!dc_state_remove_plane(
>  				dc,
>  				dm_old_crtc_state->stream,
>  				dm_old_plane_state->dc_state,
>  				dm_state->context)) {
> -
> -			return -EINVAL;
> +			ret = -EINVAL;
> +			goto out;
>  		}
>  
>  		if (dm_old_plane_state->dc_state)
> @@ -10323,21 +10405,16 @@ static int dm_update_plane_state(struct dc *dc,
>  			return 0;
>  
>  		ret = amdgpu_dm_plane_helper_check_state(new_plane_state, new_crtc_state);
> -		if (ret)
> -			return ret;
> +		if (ret) {
> +			goto out;
> +		}
>  
>  		WARN_ON(dm_new_plane_state->dc_state);
>  
>  		dc_new_plane_state = dc_create_plane_state(dc);
> -		if (!dc_new_plane_state)
> -			return -ENOMEM;
> -
> -		/* Block top most plane from being a video plane */
> -		if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
> -			if (amdgpu_dm_plane_is_video_format(new_plane_state->fb->format->format) && *is_top_most_overlay)
> -				return -EINVAL;
> -
> -			*is_top_most_overlay = false;
> +		if (!dc_new_plane_state) {
> +			ret = -ENOMEM;
> +			goto out;
>  		}
>  
>  		DRM_DEBUG_ATOMIC("Enabling DRM plane: %d on DRM crtc %d\n",
> @@ -10350,13 +10427,13 @@ static int dm_update_plane_state(struct dc *dc,
>  			new_crtc_state);
>  		if (ret) {
>  			dc_plane_state_release(dc_new_plane_state);
> -			return ret;
> +			goto out;
>  		}
>  
>  		ret = dm_atomic_get_state(state, &dm_state);
>  		if (ret) {
>  			dc_plane_state_release(dc_new_plane_state);
> -			return ret;
> +			goto out;
>  		}
>  
>  		/*
> @@ -10373,7 +10450,8 @@ static int dm_update_plane_state(struct dc *dc,
>  				dm_state->context)) {
>  
>  			dc_plane_state_release(dc_new_plane_state);
> -			return -EINVAL;
> +			ret = -EINVAL;
> +			goto out;
>  		}
>  
>  		dm_new_plane_state->dc_state = dc_new_plane_state;
> @@ -10388,6 +10466,16 @@ static int dm_update_plane_state(struct dc *dc,
>  		*lock_and_validation_needed = true;
>  	}
>  
> +out:
> +	/* If cursor overlay failed, attempt fallback to native mode */
> +	if (ret == -EINVAL && plane->type == DRM_PLANE_TYPE_CURSOR) {
> +		ret = dm_check_native_cursor_state(new_plane_crtc, plane,
> +						    new_plane_state, enable);
> +		if (ret) {
> +			return ret;
> +		}
> +		dm_new_crtc_state->cursor_mode = DM_CURSOR_NATIVE_MODE;
> +	}
>  
>  	return ret;
>  }
> @@ -10544,6 +10632,126 @@ static int add_affected_mst_dsc_crtcs(struct drm_atomic_state *state, struct drm
>  	return drm_dp_mst_add_affected_dsc_crtcs(state, &aconnector->mst_root->mst_mgr);
>  }
>  
> +/**
> + * DOC: Cursor Modes - Native vs Overlay
> + *
> + * In native mode, the cursor uses a integrated cursor pipe within each DCN hw
> + * plane. It does not require a dedicated hw plane to enable, but it is
> + * subjected to the same z-order and scaling as the hw plane. It also has format
> + * restrictions, a RGB cursor in native mode cannot be enabled within a non-RGB
> + * hw plane.
> + *
> + * In overlay mode, the cursor uses a separate DCN hw plane, and thus has its
> + * own scaling and z-pos. It also has no blending restrictions. It lends to a
> + * cursor behavior more akin to a DRM client's expectations. However, it does
> + * occupy an extra DCN plane, and therefore will only be used if a DCN plane is
> + * available.
> +*/
> +
> +/**
> + * Set whether the cursor should be enabled in native mode, or overlay mode, on
> + * the dm_crtc_state.
> + *
> + * The cursor should be enabled in overlay mode if the immediate underlying
> + * plane contains a video format.
> + *
> + * Since zpos info is required, drm_atomic_normalize_zpos must be called before
> + * calling this function.
> +*/
> +static int dm_crtc_set_cursor_mode(struct drm_atomic_state *state,
> +				    struct dm_crtc_state *dm_crtc_state)
> +{
> +	struct drm_plane_state *plane_state, *old_plane_state, *target_plane_state;
> +	struct drm_crtc_state *crtc_state = &dm_crtc_state->base;
> +	struct drm_plane *plane;
> +	bool consider_mode_change = false;
> +	bool cursor_changed = false;
> +	unsigned int target_zpos;
> +	unsigned int cursor_zpos;
> +	int i;
> +
> +	/*
> +	 * Cursor mode can change if a plane's format changes, is
> +	 * enabled/disabled, or z-order changes.
> +	 */
> +	for_each_oldnew_plane_in_state(state, plane, old_plane_state, plane_state, i) {
> +
> +		/* Only care about planes on this CRTC */
> +		if ((drm_plane_mask(plane) & crtc_state->plane_mask) == 0)
> +			continue;
> +
> +		if (plane->type == DRM_PLANE_TYPE_CURSOR)
> +			cursor_changed = true;
> +
> +		if (drm_atomic_plane_enabling(old_plane_state, plane_state) ||
> +		    drm_atomic_plane_disabling(old_plane_state, plane_state) ||
> +		    old_plane_state->fb->format != plane_state->fb->format) {
> +			consider_mode_change = true;
> +			break;
> +		}
> +	}
> +
> +	if (!consider_mode_change && !crtc_state->zpos_changed) {
> +		return 0;
> +	}
> +
> +	/*
> +	 * If no cursor change on this CRTC, and not enabled on this CRTC, then
> +	 * no need to set cursor mode. This avoids needlessly locking the cursor
> +	 * state.
> +	 */
> +	if (!cursor_changed &&
> +	    !(drm_plane_mask(crtc_state->crtc->cursor) & crtc_state->plane_mask)) {
> +		return 0;
> +	}
> +
> +	plane_state = drm_atomic_get_plane_state(state,
> +						 crtc_state->crtc->cursor);
> +	if (IS_ERR(plane_state))
> +		return PTR_ERR(plane_state);
> +
> +	/* Cursor is disabled */
> +	if (!plane_state->fb)
> +		return 0;
> +
> +	cursor_zpos = plane_state->normalized_zpos;
> +
> +	/* Get enabled plane immediately below cursor. */
> +	target_plane_state = NULL;
> +	target_zpos = 0;
> +	drm_for_each_plane_mask(plane, state->dev, crtc_state->plane_mask) {
> +		if (plane->type == DRM_PLANE_TYPE_CURSOR)
> +			continue;
> +
> +		plane_state = drm_atomic_get_plane_state(state, plane);
> +		if (IS_ERR(plane_state))
> +			return PTR_ERR(plane_state);
> +
> +		if (!plane_state->fb ||
> +		    plane_state->normalized_zpos >= cursor_zpos)
> +			continue;
> +
> +		if (plane_state->normalized_zpos >= target_zpos) {
> +			target_zpos = plane_state->normalized_zpos;
> +			target_plane_state = plane_state;
> +		}
> +	}
> +
> +	/* Nothing below cursor - use overlay mode */
> +	if (target_plane_state == NULL) {
> +		dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
> +		return 0;
> +	}
> +
> +	if (amdgpu_dm_plane_is_video_format(target_plane_state->fb->format->format)) {
> +		dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
> +	} else {
> +		dm_crtc_state->cursor_mode = DM_CURSOR_NATIVE_MODE;
> +	}
> +
> +	return 0;
> +}
> +
>  /**
>   * amdgpu_dm_atomic_check() - Atomic check implementation for AMDgpu DM.
>   *
> @@ -10713,6 +10921,20 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
>  		goto fail;
>  	}
>  
> +	/*
> +	 * Determine whether cursors on each CRTC should be enabled in native or
> +	 * overlay mode.
> +	 */
> +	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
> +		dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
> +
> +		ret = dm_crtc_set_cursor_mode(state, dm_new_crtc_state);
> +		if (ret) {
> +			drm_dbg(dev, "Failed to determine cursor mode\n");
> +			goto fail;
> +		}
> +	}
> +
>  	/* Remove exiting planes if they are modified */
>  	for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, new_plane_state, i) {
>  		if (old_plane_state->fb && new_plane_state->fb &&
> @@ -10793,6 +11015,11 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
>  
>  	/* Check cursor planes scaling */
>  	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
> +		/* Overlay cusor does not need scaling check */
> +		dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
> +		if (dm_new_crtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE)
> +			continue;
> +
>  		ret = dm_check_crtc_cursor(state, crtc, new_crtc_state);
>  		if (ret) {
>  			DRM_DEBUG_DRIVER("dm_check_crtc_cursor() failed\n");
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
> index 09519b7abf67..b8d39fdd1e09 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
> @@ -822,6 +822,11 @@ struct dm_plane_state {
>  	enum amdgpu_transfer_function blend_tf;
>  };
>  
> +enum amdgpu_dm_cursor_mode {
> +	DM_CURSOR_NATIVE_MODE = 0,
> +	DM_CURSOR_OVERLAY_MODE,
> +};
> +
>  struct dm_crtc_state {
>  	struct drm_crtc_state base;
>  	struct dc_stream_state *stream;
> @@ -852,6 +857,8 @@ struct dm_crtc_state {
>  	 * encoding.
>  	 */
>  	enum amdgpu_transfer_function regamma_tf;
> +
> +	enum amdgpu_dm_cursor_mode cursor_mode;
>  };
>  
>  #define to_dm_crtc_state(x) container_of(x, struct dm_crtc_state, base)
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> index e23a0a276e33..67aea1d2feb9 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> @@ -304,6 +304,7 @@ static struct drm_crtc_state *amdgpu_dm_crtc_duplicate_state(struct drm_crtc *cr
>  	state->regamma_tf = cur->regamma_tf;
>  	state->crc_skip_count = cur->crc_skip_count;
>  	state->mpo_requested = cur->mpo_requested;
> +	state->cursor_mode = cur->cursor_mode;
>  	/* TODO Duplicate dc_stream after objects are stream object is flattened */
>  
>  	return &state->base;
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> index 8a4c40b4c27e..ed1fc01f1649 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> @@ -104,7 +104,7 @@ void amdgpu_dm_plane_fill_blending_from_plane_state(const struct drm_plane_state
>  	*global_alpha = false;
>  	*global_alpha_value = 0xff;
>  
> -	if (plane_state->plane->type != DRM_PLANE_TYPE_OVERLAY)
> +	if (plane_state->plane->type == DRM_PLANE_TYPE_PRIMARY)
>  		return;
>  
>  	if (plane_state->pixel_blend_mode == DRM_MODE_BLEND_PREMULTI ||
> @@ -1175,10 +1175,21 @@ static int amdgpu_dm_plane_atomic_check(struct drm_plane *plane,
>  static int amdgpu_dm_plane_atomic_async_check(struct drm_plane *plane,
>  					      struct drm_atomic_state *state)
>  {
> +	struct drm_crtc_state *new_crtc_state;
> +	struct drm_plane_state *new_plane_state;
> +	struct dm_crtc_state *dm_new_crtc_state;
> +
>  	/* Only support async updates on cursor planes. */
>  	if (plane->type != DRM_PLANE_TYPE_CURSOR)
>  		return -EINVAL;
>  
> +	new_plane_state = drm_atomic_get_new_plane_state(state, plane);
> +	new_crtc_state = drm_atomic_get_new_crtc_state(state, new_plane_state->crtc);
> +	dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
> +	/* Reject overlay cursors for now*/
> +	if (dm_new_crtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE)
> +		return -EINVAL;
> +
>  	return 0;
>  }
>  


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-03-15 17:09 [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible sunpeng.li
  2024-03-15 17:09 ` [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode sunpeng.li
  2024-03-15 17:09 ` [PATCH 2/2] drm/amd/display: Move PRIMARY plane zpos higher sunpeng.li
@ 2024-03-28 14:33 ` Pekka Paalanen
  2024-04-03 21:32   ` Leo Li
  2 siblings, 1 reply; 27+ messages in thread
From: Pekka Paalanen @ 2024-03-28 14:33 UTC (permalink / raw)
  To: sunpeng.li, Marius Vlad
  Cc: dri-devel, amd-gfx, Joshua Ashton, Michel Dänzer, Chao Guo,
	Xaver Hugl, Vikas Korjani, Robert Mader, Sean Paul, Simon Ser,
	Shashank Sharma, Harry Wentland, Sebastian Wick

[-- Attachment #1: Type: text/plain, Size: 4801 bytes --]

On Fri, 15 Mar 2024 13:09:56 -0400
<sunpeng.li@amd.com> wrote:

> From: Leo Li <sunpeng.li@amd.com>
> 
> These patches aim to make the amdgpgu KMS driver play nicer with compositors
> when building multi-plane scanout configurations. They do so by:
> 
> 1. Making cursor behavior more sensible.
> 2. Allowing placement of DRM OVERLAY planes underneath the PRIMARY plane for
>    'underlay' configurations (perhaps more of a RFC, see below).
> 
> Please see the commit messages for details.
> 
> 
> For #2, the simplest way to accomplish this was to increase the value of the
> immutable zpos property for the PRIMARY plane. This allowed OVERLAY planes with
> a mutable zpos range of (0-254) to be positioned underneath the PRIMARY for an
> underlay scanout configuration.
> 
> Technically speaking, DCN hardware does not have a concept of primary or overlay
> planes - there are simply 4 general purpose hardware pipes that can be maped in
> any configuration. So the immutable zpos restriction on the PRIMARY plane is
> kind of arbitrary; it can have a mutable range of (0-254) just like the
> OVERLAYs. The distinction between PRIMARY and OVERLAY planes is also somewhat
> arbitrary. We can interpret PRIMARY as the first plane that should be enabled on
> a CRTC, but beyond that, it doesn't mean much for amdgpu.
> 
> Therefore, I'm curious about how compositors devs understand KMS planes and
> their zpos properties, and how we would like to use them. It isn't clear to me
> how compositors wish to interpret and use the DRM zpos property, or
> differentiate between OVERLAY and PRIMARY planes, when it comes to setting up
> multi-plane scanout.

You already quoted me on the Weston link, so I don't think I have
anything to add. Sounds fine to me, and we don't have a standard plane
arrangement algorithm that the kernel could optimize zpos ranges
against, yet.

> Ultimately, what I'd like to answer is "What can we do on the KMS driver and DRM
> plane API side, that can make building multi-plane scanout configurations easier
> for compositors?" I'm hoping we can converge on something, whether that be
> updating the existing documentation to better define the usage, or update the
> API to provide support for something that is lacking.

I think there probably should be a standardised plane arrangement
algorithm in userspace, because the search space suffers from
permutational explosion. Either there needs to be very few planes (max
4 or 5 at-all-possible per CRTC, including shareable ones) for an
exhaustive search to be feasible, or all planes should be more or less
equal in capabilities and userspace employs some simplified or
heuristic search.

If the search algorithm is fixed, then drivers could optimize zpos
ranges to have the algorithm find a solution faster.

My worry is that userspace already has heuristic search algorithms that
may start failing if drivers later change their zpos ranges to be more
optimal for another algorithm.

OTOH, as long as exhaustive search is feasible, then it does not matter
how DRM drivers set up the zpos ranges.

In any case, the zpos ranges should try to allow all possible plane
arrangements while minimizing the number of arrangements that won't
work. The absolute values of zpos are pretty much irrelevant, so I
think setting one plane to have an immutable zpos is a good idea, even
if it's not necessary by the driver. That is one less moving part, and
only the relative ordering between the planes matters.


Thanks,
pq

> Some links to provide context and details:
> * What is underlay?: https://gitlab.freedesktop.org/emersion/libliftoff/-/issues/76
> * Discussion on how to implement underlay on Weston: https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2325164
> 
> Cc: Joshua Ashton <joshua@froggi.es>
> Cc: Michel Dänzer <mdaenzer@redhat.com>
> Cc: Chao Guo <chao.guo@nxp.com>
> Cc: Xaver Hugl <xaver.hugl@gmail.com>
> Cc: Vikas Korjani <Vikas.Korjani@amd.com>
> Cc: Robert Mader <robert.mader@posteo.de>
> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
> Cc: Sean Paul <sean@poorly.run>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: Shashank Sharma <shashank.sharma@amd.com>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Sebastian Wick <sebastian.wick@redhat.com>
> 
> Leo Li (2):
>   drm/amd/display: Introduce overlay cursor mode
>   drm/amd/display: Move PRIMARY plane zpos higher
> 
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 405 ++++++++++++++++--
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
>  .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
>  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  28 +-
>  4 files changed, 391 insertions(+), 50 deletions(-)
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode
  2024-03-15 17:09 ` [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode sunpeng.li
  2024-03-16  8:38   ` kernel test robot
  2024-03-21 21:39   ` Harry Wentland
@ 2024-03-28 15:16   ` Pekka Paalanen
  2024-03-28 15:48   ` Robert Mader
  3 siblings, 0 replies; 27+ messages in thread
From: Pekka Paalanen @ 2024-03-28 15:16 UTC (permalink / raw)
  To: sunpeng.li
  Cc: dri-devel, amd-gfx, Joshua Ashton, Michel Dänzer, Chao Guo,
	Xaver Hugl, Vikas Korjani, Robert Mader, Sean Paul, Simon Ser,
	Shashank Sharma, Harry Wentland, Sebastian Wick

[-- Attachment #1: Type: text/plain, Size: 2266 bytes --]

On Fri, 15 Mar 2024 13:09:57 -0400
<sunpeng.li@amd.com> wrote:

> From: Leo Li <sunpeng.li@amd.com>
> 
> [Why]
> 
> DCN is the display hardware for amdgpu. DRM planes are backed by DCN
> hardware pipes, which carry pixel data from one end (memory), to the
> other (output encoder).
> 
> Each DCN pipe has the ability to blend in a cursor early on in the
> pipeline. In other words, there are no dedicated cursor planes in DCN,
> which makes cursor behavior somewhat unintuitive for compositors.
> 
> For example, if the cursor is in RGB format, but the top-most DRM plane
> is in YUV format, DCN will not be able to blend them. Because of this,
> amdgpu_dm rejects all configurations where a cursor needs to be enabled
> on top of a YUV formatted plane.
> 
> From a compositor's perspective, when computing an allocation for
> hardware plane offloading, this cursor-on-yuv configuration result in an
> atomic test failure. Since the failure reason is not obvious at all,
> compositors will likely fall back to full rendering, which is not ideal.
> 
> Instead, amdgpu_dm can try to accommodate the cursor-on-yuv
> configuration by opportunistically reserving a separate DCN pipe just
> for the cursor. We can refer to this as "overlay cursor mode". It is
> contrasted with "native cursor mode", where the native DCN per-pipe
> cursor is used.

I can't comment on the code, but this explanation sounds like a really
good move!


Thanks,
pq

> [How]
> 
> On each crtc, compute whether the cursor plane should be enabled in
> overlay mode (which currently, is iff the immediate plane below has a
> YUV format). If it is, mark the CRTC as requesting overlay cursor mode.
> 
> During DC validation, attempt to enable a separate DCN pipe for the
> cursor if it's in overlay mode. If that fails, or if no overlay mode is
> requested, then fallback to native mode.
> 
> Signed-off-by: Leo Li <sunpeng.li@amd.com>
> ---
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 309 +++++++++++++++---
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
>  .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
>  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  13 +-
>  4 files changed, 288 insertions(+), 42 deletions(-)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] drm/amd/display: Move PRIMARY plane zpos higher
  2024-03-15 17:09 ` [PATCH 2/2] drm/amd/display: Move PRIMARY plane zpos higher sunpeng.li
  2024-03-21 21:36   ` Harry Wentland
@ 2024-03-28 15:20   ` Pekka Paalanen
  1 sibling, 0 replies; 27+ messages in thread
From: Pekka Paalanen @ 2024-03-28 15:20 UTC (permalink / raw)
  To: sunpeng.li
  Cc: dri-devel, amd-gfx, Joshua Ashton, Michel Dänzer, Chao Guo,
	Xaver Hugl, Vikas Korjani, Robert Mader, Sean Paul, Simon Ser,
	Shashank Sharma, Harry Wentland, Sebastian Wick

[-- Attachment #1: Type: text/plain, Size: 1516 bytes --]

On Fri, 15 Mar 2024 13:09:58 -0400
<sunpeng.li@amd.com> wrote:

> From: Leo Li <sunpeng.li@amd.com>
> 
> [Why]
> 
> Compositors have different ways of assigning surfaces to DRM planes for
> render offloading. It may decide between various strategies: overlay,
> underlay, or a mix of both
> 
> One way for compositors to implement the underlay strategy is to assign
> a higher zpos to the DRM_PRIMARY plane than the DRM_OVERLAY planes,
> effectively turning the DRM_OVERLAY plane into an underlay plane.
> 
> Today, amdgpu attaches an immutable zpos of 0 to the DRM_PRIMARY plane.
> This however, is an arbitrary restriction. DCN pipes are general
> purpose, and can be arranged in any z-order. To support compositors
> using this allocation scheme, we can set a non-zero immutable zpos for
> the PRIMARY, allowing the placement of OVERLAYS (mutable zpos range
> 0-254) beneath the PRIMARY.
> 
> [How]
> 
> Assign a zpos = #no of OVERLAY planes to the PRIMARY plane. Then, clean
> up any assumptions in the driver of PRIMARY plane having the lowest
> zpos.

This sounds good to me too. I suppose that means

Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>

for both patches. Or at least for their intentions. :-)


Thanks,
pq

> Signed-off-by: Leo Li <sunpeng.li@amd.com>
> ---
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 96 ++++++++++++++++++-
>  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 17 +++-
>  2 files changed, 104 insertions(+), 9 deletions(-)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode
  2024-03-15 17:09 ` [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode sunpeng.li
                     ` (2 preceding siblings ...)
  2024-03-28 15:16   ` Pekka Paalanen
@ 2024-03-28 15:48   ` Robert Mader
  2024-03-28 15:52     ` Harry Wentland
  3 siblings, 1 reply; 27+ messages in thread
From: Robert Mader @ 2024-03-28 15:48 UTC (permalink / raw)
  To: sunpeng.li, dri-devel, amd-gfx
  Cc: Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Pekka Paalanen, Sean Paul, Simon Ser,
	Shashank Sharma, Harry Wentland, Sebastian Wick

Hi,

On 15.03.24 18:09, sunpeng.li@amd.com wrote:
> From: Leo Li <sunpeng.li@amd.com>
>
> [Why]
>
> DCN is the display hardware for amdgpu. DRM planes are backed by DCN
> hardware pipes, which carry pixel data from one end (memory), to the
> other (output encoder).
>
> Each DCN pipe has the ability to blend in a cursor early on in the
> pipeline. In other words, there are no dedicated cursor planes in DCN,
> which makes cursor behavior somewhat unintuitive for compositors.
>
> For example, if the cursor is in RGB format, but the top-most DRM plane
> is in YUV format, DCN will not be able to blend them. Because of this,
> amdgpu_dm rejects all configurations where a cursor needs to be enabled
> on top of a YUV formatted plane.
>
>  From a compositor's perspective, when computing an allocation for
> hardware plane offloading, this cursor-on-yuv configuration result in an
> atomic test failure. Since the failure reason is not obvious at all,
> compositors will likely fall back to full rendering, which is not ideal.
>
> Instead, amdgpu_dm can try to accommodate the cursor-on-yuv
> configuration by opportunistically reserving a separate DCN pipe just
> for the cursor. We can refer to this as "overlay cursor mode". It is
> contrasted with "native cursor mode", where the native DCN per-pipe
> cursor is used.
>
> [How]
>
> On each crtc, compute whether the cursor plane should be enabled in
> overlay mode (which currently, is iff the immediate plane below has a
> YUV format). If it is, mark the CRTC as requesting overlay cursor mode.

iff -> if

IIRC another case where this would be needed is when the scale of the 
plane below doesn't match the cursor scale. This is especially common 
for videos and thus implicitly covered by the YUV check in most cases, 
but should probably be made explicit. Michel Dänzer might be able to 
comment here.

Another possible case that could be covered here is when some area is 
not covered by any plane and just shows a black background. This happens 
e.g. if a compositor puts a YUV surface on the primary plane that has a 
different width/high ratio than the display. The compositor can add 
black bars by just leaving the area uncovered and thus require only a 
single hardware plane for video playback ("Unless explicitly specified 
(..), the active area of a CRTC will be black by default." 
https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#plane-abstraction). 
If the cursor is visible over this black bars, AMD currently refuses the 
commit IIUC (see also Michel Dänzers comment at 
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3177#note_1924544)

Thanks,

Robert Mader

>
> During DC validation, attempt to enable a separate DCN pipe for the
> cursor if it's in overlay mode. If that fails, or if no overlay mode is
> requested, then fallback to native mode.
>
> Signed-off-by: Leo Li <sunpeng.li@amd.com>
> ---
>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 309 +++++++++++++++---
>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
>   .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
>   .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  13 +-
>   4 files changed, 288 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 21a61454c878..09ab330aed17 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -8359,8 +8359,19 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>   	 * Disable the cursor first if we're disabling all the planes.
>   	 * It'll remain on the screen after the planes are re-enabled
>   	 * if we don't.
> +	 *
> +	 * If the cursor is transitioning from native to overlay mode, the
> +	 * native cursor needs to be disabled first.
>   	 */
> -	if (acrtc_state->active_planes == 0)
> +	if (acrtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE &&
> +	    dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
> +		struct dc_cursor_position cursor_position = {0};
> +		dc_stream_set_cursor_position(acrtc_state->stream,
> +					      &cursor_position);
> +	}
> +
> +	if (acrtc_state->active_planes == 0 &&
> +	    dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE)
>   		amdgpu_dm_commit_cursors(state);
>   
>   	/* update planes when needed */
> @@ -8374,7 +8385,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>   		struct dm_plane_state *dm_new_plane_state = to_dm_plane_state(new_plane_state);
>   
>   		/* Cursor plane is handled after stream updates */
> -		if (plane->type == DRM_PLANE_TYPE_CURSOR) {
> +		if (plane->type == DRM_PLANE_TYPE_CURSOR &&
> +		    acrtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
>   			if ((fb && crtc == pcrtc) ||
>   			    (old_plane_state->fb && old_plane_state->crtc == pcrtc))
>   				cursor_update = true;
> @@ -8727,7 +8739,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>   	 * This avoids redundant programming in the case where we're going
>   	 * to be disabling a single plane - those pipes are being disabled.
>   	 */
> -	if (acrtc_state->active_planes)
> +	if (acrtc_state->active_planes &&
> +	    acrtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE)
>   		amdgpu_dm_commit_cursors(state);
>   
>   cleanup:
> @@ -10039,7 +10052,8 @@ static bool should_reset_plane(struct drm_atomic_state *state,
>   {
>   	struct drm_plane *other;
>   	struct drm_plane_state *old_other_state, *new_other_state;
> -	struct drm_crtc_state *new_crtc_state;
> +	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
> +	struct dm_crtc_state *old_dm_crtc_state, *new_dm_crtc_state;
>   	struct amdgpu_device *adev = drm_to_adev(plane->dev);
>   	int i;
>   
> @@ -10061,10 +10075,24 @@ static bool should_reset_plane(struct drm_atomic_state *state,
>   
>   	new_crtc_state =
>   		drm_atomic_get_new_crtc_state(state, new_plane_state->crtc);
> +	old_crtc_state =
> +		drm_atomic_get_old_crtc_state(state, old_plane_state->crtc);
>   
>   	if (!new_crtc_state)
>   		return true;
>   
> +	/*
> +	 * A change in cursor mode means a new dc pipe needs to be acquired or
> +	 * released from the state
> +	 */
> +	old_dm_crtc_state = to_dm_crtc_state(old_crtc_state);
> +	new_dm_crtc_state = to_dm_crtc_state(new_crtc_state);
> +	if (plane->type == DRM_PLANE_TYPE_CURSOR &&
> +	    old_dm_crtc_state != NULL &&
> +	    old_dm_crtc_state->cursor_mode != new_dm_crtc_state->cursor_mode) {
> +		return true;
> +	}
> +
>   	/* CRTC Degamma changes currently require us to recreate planes. */
>   	if (new_crtc_state->color_mgmt_changed)
>   		return true;
> @@ -10216,6 +10244,68 @@ static int dm_check_cursor_fb(struct amdgpu_crtc *new_acrtc,
>   	return 0;
>   }
>   
> +/*
> + * Helper function for checking the cursor in native mode
> + */
> +static int dm_check_native_cursor_state(struct drm_crtc *new_plane_crtc,
> +					struct drm_plane *plane,
> +					struct drm_plane_state *new_plane_state,
> +					bool enable)
> +{
> +
> +	struct amdgpu_crtc *new_acrtc;
> +	int ret;
> +
> +	if (!enable || !new_plane_crtc ||
> +	    drm_atomic_plane_disabling(plane->state, new_plane_state))
> +		return 0;
> +
> +	new_acrtc = to_amdgpu_crtc(new_plane_crtc);
> +
> +	if (new_plane_state->src_x != 0 || new_plane_state->src_y != 0) {
> +		DRM_DEBUG_ATOMIC("Cropping not supported for cursor plane\n");
> +		return -EINVAL;
> +	}
> +
> +	if (new_plane_state->fb) {
> +		ret = dm_check_cursor_fb(new_acrtc, new_plane_state,
> +						new_plane_state->fb);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static bool dm_should_update_native_cursor(struct drm_atomic_state *state,
> +					   struct drm_crtc *old_plane_crtc,
> +					   struct drm_crtc *new_plane_crtc,
> +					   bool enable)
> +{
> +	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
> +	struct dm_crtc_state *dm_old_crtc_state, *dm_new_crtc_state;
> +
> +	if (!enable) {
> +		if (old_plane_crtc == NULL)
> +			return true;
> +
> +		old_crtc_state = drm_atomic_get_old_crtc_state(
> +			state, old_plane_crtc);
> +		dm_old_crtc_state = to_dm_crtc_state(old_crtc_state);
> +
> +		return dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE;
> +	} else {
> +		if (new_plane_crtc == NULL)
> +			return true;
> +
> +		new_crtc_state = drm_atomic_get_new_crtc_state(
> +			state, new_plane_crtc);
> +		dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
> +
> +		return dm_new_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE;
> +	}
> +}
> +
>   static int dm_update_plane_state(struct dc *dc,
>   				 struct drm_atomic_state *state,
>   				 struct drm_plane *plane,
> @@ -10231,8 +10321,7 @@ static int dm_update_plane_state(struct dc *dc,
>   	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
>   	struct dm_crtc_state *dm_new_crtc_state, *dm_old_crtc_state;
>   	struct dm_plane_state *dm_new_plane_state, *dm_old_plane_state;
> -	struct amdgpu_crtc *new_acrtc;
> -	bool needs_reset;
> +	bool needs_reset, update_native_cursor;
>   	int ret = 0;
>   
>   
> @@ -10241,24 +10330,16 @@ static int dm_update_plane_state(struct dc *dc,
>   	dm_new_plane_state = to_dm_plane_state(new_plane_state);
>   	dm_old_plane_state = to_dm_plane_state(old_plane_state);
>   
> -	if (plane->type == DRM_PLANE_TYPE_CURSOR) {
> -		if (!enable || !new_plane_crtc ||
> -			drm_atomic_plane_disabling(plane->state, new_plane_state))
> -			return 0;
> -
> -		new_acrtc = to_amdgpu_crtc(new_plane_crtc);
> -
> -		if (new_plane_state->src_x != 0 || new_plane_state->src_y != 0) {
> -			DRM_DEBUG_ATOMIC("Cropping not supported for cursor plane\n");
> -			return -EINVAL;
> -		}
> +	update_native_cursor = dm_should_update_native_cursor(state,
> +							      old_plane_crtc,
> +							      new_plane_crtc,
> +							      enable);
>   
> -		if (new_plane_state->fb) {
> -			ret = dm_check_cursor_fb(new_acrtc, new_plane_state,
> -						 new_plane_state->fb);
> -			if (ret)
> -				return ret;
> -		}
> +	if (plane->type == DRM_PLANE_TYPE_CURSOR && update_native_cursor) {
> +		ret = dm_check_native_cursor_state(new_plane_crtc, plane,
> +					            new_plane_state, enable);
> +		if (ret)
> +			return ret;
>   
>   		return 0;
>   	}
> @@ -10285,16 +10366,17 @@ static int dm_update_plane_state(struct dc *dc,
>   				plane->base.id, old_plane_crtc->base.id);
>   
>   		ret = dm_atomic_get_state(state, &dm_state);
> -		if (ret)
> -			return ret;
> +		if (ret) {
> +			goto out;
> +		}
>   
>   		if (!dc_state_remove_plane(
>   				dc,
>   				dm_old_crtc_state->stream,
>   				dm_old_plane_state->dc_state,
>   				dm_state->context)) {
> -
> -			return -EINVAL;
> +			ret = -EINVAL;
> +			goto out;
>   		}
>   
>   		if (dm_old_plane_state->dc_state)
> @@ -10323,21 +10405,16 @@ static int dm_update_plane_state(struct dc *dc,
>   			return 0;
>   
>   		ret = amdgpu_dm_plane_helper_check_state(new_plane_state, new_crtc_state);
> -		if (ret)
> -			return ret;
> +		if (ret) {
> +			goto out;
> +		}
>   
>   		WARN_ON(dm_new_plane_state->dc_state);
>   
>   		dc_new_plane_state = dc_create_plane_state(dc);
> -		if (!dc_new_plane_state)
> -			return -ENOMEM;
> -
> -		/* Block top most plane from being a video plane */
> -		if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
> -			if (amdgpu_dm_plane_is_video_format(new_plane_state->fb->format->format) && *is_top_most_overlay)
> -				return -EINVAL;
> -
> -			*is_top_most_overlay = false;
> +		if (!dc_new_plane_state) {
> +			ret = -ENOMEM;
> +			goto out;
>   		}
>   
>   		DRM_DEBUG_ATOMIC("Enabling DRM plane: %d on DRM crtc %d\n",
> @@ -10350,13 +10427,13 @@ static int dm_update_plane_state(struct dc *dc,
>   			new_crtc_state);
>   		if (ret) {
>   			dc_plane_state_release(dc_new_plane_state);
> -			return ret;
> +			goto out;
>   		}
>   
>   		ret = dm_atomic_get_state(state, &dm_state);
>   		if (ret) {
>   			dc_plane_state_release(dc_new_plane_state);
> -			return ret;
> +			goto out;
>   		}
>   
>   		/*
> @@ -10373,7 +10450,8 @@ static int dm_update_plane_state(struct dc *dc,
>   				dm_state->context)) {
>   
>   			dc_plane_state_release(dc_new_plane_state);
> -			return -EINVAL;
> +			ret = -EINVAL;
> +			goto out;
>   		}
>   
>   		dm_new_plane_state->dc_state = dc_new_plane_state;
> @@ -10388,6 +10466,16 @@ static int dm_update_plane_state(struct dc *dc,
>   		*lock_and_validation_needed = true;
>   	}
>   
> +out:
> +	/* If cursor overlay failed, attempt fallback to native mode */
> +	if (ret == -EINVAL && plane->type == DRM_PLANE_TYPE_CURSOR) {
> +		ret = dm_check_native_cursor_state(new_plane_crtc, plane,
> +						    new_plane_state, enable);
> +		if (ret) {
> +			return ret;
> +		}
> +		dm_new_crtc_state->cursor_mode = DM_CURSOR_NATIVE_MODE;
> +	}
>   
>   	return ret;
>   }
> @@ -10544,6 +10632,126 @@ static int add_affected_mst_dsc_crtcs(struct drm_atomic_state *state, struct drm
>   	return drm_dp_mst_add_affected_dsc_crtcs(state, &aconnector->mst_root->mst_mgr);
>   }
>   
> +/**
> + * DOC: Cursor Modes - Native vs Overlay
> + *
> + * In native mode, the cursor uses a integrated cursor pipe within each DCN hw
> + * plane. It does not require a dedicated hw plane to enable, but it is
> + * subjected to the same z-order and scaling as the hw plane. It also has format
> + * restrictions, a RGB cursor in native mode cannot be enabled within a non-RGB
> + * hw plane.
> + *
> + * In overlay mode, the cursor uses a separate DCN hw plane, and thus has its
> + * own scaling and z-pos. It also has no blending restrictions. It lends to a
> + * cursor behavior more akin to a DRM client's expectations. However, it does
> + * occupy an extra DCN plane, and therefore will only be used if a DCN plane is
> + * available.
> +*/
> +
> +/**
> + * Set whether the cursor should be enabled in native mode, or overlay mode, on
> + * the dm_crtc_state.
> + *
> + * The cursor should be enabled in overlay mode if the immediate underlying
> + * plane contains a video format.
> + *
> + * Since zpos info is required, drm_atomic_normalize_zpos must be called before
> + * calling this function.
> +*/
> +static int dm_crtc_set_cursor_mode(struct drm_atomic_state *state,
> +				    struct dm_crtc_state *dm_crtc_state)
> +{
> +	struct drm_plane_state *plane_state, *old_plane_state, *target_plane_state;
> +	struct drm_crtc_state *crtc_state = &dm_crtc_state->base;
> +	struct drm_plane *plane;
> +	bool consider_mode_change = false;
> +	bool cursor_changed = false;
> +	unsigned int target_zpos;
> +	unsigned int cursor_zpos;
> +	int i;
> +
> +	/*
> +	 * Cursor mode can change if a plane's format changes, is
> +	 * enabled/disabled, or z-order changes.
> +	 */
> +	for_each_oldnew_plane_in_state(state, plane, old_plane_state, plane_state, i) {
> +
> +		/* Only care about planes on this CRTC */
> +		if ((drm_plane_mask(plane) & crtc_state->plane_mask) == 0)
> +			continue;
> +
> +		if (plane->type == DRM_PLANE_TYPE_CURSOR)
> +			cursor_changed = true;
> +
> +		if (drm_atomic_plane_enabling(old_plane_state, plane_state) ||
> +		    drm_atomic_plane_disabling(old_plane_state, plane_state) ||
> +		    old_plane_state->fb->format != plane_state->fb->format) {
> +			consider_mode_change = true;
> +			break;
> +		}
> +	}
> +
> +	if (!consider_mode_change && !crtc_state->zpos_changed) {
> +		return 0;
> +	}
> +
> +	/*
> +	 * If no cursor change on this CRTC, and not enabled on this CRTC, then
> +	 * no need to set cursor mode. This avoids needlessly locking the cursor
> +	 * state.
> +	 */
> +	if (!cursor_changed &&
> +	    !(drm_plane_mask(crtc_state->crtc->cursor) & crtc_state->plane_mask)) {
> +		return 0;
> +	}
> +
> +	plane_state = drm_atomic_get_plane_state(state,
> +						 crtc_state->crtc->cursor);
> +	if (IS_ERR(plane_state))
> +		return PTR_ERR(plane_state);
> +
> +	/* Cursor is disabled */
> +	if (!plane_state->fb)
> +		return 0;
> +
> +	cursor_zpos = plane_state->normalized_zpos;
> +
> +	/* Get enabled plane immediately below cursor. */
> +	target_plane_state = NULL;
> +	target_zpos = 0;
> +	drm_for_each_plane_mask(plane, state->dev, crtc_state->plane_mask) {
> +		if (plane->type == DRM_PLANE_TYPE_CURSOR)
> +			continue;
> +
> +		plane_state = drm_atomic_get_plane_state(state, plane);
> +		if (IS_ERR(plane_state))
> +			return PTR_ERR(plane_state);
> +
> +		if (!plane_state->fb ||
> +		    plane_state->normalized_zpos >= cursor_zpos)
> +			continue;
> +
> +		if (plane_state->normalized_zpos >= target_zpos) {
> +			target_zpos = plane_state->normalized_zpos;
> +			target_plane_state = plane_state;
> +		}
> +	}
> +
> +	/* Nothing below cursor - use overlay mode */
> +	if (target_plane_state == NULL) {
> +		dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
> +		return 0;
> +	}
> +
> +	if (amdgpu_dm_plane_is_video_format(target_plane_state->fb->format->format)) {
> +		dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
> +	} else {
> +		dm_crtc_state->cursor_mode = DM_CURSOR_NATIVE_MODE;
> +	}
> +
> +	return 0;
> +}
> +
>   /**
>    * amdgpu_dm_atomic_check() - Atomic check implementation for AMDgpu DM.
>    *
> @@ -10713,6 +10921,20 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
>   		goto fail;
>   	}
>   
> +	/*
> +	 * Determine whether cursors on each CRTC should be enabled in native or
> +	 * overlay mode.
> +	 */
> +	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
> +		dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
> +
> +		ret = dm_crtc_set_cursor_mode(state, dm_new_crtc_state);
> +		if (ret) {
> +			drm_dbg(dev, "Failed to determine cursor mode\n");
> +			goto fail;
> +		}
> +	}
> +
>   	/* Remove exiting planes if they are modified */
>   	for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, new_plane_state, i) {
>   		if (old_plane_state->fb && new_plane_state->fb &&
> @@ -10793,6 +11015,11 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
>   
>   	/* Check cursor planes scaling */
>   	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
> +		/* Overlay cusor does not need scaling check */
> +		dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
> +		if (dm_new_crtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE)
> +			continue;
> +
>   		ret = dm_check_crtc_cursor(state, crtc, new_crtc_state);
>   		if (ret) {
>   			DRM_DEBUG_DRIVER("dm_check_crtc_cursor() failed\n");
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
> index 09519b7abf67..b8d39fdd1e09 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
> @@ -822,6 +822,11 @@ struct dm_plane_state {
>   	enum amdgpu_transfer_function blend_tf;
>   };
>   
> +enum amdgpu_dm_cursor_mode {
> +	DM_CURSOR_NATIVE_MODE = 0,
> +	DM_CURSOR_OVERLAY_MODE,
> +};
> +
>   struct dm_crtc_state {
>   	struct drm_crtc_state base;
>   	struct dc_stream_state *stream;
> @@ -852,6 +857,8 @@ struct dm_crtc_state {
>   	 * encoding.
>   	 */
>   	enum amdgpu_transfer_function regamma_tf;
> +
> +	enum amdgpu_dm_cursor_mode cursor_mode;
>   };
>   
>   #define to_dm_crtc_state(x) container_of(x, struct dm_crtc_state, base)
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> index e23a0a276e33..67aea1d2feb9 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> @@ -304,6 +304,7 @@ static struct drm_crtc_state *amdgpu_dm_crtc_duplicate_state(struct drm_crtc *cr
>   	state->regamma_tf = cur->regamma_tf;
>   	state->crc_skip_count = cur->crc_skip_count;
>   	state->mpo_requested = cur->mpo_requested;
> +	state->cursor_mode = cur->cursor_mode;
>   	/* TODO Duplicate dc_stream after objects are stream object is flattened */
>   
>   	return &state->base;
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> index 8a4c40b4c27e..ed1fc01f1649 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> @@ -104,7 +104,7 @@ void amdgpu_dm_plane_fill_blending_from_plane_state(const struct drm_plane_state
>   	*global_alpha = false;
>   	*global_alpha_value = 0xff;
>   
> -	if (plane_state->plane->type != DRM_PLANE_TYPE_OVERLAY)
> +	if (plane_state->plane->type == DRM_PLANE_TYPE_PRIMARY)
>   		return;
>   
>   	if (plane_state->pixel_blend_mode == DRM_MODE_BLEND_PREMULTI ||
> @@ -1175,10 +1175,21 @@ static int amdgpu_dm_plane_atomic_check(struct drm_plane *plane,
>   static int amdgpu_dm_plane_atomic_async_check(struct drm_plane *plane,
>   					      struct drm_atomic_state *state)
>   {
> +	struct drm_crtc_state *new_crtc_state;
> +	struct drm_plane_state *new_plane_state;
> +	struct dm_crtc_state *dm_new_crtc_state;
> +
>   	/* Only support async updates on cursor planes. */
>   	if (plane->type != DRM_PLANE_TYPE_CURSOR)
>   		return -EINVAL;
>   
> +	new_plane_state = drm_atomic_get_new_plane_state(state, plane);
> +	new_crtc_state = drm_atomic_get_new_crtc_state(state, new_plane_state->crtc);
> +	dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
> +	/* Reject overlay cursors for now*/
> +	if (dm_new_crtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE)
> +		return -EINVAL;
> +
>   	return 0;
>   }
>   

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode
  2024-03-28 15:48   ` Robert Mader
@ 2024-03-28 15:52     ` Harry Wentland
  2024-04-01 14:38       ` Leo Li
  0 siblings, 1 reply; 27+ messages in thread
From: Harry Wentland @ 2024-03-28 15:52 UTC (permalink / raw)
  To: Robert Mader, sunpeng.li, dri-devel, amd-gfx
  Cc: Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Pekka Paalanen, Sean Paul, Simon Ser,
	Shashank Sharma, Sebastian Wick



On 2024-03-28 11:48, Robert Mader wrote:
> Hi,
> 
> On 15.03.24 18:09, sunpeng.li@amd.com wrote:
>> From: Leo Li <sunpeng.li@amd.com>
>>
>> [Why]
>>
>> DCN is the display hardware for amdgpu. DRM planes are backed by DCN
>> hardware pipes, which carry pixel data from one end (memory), to the
>> other (output encoder).
>>
>> Each DCN pipe has the ability to blend in a cursor early on in the
>> pipeline. In other words, there are no dedicated cursor planes in DCN,
>> which makes cursor behavior somewhat unintuitive for compositors.
>>
>> For example, if the cursor is in RGB format, but the top-most DRM plane
>> is in YUV format, DCN will not be able to blend them. Because of this,
>> amdgpu_dm rejects all configurations where a cursor needs to be enabled
>> on top of a YUV formatted plane.
>>
>>  From a compositor's perspective, when computing an allocation for
>> hardware plane offloading, this cursor-on-yuv configuration result in an
>> atomic test failure. Since the failure reason is not obvious at all,
>> compositors will likely fall back to full rendering, which is not ideal.
>>
>> Instead, amdgpu_dm can try to accommodate the cursor-on-yuv
>> configuration by opportunistically reserving a separate DCN pipe just
>> for the cursor. We can refer to this as "overlay cursor mode". It is
>> contrasted with "native cursor mode", where the native DCN per-pipe
>> cursor is used.
>>
>> [How]
>>
>> On each crtc, compute whether the cursor plane should be enabled in
>> overlay mode (which currently, is iff the immediate plane below has a
>> YUV format). If it is, mark the CRTC as requesting overlay cursor mode.
> 
> iff -> if
> 
> IIRC another case where this would be needed is when the scale of the 
> plane below doesn't match the cursor scale. This is especially common 
> for videos and thus implicitly covered by the YUV check in most cases, 
> but should probably be made explicit. Michel Dänzer might be able to 
> comment here.
> 
> Another possible case that could be covered here is when some area is 
> not covered by any plane and just shows a black background. This happens 
> e.g. if a compositor puts a YUV surface on the primary plane that has a 
> different width/high ratio than the display. The compositor can add 
> black bars by just leaving the area uncovered and thus require only a 
> single hardware plane for video playback ("Unless explicitly specified 
> (..), the active area of a CRTC will be black by default." 
> https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#plane-abstraction). If the cursor is visible over this black bars, AMD currently refuses the commit IIUC (see also Michel Dänzers comment at https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3177#note_1924544)
> 

Thanks for mentioning both of these scenarios. I agree it would be
good if we can expand the use of the overlay cursor to these cases.

Harry

> Thanks,
> 
> Robert Mader
> 
>>
>> During DC validation, attempt to enable a separate DCN pipe for the
>> cursor if it's in overlay mode. If that fails, or if no overlay mode is
>> requested, then fallback to native mode.
>>
>> Signed-off-by: Leo Li <sunpeng.li@amd.com>
>> ---
>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 309 +++++++++++++++---
>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
>>   .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
>>   .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  13 +-
>>   4 files changed, 288 insertions(+), 42 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index 21a61454c878..09ab330aed17 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -8359,8 +8359,19 @@ static void amdgpu_dm_commit_planes(struct 
>> drm_atomic_state *state,
>>        * Disable the cursor first if we're disabling all the planes.
>>        * It'll remain on the screen after the planes are re-enabled
>>        * if we don't.
>> +     *
>> +     * If the cursor is transitioning from native to overlay mode, the
>> +     * native cursor needs to be disabled first.
>>        */
>> -    if (acrtc_state->active_planes == 0)
>> +    if (acrtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE &&
>> +        dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
>> +        struct dc_cursor_position cursor_position = {0};
>> +        dc_stream_set_cursor_position(acrtc_state->stream,
>> +                          &cursor_position);
>> +    }
>> +
>> +    if (acrtc_state->active_planes == 0 &&
>> +        dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE)
>>           amdgpu_dm_commit_cursors(state);
>>       /* update planes when needed */
>> @@ -8374,7 +8385,8 @@ static void amdgpu_dm_commit_planes(struct 
>> drm_atomic_state *state,
>>           struct dm_plane_state *dm_new_plane_state = 
>> to_dm_plane_state(new_plane_state);
>>           /* Cursor plane is handled after stream updates */
>> -        if (plane->type == DRM_PLANE_TYPE_CURSOR) {
>> +        if (plane->type == DRM_PLANE_TYPE_CURSOR &&
>> +            acrtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
>>               if ((fb && crtc == pcrtc) ||
>>                   (old_plane_state->fb && old_plane_state->crtc == 
>> pcrtc))
>>                   cursor_update = true;
>> @@ -8727,7 +8739,8 @@ static void amdgpu_dm_commit_planes(struct 
>> drm_atomic_state *state,
>>        * This avoids redundant programming in the case where we're going
>>        * to be disabling a single plane - those pipes are being disabled.
>>        */
>> -    if (acrtc_state->active_planes)
>> +    if (acrtc_state->active_planes &&
>> +        acrtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE)
>>           amdgpu_dm_commit_cursors(state);
>>   cleanup:
>> @@ -10039,7 +10052,8 @@ static bool should_reset_plane(struct 
>> drm_atomic_state *state,
>>   {
>>       struct drm_plane *other;
>>       struct drm_plane_state *old_other_state, *new_other_state;
>> -    struct drm_crtc_state *new_crtc_state;
>> +    struct drm_crtc_state *old_crtc_state, *new_crtc_state;
>> +    struct dm_crtc_state *old_dm_crtc_state, *new_dm_crtc_state;
>>       struct amdgpu_device *adev = drm_to_adev(plane->dev);
>>       int i;
>> @@ -10061,10 +10075,24 @@ static bool should_reset_plane(struct 
>> drm_atomic_state *state,
>>       new_crtc_state =
>>           drm_atomic_get_new_crtc_state(state, new_plane_state->crtc);
>> +    old_crtc_state =
>> +        drm_atomic_get_old_crtc_state(state, old_plane_state->crtc);
>>       if (!new_crtc_state)
>>           return true;
>> +    /*
>> +     * A change in cursor mode means a new dc pipe needs to be 
>> acquired or
>> +     * released from the state
>> +     */
>> +    old_dm_crtc_state = to_dm_crtc_state(old_crtc_state);
>> +    new_dm_crtc_state = to_dm_crtc_state(new_crtc_state);
>> +    if (plane->type == DRM_PLANE_TYPE_CURSOR &&
>> +        old_dm_crtc_state != NULL &&
>> +        old_dm_crtc_state->cursor_mode != 
>> new_dm_crtc_state->cursor_mode) {
>> +        return true;
>> +    }
>> +
>>       /* CRTC Degamma changes currently require us to recreate planes. */
>>       if (new_crtc_state->color_mgmt_changed)
>>           return true;
>> @@ -10216,6 +10244,68 @@ static int dm_check_cursor_fb(struct 
>> amdgpu_crtc *new_acrtc,
>>       return 0;
>>   }
>> +/*
>> + * Helper function for checking the cursor in native mode
>> + */
>> +static int dm_check_native_cursor_state(struct drm_crtc *new_plane_crtc,
>> +                    struct drm_plane *plane,
>> +                    struct drm_plane_state *new_plane_state,
>> +                    bool enable)
>> +{
>> +
>> +    struct amdgpu_crtc *new_acrtc;
>> +    int ret;
>> +
>> +    if (!enable || !new_plane_crtc ||
>> +        drm_atomic_plane_disabling(plane->state, new_plane_state))
>> +        return 0;
>> +
>> +    new_acrtc = to_amdgpu_crtc(new_plane_crtc);
>> +
>> +    if (new_plane_state->src_x != 0 || new_plane_state->src_y != 0) {
>> +        DRM_DEBUG_ATOMIC("Cropping not supported for cursor plane\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (new_plane_state->fb) {
>> +        ret = dm_check_cursor_fb(new_acrtc, new_plane_state,
>> +                        new_plane_state->fb);
>> +        if (ret)
>> +            return ret;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static bool dm_should_update_native_cursor(struct drm_atomic_state 
>> *state,
>> +                       struct drm_crtc *old_plane_crtc,
>> +                       struct drm_crtc *new_plane_crtc,
>> +                       bool enable)
>> +{
>> +    struct drm_crtc_state *old_crtc_state, *new_crtc_state;
>> +    struct dm_crtc_state *dm_old_crtc_state, *dm_new_crtc_state;
>> +
>> +    if (!enable) {
>> +        if (old_plane_crtc == NULL)
>> +            return true;
>> +
>> +        old_crtc_state = drm_atomic_get_old_crtc_state(
>> +            state, old_plane_crtc);
>> +        dm_old_crtc_state = to_dm_crtc_state(old_crtc_state);
>> +
>> +        return dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE;
>> +    } else {
>> +        if (new_plane_crtc == NULL)
>> +            return true;
>> +
>> +        new_crtc_state = drm_atomic_get_new_crtc_state(
>> +            state, new_plane_crtc);
>> +        dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
>> +
>> +        return dm_new_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE;
>> +    }
>> +}
>> +
>>   static int dm_update_plane_state(struct dc *dc,
>>                    struct drm_atomic_state *state,
>>                    struct drm_plane *plane,
>> @@ -10231,8 +10321,7 @@ static int dm_update_plane_state(struct dc *dc,
>>       struct drm_crtc_state *old_crtc_state, *new_crtc_state;
>>       struct dm_crtc_state *dm_new_crtc_state, *dm_old_crtc_state;
>>       struct dm_plane_state *dm_new_plane_state, *dm_old_plane_state;
>> -    struct amdgpu_crtc *new_acrtc;
>> -    bool needs_reset;
>> +    bool needs_reset, update_native_cursor;
>>       int ret = 0;
>> @@ -10241,24 +10330,16 @@ static int dm_update_plane_state(struct dc *dc,
>>       dm_new_plane_state = to_dm_plane_state(new_plane_state);
>>       dm_old_plane_state = to_dm_plane_state(old_plane_state);
>> -    if (plane->type == DRM_PLANE_TYPE_CURSOR) {
>> -        if (!enable || !new_plane_crtc ||
>> -            drm_atomic_plane_disabling(plane->state, new_plane_state))
>> -            return 0;
>> -
>> -        new_acrtc = to_amdgpu_crtc(new_plane_crtc);
>> -
>> -        if (new_plane_state->src_x != 0 || new_plane_state->src_y != 
>> 0) {
>> -            DRM_DEBUG_ATOMIC("Cropping not supported for cursor 
>> plane\n");
>> -            return -EINVAL;
>> -        }
>> +    update_native_cursor = dm_should_update_native_cursor(state,
>> +                                  old_plane_crtc,
>> +                                  new_plane_crtc,
>> +                                  enable);
>> -        if (new_plane_state->fb) {
>> -            ret = dm_check_cursor_fb(new_acrtc, new_plane_state,
>> -                         new_plane_state->fb);
>> -            if (ret)
>> -                return ret;
>> -        }
>> +    if (plane->type == DRM_PLANE_TYPE_CURSOR && update_native_cursor) {
>> +        ret = dm_check_native_cursor_state(new_plane_crtc, plane,
>> +                                new_plane_state, enable);
>> +        if (ret)
>> +            return ret;
>>           return 0;
>>       }
>> @@ -10285,16 +10366,17 @@ static int dm_update_plane_state(struct dc *dc,
>>                   plane->base.id, old_plane_crtc->base.id);
>>           ret = dm_atomic_get_state(state, &dm_state);
>> -        if (ret)
>> -            return ret;
>> +        if (ret) {
>> +            goto out;
>> +        }
>>           if (!dc_state_remove_plane(
>>                   dc,
>>                   dm_old_crtc_state->stream,
>>                   dm_old_plane_state->dc_state,
>>                   dm_state->context)) {
>> -
>> -            return -EINVAL;
>> +            ret = -EINVAL;
>> +            goto out;
>>           }
>>           if (dm_old_plane_state->dc_state)
>> @@ -10323,21 +10405,16 @@ static int dm_update_plane_state(struct dc *dc,
>>               return 0;
>>           ret = amdgpu_dm_plane_helper_check_state(new_plane_state, 
>> new_crtc_state);
>> -        if (ret)
>> -            return ret;
>> +        if (ret) {
>> +            goto out;
>> +        }
>>           WARN_ON(dm_new_plane_state->dc_state);
>>           dc_new_plane_state = dc_create_plane_state(dc);
>> -        if (!dc_new_plane_state)
>> -            return -ENOMEM;
>> -
>> -        /* Block top most plane from being a video plane */
>> -        if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
>> -            if 
>> (amdgpu_dm_plane_is_video_format(new_plane_state->fb->format->format) 
>> && *is_top_most_overlay)
>> -                return -EINVAL;
>> -
>> -            *is_top_most_overlay = false;
>> +        if (!dc_new_plane_state) {
>> +            ret = -ENOMEM;
>> +            goto out;
>>           }
>>           DRM_DEBUG_ATOMIC("Enabling DRM plane: %d on DRM crtc %d\n",
>> @@ -10350,13 +10427,13 @@ static int dm_update_plane_state(struct dc *dc,
>>               new_crtc_state);
>>           if (ret) {
>>               dc_plane_state_release(dc_new_plane_state);
>> -            return ret;
>> +            goto out;
>>           }
>>           ret = dm_atomic_get_state(state, &dm_state);
>>           if (ret) {
>>               dc_plane_state_release(dc_new_plane_state);
>> -            return ret;
>> +            goto out;
>>           }
>>           /*
>> @@ -10373,7 +10450,8 @@ static int dm_update_plane_state(struct dc *dc,
>>                   dm_state->context)) {
>>               dc_plane_state_release(dc_new_plane_state);
>> -            return -EINVAL;
>> +            ret = -EINVAL;
>> +            goto out;
>>           }
>>           dm_new_plane_state->dc_state = dc_new_plane_state;
>> @@ -10388,6 +10466,16 @@ static int dm_update_plane_state(struct dc *dc,
>>           *lock_and_validation_needed = true;
>>       }
>> +out:
>> +    /* If cursor overlay failed, attempt fallback to native mode */
>> +    if (ret == -EINVAL && plane->type == DRM_PLANE_TYPE_CURSOR) {
>> +        ret = dm_check_native_cursor_state(new_plane_crtc, plane,
>> +                            new_plane_state, enable);
>> +        if (ret) {
>> +            return ret;
>> +        }
>> +        dm_new_crtc_state->cursor_mode = DM_CURSOR_NATIVE_MODE;
>> +    }
>>       return ret;
>>   }
>> @@ -10544,6 +10632,126 @@ static int add_affected_mst_dsc_crtcs(struct 
>> drm_atomic_state *state, struct drm
>>       return drm_dp_mst_add_affected_dsc_crtcs(state, 
>> &aconnector->mst_root->mst_mgr);
>>   }
>> +/**
>> + * DOC: Cursor Modes - Native vs Overlay
>> + *
>> + * In native mode, the cursor uses a integrated cursor pipe within 
>> each DCN hw
>> + * plane. It does not require a dedicated hw plane to enable, but it is
>> + * subjected to the same z-order and scaling as the hw plane. It also 
>> has format
>> + * restrictions, a RGB cursor in native mode cannot be enabled within 
>> a non-RGB
>> + * hw plane.
>> + *
>> + * In overlay mode, the cursor uses a separate DCN hw plane, and thus 
>> has its
>> + * own scaling and z-pos. It also has no blending restrictions. It 
>> lends to a
>> + * cursor behavior more akin to a DRM client's expectations. However, 
>> it does
>> + * occupy an extra DCN plane, and therefore will only be used if a 
>> DCN plane is
>> + * available.
>> +*/
>> +
>> +/**
>> + * Set whether the cursor should be enabled in native mode, or 
>> overlay mode, on
>> + * the dm_crtc_state.
>> + *
>> + * The cursor should be enabled in overlay mode if the immediate 
>> underlying
>> + * plane contains a video format.
>> + *
>> + * Since zpos info is required, drm_atomic_normalize_zpos must be 
>> called before
>> + * calling this function.
>> +*/
>> +static int dm_crtc_set_cursor_mode(struct drm_atomic_state *state,
>> +                    struct dm_crtc_state *dm_crtc_state)
>> +{
>> +    struct drm_plane_state *plane_state, *old_plane_state, 
>> *target_plane_state;
>> +    struct drm_crtc_state *crtc_state = &dm_crtc_state->base;
>> +    struct drm_plane *plane;
>> +    bool consider_mode_change = false;
>> +    bool cursor_changed = false;
>> +    unsigned int target_zpos;
>> +    unsigned int cursor_zpos;
>> +    int i;
>> +
>> +    /*
>> +     * Cursor mode can change if a plane's format changes, is
>> +     * enabled/disabled, or z-order changes.
>> +     */
>> +    for_each_oldnew_plane_in_state(state, plane, old_plane_state, 
>> plane_state, i) {
>> +
>> +        /* Only care about planes on this CRTC */
>> +        if ((drm_plane_mask(plane) & crtc_state->plane_mask) == 0)
>> +            continue;
>> +
>> +        if (plane->type == DRM_PLANE_TYPE_CURSOR)
>> +            cursor_changed = true;
>> +
>> +        if (drm_atomic_plane_enabling(old_plane_state, plane_state) ||
>> +            drm_atomic_plane_disabling(old_plane_state, plane_state) ||
>> +            old_plane_state->fb->format != plane_state->fb->format) {
>> +            consider_mode_change = true;
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (!consider_mode_change && !crtc_state->zpos_changed) {
>> +        return 0;
>> +    }
>> +
>> +    /*
>> +     * If no cursor change on this CRTC, and not enabled on this 
>> CRTC, then
>> +     * no need to set cursor mode. This avoids needlessly locking the 
>> cursor
>> +     * state.
>> +     */
>> +    if (!cursor_changed &&
>> +        !(drm_plane_mask(crtc_state->crtc->cursor) & 
>> crtc_state->plane_mask)) {
>> +        return 0;
>> +    }
>> +
>> +    plane_state = drm_atomic_get_plane_state(state,
>> +                         crtc_state->crtc->cursor);
>> +    if (IS_ERR(plane_state))
>> +        return PTR_ERR(plane_state);
>> +
>> +    /* Cursor is disabled */
>> +    if (!plane_state->fb)
>> +        return 0;
>> +
>> +    cursor_zpos = plane_state->normalized_zpos;
>> +
>> +    /* Get enabled plane immediately below cursor. */
>> +    target_plane_state = NULL;
>> +    target_zpos = 0;
>> +    drm_for_each_plane_mask(plane, state->dev, crtc_state->plane_mask) {
>> +        if (plane->type == DRM_PLANE_TYPE_CURSOR)
>> +            continue;
>> +
>> +        plane_state = drm_atomic_get_plane_state(state, plane);
>> +        if (IS_ERR(plane_state))
>> +            return PTR_ERR(plane_state);
>> +
>> +        if (!plane_state->fb ||
>> +            plane_state->normalized_zpos >= cursor_zpos)
>> +            continue;
>> +
>> +        if (plane_state->normalized_zpos >= target_zpos) {
>> +            target_zpos = plane_state->normalized_zpos;
>> +            target_plane_state = plane_state;
>> +        }
>> +    }
>> +
>> +    /* Nothing below cursor - use overlay mode */
>> +    if (target_plane_state == NULL) {
>> +        dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
>> +        return 0;
>> +    }
>> +
>> +    if 
>> (amdgpu_dm_plane_is_video_format(target_plane_state->fb->format->format)) {
>> +        dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
>> +    } else {
>> +        dm_crtc_state->cursor_mode = DM_CURSOR_NATIVE_MODE;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   /**
>>    * amdgpu_dm_atomic_check() - Atomic check implementation for AMDgpu 
>> DM.
>>    *
>> @@ -10713,6 +10921,20 @@ static int amdgpu_dm_atomic_check(struct 
>> drm_device *dev,
>>           goto fail;
>>       }
>> +    /*
>> +     * Determine whether cursors on each CRTC should be enabled in 
>> native or
>> +     * overlay mode.
>> +     */
>> +    for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
>> +        dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
>> +
>> +        ret = dm_crtc_set_cursor_mode(state, dm_new_crtc_state);
>> +        if (ret) {
>> +            drm_dbg(dev, "Failed to determine cursor mode\n");
>> +            goto fail;
>> +        }
>> +    }
>> +
>>       /* Remove exiting planes if they are modified */
>>       for_each_oldnew_plane_in_state_reverse(state, plane, 
>> old_plane_state, new_plane_state, i) {
>>           if (old_plane_state->fb && new_plane_state->fb &&
>> @@ -10793,6 +11015,11 @@ static int amdgpu_dm_atomic_check(struct 
>> drm_device *dev,
>>       /* Check cursor planes scaling */
>>       for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
>> +        /* Overlay cusor does not need scaling check */
>> +        dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
>> +        if (dm_new_crtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE)
>> +            continue;
>> +
>>           ret = dm_check_crtc_cursor(state, crtc, new_crtc_state);
>>           if (ret) {
>>               DRM_DEBUG_DRIVER("dm_check_crtc_cursor() failed\n");
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
>> index 09519b7abf67..b8d39fdd1e09 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
>> @@ -822,6 +822,11 @@ struct dm_plane_state {
>>       enum amdgpu_transfer_function blend_tf;
>>   };
>> +enum amdgpu_dm_cursor_mode {
>> +    DM_CURSOR_NATIVE_MODE = 0,
>> +    DM_CURSOR_OVERLAY_MODE,
>> +};
>> +
>>   struct dm_crtc_state {
>>       struct drm_crtc_state base;
>>       struct dc_stream_state *stream;
>> @@ -852,6 +857,8 @@ struct dm_crtc_state {
>>        * encoding.
>>        */
>>       enum amdgpu_transfer_function regamma_tf;
>> +
>> +    enum amdgpu_dm_cursor_mode cursor_mode;
>>   };
>>   #define to_dm_crtc_state(x) container_of(x, struct dm_crtc_state, base)
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
>> index e23a0a276e33..67aea1d2feb9 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
>> @@ -304,6 +304,7 @@ static struct drm_crtc_state 
>> *amdgpu_dm_crtc_duplicate_state(struct drm_crtc *cr
>>       state->regamma_tf = cur->regamma_tf;
>>       state->crc_skip_count = cur->crc_skip_count;
>>       state->mpo_requested = cur->mpo_requested;
>> +    state->cursor_mode = cur->cursor_mode;
>>       /* TODO Duplicate dc_stream after objects are stream object is 
>> flattened */
>>       return &state->base;
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
>> index 8a4c40b4c27e..ed1fc01f1649 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
>> @@ -104,7 +104,7 @@ void 
>> amdgpu_dm_plane_fill_blending_from_plane_state(const struct 
>> drm_plane_state
>>       *global_alpha = false;
>>       *global_alpha_value = 0xff;
>> -    if (plane_state->plane->type != DRM_PLANE_TYPE_OVERLAY)
>> +    if (plane_state->plane->type == DRM_PLANE_TYPE_PRIMARY)
>>           return;
>>       if (plane_state->pixel_blend_mode == DRM_MODE_BLEND_PREMULTI ||
>> @@ -1175,10 +1175,21 @@ static int amdgpu_dm_plane_atomic_check(struct 
>> drm_plane *plane,
>>   static int amdgpu_dm_plane_atomic_async_check(struct drm_plane *plane,
>>                             struct drm_atomic_state *state)
>>   {
>> +    struct drm_crtc_state *new_crtc_state;
>> +    struct drm_plane_state *new_plane_state;
>> +    struct dm_crtc_state *dm_new_crtc_state;
>> +
>>       /* Only support async updates on cursor planes. */
>>       if (plane->type != DRM_PLANE_TYPE_CURSOR)
>>           return -EINVAL;
>> +    new_plane_state = drm_atomic_get_new_plane_state(state, plane);
>> +    new_crtc_state = drm_atomic_get_new_crtc_state(state, 
>> new_plane_state->crtc);
>> +    dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
>> +    /* Reject overlay cursors for now*/
>> +    if (dm_new_crtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE)
>> +        return -EINVAL;
>> +
>>       return 0;
>>   }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode
  2024-03-28 15:52     ` Harry Wentland
@ 2024-04-01 14:38       ` Leo Li
  0 siblings, 0 replies; 27+ messages in thread
From: Leo Li @ 2024-04-01 14:38 UTC (permalink / raw)
  To: Harry Wentland, Robert Mader, dri-devel, amd-gfx
  Cc: Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Pekka Paalanen, Sean Paul, Simon Ser,
	Shashank Sharma, Sebastian Wick



On 2024-03-28 11:52, Harry Wentland wrote:
> 
> 
> On 2024-03-28 11:48, Robert Mader wrote:
>> Hi,
>>
>> On 15.03.24 18:09, sunpeng.li@amd.com wrote:
>>> From: Leo Li <sunpeng.li@amd.com>
>>>
>>> [Why]
>>>
>>> DCN is the display hardware for amdgpu. DRM planes are backed by DCN
>>> hardware pipes, which carry pixel data from one end (memory), to the
>>> other (output encoder).
>>>
>>> Each DCN pipe has the ability to blend in a cursor early on in the
>>> pipeline. In other words, there are no dedicated cursor planes in DCN,
>>> which makes cursor behavior somewhat unintuitive for compositors.
>>>
>>> For example, if the cursor is in RGB format, but the top-most DRM plane
>>> is in YUV format, DCN will not be able to blend them. Because of this,
>>> amdgpu_dm rejects all configurations where a cursor needs to be enabled
>>> on top of a YUV formatted plane.
>>>
>>>  From a compositor's perspective, when computing an allocation for
>>> hardware plane offloading, this cursor-on-yuv configuration result in an
>>> atomic test failure. Since the failure reason is not obvious at all,
>>> compositors will likely fall back to full rendering, which is not ideal.
>>>
>>> Instead, amdgpu_dm can try to accommodate the cursor-on-yuv
>>> configuration by opportunistically reserving a separate DCN pipe just
>>> for the cursor. We can refer to this as "overlay cursor mode". It is
>>> contrasted with "native cursor mode", where the native DCN per-pipe
>>> cursor is used.
>>>
>>> [How]
>>>
>>> On each crtc, compute whether the cursor plane should be enabled in
>>> overlay mode (which currently, is iff the immediate plane below has a
>>> YUV format). If it is, mark the CRTC as requesting overlay cursor mode.
>>
>> iff -> if
>>
>> IIRC another case where this would be needed is when the scale of the plane 
>> below doesn't match the cursor scale. This is especially common for videos and 
>> thus implicitly covered by the YUV check in most cases, but should probably be 
>> made explicit. Michel Dänzer might be able to comment here.
>>
>> Another possible case that could be covered here is when some area is not 
>> covered by any plane and just shows a black background. This happens e.g. if a 
>> compositor puts a YUV surface on the primary plane that has a different 
>> width/high ratio than the display. The compositor can add black bars by just 
>> leaving the area uncovered and thus require only a single hardware plane for 
>> video playback ("Unless explicitly specified (..), the active area of a CRTC 
>> will be black by default." 
>> https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#plane-abstraction). If 
>> the cursor is visible over this black bars, AMD currently refuses the commit 
>> IIUC (see also Michel Dänzers comment at 
>> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3177#note_1924544)
>>
> 
> Thanks for mentioning both of these scenarios. I agree it would be
> good if we can expand the use of the overlay cursor to these cases.
> 
> Harry
> 
>> Thanks,
>>
>> Robert Mader

All good points, thanks for the feedback. I'll take a peek at the scaling + no
underlying plane case and send a v2 when I get around to it.

Leo

>>
>>>
>>> During DC validation, attempt to enable a separate DCN pipe for the
>>> cursor if it's in overlay mode. If that fails, or if no overlay mode is
>>> requested, then fallback to native mode.
>>>
>>> Signed-off-by: Leo Li <sunpeng.li@amd.com>
>>> ---
>>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 309 +++++++++++++++---
>>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
>>>   .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
>>>   .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  13 +-
>>>   4 files changed, 288 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> index 21a61454c878..09ab330aed17 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> @@ -8359,8 +8359,19 @@ static void amdgpu_dm_commit_planes(struct 
>>> drm_atomic_state *state,
>>>        * Disable the cursor first if we're disabling all the planes.
>>>        * It'll remain on the screen after the planes are re-enabled
>>>        * if we don't.
>>> +     *
>>> +     * If the cursor is transitioning from native to overlay mode, the
>>> +     * native cursor needs to be disabled first.
>>>        */
>>> -    if (acrtc_state->active_planes == 0)
>>> +    if (acrtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE &&
>>> +        dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
>>> +        struct dc_cursor_position cursor_position = {0};
>>> +        dc_stream_set_cursor_position(acrtc_state->stream,
>>> +                          &cursor_position);
>>> +    }
>>> +
>>> +    if (acrtc_state->active_planes == 0 &&
>>> +        dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE)
>>>           amdgpu_dm_commit_cursors(state);
>>>       /* update planes when needed */
>>> @@ -8374,7 +8385,8 @@ static void amdgpu_dm_commit_planes(struct 
>>> drm_atomic_state *state,
>>>           struct dm_plane_state *dm_new_plane_state = 
>>> to_dm_plane_state(new_plane_state);
>>>           /* Cursor plane is handled after stream updates */
>>> -        if (plane->type == DRM_PLANE_TYPE_CURSOR) {
>>> +        if (plane->type == DRM_PLANE_TYPE_CURSOR &&
>>> +            acrtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
>>>               if ((fb && crtc == pcrtc) ||
>>>                   (old_plane_state->fb && old_plane_state->crtc == pcrtc))
>>>                   cursor_update = true;
>>> @@ -8727,7 +8739,8 @@ static void amdgpu_dm_commit_planes(struct 
>>> drm_atomic_state *state,
>>>        * This avoids redundant programming in the case where we're going
>>>        * to be disabling a single plane - those pipes are being disabled.
>>>        */
>>> -    if (acrtc_state->active_planes)
>>> +    if (acrtc_state->active_planes &&
>>> +        acrtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE)
>>>           amdgpu_dm_commit_cursors(state);
>>>   cleanup:
>>> @@ -10039,7 +10052,8 @@ static bool should_reset_plane(struct 
>>> drm_atomic_state *state,
>>>   {
>>>       struct drm_plane *other;
>>>       struct drm_plane_state *old_other_state, *new_other_state;
>>> -    struct drm_crtc_state *new_crtc_state;
>>> +    struct drm_crtc_state *old_crtc_state, *new_crtc_state;
>>> +    struct dm_crtc_state *old_dm_crtc_state, *new_dm_crtc_state;
>>>       struct amdgpu_device *adev = drm_to_adev(plane->dev);
>>>       int i;
>>> @@ -10061,10 +10075,24 @@ static bool should_reset_plane(struct 
>>> drm_atomic_state *state,
>>>       new_crtc_state =
>>>           drm_atomic_get_new_crtc_state(state, new_plane_state->crtc);
>>> +    old_crtc_state =
>>> +        drm_atomic_get_old_crtc_state(state, old_plane_state->crtc);
>>>       if (!new_crtc_state)
>>>           return true;
>>> +    /*
>>> +     * A change in cursor mode means a new dc pipe needs to be acquired or
>>> +     * released from the state
>>> +     */
>>> +    old_dm_crtc_state = to_dm_crtc_state(old_crtc_state);
>>> +    new_dm_crtc_state = to_dm_crtc_state(new_crtc_state);
>>> +    if (plane->type == DRM_PLANE_TYPE_CURSOR &&
>>> +        old_dm_crtc_state != NULL &&
>>> +        old_dm_crtc_state->cursor_mode != new_dm_crtc_state->cursor_mode) {
>>> +        return true;
>>> +    }
>>> +
>>>       /* CRTC Degamma changes currently require us to recreate planes. */
>>>       if (new_crtc_state->color_mgmt_changed)
>>>           return true;
>>> @@ -10216,6 +10244,68 @@ static int dm_check_cursor_fb(struct amdgpu_crtc 
>>> *new_acrtc,
>>>       return 0;
>>>   }
>>> +/*
>>> + * Helper function for checking the cursor in native mode
>>> + */
>>> +static int dm_check_native_cursor_state(struct drm_crtc *new_plane_crtc,
>>> +                    struct drm_plane *plane,
>>> +                    struct drm_plane_state *new_plane_state,
>>> +                    bool enable)
>>> +{
>>> +
>>> +    struct amdgpu_crtc *new_acrtc;
>>> +    int ret;
>>> +
>>> +    if (!enable || !new_plane_crtc ||
>>> +        drm_atomic_plane_disabling(plane->state, new_plane_state))
>>> +        return 0;
>>> +
>>> +    new_acrtc = to_amdgpu_crtc(new_plane_crtc);
>>> +
>>> +    if (new_plane_state->src_x != 0 || new_plane_state->src_y != 0) {
>>> +        DRM_DEBUG_ATOMIC("Cropping not supported for cursor plane\n");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    if (new_plane_state->fb) {
>>> +        ret = dm_check_cursor_fb(new_acrtc, new_plane_state,
>>> +                        new_plane_state->fb);
>>> +        if (ret)
>>> +            return ret;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static bool dm_should_update_native_cursor(struct drm_atomic_state *state,
>>> +                       struct drm_crtc *old_plane_crtc,
>>> +                       struct drm_crtc *new_plane_crtc,
>>> +                       bool enable)
>>> +{
>>> +    struct drm_crtc_state *old_crtc_state, *new_crtc_state;
>>> +    struct dm_crtc_state *dm_old_crtc_state, *dm_new_crtc_state;
>>> +
>>> +    if (!enable) {
>>> +        if (old_plane_crtc == NULL)
>>> +            return true;
>>> +
>>> +        old_crtc_state = drm_atomic_get_old_crtc_state(
>>> +            state, old_plane_crtc);
>>> +        dm_old_crtc_state = to_dm_crtc_state(old_crtc_state);
>>> +
>>> +        return dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE;
>>> +    } else {
>>> +        if (new_plane_crtc == NULL)
>>> +            return true;
>>> +
>>> +        new_crtc_state = drm_atomic_get_new_crtc_state(
>>> +            state, new_plane_crtc);
>>> +        dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
>>> +
>>> +        return dm_new_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE;
>>> +    }
>>> +}
>>> +
>>>   static int dm_update_plane_state(struct dc *dc,
>>>                    struct drm_atomic_state *state,
>>>                    struct drm_plane *plane,
>>> @@ -10231,8 +10321,7 @@ static int dm_update_plane_state(struct dc *dc,
>>>       struct drm_crtc_state *old_crtc_state, *new_crtc_state;
>>>       struct dm_crtc_state *dm_new_crtc_state, *dm_old_crtc_state;
>>>       struct dm_plane_state *dm_new_plane_state, *dm_old_plane_state;
>>> -    struct amdgpu_crtc *new_acrtc;
>>> -    bool needs_reset;
>>> +    bool needs_reset, update_native_cursor;
>>>       int ret = 0;
>>> @@ -10241,24 +10330,16 @@ static int dm_update_plane_state(struct dc *dc,
>>>       dm_new_plane_state = to_dm_plane_state(new_plane_state);
>>>       dm_old_plane_state = to_dm_plane_state(old_plane_state);
>>> -    if (plane->type == DRM_PLANE_TYPE_CURSOR) {
>>> -        if (!enable || !new_plane_crtc ||
>>> -            drm_atomic_plane_disabling(plane->state, new_plane_state))
>>> -            return 0;
>>> -
>>> -        new_acrtc = to_amdgpu_crtc(new_plane_crtc);
>>> -
>>> -        if (new_plane_state->src_x != 0 || new_plane_state->src_y != 0) {
>>> -            DRM_DEBUG_ATOMIC("Cropping not supported for cursor plane\n");
>>> -            return -EINVAL;
>>> -        }
>>> +    update_native_cursor = dm_should_update_native_cursor(state,
>>> +                                  old_plane_crtc,
>>> +                                  new_plane_crtc,
>>> +                                  enable);
>>> -        if (new_plane_state->fb) {
>>> -            ret = dm_check_cursor_fb(new_acrtc, new_plane_state,
>>> -                         new_plane_state->fb);
>>> -            if (ret)
>>> -                return ret;
>>> -        }
>>> +    if (plane->type == DRM_PLANE_TYPE_CURSOR && update_native_cursor) {
>>> +        ret = dm_check_native_cursor_state(new_plane_crtc, plane,
>>> +                                new_plane_state, enable);
>>> +        if (ret)
>>> +            return ret;
>>>           return 0;
>>>       }
>>> @@ -10285,16 +10366,17 @@ static int dm_update_plane_state(struct dc *dc,
>>>                   plane->base.id, old_plane_crtc->base.id);
>>>           ret = dm_atomic_get_state(state, &dm_state);
>>> -        if (ret)
>>> -            return ret;
>>> +        if (ret) {
>>> +            goto out;
>>> +        }
>>>           if (!dc_state_remove_plane(
>>>                   dc,
>>>                   dm_old_crtc_state->stream,
>>>                   dm_old_plane_state->dc_state,
>>>                   dm_state->context)) {
>>> -
>>> -            return -EINVAL;
>>> +            ret = -EINVAL;
>>> +            goto out;
>>>           }
>>>           if (dm_old_plane_state->dc_state)
>>> @@ -10323,21 +10405,16 @@ static int dm_update_plane_state(struct dc *dc,
>>>               return 0;
>>>           ret = amdgpu_dm_plane_helper_check_state(new_plane_state, 
>>> new_crtc_state);
>>> -        if (ret)
>>> -            return ret;
>>> +        if (ret) {
>>> +            goto out;
>>> +        }
>>>           WARN_ON(dm_new_plane_state->dc_state);
>>>           dc_new_plane_state = dc_create_plane_state(dc);
>>> -        if (!dc_new_plane_state)
>>> -            return -ENOMEM;
>>> -
>>> -        /* Block top most plane from being a video plane */
>>> -        if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
>>> -            if 
>>> (amdgpu_dm_plane_is_video_format(new_plane_state->fb->format->format) && 
>>> *is_top_most_overlay)
>>> -                return -EINVAL;
>>> -
>>> -            *is_top_most_overlay = false;
>>> +        if (!dc_new_plane_state) {
>>> +            ret = -ENOMEM;
>>> +            goto out;
>>>           }
>>>           DRM_DEBUG_ATOMIC("Enabling DRM plane: %d on DRM crtc %d\n",
>>> @@ -10350,13 +10427,13 @@ static int dm_update_plane_state(struct dc *dc,
>>>               new_crtc_state);
>>>           if (ret) {
>>>               dc_plane_state_release(dc_new_plane_state);
>>> -            return ret;
>>> +            goto out;
>>>           }
>>>           ret = dm_atomic_get_state(state, &dm_state);
>>>           if (ret) {
>>>               dc_plane_state_release(dc_new_plane_state);
>>> -            return ret;
>>> +            goto out;
>>>           }
>>>           /*
>>> @@ -10373,7 +10450,8 @@ static int dm_update_plane_state(struct dc *dc,
>>>                   dm_state->context)) {
>>>               dc_plane_state_release(dc_new_plane_state);
>>> -            return -EINVAL;
>>> +            ret = -EINVAL;
>>> +            goto out;
>>>           }
>>>           dm_new_plane_state->dc_state = dc_new_plane_state;
>>> @@ -10388,6 +10466,16 @@ static int dm_update_plane_state(struct dc *dc,
>>>           *lock_and_validation_needed = true;
>>>       }
>>> +out:
>>> +    /* If cursor overlay failed, attempt fallback to native mode */
>>> +    if (ret == -EINVAL && plane->type == DRM_PLANE_TYPE_CURSOR) {
>>> +        ret = dm_check_native_cursor_state(new_plane_crtc, plane,
>>> +                            new_plane_state, enable);
>>> +        if (ret) {
>>> +            return ret;
>>> +        }
>>> +        dm_new_crtc_state->cursor_mode = DM_CURSOR_NATIVE_MODE;
>>> +    }
>>>       return ret;
>>>   }
>>> @@ -10544,6 +10632,126 @@ static int add_affected_mst_dsc_crtcs(struct 
>>> drm_atomic_state *state, struct drm
>>>       return drm_dp_mst_add_affected_dsc_crtcs(state, 
>>> &aconnector->mst_root->mst_mgr);
>>>   }
>>> +/**
>>> + * DOC: Cursor Modes - Native vs Overlay
>>> + *
>>> + * In native mode, the cursor uses a integrated cursor pipe within each DCN hw
>>> + * plane. It does not require a dedicated hw plane to enable, but it is
>>> + * subjected to the same z-order and scaling as the hw plane. It also has 
>>> format
>>> + * restrictions, a RGB cursor in native mode cannot be enabled within a non-RGB
>>> + * hw plane.
>>> + *
>>> + * In overlay mode, the cursor uses a separate DCN hw plane, and thus has its
>>> + * own scaling and z-pos. It also has no blending restrictions. It lends to a
>>> + * cursor behavior more akin to a DRM client's expectations. However, it does
>>> + * occupy an extra DCN plane, and therefore will only be used if a DCN plane is
>>> + * available.
>>> +*/
>>> +
>>> +/**
>>> + * Set whether the cursor should be enabled in native mode, or overlay mode, on
>>> + * the dm_crtc_state.
>>> + *
>>> + * The cursor should be enabled in overlay mode if the immediate underlying
>>> + * plane contains a video format.
>>> + *
>>> + * Since zpos info is required, drm_atomic_normalize_zpos must be called before
>>> + * calling this function.
>>> +*/
>>> +static int dm_crtc_set_cursor_mode(struct drm_atomic_state *state,
>>> +                    struct dm_crtc_state *dm_crtc_state)
>>> +{
>>> +    struct drm_plane_state *plane_state, *old_plane_state, *target_plane_state;
>>> +    struct drm_crtc_state *crtc_state = &dm_crtc_state->base;
>>> +    struct drm_plane *plane;
>>> +    bool consider_mode_change = false;
>>> +    bool cursor_changed = false;
>>> +    unsigned int target_zpos;
>>> +    unsigned int cursor_zpos;
>>> +    int i;
>>> +
>>> +    /*
>>> +     * Cursor mode can change if a plane's format changes, is
>>> +     * enabled/disabled, or z-order changes.
>>> +     */
>>> +    for_each_oldnew_plane_in_state(state, plane, old_plane_state, 
>>> plane_state, i) {
>>> +
>>> +        /* Only care about planes on this CRTC */
>>> +        if ((drm_plane_mask(plane) & crtc_state->plane_mask) == 0)
>>> +            continue;
>>> +
>>> +        if (plane->type == DRM_PLANE_TYPE_CURSOR)
>>> +            cursor_changed = true;
>>> +
>>> +        if (drm_atomic_plane_enabling(old_plane_state, plane_state) ||
>>> +            drm_atomic_plane_disabling(old_plane_state, plane_state) ||
>>> +            old_plane_state->fb->format != plane_state->fb->format) {
>>> +            consider_mode_change = true;
>>> +            break;
>>> +        }
>>> +    }
>>> +
>>> +    if (!consider_mode_change && !crtc_state->zpos_changed) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    /*
>>> +     * If no cursor change on this CRTC, and not enabled on this CRTC, then
>>> +     * no need to set cursor mode. This avoids needlessly locking the cursor
>>> +     * state.
>>> +     */
>>> +    if (!cursor_changed &&
>>> +        !(drm_plane_mask(crtc_state->crtc->cursor) & crtc_state->plane_mask)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    plane_state = drm_atomic_get_plane_state(state,
>>> +                         crtc_state->crtc->cursor);
>>> +    if (IS_ERR(plane_state))
>>> +        return PTR_ERR(plane_state);
>>> +
>>> +    /* Cursor is disabled */
>>> +    if (!plane_state->fb)
>>> +        return 0;
>>> +
>>> +    cursor_zpos = plane_state->normalized_zpos;
>>> +
>>> +    /* Get enabled plane immediately below cursor. */
>>> +    target_plane_state = NULL;
>>> +    target_zpos = 0;
>>> +    drm_for_each_plane_mask(plane, state->dev, crtc_state->plane_mask) {
>>> +        if (plane->type == DRM_PLANE_TYPE_CURSOR)
>>> +            continue;
>>> +
>>> +        plane_state = drm_atomic_get_plane_state(state, plane);
>>> +        if (IS_ERR(plane_state))
>>> +            return PTR_ERR(plane_state);
>>> +
>>> +        if (!plane_state->fb ||
>>> +            plane_state->normalized_zpos >= cursor_zpos)
>>> +            continue;
>>> +
>>> +        if (plane_state->normalized_zpos >= target_zpos) {
>>> +            target_zpos = plane_state->normalized_zpos;
>>> +            target_plane_state = plane_state;
>>> +        }
>>> +    }
>>> +
>>> +    /* Nothing below cursor - use overlay mode */
>>> +    if (target_plane_state == NULL) {
>>> +        dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
>>> +        return 0;
>>> +    }
>>> +
>>> +    if 
>>> (amdgpu_dm_plane_is_video_format(target_plane_state->fb->format->format)) {
>>> +        dm_crtc_state->cursor_mode = DM_CURSOR_OVERLAY_MODE;
>>> +    } else {
>>> +        dm_crtc_state->cursor_mode = DM_CURSOR_NATIVE_MODE;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   /**
>>>    * amdgpu_dm_atomic_check() - Atomic check implementation for AMDgpu DM.
>>>    *
>>> @@ -10713,6 +10921,20 @@ static int amdgpu_dm_atomic_check(struct drm_device 
>>> *dev,
>>>           goto fail;
>>>       }
>>> +    /*
>>> +     * Determine whether cursors on each CRTC should be enabled in native or
>>> +     * overlay mode.
>>> +     */
>>> +    for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
>>> +        dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
>>> +
>>> +        ret = dm_crtc_set_cursor_mode(state, dm_new_crtc_state);
>>> +        if (ret) {
>>> +            drm_dbg(dev, "Failed to determine cursor mode\n");
>>> +            goto fail;
>>> +        }
>>> +    }
>>> +
>>>       /* Remove exiting planes if they are modified */
>>>       for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, 
>>> new_plane_state, i) {
>>>           if (old_plane_state->fb && new_plane_state->fb &&
>>> @@ -10793,6 +11015,11 @@ static int amdgpu_dm_atomic_check(struct drm_device 
>>> *dev,
>>>       /* Check cursor planes scaling */
>>>       for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
>>> +        /* Overlay cusor does not need scaling check */
>>> +        dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
>>> +        if (dm_new_crtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE)
>>> +            continue;
>>> +
>>>           ret = dm_check_crtc_cursor(state, crtc, new_crtc_state);
>>>           if (ret) {
>>>               DRM_DEBUG_DRIVER("dm_check_crtc_cursor() failed\n");
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h 
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
>>> index 09519b7abf67..b8d39fdd1e09 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
>>> @@ -822,6 +822,11 @@ struct dm_plane_state {
>>>       enum amdgpu_transfer_function blend_tf;
>>>   };
>>> +enum amdgpu_dm_cursor_mode {
>>> +    DM_CURSOR_NATIVE_MODE = 0,
>>> +    DM_CURSOR_OVERLAY_MODE,
>>> +};
>>> +
>>>   struct dm_crtc_state {
>>>       struct drm_crtc_state base;
>>>       struct dc_stream_state *stream;
>>> @@ -852,6 +857,8 @@ struct dm_crtc_state {
>>>        * encoding.
>>>        */
>>>       enum amdgpu_transfer_function regamma_tf;
>>> +
>>> +    enum amdgpu_dm_cursor_mode cursor_mode;
>>>   };
>>>   #define to_dm_crtc_state(x) container_of(x, struct dm_crtc_state, base)
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c 
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
>>> index e23a0a276e33..67aea1d2feb9 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
>>> @@ -304,6 +304,7 @@ static struct drm_crtc_state 
>>> *amdgpu_dm_crtc_duplicate_state(struct drm_crtc *cr
>>>       state->regamma_tf = cur->regamma_tf;
>>>       state->crc_skip_count = cur->crc_skip_count;
>>>       state->mpo_requested = cur->mpo_requested;
>>> +    state->cursor_mode = cur->cursor_mode;
>>>       /* TODO Duplicate dc_stream after objects are stream object is 
>>> flattened */
>>>       return &state->base;
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
>>> index 8a4c40b4c27e..ed1fc01f1649 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
>>> @@ -104,7 +104,7 @@ void amdgpu_dm_plane_fill_blending_from_plane_state(const 
>>> struct drm_plane_state
>>>       *global_alpha = false;
>>>       *global_alpha_value = 0xff;
>>> -    if (plane_state->plane->type != DRM_PLANE_TYPE_OVERLAY)
>>> +    if (plane_state->plane->type == DRM_PLANE_TYPE_PRIMARY)
>>>           return;
>>>       if (plane_state->pixel_blend_mode == DRM_MODE_BLEND_PREMULTI ||
>>> @@ -1175,10 +1175,21 @@ static int amdgpu_dm_plane_atomic_check(struct 
>>> drm_plane *plane,
>>>   static int amdgpu_dm_plane_atomic_async_check(struct drm_plane *plane,
>>>                             struct drm_atomic_state *state)
>>>   {
>>> +    struct drm_crtc_state *new_crtc_state;
>>> +    struct drm_plane_state *new_plane_state;
>>> +    struct dm_crtc_state *dm_new_crtc_state;
>>> +
>>>       /* Only support async updates on cursor planes. */
>>>       if (plane->type != DRM_PLANE_TYPE_CURSOR)
>>>           return -EINVAL;
>>> +    new_plane_state = drm_atomic_get_new_plane_state(state, plane);
>>> +    new_crtc_state = drm_atomic_get_new_crtc_state(state, 
>>> new_plane_state->crtc);
>>> +    dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
>>> +    /* Reject overlay cursors for now*/
>>> +    if (dm_new_crtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE)
>>> +        return -EINVAL;
>>> +
>>>       return 0;
>>>   }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-03-28 14:33 ` [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible Pekka Paalanen
@ 2024-04-03 21:32   ` Leo Li
  2024-04-04 10:24     ` Pekka Paalanen
  0 siblings, 1 reply; 27+ messages in thread
From: Leo Li @ 2024-04-03 21:32 UTC (permalink / raw)
  To: Pekka Paalanen, Marius Vlad
  Cc: dri-devel, amd-gfx, Joshua Ashton, Michel Dänzer, Chao Guo,
	Xaver Hugl, Vikas Korjani, Robert Mader, Sean Paul, Simon Ser,
	Shashank Sharma, Harry Wentland, Sebastian Wick



On 2024-03-28 10:33, Pekka Paalanen wrote:
> On Fri, 15 Mar 2024 13:09:56 -0400
> <sunpeng.li@amd.com> wrote:
> 
>> From: Leo Li <sunpeng.li@amd.com>
>>
>> These patches aim to make the amdgpgu KMS driver play nicer with compositors
>> when building multi-plane scanout configurations. They do so by:
>>
>> 1. Making cursor behavior more sensible.
>> 2. Allowing placement of DRM OVERLAY planes underneath the PRIMARY plane for
>>     'underlay' configurations (perhaps more of a RFC, see below).
>>
>> Please see the commit messages for details.
>>
>>
>> For #2, the simplest way to accomplish this was to increase the value of the
>> immutable zpos property for the PRIMARY plane. This allowed OVERLAY planes with
>> a mutable zpos range of (0-254) to be positioned underneath the PRIMARY for an
>> underlay scanout configuration.
>>
>> Technically speaking, DCN hardware does not have a concept of primary or overlay
>> planes - there are simply 4 general purpose hardware pipes that can be maped in
>> any configuration. So the immutable zpos restriction on the PRIMARY plane is
>> kind of arbitrary; it can have a mutable range of (0-254) just like the
>> OVERLAYs. The distinction between PRIMARY and OVERLAY planes is also somewhat
>> arbitrary. We can interpret PRIMARY as the first plane that should be enabled on
>> a CRTC, but beyond that, it doesn't mean much for amdgpu.
>>
>> Therefore, I'm curious about how compositors devs understand KMS planes and
>> their zpos properties, and how we would like to use them. It isn't clear to me
>> how compositors wish to interpret and use the DRM zpos property, or
>> differentiate between OVERLAY and PRIMARY planes, when it comes to setting up
>> multi-plane scanout.
> 
> You already quoted me on the Weston link, so I don't think I have
> anything to add. Sounds fine to me, and we don't have a standard plane
> arrangement algorithm that the kernel could optimize zpos ranges
> against, yet.
> 
>> Ultimately, what I'd like to answer is "What can we do on the KMS driver and DRM
>> plane API side, that can make building multi-plane scanout configurations easier
>> for compositors?" I'm hoping we can converge on something, whether that be
>> updating the existing documentation to better define the usage, or update the
>> API to provide support for something that is lacking.
> 
> I think there probably should be a standardised plane arrangement
> algorithm in userspace, because the search space suffers from
> permutational explosion. Either there needs to be very few planes (max
> 4 or 5 at-all-possible per CRTC, including shareable ones) for an
> exhaustive search to be feasible, or all planes should be more or less
> equal in capabilities and userspace employs some simplified or
> heuristic search.
> 
> If the search algorithm is fixed, then drivers could optimize zpos
> ranges to have the algorithm find a solution faster.
> 
> My worry is that userspace already has heuristic search algorithms that
> may start failing if drivers later change their zpos ranges to be more
> optimal for another algorithm.
> 
> OTOH, as long as exhaustive search is feasible, then it does not matter
> how DRM drivers set up the zpos ranges.
> 
> In any case, the zpos ranges should try to allow all possible plane
> arrangements while minimizing the number of arrangements that won't
> work. The absolute values of zpos are pretty much irrelevant, so I
> think setting one plane to have an immutable zpos is a good idea, even
> if it's not necessary by the driver. That is one less moving part, and
> only the relative ordering between the planes matters.
> 
> 
> Thanks,
> pq

Right, thanks for your thoughts! I agree that there should be a common plane
arrangement algorithm. I think libliftoff is the most obvious candidate here. It
only handles overlay arrangements currently, but mixed-mode arrangements is
something I've been trying to look at.

Taking the driver's reported zpos into account could narrow down the search
space for mixed arrangements. We could tell whether underlay, or overlay, or
both, is supported by looking at the allowed zpos ranges.

I also wonder if it'll make underlay assignments easier. libliftoff has an
assumption that the PRIMARY plane has the lowest zpos (which now I realize, is
not always true). Therefore, the underlay buffer has to be placed on the
PRIMARY, with the render buffer on a higher OVERLAY. Swapping buffers between
planes when testing mixed-arrangements is kind of awkward, and simply setting
the OVERLAY's zpos to be lower or higher than the PRIMARY's sounds simpler.

Currently only gamescope makes use of libliftoff, but I'm curious if patches
hooking it up to Weston would be welcomed? If there are other ways to have a
common arrangement algorithm, I'd be happy to hear that as well.

Note that libliftoff's algorithm is more complex than weston, since it searches
harder, and suffers from that permutational explosion. But it solves that by
trying high benefit arrangements first (offloading surfaces that update
frequently), and bailing out once the search reaches a hard-coded deadline.
Since it's currently overlay-only, the goal could be to "simply" have no
regressions.

Thanks,
Leo

> 
>> Some links to provide context and details:
>> * What is underlay?: https://gitlab.freedesktop.org/emersion/libliftoff/-/issues/76
>> * Discussion on how to implement underlay on Weston: https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2325164
>>
>> Cc: Joshua Ashton <joshua@froggi.es>
>> Cc: Michel Dänzer <mdaenzer@redhat.com>
>> Cc: Chao Guo <chao.guo@nxp.com>
>> Cc: Xaver Hugl <xaver.hugl@gmail.com>
>> Cc: Vikas Korjani <Vikas.Korjani@amd.com>
>> Cc: Robert Mader <robert.mader@posteo.de>
>> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
>> Cc: Sean Paul <sean@poorly.run>
>> Cc: Simon Ser <contact@emersion.fr>
>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>> Cc: Harry Wentland <harry.wentland@amd.com>
>> Cc: Sebastian Wick <sebastian.wick@redhat.com>
>>
>> Leo Li (2):
>>    drm/amd/display: Introduce overlay cursor mode
>>    drm/amd/display: Move PRIMARY plane zpos higher
>>
>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 405 ++++++++++++++++--
>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
>>   .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
>>   .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  28 +-
>>   4 files changed, 391 insertions(+), 50 deletions(-)
>>
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-03 21:32   ` Leo Li
@ 2024-04-04 10:24     ` Pekka Paalanen
  2024-04-04 13:59       ` Harry Wentland
  0 siblings, 1 reply; 27+ messages in thread
From: Pekka Paalanen @ 2024-04-04 10:24 UTC (permalink / raw)
  To: Leo Li
  Cc: Marius Vlad, dri-devel, amd-gfx, Joshua Ashton,
	Michel Dänzer, Chao Guo, Xaver Hugl, Vikas Korjani,
	Robert Mader, Sean Paul, Simon Ser, Shashank Sharma,
	Harry Wentland, Sebastian Wick

[-- Attachment #1: Type: text/plain, Size: 7533 bytes --]

On Wed, 3 Apr 2024 17:32:46 -0400
Leo Li <sunpeng.li@amd.com> wrote:

> On 2024-03-28 10:33, Pekka Paalanen wrote:
> > On Fri, 15 Mar 2024 13:09:56 -0400
> > <sunpeng.li@amd.com> wrote:
> >   
> >> From: Leo Li <sunpeng.li@amd.com>
> >>
> >> These patches aim to make the amdgpgu KMS driver play nicer with compositors
> >> when building multi-plane scanout configurations. They do so by:
> >>
> >> 1. Making cursor behavior more sensible.
> >> 2. Allowing placement of DRM OVERLAY planes underneath the PRIMARY plane for
> >>     'underlay' configurations (perhaps more of a RFC, see below).
> >>
> >> Please see the commit messages for details.
> >>
> >>
> >> For #2, the simplest way to accomplish this was to increase the value of the
> >> immutable zpos property for the PRIMARY plane. This allowed OVERLAY planes with
> >> a mutable zpos range of (0-254) to be positioned underneath the PRIMARY for an
> >> underlay scanout configuration.
> >>
> >> Technically speaking, DCN hardware does not have a concept of primary or overlay
> >> planes - there are simply 4 general purpose hardware pipes that can be maped in
> >> any configuration. So the immutable zpos restriction on the PRIMARY plane is
> >> kind of arbitrary; it can have a mutable range of (0-254) just like the
> >> OVERLAYs. The distinction between PRIMARY and OVERLAY planes is also somewhat
> >> arbitrary. We can interpret PRIMARY as the first plane that should be enabled on
> >> a CRTC, but beyond that, it doesn't mean much for amdgpu.
> >>
> >> Therefore, I'm curious about how compositors devs understand KMS planes and
> >> their zpos properties, and how we would like to use them. It isn't clear to me
> >> how compositors wish to interpret and use the DRM zpos property, or
> >> differentiate between OVERLAY and PRIMARY planes, when it comes to setting up
> >> multi-plane scanout.  
> > 
> > You already quoted me on the Weston link, so I don't think I have
> > anything to add. Sounds fine to me, and we don't have a standard plane
> > arrangement algorithm that the kernel could optimize zpos ranges
> > against, yet.
> >   
> >> Ultimately, what I'd like to answer is "What can we do on the KMS driver and DRM
> >> plane API side, that can make building multi-plane scanout configurations easier
> >> for compositors?" I'm hoping we can converge on something, whether that be
> >> updating the existing documentation to better define the usage, or update the
> >> API to provide support for something that is lacking.  
> > 
> > I think there probably should be a standardised plane arrangement
> > algorithm in userspace, because the search space suffers from
> > permutational explosion. Either there needs to be very few planes (max
> > 4 or 5 at-all-possible per CRTC, including shareable ones) for an
> > exhaustive search to be feasible, or all planes should be more or less
> > equal in capabilities and userspace employs some simplified or
> > heuristic search.
> > 
> > If the search algorithm is fixed, then drivers could optimize zpos
> > ranges to have the algorithm find a solution faster.
> > 
> > My worry is that userspace already has heuristic search algorithms that
> > may start failing if drivers later change their zpos ranges to be more
> > optimal for another algorithm.
> > 
> > OTOH, as long as exhaustive search is feasible, then it does not matter
> > how DRM drivers set up the zpos ranges.
> > 
> > In any case, the zpos ranges should try to allow all possible plane
> > arrangements while minimizing the number of arrangements that won't
> > work. The absolute values of zpos are pretty much irrelevant, so I
> > think setting one plane to have an immutable zpos is a good idea, even
> > if it's not necessary by the driver. That is one less moving part, and
> > only the relative ordering between the planes matters.
> > 
> > 
> > Thanks,
> > pq  
> 
> Right, thanks for your thoughts! I agree that there should be a common plane
> arrangement algorithm. I think libliftoff is the most obvious candidate here. It
> only handles overlay arrangements currently, but mixed-mode arrangements is
> something I've been trying to look at.
> 
> Taking the driver's reported zpos into account could narrow down the search
> space for mixed arrangements. We could tell whether underlay, or overlay, or
> both, is supported by looking at the allowed zpos ranges.
> 
> I also wonder if it'll make underlay assignments easier. libliftoff has an
> assumption that the PRIMARY plane has the lowest zpos (which now I realize, is
> not always true). Therefore, the underlay buffer has to be placed on the
> PRIMARY, with the render buffer on a higher OVERLAY. Swapping buffers between
> planes when testing mixed-arrangements is kind of awkward, and simply setting
> the OVERLAY's zpos to be lower or higher than the PRIMARY's sounds simpler.
> 
> Currently only gamescope makes use of libliftoff, but I'm curious if patches
> hooking it up to Weston would be welcomed? If there are other ways to have a
> common arrangement algorithm, I'd be happy to hear that as well.

A natural thing would be to document such an algorithm with the KMS
UAPI.

I don't know libliftoff well enough to say how welcome it would be in
Weston. I have no fundamental or policy reason to keep an independent
implementation in Weston though, so it's plausible at least.

It would need investigation, and perhaps also extending Weston test
suite a lot more towards VKMS to verify plane assignments. Currently
all plane assignment testing is manual on real hardware.

> Note that libliftoff's algorithm is more complex than weston, since it searches
> harder, and suffers from that permutational explosion. But it solves that by
> trying high benefit arrangements first (offloading surfaces that update
> frequently), and bailing out once the search reaches a hard-coded deadline.
> Since it's currently overlay-only, the goal could be to "simply" have no
> regressions.

Ensuring no regressions would indeed need to be taken care of by
extending the VKMS-based automated testing.


Thanks,
pq

> >   
> >> Some links to provide context and details:
> >> * What is underlay?: https://gitlab.freedesktop.org/emersion/libliftoff/-/issues/76
> >> * Discussion on how to implement underlay on Weston: https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2325164
> >>
> >> Cc: Joshua Ashton <joshua@froggi.es>
> >> Cc: Michel Dänzer <mdaenzer@redhat.com>
> >> Cc: Chao Guo <chao.guo@nxp.com>
> >> Cc: Xaver Hugl <xaver.hugl@gmail.com>
> >> Cc: Vikas Korjani <Vikas.Korjani@amd.com>
> >> Cc: Robert Mader <robert.mader@posteo.de>
> >> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
> >> Cc: Sean Paul <sean@poorly.run>
> >> Cc: Simon Ser <contact@emersion.fr>
> >> Cc: Shashank Sharma <shashank.sharma@amd.com>
> >> Cc: Harry Wentland <harry.wentland@amd.com>
> >> Cc: Sebastian Wick <sebastian.wick@redhat.com>
> >>
> >> Leo Li (2):
> >>    drm/amd/display: Introduce overlay cursor mode
> >>    drm/amd/display: Move PRIMARY plane zpos higher
> >>
> >>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 405 ++++++++++++++++--
> >>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
> >>   .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
> >>   .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  28 +-
> >>   4 files changed, 391 insertions(+), 50 deletions(-)
> >>  
> >   


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-04 10:24     ` Pekka Paalanen
@ 2024-04-04 13:59       ` Harry Wentland
  2024-04-04 14:22         ` Marius Vlad
  0 siblings, 1 reply; 27+ messages in thread
From: Harry Wentland @ 2024-04-04 13:59 UTC (permalink / raw)
  To: Pekka Paalanen, Leo Li
  Cc: Marius Vlad, dri-devel, amd-gfx, Joshua Ashton,
	Michel Dänzer, Chao Guo, Xaver Hugl, Vikas Korjani,
	Robert Mader, Sean Paul, Simon Ser, Shashank Sharma,
	Sebastian Wick



On 2024-04-04 06:24, Pekka Paalanen wrote:
> On Wed, 3 Apr 2024 17:32:46 -0400
> Leo Li <sunpeng.li@amd.com> wrote:
> 
>> On 2024-03-28 10:33, Pekka Paalanen wrote:
>>> On Fri, 15 Mar 2024 13:09:56 -0400
>>> <sunpeng.li@amd.com> wrote:
>>>   
>>>> From: Leo Li <sunpeng.li@amd.com>
>>>>
>>>> These patches aim to make the amdgpgu KMS driver play nicer with compositors
>>>> when building multi-plane scanout configurations. They do so by:
>>>>
>>>> 1. Making cursor behavior more sensible.
>>>> 2. Allowing placement of DRM OVERLAY planes underneath the PRIMARY plane for
>>>>     'underlay' configurations (perhaps more of a RFC, see below).
>>>>
>>>> Please see the commit messages for details.
>>>>
>>>>
>>>> For #2, the simplest way to accomplish this was to increase the value of the
>>>> immutable zpos property for the PRIMARY plane. This allowed OVERLAY planes with
>>>> a mutable zpos range of (0-254) to be positioned underneath the PRIMARY for an
>>>> underlay scanout configuration.
>>>>
>>>> Technically speaking, DCN hardware does not have a concept of primary or overlay
>>>> planes - there are simply 4 general purpose hardware pipes that can be maped in
>>>> any configuration. So the immutable zpos restriction on the PRIMARY plane is
>>>> kind of arbitrary; it can have a mutable range of (0-254) just like the
>>>> OVERLAYs. The distinction between PRIMARY and OVERLAY planes is also somewhat
>>>> arbitrary. We can interpret PRIMARY as the first plane that should be enabled on
>>>> a CRTC, but beyond that, it doesn't mean much for amdgpu.
>>>>
>>>> Therefore, I'm curious about how compositors devs understand KMS planes and
>>>> their zpos properties, and how we would like to use them. It isn't clear to me
>>>> how compositors wish to interpret and use the DRM zpos property, or
>>>> differentiate between OVERLAY and PRIMARY planes, when it comes to setting up
>>>> multi-plane scanout.  
>>>
>>> You already quoted me on the Weston link, so I don't think I have
>>> anything to add. Sounds fine to me, and we don't have a standard plane
>>> arrangement algorithm that the kernel could optimize zpos ranges
>>> against, yet.
>>>   
>>>> Ultimately, what I'd like to answer is "What can we do on the KMS driver and DRM
>>>> plane API side, that can make building multi-plane scanout configurations easier
>>>> for compositors?" I'm hoping we can converge on something, whether that be
>>>> updating the existing documentation to better define the usage, or update the
>>>> API to provide support for something that is lacking.  
>>>
>>> I think there probably should be a standardised plane arrangement
>>> algorithm in userspace, because the search space suffers from
>>> permutational explosion. Either there needs to be very few planes (max
>>> 4 or 5 at-all-possible per CRTC, including shareable ones) for an
>>> exhaustive search to be feasible, or all planes should be more or less
>>> equal in capabilities and userspace employs some simplified or
>>> heuristic search.
>>>
>>> If the search algorithm is fixed, then drivers could optimize zpos
>>> ranges to have the algorithm find a solution faster.
>>>
>>> My worry is that userspace already has heuristic search algorithms that
>>> may start failing if drivers later change their zpos ranges to be more
>>> optimal for another algorithm.
>>>
>>> OTOH, as long as exhaustive search is feasible, then it does not matter
>>> how DRM drivers set up the zpos ranges.
>>>
>>> In any case, the zpos ranges should try to allow all possible plane
>>> arrangements while minimizing the number of arrangements that won't
>>> work. The absolute values of zpos are pretty much irrelevant, so I
>>> think setting one plane to have an immutable zpos is a good idea, even
>>> if it's not necessary by the driver. That is one less moving part, and
>>> only the relative ordering between the planes matters.
>>>
>>>
>>> Thanks,
>>> pq  
>>
>> Right, thanks for your thoughts! I agree that there should be a common plane
>> arrangement algorithm. I think libliftoff is the most obvious candidate here. It
>> only handles overlay arrangements currently, but mixed-mode arrangements is
>> something I've been trying to look at.
>>
>> Taking the driver's reported zpos into account could narrow down the search
>> space for mixed arrangements. We could tell whether underlay, or overlay, or
>> both, is supported by looking at the allowed zpos ranges.
>>
>> I also wonder if it'll make underlay assignments easier. libliftoff has an
>> assumption that the PRIMARY plane has the lowest zpos (which now I realize, is
>> not always true). Therefore, the underlay buffer has to be placed on the
>> PRIMARY, with the render buffer on a higher OVERLAY. Swapping buffers between
>> planes when testing mixed-arrangements is kind of awkward, and simply setting
>> the OVERLAY's zpos to be lower or higher than the PRIMARY's sounds simpler.
>>
>> Currently only gamescope makes use of libliftoff, but I'm curious if patches
>> hooking it up to Weston would be welcomed? If there are other ways to have a
>> common arrangement algorithm, I'd be happy to hear that as well.
> 
> A natural thing would be to document such an algorithm with the KMS
> UAPI.
> 
> I don't know libliftoff well enough to say how welcome it would be in
> Weston. I have no fundamental or policy reason to keep an independent
> implementation in Weston though, so it's plausible at least.
> 
> It would need investigation, and perhaps also extending Weston test
> suite a lot more towards VKMS to verify plane assignments. Currently
> all plane assignment testing is manual on real hardware.
> 

It looks like VKMS doesn't have explicit zpos yet, so someone would
probably need to add that.

https://drmdb.emersion.fr/properties/4008636142/zpos

Harry

>> Note that libliftoff's algorithm is more complex than weston, since it searches
>> harder, and suffers from that permutational explosion. But it solves that by
>> trying high benefit arrangements first (offloading surfaces that update
>> frequently), and bailing out once the search reaches a hard-coded deadline.
>> Since it's currently overlay-only, the goal could be to "simply" have no
>> regressions.
> 
> Ensuring no regressions would indeed need to be taken care of by
> extending the VKMS-based automated testing.
> 
> 
> Thanks,
> pq
> 
>>>   
>>>> Some links to provide context and details:
>>>> * What is underlay?: https://gitlab.freedesktop.org/emersion/libliftoff/-/issues/76
>>>> * Discussion on how to implement underlay on Weston: https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2325164
>>>>
>>>> Cc: Joshua Ashton <joshua@froggi.es>
>>>> Cc: Michel Dänzer <mdaenzer@redhat.com>
>>>> Cc: Chao Guo <chao.guo@nxp.com>
>>>> Cc: Xaver Hugl <xaver.hugl@gmail.com>
>>>> Cc: Vikas Korjani <Vikas.Korjani@amd.com>
>>>> Cc: Robert Mader <robert.mader@posteo.de>
>>>> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
>>>> Cc: Sean Paul <sean@poorly.run>
>>>> Cc: Simon Ser <contact@emersion.fr>
>>>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>>>> Cc: Harry Wentland <harry.wentland@amd.com>
>>>> Cc: Sebastian Wick <sebastian.wick@redhat.com>
>>>>
>>>> Leo Li (2):
>>>>    drm/amd/display: Introduce overlay cursor mode
>>>>    drm/amd/display: Move PRIMARY plane zpos higher
>>>>
>>>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 405 ++++++++++++++++--
>>>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
>>>>   .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
>>>>   .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  28 +-
>>>>   4 files changed, 391 insertions(+), 50 deletions(-)
>>>>  
>>>   
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-04 13:59       ` Harry Wentland
@ 2024-04-04 14:22         ` Marius Vlad
  2024-04-11 20:33           ` Leo Li
  0 siblings, 1 reply; 27+ messages in thread
From: Marius Vlad @ 2024-04-04 14:22 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Pekka Paalanen, Leo Li, dri-devel, amd-gfx, Joshua Ashton,
	Michel Dänzer, Chao Guo, Xaver Hugl, Vikas Korjani,
	Robert Mader, Sean Paul, Simon Ser, Shashank Sharma,
	Sebastian Wick

[-- Attachment #1: Type: text/plain, Size: 8489 bytes --]

On Thu, Apr 04, 2024 at 09:59:03AM -0400, Harry Wentland wrote:
> 
Hi all,
> 
> On 2024-04-04 06:24, Pekka Paalanen wrote:
> > On Wed, 3 Apr 2024 17:32:46 -0400
> > Leo Li <sunpeng.li@amd.com> wrote:
> > 
> >> On 2024-03-28 10:33, Pekka Paalanen wrote:
> >>> On Fri, 15 Mar 2024 13:09:56 -0400
> >>> <sunpeng.li@amd.com> wrote:
> >>>   
> >>>> From: Leo Li <sunpeng.li@amd.com>
> >>>>
> >>>> These patches aim to make the amdgpgu KMS driver play nicer with compositors
> >>>> when building multi-plane scanout configurations. They do so by:
> >>>>
> >>>> 1. Making cursor behavior more sensible.
> >>>> 2. Allowing placement of DRM OVERLAY planes underneath the PRIMARY plane for
> >>>>     'underlay' configurations (perhaps more of a RFC, see below).
> >>>>
> >>>> Please see the commit messages for details.
> >>>>
> >>>>
> >>>> For #2, the simplest way to accomplish this was to increase the value of the
> >>>> immutable zpos property for the PRIMARY plane. This allowed OVERLAY planes with
> >>>> a mutable zpos range of (0-254) to be positioned underneath the PRIMARY for an
> >>>> underlay scanout configuration.
> >>>>
> >>>> Technically speaking, DCN hardware does not have a concept of primary or overlay
> >>>> planes - there are simply 4 general purpose hardware pipes that can be maped in
> >>>> any configuration. So the immutable zpos restriction on the PRIMARY plane is
> >>>> kind of arbitrary; it can have a mutable range of (0-254) just like the
> >>>> OVERLAYs. The distinction between PRIMARY and OVERLAY planes is also somewhat
> >>>> arbitrary. We can interpret PRIMARY as the first plane that should be enabled on
> >>>> a CRTC, but beyond that, it doesn't mean much for amdgpu.
> >>>>
> >>>> Therefore, I'm curious about how compositors devs understand KMS planes and
> >>>> their zpos properties, and how we would like to use them. It isn't clear to me
> >>>> how compositors wish to interpret and use the DRM zpos property, or
> >>>> differentiate between OVERLAY and PRIMARY planes, when it comes to setting up
> >>>> multi-plane scanout.  
> >>>
> >>> You already quoted me on the Weston link, so I don't think I have
> >>> anything to add. Sounds fine to me, and we don't have a standard plane
> >>> arrangement algorithm that the kernel could optimize zpos ranges
> >>> against, yet.
> >>>   
> >>>> Ultimately, what I'd like to answer is "What can we do on the KMS driver and DRM
> >>>> plane API side, that can make building multi-plane scanout configurations easier
> >>>> for compositors?" I'm hoping we can converge on something, whether that be
> >>>> updating the existing documentation to better define the usage, or update the
> >>>> API to provide support for something that is lacking.  
> >>>
> >>> I think there probably should be a standardised plane arrangement
> >>> algorithm in userspace, because the search space suffers from
> >>> permutational explosion. Either there needs to be very few planes (max
> >>> 4 or 5 at-all-possible per CRTC, including shareable ones) for an
> >>> exhaustive search to be feasible, or all planes should be more or less
> >>> equal in capabilities and userspace employs some simplified or
> >>> heuristic search.
> >>>
> >>> If the search algorithm is fixed, then drivers could optimize zpos
> >>> ranges to have the algorithm find a solution faster.
> >>>
> >>> My worry is that userspace already has heuristic search algorithms that
> >>> may start failing if drivers later change their zpos ranges to be more
> >>> optimal for another algorithm.
> >>>
> >>> OTOH, as long as exhaustive search is feasible, then it does not matter
> >>> how DRM drivers set up the zpos ranges.
> >>>
> >>> In any case, the zpos ranges should try to allow all possible plane
> >>> arrangements while minimizing the number of arrangements that won't
> >>> work. The absolute values of zpos are pretty much irrelevant, so I
> >>> think setting one plane to have an immutable zpos is a good idea, even
> >>> if it's not necessary by the driver. That is one less moving part, and
> >>> only the relative ordering between the planes matters.
> >>>
> >>>
> >>> Thanks,
> >>> pq  
> >>
> >> Right, thanks for your thoughts! I agree that there should be a common plane
> >> arrangement algorithm. I think libliftoff is the most obvious candidate here. It
> >> only handles overlay arrangements currently, but mixed-mode arrangements is
> >> something I've been trying to look at.
> >>
> >> Taking the driver's reported zpos into account could narrow down the search
> >> space for mixed arrangements. We could tell whether underlay, or overlay, or
> >> both, is supported by looking at the allowed zpos ranges.
> >>
> >> I also wonder if it'll make underlay assignments easier. libliftoff has an
> >> assumption that the PRIMARY plane has the lowest zpos (which now I realize, is
> >> not always true). Therefore, the underlay buffer has to be placed on the
> >> PRIMARY, with the render buffer on a higher OVERLAY. Swapping buffers between
> >> planes when testing mixed-arrangements is kind of awkward, and simply setting
> >> the OVERLAY's zpos to be lower or higher than the PRIMARY's sounds simpler.
> >>
> >> Currently only gamescope makes use of libliftoff, but I'm curious if patches
> >> hooking it up to Weston would be welcomed? If there are other ways to have a
> >> common arrangement algorithm, I'd be happy to hear that as well.
> > 
> > A natural thing would be to document such an algorithm with the KMS
> > UAPI.
> > 
> > I don't know libliftoff well enough to say how welcome it would be in
> > Weston. I have no fundamental or policy reason to keep an independent
> > implementation in Weston though, so it's plausible at least.
> > 
> > It would need investigation, and perhaps also extending Weston test
> > suite a lot more towards VKMS to verify plane assignments. Currently
> > all plane assignment testing is manual on real hardware.
> > 
> 
> It looks like VKMS doesn't have explicit zpos yet, so someone would
> probably need to add that.
> 
> https://drmdb.emersion.fr/properties/4008636142/zpos
Yes. If we look into adding that, maybe it should be done using with
ConfigFS: https://patchwork.freedesktop.org/series/122618/

With that in and with zpos support, we could then run a batch of tests that 
can dynamically exercise on-the-fly all possible combinations.
> 
> Harry
> 
> >> Note that libliftoff's algorithm is more complex than weston, since it searches
> >> harder, and suffers from that permutational explosion. But it solves that by
> >> trying high benefit arrangements first (offloading surfaces that update
> >> frequently), and bailing out once the search reaches a hard-coded deadline.
> >> Since it's currently overlay-only, the goal could be to "simply" have no
> >> regressions.
> > 
> > Ensuring no regressions would indeed need to be taken care of by
> > extending the VKMS-based automated testing.
> > 
> > 
> > Thanks,
> > pq
> > 
> >>>   
> >>>> Some links to provide context and details:
> >>>> * What is underlay?: https://gitlab.freedesktop.org/emersion/libliftoff/-/issues/76
> >>>> * Discussion on how to implement underlay on Weston: https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2325164
> >>>>
> >>>> Cc: Joshua Ashton <joshua@froggi.es>
> >>>> Cc: Michel Dänzer <mdaenzer@redhat.com>
> >>>> Cc: Chao Guo <chao.guo@nxp.com>
> >>>> Cc: Xaver Hugl <xaver.hugl@gmail.com>
> >>>> Cc: Vikas Korjani <Vikas.Korjani@amd.com>
> >>>> Cc: Robert Mader <robert.mader@posteo.de>
> >>>> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
> >>>> Cc: Sean Paul <sean@poorly.run>
> >>>> Cc: Simon Ser <contact@emersion.fr>
> >>>> Cc: Shashank Sharma <shashank.sharma@amd.com>
> >>>> Cc: Harry Wentland <harry.wentland@amd.com>
> >>>> Cc: Sebastian Wick <sebastian.wick@redhat.com>
> >>>>
> >>>> Leo Li (2):
> >>>>    drm/amd/display: Introduce overlay cursor mode
> >>>>    drm/amd/display: Move PRIMARY plane zpos higher
> >>>>
> >>>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 405 ++++++++++++++++--
> >>>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
> >>>>   .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
> >>>>   .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  28 +-
> >>>>   4 files changed, 391 insertions(+), 50 deletions(-)
> >>>>  
> >>>   
> > 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-04 14:22         ` Marius Vlad
@ 2024-04-11 20:33           ` Leo Li
  2024-04-12  8:03             ` Pekka Paalanen
  0 siblings, 1 reply; 27+ messages in thread
From: Leo Li @ 2024-04-11 20:33 UTC (permalink / raw)
  To: Marius Vlad, Harry Wentland
  Cc: Pekka Paalanen, dri-devel, amd-gfx, Joshua Ashton,
	Michel Dänzer, Chao Guo, Xaver Hugl, Vikas Korjani,
	Robert Mader, Sean Paul, Simon Ser, Shashank Sharma,
	Sebastian Wick




On 2024-04-04 10:22, Marius Vlad wrote:
> On Thu, Apr 04, 2024 at 09:59:03AM -0400, Harry Wentland wrote:
>>
> Hi all,
>>
>> On 2024-04-04 06:24, Pekka Paalanen wrote:
>>> On Wed, 3 Apr 2024 17:32:46 -0400
>>> Leo Li <sunpeng.li@amd.com> wrote:
>>>
>>>> On 2024-03-28 10:33, Pekka Paalanen wrote:
>>>>> On Fri, 15 Mar 2024 13:09:56 -0400
>>>>> <sunpeng.li@amd.com> wrote:
>>>>>    
>>>>>> From: Leo Li <sunpeng.li@amd.com>
>>>>>>
>>>>>> These patches aim to make the amdgpgu KMS driver play nicer with compositors
>>>>>> when building multi-plane scanout configurations. They do so by:
>>>>>>
>>>>>> 1. Making cursor behavior more sensible.
>>>>>> 2. Allowing placement of DRM OVERLAY planes underneath the PRIMARY plane for
>>>>>>      'underlay' configurations (perhaps more of a RFC, see below).
>>>>>>
>>>>>> Please see the commit messages for details.
>>>>>>
>>>>>>
>>>>>> For #2, the simplest way to accomplish this was to increase the value of the
>>>>>> immutable zpos property for the PRIMARY plane. This allowed OVERLAY planes with
>>>>>> a mutable zpos range of (0-254) to be positioned underneath the PRIMARY for an
>>>>>> underlay scanout configuration.
>>>>>>
>>>>>> Technically speaking, DCN hardware does not have a concept of primary or overlay
>>>>>> planes - there are simply 4 general purpose hardware pipes that can be maped in
>>>>>> any configuration. So the immutable zpos restriction on the PRIMARY plane is
>>>>>> kind of arbitrary; it can have a mutable range of (0-254) just like the
>>>>>> OVERLAYs. The distinction between PRIMARY and OVERLAY planes is also somewhat
>>>>>> arbitrary. We can interpret PRIMARY as the first plane that should be enabled on
>>>>>> a CRTC, but beyond that, it doesn't mean much for amdgpu.
>>>>>>
>>>>>> Therefore, I'm curious about how compositors devs understand KMS planes and
>>>>>> their zpos properties, and how we would like to use them. It isn't clear to me
>>>>>> how compositors wish to interpret and use the DRM zpos property, or
>>>>>> differentiate between OVERLAY and PRIMARY planes, when it comes to setting up
>>>>>> multi-plane scanout.
>>>>>
>>>>> You already quoted me on the Weston link, so I don't think I have
>>>>> anything to add. Sounds fine to me, and we don't have a standard plane
>>>>> arrangement algorithm that the kernel could optimize zpos ranges
>>>>> against, yet.
>>>>>    
>>>>>> Ultimately, what I'd like to answer is "What can we do on the KMS driver and DRM
>>>>>> plane API side, that can make building multi-plane scanout configurations easier
>>>>>> for compositors?" I'm hoping we can converge on something, whether that be
>>>>>> updating the existing documentation to better define the usage, or update the
>>>>>> API to provide support for something that is lacking.
>>>>>
>>>>> I think there probably should be a standardised plane arrangement
>>>>> algorithm in userspace, because the search space suffers from
>>>>> permutational explosion. Either there needs to be very few planes (max
>>>>> 4 or 5 at-all-possible per CRTC, including shareable ones) for an
>>>>> exhaustive search to be feasible, or all planes should be more or less
>>>>> equal in capabilities and userspace employs some simplified or
>>>>> heuristic search.
>>>>>
>>>>> If the search algorithm is fixed, then drivers could optimize zpos
>>>>> ranges to have the algorithm find a solution faster.
>>>>>
>>>>> My worry is that userspace already has heuristic search algorithms that
>>>>> may start failing if drivers later change their zpos ranges to be more
>>>>> optimal for another algorithm.
>>>>>
>>>>> OTOH, as long as exhaustive search is feasible, then it does not matter
>>>>> how DRM drivers set up the zpos ranges.
>>>>>
>>>>> In any case, the zpos ranges should try to allow all possible plane
>>>>> arrangements while minimizing the number of arrangements that won't
>>>>> work. The absolute values of zpos are pretty much irrelevant, so I
>>>>> think setting one plane to have an immutable zpos is a good idea, even
>>>>> if it's not necessary by the driver. That is one less moving part, and
>>>>> only the relative ordering between the planes matters.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> pq
>>>>
>>>> Right, thanks for your thoughts! I agree that there should be a common plane
>>>> arrangement algorithm. I think libliftoff is the most obvious candidate here. It
>>>> only handles overlay arrangements currently, but mixed-mode arrangements is
>>>> something I've been trying to look at.
>>>>
>>>> Taking the driver's reported zpos into account could narrow down the search
>>>> space for mixed arrangements. We could tell whether underlay, or overlay, or
>>>> both, is supported by looking at the allowed zpos ranges.
>>>>
>>>> I also wonder if it'll make underlay assignments easier. libliftoff has an
>>>> assumption that the PRIMARY plane has the lowest zpos (which now I realize, is
>>>> not always true). Therefore, the underlay buffer has to be placed on the
>>>> PRIMARY, with the render buffer on a higher OVERLAY. Swapping buffers between
>>>> planes when testing mixed-arrangements is kind of awkward, and simply setting
>>>> the OVERLAY's zpos to be lower or higher than the PRIMARY's sounds simpler.
>>>>
>>>> Currently only gamescope makes use of libliftoff, but I'm curious if patches
>>>> hooking it up to Weston would be welcomed? If there are other ways to have a
>>>> common arrangement algorithm, I'd be happy to hear that as well.
>>>
>>> A natural thing would be to document such an algorithm with the KMS
>>> UAPI.
>>>
>>> I don't know libliftoff well enough to say how welcome it would be in
>>> Weston. I have no fundamental or policy reason to keep an independent
>>> implementation in Weston though, so it's plausible at least.

Is it the case that different compositors may want different plane arrangement
behaviors? Like selecting which surfaces to offload, for example? It occurred to
me that prescribing an allocation algorithm via something like libliftoff might
be too restrictive. In which case, documenting the parts that can be nailed down
would be better.

That begs the question of what can be nailed down and what can left to
independent implementation. I guess things like which plane should be enabled
first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
can be defined. How to handle atomic test failures could be as well.

I can start working on a draft for this. If anything, as a spark for discussions
for the display hackfest.

>>>
>>> It would need investigation, and perhaps also extending Weston test
>>> suite a lot more towards VKMS to verify plane assignments. Currently
>>> all plane assignment testing is manual on real hardware.
>>>
>>
>> It looks like VKMS doesn't have explicit zpos yet, so someone would
>> probably need to add that.
>>
>> https://drmdb.emersion.fr/properties/4008636142/zpos
> Yes. If we look into adding that, maybe it should be done using with
> ConfigFS: https://patchwork.freedesktop.org/series/122618/
> 
> With that in and with zpos support, we could then run a batch of tests that
> can dynamically exercise on-the-fly all possible combinations.

Using vkms to come up with a bunch of different hw plane configurations is a
good idea. It may come in handy for testing other compositors too. Thanks for
the suggestions.

- Leo


>>
>> Harry
>>
>>>> Note that libliftoff's algorithm is more complex than weston, since it searches
>>>> harder, and suffers from that permutational explosion. But it solves that by
>>>> trying high benefit arrangements first (offloading surfaces that update
>>>> frequently), and bailing out once the search reaches a hard-coded deadline.
>>>> Since it's currently overlay-only, the goal could be to "simply" have no
>>>> regressions.
>>>
>>> Ensuring no regressions would indeed need to be taken care of by
>>> extending the VKMS-based automated testing.
>>>
>>>
>>> Thanks,
>>> pq
>>>
>>>>>    
>>>>>> Some links to provide context and details:
>>>>>> * What is underlay?: https://gitlab.freedesktop.org/emersion/libliftoff/-/issues/76
>>>>>> * Discussion on how to implement underlay on Weston: https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2325164
>>>>>>
>>>>>> Cc: Joshua Ashton <joshua@froggi.es>
>>>>>> Cc: Michel Dänzer <mdaenzer@redhat.com>
>>>>>> Cc: Chao Guo <chao.guo@nxp.com>
>>>>>> Cc: Xaver Hugl <xaver.hugl@gmail.com>
>>>>>> Cc: Vikas Korjani <Vikas.Korjani@amd.com>
>>>>>> Cc: Robert Mader <robert.mader@posteo.de>
>>>>>> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
>>>>>> Cc: Sean Paul <sean@poorly.run>
>>>>>> Cc: Simon Ser <contact@emersion.fr>
>>>>>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>>>>>> Cc: Harry Wentland <harry.wentland@amd.com>
>>>>>> Cc: Sebastian Wick <sebastian.wick@redhat.com>
>>>>>>
>>>>>> Leo Li (2):
>>>>>>     drm/amd/display: Introduce overlay cursor mode
>>>>>>     drm/amd/display: Move PRIMARY plane zpos higher
>>>>>>
>>>>>>    .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 405 ++++++++++++++++--
>>>>>>    .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
>>>>>>    .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
>>>>>>    .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  28 +-
>>>>>>    4 files changed, 391 insertions(+), 50 deletions(-)
>>>>>>   
>>>>>    
>>>
>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-11 20:33           ` Leo Li
@ 2024-04-12  8:03             ` Pekka Paalanen
  2024-04-12 14:28               ` Leo Li
  0 siblings, 1 reply; 27+ messages in thread
From: Pekka Paalanen @ 2024-04-12  8:03 UTC (permalink / raw)
  To: Leo Li
  Cc: Marius Vlad, Harry Wentland, dri-devel, amd-gfx, Joshua Ashton,
	Michel Dänzer, Chao Guo, Xaver Hugl, Vikas Korjani,
	Robert Mader, Sean Paul, Simon Ser, Shashank Sharma,
	Sebastian Wick

[-- Attachment #1: Type: text/plain, Size: 10706 bytes --]

On Thu, 11 Apr 2024 16:33:57 -0400
Leo Li <sunpeng.li@amd.com> wrote:

> On 2024-04-04 10:22, Marius Vlad wrote:
> > On Thu, Apr 04, 2024 at 09:59:03AM -0400, Harry Wentland wrote:  
> >>  
> > Hi all,  
> >>
> >> On 2024-04-04 06:24, Pekka Paalanen wrote:  
> >>> On Wed, 3 Apr 2024 17:32:46 -0400
> >>> Leo Li <sunpeng.li@amd.com> wrote:
> >>>  
> >>>> On 2024-03-28 10:33, Pekka Paalanen wrote:  
> >>>>> On Fri, 15 Mar 2024 13:09:56 -0400
> >>>>> <sunpeng.li@amd.com> wrote:
> >>>>>      
> >>>>>> From: Leo Li <sunpeng.li@amd.com>
> >>>>>>
> >>>>>> These patches aim to make the amdgpgu KMS driver play nicer with compositors
> >>>>>> when building multi-plane scanout configurations. They do so by:
> >>>>>>
> >>>>>> 1. Making cursor behavior more sensible.
> >>>>>> 2. Allowing placement of DRM OVERLAY planes underneath the PRIMARY plane for
> >>>>>>      'underlay' configurations (perhaps more of a RFC, see below).
> >>>>>>
> >>>>>> Please see the commit messages for details.
> >>>>>>
> >>>>>>
> >>>>>> For #2, the simplest way to accomplish this was to increase the value of the
> >>>>>> immutable zpos property for the PRIMARY plane. This allowed OVERLAY planes with
> >>>>>> a mutable zpos range of (0-254) to be positioned underneath the PRIMARY for an
> >>>>>> underlay scanout configuration.
> >>>>>>
> >>>>>> Technically speaking, DCN hardware does not have a concept of primary or overlay
> >>>>>> planes - there are simply 4 general purpose hardware pipes that can be maped in
> >>>>>> any configuration. So the immutable zpos restriction on the PRIMARY plane is
> >>>>>> kind of arbitrary; it can have a mutable range of (0-254) just like the
> >>>>>> OVERLAYs. The distinction between PRIMARY and OVERLAY planes is also somewhat
> >>>>>> arbitrary. We can interpret PRIMARY as the first plane that should be enabled on
> >>>>>> a CRTC, but beyond that, it doesn't mean much for amdgpu.
> >>>>>>
> >>>>>> Therefore, I'm curious about how compositors devs understand KMS planes and
> >>>>>> their zpos properties, and how we would like to use them. It isn't clear to me
> >>>>>> how compositors wish to interpret and use the DRM zpos property, or
> >>>>>> differentiate between OVERLAY and PRIMARY planes, when it comes to setting up
> >>>>>> multi-plane scanout.  
> >>>>>
> >>>>> You already quoted me on the Weston link, so I don't think I have
> >>>>> anything to add. Sounds fine to me, and we don't have a standard plane
> >>>>> arrangement algorithm that the kernel could optimize zpos ranges
> >>>>> against, yet.
> >>>>>      
> >>>>>> Ultimately, what I'd like to answer is "What can we do on the KMS driver and DRM
> >>>>>> plane API side, that can make building multi-plane scanout configurations easier
> >>>>>> for compositors?" I'm hoping we can converge on something, whether that be
> >>>>>> updating the existing documentation to better define the usage, or update the
> >>>>>> API to provide support for something that is lacking.  
> >>>>>
> >>>>> I think there probably should be a standardised plane arrangement
> >>>>> algorithm in userspace, because the search space suffers from
> >>>>> permutational explosion. Either there needs to be very few planes (max
> >>>>> 4 or 5 at-all-possible per CRTC, including shareable ones) for an
> >>>>> exhaustive search to be feasible, or all planes should be more or less
> >>>>> equal in capabilities and userspace employs some simplified or
> >>>>> heuristic search.
> >>>>>
> >>>>> If the search algorithm is fixed, then drivers could optimize zpos
> >>>>> ranges to have the algorithm find a solution faster.
> >>>>>
> >>>>> My worry is that userspace already has heuristic search algorithms that
> >>>>> may start failing if drivers later change their zpos ranges to be more
> >>>>> optimal for another algorithm.
> >>>>>
> >>>>> OTOH, as long as exhaustive search is feasible, then it does not matter
> >>>>> how DRM drivers set up the zpos ranges.
> >>>>>
> >>>>> In any case, the zpos ranges should try to allow all possible plane
> >>>>> arrangements while minimizing the number of arrangements that won't
> >>>>> work. The absolute values of zpos are pretty much irrelevant, so I
> >>>>> think setting one plane to have an immutable zpos is a good idea, even
> >>>>> if it's not necessary by the driver. That is one less moving part, and
> >>>>> only the relative ordering between the planes matters.
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> pq  
> >>>>
> >>>> Right, thanks for your thoughts! I agree that there should be a common plane
> >>>> arrangement algorithm. I think libliftoff is the most obvious candidate here. It
> >>>> only handles overlay arrangements currently, but mixed-mode arrangements is
> >>>> something I've been trying to look at.
> >>>>
> >>>> Taking the driver's reported zpos into account could narrow down the search
> >>>> space for mixed arrangements. We could tell whether underlay, or overlay, or
> >>>> both, is supported by looking at the allowed zpos ranges.
> >>>>
> >>>> I also wonder if it'll make underlay assignments easier. libliftoff has an
> >>>> assumption that the PRIMARY plane has the lowest zpos (which now I realize, is
> >>>> not always true). Therefore, the underlay buffer has to be placed on the
> >>>> PRIMARY, with the render buffer on a higher OVERLAY. Swapping buffers between
> >>>> planes when testing mixed-arrangements is kind of awkward, and simply setting
> >>>> the OVERLAY's zpos to be lower or higher than the PRIMARY's sounds simpler.
> >>>>
> >>>> Currently only gamescope makes use of libliftoff, but I'm curious if patches
> >>>> hooking it up to Weston would be welcomed? If there are other ways to have a
> >>>> common arrangement algorithm, I'd be happy to hear that as well.  
> >>>
> >>> A natural thing would be to document such an algorithm with the KMS
> >>> UAPI.
> >>>
> >>> I don't know libliftoff well enough to say how welcome it would be in
> >>> Weston. I have no fundamental or policy reason to keep an independent
> >>> implementation in Weston though, so it's plausible at least.  
> 
> Is it the case that different compositors may want different plane arrangement
> behaviors? Like selecting which surfaces to offload, for example? It occurred to
> me that prescribing an allocation algorithm via something like libliftoff might
> be too restrictive. In which case, documenting the parts that can be nailed down
> would be better.

I don't know. Probably there shouldn't be, eventually, because it's
hard to imagine how DE or end user style/taste/preferences would affect
things. Usually the global goal would be optimising power consumption.

There could be trade-offs though, when reduction in power consumption
results in reduced image quality or increased latency. What to favour
for which surfaces is definitely policy and preference. Such
preferences could perhaps be designed into libliftoff API.

> That begs the question of what can be nailed down and what can left to
> independent implementation. I guess things like which plane should be enabled
> first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
> can be defined. How to handle atomic test failures could be as well.

What room is there for the interpretation of zpos values?

I thought they are unambiguous already: only the relative numerical
order matters, and that uniquely defines the KMS plane ordering.


Thanks,
pq

> I can start working on a draft for this. If anything, as a spark for discussions
> for the display hackfest.
> 
> >>>
> >>> It would need investigation, and perhaps also extending Weston test
> >>> suite a lot more towards VKMS to verify plane assignments. Currently
> >>> all plane assignment testing is manual on real hardware.
> >>>  
> >>
> >> It looks like VKMS doesn't have explicit zpos yet, so someone would
> >> probably need to add that.
> >>
> >> https://drmdb.emersion.fr/properties/4008636142/zpos  
> > Yes. If we look into adding that, maybe it should be done using with
> > ConfigFS: https://patchwork.freedesktop.org/series/122618/
> > 
> > With that in and with zpos support, we could then run a batch of tests that
> > can dynamically exercise on-the-fly all possible combinations.  
> 
> Using vkms to come up with a bunch of different hw plane configurations is a
> good idea. It may come in handy for testing other compositors too. Thanks for
> the suggestions.
> 
> - Leo
> 
> 
> >>
> >> Harry
> >>  
> >>>> Note that libliftoff's algorithm is more complex than weston, since it searches
> >>>> harder, and suffers from that permutational explosion. But it solves that by
> >>>> trying high benefit arrangements first (offloading surfaces that update
> >>>> frequently), and bailing out once the search reaches a hard-coded deadline.
> >>>> Since it's currently overlay-only, the goal could be to "simply" have no
> >>>> regressions.  
> >>>
> >>> Ensuring no regressions would indeed need to be taken care of by
> >>> extending the VKMS-based automated testing.
> >>>
> >>>
> >>> Thanks,
> >>> pq
> >>>  
> >>>>>      
> >>>>>> Some links to provide context and details:
> >>>>>> * What is underlay?: https://gitlab.freedesktop.org/emersion/libliftoff/-/issues/76
> >>>>>> * Discussion on how to implement underlay on Weston: https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2325164
> >>>>>>
> >>>>>> Cc: Joshua Ashton <joshua@froggi.es>
> >>>>>> Cc: Michel Dänzer <mdaenzer@redhat.com>
> >>>>>> Cc: Chao Guo <chao.guo@nxp.com>
> >>>>>> Cc: Xaver Hugl <xaver.hugl@gmail.com>
> >>>>>> Cc: Vikas Korjani <Vikas.Korjani@amd.com>
> >>>>>> Cc: Robert Mader <robert.mader@posteo.de>
> >>>>>> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
> >>>>>> Cc: Sean Paul <sean@poorly.run>
> >>>>>> Cc: Simon Ser <contact@emersion.fr>
> >>>>>> Cc: Shashank Sharma <shashank.sharma@amd.com>
> >>>>>> Cc: Harry Wentland <harry.wentland@amd.com>
> >>>>>> Cc: Sebastian Wick <sebastian.wick@redhat.com>
> >>>>>>
> >>>>>> Leo Li (2):
> >>>>>>     drm/amd/display: Introduce overlay cursor mode
> >>>>>>     drm/amd/display: Move PRIMARY plane zpos higher
> >>>>>>
> >>>>>>    .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 405 ++++++++++++++++--
> >>>>>>    .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
> >>>>>>    .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
> >>>>>>    .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  28 +-
> >>>>>>    4 files changed, 391 insertions(+), 50 deletions(-)
> >>>>>>     
> >>>>>      
> >>>  
> >>  


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-12  8:03             ` Pekka Paalanen
@ 2024-04-12 14:28               ` Leo Li
  2024-04-12 15:07                 ` Pekka Paalanen
  0 siblings, 1 reply; 27+ messages in thread
From: Leo Li @ 2024-04-12 14:28 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Marius Vlad, Harry Wentland, dri-devel, amd-gfx, Joshua Ashton,
	Michel Dänzer, Chao Guo, Xaver Hugl, Vikas Korjani,
	Robert Mader, Sean Paul, Simon Ser, Shashank Sharma,
	Sebastian Wick



On 2024-04-12 04:03, Pekka Paalanen wrote:
> On Thu, 11 Apr 2024 16:33:57 -0400
> Leo Li <sunpeng.li@amd.com> wrote:
> 
>> On 2024-04-04 10:22, Marius Vlad wrote:
>>> On Thu, Apr 04, 2024 at 09:59:03AM -0400, Harry Wentland wrote:
>>>>   
>>> Hi all,
>>>>
>>>> On 2024-04-04 06:24, Pekka Paalanen wrote:
>>>>> On Wed, 3 Apr 2024 17:32:46 -0400
>>>>> Leo Li <sunpeng.li@amd.com> wrote:
>>>>>   
>>>>>> On 2024-03-28 10:33, Pekka Paalanen wrote:
>>>>>>> On Fri, 15 Mar 2024 13:09:56 -0400
>>>>>>> <sunpeng.li@amd.com> wrote:
>>>>>>>       
>>>>>>>> From: Leo Li <sunpeng.li@amd.com>
>>>>>>>>
>>>>>>>> These patches aim to make the amdgpgu KMS driver play nicer with compositors
>>>>>>>> when building multi-plane scanout configurations. They do so by:
>>>>>>>>
>>>>>>>> 1. Making cursor behavior more sensible.
>>>>>>>> 2. Allowing placement of DRM OVERLAY planes underneath the PRIMARY plane for
>>>>>>>>       'underlay' configurations (perhaps more of a RFC, see below).
>>>>>>>>
>>>>>>>> Please see the commit messages for details.
>>>>>>>>
>>>>>>>>
>>>>>>>> For #2, the simplest way to accomplish this was to increase the value of the
>>>>>>>> immutable zpos property for the PRIMARY plane. This allowed OVERLAY planes with
>>>>>>>> a mutable zpos range of (0-254) to be positioned underneath the PRIMARY for an
>>>>>>>> underlay scanout configuration.
>>>>>>>>
>>>>>>>> Technically speaking, DCN hardware does not have a concept of primary or overlay
>>>>>>>> planes - there are simply 4 general purpose hardware pipes that can be maped in
>>>>>>>> any configuration. So the immutable zpos restriction on the PRIMARY plane is
>>>>>>>> kind of arbitrary; it can have a mutable range of (0-254) just like the
>>>>>>>> OVERLAYs. The distinction between PRIMARY and OVERLAY planes is also somewhat
>>>>>>>> arbitrary. We can interpret PRIMARY as the first plane that should be enabled on
>>>>>>>> a CRTC, but beyond that, it doesn't mean much for amdgpu.
>>>>>>>>
>>>>>>>> Therefore, I'm curious about how compositors devs understand KMS planes and
>>>>>>>> their zpos properties, and how we would like to use them. It isn't clear to me
>>>>>>>> how compositors wish to interpret and use the DRM zpos property, or
>>>>>>>> differentiate between OVERLAY and PRIMARY planes, when it comes to setting up
>>>>>>>> multi-plane scanout.
>>>>>>>
>>>>>>> You already quoted me on the Weston link, so I don't think I have
>>>>>>> anything to add. Sounds fine to me, and we don't have a standard plane
>>>>>>> arrangement algorithm that the kernel could optimize zpos ranges
>>>>>>> against, yet.
>>>>>>>       
>>>>>>>> Ultimately, what I'd like to answer is "What can we do on the KMS driver and DRM
>>>>>>>> plane API side, that can make building multi-plane scanout configurations easier
>>>>>>>> for compositors?" I'm hoping we can converge on something, whether that be
>>>>>>>> updating the existing documentation to better define the usage, or update the
>>>>>>>> API to provide support for something that is lacking.
>>>>>>>
>>>>>>> I think there probably should be a standardised plane arrangement
>>>>>>> algorithm in userspace, because the search space suffers from
>>>>>>> permutational explosion. Either there needs to be very few planes (max
>>>>>>> 4 or 5 at-all-possible per CRTC, including shareable ones) for an
>>>>>>> exhaustive search to be feasible, or all planes should be more or less
>>>>>>> equal in capabilities and userspace employs some simplified or
>>>>>>> heuristic search.
>>>>>>>
>>>>>>> If the search algorithm is fixed, then drivers could optimize zpos
>>>>>>> ranges to have the algorithm find a solution faster.
>>>>>>>
>>>>>>> My worry is that userspace already has heuristic search algorithms that
>>>>>>> may start failing if drivers later change their zpos ranges to be more
>>>>>>> optimal for another algorithm.
>>>>>>>
>>>>>>> OTOH, as long as exhaustive search is feasible, then it does not matter
>>>>>>> how DRM drivers set up the zpos ranges.
>>>>>>>
>>>>>>> In any case, the zpos ranges should try to allow all possible plane
>>>>>>> arrangements while minimizing the number of arrangements that won't
>>>>>>> work. The absolute values of zpos are pretty much irrelevant, so I
>>>>>>> think setting one plane to have an immutable zpos is a good idea, even
>>>>>>> if it's not necessary by the driver. That is one less moving part, and
>>>>>>> only the relative ordering between the planes matters.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> pq
>>>>>>
>>>>>> Right, thanks for your thoughts! I agree that there should be a common plane
>>>>>> arrangement algorithm. I think libliftoff is the most obvious candidate here. It
>>>>>> only handles overlay arrangements currently, but mixed-mode arrangements is
>>>>>> something I've been trying to look at.
>>>>>>
>>>>>> Taking the driver's reported zpos into account could narrow down the search
>>>>>> space for mixed arrangements. We could tell whether underlay, or overlay, or
>>>>>> both, is supported by looking at the allowed zpos ranges.
>>>>>>
>>>>>> I also wonder if it'll make underlay assignments easier. libliftoff has an
>>>>>> assumption that the PRIMARY plane has the lowest zpos (which now I realize, is
>>>>>> not always true). Therefore, the underlay buffer has to be placed on the
>>>>>> PRIMARY, with the render buffer on a higher OVERLAY. Swapping buffers between
>>>>>> planes when testing mixed-arrangements is kind of awkward, and simply setting
>>>>>> the OVERLAY's zpos to be lower or higher than the PRIMARY's sounds simpler.
>>>>>>
>>>>>> Currently only gamescope makes use of libliftoff, but I'm curious if patches
>>>>>> hooking it up to Weston would be welcomed? If there are other ways to have a
>>>>>> common arrangement algorithm, I'd be happy to hear that as well.
>>>>>
>>>>> A natural thing would be to document such an algorithm with the KMS
>>>>> UAPI.
>>>>>
>>>>> I don't know libliftoff well enough to say how welcome it would be in
>>>>> Weston. I have no fundamental or policy reason to keep an independent
>>>>> implementation in Weston though, so it's plausible at least.
>>
>> Is it the case that different compositors may want different plane arrangement
>> behaviors? Like selecting which surfaces to offload, for example? It occurred to
>> me that prescribing an allocation algorithm via something like libliftoff might
>> be too restrictive. In which case, documenting the parts that can be nailed down
>> would be better.
> 
> I don't know. Probably there shouldn't be, eventually, because it's
> hard to imagine how DE or end user style/taste/preferences would affect
> things. Usually the global goal would be optimising power consumption.
> 
> There could be trade-offs though, when reduction in power consumption
> results in reduced image quality or increased latency. What to favour
> for which surfaces is definitely policy and preference. Such
> preferences could perhaps be designed into libliftoff API.
> 
>> That begs the question of what can be nailed down and what can left to
>> independent implementation. I guess things like which plane should be enabled
>> first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
>> can be defined. How to handle atomic test failures could be as well.
> 
> What room is there for the interpretation of zpos values?
> 
> I thought they are unambiguous already: only the relative numerical
> order matters, and that uniquely defines the KMS plane ordering.

The zpos value of the PRIMARY plane relative to OVERLAYS, for example, as a way
for vendors to communicate overlay, underlay, or mixed-arrangement support. I
don't think allowing OVERLAYs to be placed under the PRIMARY is currently
documented as a way to support underlay.

libliftoff for example, assumes that the PRIMARY has the lowest zpos. So
underlay arrangements will use an OVERLAY for the scanout plane, and the PRIMARY
for the underlay view.

Thanks,
Leo

> 
> 
> Thanks,
> pq
> 
>> I can start working on a draft for this. If anything, as a spark for discussions
>> for the display hackfest.
>>
>>>>>
>>>>> It would need investigation, and perhaps also extending Weston test
>>>>> suite a lot more towards VKMS to verify plane assignments. Currently
>>>>> all plane assignment testing is manual on real hardware.
>>>>>   
>>>>
>>>> It looks like VKMS doesn't have explicit zpos yet, so someone would
>>>> probably need to add that.
>>>>
>>>> https://drmdb.emersion.fr/properties/4008636142/zpos
>>> Yes. If we look into adding that, maybe it should be done using with
>>> ConfigFS: https://patchwork.freedesktop.org/series/122618/
>>>
>>> With that in and with zpos support, we could then run a batch of tests that
>>> can dynamically exercise on-the-fly all possible combinations.
>>
>> Using vkms to come up with a bunch of different hw plane configurations is a
>> good idea. It may come in handy for testing other compositors too. Thanks for
>> the suggestions.
>>
>> - Leo
>>
>>
>>>>
>>>> Harry
>>>>   
>>>>>> Note that libliftoff's algorithm is more complex than weston, since it searches
>>>>>> harder, and suffers from that permutational explosion. But it solves that by
>>>>>> trying high benefit arrangements first (offloading surfaces that update
>>>>>> frequently), and bailing out once the search reaches a hard-coded deadline.
>>>>>> Since it's currently overlay-only, the goal could be to "simply" have no
>>>>>> regressions.
>>>>>
>>>>> Ensuring no regressions would indeed need to be taken care of by
>>>>> extending the VKMS-based automated testing.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> pq
>>>>>   
>>>>>>>       
>>>>>>>> Some links to provide context and details:
>>>>>>>> * What is underlay?: https://gitlab.freedesktop.org/emersion/libliftoff/-/issues/76
>>>>>>>> * Discussion on how to implement underlay on Weston: https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2325164
>>>>>>>>
>>>>>>>> Cc: Joshua Ashton <joshua@froggi.es>
>>>>>>>> Cc: Michel Dänzer <mdaenzer@redhat.com>
>>>>>>>> Cc: Chao Guo <chao.guo@nxp.com>
>>>>>>>> Cc: Xaver Hugl <xaver.hugl@gmail.com>
>>>>>>>> Cc: Vikas Korjani <Vikas.Korjani@amd.com>
>>>>>>>> Cc: Robert Mader <robert.mader@posteo.de>
>>>>>>>> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
>>>>>>>> Cc: Sean Paul <sean@poorly.run>
>>>>>>>> Cc: Simon Ser <contact@emersion.fr>
>>>>>>>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>>>>>>>> Cc: Harry Wentland <harry.wentland@amd.com>
>>>>>>>> Cc: Sebastian Wick <sebastian.wick@redhat.com>
>>>>>>>>
>>>>>>>> Leo Li (2):
>>>>>>>>      drm/amd/display: Introduce overlay cursor mode
>>>>>>>>      drm/amd/display: Move PRIMARY plane zpos higher
>>>>>>>>
>>>>>>>>     .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 405 ++++++++++++++++--
>>>>>>>>     .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
>>>>>>>>     .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |   1 +
>>>>>>>>     .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  28 +-
>>>>>>>>     4 files changed, 391 insertions(+), 50 deletions(-)
>>>>>>>>      
>>>>>>>       
>>>>>   
>>>>   
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-12 14:28               ` Leo Li
@ 2024-04-12 15:07                 ` Pekka Paalanen
  2024-04-12 15:31                   ` Alex Deucher
  0 siblings, 1 reply; 27+ messages in thread
From: Pekka Paalanen @ 2024-04-12 15:07 UTC (permalink / raw)
  To: Leo Li
  Cc: Marius Vlad, Harry Wentland, dri-devel, amd-gfx, Joshua Ashton,
	Michel Dänzer, Chao Guo, Xaver Hugl, Vikas Korjani,
	Robert Mader, Sean Paul, Simon Ser, Shashank Sharma,
	Sebastian Wick

[-- Attachment #1: Type: text/plain, Size: 2212 bytes --]

On Fri, 12 Apr 2024 10:28:52 -0400
Leo Li <sunpeng.li@amd.com> wrote:

> On 2024-04-12 04:03, Pekka Paalanen wrote:
> > On Thu, 11 Apr 2024 16:33:57 -0400
> > Leo Li <sunpeng.li@amd.com> wrote:
> >   

...

> >> That begs the question of what can be nailed down and what can left to
> >> independent implementation. I guess things like which plane should be enabled
> >> first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
> >> can be defined. How to handle atomic test failures could be as well.  
> > 
> > What room is there for the interpretation of zpos values?
> > 
> > I thought they are unambiguous already: only the relative numerical
> > order matters, and that uniquely defines the KMS plane ordering.  
> 
> The zpos value of the PRIMARY plane relative to OVERLAYS, for example, as a way
> for vendors to communicate overlay, underlay, or mixed-arrangement support. I
> don't think allowing OVERLAYs to be placed under the PRIMARY is currently
> documented as a way to support underlay.

I always thought it's obvious that the zpos numbers dictate the plane
order without any other rules. After all, we have the universal planes
concept, where the plane type is only informational to aid heuristics
rather than defining anything.

Only if the zpos property does not exist, the plane types would come
into play.

Of course, if there actually exists userspace that fails if zpos allows
an overlay type plane to be placed below primary, or fails if primary
zpos is not zero, then DRM needs a new client cap.

> libliftoff for example, assumes that the PRIMARY has the lowest zpos. So
> underlay arrangements will use an OVERLAY for the scanout plane, and the PRIMARY
> for the underlay view.

That's totally ok. It works, right? Plane type does not matter if the
KMS driver accepts the configuration.

What is a "scanout plane"? Aren't all KMS planes by definition scanout
planes?

IOW, if the KMS client understands zpos and can do a proper KMS
configuration search, and all planes have zpos property, then there is
no need to look at the plane type at all. That is the goal of the
universal planes feature.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-12 15:07                 ` Pekka Paalanen
@ 2024-04-12 15:31                   ` Alex Deucher
  2024-04-12 20:14                     ` Leo Li
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Deucher @ 2024-04-12 15:31 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Leo Li, Marius Vlad, Harry Wentland, dri-devel, amd-gfx,
	Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Robert Mader, Sean Paul, Simon Ser,
	Shashank Sharma, Sebastian Wick

On Fri, Apr 12, 2024 at 11:08 AM Pekka Paalanen
<pekka.paalanen@collabora.com> wrote:
>
> On Fri, 12 Apr 2024 10:28:52 -0400
> Leo Li <sunpeng.li@amd.com> wrote:
>
> > On 2024-04-12 04:03, Pekka Paalanen wrote:
> > > On Thu, 11 Apr 2024 16:33:57 -0400
> > > Leo Li <sunpeng.li@amd.com> wrote:
> > >
>
> ...
>
> > >> That begs the question of what can be nailed down and what can left to
> > >> independent implementation. I guess things like which plane should be enabled
> > >> first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
> > >> can be defined. How to handle atomic test failures could be as well.
> > >
> > > What room is there for the interpretation of zpos values?
> > >
> > > I thought they are unambiguous already: only the relative numerical
> > > order matters, and that uniquely defines the KMS plane ordering.
> >
> > The zpos value of the PRIMARY plane relative to OVERLAYS, for example, as a way
> > for vendors to communicate overlay, underlay, or mixed-arrangement support. I
> > don't think allowing OVERLAYs to be placed under the PRIMARY is currently
> > documented as a way to support underlay.
>
> I always thought it's obvious that the zpos numbers dictate the plane
> order without any other rules. After all, we have the universal planes
> concept, where the plane type is only informational to aid heuristics
> rather than defining anything.
>
> Only if the zpos property does not exist, the plane types would come
> into play.
>
> Of course, if there actually exists userspace that fails if zpos allows
> an overlay type plane to be placed below primary, or fails if primary
> zpos is not zero, then DRM needs a new client cap.
>
> > libliftoff for example, assumes that the PRIMARY has the lowest zpos. So
> > underlay arrangements will use an OVERLAY for the scanout plane, and the PRIMARY
> > for the underlay view.
>
> That's totally ok. It works, right? Plane type does not matter if the
> KMS driver accepts the configuration.
>
> What is a "scanout plane"? Aren't all KMS planes by definition scanout
> planes?
>
> IOW, if the KMS client understands zpos and can do a proper KMS
> configuration search, and all planes have zpos property, then there is
> no need to look at the plane type at all. That is the goal of the
> universal planes feature.

The optimal configuration with DCN hardware is using underlays.  E.g.,
the desktop plane would be at the top and would have holes cut out of
it for videos or windows that want their own plane.  If you do it the
other way around, there are lots of limitations.

Alex

>
>
> Thanks,
> pq

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-12 15:31                   ` Alex Deucher
@ 2024-04-12 20:14                     ` Leo Li
  2024-04-15  8:19                       ` Pekka Paalanen
  0 siblings, 1 reply; 27+ messages in thread
From: Leo Li @ 2024-04-12 20:14 UTC (permalink / raw)
  To: Alex Deucher, Pekka Paalanen
  Cc: Marius Vlad, Harry Wentland, dri-devel, amd-gfx, Joshua Ashton,
	Michel Dänzer, Chao Guo, Xaver Hugl, Vikas Korjani,
	Robert Mader, Sean Paul, Simon Ser, Shashank Sharma,
	Sebastian Wick



On 2024-04-12 11:31, Alex Deucher wrote:
> On Fri, Apr 12, 2024 at 11:08 AM Pekka Paalanen
> <pekka.paalanen@collabora.com> wrote:
>>
>> On Fri, 12 Apr 2024 10:28:52 -0400
>> Leo Li <sunpeng.li@amd.com> wrote:
>>
>>> On 2024-04-12 04:03, Pekka Paalanen wrote:
>>>> On Thu, 11 Apr 2024 16:33:57 -0400
>>>> Leo Li <sunpeng.li@amd.com> wrote:
>>>>
>>
>> ...
>>
>>>>> That begs the question of what can be nailed down and what can left to
>>>>> independent implementation. I guess things like which plane should be enabled
>>>>> first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
>>>>> can be defined. How to handle atomic test failures could be as well.
>>>>
>>>> What room is there for the interpretation of zpos values?
>>>>
>>>> I thought they are unambiguous already: only the relative numerical
>>>> order matters, and that uniquely defines the KMS plane ordering.
>>>
>>> The zpos value of the PRIMARY plane relative to OVERLAYS, for example, as a way
>>> for vendors to communicate overlay, underlay, or mixed-arrangement support. I
>>> don't think allowing OVERLAYs to be placed under the PRIMARY is currently
>>> documented as a way to support underlay.
>>
>> I always thought it's obvious that the zpos numbers dictate the plane
>> order without any other rules. After all, we have the universal planes
>> concept, where the plane type is only informational to aid heuristics
>> rather than defining anything.
>>
>> Only if the zpos property does not exist, the plane types would come
>> into play.
>>
>> Of course, if there actually exists userspace that fails if zpos allows
>> an overlay type plane to be placed below primary, or fails if primary
>> zpos is not zero, then DRM needs a new client cap.

Right, it wasn't immediately clear to me that the API allowed placement of 
things beneath the PRIMARY. But reading the docs for drm_plane_create_zpos*, 
there's nothing that forbids it.

>>
>>> libliftoff for example, assumes that the PRIMARY has the lowest zpos. So
>>> underlay arrangements will use an OVERLAY for the scanout plane, and the PRIMARY
>>> for the underlay view.
>>
>> That's totally ok. It works, right? Plane type does not matter if the
>> KMS driver accepts the configuration.
>>
>> What is a "scanout plane"? Aren't all KMS planes by definition scanout
>> planes?

Pardon my terminology, I thought the scanout plane was where weston rendered
non-offloadable surfaces to. I guess it's more correct to call it the "render
plane". On weston, it seems to be always assigned to the PRIMARY.


For libliftoff, using OVERLAYs as the render plane and PRIMARY as the underlay
plane would work. But I think keeping the render plane on PRIMARY (a la weston)
makes underlay arrangements easier to allocate, and would be nice to incorporate
into a shared algorithm.

In an underlay arrangement, pushing down an OVERLAY's zpos below the PRIMARY's
zpos is simpler than swapping their surfaces. If such an arrangement fails
atomic_test, we won't have to worry about swapping the surfaces back. Of course,
it's not that we can't keep track of that in the algorithm, but I think it does
make things easier.

It may help with reducing the amount of atomic tests. Assuming that the same DRM
plane provides the same format/color management/transformation support
regardless of it's zpos, we should be able to reasonably expect that changing
it's z-ordering will not cause atomic_test failures (or at least, expect less
causes for failure). In other words, swapping the render plane from the PRIMARY
to an OVERLAY might have more causes for an atomic_test fail, versus changing
their z-ordering. The driver might have to do more things under-the-hood to
provide this consistent behavior, but I think that's the right place for it.
After all, drivers should know more about their hardware's behavior.

The assumption that the PRIMARY has the lowest zpos isn't always true. I
was made aware that the imx8mq platform places all of their OVERLAYS beneath the
PRIMARY. Granted, the KMS code for enabling OVERLAYS is not upstream yet, but it
is available from this thread:
https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2319898
. I guess this is more of a bad assumption that should be fixed in libliftoff.

>>
>> IOW, if the KMS client understands zpos and can do a proper KMS
>> configuration search, and all planes have zpos property, then there is
>> no need to look at the plane type at all. That is the goal of the
>> universal planes feature.
> 
> The optimal configuration with DCN hardware is using underlays.  E.g.,
> the desktop plane would be at the top and would have holes cut out of
> it for videos or windows that want their own plane.  If you do it the
> other way around, there are lots of limitations.
> 
> Alex

Right, patch 1/2 tries to work around one of these limitations (cursor-on-yuv). 
Others have mentioned we can do the same for scaling.

Thanks,
Leo

> 
>>
>>
>> Thanks,
>> pq

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-12 20:14                     ` Leo Li
@ 2024-04-15  8:19                       ` Pekka Paalanen
  2024-04-15 22:33                         ` Leo Li
  0 siblings, 1 reply; 27+ messages in thread
From: Pekka Paalanen @ 2024-04-15  8:19 UTC (permalink / raw)
  To: Leo Li
  Cc: Alex Deucher, Marius Vlad, Harry Wentland, dri-devel, amd-gfx,
	Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Robert Mader, Sean Paul, Simon Ser,
	Shashank Sharma, Sebastian Wick

[-- Attachment #1: Type: text/plain, Size: 6945 bytes --]

On Fri, 12 Apr 2024 16:14:28 -0400
Leo Li <sunpeng.li@amd.com> wrote:

> On 2024-04-12 11:31, Alex Deucher wrote:
> > On Fri, Apr 12, 2024 at 11:08 AM Pekka Paalanen
> > <pekka.paalanen@collabora.com> wrote:  
> >>
> >> On Fri, 12 Apr 2024 10:28:52 -0400
> >> Leo Li <sunpeng.li@amd.com> wrote:
> >>  
> >>> On 2024-04-12 04:03, Pekka Paalanen wrote:  
> >>>> On Thu, 11 Apr 2024 16:33:57 -0400
> >>>> Leo Li <sunpeng.li@amd.com> wrote:
> >>>>  
> >>
> >> ...
> >>  
> >>>>> That begs the question of what can be nailed down and what can left to
> >>>>> independent implementation. I guess things like which plane should be enabled
> >>>>> first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
> >>>>> can be defined. How to handle atomic test failures could be as well.  
> >>>>
> >>>> What room is there for the interpretation of zpos values?
> >>>>
> >>>> I thought they are unambiguous already: only the relative numerical
> >>>> order matters, and that uniquely defines the KMS plane ordering.  
> >>>
> >>> The zpos value of the PRIMARY plane relative to OVERLAYS, for example, as a way
> >>> for vendors to communicate overlay, underlay, or mixed-arrangement support. I
> >>> don't think allowing OVERLAYs to be placed under the PRIMARY is currently
> >>> documented as a way to support underlay.  
> >>
> >> I always thought it's obvious that the zpos numbers dictate the plane
> >> order without any other rules. After all, we have the universal planes
> >> concept, where the plane type is only informational to aid heuristics
> >> rather than defining anything.
> >>
> >> Only if the zpos property does not exist, the plane types would come
> >> into play.
> >>
> >> Of course, if there actually exists userspace that fails if zpos allows
> >> an overlay type plane to be placed below primary, or fails if primary
> >> zpos is not zero, then DRM needs a new client cap.  
> 
> Right, it wasn't immediately clear to me that the API allowed placement of 
> things beneath the PRIMARY. But reading the docs for drm_plane_create_zpos*, 
> there's nothing that forbids it.
> 
> >>  
> >>> libliftoff for example, assumes that the PRIMARY has the lowest zpos. So
> >>> underlay arrangements will use an OVERLAY for the scanout plane, and the PRIMARY
> >>> for the underlay view.  
> >>
> >> That's totally ok. It works, right? Plane type does not matter if the
> >> KMS driver accepts the configuration.
> >>
> >> What is a "scanout plane"? Aren't all KMS planes by definition scanout
> >> planes?  
> 
> Pardon my terminology, I thought the scanout plane was where weston rendered
> non-offloadable surfaces to. I guess it's more correct to call it the "render
> plane". On weston, it seems to be always assigned to the PRIMARY.
> 

The assignment restriction is just technical design debt. It is
limiting. There is no other good reason for it, than when lighting
up a CRTC for the first time, Weston should do it with the renderer FB
only, on the plane that is most likely to succeed i.e. PRIMARY. After
the CRTC is lit, there should be no built-in limitations in what can go
where.

The reason for this is that if a CRTC can be activated, it must always
be able to show the renderer FB without incurring a modeset. This is
important for ensuring that the fallback compositing (renderer) is
always possible. So we start with that configuration, and everything
else is optional bonus.

> 
> For libliftoff, using OVERLAYs as the render plane and PRIMARY as the underlay
> plane would work. But I think keeping the render plane on PRIMARY (a la weston)
> makes underlay arrangements easier to allocate, and would be nice to incorporate
> into a shared algorithm.

If zpos exists, I don't think such limitation is a good idea. It will
just limit the possible configurations for no reason.

With zpos, the KMS plane type should be irrelevant for their
z-ordering. Underlay vs. overlay completely loses its meaning at the
KMS level.

> In an underlay arrangement, pushing down an OVERLAY's zpos below the PRIMARY's
> zpos is simpler than swapping their surfaces. If such an arrangement fails
> atomic_test, we won't have to worry about swapping the surfaces back. Of course,
> it's not that we can't keep track of that in the algorithm, but I think it does
> make things easier.

There is no "swapping" or "swapping back". The tentative configuration
is created as a new object that contains the complete CRTC+connector
state, and if it doesn't work, it's simply destroyed. In Weston at
least, I don't know of libliftoff.

One surface could also be assigned to multiple KMS planes for different
CRTCs, so there should be no 1:1 association in the first place.

> It may help with reducing the amount of atomic tests. Assuming that the same DRM
> plane provides the same format/color management/transformation support
> regardless of it's zpos,

I would definitely expect so.

> we should be able to reasonably expect that changing
> it's z-ordering will not cause atomic_test failures (or at least, expect less
> causes for failure). In other words, swapping the render plane from the PRIMARY
> to an OVERLAY might have more causes for an atomic_test fail, versus changing
> their z-ordering. The driver might have to do more things under-the-hood to
> provide this consistent behavior, but I think that's the right place for it.
> After all, drivers should know more about their hardware's behavior.

Indeed.

> The assumption that the PRIMARY has the lowest zpos isn't always true. I
> was made aware that the imx8mq platform places all of their OVERLAYS beneath the
> PRIMARY. Granted, the KMS code for enabling OVERLAYS is not upstream yet, but it
> is available from this thread:
> https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2319898
> . I guess this is more of a bad assumption that should be fixed in libliftoff.

Weston needs fixing too, at least in case a renderer FB is used on the
CRTC. Weston has two problems: renderer FB is always on PRIMARY plane,
and renderer FB is always completely opaque.


Thanks,
pq

> >>
> >> IOW, if the KMS client understands zpos and can do a proper KMS
> >> configuration search, and all planes have zpos property, then there is
> >> no need to look at the plane type at all. That is the goal of the
> >> universal planes feature.  
> > 
> > The optimal configuration with DCN hardware is using underlays.  E.g.,
> > the desktop plane would be at the top and would have holes cut out of
> > it for videos or windows that want their own plane.  If you do it the
> > other way around, there are lots of limitations.
> > 
> > Alex  
> 
> Right, patch 1/2 tries to work around one of these limitations (cursor-on-yuv). 
> Others have mentioned we can do the same for scaling.
> 
> Thanks,
> Leo

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-15  8:19                       ` Pekka Paalanen
@ 2024-04-15 22:33                         ` Leo Li
  2024-04-16  8:01                           ` Pekka Paalanen
  0 siblings, 1 reply; 27+ messages in thread
From: Leo Li @ 2024-04-15 22:33 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Alex Deucher, Marius Vlad, Harry Wentland, dri-devel, amd-gfx,
	Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Robert Mader, Sean Paul, Simon Ser,
	Shashank Sharma, Sebastian Wick




On 2024-04-15 04:19, Pekka Paalanen wrote:
> On Fri, 12 Apr 2024 16:14:28 -0400
> Leo Li <sunpeng.li@amd.com> wrote:
> 
>> On 2024-04-12 11:31, Alex Deucher wrote:
>>> On Fri, Apr 12, 2024 at 11:08 AM Pekka Paalanen
>>> <pekka.paalanen@collabora.com> wrote:
>>>>
>>>> On Fri, 12 Apr 2024 10:28:52 -0400
>>>> Leo Li <sunpeng.li@amd.com> wrote:
>>>>   
>>>>> On 2024-04-12 04:03, Pekka Paalanen wrote:
>>>>>> On Thu, 11 Apr 2024 16:33:57 -0400
>>>>>> Leo Li <sunpeng.li@amd.com> wrote:
>>>>>>   
>>>>
>>>> ...
>>>>   
>>>>>>> That begs the question of what can be nailed down and what can left to
>>>>>>> independent implementation. I guess things like which plane should be enabled
>>>>>>> first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
>>>>>>> can be defined. How to handle atomic test failures could be as well.
>>>>>>
>>>>>> What room is there for the interpretation of zpos values?
>>>>>>
>>>>>> I thought they are unambiguous already: only the relative numerical
>>>>>> order matters, and that uniquely defines the KMS plane ordering.
>>>>>
>>>>> The zpos value of the PRIMARY plane relative to OVERLAYS, for example, as a way
>>>>> for vendors to communicate overlay, underlay, or mixed-arrangement support. I
>>>>> don't think allowing OVERLAYs to be placed under the PRIMARY is currently
>>>>> documented as a way to support underlay.
>>>>
>>>> I always thought it's obvious that the zpos numbers dictate the plane
>>>> order without any other rules. After all, we have the universal planes
>>>> concept, where the plane type is only informational to aid heuristics
>>>> rather than defining anything.
>>>>
>>>> Only if the zpos property does not exist, the plane types would come
>>>> into play.
>>>>
>>>> Of course, if there actually exists userspace that fails if zpos allows
>>>> an overlay type plane to be placed below primary, or fails if primary
>>>> zpos is not zero, then DRM needs a new client cap.
>>
>> Right, it wasn't immediately clear to me that the API allowed placement of
>> things beneath the PRIMARY. But reading the docs for drm_plane_create_zpos*,
>> there's nothing that forbids it.
>>
>>>>   
>>>>> libliftoff for example, assumes that the PRIMARY has the lowest zpos. So
>>>>> underlay arrangements will use an OVERLAY for the scanout plane, and the PRIMARY
>>>>> for the underlay view.
>>>>
>>>> That's totally ok. It works, right? Plane type does not matter if the
>>>> KMS driver accepts the configuration.
>>>>
>>>> What is a "scanout plane"? Aren't all KMS planes by definition scanout
>>>> planes?
>>
>> Pardon my terminology, I thought the scanout plane was where weston rendered
>> non-offloadable surfaces to. I guess it's more correct to call it the "render
>> plane". On weston, it seems to be always assigned to the PRIMARY.
>>
> 
> The assignment restriction is just technical design debt. It is
> limiting. There is no other good reason for it, than when lighting
> up a CRTC for the first time, Weston should do it with the renderer FB
> only, on the plane that is most likely to succeed i.e. PRIMARY. After
> the CRTC is lit, there should be no built-in limitations in what can go
> where.
> 
> The reason for this is that if a CRTC can be activated, it must always
> be able to show the renderer FB without incurring a modeset. This is
> important for ensuring that the fallback compositing (renderer) is
> always possible. So we start with that configuration, and everything
> else is optional bonus.

Genuinely curious - What exactly is limiting with keeping the renderer FB on
PRIMARY? IOW, what is the additional benefit of placing the renderer FB on
something other than PRIMARY?

> 
>>
>> For libliftoff, using OVERLAYs as the render plane and PRIMARY as the underlay
>> plane would work. But I think keeping the render plane on PRIMARY (a la weston)
>> makes underlay arrangements easier to allocate, and would be nice to incorporate
>> into a shared algorithm.
> 
> If zpos exists, I don't think such limitation is a good idea. It will
> just limit the possible configurations for no reason.
> 
> With zpos, the KMS plane type should be irrelevant for their
> z-ordering. Underlay vs. overlay completely loses its meaning at the
> KMS level.

Right, the plane types loose their meanings. But at least with the way
libliftoff builds the plane arrangement, where we first allocate the renderer fb
matters.

libliftoff incrementally builds the atomic state by adding a single plane to the
atomic state, then testing it. It essentially does a depth-first-search of all
possible arrangements, pruning the search on atomic test fail. The state that
offloads the most number of FBs will be the arrangement used.

Of course, it's unlikely that the entire DFS tree will traversed in time for a
frame. So the key is to search the most probable and high-benefit branches
first, while minimizing the # of atomic tests needed, before a hard-coded
deadline is hit.

Following this algorithm, the PRIMARY needs to be enabled first, followed by all
the secondary planes. After a plane is enabled, it's not preferred to change
it's assigned FB, since that can cause the state to be rejected (in actuality,
not just the FB, but also any color and transformation stuffs associated with
the surface). It is preferable to build on the state by enabling another
fb->plane. This is where changing a plane's zpos to be above/below the PRIMARY
is advantageous, rather than changing the FBs assigned, to accommodate
overlay/underlay arrangements.

I imagine that any algorithm which incrementally builds up the plane arrangement
will have a similar preference. Of course, it's entirely possible that such an
algorithm isn't the best, I admittedly have not thought much about other
possibilities, yet...

Thanks,
Leo

> 
>> In an underlay arrangement, pushing down an OVERLAY's zpos below the PRIMARY's
>> zpos is simpler than swapping their surfaces. If such an arrangement fails
>> atomic_test, we won't have to worry about swapping the surfaces back. Of course,
>> it's not that we can't keep track of that in the algorithm, but I think it does
>> make things easier.
> 
> There is no "swapping" or "swapping back". The tentative configuration
> is created as a new object that contains the complete CRTC+connector
> state, and if it doesn't work, it's simply destroyed. In Weston at
> least, I don't know of libliftoff.
> 
> One surface could also be assigned to multiple KMS planes for different
> CRTCs, so there should be no 1:1 association in the first place.
> 
>> It may help with reducing the amount of atomic tests. Assuming that the same DRM
>> plane provides the same format/color management/transformation support
>> regardless of it's zpos,
> 
> I would definitely expect so.
> 
>> we should be able to reasonably expect that changing
>> it's z-ordering will not cause atomic_test failures (or at least, expect less
>> causes for failure). In other words, swapping the render plane from the PRIMARY
>> to an OVERLAY might have more causes for an atomic_test fail, versus changing
>> their z-ordering. The driver might have to do more things under-the-hood to
>> provide this consistent behavior, but I think that's the right place for it.
>> After all, drivers should know more about their hardware's behavior.
> 
> Indeed.
> 
>> The assumption that the PRIMARY has the lowest zpos isn't always true. I
>> was made aware that the imx8mq platform places all of their OVERLAYS beneath the
>> PRIMARY. Granted, the KMS code for enabling OVERLAYS is not upstream yet, but it
>> is available from this thread:
>> https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1258#note_2319898
>> . I guess this is more of a bad assumption that should be fixed in libliftoff.
> 
> Weston needs fixing too, at least in case a renderer FB is used on the
> CRTC. Weston has two problems: renderer FB is always on PRIMARY plane,
> and renderer FB is always completely opaque.
> 
> 
> Thanks,
> pq
> 
>>>>
>>>> IOW, if the KMS client understands zpos and can do a proper KMS
>>>> configuration search, and all planes have zpos property, then there is
>>>> no need to look at the plane type at all. That is the goal of the
>>>> universal planes feature.
>>>
>>> The optimal configuration with DCN hardware is using underlays.  E.g.,
>>> the desktop plane would be at the top and would have holes cut out of
>>> it for videos or windows that want their own plane.  If you do it the
>>> other way around, there are lots of limitations.
>>>
>>> Alex
>>
>> Right, patch 1/2 tries to work around one of these limitations (cursor-on-yuv).
>> Others have mentioned we can do the same for scaling.
>>
>> Thanks,
>> Leo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-15 22:33                         ` Leo Li
@ 2024-04-16  8:01                           ` Pekka Paalanen
  2024-04-16 14:10                             ` Harry Wentland
  0 siblings, 1 reply; 27+ messages in thread
From: Pekka Paalanen @ 2024-04-16  8:01 UTC (permalink / raw)
  To: Leo Li
  Cc: Alex Deucher, Marius Vlad, Harry Wentland, dri-devel, amd-gfx,
	Joshua Ashton, Michel Dänzer, Chao Guo, Xaver Hugl,
	Vikas Korjani, Robert Mader, Sean Paul, Simon Ser,
	Shashank Sharma, Sebastian Wick

[-- Attachment #1: Type: text/plain, Size: 7131 bytes --]

On Mon, 15 Apr 2024 18:33:39 -0400
Leo Li <sunpeng.li@amd.com> wrote:

> On 2024-04-15 04:19, Pekka Paalanen wrote:
> > On Fri, 12 Apr 2024 16:14:28 -0400
> > Leo Li <sunpeng.li@amd.com> wrote:
> >   
> >> On 2024-04-12 11:31, Alex Deucher wrote:  
> >>> On Fri, Apr 12, 2024 at 11:08 AM Pekka Paalanen
> >>> <pekka.paalanen@collabora.com> wrote:  
> >>>>
> >>>> On Fri, 12 Apr 2024 10:28:52 -0400
> >>>> Leo Li <sunpeng.li@amd.com> wrote:
> >>>>     
> >>>>> On 2024-04-12 04:03, Pekka Paalanen wrote:  
> >>>>>> On Thu, 11 Apr 2024 16:33:57 -0400
> >>>>>> Leo Li <sunpeng.li@amd.com> wrote:
> >>>>>>     
> >>>>
> >>>> ...
> >>>>     
> >>>>>>> That begs the question of what can be nailed down and what can left to
> >>>>>>> independent implementation. I guess things like which plane should be enabled
> >>>>>>> first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
> >>>>>>> can be defined. How to handle atomic test failures could be as well.  
> >>>>>>
> >>>>>> What room is there for the interpretation of zpos values?
> >>>>>>
> >>>>>> I thought they are unambiguous already: only the relative numerical
> >>>>>> order matters, and that uniquely defines the KMS plane ordering.  
> >>>>>
> >>>>> The zpos value of the PRIMARY plane relative to OVERLAYS, for example, as a way
> >>>>> for vendors to communicate overlay, underlay, or mixed-arrangement support. I
> >>>>> don't think allowing OVERLAYs to be placed under the PRIMARY is currently
> >>>>> documented as a way to support underlay.  
> >>>>
> >>>> I always thought it's obvious that the zpos numbers dictate the plane
> >>>> order without any other rules. After all, we have the universal planes
> >>>> concept, where the plane type is only informational to aid heuristics
> >>>> rather than defining anything.
> >>>>
> >>>> Only if the zpos property does not exist, the plane types would come
> >>>> into play.
> >>>>
> >>>> Of course, if there actually exists userspace that fails if zpos allows
> >>>> an overlay type plane to be placed below primary, or fails if primary
> >>>> zpos is not zero, then DRM needs a new client cap.  
> >>
> >> Right, it wasn't immediately clear to me that the API allowed placement of
> >> things beneath the PRIMARY. But reading the docs for drm_plane_create_zpos*,
> >> there's nothing that forbids it.
> >>  
> >>>>     
> >>>>> libliftoff for example, assumes that the PRIMARY has the lowest zpos. So
> >>>>> underlay arrangements will use an OVERLAY for the scanout plane, and the PRIMARY
> >>>>> for the underlay view.  
> >>>>
> >>>> That's totally ok. It works, right? Plane type does not matter if the
> >>>> KMS driver accepts the configuration.
> >>>>
> >>>> What is a "scanout plane"? Aren't all KMS planes by definition scanout
> >>>> planes?  
> >>
> >> Pardon my terminology, I thought the scanout plane was where weston rendered
> >> non-offloadable surfaces to. I guess it's more correct to call it the "render
> >> plane". On weston, it seems to be always assigned to the PRIMARY.
> >>  
> > 
> > The assignment restriction is just technical design debt. It is
> > limiting. There is no other good reason for it, than when lighting
> > up a CRTC for the first time, Weston should do it with the renderer FB
> > only, on the plane that is most likely to succeed i.e. PRIMARY. After
> > the CRTC is lit, there should be no built-in limitations in what can go
> > where.
> > 
> > The reason for this is that if a CRTC can be activated, it must always
> > be able to show the renderer FB without incurring a modeset. This is
> > important for ensuring that the fallback compositing (renderer) is
> > always possible. So we start with that configuration, and everything
> > else is optional bonus.  
> 
> Genuinely curious - What exactly is limiting with keeping the renderer FB on
> PRIMARY? IOW, what is the additional benefit of placing the renderer FB on
> something other than PRIMARY?

The limitations come from a combination of hardware limitations.
Perhaps zpos is not mutable, or maybe other planes cannot arbitrarily
move between above and below the primary. This reduces the number of
possible configurations, which might cause off-loading to fail.

I think older hardware has more of these arbitrary restrictions.

> >>
> >> For libliftoff, using OVERLAYs as the render plane and PRIMARY as the underlay
> >> plane would work. But I think keeping the render plane on PRIMARY (a la weston)
> >> makes underlay arrangements easier to allocate, and would be nice to incorporate
> >> into a shared algorithm.  
> > 
> > If zpos exists, I don't think such limitation is a good idea. It will
> > just limit the possible configurations for no reason.
> > 
> > With zpos, the KMS plane type should be irrelevant for their
> > z-ordering. Underlay vs. overlay completely loses its meaning at the
> > KMS level.  
> 
> Right, the plane types loose their meanings. But at least with the way
> libliftoff builds the plane arrangement, where we first allocate the renderer fb
> matters.
> 
> libliftoff incrementally builds the atomic state by adding a single plane to the
> atomic state, then testing it. It essentially does a depth-first-search of all
> possible arrangements, pruning the search on atomic test fail. The state that
> offloads the most number of FBs will be the arrangement used.
> 
> Of course, it's unlikely that the entire DFS tree will traversed in time for a
> frame. So the key is to search the most probable and high-benefit branches
> first, while minimizing the # of atomic tests needed, before a hard-coded
> deadline is hit.
> 
> Following this algorithm, the PRIMARY needs to be enabled first, followed by all
> the secondary planes. After a plane is enabled, it's not preferred to change
> it's assigned FB, since that can cause the state to be rejected (in actuality,
> not just the FB, but also any color and transformation stuffs associated with
> the surface). It is preferable to build on the state by enabling another
> fb->plane. This is where changing a plane's zpos to be above/below the PRIMARY
> is advantageous, rather than changing the FBs assigned, to accommodate
> overlay/underlay arrangements.

This all sounds reasonable, but why limit this to only the renderer FB
on primary plane? The same idea should apply equally to any FB on any
plane. Then one needs more heuristics on when to stop the search short,
and when to reconsider each FB-plane assignment in case new candidates
have appeared but the old ones have not disappeared.

> I imagine that any algorithm which incrementally builds up the plane arrangement
> will have a similar preference. Of course, it's entirely possible that such an
> algorithm isn't the best, I admittedly have not thought much about other
> possibilities, yet...

It's a complicated problem, indeed. Maybe there needs to be a background
task that is not limited by the page flip deadline and can do an
exhaustive search over many refresh periods.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-16  8:01                           ` Pekka Paalanen
@ 2024-04-16 14:10                             ` Harry Wentland
  2024-04-17 18:51                               ` Leo Li
  0 siblings, 1 reply; 27+ messages in thread
From: Harry Wentland @ 2024-04-16 14:10 UTC (permalink / raw)
  To: Pekka Paalanen, Leo Li
  Cc: Alex Deucher, Marius Vlad, dri-devel, amd-gfx, Joshua Ashton,
	Michel Dänzer, Chao Guo, Xaver Hugl, Vikas Korjani,
	Robert Mader, Sean Paul, Simon Ser, Shashank Sharma,
	Sebastian Wick



On 2024-04-16 04:01, Pekka Paalanen wrote:
> On Mon, 15 Apr 2024 18:33:39 -0400
> Leo Li <sunpeng.li@amd.com> wrote:
> 
>> On 2024-04-15 04:19, Pekka Paalanen wrote:
>>> On Fri, 12 Apr 2024 16:14:28 -0400
>>> Leo Li <sunpeng.li@amd.com> wrote:
>>>   
>>>> On 2024-04-12 11:31, Alex Deucher wrote:  
>>>>> On Fri, Apr 12, 2024 at 11:08 AM Pekka Paalanen
>>>>> <pekka.paalanen@collabora.com> wrote:  
>>>>>>
>>>>>> On Fri, 12 Apr 2024 10:28:52 -0400
>>>>>> Leo Li <sunpeng.li@amd.com> wrote:
>>>>>>     
>>>>>>> On 2024-04-12 04:03, Pekka Paalanen wrote:  
>>>>>>>> On Thu, 11 Apr 2024 16:33:57 -0400
>>>>>>>> Leo Li <sunpeng.li@amd.com> wrote:
>>>>>>>>     
>>>>>>
>>>>>> ...
>>>>>>     
>>>>>>>>> That begs the question of what can be nailed down and what can left to
>>>>>>>>> independent implementation. I guess things like which plane should be enabled
>>>>>>>>> first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
>>>>>>>>> can be defined. How to handle atomic test failures could be as well.  
>>>>>>>>
>>>>>>>> What room is there for the interpretation of zpos values?
>>>>>>>>
>>>>>>>> I thought they are unambiguous already: only the relative numerical
>>>>>>>> order matters, and that uniquely defines the KMS plane ordering.  
>>>>>>>
>>>>>>> The zpos value of the PRIMARY plane relative to OVERLAYS, for example, as a way
>>>>>>> for vendors to communicate overlay, underlay, or mixed-arrangement support. I
>>>>>>> don't think allowing OVERLAYs to be placed under the PRIMARY is currently
>>>>>>> documented as a way to support underlay.  
>>>>>>
>>>>>> I always thought it's obvious that the zpos numbers dictate the plane
>>>>>> order without any other rules. After all, we have the universal planes
>>>>>> concept, where the plane type is only informational to aid heuristics
>>>>>> rather than defining anything.
>>>>>>
>>>>>> Only if the zpos property does not exist, the plane types would come
>>>>>> into play.
>>>>>>
>>>>>> Of course, if there actually exists userspace that fails if zpos allows
>>>>>> an overlay type plane to be placed below primary, or fails if primary
>>>>>> zpos is not zero, then DRM needs a new client cap.  
>>>>
>>>> Right, it wasn't immediately clear to me that the API allowed placement of
>>>> things beneath the PRIMARY. But reading the docs for drm_plane_create_zpos*,
>>>> there's nothing that forbids it.
>>>>  
>>>>>>     
>>>>>>> libliftoff for example, assumes that the PRIMARY has the lowest zpos. So
>>>>>>> underlay arrangements will use an OVERLAY for the scanout plane, and the PRIMARY
>>>>>>> for the underlay view.  
>>>>>>
>>>>>> That's totally ok. It works, right? Plane type does not matter if the
>>>>>> KMS driver accepts the configuration.
>>>>>>
>>>>>> What is a "scanout plane"? Aren't all KMS planes by definition scanout
>>>>>> planes?  
>>>>
>>>> Pardon my terminology, I thought the scanout plane was where weston rendered
>>>> non-offloadable surfaces to. I guess it's more correct to call it the "render
>>>> plane". On weston, it seems to be always assigned to the PRIMARY.
>>>>  
>>>
>>> The assignment restriction is just technical design debt. It is
>>> limiting. There is no other good reason for it, than when lighting
>>> up a CRTC for the first time, Weston should do it with the renderer FB
>>> only, on the plane that is most likely to succeed i.e. PRIMARY. After
>>> the CRTC is lit, there should be no built-in limitations in what can go
>>> where.
>>>
>>> The reason for this is that if a CRTC can be activated, it must always
>>> be able to show the renderer FB without incurring a modeset. This is
>>> important for ensuring that the fallback compositing (renderer) is
>>> always possible. So we start with that configuration, and everything
>>> else is optional bonus.  
>>
>> Genuinely curious - What exactly is limiting with keeping the renderer FB on
>> PRIMARY? IOW, what is the additional benefit of placing the renderer FB on
>> something other than PRIMARY?
> 
> The limitations come from a combination of hardware limitations.
> Perhaps zpos is not mutable, or maybe other planes cannot arbitrarily
> move between above and below the primary. This reduces the number of
> possible configurations, which might cause off-loading to fail.
> 
> I think older hardware has more of these arbitrary restrictions.
> 
>>>>
>>>> For libliftoff, using OVERLAYs as the render plane and PRIMARY as the underlay
>>>> plane would work. But I think keeping the render plane on PRIMARY (a la weston)
>>>> makes underlay arrangements easier to allocate, and would be nice to incorporate
>>>> into a shared algorithm.  
>>>
>>> If zpos exists, I don't think such limitation is a good idea. It will
>>> just limit the possible configurations for no reason.
>>>
>>> With zpos, the KMS plane type should be irrelevant for their
>>> z-ordering. Underlay vs. overlay completely loses its meaning at the
>>> KMS level.  
>>
>> Right, the plane types loose their meanings. But at least with the way
>> libliftoff builds the plane arrangement, where we first allocate the renderer fb
>> matters.
>>
>> libliftoff incrementally builds the atomic state by adding a single plane to the
>> atomic state, then testing it. It essentially does a depth-first-search of all
>> possible arrangements, pruning the search on atomic test fail. The state that
>> offloads the most number of FBs will be the arrangement used.
>>
>> Of course, it's unlikely that the entire DFS tree will traversed in time for a
>> frame. So the key is to search the most probable and high-benefit branches
>> first, while minimizing the # of atomic tests needed, before a hard-coded
>> deadline is hit.
>>
>> Following this algorithm, the PRIMARY needs to be enabled first, followed by all
>> the secondary planes. After a plane is enabled, it's not preferred to change
>> it's assigned FB, since that can cause the state to be rejected (in actuality,
>> not just the FB, but also any color and transformation stuffs associated with
>> the surface). It is preferable to build on the state by enabling another
>> fb->plane. This is where changing a plane's zpos to be above/below the PRIMARY
>> is advantageous, rather than changing the FBs assigned, to accommodate
>> overlay/underlay arrangements.
> 
> This all sounds reasonable, but why limit this to only the renderer FB
> on primary plane? The same idea should apply equally to any FB on any
> plane. Then one needs more heuristics on when to stop the search short,
> and when to reconsider each FB-plane assignment in case new candidates
> have appeared but the old ones have not disappeared.
> 
>> I imagine that any algorithm which incrementally builds up the plane arrangement
>> will have a similar preference. Of course, it's entirely possible that such an
>> algorithm isn't the best, I admittedly have not thought much about other
>> possibilities, yet...
> 
> It's a complicated problem, indeed. Maybe there needs to be a background
> task that is not limited by the page flip deadline and can do an
> exhaustive search over many refresh periods.
> 

That would be nice. Kick this off when there is a configuration change,
e.g., user starts video playback, opens a new video, etc.

One would need to avoid doing too much of that, though, as one could
envision scenarios where this happens frequently and could have its
own impact on power by keeping the CPU busy.

Harry

> 
> Thanks,
> pq


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible
  2024-04-16 14:10                             ` Harry Wentland
@ 2024-04-17 18:51                               ` Leo Li
  0 siblings, 0 replies; 27+ messages in thread
From: Leo Li @ 2024-04-17 18:51 UTC (permalink / raw)
  To: Harry Wentland, Pekka Paalanen, Simon Ser
  Cc: Alex Deucher, Marius Vlad, dri-devel, amd-gfx, Joshua Ashton,
	Michel Dänzer, Chao Guo, Xaver Hugl, Vikas Korjani,
	Robert Mader, Sean Paul, Shashank Sharma, Sebastian Wick




On 2024-04-16 10:10, Harry Wentland wrote:
> 
> 
> On 2024-04-16 04:01, Pekka Paalanen wrote:
>> On Mon, 15 Apr 2024 18:33:39 -0400
>> Leo Li <sunpeng.li@amd.com> wrote:
>>
>>> On 2024-04-15 04:19, Pekka Paalanen wrote:
>>>> On Fri, 12 Apr 2024 16:14:28 -0400
>>>> Leo Li <sunpeng.li@amd.com> wrote:
>>>>    
>>>>> On 2024-04-12 11:31, Alex Deucher wrote:
>>>>>> On Fri, Apr 12, 2024 at 11:08 AM Pekka Paalanen
>>>>>> <pekka.paalanen@collabora.com> wrote:
>>>>>>>
>>>>>>> On Fri, 12 Apr 2024 10:28:52 -0400
>>>>>>> Leo Li <sunpeng.li@amd.com> wrote:
>>>>>>>      
>>>>>>>> On 2024-04-12 04:03, Pekka Paalanen wrote:
>>>>>>>>> On Thu, 11 Apr 2024 16:33:57 -0400
>>>>>>>>> Leo Li <sunpeng.li@amd.com> wrote:
>>>>>>>>>      
>>>>>>>
>>>>>>> ...
>>>>>>>      
>>>>>>>>>> That begs the question of what can be nailed down and what can left to
>>>>>>>>>> independent implementation. I guess things like which plane should be enabled
>>>>>>>>>> first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
>>>>>>>>>> can be defined. How to handle atomic test failures could be as well.
>>>>>>>>>
>>>>>>>>> What room is there for the interpretation of zpos values?
>>>>>>>>>
>>>>>>>>> I thought they are unambiguous already: only the relative numerical
>>>>>>>>> order matters, and that uniquely defines the KMS plane ordering.
>>>>>>>>
>>>>>>>> The zpos value of the PRIMARY plane relative to OVERLAYS, for example, as a way
>>>>>>>> for vendors to communicate overlay, underlay, or mixed-arrangement support. I
>>>>>>>> don't think allowing OVERLAYs to be placed under the PRIMARY is currently
>>>>>>>> documented as a way to support underlay.
>>>>>>>
>>>>>>> I always thought it's obvious that the zpos numbers dictate the plane
>>>>>>> order without any other rules. After all, we have the universal planes
>>>>>>> concept, where the plane type is only informational to aid heuristics
>>>>>>> rather than defining anything.
>>>>>>>
>>>>>>> Only if the zpos property does not exist, the plane types would come
>>>>>>> into play.
>>>>>>>
>>>>>>> Of course, if there actually exists userspace that fails if zpos allows
>>>>>>> an overlay type plane to be placed below primary, or fails if primary
>>>>>>> zpos is not zero, then DRM needs a new client cap.
>>>>>
>>>>> Right, it wasn't immediately clear to me that the API allowed placement of
>>>>> things beneath the PRIMARY. But reading the docs for drm_plane_create_zpos*,
>>>>> there's nothing that forbids it.
>>>>>   
>>>>>>>      
>>>>>>>> libliftoff for example, assumes that the PRIMARY has the lowest zpos. So
>>>>>>>> underlay arrangements will use an OVERLAY for the scanout plane, and the PRIMARY
>>>>>>>> for the underlay view.
>>>>>>>
>>>>>>> That's totally ok. It works, right? Plane type does not matter if the
>>>>>>> KMS driver accepts the configuration.
>>>>>>>
>>>>>>> What is a "scanout plane"? Aren't all KMS planes by definition scanout
>>>>>>> planes?
>>>>>
>>>>> Pardon my terminology, I thought the scanout plane was where weston rendered
>>>>> non-offloadable surfaces to. I guess it's more correct to call it the "render
>>>>> plane". On weston, it seems to be always assigned to the PRIMARY.
>>>>>   
>>>>
>>>> The assignment restriction is just technical design debt. It is
>>>> limiting. There is no other good reason for it, than when lighting
>>>> up a CRTC for the first time, Weston should do it with the renderer FB
>>>> only, on the plane that is most likely to succeed i.e. PRIMARY. After
>>>> the CRTC is lit, there should be no built-in limitations in what can go
>>>> where.
>>>>
>>>> The reason for this is that if a CRTC can be activated, it must always
>>>> be able to show the renderer FB without incurring a modeset. This is
>>>> important for ensuring that the fallback compositing (renderer) is
>>>> always possible. So we start with that configuration, and everything
>>>> else is optional bonus.
>>>
>>> Genuinely curious - What exactly is limiting with keeping the renderer FB on
>>> PRIMARY? IOW, what is the additional benefit of placing the renderer FB on
>>> something other than PRIMARY?
>>
>> The limitations come from a combination of hardware limitations.
>> Perhaps zpos is not mutable, or maybe other planes cannot arbitrarily
>> move between above and below the primary. This reduces the number of
>> possible configurations, which might cause off-loading to fail.
>>
>> I think older hardware has more of these arbitrary restrictions.

I see. I was thinking that drivers can do under-the-hood stuff to present a
mutable zpos to clients, even if their hardware planes cannot be arbitrarily
rearranged, by mapping the PRIMARY to a different hardware plane. But not all
planes have the same function, so this sounds more complicated than helpful.

>>
>>>>>
>>>>> For libliftoff, using OVERLAYs as the render plane and PRIMARY as the underlay
>>>>> plane would work. But I think keeping the render plane on PRIMARY (a la weston)
>>>>> makes underlay arrangements easier to allocate, and would be nice to incorporate
>>>>> into a shared algorithm.
>>>>
>>>> If zpos exists, I don't think such limitation is a good idea. It will
>>>> just limit the possible configurations for no reason.
>>>>
>>>> With zpos, the KMS plane type should be irrelevant for their
>>>> z-ordering. Underlay vs. overlay completely loses its meaning at the
>>>> KMS level.
>>>
>>> Right, the plane types loose their meanings. But at least with the way
>>> libliftoff builds the plane arrangement, where we first allocate the renderer fb
>>> matters.
>>>
>>> libliftoff incrementally builds the atomic state by adding a single plane to the
>>> atomic state, then testing it. It essentially does a depth-first-search of all
>>> possible arrangements, pruning the search on atomic test fail. The state that
>>> offloads the most number of FBs will be the arrangement used.
>>>
>>> Of course, it's unlikely that the entire DFS tree will traversed in time for a
>>> frame. So the key is to search the most probable and high-benefit branches
>>> first, while minimizing the # of atomic tests needed, before a hard-coded
>>> deadline is hit.
>>>
>>> Following this algorithm, the PRIMARY needs to be enabled first, followed by all
>>> the secondary planes. After a plane is enabled, it's not preferred to change
>>> it's assigned FB, since that can cause the state to be rejected (in actuality,
>>> not just the FB, but also any color and transformation stuffs associated with
>>> the surface). It is preferable to build on the state by enabling another
>>> fb->plane. This is where changing a plane's zpos to be above/below the PRIMARY
>>> is advantageous, rather than changing the FBs assigned, to accommodate
>>> overlay/underlay arrangements.
>>
>> This all sounds reasonable, but why limit this to only the renderer FB
>> on primary plane? The same idea should apply equally to any FB on any
>> plane. Then one needs more heuristics on when to stop the search short,
>> and when to reconsider each FB-plane assignment in case new candidates
>> have appeared but the old ones have not disappeared.

libliftoff starts the search by assigning the renderer FB, if one is provided by
the compositor, to PRIMARY. I think the reason is to always have the renderer
option available for FBs that need it. Eventually, if the search tree is
traversed enough, an arrangement that does not need the renderer fb may be
found, if all the FBs can be assigned, and there are enough planes for them. But
we may not get there before the deadline.

Perhaps having more time to search is the solution here.

(p.s. if a candidate FB is added or removed, libliftoff starts the search anew)

>>
>>> I imagine that any algorithm which incrementally builds up the plane arrangement
>>> will have a similar preference. Of course, it's entirely possible that such an
>>> algorithm isn't the best, I admittedly have not thought much about other
>>> possibilities, yet...
>>
>> It's a complicated problem, indeed. Maybe there needs to be a background
>> task that is not limited by the page flip deadline and can do an
>> exhaustive search over many refresh periods.
>>
> 
> That would be nice. Kick this off when there is a configuration change,
> e.g., user starts video playback, opens a new video, etc.
> 
> One would need to avoid doing too much of that, though, as one could
> envision scenarios where this happens frequently and could have its
> own impact on power by keeping the CPU busy.
> 
> Harry

I recall emersion had a similar suggestion for libliftoff by caching the
incomplete plane arrangement for further processing on future frames once the
deadline is reached. It avoids the need for a separate task.

Having more time to do a more exhaustive search would make zpos meaningless
outside of determining the correct z-ordering, as pq previously mentioned. It
would support hardware that have zpos limitations. It is more complex, but maybe
that's fine, as long as the complexity doesn't bleed into other parts of the
compositor.

There are still ways to limit the # of atomic tests needed for the search, which
will help speed things up (already considered by libliftoff today):

* IN_FORMAT property for what FB formats a plane supports
* zpos property for correct z-ordering
* Occlusion rules. A FB occluded by a rendered FB or underlay-ed FB cannot be
overlay-ed, for example
* And potentially more

Thanks,
Leo

> 
>>
>> Thanks,
>> pq
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2024-04-17 18:52 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-15 17:09 [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible sunpeng.li
2024-03-15 17:09 ` [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode sunpeng.li
2024-03-16  8:38   ` kernel test robot
2024-03-21 21:39   ` Harry Wentland
2024-03-28 15:16   ` Pekka Paalanen
2024-03-28 15:48   ` Robert Mader
2024-03-28 15:52     ` Harry Wentland
2024-04-01 14:38       ` Leo Li
2024-03-15 17:09 ` [PATCH 2/2] drm/amd/display: Move PRIMARY plane zpos higher sunpeng.li
2024-03-21 21:36   ` Harry Wentland
2024-03-28 15:20   ` Pekka Paalanen
2024-03-28 14:33 ` [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible Pekka Paalanen
2024-04-03 21:32   ` Leo Li
2024-04-04 10:24     ` Pekka Paalanen
2024-04-04 13:59       ` Harry Wentland
2024-04-04 14:22         ` Marius Vlad
2024-04-11 20:33           ` Leo Li
2024-04-12  8:03             ` Pekka Paalanen
2024-04-12 14:28               ` Leo Li
2024-04-12 15:07                 ` Pekka Paalanen
2024-04-12 15:31                   ` Alex Deucher
2024-04-12 20:14                     ` Leo Li
2024-04-15  8:19                       ` Pekka Paalanen
2024-04-15 22:33                         ` Leo Li
2024-04-16  8:01                           ` Pekka Paalanen
2024-04-16 14:10                             ` Harry Wentland
2024-04-17 18:51                               ` Leo Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.