All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/9] Add new formats support to vkms
@ 2022-04-04 20:45 Igor Torrente
  2022-04-04 20:45 ` [PATCH v5 1/9] drm: vkms: Alloc the compose frame using vzalloc Igor Torrente
                   ` (9 more replies)
  0 siblings, 10 replies; 44+ messages in thread
From: Igor Torrente @ 2022-04-04 20:45 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, leandro.ribeiro, dri-devel,
	tales.aparecida, ~lkcamp/patches, Igor Torrente

Summary
=======
This series of patches refactor some vkms components in order to introduce
new formats to the planes and writeback connector.

Now in the blend function, the plane's pixels are converted to ARGB16161616
and then blended together.

The CRC is calculated based on the ARGB1616161616 buffer. And if required,
this buffer is copied/converted to the writeback buffer format.

And to handle the pixel conversion, new functions were added to convert
from a specific format to ARGB16161616 (the reciprocal is also true).

Tests
=====
This patch series was tested using the following igt tests:
-t ".*kms_plane.*"
-t ".*kms_writeback.*"
-t ".*kms_cursor_crc*"
-t ".*kms_flip.*"

New tests passing
-------------------
- pipe-A-cursor-size-change
- pipe-A-cursor-alpha-transparent

Performance
-----------
It's running slightly faster than the current implementation.

Results running the IGT[1] test
`igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:

|                  Frametime                   |
|:--------------------------------------------:|
|  Implementation |  Current  |   This commit  |
|:---------------:|:---------:|:--------------:|
| frametime range |  9~22 ms  |     10~22 ms   |
|     Average     |  11.4 ms  |     12.32 ms   |

Memory consumption
==================
It consumes less memory than the current implementation in
the common case (more detail in the commit message).

| Memory consumption (output dimensions) |
|:--------------------------------------:|
|       Current      |     This patch    |
|:------------------:|:-----------------:|
|   Width * Heigth   |     2 * Width     |

[1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4

XRGB to ARGB behavior
=====================
During the development, I decided to always fill the alpha channel of
the output pixel whenever the conversion from a format without an alpha
channel to ARGB16161616 is necessary. Therefore, I ignore the value
received from the XRGB and overwrite the value with 0xFFFF.

Primary plane and CRTC size
===========================
This patch series reworks the blend function to accept a primary plane with
a different size and position from CRTC.
Because now we need to fill the background, we had a loss in
performance with this change

---
Igor Torrente (9):
  drm: vkms: Alloc the compose frame using vzalloc
  drm: vkms: Replace hardcoded value of `vkms_composer.map` to
    DRM_FORMAT_MAX_PLANES
  drm: vkms: Rename `vkms_composer` to `vkms_frame_info`
  drm: drm_atomic_helper: Add a new helper to deal with the writeback
    connector validation
  drm: vkms: Add fb information to `vkms_writeback_job`
  drm: vkms: Refactor the plane composer to accept new formats
  drm: vkms: Supports to the case where primary plane doesn't match the
    CRTC
  drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats
  drm: vkms: Add support to the RGB565 format

 Documentation/gpu/vkms.rst            |  13 +-
 drivers/gpu/drm/drm_atomic_helper.c   |  39 ++++
 drivers/gpu/drm/vkms/Makefile         |   1 +
 drivers/gpu/drm/vkms/vkms_composer.c  | 325 ++++++++++++--------------
 drivers/gpu/drm/vkms/vkms_crtc.c      |   4 +
 drivers/gpu/drm/vkms/vkms_drv.h       |  41 +++-
 drivers/gpu/drm/vkms/vkms_formats.c   | 298 +++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
 drivers/gpu/drm/vkms/vkms_plane.c     |  50 ++--
 drivers/gpu/drm/vkms/vkms_writeback.c |  35 ++-
 include/drm/drm_atomic_helper.h       |   3 +
 11 files changed, 596 insertions(+), 225 deletions(-)
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h

-- 
2.30.2


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v5 1/9] drm: vkms: Alloc the compose frame using vzalloc
  2022-04-04 20:45 [PATCH v5 0/9] Add new formats support to vkms Igor Torrente
@ 2022-04-04 20:45 ` Igor Torrente
  2022-04-05 14:05   ` André Almeida
  2022-04-04 20:45 ` [PATCH v5 2/9] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES Igor Torrente
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-04 20:45 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, leandro.ribeiro, Melissa Wen, dri-devel,
	tales.aparecida, ~lkcamp/patches, Igor Torrente

Currently, the memory to the composition frame is being allocated using
the kzmalloc. This comes with the limitation of maximum size of one
page size(which in the x86_64 is 4Kb and 4MB for default and hugepage
respectively).

Somes test of igt (e.g. kms_plane@pixel-format) uses more than 4MB when
testing some pixel formats like ARGB16161616 and the following error were
showing up when running kms_plane@plane-panning-bottom-right*:

[drm:vkms_composer_worker [vkms]] *ERROR* Cannot allocate memory for
output frame.

This problem is addessed by allocating the memory using kvzalloc that
circunvents this limitation.

V5: Improve the commit message and drop the debugging issues in VKMS
TO-DO(Melissa Wen).

Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 Documentation/gpu/vkms.rst           | 6 ------
 drivers/gpu/drm/vkms/vkms_composer.c | 6 +++---
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
index 9c873c3912cc..973e2d43108b 100644
--- a/Documentation/gpu/vkms.rst
+++ b/Documentation/gpu/vkms.rst
@@ -102,12 +102,6 @@ Debugging:
 
 - kms_plane: some test cases are failing due to timeout on capturing CRC;
 
-- kms_flip: when running test cases in sequence, some successful individual
-  test cases are failing randomly; when individually, some successful test
-  cases display in the log the following error::
-
-  [drm:vkms_prepare_fb [vkms]] ERROR vmap failed: -4
-
 Virtual hardware (vblank-less) mode:
 
 - VKMS already has support for vblanks simulated via hrtimers, which can be
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 9e8204be9a14..82f79e508f81 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -180,7 +180,7 @@ static int compose_active_planes(void **vaddr_out,
 	int i;
 
 	if (!*vaddr_out) {
-		*vaddr_out = kzalloc(gem_obj->size, GFP_KERNEL);
+		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
 		if (!*vaddr_out) {
 			DRM_ERROR("Cannot allocate memory for output frame.");
 			return -ENOMEM;
@@ -263,7 +263,7 @@ void vkms_composer_worker(struct work_struct *work)
 				    crtc_state);
 	if (ret) {
 		if (ret == -EINVAL && !wb_pending)
-			kfree(vaddr_out);
+			kvfree(vaddr_out);
 		return;
 	}
 
@@ -275,7 +275,7 @@ void vkms_composer_worker(struct work_struct *work)
 		crtc_state->wb_pending = false;
 		spin_unlock_irq(&out->composer_lock);
 	} else {
-		kfree(vaddr_out);
+		kvfree(vaddr_out);
 	}
 
 	/*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 2/9] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES
  2022-04-04 20:45 [PATCH v5 0/9] Add new formats support to vkms Igor Torrente
  2022-04-04 20:45 ` [PATCH v5 1/9] drm: vkms: Alloc the compose frame using vzalloc Igor Torrente
@ 2022-04-04 20:45 ` Igor Torrente
  2022-04-04 20:45 ` [PATCH v5 3/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info` Igor Torrente
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Igor Torrente @ 2022-04-04 20:45 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, leandro.ribeiro, Melissa Wen, dri-devel,
	tales.aparecida, ~lkcamp/patches, Igor Torrente

The `map` vector at `vkms_composer` uses a hardcoded value to define its
size.

If someday the maximum number of planes increases, this hardcoded value
can be a problem.

This value is being replaced with the DRM_FORMAT_MAX_PLANES macro.

Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 9496fdc900b8..0eeea6f93733 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -30,7 +30,7 @@ struct vkms_writeback_job {
 struct vkms_composer {
 	struct drm_framebuffer fb;
 	struct drm_rect src, dst;
-	struct dma_buf_map map[4];
+	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
 	unsigned int offset;
 	unsigned int pitch;
 	unsigned int cpp;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 3/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info`
  2022-04-04 20:45 [PATCH v5 0/9] Add new formats support to vkms Igor Torrente
  2022-04-04 20:45 ` [PATCH v5 1/9] drm: vkms: Alloc the compose frame using vzalloc Igor Torrente
  2022-04-04 20:45 ` [PATCH v5 2/9] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES Igor Torrente
@ 2022-04-04 20:45 ` Igor Torrente
  2022-04-04 20:45 ` [PATCH v5 4/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation Igor Torrente
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Igor Torrente @ 2022-04-04 20:45 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, leandro.ribeiro, Melissa Wen, dri-devel,
	tales.aparecida, ~lkcamp/patches, Igor Torrente

Changes the name of this struct to a more meaningful name.
A name that represents better what this struct is about.

Composer is the code that do the compositing of the planes.
This struct contains information on the frame used in the output
composition. Thus, vkms_frame_info is a better name to represent
this.

V5: Fix a commit message typo(Melissa Wen).

Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c | 87 ++++++++++++++--------------
 drivers/gpu/drm/vkms/vkms_drv.h      |  6 +-
 drivers/gpu/drm/vkms/vkms_plane.c    | 38 ++++++------
 3 files changed, 66 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 82f79e508f81..2d946368a561 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -11,11 +11,11 @@
 #include "vkms_drv.h"
 
 static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
-				 const struct vkms_composer *composer)
+				 const struct vkms_frame_info *frame_info)
 {
 	u32 pixel;
-	int src_offset = composer->offset + (y * composer->pitch)
-				      + (x * composer->cpp);
+	int src_offset = frame_info->offset + (y * frame_info->pitch)
+					    + (x * frame_info->cpp);
 
 	pixel = *(u32 *)&buffer[src_offset];
 
@@ -26,24 +26,24 @@ static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
  * compute_crc - Compute CRC value on output frame
  *
  * @vaddr: address to final framebuffer
- * @composer: framebuffer's metadata
+ * @frame_info: framebuffer's metadata
  *
  * returns CRC value computed using crc32 on the visible portion of
  * the final framebuffer at vaddr_out
  */
 static uint32_t compute_crc(const u8 *vaddr,
-			    const struct vkms_composer *composer)
+			    const struct vkms_frame_info *frame_info)
 {
 	int x, y;
 	u32 crc = 0, pixel = 0;
-	int x_src = composer->src.x1 >> 16;
-	int y_src = composer->src.y1 >> 16;
-	int h_src = drm_rect_height(&composer->src) >> 16;
-	int w_src = drm_rect_width(&composer->src) >> 16;
+	int x_src = frame_info->src.x1 >> 16;
+	int y_src = frame_info->src.y1 >> 16;
+	int h_src = drm_rect_height(&frame_info->src) >> 16;
+	int w_src = drm_rect_width(&frame_info->src) >> 16;
 
 	for (y = y_src; y < y_src + h_src; ++y) {
 		for (x = x_src; x < x_src + w_src; ++x) {
-			pixel = get_pixel_from_buffer(x, y, vaddr, composer);
+			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
 			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
 		}
 	}
@@ -98,8 +98,8 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
  * blend - blend value at vaddr_src with value at vaddr_dst
  * @vaddr_dst: destination address
  * @vaddr_src: source address
- * @dst_composer: destination framebuffer's metadata
- * @src_composer: source framebuffer's metadata
+ * @dst_frame_info: destination framebuffer's metadata
+ * @src_frame_info: source framebuffer's metadata
  * @pixel_blend: blending equation based on plane format
  *
  * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
@@ -111,33 +111,33 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
  * pixel color values
  */
 static void blend(void *vaddr_dst, void *vaddr_src,
-		  struct vkms_composer *dst_composer,
-		  struct vkms_composer *src_composer,
+		  struct vkms_frame_info *dst_frame_info,
+		  struct vkms_frame_info *src_frame_info,
 		  void (*pixel_blend)(const u8 *, u8 *))
 {
 	int i, j, j_dst, i_dst;
 	int offset_src, offset_dst;
 	u8 *pixel_dst, *pixel_src;
 
-	int x_src = src_composer->src.x1 >> 16;
-	int y_src = src_composer->src.y1 >> 16;
+	int x_src = src_frame_info->src.x1 >> 16;
+	int y_src = src_frame_info->src.y1 >> 16;
 
-	int x_dst = src_composer->dst.x1;
-	int y_dst = src_composer->dst.y1;
-	int h_dst = drm_rect_height(&src_composer->dst);
-	int w_dst = drm_rect_width(&src_composer->dst);
+	int x_dst = src_frame_info->dst.x1;
+	int y_dst = src_frame_info->dst.y1;
+	int h_dst = drm_rect_height(&src_frame_info->dst);
+	int w_dst = drm_rect_width(&src_frame_info->dst);
 
 	int y_limit = y_src + h_dst;
 	int x_limit = x_src + w_dst;
 
 	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
 		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
-			offset_dst = dst_composer->offset
-				     + (i_dst * dst_composer->pitch)
-				     + (j_dst++ * dst_composer->cpp);
-			offset_src = src_composer->offset
-				     + (i * src_composer->pitch)
-				     + (j * src_composer->cpp);
+			offset_dst = dst_frame_info->offset
+				     + (i_dst * dst_frame_info->pitch)
+				     + (j_dst++ * dst_frame_info->cpp);
+			offset_src = src_frame_info->offset
+				     + (i * src_frame_info->pitch)
+				     + (j * src_frame_info->cpp);
 
 			pixel_src = (u8 *)(vaddr_src + offset_src);
 			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
@@ -149,32 +149,33 @@ static void blend(void *vaddr_dst, void *vaddr_src,
 	}
 }
 
-static void compose_plane(struct vkms_composer *primary_composer,
-			  struct vkms_composer *plane_composer,
+static void compose_plane(struct vkms_frame_info *primary_plane_info,
+			  struct vkms_frame_info *plane_frame_info,
 			  void *vaddr_out)
 {
-	struct drm_framebuffer *fb = &plane_composer->fb;
+	struct drm_framebuffer *fb = &plane_frame_info->fb;
 	void *vaddr;
 	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
 
-	if (WARN_ON(dma_buf_map_is_null(&primary_composer->map[0])))
+	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
 		return;
 
-	vaddr = plane_composer->map[0].vaddr;
+	vaddr = plane_frame_info->map[0].vaddr;
 
 	if (fb->format->format == DRM_FORMAT_ARGB8888)
 		pixel_blend = &alpha_blend;
 	else
 		pixel_blend = &x_blend;
 
-	blend(vaddr_out, vaddr, primary_composer, plane_composer, pixel_blend);
+	blend(vaddr_out, vaddr, primary_plane_info,
+	      plane_frame_info, pixel_blend);
 }
 
 static int compose_active_planes(void **vaddr_out,
-				 struct vkms_composer *primary_composer,
+				 struct vkms_frame_info *primary_plane_info,
 				 struct vkms_crtc_state *crtc_state)
 {
-	struct drm_framebuffer *fb = &primary_composer->fb;
+	struct drm_framebuffer *fb = &primary_plane_info->fb;
 	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
 	const void *vaddr;
 	int i;
@@ -187,10 +188,10 @@ static int compose_active_planes(void **vaddr_out,
 		}
 	}
 
-	if (WARN_ON(dma_buf_map_is_null(&primary_composer->map[0])))
+	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
 		return -EINVAL;
 
-	vaddr = primary_composer->map[0].vaddr;
+	vaddr = primary_plane_info->map[0].vaddr;
 
 	memcpy(*vaddr_out, vaddr, gem_obj->size);
 
@@ -199,8 +200,8 @@ static int compose_active_planes(void **vaddr_out,
 	 * ((primary <- overlay) <- cursor)
 	 */
 	for (i = 1; i < crtc_state->num_active_planes; i++)
-		compose_plane(primary_composer,
-			      crtc_state->active_planes[i]->composer,
+		compose_plane(primary_plane_info,
+			      crtc_state->active_planes[i]->frame_info,
 			      *vaddr_out);
 
 	return 0;
@@ -222,7 +223,7 @@ void vkms_composer_worker(struct work_struct *work)
 						composer_work);
 	struct drm_crtc *crtc = crtc_state->base.crtc;
 	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
-	struct vkms_composer *primary_composer = NULL;
+	struct vkms_frame_info *primary_plane_info = NULL;
 	struct vkms_plane_state *act_plane = NULL;
 	bool crc_pending, wb_pending;
 	void *vaddr_out = NULL;
@@ -250,16 +251,16 @@ void vkms_composer_worker(struct work_struct *work)
 	if (crtc_state->num_active_planes >= 1) {
 		act_plane = crtc_state->active_planes[0];
 		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
-			primary_composer = act_plane->composer;
+			primary_plane_info = act_plane->frame_info;
 	}
 
-	if (!primary_composer)
+	if (!primary_plane_info)
 		return;
 
 	if (wb_pending)
 		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
 
-	ret = compose_active_planes(&vaddr_out, primary_composer,
+	ret = compose_active_planes(&vaddr_out, primary_plane_info,
 				    crtc_state);
 	if (ret) {
 		if (ret == -EINVAL && !wb_pending)
@@ -267,7 +268,7 @@ void vkms_composer_worker(struct work_struct *work)
 		return;
 	}
 
-	crc32 = compute_crc(vaddr_out, primary_composer);
+	crc32 = compute_crc(vaddr_out, primary_plane_info);
 
 	if (wb_pending) {
 		drm_writeback_signal_completion(&out->wb_connector, 0);
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 0eeea6f93733..2e6342164bef 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -27,7 +27,7 @@ struct vkms_writeback_job {
 	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
 };
 
-struct vkms_composer {
+struct vkms_frame_info {
 	struct drm_framebuffer fb;
 	struct drm_rect src, dst;
 	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
@@ -39,11 +39,11 @@ struct vkms_composer {
 /**
  * vkms_plane_state - Driver specific plane state
  * @base: base plane state
- * @composer: data required for composing computation
+ * @frame_info: data required for composing computation
  */
 struct vkms_plane_state {
 	struct drm_shadow_plane_state base;
-	struct vkms_composer *composer;
+	struct vkms_frame_info *frame_info;
 };
 
 struct vkms_plane {
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 32409e15244b..a56b0f76eddd 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -24,20 +24,20 @@ static struct drm_plane_state *
 vkms_plane_duplicate_state(struct drm_plane *plane)
 {
 	struct vkms_plane_state *vkms_state;
-	struct vkms_composer *composer;
+	struct vkms_frame_info *frame_info;
 
 	vkms_state = kzalloc(sizeof(*vkms_state), GFP_KERNEL);
 	if (!vkms_state)
 		return NULL;
 
-	composer = kzalloc(sizeof(*composer), GFP_KERNEL);
-	if (!composer) {
-		DRM_DEBUG_KMS("Couldn't allocate composer\n");
+	frame_info = kzalloc(sizeof(*frame_info), GFP_KERNEL);
+	if (!frame_info) {
+		DRM_DEBUG_KMS("Couldn't allocate frame_info\n");
 		kfree(vkms_state);
 		return NULL;
 	}
 
-	vkms_state->composer = composer;
+	vkms_state->frame_info = frame_info;
 
 	__drm_gem_duplicate_shadow_plane_state(plane, &vkms_state->base);
 
@@ -54,12 +54,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
 		/* dropping the reference we acquired in
 		 * vkms_primary_plane_update()
 		 */
-		if (drm_framebuffer_read_refcount(&vkms_state->composer->fb))
-			drm_framebuffer_put(&vkms_state->composer->fb);
+		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
+			drm_framebuffer_put(&vkms_state->frame_info->fb);
 	}
 
-	kfree(vkms_state->composer);
-	vkms_state->composer = NULL;
+	kfree(vkms_state->frame_info);
+	vkms_state->frame_info = NULL;
 
 	__drm_gem_destroy_shadow_plane_state(&vkms_state->base);
 	kfree(vkms_state);
@@ -99,7 +99,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	struct vkms_plane_state *vkms_plane_state;
 	struct drm_shadow_plane_state *shadow_plane_state;
 	struct drm_framebuffer *fb = new_state->fb;
-	struct vkms_composer *composer;
+	struct vkms_frame_info *frame_info;
 
 	if (!new_state->crtc || !fb)
 		return;
@@ -107,15 +107,15 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	vkms_plane_state = to_vkms_plane_state(new_state);
 	shadow_plane_state = &vkms_plane_state->base;
 
-	composer = vkms_plane_state->composer;
-	memcpy(&composer->src, &new_state->src, sizeof(struct drm_rect));
-	memcpy(&composer->dst, &new_state->dst, sizeof(struct drm_rect));
-	memcpy(&composer->fb, fb, sizeof(struct drm_framebuffer));
-	memcpy(&composer->map, &shadow_plane_state->data, sizeof(composer->map));
-	drm_framebuffer_get(&composer->fb);
-	composer->offset = fb->offsets[0];
-	composer->pitch = fb->pitches[0];
-	composer->cpp = fb->format->cpp[0];
+	frame_info = vkms_plane_state->frame_info;
+	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
+	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
+	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
+	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
+	drm_framebuffer_get(&frame_info->fb);
+	frame_info->offset = fb->offsets[0];
+	frame_info->pitch = fb->pitches[0];
+	frame_info->cpp = fb->format->cpp[0];
 }
 
 static int vkms_plane_atomic_check(struct drm_plane *plane,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 4/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation
  2022-04-04 20:45 [PATCH v5 0/9] Add new formats support to vkms Igor Torrente
                   ` (2 preceding siblings ...)
  2022-04-04 20:45 ` [PATCH v5 3/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info` Igor Torrente
@ 2022-04-04 20:45 ` Igor Torrente
  2022-04-05 14:21   ` André Almeida
  2022-04-04 20:45 ` [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job` Igor Torrente
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-04 20:45 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, leandro.ribeiro, dri-devel,
	tales.aparecida, ~lkcamp/patches, Igor Torrente

Add a helper function to validate the connector configuration receive in
the encoder atomic_check by the drivers.

So the drivers don't need do these common validations themselves.

V2: Move the format verification to a new helper at the drm_atomic_helper.c
    (Thomas Zimmermann).
V3: Format check improvements (Leandro Ribeiro).
    Minor improvements(Thomas Zimmermann).

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/drm_atomic_helper.c   | 39 +++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_writeback.c |  9 +++----
 include/drm/drm_atomic_helper.h       |  3 +++
 3 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 9603193d2fa1..2052e18fa64c 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -776,6 +776,45 @@ drm_atomic_helper_check_modeset(struct drm_device *dev,
 }
 EXPORT_SYMBOL(drm_atomic_helper_check_modeset);
 
+/**
+ * drm_atomic_helper_check_wb_connector_state() - Check writeback encoder state
+ * @encoder: encoder state to check
+ * @conn_state: connector state to check
+ *
+ * Checks if the writeback connector state is valid, and returns an error if it
+ * isn't.
+ *
+ * RETURNS:
+ * Zero for success or -errno
+ */
+int
+drm_atomic_helper_check_wb_encoder_state(struct drm_encoder *encoder,
+					 struct drm_connector_state *conn_state)
+{
+	struct drm_writeback_job *wb_job = conn_state->writeback_job;
+	struct drm_property_blob *pixel_format_blob;
+	struct drm_framebuffer *fb;
+	size_t i, nformats;
+	u32 *formats;
+
+	if (!wb_job || !wb_job->fb)
+		return 0;
+
+	pixel_format_blob = wb_job->connector->pixel_formats_blob_ptr;
+	nformats = pixel_format_blob->length / sizeof(u32);
+	formats = pixel_format_blob->data;
+	fb = wb_job->fb;
+
+	for (i = 0; i < nformats; i++)
+		if (fb->format->format == formats[i])
+			return 0;
+
+	drm_dbg_kms(encoder->dev, "Invalid pixel format %p4cc\n", &fb->format->format);
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(drm_atomic_helper_check_wb_encoder_state);
+
 /**
  * drm_atomic_helper_check_plane_state() - Check plane state for validity
  * @plane_state: plane state to check
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index 8694227f555f..746cb0abc6ec 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -30,6 +30,7 @@ static int vkms_wb_encoder_atomic_check(struct drm_encoder *encoder,
 {
 	struct drm_framebuffer *fb;
 	const struct drm_display_mode *mode = &crtc_state->mode;
+	int ret;
 
 	if (!conn_state->writeback_job || !conn_state->writeback_job->fb)
 		return 0;
@@ -41,11 +42,9 @@ static int vkms_wb_encoder_atomic_check(struct drm_encoder *encoder,
 		return -EINVAL;
 	}
 
-	if (fb->format->format != vkms_wb_formats[0]) {
-		DRM_DEBUG_KMS("Invalid pixel format %p4cc\n",
-			      &fb->format->format);
-		return -EINVAL;
-	}
+	ret = drm_atomic_helper_check_wb_encoder_state(encoder, conn_state);
+	if (ret < 0)
+		return ret;
 
 	return 0;
 }
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
index 4045e2507e11..3fbf695da60f 100644
--- a/include/drm/drm_atomic_helper.h
+++ b/include/drm/drm_atomic_helper.h
@@ -40,6 +40,9 @@ struct drm_private_state;
 
 int drm_atomic_helper_check_modeset(struct drm_device *dev,
 				struct drm_atomic_state *state);
+int
+drm_atomic_helper_check_wb_encoder_state(struct drm_encoder *encoder,
+					 struct drm_connector_state *conn_state);
 int drm_atomic_helper_check_plane_state(struct drm_plane_state *plane_state,
 					const struct drm_crtc_state *crtc_state,
 					int min_scale,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job`
  2022-04-04 20:45 [PATCH v5 0/9] Add new formats support to vkms Igor Torrente
                   ` (3 preceding siblings ...)
  2022-04-04 20:45 ` [PATCH v5 4/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation Igor Torrente
@ 2022-04-04 20:45 ` Igor Torrente
  2022-04-20 11:23   ` Pekka Paalanen
  2022-04-04 20:45 ` [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats Igor Torrente
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-04 20:45 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, leandro.ribeiro, dri-devel,
	tales.aparecida, ~lkcamp/patches, Igor Torrente

This commit is the groundwork to introduce new formats to the planes and
writeback buffer. As part of it, a new buffer metadata field is added to
`vkms_writeback_job`, this metadata is represented by the `vkms_composer`
struct.

Also adds two new function pointers (`{wb,plane}_format_transform_func`)
are defined to handle format conversion to/from internal format.

These things will allow us, in the future, to have different compositing
and wb format types.

V2: Change the code to get the drm_framebuffer reference and not copy its
    contents(Thomas Zimmermann).
V3: Drop the refcount in the wb code(Thomas Zimmermann).
V5: Add {wb,plane}_format_transform_func to vkms_writeback_job
    and vkms_plane_state (Pekka Paalanen)

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c  |  4 ++--
 drivers/gpu/drm/vkms/vkms_drv.h       | 31 +++++++++++++++++++++------
 drivers/gpu/drm/vkms/vkms_plane.c     | 10 ++++-----
 drivers/gpu/drm/vkms/vkms_writeback.c | 20 ++++++++++++++---
 4 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 2d946368a561..95029d2ebcac 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -153,7 +153,7 @@ static void compose_plane(struct vkms_frame_info *primary_plane_info,
 			  struct vkms_frame_info *plane_frame_info,
 			  void *vaddr_out)
 {
-	struct drm_framebuffer *fb = &plane_frame_info->fb;
+	struct drm_framebuffer *fb = plane_frame_info->fb;
 	void *vaddr;
 	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
 
@@ -175,7 +175,7 @@ static int compose_active_planes(void **vaddr_out,
 				 struct vkms_frame_info *primary_plane_info,
 				 struct vkms_crtc_state *crtc_state)
 {
-	struct drm_framebuffer *fb = &primary_plane_info->fb;
+	struct drm_framebuffer *fb = primary_plane_info->fb;
 	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
 	const void *vaddr;
 	int i;
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 2e6342164bef..2704cfb6904b 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -22,13 +22,8 @@
 
 #define NUM_OVERLAY_PLANES 8
 
-struct vkms_writeback_job {
-	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
-	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
-};
-
 struct vkms_frame_info {
-	struct drm_framebuffer fb;
+	struct drm_framebuffer *fb;
 	struct drm_rect src, dst;
 	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
 	unsigned int offset;
@@ -36,6 +31,29 @@ struct vkms_frame_info {
 	unsigned int cpp;
 };
 
+struct pixel_argb_u16 {
+	u16 a, r, g, b;
+};
+
+struct line_buffer {
+	size_t n_pixels;
+	struct pixel_argb_u16 *pixels;
+};
+
+typedef void
+(*wb_format_transform_func)(struct vkms_frame_info *frame_info,
+			    const struct line_buffer *buffer, int y);
+
+typedef void
+(*plane_format_transform_func)(struct line_buffer *buffer,
+			       const struct vkms_frame_info *frame_info, int y);
+
+struct vkms_writeback_job {
+	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
+	struct vkms_frame_info frame_info;
+	wb_format_transform_func format_func;
+};
+
 /**
  * vkms_plane_state - Driver specific plane state
  * @base: base plane state
@@ -44,6 +62,7 @@ struct vkms_frame_info {
 struct vkms_plane_state {
 	struct drm_shadow_plane_state base;
 	struct vkms_frame_info *frame_info;
+	plane_format_transform_func format_func;
 };
 
 struct vkms_plane {
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index a56b0f76eddd..28752af0118c 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -50,12 +50,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
 	struct vkms_plane_state *vkms_state = to_vkms_plane_state(old_state);
 	struct drm_crtc *crtc = vkms_state->base.base.crtc;
 
-	if (crtc) {
+	if (crtc && vkms_state->frame_info->fb) {
 		/* dropping the reference we acquired in
 		 * vkms_primary_plane_update()
 		 */
-		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
-			drm_framebuffer_put(&vkms_state->frame_info->fb);
+		if (drm_framebuffer_read_refcount(vkms_state->frame_info->fb))
+			drm_framebuffer_put(vkms_state->frame_info->fb);
 	}
 
 	kfree(vkms_state->frame_info);
@@ -110,9 +110,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	frame_info = vkms_plane_state->frame_info;
 	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
 	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
-	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
+	frame_info->fb = fb;
 	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
-	drm_framebuffer_get(&frame_info->fb);
+	drm_framebuffer_get(frame_info->fb);
 	frame_info->offset = fb->offsets[0];
 	frame_info->pitch = fb->pitches[0];
 	frame_info->cpp = fb->format->cpp[0];
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index 746cb0abc6ec..ad4bb1fb37ca 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -74,12 +74,15 @@ static int vkms_wb_prepare_job(struct drm_writeback_connector *wb_connector,
 	if (!vkmsjob)
 		return -ENOMEM;
 
-	ret = drm_gem_fb_vmap(job->fb, vkmsjob->map, vkmsjob->data);
+	ret = drm_gem_fb_vmap(job->fb, vkmsjob->frame_info.map, vkmsjob->data);
 	if (ret) {
 		DRM_ERROR("vmap failed: %d\n", ret);
 		goto err_kfree;
 	}
 
+	vkmsjob->frame_info.fb = job->fb;
+	drm_framebuffer_get(vkmsjob->frame_info.fb);
+
 	job->priv = vkmsjob;
 
 	return 0;
@@ -98,7 +101,9 @@ static void vkms_wb_cleanup_job(struct drm_writeback_connector *connector,
 	if (!job->fb)
 		return;
 
-	drm_gem_fb_vunmap(job->fb, vkmsjob->map);
+	drm_gem_fb_vunmap(job->fb, vkmsjob->frame_info.map);
+
+	drm_framebuffer_put(vkmsjob->frame_info.fb);
 
 	vkmsdev = drm_device_to_vkms_device(job->fb->dev);
 	vkms_set_composer(&vkmsdev->output, false);
@@ -115,14 +120,23 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
 	struct drm_writeback_connector *wb_conn = &output->wb_connector;
 	struct drm_connector_state *conn_state = wb_conn->base.state;
 	struct vkms_crtc_state *crtc_state = output->composer_state;
+	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
+	struct vkms_writeback_job *active_wb;
+	struct vkms_frame_info *wb_frame_info;
 
 	if (!conn_state)
 		return;
 
 	vkms_set_composer(&vkmsdev->output, true);
 
+	active_wb = conn_state->writeback_job->priv;
+	wb_frame_info = &active_wb->frame_info;
+
 	spin_lock_irq(&output->composer_lock);
-	crtc_state->active_writeback = conn_state->writeback_job->priv;
+	crtc_state->active_writeback = active_wb;
+	wb_frame_info->offset = fb->offsets[0];
+	wb_frame_info->pitch = fb->pitches[0];
+	wb_frame_info->cpp = fb->format->cpp[0];
 	crtc_state->wb_pending = true;
 	spin_unlock_irq(&output->composer_lock);
 	drm_writeback_queue_job(wb_conn, connector_state);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-04-04 20:45 [PATCH v5 0/9] Add new formats support to vkms Igor Torrente
                   ` (4 preceding siblings ...)
  2022-04-04 20:45 ` [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job` Igor Torrente
@ 2022-04-04 20:45 ` Igor Torrente
  2022-04-20 12:36   ` Pekka Paalanen
  2022-04-04 20:45 ` [PATCH v5 7/9] drm: vkms: Supports to the case where primary plane doesn't match the CRTC Igor Torrente
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-04 20:45 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, leandro.ribeiro, dri-devel,
	tales.aparecida, ~lkcamp/patches, Igor Torrente

Currently the blend function only accepts XRGB_8888 and ARGB_8888
as a color input.

This patch refactors all the functions related to the plane composition
to overcome this limitation.

A new internal format(`struct pixel`) is introduced to deal with all
possible inputs. It consists of 16 bits fields that represent each of
the channels.

The pixels blend is done using this internal format. And new handlers
are being added to convert a specific format to/from this internal format.

So the blend operation depends on these handlers to convert to this common
format. The blended result, if necessary, is converted to the writeback
buffer format.

This patch introduces three major differences to the blend function.
1 - All the planes are blended at once.
2 - The blend calculus is done as per line instead of per pixel.
3 - It is responsible to calculates the CRC and writing the writeback
buffer(if necessary).

These changes allow us to allocate way less memory in the intermediate
buffer to compute these operations. Because now we don't need to
have the entire intermediate image lines at once, just one line is
enough.

| Memory consumption (output dimensions) |
|:--------------------------------------:|
|       Current      |     This patch    |
|:------------------:|:-----------------:|
|   Width * Heigth   |     2 * Width     |

Beyond memory, we also have a minor performance benefit from all
these changes. Results running the IGT[1] test
`igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:

|                 Frametime                  |
|:------------------------------------------:|
|  Implementation |  Current  |  This commit |
|:---------------:|:---------:|:------------:|
| frametime range |  9~22 ms  |    5~17 ms   |
|     Average     |  11.4 ms  |    7.8 ms    |

[1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4

V2: Improves the performance drastically, by performing the operations
    per-line and not per-pixel(Pekka Paalanen).
    Minor improvements(Pekka Paalanen).
V3: Changes the code to blend the planes all at once. This improves
    performance, memory consumption, and removes much of the weirdness
    of the V2(Pekka Paalanen and me).
    Minor improvements(Pekka Paalanen and me).
V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen).
    Several security/robustness improvents(Pekka Paalanen).
    Removes check_planes_x_bounds function and allows partial
    partly off-screen(Pekka Paalanen).

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 Documentation/gpu/vkms.rst            |   4 -
 drivers/gpu/drm/vkms/Makefile         |   1 +
 drivers/gpu/drm/vkms/vkms_composer.c  | 318 ++++++++++++--------------
 drivers/gpu/drm/vkms/vkms_formats.c   | 151 ++++++++++++
 drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
 drivers/gpu/drm/vkms/vkms_plane.c     |   3 +
 drivers/gpu/drm/vkms/vkms_writeback.c |   3 +
 7 files changed, 311 insertions(+), 181 deletions(-)
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h

diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
index 973e2d43108b..a49e4ae92653 100644
--- a/Documentation/gpu/vkms.rst
+++ b/Documentation/gpu/vkms.rst
@@ -118,10 +118,6 @@ Add Plane Features
 
 There's lots of plane features we could add support for:
 
-- Clearing primary plane: clear primary plane before plane composition (at the
-  start) for correctness of pixel blend ops. It also guarantees alpha channel
-  is cleared in the target buffer for stable crc. [Good to get started]
-
 - ARGB format on primary plane: blend the primary plane into background with
   translucent alpha.
 
diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
index 72f779cbfedd..1b28a6a32948 100644
--- a/drivers/gpu/drm/vkms/Makefile
+++ b/drivers/gpu/drm/vkms/Makefile
@@ -3,6 +3,7 @@ vkms-y := \
 	vkms_drv.o \
 	vkms_plane.o \
 	vkms_output.o \
+	vkms_formats.o \
 	vkms_crtc.o \
 	vkms_composer.o \
 	vkms_writeback.o
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 95029d2ebcac..cf24015bf90f 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -7,204 +7,186 @@
 #include <drm/drm_fourcc.h>
 #include <drm/drm_gem_framebuffer_helper.h>
 #include <drm/drm_vblank.h>
+#include <linux/minmax.h>
 
 #include "vkms_drv.h"
 
-static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
-				 const struct vkms_frame_info *frame_info)
+static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
 {
-	u32 pixel;
-	int src_offset = frame_info->offset + (y * frame_info->pitch)
-					    + (x * frame_info->cpp);
+	u32 new_color;
 
-	pixel = *(u32 *)&buffer[src_offset];
+	new_color = (src * 0xffff + dst * (0xffff - alpha));
 
-	return pixel;
+	return DIV_ROUND_CLOSEST(new_color, 0xffff);
 }
 
 /**
- * compute_crc - Compute CRC value on output frame
+ * pre_mul_alpha_blend - alpha blending equation
+ * @src_frame_info: source framebuffer's metadata
+ * @stage_buffer: The line with the pixels from src_plane
+ * @output_buffer: A line buffer that receives all the blends output
  *
- * @vaddr: address to final framebuffer
- * @frame_info: framebuffer's metadata
+ * Using the information from the `frame_info`, this blends only the
+ * necessary pixels from the `stage_buffer` to the `output_buffer`
+ * using premultiplied blend formula.
  *
- * returns CRC value computed using crc32 on the visible portion of
- * the final framebuffer at vaddr_out
+ * The current DRM assumption is that pixel color values have been already
+ * pre-multiplied with the alpha channel values. See more
+ * drm_plane_create_blend_mode_property(). Also, this formula assumes a
+ * completely opaque background.
  */
-static uint32_t compute_crc(const u8 *vaddr,
-			    const struct vkms_frame_info *frame_info)
+static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
+				struct line_buffer *stage_buffer,
+				struct line_buffer *output_buffer)
 {
-	int x, y;
-	u32 crc = 0, pixel = 0;
-	int x_src = frame_info->src.x1 >> 16;
-	int y_src = frame_info->src.y1 >> 16;
-	int h_src = drm_rect_height(&frame_info->src) >> 16;
-	int w_src = drm_rect_width(&frame_info->src) >> 16;
-
-	for (y = y_src; y < y_src + h_src; ++y) {
-		for (x = x_src; x < x_src + w_src; ++x) {
-			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
-			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
-		}
+	int x, x_dst = frame_info->dst.x1;
+	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
+	struct pixel_argb_u16 *in = stage_buffer->pixels;
+	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
+			    stage_buffer->n_pixels);
+
+	for (x = 0; x < x_limit; x++) {
+		out[x].a = (u16)0xffff;
+		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
+		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
+		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
 	}
-
-	return crc;
 }
 
-static u8 blend_channel(u8 src, u8 dst, u8 alpha)
+static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
 {
-	u32 pre_blend;
-	u8 new_color;
-
-	pre_blend = (src * 255 + dst * (255 - alpha));
-
-	/* Faster div by 255 */
-	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
+	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
+		return true;
 
-	return new_color;
+	return false;
 }
 
 /**
- * alpha_blend - alpha blending equation
- * @argb_src: src pixel on premultiplied alpha mode
- * @argb_dst: dst pixel completely opaque
+ * @wb_frame_info: The writeback frame buffer metadata
+ * @crtc_state: The crtc state
+ * @crc32: The crc output of the final frame
+ * @output_buffer: A buffer of a row that will receive the result of the blend(s)
+ * @stage_buffer: The line with the pixels from plane being blend to the output
  *
- * blend pixels using premultiplied blend formula. The current DRM assumption
- * is that pixel color values have been already pre-multiplied with the alpha
- * channel values. See more drm_plane_create_blend_mode_property(). Also, this
- * formula assumes a completely opaque background.
+ * This function blends the pixels (Using the `pre_mul_alpha_blend`)
+ * from all planes, calculates the crc32 of the output from the former step,
+ * and, if necessary, convert and store the output to the writeback buffer.
  */
-static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
+static void blend(struct vkms_writeback_job *wb,
+		  struct vkms_crtc_state *crtc_state,
+		  u32 *crc32, struct line_buffer *stage_buffer,
+		  struct line_buffer *output_buffer, s64 row_size)
 {
-	u8 alpha;
+	struct vkms_plane_state **plane = crtc_state->active_planes;
+	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
+	u32 n_active_planes = crtc_state->num_active_planes;
+
+	int y_dst = primary_plane_info->dst.y1;
+	int h_dst = drm_rect_height(&primary_plane_info->dst);
+	int y_limit = y_dst + h_dst;
+	int y, i;
+
+	for (y = y_dst; y < y_limit; y++) {
+		plane[0]->format_func(output_buffer, primary_plane_info, y);
+
+		/* If there are other planes besides primary, we consider the active
+		 * planes should be in z-order and compose them associatively:
+		 * ((primary <- overlay) <- cursor)
+		 */
+		for (i = 1; i < n_active_planes; i++) {
+			if (!check_y_limit(plane[i]->frame_info, y))
+				continue;
+
+			plane[i]->format_func(stage_buffer, plane[i]->frame_info, y);
+			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
+					    output_buffer);
+		}
 
-	alpha = argb_src[3];
-	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
-	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
-	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
-}
+		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
 
-/**
- * x_blend - blending equation that ignores the pixel alpha
- *
- * overwrites RGB color value from src pixel to dst pixel.
- */
-static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
-{
-	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
-}
-
-/**
- * blend - blend value at vaddr_src with value at vaddr_dst
- * @vaddr_dst: destination address
- * @vaddr_src: source address
- * @dst_frame_info: destination framebuffer's metadata
- * @src_frame_info: source framebuffer's metadata
- * @pixel_blend: blending equation based on plane format
- *
- * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
- * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
- * and clearing alpha channel to an completely opaque background. This function
- * uses buffer's metadata to locate the new composite values at vaddr_dst.
- *
- * TODO: completely clear the primary plane (a = 0xff) before starting to blend
- * pixel color values
- */
-static void blend(void *vaddr_dst, void *vaddr_src,
-		  struct vkms_frame_info *dst_frame_info,
-		  struct vkms_frame_info *src_frame_info,
-		  void (*pixel_blend)(const u8 *, u8 *))
-{
-	int i, j, j_dst, i_dst;
-	int offset_src, offset_dst;
-	u8 *pixel_dst, *pixel_src;
-
-	int x_src = src_frame_info->src.x1 >> 16;
-	int y_src = src_frame_info->src.y1 >> 16;
-
-	int x_dst = src_frame_info->dst.x1;
-	int y_dst = src_frame_info->dst.y1;
-	int h_dst = drm_rect_height(&src_frame_info->dst);
-	int w_dst = drm_rect_width(&src_frame_info->dst);
-
-	int y_limit = y_src + h_dst;
-	int x_limit = x_src + w_dst;
-
-	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
-		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
-			offset_dst = dst_frame_info->offset
-				     + (i_dst * dst_frame_info->pitch)
-				     + (j_dst++ * dst_frame_info->cpp);
-			offset_src = src_frame_info->offset
-				     + (i * src_frame_info->pitch)
-				     + (j * src_frame_info->cpp);
-
-			pixel_src = (u8 *)(vaddr_src + offset_src);
-			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
-			pixel_blend(pixel_src, pixel_dst);
-			/* clearing alpha channel (0xff)*/
-			pixel_dst[3] = 0xff;
-		}
-		i_dst++;
+		if (wb)
+			wb->format_func(&wb->frame_info, output_buffer, y);
 	}
 }
 
-static void compose_plane(struct vkms_frame_info *primary_plane_info,
-			  struct vkms_frame_info *plane_frame_info,
-			  void *vaddr_out)
+static int check_format_funcs(struct vkms_crtc_state *crtc_state,
+			      struct vkms_writeback_job *active_wb)
 {
-	struct drm_framebuffer *fb = plane_frame_info->fb;
-	void *vaddr;
-	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
-
-	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
-		return;
+	struct vkms_plane_state **planes = crtc_state->active_planes;
+	u32 n_active_planes = crtc_state->num_active_planes;
+	int i;
 
-	vaddr = plane_frame_info->map[0].vaddr;
+	for (i = 0; i < n_active_planes; i++)
+		if (!planes[i]->format_func)
+			return -1;
 
-	if (fb->format->format == DRM_FORMAT_ARGB8888)
-		pixel_blend = &alpha_blend;
-	else
-		pixel_blend = &x_blend;
+	if (active_wb && !active_wb->format_func)
+		return -1;
 
-	blend(vaddr_out, vaddr, primary_plane_info,
-	      plane_frame_info, pixel_blend);
+	return 0;
 }
 
-static int compose_active_planes(void **vaddr_out,
-				 struct vkms_frame_info *primary_plane_info,
-				 struct vkms_crtc_state *crtc_state)
+static int compose_active_planes(struct vkms_writeback_job *active_wb,
+				 struct vkms_crtc_state *crtc_state,
+				 u32 *crc32)
 {
-	struct drm_framebuffer *fb = primary_plane_info->fb;
-	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
-	const void *vaddr;
-	int i;
+	int line_width, ret = 0, pixel_size = sizeof(struct pixel_argb_u16);
+	struct vkms_frame_info *primary_plane_info = NULL;
+	struct line_buffer output_buffer, stage_buffer;
+	struct vkms_plane_state *act_plane = NULL;
+	u32 wb_format;
 
-	if (!*vaddr_out) {
-		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
-		if (!*vaddr_out) {
-			DRM_ERROR("Cannot allocate memory for output frame.");
-			return -ENOMEM;
-		}
+	if (WARN_ON(pixel_size != 8))
+		return -EINVAL;
+
+	if (crtc_state->num_active_planes >= 1) {
+		act_plane = crtc_state->active_planes[0];
+		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
+			primary_plane_info = act_plane->frame_info;
 	}
 
+	if (!primary_plane_info)
+		return -EINVAL;
+
 	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
 		return -EINVAL;
 
-	vaddr = primary_plane_info->map[0].vaddr;
+	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
+		return -EINVAL;
 
-	memcpy(*vaddr_out, vaddr, gem_obj->size);
+	line_width = drm_rect_width(&primary_plane_info->dst);
+	stage_buffer.n_pixels = line_width;
+	output_buffer.n_pixels = line_width;
 
-	/* If there are other planes besides primary, we consider the active
-	 * planes should be in z-order and compose them associatively:
-	 * ((primary <- overlay) <- cursor)
-	 */
-	for (i = 1; i < crtc_state->num_active_planes; i++)
-		compose_plane(primary_plane_info,
-			      crtc_state->active_planes[i]->frame_info,
-			      *vaddr_out);
+	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
+	if (!stage_buffer.pixels) {
+		DRM_ERROR("Cannot allocate memory for the output line buffer");
+		return -ENOMEM;
+	}
 
-	return 0;
+	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
+	if (!output_buffer.pixels) {
+		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
+		ret = -ENOMEM;
+		goto free_stage_buffer;
+	}
+
+	if (active_wb) {
+		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
+
+		wb_format = wb_frame_info->fb->format->format;
+		wb_frame_info->src = primary_plane_info->src;
+		wb_frame_info->dst = primary_plane_info->dst;
+	}
+
+	blend(active_wb, crtc_state, crc32, &stage_buffer,
+	      &output_buffer, (s64)line_width * pixel_size);
+
+	kvfree(output_buffer.pixels);
+free_stage_buffer:
+	kvfree(stage_buffer.pixels);
+
+	return ret;
 }
 
 /**
@@ -222,13 +204,11 @@ void vkms_composer_worker(struct work_struct *work)
 						struct vkms_crtc_state,
 						composer_work);
 	struct drm_crtc *crtc = crtc_state->base.crtc;
+	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
 	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
-	struct vkms_frame_info *primary_plane_info = NULL;
-	struct vkms_plane_state *act_plane = NULL;
 	bool crc_pending, wb_pending;
-	void *vaddr_out = NULL;
-	u32 crc32 = 0;
 	u64 frame_start, frame_end;
+	u32 crc32 = 0;
 	int ret;
 
 	spin_lock_irq(&out->composer_lock);
@@ -248,35 +228,19 @@ void vkms_composer_worker(struct work_struct *work)
 	if (!crc_pending)
 		return;
 
-	if (crtc_state->num_active_planes >= 1) {
-		act_plane = crtc_state->active_planes[0];
-		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
-			primary_plane_info = act_plane->frame_info;
-	}
-
-	if (!primary_plane_info)
-		return;
-
 	if (wb_pending)
-		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
+		ret = compose_active_planes(active_wb, crtc_state, &crc32);
+	else
+		ret = compose_active_planes(NULL, crtc_state, &crc32);
 
-	ret = compose_active_planes(&vaddr_out, primary_plane_info,
-				    crtc_state);
-	if (ret) {
-		if (ret == -EINVAL && !wb_pending)
-			kvfree(vaddr_out);
+	if (ret)
 		return;
-	}
-
-	crc32 = compute_crc(vaddr_out, primary_plane_info);
 
 	if (wb_pending) {
 		drm_writeback_signal_completion(&out->wb_connector, 0);
 		spin_lock_irq(&out->composer_lock);
 		crtc_state->wb_pending = false;
 		spin_unlock_irq(&out->composer_lock);
-	} else {
-		kvfree(vaddr_out);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
new file mode 100644
index 000000000000..931a61405d6a
--- /dev/null
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <drm/drm_rect.h>
+#include <linux/minmax.h>
+
+#include "vkms_formats.h"
+
+static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
+{
+	return frame_info->offset + (y * frame_info->pitch)
+				  + (x * frame_info->cpp);
+}
+
+/*
+ * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
+ *
+ * @frame_info: Buffer metadata
+ * @x: The x(width) coordinate of the 2D buffer
+ * @y: The y(Heigth) coordinate of the 2D buffer
+ *
+ * Takes the information stored in the frame_info, a pair of coordinates, and
+ * returns the address of the first color channel.
+ * This function assumes the channels are packed together, i.e. a color channel
+ * comes immediately after another in the memory. And therefore, this function
+ * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
+ */
+static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
+				int x, int y)
+{
+	int offset = pixel_offset(frame_info, x, y);
+
+	return (u8 *)frame_info->map[0].vaddr + offset;
+}
+
+static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
+{
+	int x_src = frame_info->src.x1 >> 16;
+	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
+
+	return packed_pixels_addr(frame_info, x_src, y_src);
+}
+
+static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
+				 const struct vkms_frame_info *frame_info, int y)
+{
+	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
+	u8 *src_pixels = get_packed_src_addr(frame_info, y);
+	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
+			       stage_buffer->n_pixels);
+
+	for (x = 0; x < x_limit; x++, src_pixels += 4) {
+		/*
+		 * The 257 is the "conversion ratio". This number is obtained by the
+		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
+		 * the best color value in a pixel format with more possibilities.
+		 * A similar idea applies to others RGB color conversions.
+		 */
+		out_pixels[x].a = (u16)src_pixels[3] * 257;
+		out_pixels[x].r = (u16)src_pixels[2] * 257;
+		out_pixels[x].g = (u16)src_pixels[1] * 257;
+		out_pixels[x].b = (u16)src_pixels[0] * 257;
+	}
+}
+
+static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
+				 const struct vkms_frame_info *frame_info, int y)
+{
+	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
+	u8 *src_pixels = get_packed_src_addr(frame_info, y);
+	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
+			       stage_buffer->n_pixels);
+
+	for (x = 0; x < x_limit; x++, src_pixels += 4) {
+		out_pixels[x].a = (u16)0xffff;
+		out_pixels[x].r = (u16)src_pixels[2] * 257;
+		out_pixels[x].g = (u16)src_pixels[1] * 257;
+		out_pixels[x].b = (u16)src_pixels[0] * 257;
+	}
+}
+
+/*
+ * The following  functions take an line of argb_u16 pixels from the
+ * src_buffer, convert them to a specific format, and store them in the
+ * destination.
+ *
+ * They are used in the `compose_active_planes` to convert and store a line
+ * from the src_buffer to the writeback buffer.
+ */
+static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
+				 const struct line_buffer *src_buffer, int y)
+{
+	int x, x_dst = frame_info->dst.x1;
+	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
+	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
+			    src_buffer->n_pixels);
+
+	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
+		/*
+		 * This sequence below is important because the format's byte order is
+		 * in little-endian. In the case of the ARGB8888 the memory is
+		 * organized this way:
+		 *
+		 * | Addr     | = blue channel
+		 * | Addr + 1 | = green channel
+		 * | Addr + 2 | = Red channel
+		 * | Addr + 3 | = Alpha channel
+		 */
+		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
+		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
+		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
+		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
+	}
+}
+
+static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
+				 const struct line_buffer *src_buffer, int y)
+{
+	int x, x_dst = frame_info->dst.x1;
+	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
+	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
+			    src_buffer->n_pixels);
+
+	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
+		dst_pixels[3] = (u8)0xff;
+		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
+		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
+		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
+	}
+}
+
+plane_format_transform_func get_plane_fmt_transform_function(u32 format)
+{
+	if (format == DRM_FORMAT_ARGB8888)
+		return &ARGB8888_to_argb_u16;
+	else if (format == DRM_FORMAT_XRGB8888)
+		return &XRGB8888_to_argb_u16;
+	else
+		return NULL;
+}
+
+wb_format_transform_func get_wb_fmt_transform_function(u32 format)
+{
+	if (format == DRM_FORMAT_ARGB8888)
+		return &argb_u16_to_ARGB8888;
+	else if (format == DRM_FORMAT_XRGB8888)
+		return &argb_u16_to_XRGB8888;
+	else
+		return NULL;
+}
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
new file mode 100644
index 000000000000..adc5a17b9584
--- /dev/null
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#ifndef _VKMS_FORMATS_H_
+#define _VKMS_FORMATS_H_
+
+#include "vkms_drv.h"
+
+plane_format_transform_func get_plane_fmt_transform_function(u32 format);
+
+wb_format_transform_func get_wb_fmt_transform_function(u32 format);
+
+#endif /* _VKMS_FORMATS_H_ */
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 28752af0118c..798243837fd0 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -10,6 +10,7 @@
 #include <drm/drm_plane_helper.h>
 
 #include "vkms_drv.h"
+#include "vkms_formats.h"
 
 static const u32 vkms_formats[] = {
 	DRM_FORMAT_XRGB8888,
@@ -100,6 +101,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	struct drm_shadow_plane_state *shadow_plane_state;
 	struct drm_framebuffer *fb = new_state->fb;
 	struct vkms_frame_info *frame_info;
+	u32 fmt = fb->format->format;
 
 	if (!new_state->crtc || !fb)
 		return;
@@ -116,6 +118,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	frame_info->offset = fb->offsets[0];
 	frame_info->pitch = fb->pitches[0];
 	frame_info->cpp = fb->format->cpp[0];
+	vkms_plane_state->format_func = get_plane_fmt_transform_function(fmt);
 }
 
 static int vkms_plane_atomic_check(struct drm_plane *plane,
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index ad4bb1fb37ca..97f71e784bbf 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -11,6 +11,7 @@
 #include <drm/drm_gem_shmem_helper.h>
 
 #include "vkms_drv.h"
+#include "vkms_formats.h"
 
 static const u32 vkms_wb_formats[] = {
 	DRM_FORMAT_XRGB8888,
@@ -123,6 +124,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
 	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
 	struct vkms_writeback_job *active_wb;
 	struct vkms_frame_info *wb_frame_info;
+	u32 wb_format = fb->format->format;
 
 	if (!conn_state)
 		return;
@@ -140,6 +142,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
 	crtc_state->wb_pending = true;
 	spin_unlock_irq(&output->composer_lock);
 	drm_writeback_queue_job(wb_conn, connector_state);
+	active_wb->format_func = get_wb_fmt_transform_function(wb_format);
 }
 
 static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs = {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 7/9] drm: vkms: Supports to the case where primary plane doesn't match the CRTC
  2022-04-04 20:45 [PATCH v5 0/9] Add new formats support to vkms Igor Torrente
                   ` (5 preceding siblings ...)
  2022-04-04 20:45 ` [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats Igor Torrente
@ 2022-04-04 20:45 ` Igor Torrente
  2022-04-20 13:13   ` Pekka Paalanen
  2022-04-04 20:45 ` [PATCH v5 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats Igor Torrente
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-04 20:45 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, leandro.ribeiro, dri-devel,
	tales.aparecida, ~lkcamp/patches, Igor Torrente

We will break the current assumption that the primary plane has the
same size and position as CRTC.

For that we will add CRTC dimension information to `vkms_crtc_state`
and add a opaque black backgound color.

Because now we need to fill the background, we had a loss in
performance with this change. Results running the IGT[1] test
`igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:

|                  Frametime                   |
|:--------------------------------------------:|
|  Implementation |  Previous |   This commit  |
|:---------------:|:---------:|:--------------:|
| frametime range |  5~18 ms  |     10~22 ms   |
|     Average     |  8.47 ms  |     12.32 ms   |

[1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 Documentation/gpu/vkms.rst           |  3 +--
 drivers/gpu/drm/vkms/vkms_composer.c | 32 +++++++++++++++++++---------
 drivers/gpu/drm/vkms/vkms_crtc.c     |  4 ++++
 drivers/gpu/drm/vkms/vkms_drv.h      |  2 ++
 4 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
index a49e4ae92653..49db221c0f52 100644
--- a/Documentation/gpu/vkms.rst
+++ b/Documentation/gpu/vkms.rst
@@ -121,8 +121,7 @@ There's lots of plane features we could add support for:
 - ARGB format on primary plane: blend the primary plane into background with
   translucent alpha.
 
-- Support when the primary plane isn't exactly matching the output size: blend
-  the primary plane into the black background.
+- Add background color KMS property[Good to get started].
 
 - Full alpha blending on all planes.
 
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index cf24015bf90f..f80842227669 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -61,6 +61,15 @@ static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
 	return false;
 }
 
+static void fill_background(struct pixel_argb_u16 *backgroud_color,
+			    struct line_buffer *output_buffer)
+{
+	int i;
+
+	for (i = 0; i < output_buffer->n_pixels; i++)
+		output_buffer->pixels[i] = *backgroud_color;
+}
+
 /**
  * @wb_frame_info: The writeback frame buffer metadata
  * @crtc_state: The crtc state
@@ -78,22 +87,23 @@ static void blend(struct vkms_writeback_job *wb,
 		  struct line_buffer *output_buffer, s64 row_size)
 {
 	struct vkms_plane_state **plane = crtc_state->active_planes;
-	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
 	u32 n_active_planes = crtc_state->num_active_planes;
 
-	int y_dst = primary_plane_info->dst.y1;
-	int h_dst = drm_rect_height(&primary_plane_info->dst);
-	int y_limit = y_dst + h_dst;
+	struct pixel_argb_u16 background_color = (struct pixel_argb_u16) {
+		.a = 0xffff
+	};
+
+	int crtc_y_limit = crtc_state->crtc_height;
 	int y, i;
 
-	for (y = y_dst; y < y_limit; y++) {
-		plane[0]->format_func(output_buffer, primary_plane_info, y);
+	for (y = 0; y < crtc_y_limit; y++) {
+		fill_background(&background_color, output_buffer);
 
 		/* If there are other planes besides primary, we consider the active
 		 * planes should be in z-order and compose them associatively:
 		 * ((primary <- overlay) <- cursor)
 		 */
-		for (i = 1; i < n_active_planes; i++) {
+		for (i = 0; i < n_active_planes; i++) {
 			if (!check_y_limit(plane[i]->frame_info, y))
 				continue;
 
@@ -154,7 +164,7 @@ static int compose_active_planes(struct vkms_writeback_job *active_wb,
 	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
 		return -EINVAL;
 
-	line_width = drm_rect_width(&primary_plane_info->dst);
+	line_width = crtc_state->crtc_width;
 	stage_buffer.n_pixels = line_width;
 	output_buffer.n_pixels = line_width;
 
@@ -175,8 +185,10 @@ static int compose_active_planes(struct vkms_writeback_job *active_wb,
 		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
 
 		wb_format = wb_frame_info->fb->format->format;
-		wb_frame_info->src = primary_plane_info->src;
-		wb_frame_info->dst = primary_plane_info->dst;
+		drm_rect_init(&wb_frame_info->src, 0, 0, crtc_state->crtc_width,
+			      crtc_state->crtc_height);
+		drm_rect_init(&wb_frame_info->dst, 0, 0, crtc_state->crtc_width,
+			      crtc_state->crtc_height);
 	}
 
 	blend(active_wb, crtc_state, crc32, &stage_buffer,
diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
index 57bbd32e9beb..4a37e243c2d7 100644
--- a/drivers/gpu/drm/vkms/vkms_crtc.c
+++ b/drivers/gpu/drm/vkms/vkms_crtc.c
@@ -248,7 +248,9 @@ static void vkms_crtc_atomic_begin(struct drm_crtc *crtc,
 static void vkms_crtc_atomic_flush(struct drm_crtc *crtc,
 				   struct drm_atomic_state *state)
 {
+	struct drm_crtc_state *crtc_state = drm_atomic_get_new_crtc_state(state, crtc);
 	struct vkms_output *vkms_output = drm_crtc_to_vkms_output(crtc);
+	struct drm_display_mode *mode = &crtc_state->mode;
 
 	if (crtc->state->event) {
 		spin_lock(&crtc->dev->event_lock);
@@ -264,6 +266,8 @@ static void vkms_crtc_atomic_flush(struct drm_crtc *crtc,
 	}
 
 	vkms_output->composer_state = to_vkms_crtc_state(crtc->state);
+	vkms_output->composer_state->crtc_width = mode->hdisplay;
+	vkms_output->composer_state->crtc_height = mode->vdisplay;
 
 	spin_unlock_irq(&vkms_output->lock);
 }
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 2704cfb6904b..ab92d9f7b701 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -90,6 +90,8 @@ struct vkms_crtc_state {
 	bool wb_pending;
 	u64 frame_start;
 	u64 frame_end;
+	u16 crtc_width;
+	u16 crtc_height;
 };
 
 struct vkms_output {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats
  2022-04-04 20:45 [PATCH v5 0/9] Add new formats support to vkms Igor Torrente
                   ` (6 preceding siblings ...)
  2022-04-04 20:45 ` [PATCH v5 7/9] drm: vkms: Supports to the case where primary plane doesn't match the CRTC Igor Torrente
@ 2022-04-04 20:45 ` Igor Torrente
  2022-04-20 13:19   ` Pekka Paalanen
  2022-05-07  7:32   ` Thomas Zimmermann
  2022-04-04 20:45 ` [PATCH v5 9/9] drm: vkms: Add support to the RGB565 format Igor Torrente
  2022-06-13  9:52 ` [PATCH v5 0/9] Add new formats support to vkms Melissa Wen
  9 siblings, 2 replies; 44+ messages in thread
From: Igor Torrente @ 2022-04-04 20:45 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, leandro.ribeiro, dri-devel,
	tales.aparecida, ~lkcamp/patches, Igor Torrente

This will be useful to write tests that depends on these formats.

ARGB and XRGB follows the a similar implementation of the former formats.
Just adjusting for 16 bits per channel.

V3: Adapt the handlers to the new format introduced in patch 7 V3.
V5: Minor improvements
    Added le16_to_cpu/cpu_to_le16 to the 16 bits color read/writes.

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_formats.c   | 77 +++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_plane.c     |  5 +-
 drivers/gpu/drm/vkms/vkms_writeback.c |  2 +
 3 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 931a61405d6a..8d913fa7dbde 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -78,6 +78,41 @@ static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
 	}
 }
 
+static void ARGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
+				     const struct vkms_frame_info *frame_info,
+				     int y)
+{
+	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
+	u16 *src_pixels = get_packed_src_addr(frame_info, y);
+	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
+			       stage_buffer->n_pixels);
+
+	for (x = 0; x < x_limit; x++, src_pixels += 4) {
+		out_pixels[x].a = le16_to_cpu(src_pixels[3]);
+		out_pixels[x].r = le16_to_cpu(src_pixels[2]);
+		out_pixels[x].g = le16_to_cpu(src_pixels[1]);
+		out_pixels[x].b = le16_to_cpu(src_pixels[0]);
+	}
+}
+
+static void XRGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
+				     const struct vkms_frame_info *frame_info,
+				     int y)
+{
+	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
+	u16 *src_pixels = get_packed_src_addr(frame_info, y);
+	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
+			       stage_buffer->n_pixels);
+
+	for (x = 0; x < x_limit; x++, src_pixels += 4) {
+		out_pixels[x].a = (u16)0xffff;
+		out_pixels[x].r = le16_to_cpu(src_pixels[2]);
+		out_pixels[x].g = le16_to_cpu(src_pixels[1]);
+		out_pixels[x].b = le16_to_cpu(src_pixels[0]);
+	}
+}
+
+
 /*
  * The following  functions take an line of argb_u16 pixels from the
  * src_buffer, convert them to a specific format, and store them in the
@@ -130,12 +165,50 @@ static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
 	}
 }
 
+static void argb_u16_to_ARGB16161616(struct vkms_frame_info *frame_info,
+				     const struct line_buffer *src_buffer, int y)
+{
+	int x, x_dst = frame_info->dst.x1;
+	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
+	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
+			    src_buffer->n_pixels);
+
+	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
+		dst_pixels[3] = cpu_to_le16(in_pixels[x].a);
+		dst_pixels[2] = cpu_to_le16(in_pixels[x].r);
+		dst_pixels[1] = cpu_to_le16(in_pixels[x].g);
+		dst_pixels[0] = cpu_to_le16(in_pixels[x].b);
+	}
+}
+
+static void argb_u16_to_XRGB16161616(struct vkms_frame_info *frame_info,
+				     const struct line_buffer *src_buffer, int y)
+{
+	int x, x_dst = frame_info->dst.x1;
+	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
+	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
+			    src_buffer->n_pixels);
+
+	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
+		dst_pixels[3] = (u8)0xffff;
+		dst_pixels[2] = cpu_to_le16(in_pixels[x].r);
+		dst_pixels[1] = cpu_to_le16(in_pixels[x].g);
+		dst_pixels[0] = cpu_to_le16(in_pixels[x].b);
+	}
+}
+
 plane_format_transform_func get_plane_fmt_transform_function(u32 format)
 {
 	if (format == DRM_FORMAT_ARGB8888)
 		return &ARGB8888_to_argb_u16;
 	else if (format == DRM_FORMAT_XRGB8888)
 		return &XRGB8888_to_argb_u16;
+	else if (format == DRM_FORMAT_ARGB16161616)
+		return &ARGB16161616_to_argb_u16;
+	else if (format == DRM_FORMAT_XRGB16161616)
+		return &XRGB16161616_to_argb_u16;
 	else
 		return NULL;
 }
@@ -146,6 +219,10 @@ wb_format_transform_func get_wb_fmt_transform_function(u32 format)
 		return &argb_u16_to_ARGB8888;
 	else if (format == DRM_FORMAT_XRGB8888)
 		return &argb_u16_to_XRGB8888;
+	else if (format == DRM_FORMAT_ARGB16161616)
+		return &argb_u16_to_ARGB16161616;
+	else if (format == DRM_FORMAT_XRGB16161616)
+		return &argb_u16_to_XRGB16161616;
 	else
 		return NULL;
 }
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 798243837fd0..60054a85204a 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -14,11 +14,14 @@
 
 static const u32 vkms_formats[] = {
 	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_XRGB16161616
 };
 
 static const u32 vkms_plane_formats[] = {
 	DRM_FORMAT_ARGB8888,
-	DRM_FORMAT_XRGB8888
+	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_XRGB16161616,
+	DRM_FORMAT_ARGB16161616
 };
 
 static struct drm_plane_state *
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index 97f71e784bbf..cb63a5da9af1 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -15,6 +15,8 @@
 
 static const u32 vkms_wb_formats[] = {
 	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_XRGB16161616,
+	DRM_FORMAT_ARGB16161616
 };
 
 static const struct drm_connector_funcs vkms_wb_connector_funcs = {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 9/9] drm: vkms: Add support to the RGB565 format
  2022-04-04 20:45 [PATCH v5 0/9] Add new formats support to vkms Igor Torrente
                   ` (7 preceding siblings ...)
  2022-04-04 20:45 ` [PATCH v5 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats Igor Torrente
@ 2022-04-04 20:45 ` Igor Torrente
  2022-04-21 10:58   ` Pekka Paalanen
  2022-06-13  9:52 ` [PATCH v5 0/9] Add new formats support to vkms Melissa Wen
  9 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-04 20:45 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, leandro.ribeiro, dri-devel,
	tales.aparecida, ~lkcamp/patches, Igor Torrente

Adds this common format to vkms.

This commit also adds new helper macros to deal with fixed-point
arithmetic.

It was done to improve the precision of the conversion to ARGB16161616
since the "conversion ratio" is not an integer.

V3: Adapt the handlers to the new format introduced in patch 7 V3.
V5: Minor improvements

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_formats.c   | 70 +++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
 drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
 3 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 8d913fa7dbde..4af8b295f31e 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -5,6 +5,23 @@
 
 #include "vkms_formats.h"
 
+/* The following macros help doing fixed point arithmetic. */
+/*
+ * With Fixed-Point scale 15 we have 17 and 15 bits of integer and fractional
+ * parts respectively.
+ *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
+ * 31                                          0
+ */
+#define FIXED_SCALE 15
+
+#define INT_TO_FIXED(a) ((a) << FIXED_SCALE)
+#define FIXED_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FIXED_SCALE))
+#define FIXED_DIV(a, b) ((s32)(((s64)(a) << FIXED_SCALE) / (b)))
+/* This macro converts a fixed point number to int, and round half up it */
+#define FIXED_TO_INT_ROUND(a) (((a) + (1 << (FIXED_SCALE - 1))) >> FIXED_SCALE)
+/* Convert divisor and dividend to Fixed-Point and performs the division */
+#define INT_TO_FIXED_DIV(a, b) (FIXED_DIV(INT_TO_FIXED(a), INT_TO_FIXED(b)))
+
 static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
 {
 	return frame_info->offset + (y * frame_info->pitch)
@@ -112,6 +129,30 @@ static void XRGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
 	}
 }
 
+static void RGB565_to_argb_u16(struct line_buffer *stage_buffer,
+			       const struct vkms_frame_info *frame_info, int y)
+{
+	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
+	u16 *src_pixels = get_packed_src_addr(frame_info, y);
+	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
+			       stage_buffer->n_pixels);
+
+	for (x = 0; x < x_limit; x++, src_pixels++) {
+		u16 rgb_565 = le16_to_cpu(*src_pixels);
+		int fp_r = INT_TO_FIXED((rgb_565 >> 11) & 0x1f);
+		int fp_g = INT_TO_FIXED((rgb_565 >> 5) & 0x3f);
+		int fp_b = INT_TO_FIXED(rgb_565 & 0x1f);
+
+		int fp_rb_ratio = INT_TO_FIXED_DIV(65535, 31);
+		int fp_g_ratio = INT_TO_FIXED_DIV(65535, 63);
+
+		out_pixels[x].a = (u16)0xffff;
+		out_pixels[x].r = FIXED_TO_INT_ROUND(FIXED_MUL(fp_r, fp_rb_ratio));
+		out_pixels[x].g = FIXED_TO_INT_ROUND(FIXED_MUL(fp_g, fp_g_ratio));
+		out_pixels[x].b = FIXED_TO_INT_ROUND(FIXED_MUL(fp_b, fp_rb_ratio));
+	}
+}
+
 
 /*
  * The following  functions take an line of argb_u16 pixels from the
@@ -199,6 +240,31 @@ static void argb_u16_to_XRGB16161616(struct vkms_frame_info *frame_info,
 	}
 }
 
+static void argb_u16_to_RGB565(struct vkms_frame_info *frame_info,
+			       const struct line_buffer *src_buffer, int y)
+{
+	int x, x_dst = frame_info->dst.x1;
+	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
+	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
+			    src_buffer->n_pixels);
+
+	for (x = 0; x < x_limit; x++, dst_pixels++) {
+		int fp_r = INT_TO_FIXED(in_pixels[x].r);
+		int fp_g = INT_TO_FIXED(in_pixels[x].g);
+		int fp_b = INT_TO_FIXED(in_pixels[x].b);
+
+		int fp_rb_ratio = INT_TO_FIXED_DIV(65535, 31);
+		int fp_g_ratio = INT_TO_FIXED_DIV(65535, 63);
+
+		u16 r = FIXED_TO_INT_ROUND(FIXED_DIV(fp_r, fp_rb_ratio));
+		u16 g = FIXED_TO_INT_ROUND(FIXED_DIV(fp_g, fp_g_ratio));
+		u16 b = FIXED_TO_INT_ROUND(FIXED_DIV(fp_b, fp_rb_ratio));
+
+		*dst_pixels = cpu_to_le16(r << 11 | g << 5 | b);
+	}
+}
+
 plane_format_transform_func get_plane_fmt_transform_function(u32 format)
 {
 	if (format == DRM_FORMAT_ARGB8888)
@@ -209,6 +275,8 @@ plane_format_transform_func get_plane_fmt_transform_function(u32 format)
 		return &ARGB16161616_to_argb_u16;
 	else if (format == DRM_FORMAT_XRGB16161616)
 		return &XRGB16161616_to_argb_u16;
+	else if (format == DRM_FORMAT_RGB565)
+		return &RGB565_to_argb_u16;
 	else
 		return NULL;
 }
@@ -223,6 +291,8 @@ wb_format_transform_func get_wb_fmt_transform_function(u32 format)
 		return &argb_u16_to_ARGB16161616;
 	else if (format == DRM_FORMAT_XRGB16161616)
 		return &argb_u16_to_XRGB16161616;
+	else if (format == DRM_FORMAT_RGB565)
+		return &argb_u16_to_RGB565;
 	else
 		return NULL;
 }
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 60054a85204a..94a8e412886f 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -14,14 +14,16 @@
 
 static const u32 vkms_formats[] = {
 	DRM_FORMAT_XRGB8888,
-	DRM_FORMAT_XRGB16161616
+	DRM_FORMAT_XRGB16161616,
+	DRM_FORMAT_RGB565
 };
 
 static const u32 vkms_plane_formats[] = {
 	DRM_FORMAT_ARGB8888,
 	DRM_FORMAT_XRGB8888,
 	DRM_FORMAT_XRGB16161616,
-	DRM_FORMAT_ARGB16161616
+	DRM_FORMAT_ARGB16161616,
+	DRM_FORMAT_RGB565
 };
 
 static struct drm_plane_state *
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index cb63a5da9af1..98da7bee0f4b 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -16,7 +16,8 @@
 static const u32 vkms_wb_formats[] = {
 	DRM_FORMAT_XRGB8888,
 	DRM_FORMAT_XRGB16161616,
-	DRM_FORMAT_ARGB16161616
+	DRM_FORMAT_ARGB16161616,
+	DRM_FORMAT_RGB565
 };
 
 static const struct drm_connector_funcs vkms_wb_connector_funcs = {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 1/9] drm: vkms: Alloc the compose frame using vzalloc
  2022-04-04 20:45 ` [PATCH v5 1/9] drm: vkms: Alloc the compose frame using vzalloc Igor Torrente
@ 2022-04-05 14:05   ` André Almeida
  2022-04-05 19:03     ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: André Almeida @ 2022-04-05 14:05 UTC (permalink / raw)
  To: Igor Torrente
  Cc: melissa.srw, hamohammed.sa, tzimmermann, rodrigosiqueiramelo,
	airlied, leandro.ribeiro, Melissa Wen, ppaalanen, dri-devel,
	tales.aparecida, ~lkcamp/patches

Hi Igor,

Thanks for your patch!

Às 17:45 de 04/04/22, Igor Torrente escreveu:
> Currently, the memory to the composition frame is being allocated using
> the kzmalloc. This comes with the limitation of maximum size of one
> page size(which in the x86_64 is 4Kb and 4MB for default and hugepage
> respectively).
>
> Somes test of igt (e.g. kms_plane@pixel-format) uses more than 4MB when
> testing some pixel formats like ARGB16161616 and the following error were
> showing up when running kms_plane@plane-panning-bottom-right*:
>
> [drm:vkms_composer_worker [vkms]] *ERROR* Cannot allocate memory for
> output frame.
>
> This problem is addessed by allocating the memory using kvzalloc that

addessed -> addressed

OTOH, I would write this in imperative mood, as in "Address this by
allocating..." or "Fix this..."

> circunvents this limitation.

circunvents -> circumvents

>
> V5: Improve the commit message and drop the debugging issues in VKMS
> TO-DO(Melissa Wen).
>

Patch changelog are very useful for the mailing list, but not very
useful for the git log. For that reason, I usually put this right after
the --- in the patch, so the log will be dropped when the patch is applied.

Those comment applies for the rest of your series.

> Reviewed-by: Melissa Wen <mwen@igalia.com>
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 4/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation
  2022-04-04 20:45 ` [PATCH v5 4/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation Igor Torrente
@ 2022-04-05 14:21   ` André Almeida
  2022-04-05 19:05     ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: André Almeida @ 2022-04-05 14:21 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, ppaalanen, dri-devel,
	tales.aparecida, ~lkcamp/patches

Às 17:45 de 04/04/22, Igor Torrente escreveu:
> Add a helper function to validate the connector configuration receive in

Maybe it should be "received"

> the encoder atomic_check by the drivers.
> 
> So the drivers don't need do these common validations themselves.

"don't need do" -> "don't need to do"

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 1/9] drm: vkms: Alloc the compose frame using vzalloc
  2022-04-05 14:05   ` André Almeida
@ 2022-04-05 19:03     ` Igor Torrente
  0 siblings, 0 replies; 44+ messages in thread
From: Igor Torrente @ 2022-04-05 19:03 UTC (permalink / raw)
  To: André Almeida
  Cc: melissa.srw, hamohammed.sa, tzimmermann, rodrigosiqueiramelo,
	airlied, leandro.ribeiro, Melissa Wen, ppaalanen, dri-devel,
	tales.aparecida, ~lkcamp/patches

Hi André,

On 4/5/22 11:05, André Almeida wrote:
> Hi Igor,
> 
> Thanks for your patch!
> 
> Às 17:45 de 04/04/22, Igor Torrente escreveu:
>> Currently, the memory to the composition frame is being allocated using
>> the kzmalloc. This comes with the limitation of maximum size of one
>> page size(which in the x86_64 is 4Kb and 4MB for default and hugepage
>> respectively).
>>
>> Somes test of igt (e.g. kms_plane@pixel-format) uses more than 4MB when
>> testing some pixel formats like ARGB16161616 and the following error were
>> showing up when running kms_plane@plane-panning-bottom-right*:
>>
>> [drm:vkms_composer_worker [vkms]] *ERROR* Cannot allocate memory for
>> output frame.
>>
>> This problem is addessed by allocating the memory using kvzalloc that
> 
> addessed -> addressed
> 
> OTOH, I would write this in imperative mood, as in "Address this by
> allocating..." or "Fix this..."
> 
>> circunvents this limitation.
> 
> circunvents -> circumvents

Thanks, I will fix them!

> 
>>
>> V5: Improve the commit message and drop the debugging issues in VKMS
>> TO-DO(Melissa Wen).
>>
> 
> Patch changelog are very useful for the mailing list, but not very
> useful for the git log. For that reason, I usually put this right after
> the --- in the patch, so the log will be dropped when the patch is applied.
> 
> Those comment applies for the rest of your series.

Well, drivers in the DRM subsystem maintain the change history. As you
can see in the commit below.

4db3189ce0621be901f249f8cd8226c977dd601d
d80976d9ffd9d7f89a26134a299b236910477f3b
84ec374bd580364a32818c9fc269c19d6e931cab
50fff206c5e3a04fcb239ad58d89cad166711b7f

Aside from that, the current VKMS maintainer asked me to add them to the
commit body.

And for that two reasons, I will keep them.

Thanks!
---
Igor Torrente

> 
>> Reviewed-by: Melissa Wen <mwen@igalia.com>
>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>> ---

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 4/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation
  2022-04-05 14:21   ` André Almeida
@ 2022-04-05 19:05     ` Igor Torrente
  0 siblings, 0 replies; 44+ messages in thread
From: Igor Torrente @ 2022-04-05 19:05 UTC (permalink / raw)
  To: André Almeida
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, ppaalanen, dri-devel,
	tales.aparecida, ~lkcamp/patches

Hi André,

On 4/5/22 11:21, André Almeida wrote:
> Às 17:45 de 04/04/22, Igor Torrente escreveu:
>> Add a helper function to validate the connector configuration receive in
> 
> Maybe it should be "received"
> 
>> the encoder atomic_check by the drivers.
>>
>> So the drivers don't need do these common validations themselves.
> 
> "don't need do" -> "don't need to do"

Thanks, I will fix them too.

---
Igor Torrente

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job`
  2022-04-04 20:45 ` [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job` Igor Torrente
@ 2022-04-20 11:23   ` Pekka Paalanen
  2022-04-23 15:12     ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-20 11:23 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 9136 bytes --]

On Mon,  4 Apr 2022 17:45:11 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> This commit is the groundwork to introduce new formats to the planes and
> writeback buffer. As part of it, a new buffer metadata field is added to
> `vkms_writeback_job`, this metadata is represented by the `vkms_composer`
> struct.

Hi,

should this be talking about vkms_frame_info struct instead?

> 
> Also adds two new function pointers (`{wb,plane}_format_transform_func`)
> are defined to handle format conversion to/from internal format.
> 
> These things will allow us, in the future, to have different compositing
> and wb format types.
> 
> V2: Change the code to get the drm_framebuffer reference and not copy its
>     contents(Thomas Zimmermann).
> V3: Drop the refcount in the wb code(Thomas Zimmermann).
> V5: Add {wb,plane}_format_transform_func to vkms_writeback_job
>     and vkms_plane_state (Pekka Paalanen)
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c  |  4 ++--
>  drivers/gpu/drm/vkms/vkms_drv.h       | 31 +++++++++++++++++++++------
>  drivers/gpu/drm/vkms/vkms_plane.c     | 10 ++++-----
>  drivers/gpu/drm/vkms/vkms_writeback.c | 20 ++++++++++++++---
>  4 files changed, 49 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 2d946368a561..95029d2ebcac 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -153,7 +153,7 @@ static void compose_plane(struct vkms_frame_info *primary_plane_info,
>  			  struct vkms_frame_info *plane_frame_info,
>  			  void *vaddr_out)
>  {
> -	struct drm_framebuffer *fb = &plane_frame_info->fb;
> +	struct drm_framebuffer *fb = plane_frame_info->fb;
>  	void *vaddr;
>  	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>  
> @@ -175,7 +175,7 @@ static int compose_active_planes(void **vaddr_out,
>  				 struct vkms_frame_info *primary_plane_info,
>  				 struct vkms_crtc_state *crtc_state)
>  {
> -	struct drm_framebuffer *fb = &primary_plane_info->fb;
> +	struct drm_framebuffer *fb = primary_plane_info->fb;
>  	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>  	const void *vaddr;
>  	int i;
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 2e6342164bef..2704cfb6904b 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -22,13 +22,8 @@
>  
>  #define NUM_OVERLAY_PLANES 8
>  
> -struct vkms_writeback_job {
> -	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
> -	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
> -};
> -
>  struct vkms_frame_info {
> -	struct drm_framebuffer fb;
> +	struct drm_framebuffer *fb;
>  	struct drm_rect src, dst;
>  	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
>  	unsigned int offset;
> @@ -36,6 +31,29 @@ struct vkms_frame_info {
>  	unsigned int cpp;
>  };
>  
> +struct pixel_argb_u16 {
> +	u16 a, r, g, b;
> +};
> +
> +struct line_buffer {
> +	size_t n_pixels;
> +	struct pixel_argb_u16 *pixels;
> +};
> +
> +typedef void
> +(*wb_format_transform_func)(struct vkms_frame_info *frame_info,
> +			    const struct line_buffer *buffer, int y);
> +
> +typedef void
> +(*plane_format_transform_func)(struct line_buffer *buffer,
> +			       const struct vkms_frame_info *frame_info, int y);

It wasn't immediately obvious to me in which direction these function
types work from their names. The arguments are not wb and plane but
vkms_frame_info and line_buffer in both. The implementations of these
functions would have nothing specific to a wb or a plane either, would
they?

What about naming them frame_to_line_func and line_to_frame_func?

> +
> +struct vkms_writeback_job {
> +	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
> +	struct vkms_frame_info frame_info;

Which frame_info is this? Should the field be called wb_frame_info?

> +	wb_format_transform_func format_func;

line_to_frame_func wb_write;

perhaps? The type explains the general type of the function, and the
field name refers to what it is used for.

> +};
> +
>  /**
>   * vkms_plane_state - Driver specific plane state
>   * @base: base plane state
> @@ -44,6 +62,7 @@ struct vkms_frame_info {
>  struct vkms_plane_state {
>  	struct drm_shadow_plane_state base;
>  	struct vkms_frame_info *frame_info;
> +	plane_format_transform_func format_func;

Similarly here, maybe

frame_to_line_func plane_read;

perhaps?

>  };
>  
>  struct vkms_plane {
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index a56b0f76eddd..28752af0118c 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -50,12 +50,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
>  	struct vkms_plane_state *vkms_state = to_vkms_plane_state(old_state);
>  	struct drm_crtc *crtc = vkms_state->base.base.crtc;
>  
> -	if (crtc) {
> +	if (crtc && vkms_state->frame_info->fb) {
>  		/* dropping the reference we acquired in
>  		 * vkms_primary_plane_update()
>  		 */
> -		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
> -			drm_framebuffer_put(&vkms_state->frame_info->fb);
> +		if (drm_framebuffer_read_refcount(vkms_state->frame_info->fb))
> +			drm_framebuffer_put(vkms_state->frame_info->fb);
>  	}
>  
>  	kfree(vkms_state->frame_info);
> @@ -110,9 +110,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	frame_info = vkms_plane_state->frame_info;
>  	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
>  	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
> -	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
> +	frame_info->fb = fb;

This change, replacing the memcpy with storing a pointer, seems to be
another major point of this patch. Should it be a separate patch?
It doesn't seem to fit with the current commit message.

I have no idea what kind of locking or referencing a drm_framebuffer
would need, and I suspect that would be easier to review if it was a
patch of its own.

>  	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
> -	drm_framebuffer_get(&frame_info->fb);
> +	drm_framebuffer_get(frame_info->fb);

Does drm_framebuffer_get() not return anything?

To me it would be more idiomatic to write something like

	frame_info->fb = drm_framebuffer_get(fb);

I don't know if that pattern is used in the kernel, but I use it in
userspace to emphasise that frame_info owns a new reference rather than
borrowing someone else's.


Thanks,
pq

>  	frame_info->offset = fb->offsets[0];
>  	frame_info->pitch = fb->pitches[0];
>  	frame_info->cpp = fb->format->cpp[0];
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index 746cb0abc6ec..ad4bb1fb37ca 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -74,12 +74,15 @@ static int vkms_wb_prepare_job(struct drm_writeback_connector *wb_connector,
>  	if (!vkmsjob)
>  		return -ENOMEM;
>  
> -	ret = drm_gem_fb_vmap(job->fb, vkmsjob->map, vkmsjob->data);
> +	ret = drm_gem_fb_vmap(job->fb, vkmsjob->frame_info.map, vkmsjob->data);
>  	if (ret) {
>  		DRM_ERROR("vmap failed: %d\n", ret);
>  		goto err_kfree;
>  	}
>  
> +	vkmsjob->frame_info.fb = job->fb;
> +	drm_framebuffer_get(vkmsjob->frame_info.fb);
> +
>  	job->priv = vkmsjob;
>  
>  	return 0;
> @@ -98,7 +101,9 @@ static void vkms_wb_cleanup_job(struct drm_writeback_connector *connector,
>  	if (!job->fb)
>  		return;
>  
> -	drm_gem_fb_vunmap(job->fb, vkmsjob->map);
> +	drm_gem_fb_vunmap(job->fb, vkmsjob->frame_info.map);
> +
> +	drm_framebuffer_put(vkmsjob->frame_info.fb);
>  
>  	vkmsdev = drm_device_to_vkms_device(job->fb->dev);
>  	vkms_set_composer(&vkmsdev->output, false);
> @@ -115,14 +120,23 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>  	struct drm_writeback_connector *wb_conn = &output->wb_connector;
>  	struct drm_connector_state *conn_state = wb_conn->base.state;
>  	struct vkms_crtc_state *crtc_state = output->composer_state;
> +	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
> +	struct vkms_writeback_job *active_wb;
> +	struct vkms_frame_info *wb_frame_info;
>  
>  	if (!conn_state)
>  		return;
>  
>  	vkms_set_composer(&vkmsdev->output, true);
>  
> +	active_wb = conn_state->writeback_job->priv;
> +	wb_frame_info = &active_wb->frame_info;
> +
>  	spin_lock_irq(&output->composer_lock);
> -	crtc_state->active_writeback = conn_state->writeback_job->priv;
> +	crtc_state->active_writeback = active_wb;
> +	wb_frame_info->offset = fb->offsets[0];
> +	wb_frame_info->pitch = fb->pitches[0];
> +	wb_frame_info->cpp = fb->format->cpp[0];
>  	crtc_state->wb_pending = true;
>  	spin_unlock_irq(&output->composer_lock);
>  	drm_writeback_queue_job(wb_conn, connector_state);


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-04-04 20:45 ` [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats Igor Torrente
@ 2022-04-20 12:36   ` Pekka Paalanen
  2022-04-23 16:04     ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-20 12:36 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 22755 bytes --]

On Mon,  4 Apr 2022 17:45:12 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Currently the blend function only accepts XRGB_8888 and ARGB_8888
> as a color input.
> 
> This patch refactors all the functions related to the plane composition
> to overcome this limitation.
> 
> A new internal format(`struct pixel`) is introduced to deal with all

Hi,

struct pixel_argb_u16 was added in the previous patch.

> possible inputs. It consists of 16 bits fields that represent each of
> the channels.
> 
> The pixels blend is done using this internal format. And new handlers
> are being added to convert a specific format to/from this internal format.
> 
> So the blend operation depends on these handlers to convert to this common
> format. The blended result, if necessary, is converted to the writeback
> buffer format.
> 
> This patch introduces three major differences to the blend function.
> 1 - All the planes are blended at once.
> 2 - The blend calculus is done as per line instead of per pixel.
> 3 - It is responsible to calculates the CRC and writing the writeback
> buffer(if necessary).
> 
> These changes allow us to allocate way less memory in the intermediate
> buffer to compute these operations. Because now we don't need to
> have the entire intermediate image lines at once, just one line is
> enough.
> 
> | Memory consumption (output dimensions) |
> |:--------------------------------------:|
> |       Current      |     This patch    |
> |:------------------:|:-----------------:|
> |   Width * Heigth   |     2 * Width     |
> 
> Beyond memory, we also have a minor performance benefit from all
> these changes. Results running the IGT[1] test
> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
> 
> |                 Frametime                  |
> |:------------------------------------------:|
> |  Implementation |  Current  |  This commit |
> |:---------------:|:---------:|:------------:|
> | frametime range |  9~22 ms  |    5~17 ms   |
> |     Average     |  11.4 ms  |    7.8 ms    |
> 
> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
> 
> V2: Improves the performance drastically, by performing the operations
>     per-line and not per-pixel(Pekka Paalanen).
>     Minor improvements(Pekka Paalanen).
> V3: Changes the code to blend the planes all at once. This improves
>     performance, memory consumption, and removes much of the weirdness
>     of the V2(Pekka Paalanen and me).
>     Minor improvements(Pekka Paalanen and me).
> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
> V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen).
>     Several security/robustness improvents(Pekka Paalanen).
>     Removes check_planes_x_bounds function and allows partial
>     partly off-screen(Pekka Paalanen).
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  Documentation/gpu/vkms.rst            |   4 -
>  drivers/gpu/drm/vkms/Makefile         |   1 +
>  drivers/gpu/drm/vkms/vkms_composer.c  | 318 ++++++++++++--------------
>  drivers/gpu/drm/vkms/vkms_formats.c   | 151 ++++++++++++
>  drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
>  drivers/gpu/drm/vkms/vkms_plane.c     |   3 +
>  drivers/gpu/drm/vkms/vkms_writeback.c |   3 +
>  7 files changed, 311 insertions(+), 181 deletions(-)
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> 
> diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
> index 973e2d43108b..a49e4ae92653 100644
> --- a/Documentation/gpu/vkms.rst
> +++ b/Documentation/gpu/vkms.rst
> @@ -118,10 +118,6 @@ Add Plane Features
>  
>  There's lots of plane features we could add support for:
>  
> -- Clearing primary plane: clear primary plane before plane composition (at the
> -  start) for correctness of pixel blend ops. It also guarantees alpha channel
> -  is cleared in the target buffer for stable crc. [Good to get started]
> -
>  - ARGB format on primary plane: blend the primary plane into background with
>    translucent alpha.
>  
> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
> index 72f779cbfedd..1b28a6a32948 100644
> --- a/drivers/gpu/drm/vkms/Makefile
> +++ b/drivers/gpu/drm/vkms/Makefile
> @@ -3,6 +3,7 @@ vkms-y := \
>  	vkms_drv.o \
>  	vkms_plane.o \
>  	vkms_output.o \
> +	vkms_formats.o \
>  	vkms_crtc.o \
>  	vkms_composer.o \
>  	vkms_writeback.o
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 95029d2ebcac..cf24015bf90f 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c

(For this file, I have removed all the minus diff lines from below to
better see the new code.)


> @@ -7,204 +7,186 @@
>  #include <drm/drm_fourcc.h>
>  #include <drm/drm_gem_framebuffer_helper.h>
>  #include <drm/drm_vblank.h>
> +#include <linux/minmax.h>
>  
>  #include "vkms_drv.h"
>  
> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>  {
> +	u32 new_color;
>  
> +	new_color = (src * 0xffff + dst * (0xffff - alpha));
>  
> +	return DIV_ROUND_CLOSEST(new_color, 0xffff);

This looks good.

>  }
>  
>  /**
> + * pre_mul_alpha_blend - alpha blending equation
> + * @src_frame_info: source framebuffer's metadata
> + * @stage_buffer: The line with the pixels from src_plane
> + * @output_buffer: A line buffer that receives all the blends output
>   *
> + * Using the information from the `frame_info`, this blends only the
> + * necessary pixels from the `stage_buffer` to the `output_buffer`
> + * using premultiplied blend formula.
>   *
> + * The current DRM assumption is that pixel color values have been already
> + * pre-multiplied with the alpha channel values. See more
> + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
> + * completely opaque background.
>   */
> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> +				struct line_buffer *stage_buffer,
> +				struct line_buffer *output_buffer)
>  {
> +	int x, x_dst = frame_info->dst.x1;
> +	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> +	struct pixel_argb_u16 *in = stage_buffer->pixels;
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    stage_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++) {
> +		out[x].a = (u16)0xffff;
> +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>  	}
>  }
>  
> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>  {
> +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
> +		return true;
>  
> +	return false;
>  }
>  
>  /**
> + * @wb_frame_info: The writeback frame buffer metadata
> + * @crtc_state: The crtc state
> + * @crc32: The crc output of the final frame
> + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
> + * @stage_buffer: The line with the pixels from plane being blend to the output
>   *
> + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
> + * from all planes, calculates the crc32 of the output from the former step,
> + * and, if necessary, convert and store the output to the writeback buffer.
>   */
> +static void blend(struct vkms_writeback_job *wb,
> +		  struct vkms_crtc_state *crtc_state,
> +		  u32 *crc32, struct line_buffer *stage_buffer,
> +		  struct line_buffer *output_buffer, s64 row_size)
>  {
> +	struct vkms_plane_state **plane = crtc_state->active_planes;
> +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
> +	u32 n_active_planes = crtc_state->num_active_planes;
> +
> +	int y_dst = primary_plane_info->dst.y1;
> +	int h_dst = drm_rect_height(&primary_plane_info->dst);
> +	int y_limit = y_dst + h_dst;
> +	int y, i;
> +
> +	for (y = y_dst; y < y_limit; y++) {
> +		plane[0]->format_func(output_buffer, primary_plane_info, y);

This is a bad assumption, but the next patch removes the need for this
assumption. The primary plane may not be the bottom-most AFAIU.
Overlays below the primary exist on real hardware.

> +
> +		/* If there are other planes besides primary, we consider the active
> +		 * planes should be in z-order and compose them associatively:
> +		 * ((primary <- overlay) <- cursor)
> +		 */
> +		for (i = 1; i < n_active_planes; i++) {
> +			if (!check_y_limit(plane[i]->frame_info, y))
> +				continue;
> +
> +			plane[i]->format_func(stage_buffer, plane[i]->frame_info, y);
> +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> +					    output_buffer);
> +		}
>  
> +		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
>  
> +		if (wb)
> +			wb->format_func(&wb->frame_info, output_buffer, y);
>  	}
>  }
>  
> +static int check_format_funcs(struct vkms_crtc_state *crtc_state,
> +			      struct vkms_writeback_job *active_wb)
>  {
> +	struct vkms_plane_state **planes = crtc_state->active_planes;
> +	u32 n_active_planes = crtc_state->num_active_planes;
> +	int i;
>  
> +	for (i = 0; i < n_active_planes; i++)
> +		if (!planes[i]->format_func)
> +			return -1;
>  
> +	if (active_wb && !active_wb->format_func)
> +		return -1;
>  
> +	return 0;
>  }
>  
> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
> +				 struct vkms_crtc_state *crtc_state,
> +				 u32 *crc32)
>  {
> +	int line_width, ret = 0, pixel_size = sizeof(struct pixel_argb_u16);
> +	struct vkms_frame_info *primary_plane_info = NULL;
> +	struct line_buffer output_buffer, stage_buffer;
> +	struct vkms_plane_state *act_plane = NULL;
> +	u32 wb_format;
>  
> +	if (WARN_ON(pixel_size != 8))

Isn't there a compile-time assert macro for this? Having to actually
run VKMS to check for this reduces the chances of finding it early.
What's the reason for this check anyway?

> +		return -EINVAL;
> +
> +	if (crtc_state->num_active_planes >= 1) {
> +		act_plane = crtc_state->active_planes[0];
> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> +			primary_plane_info = act_plane->frame_info;

After the next patch, do you even need the primary plane for anything
specifically? There is the map_is_null check below, but that should be
done on all planes in the array, right?

I suspect the next patch, or another patch in this series, should just
delete this chunk.

>  	}
>  
> +	if (!primary_plane_info)
> +		return -EINVAL;
> +
>  	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>  		return -EINVAL;
>  
> +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
> +		return -EINVAL;
>  
> +	line_width = drm_rect_width(&primary_plane_info->dst);
> +	stage_buffer.n_pixels = line_width;
> +	output_buffer.n_pixels = line_width;
>  
> +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> +	if (!stage_buffer.pixels) {
> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
> +		return -ENOMEM;
> +	}
>  
> +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> +	if (!output_buffer.pixels) {
> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
> +		ret = -ENOMEM;
> +		goto free_stage_buffer;
> +	}
> +
> +	if (active_wb) {
> +		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
> +
> +		wb_format = wb_frame_info->fb->format->format;

I don't see wb_format being used, is it?

> +		wb_frame_info->src = primary_plane_info->src;
> +		wb_frame_info->dst = primary_plane_info->dst;
> +	}
> +
> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
> +	      &output_buffer, (s64)line_width * pixel_size);

What's the (s64) doing here?

Are byte sizes not usually expressed with size_t or ssize_t types, or
is the kernel convention to use u64 and s64?

This makes me suspect that pixel_offset() and friends in vkms_format.c
are going to need fixing as well. int type overflows at 2G.

> +
> +	kvfree(output_buffer.pixels);
> +free_stage_buffer:
> +	kvfree(stage_buffer.pixels);
> +
> +	return ret;
>  }
>  
>  /**
> @@ -222,13 +204,11 @@ void vkms_composer_worker(struct work_struct *work)
>  						struct vkms_crtc_state,
>  						composer_work);
>  	struct drm_crtc *crtc = crtc_state->base.crtc;
> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>  	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>  	bool crc_pending, wb_pending;
>  	u64 frame_start, frame_end;
> +	u32 crc32 = 0;
>  	int ret;
>  
>  	spin_lock_irq(&out->composer_lock);
> @@ -248,35 +228,19 @@ void vkms_composer_worker(struct work_struct *work)
>  	if (!crc_pending)
>  		return;
>  
>  	if (wb_pending)
> +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
> +	else
> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>  
> +	if (ret)
>  		return;
>  
>  	if (wb_pending) {
>  		drm_writeback_signal_completion(&out->wb_connector, 0);
>  		spin_lock_irq(&out->composer_lock);
>  		crtc_state->wb_pending = false;
>  		spin_unlock_irq(&out->composer_lock);
>  	}
>  
>  	/*
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> new file mode 100644
> index 000000000000..931a61405d6a
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -0,0 +1,151 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#include <drm/drm_rect.h>
> +#include <linux/minmax.h>
> +
> +#include "vkms_formats.h"
> +
> +static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> +{
> +	return frame_info->offset + (y * frame_info->pitch)
> +				  + (x * frame_info->cpp);
> +}
> +
> +/*
> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> + *
> + * @frame_info: Buffer metadata
> + * @x: The x(width) coordinate of the 2D buffer
> + * @y: The y(Heigth) coordinate of the 2D buffer
> + *
> + * Takes the information stored in the frame_info, a pair of coordinates, and
> + * returns the address of the first color channel.
> + * This function assumes the channels are packed together, i.e. a color channel
> + * comes immediately after another in the memory. And therefore, this function
> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> + */
> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> +				int x, int y)
> +{
> +	int offset = pixel_offset(frame_info, x, y);
> +
> +	return (u8 *)frame_info->map[0].vaddr + offset;
> +}
> +
> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> +{
> +	int x_src = frame_info->src.x1 >> 16;
> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
> +
> +	return packed_pixels_addr(frame_info, x_src, y_src);
> +}
> +
> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> +				 const struct vkms_frame_info *frame_info, int y)
> +{
> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			       stage_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> +		/*
> +		 * The 257 is the "conversion ratio". This number is obtained by the
> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
> +		 * the best color value in a pixel format with more possibilities.
> +		 * A similar idea applies to others RGB color conversions.
> +		 */
> +		out_pixels[x].a = (u16)src_pixels[3] * 257;
> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> +	}
> +}
> +
> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> +				 const struct vkms_frame_info *frame_info, int y)
> +{
> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			       stage_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> +		out_pixels[x].a = (u16)0xffff;
> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> +	}
> +}
> +
> +/*
> + * The following  functions take an line of argb_u16 pixels from the
> + * src_buffer, convert them to a specific format, and store them in the
> + * destination.
> + *
> + * They are used in the `compose_active_planes` to convert and store a line
> + * from the src_buffer to the writeback buffer.
> + */
> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
> +				 const struct line_buffer *src_buffer, int y)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    src_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		/*
> +		 * This sequence below is important because the format's byte order is
> +		 * in little-endian. In the case of the ARGB8888 the memory is
> +		 * organized this way:
> +		 *
> +		 * | Addr     | = blue channel
> +		 * | Addr + 1 | = green channel
> +		 * | Addr + 2 | = Red channel
> +		 * | Addr + 3 | = Alpha channel
> +		 */
> +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> +	}
> +}
> +
> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
> +				 const struct line_buffer *src_buffer, int y)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    src_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		dst_pixels[3] = (u8)0xff;

When writing to XRGB, it's not necessary to ensure the X channel has
any sensible value. Anyone reading from XRGB must ignore that value
anyway. So why not write something wacky here, like 0xa1, that is far
enough from both 0x00 or 0xff to not be confused with them even
visually? Also not 0x7f or 0x80 which are close to half of 0xff.

Or, you could save a whole function and just use argb_u16_to_ARGBxxxx()
instead, even for XRGB destination.

> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> +	}
> +}
> +
> +plane_format_transform_func get_plane_fmt_transform_function(u32 format)
> +{
> +	if (format == DRM_FORMAT_ARGB8888)
> +		return &ARGB8888_to_argb_u16;
> +	else if (format == DRM_FORMAT_XRGB8888)
> +		return &XRGB8888_to_argb_u16;
> +	else
> +		return NULL;

This works for now, but when more formats are added, I'd think a switch
statement would look better.

> +}
> +
> +wb_format_transform_func get_wb_fmt_transform_function(u32 format)
> +{
> +	if (format == DRM_FORMAT_ARGB8888)
> +		return &argb_u16_to_ARGB8888;
> +	else if (format == DRM_FORMAT_XRGB8888)
> +		return &argb_u16_to_XRGB8888;
> +	else
> +		return NULL;
> +}
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> new file mode 100644
> index 000000000000..adc5a17b9584
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#ifndef _VKMS_FORMATS_H_
> +#define _VKMS_FORMATS_H_
> +
> +#include "vkms_drv.h"
> +
> +plane_format_transform_func get_plane_fmt_transform_function(u32 format);
> +
> +wb_format_transform_func get_wb_fmt_transform_function(u32 format);

This is good, exposing only what is necessary.


Thanks,
pq

> +
> +#endif /* _VKMS_FORMATS_H_ */
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 28752af0118c..798243837fd0 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -10,6 +10,7 @@
>  #include <drm/drm_plane_helper.h>
>  
>  #include "vkms_drv.h"
> +#include "vkms_formats.h"
>  
>  static const u32 vkms_formats[] = {
>  	DRM_FORMAT_XRGB8888,
> @@ -100,6 +101,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	struct drm_shadow_plane_state *shadow_plane_state;
>  	struct drm_framebuffer *fb = new_state->fb;
>  	struct vkms_frame_info *frame_info;
> +	u32 fmt = fb->format->format;
>  
>  	if (!new_state->crtc || !fb)
>  		return;
> @@ -116,6 +118,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	frame_info->offset = fb->offsets[0];
>  	frame_info->pitch = fb->pitches[0];
>  	frame_info->cpp = fb->format->cpp[0];
> +	vkms_plane_state->format_func = get_plane_fmt_transform_function(fmt);
>  }
>  
>  static int vkms_plane_atomic_check(struct drm_plane *plane,
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index ad4bb1fb37ca..97f71e784bbf 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -11,6 +11,7 @@
>  #include <drm/drm_gem_shmem_helper.h>
>  
>  #include "vkms_drv.h"
> +#include "vkms_formats.h"
>  
>  static const u32 vkms_wb_formats[] = {
>  	DRM_FORMAT_XRGB8888,
> @@ -123,6 +124,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>  	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
>  	struct vkms_writeback_job *active_wb;
>  	struct vkms_frame_info *wb_frame_info;
> +	u32 wb_format = fb->format->format;
>  
>  	if (!conn_state)
>  		return;
> @@ -140,6 +142,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>  	crtc_state->wb_pending = true;
>  	spin_unlock_irq(&output->composer_lock);
>  	drm_writeback_queue_job(wb_conn, connector_state);
> +	active_wb->format_func = get_wb_fmt_transform_function(wb_format);
>  }
>  
>  static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs = {


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 7/9] drm: vkms: Supports to the case where primary plane doesn't match the CRTC
  2022-04-04 20:45 ` [PATCH v5 7/9] drm: vkms: Supports to the case where primary plane doesn't match the CRTC Igor Torrente
@ 2022-04-20 13:13   ` Pekka Paalanen
  2022-04-24  0:41     ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-20 13:13 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 7509 bytes --]

On Mon,  4 Apr 2022 17:45:13 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> We will break the current assumption that the primary plane has the

Hi,

I'd say "remove" rather than "break". Breaking sounds bad but this is
good. :-)

> same size and position as CRTC.

...and that the primary plane is the bottom-most in zpos order, or is
even enabled. At least as far as the blending machinery is concerned.

> 
> For that we will add CRTC dimension information to `vkms_crtc_state`
> and add a opaque black backgound color.
> 
> Because now we need to fill the background, we had a loss in
> performance with this change. Results running the IGT[1] test
> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
> 
> |                  Frametime                   |
> |:--------------------------------------------:|
> |  Implementation |  Previous |   This commit  |
> |:---------------:|:---------:|:--------------:|
> | frametime range |  5~18 ms  |     10~22 ms   |
> |     Average     |  8.47 ms  |     12.32 ms   |
> 
> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  Documentation/gpu/vkms.rst           |  3 +--
>  drivers/gpu/drm/vkms/vkms_composer.c | 32 +++++++++++++++++++---------
>  drivers/gpu/drm/vkms/vkms_crtc.c     |  4 ++++
>  drivers/gpu/drm/vkms/vkms_drv.h      |  2 ++
>  4 files changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
> index a49e4ae92653..49db221c0f52 100644
> --- a/Documentation/gpu/vkms.rst
> +++ b/Documentation/gpu/vkms.rst
> @@ -121,8 +121,7 @@ There's lots of plane features we could add support for:
>  - ARGB format on primary plane: blend the primary plane into background with
>    translucent alpha.
>  
> -- Support when the primary plane isn't exactly matching the output size: blend
> -  the primary plane into the black background.
> +- Add background color KMS property[Good to get started].
>  
>  - Full alpha blending on all planes.
>  
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index cf24015bf90f..f80842227669 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -61,6 +61,15 @@ static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>  	return false;
>  }
>  
> +static void fill_background(struct pixel_argb_u16 *backgroud_color,

Hi,

this could be const struct pixel_argb_u16 *. Also a typo: missing n in
backgroud_color.

> +			    struct line_buffer *output_buffer)
> +{
> +	int i;
> +
> +	for (i = 0; i < output_buffer->n_pixels; i++)
> +		output_buffer->pixels[i] = *backgroud_color;
> +}
> +
>  /**
>   * @wb_frame_info: The writeback frame buffer metadata
>   * @crtc_state: The crtc state
> @@ -78,22 +87,23 @@ static void blend(struct vkms_writeback_job *wb,
>  		  struct line_buffer *output_buffer, s64 row_size)
>  {
>  	struct vkms_plane_state **plane = crtc_state->active_planes;
> -	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
>  	u32 n_active_planes = crtc_state->num_active_planes;
>  
> -	int y_dst = primary_plane_info->dst.y1;
> -	int h_dst = drm_rect_height(&primary_plane_info->dst);
> -	int y_limit = y_dst + h_dst;
> +	struct pixel_argb_u16 background_color = (struct pixel_argb_u16) {
> +		.a = 0xffff
> +	};

Could be const and shorter, if that fits the kernel style:

	const struct pixel_arb_u16 background_color = { .a = 0xffff };

> +
> +	int crtc_y_limit = crtc_state->crtc_height;
>  	int y, i;
>  
> -	for (y = y_dst; y < y_limit; y++) {
> -		plane[0]->format_func(output_buffer, primary_plane_info, y);
> +	for (y = 0; y < crtc_y_limit; y++) {
> +		fill_background(&background_color, output_buffer);
>  
>  		/* If there are other planes besides primary, we consider the active
>  		 * planes should be in z-order and compose them associatively:

Is "associatively" the right word here?

>  		 * ((primary <- overlay) <- cursor)

The example (primary <- overlay) is not generally true with real hardware.

Maybe what you are trying to say is: The active planes are composed in
z-order.

>  		 */
> -		for (i = 1; i < n_active_planes; i++) {
> +		for (i = 0; i < n_active_planes; i++) {
>  			if (!check_y_limit(plane[i]->frame_info, y))
>  				continue;
>  
> @@ -154,7 +164,7 @@ static int compose_active_planes(struct vkms_writeback_job *active_wb,

As I mentioned on the previous patch, I think the finding of primary
plane (which was generally incorrect) should be removed here.

>  	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
>  		return -EINVAL;
>  
> -	line_width = drm_rect_width(&primary_plane_info->dst);
> +	line_width = crtc_state->crtc_width;
>  	stage_buffer.n_pixels = line_width;
>  	output_buffer.n_pixels = line_width;
>  
> @@ -175,8 +185,10 @@ static int compose_active_planes(struct vkms_writeback_job *active_wb,
>  		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>  
>  		wb_format = wb_frame_info->fb->format->format;
> -		wb_frame_info->src = primary_plane_info->src;
> -		wb_frame_info->dst = primary_plane_info->dst;
> +		drm_rect_init(&wb_frame_info->src, 0, 0, crtc_state->crtc_width,
> +			      crtc_state->crtc_height);
> +		drm_rect_init(&wb_frame_info->dst, 0, 0, crtc_state->crtc_width,
> +			      crtc_state->crtc_height);

Why are these not set when the active_wb->frame_info is created? Can
the CRTC (video mode) be smaller than the wb buffer?

Somewhere you must have a check that wb buffer size can fit the crtc
size, or maybe they must be exactly the same size. At least setting
destination rectangle bigger than the buffer dimensions must be
impossible.

>  	}
>  
>  	blend(active_wb, crtc_state, crc32, &stage_buffer,
> diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
> index 57bbd32e9beb..4a37e243c2d7 100644
> --- a/drivers/gpu/drm/vkms/vkms_crtc.c
> +++ b/drivers/gpu/drm/vkms/vkms_crtc.c
> @@ -248,7 +248,9 @@ static void vkms_crtc_atomic_begin(struct drm_crtc *crtc,
>  static void vkms_crtc_atomic_flush(struct drm_crtc *crtc,
>  				   struct drm_atomic_state *state)
>  {
> +	struct drm_crtc_state *crtc_state = drm_atomic_get_new_crtc_state(state, crtc);
>  	struct vkms_output *vkms_output = drm_crtc_to_vkms_output(crtc);
> +	struct drm_display_mode *mode = &crtc_state->mode;
>  
>  	if (crtc->state->event) {
>  		spin_lock(&crtc->dev->event_lock);
> @@ -264,6 +266,8 @@ static void vkms_crtc_atomic_flush(struct drm_crtc *crtc,
>  	}
>  
>  	vkms_output->composer_state = to_vkms_crtc_state(crtc->state);
> +	vkms_output->composer_state->crtc_width = mode->hdisplay;
> +	vkms_output->composer_state->crtc_height = mode->vdisplay;

Is the crtc not keeping track of the current mode, do you really need
your own crtc_width and crtc_height?


Thanks,
pq

>  
>  	spin_unlock_irq(&vkms_output->lock);
>  }
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 2704cfb6904b..ab92d9f7b701 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -90,6 +90,8 @@ struct vkms_crtc_state {
>  	bool wb_pending;
>  	u64 frame_start;
>  	u64 frame_end;
> +	u16 crtc_width;
> +	u16 crtc_height;
>  };
>  
>  struct vkms_output {


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats
  2022-04-04 20:45 ` [PATCH v5 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats Igor Torrente
@ 2022-04-20 13:19   ` Pekka Paalanen
  2022-05-07  7:32   ` Thomas Zimmermann
  1 sibling, 0 replies; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-20 13:19 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 6219 bytes --]

On Mon,  4 Apr 2022 17:45:14 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> This will be useful to write tests that depends on these formats.
> 
> ARGB and XRGB follows the a similar implementation of the former formats.
> Just adjusting for 16 bits per channel.
> 
> V3: Adapt the handlers to the new format introduced in patch 7 V3.
> V5: Minor improvements
>     Added le16_to_cpu/cpu_to_le16 to the 16 bits color read/writes.
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  drivers/gpu/drm/vkms/vkms_formats.c   | 77 +++++++++++++++++++++++++++
>  drivers/gpu/drm/vkms/vkms_plane.c     |  5 +-
>  drivers/gpu/drm/vkms/vkms_writeback.c |  2 +
>  3 files changed, 83 insertions(+), 1 deletion(-)

Hi,

aside from the comments I already gave on the other patch adding the
*RGB8888 variants that apply here too, this looks good to me, with or
without those changes.

Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.com>


Thanks,
pq

> 
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 931a61405d6a..8d913fa7dbde 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -78,6 +78,41 @@ static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>  	}
>  }
>  
> +static void ARGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
> +				     const struct vkms_frame_info *frame_info,
> +				     int y)
> +{
> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> +	u16 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			       stage_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> +		out_pixels[x].a = le16_to_cpu(src_pixels[3]);
> +		out_pixels[x].r = le16_to_cpu(src_pixels[2]);
> +		out_pixels[x].g = le16_to_cpu(src_pixels[1]);
> +		out_pixels[x].b = le16_to_cpu(src_pixels[0]);
> +	}
> +}
> +
> +static void XRGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
> +				     const struct vkms_frame_info *frame_info,
> +				     int y)
> +{
> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> +	u16 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			       stage_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> +		out_pixels[x].a = (u16)0xffff;
> +		out_pixels[x].r = le16_to_cpu(src_pixels[2]);
> +		out_pixels[x].g = le16_to_cpu(src_pixels[1]);
> +		out_pixels[x].b = le16_to_cpu(src_pixels[0]);
> +	}
> +}
> +
> +
>  /*
>   * The following  functions take an line of argb_u16 pixels from the
>   * src_buffer, convert them to a specific format, and store them in the
> @@ -130,12 +165,50 @@ static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
>  	}
>  }
>  
> +static void argb_u16_to_ARGB16161616(struct vkms_frame_info *frame_info,
> +				     const struct line_buffer *src_buffer, int y)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    src_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		dst_pixels[3] = cpu_to_le16(in_pixels[x].a);
> +		dst_pixels[2] = cpu_to_le16(in_pixels[x].r);
> +		dst_pixels[1] = cpu_to_le16(in_pixels[x].g);
> +		dst_pixels[0] = cpu_to_le16(in_pixels[x].b);
> +	}
> +}
> +
> +static void argb_u16_to_XRGB16161616(struct vkms_frame_info *frame_info,
> +				     const struct line_buffer *src_buffer, int y)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    src_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		dst_pixels[3] = (u8)0xffff;
> +		dst_pixels[2] = cpu_to_le16(in_pixels[x].r);
> +		dst_pixels[1] = cpu_to_le16(in_pixels[x].g);
> +		dst_pixels[0] = cpu_to_le16(in_pixels[x].b);
> +	}
> +}
> +
>  plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>  {
>  	if (format == DRM_FORMAT_ARGB8888)
>  		return &ARGB8888_to_argb_u16;
>  	else if (format == DRM_FORMAT_XRGB8888)
>  		return &XRGB8888_to_argb_u16;
> +	else if (format == DRM_FORMAT_ARGB16161616)
> +		return &ARGB16161616_to_argb_u16;
> +	else if (format == DRM_FORMAT_XRGB16161616)
> +		return &XRGB16161616_to_argb_u16;
>  	else
>  		return NULL;
>  }
> @@ -146,6 +219,10 @@ wb_format_transform_func get_wb_fmt_transform_function(u32 format)
>  		return &argb_u16_to_ARGB8888;
>  	else if (format == DRM_FORMAT_XRGB8888)
>  		return &argb_u16_to_XRGB8888;
> +	else if (format == DRM_FORMAT_ARGB16161616)
> +		return &argb_u16_to_ARGB16161616;
> +	else if (format == DRM_FORMAT_XRGB16161616)
> +		return &argb_u16_to_XRGB16161616;
>  	else
>  		return NULL;
>  }
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 798243837fd0..60054a85204a 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -14,11 +14,14 @@
>  
>  static const u32 vkms_formats[] = {
>  	DRM_FORMAT_XRGB8888,
> +	DRM_FORMAT_XRGB16161616
>  };
>  
>  static const u32 vkms_plane_formats[] = {
>  	DRM_FORMAT_ARGB8888,
> -	DRM_FORMAT_XRGB8888
> +	DRM_FORMAT_XRGB8888,
> +	DRM_FORMAT_XRGB16161616,
> +	DRM_FORMAT_ARGB16161616
>  };
>  
>  static struct drm_plane_state *
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index 97f71e784bbf..cb63a5da9af1 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -15,6 +15,8 @@
>  
>  static const u32 vkms_wb_formats[] = {
>  	DRM_FORMAT_XRGB8888,
> +	DRM_FORMAT_XRGB16161616,
> +	DRM_FORMAT_ARGB16161616
>  };
>  
>  static const struct drm_connector_funcs vkms_wb_connector_funcs = {

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 9/9] drm: vkms: Add support to the RGB565 format
  2022-04-04 20:45 ` [PATCH v5 9/9] drm: vkms: Add support to the RGB565 format Igor Torrente
@ 2022-04-21 10:58   ` Pekka Paalanen
  2022-04-27  0:53     ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-21 10:58 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 7506 bytes --]

On Mon,  4 Apr 2022 17:45:15 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Adds this common format to vkms.
> 
> This commit also adds new helper macros to deal with fixed-point
> arithmetic.
> 
> It was done to improve the precision of the conversion to ARGB16161616
> since the "conversion ratio" is not an integer.
> 
> V3: Adapt the handlers to the new format introduced in patch 7 V3.
> V5: Minor improvements
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  drivers/gpu/drm/vkms/vkms_formats.c   | 70 +++++++++++++++++++++++++++
>  drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
>  drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
>  3 files changed, 76 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 8d913fa7dbde..4af8b295f31e 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -5,6 +5,23 @@
>  
>  #include "vkms_formats.h"
>  
> +/* The following macros help doing fixed point arithmetic. */
> +/*
> + * With Fixed-Point scale 15 we have 17 and 15 bits of integer and fractional
> + * parts respectively.
> + *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
> + * 31                                          0
> + */
> +#define FIXED_SCALE 15

I think this would usually be called a "shift" since it's used in
bit-shifts.

> +
> +#define INT_TO_FIXED(a) ((a) << FIXED_SCALE)
> +#define FIXED_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FIXED_SCALE))
> +#define FIXED_DIV(a, b) ((s32)(((s64)(a) << FIXED_SCALE) / (b)))

A truncating div, ok.

> +/* This macro converts a fixed point number to int, and round half up it */
> +#define FIXED_TO_INT_ROUND(a) (((a) + (1 << (FIXED_SCALE - 1))) >> FIXED_SCALE)

Yes.

> +/* Convert divisor and dividend to Fixed-Point and performs the division */
> +#define INT_TO_FIXED_DIV(a, b) (FIXED_DIV(INT_TO_FIXED(a), INT_TO_FIXED(b)))

Ok, this is obvious to read, even though it's the same as FIXED_DIV()
alone. Not sure the compiler would optimize that extra bit-shift away...

If one wanted to, it would be possible to write type-safe functions for
these so that fixed and integer could not be mixed up.

> +
>  static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
>  {
>  	return frame_info->offset + (y * frame_info->pitch)
> @@ -112,6 +129,30 @@ static void XRGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
>  	}
>  }
>  
> +static void RGB565_to_argb_u16(struct line_buffer *stage_buffer,
> +			       const struct vkms_frame_info *frame_info, int y)
> +{
> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> +	u16 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			       stage_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels++) {
> +		u16 rgb_565 = le16_to_cpu(*src_pixels);
> +		int fp_r = INT_TO_FIXED((rgb_565 >> 11) & 0x1f);
> +		int fp_g = INT_TO_FIXED((rgb_565 >> 5) & 0x3f);
> +		int fp_b = INT_TO_FIXED(rgb_565 & 0x1f);
> +
> +		int fp_rb_ratio = INT_TO_FIXED_DIV(65535, 31);
> +		int fp_g_ratio = INT_TO_FIXED_DIV(65535, 63);

These two should be outside of the loop since they are constants.
Likely no difference for performance because the compiler is probably
doing that already, but I think it would read better.

> +
> +		out_pixels[x].a = (u16)0xffff;
> +		out_pixels[x].r = FIXED_TO_INT_ROUND(FIXED_MUL(fp_r, fp_rb_ratio));
> +		out_pixels[x].g = FIXED_TO_INT_ROUND(FIXED_MUL(fp_g, fp_g_ratio));
> +		out_pixels[x].b = FIXED_TO_INT_ROUND(FIXED_MUL(fp_b, fp_rb_ratio));

Looks good.

> +	}
> +}
> +
>  
>  /*
>   * The following  functions take an line of argb_u16 pixels from the
> @@ -199,6 +240,31 @@ static void argb_u16_to_XRGB16161616(struct vkms_frame_info *frame_info,
>  	}
>  }
>  
> +static void argb_u16_to_RGB565(struct vkms_frame_info *frame_info,
> +			       const struct line_buffer *src_buffer, int y)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    src_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels++) {
> +		int fp_r = INT_TO_FIXED(in_pixels[x].r);
> +		int fp_g = INT_TO_FIXED(in_pixels[x].g);
> +		int fp_b = INT_TO_FIXED(in_pixels[x].b);
> +
> +		int fp_rb_ratio = INT_TO_FIXED_DIV(65535, 31);
> +		int fp_g_ratio = INT_TO_FIXED_DIV(65535, 63);

Move these out of the loop.

> +
> +		u16 r = FIXED_TO_INT_ROUND(FIXED_DIV(fp_r, fp_rb_ratio));
> +		u16 g = FIXED_TO_INT_ROUND(FIXED_DIV(fp_g, fp_g_ratio));
> +		u16 b = FIXED_TO_INT_ROUND(FIXED_DIV(fp_b, fp_rb_ratio));
> +
> +		*dst_pixels = cpu_to_le16(r << 11 | g << 5 | b);

Looks good.

You are using signed variables (int, s64, s32) when negative values
should never occur. It doesn't seem wrong, just unexpected.

The use of int in code vs. s32 in the macros is a bit inconsistent as
well.

> +	}
> +}
> +
>  plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>  {
>  	if (format == DRM_FORMAT_ARGB8888)
> @@ -209,6 +275,8 @@ plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>  		return &ARGB16161616_to_argb_u16;
>  	else if (format == DRM_FORMAT_XRGB16161616)
>  		return &XRGB16161616_to_argb_u16;
> +	else if (format == DRM_FORMAT_RGB565)
> +		return &RGB565_to_argb_u16;
>  	else
>  		return NULL;
>  }
> @@ -223,6 +291,8 @@ wb_format_transform_func get_wb_fmt_transform_function(u32 format)
>  		return &argb_u16_to_ARGB16161616;
>  	else if (format == DRM_FORMAT_XRGB16161616)
>  		return &argb_u16_to_XRGB16161616;
> +	else if (format == DRM_FORMAT_RGB565)
> +		return &argb_u16_to_RGB565;

Now it's starting to become clear that a switch statement would be nice.

>  	else
>  		return NULL;
>  }
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 60054a85204a..94a8e412886f 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -14,14 +14,16 @@
>  
>  static const u32 vkms_formats[] = {
>  	DRM_FORMAT_XRGB8888,
> -	DRM_FORMAT_XRGB16161616
> +	DRM_FORMAT_XRGB16161616,
> +	DRM_FORMAT_RGB565
>  };
>  
>  static const u32 vkms_plane_formats[] = {
>  	DRM_FORMAT_ARGB8888,
>  	DRM_FORMAT_XRGB8888,
>  	DRM_FORMAT_XRGB16161616,
> -	DRM_FORMAT_ARGB16161616
> +	DRM_FORMAT_ARGB16161616,
> +	DRM_FORMAT_RGB565
>  };
>  
>  static struct drm_plane_state *
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index cb63a5da9af1..98da7bee0f4b 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -16,7 +16,8 @@
>  static const u32 vkms_wb_formats[] = {
>  	DRM_FORMAT_XRGB8888,
>  	DRM_FORMAT_XRGB16161616,
> -	DRM_FORMAT_ARGB16161616
> +	DRM_FORMAT_ARGB16161616,
> +	DRM_FORMAT_RGB565
>  };
>  
>  static const struct drm_connector_funcs vkms_wb_connector_funcs = {

I wonder, would it be possible to add a unit test to make sure that
get_plane_fmt_transform_function() or get_wb_fmt_transform_function()
does not return NULL for any of the listed formats, respectively?
Or is that too paranoid?


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job`
  2022-04-20 11:23   ` Pekka Paalanen
@ 2022-04-23 15:12     ` Igor Torrente
  2022-04-25  7:56       ` Pekka Paalanen
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-23 15:12 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

Hi Pekka,

On 4/20/22 08:23, Pekka Paalanen wrote:
> On Mon,  4 Apr 2022 17:45:11 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
> 
>> This commit is the groundwork to introduce new formats to the planes and
>> writeback buffer. As part of it, a new buffer metadata field is added to
>> `vkms_writeback_job`, this metadata is represented by the `vkms_composer`
>> struct.
> 
> Hi,
> 
> should this be talking about vkms_frame_info struct instead?

Yes it should. I will fix this. Thanks.

> 
>>
>> Also adds two new function pointers (`{wb,plane}_format_transform_func`)
>> are defined to handle format conversion to/from internal format.
>>
>> These things will allow us, in the future, to have different compositing
>> and wb format types.
>>
>> V2: Change the code to get the drm_framebuffer reference and not copy its
>>      contents(Thomas Zimmermann).
>> V3: Drop the refcount in the wb code(Thomas Zimmermann).
>> V5: Add {wb,plane}_format_transform_func to vkms_writeback_job
>>      and vkms_plane_state (Pekka Paalanen)
>>
>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>> ---
>>   drivers/gpu/drm/vkms/vkms_composer.c  |  4 ++--
>>   drivers/gpu/drm/vkms/vkms_drv.h       | 31 +++++++++++++++++++++------
>>   drivers/gpu/drm/vkms/vkms_plane.c     | 10 ++++-----
>>   drivers/gpu/drm/vkms/vkms_writeback.c | 20 ++++++++++++++---
>>   4 files changed, 49 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>> index 2d946368a561..95029d2ebcac 100644
>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>> @@ -153,7 +153,7 @@ static void compose_plane(struct vkms_frame_info *primary_plane_info,
>>   			  struct vkms_frame_info *plane_frame_info,
>>   			  void *vaddr_out)
>>   {
>> -	struct drm_framebuffer *fb = &plane_frame_info->fb;
>> +	struct drm_framebuffer *fb = plane_frame_info->fb;
>>   	void *vaddr;
>>   	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>>   
>> @@ -175,7 +175,7 @@ static int compose_active_planes(void **vaddr_out,
>>   				 struct vkms_frame_info *primary_plane_info,
>>   				 struct vkms_crtc_state *crtc_state)
>>   {
>> -	struct drm_framebuffer *fb = &primary_plane_info->fb;
>> +	struct drm_framebuffer *fb = primary_plane_info->fb;
>>   	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>>   	const void *vaddr;
>>   	int i;
>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>> index 2e6342164bef..2704cfb6904b 100644
>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>> @@ -22,13 +22,8 @@
>>   
>>   #define NUM_OVERLAY_PLANES 8
>>   
>> -struct vkms_writeback_job {
>> -	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
>> -	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
>> -};
>> -
>>   struct vkms_frame_info {
>> -	struct drm_framebuffer fb;
>> +	struct drm_framebuffer *fb;
>>   	struct drm_rect src, dst;
>>   	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
>>   	unsigned int offset;
>> @@ -36,6 +31,29 @@ struct vkms_frame_info {
>>   	unsigned int cpp;
>>   };
>>   
>> +struct pixel_argb_u16 {
>> +	u16 a, r, g, b;
>> +};
>> +
>> +struct line_buffer {
>> +	size_t n_pixels;
>> +	struct pixel_argb_u16 *pixels;
>> +};
>> +
>> +typedef void
>> +(*wb_format_transform_func)(struct vkms_frame_info *frame_info,
>> +			    const struct line_buffer *buffer, int y);
>> +
>> +typedef void
>> +(*plane_format_transform_func)(struct line_buffer *buffer,
>> +			       const struct vkms_frame_info *frame_info, int y);
> 
> It wasn't immediately obvious to me in which direction these function
> types work from their names. The arguments are not wb and plane but
> vkms_frame_info and line_buffer in both. The implementations of these
> functions would have nothing specific to a wb or a plane either, would
> they?

No, there's nothing specific.

Do you think adding {dst_,src_} would be enough?

(*wb_format_transform_func)(struct vkms_frame_info *dst_frame_info,
			    const struct line_buffer *src_buffer

(*plane_format_transform_func)(struct line_buffer *dst_buffer,
			   const struct vkms_frame_info *src_frame_info,

> 
> What about naming them frame_to_line_func and line_to_frame_func?

Sounds good. I will rename it.

> 
>> +
>> +struct vkms_writeback_job {
>> +	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
>> +	struct vkms_frame_info frame_info;
> 
> Which frame_info is this? Should the field be called wb_frame_info?

Considering it's already in the writeback_job struct do you think this
necessary?

In other words, what kind of misudertanding do you think can happen if
this variable stay without the `wb_` prefix?

I spent a few minutes trying to find a case, but nothing came to my
mind.

> 
>> +	wb_format_transform_func format_func;
> 
> line_to_frame_func wb_write;
> 
> perhaps? The type explains the general type of the function, and the
> field name refers to what it is used for.
> 
>> +};
>> +
>>   /**
>>    * vkms_plane_state - Driver specific plane state
>>    * @base: base plane state
>> @@ -44,6 +62,7 @@ struct vkms_frame_info {
>>   struct vkms_plane_state {
>>   	struct drm_shadow_plane_state base;
>>   	struct vkms_frame_info *frame_info;
>> +	plane_format_transform_func format_func;
> 
> Similarly here, maybe
> 
> frame_to_line_func plane_read;
> 
> perhaps?

Yeah, sure.

> 
>>   };
>>   
>>   struct vkms_plane {
>> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
>> index a56b0f76eddd..28752af0118c 100644
>> --- a/drivers/gpu/drm/vkms/vkms_plane.c
>> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
>> @@ -50,12 +50,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
>>   	struct vkms_plane_state *vkms_state = to_vkms_plane_state(old_state);
>>   	struct drm_crtc *crtc = vkms_state->base.base.crtc;
>>   
>> -	if (crtc) {
>> +	if (crtc && vkms_state->frame_info->fb) {
>>   		/* dropping the reference we acquired in
>>   		 * vkms_primary_plane_update()
>>   		 */
>> -		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
>> -			drm_framebuffer_put(&vkms_state->frame_info->fb);
>> +		if (drm_framebuffer_read_refcount(vkms_state->frame_info->fb))
>> +			drm_framebuffer_put(vkms_state->frame_info->fb);
>>   	}
>>   
>>   	kfree(vkms_state->frame_info);
>> @@ -110,9 +110,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>   	frame_info = vkms_plane_state->frame_info;
>>   	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
>>   	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
>> -	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
>> +	frame_info->fb = fb;
> 
> This change, replacing the memcpy with storing a pointer, seems to be
> another major point of this patch. Should it be a separate patch?
> It doesn't seem to fit with the current commit message.
> 
> I have no idea what kind of locking or referencing a drm_framebuffer
> would need, and I suspect that would be easier to review if it was a
> patch of its own.

Makes sense. I will do that.

> 
>>   	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
>> -	drm_framebuffer_get(&frame_info->fb);
>> +	drm_framebuffer_get(frame_info->fb);
> 
> Does drm_framebuffer_get() not return anything?

No, it doesn't actually. This function increments the ref count of this
struct and doesn't return anything.

> 
> To me it would be more idiomatic to write something like
> 
> 	frame_info->fb = drm_framebuffer_get(fb);
> I spend few minutes trying to find a case but nothing comes to my mind.
> I don't know if that pattern is used in the kernel, but I use it in
> userspace to emphasise that frame_info owns a new reference rather than
> borrowing someone else's.
> 
> 
> Thanks,
> pq
> 
>>   	frame_info->offset = fb->offsets[0];
>>   	frame_info->pitch = fb->pitches[0];
>>   	frame_info->cpp = fb->format->cpp[0];
>> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
>> index 746cb0abc6ec..ad4bb1fb37ca 100644
>> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
>> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
>> @@ -74,12 +74,15 @@ static int vkms_wb_prepare_job(struct drm_writeback_connector *wb_connector,
>>   	if (!vkmsjob)
>>   		return -ENOMEM;
>>   
>> -	ret = drm_gem_fb_vmap(job->fb, vkmsjob->map, vkmsjob->data);
>> +	ret = drm_gem_fb_vmap(job->fb, vkmsjob->frame_info.map, vkmsjob->data);
>>   	if (ret) {
>>   		DRM_ERROR("vmap failed: %d\n", ret);
>>   		goto err_kfree;
>>   	}
>>   
>> +	vkmsjob->frame_info.fb = job->fb;
>> +	drm_framebuffer_get(vkmsjob->frame_info.fb);
>> +
>>   	job->priv = vkmsjob;
>>   
>>   	return 0;
>> @@ -98,7 +101,9 @@ static void vkms_wb_cleanup_job(struct drm_writeback_connector *connector,
>>   	if (!job->fb)
>>   		return;
>>   
>> -	drm_gem_fb_vunmap(job->fb, vkmsjob->map);
>> +	drm_gem_fb_vunmap(job->fb, vkmsjob->frame_info.map);
>> +
>> +	drm_framebuffer_put(vkmsjob->frame_info.fb);
>>   
>>   	vkmsdev = drm_device_to_vkms_device(job->fb->dev);
>>   	vkms_set_composer(&vkmsdev->output, false);
>> @@ -115,14 +120,23 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>>   	struct drm_writeback_connector *wb_conn = &output->wb_connector;
>>   	struct drm_connector_state *conn_state = wb_conn->base.state;
>>   	struct vkms_crtc_state *crtc_state = output->composer_state;
>> +	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
>> +	struct vkms_writeback_job *active_wb;
>> +	struct vkms_frame_info *wb_frame_info;
>>   
>>   	if (!conn_state)
>>   		return;
>>   
>>   	vkms_set_composer(&vkmsdev->output, true);
>>   
>> +	active_wb = conn_state->writeback_job->priv;
>> +	wb_frame_info = &active_wb->frame_info;
>> +
>>   	spin_lock_irq(&output->composer_lock);
>> -	crtc_state->active_writeback = conn_state->writeback_job->priv;
>> +	crtc_state->active_writeback = active_wb;
>> +	wb_frame_info->offset = fb->offsets[0];
>> +	wb_frame_info->pitch = fb->pitches[0];
>> +	wb_frame_info->cpp = fb->format->cpp[0];
>>   	crtc_state->wb_pending = true;
>>   	spin_unlock_irq(&output->composer_lock);
>>   	drm_writeback_queue_job(wb_conn, connector_state);
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-04-20 12:36   ` Pekka Paalanen
@ 2022-04-23 16:04     ` Igor Torrente
  2022-04-23 18:53       ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-23 16:04 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

Hi Pekka,

On 4/20/22 09:36, Pekka Paalanen wrote:
> On Mon,  4 Apr 2022 17:45:12 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
> 
>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>> as a color input.
>>
>> This patch refactors all the functions related to the plane composition
>> to overcome this limitation.
>>
>> A new internal format(`struct pixel`) is introduced to deal with all
> 
> Hi,
> 
> struct pixel_argb_u16 was added in the previous patch.

I will fix it. Thanks!

> 
>> possible inputs. It consists of 16 bits fields that represent each of
>> the channels.
>>
>> The pixels blend is done using this internal format. And new handlers
>> are being added to convert a specific format to/from this internal format.
>>
>> So the blend operation depends on these handlers to convert to this common
>> format. The blended result, if necessary, is converted to the writeback
>> buffer format.
>>
>> This patch introduces three major differences to the blend function.
>> 1 - All the planes are blended at once.
>> 2 - The blend calculus is done as per line instead of per pixel.
>> 3 - It is responsible to calculates the CRC and writing the writeback
>> buffer(if necessary).
>>
>> These changes allow us to allocate way less memory in the intermediate
>> buffer to compute these operations. Because now we don't need to
>> have the entire intermediate image lines at once, just one line is
>> enough.
>>
>> | Memory consumption (output dimensions) |
>> |:--------------------------------------:|
>> |       Current      |     This patch    |
>> |:------------------:|:-----------------:|
>> |   Width * Heigth   |     2 * Width     |
>>
>> Beyond memory, we also have a minor performance benefit from all
>> these changes. Results running the IGT[1] test
>> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
>>
>> |                 Frametime                  |
>> |:------------------------------------------:|
>> |  Implementation |  Current  |  This commit |
>> |:---------------:|:---------:|:------------:|
>> | frametime range |  9~22 ms  |    5~17 ms   |
>> |     Average     |  11.4 ms  |    7.8 ms    |
>>
>> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
>>
>> V2: Improves the performance drastically, by performing the operations
>>      per-line and not per-pixel(Pekka Paalanen).
>>      Minor improvements(Pekka Paalanen).
>> V3: Changes the code to blend the planes all at once. This improves
>>      performance, memory consumption, and removes much of the weirdness
>>      of the V2(Pekka Paalanen and me).
>>      Minor improvements(Pekka Paalanen and me).
>> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
>> V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen).
>>      Several security/robustness improvents(Pekka Paalanen).
>>      Removes check_planes_x_bounds function and allows partial
>>      partly off-screen(Pekka Paalanen).
>>
>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>> ---
>>   Documentation/gpu/vkms.rst            |   4 -
>>   drivers/gpu/drm/vkms/Makefile         |   1 +
>>   drivers/gpu/drm/vkms/vkms_composer.c  | 318 ++++++++++++--------------
>>   drivers/gpu/drm/vkms/vkms_formats.c   | 151 ++++++++++++
>>   drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
>>   drivers/gpu/drm/vkms/vkms_plane.c     |   3 +
>>   drivers/gpu/drm/vkms/vkms_writeback.c |   3 +
>>   7 files changed, 311 insertions(+), 181 deletions(-)
>>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
>>
>> diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
>> index 973e2d43108b..a49e4ae92653 100644
>> --- a/Documentation/gpu/vkms.rst
>> +++ b/Documentation/gpu/vkms.rst
>> @@ -118,10 +118,6 @@ Add Plane Features
>>   
>>   There's lots of plane features we could add support for:
>>   
>> -- Clearing primary plane: clear primary plane before plane composition (at the
>> -  start) for correctness of pixel blend ops. It also guarantees alpha channel
>> -  is cleared in the target buffer for stable crc. [Good to get started]
>> -
>>   - ARGB format on primary plane: blend the primary plane into background with
>>     translucent alpha.
>>   
>> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
>> index 72f779cbfedd..1b28a6a32948 100644
>> --- a/drivers/gpu/drm/vkms/Makefile
>> +++ b/drivers/gpu/drm/vkms/Makefile
>> @@ -3,6 +3,7 @@ vkms-y := \
>>   	vkms_drv.o \
>>   	vkms_plane.o \
>>   	vkms_output.o \
>> +	vkms_formats.o \
>>   	vkms_crtc.o \
>>   	vkms_composer.o \
>>   	vkms_writeback.o
>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>> index 95029d2ebcac..cf24015bf90f 100644
>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> 
> (For this file, I have removed all the minus diff lines from below to
> better see the new code.)
> 
> 
>> @@ -7,204 +7,186 @@
>>   #include <drm/drm_fourcc.h>
>>   #include <drm/drm_gem_framebuffer_helper.h>
>>   #include <drm/drm_vblank.h>
>> +#include <linux/minmax.h>
>>   
>>   #include "vkms_drv.h"
>>   
>> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>>   {
>> +	u32 new_color;
>>   
>> +	new_color = (src * 0xffff + dst * (0xffff - alpha));
>>   
>> +	return DIV_ROUND_CLOSEST(new_color, 0xffff);
> 
> This looks good.
> 
>>   }
>>   
>>   /**
>> + * pre_mul_alpha_blend - alpha blending equation
>> + * @src_frame_info: source framebuffer's metadata
>> + * @stage_buffer: The line with the pixels from src_plane
>> + * @output_buffer: A line buffer that receives all the blends output
>>    *
>> + * Using the information from the `frame_info`, this blends only the
>> + * necessary pixels from the `stage_buffer` to the `output_buffer`
>> + * using premultiplied blend formula.
>>    *
>> + * The current DRM assumption is that pixel color values have been already
>> + * pre-multiplied with the alpha channel values. See more
>> + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
>> + * completely opaque background.
>>    */
>> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
>> +				struct line_buffer *stage_buffer,
>> +				struct line_buffer *output_buffer)
>>   {
>> +	int x, x_dst = frame_info->dst.x1;
>> +	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
>> +	struct pixel_argb_u16 *in = stage_buffer->pixels;
>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			    stage_buffer->n_pixels);
>> +
>> +	for (x = 0; x < x_limit; x++) {
>> +		out[x].a = (u16)0xffff;
>> +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
>> +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
>> +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>>   	}
>>   }
>>   
>> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>>   {
>> +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
>> +		return true;
>>   
>> +	return false;
>>   }
>>   
>>   /**
>> + * @wb_frame_info: The writeback frame buffer metadata
>> + * @crtc_state: The crtc state
>> + * @crc32: The crc output of the final frame
>> + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
>> + * @stage_buffer: The line with the pixels from plane being blend to the output
>>    *
>> + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
>> + * from all planes, calculates the crc32 of the output from the former step,
>> + * and, if necessary, convert and store the output to the writeback buffer.
>>    */
>> +static void blend(struct vkms_writeback_job *wb,
>> +		  struct vkms_crtc_state *crtc_state,
>> +		  u32 *crc32, struct line_buffer *stage_buffer,
>> +		  struct line_buffer *output_buffer, s64 row_size)
>>   {
>> +	struct vkms_plane_state **plane = crtc_state->active_planes;
>> +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
>> +	u32 n_active_planes = crtc_state->num_active_planes;
>> +
>> +	int y_dst = primary_plane_info->dst.y1;
>> +	int h_dst = drm_rect_height(&primary_plane_info->dst);
>> +	int y_limit = y_dst + h_dst;
>> +	int y, i;
>> +
>> +	for (y = y_dst; y < y_limit; y++) {
>> +		plane[0]->format_func(output_buffer, primary_plane_info, y);
> 
> This is a bad assumption, but the next patch removes the need for this
> assumption. The primary plane may not be the bottom-most AFAIU.
> Overlays below the primary exist on real hardware.
> 
>> +
>> +		/* If there are other planes besides primary, we consider the active
>> +		 * planes should be in z-order and compose them associatively:
>> +		 * ((primary <- overlay) <- cursor)
>> +		 */
>> +		for (i = 1; i < n_active_planes; i++) {
>> +			if (!check_y_limit(plane[i]->frame_info, y))
>> +				continue;
>> +
>> +			plane[i]->format_func(stage_buffer, plane[i]->frame_info, y);
>> +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
>> +					    output_buffer);
>> +		}
>>   
>> +		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
>>   
>> +		if (wb)
>> +			wb->format_func(&wb->frame_info, output_buffer, y);
>>   	}
>>   }
>>   
>> +static int check_format_funcs(struct vkms_crtc_state *crtc_state,
>> +			      struct vkms_writeback_job *active_wb)
>>   {
>> +	struct vkms_plane_state **planes = crtc_state->active_planes;
>> +	u32 n_active_planes = crtc_state->num_active_planes;
>> +	int i;
>>   
>> +	for (i = 0; i < n_active_planes; i++)
>> +		if (!planes[i]->format_func)
>> +			return -1;
>>   
>> +	if (active_wb && !active_wb->format_func)
>> +		return -1;
>>   
>> +	return 0;
>>   }
>>   
>> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
>> +				 struct vkms_crtc_state *crtc_state,
>> +				 u32 *crc32)
>>   {
>> +	int line_width, ret = 0, pixel_size = sizeof(struct pixel_argb_u16);
>> +	struct vkms_frame_info *primary_plane_info = NULL;
>> +	struct line_buffer output_buffer, stage_buffer;
>> +	struct vkms_plane_state *act_plane = NULL;
>> +	u32 wb_format;
>>   
>> +	if (WARN_ON(pixel_size != 8))
> 
> Isn't there a compile-time assert macro for this? Having to actually
> run VKMS to check for this reduces the chances of finding it early.
> What's the reason for this check anyway?
> 
>> +		return -EINVAL;
>> +
>> +	if (crtc_state->num_active_planes >= 1) {
>> +		act_plane = crtc_state->active_planes[0];
>> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>> +			primary_plane_info = act_plane->frame_info;
> 
> After the next patch, do you even need the primary plane for anything
> specifically? There is the map_is_null check below, but that should be
> done on all planes in the array, right?
> 
> I suspect the next patch, or another patch in this series, should just
> delete this chunk.



> 
>>   	}
>>   
>> +	if (!primary_plane_info)
>> +		return -EINVAL;
>> +
>>   	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>>   		return -EINVAL;
>>   
>> +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
>> +		return -EINVAL;
>>   
>> +	line_width = drm_rect_width(&primary_plane_info->dst);
>> +	stage_buffer.n_pixels = line_width;
>> +	output_buffer.n_pixels = line_width;
>>   
>> +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>> +	if (!stage_buffer.pixels) {
>> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
>> +		return -ENOMEM;
>> +	}
>>   
>> +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>> +	if (!output_buffer.pixels) {
>> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
>> +		ret = -ENOMEM;
>> +		goto free_stage_buffer;
>> +	}
>> +
>> +	if (active_wb) {
>> +		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>> +
>> +		wb_format = wb_frame_info->fb->format->format;
> 
> I don't see wb_format being used, is it?

This is probably a leftover from the last versions. Thanks for catching
it.

> 
>> +		wb_frame_info->src = primary_plane_info->src;
>> +		wb_frame_info->dst = primary_plane_info->dst;
>> +	}
>> +
>> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
>> +	      &output_buffer, (s64)line_width * pixel_size);
> 
> What's the (s64) doing here?
> 
> Are byte sizes not usually expressed with size_t or ssize_t types, or
> is the kernel convention to use u64 and s64?
> 
> This makes me suspect that pixel_offset() and friends in vkms_format.c
> are going to need fixing as well. int type overflows at 2G.


Yeah, I should be using size_t in all these places.

> 
>> +
>> +	kvfree(output_buffer.pixels);
>> +free_stage_buffer:
>> +	kvfree(stage_buffer.pixels);
>> +can
>> +	return ret;
>>   }
>>   
>>   /**
>> @@ -222,13 +204,11 @@ void vkms_composer_worker(struct work_struct *work)
>>   						struct vkms_crtc_state,
>>   						composer_work);
>>   	struct drm_crtc *crtc = crtc_state->base.crtc;
>> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>>   	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>>   	bool crc_pending, wb_pending;
>>   	u64 frame_start, frame_end;
>> +	u32 crc32 = 0;
>>   	int ret;
>>   
>>   	spin_lock_irq(&out->composer_lock);
>> @@ -248,35 +228,19 @@ void vkms_composer_worker(struct work_struct *work)
>>   	if (!crc_pending)
>>   		return;
>>   
>>   	if (wb_pending)
>> +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
>> +	else
>> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>>   
>> +	if (ret)
>>   		return;
>>   
>>   	if (wb_pending) {
>>   		drm_writeback_signal_completion(&out->wb_connector, 0);
>>   		spin_lock_irq(&out->composer_lock);
>>   		crtc_state->wb_pending = false;
>>   		spin_unlock_irq(&out->composer_lock);
>>   	}
>>   
>>   	/*
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>> new file mode 100644
>> index 000000000000..931a61405d6a
>> --- /dev/null
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>> @@ -0,0 +1,151 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +
>> +#include <drm/drm_rect.h>
>> +#include <linux/minmax.h>
>> +
>> +#include "vkms_formats.h"
>> +
>> +static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
>> +{
>> +	return frame_info->offset + (y * frame_info->pitch)
>> +				  + (x * frame_info->cpp);
>> +}
>> +
>> +/*
>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>> + *
>> + * @frame_info: Buffer metadata
>> + * @x: The x(width) coordinate of the 2D buffer
>> + * @y: The y(Heigth) coordinate of the 2D buffercan
>> + *
>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>> + * returns the address of the first color channel.
>> + * This function assumes the channels are packed together, i.e. a color channel
>> + * comes immediately after another in the memory. And therefore, this function
>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>> + */
>> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
>> +				int x, int y)
>> +{
>> +	int offset = pixel_offset(frame_info, x, y);
>> +
>> +	return (u8 *)frame_info->map[0].vaddr + offset;
>> +}
>> +
>> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
>> +{
>> +	int x_src = frame_info->src.x1 >> 16;
>> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>> +
>> +	return packed_pixels_addr(frame_info, x_src, y_src);
>> +}
>> +
>> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>> +				 const struct vkms_frame_info *frame_info, int y)
>> +{
>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			       stage_buffer->n_pixels);
>> +
>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>> +		/*
>> +		 * The 257 is the "conversion ratio". This number is obtained by the
>> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
>> +		 * the best color value in a pixel format with more possibilities.
>> +		 * A similar idea applies to others RGB color conversions.
>> +		 */
>> +		out_pixels[x].a = (u16)src_pixels[3] * 257;
>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>> +	}
>> +}
>> +
>> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>> +				 const struct vkms_frame_info *frame_info, int y)
>> +{
>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			       stage_buffer->n_pixels);
>> +
>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>> +		out_pixels[x].a = (u16)0xffff;
>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>> +	}
>> +}
>> +
>> +/*
>> + * The following  functions take an line of argb_u16 pixels from the
>> + * src_buffer, convert them to a specific format, and store them in the
>> + * destination.
>> + *
>> + * They are used in the `compose_active_planes` to convert and store a line
>> + * from the src_buffer to the writeback buffer.
>> + */
>> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
>> +				 const struct line_buffer *src_buffer, int y)
>> +{
>> +	int x, x_dst = frame_info->dst.x1;
>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			    src_buffer->n_pixels);
>> +
>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>> +		/*
>> +		 * This sequence below is important because the format's byte order is
>> +		 * in little-endian. In the case of the ARGB8888 the memory is
>> +		 * organized this way:
>> +		 *
>> +		 * | Addr     | = blue channel
>> +		 * | Addr + 1 | = green channel
>> +		 * | Addr + 2 | = Red channel
>> +		 * | Addr + 3 | = Alpha channel
>> +		 */
>> +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>> +	}
>> +}
>> +
>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
>> +				 const struct line_buffer *src_buffer, int y)
>> +{
>> +	int x, x_dst = frame_info->dst.x1;
>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			    src_buffer->n_pixels);
>> +
>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>> +		dst_pixels[3] = (u8)0xff;
> 
> When writing to XRGB, it's not necessary to ensure the X channel has
> any sensible value. Anyone reading from XRGB must ignore that value
> anyway. So why not write something wacky here, like 0xa1, that is far
> enough from both 0x00 or 0xff to not be confused with them even
> visually? Also not 0x7f or 0x80 which are close to half of 0xff.
> 
> Or, you could save a whole function and just use argb_u16_to_ARGBxxxx()
> instead, even for XRGB destination.


Right. Maybe I could just leave the channel untouched.

> 
>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>> +	}
>> +}
>> +
>> +plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>> +{
>> +	if (format == DRM_FORMAT_ARGB8888)
>> +		return &ARGB8888_to_argb_u16;
>> +	else if (format == DRM_FORMAT_XRGB8888)
>> +		return &XRGB8888_to_argb_u16;
>> +	else
>> +		return NULL;
> 
> This works for now, but when more formats are added, I'd think a switch
> statement would look better.

ok.

> 
>> +}
>> +
>> +wb_format_transform_func get_wb_fmt_transform_function(u32 format)
>> +{
>> +	if (format == DRM_FORMAT_ARGB8888)
>> +		return &argb_u16_to_ARGB8888;
>> +	else if (format == DRM_FORMAT_XRGB8888)
>> +		return &argb_u16_to_XRGB8888;
>> +	else
>> +		return NULL;
>> +}
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
>> new file mode 100644
>> index 000000000000..adc5a17b9584
>> --- /dev/null
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
>> @@ -0,0 +1,12 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +
>> +#ifndef _VKMS_FORMATS_H_
>> +#define _VKMS_FORMATS_H_
>> +
>> +#include "vkms_drv.h"
>> +
>> +plane_format_transform_func get_plane_fmt_transform_function(u32 format);
>> +
>> +wb_format_transform_func get_wb_fmt_transform_function(u32 format);
> 
> This is good, exposing only what is necessary.
> 
> 
> Thanks,
> pq
> 
>> +
>> +#endif /* _VKMS_FORMATS_H_ */
>> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
>> index 28752af0118c..798243837fd0 100644
>> --- a/drivers/gpu/drm/vkms/vkms_plane.c
>> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
>> @@ -10,6 +10,7 @@
>>   #include <drm/drm_plane_helper.h>
>>   
>>   #include "vkms_drv.h"
>> +#include "vkms_formats.h"
>>   
>>   static const u32 vkms_formats[] = {
>>   	DRM_FORMAT_XRGB8888,
>> @@ -100,6 +101,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>   	struct drm_shadow_plane_state *shadow_plane_state;
>>   	struct drm_framebuffer *fb = new_state->fb;
>>   	struct vkms_frame_info *frame_info;
>> +	u32 fmt = fb->format->format;
>>   
>>   	if (!new_state->crtc || !fb)
>>   		return;
>> @@ -116,6 +118,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>   	frame_info->offset = fb->offsets[0];
>>   	frame_info->pitch = fb->pitches[0];
>>   	frame_info->cpp = fb->format->cpp[0];
>> +	vkms_plane_state->format_func = get_plane_fmt_transform_function(fmt);
>>   }
>>   
>>   static int vkms_plane_atomic_check(struct drm_plane *plane,
>> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
>> index ad4bb1fb37ca..97f71e784bbf 100644
>> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
>> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
>> @@ -11,6 +11,7 @@
>>   #include <drm/drm_gem_shmem_helper.h>
>>   
>>   #include "vkms_drv.h"
>> +#include "vkms_formats.h"
>>   
>>   static const u32 vkms_wb_formats[] = {
>>   	DRM_FORMAT_XRGB8888,
>> @@ -123,6 +124,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>>   	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
>>   	struct vkms_writeback_job *active_wb;
>>   	struct vkms_frame_info *wb_frame_info;
>> +	u32 wb_format = fb->format->format;
>>   
>>   	if (!conn_state)
>>   		return;
>> @@ -140,6 +142,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>>   	crtc_state->wb_pending = true;
>>   	spin_unlock_irq(&output->composer_lock);
>>   	drm_writeback_queue_job(wb_conn, connector_state);
>> +	active_wb->format_func = get_wb_fmt_transform_function(wb_format);
>>   }
>>   
>>   static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs = {
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-04-23 16:04     ` Igor Torrente
@ 2022-04-23 18:53       ` Igor Torrente
  2022-04-25  8:10         ` Pekka Paalanen
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-23 18:53 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

I forgot to respond some points from your review.

On 4/23/22 13:04, Igor Torrente wrote:
> Hi Pekka,
> 
> On 4/20/22 09:36, Pekka Paalanen wrote:
>> On Mon,  4 Apr 2022 17:45:12 -0300
>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>
>>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>>> as a color input.
>>>
>>> This patch refactors all the functions related to the plane composition
>>> to overcome this limitation.
>>>
>>> A new internal format(`struct pixel`) is introduced to deal with all
>>
>> Hi,
>>
>> struct pixel_argb_u16 was added in the previous patch.
> 
> I will fix it. Thanks!
> 
>>
>>> possible inputs. It consists of 16 bits fields that represent each of
>>> the channels.
>>>
>>> The pixels blend is done using this internal format. And new handlers
>>> are being added to convert a specific format to/from this internal format.
>>>
>>> So the blend operation depends on these handlers to convert to this common
>>> format. The blended result, if necessary, is converted to the writeback
>>> buffer format.
>>>
>>> This patch introduces three major differences to the blend function.
>>> 1 - All the planes are blended at once.
>>> 2 - The blend calculus is done as per line instead of per pixel.
>>> 3 - It is responsible to calculates the CRC and writing the writeback
>>> buffer(if necessary).
>>>
>>> These changes allow us to allocate way less memory in the intermediate
>>> buffer to compute these operations. Because now we don't need to
>>> have the entire intermediate image lines at once, just one line is
>>> enough.
>>>
>>> | Memory consumption (output dimensions) |
>>> |:--------------------------------------:|
>>> |       Current      |     This patch    |
>>> |:------------------:|:-----------------:|
>>> |   Width * Heigth   |     2 * Width     |
>>>
>>> Beyond memory, we also have a minor performance benefit from all
>>> these changes. Results running the IGT[1] test
>>> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
>>>
>>> |                 Frametime                  |
>>> |:------------------------------------------:|
>>> |  Implementation |  Current  |  This commit |
>>> |:---------------:|:---------:|:------------:|
>>> | frametime range |  9~22 ms  |    5~17 ms   |
>>> |     Average     |  11.4 ms  |    7.8 ms    |
>>>
>>> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
>>>
>>> V2: Improves the performance drastically, by performing the operations
>>>       per-line and not per-pixel(Pekka Paalanen).
>>>       Minor improvements(Pekka Paalanen).
>>> V3: Changes the code to blend the planes all at once. This improves
>>>       performance, memory consumption, and removes much of the weirdness
>>>       of the V2(Pekka Paalanen and me).
>>>       Minor improvements(Pekka Paalanen and me).
>>> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
>>> V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen).
>>>       Several security/robustness improvents(Pekka Paalanen).
>>>       Removes check_planes_x_bounds function and allows partial
>>>       partly off-screen(Pekka Paalanen).
>>>
>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>>> ---
>>>    Documentation/gpu/vkms.rst            |   4 -
>>>    drivers/gpu/drm/vkms/Makefile         |   1 +
>>>    drivers/gpu/drm/vkms/vkms_composer.c  | 318 ++++++++++++--------------
>>>    drivers/gpu/drm/vkms/vkms_formats.c   | 151 ++++++++++++
>>>    drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
>>>    drivers/gpu/drm/vkms/vkms_plane.c     |   3 +
>>>    drivers/gpu/drm/vkms/vkms_writeback.c |   3 +
>>>    7 files changed, 311 insertions(+), 181 deletions(-)
>>>    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>>>    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
>>>
>>> diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
>>> index 973e2d43108b..a49e4ae92653 100644
>>> --- a/Documentation/gpu/vkms.rst
>>> +++ b/Documentation/gpu/vkms.rst
>>> @@ -118,10 +118,6 @@ Add Plane Features
>>>    
>>>    There's lots of plane features we could add support for:
>>>    
>>> -- Clearing primary plane: clear primary plane before plane composition (at the
>>> -  start) for correctness of pixel blend ops. It also guarantees alpha channel
>>> -  is cleared in the target buffer for stable crc. [Good to get started]
>>> -
>>>    - ARGB format on primary plane: blend the primary plane into background with
>>>      translucent alpha.
>>>    
>>> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
>>> index 72f779cbfedd..1b28a6a32948 100644
>>> --- a/drivers/gpu/drm/vkms/Makefile
>>> +++ b/drivers/gpu/drm/vkms/Makefile
>>> @@ -3,6 +3,7 @@ vkms-y := \
>>>    	vkms_drv.o \
>>>    	vkms_plane.o \
>>>    	vkms_output.o \
>>> +	vkms_formats.o \
>>>    	vkms_crtc.o \
>>>    	vkms_composer.o \
>>>    	vkms_writeback.o
>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>> index 95029d2ebcac..cf24015bf90f 100644
>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>
>> (For this file, I have removed all the minus diff lines from below to
>> better see the new code.)
>>
>>
>>> @@ -7,204 +7,186 @@
>>>    #include <drm/drm_fourcc.h>
>>>    #include <drm/drm_gem_framebuffer_helper.h>
>>>    #include <drm/drm_vblank.h>
>>> +#include <linux/minmax.h>
>>>    
>>>    #include "vkms_drv.h"
>>>    
>>> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>>>    {
>>> +	u32 new_color;
>>>    
>>> +	new_color = (src * 0xffff + dst * (0xffff - alpha));
>>>    
>>> +	return DIV_ROUND_CLOSEST(new_color, 0xffff);
>>
>> This looks good.
>>
>>>    }
>>>    
>>>    /**
>>> + * pre_mul_alpha_blend - alpha blending equation
>>> + * @src_frame_info: source framebuffer's metadata
>>> + * @stage_buffer: The line with the pixels from src_plane
>>> + * @output_buffer: A line buffer that receives all the blends output
>>>     *
>>> + * Using the information from the `frame_info`, this blends only the
>>> + * necessary pixels from the `stage_buffer` to the `output_buffer`
>>> + * using premultiplied blend formula.
>>>     *
>>> + * The current DRM assumption is that pixel color values have been already
>>> + * pre-multiplied with the alpha channel values. See more
>>> + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
>>> + * completely opaque background.
>>>     */
>>> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
>>> +				struct line_buffer *stage_buffer,
>>> +				struct line_buffer *output_buffer)
>>>    {
>>> +	int x, x_dst = frame_info->dst.x1;
>>> +	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
>>> +	struct pixel_argb_u16 *in = stage_buffer->pixels;
>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>> +			    stage_buffer->n_pixels);
>>> +
>>> +	for (x = 0; x < x_limit; x++) {
>>> +		out[x].a = (u16)0xffff;
>>> +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
>>> +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
>>> +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>>>    	}
>>>    }
>>>    
>>> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>>>    {
>>> +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
>>> +		return true;
>>>    
>>> +	return false;
>>>    }
>>>    
>>>    /**
>>> + * @wb_frame_info: The writeback frame buffer metadata
>>> + * @crtc_state: The crtc state
>>> + * @crc32: The crc output of the final frame
>>> + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
>>> + * @stage_buffer: The line with the pixels from plane being blend to the output
>>>     *
>>> + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
>>> + * from all planes, calculates the crc32 of the output from the former step,
>>> + * and, if necessary, convert and store the output to the writeback buffer.
>>>     */
>>> +static void blend(struct vkms_writeback_job *wb,
>>> +		  struct vkms_crtc_state *crtc_state,
>>> +		  u32 *crc32, struct line_buffer *stage_buffer,
>>> +		  struct line_buffer *output_buffer, s64 row_size)
>>>    {
>>> +	struct vkms_plane_state **plane = crtc_state->active_planes;
>>> +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
>>> +	u32 n_active_planes = crtc_state->num_active_planes;
>>> +
>>> +	int y_dst = primary_plane_info->dst.y1;
>>> +	int h_dst = drm_rect_height(&primary_plane_info->dst);
>>> +	int y_limit = y_dst + h_dst;
>>> +	int y, i;
>>> +
>>> +	for (y = y_dst; y < y_limit; y++) {
>>> +		plane[0]->format_func(output_buffer, primary_plane_info, y);
>>
>> This is a bad assumption, but the next patch removes the need for this
>> assumption. The primary plane may not be the bottom-most AFAIU.
>> Overlays below the primary exist on real hardware.
>>
>>> +
>>> +		/* If there are other planes besides primary, we consider the active
>>> +		 * planes should be in z-order and compose them associatively:
>>> +		 * ((primary <- overlay) <- cursor)
>>> +		 */
>>> +		for (i = 1; i < n_active_planes; i++) {
>>> +			if (!check_y_limit(plane[i]->frame_info, y))
>>> +				continue;
>>> +
>>> +			plane[i]->format_func(stage_buffer, plane[i]->frame_info, y);
>>> +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
>>> +					    output_buffer);
>>> +		}
>>>    
>>> +		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
>>>    
>>> +		if (wb)
>>> +			wb->format_func(&wb->frame_info, output_buffer, y);
>>>    	}
>>>    }
>>>    
>>> +static int check_format_funcs(struct vkms_crtc_state *crtc_state,
>>> +			      struct vkms_writeback_job *active_wb)
>>>    {
>>> +	struct vkms_plane_state **planes = crtc_state->active_planes;
>>> +	u32 n_active_planes = crtc_state->num_active_planes;
>>> +	int i;
>>>    
>>> +	for (i = 0; i < n_active_planes; i++)
>>> +		if (!planes[i]->format_func)
>>> +			return -1;
>>>    
>>> +	if (active_wb && !active_wb->format_func)
>>> +		return -1;
>>>    
>>> +	return 0;
>>>    }
>>>    
>>> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
>>> +				 struct vkms_crtc_state *crtc_state,
>>> +				 u32 *crc32)
>>>    {
>>> +	int line_width, ret = 0, pixel_size = sizeof(struct pixel_argb_u16);
>>> +	struct vkms_frame_info *primary_plane_info = NULL;
>>> +	struct line_buffer output_buffer, stage_buffer;
>>> +	struct vkms_plane_state *act_plane = NULL;
>>> +	u32 wb_format;
>>>    
>>> +	if (WARN_ON(pixel_size != 8))
>>
>> Isn't there a compile-time assert macro for this? Having to actually
>> run VKMS to check for this reduces the chances of finding it early.
>> What's the reason for this check anyway?

Yes, it exists.

include/linux/build_bug.h:1:#define static_assert(expr, ...) 
__static_assert(expr, ##__VA_ARGS__, #expr)

I didn't add it because I can imagine some people very mad if the kernel 
did not compile because of vkms.

This check exists so we can call `crc32_le` for the entire line instead
doing it for each channel of each pixel in case `struct `pixel_argb_u16`
had any gap added by the compiler between the struct fields.

>>
>>> +		return -EINVAL;
>>> +
>>> +	if (crtc_state->num_active_planes >= 1) {
>>> +		act_plane = crtc_state->active_planes[0];
>>> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>>> +			primary_plane_info = act_plane->frame_info;
>>
>> After the next patch, do you even need the primary plane for anything
>> specifically?

Yeah, I will not need it anymore.

>> There is the map_is_null check below, but that should be
>> done on all planes in the array, right?

Yes, I guess so. And I don't know why it only checks for the 
primary_plane TBH.

>>
>> I suspect the next patch, or another patch in this series, should just
>> delete this chunk.
I should, and I will in the V6 of next patch.

> 
> 
> 
>>
>>>    	}
>>>    
>>> +	if (!primary_plane_info)
>>> +		return -EINVAL;
>>> +
>>>    	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>>>    		return -EINVAL;
>>>    
>>> +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
>>> +		return -EINVAL;
>>>    
>>> +	line_width = drm_rect_width(&primary_plane_info->dst);
>>> +	stage_buffer.n_pixels = line_width;
>>> +	output_buffer.n_pixels = line_width;
>>>    
>>> +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>> +	if (!stage_buffer.pixels) {
>>> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
>>> +		return -ENOMEM;
>>> +	}
>>>    
>>> +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>> +	if (!output_buffer.pixels) {
>>> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
>>> +		ret = -ENOMEM;
>>> +		goto free_stage_buffer;
>>> +	}
>>> +
>>> +	if (active_wb) {
>>> +		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>>> +
>>> +		wb_format = wb_frame_info->fb->format->format;
>>
>> I don't see wb_format being used, is it?
> 
> This is probably a leftover from the last versions. Thanks for catching
> it.
> 
>>
>>> +		wb_frame_info->src = primary_plane_info->src;
>>> +		wb_frame_info->dst = primary_plane_info->dst;
>>> +	}
>>> +
>>> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
>>> +	      &output_buffer, (s64)line_width * pixel_size);
>>
>> What's the (s64) doing here?
>>
>> Are byte sizes not usually expressed with size_t or ssize_t types, or
>> is the kernel convention to use u64 and s64?
>>
>> This makes me suspect that pixel_offset() and friends in vkms_format.c
>> are going to need fixing as well. int type overflows at 2G.
> 
> 
> Yeah, I should be using size_t in all these places.
> 
>>
>>> +
>>> +	kvfree(output_buffer.pixels);
>>> +free_stage_buffer:
>>> +	kvfree(stage_buffer.pixels);
>>> +can
>>> +	return ret;
>>>    }
>>>    
>>>    /**
>>> @@ -222,13 +204,11 @@ void vkms_composer_worker(struct work_struct *work)
>>>    						struct vkms_crtc_state,
>>>    						composer_work);
>>>    	struct drm_crtc *crtc = crtc_state->base.crtc;
>>> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>>>    	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>>>    	bool crc_pending, wb_pending;
>>>    	u64 frame_start, frame_end;
>>> +	u32 crc32 = 0;
>>>    	int ret;
>>>    
>>>    	spin_lock_irq(&out->composer_lock);
>>> @@ -248,35 +228,19 @@ void vkms_composer_worker(struct work_struct *work)
>>>    	if (!crc_pending)
>>>    		return;
>>>    
>>>    	if (wb_pending)
>>> +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
>>> +	else
>>> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>>>    
>>> +	if (ret)
>>>    		return;
>>>    
>>>    	if (wb_pending) {
>>>    		drm_writeback_signal_completion(&out->wb_connector, 0);
>>>    		spin_lock_irq(&out->composer_lock);
>>>    		crtc_state->wb_pending = false;
>>>    		spin_unlock_irq(&out->composer_lock);
>>>    	}
>>>    
>>>    	/*
>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>> new file mode 100644
>>> index 000000000000..931a61405d6a
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>> @@ -0,0 +1,151 @@
>>> +// SPDX-License-Identifier: GPL-2.0+
>>> +
>>> +#include <drm/drm_rect.h>
>>> +#include <linux/minmax.h>
>>> +
>>> +#include "vkms_formats.h"
>>> +
>>> +static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
>>> +{
>>> +	return frame_info->offset + (y * frame_info->pitch)
>>> +				  + (x * frame_info->cpp);
>>> +}
>>> +
>>> +/*
>>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>>> + *
>>> + * @frame_info: Buffer metadata
>>> + * @x: The x(width) coordinate of the 2D buffer
>>> + * @y: The y(Heigth) coordinate of the 2D buffercan
>>> + *
>>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>>> + * returns the address of the first color channel.
>>> + * This function assumes the channels are packed together, i.e. a color channel
>>> + * comes immediately after another in the memory. And therefore, this function
>>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>>> + */
>>> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
>>> +				int x, int y)
>>> +{
>>> +	int offset = pixel_offset(frame_info, x, y);
>>> +
>>> +	return (u8 *)frame_info->map[0].vaddr + offset;
>>> +}
>>> +
>>> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
>>> +{
>>> +	int x_src = frame_info->src.x1 >> 16;
>>> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>>> +
>>> +	return packed_pixels_addr(frame_info, x_src, y_src);
>>> +}
>>> +
>>> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>>> +				 const struct vkms_frame_info *frame_info, int y)
>>> +{
>>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>> +			       stage_buffer->n_pixels);
>>> +
>>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>>> +		/*
>>> +		 * The 257 is the "conversion ratio". This number is obtained by the
>>> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
>>> +		 * the best color value in a pixel format with more possibilities.
>>> +		 * A similar idea applies to others RGB color conversions.
>>> +		 */
>>> +		out_pixels[x].a = (u16)src_pixels[3] * 257;
>>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>>> +	}
>>> +}
>>> +
>>> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>>> +				 const struct vkms_frame_info *frame_info, int y)
>>> +{
>>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>> +			       stage_buffer->n_pixels);
>>> +
>>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>>> +		out_pixels[x].a = (u16)0xffff;
>>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>>> +	}
>>> +}
>>> +
>>> +/*
>>> + * The following  functions take an line of argb_u16 pixels from the
>>> + * src_buffer, convert them to a specific format, and store them in the
>>> + * destination.
>>> + *
>>> + * They are used in the `compose_active_planes` to convert and store a line
>>> + * from the src_buffer to the writeback buffer.
>>> + */
>>> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
>>> +				 const struct line_buffer *src_buffer, int y)
>>> +{
>>> +	int x, x_dst = frame_info->dst.x1;
>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>> +			    src_buffer->n_pixels);
>>> +
>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>>> +		/*
>>> +		 * This sequence below is important because the format's byte order is
>>> +		 * in little-endian. In the case of the ARGB8888 the memory is
>>> +		 * organized this way:
>>> +		 *
>>> +		 * | Addr     | = blue channel
>>> +		 * | Addr + 1 | = green channel
>>> +		 * | Addr + 2 | = Red channel
>>> +		 * | Addr + 3 | = Alpha channel
>>> +		 */
>>> +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
>>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>>> +	}
>>> +}
>>> +
>>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
>>> +				 const struct line_buffer *src_buffer, int y)
>>> +{
>>> +	int x, x_dst = frame_info->dst.x1;
>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>> +			    src_buffer->n_pixels);
>>> +
>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>>> +		dst_pixels[3] = (u8)0xff;
>>
>> When writing to XRGB, it's not necessary to ensure the X channel has
>> any sensible value. Anyone reading from XRGB must ignore that value
>> anyway. So why not write something wacky here, like 0xa1, that is far
>> enough from both 0x00 or 0xff to not be confused with them even
>> visually? Also not 0x7f or 0x80 which are close to half of 0xff.
>>
>> Or, you could save a whole function and just use argb_u16_to_ARGBxxxx()
>> instead, even for XRGB destination.
> 
> 
> Right. Maybe I could just leave the channel untouched.
> 
>>
>>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>>> +	}
>>> +}
>>> +
>>> +plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>>> +{
>>> +	if (format == DRM_FORMAT_ARGB8888)
>>> +		return &ARGB8888_to_argb_u16;
>>> +	else if (format == DRM_FORMAT_XRGB8888)
>>> +		return &XRGB8888_to_argb_u16;
>>> +	else
>>> +		return NULL;
>>
>> This works for now, but when more formats are added, I'd think a switch
>> statement would look better.
> 
> ok.
> 
>>
>>> +}
>>> +
>>> +wb_format_transform_func get_wb_fmt_transform_function(u32 format)
>>> +{
>>> +	if (format == DRM_FORMAT_ARGB8888)
>>> +		return &argb_u16_to_ARGB8888;
>>> +	else if (format == DRM_FORMAT_XRGB8888)
>>> +		return &argb_u16_to_XRGB8888;
>>> +	else
>>> +		return NULL;
>>> +}
>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
>>> new file mode 100644
>>> index 000000000000..adc5a17b9584
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
>>> @@ -0,0 +1,12 @@
>>> +// SPDX-License-Identifier: GPL-2.0+
>>> +
>>> +#ifndef _VKMS_FORMATS_H_
>>> +#define _VKMS_FORMATS_H_
>>> +
>>> +#include "vkms_drv.h"
>>> +
>>> +plane_format_transform_func get_plane_fmt_transform_function(u32 format);
>>> +
>>> +wb_format_transform_func get_wb_fmt_transform_function(u32 format);
>>
>> This is good, exposing only what is necessary.
>>
>>
>> Thanks,
>> pq
>>
>>> +
>>> +#endif /* _VKMS_FORMATS_H_ */
>>> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
>>> index 28752af0118c..798243837fd0 100644
>>> --- a/drivers/gpu/drm/vkms/vkms_plane.c
>>> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
>>> @@ -10,6 +10,7 @@
>>>    #include <drm/drm_plane_helper.h>
>>>    
>>>    #include "vkms_drv.h"
>>> +#include "vkms_formats.h"
>>>    
>>>    static const u32 vkms_formats[] = {
>>>    	DRM_FORMAT_XRGB8888,
>>> @@ -100,6 +101,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>>    	struct drm_shadow_plane_state *shadow_plane_state;
>>>    	struct drm_framebuffer *fb = new_state->fb;
>>>    	struct vkms_frame_info *frame_info;
>>> +	u32 fmt = fb->format->format;
>>>    
>>>    	if (!new_state->crtc || !fb)
>>>    		return;
>>> @@ -116,6 +118,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>>    	frame_info->offset = fb->offsets[0];
>>>    	frame_info->pitch = fb->pitches[0];
>>>    	frame_info->cpp = fb->format->cpp[0];
>>> +	vkms_plane_state->format_func = get_plane_fmt_transform_function(fmt);
>>>    }
>>>    
>>>    static int vkms_plane_atomic_check(struct drm_plane *plane,
>>> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
>>> index ad4bb1fb37ca..97f71e784bbf 100644
>>> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
>>> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
>>> @@ -11,6 +11,7 @@
>>>    #include <drm/drm_gem_shmem_helper.h>
>>>    
>>>    #include "vkms_drv.h"
>>> +#include "vkms_formats.h"
>>>    
>>>    static const u32 vkms_wb_formats[] = {
>>>    	DRM_FORMAT_XRGB8888,
>>> @@ -123,6 +124,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>>>    	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
>>>    	struct vkms_writeback_job *active_wb;
>>>    	struct vkms_frame_info *wb_frame_info;
>>> +	u32 wb_format = fb->format->format;
>>>    
>>>    	if (!conn_state)
>>>    		return;
>>> @@ -140,6 +142,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>>>    	crtc_state->wb_pending = true;
>>>    	spin_unlock_irq(&output->composer_lock);
>>>    	drm_writeback_queue_job(wb_conn, connector_state);
>>> +	active_wb->format_func = get_wb_fmt_transform_function(wb_format);
>>>    }
>>>    
>>>    static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs = {
>>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 7/9] drm: vkms: Supports to the case where primary plane doesn't match the CRTC
  2022-04-20 13:13   ` Pekka Paalanen
@ 2022-04-24  0:41     ` Igor Torrente
  0 siblings, 0 replies; 44+ messages in thread
From: Igor Torrente @ 2022-04-24  0:41 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

Hi Pekka,

On 4/20/22 10:13, Pekka Paalanen wrote:
> On Mon,  4 Apr 2022 17:45:13 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
> 
>> We will break the current assumption that the primary plane has the
> 
> Hi,
> 
> I'd say "remove" rather than "break". Breaking sounds bad but this is
> good. :-)

Yeah, sure. :)

> 
>> same size and position as CRTC.
> 
> ...and that the primary plane is the bottom-most in zpos order, or is
> even enabled. At least as far as the blending machinery is concerned.
> 
>>
>> For that we will add CRTC dimension information to `vkms_crtc_state`
>> and add a opaque black backgound color.
>>
>> Because now we need to fill the background, we had a loss in
>> performance with this change. Results running the IGT[1] test
>> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
>>
>> |                  Frametime                   |
>> |:--------------------------------------------:|
>> |  Implementation |  Previous |   This commit  |
>> |:---------------:|:---------:|:--------------:|
>> | frametime range |  5~18 ms  |     10~22 ms   |
>> |     Average     |  8.47 ms  |     12.32 ms   |
>>
>> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
>>
>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>> ---
>>   Documentation/gpu/vkms.rst           |  3 +--
>>   drivers/gpu/drm/vkms/vkms_composer.c | 32 +++++++++++++++++++---------
>>   drivers/gpu/drm/vkms/vkms_crtc.c     |  4 ++++
>>   drivers/gpu/drm/vkms/vkms_drv.h      |  2 ++
>>   4 files changed, 29 insertions(+), 12 deletions(-)
>>
>> diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
>> index a49e4ae92653..49db221c0f52 100644
>> --- a/Documentation/gpu/vkms.rst
>> +++ b/Documentation/gpu/vkms.rst
>> @@ -121,8 +121,7 @@ There's lots of plane features we could add support for:
>>   - ARGB format on primary plane: blend the primary plane into background with
>>     translucent alpha.
>>   
>> -- Support when the primary plane isn't exactly matching the output size: blend
>> -  the primary plane into the black background.
>> +- Add background color KMS property[Good to get started].
>>   
>>   - Full alpha blending on all planes.
>>   
>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>> index cf24015bf90f..f80842227669 100644
>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>> @@ -61,6 +61,15 @@ static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>>   	return false;
>>   }
>>   
>> +static void fill_background(struct pixel_argb_u16 *backgroud_color,
> 
> Hi,
> 
> this could be const struct pixel_argb_u16 *. Also a typo: missing n in
> backgroud_color.

Oops.

> 
>> +			    struct line_buffer *output_buffer)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < output_buffer->n_pixels; i++)
>> +		output_buffer->pixels[i] = *backgroud_color;
>> +}
>> +
>>   /**
>>    * @wb_frame_info: The writeback frame buffer metadata
>>    * @crtc_state: The crtc state
>> @@ -78,22 +87,23 @@ static void blend(struct vkms_writeback_job *wb,
>>   		  struct line_buffer *output_buffer, s64 row_size)
>>   {
>>   	struct vkms_plane_state **plane = crtc_state->active_planes;
>> -	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
>>   	u32 n_active_planes = crtc_state->num_active_planes;
>>   
>> -	int y_dst = primary_plane_info->dst.y1;
>> -	int h_dst = drm_rect_height(&primary_plane_info->dst);
>> -	int y_limit = y_dst + h_dst;
>> +	struct pixel_argb_u16 background_color = (struct pixel_argb_u16) {
>> +		.a = 0xffff
>> +	};
> 
> Could be const and shorter, if that fits the kernel style:
> 
> 	const struct pixel_arb_u16 background_color = { .a = 0xffff };

It fits.

> 
>> +
>> +	int crtc_y_limit = crtc_state->crtc_height;
>>   	int y, i;
>>   
>> -	for (y = y_dst; y < y_limit; y++) {
>> -		plane[0]->format_func(output_buffer, primary_plane_info, y);
>> +	for (y = 0; y < crtc_y_limit; y++) {
>> +		fill_background(&background_color, output_buffer);
>>   
>>   		/* If there are other planes besides primary, we consider the active
>>   		 * planes should be in z-order and compose them associatively:
> 
> Is "associatively" the right word here?
> 
>>   		 * ((primary <- overlay) <- cursor)
> 
> The example (primary <- overlay) is not generally true with real hardware.
> 
> Maybe what you are trying to say is: The active planes are composed in
> z-order.

I always forgot to update these comments. Thanks!

> 
>>   		 */
>> -		for (i = 1; i < n_active_planes; i++) {
>> +		for (i = 0; i < n_active_planes; i++) {
>>   			if (!check_y_limit(plane[i]->frame_info, y))
>>   				continue;
>>   
>> @@ -154,7 +164,7 @@ static int compose_active_planes(struct vkms_writeback_job *active_wb,
> 
> As I mentioned on the previous patch, I think the finding of primary
> plane (which was generally incorrect) should be removed here.

I will remove this.

> 
>>   	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
>>   		return -EINVAL;
>>   
>> -	line_width = drm_rect_width(&primary_plane_info->dst);
>> +	line_width = crtc_state->crtc_width;
>>   	stage_buffer.n_pixels = line_width;
>>   	output_buffer.n_pixels = line_width;
>>   
>> @@ -175,8 +185,10 @@ static int compose_active_planes(struct vkms_writeback_job *active_wb,
>>   		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>>   
>>   		wb_format = wb_frame_info->fb->format->format;
>> -		wb_frame_info->src = primary_plane_info->src;
>> -		wb_frame_info->dst = primary_plane_info->dst;
>> +		drm_rect_init(&wb_frame_info->src, 0, 0, crtc_state->crtc_width,
>> +			      crtc_state->crtc_height);
>> +		drm_rect_init(&wb_frame_info->dst, 0, 0, crtc_state->crtc_width,
>> +			      crtc_state->crtc_height);
> 
> Why are these not set when the active_wb->frame_info is created? 

I thought that I hadn't access to the crtc at the wb creation.

After looking more carefully at the structs, I found this is not the case.

So I will improve this.

> Can the CRTC (video mode) be smaller than the wb buffer?

AFAIK this is not possible.

> 
> Somewhere you must have a check that wb buffer size can fit the crtc
> size, or maybe they must be exactly the same size. At least setting
> destination rectangle bigger than the buffer dimensions must be
> impossible.
> 
>>   	}
>>   
>>   	blend(active_wb, crtc_state, crc32, &stage_buffer,
>> diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
>> index 57bbd32e9beb..4a37e243c2d7 100644
>> --- a/drivers/gpu/drm/vkms/vkms_crtc.c
>> +++ b/drivers/gpu/drm/vkms/vkms_crtc.c
>> @@ -248,7 +248,9 @@ static void vkms_crtc_atomic_begin(struct drm_crtc *crtc,
>>   static void vkms_crtc_atomic_flush(struct drm_crtc *crtc,
>>   				   struct drm_atomic_state *state)
>>   {
>> +	struct drm_crtc_state *crtc_state = drm_atomic_get_new_crtc_state(state, crtc);
>>   	struct vkms_output *vkms_output = drm_crtc_to_vkms_output(crtc);
>> +	struct drm_display_mode *mode = &crtc_state->mode;
>>   
>>   	if (crtc->state->event) {
>>   		spin_lock(&crtc->dev->event_lock);
>> @@ -264,6 +266,8 @@ static void vkms_crtc_atomic_flush(struct drm_crtc *crtc,
>>   	}
>>   
>>   	vkms_output->composer_state = to_vkms_crtc_state(crtc->state);
>> +	vkms_output->composer_state->crtc_width = mode->hdisplay;
>> +	vkms_output->composer_state->crtc_height = mode->vdisplay;
> 
> Is the crtc not keeping track of the current mode, do you really need
> your own crtc_width and crtc_height?
> 

I don't really need it. I was just putting more easily accessible to the 
composer functions.

But np, I can change this.

> 
> Thanks,
> pq
> 
>>   
>>   	spin_unlock_irq(&vkms_output->lock);
>>   }
>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>> index 2704cfb6904b..ab92d9f7b701 100644
>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>> @@ -90,6 +90,8 @@ struct vkms_crtc_state {
>>   	bool wb_pending;
>>   	u64 frame_start;
>>   	u64 frame_end;
>> +	u16 crtc_width;
>> +	u16 crtc_height;
>>   };
>>   
>>   struct vkms_output {
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job`
  2022-04-23 15:12     ` Igor Torrente
@ 2022-04-25  7:56       ` Pekka Paalanen
  2022-04-26  0:56         ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-25  7:56 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 9113 bytes --]

On Sat, 23 Apr 2022 12:12:51 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Hi Pekka,
> 
> On 4/20/22 08:23, Pekka Paalanen wrote:
> > On Mon,  4 Apr 2022 17:45:11 -0300
> > Igor Torrente <igormtorrente@gmail.com> wrote:
> >   
> >> This commit is the groundwork to introduce new formats to the planes and
> >> writeback buffer. As part of it, a new buffer metadata field is added to
> >> `vkms_writeback_job`, this metadata is represented by the `vkms_composer`
> >> struct.  
> > 
> > Hi,
> > 
> > should this be talking about vkms_frame_info struct instead?  
> 
> Yes it should. I will fix this. Thanks.
> 
> >   
> >>
> >> Also adds two new function pointers (`{wb,plane}_format_transform_func`)
> >> are defined to handle format conversion to/from internal format.
> >>
> >> These things will allow us, in the future, to have different compositing
> >> and wb format types.
> >>
> >> V2: Change the code to get the drm_framebuffer reference and not copy its
> >>      contents(Thomas Zimmermann).
> >> V3: Drop the refcount in the wb code(Thomas Zimmermann).
> >> V5: Add {wb,plane}_format_transform_func to vkms_writeback_job
> >>      and vkms_plane_state (Pekka Paalanen)
> >>
> >> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> >> ---
> >>   drivers/gpu/drm/vkms/vkms_composer.c  |  4 ++--
> >>   drivers/gpu/drm/vkms/vkms_drv.h       | 31 +++++++++++++++++++++------
> >>   drivers/gpu/drm/vkms/vkms_plane.c     | 10 ++++-----
> >>   drivers/gpu/drm/vkms/vkms_writeback.c | 20 ++++++++++++++---
> >>   4 files changed, 49 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> >> index 2d946368a561..95029d2ebcac 100644
> >> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> >> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> >> @@ -153,7 +153,7 @@ static void compose_plane(struct vkms_frame_info *primary_plane_info,
> >>   			  struct vkms_frame_info *plane_frame_info,
> >>   			  void *vaddr_out)
> >>   {
> >> -	struct drm_framebuffer *fb = &plane_frame_info->fb;
> >> +	struct drm_framebuffer *fb = plane_frame_info->fb;
> >>   	void *vaddr;
> >>   	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
> >>   
> >> @@ -175,7 +175,7 @@ static int compose_active_planes(void **vaddr_out,
> >>   				 struct vkms_frame_info *primary_plane_info,
> >>   				 struct vkms_crtc_state *crtc_state)
> >>   {
> >> -	struct drm_framebuffer *fb = &primary_plane_info->fb;
> >> +	struct drm_framebuffer *fb = primary_plane_info->fb;
> >>   	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
> >>   	const void *vaddr;
> >>   	int i;
> >> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> >> index 2e6342164bef..2704cfb6904b 100644
> >> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> >> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> >> @@ -22,13 +22,8 @@
> >>   
> >>   #define NUM_OVERLAY_PLANES 8
> >>   
> >> -struct vkms_writeback_job {
> >> -	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
> >> -	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
> >> -};
> >> -
> >>   struct vkms_frame_info {
> >> -	struct drm_framebuffer fb;
> >> +	struct drm_framebuffer *fb;
> >>   	struct drm_rect src, dst;
> >>   	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
> >>   	unsigned int offset;
> >> @@ -36,6 +31,29 @@ struct vkms_frame_info {
> >>   	unsigned int cpp;
> >>   };
> >>   
> >> +struct pixel_argb_u16 {
> >> +	u16 a, r, g, b;
> >> +};
> >> +
> >> +struct line_buffer {
> >> +	size_t n_pixels;
> >> +	struct pixel_argb_u16 *pixels;
> >> +};
> >> +
> >> +typedef void
> >> +(*wb_format_transform_func)(struct vkms_frame_info *frame_info,
> >> +			    const struct line_buffer *buffer, int y);
> >> +
> >> +typedef void
> >> +(*plane_format_transform_func)(struct line_buffer *buffer,
> >> +			       const struct vkms_frame_info *frame_info, int y);  
> > 
> > It wasn't immediately obvious to me in which direction these function
> > types work from their names. The arguments are not wb and plane but
> > vkms_frame_info and line_buffer in both. The implementations of these
> > functions would have nothing specific to a wb or a plane either, would
> > they?  
> 
> No, there's nothing specific.
> 
> Do you think adding {dst_,src_} would be enough?
> 
> (*wb_format_transform_func)(struct vkms_frame_info *dst_frame_info,
> 			    const struct line_buffer *src_buffer
> 
> (*plane_format_transform_func)(struct line_buffer *dst_buffer,
> 			   const struct vkms_frame_info *src_frame_info,

No, because I was looking at the function pointer type names, and not
the function arguments.

> > 
> > What about naming them frame_to_line_func and line_to_frame_func?  
> 
> Sounds good. I will rename it.

Thanks!

> >> +
> >> +struct vkms_writeback_job {
> >> +	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
> >> +	struct vkms_frame_info frame_info;  
> > 
> > Which frame_info is this? Should the field be called wb_frame_info?  
> 
> Considering it's already in the writeback_job struct do you think this
> necessary?

This struct has 'data' too, and that is not the wb buffer, right?

Hmm, if it's not the wb buffer, then using DRM_FORMAT_MAX_PLANES is
odd...

> In other words, what kind of misudertanding do you think can happen if
> this variable stay without the `wb_` prefix?
> 
> I spent a few minutes trying to find a case, but nothing came to my
> mind.

My question above is the confusion.

If all these members are about the wb destination buffer only, then
where do the inputs come from and how are they reference-counted to
make sure they are available when needed?

> >> +	wb_format_transform_func format_func;  
> > 
> > line_to_frame_func wb_write;
> > 
> > perhaps? The type explains the general type of the function, and the
> > field name refers to what it is used for.
> >   
> >> +};
> >> +
> >>   /**
> >>    * vkms_plane_state - Driver specific plane state
> >>    * @base: base plane state
> >> @@ -44,6 +62,7 @@ struct vkms_frame_info {
> >>   struct vkms_plane_state {
> >>   	struct drm_shadow_plane_state base;
> >>   	struct vkms_frame_info *frame_info;
> >> +	plane_format_transform_func format_func;  
> > 
> > Similarly here, maybe
> > 
> > frame_to_line_func plane_read;
> > 
> > perhaps?  
> 
> Yeah, sure.
> 
> >   
> >>   };
> >>   
> >>   struct vkms_plane {
> >> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> >> index a56b0f76eddd..28752af0118c 100644
> >> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> >> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> >> @@ -50,12 +50,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
> >>   	struct vkms_plane_state *vkms_state = to_vkms_plane_state(old_state);
> >>   	struct drm_crtc *crtc = vkms_state->base.base.crtc;
> >>   
> >> -	if (crtc) {
> >> +	if (crtc && vkms_state->frame_info->fb) {
> >>   		/* dropping the reference we acquired in
> >>   		 * vkms_primary_plane_update()
> >>   		 */
> >> -		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
> >> -			drm_framebuffer_put(&vkms_state->frame_info->fb);
> >> +		if (drm_framebuffer_read_refcount(vkms_state->frame_info->fb))
> >> +			drm_framebuffer_put(vkms_state->frame_info->fb);
> >>   	}
> >>   
> >>   	kfree(vkms_state->frame_info);
> >> @@ -110,9 +110,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> >>   	frame_info = vkms_plane_state->frame_info;
> >>   	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
> >>   	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
> >> -	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
> >> +	frame_info->fb = fb;  
> > 
> > This change, replacing the memcpy with storing a pointer, seems to be
> > another major point of this patch. Should it be a separate patch?
> > It doesn't seem to fit with the current commit message.
> > 
> > I have no idea what kind of locking or referencing a drm_framebuffer
> > would need, and I suspect that would be easier to review if it was a
> > patch of its own.  
> 
> Makes sense. I will do that.
> 
> >   
> >>   	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
> >> -	drm_framebuffer_get(&frame_info->fb);
> >> +	drm_framebuffer_get(frame_info->fb);  
> > 
> > Does drm_framebuffer_get() not return anything?  
> 
> No, it doesn't actually. This function increments the ref count of this
> struct and doesn't return anything.

D'oh. Oh well.


Thanks,
pq

> > 
> > To me it would be more idiomatic to write something like
> > 
> > 	frame_info->fb = drm_framebuffer_get(fb);
> > I spend few minutes trying to find a case but nothing comes to my mind.
> > I don't know if that pattern is used in the kernel, but I use it in
> > userspace to emphasise that frame_info owns a new reference rather than
> > borrowing someone else's.
> > 
> > 
> > Thanks,
> > pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-04-23 18:53       ` Igor Torrente
@ 2022-04-25  8:10         ` Pekka Paalanen
  2022-04-26  1:54           ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-25  8:10 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 14298 bytes --]

On Sat, 23 Apr 2022 15:53:20 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> I forgot to respond some points from your review.
> 
> On 4/23/22 13:04, Igor Torrente wrote:
> > Hi Pekka,
> > 
> > On 4/20/22 09:36, Pekka Paalanen wrote:  
> >> On Mon,  4 Apr 2022 17:45:12 -0300
> >> Igor Torrente <igormtorrente@gmail.com> wrote:
> >>  
> >>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
> >>> as a color input.
> >>>
> >>> This patch refactors all the functions related to the plane composition
> >>> to overcome this limitation.
> >>>
> >>> A new internal format(`struct pixel`) is introduced to deal with all  
> >>
> >> Hi,
> >>
> >> struct pixel_argb_u16 was added in the previous patch.  
> > 
> > I will fix it. Thanks!

...

> >>> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
> >>> +				 struct vkms_crtc_state *crtc_state,
> >>> +				 u32 *crc32)
> >>>    {
> >>> +	int line_width, ret = 0, pixel_size = sizeof(struct pixel_argb_u16);
> >>> +	struct vkms_frame_info *primary_plane_info = NULL;
> >>> +	struct line_buffer output_buffer, stage_buffer;
> >>> +	struct vkms_plane_state *act_plane = NULL;
> >>> +	u32 wb_format;
> >>>    
> >>> +	if (WARN_ON(pixel_size != 8))  
> >>
> >> Isn't there a compile-time assert macro for this? Having to actually
> >> run VKMS to check for this reduces the chances of finding it early.
> >> What's the reason for this check anyway?  
> 
> Yes, it exists.
> 
> include/linux/build_bug.h:1:#define static_assert(expr, ...) 
> __static_assert(expr, ##__VA_ARGS__, #expr)
> 
> I didn't add it because I can imagine some people very mad if the kernel 
> did not compile because of vkms.

But that would mean that VKMS is broken on those platforms. You'd
better know which platforms VKMS is broken, so the Kconfig can stop
VKMS from being built there at all. Or better, fix it before anyone
needs VKMS there.

> This check exists so we can call `crc32_le` for the entire line instead
> doing it for each channel of each pixel in case `struct `pixel_argb_u16`
> had any gap added by the compiler between the struct fields.

Oh the CRC computation. Good point.

Can you add a comment about that with the check?

> >>  
> >>> +		return -EINVAL;
> >>> +
> >>> +	if (crtc_state->num_active_planes >= 1) {
> >>> +		act_plane = crtc_state->active_planes[0];
> >>> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> >>> +			primary_plane_info = act_plane->frame_info;  
> >>
> >> After the next patch, do you even need the primary plane for anything
> >> specifically?  
> 
> Yeah, I will not need it anymore.
> 
> >> There is the map_is_null check below, but that should be
> >> done on all planes in the array, right?  
> 
> Yes, I guess so. And I don't know why it only checks for the 
> primary_plane TBH.

Maybe a left-over from times when it didn't have anything but a primary
plane?

> >>
> >> I suspect the next patch, or another patch in this series, should just
> >> delete this chunk.  
> I should, and I will in the V6 of next patch.
> 
> > 
> > 
> >   
> >>  
> >>>    	}
> >>>    
> >>> +	if (!primary_plane_info)
> >>> +		return -EINVAL;
> >>> +
> >>>    	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
> >>>    		return -EINVAL;
> >>>    
> >>> +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
> >>> +		return -EINVAL;
> >>>    
> >>> +	line_width = drm_rect_width(&primary_plane_info->dst);
> >>> +	stage_buffer.n_pixels = line_width;
> >>> +	output_buffer.n_pixels = line_width;
> >>>    
> >>> +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> >>> +	if (!stage_buffer.pixels) {
> >>> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
> >>> +		return -ENOMEM;
> >>> +	}
> >>>    
> >>> +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> >>> +	if (!output_buffer.pixels) {
> >>> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
> >>> +		ret = -ENOMEM;
> >>> +		goto free_stage_buffer;
> >>> +	}
> >>> +
> >>> +	if (active_wb) {
> >>> +		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
> >>> +
> >>> +		wb_format = wb_frame_info->fb->format->format;  
> >>
> >> I don't see wb_format being used, is it?  
> > 
> > This is probably a leftover from the last versions. Thanks for catching
> > it.
> >   
> >>  
> >>> +		wb_frame_info->src = primary_plane_info->src;
> >>> +		wb_frame_info->dst = primary_plane_info->dst;
> >>> +	}
> >>> +
> >>> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
> >>> +	      &output_buffer, (s64)line_width * pixel_size);  
> >>
> >> What's the (s64) doing here?
> >>
> >> Are byte sizes not usually expressed with size_t or ssize_t types, or
> >> is the kernel convention to use u64 and s64?
> >>
> >> This makes me suspect that pixel_offset() and friends in vkms_format.c
> >> are going to need fixing as well. int type overflows at 2G.  
> > 
> > 
> > Yeah, I should be using size_t in all these places.
> >   
> >>  
> >>> +
> >>> +	kvfree(output_buffer.pixels);
> >>> +free_stage_buffer:
> >>> +	kvfree(stage_buffer.pixels);
> >>> +can
> >>> +	return ret;
> >>>    }
> >>>    
> >>>    /**
> >>> @@ -222,13 +204,11 @@ void vkms_composer_worker(struct work_struct *work)
> >>>    						struct vkms_crtc_state,
> >>>    						composer_work);
> >>>    	struct drm_crtc *crtc = crtc_state->base.crtc;
> >>> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
> >>>    	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
> >>>    	bool crc_pending, wb_pending;
> >>>    	u64 frame_start, frame_end;
> >>> +	u32 crc32 = 0;
> >>>    	int ret;
> >>>    
> >>>    	spin_lock_irq(&out->composer_lock);
> >>> @@ -248,35 +228,19 @@ void vkms_composer_worker(struct work_struct *work)
> >>>    	if (!crc_pending)
> >>>    		return;
> >>>    
> >>>    	if (wb_pending)
> >>> +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
> >>> +	else
> >>> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
> >>>    
> >>> +	if (ret)
> >>>    		return;
> >>>    
> >>>    	if (wb_pending) {
> >>>    		drm_writeback_signal_completion(&out->wb_connector, 0);
> >>>    		spin_lock_irq(&out->composer_lock);
> >>>    		crtc_state->wb_pending = false;
> >>>    		spin_unlock_irq(&out->composer_lock);
> >>>    	}
> >>>    
> >>>    	/*
> >>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> >>> new file mode 100644
> >>> index 000000000000..931a61405d6a
> >>> --- /dev/null
> >>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> >>> @@ -0,0 +1,151 @@
> >>> +// SPDX-License-Identifier: GPL-2.0+
> >>> +
> >>> +#include <drm/drm_rect.h>
> >>> +#include <linux/minmax.h>
> >>> +
> >>> +#include "vkms_formats.h"
> >>> +
> >>> +static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> >>> +{
> >>> +	return frame_info->offset + (y * frame_info->pitch)
> >>> +				  + (x * frame_info->cpp);
> >>> +}
> >>> +
> >>> +/*
> >>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> >>> + *
> >>> + * @frame_info: Buffer metadata
> >>> + * @x: The x(width) coordinate of the 2D buffer
> >>> + * @y: The y(Heigth) coordinate of the 2D buffercan
> >>> + *
> >>> + * Takes the information stored in the frame_info, a pair of coordinates, and
> >>> + * returns the address of the first color channel.
> >>> + * This function assumes the channels are packed together, i.e. a color channel
> >>> + * comes immediately after another in the memory. And therefore, this function
> >>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> >>> + */
> >>> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> >>> +				int x, int y)
> >>> +{
> >>> +	int offset = pixel_offset(frame_info, x, y);
> >>> +
> >>> +	return (u8 *)frame_info->map[0].vaddr + offset;
> >>> +}
> >>> +
> >>> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> >>> +{
> >>> +	int x_src = frame_info->src.x1 >> 16;
> >>> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
> >>> +
> >>> +	return packed_pixels_addr(frame_info, x_src, y_src);
> >>> +}
> >>> +
> >>> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> >>> +				 const struct vkms_frame_info *frame_info, int y)
> >>> +{
> >>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> >>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> >>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> >>> +			       stage_buffer->n_pixels);
> >>> +
> >>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> >>> +		/*
> >>> +		 * The 257 is the "conversion ratio". This number is obtained by the
> >>> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
> >>> +		 * the best color value in a pixel format with more possibilities.
> >>> +		 * A similar idea applies to others RGB color conversions.
> >>> +		 */
> >>> +		out_pixels[x].a = (u16)src_pixels[3] * 257;
> >>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> >>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> >>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> >>> +	}
> >>> +}
> >>> +
> >>> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> >>> +				 const struct vkms_frame_info *frame_info, int y)
> >>> +{
> >>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> >>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> >>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> >>> +			       stage_buffer->n_pixels);
> >>> +
> >>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> >>> +		out_pixels[x].a = (u16)0xffff;
> >>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> >>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> >>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> >>> +	}
> >>> +}
> >>> +
> >>> +/*
> >>> + * The following  functions take an line of argb_u16 pixels from the
> >>> + * src_buffer, convert them to a specific format, and store them in the
> >>> + * destination.
> >>> + *
> >>> + * They are used in the `compose_active_planes` to convert and store a line
> >>> + * from the src_buffer to the writeback buffer.
> >>> + */
> >>> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
> >>> +				 const struct line_buffer *src_buffer, int y)
> >>> +{
> >>> +	int x, x_dst = frame_info->dst.x1;
> >>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> >>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> >>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> >>> +			    src_buffer->n_pixels);
> >>> +
> >>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> >>> +		/*
> >>> +		 * This sequence below is important because the format's byte order is
> >>> +		 * in little-endian. In the case of the ARGB8888 the memory is
> >>> +		 * organized this way:
> >>> +		 *
> >>> +		 * | Addr     | = blue channel
> >>> +		 * | Addr + 1 | = green channel
> >>> +		 * | Addr + 2 | = Red channel
> >>> +		 * | Addr + 3 | = Alpha channel
> >>> +		 */
> >>> +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
> >>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> >>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> >>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> >>> +	}
> >>> +}
> >>> +
> >>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
> >>> +				 const struct line_buffer *src_buffer, int y)
> >>> +{
> >>> +	int x, x_dst = frame_info->dst.x1;
> >>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> >>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> >>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> >>> +			    src_buffer->n_pixels);
> >>> +
> >>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> >>> +		dst_pixels[3] = (u8)0xff;  
> >>
> >> When writing to XRGB, it's not necessary to ensure the X channel has
> >> any sensible value. Anyone reading from XRGB must ignore that value
> >> anyway. So why not write something wacky here, like 0xa1, that is far
> >> enough from both 0x00 or 0xff to not be confused with them even
> >> visually? Also not 0x7f or 0x80 which are close to half of 0xff.
> >>
> >> Or, you could save a whole function and just use argb_u16_to_ARGBxxxx()
> >> instead, even for XRGB destination.  
> > 
> > 
> > Right. Maybe I could just leave the channel untouched.

Untouched may not be a good idea. Leaving anything untouched always has
the risk of leaking information through uninitialized memory. Maybe not
in this case because the destination is allocated by userspace already,
but nothing beats being obviously correct.

Whatever you decide here, be prepared for it becoming de-facto kernel
UABI, because it is easy for userspace to (accidentally) rely on the
value, no matter what you pick.


Thanks,
pq


> >   
> >>  
> >>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> >>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> >>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> >>> +	}
> >>> +}
> >>> +
> >>> +plane_format_transform_func get_plane_fmt_transform_function(u32 format)
> >>> +{
> >>> +	if (format == DRM_FORMAT_ARGB8888)
> >>> +		return &ARGB8888_to_argb_u16;
> >>> +	else if (format == DRM_FORMAT_XRGB8888)
> >>> +		return &XRGB8888_to_argb_u16;
> >>> +	else
> >>> +		return NULL;  
> >>
> >> This works for now, but when more formats are added, I'd think a switch
> >> statement would look better.  
> > 
> > ok.
> >   
> >>  
> >>> +}
> >>> +
> >>> +wb_format_transform_func get_wb_fmt_transform_function(u32 format)
> >>> +{
> >>> +	if (format == DRM_FORMAT_ARGB8888)
> >>> +		return &argb_u16_to_ARGB8888;
> >>> +	else if (format == DRM_FORMAT_XRGB8888)
> >>> +		return &argb_u16_to_XRGB8888;
> >>> +	else
> >>> +		return NULL;
> >>> +}

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job`
  2022-04-25  7:56       ` Pekka Paalanen
@ 2022-04-26  0:56         ` Igor Torrente
  2022-04-26  7:09           ` Pekka Paalanen
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-26  0:56 UTC (permalink / raw)
  To: Pekka Paalanen, rodrigosiqueiramelo, tzimmermann
  Cc: hamohammed.sa, airlied, leandro.ribeiro, melissa.srw, dri-devel,
	tales.aparecida, ~lkcamp/patches

Hi Pekka,

On 4/25/22 04:56, Pekka Paalanen wrote:
> On Sat, 23 Apr 2022 12:12:51 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
> 
>> Hi Pekka,
>>
>> On 4/20/22 08:23, Pekka Paalanen wrote:
>>> On Mon,  4 Apr 2022 17:45:11 -0300
>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>>    
>>>> This commit is the groundwork to introduce new formats to the planes and
>>>> writeback buffer. As part of it, a new buffer metadata field is added to
>>>> `vkms_writeback_job`, this metadata is represented by the `vkms_composer`
>>>> struct.
>>>
>>> Hi,
>>>
>>> should this be talking about vkms_frame_info struct instead?
>>
>> Yes it should. I will fix this. Thanks.
>>
>>>    
>>>>
>>>> Also adds two new function pointers (`{wb,plane}_format_transform_func`)
>>>> are defined to handle format conversion to/from internal format.
>>>>
>>>> These things will allow us, in the future, to have different compositing
>>>> and wb format types.
>>>>
>>>> V2: Change the code to get the drm_framebuffer reference and not copy its
>>>>       contents(Thomas Zimmermann).
>>>> V3: Drop the refcount in the wb code(Thomas Zimmermann).
>>>> V5: Add {wb,plane}_format_transform_func to vkms_writeback_job
>>>>       and vkms_plane_state (Pekka Paalanen)
>>>>
>>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>>>> ---
>>>>    drivers/gpu/drm/vkms/vkms_composer.c  |  4 ++--
>>>>    drivers/gpu/drm/vkms/vkms_drv.h       | 31 +++++++++++++++++++++------
>>>>    drivers/gpu/drm/vkms/vkms_plane.c     | 10 ++++-----
>>>>    drivers/gpu/drm/vkms/vkms_writeback.c | 20 ++++++++++++++---
>>>>    4 files changed, 49 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> index 2d946368a561..95029d2ebcac 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> @@ -153,7 +153,7 @@ static void compose_plane(struct vkms_frame_info *primary_plane_info,
>>>>    			  struct vkms_frame_info *plane_frame_info,
>>>>    			  void *vaddr_out)
>>>>    {
>>>> -	struct drm_framebuffer *fb = &plane_frame_info->fb;
>>>> +	struct drm_framebuffer *fb = plane_frame_info->fb;
>>>>    	void *vaddr;
>>>>    	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>>>>    
>>>> @@ -175,7 +175,7 @@ static int compose_active_planes(void **vaddr_out,
>>>>    				 struct vkms_frame_info *primary_plane_info,
>>>>    				 struct vkms_crtc_state *crtc_state)
>>>>    {
>>>> -	struct drm_framebuffer *fb = &primary_plane_info->fb;
>>>> +	struct drm_framebuffer *fb = primary_plane_info->fb;
>>>>    	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>>>>    	const void *vaddr;
>>>>    	int i;
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>>> index 2e6342164bef..2704cfb6904b 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>>>> @@ -22,13 +22,8 @@
>>>>    
>>>>    #define NUM_OVERLAY_PLANES 8
>>>>    
>>>> -struct vkms_writeback_job {
>>>> -	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
>>>> -	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
>>>> -};
>>>> -
>>>>    struct vkms_frame_info {
>>>> -	struct drm_framebuffer fb;
>>>> +	struct drm_framebuffer *fb;
>>>>    	struct drm_rect src, dst;
>>>>    	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
>>>>    	unsigned int offset;
>>>> @@ -36,6 +31,29 @@ struct vkms_frame_info {
>>>>    	unsigned int cpp;
>>>>    };
>>>>    
>>>> +struct pixel_argb_u16 {
>>>> +	u16 a, r, g, b;
>>>> +};
>>>> +
>>>> +struct line_buffer {
>>>> +	size_t n_pixels;
>>>> +	struct pixel_argb_u16 *pixels;
>>>> +};
>>>> +
>>>> +typedef void
>>>> +(*wb_format_transform_func)(struct vkms_frame_info *frame_info,
>>>> +			    const struct line_buffer *buffer, int y);
>>>> +
>>>> +typedef void
>>>> +(*plane_format_transform_func)(struct line_buffer *buffer,
>>>> +			       const struct vkms_frame_info *frame_info, int y);
>>>
>>> It wasn't immediately obvious to me in which direction these function
>>> types work from their names. The arguments are not wb and plane but
>>> vkms_frame_info and line_buffer in both. The implementations of these
>>> functions would have nothing specific to a wb or a plane either, would
>>> they?
>>
>> No, there's nothing specific.
>>
>> Do you think adding {dst_,src_} would be enough?
>>
>> (*wb_format_transform_func)(struct vkms_frame_info *dst_frame_info,
>> 			    const struct line_buffer *src_buffer
>>
>> (*plane_format_transform_func)(struct line_buffer *dst_buffer,
>> 			   const struct vkms_frame_info *src_frame_info,
> 
> No, because I was looking at the function pointer type names, and not
> the function arguments.

Ohhh.

> 
>>>
>>> What about naming them frame_to_line_func and line_to_frame_func?
>>
>> Sounds good. I will rename it.
> 
> Thanks!
> 
>>>> +
>>>> +struct vkms_writeback_job {
>>>> +	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
>>>> +	struct vkms_frame_info frame_info;
>>>
>>> Which frame_info is this? Should the field be called wb_frame_info?
>>
>> Considering it's already in the writeback_job struct do you think this
>> necessary?
> 
> This struct has 'data' too, and that is not the wb buffer, right?

AFAIU, no. Each plane has its own `iosys_map map[]`.

> 
> Hmm, if it's not the wb buffer, then using DRM_FORMAT_MAX_PLANES is
> odd...

Yeah. I honestly don't have any clue why we have an array of `iosys_map`
for each plane, why we only use the map[0] and why we only call
`iosys_map_is_null` only to the `primary_composer`.

Maybe @tzimmermann or @rodrigosiqueiramelo can shed some light on these
questions.

> 
>> In other words, what kind of misudertanding do you think can happen if
>> this variable stay without the `wb_` prefix?
>>
>> I spent a few minutes trying to find a case, but nothing came to my
>> mind.
> 
> My question above is the confusion.
> 
> If all these members are about the wb destination buffer only, then
> where do the inputs come from and how are they reference-counted to
> make sure they are available when needed?

Ok. Got it.

> 
>>>> +	wb_format_transform_func format_func;
>>>
>>> line_to_frame_func wb_write;
>>>
>>> perhaps? The type explains the general type of the function, and the
>>> field name refers to what it is used for.
>>>    
>>>> +};
>>>> +
>>>>    /**
>>>>     * vkms_plane_state - Driver specific plane state
>>>>     * @base: base plane state
>>>> @@ -44,6 +62,7 @@ struct vkms_frame_info {
>>>>    struct vkms_plane_state {
>>>>    	struct drm_shadow_plane_state base;
>>>>    	struct vkms_frame_info *frame_info;
>>>> +	plane_format_transform_func format_func;
>>>
>>> Similarly here, maybe
>>>
>>> frame_to_line_func plane_read;
>>>
>>> perhaps?
>>
>> Yeah, sure.
>>
>>>    
>>>>    };
>>>>    
>>>>    struct vkms_plane {
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
>>>> index a56b0f76eddd..28752af0118c 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_plane.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
>>>> @@ -50,12 +50,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
>>>>    	struct vkms_plane_state *vkms_state = to_vkms_plane_state(old_state);
>>>>    	struct drm_crtc *crtc = vkms_state->base.base.crtc;
>>>>    
>>>> -	if (crtc) {
>>>> +	if (crtc && vkms_state->frame_info->fb) {
>>>>    		/* dropping the reference we acquired in
>>>>    		 * vkms_primary_plane_update()
>>>>    		 */
>>>> -		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
>>>> -			drm_framebuffer_put(&vkms_state->frame_info->fb);
>>>> +		if (drm_framebuffer_read_refcount(vkms_state->frame_info->fb))
>>>> +			drm_framebuffer_put(vkms_state->frame_info->fb);
>>>>    	}
>>>>    
>>>>    	kfree(vkms_state->frame_info);
>>>> @@ -110,9 +110,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>>>    	frame_info = vkms_plane_state->frame_info;
>>>>    	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
>>>>    	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
>>>> -	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
>>>> +	frame_info->fb = fb;
>>>
>>> This change, replacing the memcpy with storing a pointer, seems to be
>>> another major point of this patch. Should it be a separate patch?
>>> It doesn't seem to fit with the current commit message.
>>>
>>> I have no idea what kind of locking or referencing a drm_framebuffer
>>> would need, and I suspect that would be easier to review if it was a
>>> patch of its own.
>>
>> Makes sense. I will do that.
>>
>>>    
>>>>    	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
>>>> -	drm_framebuffer_get(&frame_info->fb);
>>>> +	drm_framebuffer_get(frame_info->fb);
>>>
>>> Does drm_framebuffer_get() not return anything?
>>
>> No, it doesn't actually. This function increments the ref count of this
>> struct and doesn't return anything.
> 
> D'oh. Oh well.
> 
> 
> Thanks,
> pq
> 
>>>
>>> To me it would be more idiomatic to write something like
>>>
>>> 	frame_info->fb = drm_framebuffer_get(fb);
>>> I spend few minutes trying to find a case but nothing comes to my mind.
>>> I don't know if that pattern is used in the kernel, but I use it in
>>> userspace to emphasise that frame_info owns a new reference rather than
>>> borrowing someone else's.
>>>
>>>
>>> Thanks,
>>> pq

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-04-25  8:10         ` Pekka Paalanen
@ 2022-04-26  1:54           ` Igor Torrente
  2022-04-27  1:03             ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-26  1:54 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

Hi Pekka,

On 4/25/22 05:10, Pekka Paalanen wrote:
> On Sat, 23 Apr 2022 15:53:20 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
> 
>> I forgot to respond some points from your review.
>>
>> On 4/23/22 13:04, Igor Torrente wrote:
>>> Hi Pekka,
>>>
>>> On 4/20/22 09:36, Pekka Paalanen wrote:
>>>> On Mon,  4 Apr 2022 17:45:12 -0300
>>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>>>   
>>>>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>>>>> as a color input.
>>>>>
>>>>> This patch refactors all the functions related to the plane composition
>>>>> to overcome this limitation.
>>>>>
>>>>> A new internal format(`struct pixel`) is introduced to deal with all
>>>>
>>>> Hi,
>>>>
>>>> struct pixel_argb_u16 was added in the previous patch.
>>>
>>> I will fix it. Thanks!
> 
> ...
> 
>>>>> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
>>>>> +				 struct vkms_crtc_state *crtc_state,
>>>>> +				 u32 *crc32)
>>>>>     {
>>>>> +	int line_width, ret = 0, pixel_size = sizeof(struct pixel_argb_u16);
>>>>> +	struct vkms_frame_info *primary_plane_info = NULL;
>>>>> +	struct line_buffer output_buffer, stage_buffer;
>>>>> +	struct vkms_plane_state *act_plane = NULL;
>>>>> +	u32 wb_format;
>>>>>     
>>>>> +	if (WARN_ON(pixel_size != 8))
>>>>
>>>> Isn't there a compile-time assert macro for this? Having to actually
>>>> run VKMS to check for this reduces the chances of finding it early.
>>>> What's the reason for this check anyway?
>>
>> Yes, it exists.
>>
>> include/linux/build_bug.h:1:#define static_assert(expr, ...)
>> __static_assert(expr, ##__VA_ARGS__, #expr)
>>
>> I didn't add it because I can imagine some people very mad if the kernel
>> did not compile because of vkms.
> 
> But that would mean that VKMS is broken on those platforms. You'd
> better know which platforms VKMS is broken, so the Kconfig can stop
> VKMS from being built there at all. Or better, fix it before anyone
> needs VKMS there.

Right. Makes sense. I will add it then.

> 
>> This check exists so we can call `crc32_le` for the entire line instead
>> doing it for each channel of each pixel in case `struct `pixel_argb_u16`
>> had any gap added by the compiler between the struct fields.
> 
> Oh the CRC computation. Good point.
> 
> Can you add a comment about that with the check?

Yeah, np.

I will copy the explanation above :)

> 
>>>>   
>>>>> +		return -EINVAL;
>>>>> +
>>>>> +	if (crtc_state->num_active_planes >= 1) {
>>>>> +		act_plane = crtc_state->active_planes[0];
>>>>> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>>>>> +			primary_plane_info = act_plane->frame_info;
>>>>
>>>> After the next patch, do you even need the primary plane for anything
>>>> specifically?
>>
>> Yeah, I will not need it anymore.
>>
>>>> There is the map_is_null check below, but that should be
>>>> done on all planes in the array, right?
>>
>> Yes, I guess so. And I don't know why it only checks for the
>> primary_plane TBH.
> 
> Maybe a left-over from times when it didn't have anything but a primary
> plane?

Maybe.

Anyway, I have added this verification to all active planes in the  next 
version.

> 
>>>>
>>>> I suspect the next patch, or another patch in this series, should just
>>>> delete this chunk.
>> I should, and I will in the V6 of next patch.
>>
>>>
>>>
>>>    
>>>>   
>>>>>     	}
>>>>>     
>>>>> +	if (!primary_plane_info)
>>>>> +		return -EINVAL;
>>>>> +
>>>>>     	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>>>>>     		return -EINVAL;
>>>>>     
>>>>> +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
>>>>> +		return -EINVAL;
>>>>>     
>>>>> +	line_width = drm_rect_width(&primary_plane_info->dst);
>>>>> +	stage_buffer.n_pixels = line_width;
>>>>> +	output_buffer.n_pixels = line_width;
>>>>>     
>>>>> +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>>>> +	if (!stage_buffer.pixels) {
>>>>> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
>>>>> +		return -ENOMEM;
>>>>> +	}
>>>>>     
>>>>> +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>>>> +	if (!output_buffer.pixels) {
>>>>> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
>>>>> +		ret = -ENOMEM;
>>>>> +		goto free_stage_buffer;
>>>>> +	}
>>>>> +
>>>>> +	if (active_wb) {
>>>>> +		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>>>>> +
>>>>> +		wb_format = wb_frame_info->fb->format->format;
>>>>
>>>> I don't see wb_format being used, is it?
>>>
>>> This is probably a leftover from the last versions. Thanks for catching
>>> it.
>>>    
>>>>   
>>>>> +		wb_frame_info->src = primary_plane_info->src;
>>>>> +		wb_frame_info->dst = primary_plane_info->dst;
>>>>> +	}
>>>>> +
>>>>> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
>>>>> +	      &output_buffer, (s64)line_width * pixel_size);
>>>>
>>>> What's the (s64) doing here?
>>>>
>>>> Are byte sizes not usually expressed with size_t or ssize_t types, or
>>>> is the kernel convention to use u64 and s64?
>>>>
>>>> This makes me suspect that pixel_offset() and friends in vkms_format.c
>>>> are going to need fixing as well. int type overflows at 2G.
>>>
>>>
>>> Yeah, I should be using size_t in all these places.
>>>    
>>>>   
>>>>> +
>>>>> +	kvfree(output_buffer.pixels);
>>>>> +free_stage_buffer:
>>>>> +	kvfree(stage_buffer.pixels);
>>>>> +can
>>>>> +	return ret;
>>>>>     }
>>>>>     
>>>>>     /**
>>>>> @@ -222,13 +204,11 @@ void vkms_composer_worker(struct work_struct *work)
>>>>>     						struct vkms_crtc_state,
>>>>>     						composer_work);
>>>>>     	struct drm_crtc *crtc = crtc_state->base.crtc;
>>>>> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>>>>>     	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>>>>>     	bool crc_pending, wb_pending;
>>>>>     	u64 frame_start, frame_end;
>>>>> +	u32 crc32 = 0;
>>>>>     	int ret;
>>>>>     
>>>>>     	spin_lock_irq(&out->composer_lock);
>>>>> @@ -248,35 +228,19 @@ void vkms_composer_worker(struct work_struct *work)
>>>>>     	if (!crc_pending)
>>>>>     		return;
>>>>>     
>>>>>     	if (wb_pending)
>>>>> +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
>>>>> +	else
>>>>> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>>>>>     
>>>>> +	if (ret)
>>>>>     		return;
>>>>>     
>>>>>     	if (wb_pending) {
>>>>>     		drm_writeback_signal_completion(&out->wb_connector, 0);
>>>>>     		spin_lock_irq(&out->composer_lock);
>>>>>     		crtc_state->wb_pending = false;
>>>>>     		spin_unlock_irq(&out->composer_lock);
>>>>>     	}
>>>>>     
>>>>>     	/*
>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>> new file mode 100644
>>>>> index 000000000000..931a61405d6a
>>>>> --- /dev/null
>>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>> @@ -0,0 +1,151 @@
>>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>>> +
>>>>> +#include <drm/drm_rect.h>
>>>>> +#include <linux/minmax.h>
>>>>> +
>>>>> +#include "vkms_formats.h"
>>>>> +
>>>>> +static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
>>>>> +{
>>>>> +	return frame_info->offset + (y * frame_info->pitch)
>>>>> +				  + (x * frame_info->cpp);
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>>>>> + *
>>>>> + * @frame_info: Buffer metadata
>>>>> + * @x: The x(width) coordinate of the 2D buffer
>>>>> + * @y: The y(Heigth) coordinate of the 2D buffercan
>>>>> + *
>>>>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>>>>> + * returns the address of the first color channel.
>>>>> + * This function assumes the channels are packed together, i.e. a color channel
>>>>> + * comes immediately after another in the memory. And therefore, this function
>>>>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>>>>> + */
>>>>> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
>>>>> +				int x, int y)
>>>>> +{
>>>>> +	int offset = pixel_offset(frame_info, x, y);
>>>>> +
>>>>> +	return (u8 *)frame_info->map[0].vaddr + offset;
>>>>> +}
>>>>> +
>>>>> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
>>>>> +{
>>>>> +	int x_src = frame_info->src.x1 >> 16;
>>>>> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>>>>> +
>>>>> +	return packed_pixels_addr(frame_info, x_src, y_src);
>>>>> +}
>>>>> +
>>>>> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>>>>> +				 const struct vkms_frame_info *frame_info, int y)
>>>>> +{
>>>>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>>>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>>>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>> +			       stage_buffer->n_pixels);
>>>>> +
>>>>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>>>>> +		/*
>>>>> +		 * The 257 is the "conversion ratio". This number is obtained by the
>>>>> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
>>>>> +		 * the best color value in a pixel format with more possibilities.
>>>>> +		 * A similar idea applies to others RGB color conversions.
>>>>> +		 */
>>>>> +		out_pixels[x].a = (u16)src_pixels[3] * 257;
>>>>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>>>>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>>>>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>>>>> +	}
>>>>> +}
>>>>> +
>>>>> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>>>>> +				 const struct vkms_frame_info *frame_info, int y)
>>>>> +{
>>>>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>>>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>>>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>> +			       stage_buffer->n_pixels);
>>>>> +
>>>>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>>>>> +		out_pixels[x].a = (u16)0xffff;
>>>>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>>>>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>>>>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>>>>> +	}
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * The following  functions take an line of argb_u16 pixels from the
>>>>> + * src_buffer, convert them to a specific format, and store them in the
>>>>> + * destination.
>>>>> + *
>>>>> + * They are used in the `compose_active_planes` to convert and store a line
>>>>> + * from the src_buffer to the writeback buffer.
>>>>> + */
>>>>> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
>>>>> +				 const struct line_buffer *src_buffer, int y)
>>>>> +{
>>>>> +	int x, x_dst = frame_info->dst.x1;
>>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>> +			    src_buffer->n_pixels);
>>>>> +
>>>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>>>>> +		/*
>>>>> +		 * This sequence below is important because the format's byte order is
>>>>> +		 * in little-endian. In the case of the ARGB8888 the memory is
>>>>> +		 * organized this way:
>>>>> +		 *
>>>>> +		 * | Addr     | = blue channel
>>>>> +		 * | Addr + 1 | = green channel
>>>>> +		 * | Addr + 2 | = Red channel
>>>>> +		 * | Addr + 3 | = Alpha channel
>>>>> +		 */
>>>>> +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
>>>>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>>>>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>>>>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>>>>> +	}
>>>>> +}
>>>>> +
>>>>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
>>>>> +				 const struct line_buffer *src_buffer, int y)
>>>>> +{
>>>>> +	int x, x_dst = frame_info->dst.x1;
>>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>> +			    src_buffer->n_pixels);
>>>>> +
>>>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>>>>> +		dst_pixels[3] = (u8)0xff;
>>>>
>>>> When writing to XRGB, it's not necessary to ensure the X channel has
>>>> any sensible value. Anyone reading from XRGB must ignore that value
>>>> anyway. So why not write something wacky here, like 0xa1, that is far
>>>> enough from both 0x00 or 0xff to not be confused with them even
>>>> visually? Also not 0x7f or 0x80 which are close to half of 0xff.
>>>>
>>>> Or, you could save a whole function and just use argb_u16_to_ARGBxxxx()
>>>> instead, even for XRGB destination.
>>>
>>>
>>> Right. Maybe I could just leave the channel untouched.
> 
> Untouched may not be a good idea. Leaving anything untouched always has
> the risk of leaking information through uninitialized memory. Maybe not
> in this case because the destination is allocated by userspace already,
> but nothing beats being obviously correct.

Makes sense.

> 
> Whatever you decide here, be prepared for it becoming de-facto kernel
> UABI, because it is easy for userspace to (accidentally) rely on the
> value, no matter what you pick.

I hope to make the right decision then.

> 
> 
> Thanks,
> pq
> 
> 
>>>    
>>>>   
>>>>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>>>>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>>>>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>>>>> +	}
>>>>> +}
>>>>> +
>>>>> +plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>>>>> +{
>>>>> +	if (format == DRM_FORMAT_ARGB8888)
>>>>> +		return &ARGB8888_to_argb_u16;
>>>>> +	else if (format == DRM_FORMAT_XRGB8888)
>>>>> +		return &XRGB8888_to_argb_u16;
>>>>> +	else
>>>>> +		return NULL;
>>>>
>>>> This works for now, but when more formats are added, I'd think a switch
>>>> statement would look better.
>>>
>>> ok.
>>>    
>>>>   
>>>>> +}
>>>>> +
>>>>> +wb_format_transform_func get_wb_fmt_transform_function(u32 format)
>>>>> +{
>>>>> +	if (format == DRM_FORMAT_ARGB8888)
>>>>> +		return &argb_u16_to_ARGB8888;
>>>>> +	else if (format == DRM_FORMAT_XRGB8888)
>>>>> +		return &argb_u16_to_XRGB8888;
>>>>> +	else
>>>>> +		return NULL;
>>>>> +}

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job`
  2022-04-26  0:56         ` Igor Torrente
@ 2022-04-26  7:09           ` Pekka Paalanen
  2022-04-27  0:43             ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-26  7:09 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 4599 bytes --]

On Mon, 25 Apr 2022 21:56:12 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Hi Pekka,
> 
> On 4/25/22 04:56, Pekka Paalanen wrote:
> > On Sat, 23 Apr 2022 12:12:51 -0300
> > Igor Torrente <igormtorrente@gmail.com> wrote:
> >   
> >> Hi Pekka,
> >>
> >> On 4/20/22 08:23, Pekka Paalanen wrote:  
> >>> On Mon,  4 Apr 2022 17:45:11 -0300
> >>> Igor Torrente <igormtorrente@gmail.com> wrote:
> >>>      
> >>>> This commit is the groundwork to introduce new formats to the planes and
> >>>> writeback buffer. As part of it, a new buffer metadata field is added to
> >>>> `vkms_writeback_job`, this metadata is represented by the `vkms_composer`
> >>>> struct.  
> >>>
> >>> Hi,
> >>>
> >>> should this be talking about vkms_frame_info struct instead?  
> >>
> >> Yes it should. I will fix this. Thanks.
> >>  
> >>>      
> >>>>
> >>>> Also adds two new function pointers (`{wb,plane}_format_transform_func`)
> >>>> are defined to handle format conversion to/from internal format.
> >>>>
> >>>> These things will allow us, in the future, to have different compositing
> >>>> and wb format types.
> >>>>
> >>>> V2: Change the code to get the drm_framebuffer reference and not copy its
> >>>>       contents(Thomas Zimmermann).
> >>>> V3: Drop the refcount in the wb code(Thomas Zimmermann).
> >>>> V5: Add {wb,plane}_format_transform_func to vkms_writeback_job
> >>>>       and vkms_plane_state (Pekka Paalanen)
> >>>>
> >>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> >>>> ---
> >>>>    drivers/gpu/drm/vkms/vkms_composer.c  |  4 ++--
> >>>>    drivers/gpu/drm/vkms/vkms_drv.h       | 31 +++++++++++++++++++++------
> >>>>    drivers/gpu/drm/vkms/vkms_plane.c     | 10 ++++-----
> >>>>    drivers/gpu/drm/vkms/vkms_writeback.c | 20 ++++++++++++++---
> >>>>    4 files changed, 49 insertions(+), 16 deletions(-)

...

> >>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> >>>> index 2e6342164bef..2704cfb6904b 100644
> >>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> >>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h

...

> >>>> +
> >>>> +struct vkms_writeback_job {
> >>>> +	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
> >>>> +	struct vkms_frame_info frame_info;  
> >>>
> >>> Which frame_info is this? Should the field be called wb_frame_info?  
> >>
> >> Considering it's already in the writeback_job struct do you think this
> >> necessary?  
> > 
> > This struct has 'data' too, and that is not the wb buffer, right?  
> 
> AFAIU, no. Each plane has its own `iosys_map map[]`.
> 
> > 
> > Hmm, if it's not the wb buffer, then using DRM_FORMAT_MAX_PLANES is
> > odd...  
> 
> Yeah. I honestly don't have any clue why we have an array of `iosys_map`
> for each plane, why we only use the map[0] and why we only call
> `iosys_map_is_null` only to the `primary_composer`.
> 
> Maybe @tzimmermann or @rodrigosiqueiramelo can shed some light on these
> questions.

Yeah, those questions would be really good to figure out.

FWIW, I can tell you this: "plane" has two different meanings in the
context of KMS:

https://gitlab.freedesktop.org/pq/color-and-hdr/-/blob/main/doc/glossary.md#plane

DRM_FORMAT_MAX_PLANES refers to the number of planes (or the number of
memory buffers) for a single image (single framebuffer). Most often
with RGB there is just one plane in one memory buffer. RGB buffer could
be accompanied with e.g. a compression data buffer, so two planes,
one buffer each. Different YUV formats have different numbers of planes
from N=1 to 3, and those plane may be stored in 1 to N memory buffers
(with dmabuf handles pointing to them).

The number of planes and the number of memory buffers are often
conflated in APIs by just passing the same memory buffer multiple times
when multiple planes are stored in the same buffer (with different
offset).

The number of planes is determined by the pixel format and the format
modifier. The number of memory buffers is more... vaguely defined and
may vary sometimes.

> 
> >   
> >> In other words, what kind of misudertanding do you think can happen if
> >> this variable stay without the `wb_` prefix?
> >>
> >> I spent a few minutes trying to find a case, but nothing came to my
> >> mind.  
> > 
> > My question above is the confusion.
> > 
> > If all these members are about the wb destination buffer only, then
> > where do the inputs come from and how are they reference-counted to
> > make sure they are available when needed?  
> 
> Ok. Got it.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job`
  2022-04-26  7:09           ` Pekka Paalanen
@ 2022-04-27  0:43             ` Igor Torrente
  2022-04-27  7:31               ` Pekka Paalanen
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-27  0:43 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

On 4/26/22 04:09, Pekka Paalanen wrote:
> On Mon, 25 Apr 2022 21:56:12 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
> 
>> Hi Pekka,
>>
>> On 4/25/22 04:56, Pekka Paalanen wrote:
>>> On Sat, 23 Apr 2022 12:12:51 -0300
>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>>    
>>>> Hi Pekka,
>>>>
>>>> On 4/20/22 08:23, Pekka Paalanen wrote:
>>>>> On Mon,  4 Apr 2022 17:45:11 -0300
>>>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>>>>       
>>>>>> This commit is the groundwork to introduce new formats to the planes and
>>>>>> writeback buffer. As part of it, a new buffer metadata field is added to
>>>>>> `vkms_writeback_job`, this metadata is represented by the `vkms_composer`
>>>>>> struct.
>>>>>
>>>>> Hi,
>>>>>
>>>>> should this be talking about vkms_frame_info struct instead?
>>>>
>>>> Yes it should. I will fix this. Thanks.
>>>>   
>>>>>       
>>>>>>
>>>>>> Also adds two new function pointers (`{wb,plane}_format_transform_func`)
>>>>>> are defined to handle format conversion to/from internal format.
>>>>>>
>>>>>> These things will allow us, in the future, to have different compositing
>>>>>> and wb format types.
>>>>>>
>>>>>> V2: Change the code to get the drm_framebuffer reference and not copy its
>>>>>>        contents(Thomas Zimmermann).
>>>>>> V3: Drop the refcount in the wb code(Thomas Zimmermann).
>>>>>> V5: Add {wb,plane}_format_transform_func to vkms_writeback_job
>>>>>>        and vkms_plane_state (Pekka Paalanen)
>>>>>>
>>>>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/vkms/vkms_composer.c  |  4 ++--
>>>>>>     drivers/gpu/drm/vkms/vkms_drv.h       | 31 +++++++++++++++++++++------
>>>>>>     drivers/gpu/drm/vkms/vkms_plane.c     | 10 ++++-----
>>>>>>     drivers/gpu/drm/vkms/vkms_writeback.c | 20 ++++++++++++++---
>>>>>>     4 files changed, 49 insertions(+), 16 deletions(-)
> 
> ...
> 
>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>> index 2e6342164bef..2704cfb6904b 100644
>>>>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> 
> ...
> 
>>>>>> +
>>>>>> +struct vkms_writeback_job {
>>>>>> +	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
>>>>>> +	struct vkms_frame_info frame_info;
>>>>>
>>>>> Which frame_info is this? Should the field be called wb_frame_info?
>>>>
>>>> Considering it's already in the writeback_job struct do you think this
>>>> necessary?
>>>
>>> This struct has 'data' too, and that is not the wb buffer, right?
>>
>> AFAIU, no. Each plane has its own `iosys_map map[]`.
>>
>>>
>>> Hmm, if it's not the wb buffer, then using DRM_FORMAT_MAX_PLANES is
>>> odd...
>>
>> Yeah. I honestly don't have any clue why we have an array of `iosys_map`
>> for each plane, why we only use the map[0] and why we only call
>> `iosys_map_is_null` only to the `primary_composer`.
>>
>> Maybe @tzimmermann or @rodrigosiqueiramelo can shed some light on these
>> questions.
> 
> Yeah, those questions would be really good to figure out.
> 
> FWIW, I can tell you this: "plane" has two different meanings in the
> context of KMS:
> 
> https://gitlab.freedesktop.org/pq/color-and-hdr/-/blob/main/doc/glossary.md#plane
> 
> DRM_FORMAT_MAX_PLANES refers to the number of planes (or the number of
> memory buffers) for a single image (single framebuffer). Most often
> with RGB there is just one plane in one memory buffer. RGB buffer could
> be accompanied with e.g. a compression data buffer, so two planes,
> one buffer each. Different YUV formats have different numbers of planes
> from N=1 to 3, and those plane may be stored in 1 to N memory buffers
> (with dmabuf handles pointing to them).
> 
> The number of planes and the number of memory buffers are often
> conflated in APIs by just passing the same memory buffer multiple times
> when multiple planes are stored in the same buffer (with different
> offset).
> 
> The number of planes is determined by the pixel format and the format
> modifier. The number of memory buffers is more... vaguely defined and
> may vary sometimes.

Right. So probably this answers the first two questions. And also
probably my initial implementation of YUV420 and NV12 is wrong.

> 
>>
>>>    
>>>> In other words, what kind of misudertanding do you think can happen if
>>>> this variable stay without the `wb_` prefix?
>>>>
>>>> I spent a few minutes trying to find a case, but nothing came to my
>>>> mind.
>>>
>>> My question above is the confusion.
>>>
>>> If all these members are about the wb destination buffer only, then
>>> where do the inputs come from and how are they reference-counted to
>>> make sure they are available when needed?
>>
>> Ok. Got it.
> 
> 
> Thanks,
> pq

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 9/9] drm: vkms: Add support to the RGB565 format
  2022-04-21 10:58   ` Pekka Paalanen
@ 2022-04-27  0:53     ` Igor Torrente
  2022-04-27  7:55       ` Pekka Paalanen
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-27  0:53 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

Hi Pekka,

On 4/21/22 07:58, Pekka Paalanen wrote:
> On Mon,  4 Apr 2022 17:45:15 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
> 
>> Adds this common format to vkms.
>>
>> This commit also adds new helper macros to deal with fixed-point
>> arithmetic.
>>
>> It was done to improve the precision of the conversion to ARGB16161616
>> since the "conversion ratio" is not an integer.
>>
>> V3: Adapt the handlers to the new format introduced in patch 7 V3.
>> V5: Minor improvements
>>
>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>> ---
>>   drivers/gpu/drm/vkms/vkms_formats.c   | 70 +++++++++++++++++++++++++++
>>   drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
>>   drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
>>   3 files changed, 76 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>> index 8d913fa7dbde..4af8b295f31e 100644
>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>> @@ -5,6 +5,23 @@
>>   
>>   #include "vkms_formats.h"
>>   
>> +/* The following macros help doing fixed point arithmetic. */
>> +/*
>> + * With Fixed-Point scale 15 we have 17 and 15 bits of integer and fractional
>> + * parts respectively.
>> + *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
>> + * 31                                          0
>> + */
>> +#define FIXED_SCALE 15
> 
> I think this would usually be called a "shift" since it's used in
> bit-shifts.

Ok, I will rename this.

> 
>> +
>> +#define INT_TO_FIXED(a) ((a) << FIXED_SCALE)
>> +#define FIXED_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FIXED_SCALE))
>> +#define FIXED_DIV(a, b) ((s32)(((s64)(a) << FIXED_SCALE) / (b)))
> 
> A truncating div, ok.
> 
>> +/* This macro converts a fixed point number to int, and round half up it */
>> +#define FIXED_TO_INT_ROUND(a) (((a) + (1 << (FIXED_SCALE - 1))) >> FIXED_SCALE)
> 
> Yes.
> 
>> +/* Convert divisor and dividend to Fixed-Point and performs the division */
>> +#define INT_TO_FIXED_DIV(a, b) (FIXED_DIV(INT_TO_FIXED(a), INT_TO_FIXED(b)))
> 
> Ok, this is obvious to read, even though it's the same as FIXED_DIV()
> alone. Not sure the compiler would optimize that extra bit-shift away...
> 
> If one wanted to, it would be possible to write type-safe functions for
> these so that fixed and integer could not be mixed up.

Ok, I will move to a function.

> 
>> +
>>   static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
>>   {
>>   	return frame_info->offset + (y * frame_info->pitch)
>> @@ -112,6 +129,30 @@ static void XRGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
>>   	}
>>   }
>>   
>> +static void RGB565_to_argb_u16(struct line_buffer *stage_buffer,
>> +			       const struct vkms_frame_info *frame_info, int y)
>> +{
>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>> +	u16 *src_pixels = get_packed_src_addr(frame_info, y);
>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			       stage_buffer->n_pixels);
>> +
>> +	for (x = 0; x < x_limit; x++, src_pixels++) {
>> +		u16 rgb_565 = le16_to_cpu(*src_pixels);
>> +		int fp_r = INT_TO_FIXED((rgb_565 >> 11) & 0x1f);
>> +		int fp_g = INT_TO_FIXED((rgb_565 >> 5) & 0x3f);
>> +		int fp_b = INT_TO_FIXED(rgb_565 & 0x1f);
>> +
>> +		int fp_rb_ratio = INT_TO_FIXED_DIV(65535, 31);
>> +		int fp_g_ratio = INT_TO_FIXED_DIV(65535, 63);
> 
> These two should be outside of the loop since they are constants.
> Likely no difference for performance because the compiler is probably
> doing that already, but I think it would read better.

I will move it.

> 
>> +
>> +		out_pixels[x].a = (u16)0xffff;
>> +		out_pixels[x].r = FIXED_TO_INT_ROUND(FIXED_MUL(fp_r, fp_rb_ratio));
>> +		out_pixels[x].g = FIXED_TO_INT_ROUND(FIXED_MUL(fp_g, fp_g_ratio));
>> +		out_pixels[x].b = FIXED_TO_INT_ROUND(FIXED_MUL(fp_b, fp_rb_ratio));
> 
> Looks good.
> 
>> +	}
>> +}
>> +
>>   
>>   /*
>>    * The following  functions take an line of argb_u16 pixels from the
>> @@ -199,6 +240,31 @@ static void argb_u16_to_XRGB16161616(struct vkms_frame_info *frame_info,
>>   	}
>>   }
>>   
>> +static void argb_u16_to_RGB565(struct vkms_frame_info *frame_info,
>> +			       const struct line_buffer *src_buffer, int y)
>> +{
>> +	int x, x_dst = frame_info->dst.x1;
>> +	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			    src_buffer->n_pixels);
>> +
>> +	for (x = 0; x < x_limit; x++, dst_pixels++) {
>> +		int fp_r = INT_TO_FIXED(in_pixels[x].r);
>> +		int fp_g = INT_TO_FIXED(in_pixels[x].g);
>> +		int fp_b = INT_TO_FIXED(in_pixels[x].b);
>> +
>> +		int fp_rb_ratio = INT_TO_FIXED_DIV(65535, 31);
>> +		int fp_g_ratio = INT_TO_FIXED_DIV(65535, 63);
> 
> Move these out of the loop.
> 
>> +
>> +		u16 r = FIXED_TO_INT_ROUND(FIXED_DIV(fp_r, fp_rb_ratio));
>> +		u16 g = FIXED_TO_INT_ROUND(FIXED_DIV(fp_g, fp_g_ratio));
>> +		u16 b = FIXED_TO_INT_ROUND(FIXED_DIV(fp_b, fp_rb_ratio));
>> +
>> +		*dst_pixels = cpu_to_le16(r << 11 | g << 5 | b);
> 
> Looks good.
> 
> You are using signed variables (int, s64, s32) when negative values
> should never occur. It doesn't seem wrong, just unexpected.

I left the signal so I can reuse them in the YUV formats.

> 
> The use of int in code vs. s32 in the macros is a bit inconsistent as
> well.

Right. I think I will stick with s32 and s64 then.

> 
>> +	}
>> +}
>> +
>>   plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>>   {
>>   	if (format == DRM_FORMAT_ARGB8888)
>> @@ -209,6 +275,8 @@ plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>>   		return &ARGB16161616_to_argb_u16;
>>   	else if (format == DRM_FORMAT_XRGB16161616)
>>   		return &XRGB16161616_to_argb_u16;
>> +	else if (format == DRM_FORMAT_RGB565)
>> +		return &RGB565_to_argb_u16;
>>   	else
>>   		return NULL;
>>   }
>> @@ -223,6 +291,8 @@ wb_format_transform_func get_wb_fmt_transform_function(u32 format)
>>   		return &argb_u16_to_ARGB16161616;
>>   	else if (format == DRM_FORMAT_XRGB16161616)
>>   		return &argb_u16_to_XRGB16161616;
>> +	else if (format == DRM_FORMAT_RGB565)
>> +		return &argb_u16_to_RGB565;
> 
> Now it's starting to become clear that a switch statement would be nice.
> 
>>   	else
>>   		return NULL;
>>   }
>> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
>> index 60054a85204a..94a8e412886f 100644
>> --- a/drivers/gpu/drm/vkms/vkms_plane.c
>> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
>> @@ -14,14 +14,16 @@
>>   
>>   static const u32 vkms_formats[] = {
>>   	DRM_FORMAT_XRGB8888,
>> -	DRM_FORMAT_XRGB16161616
>> +	DRM_FORMAT_XRGB16161616,
>> +	DRM_FORMAT_RGB565
>>   };
>>   
>>   static const u32 vkms_plane_formats[] = {
>>   	DRM_FORMAT_ARGB8888,
>>   	DRM_FORMAT_XRGB8888,
>>   	DRM_FORMAT_XRGB16161616,
>> -	DRM_FORMAT_ARGB16161616
>> +	DRM_FORMAT_ARGB16161616,
>> +	DRM_FORMAT_RGB565
>>   };
>>   
>>   static struct drm_plane_state *
>> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
>> index cb63a5da9af1..98da7bee0f4b 100644
>> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
>> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
>> @@ -16,7 +16,8 @@
>>   static const u32 vkms_wb_formats[] = {
>>   	DRM_FORMAT_XRGB8888,
>>   	DRM_FORMAT_XRGB16161616,
>> -	DRM_FORMAT_ARGB16161616
>> +	DRM_FORMAT_ARGB16161616,
>> +	DRM_FORMAT_RGB565
>>   };
>>   
>>   static const struct drm_connector_funcs vkms_wb_connector_funcs = {
> 
> I wonder, would it be possible to add a unit test to make sure that
> get_plane_fmt_transform_function() or get_wb_fmt_transform_function()
> does not return NULL for any of the listed formats, respectively?
> Or is that too paranoid?

I'm not opposed to it. But I also don't think it needs to be in this 
series of patches either.

A new todo maybe?

> 
> 
> Thanks,
> pq

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-04-26  1:54           ` Igor Torrente
@ 2022-04-27  1:03             ` Igor Torrente
  2022-04-27  1:22               ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-27  1:03 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches



On 4/25/22 22:54, Igor Torrente wrote:
> Hi Pekka,
> 
> On 4/25/22 05:10, Pekka Paalanen wrote:
>> On Sat, 23 Apr 2022 15:53:20 -0300
>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>
>>> I forgot to respond some points from your review.
>>>
>>> On 4/23/22 13:04, Igor Torrente wrote:
>>>> Hi Pekka,
>>>>
>>>> On 4/20/22 09:36, Pekka Paalanen wrote:
>>>>> On Mon,  4 Apr 2022 17:45:12 -0300
>>>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>>>>    
>>>>>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>>>>>> as a color input.
>>>>>>
>>>>>> This patch refactors all the functions related to the plane composition
>>>>>> to overcome this limitation.
>>>>>>
>>>>>> A new internal format(`struct pixel`) is introduced to deal with all
>>>>>
>>>>> Hi,
>>>>>
>>>>> struct pixel_argb_u16 was added in the previous patch.
>>>>
>>>> I will fix it. Thanks!
>>
>> ...
>>
>>>>>> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
>>>>>> +				 struct vkms_crtc_state *crtc_state,
>>>>>> +				 u32 *crc32)
>>>>>>      {
>>>>>> +	int line_width, ret = 0, pixel_size = sizeof(struct pixel_argb_u16);
>>>>>> +	struct vkms_frame_info *primary_plane_info = NULL;
>>>>>> +	struct line_buffer output_buffer, stage_buffer;
>>>>>> +	struct vkms_plane_state *act_plane = NULL;
>>>>>> +	u32 wb_format;
>>>>>>      
>>>>>> +	if (WARN_ON(pixel_size != 8))
>>>>>
>>>>> Isn't there a compile-time assert macro for this? Having to actually
>>>>> run VKMS to check for this reduces the chances of finding it early.
>>>>> What's the reason for this check anyway?
>>>
>>> Yes, it exists.
>>>
>>> include/linux/build_bug.h:1:#define static_assert(expr, ...)
>>> __static_assert(expr, ##__VA_ARGS__, #expr)
>>>
>>> I didn't add it because I can imagine some people very mad if the kernel
>>> did not compile because of vkms.
>>
>> But that would mean that VKMS is broken on those platforms. You'd
>> better know which platforms VKMS is broken, so the Kconfig can stop
>> VKMS from being built there at all. Or better, fix it before anyone
>> needs VKMS there.
> 
> Right. Makes sense. I will add it then.
> 
>>
>>> This check exists so we can call `crc32_le` for the entire line instead
>>> doing it for each channel of each pixel in case `struct `pixel_argb_u16`
>>> had any gap added by the compiler between the struct fields.
>>
>> Oh the CRC computation. Good point.
>>
>> Can you add a comment about that with the check?
> 
> Yeah, np.
> 
> I will copy the explanation above :)
> 
>>
>>>>>    
>>>>>> +		return -EINVAL;
>>>>>> +
>>>>>> +	if (crtc_state->num_active_planes >= 1) {
>>>>>> +		act_plane = crtc_state->active_planes[0];
>>>>>> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>>>>>> +			primary_plane_info = act_plane->frame_info;
>>>>>
>>>>> After the next patch, do you even need the primary plane for anything
>>>>> specifically?
>>>
>>> Yeah, I will not need it anymore.
>>>
>>>>> There is the map_is_null check below, but that should be
>>>>> done on all planes in the array, right?
>>>
>>> Yes, I guess so. And I don't know why it only checks for the
>>> primary_plane TBH.
>>
>> Maybe a left-over from times when it didn't have anything but a primary
>> plane?
> 
> Maybe.
> 
> Anyway, I have added this verification to all active planes in the  next
> version.
> 
>>
>>>>>
>>>>> I suspect the next patch, or another patch in this series, should just
>>>>> delete this chunk.
>>> I should, and I will in the V6 of next patch.
>>>
>>>>
>>>>
>>>>     
>>>>>    
>>>>>>      	}
>>>>>>      
>>>>>> +	if (!primary_plane_info)
>>>>>> +		return -EINVAL;
>>>>>> +
>>>>>>      	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>>>>>>      		return -EINVAL;
>>>>>>      
>>>>>> +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
>>>>>> +		return -EINVAL;
>>>>>>      
>>>>>> +	line_width = drm_rect_width(&primary_plane_info->dst);
>>>>>> +	stage_buffer.n_pixels = line_width;
>>>>>> +	output_buffer.n_pixels = line_width;
>>>>>>      
>>>>>> +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>>>>> +	if (!stage_buffer.pixels) {
>>>>>> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
>>>>>> +		return -ENOMEM;
>>>>>> +	}
>>>>>>      
>>>>>> +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>>>>> +	if (!output_buffer.pixels) {
>>>>>> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
>>>>>> +		ret = -ENOMEM;
>>>>>> +		goto free_stage_buffer;
>>>>>> +	}
>>>>>> +
>>>>>> +	if (active_wb) {
>>>>>> +		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>>>>>> +
>>>>>> +		wb_format = wb_frame_info->fb->format->format;
>>>>>
>>>>> I don't see wb_format being used, is it?
>>>>
>>>> This is probably a leftover from the last versions. Thanks for catching
>>>> it.
>>>>     
>>>>>    
>>>>>> +		wb_frame_info->src = primary_plane_info->src;
>>>>>> +		wb_frame_info->dst = primary_plane_info->dst;
>>>>>> +	}
>>>>>> +
>>>>>> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
>>>>>> +	      &output_buffer, (s64)line_width * pixel_size);
>>>>>
>>>>> What's the (s64) doing here?
>>>>>
>>>>> Are byte sizes not usually expressed with size_t or ssize_t types, or
>>>>> is the kernel convention to use u64 and s64?
>>>>>
>>>>> This makes me suspect that pixel_offset() and friends in vkms_format.c
>>>>> are going to need fixing as well. int type overflows at 2G.
>>>>
>>>>
>>>> Yeah, I should be using size_t in all these places.
>>>>     
>>>>>    
>>>>>> +
>>>>>> +	kvfree(output_buffer.pixels);
>>>>>> +free_stage_buffer:
>>>>>> +	kvfree(stage_buffer.pixels);
>>>>>> +can
>>>>>> +	return ret;
>>>>>>      }
>>>>>>      
>>>>>>      /**
>>>>>> @@ -222,13 +204,11 @@ void vkms_composer_worker(struct work_struct *work)
>>>>>>      						struct vkms_crtc_state,
>>>>>>      						composer_work);
>>>>>>      	struct drm_crtc *crtc = crtc_state->base.crtc;
>>>>>> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>>>>>>      	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>>>>>>      	bool crc_pending, wb_pending;
>>>>>>      	u64 frame_start, frame_end;
>>>>>> +	u32 crc32 = 0;
>>>>>>      	int ret;
>>>>>>      
>>>>>>      	spin_lock_irq(&out->composer_lock);
>>>>>> @@ -248,35 +228,19 @@ void vkms_composer_worker(struct work_struct *work)
>>>>>>      	if (!crc_pending)
>>>>>>      		return;
>>>>>>      
>>>>>>      	if (wb_pending)
>>>>>> +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
>>>>>> +	else
>>>>>> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>>>>>>      
>>>>>> +	if (ret)
>>>>>>      		return;
>>>>>>      
>>>>>>      	if (wb_pending) {
>>>>>>      		drm_writeback_signal_completion(&out->wb_connector, 0);
>>>>>>      		spin_lock_irq(&out->composer_lock);
>>>>>>      		crtc_state->wb_pending = false;
>>>>>>      		spin_unlock_irq(&out->composer_lock);
>>>>>>      	}
>>>>>>      
>>>>>>      	/*
>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>> new file mode 100644
>>>>>> index 000000000000..931a61405d6a
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>> @@ -0,0 +1,151 @@
>>>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>>>> +
>>>>>> +#include <drm/drm_rect.h>
>>>>>> +#include <linux/minmax.h>
>>>>>> +
>>>>>> +#include "vkms_formats.h"
>>>>>> +
>>>>>> +static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
>>>>>> +{
>>>>>> +	return frame_info->offset + (y * frame_info->pitch)
>>>>>> +				  + (x * frame_info->cpp);
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>>>>>> + *
>>>>>> + * @frame_info: Buffer metadata
>>>>>> + * @x: The x(width) coordinate of the 2D buffer
>>>>>> + * @y: The y(Heigth) coordinate of the 2D buffercan
>>>>>> + *
>>>>>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>>>>>> + * returns the address of the first color channel.
>>>>>> + * This function assumes the channels are packed together, i.e. a color channel
>>>>>> + * comes immediately after another in the memory. And therefore, this function
>>>>>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>>>>>> + */
>>>>>> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
>>>>>> +				int x, int y)
>>>>>> +{
>>>>>> +	int offset = pixel_offset(frame_info, x, y);
>>>>>> +
>>>>>> +	return (u8 *)frame_info->map[0].vaddr + offset;
>>>>>> +}
>>>>>> +
>>>>>> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
>>>>>> +{
>>>>>> +	int x_src = frame_info->src.x1 >> 16;
>>>>>> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>>>>>> +
>>>>>> +	return packed_pixels_addr(frame_info, x_src, y_src);
>>>>>> +}
>>>>>> +
>>>>>> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>>>>>> +				 const struct vkms_frame_info *frame_info, int y)
>>>>>> +{
>>>>>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>>>>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>>>>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>>> +			       stage_buffer->n_pixels);
>>>>>> +
>>>>>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>>>>>> +		/*
>>>>>> +		 * The 257 is the "conversion ratio". This number is obtained by the
>>>>>> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
>>>>>> +		 * the best color value in a pixel format with more possibilities.
>>>>>> +		 * A similar idea applies to others RGB color conversions.
>>>>>> +		 */
>>>>>> +		out_pixels[x].a = (u16)src_pixels[3] * 257;
>>>>>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>>>>>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>>>>>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>>>>>> +	}
>>>>>> +}
>>>>>> +
>>>>>> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>>>>>> +				 const struct vkms_frame_info *frame_info, int y)
>>>>>> +{
>>>>>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>>>>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>>>>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>>> +			       stage_buffer->n_pixels);
>>>>>> +
>>>>>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>>>>>> +		out_pixels[x].a = (u16)0xffff;
>>>>>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>>>>>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>>>>>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>>>>>> +	}
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>> + * The following  functions take an line of argb_u16 pixels from the
>>>>>> + * src_buffer, convert them to a specific format, and store them in the
>>>>>> + * destination.
>>>>>> + *
>>>>>> + * They are used in the `compose_active_planes` to convert and store a line
>>>>>> + * from the src_buffer to the writeback buffer.
>>>>>> + */
>>>>>> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
>>>>>> +				 const struct line_buffer *src_buffer, int y)
>>>>>> +{
>>>>>> +	int x, x_dst = frame_info->dst.x1;
>>>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>>>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>>>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>>> +			    src_buffer->n_pixels);
>>>>>> +
>>>>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>>>>>> +		/*
>>>>>> +		 * This sequence below is important because the format's byte order is
>>>>>> +		 * in little-endian. In the case of the ARGB8888 the memory is
>>>>>> +		 * organized this way:
>>>>>> +		 *
>>>>>> +		 * | Addr     | = blue channel
>>>>>> +		 * | Addr + 1 | = green channel
>>>>>> +		 * | Addr + 2 | = Red channel
>>>>>> +		 * | Addr + 3 | = Alpha channel
>>>>>> +		 */
>>>>>> +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
>>>>>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>>>>>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>>>>>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>>>>>> +	}
>>>>>> +}
>>>>>> +
>>>>>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
>>>>>> +				 const struct line_buffer *src_buffer, int y)
>>>>>> +{
>>>>>> +	int x, x_dst = frame_info->dst.x1;
>>>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>>>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>>>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>>> +			    src_buffer->n_pixels);
>>>>>> +
>>>>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>>>>>> +		dst_pixels[3] = (u8)0xff;
>>>>>
>>>>> When writing to XRGB, it's not necessary to ensure the X channel has
>>>>> any sensible value. Anyone reading from XRGB must ignore that value
>>>>> anyway. So why not write something wacky here, like 0xa1, that is far
>>>>> enough from both 0x00 or 0xff to not be confused with them even
>>>>> visually? Also not 0x7f or 0x80 which are close to half of 0xff.
>>>>>
>>>>> Or, you could save a whole function and just use argb_u16_to_ARGBxxxx()
>>>>> instead, even for XRGB destination.
>>>>
>>>>
>>>> Right. Maybe I could just leave the channel untouched.
>>
>> Untouched may not be a good idea. Leaving anything untouched always has
>> the risk of leaking information through uninitialized memory. Maybe not
>> in this case because the destination is allocated by userspace already,
>> but nothing beats being obviously correct.
> 
> Makes sense.
> 
>>
>> Whatever you decide here, be prepared for it becoming de-facto kernel
>> UABI, because it is easy for userspace to (accidentally) rely on the
>> value, no matter what you pick.
> 
> I hope to make the right decision then.

The de-facto UABI seems to be already in place for {A, X}RGB8888.

I changed from 0xff to 0xbe and the `writeback-check-output` started to 
fail.

> 
>>
>>
>> Thanks,
>> pq
>>
>>
>>>>     
>>>>>    
>>>>>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>>>>>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>>>>>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>>>>>> +	}
>>>>>> +}
>>>>>> +
>>>>>> +plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>>>>>> +{
>>>>>> +	if (format == DRM_FORMAT_ARGB8888)
>>>>>> +		return &ARGB8888_to_argb_u16;
>>>>>> +	else if (format == DRM_FORMAT_XRGB8888)
>>>>>> +		return &XRGB8888_to_argb_u16;
>>>>>> +	else
>>>>>> +		return NULL;
>>>>>
>>>>> This works for now, but when more formats are added, I'd think a switch
>>>>> statement would look better.
>>>>
>>>> ok.
>>>>     
>>>>>    
>>>>>> +}
>>>>>> +
>>>>>> +wb_format_transform_func get_wb_fmt_transform_function(u32 format)
>>>>>> +{
>>>>>> +	if (format == DRM_FORMAT_ARGB8888)
>>>>>> +		return &argb_u16_to_ARGB8888;
>>>>>> +	else if (format == DRM_FORMAT_XRGB8888)
>>>>>> +		return &argb_u16_to_XRGB8888;
>>>>>> +	else
>>>>>> +		return NULL;
>>>>>> +}

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-04-27  1:03             ` Igor Torrente
@ 2022-04-27  1:22               ` Igor Torrente
  2022-04-27  7:43                 ` Pekka Paalanen
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-27  1:22 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches



On April 26, 2022 10:03:09 PM GMT-03:00, Igor Torrente <igormtorrente@gmail.com> wrote:
>
>
>On 4/25/22 22:54, Igor Torrente wrote:
>> Hi Pekka,
>> 
>> On 4/25/22 05:10, Pekka Paalanen wrote:
>>> On Sat, 23 Apr 2022 15:53:20 -0300
>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>> 
>>>> I forgot to respond some points from your review.
>>>> 
>>>> On 4/23/22 13:04, Igor Torrente wrote:
>>>>> Hi Pekka,
>>>>> 
>>>>> On 4/20/22 09:36, Pekka Paalanen wrote:
>>>>>> On Mon,  4 Apr 2022 17:45:12 -0300
>>>>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>>>>>    
>>>>>>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>>>>>>> as a color input.
>>>>>>> 
>>>>>>> This patch refactors all the functions related to the plane composition
>>>>>>> to overcome this limitation.
>>>>>>> 
>>>>>>> A new internal format(`struct pixel`) is introduced to deal with all
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> struct pixel_argb_u16 was added in the previous patch.
>>>>> 
>>>>> I will fix it. Thanks!
>>> 
>>> ...
>>> 
>>>>>>> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
>>>>>>> +				 struct vkms_crtc_state *crtc_state,
>>>>>>> +				 u32 *crc32)
>>>>>>>      {
>>>>>>> +	int line_width, ret = 0, pixel_size = sizeof(struct pixel_argb_u16);
>>>>>>> +	struct vkms_frame_info *primary_plane_info = NULL;
>>>>>>> +	struct line_buffer output_buffer, stage_buffer;
>>>>>>> +	struct vkms_plane_state *act_plane = NULL;
>>>>>>> +	u32 wb_format;
>>>>>>>      +	if (WARN_ON(pixel_size != 8))
>>>>>> 
>>>>>> Isn't there a compile-time assert macro for this? Having to actually
>>>>>> run VKMS to check for this reduces the chances of finding it early.
>>>>>> What's the reason for this check anyway?
>>>> 
>>>> Yes, it exists.
>>>> 
>>>> include/linux/build_bug.h:1:#define static_assert(expr, ...)
>>>> __static_assert(expr, ##__VA_ARGS__, #expr)
>>>> 
>>>> I didn't add it because I can imagine some people very mad if the kernel
>>>> did not compile because of vkms.
>>> 
>>> But that would mean that VKMS is broken on those platforms. You'd
>>> better know which platforms VKMS is broken, so the Kconfig can stop
>>> VKMS from being built there at all. Or better, fix it before anyone
>>> needs VKMS there.
>> 
>> Right. Makes sense. I will add it then.
>> 
>>> 
>>>> This check exists so we can call `crc32_le` for the entire line instead
>>>> doing it for each channel of each pixel in case `struct `pixel_argb_u16`
>>>> had any gap added by the compiler between the struct fields.
>>> 
>>> Oh the CRC computation. Good point.
>>> 
>>> Can you add a comment about that with the check?
>> 
>> Yeah, np.
>> 
>> I will copy the explanation above :)
>> 
>>> 
>>>>>>    
>>>>>>> +		return -EINVAL;
>>>>>>> +
>>>>>>> +	if (crtc_state->num_active_planes >= 1) {
>>>>>>> +		act_plane = crtc_state->active_planes[0];
>>>>>>> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>>>>>>> +			primary_plane_info = act_plane->frame_info;
>>>>>> 
>>>>>> After the next patch, do you even need the primary plane for anything
>>>>>> specifically?
>>>> 
>>>> Yeah, I will not need it anymore.
>>>> 
>>>>>> There is the map_is_null check below, but that should be
>>>>>> done on all planes in the array, right?
>>>> 
>>>> Yes, I guess so. And I don't know why it only checks for the
>>>> primary_plane TBH.
>>> 
>>> Maybe a left-over from times when it didn't have anything but a primary
>>> plane?
>> 
>> Maybe.
>> 
>> Anyway, I have added this verification to all active planes in the  next
>> version.
>> 
>>> 
>>>>>> 
>>>>>> I suspect the next patch, or another patch in this series, should just
>>>>>> delete this chunk.
>>>> I should, and I will in the V6 of next patch.
>>>> 
>>>>> 
>>>>> 
>>>>>     
>>>>>>    
>>>>>>>      	}
>>>>>>>      +	if (!primary_plane_info)
>>>>>>> +		return -EINVAL;
>>>>>>> +
>>>>>>>      	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>>>>>>>      		return -EINVAL;
>>>>>>>      +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
>>>>>>> +		return -EINVAL;
>>>>>>>      +	line_width = drm_rect_width(&primary_plane_info->dst);
>>>>>>> +	stage_buffer.n_pixels = line_width;
>>>>>>> +	output_buffer.n_pixels = line_width;
>>>>>>>      +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>>>>>> +	if (!stage_buffer.pixels) {
>>>>>>> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
>>>>>>> +		return -ENOMEM;
>>>>>>> +	}
>>>>>>>      +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>>>>>> +	if (!output_buffer.pixels) {
>>>>>>> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
>>>>>>> +		ret = -ENOMEM;
>>>>>>> +		goto free_stage_buffer;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	if (active_wb) {
>>>>>>> +		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>>>>>>> +
>>>>>>> +		wb_format = wb_frame_info->fb->format->format;
>>>>>> 
>>>>>> I don't see wb_format being used, is it?
>>>>> 
>>>>> This is probably a leftover from the last versions. Thanks for catching
>>>>> it.
>>>>>     
>>>>>>    
>>>>>>> +		wb_frame_info->src = primary_plane_info->src;
>>>>>>> +		wb_frame_info->dst = primary_plane_info->dst;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
>>>>>>> +	      &output_buffer, (s64)line_width * pixel_size);
>>>>>> 
>>>>>> What's the (s64) doing here?
>>>>>> 
>>>>>> Are byte sizes not usually expressed with size_t or ssize_t types, or
>>>>>> is the kernel convention to use u64 and s64?
>>>>>> 
>>>>>> This makes me suspect that pixel_offset() and friends in vkms_format.c
>>>>>> are going to need fixing as well. int type overflows at 2G.
>>>>> 
>>>>> 
>>>>> Yeah, I should be using size_t in all these places.
>>>>>     
>>>>>>    
>>>>>>> +
>>>>>>> +	kvfree(output_buffer.pixels);
>>>>>>> +free_stage_buffer:
>>>>>>> +	kvfree(stage_buffer.pixels);
>>>>>>> +can
>>>>>>> +	return ret;
>>>>>>>      }
>>>>>>>           /**
>>>>>>> @@ -222,13 +204,11 @@ void vkms_composer_worker(struct work_struct *work)
>>>>>>>      						struct vkms_crtc_state,
>>>>>>>      						composer_work);
>>>>>>>      	struct drm_crtc *crtc = crtc_state->base.crtc;
>>>>>>> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>>>>>>>      	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>>>>>>>      	bool crc_pending, wb_pending;
>>>>>>>      	u64 frame_start, frame_end;
>>>>>>> +	u32 crc32 = 0;
>>>>>>>      	int ret;
>>>>>>>           	spin_lock_irq(&out->composer_lock);
>>>>>>> @@ -248,35 +228,19 @@ void vkms_composer_worker(struct work_struct *work)
>>>>>>>      	if (!crc_pending)
>>>>>>>      		return;
>>>>>>>           	if (wb_pending)
>>>>>>> +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
>>>>>>> +	else
>>>>>>> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>>>>>>>      +	if (ret)
>>>>>>>      		return;
>>>>>>>           	if (wb_pending) {
>>>>>>>      		drm_writeback_signal_completion(&out->wb_connector, 0);
>>>>>>>      		spin_lock_irq(&out->composer_lock);
>>>>>>>      		crtc_state->wb_pending = false;
>>>>>>>      		spin_unlock_irq(&out->composer_lock);
>>>>>>>      	}
>>>>>>>           	/*
>>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>>> new file mode 100644
>>>>>>> index 000000000000..931a61405d6a
>>>>>>> --- /dev/null
>>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>>> @@ -0,0 +1,151 @@
>>>>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>>>>> +
>>>>>>> +#include <drm/drm_rect.h>
>>>>>>> +#include <linux/minmax.h>
>>>>>>> +
>>>>>>> +#include "vkms_formats.h"
>>>>>>> +
>>>>>>> +static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
>>>>>>> +{
>>>>>>> +	return frame_info->offset + (y * frame_info->pitch)
>>>>>>> +				  + (x * frame_info->cpp);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>>>>>>> + *
>>>>>>> + * @frame_info: Buffer metadata
>>>>>>> + * @x: The x(width) coordinate of the 2D buffer
>>>>>>> + * @y: The y(Heigth) coordinate of the 2D buffercan
>>>>>>> + *
>>>>>>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>>>>>>> + * returns the address of the first color channel.
>>>>>>> + * This function assumes the channels are packed together, i.e. a color channel
>>>>>>> + * comes immediately after another in the memory. And therefore, this function
>>>>>>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>>>>>>> + */
>>>>>>> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
>>>>>>> +				int x, int y)
>>>>>>> +{
>>>>>>> +	int offset = pixel_offset(frame_info, x, y);
>>>>>>> +
>>>>>>> +	return (u8 *)frame_info->map[0].vaddr + offset;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
>>>>>>> +{
>>>>>>> +	int x_src = frame_info->src.x1 >> 16;
>>>>>>> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>>>>>>> +
>>>>>>> +	return packed_pixels_addr(frame_info, x_src, y_src);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>>>>>>> +				 const struct vkms_frame_info *frame_info, int y)
>>>>>>> +{
>>>>>>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>>>>>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>>>>>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>>>> +			       stage_buffer->n_pixels);
>>>>>>> +
>>>>>>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>>>>>>> +		/*
>>>>>>> +		 * The 257 is the "conversion ratio". This number is obtained by the
>>>>>>> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
>>>>>>> +		 * the best color value in a pixel format with more possibilities.
>>>>>>> +		 * A similar idea applies to others RGB color conversions.
>>>>>>> +		 */
>>>>>>> +		out_pixels[x].a = (u16)src_pixels[3] * 257;
>>>>>>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>>>>>>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>>>>>>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>>>>>>> +	}
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>>>>>>> +				 const struct vkms_frame_info *frame_info, int y)
>>>>>>> +{
>>>>>>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>>>>>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>>>>>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>>>> +			       stage_buffer->n_pixels);
>>>>>>> +
>>>>>>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>>>>>>> +		out_pixels[x].a = (u16)0xffff;
>>>>>>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>>>>>>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>>>>>>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>>>>>>> +	}
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * The following  functions take an line of argb_u16 pixels from the
>>>>>>> + * src_buffer, convert them to a specific format, and store them in the
>>>>>>> + * destination.
>>>>>>> + *
>>>>>>> + * They are used in the `compose_active_planes` to convert and store a line
>>>>>>> + * from the src_buffer to the writeback buffer.
>>>>>>> + */
>>>>>>> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
>>>>>>> +				 const struct line_buffer *src_buffer, int y)
>>>>>>> +{
>>>>>>> +	int x, x_dst = frame_info->dst.x1;
>>>>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>>>>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>>>>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>>>> +			    src_buffer->n_pixels);
>>>>>>> +
>>>>>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>>>>>>> +		/*
>>>>>>> +		 * This sequence below is important because the format's byte order is
>>>>>>> +		 * in little-endian. In the case of the ARGB8888 the memory is
>>>>>>> +		 * organized this way:
>>>>>>> +		 *
>>>>>>> +		 * | Addr     | = blue channel
>>>>>>> +		 * | Addr + 1 | = green channel
>>>>>>> +		 * | Addr + 2 | = Red channel
>>>>>>> +		 * | Addr + 3 | = Alpha channel
>>>>>>> +		 */
>>>>>>> +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
>>>>>>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>>>>>>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>>>>>>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>>>>>>> +	}
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
>>>>>>> +				 const struct line_buffer *src_buffer, int y)
>>>>>>> +{
>>>>>>> +	int x, x_dst = frame_info->dst.x1;
>>>>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>>>>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>>>>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>>>> +			    src_buffer->n_pixels);
>>>>>>> +
>>>>>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>>>>>>> +		dst_pixels[3] = (u8)0xff;
>>>>>> 
>>>>>> When writing to XRGB, it's not necessary to ensure the X channel has
>>>>>> any sensible value. Anyone reading from XRGB must ignore that value
>>>>>> anyway. So why not write something wacky here, like 0xa1, that is far
>>>>>> enough from both 0x00 or 0xff to not be confused with them even
>>>>>> visually? Also not 0x7f or 0x80 which are close to half of 0xff.
>>>>>> 
>>>>>> Or, you could save a whole function and just use argb_u16_to_ARGBxxxx()
>>>>>> instead, even for XRGB destination.
>>>>> 
>>>>> 
>>>>> Right. Maybe I could just leave the channel untouched.
>>> 
>>> Untouched may not be a good idea. Leaving anything untouched always has
>>> the risk of leaking information through uninitialized memory. Maybe not
>>> in this case because the destination is allocated by userspace already,
>>> but nothing beats being obviously correct.
>> 
>> Makes sense.
>> 
>>> 
>>> Whatever you decide here, be prepared for it becoming de-facto kernel
>>> UABI, because it is easy for userspace to (accidentally) rely on the
>>> value, no matter what you pick.
>> 
>> I hope to make the right decision then.
>
>The de-facto UABI seems to be already in place for {A, X}RGB8888.

"Only XRGB_8888

>
>I changed from 0xff to 0xbe and the `writeback-check-output` started to fail.
>
>> 
>>> 
>>> 
>>> Thanks,
>>> pq
>>> 
>>> 
>>>>>     
>>>>>>    
>>>>>>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>>>>>>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>>>>>>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>>>>>>> +	}
>>>>>>> +}
>>>>>>> +
>>>>>>> +plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>>>>>>> +{
>>>>>>> +	if (format == DRM_FORMAT_ARGB8888)
>>>>>>> +		return &ARGB8888_to_argb_u16;
>>>>>>> +	else if (format == DRM_FORMAT_XRGB8888)
>>>>>>> +		return &XRGB8888_to_argb_u16;
>>>>>>> +	else
>>>>>>> +		return NULL;
>>>>>> 
>>>>>> This works for now, but when more formats are added, I'd think a switch
>>>>>> statement would look better.
>>>>> 
>>>>> ok.
>>>>>     
>>>>>>    
>>>>>>> +}
>>>>>>> +
>>>>>>> +wb_format_transform_func get_wb_fmt_transform_function(u32 format)
>>>>>>> +{
>>>>>>> +	if (format == DRM_FORMAT_ARGB8888)
>>>>>>> +		return &argb_u16_to_ARGB8888;
>>>>>>> +	else if (format == DRM_FORMAT_XRGB8888)
>>>>>>> +		return &argb_u16_to_XRGB8888;
>>>>>>> +	else
>>>>>>> +		return NULL;
>>>>>>> +}

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job`
  2022-04-27  0:43             ` Igor Torrente
@ 2022-04-27  7:31               ` Pekka Paalanen
  0 siblings, 0 replies; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-27  7:31 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 5156 bytes --]

On Tue, 26 Apr 2022 21:43:12 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> On 4/26/22 04:09, Pekka Paalanen wrote:
> > On Mon, 25 Apr 2022 21:56:12 -0300
> > Igor Torrente <igormtorrente@gmail.com> wrote:
> >   
> >> Hi Pekka,
> >>
> >> On 4/25/22 04:56, Pekka Paalanen wrote:  
> >>> On Sat, 23 Apr 2022 12:12:51 -0300
> >>> Igor Torrente <igormtorrente@gmail.com> wrote:
> >>>      
> >>>> Hi Pekka,
> >>>>
> >>>> On 4/20/22 08:23, Pekka Paalanen wrote:  
> >>>>> On Mon,  4 Apr 2022 17:45:11 -0300
> >>>>> Igor Torrente <igormtorrente@gmail.com> wrote:
> >>>>>         
> >>>>>> This commit is the groundwork to introduce new formats to the planes and
> >>>>>> writeback buffer. As part of it, a new buffer metadata field is added to
> >>>>>> `vkms_writeback_job`, this metadata is represented by the `vkms_composer`
> >>>>>> struct.  
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> should this be talking about vkms_frame_info struct instead?  
> >>>>
> >>>> Yes it should. I will fix this. Thanks.
> >>>>     
> >>>>>         
> >>>>>>
> >>>>>> Also adds two new function pointers (`{wb,plane}_format_transform_func`)
> >>>>>> are defined to handle format conversion to/from internal format.
> >>>>>>
> >>>>>> These things will allow us, in the future, to have different compositing
> >>>>>> and wb format types.
> >>>>>>
> >>>>>> V2: Change the code to get the drm_framebuffer reference and not copy its
> >>>>>>        contents(Thomas Zimmermann).
> >>>>>> V3: Drop the refcount in the wb code(Thomas Zimmermann).
> >>>>>> V5: Add {wb,plane}_format_transform_func to vkms_writeback_job
> >>>>>>        and vkms_plane_state (Pekka Paalanen)
> >>>>>>
> >>>>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> >>>>>> ---
> >>>>>>     drivers/gpu/drm/vkms/vkms_composer.c  |  4 ++--
> >>>>>>     drivers/gpu/drm/vkms/vkms_drv.h       | 31 +++++++++++++++++++++------
> >>>>>>     drivers/gpu/drm/vkms/vkms_plane.c     | 10 ++++-----
> >>>>>>     drivers/gpu/drm/vkms/vkms_writeback.c | 20 ++++++++++++++---
> >>>>>>     4 files changed, 49 insertions(+), 16 deletions(-)  
> > 
> > ...
> >   
> >>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> >>>>>> index 2e6342164bef..2704cfb6904b 100644
> >>>>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> >>>>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h  
> > 
> > ...
> >   
> >>>>>> +
> >>>>>> +struct vkms_writeback_job {
> >>>>>> +	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
> >>>>>> +	struct vkms_frame_info frame_info;  
> >>>>>
> >>>>> Which frame_info is this? Should the field be called wb_frame_info?  
> >>>>
> >>>> Considering it's already in the writeback_job struct do you think this
> >>>> necessary?  
> >>>
> >>> This struct has 'data' too, and that is not the wb buffer, right?  
> >>
> >> AFAIU, no. Each plane has its own `iosys_map map[]`.
> >>  
> >>>
> >>> Hmm, if it's not the wb buffer, then using DRM_FORMAT_MAX_PLANES is
> >>> odd...  
> >>
> >> Yeah. I honestly don't have any clue why we have an array of `iosys_map`
> >> for each plane, why we only use the map[0] and why we only call
> >> `iosys_map_is_null` only to the `primary_composer`.
> >>
> >> Maybe @tzimmermann or @rodrigosiqueiramelo can shed some light on these
> >> questions.  
> > 
> > Yeah, those questions would be really good to figure out.
> > 
> > FWIW, I can tell you this: "plane" has two different meanings in the
> > context of KMS:
> > 
> > https://gitlab.freedesktop.org/pq/color-and-hdr/-/blob/main/doc/glossary.md#plane
> > 
> > DRM_FORMAT_MAX_PLANES refers to the number of planes (or the number of
> > memory buffers) for a single image (single framebuffer). Most often
> > with RGB there is just one plane in one memory buffer. RGB buffer could
> > be accompanied with e.g. a compression data buffer, so two planes,
> > one buffer each. Different YUV formats have different numbers of planes
> > from N=1 to 3, and those plane may be stored in 1 to N memory buffers
> > (with dmabuf handles pointing to them).
> > 
> > The number of planes and the number of memory buffers are often
> > conflated in APIs by just passing the same memory buffer multiple times
> > when multiple planes are stored in the same buffer (with different
> > offset).
> > 
> > The number of planes is determined by the pixel format and the format
> > modifier. The number of memory buffers is more... vaguely defined and
> > may vary sometimes.  
> 
> Right. So probably this answers the first two questions. And also
> probably my initial implementation of YUV420 and NV12 is wrong.

If I'm reading the code right, it's using the maximum number of image
planes (four) as the maximum number of KMS planes (chosen by the
driver). IOW, it confusing the two meanings of "plane" which have
nothing to do with each other.

Assuming that there is one data[] element for each KMS plane created by
VKMS. That makes a further assumption that each KMS plane framebuffer
has only one image plane. I think. Which they do when you are limited
to RGB formats.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-04-27  1:22               ` Igor Torrente
@ 2022-04-27  7:43                 ` Pekka Paalanen
  2022-04-28  0:44                   ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-27  7:43 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 3197 bytes --]

On Tue, 26 Apr 2022 22:22:22 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> On April 26, 2022 10:03:09 PM GMT-03:00, Igor Torrente <igormtorrente@gmail.com> wrote:
> >
> >
> >On 4/25/22 22:54, Igor Torrente wrote:  
> >> Hi Pekka,
> >> 
> >> On 4/25/22 05:10, Pekka Paalanen wrote:  
> >>> On Sat, 23 Apr 2022 15:53:20 -0300
> >>> Igor Torrente <igormtorrente@gmail.com> wrote:
> >>>   

...

> >>>>>>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
> >>>>>>> +				 const struct line_buffer *src_buffer, int y)
> >>>>>>> +{
> >>>>>>> +	int x, x_dst = frame_info->dst.x1;
> >>>>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> >>>>>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> >>>>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> >>>>>>> +			    src_buffer->n_pixels);
> >>>>>>> +
> >>>>>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> >>>>>>> +		dst_pixels[3] = (u8)0xff;  
> >>>>>> 
> >>>>>> When writing to XRGB, it's not necessary to ensure the X channel has
> >>>>>> any sensible value. Anyone reading from XRGB must ignore that value
> >>>>>> anyway. So why not write something wacky here, like 0xa1, that is far
> >>>>>> enough from both 0x00 or 0xff to not be confused with them even
> >>>>>> visually? Also not 0x7f or 0x80 which are close to half of 0xff.
> >>>>>> 
> >>>>>> Or, you could save a whole function and just use argb_u16_to_ARGBxxxx()
> >>>>>> instead, even for XRGB destination.  
> >>>>> 
> >>>>> 
> >>>>> Right. Maybe I could just leave the channel untouched.  
> >>> 
> >>> Untouched may not be a good idea. Leaving anything untouched always has
> >>> the risk of leaking information through uninitialized memory. Maybe not
> >>> in this case because the destination is allocated by userspace already,
> >>> but nothing beats being obviously correct.  
> >> 
> >> Makes sense.
> >>   
> >>> 
> >>> Whatever you decide here, be prepared for it becoming de-facto kernel
> >>> UABI, because it is easy for userspace to (accidentally) rely on the
> >>> value, no matter what you pick.  
> >> 
> >> I hope to make the right decision then.  
> >
> >The de-facto UABI seems to be already in place for {A, X}RGB8888.  
> 
> "Only XRGB_8888

If that's only IGT, then you should raise an issue with IGT about this,
to figure out if IGT is wrong by accident or if it is deliberate, and
are we stuck with it.

This is why I would want to fill X with garbage, to make the
expectations clear before the "obvious and logical constant value for X"
makes a mess by making XRGB indistinguishable from ARGB. Then the next
question is, do we need a special function to write out XRGB values, or
can we simply re-use the ARGB function.

Do the tests expect X channel to be filled with 0xff or with the actual
A values? This question will matter when all planes have ARGB
framebuffers and no background color. Then even more questions will
arise about what should actually happen with A values (blending
equation).

> 
> >
> >I changed from 0xff to 0xbe and the `writeback-check-output` started to fail.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 9/9] drm: vkms: Add support to the RGB565 format
  2022-04-27  0:53     ` Igor Torrente
@ 2022-04-27  7:55       ` Pekka Paalanen
  2022-05-06 23:05         ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-27  7:55 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 5450 bytes --]

On Tue, 26 Apr 2022 21:53:19 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Hi Pekka,
> 
> On 4/21/22 07:58, Pekka Paalanen wrote:
> > On Mon,  4 Apr 2022 17:45:15 -0300
> > Igor Torrente <igormtorrente@gmail.com> wrote:
> >   
> >> Adds this common format to vkms.
> >>
> >> This commit also adds new helper macros to deal with fixed-point
> >> arithmetic.
> >>
> >> It was done to improve the precision of the conversion to ARGB16161616
> >> since the "conversion ratio" is not an integer.
> >>
> >> V3: Adapt the handlers to the new format introduced in patch 7 V3.
> >> V5: Minor improvements
> >>
> >> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> >> ---
> >>   drivers/gpu/drm/vkms/vkms_formats.c   | 70 +++++++++++++++++++++++++++
> >>   drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
> >>   drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
> >>   3 files changed, 76 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> >> index 8d913fa7dbde..4af8b295f31e 100644
> >> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> >> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> >> @@ -5,6 +5,23 @@
> >>   
> >>   #include "vkms_formats.h"
> >>   
> >> +/* The following macros help doing fixed point arithmetic. */
> >> +/*
> >> + * With Fixed-Point scale 15 we have 17 and 15 bits of integer and fractional
> >> + * parts respectively.
> >> + *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
> >> + * 31                                          0
> >> + */
> >> +#define FIXED_SCALE 15  
> > 
> > I think this would usually be called a "shift" since it's used in
> > bit-shifts.  
> 
> Ok, I will rename this.
> 
> >   
> >> +
> >> +#define INT_TO_FIXED(a) ((a) << FIXED_SCALE)
> >> +#define FIXED_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FIXED_SCALE))
> >> +#define FIXED_DIV(a, b) ((s32)(((s64)(a) << FIXED_SCALE) / (b)))  
> > 
> > A truncating div, ok.
> >   
> >> +/* This macro converts a fixed point number to int, and round half up it */
> >> +#define FIXED_TO_INT_ROUND(a) (((a) + (1 << (FIXED_SCALE - 1))) >> FIXED_SCALE)  
> > 
> > Yes.
> >   
> >> +/* Convert divisor and dividend to Fixed-Point and performs the division */
> >> +#define INT_TO_FIXED_DIV(a, b) (FIXED_DIV(INT_TO_FIXED(a), INT_TO_FIXED(b)))  
> > 
> > Ok, this is obvious to read, even though it's the same as FIXED_DIV()
> > alone. Not sure the compiler would optimize that extra bit-shift away...
> > 
> > If one wanted to, it would be possible to write type-safe functions for
> > these so that fixed and integer could not be mixed up.  
> 
> Ok, I will move to a function.

That's not all.

If you want it type-safe, then you need something like

struct vkms_fixed_point {
	s32 value;
};

And use `struct vkms_fixed_point` (by value) everywhere where you pass
a fixed point value, and never as a plain s32 type. Then it will be
impossible to do incorrect arithmetic or conversions by accident on
fixed point values.

Is it worth it? I don't know, since it's limited into this one file.

A simple 'typedef s32 vkms_fixed_point' does not work, because it does
not prevent computing with vkms_fixed_point as if it was just a normal
s32. Using a struct prevents that.

I wonder if the kernel doesn't already have something like this
available in general...

> >> +		u16 r = FIXED_TO_INT_ROUND(FIXED_DIV(fp_r, fp_rb_ratio));
> >> +		u16 g = FIXED_TO_INT_ROUND(FIXED_DIV(fp_g, fp_g_ratio));
> >> +		u16 b = FIXED_TO_INT_ROUND(FIXED_DIV(fp_b, fp_rb_ratio));
> >> +
> >> +		*dst_pixels = cpu_to_le16(r << 11 | g << 5 | b);  
> > 
> > Looks good.
> > 
> > You are using signed variables (int, s64, s32) when negative values
> > should never occur. It doesn't seem wrong, just unexpected.  
> 
> I left the signal so I can reuse them in the YUV formats.

Good point.

> 
> > 
> > The use of int in code vs. s32 in the macros is a bit inconsistent as
> > well.  
> 
> Right. I think I will stick with s32 and s64 then.

...

> >> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> >> index cb63a5da9af1..98da7bee0f4b 100644
> >> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> >> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> >> @@ -16,7 +16,8 @@
> >>   static const u32 vkms_wb_formats[] = {
> >>   	DRM_FORMAT_XRGB8888,
> >>   	DRM_FORMAT_XRGB16161616,
> >> -	DRM_FORMAT_ARGB16161616
> >> +	DRM_FORMAT_ARGB16161616,
> >> +	DRM_FORMAT_RGB565
> >>   };
> >>   
> >>   static const struct drm_connector_funcs vkms_wb_connector_funcs = {  
> > 
> > I wonder, would it be possible to add a unit test to make sure that
> > get_plane_fmt_transform_function() or get_wb_fmt_transform_function()
> > does not return NULL for any of the listed formats, respectively?
> > Or is that too paranoid?  
> 
> I'm not opposed to it. But I also don't think it needs to be in this 
> series of patches either.
> 
> A new todo maybe?

If it's a good thing, then it belongs in this series, because those
function getters are introduced in this series, opening the door for
the mistakes that the tests would prevent. I don't mean IGT tests but
kernel internal tests. I think there is a unit test framework?

You really should get a kernel maintainer's opinion on these questions,
as I am not a kernel developer.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-04-27  7:43                 ` Pekka Paalanen
@ 2022-04-28  0:44                   ` Igor Torrente
  2022-04-29 12:31                     ` Pekka Paalanen
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-04-28  0:44 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches



On 4/27/22 04:43, Pekka Paalanen wrote:
> On Tue, 26 Apr 2022 22:22:22 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
> 
>> On April 26, 2022 10:03:09 PM GMT-03:00, Igor Torrente <igormtorrente@gmail.com> wrote:
>>>
>>>
>>> On 4/25/22 22:54, Igor Torrente wrote:
>>>> Hi Pekka,
>>>>
>>>> On 4/25/22 05:10, Pekka Paalanen wrote:
>>>>> On Sat, 23 Apr 2022 15:53:20 -0300
>>>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>>>>    
> 
> ...
> 
>>>>>>>>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
>>>>>>>>> +				 const struct line_buffer *src_buffer, int y)
>>>>>>>>> +{
>>>>>>>>> +	int x, x_dst = frame_info->dst.x1;
>>>>>>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>>>>>>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>>>>>>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>>>>>>> +			    src_buffer->n_pixels);
>>>>>>>>> +
>>>>>>>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>>>>>>>>> +		dst_pixels[3] = (u8)0xff;
>>>>>>>>
>>>>>>>> When writing to XRGB, it's not necessary to ensure the X channel has
>>>>>>>> any sensible value. Anyone reading from XRGB must ignore that value
>>>>>>>> anyway. So why not write something wacky here, like 0xa1, that is far
>>>>>>>> enough from both 0x00 or 0xff to not be confused with them even
>>>>>>>> visually? Also not 0x7f or 0x80 which are close to half of 0xff.
>>>>>>>>
>>>>>>>> Or, you could save a whole function and just use argb_u16_to_ARGBxxxx()
>>>>>>>> instead, even for XRGB destination.
>>>>>>>
>>>>>>>
>>>>>>> Right. Maybe I could just leave the channel untouched.
>>>>>
>>>>> Untouched may not be a good idea. Leaving anything untouched always has
>>>>> the risk of leaking information through uninitialized memory. Maybe not
>>>>> in this case because the destination is allocated by userspace already,
>>>>> but nothing beats being obviously correct.
>>>>
>>>> Makes sense.
>>>>    
>>>>>
>>>>> Whatever you decide here, be prepared for it becoming de-facto kernel
>>>>> UABI, because it is easy for userspace to (accidentally) rely on the
>>>>> value, no matter what you pick.
>>>>
>>>> I hope to make the right decision then.
>>>
>>> The de-facto UABI seems to be already in place for {A, X}RGB8888.
>>
>> "Only XRGB_8888
> 
> If that's only IGT, then you should raise an issue with IGT about this,
> to figure out if IGT is wrong by accident or if it is deliberate, and
> are we stuck with it.
> 
> This is why I would want to fill X with garbage, to make the
> expectations clear before the "obvious and logical constant value for X"
> makes a mess by making XRGB indistinguishable from ARGB. Then the next
> question is, do we need a special function to write out XRGB values, or
> can we simply re-use the ARGB function.
> 
> Do the tests expect X channel to be filled with 0xff or with the actual
> A values? This question will matter when all planes have ARGB
> framebuffers and no background color. Then even more questions will
> arise about what should actually happen with A values (blending
> equation).

I dig into the igt code a little bit and found that it's waiting for the 
channel to not be changed.
It fills all the pixels in the line with a value and calculates the CRC 
of the entire buffer, including the alpha.

I will crate an issue asking if this is intended.

> 
>>
>>>
>>> I changed from 0xff to 0xbe and the `writeback-check-output` started to fail.
> 
> 
> Thanks,
> pq

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-04-28  0:44                   ` Igor Torrente
@ 2022-04-29 12:31                     ` Pekka Paalanen
  0 siblings, 0 replies; 44+ messages in thread
From: Pekka Paalanen @ 2022-04-29 12:31 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 3822 bytes --]

On Wed, 27 Apr 2022 21:44:34 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> On 4/27/22 04:43, Pekka Paalanen wrote:
> > On Tue, 26 Apr 2022 22:22:22 -0300
> > Igor Torrente <igormtorrente@gmail.com> wrote:
> >   
> >> On April 26, 2022 10:03:09 PM GMT-03:00, Igor Torrente <igormtorrente@gmail.com> wrote:  
> >>>
> >>>
> >>> On 4/25/22 22:54, Igor Torrente wrote:  
> >>>> Hi Pekka,
> >>>>
> >>>> On 4/25/22 05:10, Pekka Paalanen wrote:  
> >>>>> On Sat, 23 Apr 2022 15:53:20 -0300
> >>>>> Igor Torrente <igormtorrente@gmail.com> wrote:
> >>>>>      
> > 
> > ...
> >   
> >>>>>>>>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
> >>>>>>>>> +				 const struct line_buffer *src_buffer, int y)
> >>>>>>>>> +{
> >>>>>>>>> +	int x, x_dst = frame_info->dst.x1;
> >>>>>>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> >>>>>>>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> >>>>>>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> >>>>>>>>> +			    src_buffer->n_pixels);
> >>>>>>>>> +
> >>>>>>>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> >>>>>>>>> +		dst_pixels[3] = (u8)0xff;  
> >>>>>>>>
> >>>>>>>> When writing to XRGB, it's not necessary to ensure the X channel has
> >>>>>>>> any sensible value. Anyone reading from XRGB must ignore that value
> >>>>>>>> anyway. So why not write something wacky here, like 0xa1, that is far
> >>>>>>>> enough from both 0x00 or 0xff to not be confused with them even
> >>>>>>>> visually? Also not 0x7f or 0x80 which are close to half of 0xff.
> >>>>>>>>
> >>>>>>>> Or, you could save a whole function and just use argb_u16_to_ARGBxxxx()
> >>>>>>>> instead, even for XRGB destination.  
> >>>>>>>
> >>>>>>>
> >>>>>>> Right. Maybe I could just leave the channel untouched.  
> >>>>>
> >>>>> Untouched may not be a good idea. Leaving anything untouched always has
> >>>>> the risk of leaking information through uninitialized memory. Maybe not
> >>>>> in this case because the destination is allocated by userspace already,
> >>>>> but nothing beats being obviously correct.  
> >>>>
> >>>> Makes sense.
> >>>>      
> >>>>>
> >>>>> Whatever you decide here, be prepared for it becoming de-facto kernel
> >>>>> UABI, because it is easy for userspace to (accidentally) rely on the
> >>>>> value, no matter what you pick.  
> >>>>
> >>>> I hope to make the right decision then.  
> >>>
> >>> The de-facto UABI seems to be already in place for {A, X}RGB8888.  
> >>
> >> "Only XRGB_8888  
> > 
> > If that's only IGT, then you should raise an issue with IGT about this,
> > to figure out if IGT is wrong by accident or if it is deliberate, and
> > are we stuck with it.
> > 
> > This is why I would want to fill X with garbage, to make the
> > expectations clear before the "obvious and logical constant value for X"
> > makes a mess by making XRGB indistinguishable from ARGB. Then the next
> > question is, do we need a special function to write out XRGB values, or
> > can we simply re-use the ARGB function.
> > 
> > Do the tests expect X channel to be filled with 0xff or with the actual
> > A values? This question will matter when all planes have ARGB
> > framebuffers and no background color. Then even more questions will
> > arise about what should actually happen with A values (blending
> > equation).  
> 
> I dig into the igt code a little bit and found that it's waiting for the 
> channel to not be changed.
> It fills all the pixels in the line with a value and calculates the CRC 
> of the entire buffer, including the alpha.
> 
> I will crate an issue asking if this is intended.

I just remembered this:
https://lists.freedesktop.org/archives/igt-dev/2022-March/039920.html


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 9/9] drm: vkms: Add support to the RGB565 format
  2022-04-27  7:55       ` Pekka Paalanen
@ 2022-05-06 23:05         ` Igor Torrente
  2022-05-09  7:53           ` Pekka Paalanen
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Torrente @ 2022-05-06 23:05 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

Hi Pekka,

On 4/27/22 04:55, Pekka Paalanen wrote:
> On Tue, 26 Apr 2022 21:53:19 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
> 
>> Hi Pekka,
>>
>> On 4/21/22 07:58, Pekka Paalanen wrote:
>>> On Mon,  4 Apr 2022 17:45:15 -0300
>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>>    
>>>> Adds this common format to vkms.
>>>>
>>>> This commit also adds new helper macros to deal with fixed-point
>>>> arithmetic.
>>>>
>>>> It was done to improve the precision of the conversion to ARGB16161616
>>>> since the "conversion ratio" is not an integer.
>>>>
>>>> V3: Adapt the handlers to the new format introduced in patch 7 V3.
>>>> V5: Minor improvements
>>>>
>>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>>>> ---
>>>>    drivers/gpu/drm/vkms/vkms_formats.c   | 70 +++++++++++++++++++++++++++
>>>>    drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
>>>>    drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
>>>>    3 files changed, 76 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> index 8d913fa7dbde..4af8b295f31e 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> @@ -5,6 +5,23 @@
>>>>    
>>>>    #include "vkms_formats.h"
>>>>    
>>>> +/* The following macros help doing fixed point arithmetic. */
>>>> +/*
>>>> + * With Fixed-Point scale 15 we have 17 and 15 bits of integer and fractional
>>>> + * parts respectively.
>>>> + *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
>>>> + * 31                                          0
>>>> + */
>>>> +#define FIXED_SCALE 15
>>>
>>> I think this would usually be called a "shift" since it's used in
>>> bit-shifts.
>>
>> Ok, I will rename this.
>>
>>>    
>>>> +
>>>> +#define INT_TO_FIXED(a) ((a) << FIXED_SCALE)
>>>> +#define FIXED_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FIXED_SCALE))
>>>> +#define FIXED_DIV(a, b) ((s32)(((s64)(a) << FIXED_SCALE) / (b)))
>>>
>>> A truncating div, ok.
>>>    
>>>> +/* This macro converts a fixed point number to int, and round half up it */
>>>> +#define FIXED_TO_INT_ROUND(a) (((a) + (1 << (FIXED_SCALE - 1))) >> FIXED_SCALE)
>>>
>>> Yes.
>>>    
>>>> +/* Convert divisor and dividend to Fixed-Point and performs the division */
>>>> +#define INT_TO_FIXED_DIV(a, b) (FIXED_DIV(INT_TO_FIXED(a), INT_TO_FIXED(b)))
>>>
>>> Ok, this is obvious to read, even though it's the same as FIXED_DIV()
>>> alone. Not sure the compiler would optimize that extra bit-shift away...
>>>
>>> If one wanted to, it would be possible to write type-safe functions for
>>> these so that fixed and integer could not be mixed up.
>>
>> Ok, I will move to a function.
> 
> That's not all.
> 
> If you want it type-safe, then you need something like
> 
> struct vkms_fixed_point {
> 	s32 value;
> };
> 
> And use `struct vkms_fixed_point` (by value) everywhere where you pass
> a fixed point value, and never as a plain s32 type. Then it will be
> impossible to do incorrect arithmetic or conversions by accident on
> fixed point values.
> 
> Is it worth it? I don't know, since it's limited into this one file.
> 
> A simple 'typedef s32 vkms_fixed_point' does not work, because it does
> not prevent computing with vkms_fixed_point as if it was just a normal
> s32. Using a struct prevents that.

ohhh. Got it!

> 
> I wonder if the kernel doesn't already have something like this
> available in general...

After some time searching I found `include/drm/drm_fixed.h`[1].

It seems fine. There are minor things to consider:

1. It doesn't have a `FIXED_TO_INT_ROUND` equivalent.
2. We can use `fixed20_12` for rgb565 but We have to use s64 for YUV
formats or add a `sfixed20_12` with s32.

In terms of consistency, do you think worth using this "library" given
that we may need to use two distinct ways to represent the fixed point
soonish? Or it's better to implement `sfixed20_12`? Or just continue 
with the
current macros?

[1] - https://elixir.bootlin.com/linux/latest/source/include/drm/drm_fixed.h

> 
>>>> +		u16 r = FIXED_TO_INT_ROUND(FIXED_DIV(fp_r, fp_rb_ratio));
>>>> +		u16 g = FIXED_TO_INT_ROUND(FIXED_DIV(fp_g, fp_g_ratio));
>>>> +		u16 b = FIXED_TO_INT_ROUND(FIXED_DIV(fp_b, fp_rb_ratio));
>>>> +
>>>> +		*dst_pixels = cpu_to_le16(r << 11 | g << 5 | b);
>>>
>>> Looks good.
>>>
>>> You are using signed variables (int, s64, s32) when negative values
>>> should never occur. It doesn't seem wrong, just unexpected.
>>
>> I left the signal so I can reuse them in the YUV formats.
> 
> Good point.
> 
>>
>>>
>>> The use of int in code vs. s32 in the macros is a bit inconsistent as
>>> well.
>>
>> Right. I think I will stick with s32 and s64 then.
> 
> ...
> 
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
>>>> index cb63a5da9af1..98da7bee0f4b 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
>>>> @@ -16,7 +16,8 @@
>>>>    static const u32 vkms_wb_formats[] = {
>>>>    	DRM_FORMAT_XRGB8888,
>>>>    	DRM_FORMAT_XRGB16161616,
>>>> -	DRM_FORMAT_ARGB16161616
>>>> +	DRM_FORMAT_ARGB16161616,
>>>> +	DRM_FORMAT_RGB565
>>>>    };
>>>>    
>>>>    static const struct drm_connector_funcs vkms_wb_connector_funcs = {
>>>
>>> I wonder, would it be possible to add a unit test to make sure that
>>> get_plane_fmt_transform_function() or get_wb_fmt_transform_function()
>>> does not return NULL for any of the listed formats, respectively?
>>> Or is that too paranoid?
>>
>> I'm not opposed to it. But I also don't think it needs to be in this
>> series of patches either.
>>
>> A new todo maybe?
> 
> If it's a good thing, then it belongs in this series, because those
> function getters are introduced in this series, opening the door for
> the mistakes that the tests would prevent. I don't mean IGT tests but
> kernel internal tests. I think there is a unit test framework?

Yeah, kernel have kunit and kselftest. Idk what are the differences
between them or how to use them, but I know they exist.

> 
> You really should get a kernel maintainer's opinion on these questions,
> as I am not a kernel developer.

Ok.

> 
> 
> Thanks,
> pq

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats
  2022-04-04 20:45 ` [PATCH v5 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats Igor Torrente
  2022-04-20 13:19   ` Pekka Paalanen
@ 2022-05-07  7:32   ` Thomas Zimmermann
  2022-05-10 20:32     ` Igor Torrente
  1 sibling, 1 reply; 44+ messages in thread
From: Thomas Zimmermann @ 2022-05-07  7:32 UTC (permalink / raw)
  To: Igor Torrente, rodrigosiqueiramelo, melissa.srw, ppaalanen
  Cc: hamohammed.sa, airlied, tales.aparecida, dri-devel,
	leandro.ribeiro, ~lkcamp/patches


[-- Attachment #1.1: Type: text/plain, Size: 6264 bytes --]

Hi

Am 04.04.22 um 22:45 schrieb Igor Torrente:
> This will be useful to write tests that depends on these formats.
> 
> ARGB and XRGB follows the a similar implementation of the former formats.
> Just adjusting for 16 bits per channel.
> 
> V3: Adapt the handlers to the new format introduced in patch 7 V3.
> V5: Minor improvements
>      Added le16_to_cpu/cpu_to_le16 to the 16 bits color read/writes.

Is there something we could add to the DRM's format-conversion helpers?

Best regards
Thomas

> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>   drivers/gpu/drm/vkms/vkms_formats.c   | 77 +++++++++++++++++++++++++++
>   drivers/gpu/drm/vkms/vkms_plane.c     |  5 +-
>   drivers/gpu/drm/vkms/vkms_writeback.c |  2 +
>   3 files changed, 83 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 931a61405d6a..8d913fa7dbde 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -78,6 +78,41 @@ static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>   	}
>   }
>   
> +static void ARGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
> +				     const struct vkms_frame_info *frame_info,
> +				     int y)
> +{
> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> +	u16 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			       stage_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> +		out_pixels[x].a = le16_to_cpu(src_pixels[3]);
> +		out_pixels[x].r = le16_to_cpu(src_pixels[2]);
> +		out_pixels[x].g = le16_to_cpu(src_pixels[1]);
> +		out_pixels[x].b = le16_to_cpu(src_pixels[0]);
> +	}
> +}
> +
> +static void XRGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
> +				     const struct vkms_frame_info *frame_info,
> +				     int y)
> +{
> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> +	u16 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			       stage_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> +		out_pixels[x].a = (u16)0xffff;
> +		out_pixels[x].r = le16_to_cpu(src_pixels[2]);
> +		out_pixels[x].g = le16_to_cpu(src_pixels[1]);
> +		out_pixels[x].b = le16_to_cpu(src_pixels[0]);
> +	}
> +}
> +
> +
>   /*
>    * The following  functions take an line of argb_u16 pixels from the
>    * src_buffer, convert them to a specific format, and store them in the
> @@ -130,12 +165,50 @@ static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
>   	}
>   }
>   
> +static void argb_u16_to_ARGB16161616(struct vkms_frame_info *frame_info,
> +				     const struct line_buffer *src_buffer, int y)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    src_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		dst_pixels[3] = cpu_to_le16(in_pixels[x].a);
> +		dst_pixels[2] = cpu_to_le16(in_pixels[x].r);
> +		dst_pixels[1] = cpu_to_le16(in_pixels[x].g);
> +		dst_pixels[0] = cpu_to_le16(in_pixels[x].b);
> +	}
> +}
> +
> +static void argb_u16_to_XRGB16161616(struct vkms_frame_info *frame_info,
> +				     const struct line_buffer *src_buffer, int y)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    src_buffer->n_pixels);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		dst_pixels[3] = (u8)0xffff;
> +		dst_pixels[2] = cpu_to_le16(in_pixels[x].r);
> +		dst_pixels[1] = cpu_to_le16(in_pixels[x].g);
> +		dst_pixels[0] = cpu_to_le16(in_pixels[x].b);
> +	}
> +}
> +
>   plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>   {
>   	if (format == DRM_FORMAT_ARGB8888)
>   		return &ARGB8888_to_argb_u16;
>   	else if (format == DRM_FORMAT_XRGB8888)
>   		return &XRGB8888_to_argb_u16;
> +	else if (format == DRM_FORMAT_ARGB16161616)
> +		return &ARGB16161616_to_argb_u16;
> +	else if (format == DRM_FORMAT_XRGB16161616)
> +		return &XRGB16161616_to_argb_u16;
>   	else
>   		return NULL;
>   }
> @@ -146,6 +219,10 @@ wb_format_transform_func get_wb_fmt_transform_function(u32 format)
>   		return &argb_u16_to_ARGB8888;
>   	else if (format == DRM_FORMAT_XRGB8888)
>   		return &argb_u16_to_XRGB8888;
> +	else if (format == DRM_FORMAT_ARGB16161616)
> +		return &argb_u16_to_ARGB16161616;
> +	else if (format == DRM_FORMAT_XRGB16161616)
> +		return &argb_u16_to_XRGB16161616;
>   	else
>   		return NULL;
>   }
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 798243837fd0..60054a85204a 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -14,11 +14,14 @@
>   
>   static const u32 vkms_formats[] = {
>   	DRM_FORMAT_XRGB8888,
> +	DRM_FORMAT_XRGB16161616
>   };
>   
>   static const u32 vkms_plane_formats[] = {
>   	DRM_FORMAT_ARGB8888,
> -	DRM_FORMAT_XRGB8888
> +	DRM_FORMAT_XRGB8888,
> +	DRM_FORMAT_XRGB16161616,
> +	DRM_FORMAT_ARGB16161616
>   };
>   
>   static struct drm_plane_state *
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index 97f71e784bbf..cb63a5da9af1 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -15,6 +15,8 @@
>   
>   static const u32 vkms_wb_formats[] = {
>   	DRM_FORMAT_XRGB8888,
> +	DRM_FORMAT_XRGB16161616,
> +	DRM_FORMAT_ARGB16161616
>   };
>   
>   static const struct drm_connector_funcs vkms_wb_connector_funcs = {

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 9/9] drm: vkms: Add support to the RGB565 format
  2022-05-06 23:05         ` Igor Torrente
@ 2022-05-09  7:53           ` Pekka Paalanen
  2022-05-10 19:24             ` Igor Torrente
  0 siblings, 1 reply; 44+ messages in thread
From: Pekka Paalanen @ 2022-05-09  7:53 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 4827 bytes --]

On Fri, 6 May 2022 20:05:39 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Hi Pekka,
> 
> On 4/27/22 04:55, Pekka Paalanen wrote:
> > On Tue, 26 Apr 2022 21:53:19 -0300
> > Igor Torrente <igormtorrente@gmail.com> wrote:
> >   
> >> Hi Pekka,
> >>
> >> On 4/21/22 07:58, Pekka Paalanen wrote:  
> >>> On Mon,  4 Apr 2022 17:45:15 -0300
> >>> Igor Torrente <igormtorrente@gmail.com> wrote:
> >>>      
> >>>> Adds this common format to vkms.
> >>>>
> >>>> This commit also adds new helper macros to deal with fixed-point
> >>>> arithmetic.
> >>>>
> >>>> It was done to improve the precision of the conversion to ARGB16161616
> >>>> since the "conversion ratio" is not an integer.
> >>>>
> >>>> V3: Adapt the handlers to the new format introduced in patch 7 V3.
> >>>> V5: Minor improvements
> >>>>
> >>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> >>>> ---
> >>>>    drivers/gpu/drm/vkms/vkms_formats.c   | 70 +++++++++++++++++++++++++++
> >>>>    drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
> >>>>    drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
> >>>>    3 files changed, 76 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> >>>> index 8d913fa7dbde..4af8b295f31e 100644
> >>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> >>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> >>>> @@ -5,6 +5,23 @@
> >>>>    
> >>>>    #include "vkms_formats.h"
> >>>>    
> >>>> +/* The following macros help doing fixed point arithmetic. */
> >>>> +/*
> >>>> + * With Fixed-Point scale 15 we have 17 and 15 bits of integer and fractional
> >>>> + * parts respectively.
> >>>> + *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
> >>>> + * 31                                          0
> >>>> + */
> >>>> +#define FIXED_SCALE 15  
> >>>
> >>> I think this would usually be called a "shift" since it's used in
> >>> bit-shifts.  
> >>
> >> Ok, I will rename this.
> >>  
> >>>      
> >>>> +
> >>>> +#define INT_TO_FIXED(a) ((a) << FIXED_SCALE)
> >>>> +#define FIXED_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FIXED_SCALE))
> >>>> +#define FIXED_DIV(a, b) ((s32)(((s64)(a) << FIXED_SCALE) / (b)))  
> >>>
> >>> A truncating div, ok.
> >>>      
> >>>> +/* This macro converts a fixed point number to int, and round half up it */
> >>>> +#define FIXED_TO_INT_ROUND(a) (((a) + (1 << (FIXED_SCALE - 1))) >> FIXED_SCALE)  
> >>>
> >>> Yes.
> >>>      
> >>>> +/* Convert divisor and dividend to Fixed-Point and performs the division */
> >>>> +#define INT_TO_FIXED_DIV(a, b) (FIXED_DIV(INT_TO_FIXED(a), INT_TO_FIXED(b)))  
> >>>
> >>> Ok, this is obvious to read, even though it's the same as FIXED_DIV()
> >>> alone. Not sure the compiler would optimize that extra bit-shift away...
> >>>
> >>> If one wanted to, it would be possible to write type-safe functions for
> >>> these so that fixed and integer could not be mixed up.  
> >>
> >> Ok, I will move to a function.  
> > 
> > That's not all.
> > 
> > If you want it type-safe, then you need something like
> > 
> > struct vkms_fixed_point {
> > 	s32 value;
> > };
> > 
> > And use `struct vkms_fixed_point` (by value) everywhere where you pass
> > a fixed point value, and never as a plain s32 type. Then it will be
> > impossible to do incorrect arithmetic or conversions by accident on
> > fixed point values.
> > 
> > Is it worth it? I don't know, since it's limited into this one file.
> > 
> > A simple 'typedef s32 vkms_fixed_point' does not work, because it does
> > not prevent computing with vkms_fixed_point as if it was just a normal
> > s32. Using a struct prevents that.  
> 
> ohhh. Got it!
> 
> > 
> > I wonder if the kernel doesn't already have something like this
> > available in general...  
> 
> After some time searching I found `include/drm/drm_fixed.h`[1].
> 
> It seems fine. There are minor things to consider:
> 
> 1. It doesn't have a `FIXED_TO_INT_ROUND` equivalent.
> 2. We can use `fixed20_12` for rgb565 but We have to use s64 for YUV
> formats or add a `sfixed20_12` with s32.
> 
> In terms of consistency, do you think worth using this "library" given
> that we may need to use two distinct ways to represent the fixed point
> soonish? Or it's better to implement `sfixed20_12`? Or just continue 
> with the
> current macros?
> 
> [1] - https://elixir.bootlin.com/linux/latest/source/include/drm/drm_fixed.h

I think that is something the kernel people should weigh in on.

The one thing I would definitely avoid is ending up using multiple
fixed point formats in VKMS.

In the mean time, your current macros seem good enough, if there is no
community interest in better type safety nor sharing the fixed point
code.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 9/9] drm: vkms: Add support to the RGB565 format
  2022-05-09  7:53           ` Pekka Paalanen
@ 2022-05-10 19:24             ` Igor Torrente
  0 siblings, 0 replies; 44+ messages in thread
From: Igor Torrente @ 2022-05-10 19:24 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	leandro.ribeiro, melissa.srw, dri-devel, tales.aparecida,
	~lkcamp/patches

On 5/9/22 04:53, Pekka Paalanen wrote:
> On Fri, 6 May 2022 20:05:39 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
> 
>> Hi Pekka,
>>
>> On 4/27/22 04:55, Pekka Paalanen wrote:
>>> On Tue, 26 Apr 2022 21:53:19 -0300
>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>>    
>>>> Hi Pekka,
>>>>
>>>> On 4/21/22 07:58, Pekka Paalanen wrote:
>>>>> On Mon,  4 Apr 2022 17:45:15 -0300
>>>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>>>>       
>>>>>> Adds this common format to vkms.
>>>>>>
>>>>>> This commit also adds new helper macros to deal with fixed-point
>>>>>> arithmetic.
>>>>>>
>>>>>> It was done to improve the precision of the conversion to ARGB16161616
>>>>>> since the "conversion ratio" is not an integer.
>>>>>>
>>>>>> V3: Adapt the handlers to the new format introduced in patch 7 V3.
>>>>>> V5: Minor improvements
>>>>>>
>>>>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/vkms/vkms_formats.c   | 70 +++++++++++++++++++++++++++
>>>>>>     drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
>>>>>>     drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
>>>>>>     3 files changed, 76 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>> index 8d913fa7dbde..4af8b295f31e 100644
>>>>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>> @@ -5,6 +5,23 @@
>>>>>>     
>>>>>>     #include "vkms_formats.h"
>>>>>>     
>>>>>> +/* The following macros help doing fixed point arithmetic. */
>>>>>> +/*
>>>>>> + * With Fixed-Point scale 15 we have 17 and 15 bits of integer and fractional
>>>>>> + * parts respectively.
>>>>>> + *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
>>>>>> + * 31                                          0
>>>>>> + */
>>>>>> +#define FIXED_SCALE 15
>>>>>
>>>>> I think this would usually be called a "shift" since it's used in
>>>>> bit-shifts.
>>>>
>>>> Ok, I will rename this.
>>>>   
>>>>>       
>>>>>> +
>>>>>> +#define INT_TO_FIXED(a) ((a) << FIXED_SCALE)
>>>>>> +#define FIXED_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FIXED_SCALE))
>>>>>> +#define FIXED_DIV(a, b) ((s32)(((s64)(a) << FIXED_SCALE) / (b)))
>>>>>
>>>>> A truncating div, ok.
>>>>>       
>>>>>> +/* This macro converts a fixed point number to int, and round half up it */
>>>>>> +#define FIXED_TO_INT_ROUND(a) (((a) + (1 << (FIXED_SCALE - 1))) >> FIXED_SCALE)
>>>>>
>>>>> Yes.
>>>>>       
>>>>>> +/* Convert divisor and dividend to Fixed-Point and performs the division */
>>>>>> +#define INT_TO_FIXED_DIV(a, b) (FIXED_DIV(INT_TO_FIXED(a), INT_TO_FIXED(b)))
>>>>>
>>>>> Ok, this is obvious to read, even though it's the same as FIXED_DIV()
>>>>> alone. Not sure the compiler would optimize that extra bit-shift away...
>>>>>
>>>>> If one wanted to, it would be possible to write type-safe functions for
>>>>> these so that fixed and integer could not be mixed up.
>>>>
>>>> Ok, I will move to a function.
>>>
>>> That's not all.
>>>
>>> If you want it type-safe, then you need something like
>>>
>>> struct vkms_fixed_point {
>>> 	s32 value;
>>> };
>>>
>>> And use `struct vkms_fixed_point` (by value) everywhere where you pass
>>> a fixed point value, and never as a plain s32 type. Then it will be
>>> impossible to do incorrect arithmetic or conversions by accident on
>>> fixed point values.
>>>
>>> Is it worth it? I don't know, since it's limited into this one file.
>>>
>>> A simple 'typedef s32 vkms_fixed_point' does not work, because it does
>>> not prevent computing with vkms_fixed_point as if it was just a normal
>>> s32. Using a struct prevents that.
>>
>> ohhh. Got it!
>>
>>>
>>> I wonder if the kernel doesn't already have something like this
>>> available in general...
>>
>> After some time searching I found `include/drm/drm_fixed.h`[1].
>>
>> It seems fine. There are minor things to consider:
>>
>> 1. It doesn't have a `FIXED_TO_INT_ROUND` equivalent.
>> 2. We can use `fixed20_12` for rgb565 but We have to use s64 for YUV
>> formats or add a `sfixed20_12` with s32.
>>
>> In terms of consistency, do you think worth using this "library" given
>> that we may need to use two distinct ways to represent the fixed point
>> soonish? Or it's better to implement `sfixed20_12`? Or just continue
>> with the
>> current macros?
>>
>> [1] - https://elixir.bootlin.com/linux/latest/source/include/drm/drm_fixed.h
> 
> I think that is something the kernel people should weigh in on.
> 
> The one thing I would definitely avoid is ending up using multiple
> fixed point formats in VKMS.
> 
> In the mean time, your current macros seem good enough, if there is no
> community interest in better type safety nor sharing the fixed point
> code.

OK. Thanks!

> 
> 
> Thanks,
> pq

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats
  2022-05-07  7:32   ` Thomas Zimmermann
@ 2022-05-10 20:32     ` Igor Torrente
  0 siblings, 0 replies; 44+ messages in thread
From: Igor Torrente @ 2022-05-10 20:32 UTC (permalink / raw)
  To: Thomas Zimmermann, rodrigosiqueiramelo, melissa.srw, ppaalanen
  Cc: hamohammed.sa, airlied, tales.aparecida, dri-devel,
	leandro.ribeiro, ~lkcamp/patches

Hi Thomas

On 5/7/22 04:32, Thomas Zimmermann wrote:
> Hi
> 
> Am 04.04.22 um 22:45 schrieb Igor Torrente:
>> This will be useful to write tests that depends on these formats.
>>
>> ARGB and XRGB follows the a similar implementation of the former formats.
>> Just adjusting for 16 bits per channel.
>>
>> V3: Adapt the handlers to the new format introduced in patch 7 V3.
>> V5: Minor improvements
>>       Added le16_to_cpu/cpu_to_le16 to the 16 bits color read/writes.
> 
> Is there something we could add to the DRM's format-conversion helpers?

I don't believe there's anything reusable.

Correct if I'm wrong, but from what I understood, these 
format-conversion functions don't seem compatible. The Vkms functions 
are converting to/from an internal format (struct pixel_argb_u16) while 
the functions from `drm_format_helper.c` are converting between two DRM 
formats.

I don't think these vkms format functions are useful to other drivers.

> 
> Best regards
> Thomas
> 	
>>
>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>> ---
>>    drivers/gpu/drm/vkms/vkms_formats.c   | 77 +++++++++++++++++++++++++++
>>    drivers/gpu/drm/vkms/vkms_plane.c     |  5 +-
>>    drivers/gpu/drm/vkms/vkms_writeback.c |  2 +
>>    3 files changed, 83 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>> index 931a61405d6a..8d913fa7dbde 100644
>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>> @@ -78,6 +78,41 @@ static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>>    	}
>>    }
>>    
>> +static void ARGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
>> +				     const struct vkms_frame_info *frame_info,
>> +				     int y)
>> +{
>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>> +	u16 *src_pixels = get_packed_src_addr(frame_info, y);
>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			       stage_buffer->n_pixels);
>> +
>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>> +		out_pixels[x].a = le16_to_cpu(src_pixels[3]);
>> +		out_pixels[x].r = le16_to_cpu(src_pixels[2]);
>> +		out_pixels[x].g = le16_to_cpu(src_pixels[1]);
>> +		out_pixels[x].b = le16_to_cpu(src_pixels[0]);
>> +	}
>> +}
>> +
>> +static void XRGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
>> +				     const struct vkms_frame_info *frame_info,
>> +				     int y)
>> +{
>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>> +	u16 *src_pixels = get_packed_src_addr(frame_info, y);
>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			       stage_buffer->n_pixels);
>> +
>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>> +		out_pixels[x].a = (u16)0xffff;
>> +		out_pixels[x].r = le16_to_cpu(src_pixels[2]);
>> +		out_pixels[x].g = le16_to_cpu(src_pixels[1]);
>> +		out_pixels[x].b = le16_to_cpu(src_pixels[0]);
>> +	}
>> +}
>> +
>> +
>>    /*
>>     * The following  functions take an line of argb_u16 pixels from the
>>     * src_buffer, convert them to a specific format, and store them in the
>> @@ -130,12 +165,50 @@ static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
>>    	}
>>    }
>>    
>> +static void argb_u16_to_ARGB16161616(struct vkms_frame_info *frame_info,
>> +				     const struct line_buffer *src_buffer, int y)
>> +{
>> +	int x, x_dst = frame_info->dst.x1;
>> +	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			    src_buffer->n_pixels);
>> +
>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>> +		dst_pixels[3] = cpu_to_le16(in_pixels[x].a);
>> +		dst_pixels[2] = cpu_to_le16(in_pixels[x].r);
>> +		dst_pixels[1] = cpu_to_le16(in_pixels[x].g);
>> +		dst_pixels[0] = cpu_to_le16(in_pixels[x].b);
>> +	}
>> +}
>> +
>> +static void argb_u16_to_XRGB16161616(struct vkms_frame_info *frame_info,
>> +				     const struct line_buffer *src_buffer, int y)
>> +{
>> +	int x, x_dst = frame_info->dst.x1;
>> +	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			    src_buffer->n_pixels);
>> +
>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>> +		dst_pixels[3] = (u8)0xffff;
>> +		dst_pixels[2] = cpu_to_le16(in_pixels[x].r);
>> +		dst_pixels[1] = cpu_to_le16(in_pixels[x].g);
>> +		dst_pixels[0] = cpu_to_le16(in_pixels[x].b);
>> +	}
>> +}
>> +
>>    plane_format_transform_func get_plane_fmt_transform_function(u32 format)
>>    {
>>    	if (format == DRM_FORMAT_ARGB8888)
>>    		return &ARGB8888_to_argb_u16;
>>    	else if (format == DRM_FORMAT_XRGB8888)
>>    		return &XRGB8888_to_argb_u16;
>> +	else if (format == DRM_FORMAT_ARGB16161616)
>> +		return &ARGB16161616_to_argb_u16;
>> +	else if (format == DRM_FORMAT_XRGB16161616)
>> +		return &XRGB16161616_to_argb_u16;
>>    	else
>>    		return NULL;
>>    }
>> @@ -146,6 +219,10 @@ wb_format_transform_func get_wb_fmt_transform_function(u32 format)
>>    		return &argb_u16_to_ARGB8888;
>>    	else if (format == DRM_FORMAT_XRGB8888)
>>    		return &argb_u16_to_XRGB8888;
>> +	else if (format == DRM_FORMAT_ARGB16161616)
>> +		return &argb_u16_to_ARGB16161616;
>> +	else if (format == DRM_FORMAT_XRGB16161616)
>> +		return &argb_u16_to_XRGB16161616;
>>    	else
>>    		return NULL;
>>    }
>> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
>> index 798243837fd0..60054a85204a 100644
>> --- a/drivers/gpu/drm/vkms/vkms_plane.c
>> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
>> @@ -14,11 +14,14 @@
>>    
>>    static const u32 vkms_formats[] = {
>>    	DRM_FORMAT_XRGB8888,
>> +	DRM_FORMAT_XRGB16161616
>>    };
>>    
>>    static const u32 vkms_plane_formats[] = {
>>    	DRM_FORMAT_ARGB8888,
>> -	DRM_FORMAT_XRGB8888
>> +	DRM_FORMAT_XRGB8888,
>> +	DRM_FORMAT_XRGB16161616,
>> +	DRM_FORMAT_ARGB16161616
>>    };
>>    
>>    static struct drm_plane_state *
>> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
>> index 97f71e784bbf..cb63a5da9af1 100644
>> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
>> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
>> @@ -15,6 +15,8 @@
>>    
>>    static const u32 vkms_wb_formats[] = {
>>    	DRM_FORMAT_XRGB8888,
>> +	DRM_FORMAT_XRGB16161616,
>> +	DRM_FORMAT_ARGB16161616
>>    };
>>    
>>    static const struct drm_connector_funcs vkms_wb_connector_funcs = {
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 0/9] Add new formats support to vkms
  2022-04-04 20:45 [PATCH v5 0/9] Add new formats support to vkms Igor Torrente
                   ` (8 preceding siblings ...)
  2022-04-04 20:45 ` [PATCH v5 9/9] drm: vkms: Add support to the RGB565 format Igor Torrente
@ 2022-06-13  9:52 ` Melissa Wen
  2022-06-13 20:26   ` Igor Torrente
  9 siblings, 1 reply; 44+ messages in thread
From: Melissa Wen @ 2022-06-13  9:52 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, tales.aparecida,
	leandro.ribeiro, melissa.srw, ppaalanen, dri-devel, tzimmermann,
	~lkcamp/patches

[-- Attachment #1: Type: text/plain, Size: 4254 bytes --]

On 04/04, Igor Torrente wrote:
> Summary
> =======
> This series of patches refactor some vkms components in order to introduce
> new formats to the planes and writeback connector.
> 
> Now in the blend function, the plane's pixels are converted to ARGB16161616
> and then blended together.
> 
> The CRC is calculated based on the ARGB1616161616 buffer. And if required,
> this buffer is copied/converted to the writeback buffer format.
> 
> And to handle the pixel conversion, new functions were added to convert
> from a specific format to ARGB16161616 (the reciprocal is also true).
> 
> Tests
> =====
> This patch series was tested using the following igt tests:
> -t ".*kms_plane.*"
> -t ".*kms_writeback.*"
> -t ".*kms_cursor_crc*"
> -t ".*kms_flip.*"
> 
> New tests passing
> -------------------
> - pipe-A-cursor-size-change
> - pipe-A-cursor-alpha-transparent
> 
> Performance
> -----------
> It's running slightly faster than the current implementation.
> 
> Results running the IGT[1] test
> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
> 
> |                  Frametime                   |
> |:--------------------------------------------:|
> |  Implementation |  Current  |   This commit  |
> |:---------------:|:---------:|:--------------:|
> | frametime range |  9~22 ms  |     10~22 ms   |
> |     Average     |  11.4 ms  |     12.32 ms   |
> 
> Memory consumption
> ==================
> It consumes less memory than the current implementation in
> the common case (more detail in the commit message).
> 
> | Memory consumption (output dimensions) |
> |:--------------------------------------:|
> |       Current      |     This patch    |
> |:------------------:|:-----------------:|
> |   Width * Heigth   |     2 * Width     |
> 
> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
> 
> XRGB to ARGB behavior
> =====================
> During the development, I decided to always fill the alpha channel of
> the output pixel whenever the conversion from a format without an alpha
> channel to ARGB16161616 is necessary. Therefore, I ignore the value
> received from the XRGB and overwrite the value with 0xFFFF.
> 
> Primary plane and CRTC size
> ===========================
> This patch series reworks the blend function to accept a primary plane with
> a different size and position from CRTC.
> Because now we need to fill the background, we had a loss in
> performance with this change
> 
> ---
Hi Igor,

Thanks for this effort.

> Igor Torrente (9):
>   drm: vkms: Alloc the compose frame using vzalloc

As this first patch fixes an error on vkms, I cherry-picked it and
applied to drm-misc-next.

For remaining patches, looking forward the next version addressing
feedback and rebasing them too.

Best Regards,

Melissa

>   drm: vkms: Replace hardcoded value of `vkms_composer.map` to
>     DRM_FORMAT_MAX_PLANES
>   drm: vkms: Rename `vkms_composer` to `vkms_frame_info`
>   drm: drm_atomic_helper: Add a new helper to deal with the writeback
>     connector validation
>   drm: vkms: Add fb information to `vkms_writeback_job`
>   drm: vkms: Refactor the plane composer to accept new formats
>   drm: vkms: Supports to the case where primary plane doesn't match the
>     CRTC
>   drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats
>   drm: vkms: Add support to the RGB565 format
> 
>  Documentation/gpu/vkms.rst            |  13 +-
>  drivers/gpu/drm/drm_atomic_helper.c   |  39 ++++
>  drivers/gpu/drm/vkms/Makefile         |   1 +
>  drivers/gpu/drm/vkms/vkms_composer.c  | 325 ++++++++++++--------------
>  drivers/gpu/drm/vkms/vkms_crtc.c      |   4 +
>  drivers/gpu/drm/vkms/vkms_drv.h       |  41 +++-
>  drivers/gpu/drm/vkms/vkms_formats.c   | 298 +++++++++++++++++++++++
>  drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
>  drivers/gpu/drm/vkms/vkms_plane.c     |  50 ++--
>  drivers/gpu/drm/vkms/vkms_writeback.c |  35 ++-
>  include/drm/drm_atomic_helper.h       |   3 +
>  11 files changed, 596 insertions(+), 225 deletions(-)
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> 
> -- 
> 2.30.2
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 0/9] Add new formats support to vkms
  2022-06-13  9:52 ` [PATCH v5 0/9] Add new formats support to vkms Melissa Wen
@ 2022-06-13 20:26   ` Igor Torrente
  0 siblings, 0 replies; 44+ messages in thread
From: Igor Torrente @ 2022-06-13 20:26 UTC (permalink / raw)
  To: Melissa Wen
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, tales.aparecida,
	leandro.ribeiro, melissa.srw, ppaalanen, dri-devel, tzimmermann,
	~lkcamp/patches

Hi Melissa,

On 6/13/22 06:52, Melissa Wen wrote:
> On 04/04, Igor Torrente wrote:
>> Summary
>> =======
>> This series of patches refactor some vkms components in order to introduce
>> new formats to the planes and writeback connector.
>>
>> Now in the blend function, the plane's pixels are converted to ARGB16161616
>> and then blended together.
>>
>> The CRC is calculated based on the ARGB1616161616 buffer. And if required,
>> this buffer is copied/converted to the writeback buffer format.
>>
>> And to handle the pixel conversion, new functions were added to convert
>> from a specific format to ARGB16161616 (the reciprocal is also true).
>>
>> Tests
>> =====
>> This patch series was tested using the following igt tests:
>> -t ".*kms_plane.*"
>> -t ".*kms_writeback.*"
>> -t ".*kms_cursor_crc*"
>> -t ".*kms_flip.*"
>>
>> New tests passing
>> -------------------
>> - pipe-A-cursor-size-change
>> - pipe-A-cursor-alpha-transparent
>>
>> Performance
>> -----------
>> It's running slightly faster than the current implementation.
>>
>> Results running the IGT[1] test
>> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
>>
>> |                  Frametime                   |
>> |:--------------------------------------------:|
>> |  Implementation |  Current  |   This commit  |
>> |:---------------:|:---------:|:--------------:|
>> | frametime range |  9~22 ms  |     10~22 ms   |
>> |     Average     |  11.4 ms  |     12.32 ms   |
>>
>> Memory consumption
>> ==================
>> It consumes less memory than the current implementation in
>> the common case (more detail in the commit message).
>>
>> | Memory consumption (output dimensions) |
>> |:--------------------------------------:|
>> |       Current      |     This patch    |
>> |:------------------:|:-----------------:|
>> |   Width * Heigth   |     2 * Width     |
>>
>> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
>>
>> XRGB to ARGB behavior
>> =====================
>> During the development, I decided to always fill the alpha channel of
>> the output pixel whenever the conversion from a format without an alpha
>> channel to ARGB16161616 is necessary. Therefore, I ignore the value
>> received from the XRGB and overwrite the value with 0xFFFF.
>>
>> Primary plane and CRTC size
>> ===========================
>> This patch series reworks the blend function to accept a primary plane with
>> a different size and position from CRTC.
>> Because now we need to fill the background, we had a loss in
>> performance with this change
>>
>> ---
> Hi Igor,
> 
> Thanks for this effort.
> 
>> Igor Torrente (9):
>>    drm: vkms: Alloc the compose frame using vzalloc
> 
> As this first patch fixes an error on vkms, I cherry-picked it and
> applied to drm-misc-next.

Oh right. I will skip it then!

Best Regards,
---
Igor Torrente

> 
> For remaining patches, looking forward the next version addressing
> feedback and rebasing them too.
> 
> Best Regards,
> 
> Melissa
> 
>>    drm: vkms: Replace hardcoded value of `vkms_composer.map` to
>>      DRM_FORMAT_MAX_PLANES
>>    drm: vkms: Rename `vkms_composer` to `vkms_frame_info`
>>    drm: drm_atomic_helper: Add a new helper to deal with the writeback
>>      connector validation
>>    drm: vkms: Add fb information to `vkms_writeback_job`
>>    drm: vkms: Refactor the plane composer to accept new formats
>>    drm: vkms: Supports to the case where primary plane doesn't match the
>>      CRTC
>>    drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats
>>    drm: vkms: Add support to the RGB565 format
>>
>>   Documentation/gpu/vkms.rst            |  13 +-
>>   drivers/gpu/drm/drm_atomic_helper.c   |  39 ++++
>>   drivers/gpu/drm/vkms/Makefile         |   1 +
>>   drivers/gpu/drm/vkms/vkms_composer.c  | 325 ++++++++++++--------------
>>   drivers/gpu/drm/vkms/vkms_crtc.c      |   4 +
>>   drivers/gpu/drm/vkms/vkms_drv.h       |  41 +++-
>>   drivers/gpu/drm/vkms/vkms_formats.c   | 298 +++++++++++++++++++++++
>>   drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
>>   drivers/gpu/drm/vkms/vkms_plane.c     |  50 ++--
>>   drivers/gpu/drm/vkms/vkms_writeback.c |  35 ++-
>>   include/drm/drm_atomic_helper.h       |   3 +
>>   11 files changed, 596 insertions(+), 225 deletions(-)
>>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
>>
>> -- 
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2022-06-13 20:27 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-04 20:45 [PATCH v5 0/9] Add new formats support to vkms Igor Torrente
2022-04-04 20:45 ` [PATCH v5 1/9] drm: vkms: Alloc the compose frame using vzalloc Igor Torrente
2022-04-05 14:05   ` André Almeida
2022-04-05 19:03     ` Igor Torrente
2022-04-04 20:45 ` [PATCH v5 2/9] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES Igor Torrente
2022-04-04 20:45 ` [PATCH v5 3/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info` Igor Torrente
2022-04-04 20:45 ` [PATCH v5 4/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation Igor Torrente
2022-04-05 14:21   ` André Almeida
2022-04-05 19:05     ` Igor Torrente
2022-04-04 20:45 ` [PATCH v5 5/9] drm: vkms: Add fb information to `vkms_writeback_job` Igor Torrente
2022-04-20 11:23   ` Pekka Paalanen
2022-04-23 15:12     ` Igor Torrente
2022-04-25  7:56       ` Pekka Paalanen
2022-04-26  0:56         ` Igor Torrente
2022-04-26  7:09           ` Pekka Paalanen
2022-04-27  0:43             ` Igor Torrente
2022-04-27  7:31               ` Pekka Paalanen
2022-04-04 20:45 ` [PATCH v5 6/9] drm: vkms: Refactor the plane composer to accept new formats Igor Torrente
2022-04-20 12:36   ` Pekka Paalanen
2022-04-23 16:04     ` Igor Torrente
2022-04-23 18:53       ` Igor Torrente
2022-04-25  8:10         ` Pekka Paalanen
2022-04-26  1:54           ` Igor Torrente
2022-04-27  1:03             ` Igor Torrente
2022-04-27  1:22               ` Igor Torrente
2022-04-27  7:43                 ` Pekka Paalanen
2022-04-28  0:44                   ` Igor Torrente
2022-04-29 12:31                     ` Pekka Paalanen
2022-04-04 20:45 ` [PATCH v5 7/9] drm: vkms: Supports to the case where primary plane doesn't match the CRTC Igor Torrente
2022-04-20 13:13   ` Pekka Paalanen
2022-04-24  0:41     ` Igor Torrente
2022-04-04 20:45 ` [PATCH v5 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats Igor Torrente
2022-04-20 13:19   ` Pekka Paalanen
2022-05-07  7:32   ` Thomas Zimmermann
2022-05-10 20:32     ` Igor Torrente
2022-04-04 20:45 ` [PATCH v5 9/9] drm: vkms: Add support to the RGB565 format Igor Torrente
2022-04-21 10:58   ` Pekka Paalanen
2022-04-27  0:53     ` Igor Torrente
2022-04-27  7:55       ` Pekka Paalanen
2022-05-06 23:05         ` Igor Torrente
2022-05-09  7:53           ` Pekka Paalanen
2022-05-10 19:24             ` Igor Torrente
2022-06-13  9:52 ` [PATCH v5 0/9] Add new formats support to vkms Melissa Wen
2022-06-13 20:26   ` Igor Torrente

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.