* [PATCH v4 0/9] Add new formats support to vkms
@ 2022-01-21 21:38 Igor Torrente
  2022-01-21 21:38 ` [PATCH v4 1/9] drm: vkms: Replace the deprecated drm_mode_config_init Igor Torrente
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Igor Torrente @ 2022-01-21 21:38 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, dri-devel, ~lkcamp/patches, Igor Torrente

Summary
=======
This patch series refactors some vkms components in order to add support
for new formats to the planes and the writeback connector.

In the blend function, the pixels of each plane are now converted to
ARGB16161616 and then blended together.
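As an illustration of blending at 16 bits per channel, the sketch below applies the premultiplied-alpha "over" operator (out = src + dst * (1 - alpha)) to every channel of an ARGB16161616 pixel. It is a minimal sketch, not the code from this series: `struct pixel_argb_u16`, `blend_channel()` and `blend_pixel()` are hypothetical names, and premultiplied alpha is an assumption.

```c
#include <stdint.h>

/* Hypothetical representation of one ARGB16161616 pixel. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Premultiplied "over" for one channel: out = src + dst * (1 - alpha),
 * with 0xffff standing for 1.0 and the result rounded to nearest.
 * Assumes valid premultiplied input (src <= alpha), so the result
 * fits in 16 bits.
 */
static uint16_t blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint64_t num = (uint64_t)src * 0xffff +
		       (uint64_t)dst * (0xffff - alpha);

	return (uint16_t)((num + 0xffff / 2) / 0xffff);
}

/* Blend a source pixel over a destination pixel using the source alpha. */
static struct pixel_argb_u16 blend_pixel(struct pixel_argb_u16 src,
					 struct pixel_argb_u16 dst)
{
	struct pixel_argb_u16 out = {
		.a = blend_channel(src.a, dst.a, src.a),
		.r = blend_channel(src.r, dst.r, src.a),
		.g = blend_channel(src.g, dst.g, src.a),
		.b = blend_channel(src.b, dst.b, src.a),
	};

	return out;
}
```

A fully opaque source replaces the destination, and a fully transparent one leaves it untouched.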

The CRC is calculated on the ARGB16161616 buffer. If required, this buffer
is then copied/converted to the writeback buffer format.

To handle the pixel conversion, new functions were added to convert from
each specific format to ARGB16161616, and from ARGB16161616 back to it.
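A per-channel sketch of the 8-bit to 16-bit conversion and its reciprocal, assuming ARGB8888 pixels that have already been loaded as 32-bit values (A in bits 31:24, as in DRM_FORMAT_ARGB8888). Multiplying by 257 replicates the byte (0xab becomes 0xabab), so 0x00 and 0xff map exactly to 0x0000 and 0xffff; the reverse direction divides by 257 with rounding. All names here are hypothetical, not the functions added by this series.

```c
#include <stdint.h>

/* Hypothetical representation of one ARGB16161616 pixel. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Widen an 8-bit channel: v * 257 == (v << 8) | v. */
static uint16_t u8_to_u16(uint8_t v)
{
	return (uint16_t)(v * 257);
}

/* Narrow a 16-bit channel back to 8 bits, rounding to nearest. */
static uint8_t u16_to_u8(uint16_t v)
{
	return (uint8_t)((v + 128) / 257);
}

/* ARGB8888 value ([31:0] A:R:G:B) to ARGB16161616. */
static struct pixel_argb_u16 argb8888_to_argb_u16(uint32_t px)
{
	struct pixel_argb_u16 out = {
		.a = u8_to_u16((px >> 24) & 0xff),
		.r = u8_to_u16((px >> 16) & 0xff),
		.g = u8_to_u16((px >> 8) & 0xff),
		.b = u8_to_u16(px & 0xff),
	};

	return out;
}

/* ARGB16161616 back to an ARGB8888 value. */
static uint32_t argb_u16_to_argb8888(struct pixel_argb_u16 px)
{
	return ((uint32_t)u16_to_u8(px.a) << 24) |
	       ((uint32_t)u16_to_u8(px.r) << 16) |
	       ((uint32_t)u16_to_u8(px.g) << 8) |
	       (uint32_t)u16_to_u8(px.b);
}
```

Because widening is exact, an 8-bit value survives the round trip unchanged.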

Tests
=====
This patch series was tested using the following IGT tests:
-t ".*kms_plane.*"
-t ".*kms_writeback.*"
-t ".*kms_cursor_crc*"
-t ".*kms_flip.*"

New tests passing
-----------------
- pipe-A-cursor-size-change
- pipe-A-cursor-alpha-transparent

Performance
-----------
With further optimization, the code now runs slightly faster than V2, and
it consumes less memory than the current implementation in the common case
(more details in the commit message).

Results from running the IGT test `kms_cursor_crc`:

|                             Frametime                                 |
|:---------------:|:---------:|:--------------:|:------------:|:-------:|
|  Implementation |  Current  |  Per-pixel(V1) | Per-line(V2) |   V3    |
| frametime range |  8~22 ms  |    32~56 ms    |    6~19 ms   | 5~18 ms |
|     Average     |  10.0 ms  |     35.8 ms    |    8.6 ms    |  7.3 ms |

| Memory consumption (output dimensions) |
|:--------------------------------------:|
|       Current      |     This patch    |
|:------------------:|:-----------------:|
|   Width * Height   |     2 * Width     |

XRGB to ARGB behavior
=====================
During development, I decided to always fill the alpha channel of the
output pixel whenever a format without an alpha channel is converted to
ARGB16161616. Therefore, the X channel received from XRGB formats is
ignored and the alpha value is overwritten with 0xFFFF.
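A minimal sketch of that rule for XRGB8888, assuming the pixel has already been loaded as a 32-bit value ([31:0] x:R:G:B) and using a hypothetical byte-replicating helper for the color channels. The x byte is never read, and alpha is always fully opaque. The names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical representation of one ARGB16161616 pixel. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Hypothetical helper: replicate the byte, so 0xff maps to 0xffff. */
static uint16_t u8_to_u16(uint8_t v)
{
	return (uint16_t)(v * 257);
}

/* XRGB8888 to ARGB16161616: ignore bits 31:24, force alpha to 0xffff. */
static struct pixel_argb_u16 xrgb8888_to_argb_u16(uint32_t px)
{
	struct pixel_argb_u16 out = {
		.a = 0xffff,
		.r = u8_to_u16((px >> 16) & 0xff),
		.g = u8_to_u16((px >> 8) & 0xff),
		.b = u8_to_u16(px & 0xff),
	};

	return out;
}
```

Whatever garbage the x byte holds, the converted pixel composites as opaque.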

---
Igor Torrente (9):
  drm: vkms: Replace the deprecated drm_mode_config_init
  drm: vkms: Alloc the compose frame using vzalloc
  drm: vkms: Replace hardcoded value of `vkms_composer.map` to
    DRM_FORMAT_MAX_PLANES
  drm: vkms: Rename `vkms_composer` to `vkms_frame_info`
  drm: vkms: Add fb information to `vkms_writeback_job`
  drm: drm_atomic_helper: Add a new helper to deal with the writeback
    connector validation
  drm: vkms: Refactor the plane composer to accept new formats
  drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats
  drm: vkms: Add support to the RGB565 format

 drivers/gpu/drm/drm_atomic_helper.c   |  39 +++
 drivers/gpu/drm/vkms/Makefile         |   1 +
 drivers/gpu/drm/vkms/vkms_composer.c  | 336 +++++++++++++-------------
 drivers/gpu/drm/vkms/vkms_drv.c       |   6 +-
 drivers/gpu/drm/vkms/vkms_drv.h       |  20 +-
 drivers/gpu/drm/vkms/vkms_formats.c   | 279 +++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_formats.h   |  49 ++++
 drivers/gpu/drm/vkms/vkms_plane.c     |  47 ++--
 drivers/gpu/drm/vkms/vkms_writeback.c |  32 ++-
 include/drm/drm_atomic_helper.h       |   3 +
 10 files changed, 600 insertions(+), 212 deletions(-)
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h

-- 
2.30.2



* [PATCH v4 1/9] drm: vkms: Replace the deprecated drm_mode_config_init
  2022-01-21 21:38 [PATCH v4 0/9] Add new formats support to vkms Igor Torrente
@ 2022-01-21 21:38 ` Igor Torrente
  2022-02-08 10:02   ` Melissa Wen
  2022-01-21 21:38 ` [PATCH v4 2/9] drm: vkms: Alloc the compose frame using vzalloc Igor Torrente
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Igor Torrente @ 2022-01-21 21:38 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, dri-devel, ~lkcamp/patches, Igor Torrente

`drm_mode_config_init` has been deprecated since commit c3b790ea07a1 ("drm:
Manage drm_mode_config_init with drmm_") in favor of `drmm_mode_config_init`.
Replace the former with the latter.

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
V2: Change the code style (Thomas Zimmermann).

V4: Update the commit message (Nícolas F. R. A. Prado)
---
 drivers/gpu/drm/vkms/vkms_drv.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.c b/drivers/gpu/drm/vkms/vkms_drv.c
index 0ffe5f0e33f7..ee4d96dabe19 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.c
+++ b/drivers/gpu/drm/vkms/vkms_drv.c
@@ -140,8 +140,12 @@ static const struct drm_mode_config_helper_funcs vkms_mode_config_helpers = {
 static int vkms_modeset_init(struct vkms_device *vkmsdev)
 {
 	struct drm_device *dev = &vkmsdev->drm;
+	int ret;
+
+	ret = drmm_mode_config_init(dev);
+	if (ret < 0)
+		return ret;
 
-	drm_mode_config_init(dev);
 	dev->mode_config.funcs = &vkms_mode_funcs;
 	dev->mode_config.min_width = XRES_MIN;
 	dev->mode_config.min_height = YRES_MIN;
-- 
2.30.2



* [PATCH v4 2/9] drm: vkms: Alloc the compose frame using vzalloc
  2022-01-21 21:38 [PATCH v4 0/9] Add new formats support to vkms Igor Torrente
  2022-01-21 21:38 ` [PATCH v4 1/9] drm: vkms: Replace the deprecated drm_mode_config_init Igor Torrente
@ 2022-01-21 21:38 ` Igor Torrente
  2022-02-08 10:14   ` Melissa Wen
  2022-01-21 21:38 ` [PATCH v4 3/9] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES Igor Torrente
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Igor Torrente @ 2022-01-21 21:38 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, dri-devel, ~lkcamp/patches, Igor Torrente

Currently, the memory for the composition frame is allocated with
kzalloc(). kzalloc() returns physically contiguous memory and cannot serve
requests larger than KMALLOC_MAX_SIZE (4MB on a typical x86_64
configuration), so large allocations can fail.

Some IGT tests (e.g. kms_plane@pixel-format) need more than 4MB when
testing pixel formats such as ARGB16161616.

This problem is addressed by allocating the memory with kvzalloc(), which
falls back to vmalloc when a physically contiguous allocation is not
possible and therefore circumvents this limitation. The buffer is freed
with kvfree() accordingly.
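A back-of-the-envelope check of the sizes involved. It assumes a tightly packed buffer (no pitch padding), a hypothetical 1024x768 output and a 4 MiB kmalloc limit (the usual KMALLOC_MAX_SIZE on x86_64; the real value depends on the kernel configuration). `frame_bytes()` and `fits_in_kmalloc()` are illustrative helpers, not kernel functions.

```c
#include <stddef.h>

/* Assumed kmalloc limit: 4 MiB, typical for x86_64 (config dependent). */
#define ASSUMED_KMALLOC_MAX ((size_t)4 << 20)

/* Bytes needed for a tightly packed width x height frame. */
static size_t frame_bytes(size_t width, size_t height, size_t cpp)
{
	return width * height * cpp;
}

/* Non-zero if a buffer of this size is within the assumed kmalloc limit. */
static int fits_in_kmalloc(size_t bytes)
{
	return bytes <= ASSUMED_KMALLOC_MAX;
}
```

At 8 bytes per pixel (ARGB16161616), a 1024x768 frame already needs 6291456 bytes, while the same frame in XRGB8888 (4 bytes per pixel) needs 3145728 bytes and stays under the limit.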

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 9e8204be9a14..82f79e508f81 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -180,7 +180,7 @@ static int compose_active_planes(void **vaddr_out,
 	int i;
 
 	if (!*vaddr_out) {
-		*vaddr_out = kzalloc(gem_obj->size, GFP_KERNEL);
+		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
 		if (!*vaddr_out) {
 			DRM_ERROR("Cannot allocate memory for output frame.");
 			return -ENOMEM;
@@ -263,7 +263,7 @@ void vkms_composer_worker(struct work_struct *work)
 				    crtc_state);
 	if (ret) {
 		if (ret == -EINVAL && !wb_pending)
-			kfree(vaddr_out);
+			kvfree(vaddr_out);
 		return;
 	}
 
@@ -275,7 +275,7 @@ void vkms_composer_worker(struct work_struct *work)
 		crtc_state->wb_pending = false;
 		spin_unlock_irq(&out->composer_lock);
 	} else {
-		kfree(vaddr_out);
+		kvfree(vaddr_out);
 	}
 
 	/*
-- 
2.30.2



* [PATCH v4 3/9] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES
  2022-01-21 21:38 [PATCH v4 0/9] Add new formats support to vkms Igor Torrente
  2022-01-21 21:38 ` [PATCH v4 1/9] drm: vkms: Replace the deprecated drm_mode_config_init Igor Torrente
  2022-01-21 21:38 ` [PATCH v4 2/9] drm: vkms: Alloc the compose frame using vzalloc Igor Torrente
@ 2022-01-21 21:38 ` Igor Torrente
  2022-02-08 10:16   ` Melissa Wen
  2022-01-21 21:38 ` [PATCH v4 4/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info` Igor Torrente
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Igor Torrente @ 2022-01-21 21:38 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, dri-devel, ~lkcamp/patches, Igor Torrente

The size of the `map` array in `vkms_composer` is hardcoded.

If the maximum number of planes ever increases, this hardcoded value
could become a problem.

Replace it with the DRM_FORMAT_MAX_PLANES macro.

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 9496fdc900b8..0eeea6f93733 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -30,7 +30,7 @@ struct vkms_writeback_job {
 struct vkms_composer {
 	struct drm_framebuffer fb;
 	struct drm_rect src, dst;
-	struct dma_buf_map map[4];
+	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
 	unsigned int offset;
 	unsigned int pitch;
 	unsigned int cpp;
-- 
2.30.2



* [PATCH v4 4/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info`
  2022-01-21 21:38 [PATCH v4 0/9] Add new formats support to vkms Igor Torrente
                   ` (2 preceding siblings ...)
  2022-01-21 21:38 ` [PATCH v4 3/9] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES Igor Torrente
@ 2022-01-21 21:38 ` Igor Torrente
  2022-02-08 10:20   ` Melissa Wen
  2022-01-21 21:38 ` [PATCH v4 5/9] drm: vkms: Add fb information to `vkms_writeback_job` Igor Torrente
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Igor Torrente @ 2022-01-21 21:38 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, dri-devel, ~lkcamp/patches, Igor Torrente

Rename this struct to something that better describes its contents.

The composer is the code that composites the planes, while this struct
contains information about a frame used in the output composition.
Therefore, vkms_frame_info is a more accurate name.
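The fields kept in this struct are what the composer needs to locate a pixel: the byte position of pixel (x, y) is offset + y * pitch + x * cpp, as computed in get_pixel_from_buffer(). A minimal userspace sketch of that addressing; `struct frame_layout`, `pixel_offset()` and `read_u32_pixel()` are hypothetical names, and the sketch reads the 32-bit pixel with memcpy() rather than the pointer cast used in the driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the metadata carried by vkms_frame_info. */
struct frame_layout {
	unsigned int offset; /* byte offset of plane 0 in the buffer */
	unsigned int pitch;  /* bytes per line */
	unsigned int cpp;    /* bytes per pixel */
};

/* Byte position of pixel (x, y) inside the buffer. */
static size_t pixel_offset(const struct frame_layout *fl, int x, int y)
{
	return fl->offset + (size_t)y * fl->pitch + (size_t)x * fl->cpp;
}

/* Read a 32-bit pixel; memcpy() avoids unaligned and aliasing issues. */
static uint32_t read_u32_pixel(const uint8_t *buf,
			       const struct frame_layout *fl, int x, int y)
{
	uint32_t px;

	memcpy(&px, buf + pixel_offset(fl, x, y), sizeof(px));
	return px;
}
```

Keeping the pitch separate from width * cpp lets the same code handle buffers whose lines are padded.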

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c | 87 ++++++++++++++--------------
 drivers/gpu/drm/vkms/vkms_drv.h      |  6 +-
 drivers/gpu/drm/vkms/vkms_plane.c    | 38 ++++++------
 3 files changed, 66 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 82f79e508f81..2d946368a561 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -11,11 +11,11 @@
 #include "vkms_drv.h"
 
 static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
-				 const struct vkms_composer *composer)
+				 const struct vkms_frame_info *frame_info)
 {
 	u32 pixel;
-	int src_offset = composer->offset + (y * composer->pitch)
-				      + (x * composer->cpp);
+	int src_offset = frame_info->offset + (y * frame_info->pitch)
+					    + (x * frame_info->cpp);
 
 	pixel = *(u32 *)&buffer[src_offset];
 
@@ -26,24 +26,24 @@ static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
  * compute_crc - Compute CRC value on output frame
  *
  * @vaddr: address to final framebuffer
- * @composer: framebuffer's metadata
+ * @frame_info: framebuffer's metadata
  *
  * returns CRC value computed using crc32 on the visible portion of
  * the final framebuffer at vaddr_out
  */
 static uint32_t compute_crc(const u8 *vaddr,
-			    const struct vkms_composer *composer)
+			    const struct vkms_frame_info *frame_info)
 {
 	int x, y;
 	u32 crc = 0, pixel = 0;
-	int x_src = composer->src.x1 >> 16;
-	int y_src = composer->src.y1 >> 16;
-	int h_src = drm_rect_height(&composer->src) >> 16;
-	int w_src = drm_rect_width(&composer->src) >> 16;
+	int x_src = frame_info->src.x1 >> 16;
+	int y_src = frame_info->src.y1 >> 16;
+	int h_src = drm_rect_height(&frame_info->src) >> 16;
+	int w_src = drm_rect_width(&frame_info->src) >> 16;
 
 	for (y = y_src; y < y_src + h_src; ++y) {
 		for (x = x_src; x < x_src + w_src; ++x) {
-			pixel = get_pixel_from_buffer(x, y, vaddr, composer);
+			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
 			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
 		}
 	}
@@ -98,8 +98,8 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
  * blend - blend value at vaddr_src with value at vaddr_dst
  * @vaddr_dst: destination address
  * @vaddr_src: source address
- * @dst_composer: destination framebuffer's metadata
- * @src_composer: source framebuffer's metadata
+ * @dst_frame_info: destination framebuffer's metadata
+ * @src_frame_info: source framebuffer's metadata
  * @pixel_blend: blending equation based on plane format
  *
  * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
@@ -111,33 +111,33 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
  * pixel color values
  */
 static void blend(void *vaddr_dst, void *vaddr_src,
-		  struct vkms_composer *dst_composer,
-		  struct vkms_composer *src_composer,
+		  struct vkms_frame_info *dst_frame_info,
+		  struct vkms_frame_info *src_frame_info,
 		  void (*pixel_blend)(const u8 *, u8 *))
 {
 	int i, j, j_dst, i_dst;
 	int offset_src, offset_dst;
 	u8 *pixel_dst, *pixel_src;
 
-	int x_src = src_composer->src.x1 >> 16;
-	int y_src = src_composer->src.y1 >> 16;
+	int x_src = src_frame_info->src.x1 >> 16;
+	int y_src = src_frame_info->src.y1 >> 16;
 
-	int x_dst = src_composer->dst.x1;
-	int y_dst = src_composer->dst.y1;
-	int h_dst = drm_rect_height(&src_composer->dst);
-	int w_dst = drm_rect_width(&src_composer->dst);
+	int x_dst = src_frame_info->dst.x1;
+	int y_dst = src_frame_info->dst.y1;
+	int h_dst = drm_rect_height(&src_frame_info->dst);
+	int w_dst = drm_rect_width(&src_frame_info->dst);
 
 	int y_limit = y_src + h_dst;
 	int x_limit = x_src + w_dst;
 
 	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
 		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
-			offset_dst = dst_composer->offset
-				     + (i_dst * dst_composer->pitch)
-				     + (j_dst++ * dst_composer->cpp);
-			offset_src = src_composer->offset
-				     + (i * src_composer->pitch)
-				     + (j * src_composer->cpp);
+			offset_dst = dst_frame_info->offset
+				     + (i_dst * dst_frame_info->pitch)
+				     + (j_dst++ * dst_frame_info->cpp);
+			offset_src = src_frame_info->offset
+				     + (i * src_frame_info->pitch)
+				     + (j * src_frame_info->cpp);
 
 			pixel_src = (u8 *)(vaddr_src + offset_src);
 			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
@@ -149,32 +149,33 @@ static void blend(void *vaddr_dst, void *vaddr_src,
 	}
 }
 
-static void compose_plane(struct vkms_composer *primary_composer,
-			  struct vkms_composer *plane_composer,
+static void compose_plane(struct vkms_frame_info *primary_plane_info,
+			  struct vkms_frame_info *plane_frame_info,
 			  void *vaddr_out)
 {
-	struct drm_framebuffer *fb = &plane_composer->fb;
+	struct drm_framebuffer *fb = &plane_frame_info->fb;
 	void *vaddr;
 	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
 
-	if (WARN_ON(dma_buf_map_is_null(&primary_composer->map[0])))
+	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
 		return;
 
-	vaddr = plane_composer->map[0].vaddr;
+	vaddr = plane_frame_info->map[0].vaddr;
 
 	if (fb->format->format == DRM_FORMAT_ARGB8888)
 		pixel_blend = &alpha_blend;
 	else
 		pixel_blend = &x_blend;
 
-	blend(vaddr_out, vaddr, primary_composer, plane_composer, pixel_blend);
+	blend(vaddr_out, vaddr, primary_plane_info,
+	      plane_frame_info, pixel_blend);
 }
 
 static int compose_active_planes(void **vaddr_out,
-				 struct vkms_composer *primary_composer,
+				 struct vkms_frame_info *primary_plane_info,
 				 struct vkms_crtc_state *crtc_state)
 {
-	struct drm_framebuffer *fb = &primary_composer->fb;
+	struct drm_framebuffer *fb = &primary_plane_info->fb;
 	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
 	const void *vaddr;
 	int i;
@@ -187,10 +188,10 @@ static int compose_active_planes(void **vaddr_out,
 		}
 	}
 
-	if (WARN_ON(dma_buf_map_is_null(&primary_composer->map[0])))
+	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
 		return -EINVAL;
 
-	vaddr = primary_composer->map[0].vaddr;
+	vaddr = primary_plane_info->map[0].vaddr;
 
 	memcpy(*vaddr_out, vaddr, gem_obj->size);
 
@@ -199,8 +200,8 @@ static int compose_active_planes(void **vaddr_out,
 	 * ((primary <- overlay) <- cursor)
 	 */
 	for (i = 1; i < crtc_state->num_active_planes; i++)
-		compose_plane(primary_composer,
-			      crtc_state->active_planes[i]->composer,
+		compose_plane(primary_plane_info,
+			      crtc_state->active_planes[i]->frame_info,
 			      *vaddr_out);
 
 	return 0;
@@ -222,7 +223,7 @@ void vkms_composer_worker(struct work_struct *work)
 						composer_work);
 	struct drm_crtc *crtc = crtc_state->base.crtc;
 	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
-	struct vkms_composer *primary_composer = NULL;
+	struct vkms_frame_info *primary_plane_info = NULL;
 	struct vkms_plane_state *act_plane = NULL;
 	bool crc_pending, wb_pending;
 	void *vaddr_out = NULL;
@@ -250,16 +251,16 @@ void vkms_composer_worker(struct work_struct *work)
 	if (crtc_state->num_active_planes >= 1) {
 		act_plane = crtc_state->active_planes[0];
 		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
-			primary_composer = act_plane->composer;
+			primary_plane_info = act_plane->frame_info;
 	}
 
-	if (!primary_composer)
+	if (!primary_plane_info)
 		return;
 
 	if (wb_pending)
 		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
 
-	ret = compose_active_planes(&vaddr_out, primary_composer,
+	ret = compose_active_planes(&vaddr_out, primary_plane_info,
 				    crtc_state);
 	if (ret) {
 		if (ret == -EINVAL && !wb_pending)
@@ -267,7 +268,7 @@ void vkms_composer_worker(struct work_struct *work)
 		return;
 	}
 
-	crc32 = compute_crc(vaddr_out, primary_composer);
+	crc32 = compute_crc(vaddr_out, primary_plane_info);
 
 	if (wb_pending) {
 		drm_writeback_signal_completion(&out->wb_connector, 0);
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 0eeea6f93733..2e6342164bef 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -27,7 +27,7 @@ struct vkms_writeback_job {
 	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
 };
 
-struct vkms_composer {
+struct vkms_frame_info {
 	struct drm_framebuffer fb;
 	struct drm_rect src, dst;
 	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
@@ -39,11 +39,11 @@ struct vkms_composer {
 /**
  * vkms_plane_state - Driver specific plane state
  * @base: base plane state
- * @composer: data required for composing computation
+ * @frame_info: data required for composing computation
  */
 struct vkms_plane_state {
 	struct drm_shadow_plane_state base;
-	struct vkms_composer *composer;
+	struct vkms_frame_info *frame_info;
 };
 
 struct vkms_plane {
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 32409e15244b..a56b0f76eddd 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -24,20 +24,20 @@ static struct drm_plane_state *
 vkms_plane_duplicate_state(struct drm_plane *plane)
 {
 	struct vkms_plane_state *vkms_state;
-	struct vkms_composer *composer;
+	struct vkms_frame_info *frame_info;
 
 	vkms_state = kzalloc(sizeof(*vkms_state), GFP_KERNEL);
 	if (!vkms_state)
 		return NULL;
 
-	composer = kzalloc(sizeof(*composer), GFP_KERNEL);
-	if (!composer) {
-		DRM_DEBUG_KMS("Couldn't allocate composer\n");
+	frame_info = kzalloc(sizeof(*frame_info), GFP_KERNEL);
+	if (!frame_info) {
+		DRM_DEBUG_KMS("Couldn't allocate frame_info\n");
 		kfree(vkms_state);
 		return NULL;
 	}
 
-	vkms_state->composer = composer;
+	vkms_state->frame_info = frame_info;
 
 	__drm_gem_duplicate_shadow_plane_state(plane, &vkms_state->base);
 
@@ -54,12 +54,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
 		/* dropping the reference we acquired in
 		 * vkms_primary_plane_update()
 		 */
-		if (drm_framebuffer_read_refcount(&vkms_state->composer->fb))
-			drm_framebuffer_put(&vkms_state->composer->fb);
+		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
+			drm_framebuffer_put(&vkms_state->frame_info->fb);
 	}
 
-	kfree(vkms_state->composer);
-	vkms_state->composer = NULL;
+	kfree(vkms_state->frame_info);
+	vkms_state->frame_info = NULL;
 
 	__drm_gem_destroy_shadow_plane_state(&vkms_state->base);
 	kfree(vkms_state);
@@ -99,7 +99,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	struct vkms_plane_state *vkms_plane_state;
 	struct drm_shadow_plane_state *shadow_plane_state;
 	struct drm_framebuffer *fb = new_state->fb;
-	struct vkms_composer *composer;
+	struct vkms_frame_info *frame_info;
 
 	if (!new_state->crtc || !fb)
 		return;
@@ -107,15 +107,15 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	vkms_plane_state = to_vkms_plane_state(new_state);
 	shadow_plane_state = &vkms_plane_state->base;
 
-	composer = vkms_plane_state->composer;
-	memcpy(&composer->src, &new_state->src, sizeof(struct drm_rect));
-	memcpy(&composer->dst, &new_state->dst, sizeof(struct drm_rect));
-	memcpy(&composer->fb, fb, sizeof(struct drm_framebuffer));
-	memcpy(&composer->map, &shadow_plane_state->data, sizeof(composer->map));
-	drm_framebuffer_get(&composer->fb);
-	composer->offset = fb->offsets[0];
-	composer->pitch = fb->pitches[0];
-	composer->cpp = fb->format->cpp[0];
+	frame_info = vkms_plane_state->frame_info;
+	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
+	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
+	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
+	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
+	drm_framebuffer_get(&frame_info->fb);
+	frame_info->offset = fb->offsets[0];
+	frame_info->pitch = fb->pitches[0];
+	frame_info->cpp = fb->format->cpp[0];
 }
 
 static int vkms_plane_atomic_check(struct drm_plane *plane,
-- 
2.30.2



* [PATCH v4 5/9] drm: vkms: Add fb information to `vkms_writeback_job`
  2022-01-21 21:38 [PATCH v4 0/9] Add new formats support to vkms Igor Torrente
                   ` (3 preceding siblings ...)
  2022-01-21 21:38 ` [PATCH v4 4/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info` Igor Torrente
@ 2022-01-21 21:38 ` Igor Torrente
  2022-02-08 10:22   ` Melissa Wen
  2022-01-21 21:38 ` [PATCH v4 6/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation Igor Torrente
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Igor Torrente @ 2022-01-21 21:38 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, dri-devel, ~lkcamp/patches, Igor Torrente

This commit lays the groundwork for introducing new formats to the planes
and the writeback buffer. As part of it, a new buffer metadata field is
added to `vkms_writeback_job`; this metadata is represented by the
`vkms_frame_info` struct.

This will allow the composition and writeback buffers to use different
formats in the future.

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
V2: Change the code to take a drm_framebuffer reference instead of copying
    its contents (Thomas Zimmermann).

V3: Drop the refcount in the wb code (Thomas Zimmermann).
---
 drivers/gpu/drm/vkms/vkms_composer.c  |  4 ++--
 drivers/gpu/drm/vkms/vkms_drv.h       | 12 ++++++------
 drivers/gpu/drm/vkms/vkms_plane.c     | 10 +++++-----
 drivers/gpu/drm/vkms/vkms_writeback.c | 20 +++++++++++++++++---
 4 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 2d946368a561..95029d2ebcac 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -153,7 +153,7 @@ static void compose_plane(struct vkms_frame_info *primary_plane_info,
 			  struct vkms_frame_info *plane_frame_info,
 			  void *vaddr_out)
 {
-	struct drm_framebuffer *fb = &plane_frame_info->fb;
+	struct drm_framebuffer *fb = plane_frame_info->fb;
 	void *vaddr;
 	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
 
@@ -175,7 +175,7 @@ static int compose_active_planes(void **vaddr_out,
 				 struct vkms_frame_info *primary_plane_info,
 				 struct vkms_crtc_state *crtc_state)
 {
-	struct drm_framebuffer *fb = &primary_plane_info->fb;
+	struct drm_framebuffer *fb = primary_plane_info->fb;
 	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
 	const void *vaddr;
 	int i;
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 2e6342164bef..c850d755247c 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -22,13 +22,8 @@
 
 #define NUM_OVERLAY_PLANES 8
 
-struct vkms_writeback_job {
-	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
-	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
-};
-
 struct vkms_frame_info {
-	struct drm_framebuffer fb;
+	struct drm_framebuffer *fb;
 	struct drm_rect src, dst;
 	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
 	unsigned int offset;
@@ -36,6 +31,11 @@ struct vkms_frame_info {
 	unsigned int cpp;
 };
 
+struct vkms_writeback_job {
+	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
+	struct vkms_frame_info frame_info;
+};
+
 /**
  * vkms_plane_state - Driver specific plane state
  * @base: base plane state
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index a56b0f76eddd..28752af0118c 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -50,12 +50,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
 	struct vkms_plane_state *vkms_state = to_vkms_plane_state(old_state);
 	struct drm_crtc *crtc = vkms_state->base.base.crtc;
 
-	if (crtc) {
+	if (crtc && vkms_state->frame_info->fb) {
 		/* dropping the reference we acquired in
 		 * vkms_primary_plane_update()
 		 */
-		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
-			drm_framebuffer_put(&vkms_state->frame_info->fb);
+		if (drm_framebuffer_read_refcount(vkms_state->frame_info->fb))
+			drm_framebuffer_put(vkms_state->frame_info->fb);
 	}
 
 	kfree(vkms_state->frame_info);
@@ -110,9 +110,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	frame_info = vkms_plane_state->frame_info;
 	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
 	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
-	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
+	frame_info->fb = fb;
 	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
-	drm_framebuffer_get(&frame_info->fb);
+	drm_framebuffer_get(frame_info->fb);
 	frame_info->offset = fb->offsets[0];
 	frame_info->pitch = fb->pitches[0];
 	frame_info->cpp = fb->format->cpp[0];
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index 8694227f555f..de379331b236 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -75,12 +75,15 @@ static int vkms_wb_prepare_job(struct drm_writeback_connector *wb_connector,
 	if (!vkmsjob)
 		return -ENOMEM;
 
-	ret = drm_gem_fb_vmap(job->fb, vkmsjob->map, vkmsjob->data);
+	ret = drm_gem_fb_vmap(job->fb, vkmsjob->frame_info.map, vkmsjob->data);
 	if (ret) {
 		DRM_ERROR("vmap failed: %d\n", ret);
 		goto err_kfree;
 	}
 
+	vkmsjob->frame_info.fb = job->fb;
+	drm_framebuffer_get(vkmsjob->frame_info.fb);
+
 	job->priv = vkmsjob;
 
 	return 0;
@@ -99,7 +102,9 @@ static void vkms_wb_cleanup_job(struct drm_writeback_connector *connector,
 	if (!job->fb)
 		return;
 
-	drm_gem_fb_vunmap(job->fb, vkmsjob->map);
+	drm_gem_fb_vunmap(job->fb, vkmsjob->frame_info.map);
+
+	drm_framebuffer_put(vkmsjob->frame_info.fb);
 
 	vkmsdev = drm_device_to_vkms_device(job->fb->dev);
 	vkms_set_composer(&vkmsdev->output, false);
@@ -116,14 +121,23 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
 	struct drm_writeback_connector *wb_conn = &output->wb_connector;
 	struct drm_connector_state *conn_state = wb_conn->base.state;
 	struct vkms_crtc_state *crtc_state = output->composer_state;
+	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
+	struct vkms_writeback_job *active_wb;
+	struct vkms_frame_info *wb_frame_info;
 
 	if (!conn_state)
 		return;
 
 	vkms_set_composer(&vkmsdev->output, true);
 
+	active_wb = conn_state->writeback_job->priv;
+	wb_frame_info = &active_wb->frame_info;
+
 	spin_lock_irq(&output->composer_lock);
-	crtc_state->active_writeback = conn_state->writeback_job->priv;
+	crtc_state->active_writeback = active_wb;
+	wb_frame_info->offset = fb->offsets[0];
+	wb_frame_info->pitch = fb->pitches[0];
+	wb_frame_info->cpp = fb->format->cpp[0];
 	crtc_state->wb_pending = true;
 	spin_unlock_irq(&output->composer_lock);
 	drm_writeback_queue_job(wb_conn, connector_state);
-- 
2.30.2



* [PATCH v4 6/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation
  2022-01-21 21:38 [PATCH v4 0/9] Add new formats support to vkms Igor Torrente
                   ` (4 preceding siblings ...)
  2022-01-21 21:38 ` [PATCH v4 5/9] drm: vkms: Add fb information to `vkms_writeback_job` Igor Torrente
@ 2022-01-21 21:38 ` Igor Torrente
  2022-01-21 21:38 ` [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats Igor Torrente
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Igor Torrente @ 2022-01-21 21:38 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, dri-devel, ~lkcamp/patches, Igor Torrente

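Add a helper function that validates the writeback connector configuration
received by drivers in the encoder atomic_check. For now, it checks that
the pixel format of the writeback framebuffer is one of the formats
advertised by the connector.

This way, drivers don't need to do these common validations themselves.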
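A userspace sketch of the check the helper performs: a linear search of the framebuffer's fourcc in the list of formats advertised by the connector. The FOURCC macro reproduces the packing of fourcc_code() from drm_fourcc.h; `check_wb_format()` is a hypothetical name and returns -1 where the kernel helper returns -EINVAL.

```c
#include <stddef.h>
#include <stdint.h>

/* Same packing as fourcc_code() in drm_fourcc.h. */
#define FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define FMT_XRGB8888 FOURCC('X', 'R', '2', '4')
#define FMT_ARGB8888 FOURCC('A', 'R', '2', '4')
#define FMT_RGB565   FOURCC('R', 'G', '1', '6')

/* Return 0 if fb_format is advertised, -1 otherwise. */
static int check_wb_format(uint32_t fb_format, const uint32_t *formats,
			   size_t nformats)
{
	for (size_t i = 0; i < nformats; i++)
		if (formats[i] == fb_format)
			return 0;

	return -1;
}
```

In the kernel, the format list and its length come from the connector's pixel-format property blob (blob length divided by sizeof(u32)).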

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
V2: Move the format verification to a new helper in drm_atomic_helper.c
    (Thomas Zimmermann).

V3: Format check improvements (Leandro Ribeiro).
    Minor improvements (Thomas Zimmermann).
---
 drivers/gpu/drm/drm_atomic_helper.c   | 39 +++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_writeback.c |  9 +++----
 include/drm/drm_atomic_helper.h       |  3 +++
 3 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index a7a05e1e26bb..ccb6e62bf80a 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -776,6 +776,45 @@ drm_atomic_helper_check_modeset(struct drm_device *dev,
 }
 EXPORT_SYMBOL(drm_atomic_helper_check_modeset);
 
+/**
+ * drm_atomic_helper_check_wb_encoder_state() - Check writeback encoder state
+ * @encoder: encoder to check
+ * @conn_state: connector state to check
+ *
+ * Checks if the writeback connector state is valid, and returns an error if it
+ * isn't.
+ *
+ * RETURNS:
+ * Zero for success or -errno
+ */
+int
+drm_atomic_helper_check_wb_encoder_state(struct drm_encoder *encoder,
+					 struct drm_connector_state *conn_state)
+{
+	struct drm_writeback_job *wb_job = conn_state->writeback_job;
+	struct drm_property_blob *pixel_format_blob;
+	struct drm_framebuffer *fb;
+	size_t i, nformats;
+	u32 *formats;
+
+	if (!wb_job || !wb_job->fb)
+		return 0;
+
+	pixel_format_blob = wb_job->connector->pixel_formats_blob_ptr;
+	nformats = pixel_format_blob->length / sizeof(u32);
+	formats = pixel_format_blob->data;
+	fb = wb_job->fb;
+
+	for (i = 0; i < nformats; i++)
+		if (fb->format->format == formats[i])
+			return 0;
+
+	drm_dbg_kms(encoder->dev, "Invalid pixel format %p4cc\n", &fb->format->format);
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(drm_atomic_helper_check_wb_encoder_state);
+
 /**
  * drm_atomic_helper_check_plane_state() - Check plane state for validity
  * @plane_state: plane state to check
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index de379331b236..ad4bb1fb37ca 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -30,6 +30,7 @@ static int vkms_wb_encoder_atomic_check(struct drm_encoder *encoder,
 {
 	struct drm_framebuffer *fb;
 	const struct drm_display_mode *mode = &crtc_state->mode;
+	int ret;
 
 	if (!conn_state->writeback_job || !conn_state->writeback_job->fb)
 		return 0;
@@ -41,11 +42,9 @@ static int vkms_wb_encoder_atomic_check(struct drm_encoder *encoder,
 		return -EINVAL;
 	}
 
-	if (fb->format->format != vkms_wb_formats[0]) {
-		DRM_DEBUG_KMS("Invalid pixel format %p4cc\n",
-			      &fb->format->format);
-		return -EINVAL;
-	}
+	ret = drm_atomic_helper_check_wb_encoder_state(encoder, conn_state);
+	if (ret < 0)
+		return ret;
 
 	return 0;
 }
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
index 4045e2507e11..3fbf695da60f 100644
--- a/include/drm/drm_atomic_helper.h
+++ b/include/drm/drm_atomic_helper.h
@@ -40,6 +40,9 @@ struct drm_private_state;
 
 int drm_atomic_helper_check_modeset(struct drm_device *dev,
 				struct drm_atomic_state *state);
+int
+drm_atomic_helper_check_wb_encoder_state(struct drm_encoder *encoder,
+					 struct drm_connector_state *conn_state);
 int drm_atomic_helper_check_plane_state(struct drm_plane_state *plane_state,
 					const struct drm_crtc_state *crtc_state,
 					int min_scale,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-01-21 21:38 [PATCH v4 0/9] Add new formats support to vkms Igor Torrente
                   ` (5 preceding siblings ...)
  2022-01-21 21:38 ` [PATCH v4 6/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation Igor Torrente
@ 2022-01-21 21:38 ` Igor Torrente
  2022-02-08 10:40   ` Melissa Wen
  2022-02-10  9:37   ` Pekka Paalanen
  2022-01-21 21:38 ` [PATCH v4 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats Igor Torrente
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 31+ messages in thread
From: Igor Torrente @ 2022-01-21 21:38 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, kernel test robot, airlied, dri-devel,
	~lkcamp/patches, Igor Torrente

Currently, the blend function only accepts XRGB_8888 and ARGB_8888
as color input.

This patch refactors all the functions related to the plane composition
to overcome this limitation.

A new internal format (`struct line_buffer`) is introduced to deal with
all possible inputs. It consists of four 16-bit fields, one for each of
the A, R, G and B channels.

Pixels are blended in this internal format, and new handlers are added
to convert each specific format to and from it.
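In the new pre_mul_blend_channel(), each channel is blended as
out = src + dst * (1 - alpha), with every channel in [0, 0xffff]. The
standalone sketch below shows that arithmetic and assumes premultiplied
input (src <= alpha). `blend_channel_u16()` is a hypothetical name. The u32
casts are deliberate: u16 operands are promoted to (signed) int, and
0xffff * 0xffff does not fit in an int.

```c
#include <stdint.h>

/* Userspace copy of the kernel's round-up integer division. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Premultiplied "over" for one channel in 16-bit fixed point:
 *   out = src + dst * (1 - alpha)
 * scaled by 0xffff so everything stays in integers. With src <= alpha the
 * sum is at most 0xffff * 0xffff, which still fits in 32 bits after
 * DIV_ROUND_UP() adds 0xfffe.
 */
static uint16_t blend_channel_u16(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint32_t acc = (uint32_t)src * 0xffff + (uint32_t)dst * (0xffffu - alpha);

	return (uint16_t)DIV_ROUND_UP(acc, 0xffffu);
}
```

An opaque source (alpha == 0xffff) returns src unchanged, and a fully
transparent one (src == alpha == 0) returns dst unchanged.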

The blend operation therefore relies on these handlers to convert every
plane to the common format. If a writeback job is pending, the blended
result is then converted to the writeback buffer format.
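The patch's ARGB8888_to_ARGB16161616() and convert_to_ARGB8888() do this
conversion one line at a time. The per-pixel sketch below uses the same
x257 scaling and DIV_ROUND_UP(..., 257) narrowing. The helper names and
`struct argb_u16`, a stand-in for `struct line_buffer`, are hypothetical:

```c
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical mirror of the patch's 16-bit-per-channel pixel. */
struct argb_u16 {
	uint16_t a, r, g, b;
};

/* 8 -> 16 bits: x * 257 == (x << 8) | x, so 0x00 -> 0x0000, 0xff -> 0xffff. */
static uint16_t u8_to_u16(uint8_t v)
{
	return (uint16_t)(v * 257);
}

/* 16 -> 8 bits, rounding up as the patch's writeback converters do. */
static uint8_t u16_to_u8(uint16_t v)
{
	return (uint8_t)DIV_ROUND_UP(v, 257);
}

/* ARGB8888 in memory (little-endian): byte 0 = B, 1 = G, 2 = R, 3 = A. */
static struct argb_u16 argb8888_to_u16(const uint8_t px[4])
{
	struct argb_u16 out = {
		.a = u8_to_u16(px[3]),
		.r = u8_to_u16(px[2]),
		.g = u8_to_u16(px[1]),
		.b = u8_to_u16(px[0]),
	};

	return out;
}

/* Inverse direction, as used when filling the writeback buffer. */
static void u16_to_argb8888(struct argb_u16 in, uint8_t px[4])
{
	px[3] = u16_to_u8(in.a);
	px[2] = u16_to_u8(in.r);
	px[1] = u16_to_u8(in.g);
	px[0] = u16_to_u8(in.b);
}
```

Because every 8-bit value maps to an exact multiple of 257, an unblended
8 -> 16 -> 8 round trip is lossless.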

This patch introduces three major differences to the blend function:
1 - All the planes are blended at once.
2 - The blend is computed per line instead of per pixel.
3 - It is responsible for calculating the CRC and writing the writeback
    buffer (if necessary).
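blend() feeds crc32_le() one output line at a time and carries the running
value from line to line. CRC is a streaming computation, so this gives the
same result as hashing all the lines as one contiguous buffer. The sketch
below checks that property with a bitwise CRC32 that assumes the kernel's
crc32_le() convention (reflected polynomial 0xedb88320, no pre/post
inversion). `crc32_le_bitwise()`, `crc_by_rows()` and `example_frame` are
hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32, reflected poly 0xedb88320, caller supplies the seed. */
static uint32_t crc32_le_bitwise(uint32_t crc, const uint8_t *p, size_t len)
{
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
	}

	return crc;
}

/* Feed the CRC one row at a time, carrying the running value. */
static uint32_t crc_by_rows(const uint8_t *frame, size_t row_bytes, size_t rows)
{
	uint32_t crc = 0;
	size_t y;

	for (y = 0; y < rows; y++)
		crc = crc32_le_bitwise(crc, frame + y * row_bytes, row_bytes);

	return crc;
}

/* A tiny frame: 3 rows of 8 bytes (or 2 rows of 12). */
static const uint8_t example_frame[24] = {
	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
	13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
};
```

The bytes being hashed are now the 16-bit intermediate pixels, so the
absolute CRC values differ from the ones the old compute_crc() produced.
Tests that compare two CRCs produced by the same driver should be
unaffected.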

These changes allow us to allocate much less memory for the intermediate
buffers. We no longer need the entire intermediate image at once: one
staging line and one output line are enough.

| Memory consumption (output dimensions) |
|:--------------------------------------:|
|       Current      |     This patch    |
|:------------------:|:-----------------:|
|   Width * Height   |     2 * Width     |
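The table can be converted to bytes with a short calculation. It assumes
the old path copied a full XRGB8888 primary frame with no pitch padding.
The old code sized that copy from the GEM object and only allocated it when
no writeback job was pending. The new path uses two lines of 8-byte pixels,
and the patch WARN_ONs if `struct line_buffer` is not 8 bytes. The function
names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct line_buffer: four u16 channels (8 bytes). */
struct argb_u16 {
	uint16_t a, r, g, b;
};

/* Old path (assumption): full-frame XRGB8888 copy, 4 bytes per pixel. */
static size_t old_scratch_bytes(size_t width, size_t height)
{
	return width * height * 4;
}

/* New path: one stage line plus one output line. */
static size_t new_scratch_bytes(size_t width)
{
	return 2 * width * sizeof(struct argb_u16);
}
```

For a 1920x1080 output, this comes to 8,294,400 bytes for the old path and
30,720 bytes for the new one.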

Beyond memory, these changes also bring a minor performance benefit.
Results from running the IGT tests `*kms_cursor_crc*`:

|                 Frametime                  |
|:------------------------------------------:|
|  Implementation |  Current  |  This commit |
|:---------------:|:---------:|:------------:|
| frametime range |  8~22 ms  |    5~18 ms   |
|     Average     |  10.0 ms  |    7.3 ms    |

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
V2: Improves the performance drastically, by performing the operations
    per-line and not per-pixel (Pekka Paalanen).
    Minor improvements (Pekka Paalanen).

V3: Changes the code to blend the planes all at once. This improves
    performance, memory consumption, and removes much of the weirdness
    of the V2 (Pekka Paalanen and me).
    Minor improvements (Pekka Paalanen and me).

V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
---
 drivers/gpu/drm/vkms/Makefile        |   1 +
 drivers/gpu/drm/vkms/vkms_composer.c | 335 +++++++++++++--------------
 drivers/gpu/drm/vkms/vkms_formats.c  | 138 +++++++++++
 drivers/gpu/drm/vkms/vkms_formats.h  |  31 +++
 4 files changed, 333 insertions(+), 172 deletions(-)
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h

diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
index 72f779cbfedd..1b28a6a32948 100644
--- a/drivers/gpu/drm/vkms/Makefile
+++ b/drivers/gpu/drm/vkms/Makefile
@@ -3,6 +3,7 @@ vkms-y := \
 	vkms_drv.o \
 	vkms_plane.o \
 	vkms_output.o \
+	vkms_formats.o \
 	vkms_crtc.o \
 	vkms_composer.o \
 	vkms_writeback.o
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 95029d2ebcac..9f70fcf84fb9 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -9,202 +9,210 @@
 #include <drm/drm_vblank.h>
 
 #include "vkms_drv.h"
+#include "vkms_formats.h"
 
-static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
-				 const struct vkms_frame_info *frame_info)
+static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
 {
-	u32 pixel;
-	int src_offset = frame_info->offset + (y * frame_info->pitch)
-					    + (x * frame_info->cpp);
+	u32 new_color;
 
-	pixel = *(u32 *)&buffer[src_offset];
+	new_color = ((u32)src * 0xffff + (u32)dst * (0xffff - alpha));
 
-	return pixel;
+	return DIV_ROUND_UP(new_color, 0xffff);
 }
 
 /**
- * compute_crc - Compute CRC value on output frame
+ * pre_mul_alpha_blend - alpha blending equation
+ * @frame_info: source framebuffer's metadata
+ * @stage_buffer: The line with the pixels from src_plane
+ * @output_buffer: A line buffer that receives all the blends output
  *
- * @vaddr: address to final framebuffer
- * @frame_info: framebuffer's metadata
+ * Using the information from the `frame_info`, this blends only the
+ * necessary pixels from the `stage_buffer` to the `output_buffer`
+ * using premultiplied blend formula.
  *
- * returns CRC value computed using crc32 on the visible portion of
- * the final framebuffer at vaddr_out
+ * The current DRM assumption is that pixel color values have already been
+ * pre-multiplied with the alpha channel values. See
+ * drm_plane_create_blend_mode_property() for details. This formula also
+ * assumes a completely opaque background.
  */
-static uint32_t compute_crc(const u8 *vaddr,
-			    const struct vkms_frame_info *frame_info)
+static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
+				struct line_buffer *stage_buffer,
+				struct line_buffer *output_buffer)
 {
-	int x, y;
-	u32 crc = 0, pixel = 0;
-	int x_src = frame_info->src.x1 >> 16;
-	int y_src = frame_info->src.y1 >> 16;
-	int h_src = drm_rect_height(&frame_info->src) >> 16;
-	int w_src = drm_rect_width(&frame_info->src) >> 16;
-
-	for (y = y_src; y < y_src + h_src; ++y) {
-		for (x = x_src; x < x_src + w_src; ++x) {
-			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
-			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
-		}
+	int x, x_dst = frame_info->dst.x1;
+	int x_limit = drm_rect_width(&frame_info->dst);
+	struct line_buffer *out = output_buffer + x_dst;
+	struct line_buffer *in = stage_buffer;
+
+	for (x = 0; x < x_limit; x++) {
+		out[x].a = (u16)0xffff;
+		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
+		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
+		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
 	}
-
-	return crc;
 }
 
-static u8 blend_channel(u8 src, u8 dst, u8 alpha)
+static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
 {
-	u32 pre_blend;
-	u8 new_color;
-
-	pre_blend = (src * 255 + dst * (255 - alpha));
-
-	/* Faster div by 255 */
-	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
+	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
+		return true;
 
-	return new_color;
+	return false;
 }
 
 /**
- * alpha_blend - alpha blending equation
- * @argb_src: src pixel on premultiplied alpha mode
- * @argb_dst: dst pixel completely opaque
- *
- * blend pixels using premultiplied blend formula. The current DRM assumption
- * is that pixel color values have been already pre-multiplied with the alpha
- * channel values. See more drm_plane_create_blend_mode_property(). Also, this
- * formula assumes a completely opaque background.
- */
-static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
-{
-	u8 alpha;
-
-	alpha = argb_src[3];
-	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
-	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
-	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
-}
-
-/**
- * x_blend - blending equation that ignores the pixel alpha
- *
- * overwrites RGB color value from src pixel to dst pixel.
- */
-static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
-{
-	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
-}
-
-/**
- * blend - blend value at vaddr_src with value at vaddr_dst
- * @vaddr_dst: destination address
- * @vaddr_src: source address
- * @dst_frame_info: destination framebuffer's metadata
- * @src_frame_info: source framebuffer's metadata
- * @pixel_blend: blending equation based on plane format
+ * @wb_frame_info: The writeback frame buffer metadata
+ * @wb_fmt_func: The format transformation function for the wb buffer
+ * @crtc_state: The crtc state
+ * @plane_fmt_func: The format transformation functions, one per plane
+ * @crc32: The crc output of the final frame
+ * @output_buffer: A buffer of a row that will receive the result of the blend(s)
+ * @stage_buffer: A line buffer that receives each plane's converted pixels
  *
- * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
- * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
- * and clearing alpha channel to an completely opaque background. This function
- * uses buffer's metadata to locate the new composite values at vaddr_dst.
+ * This function blends the pixels (using `pre_mul_alpha_blend`)
+ * from all planes, calculates the crc32 of the output from the former step,
+ * and, if necessary, converts and stores the output to the writeback buffer.
  *
  * TODO: completely clear the primary plane (a = 0xff) before starting to blend
  * pixel color values
  */
-static void blend(void *vaddr_dst, void *vaddr_src,
-		  struct vkms_frame_info *dst_frame_info,
-		  struct vkms_frame_info *src_frame_info,
-		  void (*pixel_blend)(const u8 *, u8 *))
+static void blend(struct vkms_frame_info *wb_frame_info,
+		  format_transform_func wb_fmt_func,
+		  struct vkms_crtc_state *crtc_state,
+		  format_transform_func *plane_fmt_func,
+		  u32 *crc32, struct line_buffer *stage_buffer,
+		  struct line_buffer *output_buffer, s64 row_size)
 {
-	int i, j, j_dst, i_dst;
-	int offset_src, offset_dst;
-	u8 *pixel_dst, *pixel_src;
-
-	int x_src = src_frame_info->src.x1 >> 16;
-	int y_src = src_frame_info->src.y1 >> 16;
-
-	int x_dst = src_frame_info->dst.x1;
-	int y_dst = src_frame_info->dst.y1;
-	int h_dst = drm_rect_height(&src_frame_info->dst);
-	int w_dst = drm_rect_width(&src_frame_info->dst);
+	struct vkms_plane_state **plane = crtc_state->active_planes;
+	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
+	u32 n_active_planes = crtc_state->num_active_planes;
 
+	int y_src = primary_plane_info->dst.y1;
+	int h_dst = drm_rect_height(&primary_plane_info->dst);
 	int y_limit = y_src + h_dst;
-	int x_limit = x_src + w_dst;
-
-	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
-		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
-			offset_dst = dst_frame_info->offset
-				     + (i_dst * dst_frame_info->pitch)
-				     + (j_dst++ * dst_frame_info->cpp);
-			offset_src = src_frame_info->offset
-				     + (i * src_frame_info->pitch)
-				     + (j * src_frame_info->cpp);
-
-			pixel_src = (u8 *)(vaddr_src + offset_src);
-			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
-			pixel_blend(pixel_src, pixel_dst);
-			/* clearing alpha channel (0xff)*/
-			pixel_dst[3] = 0xff;
+	int y, i;
+
+	for (y = y_src; y < y_limit; y++) {
+		plane_fmt_func[0](primary_plane_info, y, output_buffer);
+
+		/* If there are other planes besides primary, we consider the active
+		 * planes should be in z-order and compose them associatively:
+		 * ((primary <- overlay) <- cursor)
+		 */
+		for (i = 1; i < n_active_planes; i++) {
+			if (!check_y_limit(plane[i]->frame_info, y))
+				continue;
+
+			plane_fmt_func[i](plane[i]->frame_info, y, stage_buffer);
+			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
+					    output_buffer);
 		}
-		i_dst++;
+
+		*crc32 = crc32_le(*crc32, (void *)output_buffer, row_size);
+
+		if (wb_frame_info)
+			wb_fmt_func(wb_frame_info, y, output_buffer);
 	}
 }
 
-static void compose_plane(struct vkms_frame_info *primary_plane_info,
-			  struct vkms_frame_info *plane_frame_info,
-			  void *vaddr_out)
+static void get_format_transform_functions(struct vkms_crtc_state *crtc_state,
+					   format_transform_func plane_funcs[])
 {
-	struct drm_framebuffer *fb = plane_frame_info->fb;
-	void *vaddr;
-	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
+	struct vkms_plane_state **active_planes = crtc_state->active_planes;
+	u32 n_active_planes = crtc_state->num_active_planes, s_fmt;
+	int i;
 
-	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
-		return;
+	for (i = 0; i < n_active_planes; i++) {
+		s_fmt = active_planes[i]->frame_info->fb->format->format;
+		plane_funcs[i] = get_fmt_transform_function(s_fmt);
+	}
+}
 
-	vaddr = plane_frame_info->map[0].vaddr;
+static bool check_planes_x_bounds(struct vkms_crtc_state *crtc_state,
+				  struct vkms_frame_info *wb_frame_info)
+{
+	struct vkms_plane_state **planes = crtc_state->active_planes;
+	struct vkms_frame_info *primary_plane_info = planes[0]->frame_info;
+	int line_width = drm_rect_width(&primary_plane_info->dst);
+	u32 n_active_planes = crtc_state->num_active_planes;
+	int i;
 
-	if (fb->format->format == DRM_FORMAT_ARGB8888)
-		pixel_blend = &alpha_blend;
-	else
-		pixel_blend = &x_blend;
+	for (i = 0; i < n_active_planes; i++) {
+		int x_dst = planes[i]->frame_info->dst.x1;
+		int x_src = planes[i]->frame_info->src.x1 >> 16;
+		int x2_src = planes[i]->frame_info->src.x2 >> 16;
+		int x_limit = drm_rect_width(&planes[i]->frame_info->dst);
 
-	blend(vaddr_out, vaddr, primary_plane_info,
-	      plane_frame_info, pixel_blend);
+		if (x_dst + x_limit > line_width)
+			return false;
+		if (x_src + x_limit > x2_src)
+			return false;
+	}
+
+	return true;
 }
 
-static int compose_active_planes(void **vaddr_out,
-				 struct vkms_frame_info *primary_plane_info,
-				 struct vkms_crtc_state *crtc_state)
+static int compose_active_planes(struct vkms_frame_info *wb_frame_info,
+				 struct vkms_crtc_state *crtc_state,
+				 u32 *crc32)
 {
-	struct drm_framebuffer *fb = primary_plane_info->fb;
-	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
-	const void *vaddr;
-	int i;
+	format_transform_func plane_funcs[NUM_OVERLAY_PLANES + 2], wb_func = NULL;
+	int line_width, ret = 0, pixel_size = sizeof(struct line_buffer);
+	struct vkms_frame_info *primary_plane_info = NULL;
+	struct line_buffer *output_buffer, *stage_buffer;
+	struct vkms_plane_state *act_plane = NULL;
+	u32 wb_format;
 
-	if (!*vaddr_out) {
-		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
-		if (!*vaddr_out) {
-			DRM_ERROR("Cannot allocate memory for output frame.");
-			return -ENOMEM;
-		}
+	if (WARN_ON(pixel_size != 8))
+		return -EINVAL;
+
+	if (crtc_state->num_active_planes >= 1) {
+		act_plane = crtc_state->active_planes[0];
+		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
+			primary_plane_info = act_plane->frame_info;
 	}
 
+	if (!primary_plane_info)
+		return -EINVAL;
+
 	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
 		return -EINVAL;
 
-	vaddr = primary_plane_info->map[0].vaddr;
+	if (WARN_ON(!check_planes_x_bounds(crtc_state, wb_frame_info)))
+		return -EINVAL;
 
-	memcpy(*vaddr_out, vaddr, gem_obj->size);
+	line_width = drm_rect_width(&primary_plane_info->dst);
 
-	/* If there are other planes besides primary, we consider the active
-	 * planes should be in z-order and compose them associatively:
-	 * ((primary <- overlay) <- cursor)
-	 */
-	for (i = 1; i < crtc_state->num_active_planes; i++)
-		compose_plane(primary_plane_info,
-			      crtc_state->active_planes[i]->frame_info,
-			      *vaddr_out);
+	stage_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
+	if (!stage_buffer) {
+		DRM_ERROR("Cannot allocate memory for the intermediate line buffer\n");
+		return -ENOMEM;
+	}
+
+	output_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
+	if (!output_buffer) {
+		DRM_ERROR("Cannot allocate memory for the output line buffer\n");
+		ret = -ENOMEM;
+		goto free_stage_buffer;
+	}
+	}
+
+	get_format_transform_functions(crtc_state, plane_funcs);
+
+	if (wb_frame_info) {
+		wb_format = wb_frame_info->fb->format->format;
+		wb_func = get_wb_fmt_transform_function(wb_format);
+		wb_frame_info->src = primary_plane_info->src;
+		wb_frame_info->dst = primary_plane_info->dst;
+	}
+
+	blend(wb_frame_info, wb_func, crtc_state, plane_funcs, crc32,
+	      stage_buffer, output_buffer, (s64)line_width * pixel_size);
+
+	kvfree(output_buffer);
+free_stage_buffer:
+	kvfree(stage_buffer);
+
+	return ret;
 }
 
 /**
@@ -222,13 +230,12 @@ void vkms_composer_worker(struct work_struct *work)
 						struct vkms_crtc_state,
 						composer_work);
 	struct drm_crtc *crtc = crtc_state->base.crtc;
+	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
+	struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
 	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
-	struct vkms_frame_info *primary_plane_info = NULL;
-	struct vkms_plane_state *act_plane = NULL;
 	bool crc_pending, wb_pending;
-	void *vaddr_out = NULL;
-	u32 crc32 = 0;
 	u64 frame_start, frame_end;
+	u32 crc32 = 0;
 	int ret;
 
 	spin_lock_irq(&out->composer_lock);
@@ -248,35 +255,19 @@ void vkms_composer_worker(struct work_struct *work)
 	if (!crc_pending)
 		return;
 
-	if (crtc_state->num_active_planes >= 1) {
-		act_plane = crtc_state->active_planes[0];
-		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
-			primary_plane_info = act_plane->frame_info;
-	}
-
-	if (!primary_plane_info)
-		return;
-
 	if (wb_pending)
-		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
+		ret = compose_active_planes(wb_frame_info, crtc_state, &crc32);
+	else
+		ret = compose_active_planes(NULL, crtc_state, &crc32);
 
-	ret = compose_active_planes(&vaddr_out, primary_plane_info,
-				    crtc_state);
-	if (ret) {
-		if (ret == -EINVAL && !wb_pending)
-			kvfree(vaddr_out);
+	if (ret)
 		return;
-	}
-
-	crc32 = compute_crc(vaddr_out, primary_plane_info);
 
 	if (wb_pending) {
 		drm_writeback_signal_completion(&out->wb_connector, 0);
 		spin_lock_irq(&out->composer_lock);
 		crtc_state->wb_pending = false;
 		spin_unlock_irq(&out->composer_lock);
-	} else {
-		kvfree(vaddr_out);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
new file mode 100644
index 000000000000..0d1838d1b835
--- /dev/null
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <drm/drm_rect.h>
+#include "vkms_formats.h"
+
+format_transform_func get_fmt_transform_function(u32 format)
+{
+	if (format == DRM_FORMAT_ARGB8888)
+		return &ARGB8888_to_ARGB16161616;
+	else
+		return &XRGB8888_to_ARGB16161616;
+}
+
+format_transform_func get_wb_fmt_transform_function(u32 format)
+{
+	if (format == DRM_FORMAT_ARGB8888)
+		return &convert_to_ARGB8888;
+	else
+		return &convert_to_XRGB8888;
+}
+
+static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
+{
+	return frame_info->offset + (y * frame_info->pitch)
+				  + (x * frame_info->cpp);
+}
+
+/*
+ * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
+ *
+ * @frame_info: Buffer metadata
+ * @x: The x (width) coordinate of the 2D buffer
+ * @y: The y (height) coordinate of the 2D buffer
+ *
+ * Takes the information stored in the frame_info, a pair of coordinates, and
+ * returns the address of the first color channel.
+ * This function assumes the channels are packed together, i.e. a color channel
+ * comes immediately after another in the memory. And therefore, this function
+ * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
+ */
+static void *packed_pixels_addr(struct vkms_frame_info *frame_info, int x, int y)
+{
+	int offset = pixel_offset(frame_info, x, y);
+
+	return (u8 *)frame_info->map[0].vaddr + offset;
+}
+
+static void *get_packed_src_addr(struct vkms_frame_info *frame_info, int y)
+{
+	int x_src = frame_info->src.x1 >> 16;
+	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
+
+	return packed_pixels_addr(frame_info, x_src, y_src);
+}
+
+void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+			      struct line_buffer *stage_buffer)
+{
+	u8 *src_pixels = get_packed_src_addr(frame_info, y);
+	int x, x_limit = drm_rect_width(&frame_info->dst);
+
+	for (x = 0; x < x_limit; x++, src_pixels += 4) {
+		/*
+		 * Organizes the channels in their respective positions and converts
+		 * each 8-bit channel to 16 bits.
+		 * 257 is the conversion ratio: (2^16 - 1) / (2^8 - 1). Multiplying
+		 * by it maps 0x00 to 0x0000 and 0xff to 0xffff, spreading the 8-bit
+		 * values evenly over the 16-bit range (x * 257 == (x << 8) | x).
+		 * A similar idea applies to the other RGB color conversions.
+		 */
+		stage_buffer[x].a = (u16)src_pixels[3] * 257;
+		stage_buffer[x].r = (u16)src_pixels[2] * 257;
+		stage_buffer[x].g = (u16)src_pixels[1] * 257;
+		stage_buffer[x].b = (u16)src_pixels[0] * 257;
+	}
+}
+
+void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+			      struct line_buffer *stage_buffer)
+{
+	u8 *src_pixels = get_packed_src_addr(frame_info, y);
+	int x, x_limit = drm_rect_width(&frame_info->dst);
+
+	for (x = 0; x < x_limit; x++, src_pixels += 4) {
+		stage_buffer[x].a = (u16)0xffff;
+		stage_buffer[x].r = (u16)src_pixels[2] * 257;
+		stage_buffer[x].g = (u16)src_pixels[1] * 257;
+		stage_buffer[x].b = (u16)src_pixels[0] * 257;
+	}
+}
+
+/*
+ * The following  functions take an line of ARGB16161616 pixels from the
+ * src_buffer, convert them to a specific format, and store them in the
+ * destination.
+ *
+ * They are used in the `compose_active_planes` to convert and store a line
+ * from the src_buffer to the writeback buffer.
+ */
+void convert_to_ARGB8888(struct vkms_frame_info *frame_info,
+			 int y, struct line_buffer *src_buffer)
+{
+	int x, x_dst = frame_info->dst.x1;
+	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	int x_limit = drm_rect_width(&frame_info->dst);
+
+	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
+		/*
+		 * The order below matters because the format's byte order is
+		 * little-endian. In the case of ARGB8888, the memory is
+		 * organized this way:
+		 *
+		 * | Addr     | = blue channel
+		 * | Addr + 1 | = green channel
+		 * | Addr + 2 | = red channel
+		 * | Addr + 3 | = alpha channel
+		 */
+		dst_pixels[3] = DIV_ROUND_UP(src_buffer[x].a, 257);
+		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
+		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
+		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
+	}
+}
+
+void convert_to_XRGB8888(struct vkms_frame_info *frame_info,
+			 int y, struct line_buffer *src_buffer)
+{
+	int x, x_dst = frame_info->dst.x1;
+	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	int x_limit = drm_rect_width(&frame_info->dst);
+
+	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
+		dst_pixels[3] = (u8)0xff;
+		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
+		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
+		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
+	}
+}
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
new file mode 100644
index 000000000000..817e8b2124ae
--- /dev/null
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _VKMS_FORMATS_H_
+#define _VKMS_FORMATS_H_
+
+#include "vkms_drv.h"
+
+struct line_buffer {
+	u16 a, r, g, b;
+};
+
+void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+			      struct line_buffer *stage_buffer);
+
+void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+			      struct line_buffer *stage_buffer);
+
+void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
+			 struct line_buffer *src_buffer);
+
+void convert_to_XRGB8888(struct vkms_frame_info *frame_info, int y,
+			 struct line_buffer *src_buffer);
+
+typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
+				      struct line_buffer *buffer);
+
+format_transform_func get_fmt_transform_function(u32 format);
+
+format_transform_func get_wb_fmt_transform_function(u32 format);
+
+#endif /* _VKMS_FORMATS_H_ */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats
  2022-01-21 21:38 [PATCH v4 0/9] Add new formats support to vkms Igor Torrente
                   ` (6 preceding siblings ...)
  2022-01-21 21:38 ` [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats Igor Torrente
@ 2022-01-21 21:38 ` Igor Torrente
  2022-01-21 21:38 ` [PATCH v4 9/9] drm: vkms: Add support to the RGB565 format Igor Torrente
  2022-02-08 11:03 ` [PATCH v4 0/9] Add new formats support to vkms Melissa Wen
  9 siblings, 0 replies; 31+ messages in thread
From: Igor Torrente @ 2022-01-21 21:38 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, dri-devel, ~lkcamp/patches, Igor Torrente

This will be useful for writing tests that depend on these formats.

The ARGB16161616 and XRGB16161616 handlers follow an implementation similar
to the existing 8-bit formats, just adjusted for 16 bits per channel.
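The patch reads these pixels through a u16 pointer. That gives the intended
channel order on little-endian hosts. The sketch below shows the memory
layout. It also shows an endian-independent alternative that assembles
each channel from bytes; the patch does not do this. The helper names and
`struct argb_u16` are hypothetical:

```c
#include <stdint.h>

/* Hypothetical mirror of the patch's 16-bit-per-channel pixel. */
struct argb_u16 {
	uint16_t a, r, g, b;
};

/* Assemble a little-endian u16 from bytes, independent of host byte order. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * DRM_FORMAT_ARGB16161616 is [63:0] A:R:G:B 16:16:16:16 little endian, so in
 * memory: bytes 0-1 = B, 2-3 = G, 4-5 = R, 6-7 = A.
 */
static struct argb_u16 argb16161616_to_u16(const uint8_t px[8])
{
	struct argb_u16 out = {
		.b = get_le16(px + 0),
		.g = get_le16(px + 2),
		.r = get_le16(px + 4),
		.a = get_le16(px + 6),
	};

	return out;
}

/* XRGB16161616 has the same layout; the X bits are ignored, alpha is opaque. */
static struct argb_u16 xrgb16161616_to_u16(const uint8_t px[8])
{
	struct argb_u16 out = argb16161616_to_u16(px);

	out.a = 0xffff;
	return out;
}
```

Unlike the 8-bit handlers, no scaling is needed: the channels already have
the width of the intermediate format.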

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
V3: Adapt the handlers to the new format introduced in patch 7 V3.
---
 drivers/gpu/drm/vkms/vkms_formats.c   | 67 +++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_formats.h   | 12 +++++
 drivers/gpu/drm/vkms/vkms_plane.c     |  5 +-
 drivers/gpu/drm/vkms/vkms_writeback.c |  2 +
 4 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 0d1838d1b835..661da39d1276 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -7,6 +7,10 @@ format_transform_func get_fmt_transform_function(u32 format)
 {
 	if (format == DRM_FORMAT_ARGB8888)
 		return &ARGB8888_to_ARGB16161616;
+	else if (format == DRM_FORMAT_ARGB16161616)
+		return &get_ARGB16161616;
+	else if (format == DRM_FORMAT_XRGB16161616)
+		return &XRGB16161616_to_ARGB16161616;
 	else
 		return &XRGB8888_to_ARGB16161616;
 }
@@ -15,6 +19,10 @@ format_transform_func get_wb_fmt_transform_function(u32 format)
 {
 	if (format == DRM_FORMAT_ARGB8888)
 		return &convert_to_ARGB8888;
+	else if (format == DRM_FORMAT_ARGB16161616)
+		return &convert_to_ARGB16161616;
+	else if (format == DRM_FORMAT_XRGB16161616)
+		return &convert_to_XRGB16161616;
 	else
 		return &convert_to_XRGB8888;
 }
@@ -89,6 +97,35 @@ void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
 	}
 }
 
+void get_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+		      struct line_buffer *stage_buffer)
+{
+	u16 *src_pixels = get_packed_src_addr(frame_info, y);
+	int x, x_limit = drm_rect_width(&frame_info->dst);
+
+	for (x = 0; x < x_limit; x++, src_pixels += 4) {
+		stage_buffer[x].a = src_pixels[3];
+		stage_buffer[x].r = src_pixels[2];
+		stage_buffer[x].g = src_pixels[1];
+		stage_buffer[x].b = src_pixels[0];
+	}
+}
+
+void XRGB16161616_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+				  struct line_buffer *stage_buffer)
+{
+	u16 *src_pixels = get_packed_src_addr(frame_info, y);
+	int x, x_limit = drm_rect_width(&frame_info->dst);
+
+	for (x = 0; x < x_limit; x++, src_pixels += 4) {
+		stage_buffer[x].a = (u16)0xffff;
+		stage_buffer[x].r = src_pixels[2];
+		stage_buffer[x].g = src_pixels[1];
+		stage_buffer[x].b = src_pixels[0];
+	}
+}
+
+
 /*
  * The following  functions take an line of ARGB16161616 pixels from the
  * src_buffer, convert them to a specific format, and store them in the
@@ -136,3 +173,33 @@ void convert_to_XRGB8888(struct vkms_frame_info *frame_info,
 		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
 	}
 }
+
+void convert_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+			     struct line_buffer *src_buffer)
+{
+	int x, x_dst = frame_info->dst.x1;
+	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	int x_limit = drm_rect_width(&frame_info->dst);
+
+	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
+		dst_pixels[3] = src_buffer[x].a;
+		dst_pixels[2] = src_buffer[x].r;
+		dst_pixels[1] = src_buffer[x].g;
+		dst_pixels[0] = src_buffer[x].b;
+	}
+}
+
+void convert_to_XRGB16161616(struct vkms_frame_info *frame_info, int y,
+			     struct line_buffer *src_buffer)
+{
+	int x, x_dst = frame_info->dst.x1;
+	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	int x_limit = drm_rect_width(&frame_info->dst);
+
+	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
+		dst_pixels[3] = 0xffff;
+		dst_pixels[2] = src_buffer[x].r;
+		dst_pixels[1] = src_buffer[x].g;
+		dst_pixels[0] = src_buffer[x].b;
+	}
+}
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index 817e8b2124ae..22358f3a33ab 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -15,12 +15,24 @@ void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
 void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
 			      struct line_buffer *stage_buffer);
 
+void get_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+		      struct line_buffer *stage_buffer);
+
+void XRGB16161616_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+				  struct line_buffer *stage_buffer);
+
 void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
 			 struct line_buffer *src_buffer);
 
 void convert_to_XRGB8888(struct vkms_frame_info *frame_info, int y,
 			 struct line_buffer *src_buffer);
 
+void convert_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+			     struct line_buffer *src_buffer);
+
+void convert_to_XRGB16161616(struct vkms_frame_info *frame_info, int y,
+			     struct line_buffer *src_buffer);
+
 typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
 				      struct line_buffer *buffer);
 
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 28752af0118c..1d70c9e8f109 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -13,11 +13,14 @@
 
 static const u32 vkms_formats[] = {
 	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_XRGB16161616
 };
 
 static const u32 vkms_plane_formats[] = {
 	DRM_FORMAT_ARGB8888,
-	DRM_FORMAT_XRGB8888
+	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_XRGB16161616,
+	DRM_FORMAT_ARGB16161616
 };
 
 static struct drm_plane_state *
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index ad4bb1fb37ca..393d3fc7966f 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -14,6 +14,8 @@
 
 static const u32 vkms_wb_formats[] = {
 	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_XRGB16161616,
+	DRM_FORMAT_ARGB16161616
 };
 
 static const struct drm_connector_funcs vkms_wb_connector_funcs = {
-- 
2.30.2


* [PATCH v4 9/9] drm: vkms: Add support to the RGB565 format
  2022-01-21 21:38 [PATCH v4 0/9] Add new formats support to vkms Igor Torrente
                   ` (7 preceding siblings ...)
  2022-01-21 21:38 ` [PATCH v4 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats Igor Torrente
@ 2022-01-21 21:38 ` Igor Torrente
  2022-02-08 10:50   ` Melissa Wen
  2022-02-10  9:50   ` Pekka Paalanen
  2022-02-08 11:03 ` [PATCH v4 0/9] Add new formats support to vkms Melissa Wen
  9 siblings, 2 replies; 31+ messages in thread
From: Igor Torrente @ 2022-01-21 21:38 UTC (permalink / raw)
  To: rodrigosiqueiramelo, melissa.srw, ppaalanen, tzimmermann
  Cc: hamohammed.sa, airlied, dri-devel, ~lkcamp/patches, Igor Torrente

Add support for the common RGB565 format to vkms.

This commit also adds new helper macros for fixed-point arithmetic.

They are used to improve the precision of the conversion to
ARGB16161616, since the "conversion ratio" (65535/31 for the red and
blue channels, 65535/63 for green) is not an integer.
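
To see why the fractional part of the ratio matters, consider the largest
5-bit value, 31. The following is a minimal userspace sketch, not part of
the patch: it mirrors the FP_* macros this patch adds, using <stdint.h>
types instead of the kernel's u64/s32/s64. The helpers expand5_int() and
expand5_fp() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point helpers mirroring the ones added by this patch (17.15 format),
 * with <stdint.h> types in place of the kernel's u64/s32/s64. */
#define FP_SCALE 15
#define LF_TO_FP(a) ((a) * (uint64_t)(1 << FP_SCALE))
#define INT_TO_FP(a) ((a) << FP_SCALE)
#define FP_MUL(a, b) ((int32_t)(((int64_t)(a) * (b)) >> FP_SCALE))
#define FP_TO_INT_ROUND_UP(a) (((a) + (1 << (FP_SCALE - 1))) >> FP_SCALE)

/* Hypothetical helper: widen a 5-bit channel using a truncated integer ratio. */
static uint16_t expand5_int(uint16_t c)
{
	assert(c <= 0x1f);
	return (uint16_t)(c * (65535 / 31)); /* 65535 / 31 truncates to 2114 */
}

/* Hypothetical helper: widen a 5-bit channel using the fixed-point ratio. */
static uint16_t expand5_fp(uint16_t c)
{
	int32_t fp_ratio = (int32_t)LF_TO_FP(2114.032258065);

	assert(c <= 0x1f);
	return (uint16_t)FP_TO_INT_ROUND_UP(FP_MUL(INT_TO_FP((int32_t)c), fp_ratio));
}
```

With these definitions, expand5_int(31) yields 65534 because the integer
ratio loses its fractional part, while expand5_fp(31) reaches the
full-scale 65535. For every input from 0 to 31 the fixed-point path should
agree with rounding c * 65535 / 31 half up.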

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
V3: Adapt the handlers to the new format introduced in patch 7 V3.
---
 drivers/gpu/drm/vkms/vkms_formats.c   | 74 +++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_formats.h   |  6 +++
 drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
 drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
 4 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 661da39d1276..dc612882dd8c 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -11,6 +11,8 @@ format_transform_func get_fmt_transform_function(u32 format)
 		return &get_ARGB16161616;
 	else if (format == DRM_FORMAT_XRGB16161616)
 		return &XRGB16161616_to_ARGB16161616;
+	else if (format == DRM_FORMAT_RGB565)
+		return &RGB565_to_ARGB16161616;
 	else
 		return &XRGB8888_to_ARGB16161616;
 }
@@ -23,6 +25,8 @@ format_transform_func get_wb_fmt_transform_function(u32 format)
 		return &convert_to_ARGB16161616;
 	else if (format == DRM_FORMAT_XRGB16161616)
 		return &convert_to_XRGB16161616;
+	else if (format == DRM_FORMAT_RGB565)
+		return &convert_to_RGB565;
 	else
 		return &convert_to_XRGB8888;
 }
@@ -33,6 +37,26 @@ static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
 				  + (x * frame_info->cpp);
 }
 
+/*
+ * FP stands for _Fixed Point_ and **not** _Floating Point_
+ * LF stands for Long Float (i.e. double)
+ * The following macros help with fixed point arithmetic.
+ */
+/*
+ * With FP scale 15 we have 17 and 15 bits of integer and fractional parts
+ * respectively.
+ *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
+ * 31                                          0
+ */
+#define FP_SCALE 15
+
+#define LF_TO_FP(a) ((a) * (u64)(1 << FP_SCALE))
+#define INT_TO_FP(a) ((a) << FP_SCALE)
+#define FP_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FP_SCALE))
+#define FP_DIV(a, b) ((s32)(((s64)(a) << FP_SCALE) / (b)))
+/* This macro converts a fixed point number to int, rounding half up */
+#define FP_TO_INT_ROUND_UP(a) (((a) + (1 << (FP_SCALE - 1))) >> FP_SCALE)
+
 /*
  * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
  *
@@ -125,6 +149,33 @@ void XRGB16161616_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
 	}
 }
 
+void RGB565_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+			    struct line_buffer *stage_buffer)
+{
+	u16 *src_pixels = get_packed_src_addr(frame_info, y);
+	int x, x_limit = drm_rect_width(&frame_info->dst);
+
+	for (x = 0; x < x_limit; x++, src_pixels++) {
+		u16 rgb_565 = le16_to_cpu(*src_pixels);
+		int fp_r = INT_TO_FP((rgb_565 >> 11) & 0x1f);
+		int fp_g = INT_TO_FP((rgb_565 >> 5) & 0x3f);
+		int fp_b = INT_TO_FP(rgb_565 & 0x1f);
+
+		/*
+		 * The magic constants are the "conversion ratios", calculated by
+		 * dividing 65535 (2^16 - 1) by 31 (2^5 - 1) and 63 (2^6 - 1),
+		 * respectively.
+		 */
+		int fp_rb_ratio = LF_TO_FP(2114.032258065);
+		int fp_g_ratio = LF_TO_FP(1040.238095238);
+
+		stage_buffer[x].a = (u16)0xffff;
+		stage_buffer[x].r = FP_TO_INT_ROUND_UP(FP_MUL(fp_r, fp_rb_ratio));
+		stage_buffer[x].g = FP_TO_INT_ROUND_UP(FP_MUL(fp_g, fp_g_ratio));
+		stage_buffer[x].b = FP_TO_INT_ROUND_UP(FP_MUL(fp_b, fp_rb_ratio));
+	}
+}
+
 
 /*
  * The following  functions take an line of ARGB16161616 pixels from the
@@ -203,3 +254,26 @@ void convert_to_XRGB16161616(struct vkms_frame_info *frame_info, int y,
 		dst_pixels[0] = src_buffer[x].b;
 	}
 }
+
+void convert_to_RGB565(struct vkms_frame_info *frame_info, int y,
+		       struct line_buffer *src_buffer)
+{
+	int x, x_dst = frame_info->dst.x1;
+	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	int x_limit = drm_rect_width(&frame_info->dst);
+
+	for (x = 0; x < x_limit; x++, dst_pixels++) {
+		int fp_r = INT_TO_FP(src_buffer[x].r);
+		int fp_g = INT_TO_FP(src_buffer[x].g);
+		int fp_b = INT_TO_FP(src_buffer[x].b);
+
+		int fp_rb_ratio = LF_TO_FP(2114.032258065);
+		int fp_g_ratio = LF_TO_FP(1040.238095238);
+
+		u16 r = FP_TO_INT_ROUND_UP(FP_DIV(fp_r, fp_rb_ratio));
+		u16 g = FP_TO_INT_ROUND_UP(FP_DIV(fp_g, fp_g_ratio));
+		u16 b = FP_TO_INT_ROUND_UP(FP_DIV(fp_b, fp_rb_ratio));
+
+		*dst_pixels = cpu_to_le16(r << 11 | g << 5 | b);
+	}
+}
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index 22358f3a33ab..836d6e43ea90 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -21,6 +21,9 @@ void get_ARGB16161616(struct vkms_frame_info *frame_info, int y,
 void XRGB16161616_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
 				  struct line_buffer *stage_buffer);
 
+void RGB565_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
+			    struct line_buffer *stage_buffer);
+
 void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
 			 struct line_buffer *src_buffer);
 
@@ -33,6 +36,9 @@ void convert_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
 void convert_to_XRGB16161616(struct vkms_frame_info *frame_info, int y,
 			     struct line_buffer *src_buffer);
 
+void convert_to_RGB565(struct vkms_frame_info *frame_info, int y,
+		       struct line_buffer *src_buffer);
+
 typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
 				      struct line_buffer *buffer);
 
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 1d70c9e8f109..4643eefcdf29 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -13,14 +13,16 @@
 
 static const u32 vkms_formats[] = {
 	DRM_FORMAT_XRGB8888,
-	DRM_FORMAT_XRGB16161616
+	DRM_FORMAT_XRGB16161616,
+	DRM_FORMAT_RGB565
 };
 
 static const u32 vkms_plane_formats[] = {
 	DRM_FORMAT_ARGB8888,
 	DRM_FORMAT_XRGB8888,
 	DRM_FORMAT_XRGB16161616,
-	DRM_FORMAT_ARGB16161616
+	DRM_FORMAT_ARGB16161616,
+	DRM_FORMAT_RGB565
 };
 
 static struct drm_plane_state *
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index 393d3fc7966f..1aaa630090d3 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -15,7 +15,8 @@
 static const u32 vkms_wb_formats[] = {
 	DRM_FORMAT_XRGB8888,
 	DRM_FORMAT_XRGB16161616,
-	DRM_FORMAT_ARGB16161616
+	DRM_FORMAT_ARGB16161616,
+	DRM_FORMAT_RGB565
 };
 
 static const struct drm_connector_funcs vkms_wb_connector_funcs = {
-- 
2.30.2

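The RGB565 conversions in this patch can be checked exhaustively outside
the kernel. The sketch below is a userspace approximation, not the vkms
code itself: it copies the FP_* macros and the per-channel math of
RGB565_to_ARGB16161616() and convert_to_RGB565(), but works on host-endian
uint16_t values instead of le16 framebuffer memory. The names struct
argb16, rgb565_to_argb16(), argb16_to_rgb565() and
rgb565_roundtrip_mismatches() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* FP_* helpers copied from the patch (17.15 fixed point), with <stdint.h>
 * types in place of the kernel's u64/s32/s64. */
#define FP_SCALE 15
#define LF_TO_FP(a) ((a) * (uint64_t)(1 << FP_SCALE))
#define INT_TO_FP(a) ((a) << FP_SCALE)
#define FP_MUL(a, b) ((int32_t)(((int64_t)(a) * (b)) >> FP_SCALE))
#define FP_DIV(a, b) ((int32_t)(((int64_t)(a) << FP_SCALE) / (b)))
#define FP_TO_INT_ROUND_UP(a) (((a) + (1 << (FP_SCALE - 1))) >> FP_SCALE)

/* Hypothetical stand-in for one struct line_buffer entry. */
struct argb16 {
	uint16_t a, r, g, b;
};

/* Same per-channel math as RGB565_to_ARGB16161616(), on a host-endian value. */
static struct argb16 rgb565_to_argb16(uint16_t px)
{
	int32_t fp_rb_ratio = (int32_t)LF_TO_FP(2114.032258065);
	int32_t fp_g_ratio = (int32_t)LF_TO_FP(1040.238095238);
	struct argb16 out;

	out.a = 0xffff;
	out.r = FP_TO_INT_ROUND_UP(FP_MUL(INT_TO_FP((px >> 11) & 0x1f), fp_rb_ratio));
	out.g = FP_TO_INT_ROUND_UP(FP_MUL(INT_TO_FP((px >> 5) & 0x3f), fp_g_ratio));
	out.b = FP_TO_INT_ROUND_UP(FP_MUL(INT_TO_FP(px & 0x1f), fp_rb_ratio));
	return out;
}

/* Same per-channel math as convert_to_RGB565(); alpha is dropped. */
static uint16_t argb16_to_rgb565(struct argb16 p)
{
	int32_t fp_rb_ratio = (int32_t)LF_TO_FP(2114.032258065);
	int32_t fp_g_ratio = (int32_t)LF_TO_FP(1040.238095238);
	uint16_t r = FP_TO_INT_ROUND_UP(FP_DIV(INT_TO_FP(p.r), fp_rb_ratio));
	uint16_t g = FP_TO_INT_ROUND_UP(FP_DIV(INT_TO_FP(p.g), fp_g_ratio));
	uint16_t b = FP_TO_INT_ROUND_UP(FP_DIV(INT_TO_FP(p.b), fp_rb_ratio));

	/* Rounding must never push a channel past its 5- or 6-bit field. */
	assert(r <= 0x1f && g <= 0x3f && b <= 0x1f);
	return (uint16_t)(r << 11 | g << 5 | b);
}

/* Count RGB565 values that do not survive RGB565 -> ARGB16161616 -> RGB565. */
static int rgb565_roundtrip_mismatches(void)
{
	int mismatches = 0;

	for (uint32_t px = 0; px <= 0xffff; px++)
		if (argb16_to_rgb565(rgb565_to_argb16((uint16_t)px)) != px)
			mismatches++;
	return mismatches;
}
```

Under these assumptions rgb565_roundtrip_mismatches() returns 0: widening
each channel lands within half a unit of c * 65535 / 31 (or / 63), so
dividing back by the same ratio and rounding recovers the original 5- or
6-bit value.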

* Re: [PATCH v4 1/9] drm: vkms: Replace the deprecated drm_mode_config_init
  2022-01-21 21:38 ` [PATCH v4 1/9] drm: vkms: Replace the deprecated drm_mode_config_init Igor Torrente
@ 2022-02-08 10:02   ` Melissa Wen
  0 siblings, 0 replies; 31+ messages in thread
From: Melissa Wen @ 2022-02-08 10:02 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	melissa.srw, tzimmermann, ~lkcamp/patches

On 01/21, Igor Torrente wrote:
> `drm_mode_config_init` is deprecated since commit c3b790ea07a1 ("drm: Manage
> drm_mode_config_init with drmm_") in favor of `drmm_mode_config_init`. Update
> the former to the latter.
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
> V2: Change the code style(Thomas Zimmermann).
> 
> V4: Update the commit message(Nícolas F. R. A. Prado)
> ---
>  drivers/gpu/drm/vkms/vkms_drv.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.c b/drivers/gpu/drm/vkms/vkms_drv.c
> index 0ffe5f0e33f7..ee4d96dabe19 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.c
> +++ b/drivers/gpu/drm/vkms/vkms_drv.c
> @@ -140,8 +140,12 @@ static const struct drm_mode_config_helper_funcs vkms_mode_config_helpers = {
>  static int vkms_modeset_init(struct vkms_device *vkmsdev)
>  {
>  	struct drm_device *dev = &vkmsdev->drm;
> +	int ret;
> +
> +	ret = drmm_mode_config_init(dev);
> +	if (ret < 0)
> +		return ret;
lgtm.

Reviewed-by: Melissa Wen <mwen@igalia.com>
>  
> -	drm_mode_config_init(dev);
>  	dev->mode_config.funcs = &vkms_mode_funcs;
>  	dev->mode_config.min_width = XRES_MIN;
>  	dev->mode_config.min_height = YRES_MIN;
> -- 
> 2.30.2
> 

* Re: [PATCH v4 2/9] drm: vkms: Alloc the compose frame using vzalloc
  2022-01-21 21:38 ` [PATCH v4 2/9] drm: vkms: Alloc the compose frame using vzalloc Igor Torrente
@ 2022-02-08 10:14   ` Melissa Wen
  0 siblings, 0 replies; 31+ messages in thread
From: Melissa Wen @ 2022-02-08 10:14 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	melissa.srw, tzimmermann, ~lkcamp/patches

On 01/21, Igor Torrente wrote:
> Currently, the memory for the composition frame is allocated using
> kzalloc. This limits the allocation to the maximum size kmalloc can
> serve (KMALLOC_MAX_SIZE, which is 4MB on x86_64 with 4KB pages).
> 
> Some IGT tests (e.g. kms_plane@pixel-format) use more than 4MB when
> testing pixel formats such as ARGB16161616.
... And the following error was showing up when running
kms_plane@plane-panning-bottom-right*:

[drm:vkms_composer_worker [vkms]] *ERROR* Cannot allocate memory for
output frame. 
> 
> This problem is addressed by allocating the memory with kvzalloc, which
> falls back to vmalloc and so circumvents this limitation.

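The size limit can be made concrete with a little arithmetic. This is a
minimal sketch under stated assumptions: a 4MB kzalloc ceiling (the x86_64
figure quoted in the commit message), a hypothetical 1024x768 framebuffer,
and a tightly packed single plane with no pitch padding. The names
ASSUMED_KMALLOC_MAX, frame_bytes() and fits_in_kmalloc() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the 4MB kzalloc ceiling quoted in the commit message. */
#define ASSUMED_KMALLOC_MAX ((size_t)4 << 20)

/* Bytes for one tightly packed, single-plane frame: width * height * cpp. */
static size_t frame_bytes(size_t width, size_t height, size_t cpp)
{
	assert(cpp > 0);
	return width * height * cpp;
}

/* Whether a buffer of this size stays under the assumed kmalloc ceiling. */
static int fits_in_kmalloc(size_t bytes)
{
	return bytes <= ASSUMED_KMALLOC_MAX;
}
```

At 1024x768, XRGB8888 (4 bytes per pixel) needs 3145728 bytes and fits,
while ARGB16161616 (8 bytes per pixel) needs 6291456 bytes and does not,
which is why the 16-bit-per-channel formats need kvzalloc.
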
With this patch, can you also drop these debugging issues from the VKMS
TO-DO list [1], please?

Thanks,

Reviewed-by: Melissa Wen <mwen@igalia.com>

[1] https://dri.freedesktop.org/docs/drm/gpu/vkms.html#igt-better-support
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 9e8204be9a14..82f79e508f81 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -180,7 +180,7 @@ static int compose_active_planes(void **vaddr_out,
>  	int i;
>  
>  	if (!*vaddr_out) {
> -		*vaddr_out = kzalloc(gem_obj->size, GFP_KERNEL);
> +		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
>  		if (!*vaddr_out) {
>  			DRM_ERROR("Cannot allocate memory for output frame.");
>  			return -ENOMEM;
> @@ -263,7 +263,7 @@ void vkms_composer_worker(struct work_struct *work)
>  				    crtc_state);
>  	if (ret) {
>  		if (ret == -EINVAL && !wb_pending)
> -			kfree(vaddr_out);
> +			kvfree(vaddr_out);
>  		return;
>  	}
>  
> @@ -275,7 +275,7 @@ void vkms_composer_worker(struct work_struct *work)
>  		crtc_state->wb_pending = false;
>  		spin_unlock_irq(&out->composer_lock);
>  	} else {
> -		kfree(vaddr_out);
> +		kvfree(vaddr_out);
>  	}
>  
>  	/*
> -- 
> 2.30.2
> 

* Re: [PATCH v4 3/9] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES
  2022-01-21 21:38 ` [PATCH v4 3/9] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES Igor Torrente
@ 2022-02-08 10:16   ` Melissa Wen
  0 siblings, 0 replies; 31+ messages in thread
From: Melissa Wen @ 2022-02-08 10:16 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	melissa.srw, tzimmermann, ~lkcamp/patches

On 01/21, Igor Torrente wrote:
> The `map` array in `vkms_composer` uses a hardcoded value to define its
> size.
> 
> If the maximum number of planes increases someday, this hardcoded value
> could become a problem.
> 
> Replace this value with the DRM_FORMAT_MAX_PLANES macro.
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  drivers/gpu/drm/vkms/vkms_drv.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 9496fdc900b8..0eeea6f93733 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -30,7 +30,7 @@ struct vkms_writeback_job {
>  struct vkms_composer {
>  	struct drm_framebuffer fb;
>  	struct drm_rect src, dst;
> -	struct dma_buf_map map[4];
> +	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
Reviewed-by: Melissa Wen <mwen@igalia.com>
>  	unsigned int offset;
>  	unsigned int pitch;
>  	unsigned int cpp;
> -- 
> 2.30.2
> 

* Re: [PATCH v4 4/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info`
  2022-01-21 21:38 ` [PATCH v4 4/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info` Igor Torrente
@ 2022-02-08 10:20   ` Melissa Wen
  0 siblings, 0 replies; 31+ messages in thread
From: Melissa Wen @ 2022-02-08 10:20 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	melissa.srw, tzimmermann, ~lkcamp/patches

On 01/21, Igor Torrente wrote:
> Change the name of this struct to a more meaningful one that better
> represents what the struct is about.
> 
> Composer is the code that does the compositing of the planes.
> This struct is contains information of the frame that is
> being used in the output composition. Thus, vkms_frame_info
> is a better name to represent this.
Typo. Maybe this:
`This struct contains information on the frame used in the output
composition`

Anyway, this change makes sense to me.
Reviewed-by: Melissa Wen <mwen@igalia.com>
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c | 87 ++++++++++++++--------------
>  drivers/gpu/drm/vkms/vkms_drv.h      |  6 +-
>  drivers/gpu/drm/vkms/vkms_plane.c    | 38 ++++++------
>  3 files changed, 66 insertions(+), 65 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 82f79e508f81..2d946368a561 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -11,11 +11,11 @@
>  #include "vkms_drv.h"
>  
>  static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> -				 const struct vkms_composer *composer)
> +				 const struct vkms_frame_info *frame_info)
>  {
>  	u32 pixel;
> -	int src_offset = composer->offset + (y * composer->pitch)
> -				      + (x * composer->cpp);
> +	int src_offset = frame_info->offset + (y * frame_info->pitch)
> +					    + (x * frame_info->cpp);
>  
>  	pixel = *(u32 *)&buffer[src_offset];
>  
> @@ -26,24 +26,24 @@ static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
>   * compute_crc - Compute CRC value on output frame
>   *
>   * @vaddr: address to final framebuffer
> - * @composer: framebuffer's metadata
> + * @frame_info: framebuffer's metadata
>   *
>   * returns CRC value computed using crc32 on the visible portion of
>   * the final framebuffer at vaddr_out
>   */
>  static uint32_t compute_crc(const u8 *vaddr,
> -			    const struct vkms_composer *composer)
> +			    const struct vkms_frame_info *frame_info)
>  {
>  	int x, y;
>  	u32 crc = 0, pixel = 0;
> -	int x_src = composer->src.x1 >> 16;
> -	int y_src = composer->src.y1 >> 16;
> -	int h_src = drm_rect_height(&composer->src) >> 16;
> -	int w_src = drm_rect_width(&composer->src) >> 16;
> +	int x_src = frame_info->src.x1 >> 16;
> +	int y_src = frame_info->src.y1 >> 16;
> +	int h_src = drm_rect_height(&frame_info->src) >> 16;
> +	int w_src = drm_rect_width(&frame_info->src) >> 16;
>  
>  	for (y = y_src; y < y_src + h_src; ++y) {
>  		for (x = x_src; x < x_src + w_src; ++x) {
> -			pixel = get_pixel_from_buffer(x, y, vaddr, composer);
> +			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
>  			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
>  		}
>  	}
> @@ -98,8 +98,8 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>   * blend - blend value at vaddr_src with value at vaddr_dst
>   * @vaddr_dst: destination address
>   * @vaddr_src: source address
> - * @dst_composer: destination framebuffer's metadata
> - * @src_composer: source framebuffer's metadata
> + * @dst_frame_info: destination framebuffer's metadata
> + * @src_frame_info: source framebuffer's metadata
>   * @pixel_blend: blending equation based on plane format
>   *
>   * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
> @@ -111,33 +111,33 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>   * pixel color values
>   */
>  static void blend(void *vaddr_dst, void *vaddr_src,
> -		  struct vkms_composer *dst_composer,
> -		  struct vkms_composer *src_composer,
> +		  struct vkms_frame_info *dst_frame_info,
> +		  struct vkms_frame_info *src_frame_info,
>  		  void (*pixel_blend)(const u8 *, u8 *))
>  {
>  	int i, j, j_dst, i_dst;
>  	int offset_src, offset_dst;
>  	u8 *pixel_dst, *pixel_src;
>  
> -	int x_src = src_composer->src.x1 >> 16;
> -	int y_src = src_composer->src.y1 >> 16;
> +	int x_src = src_frame_info->src.x1 >> 16;
> +	int y_src = src_frame_info->src.y1 >> 16;
>  
> -	int x_dst = src_composer->dst.x1;
> -	int y_dst = src_composer->dst.y1;
> -	int h_dst = drm_rect_height(&src_composer->dst);
> -	int w_dst = drm_rect_width(&src_composer->dst);
> +	int x_dst = src_frame_info->dst.x1;
> +	int y_dst = src_frame_info->dst.y1;
> +	int h_dst = drm_rect_height(&src_frame_info->dst);
> +	int w_dst = drm_rect_width(&src_frame_info->dst);
>  
>  	int y_limit = y_src + h_dst;
>  	int x_limit = x_src + w_dst;
>  
>  	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
>  		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
> -			offset_dst = dst_composer->offset
> -				     + (i_dst * dst_composer->pitch)
> -				     + (j_dst++ * dst_composer->cpp);
> -			offset_src = src_composer->offset
> -				     + (i * src_composer->pitch)
> -				     + (j * src_composer->cpp);
> +			offset_dst = dst_frame_info->offset
> +				     + (i_dst * dst_frame_info->pitch)
> +				     + (j_dst++ * dst_frame_info->cpp);
> +			offset_src = src_frame_info->offset
> +				     + (i * src_frame_info->pitch)
> +				     + (j * src_frame_info->cpp);
>  
>  			pixel_src = (u8 *)(vaddr_src + offset_src);
>  			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
> @@ -149,32 +149,33 @@ static void blend(void *vaddr_dst, void *vaddr_src,
>  	}
>  }
>  
> -static void compose_plane(struct vkms_composer *primary_composer,
> -			  struct vkms_composer *plane_composer,
> +static void compose_plane(struct vkms_frame_info *primary_plane_info,
> +			  struct vkms_frame_info *plane_frame_info,
>  			  void *vaddr_out)
>  {
> -	struct drm_framebuffer *fb = &plane_composer->fb;
> +	struct drm_framebuffer *fb = &plane_frame_info->fb;
>  	void *vaddr;
>  	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>  
> -	if (WARN_ON(dma_buf_map_is_null(&primary_composer->map[0])))
> +	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>  		return;
>  
> -	vaddr = plane_composer->map[0].vaddr;
> +	vaddr = plane_frame_info->map[0].vaddr;
>  
>  	if (fb->format->format == DRM_FORMAT_ARGB8888)
>  		pixel_blend = &alpha_blend;
>  	else
>  		pixel_blend = &x_blend;
>  
> -	blend(vaddr_out, vaddr, primary_composer, plane_composer, pixel_blend);
> +	blend(vaddr_out, vaddr, primary_plane_info,
> +	      plane_frame_info, pixel_blend);
>  }
>  
>  static int compose_active_planes(void **vaddr_out,
> -				 struct vkms_composer *primary_composer,
> +				 struct vkms_frame_info *primary_plane_info,
>  				 struct vkms_crtc_state *crtc_state)
>  {
> -	struct drm_framebuffer *fb = &primary_composer->fb;
> +	struct drm_framebuffer *fb = &primary_plane_info->fb;
>  	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>  	const void *vaddr;
>  	int i;
> @@ -187,10 +188,10 @@ static int compose_active_planes(void **vaddr_out,
>  		}
>  	}
>  
> -	if (WARN_ON(dma_buf_map_is_null(&primary_composer->map[0])))
> +	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>  		return -EINVAL;
>  
> -	vaddr = primary_composer->map[0].vaddr;
> +	vaddr = primary_plane_info->map[0].vaddr;
>  
>  	memcpy(*vaddr_out, vaddr, gem_obj->size);
>  
> @@ -199,8 +200,8 @@ static int compose_active_planes(void **vaddr_out,
>  	 * ((primary <- overlay) <- cursor)
>  	 */
>  	for (i = 1; i < crtc_state->num_active_planes; i++)
> -		compose_plane(primary_composer,
> -			      crtc_state->active_planes[i]->composer,
> +		compose_plane(primary_plane_info,
> +			      crtc_state->active_planes[i]->frame_info,
>  			      *vaddr_out);
>  
>  	return 0;
> @@ -222,7 +223,7 @@ void vkms_composer_worker(struct work_struct *work)
>  						composer_work);
>  	struct drm_crtc *crtc = crtc_state->base.crtc;
>  	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
> -	struct vkms_composer *primary_composer = NULL;
> +	struct vkms_frame_info *primary_plane_info = NULL;
>  	struct vkms_plane_state *act_plane = NULL;
>  	bool crc_pending, wb_pending;
>  	void *vaddr_out = NULL;
> @@ -250,16 +251,16 @@ void vkms_composer_worker(struct work_struct *work)
>  	if (crtc_state->num_active_planes >= 1) {
>  		act_plane = crtc_state->active_planes[0];
>  		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> -			primary_composer = act_plane->composer;
> +			primary_plane_info = act_plane->frame_info;
>  	}
>  
> -	if (!primary_composer)
> +	if (!primary_plane_info)
>  		return;
>  
>  	if (wb_pending)
>  		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
>  
> -	ret = compose_active_planes(&vaddr_out, primary_composer,
> +	ret = compose_active_planes(&vaddr_out, primary_plane_info,
>  				    crtc_state);
>  	if (ret) {
>  		if (ret == -EINVAL && !wb_pending)
> @@ -267,7 +268,7 @@ void vkms_composer_worker(struct work_struct *work)
>  		return;
>  	}
>  
> -	crc32 = compute_crc(vaddr_out, primary_composer);
> +	crc32 = compute_crc(vaddr_out, primary_plane_info);
>  
>  	if (wb_pending) {
>  		drm_writeback_signal_completion(&out->wb_connector, 0);
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 0eeea6f93733..2e6342164bef 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -27,7 +27,7 @@ struct vkms_writeback_job {
>  	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
>  };
>  
> -struct vkms_composer {
> +struct vkms_frame_info {
>  	struct drm_framebuffer fb;
>  	struct drm_rect src, dst;
>  	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
> @@ -39,11 +39,11 @@ struct vkms_composer {
>  /**
>   * vkms_plane_state - Driver specific plane state
>   * @base: base plane state
> - * @composer: data required for composing computation
> + * @frame_info: data required for composing computation
>   */
>  struct vkms_plane_state {
>  	struct drm_shadow_plane_state base;
> -	struct vkms_composer *composer;
> +	struct vkms_frame_info *frame_info;
>  };
>  
>  struct vkms_plane {
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 32409e15244b..a56b0f76eddd 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -24,20 +24,20 @@ static struct drm_plane_state *
>  vkms_plane_duplicate_state(struct drm_plane *plane)
>  {
>  	struct vkms_plane_state *vkms_state;
> -	struct vkms_composer *composer;
> +	struct vkms_frame_info *frame_info;
>  
>  	vkms_state = kzalloc(sizeof(*vkms_state), GFP_KERNEL);
>  	if (!vkms_state)
>  		return NULL;
>  
> -	composer = kzalloc(sizeof(*composer), GFP_KERNEL);
> -	if (!composer) {
> -		DRM_DEBUG_KMS("Couldn't allocate composer\n");
> +	frame_info = kzalloc(sizeof(*frame_info), GFP_KERNEL);
> +	if (!frame_info) {
> +		DRM_DEBUG_KMS("Couldn't allocate frame_info\n");
>  		kfree(vkms_state);
>  		return NULL;
>  	}
>  
> -	vkms_state->composer = composer;
> +	vkms_state->frame_info = frame_info;
>  
>  	__drm_gem_duplicate_shadow_plane_state(plane, &vkms_state->base);
>  
> @@ -54,12 +54,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
>  		/* dropping the reference we acquired in
>  		 * vkms_primary_plane_update()
>  		 */
> -		if (drm_framebuffer_read_refcount(&vkms_state->composer->fb))
> -			drm_framebuffer_put(&vkms_state->composer->fb);
> +		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
> +			drm_framebuffer_put(&vkms_state->frame_info->fb);
>  	}
>  
> -	kfree(vkms_state->composer);
> -	vkms_state->composer = NULL;
> +	kfree(vkms_state->frame_info);
> +	vkms_state->frame_info = NULL;
>  
>  	__drm_gem_destroy_shadow_plane_state(&vkms_state->base);
>  	kfree(vkms_state);
> @@ -99,7 +99,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	struct vkms_plane_state *vkms_plane_state;
>  	struct drm_shadow_plane_state *shadow_plane_state;
>  	struct drm_framebuffer *fb = new_state->fb;
> -	struct vkms_composer *composer;
> +	struct vkms_frame_info *frame_info;
>  
>  	if (!new_state->crtc || !fb)
>  		return;
> @@ -107,15 +107,15 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	vkms_plane_state = to_vkms_plane_state(new_state);
>  	shadow_plane_state = &vkms_plane_state->base;
>  
> -	composer = vkms_plane_state->composer;
> -	memcpy(&composer->src, &new_state->src, sizeof(struct drm_rect));
> -	memcpy(&composer->dst, &new_state->dst, sizeof(struct drm_rect));
> -	memcpy(&composer->fb, fb, sizeof(struct drm_framebuffer));
> -	memcpy(&composer->map, &shadow_plane_state->data, sizeof(composer->map));
> -	drm_framebuffer_get(&composer->fb);
> -	composer->offset = fb->offsets[0];
> -	composer->pitch = fb->pitches[0];
> -	composer->cpp = fb->format->cpp[0];
> +	frame_info = vkms_plane_state->frame_info;
> +	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
> +	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
> +	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
> +	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
> +	drm_framebuffer_get(&frame_info->fb);
> +	frame_info->offset = fb->offsets[0];
> +	frame_info->pitch = fb->pitches[0];
> +	frame_info->cpp = fb->format->cpp[0];
>  }
>  
>  static int vkms_plane_atomic_check(struct drm_plane *plane,
> -- 
> 2.30.2
> 

* Re: [PATCH v4 5/9] drm: vkms: Add fb information to `vkms_writeback_job`
  2022-01-21 21:38 ` [PATCH v4 5/9] drm: vkms: Add fb information to `vkms_writeback_job` Igor Torrente
@ 2022-02-08 10:22   ` Melissa Wen
  0 siblings, 0 replies; 31+ messages in thread
From: Melissa Wen @ 2022-02-08 10:22 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	melissa.srw, tzimmermann, ~lkcamp/patches

On 01/21, Igor Torrente wrote:
> This commit is the groundwork for introducing new formats to the planes
> and the writeback buffer. As part of it, a new buffer metadata field is
> added to `vkms_writeback_job`; this metadata is represented by the
> `vkms_frame_info` struct.
> 
> This will allow the composition and writeback buffers to use different
> formats in the future.
lgtm.

Reviewed-by: Melissa Wen <mwen@igalia.com>
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
> V2: Change the code to get the drm_framebuffer reference and not copy its
>     contents(Thomas Zimmermann).
> 
> V3: Drop the refcount in the wb code(Thomas Zimmermann).
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c  |  4 ++--
>  drivers/gpu/drm/vkms/vkms_drv.h       | 12 ++++++------
>  drivers/gpu/drm/vkms/vkms_plane.c     | 10 +++++-----
>  drivers/gpu/drm/vkms/vkms_writeback.c | 20 +++++++++++++++++---
>  4 files changed, 30 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 2d946368a561..95029d2ebcac 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -153,7 +153,7 @@ static void compose_plane(struct vkms_frame_info *primary_plane_info,
>  			  struct vkms_frame_info *plane_frame_info,
>  			  void *vaddr_out)
>  {
> -	struct drm_framebuffer *fb = &plane_frame_info->fb;
> +	struct drm_framebuffer *fb = plane_frame_info->fb;
>  	void *vaddr;
>  	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>  
> @@ -175,7 +175,7 @@ static int compose_active_planes(void **vaddr_out,
>  				 struct vkms_frame_info *primary_plane_info,
>  				 struct vkms_crtc_state *crtc_state)
>  {
> -	struct drm_framebuffer *fb = &primary_plane_info->fb;
> +	struct drm_framebuffer *fb = primary_plane_info->fb;
>  	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>  	const void *vaddr;
>  	int i;
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 2e6342164bef..c850d755247c 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -22,13 +22,8 @@
>  
>  #define NUM_OVERLAY_PLANES 8
>  
> -struct vkms_writeback_job {
> -	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
> -	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
> -};
> -
>  struct vkms_frame_info {
> -	struct drm_framebuffer fb;
> +	struct drm_framebuffer *fb;
>  	struct drm_rect src, dst;
>  	struct dma_buf_map map[DRM_FORMAT_MAX_PLANES];
>  	unsigned int offset;
> @@ -36,6 +31,11 @@ struct vkms_frame_info {
>  	unsigned int cpp;
>  };
>  
> +struct vkms_writeback_job {
> +	struct dma_buf_map data[DRM_FORMAT_MAX_PLANES];
> +	struct vkms_frame_info frame_info;
> +};
> +
>  /**
>   * vkms_plane_state - Driver specific plane state
>   * @base: base plane state
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index a56b0f76eddd..28752af0118c 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -50,12 +50,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
>  	struct vkms_plane_state *vkms_state = to_vkms_plane_state(old_state);
>  	struct drm_crtc *crtc = vkms_state->base.base.crtc;
>  
> -	if (crtc) {
> +	if (crtc && vkms_state->frame_info->fb) {
>  		/* dropping the reference we acquired in
>  		 * vkms_primary_plane_update()
>  		 */
> -		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
> -			drm_framebuffer_put(&vkms_state->frame_info->fb);
> +		if (drm_framebuffer_read_refcount(vkms_state->frame_info->fb))
> +			drm_framebuffer_put(vkms_state->frame_info->fb);
>  	}
>  
>  	kfree(vkms_state->frame_info);
> @@ -110,9 +110,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	frame_info = vkms_plane_state->frame_info;
>  	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
>  	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
> -	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
> +	frame_info->fb = fb;
>  	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
> -	drm_framebuffer_get(&frame_info->fb);
> +	drm_framebuffer_get(frame_info->fb);
>  	frame_info->offset = fb->offsets[0];
>  	frame_info->pitch = fb->pitches[0];
>  	frame_info->cpp = fb->format->cpp[0];
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index 8694227f555f..de379331b236 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -75,12 +75,15 @@ static int vkms_wb_prepare_job(struct drm_writeback_connector *wb_connector,
>  	if (!vkmsjob)
>  		return -ENOMEM;
>  
> -	ret = drm_gem_fb_vmap(job->fb, vkmsjob->map, vkmsjob->data);
> +	ret = drm_gem_fb_vmap(job->fb, vkmsjob->frame_info.map, vkmsjob->data);
>  	if (ret) {
>  		DRM_ERROR("vmap failed: %d\n", ret);
>  		goto err_kfree;
>  	}
>  
> +	vkmsjob->frame_info.fb = job->fb;
> +	drm_framebuffer_get(vkmsjob->frame_info.fb);
> +
>  	job->priv = vkmsjob;
>  
>  	return 0;
> @@ -99,7 +102,9 @@ static void vkms_wb_cleanup_job(struct drm_writeback_connector *connector,
>  	if (!job->fb)
>  		return;
>  
> -	drm_gem_fb_vunmap(job->fb, vkmsjob->map);
> +	drm_gem_fb_vunmap(job->fb, vkmsjob->frame_info.map);
> +
> +	drm_framebuffer_put(vkmsjob->frame_info.fb);
>  
>  	vkmsdev = drm_device_to_vkms_device(job->fb->dev);
>  	vkms_set_composer(&vkmsdev->output, false);
> @@ -116,14 +121,23 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>  	struct drm_writeback_connector *wb_conn = &output->wb_connector;
>  	struct drm_connector_state *conn_state = wb_conn->base.state;
>  	struct vkms_crtc_state *crtc_state = output->composer_state;
> +	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
> +	struct vkms_writeback_job *active_wb;
> +	struct vkms_frame_info *wb_frame_info;
>  
>  	if (!conn_state)
>  		return;
>  
>  	vkms_set_composer(&vkmsdev->output, true);
>  
> +	active_wb = conn_state->writeback_job->priv;
> +	wb_frame_info = &active_wb->frame_info;
> +
>  	spin_lock_irq(&output->composer_lock);
> -	crtc_state->active_writeback = conn_state->writeback_job->priv;
> +	crtc_state->active_writeback = active_wb;
> +	wb_frame_info->offset = fb->offsets[0];
> +	wb_frame_info->pitch = fb->pitches[0];
> +	wb_frame_info->cpp = fb->format->cpp[0];
>  	crtc_state->wb_pending = true;
>  	spin_unlock_irq(&output->composer_lock);
>  	drm_writeback_queue_job(wb_conn, connector_state);
> -- 
> 2.30.2
> 

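The `offset`, `pitch` and `cpp` values stored in `wb_frame_info` by `vkms_wb_atomic_commit()` above are what the composer later uses to locate a pixel in the mapped buffer: `offset + y * pitch + x * cpp`, as in `pixel_offset()` in patch 7/9. The following is a small userspace sketch of that addressing. `struct frame_layout` and both helpers are hypothetical stand-ins for `vkms_frame_info` and its users, and the sketch assumes a packed, single-plane format such as XRGB8888.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the addressing fields of vkms_frame_info. */
struct frame_layout {
	unsigned int offset; /* fb->offsets[0]: start of plane 0 in the buffer */
	unsigned int pitch;  /* fb->pitches[0]: bytes per row, may include padding */
	unsigned int cpp;    /* fb->format->cpp[0]: bytes per pixel */
};

/* Byte offset of pixel (x, y) in a packed single-plane buffer. */
static size_t pixel_byte_offset(const struct frame_layout *fl, int x, int y)
{
	return fl->offset + (size_t)y * fl->pitch + (size_t)x * fl->cpp;
}

/* Pointer to the first byte (channel) of pixel (x, y). */
static uint8_t *pixel_ptr(uint8_t *vaddr, const struct frame_layout *fl,
			  int x, int y)
{
	return vaddr + pixel_byte_offset(fl, x, y);
}
```

Using `pitch` rather than `width * cpp` is what keeps this correct for buffers whose rows are padded.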

* Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-01-21 21:38 ` [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats Igor Torrente
@ 2022-02-08 10:40   ` Melissa Wen
  2022-02-09  0:58     ` Igor Torrente
  2022-02-10  9:37   ` Pekka Paalanen
  1 sibling, 1 reply; 31+ messages in thread
From: Melissa Wen @ 2022-02-08 10:40 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, kernel test robot, rodrigosiqueiramelo, airlied,
	dri-devel, melissa.srw, tzimmermann, ~lkcamp/patches

On 01/21, Igor Torrente wrote:
> Currently the blend function only accepts XRGB_8888 and ARGB_8888
> as a color input.
> 
> This patch refactors all the functions related to the plane composition
> to overcome this limitation.
> 
> A new internal format (`struct line_buffer`) is introduced to deal with all
> possible inputs. It consists of 16-bit fields that represent each of
> the channels.
> 
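The following is a minimal userspace sketch of the internal 16-bit-per-channel format and of the 8-bit expansion the patch performs. It assumes the little-endian ARGB8888 byte order (B, G, R, A in memory) described in the patch's own comments. The helper names are hypothetical; only the `struct line_buffer` layout, the multiply-by-257 expansion and the round-up narrowing mirror the patch.

```c
#include <stdint.h>

/* Same layout as the patch's struct line_buffer: 16 bits per channel. */
struct line_buffer {
	uint16_t a, r, g, b;
};

/* The composer relies on 8 bytes per pixel (compare WARN_ON(pixel_size != 8)). */
_Static_assert(sizeof(struct line_buffer) == 8, "line_buffer must be 8 bytes");

/* Hypothetical helper: expand one ARGB8888 pixel stored as B, G, R, A bytes. */
static struct line_buffer argb8888_to_line_pixel(const uint8_t px[4])
{
	struct line_buffer out;

	/* 257 = (2^16 - 1) / (2^8 - 1): maps 0x00 -> 0x0000 and 0xff -> 0xffff. */
	out.a = (uint16_t)(px[3] * 257);
	out.r = (uint16_t)(px[2] * 257);
	out.g = (uint16_t)(px[1] * 257);
	out.b = (uint16_t)(px[0] * 257);
	return out;
}

/*
 * Hypothetical helper: narrow a 16-bit channel back to 8 bits, rounding up
 * the way DIV_ROUND_UP(c, 257) does.
 */
static uint8_t channel16_to_8(uint16_t c)
{
	return (uint8_t)((c + 256u) / 257u);
}
```

Because every 8-bit value maps to an exact multiple of 257, the round trip is lossless. The rounding only matters for 16-bit values produced by blending.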
> Pixel blending is done in this internal format, and new handlers are
> added to convert each specific format to and from it.
> 
> So the blend operation depends on these handlers to convert to this common
> format. The blended result, if necessary, is converted to the writeback
> buffer format.
> 
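The following is a standalone sketch of the per-channel premultiplied blend done in this 16-bit format: `out = src + dst * (1 - alpha)`, with 0xffff standing for 1.0. The function name is hypothetical, and the rounding mirrors the patch's `DIV_ROUND_UP(new_color, 0xffff)`. The sketch widens operands to `uint32_t` explicitly, because in portable C a 16-bit operand is promoted to `int`, and 0xffff * 0xffff does not fit in a signed `int`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Premultiplied "over" for one 16-bit channel:
 *   out = src + dst * (1 - alpha), with 0xffff standing for 1.0.
 * Hypothetical userspace counterpart of the patch's pre_mul_blend_channel().
 */
static uint16_t blend_channel16(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint32_t acc;

	/* Premultiplied input: a color channel never exceeds its alpha. */
	assert(src <= alpha);

	/*
	 * Widen before multiplying. With src <= alpha the sum is at most
	 * 0xffff * 0xffff, which fits in 32 unsigned bits.
	 */
	acc = (uint32_t)src * 0xffffu + (uint32_t)dst * (0xffffu - alpha);

	/* Divide by 0xffff rounding up, like DIV_ROUND_UP(acc, 0xffff). */
	return (uint16_t)((acc + 0xfffeu) / 0xffffu);
}
```

For example, an opaque source (alpha = 0xffff) replaces the destination, and a fully transparent one (src = alpha = 0) leaves it unchanged.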
> This patch introduces three major differences to the blend function:
> 1 - All the planes are blended at once.
> 2 - The blend calculation is done per line instead of per pixel.
> 3 - It is responsible for calculating the CRC and writing the writeback
>     buffer (if necessary).
> 
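The following is a simplified userspace model of the per-line flow described in this list:

- The primary row is converted straight into an output line.
- Each overlay that covers the row is converted into a stage line and blended on top.
- The finished row is checksummed.

It is a sketch, not the patch's code. Planes are assumed to be ARGB8888, unscaled, and fully inside the primary, and the types and function names are hypothetical. FNV-1a is used only as a stand-in for the kernel's `crc32_le()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct line_buffer {
	uint16_t a, r, g, b;
};

/* Hypothetical plane: ARGB8888 pixels (B, G, R, A bytes) placed at (x, y). */
struct plane8888 {
	const uint8_t *pixels; /* width * height * 4 bytes, row-major */
	int x, y, width, height;
};

/* Expand output row y of plane p into a line of 16-bit pixels. */
static void plane_row_to_line(const struct plane8888 *p, int y,
			      struct line_buffer *line)
{
	const uint8_t *src = p->pixels + (size_t)(y - p->y) * p->width * 4;

	for (int x = 0; x < p->width; x++, src += 4) {
		line[x].a = (uint16_t)(src[3] * 257);
		line[x].r = (uint16_t)(src[2] * 257);
		line[x].g = (uint16_t)(src[1] * 257);
		line[x].b = (uint16_t)(src[0] * 257);
	}
}

/* Premultiplied "over" for one channel, rounded up. */
static uint16_t blend16(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint32_t acc = (uint32_t)src * 0xffffu + (uint32_t)dst * (0xffffu - alpha);

	return (uint16_t)((acc + 0xfffeu) / 0xffffu);
}

/*
 * Compose planes[0] (the primary, at (0, 0), output-sized) with the other
 * planes in z-order, one output row at a time, using only two line buffers.
 * Each finished row is folded into *sum; FNV-1a stands in for crc32_le().
 * Returns 0 on success, -1 if a line buffer cannot be allocated.
 */
static int compose_by_line(const struct plane8888 *planes, int n, uint32_t *sum)
{
	int width = planes[0].width;
	struct line_buffer *out = malloc(sizeof(*out) * width);
	struct line_buffer *stage = malloc(sizeof(*stage) * width);
	uint32_t h = 2166136261u; /* FNV-1a offset basis */

	if (!out || !stage) {
		free(out);
		free(stage);
		return -1;
	}

	for (int y = 0; y < planes[0].height; y++) {
		/* The primary row is converted straight into the output line. */
		plane_row_to_line(&planes[0], y, out);

		for (int i = 1; i < n; i++) {
			const struct plane8888 *p = &planes[i];
			struct line_buffer *dst;

			if (y < p->y || y >= p->y + p->height)
				continue; /* this plane does not cover row y */

			assert(p->x >= 0 && p->x + p->width <= width);
			dst = out + p->x;
			plane_row_to_line(p, y, stage);
			for (int x = 0; x < p->width; x++) {
				dst[x].a = 0xffff;
				dst[x].r = blend16(stage[x].r, dst[x].r, stage[x].a);
				dst[x].g = blend16(stage[x].g, dst[x].g, stage[x].a);
				dst[x].b = blend16(stage[x].b, dst[x].b, stage[x].a);
			}
		}

		/* Fold the finished row into the checksum before reusing the buffers. */
		const uint8_t *bytes = (const uint8_t *)out;

		for (size_t k = 0; k < sizeof(*out) * (size_t)width; k++) {
			h ^= bytes[k];
			h *= 16777619u; /* FNV-1a prime */
		}
	}

	free(stage);
	free(out);
	*sum = h;
	return 0;
}
```

Because only `out` and `stage` are live, memory scales with the output width rather than with width times height. In the patch, the same loop also hands each finished row to the writeback conversion function.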
> These changes allow us to allocate far less memory for the intermediate
> buffers used in these operations, because we no longer need the entire
> intermediate image at once: working on one line at a time is enough.
> 
> | Memory consumption (output dimensions) |
> |:--------------------------------------:|
> |       Current      |     This patch    |
> |:------------------:|:-----------------:|
> |   Width * Height   |     2 * Width     |
> 
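The following is a back-of-the-envelope sketch of what this table means in bytes. It assumes a 32 bpp output without row padding (the old code allocated a full-frame copy sized like the primary buffer) and 8 bytes per `struct line_buffer` pixel for the two line buffers. The helper names are hypothetical.

```c
#include <stdint.h>

/* Old approach: a full intermediate frame, assuming 4 bytes per pixel. */
static uint64_t full_frame_bytes(uint64_t width, uint64_t height)
{
	return width * height * 4;
}

/* New approach: a stage line plus an output line of 8-byte pixels. */
static uint64_t two_line_bytes(uint64_t width)
{
	return 2 * width * 8;
}
```

For a 1920x1080 output, that is 8,294,400 bytes for the old full-frame copy versus 30,720 bytes (30 KiB) for the two line buffers.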
> Beyond memory, these changes also bring a minor performance benefit.
> Results of running the IGT tests `*kms_cursor_crc*`:
>
First, thanks for this improvement.

Some recent changes in kms_cursor_crc caused VKMS to fail in most test
cases (iirc, only size-change and alpha-opaque are passing currently).
So citing that performance improvement here could cause misunderstanding
when reviewing the change history. Can you update these statistics? I
think you can specify the IGT commit hash to pin the test-case version,
or you can pick another test for comparison.
> |                 Frametime                  |
> |:------------------------------------------:|
> |  Implementation |  Current  |  This commit |
> |:---------------:|:---------:|:------------:|
> | frametime range |  8~22 ms  |    5~18 ms   |
> |     Average     |  10.0 ms  |    7.3 ms    |
> 
> Reported-by: kernel test robot <lkp@intel.com>
It is a little confusing to have this Reported-by tag without any
explanation of what was reported and fixed. Can you specify it?
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
> V2: Improves the performance drastically, by performing the operations
>     per-line and not per-pixel (Pekka Paalanen).
>     Minor improvements (Pekka Paalanen).
> 
> V3: Changes the code to blend the planes all at once. This improves
>     performance, memory consumption, and removes much of the weirdness
>     of the V2 (Pekka Paalanen and me).
>     Minor improvements (Pekka Paalanen and me).
> 
> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
Can you move the version changes up (above the `---` line) so that they
are not dropped from the commit message?

I also pointed out a minor code style issue below.
With these comments addressed, you can add my r-b tag in the next
version.
> ---
>  drivers/gpu/drm/vkms/Makefile        |   1 +
>  drivers/gpu/drm/vkms/vkms_composer.c | 335 +++++++++++++--------------
>  drivers/gpu/drm/vkms/vkms_formats.c  | 138 +++++++++++
>  drivers/gpu/drm/vkms/vkms_formats.h  |  31 +++
>  4 files changed, 333 insertions(+), 172 deletions(-)
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> 
> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
> index 72f779cbfedd..1b28a6a32948 100644
> --- a/drivers/gpu/drm/vkms/Makefile
> +++ b/drivers/gpu/drm/vkms/Makefile
> @@ -3,6 +3,7 @@ vkms-y := \
>  	vkms_drv.o \
>  	vkms_plane.o \
>  	vkms_output.o \
> +	vkms_formats.o \
>  	vkms_crtc.o \
>  	vkms_composer.o \
>  	vkms_writeback.o
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 95029d2ebcac..9f70fcf84fb9 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -9,202 +9,210 @@
>  #include <drm/drm_vblank.h>
>  
>  #include "vkms_drv.h"
> +#include "vkms_formats.h"
>  
> -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> -				 const struct vkms_frame_info *frame_info)
> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>  {
> -	u32 pixel;
> -	int src_offset = frame_info->offset + (y * frame_info->pitch)
> -					    + (x * frame_info->cpp);
> +	u32 new_color;
>  
> -	pixel = *(u32 *)&buffer[src_offset];
> +	new_color = (src * 0xffff + dst * (0xffff - alpha));
>  
> -	return pixel;
> +	return DIV_ROUND_UP(new_color, 0xffff);
>  }
>  
>  /**
> - * compute_crc - Compute CRC value on output frame
> + * pre_mul_alpha_blend - alpha blending equation
> + * @src_frame_info: source framebuffer's metadata
> + * @stage_buffer: The line with the pixels from src_plane
> + * @output_buffer: A line buffer that receives all the blends output
>   *
> - * @vaddr: address to final framebuffer
> - * @frame_info: framebuffer's metadata
> + * Using the information from the `frame_info`, this blends only the
> + * necessary pixels from the `stage_buffer` to the `output_buffer`
> + * using premultiplied blend formula.
>   *
> - * returns CRC value computed using crc32 on the visible portion of
> - * the final framebuffer at vaddr_out
> + * The current DRM assumption is that pixel color values have been already
> + * pre-multiplied with the alpha channel values. See more
> + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
> + * completely opaque background.
>   */
> -static uint32_t compute_crc(const u8 *vaddr,
> -			    const struct vkms_frame_info *frame_info)
> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> +				struct line_buffer *stage_buffer,
> +				struct line_buffer *output_buffer)
>  {
> -	int x, y;
> -	u32 crc = 0, pixel = 0;
> -	int x_src = frame_info->src.x1 >> 16;
> -	int y_src = frame_info->src.y1 >> 16;
> -	int h_src = drm_rect_height(&frame_info->src) >> 16;
> -	int w_src = drm_rect_width(&frame_info->src) >> 16;
> -
> -	for (y = y_src; y < y_src + h_src; ++y) {
> -		for (x = x_src; x < x_src + w_src; ++x) {
> -			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
> -			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
> -		}
> +	int x, x_dst = frame_info->dst.x1;
> +	int x_limit = drm_rect_width(&frame_info->dst);
> +	struct line_buffer *out = output_buffer + x_dst;
> +	struct line_buffer *in = stage_buffer;
> +
> +	for (x = 0; x < x_limit; x++) {
> +		out[x].a = (u16)0xffff;
> +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>  	}
> -
> -	return crc;
>  }
>  
> -static u8 blend_channel(u8 src, u8 dst, u8 alpha)
> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>  {
> -	u32 pre_blend;
> -	u8 new_color;
> -
> -	pre_blend = (src * 255 + dst * (255 - alpha));
> -
> -	/* Faster div by 255 */
> -	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
> +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
> +		return true;
>  
> -	return new_color;
> +	return false;
>  }
>  
>  /**
> - * alpha_blend - alpha blending equation
> - * @argb_src: src pixel on premultiplied alpha mode
> - * @argb_dst: dst pixel completely opaque
> - *
> - * blend pixels using premultiplied blend formula. The current DRM assumption
> - * is that pixel color values have been already pre-multiplied with the alpha
> - * channel values. See more drm_plane_create_blend_mode_property(). Also, this
> - * formula assumes a completely opaque background.
> - */
> -static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
> -{
> -	u8 alpha;
> -
> -	alpha = argb_src[3];
> -	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
> -	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
> -	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
> -}
> -
> -/**
> - * x_blend - blending equation that ignores the pixel alpha
> - *
> - * overwrites RGB color value from src pixel to dst pixel.
> - */
> -static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
> -{
> -	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
> -}
> -
> -/**
> - * blend - blend value at vaddr_src with value at vaddr_dst
> - * @vaddr_dst: destination address
> - * @vaddr_src: source address
> - * @dst_frame_info: destination framebuffer's metadata
> - * @src_frame_info: source framebuffer's metadata
> - * @pixel_blend: blending equation based on plane format
> + * @wb_frame_info: The writeback frame buffer metadata
> + * @wb_fmt_func: The format transformation function for the wb buffer
> + * @crtc_state: The crtc state
> + * @plane_fmt_func: A format transformation function for each plane
> + * @crc32: The crc output of the final frame
> + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
> + * @stage_buffer: The line with the pixels from src_compositor
>   *
> - * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
> - * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
> - * and clearing alpha channel to an completely opaque background. This function
> - * uses buffer's metadata to locate the new composite values at vaddr_dst.
> + * This function blends the pixels (using `pre_mul_alpha_blend`)
> + * from all planes, calculates the crc32 of the output from the former step,
> + * and, if necessary, converts and stores the output to the writeback buffer.
>   *
>   * TODO: completely clear the primary plane (a = 0xff) before starting to blend
>   * pixel color values
>   */
> -static void blend(void *vaddr_dst, void *vaddr_src,
> -		  struct vkms_frame_info *dst_frame_info,
> -		  struct vkms_frame_info *src_frame_info,
> -		  void (*pixel_blend)(const u8 *, u8 *))
> +static void blend(struct vkms_frame_info *wb_frame_info,
> +		  format_transform_func wb_fmt_func,
> +		  struct vkms_crtc_state *crtc_state,
> +		  format_transform_func *plane_fmt_func,
> +		  u32 *crc32, struct line_buffer *stage_buffer,
> +		  struct line_buffer *output_buffer, s64 row_size)
>  {
> -	int i, j, j_dst, i_dst;
> -	int offset_src, offset_dst;
> -	u8 *pixel_dst, *pixel_src;
> -
> -	int x_src = src_frame_info->src.x1 >> 16;
> -	int y_src = src_frame_info->src.y1 >> 16;
> -
> -	int x_dst = src_frame_info->dst.x1;
> -	int y_dst = src_frame_info->dst.y1;
> -	int h_dst = drm_rect_height(&src_frame_info->dst);
> -	int w_dst = drm_rect_width(&src_frame_info->dst);
> +	struct vkms_plane_state **plane = crtc_state->active_planes;
> +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
> +	u32 n_active_planes = crtc_state->num_active_planes;
>  
> +	int y_src = primary_plane_info->dst.y1;
> +	int h_dst = drm_rect_height(&primary_plane_info->dst);
>  	int y_limit = y_src + h_dst;
> -	int x_limit = x_src + w_dst;
> -
> -	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
> -		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
> -			offset_dst = dst_frame_info->offset
> -				     + (i_dst * dst_frame_info->pitch)
> -				     + (j_dst++ * dst_frame_info->cpp);
> -			offset_src = src_frame_info->offset
> -				     + (i * src_frame_info->pitch)
> -				     + (j * src_frame_info->cpp);
> -
> -			pixel_src = (u8 *)(vaddr_src + offset_src);
> -			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
> -			pixel_blend(pixel_src, pixel_dst);
> -			/* clearing alpha channel (0xff)*/
> -			pixel_dst[3] = 0xff;
> +	int y, i;
> +
> +	for (y = y_src; y < y_limit; y++) {
> +		plane_fmt_func[0](primary_plane_info, y, output_buffer);
> +
> +		/* If there are other planes besides primary, we consider the active
> +		 * planes should be in z-order and compose them associatively:
> +		 * ((primary <- overlay) <- cursor)
> +		 */
> +		for (i = 1; i < n_active_planes; i++) {
> +			if (!check_y_limit(plane[i]->frame_info, y))
> +				continue;
> +
> +			plane_fmt_func[i](plane[i]->frame_info, y, stage_buffer);
> +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> +					    output_buffer);
>  		}
> -		i_dst++;
> +
> +		*crc32 = crc32_le(*crc32, (void *)output_buffer, row_size);
> +
> +		if (wb_frame_info)
> +			wb_fmt_func(wb_frame_info, y, output_buffer);
>  	}
>  }
>  
> -static void compose_plane(struct vkms_frame_info *primary_plane_info,
> -			  struct vkms_frame_info *plane_frame_info,
> -			  void *vaddr_out)
> +static void get_format_transform_functions(struct vkms_crtc_state *crtc_state,
> +					   format_transform_func plane_funcs[])
>  {
> -	struct drm_framebuffer *fb = plane_frame_info->fb;
> -	void *vaddr;
> -	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
> +	struct vkms_plane_state **active_planes = crtc_state->active_planes;
> +	u32 n_active_planes = crtc_state->num_active_planes, s_fmt;
> +	int i;
>  
> -	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
> -		return;
> +	for (i = 0; i < n_active_planes; i++) {
> +		s_fmt = active_planes[i]->frame_info->fb->format->format;
> +		plane_funcs[i] = get_fmt_transform_function(s_fmt);
> +	}
> +}
>  
> -	vaddr = plane_frame_info->map[0].vaddr;
> +static bool check_planes_x_bounds(struct vkms_crtc_state *crtc_state,
> +				  struct vkms_frame_info *wb_frame_info)
> +{
> +	struct vkms_plane_state **planes = crtc_state->active_planes;
> +	struct vkms_frame_info *primary_plane_info = planes[0]->frame_info;
> +	int line_width = drm_rect_width(&primary_plane_info->dst);
> +	u32 n_active_planes = crtc_state->num_active_planes;
> +	int i;
>  
> -	if (fb->format->format == DRM_FORMAT_ARGB8888)
> -		pixel_blend = &alpha_blend;
> -	else
> -		pixel_blend = &x_blend;
> +	for (i = 0; i < n_active_planes; i++) {
> +		int x_dst = planes[i]->frame_info->dst.x1;
> +		int x_src = planes[i]->frame_info->src.x1 >> 16;
> +		int x2_src = planes[i]->frame_info->src.x2 >> 16;
> +		int x_limit = drm_rect_width(&planes[i]->frame_info->dst);
>  
> -	blend(vaddr_out, vaddr, primary_plane_info,
> -	      plane_frame_info, pixel_blend);
> +		if (x_dst + x_limit > line_width)
> +			return false;
> +		if (x_src + x_limit > x2_src)
> +			return false;
> +	}
> +
> +	return true;
>  }
>  
> -static int compose_active_planes(void **vaddr_out,
> -				 struct vkms_frame_info *primary_plane_info,
> -				 struct vkms_crtc_state *crtc_state)
> +static int compose_active_planes(struct vkms_frame_info *wb_frame_info,
> +				 struct vkms_crtc_state *crtc_state,
> +				 u32 *crc32)
>  {
> -	struct drm_framebuffer *fb = primary_plane_info->fb;
> -	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
> -	const void *vaddr;
> -	int i;
> +	format_transform_func plane_funcs[NUM_OVERLAY_PLANES], wb_func = NULL;
> +	int line_width, ret = 0, pixel_size = sizeof(struct line_buffer);
> +	struct vkms_frame_info *primary_plane_info = NULL;
> +	struct line_buffer *output_buffer, *stage_buffer;
> +	struct vkms_plane_state *act_plane = NULL;
> +	u32 wb_format;
>  
> -	if (!*vaddr_out) {
> -		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
> -		if (!*vaddr_out) {
> -			DRM_ERROR("Cannot allocate memory for output frame.");
> -			return -ENOMEM;
> -		}
> +	if (WARN_ON(pixel_size != 8))
> +		return -EINVAL;
> +
> +	if (crtc_state->num_active_planes >= 1) {
> +		act_plane = crtc_state->active_planes[0];
> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> +			primary_plane_info = act_plane->frame_info;
>  	}
>  
> +	if (!primary_plane_info)
> +		return -EINVAL;
> +
>  	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>  		return -EINVAL;
>  
> -	vaddr = primary_plane_info->map[0].vaddr;
> +	if (WARN_ON(!check_planes_x_bounds(crtc_state, wb_frame_info)))
> +		return -EINVAL;
>  
> -	memcpy(*vaddr_out, vaddr, gem_obj->size);
> +	line_width = drm_rect_width(&primary_plane_info->dst);
>  
> -	/* If there are other planes besides primary, we consider the active
> -	 * planes should be in z-order and compose them associatively:
> -	 * ((primary <- overlay) <- cursor)
> -	 */
> -	for (i = 1; i < crtc_state->num_active_planes; i++)
> -		compose_plane(primary_plane_info,
> -			      crtc_state->active_planes[i]->frame_info,
> -			      *vaddr_out);
> +	stage_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> +	if (!stage_buffer) {
> +		DRM_ERROR("Cannot allocate memory for the intermediate line buffer\n");
> +		return -ENOMEM;
> +	}
> +
> -	return 0;
> +	output_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> +	if (!output_buffer) {
> +		DRM_ERROR("Cannot allocate memory for the output line buffer\n");
> +		ret = -ENOMEM;
> +		goto free_stage_buffer;
> +	}
> +
> +	get_format_transform_functions(crtc_state, plane_funcs);
> +
> +	if (wb_frame_info) {
> +		wb_format = wb_frame_info->fb->format->format;
> +		wb_func = get_wb_fmt_transform_function(wb_format);
> +		wb_frame_info->src = primary_plane_info->src;
> +		wb_frame_info->dst = primary_plane_info->dst;
> +	}
> +
> +	blend(wb_frame_info, wb_func, crtc_state, plane_funcs, crc32,
> +	      stage_buffer, output_buffer, (s64)line_width * pixel_size);
> +
> +	kvfree(output_buffer);
> +free_stage_buffer:
> +	kvfree(stage_buffer);
> +
> +	return ret;
>  }
>  
>  /**
> @@ -222,13 +230,12 @@ void vkms_composer_worker(struct work_struct *work)
>  						struct vkms_crtc_state,
>  						composer_work);
>  	struct drm_crtc *crtc = crtc_state->base.crtc;
> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
> +	struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>  	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
> -	struct vkms_frame_info *primary_plane_info = NULL;
> -	struct vkms_plane_state *act_plane = NULL;
>  	bool crc_pending, wb_pending;
> -	void *vaddr_out = NULL;
> -	u32 crc32 = 0;
>  	u64 frame_start, frame_end;
> +	u32 crc32 = 0;
>  	int ret;
>  
>  	spin_lock_irq(&out->composer_lock);
> @@ -248,35 +255,19 @@ void vkms_composer_worker(struct work_struct *work)
>  	if (!crc_pending)
>  		return;
>  
> -	if (crtc_state->num_active_planes >= 1) {
> -		act_plane = crtc_state->active_planes[0];
> -		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> -			primary_plane_info = act_plane->frame_info;
> -	}
> -
> -	if (!primary_plane_info)
> -		return;
> -
>  	if (wb_pending)
> -		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
> +		ret = compose_active_planes(wb_frame_info, crtc_state, &crc32);
> +	else
> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>  
> -	ret = compose_active_planes(&vaddr_out, primary_plane_info,
> -				    crtc_state);
> -	if (ret) {
> -		if (ret == -EINVAL && !wb_pending)
> -			kvfree(vaddr_out);
> +	if (ret)
>  		return;
> -	}
> -
> -	crc32 = compute_crc(vaddr_out, primary_plane_info);
>  
>  	if (wb_pending) {
>  		drm_writeback_signal_completion(&out->wb_connector, 0);
>  		spin_lock_irq(&out->composer_lock);
>  		crtc_state->wb_pending = false;
>  		spin_unlock_irq(&out->composer_lock);
> -	} else {
> -		kvfree(vaddr_out);
>  	}
>  
>  	/*
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> new file mode 100644
> index 000000000000..0d1838d1b835
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -0,0 +1,138 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
checkpatch complains here ^ Use `//`
> +
> +#include <drm/drm_rect.h>
> +#include "vkms_formats.h"
> +
> +format_transform_func get_fmt_transform_function(u32 format)
> +{
> +	if (format == DRM_FORMAT_ARGB8888)
> +		return &ARGB8888_to_ARGB16161616;
> +	else
> +		return &XRGB8888_to_ARGB16161616;
> +}
> +
> +format_transform_func get_wb_fmt_transform_function(u32 format)
> +{
> +	if (format == DRM_FORMAT_ARGB8888)
> +		return &convert_to_ARGB8888;
> +	else
> +		return &convert_to_XRGB8888;
> +}
> +
> +static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
> +{
> +	return frame_info->offset + (y * frame_info->pitch)
> +				  + (x * frame_info->cpp);
> +}
> +
> +/*
> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> + *
> + * @frame_info: Buffer metadata
> + * @x: The x (width) coordinate of the 2D buffer
> + * @y: The y (height) coordinate of the 2D buffer
> + *
> + * Takes the information stored in the frame_info, a pair of coordinates, and
> + * returns the address of the first color channel.
> + * This function assumes the channels are packed together, i.e. a color channel
> + * comes immediately after another in the memory. And therefore, this function
> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> + */
> +static void *packed_pixels_addr(struct vkms_frame_info *frame_info, int x, int y)
> +{
> +	int offset = pixel_offset(frame_info, x, y);
> +
> +	return (u8 *)frame_info->map[0].vaddr + offset;
> +}
> +
> +static void *get_packed_src_addr(struct vkms_frame_info *frame_info, int y)
> +{
> +	int x_src = frame_info->src.x1 >> 16;
> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
> +
> +	return packed_pixels_addr(frame_info, x_src, y_src);
> +}
> +
> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			      struct line_buffer *stage_buffer)
> +{
> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> +		/*
> +		 * Organizes the channels in their respective positions and converts
> +		 * each 8-bit channel to 16 bits.
> +		 * 257 is the "conversion ratio", obtained from the division
> +		 * (2^16 - 1) / (2^8 - 1). It maps the full 8-bit range exactly onto
> +		 * the full 16-bit range (0x00 -> 0x0000, 0xff -> 0xffff).
> +		 * A similar idea applies to other RGB color conversions.
> +		 */
> +		stage_buffer[x].a = (u16)src_pixels[3] * 257;
> +		stage_buffer[x].r = (u16)src_pixels[2] * 257;
> +		stage_buffer[x].g = (u16)src_pixels[1] * 257;
> +		stage_buffer[x].b = (u16)src_pixels[0] * 257;
> +	}
> +}
> +
> +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			      struct line_buffer *stage_buffer)
> +{
> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> +		stage_buffer[x].a = (u16)0xffff;
> +		stage_buffer[x].r = (u16)src_pixels[2] * 257;
> +		stage_buffer[x].g = (u16)src_pixels[1] * 257;
> +		stage_buffer[x].b = (u16)src_pixels[0] * 257;
> +	}
> +}
> +
> +/*
> + * The following functions take a line of ARGB16161616 pixels from the
> + * src_buffer, convert them to a specific format, and store them in the
> + * destination.
> + *
> + * They are used in the `compose_active_planes` to convert and store a line
> + * from the src_buffer to the writeback buffer.
> + */
> +void convert_to_ARGB8888(struct vkms_frame_info *frame_info,
> +			 int y, struct line_buffer *src_buffer)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	int x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		/*
> +		 * The sequence below is important because the format's byte order is
> +		 * little-endian. In the case of ARGB8888, the memory is
> +		 * organized this way:
> +		 *
> +		 * | Addr     | = blue channel
> +		 * | Addr + 1 | = green channel
> +		 * | Addr + 2 | = Red channel
> +		 * | Addr + 3 | = Alpha channel
> +		 */
> +		dst_pixels[3] = DIV_ROUND_UP(src_buffer[x].a, 257);
> +		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
> +		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
> +		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
> +	}
> +}
> +
> +void convert_to_XRGB8888(struct vkms_frame_info *frame_info,
> +			 int y, struct line_buffer *src_buffer)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	int x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		dst_pixels[3] = (u8)0xff;
> +		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
> +		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
> +		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
> +	}
> +}
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> new file mode 100644
> index 000000000000..817e8b2124ae
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
and here ^

> +
> +#ifndef _VKMS_FORMATS_H_
> +#define _VKMS_FORMATS_H_
> +
> +#include "vkms_drv.h"
> +
> +struct line_buffer {
> +	u16 a, r, g, b;
> +};
> +
> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			      struct line_buffer *stage_buffer);
> +
> +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			      struct line_buffer *stage_buffer);
> +
> +void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
> +			 struct line_buffer *src_buffer);
> +
> +void convert_to_XRGB8888(struct vkms_frame_info *frame_info, int y,
> +			 struct line_buffer *src_buffer);
> +
> +typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
> +				      struct line_buffer *buffer);
> +
> +format_transform_func get_fmt_transform_function(u32 format);
> +
> +format_transform_func get_wb_fmt_transform_function(u32 format);
> +
> +#endif /* _VKMS_FORMATS_H_ */
> -- 
> 2.30.2
> 

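`get_packed_src_addr()` in the patch above maps an output row `y` back to a source row. DRM plane source coordinates are in 16.16 fixed point (hence the `>> 16`), while the destination rectangle is in whole pixels. The following is a standalone sketch of just that mapping. The struct, macro and function names are hypothetical. Like the patch, it assumes no scaling: the conversion loops read as many source pixels as the destination is wide.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct drm_rect. */
struct rect {
	int32_t x1, y1, x2, y2;
};

/* Hypothetical helper: whole pixels to 16.16 fixed point, as in plane src. */
#define FIXED_16_16(v) ((int32_t)(v) << 16)

/*
 * Map output row y (inside dst) to the source row that feeds it, and report
 * the first source column; mirrors the arithmetic of get_packed_src_addr().
 */
static void src_row_col(const struct rect *src, const struct rect *dst, int y,
			int *src_x, int *src_y)
{
	*src_x = src->x1 >> 16;
	*src_y = y - dst->y1 + (src->y1 >> 16);
}
```

For example, a 100x50 source window starting at (20, 30) and shown at (200, 100) reads source row 30 for output row 100 and source row 79 for output row 149.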

* Re: [PATCH v4 9/9] drm: vkms: Add support to the RGB565 format
  2022-01-21 21:38 ` [PATCH v4 9/9] drm: vkms: Add support to the RGB565 format Igor Torrente
@ 2022-02-08 10:50   ` Melissa Wen
  2022-02-10  9:50   ` Pekka Paalanen
  1 sibling, 0 replies; 31+ messages in thread
From: Melissa Wen @ 2022-02-08 10:50 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	melissa.srw, tzimmermann, ~lkcamp/patches

On 01/21, Igor Torrente wrote:
> Adds this common format to vkms.
> 
> This commit also adds new helper macros to deal with fixed-point
> arithmetic.
> 
> This is done to improve the precision of the conversion to ARGB16161616,
> since the "conversion ratio" is not an integer.
> 
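The following is a userspace sketch of why the 5- and 6-bit channels of RGB565 need more than an integer ratio. It copies `FP_SCALE`, `INT_TO_FP`, `FP_MUL`, `FP_DIV` and `FP_TO_INT_ROUND_UP` from the hunk below, replacing the kernel's `s32`/`s64` with `<stdint.h>` types. The two `expand_*` helpers are hypothetical and are not the patch's `RGB565_to_ARGB16161616()`.

```c
#include <stdint.h>

typedef int32_t s32;
typedef int64_t s64;

/* Copied from the patch: Q17.15 fixed point in a 32-bit integer. */
#define FP_SCALE 15
#define INT_TO_FP(a) ((a) << FP_SCALE)
#define FP_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FP_SCALE))
#define FP_DIV(a, b) ((s32)(((s64)(a) << FP_SCALE) / (b)))
#define FP_TO_INT_ROUND_UP(a) (((a) + (1 << (FP_SCALE - 1))) >> FP_SCALE)

/*
 * Hypothetical: expand a channel in [0, max_in] to [0, 65535] with a
 * truncated integer ratio, the way 257 is used for 8-bit channels.
 */
static uint16_t expand_int_ratio(s32 value, s32 max_in)
{
	return (uint16_t)(value * (65535 / max_in));
}

/* Hypothetical: the same expansion with a Q17.15 ratio, rounded half up. */
static uint16_t expand_fp_ratio(s32 value, s32 max_in)
{
	s32 fp_ratio = FP_DIV(65535, max_in); /* 65535 / max_in in Q17.15 */

	return (uint16_t)FP_TO_INT_ROUND_UP(FP_MUL(INT_TO_FP(value), fp_ratio));
}
```

The integer ratio works for 8 bits (255 maps to 65535) but loses the top of the range for 5 and 6 bits (31 maps to 65534, 63 to 65520). The fixed-point ratio maps both maxima to 65535 and rounds intermediate values such as 16 of 31 to the nearest 16-bit value (33825). The headroom is tight, though: 65535 in Q17.15 is 2147450880, just below INT32_MAX.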
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
> V3: Adapt the handlers to the new format introduced in patch 7 V3.
> ---
>  drivers/gpu/drm/vkms/vkms_formats.c   | 74 +++++++++++++++++++++++++++
>  drivers/gpu/drm/vkms/vkms_formats.h   |  6 +++
>  drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
>  drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
>  4 files changed, 86 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 661da39d1276..dc612882dd8c 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -11,6 +11,8 @@ format_transform_func get_fmt_transform_function(u32 format)
>  		return &get_ARGB16161616;
>  	else if (format == DRM_FORMAT_XRGB16161616)
>  		return &XRGB16161616_to_ARGB16161616;
> +	else if (format == DRM_FORMAT_RGB565)
> +		return &RGB565_to_ARGB16161616;
>  	else
>  		return &XRGB8888_to_ARGB16161616;
>  }
> @@ -23,6 +25,8 @@ format_transform_func get_wb_fmt_transform_function(u32 format)
>  		return &convert_to_ARGB16161616;
>  	else if (format == DRM_FORMAT_XRGB16161616)
>  		return &convert_to_XRGB16161616;
> +	else if (format == DRM_FORMAT_RGB565)
> +		return &convert_to_RGB565;
>  	else
>  		return &convert_to_XRGB8888;
>  }
> @@ -33,6 +37,26 @@ static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
>  				  + (x * frame_info->cpp);
>  }
>  
> +/*
> + * FP stands for _Fixed Point_ and **not** _Float Point_
> + * LF stands for Long Float (i.e. double)
> + * The following macros help doing fixed point arithmetic.
> + */
> +/*
> + * With FP scale 15 we have 17 and 15 bits of integer and fractional parts
> + * respectively.
> + *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
> + * 31                                          0
> + */
> +#define FP_SCALE 15
> +
> +#define LF_TO_FP(a) ((a) * (u64)(1 << FP_SCALE))
> +#define INT_TO_FP(a) ((a) << FP_SCALE)
> +#define FP_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FP_SCALE))
> +#define FP_DIV(a, b) ((s32)(((s64)(a) << FP_SCALE) / (b)))
> +/* This macro converts a fixed point number to int, and round half up it */
> +#define FP_TO_INT_ROUND_UP(a) (((a) + (1 << (FP_SCALE - 1))) >> FP_SCALE)
> +
>  /*
>   * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>   *
> @@ -125,6 +149,33 @@ void XRGB16161616_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>  	}
>  }
>  
> +void RGB565_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			    struct line_buffer *stage_buffer)
> +{
> +	u16 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels++) {
> +		u16 rgb_565 = le16_to_cpu(*src_pixels);
> +		int fp_r = INT_TO_FP((rgb_565 >> 11) & 0x1f);
> +		int fp_g = INT_TO_FP((rgb_565 >> 5) & 0x3f);
> +		int fp_b = INT_TO_FP(rgb_565 & 0x1f);
> +
> +		/*
> +		 * The magic constants is the "conversion ratio" and is calculated
> +		 * dividing 65535(2^16 - 1) by 31(2^5 -1) and 63(2^6 - 1)
> +		 * respectively.
> +		 */
> +		int fp_rb_ratio = LF_TO_FP(2114.032258065);
> +		int fp_g_ratio = LF_TO_FP(1040.238095238);
> +
> +		stage_buffer[x].a = (u16)0xffff;
> +		stage_buffer[x].r = FP_TO_INT_ROUND_UP(FP_MUL(fp_r, fp_rb_ratio));
> +		stage_buffer[x].g = FP_TO_INT_ROUND_UP(FP_MUL(fp_g, fp_g_ratio));
> +		stage_buffer[x].b = FP_TO_INT_ROUND_UP(FP_MUL(fp_b, fp_rb_ratio));
> +	}
> +}
> +
I don't know if there is an IGT test case that checks this conversion. Did
you use any here? Does it enable any other test case?

Thanks,

Melissa
>  
>  /*
>   * The following  functions take an line of ARGB16161616 pixels from the
> @@ -203,3 +254,26 @@ void convert_to_XRGB16161616(struct vkms_frame_info *frame_info, int y,
>  		dst_pixels[0] = src_buffer[x].b;
>  	}
>  }
> +
> +void convert_to_RGB565(struct vkms_frame_info *frame_info, int y,
> +		       struct line_buffer *src_buffer)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	int x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels++) {
> +		int fp_r = INT_TO_FP(src_buffer[x].r);
> +		int fp_g = INT_TO_FP(src_buffer[x].g);
> +		int fp_b = INT_TO_FP(src_buffer[x].b);
> +
> +		int fp_rb_ratio = LF_TO_FP(2114.032258065);
> +		int fp_g_ratio = LF_TO_FP(1040.238095238);
> +
> +		u16 r = FP_TO_INT_ROUND_UP(FP_DIV(fp_r, fp_rb_ratio));
> +		u16 g = FP_TO_INT_ROUND_UP(FP_DIV(fp_g, fp_g_ratio));
> +		u16 b = FP_TO_INT_ROUND_UP(FP_DIV(fp_b, fp_rb_ratio));
> +
> +		*dst_pixels = cpu_to_le16(r << 11 | g << 5 | b);
> +	}
> +}
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index 22358f3a33ab..836d6e43ea90 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -21,6 +21,9 @@ void get_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>  void XRGB16161616_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>  				  struct line_buffer *stage_buffer);
>  
> +void RGB565_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			    struct line_buffer *stage_buffer);
> +
>  void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
>  			 struct line_buffer *src_buffer);
>  
> @@ -33,6 +36,9 @@ void convert_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>  void convert_to_XRGB16161616(struct vkms_frame_info *frame_info, int y,
>  			     struct line_buffer *src_buffer);
>  
> +void convert_to_RGB565(struct vkms_frame_info *frame_info, int y,
> +		       struct line_buffer *src_buffer);
> +
>  typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
>  				      struct line_buffer *buffer);
>  
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 1d70c9e8f109..4643eefcdf29 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -13,14 +13,16 @@
>  
>  static const u32 vkms_formats[] = {
>  	DRM_FORMAT_XRGB8888,
> -	DRM_FORMAT_XRGB16161616
> +	DRM_FORMAT_XRGB16161616,
> +	DRM_FORMAT_RGB565
>  };
>  
>  static const u32 vkms_plane_formats[] = {
>  	DRM_FORMAT_ARGB8888,
>  	DRM_FORMAT_XRGB8888,
>  	DRM_FORMAT_XRGB16161616,
> -	DRM_FORMAT_ARGB16161616
> +	DRM_FORMAT_ARGB16161616,
> +	DRM_FORMAT_RGB565
>  };
>  
>  static struct drm_plane_state *
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index 393d3fc7966f..1aaa630090d3 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -15,7 +15,8 @@
>  static const u32 vkms_wb_formats[] = {
>  	DRM_FORMAT_XRGB8888,
>  	DRM_FORMAT_XRGB16161616,
> -	DRM_FORMAT_ARGB16161616
> +	DRM_FORMAT_ARGB16161616,
> +	DRM_FORMAT_RGB565
>  };
>  
>  static const struct drm_connector_funcs vkms_wb_connector_funcs = {
> -- 
> 2.30.2
> 
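The fixed-point RGB565 conversion quoted above can be checked in user space. This is a minimal sketch, not part of the patch. It copies the patch's macros, using `<stdint.h>` types in place of the kernel's `u64`/`s32`/`s64`. The helper names `expand5()`, `shrink5()`, `fp_rb_ratio()` and `check_rgb565_roundtrip()` are hypothetical. Only the 5-bit red/blue path is covered; green would work the same way with the 63-based ratio.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point helpers copied from the patch, with <stdint.h> types. */
#define FP_SCALE 15
#define LF_TO_FP(a) ((a) * (uint64_t)(1 << FP_SCALE))
#define INT_TO_FP(a) ((a) << FP_SCALE)
#define FP_MUL(a, b) ((int32_t)(((int64_t)(a) * (b)) >> FP_SCALE))
#define FP_DIV(a, b) ((int32_t)(((int64_t)(a) << FP_SCALE) / (b)))
#define FP_TO_INT_ROUND_UP(a) (((a) + (1 << (FP_SCALE - 1))) >> FP_SCALE)

/* 65535 / 31 in Q17.15, truncated to an int as in the patch. */
static int fp_rb_ratio(void)
{
	return LF_TO_FP(2114.032258065);
}

/* 5-bit -> 16-bit, mirroring the red/blue path of RGB565_to_ARGB16161616. */
static uint16_t expand5(int v)
{
	return FP_TO_INT_ROUND_UP(FP_MUL(INT_TO_FP(v), fp_rb_ratio()));
}

/* 16-bit -> 5-bit, mirroring the red/blue path of convert_to_RGB565. */
static uint16_t shrink5(int v)
{
	return FP_TO_INT_ROUND_UP(FP_DIV(INT_TO_FP(v), fp_rb_ratio()));
}

/* Every 5-bit value should survive expand5() followed by shrink5(). */
static void check_rgb565_roundtrip(void)
{
	for (int v = 0; v < 32; v++)
		assert(shrink5(expand5(v)) == v);
}
```

With these constants, expand5(31) yields 65535 and shrink5(65535) yields 31, so both endpoints map exactly. The largest intermediate, 31 * ratio in Q17.15, still fits in a signed 32-bit value.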


* Re: [PATCH v4 0/9] Add new formats support to vkms
  2022-01-21 21:38 [PATCH v4 0/9] Add new formats support to vkms Igor Torrente
                   ` (8 preceding siblings ...)
  2022-01-21 21:38 ` [PATCH v4 9/9] drm: vkms: Add support to the RGB565 format Igor Torrente
@ 2022-02-08 11:03 ` Melissa Wen
  9 siblings, 0 replies; 31+ messages in thread
From: Melissa Wen @ 2022-02-08 11:03 UTC
  To: Igor Torrente
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	melissa.srw, tzimmermann, ~lkcamp/patches

On 01/21, Igor Torrente wrote:
> Summary
> =======
> This series of patches refactors some vkms components in order to introduce
> new formats to the planes and writeback connector.
> 
> Now in the blend function, the plane's pixels are converted to ARGB16161616
> and then blended together.
> 
> The CRC is calculated based on the ARGB16161616 buffer. And if required,
> this buffer is copied/converted to the writeback buffer format.
> 
> And to handle the pixel conversion, new functions were added to convert
> from a specific format to ARGB16161616 (the reciprocal is also true).
Hi Igor,

Thanks a lot for your work to improve VKMS.
Overall, lgtm. I've pointed out some minor improvements, most of them
about better describing the changes.
It seems that you are using a different version of the kms_cursor_crc test,
and the test results diverge. Can you update and double-check the
statistics?

I also consider it important to keep the version changes in the body of
each commit message. Can you move them to a place where they will not be
dropped when the patch is applied?
> 
> Tests
> =====
> This patch series was tested using the following igt tests:
> -t ".*kms_plane.*"
> -t ".*kms_writeback.*"
> -t ".*kms_cursor_crc*"
> -t ".*kms_flip.*"
> 
> New tests passing
> -------------------
> - pipe-A-cursor-size-change
> - pipe-A-cursor-alpha-transparent
> 
> Performance
> -----------
> Further optimizing the code, now it's running slightly faster than the V2.
> And it consumes less memory than the current implementation in the common case
> (more detail in the commit message).
> 
> Results running the IGT tests `kms_cursor_crc`:
> 
> |                             Frametime                                 |
> |:---------------:|:---------:|:--------------:|:------------:|:-------:|
> |  Implementation |  Current  |  Per-pixel(V1) | Per-line(V2) |   V3    |
> | frametime range |  8~22 ms  |    32~56 ms    |    6~19 ms   | 5~18 ms |
> |     Average     |  10.0 ms  |     35.8 ms    |    8.6 ms    |  7.3 ms |
> 
> | Memory consumption (output dimensions) |
> |:--------------------------------------:|
> |       Current      |     This patch    |
> |:------------------:|:-----------------:|
> |   Width * Height   |     2 * Width     |
> 
> XRGB to ARGB behavior
> =====================
> During the development, I decided to always fill the alpha channel of
> the output pixel whenever the conversion from a format without an alpha
> channel to ARGB16161616 is necessary. Therefore, I ignore the value
> received from the XRGB and overwrite the value with 0xFFFF.
And you can also drop this TO-DO here (Clearing primary plane):
https://dri.freedesktop.org/docs/drm/gpu/vkms.html#add-plane-features

With these points addressed, you can add my r-b to the entire series:
Reviewed-by: Melissa Wen <mwen@igalia.com>

> 
> ---
> Igor Torrente (9):
>   drm: vkms: Replace the deprecated drm_mode_config_init
>   drm: vkms: Alloc the compose frame using vzalloc
>   drm: vkms: Replace hardcoded value of `vkms_composer.map` to
>     DRM_FORMAT_MAX_PLANES
>   drm: vkms: Rename `vkms_composer` to `vkms_frame_info`
>   drm: vkms: Add fb information to `vkms_writeback_job`
>   drm: drm_atomic_helper: Add a new helper to deal with the writeback
>     connector validation
>   drm: vkms: Refactor the plane composer to accept new formats
>   drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats
>   drm: vkms: Add support to the RGB565 format
> 
>  drivers/gpu/drm/drm_atomic_helper.c   |  39 +++
>  drivers/gpu/drm/vkms/Makefile         |   1 +
>  drivers/gpu/drm/vkms/vkms_composer.c  | 336 +++++++++++++-------------
>  drivers/gpu/drm/vkms/vkms_drv.c       |   6 +-
>  drivers/gpu/drm/vkms/vkms_drv.h       |  20 +-
>  drivers/gpu/drm/vkms/vkms_formats.c   | 279 +++++++++++++++++++++
>  drivers/gpu/drm/vkms/vkms_formats.h   |  49 ++++
>  drivers/gpu/drm/vkms/vkms_plane.c     |  47 ++--
>  drivers/gpu/drm/vkms/vkms_writeback.c |  32 ++-
>  include/drm/drm_atomic_helper.h       |   3 +
>  10 files changed, 600 insertions(+), 212 deletions(-)
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> 
> -- 
> 2.30.2
> 


* Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-02-08 10:40   ` Melissa Wen
@ 2022-02-09  0:58     ` Igor Torrente
  2022-02-09 21:45       ` Melissa Wen
  0 siblings, 1 reply; 31+ messages in thread
From: Igor Torrente @ 2022-02-09  0:58 UTC
  To: Melissa Wen
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	tzimmermann, ~lkcamp/patches

Hi Melissa,

On 2/8/22 07:40, Melissa Wen wrote:
> On 01/21, Igor Torrente wrote:
>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>> as a color input.
>>
>> This patch refactors all the functions related to the plane composition
>> to overcome this limitation.
>>
>> A new internal format(`struct pixel`) is introduced to deal with all
>> possible inputs. It consists of 16 bits fields that represent each of
>> the channels.
>>
>> The pixels blend is done using this internal format. And new handlers
>> are being added to convert a specific format to/from this internal format.
>>
>> So the blend operation depends on these handlers to convert to this common
>> format. The blended result, if necessary, is converted to the writeback
>> buffer format.
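As an illustration of such a handler, an 8-bit channel can be widened exactly to the 16-bit internal format by multiplying by 257 (0x0101). This maps 0x00 to 0x0000 and 0xff to 0xffff. The sketch below is not the patch's code. `struct px16` stands in for the patch's `struct line_buffer`, the helper names are hypothetical, and it converts a single CPU-order `u32` pixel rather than a whole line. Following the cover letter, the X byte is ignored and alpha is forced to 0xffff.

```c
#include <stdint.h>

/* Stand-in for the patch's 16-bit-per-channel pixel (struct line_buffer). */
struct px16 {
	uint16_t a, r, g, b;
};

/* 8 -> 16 bits: v * 257 replicates the byte, so 0xff becomes 0xffff. */
static uint16_t widen8(uint8_t v)
{
	return (uint16_t)(v * 257);
}

/* 16 -> 8 bits, rounded to nearest; exact inverse of widen8(). */
static uint8_t narrow16(uint16_t v)
{
	return (uint8_t)(((uint32_t)v + 128) / 257);
}

/* XRGB8888 (as a CPU-order u32) -> 16-bit pixel with opaque alpha. */
static struct px16 xrgb8888_to_px16(uint32_t p)
{
	struct px16 out = {
		.a = 0xffff,	/* X byte ignored: always fully opaque */
		.r = widen8((p >> 16) & 0xff),
		.g = widen8((p >> 8) & 0xff),
		.b = widen8(p & 0xff),
	};

	return out;
}
```

Because 257 is odd, (v + 128) / 257 never lands on an exact half. It therefore rounds v / 257 to the nearest integer, and narrow16(widen8(x)) == x for every 8-bit x.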
>>
>> This patch introduces three major differences to the blend function.
>> 1 - All the planes are blended at once.
>> 2 - The blend is computed per line instead of per pixel.
>> 3 - It is responsible for calculating the CRC and writing the writeback
>>      buffer (if necessary).
>>
>> These changes allow us to allocate much less memory for the intermediate
>> buffer used by these operations, because we no longer need the entire
>> intermediate image at once: a single line is enough.
>>
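The per-line blend can be sketched with two line buffers. The primary plane's line is converted into the output line. Each further plane's line is converted into a staging line and blended on top, using the same premultiplied formula as the patch's pre_mul_blend_channel(). With channels normalised to 0xffff, that formula is out = src + dst * (1 - alpha).

This is a user-space sketch with hypothetical names, not the patch's code. It widens to uint32_t before multiplying, because with plain u16 operands the products are computed in int, and 0xffff * 0xffff does not fit in a 32-bit int. It assumes premultiplied input (src <= alpha), which keeps the sum below 2^32.

```c
#include <stdint.h>

/* Stand-in for the patch's struct line_buffer: 4 x 16-bit channels. */
struct px16 {
	uint16_t a, r, g, b;
};

/*
 * Premultiplied "over": (src * 0xffff + dst * (0xffff - alpha)) / 0xffff,
 * rounded up like DIV_ROUND_UP(). Requires src <= alpha.
 */
static uint16_t blend_channel16(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint32_t sum = (uint32_t)src * 0xffffu + (uint32_t)dst * (0xffffu - alpha);

	return (uint16_t)((sum + 0xfffeu) / 0xffffu);
}

/* Blend `width` staged pixels onto the output line, starting at x_dst. */
static void blend_line(const struct px16 *stage, struct px16 *out,
		       int x_dst, int width)
{
	for (int x = 0; x < width; x++) {
		struct px16 *o = &out[x_dst + x];

		o->a = 0xffff;	/* the result is treated as opaque */
		o->r = blend_channel16(stage[x].r, o->r, stage[x].a);
		o->g = blend_channel16(stage[x].g, o->g, stage[x].a);
		o->b = blend_channel16(stage[x].b, o->b, stage[x].a);
	}
}
```

Only the pixels covered by the plane's destination rectangle are touched. The rest of the output line keeps the primary plane's values, which is what lets a single output line serve all planes.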
>> | Memory consumption (output dimensions) |
>> |:--------------------------------------:|
>> |       Current      |     This patch    |
>> |:------------------:|:-----------------:|
>> |   Width * Height   |     2 * Width     |
>>
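The table can be made concrete with a worked example. The old code allocated a copy of the whole primary framebuffer (gem_obj->size). The new code allocates two lines of 8-byte pixels (stage_buffer and output_buffer, each line_width * sizeof(struct line_buffer)). The example assumes a 1920x1080 XRGB8888 framebuffer with no pitch padding, so gem_obj->size is taken as width * height * 4. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct line_buffer; the patch WARNs unless it is 8 bytes. */
struct px16 {
	uint16_t a, r, g, b;
};

/* Old scheme: one full copy of the primary framebuffer. */
static size_t full_frame_bytes(size_t width, size_t height, size_t cpp)
{
	return width * height * cpp;
}

/* New scheme: a staging line plus an output line of 16-bit pixels. */
static size_t two_line_bytes(size_t width)
{
	return 2 * width * sizeof(struct px16);
}
```

Under those assumptions, full_frame_bytes(1920, 1080, 4) is 8294400 bytes (about 7.9 MiB), whereas two_line_bytes(1920) is 30720 bytes (30 KiB).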
>> Beyond memory, we also have a minor performance benefit from all
>> these changes. Results running the IGT tests `*kms_cursor_crc*`:
>>
> First, thanks for this improvement.
> 
> Some recent changes in kms_cursor_crc caused VKMS to fail in most test
> cases (iirc, only size-change and alpha-opaque are passing currently).

I updated my igt and kernel(from drm_misc/drm-misc-next) to the latest
commit[1][2] and I'm getting mixed results. Sometimes most of the test
passes, sometimes almost nothing passes.

[1] a96674e7 (tests/api_intel_bb: Handle different alignments in delta-check)
[2] b21a142fd205 (drm/nouveau/backlight: Just set all backlight types as RAW)

> But stating that performance improvement here could cause confusion when
> reviewing the change history. Can you update these statistics? You could
> specify the IGT hash to pin the test case version, or pick another test
> for comparison.

OK, I will do both.

>> |                 Frametime                  |
>> |:------------------------------------------:|
>> |  Implementation |  Current  |  This commit |
>> |:---------------:|:---------:|:------------:|
>> | frametime range |  8~22 ms  |    5~18 ms   |
>> |     Average     |  10.0 ms  |    7.3 ms    |
>>
>> Reported-by: kernel test robot <lkp@intel.com>
> It's a little confusing to have this Reported-by tag without any
> explanation of what was reported and fixed. Can you specify it?
>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>> ---
>> V2: Improves the performance drastically, by performing the operations
>>      per-line and not per-pixel(Pekka Paalanen).
>>      Minor improvements(Pekka Paalanen).
>>
>> V3: Changes the code to blend the planes all at once. This improves
>>      performance, memory consumption, and removes much of the weirdness
>>      of the V2(Pekka Paalanen and me).
>>      Minor improvements(Pekka Paalanen and me).
>>
>> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
> Can you move version changes up so that they are not ignored?
> 
> I also pointed out a minor code style issue below.
> With these comments addressed, you can add my r-b tag in the next
> version.
>> ---
>>   drivers/gpu/drm/vkms/Makefile        |   1 +
>>   drivers/gpu/drm/vkms/vkms_composer.c | 335 +++++++++++++--------------
>>   drivers/gpu/drm/vkms/vkms_formats.c  | 138 +++++++++++
>>   drivers/gpu/drm/vkms/vkms_formats.h  |  31 +++
>>   4 files changed, 333 insertions(+), 172 deletions(-)
>>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
>>
>> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
>> index 72f779cbfedd..1b28a6a32948 100644
>> --- a/drivers/gpu/drm/vkms/Makefile
>> +++ b/drivers/gpu/drm/vkms/Makefile
>> @@ -3,6 +3,7 @@ vkms-y := \
>>   	vkms_drv.o \
>>   	vkms_plane.o \
>>   	vkms_output.o \
>> +	vkms_formats.o \
>>   	vkms_crtc.o \
>>   	vkms_composer.o \
>>   	vkms_writeback.o
>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>> index 95029d2ebcac..9f70fcf84fb9 100644
>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>> @@ -9,202 +9,210 @@
>>   #include <drm/drm_vblank.h>
>>   
>>   #include "vkms_drv.h"
>> +#include "vkms_formats.h"
>>   
>> -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
>> -				 const struct vkms_frame_info *frame_info)
>> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>>   {
>> -	u32 pixel;
>> -	int src_offset = frame_info->offset + (y * frame_info->pitch)
>> -					    + (x * frame_info->cpp);
>> +	u32 new_color;
>>   
>> -	pixel = *(u32 *)&buffer[src_offset];
>> +	new_color = (src * 0xffff + dst * (0xffff - alpha));
>>   
>> -	return pixel;
>> +	return DIV_ROUND_UP(new_color, 0xffff);
>>   }
>>   
>>   /**
>> - * compute_crc - Compute CRC value on output frame
>> + * pre_mul_alpha_blend - alpha blending equation
>> + * @src_frame_info: source framebuffer's metadata
>> + * @stage_buffer: The line with the pixels from src_plane
>> + * @output_buffer: A line buffer that receives all the blends output
>>    *
>> - * @vaddr: address to final framebuffer
>> - * @frame_info: framebuffer's metadata
>> + * Using the information from the `frame_info`, this blends only the
>> + * necessary pixels from the `stage_buffer` to the `output_buffer`
>> + * using premultiplied blend formula.
>>    *
>> - * returns CRC value computed using crc32 on the visible portion of
>> - * the final framebuffer at vaddr_out
>> + * The current DRM assumption is that pixel color values have been already
>> + * pre-multiplied with the alpha channel values. See more
>> + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
>> + * completely opaque background.
>>    */
>> -static uint32_t compute_crc(const u8 *vaddr,
>> -			    const struct vkms_frame_info *frame_info)
>> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
>> +				struct line_buffer *stage_buffer,
>> +				struct line_buffer *output_buffer)
>>   {
>> -	int x, y;
>> -	u32 crc = 0, pixel = 0;
>> -	int x_src = frame_info->src.x1 >> 16;
>> -	int y_src = frame_info->src.y1 >> 16;
>> -	int h_src = drm_rect_height(&frame_info->src) >> 16;
>> -	int w_src = drm_rect_width(&frame_info->src) >> 16;
>> -
>> -	for (y = y_src; y < y_src + h_src; ++y) {
>> -		for (x = x_src; x < x_src + w_src; ++x) {
>> -			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
>> -			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
>> -		}
>> +	int x, x_dst = frame_info->dst.x1;
>> +	int x_limit = drm_rect_width(&frame_info->dst);
>> +	struct line_buffer *out = output_buffer + x_dst;
>> +	struct line_buffer *in = stage_buffer;
>> +
>> +	for (x = 0; x < x_limit; x++) {
>> +		out[x].a = (u16)0xffff;
>> +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
>> +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
>> +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>>   	}
>> -
>> -	return crc;
>>   }
>>   
>> -static u8 blend_channel(u8 src, u8 dst, u8 alpha)
>> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>>   {
>> -	u32 pre_blend;
>> -	u8 new_color;
>> -
>> -	pre_blend = (src * 255 + dst * (255 - alpha));
>> -
>> -	/* Faster div by 255 */
>> -	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
>> +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
>> +		return true;
>>   
>> -	return new_color;
>> +	return false;
>>   }
>>   
>>   /**
>> - * alpha_blend - alpha blending equation
>> - * @argb_src: src pixel on premultiplied alpha mode
>> - * @argb_dst: dst pixel completely opaque
>> - *
>> - * blend pixels using premultiplied blend formula. The current DRM assumption
>> - * is that pixel color values have been already pre-multiplied with the alpha
>> - * channel values. See more drm_plane_create_blend_mode_property(). Also, this
>> - * formula assumes a completely opaque background.
>> - */
>> -static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
>> -{
>> -	u8 alpha;
>> -
>> -	alpha = argb_src[3];
>> -	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
>> -	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
>> -	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
>> -}
>> -
>> -/**
>> - * x_blend - blending equation that ignores the pixel alpha
>> - *
>> - * overwrites RGB color value from src pixel to dst pixel.
>> - */
>> -static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>> -{
>> -	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
>> -}
>> -
>> -/**
>> - * blend - blend value at vaddr_src with value at vaddr_dst
>> - * @vaddr_dst: destination address
>> - * @vaddr_src: source address
>> - * @dst_frame_info: destination framebuffer's metadata
>> - * @src_frame_info: source framebuffer's metadata
>> - * @pixel_blend: blending equation based on plane format
>> + * @wb_frame_info: The writeback frame buffer metadata
>> + * @wb_fmt_func: The format tranformatio function to the wb buffer
>> + * @crtc_state: The crtc state
>> + * @plane_fmt_func: A format tranformation function to each plane
>> + * @crc32: The crc output of the final frame
>> + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
>> + * @stage_buffer: The line with the pixels from src_compositor
>>    *
>> - * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
>> - * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
>> - * and clearing alpha channel to an completely opaque background. This function
>> - * uses buffer's metadata to locate the new composite values at vaddr_dst.
>> + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
>> + * from all planes, calculates the crc32 of the output from the former step,
>> + * and, if necessary, convert and store the output to the writeback buffer.
>>    *
>>    * TODO: completely clear the primary plane (a = 0xff) before starting to blend
>>    * pixel color values
>>    */
>> -static void blend(void *vaddr_dst, void *vaddr_src,
>> -		  struct vkms_frame_info *dst_frame_info,
>> -		  struct vkms_frame_info *src_frame_info,
>> -		  void (*pixel_blend)(const u8 *, u8 *))
>> +static void blend(struct vkms_frame_info *wb_frame_info,
>> +		  format_transform_func wb_fmt_func,
>> +		  struct vkms_crtc_state *crtc_state,
>> +		  format_transform_func *plane_fmt_func,
>> +		  u32 *crc32, struct line_buffer *stage_buffer,
>> +		  struct line_buffer *output_buffer, s64 row_size)
>>   {
>> -	int i, j, j_dst, i_dst;
>> -	int offset_src, offset_dst;
>> -	u8 *pixel_dst, *pixel_src;
>> -
>> -	int x_src = src_frame_info->src.x1 >> 16;
>> -	int y_src = src_frame_info->src.y1 >> 16;
>> -
>> -	int x_dst = src_frame_info->dst.x1;
>> -	int y_dst = src_frame_info->dst.y1;
>> -	int h_dst = drm_rect_height(&src_frame_info->dst);
>> -	int w_dst = drm_rect_width(&src_frame_info->dst);
>> +	struct vkms_plane_state **plane = crtc_state->active_planes;
>> +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
>> +	u32 n_active_planes = crtc_state->num_active_planes;
>>   
>> +	int y_src = primary_plane_info->dst.y1;
>> +	int h_dst = drm_rect_height(&primary_plane_info->dst);
>>   	int y_limit = y_src + h_dst;
>> -	int x_limit = x_src + w_dst;
>> -
>> -	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
>> -		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
>> -			offset_dst = dst_frame_info->offset
>> -				     + (i_dst * dst_frame_info->pitch)
>> -				     + (j_dst++ * dst_frame_info->cpp);
>> -			offset_src = src_frame_info->offset
>> -				     + (i * src_frame_info->pitch)
>> -				     + (j * src_frame_info->cpp);
>> -
>> -			pixel_src = (u8 *)(vaddr_src + offset_src);
>> -			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
>> -			pixel_blend(pixel_src, pixel_dst);
>> -			/* clearing alpha channel (0xff)*/
>> -			pixel_dst[3] = 0xff;
>> +	int y, i;
>> +
>> +	for (y = y_src; y < y_limit; y++) {
>> +		plane_fmt_func[0](primary_plane_info, y, output_buffer);
>> +
>> +		/* If there are other planes besides primary, we consider the active
>> +		 * planes should be in z-order and compose them associatively:
>> +		 * ((primary <- overlay) <- cursor)
>> +		 */
>> +		for (i = 1; i < n_active_planes; i++) {
>> +			if (!check_y_limit(plane[i]->frame_info, y))
>> +				continue;
>> +
>> +			plane_fmt_func[i](plane[i]->frame_info, y, stage_buffer);
>> +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
>> +					    output_buffer);
>>   		}
>> -		i_dst++;
>> +
>> +		*crc32 = crc32_le(*crc32, (void *)output_buffer, row_size);
>> +
>> +		if (wb_frame_info)
>> +			wb_fmt_func(wb_frame_info, y, output_buffer);
>>   	}
>>   }
>>   
>> -static void compose_plane(struct vkms_frame_info *primary_plane_info,
>> -			  struct vkms_frame_info *plane_frame_info,
>> -			  void *vaddr_out)
>> +static void get_format_transform_functions(struct vkms_crtc_state *crtc_state,
>> +					   format_transform_func plane_funcs[])
>>   {
>> -	struct drm_framebuffer *fb = plane_frame_info->fb;
>> -	void *vaddr;
>> -	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>> +	struct vkms_plane_state **active_planes = crtc_state->active_planes;
>> +	u32 n_active_planes = crtc_state->num_active_planes, s_fmt;
>> +	int i;
>>   
>> -	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>> -		return;
>> +	for (i = 0; i < n_active_planes; i++) {
>> +		s_fmt = active_planes[i]->frame_info->fb->format->format;
>> +		plane_funcs[i] = get_fmt_transform_function(s_fmt);
>> +	}
>> +}
>>   
>> -	vaddr = plane_frame_info->map[0].vaddr;
>> +static bool check_planes_x_bounds(struct vkms_crtc_state *crtc_state,
>> +				  struct vkms_frame_info *wb_frame_info)
>> +{
>> +	struct vkms_plane_state **planes = crtc_state->active_planes;
>> +	struct vkms_frame_info *primary_plane_info = planes[0]->frame_info;
>> +	int line_width = drm_rect_width(&primary_plane_info->dst);
>> +	u32 n_active_planes = crtc_state->num_active_planes;
>> +	int i;
>>   
>> -	if (fb->format->format == DRM_FORMAT_ARGB8888)
>> -		pixel_blend = &alpha_blend;
>> -	else
>> -		pixel_blend = &x_blend;
>> +	for (i = 0; i < n_active_planes; i++) {
>> +		int x_dst = planes[i]->frame_info->dst.x1;
>> +		int x_src = planes[i]->frame_info->src.x1 >> 16;
>> +		int x2_src = planes[i]->frame_info->src.x2 >> 16;
>> +		int x_limit = drm_rect_width(&planes[i]->frame_info->dst);
>>   
>> -	blend(vaddr_out, vaddr, primary_plane_info,
>> -	      plane_frame_info, pixel_blend);
>> +		if (x_dst + x_limit > line_width)
>> +			return false;
>> +		if (x_src + x_limit > x2_src)
>> +			return false;
>> +	}
>> +
>> +	return true;
>>   }
>>   
>> -static int compose_active_planes(void **vaddr_out,
>> -				 struct vkms_frame_info *primary_plane_info,
>> -				 struct vkms_crtc_state *crtc_state)
>> +static int compose_active_planes(struct vkms_frame_info *wb_frame_info,
>> +				 struct vkms_crtc_state *crtc_state,
>> +				 u32 *crc32)
>>   {
>> -	struct drm_framebuffer *fb = primary_plane_info->fb;
>> -	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>> -	const void *vaddr;
>> -	int i;
>> +	format_transform_func plane_funcs[NUM_OVERLAY_PLANES], wb_func = NULL;
>> +	int line_width, ret = 0, pixel_size = sizeof(struct line_buffer);
>> +	struct vkms_frame_info *primary_plane_info = NULL;
>> +	struct line_buffer *output_buffer, *stage_buffer;
>> +	struct vkms_plane_state *act_plane = NULL;
>> +	u32 wb_format;
>>   
>> -	if (!*vaddr_out) {
>> -		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
>> -		if (!*vaddr_out) {
>> -			DRM_ERROR("Cannot allocate memory for output frame.");
>> -			return -ENOMEM;
>> -		}
>> +	if (WARN_ON(pixel_size != 8))
>> +		return -EINVAL;
>> +
>> +	if (crtc_state->num_active_planes >= 1) {
>> +		act_plane = crtc_state->active_planes[0];
>> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>> +			primary_plane_info = act_plane->frame_info;
>>   	}
>>   
>> +	if (!primary_plane_info)
>> +		return -EINVAL;
>> +
>>   	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>>   		return -EINVAL;
>>   
>> -	vaddr = primary_plane_info->map[0].vaddr;
>> +	if (WARN_ON(!check_planes_x_bounds(crtc_state, wb_frame_info)))
>> +		return -EINVAL;
>>   
>> -	memcpy(*vaddr_out, vaddr, gem_obj->size);
>> +	line_width = drm_rect_width(&primary_plane_info->dst);
>>   
>> -	/* If there are other planes besides primary, we consider the active
>> -	 * planes should be in z-order and compose them associatively:
>> -	 * ((primary <- overlay) <- cursor)
>> -	 */
>> -	for (i = 1; i < crtc_state->num_active_planes; i++)
>> -		compose_plane(primary_plane_info,
>> -			      crtc_state->active_planes[i]->frame_info,
>> -			      *vaddr_out);
>> +	stage_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>> +	if (!stage_buffer) {
>> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
>> +		return -ENOMEM;
>> +	}
>>   
>> -	return 0;
>> +	output_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>> +	if (!output_buffer) {
>> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
>> +		ret = -ENOMEM;
>> +		goto free_stage_buffer;
>> +	}
>> +
>> +	get_format_transform_functions(crtc_state, plane_funcs);
>> +
>> +	if (wb_frame_info) {
>> +		wb_format = wb_frame_info->fb->format->format;
>> +		wb_func = get_wb_fmt_transform_function(wb_format);
>> +		wb_frame_info->src = primary_plane_info->src;
>> +		wb_frame_info->dst = primary_plane_info->dst;
>> +	}
>> +
>> +	blend(wb_frame_info, wb_func, crtc_state, plane_funcs, crc32,
>> +	      stage_buffer, output_buffer, (s64)line_width * pixel_size);
>> +
>> +	kvfree(output_buffer);
>> +free_stage_buffer:
>> +	kvfree(stage_buffer);
>> +
>> +	return ret;
>>   }
>>   
>>   /**
>> @@ -222,13 +230,12 @@ void vkms_composer_worker(struct work_struct *work)
>>   						struct vkms_crtc_state,
>>   						composer_work);
>>   	struct drm_crtc *crtc = crtc_state->base.crtc;
>> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>> +	struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>>   	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>> -	struct vkms_frame_info *primary_plane_info = NULL;
>> -	struct vkms_plane_state *act_plane = NULL;
>>   	bool crc_pending, wb_pending;
>> -	void *vaddr_out = NULL;
>> -	u32 crc32 = 0;
>>   	u64 frame_start, frame_end;
>> +	u32 crc32 = 0;
>>   	int ret;
>>   
>>   	spin_lock_irq(&out->composer_lock);
>> @@ -248,35 +255,19 @@ void vkms_composer_worker(struct work_struct *work)
>>   	if (!crc_pending)
>>   		return;
>>   
>> -	if (crtc_state->num_active_planes >= 1) {
>> -		act_plane = crtc_state->active_planes[0];
>> -		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>> -			primary_plane_info = act_plane->frame_info;
>> -	}
>> -
>> -	if (!primary_plane_info)
>> -		return;
>> -
>>   	if (wb_pending)
>> -		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
>> +		ret = compose_active_planes(wb_frame_info, crtc_state, &crc32);
>> +	else
>> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>>   
>> -	ret = compose_active_planes(&vaddr_out, primary_plane_info,
>> -				    crtc_state);
>> -	if (ret) {
>> -		if (ret == -EINVAL && !wb_pending)
>> -			kvfree(vaddr_out);
>> +	if (ret)
>>   		return;
>> -	}
>> -
>> -	crc32 = compute_crc(vaddr_out, primary_plane_info);
>>   
>>   	if (wb_pending) {
>>   		drm_writeback_signal_completion(&out->wb_connector, 0);
>>   		spin_lock_irq(&out->composer_lock);
>>   		crtc_state->wb_pending = false;
>>   		spin_unlock_irq(&out->composer_lock);
>> -	} else {
>> -		kvfree(vaddr_out);
>>   	}
>>   
>>   	/*
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>> new file mode 100644
>> index 000000000000..0d1838d1b835
>> --- /dev/null
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>> @@ -0,0 +1,138 @@
>> +/* SPDX-License-Identifier: GPL-2.0+ */
> checkpatch complains here ^ Use `//`

I changed it, but:

WARNING: Improper SPDX comment style for 
'drivers/gpu/drm/vkms/vkms_formats.h', please use '/*' instead
#660: FILE: drivers/gpu/drm/vkms/vkms_formats.h:1:
+// SPDX-License-Identifier: GPL-2.0+

I kept the change to be consistent with the rest of the vkms files.

>> +
>> +#include <drm/drm_rect.h>
>> +#include "vkms_formats.h"
>> +
>> +format_transform_func get_fmt_transform_function(u32 format)
>> +{
>> +	if (format == DRM_FORMAT_ARGB8888)
>> +		return &ARGB8888_to_ARGB16161616;
>> +	else
>> +		return &XRGB8888_to_ARGB16161616;
>> +}
>> +
>> +format_transform_func get_wb_fmt_transform_function(u32 format)
>> +{
>> +	if (format == DRM_FORMAT_ARGB8888)
>> +		return &convert_to_ARGB8888;
>> +	else
>> +		return &convert_to_XRGB8888;
>> +}
>> +
>> +static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
>> +{
>> +	return frame_info->offset + (y * frame_info->pitch)
>> +				  + (x * frame_info->cpp);
>> +}
>> +
>> +/*
>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>> + *
>> + * @frame_info: Buffer metadata
>> + * @x: The x (width) coordinate of the 2D buffer
>> + * @y: The y (height) coordinate of the 2D buffer
>> + *
>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>> + * returns the address of the first color channel.
>> + * This function assumes the channels are packed together, i.e. a color channel
>> + * comes immediately after another in the memory. And therefore, this function
>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>> + */
>> +static void *packed_pixels_addr(struct vkms_frame_info *frame_info, int x, int y)
>> +{
>> +	int offset = pixel_offset(frame_info, x, y);
>> +
>> +	return (u8 *)frame_info->map[0].vaddr + offset;
>> +}
>> +
>> +static void *get_packed_src_addr(struct vkms_frame_info *frame_info, int y)
>> +{
>> +	int x_src = frame_info->src.x1 >> 16;
>> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>> +
>> +	return packed_pixels_addr(frame_info, x_src, y_src);
>> +}
>> +
>> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>> +			      struct line_buffer *stage_buffer)
>> +{
>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>> +	int x, x_limit = drm_rect_width(&frame_info->dst);
>> +
>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>> +		/*
>> +		 * Organizes the channels in their respective positions and converts
>> +		 * the 8-bit channels to 16 bits.
>> +		 * The 257 is the "conversion ratio". This number is obtained by the
>> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
>> +		 * the best color value in a pixel format with more possibilities.
>> +		 * A similar idea applies to other RGB color conversions.
>> +		 */
>> +		stage_buffer[x].a = (u16)src_pixels[3] * 257;
>> +		stage_buffer[x].r = (u16)src_pixels[2] * 257;
>> +		stage_buffer[x].g = (u16)src_pixels[1] * 257;
>> +		stage_buffer[x].b = (u16)src_pixels[0] * 257;
>> +	}
>> +}
>> +
>> +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>> +			      struct line_buffer *stage_buffer)
>> +{
>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>> +	int x, x_limit = drm_rect_width(&frame_info->dst);
>> +
>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>> +		stage_buffer[x].a = (u16)0xffff;
>> +		stage_buffer[x].r = (u16)src_pixels[2] * 257;
>> +		stage_buffer[x].g = (u16)src_pixels[1] * 257;
>> +		stage_buffer[x].b = (u16)src_pixels[0] * 257;
>> +	}
>> +}
>> +
>> +/*
>> + * The following functions take a line of ARGB16161616 pixels from the
>> + * src_buffer, convert them to a specific format, and store them in the
>> + * destination.
>> + *
>> + * They are used in the `compose_active_planes` to convert and store a line
>> + * from the src_buffer to the writeback buffer.
>> + */
>> +void convert_to_ARGB8888(struct vkms_frame_info *frame_info,
>> +			 int y, struct line_buffer *src_buffer)
>> +{
>> +	int x, x_dst = frame_info->dst.x1;
>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>> +	int x_limit = drm_rect_width(&frame_info->dst);
>> +
>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>> +		/*
>> +		 * This sequence below is important because the format's byte order is
>> +		 * in little-endian. In the case of the ARGB8888 the memory is
>> +		 * organized this way:
>> +		 *
>> +		 * | Addr     | = blue channel
>> +		 * | Addr + 1 | = green channel
>> +		 * | Addr + 2 | = Red channel
>> +		 * | Addr + 3 | = Alpha channel
>> +		 */
>> +		dst_pixels[3] = DIV_ROUND_UP(src_buffer[x].a, 257);
>> +		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
>> +		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
>> +		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
>> +	}
>> +}
>> +
>> +void convert_to_XRGB8888(struct vkms_frame_info *frame_info,
>> +			 int y, struct line_buffer *src_buffer)
>> +{
>> +	int x, x_dst = frame_info->dst.x1;
>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>> +	int x_limit = drm_rect_width(&frame_info->dst);
>> +
>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>> +		dst_pixels[3] = (u8)0xff;
>> +		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
>> +		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
>> +		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
>> +	}
>> +}
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
>> new file mode 100644
>> index 000000000000..817e8b2124ae
>> --- /dev/null
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
>> @@ -0,0 +1,31 @@
>> +/* SPDX-License-Identifier: GPL-2.0+ */
> and here ^
> 
>> +
>> +#ifndef _VKMS_FORMATS_H_
>> +#define _VKMS_FORMATS_H_
>> +
>> +#include "vkms_drv.h"
>> +
>> +struct line_buffer {
>> +	u16 a, r, g, b;
>> +};
>> +
>> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>> +			      struct line_buffer *stage_buffer);
>> +
>> +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>> +			      struct line_buffer *stage_buffer);
>> +
>> +void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
>> +			 struct line_buffer *src_buffer);
>> +
>> +void convert_to_XRGB8888(struct vkms_frame_info *frame_info, int y,
>> +			 struct line_buffer *src_buffer);
>> +
>> +typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
>> +				      struct line_buffer *buffer);
>> +
>> +format_transform_func get_fmt_transform_function(u32 format);
>> +
>> +format_transform_func get_wb_fmt_transform_function(u32 format);
>> +
>> +#endif /* _VKMS_FORMATS_H_ */
>> -- 
>> 2.30.2
>>
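The 8-bit to 16-bit scaling in `ARGB8888_to_ARGB16161616()` and the reverse step in `convert_to_ARGB8888()` can be checked in isolation. The following is a minimal userspace sketch, not vkms code: `expand_8_to_16()` and `reduce_16_to_8()` are hypothetical helpers, and `DIV_ROUND_UP()` is defined locally with the same formula as the kernel macro. Multiplying by 257 maps 0x00 to 0x0000 and 0xff to 0xffff, and every 8-bit value survives the round trip unchanged.

```c
#include <stdint.h>

/* Same formula as the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* 257 == (2^16 - 1) / (2^8 - 1): maps 0x00 -> 0x0000 and 0xff -> 0xffff. */
static uint16_t expand_8_to_16(uint8_t c)
{
	return (uint16_t)(c * 257);
}

/* Reverse conversion, rounding up as convert_to_ARGB8888() does. */
static uint8_t reduce_16_to_8(uint16_t c)
{
	return (uint8_t)DIV_ROUND_UP(c, 257);
}
```

Because the reverse step rounds up, any non-zero 16-bit value below 257 becomes 1 rather than 0; `(c + 128) / 257` would round to nearest instead, if that behaviour were preferred.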

* Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-02-09  0:58     ` Igor Torrente
@ 2022-02-09 21:45       ` Melissa Wen
  2022-02-21  1:02         ` Igor Torrente
  0 siblings, 1 reply; 31+ messages in thread
From: Melissa Wen @ 2022-02-09 21:45 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	tzimmermann, ~lkcamp/patches

On 02/08, Igor Torrente wrote:
> Hi Melissa,
> 
> On 2/8/22 07:40, Melissa Wen wrote:
> > On 01/21, Igor Torrente wrote:
> > > Currently the blend function only accepts XRGB_8888 and ARGB_8888
> > > as a color input.
> > > 
> > > This patch refactors all the functions related to the plane composition
> > > to overcome this limitation.
> > > 
> > > A new internal format (`struct pixel`) is introduced to deal with all
> > > possible inputs. It consists of 16-bit fields that represent each of
> > > the channels.
> > > 
> > > Pixels are blended in this internal format, and new handlers are
> > > added to convert a specific format to/from this internal format.
> > > 
> > > So the blend operation depends on these handlers to convert to this common
> > > format. The blended result, if necessary, is converted to the writeback
> > > buffer format.
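The handler approach can be sketched as a lookup from a DRM fourcc to a line converter, in the spirit of `get_fmt_transform_function()`. This is a self-contained simplification under stated assumptions: `FOURCC()` repeats the packing of `fourcc_code()` from `drm_fourcc.h`, the converters take a plain byte pointer instead of `struct vkms_frame_info`, and every other name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same packing as fourcc_code() in include/uapi/drm/drm_fourcc.h. */
#define FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define FMT_ARGB8888 FOURCC('A', 'R', '2', '4')
#define FMT_XRGB8888 FOURCC('X', 'R', '2', '4')

struct px16 { uint16_t a, r, g, b; };

/* Hypothetical signature: convert `width` little-endian 32bpp pixels. */
typedef void (*line_to_px16_func)(const uint8_t *src, int width,
				  struct px16 *dst);

/* Memory order of ARGB8888 is B, G, R, A. */
static void argb8888_line(const uint8_t *src, int width, struct px16 *dst)
{
	for (int x = 0; x < width; x++, src += 4) {
		dst[x].a = (uint16_t)(src[3] * 257);
		dst[x].r = (uint16_t)(src[2] * 257);
		dst[x].g = (uint16_t)(src[1] * 257);
		dst[x].b = (uint16_t)(src[0] * 257);
	}
}

/* XRGB8888 ignores byte 3 and treats every pixel as fully opaque. */
static void xrgb8888_line(const uint8_t *src, int width, struct px16 *dst)
{
	for (int x = 0; x < width; x++, src += 4) {
		dst[x].a = 0xffff;
		dst[x].r = (uint16_t)(src[2] * 257);
		dst[x].g = (uint16_t)(src[1] * 257);
		dst[x].b = (uint16_t)(src[0] * 257);
	}
}

/* Returns NULL for formats this sketch does not handle. */
static line_to_px16_func get_line_converter(uint32_t format)
{
	switch (format) {
	case FMT_ARGB8888:
		return argb8888_line;
	case FMT_XRGB8888:
		return xrgb8888_line;
	default:
		return NULL;
	}
}
```

Returning NULL for unknown formats, rather than falling back to XRGB8888 as the patch does, is a design choice of this sketch; it makes an unsupported format visible to the caller.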
> > > 
> > > This patch introduces three major differences to the blend function.
> > > 1 - All the planes are blended at once.
> > > 2 - The blend is computed per line instead of per pixel.
> > > 3 - It is responsible for calculating the CRC and writing the writeback
> > >      buffer (if necessary).
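Point 3 works because CRC-32 can be computed incrementally: passing the previous result as the seed of the next call yields the same value as a single call over the whole buffer. The sketch below uses a hypothetical bitwise `crc32_le_bitwise()` with the reflected polynomial 0xEDB88320 and no pre/post inversion, intended to mirror how `blend()` chains `crc32_le()` once per row; `crc_by_lines()` is likewise illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (poly 0xEDB88320), seeded by the caller,
 * with no final inversion. */
static uint32_t crc32_le_bitwise(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}

/* Feed a buffer of `height` rows one row at a time, chaining the seed. */
static uint32_t crc_by_lines(const uint8_t *buf, size_t row_bytes, int height)
{
	uint32_t crc = 0;

	for (int y = 0; y < height; y++)
		crc = crc32_le_bitwise(crc, buf + (size_t)y * row_bytes, row_bytes);
	return crc;
}
```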
> > > 
> > > These changes allow us to allocate far less memory for the intermediate
> > > buffer used by these operations, because we no longer need all the
> > > lines of the intermediate image at once; a single line is enough.
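A per-line blend step along the lines of `pre_mul_alpha_blend()` can be sketched as follows. This is an illustrative userspace version with hypothetical names (`struct px16`, `blend_line()`); it assumes premultiplied-alpha input, as the patch does, so `src <= alpha` and the 32-bit intermediate cannot overflow. The operands are cast to `uint32_t` before multiplying because `uint16_t` values are otherwise promoted to `int`, and 0xffff * 0xffff does not fit in a 32-bit `int`.

```c
#include <stdint.h>

struct px16 { uint16_t a, r, g, b; };

/* out = src + dst * (1 - alpha) in 16-bit fixed point, rounded up.
 * Assumes premultiplied input (src <= alpha), so v fits in 32 bits. */
static uint16_t pre_mul_blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint32_t v = (uint32_t)src * 0xffff + (uint32_t)dst * (0xffffu - alpha);

	return (uint16_t)((v + 0xfffe) / 0xffff);
}

/* Blend `width` staged pixels into the output line starting at x_dst.
 * The output stays fully opaque, as in pre_mul_alpha_blend(). */
static void blend_line(const struct px16 *in, struct px16 *out,
		       int x_dst, int width)
{
	out += x_dst;
	for (int x = 0; x < width; x++) {
		out[x].a = 0xffff;
		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
	}
}
```

Only the output line and one staged plane line are touched per iteration, which is why two line-sized buffers suffice.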
> > > 
> > > | Memory consumption (output dimensions) |
> > > |:--------------------------------------:|
> > > |       Current      |     This patch    |
> > > |:------------------:|:-----------------:|
> > > |   Width * Height   |     2 * Width     |
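The table can be turned into bytes with a little arithmetic. The sketch assumes a 32bpp primary framebuffer with no pitch padding or page rounding for the old full-frame copy, and 8-byte internal pixels (four 16-bit channels, like `struct line_buffer`) for the two line buffers in the new code; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct px16 { uint16_t a, r, g, b; }; /* 8 bytes per internal pixel */

/* Old scheme: a full copy of a 32bpp primary framebuffer. */
static size_t old_scratch_bytes(size_t width, size_t height)
{
	return width * height * 4;
}

/* New scheme: one stage line plus one output line of internal pixels. */
static size_t new_scratch_bytes(size_t width)
{
	return 2 * width * sizeof(struct px16);
}
```

For a 1920x1080 output that is 8,294,400 bytes before versus 30,720 bytes after.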
> > > 
> > > Beyond memory, we also have a minor performance benefit from all
> > > these changes. Results running the IGT tests `*kms_cursor_crc*`:
> > > 
> > First, thanks for this improvement.
> > 
> > Some recent changes in kms_cursor_crc caused VKMS to fail in most test
> > cases (iirc, only size-change and alpha-opaque are passing currently).
> 
> I updated my igt and kernel(from drm_misc/drm-misc-next) to the latest
> commit[1][2] and I'm getting mixed results. Sometimes most of the test
> passes, sometimes almost nothing passes.
hmm.. is it happening when running kms_cursor_crc? Is the variation in
results random, or is there a set of steps to reproduce it? When it
fails, what reason does the log show?

On my side, only the first two subtests of kms_cursor_crc pass before
this patch. After your changes, all subtests pass again, except those
related to the 32x10 cursor size (which need further investigation). I
didn't check how the recent changes in kms_cursor_crc affect VKMS
performance, but I bet that clearing the alpha channel is why the
performance is back.
> 
> [1] a96674e7 (tests/api_intel_bb: Handle different alignments in
> delta-check)
> [2] b21a142fd205 (drm/nouveau/backlight: Just set all backlight types as
> RAW)
> 
> > But claiming that performance improvement here could cause a
> > misunderstanding when reviewing the change history. Can you update these
> > statistics? You can specify the IGT hash to pin the test case version,
> > or pick another test for comparison.
> 
> OK, I will do both.
> 
> > > |                 Frametime                  |
> > > |:------------------------------------------:|
> > > |  Implementation |  Current  |  This commit |
> > > |:---------------:|:---------:|:------------:|
> > > | frametime range |  8~22 ms  |    5~18 ms   |
> > > |     Average     |  10.0 ms  |    7.3 ms    |
> > > 
> > > Reported-by: kernel test robot <lkp@intel.com>
> > It is a little confusing to have this Reported-by tag without any
> > explanation of what was reported and fixed. Can you specify it?
> > > Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> > > ---
> > > V2: Improves the performance drastically, by performing the operations
> > >      per-line and not per-pixel(Pekka Paalanen).
> > >      Minor improvements(Pekka Paalanen).
> > > 
> > > V3: Changes the code to blend the planes all at once. This improves
> > >      performance, memory consumption, and removes much of the weirdness
> > >      of the V2(Pekka Paalanen and me).
> > >      Minor improvements(Pekka Paalanen and me).
> > > 
> > > V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
> > Can you move version changes up so that they are not ignored?
> > 
> > I also pointed out a minor code style issue below.
> > With these comments addressed, you can add my r-b tag in the next
> > version.
> > > ---
> > >   drivers/gpu/drm/vkms/Makefile        |   1 +
> > >   drivers/gpu/drm/vkms/vkms_composer.c | 335 +++++++++++++--------------
> > >   drivers/gpu/drm/vkms/vkms_formats.c  | 138 +++++++++++
> > >   drivers/gpu/drm/vkms/vkms_formats.h  |  31 +++
> > >   4 files changed, 333 insertions(+), 172 deletions(-)
> > >   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
> > >   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> > > 
> > > diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
> > > index 72f779cbfedd..1b28a6a32948 100644
> > > --- a/drivers/gpu/drm/vkms/Makefile
> > > +++ b/drivers/gpu/drm/vkms/Makefile
> > > @@ -3,6 +3,7 @@ vkms-y := \
> > >   	vkms_drv.o \
> > >   	vkms_plane.o \
> > >   	vkms_output.o \
> > > +	vkms_formats.o \
> > >   	vkms_crtc.o \
> > >   	vkms_composer.o \
> > >   	vkms_writeback.o
> > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > > index 95029d2ebcac..9f70fcf84fb9 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > > @@ -9,202 +9,210 @@
> > >   #include <drm/drm_vblank.h>
> > >   #include "vkms_drv.h"
> > > +#include "vkms_formats.h"
> > > -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> > > -				 const struct vkms_frame_info *frame_info)
> > > +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
> > >   {
> > > -	u32 pixel;
> > > -	int src_offset = frame_info->offset + (y * frame_info->pitch)
> > > -					    + (x * frame_info->cpp);
> > > +	u32 new_color;
> > > -	pixel = *(u32 *)&buffer[src_offset];
> > > +	new_color = (src * 0xffff + dst * (0xffff - alpha));
> > > -	return pixel;
> > > +	return DIV_ROUND_UP(new_color, 0xffff);
> > >   }
> > >   /**
> > > - * compute_crc - Compute CRC value on output frame
> > > + * pre_mul_alpha_blend - alpha blending equation
> > > + * @src_frame_info: source framebuffer's metadata
> > > + * @stage_buffer: The line with the pixels from src_plane
> > > + * @output_buffer: A line buffer that receives all the blends output
> > >    *
> > > - * @vaddr: address to final framebuffer
> > > - * @frame_info: framebuffer's metadata
> > > + * Using the information from the `frame_info`, this blends only the
> > > + * necessary pixels from the `stage_buffer` to the `output_buffer`
> > > + * using premultiplied blend formula.
> > >    *
> > > - * returns CRC value computed using crc32 on the visible portion of
> > > - * the final framebuffer at vaddr_out
> > > + * The current DRM assumption is that pixel color values have been already
> > > + * pre-multiplied with the alpha channel values. See more
> > > + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
> > > + * completely opaque background.
> > >    */
> > > -static uint32_t compute_crc(const u8 *vaddr,
> > > -			    const struct vkms_frame_info *frame_info)
> > > +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> > > +				struct line_buffer *stage_buffer,
> > > +				struct line_buffer *output_buffer)
> > >   {
> > > -	int x, y;
> > > -	u32 crc = 0, pixel = 0;
> > > -	int x_src = frame_info->src.x1 >> 16;
> > > -	int y_src = frame_info->src.y1 >> 16;
> > > -	int h_src = drm_rect_height(&frame_info->src) >> 16;
> > > -	int w_src = drm_rect_width(&frame_info->src) >> 16;
> > > -
> > > -	for (y = y_src; y < y_src + h_src; ++y) {
> > > -		for (x = x_src; x < x_src + w_src; ++x) {
> > > -			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
> > > -			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
> > > -		}
> > > +	int x, x_dst = frame_info->dst.x1;
> > > +	int x_limit = drm_rect_width(&frame_info->dst);
> > > +	struct line_buffer *out = output_buffer + x_dst;
> > > +	struct line_buffer *in = stage_buffer;
> > > +
> > > +	for (x = 0; x < x_limit; x++) {
> > > +		out[x].a = (u16)0xffff;
> > > +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> > > +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> > > +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> > >   	}
> > > -
> > > -	return crc;
> > >   }
> > > -static u8 blend_channel(u8 src, u8 dst, u8 alpha)
> > > +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
> > >   {
> > > -	u32 pre_blend;
> > > -	u8 new_color;
> > > -
> > > -	pre_blend = (src * 255 + dst * (255 - alpha));
> > > -
> > > -	/* Faster div by 255 */
> > > -	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
> > > +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
> > > +		return true;
> > > -	return new_color;
> > > +	return false;
> > >   }
> > >   /**
> > > - * alpha_blend - alpha blending equation
> > > - * @argb_src: src pixel on premultiplied alpha mode
> > > - * @argb_dst: dst pixel completely opaque
> > > - *
> > > - * blend pixels using premultiplied blend formula. The current DRM assumption
> > > - * is that pixel color values have been already pre-multiplied with the alpha
> > > - * channel values. See more drm_plane_create_blend_mode_property(). Also, this
> > > - * formula assumes a completely opaque background.
> > > - */
> > > -static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
> > > -{
> > > -	u8 alpha;
> > > -
> > > -	alpha = argb_src[3];
> > > -	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
> > > -	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
> > > -	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
> > > -}
> > > -
> > > -/**
> > > - * x_blend - blending equation that ignores the pixel alpha
> > > - *
> > > - * overwrites RGB color value from src pixel to dst pixel.
> > > - */
> > > -static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
> > > -{
> > > -	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
> > > -}
> > > -
> > > -/**
> > > - * blend - blend value at vaddr_src with value at vaddr_dst
> > > - * @vaddr_dst: destination address
> > > - * @vaddr_src: source address
> > > - * @dst_frame_info: destination framebuffer's metadata
> > > - * @src_frame_info: source framebuffer's metadata
> > > - * @pixel_blend: blending equation based on plane format
> > > + * @wb_frame_info: The writeback frame buffer metadata
> > > + * @wb_fmt_func: The format transformation function for the wb buffer
> > > + * @crtc_state: The crtc state
> > > + * @plane_fmt_func: A format transformation function for each plane
> > > + * @crc32: The crc output of the final frame
> > > + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
> > > + * @stage_buffer: The line with the pixels from src_compositor
> > >    *
> > > - * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
> > > - * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
> > > - * and clearing alpha channel to an completely opaque background. This function
> > > - * uses buffer's metadata to locate the new composite values at vaddr_dst.
> > > + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
> > > + * from all planes, calculates the crc32 of the output from the former step,
> > > + * and, if necessary, converts and stores the output to the writeback buffer.
> > >    *
> > >    * TODO: completely clear the primary plane (a = 0xff) before starting to blend
> > >    * pixel color values
> > >    */
> > > -static void blend(void *vaddr_dst, void *vaddr_src,
> > > -		  struct vkms_frame_info *dst_frame_info,
> > > -		  struct vkms_frame_info *src_frame_info,
> > > -		  void (*pixel_blend)(const u8 *, u8 *))
> > > +static void blend(struct vkms_frame_info *wb_frame_info,
> > > +		  format_transform_func wb_fmt_func,
> > > +		  struct vkms_crtc_state *crtc_state,
> > > +		  format_transform_func *plane_fmt_func,
> > > +		  u32 *crc32, struct line_buffer *stage_buffer,
> > > +		  struct line_buffer *output_buffer, s64 row_size)
> > >   {
> > > -	int i, j, j_dst, i_dst;
> > > -	int offset_src, offset_dst;
> > > -	u8 *pixel_dst, *pixel_src;
> > > -
> > > -	int x_src = src_frame_info->src.x1 >> 16;
> > > -	int y_src = src_frame_info->src.y1 >> 16;
> > > -
> > > -	int x_dst = src_frame_info->dst.x1;
> > > -	int y_dst = src_frame_info->dst.y1;
> > > -	int h_dst = drm_rect_height(&src_frame_info->dst);
> > > -	int w_dst = drm_rect_width(&src_frame_info->dst);
> > > +	struct vkms_plane_state **plane = crtc_state->active_planes;
> > > +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
> > > +	u32 n_active_planes = crtc_state->num_active_planes;
> > > +	int y_src = primary_plane_info->dst.y1;
> > > +	int h_dst = drm_rect_height(&primary_plane_info->dst);
> > >   	int y_limit = y_src + h_dst;
> > > -	int x_limit = x_src + w_dst;
> > > -
> > > -	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
> > > -		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
> > > -			offset_dst = dst_frame_info->offset
> > > -				     + (i_dst * dst_frame_info->pitch)
> > > -				     + (j_dst++ * dst_frame_info->cpp);
> > > -			offset_src = src_frame_info->offset
> > > -				     + (i * src_frame_info->pitch)
> > > -				     + (j * src_frame_info->cpp);
> > > -
> > > -			pixel_src = (u8 *)(vaddr_src + offset_src);
> > > -			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
> > > -			pixel_blend(pixel_src, pixel_dst);
> > > -			/* clearing alpha channel (0xff)*/
> > > -			pixel_dst[3] = 0xff;
> > > +	int y, i;
> > > +
> > > +	for (y = y_src; y < y_limit; y++) {
> > > +		plane_fmt_func[0](primary_plane_info, y, output_buffer);
> > > +
> > > +		/* If there are other planes besides primary, we consider the active
> > > +		 * planes should be in z-order and compose them associatively:
> > > +		 * ((primary <- overlay) <- cursor)
> > > +		 */
> > > +		for (i = 1; i < n_active_planes; i++) {
> > > +			if (!check_y_limit(plane[i]->frame_info, y))
> > > +				continue;
> > > +
> > > +			plane_fmt_func[i](plane[i]->frame_info, y, stage_buffer);
> > > +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> > > +					    output_buffer);
> > >   		}
> > > -		i_dst++;
> > > +
> > > +		*crc32 = crc32_le(*crc32, (void *)output_buffer, row_size);
> > > +
> > > +		if (wb_frame_info)
> > > +			wb_fmt_func(wb_frame_info, y, output_buffer);
> > >   	}
> > >   }
> > > -static void compose_plane(struct vkms_frame_info *primary_plane_info,
> > > -			  struct vkms_frame_info *plane_frame_info,
> > > -			  void *vaddr_out)
> > > +static void get_format_transform_functions(struct vkms_crtc_state *crtc_state,
> > > +					   format_transform_func plane_funcs[])
> > >   {
> > > -	struct drm_framebuffer *fb = plane_frame_info->fb;
> > > -	void *vaddr;
> > > -	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
> > > +	struct vkms_plane_state **active_planes = crtc_state->active_planes;
> > > +	u32 n_active_planes = crtc_state->num_active_planes, s_fmt;
> > > +	int i;
> > > -	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
> > > -		return;
> > > +	for (i = 0; i < n_active_planes; i++) {
> > > +		s_fmt = active_planes[i]->frame_info->fb->format->format;
> > > +		plane_funcs[i] = get_fmt_transform_function(s_fmt);
> > > +	}
> > > +}
> > > -	vaddr = plane_frame_info->map[0].vaddr;
> > > +static bool check_planes_x_bounds(struct vkms_crtc_state *crtc_state,
> > > +				  struct vkms_frame_info *wb_frame_info)
> > > +{
> > > +	struct vkms_plane_state **planes = crtc_state->active_planes;
> > > +	struct vkms_frame_info *primary_plane_info = planes[0]->frame_info;
> > > +	int line_width = drm_rect_width(&primary_plane_info->dst);
> > > +	u32 n_active_planes = crtc_state->num_active_planes;
> > > +	int i;
> > > -	if (fb->format->format == DRM_FORMAT_ARGB8888)
> > > -		pixel_blend = &alpha_blend;
> > > -	else
> > > -		pixel_blend = &x_blend;
> > > +	for (i = 0; i < n_active_planes; i++) {
> > > +		int x_dst = planes[i]->frame_info->dst.x1;
> > > +		int x_src = planes[i]->frame_info->src.x1 >> 16;
> > > +		int x2_src = planes[i]->frame_info->src.x2 >> 16;
> > > +		int x_limit = drm_rect_width(&planes[i]->frame_info->dst);
> > > -	blend(vaddr_out, vaddr, primary_plane_info,
> > > -	      plane_frame_info, pixel_blend);
> > > +		if (x_dst + x_limit > line_width)
> > > +			return false;
> > > +		if (x_src + x_limit > x2_src)
> > > +			return false;
> > > +	}
> > > +
> > > +	return true;
> > >   }
> > > -static int compose_active_planes(void **vaddr_out,
> > > -				 struct vkms_frame_info *primary_plane_info,
> > > -				 struct vkms_crtc_state *crtc_state)
> > > +static int compose_active_planes(struct vkms_frame_info *wb_frame_info,
> > > +				 struct vkms_crtc_state *crtc_state,
> > > +				 u32 *crc32)
> > >   {
> > > -	struct drm_framebuffer *fb = primary_plane_info->fb;
> > > -	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
> > > -	const void *vaddr;
> > > -	int i;
> > > +	format_transform_func plane_funcs[NUM_OVERLAY_PLANES], wb_func = NULL;
> > > +	int line_width, ret = 0, pixel_size = sizeof(struct line_buffer);
> > > +	struct vkms_frame_info *primary_plane_info = NULL;
> > > +	struct line_buffer *output_buffer, *stage_buffer;
> > > +	struct vkms_plane_state *act_plane = NULL;
> > > +	u32 wb_format;
> > > -	if (!*vaddr_out) {
> > > -		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
> > > -		if (!*vaddr_out) {
> > > -			DRM_ERROR("Cannot allocate memory for output frame.");
> > > -			return -ENOMEM;
> > > -		}
> > > +	if (WARN_ON(pixel_size != 8))
> > > +		return -EINVAL;
> > > +
> > > +	if (crtc_state->num_active_planes >= 1) {
> > > +		act_plane = crtc_state->active_planes[0];
> > > +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> > > +			primary_plane_info = act_plane->frame_info;
> > >   	}
> > > +	if (!primary_plane_info)
> > > +		return -EINVAL;
> > > +
> > >   	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
> > >   		return -EINVAL;
> > > -	vaddr = primary_plane_info->map[0].vaddr;
> > > +	if (WARN_ON(!check_planes_x_bounds(crtc_state, wb_frame_info)))
> > > +		return -EINVAL;
> > > -	memcpy(*vaddr_out, vaddr, gem_obj->size);
> > > +	line_width = drm_rect_width(&primary_plane_info->dst);
> > > -	/* If there are other planes besides primary, we consider the active
> > > -	 * planes should be in z-order and compose them associatively:
> > > -	 * ((primary <- overlay) <- cursor)
> > > -	 */
> > > -	for (i = 1; i < crtc_state->num_active_planes; i++)
> > > -		compose_plane(primary_plane_info,
> > > -			      crtc_state->active_planes[i]->frame_info,
> > > -			      *vaddr_out);
> > > +	stage_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> > > +	if (!stage_buffer) {
> > > +		DRM_ERROR("Cannot allocate memory for the output line buffer");
> > > +		return -ENOMEM;
> > > +	}
> > > -	return 0;
> > > +	output_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> > > +	if (!output_buffer) {
> > > +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
> > > +		ret = -ENOMEM;
> > > +		goto free_stage_buffer;
> > > +	}
> > > +
> > > +	get_format_transform_functions(crtc_state, plane_funcs);
> > > +
> > > +	if (wb_frame_info) {
> > > +		wb_format = wb_frame_info->fb->format->format;
> > > +		wb_func = get_wb_fmt_transform_function(wb_format);
> > > +		wb_frame_info->src = primary_plane_info->src;
> > > +		wb_frame_info->dst = primary_plane_info->dst;
> > > +	}
> > > +
> > > +	blend(wb_frame_info, wb_func, crtc_state, plane_funcs, crc32,
> > > +	      stage_buffer, output_buffer, (s64)line_width * pixel_size);
> > > +
> > > +	kvfree(output_buffer);
> > > +free_stage_buffer:
> > > +	kvfree(stage_buffer);
> > > +
> > > +	return ret;
> > >   }
> > >   /**
> > > @@ -222,13 +230,12 @@ void vkms_composer_worker(struct work_struct *work)
> > >   						struct vkms_crtc_state,
> > >   						composer_work);
> > >   	struct drm_crtc *crtc = crtc_state->base.crtc;
> > > +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
> > > +	struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
> > >   	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
> > > -	struct vkms_frame_info *primary_plane_info = NULL;
> > > -	struct vkms_plane_state *act_plane = NULL;
> > >   	bool crc_pending, wb_pending;
> > > -	void *vaddr_out = NULL;
> > > -	u32 crc32 = 0;
> > >   	u64 frame_start, frame_end;
> > > +	u32 crc32 = 0;
> > >   	int ret;
> > >   	spin_lock_irq(&out->composer_lock);
> > > @@ -248,35 +255,19 @@ void vkms_composer_worker(struct work_struct *work)
> > >   	if (!crc_pending)
> > >   		return;
> > > -	if (crtc_state->num_active_planes >= 1) {
> > > -		act_plane = crtc_state->active_planes[0];
> > > -		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> > > -			primary_plane_info = act_plane->frame_info;
> > > -	}
> > > -
> > > -	if (!primary_plane_info)
> > > -		return;
> > > -
> > >   	if (wb_pending)
> > > -		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
> > > +		ret = compose_active_planes(wb_frame_info, crtc_state, &crc32);
> > > +	else
> > > +		ret = compose_active_planes(NULL, crtc_state, &crc32);
> > > -	ret = compose_active_planes(&vaddr_out, primary_plane_info,
> > > -				    crtc_state);
> > > -	if (ret) {
> > > -		if (ret == -EINVAL && !wb_pending)
> > > -			kvfree(vaddr_out);
> > > +	if (ret)
> > >   		return;
> > > -	}
> > > -
> > > -	crc32 = compute_crc(vaddr_out, primary_plane_info);
> > >   	if (wb_pending) {
> > >   		drm_writeback_signal_completion(&out->wb_connector, 0);
> > >   		spin_lock_irq(&out->composer_lock);
> > >   		crtc_state->wb_pending = false;
> > >   		spin_unlock_irq(&out->composer_lock);
> > > -	} else {
> > > -		kvfree(vaddr_out);
> > >   	}
> > >   	/*
> > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > > new file mode 100644
> > > index 000000000000..0d1838d1b835
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > @@ -0,0 +1,138 @@
> > > +/* SPDX-License-Identifier: GPL-2.0+ */
> > checkpatch complains here ^ Use `//`
> 
> I changed it, but:
> 
> WARNING: Improper SPDX comment style for
> 'drivers/gpu/drm/vkms/vkms_formats.h', please use '/*' instead
> #660: FILE: drivers/gpu/drm/vkms/vkms_formats.h:1:
> +// SPDX-License-Identifier: GPL-2.0+
Ok, previously checkpatch was complaining only for `vkms_formats.c` but
not for the header. I got it wrong when I pointed to the .h file too,
sorry. I had two points in mind, but the second issue is not here; it is
the `multiple blank lines` warning in the next patch.

btw, you can find more details about the comment style for SPDX here:
https://www.kernel.org/doc/html/latest/process/license-rules.html#license-identifier-syntax

> 
> I kept the change to be consistent with the rest of the vkms files.
> 
> > > +
> > > +#include <drm/drm_rect.h>
> > > +#include "vkms_formats.h"
> > > +
> > > +format_transform_func get_fmt_transform_function(u32 format)
> > > +{
> > > +	if (format == DRM_FORMAT_ARGB8888)
> > > +		return &ARGB8888_to_ARGB16161616;
> > > +	else
> > > +		return &XRGB8888_to_ARGB16161616;
> > > +}
> > > +
> > > +format_transform_func get_wb_fmt_transform_function(u32 format)
> > > +{
> > > +	if (format == DRM_FORMAT_ARGB8888)
> > > +		return &convert_to_ARGB8888;
> > > +	else
> > > +		return &convert_to_XRGB8888;
> > > +}
> > > +
> > > +static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
> > > +{
> > > +	return frame_info->offset + (y * frame_info->pitch)
> > > +				  + (x * frame_info->cpp);
> > > +}
> > > +
> > > +/*
> > > + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> > > + *
> > > + * @frame_info: Buffer metadata
> > > + * @x: The x(width) coordinate of the 2D buffer
> > > + * @y: The y(Heigth) coordinate of the 2D buffer
> > > + *
> > > + * Takes the information stored in the frame_info, a pair of coordinates, and
> > > + * returns the address of the first color channel.
> > > + * This function assumes the channels are packed together, i.e. a color channel
> > > + * comes immediately after another in the memory. And therefore, this function
> > > + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> > > + */
> > > +static void *packed_pixels_addr(struct vkms_frame_info *frame_info, int x, int y)
> > > +{
> > > +	int offset = pixel_offset(frame_info, x, y);
> > > +
> > > +	return (u8 *)frame_info->map[0].vaddr + offset;
> > > +}
> > > +
> > > +static void *get_packed_src_addr(struct vkms_frame_info *frame_info, int y)
> > > +{
> > > +	int x_src = frame_info->src.x1 >> 16;
> > > +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
> > > +
> > > +	return packed_pixels_addr(frame_info, x_src, y_src);
> > > +}
> > > +
> > > +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> > > +			      struct line_buffer *stage_buffer)
> > > +{
> > > +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> > > +	int x, x_limit = drm_rect_width(&frame_info->dst);
> > > +
> > > +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> > > +		/*
> > > +		 * Organizes the channels in their respective positions and converts
> > > +		 * the 8 bits channel to 16.
> > > +		 * The 257 is the "conversion ratio". This number is obtained by the
> > > +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
> > > +		 * the best color value in a pixel format with more possibilities.
> > > +		 * And a similar idea applies to others RGB color conversions.
> > > +		 */
> > > +		stage_buffer[x].a = (u16)src_pixels[3] * 257;
> > > +		stage_buffer[x].r = (u16)src_pixels[2] * 257;
> > > +		stage_buffer[x].g = (u16)src_pixels[1] * 257;
> > > +		stage_buffer[x].b = (u16)src_pixels[0] * 257;
> > > +	}
> > > +}
> > > +
> > > +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> > > +			      struct line_buffer *stage_buffer)
> > > +{
> > > +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> > > +	int x, x_limit = drm_rect_width(&frame_info->dst);
> > > +
> > > +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> > > +		stage_buffer[x].a = (u16)0xffff;
> > > +		stage_buffer[x].r = (u16)src_pixels[2] * 257;
> > > +		stage_buffer[x].g = (u16)src_pixels[1] * 257;
> > > +		stage_buffer[x].b = (u16)src_pixels[0] * 257;
> > > +	}
> > > +}
> > > +
> > > +/*
> > > + * The following  functions take an line of ARGB16161616 pixels from the
> > > + * src_buffer, convert them to a specific format, and store them in the
> > > + * destination.
> > > + *
> > > + * They are used in the `compose_active_planes` to convert and store a line
> > > + * from the src_buffer to the writeback buffer.
> > > + */
> > > +void convert_to_ARGB8888(struct vkms_frame_info *frame_info,
> > > +			 int y, struct line_buffer *src_buffer)
> > > +{
> > > +	int x, x_dst = frame_info->dst.x1;
> > > +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> > > +	int x_limit = drm_rect_width(&frame_info->dst);
> > > +
> > > +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> > > +		/*
> > > +		 * This sequence below is important because the format's byte order is
> > > +		 * in little-endian. In the case of the ARGB8888 the memory is
> > > +		 * organized this way:
> > > +		 *
> > > +		 * | Addr     | = blue channel
> > > +		 * | Addr + 1 | = green channel
> > > +		 * | Addr + 2 | = Red channel
> > > +		 * | Addr + 3 | = Alpha channel
> > > +		 */
> > > +		dst_pixels[3] = DIV_ROUND_UP(src_buffer[x].a, 257);
> > > +		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
> > > +		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
> > > +		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
> > > +	}
> > > +}
> > > +
> > > +void convert_to_XRGB8888(struct vkms_frame_info *frame_info,
> > > +			 int y, struct line_buffer *src_buffer)
> > > +{
> > > +	int x, x_dst = frame_info->dst.x1;
> > > +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> > > +	int x_limit = drm_rect_width(&frame_info->dst);
> > > +
> > > +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> > > +		dst_pixels[3] = (u8)0xff;
> > > +		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
> > > +		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
> > > +		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
> > > +	}
> > > +}
> > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> > > new file mode 100644
> > > index 000000000000..817e8b2124ae
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> > > @@ -0,0 +1,31 @@
> > > +/* SPDX-License-Identifier: GPL-2.0+ */
> > and here ^
> > 
> > > +
> > > +#ifndef _VKMS_FORMATS_H_
> > > +#define _VKMS_FORMATS_H_
> > > +
> > > +#include "vkms_drv.h"
> > > +
> > > +struct line_buffer {
> > > +	u16 a, r, g, b;
> > > +};
> > > +
> > > +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> > > +			      struct line_buffer *stage_buffer);
> > > +
> > > +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> > > +			      struct line_buffer *stage_buffer);
> > > +
> > > +void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
> > > +			 struct line_buffer *src_buffer);
> > > +
> > > +void convert_to_XRGB8888(struct vkms_frame_info *frame_info, int y,
> > > +			 struct line_buffer *src_buffer);
> > > +
> > > +typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
> > > +				      struct line_buffer *buffer);
> > > +
> > > +format_transform_func get_fmt_transform_function(u32 format);
> > > +
> > > +format_transform_func get_wb_fmt_transform_function(u32 format);
> > > +
> > > +#endif /* _VKMS_FORMATS_H_ */
> > > -- 
> > > 2.30.2
> > > 


* Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-01-21 21:38 ` [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats Igor Torrente
  2022-02-08 10:40   ` Melissa Wen
@ 2022-02-10  9:37   ` Pekka Paalanen
  2022-02-25  0:43     ` Igor Torrente
  1 sibling, 1 reply; 31+ messages in thread
From: Pekka Paalanen @ 2022-02-10  9:37 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	dri-devel, melissa.srw, ~lkcamp/patches, kernel test robot


On Fri, 21 Jan 2022 18:38:29 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Currently the blend function only accepts XRGB_8888 and ARGB_8888
> as a color input.
> 
> This patch refactors all the functions related to the plane composition
> to overcome this limitation.
> 
> A new internal format(`struct pixel`) is introduced to deal with all
> possible inputs. It consists of 16 bits fields that represent each of
> the channels.
> 
> The pixels blend is done using this internal format. And new handlers
> are being added to convert a specific format to/from this internal format.
> 
> So the blend operation depends on these handlers to convert to this common
> format. The blended result, if necessary, is converted to the writeback
> buffer format.
> 
> This patch introduces three major differences to the blend function.
> 1 - All the planes are blended at once.
> 2 - The blend calculus is done as per line instead of per pixel.
> 3 - It is responsible to calculates the CRC and writing the writeback
>     buffer(if necessary).
> 
> These changes allow us to allocate way less memory in the intermediate
> buffer to compute these operations. Because now we don't need to
> have the entire intermediate image lines at once, just one line is
> enough.
> 
> | Memory consumption (output dimensions) |
> |:--------------------------------------:|
> |       Current      |     This patch    |
> |:------------------:|:-----------------:|
> |   Width * Heigth   |     2 * Width     |
> 
> Beyond memory, we also have a minor performance benefit from all
> these changes. Results running the IGT tests `*kms_cursor_crc*`:
> 
> |                 Frametime                  |
> |:------------------------------------------:|
> |  Implementation |  Current  |  This commit |
> |:---------------:|:---------:|:------------:|
> | frametime range |  8~22 ms  |    5~18 ms   |
> |     Average     |  10.0 ms  |    7.3 ms    |
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
> V2: Improves the performance drastically, by perfoming the operations
>     per-line and not per-pixel(Pekka Paalanen).
>     Minor improvements(Pekka Paalanen).
> 
> V3: Changes the code to blend the planes all at once. This improves
>     performance, memory consumption, and removes much of the weirdness
>     of the V2(Pekka Paalanen and me).
>     Minor improvements(Pekka Paalanen and me).
> 
> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
> ---
>  drivers/gpu/drm/vkms/Makefile        |   1 +
>  drivers/gpu/drm/vkms/vkms_composer.c | 335 +++++++++++++--------------
>  drivers/gpu/drm/vkms/vkms_formats.c  | 138 +++++++++++
>  drivers/gpu/drm/vkms/vkms_formats.h  |  31 +++
>  4 files changed, 333 insertions(+), 172 deletions(-)
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h

Hi Igor,

I'm really happy to see this, thanks!

I still have some security/robustness and other comments below.

I've deleted all the minus lines from the patch to make the new code
clearer.

> 
> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
> index 72f779cbfedd..1b28a6a32948 100644
> --- a/drivers/gpu/drm/vkms/Makefile
> +++ b/drivers/gpu/drm/vkms/Makefile
> @@ -3,6 +3,7 @@ vkms-y := \
>  	vkms_drv.o \
>  	vkms_plane.o \
>  	vkms_output.o \
> +	vkms_formats.o \
>  	vkms_crtc.o \
>  	vkms_composer.o \
>  	vkms_writeback.o
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 95029d2ebcac..9f70fcf84fb9 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -9,202 +9,210 @@
>  #include <drm/drm_vblank.h>
>  
>  #include "vkms_drv.h"
> +#include "vkms_formats.h"
>  
> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>  {
> +	u32 new_color;
>  
> +	new_color = (src * 0xffff + dst * (0xffff - alpha));
>  
> +	return DIV_ROUND_UP(new_color, 0xffff);

Why round-up rather than the usual mathematical rounding?
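To make the question concrete, here is a small userspace sketch comparing the two roundings on the blend formula from this patch. DIV_ROUND_UP() and DIV_ROUND_CLOSEST() are re-defined locally as unsigned-only stand-ins for the kernel macros of the same names, and the function names are made up for illustration. The operands are widened to 32 bits first; in the patch, `src * 0xffff` is computed in signed int, which can exceed INT_MAX before the result is stored in the u32.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned-only userspace stand-ins for the kernel macros. */
#define DIV_ROUND_UP(n, d)      (((n) + (d) - 1) / (d))
#define DIV_ROUND_CLOSEST(n, d) (((n) + ((d) / 2)) / (d))

/*
 * Numerator of the premultiplied blend of one channel:
 * out = src + dst * (1 - alpha), where 0xffff stands for 1.0.
 * Widening to uint32_t avoids signed int overflow in the products;
 * assuming src <= alpha (valid premultiplied data), the sum is at most
 * 0xffff * 0xffff and fits in 32 bits.
 */
static uint32_t blend_numerator(uint16_t src, uint16_t dst, uint16_t alpha)
{
	return (uint32_t)src * 0xffffu + (uint32_t)dst * (0xffffu - alpha);
}

static uint16_t blend_round_up(uint16_t src, uint16_t dst, uint16_t alpha)
{
	return (uint16_t)DIV_ROUND_UP(blend_numerator(src, dst, alpha), 0xffffu);
}

static uint16_t blend_round_closest(uint16_t src, uint16_t dst, uint16_t alpha)
{
	return (uint16_t)DIV_ROUND_CLOSEST(blend_numerator(src, dst, alpha), 0xffffu);
}
```

With src = 0, dst = 1 and alpha = 0x8000 the exact result is 32767 / 65535, just under 0.5, so DIV_ROUND_CLOSEST gives 0 while DIV_ROUND_UP gives 1. Round-up biases every blended channel upward by up to one unit.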

>  }
>  
>  /**
> + * pre_mul_alpha_blend - alpha blending equation
> + * @src_frame_info: source framebuffer's metadata
> + * @stage_buffer: The line with the pixels from src_plane
> + * @output_buffer: A line buffer that receives all the blends output
>   *
> + * Using the information from the `frame_info`, this blends only the
> + * necessary pixels from the `stage_buffer` to the `output_buffer`
> + * using premultiplied blend formula.
>   *
> + * The current DRM assumption is that pixel color values have been already
> + * pre-multiplied with the alpha channel values. See more
> + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
> + * completely opaque background.
>   */
> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> +				struct line_buffer *stage_buffer,
> +				struct line_buffer *output_buffer)
>  {
> +	int x, x_dst = frame_info->dst.x1;
> +	int x_limit = drm_rect_width(&frame_info->dst);
> +	struct line_buffer *out = output_buffer + x_dst;
> +	struct line_buffer *in = stage_buffer;

Here you should check that you don't overrun any of the arrays. At this
point, I believe an overrun would indicate a bug in VKMS, so handle it
according to the kernel conventions. I have a suggestion further below
on how to make that check possible. In other places, I'll just say "check
for overruns" for short.
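A minimal sketch of such a check, assuming the caller knows the length (in pixels) of both line buffers. The helper name and parameters are hypothetical, and WARN_ON() is a userspace stand-in for the kernel macro of the same name, which likewise evaluates to the condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's WARN_ON(): report, then yield the condition. */
static bool warn_on(bool cond, const char *expr)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", expr);
	return cond;
}
#define WARN_ON(cond) warn_on(!!(cond), #cond)

/*
 * Hypothetical helper: true if blending x_limit pixels from the start of a
 * stage line (in_len pixels long) into an output line (out_len pixels long)
 * at offset x_dst stays inside both arrays.
 */
static bool blend_span_ok(int x_dst, int x_limit, size_t out_len, size_t in_len)
{
	if (WARN_ON(x_dst < 0 || x_limit < 0))
		return false;
	if (WARN_ON((size_t)x_limit > in_len))
		return false;
	/* Written so that out_len - x_dst is only computed when it cannot wrap. */
	if (WARN_ON((size_t)x_dst > out_len || (size_t)x_limit > out_len - (size_t)x_dst))
		return false;
	return true;
}
```

In pre_mul_alpha_blend() a check like this would sit right before the loop, returning early when it fails.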

> +
> +	for (x = 0; x < x_limit; x++) {
> +		out[x].a = (u16)0xffff;
> +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>  	}
>  }
>  
> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>  {
> +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
> +		return true;
>  
> +	return false;
>  }
>  
>  /**
> + * @wb_frame_info: The writeback frame buffer metadata
> + * @wb_fmt_func: The format tranformatio function to the wb buffer
> + * @crtc_state: The crtc state
> + * @plane_fmt_func: A format tranformation function to each plane

Is it not *from* each plane?

Each plane... does this mean that all planes must have the same pixel
format?

Oh wait, it's a pointer, so an array, isn't it? You're passing in an
array without passing in the array size. That seems quite risky to me.
Think of someone else needing to patch something here without fully
understanding how this all works, they'd easily introduce a subtle bug.

Looks like the array must be number of "active planes" long. So it's
not even a constant, and the size of the array is not documented here.

What if the fmt_func was a field in struct vkms_frame_info? So you
could set it when creating a vkms_frame_info. Wouldn't that simplify
the code in blend() and its callers?
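A rough userspace sketch of that idea: the per-format reader is stored in the frame info itself, so the blend loop needs no separate array whose length must match the number of active planes. `struct frame_info_sketch`, `read_line` and the toy buffers are hypothetical stand-ins, not the vkms types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pixel_argb_u16 { uint16_t a, r, g, b; };

struct frame_info_sketch;

/* Reads line crtc_y of a plane into the internal 16-bit format. */
typedef void (*read_line_fn)(const struct frame_info_sketch *fi, int crtc_y,
			     struct pixel_argb_u16 *out);

/* Hypothetical stand-in for struct vkms_frame_info. */
struct frame_info_sketch {
	const uint8_t *vaddr;    /* toy buffer: lines of 4-byte little-endian pixels */
	int width;               /* pixels per line; pitch is width * 4 here */
	read_line_fn read_line;  /* chosen once, when the frame info is created */
};

/* Toy simplification: crtc_y is used directly as the buffer row. */
static void read_argb8888(const struct frame_info_sketch *fi, int crtc_y,
			  struct pixel_argb_u16 *out)
{
	const uint8_t *p = fi->vaddr + (size_t)crtc_y * fi->width * 4;

	for (int x = 0; x < fi->width; x++, p += 4) {
		out[x].a = p[3] * 257;
		out[x].r = p[2] * 257;
		out[x].g = p[1] * 257;
		out[x].b = p[0] * 257;
	}
}

static void read_xrgb8888(const struct frame_info_sketch *fi, int crtc_y,
			  struct pixel_argb_u16 *out)
{
	read_argb8888(fi, crtc_y, out);
	for (int x = 0; x < fi->width; x++)
		out[x].a = 0xffff;  /* the X byte is ignored */
}

/* Each plane carries its own reader; nothing to keep in sync with planes[]. */
static void read_planes(struct frame_info_sketch **planes, size_t n_planes,
			int crtc_y, struct pixel_argb_u16 **lines)
{
	for (size_t i = 0; i < n_planes; i++)
		planes[i]->read_line(planes[i], crtc_y, lines[i]);
}
```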

> + * @crc32: The crc output of the final frame
> + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
> + * @stage_buffer: The line with the pixels from src_compositor

I don't see src_compositor?

>   *
> + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
> + * from all planes, calculates the crc32 of the output from the former step,
> + * and, if necessary, convert and store the output to the writeback buffer.
>   *
>   * TODO: completely clear the primary plane (a = 0xff) before starting to blend
>   * pixel color values

Mm, you only need to clear output_buffer, not the whole writeback FB.
output_buffer will unconditionally and totally overwrite the writeback
FB, right?

>   */
> +static void blend(struct vkms_frame_info *wb_frame_info,

Using "wb" as short for writeback is... well, it's hard for me to
remember, at least. Could this not be named simply "writeback"?

> +		  format_transform_func wb_fmt_func,

"writeback_func"

> +		  struct vkms_crtc_state *crtc_state,
> +		  format_transform_func *plane_fmt_func,
> +		  u32 *crc32, struct line_buffer *stage_buffer,
> +		  struct line_buffer *output_buffer, s64 row_size)
>  {
> +	struct vkms_plane_state **plane = crtc_state->active_planes;
> +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
> +	u32 n_active_planes = crtc_state->num_active_planes;
>  
> +	int y_src = primary_plane_info->dst.y1;

Shouldn't this be called y_dst instead?

> +	int h_dst = drm_rect_height(&primary_plane_info->dst);
>  	int y_limit = y_src + h_dst;
> +	int y, i;

It took me a while to understand that all these y-coordinates are CRTC
coordinates. Maybe call them crtc_y, crtc_y_begin, crtc_y_end,
crtc_y_height, etc.

> +
> +	for (y = y_src; y < y_limit; y++) {
> +		plane_fmt_func[0](primary_plane_info, y, output_buffer);

This is initializing output_buffer, right? So why do you have the TODO
comment about clearing the primary plane above?

Is it because the primary plane may not cover the CRTC exactly, the
destination rectangle might be bigger or smaller?

The output_buffer length should be the CRTC width, right?

Maybe special-casing the primary plane in this code is wrong.
crtc_y needs to iterate over the CRTC height starting from zero. Then,
you explicitly clear output_buffer to opaque background color, and
primary plane becomes just another plane in the array of active planes
with no special handling here.

That will allow you to support overlay planes *below* the primary plane
(as is fairly common in non-PC hardware), and you can even support the
background color KMS property.
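A toy userspace sketch of that structure, assuming every plane's destination rectangle has already been validated against the CRTC: iterate over CRTC lines starting from zero, clear the output line to an opaque background, then blend all planes bottom to top with no special case for the primary. `struct toy_plane` (a solid premultiplied color over a rectangle) and the function names are hypothetical, and the rounding here is to nearest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pixel_argb_u16 { uint16_t a, r, g, b; };

/* Hypothetical toy plane: a solid premultiplied color over a destination
 * rectangle in CRTC coordinates (x2/y2 exclusive, as in struct drm_rect). */
struct toy_plane {
	int x1, y1, x2, y2;
	struct pixel_argb_u16 color;
};

/* Premultiplied "over" for one channel, rounded to nearest. */
static uint16_t over(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint32_t v = (uint32_t)src * 0xffff + (uint32_t)dst * (0xffffu - alpha);

	return (uint16_t)((v + 0x7fff) / 0xffff);
}

/*
 * Composes CRTC line crtc_y into out[0..crtc_width-1]: clear to an opaque
 * background, then blend planes[0..n-1] bottom to top. The primary plane is
 * just one entry of planes[]. Assumes every dst rectangle is inside the CRTC.
 */
static void compose_line(const struct toy_plane *planes, size_t n, int crtc_y,
			 int crtc_width, struct pixel_argb_u16 bg,
			 struct pixel_argb_u16 *out)
{
	for (int x = 0; x < crtc_width; x++)
		out[x] = bg;

	for (size_t i = 0; i < n; i++) {
		const struct toy_plane *p = &planes[i];

		if (crtc_y < p->y1 || crtc_y >= p->y2)
			continue;
		for (int x = p->x1; x < p->x2; x++) {
			out[x].r = over(p->color.r, out[x].r, p->color.a);
			out[x].g = over(p->color.g, out[x].g, p->color.a);
			out[x].b = over(p->color.b, out[x].b, p->color.a);
			out[x].a = 0xffff;
		}
	}
}
```

Placing an overlay before the primary in the array puts it below the primary, and a background color property would simply change `bg`.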

> +
> +		/* If there are other planes besides primary, we consider the active
> +		 * planes should be in z-order and compose them associatively:
> +		 * ((primary <- overlay) <- cursor)
> +		 */
> +		for (i = 1; i < n_active_planes; i++) {
> +			if (!check_y_limit(plane[i]->frame_info, y))
> +				continue;
> +
> +			plane_fmt_func[i](plane[i]->frame_info, y, stage_buffer);
> +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> +					    output_buffer);
>  		}
> +
> +		*crc32 = crc32_le(*crc32, (void *)output_buffer, row_size);
> +
> +		if (wb_frame_info)
> +			wb_fmt_func(wb_frame_info, y, output_buffer);
>  	}
>  }
>  
> +static void get_format_transform_functions(struct vkms_crtc_state *crtc_state,
> +					   format_transform_func plane_funcs[])
>  {
> +	struct vkms_plane_state **active_planes = crtc_state->active_planes;
> +	u32 n_active_planes = crtc_state->num_active_planes, s_fmt;
> +	int i;
>  
> +	for (i = 0; i < n_active_planes; i++) {
> +		s_fmt = active_planes[i]->frame_info->fb->format->format;
> +		plane_funcs[i] = get_fmt_transform_function(s_fmt);
> +	}
> +}
>  
> +static bool check_planes_x_bounds(struct vkms_crtc_state *crtc_state,
> +				  struct vkms_frame_info *wb_frame_info)
> +{
> +	struct vkms_plane_state **planes = crtc_state->active_planes;
> +	struct vkms_frame_info *primary_plane_info = planes[0]->frame_info;
> +	int line_width = drm_rect_width(&primary_plane_info->dst);
> +	u32 n_active_planes = crtc_state->num_active_planes;
> +	int i;
>  
> +	for (i = 0; i < n_active_planes; i++) {
> +		int x_dst = planes[i]->frame_info->dst.x1;
> +		int x_src = planes[i]->frame_info->src.x1 >> 16;
> +		int x2_src = planes[i]->frame_info->src.x2 >> 16;
> +		int x_limit = drm_rect_width(&planes[i]->frame_info->dst);
>  
> +		if (x_dst + x_limit > line_width)
> +			return false;
> +		if (x_src + x_limit > x2_src)
> +			return false;
> +	}

That's interesting. Looks like you reject everything if any plane is
not fully inside the primary plane destination rectangle. But that's
not the right check, is it? If you want to check this, you would check
against the CRTC dimensions.

Then again, I think some hardware does allow planes to reach outside of
the CRTC dimensions. The cursor plane is probably the best example: the
cursor can be partly off-screen. So this is something that would need
to be supported both ways, I suppose, but going with "all plane
destination rectangles must be strictly inside the CRTC dimensions" is
a good start.

But why only x-coordinate check? y should have the same rules, right?
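A minimal sketch of the stricter policy described above, checking both axes against the CRTC size rather than the primary plane. `struct rect_sketch` mirrors the layout of struct drm_rect (x2/y2 exclusive), with the source rectangle in 16.16 fixed point as in the plane state; the function names are hypothetical, and scaling is assumed to be unsupported, as in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Same layout as struct drm_rect: x2 and y2 are exclusive. */
struct rect_sketch { int x1, y1, x2, y2; };

/* The destination must lie entirely within the CRTC, on both axes. */
static bool dst_inside_crtc(const struct rect_sketch *dst, int crtc_w, int crtc_h)
{
	return dst->x1 >= 0 && dst->y1 >= 0 &&
	       dst->x1 <= dst->x2 && dst->y1 <= dst->y2 &&
	       dst->x2 <= crtc_w && dst->y2 <= crtc_h;
}

/* Without scaling, the source (16.16 fixed point) must cover dst on both axes. */
static bool src_covers_dst(const struct rect_sketch *src_fp, const struct rect_sketch *dst)
{
	int src_w = (src_fp->x2 >> 16) - (src_fp->x1 >> 16);
	int src_h = (src_fp->y2 >> 16) - (src_fp->y1 >> 16);

	return src_w >= dst->x2 - dst->x1 && src_h >= dst->y2 - dst->y1;
}
```

A partly off-screen cursor such as {-10, 0, 54, 64} is rejected by this check; supporting it later would mean clipping instead of rejecting.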

> +
> +	return true;
>  }
>  
> +static int compose_active_planes(struct vkms_frame_info *wb_frame_info,
> +				 struct vkms_crtc_state *crtc_state,
> +				 u32 *crc32)
>  {
> +	format_transform_func plane_funcs[NUM_OVERLAY_PLANES], wb_func = NULL;
> +	int line_width, ret = 0, pixel_size = sizeof(struct line_buffer);
> +	struct vkms_frame_info *primary_plane_info = NULL;
> +	struct line_buffer *output_buffer, *stage_buffer;
> +	struct vkms_plane_state *act_plane = NULL;
> +	u32 wb_format;
>  
> +	if (WARN_ON(pixel_size != 8))
> +		return -EINVAL;
> +
> +	if (crtc_state->num_active_planes >= 1) {
> +		act_plane = crtc_state->active_planes[0];
> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> +			primary_plane_info = act_plane->frame_info;
>  	}
>  
> +	if (!primary_plane_info)
> +		return -EINVAL;
> +
>  	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>  		return -EINVAL;
>  
> +	if (WARN_ON(!check_planes_x_bounds(crtc_state, wb_frame_info)))
> +		return -EINVAL;
>  
> +	line_width = drm_rect_width(&primary_plane_info->dst);

This needs to be CRTC width, not primary plane width.

>  
> +	stage_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> +	if (!stage_buffer) {
> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
> +		return -ENOMEM;
> +	}
>  
> +	output_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> +	if (!output_buffer) {
> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
> +		ret = -ENOMEM;
> +		goto free_stage_buffer;
> +	}
> +
> +	get_format_transform_functions(crtc_state, plane_funcs);
> +
> +	if (wb_frame_info) {
> +		wb_format = wb_frame_info->fb->format->format;
> +		wb_func = get_wb_fmt_transform_function(wb_format);
> +		wb_frame_info->src = primary_plane_info->src;
> +		wb_frame_info->dst = primary_plane_info->dst;
> +	}
> +
> +	blend(wb_frame_info, wb_func, crtc_state, plane_funcs, crc32,
> +	      stage_buffer, output_buffer, (s64)line_width * pixel_size);
> +
> +	kvfree(output_buffer);
> +free_stage_buffer:
> +	kvfree(stage_buffer);
> +
> +	return ret;
>  }
>  
>  /**
> @@ -222,13 +230,12 @@ void vkms_composer_worker(struct work_struct *work)
>  						struct vkms_crtc_state,
>  						composer_work);
>  	struct drm_crtc *crtc = crtc_state->base.crtc;
> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
> +	struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>  	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>  	bool crc_pending, wb_pending;
>  	u64 frame_start, frame_end;
> +	u32 crc32 = 0;
>  	int ret;
>  
>  	spin_lock_irq(&out->composer_lock);
> @@ -248,35 +255,19 @@ void vkms_composer_worker(struct work_struct *work)
>  	if (!crc_pending)
>  		return;
>  
>  	if (wb_pending)
> +		ret = compose_active_planes(wb_frame_info, crtc_state, &crc32);
> +	else
> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>  
> +	if (ret)
>  		return;
>  
>  	if (wb_pending) {
>  		drm_writeback_signal_completion(&out->wb_connector, 0);
>  		spin_lock_irq(&out->composer_lock);
>  		crtc_state->wb_pending = false;
>  		spin_unlock_irq(&out->composer_lock);
>  	}
>  
>  	/*
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> new file mode 100644
> index 000000000000..0d1838d1b835
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -0,0 +1,138 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +
> +#include <drm/drm_rect.h>
> +#include "vkms_formats.h"
> +
> +format_transform_func get_fmt_transform_function(u32 format)
> +{
> +	if (format == DRM_FORMAT_ARGB8888)
> +		return &ARGB8888_to_ARGB16161616;
> +	else
> +		return &XRGB8888_to_ARGB16161616;

In functions like this you should prepare for caller errors. Use a
switch, and fail any attempt to use a pixel format it doesn't support.
Failing is much better than silently producing garbage or worse: buffer
overruns when bytes-per-pixel is not what you expected.

What to do on failure depends on whether the failure here is never
supposed to happen (follow the kernel style), e.g. malicious userspace
cannot trigger it, or whether you actually use this function to define
the supported pixel formats.

The latter means you'd have a list of all DRM pixel formats and then
you'd ask for each one if this function knows it, and if yes, you add
the format to the list of supported formats advertised to userspace. I
don't know if that would be fine by DRM coding style.
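A sketch of a switch-based getter that returns NULL for anything it does not handle, so the caller can fail loudly, plus the second option of deriving the advertised format list from the getter. The fourcc macros are local copies of the encoding used in include/uapi/drm/drm_fourcc.h; the simplified reader signature and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the fourcc encoding from include/uapi/drm/drm_fourcc.h. */
#define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
				 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define DRM_FORMAT_XRGB8888 fourcc_code('X', 'R', '2', '4')
#define DRM_FORMAT_ARGB8888 fourcc_code('A', 'R', '2', '4')
#define DRM_FORMAT_NV12     fourcc_code('N', 'V', '1', '2')

struct pixel_argb_u16 { uint16_t a, r, g, b; };

/* Simplified reader: converts n packed 4-byte little-endian pixels. */
typedef void (*read_pixels_fn)(const uint8_t *src, int n, struct pixel_argb_u16 *out);

static void argb8888_read(const uint8_t *src, int n, struct pixel_argb_u16 *out)
{
	for (int x = 0; x < n; x++, src += 4)
		out[x] = (struct pixel_argb_u16){ src[3] * 257, src[2] * 257,
						  src[1] * 257, src[0] * 257 };
}

static void xrgb8888_read(const uint8_t *src, int n, struct pixel_argb_u16 *out)
{
	argb8888_read(src, n, out);
	for (int x = 0; x < n; x++)
		out[x].a = 0xffff;
}

/* Unknown formats get NULL instead of a silent XRGB8888 fallback. */
static read_pixels_fn get_read_fn(uint32_t format)
{
	switch (format) {
	case DRM_FORMAT_ARGB8888:
		return argb8888_read;
	case DRM_FORMAT_XRGB8888:
		return xrgb8888_read;
	default:
		return NULL;
	}
}

/* Keeps only the candidates the getter knows; returns how many were kept. */
static size_t filter_supported(const uint32_t *candidates, size_t n,
			       uint32_t *out, size_t max_out)
{
	size_t count = 0;

	for (size_t i = 0; i < n && count < max_out; i++)
		if (get_read_fn(candidates[i]))
			out[count++] = candidates[i];
	return count;
}
```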

> +}
> +
> +format_transform_func get_wb_fmt_transform_function(u32 format)
> +{
> +	if (format == DRM_FORMAT_ARGB8888)
> +		return &convert_to_ARGB8888;
> +	else
> +		return &convert_to_XRGB8888;
> +}

I think you could move the above getter functions to the bottom of the
.c file, and make all the four *_to_* functions static, and remove them
from the header.

> +
> +static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
> +{
> +	return frame_info->offset + (y * frame_info->pitch)
> +				  + (x * frame_info->cpp);
> +}
> +
> +/*
> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> + *
> + * @frame_info: Buffer metadata
> + * @x: The x(width) coordinate of the 2D buffer
> + * @y: The y(Heigth) coordinate of the 2D buffer
> + *
> + * Takes the information stored in the frame_info, a pair of coordinates, and
> + * returns the address of the first color channel.
> + * This function assumes the channels are packed together, i.e. a color channel
> + * comes immediately after another in the memory. And therefore, this function
> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> + */
> +static void *packed_pixels_addr(struct vkms_frame_info *frame_info, int x, int y)
> +{
> +	int offset = pixel_offset(frame_info, x, y);
> +
> +	return (u8 *)frame_info->map[0].vaddr + offset;
> +}
> +
> +static void *get_packed_src_addr(struct vkms_frame_info *frame_info, int y)
> +{
> +	int x_src = frame_info->src.x1 >> 16;
> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
> +
> +	return packed_pixels_addr(frame_info, x_src, y_src);
> +}
> +
> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			      struct line_buffer *stage_buffer)

I'm fairly sure that DRM will one day add exactly ARGB16161616 format.
But that will not be the format you use here (or it might be, but
purely accidentally and depending on machine endianness and whatnot), so
I would suggest inventing a new name. Also use the same name for the
struct to hold a single pixel.

E.g. struct pixel_argb_u16

So that it is clear it is not meant to be any specific DRM_FORMAT_* format.

> +{
> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> +		/*
> +		 * Organizes the channels in their respective positions and converts
> +		 * the 8 bits channel to 16.
> +		 * The 257 is the "conversion ratio". This number is obtained by the
> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
> +		 * the best color value in a pixel format with more possibilities.
> +		 * And a similar idea applies to others RGB color conversions.
> +		 */
> +		stage_buffer[x].a = (u16)src_pixels[3] * 257;
> +		stage_buffer[x].r = (u16)src_pixels[2] * 257;
> +		stage_buffer[x].g = (u16)src_pixels[1] * 257;
> +		stage_buffer[x].b = (u16)src_pixels[0] * 257;
> +	}
> +}
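A quick userspace check of the 257 ratio described in the comment above: multiplying by 257 replicates the byte ((x << 8) | x), so 0x00 and 0xff map exactly to 0x0000 and 0xffff, and narrowing with round-to-nearest, (v + 128) / 257, recovers every 8-bit value. The helper names are hypothetical, and the nearest rounding is shown as an alternative to the DIV_ROUND_UP() used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 8 -> 16 bits: x * 257 == (x << 8) | x, so 0x00 -> 0x0000 and 0xff -> 0xffff. */
static uint16_t widen_8_to_16(uint8_t x)
{
	return (uint16_t)(x * 257u);
}

/* 16 -> 8 bits with round-to-nearest; 257 is odd, so there are no exact ties. */
static uint8_t narrow_16_to_8(uint16_t v)
{
	return (uint8_t)((v + 128u) / 257u);
}

/* Returns 1 if every 8-bit value survives the round trip unchanged. */
static int round_trip_ok(void)
{
	for (unsigned int x = 0; x <= 0xff; x++) {
		if (widen_8_to_16((uint8_t)x) != ((x << 8) | x))
			return 0;
		if (narrow_16_to_8(widen_8_to_16((uint8_t)x)) != x)
			return 0;
	}
	return 1;
}
```

For comparison, DIV_ROUND_UP(0x0001, 257) is 1, where nearest rounding gives 0.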
> +
> +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			      struct line_buffer *stage_buffer)
> +{
> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> +		stage_buffer[x].a = (u16)0xffff;
> +		stage_buffer[x].r = (u16)src_pixels[2] * 257;
> +		stage_buffer[x].g = (u16)src_pixels[1] * 257;
> +		stage_buffer[x].b = (u16)src_pixels[0] * 257;
> +	}
> +}
> +
> +/*
> + * The following  functions take an line of ARGB16161616 pixels from the
> + * src_buffer, convert them to a specific format, and store them in the
> + * destination.
> + *
> + * They are used in the `compose_active_planes` to convert and store a line
> + * from the src_buffer to the writeback buffer.
> + */
> +void convert_to_ARGB8888(struct vkms_frame_info *frame_info,
> +			 int y, struct line_buffer *src_buffer)

Please, use consistent function naming style. These are using "convert"
while the other ones are using "ARGB16161616".

> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	int x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		/*
> +		 * This sequence below is important because the format's byte order is
> +		 * in little-endian. In the case of the ARGB8888 the memory is
> +		 * organized this way:
> +		 *
> +		 * | Addr     | = blue channel
> +		 * | Addr + 1 | = green channel
> +		 * | Addr + 2 | = Red channel
> +		 * | Addr + 3 | = Alpha channel
> +		 */
> +		dst_pixels[3] = DIV_ROUND_UP(src_buffer[x].a, 257);
> +		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
> +		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
> +		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
> +	}
> +}
> +
> +void convert_to_XRGB8888(struct vkms_frame_info *frame_info,
> +			 int y, struct line_buffer *src_buffer)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	int x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		dst_pixels[3] = (u8)0xff;
> +		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
> +		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
> +		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
> +	}
> +}
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> new file mode 100644
> index 000000000000..817e8b2124ae
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +
> +#ifndef _VKMS_FORMATS_H_
> +#define _VKMS_FORMATS_H_
> +
> +#include "vkms_drv.h"
> +
> +struct line_buffer {

As I mentioned above, this would be called pixel_argb_u16 or something
like that.

> +	u16 a, r, g, b;
> +};

I was trying to suggest that a line_buffer would actually hold a whole
line, something like the pseudo code:

struct line_buffer {
	size_t len_pixels;
	struct my_pixel_type pixels[];
};

or whatever the kernel style for a variable length array at the end of
a struct is. Field names are suggestions.

Then it is easy to check that you don't overflow any line_buffer when
operating on them.
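A minimal userspace sketch of that layout, using a C99 flexible array member and the pixel_argb_u16 name proposed above. In the kernel the allocation size would typically be computed with struct_size() from <linux/overflow.h> and allocated with kvmalloc(); here malloc() stands in, and line_buffer_alloc() and line_copy_span() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct pixel_argb_u16 { uint16_t a, r, g, b; };

struct line_buffer {
	size_t n_pixels;
	struct pixel_argb_u16 pixels[];  /* C99 flexible array member */
};

/* In the kernel: kvmalloc(struct_size(lb, pixels, n_pixels), GFP_KERNEL),
 * where struct_size() also guards the size computation against overflow. */
static struct line_buffer *line_buffer_alloc(size_t n_pixels)
{
	struct line_buffer *lb;

	lb = malloc(sizeof(*lb) + n_pixels * sizeof(lb->pixels[0]));
	if (lb)
		lb->n_pixels = n_pixels;
	return lb;
}

/* Copies count pixels from the start of src into dst at x_dst, refusing
 * any span that would overrun either buffer. */
static bool line_copy_span(struct line_buffer *dst, size_t x_dst,
			   const struct line_buffer *src, size_t count)
{
	if (count > src->n_pixels || x_dst > dst->n_pixels ||
	    count > dst->n_pixels - x_dst)
		return false;

	for (size_t i = 0; i < count; i++)
		dst->pixels[x_dst + i] = src->pixels[i];
	return true;
}
```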

> +
> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			      struct line_buffer *stage_buffer);
> +
> +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			      struct line_buffer *stage_buffer);
> +
> +void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
> +			 struct line_buffer *src_buffer);
> +
> +void convert_to_XRGB8888(struct vkms_frame_info *frame_info, int y,
> +			 struct line_buffer *src_buffer);

You should only need the below functions and not the above ones in this header.

> +
> +typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
> +				      struct line_buffer *buffer);

The arguments for this function-pointer should be documented,
especially that y is the y-coordinate in CRTC coordinate space, i.e.
plane destination rectangle. You might even call it crtc_y.

I think you should use two different function-pointer types for the two
different kinds of functions:
- reads arbitrary pixel format and writes to rgba_u16
- reads rgba_u16 and writes to arbitrary pixel format

This will prevent any mistakes in accidentally using the wrong kind of
function. If you also have the argument order different between the two
types of functions, getting them mixed up is even less likely. I
presume the kernel uses the function(destination, source) style of
argument ordering. You can also use 'const' on the source, that is a
good way to document things too.

The consequence of using the wrong function could be the leak of kernel
memory content to userspace, which is pretty bad. So preventing that
kind of problems before they happen is nice.
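A sketch of the two separate function-pointer types, both in (destination, source) order with a const source. Because the parameter types differ, storing a writer where a reader is expected is an incompatible-pointer-type diagnostic at compile time rather than a silent bug. `struct frame_info_sketch`, the typedef names and the toy pitch (width * 4) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pixel_argb_u16 { uint16_t a, r, g, b; };

/* Hypothetical stand-in for struct vkms_frame_info. */
struct frame_info_sketch {
	uint8_t *vaddr;  /* lines of 4-byte little-endian pixels */
	int width;       /* pixels per line; pitch is width * 4 */
};

/* Any format -> internal format. crtc_y is the line in CRTC coordinates. */
typedef void (*plane_read_fn)(struct pixel_argb_u16 *dst,
			      const struct frame_info_sketch *src, int crtc_y);

/* Internal format -> any format, for the writeback buffer. */
typedef void (*wb_write_fn)(struct frame_info_sketch *dst,
			    const struct pixel_argb_u16 *src, int crtc_y);

static void xrgb8888_read_line(struct pixel_argb_u16 *dst,
			       const struct frame_info_sketch *src, int crtc_y)
{
	const uint8_t *p = src->vaddr + (size_t)crtc_y * src->width * 4;

	for (int x = 0; x < src->width; x++, p += 4) {
		dst[x].a = 0xffff;
		dst[x].r = p[2] * 257;
		dst[x].g = p[1] * 257;
		dst[x].b = p[0] * 257;
	}
}

static void xrgb8888_write_line(struct frame_info_sketch *dst,
				const struct pixel_argb_u16 *src, int crtc_y)
{
	uint8_t *p = dst->vaddr + (size_t)crtc_y * dst->width * 4;

	for (int x = 0; x < dst->width; x++, p += 4) {
		p[3] = 0xff;
		p[2] = (src[x].r + 128) / 257;  /* round to nearest */
		p[1] = (src[x].g + 128) / 257;
		p[0] = (src[x].b + 128) / 257;
	}
}

/* Swapping these two initializers would be diagnosed by the compiler. */
static const plane_read_fn plane_reader = xrgb8888_read_line;
static const wb_write_fn wb_writer = xrgb8888_write_line;
```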

> +
> +format_transform_func get_fmt_transform_function(u32 format);
> +
> +format_transform_func get_wb_fmt_transform_function(u32 format);
> +
> +#endif /* _VKMS_FORMATS_H_ */


Good work!


Thanks,
pq


* Re: [PATCH v4 9/9] drm: vkms: Add support to the RGB565 format
  2022-01-21 21:38 ` [PATCH v4 9/9] drm: vkms: Add support to the RGB565 format Igor Torrente
  2022-02-08 10:50   ` Melissa Wen
@ 2022-02-10  9:50   ` Pekka Paalanen
  2022-02-25  1:03     ` Igor Torrente
  1 sibling, 1 reply; 31+ messages in thread
From: Pekka Paalanen @ 2022-02-10  9:50 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, tzimmermann, rodrigosiqueiramelo, airlied,
	dri-devel, melissa.srw, ~lkcamp/patches

On Fri, 21 Jan 2022 18:38:31 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Adds this common format to vkms.
> 
> This commit also adds new helper macros to deal with fixed-point
> arithmetic.
> 
> It was done to improve the precision of the conversion to ARGB16161616
> since the "conversion ratio" is not an integer.
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
> V3: Adapt the handlers to the new format introduced in patch 7 V3.
> ---
>  drivers/gpu/drm/vkms/vkms_formats.c   | 74 +++++++++++++++++++++++++++
>  drivers/gpu/drm/vkms/vkms_formats.h   |  6 +++
>  drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
>  drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
>  4 files changed, 86 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 661da39d1276..dc612882dd8c 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -11,6 +11,8 @@ format_transform_func get_fmt_transform_function(u32 format)
>  		return &get_ARGB16161616;
>  	else if (format == DRM_FORMAT_XRGB16161616)
>  		return &XRGB16161616_to_ARGB16161616;
> +	else if (format == DRM_FORMAT_RGB565)
> +		return &RGB565_to_ARGB16161616;
>  	else
>  		return &XRGB8888_to_ARGB16161616;
>  }
> @@ -23,6 +25,8 @@ format_transform_func get_wb_fmt_transform_function(u32 format)
>  		return &convert_to_ARGB16161616;
>  	else if (format == DRM_FORMAT_XRGB16161616)
>  		return &convert_to_XRGB16161616;
> +	else if (format == DRM_FORMAT_RGB565)
> +		return &convert_to_RGB565;
>  	else
>  		return &convert_to_XRGB8888;
>  }
> @@ -33,6 +37,26 @@ static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
>  				  + (x * frame_info->cpp);
>  }
>  
> +/*
> + * FP stands for _Fixed Point_ and **not** _Float Point_

Is it common in the kernel that FP always means fixed-point?

If there is any doubt about that, I'd suggest using "fixed" and "float"
to avoid misunderstandings.

And, since you are not supposed to use floats in the kernel unless you
really really must and you do all the preparations necessary (which you
don't here), maybe replace the "float" with a fraction.

In other words, write a macro that takes (65535, 31) as arguments
instead of a float, when converting to fixed-point. Then you don't have
to use those strange decimal constants either.
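A fraction-based variant might look like the sketch below. The macro and function names are hypothetical. It uses 64-bit intermediates for headroom. `FRAC_TO_FIXED(65535, 31)` is an integer constant expression, so no floating point is involved anywhere:

```c
#include <stdint.h>

/* Hypothetical fixed-point helpers with 15 fractional bits. */
#define FIXED_SCALE 15
#define INT_TO_FIXED(a) ((int64_t)(a) << FIXED_SCALE)
/* num/den as fixed point, rounded to nearest; integer arithmetic only. */
#define FRAC_TO_FIXED(num, den) \
	((((int64_t)(num) << FIXED_SCALE) + (den) / 2) / (den))
#define FIXED_MUL(a, b) (((int64_t)(a) * (b)) >> FIXED_SCALE)
#define FIXED_TO_INT_ROUND(a) (((a) + (1 << (FIXED_SCALE - 1))) >> FIXED_SCALE)

/* Conversion ratios 65535/31 and 65535/63, built from the fractions. */
static const int64_t rb_ratio = FRAC_TO_FIXED(65535, 31);
static const int64_t g_ratio = FRAC_TO_FIXED(65535, 63);

/* Expand a 5-bit red/blue channel to 16 bits. */
static uint16_t expand_rb(uint16_t v)
{
	return (uint16_t)FIXED_TO_INT_ROUND(FIXED_MUL(INT_TO_FIXED(v), rb_ratio));
}

/* Expand a 6-bit green channel to 16 bits. */
static uint16_t expand_g(uint16_t v)
{
	return (uint16_t)FIXED_TO_INT_ROUND(FIXED_MUL(INT_TO_FIXED(v), g_ratio));
}
```

With 15 fractional bits, this maps 31 to 65535 and 16 to 33825 (16 * 65535 / 31 = 33824.52, rounded to nearest).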

> + * LF stands for Long Float (i.e. double)
> + * The following macros help doing fixed point arithmetic.
> + */
> +/*
> + * With FP scale 15 we have 17 and 15 bits of integer and fractional parts
> + * respectively.
> + *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
> + * 31                                          0
> + */
> +#define FP_SCALE 15
> +
> +#define LF_TO_FP(a) ((a) * (u64)(1 << FP_SCALE))
> +#define INT_TO_FP(a) ((a) << FP_SCALE)
> +#define FP_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FP_SCALE))
> +#define FP_DIV(a, b) ((s32)(((s64)(a) << FP_SCALE) / (b)))
> +/* This macro converts a fixed point number to int, rounding half up */
> +#define FP_TO_INT_ROUND_UP(a) (((a) + (1 << (FP_SCALE - 1))) >> FP_SCALE)
> +
>  /*
>   * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>   *
> @@ -125,6 +149,33 @@ void XRGB16161616_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>  	}
>  }
>  
> +void RGB565_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			    struct line_buffer *stage_buffer)
> +{
> +	u16 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x, x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, src_pixels++) {
> +		u16 rgb_565 = le16_to_cpu(*src_pixels);
> +		int fp_r = INT_TO_FP((rgb_565 >> 11) & 0x1f);
> +		int fp_g = INT_TO_FP((rgb_565 >> 5) & 0x3f);
> +		int fp_b = INT_TO_FP(rgb_565 & 0x1f);
> +
> +		/*
> +		 * The magic constants are the "conversion ratios", calculated by
> +		 * dividing 65535 (2^16 - 1) by 31 (2^5 - 1) and 63 (2^6 - 1)
> +		 * respectively.
> +		 */
> +		int fp_rb_ratio = LF_TO_FP(2114.032258065);
> +		int fp_g_ratio = LF_TO_FP(1040.238095238);
> +
> +		stage_buffer[x].a = (u16)0xffff;
> +		stage_buffer[x].r = FP_TO_INT_ROUND_UP(FP_MUL(fp_r, fp_rb_ratio));
> +		stage_buffer[x].g = FP_TO_INT_ROUND_UP(FP_MUL(fp_g, fp_g_ratio));
> +		stage_buffer[x].b = FP_TO_INT_ROUND_UP(FP_MUL(fp_b, fp_rb_ratio));
> +	}
> +}
> +
>  
>  /*
>   * The following  functions take an line of ARGB16161616 pixels from the
> @@ -203,3 +254,26 @@ void convert_to_XRGB16161616(struct vkms_frame_info *frame_info, int y,
>  		dst_pixels[0] = src_buffer[x].b;
>  	}
>  }
> +
> +void convert_to_RGB565(struct vkms_frame_info *frame_info, int y,
> +		       struct line_buffer *src_buffer)
> +{
> +	int x, x_dst = frame_info->dst.x1;
> +	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	int x_limit = drm_rect_width(&frame_info->dst);
> +
> +	for (x = 0; x < x_limit; x++, dst_pixels++) {
> +		int fp_r = INT_TO_FP(src_buffer[x].r);
> +		int fp_g = INT_TO_FP(src_buffer[x].g);
> +		int fp_b = INT_TO_FP(src_buffer[x].b);
> +
> +		int fp_rb_ratio = LF_TO_FP(2114.032258065);
> +		int fp_g_ratio = LF_TO_FP(1040.238095238);

Are there any guarantees that this will not result in floating-point
CPU instructions being used? Like a compiler error if it did?

Yes, it's a constant expression, but I think there were some funny
rules in C that floating-point operations may not be evaluated at
compile time. Maybe I'm just paranoid?
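One way to sidestep the question is to avoid floating-point constants altogether. The sketch below uses hypothetical helper names. Each channel is rescaled with plain integer arithmetic, out = (v * out_max + in_max / 2) / in_max, which rounds to nearest. The kernel's DIV_ROUND_CLOSEST() could express the same rounding:

```c
#include <stdint.h>

/* Integer-only rescale with round-to-nearest: round(v * out_max / in_max). */
static inline uint16_t rescale_channel(uint32_t v, uint32_t in_max,
				       uint32_t out_max)
{
	return (uint16_t)((v * out_max + in_max / 2) / in_max);
}

/* RGB565 (CPU byte order) -> 16-bit-per-channel values. */
static void rgb565_to_rgb16(uint16_t px, uint16_t *r, uint16_t *g, uint16_t *b)
{
	*r = rescale_channel((px >> 11) & 0x1f, 31, 0xffff);
	*g = rescale_channel((px >> 5) & 0x3f, 63, 0xffff);
	*b = rescale_channel(px & 0x1f, 31, 0xffff);
}

/* 16-bit-per-channel values -> RGB565 (CPU byte order). */
static uint16_t rgb16_to_rgb565(uint16_t r, uint16_t g, uint16_t b)
{
	return (uint16_t)(rescale_channel(r, 0xffff, 31) << 11 |
			  rescale_channel(g, 0xffff, 63) << 5 |
			  rescale_channel(b, 0xffff, 31));
}
```

Intermediate products stay below 2^22, so 32-bit arithmetic is sufficient. Every RGB565 value survives a round trip through 16 bits per channel unchanged.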


Thanks,
pq

> +
> +		u16 r = FP_TO_INT_ROUND_UP(FP_DIV(fp_r, fp_rb_ratio));
> +		u16 g = FP_TO_INT_ROUND_UP(FP_DIV(fp_g, fp_g_ratio));
> +		u16 b = FP_TO_INT_ROUND_UP(FP_DIV(fp_b, fp_rb_ratio));
> +
> +		*dst_pixels = cpu_to_le16(r << 11 | g << 5 | b);
> +	}
> +}
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index 22358f3a33ab..836d6e43ea90 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -21,6 +21,9 @@ void get_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>  void XRGB16161616_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>  				  struct line_buffer *stage_buffer);
>  
> +void RGB565_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> +			    struct line_buffer *stage_buffer);
> +
>  void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
>  			 struct line_buffer *src_buffer);
>  
> @@ -33,6 +36,9 @@ void convert_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>  void convert_to_XRGB16161616(struct vkms_frame_info *frame_info, int y,
>  			     struct line_buffer *src_buffer);
>  
> +void convert_to_RGB565(struct vkms_frame_info *frame_info, int y,
> +		       struct line_buffer *src_buffer);
> +
>  typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
>  				      struct line_buffer *buffer);
>  
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 1d70c9e8f109..4643eefcdf29 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -13,14 +13,16 @@
>  
>  static const u32 vkms_formats[] = {
>  	DRM_FORMAT_XRGB8888,
> -	DRM_FORMAT_XRGB16161616
> +	DRM_FORMAT_XRGB16161616,
> +	DRM_FORMAT_RGB565
>  };
>  
>  static const u32 vkms_plane_formats[] = {
>  	DRM_FORMAT_ARGB8888,
>  	DRM_FORMAT_XRGB8888,
>  	DRM_FORMAT_XRGB16161616,
> -	DRM_FORMAT_ARGB16161616
> +	DRM_FORMAT_ARGB16161616,
> +	DRM_FORMAT_RGB565
>  };
>  
>  static struct drm_plane_state *
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index 393d3fc7966f..1aaa630090d3 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -15,7 +15,8 @@
>  static const u32 vkms_wb_formats[] = {
>  	DRM_FORMAT_XRGB8888,
>  	DRM_FORMAT_XRGB16161616,
> -	DRM_FORMAT_ARGB16161616
> +	DRM_FORMAT_ARGB16161616,
> +	DRM_FORMAT_RGB565
>  };
>  
>  static const struct drm_connector_funcs vkms_wb_connector_funcs = {



* Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-02-09 21:45       ` Melissa Wen
@ 2022-02-21  1:02         ` Igor Torrente
  2022-02-21  9:18           ` Pekka Paalanen
  0 siblings, 1 reply; 31+ messages in thread
From: Igor Torrente @ 2022-02-21  1:02 UTC (permalink / raw)
  To: Melissa Wen
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	tzimmermann, ~lkcamp/patches

Hi Melissa,

On 2/9/22 18:45, Melissa Wen wrote:
> On 02/08, Igor Torrente wrote:
>> Hi Melissa,
>>
>> On 2/8/22 07:40, Melissa Wen wrote:
>>> On 01/21, Igor Torrente wrote:
>>>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>>>> as a color input.
>>>>
>>>> This patch refactors all the functions related to the plane composition
>>>> to overcome this limitation.
>>>>
>>>> A new internal format(`struct pixel`) is introduced to deal with all
>>>> possible inputs. It consists of 16 bits fields that represent each of
>>>> the channels.
>>>>
>>>> Pixel blending is done using this internal format, and new handlers
>>>> are added to convert a specific format to/from this internal format.
>>>>
>>>> So the blend operation depends on these handlers to convert to this common
>>>> format. The blended result, if necessary, is converted to the writeback
>>>> buffer format.
>>>>
>>>> This patch introduces three major differences to the blend function.
>>>> 1 - All the planes are blended at once.
>>>> 2 - The blend is computed per line instead of per pixel.
>>>> 3 - It is responsible for calculating the CRC and writing the writeback
>>>>       buffer (if necessary).
>>>>
>>>> These changes allow us to allocate much less memory for the intermediate
>>>> buffer used in these operations, because we no longer need all the lines
>>>> of the intermediate image at once; a single line is enough.
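The per-line scheme can be illustrated with a minimal userspace sketch. This is not the vkms implementation, and it makes several assumptions:

- `struct pixel16`, `struct plane`, `compose_rows()` and `crc32_le_sketch()` are hypothetical names.
- Planes are already converted to 16 bits per channel.
- Overlays lie inside the primary plane.
- The CRC is a bitwise stand-in that is assumed to match the kernel's `crc32_le()` (reflected, polynomial 0xEDB88320, no inversion).

Only two row-sized buffers are allocated, whatever the frame height:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pixel with four 16-bit channels, like ARGB16161616. */
struct pixel16 {
	uint16_t a, r, g, b;
};

/* Hypothetical plane: CRTC destination rectangle plus row-major pixels. */
struct plane {
	int x1, y1, x2, y2;
	const struct pixel16 *pixels;
};

/* Bitwise CRC-32 update (reflected, poly 0xEDB88320, no inversion). */
static uint32_t crc32_le_sketch(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return crc;
}

/*
 * Premultiplied "over", rounded up. Assumes premultiplied input
 * (src <= alpha), which keeps the sum within 32 bits.
 */
static uint16_t blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint32_t v = (uint32_t)src * 0xffff + (uint32_t)dst * (0xffff - alpha);

	return (uint16_t)((v + 0xfffe) / 0xffff);
}

/*
 * Compose planes row by row. planes[0] is the primary plane. If @frame is
 * non-NULL, each composed row is stored there (writeback stand-in).
 * Returns the accumulated CRC; *err is set on allocation failure.
 */
static uint32_t compose_rows(const struct plane *planes, int n_planes,
			     struct pixel16 *frame, int *err)
{
	const struct plane *primary = &planes[0];
	int width = primary->x2 - primary->x1;
	size_t row_bytes = (size_t)width * sizeof(struct pixel16);
	struct pixel16 *output = malloc(row_bytes);
	struct pixel16 *stage = malloc(row_bytes);
	uint32_t crc = 0;

	*err = !output || !stage;
	if (*err)
		goto out;

	for (int y = primary->y1; y < primary->y2; y++) {
		/* The primary plane's row initializes the output line. */
		memcpy(output, primary->pixels + (size_t)(y - primary->y1) * width,
		       row_bytes);

		/* Overlays in z-order: ((primary <- overlay) <- cursor). */
		for (int i = 1; i < n_planes; i++) {
			const struct plane *p = &planes[i];
			int w = p->x2 - p->x1;

			if (y < p->y1 || y >= p->y2)
				continue; /* plane does not cover this row */

			/* Stands in for converting the plane's row to the stage buffer. */
			memcpy(stage, p->pixels + (size_t)(y - p->y1) * w,
			       (size_t)w * sizeof(struct pixel16));

			for (int x = 0; x < w; x++) {
				struct pixel16 *o = &output[p->x1 - primary->x1 + x];

				o->a = 0xffff;
				o->r = blend_channel(stage[x].r, o->r, stage[x].a);
				o->g = blend_channel(stage[x].g, o->g, stage[x].a);
				o->b = blend_channel(stage[x].b, o->b, stage[x].a);
			}
		}

		crc = crc32_le_sketch(crc, output, row_bytes);
		if (frame)
			memcpy(frame + (size_t)(y - primary->y1) * width, output,
			       row_bytes);
	}
out:
	free(output);
	free(stage);
	return crc;
}
```

The blend uses the same premultiplied formula as `pre_mul_blend_channel()` in the patch: (src * 0xffff + dst * (0xffff - alpha)) / 0xffff, rounded up. The CRC is accumulated one composed row at a time.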
>>>>
>>>> | Memory consumption (output dimensions) |
>>>> |:--------------------------------------:|
>>>> |       Current      |     This patch    |
>>>> |:------------------:|:-----------------:|
>>>> |   Width * Height   |     2 * Width     |
>>>>
>>>> Beyond memory, we also have a minor performance benefit from all
>>>> these changes. Results running the IGT tests `*kms_cursor_crc*`:
>>>>
>>> First, thanks for this improvement.
>>>
>>> Some recent changes in kms_cursor_crc caused VKMS to fail in most test
>>> cases (iirc, only size-change and alpha-opaque are passing currently).
>>
>> I updated my igt and kernel (from drm_misc/drm-misc-next) to the latest
>> commits[1][2] and I'm getting mixed results. Sometimes most of the tests
>> pass, sometimes almost nothing passes.
> Hmm... is it happening when running kms_cursor_crc? Is the variation in
> results random, or is there a set of steps to reproduce it? When it
> fails, what reason does the log show?

I investigated it a little and discovered that the KMS cursor tests
(".*kms_cursor_crc*") fail after the writeback tests (".*kms_writeback.*")
have been run.

I don't know what is causing it, but they fail while trying to
commit the KMS changes.

out.txt:
IGT-Version: 1.26-NO-GIT (x86_64) (Linux: 5.17.0-rc2 x86_64)
Stack trace:
   #0 ../lib/igt_core.c:1754 __igt_fail_assert()
   #1 ../lib/igt_kms.c:3795 do_display_commit()
   #2 ../lib/igt_kms.c:3901 igt_display_commit2()
   #3 ../tests/kms_cursor_crc.c:820 __igt_unique____real_main814()
   #4 ../tests/kms_cursor_crc.c:814 main()
   #5 ../csu/libc-start.c:308 __libc_start_main()
   #6 [_start+0x2a]
Subtest pipe-A-cursor-size-change: FAIL

err.txt:
(kms_cursor_crc:1936) igt_kms-CRITICAL: Test assertion failure function 
do_display_commit, file ../lib/igt_kms.c:3795:
(kms_cursor_crc:1936) igt_kms-CRITICAL: Failed assertion: ret == 0
(kms_cursor_crc:1936) igt_kms-CRITICAL: Last errno: 22, Invalid argument
(kms_cursor_crc:1936) igt_kms-CRITICAL: error: -22 != 0

> 
> From my side, only the first two subtests of kms_cursor_crc were passing
> before this patch. After your changes, all subtests pass again, except
> those related to the 32x10 cursor size (which need further
> investigation). I didn't check how the recent changes in kms_cursor_crc
> affect VKMS performance, but I bet that clearing the alpha channel is
> the reason the performance is back.

Yeah, I also don't understand why the 32x10 cursor tests are failing.

>>
>> [1] a96674e7 (tests/api_intel_bb: Handle different alignments in
>> delta-check)
>> [2] b21a142fd205 (drm/nouveau/backlight: Just set all backlight types as
>> RAW)
>>
>>> But claiming that performance improvement here could cause a
>>> misunderstanding when reviewing the change history. Can you update these
>>> statistics? I think you can state the IGT hash to pin down the test
>>> case version, or you can pick another test for comparison.
>>
>> OK, I will do both.
>>
>>>> |                 Frametime                  |
>>>> |:------------------------------------------:|
>>>> |  Implementation |  Current  |  This commit |
>>>> |:---------------:|:---------:|:------------:|
>>>> | frametime range |  8~22 ms  |    5~18 ms   |
>>>> |     Average     |  10.0 ms  |    7.3 ms    |
>>>>
>>>> Reported-by: kernel test robot <lkp@intel.com>
>>> It is a little confusing to have this Reported-by tag without any
>>> explanation of what was reported and fixed. Can you clarify it?
>>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>>>> ---
>>>> V2: Improves the performance drastically, by performing the operations
>>>>       per-line and not per-pixel (Pekka Paalanen).
>>>>       Minor improvements (Pekka Paalanen).
>>>>
>>>> V3: Changes the code to blend the planes all at once. This improves
>>>>       performance, memory consumption, and removes much of the weirdness
>>>>       of V2 (Pekka Paalanen and me).
>>>>       Minor improvements (Pekka Paalanen and me).
>>>>
>>>> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
>>> Can you move version changes up so that they are not ignored?
>>>
>>> I also pointed out a minor code style issue below.
>>> With these comments addressed, you can add my r-b tag in the next
>>> version.
>>>> ---
>>>>    drivers/gpu/drm/vkms/Makefile        |   1 +
>>>>    drivers/gpu/drm/vkms/vkms_composer.c | 335 +++++++++++++--------------
>>>>    drivers/gpu/drm/vkms/vkms_formats.c  | 138 +++++++++++
>>>>    drivers/gpu/drm/vkms/vkms_formats.h  |  31 +++
>>>>    4 files changed, 333 insertions(+), 172 deletions(-)
>>>>    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>>>>    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
>>>>
>>>> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
>>>> index 72f779cbfedd..1b28a6a32948 100644
>>>> --- a/drivers/gpu/drm/vkms/Makefile
>>>> +++ b/drivers/gpu/drm/vkms/Makefile
>>>> @@ -3,6 +3,7 @@ vkms-y := \
>>>>    	vkms_drv.o \
>>>>    	vkms_plane.o \
>>>>    	vkms_output.o \
>>>> +	vkms_formats.o \
>>>>    	vkms_crtc.o \
>>>>    	vkms_composer.o \
>>>>    	vkms_writeback.o
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> index 95029d2ebcac..9f70fcf84fb9 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> @@ -9,202 +9,210 @@
>>>>    #include <drm/drm_vblank.h>
>>>>    #include "vkms_drv.h"
>>>> +#include "vkms_formats.h"
>>>> -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
>>>> -				 const struct vkms_frame_info *frame_info)
>>>> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>>>>    {
>>>> -	u32 pixel;
>>>> -	int src_offset = frame_info->offset + (y * frame_info->pitch)
>>>> -					    + (x * frame_info->cpp);
>>>> +	u32 new_color;
>>>> -	pixel = *(u32 *)&buffer[src_offset];
>>>> +	new_color = (src * 0xffff + dst * (0xffff - alpha));
>>>> -	return pixel;
>>>> +	return DIV_ROUND_UP(new_color, 0xffff);
>>>>    }
>>>>    /**
>>>> - * compute_crc - Compute CRC value on output frame
>>>> + * pre_mul_alpha_blend - alpha blending equation
>>>> + * @src_frame_info: source framebuffer's metadata
>>>> + * @stage_buffer: The line with the pixels from src_plane
>>>> + * @output_buffer: A line buffer that receives all the blends output
>>>>     *
>>>> - * @vaddr: address to final framebuffer
>>>> - * @frame_info: framebuffer's metadata
>>>> + * Using the information from the `frame_info`, this blends only the
>>>> + * necessary pixels from the `stage_buffer` to the `output_buffer`
>>>> + * using premultiplied blend formula.
>>>>     *
>>>> - * returns CRC value computed using crc32 on the visible portion of
>>>> - * the final framebuffer at vaddr_out
>>>> + * The current DRM assumption is that pixel color values have been already
>>>> + * pre-multiplied with the alpha channel values. See more
>>>> + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
>>>> + * completely opaque background.
>>>>     */
>>>> -static uint32_t compute_crc(const u8 *vaddr,
>>>> -			    const struct vkms_frame_info *frame_info)
>>>> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
>>>> +				struct line_buffer *stage_buffer,
>>>> +				struct line_buffer *output_buffer)
>>>>    {
>>>> -	int x, y;
>>>> -	u32 crc = 0, pixel = 0;
>>>> -	int x_src = frame_info->src.x1 >> 16;
>>>> -	int y_src = frame_info->src.y1 >> 16;
>>>> -	int h_src = drm_rect_height(&frame_info->src) >> 16;
>>>> -	int w_src = drm_rect_width(&frame_info->src) >> 16;
>>>> -
>>>> -	for (y = y_src; y < y_src + h_src; ++y) {
>>>> -		for (x = x_src; x < x_src + w_src; ++x) {
>>>> -			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
>>>> -			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
>>>> -		}
>>>> +	int x, x_dst = frame_info->dst.x1;
>>>> +	int x_limit = drm_rect_width(&frame_info->dst);
>>>> +	struct line_buffer *out = output_buffer + x_dst;
>>>> +	struct line_buffer *in = stage_buffer;
>>>> +
>>>> +	for (x = 0; x < x_limit; x++) {
>>>> +		out[x].a = (u16)0xffff;
>>>> +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
>>>> +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
>>>> +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>>>>    	}
>>>> -
>>>> -	return crc;
>>>>    }
>>>> -static u8 blend_channel(u8 src, u8 dst, u8 alpha)
>>>> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>>>>    {
>>>> -	u32 pre_blend;
>>>> -	u8 new_color;
>>>> -
>>>> -	pre_blend = (src * 255 + dst * (255 - alpha));
>>>> -
>>>> -	/* Faster div by 255 */
>>>> -	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
>>>> +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
>>>> +		return true;
>>>> -	return new_color;
>>>> +	return false;
>>>>    }
>>>>    /**
>>>> - * alpha_blend - alpha blending equation
>>>> - * @argb_src: src pixel on premultiplied alpha mode
>>>> - * @argb_dst: dst pixel completely opaque
>>>> - *
>>>> - * blend pixels using premultiplied blend formula. The current DRM assumption
>>>> - * is that pixel color values have been already pre-multiplied with the alpha
>>>> - * channel values. See more drm_plane_create_blend_mode_property(). Also, this
>>>> - * formula assumes a completely opaque background.
>>>> - */
>>>> -static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
>>>> -{
>>>> -	u8 alpha;
>>>> -
>>>> -	alpha = argb_src[3];
>>>> -	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
>>>> -	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
>>>> -	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
>>>> -}
>>>> -
>>>> -/**
>>>> - * x_blend - blending equation that ignores the pixel alpha
>>>> - *
>>>> - * overwrites RGB color value from src pixel to dst pixel.
>>>> - */
>>>> -static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>>>> -{
>>>> -	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
>>>> -}
>>>> -
>>>> -/**
>>>> - * blend - blend value at vaddr_src with value at vaddr_dst
>>>> - * @vaddr_dst: destination address
>>>> - * @vaddr_src: source address
>>>> - * @dst_frame_info: destination framebuffer's metadata
>>>> - * @src_frame_info: source framebuffer's metadata
>>>> - * @pixel_blend: blending equation based on plane format
>>>> + * @wb_frame_info: The writeback frame buffer metadata
>>>> + * @wb_fmt_func: The format transformation function for the wb buffer
>>>> + * @crtc_state: The crtc state
>>>> + * @plane_fmt_func: A format transformation function for each plane
>>>> + * @crc32: The crc output of the final frame
>>>> + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
>>>> + * @stage_buffer: The line with the pixels from src_compositor
>>>>     *
>>>> - * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
>>>> - * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
>>>> - * and clearing alpha channel to an completely opaque background. This function
>>>> - * uses buffer's metadata to locate the new composite values at vaddr_dst.
>>>> + * This function blends the pixels (using `pre_mul_alpha_blend`)
>>>> + * from all planes, calculates the crc32 of the output from the former step,
>>>> + * and, if necessary, converts and stores the output in the writeback buffer.
>>>>     *
>>>>     * TODO: completely clear the primary plane (a = 0xff) before starting to blend
>>>>     * pixel color values
>>>>     */
>>>> -static void blend(void *vaddr_dst, void *vaddr_src,
>>>> -		  struct vkms_frame_info *dst_frame_info,
>>>> -		  struct vkms_frame_info *src_frame_info,
>>>> -		  void (*pixel_blend)(const u8 *, u8 *))
>>>> +static void blend(struct vkms_frame_info *wb_frame_info,
>>>> +		  format_transform_func wb_fmt_func,
>>>> +		  struct vkms_crtc_state *crtc_state,
>>>> +		  format_transform_func *plane_fmt_func,
>>>> +		  u32 *crc32, struct line_buffer *stage_buffer,
>>>> +		  struct line_buffer *output_buffer, s64 row_size)
>>>>    {
>>>> -	int i, j, j_dst, i_dst;
>>>> -	int offset_src, offset_dst;
>>>> -	u8 *pixel_dst, *pixel_src;
>>>> -
>>>> -	int x_src = src_frame_info->src.x1 >> 16;
>>>> -	int y_src = src_frame_info->src.y1 >> 16;
>>>> -
>>>> -	int x_dst = src_frame_info->dst.x1;
>>>> -	int y_dst = src_frame_info->dst.y1;
>>>> -	int h_dst = drm_rect_height(&src_frame_info->dst);
>>>> -	int w_dst = drm_rect_width(&src_frame_info->dst);
>>>> +	struct vkms_plane_state **plane = crtc_state->active_planes;
>>>> +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
>>>> +	u32 n_active_planes = crtc_state->num_active_planes;
>>>> +	int y_src = primary_plane_info->dst.y1;
>>>> +	int h_dst = drm_rect_height(&primary_plane_info->dst);
>>>>    	int y_limit = y_src + h_dst;
>>>> -	int x_limit = x_src + w_dst;
>>>> -
>>>> -	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
>>>> -		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
>>>> -			offset_dst = dst_frame_info->offset
>>>> -				     + (i_dst * dst_frame_info->pitch)
>>>> -				     + (j_dst++ * dst_frame_info->cpp);
>>>> -			offset_src = src_frame_info->offset
>>>> -				     + (i * src_frame_info->pitch)
>>>> -				     + (j * src_frame_info->cpp);
>>>> -
>>>> -			pixel_src = (u8 *)(vaddr_src + offset_src);
>>>> -			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
>>>> -			pixel_blend(pixel_src, pixel_dst);
>>>> -			/* clearing alpha channel (0xff)*/
>>>> -			pixel_dst[3] = 0xff;
>>>> +	int y, i;
>>>> +
>>>> +	for (y = y_src; y < y_limit; y++) {
>>>> +		plane_fmt_func[0](primary_plane_info, y, output_buffer);
>>>> +
>>>> +		/* If there are other planes besides primary, we consider the active
>>>> +		 * planes should be in z-order and compose them associatively:
>>>> +		 * ((primary <- overlay) <- cursor)
>>>> +		 */
>>>> +		for (i = 1; i < n_active_planes; i++) {
>>>> +			if (!check_y_limit(plane[i]->frame_info, y))
>>>> +				continue;
>>>> +
>>>> +			plane_fmt_func[i](plane[i]->frame_info, y, stage_buffer);
>>>> +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
>>>> +					    output_buffer);
>>>>    		}
>>>> -		i_dst++;
>>>> +
>>>> +		*crc32 = crc32_le(*crc32, (void *)output_buffer, row_size);
>>>> +
>>>> +		if (wb_frame_info)
>>>> +			wb_fmt_func(wb_frame_info, y, output_buffer);
>>>>    	}
>>>>    }
>>>> -static void compose_plane(struct vkms_frame_info *primary_plane_info,
>>>> -			  struct vkms_frame_info *plane_frame_info,
>>>> -			  void *vaddr_out)
>>>> +static void get_format_transform_functions(struct vkms_crtc_state *crtc_state,
>>>> +					   format_transform_func plane_funcs[])
>>>>    {
>>>> -	struct drm_framebuffer *fb = plane_frame_info->fb;
>>>> -	void *vaddr;
>>>> -	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>>>> +	struct vkms_plane_state **active_planes = crtc_state->active_planes;
>>>> +	u32 n_active_planes = crtc_state->num_active_planes, s_fmt;
>>>> +	int i;
>>>> -	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>>>> -		return;
>>>> +	for (i = 0; i < n_active_planes; i++) {
>>>> +		s_fmt = active_planes[i]->frame_info->fb->format->format;
>>>> +		plane_funcs[i] = get_fmt_transform_function(s_fmt);
>>>> +	}
>>>> +}
>>>> -	vaddr = plane_frame_info->map[0].vaddr;
>>>> +static bool check_planes_x_bounds(struct vkms_crtc_state *crtc_state,
>>>> +				  struct vkms_frame_info *wb_frame_info)
>>>> +{
>>>> +	struct vkms_plane_state **planes = crtc_state->active_planes;
>>>> +	struct vkms_frame_info *primary_plane_info = planes[0]->frame_info;
>>>> +	int line_width = drm_rect_width(&primary_plane_info->dst);
>>>> +	u32 n_active_planes = crtc_state->num_active_planes;
>>>> +	int i;
>>>> -	if (fb->format->format == DRM_FORMAT_ARGB8888)
>>>> -		pixel_blend = &alpha_blend;
>>>> -	else
>>>> -		pixel_blend = &x_blend;
>>>> +	for (i = 0; i < n_active_planes; i++) {
>>>> +		int x_dst = planes[i]->frame_info->dst.x1;
>>>> +		int x_src = planes[i]->frame_info->src.x1 >> 16;
>>>> +		int x2_src = planes[i]->frame_info->src.x2 >> 16;
>>>> +		int x_limit = drm_rect_width(&planes[i]->frame_info->dst);
>>>> -	blend(vaddr_out, vaddr, primary_plane_info,
>>>> -	      plane_frame_info, pixel_blend);
>>>> +		if (x_dst + x_limit > line_width)
>>>> +			return false;
>>>> +		if (x_src + x_limit > x2_src)
>>>> +			return false;
>>>> +	}
>>>> +
>>>> +	return true;
>>>>    }
>>>> -static int compose_active_planes(void **vaddr_out,
>>>> -				 struct vkms_frame_info *primary_plane_info,
>>>> -				 struct vkms_crtc_state *crtc_state)
>>>> +static int compose_active_planes(struct vkms_frame_info *wb_frame_info,
>>>> +				 struct vkms_crtc_state *crtc_state,
>>>> +				 u32 *crc32)
>>>>    {
>>>> -	struct drm_framebuffer *fb = primary_plane_info->fb;
>>>> -	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>>>> -	const void *vaddr;
>>>> -	int i;
>>>> +	format_transform_func plane_funcs[NUM_OVERLAY_PLANES], wb_func = NULL;
>>>> +	int line_width, ret = 0, pixel_size = sizeof(struct line_buffer);
>>>> +	struct vkms_frame_info *primary_plane_info = NULL;
>>>> +	struct line_buffer *output_buffer, *stage_buffer;
>>>> +	struct vkms_plane_state *act_plane = NULL;
>>>> +	u32 wb_format;
>>>> -	if (!*vaddr_out) {
>>>> -		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
>>>> -		if (!*vaddr_out) {
>>>> -			DRM_ERROR("Cannot allocate memory for output frame.");
>>>> -			return -ENOMEM;
>>>> -		}
>>>> +	if (WARN_ON(pixel_size != 8))
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (crtc_state->num_active_planes >= 1) {
>>>> +		act_plane = crtc_state->active_planes[0];
>>>> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>>>> +			primary_plane_info = act_plane->frame_info;
>>>>    	}
>>>> +	if (!primary_plane_info)
>>>> +		return -EINVAL;
>>>> +
>>>>    	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>>>>    		return -EINVAL;
>>>> -	vaddr = primary_plane_info->map[0].vaddr;
>>>> +	if (WARN_ON(!check_planes_x_bounds(crtc_state, wb_frame_info)))
>>>> +		return -EINVAL;
>>>> -	memcpy(*vaddr_out, vaddr, gem_obj->size);
>>>> +	line_width = drm_rect_width(&primary_plane_info->dst);
>>>> -	/* If there are other planes besides primary, we consider the active
>>>> -	 * planes should be in z-order and compose them associatively:
>>>> -	 * ((primary <- overlay) <- cursor)
>>>> -	 */
>>>> -	for (i = 1; i < crtc_state->num_active_planes; i++)
>>>> -		compose_plane(primary_plane_info,
>>>> -			      crtc_state->active_planes[i]->frame_info,
>>>> -			      *vaddr_out);
>>>> +	stage_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>>> +	if (!stage_buffer) {
>>>> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
>>>> +		return -ENOMEM;
>>>> +	}
>>>> -	return 0;
>>>> +	output_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>>> +	if (!output_buffer) {
>>>> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
>>>> +		ret = -ENOMEM;
>>>> +		goto free_stage_buffer;
>>>> +	}
>>>> +
>>>> +	get_format_transform_functions(crtc_state, plane_funcs);
>>>> +
>>>> +	if (wb_frame_info) {
>>>> +		wb_format = wb_frame_info->fb->format->format;
>>>> +		wb_func = get_wb_fmt_transform_function(wb_format);
>>>> +		wb_frame_info->src = primary_plane_info->src;
>>>> +		wb_frame_info->dst = primary_plane_info->dst;
>>>> +	}
>>>> +
>>>> +	blend(wb_frame_info, wb_func, crtc_state, plane_funcs, crc32,
>>>> +	      stage_buffer, output_buffer, (s64)line_width * pixel_size);
>>>> +
>>>> +	kvfree(output_buffer);
>>>> +free_stage_buffer:
>>>> +	kvfree(stage_buffer);
>>>> +
>>>> +	return ret;
>>>>    }
>>>>    /**
>>>> @@ -222,13 +230,12 @@ void vkms_composer_worker(struct work_struct *work)
>>>>    						struct vkms_crtc_state,
>>>>    						composer_work);
>>>>    	struct drm_crtc *crtc = crtc_state->base.crtc;
>>>> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>>>> +	struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>>>>    	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>>>> -	struct vkms_frame_info *primary_plane_info = NULL;
>>>> -	struct vkms_plane_state *act_plane = NULL;
>>>>    	bool crc_pending, wb_pending;
>>>> -	void *vaddr_out = NULL;
>>>> -	u32 crc32 = 0;
>>>>    	u64 frame_start, frame_end;
>>>> +	u32 crc32 = 0;
>>>>    	int ret;
>>>>    	spin_lock_irq(&out->composer_lock);
>>>> @@ -248,35 +255,19 @@ void vkms_composer_worker(struct work_struct *work)
>>>>    	if (!crc_pending)
>>>>    		return;
>>>> -	if (crtc_state->num_active_planes >= 1) {
>>>> -		act_plane = crtc_state->active_planes[0];
>>>> -		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>>>> -			primary_plane_info = act_plane->frame_info;
>>>> -	}
>>>> -
>>>> -	if (!primary_plane_info)
>>>> -		return;
>>>> -
>>>>    	if (wb_pending)
>>>> -		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
>>>> +		ret = compose_active_planes(wb_frame_info, crtc_state, &crc32);
>>>> +	else
>>>> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>>>> -	ret = compose_active_planes(&vaddr_out, primary_plane_info,
>>>> -				    crtc_state);
>>>> -	if (ret) {
>>>> -		if (ret == -EINVAL && !wb_pending)
>>>> -			kvfree(vaddr_out);
>>>> +	if (ret)
>>>>    		return;
>>>> -	}
>>>> -
>>>> -	crc32 = compute_crc(vaddr_out, primary_plane_info);
>>>>    	if (wb_pending) {
>>>>    		drm_writeback_signal_completion(&out->wb_connector, 0);
>>>>    		spin_lock_irq(&out->composer_lock);
>>>>    		crtc_state->wb_pending = false;
>>>>    		spin_unlock_irq(&out->composer_lock);
>>>> -	} else {
>>>> -		kvfree(vaddr_out);
>>>>    	}
>>>>    	/*
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> new file mode 100644
>>>> index 000000000000..0d1838d1b835
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> @@ -0,0 +1,138 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0+ */
>>> checkpatch complains here ^ Use `//`
>>
>> I changed it, but:
>>
>> WARNING: Improper SPDX comment style for
>> 'drivers/gpu/drm/vkms/vkms_formats.h', please use '/*' instead
>> #660: FILE: drivers/gpu/drm/vkms/vkms_formats.h:1:
>> +// SPDX-License-Identifier: GPL-2.0+
> Ok, previously checkpatch was complaining only about `vkms_formats.c`, but
> not about the header. I got it wrong when I pointed to the .h file too,
> sorry. I had two points in mind, but the second issue is not here; it is
> the `multiple blank lines` warning in the next patch.
> 
> btw, you can find more details about the SPDX comment style here:
> https://www.kernel.org/doc/html/latest/process/license-rules.html#license-identifier-syntax
> 
>>
>> I'll keep the change, to be consistent with the rest of the vkms files.
>>
>>>> +
>>>> +#include <drm/drm_rect.h>
>>>> +#include "vkms_formats.h"
>>>> +
>>>> +format_transform_func get_fmt_transform_function(u32 format)
>>>> +{
>>>> +	if (format == DRM_FORMAT_ARGB8888)
>>>> +		return &ARGB8888_to_ARGB16161616;
>>>> +	else
>>>> +		return &XRGB8888_to_ARGB16161616;
>>>> +}
>>>> +
>>>> +format_transform_func get_wb_fmt_transform_function(u32 format)
>>>> +{
>>>> +	if (format == DRM_FORMAT_ARGB8888)
>>>> +		return &convert_to_ARGB8888;
>>>> +	else
>>>> +		return &convert_to_XRGB8888;
>>>> +}
>>>> +
>>>> +static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
>>>> +{
>>>> +	return frame_info->offset + (y * frame_info->pitch)
>>>> +				  + (x * frame_info->cpp);
>>>> +}
>>>> +
>>>> +/*
>>>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>>>> + *
>>>> + * @frame_info: Buffer metadata
>>>> + * @x: The x (width) coordinate of the 2D buffer
>>>> + * @y: The y (height) coordinate of the 2D buffer
>>>> + *
>>>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>>>> + * returns the address of the first color channel.
>>>> + * This function assumes the channels are packed together, i.e. a color channel
>>>> + * comes immediately after another in the memory. And therefore, this function
>>>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>>>> + */
>>>> +static void *packed_pixels_addr(struct vkms_frame_info *frame_info, int x, int y)
>>>> +{
>>>> +	int offset = pixel_offset(frame_info, x, y);
>>>> +
>>>> +	return (u8 *)frame_info->map[0].vaddr + offset;
>>>> +}
>>>> +
>>>> +static void *get_packed_src_addr(struct vkms_frame_info *frame_info, int y)
>>>> +{
>>>> +	int x_src = frame_info->src.x1 >> 16;
>>>> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>>>> +
>>>> +	return packed_pixels_addr(frame_info, x_src, y_src);
>>>> +}
>>>> +
>>>> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>>>> +			      struct line_buffer *stage_buffer)
>>>> +{
>>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>>> +	int x, x_limit = drm_rect_width(&frame_info->dst);
>>>> +
>>>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>>>> +		/*
>>>> +		 * Organizes the channels in their respective positions and converts
>>>> +		 * each 8-bit channel to 16 bits.
>>>> +		 * 257 is the "conversion ratio", obtained from the division
>>>> +		 * (2^16 - 1) / (2^8 - 1). Multiplying by it maps 0x00 to 0x0000 and
>>>> +		 * 0xff to 0xffff, so the 8-bit range spans the full 16-bit range.
>>>> +		 * A similar idea applies to other RGB color conversions.
>>>> +		 */
>>>> +		stage_buffer[x].a = (u16)src_pixels[3] * 257;
>>>> +		stage_buffer[x].r = (u16)src_pixels[2] * 257;
>>>> +		stage_buffer[x].g = (u16)src_pixels[1] * 257;
>>>> +		stage_buffer[x].b = (u16)src_pixels[0] * 257;
>>>> +	}
>>>> +}
>>>> +
>>>> +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>>>> +			      struct line_buffer *stage_buffer)
>>>> +{
>>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>>> +	int x, x_limit = drm_rect_width(&frame_info->dst);
>>>> +
>>>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
>>>> +		stage_buffer[x].a = (u16)0xffff;
>>>> +		stage_buffer[x].r = (u16)src_pixels[2] * 257;
>>>> +		stage_buffer[x].g = (u16)src_pixels[1] * 257;
>>>> +		stage_buffer[x].b = (u16)src_pixels[0] * 257;
>>>> +	}
>>>> +}
>>>> +
>>>> +/*
>>>> + * The following functions take a line of ARGB16161616 pixels from the
>>>> + * src_buffer, convert them to a specific format, and store them in the
>>>> + * destination.
>>>> + *
>>>> + * They are used in the `compose_active_planes` to convert and store a line
>>>> + * from the src_buffer to the writeback buffer.
>>>> + */
>>>> +void convert_to_ARGB8888(struct vkms_frame_info *frame_info,
>>>> +			 int y, struct line_buffer *src_buffer)
>>>> +{
>>>> +	int x, x_dst = frame_info->dst.x1;
>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>>> +	int x_limit = drm_rect_width(&frame_info->dst);
>>>> +
>>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>>>> +		/*
>>>> +		 * This sequence below is important because the format's byte order is
>>>> +		 * little-endian. In the case of ARGB8888, the memory is
>>>> +		 * organized this way:
>>>> +		 *
>>>> +		 * | Addr     | = blue channel
>>>> +		 * | Addr + 1 | = green channel
>>>> +		 * | Addr + 2 | = red channel
>>>> +		 * | Addr + 3 | = alpha channel
>>>> +		 */
>>>> +		dst_pixels[3] = DIV_ROUND_UP(src_buffer[x].a, 257);
>>>> +		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
>>>> +		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
>>>> +		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
>>>> +	}
>>>> +}
>>>> +
>>>> +void convert_to_XRGB8888(struct vkms_frame_info *frame_info,
>>>> +			 int y, struct line_buffer *src_buffer)
>>>> +{
>>>> +	int x, x_dst = frame_info->dst.x1;
>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>>> +	int x_limit = drm_rect_width(&frame_info->dst);
>>>> +
>>>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>>>> +		dst_pixels[3] = (u8)0xff;
>>>> +		dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
>>>> +		dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
>>>> +		dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
>>>> +	}
>>>> +}
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
>>>> new file mode 100644
>>>> index 000000000000..817e8b2124ae
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
>>>> @@ -0,0 +1,31 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0+ */
>>> and here ^
>>>
>>>> +
>>>> +#ifndef _VKMS_FORMATS_H_
>>>> +#define _VKMS_FORMATS_H_
>>>> +
>>>> +#include "vkms_drv.h"
>>>> +
>>>> +struct line_buffer {
>>>> +	u16 a, r, g, b;
>>>> +};
>>>> +
>>>> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>>>> +			      struct line_buffer *stage_buffer);
>>>> +
>>>> +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>>>> +			      struct line_buffer *stage_buffer);
>>>> +
>>>> +void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
>>>> +			 struct line_buffer *src_buffer);
>>>> +
>>>> +void convert_to_XRGB8888(struct vkms_frame_info *frame_info, int y,
>>>> +			 struct line_buffer *src_buffer);
>>>> +
>>>> +typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
>>>> +				      struct line_buffer *buffer);
>>>> +
>>>> +format_transform_func get_fmt_transform_function(u32 format);
>>>> +
>>>> +format_transform_func get_wb_fmt_transform_function(u32 format);
>>>> +
>>>> +#endif /* _VKMS_FORMATS_H_ */
>>>> -- 
>>>> 2.30.2
>>>>

* Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-02-21  1:02         ` Igor Torrente
@ 2022-02-21  9:18           ` Pekka Paalanen
  2022-02-22  1:13             ` Igor Torrente
  0 siblings, 1 reply; 31+ messages in thread
From: Pekka Paalanen @ 2022-02-21  9:18 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	Melissa Wen, tzimmermann, ~lkcamp/patches

On Sun, 20 Feb 2022 22:02:12 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Hi Melissa,
> 
> On 2/9/22 18:45, Melissa Wen wrote:
> > On 02/08, Igor Torrente wrote:  
> >> Hi Melissa,
> >>
> >> On 2/8/22 07:40, Melissa Wen wrote:  
> >>> On 01/21, Igor Torrente wrote:  
> >>>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
> >>>> as a color input.
> >>>>
> >>>> This patch refactors all the functions related to the plane composition
> >>>> to overcome this limitation.
> >>>>
> >>>> A new internal format (`struct pixel`) is introduced to deal with all
> >>>> possible inputs. It consists of 16-bit fields that represent each of
> >>>> the channels.
> >>>>
> >>>> Pixel blending is done using this internal format, and new handlers
> >>>> are added to convert a specific format to/from this internal format.
> >>>>
> >>>> So the blend operation depends on these handlers to convert to this common
> >>>> format. The blended result, if necessary, is converted to the writeback
> >>>> buffer format.
> >>>>
> >>>> This patch introduces three major changes to the blend function:
> >>>> 1 - All the planes are blended at once.
> >>>> 2 - The blending is done per line instead of per pixel.
> >>>> 3 - It is responsible for calculating the CRC and writing the writeback
> >>>>       buffer (if necessary).
> >>>>
> >>>> These changes allow us to allocate much less memory for the intermediate
> >>>> buffer used in these operations, because we no longer need to keep
> >>>> all the lines of the intermediate image at once; just one line is
> >>>> enough.
> >>>>
> >>>> | Memory consumption (output dimensions) |
> >>>> |:--------------------------------------:|
> >>>> |       Current      |     This patch    |
> >>>> |:------------------:|:-----------------:|
> >>>> |   Width * Height   |     2 * Width     |
> >>>>
> >>>> Beyond memory, we also have a minor performance benefit from all
> >>>> these changes. Results running the IGT tests `*kms_cursor_crc*`:
> >>>>  
> >>> First, thanks for this improvement.
> >>>
> >>> Some recent changes in kms_cursor_crc caused VKMS to fail in most test
> >>> cases (iirc, only size-change and alpha-opaque are passing currently).  
> >>
> >> I updated my igt and kernel (from drm_misc/drm-misc-next) to the latest
> >> commit[1][2] and I'm getting mixed results. Sometimes most of the tests
> >> pass, sometimes almost nothing passes.
> > hmm.. is it happening when running kms_cursor_crc? Is the variation in
> > the results random, or is it possible to follow a set of steps to
> > reproduce it? When it fails, what reason does the log display?
> 
> I investigated it a little and discovered that the KMS cursor
> tests (".*kms_cursor_crc*") fail after the writeback
> tests (".*kms_writeback.*") are executed.
> 
> I don't know what is causing it, but they are failing while trying to 
> commit the KMS changes.
> 
> out.txt:
> IGT-Version: 1.26-NO-GIT (x86_64) (Linux: 5.17.0-rc2 x86_64)
> Stack trace:
>    #0 ../lib/igt_core.c:1754 __igt_fail_assert()
>    #1 ../lib/igt_kms.c:3795 do_display_commit()
>    #2 ../lib/igt_kms.c:3901 igt_display_commit2()
>    #3 ../tests/kms_cursor_crc.c:820 __igt_unique____real_main814()
>    #4 ../tests/kms_cursor_crc.c:814 main()
>    #5 ../csu/libc-start.c:308 __libc_start_main()
>    #6 [_start+0x2a]
> Subtest pipe-A-cursor-size-change: FAIL
> 
> err.txt:
> (kms_cursor_crc:1936) igt_kms-CRITICAL: Test assertion failure function 
> do_display_commit, file ../lib/igt_kms.c:3795:
> (kms_cursor_crc:1936) igt_kms-CRITICAL: Failed assertion: ret == 0
> (kms_cursor_crc:1936) igt_kms-CRITICAL: Last errno: 22, Invalid argument
> (kms_cursor_crc:1936) igt_kms-CRITICAL: error: -22 != 0
> 
> > 
> > On my side, only the first two subtests of kms_cursor_crc pass
> > before this patch. After your changes, all subtests are
> > successful again, except those related to the 32x10 cursor size (which
> > need further investigation). I didn't check how the recent changes in
> > kms_cursor_crc affect VKMS performance, but I bet that clearing
> > the alpha channel is the reason the performance is back.
> 
> Yeah, I also don't understand why the 32x10 cursor tests are failing.
> 

Hi,

are the tests putting the cursor partially outside of the CRTC area?
Or partially outside of primary plane area (which IIRC you used when you
should have used the CRTC area?)?

Does the writeback test forget to unlink the writeback connector? Or
does VKMS not handle unlinking the writeback connector?

If both are a problem, the latter would be just an unrelated bug that
exposes the first bug in VKMS, because whether writeback is used or not
probably should not affect where the cursor plane is allowed to be.


Thanks,
pq


* Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-02-21  9:18           ` Pekka Paalanen
@ 2022-02-22  1:13             ` Igor Torrente
  2022-02-22  9:26               ` Pekka Paalanen
  0 siblings, 1 reply; 31+ messages in thread
From: Igor Torrente @ 2022-02-22  1:13 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	Melissa Wen, tzimmermann, ~lkcamp/patches

Hi Pekka,

On 2/21/22 06:18, Pekka Paalanen wrote:
> On Sun, 20 Feb 2022 22:02:12 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
> 
>> [...]
> 
> Hi,
> 
> are the tests putting the cursor partially outside of the CRTC area?
> Or partially outside of primary plane area (which IIRC you used when you
> should have used the CRTC area?)?
> 
> Does the writeback test forget to unlink the writeback connector? Or
> does VKMS not handle unlinking the writeback connector?

I don't know the answer to all these questions.

I did try to find the commit that introduced this issue, and I found
that it has been happening since writeback support was introduced in
Aug 2020 (dbd9d80c).

And the failure related to the 32x10 cursor was happening before my
changes.

> 
> If both are a problem, the latter would be just an unrelated bug that
> exposes the first bug in VKMS, because whether writeback is used or not
> probably should not affect where the cursor plane is allowed to be.

Yeah, I don't think those are related.

Best Regards.
---
Igor Torrente

> 
> 
> Thanks,
> pq

* Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-02-22  1:13             ` Igor Torrente
@ 2022-02-22  9:26               ` Pekka Paalanen
  0 siblings, 0 replies; 31+ messages in thread
From: Pekka Paalanen @ 2022-02-22  9:26 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, rodrigosiqueiramelo, airlied, dri-devel,
	Melissa Wen, tzimmermann, ~lkcamp/patches

On Mon, 21 Feb 2022 22:13:21 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Hi Pekka,
> 
> On 2/21/22 06:18, Pekka Paalanen wrote:
> > On Sun, 20 Feb 2022 22:02:12 -0300
> > Igor Torrente <igormtorrente@gmail.com> wrote:
> >   
> >> [...]
> > 
> > Hi,
> > 
> > are the tests putting the cursor partially outside of the CRTC area?
> > Or partially outside of primary plane area (which IIRC you used when you
> > should have used the CRTC area?)?
> > 
> > Does the writeback test forget to unlink the writeback connector? Or
> > does VKMS not handle unlinking the writeback connector?  
> 
> I don't know the answer to all these questions.

These are just suggestions on where to look, just in case
you had no idea. ;-)

After all, the UAPI error code is EINVAL, so something in VKMS rejects
the IGT-produced configuration. Figuring out what that configuration is
and why it gets rejected might be useful.

Maybe the original writeback code did not expect planes to be partially
off-screen? Guarding against that would produce EINVAL. This is just a
wild guess, I've never read that code, but it also seems like the
simplest possible mistake to make in good faith - not a bug in code,
just a wrong initial assumption of use cases.

If you found in your testing that the IGT cursor-size-change test
succeeds if run before the writeback test, but fails if run after the
writeback test, then obviously something in the writeback test is
leaving stray state behind. It could be the test itself, or it could be
a VKMS bug.


Thanks,
pq


* Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-02-10  9:37   ` Pekka Paalanen
@ 2022-02-25  0:43     ` Igor Torrente
  2022-02-25  9:38       ` Pekka Paalanen
  0 siblings, 1 reply; 31+ messages in thread
From: Igor Torrente @ 2022-02-25  0:43 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, Thomas Zimmermann, rodrigosiqueiramelo, airlied,
	dri-devel, melissa.srw, ~lkcamp/patches, kernel test robot

Hi Pekka,

On 2/10/22 06:37, Pekka Paalanen wrote:
> On Fri, 21 Jan 2022 18:38:29 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
>
>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>> as a color input.
>>
>> This patch refactors all the functions related to the plane composition
>> to overcome this limitation.
>>
>> A new internal format (`struct pixel`) is introduced to deal with all
>> possible inputs. It consists of 16-bit fields that represent each of
>> the channels.
>>
>> Pixel blending is done using this internal format, and new handlers
>> are added to convert a specific format to/from this internal format.
>>
>> So the blend operation depends on these handlers to convert to this common
>> format. The blended result, if necessary, is converted to the writeback
>> buffer format.
>>
>> This patch introduces three major changes to the blend function:
>> 1 - All the planes are blended at once.
>> 2 - The blending is done per line instead of per pixel.
>> 3 - It is responsible for calculating the CRC and writing the writeback
>>      buffer (if necessary).
>>
>> These changes allow us to allocate much less memory for the intermediate
>> buffer used in these operations, because we no longer need to keep
>> all the lines of the intermediate image at once; just one line is
>> enough.
>>
>> | Memory consumption (output dimensions) |
>> |:--------------------------------------:|
>> |       Current      |     This patch    |
>> |:------------------:|:-----------------:|
>> |   Width * Height   |     2 * Width     |
>>
>> Beyond memory, we also have a minor performance benefit from all
>> these changes. Results running the IGT tests `*kms_cursor_crc*`:
>>
>> |                 Frametime                  |
>> |:------------------------------------------:|
>> |  Implementation |  Current  |  This commit |
>> |:---------------:|:---------:|:------------:|
>> | frametime range |  8~22 ms  |    5~18 ms   |
>> |     Average     |  10.0 ms  |    7.3 ms    |
>>
>> Reported-by: kernel test robot <lkp@intel.com>
>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>> ---
>> V2: Improves the performance drastically by performing the operations
>>      per line instead of per pixel (Pekka Paalanen).
>>      Minor improvements (Pekka Paalanen).
>>
>> V3: Changes the code to blend the planes all at once. This improves
>>      performance and memory consumption, and removes much of the
>>      weirdness of V2 (Pekka Paalanen and me).
>>      Minor improvements (Pekka Paalanen and me).
>>
>> V4: Rebases the code and adapts it to the new NUM_OVERLAY_PLANES constant.
>> ---
>>   drivers/gpu/drm/vkms/Makefile        |   1 +
>>   drivers/gpu/drm/vkms/vkms_composer.c | 335 +++++++++++++--------------
>>   drivers/gpu/drm/vkms/vkms_formats.c  | 138 +++++++++++
>>   drivers/gpu/drm/vkms/vkms_formats.h  |  31 +++
>>   4 files changed, 333 insertions(+), 172 deletions(-)
>>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
>
> Hi Igor,
>
> I'm really happy to see this, thanks!
>
> I still have some security/robustness and other comments below.
>
> I've deleted all the minus lines from the patch to make the new code
> more clear.
>
>>
>> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
>> index 72f779cbfedd..1b28a6a32948 100644
>> --- a/drivers/gpu/drm/vkms/Makefile
>> +++ b/drivers/gpu/drm/vkms/Makefile
>> @@ -3,6 +3,7 @@ vkms-y := \
>>      vkms_drv.o \
>>      vkms_plane.o \
>>      vkms_output.o \
>> +    vkms_formats.o \
>>      vkms_crtc.o \
>>      vkms_composer.o \
>>      vkms_writeback.o
>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>> index 95029d2ebcac..9f70fcf84fb9 100644
>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>> @@ -9,202 +9,210 @@
>>   #include <drm/drm_vblank.h>
>>
>>   #include "vkms_drv.h"
>> +#include "vkms_formats.h"
>>
>> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>>   {
>> +    u32 new_color;
>>
>> +    new_color = (src * 0xffff + dst * (0xffff - alpha));
>>
>> +    return DIV_ROUND_UP(new_color, 0xffff);
>
> Why round-up rather than the usual mathematical rounding?

AFAIK, this is the only rounding macro present in the kernel. And, if I
understood correctly, it rounds toward positive infinity, which is what we
are all used to.

>
>>   }
>>
>>   /**
>> + * pre_mul_alpha_blend - alpha blending equation
>> + * @src_frame_info: source framebuffer's metadata
>> + * @stage_buffer: The line with the pixels from src_plane
>> + * @output_buffer: A line buffer that receives all the blends output
>>    *
>> + * Using the information from the `frame_info`, this blends only the
>> + * necessary pixels from the `stage_buffer` to the `output_buffer`
>> + * using the premultiplied blend formula.
>>    *
>> + * The current DRM assumption is that pixel color values have already been
>> + * pre-multiplied with the alpha channel values. See
>> + * drm_plane_create_blend_mode_property() for more. Also, this formula assumes a
>> + * completely opaque background.
>>    */
>> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
>> +                            struct line_buffer *stage_buffer,
>> +                            struct line_buffer *output_buffer)
>>   {
>> +    int x, x_dst = frame_info->dst.x1;
>> +    int x_limit = drm_rect_width(&frame_info->dst);
>> +    struct line_buffer *out = output_buffer + x_dst;
>> +    struct line_buffer *in = stage_buffer;
>
> Here you would check that you don't overrun any of the arrays. At this
> point, I believe an overrun would indicate a bug in VKMS, so handle it
> according to the kernel conventions. I have suggestions below
> how to make that check possible. In other places, I'll just say "check
> for overruns" for short.
>
>> +
>> +    for (x = 0; x < x_limit; x++) {
>> +            out[x].a = (u16)0xffff;
>> +            out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
>> +            out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
>> +            out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>>      }
>>   }
>>
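The premultiplied "over" formula described in the comment above (opaque background, `out = src + dst * (1 - src_alpha)`) can be sketched per channel as follows. The patch's own pre_mul_blend_channel() is not shown in this excerpt, so the helper names, the pixel struct, and the round-to-nearest choice are assumptions. The line loop also receives both buffer lengths so it can refuse an overrun, as the review suggests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit-per-channel pixel, like struct line_buffer in the patch. */
struct argb_u16 {
	uint16_t a, r, g, b;
};

/* Premultiplied over: out = src + dst * (1 - alpha), channels in [0, 0xffff]. */
static uint16_t blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint64_t v = (uint64_t)src * 0xffff + (uint64_t)dst * (0xffff - alpha);

	v = (v + 0x7fff) / 0xffff;                 /* divide, rounding to nearest */
	return v > 0xffff ? 0xffff : (uint16_t)v;  /* clamp invalid src > alpha */
}

/* Blend `width` pixels of `in` into `out` at x_dst; false if it would overrun. */
static bool blend_line(struct argb_u16 *out, size_t out_len,
		       const struct argb_u16 *in, size_t in_len,
		       size_t x_dst, size_t width)
{
	if (width > in_len || x_dst > out_len || width > out_len - x_dst)
		return false;

	for (size_t x = 0; x < width; x++) {
		struct argb_u16 *o = &out[x_dst + x];
		const struct argb_u16 *s = &in[x];

		o->a = 0xffff;  /* opaque background, as the comment assumes */
		o->r = blend_channel(s->r, o->r, s->a);
		o->g = blend_channel(s->g, o->g, s->a);
		o->b = blend_channel(s->b, o->b, s->a);
	}
	return true;
}
```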
>> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>>   {
>> +    if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
>> +            return true;
>>
>> +    return false;
>>   }
>>
>>   /**
>> + * @wb_frame_info: The writeback frame buffer metadata
>> + * @wb_fmt_func: The format transformation function to the wb buffer
>> + * @crtc_state: The crtc state
>> + * @plane_fmt_func: A format transformation function to each plane
>
> Is it not *from* each plane?
>
> Each plane... does this mean that all planes must have the same pixel
> format?
>
> Oh wait, it's a pointer, so an array, isn't it? You're passing in an
> array without passing in the array size. That seems quite risky to me.
> Think of someone else needing to patch something here without fully
> understanding how this all works, they'd easily introduce a subtle bug.
>
> Looks like the array must be number of "active planes" long. So it's
> not even a constant, and the size of the array is not documented here.

I hadn't thought about it, but it makes a lot of sense to me.

>
> What if the fmt_func was a field in struct vkms_frame_info? So you
> could set it when creating a vkms_frame_info. Wouldn't that simplify
> the code in blend() and its callers?

This is a great idea! I will change it in the next version.
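The suggestion above could look roughly like this minimal sketch. It assumes trimmed-down, hypothetical stand-ins for `struct vkms_frame_info` and the line buffer; the field names `read_line` and `gray` are illustrative only. Each frame info carries its own conversion function, chosen once from the plane's format, so blend() no longer needs a parallel, unsized array of function pointers.

```c
#include <stdint.h>

struct argb_u16 {
	uint16_t a, r, g, b;
};

struct frame_info;

/* Converts row y of a plane into the internal 16-bit format. */
typedef void (*read_line_func)(const struct frame_info *fi, int y,
			       struct argb_u16 *line);

/* Hypothetical frame info: the reader travels with the plane it belongs to. */
struct frame_info {
	int width;
	uint16_t gray;             /* stand-in for the framebuffer contents */
	read_line_func read_line;  /* chosen once, from the plane's format */
};

static void read_gray_line(const struct frame_info *fi, int y,
			   struct argb_u16 *line)
{
	(void)y;
	for (int x = 0; x < fi->width; x++)
		line[x] = (struct argb_u16){ 0xffff, fi->gray, fi->gray, fi->gray };
}
```

In the blend loop, a call such as `fi->read_line(fi, y, stage_buffer)` would then replace `plane_fmt_func[i](...)`, and the array together with its undocumented length disappears.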

>
>> + * @crc32: The crc output of the final frame
>> + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
>> + * @stage_buffer: The line with the pixels from src_compositor
>
> I don't see src_compositor?

Oops.

>
>>    *
>> + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
>> + * from all planes, calculates the crc32 of the output from the former step,
>> + * and, if necessary, converts and stores the output to the writeback buffer.
>>    *
>>    * TODO: completely clear the primary plane (a = 0xff) before starting to blend
>>    * pixel color values
>
> Mm, you only need to clear output_buffer, not the whole writeback FB.
> output_buffer will unconditionally and totally overwrite the writeback
> FB, right?

Right. If I understand correctly, this was necessary because of the way
the previous code handled the alpha channel.

>
>>    */
>> +static void blend(struct vkms_frame_info *wb_frame_info,
>
> Using "wb" as short for writeback is... well, it's hard for me to
> remember, at least. Could this not be named simply "writeback"?

IMHO it's better to use wb instead of writeback for consistency, given that wb
is used throughout the vkms code.

>
>> +              format_transform_func wb_fmt_func,
>
> "writeback_func"
>
>> +              struct vkms_crtc_state *crtc_state,
>> +              format_transform_func *plane_fmt_func,
>> +              u32 *crc32, struct line_buffer *stage_buffer,
>> +              struct line_buffer *output_buffer, s64 row_size)
>>   {
>> +    struct vkms_plane_state **plane = crtc_state->active_planes;
>> +    struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
>> +    u32 n_active_planes = crtc_state->num_active_planes;
>>
>> +    int y_src = primary_plane_info->dst.y1;
>
> Shouldn't this be called y_dst instead?

Yes, it should, and it will be in v5.

>
>> +    int h_dst = drm_rect_height(&primary_plane_info->dst);
>>      int y_limit = y_src + h_dst;
>> +    int y, i;
>
> It took me a while to understand that all these y-coordinates are CRTC
> coordinates. Maybe call them crtc_y, crtc_y_begin, crtc_y_end,
> crtc_y_height, etc.
>
>> +
>> +    for (y = y_src; y < y_limit; y++) {
>> +            plane_fmt_func[0](primary_plane_info, y, output_buffer);
>
> This is initializing output_buffer, right? So why do you have the TODO
> comment about clearing the primary plane above?
>
> Is it because the primary plane may not cover the CRTC exactly, the
> destination rectangle might be bigger or smaller?
>
> The output_buffer length should be the CRTC width, right?
>
> Maybe special-casing the primary plane in this code is wrong.
> crtc_y needs to iterate over the CRTC height starting from zero. Then,
> you explicitly clear output_buffer to opaque background color, and
> primary plane becomes just another plane in the array of active planes
> with no special handling here.
>
> That will allow you to support overlay planes *below* the primary plane
> (as is fairly common in non-PC hardware), and you can even support the
> background color KMS property.

I thought that the primary plane would always cover the entire screen exactly.

So yes, my patch assumes that the CRTC is the same size as the primary plane
(and, if I'm not mistaken, the current version also assumes it).

But if that is not the case, where are the CRTC dimensions?
Are they in the CRTC properties? In drm_mode_config?

I couldn't find them.
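The restructuring suggested above can be sketched as follows. The loop iterates over every CRTC row starting from zero, fills the row with an opaque background colour, and then applies each active plane, primary included, in z-order. To keep the sketch self-contained, the planes are hypothetical opaque solid-colour rectangles and the CRTC width and height are plain parameters. In DRM, the CRTC size normally comes from the active display mode (`hdisplay`/`vdisplay` in `struct drm_display_mode`).

```c
#include <stddef.h>
#include <stdint.h>

struct argb_u16 {
	uint16_t a, r, g, b;
};

/* Hypothetical opaque plane covering dst [x1, x2) x [y1, y2), CRTC coordinates. */
struct solid_plane {
	int x1, y1, x2, y2;
	struct argb_u16 color;
};

static void compose_frame(const struct solid_plane *planes, size_t n_planes,
			  int crtc_w, int crtc_h, struct argb_u16 bg,
			  struct argb_u16 *frame /* crtc_w * crtc_h pixels */)
{
	for (int crtc_y = 0; crtc_y < crtc_h; crtc_y++) {
		struct argb_u16 *row = frame + (size_t)crtc_y * crtc_w;

		/* Every row starts from the opaque background colour. */
		for (int x = 0; x < crtc_w; x++)
			row[x] = bg;

		/* Planes in z-order; the primary plane gets no special case. */
		for (size_t i = 0; i < n_planes; i++) {
			const struct solid_plane *p = &planes[i];

			if (crtc_y < p->y1 || crtc_y >= p->y2)
				continue;
			/* Clip to the CRTC, so a plane may hang partly off-screen. */
			for (int x = p->x1 < 0 ? 0 : p->x1; x < p->x2 && x < crtc_w; x++)
				row[x] = p->color;  /* opaque: replaces what is below */
		}
	}
}
```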

>
>> +
>> +            /* If there are other planes besides primary, we consider the active
>> +             * planes should be in z-order and compose them associatively:
>> +             * ((primary <- overlay) <- cursor)
>> +             */
>> +            for (i = 1; i < n_active_planes; i++) {
>> +                    if (!check_y_limit(plane[i]->frame_info, y))
>> +                            continue;
>> +
>> +                    plane_fmt_func[i](plane[i]->frame_info, y, stage_buffer);
>> +                    pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
>> +                                        output_buffer);
>>              }
>> +
>> +            *crc32 = crc32_le(*crc32, (void *)output_buffer, row_size);
>> +
>> +            if (wb_frame_info)
>> +                    wb_fmt_func(wb_frame_info, y, output_buffer);
>>      }
>>   }
>>
>> +static void get_format_transform_functions(struct vkms_crtc_state *crtc_state,
>> +                                       format_transform_func plane_funcs[])
>>   {
>> +    struct vkms_plane_state **active_planes = crtc_state->active_planes;
>> +    u32 n_active_planes = crtc_state->num_active_planes, s_fmt;
>> +    int i;
>>
>> +    for (i = 0; i < n_active_planes; i++) {
>> +            s_fmt = active_planes[i]->frame_info->fb->format->format;
>> +            plane_funcs[i] = get_fmt_transform_function(s_fmt);
>> +    }
>> +}
>>
>> +static bool check_planes_x_bounds(struct vkms_crtc_state *crtc_state,
>> +                              struct vkms_frame_info *wb_frame_info)
>> +{
>> +    struct vkms_plane_state **planes = crtc_state->active_planes;
>> +    struct vkms_frame_info *primary_plane_info = planes[0]->frame_info;
>> +    int line_width = drm_rect_width(&primary_plane_info->dst);
>> +    u32 n_active_planes = crtc_state->num_active_planes;
>> +    int i;
>>
>> +    for (i = 0; i < n_active_planes; i++) {
>> +            int x_dst = planes[i]->frame_info->dst.x1;
>> +            int x_src = planes[i]->frame_info->src.x1 >> 16;
>> +            int x2_src = planes[i]->frame_info->src.x2 >> 16;
>> +            int x_limit = drm_rect_width(&planes[i]->frame_info->dst);
>>
>> +            if (x_dst + x_limit > line_width)
>> +                    return false;
>> +            if (x_src + x_limit > x2_src)
>> +                    return false;
>> +    }
>
> That's interesting. Looks like you reject everything if any plane is
> not fully inside the primary plane destination rectangle. But that's
> not the right check, is it? If you want to check this, you would check
> against the CRTC dimensions.

Same wrong assumption here.

>
> Then again, I think some hardware do allow planes to reach outside of
> the CRTC dimensions. Cursor plane is probably the best example. The
> cursor can be partly off-screen. So this is something that would need
> to be supported both ways I suppose, but going with the "all plane
> destination rectangles must be strictly inside the CRTC dimensions" is
> a good start.
>
> But why only x-coordinate check? y should have the same rules, right?

My code is inconsistent in this regard.

I created this function to prevent out-of-bounds memory access by checking all
the X-axis limits. But because the blend loop always ends at the primary plane's
Y-limit (which until now I assumed to be exactly the CRTC dimensions), I don't
check the Y-axis: it's not possible to cause any out-of-bounds access there.

So, unintentionally, we support partially off-screen planes on the Y-axis but
not on the X-axis.
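A check that treats both axes the same way, against the CRTC rather than the primary plane, could look like this sketch. `struct rect` is a local stand-in with the same `[x1, x2) x [y1, y2)` layout as `struct drm_rect`. The clipping variant covers the "partly off-screen cursor" case. In a real driver, the source rectangle would need the matching adjustment, which this sketch leaves out.

```c
#include <stdbool.h>

/* Local stand-in with the same layout as struct drm_rect. */
struct rect {
	int x1, y1, x2, y2;
};

/* True if dst lies fully inside a crtc_w x crtc_h CRTC, on both axes. */
static bool rect_inside_crtc(const struct rect *dst, int crtc_w, int crtc_h)
{
	return dst->x1 >= 0 && dst->y1 >= 0 &&
	       dst->x1 <= dst->x2 && dst->y1 <= dst->y2 &&
	       dst->x2 <= crtc_w && dst->y2 <= crtc_h;
}

/* Alternative: clip dst to the CRTC; returns false if nothing stays visible. */
static bool rect_clip_to_crtc(struct rect *dst, int crtc_w, int crtc_h)
{
	if (dst->x1 < 0)
		dst->x1 = 0;
	if (dst->y1 < 0)
		dst->y1 = 0;
	if (dst->x2 > crtc_w)
		dst->x2 = crtc_w;
	if (dst->y2 > crtc_h)
		dst->y2 = crtc_h;
	return dst->x1 < dst->x2 && dst->y1 < dst->y2;
}
```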

>
>> +
>> +    return true;
>>   }
>>
>> +static int compose_active_planes(struct vkms_frame_info *wb_frame_info,
>> +                             struct vkms_crtc_state *crtc_state,
>> +                             u32 *crc32)
>>   {
>> +    format_transform_func plane_funcs[NUM_OVERLAY_PLANES], wb_func = NULL;
>> +    int line_width, ret = 0, pixel_size = sizeof(struct line_buffer);
>> +    struct vkms_frame_info *primary_plane_info = NULL;
>> +    struct line_buffer *output_buffer, *stage_buffer;
>> +    struct vkms_plane_state *act_plane = NULL;
>> +    u32 wb_format;
>>
>> +    if (WARN_ON(pixel_size != 8))
>> +            return -EINVAL;
>> +
>> +    if (crtc_state->num_active_planes >= 1) {
>> +            act_plane = crtc_state->active_planes[0];
>> +            if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>> +                    primary_plane_info = act_plane->frame_info;
>>      }
>>
>> +    if (!primary_plane_info)
>> +            return -EINVAL;
>> +
>>      if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
>>              return -EINVAL;
>>
>> +    if (WARN_ON(!check_planes_x_bounds(crtc_state, wb_frame_info)))
>> +            return -EINVAL;
>>
>> +    line_width = drm_rect_width(&primary_plane_info->dst);
>
> This needs to be CRTC width, not primary plane width.
>
>>
>> +    stage_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>> +    if (!stage_buffer) {
>> +            DRM_ERROR("Cannot allocate memory for the output line buffer");
>> +            return -ENOMEM;
>> +    }
>>
>> +    output_buffer = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>> +    if (!output_buffer) {
>> +            DRM_ERROR("Cannot allocate memory for intermediate line buffer");
>> +            ret = -ENOMEM;
>> +            goto free_stage_buffer;
>> +    }
>> +
>> +    get_format_transform_functions(crtc_state, plane_funcs);
>> +
>> +    if (wb_frame_info) {
>> +            wb_format = wb_frame_info->fb->format->format;
>> +            wb_func = get_wb_fmt_transform_function(wb_format);
>> +            wb_frame_info->src = primary_plane_info->src;
>> +            wb_frame_info->dst = primary_plane_info->dst;
>> +    }
>> +
>> +    blend(wb_frame_info, wb_func, crtc_state, plane_funcs, crc32,
>> +          stage_buffer, output_buffer, (s64)line_width * pixel_size);
>> +
>> +    kvfree(output_buffer);
>> +free_stage_buffer:
>> +    kvfree(stage_buffer);
>> +
>> +    return ret;
>>   }
>>
>>   /**
>> @@ -222,13 +230,12 @@ void vkms_composer_worker(struct work_struct *work)
>>                                              struct vkms_crtc_state,
>>                                              composer_work);
>>      struct drm_crtc *crtc = crtc_state->base.crtc;
>> +    struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>> +    struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
>>      struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>>      bool crc_pending, wb_pending;
>>      u64 frame_start, frame_end;
>> +    u32 crc32 = 0;
>>      int ret;
>>
>>      spin_lock_irq(&out->composer_lock);
>> @@ -248,35 +255,19 @@ void vkms_composer_worker(struct work_struct *work)
>>      if (!crc_pending)
>>              return;
>>
>>      if (wb_pending)
>> +            ret = compose_active_planes(wb_frame_info, crtc_state, &crc32);
>> +    else
>> +            ret = compose_active_planes(NULL, crtc_state, &crc32);
>>
>> +    if (ret)
>>              return;
>>
>>      if (wb_pending) {
>>              drm_writeback_signal_completion(&out->wb_connector, 0);
>>              spin_lock_irq(&out->composer_lock);
>>              crtc_state->wb_pending = false;
>>              spin_unlock_irq(&out->composer_lock);
>>      }
>>
>>      /*
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>> new file mode 100644
>> index 000000000000..0d1838d1b835
>> --- /dev/null
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>> @@ -0,0 +1,138 @@
>> +/* SPDX-License-Identifier: GPL-2.0+ */
>> +
>> +#include <drm/drm_rect.h>
>> +#include "vkms_formats.h"
>> +
>> +format_transform_func get_fmt_transform_function(u32 format)
>> +{
>> +    if (format == DRM_FORMAT_ARGB8888)
>> +            return &ARGB8888_to_ARGB16161616;
>> +    else
>> +            return &XRGB8888_to_ARGB16161616;
>
> In functions like this you should prepare for caller errors. Use a
> switch, and fail any attempt to use a pixel format it doesn't support.
> Failing is much better than silently producing garbage or worse: buffer
> overruns when bytes-per-pixel is not what you expected.
>
> What to do on failure depends on whether the failure here is never
> supposed to happen (follow the kernel style) e.g. malicious userspace
> cannot trigger it, or if you actually use this function to define the
> supported pixel formats.

No, I don't use this function to define the supported formats. They are defined in:
- vkms_writeback.c:15
- vkms_plane.c:14 and 22

And if I'm not mistaken, the DRM framework takes care of validation.

>
> The latter means you'd have a list of all DRM pixel formats and then
> you'd ask for each one if this function knows it, and if yes, you add
> the format to the list of supported formats advertised to userspace. I
> don't know if that would be fine by DRM coding style.
>
>> +}
>> +
>> +format_transform_func get_wb_fmt_transform_function(u32 format)
>> +{
>> +    if (format == DRM_FORMAT_ARGB8888)
>> +            return &convert_to_ARGB8888;
>> +    else
>> +            return &convert_to_XRGB8888;
>> +}
>
> I think you could move the above getter functions to the bottom of the
> .c file, and make all the four *_to_* functions static, and remove them
> from the header.

OK, I will do that.

Question: what are the benefits of using static functions?
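Both review suggestions can be sketched together: a switch that refuses unknown formats, and file-local conversion functions. Declaring the converters `static` gives them internal linkage. They are not exported from the .c file, cannot clash with symbols elsewhere, and the compiler can warn when one is unused or inline it more freely. Only the getter stays in the header. The format codes below are local stand-ins built the same way as the DRM fourcc codes. The getter name is hypothetical, and the converter bodies are stubs.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for DRM_FORMAT_* (same little-endian fourcc packing). */
#define FOURCC(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
			    ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define FMT_ARGB8888 FOURCC('A', 'R', '2', '4')
#define FMT_XRGB8888 FOURCC('X', 'R', '2', '4')
#define FMT_RGB565   FOURCC('R', 'G', '1', '6')

typedef void (*read_line_func)(const void *src, uint16_t *dst, int n);

/* static: visible only inside this .c file, so not declared in the header. */
static void argb8888_read_line(const void *src, uint16_t *dst, int n)
{
	(void)src; (void)dst; (void)n;  /* real conversion omitted */
}

static void xrgb8888_read_line(const void *src, uint16_t *dst, int n)
{
	(void)src; (void)dst; (void)n;  /* real conversion omitted */
}

/* The only exported symbol: NULL means "format not handled", never a guess. */
read_line_func get_read_line_func(uint32_t format)
{
	switch (format) {
	case FMT_ARGB8888:
		return argb8888_read_line;
	case FMT_XRGB8888:
		return xrgb8888_read_line;
	default:
		return NULL;
	}
}
```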

>
>> +
>> +static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
>> +{
>> +    return frame_info->offset + (y * frame_info->pitch)
>> +                              + (x * frame_info->cpp);
>> +}
>> +
>> +/*
>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>> + *
>> + * @frame_info: Buffer metadata
>> + * @x: The x(width) coordinate of the 2D buffer
>> + * @y: The y(height) coordinate of the 2D buffer
>> + *
>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>> + * returns the address of the first color channel.
>> + * This function assumes the channels are packed together, i.e. a color channel
>> + * comes immediately after another in the memory. And therefore, this function
>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>> + */
>> +static void *packed_pixels_addr(struct vkms_frame_info *frame_info, int x, int y)
>> +{
>> +    int offset = pixel_offset(frame_info, x, y);
>> +
>> +    return (u8 *)frame_info->map[0].vaddr + offset;
>> +}
>> +
>> +static void *get_packed_src_addr(struct vkms_frame_info *frame_info, int y)
>> +{
>> +    int x_src = frame_info->src.x1 >> 16;
>> +    int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>> +
>> +    return packed_pixels_addr(frame_info, x_src, y_src);
>> +}
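The address arithmetic in pixel_offset() and get_packed_src_addr() can be checked in isolation with the sketch below. The struct is a simplified, hypothetical stand-in for the vkms_frame_info fields used above. As in the patch, the DRM plane source coordinates are 16.16 fixed point (hence the `>> 16`). The sketch also assumes no scaling, so one destination row maps to one source row.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the vkms_frame_info fields used above. */
struct fb_layout {
	size_t offset;  /* byte offset of the first pixel */
	size_t pitch;   /* bytes per row */
	size_t cpp;     /* bytes per pixel */
	int src_x1_fp;  /* source rect origin, 16.16 fixed point */
	int src_y1_fp;
	int dst_y1;     /* first destination row, CRTC coordinates */
};

static size_t pixel_offset(const struct fb_layout *fb, int x, int y)
{
	return fb->offset + (size_t)y * fb->pitch + (size_t)x * fb->cpp;
}

/* Map a CRTC row to the byte offset of the first source pixel of that row. */
static size_t src_row_offset(const struct fb_layout *fb, int crtc_y)
{
	int x_src = fb->src_x1_fp >> 16;
	int y_src = crtc_y - fb->dst_y1 + (fb->src_y1_fp >> 16);

	return pixel_offset(fb, x_src, y_src);
}
```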
>> +
>> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>> +                          struct line_buffer *stage_buffer)
>
> I'm fairly sure that DRM will one day add exactly ARGB16161616 format.
> But that will not be the format you use here (or it might be, but
> purely accidentally and depending on machine endianness and whatnot), so
> I would suggest inventing a new name. Also use the same name for the
> struct to hold a single pixel.
>
> E.g. struct pixel_argb_u16

I'm terrible at naming variables, functions, etc. I will end up with
ARGB8888_to_argb_u16.

>
> So that it is clear it is not meant to be any specific DRM_FORMAT_* format.
>
>> +{
>> +    u8 *src_pixels = get_packed_src_addr(frame_info, y);
>> +    int x, x_limit = drm_rect_width(&frame_info->dst);
>> +
>> +    for (x = 0; x < x_limit; x++, src_pixels += 4) {
>> +            /*
>> +             * Organizes the channels in their respective positions and converts
>> +             * the 8 bits channel to 16.
>> +             * The 257 is the "conversion ratio". This number is obtained by the
>> +             * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
>> +             * the best color value in a pixel format with more possibilities.
>> +             * And a similar idea applies to others RGB color conversions.
>> +             */
>> +            stage_buffer[x].a = (u16)src_pixels[3] * 257;
>> +            stage_buffer[x].r = (u16)src_pixels[2] * 257;
>> +            stage_buffer[x].g = (u16)src_pixels[1] * 257;
>> +            stage_buffer[x].b = (u16)src_pixels[0] * 257;
>> +    }
>> +}
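For 8-bit channels, the ratio (2^16 - 1) / (2^8 - 1) is exactly 257, an integer. Multiplying by 257 is therefore the same as copying the byte into both halves of the 16-bit value, which maps 0 to 0 and 255 to 65535 with no rounding at all. The non-integer ratios needed for 5- and 6-bit channels are what make RGB565 harder. A quick check of the identity, using a hypothetical helper name:

```c
#include <stdint.h>

/* 8-bit -> 16-bit channel expansion, as in the patch: v * 257 == (v << 8) | v */
static uint16_t u8_to_u16(uint8_t v)
{
	return (uint16_t)(v * 257);
}
```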
>> +
>> +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>> +                          struct line_buffer *stage_buffer)
>> +{
>> +    u8 *src_pixels = get_packed_src_addr(frame_info, y);
>> +    int x, x_limit = drm_rect_width(&frame_info->dst);
>> +
>> +    for (x = 0; x < x_limit; x++, src_pixels += 4) {
>> +            stage_buffer[x].a = (u16)0xffff;
>> +            stage_buffer[x].r = (u16)src_pixels[2] * 257;
>> +            stage_buffer[x].g = (u16)src_pixels[1] * 257;
>> +            stage_buffer[x].b = (u16)src_pixels[0] * 257;
>> +    }
>> +}
>> +
>> +/*
>> + * The following functions take a line of ARGB16161616 pixels from the
>> + * src_buffer, convert them to a specific format, and store them in the
>> + * destination.
>> + *
>> + * They are used in the `compose_active_planes` to convert and store a line
>> + * from the src_buffer to the writeback buffer.
>> + */
>> +void convert_to_ARGB8888(struct vkms_frame_info *frame_info,
>> +                     int y, struct line_buffer *src_buffer)
>
> Please, use consistent function naming style. These are using "convert"
> while the other ones are using "ARGB16161616".
>
>> +{
>> +    int x, x_dst = frame_info->dst.x1;
>> +    u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>> +    int x_limit = drm_rect_width(&frame_info->dst);
>> +
>> +    for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>> +            /*
>> +             * This sequence below is important because the format's byte order is
>> +             * in little-endian. In the case of the ARGB8888 the memory is
>> +             * organized this way:
>> +             *
>> +             * | Addr     | = blue channel
>> +             * | Addr + 1 | = green channel
>> +             * | Addr + 2 | = Red channel
>> +             * | Addr + 3 | = Alpha channel
>> +             */
>> +            dst_pixels[3] = DIV_ROUND_UP(src_buffer[x].a, 257);
>> +            dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
>> +            dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
>> +            dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
>> +    }
>> +}
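DRM fourcc formats are little-endian unless explicitly marked otherwise. An ARGB8888 pixel is the 32-bit value 0xAARRGGBB, and its bytes sit in memory as B, G, R, A, which is the order the comment above describes. The sketch below, with a hypothetical helper name, writes the bytes explicitly, so the result does not depend on the host CPU's endianness.

```c
#include <stdint.h>

/* Store 0xAARRGGBB in little-endian byte order: memory holds B, G, R, A. */
static void store_argb8888(uint8_t *p, uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
	uint32_t v = (uint32_t)a << 24 | (uint32_t)r << 16 | (uint32_t)g << 8 | b;

	p[0] = v & 0xff;          /* blue  */
	p[1] = (v >> 8) & 0xff;   /* green */
	p[2] = (v >> 16) & 0xff;  /* red   */
	p[3] = (v >> 24) & 0xff;  /* alpha */
}
```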
>> +
>> +void convert_to_XRGB8888(struct vkms_frame_info *frame_info,
>> +                     int y, struct line_buffer *src_buffer)
>> +{
>> +    int x, x_dst = frame_info->dst.x1;
>> +    u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>> +    int x_limit = drm_rect_width(&frame_info->dst);
>> +
>> +    for (x = 0; x < x_limit; x++, dst_pixels += 4) {
>> +            dst_pixels[3] = (u8)0xff;
>> +            dst_pixels[2] = DIV_ROUND_UP(src_buffer[x].r, 257);
>> +            dst_pixels[1] = DIV_ROUND_UP(src_buffer[x].g, 257);
>> +            dst_pixels[0] = DIV_ROUND_UP(src_buffer[x].b, 257);
>> +    }
>> +}
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
>> new file mode 100644
>> index 000000000000..817e8b2124ae
>> --- /dev/null
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
>> @@ -0,0 +1,31 @@
>> +/* SPDX-License-Identifier: GPL-2.0+ */
>> +
>> +#ifndef _VKMS_FORMATS_H_
>> +#define _VKMS_FORMATS_H_
>> +
>> +#include "vkms_drv.h"
>> +
>> +struct line_buffer {
>
> As I mentioned above, this would be called pixel_argb_u16 or something
> like that.
>
>> +    u16 a, r, g, b;
>> +};
>
> I was trying to suggest that a line_buffer would actually hold a whole
> line, something like the pseudo code:
>
> struct line_buffer {
>       size_t len_pixels;
>       struct my_pixel_type pixels[];
> }
>
> or whatever the kernel style for a variable length array at the end of
> a struct is. Field names are suggestions.
>
> Then it is easy to check that you don't overflow any line_buffer when
> operating on them.

OK, I will change it.
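A minimal sketch of the line buffer proposed above, using a C99 flexible array member. It is userspace code with calloc() and a manual overflow check. In the kernel, the equivalent would more likely use kvzalloc() with the struct_size() helper from <linux/overflow.h>. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

struct line_buffer {
	size_t n_pixels;
	struct pixel_argb_u16 pixels[];  /* flexible array member */
};

static struct line_buffer *line_buffer_alloc(size_t n_pixels)
{
	struct line_buffer *buf;

	/* Refuse sizes whose byte count would overflow size_t. */
	if (n_pixels > (SIZE_MAX - sizeof(*buf)) / sizeof(buf->pixels[0]))
		return NULL;
	buf = calloc(1, sizeof(*buf) + n_pixels * sizeof(buf->pixels[0]));
	if (buf)
		buf->n_pixels = n_pixels;
	return buf;
}

/* True if pixels[x_start .. x_start + count) is inside the buffer. */
static bool line_buffer_range_ok(const struct line_buffer *buf,
				 size_t x_start, size_t count)
{
	return x_start <= buf->n_pixels && count <= buf->n_pixels - x_start;
}
```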

>
>> +
>> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>> +                          struct line_buffer *stage_buffer);
>> +
>> +void XRGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>> +                          struct line_buffer *stage_buffer);
>> +
>> +void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
>> +                     struct line_buffer *src_buffer);
>> +
>> +void convert_to_XRGB8888(struct vkms_frame_info *frame_info, int y,
>> +                     struct line_buffer *src_buffer);
>
> You should only need the below functions and not the above ones in this header.
>
>> +
>> +typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
>> +                                  struct line_buffer *buffer);
>
> The arguments for this function-pointer should be documented,
> especially that y is the y-coordinate in CRTC coordinate space, i.e.
> plane destination rectangle. You might even call it crtc_y.
>
> I think you should use two different function-pointer types for the two
> different kinds of functions:
> - reads arbitrary pixel format and writes to rgba_u16
> - reads rgba_u16 and writes to arbitrary pixel format
>
> This will prevent any mistakes in accidentally using the wrong kind of
> function. If you also have the argument order different between the two
> types of functions, getting them mixed up is even less likely. I
> presume the kernel uses the function(destination, source) style of
> argument ordering. You can also use 'const' on the source, that is a
> good way to document things too.
>
> The consequence of using the wrong function could be the leak of kernel
> memory content to userspace, which is pretty bad. So preventing that
> kind of problems before they happen is nice.

Okay, all of this makes a lot of sense. I will change it in the next version.
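The two-type suggestion above could be sketched as follows. The reader and the writer get distinct typedefs with destination-first arguments and a `const` source. Assigning a writer to a reader variable then becomes an incompatible-pointer-type diagnostic instead of a silent mix-up. All names are hypothetical, and the XRGB8888 bodies mirror the patch's `* 257` and `DIV_ROUND_UP(x, 257)` conversions.

```c
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Reads a line in some pixel format, writes the internal format. */
typedef void (*pixel_read_func)(struct pixel_argb_u16 *dst, const uint8_t *src, int n);
/* Reads the internal format, writes a line in some pixel format. */
typedef void (*pixel_write_func)(uint8_t *dst, const struct pixel_argb_u16 *src, int n);

static void xrgb8888_read(struct pixel_argb_u16 *dst, const uint8_t *src, int n)
{
	for (int x = 0; x < n; x++, src += 4)
		dst[x] = (struct pixel_argb_u16){ 0xffff, (uint16_t)(src[2] * 257),
						  (uint16_t)(src[1] * 257),
						  (uint16_t)(src[0] * 257) };
}

static void xrgb8888_write(uint8_t *dst, const struct pixel_argb_u16 *src, int n)
{
	for (int x = 0; x < n; x++, dst += 4) {
		dst[0] = (uint8_t)((src[x].b + 256u) / 257u);  /* DIV_ROUND_UP(b, 257) */
		dst[1] = (uint8_t)((src[x].g + 256u) / 257u);
		dst[2] = (uint8_t)((src[x].r + 256u) / 257u);
		dst[3] = 0xff;                                 /* X byte */
	}
}
```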

One last question for this email:

I have a patch implementing the NV12 and YUV420 formats, but I don't know
how to test it, because the ".*kms_plane@pixel-format*" IGT test doesn't
support these formats (and it also no longer works with my hack).

Do you know how to test it?

Best Regards,
---
Igor Torrente

>
>> +
>> +format_transform_func get_fmt_transform_function(u32 format);
>> +
>> +format_transform_func get_wb_fmt_transform_function(u32 format);
>> +
>> +#endif /* _VKMS_FORMATS_H_ */
>
>
> Good work!
>
>
> Thanks,
> pq


* Re: [PATCH v4 9/9] drm: vkms: Add support to the RGB565 format
  2022-02-10  9:50   ` Pekka Paalanen
@ 2022-02-25  1:03     ` Igor Torrente
  2022-02-25  9:43       ` Pekka Paalanen
  0 siblings, 1 reply; 31+ messages in thread
From: Igor Torrente @ 2022-02-25  1:03 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, Thomas Zimmermann, rodrigosiqueiramelo, airlied,
	dri-devel, melissa.srw, ~lkcamp/patches


Hi Pekka,

On Thu, Feb 10, 2022 at 6:50 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:

> On Fri, 21 Jan 2022 18:38:31 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
>
> > Adds this common format to vkms.
> >
> > This commit also adds new helper macros to deal with fixed-point
> > arithmetic.
> >
> > It was done to improve the precision of the conversion to ARGB16161616
> > since the "conversion ratio" is not an integer.
> >
> > Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> > ---
> > V3: Adapt the handlers to the new format introduced in patch 7 V3.
> > ---
> >  drivers/gpu/drm/vkms/vkms_formats.c   | 74 +++++++++++++++++++++++++++
> >  drivers/gpu/drm/vkms/vkms_formats.h   |  6 +++
> >  drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
> >  drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
> >  4 files changed, 86 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > index 661da39d1276..dc612882dd8c 100644
> > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > @@ -11,6 +11,8 @@ format_transform_func get_fmt_transform_function(u32 format)
> >               return &get_ARGB16161616;
> >       else if (format == DRM_FORMAT_XRGB16161616)
> >               return &XRGB16161616_to_ARGB16161616;
> > +     else if (format == DRM_FORMAT_RGB565)
> > +             return &RGB565_to_ARGB16161616;
> >       else
> >               return &XRGB8888_to_ARGB16161616;
> >  }
> > @@ -23,6 +25,8 @@ format_transform_func get_wb_fmt_transform_function(u32 format)
> >               return &convert_to_ARGB16161616;
> >       else if (format == DRM_FORMAT_XRGB16161616)
> >               return &convert_to_XRGB16161616;
> > +     else if (format == DRM_FORMAT_RGB565)
> > +             return &convert_to_RGB565;
> >       else
> >               return &convert_to_XRGB8888;
> >  }
> > @@ -33,6 +37,26 @@ static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
> >                                 + (x * frame_info->cpp);
> >  }
> >
> > +/*
> > + * FP stands for _Fixed Point_ and **not** _Float Point_
>
> Is it common in the kernel that FP always means fixed-point?
>

I cannot say for sure, but I don't think so. I put it there for people like me
who automatically think of floating-point because they have never worked with
fixed-point before.


>
> If there is any doubt about that, I'd suggest using "fixed" and "float"
> to avoid misunderstandings.
>
> And, since you are not supposed to use floats in the kernel unless you
> really really must and you do all the preparations necessary (which you
> don't here), maybe replace the "float" with a fraction.
>
> In other words, write a macro that takes (65535, 31) as arguments
> instead of a float, when converting to fixed-point. Then you don't have
> to use those strange decimal constants either.
>

That looks better; I will try to implement it.
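The "fraction instead of float" suggestion could be sketched like this. `FIXED_FROM_FRACTION` is a hypothetical macro name. It uses only integer arithmetic, so no floating-point constant ever appears in the source. With `FP_SCALE` 15, as in the patch, 65535/31 becomes 69272609, the truncated value of 65535 * 2^15 / 31. This is the same number `LF_TO_FP(2114.032258065)` truncates to.

```c
#include <stdint.h>

#define FP_SCALE 15

/* num/den as a fixed-point value with FP_SCALE fractional bits, integers only. */
#define FIXED_FROM_FRACTION(num, den) \
	((int32_t)(((int64_t)(num) << FP_SCALE) / (den)))

/* 5-bit channel -> 16-bit: multiply by 65535/31, then round half up. */
static uint16_t u5_to_u16(uint32_t v)
{
	int64_t fp = (int64_t)v * FIXED_FROM_FRACTION(65535, 31);

	return (uint16_t)((fp + (1 << (FP_SCALE - 1))) >> FP_SCALE);
}
```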


> > + * LF stands for Long Float (i.e. double)
> > + * The following macros help doing fixed point arithmetic.
> > + */
> > +/*
> > + * With FP scale 15 we have 17 and 15 bits of integer and fractional parts
> > + * respectively.
> > + *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
> > + * 31                                          0
> > + */
> > +#define FP_SCALE 15
> > +
> > +#define LF_TO_FP(a) ((a) * (u64)(1 << FP_SCALE))
> > +#define INT_TO_FP(a) ((a) << FP_SCALE)
> > +#define FP_MUL(a, b) ((s32)(((s64)(a) * (b)) >> FP_SCALE))
> > +#define FP_DIV(a, b) ((s32)(((s64)(a) << FP_SCALE) / (b)))
> > +/* This macro converts a fixed point number to int, rounding half up */
> > +#define FP_TO_INT_ROUND_UP(a) (((a) + (1 << (FP_SCALE - 1))) >> FP_SCALE)
> > +
> >  /*
> >   * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> >   *
> > @@ -125,6 +149,33 @@ void XRGB16161616_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> >       }
> >  }
> >
> > +void RGB565_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> > +                         struct line_buffer *stage_buffer)
> > +{
> > +     u16 *src_pixels = get_packed_src_addr(frame_info, y);
> > +     int x, x_limit = drm_rect_width(&frame_info->dst);
> > +
> > +     for (x = 0; x < x_limit; x++, src_pixels++) {
> > +             u16 rgb_565 = le16_to_cpu(*src_pixels);
> > +             int fp_r = INT_TO_FP((rgb_565 >> 11) & 0x1f);
> > +             int fp_g = INT_TO_FP((rgb_565 >> 5) & 0x3f);
> > +             int fp_b = INT_TO_FP(rgb_565 & 0x1f);
> > +
> > +             /*
> > +              * The magic constants are the "conversion ratios" and are calculated
> > +              * by dividing 65535 (2^16 - 1) by 31 (2^5 - 1) and 63 (2^6 - 1),
> > +              * respectively.
> > +              */
> > +             int fp_rb_ratio = LF_TO_FP(2114.032258065);
> > +             int fp_g_ratio = LF_TO_FP(1040.238095238);
> > +
> > +             stage_buffer[x].a = (u16)0xffff;
> > +             stage_buffer[x].r = FP_TO_INT_ROUND_UP(FP_MUL(fp_r, fp_rb_ratio));
> > +             stage_buffer[x].g = FP_TO_INT_ROUND_UP(FP_MUL(fp_g, fp_g_ratio));
> > +             stage_buffer[x].b = FP_TO_INT_ROUND_UP(FP_MUL(fp_b, fp_rb_ratio));
> > +     }
> > +}
> > +
> >
> >  /*
> >   * The following functions take a line of ARGB16161616 pixels from the
> > @@ -203,3 +254,26 @@ void convert_to_XRGB16161616(struct vkms_frame_info *frame_info, int y,
> >               dst_pixels[0] = src_buffer[x].b;
> >       }
> >  }
> > +
> > +void convert_to_RGB565(struct vkms_frame_info *frame_info, int y,
> > +                    struct line_buffer *src_buffer)
> > +{
> > +     int x, x_dst = frame_info->dst.x1;
> > +     u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> > +     int x_limit = drm_rect_width(&frame_info->dst);
> > +
> > +     for (x = 0; x < x_limit; x++, dst_pixels++) {
> > +             int fp_r = INT_TO_FP(src_buffer[x].r);
> > +             int fp_g = INT_TO_FP(src_buffer[x].g);
> > +             int fp_b = INT_TO_FP(src_buffer[x].b);
> > +
> > +             int fp_rb_ratio = LF_TO_FP(2114.032258065);
> > +             int fp_g_ratio = LF_TO_FP(1040.238095238);
>
> Are there any guarantees that this will not result in floating-point
> CPU instructions being used? Like a compiler error if it did?
>
> Yes, it's a constant expression, but I think there were some funny
> rules in C that floating-point operations may not be evaluated at
> compile time. Maybe I'm just paranoid?
>
>
Well, I cannot guarantee anything, but every time I intentionally or
unintentionally did anything related to floating-point, the kernel failed to
link.


>
> Thanks,
> pq
>
> > +
> > +             u16 r = FP_TO_INT_ROUND_UP(FP_DIV(fp_r, fp_rb_ratio));
> > +             u16 g = FP_TO_INT_ROUND_UP(FP_DIV(fp_g, fp_g_ratio));
> > +             u16 b = FP_TO_INT_ROUND_UP(FP_DIV(fp_b, fp_rb_ratio));
> > +
> > +             *dst_pixels = cpu_to_le16(r << 11 | g << 5 | b);
> > +     }
> > +}
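One way to sidestep both the fixed-point macros and the floating-point concern raised above is to convert each channel with a single integer multiply and a rounded division, for example with the kernel's DIV_ROUND_CLOSEST(). This is a swapped-in technique, not what the patch does. The sketch uses a local stand-in for the macro and hypothetical function names. It also works on 16-bit pixel values rather than raw bytes, so the patch's le16_to_cpu()/cpu_to_le16() handling is left out. With rounding to nearest in both directions, every RGB565 value survives a round trip through 16 bits per channel.

```c
#include <stdint.h>

#define DIV_ROUND_CLOSEST(x, d) (((x) + ((d) / 2)) / (d))  /* unsigned stand-in */

/* Channel value in [0, max] -> [0, 65535], rounded to nearest. */
static uint16_t chan_to_u16(uint32_t v, uint32_t max)
{
	return (uint16_t)DIV_ROUND_CLOSEST(v * 65535u, max);
}

/* [0, 65535] -> channel value in [0, max], rounded to nearest. */
static uint32_t u16_to_chan(uint16_t v, uint32_t max)
{
	return DIV_ROUND_CLOSEST((uint32_t)v * max, 65535u);
}

/* out[] is { a, r, g, b } */
static void rgb565_to_argb_u16(uint16_t px, uint16_t out[4])
{
	out[0] = 0xffff;
	out[1] = chan_to_u16((px >> 11) & 0x1f, 31);
	out[2] = chan_to_u16((px >> 5) & 0x3f, 63);
	out[3] = chan_to_u16(px & 0x1f, 31);
}

static uint16_t argb_u16_to_rgb565(const uint16_t in[4])
{
	return (uint16_t)(u16_to_chan(in[1], 31) << 11 |
			  u16_to_chan(in[2], 63) << 5 |
			  u16_to_chan(in[3], 31));
}
```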
> > diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> > index 22358f3a33ab..836d6e43ea90 100644
> > --- a/drivers/gpu/drm/vkms/vkms_formats.h
> > +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> > @@ -21,6 +21,9 @@ void get_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> >  void XRGB16161616_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> >                                 struct line_buffer *stage_buffer);
> >
> > +void RGB565_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> > +                         struct line_buffer *stage_buffer);
> > +
> >  void convert_to_ARGB8888(struct vkms_frame_info *frame_info, int y,
> >                        struct line_buffer *src_buffer);
> >
> > @@ -33,6 +36,9 @@ void convert_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> >  void convert_to_XRGB16161616(struct vkms_frame_info *frame_info, int y,
> >                            struct line_buffer *src_buffer);
> >
> > +void convert_to_RGB565(struct vkms_frame_info *frame_info, int y,
> > +                    struct line_buffer *src_buffer);
> > +
> >  typedef void (*format_transform_func)(struct vkms_frame_info *frame_info, int y,
> >                                     struct line_buffer *buffer);
> >
> > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > index 1d70c9e8f109..4643eefcdf29 100644
> > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > @@ -13,14 +13,16 @@
> >
> >  static const u32 vkms_formats[] = {
> >       DRM_FORMAT_XRGB8888,
> > -     DRM_FORMAT_XRGB16161616
> > +     DRM_FORMAT_XRGB16161616,
> > +     DRM_FORMAT_RGB565
> >  };
> >
> >  static const u32 vkms_plane_formats[] = {
> >       DRM_FORMAT_ARGB8888,
> >       DRM_FORMAT_XRGB8888,
> >       DRM_FORMAT_XRGB16161616,
> > -     DRM_FORMAT_ARGB16161616
> > +     DRM_FORMAT_ARGB16161616,
> > +     DRM_FORMAT_RGB565
> >  };
> >
> >  static struct drm_plane_state *
> > diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> > index 393d3fc7966f..1aaa630090d3 100644
> > --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> > +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> > @@ -15,7 +15,8 @@
> >  static const u32 vkms_wb_formats[] = {
> >       DRM_FORMAT_XRGB8888,
> >       DRM_FORMAT_XRGB16161616,
> > -     DRM_FORMAT_ARGB16161616
> > +     DRM_FORMAT_ARGB16161616,
> > +     DRM_FORMAT_RGB565
> >  };
> >
> >  static const struct drm_connector_funcs vkms_wb_connector_funcs = {
>
>

* Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-02-25  0:43     ` Igor Torrente
@ 2022-02-25  9:38       ` Pekka Paalanen
  2022-02-27 14:19         ` Igor Torrente
  0 siblings, 1 reply; 31+ messages in thread
From: Pekka Paalanen @ 2022-02-25  9:38 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, Thomas Zimmermann, rodrigosiqueiramelo, airlied,
	dri-devel, melissa.srw, ~lkcamp/patches, kernel test robot

On Thu, 24 Feb 2022 21:43:01 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Hi Pekka,
> 
> On 2/10/22 06:37, Pekka Paalanen wrote:
> > On Fri, 21 Jan 2022 18:38:29 -0300
> > Igor Torrente <igormtorrente@gmail.com> wrote:
> >  
> >> Currently the blend function only accepts XRGB_8888 and ARGB_8888
> >> as a color input.
> >>
> >> This patch refactors all the functions related to the plane composition
> >> to overcome this limitation.
> >>
> >> A new internal format (`struct pixel`) is introduced to deal with all
> >> possible inputs. It consists of 16-bit fields, one for each of
> >> the channels.
> >>
> >> Pixel blending is done using this internal format, and new handlers
> >> are added to convert each specific format to/from this internal format.
> >>
> >> So the blend operation depends on these handlers to convert to this common
> >> format. The blended result, if necessary, is converted to the writeback
> >> buffer format.
> >>
> >> This patch introduces three major differences to the blend function.
> >> 1 - All the planes are blended at once.
> >> 2 - The blend calculation is done per line instead of per pixel.
> >> 3 - It is responsible for calculating the CRC and writing the writeback
> >>      buffer (if necessary).
> >>
> >> These changes allow us to allocate much less memory for the intermediate
> >> buffer used in these operations, because we no longer need to hold all
> >> the lines of the intermediate image at once; just one line is
> >> enough.
> >>
> >> | Memory consumption (output dimensions) |
> >> |:--------------------------------------:|
> >> |       Current      |     This patch    |
> >> |:------------------:|:-----------------:|
> >> |   Width * Height   |     2 * Width     |
> >>
> >> Beyond memory, we also get a minor performance benefit from all
> >> these changes. Results from running the IGT tests `*kms_cursor_crc*`:
> >>
> >> |                 Frametime                  |
> >> |:------------------------------------------:|
> >> |  Implementation |  Current  |  This commit |
> >> |:---------------:|:---------:|:------------:|
> >> | frametime range |  8~22 ms  |    5~18 ms   |
> >> |     Average     |  10.0 ms  |    7.3 ms    |
> >>
> >> Reported-by: kernel test robot <lkp@intel.com>
> >> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> >> ---
> >> V2: Improves the performance drastically, by performing the operations
> >>      per-line and not per-pixel (Pekka Paalanen).
> >>      Minor improvements (Pekka Paalanen).
> >>
> >> V3: Changes the code to blend the planes all at once. This improves
> >>      performance, memory consumption, and removes much of the weirdness
> >>      of the V2 (Pekka Paalanen and me).
> >>      Minor improvements (Pekka Paalanen and me).
> >>
> >> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
> >> ---
> >>   drivers/gpu/drm/vkms/Makefile        |   1 +
> >>   drivers/gpu/drm/vkms/vkms_composer.c | 335 +++++++++++++--------------
> >>   drivers/gpu/drm/vkms/vkms_formats.c  | 138 +++++++++++
> >>   drivers/gpu/drm/vkms/vkms_formats.h  |  31 +++
> >>   4 files changed, 333 insertions(+), 172 deletions(-)
> >>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
> >>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h  
> >
> > Hi Igor,
> >
> > I'm really happy to see this, thanks!
> >
> > I still have some security/robustness and other comments below.
> >
> > I've deleted all the minus lines from the patch to make the new code
> > more clear.
> >  
> >>
> >> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
> >> index 72f779cbfedd..1b28a6a32948 100644
> >> --- a/drivers/gpu/drm/vkms/Makefile
> >> +++ b/drivers/gpu/drm/vkms/Makefile
> >> @@ -3,6 +3,7 @@ vkms-y := \
> >>      vkms_drv.o \
> >>      vkms_plane.o \
> >>      vkms_output.o \
> >> +    vkms_formats.o \
> >>      vkms_crtc.o \
> >>      vkms_composer.o \
> >>      vkms_writeback.o
> >> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> >> index 95029d2ebcac..9f70fcf84fb9 100644
> >> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> >> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> >> @@ -9,202 +9,210 @@
> >>   #include <drm/drm_vblank.h>
> >>
> >>   #include "vkms_drv.h"
> >> +#include "vkms_formats.h"
> >>
> >> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
> >>   {
> >> +    u32 new_color;
> >>
> >> +    new_color = (src * 0xffff + dst * (0xffff - alpha));
> >>
> >> +    return DIV_ROUND_UP(new_color, 0xffff);  
> >
> > Why round-up rather than the usual mathematical rounding?  
> 
> AFAIK, this is the only rounding macro present in the kernel. And if I
> understood correctly, it is the rounding toward positive infinity that we are
> all used to.

It should be pretty easy to round up from 0.5 and round down otherwise:
just add a different offset than DIV_ROUND_UP does.

Not having a ready-made macro and habits are not good
justifications. The justification needs to be mathematical.

The problem I see with DIV_ROUND_UP is that 0x00000001 gets
rounded to 0x0001, and anything that is even slightly above 0xfffe0000
gets rounded to 0xffff. So it seems to me that this adds a bias to the
result.

I'm not sure whether my intuition is right or wrong. I do know that

0xffff * 0xffff = 0xfffe0001

so values greater than 0xfffe0001 cannot occur.

That seems to mean that there is exactly one 32-bit value that
DIV_ROUND_UPs to 0x0000 and exactly one 32-bit value that DIV_ROUND_UPs
to 0xffff. That doesn't feel right to me.

Would need to compare to how the blending with real numbers would work.
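To make the comparison concrete, here is a user-space sketch of the premultiplied blend from the patch, computed once with DIV_ROUND_UP() and once with DIV_ROUND_CLOSEST() (which the kernel also provides alongside DIV_ROUND_UP(); both are re-defined below in simplified, unsigned-only form). blend_numerator(), blend_round_up() and blend_round_nearest() are hypothetical names. Assuming premultiplied input (src <= alpha), the numerator never exceeds 0xfffe0001. With the ceiling, only a numerator of 0 yields 0x0000, while every other output value, including 0xffff, is reached by 0xffff different numerators; round-to-nearest gives 0x0000 and 0xffff half-width buckets (32768 numerators each), which is how rounding the real-valued result behaves. The sketch also widens to uint32_t before multiplying, because a u16 operand is promoted to signed int and src * 0xffff can exceed INT_MAX.

```c
#include <stdint.h>

/* Simplified, unsigned-only re-definitions of the kernel macros. */
#define DIV_ROUND_UP(n, d)      (((n) + (d) - 1) / (d))
#define DIV_ROUND_CLOSEST(n, d) (((n) + (d) / 2) / (d))

/*
 * Premultiplied "over" for one channel: out = src + dst * (1 - alpha),
 * with 0xffff representing 1.0. Returns out * 0xffff, exactly.
 */
static uint32_t blend_numerator(uint16_t src, uint16_t dst, uint16_t alpha)
{
	return (uint32_t)src * 0xffffu + (uint32_t)dst * (0xffffu - alpha);
}

/* Ceiling: biased upwards by about half an LSB on average. */
static uint16_t blend_round_up(uint16_t src, uint16_t dst, uint16_t alpha)
{
	return (uint16_t)DIV_ROUND_UP(blend_numerator(src, dst, alpha), 0xffffu);
}

/* Round half up to the nearest integer: unbiased apart from the .5 tie. */
static uint16_t blend_round_nearest(uint16_t src, uint16_t dst, uint16_t alpha)
{
	return (uint16_t)DIV_ROUND_CLOSEST(blend_numerator(src, dst, alpha), 0xffffu);
}
```

For example, with src = 0, dst = 1 and alpha = 0xfffe the exact result is 1/65535: the ceiling returns 1, while round-to-nearest returns 0.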


> >>    */
> >> +static void blend(struct vkms_frame_info *wb_frame_info,  
> >
> > Using "wb" as short for writeback is... well, it's hard for me to
> > remember, at least. Could this not be named simply "writeback"?
> 
> IMHO it's better to use wb instead of writeback for consistency, given that wb
> is used throughout the vkms code.

Right, so that's a problem for me.

Is any other driver using wb for writeback?

I don't mind using wb in local variables, but in type names I would
personally prefer more descriptive names.


> >> +    int h_dst = drm_rect_height(&primary_plane_info->dst);
> >>      int y_limit = y_src + h_dst;
> >> +    int y, i;  
> >
> > It took me a while to understand that all these y-coordinates are CRTC
> > coordinates. Maybe call them crtc_y, crtc_y_begin, crtc_y_end,
> > crtc_y_height, etc.
> >  
> >> +
> >> +    for (y = y_src; y < y_limit; y++) {
> >> +            plane_fmt_func[0](primary_plane_info, y, output_buffer);  
> >
> > This is initializing output_buffer, right? So why do you have the TODO
> > comment about clearing the primary plane above?
> >
> > Is it because the primary plane may not cover the CRTC exactly, the
> > destination rectangle might be bigger or smaller?
> >
> > The output_buffer length should be the CRTC width, right?
> >
> > Maybe the special-casing the primary plane in this code is wrong.
> > crtc_y needs to iterate over the CRTC height starting from zero. Then,
> > you explicitly clear output_buffer to opaque background color, and
> > primary plane becomes just another plane in the array of active planes
> > with no special handling here.
> >
> > That will allow you to support overlay planes *below* the primary plane
> > (as is fairly common in non-PC hardware), and you can even support the
> > background color KMS property.  
> 
> I thought that the primary plane would always cover the entire screen exactly.

Nope. Maybe PC hardware has such limitations, but I'm quite sure there
are display controllers that do not require this. Therefore VKMS should
support the more generic case, and possibly offer a configuration knob
to reject atomic state where primary plane is not active or not
covering the whole CRTC in order to be able to test userspace against
such driver behaviour.

After all, background color KMS property exists.

> 
> So yeah, my patch assumes that the CRTC is the same size as the primary plane
> (and, if I'm not mistaken, the current version also assumes it).
> 
> But if this is not the case, where are the CRTC dimensions?
> Are they in the CRTC properties? drm_mode_config?
> 
> I couldn't find them.

It's the active area of the current video mode, I believe. How that
translates to DRM code, I don't know.
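As a rough illustration of the approach suggested above (iterate over the CRTC rows from zero, fill each row with the background colour, then blend the primary plane like any other plane in z-order), here is a user-space sketch. struct argb_u16, struct plane_sketch, blend_channel() and compose_row() are hypothetical and much simpler than the vkms structures, and the CRTC width is just a parameter. In DRM, the active area should be available from the CRTC state's display mode (the hdisplay/vdisplay fields of struct drm_display_mode), but that is a pointer to check rather than a verified answer.

```c
#include <stdint.h>

struct argb_u16 {
	uint16_t a, r, g, b;            /* premultiplied, 0xffff == 1.0 */
};

/* Hypothetical, simplified plane: a destination rectangle plus pixels. */
struct plane_sketch {
	int dst_x, dst_y, dst_w, dst_h; /* in CRTC coordinates */
	const struct argb_u16 *pixels;  /* dst_w * dst_h pixels, row-major */
};

/* out = src + dst * (1 - alpha), rounded to nearest. */
static uint16_t blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint32_t v = (uint32_t)src * 0xffffu + (uint32_t)dst * (0xffffu - alpha);

	return (uint16_t)((v + 0x7fffu) / 0xffffu);
}

/* Compose one CRTC row: background first, then every plane in z-order. */
static void compose_row(struct argb_u16 *row, int crtc_w, int crtc_y,
			struct argb_u16 background,
			const struct plane_sketch *planes, int nplanes)
{
	for (int x = 0; x < crtc_w; x++)
		row[x] = background;

	for (int i = 0; i < nplanes; i++) {
		const struct plane_sketch *p = &planes[i];

		if (crtc_y < p->dst_y || crtc_y >= p->dst_y + p->dst_h)
			continue;       /* plane does not touch this row */

		for (int x = 0; x < p->dst_w; x++) {
			int crtc_x = p->dst_x + x;
			const struct argb_u16 *s;

			if (crtc_x < 0 || crtc_x >= crtc_w)
				continue;       /* clip to the CRTC */

			s = &p->pixels[(crtc_y - p->dst_y) * p->dst_w + x];
			row[crtc_x].a = blend_channel(s->a, row[crtc_x].a, s->a);
			row[crtc_x].r = blend_channel(s->r, row[crtc_x].r, s->a);
			row[crtc_x].g = blend_channel(s->g, row[crtc_x].g, s->a);
			row[crtc_x].b = blend_channel(s->b, row[crtc_x].b, s->a);
		}
	}
}
```

With this shape, a primary plane smaller than the CRTC simply leaves the background visible, and an overlay placed earlier in the array ends up below the primary plane.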


> >> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> >> new file mode 100644
> >> index 000000000000..0d1838d1b835
> >> --- /dev/null
> >> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> >> @@ -0,0 +1,138 @@
> >> +/* SPDX-License-Identifier: GPL-2.0+ */
> >> +
> >> +#include <drm/drm_rect.h>
> >> +#include "vkms_formats.h"
> >> +
> >> +format_transform_func get_fmt_transform_function(u32 format)
> >> +{
> >> +    if (format == DRM_FORMAT_ARGB8888)
> >> +            return &ARGB8888_to_ARGB16161616;
> >> +    else
> >> +            return &XRGB8888_to_ARGB16161616;  
> >
> > In functions like this you should prepare for caller errors. Use a
> > switch, and fail any attempt to use a pixel format it doesn't support.
> > Failing is much better than silently producing garbage or worse: buffer
> > overruns when bytes-per-pixel is not what you expected.
> >
> > What to do on failure depends on whether the failure here is never
> > supposed to happen (follow the kernel style) e.g. malicious userspace
> > cannot trigger it, or if you actually use this function to define the
> > supported pixel formats.
> 
> No, I don't use this function to define the supported formats. They are defined in:
> - vkms_writeback.c:15
> - vkms_plane.c:14 and 22
> 
> And if I'm not mistaken, the DRM framework takes care of validation.

Then someone else comes, adds a new pixel format to those files, and
does not even realize get_fmt_transform_function() exists.

If you know that something must already ensure you cannot get
unsupported pixel formats in this function, then I guess some kind of
kernel panic here if you do get an unsupported pixel format would be
appropriate? Or an oops.

That would tell that other person, loud and clear, that they overlooked
something, assuming they test the code with the new format.
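A minimal user-space sketch of the switch-based lookup, assuming a hypothetical converter signature with placeholder bodies: an unknown format returns NULL instead of silently falling back to an XRGB8888 converter with the wrong bytes-per-pixel. In the kernel, the default branch would presumably also trigger something loud such as WARN_ON_ONCE() (or a BUG, as suggested above). filter_supported() shows the other option, deriving the advertised format list from the lookup itself. The FOURCC macro mirrors the packing of fourcc_code() in the DRM uapi header; FMT_*, get_transform_sketch() and filter_supported() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Same packing as fourcc_code() in include/uapi/drm/drm_fourcc.h. */
#define FOURCC(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
			    ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define FMT_ARGB8888 FOURCC('A', 'R', '2', '4')
#define FMT_XRGB8888 FOURCC('X', 'R', '2', '4')
#define FMT_RGB565   FOURCC('R', 'G', '1', '6')

/* Hypothetical converter signature; the bodies are placeholders. */
typedef void (*format_transform_func)(const void *src, void *dst, int width);

static void argb8888_to_argb_u16(const void *src, void *dst, int width)
{
	(void)src; (void)dst; (void)width;
}

static void xrgb8888_to_argb_u16(const void *src, void *dst, int width)
{
	(void)src; (void)dst; (void)width;
}

/* Unknown formats return NULL instead of falling back to a default. */
static format_transform_func get_transform_sketch(uint32_t format)
{
	switch (format) {
	case FMT_ARGB8888:
		return argb8888_to_argb_u16;
	case FMT_XRGB8888:
		return xrgb8888_to_argb_u16;
	default:
		return NULL;
	}
}

/*
 * Alternative: build the list advertised to userspace by asking the
 * lookup which candidate formats it knows. Returns the count written.
 */
static size_t filter_supported(const uint32_t *candidates, size_t n,
			       uint32_t *out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (get_transform_sketch(candidates[i]))
			out[count++] = candidates[i];
	return count;
}
```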

> 
> >
> > The latter means you'd have a list of all DRM pixel formats and then
> > you'd ask for each one if this function knows it, and if yes, you add
> > the format to the list of supported formats advertised to userspace. I
> > don't know if that would be fine by DRM coding style.
> >  
> >> +}
> >> +
> >> +format_transform_func get_wb_fmt_transform_function(u32 format)
> >> +{
> >> +    if (format == DRM_FORMAT_ARGB8888)
> >> +            return &convert_to_ARGB8888;
> >> +    else
> >> +            return &convert_to_XRGB8888;
> >> +}  
> >
> > I think you could move the above getter functions to the bottom of the
> > .c file, and make all the four *_to_* functions static, and remove them
> > from the header.  
> 
> OK. I will do that.
> 
> Question: what are the benefits of using static functions?

Making code more contained. When people see that a function is static,
they know it won't be directly referenced from any other file. This
makes understanding easier. It's hygiene too: make everything static
that could be static.

Sometimes it can also have other benefits, like the compiler
automatically inlining the whole thing, and not even emitting the
independent function code. It might also speed up linking, as a static
function cannot be the target of a reference from another object file.

> >  
> >> +
> >> +static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
> >> +{
> >> +    return frame_info->offset + (y * frame_info->pitch)
> >> +                              + (x * frame_info->cpp);
> >> +}
> >> +
> >> +/*
> >> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> >> + *
> >> + * @frame_info: Buffer metadata
> >> + * @x: The x (width) coordinate of the 2D buffer
> >> + * @y: The y (height) coordinate of the 2D buffer
> >> + *
> >> + * Takes the information stored in the frame_info, a pair of coordinates, and
> >> + * returns the address of the first color channel.
> >> + * This function assumes the channels are packed together, i.e. a color channel
> >> + * comes immediately after another in the memory. And therefore, this function
> >> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> >> + */
> >> +static void *packed_pixels_addr(struct vkms_frame_info *frame_info, int x, int y)
> >> +{
> >> +    int offset = pixel_offset(frame_info, x, y);
> >> +
> >> +    return (u8 *)frame_info->map[0].vaddr + offset;
> >> +}
> >> +
> >> +static void *get_packed_src_addr(struct vkms_frame_info *frame_info, int y)
> >> +{
> >> +    int x_src = frame_info->src.x1 >> 16;
> >> +    int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
> >> +
> >> +    return packed_pixels_addr(frame_info, x_src, y_src);
> >> +}
> >> +
> >> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
> >> +                          struct line_buffer *stage_buffer)  
> >
> > I'm fairly sure that DRM will one day add exactly ARGB16161616 format.

Oops, I think the format already exists.

> > But that will not be the format you use here (or it might be, but
> > purely accidentally and depending on machine endianess and whatnot), so
> > I would suggest inventing a new name. Also use the same name for the
> > struct to hold a single pixel.
> >
> > E.g. struct pixel_argb_u16  
> 
> I'm terrible at naming variables, functions, etc. I will end up with
> ARGB8888_to_argb_u16.

Sounds fine.
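For illustration, this is roughly what a line converter with the suggested naming could look like, as a user-space sketch. struct pixel_argb_u16 is a native-endian internal type, not a DRM format, and argb8888_to_argb_u16_line() is a hypothetical name. DRM_FORMAT_ARGB8888 is defined as [31:0] A:R:G:B 8:8:8:8 little-endian, so the bytes in memory are B, G, R, A. Multiplying by 257 (0x101) replicates the byte into both halves of the 16-bit value, mapping 0x00 to 0x0000 and 0xff to 0xffff exactly.

```c
#include <stdint.h>

/* Internal, native-endian pixel; deliberately not named after a DRM format. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Convert one line of DRM_FORMAT_ARGB8888 (bytes B, G, R, A in memory). */
static void argb8888_to_argb_u16_line(const uint8_t *src,
				      struct pixel_argb_u16 *dst, int width)
{
	for (int x = 0; x < width; x++, src += 4) {
		dst[x].b = (uint16_t)(src[0] * 257);   /* 0xAB -> 0xABAB */
		dst[x].g = (uint16_t)(src[1] * 257);
		dst[x].r = (uint16_t)(src[2] * 257);
		dst[x].a = (uint16_t)(src[3] * 257);
	}
}
```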

> 
> I have a patch implementing the NV12 and YUV420 formats, but
> I don't know how to test it, because the ".*kms_plane@pixel-format*" IGT test
> doesn't support these formats (and it also isn't working anymore with my hack).
> 
> Do you know how to test it?

I think the best way would be to teach IGT to test it. Then everyone
will automatically benefit from it.

I don't really know anything about IGT code.

FWIW, Weston has some YUV testing code I wrote in
tests/yuv-buffer-test.c for the color conversions, but it's very
limited in scope (only BT.601, only limited range, and it ignores chroma
siting).
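If a CPU-side reference is useful while IGT lacks coverage, a BT.601 limited-range YCbCr to RGB conversion (the same narrow scope as the Weston test mentioned above) can be written with the widely used 8-bit integer approximation below. ycbcr601_to_rgb() and clamp_shift8() are hypothetical helpers; they handle only 8-bit samples, ignore chroma siting, and the coefficients 298, 409, 100, 208 and 516 are the BT.601 factors (1.164, 1.596, 0.391, 0.813, 2.018) scaled by 256.

```c
#include <stdint.h>

/* v carries 8 fractional bits; clamp before shifting so nothing negative is shifted. */
static uint8_t clamp_shift8(int v)
{
	if (v < 0)
		return 0;
	v >>= 8;
	return v > 255 ? 255 : (uint8_t)v;
}

/* BT.601 limited range: Y in [16, 235], Cb/Cr in [16, 240] centred on 128. */
static void ycbcr601_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
			    uint8_t *r, uint8_t *g, uint8_t *b)
{
	int c = y - 16, d = cb - 128, e = cr - 128;

	*r = clamp_shift8(298 * c + 409 * e + 128);
	*g = clamp_shift8(298 * c - 100 * d - 208 * e + 128);
	*b = clamp_shift8(298 * c + 516 * d + 128);
}
```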


Thanks,
pq

* Re: [PATCH v4 9/9] drm: vkms: Add support to the RGB565 format
  2022-02-25  1:03     ` Igor Torrente
@ 2022-02-25  9:43       ` Pekka Paalanen
  0 siblings, 0 replies; 31+ messages in thread
From: Pekka Paalanen @ 2022-02-25  9:43 UTC (permalink / raw)
  To: Igor Torrente
  Cc: hamohammed.sa, Thomas Zimmermann, rodrigosiqueiramelo, airlied,
	dri-devel, melissa.srw, ~lkcamp/patches

On Thu, 24 Feb 2022 22:03:42 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Hi Pekka,
> 
> On Thu, Feb 10, 2022 at 6:50 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> 
> > On Fri, 21 Jan 2022 18:38:31 -0300
> > Igor Torrente <igormtorrente@gmail.com> wrote:
> >  
> > > Adds this common format to vkms.
> > >
> > > This commit also adds new helper macros to deal with fixed-point
> > > arithmetic.
> > >
> > > It was done to improve the precision of the conversion to ARGB16161616
> > > since the "conversion ratio" is not an integer.
> > >
> > > Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> > > ---
> > > V3: Adapt the handlers to the new format introduced in patch 7 V3.
> > > ---
> > >  drivers/gpu/drm/vkms/vkms_formats.c   | 74 +++++++++++++++++++++++++++
> > >  drivers/gpu/drm/vkms/vkms_formats.h   |  6 +++
> > >  drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
> > >  drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
> > >  4 files changed, 86 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > > index 661da39d1276..dc612882dd8c 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > @@ -11,6 +11,8 @@ format_transform_func get_fmt_transform_function(u32 format)
> > >               return &get_ARGB16161616;
> > >       else if (format == DRM_FORMAT_XRGB16161616)
> > >               return &XRGB16161616_to_ARGB16161616;
> > > +     else if (format == DRM_FORMAT_RGB565)
> > > +             return &RGB565_to_ARGB16161616;
> > >       else
> > >               return &XRGB8888_to_ARGB16161616;
> > >  }
> > > @@ -23,6 +25,8 @@ format_transform_func get_wb_fmt_transform_function(u32 format)
> > >               return &convert_to_ARGB16161616;
> > >       else if (format == DRM_FORMAT_XRGB16161616)
> > >               return &convert_to_XRGB16161616;
> > > +     else if (format == DRM_FORMAT_RGB565)
> > > +             return &convert_to_RGB565;
> > >       else
> > >               return &convert_to_XRGB8888;
> > >  }
> > > @@ -33,6 +37,26 @@ static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
> > >                                 + (x * frame_info->cpp);
> > >  }
> > >
> > > +/*
> > > + * FP stands for _Fixed Point_ and **not** _Floating Point_
> >
> > Is it common in the kernel that FP always means fixed-point?
> >  
> 
> I cannot say for sure, but I don't think so. I added it for people like me
> who automatically think of floating point because they have never worked
> with fixed point before.

Indeed, so do not use "FP" at all as an abbreviation, please. Use a
name or abbreviation that does not need a comment to prevent easy
misunderstandings.
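For illustration, fixed-point helpers can carry a name that needs no comment. The sketch below uses hypothetical int_to_fixed(), fixed_mul(), fixed_div() and fixed_to_int_round() helpers on 16.16 values stored in int64_t (these are not the patch's macros) to expand a 5-bit RGB565 channel, whose conversion ratio 0xffff / 31 is not an integer. Everything is integer arithmetic, so no floating-point instructions are involved.

```c
#include <stdint.h>

/* Hypothetical 16.16 fixed-point helpers; "fixed" is spelled out. */
#define FIXED_SHIFT 16
#define FIXED_ONE   ((int64_t)1 << FIXED_SHIFT)

static int64_t int_to_fixed(int64_t v)         { return v * FIXED_ONE; }
static int64_t fixed_mul(int64_t a, int64_t b) { return a * b / FIXED_ONE; }
static int64_t fixed_div(int64_t a, int64_t b) { return a * FIXED_ONE / b; }

/* Round to nearest; only valid for non-negative values. */
static int64_t fixed_to_int_round(int64_t v)
{
	return (v + FIXED_ONE / 2) / FIXED_ONE;
}

/* Expand a 5-bit channel (0..31) to 16 bits using the ratio 0xffff / 31. */
static uint16_t rgb565_5bit_to_u16(uint16_t v)
{
	int64_t ratio = fixed_div(int_to_fixed(0xffff), int_to_fixed(31));

	return (uint16_t)fixed_to_int_round(fixed_mul(int_to_fixed(v), ratio));
}
```

For these inputs the result matches plain integer rounding, (v * 0xffff + 15) / 31, which avoids a fixed-point type altogether.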


Thanks,
pq

* Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats
  2022-02-25  9:38       ` Pekka Paalanen
@ 2022-02-27 14:19         ` Igor Torrente
  0 siblings, 0 replies; 31+ messages in thread
From: Igor Torrente @ 2022-02-27 14:19 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: hamohammed.sa, Thomas Zimmermann, rodrigosiqueiramelo, airlied,
	dri-devel, melissa.srw, ~lkcamp/patches, kernel test robot

Hi Pekka,

On 2/25/22 05:38, Pekka Paalanen wrote:
> On Thu, 24 Feb 2022 21:43:01 -0300
> Igor Torrente <igormtorrente@gmail.com> wrote:
>
>> Hi Pekka,
>>
>> On 2/10/22 06:37, Pekka Paalanen wrote:
>>> On Fri, 21 Jan 2022 18:38:29 -0300
>>> Igor Torrente <igormtorrente@gmail.com> wrote:
>>>
>>>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>>>> as a color input.
>>>>
>>>> This patch refactors all the functions related to the plane composition
>>>> to overcome this limitation.
>>>>
>>>> A new internal format (`struct pixel`) is introduced to deal with all
>>>> possible inputs. It consists of 16-bit fields, one for each of
>>>> the channels.
>>>>
>>>> Pixel blending is done using this internal format, and new handlers
>>>> are added to convert each specific format to/from this internal format.
>>>>
>>>> So the blend operation depends on these handlers to convert to this common
>>>> format. The blended result, if necessary, is converted to the writeback
>>>> buffer format.
>>>>
>>>> This patch introduces three major differences to the blend function.
>>>> 1 - All the planes are blended at once.
>>>> 2 - The blend calculation is done per line instead of per pixel.
>>>> 3 - It is responsible for calculating the CRC and writing the writeback
>>>>       buffer (if necessary).
>>>>
>>>> These changes allow us to allocate much less memory for the intermediate
>>>> buffer used in these operations, because we no longer need to hold all
>>>> the lines of the intermediate image at once; just one line is
>>>> enough.
>>>>
>>>> | Memory consumption (output dimensions) |
>>>> |:--------------------------------------:|
>>>> |       Current      |     This patch    |
>>>> |:------------------:|:-----------------:|
>>>> |   Width * Height   |     2 * Width     |
>>>>
>>>> Beyond memory, we also get a minor performance benefit from all
>>>> these changes. Results from running the IGT tests `*kms_cursor_crc*`:
>>>>
>>>> |                 Frametime                  |
>>>> |:------------------------------------------:|
>>>> |  Implementation |  Current  |  This commit |
>>>> |:---------------:|:---------:|:------------:|
>>>> | frametime range |  8~22 ms  |    5~18 ms   |
>>>> |     Average     |  10.0 ms  |    7.3 ms    |
>>>>
>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>>>> ---
>>>> V2: Improves the performance drastically, by performing the operations
>>>>       per-line and not per-pixel (Pekka Paalanen).
>>>>       Minor improvements (Pekka Paalanen).
>>>>
>>>> V3: Changes the code to blend the planes all at once. This improves
>>>>       performance, memory consumption, and removes much of the weirdness
>>>>       of the V2 (Pekka Paalanen and me).
>>>>       Minor improvements (Pekka Paalanen and me).
>>>>
>>>> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
>>>> ---
>>>>    drivers/gpu/drm/vkms/Makefile        |   1 +
>>>>    drivers/gpu/drm/vkms/vkms_composer.c | 335 +++++++++++++--------------
>>>>    drivers/gpu/drm/vkms/vkms_formats.c  | 138 +++++++++++
>>>>    drivers/gpu/drm/vkms/vkms_formats.h  |  31 +++
>>>>    4 files changed, 333 insertions(+), 172 deletions(-)
>>>>    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>>>>    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
>>>
>>> Hi Igor,
>>>
>>> I'm really happy to see this, thanks!
>>>
>>> I still have some security/robustness and other comments below.
>>>
>>> I've deleted all the minus lines from the patch to make the new code
>>> more clear.
>>>
>>>>
>>>> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
>>>> index 72f779cbfedd..1b28a6a32948 100644
>>>> --- a/drivers/gpu/drm/vkms/Makefile
>>>> +++ b/drivers/gpu/drm/vkms/Makefile
>>>> @@ -3,6 +3,7 @@ vkms-y := \
>>>>       vkms_drv.o \
>>>>       vkms_plane.o \
>>>>       vkms_output.o \
>>>> +    vkms_formats.o \
>>>>       vkms_crtc.o \
>>>>       vkms_composer.o \
>>>>       vkms_writeback.o
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> index 95029d2ebcac..9f70fcf84fb9 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> @@ -9,202 +9,210 @@
>>>>    #include <drm/drm_vblank.h>
>>>>
>>>>    #include "vkms_drv.h"
>>>> +#include "vkms_formats.h"
>>>>
>>>> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>>>>    {
>>>> +    u32 new_color;
>>>>
>>>> +    new_color = (src * 0xffff + dst * (0xffff - alpha));
>>>>
>>>> +    return DIV_ROUND_UP(new_color, 0xffff);
>>>
>>> Why round-up rather than the usual mathematical rounding?
>>
>> AFAIK, this is the only rounding macro present in the kernel. And if I
>> understood correctly, it is the rounding toward positive infinity that we are
>> all used to.
>
> It should be pretty easy to round up from 0.5 and round down otherwise:
> just add a different offset than DIV_ROUND_UP does.
>
> Not having a ready-made macro and habits are not good
> justifications. The justification needs to be mathematical.

Actually, I wasn't trying to argue about anything. Sorry if it sounded
that way.

Also, I was confused and have to correct what I said:
DIV_ROUND_UP isn't the usual mathematical rounding; it is
a ceiling (it always rounds up).

I don't know why I thought that was the usual rounding.

>
> The problem I see with DIV_ROUND_UP is that 0x00000001 gets
> rounded to 0x0001, and anything that is even slightly above 0xfffe0000
> gets rounded to 0xffff. So it seems to me that this adds a bias to the
> result.
>
> I'm not sure whether my intuition is right or wrong. I do know that
>
> 0xffff * 0xffff = 0xfffe0001
>
> so values greater than 0xfffe0001 cannot occur.
>
> That seems to mean that there is exactly one 32-bit value that
> DIV_ROUND_UPs to 0x0000 and exactly one 32-bit value that DIV_ROUND_UPs
> to 0xffff. That doesn't feel right to me.
>
> Would need to compare to how the blending with real numbers would work.

Yeah, I agree, it makes sense to me.

I will change that, thanks!

>
>
>>>>     */
>>>> +static void blend(struct vkms_frame_info *wb_frame_info,
>>>
>>> Using "wb" as short for writeback is... well, it's hard for me to
>>> remember, at least. Could this not be named simply "writeback"?
>>
>> IMHO it's better to use wb instead of writeback for consistency, given that wb
>> is used throughout the vkms code.
>
> Right, so that's a problem for me.
>
> Is any other driver using wb for writeback?

Apparently, yes.

rcar_du_writeback.c:24:struct rcar_du_wb_conn_state {
rcar_du_writeback.c:29:#define to_rcar_wb_conn_state
rcar_du_writeback.c:219:        struct rcar_du_wb_job *rjob;
komeda_wb_connector.c:111:komeda_wb_connector_detect(...)
komeda_wb_connector.c:78:static const struct drm_encoder_helper_funcs komeda_wb_encoder_helper_funcs = {
amdgpu_device.c:1095:static void amdgpu_device_wb_fini(...)

>
> I don't mind using wb in local variables, but in type names I would
> personally prefer more descriptive names.
>
>
>>>> +    int h_dst = drm_rect_height(&primary_plane_info->dst);
>>>>       int y_limit = y_src + h_dst;
>>>> +    int y, i;
>>>
>>> It took me a while to understand that all these y-coordinates are CRTC
>>> coordinates. Maybe call them crtc_y, crtc_y_begin, crtc_y_end,
>>> crtc_y_height, etc.
>>>
>>>> +
>>>> +    for (y = y_src; y < y_limit; y++) {
>>>> +            plane_fmt_func[0](primary_plane_info, y, output_buffer);
>>>
>>> This is initializing output_buffer, right? So why do you have the TODO
>>> comment about clearing the primary plane above?
>>>
>>> Is it because the primary plane may not cover the CRTC exactly, the
>>> destination rectangle might be bigger or smaller?
>>>
>>> The output_buffer length should be the CRTC width, right?
>>>
>>> Maybe special-casing the primary plane in this code is wrong.
>>> crtc_y needs to iterate over the CRTC height starting from zero. Then,
>>> you explicitly clear output_buffer to opaque background color, and
>>> primary plane becomes just another plane in the array of active planes
>>> with no special handling here.
>>>
>>> That will allow you to support overlay planes *below* the primary plane
>>> (as is fairly common in non-PC hardware), and you can even support the
>>> background color KMS property.
>>
>> I thought that the primary plane would always cover the entire screen exactly.
>
> Nope. Maybe PC hardware has such limitations, but I'm quite sure there
> are display controllers that do not require this. Therefore VKMS should
> support the more generic case, and possibly offer a configuration knob
> to reject atomic state where primary plane is not active or not
> covering the whole CRTC in order to be able to test userspace against
> such driver behaviour.
>
> After all, background color KMS property exists.

Ok, makes sense.

>
>>
>> So yeah, my patch assumes that the CRTC is the same size as the primary plane
>> (and, if I'm not mistaken, the current version also assumes it).
>>
>> But if this is not the case, where are the CRTC dimensions?
>> Are they in the CRTC properties? drm_mode_config?
>>
>> I couldn't find them.
>
> It's the active area of the current video mode, I believe. How that
> translates to DRM code, I don't know.

I will try to find it.

>
>
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> new file mode 100644
>>>> index 000000000000..0d1838d1b835
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> @@ -0,0 +1,138 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0+ */
>>>> +
>>>> +#include <drm/drm_rect.h>
>>>> +#include "vkms_formats.h"
>>>> +
>>>> +format_transform_func get_fmt_transform_function(u32 format)
>>>> +{
>>>> +    if (format == DRM_FORMAT_ARGB8888)
>>>> +            return &ARGB8888_to_ARGB16161616;
>>>> +    else
>>>> +            return &XRGB8888_to_ARGB16161616;
>>>
>>> In functions like this you should prepare for caller errors. Use a
>>> switch, and fail any attempt to use a pixel format it doesn't support.
>>> Failing is much better than silently producing garbage or worse: buffer
>>> overruns when bytes-per-pixel is not what you expected.
>>>
>>> What to do on failure depends on whether the failure here is never
>>> supposed to happen (follow the kernel style) e.g. malicious userspace
>>> cannot trigger it, or if you actually use this function to define the
>>> supported pixel formats.
>>
>> No, I don't use this function to define the supported formats. They are defined in:
>> - vkms_writeback.c:15
>> - vkms_plane.c:14 and 22
>>
>> And if I'm not mistaken, the DRM framework takes care of validation.
>
> Then someone else comes, adds a new pixel format to those files, and
> does not even realize get_fmt_transform_function() exists.
>
> If you know that something must already ensure you cannot get
> unsupported pixel formats in this function, then I guess some kind of
> kernel panic here if you do get an unsupported pixel format would be
> appropriate? Or an oops.
>
> That would tell that other person, loud and clear, that they overlooked
> something, assuming they test the code with the new format.

Ok, got it. I will change it.

>
>>
>>>
>>> The latter means you'd have a list of all DRM pixel formats and then
>>> you'd ask for each one if this function knows it, and if yes, you add
>>> the format to the list of supported formats advertised to userspace. I
>>> don't know if that would be fine by DRM coding style.
>>>
>>>> +}
>>>> +
>>>> +format_transform_func get_wb_fmt_transform_function(u32 format)
>>>> +{
>>>> +    if (format == DRM_FORMAT_ARGB8888)
>>>> +            return &convert_to_ARGB8888;
>>>> +    else
>>>> +            return &convert_to_XRGB8888;
>>>> +}
>>>
>>> I think you could move the above getter functions to the bottom of the
>>> .c file, and make all the four *_to_* functions static, and remove them
>>> from the header.
>>
>> OK. I will do that.
>>
>> Question: what are the benefits of using static functions?
>
> Making code more contained. When people see that a function is static,
> they know it won't be directly referenced from any other file. This
> makes understanding easier. It's hygiene too: make everything static
> that could be static.
>
> Sometimes it can also have other benefits, like the compiler
> automatically inlining the whole thing, and not even emitting the
> independent function code. It might also speed up linking, as a static
> function cannot be the target of a reference from another file.
>
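A sketch of the file layout suggested above: the per-format helpers are `static`, and only the getter, placed at the bottom of the file, is visible to other files, so no forward declarations or header entries are needed for the helpers. It assumes a hypothetical per-pixel writeback signature and the `struct pixel_argb_u16` proposed later in this thread; truncating with `>> 8` is one possible 16-to-8-bit reduction, not necessarily what vkms does.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins mirroring the fourcc definitions in drm_fourcc.h. */
#define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
				 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define DRM_FORMAT_ARGB8888 fourcc_code('A', 'R', '2', '4')
#define DRM_FORMAT_XRGB8888 fourcc_code('X', 'R', '2', '4')

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Hypothetical per-pixel writeback converter. */
typedef uint32_t (*wb_pixel_func)(struct pixel_argb_u16 px);

/*
 * Private helpers: static, so no other file can reference them and the
 * compiler is free to inline them.
 */
static uint32_t argb_u16_to_ARGB8888(struct pixel_argb_u16 px)
{
	/* Keep the top 8 bits of each 16-bit channel (truncating). */
	return ((uint32_t)(px.a >> 8) << 24) | ((uint32_t)(px.r >> 8) << 16) |
	       ((uint32_t)(px.g >> 8) << 8) | (uint32_t)(px.b >> 8);
}

static uint32_t argb_u16_to_XRGB8888(struct pixel_argb_u16 px)
{
	/* The X byte is unused by the format; this sketch fills it with 0xff. */
	return 0xff000000u | (argb_u16_to_ARGB8888(px) & 0x00ffffffu);
}

/* The only non-static symbol, defined after the helpers it returns. */
wb_pixel_func get_wb_fmt_transform_function(uint32_t format)
{
	switch (format) {
	case DRM_FORMAT_ARGB8888:
		return argb_u16_to_ARGB8888;
	case DRM_FORMAT_XRGB8888:
		return argb_u16_to_XRGB8888;
	default:
		return NULL; /* the caller must treat this as a driver bug */
	}
}
```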
>>>
>>>> +
>>>> +static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
>>>> +{
>>>> +    return frame_info->offset + (y * frame_info->pitch)
>>>> +                              + (x * frame_info->cpp);
>>>> +}
>>>> +
>>>> +/*
>>>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>>>> + *
>>>> + * @frame_info: Buffer metadata
>>>> + * @x: The x (width) coordinate of the 2D buffer
>>>> + * @y: The y (height) coordinate of the 2D buffer
>>>> + *
>>>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>>>> + * returns the address of the first color channel.
>>>> + * This function assumes the channels are packed together, i.e. a color channel
>>>> + * comes immediately after another in the memory. And therefore, this function
>>>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>>>> + */
>>>> +static void *packed_pixels_addr(struct vkms_frame_info *frame_info, int x, int y)
>>>> +{
>>>> +    int offset = pixel_offset(frame_info, x, y);
>>>> +
>>>> +    return (u8 *)frame_info->map[0].vaddr + offset;
>>>> +}
>>>> +
>>>> +static void *get_packed_src_addr(struct vkms_frame_info *frame_info, int y)
>>>> +{
>>>> +    int x_src = frame_info->src.x1 >> 16;
>>>> +    int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>>>> +
>>>> +    return packed_pixels_addr(frame_info, x_src, y_src);
>>>> +}
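The addressing in `pixel_offset()` and `get_packed_src_addr()` can be checked in isolation. This sketch uses a hypothetical `struct frame_layout` holding only the fields those functions read; as with DRM plane state, the source rectangle is assumed to be in 16.16 fixed point, which is why it is shifted right by 16. Unlike the patch, it returns the byte offset as `size_t`, a choice that avoids `int` overflow on very large buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of vkms_frame_info used for addressing. */
struct frame_layout {
	size_t offset; /* byte offset of the first pixel in the mapping */
	size_t pitch;  /* bytes per row, including any padding */
	size_t cpp;    /* bytes per pixel (packed formats only) */
	int src_x1_fp; /* source rectangle x1, 16.16 fixed point */
	int src_y1_fp; /* source rectangle y1, 16.16 fixed point */
	int dst_y1;    /* destination rectangle y1, integer CRTC rows */
};

/* Byte offset of pixel (x, y) in a packed single-plane buffer. */
static size_t pixel_offset(const struct frame_layout *f, int x, int y)
{
	return f->offset + (size_t)y * f->pitch + (size_t)x * f->cpp;
}

/*
 * Address of the first source pixel feeding CRTC row @y: map the
 * destination row back into source coordinates, then add the integer
 * part of the source origin.
 */
static uint8_t *src_row_addr(uint8_t *vaddr, const struct frame_layout *f,
			     int y)
{
	int x_src = f->src_x1_fp >> 16;
	int y_src = y - f->dst_y1 + (f->src_y1_fp >> 16);

	return vaddr + pixel_offset(f, x_src, y_src);
}
```

For example, with a 64-byte pitch, 4 bytes per pixel, source origin (2, 1) and `dst_y1 == 10`, CRTC row 12 reads from source row 3, at byte 3 * 64 + 2 * 4 = 200.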
>>>> +
>>>> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>>>> +                          struct line_buffer *stage_buffer)
>>>
>>> I'm fairly sure that DRM will one day add exactly the ARGB16161616 format.
>
> Oops, I think the format already exists.
>
>>> But that will not be the format you use here (or it might be, but
>>> purely accidentally and depending on machine endianness and whatnot), so
>>> I would suggest inventing a new name. Also use the same name for the
>>> struct to hold a single pixel.
>>>
>>> E.g. struct pixel_argb_u16
>>
>> I'm terrible with names of variables, functions, etc. I will end up with
>> ARGB8888_to_argb_u16.
>
> Sounds fine.
>
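A sketch of the `struct pixel_argb_u16` naming suggested above, with per-line converters for the two existing formats. It assumes the little-endian layout defined for `DRM_FORMAT_ARGB8888` (bytes B, G, R, A in memory) and widens each 8-bit channel by multiplying by 257, which maps 0xff exactly to 0xffff; the scaling used in the actual patch may differ, and the function signatures are simplified.

```c
#include <stddef.h>
#include <stdint.h>

/* One pixel, 16 bits per channel, in CPU-native layout (not a DRM format). */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Widen 8 to 16 bits by replicating the byte: v * 257 == (v << 8) | v. */
static inline uint16_t u8_to_u16(uint8_t v)
{
	return (uint16_t)(v * 257);
}

/* Convert @width ARGB8888 pixels; memory byte order is B, G, R, A. */
static void ARGB8888_to_argb_u16(const uint8_t *src,
				 struct pixel_argb_u16 *out, size_t width)
{
	for (size_t i = 0; i < width; i++, src += 4) {
		out[i].b = u8_to_u16(src[0]);
		out[i].g = u8_to_u16(src[1]);
		out[i].r = u8_to_u16(src[2]);
		out[i].a = u8_to_u16(src[3]);
	}
}

/* Same layout, but the X byte is ignored and the pixel is fully opaque. */
static void XRGB8888_to_argb_u16(const uint8_t *src,
				 struct pixel_argb_u16 *out, size_t width)
{
	for (size_t i = 0; i < width; i++, src += 4) {
		out[i].b = u8_to_u16(src[0]);
		out[i].g = u8_to_u16(src[1]);
		out[i].r = u8_to_u16(src[2]);
		out[i].a = 0xffff;
	}
}
```

Because the struct names its channels instead of mirroring a memory layout, its meaning does not depend on machine endianness, which is the point of not calling it ARGB16161616.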
>>
>> I have a patch with the implementation of nv12 and YUV420 formats, but
>> I don't know how to test it because the ".*kms_plane@pixel-format*" igt test
>> doesn't support these formats (and it also isn't working anymore with my hack).
>>
>> Do you know how to test it?
>
> I think the best way would be to teach IGT to test it. Then everyone
> will automatically benefit from it.
>
> I don't really know anything about IGT code.
>
> FWIW, Weston has some YUV testing code I wrote in
> tests/yuv-buffer-test.c for the color conversions, but it's very
> limited scope (only BT.601, only limited range, and ignores chroma
> siting).

Right. I will do it later, as this patch series is already big enough.

Best Regards,
---
Igor Torrente

>
>
> Thanks,
> pq

Thread overview: 31+ messages
2022-01-21 21:38 [PATCH v4 0/9] Add new formats support to vkms Igor Torrente
2022-01-21 21:38 ` [PATCH v4 1/9] drm: vkms: Replace the deprecated drm_mode_config_init Igor Torrente
2022-02-08 10:02   ` Melissa Wen
2022-01-21 21:38 ` [PATCH v4 2/9] drm: vkms: Alloc the compose frame using vzalloc Igor Torrente
2022-02-08 10:14   ` Melissa Wen
2022-01-21 21:38 ` [PATCH v4 3/9] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES Igor Torrente
2022-02-08 10:16   ` Melissa Wen
2022-01-21 21:38 ` [PATCH v4 4/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info` Igor Torrente
2022-02-08 10:20   ` Melissa Wen
2022-01-21 21:38 ` [PATCH v4 5/9] drm: vkms: Add fb information to `vkms_writeback_job` Igor Torrente
2022-02-08 10:22   ` Melissa Wen
2022-01-21 21:38 ` [PATCH v4 6/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation Igor Torrente
2022-01-21 21:38 ` [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats Igor Torrente
2022-02-08 10:40   ` Melissa Wen
2022-02-09  0:58     ` Igor Torrente
2022-02-09 21:45       ` Melissa Wen
2022-02-21  1:02         ` Igor Torrente
2022-02-21  9:18           ` Pekka Paalanen
2022-02-22  1:13             ` Igor Torrente
2022-02-22  9:26               ` Pekka Paalanen
2022-02-10  9:37   ` Pekka Paalanen
2022-02-25  0:43     ` Igor Torrente
2022-02-25  9:38       ` Pekka Paalanen
2022-02-27 14:19         ` Igor Torrente
2022-01-21 21:38 ` [PATCH v4 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats Igor Torrente
2022-01-21 21:38 ` [PATCH v4 9/9] drm: vkms: Add support to the RGB565 format Igor Torrente
2022-02-08 10:50   ` Melissa Wen
2022-02-10  9:50   ` Pekka Paalanen
2022-02-25  1:03     ` Igor Torrente
2022-02-25  9:43       ` Pekka Paalanen
2022-02-08 11:03 ` [PATCH v4 0/9] Add new formats support to vkms Melissa Wen