linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading
@ 2024-02-26  8:46 Louis Chauvet
  2024-02-26  8:46 ` [PATCH v3 1/9] drm/vkms: Code formatting Louis Chauvet
                   ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: Louis Chauvet @ 2024-02-26  8:46 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, arthurgrillo, Jonathan Corbet,
	pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee,
	Louis Chauvet

This patchset is the second version of [1]. It is almost a complete 
rewrite to use a line-by-line algorithm for the composition.
It can be divided in three parts:
- PATCH 1 to 4: no functional change is intended, only some formatting and 
documenting
(PATCH 2 is taken from [2])
- PATCH 5: main patch for this series, it reintroduce the 
line-by-line algorithm
- PATCH 6 to 9: taken from Arthur's series [2], with sometimes adaptation 
to use the pixel-by-pixel algorithm.

The PATCH 5 aims to restore the line-by-line pixel reading algorithm. It 
was introduced in 8ba1648567e2 ("drm: vkms: Refactor the plane composer to 
accept new formats") but removed in 8ba1648567e2 ("drm: vkms: Refactor the 
plane composer to accept new formats") in a over-simplification effort. 
At this time, nobody noticed the performance impact of this commit. After 
the first iteration of my series, poeple notice performance impact, and it 
was the case. Pekka suggested to reimplement the line-by-line algorithm.

Expiriments on my side shown great improvement for the line-by-line 
algorithm, and the performances are the same as the original line-by-line 
algorithm. I targeted my effort to make the code working for all the 
rotations and translations. The usage of helpers from drm_rect_* avoid 
reimplementing existing logic.

The only "complex" part remaining is the clipping of the coordinate to 
avoid reading/writing outside of src/dst. Thus I added a lot of comments 
to help when someone will want to add some features (framebuffer resizing 
for example).

The YUV part is not mandatory for this series, but as my first effort was 
to help the integration of YUV, I decided to rebase Arthur's series on 
mine to help. I took [3], [4], [5] and [6] and adapted them to use the 
line-by-line reading. If I did something wrong here, please let me 
know.

My series was mainly tested with:
- kms_plane (for color conversions)
- kms_rotation_crc (for rotations of planes)
- kms_cursor_crc (for translations)
The benchmark used to measure the improvment was done with:
- kms_fb_stress

[1]: https://lore.kernel.org/r/20240201-yuv-v1-0-3ca376f27632@bootlin.com
[2]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-0-952fcaa5a193@riseup.net/
[3]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-3-952fcaa5a193@riseup.net/
[4]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-5-952fcaa5a193@riseup.net/
[5]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-6-952fcaa5a193@riseup.net/
[6]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-7-952fcaa5a193@riseup.net/

To: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
To: Melissa Wen <melissa.srw@gmail.com>
To: Maíra Canal <mairacanal@riseup.net>
To: Haneen Mohammed <hamohammed.sa@gmail.com>
To: Daniel Vetter <daniel@ffwll.ch>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
To: Maxime Ripard <mripard@kernel.org>
To: Thomas Zimmermann <tzimmermann@suse.de>
To: David Airlie <airlied@gmail.com>
To: arthurgrillo@riseup.net
To: Jonathan Corbet <corbet@lwn.net>
To: pekka.paalanen@haloniitty.fi
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: jeremie.dautheribes@bootlin.com
Cc: miquel.raynal@bootlin.com
Cc: thomas.petazzoni@bootlin.com
Cc: seanpaul@google.com
Cc: marcheu@google.com
Cc: nicolejadeyee@google.com
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>

Note: after my changes, those tests seems to pass, so [7] may need 
updating (I did not check, it was maybe already the case):
- kms_cursor_legacy@flip-vs-cursor-atomic
- kms_pipe_crc_basic@nonblocking-crc
- kms_pipe_crc_basic@nonblocking-crc-frame-sequence
- kms_writeback@writeback-pixel-formats
- kms_writeback@writeback-invalid-parameters
- kms_flip@flip-vs-absolute-wf_vblank-interruptible
And those tests pass, I did not investigate why the runners fails:
- kms_flip@flip-vs-expired-vblank-interruptible
- kms_flip@flip-vs-expired-vblank
- kms_flip@plain-flip-fb-recreate
- kms_flip@plain-flip-fb-recreate-interruptible
- kms_flip@plain-flip-ts-check-interruptible
- kms_cursor_legacy@cursorA-vs-flipA-toggle
- kms_pipe_crc_basic@nonblocking-crc
- kms_prop_blob@invalid-get-prop
- kms_flip@flip-vs-absolute-wf_vblank-interruptible
- kms_invalid_mode@zero-hdisplay
- kms_invalid_mode@bad-vtotal
- kms_cursor_crc.* (everything is SUCCEED or SKIP, but no fails)

[7]: https://lore.kernel.org/all/20240201065346.801038-1-vignesh.raman@collabora.com/

Changes in v3:
- Correction of remaining git-rebase artefacts
- Added Pekka in copy of this patch
- Link to v2: https://lore.kernel.org/r/20240223-yuv-v2-0-aa6be2827bb7@bootlin.com
Changes in v2:
- Rebased the series on top of drm-misc/drm-misc-net
- Extract the typedef for pixel_read/pixel_write
- Introduce the line-by-line algorithm per pixel format
- Add some documentation for existing and new code
- Port the series [1] to use line-by-line algorithm
- Link to v1: https://lore.kernel.org/r/20240201-yuv-v1-0-3ca376f27632@bootlin.com

---
Arthur Grillo (5):
      drm/vkms: Use drm_frame directly
      drm/vkms: Add YUV support
      drm/vkms: Add range and encoding properties to pixel_read function
      drm/vkms: Drop YUV formats TODO
      drm/vkms: Create KUnit tests for YUV conversions

Louis Chauvet (4):
      drm/vkms: Code formatting
      drm/vkms: write/update the documentation for pixel conversion and pixel write functions
      drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
      drm/vkms: Re-introduce line-per-line composition algorithm

 Documentation/gpu/vkms.rst                    |   3 +-
 drivers/gpu/drm/vkms/Makefile                 |   1 +
 drivers/gpu/drm/vkms/tests/.kunitconfig       |   4 +
 drivers/gpu/drm/vkms/tests/Makefile           |   3 +
 drivers/gpu/drm/vkms/tests/vkms_format_test.c | 155 +++++++
 drivers/gpu/drm/vkms/vkms_composer.c          | 233 ++++++++---
 drivers/gpu/drm/vkms/vkms_crtc.c              |   6 +-
 drivers/gpu/drm/vkms/vkms_drv.c               |   3 +-
 drivers/gpu/drm/vkms/vkms_drv.h               |  55 ++-
 drivers/gpu/drm/vkms/vkms_formats.c           | 565 +++++++++++++++++++++-----
 drivers/gpu/drm/vkms/vkms_formats.h           |  13 +-
 drivers/gpu/drm/vkms/vkms_plane.c             |  50 ++-
 drivers/gpu/drm/vkms/vkms_writeback.c         |  14 +-
 13 files changed, 915 insertions(+), 190 deletions(-)
---
base-commit: c079e2e113f2ec2803ba859bbb442a6ab82c96bd
change-id: 20240201-yuv-1337d90d9576

Best regards,
-- 
Louis Chauvet <louis.chauvet@bootlin.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/9] drm/vkms: Code formatting
  2024-02-26  8:46 [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
@ 2024-02-26  8:46 ` Louis Chauvet
  2024-02-26  8:46 ` [PATCH v3 2/9] drm/vkms: Use drm_frame directly Louis Chauvet
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Louis Chauvet @ 2024-02-26  8:46 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, arthurgrillo, Jonathan Corbet,
	pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee,
	Louis Chauvet

Few no-op changes to remove double spaces and fix wrong alignments.

Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c | 10 +++++-----
 drivers/gpu/drm/vkms/vkms_crtc.c     |  6 ++----
 drivers/gpu/drm/vkms/vkms_drv.c      |  3 +--
 drivers/gpu/drm/vkms/vkms_plane.c    |  9 ++++-----
 4 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index e7441b227b3c..c6d9b4a65809 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -96,7 +96,7 @@ static u16 lerp_u16(u16 a, u16 b, s64 t)
 	s64 a_fp = drm_int2fixp(a);
 	s64 b_fp = drm_int2fixp(b);
 
-	s64 delta = drm_fixp_mul(b_fp - a_fp,  t);
+	s64 delta = drm_fixp_mul(b_fp - a_fp, t);
 
 	return drm_fixp2int(a_fp + delta);
 }
@@ -302,8 +302,8 @@ static int compose_active_planes(struct vkms_writeback_job *active_wb,
 void vkms_composer_worker(struct work_struct *work)
 {
 	struct vkms_crtc_state *crtc_state = container_of(work,
-						struct vkms_crtc_state,
-						composer_work);
+							  struct vkms_crtc_state,
+							  composer_work);
 	struct drm_crtc *crtc = crtc_state->base.crtc;
 	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
 	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
@@ -328,7 +328,7 @@ void vkms_composer_worker(struct work_struct *work)
 		crtc_state->gamma_lut.base = (struct drm_color_lut *)crtc->state->gamma_lut->data;
 		crtc_state->gamma_lut.lut_length =
 			crtc->state->gamma_lut->length / sizeof(struct drm_color_lut);
-		max_lut_index_fp = drm_int2fixp(crtc_state->gamma_lut.lut_length  - 1);
+		max_lut_index_fp = drm_int2fixp(crtc_state->gamma_lut.lut_length - 1);
 		crtc_state->gamma_lut.channel_value2index_ratio = drm_fixp_div(max_lut_index_fp,
 									       u16_max_fp);
 
@@ -367,7 +367,7 @@ void vkms_composer_worker(struct work_struct *work)
 		drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);
 }
 
-static const char * const pipe_crc_sources[] = {"auto"};
+static const char *const pipe_crc_sources[] = { "auto" };
 
 const char *const *vkms_get_crc_sources(struct drm_crtc *crtc,
 					size_t *count)
diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
index 61e500b8c9da..7586ae2e1dd3 100644
--- a/drivers/gpu/drm/vkms/vkms_crtc.c
+++ b/drivers/gpu/drm/vkms/vkms_crtc.c
@@ -191,8 +191,7 @@ static int vkms_crtc_atomic_check(struct drm_crtc *crtc,
 		return ret;
 
 	drm_for_each_plane_mask(plane, crtc->dev, crtc_state->plane_mask) {
-		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state,
-								  plane);
+		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, plane);
 		WARN_ON(!plane_state);
 
 		if (!plane_state->visible)
@@ -208,8 +207,7 @@ static int vkms_crtc_atomic_check(struct drm_crtc *crtc,
 
 	i = 0;
 	drm_for_each_plane_mask(plane, crtc->dev, crtc_state->plane_mask) {
-		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state,
-								  plane);
+		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, plane);
 
 		if (!plane_state->visible)
 			continue;
diff --git a/drivers/gpu/drm/vkms/vkms_drv.c b/drivers/gpu/drm/vkms/vkms_drv.c
index dd0af086e7fa..83e6c9b9ff46 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.c
+++ b/drivers/gpu/drm/vkms/vkms_drv.c
@@ -81,8 +81,7 @@ static void vkms_atomic_commit_tail(struct drm_atomic_state *old_state)
 	drm_atomic_helper_wait_for_flip_done(dev, old_state);
 
 	for_each_old_crtc_in_state(old_state, crtc, old_crtc_state, i) {
-		struct vkms_crtc_state *vkms_state =
-			to_vkms_crtc_state(old_crtc_state);
+		struct vkms_crtc_state *vkms_state = to_vkms_crtc_state(old_crtc_state);
 
 		flush_work(&vkms_state->composer_work);
 	}
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index e5c625ab8e3e..90c09046e0af 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -112,15 +112,14 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	frame_info = vkms_plane_state->frame_info;
 	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
 	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
-	memcpy(&frame_info->rotated, &new_state->dst, sizeof(struct drm_rect));
 	frame_info->fb = fb;
 	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
 	drm_framebuffer_get(frame_info->fb);
 	frame_info->rotation = drm_rotation_simplify(new_state->rotation, DRM_MODE_ROTATE_0 |
-						     DRM_MODE_ROTATE_90 |
-						     DRM_MODE_ROTATE_270 |
-						     DRM_MODE_REFLECT_X |
-						     DRM_MODE_REFLECT_Y);
+									  DRM_MODE_ROTATE_90 |
+									  DRM_MODE_ROTATE_270 |
+									  DRM_MODE_REFLECT_X |
+									  DRM_MODE_REFLECT_Y);
 
 	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
 			drm_rect_height(&frame_info->rotated), frame_info->rotation);

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/9] drm/vkms: Use drm_frame directly
  2024-02-26  8:46 [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
  2024-02-26  8:46 ` [PATCH v3 1/9] drm/vkms: Code formatting Louis Chauvet
@ 2024-02-26  8:46 ` Louis Chauvet
  2024-02-26  8:46 ` [PATCH v3 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions Louis Chauvet
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Louis Chauvet @ 2024-02-26  8:46 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, arthurgrillo, Jonathan Corbet,
	pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee,
	Louis Chauvet

From: Arthur Grillo <arthurgrillo@riseup.net>

Remove intermidiary variables and access the variables directly from
drm_frame. These changes should be noop.

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_drv.h       |  3 ---
 drivers/gpu/drm/vkms/vkms_formats.c   | 12 +++++++-----
 drivers/gpu/drm/vkms/vkms_plane.c     |  3 ---
 drivers/gpu/drm/vkms/vkms_writeback.c |  5 -----
 4 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 8f5710debb1e..b4b357447292 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -31,9 +31,6 @@ struct vkms_frame_info {
 	struct drm_rect rotated;
 	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
 	unsigned int rotation;
-	unsigned int offset;
-	unsigned int pitch;
-	unsigned int cpp;
 };
 
 struct pixel_argb_u16 {
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 36046b12f296..172830a3936a 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -11,8 +11,10 @@
 
 static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
 {
-	return frame_info->offset + (y * frame_info->pitch)
-				  + (x * frame_info->cpp);
+	struct drm_framebuffer *fb = frame_info->fb;
+
+	return fb->offsets[0] + (y * fb->pitches[0])
+			      + (x * fb->format->cpp[0]);
 }
 
 /*
@@ -131,12 +133,12 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
 	u8 *src_pixels = get_packed_src_addr(frame_info, y);
 	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
 
-	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->cpp) {
+	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
 		int x_pos = get_x_position(frame_info, limit, x);
 
 		if (drm_rotation_90_or_270(frame_info->rotation))
 			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
-				+ frame_info->cpp * y;
+				+ frame_info->fb->format->cpp[0] * y;
 
 		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
 	}
@@ -223,7 +225,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
 	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
 	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
 
-	for (size_t x = 0; x < x_limit; x++, dst_pixels += frame_info->cpp)
+	for (size_t x = 0; x < x_limit; x++, dst_pixels += frame_info->fb->format->cpp[0])
 		wb->pixel_write(dst_pixels, &in_pixels[x]);
 }
 
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 90c09046e0af..d5203f531d96 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -124,9 +124,6 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
 			drm_rect_height(&frame_info->rotated), frame_info->rotation);
 
-	frame_info->offset = fb->offsets[0];
-	frame_info->pitch = fb->pitches[0];
-	frame_info->cpp = fb->format->cpp[0];
 	vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt);
 }
 
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index bc724cbd5e3a..c8582df1f739 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -149,11 +149,6 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
 	crtc_state->active_writeback = active_wb;
 	crtc_state->wb_pending = true;
 	spin_unlock_irq(&output->composer_lock);
-
-	wb_frame_info->offset = fb->offsets[0];
-	wb_frame_info->pitch = fb->pitches[0];
-	wb_frame_info->cpp = fb->format->cpp[0];
-
 	drm_writeback_queue_job(wb_conn, connector_state);
 	active_wb->pixel_write = get_pixel_write_function(wb_format);
 	drm_rect_init(&wb_frame_info->src, 0, 0, crtc_width, crtc_height);

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions
  2024-02-26  8:46 [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
  2024-02-26  8:46 ` [PATCH v3 1/9] drm/vkms: Code formatting Louis Chauvet
  2024-02-26  8:46 ` [PATCH v3 2/9] drm/vkms: Use drm_frame directly Louis Chauvet
@ 2024-02-26  8:46 ` Louis Chauvet
  2024-02-26 13:07   ` Arthur Grillo
  2024-02-26  8:46 ` [PATCH v3 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions Louis Chauvet
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Louis Chauvet @ 2024-02-26  8:46 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, arthurgrillo, Jonathan Corbet,
	pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee,
	Louis Chauvet

Add some documentation on pixel conversion functions.
Update of outdated comments for pixel_write functions.

Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c |  4 +++
 drivers/gpu/drm/vkms/vkms_drv.h      | 13 ++++++++
 drivers/gpu/drm/vkms/vkms_formats.c  | 58 ++++++++++++++++++++++++++++++------
 3 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index c6d9b4a65809..5b341222d239 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -189,6 +189,10 @@ static void blend(struct vkms_writeback_job *wb,
 
 	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
 
+	/*
+	 * The planes are composed line-by-line. It is a necessary complexity to avoid poor
+	 * blending performance.
+	 */
 	for (size_t y = 0; y < crtc_y_limit; y++) {
 		fill_background(&background_color, output_buffer);
 
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index b4b357447292..18086423a3a7 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -25,6 +25,17 @@
 
 #define VKMS_LUT_SIZE 256
 
+/**
+ * struct vkms_frame_info - structure to store the state of a frame
+ *
+ * @fb: backing drm framebuffer
+ * @src: source rectangle of this frame in the source framebuffer
+ * @dst: destination rectangle in the crtc buffer
+ * @map: see drm_shadow_plane_state@data
+ * @rotation: rotation applied to the source.
+ *
+ * @src and @dst should have the same size modulo the rotation.
+ */
 struct vkms_frame_info {
 	struct drm_framebuffer *fb;
 	struct drm_rect src, dst;
@@ -52,6 +63,8 @@ struct vkms_writeback_job {
  * vkms_plane_state - Driver specific plane state
  * @base: base plane state
  * @frame_info: data required for composing computation
+ * @pixel_read: function to read a pixel in this plane. The creator of a vkms_plane_state must
+ * ensure that this pointer is valid
  */
 struct vkms_plane_state {
 	struct drm_shadow_plane_state base;
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 172830a3936a..cb7a49b7c8e7 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -9,6 +9,17 @@
 
 #include "vkms_formats.h"
 
+/**
+ * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
+ * in the first plane
+ *
+ * @frame_info: Buffer metadata
+ * @x: The x coordinate of the wanted pixel in the buffer
+ * @y: The y coordinate of the wanted pixel in the buffer
+ *
+ * The caller must be aware that this offset is not always a pointer to a pixel. If individual
+ * pixel values are needed, they have to be extracted from the resulting block.
+ */
 static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
 {
 	struct drm_framebuffer *fb = frame_info->fb;
@@ -17,12 +28,13 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
 			      + (x * fb->format->cpp[0]);
 }
 
-/*
- * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
+/**
+ * packed_pixels_addr() - Get the pointer to the block containing the pixel at the given
+ * coordinates
  *
  * @frame_info: Buffer metadata
- * @x: The x(width) coordinate of the 2D buffer
- * @y: The y(Heigth) coordinate of the 2D buffer
+ * @x: The x(width) coordinate inside the plane
+ * @y: The y(height) coordinate inside the plane
  *
  * Takes the information stored in the frame_info, a pair of coordinates, and
  * returns the address of the first color channel.
@@ -53,6 +65,13 @@ static int get_x_position(const struct vkms_frame_info *frame_info, int limit, i
 	return x;
 }
 
+/*
+ * The following  functions take pixel data from the buffer and convert them to the format
+ * ARGB16161616 in out_pixel.
+ *
+ * They are used in the `vkms_compose_row` function to handle multiple formats.
+ */
+
 static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
 {
 	/*
@@ -145,12 +164,11 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
 }
 
 /*
- * The following  functions take an line of argb_u16 pixels from the
- * src_buffer, convert them to a specific format, and store them in the
- * destination.
+ * The following functions take one argb_u16 pixel and convert it to a specific format. The
+ * result is stored in @dst_pixels.
  *
- * They are used in the `compose_active_planes` to convert and store a line
- * from the src_buffer to the writeback buffer.
+ * They are used in the `vkms_writeback_row` to convert and store a pixel from the src_buffer to
+ * the writeback buffer.
  */
 static void argb_u16_to_ARGB8888(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
 {
@@ -216,6 +234,14 @@ static void argb_u16_to_RGB565(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
 	*pixels = cpu_to_le16(r << 11 | g << 5 | b);
 }
 
+/**
+ * Generic loop for all supported writeback format. It is executed just after the blending to
+ * write a line in the writeback buffer.
+ *
+ * @wb: Job where to insert the final image
+ * @src_buffer: Line to write
+ * @y: Row to write in the writeback buffer
+ */
 void vkms_writeback_row(struct vkms_writeback_job *wb,
 			const struct line_buffer *src_buffer, int y)
 {
@@ -229,6 +255,13 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
 		wb->pixel_write(dst_pixels, &in_pixels[x]);
 }
 
+/**
+ * Retrieve the correct read_pixel function for a specific format.
+ * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
+ * pointer is valid before using it in a vkms_plane_state.
+ *
+ * @format: 4cc of the format
+ */
 void *get_pixel_conversion_function(u32 format)
 {
 	switch (format) {
@@ -247,6 +280,13 @@ void *get_pixel_conversion_function(u32 format)
 	}
 }
 
+/**
+ * Retrieve the correct write_pixel function for a specific format.
+ * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
+ * pointer is valid before using it in a vkms_writeback_job.
+ *
+ * @format: 4cc of the format
+ */
 void *get_pixel_write_function(u32 format)
 {
 	switch (format) {

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
  2024-02-26  8:46 [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (2 preceding siblings ...)
  2024-02-26  8:46 ` [PATCH v3 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions Louis Chauvet
@ 2024-02-26  8:46 ` Louis Chauvet
  2024-02-26 12:44   ` Arthur Grillo
  2024-02-26  8:46 ` [PATCH v3 5/9] drm/vkms: Re-introduce line-per-line composition algorithm Louis Chauvet
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Louis Chauvet @ 2024-02-26  8:46 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, arthurgrillo, Jonathan Corbet,
	pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee,
	Louis Chauvet

Introduce two typedefs: pixel_read_t and pixel_write_t. It allows the
compiler to check if the passed functions take the correct arguments.
Such typedefs will help ensuring consistency across the code base in
case of update of these prototypes.

Introduce a check around the get_pixel_*_functions to avoid using a
nullptr as a function.

Document for those typedefs.

Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_drv.h       | 23 +++++++++++++++++++++--
 drivers/gpu/drm/vkms/vkms_formats.c   |  8 ++++----
 drivers/gpu/drm/vkms/vkms_formats.h   |  4 ++--
 drivers/gpu/drm/vkms/vkms_plane.c     |  9 ++++++++-
 drivers/gpu/drm/vkms/vkms_writeback.c |  9 ++++++++-
 5 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 18086423a3a7..886c885c8cf5 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -53,12 +53,31 @@ struct line_buffer {
 	struct pixel_argb_u16 *pixels;
 };
 
+/**
+ * typedef pixel_write_t - These functions are used to read a pixel from a
+ * `struct pixel_argb_u16*`, convert it in a specific format and write it in the @dst_pixels
+ * buffer.
+ *
+ * @dst_pixel: destination address to write the pixel
+ * @in_pixel: pixel to write
+ */
+typedef void (*pixel_write_t)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
+
 struct vkms_writeback_job {
 	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
 	struct vkms_frame_info wb_frame_info;
-	void (*pixel_write)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
+	pixel_write_t pixel_write;
 };
 
+/**
+ * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
+ * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
+ *
+ * @src_pixels: Pointer to the pixel to read
+ * @out_pixel: Pointer to write the converted pixel
+ */
+typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
+
 /**
  * vkms_plane_state - Driver specific plane state
  * @base: base plane state
@@ -69,7 +88,7 @@ struct vkms_writeback_job {
 struct vkms_plane_state {
 	struct drm_shadow_plane_state base;
 	struct vkms_frame_info *frame_info;
-	void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel);
+	pixel_read_t pixel_read;
 };
 
 struct vkms_plane {
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index cb7a49b7c8e7..1f5aeba57ad6 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -262,7 +262,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
  *
  * @format: 4cc of the format
  */
-void *get_pixel_conversion_function(u32 format)
+pixel_read_t get_pixel_read_function(u32 format)
 {
 	switch (format) {
 	case DRM_FORMAT_ARGB8888:
@@ -276,7 +276,7 @@ void *get_pixel_conversion_function(u32 format)
 	case DRM_FORMAT_RGB565:
 		return &RGB565_to_argb_u16;
 	default:
-		return NULL;
+		return (pixel_read_t)NULL;
 	}
 }
 
@@ -287,7 +287,7 @@ void *get_pixel_conversion_function(u32 format)
  *
  * @format: 4cc of the format
  */
-void *get_pixel_write_function(u32 format)
+pixel_write_t get_pixel_write_function(u32 format)
 {
 	switch (format) {
 	case DRM_FORMAT_ARGB8888:
@@ -301,6 +301,6 @@ void *get_pixel_write_function(u32 format)
 	case DRM_FORMAT_RGB565:
 		return &argb_u16_to_RGB565;
 	default:
-		return NULL;
+		return (pixel_write_t)NULL;
 	}
 }
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index cf59c2ed8e9a..3ecea4563254 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -5,8 +5,8 @@
 
 #include "vkms_drv.h"
 
-void *get_pixel_conversion_function(u32 format);
+pixel_read_t get_pixel_read_function(u32 format);
 
-void *get_pixel_write_function(u32 format);
+pixel_write_t get_pixel_write_function(u32 format);
 
 #endif /* _VKMS_FORMATS_H_ */
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index d5203f531d96..f68b1b03d632 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -106,6 +106,13 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 		return;
 
 	fmt = fb->format->format;
+	pixel_read_t pixel_read = get_pixel_read_function(fmt);
+
+	if (!pixel_read) {
+		DRM_WARN("Pixel format is not supported by VKMS planes. State is inchanged\n");
+		return;
+	}
+
 	vkms_plane_state = to_vkms_plane_state(new_state);
 	shadow_plane_state = &vkms_plane_state->base;
 
@@ -124,7 +131,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
 			drm_rect_height(&frame_info->rotated), frame_info->rotation);
 
-	vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt);
+	vkms_plane_state->pixel_read = pixel_read;
 }
 
 static int vkms_plane_atomic_check(struct drm_plane *plane,
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index c8582df1f739..c92b9f06c4a4 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -140,6 +140,13 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
 	if (!conn_state)
 		return;
 
+	pixel_write_t pixel_write = get_pixel_write_function(wb_format);
+
+	if (!pixel_write) {
+		DRM_WARN("Pixel format is not supported by VKMS writeback. State is inchanged\n");
+		return;
+	}
+
 	vkms_set_composer(&vkmsdev->output, true);
 
 	active_wb = conn_state->writeback_job->priv;
@@ -150,7 +157,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
 	crtc_state->wb_pending = true;
 	spin_unlock_irq(&output->composer_lock);
 	drm_writeback_queue_job(wb_conn, connector_state);
-	active_wb->pixel_write = get_pixel_write_function(wb_format);
+	active_wb->pixel_write = pixel_write;
 	drm_rect_init(&wb_frame_info->src, 0, 0, crtc_width, crtc_height);
 	drm_rect_init(&wb_frame_info->dst, 0, 0, crtc_width, crtc_height);
 }

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 5/9] drm/vkms: Re-introduce line-per-line composition algorithm
  2024-02-26  8:46 [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (3 preceding siblings ...)
  2024-02-26  8:46 ` [PATCH v3 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions Louis Chauvet
@ 2024-02-26  8:46 ` Louis Chauvet
  2024-02-26 14:14   ` Arthur Grillo
  2024-02-26  8:46 ` [PATCH v3 6/9] drm/vkms: Add YUV support Louis Chauvet
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Louis Chauvet @ 2024-02-26  8:46 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, arthurgrillo, Jonathan Corbet,
	pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee,
	Louis Chauvet

Re-introduce a line-by-line composition algorithm for each pixel format.
This allows more performance by not requiring an indirection per pixel
read. This patch is focused on readability of the code.

Line-by-line composition was introduced by [1] but rewritten back to
pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact
on performance, and it was merged.

This patch is almost a revert of [2], but in addition efforts have been
made to increase readability and maintainability of the rotation handling.
The blend function is now divided in two parts:
- Transformation of coordinates from the output referential to the source
referential
- Line conversion and blending

Most of the complexity of the rotation management is avoided by using
drm_rect_* helpers. The remaining complexity is around the clipping, to
avoid reading/writing outside source/destination buffers.

The pixel conversion is now done line-by-line, so the read_pixel_t was
replaced with read_pixel_line_t callback. This way the indirection is only
required once per line and per plane, instead of once per pixel and per
plane.

The read_line_t callbacks are very similar for most pixel format, but it
is required to avoid performance impact. Some helpers were created to
avoid code repetition:
- get_step_1x1: get the step in byte to reach next pixel block in a
  certain direction
- *_to_argb_u16: helpers to perform colors conversion. They should be
  inlined by the compiler, and they are used to avoid repetition between
  multiple variants of the same format (argb/xrgb and maybe in the
  future for formats like bgr formats).

This new algorithm was tested with:
- kms_plane (for color conversions)
- kms_rotation_crc (for rotations of planes)
- kms_cursor_crc (for translations of planes)
The performance gain was mesured with:
- kms_fb_stress

[1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
     new formats")
     https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@gmail.com/
[2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
     functionality")
     https://lore.kernel.org/all/20230418130525.128733-2-mcanal@igalia.com/

Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c | 219 +++++++++++++++++++++++-------
 drivers/gpu/drm/vkms/vkms_drv.h      |  24 +++-
 drivers/gpu/drm/vkms/vkms_formats.c  | 253 ++++++++++++++++++++++-------------
 drivers/gpu/drm/vkms/vkms_formats.h  |   2 +-
 drivers/gpu/drm/vkms/vkms_plane.c    |   8 +-
 5 files changed, 349 insertions(+), 157 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 5b341222d239..e555bf9c1aee 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -24,9 +24,10 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
 
 /**
  * pre_mul_alpha_blend - alpha blending equation
- * @frame_info: Source framebuffer's metadata
  * @stage_buffer: The line with the pixels from src_plane
  * @output_buffer: A line buffer that receives all the blends output
+ * @x_start: The start offset to avoid useless copy
+ * @count: The number of byte to copy
  *
  * Using the information from the `frame_info`, this blends only the
  * necessary pixels from the `stage_buffer` to the `output_buffer`
@@ -37,51 +38,23 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
  * drm_plane_create_blend_mode_property(). Also, this formula assumes a
  * completely opaque background.
  */
-static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
-				struct line_buffer *stage_buffer,
-				struct line_buffer *output_buffer)
+static void pre_mul_alpha_blend(
+	struct line_buffer *stage_buffer,
+	struct line_buffer *output_buffer,
+	int x_start,
+	int pixel_count)
 {
-	int x_dst = frame_info->dst.x1;
-	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
-	struct pixel_argb_u16 *in = stage_buffer->pixels;
-	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
-			    stage_buffer->n_pixels);
-
-	for (int x = 0; x < x_limit; x++) {
-		out[x].a = (u16)0xffff;
-		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
-		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
-		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
+	struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
+	struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
+
+	for (int i = 0; i < pixel_count; i++) {
+		out[i].a = (u16)0xffff;
+		out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
+		out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
+		out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
 	}
 }
 
-static int get_y_pos(struct vkms_frame_info *frame_info, int y)
-{
-	if (frame_info->rotation & DRM_MODE_REFLECT_Y)
-		return drm_rect_height(&frame_info->rotated) - y - 1;
-
-	switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) {
-	case DRM_MODE_ROTATE_90:
-		return frame_info->rotated.x2 - y - 1;
-	case DRM_MODE_ROTATE_270:
-		return y + frame_info->rotated.x1;
-	default:
-		return y;
-	}
-}
-
-static bool check_limit(struct vkms_frame_info *frame_info, int pos)
-{
-	if (drm_rotation_90_or_270(frame_info->rotation)) {
-		if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated))
-			return true;
-	} else {
-		if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2)
-			return true;
-	}
-
-	return false;
-}
 
 static void fill_background(const struct pixel_argb_u16 *background_color,
 			    struct line_buffer *output_buffer)
@@ -163,6 +136,37 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
 	}
 }
 
+/**
+ * direction_for_rotation() - Helper to get the correct reading direction for a specific rotation
+ *
+ * @rotation: rotation to analyze
+ */
+enum pixel_read_direction direction_for_rotation(unsigned int rotation)
+{
+	if (rotation & DRM_MODE_ROTATE_0) {
+		if (rotation & DRM_MODE_REFLECT_X)
+			return READ_LEFT;
+		else
+			return READ_RIGHT;
+	} else if (rotation & DRM_MODE_ROTATE_90) {
+		if (rotation & DRM_MODE_REFLECT_Y)
+			return READ_UP;
+		else
+			return READ_DOWN;
+	} else if (rotation & DRM_MODE_ROTATE_180) {
+		if (rotation & DRM_MODE_REFLECT_X)
+			return READ_RIGHT;
+		else
+			return READ_LEFT;
+	} else if (rotation & DRM_MODE_ROTATE_270) {
+		if (rotation & DRM_MODE_REFLECT_Y)
+			return READ_DOWN;
+		else
+			return READ_UP;
+	}
+	return READ_RIGHT;
+}
+
 /**
  * blend - blend the pixels from all planes and compute crc
  * @wb: The writeback frame buffer metadata
@@ -183,11 +187,11 @@ static void blend(struct vkms_writeback_job *wb,
 {
 	struct vkms_plane_state **plane = crtc_state->active_planes;
 	u32 n_active_planes = crtc_state->num_active_planes;
-	int y_pos;
 
 	const struct pixel_argb_u16 background_color = { .a = 0xffff };
 
 	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
+	size_t crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;
 
 	/*
 	 * The planes are composed line-by-line. It is a necessary complexity to avoid poor
@@ -198,22 +202,133 @@ static void blend(struct vkms_writeback_job *wb,
 
 		/* The active planes are composed associatively in z-order. */
 		for (size_t i = 0; i < n_active_planes; i++) {
-			y_pos = get_y_pos(plane[i]->frame_info, y);
+			struct vkms_plane_state *current_plane = plane[i];
 
-			if (!check_limit(plane[i]->frame_info, y_pos))
+			/* Avoid rendering useless lines */
+			if (y < current_plane->frame_info->dst.y1 ||
+			    y >= current_plane->frame_info->dst.y2) {
 				continue;
-
-			vkms_compose_row(stage_buffer, plane[i], y_pos);
-			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
-					    output_buffer);
+			}
+
+			/*
+			 * src_px is the line to copy. The initial coordinates are inside the
+			 * destination framebuffer, and then drm_rect_* helpers are used to
+			 * compute the correct position into the source framebuffer.
+			 */
+			struct drm_rect src_px = DRM_RECT_INIT(
+				current_plane->frame_info->dst.x1, y,
+				drm_rect_width(&current_plane->frame_info->dst), 1);
+			struct drm_rect tmp_src;
+
+			drm_rect_fp_to_int(&tmp_src, &current_plane->frame_info->src);
+
+			/*
+			 * [1]: Clamping src_px to the crtc_x_limit to avoid writing outside of the
+			 * destination buffer
+			 */
+			src_px.x2 = min_t(int, src_px.x2, (int)crtc_x_limit);
+
+			/*
+			 * Transform the coordinate x/y from the crtc to coordinates into
+			 * coordinates for the src buffer.
+			 *
+			 * - Cancel the offset of the dst buffer.
+			 * - Invert the rotation. This assumes that
+			 *   dst = drm_rect_rotate(src, rotation) (dst and src have the
+			 *   same size, but can be rotated).
+			 * - Apply the offset of the source rectangle to the coordinate.
+			 */
+			drm_rect_translate(&src_px, -current_plane->frame_info->dst.x1,
+					   -current_plane->frame_info->dst.y1);
+			drm_rect_rotate_inv(&src_px,
+					    drm_rect_width(&tmp_src),
+					    drm_rect_height(&tmp_src),
+					    current_plane->frame_info->rotation);
+			drm_rect_translate(&src_px, tmp_src.x1, tmp_src.y1);
+
+			/* Get the correct reading direction in the source buffer. */
+
+			enum pixel_read_direction direction =
+				direction_for_rotation(current_plane->frame_info->rotation);
+
+			int x_start = src_px.x1;
+			int y_start = src_px.y1;
+			int pixel_count;
+			/* [2]: Compute and clamp the number of pixel to read */
+			if (direction == READ_RIGHT || direction == READ_LEFT) {
+				/*
+				 * In horizontal reading, the src_px width is the number of pixel to
+				 * read
+				 */
+				pixel_count = drm_rect_width(&src_px);
+				if (x_start < 0) {
+					pixel_count += x_start;
+					x_start = 0;
+				}
+				if (x_start + pixel_count > current_plane->frame_info->fb->width) {
+					pixel_count =
+						(int)current_plane->frame_info->fb->width - x_start;
+				}
+			} else {
+				/*
+				 * In vertical reading, the src_px height is the number of pixel to
+				 * read
+				 */
+				pixel_count = drm_rect_height(&src_px);
+				if (y_start < 0) {
+					pixel_count += y_start;
+					y_start = 0;
+				}
+				if (y_start + pixel_count > current_plane->frame_info->fb->height) {
+					pixel_count =
+						(int)current_plane->frame_info->fb->width - y_start;
+				}
+			}
+
+			if (pixel_count <= 0) {
+				/* Nothing to read, so avoid multiple function calls for nothing */
+				continue;
+			}
+
+			/*
+			 * Modify the starting point to take in account the rotation
+			 *
+			 * src_px is the top-left corner, so when reading READ_LEFT or READ_TOP, it
+			 * must be changed to the top-right/bottom-left corner.
+			 */
+			if (direction == READ_LEFT) {
+				// x_start is now the right point
+				x_start += pixel_count - 1;
+			} else if (direction == READ_UP) {
+				// y_start is now the bottom point
+				y_start += pixel_count - 1;
+			}
+
+			/*
+			 * Perform the conversion and the blending
+			 *
+			 * Here we know that the read line (x_start, y_start, pixel_count) is
+			 * inside the source buffer [2] and we don't write outside the stage
+			 * buffer [1]
+			 */
+			current_plane->pixel_read_line(
+				current_plane->frame_info,
+				x_start,
+				y_start,
+				direction,
+				pixel_count,
+				&stage_buffer->pixels[current_plane->frame_info->dst.x1]);
+
+			pre_mul_alpha_blend(stage_buffer, output_buffer,
+					    current_plane->frame_info->dst.x1,
+					    pixel_count);
 		}
 
 		apply_lut(crtc_state, output_buffer);
 
 		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
-
 		if (wb)
-			vkms_writeback_row(wb, output_buffer, y_pos);
+			vkms_writeback_row(wb, output_buffer, y);
 	}
 }
 
@@ -224,7 +339,7 @@ static int check_format_funcs(struct vkms_crtc_state *crtc_state,
 	u32 n_active_planes = crtc_state->num_active_planes;
 
 	for (size_t i = 0; i < n_active_planes; i++)
-		if (!planes[i]->pixel_read)
+		if (!planes[i]->pixel_read_line)
 			return -1;
 
 	if (active_wb && !active_wb->pixel_write)
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 886c885c8cf5..0bf49b3c435b 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -39,7 +39,6 @@
 struct vkms_frame_info {
 	struct drm_framebuffer *fb;
 	struct drm_rect src, dst;
-	struct drm_rect rotated;
 	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
 	unsigned int rotation;
 };
@@ -69,14 +68,26 @@ struct vkms_writeback_job {
 	pixel_write_t pixel_write;
 };
 
+enum pixel_read_direction {
+	READ_UP,
+	READ_DOWN,
+	READ_LEFT,
+	READ_RIGHT
+};
+
 /**
- * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
+ * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
  * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
  *
- * @src_pixels: Pointer to the pixel to read
- * @out_pixel: Pointer to write the converted pixel
+ * @frame_info: Frame used as source for the pixel value
+ * @y: Y (height) coordinate in the source buffer
+ * @x_start: X (width) coordinate of the first pixel to copy
+ * @x_end: X (width) coordinate of the last pixel to copy
+ * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
+ *  x_end.
  */
-typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
+typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
+	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
 
 /**
  * vkms_plane_state - Driver specific plane state
@@ -88,7 +99,7 @@ typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
 struct vkms_plane_state {
 	struct drm_shadow_plane_state base;
 	struct vkms_frame_info *frame_info;
-	pixel_read_t pixel_read;
+	pixel_read_line_t pixel_read_line;
 };
 
 struct vkms_plane {
@@ -193,7 +204,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const char *source_name,
 /* Composer Support */
 void vkms_composer_worker(struct work_struct *work);
 void vkms_set_composer(struct vkms_output *out, bool enabled);
-void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y);
 void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer *src_buffer, int y);
 
 /* Writeback */
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 1f5aeba57ad6..46daea6d3ee9 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -11,21 +11,29 @@
 
 /**
  * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
- * in the first plane
  *
  * @frame_info: Buffer metadata
  * @x: The x coordinate of the wanted pixel in the buffer
  * @y: The y coordinate of the wanted pixel in the buffer
+ * @plane_index: The index of the plane to use
  *
  * The caller must be aware that this offset is not always a pointer to a pixel. If individual
  * pixel values are needed, they have to be extracted from the resulting block.
  */
-static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
+static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
+				   size_t plane_index)
 {
 	struct drm_framebuffer *fb = frame_info->fb;
-
-	return fb->offsets[0] + (y * fb->pitches[0])
-			      + (x * fb->format->cpp[0]);
+	const struct drm_format_info *format = frame_info->fb->format;
+	/* Directly using x and y to multiply pitches and format->ccp is not sufficient because
+	 * in some formats a block can represent multiple pixels.
+	 *
+	 * Dividing x and y by the block size allows to extract the correct offset of the block
+	 * containing the pixel.
+	 */
+	return fb->offsets[plane_index] +
+	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
+	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
 }
 
 /**
@@ -35,44 +43,56 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
  * @frame_info: Buffer metadata
  * @x: The x(width) coordinate inside the plane
  * @y: The y(height) coordinate inside the plane
+ * @plane_index: The index of the plane
  *
- * Takes the information stored in the frame_info, a pair of coordinates, and
- * returns the address of the first color channel.
- * This function assumes the channels are packed together, i.e. a color channel
- * comes immediately after another in the memory. And therefore, this function
- * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
+ * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
+ * of the block containing this pixel.
+ * The caller must be aware that this pointer is sometimes not directly a pixel, it needs some
+ * additional work to extract pixel color from this block.
  */
 static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
-				int x, int y)
+				int x, int y, size_t plane_index)
 {
-	size_t offset = pixel_offset(frame_info, x, y);
-
-	return (u8 *)frame_info->map[0].vaddr + offset;
+	return (u8 *)frame_info->map[0].vaddr + packed_pixels_offset(frame_info, x, y, plane_index);
 }
 
-static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
+/**
+ * get_step_1x1() - Common helper to compute the correct step value between each pixel to read in a
+ * certain direction.
+ * This must be used only with format where blockh == blockw == 1.
+ * In the case when direction is not a valid pixel_read_direction, the returned step is 0, so you
+ * must not rely on this result to create a loop variant.
+ *
+ * @fb Framebuffer to iter on
+ * @direction Direction of the reading
+ */
+static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction direction,
+			int plane_index)
 {
-	int x_src = frame_info->src.x1 >> 16;
-	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
-
-	return packed_pixels_addr(frame_info, x_src, y_src);
+	switch (direction) {
+	default:
+		DRM_ERROR("Invalid direction for pixel reading: %d\n", direction);
+		return 0;
+	case READ_RIGHT:
+		return fb->format->char_per_block[plane_index];
+	case READ_LEFT:
+		return -fb->format->char_per_block[plane_index];
+	case READ_DOWN:
+		return (int)fb->pitches[plane_index];
+	case READ_UP:
+		return -(int)fb->pitches[plane_index];
+	}
 }
 
-static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
-{
-	if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
-		return limit - x - 1;
-	return x;
-}
 
 /*
- * The following  functions take pixel data from the buffer and convert them to the format
+ * The following  functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
  * ARGB16161616 in out_pixel.
  *
- * They are used in the `vkms_compose_row` function to handle multiple formats.
+ * They are used in the `read_line`s functions to avoid duplicate work for some pixel formats.
  */
 
-static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void ARGB8888_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
 {
 	/*
 	 * The 257 is the "conversion ratio". This number is obtained by the
@@ -80,48 +100,26 @@ static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixe
 	 * the best color value in a pixel format with more possibilities.
 	 * A similar idea applies to others RGB color conversions.
 	 */
-	out_pixel->a = (u16)src_pixels[3] * 257;
-	out_pixel->r = (u16)src_pixels[2] * 257;
-	out_pixel->g = (u16)src_pixels[1] * 257;
-	out_pixel->b = (u16)src_pixels[0] * 257;
-}
-
-static void XRGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
-{
-	out_pixel->a = (u16)0xffff;
-	out_pixel->r = (u16)src_pixels[2] * 257;
-	out_pixel->g = (u16)src_pixels[1] * 257;
-	out_pixel->b = (u16)src_pixels[0] * 257;
+	out_pixel->a = (u16)a * 257;
+	out_pixel->r = (u16)r * 257;
+	out_pixel->g = (u16)g * 257;
+	out_pixel->b = (u16)b * 257;
 }
 
-static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void ARGB16161616_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
 {
-	u16 *pixels = (u16 *)src_pixels;
-
-	out_pixel->a = le16_to_cpu(pixels[3]);
-	out_pixel->r = le16_to_cpu(pixels[2]);
-	out_pixel->g = le16_to_cpu(pixels[1]);
-	out_pixel->b = le16_to_cpu(pixels[0]);
-}
-
-static void XRGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
-{
-	u16 *pixels = (u16 *)src_pixels;
-
-	out_pixel->a = (u16)0xffff;
-	out_pixel->r = le16_to_cpu(pixels[2]);
-	out_pixel->g = le16_to_cpu(pixels[1]);
-	out_pixel->b = le16_to_cpu(pixels[0]);
+	out_pixel->a = le16_to_cpu(a);
+	out_pixel->r = le16_to_cpu(r);
+	out_pixel->g = le16_to_cpu(g);
+	out_pixel->b = le16_to_cpu(b);
 }
 
-static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixel)
 {
-	u16 *pixels = (u16 *)src_pixels;
-
 	s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31));
 	s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63));
 
-	u16 rgb_565 = le16_to_cpu(*pixels);
+	u16 rgb_565 = le16_to_cpu(*pixel);
 	s64 fp_r = drm_int2fixp((rgb_565 >> 11) & 0x1f);
 	s64 fp_g = drm_int2fixp((rgb_565 >> 5) & 0x3f);
 	s64 fp_b = drm_int2fixp(rgb_565 & 0x1f);
@@ -132,34 +130,105 @@ static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
 	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
 }
 
-/**
- * vkms_compose_row - compose a single row of a plane
- * @stage_buffer: output line with the composed pixels
- * @plane: state of the plane that is being composed
- * @y: y coordinate of the row
+/*
+ * The following functions are read_line function for each pixel format supported by VKMS.
  *
- * This function composes a single row of a plane. It gets the source pixels
- * through the y coordinate (see get_packed_src_addr()) and goes linearly
- * through the source pixel, reading the pixels and converting it to
- * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
- * the source pixels are not traversed linearly. The source pixels are queried
- * on each iteration in order to traverse the pixels vertically.
+ * They read a line starting at the point @x_start,@y_start following the @direction. The result
+ * is stored in @out_pixel and in the format ARGB16161616.
+ *
+ * Those function are very similar, but it is required for performance reason. In the past, some
+ * experiment were done, and with a generic loop the performance are very reduced [1].
+ *
+ * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/
  */
-void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
+
+static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+			       enum pixel_read_direction direction, int count,
+			       struct pixel_argb_u16 out_pixel[])
+{
+	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+
+	int step = get_step_1x1(frame_info->fb, direction, 0);
+
+	while (count) {
+		u8 *px = (u8 *)src_pixels;
+
+		ARGB8888_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
+		out_pixel += 1;
+		src_pixels += step;
+		count--;
+	}
+}
+
+static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+			       enum pixel_read_direction direction, int count,
+			       struct pixel_argb_u16 out_pixel[])
+{
+	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+
+	int step = get_step_1x1(frame_info->fb, direction, 0);
+
+	while (count) {
+		u8 *px = (u8 *)src_pixels;
+
+		ARGB8888_to_argb_u16(out_pixel, 255, px[2], px[1], px[0]);
+		out_pixel += 1;
+		src_pixels += step;
+		count--;
+	}
+}
+
+static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+				   enum pixel_read_direction direction, int count,
+				   struct pixel_argb_u16 out_pixel[])
+{
+	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+
+	int step = get_step_1x1(frame_info->fb, direction, 0);
+
+	while (count) {
+		u16 *px = (u16 *)src_pixels;
+
+		ARGB16161616_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
+		out_pixel += 1;
+		src_pixels += step;
+		count--;
+	}
+}
+
+static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+				   enum pixel_read_direction direction, int count,
+				   struct pixel_argb_u16 out_pixel[])
+{
+	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+
+	int step = get_step_1x1(frame_info->fb, direction, 0);
+
+	while (count) {
+		u16 *px = (u16 *)src_pixels;
+
+		ARGB16161616_to_argb_u16(out_pixel, 0xFFFF, px[2], px[1], px[0]);
+		out_pixel += 1;
+		src_pixels += step;
+		count--;
+	}
+}
+
+static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+			     enum pixel_read_direction direction, int count,
+			     struct pixel_argb_u16 out_pixel[])
 {
-	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
-	struct vkms_frame_info *frame_info = plane->frame_info;
-	u8 *src_pixels = get_packed_src_addr(frame_info, y);
-	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
+	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
 
-	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
-		int x_pos = get_x_position(frame_info, limit, x);
+	int step = get_step_1x1(frame_info->fb, direction, 0);
 
-		if (drm_rotation_90_or_270(frame_info->rotation))
-			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
-				+ frame_info->fb->format->cpp[0] * y;
+	while (count) {
+		u16 *px = (u16 *)src_pixels;
 
-		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
+		RGB565_to_argb_u16(out_pixel, px);
+		out_pixel += 1;
+		src_pixels += step;
+		count--;
 	}
 }
 
@@ -247,7 +316,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
 {
 	struct vkms_frame_info *frame_info = &wb->wb_frame_info;
 	int x_dst = frame_info->dst.x1;
-	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y, 0);
 	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
 	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
 
@@ -256,27 +325,27 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
 }
 
 /**
- * Retrieve the correct read_pixel function for a specific format.
+ * Retrieve the correct read_line function for a specific format.
  * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
  * pointer is valid before using it in a vkms_plane_state.
  *
  * @format: 4cc of the format
  */
-pixel_read_t get_pixel_read_function(u32 format)
+pixel_read_line_t get_pixel_read_line_function(u32 format)
 {
 	switch (format) {
 	case DRM_FORMAT_ARGB8888:
-		return &ARGB8888_to_argb_u16;
+		return &ARGB8888_read_line;
 	case DRM_FORMAT_XRGB8888:
-		return &XRGB8888_to_argb_u16;
+		return &XRGB8888_read_line;
 	case DRM_FORMAT_ARGB16161616:
-		return &ARGB16161616_to_argb_u16;
+		return &ARGB16161616_read_line;
 	case DRM_FORMAT_XRGB16161616:
-		return &XRGB16161616_to_argb_u16;
+		return &XRGB16161616_read_line;
 	case DRM_FORMAT_RGB565:
-		return &RGB565_to_argb_u16;
+		return &RGB565_read_line;
 	default:
-		return (pixel_read_t)NULL;
+		return (pixel_read_line_t)NULL;
 	}
 }
 
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index 3ecea4563254..8d2bef95ff79 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -5,7 +5,7 @@
 
 #include "vkms_drv.h"
 
-pixel_read_t get_pixel_read_function(u32 format);
+pixel_read_line_t get_pixel_read_line_function(u32 format);
 
 pixel_write_t get_pixel_write_function(u32 format);
 
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index f68b1b03d632..58c1c74742b5 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -106,9 +106,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 		return;
 
 	fmt = fb->format->format;
-	pixel_read_t pixel_read = get_pixel_read_function(fmt);
+	pixel_read_line_t pixel_read_line = get_pixel_read_line_function(fmt);
 
-	if (!pixel_read) {
+	if (!pixel_read_line) {
 		DRM_WARN("Pixel format is not supported by VKMS planes. State is inchanged\n");
 		return;
 	}
@@ -128,10 +128,8 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 									  DRM_MODE_REFLECT_X |
 									  DRM_MODE_REFLECT_Y);
 
-	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
-			drm_rect_height(&frame_info->rotated), frame_info->rotation);
 
-	vkms_plane_state->pixel_read = pixel_read;
+	vkms_plane_state->pixel_read_line = pixel_read_line;
 }
 
 static int vkms_plane_atomic_check(struct drm_plane *plane,

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 6/9] drm/vkms: Add YUV support
  2024-02-26  8:46 [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (4 preceding siblings ...)
  2024-02-26  8:46 ` [PATCH v3 5/9] drm/vkms: Re-introduce line-per-line composition algorithm Louis Chauvet
@ 2024-02-26  8:46 ` Louis Chauvet
  2024-02-26  8:46 ` [PATCH v3 7/9] drm/vkms: Add range and encoding properties to pixel_read function Louis Chauvet
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Louis Chauvet @ 2024-02-26  8:46 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, arthurgrillo, Jonathan Corbet,
	pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee,
	Louis Chauvet

From: Arthur Grillo <arthurgrillo@riseup.net>

Add support to the YUV formats bellow:

- NV12
- NV16
- NV24
- NV21
- NV61
- NV42
- YUV420
- YUV422
- YUV444
- YVU420
- YVU422
- YVU444

The conversion matrices of each encoding and range were obtained by
rounding the values of the original conversion matrices multiplied by
2^8. This is done to avoid the use of fixed point operations.

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
[Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
callbacks for yuv formats]
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
 drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
 drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
 drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
 drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
 5 files changed, 295 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index e555bf9c1aee..54fc5161d565 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
 			 * buffer [1]
 			 */
 			current_plane->pixel_read_line(
-				current_plane->frame_info,
+				current_plane,
 				x_start,
 				y_start,
 				direction,
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 0bf49b3c435b..a825c36d458f 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -75,6 +75,8 @@ enum pixel_read_direction {
 	READ_RIGHT
 };
 
+struct vkms_plane_state;
+
 /**
  * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
  * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
@@ -86,8 +88,8 @@ enum pixel_read_direction {
  * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
  *  x_end.
  */
-typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
-	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
+typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
+	enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
 
 /**
  * vkms_plane_state - Driver specific plane state
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 46daea6d3ee9..515c80866a58 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
 	 */
 	return fb->offsets[plane_index] +
 	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
-	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
+	       (x / drm_format_info_block_height(format, plane_index)) *
+	       format->char_per_block[plane_index];
 }
 
 /**
@@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
 	}
 }
 
+/**
+ * get_subsampling() - Get the subsampling value on a specific direction
+ */
+static int get_subsampling(const struct drm_format_info *format,
+			   enum pixel_read_direction direction)
+{
+	if (direction == READ_LEFT || direction == READ_RIGHT)
+		return format->hsub;
+	else if (direction == READ_DOWN || direction == READ_UP)
+		return format->vsub;
+	return 1;
+}
+
+/**
+ * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
+ */
+static int get_subsampling_offset(const struct drm_format_info *format,
+				  enum pixel_read_direction direction, int x_start, int y_start)
+{
+	if (direction == READ_RIGHT || direction == READ_LEFT)
+		return x_start;
+	else if (direction == READ_DOWN || direction == READ_UP)
+		return y_start;
+	return 0;
+}
+
 
 /*
  * The following  functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
@@ -130,6 +157,87 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
 	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
 }
 
+static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
+{
+	s32 y_16, cb_16, cr_16;
+	s32 r_16, g_16, b_16;
+
+	y_16 = y - y_offset;
+	cb_16 = cb - 128;
+	cr_16 = cr - 128;
+
+	r_16 = m[0][0] * y_16 + m[0][1] * cb_16 + m[0][2] * cr_16;
+	g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
+	b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
+
+	*r = clamp(r_16, 0, 0xffff) >> 8;
+	*g = clamp(g_16, 0, 0xffff) >> 8;
+	*b = clamp(b_16, 0, 0xffff) >> 8;
+}
+
+static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
+			       enum drm_color_encoding encoding, enum drm_color_range range)
+{
+	static const s16 bt601_full[3][3] = {
+		{ 256, 0,   359 },
+		{ 256, -88, -183 },
+		{ 256, 454, 0 },
+	};
+	static const s16 bt601[3][3] = {
+		{ 298, 0,    409 },
+		{ 298, -100, -208 },
+		{ 298, 516,  0 },
+	};
+	static const s16 rec709_full[3][3] = {
+		{ 256, 0,   408 },
+		{ 256, -48, -120 },
+		{ 256, 476, 0 },
+	};
+	static const s16 rec709[3][3] = {
+		{ 298, 0,   459 },
+		{ 298, -55, -136 },
+		{ 298, 541, 0 },
+	};
+	static const s16 bt2020_full[3][3] = {
+		{ 256, 0,   377 },
+		{ 256, -42, -146 },
+		{ 256, 482, 0 },
+	};
+	static const s16 bt2020[3][3] = {
+		{ 298, 0,   430 },
+		{ 298, -48, -167 },
+		{ 298, 548, 0 },
+	};
+
+	u8 r = 0;
+	u8 g = 0;
+	u8 b = 0;
+	bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
+	unsigned int y_offset = full ? 0 : 16;
+
+	switch (encoding) {
+	case DRM_COLOR_YCBCR_BT601:
+		ycbcr2rgb(full ? bt601_full : bt601,
+			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
+		break;
+	case DRM_COLOR_YCBCR_BT709:
+		ycbcr2rgb(full ? rec709_full : rec709,
+			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
+		break;
+	case DRM_COLOR_YCBCR_BT2020:
+		ycbcr2rgb(full ? bt2020_full : bt2020,
+			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
+		break;
+	default:
+		pr_warn_once("Not supported color encoding\n");
+		break;
+	}
+
+	argb_u16->r = r * 257;
+	argb_u16->g = g * 257;
+	argb_u16->b = b * 257;
+}
+
 /*
  * The following functions are read_line function for each pixel format supported by VKMS.
  *
@@ -142,13 +250,13 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
  * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/
  */
 
-static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+static void ARGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
 			       enum pixel_read_direction direction, int count,
 			       struct pixel_argb_u16 out_pixel[])
 {
-	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
 
-	int step = get_step_1x1(frame_info->fb, direction, 0);
+	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
 
 	while (count) {
 		u8 *px = (u8 *)src_pixels;
@@ -160,13 +268,13 @@ static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
 	}
 }
 
-static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+static void XRGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
 			       enum pixel_read_direction direction, int count,
 			       struct pixel_argb_u16 out_pixel[])
 {
-	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
 
-	int step = get_step_1x1(frame_info->fb, direction, 0);
+	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
 
 	while (count) {
 		u8 *px = (u8 *)src_pixels;
@@ -178,13 +286,13 @@ static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
 	}
 }
 
-static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+static void ARGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
 				   enum pixel_read_direction direction, int count,
 				   struct pixel_argb_u16 out_pixel[])
 {
-	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
 
-	int step = get_step_1x1(frame_info->fb, direction, 0);
+	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
 
 	while (count) {
 		u16 *px = (u16 *)src_pixels;
@@ -196,13 +304,13 @@ static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
 	}
 }
 
-static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+static void XRGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
 				   enum pixel_read_direction direction, int count,
 				   struct pixel_argb_u16 out_pixel[])
 {
-	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
 
-	int step = get_step_1x1(frame_info->fb, direction, 0);
+	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
 
 	while (count) {
 		u16 *px = (u16 *)src_pixels;
@@ -214,13 +322,13 @@ static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
 	}
 }
 
-static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+static void RGB565_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
 			     enum pixel_read_direction direction, int count,
 			     struct pixel_argb_u16 out_pixel[])
 {
-	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
 
-	int step = get_step_1x1(frame_info->fb, direction, 0);
+	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
 
 	while (count) {
 		u16 *px = (u16 *)src_pixels;
@@ -232,6 +340,139 @@ static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, in
 	}
 }
 
+static void semi_planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
+				      enum pixel_read_direction direction, int count,
+				      struct pixel_argb_u16 out_pixel[])
+{
+	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
+	u8 *uv_plane = packed_pixels_addr(plane->frame_info,
+					  x_start / plane->frame_info->fb->format->hsub,
+					  y_start / plane->frame_info->fb->format->vsub,
+					  1);
+	struct pixel_yuv_u8 yuv_u8;
+	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
+	int step_uv = get_step_1x1(plane->frame_info->fb, direction, 1);
+	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
+	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
+							x_start, y_start); // 0
+
+	for (int i = 0; i < count; i++) {
+		yuv_u8.y = y_plane[0];
+		yuv_u8.u = uv_plane[0];
+		yuv_u8.v = uv_plane[1];
+
+		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
+				   plane->base.base.color_range);
+		out_pixel += 1;
+		y_plane += step_y;
+		if ((i + subsampling_offset + 1) % subsampling == 0)
+			uv_plane += step_uv;
+	}
+}
+
+static void semi_planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
+				      enum pixel_read_direction direction, int count,
+				      struct pixel_argb_u16 out_pixel[])
+{
+	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
+	u8 *vu_plane = packed_pixels_addr(plane->frame_info,
+					  x_start / plane->frame_info->fb->format->hsub,
+					  y_start / plane->frame_info->fb->format->vsub,
+					  1);
+	struct pixel_yuv_u8 yuv_u8;
+	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
+	int step_vu = get_step_1x1(plane->frame_info->fb, direction, 1);
+	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
+	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
+							x_start, y_start);
+	for (int i = 0; i < count; i++) {
+		yuv_u8.y = y_plane[0];
+		yuv_u8.u = vu_plane[1];
+		yuv_u8.v = vu_plane[0];
+
+		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
+				   plane->base.base.color_range);
+		out_pixel += 1;
+		y_plane += step_y;
+		if ((i + subsampling_offset + 1) % subsampling == 0)
+			vu_plane += step_vu;
+	}
+}
+
+static void planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
+				 enum pixel_read_direction direction, int count,
+				 struct pixel_argb_u16 out_pixel[])
+{
+	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
+	u8 *u_plane = packed_pixels_addr(plane->frame_info,
+					 x_start / plane->frame_info->fb->format->hsub,
+					 y_start / plane->frame_info->fb->format->vsub,
+					 1);
+	u8 *v_plane = packed_pixels_addr(plane->frame_info,
+					 x_start / plane->frame_info->fb->format->hsub,
+					 y_start / plane->frame_info->fb->format->vsub,
+					 2);
+	struct pixel_yuv_u8 yuv_u8;
+	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
+	int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
+	int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
+	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
+	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
+							x_start, y_start);
+
+	for (int i = 0; i < count; i++) {
+		yuv_u8.y = *y_plane;
+		yuv_u8.u = *u_plane;
+		yuv_u8.v = *v_plane;
+
+		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
+				   plane->base.base.color_range);
+		out_pixel += 1;
+		y_plane += step_y;
+		if ((i + subsampling_offset + 1) % subsampling == 0) {
+			u_plane += step_u;
+			v_plane += step_v;
+		}
+	}
+}
+
+static void planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
+				 enum pixel_read_direction direction, int count,
+				 struct pixel_argb_u16 out_pixel[])
+{
+	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
+	u8 *v_plane = packed_pixels_addr(plane->frame_info,
+					 x_start / plane->frame_info->fb->format->hsub,
+					 y_start / plane->frame_info->fb->format->vsub,
+					 1);
+	u8 *u_plane = packed_pixels_addr(plane->frame_info,
+					 x_start / plane->frame_info->fb->format->hsub,
+					 y_start / plane->frame_info->fb->format->vsub,
+					 2);
+	struct pixel_yuv_u8 yuv_u8;
+	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
+	int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
+	int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
+	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
+	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
+							x_start, y_start);
+
+	for (int i = 0; i < count; i++) {
+		yuv_u8.y = *y_plane;
+		yuv_u8.u = *u_plane;
+		yuv_u8.v = *v_plane;
+
+		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
+				   plane->base.base.color_range);
+		out_pixel += 1;
+		y_plane += step_y;
+		if ((i + subsampling_offset + 1) % subsampling == 0) {
+			u_plane += step_u;
+			v_plane += step_v;
+		}
+	}
+}
+
 /*
  * The following functions take one argb_u16 pixel and convert it to a specific format. The
  * result is stored in @dst_pixels.
@@ -344,6 +585,22 @@ pixel_read_line_t get_pixel_read_line_function(u32 format)
 		return &XRGB16161616_read_line;
 	case DRM_FORMAT_RGB565:
 		return &RGB565_read_line;
+	case DRM_FORMAT_NV12:
+	case DRM_FORMAT_NV16:
+	case DRM_FORMAT_NV24:
+		return &semi_planar_yuv_read_line;
+	case DRM_FORMAT_NV21:
+	case DRM_FORMAT_NV61:
+	case DRM_FORMAT_NV42:
+		return &semi_planar_yvu_read_line;
+	case DRM_FORMAT_YUV420:
+	case DRM_FORMAT_YUV422:
+	case DRM_FORMAT_YUV444:
+		return &planar_yuv_read_line;
+	case DRM_FORMAT_YVU420:
+	case DRM_FORMAT_YVU422:
+	case DRM_FORMAT_YVU444:
+		return &planar_yvu_read_line;
 	default:
 		return (pixel_read_line_t)NULL;
 	}
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index 8d2bef95ff79..5a3a9e1328d8 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -9,4 +9,8 @@ pixel_read_line_t get_pixel_read_line_function(u32 format);
 
 pixel_write_t get_pixel_write_function(u32 format);
 
+struct pixel_yuv_u8 {
+	u8 y, u, v;
+};
+
 #endif /* _VKMS_FORMATS_H_ */
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 58c1c74742b5..427ca67c60ce 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -17,7 +17,19 @@ static const u32 vkms_formats[] = {
 	DRM_FORMAT_XRGB8888,
 	DRM_FORMAT_XRGB16161616,
 	DRM_FORMAT_ARGB16161616,
-	DRM_FORMAT_RGB565
+	DRM_FORMAT_RGB565,
+	DRM_FORMAT_NV12,
+	DRM_FORMAT_NV16,
+	DRM_FORMAT_NV24,
+	DRM_FORMAT_NV21,
+	DRM_FORMAT_NV61,
+	DRM_FORMAT_NV42,
+	DRM_FORMAT_YUV420,
+	DRM_FORMAT_YUV422,
+	DRM_FORMAT_YUV444,
+	DRM_FORMAT_YVU420,
+	DRM_FORMAT_YVU422,
+	DRM_FORMAT_YVU444
 };
 
 static struct drm_plane_state *

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 7/9] drm/vkms: Add range and encoding properties to pixel_read function
  2024-02-26  8:46 [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (5 preceding siblings ...)
  2024-02-26  8:46 ` [PATCH v3 6/9] drm/vkms: Add YUV support Louis Chauvet
@ 2024-02-26  8:46 ` Louis Chauvet
  2024-02-26  8:46 ` [PATCH v3 8/9] drm/vkms: Drop YUV formats TODO Louis Chauvet
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Louis Chauvet @ 2024-02-26  8:46 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, arthurgrillo, Jonathan Corbet,
	pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee,
	Louis Chauvet

From: Arthur Grillo <arthurgrillo@riseup.net>

Create range and encoding properties. This should be noop, as none of
the conversion functions need those properties.

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
[Louis Chauvet: retained only relevant parts]
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_plane.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 427ca67c60ce..95dfde297377 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -228,5 +228,14 @@ struct vkms_plane *vkms_plane_init(struct vkms_device *vkmsdev,
 	drm_plane_create_rotation_property(&plane->base, DRM_MODE_ROTATE_0,
 					   DRM_MODE_ROTATE_MASK | DRM_MODE_REFLECT_MASK);
 
+	drm_plane_create_color_properties(&plane->base,
+					  BIT(DRM_COLOR_YCBCR_BT601) |
+					  BIT(DRM_COLOR_YCBCR_BT709) |
+					  BIT(DRM_COLOR_YCBCR_BT2020),
+					  BIT(DRM_COLOR_YCBCR_LIMITED_RANGE) |
+					  BIT(DRM_COLOR_YCBCR_FULL_RANGE),
+					  DRM_COLOR_YCBCR_BT601,
+					  DRM_COLOR_YCBCR_FULL_RANGE);
+
 	return plane;
 }

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 8/9] drm/vkms: Drop YUV formats TODO
  2024-02-26  8:46 [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (6 preceding siblings ...)
  2024-02-26  8:46 ` [PATCH v3 7/9] drm/vkms: Add range and encoding properties to pixel_read function Louis Chauvet
@ 2024-02-26  8:46 ` Louis Chauvet
  2024-02-26  8:46 ` [PATCH v3 9/9] drm/vkms: Create KUnit tests for YUV conversions Louis Chauvet
  2024-02-26 12:29 ` [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Pekka Paalanen
  9 siblings, 0 replies; 19+ messages in thread
From: Louis Chauvet @ 2024-02-26  8:46 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, arthurgrillo, Jonathan Corbet,
	pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee,
	Louis Chauvet

From: Arthur Grillo <arthurgrillo@riseup.net>

VKMS has support for YUV formats now. Remove the task from the TODO
list.

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 Documentation/gpu/vkms.rst | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
index ba04ac7c2167..13b866c3617c 100644
--- a/Documentation/gpu/vkms.rst
+++ b/Documentation/gpu/vkms.rst
@@ -122,8 +122,7 @@ There's lots of plane features we could add support for:
 
 - Scaling.
 
-- Additional buffer formats, especially YUV formats for video like NV12.
-  Low/high bpp RGB formats would also be interesting.
+- Additional buffer formats. Low/high bpp RGB formats would be interesting.
 
 - Async updates (currently only possible on cursor plane using the legacy
   cursor api).

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 9/9] drm/vkms: Create KUnit tests for YUV conversions
  2024-02-26  8:46 [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (7 preceding siblings ...)
  2024-02-26  8:46 ` [PATCH v3 8/9] drm/vkms: Drop YUV formats TODO Louis Chauvet
@ 2024-02-26  8:46 ` Louis Chauvet
  2024-02-26 12:29 ` [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Pekka Paalanen
  9 siblings, 0 replies; 19+ messages in thread
From: Louis Chauvet @ 2024-02-26  8:46 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, arthurgrillo, Jonathan Corbet,
	pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee,
	Louis Chauvet

From: Arthur Grillo <arthurgrillo@riseup.net>

Create KUnit tests to test the conversion between YUV and RGB. Test each
conversion and range combination with some common colors.

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
[Louis Chauvet: fix minor formating issues (whitespace, double line)]
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/Makefile                 |   1 +
 drivers/gpu/drm/vkms/tests/.kunitconfig       |   4 +
 drivers/gpu/drm/vkms/tests/Makefile           |   3 +
 drivers/gpu/drm/vkms/tests/vkms_format_test.c | 155 ++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_formats.c           |   9 +-
 drivers/gpu/drm/vkms/vkms_formats.h           |   5 +
 6 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
index 1b28a6a32948..8d3e46dde635 100644
--- a/drivers/gpu/drm/vkms/Makefile
+++ b/drivers/gpu/drm/vkms/Makefile
@@ -9,3 +9,4 @@ vkms-y := \
 	vkms_writeback.o
 
 obj-$(CONFIG_DRM_VKMS) += vkms.o
+obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += tests/
diff --git a/drivers/gpu/drm/vkms/tests/.kunitconfig b/drivers/gpu/drm/vkms/tests/.kunitconfig
new file mode 100644
index 000000000000..70e378228cbd
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/.kunitconfig
@@ -0,0 +1,4 @@
+CONFIG_KUNIT=y
+CONFIG_DRM=y
+CONFIG_DRM_VKMS=y
+CONFIG_DRM_VKMS_KUNIT_TESTS=y
diff --git a/drivers/gpu/drm/vkms/tests/Makefile b/drivers/gpu/drm/vkms/tests/Makefile
new file mode 100644
index 000000000000..2d1df668569e
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += vkms_format_test.o
diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
new file mode 100644
index 000000000000..cb6d32ff115d
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <kunit/test.h>
+
+#include <drm/drm_fixed.h>
+#include <drm/drm_fourcc.h>
+#include <drm/drm_print.h>
+
+#include "../../drm_crtc_internal.h"
+
+#include "../vkms_drv.h"
+#include "../vkms_formats.h"
+
+#define TEST_BUFF_SIZE 50
+
+struct yuv_u8_to_argb_u16_case {
+	enum drm_color_encoding encoding;
+	enum drm_color_range range;
+	size_t n_colors;
+	struct format_pair {
+		char *name;
+		struct pixel_yuv_u8 yuv;
+		struct pixel_argb_u16 argb;
+	} colors[TEST_BUFF_SIZE];
+};
+
+static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
+	{
+		.encoding = DRM_COLOR_YCBCR_BT601,
+		.range = DRM_COLOR_YCBCR_FULL_RANGE,
+		.n_colors = 6,
+		.colors = {
+			{"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+			{"gray",  {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+			{"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+			{"red",   {0x4c, 0x55, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
+			{"green", {0x96, 0x2c, 0x15}, {0x0000, 0x0000, 0xffff, 0x0000}},
+			{"blue",  {0x1d, 0xff, 0x6b}, {0x0000, 0x0000, 0x0000, 0xffff}},
+		},
+	},
+	{
+		.encoding = DRM_COLOR_YCBCR_BT601,
+		.range = DRM_COLOR_YCBCR_LIMITED_RANGE,
+		.n_colors = 6,
+		.colors = {
+			{"white", {0xeb, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+			{"gray",  {0x7e, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+			{"black", {0x10, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+			{"red",   {0x51, 0x5a, 0xf0}, {0x0000, 0xffff, 0x0000, 0x0000}},
+			{"green", {0x91, 0x36, 0x22}, {0x0000, 0x0000, 0xffff, 0x0000}},
+			{"blue",  {0x29, 0xf0, 0x6e}, {0x0000, 0x0000, 0x0000, 0xffff}},
+		},
+	},
+	{
+		.encoding = DRM_COLOR_YCBCR_BT709,
+		.range = DRM_COLOR_YCBCR_FULL_RANGE,
+		.n_colors = 4,
+		.colors = {
+			{"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+			{"gray",  {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+			{"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+			{"red",   {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
+			{"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
+			{"blue",  {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
+		},
+	},
+	{
+		.encoding = DRM_COLOR_YCBCR_BT709,
+		.range = DRM_COLOR_YCBCR_LIMITED_RANGE,
+		.n_colors = 4,
+		.colors = {
+			{"white", {0xeb, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+			{"gray",  {0x7e, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+			{"black", {0x10, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+			{"red",   {0x3f, 0x66, 0xf0}, {0x0000, 0xffff, 0x0000, 0x0000}},
+			{"green", {0xad, 0x2a, 0x1a}, {0x0000, 0x0000, 0xffff, 0x0000}},
+			{"blue",  {0x20, 0xf0, 0x76}, {0x0000, 0x0000, 0x0000, 0xffff}},
+		},
+	},
+	{
+		.encoding = DRM_COLOR_YCBCR_BT2020,
+		.range = DRM_COLOR_YCBCR_FULL_RANGE,
+		.n_colors = 4,
+		.colors = {
+			{"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+			{"gray",  {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+			{"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+			{"red",   {0x43, 0x5c, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
+			{"green", {0xad, 0x24, 0x0b}, {0x0000, 0x0000, 0xffff, 0x0000}},
+			{"blue",  {0x0f, 0xff, 0x76}, {0x0000, 0x0000, 0x0000, 0xffff}},
+		},
+	},
+	{
+		.encoding = DRM_COLOR_YCBCR_BT2020,
+		.range = DRM_COLOR_YCBCR_LIMITED_RANGE,
+		.n_colors = 4,
+		.colors = {
+			{"white", {0xeb, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+			{"gray",  {0x7e, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+			{"black", {0x10, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+			{"red",   {0x4a, 0x61, 0xf0}, {0x0000, 0xffff, 0x0000, 0x0000}},
+			{"green", {0xa4, 0x2f, 0x19}, {0x0000, 0x0000, 0xffff, 0x0000}},
+			{"blue",  {0x1d, 0xf0, 0x77}, {0x0000, 0x0000, 0x0000, 0xffff}},
+		},
+	},
+};
+
+static void vkms_format_test_yuv_u8_to_argb_u16(struct kunit *test)
+{
+	const struct yuv_u8_to_argb_u16_case *param = test->param_value;
+	struct pixel_argb_u16 argb;
+
+	for (size_t i = 0; i < param->n_colors; i++) {
+		const struct format_pair *color = &param->colors[i];
+
+		yuv_u8_to_argb_u16(&argb, &color->yuv, param->encoding, param->range);
+
+		KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.a, color->argb.a), 257,
+				    "On the A channel of the color %s expected 0x%04x, got 0x%04x",
+				    color->name, color->argb.a, argb.a);
+		KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.r, color->argb.r), 257,
+				    "On the R channel of the color %s expected 0x%04x, got 0x%04x",
+				    color->name, color->argb.r, argb.r);
+		KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.g, color->argb.g), 257,
+				    "On the G channel of the color %s expected 0x%04x, got 0x%04x",
+				    color->name, color->argb.g, argb.g);
+		KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.b, color->argb.b), 257,
+				    "On the B channel of the color %s expected 0x%04x, got 0x%04x",
+				    color->name, color->argb.b, argb.b);
+	}
+}
+
+static void vkms_format_test_yuv_u8_to_argb_u16_case_desc(struct yuv_u8_to_argb_u16_case *t,
+							  char *desc)
+{
+	snprintf(desc, KUNIT_PARAM_DESC_SIZE, "%s - %s",
+		 drm_get_color_encoding_name(t->encoding), drm_get_color_range_name(t->range));
+}
+
+KUNIT_ARRAY_PARAM(yuv_u8_to_argb_u16, yuv_u8_to_argb_u16_cases,
+		  vkms_format_test_yuv_u8_to_argb_u16_case_desc);
+
+static struct kunit_case vkms_format_test_cases[] = {
+	KUNIT_CASE_PARAM(vkms_format_test_yuv_u8_to_argb_u16, yuv_u8_to_argb_u16_gen_params),
+	{}
+};
+
+static struct kunit_suite vkms_format_test_suite = {
+	.name = "vkms-format",
+	.test_cases = vkms_format_test_cases,
+};
+
+kunit_test_suite(vkms_format_test_suite);
+
+MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 515c80866a58..20dd23ce9051 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -7,6 +7,8 @@
 #include <drm/drm_rect.h>
 #include <drm/drm_fixed.h>
 
+#include <kunit/visibility.h>
+
 #include "vkms_formats.h"
 
 /**
@@ -175,8 +177,10 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
 	*b = clamp(b_16, 0, 0xffff) >> 8;
 }
 
-static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
-			       enum drm_color_encoding encoding, enum drm_color_range range)
+VISIBLE_IF_KUNIT void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16,
+					 const struct pixel_yuv_u8 *yuv_u8,
+					 enum drm_color_encoding encoding,
+					 enum drm_color_range range)
 {
 	static const s16 bt601_full[3][3] = {
 		{ 256, 0,   359 },
@@ -237,6 +241,7 @@ static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pix
 	argb_u16->g = g * 257;
 	argb_u16->b = b * 257;
 }
+EXPORT_SYMBOL_IF_KUNIT(yuv_u8_to_argb_u16);
 
 /*
  * The following functions are read_line function for each pixel format supported by VKMS.
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index 5a3a9e1328d8..4245a5c5e956 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -13,4 +13,9 @@ struct pixel_yuv_u8 {
 	u8 y, u, v;
 };
 
+#if IS_ENABLED(CONFIG_KUNIT)
+void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
+			enum drm_color_encoding encoding, enum drm_color_range range);
+#endif
+
 #endif /* _VKMS_FORMATS_H_ */

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading
  2024-02-26  8:46 [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (8 preceding siblings ...)
  2024-02-26  8:46 ` [PATCH v3 9/9] drm/vkms: Create KUnit tests for YUV conversions Louis Chauvet
@ 2024-02-26 12:29 ` Pekka Paalanen
  9 siblings, 0 replies; 19+ messages in thread
From: Pekka Paalanen @ 2024-02-26 12:29 UTC (permalink / raw)
  To: Louis Chauvet
  Cc: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, arthurgrillo, Jonathan Corbet,
	dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee

[-- Attachment #1: Type: text/plain, Size: 715 bytes --]

On Mon, 26 Feb 2024 09:46:30 +0100
Louis Chauvet <louis.chauvet@bootlin.com> wrote:

> This patchset is the second version of [1]. It is almost a complete 
> rewrite to use a line-by-line algorithm for the composition.
> It can be divided in three parts:
> - PATCH 1 to 4: no functional change is intended, only some formatting and 
> documenting
> (PATCH 2 is taken from [2])
> - PATCH 5: main patch for this series, it reintroduce the 
> line-by-line algorithm
> - PATCH 6 to 9: taken from Arthur's series [2], with sometimes adaptation 
> to use the pixel-by-pixel algorithm.

Hi Louis,

I'll skip v3 because I was still reviewing v2 while you posted this.
I'm done on v2 now.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
  2024-02-26  8:46 ` [PATCH v3 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions Louis Chauvet
@ 2024-02-26 12:44   ` Arthur Grillo
  2024-02-27 15:02     ` Louis Chauvet
  0 siblings, 1 reply; 19+ messages in thread
From: Arthur Grillo @ 2024-02-26 12:44 UTC (permalink / raw)
  To: Louis Chauvet, Rodrigo Siqueira, Melissa Wen, Maíra Canal,
	Haneen Mohammed, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Jonathan Corbet, pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee



On 26/02/24 05:46, Louis Chauvet wrote:
> Introduce two typedefs: pixel_read_t and pixel_write_t. It allows the
> compiler to check if the passed functions take the correct arguments.
> Such typedefs will help ensuring consistency across the code base in
> case of update of these prototypes.
> 
> Introduce a check around the get_pixel_*_functions to avoid using a
> nullptr as a function.
> 
> Document for those typedefs.
> 
> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> ---
>  drivers/gpu/drm/vkms/vkms_drv.h       | 23 +++++++++++++++++++++--
>  drivers/gpu/drm/vkms/vkms_formats.c   |  8 ++++----
>  drivers/gpu/drm/vkms/vkms_formats.h   |  4 ++--
>  drivers/gpu/drm/vkms/vkms_plane.c     |  9 ++++++++-
>  drivers/gpu/drm/vkms/vkms_writeback.c |  9 ++++++++-
>  5 files changed, 43 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 18086423a3a7..886c885c8cf5 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -53,12 +53,31 @@ struct line_buffer {
>  	struct pixel_argb_u16 *pixels;
>  };
>  
> +/**
> + * typedef pixel_write_t - These functions are used to read a pixel from a
> + * `struct pixel_argb_u16*`, convert it in a specific format and write it in the @dst_pixels
> + * buffer.
> + *
> + * @dst_pixel: destination address to write the pixel
> + * @in_pixel: pixel to write
> + */
> +typedef void (*pixel_write_t)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
> +
>  struct vkms_writeback_job {
>  	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
>  	struct vkms_frame_info wb_frame_info;
> -	void (*pixel_write)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
> +	pixel_write_t pixel_write;
>  };
>  
> +/**
> + * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
> + * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
> + *
> + * @src_pixels: Pointer to the pixel to read
> + * @out_pixel: Pointer to write the converted pixel
> + */
> +typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
> +
>  /**
>   * vkms_plane_state - Driver specific plane state
>   * @base: base plane state
> @@ -69,7 +88,7 @@ struct vkms_writeback_job {
>  struct vkms_plane_state {
>  	struct drm_shadow_plane_state base;
>  	struct vkms_frame_info *frame_info;
> -	void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel);
> +	pixel_read_t pixel_read;
>  };
>  
>  struct vkms_plane {
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index cb7a49b7c8e7..1f5aeba57ad6 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -262,7 +262,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
>   *
>   * @format: 4cc of the format
>   */
> -void *get_pixel_conversion_function(u32 format)
> +pixel_read_t get_pixel_read_function(u32 format)
>  {
>  	switch (format) {
>  	case DRM_FORMAT_ARGB8888:
> @@ -276,7 +276,7 @@ void *get_pixel_conversion_function(u32 format)
>  	case DRM_FORMAT_RGB565:
>  		return &RGB565_to_argb_u16;
>  	default:
> -		return NULL;
> +		return (pixel_read_t)NULL;
>  	}
>  }
>  
> @@ -287,7 +287,7 @@ void *get_pixel_conversion_function(u32 format)
>   *
>   * @format: 4cc of the format
>   */
> -void *get_pixel_write_function(u32 format)
> +pixel_write_t get_pixel_write_function(u32 format)
>  {
>  	switch (format) {
>  	case DRM_FORMAT_ARGB8888:
> @@ -301,6 +301,6 @@ void *get_pixel_write_function(u32 format)
>  	case DRM_FORMAT_RGB565:
>  		return &argb_u16_to_RGB565;
>  	default:
> -		return NULL;
> +		return (pixel_write_t)NULL;
>  	}
>  }
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index cf59c2ed8e9a..3ecea4563254 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -5,8 +5,8 @@
>  
>  #include "vkms_drv.h"
>  
> -void *get_pixel_conversion_function(u32 format);
> +pixel_read_t get_pixel_read_function(u32 format);
>  
> -void *get_pixel_write_function(u32 format);
> +pixel_write_t get_pixel_write_function(u32 format);
>  
>  #endif /* _VKMS_FORMATS_H_ */
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index d5203f531d96..f68b1b03d632 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -106,6 +106,13 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  		return;
>  
>  	fmt = fb->format->format;
> +	pixel_read_t pixel_read = get_pixel_read_function(fmt);
> +
> +	if (!pixel_read) {
> +		DRM_WARN("Pixel format is not supported by VKMS planes. State is inchanged\n");

s/inchanged/unchanged/

> +		return;
> +	}
> +
>  	vkms_plane_state = to_vkms_plane_state(new_state);
>  	shadow_plane_state = &vkms_plane_state->base;
>  
> @@ -124,7 +131,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
>  			drm_rect_height(&frame_info->rotated), frame_info->rotation);
>  
> -	vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt);
> +	vkms_plane_state->pixel_read = pixel_read;
>  }
>  
>  static int vkms_plane_atomic_check(struct drm_plane *plane,
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index c8582df1f739..c92b9f06c4a4 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -140,6 +140,13 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>  	if (!conn_state)
>  		return;
>  
> +	pixel_write_t pixel_write = get_pixel_write_function(wb_format);
> +
> +	if (!pixel_write) {
> +		DRM_WARN("Pixel format is not supported by VKMS writeback. State is inchanged\n");

Same here

Best Regards,
~Arthur Grillo

> +		return;
> +	}
> +
>  	vkms_set_composer(&vkmsdev->output, true);
>  
>  	active_wb = conn_state->writeback_job->priv;
> @@ -150,7 +157,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>  	crtc_state->wb_pending = true;
>  	spin_unlock_irq(&output->composer_lock);
>  	drm_writeback_queue_job(wb_conn, connector_state);
> -	active_wb->pixel_write = get_pixel_write_function(wb_format);
> +	active_wb->pixel_write = pixel_write;
>  	drm_rect_init(&wb_frame_info->src, 0, 0, crtc_width, crtc_height);
>  	drm_rect_init(&wb_frame_info->dst, 0, 0, crtc_width, crtc_height);
>  }
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions
  2024-02-26  8:46 ` [PATCH v3 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions Louis Chauvet
@ 2024-02-26 13:07   ` Arthur Grillo
  2024-02-27 15:02     ` Louis Chauvet
  0 siblings, 1 reply; 19+ messages in thread
From: Arthur Grillo @ 2024-02-26 13:07 UTC (permalink / raw)
  To: Louis Chauvet, Rodrigo Siqueira, Melissa Wen, Maíra Canal,
	Haneen Mohammed, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Jonathan Corbet, pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee



On 26/02/24 05:46, Louis Chauvet wrote:
> Add some documentation on pixel conversion functions.
> Update of outdated comments for pixel_write functions.
> 
> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c |  4 +++
>  drivers/gpu/drm/vkms/vkms_drv.h      | 13 ++++++++
>  drivers/gpu/drm/vkms/vkms_formats.c  | 58 ++++++++++++++++++++++++++++++------
>  3 files changed, 66 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index c6d9b4a65809..5b341222d239 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -189,6 +189,10 @@ static void blend(struct vkms_writeback_job *wb,
>  
>  	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
>  
> +	/*
> +	 * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> +	 * blending performance.

At this moment in the series, you have not yet reintroduced the
line-by-line algorithm yet. Maybe it's better to add this comment when
you do.

Also, I think it's good to give more context, like:
"The planes are composed line-by-line, instead of pixel-by-pixel"

Best Regards,
~Arthur Grillo

> +	 */
>  	for (size_t y = 0; y < crtc_y_limit; y++) {
>  		fill_background(&background_color, output_buffer);
>  
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index b4b357447292..18086423a3a7 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -25,6 +25,17 @@
>  
>  #define VKMS_LUT_SIZE 256
>  
> +/**
> + * struct vkms_frame_info - structure to store the state of a frame
> + *
> + * @fb: backing drm framebuffer
> + * @src: source rectangle of this frame in the source framebuffer
> + * @dst: destination rectangle in the crtc buffer
> + * @map: see drm_shadow_plane_state@data
> + * @rotation: rotation applied to the source.
> + *
> + * @src and @dst should have the same size modulo the rotation.
> + */
>  struct vkms_frame_info {
>  	struct drm_framebuffer *fb;
>  	struct drm_rect src, dst;
> @@ -52,6 +63,8 @@ struct vkms_writeback_job {
>   * vkms_plane_state - Driver specific plane state
>   * @base: base plane state
>   * @frame_info: data required for composing computation
> + * @pixel_read: function to read a pixel in this plane. The creator of a vkms_plane_state must
> + * ensure that this pointer is valid
>   */
>  struct vkms_plane_state {
>  	struct drm_shadow_plane_state base;
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 172830a3936a..cb7a49b7c8e7 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -9,6 +9,17 @@
>  
>  #include "vkms_formats.h"
>  
> +/**
> + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> + * in the first plane
> + *
> + * @frame_info: Buffer metadata
> + * @x: The x coordinate of the wanted pixel in the buffer
> + * @y: The y coordinate of the wanted pixel in the buffer
> + *
> + * The caller must be aware that this offset is not always a pointer to a pixel. If individual
> + * pixel values are needed, they have to be extracted from the resulting block.
> + */
>  static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
>  {
>  	struct drm_framebuffer *fb = frame_info->fb;
> @@ -17,12 +28,13 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
>  			      + (x * fb->format->cpp[0]);
>  }
>  
> -/*
> - * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> +/**
> + * packed_pixels_addr() - Get the pointer to the block containing the pixel at the given
> + * coordinates
>   *
>   * @frame_info: Buffer metadata
> - * @x: The x(width) coordinate of the 2D buffer
> - * @y: The y(Heigth) coordinate of the 2D buffer
> + * @x: The x(width) coordinate inside the plane
> + * @y: The y(height) coordinate inside the plane
>   *
>   * Takes the information stored in the frame_info, a pair of coordinates, and
>   * returns the address of the first color channel.
> @@ -53,6 +65,13 @@ static int get_x_position(const struct vkms_frame_info *frame_info, int limit, i
>  	return x;
>  }
>  
> +/*
> + * The following  functions take pixel data from the buffer and convert them to the format
> + * ARGB16161616 in out_pixel.
> + *
> + * They are used in the `vkms_compose_row` function to handle multiple formats.
> + */
> +
>  static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
>  {
>  	/*
> @@ -145,12 +164,11 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
>  }
>  
>  /*
> - * The following  functions take an line of argb_u16 pixels from the
> - * src_buffer, convert them to a specific format, and store them in the
> - * destination.
> + * The following functions take one argb_u16 pixel and convert it to a specific format. The
> + * result is stored in @dst_pixels.
>   *
> - * They are used in the `compose_active_planes` to convert and store a line
> - * from the src_buffer to the writeback buffer.
> + * They are used in the `vkms_writeback_row` to convert and store a pixel from the src_buffer to
> + * the writeback buffer.
>   */
>  static void argb_u16_to_ARGB8888(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
>  {
> @@ -216,6 +234,14 @@ static void argb_u16_to_RGB565(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
>  	*pixels = cpu_to_le16(r << 11 | g << 5 | b);
>  }
>  
> +/**
> + * Generic loop for all supported writeback format. It is executed just after the blending to
> + * write a line in the writeback buffer.
> + *
> + * @wb: Job where to insert the final image
> + * @src_buffer: Line to write
> + * @y: Row to write in the writeback buffer
> + */
>  void vkms_writeback_row(struct vkms_writeback_job *wb,
>  			const struct line_buffer *src_buffer, int y)
>  {
> @@ -229,6 +255,13 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
>  		wb->pixel_write(dst_pixels, &in_pixels[x]);
>  }
>  
> +/**
> + * Retrieve the correct read_pixel function for a specific format.
> + * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
> + * pointer is valid before using it in a vkms_plane_state.
> + *
> + * @format: 4cc of the format
> + */
>  void *get_pixel_conversion_function(u32 format)
>  {
>  	switch (format) {
> @@ -247,6 +280,13 @@ void *get_pixel_conversion_function(u32 format)
>  	}
>  }
>  
> +/**
> + * Retrieve the correct write_pixel function for a specific format.
> + * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
> + * pointer is valid before using it in a vkms_writeback_job.
> + *
> + * @format: 4cc of the format
> + */
>  void *get_pixel_write_function(u32 format)
>  {
>  	switch (format) {
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 5/9] drm/vkms: Re-introduce line-per-line composition algorithm
  2024-02-26  8:46 ` [PATCH v3 5/9] drm/vkms: Re-introduce line-per-line composition algorithm Louis Chauvet
@ 2024-02-26 14:14   ` Arthur Grillo
  2024-02-27 15:02     ` Louis Chauvet
  0 siblings, 1 reply; 19+ messages in thread
From: Arthur Grillo @ 2024-02-26 14:14 UTC (permalink / raw)
  To: Louis Chauvet, Rodrigo Siqueira, Melissa Wen, Maíra Canal,
	Haneen Mohammed, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Jonathan Corbet, pekka.paalanen
  Cc: dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee



On 26/02/24 05:46, Louis Chauvet wrote:
> Re-introduce a line-by-line composition algorithm for each pixel format.
> This allows more performance by not requiring an indirection per pixel
> read. This patch is focused on readability of the code.
> 
> Line-by-line composition was introduced by [1] but rewritten back to
> pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact
> on performance, and it was merged.
> 
> This patch is almost a revert of [2], but in addition efforts have been
> made to increase readability and maintainability of the rotation handling.
> The blend function is now divided in two parts:
> - Transformation of coordinates from the output referential to the source
> referential
> - Line conversion and blending
> 
> Most of the complexity of the rotation management is avoided by using
> drm_rect_* helpers. The remaining complexity is around the clipping, to
> avoid reading/writing outside source/destination buffers.
> 
> The pixel conversion is now done line-by-line, so the read_pixel_t was
> replaced with read_pixel_line_t callback. This way the indirection is only
> required once per line and per plane, instead of once per pixel and per
> plane.
> 
> The read_line_t callbacks are very similar for most pixel format, but it
> is required to avoid performance impact. Some helpers were created to
> avoid code repetition:
> - get_step_1x1: get the step in byte to reach next pixel block in a
>   certain direction
> - *_to_argb_u16: helpers to perform colors conversion. They should be
>   inlined by the compiler, and they are used to avoid repetition between
>   multiple variants of the same format (argb/xrgb and maybe in the
>   future for formats like bgr formats).
> 
> This new algorithm was tested with:
> - kms_plane (for color conversions)
> - kms_rotation_crc (for rotations of planes)
> - kms_cursor_crc (for translations of planes)
> The performance gain was mesured with:
> - kms_fb_stress
> 
> [1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
>      new formats")
>      https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@gmail.com/
> [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
>      functionality")
>      https://lore.kernel.org/all/20230418130525.128733-2-mcanal@igalia.com/
> 
> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c | 219 +++++++++++++++++++++++-------
>  drivers/gpu/drm/vkms/vkms_drv.h      |  24 +++-
>  drivers/gpu/drm/vkms/vkms_formats.c  | 253 ++++++++++++++++++++++-------------
>  drivers/gpu/drm/vkms/vkms_formats.h  |   2 +-
>  drivers/gpu/drm/vkms/vkms_plane.c    |   8 +-
>  5 files changed, 349 insertions(+), 157 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 5b341222d239..e555bf9c1aee 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -24,9 +24,10 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>  
>  /**
>   * pre_mul_alpha_blend - alpha blending equation
> - * @frame_info: Source framebuffer's metadata
>   * @stage_buffer: The line with the pixels from src_plane
>   * @output_buffer: A line buffer that receives all the blends output
> + * @x_start: The start offset to avoid useless copy
> + * @count: The number of byte to copy
>   *
>   * Using the information from the `frame_info`, this blends only the
>   * necessary pixels from the `stage_buffer` to the `output_buffer`
> @@ -37,51 +38,23 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>   * drm_plane_create_blend_mode_property(). Also, this formula assumes a
>   * completely opaque background.
>   */
> -static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> -				struct line_buffer *stage_buffer,
> -				struct line_buffer *output_buffer)
> +static void pre_mul_alpha_blend(
> +	struct line_buffer *stage_buffer,
> +	struct line_buffer *output_buffer,
> +	int x_start,
> +	int pixel_count)
>  {
> -	int x_dst = frame_info->dst.x1;
> -	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> -	struct pixel_argb_u16 *in = stage_buffer->pixels;
> -	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> -			    stage_buffer->n_pixels);
> -
> -	for (int x = 0; x < x_limit; x++) {
> -		out[x].a = (u16)0xffff;
> -		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> -		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> -		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> +	struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
> +	struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
> +
> +	for (int i = 0; i < pixel_count; i++) {
> +		out[i].a = (u16)0xffff;
> +		out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
> +		out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
> +		out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
>  	}
>  }
>  
> -static int get_y_pos(struct vkms_frame_info *frame_info, int y)
> -{
> -	if (frame_info->rotation & DRM_MODE_REFLECT_Y)
> -		return drm_rect_height(&frame_info->rotated) - y - 1;
> -
> -	switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) {
> -	case DRM_MODE_ROTATE_90:
> -		return frame_info->rotated.x2 - y - 1;
> -	case DRM_MODE_ROTATE_270:
> -		return y + frame_info->rotated.x1;
> -	default:
> -		return y;
> -	}
> -}
> -
> -static bool check_limit(struct vkms_frame_info *frame_info, int pos)
> -{
> -	if (drm_rotation_90_or_270(frame_info->rotation)) {
> -		if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated))
> -			return true;
> -	} else {
> -		if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2)
> -			return true;
> -	}
> -
> -	return false;
> -}
>  
>  static void fill_background(const struct pixel_argb_u16 *background_color,
>  			    struct line_buffer *output_buffer)
> @@ -163,6 +136,37 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
>  	}
>  }
>  
> +/**
> + * direction_for_rotation() - Helper to get the correct reading direction for a specific rotation
> + *
> + * @rotation: rotation to analyze
> + */
> +enum pixel_read_direction direction_for_rotation(unsigned int rotation)
> +{
> +	if (rotation & DRM_MODE_ROTATE_0) {
> +		if (rotation & DRM_MODE_REFLECT_X)
> +			return READ_LEFT;
> +		else
> +			return READ_RIGHT;
> +	} else if (rotation & DRM_MODE_ROTATE_90) {
> +		if (rotation & DRM_MODE_REFLECT_Y)
> +			return READ_UP;
> +		else
> +			return READ_DOWN;
> +	} else if (rotation & DRM_MODE_ROTATE_180) {
> +		if (rotation & DRM_MODE_REFLECT_X)
> +			return READ_RIGHT;
> +		else
> +			return READ_LEFT;
> +	} else if (rotation & DRM_MODE_ROTATE_270) {
> +		if (rotation & DRM_MODE_REFLECT_Y)
> +			return READ_DOWN;
> +		else
> +			return READ_UP;
> +	}
> +	return READ_RIGHT;
> +}
> +
>  /**
>   * blend - blend the pixels from all planes and compute crc
>   * @wb: The writeback frame buffer metadata
> @@ -183,11 +187,11 @@ static void blend(struct vkms_writeback_job *wb,
>  {
>  	struct vkms_plane_state **plane = crtc_state->active_planes;
>  	u32 n_active_planes = crtc_state->num_active_planes;
> -	int y_pos;
>  
>  	const struct pixel_argb_u16 background_color = { .a = 0xffff };
>  
>  	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> +	size_t crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;
>  
>  	/*
>  	 * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> @@ -198,22 +202,133 @@ static void blend(struct vkms_writeback_job *wb,
>  
>  		/* The active planes are composed associatively in z-order. */
>  		for (size_t i = 0; i < n_active_planes; i++) {
> -			y_pos = get_y_pos(plane[i]->frame_info, y);
> +			struct vkms_plane_state *current_plane = plane[i];
>  
> -			if (!check_limit(plane[i]->frame_info, y_pos))
> +			/* Avoid rendering useless lines */
> +			if (y < current_plane->frame_info->dst.y1 ||
> +			    y >= current_plane->frame_info->dst.y2) {
>  				continue;
> -
> -			vkms_compose_row(stage_buffer, plane[i], y_pos);
> -			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> -					    output_buffer);
> +			}
> +
> +			/*
> +			 * src_px is the line to copy. The initial coordinates are inside the

So maybe is better to rename to src_line?

Best Regards,
~Arthur Grillo

> +			 * destination framebuffer, and then drm_rect_* helpers are used to
> +			 * compute the correct position into the source framebuffer.
> +			 */
> +			struct drm_rect src_px = DRM_RECT_INIT(
> +				current_plane->frame_info->dst.x1, y,
> +				drm_rect_width(&current_plane->frame_info->dst), 1);
> +			struct drm_rect tmp_src;
> +
> +			drm_rect_fp_to_int(&tmp_src, &current_plane->frame_info->src);
> +
> +			/*
> +			 * [1]: Clamping src_px to the crtc_x_limit to avoid writing outside of the
> +			 * destination buffer
> +			 */
> +			src_px.x2 = min_t(int, src_px.x2, (int)crtc_x_limit);
> +
> +			/*
> +			 * Transform the coordinate x/y from the crtc to coordinates into
> +			 * coordinates for the src buffer.
> +			 *
> +			 * - Cancel the offset of the dst buffer.
> +			 * - Invert the rotation. This assumes that
> +			 *   dst = drm_rect_rotate(src, rotation) (dst and src have the
> +			 *   same size, but can be rotated).
> +			 * - Apply the offset of the source rectangle to the coordinate.
> +			 */
> +			drm_rect_translate(&src_px, -current_plane->frame_info->dst.x1,
> +					   -current_plane->frame_info->dst.y1);
> +			drm_rect_rotate_inv(&src_px,
> +					    drm_rect_width(&tmp_src),
> +					    drm_rect_height(&tmp_src),
> +					    current_plane->frame_info->rotation);
> +			drm_rect_translate(&src_px, tmp_src.x1, tmp_src.y1);
> +
> +			/* Get the correct reading direction in the source buffer. */
> +
> +			enum pixel_read_direction direction =
> +				direction_for_rotation(current_plane->frame_info->rotation);
> +
> +			int x_start = src_px.x1;
> +			int y_start = src_px.y1;
> +			int pixel_count;
> +			/* [2]: Compute and clamp the number of pixel to read */
> +			if (direction == READ_RIGHT || direction == READ_LEFT) {
> +				/*
> +				 * In horizontal reading, the src_px width is the number of pixel to
> +				 * read
> +				 */
> +				pixel_count = drm_rect_width(&src_px);
> +				if (x_start < 0) {
> +					pixel_count += x_start;
> +					x_start = 0;
> +				}
> +				if (x_start + pixel_count > current_plane->frame_info->fb->width) {
> +					pixel_count =
> +						(int)current_plane->frame_info->fb->width - x_start;
> +				}
> +			} else {
> +				/*
> +				 * In vertical reading, the src_px height is the number of pixel to
> +				 * read
> +				 */
> +				pixel_count = drm_rect_height(&src_px);
> +				if (y_start < 0) {
> +					pixel_count += y_start;
> +					y_start = 0;
> +				}
> +				if (y_start + pixel_count > current_plane->frame_info->fb->height) {
> +					pixel_count =
> +						(int)current_plane->frame_info->fb->width - y_start;
> +				}
> +			}
> +
> +			if (pixel_count <= 0) {
> +				/* Nothing to read, so avoid multiple function calls for nothing */
> +				continue;
> +			}
> +
> +			/*
> +			 * Modify the starting point to take in account the rotation
> +			 *
> +			 * src_px is the top-left corner, so when reading READ_LEFT or READ_TOP, it
> +			 * must be changed to the top-right/bottom-left corner.
> +			 */
> +			if (direction == READ_LEFT) {
> +				// x_start is now the right point
> +				x_start += pixel_count - 1;
> +			} else if (direction == READ_UP) {
> +				// y_start is now the bottom point
> +				y_start += pixel_count - 1;
> +			}
> +
> +			/*
> +			 * Perform the conversion and the blending
> +			 *
> +			 * Here we know that the read line (x_start, y_start, pixel_count) is
> +			 * inside the source buffer [2] and we don't write outside the stage
> +			 * buffer [1]
> +			 */
> +			current_plane->pixel_read_line(
> +				current_plane->frame_info,
> +				x_start,
> +				y_start,
> +				direction,
> +				pixel_count,
> +				&stage_buffer->pixels[current_plane->frame_info->dst.x1]);
> +
> +			pre_mul_alpha_blend(stage_buffer, output_buffer,
> +					    current_plane->frame_info->dst.x1,
> +					    pixel_count);
>  		}
>  
>  		apply_lut(crtc_state, output_buffer);
>  
>  		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
> -
>  		if (wb)
> -			vkms_writeback_row(wb, output_buffer, y_pos);
> +			vkms_writeback_row(wb, output_buffer, y);
>  	}
>  }
>  
> @@ -224,7 +339,7 @@ static int check_format_funcs(struct vkms_crtc_state *crtc_state,
>  	u32 n_active_planes = crtc_state->num_active_planes;
>  
>  	for (size_t i = 0; i < n_active_planes; i++)
> -		if (!planes[i]->pixel_read)
> +		if (!planes[i]->pixel_read_line)
>  			return -1;
>  
>  	if (active_wb && !active_wb->pixel_write)
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 886c885c8cf5..0bf49b3c435b 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -39,7 +39,6 @@
>  struct vkms_frame_info {
>  	struct drm_framebuffer *fb;
>  	struct drm_rect src, dst;
> -	struct drm_rect rotated;
>  	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
>  	unsigned int rotation;
>  };
> @@ -69,14 +68,26 @@ struct vkms_writeback_job {
>  	pixel_write_t pixel_write;
>  };
>  
> +enum pixel_read_direction {
> +	READ_UP,
> +	READ_DOWN,
> +	READ_LEFT,
> +	READ_RIGHT
> +};
> +
>  /**
> - * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
> + * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
>   * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
>   *
> - * @src_pixels: Pointer to the pixel to read
> - * @out_pixel: Pointer to write the converted pixel
> + * @frame_info: Frame used as source for the pixel value
> + * @y: Y (height) coordinate in the source buffer
> + * @x_start: X (width) coordinate of the first pixel to copy
> + * @x_end: X (width) coordinate of the last pixel to copy
> + * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> + *  x_end.
>   */
> -typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
> +typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> +	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>  
>  /**
>   * vkms_plane_state - Driver specific plane state
> @@ -88,7 +99,7 @@ typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
>  struct vkms_plane_state {
>  	struct drm_shadow_plane_state base;
>  	struct vkms_frame_info *frame_info;
> -	pixel_read_t pixel_read;
> +	pixel_read_line_t pixel_read_line;
>  };
>  
>  struct vkms_plane {
> @@ -193,7 +204,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const char *source_name,
>  /* Composer Support */
>  void vkms_composer_worker(struct work_struct *work);
>  void vkms_set_composer(struct vkms_output *out, bool enabled);
> -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y);
>  void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer *src_buffer, int y);
>  
>  /* Writeback */
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 1f5aeba57ad6..46daea6d3ee9 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -11,21 +11,29 @@
>  
>  /**
>   * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> - * in the first plane
>   *
>   * @frame_info: Buffer metadata
>   * @x: The x coordinate of the wanted pixel in the buffer
>   * @y: The y coordinate of the wanted pixel in the buffer
> + * @plane_index: The index of the plane to use
>   *
>   * The caller must be aware that this offset is not always a pointer to a pixel. If individual
>   * pixel values are needed, they have to be extracted from the resulting block.
>   */
> -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> +static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
> +				   size_t plane_index)
>  {
>  	struct drm_framebuffer *fb = frame_info->fb;
> -
> -	return fb->offsets[0] + (y * fb->pitches[0])
> -			      + (x * fb->format->cpp[0]);
> +	const struct drm_format_info *format = frame_info->fb->format;
> +	/* Directly using x and y to multiply pitches and format->ccp is not sufficient because
> +	 * in some formats a block can represent multiple pixels.
> +	 *
> +	 * Dividing x and y by the block size allows to extract the correct offset of the block
> +	 * containing the pixel.
> +	 */
> +	return fb->offsets[plane_index] +
> +	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> +	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
>  }
>  
>  /**
> @@ -35,44 +43,56 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
>   * @frame_info: Buffer metadata
>   * @x: The x(width) coordinate inside the plane
>   * @y: The y(height) coordinate inside the plane
> + * @plane_index: The index of the plane
>   *
> - * Takes the information stored in the frame_info, a pair of coordinates, and
> - * returns the address of the first color channel.
> - * This function assumes the channels are packed together, i.e. a color channel
> - * comes immediately after another in the memory. And therefore, this function
> - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> + * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
> + * of the block containing this pixel.
> + * The caller must be aware that this pointer is sometimes not directly a pixel, it needs some
> + * additional work to extract pixel color from this block.
>   */
>  static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> -				int x, int y)
> +				int x, int y, size_t plane_index)
>  {
> -	size_t offset = pixel_offset(frame_info, x, y);
> -
> -	return (u8 *)frame_info->map[0].vaddr + offset;
> +	return (u8 *)frame_info->map[0].vaddr + packed_pixels_offset(frame_info, x, y, plane_index);
>  }
>  
> -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> +/**
> + * get_step_1x1() - Common helper to compute the correct step value between each pixel to read in a
> + * certain direction.
> + * This must be used only with format where blockh == blockw == 1.
> + * In the case when direction is not a valid pixel_read_direction, the returned step is 0, so you
> + * must not rely on this result to create a loop variant.
> + *
> + * @fb Framebuffer to iter on
> + * @direction Direction of the reading
> + */
> +static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction direction,
> +			int plane_index)
>  {
> -	int x_src = frame_info->src.x1 >> 16;
> -	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> -
> -	return packed_pixels_addr(frame_info, x_src, y_src);
> +	switch (direction) {
> +	default:
> +		DRM_ERROR("Invalid direction for pixel reading: %d\n", direction);
> +		return 0;
> +	case READ_RIGHT:
> +		return fb->format->char_per_block[plane_index];
> +	case READ_LEFT:
> +		return -fb->format->char_per_block[plane_index];
> +	case READ_DOWN:
> +		return (int)fb->pitches[plane_index];
> +	case READ_UP:
> +		return -(int)fb->pitches[plane_index];
> +	}
>  }
>  
> -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> -{
> -	if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
> -		return limit - x - 1;
> -	return x;
> -}
>  
>  /*
> - * The following  functions take pixel data from the buffer and convert them to the format
> + * The following  functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
>   * ARGB16161616 in out_pixel.
>   *
> - * They are used in the `vkms_compose_row` function to handle multiple formats.
> + * They are used in the `read_line`s functions to avoid duplicate work for some pixel formats.
>   */
>  
> -static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> +static void ARGB8888_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
>  {
>  	/*
>  	 * The 257 is the "conversion ratio". This number is obtained by the
> @@ -80,48 +100,26 @@ static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixe
>  	 * the best color value in a pixel format with more possibilities.
>  	 * A similar idea applies to others RGB color conversions.
>  	 */
> -	out_pixel->a = (u16)src_pixels[3] * 257;
> -	out_pixel->r = (u16)src_pixels[2] * 257;
> -	out_pixel->g = (u16)src_pixels[1] * 257;
> -	out_pixel->b = (u16)src_pixels[0] * 257;
> -}
> -
> -static void XRGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> -{
> -	out_pixel->a = (u16)0xffff;
> -	out_pixel->r = (u16)src_pixels[2] * 257;
> -	out_pixel->g = (u16)src_pixels[1] * 257;
> -	out_pixel->b = (u16)src_pixels[0] * 257;
> +	out_pixel->a = (u16)a * 257;
> +	out_pixel->r = (u16)r * 257;
> +	out_pixel->g = (u16)g * 257;
> +	out_pixel->b = (u16)b * 257;
>  }
>  
> -static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> +static void ARGB16161616_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
>  {
> -	u16 *pixels = (u16 *)src_pixels;
> -
> -	out_pixel->a = le16_to_cpu(pixels[3]);
> -	out_pixel->r = le16_to_cpu(pixels[2]);
> -	out_pixel->g = le16_to_cpu(pixels[1]);
> -	out_pixel->b = le16_to_cpu(pixels[0]);
> -}
> -
> -static void XRGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> -{
> -	u16 *pixels = (u16 *)src_pixels;
> -
> -	out_pixel->a = (u16)0xffff;
> -	out_pixel->r = le16_to_cpu(pixels[2]);
> -	out_pixel->g = le16_to_cpu(pixels[1]);
> -	out_pixel->b = le16_to_cpu(pixels[0]);
> +	out_pixel->a = le16_to_cpu(a);
> +	out_pixel->r = le16_to_cpu(r);
> +	out_pixel->g = le16_to_cpu(g);
> +	out_pixel->b = le16_to_cpu(b);
>  }
>  
> -static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> +static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixel)
>  {
> -	u16 *pixels = (u16 *)src_pixels;
> -
>  	s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31));
>  	s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63));
>  
> -	u16 rgb_565 = le16_to_cpu(*pixels);
> +	u16 rgb_565 = le16_to_cpu(*pixel);
>  	s64 fp_r = drm_int2fixp((rgb_565 >> 11) & 0x1f);
>  	s64 fp_g = drm_int2fixp((rgb_565 >> 5) & 0x3f);
>  	s64 fp_b = drm_int2fixp(rgb_565 & 0x1f);
> @@ -132,34 +130,105 @@ static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
>  	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
>  }
>  
> -/**
> - * vkms_compose_row - compose a single row of a plane
> - * @stage_buffer: output line with the composed pixels
> - * @plane: state of the plane that is being composed
> - * @y: y coordinate of the row
> +/*
> + * The following functions are read_line function for each pixel format supported by VKMS.
>   *
> - * This function composes a single row of a plane. It gets the source pixels
> - * through the y coordinate (see get_packed_src_addr()) and goes linearly
> - * through the source pixel, reading the pixels and converting it to
> - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
> - * the source pixels are not traversed linearly. The source pixels are queried
> - * on each iteration in order to traverse the pixels vertically.
> + * They read a line starting at the point @x_start,@y_start following the @direction. The result
> + * is stored in @out_pixel and in the format ARGB16161616.
> + *
> + * Those function are very similar, but it is required for performance reason. In the past, some
> + * experiment were done, and with a generic loop the performance are very reduced [1].
> + *
> + * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/
>   */
> -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
> +
> +static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +			       enum pixel_read_direction direction, int count,
> +			       struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> +	int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> +	while (count) {
> +		u8 *px = (u8 *)src_pixels;
> +
> +		ARGB8888_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
> +		out_pixel += 1;
> +		src_pixels += step;
> +		count--;
> +	}
> +}
> +
> +static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +			       enum pixel_read_direction direction, int count,
> +			       struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> +	int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> +	while (count) {
> +		u8 *px = (u8 *)src_pixels;
> +
> +		ARGB8888_to_argb_u16(out_pixel, 255, px[2], px[1], px[0]);
> +		out_pixel += 1;
> +		src_pixels += step;
> +		count--;
> +	}
> +}
> +
> +static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +				   enum pixel_read_direction direction, int count,
> +				   struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> +	int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> +	while (count) {
> +		u16 *px = (u16 *)src_pixels;
> +
> +		ARGB16161616_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
> +		out_pixel += 1;
> +		src_pixels += step;
> +		count--;
> +	}
> +}
> +
> +static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +				   enum pixel_read_direction direction, int count,
> +				   struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> +	int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> +	while (count) {
> +		u16 *px = (u16 *)src_pixels;
> +
> +		ARGB16161616_to_argb_u16(out_pixel, 0xFFFF, px[2], px[1], px[0]);
> +		out_pixel += 1;
> +		src_pixels += step;
> +		count--;
> +	}
> +}
> +
> +static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +			     enum pixel_read_direction direction, int count,
> +			     struct pixel_argb_u16 out_pixel[])
>  {
> -	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> -	struct vkms_frame_info *frame_info = plane->frame_info;
> -	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> -	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
> +	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
>  
> -	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
> -		int x_pos = get_x_position(frame_info, limit, x);
> +	int step = get_step_1x1(frame_info->fb, direction, 0);
>  
> -		if (drm_rotation_90_or_270(frame_info->rotation))
> -			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
> -				+ frame_info->fb->format->cpp[0] * y;
> +	while (count) {
> +		u16 *px = (u16 *)src_pixels;
>  
> -		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
> +		RGB565_to_argb_u16(out_pixel, px);
> +		out_pixel += 1;
> +		src_pixels += step;
> +		count--;
>  	}
>  }
>  
> @@ -247,7 +316,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
>  {
>  	struct vkms_frame_info *frame_info = &wb->wb_frame_info;
>  	int x_dst = frame_info->dst.x1;
> -	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y, 0);
>  	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>  	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
>  
> @@ -256,27 +325,27 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
>  }
>  
>  /**
> - * Retrieve the correct read_pixel function for a specific format.
> + * Retrieve the correct read_line function for a specific format.
>   * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
>   * pointer is valid before using it in a vkms_plane_state.
>   *
>   * @format: 4cc of the format
>   */
> -pixel_read_t get_pixel_read_function(u32 format)
> +pixel_read_line_t get_pixel_read_line_function(u32 format)
>  {
>  	switch (format) {
>  	case DRM_FORMAT_ARGB8888:
> -		return &ARGB8888_to_argb_u16;
> +		return &ARGB8888_read_line;
>  	case DRM_FORMAT_XRGB8888:
> -		return &XRGB8888_to_argb_u16;
> +		return &XRGB8888_read_line;
>  	case DRM_FORMAT_ARGB16161616:
> -		return &ARGB16161616_to_argb_u16;
> +		return &ARGB16161616_read_line;
>  	case DRM_FORMAT_XRGB16161616:
> -		return &XRGB16161616_to_argb_u16;
> +		return &XRGB16161616_read_line;
>  	case DRM_FORMAT_RGB565:
> -		return &RGB565_to_argb_u16;
> +		return &RGB565_read_line;
>  	default:
> -		return (pixel_read_t)NULL;
> +		return (pixel_read_line_t)NULL;
>  	}
>  }
>  
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index 3ecea4563254..8d2bef95ff79 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -5,7 +5,7 @@
>  
>  #include "vkms_drv.h"
>  
> -pixel_read_t get_pixel_read_function(u32 format);
> +pixel_read_line_t get_pixel_read_line_function(u32 format);
>  
>  pixel_write_t get_pixel_write_function(u32 format);
>  
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index f68b1b03d632..58c1c74742b5 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -106,9 +106,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  		return;
>  
>  	fmt = fb->format->format;
> -	pixel_read_t pixel_read = get_pixel_read_function(fmt);
> +	pixel_read_line_t pixel_read_line = get_pixel_read_line_function(fmt);
>  
> -	if (!pixel_read) {
> +	if (!pixel_read_line) {
>  		DRM_WARN("Pixel format is not supported by VKMS planes. State is inchanged\n");
>  		return;
>  	}
> @@ -128,10 +128,8 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  									  DRM_MODE_REFLECT_X |
>  									  DRM_MODE_REFLECT_Y);
>  
> -	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
> -			drm_rect_height(&frame_info->rotated), frame_info->rotation);
>  
> -	vkms_plane_state->pixel_read = pixel_read;
> +	vkms_plane_state->pixel_read_line = pixel_read_line;
>  }
>  
>  static int vkms_plane_atomic_check(struct drm_plane *plane,
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 5/9] drm/vkms: Re-introduce line-per-line composition algorithm
  2024-02-26 14:14   ` Arthur Grillo
@ 2024-02-27 15:02     ` Louis Chauvet
  0 siblings, 0 replies; 19+ messages in thread
From: Louis Chauvet @ 2024-02-27 15:02 UTC (permalink / raw)
  To: Arthur Grillo
  Cc: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Jonathan Corbet, pekka.paalanen,
	dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee

Le 26/02/24 - 11:14, Arthur Grillo a écrit :
> 
> 
> On 26/02/24 05:46, Louis Chauvet wrote:
> > Re-introduce a line-by-line composition algorithm for each pixel format.
> > This allows more performance by not requiring an indirection per pixel
> > read. This patch is focused on readability of the code.
> > 
> > Line-by-line composition was introduced by [1] but rewritten back to
> > pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact
> > on performance, and it was merged.
> > 
> > This patch is almost a revert of [2], but in addition efforts have been
> > made to increase readability and maintainability of the rotation handling.
> > The blend function is now divided in two parts:
> > - Transformation of coordinates from the output referential to the source
> > referential
> > - Line conversion and blending
> > 
> > Most of the complexity of the rotation management is avoided by using
> > drm_rect_* helpers. The remaining complexity is around the clipping, to
> > avoid reading/writing outside source/destination buffers.
> > 
> > The pixel conversion is now done line-by-line, so the read_pixel_t was
> > replaced with read_pixel_line_t callback. This way the indirection is only
> > required once per line and per plane, instead of once per pixel and per
> > plane.
> > 
> > The read_line_t callbacks are very similar for most pixel format, but it
> > is required to avoid performance impact. Some helpers were created to
> > avoid code repetition:
> > - get_step_1x1: get the step in byte to reach next pixel block in a
> >   certain direction
> > - *_to_argb_u16: helpers to perform colors conversion. They should be
> >   inlined by the compiler, and they are used to avoid repetition between
> >   multiple variants of the same format (argb/xrgb and maybe in the
> >   future for formats like bgr formats).
> > 
> > This new algorithm was tested with:
> > - kms_plane (for color conversions)
> > - kms_rotation_crc (for rotations of planes)
> > - kms_cursor_crc (for translations of planes)
> > The performance gain was mesured with:
> > - kms_fb_stress
> > 
> > [1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
> >      new formats")
> >      https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@gmail.com/
> > [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
> >      functionality")
> >      https://lore.kernel.org/all/20230418130525.128733-2-mcanal@igalia.com/
> > 
> > Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> > ---
> >  drivers/gpu/drm/vkms/vkms_composer.c | 219 +++++++++++++++++++++++-------
> >  drivers/gpu/drm/vkms/vkms_drv.h      |  24 +++-
> >  drivers/gpu/drm/vkms/vkms_formats.c  | 253 ++++++++++++++++++++++-------------
> >  drivers/gpu/drm/vkms/vkms_formats.h  |   2 +-
> >  drivers/gpu/drm/vkms/vkms_plane.c    |   8 +-
> >  5 files changed, 349 insertions(+), 157 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > index 5b341222d239..e555bf9c1aee 100644
> > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > @@ -24,9 +24,10 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
> >  
> >  /**
> >   * pre_mul_alpha_blend - alpha blending equation
> > - * @frame_info: Source framebuffer's metadata
> >   * @stage_buffer: The line with the pixels from src_plane
> >   * @output_buffer: A line buffer that receives all the blends output
> > + * @x_start: The start offset to avoid useless copy
> > + * @count: The number of byte to copy
> >   *
> >   * Using the information from the `frame_info`, this blends only the
> >   * necessary pixels from the `stage_buffer` to the `output_buffer`
> > @@ -37,51 +38,23 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
> >   * drm_plane_create_blend_mode_property(). Also, this formula assumes a
> >   * completely opaque background.
> >   */
> > -static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> > -				struct line_buffer *stage_buffer,
> > -				struct line_buffer *output_buffer)
> > +static void pre_mul_alpha_blend(
> > +	struct line_buffer *stage_buffer,
> > +	struct line_buffer *output_buffer,
> > +	int x_start,
> > +	int pixel_count)
> >  {
> > -	int x_dst = frame_info->dst.x1;
> > -	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> > -	struct pixel_argb_u16 *in = stage_buffer->pixels;
> > -	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > -			    stage_buffer->n_pixels);
> > -
> > -	for (int x = 0; x < x_limit; x++) {
> > -		out[x].a = (u16)0xffff;
> > -		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> > -		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> > -		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> > +	struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
> > +	struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
> > +
> > +	for (int i = 0; i < pixel_count; i++) {
> > +		out[i].a = (u16)0xffff;
> > +		out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
> > +		out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
> > +		out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
> >  	}
> >  }
> >  
> > -static int get_y_pos(struct vkms_frame_info *frame_info, int y)
> > -{
> > -	if (frame_info->rotation & DRM_MODE_REFLECT_Y)
> > -		return drm_rect_height(&frame_info->rotated) - y - 1;
> > -
> > -	switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) {
> > -	case DRM_MODE_ROTATE_90:
> > -		return frame_info->rotated.x2 - y - 1;
> > -	case DRM_MODE_ROTATE_270:
> > -		return y + frame_info->rotated.x1;
> > -	default:
> > -		return y;
> > -	}
> > -}
> > -
> > -static bool check_limit(struct vkms_frame_info *frame_info, int pos)
> > -{
> > -	if (drm_rotation_90_or_270(frame_info->rotation)) {
> > -		if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated))
> > -			return true;
> > -	} else {
> > -		if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2)
> > -			return true;
> > -	}
> > -
> > -	return false;
> > -}
> >  
> >  static void fill_background(const struct pixel_argb_u16 *background_color,
> >  			    struct line_buffer *output_buffer)
> > @@ -163,6 +136,37 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
> >  	}
> >  }
> >  
> > +/**
> > + * direction_for_rotation() - Helper to get the correct reading direction for a specific rotation
> > + *
> > + * @rotation: rotation to analyze
> > + */
> > +enum pixel_read_direction direction_for_rotation(unsigned int rotation)
> > +{
> > +	if (rotation & DRM_MODE_ROTATE_0) {
> > +		if (rotation & DRM_MODE_REFLECT_X)
> > +			return READ_LEFT;
> > +		else
> > +			return READ_RIGHT;
> > +	} else if (rotation & DRM_MODE_ROTATE_90) {
> > +		if (rotation & DRM_MODE_REFLECT_Y)
> > +			return READ_UP;
> > +		else
> > +			return READ_DOWN;
> > +	} else if (rotation & DRM_MODE_ROTATE_180) {
> > +		if (rotation & DRM_MODE_REFLECT_X)
> > +			return READ_RIGHT;
> > +		else
> > +			return READ_LEFT;
> > +	} else if (rotation & DRM_MODE_ROTATE_270) {
> > +		if (rotation & DRM_MODE_REFLECT_Y)
> > +			return READ_DOWN;
> > +		else
> > +			return READ_UP;
> > +	}
> > +	return READ_RIGHT;
> > +}
> > +
> >  /**
> >   * blend - blend the pixels from all planes and compute crc
> >   * @wb: The writeback frame buffer metadata
> > @@ -183,11 +187,11 @@ static void blend(struct vkms_writeback_job *wb,
> >  {
> >  	struct vkms_plane_state **plane = crtc_state->active_planes;
> >  	u32 n_active_planes = crtc_state->num_active_planes;
> > -	int y_pos;
> >  
> >  	const struct pixel_argb_u16 background_color = { .a = 0xffff };
> >  
> >  	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> > +	size_t crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;
> >  
> >  	/*
> >  	 * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> > @@ -198,22 +202,133 @@ static void blend(struct vkms_writeback_job *wb,
> >  
> >  		/* The active planes are composed associatively in z-order. */
> >  		for (size_t i = 0; i < n_active_planes; i++) {
> > -			y_pos = get_y_pos(plane[i]->frame_info, y);
> > +			struct vkms_plane_state *current_plane = plane[i];
> >  
> > -			if (!check_limit(plane[i]->frame_info, y_pos))
> > +			/* Avoid rendering useless lines */
> > +			if (y < current_plane->frame_info->dst.y1 ||
> > +			    y >= current_plane->frame_info->dst.y2) {
> >  				continue;
> > -
> > -			vkms_compose_row(stage_buffer, plane[i], y_pos);
> > -			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> > -					    output_buffer);
> > +			}
> > +
> > +			/*
> > +			 * src_px is the line to copy. The initial coordinates are inside the
> 
> So maybe is better to rename to src_line?

Good idea!

Kind regards,
Louis Chauvet

[...]

-- 
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions
  2024-02-26 13:07   ` Arthur Grillo
@ 2024-02-27 15:02     ` Louis Chauvet
  2024-02-27 18:47       ` Arthur Grillo
  0 siblings, 1 reply; 19+ messages in thread
From: Louis Chauvet @ 2024-02-27 15:02 UTC (permalink / raw)
  To: Arthur Grillo
  Cc: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Jonathan Corbet, pekka.paalanen,
	dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee

Le 26/02/24 - 10:07, Arthur Grillo a écrit :
> 
> 
> On 26/02/24 05:46, Louis Chauvet wrote:
> > Add some documentation on pixel conversion functions.
> > Update of outdated comments for pixel_write functions.
> > 
> > Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> > ---
> >  drivers/gpu/drm/vkms/vkms_composer.c |  4 +++
> >  drivers/gpu/drm/vkms/vkms_drv.h      | 13 ++++++++
> >  drivers/gpu/drm/vkms/vkms_formats.c  | 58 ++++++++++++++++++++++++++++++------
> >  3 files changed, 66 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > index c6d9b4a65809..5b341222d239 100644
> > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > @@ -189,6 +189,10 @@ static void blend(struct vkms_writeback_job *wb,
> >  
> >  	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> >  
> > +	/*
> > +	 * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> > +	 * blending performance.
> 
> At this moment in the series, you have not yet reintroduced the
> line-by-line algorithm yet. Maybe it's better to add this comment when
> you do.

Is it better with this:

	/*
	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
	 * complexity to avoid poor blending performance.
	 *
	 * The function vkms_compose_row is used to read a line, pixel-by-pixel, into the staging
	 * buffer.
	 */
 
> Also, I think it's good to give more context, like:
> "The planes are composed line-by-line, instead of pixel-by-pixel"

And after PATCHv3 5/9:

	/*
	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
	 * complexity to avoid poor blending performance.
	 *
	 * The function pixel_read_line callback is used to read a line, using an efficient 
	 * algorithm for a specific format, into the staging buffer.
	 */

Kind regards,
Louis Chauvet

> Best Regards,
> ~Arthur Grillo

[...]

-- 
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
  2024-02-26 12:44   ` Arthur Grillo
@ 2024-02-27 15:02     ` Louis Chauvet
  0 siblings, 0 replies; 19+ messages in thread
From: Louis Chauvet @ 2024-02-27 15:02 UTC (permalink / raw)
  To: Arthur Grillo
  Cc: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Jonathan Corbet, pekka.paalanen,
	dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee

[...]

> > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > index d5203f531d96..f68b1b03d632 100644
> > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > @@ -106,6 +106,13 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> >  		return;
> >  
> >  	fmt = fb->format->format;
> > +	pixel_read_t pixel_read = get_pixel_read_function(fmt);
> > +
> > +	if (!pixel_read) {
> > +		DRM_WARN("Pixel format is not supported by VKMS planes. State is inchanged\n");
> 
> s/inchanged/unchanged/

Thanks for this correction. See my answer to [1], I changed the message a 
bit.
[1]: https://lore.kernel.org/dri-devel/20240226133646.174d3fb2.pekka.paalanen@collabora.com/

Kind regards,
Louis Chauvet

> > +		return;
> > +	}
> > +
> >  	vkms_plane_state = to_vkms_plane_state(new_state);
> >  	shadow_plane_state = &vkms_plane_state->base;
> >  
> > @@ -124,7 +131,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> >  	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
> >  			drm_rect_height(&frame_info->rotated), frame_info->rotation);
> >  
> > -	vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt);
> > +	vkms_plane_state->pixel_read = pixel_read;
> >  }
> >  
> >  static int vkms_plane_atomic_check(struct drm_plane *plane,
> > diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> > index c8582df1f739..c92b9f06c4a4 100644
> > --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> > +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> > @@ -140,6 +140,13 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
> >  	if (!conn_state)
> >  		return;
> >  
> > +	pixel_write_t pixel_write = get_pixel_write_function(wb_format);
> > +
> > +	if (!pixel_write) {
> > +		DRM_WARN("Pixel format is not supported by VKMS writeback. State is inchanged\n");
> 
> Same here
> 
> Best Regards,
> ~Arthur Grillo
> 
> > +		return;
> > +	}
> > +
> >  	vkms_set_composer(&vkmsdev->output, true);
> >  
> >  	active_wb = conn_state->writeback_job->priv;
> > @@ -150,7 +157,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
> >  	crtc_state->wb_pending = true;
> >  	spin_unlock_irq(&output->composer_lock);
> >  	drm_writeback_queue_job(wb_conn, connector_state);
> > -	active_wb->pixel_write = get_pixel_write_function(wb_format);
> > +	active_wb->pixel_write = pixel_write;
> >  	drm_rect_init(&wb_frame_info->src, 0, 0, crtc_width, crtc_height);
> >  	drm_rect_init(&wb_frame_info->dst, 0, 0, crtc_width, crtc_height);
> >  }
> > 

-- 
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions
  2024-02-27 15:02     ` Louis Chauvet
@ 2024-02-27 18:47       ` Arthur Grillo
  2024-02-29 12:41         ` Pekka Paalanen
  0 siblings, 1 reply; 19+ messages in thread
From: Arthur Grillo @ 2024-02-27 18:47 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Jonathan Corbet, pekka.paalanen,
	dri-devel, linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee



On 27/02/24 12:02, Louis Chauvet wrote:
> Le 26/02/24 - 10:07, Arthur Grillo a écrit :
>>
>>
>> On 26/02/24 05:46, Louis Chauvet wrote:
>>> Add some documentation on pixel conversion functions.
>>> Update of outdated comments for pixel_write functions.
>>>
>>> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
>>> ---
>>>  drivers/gpu/drm/vkms/vkms_composer.c |  4 +++
>>>  drivers/gpu/drm/vkms/vkms_drv.h      | 13 ++++++++
>>>  drivers/gpu/drm/vkms/vkms_formats.c  | 58 ++++++++++++++++++++++++++++++------
>>>  3 files changed, 66 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>> index c6d9b4a65809..5b341222d239 100644
>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>> @@ -189,6 +189,10 @@ static void blend(struct vkms_writeback_job *wb,
>>>  
>>>  	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
>>>  
>>> +	/*
>>> +	 * The planes are composed line-by-line. It is a necessary complexity to avoid poor
>>> +	 * blending performance.
>>
>> At this moment in the series, you have not yet reintroduced the
>> line-by-line algorithm yet. Maybe it's better to add this comment when
>> you do.
> 
> Is it better with this:
> 
> 	/*
> 	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
> 	 * complexity to avoid poor blending performance.
> 	 *
> 	 * The function vkms_compose_row is used to read a line, pixel-by-pixel, into the staging
> 	 * buffer.
> 	 */
>  
>> Also, I think it's good to give more context, like:
>> "The planes are composed line-by-line, instead of pixel-by-pixel"
> 
> And after PATCHv3 5/9:
> 
> 	/*
> 	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
> 	 * complexity to avoid poor blending performance.
> 	 *
> 	 * The function pixel_read_line callback is used to read a line, using an efficient 
> 	 * algorithm for a specific format, into the staging buffer.
> 	 */
> 

Hi,

This looks good to me.

Best Regards,
~Arthur Grillo

> Kind regards,
> Louis Chauvet
> 
>> Best Regards,
>> ~Arthur Grillo
> 
> [...]
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions
  2024-02-27 18:47       ` Arthur Grillo
@ 2024-02-29 12:41         ` Pekka Paalanen
  0 siblings, 0 replies; 19+ messages in thread
From: Pekka Paalanen @ 2024-02-29 12:41 UTC (permalink / raw)
  To: Arthur Grillo
  Cc: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Jonathan Corbet, dri-devel,
	linux-kernel, jeremie.dautheribes, miquel.raynal,
	thomas.petazzoni, seanpaul, marcheu, nicolejadeyee

[-- Attachment #1: Type: text/plain, Size: 3471 bytes --]

On Tue, 27 Feb 2024 15:47:08 -0300
Arthur Grillo <arthurgrillo@riseup.net> wrote:

> On 27/02/24 12:02, Louis Chauvet wrote:
> > Le 26/02/24 - 10:07, Arthur Grillo a écrit :  
> >>
> >>
> >> On 26/02/24 05:46, Louis Chauvet wrote:  
> >>> Add some documentation on pixel conversion functions.
> >>> Update of outdated comments for pixel_write functions.
> >>>
> >>> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> >>> ---
> >>>  drivers/gpu/drm/vkms/vkms_composer.c |  4 +++
> >>>  drivers/gpu/drm/vkms/vkms_drv.h      | 13 ++++++++
> >>>  drivers/gpu/drm/vkms/vkms_formats.c  | 58 ++++++++++++++++++++++++++++++------
> >>>  3 files changed, 66 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> >>> index c6d9b4a65809..5b341222d239 100644
> >>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> >>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> >>> @@ -189,6 +189,10 @@ static void blend(struct vkms_writeback_job *wb,
> >>>  
> >>>  	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> >>>  
> >>> +	/*
> >>> +	 * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> >>> +	 * blending performance.  
> >>
> >> At this moment in the series, you have not yet reintroduced the
> >> line-by-line algorithm yet. Maybe it's better to add this comment when
> >> you do.  
> > 
> > Is it better with this:
> > 
> > 	/*
> > 	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
> > 	 * complexity to avoid poor blending performance.
> > 	 *
> > 	 * The function vkms_compose_row is used to read a line, pixel-by-pixel, into the staging
> > 	 * buffer.
> > 	 */
> >    
> >> Also, I think it's good to give more context, like:
> >> "The planes are composed line-by-line, instead of pixel-by-pixel"  
> > 
> > And after PATCHv3 5/9:
> > 
> > 	/*
> > 	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
> > 	 * complexity to avoid poor blending performance.
> > 	 *
> > 	 * The function pixel_read_line callback is used to read a line, using an efficient 
> > 	 * algorithm for a specific format, into the staging buffer.
> > 	 */
> >   

Hi,

there are a few reasons for the line-by-line algorithm, and the
optimizations at large:

VKMS uses temporary stage and output buffers so that blending functions
can operate on just one high-precision pixel format, struct
pixel_argb_u16. We can make pixel-format-specific read and write
functions completely orthogonal from the blending operations and FB
format combinations. This avoids a combinatorial explosion of needed
functions for { input pixel formats × blending operations × output pixel
formats }.

We can use a temporary stage and output buffer whose size is one line
and not whole FB or CRTC framebuffer. This is the memory savings.

Using a temporary output buffer also avoids repeated
read-decode-blend-encode-write cycles into the final destination
buffer, as we don't need to decode/encode the pixel format.

Finally, doing elementary operations (read, blend, write) line-by-line
is much more efficient than pixel-by-pixel, because it allows making
the inner-most loop very tight. It avoids repeatedly computing a result
that does not change, like which function to call for a specific pixel
format or blending equation.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-02-29 12:41 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-26  8:46 [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
2024-02-26  8:46 ` [PATCH v3 1/9] drm/vkms: Code formatting Louis Chauvet
2024-02-26  8:46 ` [PATCH v3 2/9] drm/vkms: Use drm_frame directly Louis Chauvet
2024-02-26  8:46 ` [PATCH v3 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions Louis Chauvet
2024-02-26 13:07   ` Arthur Grillo
2024-02-27 15:02     ` Louis Chauvet
2024-02-27 18:47       ` Arthur Grillo
2024-02-29 12:41         ` Pekka Paalanen
2024-02-26  8:46 ` [PATCH v3 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions Louis Chauvet
2024-02-26 12:44   ` Arthur Grillo
2024-02-27 15:02     ` Louis Chauvet
2024-02-26  8:46 ` [PATCH v3 5/9] drm/vkms: Re-introduce line-per-line composition algorithm Louis Chauvet
2024-02-26 14:14   ` Arthur Grillo
2024-02-27 15:02     ` Louis Chauvet
2024-02-26  8:46 ` [PATCH v3 6/9] drm/vkms: Add YUV support Louis Chauvet
2024-02-26  8:46 ` [PATCH v3 7/9] drm/vkms: Add range and encoding properties to pixel_read function Louis Chauvet
2024-02-26  8:46 ` [PATCH v3 8/9] drm/vkms: Drop YUV formats TODO Louis Chauvet
2024-02-26  8:46 ` [PATCH v3 9/9] drm/vkms: Create KUnit tests for YUV conversions Louis Chauvet
2024-02-26 12:29 ` [PATCH v3 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Pekka Paalanen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).