From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dri-devel-bounces@lists.freedesktop.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id BAD98C25B08
	for <dri-devel@archiver.kernel.org>; Sat, 20 Aug 2022 10:51:40 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 4CA7510E099;
	Sat, 20 Aug 2022 10:51:39 +0000 (UTC)
Received: from fanzine2.igalia.com (fanzine.igalia.com [178.60.130.6])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 341B410E07C
 for <dri-devel@lists.freedesktop.org>; Sat, 20 Aug 2022 10:51:33 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; 
 s=20170329;
 h=In-Reply-To:Content-Type:MIME-Version:References:Message-ID:
 Subject:Cc:To:From:Date:Sender:Reply-To:Content-Transfer-Encoding:Content-ID:
 Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
 :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe:
 List-Post:List-Owner:List-Archive;
 bh=05ROpri2+MY5RBnekTNuPKXBxMwq5zsRnB718SeniNw=; b=XTCWc83vgEdBXe3xRJTx495Cvz
 e/9CBalM7jb/QoOzdamcGjYejRc2kndwdj/h6CSEmUd647yrTOm66CfLMu+wGOsQVmm5mNlTTvD8X
 sxvWtUnhTzqd1TpKtrC2Xnv0zHKasZfo3z4nImDdUNSnWa+Twou26VPulki4sgpV0xxjyx3RDaiFg
 r7b481gfh8lUfxFNggnBSqRkhfO7D2JfcC/iYudyuAGgsdsZvMuIUDONy/9Jcoku4dWK24o1qpobn
 s6bOKugvVKuN804Ubn9yVwmldU3JYUcfGEvGwGiz0Yf34Gq8gCvxJuo3Ut5zg+i6tH+XuDbkHPtfI
 WH2/f9lA==;
Received: from [165.90.126.25] (helo=mail.igalia.com)
 by fanzine2.igalia.com with esmtpsa 
 (Cipher TLS1.3:ECDHE_SECP256R1__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim)
 id 1oPM4W-00CznD-H8; Sat, 20 Aug 2022 12:51:20 +0200
Date: Sat, 20 Aug 2022 09:51:07 -0100
From: Melissa Wen <mwen@igalia.com>
To: Igor Torrente <igormtorrente@gmail.com>
Subject: Re: [RESEND v6 6/9] drm: vkms: Refactor the plane composer to accept
 new formats
Message-ID: <20220820105046.cittsquvjvenw54z@mail.igalia.com>
References: <20220819182411.20246-1-igormtorrente@gmail.com>
 <20220819182411.20246-7-igormtorrente@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
 protocol="application/pgp-signature"; boundary="qhewaprldssdcmak"
Content-Disposition: inline
In-Reply-To: <20220819182411.20246-7-igormtorrente@gmail.com>
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Cc: hamohammed.sa@gmail.com, kernel test robot <lkp@intel.com>,
 rodrigosiqueiramelo@gmail.com, airlied@linux.ie, tales.aparecida@gmail.com,
 leandro.ribeiro@collabora.com, melissa.srw@gmail.com, ppaalanen@gmail.com,
 dri-devel@lists.freedesktop.org, tzimmermann@suse.de,
 ~lkcamp/patches@lists.sr.ht
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>


--qhewaprldssdcmak
Content-Type: multipart/signed; micalg=pgp-sha512;
	protocol="application/pgp-signature"; boundary="pzrs67g3khiydb7v"
Content-Disposition: inline


--pzrs67g3khiydb7v
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 08/19, Igor Torrente wrote:
> Currently the blend function only accepts XRGB_8888 and ARGB_8888
> as a color input.
>=20
> This patch refactors all the functions related to the plane composition
> to overcome this limitation.
>=20
> The pixels blend is done using the new internal format. And new handlers
> are being added to convert a specific format to/from this internal format.
>=20
> So the blend operation depends on these handlers to convert to this common
> format. The blended result, if necessary, is converted to the writeback
> buffer format.
>=20
> This patch introduces three major differences to the blend function.
> 1 - All the planes are blended at once.
> 2 - The blend calculus is done as per line instead of per pixel.
> 3 - It is responsible to calculates the CRC and writing the writeback
> buffer(if necessary).
>=20
> These changes allow us to allocate way less memory in the intermediate
> buffer to compute these operations. Because now we don't need to
> have the entire intermediate image lines at once, just one line is
> enough.
>=20
> | Memory consumption (output dimensions) |
> |:--------------------------------------:|
> |       Current      |     This patch    |
> |:------------------:|:-----------------:|
> |   Width * Heigth   |     2 * Width     |
>=20
> Beyond memory, we also have a minor performance benefit from all
> these changes. Results running the IGT[1] test
> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
>=20
> |                 Frametime                  |
> |:------------------------------------------:|
> |  Implementation |  Current  |  This commit |
> |:---------------:|:---------:|:------------:|
> | frametime range |  9~22 ms  |    5~17 ms   |
> |     Average     |  11.4 ms  |    7.8 ms    |
>=20
> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
>=20
> V2: Improves the performance drastically, by performing the operations
>     per-line and not per-pixel(Pekka Paalanen).
>     Minor improvements(Pekka Paalanen).
> V3: Changes the code to blend the planes all at once. This improves
>     performance, memory consumption, and removes much of the weirdness
>     of the V2(Pekka Paalanen and me).
>     Minor improvements(Pekka Paalanen and me).
> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
> V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen).
>     Several security/robustness improvents(Pekka Paalanen).
>     Removes check_planes_x_bounds function and allows partial
>     partly off-screen(Pekka Paalanen).
> V6: Fix a mismatch of some variable sizes (Pekka Paalanen).
>     Several minor improvements (Pekka Paalanen).
>=20
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  Documentation/gpu/vkms.rst            |   4 -
>  drivers/gpu/drm/vkms/Makefile         |   1 +
>  drivers/gpu/drm/vkms/vkms_composer.c  | 320 ++++++++++++--------------
>  drivers/gpu/drm/vkms/vkms_formats.c   | 155 +++++++++++++
>  drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
>  drivers/gpu/drm/vkms/vkms_plane.c     |   3 +
>  drivers/gpu/drm/vkms/vkms_writeback.c |   3 +
>  7 files changed, 317 insertions(+), 181 deletions(-)
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
>=20
> diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
> index 973e2d43108b..a49e4ae92653 100644
> --- a/Documentation/gpu/vkms.rst
> +++ b/Documentation/gpu/vkms.rst
> @@ -118,10 +118,6 @@ Add Plane Features
> =20
>  There's lots of plane features we could add support for:
> =20
> -- Clearing primary plane: clear primary plane before plane composition (=
at the
> -  start) for correctness of pixel blend ops. It also guarantees alpha ch=
annel
> -  is cleared in the target buffer for stable crc. [Good to get started]
> -
>  - ARGB format on primary plane: blend the primary plane into background =
with
>    translucent alpha.
> =20
> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
> index 72f779cbfedd..1b28a6a32948 100644
> --- a/drivers/gpu/drm/vkms/Makefile
> +++ b/drivers/gpu/drm/vkms/Makefile
> @@ -3,6 +3,7 @@ vkms-y :=3D \
>  	vkms_drv.o \
>  	vkms_plane.o \
>  	vkms_output.o \
> +	vkms_formats.o \
>  	vkms_crtc.o \
>  	vkms_composer.o \
>  	vkms_writeback.o
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/=
vkms_composer.c
> index b9fb408e8973..5b1a8bdd8268 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -7,204 +7,188 @@
>  #include <drm/drm_fourcc.h>
>  #include <drm/drm_gem_framebuffer_helper.h>
>  #include <drm/drm_vblank.h>
> +#include <linux/minmax.h>
> =20
>  #include "vkms_drv.h"
> =20
> -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> -				 const struct vkms_frame_info *frame_info)
> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>  {
> -	u32 pixel;
> -	int src_offset =3D frame_info->offset + (y * frame_info->pitch)
> -					    + (x * frame_info->cpp);
> +	u32 new_color;
> =20
> -	pixel =3D *(u32 *)&buffer[src_offset];
> +	new_color =3D (src * 0xffff + dst * (0xffff - alpha));
> =20
> -	return pixel;
> +	return DIV_ROUND_CLOSEST(new_color, 0xffff);
>  }
> =20
>  /**
> - * compute_crc - Compute CRC value on output frame
> + * pre_mul_alpha_blend - alpha blending equation
> + * @src_frame_info: source framebuffer's metadata
> + * @stage_buffer: The line with the pixels from src_plane
> + * @output_buffer: A line buffer that receives all the blends output
>   *
> - * @vaddr: address to final framebuffer
> - * @frame_info: framebuffer's metadata
> + * Using the information from the `frame_info`, this blends only the
> + * necessary pixels from the `stage_buffer` to the `output_buffer`
> + * using premultiplied blend formula.
>   *
> - * returns CRC value computed using crc32 on the visible portion of
> - * the final framebuffer at vaddr_out
> + * The current DRM assumption is that pixel color values have been alrea=
dy
> + * pre-multiplied with the alpha channel values. See more
> + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
> + * completely opaque background.
>   */
> -static uint32_t compute_crc(const u8 *vaddr,
> -			    const struct vkms_frame_info *frame_info)
> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> +				struct line_buffer *stage_buffer,
> +				struct line_buffer *output_buffer)
>  {
> -	int x, y;
> -	u32 crc =3D 0, pixel =3D 0;
> -	int x_src =3D frame_info->src.x1 >> 16;
> -	int y_src =3D frame_info->src.y1 >> 16;
> -	int h_src =3D drm_rect_height(&frame_info->src) >> 16;
> -	int w_src =3D drm_rect_width(&frame_info->src) >> 16;
> -
> -	for (y =3D y_src; y < y_src + h_src; ++y) {
> -		for (x =3D x_src; x < x_src + w_src; ++x) {
> -			pixel =3D get_pixel_from_buffer(x, y, vaddr, frame_info);
> -			crc =3D crc32_le(crc, (void *)&pixel, sizeof(u32));
> -		}
> +	int x_dst =3D frame_info->dst.x1;
> +	struct pixel_argb_u16 *out =3D output_buffer->pixels + x_dst;
> +	struct pixel_argb_u16 *in =3D stage_buffer->pixels;
> +	int x_limit =3D min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    stage_buffer->n_pixels);
> +
> +	for (int x =3D 0; x < x_limit; x++) {
> +		out[x].a =3D (u16)0xffff;
> +		out[x].r =3D pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> +		out[x].g =3D pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> +		out[x].b =3D pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>  	}
> -
> -	return crc;
>  }
> =20
> -static u8 blend_channel(u8 src, u8 dst, u8 alpha)
> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>  {
> -	u32 pre_blend;
> -	u8 new_color;
> -
> -	pre_blend =3D (src * 255 + dst * (255 - alpha));
> -
> -	/* Faster div by 255 */
> -	new_color =3D ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
> +	if (y >=3D frame_info->dst.y1 && y < frame_info->dst.y2)
> +		return true;
> =20
> -	return new_color;
> +	return false;
>  }
> =20
>  /**
> - * alpha_blend - alpha blending equation
> - * @argb_src: src pixel on premultiplied alpha mode
> - * @argb_dst: dst pixel completely opaque
> + * @wb_frame_info: The writeback frame buffer metadata
> + * @crtc_state: The crtc state
> + * @crc32: The crc output of the final frame
> + * @output_buffer: A buffer of a row that will receive the result of the=
 blend(s)
> + * @stage_buffer: The line with the pixels from plane being blend to the=
 output
>   *
> - * blend pixels using premultiplied blend formula. The current DRM assum=
ption
> - * is that pixel color values have been already pre-multiplied with the =
alpha
> - * channel values. See more drm_plane_create_blend_mode_property(). Also=
, this
> - * formula assumes a completely opaque background.
> + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
> + * from all planes, calculates the crc32 of the output from the former s=
tep,
> + * and, if necessary, convert and store the output to the writeback buff=
er.
>   */
> -static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
> +static void blend(struct vkms_writeback_job *wb,
> +		  struct vkms_crtc_state *crtc_state,
> +		  u32 *crc32, struct line_buffer *stage_buffer,
> +		  struct line_buffer *output_buffer, size_t row_size)
>  {
> -	u8 alpha;
> +	struct vkms_plane_state **plane =3D crtc_state->active_planes;
> +	struct vkms_frame_info *primary_plane_info =3D plane[0]->frame_info;
> +	u32 n_active_planes =3D crtc_state->num_active_planes;
> +
> +	int y_dst =3D primary_plane_info->dst.y1;
> +	int h_dst =3D drm_rect_height(&primary_plane_info->dst);
> +	int y_limit =3D y_dst + h_dst;
> +
> +	for (size_t y =3D y_dst; y < y_limit; y++) {
> +		plane[0]->plane_read(output_buffer, primary_plane_info, y);
> +
> +		/* If there are other planes besides primary, we consider the active
> +		 * planes should be in z-order and compose them associatively:
> +		 * ((primary <- overlay) <- cursor)
> +		 */
> +		for (size_t i =3D 1; i < n_active_planes; i++) {
> +			if (!check_y_limit(plane[i]->frame_info, y))
> +				continue;
> +
> +			plane[i]->plane_read(stage_buffer, plane[i]->frame_info, y);
> +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> +					    output_buffer);
> +		}
> +
> +		*crc32 =3D crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
> =20
> -	alpha =3D argb_src[3];
> -	argb_dst[0] =3D blend_channel(argb_src[0], argb_dst[0], alpha);
> -	argb_dst[1] =3D blend_channel(argb_src[1], argb_dst[1], alpha);
> -	argb_dst[2] =3D blend_channel(argb_src[2], argb_dst[2], alpha);
> +		if (wb)
> +			wb->wb_write(&wb->wb_frame_info, output_buffer, y);
> +	}
>  }
> =20
> -/**
> - * x_blend - blending equation that ignores the pixel alpha
> - *
> - * overwrites RGB color value from src pixel to dst pixel.
> - */
> -static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
> +static int check_format_funcs(struct vkms_crtc_state *crtc_state,
> +			      struct vkms_writeback_job *active_wb)
>  {
> -	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
> +	struct vkms_plane_state **planes =3D crtc_state->active_planes;
> +	u32 n_active_planes =3D crtc_state->num_active_planes;
> +
> +	for (size_t i =3D 0; i < n_active_planes; i++)
> +		if (!planes[i]->plane_read)
> +			return -1;
> +
> +	if (active_wb && !active_wb->wb_write)
> +		return -1;
> +
> +	return 0;
>  }
> =20
> -/**
> - * blend - blend value at vaddr_src with value at vaddr_dst
> - * @vaddr_dst: destination address
> - * @vaddr_src: source address
> - * @dst_frame_info: destination framebuffer's metadata
> - * @src_frame_info: source framebuffer's metadata
> - * @pixel_blend: blending equation based on plane format
> - *
> - * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
> - * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8=
888)
> - * and clearing alpha channel to an completely opaque background. This f=
unction
> - * uses buffer's metadata to locate the new composite values at vaddr_ds=
t.
> - *
> - * TODO: completely clear the primary plane (a =3D 0xff) before starting=
 to blend
> - * pixel color values
> - */
> -static void blend(void *vaddr_dst, void *vaddr_src,
> -		  struct vkms_frame_info *dst_frame_info,
> -		  struct vkms_frame_info *src_frame_info,
> -		  void (*pixel_blend)(const u8 *, u8 *))
> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
> +				 struct vkms_crtc_state *crtc_state,
> +				 u32 *crc32)
>  {
> -	int i, j, j_dst, i_dst;
> -	int offset_src, offset_dst;
> -	u8 *pixel_dst, *pixel_src;
> -
> -	int x_src =3D src_frame_info->src.x1 >> 16;
> -	int y_src =3D src_frame_info->src.y1 >> 16;
> -
> -	int x_dst =3D src_frame_info->dst.x1;
> -	int y_dst =3D src_frame_info->dst.y1;
> -	int h_dst =3D drm_rect_height(&src_frame_info->dst);
> -	int w_dst =3D drm_rect_width(&src_frame_info->dst);
> -
> -	int y_limit =3D y_src + h_dst;
> -	int x_limit =3D x_src + w_dst;
> -
> -	for (i =3D y_src, i_dst =3D y_dst; i < y_limit; ++i) {
> -		for (j =3D x_src, j_dst =3D x_dst; j < x_limit; ++j) {
> -			offset_dst =3D dst_frame_info->offset
> -				     + (i_dst * dst_frame_info->pitch)
> -				     + (j_dst++ * dst_frame_info->cpp);
> -			offset_src =3D src_frame_info->offset
> -				     + (i * src_frame_info->pitch)
> -				     + (j * src_frame_info->cpp);
> -
> -			pixel_src =3D (u8 *)(vaddr_src + offset_src);
> -			pixel_dst =3D (u8 *)(vaddr_dst + offset_dst);
> -			pixel_blend(pixel_src, pixel_dst);
> -			/* clearing alpha channel (0xff)*/
> -			pixel_dst[3] =3D 0xff;
> -		}
> -		i_dst++;
> +	size_t line_width, pixel_size =3D sizeof(struct pixel_argb_u16);
> +	struct vkms_frame_info *primary_plane_info =3D NULL;
> +	struct line_buffer output_buffer, stage_buffer;
> +	struct vkms_plane_state *act_plane =3D NULL;
> +	int ret =3D 0;
> +
> +	/*
> +	 * This check exists so we can call `crc32_le` for the entire line
> +	 * instead doing it for each channel of each pixel in case
> +	 * `struct `pixel_argb_u16` had any gap added by the compiler
> +	 * between the struct fields.
> +	 */
> +	static_assert(sizeof(struct pixel_argb_u16) =3D=3D 8);
> +
> +	if (crtc_state->num_active_planes >=3D 1) {
> +		act_plane =3D crtc_state->active_planes[0];
> +		if (act_plane->base.base.plane->type =3D=3D DRM_PLANE_TYPE_PRIMARY)
> +			primary_plane_info =3D act_plane->frame_info;
>  	}
> -}
> =20
> -static void compose_plane(struct vkms_frame_info *primary_plane_info,
> -			  struct vkms_frame_info *plane_frame_info,
> -			  void *vaddr_out)
> -{
> -	struct drm_framebuffer *fb =3D plane_frame_info->fb;
> -	void *vaddr;
> -	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
> +	if (!primary_plane_info)
> +		return -EINVAL;
> =20
>  	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
> -		return;
> +		return -EINVAL;
> =20
> -	vaddr =3D plane_frame_info->map[0].vaddr;
> +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
> +		return -EINVAL;
> =20
> -	if (fb->format->format =3D=3D DRM_FORMAT_ARGB8888)
> -		pixel_blend =3D &alpha_blend;
> -	else
> -		pixel_blend =3D &x_blend;
> +	line_width =3D drm_rect_width(&primary_plane_info->dst);
> +	stage_buffer.n_pixels =3D line_width;
> +	output_buffer.n_pixels =3D line_width;
> =20
> -	blend(vaddr_out, vaddr, primary_plane_info,
> -	      plane_frame_info, pixel_blend);
> -}
> +	stage_buffer.pixels =3D kvmalloc(line_width * pixel_size, GFP_KERNEL);
> +	if (!stage_buffer.pixels) {
> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
> +		return -ENOMEM;
> +	}
> =20
> -static int compose_active_planes(void **vaddr_out,
> -				 struct vkms_frame_info *primary_plane_info,
> -				 struct vkms_crtc_state *crtc_state)
> -{
> -	struct drm_framebuffer *fb =3D primary_plane_info->fb;
> -	struct drm_gem_object *gem_obj =3D drm_gem_fb_get_obj(fb, 0);
> -	const void *vaddr;
> -	int i;
> -
> -	if (!*vaddr_out) {
> -		*vaddr_out =3D kvzalloc(gem_obj->size, GFP_KERNEL);
> -		if (!*vaddr_out) {
> -			DRM_ERROR("Cannot allocate memory for output frame.");
> -			return -ENOMEM;
> -		}
> +	output_buffer.pixels =3D kvmalloc(line_width * pixel_size, GFP_KERNEL);
> +	if (!output_buffer.pixels) {
> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
> +		ret =3D -ENOMEM;
> +		goto free_stage_buffer;
>  	}
> =20
> -	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
> -		return -EINVAL;
> +	if (active_wb) {
> +		struct vkms_frame_info *wb_frame_info =3D &active_wb->wb_frame_info;
> =20
> -	vaddr =3D primary_plane_info->map[0].vaddr;
> +		wb_frame_info->src =3D primary_plane_info->src;
> +		wb_frame_info->dst =3D primary_plane_info->dst;
> +	}
> =20
> -	memcpy(*vaddr_out, vaddr, gem_obj->size);
> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
> +	      &output_buffer, line_width * pixel_size);
> =20
> -	/* If there are other planes besides primary, we consider the active
> -	 * planes should be in z-order and compose them associatively:
> -	 * ((primary <- overlay) <- cursor)
> -	 */
> -	for (i =3D 1; i < crtc_state->num_active_planes; i++)
> -		compose_plane(primary_plane_info,
> -			      crtc_state->active_planes[i]->frame_info,
> -			      *vaddr_out);
> +	kvfree(output_buffer.pixels);
> +free_stage_buffer:
> +	kvfree(stage_buffer.pixels);
> =20
> -	return 0;
> +	return ret;
>  }
> =20
>  /**
> @@ -222,13 +206,11 @@ void vkms_composer_worker(struct work_struct *work)
>  						struct vkms_crtc_state,
>  						composer_work);
>  	struct drm_crtc *crtc =3D crtc_state->base.crtc;
> +	struct vkms_writeback_job *active_wb =3D crtc_state->active_writeback;
>  	struct vkms_output *out =3D drm_crtc_to_vkms_output(crtc);
> -	struct vkms_frame_info *primary_plane_info =3D NULL;
> -	struct vkms_plane_state *act_plane =3D NULL;
>  	bool crc_pending, wb_pending;
> -	void *vaddr_out =3D NULL;
> -	u32 crc32 =3D 0;
>  	u64 frame_start, frame_end;
> +	u32 crc32 =3D 0;
>  	int ret;
> =20
>  	spin_lock_irq(&out->composer_lock);
> @@ -248,35 +230,19 @@ void vkms_composer_worker(struct work_struct *work)
>  	if (!crc_pending)
>  		return;
> =20
> -	if (crtc_state->num_active_planes >=3D 1) {
> -		act_plane =3D crtc_state->active_planes[0];
> -		if (act_plane->base.base.plane->type =3D=3D DRM_PLANE_TYPE_PRIMARY)
> -			primary_plane_info =3D act_plane->frame_info;
> -	}
> -
> -	if (!primary_plane_info)
> -		return;
> -
>  	if (wb_pending)
> -		vaddr_out =3D crtc_state->active_writeback->data[0].vaddr;
> +		ret =3D compose_active_planes(active_wb, crtc_state, &crc32);
> +	else
> +		ret =3D compose_active_planes(NULL, crtc_state, &crc32);
> =20
> -	ret =3D compose_active_planes(&vaddr_out, primary_plane_info,
> -				    crtc_state);
> -	if (ret) {
> -		if (ret =3D=3D -EINVAL && !wb_pending)
> -			kvfree(vaddr_out);
> +	if (ret)
>  		return;
> -	}
> -
> -	crc32 =3D compute_crc(vaddr_out, primary_plane_info);
> =20
>  	if (wb_pending) {
>  		drm_writeback_signal_completion(&out->wb_connector, 0);
>  		spin_lock_irq(&out->composer_lock);
>  		crtc_state->wb_pending =3D false;
>  		spin_unlock_irq(&out->composer_lock);
> -	} else {
> -		kvfree(vaddr_out);
>  	}
> =20
>  	/*
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/v=
kms_formats.c
> new file mode 100644
> index 000000000000..ca4bfcac686b
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -0,0 +1,155 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#include <drm/drm_rect.h>
> +#include <linux/minmax.h>
> +
> +#include "vkms_formats.h"
> +
> +static size_t pixel_offset(const struct vkms_frame_info *frame_info, int=
 x, int y)
> +{
> +	return frame_info->offset + (y * frame_info->pitch)
> +				  + (x * frame_info->cpp);
> +}
> +
> +/*
> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coor=
dinates
> + *
> + * @frame_info: Buffer metadata
> + * @x: The x(width) coordinate of the 2D buffer
> + * @y: The y(Heigth) coordinate of the 2D buffer
> + *
> + * Takes the information stored in the frame_info, a pair of coordinates=
, and
> + * returns the address of the first color channel.
> + * This function assumes the channels are packed together, i.e. a color =
channel
> + * comes immediately after another in the memory. And therefore, this fu=
nction
> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> + */
> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> +				int x, int y)
> +{
> +	size_t offset =3D pixel_offset(frame_info, x, y);
> +
> +	return (u8 *)frame_info->map[0].vaddr + offset;
> +}
> +
> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_inf=
o, int y)
> +{
> +	int x_src =3D frame_info->src.x1 >> 16;
> +	int y_src =3D y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
> +
> +	return packed_pixels_addr(frame_info, x_src, y_src);
> +}
> +
> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> +				 const struct vkms_frame_info *frame_info, int y)
> +{
> +	struct pixel_argb_u16 *out_pixels =3D stage_buffer->pixels;
> +	u8 *src_pixels =3D get_packed_src_addr(frame_info, y);
> +	int x_limit =3D min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    stage_buffer->n_pixels);
> +
> +	for (size_t x =3D 0; x < x_limit; x++, src_pixels +=3D 4) {
> +		/*
> +		 * The 257 is the "conversion ratio". This number is obtained by the
> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
> +		 * the best color value in a pixel format with more possibilities.
> +		 * A similar idea applies to others RGB color conversions.
> +		 */
> +		out_pixels[x].a =3D (u16)src_pixels[3] * 257;
> +		out_pixels[x].r =3D (u16)src_pixels[2] * 257;
> +		out_pixels[x].g =3D (u16)src_pixels[1] * 257;
> +		out_pixels[x].b =3D (u16)src_pixels[0] * 257;
> +	}
> +}
> +
> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> +				 const struct vkms_frame_info *frame_info, int y)
> +{
> +	struct pixel_argb_u16 *out_pixels =3D stage_buffer->pixels;
> +	u8 *src_pixels =3D get_packed_src_addr(frame_info, y);
> +	int x_limit =3D min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    stage_buffer->n_pixels);
> +
> +	for (size_t x =3D 0; x < x_limit; x++, src_pixels +=3D 4) {
> +		out_pixels[x].a =3D (u16)0xffff;
> +		out_pixels[x].r =3D (u16)src_pixels[2] * 257;
> +		out_pixels[x].g =3D (u16)src_pixels[1] * 257;
> +		out_pixels[x].b =3D (u16)src_pixels[0] * 257;
> +	}
> +}
> +
> +/*
> + * The following  functions take an line of argb_u16 pixels from the
> + * src_buffer, convert them to a specific format, and store them in the
> + * destination.
> + *
> + * They are used in the `compose_active_planes` to convert and store a l=
ine
> + * from the src_buffer to the writeback buffer.
> + */
> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
> +				 const struct line_buffer *src_buffer, int y)
> +{
> +	int x_dst =3D frame_info->dst.x1;
> +	u8 *dst_pixels =3D packed_pixels_addr(frame_info, x_dst, y);
> +	struct pixel_argb_u16 *in_pixels =3D src_buffer->pixels;
> +	int x_limit =3D min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    src_buffer->n_pixels);
> +
> +	for (size_t x =3D 0; x < x_limit; x++, dst_pixels +=3D 4) {
> +		/*
> +		 * This sequence below is important because the format's byte order is
> +		 * in little-endian. In the case of the ARGB8888 the memory is
> +		 * organized this way:
> +		 *
> +		 * | Addr     | =3D blue channel
> +		 * | Addr + 1 | =3D green channel
> +		 * | Addr + 2 | =3D Red channel
> +		 * | Addr + 3 | =3D Alpha channel
> +		 */
> +		dst_pixels[3] =3D DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
> +		dst_pixels[2] =3D DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> +		dst_pixels[1] =3D DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> +		dst_pixels[0] =3D DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> +	}
> +}
> +
> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
> +				 const struct line_buffer *src_buffer, int y)
> +{
> +	int x_dst =3D frame_info->dst.x1;
> +	u8 *dst_pixels =3D packed_pixels_addr(frame_info, x_dst, y);
> +	struct pixel_argb_u16 *in_pixels =3D src_buffer->pixels;
> +	int x_limit =3D min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    src_buffer->n_pixels);
> +
> +	for (size_t x =3D 0; x < x_limit; x++, dst_pixels +=3D 4) {
> +		dst_pixels[3] =3D 0xff;
> +		dst_pixels[2] =3D DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> +		dst_pixels[1] =3D DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> +		dst_pixels[0] =3D DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> +	}
> +}
> +
> +frame_to_line_func get_frame_to_line_function(u32 format)
> +{
> +	switch (format) {
> +	case DRM_FORMAT_ARGB8888:
> +		return &ARGB8888_to_argb_u16;
> +	case DRM_FORMAT_XRGB8888:
> +		return &XRGB8888_to_argb_u16;
> +	default:
> +		return NULL;
> +	}
> +}
> +
> +line_to_frame_func get_line_to_frame_function(u32 format)
> +{
> +	switch (format) {
> +	case DRM_FORMAT_ARGB8888:
> +		return &argb_u16_to_ARGB8888;
> +	case DRM_FORMAT_XRGB8888:
> +		return &argb_u16_to_XRGB8888;
> +	default:
> +		return NULL;
> +	}
> +}
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/v=
kms_formats.h
> new file mode 100644
> index 000000000000..053ca42d5b31
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#ifndef _VKMS_FORMATS_H_
> +#define _VKMS_FORMATS_H_
> +
> +#include "vkms_drv.h"
> +
> +frame_to_line_func get_frame_to_line_function(u32 format);
> +
> +line_to_frame_func get_line_to_frame_function(u32 format);
> +
> +#endif /* _VKMS_FORMATS_H_ */
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkm=
s_plane.c
> index 8adbfdc05e50..7a479a714565 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -10,6 +10,7 @@
>  #include <drm/drm_plane_helper.h>
> =20
>  #include "vkms_drv.h"
> +#include "vkms_formats.h"
^ this line no longer applies (needs to rebase), but I can manage it before=
 apply to drm-misc-next
> =20
>  static const u32 vkms_formats[] =3D {
>  	DRM_FORMAT_XRGB8888,
> @@ -100,6 +101,7 @@ static void vkms_plane_atomic_update(struct drm_plane=
 *plane,
>  	struct drm_shadow_plane_state *shadow_plane_state;
>  	struct drm_framebuffer *fb =3D new_state->fb;
>  	struct vkms_frame_info *frame_info;
> +	u32 fmt =3D fb->format->format;
> =20
>  	if (!new_state->crtc || !fb)
>  		return;
> @@ -116,6 +118,7 @@ static void vkms_plane_atomic_update(struct drm_plane=
 *plane,
>  	frame_info->offset =3D fb->offsets[0];
>  	frame_info->pitch =3D fb->pitches[0];
>  	frame_info->cpp =3D fb->format->cpp[0];
> +	vkms_plane_state->plane_read =3D get_frame_to_line_function(fmt);
>  }
> =20
>  static int vkms_plane_atomic_check(struct drm_plane *plane,
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms=
/vkms_writeback.c
> index c87f6c89e7b4..d2aabb52cb46 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -11,6 +11,7 @@
>  #include <drm/drm_gem_shmem_helper.h>
> =20
>  #include "vkms_drv.h"
> +#include "vkms_formats.h"
> =20
>  static const u32 vkms_wb_formats[] =3D {
>  	DRM_FORMAT_XRGB8888,
> @@ -123,6 +124,7 @@ static void vkms_wb_atomic_commit(struct drm_connecto=
r *conn,
>  	struct drm_framebuffer *fb =3D connector_state->writeback_job->fb;
>  	struct vkms_writeback_job *active_wb;
>  	struct vkms_frame_info *wb_frame_info;
> +	u32 wb_format =3D fb->format->format;
> =20
>  	if (!conn_state)
>  		return;
> @@ -140,6 +142,7 @@ static void vkms_wb_atomic_commit(struct drm_connecto=
r *conn,
>  	crtc_state->wb_pending =3D true;
>  	spin_unlock_irq(&output->composer_lock);
>  	drm_writeback_queue_job(wb_conn, connector_state);
> +	active_wb->wb_write =3D get_line_to_frame_function(wb_format);
>  }
> =20
>  static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs=
 =3D {
> --=20
> 2.30.2
>=20

--pzrs67g3khiydb7v--

--qhewaprldssdcmak
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEd8WOo/JViG+Tu+XIwqF3j0dLehwFAmMAvJsACgkQwqF3j0dL
ehxyvA//Xy8LOWWOasKj8YjxpUHBrNhSnB9kbkYEQrDgdoHu+9pn5VqJxQ+CfgT2
jmH8gUYQGSuWXB1ANXXUxwFyjLD23c7mMJJQd0pKH448EF7l2nMX1jS9gvmNvrqn
l9LffMvnqZJARKy87dLTtE1C8RyJEPm2bkjNYBoDqKt5ZZe881pTYHuG2H6nSAdC
GtTj1/l3UuIU9TF/MLz2VCmWF1Lb5z337eTMmAO05zKT2DpHofE0NE7ZpAmeR2iX
Jw6JfMHd6TJkUV3IF6zZQo9bdvRgaPg6fg6WUcpxanZZimwhNTRYrx3n/XEBss5C
8BCZrcxViRnNbYAcm1ZLqybRZdaIde4ASMLJI5mRjdUON4Q2ZqnmKg5JnWMaNcsF
lLZu3QP9SbRDuLOLltxEyI3+ZBR3nVVc28KeL67KRO91Gex/ucWfuln6zE6Vk2c7
uvs84nFNz60MboXQ/Y4TUnxeFm+NcgmlOzBp79UYytScX043LFEmC5P1/wjz9U7h
XuaXc9xJRfsI67cmSI1ftR8kfBxrWC2uf56OafLPsfYWTOC0d6BrxzcV2m8EkzVw
ur9jnAfOWo85CW215yoVEXsMgfJrvrJHuXdzXryrrDqrCNeifkLgze23KoO5AY0K
j/DZcHirG/GT07z9nVWde90ogGQbw9ZSdzWi5g0c+xlBa7qiQoo=
=FgBy
-----END PGP SIGNATURE-----

--qhewaprldssdcmak--