From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from relay2-d.mail.gandi.net (relay2-d.mail.gandi.net [217.70.183.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2278366B4C for ; Wed, 13 Mar 2024 17:45:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.70.183.194 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710351943; cv=none; b=CJyrMQ1RgLo653VBHS442S4LX+J1bYTtB6/vrFVryuMwFVcF2dSHPPt4weANMiQgvZIf2alYxIt5PsSQLxRHZd6HuW1z4nZFQ8r+uFYrBiwjHfWt7rIO352txmSC+qlRghvIs2WrOwZ+3+HlFR2QVrFcbVyrVKqbhkwtHceHfRM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710351943; c=relaxed/simple; bh=4m3MZLm+SLybI8mDO1PzEFLIuxCLsbs2u+oKHfliEaY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=UGsrYk5kzU1ntPVzxU6x3LP8MvcUl73m147j8N7dVrpdocJwMc0VFov96hsiQ4gE7eqoqZZV74uvH1Dul3FhX/+mLiujW423YSaa0Psm5lRAJV0nzKXB4YszmI91b5wYH/tRyW0MxoYYdpkBJrPLOIq/gYjR+fb9wUPf3dRhpA4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com; spf=pass smtp.mailfrom=bootlin.com; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b=VFADyqag; arc=none smtp.client-ip=217.70.183.194 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bootlin.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b="VFADyqag" Received: by mail.gandi.net (Postfix) with ESMTPSA id 56B3B40013; Wed, 13 Mar 2024 17:45:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bootlin.com; s=gm1; t=1710351937; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TnvwdimqSEcQBKzA3c0Z3gb3fjfIWYyWkufTOOgxbvw=; b=VFADyqag91NT0YlTG9GSgoVD2aYsTwh4zamQs+V+3/hrl//5Wt5SUrLPRxaMCULD1GcUNk GH8Mwc1e4luA6ZMmlT24PUJGFoaOKP1bd9KaD1T16AYYKLU2MMW5GnjLwxynzsSa4lqfL2 nFa/Zwq9W/BhGev0ZBh0RV1hMUF8vYwAfGzqko5BuPxrktbqtKXsAdMj2MJmLmIFG/G9y3 mIqdQeEgOmJzNKOjlAZ+OsbeHuNlVCC0FiYRxTc6cNh/ZPoQN/t0+S9bRXdQEQ3P4xmVLC cUqdO6dSzEhbNUVojUUUPsXQSMl7nymAPfQjQCXaF8Z7jZRsWqF9nNYWp8zqxg== From: Louis Chauvet Date: Wed, 13 Mar 2024 18:45:04 +0100 Subject: [PATCH v5 10/16] drm/vkms: Re-introduce line-per-line composition algorithm Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20240313-yuv-v5-10-e610cbd03f52@bootlin.com> References: <20240313-yuv-v5-0-e610cbd03f52@bootlin.com> In-Reply-To: <20240313-yuv-v5-0-e610cbd03f52@bootlin.com> To: Rodrigo Siqueira , Melissa Wen , =?utf-8?q?Ma=C3=ADra_Canal?= , Haneen Mohammed , Daniel Vetter , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , arthurgrillo@riseup.net, Jonathan Corbet , pekka.paalanen@haloniitty.fi Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, jeremie.dautheribes@bootlin.com, miquel.raynal@bootlin.com, thomas.petazzoni@bootlin.com, seanpaul@google.com, marcheu@google.com, nicolejadeyee@google.com, Louis Chauvet X-Mailer: b4 0.13.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=27042; i=louis.chauvet@bootlin.com; h=from:subject:message-id; bh=4m3MZLm+SLybI8mDO1PzEFLIuxCLsbs2u+oKHfliEaY=; b=owEBbQKS/ZANAwAIASCtLsZbECziAcsmYgBl8eY0UeEenlAI/xAzloeo3dN8Oh3dI4lPyXNhMyj8 Mr/o+w+JAjMEAAEIAB0WIQRPj7g/vng8MQxQWQQgrS7GWxAs4gUCZfHmNAAKCRAgrS7GWxAs4phqEA CVB5iKL+GvsTu5349qw+/zayreDeCi8OqkX98iBtpAWDgKY0dBro93lRcCcZbukdm3U4q+xLLGA2xU BDw1Un3Q+eKroxoT58Xcm4io1gFuUiSIHOquiXpG8sndxUupskBMzRaI5cnd0FlmlZQ98JDBJmzymS yHSk2T1WkmBj61sHe1iwyCSejFeZXM7ZO/QgbXwvQQnQibB641mKFu4mcnyT+XYv3w48iQeOBLluzz XAT8rXjUzQlXFwMuw58SghR779HHu0EXLZcYQ7Q6e/4GG7zlDzIf85tOC0HQcn6gzDsNDaqcP9+rE4 aJTvHuNvOmexlVEhhUWETFwUHnBvQe0+eK0Aam+tVt2rf6Fn30GrU1NycqwL7L9Ost+1h3NCYkICJv pGedfu1NyWB7wyyMSyQ4IFoWpkHW8qgRVkw79dKJY74Cb79j+SycTaRIk5xtIFz/pu4jHKwMyF5Uuh +QcmkKOR8vxs2eDP94mdnaX/RTWsOsamSW2H6Jq3kqxRnmqbzXMX8kP7kTGP7tIRU4QIHWWByRrhTz XpnjeXu9V0u9jeG8C+f21Unvap7K24+2UhpD01a1T6rFpg2u2hvDPoipe//c+jbCEdpVrINSd/7jGJ U5DvCefzKgI/wN/Zo5n/IhgFZUOoGFPQmdJr8qP7Aq/0/LIMLXqtz0g4vsDQ== X-Developer-Key: i=louis.chauvet@bootlin.com; a=openpgp; fpr=8B7104AE9A272D6693F527F2EC1883F55E0B40A5 X-GND-Sasl: louis.chauvet@bootlin.com Re-introduce a line-by-line composition algorithm for each pixel format. This allows more performance by not requiring an indirection per pixel read. This patch is focused on readability of the code. Line-by-line composition was introduced by [1] but rewritten back to pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact on performance, and it was merged. This patch is almost a revert of [2], but in addition efforts have been made to increase readability and maintainability of the rotation handling. The blend function is now divided in two parts: - Transformation of coordinates from the output referential to the source referential - Line conversion and blending Most of the complexity of the rotation management is avoided by using drm_rect_* helpers. The remaining complexity is around the clipping, to avoid reading/writing outside source/destination buffers. The pixel conversion is now done line-by-line, so the read_pixel_t was replaced with read_pixel_line_t callback. This way the indirection is only required once per line and per plane, instead of once per pixel and per plane. The read_line_t callbacks are very similar for most pixel format, but it is required to avoid performance impact. Some helpers for color conversion were introduced to avoid code repetition: - *_to_argb_u16: perform colors conversion. They should be inlined by the compiler, and they are used to avoid repetition between multiple variants of the same format (argb/xrgb and maybe in the future for formats like bgr formats). This new algorithm was tested with: - kms_plane (for color conversions) - kms_rotation_crc (for rotations of planes) - kms_cursor_crc (for translations of planes) - kms_rotation (for all rotations and formats combinations) [3] The performance gain was mesured with: - kms_fb_stress [1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept new formats") https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@gmail.com/ [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion functionality") https://lore.kernel.org/all/20230418130525.128733-2-mcanal@igalia.com/ [3]: Signed-off-by: Louis Chauvet --- drivers/gpu/drm/vkms/vkms_composer.c | 167 +++++++++++++++++++------ drivers/gpu/drm/vkms/vkms_drv.h | 27 ++-- drivers/gpu/drm/vkms/vkms_formats.c | 236 ++++++++++++++++++++++------------- drivers/gpu/drm/vkms/vkms_formats.h | 2 +- drivers/gpu/drm/vkms/vkms_plane.c | 5 +- 5 files changed, 292 insertions(+), 145 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c index 989bcf59f375..5d78c33dbf41 100644 --- a/drivers/gpu/drm/vkms/vkms_composer.c +++ b/drivers/gpu/drm/vkms/vkms_composer.c @@ -41,7 +41,7 @@ static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer, struct line_buffer *output_buffer, int x_start, int pixel_count) { struct pixel_argb_u16 *out = &output_buffer->pixels[x_start]; - const struct pixel_argb_u16 *in = stage_buffer->pixels; + const struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start]; for (int i = 0; i < pixel_count; i++) { out[i].a = (u16)0xffff; @@ -51,33 +51,6 @@ static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer, } } -static int get_y_pos(struct vkms_frame_info *frame_info, int y) -{ - if (frame_info->rotation & DRM_MODE_REFLECT_Y) - return drm_rect_height(&frame_info->rotated) - y - 1; - - switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) { - case DRM_MODE_ROTATE_90: - return frame_info->rotated.x2 - y - 1; - case DRM_MODE_ROTATE_270: - return y + frame_info->rotated.x1; - default: - return y; - } -} - -static bool check_limit(struct vkms_frame_info *frame_info, int pos) -{ - if (drm_rotation_90_or_270(frame_info->rotation)) { - if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated)) - return true; - } else { - if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2) - return true; - } - - return false; -} static void fill_background(const struct pixel_argb_u16 *background_color, struct line_buffer *output_buffer) @@ -215,34 +188,146 @@ static void blend(struct vkms_writeback_job *wb, { struct vkms_plane_state **plane = crtc_state->active_planes; u32 n_active_planes = crtc_state->num_active_planes; - int y_pos, x_dst, x_limit; const struct pixel_argb_u16 background_color = { .a = 0xffff }; - size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay; + int crtc_y_limit = crtc_state->base.crtc->mode.vdisplay; + int crtc_x_limit = crtc_state->base.crtc->mode.hdisplay; /* * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary * complexity to avoid poor blending performance. * - * The function vkms_compose_row is used to read a line, pixel-by-pixel, into the staging - * buffer. + * The function pixel_read_line callback is used to read a line, using an efficient + * algorithm for a specific format, into the staging buffer. */ for (size_t y = 0; y < crtc_y_limit; y++) { fill_background(&background_color, output_buffer); /* The active planes are composed associatively in z-order. */ for (size_t i = 0; i < n_active_planes; i++) { - x_dst = plane[i]->frame_info->dst.x1; - x_limit = min_t(size_t, drm_rect_width(&plane[i]->frame_info->dst), - stage_buffer->n_pixels); - y_pos = get_y_pos(plane[i]->frame_info, y); + struct vkms_plane_state *current_plane = plane[i]; - if (!check_limit(plane[i]->frame_info, y_pos)) + /* Avoid rendering useless lines */ + if (y < current_plane->frame_info->dst.y1 || + y >= current_plane->frame_info->dst.y2) continue; - vkms_compose_row(stage_buffer, plane[i], y_pos); - pre_mul_alpha_blend(stage_buffer, output_buffer, x_dst, x_limit); + /* + * dst_line is the line to copy. The initial coordinates are inside the + * destination framebuffer, and then drm_rect_* helpers are used to + * compute the correct position into the source framebuffer. + */ + struct drm_rect dst_line = DRM_RECT_INIT( + current_plane->frame_info->dst.x1, y, + drm_rect_width(¤t_plane->frame_info->dst), 1); + struct drm_rect tmp_src; + + drm_rect_fp_to_int(&tmp_src, ¤t_plane->frame_info->src); + + /* + * [1]: Clamping src_line to the crtc_x_limit to avoid writing outside of + * the destination buffer + */ + dst_line.x1 = max_t(int, dst_line.x1, 0); + dst_line.x2 = min_t(int, dst_line.x2, crtc_x_limit); + /* The destination is completely outside of the crtc. */ + if (dst_line.x2 <= dst_line.x1) + continue; + + struct drm_rect src_line = dst_line; + + /* + * Transform the coordinate x/y from the crtc to coordinates into + * coordinates for the src buffer. + * + * - Cancel the offset of the dst buffer. + * - Invert the rotation. This assumes that + * dst = drm_rect_rotate(src, rotation) (dst and src have the + * same size, but can be rotated). + * - Apply the offset of the source rectangle to the coordinate. + */ + drm_rect_translate(&src_line, -current_plane->frame_info->dst.x1, + -current_plane->frame_info->dst.y1); + drm_rect_rotate_inv(&src_line, + drm_rect_width(&tmp_src), + drm_rect_height(&tmp_src), + current_plane->frame_info->rotation); + drm_rect_translate(&src_line, tmp_src.x1, tmp_src.y1); + + /* Get the correct reading direction in the source buffer. */ + + enum pixel_read_direction direction = + direction_for_rotation(current_plane->frame_info->rotation); + + int x_start = src_line.x1; + int y_start = src_line.y1; + int pixel_count; + /* [2]: Compute and clamp the number of pixel to read */ + if (direction == READ_LEFT_TO_RIGHT || direction == READ_RIGHT_TO_LEFT) { + /* + * In horizontal reading, the src_line width is the number of pixel + * to read + */ + pixel_count = drm_rect_width(&src_line); + if (x_start < 0) { + pixel_count += x_start; + x_start = 0; + } + if (x_start + pixel_count > current_plane->frame_info->fb->width) { + pixel_count = + (int)current_plane->frame_info->fb->width - x_start; + } + } else { + /* + * In vertical reading, the src_line height is the number of pixel + * to read + */ + pixel_count = drm_rect_height(&src_line); + if (y_start < 0) { + pixel_count += y_start; + y_start = 0; + } + if (y_start + pixel_count > current_plane->frame_info->fb->height) { + pixel_count = + (int)current_plane->frame_info->fb->width - y_start; + } + } + + if (pixel_count <= 0) { + /* Nothing to read, so avoid multiple function calls for nothing */ + continue; + } + + /* + * Modify the starting point to take in account the rotation + * + * src_line is the top-left corner, so when reading READ_RIGHT_TO_LEFT or + * READ_BOTTOM_TO_TOP, it must be changed to the top-right/bottom-left + * corner. + */ + if (direction == READ_RIGHT_TO_LEFT) { + // x_start is now the right point + x_start += pixel_count - 1; + } else if (direction == READ_BOTTOM_TO_TOP) { + // y_start is now the bottom point + y_start += pixel_count - 1; + } + + /* + * Perform the conversion and the blending + * + * Here we know that the read line (x_start, y_start, pixel_count) is + * inside the source buffer [2] and we don't write outside the stage + * buffer [1] + */ + current_plane->pixel_read_line( + current_plane, x_start, y_start, direction, pixel_count, + &stage_buffer->pixels[current_plane->frame_info->dst.x1]); + + pre_mul_alpha_blend(stage_buffer, output_buffer, + current_plane->frame_info->dst.x1, + pixel_count); } apply_lut(crtc_state, output_buffer); @@ -250,7 +335,7 @@ static void blend(struct vkms_writeback_job *wb, *crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size); if (wb) - vkms_writeback_row(wb, output_buffer, y_pos); + vkms_writeback_row(wb, output_buffer, y); } } @@ -261,7 +346,7 @@ static int check_format_funcs(struct vkms_crtc_state *crtc_state, u32 n_active_planes = crtc_state->num_active_planes; for (size_t i = 0; i < n_active_planes; i++) - if (!planes[i]->pixel_read) + if (!planes[i]->pixel_read_line) return -1; if (active_wb && !active_wb->pixel_write) diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h index 985e7a92b7bc..23e1d247468d 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.h +++ b/drivers/gpu/drm/vkms/vkms_drv.h @@ -39,7 +39,6 @@ struct vkms_frame_info { struct drm_framebuffer *fb; struct drm_rect src, dst; - struct drm_rect rotated; struct iosys_map map[DRM_FORMAT_MAX_PLANES]; unsigned int rotation; }; @@ -80,26 +79,37 @@ enum pixel_read_direction { READ_LEFT_TO_RIGHT }; +struct vkms_plane_state; + /** - * typedef pixel_read_t - These functions are used to read a pixel in the source frame, + * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame, * convert it to `struct pixel_argb_u16` and write it to @out_pixel. * - * @in_pixel: Pointer to the pixel to read - * @out_pixel: Pointer to write the converted pixel + * @plane: Plane used as source for the pixel value + * @x_start: X (width) coordinate of the first pixel to copy. The caller must ensure that x_start + * is positive and smaller than @plane->frame_info->fb->width. + * @y_start: Y (width) coordinate of the first pixel to copy. The caller must ensure that y_start + * is positive and smaller than @plane->frame_info->fb->height. + * @direction: Direction to use for the copy, starting at @x_start/@y_start + * @count: Number of pixels to copy + * @out_pixel: Pointer where to write the pixel values. They will be written from @out_pixel[0] + * to @out_pixel[@count]. The caller must ensure that out_pixel have a length of at least @count. */ -typedef void (*pixel_read_t)(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel); +typedef void (*pixel_read_line_t)(const struct vkms_plane_state *plane, int x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]); /** * vkms_plane_state - Driver specific plane state * @base: base plane state * @frame_info: data required for composing computation - * @pixel_read: function to read a pixel in this plane. The creator of a vkms_plane_state must - * ensure that this pointer is valid + * @pixel_read_line: function to read a pixel line in this plane. The creator of a vkms_plane_state + * must ensure that this pointer is valid */ struct vkms_plane_state { struct drm_shadow_plane_state base; struct vkms_frame_info *frame_info; - pixel_read_t pixel_read; + pixel_read_line_t pixel_read_line; }; struct vkms_plane { @@ -204,7 +214,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const char *source_name, /* Composer Support */ void vkms_composer_worker(struct work_struct *work); void vkms_set_composer(struct vkms_output *out, bool enabled); -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y); void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer *src_buffer, int y); /* Writeback */ diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c index 743b6fd06db5..1449a0e6c706 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.c +++ b/drivers/gpu/drm/vkms/vkms_formats.c @@ -105,77 +105,45 @@ static int get_step_next_block(struct drm_framebuffer *fb, enum pixel_read_direc return 0; } -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y, - int plane_index) -{ - int x_src = frame_info->src.x1 >> 16; - int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16); - u8 *addr; - int rem_x, rem_y; - - packed_pixels_addr(frame_info, x_src, y_src, plane_index, &addr, &rem_x, &rem_y); - return addr; -} - -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x) -{ - if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270)) - return limit - x - 1; - return x; -} - /* - * The following functions take pixel data from the buffer and convert them to the format + * The following functions take pixel data (a, r, g, b, pixel, ...), convert them to the format * ARGB16161616 in out_pixel. * - * They are used in the `vkms_compose_row` function to handle multiple formats. + * They are used in the `read_line`s functions to avoid duplicate work for some pixel formats. */ -static void ARGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) +static struct pixel_argb_u16 argb_u16_from_u8888(int a, int r, int g, int b) { + struct pixel_argb_u16 out_pixel; /* * The 257 is the "conversion ratio". This number is obtained by the * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get * the best color value in a pixel format with more possibilities. * A similar idea applies to others RGB color conversions. */ - out_pixel->a = (u16)in_pixel[3] * 257; - out_pixel->r = (u16)in_pixel[2] * 257; - out_pixel->g = (u16)in_pixel[1] * 257; - out_pixel->b = (u16)in_pixel[0] * 257; -} + out_pixel.a = (u16)a * 257; + out_pixel.r = (u16)r * 257; + out_pixel.g = (u16)g * 257; + out_pixel.b = (u16)b * 257; -static void XRGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) -{ - out_pixel->a = (u16)0xffff; - out_pixel->r = (u16)in_pixel[2] * 257; - out_pixel->g = (u16)in_pixel[1] * 257; - out_pixel->b = (u16)in_pixel[0] * 257; + return out_pixel; } -static void ARGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) +static struct pixel_argb_u16 argb_u16_from_u16161616(int a, int r, int g, int b) { - u16 *pixel = (u16 *)in_pixel; + struct pixel_argb_u16 out_pixel; - out_pixel->a = le16_to_cpu(pixel[3]); - out_pixel->r = le16_to_cpu(pixel[2]); - out_pixel->g = le16_to_cpu(pixel[1]); - out_pixel->b = le16_to_cpu(pixel[0]); -} + out_pixel.a = le16_to_cpu(a); + out_pixel.r = le16_to_cpu(r); + out_pixel.g = le16_to_cpu(g); + out_pixel.b = le16_to_cpu(b); -static void XRGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) -{ - u16 *pixel = (u16 *)in_pixel; - - out_pixel->a = (u16)0xffff; - out_pixel->r = le16_to_cpu(pixel[2]); - out_pixel->g = le16_to_cpu(pixel[1]); - out_pixel->b = le16_to_cpu(pixel[0]); + return out_pixel; } -static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) +static struct pixel_argb_u16 argb_u16_from_RGB565(const u16 *pixel) { - u16 *pixel = (u16 *)in_pixel; + struct pixel_argb_u16 out_pixel; s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31)); s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63)); @@ -185,12 +153,26 @@ static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pi s64 fp_g = drm_int2fixp((rgb_565 >> 5) & 0x3f); s64 fp_b = drm_int2fixp(rgb_565 & 0x1f); - out_pixel->a = (u16)0xffff; - out_pixel->r = drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio)); - out_pixel->g = drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio)); - out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio)); + out_pixel.a = (u16)0xffff; + out_pixel.r = drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio)); + out_pixel.g = drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio)); + out_pixel.b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio)); + + return out_pixel; } +/* + * The following functions are read_line function for each pixel format supported by VKMS. + * + * They read a line starting at the point @x_start,@y_start following the @direction. The result + * is stored in @out_pixel and in the format ARGB16161616. + * + * Those function are very similar, but it is required for performance reason. In the past, some + * experiment were done, and with a generic loop the performance are very reduced [1]. + * + * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/ + */ + /** * black_to_argb_u16() - pixel_read callback which always read black * @@ -198,42 +180,116 @@ static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pi * It is used to avoid null pointer to be used as a function. In theory, this function should * never be called, except if you found a bug in the driver/DRM core. */ -static void black_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) +static void black_to_argb_u16(const struct vkms_plane_state *plane, int x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) { - out_pixel->a = (u16)0xFFFF; - out_pixel->r = 0; - out_pixel->g = 0; - out_pixel->b = 0; + struct pixel_argb_u16 *end = out_pixel + count; + + while (out_pixel < end) { + *out_pixel = argb_u16_from_u8888(255, 0, 0, 0); + out_pixel += 1; + } } -/** - * vkms_compose_row - compose a single row of a plane - * @stage_buffer: output line with the composed pixels - * @plane: state of the plane that is being composed - * @y: y coordinate of the row - * - * This function composes a single row of a plane. It gets the source pixels - * through the y coordinate (see get_packed_src_addr()) and goes linearly - * through the source pixel, reading the pixels and converting it to - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270, - * the source pixels are not traversed linearly. The source pixels are queried - * on each iteration in order to traverse the pixels vertically. - */ -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y) +static void ARGB8888_read_line(const struct vkms_plane_state *plane, int x_start, int y_start, + enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) { - struct pixel_argb_u16 *out_pixels = stage_buffer->pixels; - struct vkms_frame_info *frame_info = plane->frame_info; - u8 *src_pixels = get_packed_src_addr(frame_info, y, 0); - int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels); + struct pixel_argb_u16 *end = out_pixel + count; + u8 *src_pixels; + int rem_x, rem_y; + + packed_pixels_addr(plane->frame_info, x_start, y_start, 0, &src_pixels, &rem_x, &rem_y); + + int step = get_step_next_block(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + u8 *px = (u8 *)src_pixels; + *out_pixel = argb_u16_from_u8888(px[3], px[2], px[1], px[0]); + out_pixel += 1; + src_pixels += step; + } +} + +static void XRGB8888_read_line(const struct vkms_plane_state *plane, int x_start, int y_start, + enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end = out_pixel + count; + u8 *src_pixels; + int rem_x, rem_y; + + packed_pixels_addr(plane->frame_info, x_start, y_start, 0, &src_pixels, &rem_x, &rem_y); + + int step = get_step_next_block(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + u8 *px = (u8 *)src_pixels; + *out_pixel = argb_u16_from_u8888(255, px[2], px[1], px[0]); + out_pixel += 1; + src_pixels += step; + } +} + +static void ARGB16161616_read_line(const struct vkms_plane_state *plane, int x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end = out_pixel + count; + u8 *src_pixels; + int rem_x, rem_y; + + packed_pixels_addr(plane->frame_info, x_start, y_start, 0, &src_pixels, &rem_x, &rem_y); + + int step = get_step_next_block(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + u16 *px = (u16 *)src_pixels; + *out_pixel = argb_u16_from_u16161616(px[3], px[2], px[1], px[0]); + out_pixel += 1; + src_pixels += step; + } +} + +static void XRGB16161616_read_line(const struct vkms_plane_state *plane, int x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end = out_pixel + count; + u8 *src_pixels; + int rem_x, rem_y; + + packed_pixels_addr(plane->frame_info, x_start, y_start, 0, &src_pixels, &rem_x, &rem_y); + + int step = get_step_next_block(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + u16 *px = (u16 *)src_pixels; + *out_pixel = argb_u16_from_u16161616(0xFFFF, px[2], px[1], px[0]); + out_pixel += 1; + src_pixels += step; + } +} + +static void RGB565_read_line(const struct vkms_plane_state *plane, int x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end = out_pixel + count; + u8 *src_pixels; + int rem_x, rem_y; + + packed_pixels_addr(plane->frame_info, x_start, y_start, 0, &src_pixels, &rem_x, &rem_y); - for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) { - int x_pos = get_x_position(frame_info, limit, x); + int step = get_step_next_block(plane->frame_info->fb, direction, 0); - if (drm_rotation_90_or_270(frame_info->rotation)) - src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1, 0) - + frame_info->fb->format->cpp[0] * y; + while (out_pixel < end) { + u16 *px = (u16 *)src_pixels; - plane->pixel_read(src_pixels, &out_pixels[x_pos]); + *out_pixel = argb_u16_from_RGB565(px); + out_pixel += 1; + src_pixels += step; } } @@ -343,25 +399,25 @@ void vkms_writeback_row(struct vkms_writeback_job *wb, } /** - * Retrieve the correct read_pixel function for a specific format. + * Retrieve the correct read_line function for a specific format. * If the format is not supported by VKMS a warn is emitted and a dummy "always read black" * function is returned. * * @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h]) */ -pixel_read_t get_pixel_read_function(u32 format) +pixel_read_line_t get_pixel_read_line_function(u32 format) { switch (format) { case DRM_FORMAT_ARGB8888: - return &ARGB8888_to_argb_u16; + return &ARGB8888_read_line; case DRM_FORMAT_XRGB8888: - return &XRGB8888_to_argb_u16; + return &XRGB8888_read_line; case DRM_FORMAT_ARGB16161616: - return &ARGB16161616_to_argb_u16; + return &ARGB16161616_read_line; case DRM_FORMAT_XRGB16161616: - return &XRGB16161616_to_argb_u16; + return &XRGB16161616_read_line; case DRM_FORMAT_RGB565: - return &RGB565_to_argb_u16; + return &RGB565_read_line; default: /* * This is a bug in vkms_plane_atomic_check. All the supported diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h index 3ecea4563254..8d2bef95ff79 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.h +++ b/drivers/gpu/drm/vkms/vkms_formats.h @@ -5,7 +5,7 @@ #include "vkms_drv.h" -pixel_read_t get_pixel_read_function(u32 format); +pixel_read_line_t get_pixel_read_line_function(u32 format); pixel_write_t get_pixel_write_function(u32 format); diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c index 10e9b23dab28..8875bed76410 100644 --- a/drivers/gpu/drm/vkms/vkms_plane.c +++ b/drivers/gpu/drm/vkms/vkms_plane.c @@ -112,7 +112,6 @@ static void vkms_plane_atomic_update(struct drm_plane *plane, frame_info = vkms_plane_state->frame_info; memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect)); memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect)); - memcpy(&frame_info->rotated, &new_state->dst, sizeof(struct drm_rect)); frame_info->fb = fb; memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map)); drm_framebuffer_get(frame_info->fb); @@ -122,10 +121,8 @@ static void vkms_plane_atomic_update(struct drm_plane *plane, DRM_MODE_REFLECT_X | DRM_MODE_REFLECT_Y); - drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated), - drm_rect_height(&frame_info->rotated), frame_info->rotation); - vkms_plane_state->pixel_read = get_pixel_read_function(fmt); + vkms_plane_state->pixel_read_line = get_pixel_read_line_function(fmt); } static int vkms_plane_atomic_check(struct drm_plane *plane, -- 2.43.0