On Sat, 23 Apr 2022 15:53:20 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> I forgot to respond some points from your review.
> 
> On 4/23/22 13:04, Igor Torrente wrote:
> > Hi Pekka,
> > 
> > On 4/20/22 09:36, Pekka Paalanen wrote:  
> >> On Mon,  4 Apr 2022 17:45:12 -0300
> >> Igor Torrente <igormtorrente@gmail.com> wrote:
> >>  
> >>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
> >>> as a color input.
> >>>
> >>> This patch refactors all the functions related to the plane composition
> >>> to overcome this limitation.
> >>>
> >>> A new internal format(`struct pixel`) is introduced to deal with all  
> >>
> >> Hi,
> >>
> >> struct pixel_argb_u16 was added in the previous patch.  
> > 
> > I will fix it. Thanks!

...

> >>> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
> >>> +				 struct vkms_crtc_state *crtc_state,
> >>> +				 u32 *crc32)
> >>>    {
> >>> +	int line_width, ret = 0, pixel_size = sizeof(struct pixel_argb_u16);
> >>> +	struct vkms_frame_info *primary_plane_info = NULL;
> >>> +	struct line_buffer output_buffer, stage_buffer;
> >>> +	struct vkms_plane_state *act_plane = NULL;
> >>> +	u32 wb_format;
> >>>    
> >>> +	if (WARN_ON(pixel_size != 8))  
> >>
> >> Isn't there a compile-time assert macro for this? Having to actually
> >> run VKMS to check for this reduces the chances of finding it early.
> >> What's the reason for this check anyway?  
> 
> Yes, it exists.
> 
> include/linux/build_bug.h:1:#define static_assert(expr, ...) 
> __static_assert(expr, ##__VA_ARGS__, #expr)
> 
> I didn't add it because I can imagine some people very mad if the kernel 
> did not compile because of vkms.

But that would mean that VKMS is broken on those platforms. You'd
better know which platforms VKMS is broken, so the Kconfig can stop
VKMS from being built there at all. Or better, fix it before anyone
needs VKMS there.

> This check exists so we can call `crc32_le` for the entire line instead
> doing it for each channel of each pixel in case `struct `pixel_argb_u16`
> had any gap added by the compiler between the struct fields.

Oh the CRC computation. Good point.

Can you add a comment about that with the check?

> >>  
> >>> +		return -EINVAL;
> >>> +
> >>> +	if (crtc_state->num_active_planes >= 1) {
> >>> +		act_plane = crtc_state->active_planes[0];
> >>> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> >>> +			primary_plane_info = act_plane->frame_info;  
> >>
> >> After the next patch, do you even need the primary plane for anything
> >> specifically?  
> 
> Yeah, I will not need it anymore.
> 
> >> There is the map_is_null check below, but that should be
> >> done on all planes in the array, right?  
> 
> Yes, I guess so. And I don't know why it only checks for the 
> primary_plane TBH.

Maybe a left-over from times when it didn't have anything but a primary
plane?

> >>
> >> I suspect the next patch, or another patch in this series, should just
> >> delete this chunk.  
> I should, and I will in the V6 of next patch.
> 
> > 
> > 
> >   
> >>  
> >>>    	}
> >>>    
> >>> +	if (!primary_plane_info)
> >>> +		return -EINVAL;
> >>> +
> >>>    	if (WARN_ON(dma_buf_map_is_null(&primary_plane_info->map[0])))
> >>>    		return -EINVAL;
> >>>    
> >>> +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
> >>> +		return -EINVAL;
> >>>    
> >>> +	line_width = drm_rect_width(&primary_plane_info->dst);
> >>> +	stage_buffer.n_pixels = line_width;
> >>> +	output_buffer.n_pixels = line_width;
> >>>    
> >>> +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> >>> +	if (!stage_buffer.pixels) {
> >>> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
> >>> +		return -ENOMEM;
> >>> +	}
> >>>    
> >>> +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> >>> +	if (!output_buffer.pixels) {
> >>> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
> >>> +		ret = -ENOMEM;
> >>> +		goto free_stage_buffer;
> >>> +	}
> >>> +
> >>> +	if (active_wb) {
> >>> +		struct vkms_frame_info *wb_frame_info = &active_wb->frame_info;
> >>> +
> >>> +		wb_format = wb_frame_info->fb->format->format;  
> >>
> >> I don't see wb_format being used, is it?  
> > 
> > This is probably a leftover from the last versions. Thanks for catching
> > it.
> >   
> >>  
> >>> +		wb_frame_info->src = primary_plane_info->src;
> >>> +		wb_frame_info->dst = primary_plane_info->dst;
> >>> +	}
> >>> +
> >>> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
> >>> +	      &output_buffer, (s64)line_width * pixel_size);  
> >>
> >> What's the (s64) doing here?
> >>
> >> Are byte sizes not usually expressed with size_t or ssize_t types, or
> >> is the kernel convention to use u64 and s64?
> >>
> >> This makes me suspect that pixel_offset() and friends in vkms_format.c
> >> are going to need fixing as well. int type overflows at 2G.  
> > 
> > 
> > Yeah, I should be using size_t in all these places.
> >   
> >>  
> >>> +
> >>> +	kvfree(output_buffer.pixels);
> >>> +free_stage_buffer:
> >>> +	kvfree(stage_buffer.pixels);
> >>> +can
> >>> +	return ret;
> >>>    }
> >>>    
> >>>    /**
> >>> @@ -222,13 +204,11 @@ void vkms_composer_worker(struct work_struct *work)
> >>>    						struct vkms_crtc_state,
> >>>    						composer_work);
> >>>    	struct drm_crtc *crtc = crtc_state->base.crtc;
> >>> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
> >>>    	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
> >>>    	bool crc_pending, wb_pending;
> >>>    	u64 frame_start, frame_end;
> >>> +	u32 crc32 = 0;
> >>>    	int ret;
> >>>    
> >>>    	spin_lock_irq(&out->composer_lock);
> >>> @@ -248,35 +228,19 @@ void vkms_composer_worker(struct work_struct *work)
> >>>    	if (!crc_pending)
> >>>    		return;
> >>>    
> >>>    	if (wb_pending)
> >>> +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
> >>> +	else
> >>> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
> >>>    
> >>> +	if (ret)
> >>>    		return;
> >>>    
> >>>    	if (wb_pending) {
> >>>    		drm_writeback_signal_completion(&out->wb_connector, 0);
> >>>    		spin_lock_irq(&out->composer_lock);
> >>>    		crtc_state->wb_pending = false;
> >>>    		spin_unlock_irq(&out->composer_lock);
> >>>    	}
> >>>    
> >>>    	/*
> >>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> >>> new file mode 100644
> >>> index 000000000000..931a61405d6a
> >>> --- /dev/null
> >>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> >>> @@ -0,0 +1,151 @@
> >>> +// SPDX-License-Identifier: GPL-2.0+
> >>> +
> >>> +#include <drm/drm_rect.h>
> >>> +#include <linux/minmax.h>
> >>> +
> >>> +#include "vkms_formats.h"
> >>> +
> >>> +static int pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> >>> +{
> >>> +	return frame_info->offset + (y * frame_info->pitch)
> >>> +				  + (x * frame_info->cpp);
> >>> +}
> >>> +
> >>> +/*
> >>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> >>> + *
> >>> + * @frame_info: Buffer metadata
> >>> + * @x: The x(width) coordinate of the 2D buffer
> >>> + * @y: The y(Heigth) coordinate of the 2D buffercan
> >>> + *
> >>> + * Takes the information stored in the frame_info, a pair of coordinates, and
> >>> + * returns the address of the first color channel.
> >>> + * This function assumes the channels are packed together, i.e. a color channel
> >>> + * comes immediately after another in the memory. And therefore, this function
> >>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> >>> + */
> >>> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> >>> +				int x, int y)
> >>> +{
> >>> +	int offset = pixel_offset(frame_info, x, y);
> >>> +
> >>> +	return (u8 *)frame_info->map[0].vaddr + offset;
> >>> +}
> >>> +
> >>> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> >>> +{
> >>> +	int x_src = frame_info->src.x1 >> 16;
> >>> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
> >>> +
> >>> +	return packed_pixels_addr(frame_info, x_src, y_src);
> >>> +}
> >>> +
> >>> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> >>> +				 const struct vkms_frame_info *frame_info, int y)
> >>> +{
> >>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> >>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> >>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> >>> +			       stage_buffer->n_pixels);
> >>> +
> >>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> >>> +		/*
> >>> +		 * The 257 is the "conversion ratio". This number is obtained by the
> >>> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
> >>> +		 * the best color value in a pixel format with more possibilities.
> >>> +		 * A similar idea applies to others RGB color conversions.
> >>> +		 */
> >>> +		out_pixels[x].a = (u16)src_pixels[3] * 257;
> >>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> >>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> >>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> >>> +	}
> >>> +}
> >>> +
> >>> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> >>> +				 const struct vkms_frame_info *frame_info, int y)
> >>> +{
> >>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> >>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> >>> +	int x, x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> >>> +			       stage_buffer->n_pixels);
> >>> +
> >>> +	for (x = 0; x < x_limit; x++, src_pixels += 4) {
> >>> +		out_pixels[x].a = (u16)0xffff;
> >>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> >>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> >>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> >>> +	}
> >>> +}
> >>> +
> >>> +/*
> >>> + * The following  functions take an line of argb_u16 pixels from the
> >>> + * src_buffer, convert them to a specific format, and store them in the
> >>> + * destination.
> >>> + *
> >>> + * They are used in the `compose_active_planes` to convert and store a line
> >>> + * from the src_buffer to the writeback buffer.
> >>> + */
> >>> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
> >>> +				 const struct line_buffer *src_buffer, int y)
> >>> +{
> >>> +	int x, x_dst = frame_info->dst.x1;
> >>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> >>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> >>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> >>> +			    src_buffer->n_pixels);
> >>> +
> >>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> >>> +		/*
> >>> +		 * This sequence below is important because the format's byte order is
> >>> +		 * in little-endian. In the case of the ARGB8888 the memory is
> >>> +		 * organized this way:
> >>> +		 *
> >>> +		 * | Addr     | = blue channel
> >>> +		 * | Addr + 1 | = green channel
> >>> +		 * | Addr + 2 | = Red channel
> >>> +		 * | Addr + 3 | = Alpha channel
> >>> +		 */
> >>> +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
> >>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> >>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> >>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> >>> +	}
> >>> +}
> >>> +
> >>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
> >>> +				 const struct line_buffer *src_buffer, int y)
> >>> +{
> >>> +	int x, x_dst = frame_info->dst.x1;
> >>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> >>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> >>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> >>> +			    src_buffer->n_pixels);
> >>> +
> >>> +	for (x = 0; x < x_limit; x++, dst_pixels += 4) {
> >>> +		dst_pixels[3] = (u8)0xff;  
> >>
> >> When writing to XRGB, it's not necessary to ensure the X channel has
> >> any sensible value. Anyone reading from XRGB must ignore that value
> >> anyway. So why not write something wacky here, like 0xa1, that is far
> >> enough from both 0x00 or 0xff to not be confused with them even
> >> visually? Also not 0x7f or 0x80 which are close to half of 0xff.
> >>
> >> Or, you could save a whole function and just use argb_u16_to_ARGBxxxx()
> >> instead, even for XRGB destination.  
> > 
> > 
> > Right. Maybe I could just leave the channel untouched.

Untouched may not be a good idea. Leaving anything untouched always has
the risk of leaking information through uninitialized memory. Maybe not
in this case because the destination is allocated by userspace already,
but nothing beats being obviously correct.

Whatever you decide here, be prepared for it becoming de-facto kernel
UABI, because it is easy for userspace to (accidentally) rely on the
value, no matter what you pick.


Thanks,
pq


> >   
> >>  
> >>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> >>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> >>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> >>> +	}
> >>> +}
> >>> +
> >>> +plane_format_transform_func get_plane_fmt_transform_function(u32 format)
> >>> +{
> >>> +	if (format == DRM_FORMAT_ARGB8888)
> >>> +		return &ARGB8888_to_argb_u16;
> >>> +	else if (format == DRM_FORMAT_XRGB8888)
> >>> +		return &XRGB8888_to_argb_u16;
> >>> +	else
> >>> +		return NULL;  
> >>
> >> This works for now, but when more formats are added, I'd think a switch
> >> statement would look better.  
> > 
> > ok.
> >   
> >>  
> >>> +}
> >>> +
> >>> +wb_format_transform_func get_wb_fmt_transform_function(u32 format)
> >>> +{
> >>> +	if (format == DRM_FORMAT_ARGB8888)
> >>> +		return &argb_u16_to_ARGB8888;
> >>> +	else if (format == DRM_FORMAT_XRGB8888)
> >>> +		return &argb_u16_to_XRGB8888;
> >>> +	else
> >>> +		return NULL;
> >>> +}