On Tue, 27 Feb 2024 15:47:08 -0300
Arthur Grillo <arthurgrillo@riseup.net> wrote:

> On 27/02/24 12:02, Louis Chauvet wrote:
> > Le 26/02/24 - 10:07, Arthur Grillo a écrit :  
> >>
> >>
> >> On 26/02/24 05:46, Louis Chauvet wrote:  
> >>> Add some documentation on pixel conversion functions.
> >>> Update of outdated comments for pixel_write functions.
> >>>
> >>> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> >>> ---
> >>>  drivers/gpu/drm/vkms/vkms_composer.c |  4 +++
> >>>  drivers/gpu/drm/vkms/vkms_drv.h      | 13 ++++++++
> >>>  drivers/gpu/drm/vkms/vkms_formats.c  | 58 ++++++++++++++++++++++++++++++------
> >>>  3 files changed, 66 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> >>> index c6d9b4a65809..5b341222d239 100644
> >>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> >>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> >>> @@ -189,6 +189,10 @@ static void blend(struct vkms_writeback_job *wb,
> >>>  
> >>>  	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> >>>  
> >>> +	/*
> >>> +	 * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> >>> +	 * blending performance.  
> >>
> >> At this moment in the series, you have not yet reintroduced the
> >> line-by-line algorithm yet. Maybe it's better to add this comment when
> >> you do.  
> > 
> > Is it better with this:
> > 
> > 	/*
> > 	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
> > 	 * complexity to avoid poor blending performance.
> > 	 *
> > 	 * The function vkms_compose_row is used to read a line, pixel-by-pixel, into the staging
> > 	 * buffer.
> > 	 */
> >    
> >> Also, I think it's good to give more context, like:
> >> "The planes are composed line-by-line, instead of pixel-by-pixel"  
> > 
> > And after PATCHv3 5/9:
> > 
> > 	/*
> > 	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
> > 	 * complexity to avoid poor blending performance.
> > 	 *
> > 	 * The function pixel_read_line callback is used to read a line, using an efficient 
> > 	 * algorithm for a specific format, into the staging buffer.
> > 	 */
> >   

Hi,

there are a few reasons for the line-by-line algorithm, and the
optimizations at large:

VKMS uses temporary stage and output buffers so that blending functions
can operate on just one high-precision pixel format, struct
pixel_argb_u16. We can make pixel-format-specific read and write
functions completely orthogonal from the blending operations and FB
format combinations. This avoids a combinatorial explosion of needed
functions for { input pixel formats × blending operations × output pixel
formats }.

We can use a temporary stage and output buffer whose size is one line
and not whole FB or CRTC framebuffer. This is the memory savings.

Using a temporary output buffer also avoids repeated
read-decode-blend-encode-write cycles into the final destination
buffer, as we don't need to decode/encode the pixel format.

Finally, doing elementary operations (read, blend, write) line-by-line
is much more efficient than pixel-by-pixel, because it allows making
the inner-most loop very tight. It avoids repeatedly computing a result
that does not change, like which function to call for a specific pixel
format or blending equation.


Thanks,
pq