On Tue, 27 Feb 2024 15:47:08 -0300 Arthur Grillo wrote: > On 27/02/24 12:02, Louis Chauvet wrote: > > Le 26/02/24 - 10:07, Arthur Grillo a écrit : > >> > >> > >> On 26/02/24 05:46, Louis Chauvet wrote: > >>> Add some documentation on pixel conversion functions. > >>> Update of outdated comments for pixel_write functions. > >>> > >>> Signed-off-by: Louis Chauvet > >>> --- > >>> drivers/gpu/drm/vkms/vkms_composer.c | 4 +++ > >>> drivers/gpu/drm/vkms/vkms_drv.h | 13 ++++++++ > >>> drivers/gpu/drm/vkms/vkms_formats.c | 58 ++++++++++++++++++++++++++++++------ > >>> 3 files changed, 66 insertions(+), 9 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c > >>> index c6d9b4a65809..5b341222d239 100644 > >>> --- a/drivers/gpu/drm/vkms/vkms_composer.c > >>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c > >>> @@ -189,6 +189,10 @@ static void blend(struct vkms_writeback_job *wb, > >>> > >>> size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay; > >>> > >>> + /* > >>> + * The planes are composed line-by-line. It is a necessary complexity to avoid poor > >>> + * blending performance. > >> > >> At this moment in the series, you have not yet reintroduced the > >> line-by-line algorithm yet. Maybe it's better to add this comment when > >> you do. > > > > Is it better with this: > > > > /* > > * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary > > * complexity to avoid poor blending performance. > > * > > * The function vkms_compose_row is used to read a line, pixel-by-pixel, into the staging > > * buffer. > > */ > > > >> Also, I think it's good to give more context, like: > >> "The planes are composed line-by-line, instead of pixel-by-pixel" > > > > And after PATCHv3 5/9: > > > > /* > > * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary > > * complexity to avoid poor blending performance. > > * > > * The function pixel_read_line callback is used to read a line, using an efficient > > * algorithm for a specific format, into the staging buffer. > > */ > > Hi, there are a few reasons for the line-by-line algorithm, and the optimizations at large: VKMS uses temporary stage and output buffers so that blending functions can operate on just one high-precision pixel format, struct pixel_argb_u16. We can make pixel-format-specific read and write functions completely orthogonal from the blending operations and FB format combinations. This avoids a combinatorial explosion of needed functions for { input pixel formats × blending operations × output pixel formats }. We can use a temporary stage and output buffer whose size is one line and not whole FB or CRTC framebuffer. This is the memory savings. Using a temporary output buffer also avoids repeated read-decode-blend-encode-write cycles into the final destination buffer, as we don't need to decode/encode the pixel format. Finally, doing elementary operations (read, blend, write) line-by-line is much more efficient than pixel-by-pixel, because it allows making the inner-most loop very tight. It avoids repeatedly computing a result that does not change, like which function to call for a specific pixel format or blending equation. Thanks, pq