(cc'ing lkml for feedback)

Hi Jocelyn

Am 19.10.23 um 15:55 schrieb Jocelyn Falempe:
> We found a regression in v5.10 on real-time server, using the
> rt-kernel and the mgag200 driver. It's some really specialized
> workload, with <10us latency expectation on isolated core.
> After the v5.10, the real time tasks missed their <10us latency
> when something prints on the screen (fbcon or printk)

I'd like to hear the opinion of the RT-devs on this patch. Because AFAIK 
we never did such a workaround in other drivers. And AFAIK printk is a 
PITA anyway.

IMHO if that RT system cannot handle differences in framebuffer caching, 
it's under-powered. It's just a matter of time until something else 
changes and the problem returns. And (honest question) as it's an 
x86-64, how do they handle System Management Mode?

> 
> The regression has been bisected to 2 commits:
> 0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
> 4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")
> 
> The first one changed the system memory framebuffer from Write-Combine
> to the default caching.
> Before the second commit, the mgag200 driver used to unmap the
> framebuffer after each frame, which implicitly does a cache flush.
> Both regressions are fixed by the following patch, which forces a
> cache flush after each frame, reverting to almost v5.9 behavior.

With that second commit, we essentially never unmap an active 
framebuffer console. But with commit

359c6649cd9a ("drm/gem: Implement shadow-plane {begin, end}_fb_access 
with vmap")

we now again unmap the console framebuffer after the pageflip happened.

So how does the latest kernel behave wrt to the problem?

> This is necessary only if you have strong realtime constraints, so I
> put the cache flush under the CONFIG_PREEMPT_RT config flag.
> Also clflush is only availabe on x86, (and this issue has only been
> reproduced on x86_64) so it's also under the CONFIG_X86 config flag.
> 
> Fixes: 0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
> Fixes: 4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")
> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
> ---
>   drivers/gpu/drm/mgag200/mgag200_mode.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/mgag200/mgag200_mode.c b/drivers/gpu/drm/mgag200/mgag200_mode.c
> index af3ce5a6a636..11660cd29cea 100644
> --- a/drivers/gpu/drm/mgag200/mgag200_mode.c
> +++ b/drivers/gpu/drm/mgag200/mgag200_mode.c
> @@ -13,6 +13,7 @@
>   
>   #include <drm/drm_atomic.h>
>   #include <drm/drm_atomic_helper.h>
> +#include <drm/drm_cache.h>
>   #include <drm/drm_damage_helper.h>
>   #include <drm/drm_format_helper.h>
>   #include <drm/drm_fourcc.h>
> @@ -436,6 +437,10 @@ static void mgag200_handle_damage(struct mga_device *mdev, const struct iosys_ma
>   
>   	iosys_map_incr(&dst, drm_fb_clip_offset(fb->pitches[0], fb->format, clip));
>   	drm_fb_memcpy(&dst, fb->pitches, vmap, fb, clip);
> +	/* On RT systems, flushing the cache reduces the latency for other RT tasks */
> +#if defined(CONFIG_X86) && defined(CONFIG_PREEMPT_RT)
> +	drm_clflush_virt_range(vmap, fb->height * fb->pitches[0]);
> +#endif

Your second commit is part of a larger patchset that updates several 
drivers. They might all be affected. So if anything, the patch should go 
here before the unmap call:

https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/drm_gem_atomic_helper.c#L377

with a much expanded comment.

But I'd really like to hear other peoples' opinions on the matter.

Best regards
Thomas

>   }
>   
>   /*
> 
> base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)