(cc'ing lkml for feedback)

Hi Jocelyn

On 19.10.23 at 15:55, Jocelyn Falempe wrote:
> We found a regression in v5.10 on real-time server, using the
> rt-kernel and the mgag200 driver. It's some really specialized
> workload, with <10us latency expectation on isolated core.
> After the v5.10, the real time tasks missed their <10us latency
> when something prints on the screen (fbcon or printk)

I'd like to hear the opinion of the RT-devs on this patch, because
AFAIK we have never done such a workaround in other drivers. And AFAIK
printk is a PITA anyway.

IMHO if that RT system cannot handle differences in framebuffer
caching, it's under-powered. It's just a matter of time until
something else changes and the problem returns. And (honest question)
as it's an x86-64, how do they handle System Management Mode?

>
> The regression has been bisected to 2 commits:
> 0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
> 4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")
>
> The first one changed the system memory framebuffer from Write-Combine
> to the default caching.
> Before the second commit, the mgag200 driver used to unmap the
> framebuffer after each frame, which implicitly does a cache flush.
> Both regressions are fixed by the following patch, which forces a
> cache flush after each frame, reverting to almost v5.9 behavior.

With that second commit, we essentially never unmap an active
framebuffer console. But with commit

359c6649cd9a ("drm/gem: Implement shadow-plane {begin, end}_fb_access
with vmap")

we now again unmap the console framebuffer after the pageflip
happened.

So how does the latest kernel behave with respect to the problem?

> This is necessary only if you have strong realtime constraints, so I
> put the cache flush under the CONFIG_PREEMPT_RT config flag.
> Also clflush is only available on x86, (and this issue has only been
> reproduced on x86_64) so it's also under the CONFIG_X86 config flag.
>
> Fixes: 0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
> Fixes: 4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")
> Signed-off-by: Jocelyn Falempe
> ---
>  drivers/gpu/drm/mgag200/mgag200_mode.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/mgag200/mgag200_mode.c b/drivers/gpu/drm/mgag200/mgag200_mode.c
> index af3ce5a6a636..11660cd29cea 100644
> --- a/drivers/gpu/drm/mgag200/mgag200_mode.c
> +++ b/drivers/gpu/drm/mgag200/mgag200_mode.c
> @@ -13,6 +13,7 @@
>
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -436,6 +437,10 @@ static void mgag200_handle_damage(struct mga_device *mdev, const struct iosys_ma
>
>  	iosys_map_incr(&dst, drm_fb_clip_offset(fb->pitches[0], fb->format, clip));
>  	drm_fb_memcpy(&dst, fb->pitches, vmap, fb, clip);
> +	/* On RT systems, flushing the cache reduces the latency for other RT tasks */
> +#if defined(CONFIG_X86) && defined(CONFIG_PREEMPT_RT)
> +	drm_clflush_virt_range(vmap, fb->height * fb->pitches[0]);
> +#endif

Your second commit is part of a larger patchset that updates several
drivers. They might all be affected. So if anything, the patch should
go here before the unmap call:

https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/drm_gem_atomic_helper.c#L377

with a much expanded comment.

But I'd really like to hear other people's opinions on the matter.

Best regards
Thomas

>  }
>
>  /*
>
> base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
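
For reference, the amount of work behind the quoted
drm_clflush_virt_range(vmap, fb->height * fb->pitches[0]) call can be
illustrated with a small userspace sketch. This is not the kernel
implementation: flush_virt_range() and CACHE_LINE_SIZE are hypothetical
names, the 64-byte line size is an assumption that holds on typical
x86-64 parts, and on other architectures the loop only counts lines
without flushing anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on typical x86-64 CPUs. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical userspace helper (not drm_clflush_virt_range() itself):
 * flush every cache line that overlaps [addr, addr + length) and
 * return the number of lines visited.
 */
static size_t flush_virt_range(const void *addr, size_t length)
{
	uintptr_t start, end, p;
	size_t lines = 0;

	if (length == 0)
		return 0;

	/* Round the start down to a cache-line boundary. */
	start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
	end = (uintptr_t)addr + length;

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	__builtin_ia32_mfence();	/* order earlier stores before the flushes */
#endif
	for (p = start; p < end; p += CACHE_LINE_SIZE) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
		__builtin_ia32_clflush((const void *)p);
#endif
		lines++;
	}
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	__builtin_ia32_mfence();	/* wait for the flushes to complete */
#endif

	return lines;
}
```

As an illustration with assumed numbers, a 768-line framebuffer with a
4096-byte pitch that starts on a cache-line boundary covers
768 * 4096 / 64 = 49152 lines, all of which are flushed on every
damage update regardless of the size of the clip rectangle.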