[PATCH] drm/mgag200: Flush the cache to improve latency

* [PATCH] drm/mgag200: Flush the cache to improve latency
@ 2023-10-19 13:55 Jocelyn Falempe
  2023-10-20 12:06 ` Thomas Zimmermann
  0 siblings, 1 reply; 9+ messages in thread
From: Jocelyn Falempe @ 2023-10-19 13:55 UTC (permalink / raw)
  To: dri-devel, tzimmermann, airlied, daniel; +Cc: Jocelyn Falempe

We found a regression in v5.10 on real-time server, using the
rt-kernel and the mgag200 driver. It's some really specialized
workload, with <10us latency expectation on isolated core.
After the v5.10, the real time tasks missed their <10us latency
when something prints on the screen (fbcon or printk)

The regression has been bisected to 2 commits:
0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")

The first one changed the system memory framebuffer from Write-Combine
to the default caching.
Before the second commit, the mgag200 driver used to unmap the
framebuffer after each frame, which implicitly does a cache flush.
Both regressions are fixed by the following patch, which forces a
cache flush after each frame, reverting to almost v5.9 behavior.
This is necessary only if you have strong realtime constraints, so I
put the cache flush under the CONFIG_PREEMPT_RT config flag.
Also clflush is only availabe on x86, (and this issue has only been
reproduced on x86_64) so it's also under the CONFIG_X86 config flag.

Fixes: 0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
Fixes: 4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
---
 drivers/gpu/drm/mgag200/mgag200_mode.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/mgag200/mgag200_mode.c b/drivers/gpu/drm/mgag200/mgag200_mode.c
index af3ce5a6a636..11660cd29cea 100644
--- a/drivers/gpu/drm/mgag200/mgag200_mode.c
+++ b/drivers/gpu/drm/mgag200/mgag200_mode.c
@@ -13,6 +13,7 @@
 
 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
+#include <drm/drm_cache.h>
 #include <drm/drm_damage_helper.h>
 #include <drm/drm_format_helper.h>
 #include <drm/drm_fourcc.h>
@@ -436,6 +437,10 @@ static void mgag200_handle_damage(struct mga_device *mdev, const struct iosys_ma
 
 	iosys_map_incr(&dst, drm_fb_clip_offset(fb->pitches[0], fb->format, clip));
 	drm_fb_memcpy(&dst, fb->pitches, vmap, fb, clip);
+	/* On RT systems, flushing the cache reduces the latency for other RT tasks */
+#if defined(CONFIG_X86) && defined(CONFIG_PREEMPT_RT)
+	drm_clflush_virt_range(vmap, fb->height * fb->pitches[0]);
+#endif
 }
 
 /*

base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread