Re: [v2] drm/mgag200: Add a workaround for low-latency

From: Sui Jingfeng <sui.jingfeng@linux.dev>
To: Jocelyn Falempe <jfalempe@redhat.com>,
	dri-devel@lists.freedesktop.org, airlied@redhat.com,
	tzimmermann@suse.de, daniel@ffwll.ch
Subject: Re: [v2] drm/mgag200: Add a workaround for low-latency
Date: Tue, 12 Mar 2024 20:56:40 +0800	[thread overview]
Message-ID: <212a3a40-e1d1-4e2f-97f1-7039f92121f5@linux.dev> (raw)
In-Reply-To: <20240208095125.377908-1-jfalempe@redhat.com>

Hi,

Interesting patch! I know this patch already merged.
While study this patch, I have a few questions.

On 2024/2/8 17:51, Jocelyn Falempe wrote:
> We found a regression in v5.10 on real-time server, using the
> rt-kernel and the mgag200 driver. It's some really specialized
> workload, with <10us latency expectation on isolated core.
> After the v5.10, the real time tasks missed their <10us latency
> when something prints on the screen (fbcon or printk)
>
> The regression has been bisected to 2 commits:
> commit 0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
> commit 4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")
>
> The first one changed the system memory framebuffer from Write-Combine
> to the default caching.
> Before the second commit, the mgag200 driver used to unmap the
> framebuffer after each frame, which implicitly does a cache flush.

I don't know why it need to do a cache flush, where is the code.
I'm asking because I want to study this technique.

Generally speaking, X86-64 platform's default page caching is cached.
And I think the cached mapping is fastest for software rendering. And
the platform guaranteed the coherency for us, right?

Because X86-64 platform(or CPU)'s write buffer is implemented on the
top of cache? I'm means that for ARM(or other) CPU, when using Write-combine
the data will has nothing to do with cache.

> Both regressions are fixed by this commit, which restore WC mapping
> for the framebuffer in system memory, and add a cache flush.

So switch back to WC probably will decrease overall performance, I think.
And the cache flush operation should not have a impact. Except X86-64's
Write-Combine is different other platform's Write-Combine?

> This is only needed on x86_64, for low-latency workload,
> so the new kconfig DRM_MGAG200_IOBURST_WORKAROUND depends on
> PREEMPT_RT and X86.
>
> For more context, the whole thread can be found here [1]
>
> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
> Link: https://lore.kernel.org/dri-devel/20231019135655.313759-1-jfalempe@redhat.com/ # 1
> Acked-by: Thomas Zimmermann <tzimmermann@suse.de>