From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <christian.koenig@amd.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"David Airlie" <airlied@linux.ie>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"Daniel Vetter" <daniel.vetter@intel.com>
Subject: Re: [PATCH v4 3/4] drm/shmem-helpers: Allocate wc pages on x86
Date: Fri, 23 Jul 2021 10:34:14 +0200	[thread overview]
Message-ID: <YPp/BlD8zrM98+6C@phenom.ffwll.local> (raw)
In-Reply-To: <be56fbe8-5151-ef8d-13cb-0b8a71f4d1e0@amd.com>

On Fri, Jul 23, 2021 at 10:02:39AM +0200, Christian König wrote:
> Am 23.07.21 um 09:36 schrieb Daniel Vetter:
> > On Thu, Jul 22, 2021 at 08:40:56PM +0200, Thomas Zimmermann wrote:
> > > Hi
> > > 
> > > Am 13.07.21 um 22:51 schrieb Daniel Vetter:
> > > [SNIP]
> > > > +#ifdef CONFIG_X86
> > > > +	if (shmem->map_wc)
> > > > +		set_pages_array_wc(pages, obj->size >> PAGE_SHIFT);
> > > > +#endif
> > > I cannot comment much on the technical details of the caching of various
> > > architectures. If this patch goes in, there should be a longer comment that
> > > reflects the discussion in this thread. It's apparently a workaround.
> > > 
> > > I think the call itself should be hidden behind a DRM API, which depends on
> > > CONFIG_X86. Something simple like
> > > 
> > > #ifdef CONFIG_X86
> > > drm_set_pages_array_wc()
> > > {
> > > 	set_pages_array_wc();
> > > }
> > > #else
> > > drm_set_pages_array_wc()
> > > {
> > > }
> > > #endif
> > > 
> > > Maybe in drm_cache.h?
> > We do have a bunch of this in drm_cache.h already, and architecture
> > maintainers hate us for it.
> 
> Yeah, for good reasons :)
> 
> > The real fix is to get at the architecture-specific wc allocator, which is
> > currently not something that's exposed, but hidden within the dma api. I
> > think having this stick out like this is better than hiding it behind fake
> > generic code (like we do with drm_clflush, which de facto also only really
> > works on x86).
> 
> The DMA API also doesn't really touch that stuff as far as I know.
> 
> What we do instead on other architectures is set the appropriate caching
> flags on the CPU mappings, see ttm_prot_from_caching().

This alone doesn't do cache flushes. And at least on some Arm CPUs, having
inconsistent mappings can lead to interconnect hangs, so you have to at
least punch out the kernel linear map. On some Arm cores that isn't possible
(because the kernel map is a special linear map and not done with
pagetables), which means you need to carve this memory out at boot and
treat it as GFP_HIGHMEM.

Afaik the dma-api has such an allocator somewhere, which does the right
thing for dma_alloc_coherent.

Also shmem helpers already set the caching pgprot.

> > Also note that ttm has the exact same ifdef in its page allocator, but it
> > does fall back to using dma_alloc_coherent on other platforms.
> 
> This works surprisingly well on non-x86 architectures as well. We just don't
> necessarily update the kernel mappings everywhere, which limits kmap usage.
> 
> In other words radeon and nouveau still work on PowerPC AGP systems as far
> as I know for example.

The thing is, on most CPUs you get away with just pgprot set to wc, and on
many others it's only an issue while there's still some cpu dirt hanging
around because they don't prefetch badly enough. It's very few where it's a
persistent problem.

Really the only reason I've even caught this was because some of the
i915+vgem buffer sharing tests we have are very nasty and intentionally
try to provoke the worst case :-)

Anyway, since you're looking, can you please review this and the previous
patch for shmem helpers?

The first one to make VM_PFNMAP standard for all dma-buf isn't ready yet,
because I need to audit all the drivers still. And at least i915 dma-buf
mmap is still using gup-able memory too. So more work to do here.
-Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > Best regards
> > > Thomas
> > > 
> > > > +
> > > >    	shmem->pages = pages;
> > > >    	return 0;
> > > > @@ -203,6 +212,11 @@ static void drm_gem_shmem_put_pages_locked(struct drm_gem_shmem_object *shmem)
> > > >    	if (--shmem->pages_use_count > 0)
> > > >    		return;
> > > > +#ifdef CONFIG_X86
> > > > +	if (shmem->map_wc)
> > > > +		set_pages_array_wb(shmem->pages, obj->size >> PAGE_SHIFT);
> > > > +#endif
> > > > +
> > > >    	drm_gem_put_pages(obj, shmem->pages,
> > > >    			  shmem->pages_mark_dirty_on_put,
> > > >    			  shmem->pages_mark_accessed_on_put);
> > > > 
> > > -- 
> > > Thomas Zimmermann
> > > Graphics Driver Developer
> > > SUSE Software Solutions Germany GmbH
> > > Maxfeldstr. 5, 90409 Nürnberg, Germany
> > > (HRB 36809, AG Nürnberg)
> > > Geschäftsführer: Felix Imendörffer
> > > 
> > 
> > 
> > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


