Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

From: Matthew Auld <matthew.william.auld@gmail.com>
To: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	Matthew Auld <matthew.auld@intel.com>,
	ML dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits
Date: Tue, 13 Jul 2021 17:13:37 +0100	[thread overview]
Message-ID: <CAM0jSHOx=WVbzfQzn=kL-5qaG4B3dxPLOimkvUdv6HFJymZeZw@mail.gmail.com> (raw)
In-Reply-To: <YO23Y3PUS22FaXDC@intel.com>

On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > +     /**
> > +      * @cache_coherent:
> > +      *
> > +      * Track whether the pages are coherent with the GPU if reading or
> > +      * writing through the CPU cache.
> > +      *
> > +      * This largely depends on the @cache_level, for example if the object
> > +      * is marked as I915_CACHE_LLC, then GPU access is coherent for both
> > +      * reads and writes through the CPU cache.
> > +      *
> > +      * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> > +      * the CPU cache are always coherent, regardless of the @cache_level. On
> > +      * snooping based platforms this is not the case, unless the full
> > +      * I915_CACHE_LLC or similar setting is used.
> > +      *
> > +      * As a result of this we need to track coherency separately for reads
> > +      * and writes, in order to avoid superfluous flushing on shared-LLC
> > +      * platforms, for reads.
> > +      *
> > +      * I915_BO_CACHE_COHERENT_FOR_READ:
> > +      *
> > +      * When reading through the CPU cache, the GPU is still coherent. Note
> > +      * that no data has actually been modified here, so it might seem
> > +      * strange that we care about this.
> > +      *
> > +      * As an example, if some object is mapped on the CPU with write-back
> > +      * caching, and we read some page, then the cache likely now contains
> > +      * the data from that read. At this point the cache and main memory
> > +      * match up, so all good. But next the GPU needs to write some data to
> > +      * that same page. Now if the @cache_level is I915_CACHE_NONE and the
> > +      * the platform doesn't have the shared-LLC, then the GPU will
> > +      * effectively skip invalidating the cache(or however that works
> > +      * internally) when writing the new value.  This is really bad since the
> > +      * GPU has just written some new data to main memory, but the CPU cache
> > +      * is still valid and now contains stale data. As a result the next time
> > +      * we do a cached read with the CPU, we are rewarded with stale data.
> > +      * Likewise if the cache is later flushed, we might be rewarded with
> > +      * overwriting main memory with stale data.
> > +      *
> > +      * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > +      *
> > +      * When writing through the CPU cache, the GPU is still coherent. Note
> > +      * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > +      *
> > +      * This is never set when I915_CACHE_NONE is used for @cache_level,
> > +      * where instead we have to manually flush the caches after writing
> > +      * through the CPU cache. For other cache levels this should be set and
> > +      * the object is therefore considered coherent for both reads and writes
> > +      * through the CPU cache.
>
> I don't remember why we have this read vs. write split and this new
> documentation doesn't seem to really explain it either.

Hmm, I attempted to explain that earlier:

* Note that on platforms with shared-LLC support(HAS_LLC) reads through
* the CPU cache are always coherent, regardless of the @cache_level. On
* snooping based platforms this is not the case, unless the full
* I915_CACHE_LLC or similar setting is used.
*
* As a result of this we need to track coherency separately for reads
* and writes, in order to avoid superfluous flushing on shared-LLC
* platforms, for reads.

So AFAIK it's just because shared-LLC can be coherent for reads, while
also not being coherent for writes(CACHE_NONE), so being able to track
each separately is kind of needed to avoid unnecessary flushing for
the read cases i.e simple boolean for coherent vs non-coherent is not
enough.

I can try to reword things to make that more clear.

>
> Is it for optimizing some display related case where we can omit the
> invalidates but still have to do the writeback to keep the display
> engine happy?
>
> --
> Ville Syrjälä
> Intel