intel-gfx.lists.freedesktop.org archive mirror
* [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits
@ 2021-07-22 11:34 Matthew Auld
  2021-07-22 11:34 ` [Intel-gfx] [PATCH v3 2/2] drm/i915/ehl: unconditionally flush the pages on acquire Matthew Auld
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Matthew Auld @ 2021-07-22 11:34 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, dri-devel

Try to document the object caching related bits, like cache_coherent and
cache_dirty.

v2(Ville):
 - As pointed out by Ville, fix the completely incorrect assumptions
   about the "partial" coherency on shared LLC platforms.
v3(Daniel):
 - Fix nonsense about "dirtying" the cache with reads.

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 176 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_drv.h               |   9 -
 2 files changed, 172 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index afbadfc5516b..40cce816a7e3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -92,6 +92,76 @@ struct drm_i915_gem_object_ops {
 	const char *name; /* friendly name for debug, e.g. lockdep classes */
 };
 
+/**
+ * enum i915_cache_level - The supported GTT caching values for system memory
+ * pages.
+ *
+ * These translate to some special GTT PTE bits when binding pages into some
+ * address space. It also determines whether an object, or rather its pages are
+ * coherent with the GPU, when also reading or writing through the CPU cache
+ * with those pages.
+ *
+ * Userspace can also control this through struct drm_i915_gem_caching.
+ */
+enum i915_cache_level {
+	/**
+	 * @I915_CACHE_NONE:
+	 *
+	 * Not coherent with the CPU cache. If the cache is dirty and we need
+	 * the underlying pages to be coherent with some later GPU access then
+	 * we need to manually flush the pages.
+	 *
+	 * Note that on shared LLC platforms reads and writes through the CPU
+	 * cache are still coherent even with this setting. See also
+	 * &drm_i915_gem_object.cache_coherent for more details.
+	 *
+	 * Note that on platforms with a shared LLC this should ideally only be
+	 * used for scanout surfaces, otherwise we end up over-flushing in some
+	 * places.
+	 */
+	I915_CACHE_NONE = 0,
+	/**
+	 * @I915_CACHE_LLC:
+	 *
+	 * Coherent with the CPU cache. If the cache is dirty, then the GPU will
+	 * ensure that access remains coherent, when both reading and writing
+	 * through the CPU cache.
+	 *
+	 * Not used for scanout surfaces.
+	 *
+	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
+	 * based platforms(HAS_SNOOP).
+	 *
+	 * This should be the default for platforms which share the LLC with the
+	 * CPU. The only exception is scanout objects, where the display engine
+	 * is not coherent with the LLC. For such objects I915_CACHE_NONE or
+	 * I915_CACHE_WT should be used.
+	 */
+	I915_CACHE_LLC,
+	/**
+	 * @I915_CACHE_L3_LLC:
+	 *
+	 * Explicitly enable the Gfx L3 cache, with snooped LLC.
+	 *
+	 * The Gfx L3 sits between the domain specific caches, e.g
+	 * sampler/render caches, and the larger LLC. LLC is coherent with the
+	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
+	 * when the workload completes.
+	 *
+	 * Not used for scanout surfaces.
+	 *
+	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
+	 * this.
+	 */
+	I915_CACHE_L3_LLC,
+	/**
+	 * @I915_CACHE_WT:
+	 *
+	 * hsw:gt3e Write-through for scanout buffers.
+	 */
+	I915_CACHE_WT,
+};
+
 enum i915_map_type {
 	I915_MAP_WB = 0,
 	I915_MAP_WC,
@@ -229,14 +299,112 @@ struct drm_i915_gem_object {
 	unsigned int mem_flags;
 #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
 #define I915_BO_FLAG_IOMEM       BIT(1) /* Object backed by IO memory */
-	/*
-	 * Is the object to be mapped as read-only to the GPU
-	 * Only honoured if hardware has relevant pte bit
+	/**
+	 * @cache_level: The desired GTT caching level.
+	 *
+	 * See enum i915_cache_level for possible values, along with what
+	 * each does.
 	 */
 	unsigned int cache_level:3;
-	unsigned int cache_coherent:2;
+	/**
+	 * @cache_coherent:
+	 *
+	 * Track whether the pages are coherent with the GPU if reading or
+	 * writing through the CPU caches. This largely depends on the
+	 * @cache_level setting.
+	 *
+	 * On platforms which don't have the shared LLC(HAS_SNOOP), like on Atom
+	 * platforms, coherency must be explicitly requested with some special
+	 * GTT caching bits(see enum i915_cache_level). When enabling coherency
+	 * it does come at a performance and power cost on such platforms. On
+	 * the flip side the kernel does need to manually flush any buffers
+	 * which need to be coherent with the GPU, if the object is not
+	 * coherent i.e @cache_coherent is zero.
+	 *
+	 * On platforms that share the LLC with the CPU(HAS_LLC), all GT memory
+	 * access will automatically snoop the CPU caches(even with CACHE_NONE).
+	 * The one exception is when dealing with the display engine, like with
+	 * scanout surfaces. To handle this the kernel will always flush the
+	 * surface out of the CPU caches when preparing it for scanout.  Also
+	 * note that since scanout surfaces are only ever read by the display
+	 * engine we only need to care about flushing any writes through the CPU
+	 * cache, reads on the other hand will always be coherent.
+	 *
+	 * Something strange here is why @cache_coherent is not a simple
+	 * boolean, i.e coherent vs non-coherent. The reasoning for this is back
+	 * to the display engine not being fully coherent. As a result scanout
+	 * surfaces will either be marked as I915_CACHE_NONE or I915_CACHE_WT.
+	 * In the case of seeing I915_CACHE_NONE the kernel makes the assumption
+	 * that this is likely a scanout surface, and will set @cache_coherent
+	 * as only I915_BO_CACHE_COHERENT_FOR_READ, on platforms with the shared
+	 * LLC. The kernel uses this to always flush writes through the CPU
+	 * cache as early as possible, where it can, in effect keeping
+	 * @cache_dirty clean, so we can potentially avoid stalling when
+	 * flushing the surface just before doing the scanout.  This does mean
+	 * we might unnecessarily flush non-scanout objects in some places, but
+	 * the default assumption is that all normal objects should be using
+	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
+	 *
+	 * Supported values:
+	 *
+	 * I915_BO_CACHE_COHERENT_FOR_READ:
+	 *
+	 * On shared LLC platforms, we use this for special scanout surfaces,
+	 * where the display engine is not coherent with the CPU cache. As such
+	 * we need to ensure we flush any writes before doing the scanout. As an
+	 * optimisation we try to flush any writes as early as possible to avoid
+	 * stalling later.
+	 *
+	 * Thus for scanout surfaces using I915_CACHE_NONE, on shared LLC
+	 * platforms, we use:
+	 *
+	 *	cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ
+	 *
+	 * While for normal objects that are fully coherent we use:
+	 *
+	 *	cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
+	 *			 I915_BO_CACHE_COHERENT_FOR_WRITE
+	 *
+	 * And then for objects that are not coherent at all we use:
+	 *
+	 *	cache_coherent = 0
+	 *
+	 * I915_BO_CACHE_COHERENT_FOR_WRITE:
+	 *
+	 * When writing through the CPU cache, the GPU is still coherent. Note
+	 * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
+	 */
 #define I915_BO_CACHE_COHERENT_FOR_READ BIT(0)
 #define I915_BO_CACHE_COHERENT_FOR_WRITE BIT(1)
+	unsigned int cache_coherent:2;
+
+	/**
+	 * @cache_dirty:
+	 *
+	 * Track if we are dirty with writes through the CPU cache for this
+	 * object. As a result reading directly from main memory might yield
+	 * stale data.
+	 *
+	 * This also ties into whether the kernel is tracking the object as
+	 * coherent with the GPU, as per @cache_coherent, as it determines if
+	 * flushing might be needed at various points.
+	 *
+	 * Another part of @cache_dirty is managing flushing when first
+	 * acquiring the pages for system memory, at this point the pages are
+	 * considered foreign, so the default assumption is that the cache is
+	 * dirty, for example the page zeroing done by the kernel might leave
+	 * writes through the CPU cache, or swapping-in, while the actual data
+	 * in main memory is potentially stale.  Note that this is a potential
+	 * security issue when dealing with userspace objects and zeroing. Now,
+	 * whether we actually need to apply the big sledgehammer of flushing
+	 * all the pages on acquire depends on if @cache_coherent is marked as
+	 * I915_BO_CACHE_COHERENT_FOR_WRITE, i.e that the GPU will be coherent
+	 * for both reads and writes through the CPU cache.
+	 *
+	 * Note that on shared LLC platforms we still apply the heavy flush for
+	 * I915_CACHE_NONE objects, under the assumption that this is going to
+	 * be used for scanout.
+	 */
 	unsigned int cache_dirty:1;
 
 	/**
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0321a1f9738d..f97792ccc199 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -394,15 +394,6 @@ struct drm_i915_display_funcs {
 	void (*read_luts)(struct intel_crtc_state *crtc_state);
 };
 
-enum i915_cache_level {
-	I915_CACHE_NONE = 0,
-	I915_CACHE_LLC, /* also used for snoopable memory on non-LLC */
-	I915_CACHE_L3_LLC, /* gen7+, L3 sits between the domain specifc
-			      caches, eg sampler/render caches, and the
-			      large Last-Level-Cache. LLC is coherent with
-			      the CPU, but L3 is only visible to the GPU. */
-	I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
-};
 
 #define I915_COLOR_UNEVICTABLE (-1) /* a non-vma sharing the address space */
 
-- 
2.26.3
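
For readers following the @cache_coherent documentation above, the three cases it lists can be sketched as a small standalone C model. This is only an illustration under stated assumptions: sketch_cache_coherent() and its has_llc parameter are hypothetical names, not the driver's function, and treating every level other than I915_CACHE_NONE (including I915_CACHE_WT) as fully coherent is an assumption the patch text does not spell out.

```c
#include <stdbool.h>

/* Same enumerators and order as the enum documented in the patch. */
enum i915_cache_level {
	I915_CACHE_NONE = 0,
	I915_CACHE_LLC,
	I915_CACHE_L3_LLC,
	I915_CACHE_WT,
};

#define I915_BO_CACHE_COHERENT_FOR_READ  (1u << 0)
#define I915_BO_CACHE_COHERENT_FOR_WRITE (1u << 1)

/*
 * Hypothetical helper (not the driver's code): derive the coherency bits
 * from the cache level and whether the platform shares the LLC.
 */
static unsigned int sketch_cache_coherent(enum i915_cache_level level,
					  bool has_llc)
{
	/* Assumption: any level other than NONE is fully coherent. */
	if (level != I915_CACHE_NONE)
		return I915_BO_CACHE_COHERENT_FOR_READ |
		       I915_BO_CACHE_COHERENT_FOR_WRITE;

	/* CACHE_NONE on shared LLC: likely scanout, coherent for reads only. */
	if (has_llc)
		return I915_BO_CACHE_COHERENT_FOR_READ;

	/* CACHE_NONE without shared LLC: not coherent at all. */
	return 0;
}
```

With this model, an I915_CACHE_NONE object on a shared LLC platform ends up with only I915_BO_CACHE_COHERENT_FOR_READ set, which is the case the kernel-doc describes as triggering early flushes of CPU writes.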

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [Intel-gfx] [PATCH v3 2/2] drm/i915/ehl: unconditionally flush the pages on acquire
  2021-07-22 11:34 [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits Matthew Auld
@ 2021-07-22 11:34 ` Matthew Auld
  2021-07-22 11:54 ` [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits Daniel Vetter
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Matthew Auld @ 2021-07-22 11:34 UTC (permalink / raw)
  To: intel-gfx
  Cc: Daniel Vetter, Lucas De Marchi, dri-devel, Chris Wilson, Francisco Jerez

EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
possible for userspace to bypass the GTT caching bits set by the kernel,
as per the given object cache_level. This is troublesome since the heavy
flush we apply when first acquiring the pages is skipped if the kernel
thinks the object is coherent with the GPU. As a result it might be
possible to bypass the cache and read the contents of the page directly,
which could be stale data. If it's just a case of userspace shooting
themselves in the foot then so be it, but since i915 takes the stance of
always zeroing memory before handing it to userspace, we need to prevent
this.

v2: this time actually set cache_dirty in put_pages()
v3: move to get_pages() which looks simpler

BSpec: 34007
References: 046091758b50 ("Revert "drm/i915/ehl: Update MOCS table for EHL"")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Cc: Francisco Jerez <francisco.jerez.plata@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h   |  6 ++++++
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c      | 18 ++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 40cce816a7e3..f0948f6b1e1d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -404,6 +404,12 @@ struct drm_i915_gem_object {
 	 * Note that on shared LLC platforms we still apply the heavy flush for
 	 * I915_CACHE_NONE objects, under the assumption that this is going to
 	 * be used for scanout.
+	 *
+	 * Update: On some hardware there is now also the 'Bypass LLC' MOCS
+	 * entry, which defeats our @cache_coherent tracking, since userspace
+	 * can freely bypass the CPU cache when touching the pages with the GPU,
+	 * where the kernel is completely unaware. On such platforms we need to
+	 * apply the sledgehammer-on-acquire regardless of the @cache_coherent.
 	 */
 	unsigned int cache_dirty:1;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 6a04cce188fc..11f072193f3b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -182,6 +182,24 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
 	if (i915_gem_object_needs_bit17_swizzle(obj))
 		i915_gem_object_do_bit_17_swizzle(obj, st);
 
+	/*
+	 * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
+	 * possible for userspace to bypass the GTT caching bits set by the
+	 * kernel, as per the given object cache_level. This is troublesome
+	 * since the heavy flush we apply when first gathering the pages is
+	 * skipped if the kernel thinks the object is coherent with the GPU. As
+	 * a result it might be possible to bypass the cache and read the
+	 * contents of the page directly, which could be stale data. If it's
+	 * just a case of userspace shooting themselves in the foot then so be
+	 * it, but since i915 takes the stance of always zeroing memory before
+	 * handing it to userspace, we need to prevent this.
+	 *
+	 * By setting cache_dirty here we make the clflush in set_pages
+	 * unconditional on such platforms.
+	 */
+	if (IS_JSL_EHL(i915) && obj->flags & I915_BO_ALLOC_USER)
+		obj->cache_dirty = true;
+
 	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
 
 	return 0;
-- 
2.26.3
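
The flush-on-acquire decision described in this patch can be modelled in isolation roughly as below. This is a hedged sketch, not the driver's implementation: sketch_flush_on_acquire() and its two boolean parameters are hypothetical stand-ins for the IS_JSL_EHL(i915) and I915_BO_ALLOC_USER checks in the hunk above, and the fallback rule is taken from the @cache_dirty documentation in patch 1.

```c
#include <stdbool.h>

#define I915_BO_CACHE_COHERENT_FOR_READ  (1u << 0)
#define I915_BO_CACHE_COHERENT_FOR_WRITE (1u << 1)

/*
 * Hypothetical helper (not the driver's code): decide whether freshly
 * acquired pages should be treated as dirty, i.e. flushed out of the
 * CPU cache before the GPU may touch them.
 */
static bool sketch_flush_on_acquire(unsigned int cache_coherent,
				    bool has_bypass_llc_mocs,
				    bool is_user_object)
{
	/*
	 * Platforms with the 'Bypass LLC' MOCS entry: userspace can sidestep
	 * the kernel's coherency tracking, so always flush user objects.
	 */
	if (has_bypass_llc_mocs && is_user_object)
		return true;

	/* Otherwise flush only when CPU writes are not coherent with the GPU. */
	return !(cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE);
}
```

The point of the first branch is that a fully coherent object, which would normally skip the flush, still gets it when userspace could bypass the cache and observe stale, non-zeroed data.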


* Re: [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits
  2021-07-22 11:34 [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits Matthew Auld
  2021-07-22 11:34 ` [Intel-gfx] [PATCH v3 2/2] drm/i915/ehl: unconditionally flush the pages on acquire Matthew Auld
@ 2021-07-22 11:54 ` Daniel Vetter
  2021-07-23  8:58   ` Matthew Auld
  2021-07-22 13:30 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [v3,1/2] " Patchwork
  2021-07-22 14:01 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  3 siblings, 1 reply; 6+ messages in thread
From: Daniel Vetter @ 2021-07-22 11:54 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Daniel Vetter, intel-gfx, dri-devel

On Thu, Jul 22, 2021 at 12:34:55PM +0100, Matthew Auld wrote:
> Try to document the object caching related bits, like cache_coherent and
> cache_dirty.
> 
> v2(Ville):
>  - As pointed out by Ville, fix the completely incorrect assumptions
>    about the "partial" coherency on shared LLC platforms.
> v3(Daniel):
>  - Fix nonsense about "dirtying" the cache with reads.
> 
> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 176 +++++++++++++++++-
>  drivers/gpu/drm/i915/i915_drv.h               |   9 -
>  2 files changed, 172 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index afbadfc5516b..40cce816a7e3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -92,6 +92,76 @@ struct drm_i915_gem_object_ops {
>  	const char *name; /* friendly name for debug, e.g. lockdep classes */
>  };
>  
> +/**
> + * enum i915_cache_level - The supported GTT caching values for system memory
> + * pages.
> + *
> + * These translate to some special GTT PTE bits when binding pages into some
> + * address space. It also determines whether an object, or rather its pages are
> + * coherent with the GPU, when also reading or writing through the CPU cache
> + * with those pages.
> + *
> + * Userspace can also control this through struct drm_i915_gem_caching.
> + */
> +enum i915_cache_level {
> +	/**
> +	 * @I915_CACHE_NONE:
> +	 *
> +	 * Not coherent with the CPU cache. If the cache is dirty and we need
> +	 * the underlying pages to be coherent with some later GPU access then
> +	 * we need to manually flush the pages.
> +	 *
> +	 * Note that on shared LLC platforms reads and writes through the CPU
> +	 * cache are still coherent even with this setting. See also
> +	 * &drm_i915_gem_object.cache_coherent for more details.
> +	 *
> +	 * Note that on platforms with a shared LLC this should ideally only be

Merge this with the previous note and maybe explain it with "Due to this
we should only use uncached for scanout surfaces on platforms with shared
LLC, otherwise ..."

As-is it reads a bit awkward/repetitive.

> +	 * used for scanout surfaces, otherwise we end up over-flushing in some
> +	 * places.

Maybe also note that on non-LLC platforms uncached is the default.

> +	 */
> +	I915_CACHE_NONE = 0,
> +	/**
> +	 * @I915_CACHE_LLC:
> +	 *
> +	 * Coherent with the CPU cache. If the cache is dirty, then the GPU will
> +	 * ensure that access remains coherent, when both reading and writing
> +	 * through the CPU cache.
> +	 *
> +	 * Not used for scanout surfaces.
> +	 *
> +	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
> +	 * based platforms(HAS_SNOOP).
> +	 *
> +	 * This should be the default for platforms which share the LLC with the
s/should/is/

After all it _is_ the default at object creation time.

> +	 * CPU. The only exception is scanout objects, where the display engine
> +	 * is not coherent with the LLC. For such objects I915_CACHE_NONE or
> +	 * I915_CACHE_WT should be used.

Maybe clarify that we automatically apply this transition upon
pin_for_display if userspace hasn't done it.

> +	 */
> +	I915_CACHE_LLC,
> +	/**
> +	 * @I915_CACHE_L3_LLC:
> +	 *
> +	 * Explicitly enable the Gfx L3 cache, with snooped LLC.
> +	 *
> +	 * The Gfx L3 sits between the domain specific caches, e.g
> +	 * sampler/render caches, and the larger LLC. LLC is coherent with the
> +	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
> +	 * when the workload completes.
> +	 *
> +	 * Not used for scanout surfaces.
> +	 *
> +	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
> +	 * this.

I think it's also the default on these?

> +	 */
> +	I915_CACHE_L3_LLC,

> +	/**
> +	 * @I915_CACHE_WT:
> +	 *
> +	 * hsw:gt3e Write-through for scanout buffers.

I haven't checked, but are we using this automatically?

> +	 */
> +	I915_CACHE_WT,
> +};
> +
>  enum i915_map_type {
>  	I915_MAP_WB = 0,
>  	I915_MAP_WC,
> @@ -229,14 +299,112 @@ struct drm_i915_gem_object {
>  	unsigned int mem_flags;
>  #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
>  #define I915_BO_FLAG_IOMEM       BIT(1) /* Object backed by IO memory */
> -	/*
> -	 * Is the object to be mapped as read-only to the GPU
> -	 * Only honoured if hardware has relevant pte bit
> +	/**
> +	 * @cache_level: The desired GTT caching level.
> +	 *
> +	 * See enum i915_cache_level for possible values, along with what
> +	 * each does.
>  	 */
>  	unsigned int cache_level:3;
> -	unsigned int cache_coherent:2;
> +	/**
> +	 * @cache_coherent:
> +	 *
> +	 * Track whether the pages are coherent with the GPU if reading or
> +	 * writing through the CPU caches. This largely depends on the
> +	 * @cache_level setting.
> +	 *
> +	 * On platforms which don't have the shared LLC(HAS_SNOOP), like on Atom
> +	 * platforms, coherency must be explicitly requested with some special
> +	 * GTT caching bits(see enum i915_cache_level). When enabling coherency
> +	 * it does come at a performance and power cost on such platforms. On
> +	 * the flip side the kernel does need to manually flush any buffers

does _not_ need

I think at least that's what you mean here.

> +	 * which need to be coherent with the GPU, if the object is not
> +	 * coherent i.e @cache_coherent is zero.
> +	 *
> +	 * On platforms that share the LLC with the CPU(HAS_LLC), all GT memory
> +	 * access will automatically snoop the CPU caches(even with CACHE_NONE).
> +	 * The one exception is when dealing with the display engine, like with
> +	 * scanout surfaces. To handle this the kernel will always flush the
> +	 * surface out of the CPU caches when preparing it for scanout.  Also
> +	 * note that since scanout surfaces are only ever read by the display
> +	 * engine we only need to care about flushing any writes through the CPU
> +	 * cache, reads on the other hand will always be coherent.
> +	 *
> +	 * Something strange here is why @cache_coherent is not a simple
> +	 * boolean, i.e coherent vs non-coherent. The reasoning for this is back
> +	 * to the display engine not being fully coherent. As a result scanout
> +	 * surfaces will either be marked as I915_CACHE_NONE or I915_CACHE_WT.
> +	 * In the case of seeing I915_CACHE_NONE the kernel makes the assumption
> +	 * that this is likely a scanout surface, and will set @cache_coherent
> +	 * as only I915_BO_CACHE_COHERENT_FOR_READ, on platforms with the shared

Do we only do this for NONE, and not for WT? That would be a bit of a bug I
guess ...

> +	 * LLC. The kernel uses this to always flush writes through the CPU
> +	 * cache as early as possible, where it can, in effect keeping
> +	 * @cache_dirty clean, so we can potentially avoid stalling when
> +	 * flushing the surface just before doing the scanout.  This does mean
> +	 * we might unnecessarily flush non-scanout objects in some places, but
> +	 * the default assumption is that all normal objects should be using
> +	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
> +	 *
> +	 * Supported values:
> +	 *
> +	 * I915_BO_CACHE_COHERENT_FOR_READ:
> +	 *
> +	 * On shared LLC platforms, we use this for special scanout surfaces,
> +	 * where the display engine is not coherent with the CPU cache. As such
> +	 * we need to ensure we flush any writes before doing the scanout. As an
> +	 * optimisation we try to flush any writes as early as possible to avoid
> +	 * stalling later.
> +	 *
> +	 * Thus for scanout surfaces using I915_CACHE_NONE, on shared LLC
> +	 * platforms, we use:
> +	 *
> +	 *	cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ
> +	 *
> +	 * While for normal objects that are fully coherent we use:
> +	 *
> +	 *	cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
> +	 *			 I915_BO_CACHE_COHERENT_FOR_WRITE
> +	 *
> +	 * And then for objects that are not coherent at all we use:
> +	 *
> +	 *	cache_coherent = 0
> +	 *
> +	 * I915_BO_CACHE_COHERENT_FOR_WRITE:
> +	 *
> +	 * When writing through the CPU cache, the GPU is still coherent. Note
> +	 * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> +	 */
>  #define I915_BO_CACHE_COHERENT_FOR_READ BIT(0)
>  #define I915_BO_CACHE_COHERENT_FOR_WRITE BIT(1)
> +	unsigned int cache_coherent:2;
> +
> +	/**
> +	 * @cache_dirty:
> +	 *
> +	 * Track if we are dirty with writes through the CPU cache for this
> +	 * object. As a result reading directly from main memory might yield
> +	 * stale data.
> +	 *
> +	 * This also ties into whether the kernel is tracking the object as
> +	 * coherent with the GPU, as per @cache_coherent, as it determines if
> +	 * flushing might be needed at various points.
> +	 *
> +	 * Another part of @cache_dirty is managing flushing when first
> +	 * acquiring the pages for system memory, at this point the pages are
> +	 * considered foreign, so the default assumption is that the cache is
> +	 * dirty, for example the page zeroing done by the kernel might leave
> +	 * writes through the CPU cache, or swapping-in, while the actual data
> +	 * in main memory is potentially stale.  Note that this is a potential
> +	 * security issue when dealing with userspace objects and zeroing. Now,
> +	 * whether we actually need to apply the big sledgehammer of flushing
> +	 * all the pages on acquire depends on if @cache_coherent is marked as
> +	 * I915_BO_CACHE_COHERENT_FOR_WRITE, i.e that the GPU will be coherent
> +	 * for both reads and writes through the CPU cache.
> +	 *
> +	 * Note that on shared LLC platforms we still apply the heavy flush for
> +	 * I915_CACHE_NONE objects, under the assumption that this is going to
> +	 * be used for scanout.
> +	 */

I feel like rethinking all our special cases here would be really good,
especially around whether we need to flush for security concerns, or not.

E.g. on !LLC platforms, if we set an object to CACHE_LLC, but then use
MOCS to not access it as such: Can we bypass the CPU cache and potentially
get stale data because i915 didn't force the clflush for this case?

>  	unsigned int cache_dirty:1;
>  
>  	/**
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 0321a1f9738d..f97792ccc199 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -394,15 +394,6 @@ struct drm_i915_display_funcs {
>  	void (*read_luts)(struct intel_crtc_state *crtc_state);
>  };
>  
> -enum i915_cache_level {
> -	I915_CACHE_NONE = 0,
> -	I915_CACHE_LLC, /* also used for snoopable memory on non-LLC */
> -	I915_CACHE_L3_LLC, /* gen7+, L3 sits between the domain specifc
> -			      caches, eg sampler/render caches, and the
> -			      large Last-Level-Cache. LLC is coherent with
> -			      the CPU, but L3 is only visible to the GPU. */
> -	I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
> -};
>  
>  #define I915_COLOR_UNEVICTABLE (-1) /* a non-vma sharing the address space */

With the nits addressed:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

>  
> -- 
> 2.26.3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [v3,1/2] drm/i915: document caching related bits
  2021-07-22 11:34 [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits Matthew Auld
  2021-07-22 11:34 ` [Intel-gfx] [PATCH v3 2/2] drm/i915/ehl: unconditionally flush the pages on acquire Matthew Auld
  2021-07-22 11:54 ` [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits Daniel Vetter
@ 2021-07-22 13:30 ` Patchwork
  2021-07-22 14:01 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  3 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2021-07-22 13:30 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v3,1/2] drm/i915: document caching related bits
URL   : https://patchwork.freedesktop.org/series/92889/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
c4aff2fea027 drm/i915: document caching related bits
abcea5c46d5f drm/i915/ehl: unconditionally flush the pages on acquire
-:21: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#21: 
References: 046091758b50 ("Revert "drm/i915/ehl: Update MOCS table for EHL"")

-:21: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 046091758b50 ("Revert "drm/i915/ehl: Update MOCS table for EHL"")'
#21: 
References: 046091758b50 ("Revert "drm/i915/ehl: Update MOCS table for EHL"")

total: 1 errors, 1 warnings, 0 checks, 36 lines checked



* [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [v3,1/2] drm/i915: document caching related bits
  2021-07-22 11:34 [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits Matthew Auld
                   ` (2 preceding siblings ...)
  2021-07-22 13:30 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [v3,1/2] " Patchwork
@ 2021-07-22 14:01 ` Patchwork
  3 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2021-07-22 14:01 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-gfx



== Series Details ==

Series: series starting with [v3,1/2] drm/i915: document caching related bits
URL   : https://patchwork.freedesktop.org/series/92889/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10371 -> Patchwork_20677
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_20677 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_20677, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20677/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_20677:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_pm_rpm@basic-rte:
    - fi-bdw-5557u:       NOTRUN -> [FAIL][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20677/fi-bdw-5557u/igt@i915_pm_rpm@basic-rte.html

  
Known issues
------------

  Here are the changes found in Patchwork_20677 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@query-info:
    - fi-bsw-kefka:       NOTRUN -> [SKIP][2] ([fdo#109271]) +17 similar issues
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20677/fi-bsw-kefka/igt@amdgpu/amd_basic@query-info.html

  * igt@amdgpu/amd_basic@semaphore:
    - fi-bdw-5557u:       NOTRUN -> [SKIP][3] ([fdo#109271]) +25 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20677/fi-bdw-5557u/igt@amdgpu/amd_basic@semaphore.html

  * igt@core_hotunplug@unbind-rebind:
    - fi-bdw-5557u:       NOTRUN -> [WARN][4] ([i915#3718])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20677/fi-bdw-5557u/igt@core_hotunplug@unbind-rebind.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3:
    - {fi-tgl-1115g4}:    [FAIL][5] ([i915#1888]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10371/fi-tgl-1115g4/igt@gem_exec_suspend@basic-s3.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20677/fi-tgl-1115g4/igt@gem_exec_suspend@basic-s3.html

  * igt@i915_selftest@live@execlists:
    - fi-bsw-kefka:       [INCOMPLETE][7] ([i915#2782] / [i915#2940]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10371/fi-bsw-kefka/igt@i915_selftest@live@execlists.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20677/fi-bsw-kefka/igt@i915_selftest@live@execlists.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#2782]: https://gitlab.freedesktop.org/drm/intel/issues/2782
  [i915#2940]: https://gitlab.freedesktop.org/drm/intel/issues/2940
  [i915#3718]: https://gitlab.freedesktop.org/drm/intel/issues/3718


Participating hosts (38 -> 35)
------------------------------

  Missing    (3): fi-ilk-m540 fi-bdw-samus fi-hsw-4200u 


Build changes
-------------

  * Linux: CI_DRM_10371 -> Patchwork_20677

  CI-20190529: 20190529
  CI_DRM_10371: 8e68c13425e29c96ef94c9dd3583159000c61380 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6147: f3994c2cd99a1acfe991a8cc838a387dcb36598a @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_20677: abcea5c46d5f1656718eced4b854ca680d4d61f4 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

abcea5c46d5f drm/i915/ehl: unconditionally flush the pages on acquire
c4aff2fea027 drm/i915: document caching related bits

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20677/index.html

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits
  2021-07-22 11:54 ` [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits Daniel Vetter
@ 2021-07-23  8:58   ` Matthew Auld
  0 siblings, 0 replies; 6+ messages in thread
From: Matthew Auld @ 2021-07-23  8:58 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Daniel Vetter, Intel Graphics Development, Matthew Auld, ML dri-devel

On Thu, 22 Jul 2021 at 12:54, Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Thu, Jul 22, 2021 at 12:34:55PM +0100, Matthew Auld wrote:
> > Try to document the object caching related bits, like cache_coherent and
> > cache_dirty.
> >
> > v2(Ville):
> >  - As pointed out by Ville, fix the completely incorrect assumptions
> >    about the "partial" coherency on shared LLC platforms.
> > v3(Daniel):
> >  - Fix nonsense about "dirtying" the cache with reads.
> >
> > Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 176 +++++++++++++++++-
> >  drivers/gpu/drm/i915/i915_drv.h               |   9 -
> >  2 files changed, 172 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > index afbadfc5516b..40cce816a7e3 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > @@ -92,6 +92,76 @@ struct drm_i915_gem_object_ops {
> >       const char *name; /* friendly name for debug, e.g. lockdep classes */
> >  };
> >
> > +/**
> > + * enum i915_cache_level - The supported GTT caching values for system memory
> > + * pages.
> > + *
> > + * These translate to some special GTT PTE bits when binding pages into some
> > + * address space. It also determines whether an object, or rather its pages are
> > + * coherent with the GPU, when also reading or writing through the CPU cache
> > + * with those pages.
> > + *
> > + * Userspace can also control this through struct drm_i915_gem_caching.
> > + */
> > +enum i915_cache_level {
> > +     /**
> > +      * @I915_CACHE_NONE:
> > +      *
> > +      * Not coherent with the CPU cache. If the cache is dirty and we need
> > +      * the underlying pages to be coherent with some later GPU access then
> > +      * we need to manually flush the pages.
> > +      *
> > +      * Note that on shared LLC platforms reads and writes through the CPU
> > +      * cache are still coherent even with this setting. See also
> > +      * &drm_i915_gem_object.cache_coherent for more details.
> > +      *
> > +      * Note that on platforms with a shared LLC this should ideally only be
>
> Merge this with the previous note and maybe explain it with "Due to this
> we should only use uncached for scanout surfaces on platforms with shared
> LLC, otherwise ..."
>
> As-is it reads a bit awkward/repetitive.
>
> > +      * used for scanout surfaces, otherwise we end up over-flushing in some
> > +      * places.
>
> Maybe also note that on non-LLC platforms uncached is the default.
>
> > +      */
> > +     I915_CACHE_NONE = 0,
> > +     /**
> > +      * @I915_CACHE_LLC:
> > +      *
> > +      * Coherent with the CPU cache. If the cache is dirty, then the GPU will
> > +      * ensure that access remains coherent, when both reading and writing
> > +      * through the CPU cache.
> > +      *
> > +      * Not used for scanout surfaces.
> > +      *
> > +      * Applies to both platforms with shared LLC(HAS_LLC), and snooping
> > +      * based platforms(HAS_SNOOP).
> > +      *
> > +      * This should be the default for platforms which share the LLC with the
> s/should/is/
>
> After all it _is_ the default at object creation time.
>
> > +      * CPU. The only exception is scanout objects, where the display engine
> > +      * is not coherent with the LLC. For such objects I915_CACHE_NONE or
> > +      * I915_CACHE_WT should be used.
>
> Maybe clarify that we automatically apply this transition upon
> pin_for_display if userspace hasn't done it.
>
> > +      */
> > +     I915_CACHE_LLC,
> > +     /**
> > +      * @I915_CACHE_L3_LLC:
> > +      *
> > +      * Explicitly enable the Gfx L3 cache, with snooped LLC.
> > +      *
> > +      * The Gfx L3 sits between the domain specific caches, e.g
> > +      * sampler/render caches, and the larger LLC. LLC is coherent with the
> > +      * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
> > +      * when the workload completes.
> > +      *
> > +      * Not used for scanout surfaces.
> > +      *
> > +      * Only exposed on some gen7 + GGTT. More recent hardware has dropped
> > +      * this.
>
> I think it's also the default on these?

I would say yes.

>
> > +      */
> > +     I915_CACHE_L3_LLC,
>
> > +     /**
> > +      * @I915_CACHE_WT:
> > +      *
> > +      * hsw:gt3e Write-through for scanout buffers.
>
> I haven't checked, but are we using this automatically?

Yes, if the HW supports it.

>
> > +      */
> > +     I915_CACHE_WT,
> > +};
> > +
> >  enum i915_map_type {
> >       I915_MAP_WB = 0,
> >       I915_MAP_WC,
> > @@ -229,14 +299,112 @@ struct drm_i915_gem_object {
> >       unsigned int mem_flags;
> >  #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
> >  #define I915_BO_FLAG_IOMEM       BIT(1) /* Object backed by IO memory */
> > -     /*
> > -      * Is the object to be mapped as read-only to the GPU
> > -      * Only honoured if hardware has relevant pte bit
> > +     /**
> > +      * @cache_level: The desired GTT caching level.
> > +      *
> > +      * See enum i915_cache_level for possible values, along with what
> > +      * each does.
> >        */
> >       unsigned int cache_level:3;
> > -     unsigned int cache_coherent:2;
> > +     /**
> > +      * @cache_coherent:
> > +      *
> > +      * Track whether the pages are coherent with the GPU if reading or
> > +      * writing through the CPU caches. This largely depends on the
> > +      * @cache_level setting.
> > +      *
> > +      * On platforms which don't have the shared LLC(HAS_SNOOP), like on Atom
> > +      * platforms, coherency must be explicitly requested with some special
> > +      * GTT caching bits(see enum i915_cache_level). When enabling coherency
> > +      * it does come at a performance and power cost on such platforms. On
> > +      * the flip side the kernel does need to manually flush any buffers
>
> does _not_ need
>
> I think at least that's what you mean here.
>
> > +      * which need to be coherent with the GPU, if the object is not
> > +      * coherent i.e @cache_coherent is zero.
> > +      *
> > +      * On platforms that share the LLC with the CPU(HAS_LLC), all GT memory
> > +      * access will automatically snoop the CPU caches(even with CACHE_NONE).
> > +      * The one exception is when dealing with the display engine, like with
> > +      * scanout surfaces. To handle this the kernel will always flush the
> > +      * surface out of the CPU caches when preparing it for scanout.  Also
> > +      * note that since scanout surfaces are only ever read by the display
> > +      * engine we only need to care about flushing any writes through the CPU
> > +      * cache, reads on the other hand will always be coherent.
> > +      *
> > +      * Something strange here is why @cache_coherent is not a simple
> > +      * boolean, i.e coherent vs non-coherent. The reasoning for this is back
> > +      * to the display engine not being fully coherent. As a result scanout
> > +      * surfaces will either be marked as I915_CACHE_NONE or I915_CACHE_WT.
> > +      * In the case of seeing I915_CACHE_NONE the kernel makes the assumption
> > +      * that this is likely a scanout surface, and will set @cache_coherent
> > +      * as only I915_BO_CACHE_COHERENT_FOR_READ, on platforms with the shared
>
> Do we only do this for NONE, and not for WT? That would be a bit of a bug,
> I guess ...

If I'm reading this correctly, write-through only ensures we don't
need to flush the scanout surface when moving it out of the render
domain, but writes through the cache on the CPU side still need to be
flushed, so yeah I would have expected cache_coherent = FOR_READ here
for WT...

I guess that means we don't do the flush-early optimisations in some
places, and for the flush-on-acquire, the forced set_cache_level in
pin_to_display should still ensure cache_dirty = true before the
scanout?
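
To make the question concrete, here is a rough standalone model of how I
understand cache_coherent and cache_dirty end up being derived from the
cache level. This is only a sketch, not the driver code: struct model_obj,
model_set_cache_coherency() and the has_llc parameter (standing in for
HAS_LLC()) are made up for illustration, and the logic is my reading of the
current behaviour, so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Same names and order as the enum documented in the patch. */
enum i915_cache_level {
	I915_CACHE_NONE = 0,
	I915_CACHE_LLC,
	I915_CACHE_L3_LLC,
	I915_CACHE_WT,
};

#define I915_BO_CACHE_COHERENT_FOR_READ  (1u << 0)
#define I915_BO_CACHE_COHERENT_FOR_WRITE (1u << 1)

/* Hypothetical stand-in for the relevant object bits, not the real struct. */
struct model_obj {
	unsigned int cache_level:3;
	unsigned int cache_coherent:2;
	unsigned int cache_dirty:1;
};

/* Hypothetical helper; has_llc plays the role of HAS_LLC(). */
static void model_set_cache_coherency(struct model_obj *obj,
				      enum i915_cache_level level,
				      bool has_llc)
{
	assert(level <= I915_CACHE_WT);
	obj->cache_level = level;

	if (level != I915_CACHE_NONE)
		/* Anything but NONE is treated as fully coherent, WT included. */
		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
				      I915_BO_CACHE_COHERENT_FOR_WRITE;
	else if (has_llc)
		/* NONE on shared LLC: assumed scanout, only reads are coherent. */
		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
	else
		obj->cache_coherent = 0;

	/* Assume dirty (flush needed) unless CPU writes are coherent too. */
	obj->cache_dirty =
		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE);
}
```

In this model, with has_llc = true, I915_CACHE_NONE comes out as FOR_READ
only with cache_dirty set, while I915_CACHE_WT comes out as fully coherent
with cache_dirty clear, which is the WT case being questioned above.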

>
> > +      * LLC. The kernel uses this to always flush writes through the CPU
> > +      * cache as early as possible, where it can, in effect keeping
> > +      * @cache_dirty clean, so we can potentially avoid stalling when
> > +      * flushing the surface just before doing the scanout.  This does mean
> > +      * we might unnecessarily flush non-scanout objects in some places, but
> > +      * the default assumption is that all normal objects should be using
> > +      * I915_CACHE_LLC, at least on platforms with the shared LLC.
> > +      *
> > +      * Supported values:
> > +      *
> > +      * I915_BO_CACHE_COHERENT_FOR_READ:
> > +      *
> > +      * On shared LLC platforms, we use this for special scanout surfaces,
> > +      * where the display engine is not coherent with the CPU cache. As such
> > +      * we need to ensure we flush any writes before doing the scanout. As an
> > +      * optimisation we try to flush any writes as early as possible to avoid
> > +      * stalling later.
> > +      *
> > +      * Thus for scanout surfaces using I915_CACHE_NONE, on shared LLC
> > +      * platforms, we use:
> > +      *
> > +      *      cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ
> > +      *
> > +      * While for normal objects that are fully coherent we use:
> > +      *
> > +      *      cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
> > +      *                       I915_BO_CACHE_COHERENT_FOR_WRITE
> > +      *
> > +      * And then for objects that are not coherent at all we use:
> > +      *
> > +      *      cache_coherent = 0
> > +      *
> > +      * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > +      *
> > +      * When writing through the CPU cache, the GPU is still coherent. Note
> > +      * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > +      */
> >  #define I915_BO_CACHE_COHERENT_FOR_READ BIT(0)
> >  #define I915_BO_CACHE_COHERENT_FOR_WRITE BIT(1)
> > +     unsigned int cache_coherent:2;
> > +
> > +     /**
> > +      * @cache_dirty:
> > +      *
> > +      * Track if we are dirty with writes through the CPU cache for this
> > +      * object. As a result reading directly from main memory might yield
> > +      * stale data.
> > +      *
> > +      * This also ties into whether the kernel is tracking the object as
> > +      * coherent with the GPU, as per @cache_coherent, as it determines if
> > +      * flushing might be needed at various points.
> > +      *
> > +      * Another part of @cache_dirty is managing flushing when first
> > +      * acquiring the pages for system memory, at this point the pages are
> > +      * considered foreign, so the default assumption is that the cache is
> > +      * dirty, for example the page zeroing done by the kernel might leave
> > +      * writes through the CPU cache, or swapping-in, while the actual data in
> > +      * main memory is potentially stale.  Note that this is a potential
> > +      * security issue when dealing with userspace objects and zeroing. Now,
> > +      * whether we actually need to apply the big sledgehammer of flushing all
> > +      * the pages on acquire depends on if @cache_coherent is marked as
> > +      * I915_BO_CACHE_COHERENT_FOR_WRITE, i.e that the GPU will be coherent
> > +      * for both reads and writes through the CPU cache.
> > +      *
> > +      * Note that on shared LLC platforms we still apply the heavy flush for
> > +      * I915_CACHE_NONE objects, under the assumption that this is going to
> > +      * be used for scanout.
> > +      */
>
> I feel like rethinking all our special cases here would be really good,
> especially around whether we need to flush for security concerns, or not.
>
> E.g. on !LLC platforms, if we set an object to CACHE_LLC, but then use
> mocs to not access it as such: Can we bypass the cpu cache and potentially
> get stale data because i915 didn't force the clflush for this case?
>
> >       unsigned int cache_dirty:1;
> >
> >       /**
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 0321a1f9738d..f97792ccc199 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -394,15 +394,6 @@ struct drm_i915_display_funcs {
> >       void (*read_luts)(struct intel_crtc_state *crtc_state);
> >  };
> >
> > -enum i915_cache_level {
> > -     I915_CACHE_NONE = 0,
> > -     I915_CACHE_LLC, /* also used for snoopable memory on non-LLC */
> > -     I915_CACHE_L3_LLC, /* gen7+, L3 sits between the domain specifc
> > -                           caches, eg sampler/render caches, and the
> > -                           large Last-Level-Cache. LLC is coherent with
> > -                           the CPU, but L3 is only visible to the GPU. */
> > -     I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
> > -};
> >
> >  #define I915_COLOR_UNEVICTABLE (-1) /* a non-vma sharing the address space */
>
> With the nits addressed:
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> >
> > --
> > 2.26.3
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


end of thread, other threads:[~2021-07-23  8:58 UTC | newest]

Thread overview: 6+ messages
2021-07-22 11:34 [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits Matthew Auld
2021-07-22 11:34 ` [Intel-gfx] [PATCH v3 2/2] drm/i915/ehl: unconditionally flush the pages on acquire Matthew Auld
2021-07-22 11:54 ` [Intel-gfx] [PATCH v3 1/2] drm/i915: document caching related bits Daniel Vetter
2021-07-23  8:58   ` Matthew Auld
2021-07-22 13:30 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [v3,1/2] " Patchwork
2021-07-22 14:01 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
