* [PATCH v3 00/13] small BAR uapi bits
@ 2022-06-29 12:14 ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel

IGT: https://patchwork.freedesktop.org/series/104368/#rev2
Mesa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16739


-- 
2.36.1



* [PATCH v3 01/13] drm/doc: add rfc section for small BAR uapi
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, mesa-dev, Akeem G Abodunrin

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
  - Some spelling fixes and other small tweaks. (Akeem & Thomas)
  - Rework error capture interactions, including no longer needing
    NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
  - Add probed_cpu_visible_size. (Lionel)
v3:
  - Drop the vma query for now.
  - Add unallocated_cpu_visible_size as part of the region query.
  - Improve the docs some more, including documenting the expected
    behaviour on older kernels, since this came up in some offline
    discussion.
v4:
  - Various improvements all over. (Tvrtko)

v5:
  - Include newer integrated platforms when applying the non-recoverable
    context and error capture restriction. (Thomas)

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: mesa-dev@lists.freedesktop.org
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Acked-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
---
 Documentation/gpu/rfc/i915_small_bar.h   | 189 +++++++++++++++++++++++
 Documentation/gpu/rfc/i915_small_bar.rst |  47 ++++++
 Documentation/gpu/rfc/index.rst          |   4 +
 3 files changed, 240 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
 create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index 000000000000..6003c81d5aa4
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,189 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+	/** @region: The class:instance pair encoding */
+	struct drm_i915_gem_memory_class_instance region;
+
+	/** @rsvd0: MBZ */
+	__u32 rsvd0;
+
+	/**
+	 * @probed_size: Memory probed by the driver
+	 *
+	 * Note that it should not be possible to ever encounter a zero value
+	 * here. Also note that no current region type will ever return -1
+	 * here, although this might be a possibility for future region types.
+	 * The same applies to the other size fields.
+	 */
+	__u64 probed_size;
+
+	/**
+	 * @unallocated_size: Estimate of memory remaining
+	 *
+	 * Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting.
+	 * Without this (or if this is an older kernel) the value here will
+	 * always equal the @probed_size. Note this is only currently tracked
+	 * for I915_MEMORY_CLASS_DEVICE regions (for other types the value here
+	 * will always equal the @probed_size).
+	 */
+	__u64 unallocated_size;
+
+	union {
+		/** @rsvd1: MBZ */
+		__u64 rsvd1[8];
+		struct {
+			/**
+			 * @probed_cpu_visible_size: Memory probed by the driver
+			 * that is CPU accessible.
+			 *
+			 * This will always be <= @probed_size, and the
+			 * remainder (if there is any) will not be CPU
+			 * accessible.
+			 *
+			 * On systems without small BAR, the @probed_size will
+			 * always equal the @probed_cpu_visible_size, since all
+			 * of it will be CPU accessible.
+			 *
+			 * Note this is only tracked for
+			 * I915_MEMORY_CLASS_DEVICE regions (for other types the
+			 * value here will always equal the @probed_size).
+			 *
+			 * Note that if the value returned here is zero, then
+			 * this must be an old kernel which lacks the relevant
+			 * small-bar uAPI support (including
+			 * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on
+			 * such systems we should never actually end up with a
+			 * small BAR configuration, assuming we are able to load
+			 * the kernel module. Hence it should be safe to treat
+			 * this the same as when @probed_cpu_visible_size ==
+			 * @probed_size.
+			 */
+			__u64 probed_cpu_visible_size;
+
+			/**
+			 * @unallocated_cpu_visible_size: Estimate of CPU
+			 * visible memory remaining
+			 *
+			 * Note this is only tracked for
+			 * I915_MEMORY_CLASS_DEVICE regions (for other types the
+			 * value here will always equal the
+			 * @probed_cpu_visible_size).
+			 *
+			 * Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable
+			 * accounting.  Without this the value here will always
+			 * equal the @probed_cpu_visible_size. Note this is only
+			 * currently tracked for I915_MEMORY_CLASS_DEVICE
+			 * regions (for other types the value here will also
+			 * always equal the @probed_cpu_visible_size).
+			 *
+			 * If this is an older kernel the value here will be
+			 * zero, see also @probed_cpu_visible_size.
+			 */
+			__u64 unallocated_cpu_visible_size;
+		};
+	};
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+	/**
+	 * @size: Requested size for the object.
+	 *
+	 * The (page-aligned) allocated size for the object will be returned.
+	 *
+	 * Note that for some devices we might have further minimum
+	 * page-size restrictions (larger than 4K), like for device local-memory.
+	 * However in general the final size here should always reflect any
+	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+	 * extension to place the object in device local-memory. The kernel will
+	 * always select the largest minimum page-size for the set of possible
+	 * placements as the value to use when rounding up the @size.
+	 */
+	__u64 size;
+
+	/**
+	 * @handle: Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+
+	/**
+	 * @flags: Optional flags.
+	 *
+	 * Supported values:
+	 *
+	 * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+	 * the object will need to be accessed via the CPU.
+	 *
+	 * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and only
+	 * strictly required on configurations where some subset of the device
+	 * memory is directly visible/mappable through the CPU (which we also
+	 * call small BAR), like on some DG2+ systems. Note that this is quite
+	 * undesirable, but due to various factors like the client CPU, BIOS etc
+	 * it's something we can expect to see in the wild. See
+	 * &__drm_i915_memory_region_info.probed_cpu_visible_size for how to
+	 * determine if this applies to the system.
+	 *
+	 * Note that one of the placements MUST be I915_MEMORY_CLASS_SYSTEM, to
+	 * ensure the kernel can always spill the allocation to system memory,
+	 * if the object can't be allocated in the mappable part of
+	 * I915_MEMORY_CLASS_DEVICE.
+	 *
+	 * Also note that since the kernel only supports flat-CCS on objects
+	 * that can *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefore
+	 * don't support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with
+	 * flat-CCS.
+	 *
+	 * Without this hint, the kernel will assume that non-mappable
+	 * I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the
+	 * kernel can still migrate the object to the mappable part, as a last
+	 * resort, if userspace ever CPU faults this object, but this might be
+	 * expensive, and so ideally should be avoided.
+	 *
+	 * On older kernels which lack the relevant small-bar uAPI support (see
+	 * also &__drm_i915_memory_region_info.probed_cpu_visible_size),
+	 * usage of the flag will result in an error, but it should NEVER be
+	 * possible to end up with a small BAR configuration, assuming we can
+	 * also successfully load the i915 kernel module. In such cases the
+	 * entire I915_MEMORY_CLASS_DEVICE region will be CPU accessible, and as
+	 * such there are zero restrictions on where the object can be placed.
+	 */
+#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)
+	__u32 flags;
+
+	/**
+	 * @extensions: The chain of extensions to apply to this object.
+	 *
+	 * This will be useful in the future when we need to support several
+	 * different extensions, and we need to apply more than one when
+	 * creating the object. See struct i915_user_extension.
+	 *
+	 * If we don't supply any extensions then we get the same old gem_create
+	 * behaviour.
+	 *
+	 * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
+	 * struct drm_i915_gem_create_ext_memory_regions.
+	 *
+	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
+	 * struct drm_i915_gem_create_ext_protected_content.
+	 */
+#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
+#define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
+	__u64 extensions;
+};
diff --git a/Documentation/gpu/rfc/i915_small_bar.rst b/Documentation/gpu/rfc/i915_small_bar.rst
new file mode 100644
index 000000000000..d6c03ce3b862
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.rst
@@ -0,0 +1,47 @@
+==========================
+I915 Small BAR RFC Section
+==========================
+Starting from DG2 we will have resizable BAR support for device local-memory (i.e.
+I915_MEMORY_CLASS_DEVICE), but in some cases the final BAR size might still be
+smaller than the total probed_size. In such cases, only some subset of
+I915_MEMORY_CLASS_DEVICE will be CPU accessible (for example the first 256M),
+while the remainder is only accessible via the GPU.
+
+I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag
+----------------------------------------------
+New gem_create_ext flag to tell the kernel that a BO will require CPU access.
+This becomes important when placing an object in I915_MEMORY_CLASS_DEVICE, where
+underneath the device has a small BAR, meaning only some portion of it is CPU
+accessible. Without this flag the kernel will assume that CPU access is not
+required, and prioritize using the non-CPU visible portion of
+I915_MEMORY_CLASS_DEVICE.
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_small_bar.h
+   :functions: __drm_i915_gem_create_ext
+
+probed_cpu_visible_size attribute
+---------------------------------
+New struct __drm_i915_memory_region_info attribute which returns the total size of the
+CPU accessible portion, for the particular region. This should only be
+applicable for I915_MEMORY_CLASS_DEVICE. We also report the
+unallocated_cpu_visible_size, alongside the unallocated_size.
+
+Vulkan will need this as part of creating a separate VkMemoryHeap with the
+VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT set, to represent the CPU visible portion,
+where the total size of the heap needs to be known. It also wants to be able to
+give a rough estimate of how much memory can potentially be allocated.
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_small_bar.h
+   :functions: __drm_i915_memory_region_info
+
+Error Capture restrictions
+--------------------------
+With error capture we have two new restrictions:
+
+    1) Error capture is best effort on small BAR systems; if the pages are not
+    CPU accessible, at the time of capture, then the kernel is free to skip
+    trying to capture them.
+
+    2) On discrete and newer integrated platforms we now reject error capture
+    on recoverable contexts. In the future the kernel may want to blit during
+    error capture, when for example something is not currently CPU accessible.
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..5a3bd3924ba6 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    i915_small_bar.rst
-- 
2.36.1
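
For illustration only (not part of the patch): a minimal userspace sketch of
how the proposed flag is meant to be used, assuming the uapi lands as described
in this RFC. The object is marked as needing CPU access and is given an
I915_MEMORY_CLASS_SYSTEM fallback placement, so the kernel can always spill it
out of the mappable part of local-memory. The helper name is made up; the
create_ext ioctl and memory-regions extension already exist upstream, and only
the NEEDS_CPU_ACCESS flag is new with this series.

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Hypothetical helper: create a BO that must be CPU mappable, with the
 * system memory fallback placement required by the proposed uapi.
 */
static int create_cpu_visible_bo(int fd, __u64 size, __u32 *handle)
{
	struct drm_i915_gem_memory_class_instance placements[] = {
		{ .memory_class = I915_MEMORY_CLASS_DEVICE, .memory_instance = 0 },
		{ .memory_class = I915_MEMORY_CLASS_SYSTEM, .memory_instance = 0 },
	};
	struct drm_i915_gem_create_ext_memory_regions regions = {
		.base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
		.num_regions = 2,
		.regions = (uintptr_t)placements,
	};
	struct drm_i915_gem_create_ext create = {
		.size = size,
		.flags = I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS,
		.extensions = (uintptr_t)&regions,
	};

	if (ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create))
		return -errno;

	*handle = create.handle;
	return 0;
}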



* [PATCH v3 02/13] drm/i915/uapi: add probed_cpu_visible_size
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin, Nirmoy Das

Userspace wants to know the size of the CPU visible portion of device
local-memory, and on small BAR devices the probed_size is no longer
enough. In Vulkan, for example, it would like to know the size in bytes
for the CPU visible VkMemoryHeap. We already track the io_size for each
region, so plumb that through to the region query.

v2: Drop the ( -1 = unknown ) stuff, which is confusing since nothing
can currently ever return such a value.

Testcase: igt@i915_query@query-regions-sanity-check
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_query.c |  6 +++
 include/uapi/drm/i915_drm.h       | 76 +++++++++++++++++--------------
 2 files changed, 48 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 0094f67c63f2..9894add651dd 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -498,6 +498,12 @@ static int query_memregion_info(struct drm_i915_private *i915,
 		info.region.memory_class = mr->type;
 		info.region.memory_instance = mr->instance;
 		info.probed_size = mr->total;
+
+		if (mr->type == INTEL_MEMORY_LOCAL)
+			info.probed_cpu_visible_size = mr->io_size;
+		else
+			info.probed_cpu_visible_size = mr->total;
+
 		info.unallocated_size = mr->avail;
 
 		if (__copy_to_user(info_ptr, &info, sizeof(info)))
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index de49b68b4fc8..7eacacb00373 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3207,36 +3207,6 @@ struct drm_i915_gem_memory_class_instance {
  * struct drm_i915_memory_region_info - Describes one region as known to the
  * driver.
  *
- * Note that we reserve some stuff here for potential future work. As an example
- * we might want expose the capabilities for a given region, which could include
- * things like if the region is CPU mappable/accessible, what are the supported
- * mapping types etc.
- *
- * Note that to extend struct drm_i915_memory_region_info and struct
- * drm_i915_query_memory_regions in the future the plan is to do the following:
- *
- * .. code-block:: C
- *
- *	struct drm_i915_memory_region_info {
- *		struct drm_i915_gem_memory_class_instance region;
- *		union {
- *			__u32 rsvd0;
- *			__u32 new_thing1;
- *		};
- *		...
- *		union {
- *			__u64 rsvd1[8];
- *			struct {
- *				__u64 new_thing2;
- *				__u64 new_thing3;
- *				...
- *			};
- *		};
- *	};
- *
- * With this things should remain source compatible between versions for
- * userspace, even as we add new fields.
- *
  * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
  * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
  * at &drm_i915_query_item.query_id.
@@ -3248,14 +3218,52 @@ struct drm_i915_memory_region_info {
 	/** @rsvd0: MBZ */
 	__u32 rsvd0;
 
-	/** @probed_size: Memory probed by the driver (-1 = unknown) */
+	/**
+	 * @probed_size: Memory probed by the driver
+	 *
+	 * Note that it should not be possible to ever encounter a zero value
+	 * here. Also note that no current region type will ever return -1
+	 * here, although this might be a possibility for future region types.
+	 * The same applies to the other size fields.
+	 */
 	__u64 probed_size;
 
-	/** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+	/** @unallocated_size: Estimate of memory remaining */
 	__u64 unallocated_size;
 
-	/** @rsvd1: MBZ */
-	__u64 rsvd1[8];
+	union {
+		/** @rsvd1: MBZ */
+		__u64 rsvd1[8];
+		struct {
+			/**
+			 * @probed_cpu_visible_size: Memory probed by the driver
+			 * that is CPU accessible.
+			 *
+			 * This will always be <= @probed_size, and the
+			 * remainder (if there is any) will not be CPU
+			 * accessible.
+			 *
+			 * On systems without small BAR, the @probed_size will
+			 * always equal the @probed_cpu_visible_size, since all
+			 * of it will be CPU accessible.
+			 *
+			 * Note this is only tracked for
+			 * I915_MEMORY_CLASS_DEVICE regions (for other types the
+			 * value here will always equal the @probed_size).
+			 *
+			 * Note that if the value returned here is zero, then
+			 * this must be an old kernel which lacks the relevant
+			 * small-bar uAPI support (including
+			 * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on
+			 * such systems we should never actually end up with a
+			 * small BAR configuration, assuming we are able to load
+			 * the kernel module. Hence it should be safe to treat
+			 * this the same as when @probed_cpu_visible_size ==
+			 * @probed_size.
+			 */
+			__u64 probed_cpu_visible_size;
+		};
+	};
 };
 
 /**
-- 
2.36.1
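
For illustration only (not part of the patch): a sketch of the standard
two-pass DRM_I915_QUERY_MEMORY_REGIONS query from userspace, reading the new
probed_cpu_visible_size field added here. The function names are made up; the
query structs and ioctl are the existing upstream ones, and a zero
probed_cpu_visible_size is treated as the documented older-kernel case.

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* First pass asks for the required buffer size, second pass fills it. */
static struct drm_i915_query_memory_regions *query_regions(int fd)
{
	struct drm_i915_query_item item = {
		.query_id = DRM_I915_QUERY_MEMORY_REGIONS,
	};
	struct drm_i915_query query = {
		.num_items = 1,
		.items_ptr = (uintptr_t)&item,
	};
	struct drm_i915_query_memory_regions *info;

	if (ioctl(fd, DRM_IOCTL_I915_QUERY, &query) || item.length <= 0)
		return NULL;

	info = calloc(1, item.length);
	if (!info)
		return NULL;

	item.data_ptr = (uintptr_t)info;
	if (ioctl(fd, DRM_IOCTL_I915_QUERY, &query)) {
		free(info);
		return NULL;
	}
	return info;
}

/* CPU visible size of a region; zero means an older kernel without the
 * small BAR uapi, where the whole region is CPU accessible anyway.
 */
static __u64 region_cpu_visible_size(const struct drm_i915_memory_region_info *r)
{
	return r->probed_cpu_visible_size ?
	       r->probed_cpu_visible_size : r->probed_size;
}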



* [PATCH v3 03/13] drm/i915/uapi: expose the avail tracking
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin

Vulkan would like to have a rough measure of how much device memory can
in theory be allocated. Also add unallocated_cpu_visible_size to track
the visible portion, in case the device is using small BAR. Also tweak
the locking so we nice consistent values for both the mm->avail and the
visible tracking.

v2: tweak the locking slightly so we update the mm->avail and visible
tracking as one atomic operation, such that userspace doesn't get
strange values when sampling the values.

Testcase: igt@i915_query@query-regions-unallocated
Testcase: igt@i915_query@query-regions-sanity-check
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_query.c             | 10 +++++-
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 31 ++++++++++++++-----
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.h |  3 ++
 drivers/gpu/drm/i915/intel_memory_region.c    | 14 +++++++++
 drivers/gpu/drm/i915/intel_memory_region.h    |  3 ++
 include/uapi/drm/i915_drm.h                   | 31 ++++++++++++++++++-
 6 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 9894add651dd..6ec9c9fb7b0d 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -504,7 +504,15 @@ static int query_memregion_info(struct drm_i915_private *i915,
 		else
 			info.probed_cpu_visible_size = mr->total;
 
-		info.unallocated_size = mr->avail;
+		if (perfmon_capable()) {
+			intel_memory_region_avail(mr,
+						  &info.unallocated_size,
+						  &info.unallocated_cpu_visible_size);
+		} else {
+			info.unallocated_size = info.probed_size;
+			info.unallocated_cpu_visible_size =
+				info.probed_cpu_visible_size;
+		}
 
 		if (__copy_to_user(info_ptr, &info, sizeof(info)))
 			return -EFAULT;
diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
index a5109548abc0..427de1aaab36 100644
--- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
+++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
@@ -104,18 +104,15 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
 				     min_page_size,
 				     &bman_res->blocks,
 				     bman_res->flags);
-	mutex_unlock(&bman->lock);
 	if (unlikely(err))
 		goto err_free_blocks;
 
 	if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
 		u64 original_size = (u64)bman_res->base.num_pages << PAGE_SHIFT;
 
-		mutex_lock(&bman->lock);
 		drm_buddy_block_trim(mm,
 				     original_size,
 				     &bman_res->blocks);
-		mutex_unlock(&bman->lock);
 	}
 
 	if (lpfn <= bman->visible_size) {
@@ -137,11 +134,10 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
 		}
 	}
 
-	if (bman_res->used_visible_size) {
-		mutex_lock(&bman->lock);
+	if (bman_res->used_visible_size)
 		bman->visible_avail -= bman_res->used_visible_size;
-		mutex_unlock(&bman->lock);
-	}
+
+	mutex_unlock(&bman->lock);
 
 	if (place->lpfn - place->fpfn == n_pages)
 		bman_res->base.start = place->fpfn;
@@ -154,7 +150,6 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
 	return 0;
 
 err_free_blocks:
-	mutex_lock(&bman->lock);
 	drm_buddy_free_list(mm, &bman_res->blocks);
 	mutex_unlock(&bman->lock);
 err_free_res:
@@ -365,6 +360,26 @@ u64 i915_ttm_buddy_man_visible_size(struct ttm_resource_manager *man)
 	return bman->visible_size;
 }
 
+/**
+ * i915_ttm_buddy_man_avail - Query the avail tracking for the manager.
+ *
+ * @man: The buddy allocator ttm manager
+ * @avail: The total available memory in pages for the entire manager.
+ * @visible_avail: The total available memory in pages for the CPU visible
+ * portion. Note that this will always give the same value as @avail on
+ * configurations that don't have a small BAR.
+ */
+void i915_ttm_buddy_man_avail(struct ttm_resource_manager *man,
+			      u64 *avail, u64 *visible_avail)
+{
+	struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
+
+	mutex_lock(&bman->lock);
+	*avail = bman->mm.avail >> PAGE_SHIFT;
+	*visible_avail = bman->visible_avail;
+	mutex_unlock(&bman->lock);
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 void i915_ttm_buddy_man_force_visible_size(struct ttm_resource_manager *man,
 					   u64 size)
diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
index 52d9586d242c..d64620712830 100644
--- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
+++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
@@ -61,6 +61,9 @@ int i915_ttm_buddy_man_reserve(struct ttm_resource_manager *man,
 
 u64 i915_ttm_buddy_man_visible_size(struct ttm_resource_manager *man);
 
+void i915_ttm_buddy_man_avail(struct ttm_resource_manager *man,
+			      u64 *avail, u64 *avail_visible);
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 void i915_ttm_buddy_man_force_visible_size(struct ttm_resource_manager *man,
 					   u64 size);
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index e38d2db1c3e3..94ee26e99549 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -279,6 +279,20 @@ void intel_memory_region_set_name(struct intel_memory_region *mem,
 	va_end(ap);
 }
 
+void intel_memory_region_avail(struct intel_memory_region *mr,
+			       u64 *avail, u64 *visible_avail)
+{
+	if (mr->type == INTEL_MEMORY_LOCAL) {
+		i915_ttm_buddy_man_avail(mr->region_private,
+					 avail, visible_avail);
+		*avail <<= PAGE_SHIFT;
+		*visible_avail <<= PAGE_SHIFT;
+	} else {
+		*avail = mr->total;
+		*visible_avail = mr->total;
+	}
+}
+
 void intel_memory_region_destroy(struct intel_memory_region *mem)
 {
 	int ret = 0;
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 3d8378c1b447..2214f251bec3 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -127,6 +127,9 @@ int intel_memory_region_reserve(struct intel_memory_region *mem,
 void intel_memory_region_debug(struct intel_memory_region *mr,
 			       struct drm_printer *printer);
 
+void intel_memory_region_avail(struct intel_memory_region *mr,
+			       u64 *avail, u64 *visible_avail);
+
 struct intel_memory_region *
 i915_gem_ttm_system_setup(struct drm_i915_private *i915,
 			  u16 type, u16 instance);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7eacacb00373..e4847436bab8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3228,7 +3228,15 @@ struct drm_i915_memory_region_info {
 	 */
 	__u64 probed_size;
 
-	/** @unallocated_size: Estimate of memory remaining */
+	/**
+	 * @unallocated_size: Estimate of memory remaining
+	 *
+	 * Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting.
+	 * Without this (or if this is an older kernel) the value here will
+	 * always equal the @probed_size. Note this is only currently tracked
+	 * for I915_MEMORY_CLASS_DEVICE regions (for other types the value here
+	 * will always equal the @probed_size).
+	 */
 	__u64 unallocated_size;
 
 	union {
@@ -3262,6 +3270,27 @@ struct drm_i915_memory_region_info {
 			 * @probed_size.
 			 */
 			__u64 probed_cpu_visible_size;
+
+			/**
+			 * @unallocated_cpu_visible_size: Estimate of CPU
+			 * visible memory remaining.
+			 *
+			 * Note this is only tracked for
+			 * I915_MEMORY_CLASS_DEVICE regions (for other types the
+			 * value here will always equal the
+			 * @probed_cpu_visible_size).
+			 *
+			 * Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable
+			 * accounting.  Without this the value here will always
+			 * equal the @probed_cpu_visible_size. Note this is only
+			 * currently tracked for I915_MEMORY_CLASS_DEVICE
+			 * regions (for other types the value here will also
+			 * always equal the @probed_cpu_visible_size).
+			 *
+			 * If this is an older kernel the value here will be
+			 * zero, see also @probed_cpu_visible_size.
+			 */
+			__u64 unallocated_cpu_visible_size;
 		};
 	};
 };
-- 
2.36.1
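
For illustration only (not part of the patch): a rough sketch of how a driver
might turn the new fields into VkMemoryHeap-style size/budget numbers for the
CPU visible portion of local-memory, assuming the region info was fetched as in
the previous example. The struct and helper are made up. Without
CAP_PERFMON/CAP_SYS_ADMIN the unallocated values simply mirror the probed
values, so the budget degrades to the full heap size.

struct heap_info {
	__u64 size;   /* total CPU visible local-memory */
	__u64 budget; /* rough estimate of what can still be allocated */
};

static struct heap_info cpu_visible_heap(const struct drm_i915_memory_region_info *r)
{
	struct heap_info heap = {
		.size = r->probed_cpu_visible_size,
		.budget = r->unallocated_cpu_visible_size,
	};

	/* Older kernel: no small BAR uapi, treat the whole region as visible. */
	if (!heap.size) {
		heap.size = r->probed_size;
		heap.budget = r->unallocated_size;
	}
	return heap;
}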



* [PATCH v3 04/13] drm/i915: remove intel_memory_region avail
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin

No longer used.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_memory_region.c | 4 +---
 drivers/gpu/drm/i915/intel_memory_region.h | 1 -
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index 94ee26e99549..9a4a7fb55582 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -198,8 +198,7 @@ void intel_memory_region_debug(struct intel_memory_region *mr,
 	if (mr->region_private)
 		ttm_resource_manager_debug(mr->region_private, printer);
 	else
-		drm_printf(printer, "total:%pa, available:%pa bytes\n",
-			   &mr->total, &mr->avail);
+		drm_printf(printer, "total:%pa bytes\n", &mr->total);
 }
 
 static int intel_memory_region_memtest(struct intel_memory_region *mem,
@@ -242,7 +241,6 @@ intel_memory_region_create(struct drm_i915_private *i915,
 	mem->min_page_size = min_page_size;
 	mem->ops = ops;
 	mem->total = size;
-	mem->avail = mem->total;
 	mem->type = type;
 	mem->instance = instance;
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 2214f251bec3..2953ed5c3248 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -75,7 +75,6 @@ struct intel_memory_region {
 	resource_size_t io_size;
 	resource_size_t min_page_size;
 	resource_size_t total;
-	resource_size_t avail;
 
 	u16 type;
 	u16 instance;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v3 04/13] drm/i915: remove intel_memory_region avail
@ 2022-06-29 12:14   ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Daniel Vetter, Kenneth Graunke, dri-devel

No longer used.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_memory_region.c | 4 +---
 drivers/gpu/drm/i915/intel_memory_region.h | 1 -
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index 94ee26e99549..9a4a7fb55582 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -198,8 +198,7 @@ void intel_memory_region_debug(struct intel_memory_region *mr,
 	if (mr->region_private)
 		ttm_resource_manager_debug(mr->region_private, printer);
 	else
-		drm_printf(printer, "total:%pa, available:%pa bytes\n",
-			   &mr->total, &mr->avail);
+		drm_printf(printer, "total:%pa bytes\n", &mr->total);
 }
 
 static int intel_memory_region_memtest(struct intel_memory_region *mem,
@@ -242,7 +241,6 @@ intel_memory_region_create(struct drm_i915_private *i915,
 	mem->min_page_size = min_page_size;
 	mem->ops = ops;
 	mem->total = size;
-	mem->avail = mem->total;
 	mem->type = type;
 	mem->instance = instance;
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 2214f251bec3..2953ed5c3248 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -75,7 +75,6 @@ struct intel_memory_region {
 	resource_size_t io_size;
 	resource_size_t min_page_size;
 	resource_size_t total;
-	resource_size_t avail;
 
 	u16 type;
 	u16 instance;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 05/13] drm/i915/uapi: apply ALLOC_GPU_ONLY by default
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin

On small BAR configurations, when dealing with I915_MEMORY_CLASS_DEVICE
allocations, we assume that by default, all userspace allocations should
be placed in the non-CPU visible portion.  Note that dumb buffers are
not included here, since these are not "GPU accelerated" and likely need
CPU access. We choose to just always set GPU_ONLY, and let the backend
figure out if that should be ignored or not, for example on full BAR
systems.

In a later patch userspace will be able to provide a hint if CPU access
to the buffer is needed.

v2(Thomas)
 - Apply GPU_ONLY on all discrete devices, but only if the BO can be
   placed in LMEM. Down in the depths this should be turned into a noop,
   where required, and as an annotation it still makes some sense. If we
   apply it regardless of the placements then we end up needing to check
   the placements during exec capture. Also it's slightly inconsistent
   since the NEEDS_CPU_ACCESS can only be applied on objects that can be
   placed in LMEM. The other annoyance would be gem_create_ext vs plain
   gem_create, if we were to always apply GPU_ONLY.

Testcase: igt@gem-create@create-ext-cpu-access-sanity-check
Testcase: igt@gem-create@create-ext-cpu-access-big
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 5802692ea604..d094cae0ddf1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -427,6 +427,14 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 		ext_data.n_placements = 1;
 	}
 
+	/*
+	 * TODO: add a userspace hint to force CPU_ACCESS for the object, which
+	 * can override this.
+	 */
+	if (ext_data.n_placements > 1 ||
+	    ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM)
+		ext_data.flags |= I915_BO_ALLOC_GPU_ONLY;
+
 	obj = __i915_gem_object_create_user_ext(i915, args->size,
 						ext_data.placements,
 						ext_data.n_placements,
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v3 05/13] drm/i915/uapi: apply ALLOC_GPU_ONLY by default
@ 2022-06-29 12:14   ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Daniel Vetter, Kenneth Graunke, dri-devel

On small BAR configurations, when dealing with I915_MEMORY_CLASS_DEVICE
allocations, we assume that by default, all userspace allocations should
be placed in the non-CPU visible portion.  Note that dumb buffers are
not included here, since these are not "GPU accelerated" and likely need
CPU access. We choose to just always set GPU_ONLY, and let the backend
figure out if that should be ignored or not, for example on full BAR
systems.

In a later patch userspace will be able to provide a hint if CPU access
to the buffer is needed.

v2(Thomas)
 - Apply GPU_ONLY on all discrete devices, but only if the BO can be
   placed in LMEM. Down in the depths this should be turned into a noop,
   where required, and as an annotation it still makes some sense. If we
   apply it regardless of the placements then we end up needing to check
   the placements during exec capture. Also it's slightly inconsistent
   since the NEEDS_CPU_ACCESS can only be applied on objects that can be
   placed in LMEM. The other annoyance would be gem_create_ext vs plain
   gem_create, if we were to always apply GPU_ONLY.

Testcase: igt@gem-create@create-ext-cpu-access-sanity-check
Testcase: igt@gem-create@create-ext-cpu-access-big
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 5802692ea604..d094cae0ddf1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -427,6 +427,14 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 		ext_data.n_placements = 1;
 	}
 
+	/*
+	 * TODO: add a userspace hint to force CPU_ACCESS for the object, which
+	 * can override this.
+	 */
+	if (ext_data.n_placements > 1 ||
+	    ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM)
+		ext_data.flags |= I915_BO_ALLOC_GPU_ONLY;
+
 	obj = __i915_gem_object_create_user_ext(i915, args->size,
 						ext_data.placements,
 						ext_data.n_placements,
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 06/13] drm/i915/uapi: add NEEDS_CPU_ACCESS hint
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Jordan Justen, Lionel Landwerlin,
	Kenneth Graunke, Jon Bloomfield, dri-devel, Daniel Vetter,
	Akeem G Abodunrin, Nirmoy Das

If set, force the allocation to be placed in the mappable portion of
I915_MEMORY_CLASS_DEVICE. One big restriction here is that system memory
(i.e. I915_MEMORY_CLASS_SYSTEM) must be given as a potential placement for the
object, so that we can always spill the object into system memory if we
can't make space.

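As a rough userspace sketch of the intended usage (illustrative only, not part
of the patch; lmem instance 0 and the existing gem_create_ext memory-regions
extension are assumed), an object that prefers lmem but needs CPU access, with
system memory as the mandatory spill placement:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static __u32 create_lmem_cpu_visible(int fd, __u64 size)
{
	/* lmem first, smem as the required fallback placement. */
	struct drm_i915_gem_memory_class_instance placements[] = {
		{ .memory_class = I915_MEMORY_CLASS_DEVICE, .memory_instance = 0 },
		{ .memory_class = I915_MEMORY_CLASS_SYSTEM, .memory_instance = 0 },
	};
	struct drm_i915_gem_create_ext_memory_regions regions = {
		.base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
		.num_regions = 2,
		.regions = (uintptr_t)placements,
	};
	struct drm_i915_gem_create_ext create = {
		.size = size,
		.flags = I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS,
		.extensions = (uintptr_t)&regions,
	};

	/* Older kernels reject the flag with -EINVAL; see the uapi docs below. */
	if (ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create))
		return 0;

	return create.handle;
}
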
Testcase: igt@gem-create@create-ext-cpu-access-sanity-check
Testcase: igt@gem-create@create-ext-cpu-access-big
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 26 ++++++---
 include/uapi/drm/i915_drm.h                | 61 +++++++++++++++++++---
 2 files changed, 71 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index d094cae0ddf1..33673fe7ee0a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -241,6 +241,7 @@ struct create_ext {
 	struct drm_i915_private *i915;
 	struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
 	unsigned int n_placements;
+	unsigned int placement_mask;
 	unsigned long flags;
 };
 
@@ -337,6 +338,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
 	for (i = 0; i < args->num_regions; i++)
 		ext_data->placements[i] = placements[i];
 
+	ext_data->placement_mask = mask;
 	return 0;
 
 out_dump:
@@ -411,7 +413,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_gem_object *obj;
 	int ret;
 
-	if (args->flags)
+	if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS)
 		return -EINVAL;
 
 	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
@@ -427,13 +429,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 		ext_data.n_placements = 1;
 	}
 
-	/*
-	 * TODO: add a userspace hint to force CPU_ACCESS for the object, which
-	 * can override this.
-	 */
-	if (ext_data.n_placements > 1 ||
-	    ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM)
-		ext_data.flags |= I915_BO_ALLOC_GPU_ONLY;
+	if (args->flags & I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) {
+		if (ext_data.n_placements == 1)
+			return -EINVAL;
+
+		/*
+		 * We always need to be able to spill to system memory, if we
+		 * can't place in the mappable part of LMEM.
+		 */
+		if (!(ext_data.placement_mask & BIT(INTEL_REGION_SMEM)))
+			return -EINVAL;
+	} else {
+		if (ext_data.n_placements > 1 ||
+		    ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM)
+			ext_data.flags |= I915_BO_ALLOC_GPU_ONLY;
+	}
 
 	obj = __i915_gem_object_create_user_ext(i915, args->size,
 						ext_data.placements,
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index e4847436bab8..3e78a00220ea 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3366,11 +3366,11 @@ struct drm_i915_query_memory_regions {
  * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
  * extension support using struct i915_user_extension.
  *
- * Note that in the future we want to have our buffer flags here, at least for
- * the stuff that is immutable. Previously we would have two ioctls, one to
- * create the object with gem_create, and another to apply various parameters,
- * however this creates some ambiguity for the params which are considered
- * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
  */
 struct drm_i915_gem_create_ext {
 	/**
@@ -3378,7 +3378,6 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * The (page-aligned) allocated size for the object will be returned.
 	 *
-	 *
 	 * DG2 64K min page size implications:
 	 *
 	 * On discrete platforms, starting from DG2, we have to contend with GTT
@@ -3390,7 +3389,9 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * Note that the returned size here will always reflect any required
 	 * rounding up done by the kernel, i.e 4K will now become 64K on devices
-	 * such as DG2.
+	 * such as DG2. The kernel will always select the largest minimum
+	 * page-size for the set of possible placements as the value to use when
+	 * rounding up the @size.
 	 *
 	 * Special DG2 GTT address alignment requirement:
 	 *
@@ -3414,14 +3415,58 @@ struct drm_i915_gem_create_ext {
 	 * is deemed to be a good compromise.
 	 */
 	__u64 size;
+
 	/**
 	 * @handle: Returned handle for the object.
 	 *
 	 * Object handles are nonzero.
 	 */
 	__u32 handle;
-	/** @flags: MBZ */
+
+	/**
+	 * @flags: Optional flags.
+	 *
+	 * Supported values:
+	 *
+	 * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+	 * the object will need to be accessed via the CPU.
+	 *
+	 * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and only
+	 * strictly required on configurations where some subset of the device
+	 * memory is directly visible/mappable through the CPU (which we also
+	 * call small BAR), like on some DG2+ systems. Note that this is quite
+	 * undesirable, but due to various factors like the client CPU, BIOS etc
+	 * it's something we can expect to see in the wild. See
+	 * &drm_i915_memory_region_info.probed_cpu_visible_size for how to
+	 * determine if this system applies.
+	 *
+	 * Note that one of the placements MUST be I915_MEMORY_CLASS_SYSTEM, to
+	 * ensure the kernel can always spill the allocation to system memory,
+	 * if the object can't be allocated in the mappable part of
+	 * I915_MEMORY_CLASS_DEVICE.
+	 *
+	 * Also note that since the kernel only supports flat-CCS on objects
+	 * that can *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefore
+	 * don't support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with
+	 * flat-CCS.
+	 *
+	 * Without this hint, the kernel will assume that non-mappable
+	 * I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the
+	 * kernel can still migrate the object to the mappable part, as a last
+	 * resort, if userspace ever CPU faults this object, but this might be
+	 * expensive, and so ideally should be avoided.
+	 *
+	 * On older kernels which lack the relevant small-bar uAPI support (see
+	 * also &drm_i915_memory_region_info.probed_cpu_visible_size),
+	 * usage of the flag will result in an error, but it should NEVER be
+	 * possible to end up with a small BAR configuration, assuming we can
+	 * also successfully load the i915 kernel module. In such cases the
+	 * entire I915_MEMORY_CLASS_DEVICE region will be CPU accessible, and as
+	 * such there are zero restrictions on where the object can be placed.
+	 */
+#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)
 	__u32 flags;
+
 	/**
 	 * @extensions: The chain of extensions to apply to this object.
 	 *
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v3 06/13] drm/i915/uapi: add NEEDS_CPU_ACCESS hint
@ 2022-06-29 12:14   ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Kenneth Graunke, dri-devel, Daniel Vetter,
	Nirmoy Das

If set, force the allocation to be placed in the mappable portion of
I915_MEMORY_CLASS_DEVICE. One big restriction here is that system memory
(i.e. I915_MEMORY_CLASS_SYSTEM) must be given as a potential placement for the
object, so that we can always spill the object into system memory if we
can't make space.

Testcase: igt@gem-create@create-ext-cpu-access-sanity-check
Testcase: igt@gem-create@create-ext-cpu-access-big
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 26 ++++++---
 include/uapi/drm/i915_drm.h                | 61 +++++++++++++++++++---
 2 files changed, 71 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index d094cae0ddf1..33673fe7ee0a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -241,6 +241,7 @@ struct create_ext {
 	struct drm_i915_private *i915;
 	struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
 	unsigned int n_placements;
+	unsigned int placement_mask;
 	unsigned long flags;
 };
 
@@ -337,6 +338,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
 	for (i = 0; i < args->num_regions; i++)
 		ext_data->placements[i] = placements[i];
 
+	ext_data->placement_mask = mask;
 	return 0;
 
 out_dump:
@@ -411,7 +413,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_gem_object *obj;
 	int ret;
 
-	if (args->flags)
+	if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS)
 		return -EINVAL;
 
 	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
@@ -427,13 +429,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 		ext_data.n_placements = 1;
 	}
 
-	/*
-	 * TODO: add a userspace hint to force CPU_ACCESS for the object, which
-	 * can override this.
-	 */
-	if (ext_data.n_placements > 1 ||
-	    ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM)
-		ext_data.flags |= I915_BO_ALLOC_GPU_ONLY;
+	if (args->flags & I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) {
+		if (ext_data.n_placements == 1)
+			return -EINVAL;
+
+		/*
+		 * We always need to be able to spill to system memory, if we
+		 * can't place in the mappable part of LMEM.
+		 */
+		if (!(ext_data.placement_mask & BIT(INTEL_REGION_SMEM)))
+			return -EINVAL;
+	} else {
+		if (ext_data.n_placements > 1 ||
+		    ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM)
+			ext_data.flags |= I915_BO_ALLOC_GPU_ONLY;
+	}
 
 	obj = __i915_gem_object_create_user_ext(i915, args->size,
 						ext_data.placements,
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index e4847436bab8..3e78a00220ea 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3366,11 +3366,11 @@ struct drm_i915_query_memory_regions {
  * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
  * extension support using struct i915_user_extension.
  *
- * Note that in the future we want to have our buffer flags here, at least for
- * the stuff that is immutable. Previously we would have two ioctls, one to
- * create the object with gem_create, and another to apply various parameters,
- * however this creates some ambiguity for the params which are considered
- * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
  */
 struct drm_i915_gem_create_ext {
 	/**
@@ -3378,7 +3378,6 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * The (page-aligned) allocated size for the object will be returned.
 	 *
-	 *
 	 * DG2 64K min page size implications:
 	 *
 	 * On discrete platforms, starting from DG2, we have to contend with GTT
@@ -3390,7 +3389,9 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * Note that the returned size here will always reflect any required
 	 * rounding up done by the kernel, i.e 4K will now become 64K on devices
-	 * such as DG2.
+	 * such as DG2. The kernel will always select the largest minimum
+	 * page-size for the set of possible placements as the value to use when
+	 * rounding up the @size.
 	 *
 	 * Special DG2 GTT address alignment requirement:
 	 *
@@ -3414,14 +3415,58 @@ struct drm_i915_gem_create_ext {
 	 * is deemed to be a good compromise.
 	 */
 	__u64 size;
+
 	/**
 	 * @handle: Returned handle for the object.
 	 *
 	 * Object handles are nonzero.
 	 */
 	__u32 handle;
-	/** @flags: MBZ */
+
+	/**
+	 * @flags: Optional flags.
+	 *
+	 * Supported values:
+	 *
+	 * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+	 * the object will need to be accessed via the CPU.
+	 *
+	 * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and only
+	 * strictly required on configurations where some subset of the device
+	 * memory is directly visible/mappable through the CPU (which we also
+	 * call small BAR), like on some DG2+ systems. Note that this is quite
+	 * undesirable, but due to various factors like the client CPU, BIOS etc
+	 * it's something we can expect to see in the wild. See
+	 * &drm_i915_memory_region_info.probed_cpu_visible_size for how to
+	 * determine if this system applies.
+	 *
+	 * Note that one of the placements MUST be I915_MEMORY_CLASS_SYSTEM, to
+	 * ensure the kernel can always spill the allocation to system memory,
+	 * if the object can't be allocated in the mappable part of
+	 * I915_MEMORY_CLASS_DEVICE.
+	 *
+	 * Also note that since the kernel only supports flat-CCS on objects
+	 * that can *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefore
+	 * don't support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with
+	 * flat-CCS.
+	 *
+	 * Without this hint, the kernel will assume that non-mappable
+	 * I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the
+	 * kernel can still migrate the object to the mappable part, as a last
+	 * resort, if userspace ever CPU faults this object, but this might be
+	 * expensive, and so ideally should be avoided.
+	 *
+	 * On older kernels which lack the relevant small-bar uAPI support (see
+	 * also &drm_i915_memory_region_info.probed_cpu_visible_size),
+	 * usage of the flag will result in an error, but it should NEVER be
+	 * possible to end up with a small BAR configuration, assuming we can
+	 * also successfully load the i915 kernel module. In such cases the
+	 * entire I915_MEMORY_CLASS_DEVICE region will be CPU accessible, and as
+	 * such there are zero restrictions on where the object can be placed.
+	 */
+#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)
 	__u32 flags;
+
 	/**
 	 * @extensions: The chain of extensions to apply to this object.
 	 *
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 07/13] drm/i915/error: skip non-mappable pages
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin, Nirmoy Das

Skip capturing any lmem pages that can't be copied using the CPU. This
is now only best effort on platforms that have small BAR.

Testcase: igt@gem-exec-capture@capture-invisible
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index f9b1969ed7ed..52ea13fee015 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1129,11 +1129,15 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 		dma_addr_t dma;
 
 		for_each_sgt_daddr(dma, iter, vma_res->bi.pages) {
+			dma_addr_t offset = dma - mem->region.start;
 			void __iomem *s;
 
-			s = io_mapping_map_wc(&mem->iomap,
-					      dma - mem->region.start,
-					      PAGE_SIZE);
+			if (offset + PAGE_SIZE > mem->io_size) {
+				ret = -EINVAL;
+				break;
+			}
+
+			s = io_mapping_map_wc(&mem->iomap, offset, PAGE_SIZE);
 			ret = compress_page(compress,
 					    (void __force *)s, dst,
 					    true);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v3 07/13] drm/i915/error: skip non-mappable pages
@ 2022-06-29 12:14   ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Daniel Vetter, Kenneth Graunke, dri-devel,
	Nirmoy Das

Skip capturing any lmem pages that can't be copied using the CPU. This
is now only best effort on platforms that have small BAR.

Testcase: igt@gem-exec-capture@capture-invisible
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index f9b1969ed7ed..52ea13fee015 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1129,11 +1129,15 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 		dma_addr_t dma;
 
 		for_each_sgt_daddr(dma, iter, vma_res->bi.pages) {
+			dma_addr_t offset = dma - mem->region.start;
 			void __iomem *s;
 
-			s = io_mapping_map_wc(&mem->iomap,
-					      dma - mem->region.start,
-					      PAGE_SIZE);
+			if (offset + PAGE_SIZE > mem->io_size) {
+				ret = -EINVAL;
+				break;
+			}
+
+			s = io_mapping_map_wc(&mem->iomap, offset, PAGE_SIZE);
 			ret = compress_page(compress,
 					    (void __force *)s, dst,
 					    true);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v3 08/13] drm/i915/uapi: tweak error capture on recoverable contexts
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Daniel Vetter, Kenneth Graunke, dri-devel

A non-recoverable context must be used if the user wants proper error
capture on discrete platforms. In the future the kernel may want to blit
the contents of some objects when later doing the capture stage. Also
extend to newer integrated platforms.

v2(Thomas):
  - Also extend to newer integrated platforms, for capture buffer memory
    allocation purposes.
v3 (Reported-by: kernel test robot <lkp@intel.com>):
  - Fix build on !CONFIG_DRM_I915_CAPTURE_ERROR

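For reference, a minimal userspace sketch (not part of the patch) of marking a
context non-recoverable, which is what EXEC_OBJECT_CAPTURE now requires on the
affected platforms:

#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Returns 0 on success; the context can then be used with EXEC_OBJECT_CAPTURE. */
static int context_set_non_recoverable(int fd, __u32 ctx_id)
{
	struct drm_i915_gem_context_param arg = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_RECOVERABLE,
		.value = 0,
	};

	return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}
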
Testcase: igt@gem_exec_capture@capture-recoverable
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 30fe847c6664..b7b2c14fd9e1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1951,7 +1951,7 @@ eb_find_first_request_added(struct i915_execbuffer *eb)
 #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)
 
 /* Stage with GFP_KERNEL allocations before we enter the signaling critical path */
-static void eb_capture_stage(struct i915_execbuffer *eb)
+static int eb_capture_stage(struct i915_execbuffer *eb)
 {
 	const unsigned int count = eb->buffer_count;
 	unsigned int i = count, j;
@@ -1964,6 +1964,10 @@ static void eb_capture_stage(struct i915_execbuffer *eb)
 		if (!(flags & EXEC_OBJECT_CAPTURE))
 			continue;
 
+		if (i915_gem_context_is_recoverable(eb->gem_context) &&
+		    (IS_DGFX(eb->i915) || GRAPHICS_VER_FULL(eb->i915) > IP_VER(12, 0)))
+			return -EINVAL;
+
 		for_each_batch_create_order(eb, j) {
 			struct i915_capture_list *capture;
 
@@ -1976,6 +1980,8 @@ static void eb_capture_stage(struct i915_execbuffer *eb)
 			eb->capture_lists[j] = capture;
 		}
 	}
+
+	return 0;
 }
 
 /* Commit once we're in the critical path */
@@ -2017,8 +2023,9 @@ static void eb_capture_list_clear(struct i915_execbuffer *eb)
 
 #else
 
-static void eb_capture_stage(struct i915_execbuffer *eb)
+static int eb_capture_stage(struct i915_execbuffer *eb)
 {
+	return 0;
 }
 
 static void eb_capture_commit(struct i915_execbuffer *eb)
@@ -3410,7 +3417,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	}
 
 	ww_acquire_done(&eb.ww.ctx);
-	eb_capture_stage(&eb);
+	err = eb_capture_stage(&eb);
+	if (err)
+		goto err_vma;
 
 	out_fence = eb_requests_create(&eb, in_fence, out_fence_fd);
 	if (IS_ERR(out_fence)) {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 08/13] drm/i915/uapi: tweak error capture on recoverable contexts
@ 2022-06-29 12:14   ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin

A non-recoverable context must be used if the user wants proper error
capture on discrete platforms. In the future the kernel may want to blit
the contents of some objects when later doing the capture stage. Also
extend to newer integrated platforms.

v2(Thomas):
  - Also extend to newer integrated platforms, for capture buffer memory
    allocation purposes.
v3 (Reported-by: kernel test robot <lkp@intel.com>):
  - Fix build on !CONFIG_DRM_I915_CAPTURE_ERROR

Testcase: igt@gem_exec_capture@capture-recoverable
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 30fe847c6664..b7b2c14fd9e1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1951,7 +1951,7 @@ eb_find_first_request_added(struct i915_execbuffer *eb)
 #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)
 
 /* Stage with GFP_KERNEL allocations before we enter the signaling critical path */
-static void eb_capture_stage(struct i915_execbuffer *eb)
+static int eb_capture_stage(struct i915_execbuffer *eb)
 {
 	const unsigned int count = eb->buffer_count;
 	unsigned int i = count, j;
@@ -1964,6 +1964,10 @@ static void eb_capture_stage(struct i915_execbuffer *eb)
 		if (!(flags & EXEC_OBJECT_CAPTURE))
 			continue;
 
+		if (i915_gem_context_is_recoverable(eb->gem_context) &&
+		    (IS_DGFX(eb->i915) || GRAPHICS_VER_FULL(eb->i915) > IP_VER(12, 0)))
+			return -EINVAL;
+
 		for_each_batch_create_order(eb, j) {
 			struct i915_capture_list *capture;
 
@@ -1976,6 +1980,8 @@ static void eb_capture_stage(struct i915_execbuffer *eb)
 			eb->capture_lists[j] = capture;
 		}
 	}
+
+	return 0;
 }
 
 /* Commit once we're in the critical path */
@@ -2017,8 +2023,9 @@ static void eb_capture_list_clear(struct i915_execbuffer *eb)
 
 #else
 
-static void eb_capture_stage(struct i915_execbuffer *eb)
+static int eb_capture_stage(struct i915_execbuffer *eb)
 {
+	return 0;
 }
 
 static void eb_capture_commit(struct i915_execbuffer *eb)
@@ -3410,7 +3417,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	}
 
 	ww_acquire_done(&eb.ww.ctx);
-	eb_capture_stage(&eb);
+	err = eb_capture_stage(&eb);
+	if (err)
+		goto err_vma;
 
 	out_fence = eb_requests_create(&eb, in_fence, out_fence_fd);
 	if (IS_ERR(out_fence)) {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 09/13] drm/i915/selftests: skip the mman tests for stolen
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin

It's not supported, and the test just skips later anyway. With small-BAR
things get more complicated, since all of stolen is likely not even CPU
accessible, hence not passing I915_BO_ALLOC_GPU_ONLY just results in
object creation failing.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..388c85b0f764 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -979,6 +979,9 @@ static int igt_mmap(void *arg)
 		};
 		int i;
 
+		if (mr->private)
+			continue;
+
 		for (i = 0; i < ARRAY_SIZE(sizes); i++) {
 			struct drm_i915_gem_object *obj;
 			int err;
@@ -1435,6 +1438,9 @@ static int igt_mmap_access(void *arg)
 		struct drm_i915_gem_object *obj;
 		int err;
 
+		if (mr->private)
+			continue;
+
 		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
 		if (obj == ERR_PTR(-ENODEV))
 			continue;
@@ -1580,6 +1586,9 @@ static int igt_mmap_gpu(void *arg)
 		struct drm_i915_gem_object *obj;
 		int err;
 
+		if (mr->private)
+			continue;
+
 		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
 		if (obj == ERR_PTR(-ENODEV))
 			continue;
@@ -1727,6 +1736,9 @@ static int igt_mmap_revoke(void *arg)
 		struct drm_i915_gem_object *obj;
 		int err;
 
+		if (mr->private)
+			continue;
+
 		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
 		if (obj == ERR_PTR(-ENODEV))
 			continue;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v3 09/13] drm/i915/selftests: skip the mman tests for stolen
@ 2022-06-29 12:14   ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Daniel Vetter, Kenneth Graunke, dri-devel

It's not supported, and the test just skips later anyway. With small-BAR
things get more complicated, since all of stolen is likely not even CPU
accessible, hence not passing I915_BO_ALLOC_GPU_ONLY just results in
object creation failing.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..388c85b0f764 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -979,6 +979,9 @@ static int igt_mmap(void *arg)
 		};
 		int i;
 
+		if (mr->private)
+			continue;
+
 		for (i = 0; i < ARRAY_SIZE(sizes); i++) {
 			struct drm_i915_gem_object *obj;
 			int err;
@@ -1435,6 +1438,9 @@ static int igt_mmap_access(void *arg)
 		struct drm_i915_gem_object *obj;
 		int err;
 
+		if (mr->private)
+			continue;
+
 		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
 		if (obj == ERR_PTR(-ENODEV))
 			continue;
@@ -1580,6 +1586,9 @@ static int igt_mmap_gpu(void *arg)
 		struct drm_i915_gem_object *obj;
 		int err;
 
+		if (mr->private)
+			continue;
+
 		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
 		if (obj == ERR_PTR(-ENODEV))
 			continue;
@@ -1727,6 +1736,9 @@ static int igt_mmap_revoke(void *arg)
 		struct drm_i915_gem_object *obj;
 		int err;
 
+		if (mr->private)
+			continue;
+
 		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
 		if (obj == ERR_PTR(-ENODEV))
 			continue;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 10/13] drm/i915/selftests: ensure we reserve a fence slot
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin

We should always be explicit and allocate a fence slot before adding a
new fence.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 388c85b0f764..da28acb78a88 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -1224,8 +1224,10 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
 					  expand32(POISON_INUSE), &rq);
 	i915_gem_object_unpin_pages(obj);
 	if (rq) {
-		dma_resv_add_fence(obj->base.resv, &rq->fence,
-				   DMA_RESV_USAGE_KERNEL);
+		err = dma_resv_reserve_fences(obj->base.resv, 1);
+		if (!err)
+			dma_resv_add_fence(obj->base.resv, &rq->fence,
+					   DMA_RESV_USAGE_KERNEL);
 		i915_request_put(rq);
 	}
 	i915_gem_object_unlock(obj);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v3 10/13] drm/i915/selftests: ensure we reserve a fence slot
@ 2022-06-29 12:14   ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Daniel Vetter, Kenneth Graunke, dri-devel

We should always be explicit and allocate a fence slot before adding a
new fence.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 388c85b0f764..da28acb78a88 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -1224,8 +1224,10 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
 					  expand32(POISON_INUSE), &rq);
 	i915_gem_object_unpin_pages(obj);
 	if (rq) {
-		dma_resv_add_fence(obj->base.resv, &rq->fence,
-				   DMA_RESV_USAGE_KERNEL);
+		err = dma_resv_reserve_fences(obj->base.resv, 1);
+		if (!err)
+			dma_resv_add_fence(obj->base.resv, &rq->fence,
+					   DMA_RESV_USAGE_KERNEL);
 		i915_request_put(rq);
 	}
 	i915_gem_object_unlock(obj);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 11/13] drm/i915/ttm: handle blitter failure on DG2
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin

If the move or clear operation somehow fails, and the memory underneath
is not cleared, like when moving to lmem, then we currently fall back to
memcpy or memset. However with small-BAR systems this fallback might no
longer be possible. For now we use the set_wedged sledgehammer if we
ever encounter such a scenario, and mark the object as borked to plug
any holes where access to the memory underneath can happen. Add some
basic selftests to exercise this.

v2:
  - In the selftests make sure we grab the runtime pm around the reset.
    Also make sure we grab the reset lock before checking if the device
    is wedged, since the wedge might still be in-progress and hence the
    bit might not be set yet.
  - Don't wedge or put the object into an unknown state, if the request
    construction fails (or similar). Just returning an error and
    skipping the fallback should be safe here.
  - Make sure we wedge each gt. (Thomas)
  - Peek at the unknown_state in io_reserve, that way we don't have to
    export or hand roll the fault_wait_for_idle. (Thomas)
  - Add the missing read-side barriers for the unknown_state. (Thomas)
  - Some kernel-doc fixes. (Thomas)

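For illustration, a condensed sketch of the expected usage pattern for the new
helper (the function name is made up here; the checks mirror what this patch
adds to the moving-fence wait and the TTM io_reserve path):

/* Sketch only: decide whether the backing pages may be touched at all. */
static int sketch_check_pages_usable(struct drm_i915_gem_object *obj)
{
	int err;

	assert_object_held(obj);

	/* Waits for the DMA_RESV_USAGE_KERNEL (move/clear) fences to signal. */
	err = i915_gem_object_wait_moving_fence(obj, true);
	if (err)
		return err;

	/* Write-once flag set from __memcpy_work() when the fallback is banned. */
	if (i915_gem_object_has_unknown_state(obj))
		return -EIO;

	return 0;
}
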
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  21 +++
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  18 +++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  26 +++-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   3 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  88 +++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h  |   1 +
 .../drm/i915/gem/selftests/i915_gem_migrate.c | 141 +++++++++++++++---
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  69 +++++++++
 drivers/gpu/drm/i915/i915_vma.c               |  25 ++--
 10 files changed, 346 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..642a5d59ce26 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -783,10 +783,31 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
 				    intr, MAX_SCHEDULE_TIMEOUT);
 	if (!ret)
 		ret = -ETIME;
+	else if (ret > 0 && i915_gem_object_has_unknown_state(obj))
+		ret = -EIO;
 
 	return ret < 0 ? ret : 0;
 }
 
+/**
+ * i915_gem_object_has_unknown_state - Return true if the object backing pages are
+ * in an unknown_state. This means that userspace must NEVER be allowed to touch
+ * the pages, with either the GPU or CPU.
+ *
+ * ONLY valid to be called after ensuring that all kernel fences have signalled
+ * (in particular the fence for moving/clearing the object).
+ */
+bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj)
+{
+	/*
+	 * The below barrier pairs with the dma_fence_signal() in
+	 * __memcpy_work(). We should only sample the unknown_state after all
+	 * the kernel fences have signalled.
+	 */
+	smp_rmb();
+	return obj->mm.unknown_state;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/huge_gem_object.c"
 #include "selftests/huge_pages.c"
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index e11d82a9f7c3..0bf3ee27a2a8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -524,6 +524,7 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
 				     struct dma_fence **fence);
 int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
 				      bool intr);
+bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
 
 void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 					 unsigned int cache_level);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..5cf36a130061 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -547,6 +547,24 @@ struct drm_i915_gem_object {
 		 */
 		bool ttm_shrinkable;
 
+		/**
+		 * @unknown_state: Indicate that the object is effectively
+		 * borked. This is write-once and set if we somehow encounter a
+		 * fatal error when moving/clearing the pages, and we are not
+		 * able to fallback to memcpy/memset, like on small-BAR systems.
+		 * The GPU should also be wedged (or in the process) at this
+		 * point.
+		 *
+		 * Only valid to read this after acquiring the dma-resv lock and
+		 * waiting for all DMA_RESV_USAGE_KERNEL fences to be signalled,
+		 * or if we otherwise know that the moving fence has signalled,
+		 * and we are certain the pages underneath are valid for
+		 * immediate access (under normal operation), like just prior to
+		 * binding the object or when setting up the CPU fault handler.
+		 * See i915_gem_object_has_unknown_state();
+		 */
+		bool unknown_state;
+
 		/**
 		 * Priority list of potential placements for this object.
 		 */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..098409a33e10 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -675,7 +675,15 @@ static void i915_ttm_swap_notify(struct ttm_buffer_object *bo)
 		i915_ttm_purge(obj);
 }
 
-static bool i915_ttm_resource_mappable(struct ttm_resource *res)
+/**
+ * i915_ttm_resource_mappable - Return true if the ttm resource is CPU
+ * accessible.
+ * @res: The TTM resource to check.
+ *
+ * This is interesting on small-BAR systems where we may encounter lmem objects
+ * that can't be accessed via the CPU.
+ */
+bool i915_ttm_resource_mappable(struct ttm_resource *res)
 {
 	struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
 
@@ -687,6 +695,22 @@ static bool i915_ttm_resource_mappable(struct ttm_resource *res)
 
 static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)
 {
+	struct drm_i915_gem_object *obj = i915_ttm_to_gem(mem->bo);
+	bool unknown_state;
+
+	if (!obj)
+		return -EINVAL;
+
+	if (!kref_get_unless_zero(&obj->base.refcount))
+		return -EINVAL;
+
+	assert_object_held(obj);
+
+	unknown_state = i915_gem_object_has_unknown_state(obj);
+	i915_gem_object_put(obj);
+	if (unknown_state)
+		return -EINVAL;
+
 	if (!i915_ttm_cpu_maps_iomem(mem))
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index 73e371aa3850..e4842b4296fc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -92,4 +92,7 @@ static inline bool i915_ttm_cpu_maps_iomem(struct ttm_resource *mem)
 	/* Once / if we support GGTT, this is also false for cached ttm_tts */
 	return mem->mem_type != I915_PL_SYSTEM;
 }
+
+bool i915_ttm_resource_mappable(struct ttm_resource *res);
+
 #endif
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..364e7fe8efb1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -33,6 +33,7 @@
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 static bool fail_gpu_migration;
 static bool fail_work_allocation;
+static bool ban_memcpy;
 
 void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 					bool work_allocation)
@@ -40,6 +41,11 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 	fail_gpu_migration = gpu_migration;
 	fail_work_allocation = work_allocation;
 }
+
+void i915_ttm_migrate_set_ban_memcpy(bool ban)
+{
+	ban_memcpy = ban;
+}
 #endif
 
 static enum i915_cache_level
@@ -258,15 +264,23 @@ struct i915_ttm_memcpy_arg {
  * from the callback for lockdep reasons.
  * @cb: Callback for the accelerated migration fence.
  * @arg: The argument for the memcpy functionality.
+ * @i915: The i915 pointer.
+ * @obj: The GEM object.
+ * @memcpy_allowed: If false then, instead of processing @arg and falling back
+ * to memcpy or memset, we wedge the device and set the @obj unknown_state, to
+ * prevent further access to the object with the CPU or GPU. On some devices we
+ * might only be permitted to use the blitter engine for such operations.
  */
 struct i915_ttm_memcpy_work {
 	struct dma_fence fence;
 	struct work_struct work;
-	/* The fence lock */
 	spinlock_t lock;
 	struct irq_work irq_work;
 	struct dma_fence_cb cb;
 	struct i915_ttm_memcpy_arg arg;
+	struct drm_i915_private *i915;
+	struct drm_i915_gem_object *obj;
+	bool memcpy_allowed;
 };
 
 static void i915_ttm_move_memcpy(struct i915_ttm_memcpy_arg *arg)
@@ -319,12 +333,34 @@ static void __memcpy_work(struct work_struct *work)
 	struct i915_ttm_memcpy_arg *arg = &copy_work->arg;
 	bool cookie = dma_fence_begin_signalling();
 
-	i915_ttm_move_memcpy(arg);
+	if (copy_work->memcpy_allowed) {
+		i915_ttm_move_memcpy(arg);
+	} else {
+		/*
+		 * Prevent further use of the object. Any future GTT binding or
+		 * CPU access is not allowed once we signal the fence. Outside
+		 * of the fence critical section, we then also wedge the gpu
+		 * to indicate the device is not functional.
+		 *
+		 * The below dma_fence_signal() is our write-memory-barrier.
+		 */
+		copy_work->obj->mm.unknown_state = true;
+	}
+
 	dma_fence_end_signalling(cookie);
 
 	dma_fence_signal(&copy_work->fence);
 
+	if (!copy_work->memcpy_allowed) {
+		struct intel_gt *gt;
+		unsigned int id;
+
+		for_each_gt(gt, copy_work->i915, id)
+			intel_gt_set_wedged(gt);
+	}
+
 	i915_ttm_memcpy_release(arg);
+	i915_gem_object_put(copy_work->obj);
 	dma_fence_put(&copy_work->fence);
 }
 
@@ -336,6 +372,7 @@ static void __memcpy_irq_work(struct irq_work *irq_work)
 
 	dma_fence_signal(&copy_work->fence);
 	i915_ttm_memcpy_release(arg);
+	i915_gem_object_put(copy_work->obj);
 	dma_fence_put(&copy_work->fence);
 }
 
@@ -389,6 +426,16 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
 	return &work->fence;
 }
 
+static bool i915_ttm_memcpy_allowed(struct ttm_buffer_object *bo,
+				    struct ttm_resource *dst_mem)
+{
+	if (!(i915_ttm_resource_mappable(bo->resource) &&
+	      i915_ttm_resource_mappable(dst_mem)))
+		return false;
+
+	return I915_SELFTEST_ONLY(ban_memcpy) ? false : true;
+}
+
 static struct dma_fence *
 __i915_ttm_move(struct ttm_buffer_object *bo,
 		const struct ttm_operation_ctx *ctx, bool clear,
@@ -396,6 +443,9 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
 		struct i915_refct_sgt *dst_rsgt, bool allow_accel,
 		const struct i915_deps *move_deps)
 {
+	const bool memcpy_allowed = i915_ttm_memcpy_allowed(bo, dst_mem);
+	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
+	struct drm_i915_private *i915 = to_i915(bo->base.dev);
 	struct i915_ttm_memcpy_work *copy_work = NULL;
 	struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
 	struct dma_fence *fence = ERR_PTR(-EINVAL);
@@ -423,9 +473,14 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
 			copy_work = kzalloc(sizeof(*copy_work), GFP_KERNEL);
 
 		if (copy_work) {
+			copy_work->i915 = i915;
+			copy_work->memcpy_allowed = memcpy_allowed;
+			copy_work->obj = i915_gem_object_get(obj);
 			arg = &copy_work->arg;
-			i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
-					     dst_rsgt);
+			if (memcpy_allowed)
+				i915_ttm_memcpy_init(arg, bo, clear, dst_mem,
+						     dst_ttm, dst_rsgt);
+
 			fence = i915_ttm_memcpy_work_arm(copy_work, dep);
 		} else {
 			dma_fence_wait(dep, false);
@@ -450,17 +505,23 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
 	}
 
 	/* Error intercept failed or no accelerated migration to start with */
-	if (!copy_work)
-		i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
-				     dst_rsgt);
-	i915_ttm_move_memcpy(arg);
-	i915_ttm_memcpy_release(arg);
+
+	if (memcpy_allowed) {
+		if (!copy_work)
+			i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
+					     dst_rsgt);
+		i915_ttm_move_memcpy(arg);
+		i915_ttm_memcpy_release(arg);
+	}
+	if (copy_work)
+		i915_gem_object_put(copy_work->obj);
 	kfree(copy_work);
 
-	return NULL;
+	return memcpy_allowed ? NULL : ERR_PTR(-EIO);
 out:
 	if (!fence && copy_work) {
 		i915_ttm_memcpy_release(arg);
+		i915_gem_object_put(copy_work->obj);
 		kfree(copy_work);
 	}
 
@@ -539,8 +600,11 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	}
 
 	if (migration_fence) {
-		ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
-						true, dst_mem);
+		if (I915_SELFTEST_ONLY(evict && fail_gpu_migration))
+			ret = -EIO; /* never feed non-migrate fences into ttm */
+		else
+			ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
+							true, dst_mem);
 		if (ret) {
 			dma_fence_wait(migration_fence, false);
 			ttm_bo_move_sync_cleanup(bo, dst_mem);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
index d2e7f149e05c..8a5d5ab0cc34 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
@@ -22,6 +22,7 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo);
 
 I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 							      bool work_allocation));
+I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_ban_memcpy(bool ban));
 
 int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 			  struct drm_i915_gem_object *src,
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index 801af51aff62..fe6c37fd7859 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -9,6 +9,7 @@
 
 #include "i915_deps.h"
 
+#include "selftests/igt_reset.h"
 #include "selftests/igt_spinner.h"
 
 static int igt_fill_check_buffer(struct drm_i915_gem_object *obj,
@@ -109,7 +110,8 @@ static int igt_same_create_migrate(void *arg)
 
 static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
 				  struct drm_i915_gem_object *obj,
-				  struct i915_vma *vma)
+				  struct i915_vma *vma,
+				  bool silent_migrate)
 {
 	int err;
 
@@ -138,7 +140,8 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
 	if (i915_gem_object_is_lmem(obj)) {
 		err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM);
 		if (err) {
-			pr_err("Object failed migration to smem\n");
+			if (!silent_migrate)
+				pr_err("Object failed migration to smem\n");
 			if (err)
 				return err;
 		}
@@ -156,7 +159,8 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
 	} else {
 		err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM_0);
 		if (err) {
-			pr_err("Object failed migration to lmem\n");
+			if (!silent_migrate)
+				pr_err("Object failed migration to lmem\n");
 			if (err)
 				return err;
 		}
@@ -179,7 +183,8 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
 				    struct i915_address_space *vm,
 				    struct i915_deps *deps,
 				    struct igt_spinner *spin,
-				    struct dma_fence *spin_fence)
+				    struct dma_fence *spin_fence,
+				    bool borked_migrate)
 {
 	struct drm_i915_private *i915 = gt->i915;
 	struct drm_i915_gem_object *obj;
@@ -242,7 +247,8 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
 	 */
 	for (i = 1; i <= 5; ++i) {
 		for_i915_gem_ww(&ww, err, true)
-			err = lmem_pages_migrate_one(&ww, obj, vma);
+			err = lmem_pages_migrate_one(&ww, obj, vma,
+						     borked_migrate);
 		if (err)
 			goto out_put;
 	}
@@ -283,23 +289,70 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
 
 static int igt_lmem_pages_failsafe_migrate(void *arg)
 {
-	int fail_gpu, fail_alloc, ret;
+	int fail_gpu, fail_alloc, ban_memcpy, ret;
 	struct intel_gt *gt = arg;
 
 	for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
 		for (fail_alloc = 0; fail_alloc < 2; ++fail_alloc) {
-			pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
-				fail_gpu, fail_alloc);
-			i915_ttm_migrate_set_failure_modes(fail_gpu,
-							   fail_alloc);
-			ret = __igt_lmem_pages_migrate(gt, NULL, NULL, NULL, NULL);
-			if (ret)
-				goto out_err;
+			for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
+				pr_info("Simulated failure modes: gpu: %d, alloc: %d, ban_memcpy: %d\n",
+					fail_gpu, fail_alloc, ban_memcpy);
+				i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
+				i915_ttm_migrate_set_failure_modes(fail_gpu,
+								   fail_alloc);
+				ret = __igt_lmem_pages_migrate(gt, NULL, NULL,
+							       NULL, NULL,
+							       ban_memcpy &&
+							       fail_gpu);
+
+				if (ban_memcpy && fail_gpu) {
+					struct intel_gt *__gt;
+					unsigned int id;
+
+					if (ret != -EIO) {
+						pr_err("expected -EIO, got (%d)\n", ret);
+						ret = -EINVAL;
+					} else {
+						ret = 0;
+					}
+
+					for_each_gt(__gt, gt->i915, id) {
+						intel_wakeref_t wakeref;
+						bool wedged;
+
+						mutex_lock(&__gt->reset.mutex);
+						wedged = test_bit(I915_WEDGED, &__gt->reset.flags);
+						mutex_unlock(&__gt->reset.mutex);
+
+						if (fail_gpu && !fail_alloc) {
+							if (!wedged) {
+								pr_err("gt(%u) not wedged\n", id);
+								ret = -EINVAL;
+								continue;
+							}
+						} else if (wedged) {
+							pr_err("gt(%u) incorrectly wedged\n", id);
+							ret = -EINVAL;
+						} else {
+							continue;
+						}
+
+						wakeref = intel_runtime_pm_get(__gt->uncore->rpm);
+						igt_global_reset_lock(__gt);
+						intel_gt_reset(__gt, ALL_ENGINES, NULL);
+						igt_global_reset_unlock(__gt);
+						intel_runtime_pm_put(__gt->uncore->rpm, wakeref);
+					}
+					if (ret)
+						goto out_err;
+				}
+			}
 		}
 	}
 
 out_err:
 	i915_ttm_migrate_set_failure_modes(false, false);
+	i915_ttm_migrate_set_ban_memcpy(false);
 	return ret;
 }
 
@@ -370,7 +423,7 @@ static int igt_async_migrate(struct intel_gt *gt)
 			goto out_ce;
 
 		err = __igt_lmem_pages_migrate(gt, &ppgtt->vm, &deps, &spin,
-					       spin_fence);
+					       spin_fence, false);
 		i915_deps_fini(&deps);
 		dma_fence_put(spin_fence);
 		if (err)
@@ -394,23 +447,67 @@ static int igt_async_migrate(struct intel_gt *gt)
 #define ASYNC_FAIL_ALLOC 1
 static int igt_lmem_async_migrate(void *arg)
 {
-	int fail_gpu, fail_alloc, ret;
+	int fail_gpu, fail_alloc, ban_memcpy, ret;
 	struct intel_gt *gt = arg;
 
 	for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
 		for (fail_alloc = 0; fail_alloc < ASYNC_FAIL_ALLOC; ++fail_alloc) {
-			pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
-				fail_gpu, fail_alloc);
-			i915_ttm_migrate_set_failure_modes(fail_gpu,
-							   fail_alloc);
-			ret = igt_async_migrate(gt);
-			if (ret)
-				goto out_err;
+			for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
+				pr_info("Simulated failure modes: gpu: %d, alloc: %d, ban_memcpy: %d\n",
+					fail_gpu, fail_alloc, ban_memcpy);
+				i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
+				i915_ttm_migrate_set_failure_modes(fail_gpu,
+								   fail_alloc);
+				ret = igt_async_migrate(gt);
+
+				if (fail_gpu && ban_memcpy) {
+					struct intel_gt *__gt;
+					unsigned int id;
+
+					if (ret != -EIO) {
+						pr_err("expected -EIO, got (%d)\n", ret);
+						ret = -EINVAL;
+					} else {
+						ret = 0;
+					}
+
+					for_each_gt(__gt, gt->i915, id) {
+						intel_wakeref_t wakeref;
+						bool wedged;
+
+						mutex_lock(&__gt->reset.mutex);
+						wedged = test_bit(I915_WEDGED, &__gt->reset.flags);
+						mutex_unlock(&__gt->reset.mutex);
+
+						if (fail_gpu && !fail_alloc) {
+							if (!wedged) {
+								pr_err("gt(%u) not wedged\n", id);
+								ret = -EINVAL;
+								continue;
+							}
+						} else if (wedged) {
+							pr_err("gt(%u) incorrectly wedged\n", id);
+							ret = -EINVAL;
+						} else {
+							continue;
+						}
+
+						wakeref = intel_runtime_pm_get(__gt->uncore->rpm);
+						igt_global_reset_lock(__gt);
+						intel_gt_reset(__gt, ALL_ENGINES, NULL);
+						igt_global_reset_unlock(__gt);
+						intel_runtime_pm_put(__gt->uncore->rpm, wakeref);
+					}
+				}
+				if (ret)
+					goto out_err;
+			}
 		}
 	}
 
 out_err:
 	i915_ttm_migrate_set_failure_modes(false, false);
+	i915_ttm_migrate_set_ban_memcpy(false);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index da28acb78a88..3ced9948a331 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -10,6 +10,7 @@
 #include "gem/i915_gem_internal.h"
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
+#include "gem/i915_gem_ttm_move.h"
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_gt.h"
@@ -21,6 +22,7 @@
 #include "i915_selftest.h"
 #include "selftests/i915_random.h"
 #include "selftests/igt_flush_test.h"
+#include "selftests/igt_reset.h"
 #include "selftests/igt_mmap.h"
 
 struct tile {
@@ -1163,6 +1165,7 @@ static int ___igt_mmap_migrate(struct drm_i915_private *i915,
 #define IGT_MMAP_MIGRATE_FILL        (1 << 1)
 #define IGT_MMAP_MIGRATE_EVICTABLE   (1 << 2)
 #define IGT_MMAP_MIGRATE_UNFAULTABLE (1 << 3)
+#define IGT_MMAP_MIGRATE_FAIL_GPU    (1 << 4)
 static int __igt_mmap_migrate(struct intel_memory_region **placements,
 			      int n_placements,
 			      struct intel_memory_region *expected_mr,
@@ -1237,13 +1240,62 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
 	if (flags & IGT_MMAP_MIGRATE_EVICTABLE)
 		igt_make_evictable(&objects);
 
+	if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
+		err = i915_gem_object_lock(obj, NULL);
+		if (err)
+			goto out_put;
+
+		/*
+		 * Ensure we only simulate the gpu failure when faulting the
+		 * pages.
+		 */
+		err = i915_gem_object_wait_moving_fence(obj, true);
+		i915_gem_object_unlock(obj);
+		if (err)
+			goto out_put;
+		i915_ttm_migrate_set_failure_modes(true, false);
+	}
+
 	err = ___igt_mmap_migrate(i915, obj, addr,
 				  flags & IGT_MMAP_MIGRATE_UNFAULTABLE);
+
 	if (!err && obj->mm.region != expected_mr) {
 		pr_err("%s region mismatch %s\n", __func__, expected_mr->name);
 		err = -EINVAL;
 	}
 
+	if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
+		struct intel_gt *gt;
+		unsigned int id;
+
+		i915_ttm_migrate_set_failure_modes(false, false);
+
+		for_each_gt(gt, i915, id) {
+			intel_wakeref_t wakeref;
+			bool wedged;
+
+			mutex_lock(&gt->reset.mutex);
+			wedged = test_bit(I915_WEDGED, &gt->reset.flags);
+			mutex_unlock(&gt->reset.mutex);
+			if (!wedged) {
+				pr_err("gt(%u) not wedged\n", id);
+				err = -EINVAL;
+				continue;
+			}
+
+			wakeref = intel_runtime_pm_get(gt->uncore->rpm);
+			igt_global_reset_lock(gt);
+			intel_gt_reset(gt, ALL_ENGINES, NULL);
+			igt_global_reset_unlock(gt);
+			intel_runtime_pm_put(gt->uncore->rpm, wakeref);
+		}
+
+		if (!i915_gem_object_has_unknown_state(obj)) {
+			pr_err("object missing unknown_state\n");
+			err = -EINVAL;
+		}
+	}
+
 out_put:
 	i915_gem_object_put(obj);
 	igt_close_objects(i915, &objects);
@@ -1324,6 +1376,23 @@ static int igt_mmap_migrate(void *arg)
 					 IGT_MMAP_MIGRATE_TOPDOWN |
 					 IGT_MMAP_MIGRATE_FILL |
 					 IGT_MMAP_MIGRATE_UNFAULTABLE);
+		if (err)
+			goto out_io_size;
+
+		/*
+		 * Allocate in the non-mappable portion, but force migrating to
+		 * the mappable portion on fault (LMEM -> LMEM). We then also
+		 * simulate a gpu error when moving the pages when faulting the
+		 * pages, which should result in wedging the gpu and returning
+		 * SIGBUS in the fault handler, since we can't fallback to
+		 * memcpy.
+		 */
+		err = __igt_mmap_migrate(single, ARRAY_SIZE(single), mr,
+					 IGT_MMAP_MIGRATE_TOPDOWN |
+					 IGT_MMAP_MIGRATE_FILL |
+					 IGT_MMAP_MIGRATE_EVICTABLE |
+					 IGT_MMAP_MIGRATE_FAIL_GPU |
+					 IGT_MMAP_MIGRATE_UNFAULTABLE);
 out_io_size:
 		mr->io_size = saved_io_size;
 		i915_ttm_buddy_man_force_visible_size(man,
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 5d5828b9a242..43339ecabd73 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -310,7 +310,7 @@ struct i915_vma_work {
 	struct i915_address_space *vm;
 	struct i915_vm_pt_stash stash;
 	struct i915_vma_resource *vma_res;
-	struct drm_i915_gem_object *pinned;
+	struct drm_i915_gem_object *obj;
 	struct i915_sw_dma_fence_cb cb;
 	enum i915_cache_level cache_level;
 	unsigned int flags;
@@ -321,17 +321,25 @@ static void __vma_bind(struct dma_fence_work *work)
 	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
 	struct i915_vma_resource *vma_res = vw->vma_res;
 
+	/*
+	 * We are about to bind the object, which must mean we have already
+	 * signaled the work to potentially clear/move the pages underneath. If
+	 * something went wrong at that stage then the object should have
+	 * unknown_state set, in which case we need to skip the bind.
+	 */
+	if (i915_gem_object_has_unknown_state(vw->obj))
+		return;
+
 	vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
 			       vma_res, vw->cache_level, vw->flags);
-
 }
 
 static void __vma_release(struct dma_fence_work *work)
 {
 	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
 
-	if (vw->pinned)
-		i915_gem_object_put(vw->pinned);
+	if (vw->obj)
+		i915_gem_object_put(vw->obj);
 
 	i915_vm_free_pt_stash(vw->vm, &vw->stash);
 	if (vw->vma_res)
@@ -517,14 +525,7 @@ int i915_vma_bind(struct i915_vma *vma,
 		}
 
 		work->base.dma.error = 0; /* enable the queue_work() */
-
-		/*
-		 * If we don't have the refcounted pages list, keep a reference
-		 * on the object to avoid waiting for the async bind to
-		 * complete in the object destruction path.
-		 */
-		if (!work->vma_res->bi.pages_rsgt)
-			work->pinned = i915_gem_object_get(vma->obj);
+		work->obj = i915_gem_object_get(vma->obj);
 	} else {
 		ret = i915_gem_object_wait_moving_fence(vma->obj, true);
 		if (ret) {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v3 11/13] drm/i915/ttm: handle blitter failure on DG2
@ 2022-06-29 12:14   ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Daniel Vetter, Kenneth Graunke, dri-devel

If the move or clear operation somehow fails, and the memory underneath
is not cleared, like when moving to lmem, then we currently fall back to
memcpy or memset. However, on small-BAR systems this fallback might no
longer be possible. For now we use the set_wedged sledgehammer if we
ever encounter such a scenario, and mark the object as borked to plug
any holes where access to the memory underneath can happen. Add some
basic selftests to exercise this.
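
For reference, a rough sketch of the two consumer-side patterns this patch
introduces, pieced together from the hunks below (simplified, not an extra
change; the exact call sites and error codes vary per path):

	/* (a) paths that wait for the moving fence now see the failure directly */
	err = i915_gem_object_wait_moving_fence(obj, true);
	if (err)
		return err; /* -EIO if the pages ended up in the unknown state */

	/* (b) paths that already know the fence has signalled peek at the state
	 * directly, e.g. __vma_bind() skips the bind and i915_ttm_io_mem_reserve()
	 * refuses the CPU fault
	 */
	if (i915_gem_object_has_unknown_state(obj))
		return -EINVAL;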

v2:
  - In the selftests make sure we grab the runtime pm around the reset.
    Also make sure we grab the reset lock before checking if the device
    is wedged, since the wedge might still be in-progress and hence the
    bit might not be set yet.
  - Don't wedge or put the object into an unknown state if the request
    construction fails (or similar). Just returning an error and
    skipping the fallback should be safe here.
  - Make sure we wedge each gt. (Thomas)
  - Peek at the unknown_state in io_reserve, that way we don't have to
    export or hand roll the fault_wait_for_idle. (Thomas)
  - Add the missing read-side barriers for the unknown_state. (Thomas)
  - Some kernel-doc fixes. (Thomas)

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  21 +++
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  18 +++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  26 +++-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   3 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  88 +++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h  |   1 +
 .../drm/i915/gem/selftests/i915_gem_migrate.c | 141 +++++++++++++++---
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  69 +++++++++
 drivers/gpu/drm/i915/i915_vma.c               |  25 ++--
 10 files changed, 346 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..642a5d59ce26 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -783,10 +783,31 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
 				    intr, MAX_SCHEDULE_TIMEOUT);
 	if (!ret)
 		ret = -ETIME;
+	else if (ret > 0 && i915_gem_object_has_unknown_state(obj))
+		ret = -EIO;
 
 	return ret < 0 ? ret : 0;
 }
 
+/**
+ * i915_gem_object_has_unknown_state - Return true if the object backing pages are
+ * in an unknown_state. This means that userspace must NEVER be allowed to touch
+ * the pages, with either the GPU or CPU.
+ *
+ * ONLY valid to be called after ensuring that all kernel fences have signalled
+ * (in particular the fence for moving/clearing the object).
+ */
+bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj)
+{
+	/*
+	 * The below barrier pairs with the dma_fence_signal() in
+	 * __memcpy_work(). We should only sample the unknown_state after all
+	 * the kernel fences have signalled.
+	 */
+	smp_rmb();
+	return obj->mm.unknown_state;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/huge_gem_object.c"
 #include "selftests/huge_pages.c"
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index e11d82a9f7c3..0bf3ee27a2a8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -524,6 +524,7 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
 				     struct dma_fence **fence);
 int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
 				      bool intr);
+bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
 
 void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 					 unsigned int cache_level);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..5cf36a130061 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -547,6 +547,24 @@ struct drm_i915_gem_object {
 		 */
 		bool ttm_shrinkable;
 
+		/**
+		 * @unknown_state: Indicate that the object is effectively
+		 * borked. This is write-once and set if we somehow encounter a
+		 * fatal error when moving/clearing the pages, and we are not
+		 * able to fallback to memcpy/memset, like on small-BAR systems.
+		 * The GPU should also be wedged (or in the process) at this
+		 * point.
+		 *
+		 * Only valid to read this after acquiring the dma-resv lock and
+		 * waiting for all DMA_RESV_USAGE_KERNEL fences to be signalled,
+		 * or if we otherwise know that the moving fence has signalled,
+		 * and we are certain the pages underneath are valid for
+		 * immediate access (under normal operation), like just prior to
+		 * binding the object or when setting up the CPU fault handler.
+		 * See i915_gem_object_has_unknown_state();
+		 */
+		bool unknown_state;
+
 		/**
 		 * Priority list of potential placements for this object.
 		 */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..098409a33e10 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -675,7 +675,15 @@ static void i915_ttm_swap_notify(struct ttm_buffer_object *bo)
 		i915_ttm_purge(obj);
 }
 
-static bool i915_ttm_resource_mappable(struct ttm_resource *res)
+/**
+ * i915_ttm_resource_mappable - Return true if the ttm resource is CPU
+ * accessible.
+ * @res: The TTM resource to check.
+ *
+ * This is interesting on small-BAR systems where we may encounter lmem objects
+ * that can't be accessed via the CPU.
+ */
+bool i915_ttm_resource_mappable(struct ttm_resource *res)
 {
 	struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
 
@@ -687,6 +695,22 @@ static bool i915_ttm_resource_mappable(struct ttm_resource *res)
 
 static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)
 {
+	struct drm_i915_gem_object *obj = i915_ttm_to_gem(mem->bo);
+	bool unknown_state;
+
+	if (!obj)
+		return -EINVAL;
+
+	if (!kref_get_unless_zero(&obj->base.refcount))
+		return -EINVAL;
+
+	assert_object_held(obj);
+
+	unknown_state = i915_gem_object_has_unknown_state(obj);
+	i915_gem_object_put(obj);
+	if (unknown_state)
+		return -EINVAL;
+
 	if (!i915_ttm_cpu_maps_iomem(mem))
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index 73e371aa3850..e4842b4296fc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -92,4 +92,7 @@ static inline bool i915_ttm_cpu_maps_iomem(struct ttm_resource *mem)
 	/* Once / if we support GGTT, this is also false for cached ttm_tts */
 	return mem->mem_type != I915_PL_SYSTEM;
 }
+
+bool i915_ttm_resource_mappable(struct ttm_resource *res);
+
 #endif
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..364e7fe8efb1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -33,6 +33,7 @@
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 static bool fail_gpu_migration;
 static bool fail_work_allocation;
+static bool ban_memcpy;
 
 void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 					bool work_allocation)
@@ -40,6 +41,11 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 	fail_gpu_migration = gpu_migration;
 	fail_work_allocation = work_allocation;
 }
+
+void i915_ttm_migrate_set_ban_memcpy(bool ban)
+{
+	ban_memcpy = ban;
+}
 #endif
 
 static enum i915_cache_level
@@ -258,15 +264,23 @@ struct i915_ttm_memcpy_arg {
  * from the callback for lockdep reasons.
  * @cb: Callback for the accelerated migration fence.
  * @arg: The argument for the memcpy functionality.
+ * @i915: The i915 pointer.
+ * @obj: The GEM object.
+ * @memcpy_allowed: If false, instead of processing the @arg and falling back
+ * to memcpy or memset, we wedge the device and set the @obj unknown_state, to
+ * prevent further access to the object with the CPU or GPU. On some devices we
+ * might only be permitted to use the blitter engine for such operations.
  */
 struct i915_ttm_memcpy_work {
 	struct dma_fence fence;
 	struct work_struct work;
-	/* The fence lock */
 	spinlock_t lock;
 	struct irq_work irq_work;
 	struct dma_fence_cb cb;
 	struct i915_ttm_memcpy_arg arg;
+	struct drm_i915_private *i915;
+	struct drm_i915_gem_object *obj;
+	bool memcpy_allowed;
 };
 
 static void i915_ttm_move_memcpy(struct i915_ttm_memcpy_arg *arg)
@@ -319,12 +333,34 @@ static void __memcpy_work(struct work_struct *work)
 	struct i915_ttm_memcpy_arg *arg = &copy_work->arg;
 	bool cookie = dma_fence_begin_signalling();
 
-	i915_ttm_move_memcpy(arg);
+	if (copy_work->memcpy_allowed) {
+		i915_ttm_move_memcpy(arg);
+	} else {
+		/*
+		 * Prevent further use of the object. Any future GTT binding or
+		 * CPU access is not allowed once we signal the fence. Outside
+		 * of the fence critical section, we then also wedge the gpu
+		 * to indicate the device is not functional.
+		 *
+		 * The below dma_fence_signal() is our write-memory-barrier.
+		 */
+		copy_work->obj->mm.unknown_state = true;
+	}
+
 	dma_fence_end_signalling(cookie);
 
 	dma_fence_signal(&copy_work->fence);
 
+	if (!copy_work->memcpy_allowed) {
+		struct intel_gt *gt;
+		unsigned int id;
+
+		for_each_gt(gt, copy_work->i915, id)
+			intel_gt_set_wedged(gt);
+	}
+
 	i915_ttm_memcpy_release(arg);
+	i915_gem_object_put(copy_work->obj);
 	dma_fence_put(&copy_work->fence);
 }
 
@@ -336,6 +372,7 @@ static void __memcpy_irq_work(struct irq_work *irq_work)
 
 	dma_fence_signal(&copy_work->fence);
 	i915_ttm_memcpy_release(arg);
+	i915_gem_object_put(copy_work->obj);
 	dma_fence_put(&copy_work->fence);
 }
 
@@ -389,6 +426,16 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
 	return &work->fence;
 }
 
+static bool i915_ttm_memcpy_allowed(struct ttm_buffer_object *bo,
+				    struct ttm_resource *dst_mem)
+{
+	if (!(i915_ttm_resource_mappable(bo->resource) &&
+	      i915_ttm_resource_mappable(dst_mem)))
+		return false;
+
+	return I915_SELFTEST_ONLY(ban_memcpy) ? false : true;
+}
+
 static struct dma_fence *
 __i915_ttm_move(struct ttm_buffer_object *bo,
 		const struct ttm_operation_ctx *ctx, bool clear,
@@ -396,6 +443,9 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
 		struct i915_refct_sgt *dst_rsgt, bool allow_accel,
 		const struct i915_deps *move_deps)
 {
+	const bool memcpy_allowed = i915_ttm_memcpy_allowed(bo, dst_mem);
+	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
+	struct drm_i915_private *i915 = to_i915(bo->base.dev);
 	struct i915_ttm_memcpy_work *copy_work = NULL;
 	struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
 	struct dma_fence *fence = ERR_PTR(-EINVAL);
@@ -423,9 +473,14 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
 			copy_work = kzalloc(sizeof(*copy_work), GFP_KERNEL);
 
 		if (copy_work) {
+			copy_work->i915 = i915;
+			copy_work->memcpy_allowed = memcpy_allowed;
+			copy_work->obj = i915_gem_object_get(obj);
 			arg = &copy_work->arg;
-			i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
-					     dst_rsgt);
+			if (memcpy_allowed)
+				i915_ttm_memcpy_init(arg, bo, clear, dst_mem,
+						     dst_ttm, dst_rsgt);
+
 			fence = i915_ttm_memcpy_work_arm(copy_work, dep);
 		} else {
 			dma_fence_wait(dep, false);
@@ -450,17 +505,23 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
 	}
 
 	/* Error intercept failed or no accelerated migration to start with */
-	if (!copy_work)
-		i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
-				     dst_rsgt);
-	i915_ttm_move_memcpy(arg);
-	i915_ttm_memcpy_release(arg);
+
+	if (memcpy_allowed) {
+		if (!copy_work)
+			i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
+					     dst_rsgt);
+		i915_ttm_move_memcpy(arg);
+		i915_ttm_memcpy_release(arg);
+	}
+	if (copy_work)
+		i915_gem_object_put(copy_work->obj);
 	kfree(copy_work);
 
-	return NULL;
+	return memcpy_allowed ? NULL : ERR_PTR(-EIO);
 out:
 	if (!fence && copy_work) {
 		i915_ttm_memcpy_release(arg);
+		i915_gem_object_put(copy_work->obj);
 		kfree(copy_work);
 	}
 
@@ -539,8 +600,11 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	}
 
 	if (migration_fence) {
-		ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
-						true, dst_mem);
+		if (I915_SELFTEST_ONLY(evict && fail_gpu_migration))
+			ret = -EIO; /* never feed non-migrate fences into ttm */
+		else
+			ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
+							true, dst_mem);
 		if (ret) {
 			dma_fence_wait(migration_fence, false);
 			ttm_bo_move_sync_cleanup(bo, dst_mem);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
index d2e7f149e05c..8a5d5ab0cc34 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
@@ -22,6 +22,7 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo);
 
 I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 							      bool work_allocation));
+I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_ban_memcpy(bool ban));
 
 int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 			  struct drm_i915_gem_object *src,
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index 801af51aff62..fe6c37fd7859 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -9,6 +9,7 @@
 
 #include "i915_deps.h"
 
+#include "selftests/igt_reset.h"
 #include "selftests/igt_spinner.h"
 
 static int igt_fill_check_buffer(struct drm_i915_gem_object *obj,
@@ -109,7 +110,8 @@ static int igt_same_create_migrate(void *arg)
 
 static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
 				  struct drm_i915_gem_object *obj,
-				  struct i915_vma *vma)
+				  struct i915_vma *vma,
+				  bool silent_migrate)
 {
 	int err;
 
@@ -138,7 +140,8 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
 	if (i915_gem_object_is_lmem(obj)) {
 		err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM);
 		if (err) {
-			pr_err("Object failed migration to smem\n");
+			if (!silent_migrate)
+				pr_err("Object failed migration to smem\n");
 			if (err)
 				return err;
 		}
@@ -156,7 +159,8 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
 	} else {
 		err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM_0);
 		if (err) {
-			pr_err("Object failed migration to lmem\n");
+			if (!silent_migrate)
+				pr_err("Object failed migration to lmem\n");
 			if (err)
 				return err;
 		}
@@ -179,7 +183,8 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
 				    struct i915_address_space *vm,
 				    struct i915_deps *deps,
 				    struct igt_spinner *spin,
-				    struct dma_fence *spin_fence)
+				    struct dma_fence *spin_fence,
+				    bool borked_migrate)
 {
 	struct drm_i915_private *i915 = gt->i915;
 	struct drm_i915_gem_object *obj;
@@ -242,7 +247,8 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
 	 */
 	for (i = 1; i <= 5; ++i) {
 		for_i915_gem_ww(&ww, err, true)
-			err = lmem_pages_migrate_one(&ww, obj, vma);
+			err = lmem_pages_migrate_one(&ww, obj, vma,
+						     borked_migrate);
 		if (err)
 			goto out_put;
 	}
@@ -283,23 +289,70 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
 
 static int igt_lmem_pages_failsafe_migrate(void *arg)
 {
-	int fail_gpu, fail_alloc, ret;
+	int fail_gpu, fail_alloc, ban_memcpy, ret;
 	struct intel_gt *gt = arg;
 
 	for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
 		for (fail_alloc = 0; fail_alloc < 2; ++fail_alloc) {
-			pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
-				fail_gpu, fail_alloc);
-			i915_ttm_migrate_set_failure_modes(fail_gpu,
-							   fail_alloc);
-			ret = __igt_lmem_pages_migrate(gt, NULL, NULL, NULL, NULL);
-			if (ret)
-				goto out_err;
+			for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
+				pr_info("Simulated failure modes: gpu: %d, alloc: %d, ban_memcpy: %d\n",
+					fail_gpu, fail_alloc, ban_memcpy);
+				i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
+				i915_ttm_migrate_set_failure_modes(fail_gpu,
+								   fail_alloc);
+				ret = __igt_lmem_pages_migrate(gt, NULL, NULL,
+							       NULL, NULL,
+							       ban_memcpy &&
+							       fail_gpu);
+
+				if (ban_memcpy && fail_gpu) {
+					struct intel_gt *__gt;
+					unsigned int id;
+
+					if (ret != -EIO) {
+						pr_err("expected -EIO, got (%d)\n", ret);
+						ret = -EINVAL;
+					} else {
+						ret = 0;
+					}
+
+					for_each_gt(__gt, gt->i915, id) {
+						intel_wakeref_t wakeref;
+						bool wedged;
+
+						mutex_lock(&__gt->reset.mutex);
+						wedged = test_bit(I915_WEDGED, &__gt->reset.flags);
+						mutex_unlock(&__gt->reset.mutex);
+
+						if (fail_gpu && !fail_alloc) {
+							if (!wedged) {
+								pr_err("gt(%u) not wedged\n", id);
+								ret = -EINVAL;
+								continue;
+							}
+						} else if (wedged) {
+							pr_err("gt(%u) incorrectly wedged\n", id);
+							ret = -EINVAL;
+						} else {
+							continue;
+						}
+
+						wakeref = intel_runtime_pm_get(__gt->uncore->rpm);
+						igt_global_reset_lock(__gt);
+						intel_gt_reset(__gt, ALL_ENGINES, NULL);
+						igt_global_reset_unlock(__gt);
+						intel_runtime_pm_put(__gt->uncore->rpm, wakeref);
+					}
+					if (ret)
+						goto out_err;
+				}
+			}
 		}
 	}
 
 out_err:
 	i915_ttm_migrate_set_failure_modes(false, false);
+	i915_ttm_migrate_set_ban_memcpy(false);
 	return ret;
 }
 
@@ -370,7 +423,7 @@ static int igt_async_migrate(struct intel_gt *gt)
 			goto out_ce;
 
 		err = __igt_lmem_pages_migrate(gt, &ppgtt->vm, &deps, &spin,
-					       spin_fence);
+					       spin_fence, false);
 		i915_deps_fini(&deps);
 		dma_fence_put(spin_fence);
 		if (err)
@@ -394,23 +447,67 @@ static int igt_async_migrate(struct intel_gt *gt)
 #define ASYNC_FAIL_ALLOC 1
 static int igt_lmem_async_migrate(void *arg)
 {
-	int fail_gpu, fail_alloc, ret;
+	int fail_gpu, fail_alloc, ban_memcpy, ret;
 	struct intel_gt *gt = arg;
 
 	for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
 		for (fail_alloc = 0; fail_alloc < ASYNC_FAIL_ALLOC; ++fail_alloc) {
-			pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
-				fail_gpu, fail_alloc);
-			i915_ttm_migrate_set_failure_modes(fail_gpu,
-							   fail_alloc);
-			ret = igt_async_migrate(gt);
-			if (ret)
-				goto out_err;
+			for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
+				pr_info("Simulated failure modes: gpu: %d, alloc: %d, ban_memcpy: %d\n",
+					fail_gpu, fail_alloc, ban_memcpy);
+				i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
+				i915_ttm_migrate_set_failure_modes(fail_gpu,
+								   fail_alloc);
+				ret = igt_async_migrate(gt);
+
+				if (fail_gpu && ban_memcpy) {
+					struct intel_gt *__gt;
+					unsigned int id;
+
+					if (ret != -EIO) {
+						pr_err("expected -EIO, got (%d)\n", ret);
+						ret = -EINVAL;
+					} else {
+						ret = 0;
+					}
+
+					for_each_gt(__gt, gt->i915, id) {
+						intel_wakeref_t wakeref;
+						bool wedged;
+
+						mutex_lock(&__gt->reset.mutex);
+						wedged = test_bit(I915_WEDGED, &__gt->reset.flags);
+						mutex_unlock(&__gt->reset.mutex);
+
+						if (fail_gpu && !fail_alloc) {
+							if (!wedged) {
+								pr_err("gt(%u) not wedged\n", id);
+								ret = -EINVAL;
+								continue;
+							}
+						} else if (wedged) {
+							pr_err("gt(%u) incorrectly wedged\n", id);
+							ret = -EINVAL;
+						} else {
+							continue;
+						}
+
+						wakeref = intel_runtime_pm_get(__gt->uncore->rpm);
+						igt_global_reset_lock(__gt);
+						intel_gt_reset(__gt, ALL_ENGINES, NULL);
+						igt_global_reset_unlock(__gt);
+						intel_runtime_pm_put(__gt->uncore->rpm, wakeref);
+					}
+				}
+				if (ret)
+					goto out_err;
+			}
 		}
 	}
 
 out_err:
 	i915_ttm_migrate_set_failure_modes(false, false);
+	i915_ttm_migrate_set_ban_memcpy(false);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index da28acb78a88..3ced9948a331 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -10,6 +10,7 @@
 #include "gem/i915_gem_internal.h"
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
+#include "gem/i915_gem_ttm_move.h"
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_gt.h"
@@ -21,6 +22,7 @@
 #include "i915_selftest.h"
 #include "selftests/i915_random.h"
 #include "selftests/igt_flush_test.h"
+#include "selftests/igt_reset.h"
 #include "selftests/igt_mmap.h"
 
 struct tile {
@@ -1163,6 +1165,7 @@ static int ___igt_mmap_migrate(struct drm_i915_private *i915,
 #define IGT_MMAP_MIGRATE_FILL        (1 << 1)
 #define IGT_MMAP_MIGRATE_EVICTABLE   (1 << 2)
 #define IGT_MMAP_MIGRATE_UNFAULTABLE (1 << 3)
+#define IGT_MMAP_MIGRATE_FAIL_GPU    (1 << 4)
 static int __igt_mmap_migrate(struct intel_memory_region **placements,
 			      int n_placements,
 			      struct intel_memory_region *expected_mr,
@@ -1237,13 +1240,62 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
 	if (flags & IGT_MMAP_MIGRATE_EVICTABLE)
 		igt_make_evictable(&objects);
 
+	if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
+		err = i915_gem_object_lock(obj, NULL);
+		if (err)
+			goto out_put;
+
+		/*
+		 * Ensure we only simulate the gpu failure when faulting the
+		 * pages.
+		 */
+		err = i915_gem_object_wait_moving_fence(obj, true);
+		i915_gem_object_unlock(obj);
+		if (err)
+			goto out_put;
+		i915_ttm_migrate_set_failure_modes(true, false);
+	}
+
 	err = ___igt_mmap_migrate(i915, obj, addr,
 				  flags & IGT_MMAP_MIGRATE_UNFAULTABLE);
+
 	if (!err && obj->mm.region != expected_mr) {
 		pr_err("%s region mismatch %s\n", __func__, expected_mr->name);
 		err = -EINVAL;
 	}
 
+	if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
+		struct intel_gt *gt;
+		unsigned int id;
+
+		i915_ttm_migrate_set_failure_modes(false, false);
+
+		for_each_gt(gt, i915, id) {
+			intel_wakeref_t wakeref;
+			bool wedged;
+
+			mutex_lock(&gt->reset.mutex);
+			wedged = test_bit(I915_WEDGED, &gt->reset.flags);
+			mutex_unlock(&gt->reset.mutex);
+			if (!wedged) {
+				pr_err("gt(%u) not wedged\n", id);
+				err = -EINVAL;
+				continue;
+			}
+
+			wakeref = intel_runtime_pm_get(gt->uncore->rpm);
+			igt_global_reset_lock(gt);
+			intel_gt_reset(gt, ALL_ENGINES, NULL);
+			igt_global_reset_unlock(gt);
+			intel_runtime_pm_put(gt->uncore->rpm, wakeref);
+		}
+
+		if (!i915_gem_object_has_unknown_state(obj)) {
+			pr_err("object missing unknown_state\n");
+			err = -EINVAL;
+		}
+	}
+
 out_put:
 	i915_gem_object_put(obj);
 	igt_close_objects(i915, &objects);
@@ -1324,6 +1376,23 @@ static int igt_mmap_migrate(void *arg)
 					 IGT_MMAP_MIGRATE_TOPDOWN |
 					 IGT_MMAP_MIGRATE_FILL |
 					 IGT_MMAP_MIGRATE_UNFAULTABLE);
+		if (err)
+			goto out_io_size;
+
+		/*
+		 * Allocate in the non-mappable portion, but force migrating to
+		 * the mappable portion on fault (LMEM -> LMEM). We then also
+		 * simulate a gpu error when moving the pages when faulting the
+		 * pages, which should result in wedging the gpu and returning
+		 * SIGBUS in the fault handler, since we can't fallback to
+		 * memcpy.
+		 */
+		err = __igt_mmap_migrate(single, ARRAY_SIZE(single), mr,
+					 IGT_MMAP_MIGRATE_TOPDOWN |
+					 IGT_MMAP_MIGRATE_FILL |
+					 IGT_MMAP_MIGRATE_EVICTABLE |
+					 IGT_MMAP_MIGRATE_FAIL_GPU |
+					 IGT_MMAP_MIGRATE_UNFAULTABLE);
 out_io_size:
 		mr->io_size = saved_io_size;
 		i915_ttm_buddy_man_force_visible_size(man,
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 5d5828b9a242..43339ecabd73 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -310,7 +310,7 @@ struct i915_vma_work {
 	struct i915_address_space *vm;
 	struct i915_vm_pt_stash stash;
 	struct i915_vma_resource *vma_res;
-	struct drm_i915_gem_object *pinned;
+	struct drm_i915_gem_object *obj;
 	struct i915_sw_dma_fence_cb cb;
 	enum i915_cache_level cache_level;
 	unsigned int flags;
@@ -321,17 +321,25 @@ static void __vma_bind(struct dma_fence_work *work)
 	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
 	struct i915_vma_resource *vma_res = vw->vma_res;
 
+	/*
+	 * We are about to bind the object, which must mean we have already
+	 * signaled the work to potentially clear/move the pages underneath. If
+	 * something went wrong at that stage then the object should have
+	 * unknown_state set, in which case we need to skip the bind.
+	 */
+	if (i915_gem_object_has_unknown_state(vw->obj))
+		return;
+
 	vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
 			       vma_res, vw->cache_level, vw->flags);
-
 }
 
 static void __vma_release(struct dma_fence_work *work)
 {
 	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
 
-	if (vw->pinned)
-		i915_gem_object_put(vw->pinned);
+	if (vw->obj)
+		i915_gem_object_put(vw->obj);
 
 	i915_vm_free_pt_stash(vw->vm, &vw->stash);
 	if (vw->vma_res)
@@ -517,14 +525,7 @@ int i915_vma_bind(struct i915_vma *vma,
 		}
 
 		work->base.dma.error = 0; /* enable the queue_work() */
-
-		/*
-		 * If we don't have the refcounted pages list, keep a reference
-		 * on the object to avoid waiting for the async bind to
-		 * complete in the object destruction path.
-		 */
-		if (!work->vma_res->bi.pages_rsgt)
-			work->pinned = i915_gem_object_get(vma->obj);
+		work->obj = i915_gem_object_get(vma->obj);
 	} else {
 		ret = i915_gem_object_wait_moving_fence(vma->obj, true);
 		if (ret) {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 12/13] drm/i915/ttm: disallow CPU fallback mode for ccs pages
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin

Falling back to memcpy/memset shouldn't be allowed if we know we have
CCS state to manage using the blitter. Otherwise we are potentially
leaving the aux CCS state in an unknown state, which smells like an info
leak.
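
For context, a paraphrase of what i915_ttm_memcpy_allowed() ends up looking
like with this patch applied on top of the previous one (assembled from the
i915_gem_ttm_move.c hunks in patches 11 and 12, shown only to illustrate the
combined gate; it is not an additional change):

	static bool i915_ttm_memcpy_allowed(struct ttm_buffer_object *bo,
					    struct ttm_resource *dst_mem)
	{
		/* flat-CCS aux state can only be managed with the blitter */
		if (i915_gem_object_needs_ccs_pages(i915_ttm_to_gem(bo)))
			return false;

		/* both ends must be CPU mappable, which small BAR can break */
		if (!(i915_ttm_resource_mappable(bo->resource) &&
		      i915_ttm_resource_mappable(dst_mem)))
			return false;

		/* selftests can force the no-fallback path */
		return I915_SELFTEST_ONLY(ban_memcpy) ? false : true;
	}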

Fixes: 48760ffe923a ("drm/i915/gt: Clear compress metadata for Flat-ccs objects")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 26 ++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_object.h   |  2 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 18 --------------
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  3 +++
 4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 642a5d59ce26..ccec4055fde3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -717,6 +717,32 @@ bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj,
 	return false;
 }
 
+/**
+ * i915_gem_object_needs_ccs_pages - Check whether the object requires extra
+ * pages when placed in system-memory, in order to save and later restore the
+ * flat-CCS aux state when the object is moved between local-memory and
+ * system-memory
+ * @obj: Pointer to the object
+ *
+ * Return: True if the object needs extra ccs pages. False otherwise.
+ */
+bool i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj)
+{
+	bool lmem_placement = false;
+	int i;
+
+	for (i = 0; i < obj->mm.n_placements; i++) {
+		/* Compression is not allowed for the objects with smem placement */
+		if (obj->mm.placements[i]->type == INTEL_MEMORY_SYSTEM)
+			return false;
+		if (!lmem_placement &&
+		    obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL)
+			lmem_placement = true;
+	}
+
+	return lmem_placement;
+}
+
 void i915_gem_init__objects(struct drm_i915_private *i915)
 {
 	INIT_DELAYED_WORK(&i915->mm.free_work, __i915_gem_free_work);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 0bf3ee27a2a8..6f0a3ce35567 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -618,6 +618,8 @@ int i915_gem_object_wait_migration(struct drm_i915_gem_object *obj,
 bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj,
 					enum intel_memory_type type);
 
+bool i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj);
+
 int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
 			 size_t size, struct intel_memory_region *mr,
 			 struct address_space *mapping,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 098409a33e10..7e1f8b83077f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -266,24 +266,6 @@ static const struct i915_refct_sgt_ops tt_rsgt_ops = {
 	.release = i915_ttm_tt_release
 };
 
-static inline bool
-i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj)
-{
-	bool lmem_placement = false;
-	int i;
-
-	for (i = 0; i < obj->mm.n_placements; i++) {
-		/* Compression is not allowed for the objects with smem placement */
-		if (obj->mm.placements[i]->type == INTEL_MEMORY_SYSTEM)
-			return false;
-		if (!lmem_placement &&
-		    obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL)
-			lmem_placement = true;
-	}
-
-	return lmem_placement;
-}
-
 static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 					 uint32_t page_flags)
 {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 364e7fe8efb1..d22e38aad6b9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -429,6 +429,9 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
 static bool i915_ttm_memcpy_allowed(struct ttm_buffer_object *bo,
 				    struct ttm_resource *dst_mem)
 {
+	if (i915_gem_object_needs_ccs_pages(i915_ttm_to_gem(bo)))
+		return false;
+
 	if (!(i915_ttm_resource_mappable(bo->resource) &&
 	      i915_ttm_resource_mappable(dst_mem)))
 		return false;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v3 12/13] drm/i915/ttm: disallow CPU fallback mode for ccs pages
@ 2022-06-29 12:14   ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Daniel Vetter, Kenneth Graunke, dri-devel

Falling back to memcpy/memset shouldn't be allowed if we know we have
CCS state to manage using the blitter. Otherwise we are potentially
leaving the aux CCS state in an unknown state, which smells like an info
leak.

Fixes: 48760ffe923a ("drm/i915/gt: Clear compress metadata for Flat-ccs objects")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 26 ++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_object.h   |  2 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 18 --------------
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  3 +++
 4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 642a5d59ce26..ccec4055fde3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -717,6 +717,32 @@ bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj,
 	return false;
 }
 
+/**
+ * i915_gem_object_needs_ccs_pages - Check whether the object requires extra
+ * pages when placed in system-memory, in order to save and later restore the
+ * flat-CCS aux state when the object is moved between local-memory and
+ * system-memory
+ * @obj: Pointer to the object
+ *
+ * Return: True if the object needs extra ccs pages. False otherwise.
+ */
+bool i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj)
+{
+	bool lmem_placement = false;
+	int i;
+
+	for (i = 0; i < obj->mm.n_placements; i++) {
+		/* Compression is not allowed for the objects with smem placement */
+		if (obj->mm.placements[i]->type == INTEL_MEMORY_SYSTEM)
+			return false;
+		if (!lmem_placement &&
+		    obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL)
+			lmem_placement = true;
+	}
+
+	return lmem_placement;
+}
+
 void i915_gem_init__objects(struct drm_i915_private *i915)
 {
 	INIT_DELAYED_WORK(&i915->mm.free_work, __i915_gem_free_work);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 0bf3ee27a2a8..6f0a3ce35567 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -618,6 +618,8 @@ int i915_gem_object_wait_migration(struct drm_i915_gem_object *obj,
 bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj,
 					enum intel_memory_type type);
 
+bool i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj);
+
 int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
 			 size_t size, struct intel_memory_region *mr,
 			 struct address_space *mapping,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 098409a33e10..7e1f8b83077f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -266,24 +266,6 @@ static const struct i915_refct_sgt_ops tt_rsgt_ops = {
 	.release = i915_ttm_tt_release
 };
 
-static inline bool
-i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj)
-{
-	bool lmem_placement = false;
-	int i;
-
-	for (i = 0; i < obj->mm.n_placements; i++) {
-		/* Compression is not allowed for the objects with smem placement */
-		if (obj->mm.placements[i]->type == INTEL_MEMORY_SYSTEM)
-			return false;
-		if (!lmem_placement &&
-		    obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL)
-			lmem_placement = true;
-	}
-
-	return lmem_placement;
-}
-
 static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 					 uint32_t page_flags)
 {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 364e7fe8efb1..d22e38aad6b9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -429,6 +429,9 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
 static bool i915_ttm_memcpy_allowed(struct ttm_buffer_object *bo,
 				    struct ttm_resource *dst_mem)
 {
+	if (i915_gem_object_needs_ccs_pages(i915_ttm_to_gem(bo)))
+		return false;
+
 	if (!(i915_ttm_resource_mappable(bo->resource) &&
 	      i915_ttm_resource_mappable(dst_mem)))
 		return false;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 13/13] drm/i915: turn on small BAR support
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:14   ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin

With the uAPI in place we should now have enough to ensure a working
system on small BAR configurations.

v2: (Nirmoy & Thomas):
  - s/full BAR/Resizable BAR/ which is hopefully more easily
    understood by users.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_region_lmem.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index d09b996a9759..fa7b86f83e7b 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -112,12 +112,6 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 		flat_ccs_base = intel_gt_mcr_read_any(gt, XEHP_FLAT_CCS_BASE_ADDR);
 		flat_ccs_base = (flat_ccs_base >> XEHP_CCS_BASE_SHIFT) * SZ_64K;
 
-		/* FIXME: Remove this when we have small-bar enabled */
-		if (pci_resource_len(pdev, 2) < lmem_size) {
-			drm_err(&i915->drm, "System requires small-BAR support, which is currently unsupported on this kernel\n");
-			return ERR_PTR(-EINVAL);
-		}
-
 		if (GEM_WARN_ON(lmem_size < flat_ccs_base))
 			return ERR_PTR(-EIO);
 
@@ -170,6 +164,10 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 	drm_info(&i915->drm, "Local memory available: %pa\n",
 		 &lmem_size);
 
+	if (io_size < lmem_size)
+		drm_info(&i915->drm, "Using a reduced BAR size of %lluMiB. Consider enabling 'Resizable BAR' or similar, if available in the BIOS.\n",
+			 (u64)io_size >> 20);
+
 	return mem;
 
 err_region_put:
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v3 13/13] drm/i915: turn on small BAR support
@ 2022-06-29 12:14   ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 12:14 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Daniel Vetter, Kenneth Graunke, dri-devel

With the uAPI in place we should now have enough to ensure a working
system on small BAR configurations.

v2: (Nirmoy & Thomas):
  - s/full BAR/Resizable BAR/ which is hopefully more easily
    understood by users.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_region_lmem.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index d09b996a9759..fa7b86f83e7b 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -112,12 +112,6 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 		flat_ccs_base = intel_gt_mcr_read_any(gt, XEHP_FLAT_CCS_BASE_ADDR);
 		flat_ccs_base = (flat_ccs_base >> XEHP_CCS_BASE_SHIFT) * SZ_64K;
 
-		/* FIXME: Remove this when we have small-bar enabled */
-		if (pci_resource_len(pdev, 2) < lmem_size) {
-			drm_err(&i915->drm, "System requires small-BAR support, which is currently unsupported on this kernel\n");
-			return ERR_PTR(-EINVAL);
-		}
-
 		if (GEM_WARN_ON(lmem_size < flat_ccs_base))
 			return ERR_PTR(-EIO);
 
@@ -170,6 +164,10 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 	drm_info(&i915->drm, "Local memory available: %pa\n",
 		 &lmem_size);
 
+	if (io_size < lmem_size)
+		drm_info(&i915->drm, "Using a reduced BAR size of %lluMiB. Consider enabling 'Resizable BAR' or similar, if available in the BIOS.\n",
+			 (u64)io_size >> 20);
+
 	return mem;
 
 err_region_put:
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 12/13] drm/i915/ttm: disallow CPU fallback mode for ccs pages
  2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 12:43     ` Ramalingam C
  -1 siblings, 0 replies; 48+ messages in thread
From: Ramalingam C @ 2022-06-29 12:43 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Thomas Hellström, Tvrtko Ursulin, Daniel Vetter, intel-gfx,
	Lionel Landwerlin, Kenneth Graunke, Jon Bloomfield, dri-devel,
	Jordan Justen, Akeem G Abodunrin

On 2022-06-29 at 13:14:26 +0100, Matthew Auld wrote:
> Falling back to memcpy/memset shouldn't be allowed if we know we have
> CCS state to manage using the blitter. Otherwise we are potentially
> leaving the aux CCS state in an unknown state, which smells like an info
> leak.
> 
> Fixes: 48760ffe923a ("drm/i915/gt: Clear compress metadata for Flat-ccs objects")

Looks good to me.

Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
> Cc: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c   | 26 ++++++++++++++++++++
>  drivers/gpu/drm/i915/gem/i915_gem_object.h   |  2 ++
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 18 --------------
>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  3 +++
>  4 files changed, 31 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 642a5d59ce26..ccec4055fde3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -717,6 +717,32 @@ bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj,
>  	return false;
>  }
>  
> +/**
> + * i915_gem_object_needs_ccs_pages - Check whether the object requires extra
> + * pages when placed in system-memory, in order to save and later restore the
> + * flat-CCS aux state when the object is moved between local-memory and
> + * system-memory
> + * @obj: Pointer to the object
> + *
> + * Return: True if the object needs extra ccs pages. False otherwise.
> + */
> +bool i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj)
> +{
> +	bool lmem_placement = false;
> +	int i;
> +
> +	for (i = 0; i < obj->mm.n_placements; i++) {
> +		/* Compression is not allowed for the objects with smem placement */
> +		if (obj->mm.placements[i]->type == INTEL_MEMORY_SYSTEM)
> +			return false;
> +		if (!lmem_placement &&
> +		    obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL)
> +			lmem_placement = true;
> +	}
> +
> +	return lmem_placement;
> +}
> +
>  void i915_gem_init__objects(struct drm_i915_private *i915)
>  {
>  	INIT_DELAYED_WORK(&i915->mm.free_work, __i915_gem_free_work);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 0bf3ee27a2a8..6f0a3ce35567 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -618,6 +618,8 @@ int i915_gem_object_wait_migration(struct drm_i915_gem_object *obj,
>  bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj,
>  					enum intel_memory_type type);
>  
> +bool i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj);
> +
>  int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
>  			 size_t size, struct intel_memory_region *mr,
>  			 struct address_space *mapping,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 098409a33e10..7e1f8b83077f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -266,24 +266,6 @@ static const struct i915_refct_sgt_ops tt_rsgt_ops = {
>  	.release = i915_ttm_tt_release
>  };
>  
> -static inline bool
> -i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj)
> -{
> -	bool lmem_placement = false;
> -	int i;
> -
> -	for (i = 0; i < obj->mm.n_placements; i++) {
> -		/* Compression is not allowed for the objects with smem placement */
> -		if (obj->mm.placements[i]->type == INTEL_MEMORY_SYSTEM)
> -			return false;
> -		if (!lmem_placement &&
> -		    obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL)
> -			lmem_placement = true;
> -	}
> -
> -	return lmem_placement;
> -}
> -
>  static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>  					 uint32_t page_flags)
>  {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index 364e7fe8efb1..d22e38aad6b9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -429,6 +429,9 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
>  static bool i915_ttm_memcpy_allowed(struct ttm_buffer_object *bo,
>  				    struct ttm_resource *dst_mem)
>  {
> +	if (i915_gem_object_needs_ccs_pages(i915_ttm_to_gem(bo)))
> +		return false;
> +
>  	if (!(i915_ttm_resource_mappable(bo->resource) &&
>  	      i915_ttm_resource_mappable(dst_mem)))
>  		return false;
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread
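
To spell out what the rule above means for clients on DG2+: flat-CCS
compression is only possible when every placement given at object
creation is device-local, and for such objects the kernel no longer
falls back to CPU memcpy/memset. The snippet below is a rough
illustration against the existing gem_create_ext memory-regions
extension; the NEEDS_CPU_ACCESS flag mentioned in the comment is the
one proposed elsewhere in this series and is referenced as an
assumption, not as final uapi.

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/*
 * Create a buffer whose only placement is device-local memory, which
 * keeps it eligible for flat-CCS compression. If the object must also
 * be CPU-mapped on a small-BAR system, the series instead expects
 * I915_MEMORY_CLASS_SYSTEM as an additional placement (together with
 * the proposed NEEDS_CPU_ACCESS flag), which in turn rules out
 * compression per the helper quoted above.
 */
static int create_lmem_only(int fd, __u64 size, __u32 *handle)
{
	struct drm_i915_gem_memory_class_instance lmem = {
		.memory_class = I915_MEMORY_CLASS_DEVICE,
		.memory_instance = 0,
	};
	struct drm_i915_gem_create_ext_memory_regions regions = {
		.base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
		.num_regions = 1,
		.regions = (uintptr_t)&lmem,
	};
	struct drm_i915_gem_create_ext create = {
		.size = size,
		.extensions = (uintptr_t)&regions,
	};
	int ret = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);

	if (!ret)
		*handle = create.handle;

	return ret;
}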

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for small BAR uapi bits (rev4)
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
                   ` (13 preceding siblings ...)
  (?)
@ 2022-06-29 13:00 ` Patchwork
  -1 siblings, 0 replies; 48+ messages in thread
From: Patchwork @ 2022-06-29 13:00 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-gfx

== Series Details ==

Series: small BAR uapi bits (rev4)
URL   : https://patchwork.freedesktop.org/series/104369/
State : warning

== Summary ==

Error: dim checkpatch failed
0815fb1423e5 drm/doc: add rfc section for small BAR uapi
-:46: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#46: 
new file mode 100644

-:51: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#51: FILE: Documentation/gpu/rfc/i915_small_bar.h:1:
+/**

-:246: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#246: FILE: Documentation/gpu/rfc/i915_small_bar.rst:1:
+==========================

total: 0 errors, 3 warnings, 0 checks, 243 lines checked
43279bb27abc drm/i915/uapi: add probed_cpu_visible_size
8b35ecafd44c drm/i915/uapi: expose the avail tracking
e6f3ce79d7c3 drm/i915: remove intel_memory_region avail
d867a1820216 drm/i915/uapi: apply ALLOC_GPU_ONLY by default
080a9ba4affd drm/i915/uapi: add NEEDS_CPU_ACCESS hint
-:11: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#11: 
(i.e I915_MEMORY_CLASS_SYSTEM) must be given as a potential placement for the

total: 0 errors, 1 warnings, 0 checks, 142 lines checked
4d94042c9353 drm/i915/error: skip non-mappable pages
2e6fdb9dd549 drm/i915/uapi: tweak error capture on recoverable contexts
fb0eba9ae8a0 drm/i915/selftests: skip the mman tests for stolen
09d095ced3b9 drm/i915/selftests: ensure we reserve a fence slot
e5e598420bd3 drm/i915/ttm: handle blitter failure on DG2
-:476: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#476: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:327:
+						if (fail_gpu && !fail_alloc) {

-:477: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#477: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:328:
+							if (!wedged) {

-:482: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#482: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:333:
+						} else if (wedged) {

-:485: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#485: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:336:
+						} else {

-:561: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#561: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:482:
+						if (fail_gpu && !fail_alloc) {

-:562: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#562: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:483:
+							if (!wedged) {

-:567: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#567: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:488:
+						} else if (wedged) {

-:570: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#570: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:491:
+						} else {

total: 0 errors, 8 warnings, 0 checks, 651 lines checked
373e49121aa8 drm/i915/ttm: disallow CPU fallback mode for ccs pages
9dbf14ae4a7d drm/i915: turn on small BAR support



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for small BAR uapi bits (rev4)
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
                   ` (14 preceding siblings ...)
  (?)
@ 2022-06-29 13:00 ` Patchwork
  -1 siblings, 0 replies; 48+ messages in thread
From: Patchwork @ 2022-06-29 13:00 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-gfx

== Series Details ==

Series: small BAR uapi bits (rev4)
URL   : https://patchwork.freedesktop.org/series/104369/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for small BAR uapi bits (rev4)
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
                   ` (15 preceding siblings ...)
  (?)
@ 2022-06-29 13:22 ` Patchwork
  -1 siblings, 0 replies; 48+ messages in thread
From: Patchwork @ 2022-06-29 13:22 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-gfx

== Series Details ==

Series: small BAR uapi bits (rev4)
URL   : https://patchwork.freedesktop.org/series/104369/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11825 -> Patchwork_104369v4
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_104369v4 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_104369v4, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/index.html

Participating hosts (39 -> 42)
------------------------------

  Additional (6): fi-rkl-11600 fi-bsw-n3050 bat-dg2-8 bat-dg2-9 fi-cfl-8700k fi-cfl-guc 
  Missing    (3): bat-adlm-1 fi-bxt-dsi fi-pnv-d510 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_104369v4:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@gt_lrc:
    - bat-dg1-5:          NOTRUN -> [INCOMPLETE][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/bat-dg1-5/igt@i915_selftest@live@gt_lrc.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_module_load@reload:
    - {fi-jsl-1}:         [PASS][2] -> [INCOMPLETE][3]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-jsl-1/igt@i915_module_load@reload.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-jsl-1/igt@i915_module_load@reload.html

  
Known issues
------------

  Here are the changes found in Patchwork_104369v4 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
    - fi-cfl-8700k:       NOTRUN -> [SKIP][4] ([fdo#109271] / [i915#2190])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-cfl-8700k/igt@gem_huc_copy@huc-copy.html
    - fi-rkl-11600:       NOTRUN -> [SKIP][5] ([i915#2190])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@basic:
    - fi-apl-guc:         NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#4613]) +3 similar issues
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-apl-guc/igt@gem_lmem_swapping@basic.html
    - fi-cfl-8700k:       NOTRUN -> [SKIP][7] ([fdo#109271] / [i915#4613]) +3 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-cfl-8700k/igt@gem_lmem_swapping@basic.html
    - fi-rkl-11600:       NOTRUN -> [SKIP][8] ([i915#4613]) +3 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@gem_lmem_swapping@basic.html

  * igt@gem_lmem_swapping@verify-random:
    - fi-cfl-guc:         NOTRUN -> [SKIP][9] ([fdo#109271] / [i915#4613]) +3 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-cfl-guc/igt@gem_lmem_swapping@verify-random.html

  * igt@gem_tiled_pread_basic:
    - fi-rkl-11600:       NOTRUN -> [SKIP][10] ([i915#3282])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
    - fi-rkl-11600:       NOTRUN -> [SKIP][11] ([i915#3012])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@i915_pm_backlight@basic-brightness.html

  * igt@i915_selftest@live@hangcheck:
    - fi-icl-u2:          [PASS][12] -> [INCOMPLETE][13] ([i915#4890])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-icl-u2/igt@i915_selftest@live@hangcheck.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-icl-u2/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@requests:
    - fi-blb-e6850:       [PASS][14] -> [DMESG-FAIL][15] ([i915#4528])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-blb-e6850/igt@i915_selftest@live@requests.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-blb-e6850/igt@i915_selftest@live@requests.html

  * igt@i915_suspend@basic-s3-without-i915:
    - fi-rkl-11600:       NOTRUN -> [INCOMPLETE][16] ([i915#5982])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_busy@basic@flip:
    - bat-adlp-4:         [PASS][17] -> [DMESG-WARN][18] ([i915#1982] / [i915#3576])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-adlp-4/igt@kms_busy@basic@flip.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/bat-adlp-4/igt@kms_busy@basic@flip.html

  * igt@kms_busy@basic@modeset:
    - bat-adlp-4:         [PASS][19] -> [DMESG-WARN][20] ([i915#3576])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-adlp-4/igt@kms_busy@basic@modeset.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/bat-adlp-4/igt@kms_busy@basic@modeset.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-hsw-g3258:       NOTRUN -> [SKIP][21] ([fdo#109271] / [fdo#111827])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-hsw-g3258/igt@kms_chamelium@common-hpd-after-suspend.html
    - fi-hsw-4770:        NOTRUN -> [SKIP][22] ([fdo#109271] / [fdo#111827])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-hsw-4770/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_chamelium@dp-edid-read:
    - fi-cfl-guc:         NOTRUN -> [SKIP][23] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-cfl-guc/igt@kms_chamelium@dp-edid-read.html
    - fi-cfl-8700k:       NOTRUN -> [SKIP][24] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-cfl-8700k/igt@kms_chamelium@dp-edid-read.html

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-apl-guc:         NOTRUN -> [SKIP][25] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-apl-guc/igt@kms_chamelium@hdmi-crc-fast.html

  * igt@kms_chamelium@hdmi-edid-read:
    - fi-bsw-n3050:       NOTRUN -> [SKIP][26] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-bsw-n3050/igt@kms_chamelium@hdmi-edid-read.html
    - fi-rkl-11600:       NOTRUN -> [SKIP][27] ([fdo#111827]) +7 similar issues
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@kms_chamelium@hdmi-edid-read.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
    - fi-cfl-8700k:       NOTRUN -> [SKIP][28] ([fdo#109271]) +10 similar issues
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-cfl-8700k/igt@kms_cursor_legacy@basic-busy-flip-before-cursor.html
    - fi-rkl-11600:       NOTRUN -> [SKIP][29] ([i915#4103])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@kms_cursor_legacy@basic-busy-flip-before-cursor.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-rkl-11600:       NOTRUN -> [SKIP][30] ([fdo#109285] / [i915#4098])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck:
    - fi-bsw-n3050:       NOTRUN -> [SKIP][31] ([fdo#109271]) +26 similar issues
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-bsw-n3050/igt@kms_pipe_crc_basic@compare-crc-sanitycheck.html

  * igt@kms_psr@primary_page_flip:
    - fi-rkl-11600:       NOTRUN -> [SKIP][32] ([i915#1072]) +3 similar issues
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@kms_psr@primary_page_flip.html

  * igt@kms_psr@sprite_plane_onoff:
    - fi-apl-guc:         NOTRUN -> [SKIP][33] ([fdo#109271]) +11 similar issues
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-apl-guc/igt@kms_psr@sprite_plane_onoff.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - fi-rkl-11600:       NOTRUN -> [SKIP][34] ([i915#3555] / [i915#4098])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-read:
    - fi-rkl-11600:       NOTRUN -> [SKIP][35] ([fdo#109295] / [i915#3291] / [i915#3708]) +2 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@prime_vgem@basic-read.html

  * igt@prime_vgem@basic-userptr:
    - fi-cfl-guc:         NOTRUN -> [SKIP][36] ([fdo#109271]) +10 similar issues
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-cfl-guc/igt@prime_vgem@basic-userptr.html
    - fi-tgl-u2:          NOTRUN -> [SKIP][37] ([fdo#109295] / [i915#3301])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-tgl-u2/igt@prime_vgem@basic-userptr.html
    - fi-rkl-11600:       NOTRUN -> [SKIP][38] ([fdo#109295] / [i915#3301] / [i915#3708])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-rkl-11600/igt@prime_vgem@basic-userptr.html

  * igt@runner@aborted:
    - fi-icl-u2:          NOTRUN -> [FAIL][39] ([i915#4312])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-icl-u2/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@core_hotunplug@unbind-rebind:
    - {bat-adln-1}:       [DMESG-WARN][40] ([i915#6297]) -> [PASS][41]
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-adln-1/igt@core_hotunplug@unbind-rebind.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/bat-adln-1/igt@core_hotunplug@unbind-rebind.html

  * igt@gem_render_tiled_blits@basic:
    - fi-apl-guc:         [INCOMPLETE][42] ([i915#6274]) -> [PASS][43]
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-apl-guc/igt@gem_render_tiled_blits@basic.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-apl-guc/igt@gem_render_tiled_blits@basic.html

  * igt@i915_pm_rpm@module-reload:
    - fi-cfl-8109u:       [DMESG-FAIL][44] ([i915#62]) -> [PASS][45]
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-cfl-8109u/igt@i915_pm_rpm@module-reload.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-cfl-8109u/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live@gt_engines:
    - bat-dg1-5:          [INCOMPLETE][46] ([i915#4418]) -> [PASS][47]
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-dg1-5/igt@i915_selftest@live@gt_engines.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/bat-dg1-5/igt@i915_selftest@live@gt_engines.html

  * igt@i915_selftest@live@hangcheck:
    - fi-hsw-4770:        [INCOMPLETE][48] ([i915#4785]) -> [PASS][49]
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
    - fi-hsw-g3258:       [INCOMPLETE][50] ([i915#4785]) -> [PASS][51]
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-hsw-g3258/igt@i915_selftest@live@hangcheck.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-hsw-g3258/igt@i915_selftest@live@hangcheck.html
    - bat-dg1-6:          [DMESG-FAIL][52] ([i915#4494] / [i915#4957]) -> [PASS][53]
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-dg1-6/igt@i915_selftest@live@hangcheck.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/bat-dg1-6/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@late_gt_pm:
    - fi-cfl-8109u:       [DMESG-WARN][54] ([i915#5904]) -> [PASS][55] +11 similar issues
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-cfl-8109u/igt@i915_selftest@live@late_gt_pm.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-cfl-8109u/igt@i915_selftest@live@late_gt_pm.html

  * igt@i915_selftest@live@reset:
    - {bat-adlp-6}:       [DMESG-FAIL][56] ([i915#4983]) -> [PASS][57]
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-adlp-6/igt@i915_selftest@live@reset.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/bat-adlp-6/igt@i915_selftest@live@reset.html

  * igt@i915_suspend@basic-s2idle-without-i915:
    - fi-cfl-8109u:       [DMESG-WARN][58] ([i915#5904] / [i915#62]) -> [PASS][59]
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-cfl-8109u/igt@i915_suspend@basic-s2idle-without-i915.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-cfl-8109u/igt@i915_suspend@basic-s2idle-without-i915.html

  * igt@kms_flip@basic-flip-vs-modeset@a-edp1:
    - fi-tgl-u2:          [DMESG-WARN][60] ([i915#402]) -> [PASS][61]
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
    - {bat-adln-1}:       [DMESG-WARN][62] ([i915#3576]) -> [PASS][63]
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-adln-1/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/bat-adln-1/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cfl-8109u:       [DMESG-WARN][64] ([i915#62]) -> [PASS][65] +12 similar issues
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1155]: https://gitlab.freedesktop.org/drm/intel/issues/1155
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#3012]: https://gitlab.freedesktop.org/drm/intel/issues/3012
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#3595]: https://gitlab.freedesktop.org/drm/intel/issues/3595
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4418]: https://gitlab.freedesktop.org/drm/intel/issues/4418
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4579]: https://gitlab.freedesktop.org/drm/intel/issues/4579
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4785]: https://gitlab.freedesktop.org/drm/intel/issues/4785
  [i915#4873]: https://gitlab.freedesktop.org/drm/intel/issues/4873
  [i915#4890]: https://gitlab.freedesktop.org/drm/intel/issues/4890
  [i915#4957]: https://gitlab.freedesktop.org/drm/intel/issues/4957
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5174]: https://gitlab.freedesktop.org/drm/intel/issues/5174
  [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190
  [i915#5274]: https://gitlab.freedesktop.org/drm/intel/issues/5274
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#5763]: https://gitlab.freedesktop.org/drm/intel/issues/5763
  [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903
  [i915#5904]: https://gitlab.freedesktop.org/drm/intel/issues/5904
  [i915#5982]: https://gitlab.freedesktop.org/drm/intel/issues/5982
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#6274]: https://gitlab.freedesktop.org/drm/intel/issues/6274
  [i915#6297]: https://gitlab.freedesktop.org/drm/intel/issues/6297


Build changes
-------------

  * Linux: CI_DRM_11825 -> Patchwork_104369v4

  CI-20190529: 20190529
  CI_DRM_11825: 3d881054a2b8614e37db0453c662ead2c0fafe8d @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6551: a01ebaef40f1fa653e9d6a59b719f2d69af2b458 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_104369v4: 3d881054a2b8614e37db0453c662ead2c0fafe8d @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

e1909bab89f7 drm/i915: turn on small BAR support
601ec9894487 drm/i915/ttm: disallow CPU fallback mode for ccs pages
77e31f8adc78 drm/i915/ttm: handle blitter failure on DG2
2dc7cb674dde drm/i915/selftests: ensure we reserve a fence slot
1ed98b11ba4d drm/i915/selftests: skip the mman tests for stolen
d19d24fcdee3 drm/i915/uapi: tweak error capture on recoverable contexts
8618f4d22c74 drm/i915/error: skip non-mappable pages
294273ef6dbf drm/i915/uapi: add NEEDS_CPU_ACCESS hint
658c18c4a985 drm/i915/uapi: apply ALLOC_GPU_ONLY by default
ed79c6135449 drm/i915: remove intel_memory_region avail
413bc92e080a drm/i915/uapi: expose the avail tracking
3e3daf64b8ff drm/i915/uapi: add probed_cpu_visible_size
72a26911e486 drm/doc: add rfc section for small BAR uapi

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v4/index.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for small BAR uapi bits (rev5)
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
                   ` (16 preceding siblings ...)
  (?)
@ 2022-06-29 14:40 ` Patchwork
  -1 siblings, 0 replies; 48+ messages in thread
From: Patchwork @ 2022-06-29 14:40 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-gfx

== Series Details ==

Series: small BAR uapi bits (rev5)
URL   : https://patchwork.freedesktop.org/series/104369/
State : warning

== Summary ==

Error: dim checkpatch failed
fca09013097d drm/doc: add rfc section for small BAR uapi
-:46: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#46: 
new file mode 100644

-:51: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#51: FILE: Documentation/gpu/rfc/i915_small_bar.h:1:
+/**

-:246: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#246: FILE: Documentation/gpu/rfc/i915_small_bar.rst:1:
+==========================

total: 0 errors, 3 warnings, 0 checks, 243 lines checked
25d9a8b8e3e6 drm/i915/uapi: add probed_cpu_visible_size
9bd4543cbf53 drm/i915/uapi: expose the avail tracking
ddad2e2be214 drm/i915: remove intel_memory_region avail
0cf6e0196580 drm/i915/uapi: apply ALLOC_GPU_ONLY by default
3f2050678097 drm/i915/uapi: add NEEDS_CPU_ACCESS hint
-:11: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#11: 
(i.e I915_MEMORY_CLASS_SYSTEM) must be given as a potential placement for the

total: 0 errors, 1 warnings, 0 checks, 142 lines checked
c4c76db8f955 drm/i915/error: skip non-mappable pages
624025965f59 drm/i915/uapi: tweak error capture on recoverable contexts
fd46f87db81b drm/i915/selftests: skip the mman tests for stolen
c7e672ad041e drm/i915/selftests: ensure we reserve a fence slot
46b6cad9fcb9 drm/i915/ttm: handle blitter failure on DG2
-:476: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#476: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:327:
+						if (fail_gpu && !fail_alloc) {

-:477: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#477: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:328:
+							if (!wedged) {

-:482: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#482: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:333:
+						} else if (wedged) {

-:485: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#485: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:336:
+						} else {

-:561: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#561: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:482:
+						if (fail_gpu && !fail_alloc) {

-:562: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#562: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:483:
+							if (!wedged) {

-:567: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#567: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:488:
+						} else if (wedged) {

-:570: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#570: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c:491:
+						} else {

total: 0 errors, 8 warnings, 0 checks, 651 lines checked
32359c896915 drm/i915/ttm: disallow CPU fallback mode for ccs pages
ae48f7b6fa91 drm/i915: turn on small BAR support



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for small BAR uapi bits (rev5)
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
                   ` (17 preceding siblings ...)
  (?)
@ 2022-06-29 14:40 ` Patchwork
  -1 siblings, 0 replies; 48+ messages in thread
From: Patchwork @ 2022-06-29 14:40 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-gfx

== Series Details ==

Series: small BAR uapi bits (rev5)
URL   : https://patchwork.freedesktop.org/series/104369/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for small BAR uapi bits (rev5)
  2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
                   ` (18 preceding siblings ...)
  (?)
@ 2022-06-29 15:05 ` Patchwork
  -1 siblings, 0 replies; 48+ messages in thread
From: Patchwork @ 2022-06-29 15:05 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-gfx

== Series Details ==

Series: small BAR uapi bits (rev5)
URL   : https://patchwork.freedesktop.org/series/104369/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11825 -> Patchwork_104369v5
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_104369v5 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_104369v5, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/index.html

Participating hosts (39 -> 42)
------------------------------

  Additional (5): fi-bsw-n3050 bat-dg2-8 bat-dg2-9 fi-cfl-guc fi-cfl-8700k 
  Missing    (2): bat-adlm-1 fi-bxt-dsi 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_104369v5:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_exec_parallel@engines@fds:
    - fi-bsw-nick:        [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-bsw-nick/igt@gem_exec_parallel@engines@fds.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-bsw-nick/igt@gem_exec_parallel@engines@fds.html

  * igt@i915_selftest@live@gem_migrate:
    - bat-dg1-6:          [PASS][3] -> [DMESG-FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-dg1-6/igt@i915_selftest@live@gem_migrate.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/bat-dg1-6/igt@i915_selftest@live@gem_migrate.html

  
Known issues
------------

  Here are the changes found in Patchwork_104369v5 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
    - fi-cfl-8700k:       NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#2190])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-cfl-8700k/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@basic:
    - fi-apl-guc:         NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#4613]) +3 similar issues
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-apl-guc/igt@gem_lmem_swapping@basic.html
    - fi-cfl-8700k:       NOTRUN -> [SKIP][7] ([fdo#109271] / [i915#4613]) +3 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-cfl-8700k/igt@gem_lmem_swapping@basic.html

  * igt@gem_lmem_swapping@verify-random:
    - fi-cfl-guc:         NOTRUN -> [SKIP][8] ([fdo#109271] / [i915#4613]) +3 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-cfl-guc/igt@gem_lmem_swapping@verify-random.html

  * igt@i915_selftest@live@hangcheck:
    - fi-snb-2600:        [PASS][9] -> [INCOMPLETE][10] ([i915#3921])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-snb-2600/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@late_gt_pm:
    - fi-bsw-n3050:       NOTRUN -> [DMESG-FAIL][11] ([i915#3428])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-bsw-n3050/igt@i915_selftest@live@late_gt_pm.html

  * igt@i915_selftest@live@requests:
    - fi-blb-e6850:       [PASS][12] -> [DMESG-FAIL][13] ([i915#4528])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-blb-e6850/igt@i915_selftest@live@requests.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-blb-e6850/igt@i915_selftest@live@requests.html

  * igt@kms_busy@basic@modeset:
    - fi-tgl-u2:          [PASS][14] -> [DMESG-WARN][15] ([i915#402]) +1 similar issue
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-tgl-u2/igt@kms_busy@basic@modeset.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-tgl-u2/igt@kms_busy@basic@modeset.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-hsw-g3258:       NOTRUN -> [SKIP][16] ([fdo#109271] / [fdo#111827])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-hsw-g3258/igt@kms_chamelium@common-hpd-after-suspend.html
    - fi-hsw-4770:        NOTRUN -> [SKIP][17] ([fdo#109271] / [fdo#111827])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-hsw-4770/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_chamelium@dp-edid-read:
    - fi-cfl-guc:         NOTRUN -> [SKIP][18] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-cfl-guc/igt@kms_chamelium@dp-edid-read.html
    - fi-cfl-8700k:       NOTRUN -> [SKIP][19] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-cfl-8700k/igt@kms_chamelium@dp-edid-read.html

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-apl-guc:         NOTRUN -> [SKIP][20] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-apl-guc/igt@kms_chamelium@hdmi-crc-fast.html

  * igt@kms_chamelium@hdmi-edid-read:
    - fi-bsw-n3050:       NOTRUN -> [SKIP][21] ([fdo#109271] / [fdo#111827]) +7 similar issues
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-bsw-n3050/igt@kms_chamelium@hdmi-edid-read.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
    - fi-cfl-8700k:       NOTRUN -> [SKIP][22] ([fdo#109271]) +10 similar issues
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-cfl-8700k/igt@kms_cursor_legacy@basic-busy-flip-before-cursor.html

  * igt@kms_flip@basic-flip-vs-modeset@a-edp1:
    - bat-adlp-4:         [PASS][23] -> [DMESG-WARN][24] ([i915#3576])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-adlp-4/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/bat-adlp-4/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck:
    - fi-bsw-n3050:       NOTRUN -> [SKIP][25] ([fdo#109271]) +25 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-bsw-n3050/igt@kms_pipe_crc_basic@compare-crc-sanitycheck.html

  * igt@kms_psr@sprite_plane_onoff:
    - fi-apl-guc:         NOTRUN -> [SKIP][26] ([fdo#109271]) +11 similar issues
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-apl-guc/igt@kms_psr@sprite_plane_onoff.html

  * igt@prime_vgem@basic-userptr:
    - fi-cfl-guc:         NOTRUN -> [SKIP][27] ([fdo#109271]) +10 similar issues
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-cfl-guc/igt@prime_vgem@basic-userptr.html

  * igt@runner@aborted:
    - fi-bsw-n3050:       NOTRUN -> [FAIL][28] ([fdo#109271] / [i915#3428] / [i915#4312])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-bsw-n3050/igt@runner@aborted.html
    - fi-bsw-nick:        NOTRUN -> [FAIL][29] ([i915#4312])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-bsw-nick/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@core_hotunplug@unbind-rebind:
    - {bat-adln-1}:       [DMESG-WARN][30] ([i915#6297]) -> [PASS][31]
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-adln-1/igt@core_hotunplug@unbind-rebind.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/bat-adln-1/igt@core_hotunplug@unbind-rebind.html

  * igt@gem_render_tiled_blits@basic:
    - fi-apl-guc:         [INCOMPLETE][32] ([i915#6274]) -> [PASS][33]
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-apl-guc/igt@gem_render_tiled_blits@basic.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-apl-guc/igt@gem_render_tiled_blits@basic.html

  * igt@i915_pm_rpm@module-reload:
    - fi-cfl-8109u:       [DMESG-FAIL][34] ([i915#62]) -> [PASS][35]
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-cfl-8109u/igt@i915_pm_rpm@module-reload.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-cfl-8109u/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live@hangcheck:
    - fi-hsw-4770:        [INCOMPLETE][36] ([i915#4785]) -> [PASS][37]
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
    - fi-hsw-g3258:       [INCOMPLETE][38] ([i915#4785]) -> [PASS][39]
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-hsw-g3258/igt@i915_selftest@live@hangcheck.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-hsw-g3258/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@late_gt_pm:
    - fi-cfl-8109u:       [DMESG-WARN][40] ([i915#5904]) -> [PASS][41] +11 similar issues
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-cfl-8109u/igt@i915_selftest@live@late_gt_pm.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-cfl-8109u/igt@i915_selftest@live@late_gt_pm.html

  * igt@i915_selftest@live@reset:
    - {bat-adlp-6}:       [DMESG-FAIL][42] ([i915#4983]) -> [PASS][43]
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-adlp-6/igt@i915_selftest@live@reset.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/bat-adlp-6/igt@i915_selftest@live@reset.html

  * igt@i915_suspend@basic-s2idle-without-i915:
    - fi-cfl-8109u:       [DMESG-WARN][44] ([i915#5904] / [i915#62]) -> [PASS][45]
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-cfl-8109u/igt@i915_suspend@basic-s2idle-without-i915.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-cfl-8109u/igt@i915_suspend@basic-s2idle-without-i915.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions-varying-size:
    - fi-bsw-kefka:       [FAIL][46] ([i915#6298]) -> [PASS][47]
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions-varying-size.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions-varying-size.html

  * igt@kms_flip@basic-flip-vs-modeset@a-edp1:
    - fi-tgl-u2:          [DMESG-WARN][48] ([i915#402]) -> [PASS][49]
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
    - {bat-adln-1}:       [DMESG-WARN][50] ([i915#3576]) -> [PASS][51]
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-adln-1/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/bat-adln-1/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html

  * igt@kms_flip@basic-flip-vs-modeset@b-edp1:
    - {bat-adlp-6}:       [DMESG-WARN][52] ([i915#3576]) -> [PASS][53] +1 similar issue
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/bat-adlp-6/igt@kms_flip@basic-flip-vs-modeset@b-edp1.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/bat-adlp-6/igt@kms_flip@basic-flip-vs-modeset@b-edp1.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cfl-8109u:       [DMESG-WARN][54] ([i915#62]) -> [PASS][55] +12 similar issues
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11825/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1155]: https://gitlab.freedesktop.org/drm/intel/issues/1155
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3428]: https://gitlab.freedesktop.org/drm/intel/issues/3428
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#3595]: https://gitlab.freedesktop.org/drm/intel/issues/3595
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3921]: https://gitlab.freedesktop.org/drm/intel/issues/3921
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4579]: https://gitlab.freedesktop.org/drm/intel/issues/4579
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4785]: https://gitlab.freedesktop.org/drm/intel/issues/4785
  [i915#4873]: https://gitlab.freedesktop.org/drm/intel/issues/4873
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5174]: https://gitlab.freedesktop.org/drm/intel/issues/5174
  [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190
  [i915#5270]: https://gitlab.freedesktop.org/drm/intel/issues/5270
  [i915#5274]: https://gitlab.freedesktop.org/drm/intel/issues/5274
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#5763]: https://gitlab.freedesktop.org/drm/intel/issues/5763
  [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903
  [i915#5904]: https://gitlab.freedesktop.org/drm/intel/issues/5904
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#6274]: https://gitlab.freedesktop.org/drm/intel/issues/6274
  [i915#6297]: https://gitlab.freedesktop.org/drm/intel/issues/6297
  [i915#6298]: https://gitlab.freedesktop.org/drm/intel/issues/6298


Build changes
-------------

  * Linux: CI_DRM_11825 -> Patchwork_104369v5

  CI-20190529: 20190529
  CI_DRM_11825: 3d881054a2b8614e37db0453c662ead2c0fafe8d @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6551: a01ebaef40f1fa653e9d6a59b719f2d69af2b458 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_104369v5: 3d881054a2b8614e37db0453c662ead2c0fafe8d @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

aaa10d7e534b drm/i915: turn on small BAR support
44433bc33067 drm/i915/ttm: disallow CPU fallback mode for ccs pages
eb996389e553 drm/i915/ttm: handle blitter failure on DG2
523f42effa5c drm/i915/selftests: ensure we reserve a fence slot
fc8c4b7eeb9a drm/i915/selftests: skip the mman tests for stolen
67b20e7e67e7 drm/i915/uapi: tweak error capture on recoverable contexts
e51bb329ea71 drm/i915/error: skip non-mappable pages
64a4b1bb92ca drm/i915/uapi: add NEEDS_CPU_ACCESS hint
f69a553bd52e drm/i915/uapi: apply ALLOC_GPU_ONLY by default
113571e5e424 drm/i915: remove intel_memory_region avail
6a24501e805f drm/i915/uapi: expose the avail tracking
e284c9078060 drm/i915/uapi: add probed_cpu_visible_size
38f3d8f6d7bd drm/doc: add rfc section for small BAR uapi

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_104369v5/index.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 11/13] drm/i915/ttm: handle blitter failure on DG2
  2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 16:11     ` Thomas Hellström
  -1 siblings, 0 replies; 48+ messages in thread
From: Thomas Hellström @ 2022-06-29 16:11 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx
  Cc: Tvrtko Ursulin, Jordan Justen, Lionel Landwerlin,
	Kenneth Graunke, Jon Bloomfield, dri-devel, Daniel Vetter,
	Akeem G Abodunrin

Hi, Matthew,

On 6/29/22 14:14, Matthew Auld wrote:
> If the move or clear operation somehow fails, and the memory underneath
> is not cleared, like when moving to lmem, then we currently fallback to
> memcpy or memset. However with small-BAR systems this fallback might no
> longer be possible. For now we use the set_wedged sledgehammer if we
> ever encounter such a scenario, and mark the object as borked to plug
> any holes where access to the memory underneath can happen. Add some
> basic selftests to exercise this.
>
> v2:
>    - In the selftests make sure we grab the runtime pm around the reset.
>      Also make sure we grab the reset lock before checking if the device
>      is wedged, since the wedge might still be in-progress and hence the
>      bit might not be set yet.
>    - Don't wedge or put the object into an unknown state, if the request
>      construction fails (or similar). Just returning an error and
>      skipping the fallback should be safe here.
>    - Make sure we wedge each gt. (Thomas)
>    - Peek at the unknown_state in io_reserve, that way we don't have to
>      export or hand roll the fault_wait_for_idle. (Thomas)
>    - Add the missing read-side barriers for the unknown_state. (Thomas)
>    - Some kernel-doc fixes. (Thomas)
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_object.c    |  21 +++
>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |   1 +
>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |  18 +++
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  26 +++-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   3 +
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  88 +++++++++--
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h  |   1 +
>   .../drm/i915/gem/selftests/i915_gem_migrate.c | 141 +++++++++++++++---
>   .../drm/i915/gem/selftests/i915_gem_mman.c    |  69 +++++++++
>   drivers/gpu/drm/i915/i915_vma.c               |  25 ++--
>   10 files changed, 346 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 06b1b188ce5a..642a5d59ce26 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -783,10 +783,31 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>   				    intr, MAX_SCHEDULE_TIMEOUT);
>   	if (!ret)
>   		ret = -ETIME;
> +	else if (ret > 0 && i915_gem_object_has_unknown_state(obj))
> +		ret = -EIO;
>   
>   	return ret < 0 ? ret : 0;
>   }
>   
> +/**
> + * i915_gem_object_has_unknown_state - Return true if the object backing pages are
> + * in an unknown_state. This means that userspace must NEVER be allowed to touch
> + * the pages, with either the GPU or CPU.
> + *
> + * ONLY valid to be called after ensuring that all kernel fences have signalled
> + * (in particular the fence for moving/clearing the object).
> + */
> +bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj)
> +{
> +	/*
> +	 * The below barrier pairs with the dma_fence_signal() in
> +	 * __memcpy_work(). We should only sample the unknown_state after all
> +	 * the kernel fences have signalled.
> +	 */
> +	smp_rmb();
> +	return obj->mm.unknown_state;
> +}
> +
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "selftests/huge_gem_object.c"
>   #include "selftests/huge_pages.c"
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index e11d82a9f7c3..0bf3ee27a2a8 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -524,6 +524,7 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
>   				     struct dma_fence **fence);
>   int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>   				      bool intr);
> +bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>   
>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>   					 unsigned int cache_level);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 2c88bdb8ff7c..5cf36a130061 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -547,6 +547,24 @@ struct drm_i915_gem_object {
>   		 */
>   		bool ttm_shrinkable;
>   
> +		/**
> +		 * @unknown_state: Indicate that the object is effectively
> +		 * borked. This is write-once and set if we somehow encounter a
> +		 * fatal error when moving/clearing the pages, and we are not
> +		 * able to fallback to memcpy/memset, like on small-BAR systems.
> +		 * The GPU should also be wedged (or in the process) at this
> +		 * point.
> +		 *
> +		 * Only valid to read this after acquiring the dma-resv lock and
> +		 * waiting for all DMA_RESV_USAGE_KERNEL fences to be signalled,
> +		 * or if we otherwise know that the moving fence has signalled,
> +		 * and we are certain the pages underneath are valid for
> +		 * immediate access (under normal operation), like just prior to
> +		 * binding the object or when setting up the CPU fault handler.
> +		 * See i915_gem_object_has_unknown_state();
> +		 */
> +		bool unknown_state;
> +
>   		/**
>   		 * Priority list of potential placements for this object.
>   		 */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 4c25d9b2f138..098409a33e10 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -675,7 +675,15 @@ static void i915_ttm_swap_notify(struct ttm_buffer_object *bo)
>   		i915_ttm_purge(obj);
>   }
>   
> -static bool i915_ttm_resource_mappable(struct ttm_resource *res)
> +/**
> + * i915_ttm_resource_mappable - Return true if the ttm resource is CPU
> + * accessible.
> + * @res: The TTM resource to check.
> + *
> + * This is interesting on small-BAR systems where we may encounter lmem objects
> + * that can't be accessed via the CPU.
> + */
> +bool i915_ttm_resource_mappable(struct ttm_resource *res)
>   {
>   	struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
>   
> @@ -687,6 +695,22 @@ static bool i915_ttm_resource_mappable(struct ttm_resource *res)
>   
>   static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)
>   {
> +	struct drm_i915_gem_object *obj = i915_ttm_to_gem(mem->bo);
> +	bool unknown_state;
> +
> +	if (!obj)
> +		return -EINVAL;
> +
> +	if (!kref_get_unless_zero(&obj->base.refcount))
> +		return -EINVAL;
> +
> +	assert_object_held(obj);
> +
> +	unknown_state = i915_gem_object_has_unknown_state(obj);
> +	i915_gem_object_put(obj);
> +	if (unknown_state)
> +		return -EINVAL;
> +
>   	if (!i915_ttm_cpu_maps_iomem(mem))
>   		return 0;
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> index 73e371aa3850..e4842b4296fc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> @@ -92,4 +92,7 @@ static inline bool i915_ttm_cpu_maps_iomem(struct ttm_resource *mem)
>   	/* Once / if we support GGTT, this is also false for cached ttm_tts */
>   	return mem->mem_type != I915_PL_SYSTEM;
>   }
> +
> +bool i915_ttm_resource_mappable(struct ttm_resource *res);
> +
>   #endif
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index a10716f4e717..364e7fe8efb1 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -33,6 +33,7 @@
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   static bool fail_gpu_migration;
>   static bool fail_work_allocation;
> +static bool ban_memcpy;
>   
>   void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>   					bool work_allocation)
> @@ -40,6 +41,11 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>   	fail_gpu_migration = gpu_migration;
>   	fail_work_allocation = work_allocation;
>   }
> +
> +void i915_ttm_migrate_set_ban_memcpy(bool ban)
> +{
> +	ban_memcpy = ban;
> +}
>   #endif
>   
>   static enum i915_cache_level
> @@ -258,15 +264,23 @@ struct i915_ttm_memcpy_arg {
>    * from the callback for lockdep reasons.
>    * @cb: Callback for the accelerated migration fence.
>    * @arg: The argument for the memcpy functionality.
> + * @i915: The i915 pointer.
> + * @obj: The GEM object.
> + * @memcpy_allowed: Instead of processing the @arg, and falling back to memcpy
> + * or memset, we wedge the device and set the @obj unknown_state, to prevent
> + * further access to the object with the CPU or GPU.  On some devices we might
> + * only be permitted to use the blitter engine for such operations.
>    */
>   struct i915_ttm_memcpy_work {
>   	struct dma_fence fence;
>   	struct work_struct work;
> -	/* The fence lock */
>   	spinlock_t lock;
>   	struct irq_work irq_work;
>   	struct dma_fence_cb cb;
>   	struct i915_ttm_memcpy_arg arg;
> +	struct drm_i915_private *i915;
> +	struct drm_i915_gem_object *obj;
> +	bool memcpy_allowed;
>   };
>   
>   static void i915_ttm_move_memcpy(struct i915_ttm_memcpy_arg *arg)
> @@ -319,12 +333,34 @@ static void __memcpy_work(struct work_struct *work)
>   	struct i915_ttm_memcpy_arg *arg = &copy_work->arg;
>   	bool cookie = dma_fence_begin_signalling();
>   
> -	i915_ttm_move_memcpy(arg);
> +	if (copy_work->memcpy_allowed) {
> +		i915_ttm_move_memcpy(arg);
> +	} else {
> +		/*
> +		 * Prevent further use of the object. Any future GTT binding or
> +		 * CPU access is not allowed once we signal the fence. Outside
> +		 * of the fence critical section, we then also wedge the gpu
> +		 * to indicate the device is not functional.
> +		 *
> +		 * The below dma_fence_signal() is our write-memory-barrier.
> +		 */
> +		copy_work->obj->mm.unknown_state = true;
> +	}
> +
>   	dma_fence_end_signalling(cookie);
>   
>   	dma_fence_signal(&copy_work->fence);
>   
> +	if (!copy_work->memcpy_allowed) {
> +		struct intel_gt *gt;
> +		unsigned int id;
> +
> +		for_each_gt(gt, copy_work->i915, id)
> +			intel_gt_set_wedged(gt);
> +	}

Did you try to move the gt wedging to before dma_fence_signal(), or
even before dma_fence_end_signalling()? Otherwise I think there is a
race window (albeit very unlikely) where the gpu might read
uninitialized content.
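
For illustration, the kind of reordering meant here (a rough sketch on
top of __memcpy_work() from this patch, untested) could look like:

	dma_fence_end_signalling(cookie);

	/*
	 * Sketch only: wedge each gt *before* signalling the fence, so no
	 * waiter can observe a signalled fence while the device is still
	 * usable. Whether this could also live inside the signalling
	 * critical section above is a separate question.
	 */
	if (!copy_work->memcpy_allowed) {
		struct intel_gt *gt;
		unsigned int id;

		for_each_gt(gt, copy_work->i915, id)
			intel_gt_set_wedged(gt);
	}

	dma_fence_signal(&copy_work->fence);

	i915_ttm_memcpy_release(arg);
	i915_gem_object_put(copy_work->obj);
	dma_fence_put(&copy_work->fence);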

With that fixed (or if that turns out not to be workable),

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


> +
>   	i915_ttm_memcpy_release(arg);
> +	i915_gem_object_put(copy_work->obj);
>   	dma_fence_put(&copy_work->fence);
>   }
>   
> @@ -336,6 +372,7 @@ static void __memcpy_irq_work(struct irq_work *irq_work)
>   
>   	dma_fence_signal(&copy_work->fence);
>   	i915_ttm_memcpy_release(arg);
> +	i915_gem_object_put(copy_work->obj);
>   	dma_fence_put(&copy_work->fence);
>   }
>   
> @@ -389,6 +426,16 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
>   	return &work->fence;
>   }
>   
> +static bool i915_ttm_memcpy_allowed(struct ttm_buffer_object *bo,
> +				    struct ttm_resource *dst_mem)
> +{
> +	if (!(i915_ttm_resource_mappable(bo->resource) &&
> +	      i915_ttm_resource_mappable(dst_mem)))
> +		return false;
> +
> +	return I915_SELFTEST_ONLY(ban_memcpy) ? false : true;
> +}
> +
>   static struct dma_fence *
>   __i915_ttm_move(struct ttm_buffer_object *bo,
>   		const struct ttm_operation_ctx *ctx, bool clear,
> @@ -396,6 +443,9 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>   		struct i915_refct_sgt *dst_rsgt, bool allow_accel,
>   		const struct i915_deps *move_deps)
>   {
> +	const bool memcpy_allowed = i915_ttm_memcpy_allowed(bo, dst_mem);
> +	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> +	struct drm_i915_private *i915 = to_i915(bo->base.dev);
>   	struct i915_ttm_memcpy_work *copy_work = NULL;
>   	struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
>   	struct dma_fence *fence = ERR_PTR(-EINVAL);
> @@ -423,9 +473,14 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>   			copy_work = kzalloc(sizeof(*copy_work), GFP_KERNEL);
>   
>   		if (copy_work) {
> +			copy_work->i915 = i915;
> +			copy_work->memcpy_allowed = memcpy_allowed;
> +			copy_work->obj = i915_gem_object_get(obj);
>   			arg = &copy_work->arg;
> -			i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
> -					     dst_rsgt);
> +			if (memcpy_allowed)
> +				i915_ttm_memcpy_init(arg, bo, clear, dst_mem,
> +						     dst_ttm, dst_rsgt);
> +
>   			fence = i915_ttm_memcpy_work_arm(copy_work, dep);
>   		} else {
>   			dma_fence_wait(dep, false);
> @@ -450,17 +505,23 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>   	}
>   
>   	/* Error intercept failed or no accelerated migration to start with */
> -	if (!copy_work)
> -		i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
> -				     dst_rsgt);
> -	i915_ttm_move_memcpy(arg);
> -	i915_ttm_memcpy_release(arg);
> +
> +	if (memcpy_allowed) {
> +		if (!copy_work)
> +			i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
> +					     dst_rsgt);
> +		i915_ttm_move_memcpy(arg);
> +		i915_ttm_memcpy_release(arg);
> +	}
> +	if (copy_work)
> +		i915_gem_object_put(copy_work->obj);
>   	kfree(copy_work);
>   
> -	return NULL;
> +	return memcpy_allowed ? NULL : ERR_PTR(-EIO);
>   out:
>   	if (!fence && copy_work) {
>   		i915_ttm_memcpy_release(arg);
> +		i915_gem_object_put(copy_work->obj);
>   		kfree(copy_work);
>   	}
>   
> @@ -539,8 +600,11 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>   	}
>   
>   	if (migration_fence) {
> -		ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
> -						true, dst_mem);
> +		if (I915_SELFTEST_ONLY(evict && fail_gpu_migration))
> +			ret = -EIO; /* never feed non-migrate fences into ttm */
> +		else
> +			ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
> +							true, dst_mem);
>   		if (ret) {
>   			dma_fence_wait(migration_fence, false);
>   			ttm_bo_move_sync_cleanup(bo, dst_mem);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
> index d2e7f149e05c..8a5d5ab0cc34 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
> @@ -22,6 +22,7 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo);
>   
>   I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>   							      bool work_allocation));
> +I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_ban_memcpy(bool ban));
>   
>   int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
>   			  struct drm_i915_gem_object *src,
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> index 801af51aff62..fe6c37fd7859 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> @@ -9,6 +9,7 @@
>   
>   #include "i915_deps.h"
>   
> +#include "selftests/igt_reset.h"
>   #include "selftests/igt_spinner.h"
>   
>   static int igt_fill_check_buffer(struct drm_i915_gem_object *obj,
> @@ -109,7 +110,8 @@ static int igt_same_create_migrate(void *arg)
>   
>   static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
>   				  struct drm_i915_gem_object *obj,
> -				  struct i915_vma *vma)
> +				  struct i915_vma *vma,
> +				  bool silent_migrate)
>   {
>   	int err;
>   
> @@ -138,7 +140,8 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
>   	if (i915_gem_object_is_lmem(obj)) {
>   		err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM);
>   		if (err) {
> -			pr_err("Object failed migration to smem\n");
> +			if (!silent_migrate)
> +				pr_err("Object failed migration to smem\n");
>   			if (err)
>   				return err;
>   		}
> @@ -156,7 +159,8 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
>   	} else {
>   		err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM_0);
>   		if (err) {
> -			pr_err("Object failed migration to lmem\n");
> +			if (!silent_migrate)
> +				pr_err("Object failed migration to lmem\n");
>   			if (err)
>   				return err;
>   		}
> @@ -179,7 +183,8 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
>   				    struct i915_address_space *vm,
>   				    struct i915_deps *deps,
>   				    struct igt_spinner *spin,
> -				    struct dma_fence *spin_fence)
> +				    struct dma_fence *spin_fence,
> +				    bool borked_migrate)
>   {
>   	struct drm_i915_private *i915 = gt->i915;
>   	struct drm_i915_gem_object *obj;
> @@ -242,7 +247,8 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
>   	 */
>   	for (i = 1; i <= 5; ++i) {
>   		for_i915_gem_ww(&ww, err, true)
> -			err = lmem_pages_migrate_one(&ww, obj, vma);
> +			err = lmem_pages_migrate_one(&ww, obj, vma,
> +						     borked_migrate);
>   		if (err)
>   			goto out_put;
>   	}
> @@ -283,23 +289,70 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
>   
>   static int igt_lmem_pages_failsafe_migrate(void *arg)
>   {
> -	int fail_gpu, fail_alloc, ret;
> +	int fail_gpu, fail_alloc, ban_memcpy, ret;
>   	struct intel_gt *gt = arg;
>   
>   	for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
>   		for (fail_alloc = 0; fail_alloc < 2; ++fail_alloc) {
> -			pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
> -				fail_gpu, fail_alloc);
> -			i915_ttm_migrate_set_failure_modes(fail_gpu,
> -							   fail_alloc);
> -			ret = __igt_lmem_pages_migrate(gt, NULL, NULL, NULL, NULL);
> -			if (ret)
> -				goto out_err;
> +			for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
> +				pr_info("Simulated failure modes: gpu: %d, alloc:%d, ban_memcpy: %d\n",
> +					fail_gpu, fail_alloc, ban_memcpy);
> +				i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
> +				i915_ttm_migrate_set_failure_modes(fail_gpu,
> +								   fail_alloc);
> +				ret = __igt_lmem_pages_migrate(gt, NULL, NULL,
> +							       NULL, NULL,
> +							       ban_memcpy &&
> +							       fail_gpu);
> +
> +				if (ban_memcpy && fail_gpu) {
> +					struct intel_gt *__gt;
> +					unsigned int id;
> +
> +					if (ret != -EIO) {
> +						pr_err("expected -EIO, got (%d)\n", ret);
> +						ret = -EINVAL;
> +					} else {
> +						ret = 0;
> +					}
> +
> +					for_each_gt(__gt, gt->i915, id) {
> +						intel_wakeref_t wakeref;
> +						bool wedged;
> +
> +						mutex_lock(&__gt->reset.mutex);
> +						wedged = test_bit(I915_WEDGED, &__gt->reset.flags);
> +						mutex_unlock(&__gt->reset.mutex);
> +
> +						if (fail_gpu && !fail_alloc) {
> +							if (!wedged) {
> +								pr_err("gt(%u) not wedged\n", id);
> +								ret = -EINVAL;
> +								continue;
> +							}
> +						} else if (wedged) {
> +							pr_err("gt(%u) incorrectly wedged\n", id);
> +							ret = -EINVAL;
> +						} else {
> +							continue;
> +						}
> +
> +						wakeref = intel_runtime_pm_get(__gt->uncore->rpm);
> +						igt_global_reset_lock(__gt);
> +						intel_gt_reset(__gt, ALL_ENGINES, NULL);
> +						igt_global_reset_unlock(__gt);
> +						intel_runtime_pm_put(__gt->uncore->rpm, wakeref);
> +					}
> +					if (ret)
> +						goto out_err;
> +				}
> +			}
>   		}
>   	}
>   
>   out_err:
>   	i915_ttm_migrate_set_failure_modes(false, false);
> +	i915_ttm_migrate_set_ban_memcpy(false);
>   	return ret;
>   }
>   
> @@ -370,7 +423,7 @@ static int igt_async_migrate(struct intel_gt *gt)
>   			goto out_ce;
>   
>   		err = __igt_lmem_pages_migrate(gt, &ppgtt->vm, &deps, &spin,
> -					       spin_fence);
> +					       spin_fence, false);
>   		i915_deps_fini(&deps);
>   		dma_fence_put(spin_fence);
>   		if (err)
> @@ -394,23 +447,67 @@ static int igt_async_migrate(struct intel_gt *gt)
>   #define ASYNC_FAIL_ALLOC 1
>   static int igt_lmem_async_migrate(void *arg)
>   {
> -	int fail_gpu, fail_alloc, ret;
> +	int fail_gpu, fail_alloc, ban_memcpy, ret;
>   	struct intel_gt *gt = arg;
>   
>   	for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
>   		for (fail_alloc = 0; fail_alloc < ASYNC_FAIL_ALLOC; ++fail_alloc) {
> -			pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
> -				fail_gpu, fail_alloc);
> -			i915_ttm_migrate_set_failure_modes(fail_gpu,
> -							   fail_alloc);
> -			ret = igt_async_migrate(gt);
> -			if (ret)
> -				goto out_err;
> +			for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
> +				pr_info("Simulated failure modes: gpu: %d, alloc: %d, ban_memcpy: %d\n",
> +					fail_gpu, fail_alloc, ban_memcpy);
> +				i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
> +				i915_ttm_migrate_set_failure_modes(fail_gpu,
> +								   fail_alloc);
> +				ret = igt_async_migrate(gt);
> +
> +				if (fail_gpu && ban_memcpy) {
> +					struct intel_gt *__gt;
> +					unsigned int id;
> +
> +					if (ret != -EIO) {
> +						pr_err("expected -EIO, got (%d)\n", ret);
> +						ret = -EINVAL;
> +					} else {
> +						ret = 0;
> +					}
> +
> +					for_each_gt(__gt, gt->i915, id) {
> +						intel_wakeref_t wakeref;
> +						bool wedged;
> +
> +						mutex_lock(&__gt->reset.mutex);
> +						wedged = test_bit(I915_WEDGED, &__gt->reset.flags);
> +						mutex_unlock(&__gt->reset.mutex);
> +
> +						if (fail_gpu && !fail_alloc) {
> +							if (!wedged) {
> +								pr_err("gt(%u) not wedged\n", id);
> +								ret = -EINVAL;
> +								continue;
> +							}
> +						} else if (wedged) {
> +							pr_err("gt(%u) incorrectly wedged\n", id);
> +							ret = -EINVAL;
> +						} else {
> +							continue;
> +						}
> +
> +						wakeref = intel_runtime_pm_get(__gt->uncore->rpm);
> +						igt_global_reset_lock(__gt);
> +						intel_gt_reset(__gt, ALL_ENGINES, NULL);
> +						igt_global_reset_unlock(__gt);
> +						intel_runtime_pm_put(__gt->uncore->rpm, wakeref);
> +					}
> +				}
> +				if (ret)
> +					goto out_err;
> +			}
>   		}
>   	}
>   
>   out_err:
>   	i915_ttm_migrate_set_failure_modes(false, false);
> +	i915_ttm_migrate_set_ban_memcpy(false);
>   	return ret;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> index da28acb78a88..3ced9948a331 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -10,6 +10,7 @@
>   #include "gem/i915_gem_internal.h"
>   #include "gem/i915_gem_region.h"
>   #include "gem/i915_gem_ttm.h"
> +#include "gem/i915_gem_ttm_move.h"
>   #include "gt/intel_engine_pm.h"
>   #include "gt/intel_gpu_commands.h"
>   #include "gt/intel_gt.h"
> @@ -21,6 +22,7 @@
>   #include "i915_selftest.h"
>   #include "selftests/i915_random.h"
>   #include "selftests/igt_flush_test.h"
> +#include "selftests/igt_reset.h"
>   #include "selftests/igt_mmap.h"
>   
>   struct tile {
> @@ -1163,6 +1165,7 @@ static int ___igt_mmap_migrate(struct drm_i915_private *i915,
>   #define IGT_MMAP_MIGRATE_FILL        (1 << 1)
>   #define IGT_MMAP_MIGRATE_EVICTABLE   (1 << 2)
>   #define IGT_MMAP_MIGRATE_UNFAULTABLE (1 << 3)
> +#define IGT_MMAP_MIGRATE_FAIL_GPU    (1 << 4)
>   static int __igt_mmap_migrate(struct intel_memory_region **placements,
>   			      int n_placements,
>   			      struct intel_memory_region *expected_mr,
> @@ -1237,13 +1240,62 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
>   	if (flags & IGT_MMAP_MIGRATE_EVICTABLE)
>   		igt_make_evictable(&objects);
>   
> +	if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
> +		err = i915_gem_object_lock(obj, NULL);
> +		if (err)
> +			goto out_put;
> +
> +		/*
> +		 * Ensure we only simulate the gpu failure when faulting the
> +		 * pages.
> +		 */
> +		err = i915_gem_object_wait_moving_fence(obj, true);
> +		i915_gem_object_unlock(obj);
> +		if (err)
> +			goto out_put;
> +		i915_ttm_migrate_set_failure_modes(true, false);
> +	}
> +
>   	err = ___igt_mmap_migrate(i915, obj, addr,
>   				  flags & IGT_MMAP_MIGRATE_UNFAULTABLE);
> +
>   	if (!err && obj->mm.region != expected_mr) {
>   		pr_err("%s region mismatch %s\n", __func__, expected_mr->name);
>   		err = -EINVAL;
>   	}
>   
> +	if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
> +		struct intel_gt *gt;
> +		unsigned int id;
> +
> +		i915_ttm_migrate_set_failure_modes(false, false);
> +
> +		for_each_gt(gt, i915, id) {
> +			intel_wakeref_t wakeref;
> +			bool wedged;
> +
> +			mutex_lock(&gt->reset.mutex);
> +			wedged = test_bit(I915_WEDGED, &gt->reset.flags);
> +			mutex_unlock(&gt->reset.mutex);
> +			if (!wedged) {
> +				pr_err("gt(%u) not wedged\n", id);
> +				err = -EINVAL;
> +				continue;
> +			}
> +
> +			wakeref = intel_runtime_pm_get(gt->uncore->rpm);
> +			igt_global_reset_lock(gt);
> +			intel_gt_reset(gt, ALL_ENGINES, NULL);
> +			igt_global_reset_unlock(gt);
> +			intel_runtime_pm_put(gt->uncore->rpm, wakeref);
> +		}
> +
> +		if (!i915_gem_object_has_unknown_state(obj)) {
> +			pr_err("object missing unknown_state\n");
> +			err = -EINVAL;
> +		}
> +	}
> +
>   out_put:
>   	i915_gem_object_put(obj);
>   	igt_close_objects(i915, &objects);
> @@ -1324,6 +1376,23 @@ static int igt_mmap_migrate(void *arg)
>   					 IGT_MMAP_MIGRATE_TOPDOWN |
>   					 IGT_MMAP_MIGRATE_FILL |
>   					 IGT_MMAP_MIGRATE_UNFAULTABLE);
> +		if (err)
> +			goto out_io_size;
> +
> +		/*
> +		 * Allocate in the non-mappable portion, but force migrating to
> +		 * the mappable portion on fault (LMEM -> LMEM). We then also
> +		 * simulate a gpu error when moving the pages during the fault,
> +		 * which should result in wedging the gpu and returning
> +		 * SIGBUS in the fault handler, since we can't fallback to
> +		 * memcpy.
> +		 */
> +		err = __igt_mmap_migrate(single, ARRAY_SIZE(single), mr,
> +					 IGT_MMAP_MIGRATE_TOPDOWN |
> +					 IGT_MMAP_MIGRATE_FILL |
> +					 IGT_MMAP_MIGRATE_EVICTABLE |
> +					 IGT_MMAP_MIGRATE_FAIL_GPU |
> +					 IGT_MMAP_MIGRATE_UNFAULTABLE);
>   out_io_size:
>   		mr->io_size = saved_io_size;
>   		i915_ttm_buddy_man_force_visible_size(man,
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 5d5828b9a242..43339ecabd73 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -310,7 +310,7 @@ struct i915_vma_work {
>   	struct i915_address_space *vm;
>   	struct i915_vm_pt_stash stash;
>   	struct i915_vma_resource *vma_res;
> -	struct drm_i915_gem_object *pinned;
> +	struct drm_i915_gem_object *obj;
>   	struct i915_sw_dma_fence_cb cb;
>   	enum i915_cache_level cache_level;
>   	unsigned int flags;
> @@ -321,17 +321,25 @@ static void __vma_bind(struct dma_fence_work *work)
>   	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
>   	struct i915_vma_resource *vma_res = vw->vma_res;
>   
> +	/*
> +	 * We are about to bind the object, which must mean we have already
> +	 * signaled the work to potentially clear/move the pages underneath. If
> +	 * something went wrong at that stage then the object should have
> +	 * unknown_state set, in which case we need to skip the bind.
> +	 */
> +	if (i915_gem_object_has_unknown_state(vw->obj))
> +		return;
> +
>   	vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
>   			       vma_res, vw->cache_level, vw->flags);
> -
>   }
>   
>   static void __vma_release(struct dma_fence_work *work)
>   {
>   	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
>   
> -	if (vw->pinned)
> -		i915_gem_object_put(vw->pinned);
> +	if (vw->obj)
> +		i915_gem_object_put(vw->obj);
>   
>   	i915_vm_free_pt_stash(vw->vm, &vw->stash);
>   	if (vw->vma_res)
> @@ -517,14 +525,7 @@ int i915_vma_bind(struct i915_vma *vma,
>   		}
>   
>   		work->base.dma.error = 0; /* enable the queue_work() */
> -
> -		/*
> -		 * If we don't have the refcounted pages list, keep a reference
> -		 * on the object to avoid waiting for the async bind to
> -		 * complete in the object destruction path.
> -		 */
> -		if (!work->vma_res->bi.pages_rsgt)
> -			work->pinned = i915_gem_object_get(vma->obj);
> +		work->obj = i915_gem_object_get(vma->obj);
>   	} else {
>   		ret = i915_gem_object_wait_moving_fence(vma->obj, true);
>   		if (ret) {

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx] [PATCH v3 11/13] drm/i915/ttm: handle blitter failure on DG2
@ 2022-06-29 16:11     ` Thomas Hellström
  0 siblings, 0 replies; 48+ messages in thread
From: Thomas Hellström @ 2022-06-29 16:11 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx; +Cc: Kenneth Graunke, dri-devel, Daniel Vetter

Hi, Matthew,

On 6/29/22 14:14, Matthew Auld wrote:
> If the move or clear operation somehow fails, and the memory underneath
> is not cleared, like when moving to lmem, then we currently fallback to
> memcpy or memset. However with small-BAR systems this fallback might no
> longer be possible. For now we use the set_wedged sledgehammer if we
> ever encounter such a scenario, and mark the object as borked to plug
> any holes where access to the memory underneath can happen. Add some
> basic selftests to exercise this.
>
> v2:
>    - In the selftests make sure we grab the runtime pm around the reset.
>      Also make sure we grab the reset lock before checking if the device
>      is wedged, since the wedge might still be in-progress and hence the
>      bit might not be set yet.
>    - Don't wedge or put the object into an unknown state, if the request
>      construction fails (or similar). Just returning an error and
>      skipping the fallback should be safe here.
>    - Make sure we wedge each gt. (Thomas)
>    - Peek at the unknown_state in io_reserve, that way we don't have to
>      export or hand roll the fault_wait_for_idle. (Thomas)
>    - Add the missing read-side barriers for the unknown_state. (Thomas)
>    - Some kernel-doc fixes. (Thomas)
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_object.c    |  21 +++
>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |   1 +
>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |  18 +++
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  26 +++-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   3 +
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  88 +++++++++--
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h  |   1 +
>   .../drm/i915/gem/selftests/i915_gem_migrate.c | 141 +++++++++++++++---
>   .../drm/i915/gem/selftests/i915_gem_mman.c    |  69 +++++++++
>   drivers/gpu/drm/i915/i915_vma.c               |  25 ++--
>   10 files changed, 346 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 06b1b188ce5a..642a5d59ce26 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -783,10 +783,31 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>   				    intr, MAX_SCHEDULE_TIMEOUT);
>   	if (!ret)
>   		ret = -ETIME;
> +	else if (ret > 0 && i915_gem_object_has_unknown_state(obj))
> +		ret = -EIO;
>   
>   	return ret < 0 ? ret : 0;
>   }
>   
> +/**
> + * i915_gem_object_has_unknown_state - Return true if the object backing pages are
> + * in an unknown_state. This means that userspace must NEVER be allowed to touch
> + * the pages, with either the GPU or CPU.
> + *
> + * ONLY valid to be called after ensuring that all kernel fences have signalled
> + * (in particular the fence for moving/clearing the object).
> + */
> +bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj)
> +{
> +	/*
> +	 * The below barrier pairs with the dma_fence_signal() in
> +	 * __memcpy_work(). We should only sample the unknown_state after all
> +	 * the kernel fences have signalled.
> +	 */
> +	smp_rmb();
> +	return obj->mm.unknown_state;
> +}
> +
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "selftests/huge_gem_object.c"
>   #include "selftests/huge_pages.c"
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index e11d82a9f7c3..0bf3ee27a2a8 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -524,6 +524,7 @@ int i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj,
>   				     struct dma_fence **fence);
>   int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>   				      bool intr);
> +bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>   
>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>   					 unsigned int cache_level);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 2c88bdb8ff7c..5cf36a130061 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -547,6 +547,24 @@ struct drm_i915_gem_object {
>   		 */
>   		bool ttm_shrinkable;
>   
> +		/**
> +		 * @unknown_state: Indicate that the object is effectively
> +		 * borked. This is write-once and set if we somehow encounter a
> +		 * fatal error when moving/clearing the pages, and we are not
> +		 * able to fallback to memcpy/memset, like on small-BAR systems.
> +		 * The GPU should also be wedged (or in the process) at this
> +		 * point.
> +		 *
> +		 * Only valid to read this after acquiring the dma-resv lock and
> +		 * waiting for all DMA_RESV_USAGE_KERNEL fences to be signalled,
> +		 * or if we otherwise know that the moving fence has signalled,
> +		 * and we are certain the pages underneath are valid for
> +		 * immediate access (under normal operation), like just prior to
> +		 * binding the object or when setting up the CPU fault handler.
> +		 * See i915_gem_object_has_unknown_state();
> +		 */
> +		bool unknown_state;
> +
>   		/**
>   		 * Priority list of potential placements for this object.
>   		 */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 4c25d9b2f138..098409a33e10 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -675,7 +675,15 @@ static void i915_ttm_swap_notify(struct ttm_buffer_object *bo)
>   		i915_ttm_purge(obj);
>   }
>   
> -static bool i915_ttm_resource_mappable(struct ttm_resource *res)
> +/**
> + * i915_ttm_resource_mappable - Return true if the ttm resource is CPU
> + * accessible.
> + * @res: The TTM resource to check.
> + *
> + * This is interesting on small-BAR systems where we may encounter lmem objects
> + * that can't be accessed via the CPU.
> + */
> +bool i915_ttm_resource_mappable(struct ttm_resource *res)
>   {
>   	struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
>   
> @@ -687,6 +695,22 @@ static bool i915_ttm_resource_mappable(struct ttm_resource *res)
>   
>   static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)
>   {
> +	struct drm_i915_gem_object *obj = i915_ttm_to_gem(mem->bo);
> +	bool unknown_state;
> +
> +	if (!obj)
> +		return -EINVAL;
> +
> +	if (!kref_get_unless_zero(&obj->base.refcount))
> +		return -EINVAL;
> +
> +	assert_object_held(obj);
> +
> +	unknown_state = i915_gem_object_has_unknown_state(obj);
> +	i915_gem_object_put(obj);
> +	if (unknown_state)
> +		return -EINVAL;
> +
>   	if (!i915_ttm_cpu_maps_iomem(mem))
>   		return 0;
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> index 73e371aa3850..e4842b4296fc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> @@ -92,4 +92,7 @@ static inline bool i915_ttm_cpu_maps_iomem(struct ttm_resource *mem)
>   	/* Once / if we support GGTT, this is also false for cached ttm_tts */
>   	return mem->mem_type != I915_PL_SYSTEM;
>   }
> +
> +bool i915_ttm_resource_mappable(struct ttm_resource *res);
> +
>   #endif
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index a10716f4e717..364e7fe8efb1 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -33,6 +33,7 @@
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   static bool fail_gpu_migration;
>   static bool fail_work_allocation;
> +static bool ban_memcpy;
>   
>   void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>   					bool work_allocation)
> @@ -40,6 +41,11 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>   	fail_gpu_migration = gpu_migration;
>   	fail_work_allocation = work_allocation;
>   }
> +
> +void i915_ttm_migrate_set_ban_memcpy(bool ban)
> +{
> +	ban_memcpy = ban;
> +}
>   #endif
>   
>   static enum i915_cache_level
> @@ -258,15 +264,23 @@ struct i915_ttm_memcpy_arg {
>    * from the callback for lockdep reasons.
>    * @cb: Callback for the accelerated migration fence.
>    * @arg: The argument for the memcpy functionality.
> + * @i915: The i915 pointer.
> + * @obj: The GEM object.
> + * @memcpy_allowed: Instead of processing the @arg, and falling back to memcpy
> + * or memset, we wedge the device and set the @obj unknown_state, to prevent
> + * further access to the object with the CPU or GPU.  On some devices we might
> + * only be permitted to use the blitter engine for such operations.
>    */
>   struct i915_ttm_memcpy_work {
>   	struct dma_fence fence;
>   	struct work_struct work;
> -	/* The fence lock */
>   	spinlock_t lock;
>   	struct irq_work irq_work;
>   	struct dma_fence_cb cb;
>   	struct i915_ttm_memcpy_arg arg;
> +	struct drm_i915_private *i915;
> +	struct drm_i915_gem_object *obj;
> +	bool memcpy_allowed;
>   };
>   
>   static void i915_ttm_move_memcpy(struct i915_ttm_memcpy_arg *arg)
> @@ -319,12 +333,34 @@ static void __memcpy_work(struct work_struct *work)
>   	struct i915_ttm_memcpy_arg *arg = &copy_work->arg;
>   	bool cookie = dma_fence_begin_signalling();
>   
> -	i915_ttm_move_memcpy(arg);
> +	if (copy_work->memcpy_allowed) {
> +		i915_ttm_move_memcpy(arg);
> +	} else {
> +		/*
> +		 * Prevent further use of the object. Any future GTT binding or
> +		 * CPU access is not allowed once we signal the fence. Outside
> +		 * of the fence critical section, we then also wedge the gpu to
> +		 * to indicate the device is not functional.
> +		 *
> +		 * The below dma_fence_signal() is our write-memory-barrier.
> +		 */
> +		copy_work->obj->mm.unknown_state = true;
> +	}
> +
>   	dma_fence_end_signalling(cookie);
>   
>   	dma_fence_signal(&copy_work->fence);
>   
> +	if (!copy_work->memcpy_allowed) {
> +		struct intel_gt *gt;
> +		unsigned int id;
> +
> +		for_each_gt(gt, copy_work->i915, id)
> +			intel_gt_set_wedged(gt);
> +	}

Did you try to move the gt wedging to before dma_fence_signal(), or
even before dma_fence_end_signalling()? Otherwise I think there is a
race window (albeit very unlikely) where the gpu might read
uninitialized content.

With that fixed (or if that turns out not to be workable),

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


> +
>   	i915_ttm_memcpy_release(arg);
> +	i915_gem_object_put(copy_work->obj);
>   	dma_fence_put(&copy_work->fence);
>   }
>   
> @@ -336,6 +372,7 @@ static void __memcpy_irq_work(struct irq_work *irq_work)
>   
>   	dma_fence_signal(&copy_work->fence);
>   	i915_ttm_memcpy_release(arg);
> +	i915_gem_object_put(copy_work->obj);
>   	dma_fence_put(&copy_work->fence);
>   }
>   
> @@ -389,6 +426,16 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
>   	return &work->fence;
>   }
>   
> +static bool i915_ttm_memcpy_allowed(struct ttm_buffer_object *bo,
> +				    struct ttm_resource *dst_mem)
> +{
> +	if (!(i915_ttm_resource_mappable(bo->resource) &&
> +	      i915_ttm_resource_mappable(dst_mem)))
> +		return false;
> +
> +	return I915_SELFTEST_ONLY(ban_memcpy) ? false : true;
> +}
> +
>   static struct dma_fence *
>   __i915_ttm_move(struct ttm_buffer_object *bo,
>   		const struct ttm_operation_ctx *ctx, bool clear,
> @@ -396,6 +443,9 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>   		struct i915_refct_sgt *dst_rsgt, bool allow_accel,
>   		const struct i915_deps *move_deps)
>   {
> +	const bool memcpy_allowed = i915_ttm_memcpy_allowed(bo, dst_mem);
> +	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> +	struct drm_i915_private *i915 = to_i915(bo->base.dev);
>   	struct i915_ttm_memcpy_work *copy_work = NULL;
>   	struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
>   	struct dma_fence *fence = ERR_PTR(-EINVAL);
> @@ -423,9 +473,14 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>   			copy_work = kzalloc(sizeof(*copy_work), GFP_KERNEL);
>   
>   		if (copy_work) {
> +			copy_work->i915 = i915;
> +			copy_work->memcpy_allowed = memcpy_allowed;
> +			copy_work->obj = i915_gem_object_get(obj);
>   			arg = &copy_work->arg;
> -			i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
> -					     dst_rsgt);
> +			if (memcpy_allowed)
> +				i915_ttm_memcpy_init(arg, bo, clear, dst_mem,
> +						     dst_ttm, dst_rsgt);
> +
>   			fence = i915_ttm_memcpy_work_arm(copy_work, dep);
>   		} else {
>   			dma_fence_wait(dep, false);
> @@ -450,17 +505,23 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>   	}
>   
>   	/* Error intercept failed or no accelerated migration to start with */
> -	if (!copy_work)
> -		i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
> -				     dst_rsgt);
> -	i915_ttm_move_memcpy(arg);
> -	i915_ttm_memcpy_release(arg);
> +
> +	if (memcpy_allowed) {
> +		if (!copy_work)
> +			i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
> +					     dst_rsgt);
> +		i915_ttm_move_memcpy(arg);
> +		i915_ttm_memcpy_release(arg);
> +	}
> +	if (copy_work)
> +		i915_gem_object_put(copy_work->obj);
>   	kfree(copy_work);
>   
> -	return NULL;
> +	return memcpy_allowed ? NULL : ERR_PTR(-EIO);
>   out:
>   	if (!fence && copy_work) {
>   		i915_ttm_memcpy_release(arg);
> +		i915_gem_object_put(copy_work->obj);
>   		kfree(copy_work);
>   	}
>   
> @@ -539,8 +600,11 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>   	}
>   
>   	if (migration_fence) {
> -		ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
> -						true, dst_mem);
> +		if (I915_SELFTEST_ONLY(evict && fail_gpu_migration))
> +			ret = -EIO; /* never feed non-migrate fences into ttm */
> +		else
> +			ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
> +							true, dst_mem);
>   		if (ret) {
>   			dma_fence_wait(migration_fence, false);
>   			ttm_bo_move_sync_cleanup(bo, dst_mem);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
> index d2e7f149e05c..8a5d5ab0cc34 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
> @@ -22,6 +22,7 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo);
>   
>   I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>   							      bool work_allocation));
> +I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_ban_memcpy(bool ban));
>   
>   int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
>   			  struct drm_i915_gem_object *src,
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> index 801af51aff62..fe6c37fd7859 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> @@ -9,6 +9,7 @@
>   
>   #include "i915_deps.h"
>   
> +#include "selftests/igt_reset.h"
>   #include "selftests/igt_spinner.h"
>   
>   static int igt_fill_check_buffer(struct drm_i915_gem_object *obj,
> @@ -109,7 +110,8 @@ static int igt_same_create_migrate(void *arg)
>   
>   static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
>   				  struct drm_i915_gem_object *obj,
> -				  struct i915_vma *vma)
> +				  struct i915_vma *vma,
> +				  bool silent_migrate)
>   {
>   	int err;
>   
> @@ -138,7 +140,8 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
>   	if (i915_gem_object_is_lmem(obj)) {
>   		err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM);
>   		if (err) {
> -			pr_err("Object failed migration to smem\n");
> +			if (!silent_migrate)
> +				pr_err("Object failed migration to smem\n");
>   			if (err)
>   				return err;
>   		}
> @@ -156,7 +159,8 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
>   	} else {
>   		err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM_0);
>   		if (err) {
> -			pr_err("Object failed migration to lmem\n");
> +			if (!silent_migrate)
> +				pr_err("Object failed migration to lmem\n");
>   			if (err)
>   				return err;
>   		}
> @@ -179,7 +183,8 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
>   				    struct i915_address_space *vm,
>   				    struct i915_deps *deps,
>   				    struct igt_spinner *spin,
> -				    struct dma_fence *spin_fence)
> +				    struct dma_fence *spin_fence,
> +				    bool borked_migrate)
>   {
>   	struct drm_i915_private *i915 = gt->i915;
>   	struct drm_i915_gem_object *obj;
> @@ -242,7 +247,8 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
>   	 */
>   	for (i = 1; i <= 5; ++i) {
>   		for_i915_gem_ww(&ww, err, true)
> -			err = lmem_pages_migrate_one(&ww, obj, vma);
> +			err = lmem_pages_migrate_one(&ww, obj, vma,
> +						     borked_migrate);
>   		if (err)
>   			goto out_put;
>   	}
> @@ -283,23 +289,70 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
>   
>   static int igt_lmem_pages_failsafe_migrate(void *arg)
>   {
> -	int fail_gpu, fail_alloc, ret;
> +	int fail_gpu, fail_alloc, ban_memcpy, ret;
>   	struct intel_gt *gt = arg;
>   
>   	for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
>   		for (fail_alloc = 0; fail_alloc < 2; ++fail_alloc) {
> -			pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
> -				fail_gpu, fail_alloc);
> -			i915_ttm_migrate_set_failure_modes(fail_gpu,
> -							   fail_alloc);
> -			ret = __igt_lmem_pages_migrate(gt, NULL, NULL, NULL, NULL);
> -			if (ret)
> -				goto out_err;
> +			for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
> +				pr_info("Simulated failure modes: gpu: %d, alloc:%d, ban_memcpy: %d\n",
> +					fail_gpu, fail_alloc, ban_memcpy);
> +				i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
> +				i915_ttm_migrate_set_failure_modes(fail_gpu,
> +								   fail_alloc);
> +				ret = __igt_lmem_pages_migrate(gt, NULL, NULL,
> +							       NULL, NULL,
> +							       ban_memcpy &&
> +							       fail_gpu);
> +
> +				if (ban_memcpy && fail_gpu) {
> +					struct intel_gt *__gt;
> +					unsigned int id;
> +
> +					if (ret != -EIO) {
> +						pr_err("expected -EIO, got (%d)\n", ret);
> +						ret = -EINVAL;
> +					} else {
> +						ret = 0;
> +					}
> +
> +					for_each_gt(__gt, gt->i915, id) {
> +						intel_wakeref_t wakeref;
> +						bool wedged;
> +
> +						mutex_lock(&__gt->reset.mutex);
> +						wedged = test_bit(I915_WEDGED, &__gt->reset.flags);
> +						mutex_unlock(&__gt->reset.mutex);
> +
> +						if (fail_gpu && !fail_alloc) {
> +							if (!wedged) {
> +								pr_err("gt(%u) not wedged\n", id);
> +								ret = -EINVAL;
> +								continue;
> +							}
> +						} else if (wedged) {
> +							pr_err("gt(%u) incorrectly wedged\n", id);
> +							ret = -EINVAL;
> +						} else {
> +							continue;
> +						}
> +
> +						wakeref = intel_runtime_pm_get(__gt->uncore->rpm);
> +						igt_global_reset_lock(__gt);
> +						intel_gt_reset(__gt, ALL_ENGINES, NULL);
> +						igt_global_reset_unlock(__gt);
> +						intel_runtime_pm_put(__gt->uncore->rpm, wakeref);
> +					}
> +					if (ret)
> +						goto out_err;
> +				}
> +			}
>   		}
>   	}
>   
>   out_err:
>   	i915_ttm_migrate_set_failure_modes(false, false);
> +	i915_ttm_migrate_set_ban_memcpy(false);
>   	return ret;
>   }
>   
> @@ -370,7 +423,7 @@ static int igt_async_migrate(struct intel_gt *gt)
>   			goto out_ce;
>   
>   		err = __igt_lmem_pages_migrate(gt, &ppgtt->vm, &deps, &spin,
> -					       spin_fence);
> +					       spin_fence, false);
>   		i915_deps_fini(&deps);
>   		dma_fence_put(spin_fence);
>   		if (err)
> @@ -394,23 +447,67 @@ static int igt_async_migrate(struct intel_gt *gt)
>   #define ASYNC_FAIL_ALLOC 1
>   static int igt_lmem_async_migrate(void *arg)
>   {
> -	int fail_gpu, fail_alloc, ret;
> +	int fail_gpu, fail_alloc, ban_memcpy, ret;
>   	struct intel_gt *gt = arg;
>   
>   	for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
>   		for (fail_alloc = 0; fail_alloc < ASYNC_FAIL_ALLOC; ++fail_alloc) {
> -			pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
> -				fail_gpu, fail_alloc);
> -			i915_ttm_migrate_set_failure_modes(fail_gpu,
> -							   fail_alloc);
> -			ret = igt_async_migrate(gt);
> -			if (ret)
> -				goto out_err;
> +			for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
> +				pr_info("Simulated failure modes: gpu: %d, alloc: %d, ban_memcpy: %d\n",
> +					fail_gpu, fail_alloc, ban_memcpy);
> +				i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
> +				i915_ttm_migrate_set_failure_modes(fail_gpu,
> +								   fail_alloc);
> +				ret = igt_async_migrate(gt);
> +
> +				if (fail_gpu && ban_memcpy) {
> +					struct intel_gt *__gt;
> +					unsigned int id;
> +
> +					if (ret != -EIO) {
> +						pr_err("expected -EIO, got (%d)\n", ret);
> +						ret = -EINVAL;
> +					} else {
> +						ret = 0;
> +					}
> +
> +					for_each_gt(__gt, gt->i915, id) {
> +						intel_wakeref_t wakeref;
> +						bool wedged;
> +
> +						mutex_lock(&__gt->reset.mutex);
> +						wedged = test_bit(I915_WEDGED, &__gt->reset.flags);
> +						mutex_unlock(&__gt->reset.mutex);
> +
> +						if (fail_gpu && !fail_alloc) {
> +							if (!wedged) {
> +								pr_err("gt(%u) not wedged\n", id);
> +								ret = -EINVAL;
> +								continue;
> +							}
> +						} else if (wedged) {
> +							pr_err("gt(%u) incorrectly wedged\n", id);
> +							ret = -EINVAL;
> +						} else {
> +							continue;
> +						}
> +
> +						wakeref = intel_runtime_pm_get(__gt->uncore->rpm);
> +						igt_global_reset_lock(__gt);
> +						intel_gt_reset(__gt, ALL_ENGINES, NULL);
> +						igt_global_reset_unlock(__gt);
> +						intel_runtime_pm_put(__gt->uncore->rpm, wakeref);
> +					}
> +				}
> +				if (ret)
> +					goto out_err;
> +			}
>   		}
>   	}
>   
>   out_err:
>   	i915_ttm_migrate_set_failure_modes(false, false);
> +	i915_ttm_migrate_set_ban_memcpy(false);
>   	return ret;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> index da28acb78a88..3ced9948a331 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -10,6 +10,7 @@
>   #include "gem/i915_gem_internal.h"
>   #include "gem/i915_gem_region.h"
>   #include "gem/i915_gem_ttm.h"
> +#include "gem/i915_gem_ttm_move.h"
>   #include "gt/intel_engine_pm.h"
>   #include "gt/intel_gpu_commands.h"
>   #include "gt/intel_gt.h"
> @@ -21,6 +22,7 @@
>   #include "i915_selftest.h"
>   #include "selftests/i915_random.h"
>   #include "selftests/igt_flush_test.h"
> +#include "selftests/igt_reset.h"
>   #include "selftests/igt_mmap.h"
>   
>   struct tile {
> @@ -1163,6 +1165,7 @@ static int ___igt_mmap_migrate(struct drm_i915_private *i915,
>   #define IGT_MMAP_MIGRATE_FILL        (1 << 1)
>   #define IGT_MMAP_MIGRATE_EVICTABLE   (1 << 2)
>   #define IGT_MMAP_MIGRATE_UNFAULTABLE (1 << 3)
> +#define IGT_MMAP_MIGRATE_FAIL_GPU    (1 << 4)
>   static int __igt_mmap_migrate(struct intel_memory_region **placements,
>   			      int n_placements,
>   			      struct intel_memory_region *expected_mr,
> @@ -1237,13 +1240,62 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
>   	if (flags & IGT_MMAP_MIGRATE_EVICTABLE)
>   		igt_make_evictable(&objects);
>   
> +	if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
> +		err = i915_gem_object_lock(obj, NULL);
> +		if (err)
> +			goto out_put;
> +
> +		/*
> +		 * Ensure we only simulate the gpu failure when faulting the
> +		 * pages.
> +		 */
> +		err = i915_gem_object_wait_moving_fence(obj, true);
> +		i915_gem_object_unlock(obj);
> +		if (err)
> +			goto out_put;
> +		i915_ttm_migrate_set_failure_modes(true, false);
> +	}
> +
>   	err = ___igt_mmap_migrate(i915, obj, addr,
>   				  flags & IGT_MMAP_MIGRATE_UNFAULTABLE);
> +
>   	if (!err && obj->mm.region != expected_mr) {
>   		pr_err("%s region mismatch %s\n", __func__, expected_mr->name);
>   		err = -EINVAL;
>   	}
>   
> +	if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
> +		struct intel_gt *gt;
> +		unsigned int id;
> +
> +		i915_ttm_migrate_set_failure_modes(false, false);
> +
> +		for_each_gt(gt, i915, id) {
> +			intel_wakeref_t wakeref;
> +			bool wedged;
> +
> +			mutex_lock(&gt->reset.mutex);
> +			wedged = test_bit(I915_WEDGED, &gt->reset.flags);
> +			mutex_unlock(&gt->reset.mutex);
> +			if (!wedged) {
> +				pr_err("gt(%u) not wedged\n", id);
> +				err = -EINVAL;
> +				continue;
> +			}
> +
> +			wakeref = intel_runtime_pm_get(gt->uncore->rpm);
> +			igt_global_reset_lock(gt);
> +			intel_gt_reset(gt, ALL_ENGINES, NULL);
> +			igt_global_reset_unlock(gt);
> +			intel_runtime_pm_put(gt->uncore->rpm, wakeref);
> +		}
> +
> +		if (!i915_gem_object_has_unknown_state(obj)) {
> +			pr_err("object missing unknown_state\n");
> +			err = -EINVAL;
> +		}
> +	}
> +
>   out_put:
>   	i915_gem_object_put(obj);
>   	igt_close_objects(i915, &objects);
> @@ -1324,6 +1376,23 @@ static int igt_mmap_migrate(void *arg)
>   					 IGT_MMAP_MIGRATE_TOPDOWN |
>   					 IGT_MMAP_MIGRATE_FILL |
>   					 IGT_MMAP_MIGRATE_UNFAULTABLE);
> +		if (err)
> +			goto out_io_size;
> +
> +		/*
> +		 * Allocate in the non-mappable portion, but force migrating to
> +		 * the mappable portion on fault (LMEM -> LMEM). We then also
> +		 * simulate a gpu error when moving the pages during the fault,
> +		 * which should result in wedging the gpu and returning
> +		 * SIGBUS in the fault handler, since we can't fallback to
> +		 * memcpy.
> +		 */
> +		err = __igt_mmap_migrate(single, ARRAY_SIZE(single), mr,
> +					 IGT_MMAP_MIGRATE_TOPDOWN |
> +					 IGT_MMAP_MIGRATE_FILL |
> +					 IGT_MMAP_MIGRATE_EVICTABLE |
> +					 IGT_MMAP_MIGRATE_FAIL_GPU |
> +					 IGT_MMAP_MIGRATE_UNFAULTABLE);
>   out_io_size:
>   		mr->io_size = saved_io_size;
>   		i915_ttm_buddy_man_force_visible_size(man,
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 5d5828b9a242..43339ecabd73 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -310,7 +310,7 @@ struct i915_vma_work {
>   	struct i915_address_space *vm;
>   	struct i915_vm_pt_stash stash;
>   	struct i915_vma_resource *vma_res;
> -	struct drm_i915_gem_object *pinned;
> +	struct drm_i915_gem_object *obj;
>   	struct i915_sw_dma_fence_cb cb;
>   	enum i915_cache_level cache_level;
>   	unsigned int flags;
> @@ -321,17 +321,25 @@ static void __vma_bind(struct dma_fence_work *work)
>   	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
>   	struct i915_vma_resource *vma_res = vw->vma_res;
>   
> +	/*
> +	 * We are about to bind the object, which must mean we have already
> +	 * signaled the work to potentially clear/move the pages underneath. If
> +	 * something went wrong at that stage then the object should have
> +	 * unknown_state set, in which case we need to skip the bind.
> +	 */
> +	if (i915_gem_object_has_unknown_state(vw->obj))
> +		return;
> +
>   	vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
>   			       vma_res, vw->cache_level, vw->flags);
> -
>   }
>   
>   static void __vma_release(struct dma_fence_work *work)
>   {
>   	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
>   
> -	if (vw->pinned)
> -		i915_gem_object_put(vw->pinned);
> +	if (vw->obj)
> +		i915_gem_object_put(vw->obj);
>   
>   	i915_vm_free_pt_stash(vw->vm, &vw->stash);
>   	if (vw->vma_res)
> @@ -517,14 +525,7 @@ int i915_vma_bind(struct i915_vma *vma,
>   		}
>   
>   		work->base.dma.error = 0; /* enable the queue_work() */
> -
> -		/*
> -		 * If we don't have the refcounted pages list, keep a reference
> -		 * on the object to avoid waiting for the async bind to
> -		 * complete in the object destruction path.
> -		 */
> -		if (!work->vma_res->bi.pages_rsgt)
> -			work->pinned = i915_gem_object_get(vma->obj);
> +		work->obj = i915_gem_object_get(vma->obj);
>   	} else {
>   		ret = i915_gem_object_wait_moving_fence(vma->obj, true);
>   		if (ret) {

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 13/13] drm/i915: turn on small BAR support
  2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 16:16     ` Thomas Hellström
  -1 siblings, 0 replies; 48+ messages in thread
From: Thomas Hellström @ 2022-06-29 16:16 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx
  Cc: Tvrtko Ursulin, Jordan Justen, Lionel Landwerlin,
	Kenneth Graunke, Jon Bloomfield, dri-devel, Daniel Vetter,
	Akeem G Abodunrin


On 6/29/22 14:14, Matthew Auld wrote:
> With the uAPI in place we should now have enough in place to ensure a
> working system on small BAR configurations.
>
> v2: (Nirmoy & Thomas):
>    - s/full BAR/Resizable BAR/ which is hopefully more easily
>      understood by users.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_region_lmem.c | 10 ++++------
>   1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> index d09b996a9759..fa7b86f83e7b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> @@ -112,12 +112,6 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
>   		flat_ccs_base = intel_gt_mcr_read_any(gt, XEHP_FLAT_CCS_BASE_ADDR);
>   		flat_ccs_base = (flat_ccs_base >> XEHP_CCS_BASE_SHIFT) * SZ_64K;
>   
> -		/* FIXME: Remove this when we have small-bar enabled */
> -		if (pci_resource_len(pdev, 2) < lmem_size) {
> -			drm_err(&i915->drm, "System requires small-BAR support, which is currently unsupported on this kernel\n");
> -			return ERR_PTR(-EINVAL);
> -		}
> -
>   		if (GEM_WARN_ON(lmem_size < flat_ccs_base))
>   			return ERR_PTR(-EIO);
>   
> @@ -170,6 +164,10 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
>   	drm_info(&i915->drm, "Local memory available: %pa\n",
>   		 &lmem_size);
>   
> +	if (io_size < lmem_size)
> +		drm_info(&i915->drm, "Using a reduced BAR size of %lluMiB. Consider enabling 'Resizable BAR' or similar, if available in the BIOS.\n",
> +			 (u64)io_size >> 20);
> +

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx] [PATCH v3 13/13] drm/i915: turn on small BAR support
@ 2022-06-29 16:16     ` Thomas Hellström
  0 siblings, 0 replies; 48+ messages in thread
From: Thomas Hellström @ 2022-06-29 16:16 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx; +Cc: Kenneth Graunke, dri-devel, Daniel Vetter


On 6/29/22 14:14, Matthew Auld wrote:
> With the uAPI in place we should now have enough in place to ensure a
> working system on small BAR configurations.
>
> v2: (Nirmoy & Thomas):
>    - s/full BAR/Resizable BAR/ which is hopefully more easily
>      understood by users.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_region_lmem.c | 10 ++++------
>   1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> index d09b996a9759..fa7b86f83e7b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> @@ -112,12 +112,6 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
>   		flat_ccs_base = intel_gt_mcr_read_any(gt, XEHP_FLAT_CCS_BASE_ADDR);
>   		flat_ccs_base = (flat_ccs_base >> XEHP_CCS_BASE_SHIFT) * SZ_64K;
>   
> -		/* FIXME: Remove this when we have small-bar enabled */
> -		if (pci_resource_len(pdev, 2) < lmem_size) {
> -			drm_err(&i915->drm, "System requires small-BAR support, which is currently unsupported on this kernel\n");
> -			return ERR_PTR(-EINVAL);
> -		}
> -
>   		if (GEM_WARN_ON(lmem_size < flat_ccs_base))
>   			return ERR_PTR(-EIO);
>   
> @@ -170,6 +164,10 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
>   	drm_info(&i915->drm, "Local memory available: %pa\n",
>   		 &lmem_size);
>   
> +	if (io_size < lmem_size)
> +		drm_info(&i915->drm, "Using a reduced BAR size of %lluMiB. Consider enabling 'Resizable BAR' or similar, if available in the BIOS.\n",
> +			 (u64)io_size >> 20);
> +

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 09/13] drm/i915/selftests: skip the mman tests for stolen
  2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 16:22     ` Thomas Hellström
  -1 siblings, 0 replies; 48+ messages in thread
From: Thomas Hellström @ 2022-06-29 16:22 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx
  Cc: Tvrtko Ursulin, Jordan Justen, Lionel Landwerlin,
	Kenneth Graunke, Jon Bloomfield, dri-devel, Daniel Vetter,
	Akeem G Abodunrin


On 6/29/22 14:14, Matthew Auld wrote:
> It's not supported, and just skips later anyway. With small-BAR things
> get more complicated since all of stolen is likely not even CPU
> accessible, hence not passing I915_BO_ALLOC_GPU_ONLY just results in the
> object create failing.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)

This reminds me,

Is there a problem for fbdev (and hence things like plymouth) if the 
initial fbdev image ends up as a stolen memory object which in turn ends 
up not being mappable? I remember we discussed this before but can't 
recall what the answer was.

Anyway, for this patch

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>





>
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> index 5bc93a1ce3e3..388c85b0f764 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -979,6 +979,9 @@ static int igt_mmap(void *arg)
>   		};
>   		int i;
>   
> +		if (mr->private)
> +			continue;
> +
>   		for (i = 0; i < ARRAY_SIZE(sizes); i++) {
>   			struct drm_i915_gem_object *obj;
>   			int err;
> @@ -1435,6 +1438,9 @@ static int igt_mmap_access(void *arg)
>   		struct drm_i915_gem_object *obj;
>   		int err;
>   
> +		if (mr->private)
> +			continue;
> +
>   		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
>   		if (obj == ERR_PTR(-ENODEV))
>   			continue;
> @@ -1580,6 +1586,9 @@ static int igt_mmap_gpu(void *arg)
>   		struct drm_i915_gem_object *obj;
>   		int err;
>   
> +		if (mr->private)
> +			continue;
> +
>   		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
>   		if (obj == ERR_PTR(-ENODEV))
>   			continue;
> @@ -1727,6 +1736,9 @@ static int igt_mmap_revoke(void *arg)
>   		struct drm_i915_gem_object *obj;
>   		int err;
>   
> +		if (mr->private)
> +			continue;
> +
>   		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
>   		if (obj == ERR_PTR(-ENODEV))
>   			continue;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx] [PATCH v3 09/13] drm/i915/selftests: skip the mman tests for stolen
@ 2022-06-29 16:22     ` Thomas Hellström
  0 siblings, 0 replies; 48+ messages in thread
From: Thomas Hellström @ 2022-06-29 16:22 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx; +Cc: Kenneth Graunke, dri-devel, Daniel Vetter


On 6/29/22 14:14, Matthew Auld wrote:
> It's not supported, and just skips later anyway. With small-BAR things
> get more complicated since all of stolen is likely not even CPU
> accessible, hence not passing I915_BO_ALLOC_GPU_ONLY just results in the
> object create failing.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)

This reminds me,

Is there a problem for fbdev (and hence things like plymouth) if the 
initial fbdev image ends up as a stolen memory object which in turn ends 
up not being mappable? I remember we discussed this before but can't 
recall what the answer was.

Anyway, for this patch

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>





>
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> index 5bc93a1ce3e3..388c85b0f764 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -979,6 +979,9 @@ static int igt_mmap(void *arg)
>   		};
>   		int i;
>   
> +		if (mr->private)
> +			continue;
> +
>   		for (i = 0; i < ARRAY_SIZE(sizes); i++) {
>   			struct drm_i915_gem_object *obj;
>   			int err;
> @@ -1435,6 +1438,9 @@ static int igt_mmap_access(void *arg)
>   		struct drm_i915_gem_object *obj;
>   		int err;
>   
> +		if (mr->private)
> +			continue;
> +
>   		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
>   		if (obj == ERR_PTR(-ENODEV))
>   			continue;
> @@ -1580,6 +1586,9 @@ static int igt_mmap_gpu(void *arg)
>   		struct drm_i915_gem_object *obj;
>   		int err;
>   
> +		if (mr->private)
> +			continue;
> +
>   		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
>   		if (obj == ERR_PTR(-ENODEV))
>   			continue;
> @@ -1727,6 +1736,9 @@ static int igt_mmap_revoke(void *arg)
>   		struct drm_i915_gem_object *obj;
>   		int err;
>   
> +		if (mr->private)
> +			continue;
> +
>   		obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
>   		if (obj == ERR_PTR(-ENODEV))
>   			continue;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 11/13] drm/i915/ttm: handle blitter failure on DG2
  2022-06-29 16:11     ` [Intel-gfx] " Thomas Hellström
@ 2022-06-29 16:28       ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 16:28 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx
  Cc: Tvrtko Ursulin, Jordan Justen, Lionel Landwerlin,
	Kenneth Graunke, Jon Bloomfield, dri-devel, Daniel Vetter,
	Akeem G Abodunrin

On 29/06/2022 17:11, Thomas Hellström wrote:
> Hi, Matthew,
> 
> On 6/29/22 14:14, Matthew Auld wrote:
>> If the move or clear operation somehow fails, and the memory underneath
>> is not cleared, like when moving to lmem, then we currently fallback to
>> memcpy or memset. However with small-BAR systems this fallback might no
>> longer be possible. For now we use the set_wedged sledgehammer if we
>> ever encounter such a scenario, and mark the object as borked to plug
>> any holes where access to the memory underneath can happen. Add some
>> basic selftests to exercise this.
>>
>> v2:
>>    - In the selftests make sure we grab the runtime pm around the reset.
>>      Also make sure we grab the reset lock before checking if the device
>>      is wedged, since the wedge might still be in-progress and hence the
>>      bit might not be set yet.
>>    - Don't wedge or put the object into an unknown state, if the request
>>      construction fails (or similar). Just returning an error and
>>      skipping the fallback should be safe here.
>>    - Make sure we wedge each gt. (Thomas)
>>    - Peek at the unknown_state in io_reserve, that way we don't have to
>>      export or hand roll the fault_wait_for_idle. (Thomas)
>>    - Add the missing read-side barriers for the unknown_state. (Thomas)
>>    - Some kernel-doc fixes. (Thomas)
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Kenneth Graunke <kenneth@whitecape.org>
>> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    |  21 +++
>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |   1 +
>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |  18 +++
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  26 +++-
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   3 +
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  88 +++++++++--
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h  |   1 +
>>   .../drm/i915/gem/selftests/i915_gem_migrate.c | 141 +++++++++++++++---
>>   .../drm/i915/gem/selftests/i915_gem_mman.c    |  69 +++++++++
>>   drivers/gpu/drm/i915/i915_vma.c               |  25 ++--
>>   10 files changed, 346 insertions(+), 47 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 06b1b188ce5a..642a5d59ce26 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -783,10 +783,31 @@ int i915_gem_object_wait_moving_fence(struct 
>> drm_i915_gem_object *obj,
>>                       intr, MAX_SCHEDULE_TIMEOUT);
>>       if (!ret)
>>           ret = -ETIME;
>> +    else if (ret > 0 && i915_gem_object_has_unknown_state(obj))
>> +        ret = -EIO;
>>       return ret < 0 ? ret : 0;
>>   }
>> +/**
>> + * i915_gem_object_has_unknown_state - Return true if the object 
>> backing pages are
>> + * in an unknown_state. This means that userspace must NEVER be 
>> allowed to touch
>> + * the pages, with either the GPU or CPU.
>> + *
>> + * ONLY valid to be called after ensuring that all kernel fences have 
>> signalled
>> + * (in particular the fence for moving/clearing the object).
>> + */
>> +bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj)
>> +{
>> +    /*
>> +     * The below barrier pairs with the dma_fence_signal() in
>> +     * __memcpy_work(). We should only sample the unknown_state after 
>> all
>> +     * the kernel fences have signalled.
>> +     */
>> +    smp_rmb();
>> +    return obj->mm.unknown_state;
>> +}
>> +
>>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>   #include "selftests/huge_gem_object.c"
>>   #include "selftests/huge_pages.c"
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> index e11d82a9f7c3..0bf3ee27a2a8 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> @@ -524,6 +524,7 @@ int i915_gem_object_get_moving_fence(struct 
>> drm_i915_gem_object *obj,
>>                        struct dma_fence **fence);
>>   int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>                         bool intr);
>> +bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object 
>> *obj,
>>                        unsigned int cache_level);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> index 2c88bdb8ff7c..5cf36a130061 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> @@ -547,6 +547,24 @@ struct drm_i915_gem_object {
>>            */
>>           bool ttm_shrinkable;
>> +        /**
>> +         * @unknown_state: Indicate that the object is effectively
>> +         * borked. This is write-once and set if we somehow encounter a
>> +         * fatal error when moving/clearing the pages, and we are not
>> +         * able to fallback to memcpy/memset, like on small-BAR systems.
>> +         * The GPU should also be wedged (or in the process) at this
>> +         * point.
>> +         *
>> +         * Only valid to read this after acquiring the dma-resv lock and
>> +         * waiting for all DMA_RESV_USAGE_KERNEL fences to be signalled,
>> +         * or if we otherwise know that the moving fence has signalled,
>> +         * and we are certain the pages underneath are valid for
>> +         * immediate access (under normal operation), like just prior to
>> +         * binding the object or when setting up the CPU fault handler.
>> +         * See i915_gem_object_has_unknown_state();
>> +         */
>> +        bool unknown_state;
>> +
>>           /**
>>            * Priority list of potential placements for this object.
>>            */
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> index 4c25d9b2f138..098409a33e10 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> @@ -675,7 +675,15 @@ static void i915_ttm_swap_notify(struct 
>> ttm_buffer_object *bo)
>>           i915_ttm_purge(obj);
>>   }
>> -static bool i915_ttm_resource_mappable(struct ttm_resource *res)
>> +/**
>> + * i915_ttm_resource_mappable - Return true if the ttm resource is CPU
>> + * accessible.
>> + * @res: The TTM resource to check.
>> + *
>> + * This is interesting on small-BAR systems where we may encounter 
>> lmem objects
>> + * that can't be accessed via the CPU.
>> + */
>> +bool i915_ttm_resource_mappable(struct ttm_resource *res)
>>   {
>>       struct i915_ttm_buddy_resource *bman_res = 
>> to_ttm_buddy_resource(res);
>> @@ -687,6 +695,22 @@ static bool i915_ttm_resource_mappable(struct 
>> ttm_resource *res)
>>   static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct 
>> ttm_resource *mem)
>>   {
>> +    struct drm_i915_gem_object *obj = i915_ttm_to_gem(mem->bo);
>> +    bool unknown_state;
>> +
>> +    if (!obj)
>> +        return -EINVAL;
>> +
>> +    if (!kref_get_unless_zero(&obj->base.refcount))
>> +        return -EINVAL;
>> +
>> +    assert_object_held(obj);
>> +
>> +    unknown_state = i915_gem_object_has_unknown_state(obj);
>> +    i915_gem_object_put(obj);
>> +    if (unknown_state)
>> +        return -EINVAL;
>> +
>>       if (!i915_ttm_cpu_maps_iomem(mem))
>>           return 0;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h 
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
>> index 73e371aa3850..e4842b4296fc 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
>> @@ -92,4 +92,7 @@ static inline bool i915_ttm_cpu_maps_iomem(struct 
>> ttm_resource *mem)
>>       /* Once / if we support GGTT, this is also false for cached 
>> ttm_tts */
>>       return mem->mem_type != I915_PL_SYSTEM;
>>   }
>> +
>> +bool i915_ttm_resource_mappable(struct ttm_resource *res);
>> +
>>   #endif
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> index a10716f4e717..364e7fe8efb1 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> @@ -33,6 +33,7 @@
>>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>   static bool fail_gpu_migration;
>>   static bool fail_work_allocation;
>> +static bool ban_memcpy;
>>   void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>>                       bool work_allocation)
>> @@ -40,6 +41,11 @@ void i915_ttm_migrate_set_failure_modes(bool 
>> gpu_migration,
>>       fail_gpu_migration = gpu_migration;
>>       fail_work_allocation = work_allocation;
>>   }
>> +
>> +void i915_ttm_migrate_set_ban_memcpy(bool ban)
>> +{
>> +    ban_memcpy = ban;
>> +}
>>   #endif
>>   static enum i915_cache_level
>> @@ -258,15 +264,23 @@ struct i915_ttm_memcpy_arg {
>>    * from the callback for lockdep reasons.
>>    * @cb: Callback for the accelerated migration fence.
>>    * @arg: The argument for the memcpy functionality.
>> + * @i915: The i915 pointer.
>> + * @obj: The GEM object.
>> + * @memcpy_allowed: Instead of processing the @arg, and falling back 
>> to memcpy
>> + * or memset, we wedge the device and set the @obj unknown_state, to 
>> prevent
>> + * further access to the object with the CPU or GPU.  On some devices 
>> we might
>> + * only be permitted to use the blitter engine for such operations.
>>    */
>>   struct i915_ttm_memcpy_work {
>>       struct dma_fence fence;
>>       struct work_struct work;
>> -    /* The fence lock */
>>       spinlock_t lock;
>>       struct irq_work irq_work;
>>       struct dma_fence_cb cb;
>>       struct i915_ttm_memcpy_arg arg;
>> +    struct drm_i915_private *i915;
>> +    struct drm_i915_gem_object *obj;
>> +    bool memcpy_allowed;
>>   };
>>   static void i915_ttm_move_memcpy(struct i915_ttm_memcpy_arg *arg)
>> @@ -319,12 +333,34 @@ static void __memcpy_work(struct work_struct *work)
>>       struct i915_ttm_memcpy_arg *arg = &copy_work->arg;
>>       bool cookie = dma_fence_begin_signalling();
>> -    i915_ttm_move_memcpy(arg);
>> +    if (copy_work->memcpy_allowed) {
>> +        i915_ttm_move_memcpy(arg);
>> +    } else {
>> +        /*
>> +         * Prevent further use of the object. Any future GTT binding or
>> +         * CPU access is not allowed once we signal the fence. Outside
>> +         * of the fence critical section, we then also wedge the
>> gpu
>> +         * to indicate the device is not functional.
>> +         *
>> +         * The below dma_fence_signal() is our write-memory-barrier.
>> +         */
>> +        copy_work->obj->mm.unknown_state = true;
>> +    }
>> +
>>       dma_fence_end_signalling(cookie);
>>       dma_fence_signal(&copy_work->fence);
>> +    if (!copy_work->memcpy_allowed) {
>> +        struct intel_gt *gt;
>> +        unsigned int id;
>> +
>> +        for_each_gt(gt, copy_work->i915, id)
>> +            intel_gt_set_wedged(gt);
>> +    }
> 
> Did you try to move the gt wedging to before dma_fence_signal, but 
> before dma_fence_end_signalling? Otherwise I think there is a race 
> opportunity (albeit very unlikely) where the gpu might read 
> uninitialized content.

I didn't try moving the set_wedged. But here AFAIK the wedge is not 
needed to prevent GPU access to the pages, more just to indicate that 
something is very broken. Prior to binding the object, either for the 
sync or async case (which must be after we signalled the clear/move here 
I think) we always first sample the unknown_state, and just keep the 
PTEs pointing to scratch if it has been set.

> 
> With that fixed (or not working)
> 
> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> 
>> +
>>       i915_ttm_memcpy_release(arg);
>> +    i915_gem_object_put(copy_work->obj);
>>       dma_fence_put(&copy_work->fence);
>>   }
>> @@ -336,6 +372,7 @@ static void __memcpy_irq_work(struct irq_work 
>> *irq_work)
>>       dma_fence_signal(&copy_work->fence);
>>       i915_ttm_memcpy_release(arg);
>> +    i915_gem_object_put(copy_work->obj);
>>       dma_fence_put(&copy_work->fence);
>>   }
>> @@ -389,6 +426,16 @@ i915_ttm_memcpy_work_arm(struct 
>> i915_ttm_memcpy_work *work,
>>       return &work->fence;
>>   }
>> +static bool i915_ttm_memcpy_allowed(struct ttm_buffer_object *bo,
>> +                    struct ttm_resource *dst_mem)
>> +{
>> +    if (!(i915_ttm_resource_mappable(bo->resource) &&
>> +          i915_ttm_resource_mappable(dst_mem)))
>> +        return false;
>> +
>> +    return I915_SELFTEST_ONLY(ban_memcpy) ? false : true;
>> +}
>> +
>>   static struct dma_fence *
>>   __i915_ttm_move(struct ttm_buffer_object *bo,
>>           const struct ttm_operation_ctx *ctx, bool clear,
>> @@ -396,6 +443,9 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>>           struct i915_refct_sgt *dst_rsgt, bool allow_accel,
>>           const struct i915_deps *move_deps)
>>   {
>> +    const bool memcpy_allowed = i915_ttm_memcpy_allowed(bo, dst_mem);
>> +    struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>> +    struct drm_i915_private *i915 = to_i915(bo->base.dev);
>>       struct i915_ttm_memcpy_work *copy_work = NULL;
>>       struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
>>       struct dma_fence *fence = ERR_PTR(-EINVAL);
>> @@ -423,9 +473,14 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>>               copy_work = kzalloc(sizeof(*copy_work), GFP_KERNEL);
>>           if (copy_work) {
>> +            copy_work->i915 = i915;
>> +            copy_work->memcpy_allowed = memcpy_allowed;
>> +            copy_work->obj = i915_gem_object_get(obj);
>>               arg = &copy_work->arg;
>> -            i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
>> -                         dst_rsgt);
>> +            if (memcpy_allowed)
>> +                i915_ttm_memcpy_init(arg, bo, clear, dst_mem,
>> +                             dst_ttm, dst_rsgt);
>> +
>>               fence = i915_ttm_memcpy_work_arm(copy_work, dep);
>>           } else {
>>               dma_fence_wait(dep, false);
>> @@ -450,17 +505,23 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>>       }
>>       /* Error intercept failed or no accelerated migration to start 
>> with */
>> -    if (!copy_work)
>> -        i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
>> -                     dst_rsgt);
>> -    i915_ttm_move_memcpy(arg);
>> -    i915_ttm_memcpy_release(arg);
>> +
>> +    if (memcpy_allowed) {
>> +        if (!copy_work)
>> +            i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
>> +                         dst_rsgt);
>> +        i915_ttm_move_memcpy(arg);
>> +        i915_ttm_memcpy_release(arg);
>> +    }
>> +    if (copy_work)
>> +        i915_gem_object_put(copy_work->obj);
>>       kfree(copy_work);
>> -    return NULL;
>> +    return memcpy_allowed ? NULL : ERR_PTR(-EIO);
>>   out:
>>       if (!fence && copy_work) {
>>           i915_ttm_memcpy_release(arg);
>> +        i915_gem_object_put(copy_work->obj);
>>           kfree(copy_work);
>>       }
>> @@ -539,8 +600,11 @@ int i915_ttm_move(struct ttm_buffer_object *bo, 
>> bool evict,
>>       }
>>       if (migration_fence) {
>> -        ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
>> -                        true, dst_mem);
>> +        if (I915_SELFTEST_ONLY(evict && fail_gpu_migration))
>> +            ret = -EIO; /* never feed non-migrate fences into ttm */
>> +        else
>> +            ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
>> +                            true, dst_mem);
>>           if (ret) {
>>               dma_fence_wait(migration_fence, false);
>>               ttm_bo_move_sync_cleanup(bo, dst_mem);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h 
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
>> index d2e7f149e05c..8a5d5ab0cc34 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
>> @@ -22,6 +22,7 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo);
>>   I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool 
>> gpu_migration,
>>                                     bool work_allocation));
>> +I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_ban_memcpy(bool ban));
>>   int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
>>                 struct drm_i915_gem_object *src,
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c 
>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>> index 801af51aff62..fe6c37fd7859 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>> @@ -9,6 +9,7 @@
>>   #include "i915_deps.h"
>> +#include "selftests/igt_reset.h"
>>   #include "selftests/igt_spinner.h"
>>   static int igt_fill_check_buffer(struct drm_i915_gem_object *obj,
>> @@ -109,7 +110,8 @@ static int igt_same_create_migrate(void *arg)
>>   static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
>>                     struct drm_i915_gem_object *obj,
>> -                  struct i915_vma *vma)
>> +                  struct i915_vma *vma,
>> +                  bool silent_migrate)
>>   {
>>       int err;
>> @@ -138,7 +140,8 @@ static int lmem_pages_migrate_one(struct 
>> i915_gem_ww_ctx *ww,
>>       if (i915_gem_object_is_lmem(obj)) {
>>           err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM);
>>           if (err) {
>> -            pr_err("Object failed migration to smem\n");
>> +            if (!silent_migrate)
>> +                pr_err("Object failed migration to smem\n");
>>               if (err)
>>                   return err;
>>           }
>> @@ -156,7 +159,8 @@ static int lmem_pages_migrate_one(struct 
>> i915_gem_ww_ctx *ww,
>>       } else {
>>           err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM_0);
>>           if (err) {
>> -            pr_err("Object failed migration to lmem\n");
>> +            if (!silent_migrate)
>> +                pr_err("Object failed migration to lmem\n");
>>               if (err)
>>                   return err;
>>           }
>> @@ -179,7 +183,8 @@ static int __igt_lmem_pages_migrate(struct 
>> intel_gt *gt,
>>                       struct i915_address_space *vm,
>>                       struct i915_deps *deps,
>>                       struct igt_spinner *spin,
>> -                    struct dma_fence *spin_fence)
>> +                    struct dma_fence *spin_fence,
>> +                    bool borked_migrate)
>>   {
>>       struct drm_i915_private *i915 = gt->i915;
>>       struct drm_i915_gem_object *obj;
>> @@ -242,7 +247,8 @@ static int __igt_lmem_pages_migrate(struct 
>> intel_gt *gt,
>>        */
>>       for (i = 1; i <= 5; ++i) {
>>           for_i915_gem_ww(&ww, err, true)
>> -            err = lmem_pages_migrate_one(&ww, obj, vma);
>> +            err = lmem_pages_migrate_one(&ww, obj, vma,
>> +                             borked_migrate);
>>           if (err)
>>               goto out_put;
>>       }
>> @@ -283,23 +289,70 @@ static int __igt_lmem_pages_migrate(struct 
>> intel_gt *gt,
>>   static int igt_lmem_pages_failsafe_migrate(void *arg)
>>   {
>> -    int fail_gpu, fail_alloc, ret;
>> +    int fail_gpu, fail_alloc, ban_memcpy, ret;
>>       struct intel_gt *gt = arg;
>>       for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
>>           for (fail_alloc = 0; fail_alloc < 2; ++fail_alloc) {
>> -            pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
>> -                fail_gpu, fail_alloc);
>> -            i915_ttm_migrate_set_failure_modes(fail_gpu,
>> -                               fail_alloc);
>> -            ret = __igt_lmem_pages_migrate(gt, NULL, NULL, NULL, NULL);
>> -            if (ret)
>> -                goto out_err;
>> +            for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
>> +                pr_info("Simulated failure modes: gpu: %d, alloc:%d, 
>> ban_memcpy: %d\n",
>> +                    fail_gpu, fail_alloc, ban_memcpy);
>> +                i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
>> +                i915_ttm_migrate_set_failure_modes(fail_gpu,
>> +                                   fail_alloc);
>> +                ret = __igt_lmem_pages_migrate(gt, NULL, NULL,
>> +                                   NULL, NULL,
>> +                                   ban_memcpy &&
>> +                                   fail_gpu);
>> +
>> +                if (ban_memcpy && fail_gpu) {
>> +                    struct intel_gt *__gt;
>> +                    unsigned int id;
>> +
>> +                    if (ret != -EIO) {
>> +                        pr_err("expected -EIO, got (%d)\n", ret);
>> +                        ret = -EINVAL;
>> +                    } else {
>> +                        ret = 0;
>> +                    }
>> +
>> +                    for_each_gt(__gt, gt->i915, id) {
>> +                        intel_wakeref_t wakeref;
>> +                        bool wedged;
>> +
>> +                        mutex_lock(&__gt->reset.mutex);
>> +                        wedged = test_bit(I915_WEDGED, 
>> &__gt->reset.flags);
>> +                        mutex_unlock(&__gt->reset.mutex);
>> +
>> +                        if (fail_gpu && !fail_alloc) {
>> +                            if (!wedged) {
>> +                                pr_err("gt(%u) not wedged\n", id);
>> +                                ret = -EINVAL;
>> +                                continue;
>> +                            }
>> +                        } else if (wedged) {
>> +                            pr_err("gt(%u) incorrectly wedged\n", id);
>> +                            ret = -EINVAL;
>> +                        } else {
>> +                            continue;
>> +                        }
>> +
>> +                        wakeref = 
>> intel_runtime_pm_get(__gt->uncore->rpm);
>> +                        igt_global_reset_lock(__gt);
>> +                        intel_gt_reset(__gt, ALL_ENGINES, NULL);
>> +                        igt_global_reset_unlock(__gt);
>> +                        intel_runtime_pm_put(__gt->uncore->rpm, 
>> wakeref);
>> +                    }
>> +                    if (ret)
>> +                        goto out_err;
>> +                }
>> +            }
>>           }
>>       }
>>   out_err:
>>       i915_ttm_migrate_set_failure_modes(false, false);
>> +    i915_ttm_migrate_set_ban_memcpy(false);
>>       return ret;
>>   }
>> @@ -370,7 +423,7 @@ static int igt_async_migrate(struct intel_gt *gt)
>>               goto out_ce;
>>           err = __igt_lmem_pages_migrate(gt, &ppgtt->vm, &deps, &spin,
>> -                           spin_fence);
>> +                           spin_fence, false);
>>           i915_deps_fini(&deps);
>>           dma_fence_put(spin_fence);
>>           if (err)
>> @@ -394,23 +447,67 @@ static int igt_async_migrate(struct intel_gt *gt)
>>   #define ASYNC_FAIL_ALLOC 1
>>   static int igt_lmem_async_migrate(void *arg)
>>   {
>> -    int fail_gpu, fail_alloc, ret;
>> +    int fail_gpu, fail_alloc, ban_memcpy, ret;
>>       struct intel_gt *gt = arg;
>>       for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
>>           for (fail_alloc = 0; fail_alloc < ASYNC_FAIL_ALLOC; 
>> ++fail_alloc) {
>> -            pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
>> -                fail_gpu, fail_alloc);
>> -            i915_ttm_migrate_set_failure_modes(fail_gpu,
>> -                               fail_alloc);
>> -            ret = igt_async_migrate(gt);
>> -            if (ret)
>> -                goto out_err;
>> +            for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
>> +                pr_info("Simulated failure modes: gpu: %d, alloc: %d, 
>> ban_memcpy: %d\n",
>> +                    fail_gpu, fail_alloc, ban_memcpy);
>> +                i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
>> +                i915_ttm_migrate_set_failure_modes(fail_gpu,
>> +                                   fail_alloc);
>> +                ret = igt_async_migrate(gt);
>> +
>> +                if (fail_gpu && ban_memcpy) {
>> +                    struct intel_gt *__gt;
>> +                    unsigned int id;
>> +
>> +                    if (ret != -EIO) {
>> +                        pr_err("expected -EIO, got (%d)\n", ret);
>> +                        ret = -EINVAL;
>> +                    } else {
>> +                        ret = 0;
>> +                    }
>> +
>> +                    for_each_gt(__gt, gt->i915, id) {
>> +                        intel_wakeref_t wakeref;
>> +                        bool wedged;
>> +
>> +                        mutex_lock(&__gt->reset.mutex);
>> +                        wedged = test_bit(I915_WEDGED, 
>> &__gt->reset.flags);
>> +                        mutex_unlock(&__gt->reset.mutex);
>> +
>> +                        if (fail_gpu && !fail_alloc) {
>> +                            if (!wedged) {
>> +                                pr_err("gt(%u) not wedged\n", id);
>> +                                ret = -EINVAL;
>> +                                continue;
>> +                            }
>> +                        } else if (wedged) {
>> +                            pr_err("gt(%u) incorrectly wedged\n", id);
>> +                            ret = -EINVAL;
>> +                        } else {
>> +                            continue;
>> +                        }
>> +
>> +                        wakeref = 
>> intel_runtime_pm_get(__gt->uncore->rpm);
>> +                        igt_global_reset_lock(__gt);
>> +                        intel_gt_reset(__gt, ALL_ENGINES, NULL);
>> +                        igt_global_reset_unlock(__gt);
>> +                        intel_runtime_pm_put(__gt->uncore->rpm, 
>> wakeref);
>> +                    }
>> +                }
>> +                if (ret)
>> +                    goto out_err;
>> +            }
>>           }
>>       }
>>   out_err:
>>       i915_ttm_migrate_set_failure_modes(false, false);
>> +    i915_ttm_migrate_set_ban_memcpy(false);
>>       return ret;
>>   }
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> index da28acb78a88..3ced9948a331 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> @@ -10,6 +10,7 @@
>>   #include "gem/i915_gem_internal.h"
>>   #include "gem/i915_gem_region.h"
>>   #include "gem/i915_gem_ttm.h"
>> +#include "gem/i915_gem_ttm_move.h"
>>   #include "gt/intel_engine_pm.h"
>>   #include "gt/intel_gpu_commands.h"
>>   #include "gt/intel_gt.h"
>> @@ -21,6 +22,7 @@
>>   #include "i915_selftest.h"
>>   #include "selftests/i915_random.h"
>>   #include "selftests/igt_flush_test.h"
>> +#include "selftests/igt_reset.h"
>>   #include "selftests/igt_mmap.h"
>>   struct tile {
>> @@ -1163,6 +1165,7 @@ static int ___igt_mmap_migrate(struct 
>> drm_i915_private *i915,
>>   #define IGT_MMAP_MIGRATE_FILL        (1 << 1)
>>   #define IGT_MMAP_MIGRATE_EVICTABLE   (1 << 2)
>>   #define IGT_MMAP_MIGRATE_UNFAULTABLE (1 << 3)
>> +#define IGT_MMAP_MIGRATE_FAIL_GPU    (1 << 4)
>>   static int __igt_mmap_migrate(struct intel_memory_region **placements,
>>                     int n_placements,
>>                     struct intel_memory_region *expected_mr,
>> @@ -1237,13 +1240,62 @@ static int __igt_mmap_migrate(struct 
>> intel_memory_region **placements,
>>       if (flags & IGT_MMAP_MIGRATE_EVICTABLE)
>>           igt_make_evictable(&objects);
>> +    if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
>> +        err = i915_gem_object_lock(obj, NULL);
>> +        if (err)
>> +            goto out_put;
>> +
>> +        /*
>> +         * Ensure we only simulate the gpu failure when faulting the
>> +         * pages.
>> +         */
>> +        err = i915_gem_object_wait_moving_fence(obj, true);
>> +        i915_gem_object_unlock(obj);
>> +        if (err)
>> +            goto out_put;
>> +        i915_ttm_migrate_set_failure_modes(true, false);
>> +    }
>> +
>>       err = ___igt_mmap_migrate(i915, obj, addr,
>>                     flags & IGT_MMAP_MIGRATE_UNFAULTABLE);
>> +
>>       if (!err && obj->mm.region != expected_mr) {
>>           pr_err("%s region mismatch %s\n", __func__, expected_mr->name);
>>           err = -EINVAL;
>>       }
>> +    if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
>> +        struct intel_gt *gt;
>> +        unsigned int id;
>> +
>> +        i915_ttm_migrate_set_failure_modes(false, false);
>> +
>> +        for_each_gt(gt, i915, id) {
>> +            intel_wakeref_t wakeref;
>> +            bool wedged;
>> +
>> +            mutex_lock(&gt->reset.mutex);
>> +            wedged = test_bit(I915_WEDGED, &gt->reset.flags);
>> +            mutex_unlock(&gt->reset.mutex);
>> +            if (!wedged) {
>> +                pr_err("gt(%u) not wedged\n", id);
>> +                err = -EINVAL;
>> +                continue;
>> +            }
>> +
>> +            wakeref = intel_runtime_pm_get(gt->uncore->rpm);
>> +            igt_global_reset_lock(gt);
>> +            intel_gt_reset(gt, ALL_ENGINES, NULL);
>> +            igt_global_reset_unlock(gt);
>> +            intel_runtime_pm_put(gt->uncore->rpm, wakeref);
>> +        }
>> +
>> +        if (!i915_gem_object_has_unknown_state(obj)) {
>> +            pr_err("object missing unknown_state\n");
>> +            err = -EINVAL;
>> +        }
>> +    }
>> +
>>   out_put:
>>       i915_gem_object_put(obj);
>>       igt_close_objects(i915, &objects);
>> @@ -1324,6 +1376,23 @@ static int igt_mmap_migrate(void *arg)
>>                        IGT_MMAP_MIGRATE_TOPDOWN |
>>                        IGT_MMAP_MIGRATE_FILL |
>>                        IGT_MMAP_MIGRATE_UNFAULTABLE);
>> +        if (err)
>> +            goto out_io_size;
>> +
>> +        /*
>> +         * Allocate in the non-mappable portion, but force migrating to
>> +         * the mappable portion on fault (LMEM -> LMEM). We then also
>> +         * simulate a gpu error when moving the pages at fault time,
>> +         * which should result in wedging the gpu and returning
>> +         * SIGBUS in the fault handler, since we can't fallback to
>> +         * memcpy.
>> +         */
>> +        err = __igt_mmap_migrate(single, ARRAY_SIZE(single), mr,
>> +                     IGT_MMAP_MIGRATE_TOPDOWN |
>> +                     IGT_MMAP_MIGRATE_FILL |
>> +                     IGT_MMAP_MIGRATE_EVICTABLE |
>> +                     IGT_MMAP_MIGRATE_FAIL_GPU |
>> +                     IGT_MMAP_MIGRATE_UNFAULTABLE);
>>   out_io_size:
>>           mr->io_size = saved_io_size;
>>           i915_ttm_buddy_man_force_visible_size(man,
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>> b/drivers/gpu/drm/i915/i915_vma.c
>> index 5d5828b9a242..43339ecabd73 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -310,7 +310,7 @@ struct i915_vma_work {
>>       struct i915_address_space *vm;
>>       struct i915_vm_pt_stash stash;
>>       struct i915_vma_resource *vma_res;
>> -    struct drm_i915_gem_object *pinned;
>> +    struct drm_i915_gem_object *obj;
>>       struct i915_sw_dma_fence_cb cb;
>>       enum i915_cache_level cache_level;
>>       unsigned int flags;
>> @@ -321,17 +321,25 @@ static void __vma_bind(struct dma_fence_work *work)
>>       struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
>>       struct i915_vma_resource *vma_res = vw->vma_res;
>> +    /*
>> +     * We are about to bind the object, which must mean we have already
>> +     * signaled the work to potentially clear/move the pages 
>> underneath. If
>> +     * something went wrong at that stage then the object should have
>> +     * unknown_state set, in which case we need to skip the bind.
>> +     */
>> +    if (i915_gem_object_has_unknown_state(vw->obj))
>> +        return;
>> +
>>       vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
>>                      vma_res, vw->cache_level, vw->flags);
>> -
>>   }
>>   static void __vma_release(struct dma_fence_work *work)
>>   {
>>       struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
>> -    if (vw->pinned)
>> -        i915_gem_object_put(vw->pinned);
>> +    if (vw->obj)
>> +        i915_gem_object_put(vw->obj);
>>       i915_vm_free_pt_stash(vw->vm, &vw->stash);
>>       if (vw->vma_res)
>> @@ -517,14 +525,7 @@ int i915_vma_bind(struct i915_vma *vma,
>>           }
>>           work->base.dma.error = 0; /* enable the queue_work() */
>> -
>> -        /*
>> -         * If we don't have the refcounted pages list, keep a reference
>> -         * on the object to avoid waiting for the async bind to
>> -         * complete in the object destruction path.
>> -         */
>> -        if (!work->vma_res->bi.pages_rsgt)
>> -            work->pinned = i915_gem_object_get(vma->obj);
>> +        work->obj = i915_gem_object_get(vma->obj);
>>       } else {
>>           ret = i915_gem_object_wait_moving_fence(vma->obj, true);
>>           if (ret) {

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx] [PATCH v3 11/13] drm/i915/ttm: handle blitter failure on DG2
@ 2022-06-29 16:28       ` Matthew Auld
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 16:28 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx
  Cc: Kenneth Graunke, dri-devel, Daniel Vetter

On 29/06/2022 17:11, Thomas Hellström wrote:
> Hi, Matthew,
> 
> On 6/29/22 14:14, Matthew Auld wrote:
>> If the move or clear operation somehow fails, and the memory underneath
>> is not cleared, like when moving to lmem, then we currently fallback to
>> memcpy or memset. However with small-BAR systems this fallback might no
>> longer be possible. For now we use the set_wedged sledgehammer if we
>> ever encounter such a scenario, and mark the object as borked to plug
>> any holes where access to the memory underneath can happen. Add some
>> basic selftests to exercise this.
>>
>> v2:
>>    - In the selftests make sure we grab the runtime pm around the reset.
>>      Also make sure we grab the reset lock before checking if the device
>>      is wedged, since the wedge might still be in-progress and hence the
>>      bit might not be set yet.
>>    - Don't wedge or put the object into an unknown state, if the request
>>      construction fails (or similar). Just returning an error and
>>      skipping the fallback should be safe here.
>>    - Make sure we wedge each gt. (Thomas)
>>    - Peek at the unknown_state in io_reserve, that way we don't have to
>>      export or hand roll the fault_wait_for_idle. (Thomas)
>>    - Add the missing read-side barriers for the unknown_state. (Thomas)
>>    - Some kernel-doc fixes. (Thomas)
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Kenneth Graunke <kenneth@whitecape.org>
>> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    |  21 +++
>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |   1 +
>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |  18 +++
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  26 +++-
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   3 +
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  88 +++++++++--
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h  |   1 +
>>   .../drm/i915/gem/selftests/i915_gem_migrate.c | 141 +++++++++++++++---
>>   .../drm/i915/gem/selftests/i915_gem_mman.c    |  69 +++++++++
>>   drivers/gpu/drm/i915/i915_vma.c               |  25 ++--
>>   10 files changed, 346 insertions(+), 47 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 06b1b188ce5a..642a5d59ce26 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -783,10 +783,31 @@ int i915_gem_object_wait_moving_fence(struct 
>> drm_i915_gem_object *obj,
>>                       intr, MAX_SCHEDULE_TIMEOUT);
>>       if (!ret)
>>           ret = -ETIME;
>> +    else if (ret > 0 && i915_gem_object_has_unknown_state(obj))
>> +        ret = -EIO;
>>       return ret < 0 ? ret : 0;
>>   }
>> +/**
>> + * i915_gem_object_has_unknown_state - Return true if the object 
>> backing pages are
>> + * in an unknown_state. This means that userspace must NEVER be 
>> allowed to touch
>> + * the pages, with either the GPU or CPU.
>> + *
>> + * ONLY valid to be called after ensuring that all kernel fences have 
>> signalled
>> + * (in particular the fence for moving/clearing the object).
>> + */
>> +bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj)
>> +{
>> +    /*
>> +     * The below barrier pairs with the dma_fence_signal() in
>> +     * __memcpy_work(). We should only sample the unknown_state after 
>> all
>> +     * the kernel fences have signalled.
>> +     */
>> +    smp_rmb();
>> +    return obj->mm.unknown_state;
>> +}
>> +
>>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>   #include "selftests/huge_gem_object.c"
>>   #include "selftests/huge_pages.c"
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> index e11d82a9f7c3..0bf3ee27a2a8 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> @@ -524,6 +524,7 @@ int i915_gem_object_get_moving_fence(struct 
>> drm_i915_gem_object *obj,
>>                        struct dma_fence **fence);
>>   int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>                         bool intr);
>> +bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object 
>> *obj,
>>                        unsigned int cache_level);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> index 2c88bdb8ff7c..5cf36a130061 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> @@ -547,6 +547,24 @@ struct drm_i915_gem_object {
>>            */
>>           bool ttm_shrinkable;
>> +        /**
>> +         * @unknown_state: Indicate that the object is effectively
>> +         * borked. This is write-once and set if we somehow encounter a
>> +         * fatal error when moving/clearing the pages, and we are not
>> +         * able to fallback to memcpy/memset, like on small-BAR systems.
>> +         * The GPU should also be wedged (or in the process) at this
>> +         * point.
>> +         *
>> +         * Only valid to read this after acquiring the dma-resv lock and
>> +         * waiting for all DMA_RESV_USAGE_KERNEL fences to be signalled,
>> +         * or if we otherwise know that the moving fence has signalled,
>> +         * and we are certain the pages underneath are valid for
>> +         * immediate access (under normal operation), like just prior to
>> +         * binding the object or when setting up the CPU fault handler.
>> +         * See i915_gem_object_has_unknown_state();
>> +         */
>> +        bool unknown_state;
>> +
>>           /**
>>            * Priority list of potential placements for this object.
>>            */
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> index 4c25d9b2f138..098409a33e10 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> @@ -675,7 +675,15 @@ static void i915_ttm_swap_notify(struct 
>> ttm_buffer_object *bo)
>>           i915_ttm_purge(obj);
>>   }
>> -static bool i915_ttm_resource_mappable(struct ttm_resource *res)
>> +/**
>> + * i915_ttm_resource_mappable - Return true if the ttm resource is CPU
>> + * accessible.
>> + * @res: The TTM resource to check.
>> + *
>> + * This is interesting on small-BAR systems where we may encounter 
>> lmem objects
>> + * that can't be accessed via the CPU.
>> + */
>> +bool i915_ttm_resource_mappable(struct ttm_resource *res)
>>   {
>>       struct i915_ttm_buddy_resource *bman_res = 
>> to_ttm_buddy_resource(res);
>> @@ -687,6 +695,22 @@ static bool i915_ttm_resource_mappable(struct 
>> ttm_resource *res)
>>   static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct 
>> ttm_resource *mem)
>>   {
>> +    struct drm_i915_gem_object *obj = i915_ttm_to_gem(mem->bo);
>> +    bool unknown_state;
>> +
>> +    if (!obj)
>> +        return -EINVAL;
>> +
>> +    if (!kref_get_unless_zero(&obj->base.refcount))
>> +        return -EINVAL;
>> +
>> +    assert_object_held(obj);
>> +
>> +    unknown_state = i915_gem_object_has_unknown_state(obj);
>> +    i915_gem_object_put(obj);
>> +    if (unknown_state)
>> +        return -EINVAL;
>> +
>>       if (!i915_ttm_cpu_maps_iomem(mem))
>>           return 0;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h 
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
>> index 73e371aa3850..e4842b4296fc 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
>> @@ -92,4 +92,7 @@ static inline bool i915_ttm_cpu_maps_iomem(struct 
>> ttm_resource *mem)
>>       /* Once / if we support GGTT, this is also false for cached 
>> ttm_tts */
>>       return mem->mem_type != I915_PL_SYSTEM;
>>   }
>> +
>> +bool i915_ttm_resource_mappable(struct ttm_resource *res);
>> +
>>   #endif
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> index a10716f4e717..364e7fe8efb1 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> @@ -33,6 +33,7 @@
>>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>   static bool fail_gpu_migration;
>>   static bool fail_work_allocation;
>> +static bool ban_memcpy;
>>   void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>>                       bool work_allocation)
>> @@ -40,6 +41,11 @@ void i915_ttm_migrate_set_failure_modes(bool 
>> gpu_migration,
>>       fail_gpu_migration = gpu_migration;
>>       fail_work_allocation = work_allocation;
>>   }
>> +
>> +void i915_ttm_migrate_set_ban_memcpy(bool ban)
>> +{
>> +    ban_memcpy = ban;
>> +}
>>   #endif
>>   static enum i915_cache_level
>> @@ -258,15 +264,23 @@ struct i915_ttm_memcpy_arg {
>>    * from the callback for lockdep reasons.
>>    * @cb: Callback for the accelerated migration fence.
>>    * @arg: The argument for the memcpy functionality.
>> + * @i915: The i915 pointer.
>> + * @obj: The GEM object.
>> + * @memcpy_allowed: Instead of processing the @arg, and falling back 
>> to memcpy
>> + * or memset, we wedge the device and set the @obj unknown_state, to 
>> prevent
>> + * further access to the object with the CPU or GPU.  On some devices 
>> we might
>> + * only be permitted to use the blitter engine for such operations.
>>    */
>>   struct i915_ttm_memcpy_work {
>>       struct dma_fence fence;
>>       struct work_struct work;
>> -    /* The fence lock */
>>       spinlock_t lock;
>>       struct irq_work irq_work;
>>       struct dma_fence_cb cb;
>>       struct i915_ttm_memcpy_arg arg;
>> +    struct drm_i915_private *i915;
>> +    struct drm_i915_gem_object *obj;
>> +    bool memcpy_allowed;
>>   };
>>   static void i915_ttm_move_memcpy(struct i915_ttm_memcpy_arg *arg)
>> @@ -319,12 +333,34 @@ static void __memcpy_work(struct work_struct *work)
>>       struct i915_ttm_memcpy_arg *arg = &copy_work->arg;
>>       bool cookie = dma_fence_begin_signalling();
>> -    i915_ttm_move_memcpy(arg);
>> +    if (copy_work->memcpy_allowed) {
>> +        i915_ttm_move_memcpy(arg);
>> +    } else {
>> +        /*
>> +         * Prevent further use of the object. Any future GTT binding or
>> +         * CPU access is not allowed once we signal the fence. Outside
>> +         * of the fence critical section, we then also wedge the
>> gpu
>> +         * to indicate the device is not functional.
>> +         *
>> +         * The below dma_fence_signal() is our write-memory-barrier.
>> +         */
>> +        copy_work->obj->mm.unknown_state = true;
>> +    }
>> +
>>       dma_fence_end_signalling(cookie);
>>       dma_fence_signal(&copy_work->fence);
>> +    if (!copy_work->memcpy_allowed) {
>> +        struct intel_gt *gt;
>> +        unsigned int id;
>> +
>> +        for_each_gt(gt, copy_work->i915, id)
>> +            intel_gt_set_wedged(gt);
>> +    }
> 
> Did you try to move the gt wedging to before dma_fence_signal, but 
> before dma_fence_end_signalling? Otherwise I think there is a race 
> opportunity (albeit very unlikely) where the gpu might read 
> uninitialized content.

I didn't try moving the set_wedged. But here AFAIK the wedge is not 
needed to prevent GPU access to the pages, more just to indicate that 
something is very broken. Prior to binding the object, either for the 
sync or async case (which must be after we signalled the clear/move here 
I think) we always first sample the unknown_state, and just keep the 
PTEs pointing to scratch if it has been set.

> 
> With that fixed (or not working)
> 
> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> 
>> +
>>       i915_ttm_memcpy_release(arg);
>> +    i915_gem_object_put(copy_work->obj);
>>       dma_fence_put(&copy_work->fence);
>>   }
>> @@ -336,6 +372,7 @@ static void __memcpy_irq_work(struct irq_work 
>> *irq_work)
>>       dma_fence_signal(&copy_work->fence);
>>       i915_ttm_memcpy_release(arg);
>> +    i915_gem_object_put(copy_work->obj);
>>       dma_fence_put(&copy_work->fence);
>>   }
>> @@ -389,6 +426,16 @@ i915_ttm_memcpy_work_arm(struct 
>> i915_ttm_memcpy_work *work,
>>       return &work->fence;
>>   }
>> +static bool i915_ttm_memcpy_allowed(struct ttm_buffer_object *bo,
>> +                    struct ttm_resource *dst_mem)
>> +{
>> +    if (!(i915_ttm_resource_mappable(bo->resource) &&
>> +          i915_ttm_resource_mappable(dst_mem)))
>> +        return false;
>> +
>> +    return I915_SELFTEST_ONLY(ban_memcpy) ? false : true;
>> +}
>> +
>>   static struct dma_fence *
>>   __i915_ttm_move(struct ttm_buffer_object *bo,
>>           const struct ttm_operation_ctx *ctx, bool clear,
>> @@ -396,6 +443,9 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>>           struct i915_refct_sgt *dst_rsgt, bool allow_accel,
>>           const struct i915_deps *move_deps)
>>   {
>> +    const bool memcpy_allowed = i915_ttm_memcpy_allowed(bo, dst_mem);
>> +    struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>> +    struct drm_i915_private *i915 = to_i915(bo->base.dev);
>>       struct i915_ttm_memcpy_work *copy_work = NULL;
>>       struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
>>       struct dma_fence *fence = ERR_PTR(-EINVAL);
>> @@ -423,9 +473,14 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>>               copy_work = kzalloc(sizeof(*copy_work), GFP_KERNEL);
>>           if (copy_work) {
>> +            copy_work->i915 = i915;
>> +            copy_work->memcpy_allowed = memcpy_allowed;
>> +            copy_work->obj = i915_gem_object_get(obj);
>>               arg = &copy_work->arg;
>> -            i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
>> -                         dst_rsgt);
>> +            if (memcpy_allowed)
>> +                i915_ttm_memcpy_init(arg, bo, clear, dst_mem,
>> +                             dst_ttm, dst_rsgt);
>> +
>>               fence = i915_ttm_memcpy_work_arm(copy_work, dep);
>>           } else {
>>               dma_fence_wait(dep, false);
>> @@ -450,17 +505,23 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
>>       }
>>       /* Error intercept failed or no accelerated migration to start 
>> with */
>> -    if (!copy_work)
>> -        i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
>> -                     dst_rsgt);
>> -    i915_ttm_move_memcpy(arg);
>> -    i915_ttm_memcpy_release(arg);
>> +
>> +    if (memcpy_allowed) {
>> +        if (!copy_work)
>> +            i915_ttm_memcpy_init(arg, bo, clear, dst_mem, dst_ttm,
>> +                         dst_rsgt);
>> +        i915_ttm_move_memcpy(arg);
>> +        i915_ttm_memcpy_release(arg);
>> +    }
>> +    if (copy_work)
>> +        i915_gem_object_put(copy_work->obj);
>>       kfree(copy_work);
>> -    return NULL;
>> +    return memcpy_allowed ? NULL : ERR_PTR(-EIO);
>>   out:
>>       if (!fence && copy_work) {
>>           i915_ttm_memcpy_release(arg);
>> +        i915_gem_object_put(copy_work->obj);
>>           kfree(copy_work);
>>       }
>> @@ -539,8 +600,11 @@ int i915_ttm_move(struct ttm_buffer_object *bo, 
>> bool evict,
>>       }
>>       if (migration_fence) {
>> -        ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
>> -                        true, dst_mem);
>> +        if (I915_SELFTEST_ONLY(evict && fail_gpu_migration))
>> +            ret = -EIO; /* never feed non-migrate fences into ttm */
>> +        else
>> +            ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
>> +                            true, dst_mem);
>>           if (ret) {
>>               dma_fence_wait(migration_fence, false);
>>               ttm_bo_move_sync_cleanup(bo, dst_mem);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h 
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
>> index d2e7f149e05c..8a5d5ab0cc34 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
>> @@ -22,6 +22,7 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo);
>>   I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool 
>> gpu_migration,
>>                                     bool work_allocation));
>> +I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_ban_memcpy(bool ban));
>>   int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
>>                 struct drm_i915_gem_object *src,
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c 
>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>> index 801af51aff62..fe6c37fd7859 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>> @@ -9,6 +9,7 @@
>>   #include "i915_deps.h"
>> +#include "selftests/igt_reset.h"
>>   #include "selftests/igt_spinner.h"
>>   static int igt_fill_check_buffer(struct drm_i915_gem_object *obj,
>> @@ -109,7 +110,8 @@ static int igt_same_create_migrate(void *arg)
>>   static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
>>                     struct drm_i915_gem_object *obj,
>> -                  struct i915_vma *vma)
>> +                  struct i915_vma *vma,
>> +                  bool silent_migrate)
>>   {
>>       int err;
>> @@ -138,7 +140,8 @@ static int lmem_pages_migrate_one(struct 
>> i915_gem_ww_ctx *ww,
>>       if (i915_gem_object_is_lmem(obj)) {
>>           err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM);
>>           if (err) {
>> -            pr_err("Object failed migration to smem\n");
>> +            if (!silent_migrate)
>> +                pr_err("Object failed migration to smem\n");
>>               if (err)
>>                   return err;
>>           }
>> @@ -156,7 +159,8 @@ static int lmem_pages_migrate_one(struct 
>> i915_gem_ww_ctx *ww,
>>       } else {
>>           err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM_0);
>>           if (err) {
>> -            pr_err("Object failed migration to lmem\n");
>> +            if (!silent_migrate)
>> +                pr_err("Object failed migration to lmem\n");
>>               if (err)
>>                   return err;
>>           }
>> @@ -179,7 +183,8 @@ static int __igt_lmem_pages_migrate(struct 
>> intel_gt *gt,
>>                       struct i915_address_space *vm,
>>                       struct i915_deps *deps,
>>                       struct igt_spinner *spin,
>> -                    struct dma_fence *spin_fence)
>> +                    struct dma_fence *spin_fence,
>> +                    bool borked_migrate)
>>   {
>>       struct drm_i915_private *i915 = gt->i915;
>>       struct drm_i915_gem_object *obj;
>> @@ -242,7 +247,8 @@ static int __igt_lmem_pages_migrate(struct 
>> intel_gt *gt,
>>        */
>>       for (i = 1; i <= 5; ++i) {
>>           for_i915_gem_ww(&ww, err, true)
>> -            err = lmem_pages_migrate_one(&ww, obj, vma);
>> +            err = lmem_pages_migrate_one(&ww, obj, vma,
>> +                             borked_migrate);
>>           if (err)
>>               goto out_put;
>>       }
>> @@ -283,23 +289,70 @@ static int __igt_lmem_pages_migrate(struct 
>> intel_gt *gt,
>>   static int igt_lmem_pages_failsafe_migrate(void *arg)
>>   {
>> -    int fail_gpu, fail_alloc, ret;
>> +    int fail_gpu, fail_alloc, ban_memcpy, ret;
>>       struct intel_gt *gt = arg;
>>       for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
>>           for (fail_alloc = 0; fail_alloc < 2; ++fail_alloc) {
>> -            pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
>> -                fail_gpu, fail_alloc);
>> -            i915_ttm_migrate_set_failure_modes(fail_gpu,
>> -                               fail_alloc);
>> -            ret = __igt_lmem_pages_migrate(gt, NULL, NULL, NULL, NULL);
>> -            if (ret)
>> -                goto out_err;
>> +            for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
>> +                pr_info("Simulated failure modes: gpu: %d, alloc:%d, 
>> ban_memcpy: %d\n",
>> +                    fail_gpu, fail_alloc, ban_memcpy);
>> +                i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
>> +                i915_ttm_migrate_set_failure_modes(fail_gpu,
>> +                                   fail_alloc);
>> +                ret = __igt_lmem_pages_migrate(gt, NULL, NULL,
>> +                                   NULL, NULL,
>> +                                   ban_memcpy &&
>> +                                   fail_gpu);
>> +
>> +                if (ban_memcpy && fail_gpu) {
>> +                    struct intel_gt *__gt;
>> +                    unsigned int id;
>> +
>> +                    if (ret != -EIO) {
>> +                        pr_err("expected -EIO, got (%d)\n", ret);
>> +                        ret = -EINVAL;
>> +                    } else {
>> +                        ret = 0;
>> +                    }
>> +
>> +                    for_each_gt(__gt, gt->i915, id) {
>> +                        intel_wakeref_t wakeref;
>> +                        bool wedged;
>> +
>> +                        mutex_lock(&__gt->reset.mutex);
>> +                        wedged = test_bit(I915_WEDGED, 
>> &__gt->reset.flags);
>> +                        mutex_unlock(&__gt->reset.mutex);
>> +
>> +                        if (fail_gpu && !fail_alloc) {
>> +                            if (!wedged) {
>> +                                pr_err("gt(%u) not wedged\n", id);
>> +                                ret = -EINVAL;
>> +                                continue;
>> +                            }
>> +                        } else if (wedged) {
>> +                            pr_err("gt(%u) incorrectly wedged\n", id);
>> +                            ret = -EINVAL;
>> +                        } else {
>> +                            continue;
>> +                        }
>> +
>> +                        wakeref = 
>> intel_runtime_pm_get(__gt->uncore->rpm);
>> +                        igt_global_reset_lock(__gt);
>> +                        intel_gt_reset(__gt, ALL_ENGINES, NULL);
>> +                        igt_global_reset_unlock(__gt);
>> +                        intel_runtime_pm_put(__gt->uncore->rpm, 
>> wakeref);
>> +                    }
>> +                    if (ret)
>> +                        goto out_err;
>> +                }
>> +            }
>>           }
>>       }
>>   out_err:
>>       i915_ttm_migrate_set_failure_modes(false, false);
>> +    i915_ttm_migrate_set_ban_memcpy(false);
>>       return ret;
>>   }
>> @@ -370,7 +423,7 @@ static int igt_async_migrate(struct intel_gt *gt)
>>               goto out_ce;
>>           err = __igt_lmem_pages_migrate(gt, &ppgtt->vm, &deps, &spin,
>> -                           spin_fence);
>> +                           spin_fence, false);
>>           i915_deps_fini(&deps);
>>           dma_fence_put(spin_fence);
>>           if (err)
>> @@ -394,23 +447,67 @@ static int igt_async_migrate(struct intel_gt *gt)
>>   #define ASYNC_FAIL_ALLOC 1
>>   static int igt_lmem_async_migrate(void *arg)
>>   {
>> -    int fail_gpu, fail_alloc, ret;
>> +    int fail_gpu, fail_alloc, ban_memcpy, ret;
>>       struct intel_gt *gt = arg;
>>       for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
>>           for (fail_alloc = 0; fail_alloc < ASYNC_FAIL_ALLOC; 
>> ++fail_alloc) {
>> -            pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
>> -                fail_gpu, fail_alloc);
>> -            i915_ttm_migrate_set_failure_modes(fail_gpu,
>> -                               fail_alloc);
>> -            ret = igt_async_migrate(gt);
>> -            if (ret)
>> -                goto out_err;
>> +            for (ban_memcpy = 0; ban_memcpy < 2; ++ban_memcpy) {
>> +                pr_info("Simulated failure modes: gpu: %d, alloc: %d, 
>> ban_memcpy: %d\n",
>> +                    fail_gpu, fail_alloc, ban_memcpy);
>> +                i915_ttm_migrate_set_ban_memcpy(ban_memcpy);
>> +                i915_ttm_migrate_set_failure_modes(fail_gpu,
>> +                                   fail_alloc);
>> +                ret = igt_async_migrate(gt);
>> +
>> +                if (fail_gpu && ban_memcpy) {
>> +                    struct intel_gt *__gt;
>> +                    unsigned int id;
>> +
>> +                    if (ret != -EIO) {
>> +                        pr_err("expected -EIO, got (%d)\n", ret);
>> +                        ret = -EINVAL;
>> +                    } else {
>> +                        ret = 0;
>> +                    }
>> +
>> +                    for_each_gt(__gt, gt->i915, id) {
>> +                        intel_wakeref_t wakeref;
>> +                        bool wedged;
>> +
>> +                        mutex_lock(&__gt->reset.mutex);
>> +                        wedged = test_bit(I915_WEDGED, 
>> &__gt->reset.flags);
>> +                        mutex_unlock(&__gt->reset.mutex);
>> +
>> +                        if (fail_gpu && !fail_alloc) {
>> +                            if (!wedged) {
>> +                                pr_err("gt(%u) not wedged\n", id);
>> +                                ret = -EINVAL;
>> +                                continue;
>> +                            }
>> +                        } else if (wedged) {
>> +                            pr_err("gt(%u) incorrectly wedged\n", id);
>> +                            ret = -EINVAL;
>> +                        } else {
>> +                            continue;
>> +                        }
>> +
>> +                        wakeref = 
>> intel_runtime_pm_get(__gt->uncore->rpm);
>> +                        igt_global_reset_lock(__gt);
>> +                        intel_gt_reset(__gt, ALL_ENGINES, NULL);
>> +                        igt_global_reset_unlock(__gt);
>> +                        intel_runtime_pm_put(__gt->uncore->rpm, 
>> wakeref);
>> +                    }
>> +                }
>> +                if (ret)
>> +                    goto out_err;
>> +            }
>>           }
>>       }
>>   out_err:
>>       i915_ttm_migrate_set_failure_modes(false, false);
>> +    i915_ttm_migrate_set_ban_memcpy(false);
>>       return ret;
>>   }
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> index da28acb78a88..3ced9948a331 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> @@ -10,6 +10,7 @@
>>   #include "gem/i915_gem_internal.h"
>>   #include "gem/i915_gem_region.h"
>>   #include "gem/i915_gem_ttm.h"
>> +#include "gem/i915_gem_ttm_move.h"
>>   #include "gt/intel_engine_pm.h"
>>   #include "gt/intel_gpu_commands.h"
>>   #include "gt/intel_gt.h"
>> @@ -21,6 +22,7 @@
>>   #include "i915_selftest.h"
>>   #include "selftests/i915_random.h"
>>   #include "selftests/igt_flush_test.h"
>> +#include "selftests/igt_reset.h"
>>   #include "selftests/igt_mmap.h"
>>   struct tile {
>> @@ -1163,6 +1165,7 @@ static int ___igt_mmap_migrate(struct 
>> drm_i915_private *i915,
>>   #define IGT_MMAP_MIGRATE_FILL        (1 << 1)
>>   #define IGT_MMAP_MIGRATE_EVICTABLE   (1 << 2)
>>   #define IGT_MMAP_MIGRATE_UNFAULTABLE (1 << 3)
>> +#define IGT_MMAP_MIGRATE_FAIL_GPU    (1 << 4)
>>   static int __igt_mmap_migrate(struct intel_memory_region **placements,
>>                     int n_placements,
>>                     struct intel_memory_region *expected_mr,
>> @@ -1237,13 +1240,62 @@ static int __igt_mmap_migrate(struct 
>> intel_memory_region **placements,
>>       if (flags & IGT_MMAP_MIGRATE_EVICTABLE)
>>           igt_make_evictable(&objects);
>> +    if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
>> +        err = i915_gem_object_lock(obj, NULL);
>> +        if (err)
>> +            goto out_put;
>> +
>> +        /*
>> +         * Ensure we only simulate the gpu failure when faulting the
>> +         * pages.
>> +         */
>> +        err = i915_gem_object_wait_moving_fence(obj, true);
>> +        i915_gem_object_unlock(obj);
>> +        if (err)
>> +            goto out_put;
>> +        i915_ttm_migrate_set_failure_modes(true, false);
>> +    }
>> +
>>       err = ___igt_mmap_migrate(i915, obj, addr,
>>                     flags & IGT_MMAP_MIGRATE_UNFAULTABLE);
>> +
>>       if (!err && obj->mm.region != expected_mr) {
>>           pr_err("%s region mismatch %s\n", __func__, expected_mr->name);
>>           err = -EINVAL;
>>       }
>> +    if (flags & IGT_MMAP_MIGRATE_FAIL_GPU) {
>> +        struct intel_gt *gt;
>> +        unsigned int id;
>> +
>> +        i915_ttm_migrate_set_failure_modes(false, false);
>> +
>> +        for_each_gt(gt, i915, id) {
>> +            intel_wakeref_t wakeref;
>> +            bool wedged;
>> +
>> +            mutex_lock(&gt->reset.mutex);
>> +            wedged = test_bit(I915_WEDGED, &gt->reset.flags);
>> +            mutex_unlock(&gt->reset.mutex);
>> +            if (!wedged) {
>> +                pr_err("gt(%u) not wedged\n", id);
>> +                err = -EINVAL;
>> +                continue;
>> +            }
>> +
>> +            wakeref = intel_runtime_pm_get(gt->uncore->rpm);
>> +            igt_global_reset_lock(gt);
>> +            intel_gt_reset(gt, ALL_ENGINES, NULL);
>> +            igt_global_reset_unlock(gt);
>> +            intel_runtime_pm_put(gt->uncore->rpm, wakeref);
>> +        }
>> +
>> +        if (!i915_gem_object_has_unknown_state(obj)) {
>> +            pr_err("object missing unknown_state\n");
>> +            err = -EINVAL;
>> +        }
>> +    }
>> +
>>   out_put:
>>       i915_gem_object_put(obj);
>>       igt_close_objects(i915, &objects);
>> @@ -1324,6 +1376,23 @@ static int igt_mmap_migrate(void *arg)
>>                        IGT_MMAP_MIGRATE_TOPDOWN |
>>                        IGT_MMAP_MIGRATE_FILL |
>>                        IGT_MMAP_MIGRATE_UNFAULTABLE);
>> +        if (err)
>> +            goto out_io_size;
>> +
>> +        /*
>> +         * Allocate in the non-mappable portion, but force migrating to
>> +         * the mappable portion on fault (LMEM -> LMEM). We then also
>> +         * simulate a gpu error when moving the pages during the fault,
>> +         * which should result in wedging the gpu and returning SIGBUS
>> +         * in the fault handler, since we can't fall back to memcpy.
>> +         */
>> +        err = __igt_mmap_migrate(single, ARRAY_SIZE(single), mr,
>> +                     IGT_MMAP_MIGRATE_TOPDOWN |
>> +                     IGT_MMAP_MIGRATE_FILL |
>> +                     IGT_MMAP_MIGRATE_EVICTABLE |
>> +                     IGT_MMAP_MIGRATE_FAIL_GPU |
>> +                     IGT_MMAP_MIGRATE_UNFAULTABLE);
>>   out_io_size:
>>           mr->io_size = saved_io_size;
>>           i915_ttm_buddy_man_force_visible_size(man,
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>> b/drivers/gpu/drm/i915/i915_vma.c
>> index 5d5828b9a242..43339ecabd73 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -310,7 +310,7 @@ struct i915_vma_work {
>>       struct i915_address_space *vm;
>>       struct i915_vm_pt_stash stash;
>>       struct i915_vma_resource *vma_res;
>> -    struct drm_i915_gem_object *pinned;
>> +    struct drm_i915_gem_object *obj;
>>       struct i915_sw_dma_fence_cb cb;
>>       enum i915_cache_level cache_level;
>>       unsigned int flags;
>> @@ -321,17 +321,25 @@ static void __vma_bind(struct dma_fence_work *work)
>>       struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
>>       struct i915_vma_resource *vma_res = vw->vma_res;
>> +    /*
>> +     * We are about to bind the object, which must mean we have already
>> +     * signaled the work to potentially clear/move the pages 
>> underneath. If
>> +     * something went wrong at that stage then the object should have
>> +     * unknown_state set, in which case we need to skip the bind.
>> +     */
>> +    if (i915_gem_object_has_unknown_state(vw->obj))
>> +        return;
>> +
>>       vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
>>                      vma_res, vw->cache_level, vw->flags);
>> -
>>   }
>>   static void __vma_release(struct dma_fence_work *work)
>>   {
>>       struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
>> -    if (vw->pinned)
>> -        i915_gem_object_put(vw->pinned);
>> +    if (vw->obj)
>> +        i915_gem_object_put(vw->obj);
>>       i915_vm_free_pt_stash(vw->vm, &vw->stash);
>>       if (vw->vma_res)
>> @@ -517,14 +525,7 @@ int i915_vma_bind(struct i915_vma *vma,
>>           }
>>           work->base.dma.error = 0; /* enable the queue_work() */
>> -
>> -        /*
>> -         * If we don't have the refcounted pages list, keep a reference
>> -         * on the object to avoid waiting for the async bind to
>> -         * complete in the object destruction path.
>> -         */
>> -        if (!work->vma_res->bi.pages_rsgt)
>> -            work->pinned = i915_gem_object_get(vma->obj);
>> +        work->obj = i915_gem_object_get(vma->obj);
>>       } else {
>>           ret = i915_gem_object_wait_moving_fence(vma->obj, true);
>>           if (ret) {

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 09/13] drm/i915/selftests: skip the mman tests for stolen
  2022-06-29 16:22     ` [Intel-gfx] " Thomas Hellström
@ 2022-06-29 16:42       ` Matthew Auld
  -1 siblings, 0 replies; 48+ messages in thread
From: Matthew Auld @ 2022-06-29 16:42 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx
  Cc: Tvrtko Ursulin, Jordan Justen, Lionel Landwerlin,
	Kenneth Graunke, Jon Bloomfield, dri-devel, Daniel Vetter,
	Akeem G Abodunrin

On 29/06/2022 17:22, Thomas Hellström wrote:
> 
> On 6/29/22 14:14, Matthew Auld wrote:
>> It's not supported, and just skips later anyway. With small-BAR things
>> get more complicated since all of stolen is likely not even CPU
>> accessible, hence not passing I915_BO_ALLOC_GPU_ONLY just results in the
>> object create failing.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Kenneth Graunke <kenneth@whitecape.org>
>> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
> 
> This reminds me,
> 
> Is there a problem for fbdev (and hence things like plymouth) if the 
> initial fbdev image ends up as a stolen memory object which in turn ends 
> up not being mappable? I remember we discussed this before but can't 
> recall what the answer was.

On discrete the initial-fb looks to be allocated directly from lmem (at 
least on the machines I've seen in CI). See 7fe7c2a679dc ("drm/i915: 
fixup the initial fb base on DGFX"). And from what I could tell the 
offset in lmem is always at the beginning somewhere, which makes sense 
given stuff like small-BAR. But yeah, the create_at() helper should 
complain if someone tried to allocate the initial-fb or similar outside 
the mappable part. IIRC the only user of stolen-lmem is fbc, but that 
doesn't seem to need CPU access.
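
For the create_at() case, roughly the kind of sanity check I have in mind (purely illustrative -- the helper name below is made up, but io_size is the field this series already uses for the CPU-visible portion of a region):

static bool range_is_cpu_visible(const struct intel_memory_region *mr,
				 resource_size_t start,
				 resource_size_t size)
{
	/* anything needing CPU access must sit below io_size on small-BAR */
	return start + size <= mr->io_size;
}

If an initial-fb ever ended up outside that range, create_at() should loudly reject it rather than hand back pages the CPU can't touch.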

> 
> Anyway, for this patch
> 
> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> 
> 
> 
> 
>>
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> index 5bc93a1ce3e3..388c85b0f764 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> @@ -979,6 +979,9 @@ static int igt_mmap(void *arg)
>>           };
>>           int i;
>> +        if (mr->private)
>> +            continue;
>> +
>>           for (i = 0; i < ARRAY_SIZE(sizes); i++) {
>>               struct drm_i915_gem_object *obj;
>>               int err;
>> @@ -1435,6 +1438,9 @@ static int igt_mmap_access(void *arg)
>>           struct drm_i915_gem_object *obj;
>>           int err;
>> +        if (mr->private)
>> +            continue;
>> +
>>           obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
>>           if (obj == ERR_PTR(-ENODEV))
>>               continue;
>> @@ -1580,6 +1586,9 @@ static int igt_mmap_gpu(void *arg)
>>           struct drm_i915_gem_object *obj;
>>           int err;
>> +        if (mr->private)
>> +            continue;
>> +
>>           obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
>>           if (obj == ERR_PTR(-ENODEV))
>>               continue;
>> @@ -1727,6 +1736,9 @@ static int igt_mmap_revoke(void *arg)
>>           struct drm_i915_gem_object *obj;
>>           int err;
>> +        if (mr->private)
>> +            continue;
>> +
>>           obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
>>           if (obj == ERR_PTR(-ENODEV))
>>               continue;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 11/13] drm/i915/ttm: handle blitter failure on DG2
  2022-06-29 16:28       ` [Intel-gfx] " Matthew Auld
@ 2022-06-29 16:47         ` Thomas Hellström
  -1 siblings, 0 replies; 48+ messages in thread
From: Thomas Hellström @ 2022-06-29 16:47 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx
  Cc: Tvrtko Ursulin, Jordan Justen, Lionel Landwerlin,
	Kenneth Graunke, Jon Bloomfield, dri-devel, Daniel Vetter,
	Akeem G Abodunrin


On 6/29/22 18:28, Matthew Auld wrote:
> On 29/06/2022 17:11, Thomas Hellström wrote:
>> Hi, Matthew,
>>
>> On 6/29/22 14:14, Matthew Auld wrote:
>>> If the move or clear operation somehow fails, and the memory underneath
>>> is not cleared, like when moving to lmem, then we currently fallback to
>>> memcpy or memset. However with small-BAR systems this fallback might no
>>> longer be possible. For now we use the set_wedged sledgehammer if we
>>> ever encounter such a scenario, and mark the object as borked to plug
>>> any holes where access to the memory underneath can happen. Add some
>>> basic selftests to exercise this.
>>>
>>> v2:
>>>    - In the selftests make sure we grab the runtime pm around the 
>>> reset.
>>>      Also make sure we grab the reset lock before checking if the 
>>> device
>>>      is wedged, since the wedge might still be in-progress and hence 
>>> the
>>>      bit might not be set yet.
>>>    - Don't wedge or put the object into an unknown state, if the 
>>> request
>>>      construction fails (or similar). Just returning an error and
>>>      skipping the fallback should be safe here.
>>>    - Make sure we wedge each gt. (Thomas)
>>>    - Peek at the unknown_state in io_reserve, that way we don't have to
>>>      export or hand roll the fault_wait_for_idle. (Thomas)
>>>    - Add the missing read-side barriers for the unknown_state. (Thomas)
>>>    - Some kernel-doc fixes. (Thomas)
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>>> Cc: Kenneth Graunke <kenneth@whitecape.org>
>>> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    |  21 +++
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |   1 +
>>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |  18 +++
>>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  26 +++-
>>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   3 +
>>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  88 +++++++++--
>>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h  |   1 +
>>>   .../drm/i915/gem/selftests/i915_gem_migrate.c | 141 
>>> +++++++++++++++---
>>>   .../drm/i915/gem/selftests/i915_gem_mman.c    |  69 +++++++++
>>>   drivers/gpu/drm/i915/i915_vma.c               |  25 ++--
>>>   10 files changed, 346 insertions(+), 47 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> index 06b1b188ce5a..642a5d59ce26 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> @@ -783,10 +783,31 @@ int i915_gem_object_wait_moving_fence(struct 
>>> drm_i915_gem_object *obj,
>>>                       intr, MAX_SCHEDULE_TIMEOUT);
>>>       if (!ret)
>>>           ret = -ETIME;
>>> +    else if (ret > 0 && i915_gem_object_has_unknown_state(obj))
>>> +        ret = -EIO;
>>>       return ret < 0 ? ret : 0;
>>>   }
>>> +/**
>>> + * i915_gem_object_has_unknown_state - Return true if the object 
>>> backing pages are
>>> + * in an unknown_state. This means that userspace must NEVER be 
>>> allowed to touch
>>> + * the pages, with either the GPU or CPU.
>>> + *
>>> + * ONLY valid to be called after ensuring that all kernel fences 
>>> have signalled
>>> + * (in particular the fence for moving/clearing the object).
>>> + */
>>> +bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object 
>>> *obj)
>>> +{
>>> +    /*
>>> +     * The below barrier pairs with the dma_fence_signal() in
>>> +     * __memcpy_work(). We should only sample the unknown_state 
>>> after all
>>> +     * the kernel fences have signalled.
>>> +     */
>>> +    smp_rmb();
>>> +    return obj->mm.unknown_state;
>>> +}
>>> +
>>>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>>   #include "selftests/huge_gem_object.c"
>>>   #include "selftests/huge_pages.c"
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> index e11d82a9f7c3..0bf3ee27a2a8 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> @@ -524,6 +524,7 @@ int i915_gem_object_get_moving_fence(struct 
>>> drm_i915_gem_object *obj,
>>>                        struct dma_fence **fence);
>>>   int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object 
>>> *obj,
>>>                         bool intr);
>>> +bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object 
>>> *obj);
>>>   void i915_gem_object_set_cache_coherency(struct 
>>> drm_i915_gem_object *obj,
>>>                        unsigned int cache_level);
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> index 2c88bdb8ff7c..5cf36a130061 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> @@ -547,6 +547,24 @@ struct drm_i915_gem_object {
>>>            */
>>>           bool ttm_shrinkable;
>>> +        /**
>>> +         * @unknown_state: Indicate that the object is effectively
>>> +         * borked. This is write-once and set if we somehow 
>>> encounter a
>>> +         * fatal error when moving/clearing the pages, and we are not
>>> +         * able to fallback to memcpy/memset, like on small-BAR 
>>> systems.
>>> +         * The GPU should also be wedged (or in the process) at this
>>> +         * point.
>>> +         *
>>> +         * Only valid to read this after acquiring the dma-resv 
>>> lock and
>>> +         * waiting for all DMA_RESV_USAGE_KERNEL fences to be 
>>> signalled,
>>> +         * or if we otherwise know that the moving fence has 
>>> signalled,
>>> +         * and we are certain the pages underneath are valid for
>>> +         * immediate access (under normal operation), like just 
>>> prior to
>>> +         * binding the object or when setting up the CPU fault 
>>> handler.
>>> +         * See i915_gem_object_has_unknown_state();
>>> +         */
>>> +        bool unknown_state;
>>> +
>>>           /**
>>>            * Priority list of potential placements for this object.
>>>            */
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> index 4c25d9b2f138..098409a33e10 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> @@ -675,7 +675,15 @@ static void i915_ttm_swap_notify(struct 
>>> ttm_buffer_object *bo)
>>>           i915_ttm_purge(obj);
>>>   }
>>> -static bool i915_ttm_resource_mappable(struct ttm_resource *res)
>>> +/**
>>> + * i915_ttm_resource_mappable - Return true if the ttm resource is CPU
>>> + * accessible.
>>> + * @res: The TTM resource to check.
>>> + *
>>> + * This is interesting on small-BAR systems where we may encounter 
>>> lmem objects
>>> + * that can't be accessed via the CPU.
>>> + */
>>> +bool i915_ttm_resource_mappable(struct ttm_resource *res)
>>>   {
>>>       struct i915_ttm_buddy_resource *bman_res = 
>>> to_ttm_buddy_resource(res);
>>> @@ -687,6 +695,22 @@ static bool i915_ttm_resource_mappable(struct 
>>> ttm_resource *res)
>>>   static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct 
>>> ttm_resource *mem)
>>>   {
>>> +    struct drm_i915_gem_object *obj = i915_ttm_to_gem(mem->bo);
>>> +    bool unknown_state;
>>> +
>>> +    if (!obj)
>>> +        return -EINVAL;
>>> +
>>> +    if (!kref_get_unless_zero(&obj->base.refcount))
>>> +        return -EINVAL;
>>> +
>>> +    assert_object_held(obj);
>>> +
>>> +    unknown_state = i915_gem_object_has_unknown_state(obj);
>>> +    i915_gem_object_put(obj);
>>> +    if (unknown_state)
>>> +        return -EINVAL;
>>> +
>>>       if (!i915_ttm_cpu_maps_iomem(mem))
>>>           return 0;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
>>> index 73e371aa3850..e4842b4296fc 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
>>> @@ -92,4 +92,7 @@ static inline bool i915_ttm_cpu_maps_iomem(struct 
>>> ttm_resource *mem)
>>>       /* Once / if we support GGTT, this is also false for cached 
>>> ttm_tts */
>>>       return mem->mem_type != I915_PL_SYSTEM;
>>>   }
>>> +
>>> +bool i915_ttm_resource_mappable(struct ttm_resource *res);
>>> +
>>>   #endif
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> index a10716f4e717..364e7fe8efb1 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> @@ -33,6 +33,7 @@
>>>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>>   static bool fail_gpu_migration;
>>>   static bool fail_work_allocation;
>>> +static bool ban_memcpy;
>>>   void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>>>                       bool work_allocation)
>>> @@ -40,6 +41,11 @@ void i915_ttm_migrate_set_failure_modes(bool 
>>> gpu_migration,
>>>       fail_gpu_migration = gpu_migration;
>>>       fail_work_allocation = work_allocation;
>>>   }
>>> +
>>> +void i915_ttm_migrate_set_ban_memcpy(bool ban)
>>> +{
>>> +    ban_memcpy = ban;
>>> +}
>>>   #endif
>>>   static enum i915_cache_level
>>> @@ -258,15 +264,23 @@ struct i915_ttm_memcpy_arg {
>>>    * from the callback for lockdep reasons.
>>>    * @cb: Callback for the accelerated migration fence.
>>>    * @arg: The argument for the memcpy functionality.
>>> + * @i915: The i915 pointer.
>>> + * @obj: The GEM object.
>>> + * @memcpy_allowed: Instead of processing the @arg, and falling 
>>> back to memcpy
>>> + * or memset, we wedge the device and set the @obj unknown_state, 
>>> to prevent
>>> + * further access to the object with the CPU or GPU.  On some 
>>> devices we might
>>> + * only be permitted to use the blitter engine for such operations.
>>>    */
>>>   struct i915_ttm_memcpy_work {
>>>       struct dma_fence fence;
>>>       struct work_struct work;
>>> -    /* The fence lock */
>>>       spinlock_t lock;
>>>       struct irq_work irq_work;
>>>       struct dma_fence_cb cb;
>>>       struct i915_ttm_memcpy_arg arg;
>>> +    struct drm_i915_private *i915;
>>> +    struct drm_i915_gem_object *obj;
>>> +    bool memcpy_allowed;
>>>   };
>>>   static void i915_ttm_move_memcpy(struct i915_ttm_memcpy_arg *arg)
>>> @@ -319,12 +333,34 @@ static void __memcpy_work(struct work_struct 
>>> *work)
>>>       struct i915_ttm_memcpy_arg *arg = &copy_work->arg;
>>>       bool cookie = dma_fence_begin_signalling();
>>> -    i915_ttm_move_memcpy(arg);
>>> +    if (copy_work->memcpy_allowed) {
>>> +        i915_ttm_move_memcpy(arg);
>>> +    } else {
>>> +        /*
>>> +         * Prevent further use of the object. Any future GTT 
>>> binding or
>>> +         * CPU access is not allowed once we signal the fence. Outside
>>> +         * of the fence critical section, we then also wedge the gpu
>>> +         * to indicate the device is not functional.
>>> +         *
>>> +         * The below dma_fence_signal() is our write-memory-barrier.
>>> +         */
>>> +        copy_work->obj->mm.unknown_state = true;
>>> +    }
>>> +
>>>       dma_fence_end_signalling(cookie);
>>>       dma_fence_signal(&copy_work->fence);
>>> +    if (!copy_work->memcpy_allowed) {
>>> +        struct intel_gt *gt;
>>> +        unsigned int id;
>>> +
>>> +        for_each_gt(gt, copy_work->i915, id)
>>> +            intel_gt_set_wedged(gt);
>>> +    }
>>
>> Did you try to move the gt wedging to before dma_fence_signal, but 
>> before dma_fence_end_signalling? Otherwise I think there is a race 
>> opportunity (albeit very unlikely) where the gpu might read 
>> uninitialized content.
>
> I didn't try moving the set_wedged. But here AFAIK the wedge is not 
> needed to prevent GPU access to the pages, more just to indicate that 
> something is very broken. Prior to binding the object, either for the 
> sync or async case (which must be after we signalled the clear/move 
> here I think) we always first sample the unknown_state, and just keep 
> the PTEs pointing to scratch if it has been set.

Ah yes, understood.
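
So the handshake is effectively (a condensed sketch of the hunks above; the two helper names here are just for illustration):

/* writer side, from __memcpy_work() when the fallback is not allowed */
static void publish_unknown_state(struct i915_ttm_memcpy_work *copy_work)
{
	copy_work->obj->mm.unknown_state = true;  /* publish */
	dma_fence_signal(&copy_work->fence);      /* acts as the write barrier */
	/* wedging the gts afterwards is purely informational */
}

/* reader side, i915_gem_object_has_unknown_state(), only legal once the
 * DMA_RESV_USAGE_KERNEL fences have signalled */
static bool consume_unknown_state(struct drm_i915_gem_object *obj)
{
	smp_rmb();                     /* pairs with the dma_fence_signal() */
	return obj->mm.unknown_state;  /* true => keep the PTEs on scratch */
}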

/Thomas



^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread

Thread overview: 48+ messages
2022-06-29 12:14 [PATCH v3 00/13] small BAR uapi bits Matthew Auld
2022-06-29 12:14 ` [Intel-gfx] " Matthew Auld
2022-06-29 12:14 ` [PATCH v3 01/13] drm/doc: add rfc section for small BAR uapi Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 12:14 ` [PATCH v3 02/13] drm/i915/uapi: add probed_cpu_visible_size Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 12:14 ` [PATCH v3 03/13] drm/i915/uapi: expose the avail tracking Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 12:14 ` [PATCH v3 04/13] drm/i915: remove intel_memory_region avail Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 12:14 ` [PATCH v3 05/13] drm/i915/uapi: apply ALLOC_GPU_ONLY by default Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 12:14 ` [PATCH v3 06/13] drm/i915/uapi: add NEEDS_CPU_ACCESS hint Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 12:14 ` [PATCH v3 07/13] drm/i915/error: skip non-mappable pages Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 12:14 ` [Intel-gfx] [PATCH v3 08/13] drm/i915/uapi: tweak error capture on recoverable contexts Matthew Auld
2022-06-29 12:14   ` Matthew Auld
2022-06-29 12:14 ` [PATCH v3 09/13] drm/i915/selftests: skip the mman tests for stolen Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 16:22   ` Thomas Hellström
2022-06-29 16:22     ` [Intel-gfx] " Thomas Hellström
2022-06-29 16:42     ` Matthew Auld
2022-06-29 16:42       ` [Intel-gfx] " Matthew Auld
2022-06-29 12:14 ` [PATCH v3 10/13] drm/i915/selftests: ensure we reserve a fence slot Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 12:14 ` [PATCH v3 11/13] drm/i915/ttm: handle blitter failure on DG2 Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 16:11   ` Thomas Hellström
2022-06-29 16:11     ` [Intel-gfx] " Thomas Hellström
2022-06-29 16:28     ` Matthew Auld
2022-06-29 16:28       ` [Intel-gfx] " Matthew Auld
2022-06-29 16:47       ` Thomas Hellström
2022-06-29 16:47         ` [Intel-gfx] " Thomas Hellström
2022-06-29 12:14 ` [PATCH v3 12/13] drm/i915/ttm: disallow CPU fallback mode for ccs pages Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 12:43   ` Ramalingam C
2022-06-29 12:43     ` [Intel-gfx] " Ramalingam C
2022-06-29 12:14 ` [PATCH v3 13/13] drm/i915: turn on small BAR support Matthew Auld
2022-06-29 12:14   ` [Intel-gfx] " Matthew Auld
2022-06-29 16:16   ` Thomas Hellström
2022-06-29 16:16     ` [Intel-gfx] " Thomas Hellström
2022-06-29 13:00 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for small BAR uapi bits (rev4) Patchwork
2022-06-29 13:00 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-06-29 13:22 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-06-29 14:40 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for small BAR uapi bits (rev5) Patchwork
2022-06-29 14:40 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-06-29 15:05 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
