* [PATCH v4 0/6] drm/i915/ttm: Async migration
@ 2021-11-18 13:02 ` Thomas Hellström
  0 siblings, 0 replies; 31+ messages in thread
From: Thomas Hellström @ 2021-11-18 13:02 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

This patch series deals with async migration and async VRAM management.
It still leaves out one important part, async unbinding, which will
reduce latency further, at least when migrating already active
objects.

Patch 1/6 deals with accessing and waiting for the TTM moving
fence from i915 GEM.
Patch 2 is pure code reorganization, no functional change.
Patch 3 breaks a refcounting loop involving the TTM moving fence.
Patch 4 makes the i915 TTM shrinking code handle async moves.
Patch 5 implements the TTM move() callback asynchronously. It also
introduces a utility to collect dependencies and turn them into a
single dma_fence, which is needed by the intel_migrate code.
This also affects the gem object migrate code.
Patch 6 makes the object copy utility async as well, mainly for future
users, since the only current user, suspend backup and restore, will
typically want to sync anyway.
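
For illustration only (this is not the utility patch 5 adds), the idea
of folding a set of dependency fences into a single dma_fence can be
sketched with the existing dma_fence_array helpers; the function name
below is hypothetical:

#include <linux/dma-fence.h>
#include <linux/dma-fence-array.h>

/*
 * Hypothetical sketch: combine @num dependency fences into one fence
 * that signals once all of them have signaled. @fences must be
 * kmalloc'd; dma_fence_array_create() takes ownership of the array
 * and of the contained references.
 */
static struct dma_fence *example_collect_deps(struct dma_fence **fences,
					      unsigned int num)
{
	struct dma_fence_array *array;

	if (!num)
		return dma_fence_get_stub();

	array = dma_fence_array_create(num, fences,
				       dma_fence_context_alloc(1), 1,
				       false);

	return array ? &array->base : NULL;
}

The utility added in patch 5 differs in detail; the sketch only shows
how one composite fence can stand in for several dependencies.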

v2:
- Fix a couple of SPARSE warnings.
v3:
- Fix a NULL pointer dereference.
v4:
- Squash what were previously patches 1 and 2 into patch 1.
- Ditch the moving fence waiting in i915_vma_pin_iomap()
- Rework how the refcounting loop is broken in patch 3. Drop region
  reference counting.
- Break what is now patch 4 out of patch 5. Add support for avoiding
  waiting for the GPU when shrinking.
- A number of changes in patch 5. See the commit message for details.

Maarten Lankhorst (1):
  drm/i915: Add support for moving fence waiting

Thomas Hellström (5):
  drm/i915/ttm: Move the i915_gem_obj_copy_ttm() function
  drm/i915/ttm: Drop region reference counting
  drm/i915/ttm: Correctly handle waiting for gpu when shrinking
  drm/i915/ttm: Implement asynchronous TTM moves
  drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous

 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  52 +++
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   6 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |   6 +
 drivers/gpu/drm/i915/gem/i915_gem_region.c    |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  89 ++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 405 ++++++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h  |  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c    |   3 +
 drivers/gpu/drm/i915/gem/i915_gem_wait.c      |   4 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  10 +-
 drivers/gpu/drm/i915/i915_vma.c               |  36 +-
 drivers/gpu/drm/i915/intel_memory_region.c    |  26 +-
 drivers/gpu/drm/i915/intel_memory_region.h    |   9 +-
 drivers/gpu/drm/i915/intel_region_ttm.c       |  35 +-
 drivers/gpu/drm/i915/intel_region_ttm.h       |   2 +-
 .../drm/i915/selftests/intel_memory_region.c  |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |   7 +-
 23 files changed, 588 insertions(+), 143 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread


* [PATCH v4 1/6] drm/i915: Add support for moving fence waiting
  2021-11-18 13:02 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-18 13:02   ` Thomas Hellström
  -1 siblings, 0 replies; 31+ messages in thread
From: Thomas Hellström @ 2021-11-18 13:02 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

For now, we will only allow async migration when TTM is used,
so the paths we care about are related to TTM.

The mmap path is handled by having the fence in ttm_bo->moving. When
pinning, the binding only becomes available after the moving fence is
signaled, and pinning a CPU map will only work after the moving fence
signals.

This should close all holes where userspace can read a buffer
before it's fully migrated.
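
As a hedged caller-side illustration (not part of this patch), code
that needs CPU access to the pages would use the new interface roughly
as below, under the object lock; the function name is hypothetical:

/*
 * Hypothetical sketch: block on any pending async move before touching
 * the object's pages from the CPU. Uses only the interfaces added by
 * this patch plus the existing object lock helpers.
 */
static int example_cpu_access(struct drm_i915_gem_object *obj)
{
	int err;

	err = i915_gem_object_lock_interruptible(obj, NULL);
	if (err)
		return err;

	/* Returns 0 once ttm_bo->moving has signaled, or its error. */
	err = i915_gem_object_wait_moving_fence(obj, true);

	i915_gem_object_unlock(obj);
	return err;
}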

v2:
- Fix a couple of SPARSE warnings
v3:
- Fix a NULL pointer dereference
v4:
- Ditch the moving fence waiting for i915_vma_pin_iomap() and
  replace with a verification that the vma is already bound.
  (Matthew Auld)
- Squash with a previous patch introducing moving fence waiting and
  accessing interfaces (Matthew Auld)
- Rename to indicate that we also add support for sync waiting.

Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 52 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_object.h |  6 +++
 drivers/gpu/drm/i915/gem/i915_gem_pages.c  |  6 +++
 drivers/gpu/drm/i915/i915_vma.c            | 36 ++++++++++++++-
 4 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 591ee3cb7275..7b7d9415c9cb 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -33,6 +33,7 @@
 #include "i915_gem_object.h"
 #include "i915_memcpy.h"
 #include "i915_trace.h"
+#include "i915_gem_ttm.h"
 
 static struct kmem_cache *slab_objects;
 
@@ -726,6 +727,57 @@ static const struct drm_gem_object_funcs i915_gem_object_funcs = {
 	.export = i915_gem_prime_export,
 };
 
+/**
+ * i915_gem_object_get_moving_fence - Get the object's moving fence if any
+ * @obj: The object whose moving fence to get.
+ *
+ * A non-signaled moving fence means that there is an async operation
+ * pending on the object that needs to be waited on before setting up
+ * any GPU- or CPU PTEs to the object's pages.
+ *
+ * Return: A refcounted pointer to the object's moving fence if any,
+ * NULL otherwise.
+ */
+struct dma_fence *
+i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj)
+{
+	return dma_fence_get(i915_gem_to_ttm(obj)->moving);
+}
+
+/**
+ * i915_gem_object_wait_moving_fence - Wait for the object's moving fence if any
+ * @obj: The object whose moving fence to wait for.
+ * @intr: Whether to wait interruptible.
+ *
+ * If the moving fence signaled without an error, it is detached from the
+ * object and put.
+ *
+ * Return: 0 if successful, -ERESTARTSYS if the wait was interrupted,
+ * negative error code if the async operation represented by the
+ * moving fence failed.
+ */
+int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
+				      bool intr)
+{
+	struct dma_fence *fence = i915_gem_to_ttm(obj)->moving;
+	int ret;
+
+	assert_object_held(obj);
+	if (!fence)
+		return 0;
+
+	ret = dma_fence_wait(fence, intr);
+	if (ret)
+		return ret;
+
+	if (fence->error)
+		return fence->error;
+
+	i915_gem_to_ttm(obj)->moving = NULL;
+	dma_fence_put(fence);
+	return 0;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/huge_gem_object.c"
 #include "selftests/huge_pages.c"
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 133963b46135..66f20b803b01 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -517,6 +517,12 @@ i915_gem_object_finish_access(struct drm_i915_gem_object *obj)
 	i915_gem_object_unpin_pages(obj);
 }
 
+struct dma_fence *
+i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj);
+
+int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
+				      bool intr);
+
 void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 					 unsigned int cache_level);
 bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index c4f684b7cc51..49c6e55c68ce 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -418,6 +418,12 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 	}
 
 	if (!ptr) {
+		err = i915_gem_object_wait_moving_fence(obj, true);
+		if (err) {
+			ptr = ERR_PTR(err);
+			goto err_unpin;
+		}
+
 		if (GEM_WARN_ON(type == I915_MAP_WC &&
 				!static_cpu_has(X86_FEATURE_PAT)))
 			ptr = ERR_PTR(-ENODEV);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 0896656896d0..00d3daf72329 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -354,6 +354,25 @@ int i915_vma_wait_for_bind(struct i915_vma *vma)
 	return err;
 }
 
+#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
+static int i915_vma_verify_bind_complete(struct i915_vma *vma)
+{
+	int err = 0;
+
+	if (i915_active_has_exclusive(&vma->active)) {
+		struct dma_fence *fence =
+			i915_active_fence_get(&vma->active.excl);
+
+		if (dma_fence_is_signaled(fence))
+			err = fence->error;
+		else
+			err = -EBUSY;
+	}
+
+	return err;
+}
+#endif
+
 /**
  * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space.
  * @vma: VMA to map
@@ -428,6 +447,13 @@ int i915_vma_bind(struct i915_vma *vma,
 			work->pinned = i915_gem_object_get(vma->obj);
 		}
 	} else {
+		if (vma->obj) {
+			int ret;
+
+			ret = i915_gem_object_wait_moving_fence(vma->obj, true);
+			if (ret)
+				return ret;
+		}
 		vma->ops->bind_vma(vma->vm, NULL, vma, cache_level, bind_flags);
 	}
 
@@ -449,6 +475,7 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
 
 	GEM_BUG_ON(!i915_vma_is_ggtt(vma));
 	GEM_BUG_ON(!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND));
+	GEM_BUG_ON(i915_vma_verify_bind_complete(vma));
 
 	ptr = READ_ONCE(vma->iomap);
 	if (ptr == NULL) {
@@ -867,6 +894,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 		    u64 size, u64 alignment, u64 flags)
 {
 	struct i915_vma_work *work = NULL;
+	struct dma_fence *moving = NULL;
 	intel_wakeref_t wakeref = 0;
 	unsigned int bound;
 	int err;
@@ -892,7 +920,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 	if (flags & PIN_GLOBAL)
 		wakeref = intel_runtime_pm_get(&vma->vm->i915->runtime_pm);
 
-	if (flags & vma->vm->bind_async_flags) {
+	moving = vma->obj ? i915_gem_object_get_moving_fence(vma->obj) : NULL;
+	if (flags & vma->vm->bind_async_flags || moving) {
 		/* lock VM */
 		err = i915_vm_lock_objects(vma->vm, ww);
 		if (err)
@@ -906,6 +935,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 
 		work->vm = i915_vm_get(vma->vm);
 
+		dma_fence_work_chain(&work->base, moving);
+
 		/* Allocate enough page directories to used PTE */
 		if (vma->vm->allocate_va_range) {
 			err = i915_vm_alloc_pt_stash(vma->vm,
@@ -1010,7 +1041,10 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 err_rpm:
 	if (wakeref)
 		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
+	if (moving)
+		dma_fence_put(moving);
 	vma_put_pages(vma);
+
 	return err;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread


* [PATCH v4 2/6] drm/i915/ttm: Move the i915_gem_obj_copy_ttm() function
  2021-11-18 13:02 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-18 13:02   ` Thomas Hellström
  -1 siblings, 0 replies; 31+ messages in thread
From: Thomas Hellström @ 2021-11-18 13:02 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

Move the i915_gem_obj_copy_ttm() function to i915_gem_ttm_move.c and
its declaration to i915_gem_ttm_move.h.
This will help keep a number of functions static when introducing
async moves.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 47 ---------------
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |  4 --
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 63 ++++++++++++++++----
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h | 10 ++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c   |  1 +
 5 files changed, 56 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 68cfe6e9ceab..537a81445b90 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1063,50 +1063,3 @@ i915_gem_ttm_system_setup(struct drm_i915_private *i915,
 	intel_memory_region_set_name(mr, "system-ttm");
 	return mr;
 }
-
-/**
- * i915_gem_obj_copy_ttm - Copy the contents of one ttm-based gem object to
- * another
- * @dst: The destination object
- * @src: The source object
- * @allow_accel: Allow using the blitter. Otherwise TTM memcpy is used.
- * @intr: Whether to perform waits interruptible:
- *
- * Note: The caller is responsible for assuring that the underlying
- * TTM objects are populated if needed and locked.
- *
- * Return: Zero on success. Negative error code on error. If @intr == true,
- * then it may return -ERESTARTSYS or -EINTR.
- */
-int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
-			  struct drm_i915_gem_object *src,
-			  bool allow_accel, bool intr)
-{
-	struct ttm_buffer_object *dst_bo = i915_gem_to_ttm(dst);
-	struct ttm_buffer_object *src_bo = i915_gem_to_ttm(src);
-	struct ttm_operation_ctx ctx = {
-		.interruptible = intr,
-	};
-	struct i915_refct_sgt *dst_rsgt;
-	int ret;
-
-	assert_object_held(dst);
-	assert_object_held(src);
-
-	/*
-	 * Sync for now. This will change with async moves.
-	 */
-	ret = ttm_bo_wait_ctx(dst_bo, &ctx);
-	if (!ret)
-		ret = ttm_bo_wait_ctx(src_bo, &ctx);
-	if (ret)
-		return ret;
-
-	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
-	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
-			dst_rsgt, allow_accel);
-
-	i915_refct_sgt_put(dst_rsgt);
-
-	return 0;
-}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index 074a7c08ff31..82cdabb542be 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -49,10 +49,6 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 			       resource_size_t page_size,
 			       unsigned int flags);
 
-int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
-			  struct drm_i915_gem_object *src,
-			  bool allow_accel, bool intr);
-
 /* Internal I915 TTM declarations and definitions below. */
 
 #define I915_PL_LMEM0 TTM_PL_PRIV
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index ef22d4ed66ad..f35b386c56ca 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -378,18 +378,10 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
 	return &work->fence;
 }
 
-/**
- * __i915_ttm_move - helper to perform TTM moves or clears.
- * @bo: The source buffer object.
- * @clear: Whether this is a clear operation.
- * @dst_mem: The destination ttm resource.
- * @dst_ttm: The destination ttm page vector.
- * @dst_rsgt: The destination refcounted sg-list.
- * @allow_accel: Whether to allow acceleration.
- */
-void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
-		     struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm,
-		     struct i915_refct_sgt *dst_rsgt, bool allow_accel)
+static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
+			    struct ttm_resource *dst_mem,
+			    struct ttm_tt *dst_ttm,
+			    struct i915_refct_sgt *dst_rsgt, bool allow_accel)
 {
 	struct i915_ttm_memcpy_work *copy_work = NULL;
 	struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
@@ -521,3 +513,50 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	i915_ttm_adjust_gem_after_move(obj);
 	return 0;
 }
+
+/**
+ * i915_gem_obj_copy_ttm - Copy the contents of one ttm-based gem object to
+ * another
+ * @dst: The destination object
+ * @src: The source object
+ * @allow_accel: Allow using the blitter. Otherwise TTM memcpy is used.
+ * @intr: Whether to perform waits interruptible:
+ *
+ * Note: The caller is responsible for assuring that the underlying
+ * TTM objects are populated if needed and locked.
+ *
+ * Return: Zero on success. Negative error code on error. If @intr == true,
+ * then it may return -ERESTARTSYS or -EINTR.
+ */
+int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
+			  struct drm_i915_gem_object *src,
+			  bool allow_accel, bool intr)
+{
+	struct ttm_buffer_object *dst_bo = i915_gem_to_ttm(dst);
+	struct ttm_buffer_object *src_bo = i915_gem_to_ttm(src);
+	struct ttm_operation_ctx ctx = {
+		.interruptible = intr,
+	};
+	struct i915_refct_sgt *dst_rsgt;
+	int ret;
+
+	assert_object_held(dst);
+	assert_object_held(src);
+
+	/*
+	 * Sync for now. This will change with async moves.
+	 */
+	ret = ttm_bo_wait_ctx(dst_bo, &ctx);
+	if (!ret)
+		ret = ttm_bo_wait_ctx(src_bo, &ctx);
+	if (ret)
+		return ret;
+
+	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
+	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
+			dst_rsgt, allow_accel);
+
+	i915_refct_sgt_put(dst_rsgt);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
index 75b87e752af2..d2e7f149e05c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
@@ -23,13 +23,11 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo);
 I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 							      bool work_allocation));
 
-/* Internal I915 TTM declarations and definitions below. */
+int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
+			  struct drm_i915_gem_object *src,
+			  bool allow_accel, bool intr);
 
-void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
-		     struct ttm_resource *dst_mem,
-		     struct ttm_tt *dst_ttm,
-		     struct i915_refct_sgt *dst_rsgt,
-		     bool allow_accel);
+/* Internal I915 TTM declarations and definitions below. */
 
 int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 		  struct ttm_operation_ctx *ctx,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
index 3b6d14b5c604..60d10ab55d1e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -12,6 +12,7 @@
 
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
+#include "gem/i915_gem_ttm_move.h"
 #include "gem/i915_gem_ttm_pm.h"
 
 /**
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread


* [PATCH v4 3/6] drm/i915/ttm: Drop region reference counting
  2021-11-18 13:02 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-18 13:02   ` Thomas Hellström
  -1 siblings, 0 replies; 31+ messages in thread
From: Thomas Hellström @ 2021-11-18 13:02 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

There is an interesting refcounting loop:
struct intel_memory_region has a struct ttm_resource_manager,
ttm_resource_manager->move may hold a reference to i915_request,
i915_request may hold a reference to intel_context,
intel_context may hold a reference to drm_i915_gem_object,
drm_i915_gem_object may hold a reference to intel_memory_region.

Break this loop by dropping region reference counting.

In addition, have regions with a manager moving fence make sure that
all region objects are released before freeing the region.
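
To make the new release contract concrete (a sketch, not code from this
patch): a backend release callback now returns an error code, and
intel_memory_region_destroy() only frees the region when release
succeeds. The TTM-backed system region below simply points .release at
intel_region_ttm_fini(); a hypothetical wrapper doing the same would
look like:

/*
 * Hypothetical sketch of a region backend release callback under the
 * new int-returning contract. Propagating -EBUSY from
 * intel_region_ttm_fini() makes intel_memory_region_destroy() skip the
 * final kfree(), leaking rather than risking a use-after-free.
 */
static int example_region_release(struct intel_memory_region *mem)
{
	return intel_region_ttm_fini(mem);
}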

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_region.c    |  4 +--
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  6 ++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  3 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   | 10 ++++--
 drivers/gpu/drm/i915/intel_memory_region.c    | 26 ++++----------
 drivers/gpu/drm/i915/intel_memory_region.h    |  9 ++---
 drivers/gpu/drm/i915/intel_region_ttm.c       | 35 +++++++++++++++++--
 drivers/gpu/drm/i915/intel_region_ttm.h       |  2 +-
 .../drm/i915/selftests/intel_memory_region.c  |  8 ++---
 drivers/gpu/drm/i915/selftests/mock_region.c  |  7 ++--
 12 files changed, 69 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c
index a016ccec36f3..a4350227e9ae 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_region.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c
@@ -11,7 +11,7 @@
 void i915_gem_object_init_memory_region(struct drm_i915_gem_object *obj,
 					struct intel_memory_region *mem)
 {
-	obj->mm.region = intel_memory_region_get(mem);
+	obj->mm.region = mem;
 
 	mutex_lock(&mem->objects.lock);
 	list_add(&obj->mm.region_link, &mem->objects.list);
@@ -25,8 +25,6 @@ void i915_gem_object_release_memory_region(struct drm_i915_gem_object *obj)
 	mutex_lock(&mem->objects.lock);
 	list_del(&obj->mm.region_link);
 	mutex_unlock(&mem->objects.lock);
-
-	intel_memory_region_put(mem);
 }
 
 struct drm_i915_gem_object *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 4a88c89b7a14..cc9fe258fba7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -664,9 +664,10 @@ static int init_shmem(struct intel_memory_region *mem)
 	return 0; /* Don't error, we can simply fallback to the kernel mnt */
 }
 
-static void release_shmem(struct intel_memory_region *mem)
+static int release_shmem(struct intel_memory_region *mem)
 {
 	i915_gemfs_fini(mem->i915);
+	return 0;
 }
 
 static const struct intel_memory_region_ops shmem_region_ops = {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index ddd37ccb1362..80680395bb3b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -720,9 +720,10 @@ static int init_stolen_smem(struct intel_memory_region *mem)
 	return i915_gem_init_stolen(mem);
 }
 
-static void release_stolen_smem(struct intel_memory_region *mem)
+static int release_stolen_smem(struct intel_memory_region *mem)
 {
 	i915_gem_cleanup_stolen(mem->i915);
+	return 0;
 }
 
 static const struct intel_memory_region_ops i915_region_stolen_smem_ops = {
@@ -759,10 +760,11 @@ static int init_stolen_lmem(struct intel_memory_region *mem)
 	return err;
 }
 
-static void release_stolen_lmem(struct intel_memory_region *mem)
+static int release_stolen_lmem(struct intel_memory_region *mem)
 {
 	io_mapping_fini(&mem->iomap);
 	i915_gem_cleanup_stolen(mem->i915);
+	return 0;
 }
 
 static const struct intel_memory_region_ops i915_region_stolen_lmem_ops = {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 537a81445b90..350bf1a23db5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -997,7 +997,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_gem_object_init(obj, &i915_gem_ttm_obj_ops, &lock_class, flags);
 
 	/* Don't put on a region list until we're either locked or fully initialized. */
-	obj->mm.region = intel_memory_region_get(mem);
+	obj->mm.region = mem;
 	INIT_LIST_HEAD(&obj->mm.region_link);
 
 	INIT_RADIX_TREE(&obj->ttm.get_io_page.radix, GFP_KERNEL | __GFP_NOWARN);
@@ -1044,6 +1044,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 
 static const struct intel_memory_region_ops ttm_system_region_ops = {
 	.init_object = __i915_gem_ttm_object_init,
+	.release = intel_region_ttm_fini,
 };
 
 struct intel_memory_region *
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 257588b68adc..c69c7d45aabc 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -568,7 +568,7 @@ static int igt_mock_memory_region_huge_pages(void *arg)
 out_put:
 	i915_gem_object_put(obj);
 out_region:
-	intel_memory_region_put(mem);
+	intel_memory_region_destroy(mem);
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index aec838ecb2ef..9ea49e0a27c0 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -66,12 +66,16 @@ static void release_fake_lmem_bar(struct intel_memory_region *mem)
 			   DMA_ATTR_FORCE_CONTIGUOUS);
 }
 
-static void
+static int
 region_lmem_release(struct intel_memory_region *mem)
 {
-	intel_region_ttm_fini(mem);
+	int ret;
+
+	ret = intel_region_ttm_fini(mem);
 	io_mapping_fini(&mem->iomap);
 	release_fake_lmem_bar(mem);
+
+	return ret;
 }
 
 static int
@@ -231,7 +235,7 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 	return mem;
 
 err_region_put:
-	intel_memory_region_put(mem);
+	intel_memory_region_destroy(mem);
 	return ERR_PTR(err);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index e7f7e6627750..b43121609e25 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -126,7 +126,6 @@ intel_memory_region_create(struct drm_i915_private *i915,
 			goto err_free;
 	}
 
-	kref_init(&mem->kref);
 	return mem;
 
 err_free:
@@ -144,28 +143,17 @@ void intel_memory_region_set_name(struct intel_memory_region *mem,
 	va_end(ap);
 }
 
-static void __intel_memory_region_destroy(struct kref *kref)
+void intel_memory_region_destroy(struct intel_memory_region *mem)
 {
-	struct intel_memory_region *mem =
-		container_of(kref, typeof(*mem), kref);
+	int ret = 0;
 
 	if (mem->ops->release)
-		mem->ops->release(mem);
+		ret = mem->ops->release(mem);
 
+	GEM_WARN_ON(!list_empty_careful(&mem->objects.list));
 	mutex_destroy(&mem->objects.lock);
-	kfree(mem);
-}
-
-struct intel_memory_region *
-intel_memory_region_get(struct intel_memory_region *mem)
-{
-	kref_get(&mem->kref);
-	return mem;
-}
-
-void intel_memory_region_put(struct intel_memory_region *mem)
-{
-	kref_put(&mem->kref, __intel_memory_region_destroy);
+	if (!ret)
+		kfree(mem);
 }
 
 /* Global memory region registration -- only slight layer inversions! */
@@ -234,7 +222,7 @@ void intel_memory_regions_driver_release(struct drm_i915_private *i915)
 			fetch_and_zero(&i915->mm.regions[i]);
 
 		if (region)
-			intel_memory_region_put(region);
+			intel_memory_region_destroy(region);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 3feae3353d33..5625c9c38993 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -6,7 +6,6 @@
 #ifndef __INTEL_MEMORY_REGION_H__
 #define __INTEL_MEMORY_REGION_H__
 
-#include <linux/kref.h>
 #include <linux/ioport.h>
 #include <linux/mutex.h>
 #include <linux/io-mapping.h>
@@ -51,7 +50,7 @@ struct intel_memory_region_ops {
 	unsigned int flags;
 
 	int (*init)(struct intel_memory_region *mem);
-	void (*release)(struct intel_memory_region *mem);
+	int (*release)(struct intel_memory_region *mem);
 
 	int (*init_object)(struct intel_memory_region *mem,
 			   struct drm_i915_gem_object *obj,
@@ -71,8 +70,6 @@ struct intel_memory_region {
 	/* For fake LMEM */
 	struct drm_mm_node fake_mappable;
 
-	struct kref kref;
-
 	resource_size_t io_start;
 	resource_size_t min_page_size;
 	resource_size_t total;
@@ -110,9 +107,7 @@ intel_memory_region_create(struct drm_i915_private *i915,
 			   u16 instance,
 			   const struct intel_memory_region_ops *ops);
 
-struct intel_memory_region *
-intel_memory_region_get(struct intel_memory_region *mem);
-void intel_memory_region_put(struct intel_memory_region *mem);
+void intel_memory_region_destroy(struct intel_memory_region *mem);
 
 int intel_memory_regions_hw_probe(struct drm_i915_private *i915);
 void intel_memory_regions_driver_release(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 2e901a27e259..f9bda8989074 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -104,14 +104,45 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
  * memory region, and if it was registered with the TTM device,
  * removes that registration.
  */
-void intel_region_ttm_fini(struct intel_memory_region *mem)
+int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
-	int ret;
+	struct ttm_resource_manager *man = mem->region_private;
+	int ret = -EBUSY;
+	int count;
+
+	/*
+	 * Put the region's move fences. This releases requests that
+	 * may hold on to contexts and vms that may hold on to buffer
+	 * objects that may have a refcount on the region. :/
+	 */
+	if (man)
+		ttm_resource_manager_cleanup(man);
+
+	/* Flush objects from region. */
+	for (count = 0; count < 10; ++count) {
+		i915_gem_flush_free_objects(mem->i915);
+
+		mutex_lock(&mem->objects.lock);
+		if (list_empty(&mem->objects.list))
+			ret = 0;
+		mutex_unlock(&mem->objects.lock);
+		if (!ret)
+			break;
+
+		msleep(20);
+		flush_delayed_work(&mem->i915->bdev.wq);
+	}
+
+	/* If we leaked objects, Don't free the region causing use after free */
+	if (ret || !man)
+		return ret;
 
 	ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
 				      intel_region_to_ttm_type(mem));
 	GEM_WARN_ON(ret);
 	mem->region_private = NULL;
+
+	return ret;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
index 7bbe2b46b504..fdee5e7bd46c 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.h
+++ b/drivers/gpu/drm/i915/intel_region_ttm.h
@@ -20,7 +20,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private *dev_priv);
 
 int intel_region_ttm_init(struct intel_memory_region *mem);
 
-void intel_region_ttm_fini(struct intel_memory_region *mem);
+int intel_region_ttm_fini(struct intel_memory_region *mem);
 
 struct i915_refct_sgt *
 intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index 418caae84759..0d5df0dc7212 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -225,7 +225,7 @@ static int igt_mock_reserve(void *arg)
 
 out_close:
 	close_objects(mem, &objects);
-	intel_memory_region_put(mem);
+	intel_memory_region_destroy(mem);
 out_free_order:
 	kfree(order);
 	return err;
@@ -439,7 +439,7 @@ static int igt_mock_splintered_region(void *arg)
 out_close:
 	close_objects(mem, &objects);
 out_put:
-	intel_memory_region_put(mem);
+	intel_memory_region_destroy(mem);
 	return err;
 }
 
@@ -507,7 +507,7 @@ static int igt_mock_max_segment(void *arg)
 out_close:
 	close_objects(mem, &objects);
 out_put:
-	intel_memory_region_put(mem);
+	intel_memory_region_destroy(mem);
 	return err;
 }
 
@@ -1196,7 +1196,7 @@ int intel_memory_region_mock_selftests(void)
 
 	err = i915_subtests(tests, mem);
 
-	intel_memory_region_put(mem);
+	intel_memory_region_destroy(mem);
 out_unref:
 	mock_destroy_device(i915);
 	return err;
diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c
index 7ec5037eaa58..19bff8afcaaa 100644
--- a/drivers/gpu/drm/i915/selftests/mock_region.c
+++ b/drivers/gpu/drm/i915/selftests/mock_region.c
@@ -84,13 +84,16 @@ static int mock_object_init(struct intel_memory_region *mem,
 	return 0;
 }
 
-static void mock_region_fini(struct intel_memory_region *mem)
+static int mock_region_fini(struct intel_memory_region *mem)
 {
 	struct drm_i915_private *i915 = mem->i915;
 	int instance = mem->instance;
+	int ret;
 
-	intel_region_ttm_fini(mem);
+	ret = intel_region_ttm_fini(mem);
 	ida_free(&i915->selftest.mock_region_instances, instance);
+
+	return ret;
 }
 
 static const struct intel_memory_region_ops mock_region_ops = {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -104,14 +104,45 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
  * memory region, and if it was registered with the TTM device,
  * removes that registration.
  */
-void intel_region_ttm_fini(struct intel_memory_region *mem)
+int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
-	int ret;
+	struct ttm_resource_manager *man = mem->region_private;
+	int ret = -EBUSY;
+	int count;
+
+	/*
+	 * Put the region's move fences. This releases requests that
+	 * may hold on to contexts and vms that may hold on to buffer
+	 * objects that may have a refcount on the region. :/
+	 */
+	if (man)
+		ttm_resource_manager_cleanup(man);
+
+	/* Flush objects from region. */
+	for (count = 0; count < 10; ++count) {
+		i915_gem_flush_free_objects(mem->i915);
+
+		mutex_lock(&mem->objects.lock);
+		if (list_empty(&mem->objects.list))
+			ret = 0;
+		mutex_unlock(&mem->objects.lock);
+		if (!ret)
+			break;
+
+		msleep(20);
+		flush_delayed_work(&mem->i915->bdev.wq);
+	}
+
+	/* If we leaked objects, don't free the region; that would cause a use-after-free */
+	if (ret || !man)
+		return ret;
 
 	ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
 				      intel_region_to_ttm_type(mem));
 	GEM_WARN_ON(ret);
 	mem->region_private = NULL;
+
+	return ret;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
index 7bbe2b46b504..fdee5e7bd46c 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.h
+++ b/drivers/gpu/drm/i915/intel_region_ttm.h
@@ -20,7 +20,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private *dev_priv);
 
 int intel_region_ttm_init(struct intel_memory_region *mem);
 
-void intel_region_ttm_fini(struct intel_memory_region *mem);
+int intel_region_ttm_fini(struct intel_memory_region *mem);
 
 struct i915_refct_sgt *
 intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index 418caae84759..0d5df0dc7212 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -225,7 +225,7 @@ static int igt_mock_reserve(void *arg)
 
 out_close:
 	close_objects(mem, &objects);
-	intel_memory_region_put(mem);
+	intel_memory_region_destroy(mem);
 out_free_order:
 	kfree(order);
 	return err;
@@ -439,7 +439,7 @@ static int igt_mock_splintered_region(void *arg)
 out_close:
 	close_objects(mem, &objects);
 out_put:
-	intel_memory_region_put(mem);
+	intel_memory_region_destroy(mem);
 	return err;
 }
 
@@ -507,7 +507,7 @@ static int igt_mock_max_segment(void *arg)
 out_close:
 	close_objects(mem, &objects);
 out_put:
-	intel_memory_region_put(mem);
+	intel_memory_region_destroy(mem);
 	return err;
 }
 
@@ -1196,7 +1196,7 @@ int intel_memory_region_mock_selftests(void)
 
 	err = i915_subtests(tests, mem);
 
-	intel_memory_region_put(mem);
+	intel_memory_region_destroy(mem);
 out_unref:
 	mock_destroy_device(i915);
 	return err;
diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c
index 7ec5037eaa58..19bff8afcaaa 100644
--- a/drivers/gpu/drm/i915/selftests/mock_region.c
+++ b/drivers/gpu/drm/i915/selftests/mock_region.c
@@ -84,13 +84,16 @@ static int mock_object_init(struct intel_memory_region *mem,
 	return 0;
 }
 
-static void mock_region_fini(struct intel_memory_region *mem)
+static int mock_region_fini(struct intel_memory_region *mem)
 {
 	struct drm_i915_private *i915 = mem->i915;
 	int instance = mem->instance;
+	int ret;
 
-	intel_region_ttm_fini(mem);
+	ret = intel_region_ttm_fini(mem);
 	ida_free(&i915->selftest.mock_region_instances, instance);
+
+	return ret;
 }
 
 static const struct intel_memory_region_ops mock_region_ops = {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 4/6] drm/i915/ttm: Correctly handle waiting for gpu when shrinking
  2021-11-18 13:02 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-18 13:02   ` Thomas Hellström
  -1 siblings, 0 replies; 31+ messages in thread
From: Thomas Hellström @ 2021-11-18 13:02 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

With async migration, the shrinker may end up wanting to release the
pages of an object while the migration blit is still running, since
the GT migration code doesn't set up VMAs and the shrinker is thus
oblivious to the fact that the GPU is still using the pages.

Add a gpu wait in the shrinker_release_pages() op, along with an
argument to that function indicating whether the shrinker expects it
not to wait for the gpu. In the latter case the shrinker_release_pages()
op will return -EBUSY if the object is not idle.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c     | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c          | 7 ++++++-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 604ed5ad77f5..f9f7e44099fe 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -59,6 +59,7 @@ struct drm_i915_gem_object_ops {
 	int (*truncate)(struct drm_i915_gem_object *obj);
 	void (*writeback)(struct drm_i915_gem_object *obj);
 	int (*shrinker_release_pages)(struct drm_i915_gem_object *obj,
+				      bool no_gpu_wait,
 				      bool should_writeback);
 
 	int (*pread)(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index dde0a5c232f8..8b4b5f3a432a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -60,6 +60,7 @@ static int try_to_writeback(struct drm_i915_gem_object *obj, unsigned int flags)
 {
 	if (obj->ops->shrinker_release_pages)
 		return obj->ops->shrinker_release_pages(obj,
+							!(flags & I915_SHRINK_ACTIVE),
 							flags & I915_SHRINK_WRITEBACK);
 
 	switch (obj->mm.madv) {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 350bf1a23db5..e37157b080e4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -418,6 +418,7 @@ int i915_ttm_purge(struct drm_i915_gem_object *obj)
 }
 
 static int i915_ttm_shrinker_release_pages(struct drm_i915_gem_object *obj,
+					   bool no_wait_gpu,
 					   bool should_writeback)
 {
 	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
@@ -425,7 +426,7 @@ static int i915_ttm_shrinker_release_pages(struct drm_i915_gem_object *obj,
 		container_of(bo->ttm, typeof(*i915_tt), ttm);
 	struct ttm_operation_ctx ctx = {
 		.interruptible = true,
-		.no_wait_gpu = false,
+		.no_wait_gpu = no_wait_gpu,
 	};
 	struct ttm_placement place = {};
 	int ret;
@@ -438,6 +439,10 @@ static int i915_ttm_shrinker_release_pages(struct drm_i915_gem_object *obj,
 	if (!i915_tt->filp)
 		return 0;
 
+	ret = ttm_bo_wait_ctx(bo, &ctx);
+	if (ret)
+		return ret;
+
 	switch (obj->mm.madv) {
 	case I915_MADV_DONTNEED:
 		return i915_ttm_purge(obj);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 5/6] drm/i915/ttm: Implement asynchronous TTM moves
  2021-11-18 13:02 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-18 13:02   ` Thomas Hellström
  -1 siblings, 0 replies; 31+ messages in thread
From: Thomas Hellström @ 2021-11-18 13:02 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

Don't wait synchronously while migrating, but rather make the GPU blit
await the dependencies and add a moving fence to the object.

This also enables asynchronous VRAM management: on eviction, rather than
waiting for the moving fence to signal before freeing VRAM, the VRAM is
freed immediately, and the fence is stored with the VRAM manager and
handed out to newly allocated objects, which await it before clears and
swap-ins, or, for kernel objects, before setting up gpu vmas or mappings.

To collect dependencies before migrating, add a set of utilities that
coalesce these to a single dma_fence.

What is still missing for fully asynchronous operation is asynchronous vma
unbinding, which remains to be implemented.

This commit substantially reduces execution time in the gem_lmem_swapping
test.

v2:
- Make a couple of functions static.
v4:
- Fix some style issues (Matthew Auld)
- Audit and add more checks for ghost objects (Matthew Auld)
- Add more documentation for the i915_deps utility (Matthew Auld)
- Simplify the i915_deps_sync() function

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c      |  32 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 338 +++++++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_wait.c     |   4 +-
 4 files changed, 344 insertions(+), 32 deletions(-)
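
As an overview, the flow that i915_ttm_move() implements after this patch
boils down to roughly the following (a condensed sketch reusing the names
from the hunks below; locking, ghost-object checks and error handling are
omitted):

	struct i915_deps deps;
	struct dma_fence *dep, *fence;

	/* Collect the previous moving fence and the reservation fences. */
	i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
	i915_deps_add_dependency(&deps, bo->moving, ctx);
	i915_deps_add_resv(&deps, bo->base.resv, false, false, ctx);

	/* Coalesce into a single fence; a dma_fence_array if more than one. */
	dep = i915_deps_to_fence(&deps, ctx);

	/* The blit awaits @dep. Its own fence is handed to TTM instead of
	 * being waited upon, which is what makes the move asynchronous. */
	fence = __i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_rsgt, true, dep);
	ttm_bo_move_accel_cleanup(bo, fence, evict, true, dst_mem);
	dma_fence_put(dep);
	dma_fence_put(fence);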

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index e37157b080e4..81e84c1763de 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -248,10 +248,13 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 	struct ttm_resource_manager *man =
 		ttm_manager_type(bo->bdev, bo->resource->mem_type);
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
-	enum ttm_caching caching = i915_ttm_select_tt_caching(obj);
+	enum ttm_caching caching;
 	struct i915_ttm_tt *i915_tt;
 	int ret;
 
+	if (!obj)
+		return NULL;
+
 	i915_tt = kzalloc(sizeof(*i915_tt), GFP_KERNEL);
 	if (!i915_tt)
 		return NULL;
@@ -260,6 +263,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 	    man->use_tt)
 		page_flags |= TTM_TT_FLAG_ZERO_ALLOC;
 
+	caching = i915_ttm_select_tt_caching(obj);
 	if (i915_gem_object_is_shrinkable(obj) && caching == ttm_cached) {
 		page_flags |= TTM_TT_FLAG_EXTERNAL |
 			      TTM_TT_FLAG_EXTERNAL_MAPPABLE;
@@ -326,6 +330,9 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
 {
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
 
+	if (!obj)
+		return false;
+
 	/*
 	 * EXTERNAL objects should never be swapped out by TTM, instead we need
 	 * to handle that ourselves. TTM will already skip such objects for us,
@@ -552,8 +559,12 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 static void i915_ttm_swap_notify(struct ttm_buffer_object *bo)
 {
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
-	int ret = i915_ttm_move_notify(bo);
+	int ret;
 
+	if (!obj)
+		return;
+
+	ret = i915_ttm_move_notify(bo);
 	GEM_WARN_ON(ret);
 	GEM_WARN_ON(obj->ttm.cached_io_rsgt);
 	if (!ret && obj->mm.madv != I915_MADV_WILLNEED)
@@ -575,17 +586,23 @@ static unsigned long i915_ttm_io_mem_pfn(struct ttm_buffer_object *bo,
 					 unsigned long page_offset)
 {
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
-	unsigned long base = obj->mm.region->iomap.base - obj->mm.region->region.start;
 	struct scatterlist *sg;
+	unsigned long base;
 	unsigned int ofs;
 
+	GEM_BUG_ON(!obj);
 	GEM_WARN_ON(bo->ttm);
 
+	base = obj->mm.region->iomap.base - obj->mm.region->region.start;
 	sg = __i915_gem_object_get_sg(obj, &obj->ttm.get_io_page, page_offset, &ofs, true);
 
 	return ((base + sg_dma_address(sg)) >> PAGE_SHIFT) + ofs;
 }
 
+/*
+ * All callbacks need to take care not to downcast a struct ttm_buffer_object
+ * without checking its subclass, since it might be a TTM ghost object.
+ */
 static struct ttm_device_funcs i915_ttm_bo_driver = {
 	.ttm_tt_create = i915_ttm_tt_create,
 	.ttm_tt_populate = i915_ttm_tt_populate,
@@ -847,13 +864,16 @@ static void i915_ttm_delayed_free(struct drm_i915_gem_object *obj)
 static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
 {
 	struct vm_area_struct *area = vmf->vma;
-	struct drm_i915_gem_object *obj =
-		i915_ttm_to_gem(area->vm_private_data);
-	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
+	struct ttm_buffer_object *bo = area->vm_private_data;
 	struct drm_device *dev = bo->base.dev;
+	struct drm_i915_gem_object *obj;
 	vm_fault_t ret;
 	int idx;
 
+	obj = i915_ttm_to_gem(bo);
+	if (!obj)
+		return VM_FAULT_SIGBUS;
+
 	/* Sanity check that we allow writing into this object */
 	if (unlikely(i915_gem_object_is_readonly(obj) &&
 		     area->vm_flags & VM_WRITE))
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index 82cdabb542be..9d698ad00853 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -37,7 +37,7 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo);
 static inline struct drm_i915_gem_object *
 i915_ttm_to_gem(struct ttm_buffer_object *bo)
 {
-	if (GEM_WARN_ON(bo->destroy != i915_ttm_bo_destroy))
+	if (bo->destroy != i915_ttm_bo_destroy)
 		return NULL;
 
 	return container_of(bo, struct drm_i915_gem_object, __do_not_access);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index f35b386c56ca..38623fde170a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,8 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include <linux/dma-fence-array.h>
+
 #include <drm/ttm/ttm_bo_driver.h>
 
 #include "i915_drv.h"
@@ -41,6 +43,234 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 }
 #endif
 
+/**
+ * DOC: Set of utilities to dynamically collect dependencies and
+ * eventually coalesce them into a single fence which is fed into
+ * the GT migration code, since it only accepts a single dependency
+ * fence.
+ * The single fence returned from these utilities is, in the case of
+ * dependencies from multiple fence contexts, a struct dma_fence_array,
+ * since the i915 request code can break that up and await the individual
+ * fences.
+ *
+ * Once we can do async unbinding, this is also needed to coalesce
+ * the migration fence with the unbind fences.
+ *
+ * While collecting the individual dependencies, we store the refcounted
+ * struct dma_fence pointers in a realloc-managed pointer array, since
+ * that can be easily fed into a dma_fence_array. Other options are
+ * available, like for example an xarray for similarity with drm/sched.
+ * Can be changed easily if needed.
+ *
+ * A struct i915_deps needs to be initialized using i915_deps_init().
+ * If i915_deps_add_dependency() or i915_deps_add_resv() return an
+ * error code they will internally call i915_deps_fini(), which frees
+ * all internal references and allocations. After a call to
+ * i915_deps_to_fence(), or i915_deps_sync(), the struct should similarly
+ * be viewed as uninitialized.
+ *
+ * We might want to break this out into a separate file as a utility.
+ */
+
+#define I915_DEPS_MIN_ALLOC_CHUNK 8U
+
+/**
+ * struct i915_deps - Collect dependencies into a single dma-fence
+ * @single: Storage for pointer if the collection is a single fence.
+ * @fences: Allocated array of fence pointers if more than a single fence;
+ * otherwise points to the address of @single.
+ * @num_deps: Current number of dependency fences.
+ * @fences_size: Size of the @fences array in number of pointers.
+ * @gfp: Allocation mode.
+ */
+struct i915_deps {
+	struct dma_fence *single;
+	struct dma_fence **fences;
+	unsigned int num_deps;
+	unsigned int fences_size;
+	gfp_t gfp;
+};
+
+static void i915_deps_reset_fences(struct i915_deps *deps)
+{
+	if (deps->fences != &deps->single)
+		kfree(deps->fences);
+	deps->num_deps = 0;
+	deps->fences_size = 1;
+	deps->fences = &deps->single;
+}
+
+static void i915_deps_init(struct i915_deps *deps, gfp_t gfp)
+{
+	deps->fences = NULL;
+	deps->gfp = gfp;
+	i915_deps_reset_fences(deps);
+}
+
+static void i915_deps_fini(struct i915_deps *deps)
+{
+	unsigned int i;
+
+	for (i = 0; i < deps->num_deps; ++i)
+		dma_fence_put(deps->fences[i]);
+
+	if (deps->fences != &deps->single)
+		kfree(deps->fences);
+}
+
+static int i915_deps_grow(struct i915_deps *deps, struct dma_fence *fence,
+			  const struct ttm_operation_ctx *ctx)
+{
+	int ret;
+
+	if (deps->num_deps >= deps->fences_size) {
+		unsigned int new_size = 2 * deps->fences_size;
+		struct dma_fence **new_fences;
+
+		new_size = max(new_size, I915_DEPS_MIN_ALLOC_CHUNK);
+		new_fences = kmalloc_array(new_size, sizeof(*new_fences), deps->gfp);
+		if (!new_fences)
+			goto sync;
+
+		memcpy(new_fences, deps->fences,
+		       deps->fences_size * sizeof(*new_fences));
+		swap(new_fences, deps->fences);
+		if (new_fences != &deps->single)
+			kfree(new_fences);
+		deps->fences_size = new_size;
+	}
+	deps->fences[deps->num_deps++] = dma_fence_get(fence);
+	return 0;
+
+sync:
+	if (ctx->no_wait_gpu) {
+		ret = -EBUSY;
+		goto unref;
+	}
+
+	ret = dma_fence_wait(fence, ctx->interruptible);
+	if (ret)
+		goto unref;
+
+	ret = fence->error;
+	if (ret)
+		goto unref;
+
+	return 0;
+
+unref:
+	i915_deps_fini(deps);
+	return ret;
+}
+
+static int i915_deps_sync(struct i915_deps *deps,
+			  const struct ttm_operation_ctx *ctx)
+{
+	struct dma_fence **fences = deps->fences;
+	unsigned int i;
+	int ret = 0;
+
+	for (i = 0; i < deps->num_deps; ++i, ++fences) {
+		if (ctx->no_wait_gpu) {
+			ret = -EBUSY;
+			break;
+		}
+
+		ret = dma_fence_wait(*fences, ctx->interruptible);
+		if (!ret)
+			ret = (*fences)->error;
+		if (ret)
+			break;
+	}
+
+	i915_deps_fini(deps);
+	return ret;
+}
+
+static int i915_deps_add_dependency(struct i915_deps *deps,
+				    struct dma_fence *fence,
+				    const struct ttm_operation_ctx *ctx)
+{
+	unsigned int i;
+	int ret;
+
+	if (!fence)
+		return 0;
+
+	if (dma_fence_is_signaled(fence)) {
+		ret = fence->error;
+		if (ret)
+			i915_deps_fini(deps);
+		return ret;
+	}
+
+	for (i = 0; i < deps->num_deps; ++i) {
+		struct dma_fence *entry = deps->fences[i];
+
+		if (!entry->context || entry->context != fence->context)
+			continue;
+
+		if (dma_fence_is_later(fence, entry)) {
+			dma_fence_put(entry);
+			deps->fences[i] = dma_fence_get(fence);
+		}
+
+		return 0;
+	}
+
+	return i915_deps_grow(deps, fence, ctx);
+}
+
+static struct dma_fence *i915_deps_to_fence(struct i915_deps *deps,
+					    const struct ttm_operation_ctx *ctx)
+{
+	struct dma_fence_array *array;
+
+	if (deps->num_deps == 0)
+		return NULL;
+
+	if (deps->num_deps == 1) {
+		deps->num_deps = 0;
+		return deps->fences[0];
+	}
+
+	/*
+	 * TODO: Alter the allocation mode here to not try too hard to
+	 * make things async.
+	 */
+	array = dma_fence_array_create(deps->num_deps, deps->fences, 0, 0,
+				       false);
+	if (!array)
+		return ERR_PTR(i915_deps_sync(deps, ctx));
+
+	deps->fences = NULL;
+	i915_deps_reset_fences(deps);
+
+	return &array->base;
+}
+
+static int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv,
+			      bool all, const bool no_excl,
+			      const struct ttm_operation_ctx *ctx)
+{
+	struct dma_resv_iter iter;
+	struct dma_fence *fence;
+
+	dma_resv_assert_held(resv);
+	dma_resv_for_each_fence(&iter, resv, all, fence) {
+		int ret;
+
+		if (no_excl && !iter.index)
+			continue;
+
+		ret = i915_deps_add_dependency(deps, fence, ctx);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 		     struct ttm_tt *ttm)
@@ -156,7 +386,8 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 					     bool clear,
 					     struct ttm_resource *dst_mem,
 					     struct ttm_tt *dst_ttm,
-					     struct sg_table *dst_st)
+					     struct sg_table *dst_st,
+					     struct dma_fence *dep)
 {
 	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
 						     bdev);
@@ -180,7 +411,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 			return ERR_PTR(-EINVAL);
 
 		intel_engine_pm_get(i915->gt.migrate.context->engine);
-		ret = intel_context_migrate_clear(i915->gt.migrate.context, NULL,
+		ret = intel_context_migrate_clear(i915->gt.migrate.context, dep,
 						  dst_st->sgl, dst_level,
 						  i915_ttm_gtt_binds_lmem(dst_mem),
 						  0, &rq);
@@ -194,7 +425,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
 		intel_engine_pm_get(i915->gt.migrate.context->engine);
 		ret = intel_context_migrate_copy(i915->gt.migrate.context,
-						 NULL, src_rsgt->table.sgl,
+						 dep, src_rsgt->table.sgl,
 						 src_level,
 						 i915_ttm_gtt_binds_lmem(bo->resource),
 						 dst_st->sgl, dst_level,
@@ -378,10 +609,11 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
 	return &work->fence;
 }
 
-static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
-			    struct ttm_resource *dst_mem,
-			    struct ttm_tt *dst_ttm,
-			    struct i915_refct_sgt *dst_rsgt, bool allow_accel)
+static struct dma_fence *
+__i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
+		struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm,
+		struct i915_refct_sgt *dst_rsgt, bool allow_accel,
+		struct dma_fence *move_dep)
 {
 	struct i915_ttm_memcpy_work *copy_work = NULL;
 	struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
@@ -389,7 +621,7 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 
 	if (allow_accel) {
 		fence = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm,
-					    &dst_rsgt->table);
+					    &dst_rsgt->table, move_dep);
 
 		/*
 		 * We only need to intercept the error when moving to lmem.
@@ -423,6 +655,11 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 
 		if (!IS_ERR(fence))
 			goto out;
+	} else if (move_dep) {
+		int err = dma_fence_wait(move_dep, true);
+
+		if (err)
+			return ERR_PTR(err);
 	}
 
 	/* Error intercept failed or no accelerated migration to start with */
@@ -433,16 +670,35 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 	i915_ttm_memcpy_release(arg);
 	kfree(copy_work);
 
-	return;
+	return NULL;
 out:
-	/* Sync here for now, forward the fence to caller when fully async. */
-	if (fence) {
-		dma_fence_wait(fence, false);
-		dma_fence_put(fence);
-	} else if (copy_work) {
+	if (!fence && copy_work) {
 		i915_ttm_memcpy_release(arg);
 		kfree(copy_work);
 	}
+
+	return fence;
+}
+
+static struct dma_fence *prev_fence(struct ttm_buffer_object *bo,
+				    struct ttm_operation_ctx *ctx)
+{
+	struct i915_deps deps;
+	int ret;
+
+	/*
+	 * Instead of trying hard with GFP_KERNEL to allocate memory,
+	 * the dependency collection will just sync if it doesn't
+	 * succeed.
+	 */
+	i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+	ret = i915_deps_add_dependency(&deps, bo->moving, ctx);
+	if (!ret)
+		ret = i915_deps_add_resv(&deps, bo->base.resv, false, false, ctx);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return i915_deps_to_fence(&deps, ctx);
 }
 
 /**
@@ -462,15 +718,16 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
 	struct ttm_resource_manager *dst_man =
 		ttm_manager_type(bo->bdev, dst_mem->mem_type);
+	struct dma_fence *migration_fence = NULL;
 	struct ttm_tt *ttm = bo->ttm;
 	struct i915_refct_sgt *dst_rsgt;
 	bool clear;
 	int ret;
 
-	/* Sync for now. We could do the actual copy async. */
-	ret = ttm_bo_wait_ctx(bo, ctx);
-	if (ret)
-		return ret;
+	if (GEM_WARN_ON(!obj)) {
+		ttm_bo_move_null(bo, dst_mem);
+		return 0;
+	}
 
 	ret = i915_ttm_move_notify(bo);
 	if (ret)
@@ -494,10 +751,37 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 		return PTR_ERR(dst_rsgt);
 
 	clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
-	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)))
-		__i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_rsgt, true);
+	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+		struct dma_fence *dep = prev_fence(bo, ctx);
+
+		if (IS_ERR(dep)) {
+			i915_refct_sgt_put(dst_rsgt);
+			return PTR_ERR(dep);
+		}
+
+		migration_fence = __i915_ttm_move(bo, clear, dst_mem, bo->ttm,
+						  dst_rsgt, true, dep);
+		dma_fence_put(dep);
+	}
+
+	/* We can possibly get an -ERESTARTSYS here */
+	if (IS_ERR(migration_fence)) {
+		i915_refct_sgt_put(dst_rsgt);
+		return PTR_ERR(migration_fence);
+	}
+
+	if (migration_fence) {
+		ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
+						true, dst_mem);
+		if (ret) {
+			dma_fence_wait(migration_fence, false);
+			ttm_bo_move_sync_cleanup(bo, dst_mem);
+		}
+		dma_fence_put(migration_fence);
+	} else {
+		ttm_bo_move_sync_cleanup(bo, dst_mem);
+	}
 
-	ttm_bo_move_sync_cleanup(bo, dst_mem);
 	i915_ttm_adjust_domains_after_move(obj);
 	i915_ttm_free_cached_io_rsgt(obj);
 
@@ -538,6 +822,7 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 		.interruptible = intr,
 	};
 	struct i915_refct_sgt *dst_rsgt;
+	struct dma_fence *copy_fence;
 	int ret;
 
 	assert_object_held(dst);
@@ -553,10 +838,17 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 		return ret;
 
 	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
-	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
-			dst_rsgt, allow_accel);
+	copy_fence = __i915_ttm_move(src_bo, false, dst_bo->resource,
+				     dst_bo->ttm, dst_rsgt, allow_accel, NULL);
 
 	i915_refct_sgt_put(dst_rsgt);
+	if (IS_ERR(copy_fence))
+		return PTR_ERR(copy_fence);
+
+	if (copy_fence) {
+		dma_fence_wait(copy_fence, false);
+		dma_fence_put(copy_fence);
+	}
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index f909aaa09d9c..bae65796a6cc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -306,6 +306,6 @@ int i915_gem_object_wait_migration(struct drm_i915_gem_object *obj,
 				   unsigned int flags)
 {
 	might_sleep();
-	/* NOP for now. */
-	return 0;
+
+	return i915_gem_object_wait_moving_fence(obj, !!(flags & I915_WAIT_INTERRUPTIBLE));
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 6/6] drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous
  2021-11-18 13:02 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-18 13:02   ` Thomas Hellström
  -1 siblings, 0 replies; 31+ messages in thread
From: Thomas Hellström @ 2021-11-18 13:02 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

Update the copy function i915_gem_obj_copy_ttm() to be asynchronous for
future users, and update the only current user to sync the objects
as needed after calling this function.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 40 ++++++++++++++------
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c   |  2 +
 2 files changed, 30 insertions(+), 12 deletions(-)
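
In short, the calling convention after this patch is roughly the following
(a sketch, where @dst and @src are the objects being copied and @ctx is a
struct ttm_operation_ctx, as in the backup/restore hunks below):

	/* The copy only queues the blit and attaches the resulting fence to
	 * the reservation objects; it no longer waits for the data itself. */
	err = i915_gem_obj_copy_ttm(dst, src, allow_accel, false);

	/* Callers that need the destination contents ready must sync. */
	ttm_bo_wait_ctx(i915_gem_to_ttm(dst), &ctx);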

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 38623fde170a..d377c86232f1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -822,33 +822,49 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 		.interruptible = intr,
 	};
 	struct i915_refct_sgt *dst_rsgt;
-	struct dma_fence *copy_fence;
-	int ret;
+	struct dma_fence *copy_fence, *dep_fence;
+	struct i915_deps deps;
+	int ret, shared_err;
 
 	assert_object_held(dst);
 	assert_object_held(src);
+	i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
 
 	/*
-	 * Sync for now. This will change with async moves.
+	 * We plan to add a shared fence only for the source. If that
+	 * fails, we await all source fences before commencing
+	 * the copy instead of only the exclusive one.
 	 */
-	ret = ttm_bo_wait_ctx(dst_bo, &ctx);
+	shared_err = dma_resv_reserve_shared(src_bo->base.resv, 1);
+	ret = i915_deps_add_resv(&deps, dst_bo->base.resv, true, false, &ctx);
 	if (!ret)
-		ret = ttm_bo_wait_ctx(src_bo, &ctx);
+		ret = i915_deps_add_resv(&deps, src_bo->base.resv,
+					 !!shared_err, false, &ctx);
 	if (ret)
 		return ret;
 
+	dep_fence = i915_deps_to_fence(&deps, &ctx);
+	if (IS_ERR(dep_fence))
+		return PTR_ERR(dep_fence);
+
 	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
 	copy_fence = __i915_ttm_move(src_bo, false, dst_bo->resource,
-				     dst_bo->ttm, dst_rsgt, allow_accel, NULL);
+				     dst_bo->ttm, dst_rsgt, allow_accel,
+				     dep_fence);
 
 	i915_refct_sgt_put(dst_rsgt);
-	if (IS_ERR(copy_fence))
-		return PTR_ERR(copy_fence);
+	if (IS_ERR_OR_NULL(copy_fence))
+		return PTR_ERR_OR_ZERO(copy_fence);
 
-	if (copy_fence) {
-		dma_fence_wait(copy_fence, false);
-		dma_fence_put(copy_fence);
-	}
+	dma_resv_add_excl_fence(dst_bo->base.resv, copy_fence);
+
+	/* If we failed to reserve a shared slot, add an exclusive fence */
+	if (shared_err)
+		dma_resv_add_excl_fence(src_bo->base.resv, copy_fence);
+	else
+		dma_resv_add_shared_fence(src_bo->base.resv, copy_fence);
+
+	dma_fence_put(copy_fence);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
index 60d10ab55d1e..9aad84059d56 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -80,6 +80,7 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region *apply,
 
 	err = i915_gem_obj_copy_ttm(backup, obj, pm_apply->allow_gpu, false);
 	GEM_WARN_ON(err);
+	ttm_bo_wait_ctx(backup_bo, &ctx);
 
 	obj->ttm.backup = backup;
 	return 0;
@@ -170,6 +171,7 @@ static int i915_ttm_restore(struct i915_gem_apply_to_region *apply,
 		err = i915_gem_obj_copy_ttm(obj, backup, pm_apply->allow_gpu,
 					    false);
 		GEM_WARN_ON(err);
+		ttm_bo_wait_ctx(backup_bo, &ctx);
 
 		obj->ttm.backup = NULL;
 		err = 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 1/6] drm/i915: Add support for moving fence waiting
  2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
@ 2021-11-18 14:12     ` Matthew Auld
  -1 siblings, 0 replies; 31+ messages in thread
From: Matthew Auld @ 2021-11-18 14:12 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel

On 18/11/2021 13:02, Thomas Hellström wrote:
> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> 
> For now, we will only allow async migration when TTM is used,
> so the paths we care about are related to TTM.
> 
> The mmap path is handled by having the fence in ttm_bo->moving,
> when pinning, the binding only becomes available after the moving
> fence is signaled, and pinning a cpu map will only work after
> the moving fence signals.
> 
> This should close all holes where userspace can read a buffer
> before it's fully migrated.
> 
> v2:
> - Fix a couple of SPARSE warnings
> v3:
> - Fix a NULL pointer dereference
> v4:
> - Ditch the moving fence waiting for i915_vma_pin_iomap() and
>    replace with a verification that the vma is already bound.
>    (Matthew Auld)
> - Squash with a previous patch introducing moving fence waiting and
>    accessing interfaces (Matthew Auld)
> - Rename to indicate that we also add support for sync waiting.
> 
> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_object.c | 52 ++++++++++++++++++++++
>   drivers/gpu/drm/i915/gem/i915_gem_object.h |  6 +++
>   drivers/gpu/drm/i915/gem/i915_gem_pages.c  |  6 +++
>   drivers/gpu/drm/i915/i915_vma.c            | 36 ++++++++++++++-
>   4 files changed, 99 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 591ee3cb7275..7b7d9415c9cb 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -33,6 +33,7 @@
>   #include "i915_gem_object.h"
>   #include "i915_memcpy.h"
>   #include "i915_trace.h"
> +#include "i915_gem_ttm.h"

Nit: Ordering.

>   
>   static struct kmem_cache *slab_objects;
>   
> @@ -726,6 +727,57 @@ static const struct drm_gem_object_funcs i915_gem_object_funcs = {
>   	.export = i915_gem_prime_export,
>   };
>   
> +/**
> + * i915_gem_object_get_moving_fence - Get the object's moving fence if any
> + * @obj: The object whose moving fence to get.
> + *
> + * A non-signaled moving fence means that there is an async operation
> + * pending on the object that needs to be waited on before setting up
> + * any GPU- or CPU PTEs to the object's pages.
> + *
> + * Return: A refcounted pointer to the object's moving fence if any,
> + * NULL otherwise.
> + */
> +struct dma_fence *
> +i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj)
> +{
> +	return dma_fence_get(i915_gem_to_ttm(obj)->moving);
> +}
> +
> +/**
> + * i915_gem_object_wait_moving_fence - Wait for the object's moving fence if any
> + * @obj: The object whose moving fence to wait for.
> + * @intr: Whether to wait interruptible.
> + *
> + * If the moving fence signaled without an error, it is detached from the
> + * object and put.
> + *
> + * Return: 0 if successful, -ERESTARTSYS if the wait was interrupted,
> + * negative error code if the async operation represented by the
> + * moving fence failed.
> + */
> +int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
> +				      bool intr)
> +{
> +	struct dma_fence *fence = i915_gem_to_ttm(obj)->moving;
> +	int ret;
> +
> +	assert_object_held(obj);
> +	if (!fence)
> +		return 0;
> +
> +	ret = dma_fence_wait(fence, intr);
> +	if (ret)
> +		return ret;
> +
> +	if (fence->error)
> +		return fence->error;
> +
> +	i915_gem_to_ttm(obj)->moving = NULL;
> +	dma_fence_put(fence);
> +	return 0;
> +}
> +
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "selftests/huge_gem_object.c"
>   #include "selftests/huge_pages.c"
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 133963b46135..66f20b803b01 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -517,6 +517,12 @@ i915_gem_object_finish_access(struct drm_i915_gem_object *obj)
>   	i915_gem_object_unpin_pages(obj);
>   }
>   
> +struct dma_fence *
> +i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj);
> +
> +int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
> +				      bool intr);
> +
>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>   					 unsigned int cache_level);
>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index c4f684b7cc51..49c6e55c68ce 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -418,6 +418,12 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
>   	}
>   
>   	if (!ptr) {
> +		err = i915_gem_object_wait_moving_fence(obj, true);
> +		if (err) {
> +			ptr = ERR_PTR(err);
> +			goto err_unpin;
> +		}
> +
>   		if (GEM_WARN_ON(type == I915_MAP_WC &&
>   				!static_cpu_has(X86_FEATURE_PAT)))
>   			ptr = ERR_PTR(-ENODEV);
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 0896656896d0..00d3daf72329 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -354,6 +354,25 @@ int i915_vma_wait_for_bind(struct i915_vma *vma)
>   	return err;
>   }
>   
> +#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
> +static int i915_vma_verify_bind_complete(struct i915_vma *vma)
> +{
> +	int err = 0;
> +
> +	if (i915_active_has_exclusive(&vma->active)) {
> +		struct dma_fence *fence =
> +			i915_active_fence_get(&vma->active.excl);

if (!fence)
     return 0;

?

> +
> +		if (dma_fence_is_signaled(fence))
> +			err = fence->error;
> +		else
> +			err = -EBUSY;

dma_fence_put(fence);

?

Reviewed-by: Matthew Auld <matthew.auld@intel.com>

> +	}
> +
> +	return err;
> +}
> +#endif
> +
>   /**
>    * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space.
>    * @vma: VMA to map
> @@ -428,6 +447,13 @@ int i915_vma_bind(struct i915_vma *vma,
>   			work->pinned = i915_gem_object_get(vma->obj);
>   		}
>   	} else {
> +		if (vma->obj) {
> +			int ret;
> +
> +			ret = i915_gem_object_wait_moving_fence(vma->obj, true);
> +			if (ret)
> +				return ret;
> +		}
>   		vma->ops->bind_vma(vma->vm, NULL, vma, cache_level, bind_flags);
>   	}
>   
> @@ -449,6 +475,7 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
>   
>   	GEM_BUG_ON(!i915_vma_is_ggtt(vma));
>   	GEM_BUG_ON(!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND));
> +	GEM_BUG_ON(i915_vma_verify_bind_complete(vma));
>   
>   	ptr = READ_ONCE(vma->iomap);
>   	if (ptr == NULL) {
> @@ -867,6 +894,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   		    u64 size, u64 alignment, u64 flags)
>   {
>   	struct i915_vma_work *work = NULL;
> +	struct dma_fence *moving = NULL;
>   	intel_wakeref_t wakeref = 0;
>   	unsigned int bound;
>   	int err;
> @@ -892,7 +920,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   	if (flags & PIN_GLOBAL)
>   		wakeref = intel_runtime_pm_get(&vma->vm->i915->runtime_pm);
>   
> -	if (flags & vma->vm->bind_async_flags) {
> +	moving = vma->obj ? i915_gem_object_get_moving_fence(vma->obj) : NULL;
> +	if (flags & vma->vm->bind_async_flags || moving) {
>   		/* lock VM */
>   		err = i915_vm_lock_objects(vma->vm, ww);
>   		if (err)
> @@ -906,6 +935,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   
>   		work->vm = i915_vm_get(vma->vm);
>   
> +		dma_fence_work_chain(&work->base, moving);
> +
>   		/* Allocate enough page directories to used PTE */
>   		if (vma->vm->allocate_va_range) {
>   			err = i915_vm_alloc_pt_stash(vma->vm,
> @@ -1010,7 +1041,10 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   err_rpm:
>   	if (wakeref)
>   		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
> +	if (moving)
> +		dma_fence_put(moving);
>   	vma_put_pages(vma);
> +
>   	return err;
>   }
>   
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread
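
Taken together, the two review suggestions above (the NULL check after
i915_active_fence_get() and the matching dma_fence_put()) would make the
quoted helper look roughly like this; this is a sketch of the v4 code with
the review comments folded in, not a revision that was posted:

#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
static int i915_vma_verify_bind_complete(struct i915_vma *vma)
{
	int err = 0;

	if (i915_active_has_exclusive(&vma->active)) {
		struct dma_fence *fence =
			i915_active_fence_get(&vma->active.excl);

		/* The exclusive slot may already have been retired. */
		if (!fence)
			return 0;

		if (dma_fence_is_signaled(fence))
			err = fence->error;
		else
			err = -EBUSY;

		/* Drop the reference taken by i915_active_fence_get(). */
		dma_fence_put(fence);
	}

	return err;
}
#endif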

* Re: [PATCH v4 3/6] drm/i915/ttm: Drop region reference counting
  2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
@ 2021-11-18 14:19     ` Matthew Auld
  -1 siblings, 0 replies; 31+ messages in thread
From: Matthew Auld @ 2021-11-18 14:19 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel

On 18/11/2021 13:02, Thomas Hellström wrote:
> There is an interesting refcounting loop:
> struct intel_memory_region has a struct ttm_resource_manager,
> ttm_resource_manager->move may hold a reference to i915_request,
> i915_request may hold a reference to intel_context,
> intel_context may hold a reference to drm_i915_gem_object,
> drm_i915_gem_object may hold a reference to intel_memory_region.
> 
> Break this loop by dropping region reference counting.
> 
> In addition, have regions with a manager moving fence make sure
> that all region objects are released before freeing the region.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/6] drm/i915/ttm: Correctly handle waiting for gpu when shrinking
  2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
@ 2021-11-18 14:30     ` Matthew Auld
  -1 siblings, 0 replies; 31+ messages in thread
From: Matthew Auld @ 2021-11-18 14:30 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel

On 18/11/2021 13:02, Thomas Hellström wrote:
> With async migration, the shrinker may end up wanting to release the
> pages of an object while the migration blit is still running, since
> the GT migration code doesn't set up VMAs and the shrinker is thus
> oblivious to the fact that the GPU is still using the pages.
> 
> Add waiting for gpu in the shrinker_release_pages() op and an
> argument to that function indicating whether the shrinker expects it
> to not wait for gpu. In the latter case the shrinker_release_pages()
> op will return -EBUSY if the object is not idle.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread
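
A rough sketch of the behaviour that commit message describes is shown
below. The helper name and the no_gpu_wait argument are illustrative
assumptions rather than the signature used in the posted patch, but the
operation context and the ttm_bo_wait_ctx() call mirror what the series
already uses elsewhere:

/*
 * Illustrative sketch only: the helper name and the no_gpu_wait argument
 * are assumptions, not the shape of the shrinker_release_pages() op in
 * the posted patch.
 */
static int example_shrinker_release_pages(struct drm_i915_gem_object *obj,
					  bool no_gpu_wait)
{
	struct ttm_operation_ctx ctx = {
		/* Reclaim context: no interruptible waits. */
		.interruptible = false,
		/* The shrinker may ask us not to stall on the GPU. */
		.no_wait_gpu = no_gpu_wait,
	};
	int ret;

	/* Same wait helper the series uses for the backup/restore paths. */
	ret = ttm_bo_wait_ctx(i915_gem_to_ttm(obj), &ctx);
	if (ret)
		/* Still busy and not allowed to wait: let the shrinker skip us. */
		return no_gpu_wait ? -EBUSY : ret;

	/* ...then release the object's pages as before... */
	return 0;
}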

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/ttm: Async migration (rev5)
  2021-11-18 13:02 ` [Intel-gfx] " Thomas Hellström
                   ` (6 preceding siblings ...)
  (?)
@ 2021-11-18 15:32 ` Patchwork
  -1 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2021-11-18 15:32 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ttm: Async migration (rev5)
URL   : https://patchwork.freedesktop.org/series/96798/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/ttm: Async migration (rev5)
  2021-11-18 13:02 ` [Intel-gfx] " Thomas Hellström
                   ` (7 preceding siblings ...)
  (?)
@ 2021-11-18 15:59 ` Patchwork
  -1 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2021-11-18 15:59 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ttm: Async migration (rev5)
URL   : https://patchwork.freedesktop.org/series/96798/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10900 -> Patchwork_21630
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/index.html

Participating hosts (38 -> 31)
------------------------------

  Additional (1): fi-tgl-1115g4 
  Missing    (8): fi-kbl-soraka fi-tgl-u2 fi-hsw-4200u fi-bsw-cyan fi-kbl-guc fi-ctg-p8600 bat-jsl-2 bat-jsl-1 

Known issues
------------

  Here are the changes found in Patchwork_21630 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@query-info:
    - fi-tgl-1115g4:      NOTRUN -> [SKIP][1] ([fdo#109315])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-tgl-1115g4/igt@amdgpu/amd_basic@query-info.html

  * igt@amdgpu/amd_basic@semaphore:
    - fi-bdw-5557u:       NOTRUN -> [SKIP][2] ([fdo#109271]) +31 similar issues
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-bdw-5557u/igt@amdgpu/amd_basic@semaphore.html

  * igt@amdgpu/amd_cs_nop@nop-gfx0:
    - fi-tgl-1115g4:      NOTRUN -> [SKIP][3] ([fdo#109315] / [i915#2575]) +16 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-tgl-1115g4/igt@amdgpu/amd_cs_nop@nop-gfx0.html

  * igt@amdgpu/amd_cs_nop@sync-fork-compute0:
    - fi-snb-2600:        NOTRUN -> [SKIP][4] ([fdo#109271]) +17 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-snb-2600/igt@amdgpu/amd_cs_nop@sync-fork-compute0.html

  * igt@gem_exec_suspend@basic-s3:
    - fi-tgl-1115g4:      NOTRUN -> [FAIL][5] ([i915#1888])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-tgl-1115g4/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_huc_copy@huc-copy:
    - fi-skl-6600u:       NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#2190])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-skl-6600u/igt@gem_huc_copy@huc-copy.html
    - fi-tgl-1115g4:      NOTRUN -> [SKIP][7] ([i915#2190])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-tgl-1115g4/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@verify-random:
    - fi-skl-6600u:       NOTRUN -> [SKIP][8] ([fdo#109271]) +6 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-skl-6600u/igt@gem_lmem_swapping@verify-random.html
    - fi-tgl-1115g4:      NOTRUN -> [SKIP][9] ([i915#4555]) +3 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-tgl-1115g4/igt@gem_lmem_swapping@verify-random.html

  * igt@i915_pm_backlight@basic-brightness:
    - fi-tgl-1115g4:      NOTRUN -> [SKIP][10] ([i915#1155])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-tgl-1115g4/igt@i915_pm_backlight@basic-brightness.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-tgl-1115g4:      NOTRUN -> [SKIP][11] ([fdo#111827]) +8 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-tgl-1115g4/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_chamelium@dp-crc-fast:
    - fi-bdw-5557u:       NOTRUN -> [SKIP][12] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-bdw-5557u/igt@kms_chamelium@dp-crc-fast.html

  * igt@kms_chamelium@vga-edid-read:
    - fi-skl-6600u:       NOTRUN -> [SKIP][13] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-skl-6600u/igt@kms_chamelium@vga-edid-read.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-tgl-1115g4:      NOTRUN -> [SKIP][14] ([i915#4103]) +1 similar issue
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-tgl-1115g4/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-tgl-1115g4:      NOTRUN -> [SKIP][15] ([fdo#109285])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-tgl-1115g4/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-skl-6600u:       NOTRUN -> [SKIP][16] ([fdo#109271] / [i915#533])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-skl-6600u/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@kms_psr@primary_mmap_gtt:
    - fi-tgl-1115g4:      NOTRUN -> [SKIP][17] ([i915#1072]) +3 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-tgl-1115g4/igt@kms_psr@primary_mmap_gtt.html

  * igt@kms_psr@primary_page_flip:
    - fi-skl-6600u:       NOTRUN -> [FAIL][18] ([i915#4547])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-skl-6600u/igt@kms_psr@primary_page_flip.html

  * igt@prime_vgem@basic-userptr:
    - fi-tgl-1115g4:      NOTRUN -> [SKIP][19] ([i915#3301])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-tgl-1115g4/igt@prime_vgem@basic-userptr.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-bdw-5557u:       [INCOMPLETE][20] ([i915#146]) -> [PASS][21]
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/fi-bdw-5557u/igt@gem_exec_suspend@basic-s3.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-bdw-5557u/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_flink_basic@bad-flink:
    - fi-skl-6600u:       [FAIL][22] ([i915#4547]) -> [PASS][23]
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/fi-skl-6600u/igt@gem_flink_basic@bad-flink.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-skl-6600u/igt@gem_flink_basic@bad-flink.html

  * igt@i915_selftest@live@hangcheck:
    - fi-snb-2600:        [INCOMPLETE][24] ([i915#3921]) -> [PASS][25]
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/fi-snb-2600/igt@i915_selftest@live@hangcheck.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1155]: https://gitlab.freedesktop.org/drm/intel/issues/1155
  [i915#146]: https://gitlab.freedesktop.org/drm/intel/issues/146
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2575]: https://gitlab.freedesktop.org/drm/intel/issues/2575
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3921]: https://gitlab.freedesktop.org/drm/intel/issues/3921
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4547]: https://gitlab.freedesktop.org/drm/intel/issues/4547
  [i915#4555]: https://gitlab.freedesktop.org/drm/intel/issues/4555
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533


Build changes
-------------

  * Linux: CI_DRM_10900 -> Patchwork_21630

  CI-20190529: 20190529
  CI_DRM_10900: b50839f33180500c64a505623ab77829b869a57c @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6285: 2e0355faad5c2e81cd6705b76e529ce526c7c9bf @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_21630: 95cb2cbc3565e979675fa7ea36dc7d727336a07d @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

95cb2cbc3565 drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous
343cd8ad3ec9 drm/i915/ttm: Implement asynchronous TTM moves
7818d0e9eb89 drm/i915/ttm: Correctly handle waiting for gpu when shrinking
a6214e9c1a5e drm/i915/ttm: Drop region reference counting
6911ddc485bb drm/i915/ttm: Move the i915_gem_obj_copy_ttm() function
9ae55108104f drm/i915: Add support for moving fence waiting

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/index.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/ttm: Async migration (rev5)
  2021-11-18 13:02 ` [Intel-gfx] " Thomas Hellström
                   ` (8 preceding siblings ...)
  (?)
@ 2021-11-19  5:34 ` Patchwork
  -1 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2021-11-19  5:34 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ttm: Async migration (rev5)
URL   : https://patchwork.freedesktop.org/series/96798/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10900_full -> Patchwork_21630_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_21630_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21630_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (11 -> 11)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_21630_full:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_pm_rpm@gem-execbuf:
    - shard-iclb:         [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-iclb8/igt@i915_pm_rpm@gem-execbuf.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb4/igt@i915_pm_rpm@gem-execbuf.html

  * igt@kms_vblank@pipe-b-ts-continuation-dpms-rpm:
    - shard-tglb:         [PASS][3] -> [SKIP][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-tglb2/igt@kms_vblank@pipe-b-ts-continuation-dpms-rpm.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb2/igt@kms_vblank@pipe-b-ts-continuation-dpms-rpm.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@kms_big_fb@y-tiled-32bpp-rotate-180:
    - {shard-rkl}:        [SKIP][5] ([i915#1845]) -> ([DMESG-WARN][6], [SKIP][7]) ([i915#1845])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-rkl-1/igt@kms_big_fb@y-tiled-32bpp-rotate-180.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-rkl-2/igt@kms_big_fb@y-tiled-32bpp-rotate-180.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-rkl-4/igt@kms_big_fb@y-tiled-32bpp-rotate-180.html

  
Known issues
------------

  Here are the changes found in Patchwork_21630_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@feature_discovery@display-2x:
    - shard-iclb:         NOTRUN -> [SKIP][8] ([i915#1839])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@feature_discovery@display-2x.html

  * igt@gem_create@create-massive:
    - shard-skl:          NOTRUN -> [DMESG-WARN][9] ([i915#3002])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl10/igt@gem_create@create-massive.html

  * igt@gem_eio@in-flight-contexts-immediate:
    - shard-tglb:         [PASS][10] -> [TIMEOUT][11] ([i915#3063])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-tglb8/igt@gem_eio@in-flight-contexts-immediate.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb5/igt@gem_eio@in-flight-contexts-immediate.html

  * igt@gem_exec_capture@pi@bcs0:
    - shard-skl:          [PASS][12] -> [INCOMPLETE][13] ([i915#2369])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-skl10/igt@gem_exec_capture@pi@bcs0.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl4/igt@gem_exec_capture@pi@bcs0.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - shard-apl:          [PASS][14] -> [SKIP][15] ([fdo#109271])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-apl7/igt@gem_exec_fair@basic-none-share@rcs0.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-apl6/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-none@rcs0:
    - shard-iclb:         [PASS][16] -> [FAIL][17] ([i915#2842])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-iclb7/igt@gem_exec_fair@basic-none@rcs0.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb3/igt@gem_exec_fair@basic-none@rcs0.html

  * igt@gem_exec_fair@basic-pace@vcs1:
    - shard-iclb:         NOTRUN -> [FAIL][18] ([i915#2842])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb4/igt@gem_exec_fair@basic-pace@vcs1.html

  * igt@gem_exec_fair@basic-pace@vecs0:
    - shard-tglb:         [PASS][19] -> [FAIL][20] ([i915#2842]) +1 similar issue
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-tglb6/igt@gem_exec_fair@basic-pace@vecs0.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb8/igt@gem_exec_fair@basic-pace@vecs0.html

  * igt@gem_exec_fair@basic-throttle@rcs0:
    - shard-glk:          [PASS][21] -> [FAIL][22] ([i915#2842]) +1 similar issue
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-glk6/igt@gem_exec_fair@basic-throttle@rcs0.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-glk3/igt@gem_exec_fair@basic-throttle@rcs0.html

  * igt@gem_exec_suspend@basic-s0:
    - shard-glk:          [PASS][23] -> [DMESG-WARN][24] ([i915#118])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-glk1/igt@gem_exec_suspend@basic-s0.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-glk4/igt@gem_exec_suspend@basic-s0.html

  * igt@gem_huc_copy@huc-copy:
    - shard-apl:          NOTRUN -> [SKIP][25] ([fdo#109271] / [i915#2190])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-apl3/igt@gem_huc_copy@huc-copy.html
    - shard-kbl:          NOTRUN -> [SKIP][26] ([fdo#109271] / [i915#2190])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-kbl3/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@heavy-random:
    - shard-iclb:         NOTRUN -> [SKIP][27] ([i915#4555])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@gem_lmem_swapping@heavy-random.html

  * igt@gem_lmem_swapping@parallel-random-verify:
    - shard-tglb:         NOTRUN -> [SKIP][28] ([i915#4555] / [i915#4565])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@gem_lmem_swapping@parallel-random-verify.html

  * igt@gem_pread@exhaustion:
    - shard-iclb:         NOTRUN -> [WARN][29] ([i915#2658]) +1 similar issue
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@gem_pread@exhaustion.html

  * igt@gem_render_copy@y-tiled-mc-ccs-to-yf-tiled-ccs:
    - shard-iclb:         NOTRUN -> [SKIP][30] ([i915#768]) +1 similar issue
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@gem_render_copy@y-tiled-mc-ccs-to-yf-tiled-ccs.html

  * igt@gem_softpin@noreloc-s3:
    - shard-tglb:         [PASS][31] -> [INCOMPLETE][32] ([i915#1373] / [i915#456])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-tglb5/igt@gem_softpin@noreloc-s3.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb7/igt@gem_softpin@noreloc-s3.html

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-skl:          NOTRUN -> [SKIP][33] ([fdo#109271] / [i915#3323])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl4/igt@gem_userptr_blits@dmabuf-sync.html

  * igt@gem_userptr_blits@vma-merge:
    - shard-skl:          NOTRUN -> [FAIL][34] ([i915#3318])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl7/igt@gem_userptr_blits@vma-merge.html

  * igt@gen3_render_linear_blits:
    - shard-tglb:         NOTRUN -> [SKIP][35] ([fdo#109289]) +2 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@gen3_render_linear_blits.html

  * igt@gen9_exec_parse@allowed-all:
    - shard-skl:          NOTRUN -> [DMESG-WARN][36] ([i915#1436] / [i915#716])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl9/igt@gen9_exec_parse@allowed-all.html
    - shard-tglb:         NOTRUN -> [SKIP][37] ([i915#2856]) +1 similar issue
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@gen9_exec_parse@allowed-all.html

  * igt@i915_pm_dc@dc6-dpms:
    - shard-skl:          NOTRUN -> [FAIL][38] ([i915#454]) +1 similar issue
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl8/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_dc@dc9-dpms:
    - shard-iclb:         NOTRUN -> [FAIL][39] ([i915#4275])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@i915_pm_dc@dc9-dpms.html

  * igt@i915_selftest@live@gem_contexts:
    - shard-tglb:         [PASS][40] -> [DMESG-WARN][41] ([i915#2867]) +14 similar issues
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-tglb2/igt@i915_selftest@live@gem_contexts.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb2/igt@i915_selftest@live@gem_contexts.html

  * igt@i915_selftest@live@hangcheck:
    - shard-snb:          [PASS][42] -> [INCOMPLETE][43] ([i915#3921])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-snb5/igt@i915_selftest@live@hangcheck.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-snb2/igt@i915_selftest@live@hangcheck.html

  * igt@kms_async_flips@crc:
    - shard-skl:          NOTRUN -> [FAIL][44] ([i915#4272])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl9/igt@kms_async_flips@crc.html

  * igt@kms_big_fb@linear-8bpp-rotate-90:
    - shard-tglb:         NOTRUN -> [SKIP][45] ([fdo#111614]) +1 similar issue
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_big_fb@linear-8bpp-rotate-90.html

  * igt@kms_big_fb@x-tiled-64bpp-rotate-270:
    - shard-iclb:         NOTRUN -> [SKIP][46] ([fdo#110725] / [fdo#111614])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_big_fb@x-tiled-64bpp-rotate-270.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-hflip:
    - shard-skl:          NOTRUN -> [SKIP][47] ([fdo#109271] / [i915#3777]) +4 similar issues
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl4/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-hflip.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-async-flip:
    - shard-skl:          NOTRUN -> [FAIL][48] ([i915#3763])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl10/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-async-flip.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0:
    - shard-tglb:         NOTRUN -> [SKIP][49] ([fdo#111615]) +1 similar issue
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb7/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-180-async-flip:
    - shard-skl:          NOTRUN -> [FAIL][50] ([i915#3743]) +2 similar issues
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl10/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-180-async-flip.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180-async-flip:
    - shard-iclb:         NOTRUN -> [SKIP][51] ([fdo#110723])
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180-async-flip.html

  * igt@kms_ccs@pipe-a-bad-aux-stride-y_tiled_gen12_mc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][52] ([i915#3689] / [i915#3886])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_ccs@pipe-a-bad-aux-stride-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-bad-rotation-90-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][53] ([fdo#109271] / [i915#3886]) +4 similar issues
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-apl2/igt@kms_ccs@pipe-a-bad-rotation-90-y_tiled_gen12_mc_ccs.html
    - shard-kbl:          NOTRUN -> [SKIP][54] ([fdo#109271] / [i915#3886]) +1 similar issue
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-kbl3/igt@kms_ccs@pipe-a-bad-rotation-90-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs:
    - shard-iclb:         NOTRUN -> [SKIP][55] ([fdo#109278] / [i915#3886]) +2 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_rc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][56] ([fdo#109271]) +94 similar issues
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-apl4/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_rc_ccs.html

  * igt@kms_ccs@pipe-b-crc-sprite-planes-basic-yf_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][57] ([fdo#111615] / [i915#3689]) +3 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_ccs@pipe-b-crc-sprite-planes-basic-yf_tiled_ccs.html

  * igt@kms_ccs@pipe-c-bad-rotation-90-y_tiled_gen12_rc_ccs_cc:
    - shard-skl:          NOTRUN -> [SKIP][58] ([fdo#109271] / [i915#3886]) +15 similar issues
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl4/igt@kms_ccs@pipe-c-bad-rotation-90-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-d-crc-primary-basic-y_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][59] ([i915#3689])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_ccs@pipe-d-crc-primary-basic-y_tiled_ccs.html

  * igt@kms_cdclk@plane-scaling:
    - shard-tglb:         NOTRUN -> [SKIP][60] ([i915#3742])
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_cdclk@plane-scaling.html

  * igt@kms_chamelium@dp-edid-change-during-suspend:
    - shard-iclb:         NOTRUN -> [SKIP][61] ([fdo#109284] / [fdo#111827]) +3 similar issues
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_chamelium@dp-edid-change-during-suspend.html

  * igt@kms_chamelium@vga-hpd-without-ddc:
    - shard-kbl:          NOTRUN -> [SKIP][62] ([fdo#109271] / [fdo#111827]) +1 similar issue
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-kbl3/igt@kms_chamelium@vga-hpd-without-ddc.html

  * igt@kms_color_chamelium@pipe-b-ctm-0-5:
    - shard-apl:          NOTRUN -> [SKIP][63] ([fdo#109271] / [fdo#111827]) +6 similar issues
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-apl2/igt@kms_color_chamelium@pipe-b-ctm-0-5.html

  * igt@kms_color_chamelium@pipe-b-ctm-limited-range:
    - shard-tglb:         NOTRUN -> [SKIP][64] ([fdo#109284] / [fdo#111827]) +4 similar issues
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_color_chamelium@pipe-b-ctm-limited-range.html

  * igt@kms_color_chamelium@pipe-b-ctm-max:
    - shard-skl:          NOTRUN -> [SKIP][65] ([fdo#109271] / [fdo#111827]) +35 similar issues
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl9/igt@kms_color_chamelium@pipe-b-ctm-max.html

  * igt@kms_cursor_crc@pipe-a-cursor-512x512-rapid-movement:
    - shard-iclb:         NOTRUN -> [SKIP][66] ([fdo#109278] / [fdo#109279])
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_cursor_crc@pipe-a-cursor-512x512-rapid-movement.html

  * igt@kms_cursor_crc@pipe-b-cursor-32x32-offscreen:
    - shard-tglb:         NOTRUN -> [SKIP][67] ([i915#3319]) +2 similar issues
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_cursor_crc@pipe-b-cursor-32x32-offscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-32x10-rapid-movement:
    - shard-tglb:         NOTRUN -> [SKIP][68] ([i915#3359]) +2 similar issues
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_cursor_crc@pipe-c-cursor-32x10-rapid-movement.html

  * igt@kms_cursor_crc@pipe-d-cursor-512x512-offscreen:
    - shard-tglb:         NOTRUN -> [SKIP][69] ([fdo#109279] / [i915#3359])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_cursor_crc@pipe-d-cursor-512x512-offscreen.html

  * igt@kms_cursor_legacy@cursor-vs-flip-atomic-transitions-varying-size:
    - shard-iclb:         [PASS][70] -> [FAIL][71] ([i915#2370])
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-iclb5/igt@kms_cursor_legacy@cursor-vs-flip-atomic-transitions-varying-size.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb7/igt@kms_cursor_legacy@cursor-vs-flip-atomic-transitions-varying-size.html

  * igt@kms_cursor_legacy@cursora-vs-flipb-legacy:
    - shard-iclb:         NOTRUN -> [SKIP][72] ([fdo#109274] / [fdo#109278])
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_cursor_legacy@cursora-vs-flipb-legacy.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions:
    - shard-skl:          NOTRUN -> [FAIL][73] ([i915#2346])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl10/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size:
    - shard-skl:          NOTRUN -> [FAIL][74] ([i915#2346] / [i915#533])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl8/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html

  * igt@kms_dp_tiled_display@basic-test-pattern-with-chamelium:
    - shard-iclb:         NOTRUN -> [SKIP][75] ([i915#3528])
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_dp_tiled_display@basic-test-pattern-with-chamelium.html

  * igt@kms_flip@2x-flip-vs-rmfb:
    - shard-tglb:         NOTRUN -> [SKIP][76] ([fdo#111825]) +17 similar issues
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_flip@2x-flip-vs-rmfb.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1:
    - shard-skl:          NOTRUN -> [FAIL][77] ([i915#79]) +1 similar issue
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl4/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-kbl:          [PASS][78] -> [DMESG-WARN][79] ([i915#180]) +4 similar issues
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-kbl2/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-kbl1/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@b-edp1:
    - shard-skl:          [PASS][80] -> [INCOMPLETE][81] ([i915#198])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-skl8/igt@kms_flip@flip-vs-suspend-interruptible@b-edp1.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl7/igt@kms_flip@flip-vs-suspend-interruptible@b-edp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@c-dp1:
    - shard-apl:          [PASS][82] -> [DMESG-WARN][83] ([i915#180]) +5 similar issues
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-apl2/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-apl2/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytileccs:
    - shard-skl:          NOTRUN -> [INCOMPLETE][84] ([i915#3699])
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl4/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytileccs.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs:
    - shard-kbl:          NOTRUN -> [SKIP][85] ([fdo#109271] / [i915#2672])
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-kbl3/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs.html
    - shard-apl:          NOTRUN -> [SKIP][86] ([fdo#109271] / [i915#2672])
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-apl2/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs.html

  * igt@kms_frontbuffer_tracking@fbc-1p-shrfb-fliptrack-mmap-gtt:
    - shard-skl:          NOTRUN -> [SKIP][87] ([fdo#109271]) +504 similar issues
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl9/igt@kms_frontbuffer_tracking@fbc-1p-shrfb-fliptrack-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-rte:
    - shard-glk:          [PASS][88] -> [FAIL][89] ([i915#1888] / [i915#2546])
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-glk6/igt@kms_frontbuffer_tracking@fbc-2p-rte.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-glk1/igt@kms_frontbuffer_tracking@fbc-2p-rte.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-onoff:
    - shard-iclb:         NOTRUN -> [SKIP][90] ([fdo#109280]) +6 similar issues
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-onoff.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-indfb-msflip-blt:
    - shard-kbl:          NOTRUN -> [SKIP][91] ([fdo#109271]) +34 similar issues
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-kbl3/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-indfb-msflip-blt.html

  * igt@kms_hdr@bpc-switch-suspend:
    - shard-skl:          NOTRUN -> [FAIL][92] ([i915#1188])
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl4/igt@kms_hdr@bpc-switch-suspend.html

  * igt@kms_hdr@static-swap:
    - shard-tglb:         NOTRUN -> [SKIP][93] ([i915#1187])
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb7/igt@kms_hdr@static-swap.html

  * igt@kms_pipe_b_c_ivb@pipe-b-dpms-off-modeset-pipe-c:
    - shard-iclb:         NOTRUN -> [SKIP][94] ([fdo#109289]) +1 similar issue
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_pipe_b_c_ivb@pipe-b-dpms-off-modeset-pipe-c.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-d-frame-sequence:
    - shard-apl:          NOTRUN -> [SKIP][95] ([fdo#109271] / [i915#533])
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-apl4/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-d-frame-sequence.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-d:
    - shard-skl:          NOTRUN -> [SKIP][96] ([fdo#109271] / [i915#533]) +3 similar issues
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl8/igt@kms_pipe_crc_basic@read-crc-pipe-d.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [PASS][97] -> [FAIL][98] ([fdo#108145] / [i915#265])
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-skl8/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl9/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-opaque-fb:
    - shard-skl:          NOTRUN -> [FAIL][99] ([fdo#108145] / [i915#265]) +1 similar issue
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl4/igt@kms_plane_alpha_blend@pipe-c-alpha-opaque-fb.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-transparent-fb:
    - shard-skl:          NOTRUN -> [FAIL][100] ([i915#265]) +1 similar issue
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl7/igt@kms_plane_alpha_blend@pipe-c-alpha-transparent-fb.html

  * igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping:
    - shard-skl:          NOTRUN -> [SKIP][101] ([fdo#109271] / [i915#2733])
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl7/igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping.html

  * igt@kms_prime@basic-crc@first-to-second:
    - shard-tglb:         NOTRUN -> [SKIP][102] ([i915#1836])
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_prime@basic-crc@first-to-second.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-2:
    - shard-tglb:         NOTRUN -> [SKIP][103] ([i915#2920]) +2 similar issues
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-2.html

  * igt@kms_psr2_sf@plane-move-sf-dmg-area-2:
    - shard-skl:          NOTRUN -> [SKIP][104] ([fdo#109271] / [i915#658]) +7 similar issues
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl10/igt@kms_psr2_sf@plane-move-sf-dmg-area-2.html

  * igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-1:
    - shard-iclb:         NOTRUN -> [SKIP][105] ([i915#2920])
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-1.html

  * igt@kms_psr@psr2_cursor_plane_move:
    - shard-iclb:         [PASS][106] -> [SKIP][107] ([fdo#109441]) +1 similar issue
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-iclb2/igt@kms_psr@psr2_cursor_plane_move.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb8/igt@kms_psr@psr2_cursor_plane_move.html

  * igt@kms_psr@psr2_sprite_plane_move:
    - shard-tglb:         NOTRUN -> [FAIL][108] ([i915#132] / [i915#3467])
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_psr@psr2_sprite_plane_move.html

  * igt@kms_setmode@basic:
    - shard-glk:          [PASS][109] -> [FAIL][110] ([i915#31])
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-glk5/igt@kms_setmode@basic.html
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-glk2/igt@kms_setmode@basic.html

  * igt@kms_sysfs_edid_timing:
    - shard-skl:          NOTRUN -> [FAIL][111] ([IGT#2])
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl7/igt@kms_sysfs_edid_timing.html

  * igt@kms_tv_load_detect@load-detect:
    - shard-iclb:         NOTRUN -> [SKIP][112] ([fdo#109309])
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_tv_load_detect@load-detect.html

  * igt@kms_vblank@pipe-d-wait-forked:
    - shard-iclb:         NOTRUN -> [SKIP][113] ([fdo#109278]) +9 similar issues
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_vblank@pipe-d-wait-forked.html

  * igt@kms_vrr@flip-suspend:
    - shard-tglb:         NOTRUN -> [SKIP][114] ([fdo#109502])
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb3/igt@kms_vrr@flip-suspend.html

  * igt@kms_writeback@writeback-check-output:
    - shard-skl:          NOTRUN -> [SKIP][115] ([fdo#109271] / [i915#2437])
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl7/igt@kms_writeback@writeback-check-output.html

  * igt@kms_writeback@writeback-invalid-parameters:
    - shard-iclb:         NOTRUN -> [SKIP][116] ([i915#2437])
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@kms_writeback@writeback-invalid-parameters.html

  * igt@nouveau_crc@pipe-c-source-outp-complete:
    - shard-tglb:         NOTRUN -> [SKIP][117] ([i915#2530]) +2 similar issues
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb7/igt@nouveau_crc@pipe-c-source-outp-complete.html

  * igt@perf@polling-small-buf:
    - shard-skl:          NOTRUN -> [FAIL][118] ([i915#1722])
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl9/igt@perf@polling-small-buf.html

  * igt@prime_vgem@basic-userptr:
    - shard-iclb:         NOTRUN -> [SKIP][119] ([i915#3301])
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@prime_vgem@basic-userptr.html

  * igt@sysfs_clients@fair-0:
    - shard-skl:          NOTRUN -> [SKIP][120] ([fdo#109271] / [i915#2994]) +5 similar issues
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-skl10/igt@sysfs_clients@fair-0.html

  * igt@sysfs_clients@sema-50:
    - shard-apl:          NOTRUN -> [SKIP][121] ([fdo#109271] / [i915#2994]) +1 similar issue
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-apl4/igt@sysfs_clients@sema-50.html

  * igt@sysfs_clients@split-10:
    - shard-kbl:          NOTRUN -> [SKIP][122] ([fdo#109271] / [i915#2994])
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-kbl3/igt@sysfs_clients@split-10.html

  
#### Possible fixes ####

  * igt@fbdev@info:
    - {shard-rkl}:        ([SKIP][123], [SKIP][124]) ([i915#2582]) -> [PASS][125]
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-rkl-1/igt@fbdev@info.html
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-rkl-4/igt@fbdev@info.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-rkl-2/igt@fbdev@info.html

  * igt@gem_eio@unwedge-stress:
    - shard-tglb:         [TIMEOUT][126] ([i915#2369] / [i915#3063] / [i915#3648]) -> [PASS][127]
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-tglb8/igt@gem_eio@unwedge-stress.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-tglb5/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_capture@pi@vcs0:
    - shard-iclb:         [INCOMPLETE][128] ([i915#2369]) -> [PASS][129]
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-iclb4/igt@gem_exec_capture@pi@vcs0.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/shard-iclb2/igt@gem_exec_capture@pi@vcs0.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
    - shard-glk:          [FAIL][130] ([i915#2842]) -> [PASS][131] +1 similar issue
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10900/shard-glk2/igt@gem_exec_fair@basic-none-rrul@rcs0.html
   [131]:

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21630/index.html

[-- Attachment #2: Type: text/html, Size: 33735 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 5/6] drm/i915/ttm: Implement asynchronous TTM moves
  2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
@ 2021-11-19 17:18     ` Matthew Auld
  -1 siblings, 0 replies; 31+ messages in thread
From: Matthew Auld @ 2021-11-19 17:18 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel

On 18/11/2021 13:02, Thomas Hellström wrote:
> Don't wait sync while migrating, but rather make the GPU blit await the
> dependencies and add a moving fence to the object.
> 
> This also enables asynchronous VRAM management in that on eviction,
> rather than waiting for the moving fence to expire before freeing VRAM,
> it is freed immediately and the fence is stored with the VRAM manager and
> handed out to newly allocated objects to await before clears and swapins,
> or for kernel objects before setting up gpu vmas or mapping.
> 
> To collect dependencies before migrating, add a set of utilities that
> coalesce these to a single dma_fence.
> 
> What is still missing for fully asynchronous operation is asynchronous vma
> unbinding, which is still to be implemented.
> 
> This commit substantially reduces execution time in the gem_lmem_swapping
> test.
> 
> v2:
> - Make a couple of functions static.
> v4:
> - Fix some style issues (Matthew Auld)
> - Audit and add more checks for ghost objects (Matthew Auld)
> - Add more documentation for the i915_deps utility (Matthew Auld)
> - Simplify the i915_deps_sync() function
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c      |  32 +-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |   2 +-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 338 +++++++++++++++++--
>   drivers/gpu/drm/i915/gem/i915_gem_wait.c     |   4 +-
>   4 files changed, 344 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index e37157b080e4..81e84c1763de 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -248,10 +248,13 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>   	struct ttm_resource_manager *man =
>   		ttm_manager_type(bo->bdev, bo->resource->mem_type);
>   	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> -	enum ttm_caching caching = i915_ttm_select_tt_caching(obj);
> +	enum ttm_caching caching;
>   	struct i915_ttm_tt *i915_tt;
>   	int ret;
>   
> +	if (!obj)
> +		return NULL;
> +
>   	i915_tt = kzalloc(sizeof(*i915_tt), GFP_KERNEL);
>   	if (!i915_tt)
>   		return NULL;
> @@ -260,6 +263,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>   	    man->use_tt)
>   		page_flags |= TTM_TT_FLAG_ZERO_ALLOC;
>   
> +	caching = i915_ttm_select_tt_caching(obj);
>   	if (i915_gem_object_is_shrinkable(obj) && caching == ttm_cached) {
>   		page_flags |= TTM_TT_FLAG_EXTERNAL |
>   			      TTM_TT_FLAG_EXTERNAL_MAPPABLE;
> @@ -326,6 +330,9 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
>   {
>   	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>   
> +	if (!obj)
> +		return false;
> +
>   	/*
>   	 * EXTERNAL objects should never be swapped out by TTM, instead we need
>   	 * to handle that ourselves. TTM will already skip such objects for us,
> @@ -552,8 +559,12 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
>   static void i915_ttm_swap_notify(struct ttm_buffer_object *bo)
>   {
>   	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> -	int ret = i915_ttm_move_notify(bo);
> +	int ret;
>   
> +	if (!obj)
> +		return;
> +
> +	ret = i915_ttm_move_notify(bo);
>   	GEM_WARN_ON(ret);
>   	GEM_WARN_ON(obj->ttm.cached_io_rsgt);
>   	if (!ret && obj->mm.madv != I915_MADV_WILLNEED)
> @@ -575,17 +586,23 @@ static unsigned long i915_ttm_io_mem_pfn(struct ttm_buffer_object *bo,
>   					 unsigned long page_offset)
>   {
>   	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> -	unsigned long base = obj->mm.region->iomap.base - obj->mm.region->region.start;
>   	struct scatterlist *sg;
> +	unsigned long base;
>   	unsigned int ofs;
>   
> +	GEM_BUG_ON(!obj);
>   	GEM_WARN_ON(bo->ttm);
>   
> +	base = obj->mm.region->iomap.base - obj->mm.region->region.start;
>   	sg = __i915_gem_object_get_sg(obj, &obj->ttm.get_io_page, page_offset, &ofs, true);
>   
>   	return ((base + sg_dma_address(sg)) >> PAGE_SHIFT) + ofs;
>   }
>   
> +/*
> + * All callbacks need to take care not to downcast a struct ttm_buffer_object
> + * without checking its subclass, since it might be a TTM ghost object.
> + */
>   static struct ttm_device_funcs i915_ttm_bo_driver = {
>   	.ttm_tt_create = i915_ttm_tt_create,
>   	.ttm_tt_populate = i915_ttm_tt_populate,
> @@ -847,13 +864,16 @@ static void i915_ttm_delayed_free(struct drm_i915_gem_object *obj)
>   static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
>   {
>   	struct vm_area_struct *area = vmf->vma;
> -	struct drm_i915_gem_object *obj =
> -		i915_ttm_to_gem(area->vm_private_data);
> -	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> +	struct ttm_buffer_object *bo = area->vm_private_data;
>   	struct drm_device *dev = bo->base.dev;
> +	struct drm_i915_gem_object *obj;
>   	vm_fault_t ret;
>   	int idx;
>   
> +	obj = i915_ttm_to_gem(bo);
> +	if (!obj)
> +		return VM_FAULT_SIGBUS;

This should be safe outside of the object lock below, right?

> +
>   	/* Sanity check that we allow writing into this object */
>   	if (unlikely(i915_gem_object_is_readonly(obj) &&
>   		     area->vm_flags & VM_WRITE))
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> index 82cdabb542be..9d698ad00853 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> @@ -37,7 +37,7 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo);
>   static inline struct drm_i915_gem_object *
>   i915_ttm_to_gem(struct ttm_buffer_object *bo)
>   {
> -	if (GEM_WARN_ON(bo->destroy != i915_ttm_bo_destroy))
> +	if (bo->destroy != i915_ttm_bo_destroy)
>   		return NULL;
>   
>   	return container_of(bo, struct drm_i915_gem_object, __do_not_access);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index f35b386c56ca..38623fde170a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -3,6 +3,8 @@
>    * Copyright © 2021 Intel Corporation
>    */
>   
> +#include <linux/dma-fence-array.h>
> +
>   #include <drm/ttm/ttm_bo_driver.h>
>   
>   #include "i915_drv.h"
> @@ -41,6 +43,234 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>   }
>   #endif
>   
> +/**
> + * DOC: Set of utilities to dynamically collect dependencies and
> + * eventually coalesce them into a single fence which is fed into
> + * the GT migration code, since it only accepts a single dependency
> + * fence.
> + * The single fence returned from these utilities is, in the case of
> + * dependencies from multiple fence contexts, a struct dma_fence_array,
> + * since the i915 request code can break that up and await the individual
> + * fences.
> + *
> + * Once we can do async unbinding, this is also needed to coalesce
> + * the migration fence with the unbind fences.
> + *
> + * While collecting the individual dependencies, we store the refcounted
> + * struct dma_fence pointers in a realloc-managed pointer array, since
> + * that can be easily fed into a dma_fence_array. Other options are
> + * available, like for example an xarray for similarity with drm/sched.
> + * Can be changed easily if needed.
> + *
> + * A struct i915_deps needs to be initialized using i915_deps_init().
> + * If i915_deps_add_dependency() or i915_deps_add_resv() return an
> + * error code they will internally call i915_deps_fini(), which frees
> + * all internal references and allocations. After a call to
> + * i915_deps_to_fence(), or i915_deps_sync(), the struct should similarly
> + * be viewed as uninitialized.
> + *
> + * We might want to break this out into a separate file as a utility.
> + */
> +
> +#define I915_DEPS_MIN_ALLOC_CHUNK 8U
> +
> +/**
> + * struct i915_deps - Collect dependencies into a single dma-fence
> + * @single: Storage for pointer if the collection is a single fence.
> + * @fences: Allocated array of fence pointers if more than a single fence;
> + * otherwise points to the address of @single.
> + * @num_deps: Current number of dependency fences.
> + * @fences_size: Size of the @fences array in number of pointers.
> + * @gfp: Allocation mode.
> + */
> +struct i915_deps {
> +	struct dma_fence *single;
> +	struct dma_fence **fences;
> +	unsigned int num_deps;
> +	unsigned int fences_size;
> +	gfp_t gfp;
> +};
> +
> +static void i915_deps_reset_fences(struct i915_deps *deps)
> +{
> +	if (deps->fences != &deps->single)
> +		kfree(deps->fences);
> +	deps->num_deps = 0;
> +	deps->fences_size = 1;
> +	deps->fences = &deps->single;
> +}
> +
> +static void i915_deps_init(struct i915_deps *deps, gfp_t gfp)
> +{
> +	deps->fences = NULL;
> +	deps->gfp = gfp;
> +	i915_deps_reset_fences(deps);
> +}
> +
> +static void i915_deps_fini(struct i915_deps *deps)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < deps->num_deps; ++i)
> +		dma_fence_put(deps->fences[i]);
> +
> +	if (deps->fences != &deps->single)
> +		kfree(deps->fences);
> +}
> +
> +static int i915_deps_grow(struct i915_deps *deps, struct dma_fence *fence,
> +			  const struct ttm_operation_ctx *ctx)
> +{
> +	int ret;
> +
> +	if (deps->num_deps >= deps->fences_size) {
> +		unsigned int new_size = 2 * deps->fences_size;
> +		struct dma_fence **new_fences;
> +
> +		new_size = max(new_size, I915_DEPS_MIN_ALLOC_CHUNK);
> +		new_fences = kmalloc_array(new_size, sizeof(*new_fences), deps->gfp);
> +		if (!new_fences)
> +			goto sync;
> +
> +		memcpy(new_fences, deps->fences,
> +		       deps->fences_size * sizeof(*new_fences));
> +		swap(new_fences, deps->fences);
> +		if (new_fences != &deps->single)
> +			kfree(new_fences);
> +		deps->fences_size = new_size;
> +	}
> +	deps->fences[deps->num_deps++] = dma_fence_get(fence);
> +	return 0;
> +
> +sync:
> +	if (ctx->no_wait_gpu) {

Check if signalled?
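(i.e. before bailing out with -EBUSY when no_wait_gpu is set, it may be
worth checking whether the fence has in fact already signalled. A rough
sketch of the idea, using the generic dma-fence helper, not necessarily
what the final version should look like:

	if (ctx->no_wait_gpu && !dma_fence_is_signaled(fence)) {
		ret = -EBUSY;
		goto unref;
	}

so that an already-signalled dependency doesn't turn into a spurious
-EBUSY.)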

> +		ret = -EBUSY;
> +		goto unref;
> +	}
> +
> +	ret = dma_fence_wait(fence, ctx->interruptible);
> +	if (ret)
> +		goto unref;
> +
> +	ret = fence->error;
> +	if (ret)
> +		goto unref;
> +
> +	return 0;
> +
> +unref:
> +	i915_deps_fini(deps);
> +	return ret;
> +}
> +
> +static int i915_deps_sync(struct i915_deps *deps,
> +			  const struct ttm_operation_ctx *ctx)
> +{
> +	struct dma_fence **fences = deps->fences;
> +	unsigned int i;
> +	int ret = 0;
> +
> +	for (i = 0; i < deps->num_deps; ++i, ++fences) {
> +		if (ctx->no_wait_gpu) {

Ditto?

> +			ret = -EBUSY;
> +			break;
> +		}
> +
> +		ret = dma_fence_wait(*fences, ctx->interruptible);
> +		if (!ret)
> +			ret = (*fences)->error;
> +		if (ret)
> +			break;
> +	}
> +
> +	i915_deps_fini(deps);
> +	return ret;
> +}
> +
> +static int i915_deps_add_dependency(struct i915_deps *deps,
> +				    struct dma_fence *fence,
> +				    const struct ttm_operation_ctx *ctx)
> +{
> +	unsigned int i;
> +	int ret;
> +
> +	if (!fence)
> +		return 0;
> +
> +	if (dma_fence_is_signaled(fence)) {
> +		ret = fence->error;
> +		if (ret)
> +			i915_deps_fini(deps);
> +		return ret;
> +	}
> +
> +	for (i = 0; i < deps->num_deps; ++i) {
> +		struct dma_fence *entry = deps->fences[i];
> +
> +		if (!entry->context || entry->context != fence->context)
> +			continue;
> +
> +		if (dma_fence_is_later(fence, entry)) {
> +			dma_fence_put(entry);
> +			deps->fences[i] = dma_fence_get(fence);
> +		}
> +
> +		return 0;
> +	}
> +
> +	return i915_deps_grow(deps, fence, ctx);
> +}
> +
> +static struct dma_fence *i915_deps_to_fence(struct i915_deps *deps,
> +					    const struct ttm_operation_ctx *ctx)
> +{
> +	struct dma_fence_array *array;
> +
> +	if (deps->num_deps == 0)
> +		return NULL;
> +
> +	if (deps->num_deps == 1) {
> +		deps->num_deps = 0;
> +		return deps->fences[0];
> +	}
> +
> +	/*
> +	 * TODO: Alter the allocation mode here to not try too hard to
> +	 * make things async.
> +	 */
> +	array = dma_fence_array_create(deps->num_deps, deps->fences, 0, 0,
> +				       false);
> +	if (!array)
> +		return ERR_PTR(i915_deps_sync(deps, ctx));
> +
> +	deps->fences = NULL;
> +	i915_deps_reset_fences(deps);
> +
> +	return &array->base;
> +}
> +
> +static int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv,
> +			      bool all, const bool no_excl,
> +			      const struct ttm_operation_ctx *ctx)
> +{
> +	struct dma_resv_iter iter;
> +	struct dma_fence *fence;
> +
> +	dma_resv_assert_held(resv);
> +	dma_resv_for_each_fence(&iter, resv, all, fence) {
> +		int ret;
> +
> +		if (no_excl && !iter.index)

dma_resv_iter_is_exclusive() ?
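(i.e. spell the intent out with the iterator helper instead of relying on
index 0 being the exclusive fence. A minimal sketch, assuming that helper
is available in this tree:

	if (no_excl && dma_resv_iter_is_exclusive(&iter))
		continue;

the skip logic stays the same either way.)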

Reviewed-by: Matthew Auld <matthew.auld@intel.com>

> +			continue;
> +
> +		ret = i915_deps_add_dependency(deps, fence, ctx);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
>   static enum i915_cache_level
>   i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
>   		     struct ttm_tt *ttm)
> @@ -156,7 +386,8 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   					     bool clear,
>   					     struct ttm_resource *dst_mem,
>   					     struct ttm_tt *dst_ttm,
> -					     struct sg_table *dst_st)
> +					     struct sg_table *dst_st,
> +					     struct dma_fence *dep)
>   {
>   	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
>   						     bdev);
> @@ -180,7 +411,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   			return ERR_PTR(-EINVAL);
>   
>   		intel_engine_pm_get(i915->gt.migrate.context->engine);
> -		ret = intel_context_migrate_clear(i915->gt.migrate.context, NULL,
> +		ret = intel_context_migrate_clear(i915->gt.migrate.context, dep,
>   						  dst_st->sgl, dst_level,
>   						  i915_ttm_gtt_binds_lmem(dst_mem),
>   						  0, &rq);
> @@ -194,7 +425,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
>   		intel_engine_pm_get(i915->gt.migrate.context->engine);
>   		ret = intel_context_migrate_copy(i915->gt.migrate.context,
> -						 NULL, src_rsgt->table.sgl,
> +						 dep, src_rsgt->table.sgl,
>   						 src_level,
>   						 i915_ttm_gtt_binds_lmem(bo->resource),
>   						 dst_st->sgl, dst_level,
> @@ -378,10 +609,11 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
>   	return &work->fence;
>   }
>   
> -static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
> -			    struct ttm_resource *dst_mem,
> -			    struct ttm_tt *dst_ttm,
> -			    struct i915_refct_sgt *dst_rsgt, bool allow_accel)
> +static struct dma_fence *
> +__i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
> +		struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm,
> +		struct i915_refct_sgt *dst_rsgt, bool allow_accel,
> +		struct dma_fence *move_dep)
>   {
>   	struct i915_ttm_memcpy_work *copy_work = NULL;
>   	struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
> @@ -389,7 +621,7 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
>   
>   	if (allow_accel) {
>   		fence = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm,
> -					    &dst_rsgt->table);
> +					    &dst_rsgt->table, move_dep);
>   
>   		/*
>   		 * We only need to intercept the error when moving to lmem.
> @@ -423,6 +655,11 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
>   
>   		if (!IS_ERR(fence))
>   			goto out;
> +	} else if (move_dep) {
> +		int err = dma_fence_wait(move_dep, true);
> +
> +		if (err)
> +			return ERR_PTR(err);
>   	}
>   
>   	/* Error intercept failed or no accelerated migration to start with */
> @@ -433,16 +670,35 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
>   	i915_ttm_memcpy_release(arg);
>   	kfree(copy_work);
>   
> -	return;
> +	return NULL;
>   out:
> -	/* Sync here for now, forward the fence to caller when fully async. */
> -	if (fence) {
> -		dma_fence_wait(fence, false);
> -		dma_fence_put(fence);
> -	} else if (copy_work) {
> +	if (!fence && copy_work) {
>   		i915_ttm_memcpy_release(arg);
>   		kfree(copy_work);
>   	}
> +
> +	return fence;
> +}
> +
> +static struct dma_fence *prev_fence(struct ttm_buffer_object *bo,
> +				    struct ttm_operation_ctx *ctx)
> +{
> +	struct i915_deps deps;
> +	int ret;
> +
> +	/*
> +	 * Instead of trying hard with GFP_KERNEL to allocate memory,
> +	 * the dependency collection will just sync if it doesn't
> +	 * succeed.
> +	 */
> +	i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
> +	ret = i915_deps_add_dependency(&deps, bo->moving, ctx);
> +	if (!ret)
> +		ret = i915_deps_add_resv(&deps, bo->base.resv, false, false, ctx);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	return i915_deps_to_fence(&deps, ctx);
>   }
>   
>   /**
> @@ -462,15 +718,16 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>   	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>   	struct ttm_resource_manager *dst_man =
>   		ttm_manager_type(bo->bdev, dst_mem->mem_type);
> +	struct dma_fence *migration_fence = NULL;
>   	struct ttm_tt *ttm = bo->ttm;
>   	struct i915_refct_sgt *dst_rsgt;
>   	bool clear;
>   	int ret;
>   
> -	/* Sync for now. We could do the actual copy async. */
> -	ret = ttm_bo_wait_ctx(bo, ctx);
> -	if (ret)
> -		return ret;
> +	if (GEM_WARN_ON(!obj)) {
> +		ttm_bo_move_null(bo, dst_mem);
> +		return 0;
> +	}
>   
>   	ret = i915_ttm_move_notify(bo);
>   	if (ret)
> @@ -494,10 +751,37 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>   		return PTR_ERR(dst_rsgt);
>   
>   	clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
> -	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)))
> -		__i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_rsgt, true);
> +	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
> +		struct dma_fence *dep = prev_fence(bo, ctx);
> +
> +		if (IS_ERR(dep)) {
> +			i915_refct_sgt_put(dst_rsgt);
> +			return PTR_ERR(dep);
> +		}
> +
> +		migration_fence = __i915_ttm_move(bo, clear, dst_mem, bo->ttm,
> +						  dst_rsgt, true, dep);
> +		dma_fence_put(dep);
> +	}
> +
> +	/* We can possibly get an -ERESTARTSYS here */
> +	if (IS_ERR(migration_fence)) {
> +		i915_refct_sgt_put(dst_rsgt);
> +		return PTR_ERR(migration_fence);
> +	}
> +
> +	if (migration_fence) {
> +		ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
> +						true, dst_mem);
> +		if (ret) {
> +			dma_fence_wait(migration_fence, false);
> +			ttm_bo_move_sync_cleanup(bo, dst_mem);
> +		}
> +		dma_fence_put(migration_fence);
> +	} else {
> +		ttm_bo_move_sync_cleanup(bo, dst_mem);
> +	}
>   
> -	ttm_bo_move_sync_cleanup(bo, dst_mem);
>   	i915_ttm_adjust_domains_after_move(obj);
>   	i915_ttm_free_cached_io_rsgt(obj);
>   
> @@ -538,6 +822,7 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
>   		.interruptible = intr,
>   	};
>   	struct i915_refct_sgt *dst_rsgt;
> +	struct dma_fence *copy_fence;
>   	int ret;
>   
>   	assert_object_held(dst);
> @@ -553,10 +838,17 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
>   		return ret;
>   
>   	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
> -	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
> -			dst_rsgt, allow_accel);
> +	copy_fence = __i915_ttm_move(src_bo, false, dst_bo->resource,
> +				     dst_bo->ttm, dst_rsgt, allow_accel, NULL);
>   
>   	i915_refct_sgt_put(dst_rsgt);
> +	if (IS_ERR(copy_fence))
> +		return PTR_ERR(copy_fence);
> +
> +	if (copy_fence) {
> +		dma_fence_wait(copy_fence, false);
> +		dma_fence_put(copy_fence);
> +	}
>   
>   	return 0;
>   }
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> index f909aaa09d9c..bae65796a6cc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> @@ -306,6 +306,6 @@ int i915_gem_object_wait_migration(struct drm_i915_gem_object *obj,
>   				   unsigned int flags)
>   {
>   	might_sleep();
> -	/* NOP for now. */
> -	return 0;
> +
> +	return i915_gem_object_wait_moving_fence(obj, !!(flags & I915_WAIT_INTERRUPTIBLE));
>   }
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 6/6] drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous
  2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
@ 2021-11-19 17:22     ` Matthew Auld
  -1 siblings, 0 replies; 31+ messages in thread
From: Matthew Auld @ 2021-11-19 17:22 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel

On 18/11/2021 13:02, Thomas Hellström wrote:
> Update the copy function i915_gem_obj_copy_ttm() to be asynchronous for
> future users and update the only current user to sync the objects
> as needed after this function.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 1/6] drm/i915: Add support for moving fence waiting
  2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
  (?)
  (?)
@ 2021-11-20  9:09     ` kernel test robot
  -1 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2021-11-20  9:09 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel
  Cc: llvm, kbuild-all, Thomas Hellström, matthew.auld

[-- Attachment #1: Type: text/plain, Size: 3696 bytes --]

Hi "Thomas,

I love your patch! Yet something to improve:

[auto build test ERROR on drm-tip/drm-tip]
[also build test ERROR on drm-exynos/exynos-drm-next drm/drm-next v5.16-rc1 next-20211118]
[cannot apply to drm-intel/for-linux-next tegra-drm/drm/tegra/for-next airlied/drm-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Thomas-Hellstr-m/drm-i915-ttm-Async-migration/20211118-210436
base:   git://anongit.freedesktop.org/drm/drm-tip drm-tip
config: x86_64-randconfig-r001-20211118 (attached as .config)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/316dda0c6b91d043111cb348b43be6a6f2e3bb0a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Thomas-Hellstr-m/drm-i915-ttm-Async-migration/20211118-210436
        git checkout 316dda0c6b91d043111cb348b43be6a6f2e3bb0a
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/gpu/drm/i915/i915_vma.c:478:13: error: implicit declaration of function 'i915_vma_verify_bind_complete' [-Werror,-Wimplicit-function-declaration]
           GEM_BUG_ON(i915_vma_verify_bind_complete(vma));
                      ^
   1 error generated.


vim +/i915_vma_verify_bind_complete +478 drivers/gpu/drm/i915/i915_vma.c

   463	
   464	void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
   465	{
   466		void __iomem *ptr;
   467		int err;
   468	
   469		if (!i915_gem_object_is_lmem(vma->obj)) {
   470			if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) {
   471				err = -ENODEV;
   472				goto err;
   473			}
   474		}
   475	
   476		GEM_BUG_ON(!i915_vma_is_ggtt(vma));
   477		GEM_BUG_ON(!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND));
 > 478		GEM_BUG_ON(i915_vma_verify_bind_complete(vma));
   479	
   480		ptr = READ_ONCE(vma->iomap);
   481		if (ptr == NULL) {
   482			/*
   483			 * TODO: consider just using i915_gem_object_pin_map() for lmem
   484			 * instead, which already supports mapping non-contiguous chunks
   485			 * of pages, that way we can also drop the
   486			 * I915_BO_ALLOC_CONTIGUOUS when allocating the object.
   487			 */
   488			if (i915_gem_object_is_lmem(vma->obj))
   489				ptr = i915_gem_object_lmem_io_map(vma->obj, 0,
   490								  vma->obj->base.size);
   491			else
   492				ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap,
   493							vma->node.start,
   494							vma->node.size);
   495			if (ptr == NULL) {
   496				err = -ENOMEM;
   497				goto err;
   498			}
   499	
   500			if (unlikely(cmpxchg(&vma->iomap, NULL, ptr))) {
   501				io_mapping_unmap(ptr);
   502				ptr = vma->iomap;
   503			}
   504		}
   505	
   506		__i915_vma_pin(vma);
   507	
   508		err = i915_vma_pin_fence(vma);
   509		if (err)
   510			goto err_unpin;
   511	
   512		i915_vma_set_ggtt_write(vma);
   513	
   514		/* NB Access through the GTT requires the device to be awake. */
   515		return ptr;
   516	
   517	err_unpin:
   518		__i915_vma_unpin(vma);
   519	err:
   520		return IO_ERR_PTR(err);
   521	}
   522	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 35610 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 1/6] drm/i915: Add support for moving fence waiting
@ 2021-11-20  9:09     ` kernel test robot
  0 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2021-11-20  9:09 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel
  Cc: Thomas Hellström, llvm, kbuild-all, matthew.auld

[-- Attachment #1: Type: text/plain, Size: 3696 bytes --]

Hi "Thomas,

I love your patch! Yet something to improve:

[auto build test ERROR on drm-tip/drm-tip]
[also build test ERROR on drm-exynos/exynos-drm-next drm/drm-next v5.16-rc1 next-20211118]
[cannot apply to drm-intel/for-linux-next tegra-drm/drm/tegra/for-next airlied/drm-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Thomas-Hellstr-m/drm-i915-ttm-Async-migration/20211118-210436
base:   git://anongit.freedesktop.org/drm/drm-tip drm-tip
config: x86_64-randconfig-r001-20211118 (attached as .config)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/316dda0c6b91d043111cb348b43be6a6f2e3bb0a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Thomas-Hellstr-m/drm-i915-ttm-Async-migration/20211118-210436
        git checkout 316dda0c6b91d043111cb348b43be6a6f2e3bb0a
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/gpu/drm/i915/i915_vma.c:478:13: error: implicit declaration of function 'i915_vma_verify_bind_complete' [-Werror,-Wimplicit-function-declaration]
           GEM_BUG_ON(i915_vma_verify_bind_complete(vma));
                      ^
   1 error generated.


vim +/i915_vma_verify_bind_complete +478 drivers/gpu/drm/i915/i915_vma.c

   463	
   464	void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
   465	{
   466		void __iomem *ptr;
   467		int err;
   468	
   469		if (!i915_gem_object_is_lmem(vma->obj)) {
   470			if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) {
   471				err = -ENODEV;
   472				goto err;
   473			}
   474		}
   475	
   476		GEM_BUG_ON(!i915_vma_is_ggtt(vma));
   477		GEM_BUG_ON(!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND));
 > 478		GEM_BUG_ON(i915_vma_verify_bind_complete(vma));
   479	
   480		ptr = READ_ONCE(vma->iomap);
   481		if (ptr == NULL) {
   482			/*
   483			 * TODO: consider just using i915_gem_object_pin_map() for lmem
   484			 * instead, which already supports mapping non-contiguous chunks
   485			 * of pages, that way we can also drop the
   486			 * I915_BO_ALLOC_CONTIGUOUS when allocating the object.
   487			 */
   488			if (i915_gem_object_is_lmem(vma->obj))
   489				ptr = i915_gem_object_lmem_io_map(vma->obj, 0,
   490								  vma->obj->base.size);
   491			else
   492				ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap,
   493							vma->node.start,
   494							vma->node.size);
   495			if (ptr == NULL) {
   496				err = -ENOMEM;
   497				goto err;
   498			}
   499	
   500			if (unlikely(cmpxchg(&vma->iomap, NULL, ptr))) {
   501				io_mapping_unmap(ptr);
   502				ptr = vma->iomap;
   503			}
   504		}
   505	
   506		__i915_vma_pin(vma);
   507	
   508		err = i915_vma_pin_fence(vma);
   509		if (err)
   510			goto err_unpin;
   511	
   512		i915_vma_set_ggtt_write(vma);
   513	
   514		/* NB Access through the GTT requires the device to be awake. */
   515		return ptr;
   516	
   517	err_unpin:
   518		__i915_vma_unpin(vma);
   519	err:
   520		return IO_ERR_PTR(err);
   521	}
   522	
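
The implicit-declaration error above typically means that
i915_vma_verify_bind_complete() is only declared or defined under a debug-only
Kconfig guard, while the new GEM_BUG_ON() in i915_vma_pin_iomap() references it
unconditionally. A minimal sketch of one way to keep such a caller compiling,
assuming the helper is gated on CONFIG_DRM_I915_DEBUG_GEM (the actual guard and
the eventual fix applied in the series may differ):

        struct i915_vma;

        /*
         * Hypothetical fallback, not necessarily the fix applied later:
         * keep the debug-only implementation, but give non-debug builds a
         * trivially-false stand-in so GEM_BUG_ON() callers still compile.
         */
        #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
        int i915_vma_verify_bind_complete(struct i915_vma *vma);
        #else
        #define i915_vma_verify_bind_complete(_vma) 0
        #endif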

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 35610 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

Thread overview: 31+ messages
2021-11-18 13:02 [PATCH v4 0/6] drm/i915/ttm: Async migration Thomas Hellström
2021-11-18 13:02 ` [Intel-gfx] " Thomas Hellström
2021-11-18 13:02 ` [PATCH v4 1/6] drm/i915: Add support for moving fence waiting Thomas Hellström
2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
2021-11-18 14:12   ` Matthew Auld
2021-11-18 14:12     ` [Intel-gfx] " Matthew Auld
2021-11-20  9:09   ` kernel test robot
2021-11-20  9:09     ` kernel test robot
2021-11-20  9:09     ` [Intel-gfx] " kernel test robot
2021-11-20  9:09     ` kernel test robot
2021-11-18 13:02 ` [PATCH v4 2/6] drm/i915/ttm: Move the i915_gem_obj_copy_ttm() function Thomas Hellström
2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
2021-11-18 13:02 ` [PATCH v4 3/6] drm/i915/ttm: Drop region reference counting Thomas Hellström
2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
2021-11-18 14:19   ` Matthew Auld
2021-11-18 14:19     ` [Intel-gfx] " Matthew Auld
2021-11-18 13:02 ` [PATCH v4 4/6] drm/i915/ttm: Correctly handle waiting for gpu when shrinking Thomas Hellström
2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
2021-11-18 14:30   ` Matthew Auld
2021-11-18 14:30     ` [Intel-gfx] " Matthew Auld
2021-11-18 13:02 ` [PATCH v4 5/6] drm/i915/ttm: Implement asynchronous TTM moves Thomas Hellström
2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
2021-11-19 17:18   ` Matthew Auld
2021-11-19 17:18     ` [Intel-gfx] " Matthew Auld
2021-11-18 13:02 ` [PATCH v4 6/6] drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous Thomas Hellström
2021-11-18 13:02   ` [Intel-gfx] " Thomas Hellström
2021-11-19 17:22   ` Matthew Auld
2021-11-19 17:22     ` [Intel-gfx] " Matthew Auld
2021-11-18 15:32 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/ttm: Async migration (rev5) Patchwork
2021-11-18 15:59 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-11-19  5:34 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
