* [PATCH v3 0/6] drm/i915/ttm: Async migration
@ 2021-11-14 11:12 ` Thomas Hellström
  0 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-14 11:12 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

This patch series deals with async migration and async VRAM management.
It still leaves out an important part, async unbinding, which will reduce
latency further, at least when trying to migrate already active objects.

Patches 1/6 and 2/6 deal with accessing and waiting for the TTM moving
fence from i915 GEM.
Patch 3 is pure code reorganization, no functional change.
Patch 4 breaks a refcounting loop involving the TTM moving fence.
Patch 5 uses TTM to implement the ttm move() callback asynchronously. It
also introduces a utility to collect dependencies and turn them into a
single dma_fence, which is needed for the intel_migrate code. This also
affects the gem object migrate code.
Patch 6 makes the object copy utility async as well, mainly for future
users, since the only current user, suspend backup and restore, will
typically want to sync anyway.

v2:
- Fix a couple of SPARSE warnings.
v3:
- Fix a NULL pointer dereference.

Maarten Lankhorst (2):
  drm/i915: Add functions to set/get moving fence
  drm/i915: Add support for asynchronous moving fence waiting

Thomas Hellström (4):
  drm/i915/ttm: Move the i915_gem_obj_copy_ttm() function
  drm/i915/ttm: Break refcounting loops at device region unref time
  drm/i915/ttm: Implement asynchronous TTM moves
  drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous

 drivers/gpu/drm/i915/display/intel_fbdev.c    |   7 +-
 drivers/gpu/drm/i915/display/intel_overlay.c  |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  37 ++
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   9 +
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |   6 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  58 +--
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 396 ++++++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h  |  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c    |   3 +
 drivers/gpu/drm/i915/gem/i915_gem_wait.c      |   4 +-
 .../i915/gem/selftests/i915_gem_coherency.c   |   4 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  22 +-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |   1 +
 drivers/gpu/drm/i915/i915_vma.c               |  39 +-
 drivers/gpu/drm/i915/i915_vma.h               |   3 +
 drivers/gpu/drm/i915/intel_memory_region.c    |   5 +-
 drivers/gpu/drm/i915/intel_memory_region.h    |   1 +
 drivers/gpu/drm/i915/intel_region_ttm.c       |  28 ++
 drivers/gpu/drm/i915/intel_region_ttm.h       |   2 +
 drivers/gpu/drm/i915/selftests/i915_vma.c     |   4 +-
 21 files changed, 538 insertions(+), 109 deletions(-)

-- 
2.31.1


* [PATCH v3 1/6] drm/i915: Add functions to set/get moving fence
  2021-11-14 11:12 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-14 11:12   ` Thomas Hellström
  -1 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-14 11:12 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: matthew.auld

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

We want to get rid of i915_vma tracking to simplify the code and
lifetimes. Add a way to set/put the moving fence, in preparation for
removing the tracking.
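
A hypothetical caller sketch (illustrative only, not part of this patch)
of how the new helpers are intended to be used, with the interruptible
wait done under the object lock:

static int wait_for_pending_move(struct drm_i915_gem_object *obj)
{
	struct i915_gem_ww_ctx ww;
	int err;

	for_i915_gem_ww(&ww, err, true) {
		err = i915_gem_object_lock(obj, &ww);
		if (err)
			continue;

		/* Interruptible wait for a pending move, if any. */
		err = i915_gem_object_wait_moving_fence(obj, true);
	}

	return err;
}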

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 37 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_object.h |  9 ++++++
 2 files changed, 46 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 591ee3cb7275..ec4313836597 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -33,6 +33,7 @@
 #include "i915_gem_object.h"
 #include "i915_memcpy.h"
 #include "i915_trace.h"
+#include "i915_gem_ttm.h"
 
 static struct kmem_cache *slab_objects;
 
@@ -726,6 +727,42 @@ static const struct drm_gem_object_funcs i915_gem_object_funcs = {
 	.export = i915_gem_prime_export,
 };
 
+struct dma_fence *
+i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj)
+{
+	return dma_fence_get(i915_gem_to_ttm(obj)->moving);
+}
+
+void i915_gem_object_set_moving_fence(struct drm_i915_gem_object *obj,
+				      struct dma_fence *fence)
+{
+	dma_fence_put(i915_gem_to_ttm(obj)->moving);
+
+	i915_gem_to_ttm(obj)->moving = dma_fence_get(fence);
+}
+
+int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
+				      bool intr)
+{
+	struct dma_fence *fence = i915_gem_to_ttm(obj)->moving;
+	int ret;
+
+	assert_object_held(obj);
+	if (!fence)
+		return 0;
+
+	ret = dma_fence_wait(fence, intr);
+	if (ret)
+		return ret;
+
+	if (fence->error)
+		return fence->error;
+
+	i915_gem_to_ttm(obj)->moving = NULL;
+	dma_fence_put(fence);
+	return 0;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/huge_gem_object.c"
 #include "selftests/huge_pages.c"
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 133963b46135..36bf3e2e602f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -517,6 +517,15 @@ i915_gem_object_finish_access(struct drm_i915_gem_object *obj)
 	i915_gem_object_unpin_pages(obj);
 }
 
+struct dma_fence *
+i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj);
+
+void i915_gem_object_set_moving_fence(struct drm_i915_gem_object *obj,
+				      struct dma_fence *fence);
+
+int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
+				      bool intr);
+
 void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 					 unsigned int cache_level);
 bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
-- 
2.31.1


* [PATCH v3 2/6] drm/i915: Add support for asynchronous moving fence waiting
  2021-11-14 11:12 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-14 11:12   ` Thomas Hellström
  -1 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-14 11:12 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

For now, we will only allow async migration when TTM is used,
so the paths we care about are related to TTM.

The mmap path is handled by having the fence in ttm_bo->moving. When
pinning, the binding only becomes available after the moving fence is
signaled, and pinning a CPU map will only work after the moving fence
signals.

This should close all holes where userspace can read a buffer
before it's fully migrated.
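
Illustrative only (the actual conversions are in the diff below): since
the moving-fence wait is interruptible, callers of the new unlocked
iomap helper should treat signal delivery as a restartable condition
rather than an error:

static int map_for_cpu_access(struct drm_i915_private *i915,
			      struct i915_vma *vma)
{
	void __iomem *map;

	map = i915_vma_pin_iomap_unlocked(vma);
	if (IS_ERR(map)) {
		int err = PTR_ERR(map);

		/* -EINTR/-ERESTARTSYS just mean a signal arrived during the wait. */
		if (err != -EINTR && err != -ERESTARTSYS)
			drm_err(&i915->drm, "Failed to iomap vma: %d\n", err);
		return err;
	}

	/* ... access through @map ... */

	i915_vma_unpin_iomap(vma);
	return 0;
}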

v2:
- Fix a couple of SPARSE warnings
v3:
- Fix a NULL pointer dereference

Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/display/intel_fbdev.c    |  7 ++--
 drivers/gpu/drm/i915/display/intel_overlay.c  |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |  6 +++
 .../i915/gem/selftests/i915_gem_coherency.c   |  4 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    | 22 ++++++-----
 drivers/gpu/drm/i915/i915_vma.c               | 39 ++++++++++++++++++-
 drivers/gpu/drm/i915/i915_vma.h               |  3 ++
 drivers/gpu/drm/i915/selftests/i915_vma.c     |  4 +-
 8 files changed, 69 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c
index adc3a81be9f7..5902ad0c2bd8 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -265,11 +265,12 @@ static int intelfb_create(struct drm_fb_helper *helper,
 		info->fix.smem_len = vma->node.size;
 	}
 
-	vaddr = i915_vma_pin_iomap(vma);
+	vaddr = i915_vma_pin_iomap_unlocked(vma);
 	if (IS_ERR(vaddr)) {
-		drm_err(&dev_priv->drm,
-			"Failed to remap framebuffer into virtual memory\n");
 		ret = PTR_ERR(vaddr);
+		if (ret != -EINTR && ret != -ERESTARTSYS)
+			drm_err(&dev_priv->drm,
+				"Failed to remap framebuffer into virtual memory\n");
 		goto out_unpin;
 	}
 	info->screen_base = vaddr;
diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c b/drivers/gpu/drm/i915/display/intel_overlay.c
index 7e3f5c6ca484..21593f3f2664 100644
--- a/drivers/gpu/drm/i915/display/intel_overlay.c
+++ b/drivers/gpu/drm/i915/display/intel_overlay.c
@@ -1357,7 +1357,7 @@ static int get_registers(struct intel_overlay *overlay, bool use_phys)
 		overlay->flip_addr = sg_dma_address(obj->mm.pages->sgl);
 	else
 		overlay->flip_addr = i915_ggtt_offset(vma);
-	overlay->regs = i915_vma_pin_iomap(vma);
+	overlay->regs = i915_vma_pin_iomap_unlocked(vma);
 	i915_vma_unpin(vma);
 
 	if (IS_ERR(overlay->regs)) {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index c4f684b7cc51..49c6e55c68ce 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -418,6 +418,12 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 	}
 
 	if (!ptr) {
+		err = i915_gem_object_wait_moving_fence(obj, true);
+		if (err) {
+			ptr = ERR_PTR(err);
+			goto err_unpin;
+		}
+
 		if (GEM_WARN_ON(type == I915_MAP_WC &&
 				!static_cpu_has(X86_FEATURE_PAT)))
 			ptr = ERR_PTR(-ENODEV);
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
index 13b088cc787e..067c512961ba 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
@@ -101,7 +101,7 @@ static int gtt_set(struct context *ctx, unsigned long offset, u32 v)
 
 	intel_gt_pm_get(vma->vm->gt);
 
-	map = i915_vma_pin_iomap(vma);
+	map = i915_vma_pin_iomap_unlocked(vma);
 	i915_vma_unpin(vma);
 	if (IS_ERR(map)) {
 		err = PTR_ERR(map);
@@ -134,7 +134,7 @@ static int gtt_get(struct context *ctx, unsigned long offset, u32 *v)
 
 	intel_gt_pm_get(vma->vm->gt);
 
-	map = i915_vma_pin_iomap(vma);
+	map = i915_vma_pin_iomap_unlocked(vma);
 	i915_vma_unpin(vma);
 	if (IS_ERR(map)) {
 		err = PTR_ERR(map);
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 6d30cdfa80f3..5d54181c2145 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -125,12 +125,13 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj,
 	n = page - view.partial.offset;
 	GEM_BUG_ON(n >= view.partial.size);
 
-	io = i915_vma_pin_iomap(vma);
+	io = i915_vma_pin_iomap_unlocked(vma);
 	i915_vma_unpin(vma);
 	if (IS_ERR(io)) {
-		pr_err("Failed to iomap partial view: offset=%lu; err=%d\n",
-		       page, (int)PTR_ERR(io));
 		err = PTR_ERR(io);
+		if (err != -EINTR && err != -ERESTARTSYS)
+			pr_err("Failed to iomap partial view: offset=%lu; err=%d\n",
+			       page, err);
 		goto out;
 	}
 
@@ -219,12 +220,15 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj,
 		n = page - view.partial.offset;
 		GEM_BUG_ON(n >= view.partial.size);
 
-		io = i915_vma_pin_iomap(vma);
+		io = i915_vma_pin_iomap_unlocked(vma);
 		i915_vma_unpin(vma);
 		if (IS_ERR(io)) {
-			pr_err("Failed to iomap partial view: offset=%lu; err=%d\n",
-			       page, (int)PTR_ERR(io));
-			return PTR_ERR(io);
+			int err = PTR_ERR(io);
+
+			if (err != -EINTR && err != -ERESTARTSYS)
+				pr_err("Failed to iomap partial view: offset=%lu; err=%d\n",
+				       page, err);
+			return err;
 		}
 
 		iowrite32(page, io + n * PAGE_SIZE / sizeof(*io));
@@ -773,7 +777,7 @@ static int gtt_set(struct drm_i915_gem_object *obj)
 		return PTR_ERR(vma);
 
 	intel_gt_pm_get(vma->vm->gt);
-	map = i915_vma_pin_iomap(vma);
+	map = i915_vma_pin_iomap_unlocked(vma);
 	i915_vma_unpin(vma);
 	if (IS_ERR(map)) {
 		err = PTR_ERR(map);
@@ -799,7 +803,7 @@ static int gtt_check(struct drm_i915_gem_object *obj)
 		return PTR_ERR(vma);
 
 	intel_gt_pm_get(vma->vm->gt);
-	map = i915_vma_pin_iomap(vma);
+	map = i915_vma_pin_iomap_unlocked(vma);
 	i915_vma_unpin(vma);
 	if (IS_ERR(map)) {
 		err = PTR_ERR(map);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 8781c4f61952..069f22b3cd48 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -431,6 +431,13 @@ int i915_vma_bind(struct i915_vma *vma,
 			work->pinned = i915_gem_object_get(vma->obj);
 		}
 	} else {
+		if (vma->obj) {
+			int ret;
+
+			ret = i915_gem_object_wait_moving_fence(vma->obj, true);
+			if (ret)
+				return ret;
+		}
 		vma->ops->bind_vma(vma->vm, NULL, vma, cache_level, bind_flags);
 	}
 
@@ -455,6 +462,10 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
 
 	ptr = READ_ONCE(vma->iomap);
 	if (ptr == NULL) {
+		err = i915_gem_object_wait_moving_fence(vma->obj, true);
+		if (err)
+			goto err;
+
 		/*
 		 * TODO: consider just using i915_gem_object_pin_map() for lmem
 		 * instead, which already supports mapping non-contiguous chunks
@@ -496,6 +507,25 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
 	return IO_ERR_PTR(err);
 }
 
+void __iomem *i915_vma_pin_iomap_unlocked(struct i915_vma *vma)
+{
+	struct i915_gem_ww_ctx ww;
+	void __iomem *map;
+	int err;
+
+	for_i915_gem_ww(&ww, err, true) {
+		err = i915_gem_object_lock(vma->obj, &ww);
+		if (err)
+			continue;
+
+		map = i915_vma_pin_iomap(vma);
+	}
+	if (err)
+		map = IO_ERR_PTR(err);
+
+	return map;
+}
+
 void i915_vma_flush_writes(struct i915_vma *vma)
 {
 	if (i915_vma_unset_ggtt_write(vma))
@@ -870,6 +900,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 		    u64 size, u64 alignment, u64 flags)
 {
 	struct i915_vma_work *work = NULL;
+	struct dma_fence *moving = NULL;
 	intel_wakeref_t wakeref = 0;
 	unsigned int bound;
 	int err;
@@ -895,7 +926,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 	if (flags & PIN_GLOBAL)
 		wakeref = intel_runtime_pm_get(&vma->vm->i915->runtime_pm);
 
-	if (flags & vma->vm->bind_async_flags) {
+	moving = vma->obj ? i915_gem_object_get_moving_fence(vma->obj) : NULL;
+	if (flags & vma->vm->bind_async_flags || moving) {
 		/* lock VM */
 		err = i915_vm_lock_objects(vma->vm, ww);
 		if (err)
@@ -909,6 +941,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 
 		work->vm = i915_vm_get(vma->vm);
 
+		dma_fence_work_chain(&work->base, moving);
+
 		/* Allocate enough page directories to used PTE */
 		if (vma->vm->allocate_va_range) {
 			err = i915_vm_alloc_pt_stash(vma->vm,
@@ -1013,7 +1047,10 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 err_rpm:
 	if (wakeref)
 		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
+	if (moving)
+		dma_fence_put(moving);
 	vma_put_pages(vma);
+
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 648dbe744c96..1812b2904a31 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -326,6 +326,9 @@ static inline bool i915_node_color_differs(const struct drm_mm_node *node,
  * Returns a valid iomapped pointer or ERR_PTR.
  */
 void __iomem *i915_vma_pin_iomap(struct i915_vma *vma);
+
+void __iomem *i915_vma_pin_iomap_unlocked(struct i915_vma *vma);
+
 #define IO_ERR_PTR(x) ((void __iomem *)ERR_PTR(x))
 
 /**
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index 1f10fe36619b..85f43b209890 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -1005,7 +1005,7 @@ static int igt_vma_remapped_gtt(void *arg)
 
 			GEM_BUG_ON(vma->ggtt_view.type != *t);
 
-			map = i915_vma_pin_iomap(vma);
+			map = i915_vma_pin_iomap_unlocked(vma);
 			i915_vma_unpin(vma);
 			if (IS_ERR(map)) {
 				err = PTR_ERR(map);
@@ -1036,7 +1036,7 @@ static int igt_vma_remapped_gtt(void *arg)
 
 			GEM_BUG_ON(vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL);
 
-			map = i915_vma_pin_iomap(vma);
+			map = i915_vma_pin_iomap_unlocked(vma);
 			i915_vma_unpin(vma);
 			if (IS_ERR(map)) {
 				err = PTR_ERR(map);
-- 
2.31.1


* [PATCH v3 3/6] drm/i915/ttm: Move the i915_gem_obj_copy_ttm() function
  2021-11-14 11:12 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-14 11:12   ` Thomas Hellström
  -1 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-14 11:12 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

Move the i915_gem_obj_copy_ttm() function to i915_gem_ttm_move.c and its
declaration to i915_gem_ttm_move.h. This will help keep a number of
functions static when introducing async moves.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 47 ---------------
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |  4 --
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 63 ++++++++++++++++----
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h | 10 ++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c   |  1 +
 5 files changed, 56 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 68cfe6e9ceab..537a81445b90 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1063,50 +1063,3 @@ i915_gem_ttm_system_setup(struct drm_i915_private *i915,
 	intel_memory_region_set_name(mr, "system-ttm");
 	return mr;
 }
-
-/**
- * i915_gem_obj_copy_ttm - Copy the contents of one ttm-based gem object to
- * another
- * @dst: The destination object
- * @src: The source object
- * @allow_accel: Allow using the blitter. Otherwise TTM memcpy is used.
- * @intr: Whether to perform waits interruptible:
- *
- * Note: The caller is responsible for assuring that the underlying
- * TTM objects are populated if needed and locked.
- *
- * Return: Zero on success. Negative error code on error. If @intr == true,
- * then it may return -ERESTARTSYS or -EINTR.
- */
-int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
-			  struct drm_i915_gem_object *src,
-			  bool allow_accel, bool intr)
-{
-	struct ttm_buffer_object *dst_bo = i915_gem_to_ttm(dst);
-	struct ttm_buffer_object *src_bo = i915_gem_to_ttm(src);
-	struct ttm_operation_ctx ctx = {
-		.interruptible = intr,
-	};
-	struct i915_refct_sgt *dst_rsgt;
-	int ret;
-
-	assert_object_held(dst);
-	assert_object_held(src);
-
-	/*
-	 * Sync for now. This will change with async moves.
-	 */
-	ret = ttm_bo_wait_ctx(dst_bo, &ctx);
-	if (!ret)
-		ret = ttm_bo_wait_ctx(src_bo, &ctx);
-	if (ret)
-		return ret;
-
-	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
-	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
-			dst_rsgt, allow_accel);
-
-	i915_refct_sgt_put(dst_rsgt);
-
-	return 0;
-}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index 074a7c08ff31..82cdabb542be 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -49,10 +49,6 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 			       resource_size_t page_size,
 			       unsigned int flags);
 
-int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
-			  struct drm_i915_gem_object *src,
-			  bool allow_accel, bool intr);
-
 /* Internal I915 TTM declarations and definitions below. */
 
 #define I915_PL_LMEM0 TTM_PL_PRIV
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index ef22d4ed66ad..f35b386c56ca 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -378,18 +378,10 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
 	return &work->fence;
 }
 
-/**
- * __i915_ttm_move - helper to perform TTM moves or clears.
- * @bo: The source buffer object.
- * @clear: Whether this is a clear operation.
- * @dst_mem: The destination ttm resource.
- * @dst_ttm: The destination ttm page vector.
- * @dst_rsgt: The destination refcounted sg-list.
- * @allow_accel: Whether to allow acceleration.
- */
-void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
-		     struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm,
-		     struct i915_refct_sgt *dst_rsgt, bool allow_accel)
+static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
+			    struct ttm_resource *dst_mem,
+			    struct ttm_tt *dst_ttm,
+			    struct i915_refct_sgt *dst_rsgt, bool allow_accel)
 {
 	struct i915_ttm_memcpy_work *copy_work = NULL;
 	struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
@@ -521,3 +513,50 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	i915_ttm_adjust_gem_after_move(obj);
 	return 0;
 }
+
+/**
+ * i915_gem_obj_copy_ttm - Copy the contents of one ttm-based gem object to
+ * another
+ * @dst: The destination object
+ * @src: The source object
+ * @allow_accel: Allow using the blitter. Otherwise TTM memcpy is used.
+ * @intr: Whether to perform waits interruptible:
+ *
+ * Note: The caller is responsible for assuring that the underlying
+ * TTM objects are populated if needed and locked.
+ *
+ * Return: Zero on success. Negative error code on error. If @intr == true,
+ * then it may return -ERESTARTSYS or -EINTR.
+ */
+int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
+			  struct drm_i915_gem_object *src,
+			  bool allow_accel, bool intr)
+{
+	struct ttm_buffer_object *dst_bo = i915_gem_to_ttm(dst);
+	struct ttm_buffer_object *src_bo = i915_gem_to_ttm(src);
+	struct ttm_operation_ctx ctx = {
+		.interruptible = intr,
+	};
+	struct i915_refct_sgt *dst_rsgt;
+	int ret;
+
+	assert_object_held(dst);
+	assert_object_held(src);
+
+	/*
+	 * Sync for now. This will change with async moves.
+	 */
+	ret = ttm_bo_wait_ctx(dst_bo, &ctx);
+	if (!ret)
+		ret = ttm_bo_wait_ctx(src_bo, &ctx);
+	if (ret)
+		return ret;
+
+	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
+	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
+			dst_rsgt, allow_accel);
+
+	i915_refct_sgt_put(dst_rsgt);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
index 75b87e752af2..d2e7f149e05c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
@@ -23,13 +23,11 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo);
 I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 							      bool work_allocation));
 
-/* Internal I915 TTM declarations and definitions below. */
+int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
+			  struct drm_i915_gem_object *src,
+			  bool allow_accel, bool intr);
 
-void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
-		     struct ttm_resource *dst_mem,
-		     struct ttm_tt *dst_ttm,
-		     struct i915_refct_sgt *dst_rsgt,
-		     bool allow_accel);
+/* Internal I915 TTM declarations and definitions below. */
 
 int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 		  struct ttm_operation_ctx *ctx,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
index 3b6d14b5c604..60d10ab55d1e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -12,6 +12,7 @@
 
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
+#include "gem/i915_gem_ttm_move.h"
 #include "gem/i915_gem_ttm_pm.h"
 
 /**
-- 
2.31.1


* [PATCH v3 4/6] drm/i915/ttm: Break refcounting loops at device region unref time
  2021-11-14 11:12 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-14 11:12   ` Thomas Hellström
  -1 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-14 11:12 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

There is an interesting refcounting loop:
struct intel_memory_region has a struct ttm_resource_manager,
ttm_resource_manager->move may hold a reference to i915_request,
i915_request may hold a reference to intel_context,
intel_context may hold a reference to drm_i915_gem_object,
drm_i915_gem_object may hold a reference to intel_memory_region.
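
Spelled out as a chain (illustration only, using the names from the
description above), the cycle looks like this:

  intel_memory_region
    -> ttm_resource_manager
      -> ttm_resource_manager->move (fence of an i915_request)
        -> i915_request
          -> intel_context
            -> drm_i915_gem_object
              -> intel_memory_region  (back to the start)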

Break this loop when we drop the device reference count on the
region by putting the region move fence.

Also hold off dropping the device reference count until all objects of
the region have been deleted, to avoid issues if device takedown
proceeds while the region is still present.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c     |  1 +
 drivers/gpu/drm/i915/gt/intel_region_lmem.c |  1 +
 drivers/gpu/drm/i915/intel_memory_region.c  |  5 +++-
 drivers/gpu/drm/i915/intel_memory_region.h  |  1 +
 drivers/gpu/drm/i915/intel_region_ttm.c     | 28 +++++++++++++++++++++
 drivers/gpu/drm/i915/intel_region_ttm.h     |  2 ++
 6 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 537a81445b90..a1df49378a0f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1044,6 +1044,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 
 static const struct intel_memory_region_ops ttm_system_region_ops = {
 	.init_object = __i915_gem_ttm_object_init,
+	.disable = intel_region_ttm_disable,
 };
 
 struct intel_memory_region *
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index aec838ecb2ef..956916fd21f8 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -108,6 +108,7 @@ region_lmem_init(struct intel_memory_region *mem)
 static const struct intel_memory_region_ops intel_region_lmem_ops = {
 	.init = region_lmem_init,
 	.release = region_lmem_release,
+	.disable = intel_region_ttm_disable,
 	.init_object = __i915_gem_ttm_object_init,
 };
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index e7f7e6627750..1f67d2b68c24 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -233,8 +233,11 @@ void intel_memory_regions_driver_release(struct drm_i915_private *i915)
 		struct intel_memory_region *region =
 			fetch_and_zero(&i915->mm.regions[i]);
 
-		if (region)
+		if (region) {
+			if (region->ops->disable)
+				region->ops->disable(region);
 			intel_memory_region_put(region);
+		}
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 3feae3353d33..9bb77eacd206 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -52,6 +52,7 @@ struct intel_memory_region_ops {
 
 	int (*init)(struct intel_memory_region *mem);
 	void (*release)(struct intel_memory_region *mem);
+	void (*disable)(struct intel_memory_region *mem);
 
 	int (*init_object)(struct intel_memory_region *mem,
 			   struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 2e901a27e259..4219d83a2b19 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -114,6 +114,34 @@ void intel_region_ttm_fini(struct intel_memory_region *mem)
 	mem->region_private = NULL;
 }
 
+/**
+ * intel_region_ttm_disable - A TTM region disable callback helper
+ * @mem: The memory region.
+ *
+ * A helper that ensures that nothing any longer references a region at
+ * device takedown. Breaks refcounting loops and waits for objects in the
+ * region to be deleted.
+ */
+void intel_region_ttm_disable(struct intel_memory_region *mem)
+{
+	struct ttm_resource_manager *man = mem->region_private;
+
+	/*
+	 * Put the region's move fences. This releases requests that
+	 * may hold on to contexts and vms that may hold on to buffer
+	 * objects that may have a refcount on the region. :/
+	 */
+	if (man)
+		ttm_resource_manager_cleanup(man);
+
+	/* Flush objects that may just have been freed */
+	i915_gem_flush_free_objects(mem->i915);
+
+	/* Wait until the only region reference left is our own. */
+	while (kref_read(&mem->kref) > 1)
+		msleep(20);
+}
+
 /**
  * intel_region_ttm_resource_to_rsgt -
  * Convert an opaque TTM resource manager resource to a refcounted sg_table.
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
index 7bbe2b46b504..197a8c179370 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.h
+++ b/drivers/gpu/drm/i915/intel_region_ttm.h
@@ -22,6 +22,8 @@ int intel_region_ttm_init(struct intel_memory_region *mem);
 
 void intel_region_ttm_fini(struct intel_memory_region *mem);
 
+void intel_region_ttm_disable(struct intel_memory_region *mem);
+
 struct i915_refct_sgt *
 intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
 				  struct ttm_resource *res);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Intel-gfx] [PATCH v3 4/6] drm/i915/ttm: Break refcounting loops at device region unref time
@ 2021-11-14 11:12   ` Thomas Hellström
  0 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-14 11:12 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

There is an interesting refcounting loop:
struct intel_memory_region has a struct ttm_resource_manager,
ttm_resource_manager->move may hold a reference to i915_request,
i915_request may hold a reference to intel_context,
intel_context may hold a reference to drm_i915_gem_object,
drm_i915_gem_object may hold a reference to intel_memory_region.

Break this loop when we drop the device reference count on the
region by putting the region move fence.

Also hold dropping the device reference count until all objects of
the region has been deleted, to avoid issues if proceeding with the
device takedown while the region is still present.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c     |  1 +
 drivers/gpu/drm/i915/gt/intel_region_lmem.c |  1 +
 drivers/gpu/drm/i915/intel_memory_region.c  |  5 +++-
 drivers/gpu/drm/i915/intel_memory_region.h  |  1 +
 drivers/gpu/drm/i915/intel_region_ttm.c     | 28 +++++++++++++++++++++
 drivers/gpu/drm/i915/intel_region_ttm.h     |  2 ++
 6 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 537a81445b90..a1df49378a0f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1044,6 +1044,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 
 static const struct intel_memory_region_ops ttm_system_region_ops = {
 	.init_object = __i915_gem_ttm_object_init,
+	.disable = intel_region_ttm_disable,
 };
 
 struct intel_memory_region *
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index aec838ecb2ef..956916fd21f8 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -108,6 +108,7 @@ region_lmem_init(struct intel_memory_region *mem)
 static const struct intel_memory_region_ops intel_region_lmem_ops = {
 	.init = region_lmem_init,
 	.release = region_lmem_release,
+	.disable = intel_region_ttm_disable,
 	.init_object = __i915_gem_ttm_object_init,
 };
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index e7f7e6627750..1f67d2b68c24 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -233,8 +233,11 @@ void intel_memory_regions_driver_release(struct drm_i915_private *i915)
 		struct intel_memory_region *region =
 			fetch_and_zero(&i915->mm.regions[i]);
 
-		if (region)
+		if (region) {
+			if (region->ops->disable)
+				region->ops->disable(region);
 			intel_memory_region_put(region);
+		}
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 3feae3353d33..9bb77eacd206 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -52,6 +52,7 @@ struct intel_memory_region_ops {
 
 	int (*init)(struct intel_memory_region *mem);
 	void (*release)(struct intel_memory_region *mem);
+	void (*disable)(struct intel_memory_region *mem);
 
 	int (*init_object)(struct intel_memory_region *mem,
 			   struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 2e901a27e259..4219d83a2b19 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -114,6 +114,34 @@ void intel_region_ttm_fini(struct intel_memory_region *mem)
 	mem->region_private = NULL;
 }
 
+/**
+ * intel_region_ttm_disable - A TTM region disable callback helper
+ * @mem: The memory region.
+ *
+ * A helper that ensures that nothing references the region any longer at
+ * device takedown. Breaks refcounting loops and waits for objects in the
+ * region to be deleted.
+ */
+void intel_region_ttm_disable(struct intel_memory_region *mem)
+{
+	struct ttm_resource_manager *man = mem->region_private;
+
+	/*
+	 * Put the region's move fences. This releases requests that
+	 * may hold on to contexts and vms that may hold on to buffer
+	 * objects that may have a refcount on the region. :/
+	 */
+	if (man)
+		ttm_resource_manager_cleanup(man);
+
+	/* Flush objects that may just have been freed */
+	i915_gem_flush_free_objects(mem->i915);
+
+	/* Wait until the only region reference left is our own. */
+	while (kref_read(&mem->kref) > 1)
+		msleep(20);
+}
+
 /**
  * intel_region_ttm_resource_to_rsgt -
  * Convert an opaque TTM resource manager resource to a refcounted sg_table.
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
index 7bbe2b46b504..197a8c179370 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.h
+++ b/drivers/gpu/drm/i915/intel_region_ttm.h
@@ -22,6 +22,8 @@ int intel_region_ttm_init(struct intel_memory_region *mem);
 
 void intel_region_ttm_fini(struct intel_memory_region *mem);
 
+void intel_region_ttm_disable(struct intel_memory_region *mem);
+
 struct i915_refct_sgt *
 intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
 				  struct ttm_resource *res);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v3 5/6] drm/i915/ttm: Implement asynchronous TTM moves
  2021-11-14 11:12 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-14 11:12   ` Thomas Hellström
  -1 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-14 11:12 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

Don't wait synchronously while migrating, but rather make the GPU blit
await the dependencies and add a moving fence to the object.

This also enables asynchronous VRAM management: on eviction, rather than
waiting for the moving fence to expire before freeing VRAM, the VRAM is
freed immediately and the fence is stored with the VRAM manager. The fence
is then handed out to newly allocated objects to await before clears and
swapins, or, for kernel objects, before setting up GPU vmas or mapping.

To collect dependencies before migrating, add a set of utilities that
coalesce these to a single dma_fence.
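
For illustration only (not part of the patch), a standalone sketch of the
coalescing rule: dependencies are keyed by fence context, and only the
latest point on each context's timeline needs to be kept, since fences on
the same context signal in order. The (context, seqno) pairs below stand
in for real struct dma_fence pointers, and the real code grows the array
or falls back to a synchronous wait when it runs out of space:

#include <stdio.h>

struct dep {
	unsigned long long context;
	unsigned int seqno;
};

static struct dep deps[8];
static unsigned int ndeps;

static void add_dependency(unsigned long long context, unsigned int seqno)
{
	unsigned int i;

	for (i = 0; i < ndeps; i++) {
		if (deps[i].context != context)
			continue;
		/* same timeline: keep only the later point */
		if (seqno > deps[i].seqno)
			deps[i].seqno = seqno;
		return;
	}

	if (ndeps == 8)
		return;	/* illustration only; see the fallback described above */

	deps[ndeps].context = context;
	deps[ndeps].seqno = seqno;
	ndeps++;
}

int main(void)
{
	unsigned int i;

	add_dependency(1, 10);
	add_dependency(2, 3);
	add_dependency(1, 12);	/* supersedes (1, 10) */

	/* what remains is what would be fed into a single dma_fence_array */
	for (i = 0; i < ndeps; i++)
		printf("context %llu, seqno %u\n", deps[i].context, deps[i].seqno);

	return 0;
}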

What is still missing for fully asynchronous operation is asynchronous vma
unbinding, which remains to be implemented.

This commit substantially reduces execution time in the gem_lmem_swapping
test.

v2:
- Make a couple of functions static.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c      |  10 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 329 +++++++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_wait.c     |   4 +-
 4 files changed, 318 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index a1df49378a0f..111a4282d779 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -326,6 +326,9 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
 {
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
 
+	if (!obj)
+		return false;
+
 	/*
 	 * EXTERNAL objects should never be swapped out by TTM, instead we need
 	 * to handle that ourselves. TTM will already skip such objects for us,
@@ -448,6 +451,10 @@ static int i915_ttm_shrinker_release_pages(struct drm_i915_gem_object *obj,
 	if (bo->ttm->page_flags & TTM_TT_FLAG_SWAPPED)
 		return 0;
 
+	ret = ttm_bo_wait_ctx(bo, &ctx);
+	if (ret)
+		return ret;
+
 	bo->ttm->page_flags |= TTM_TT_FLAG_SWAPPED;
 	ret = ttm_bo_validate(bo, &place, &ctx);
 	if (ret) {
@@ -549,6 +556,9 @@ static void i915_ttm_swap_notify(struct ttm_buffer_object *bo)
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
 	int ret = i915_ttm_move_notify(bo);
 
+	if (!obj)
+		return;
+
 	GEM_WARN_ON(ret);
 	GEM_WARN_ON(obj->ttm.cached_io_rsgt);
 	if (!ret && obj->mm.madv != I915_MADV_WILLNEED)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index 82cdabb542be..9d698ad00853 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -37,7 +37,7 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo);
 static inline struct drm_i915_gem_object *
 i915_ttm_to_gem(struct ttm_buffer_object *bo)
 {
-	if (GEM_WARN_ON(bo->destroy != i915_ttm_bo_destroy))
+	if (bo->destroy != i915_ttm_bo_destroy)
 		return NULL;
 
 	return container_of(bo, struct drm_i915_gem_object, __do_not_access);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index f35b386c56ca..ae2c49fc3500 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,8 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include <linux/dma-fence-array.h>
+
 #include <drm/ttm/ttm_bo_driver.h>
 
 #include "i915_drv.h"
@@ -41,6 +43,228 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 }
 #endif
 
+/**
+ * DOC: Set of utilities to dynamically collect dependencies and
+ * eventually coalesce them into a single fence which is fed into
+ * the migration code. That single fence is, in the case of dependencies
+ * from multiple contexts, a struct dma_fence_array, since the
+ * i915 request code can break that up and await the individual
+ * fences.
+ *
+ * While collecting the individual dependencies, we store the refcounted
+ * struct dma_fence pointers in a realloc-type-managed pointer array, since
+ * that can be easily fed into a dma_fence_array. Other options are
+ * available, like for example an xarray for similarity with drm/sched.
+ * Can be changed easily if needed.
+ *
+ * We might want to break this out into a separate file as a utility.
+ */
+
+#define I915_DEPS_MIN_ALLOC_CHUNK 8U
+
+/**
+ * struct i915_deps - Collect dependencies into a single dma-fence
+ * @single: Storage for pointer if the collection is a single fence.
+ * @fences: Allocated array of fence pointers if more than a single fence;
+ * otherwise points to the address of @single.
+ * @num_deps: Current number of dependency fences.
+ * @fences_size: Size of the @fences array in number of pointers.
+ * @gfp: Allocation mode.
+ */
+struct i915_deps {
+	struct dma_fence *single;
+	struct dma_fence **fences;
+	unsigned int num_deps;
+	unsigned int fences_size;
+	gfp_t gfp;
+};
+
+static void i915_deps_reset_fences(struct i915_deps *deps)
+{
+	if (deps->fences != &deps->single)
+		kfree(deps->fences);
+	deps->num_deps = 0;
+	deps->fences_size = 1;
+	deps->fences = &deps->single;
+}
+
+static void i915_deps_init(struct i915_deps *deps, gfp_t gfp)
+{
+	deps->fences = NULL;
+	deps->gfp = gfp;
+	i915_deps_reset_fences(deps);
+}
+
+static void i915_deps_fini(struct i915_deps *deps)
+{
+	unsigned int i;
+
+	for (i = 0; i < deps->num_deps; ++i)
+		dma_fence_put(deps->fences[i]);
+
+	if (deps->fences != &deps->single)
+		kfree(deps->fences);
+}
+
+static int i915_deps_grow(struct i915_deps *deps, struct dma_fence *fence,
+			  const struct ttm_operation_ctx *ctx)
+{
+	int ret;
+
+	if (deps->num_deps >= deps->fences_size) {
+		unsigned int new_size = 2 * deps->fences_size;
+		struct dma_fence **new_fences;
+
+		new_size = max(new_size, I915_DEPS_MIN_ALLOC_CHUNK);
+		new_fences = kmalloc_array(new_size, sizeof(*new_fences), deps->gfp);
+		if (!new_fences)
+			goto sync;
+
+		memcpy(new_fences, deps->fences,
+		       deps->fences_size * sizeof(*new_fences));
+		swap(new_fences, deps->fences);
+		if (new_fences != &deps->single)
+			kfree(new_fences);
+		deps->fences_size = new_size;
+	}
+	deps->fences[deps->num_deps++] = dma_fence_get(fence);
+	return 0;
+
+sync:
+	if (ctx->no_wait_gpu) {
+		ret = -EBUSY;
+		goto unref;
+	}
+
+	ret = dma_fence_wait(fence, ctx->interruptible);
+	if (ret)
+		goto unref;
+
+	ret = fence->error;
+	if (ret)
+		goto unref;
+
+	return 0;
+
+unref:
+	i915_deps_fini(deps);
+	return ret;
+}
+
+static int i915_deps_sync(struct i915_deps *deps,
+			  const struct ttm_operation_ctx *ctx)
+{
+	unsigned int i;
+	int ret = 0;
+	struct dma_fence **fences = deps->fences;
+
+	for (i = 0; i < deps->num_deps; ++i, ++fences) {
+		if (ctx->no_wait_gpu) {
+			ret = -EBUSY;
+			goto unref;
+		}
+
+		ret = dma_fence_wait(*fences, ctx->interruptible);
+		if (ret)
+			goto unref;
+
+		ret = (*fences)->error;
+		if (ret)
+			goto unref;
+	}
+
+	i915_deps_fini(deps);
+	return 0;
+
+unref:
+	i915_deps_fini(deps);
+	return ret;
+}
+
+static int i915_deps_add_dependency(struct i915_deps *deps,
+				    struct dma_fence *fence,
+				    const struct ttm_operation_ctx *ctx)
+{
+	unsigned int i;
+	int ret;
+
+	if (!fence)
+		return 0;
+
+	if (dma_fence_is_signaled(fence)) {
+		ret = fence->error;
+		if (ret)
+			i915_deps_fini(deps);
+		return ret;
+	}
+
+	for (i = 0; i < deps->num_deps; ++i) {
+		struct dma_fence *entry = deps->fences[i];
+
+		if (!entry->context || entry->context != fence->context)
+			continue;
+
+		if (dma_fence_is_later(fence, entry)) {
+			dma_fence_put(entry);
+			deps->fences[i] = dma_fence_get(fence);
+		}
+
+		return 0;
+	}
+
+	return i915_deps_grow(deps, fence, ctx);
+}
+
+static struct dma_fence *i915_deps_to_fence(struct i915_deps *deps,
+					    const struct ttm_operation_ctx *ctx)
+{
+	struct dma_fence_array *array;
+
+	if (deps->num_deps == 0)
+		return NULL;
+
+	if (deps->num_deps == 1) {
+		deps->num_deps = 0;
+		return deps->fences[0];
+	}
+
+	/*
+	 * TODO: Alter the allocation mode here to not try too hard to
+	 * make things async.
+	 */
+	array = dma_fence_array_create(deps->num_deps, deps->fences, 0, 0,
+				       false);
+	if (!array)
+		return ERR_PTR(i915_deps_sync(deps, ctx));
+
+	deps->fences = NULL;
+	i915_deps_reset_fences(deps);
+
+	return &array->base;
+}
+
+static int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv,
+			      bool all, const bool no_excl,
+			      const struct ttm_operation_ctx *ctx)
+{
+	struct dma_resv_iter iter;
+	struct dma_fence *fence;
+
+	dma_resv_assert_held(resv);
+	dma_resv_for_each_fence(&iter, resv, all, fence) {
+		int ret;
+
+		if (no_excl && !iter.index)
+			continue;
+
+		ret = i915_deps_add_dependency(deps, fence, ctx);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 		     struct ttm_tt *ttm)
@@ -156,7 +380,8 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 					     bool clear,
 					     struct ttm_resource *dst_mem,
 					     struct ttm_tt *dst_ttm,
-					     struct sg_table *dst_st)
+					     struct sg_table *dst_st,
+					     struct dma_fence *dep)
 {
 	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
 						     bdev);
@@ -180,7 +405,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 			return ERR_PTR(-EINVAL);
 
 		intel_engine_pm_get(i915->gt.migrate.context->engine);
-		ret = intel_context_migrate_clear(i915->gt.migrate.context, NULL,
+		ret = intel_context_migrate_clear(i915->gt.migrate.context, dep,
 						  dst_st->sgl, dst_level,
 						  i915_ttm_gtt_binds_lmem(dst_mem),
 						  0, &rq);
@@ -194,7 +419,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
 		intel_engine_pm_get(i915->gt.migrate.context->engine);
 		ret = intel_context_migrate_copy(i915->gt.migrate.context,
-						 NULL, src_rsgt->table.sgl,
+						 dep, src_rsgt->table.sgl,
 						 src_level,
 						 i915_ttm_gtt_binds_lmem(bo->resource),
 						 dst_st->sgl, dst_level,
@@ -378,10 +603,11 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
 	return &work->fence;
 }
 
-static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
-			    struct ttm_resource *dst_mem,
-			    struct ttm_tt *dst_ttm,
-			    struct i915_refct_sgt *dst_rsgt, bool allow_accel)
+static struct dma_fence *
+__i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
+		struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm,
+		struct i915_refct_sgt *dst_rsgt, bool allow_accel,
+		struct dma_fence *move_dep)
 {
 	struct i915_ttm_memcpy_work *copy_work = NULL;
 	struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
@@ -389,7 +615,7 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 
 	if (allow_accel) {
 		fence = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm,
-					    &dst_rsgt->table);
+					    &dst_rsgt->table, move_dep);
 
 		/*
 		 * We only need to intercept the error when moving to lmem.
@@ -423,6 +649,11 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 
 		if (!IS_ERR(fence))
 			goto out;
+	} else if (move_dep) {
+		int err = dma_fence_wait(move_dep, true);
+
+		if (err)
+			return ERR_PTR(err);
 	}
 
 	/* Error intercept failed or no accelerated migration to start with */
@@ -433,16 +664,35 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 	i915_ttm_memcpy_release(arg);
 	kfree(copy_work);
 
-	return;
+	return NULL;
 out:
-	/* Sync here for now, forward the fence to caller when fully async. */
-	if (fence) {
-		dma_fence_wait(fence, false);
-		dma_fence_put(fence);
-	} else if (copy_work) {
+	if (!fence && copy_work) {
 		i915_ttm_memcpy_release(arg);
 		kfree(copy_work);
 	}
+
+	return fence;
+}
+
+static struct dma_fence *prev_fence(struct ttm_buffer_object *bo,
+				    struct ttm_operation_ctx *ctx)
+{
+	struct i915_deps deps;
+	int ret;
+
+	/*
+	 * Instead of trying hard with GFP_KERNEL to allocate memory,
+	 * the dependency collection will just sync if it doesn't
+	 * succeed.
+	 */
+	i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+	ret = i915_deps_add_dependency(&deps, bo->moving, ctx);
+	if (!ret)
+		ret = i915_deps_add_resv(&deps, bo->base.resv, false, false, ctx);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return i915_deps_to_fence(&deps, ctx);
 }
 
 /**
@@ -462,16 +712,12 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
 	struct ttm_resource_manager *dst_man =
 		ttm_manager_type(bo->bdev, dst_mem->mem_type);
+	struct dma_fence *migration_fence = NULL;
 	struct ttm_tt *ttm = bo->ttm;
 	struct i915_refct_sgt *dst_rsgt;
 	bool clear;
 	int ret;
 
-	/* Sync for now. We could do the actual copy async. */
-	ret = ttm_bo_wait_ctx(bo, ctx);
-	if (ret)
-		return ret;
-
 	ret = i915_ttm_move_notify(bo);
 	if (ret)
 		return ret;
@@ -494,10 +740,37 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 		return PTR_ERR(dst_rsgt);
 
 	clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
-	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)))
-		__i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_rsgt, true);
+	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+		struct dma_fence *dep = prev_fence(bo, ctx);
+
+		if (IS_ERR(dep)) {
+			i915_refct_sgt_put(dst_rsgt);
+			return PTR_ERR(dep);
+		}
+
+		migration_fence = __i915_ttm_move(bo, clear, dst_mem, bo->ttm,
+						  dst_rsgt, true, dep);
+		dma_fence_put(dep);
+	}
+
+	/* We can possibly get an -ERESTARTSYS here */
+	if (IS_ERR(migration_fence)) {
+		i915_refct_sgt_put(dst_rsgt);
+		return PTR_ERR(migration_fence);
+	}
+
+	if (migration_fence) {
+		ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
+						true, dst_mem);
+		if (ret) {
+			dma_fence_wait(migration_fence, false);
+			ttm_bo_move_sync_cleanup(bo, dst_mem);
+		}
+		dma_fence_put(migration_fence);
+	} else {
+		ttm_bo_move_sync_cleanup(bo, dst_mem);
+	}
 
-	ttm_bo_move_sync_cleanup(bo, dst_mem);
 	i915_ttm_adjust_domains_after_move(obj);
 	i915_ttm_free_cached_io_rsgt(obj);
 
@@ -538,6 +811,7 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 		.interruptible = intr,
 	};
 	struct i915_refct_sgt *dst_rsgt;
+	struct dma_fence *copy_fence;
 	int ret;
 
 	assert_object_held(dst);
@@ -553,10 +827,17 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 		return ret;
 
 	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
-	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
-			dst_rsgt, allow_accel);
+	copy_fence = __i915_ttm_move(src_bo, false, dst_bo->resource,
+				     dst_bo->ttm, dst_rsgt, allow_accel, NULL);
 
 	i915_refct_sgt_put(dst_rsgt);
+	if (IS_ERR(copy_fence))
+		return PTR_ERR(copy_fence);
+
+	if (copy_fence) {
+		dma_fence_wait(copy_fence, false);
+		dma_fence_put(copy_fence);
+	}
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index f909aaa09d9c..bae65796a6cc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -306,6 +306,6 @@ int i915_gem_object_wait_migration(struct drm_i915_gem_object *obj,
 				   unsigned int flags)
 {
 	might_sleep();
-	/* NOP for now. */
-	return 0;
+
+	return i915_gem_object_wait_moving_fence(obj, !!(flags & I915_WAIT_INTERRUPTIBLE));
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v3 6/6] drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous
  2021-11-14 11:12 ` [Intel-gfx] " Thomas Hellström
@ 2021-11-14 11:12   ` Thomas Hellström
  -1 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-14 11:12 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

Update the copy function i915_gem_obj_copy_ttm() to be asynchronous for
future users, and update the only current user to sync the objects as
needed after calling it.
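
The caller-side pattern this implies, sketched from the
i915_gem_ttm_pm.c hunk below (identifiers taken from that hunk, error
handling elided):

	err = i915_gem_obj_copy_ttm(backup, obj, pm_apply->allow_gpu, false);
	GEM_WARN_ON(err);
	/* The copy is only queued; wait on the destination bo before using it. */
	ttm_bo_wait_ctx(backup_bo, &ctx);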

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 40 ++++++++++++++------
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c   |  2 +
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index ae2c49fc3500..53ed3972c7be 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -811,33 +811,49 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 		.interruptible = intr,
 	};
 	struct i915_refct_sgt *dst_rsgt;
-	struct dma_fence *copy_fence;
-	int ret;
+	struct dma_fence *copy_fence, *dep_fence;
+	struct i915_deps deps;
+	int ret, shared_err;
 
 	assert_object_held(dst);
 	assert_object_held(src);
+	i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
 
 	/*
-	 * Sync for now. This will change with async moves.
+	 * We plan to add a shared fence only for the source. If that
+	 * fails, we await all source fences before commencing
+	 * the copy instead of only the exclusive.
 	 */
-	ret = ttm_bo_wait_ctx(dst_bo, &ctx);
+	shared_err = dma_resv_reserve_shared(src_bo->base.resv, 1);
+	ret = i915_deps_add_resv(&deps, dst_bo->base.resv, true, false, &ctx);
 	if (!ret)
-		ret = ttm_bo_wait_ctx(src_bo, &ctx);
+		ret = i915_deps_add_resv(&deps, src_bo->base.resv,
+					 !!shared_err, false, &ctx);
 	if (ret)
 		return ret;
 
+	dep_fence = i915_deps_to_fence(&deps, &ctx);
+	if (IS_ERR(dep_fence))
+		return PTR_ERR(dep_fence);
+
 	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
 	copy_fence = __i915_ttm_move(src_bo, false, dst_bo->resource,
-				     dst_bo->ttm, dst_rsgt, allow_accel, NULL);
+				     dst_bo->ttm, dst_rsgt, allow_accel,
+				     dep_fence);
 
 	i915_refct_sgt_put(dst_rsgt);
-	if (IS_ERR(copy_fence))
-		return PTR_ERR(copy_fence);
+	if (IS_ERR_OR_NULL(copy_fence))
+		return PTR_ERR_OR_ZERO(copy_fence);
 
-	if (copy_fence) {
-		dma_fence_wait(copy_fence, false);
-		dma_fence_put(copy_fence);
-	}
+	dma_resv_add_excl_fence(dst_bo->base.resv, copy_fence);
+
+	/* If we failed to reserve a shared slot, add an exclusive fence */
+	if (shared_err)
+		dma_resv_add_excl_fence(src_bo->base.resv, copy_fence);
+	else
+		dma_resv_add_shared_fence(src_bo->base.resv, copy_fence);
+
+	dma_fence_put(copy_fence);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
index 60d10ab55d1e..9aad84059d56 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -80,6 +80,7 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region *apply,
 
 	err = i915_gem_obj_copy_ttm(backup, obj, pm_apply->allow_gpu, false);
 	GEM_WARN_ON(err);
+	ttm_bo_wait_ctx(backup_bo, &ctx);
 
 	obj->ttm.backup = backup;
 	return 0;
@@ -170,6 +171,7 @@ static int i915_ttm_restore(struct i915_gem_apply_to_region *apply,
 		err = i915_gem_obj_copy_ttm(obj, backup, pm_apply->allow_gpu,
 					    false);
 		GEM_WARN_ON(err);
+		ttm_bo_wait_ctx(backup_bo, &ctx);
 
 		obj->ttm.backup = NULL;
 		err = 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/ttm: Async migration (rev4)
  2021-11-14 11:12 ` [Intel-gfx] " Thomas Hellström
                   ` (6 preceding siblings ...)
  (?)
@ 2021-11-14 11:25 ` Patchwork
  -1 siblings, 0 replies; 40+ messages in thread
From: Patchwork @ 2021-11-14 11:25 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ttm: Async migration (rev4)
URL   : https://patchwork.freedesktop.org/series/96798/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
-
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:28:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:28:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:28:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:33:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:33:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:51:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:51:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:51:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:57:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:57:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_reset.c:1399:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block
+drivers/gpu/drm/i915/i915_perf.c:1442:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1496:15: warning: memset with byte count of 16777216
+./include/asm-generic/bitops/find.h:112:45: warning: shift count is negative (-262080)
+./include/asm-generic/bitops/find.h:32:31: warning: shift count is negative (-262080)
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'gen6_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'gen6_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'gen6_write8' - different lock contexts for basic block



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Intel-gfx] ✗ Fi.CI.DOCS: warning for drm/i915/ttm: Async migration (rev4)
  2021-11-14 11:12 ` [Intel-gfx] " Thomas Hellström
                   ` (7 preceding siblings ...)
  (?)
@ 2021-11-14 11:28 ` Patchwork
  -1 siblings, 0 replies; 40+ messages in thread
From: Patchwork @ 2021-11-14 11:28 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ttm: Async migration (rev4)
URL   : https://patchwork.freedesktop.org/series/96798/
State : warning

== Summary ==

$ make htmldocs 2>&1 > /dev/null | grep i915
./drivers/gpu/drm/i915/display/intel_fbc.c:635: warning: Excess function parameter 'i915' description in 'intel_fbc_is_active'
./drivers/gpu/drm/i915/display/intel_fbc.c:1638: warning: Excess function parameter 'i915' description in 'intel_fbc_handle_fifo_underrun_irq'
./drivers/gpu/drm/i915/display/intel_fbc.c:635: warning: Function parameter or member 'fbc' not described in 'intel_fbc_is_active'
./drivers/gpu/drm/i915/display/intel_fbc.c:635: warning: Excess function parameter 'i915' description in 'intel_fbc_is_active'
./drivers/gpu/drm/i915/display/intel_fbc.c:1638: warning: Function parameter or member 'fbc' not described in 'intel_fbc_handle_fifo_underrun_irq'
./drivers/gpu/drm/i915/display/intel_fbc.c:1638: warning: Excess function parameter 'i915' description in 'intel_fbc_handle_fifo_underrun_irq'



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/ttm: Async migration (rev4)
  2021-11-14 11:12 ` [Intel-gfx] " Thomas Hellström
                   ` (8 preceding siblings ...)
  (?)
@ 2021-11-14 11:52 ` Patchwork
  -1 siblings, 0 replies; 40+ messages in thread
From: Patchwork @ 2021-11-14 11:52 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 6006 bytes --]

== Series Details ==

Series: drm/i915/ttm: Async migration (rev4)
URL   : https://patchwork.freedesktop.org/series/96798/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10878 -> Patchwork_21582
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/index.html

Participating hosts (29 -> 28)
------------------------------

  Additional (3): fi-kbl-soraka fi-elk-e7500 fi-pnv-d510 
  Missing    (4): fi-bsw-cyan fi-bsw-nick bat-dg1-6 bat-dg1-5 

Known issues
------------

  Here are the changes found in Patchwork_21582 that come from known issues:

### CI changes ###

#### Issues hit ####

  * boot:
    - fi-skl-6700k2:      [PASS][1] -> [FAIL][2] ([i915#4467])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/fi-skl-6700k2/boot.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-skl-6700k2/boot.html

  

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@cs-multi-fence:
    - fi-blb-e6850:       NOTRUN -> [SKIP][3] ([fdo#109271]) +17 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-blb-e6850/igt@amdgpu/amd_basic@cs-multi-fence.html

  * igt@gem_exec_fence@basic-busy@bcs0:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][4] ([fdo#109271]) +8 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-kbl-soraka/igt@gem_exec_fence@basic-busy@bcs0.html

  * igt@gem_huc_copy@huc-copy:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#2190])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-kbl-soraka/igt@gem_huc_copy@huc-copy.html

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-cfl-guc:         [PASS][6] -> [DMESG-FAIL][7] ([i915#541])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/fi-cfl-guc/igt@i915_selftest@live@gt_heartbeat.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-cfl-guc/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@gt_pm:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-FAIL][8] ([i915#1886] / [i915#2291])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][9] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-kbl-soraka/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][10] ([fdo#109271] / [i915#533])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-kbl-soraka/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@prime_vgem@basic-userptr:
    - fi-pnv-d510:        NOTRUN -> [SKIP][11] ([fdo#109271]) +53 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-pnv-d510/igt@prime_vgem@basic-userptr.html

  * igt@runner@aborted:
    - fi-bdw-5557u:       NOTRUN -> [FAIL][12] ([i915#1602] / [i915#2426] / [i915#4312])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-bdw-5557u/igt@runner@aborted.html
    - fi-elk-e7500:       NOTRUN -> [FAIL][13] ([i915#2426])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-elk-e7500/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@requests:
    - fi-blb-e6850:       [DMESG-FAIL][14] ([i915#4528]) -> [PASS][15]
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/fi-blb-e6850/igt@i915_selftest@live@requests.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-blb-e6850/igt@i915_selftest@live@requests.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cml-u2:          [DMESG-WARN][16] ([i915#4269]) -> [PASS][17]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1602]: https://gitlab.freedesktop.org/drm/intel/issues/1602
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2291]: https://gitlab.freedesktop.org/drm/intel/issues/2291
  [i915#2426]: https://gitlab.freedesktop.org/drm/intel/issues/2426
  [i915#4269]: https://gitlab.freedesktop.org/drm/intel/issues/4269
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4467]: https://gitlab.freedesktop.org/drm/intel/issues/4467
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#541]: https://gitlab.freedesktop.org/drm/intel/issues/541


Build changes
-------------

  * Linux: CI_DRM_10878 -> Patchwork_21582

  CI-20190529: 20190529
  CI_DRM_10878: 9fccd12cfac1c863fa46d4d17c2d8ac25a44b190 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6280: 246bfd31dba6bf184b26b170d91d72c90a54be6b @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_21582: 7309e16f38a9afe9c808e6cd2340727d4c193f0c @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

7309e16f38a9 drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous
ecfa3453e635 drm/i915/ttm: Implement asynchronous TTM moves
5ec2be55b932 drm/i915/ttm: Break refcounting loops at device region unref time
0d37a8145978 drm/i915/ttm: Move the i915_gem_obj_copy_ttm() function
4e7ae2be8fdb drm/i915: Add support for asynchronous moving fence waiting
6ad46350db01 drm/i915: Add functions to set/get moving fence

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/index.html

[-- Attachment #2: Type: text/html, Size: 7348 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/ttm: Async migration (rev4)
  2021-11-14 11:12 ` [Intel-gfx] " Thomas Hellström
                   ` (9 preceding siblings ...)
  (?)
@ 2021-11-14 13:32 ` Patchwork
  -1 siblings, 0 replies; 40+ messages in thread
From: Patchwork @ 2021-11-14 13:32 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 30259 bytes --]

== Series Details ==

Series: drm/i915/ttm: Async migration (rev4)
URL   : https://patchwork.freedesktop.org/series/96798/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10878_full -> Patchwork_21582_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_21582_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21582_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (10 -> 11)
------------------------------

  Additional (1): shard-rkl 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_21582_full:

### IGT changes ###

#### Possible regressions ####

  * igt@core_setmaster@master-drop-set-shared-fd:
    - shard-iclb:         [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-iclb4/igt@core_setmaster@master-drop-set-shared-fd.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-iclb4/igt@core_setmaster@master-drop-set-shared-fd.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@kms_big_fb@yf-tiled-8bpp-rotate-0:
    - {shard-rkl}:        NOTRUN -> [SKIP][3] +2 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-rkl-6/igt@kms_big_fb@yf-tiled-8bpp-rotate-0.html

  * igt@kms_big_fb@yf-tiled-8bpp-rotate-90:
    - {shard-rkl}:        NOTRUN -> ([SKIP][4], [SKIP][5]) ([i915#1845]) +1 similar issue
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-rkl-6/igt@kms_big_fb@yf-tiled-8bpp-rotate-90.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-rkl-4/igt@kms_big_fb@yf-tiled-8bpp-rotate-90.html

  
Known issues
------------

  Here are the changes found in Patchwork_21582_full that come from known issues:

### CI changes ###

#### Possible fixes ####

  * boot:
    - shard-glk:          ([PASS][6], [PASS][7], [PASS][8], [PASS][9], [PASS][10], [PASS][11], [PASS][12], [PASS][13], [PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], [PASS][19], [PASS][20], [PASS][21], [PASS][22], [PASS][23], [PASS][24], [FAIL][25], [PASS][26], [PASS][27], [PASS][28], [PASS][29], [PASS][30]) ([i915#4392]) -> ([PASS][31], [PASS][32], [PASS][33], [PASS][34], [PASS][35], [PASS][36], [PASS][37], [PASS][38], [PASS][39], [PASS][40], [PASS][41], [PASS][42], [PASS][43], [PASS][44], [PASS][45], [PASS][46], [PASS][47], [PASS][48], [PASS][49], [PASS][50], [PASS][51], [PASS][52], [PASS][53], [PASS][54], [PASS][55])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk9/boot.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk9/boot.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk8/boot.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk8/boot.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk8/boot.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk7/boot.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk7/boot.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk7/boot.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk6/boot.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk6/boot.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk6/boot.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk5/boot.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk5/boot.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk5/boot.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk4/boot.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk4/boot.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk4/boot.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk3/boot.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk3/boot.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk3/boot.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk3/boot.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk2/boot.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk2/boot.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk1/boot.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk1/boot.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk6/boot.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk7/boot.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk7/boot.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk8/boot.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk8/boot.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk8/boot.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk9/boot.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk9/boot.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk9/boot.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk1/boot.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk1/boot.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk1/boot.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk2/boot.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk2/boot.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk2/boot.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk3/boot.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk3/boot.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk3/boot.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk4/boot.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk4/boot.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk4/boot.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk5/boot.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk5/boot.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk5/boot.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk6/boot.html

  

### IGT changes ###

#### Issues hit ####

  * igt@gem_create@create-massive:
    - shard-tglb:         NOTRUN -> [DMESG-WARN][56] ([i915#3002])
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb5/igt@gem_create@create-massive.html

  * igt@gem_ctx_isolation@preservation-s3@rcs0:
    - shard-apl:          NOTRUN -> [DMESG-WARN][57] ([i915#180])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-apl3/igt@gem_ctx_isolation@preservation-s3@rcs0.html

  * igt@gem_ctx_param@set-priority-not-supported:
    - shard-tglb:         NOTRUN -> [SKIP][58] ([fdo#109314])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb2/igt@gem_ctx_param@set-priority-not-supported.html

  * igt@gem_ctx_sseu@mmap-args:
    - shard-tglb:         NOTRUN -> [SKIP][59] ([i915#280])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb5/igt@gem_ctx_sseu@mmap-args.html

  * igt@gem_exec_balancer@parallel-keep-submit-fence:
    - shard-tglb:         NOTRUN -> [SKIP][60] ([i915#4525])
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb5/igt@gem_exec_balancer@parallel-keep-submit-fence.html

  * igt@gem_exec_capture@pi@vcs0:
    - shard-skl:          NOTRUN -> [INCOMPLETE][61] ([i915#2369])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl7/igt@gem_exec_capture@pi@vcs0.html

  * igt@gem_exec_capture@pi@vecs0:
    - shard-tglb:         [PASS][62] -> [INCOMPLETE][63] ([i915#2369] / [i915#3371])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-tglb3/igt@gem_exec_capture@pi@vecs0.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb3/igt@gem_exec_capture@pi@vecs0.html

  * igt@gem_exec_fair@basic-none@vcs0:
    - shard-tglb:         NOTRUN -> [FAIL][64] ([i915#2842]) +5 similar issues
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb2/igt@gem_exec_fair@basic-none@vcs0.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-glk:          [PASS][65] -> [FAIL][66] ([i915#2842])
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk5/igt@gem_exec_fair@basic-pace-solo@rcs0.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk4/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_exec_fair@basic-pace@bcs0:
    - shard-tglb:         [PASS][67] -> [FAIL][68] ([i915#2842]) +1 similar issue
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-tglb3/igt@gem_exec_fair@basic-pace@bcs0.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb8/igt@gem_exec_fair@basic-pace@bcs0.html

  * igt@gem_exec_fair@basic-pace@vcs0:
    - shard-kbl:          [PASS][69] -> [FAIL][70] ([i915#2842])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-kbl2/igt@gem_exec_fair@basic-pace@vcs0.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-kbl2/igt@gem_exec_fair@basic-pace@vcs0.html

  * igt@gem_exec_flush@basic-batch-kernel-default-cmd:
    - shard-tglb:         NOTRUN -> [SKIP][71] ([fdo#109313])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb2/igt@gem_exec_flush@basic-batch-kernel-default-cmd.html

  * igt@gem_exec_params@secure-non-root:
    - shard-tglb:         NOTRUN -> [SKIP][72] ([fdo#112283])
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@gem_exec_params@secure-non-root.html

  * igt@gem_pread@exhaustion:
    - shard-tglb:         NOTRUN -> [WARN][73] ([i915#2658])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb2/igt@gem_pread@exhaustion.html

  * igt@gem_pwrite@basic-exhaustion:
    - shard-apl:          NOTRUN -> [WARN][74] ([i915#2658])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-apl1/igt@gem_pwrite@basic-exhaustion.html

  * igt@gem_pxp@verify-pxp-key-change-after-suspend-resume:
    - shard-tglb:         NOTRUN -> [SKIP][75] ([i915#4270]) +2 similar issues
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@gem_pxp@verify-pxp-key-change-after-suspend-resume.html

  * igt@gem_userptr_blits@unsync-unmap-cycles:
    - shard-tglb:         NOTRUN -> [SKIP][76] ([i915#3297]) +1 similar issue
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb2/igt@gem_userptr_blits@unsync-unmap-cycles.html

  * igt@gen3_render_linear_blits:
    - shard-tglb:         NOTRUN -> [SKIP][77] ([fdo#109289]) +2 similar issues
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb5/igt@gen3_render_linear_blits.html

  * igt@gen9_exec_parse@basic-rejected:
    - shard-tglb:         NOTRUN -> [SKIP][78] ([i915#2856]) +1 similar issue
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb3/igt@gen9_exec_parse@basic-rejected.html

  * igt@i915_pm_backlight@fade_with_suspend:
    - shard-tglb:         [PASS][79] -> [INCOMPLETE][80] ([i915#456])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-tglb2/igt@i915_pm_backlight@fade_with_suspend.html
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb7/igt@i915_pm_backlight@fade_with_suspend.html

  * igt@i915_pm_dc@dc6-dpms:
    - shard-tglb:         NOTRUN -> [FAIL][81] ([i915#454])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_rpm@modeset-non-lpsp-stress:
    - shard-tglb:         NOTRUN -> [SKIP][82] ([fdo#111644] / [i915#1397] / [i915#2411]) +1 similar issue
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@i915_pm_rpm@modeset-non-lpsp-stress.html

  * igt@i915_pm_rpm@modeset-pc8-residency-stress:
    - shard-tglb:         NOTRUN -> [SKIP][83] ([fdo#109506] / [i915#2411])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb2/igt@i915_pm_rpm@modeset-pc8-residency-stress.html

  * igt@i915_selftest@mock@requests:
    - shard-skl:          [PASS][84] -> [INCOMPLETE][85] ([i915#198])
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-skl1/igt@i915_selftest@mock@requests.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl10/igt@i915_selftest@mock@requests.html

  * igt@i915_suspend@fence-restore-tiled2untiled:
    - shard-tglb:         [PASS][86] -> [INCOMPLETE][87] ([i915#456] / [i915#750])
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-tglb1/igt@i915_suspend@fence-restore-tiled2untiled.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb7/igt@i915_suspend@fence-restore-tiled2untiled.html

  * igt@kms_addfb_basic@invalid-smem-bo-on-discrete:
    - shard-tglb:         NOTRUN -> [SKIP][88] ([i915#3826])
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@kms_addfb_basic@invalid-smem-bo-on-discrete.html

  * igt@kms_big_fb@linear-16bpp-rotate-270:
    - shard-tglb:         NOTRUN -> [SKIP][89] ([fdo#111614]) +4 similar issues
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb2/igt@kms_big_fb@linear-16bpp-rotate-270.html

  * igt@kms_big_fb@yf-tiled-addfb-size-overflow:
    - shard-tglb:         NOTRUN -> [SKIP][90] ([fdo#111615]) +2 similar issues
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb2/igt@kms_big_fb@yf-tiled-addfb-size-overflow.html

  * igt@kms_big_joiner@basic:
    - shard-tglb:         NOTRUN -> [SKIP][91] ([i915#2705])
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@kms_big_joiner@basic.html

  * igt@kms_ccs@pipe-a-bad-rotation-90-y_tiled_gen12_rc_ccs_cc:
    - shard-skl:          NOTRUN -> [SKIP][92] ([fdo#109271] / [i915#3886]) +1 similar issue
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl7/igt@kms_ccs@pipe-a-bad-rotation-90-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_mc_ccs:
    - shard-glk:          NOTRUN -> [SKIP][93] ([fdo#109271] / [i915#3886]) +2 similar issues
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk8/igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-b-bad-rotation-90-y_tiled_gen12_mc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][94] ([i915#3689] / [i915#3886]) +1 similar issue
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb3/igt@kms_ccs@pipe-b-bad-rotation-90-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-c-ccs-on-another-bo-y_tiled_gen12_rc_ccs_cc:
    - shard-apl:          NOTRUN -> [SKIP][95] ([fdo#109271] / [i915#3886]) +3 similar issues
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-apl1/igt@kms_ccs@pipe-c-ccs-on-another-bo-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-d-bad-rotation-90-yf_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][96] ([i915#3689]) +12 similar issues
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb5/igt@kms_ccs@pipe-d-bad-rotation-90-yf_tiled_ccs.html

  * igt@kms_chamelium@vga-frame-dump:
    - shard-skl:          NOTRUN -> [SKIP][97] ([fdo#109271] / [fdo#111827]) +6 similar issues
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl7/igt@kms_chamelium@vga-frame-dump.html

  * igt@kms_color_chamelium@pipe-b-ctm-blue-to-red:
    - shard-apl:          NOTRUN -> [SKIP][98] ([fdo#109271] / [fdo#111827]) +7 similar issues
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-apl3/igt@kms_color_chamelium@pipe-b-ctm-blue-to-red.html

  * igt@kms_color_chamelium@pipe-b-ctm-limited-range:
    - shard-glk:          NOTRUN -> [SKIP][99] ([fdo#109271] / [fdo#111827]) +1 similar issue
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk8/igt@kms_color_chamelium@pipe-b-ctm-limited-range.html

  * igt@kms_color_chamelium@pipe-d-ctm-red-to-blue:
    - shard-tglb:         NOTRUN -> [SKIP][100] ([fdo#109284] / [fdo#111827]) +14 similar issues
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb3/igt@kms_color_chamelium@pipe-d-ctm-red-to-blue.html

  * igt@kms_content_protection@lic:
    - shard-tglb:         NOTRUN -> [SKIP][101] ([fdo#111828])
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@kms_content_protection@lic.html

  * igt@kms_content_protection@srm:
    - shard-glk:          NOTRUN -> [SKIP][102] ([fdo#109271]) +43 similar issues
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk8/igt@kms_content_protection@srm.html

  * igt@kms_cursor_crc@pipe-b-cursor-512x512-offscreen:
    - shard-skl:          NOTRUN -> [SKIP][103] ([fdo#109271]) +52 similar issues
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl7/igt@kms_cursor_crc@pipe-b-cursor-512x512-offscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-max-size-onscreen:
    - shard-tglb:         NOTRUN -> [SKIP][104] ([i915#3359]) +7 similar issues
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb3/igt@kms_cursor_crc@pipe-c-cursor-max-size-onscreen.html

  * igt@kms_cursor_crc@pipe-d-cursor-32x32-rapid-movement:
    - shard-tglb:         NOTRUN -> [SKIP][105] ([i915#3319]) +1 similar issue
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@kms_cursor_crc@pipe-d-cursor-32x32-rapid-movement.html

  * igt@kms_cursor_crc@pipe-d-cursor-512x170-random:
    - shard-tglb:         NOTRUN -> [SKIP][106] ([fdo#109279] / [i915#3359]) +1 similar issue
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb2/igt@kms_cursor_crc@pipe-d-cursor-512x170-random.html

  * igt@kms_cursor_legacy@cursorb-vs-flipb-atomic:
    - shard-tglb:         NOTRUN -> [SKIP][107] ([fdo#111825]) +35 similar issues
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb2/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size:
    - shard-skl:          [PASS][108] -> [FAIL][109] ([i915#2346] / [i915#533])
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-skl8/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl1/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html

  * igt@kms_cursor_legacy@flip-vs-cursor-toggle:
    - shard-iclb:         [PASS][110] -> [FAIL][111] ([i915#2346])
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-iclb1/igt@kms_cursor_legacy@flip-vs-cursor-toggle.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-iclb7/igt@kms_cursor_legacy@flip-vs-cursor-toggle.html

  * igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size:
    - shard-tglb:         NOTRUN -> [SKIP][112] ([i915#4103])
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-apl:          [PASS][113] -> [DMESG-WARN][114] ([i915#180]) +2 similar issues
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-apl3/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-apl4/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_flip@plain-flip-ts-check-interruptible@a-hdmi-a1:
    - shard-glk:          [PASS][115] -> [FAIL][116] ([i915#2122])
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk3/igt@kms_flip@plain-flip-ts-check-interruptible@a-hdmi-a1.html
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk9/igt@kms_flip@plain-flip-ts-check-interruptible@a-hdmi-a1.html

  * igt@kms_hdr@bpc-switch-dpms:
    - shard-skl:          [PASS][117] -> [FAIL][118] ([i915#1188])
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-skl9/igt@kms_hdr@bpc-switch-dpms.html
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl1/igt@kms_hdr@bpc-switch-dpms.html

  * igt@kms_hdr@static-swap:
    - shard-tglb:         NOTRUN -> [SKIP][119] ([i915#1187])
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb3/igt@kms_hdr@static-swap.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-d-frame-sequence:
    - shard-skl:          NOTRUN -> [SKIP][120] ([fdo#109271] / [i915#533])
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl7/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-d-frame-sequence.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-opaque-fb:
    - shard-skl:          NOTRUN -> [FAIL][121] ([fdo#108145] / [i915#265])
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl7/igt@kms_plane_alpha_blend@pipe-b-alpha-opaque-fb.html

  * igt@kms_plane_alpha_blend@pipe-b-constant-alpha-max:
    - shard-apl:          NOTRUN -> [FAIL][122] ([fdo#108145] / [i915#265])
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-apl1/igt@kms_plane_alpha_blend@pipe-b-constant-alpha-max.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [PASS][123] -> [FAIL][124] ([fdo#108145] / [i915#265]) +1 similar issue
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-skl10/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl10/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-max:
    - shard-glk:          NOTRUN -> [FAIL][125] ([fdo#108145] / [i915#265])
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk8/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-max.html

  * igt@kms_plane_lowres@pipe-d-tiling-none:
    - shard-tglb:         NOTRUN -> [SKIP][126] ([i915#3536])
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@kms_plane_lowres@pipe-d-tiling-none.html

  * igt@kms_plane_multiple@atomic-pipe-d-tiling-yf:
    - shard-tglb:         NOTRUN -> [SKIP][127] ([fdo#112054])
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb3/igt@kms_plane_multiple@atomic-pipe-d-tiling-yf.html

  * igt@kms_psr2_sf@cursor-plane-update-sf:
    - shard-tglb:         NOTRUN -> [SKIP][128] ([i915#2920]) +1 similar issue
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@kms_psr2_sf@cursor-plane-update-sf.html

  * igt@kms_psr2_sf@plane-move-sf-dmg-area-3:
    - shard-skl:          NOTRUN -> [SKIP][129] ([fdo#109271] / [i915#658])
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl7/igt@kms_psr2_sf@plane-move-sf-dmg-area-3.html

  * igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-2:
    - shard-apl:          NOTRUN -> [SKIP][130] ([fdo#109271] / [i915#658])
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-apl1/igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-2.html

  * igt@kms_psr2_su@page_flip:
    - shard-glk:          NOTRUN -> [SKIP][131] ([fdo#109271] / [i915#658])
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk8/igt@kms_psr2_su@page_flip.html

  * igt@kms_psr@psr2_basic:
    - shard-iclb:         [PASS][132] -> [SKIP][133] ([fdo#109441]) +3 similar issues
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-iclb2/igt@kms_psr@psr2_basic.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-iclb7/igt@kms_psr@psr2_basic.html

  * igt@kms_psr@psr2_sprite_plane_move:
    - shard-tglb:         NOTRUN -> [FAIL][134] ([i915#132] / [i915#3467]) +2 similar issues
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb3/igt@kms_psr@psr2_sprite_plane_move.html

  * igt@kms_writeback@writeback-pixel-formats:
    - shard-tglb:         NOTRUN -> [SKIP][135] ([i915#2437])
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb5/igt@kms_writeback@writeback-pixel-formats.html

  * igt@nouveau_crc@pipe-b-ctx-flip-skip-current-frame:
    - shard-apl:          NOTRUN -> [SKIP][136] ([fdo#109271]) +82 similar issues
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-apl3/igt@nouveau_crc@pipe-b-ctx-flip-skip-current-frame.html
    - shard-tglb:         NOTRUN -> [SKIP][137] ([i915#2530]) +2 similar issues
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb3/igt@nouveau_crc@pipe-b-ctx-flip-skip-current-frame.html

  * igt@perf@polling-parameterized:
    - shard-tglb:         NOTRUN -> [FAIL][138] ([i915#1542])
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@perf@polling-parameterized.html

  * igt@prime_nv_pcopy@test1_macro:
    - shard-tglb:         NOTRUN -> [SKIP][139] ([fdo#109291]) +3 similar issues
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb1/igt@prime_nv_pcopy@test1_macro.html

  * igt@sysfs_clients@fair-1:
    - shard-glk:          NOTRUN -> [SKIP][140] ([fdo#109271] / [i915#2994]) +1 similar issue
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk8/igt@sysfs_clients@fair-1.html

  * igt@sysfs_clients@fair-3:
    - shard-skl:          NOTRUN -> [SKIP][141] ([fdo#109271] / [i915#2994])
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl7/igt@sysfs_clients@fair-3.html

  * igt@sysfs_clients@sema-10:
    - shard-tglb:         NOTRUN -> [SKIP][142] ([i915#2994]) +2 similar issues
   [142]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb3/igt@sysfs_clients@sema-10.html

  
#### Possible fixes ####

  * igt@gem_eio@in-flight-1us:
    - shard-skl:          [TIMEOUT][143] ([i915#3063]) -> [PASS][144]
   [143]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-skl7/igt@gem_eio@in-flight-1us.html
   [144]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-skl9/igt@gem_eio@in-flight-1us.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-iclb:         [FAIL][145] ([i915#2842]) -> [PASS][146] +1 similar issue
   [145]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-iclb8/igt@gem_exec_fair@basic-pace-solo@rcs0.html
   [146]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-iclb4/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_exec_fair@basic-pace@vcs0:
    - shard-tglb:         [FAIL][147] ([i915#2842]) -> [PASS][148]
   [147]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-tglb3/igt@gem_exec_fair@basic-pace@vcs0.html
   [148]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb8/igt@gem_exec_fair@basic-pace@vcs0.html

  * igt@gem_exec_fair@basic-throttle@rcs0:
    - shard-glk:          [FAIL][149] ([i915#2842]) -> [PASS][150]
   [149]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk3/igt@gem_exec_fair@basic-throttle@rcs0.html
   [150]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk5/igt@gem_exec_fair@basic-throttle@rcs0.html

  * igt@gem_exec_gttfill@all:
    - shard-glk:          [DMESG-WARN][151] ([i915#118]) -> [PASS][152]
   [151]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk9/igt@gem_exec_gttfill@all.html
   [152]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk6/igt@gem_exec_gttfill@all.html

  * igt@i915_suspend@debugfs-reader:
    - shard-tglb:         [INCOMPLETE][153] ([i915#456]) -> [PASS][154] +2 similar issues
   [153]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-tglb7/igt@i915_suspend@debugfs-reader.html
   [154]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb5/igt@i915_suspend@debugfs-reader.html

  * igt@kms_cursor_crc@pipe-a-cursor-128x42-offscreen:
    - shard-glk:          [FAIL][155] ([i915#3444]) -> [PASS][156]
   [155]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-glk7/igt@kms_cursor_crc@pipe-a-cursor-128x42-offscreen.html
   [156]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-glk5/igt@kms_cursor_crc@pipe-a-cursor-128x42-offscreen.html

  * igt@kms_cursor_crc@pipe-d-cursor-suspend:
    - shard-tglb:         [INCOMPLETE][157] ([i915#2411] / [i915#4211]) -> [PASS][158]
   [157]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-tglb7/igt@kms_cursor_crc@pipe-d-cursor-suspend.html
   [158]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/shard-tglb2/igt@kms_cursor_crc@pipe-d-cursor-suspend.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1:
    - shard-apl:          [FAIL][159] ([i915#79]) -> [PASS][160]
   [159]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10878/shard-apl8/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1.html
   [160]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwo

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21582/index.html

[-- Attachment #2: Type: text/html, Size: 33693 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 3/6] drm/i915/ttm: Move the i915_gem_obj_copy_ttm() function
  2021-11-14 11:12   ` [Intel-gfx] " Thomas Hellström
@ 2021-11-15 10:42     ` Matthew Auld
  -1 siblings, 0 replies; 40+ messages in thread
From: Matthew Auld @ 2021-11-15 10:42 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel

On 14/11/2021 11:12, Thomas Hellström wrote:
> Move the i915_gem_obj_copy_ttm() function to i915_gem_ttm_move.h.
> This will help keep a number of functions static when introducing
> async moves.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 4/6] drm/i915/ttm: Break refcounting loops at device region unref time
  2021-11-14 11:12   ` [Intel-gfx] " Thomas Hellström
@ 2021-11-15 10:49     ` Matthew Auld
  -1 siblings, 0 replies; 40+ messages in thread
From: Matthew Auld @ 2021-11-15 10:49 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel

On 14/11/2021 11:12, Thomas Hellström wrote:
> There is an interesting refcounting loop:
> struct intel_memory_region has a struct ttm_resource_manager,
> ttm_resource_manager->move may hold a reference to i915_request,
> i915_request may hold a reference to intel_context,
> intel_context may hold a reference to drm_i915_gem_object,
> drm_i915_gem_object may hold a reference to intel_memory_region.

Would it help if we dropped the per-object region refcounting? IIRC
that was originally added to make some selftest teardown cleaner, or
something along those lines.

> 
> Break this loop when we drop the device reference count on the
> region by putting the region move fence.
> 
> Also hold off on dropping the device reference count until all objects
> of the region have been deleted, to avoid issues if the device takedown
> proceeds while the region is still present.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c     |  1 +
>   drivers/gpu/drm/i915/gt/intel_region_lmem.c |  1 +
>   drivers/gpu/drm/i915/intel_memory_region.c  |  5 +++-
>   drivers/gpu/drm/i915/intel_memory_region.h  |  1 +
>   drivers/gpu/drm/i915/intel_region_ttm.c     | 28 +++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_region_ttm.h     |  2 ++
>   6 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 537a81445b90..a1df49378a0f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -1044,6 +1044,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
>   
>   static const struct intel_memory_region_ops ttm_system_region_ops = {
>   	.init_object = __i915_gem_ttm_object_init,
> +	.disable = intel_region_ttm_disable,
>   };
>   
>   struct intel_memory_region *
> diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> index aec838ecb2ef..956916fd21f8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> @@ -108,6 +108,7 @@ region_lmem_init(struct intel_memory_region *mem)
>   static const struct intel_memory_region_ops intel_region_lmem_ops = {
>   	.init = region_lmem_init,
>   	.release = region_lmem_release,
> +	.disable = intel_region_ttm_disable,
>   	.init_object = __i915_gem_ttm_object_init,
>   };
>   
> diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
> index e7f7e6627750..1f67d2b68c24 100644
> --- a/drivers/gpu/drm/i915/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/intel_memory_region.c
> @@ -233,8 +233,11 @@ void intel_memory_regions_driver_release(struct drm_i915_private *i915)
>   		struct intel_memory_region *region =
>   			fetch_and_zero(&i915->mm.regions[i]);
>   
> -		if (region)
> +		if (region) {
> +			if (region->ops->disable)
> +				region->ops->disable(region);
>   			intel_memory_region_put(region);
> +		}
>   	}
>   }
>   
> diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
> index 3feae3353d33..9bb77eacd206 100644
> --- a/drivers/gpu/drm/i915/intel_memory_region.h
> +++ b/drivers/gpu/drm/i915/intel_memory_region.h
> @@ -52,6 +52,7 @@ struct intel_memory_region_ops {
>   
>   	int (*init)(struct intel_memory_region *mem);
>   	void (*release)(struct intel_memory_region *mem);
> +	void (*disable)(struct intel_memory_region *mem);
>   
>   	int (*init_object)(struct intel_memory_region *mem,
>   			   struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
> index 2e901a27e259..4219d83a2b19 100644
> --- a/drivers/gpu/drm/i915/intel_region_ttm.c
> +++ b/drivers/gpu/drm/i915/intel_region_ttm.c
> @@ -114,6 +114,34 @@ void intel_region_ttm_fini(struct intel_memory_region *mem)
>   	mem->region_private = NULL;
>   }
>   
> +/**
> + * intel_region_ttm_disable - A TTM region disable callback helper
> + * @mem: The memory region.
> + *
> + * A helper that ensures that nothing any longer references a region at
> + * device takedown. Breaks refcounting loops and waits for objects in the
> + * region to be deleted.
> + */
> +void intel_region_ttm_disable(struct intel_memory_region *mem)
> +{
> +	struct ttm_resource_manager *man = mem->region_private;
> +
> +	/*
> +	 * Put the region's move fences. This releases requests that
> +	 * may hold on to contexts and vms that may hold on to buffer
> +	 * objects that may have a refcount on the region. :/
> +	 */
> +	if (man)
> +		ttm_resource_manager_cleanup(man);
> +
> +	/* Flush objects that may just have been freed */
> +	i915_gem_flush_free_objects(mem->i915);
> +
> +	/* Wait until the only region reference left is our own. */
> +	while (kref_read(&mem->kref) > 1)
> +		msleep(20);

If we leak an object, I guess we get an infinite loop here at driver 
release?
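
Just to illustrate the concern (rough, untested sketch; the 60s cap is
an arbitrary number I picked for the example): bounding the wait and
warning would at least make such a leak visible instead of hanging the
driver release forever:

	unsigned int timeout_ms = 60 * 1000;

	/* Wait until the only region reference left is our own, but give
	 * up and warn if something leaked a reference.
	 */
	while (kref_read(&mem->kref) > 1) {
		if (!timeout_ms) {
			drm_warn(&mem->i915->drm,
				 "leaked region reference(s) at device takedown\n");
			break;
		}
		msleep(20);
		timeout_ms -= 20;
	}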

> +}
> +
>   /**
>    * intel_region_ttm_resource_to_rsgt -
>    * Convert an opaque TTM resource manager resource to a refcounted sg_table.
> diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
> index 7bbe2b46b504..197a8c179370 100644
> --- a/drivers/gpu/drm/i915/intel_region_ttm.h
> +++ b/drivers/gpu/drm/i915/intel_region_ttm.h
> @@ -22,6 +22,8 @@ int intel_region_ttm_init(struct intel_memory_region *mem);
>   
>   void intel_region_ttm_fini(struct intel_memory_region *mem);
>   
> +void intel_region_ttm_disable(struct intel_memory_region *mem);
> +
>   struct i915_refct_sgt *
>   intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
>   				  struct ttm_resource *res);
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 2/6] drm/i915: Add support for asynchronous moving fence waiting
  2021-11-14 11:12   ` [Intel-gfx] " Thomas Hellström
@ 2021-11-15 12:36     ` Matthew Auld
  -1 siblings, 0 replies; 40+ messages in thread
From: Matthew Auld @ 2021-11-15 12:36 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel

On 14/11/2021 11:12, Thomas Hellström wrote:
> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> 
> For now, we will only allow async migration when TTM is used,
> so the paths we care about are related to TTM.
> 
> The mmap path is handled by having the fence in ttm_bo->moving. When
> pinning, the binding only becomes available after the moving fence is
> signaled, and pinning a cpu map will only work after the moving fence
> signals.
> 
> This should close all holes where userspace can read a buffer
> before it's fully migrated.
> 
> v2:
> - Fix a couple of SPARSE warnings
> v3:
> - Fix a NULL pointer dereference
> 
> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/display/intel_fbdev.c    |  7 ++--
>   drivers/gpu/drm/i915/display/intel_overlay.c  |  2 +-
>   drivers/gpu/drm/i915/gem/i915_gem_pages.c     |  6 +++
>   .../i915/gem/selftests/i915_gem_coherency.c   |  4 +-
>   .../drm/i915/gem/selftests/i915_gem_mman.c    | 22 ++++++-----
>   drivers/gpu/drm/i915/i915_vma.c               | 39 ++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_vma.h               |  3 ++
>   drivers/gpu/drm/i915/selftests/i915_vma.c     |  4 +-
>   8 files changed, 69 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c
> index adc3a81be9f7..5902ad0c2bd8 100644
> --- a/drivers/gpu/drm/i915/display/intel_fbdev.c
> +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
> @@ -265,11 +265,12 @@ static int intelfb_create(struct drm_fb_helper *helper,
>   		info->fix.smem_len = vma->node.size;
>   	}
>   
> -	vaddr = i915_vma_pin_iomap(vma);
> +	vaddr = i915_vma_pin_iomap_unlocked(vma);
>   	if (IS_ERR(vaddr)) {
> -		drm_err(&dev_priv->drm,
> -			"Failed to remap framebuffer into virtual memory\n");
>   		ret = PTR_ERR(vaddr);
> +		if (ret != -EINTR && ret != -ERESTARTSYS)
> +			drm_err(&dev_priv->drm,
> +				"Failed to remap framebuffer into virtual memory\n");
>   		goto out_unpin;
>   	}
>   	info->screen_base = vaddr;
> diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c b/drivers/gpu/drm/i915/display/intel_overlay.c
> index 7e3f5c6ca484..21593f3f2664 100644
> --- a/drivers/gpu/drm/i915/display/intel_overlay.c
> +++ b/drivers/gpu/drm/i915/display/intel_overlay.c
> @@ -1357,7 +1357,7 @@ static int get_registers(struct intel_overlay *overlay, bool use_phys)
>   		overlay->flip_addr = sg_dma_address(obj->mm.pages->sgl);
>   	else
>   		overlay->flip_addr = i915_ggtt_offset(vma);
> -	overlay->regs = i915_vma_pin_iomap(vma);
> +	overlay->regs = i915_vma_pin_iomap_unlocked(vma);
>   	i915_vma_unpin(vma);
>   
>   	if (IS_ERR(overlay->regs)) {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index c4f684b7cc51..49c6e55c68ce 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -418,6 +418,12 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
>   	}
>   
>   	if (!ptr) {
> +		err = i915_gem_object_wait_moving_fence(obj, true);
> +		if (err) {
> +			ptr = ERR_PTR(err);
> +			goto err_unpin;
> +		}
> +
>   		if (GEM_WARN_ON(type == I915_MAP_WC &&
>   				!static_cpu_has(X86_FEATURE_PAT)))
>   			ptr = ERR_PTR(-ENODEV);
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
> index 13b088cc787e..067c512961ba 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
> @@ -101,7 +101,7 @@ static int gtt_set(struct context *ctx, unsigned long offset, u32 v)
>   
>   	intel_gt_pm_get(vma->vm->gt);
>   
> -	map = i915_vma_pin_iomap(vma);
> +	map = i915_vma_pin_iomap_unlocked(vma);
>   	i915_vma_unpin(vma);
>   	if (IS_ERR(map)) {
>   		err = PTR_ERR(map);
> @@ -134,7 +134,7 @@ static int gtt_get(struct context *ctx, unsigned long offset, u32 *v)
>   
>   	intel_gt_pm_get(vma->vm->gt);
>   
> -	map = i915_vma_pin_iomap(vma);
> +	map = i915_vma_pin_iomap_unlocked(vma);
>   	i915_vma_unpin(vma);
>   	if (IS_ERR(map)) {
>   		err = PTR_ERR(map);
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> index 6d30cdfa80f3..5d54181c2145 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -125,12 +125,13 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj,
>   	n = page - view.partial.offset;
>   	GEM_BUG_ON(n >= view.partial.size);
>   
> -	io = i915_vma_pin_iomap(vma);
> +	io = i915_vma_pin_iomap_unlocked(vma);
>   	i915_vma_unpin(vma);
>   	if (IS_ERR(io)) {
> -		pr_err("Failed to iomap partial view: offset=%lu; err=%d\n",
> -		       page, (int)PTR_ERR(io));
>   		err = PTR_ERR(io);
> +		if (err != -EINTR && err != -ERESTARTSYS)
> +			pr_err("Failed to iomap partial view: offset=%lu; err=%d\n",
> +			       page, err);
>   		goto out;
>   	}
>   
> @@ -219,12 +220,15 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj,
>   		n = page - view.partial.offset;
>   		GEM_BUG_ON(n >= view.partial.size);
>   
> -		io = i915_vma_pin_iomap(vma);
> +		io = i915_vma_pin_iomap_unlocked(vma);
>   		i915_vma_unpin(vma);
>   		if (IS_ERR(io)) {
> -			pr_err("Failed to iomap partial view: offset=%lu; err=%d\n",
> -			       page, (int)PTR_ERR(io));
> -			return PTR_ERR(io);
> +			int err = PTR_ERR(io);
> +
> +			if (err != -EINTR && err != -ERESTARTSYS)
> +				pr_err("Failed to iomap partial view: offset=%lu; err=%d\n",
> +				       page, err);
> +			return err;
>   		}
>   
>   		iowrite32(page, io + n * PAGE_SIZE / sizeof(*io));
> @@ -773,7 +777,7 @@ static int gtt_set(struct drm_i915_gem_object *obj)
>   		return PTR_ERR(vma);
>   
>   	intel_gt_pm_get(vma->vm->gt);
> -	map = i915_vma_pin_iomap(vma);
> +	map = i915_vma_pin_iomap_unlocked(vma);
>   	i915_vma_unpin(vma);
>   	if (IS_ERR(map)) {
>   		err = PTR_ERR(map);
> @@ -799,7 +803,7 @@ static int gtt_check(struct drm_i915_gem_object *obj)
>   		return PTR_ERR(vma);
>   
>   	intel_gt_pm_get(vma->vm->gt);
> -	map = i915_vma_pin_iomap(vma);
> +	map = i915_vma_pin_iomap_unlocked(vma);
>   	i915_vma_unpin(vma);
>   	if (IS_ERR(map)) {
>   		err = PTR_ERR(map);
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 8781c4f61952..069f22b3cd48 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -431,6 +431,13 @@ int i915_vma_bind(struct i915_vma *vma,
>   			work->pinned = i915_gem_object_get(vma->obj);
>   		}
>   	} else {
> +		if (vma->obj) {
> +			int ret;
> +
> +			ret = i915_gem_object_wait_moving_fence(vma->obj, true);
> +			if (ret)
> +				return ret;
> +		}
>   		vma->ops->bind_vma(vma->vm, NULL, vma, cache_level, bind_flags);
>   	}
>   
> @@ -455,6 +462,10 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
>   
>   	ptr = READ_ONCE(vma->iomap);
>   	if (ptr == NULL) {
> +		err = i915_gem_object_wait_moving_fence(vma->obj, true);
> +		if (err)
> +			goto err;
> +
>   		/*
>   		 * TODO: consider just using i915_gem_object_pin_map() for lmem
>   		 * instead, which already supports mapping non-contiguous chunks
> @@ -496,6 +507,25 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
>   	return IO_ERR_PTR(err);
>   }
>   
> +void __iomem *i915_vma_pin_iomap_unlocked(struct i915_vma *vma)
> +{
> +	struct i915_gem_ww_ctx ww;
> +	void __iomem *map;
> +	int err;
> +
> +	for_i915_gem_ww(&ww, err, true) {
> +		err = i915_gem_object_lock(vma->obj, &ww);
> +		if (err)
> +			continue;
> +
> +		map = i915_vma_pin_iomap(vma);
> +	}
> +	if (err)
> +		map = IO_ERR_PTR(err);
> +
> +	return map;
> +}

What is the reason for this change? Is this strictly related to this 
series/commit?
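
My guess from reading the patch (illustration only, this just mirrors
what the new helper does): i915_vma_pin_iomap() now waits for the
moving fence and so wants vma->obj locked, which callers like the
fbdev, overlay and selftest paths don't hold, so without the wrapper
each of them would have to open-code the ww dance themselves:

	struct i915_gem_ww_ctx ww;
	void __iomem *map;
	int err;

	for_i915_gem_ww(&ww, err, true) {
		err = i915_gem_object_lock(vma->obj, &ww);
		if (err)
			continue;

		map = i915_vma_pin_iomap(vma);
	}
	if (err)
		map = IO_ERR_PTR(err);

i.e. the wrapper just centralises that pattern, if I'm reading it
right.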

> +
>   void i915_vma_flush_writes(struct i915_vma *vma)
>   {
>   	if (i915_vma_unset_ggtt_write(vma))
> @@ -870,6 +900,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   		    u64 size, u64 alignment, u64 flags)
>   {
>   	struct i915_vma_work *work = NULL;
> +	struct dma_fence *moving = NULL;
>   	intel_wakeref_t wakeref = 0;
>   	unsigned int bound;
>   	int err;
> @@ -895,7 +926,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   	if (flags & PIN_GLOBAL)
>   		wakeref = intel_runtime_pm_get(&vma->vm->i915->runtime_pm);
>   
> -	if (flags & vma->vm->bind_async_flags) {
> +	moving = vma->obj ? i915_gem_object_get_moving_fence(vma->obj) : NULL;
> +	if (flags & vma->vm->bind_async_flags || moving) {
>   		/* lock VM */
>   		err = i915_vm_lock_objects(vma->vm, ww);
>   		if (err)
> @@ -909,6 +941,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   
>   		work->vm = i915_vm_get(vma->vm);
>   
> +		dma_fence_work_chain(&work->base, moving);
> +
>   		/* Allocate enough page directories to used PTE */
>   		if (vma->vm->allocate_va_range) {
>   			err = i915_vm_alloc_pt_stash(vma->vm,
> @@ -1013,7 +1047,10 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   err_rpm:
>   	if (wakeref)
>   		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
> +	if (moving)
> +		dma_fence_put(moving);
>   	vma_put_pages(vma);
> +
>   	return err;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 648dbe744c96..1812b2904a31 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -326,6 +326,9 @@ static inline bool i915_node_color_differs(const struct drm_mm_node *node,
>    * Returns a valid iomapped pointer or ERR_PTR.
>    */
>   void __iomem *i915_vma_pin_iomap(struct i915_vma *vma);
> +
> +void __iomem *i915_vma_pin_iomap_unlocked(struct i915_vma *vma);
> +
>   #define IO_ERR_PTR(x) ((void __iomem *)ERR_PTR(x))
>   
>   /**
> diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
> index 1f10fe36619b..85f43b209890 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_vma.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
> @@ -1005,7 +1005,7 @@ static int igt_vma_remapped_gtt(void *arg)
>   
>   			GEM_BUG_ON(vma->ggtt_view.type != *t);
>   
> -			map = i915_vma_pin_iomap(vma);
> +			map = i915_vma_pin_iomap_unlocked(vma);
>   			i915_vma_unpin(vma);
>   			if (IS_ERR(map)) {
>   				err = PTR_ERR(map);
> @@ -1036,7 +1036,7 @@ static int igt_vma_remapped_gtt(void *arg)
>   
>   			GEM_BUG_ON(vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL);
>   
> -			map = i915_vma_pin_iomap(vma);
> +			map = i915_vma_pin_iomap_unlocked(vma);
>   			i915_vma_unpin(vma);
>   			if (IS_ERR(map)) {
>   				err = PTR_ERR(map);
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

> +	if (flags & vma->vm->bind_async_flags || moving) {
>   		/* lock VM */
>   		err = i915_vm_lock_objects(vma->vm, ww);
>   		if (err)
> @@ -909,6 +941,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   
>   		work->vm = i915_vm_get(vma->vm);
>   
> +		dma_fence_work_chain(&work->base, moving);
> +
>   		/* Allocate enough page directories to used PTE */
>   		if (vma->vm->allocate_va_range) {
>   			err = i915_vm_alloc_pt_stash(vma->vm,
> @@ -1013,7 +1047,10 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   err_rpm:
>   	if (wakeref)
>   		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
> +	if (moving)
> +		dma_fence_put(moving);
>   	vma_put_pages(vma);
> +
>   	return err;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 648dbe744c96..1812b2904a31 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -326,6 +326,9 @@ static inline bool i915_node_color_differs(const struct drm_mm_node *node,
>    * Returns a valid iomapped pointer or ERR_PTR.
>    */
>   void __iomem *i915_vma_pin_iomap(struct i915_vma *vma);
> +
> +void __iomem *i915_vma_pin_iomap_unlocked(struct i915_vma *vma);
> +
>   #define IO_ERR_PTR(x) ((void __iomem *)ERR_PTR(x))
>   
>   /**
> diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
> index 1f10fe36619b..85f43b209890 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_vma.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
> @@ -1005,7 +1005,7 @@ static int igt_vma_remapped_gtt(void *arg)
>   
>   			GEM_BUG_ON(vma->ggtt_view.type != *t);
>   
> -			map = i915_vma_pin_iomap(vma);
> +			map = i915_vma_pin_iomap_unlocked(vma);
>   			i915_vma_unpin(vma);
>   			if (IS_ERR(map)) {
>   				err = PTR_ERR(map);
> @@ -1036,7 +1036,7 @@ static int igt_vma_remapped_gtt(void *arg)
>   
>   			GEM_BUG_ON(vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL);
>   
> -			map = i915_vma_pin_iomap(vma);
> +			map = i915_vma_pin_iomap_unlocked(vma);
>   			i915_vma_unpin(vma);
>   			if (IS_ERR(map)) {
>   				err = PTR_ERR(map);
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 1/6] drm/i915: Add functions to set/get moving fence
  2021-11-14 11:12   ` [Intel-gfx] " Thomas Hellström
@ 2021-11-15 12:39     ` Matthew Auld
  -1 siblings, 0 replies; 40+ messages in thread
From: Matthew Auld @ 2021-11-15 12:39 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel

On 14/11/2021 11:12, Thomas Hellström wrote:
> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> 
> We want to get rid of i915_vma tracking to simplify the code and
> lifetimes. Add a way to set/put the moving fence, in preparation for
> removing the tracking.
> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_object.c | 37 ++++++++++++++++++++++
>   drivers/gpu/drm/i915/gem/i915_gem_object.h |  9 ++++++
>   2 files changed, 46 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 591ee3cb7275..ec4313836597 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -33,6 +33,7 @@
>   #include "i915_gem_object.h"
>   #include "i915_memcpy.h"
>   #include "i915_trace.h"
> +#include "i915_gem_ttm.h"
>   
>   static struct kmem_cache *slab_objects;
>   
> @@ -726,6 +727,42 @@ static const struct drm_gem_object_funcs i915_gem_object_funcs = {
>   	.export = i915_gem_prime_export,
>   };
>   
> +struct dma_fence *
> +i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj)
> +{
> +	return dma_fence_get(i915_gem_to_ttm(obj)->moving);
> +}
> +
> +void i915_gem_object_set_moving_fence(struct drm_i915_gem_object *obj,
> +				      struct dma_fence *fence)
> +{
> +	dma_fence_put(i915_gem_to_ttm(obj)->moving);
> +
> +	i915_gem_to_ttm(obj)->moving = dma_fence_get(fence);
> +}

Are these also assert_object_held()? Should we maybe squash this patch 
with the first user?

> +
> +int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
> +				      bool intr)
> +{
> +	struct dma_fence *fence = i915_gem_to_ttm(obj)->moving;
> +	int ret;
> +
> +	assert_object_held(obj);
> +	if (!fence)
> +		return 0;
> +
> +	ret = dma_fence_wait(fence, intr);
> +	if (ret)
> +		return ret;
> +
> +	if (fence->error)
> +		return fence->error;
> +
> +	i915_gem_to_ttm(obj)->moving = NULL;
> +	dma_fence_put(fence);
> +	return 0;
> +}
> +
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "selftests/huge_gem_object.c"
>   #include "selftests/huge_pages.c"
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 133963b46135..36bf3e2e602f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -517,6 +517,15 @@ i915_gem_object_finish_access(struct drm_i915_gem_object *obj)
>   	i915_gem_object_unpin_pages(obj);
>   }
>   
> +struct dma_fence *
> +i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj);
> +
> +void i915_gem_object_set_moving_fence(struct drm_i915_gem_object *obj,
> +				      struct dma_fence *fence);
> +
> +int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
> +				      bool intr);
> +
>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>   					 unsigned int cache_level);
>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 2/6] drm/i915: Add support for asynchronous moving fence waiting
  2021-11-15 12:36     ` [Intel-gfx] " Matthew Auld
@ 2021-11-15 12:42       ` Thomas Hellström
  -1 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-15 12:42 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx, dri-devel


On 11/15/21 13:36, Matthew Auld wrote:
> On 14/11/2021 11:12, Thomas Hellström wrote:
>> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>
>> For now, we will only allow async migration when TTM is used,
>> so the paths we care about are related to TTM.
>>
>> The mmap path is handled by having the fence in ttm_bo->moving,
>> when pinning, the binding only becomes available after the moving
>> fence is signaled, and pinning a cpu map will only work after
>> the moving fence signals.
>>
>> This should close all holes where userspace can read a buffer
>> before it's fully migrated.
>>
>> v2:
>> - Fix a couple of SPARSE warnings
>> v3:
>> - Fix a NULL pointer dereference
>>
>> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/display/intel_fbdev.c    |  7 ++--
>>   drivers/gpu/drm/i915/display/intel_overlay.c  |  2 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_pages.c     |  6 +++
>>   .../i915/gem/selftests/i915_gem_coherency.c   |  4 +-
>>   .../drm/i915/gem/selftests/i915_gem_mman.c    | 22 ++++++-----
>>   drivers/gpu/drm/i915/i915_vma.c               | 39 ++++++++++++++++++-
>>   drivers/gpu/drm/i915/i915_vma.h               |  3 ++
>>   drivers/gpu/drm/i915/selftests/i915_vma.c     |  4 +-
>>   8 files changed, 69 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
>> b/drivers/gpu/drm/i915/display/intel_fbdev.c
>> index adc3a81be9f7..5902ad0c2bd8 100644
>> --- a/drivers/gpu/drm/i915/display/intel_fbdev.c
>> +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
>> @@ -265,11 +265,12 @@ static int intelfb_create(struct drm_fb_helper 
>> *helper,
>>           info->fix.smem_len = vma->node.size;
>>       }
>>   -    vaddr = i915_vma_pin_iomap(vma);
>> +    vaddr = i915_vma_pin_iomap_unlocked(vma);
>>       if (IS_ERR(vaddr)) {
>> -        drm_err(&dev_priv->drm,
>> -            "Failed to remap framebuffer into virtual memory\n");
>>           ret = PTR_ERR(vaddr);
>> +        if (ret != -EINTR && ret != -ERESTARTSYS)
>> +            drm_err(&dev_priv->drm,
>> +                "Failed to remap framebuffer into virtual memory\n");
>>           goto out_unpin;
>>       }
>>       info->screen_base = vaddr;
>> diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c 
>> b/drivers/gpu/drm/i915/display/intel_overlay.c
>> index 7e3f5c6ca484..21593f3f2664 100644
>> --- a/drivers/gpu/drm/i915/display/intel_overlay.c
>> +++ b/drivers/gpu/drm/i915/display/intel_overlay.c
>> @@ -1357,7 +1357,7 @@ static int get_registers(struct intel_overlay 
>> *overlay, bool use_phys)
>>           overlay->flip_addr = sg_dma_address(obj->mm.pages->sgl);
>>       else
>>           overlay->flip_addr = i915_ggtt_offset(vma);
>> -    overlay->regs = i915_vma_pin_iomap(vma);
>> +    overlay->regs = i915_vma_pin_iomap_unlocked(vma);
>>       i915_vma_unpin(vma);
>>         if (IS_ERR(overlay->regs)) {
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>> index c4f684b7cc51..49c6e55c68ce 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>> @@ -418,6 +418,12 @@ void *i915_gem_object_pin_map(struct 
>> drm_i915_gem_object *obj,
>>       }
>>         if (!ptr) {
>> +        err = i915_gem_object_wait_moving_fence(obj, true);
>> +        if (err) {
>> +            ptr = ERR_PTR(err);
>> +            goto err_unpin;
>> +        }
>> +
>>           if (GEM_WARN_ON(type == I915_MAP_WC &&
>>                   !static_cpu_has(X86_FEATURE_PAT)))
>>               ptr = ERR_PTR(-ENODEV);
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c 
>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
>> index 13b088cc787e..067c512961ba 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
>> @@ -101,7 +101,7 @@ static int gtt_set(struct context *ctx, unsigned 
>> long offset, u32 v)
>>         intel_gt_pm_get(vma->vm->gt);
>>   -    map = i915_vma_pin_iomap(vma);
>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>       i915_vma_unpin(vma);
>>       if (IS_ERR(map)) {
>>           err = PTR_ERR(map);
>> @@ -134,7 +134,7 @@ static int gtt_get(struct context *ctx, unsigned 
>> long offset, u32 *v)
>>         intel_gt_pm_get(vma->vm->gt);
>>   -    map = i915_vma_pin_iomap(vma);
>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>       i915_vma_unpin(vma);
>>       if (IS_ERR(map)) {
>>           err = PTR_ERR(map);
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> index 6d30cdfa80f3..5d54181c2145 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> @@ -125,12 +125,13 @@ static int check_partial_mapping(struct 
>> drm_i915_gem_object *obj,
>>       n = page - view.partial.offset;
>>       GEM_BUG_ON(n >= view.partial.size);
>>   -    io = i915_vma_pin_iomap(vma);
>> +    io = i915_vma_pin_iomap_unlocked(vma);
>>       i915_vma_unpin(vma);
>>       if (IS_ERR(io)) {
>> -        pr_err("Failed to iomap partial view: offset=%lu; err=%d\n",
>> -               page, (int)PTR_ERR(io));
>>           err = PTR_ERR(io);
>> +        if (err != -EINTR && err != -ERESTARTSYS)
>> +            pr_err("Failed to iomap partial view: offset=%lu; 
>> err=%d\n",
>> +                   page, err);
>>           goto out;
>>       }
>>   @@ -219,12 +220,15 @@ static int check_partial_mappings(struct 
>> drm_i915_gem_object *obj,
>>           n = page - view.partial.offset;
>>           GEM_BUG_ON(n >= view.partial.size);
>>   -        io = i915_vma_pin_iomap(vma);
>> +        io = i915_vma_pin_iomap_unlocked(vma);
>>           i915_vma_unpin(vma);
>>           if (IS_ERR(io)) {
>> -            pr_err("Failed to iomap partial view: offset=%lu; 
>> err=%d\n",
>> -                   page, (int)PTR_ERR(io));
>> -            return PTR_ERR(io);
>> +            int err = PTR_ERR(io);
>> +
>> +            if (err != -EINTR && err != -ERESTARTSYS)
>> +                pr_err("Failed to iomap partial view: offset=%lu; 
>> err=%d\n",
>> +                       page, err);
>> +            return err;
>>           }
>>             iowrite32(page, io + n * PAGE_SIZE / sizeof(*io));
>> @@ -773,7 +777,7 @@ static int gtt_set(struct drm_i915_gem_object *obj)
>>           return PTR_ERR(vma);
>>         intel_gt_pm_get(vma->vm->gt);
>> -    map = i915_vma_pin_iomap(vma);
>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>       i915_vma_unpin(vma);
>>       if (IS_ERR(map)) {
>>           err = PTR_ERR(map);
>> @@ -799,7 +803,7 @@ static int gtt_check(struct drm_i915_gem_object 
>> *obj)
>>           return PTR_ERR(vma);
>>         intel_gt_pm_get(vma->vm->gt);
>> -    map = i915_vma_pin_iomap(vma);
>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>       i915_vma_unpin(vma);
>>       if (IS_ERR(map)) {
>>           err = PTR_ERR(map);
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>> b/drivers/gpu/drm/i915/i915_vma.c
>> index 8781c4f61952..069f22b3cd48 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -431,6 +431,13 @@ int i915_vma_bind(struct i915_vma *vma,
>>               work->pinned = i915_gem_object_get(vma->obj);
>>           }
>>       } else {
>> +        if (vma->obj) {
>> +            int ret;
>> +
>> +            ret = i915_gem_object_wait_moving_fence(vma->obj, true);
>> +            if (ret)
>> +                return ret;
>> +        }
>>           vma->ops->bind_vma(vma->vm, NULL, vma, cache_level, 
>> bind_flags);
>>       }
>>   @@ -455,6 +462,10 @@ void __iomem *i915_vma_pin_iomap(struct 
>> i915_vma *vma)
>>         ptr = READ_ONCE(vma->iomap);
>>       if (ptr == NULL) {
>> +        err = i915_gem_object_wait_moving_fence(vma->obj, true);
>> +        if (err)
>> +            goto err;
>> +
>>           /*
>>            * TODO: consider just using i915_gem_object_pin_map() for 
>> lmem
>>            * instead, which already supports mapping non-contiguous 
>> chunks
>> @@ -496,6 +507,25 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma 
>> *vma)
>>       return IO_ERR_PTR(err);
>>   }
>>   +void __iomem *i915_vma_pin_iomap_unlocked(struct i915_vma *vma)
>> +{
>> +    struct i915_gem_ww_ctx ww;
>> +    void __iomem *map;
>> +    int err;
>> +
>> +    for_i915_gem_ww(&ww, err, true) {
>> +        err = i915_gem_object_lock(vma->obj, &ww);
>> +        if (err)
>> +            continue;
>> +
>> +        map = i915_vma_pin_iomap(vma);
>> +    }
>> +    if (err)
>> +        map = IO_ERR_PTR(err);
>> +
>> +    return map;
>> +}
>
> What is the reason for this change? Is this strictly related to this 
> series/commit?

Yes, it's because pulling out the moving fence requires the dma_resv lock.
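
For reference, a condensed sketch of the two pieces involved, simplified
from the quoted patches (error propagation and releasing the signaled
fence are elided here; the comments are editorial, not from the series):

int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
                                      bool intr)
{
        struct dma_fence *fence = i915_gem_to_ttm(obj)->moving;

        /* The caller must hold the object's dma_resv lock. */
        assert_object_held(obj);
        if (!fence)
                return 0;

        return dma_fence_wait(fence, intr);
}

void __iomem *i915_vma_pin_iomap_unlocked(struct i915_vma *vma)
{
        struct i915_gem_ww_ctx ww;
        void __iomem *map;
        int err;

        /* Take the dma_resv lock so the wait above is legal. */
        for_i915_gem_ww(&ww, err, true) {
                err = i915_gem_object_lock(vma->obj, &ww);
                if (err)
                        continue;       /* -EDEADLK: back off and retry */

                map = i915_vma_pin_iomap(vma); /* may wait on bo->moving */
        }

        return err ? IO_ERR_PTR(err) : map;
}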

/Thomas



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 1/6] drm/i915: Add functions to set/get moving fence
  2021-11-15 12:39     ` [Intel-gfx] " Matthew Auld
@ 2021-11-15 12:44       ` Thomas Hellström
  -1 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-15 12:44 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx, dri-devel


On 11/15/21 13:39, Matthew Auld wrote:
> On 14/11/2021 11:12, Thomas Hellström wrote:
>> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>
>> We want to get rid of i915_vma tracking to simplify the code and
>> lifetimes. Add a way to set/put the moving fence, in preparation for
>> removing the tracking.
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c | 37 ++++++++++++++++++++++
>>   drivers/gpu/drm/i915/gem/i915_gem_object.h |  9 ++++++
>>   2 files changed, 46 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 591ee3cb7275..ec4313836597 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -33,6 +33,7 @@
>>   #include "i915_gem_object.h"
>>   #include "i915_memcpy.h"
>>   #include "i915_trace.h"
>> +#include "i915_gem_ttm.h"
>>     static struct kmem_cache *slab_objects;
>>   @@ -726,6 +727,42 @@ static const struct drm_gem_object_funcs 
>> i915_gem_object_funcs = {
>>       .export = i915_gem_prime_export,
>>   };
>>   +struct dma_fence *
>> +i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj)
>> +{
>> +    return dma_fence_get(i915_gem_to_ttm(obj)->moving);
>> +}
>> +
>> +void i915_gem_object_set_moving_fence(struct drm_i915_gem_object *obj,
>> +                      struct dma_fence *fence)
>> +{
>> +    dma_fence_put(i915_gem_to_ttm(obj)->moving);
>> +
>> +    i915_gem_to_ttm(obj)->moving = dma_fence_get(fence);
>> +}
>
> Are these also assert_object_held()? Should we maybe squash this patch 
> with the first user?

Yes, these should also have assert_object_held(). We could probably squash 
these, yes.
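
For illustration, this is roughly what the two helpers from the quoted
patch would look like with the suggested asserts folded in (sketch only,
not a posted revision):

struct dma_fence *
i915_gem_object_get_moving_fence(struct drm_i915_gem_object *obj)
{
        assert_object_held(obj);        /* suggested addition */

        return dma_fence_get(i915_gem_to_ttm(obj)->moving);
}

void i915_gem_object_set_moving_fence(struct drm_i915_gem_object *obj,
                                      struct dma_fence *fence)
{
        assert_object_held(obj);        /* suggested addition */

        dma_fence_put(i915_gem_to_ttm(obj)->moving);
        i915_gem_to_ttm(obj)->moving = dma_fence_get(fence);
}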



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 2/6] drm/i915: Add support for asynchronous moving fence waiting
  2021-11-15 12:42       ` [Intel-gfx] " Thomas Hellström
@ 2021-11-15 13:13         ` Matthew Auld
  -1 siblings, 0 replies; 40+ messages in thread
From: Matthew Auld @ 2021-11-15 13:13 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel

On 15/11/2021 12:42, Thomas Hellström wrote:
> 
> On 11/15/21 13:36, Matthew Auld wrote:
>> On 14/11/2021 11:12, Thomas Hellström wrote:
>>> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>
>>> For now, we will only allow async migration when TTM is used,
>>> so the paths we care about are related to TTM.
>>>
>>> The mmap path is handled by having the fence in ttm_bo->moving,
>>> when pinning, the binding only becomes available after the moving
>>> fence is signaled, and pinning a cpu map will only work after
>>> the moving fence signals.
>>>
>>> This should close all holes where userspace can read a buffer
>>> before it's fully migrated.
>>>
>>> v2:
>>> - Fix a couple of SPARSE warnings
>>> v3:
>>> - Fix a NULL pointer dereference
>>>
>>> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/display/intel_fbdev.c    |  7 ++--
>>>   drivers/gpu/drm/i915/display/intel_overlay.c  |  2 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_pages.c     |  6 +++
>>>   .../i915/gem/selftests/i915_gem_coherency.c   |  4 +-
>>>   .../drm/i915/gem/selftests/i915_gem_mman.c    | 22 ++++++-----
>>>   drivers/gpu/drm/i915/i915_vma.c               | 39 ++++++++++++++++++-
>>>   drivers/gpu/drm/i915/i915_vma.h               |  3 ++
>>>   drivers/gpu/drm/i915/selftests/i915_vma.c     |  4 +-
>>>   8 files changed, 69 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
>>> b/drivers/gpu/drm/i915/display/intel_fbdev.c
>>> index adc3a81be9f7..5902ad0c2bd8 100644
>>> --- a/drivers/gpu/drm/i915/display/intel_fbdev.c
>>> +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
>>> @@ -265,11 +265,12 @@ static int intelfb_create(struct drm_fb_helper 
>>> *helper,
>>>           info->fix.smem_len = vma->node.size;
>>>       }
>>>   -    vaddr = i915_vma_pin_iomap(vma);
>>> +    vaddr = i915_vma_pin_iomap_unlocked(vma);
>>>       if (IS_ERR(vaddr)) {
>>> -        drm_err(&dev_priv->drm,
>>> -            "Failed to remap framebuffer into virtual memory\n");
>>>           ret = PTR_ERR(vaddr);
>>> +        if (ret != -EINTR && ret != -ERESTARTSYS)
>>> +            drm_err(&dev_priv->drm,
>>> +                "Failed to remap framebuffer into virtual memory\n");
>>>           goto out_unpin;
>>>       }
>>>       info->screen_base = vaddr;
>>> diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c 
>>> b/drivers/gpu/drm/i915/display/intel_overlay.c
>>> index 7e3f5c6ca484..21593f3f2664 100644
>>> --- a/drivers/gpu/drm/i915/display/intel_overlay.c
>>> +++ b/drivers/gpu/drm/i915/display/intel_overlay.c
>>> @@ -1357,7 +1357,7 @@ static int get_registers(struct intel_overlay 
>>> *overlay, bool use_phys)
>>>           overlay->flip_addr = sg_dma_address(obj->mm.pages->sgl);
>>>       else
>>>           overlay->flip_addr = i915_ggtt_offset(vma);
>>> -    overlay->regs = i915_vma_pin_iomap(vma);
>>> +    overlay->regs = i915_vma_pin_iomap_unlocked(vma);
>>>       i915_vma_unpin(vma);
>>>         if (IS_ERR(overlay->regs)) {
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>> index c4f684b7cc51..49c6e55c68ce 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>> @@ -418,6 +418,12 @@ void *i915_gem_object_pin_map(struct 
>>> drm_i915_gem_object *obj,
>>>       }
>>>         if (!ptr) {
>>> +        err = i915_gem_object_wait_moving_fence(obj, true);
>>> +        if (err) {
>>> +            ptr = ERR_PTR(err);
>>> +            goto err_unpin;
>>> +        }
>>> +
>>>           if (GEM_WARN_ON(type == I915_MAP_WC &&
>>>                   !static_cpu_has(X86_FEATURE_PAT)))
>>>               ptr = ERR_PTR(-ENODEV);
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c 
>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
>>> index 13b088cc787e..067c512961ba 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
>>> @@ -101,7 +101,7 @@ static int gtt_set(struct context *ctx, unsigned 
>>> long offset, u32 v)
>>>         intel_gt_pm_get(vma->vm->gt);
>>>   -    map = i915_vma_pin_iomap(vma);
>>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>>       i915_vma_unpin(vma);
>>>       if (IS_ERR(map)) {
>>>           err = PTR_ERR(map);
>>> @@ -134,7 +134,7 @@ static int gtt_get(struct context *ctx, unsigned 
>>> long offset, u32 *v)
>>>         intel_gt_pm_get(vma->vm->gt);
>>>   -    map = i915_vma_pin_iomap(vma);
>>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>>       i915_vma_unpin(vma);
>>>       if (IS_ERR(map)) {
>>>           err = PTR_ERR(map);
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>> index 6d30cdfa80f3..5d54181c2145 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>> @@ -125,12 +125,13 @@ static int check_partial_mapping(struct 
>>> drm_i915_gem_object *obj,
>>>       n = page - view.partial.offset;
>>>       GEM_BUG_ON(n >= view.partial.size);
>>>   -    io = i915_vma_pin_iomap(vma);
>>> +    io = i915_vma_pin_iomap_unlocked(vma);
>>>       i915_vma_unpin(vma);
>>>       if (IS_ERR(io)) {
>>> -        pr_err("Failed to iomap partial view: offset=%lu; err=%d\n",
>>> -               page, (int)PTR_ERR(io));
>>>           err = PTR_ERR(io);
>>> +        if (err != -EINTR && err != -ERESTARTSYS)
>>> +            pr_err("Failed to iomap partial view: offset=%lu; 
>>> err=%d\n",
>>> +                   page, err);
>>>           goto out;
>>>       }
>>>   @@ -219,12 +220,15 @@ static int check_partial_mappings(struct 
>>> drm_i915_gem_object *obj,
>>>           n = page - view.partial.offset;
>>>           GEM_BUG_ON(n >= view.partial.size);
>>>   -        io = i915_vma_pin_iomap(vma);
>>> +        io = i915_vma_pin_iomap_unlocked(vma);
>>>           i915_vma_unpin(vma);
>>>           if (IS_ERR(io)) {
>>> -            pr_err("Failed to iomap partial view: offset=%lu; 
>>> err=%d\n",
>>> -                   page, (int)PTR_ERR(io));
>>> -            return PTR_ERR(io);
>>> +            int err = PTR_ERR(io);
>>> +
>>> +            if (err != -EINTR && err != -ERESTARTSYS)
>>> +                pr_err("Failed to iomap partial view: offset=%lu; 
>>> err=%d\n",
>>> +                       page, err);
>>> +            return err;
>>>           }
>>>             iowrite32(page, io + n * PAGE_SIZE / sizeof(*io));
>>> @@ -773,7 +777,7 @@ static int gtt_set(struct drm_i915_gem_object *obj)
>>>           return PTR_ERR(vma);
>>>         intel_gt_pm_get(vma->vm->gt);
>>> -    map = i915_vma_pin_iomap(vma);
>>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>>       i915_vma_unpin(vma);
>>>       if (IS_ERR(map)) {
>>>           err = PTR_ERR(map);
>>> @@ -799,7 +803,7 @@ static int gtt_check(struct drm_i915_gem_object 
>>> *obj)
>>>           return PTR_ERR(vma);
>>>         intel_gt_pm_get(vma->vm->gt);
>>> -    map = i915_vma_pin_iomap(vma);
>>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>>       i915_vma_unpin(vma);
>>>       if (IS_ERR(map)) {
>>>           err = PTR_ERR(map);
>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>> b/drivers/gpu/drm/i915/i915_vma.c
>>> index 8781c4f61952..069f22b3cd48 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>> @@ -431,6 +431,13 @@ int i915_vma_bind(struct i915_vma *vma,
>>>               work->pinned = i915_gem_object_get(vma->obj);
>>>           }
>>>       } else {
>>> +        if (vma->obj) {
>>> +            int ret;
>>> +
>>> +            ret = i915_gem_object_wait_moving_fence(vma->obj, true);
>>> +            if (ret)
>>> +                return ret;
>>> +        }
>>>           vma->ops->bind_vma(vma->vm, NULL, vma, cache_level, 
>>> bind_flags);
>>>       }
>>>   @@ -455,6 +462,10 @@ void __iomem *i915_vma_pin_iomap(struct 
>>> i915_vma *vma)
>>>         ptr = READ_ONCE(vma->iomap);
>>>       if (ptr == NULL) {
>>> +        err = i915_gem_object_wait_moving_fence(vma->obj, true);
>>> +        if (err)
>>> +            goto err;
>>> +
>>>           /*
>>>            * TODO: consider just using i915_gem_object_pin_map() for 
>>> lmem
>>>            * instead, which already supports mapping non-contiguous 
>>> chunks
>>> @@ -496,6 +507,25 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma 
>>> *vma)
>>>       return IO_ERR_PTR(err);
>>>   }
>>>   +void __iomem *i915_vma_pin_iomap_unlocked(struct i915_vma *vma)
>>> +{
>>> +    struct i915_gem_ww_ctx ww;
>>> +    void __iomem *map;
>>> +    int err;
>>> +
>>> +    for_i915_gem_ww(&ww, err, true) {
>>> +        err = i915_gem_object_lock(vma->obj, &ww);
>>> +        if (err)
>>> +            continue;
>>> +
>>> +        map = i915_vma_pin_iomap(vma);
>>> +    }
>>> +    if (err)
>>> +        map = IO_ERR_PTR(err);
>>> +
>>> +    return map;
>>> +}
>>
>> What is the reason for this change? Is this strictly related to this 
>> series/commit?
> 
> Yes, it's because pulling out the moving fence requires the dma_resv lock.

Ok, I was thinking that vma_pin_iomap is only ever called on an already 
bound GGTT vma, for which we do a synchronous wait_for_bind, but maybe 
that's not always true?

Reviewed-by: Matthew Auld <matthew.auld@intel.com>

> 
> /Thomas
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 2/6] drm/i915: Add support for asynchronous moving fence waiting
  2021-11-15 13:13         ` [Intel-gfx] " Matthew Auld
@ 2021-11-15 13:29           ` Thomas Hellström
  -1 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-15 13:29 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx, dri-devel


On 11/15/21 14:13, Matthew Auld wrote:
> On 15/11/2021 12:42, Thomas Hellström wrote:
>>
>> On 11/15/21 13:36, Matthew Auld wrote:
>>> On 14/11/2021 11:12, Thomas Hellström wrote:
>>>> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>>
>>>> For now, we will only allow async migration when TTM is used,
>>>> so the paths we care about are related to TTM.
>>>>
>>>> The mmap path is handled by having the fence in ttm_bo->moving,
>>>> when pinning, the binding only becomes available after the moving
>>>> fence is signaled, and pinning a cpu map will only work after
>>>> the moving fence signals.
>>>>
>>>> This should close all holes where userspace can read a buffer
>>>> before it's fully migrated.
>>>>
>>>> v2:
>>>> - Fix a couple of SPARSE warnings
>>>> v3:
>>>> - Fix a NULL pointer dereference
>>>>
>>>> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/display/intel_fbdev.c    |  7 ++--
>>>>   drivers/gpu/drm/i915/display/intel_overlay.c  |  2 +-
>>>>   drivers/gpu/drm/i915/gem/i915_gem_pages.c     |  6 +++
>>>>   .../i915/gem/selftests/i915_gem_coherency.c   |  4 +-
>>>>   .../drm/i915/gem/selftests/i915_gem_mman.c    | 22 ++++++-----
>>>>   drivers/gpu/drm/i915/i915_vma.c               | 39 
>>>> ++++++++++++++++++-
>>>>   drivers/gpu/drm/i915/i915_vma.h               |  3 ++
>>>>   drivers/gpu/drm/i915/selftests/i915_vma.c     |  4 +-
>>>>   8 files changed, 69 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
>>>> b/drivers/gpu/drm/i915/display/intel_fbdev.c
>>>> index adc3a81be9f7..5902ad0c2bd8 100644
>>>> --- a/drivers/gpu/drm/i915/display/intel_fbdev.c
>>>> +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
>>>> @@ -265,11 +265,12 @@ static int intelfb_create(struct 
>>>> drm_fb_helper *helper,
>>>>           info->fix.smem_len = vma->node.size;
>>>>       }
>>>>   -    vaddr = i915_vma_pin_iomap(vma);
>>>> +    vaddr = i915_vma_pin_iomap_unlocked(vma);
>>>>       if (IS_ERR(vaddr)) {
>>>> -        drm_err(&dev_priv->drm,
>>>> -            "Failed to remap framebuffer into virtual memory\n");
>>>>           ret = PTR_ERR(vaddr);
>>>> +        if (ret != -EINTR && ret != -ERESTARTSYS)
>>>> +            drm_err(&dev_priv->drm,
>>>> +                "Failed to remap framebuffer into virtual memory\n");
>>>>           goto out_unpin;
>>>>       }
>>>>       info->screen_base = vaddr;
>>>> diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c 
>>>> b/drivers/gpu/drm/i915/display/intel_overlay.c
>>>> index 7e3f5c6ca484..21593f3f2664 100644
>>>> --- a/drivers/gpu/drm/i915/display/intel_overlay.c
>>>> +++ b/drivers/gpu/drm/i915/display/intel_overlay.c
>>>> @@ -1357,7 +1357,7 @@ static int get_registers(struct intel_overlay 
>>>> *overlay, bool use_phys)
>>>>           overlay->flip_addr = sg_dma_address(obj->mm.pages->sgl);
>>>>       else
>>>>           overlay->flip_addr = i915_ggtt_offset(vma);
>>>> -    overlay->regs = i915_vma_pin_iomap(vma);
>>>> +    overlay->regs = i915_vma_pin_iomap_unlocked(vma);
>>>>       i915_vma_unpin(vma);
>>>>         if (IS_ERR(overlay->regs)) {
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>>> index c4f684b7cc51..49c6e55c68ce 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>>> @@ -418,6 +418,12 @@ void *i915_gem_object_pin_map(struct 
>>>> drm_i915_gem_object *obj,
>>>>       }
>>>>         if (!ptr) {
>>>> +        err = i915_gem_object_wait_moving_fence(obj, true);
>>>> +        if (err) {
>>>> +            ptr = ERR_PTR(err);
>>>> +            goto err_unpin;
>>>> +        }
>>>> +
>>>>           if (GEM_WARN_ON(type == I915_MAP_WC &&
>>>>                   !static_cpu_has(X86_FEATURE_PAT)))
>>>>               ptr = ERR_PTR(-ENODEV);
>>>> diff --git 
>>>> a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c 
>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
>>>> index 13b088cc787e..067c512961ba 100644
>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
>>>> @@ -101,7 +101,7 @@ static int gtt_set(struct context *ctx, 
>>>> unsigned long offset, u32 v)
>>>>         intel_gt_pm_get(vma->vm->gt);
>>>>   -    map = i915_vma_pin_iomap(vma);
>>>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>>>       i915_vma_unpin(vma);
>>>>       if (IS_ERR(map)) {
>>>>           err = PTR_ERR(map);
>>>> @@ -134,7 +134,7 @@ static int gtt_get(struct context *ctx, 
>>>> unsigned long offset, u32 *v)
>>>>         intel_gt_pm_get(vma->vm->gt);
>>>>   -    map = i915_vma_pin_iomap(vma);
>>>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>>>       i915_vma_unpin(vma);
>>>>       if (IS_ERR(map)) {
>>>>           err = PTR_ERR(map);
>>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>>> index 6d30cdfa80f3..5d54181c2145 100644
>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>>> @@ -125,12 +125,13 @@ static int check_partial_mapping(struct 
>>>> drm_i915_gem_object *obj,
>>>>       n = page - view.partial.offset;
>>>>       GEM_BUG_ON(n >= view.partial.size);
>>>>   -    io = i915_vma_pin_iomap(vma);
>>>> +    io = i915_vma_pin_iomap_unlocked(vma);
>>>>       i915_vma_unpin(vma);
>>>>       if (IS_ERR(io)) {
>>>> -        pr_err("Failed to iomap partial view: offset=%lu; err=%d\n",
>>>> -               page, (int)PTR_ERR(io));
>>>>           err = PTR_ERR(io);
>>>> +        if (err != -EINTR && err != -ERESTARTSYS)
>>>> +            pr_err("Failed to iomap partial view: offset=%lu; 
>>>> err=%d\n",
>>>> +                   page, err);
>>>>           goto out;
>>>>       }
>>>>   @@ -219,12 +220,15 @@ static int check_partial_mappings(struct 
>>>> drm_i915_gem_object *obj,
>>>>           n = page - view.partial.offset;
>>>>           GEM_BUG_ON(n >= view.partial.size);
>>>>   -        io = i915_vma_pin_iomap(vma);
>>>> +        io = i915_vma_pin_iomap_unlocked(vma);
>>>>           i915_vma_unpin(vma);
>>>>           if (IS_ERR(io)) {
>>>> -            pr_err("Failed to iomap partial view: offset=%lu; 
>>>> err=%d\n",
>>>> -                   page, (int)PTR_ERR(io));
>>>> -            return PTR_ERR(io);
>>>> +            int err = PTR_ERR(io);
>>>> +
>>>> +            if (err != -EINTR && err != -ERESTARTSYS)
>>>> +                pr_err("Failed to iomap partial view: offset=%lu; 
>>>> err=%d\n",
>>>> +                       page, err);
>>>> +            return err;
>>>>           }
>>>>             iowrite32(page, io + n * PAGE_SIZE / sizeof(*io));
>>>> @@ -773,7 +777,7 @@ static int gtt_set(struct drm_i915_gem_object 
>>>> *obj)
>>>>           return PTR_ERR(vma);
>>>>         intel_gt_pm_get(vma->vm->gt);
>>>> -    map = i915_vma_pin_iomap(vma);
>>>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>>>       i915_vma_unpin(vma);
>>>>       if (IS_ERR(map)) {
>>>>           err = PTR_ERR(map);
>>>> @@ -799,7 +803,7 @@ static int gtt_check(struct drm_i915_gem_object 
>>>> *obj)
>>>>           return PTR_ERR(vma);
>>>>         intel_gt_pm_get(vma->vm->gt);
>>>> -    map = i915_vma_pin_iomap(vma);
>>>> +    map = i915_vma_pin_iomap_unlocked(vma);
>>>>       i915_vma_unpin(vma);
>>>>       if (IS_ERR(map)) {
>>>>           err = PTR_ERR(map);
>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>> index 8781c4f61952..069f22b3cd48 100644
>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>> @@ -431,6 +431,13 @@ int i915_vma_bind(struct i915_vma *vma,
>>>>               work->pinned = i915_gem_object_get(vma->obj);
>>>>           }
>>>>       } else {
>>>> +        if (vma->obj) {
>>>> +            int ret;
>>>> +
>>>> +            ret = i915_gem_object_wait_moving_fence(vma->obj, true);
>>>> +            if (ret)
>>>> +                return ret;
>>>> +        }
>>>>           vma->ops->bind_vma(vma->vm, NULL, vma, cache_level, 
>>>> bind_flags);
>>>>       }
>>>>   @@ -455,6 +462,10 @@ void __iomem *i915_vma_pin_iomap(struct 
>>>> i915_vma *vma)
>>>>         ptr = READ_ONCE(vma->iomap);
>>>>       if (ptr == NULL) {
>>>> +        err = i915_gem_object_wait_moving_fence(vma->obj, true);
>>>> +        if (err)
>>>> +            goto err;
>>>> +
>>>>           /*
>>>>            * TODO: consider just using i915_gem_object_pin_map() 
>>>> for lmem
>>>>            * instead, which already supports mapping non-contiguous 
>>>> chunks
>>>> @@ -496,6 +507,25 @@ void __iomem *i915_vma_pin_iomap(struct 
>>>> i915_vma *vma)
>>>>       return IO_ERR_PTR(err);
>>>>   }
>>>>   +void __iomem *i915_vma_pin_iomap_unlocked(struct i915_vma *vma)
>>>> +{
>>>> +    struct i915_gem_ww_ctx ww;
>>>> +    void __iomem *map;
>>>> +    int err;
>>>> +
>>>> +    for_i915_gem_ww(&ww, err, true) {
>>>> +        err = i915_gem_object_lock(vma->obj, &ww);
>>>> +        if (err)
>>>> +            continue;
>>>> +
>>>> +        map = i915_vma_pin_iomap(vma);
>>>> +    }
>>>> +    if (err)
>>>> +        map = IO_ERR_PTR(err);
>>>> +
>>>> +    return map;
>>>> +}
>>>
>>> What is the reason for this change? Is this strictly related to this 
>>> series/commit?
>>
>> Yes, it's because pulling out the moving fence requires the dma_resv 
>> lock.
>
> Ok, I was thinking that vma_pin_iomap is only ever called on an
> already bound GGTT vma, for which we do a synchronous wait_for_bind,
> but maybe that's not always true?
>
Hmm, good point. We should probably replace that moving-fence wait in
vma_pin_iomap with an assert that the binding fence is indeed signaled
and error-free? Because if binding succeeded, there is no need to check
the moving fence.
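
Something along these lines, perhaps (only a rough sketch;
i915_vma_bind_error() is a hypothetical helper for pulling out the bind
fence error, not an existing function):

	/*
	 * Sketch: assert, rather than wait, that an already-bound GGTT vma
	 * has completed its bind without error before iomapping it.
	 */
	static void assert_vma_bound_and_idle(struct i915_vma *vma)
	{
		GEM_BUG_ON(!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND));
		GEM_BUG_ON(i915_vma_bind_error(vma)); /* hypothetical helper */
	}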

/Thomas



> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>
>>
>> /Thomas
>>
>>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 5/6] drm/i915/ttm: Implement asynchronous TTM moves
  2021-11-14 11:12   ` [Intel-gfx] " Thomas Hellström
@ 2021-11-15 17:16     ` Matthew Auld
  -1 siblings, 0 replies; 40+ messages in thread
From: Matthew Auld @ 2021-11-15 17:16 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx, dri-devel

On 14/11/2021 11:12, Thomas Hellström wrote:
> Don't wait sync while migrating, but rather make the GPU blit await the
> dependencies and add a moving fence to the object.
> 
> This also enables asynchronous VRAM management in that on eviction,
> rather than waiting for the moving fence to expire before freeing VRAM,
> it is freed immediately and the fence is stored with the VRAM manager and
> handed out to newly allocated objects to await before clears and swapins,
> or for kernel objects before setting up gpu vmas or mapping.
> 
> To collect dependencies before migrating, add a set of utilities that
> coalesce these to a single dma_fence.
> 
> What is still missing for fully asynchronous operation is asynchronous vma
> unbinding, which is still to be implemented.
> 
> This commit substantially reduces execution time in the gem_lmem_swapping
> test.
> 
> v2:
> - Make a couple of functions static.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c      |  10 +
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |   2 +-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 329 +++++++++++++++++--
>   drivers/gpu/drm/i915/gem/i915_gem_wait.c     |   4 +-
>   4 files changed, 318 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index a1df49378a0f..111a4282d779 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -326,6 +326,9 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
>   {
>   	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>   
> +	if (!obj)
> +		return false;
> +
>   	/*
>   	 * EXTERNAL objects should never be swapped out by TTM, instead we need
>   	 * to handle that ourselves. TTM will already skip such objects for us,
> @@ -448,6 +451,10 @@ static int i915_ttm_shrinker_release_pages(struct drm_i915_gem_object *obj,
>   	if (bo->ttm->page_flags & TTM_TT_FLAG_SWAPPED)
>   		return 0;
>   
> +	ret = ttm_bo_wait_ctx(bo, &ctx);
> +	if (ret)
> +		return ret;


Why do we need this wait here? And is it also needed for the purge case above?

> +
>   	bo->ttm->page_flags |= TTM_TT_FLAG_SWAPPED;
>   	ret = ttm_bo_validate(bo, &place, &ctx);
>   	if (ret) {
> @@ -549,6 +556,9 @@ static void i915_ttm_swap_notify(struct ttm_buffer_object *bo)
>   	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>   	int ret = i915_ttm_move_notify(bo);
>   
> +	if (!obj)
> +		return;

It looks like i915_ttm_move_notify(bo) has already dereferenced the GEM
object by this point. Or did something in there maybe nuke it?

> +
>   	GEM_WARN_ON(ret);
>   	GEM_WARN_ON(obj->ttm.cached_io_rsgt);
>   	if (!ret && obj->mm.madv != I915_MADV_WILLNEED)
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> index 82cdabb542be..9d698ad00853 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> @@ -37,7 +37,7 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo);
>   static inline struct drm_i915_gem_object *
>   i915_ttm_to_gem(struct ttm_buffer_object *bo)
>   {
> -	if (GEM_WARN_ON(bo->destroy != i915_ttm_bo_destroy))
> +	if (bo->destroy != i915_ttm_bo_destroy)
>   		return NULL;

So this would indicate a "ghost" object, or is this something else? How
worried should we be about this, as with the NULL GEM object checks
above? In general, do you know where we need those checks?

>   
>   	return container_of(bo, struct drm_i915_gem_object, __do_not_access);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index f35b386c56ca..ae2c49fc3500 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -3,6 +3,8 @@
>    * Copyright © 2021 Intel Corporation
>    */
>   
> +#include <linux/dma-fence-array.h>
> +
>   #include <drm/ttm/ttm_bo_driver.h>
>   
>   #include "i915_drv.h"
> @@ -41,6 +43,228 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>   }
>   #endif
>   
> +/**
> + * DOC: Set of utilities to dynamically collect dependencies and
> + * eventually coalesce them into a single fence which is fed into
> + * the migration code. That single fence is, in the case of dependencies
> + * from multiple contexts, a struct dma_fence_array, since the
> + * i915 request code can break that up and await the individual
> + * fences.

Would it make sense to add a few more details here for why we need
this? IIUC it looks like TTM expects a single context/timeline for a
pipelined move, like with that dma_fence_is_later() check in
pipeline_evict? Maybe there is something already documented in TTM we
can link to here?
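
For reference, the intended flow is visible further down in this patch
in prev_fence(); roughly (simplified sketch, error handling omitted):

	struct i915_deps deps;
	struct dma_fence *fence;

	i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
	i915_deps_add_dependency(&deps, bo->moving, ctx);
	i915_deps_add_resv(&deps, bo->base.resv, false, false, ctx);
	/* NULL, the single dependency, or a dma_fence_array of them */
	fence = i915_deps_to_fence(&deps, ctx);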

> + *
> + * While collecting the individual dependencies, we store the refcounted
> + * struct dma_fence pointers in a realloc-type-managed pointer array, since
> + * that can be easily fed into a dma_fence_array. Other options are
> + * available, like for example an xarray for similarity with drm/sched.
> + * Can be changed easily if needed.
> + *
> + * We might want to break this out into a separate file as a utility.
> + */
> +
> +#define I915_DEPS_MIN_ALLOC_CHUNK 8U
> +
> +/**
> + * struct i915_deps - Collect dependencies into a single dma-fence
> + * @single: Storage for pointer if the collection is a single fence.
> + * @fence: Allocated array of fence pointers if more than a single fence;
> + * otherwise points to the address of @single.
> + * @num_deps: Current number of dependency fences.
> + * @fences_size: Size of the @fences array in number of pointers.
> + * @gfp: Allocation mode.
> + */
> +struct i915_deps {
> +	struct dma_fence *single;
> +	struct dma_fence **fences;
> +	unsigned int num_deps;
> +	unsigned int fences_size;
> +	gfp_t gfp;
> +};
> +
> +static void i915_deps_reset_fences(struct i915_deps *deps)
> +{
> +	if (deps->fences != &deps->single)
> +		kfree(deps->fences);
> +	deps->num_deps = 0;
> +	deps->fences_size = 1;
> +	deps->fences = &deps->single;
> +}
> +
> +static void i915_deps_init(struct i915_deps *deps, gfp_t gfp)
> +{
> +	deps->fences = NULL;
> +	deps->gfp = gfp;
> +	i915_deps_reset_fences(deps);
> +}
> +
> +static void i915_deps_fini(struct i915_deps *deps)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < deps->num_deps; ++i)
> +		dma_fence_put(deps->fences[i]);
> +
> +	if (deps->fences != &deps->single)
> +		kfree(deps->fences);
> +}
> +
> +static int i915_deps_grow(struct i915_deps *deps, struct dma_fence *fence,
> +			  const struct ttm_operation_ctx *ctx)
> +{
> +	int ret;
> +
> +	if (deps->num_deps >= deps->fences_size) {
> +		unsigned int new_size = 2 * deps->fences_size;
> +		struct dma_fence **new_fences;
> +
> +		new_size = max(new_size, I915_DEPS_MIN_ALLOC_CHUNK);
> +		new_fences = kmalloc_array(new_size, sizeof(*new_fences), deps->gfp);
> +		if (!new_fences)
> +			goto sync;
> +
> +		memcpy(new_fences, deps->fences,
> +		       deps->fences_size * sizeof(*new_fences));
> +		swap(new_fences, deps->fences);
> +		if (new_fences != &deps->single)
> +			kfree(new_fences);
> +		deps->fences_size = new_size;
> +	}
> +	deps->fences[deps->num_deps++] = dma_fence_get(fence);
> +	return 0;
> +
> +sync:
> +	if (ctx->no_wait_gpu) {
> +		ret = -EBUSY;
> +		goto unref;
> +	}
> +
> +	ret = dma_fence_wait(fence, ctx->interruptible);
> +	if (ret)
> +		goto unref;
> +
> +	ret = fence->error;
> +	if (ret)
> +		goto unref;
> +
> +	return 0;
> +
> +unref:
> +	i915_deps_fini(deps);
> +	return ret;
> +}
> +
> +static int i915_deps_sync(struct i915_deps *deps,
> +			  const struct ttm_operation_ctx *ctx)
> +{
> +	unsigned int i;
> +	int ret = 0;
> +	struct dma_fence **fences = deps->fences;

Nit: Christmas tree.
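
I.e., order the local declarations longest line first (the usual
reverse-xmas-tree style), something like:

	struct dma_fence **fences = deps->fences;
	unsigned int i;
	int ret = 0;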

> +
> +	for (i = 0; i < deps->num_deps; ++i, ++fences) {
> +		if (ctx->no_wait_gpu) {
> +			ret = -EBUSY;
> +			goto unref;
> +		}
> +
> +		ret = dma_fence_wait(*fences, ctx->interruptible);
> +		if (ret)
> +			goto unref;
> +
> +		ret = (*fences)->error;
> +		if (ret)
> +			goto unref;
> +	}
> +
> +	i915_deps_fini(deps);
> +	return 0;
> +
> +unref:
> +	i915_deps_fini(deps);
> +	return ret;
> +}
> +
> +static int i915_deps_add_dependency(struct i915_deps *deps,
> +				    struct dma_fence *fence,
> +				    const struct ttm_operation_ctx *ctx)
> +{
> +	unsigned int i;
> +	int ret;
> +
> +	if (!fence)
> +		return 0;
> +
> +	if (dma_fence_is_signaled(fence)) {
> +		ret = fence->error;
> +		if (ret)
> +			i915_deps_fini(deps);
> +		return ret;
> +	}
> +
> +	for (i = 0; i < deps->num_deps; ++i) {
> +		struct dma_fence *entry = deps->fences[i];
> +
> +		if (!entry->context || entry->context != fence->context)
> +			continue;
> +
> +		if (dma_fence_is_later(fence, entry)) {
> +			dma_fence_put(entry);
> +			deps->fences[i] = dma_fence_get(fence);
> +		}
> +
> +		return 0;
> +	}
> +
> +	return i915_deps_grow(deps, fence, ctx);
> +}
> +
> +static struct dma_fence *i915_deps_to_fence(struct i915_deps *deps,
> +					    const struct ttm_operation_ctx *ctx)
> +{
> +	struct dma_fence_array *array;
> +
> +	if (deps->num_deps == 0)
> +		return NULL;
> +
> +	if (deps->num_deps == 1) {
> +		deps->num_deps = 0;
> +		return deps->fences[0];
> +	}
> +
> +	/*
> +	 * TODO: Alter the allocation mode here to not try too hard to
> +	 * make things async.
> +	 */
> +	array = dma_fence_array_create(deps->num_deps, deps->fences, 0, 0,
> +				       false);
> +	if (!array)
> +		return ERR_PTR(i915_deps_sync(deps, ctx));
> +
> +	deps->fences = NULL;
> +	i915_deps_reset_fences(deps);
> +
> +	return &array->base;
> +}
> +
> +static int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv,
> +			      bool all, const bool no_excl,
> +			      const struct ttm_operation_ctx *ctx)
> +{
> +	struct dma_resv_iter iter;
> +	struct dma_fence *fence;
> +
> +	dma_resv_assert_held(resv);
> +	dma_resv_for_each_fence(&iter, resv, all, fence) {
> +		int ret;
> +
> +		if (no_excl && !iter.index)
> +			continue;
> +
> +		ret = i915_deps_add_dependency(deps, fence, ctx);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
>   static enum i915_cache_level
>   i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
>   		     struct ttm_tt *ttm)
> @@ -156,7 +380,8 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   					     bool clear,
>   					     struct ttm_resource *dst_mem,
>   					     struct ttm_tt *dst_ttm,
> -					     struct sg_table *dst_st)
> +					     struct sg_table *dst_st,
> +					     struct dma_fence *dep)
>   {
>   	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
>   						     bdev);
> @@ -180,7 +405,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   			return ERR_PTR(-EINVAL);
>   
>   		intel_engine_pm_get(i915->gt.migrate.context->engine);
> -		ret = intel_context_migrate_clear(i915->gt.migrate.context, NULL,
> +		ret = intel_context_migrate_clear(i915->gt.migrate.context, dep,
>   						  dst_st->sgl, dst_level,
>   						  i915_ttm_gtt_binds_lmem(dst_mem),
>   						  0, &rq);
> @@ -194,7 +419,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
>   		intel_engine_pm_get(i915->gt.migrate.context->engine);
>   		ret = intel_context_migrate_copy(i915->gt.migrate.context,
> -						 NULL, src_rsgt->table.sgl,
> +						 dep, src_rsgt->table.sgl,
>   						 src_level,
>   						 i915_ttm_gtt_binds_lmem(bo->resource),
>   						 dst_st->sgl, dst_level,
> @@ -378,10 +603,11 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
>   	return &work->fence;
>   }
>   
> -static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
> -			    struct ttm_resource *dst_mem,
> -			    struct ttm_tt *dst_ttm,
> -			    struct i915_refct_sgt *dst_rsgt, bool allow_accel)
> +static struct dma_fence *
> +__i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
> +		struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm,
> +		struct i915_refct_sgt *dst_rsgt, bool allow_accel,
> +		struct dma_fence *move_dep)
>   {
>   	struct i915_ttm_memcpy_work *copy_work = NULL;
>   	struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
> @@ -389,7 +615,7 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
>   
>   	if (allow_accel) {
>   		fence = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm,
> -					    &dst_rsgt->table);
> +					    &dst_rsgt->table, move_dep);
>   
>   		/*
>   		 * We only need to intercept the error when moving to lmem.
> @@ -423,6 +649,11 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
>   
>   		if (!IS_ERR(fence))
>   			goto out;
> +	} else if (move_dep) {
> +		int err = dma_fence_wait(move_dep, true);
> +
> +		if (err)
> +			return ERR_PTR(err);
>   	}
>   
>   	/* Error intercept failed or no accelerated migration to start with */
> @@ -433,16 +664,35 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
>   	i915_ttm_memcpy_release(arg);
>   	kfree(copy_work);
>   
> -	return;
> +	return NULL;
>   out:
> -	/* Sync here for now, forward the fence to caller when fully async. */
> -	if (fence) {
> -		dma_fence_wait(fence, false);
> -		dma_fence_put(fence);
> -	} else if (copy_work) {
> +	if (!fence && copy_work) {
>   		i915_ttm_memcpy_release(arg);
>   		kfree(copy_work);
>   	}
> +
> +	return fence;
> +}
> +
> +static struct dma_fence *prev_fence(struct ttm_buffer_object *bo,
> +				    struct ttm_operation_ctx *ctx)
> +{
> +	struct i915_deps deps;
> +	int ret;
> +
> +	/*
> +	 * Instead of trying hard with GFP_KERNEL to allocate memory,
> +	 * the dependency collection will just sync if it doesn't
> +	 * succeed.
> +	 */
> +	i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
> +	ret = i915_deps_add_dependency(&deps, bo->moving, ctx);
> +	if (!ret)
> +		ret = i915_deps_add_resv(&deps, bo->base.resv, false, false, ctx);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	return i915_deps_to_fence(&deps, ctx);
>   }
>   
>   /**
> @@ -462,16 +712,12 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>   	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>   	struct ttm_resource_manager *dst_man =
>   		ttm_manager_type(bo->bdev, dst_mem->mem_type);
> +	struct dma_fence *migration_fence = NULL;
>   	struct ttm_tt *ttm = bo->ttm;
>   	struct i915_refct_sgt *dst_rsgt;
>   	bool clear;
>   	int ret;
>   
> -	/* Sync for now. We could do the actual copy async. */
> -	ret = ttm_bo_wait_ctx(bo, ctx);
> -	if (ret)
> -		return ret;
> -
>   	ret = i915_ttm_move_notify(bo);
>   	if (ret)
>   		return ret;
> @@ -494,10 +740,37 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>   		return PTR_ERR(dst_rsgt);
>   
>   	clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
> -	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)))
> -		__i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_rsgt, true);
> +	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
> +		struct dma_fence *dep = prev_fence(bo, ctx);
> +
> +		if (IS_ERR(dep)) {
> +			i915_refct_sgt_put(dst_rsgt);
> +			return PTR_ERR(dep);
> +		}
> +
> +		migration_fence = __i915_ttm_move(bo, clear, dst_mem, bo->ttm,
> +						  dst_rsgt, true, dep);
> +		dma_fence_put(dep);
> +	}
> +
> +	/* We can possibly get an -ERESTARTSYS here */
> +	if (IS_ERR(migration_fence)) {
> +		i915_refct_sgt_put(dst_rsgt);
> +		return PTR_ERR(migration_fence);
> +	}
> +
> +	if (migration_fence) {
> +		ret = ttm_bo_move_accel_cleanup(bo, migration_fence, evict,
> +						true, dst_mem);
> +		if (ret) {
> +			dma_fence_wait(migration_fence, false);
> +			ttm_bo_move_sync_cleanup(bo, dst_mem);
> +		}
> +		dma_fence_put(migration_fence);
> +	} else {
> +		ttm_bo_move_sync_cleanup(bo, dst_mem);
> +	}
>   
> -	ttm_bo_move_sync_cleanup(bo, dst_mem);
>   	i915_ttm_adjust_domains_after_move(obj);
>   	i915_ttm_free_cached_io_rsgt(obj);
>   
> @@ -538,6 +811,7 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
>   		.interruptible = intr,
>   	};
>   	struct i915_refct_sgt *dst_rsgt;
> +	struct dma_fence *copy_fence;
>   	int ret;
>   
>   	assert_object_held(dst);
> @@ -553,10 +827,17 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
>   		return ret;
>   
>   	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
> -	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
> -			dst_rsgt, allow_accel);
> +	copy_fence = __i915_ttm_move(src_bo, false, dst_bo->resource,
> +				     dst_bo->ttm, dst_rsgt, allow_accel, NULL);
>   
>   	i915_refct_sgt_put(dst_rsgt);
> +	if (IS_ERR(copy_fence))
> +		return PTR_ERR(copy_fence);
> +
> +	if (copy_fence) {
> +		dma_fence_wait(copy_fence, false);
> +		dma_fence_put(copy_fence);
> +	}
>   
>   	return 0;
>   }
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> index f909aaa09d9c..bae65796a6cc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> @@ -306,6 +306,6 @@ int i915_gem_object_wait_migration(struct drm_i915_gem_object *obj,
>   				   unsigned int flags)
>   {
>   	might_sleep();
> -	/* NOP for now. */
> -	return 0;
> +
> +	return i915_gem_object_wait_moving_fence(obj, !!(flags & I915_WAIT_INTERRUPTIBLE));
>   }
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 5/6] drm/i915/ttm: Implement asynchronous TTM moves
  2021-11-15 17:16     ` [Intel-gfx] " Matthew Auld
@ 2021-11-16  7:20       ` Thomas Hellström
  -1 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-16  7:20 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx, dri-devel


On 11/15/21 18:16, Matthew Auld wrote:

Thanks for reviewing, Matthew,

I'll take a look at the comments.

/Thomas


> On 14/11/2021 11:12, Thomas Hellström wrote:
>> Don't wait sync while migrating, but rather make the GPU blit await the
>> dependencies and add a moving fence to the object.
>>
>> This also enables asynchronous VRAM management in that on eviction,
>> rather than waiting for the moving fence to expire before freeing VRAM,
>> it is freed immediately and the fence is stored with the VRAM manager 
>> and
>> handed out to newly allocated objects to await before clears and 
>> swapins,
>> or for kernel objects before setting up gpu vmas or mapping.
>>
>> To collect dependencies before migrating, add a set of utilities that
>> coalesce these to a single dma_fence.
>>
>> What is still missing for fully asynchronous operation is 
>> asynchronous vma
>> unbinding, which is still to be implemented.
>>
>> This commit substantially reduces execution time in the 
>> gem_lmem_swapping
>> test.
>>
>> v2:
>> - Make a couple of functions static. 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Intel-gfx] [PATCH v3 5/6] drm/i915/ttm: Implement asynchronous TTM moves
  2021-11-15 17:16     ` [Intel-gfx] " Matthew Auld
@ 2021-11-18  7:13       ` Thomas Hellström
  -1 siblings, 0 replies; 40+ messages in thread
From: Thomas Hellström @ 2021-11-18  7:13 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx, dri-devel

Hi, Matthew

Finally got some time to look at this in more depth; please see below.

On Mon, 2021-11-15 at 17:16 +0000, Matthew Auld wrote:
> On 14/11/2021 11:12, Thomas Hellström wrote:
> > Don't wait sync while migrating, but rather make the GPU blit await
> > the
> > dependencies and add a moving fence to the object.
> > 
> > This also enables asynchronous VRAM management in that on eviction,
> > rather than waiting for the moving fence to expire before freeing
> > VRAM,
> > it is freed immediately and the fence is stored with the VRAM
> > manager and
> > handed out to newly allocated objects to await before clears and
> > swapins,
> > or for kernel objects before setting up gpu vmas or mapping.
> > 
> > To collect dependencies before migrating, add a set of utilities
> > that
> > coalesce these to a single dma_fence.
> > 
> > What is still missing for fully asynchronous operation is
> > asynchronous vma
> > unbinding, which is still to be implemented.
> > 
> > This commit substantially reduces execution time in the
> > gem_lmem_swapping
> > test.
> > 
> > v2:
> > - Make a couple of functions static.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_ttm.c      |  10 +
> >   drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |   2 +-
> >   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 329
> > +++++++++++++++++--
> >   drivers/gpu/drm/i915/gem/i915_gem_wait.c     |   4 +-
> >   4 files changed, 318 insertions(+), 27 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > index a1df49378a0f..111a4282d779 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > @@ -326,6 +326,9 @@ static bool i915_ttm_eviction_valuable(struct
> > ttm_buffer_object *bo,
> >   {
> >         struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> >   
> > +       if (!obj)
> > +               return false;
> > +
> >         /*
> >          * EXTERNAL objects should never be swapped out by TTM,
> > instead we need
> >          * to handle that ourselves. TTM will already skip such
> > objects for us,
> > @@ -448,6 +451,10 @@ static int
> > i915_ttm_shrinker_release_pages(struct drm_i915_gem_object *obj,
> >         if (bo->ttm->page_flags & TTM_TT_FLAG_SWAPPED)
> >                 return 0;
> >   
> > +       ret = ttm_bo_wait_ctx(bo, &ctx);
> > +       if (ret)
> > +               return ret;
> 
> 
> Why do we need this? Also not needed for the above purge case?

This is for bos with an ongoing async move to system. The
intel_migrate code doesn't set up vmas, so unbinding doesn't
necessarily idle. The purge code currently idles in TTM, but in both
cases we should probably add another argument to
shrinker_release_pages(), move this wait up so it also covers the
purge case, and return -EBUSY unless we have SHRINK_ACTIVE.
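
Something along these lines (untested sketch only; the argument names
and how the shrinker's SHRINK_ACTIVE state would be plumbed down to
here are assumptions, not the final interface):

static int i915_ttm_shrinker_release_pages(struct drm_i915_gem_object *obj,
					   bool no_gpu_wait,
					   bool should_writeback)
{
	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
	struct ttm_operation_ctx ctx = {
		.interruptible = true,
		/* Make ttm_bo_wait_ctx() return -EBUSY rather than block. */
		.no_wait_gpu = no_gpu_wait,
	};
	int ret;

	/*
	 * Wait (or bail) up front so that purging a still-moving object
	 * also fails with -EBUSY unless the caller allows active objects.
	 */
	ret = ttm_bo_wait_ctx(bo, &ctx);
	if (ret)
		return ret;

	/* ... existing purge and swap-out paths unchanged ... */
	return 0;
}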

> 
> > +
> >         bo->ttm->page_flags |= TTM_TT_FLAG_SWAPPED;
> >         ret = ttm_bo_validate(bo, &place, &ctx);
> >         if (ret) {
> > @@ -549,6 +556,9 @@ static void i915_ttm_swap_notify(struct
> > ttm_buffer_object *bo)
> >         struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> >         int ret = i915_ttm_move_notify(bo);
> >   
> > +       if (!obj)
> > +               return;
> 
> It looks like the i915_ttm_move_notify(bo) already dereferenced the
> GEM 
> bo. Or did something in there maybe nuke it?
> 
> > +
> >         GEM_WARN_ON(ret);
> >         GEM_WARN_ON(obj->ttm.cached_io_rsgt);
> >         if (!ret && obj->mm.madv != I915_MADV_WILLNEED)
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> > b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> > index 82cdabb542be..9d698ad00853 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
> > @@ -37,7 +37,7 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object
> > *bo);
> >   static inline struct drm_i915_gem_object *
> >   i915_ttm_to_gem(struct ttm_buffer_object *bo)
> >   {
> > -       if (GEM_WARN_ON(bo->destroy != i915_ttm_bo_destroy))
> > +       if (bo->destroy != i915_ttm_bo_destroy)
> >                 return NULL;
> 
> So this would indicate a "ghost" object, or is this something else?
> How 
> scared should we be with this, like with the above checking for NULL
> GEM 
> object state? In general do you know where we need the above
> checking?

Yeah, these are ghost objects, and this is a long-standing flaw in TTM:
some callbacks are per device rather than per object, so they can be
handed a ghost bo. Should have been fixed long ago :/. For the ttm_tt
callbacks, obj might be NULL and we must still be able to cope with
that; for the other callbacks we should simply ignore the ghost
objects. I'll do a second audit here.
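
I.e. every callback that can see a ghost bo ends up with a guard along
these lines (illustration only, the callback name is made up):

static void i915_ttm_example_callback(struct ttm_buffer_object *bo)
{
	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);

	/* Ghost bos created by TTM for pipelined moves have no GEM object. */
	if (!obj)
		return;

	/* ... operate on obj ... */
}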

> 
> >   
> >         return container_of(bo, struct drm_i915_gem_object,
> > __do_not_access);
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > index f35b386c56ca..ae2c49fc3500 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > @@ -3,6 +3,8 @@
> >    * Copyright © 2021 Intel Corporation
> >    */
> >   
> > +#include <linux/dma-fence-array.h>
> > +
> >   #include <drm/ttm/ttm_bo_driver.h>
> >   
> >   #include "i915_drv.h"
> > @@ -41,6 +43,228 @@ void i915_ttm_migrate_set_failure_modes(bool
> > gpu_migration,
> >   }
> >   #endif
> >   
> > +/**
> > + * DOC: Set of utilities to dynamically collect dependencies and
> > + * eventually coalesce them into a single fence which is fed into
> > + * the migration code. That single fence is, in the case of
> > dependencies
> > + * from multiple contexts, a struct dma_fence_array, since the
> > + * i915 request code can break that up and await the individual
> > + * fences.
> 
> this? IIUC it looks like TTM expects single context/timeline for 
> pipelined move, like with that dma_fence_is_later() check in 
> can link to here?

Yes, currently we only have a single migration fence from the migration
code, so we don't need this for pipelined moves yet. But with async
unbinding we do, and then we'd have to coalesce the unbind fences
together with the migration fence (we can allow reading from a bo while
migrating it) and then feed the result to the pipelined move cleanup
after attaching it to a timeline (using dma_fence_chain, I guess).
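
Something like the below, roughly (pure sketch for that future
async-unbind case; the function name and the seqno handling are
placeholders only):

static struct dma_fence *
i915_ttm_chain_move_fence(struct dma_fence *prev_on_timeline,
			  struct dma_fence *coalesced, u64 seqno)
{
	struct dma_fence_chain *chain = dma_fence_chain_alloc();

	if (!chain)
		return ERR_PTR(-ENOMEM);

	/* The chain node takes over the references to both fences. */
	dma_fence_chain_init(chain, prev_on_timeline, coalesced, seqno);

	/* A single, ordered fence context the move cleanup can consume. */
	return &chain->base;
}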

> 
> > + *
> > + * While collecting the individual dependencies, we store the
> > refcounted
> > + * struct dma_fence pointers in a realloc-type-managed pointer
> > array, since
> > + * that can be easily fed into a dma_fence_array. Other options
> > are
> > + * available, like for example an xarray for similarity with
> > drm/sched.
> > + * Can be changed easily if needed.
> > + *
> > + * We might want to break this out into a separate file as a
> > utility.
> > + */
> > +
> > +#define I915_DEPS_MIN_ALLOC_CHUNK 8U
> > +
> > +/**
> > + * struct i915_deps - Collect dependencies into a single dma-fence
> > + * @single: Storage for pointer if the collection is a single
> > fence.
> > + * @fence: Allocated array of fence pointers if more than a single
> > fence;
> > + * otherwise points to the address of @single.
> > + * @num_deps: Current number of dependency fences.
> > + * @fences_size: Size of the @fences array in number of pointers.
> > + * @gfp: Allocation mode.
> > + */
> > +struct i915_deps {
> > +       struct dma_fence *single;
> > +       struct dma_fence **fences;
> > +       unsigned int num_deps;
> > +       unsigned int fences_size;
> > +       gfp_t gfp;
> > +};
> > +
> > +static void i915_deps_reset_fences(struct i915_deps *deps)
> > +{
> > +       if (deps->fences != &deps->single)
> > +               kfree(deps->fences);
> > +       deps->num_deps = 0;
> > +       deps->fences_size = 1;
> > +       deps->fences = &deps->single;
> > +}
> > +
> > +static void i915_deps_init(struct i915_deps *deps, gfp_t gfp)
> > +{
> > +       deps->fences = NULL;
> > +       deps->gfp = gfp;
> > +       i915_deps_reset_fences(deps);
> > +}
> > +
> > +static void i915_deps_fini(struct i915_deps *deps)
> > +{
> > +       unsigned int i;
> > +
> > +       for (i = 0; i < deps->num_deps; ++i)
> > +               dma_fence_put(deps->fences[i]);
> > +
> > +       if (deps->fences != &deps->single)
> > +               kfree(deps->fences);
> > +}
> > +
> > +static int i915_deps_grow(struct i915_deps *deps, struct dma_fence
> > *fence,
> > +                         const struct ttm_operation_ctx *ctx)
> > +{
> > +       int ret;
> > +
> > +       if (deps->num_deps >= deps->fences_size) {
> > +               unsigned int new_size = 2 * deps->fences_size;
> > +               struct dma_fence **new_fences;
> > +
> > +               new_size = max(new_size,
> > I915_DEPS_MIN_ALLOC_CHUNK);
> > +               new_fences = kmalloc_array(new_size,
> > sizeof(*new_fences), deps->gfp);
> > +               if (!new_fences)
> > +                       goto sync;
> > +
> > +               memcpy(new_fences, deps->fences,
> > +                      deps->fences_size * sizeof(*new_fences));
> > +               swap(new_fences, deps->fences);
> > +               if (new_fences != &deps->single)
> > +                       kfree(new_fences);
> > +               deps->fences_size = new_size;
> > +       }
> > +       deps->fences[deps->num_deps++] = dma_fence_get(fence);
> > +       return 0;
> > +
> > +sync:
> > +       if (ctx->no_wait_gpu) {
> > +               ret = -EBUSY;
> > +               goto unref;
> > +       }
> > +
> > +       ret = dma_fence_wait(fence, ctx->interruptible);
> > +       if (ret)
> > +               goto unref;
> > +
> > +       ret = fence->error;
> > +       if (ret)
> > +               goto unref;
> > +
> > +       return 0;
> > +
> > +unref:
> > +       i915_deps_fini(deps);
> > +       return ret;
> > +}
> > +
> > +static int i915_deps_sync(struct i915_deps *deps,
> > +                         const struct ttm_operation_ctx *ctx)
> > +{
> > +       unsigned int i;
> > +       int ret = 0;
> > +       struct dma_fence **fences = deps->fences;
> 
> Nit: Christmas tree.

Will fix.

> 
> > +
> > +       for (i = 0; i < deps->num_deps; ++i, ++fences) {
> > +               if (ctx->no_wait_gpu) {
> > +                       ret = -EBUSY;
> > +                       goto unref;
> > +               }
> > +
> > +               ret = dma_fence_wait(*fences, ctx->interruptible);
> > +               if (ret)
> > +                       goto unref;
> > +
> > +               ret = (*fences)->error;
> > +               if (ret)
> > +                       goto unref;
> > +       }
> > +
> > +       i915_deps_fini(deps);
> > +       return 0;
> > +
> > +unref:
> > +       i915_deps_fini(deps);
> > +       return ret;
> > +}
> > +
> > +static int i915_deps_add_dependency(struct i915_deps *deps,
> > +                                   struct dma_fence *fence,
> > +                                   const struct ttm_operation_ctx
> > *ctx)
> > +{
> > +       unsigned int i;
> > +       int ret;
> > +
> > +       if (!fence)
> > +               return 0;
> > +
> > +       if (dma_fence_is_signaled(fence)) {
> > +               ret = fence->error;
> > +               if (ret)
> > +                       i915_deps_fini(deps);
> > +               return ret;
> > +       }
> > +
> > +       for (i = 0; i < deps->num_deps; ++i) {
> > +               struct dma_fence *entry = deps->fences[i];
> > +
> > +               if (!entry->context || entry->context != fence-
> > >context)
> > +                       continue;
> > +
> > +               if (dma_fence_is_later(fence, entry)) {
> > +                       dma_fence_put(entry);
> > +                       deps->fences[i] = dma_fence_get(fence);
> > +               }
> > +
> > +               return 0;
> > +       }
> > +
> > +       return i915_deps_grow(deps, fence, ctx);
> > +}
> > +
> > +static struct dma_fence *i915_deps_to_fence(struct i915_deps
> > *deps,
> > +                                           const struct
> > ttm_operation_ctx *ctx)
> > +{
> > +       struct dma_fence_array *array;
> > +
> > +       if (deps->num_deps == 0)
> > +               return NULL;
> > +
> > +       if (deps->num_deps == 1) {
> > +               deps->num_deps = 0;
> > +               return deps->fences[0];
> > +       }
> > +
> > +       /*
> > +        * TODO: Alter the allocation mode here to not try too hard
> > to
> > +        * make things async.
> > +        */
> > +       array = dma_fence_array_create(deps->num_deps, deps-
> > >fences, 0, 0,
> > +                                      false);
> > +       if (!array)
> > +               return ERR_PTR(i915_deps_sync(deps, ctx));
> > +
> > +       deps->fences = NULL;
> > +       i915_deps_reset_fences(deps);
> > +
> > +       return &array->base;
> > +}
> > +
> > +static int i915_deps_add_resv(struct i915_deps *deps, struct
> > dma_resv *resv,
> > +                             bool all, const bool no_excl,
> > +                             const struct ttm_operation_ctx *ctx)
> > +{
> > +       struct dma_resv_iter iter;
> > +       struct dma_fence *fence;
> > +
> > +       dma_resv_assert_held(resv);
> > +       dma_resv_for_each_fence(&iter, resv, all, fence) {
> > +               int ret;
> > +
> > +               if (no_excl && !iter.index)
> > +                       continue;
> > +
> > +               ret = i915_deps_add_dependency(deps, fence, ctx);
> > +               if (ret)
> > +                       return ret;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >   static enum i915_cache_level
> >   i915_ttm_cache_level(struct drm_i915_private *i915, struct
> > ttm_resource *res,
> >                      struct ttm_tt *ttm)
> > @@ -156,7 +380,8 @@ static struct dma_fence
> > *i915_ttm_accel_move(struct ttm_buffer_object *bo,
> >                                              bool clear,
> >                                              struct ttm_resource
> > *dst_mem,
> >                                              struct ttm_tt
> > *dst_ttm,
> > -                                            struct sg_table
> > *dst_st)
> > +                                            struct sg_table
> > *dst_st,
> > +                                            struct dma_fence *dep)
> >   {
> >         struct drm_i915_private *i915 = container_of(bo->bdev,
> > typeof(*i915),
> >                                                      bdev);
> > @@ -180,7 +405,7 @@ static struct dma_fence
> > *i915_ttm_accel_move(struct ttm_buffer_object *bo,
> >                         return ERR_PTR(-EINVAL);
> >   
> >                 intel_engine_pm_get(i915->gt.migrate.context-
> > >engine);
> > -               ret = intel_context_migrate_clear(i915-
> > >gt.migrate.context, NULL,
> > +               ret = intel_context_migrate_clear(i915-
> > >gt.migrate.context, dep,
> >                                                   dst_st->sgl,
> > dst_level,
> >                                                  
> > i915_ttm_gtt_binds_lmem(dst_mem),
> >                                                   0, &rq);
> > @@ -194,7 +419,7 @@ static struct dma_fence
> > *i915_ttm_accel_move(struct ttm_buffer_object *bo,
> >                 src_level = i915_ttm_cache_level(i915, bo-
> > >resource, src_ttm);
> >                 intel_engine_pm_get(i915->gt.migrate.context-
> > >engine);
> >                 ret = intel_context_migrate_copy(i915-
> > >gt.migrate.context,
> > -                                                NULL, src_rsgt-
> > >table.sgl,
> > +                                                dep, src_rsgt-
> > >table.sgl,
> >                                                  src_level,
> >                                                 
> > i915_ttm_gtt_binds_lmem(bo->resource),
> >                                                  dst_st->sgl,
> > dst_level,
> > @@ -378,10 +603,11 @@ i915_ttm_memcpy_work_arm(struct
> > i915_ttm_memcpy_work *work,
> >         return &work->fence;
> >   }
> >   
> > -static void __i915_ttm_move(struct ttm_buffer_object *bo, bool
> > clear,
> > -                           struct ttm_resource *dst_mem,
> > -                           struct ttm_tt *dst_ttm,
> > -                           struct i915_refct_sgt *dst_rsgt, bool
> > allow_accel)
> > +static struct dma_fence *
> > +__i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
> > +               struct ttm_resource *dst_mem, struct ttm_tt
> > *dst_ttm,
> > +               struct i915_refct_sgt *dst_rsgt, bool allow_accel,
> > +               struct dma_fence *move_dep)
> >   {
> >         struct i915_ttm_memcpy_work *copy_work = NULL;
> >         struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
> > @@ -389,7 +615,7 @@ static void __i915_ttm_move(struct
> > ttm_buffer_object *bo, bool clear,
> >   
> >         if (allow_accel) {
> >                 fence = i915_ttm_accel_move(bo, clear, dst_mem,
> > dst_ttm,
> > -                                           &dst_rsgt->table);
> > +                                           &dst_rsgt->table,
> > move_dep);
> >   
> >                 /*
> >                  * We only need to intercept the error when moving
> > to lmem.
> > @@ -423,6 +649,11 @@ static void __i915_ttm_move(struct
> > ttm_buffer_object *bo, bool clear,
> >   
> >                 if (!IS_ERR(fence))
> >                         goto out;
> > +       } else if (move_dep) {
> > +               int err = dma_fence_wait(move_dep, true);
> > +
> > +               if (err)
> > +                       return ERR_PTR(err);
> >         }
> >   
> >         /* Error intercept failed or no accelerated migration to
> > start with */
> > @@ -433,16 +664,35 @@ static void __i915_ttm_move(struct
> > ttm_buffer_object *bo, bool clear,
> >         i915_ttm_memcpy_release(arg);
> >         kfree(copy_work);
> >   
> > -       return;
> > +       return NULL;
> >   out:
> > -       /* Sync here for now, forward the fence to caller when
> > fully async. */
> > -       if (fence) {
> > -               dma_fence_wait(fence, false);
> > -               dma_fence_put(fence);
> > -       } else if (copy_work) {
> > +       if (!fence && copy_work) {
> >                 i915_ttm_memcpy_release(arg);
> >                 kfree(copy_work);
> >         }
> > +
> > +       return fence;
> > +}
> > +
> > +static struct dma_fence *prev_fence(struct ttm_buffer_object *bo,
> > +                                   struct ttm_operation_ctx *ctx)
> > +{
> > +       struct i915_deps deps;
> > +       int ret;
> > +
> > +       /*
> > +        * Instead of trying hard with GFP_KERNEL to allocate
> > memory,
> > +        * the dependency collection will just sync if it doesn't
> > +        * succeed.
> > +        */
> > +       i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY |
> > __GFP_NOWARN);
> > +       ret = i915_deps_add_dependency(&deps, bo->moving, ctx);
> > +       if (!ret)
> > +               ret = i915_deps_add_resv(&deps, bo->base.resv,
> > false, false, ctx);
> > +       if (ret)
> > +               return ERR_PTR(ret);
> > +
> > +       return i915_deps_to_fence(&deps, ctx);
> >   }
> >   
> >   /**
> > @@ -462,16 +712,12 @@ int i915_ttm_move(struct ttm_buffer_object
> > *bo, bool evict,
> >         struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> >         struct ttm_resource_manager *dst_man =
> >                 ttm_manager_type(bo->bdev, dst_mem->mem_type);
> > +       struct dma_fence *migration_fence = NULL;
> >         struct ttm_tt *ttm = bo->ttm;
> >         struct i915_refct_sgt *dst_rsgt;
> >         bool clear;
> >         int ret;
> >   
> > -       /* Sync for now. We could do the actual copy async. */
> > -       ret = ttm_bo_wait_ctx(bo, ctx);
> > -       if (ret)
> > -               return ret;
> > -
> >         ret = i915_ttm_move_notify(bo);
> >         if (ret)
> >                 return ret;
> > @@ -494,10 +740,37 @@ int i915_ttm_move(struct ttm_buffer_object
> > *bo, bool evict,
> >                 return PTR_ERR(dst_rsgt);
> >   
> >         clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm ||
> > !ttm_tt_is_populated(ttm));
> > -       if (!(clear && ttm && !(ttm->page_flags &
> > TTM_TT_FLAG_ZERO_ALLOC)))
> > -               __i915_ttm_move(bo, clear, dst_mem, bo->ttm,
> > dst_rsgt, true);
> > +       if (!(clear && ttm && !(ttm->page_flags &
> > TTM_TT_FLAG_ZERO_ALLOC))) {
> > +               struct dma_fence *dep = prev_fence(bo, ctx);
> > +
> > +               if (IS_ERR(dep)) {
> > +                       i915_refct_sgt_put(dst_rsgt);
> > +                       return PTR_ERR(dep);
> > +               }
> > +
> > +               migration_fence = __i915_ttm_move(bo, clear,
> > dst_mem, bo->ttm,
> > +                                                 dst_rsgt, true,
> > dep);
> > +               dma_fence_put(dep);
> > +       }
> > +
> > +       /* We can possibly get an -ERESTARTSYS here */
> > +       if (IS_ERR(migration_fence)) {
> > +               i915_refct_sgt_put(dst_rsgt);
> > +               return PTR_ERR(migration_fence);
> > +       }
> > +
> > +       if (migration_fence) {
> > +               ret = ttm_bo_move_accel_cleanup(bo,
> > migration_fence, evict,
> > +                                               true, dst_mem);
> > +               if (ret) {
> > +                       dma_fence_wait(migration_fence, false);
> > +                       ttm_bo_move_sync_cleanup(bo, dst_mem);
> > +               }
> > +               dma_fence_put(migration_fence);
> > +       } else {
> > +               ttm_bo_move_sync_cleanup(bo, dst_mem);
> > +       }
> >   
> > -       ttm_bo_move_sync_cleanup(bo, dst_mem);
> >         i915_ttm_adjust_domains_after_move(obj);
> >         i915_ttm_free_cached_io_rsgt(obj);
> >   
> > @@ -538,6 +811,7 @@ int i915_gem_obj_copy_ttm(struct
> > drm_i915_gem_object *dst,
> >                 .interruptible = intr,
> >         };
> >         struct i915_refct_sgt *dst_rsgt;
> > +       struct dma_fence *copy_fence;
> >         int ret;
> >   
> >         assert_object_held(dst);
> > @@ -553,10 +827,17 @@ int i915_gem_obj_copy_ttm(struct
> > drm_i915_gem_object *dst,
> >                 return ret;
> >   
> >         dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
> > -       __i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo-
> > >ttm,
> > -                       dst_rsgt, allow_accel);
> > +       copy_fence = __i915_ttm_move(src_bo, false, dst_bo-
> > >resource,
> > +                                    dst_bo->ttm, dst_rsgt,
> > allow_accel, NULL);
> >   
> >         i915_refct_sgt_put(dst_rsgt);
> > +       if (IS_ERR(copy_fence))
> > +               return PTR_ERR(copy_fence);
> > +
> > +       if (copy_fence) {
> > +               dma_fence_wait(copy_fence, false);
> > +               dma_fence_put(copy_fence);
> > +       }
> >   
> >         return 0;
> >   }
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > index f909aaa09d9c..bae65796a6cc 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > @@ -306,6 +306,6 @@ int i915_gem_object_wait_migration(struct
> > drm_i915_gem_object *obj,
> >                                    unsigned int flags)
> >   {
> >         might_sleep();
> > -       /* NOP for now. */
> > -       return 0;
> > +
> > +       return i915_gem_object_wait_moving_fence(obj, !!(flags &
> > I915_WAIT_INTERRUPTIBLE));
> >   }
> > 



^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-11-18  7:13 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-14 11:12 [PATCH v3 0/6] drm/i915/ttm: Async migration Thomas Hellström
2021-11-14 11:12 ` [Intel-gfx] " Thomas Hellström
2021-11-14 11:12 ` [PATCH v3 1/6] drm/i915: Add functions to set/get moving fence Thomas Hellström
2021-11-14 11:12   ` [Intel-gfx] " Thomas Hellström
2021-11-15 12:39   ` Matthew Auld
2021-11-15 12:39     ` [Intel-gfx] " Matthew Auld
2021-11-15 12:44     ` Thomas Hellström
2021-11-15 12:44       ` [Intel-gfx] " Thomas Hellström
2021-11-14 11:12 ` [PATCH v3 2/6] drm/i915: Add support for asynchronous moving fence waiting Thomas Hellström
2021-11-14 11:12   ` [Intel-gfx] " Thomas Hellström
2021-11-15 12:36   ` Matthew Auld
2021-11-15 12:36     ` [Intel-gfx] " Matthew Auld
2021-11-15 12:42     ` Thomas Hellström
2021-11-15 12:42       ` [Intel-gfx] " Thomas Hellström
2021-11-15 13:13       ` Matthew Auld
2021-11-15 13:13         ` [Intel-gfx] " Matthew Auld
2021-11-15 13:29         ` Thomas Hellström
2021-11-15 13:29           ` [Intel-gfx] " Thomas Hellström
2021-11-14 11:12 ` [PATCH v3 3/6] drm/i915/ttm: Move the i915_gem_obj_copy_ttm() function Thomas Hellström
2021-11-14 11:12   ` [Intel-gfx] " Thomas Hellström
2021-11-15 10:42   ` Matthew Auld
2021-11-15 10:42     ` [Intel-gfx] " Matthew Auld
2021-11-14 11:12 ` [PATCH v3 4/6] drm/i915/ttm: Break refcounting loops at device region unref time Thomas Hellström
2021-11-14 11:12   ` [Intel-gfx] " Thomas Hellström
2021-11-15 10:49   ` Matthew Auld
2021-11-15 10:49     ` [Intel-gfx] " Matthew Auld
2021-11-14 11:12 ` [PATCH v3 5/6] drm/i915/ttm: Implement asynchronous TTM moves Thomas Hellström
2021-11-14 11:12   ` [Intel-gfx] " Thomas Hellström
2021-11-15 17:16   ` Matthew Auld
2021-11-15 17:16     ` [Intel-gfx] " Matthew Auld
2021-11-16  7:20     ` Thomas Hellström
2021-11-16  7:20       ` [Intel-gfx] " Thomas Hellström
2021-11-18  7:13     ` Thomas Hellström
2021-11-18  7:13       ` Thomas Hellström
2021-11-14 11:12 ` [PATCH v3 6/6] drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous Thomas Hellström
2021-11-14 11:12   ` [Intel-gfx] " Thomas Hellström
2021-11-14 11:25 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/ttm: Async migration (rev4) Patchwork
2021-11-14 11:28 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
2021-11-14 11:52 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-11-14 13:32 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
