* [PATCH 0/6] drm/i915: Failsafe migration blits
@ 2021-10-08 13:35 ` Thomas Hellström
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-08 13:35 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: maarten.lankhorst, matthew.auld, Thomas Hellström

This patch series introduces failsafe migration blits.
The reason for this seemingly strange concept is that if the initial
clearing or readback of LMEM fails for some reason, and we then set up
either GPU or CPU PTEs to the allocated LMEM, we can expose stale
contents from other clients.

So after each migration blit we attach a struct dma_fence_work that checks
the error value and, if it indicates an error, performs a memcpy blit instead.
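To illustrate the idea, here is a minimal sketch of such an error-intercepting
work callback. The error field on struct dma_fence_work is the one added in
patch 1 and struct i915_refct_sgt is introduced in patch 2; the argument
struct and the memcpy helper below are hypothetical and only stand in for
what patch 3 actually does:

  struct i915_migrate_copy_work {
          struct dma_fence_work base;
          /* Kept alive across the async blit, see patch 2. */
          struct i915_refct_sgt *src_rsgt, *dst_rsgt;
  };

  static void i915_migrate_copy_work_cb(struct dma_fence_work *work)
  {
          struct i915_migrate_copy_work *cw =
                  container_of(work, typeof(*cw), base);

          if (!work->error)
                  return; /* The blit succeeded, nothing to do. */

          /*
           * The blit failed. Copy or clear with the CPU instead so that
           * stale LMEM contents are never exposed to the client.
           */
          i915_memcpy_from_rsgt(cw->dst_rsgt, cw->src_rsgt); /* hypothetical */
  }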

This comes with some needed infrastructure updates:

Patch 1 updates dma_fence_work to do the work even if there is an error.
The work callback needs to check for an error and act accordingly.
Patch 2 introduces refcounted sg-tables. The sg-tables are needed
asynchronously for the memcpy.
Patch 3 introduces the failsafe migration blits and selftests.
Patch 4 adds the possibility to attach a struct dma_fence_work to a timeline.
Patch 5 attaches the migration fence to a timeline, since TTM requires that
for upcoming async eviction.
Patch 6 adds an optimization for coalescing-only struct dma_fence_work.

Worth considering during review: Patches 4-6 are probably better done in the
context of struct dma_fence_array, both because we perhaps shouldn't add
irq work to yet another fence data structure and because the i915 command
submission can individualize struct dma_fence_arrays.

Also, the memcpy solution here isn't a final one, as it only works if the
aperture covers all of LMEM. We probably need to work on a solution where
we intercept move_fence errors and refuse GPU and CPU mappings.
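As a rough illustration of that direction (nothing in this series implements
it, and the helper below is made up), a CPU fault or GPU bind path could check
the move fence before setting up PTEs:

  /* Hypothetical check; no such helper exists in this series. */
  static int refuse_mapping_on_failed_move(struct drm_i915_gem_object *obj)
  {
          struct dma_fence *moving =
                  i915_gem_object_get_moving_fence(obj); /* hypothetical */
          long err = 0;

          if (moving) {
                  err = dma_fence_wait(moving, true);
                  if (!err && moving->error)
                          err = moving->error; /* migration failed, refuse the mapping */
                  dma_fence_put(moving);
          }

          return err;
  }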

Thomas Hellström (6):
  drm/i915: Update dma_fence_work
  drm/i915: Introduce refcounted sg-tables
  drm/i915/ttm: Failsafe migration blits
  drm/i915: Add a struct dma_fence_work timeline
  drm/i915/ttm: Attach the migration fence to a region timeline on
    eviction
  drm/i915: Use irq work for coalescing-only dma-fence-work

 drivers/gpu/drm/i915/gem/i915_gem_clflush.c   |   5 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 467 ++++++++++++++----
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   4 +
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  24 +-
 drivers/gpu/drm/i915/i915_scatterlist.c       |  62 ++-
 drivers/gpu/drm/i915/i915_scatterlist.h       |  76 ++-
 drivers/gpu/drm/i915/i915_sw_fence_work.c     | 145 +++++-
 drivers/gpu/drm/i915/i915_sw_fence_work.h     |  61 +++
 drivers/gpu/drm/i915/i915_vma.c               |  12 +-
 drivers/gpu/drm/i915/intel_memory_region.c    |  43 ++
 drivers/gpu/drm/i915/intel_memory_region.h    |   7 +
 drivers/gpu/drm/i915/intel_region_ttm.c       |  15 +-
 drivers/gpu/drm/i915/intel_region_ttm.h       |   5 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 15 files changed, 776 insertions(+), 165 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/6] drm/i915: Update dma_fence_work
  2021-10-08 13:35 ` [Intel-gfx] " Thomas Hellström
@ 2021-10-08 13:35   ` Thomas Hellström
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-08 13:35 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: maarten.lankhorst, matthew.auld, Thomas Hellström

Move the release callback to after fence signaling to align with
what's done for upcoming VM_BIND user-fence signaling.

Finally, call the work callback regardless of whether we have a fence
error, and update the existing callbacks accordingly. We will need this
to intercept the error for failsafe migration.
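The resulting contract for users mirrors the clflush and vma-bind hunks
below; the names in this sketch are made up:

  static void my_work(struct dma_fence_work *f)
  {
          if (f->error) {
                  /* Propagate the error from awaited fences and bail. */
                  dma_fence_set_error(&f->dma, f->error);
                  return;
          }

          /* ... do the actual work ... */
  }

  static const struct dma_fence_work_ops my_ops = {
          .name = "my_work",
          .work = my_work,
          /* .release, if set, now runs after the fence has signaled. */
  };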

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_clflush.c |  5 +++
 drivers/gpu/drm/i915/i915_sw_fence_work.c   | 36 ++++++++++-----------
 drivers/gpu/drm/i915/i915_sw_fence_work.h   |  1 +
 drivers/gpu/drm/i915/i915_vma.c             | 12 +++++--
 4 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
index f0435c6feb68..2143ebaf5b6f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
@@ -28,6 +28,11 @@ static void clflush_work(struct dma_fence_work *base)
 {
 	struct clflush *clflush = container_of(base, typeof(*clflush), base);
 
+	if (base->error) {
+		dma_fence_set_error(&base->dma, base->error);
+		return;
+	}
+
 	__do_clflush(clflush->obj);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index 5b33ef23d54c..5b55cddafc9b 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -6,21 +6,24 @@
 
 #include "i915_sw_fence_work.h"
 
-static void fence_complete(struct dma_fence_work *f)
+static void dma_fence_work_complete(struct dma_fence_work *f)
 {
+	dma_fence_signal(&f->dma);
+
 	if (f->ops->release)
 		f->ops->release(f);
-	dma_fence_signal(&f->dma);
+
+	dma_fence_put(&f->dma);
 }
 
-static void fence_work(struct work_struct *work)
+static void dma_fence_work_work(struct work_struct *work)
 {
 	struct dma_fence_work *f = container_of(work, typeof(*f), work);
 
-	f->ops->work(f);
+	if (f->ops->work)
+		f->ops->work(f);
 
-	fence_complete(f);
-	dma_fence_put(&f->dma);
+	dma_fence_work_complete(f);
 }
 
 static int __i915_sw_fence_call
@@ -31,17 +34,13 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	switch (state) {
 	case FENCE_COMPLETE:
 		if (fence->error)
-			dma_fence_set_error(&f->dma, fence->error);
-
-		if (!f->dma.error) {
-			dma_fence_get(&f->dma);
-			if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
-				fence_work(&f->work);
-			else
-				queue_work(system_unbound_wq, &f->work);
-		} else {
-			fence_complete(f);
-		}
+			cmpxchg(&f->error, 0, fence->error);
+
+		dma_fence_get(&f->dma);
+		if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
+			dma_fence_work_work(&f->work);
+		else
+			queue_work(system_unbound_wq, &f->work);
 		break;
 
 	case FENCE_FREE:
@@ -84,10 +83,11 @@ void dma_fence_work_init(struct dma_fence_work *f,
 			 const struct dma_fence_work_ops *ops)
 {
 	f->ops = ops;
+	f->error = 0;
 	spin_lock_init(&f->lock);
 	dma_fence_init(&f->dma, &fence_ops, &f->lock, 0, 0);
 	i915_sw_fence_init(&f->chain, fence_notify);
-	INIT_WORK(&f->work, fence_work);
+	INIT_WORK(&f->work, dma_fence_work_work);
 }
 
 int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal)
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
index d56806918d13..caa59fb5252b 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
@@ -24,6 +24,7 @@ struct dma_fence_work_ops {
 struct dma_fence_work {
 	struct dma_fence dma;
 	spinlock_t lock;
+	int error;
 
 	struct i915_sw_fence chain;
 	struct i915_sw_dma_fence_cb cb;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 4b7fc4647e46..5123ac28ad9a 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -301,6 +301,11 @@ static void __vma_bind(struct dma_fence_work *work)
 	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
 	struct i915_vma *vma = vw->vma;
 
+	if (work->error) {
+		dma_fence_set_error(&work->dma, work->error);
+		return;
+	}
+
 	vma->ops->bind_vma(vw->vm, &vw->stash,
 			   vma, vw->cache_level, vw->flags);
 }
@@ -333,7 +338,7 @@ struct i915_vma_work *i915_vma_work(void)
 		return NULL;
 
 	dma_fence_work_init(&vw->base, &bind_ops);
-	vw->base.dma.error = -EAGAIN; /* disable the worker by default */
+	vw->base.error = -EAGAIN; /* disable the worker by default */
 
 	return vw;
 }
@@ -416,6 +421,9 @@ int i915_vma_bind(struct i915_vma *vma,
 		 * part of the obj->resv->excl_fence as it only affects
 		 * execution and not content or object's backing store lifetime.
 		 */
+
+		work->base.error = 0; /* enable the queue_work() */
+
 		prev = i915_active_set_exclusive(&vma->active, &work->base.dma);
 		if (prev) {
 			__i915_sw_fence_await_dma_fence(&work->base.chain,
@@ -424,8 +432,6 @@ int i915_vma_bind(struct i915_vma *vma,
 			dma_fence_put(prev);
 		}
 
-		work->base.dma.error = 0; /* enable the queue_work() */
-
 		if (vma->obj) {
 			__i915_gem_object_pin_pages(vma->obj);
 			work->pinned = i915_gem_object_get(vma->obj);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/6] drm/i915: Introduce refcounted sg-tables
  2021-10-08 13:35 ` [Intel-gfx] " Thomas Hellström
@ 2021-10-08 13:35   ` Thomas Hellström
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-08 13:35 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: maarten.lankhorst, matthew.auld, Thomas Hellström

As we start to introduce asynchronous failsafe object migration,
where we update the object state and then submit asynchronous
commands, we need to record what memory resources are actually used
by various parts of the command stream. Initially for three purposes:

1) Error capture.
2) Asynchronous migration error recovery.
3) Asynchronous vma bind.

At the time these happen, the object state may have been updated
to be several migrations ahead and the object sg-tables discarded.

In order to make it possible to keep sg-tables with memory resource
information for these operations, introduce refcounted sg-tables that
aren't freed until the last user is done with them.
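For example (the async work struct below is made up; the refct-sgt helpers
are the ones added in this patch), an async operation keeps its sg-table
alive by holding a reference until its release callback runs:

  struct async_copy_work {
          struct dma_fence_work base;
          struct i915_refct_sgt *dst_rsgt;
  };

  static void async_copy_release(struct dma_fence_work *work)
  {
          struct async_copy_work *cw = container_of(work, typeof(*cw), base);

          /* Drop our reference; the table is freed with the last user. */
          i915_refct_sgt_put(cw->dst_rsgt);
  }

  /* At submission, before the object may migrate again: */
  static void async_copy_prepare(struct async_copy_work *cw,
                                 struct i915_refct_sgt *dst_rsgt)
  {
          cw->dst_rsgt = i915_refct_sgt_get(dst_rsgt);
  }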

The alternative would be to reference information sitting on the
corresponding ttm_resources, which typically have the same lifetime as
these refcounted sg-tables, but that leads to other awkward constructs:
due to the design direction chosen for TTM resource managers, that would
lead to diamond-style inheritance; the LMEM resources may sometimes be
prematurely freed; and finally the subclassed struct ttm_resource would
have to bleed into the asynchronous vma bind code.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 159 +++++++++++-------
 drivers/gpu/drm/i915/i915_scatterlist.c       |  62 +++++--
 drivers/gpu/drm/i915/i915_scatterlist.h       |  76 ++++++++-
 drivers/gpu/drm/i915/intel_region_ttm.c       |  15 +-
 drivers/gpu/drm/i915/intel_region_ttm.h       |   5 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 7 files changed, 238 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 7c3da4e3e737..d600cf7ceb35 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -485,6 +485,7 @@ struct drm_i915_gem_object {
 		 */
 		struct list_head region_link;
 
+		struct i915_refct_sgt *rsgt;
 		struct sg_table *pages;
 		void *mapping;
 
@@ -538,7 +539,7 @@ struct drm_i915_gem_object {
 	} mm;
 
 	struct {
-		struct sg_table *cached_io_st;
+		struct i915_refct_sgt *cached_io_rsgt;
 		struct i915_gem_object_page_iter get_io_page;
 		struct drm_i915_gem_object *backup;
 		bool created:1;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 74a1ffd0d7dd..4b4d7457bef9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -34,7 +34,7 @@
  * struct i915_ttm_tt - TTM page vector with additional private information
  * @ttm: The base TTM page vector.
  * @dev: The struct device used for dma mapping and unmapping.
- * @cached_st: The cached scatter-gather table.
+ * @cached_rsgt: The cached scatter-gather table.
  *
  * Note that DMA may be going on right up to the point where the page-
  * vector is unpopulated in delayed destroy. Hence keep the
@@ -45,7 +45,7 @@
 struct i915_ttm_tt {
 	struct ttm_tt ttm;
 	struct device *dev;
-	struct sg_table *cached_st;
+	struct i915_refct_sgt cached_rsgt;
 };
 
 static const struct ttm_place sys_placement_flags = {
@@ -179,6 +179,21 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj,
 	placement->busy_placement = busy;
 }
 
+static void i915_ttm_tt_release(struct kref *ref)
+{
+	struct i915_ttm_tt *i915_tt =
+		container_of(ref, typeof(*i915_tt), cached_rsgt.kref);
+	struct sg_table *st = &i915_tt->cached_rsgt.table;
+
+	GEM_WARN_ON(st->sgl);
+
+	kfree(i915_tt);
+}
+
+static const struct i915_refct_sgt_ops tt_rsgt_ops = {
+	.release = i915_ttm_tt_release
+};
+
 static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 					 uint32_t page_flags)
 {
@@ -203,6 +218,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 		return NULL;
 	}
 
+	i915_refct_sgt_init_ops(&i915_tt->cached_rsgt, bo->base.size,
+				&tt_rsgt_ops);
 	i915_tt->dev = obj->base.dev->dev;
 
 	return &i915_tt->ttm;
@@ -211,13 +228,13 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 static void i915_ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
 {
 	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
+	struct sg_table *st = &i915_tt->cached_rsgt.table;
+
+	GEM_WARN_ON(kref_read(&i915_tt->cached_rsgt.kref) != 1);
 
-	if (i915_tt->cached_st) {
-		dma_unmap_sgtable(i915_tt->dev, i915_tt->cached_st,
-				  DMA_BIDIRECTIONAL, 0);
-		sg_free_table(i915_tt->cached_st);
-		kfree(i915_tt->cached_st);
-		i915_tt->cached_st = NULL;
+	if (st->sgl) {
+		dma_unmap_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL, 0);
+		sg_free_table(st);
 	}
 	ttm_pool_free(&bdev->pool, ttm);
 }
@@ -226,8 +243,10 @@ static void i915_ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
 {
 	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
 
+	GEM_WARN_ON(kref_read(&i915_tt->cached_rsgt.kref) != 1);
+
 	ttm_tt_fini(ttm);
-	kfree(i915_tt);
+	i915_refct_sgt_put(&i915_tt->cached_rsgt);
 }
 
 static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
@@ -261,12 +280,12 @@ static int i915_ttm_move_notify(struct ttm_buffer_object *bo)
 	return 0;
 }
 
-static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
+static void i915_ttm_free_cached_io_rsgt(struct drm_i915_gem_object *obj)
 {
 	struct radix_tree_iter iter;
 	void __rcu **slot;
 
-	if (!obj->ttm.cached_io_st)
+	if (!obj->ttm.cached_io_rsgt)
 		return;
 
 	rcu_read_lock();
@@ -274,9 +293,8 @@ static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
 		radix_tree_delete(&obj->ttm.get_io_page.radix, iter.index);
 	rcu_read_unlock();
 
-	sg_free_table(obj->ttm.cached_io_st);
-	kfree(obj->ttm.cached_io_st);
-	obj->ttm.cached_io_st = NULL;
+	i915_refct_sgt_put(obj->ttm.cached_io_rsgt);
+	obj->ttm.cached_io_rsgt = NULL;
 }
 
 static void
@@ -347,7 +365,7 @@ static void i915_ttm_purge(struct drm_i915_gem_object *obj)
 		obj->write_domain = 0;
 		obj->read_domains = 0;
 		i915_ttm_adjust_gem_after_move(obj);
-		i915_ttm_free_cached_io_st(obj);
+		i915_ttm_free_cached_io_rsgt(obj);
 		obj->mm.madv = __I915_MADV_PURGED;
 	}
 }
@@ -358,7 +376,7 @@ static void i915_ttm_swap_notify(struct ttm_buffer_object *bo)
 	int ret = i915_ttm_move_notify(bo);
 
 	GEM_WARN_ON(ret);
-	GEM_WARN_ON(obj->ttm.cached_io_st);
+	GEM_WARN_ON(obj->ttm.cached_io_rsgt);
 	if (!ret && obj->mm.madv != I915_MADV_WILLNEED)
 		i915_ttm_purge(obj);
 }
@@ -369,7 +387,7 @@ static void i915_ttm_delete_mem_notify(struct ttm_buffer_object *bo)
 
 	if (likely(obj)) {
 		__i915_gem_object_pages_fini(obj);
-		i915_ttm_free_cached_io_st(obj);
+		i915_ttm_free_cached_io_rsgt(obj);
 	}
 }
 
@@ -389,40 +407,35 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
 					  ttm_mem_type - I915_PL_LMEM0);
 }
 
-static struct sg_table *i915_ttm_tt_get_st(struct ttm_tt *ttm)
+static struct i915_refct_sgt *i915_ttm_tt_get_st(struct ttm_tt *ttm)
 {
 	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
 	struct sg_table *st;
 	int ret;
 
-	if (i915_tt->cached_st)
-		return i915_tt->cached_st;
-
-	st = kzalloc(sizeof(*st), GFP_KERNEL);
-	if (!st)
-		return ERR_PTR(-ENOMEM);
+	if (i915_tt->cached_rsgt.table.sgl)
+		return i915_refct_sgt_get(&i915_tt->cached_rsgt);
 
+	st = &i915_tt->cached_rsgt.table;
 	ret = sg_alloc_table_from_pages_segment(st,
 			ttm->pages, ttm->num_pages,
 			0, (unsigned long)ttm->num_pages << PAGE_SHIFT,
 			i915_sg_segment_size(), GFP_KERNEL);
 	if (ret) {
-		kfree(st);
+		st->sgl = NULL;
 		return ERR_PTR(ret);
 	}
 
 	ret = dma_map_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL, 0);
 	if (ret) {
 		sg_free_table(st);
-		kfree(st);
 		return ERR_PTR(ret);
 	}
 
-	i915_tt->cached_st = st;
-	return st;
+	return i915_refct_sgt_get(&i915_tt->cached_rsgt);
 }
 
-static struct sg_table *
+static struct i915_refct_sgt *
 i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 			 struct ttm_resource *res)
 {
@@ -436,7 +449,21 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 	 * the resulting st. Might make sense for GGTT.
 	 */
 	GEM_WARN_ON(!cpu_maps_iomem(res));
-	return intel_region_ttm_resource_to_st(obj->mm.region, res);
+	if (bo->resource == res) {
+		if (!obj->ttm.cached_io_rsgt) {
+			struct i915_refct_sgt *rsgt;
+
+			rsgt = intel_region_ttm_resource_to_rsgt(obj->mm.region,
+								 res);
+			if (IS_ERR(rsgt))
+				return rsgt;
+
+			obj->ttm.cached_io_rsgt = rsgt;
+		}
+		return i915_refct_sgt_get(obj->ttm.cached_io_rsgt);
+	}
+
+	return intel_region_ttm_resource_to_rsgt(obj->mm.region, res);
 }
 
 static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
@@ -447,10 +474,7 @@ static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
 {
 	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
 						     bdev);
-	struct ttm_resource_manager *src_man =
-		ttm_manager_type(bo->bdev, bo->resource->mem_type);
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
-	struct sg_table *src_st;
 	struct i915_request *rq;
 	struct ttm_tt *src_ttm = bo->ttm;
 	enum i915_cache_level src_level, dst_level;
@@ -476,17 +500,22 @@ static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
 		}
 		intel_engine_pm_put(i915->gt.migrate.context->engine);
 	} else {
-		src_st = src_man->use_tt ? i915_ttm_tt_get_st(src_ttm) :
-			obj->ttm.cached_io_st;
+		struct i915_refct_sgt *src_rsgt =
+			i915_ttm_resource_get_st(obj, bo->resource);
+
+		if (IS_ERR(src_rsgt))
+			return PTR_ERR(src_rsgt);
 
 		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
 		intel_engine_pm_get(i915->gt.migrate.context->engine);
 		ret = intel_context_migrate_copy(i915->gt.migrate.context,
-						 NULL, src_st->sgl, src_level,
+						 NULL, src_rsgt->table.sgl,
+						 src_level,
 						 gpu_binds_iomem(bo->resource),
 						 dst_st->sgl, dst_level,
 						 gpu_binds_iomem(dst_mem),
 						 &rq);
+		i915_refct_sgt_put(src_rsgt);
 		if (!ret && rq) {
 			i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);
 			i915_request_put(rq);
@@ -500,13 +529,14 @@ static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
 static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 			    struct ttm_resource *dst_mem,
 			    struct ttm_tt *dst_ttm,
-			    struct sg_table *dst_st,
+			    struct i915_refct_sgt *dst_rsgt,
 			    bool allow_accel)
 {
 	int ret = -EINVAL;
 
 	if (allow_accel)
-		ret = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm, dst_st);
+		ret = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm,
+					  &dst_rsgt->table);
 	if (ret) {
 		struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
 		struct intel_memory_region *dst_reg, *src_reg;
@@ -523,12 +553,13 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 		dst_iter = !cpu_maps_iomem(dst_mem) ?
 			ttm_kmap_iter_tt_init(&_dst_iter.tt, dst_ttm) :
 			ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
-						 dst_st, dst_reg->region.start);
+						 &dst_rsgt->table,
+						 dst_reg->region.start);
 
 		src_iter = !cpu_maps_iomem(bo->resource) ?
 			ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
 			ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
-						 obj->ttm.cached_io_st,
+						 &obj->ttm.cached_io_rsgt->table,
 						 src_reg->region.start);
 
 		ttm_move_memcpy(clear, dst_mem->num_pages, dst_iter, src_iter);
@@ -544,7 +575,7 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	struct ttm_resource_manager *dst_man =
 		ttm_manager_type(bo->bdev, dst_mem->mem_type);
 	struct ttm_tt *ttm = bo->ttm;
-	struct sg_table *dst_st;
+	struct i915_refct_sgt *dst_rsgt;
 	bool clear;
 	int ret;
 
@@ -570,22 +601,24 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 			return ret;
 	}
 
-	dst_st = i915_ttm_resource_get_st(obj, dst_mem);
-	if (IS_ERR(dst_st))
-		return PTR_ERR(dst_st);
+	dst_rsgt = i915_ttm_resource_get_st(obj, dst_mem);
+	if (IS_ERR(dst_rsgt))
+		return PTR_ERR(dst_rsgt);
 
 	clear = !cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
 	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)))
-		__i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_st, true);
+		__i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_rsgt, true);
 
 	ttm_bo_move_sync_cleanup(bo, dst_mem);
 	i915_ttm_adjust_domains_after_move(obj);
-	i915_ttm_free_cached_io_st(obj);
+	i915_ttm_free_cached_io_rsgt(obj);
 
 	if (gpu_binds_iomem(dst_mem) || cpu_maps_iomem(dst_mem)) {
-		obj->ttm.cached_io_st = dst_st;
-		obj->ttm.get_io_page.sg_pos = dst_st->sgl;
+		obj->ttm.cached_io_rsgt = dst_rsgt;
+		obj->ttm.get_io_page.sg_pos = dst_rsgt->table.sgl;
 		obj->ttm.get_io_page.sg_idx = 0;
+	} else {
+		i915_refct_sgt_put(dst_rsgt);
 	}
 
 	i915_ttm_adjust_gem_after_move(obj);
@@ -649,7 +682,6 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj,
 		.interruptible = true,
 		.no_wait_gpu = false,
 	};
-	struct sg_table *st;
 	int real_num_busy;
 	int ret;
 
@@ -687,12 +719,16 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj,
 	}
 
 	if (!i915_gem_object_has_pages(obj)) {
-		/* Object either has a page vector or is an iomem object */
-		st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
-		if (IS_ERR(st))
-			return PTR_ERR(st);
+		struct i915_refct_sgt *rsgt =
+			i915_ttm_resource_get_st(obj, bo->resource);
+
+		if (IS_ERR(rsgt))
+			return PTR_ERR(rsgt);
 
-		__i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
+		GEM_BUG_ON(obj->mm.rsgt);
+		obj->mm.rsgt = rsgt;
+		__i915_gem_object_set_pages(obj, &rsgt->table,
+					    i915_sg_dma_sizes(rsgt->table.sgl));
 	}
 
 	return ret;
@@ -766,6 +802,11 @@ static void i915_ttm_put_pages(struct drm_i915_gem_object *obj,
 	 * and shrinkers will move it out if needed.
 	 */
 
+	if (obj->mm.rsgt) {
+		i915_refct_sgt_put(obj->mm.rsgt);
+		obj->mm.rsgt = NULL;
+	}
+
 	i915_ttm_adjust_lru(obj);
 }
 
@@ -1023,7 +1064,7 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 	struct ttm_operation_ctx ctx = {
 		.interruptible = intr,
 	};
-	struct sg_table *dst_st;
+	struct i915_refct_sgt *dst_rsgt;
 	int ret;
 
 	assert_object_held(dst);
@@ -1038,11 +1079,11 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 	if (ret)
 		return ret;
 
-	dst_st = gpu_binds_iomem(dst_bo->resource) ?
-		dst->ttm.cached_io_st : i915_ttm_tt_get_st(dst_bo->ttm);
-
+	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
 	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
-			dst_st, allow_accel);
+			dst_rsgt, allow_accel);
+
+	i915_refct_sgt_put(dst_rsgt);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c b/drivers/gpu/drm/i915/i915_scatterlist.c
index 4a6712dca838..8a510ee5d1ad 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.c
+++ b/drivers/gpu/drm/i915/i915_scatterlist.c
@@ -41,8 +41,32 @@ bool i915_sg_trim(struct sg_table *orig_st)
 	return true;
 }
 
+static void i915_refct_sgt_release(struct kref *ref)
+{
+	struct i915_refct_sgt *rsgt =
+		container_of(ref, typeof(*rsgt), kref);
+
+	sg_free_table(&rsgt->table);
+	kfree(rsgt);
+}
+
+static const struct i915_refct_sgt_ops rsgt_ops = {
+	.release = i915_refct_sgt_release
+};
+
+/**
+ * i915_refct_sgt_init - Initialize a struct i915_refct_sgt with default ops
+ * @rsgt: The struct i915_refct_sgt to initialize.
+ * @size: The size of the underlying memory buffer.
+ */
+void i915_refct_sgt_init(struct i915_refct_sgt *rsgt, size_t size)
+{
+	i915_refct_sgt_init_ops(rsgt, size, &rsgt_ops);
+}
+
 /**
- * i915_sg_from_mm_node - Create an sg_table from a struct drm_mm_node
+ * i915_rsgt_from_mm_node - Create a refcounted sg_table from a struct
+ * drm_mm_node
  * @node: The drm_mm_node.
  * @region_start: An offset to add to the dma addresses of the sg list.
  *
@@ -50,25 +74,28 @@ bool i915_sg_trim(struct sg_table *orig_st)
  * taking a maximum segment length into account, splitting into segments
  * if necessary.
  *
- * Return: A pointer to a kmalloced struct sg_table on success, negative
+ * Return: A pointer to a kmalloced struct i915_refct_sgt on success, negative
  * error code cast to an error pointer on failure.
  */
-struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
-				      u64 region_start)
+struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,
+					      u64 region_start)
 {
 	const u64 max_segment = SZ_1G; /* Do we have a limit on this? */
 	u64 segment_pages = max_segment >> PAGE_SHIFT;
 	u64 block_size, offset, prev_end;
+	struct i915_refct_sgt *rsgt;
 	struct sg_table *st;
 	struct scatterlist *sg;
 
-	st = kmalloc(sizeof(*st), GFP_KERNEL);
-	if (!st)
+	rsgt = kmalloc(sizeof(*rsgt), GFP_KERNEL);
+	if (!rsgt)
 		return ERR_PTR(-ENOMEM);
 
+	i915_refct_sgt_init(rsgt, node->size << PAGE_SHIFT);
+	st = &rsgt->table;
 	if (sg_alloc_table(st, DIV_ROUND_UP(node->size, segment_pages),
 			   GFP_KERNEL)) {
-		kfree(st);
+		i915_refct_sgt_put(rsgt);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -104,11 +131,11 @@ struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
 	sg_mark_end(sg);
 	i915_sg_trim(st);
 
-	return st;
+	return rsgt;
 }
 
 /**
- * i915_sg_from_buddy_resource - Create an sg_table from a struct
+ * i915_rsgt_from_buddy_resource - Create a refcounted sg_table from a struct
  * i915_buddy_block list
  * @res: The struct i915_ttm_buddy_resource.
  * @region_start: An offset to add to the dma addresses of the sg list.
@@ -117,11 +144,11 @@ struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
  * taking a maximum segment length into account, splitting into segments
  * if necessary.
  *
- * Return: A pointer to a kmalloced struct sg_table on success, negative
+ * Return: A pointer to a kmalloced struct i915_refct_sgt on success, negative
  * error code cast to an error pointer on failure.
  */
-struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
-					     u64 region_start)
+struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
+						     u64 region_start)
 {
 	struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
 	const u64 size = res->num_pages << PAGE_SHIFT;
@@ -129,18 +156,21 @@ struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
 	struct i915_buddy_mm *mm = bman_res->mm;
 	struct list_head *blocks = &bman_res->blocks;
 	struct i915_buddy_block *block;
+	struct i915_refct_sgt *rsgt;
 	struct scatterlist *sg;
 	struct sg_table *st;
 	resource_size_t prev_end;
 
 	GEM_BUG_ON(list_empty(blocks));
 
-	st = kmalloc(sizeof(*st), GFP_KERNEL);
-	if (!st)
+	rsgt = kmalloc(sizeof(*rsgt), GFP_KERNEL);
+	if (!rsgt)
 		return ERR_PTR(-ENOMEM);
 
+	i915_refct_sgt_init(rsgt, size);
+	st = &rsgt->table;
 	if (sg_alloc_table(st, res->num_pages, GFP_KERNEL)) {
-		kfree(st);
+		i915_refct_sgt_put(rsgt);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -181,7 +211,7 @@ struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
 	sg_mark_end(sg);
 	i915_sg_trim(st);
 
-	return st;
+	return rsgt;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h b/drivers/gpu/drm/i915/i915_scatterlist.h
index b8bd5925b03f..321fd4a9f777 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.h
+++ b/drivers/gpu/drm/i915/i915_scatterlist.h
@@ -144,10 +144,78 @@ static inline unsigned int i915_sg_segment_size(void)
 
 bool i915_sg_trim(struct sg_table *orig_st);
 
-struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
-				      u64 region_start);
+/**
+ * struct i915_refct_sgt_ops - Operations structure for struct i915_refct_sgt
+ */
+struct i915_refct_sgt_ops {
+	/**
+	 * release() - Free the memory of the struct i915_refct_sgt
+	 * @ref: struct kref that is embedded in the struct i915_refct_sgt
+	 */
+	void (*release)(struct kref *ref);
+};
+
+/**
+ * struct i915_refct_sgt - A refcounted scatter-gather table
+ * @kref: struct kref for refcounting
+ * @table: struct sg_table holding the scatter-gather table itself. Note that
+ * @table->sgl = NULL can be used to determine whether a scatter-gather table
+ * is present or not.
+ * @size: The size in bytes of the underlying memory buffer
+ * @ops: The operations structure.
+ */
+struct i915_refct_sgt {
+	struct kref kref;
+	struct sg_table table;
+	size_t size;
+	const struct i915_refct_sgt_ops *ops;
+};
+
+/**
+ * i915_refct_sgt_put - Put a refcounted sg-table
+ * @rsgt: The struct i915_refct_sgt to put.
+ */
+static inline void i915_refct_sgt_put(struct i915_refct_sgt *rsgt)
+{
+	if (rsgt)
+		kref_put(&rsgt->kref, rsgt->ops->release);
+}
+
+/**
+ * i915_refct_sgt_get - Get a refcounted sg-table
+ * @rsgt: The struct i915_refct_sgt to get.
+ */
+static inline struct i915_refct_sgt *
+i915_refct_sgt_get(struct i915_refct_sgt *rsgt)
+{
+	kref_get(&rsgt->kref);
+	return rsgt;
+}
+
+/**
+ * i915_refct_sgt_init_ops - Initialize a refcounted sg-list with a custom
+ * operations structure
+ * @rsgt: The struct i915_refct_sgt to initialize.
+ * @size: Size in bytes of the underlying memory buffer.
+ * @ops: A customized operations structure in case the refcounted sg-list
+ * is embedded into another structure.
+ */
+static inline void i915_refct_sgt_init_ops(struct i915_refct_sgt *rsgt,
+					   size_t size,
+					   const struct i915_refct_sgt_ops *ops)
+{
+	kref_init(&rsgt->kref);
+	rsgt->table.sgl = NULL;
+	rsgt->size = size;
+	rsgt->ops = ops;
+}
+
+void i915_refct_sgt_init(struct i915_refct_sgt *rsgt, size_t size);
+
+struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,
+					      u64 region_start);
 
-struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
-					     u64 region_start);
+struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
+						     u64 region_start);
 
 #endif
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 98c7339bf8ba..2e901a27e259 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -115,8 +115,8 @@ void intel_region_ttm_fini(struct intel_memory_region *mem)
 }
 
 /**
- * intel_region_ttm_resource_to_st - Convert an opaque TTM resource manager resource
- * to an sg_table.
+ * intel_region_ttm_resource_to_rsgt -
+ * Convert an opaque TTM resource manager resource to a refcounted sg_table.
  * @mem: The memory region.
  * @res: The resource manager resource obtained from the TTM resource manager.
  *
@@ -126,17 +126,18 @@ void intel_region_ttm_fini(struct intel_memory_region *mem)
  *
  * Return: A malloced sg_table on success, an error pointer on failure.
  */
-struct sg_table *intel_region_ttm_resource_to_st(struct intel_memory_region *mem,
-						 struct ttm_resource *res)
+struct i915_refct_sgt *
+intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
+				  struct ttm_resource *res)
 {
 	if (mem->is_range_manager) {
 		struct ttm_range_mgr_node *range_node =
 			to_ttm_range_mgr_node(res);
 
-		return i915_sg_from_mm_node(&range_node->mm_nodes[0],
-					    mem->region.start);
+		return i915_rsgt_from_mm_node(&range_node->mm_nodes[0],
+					      mem->region.start);
 	} else {
-		return i915_sg_from_buddy_resource(res, mem->region.start);
+		return i915_rsgt_from_buddy_resource(res, mem->region.start);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
index 6f44075920f2..7bbe2b46b504 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.h
+++ b/drivers/gpu/drm/i915/intel_region_ttm.h
@@ -22,8 +22,9 @@ int intel_region_ttm_init(struct intel_memory_region *mem);
 
 void intel_region_ttm_fini(struct intel_memory_region *mem);
 
-struct sg_table *intel_region_ttm_resource_to_st(struct intel_memory_region *mem,
-						 struct ttm_resource *res);
+struct i915_refct_sgt *
+intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
+				  struct ttm_resource *res);
 
 void intel_region_ttm_resource_free(struct intel_memory_region *mem,
 				    struct ttm_resource *res);
diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c
index efa86dffe3c6..2752b5b98f60 100644
--- a/drivers/gpu/drm/i915/selftests/mock_region.c
+++ b/drivers/gpu/drm/i915/selftests/mock_region.c
@@ -17,9 +17,9 @@
 static void mock_region_put_pages(struct drm_i915_gem_object *obj,
 				  struct sg_table *pages)
 {
+	i915_refct_sgt_put(obj->mm.rsgt);
+	obj->mm.rsgt = NULL;
 	intel_region_ttm_resource_free(obj->mm.region, obj->mm.res);
-	sg_free_table(pages);
-	kfree(pages);
 }
 
 static int mock_region_get_pages(struct drm_i915_gem_object *obj)
@@ -38,12 +38,14 @@ static int mock_region_get_pages(struct drm_i915_gem_object *obj)
 	if (IS_ERR(obj->mm.res))
 		return PTR_ERR(obj->mm.res);
 
-	pages = intel_region_ttm_resource_to_st(obj->mm.region, obj->mm.res);
-	if (IS_ERR(pages)) {
-		err = PTR_ERR(pages);
+	obj->mm.rsgt = intel_region_ttm_resource_to_rsgt(obj->mm.region,
+							 obj->mm.res);
+	if (IS_ERR(obj->mm.rsgt)) {
+		err = PTR_ERR(obj->mm.rsgt);
 		goto err_free_resource;
 	}
 
+	pages = &obj->mm.rsgt->table;
 	__i915_gem_object_set_pages(obj, pages, i915_sg_dma_sizes(pages->sgl));
 
 	return 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Intel-gfx] [PATCH 2/6] drm/i915: Introduce refcounted sg-tables
@ 2021-10-08 13:35   ` Thomas Hellström
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-08 13:35 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: maarten.lankhorst, matthew.auld, Thomas Hellström

As we start to introduce asynchronous failsafe object migration,
where we update the object state and then submit asynchronous
commands we need to record what memory resources are actually used
by various part of the command stream. Initially for three purposes:

1) Error capture.
2) Asynchronous migration error recovery.
3) Asynchronous vma bind.

At the time where these happens, the object state may have been updated
to be several migrations ahead and object sg-tables discarded.

In order to make it possible to keep sg-tables with memory resource
information for these operations, introduce refcounted sg-tables that
aren't freed until the last user is done with them.

The alternative would be to reference information sitting on the
corresponding ttm_resources which typically have the same lifetime as
these refcountes sg_tables, but that leads to other awkward constructs:
Due to the design direction chosen for ttm resource managers that would
lead to diamond-style inheritance, the LMEM resources may sometimes be
prematurely freed, and finally the subclassed struct ttm_resource would
have to bleed into the asynchronous vma bind code.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 159 +++++++++++-------
 drivers/gpu/drm/i915/i915_scatterlist.c       |  62 +++++--
 drivers/gpu/drm/i915/i915_scatterlist.h       |  76 ++++++++-
 drivers/gpu/drm/i915/intel_region_ttm.c       |  15 +-
 drivers/gpu/drm/i915/intel_region_ttm.h       |   5 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 7 files changed, 238 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 7c3da4e3e737..d600cf7ceb35 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -485,6 +485,7 @@ struct drm_i915_gem_object {
 		 */
 		struct list_head region_link;
 
+		struct i915_refct_sgt *rsgt;
 		struct sg_table *pages;
 		void *mapping;
 
@@ -538,7 +539,7 @@ struct drm_i915_gem_object {
 	} mm;
 
 	struct {
-		struct sg_table *cached_io_st;
+		struct i915_refct_sgt *cached_io_rsgt;
 		struct i915_gem_object_page_iter get_io_page;
 		struct drm_i915_gem_object *backup;
 		bool created:1;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 74a1ffd0d7dd..4b4d7457bef9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -34,7 +34,7 @@
  * struct i915_ttm_tt - TTM page vector with additional private information
  * @ttm: The base TTM page vector.
  * @dev: The struct device used for dma mapping and unmapping.
- * @cached_st: The cached scatter-gather table.
+ * @cached_rsgt: The cached scatter-gather table.
  *
  * Note that DMA may be going on right up to the point where the page-
  * vector is unpopulated in delayed destroy. Hence keep the
@@ -45,7 +45,7 @@
 struct i915_ttm_tt {
 	struct ttm_tt ttm;
 	struct device *dev;
-	struct sg_table *cached_st;
+	struct i915_refct_sgt cached_rsgt;
 };
 
 static const struct ttm_place sys_placement_flags = {
@@ -179,6 +179,21 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj,
 	placement->busy_placement = busy;
 }
 
+static void i915_ttm_tt_release(struct kref *ref)
+{
+	struct i915_ttm_tt *i915_tt =
+		container_of(ref, typeof(*i915_tt), cached_rsgt.kref);
+	struct sg_table *st = &i915_tt->cached_rsgt.table;
+
+	GEM_WARN_ON(st->sgl);
+
+	kfree(i915_tt);
+}
+
+static const struct i915_refct_sgt_ops tt_rsgt_ops = {
+	.release = i915_ttm_tt_release
+};
+
 static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 					 uint32_t page_flags)
 {
@@ -203,6 +218,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 		return NULL;
 	}
 
+	i915_refct_sgt_init_ops(&i915_tt->cached_rsgt, bo->base.size,
+				&tt_rsgt_ops);
 	i915_tt->dev = obj->base.dev->dev;
 
 	return &i915_tt->ttm;
@@ -211,13 +228,13 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 static void i915_ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
 {
 	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
+	struct sg_table *st = &i915_tt->cached_rsgt.table;
+
+	GEM_WARN_ON(kref_read(&i915_tt->cached_rsgt.kref) != 1);
 
-	if (i915_tt->cached_st) {
-		dma_unmap_sgtable(i915_tt->dev, i915_tt->cached_st,
-				  DMA_BIDIRECTIONAL, 0);
-		sg_free_table(i915_tt->cached_st);
-		kfree(i915_tt->cached_st);
-		i915_tt->cached_st = NULL;
+	if (st->sgl) {
+		dma_unmap_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL, 0);
+		sg_free_table(st);
 	}
 	ttm_pool_free(&bdev->pool, ttm);
 }
@@ -226,8 +243,10 @@ static void i915_ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
 {
 	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
 
+	GEM_WARN_ON(kref_read(&i915_tt->cached_rsgt.kref) != 1);
+
 	ttm_tt_fini(ttm);
-	kfree(i915_tt);
+	i915_refct_sgt_put(&i915_tt->cached_rsgt);
 }
 
 static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
@@ -261,12 +280,12 @@ static int i915_ttm_move_notify(struct ttm_buffer_object *bo)
 	return 0;
 }
 
-static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
+static void i915_ttm_free_cached_io_rsgt(struct drm_i915_gem_object *obj)
 {
 	struct radix_tree_iter iter;
 	void __rcu **slot;
 
-	if (!obj->ttm.cached_io_st)
+	if (!obj->ttm.cached_io_rsgt)
 		return;
 
 	rcu_read_lock();
@@ -274,9 +293,8 @@ static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
 		radix_tree_delete(&obj->ttm.get_io_page.radix, iter.index);
 	rcu_read_unlock();
 
-	sg_free_table(obj->ttm.cached_io_st);
-	kfree(obj->ttm.cached_io_st);
-	obj->ttm.cached_io_st = NULL;
+	i915_refct_sgt_put(obj->ttm.cached_io_rsgt);
+	obj->ttm.cached_io_rsgt = NULL;
 }
 
 static void
@@ -347,7 +365,7 @@ static void i915_ttm_purge(struct drm_i915_gem_object *obj)
 		obj->write_domain = 0;
 		obj->read_domains = 0;
 		i915_ttm_adjust_gem_after_move(obj);
-		i915_ttm_free_cached_io_st(obj);
+		i915_ttm_free_cached_io_rsgt(obj);
 		obj->mm.madv = __I915_MADV_PURGED;
 	}
 }
@@ -358,7 +376,7 @@ static void i915_ttm_swap_notify(struct ttm_buffer_object *bo)
 	int ret = i915_ttm_move_notify(bo);
 
 	GEM_WARN_ON(ret);
-	GEM_WARN_ON(obj->ttm.cached_io_st);
+	GEM_WARN_ON(obj->ttm.cached_io_rsgt);
 	if (!ret && obj->mm.madv != I915_MADV_WILLNEED)
 		i915_ttm_purge(obj);
 }
@@ -369,7 +387,7 @@ static void i915_ttm_delete_mem_notify(struct ttm_buffer_object *bo)
 
 	if (likely(obj)) {
 		__i915_gem_object_pages_fini(obj);
-		i915_ttm_free_cached_io_st(obj);
+		i915_ttm_free_cached_io_rsgt(obj);
 	}
 }
 
@@ -389,40 +407,35 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
 					  ttm_mem_type - I915_PL_LMEM0);
 }
 
-static struct sg_table *i915_ttm_tt_get_st(struct ttm_tt *ttm)
+static struct i915_refct_sgt *i915_ttm_tt_get_st(struct ttm_tt *ttm)
 {
 	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
 	struct sg_table *st;
 	int ret;
 
-	if (i915_tt->cached_st)
-		return i915_tt->cached_st;
-
-	st = kzalloc(sizeof(*st), GFP_KERNEL);
-	if (!st)
-		return ERR_PTR(-ENOMEM);
+	if (i915_tt->cached_rsgt.table.sgl)
+		return i915_refct_sgt_get(&i915_tt->cached_rsgt);
 
+	st = &i915_tt->cached_rsgt.table;
 	ret = sg_alloc_table_from_pages_segment(st,
 			ttm->pages, ttm->num_pages,
 			0, (unsigned long)ttm->num_pages << PAGE_SHIFT,
 			i915_sg_segment_size(), GFP_KERNEL);
 	if (ret) {
-		kfree(st);
+		st->sgl = NULL;
 		return ERR_PTR(ret);
 	}
 
 	ret = dma_map_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL, 0);
 	if (ret) {
 		sg_free_table(st);
-		kfree(st);
 		return ERR_PTR(ret);
 	}
 
-	i915_tt->cached_st = st;
-	return st;
+	return i915_refct_sgt_get(&i915_tt->cached_rsgt);
 }
 
-static struct sg_table *
+static struct i915_refct_sgt *
 i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 			 struct ttm_resource *res)
 {
@@ -436,7 +449,21 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 	 * the resulting st. Might make sense for GGTT.
 	 */
 	GEM_WARN_ON(!cpu_maps_iomem(res));
-	return intel_region_ttm_resource_to_st(obj->mm.region, res);
+	if (bo->resource == res) {
+		if (!obj->ttm.cached_io_rsgt) {
+			struct i915_refct_sgt *rsgt;
+
+			rsgt = intel_region_ttm_resource_to_rsgt(obj->mm.region,
+								 res);
+			if (IS_ERR(rsgt))
+				return rsgt;
+
+			obj->ttm.cached_io_rsgt = rsgt;
+		}
+		return i915_refct_sgt_get(obj->ttm.cached_io_rsgt);
+	}
+
+	return intel_region_ttm_resource_to_rsgt(obj->mm.region, res);
 }
 
 static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
@@ -447,10 +474,7 @@ static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
 {
 	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
 						     bdev);
-	struct ttm_resource_manager *src_man =
-		ttm_manager_type(bo->bdev, bo->resource->mem_type);
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
-	struct sg_table *src_st;
 	struct i915_request *rq;
 	struct ttm_tt *src_ttm = bo->ttm;
 	enum i915_cache_level src_level, dst_level;
@@ -476,17 +500,22 @@ static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
 		}
 		intel_engine_pm_put(i915->gt.migrate.context->engine);
 	} else {
-		src_st = src_man->use_tt ? i915_ttm_tt_get_st(src_ttm) :
-			obj->ttm.cached_io_st;
+		struct i915_refct_sgt *src_rsgt =
+			i915_ttm_resource_get_st(obj, bo->resource);
+
+		if (IS_ERR(src_rsgt))
+			return PTR_ERR(src_rsgt);
 
 		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
 		intel_engine_pm_get(i915->gt.migrate.context->engine);
 		ret = intel_context_migrate_copy(i915->gt.migrate.context,
-						 NULL, src_st->sgl, src_level,
+						 NULL, src_rsgt->table.sgl,
+						 src_level,
 						 gpu_binds_iomem(bo->resource),
 						 dst_st->sgl, dst_level,
 						 gpu_binds_iomem(dst_mem),
 						 &rq);
+		i915_refct_sgt_put(src_rsgt);
 		if (!ret && rq) {
 			i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);
 			i915_request_put(rq);
@@ -500,13 +529,14 @@ static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
 static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 			    struct ttm_resource *dst_mem,
 			    struct ttm_tt *dst_ttm,
-			    struct sg_table *dst_st,
+			    struct i915_refct_sgt *dst_rsgt,
 			    bool allow_accel)
 {
 	int ret = -EINVAL;
 
 	if (allow_accel)
-		ret = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm, dst_st);
+		ret = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm,
+					  &dst_rsgt->table);
 	if (ret) {
 		struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
 		struct intel_memory_region *dst_reg, *src_reg;
@@ -523,12 +553,13 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 		dst_iter = !cpu_maps_iomem(dst_mem) ?
 			ttm_kmap_iter_tt_init(&_dst_iter.tt, dst_ttm) :
 			ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
-						 dst_st, dst_reg->region.start);
+						 &dst_rsgt->table,
+						 dst_reg->region.start);
 
 		src_iter = !cpu_maps_iomem(bo->resource) ?
 			ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
 			ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
-						 obj->ttm.cached_io_st,
+						 &obj->ttm.cached_io_rsgt->table,
 						 src_reg->region.start);
 
 		ttm_move_memcpy(clear, dst_mem->num_pages, dst_iter, src_iter);
@@ -544,7 +575,7 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	struct ttm_resource_manager *dst_man =
 		ttm_manager_type(bo->bdev, dst_mem->mem_type);
 	struct ttm_tt *ttm = bo->ttm;
-	struct sg_table *dst_st;
+	struct i915_refct_sgt *dst_rsgt;
 	bool clear;
 	int ret;
 
@@ -570,22 +601,24 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 			return ret;
 	}
 
-	dst_st = i915_ttm_resource_get_st(obj, dst_mem);
-	if (IS_ERR(dst_st))
-		return PTR_ERR(dst_st);
+	dst_rsgt = i915_ttm_resource_get_st(obj, dst_mem);
+	if (IS_ERR(dst_rsgt))
+		return PTR_ERR(dst_rsgt);
 
 	clear = !cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
 	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)))
-		__i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_st, true);
+		__i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_rsgt, true);
 
 	ttm_bo_move_sync_cleanup(bo, dst_mem);
 	i915_ttm_adjust_domains_after_move(obj);
-	i915_ttm_free_cached_io_st(obj);
+	i915_ttm_free_cached_io_rsgt(obj);
 
 	if (gpu_binds_iomem(dst_mem) || cpu_maps_iomem(dst_mem)) {
-		obj->ttm.cached_io_st = dst_st;
-		obj->ttm.get_io_page.sg_pos = dst_st->sgl;
+		obj->ttm.cached_io_rsgt = dst_rsgt;
+		obj->ttm.get_io_page.sg_pos = dst_rsgt->table.sgl;
 		obj->ttm.get_io_page.sg_idx = 0;
+	} else {
+		i915_refct_sgt_put(dst_rsgt);
 	}
 
 	i915_ttm_adjust_gem_after_move(obj);
@@ -649,7 +682,6 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj,
 		.interruptible = true,
 		.no_wait_gpu = false,
 	};
-	struct sg_table *st;
 	int real_num_busy;
 	int ret;
 
@@ -687,12 +719,16 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj,
 	}
 
 	if (!i915_gem_object_has_pages(obj)) {
-		/* Object either has a page vector or is an iomem object */
-		st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
-		if (IS_ERR(st))
-			return PTR_ERR(st);
+		struct i915_refct_sgt *rsgt =
+			i915_ttm_resource_get_st(obj, bo->resource);
+
+		if (IS_ERR(rsgt))
+			return PTR_ERR(rsgt);
 
-		__i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
+		GEM_BUG_ON(obj->mm.rsgt);
+		obj->mm.rsgt = rsgt;
+		__i915_gem_object_set_pages(obj, &rsgt->table,
+					    i915_sg_dma_sizes(rsgt->table.sgl));
 	}
 
 	return ret;
@@ -766,6 +802,11 @@ static void i915_ttm_put_pages(struct drm_i915_gem_object *obj,
 	 * and shrinkers will move it out if needed.
 	 */
 
+	if (obj->mm.rsgt) {
+		i915_refct_sgt_put(obj->mm.rsgt);
+		obj->mm.rsgt = NULL;
+	}
+
 	i915_ttm_adjust_lru(obj);
 }
 
@@ -1023,7 +1064,7 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 	struct ttm_operation_ctx ctx = {
 		.interruptible = intr,
 	};
-	struct sg_table *dst_st;
+	struct i915_refct_sgt *dst_rsgt;
 	int ret;
 
 	assert_object_held(dst);
@@ -1038,11 +1079,11 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 	if (ret)
 		return ret;
 
-	dst_st = gpu_binds_iomem(dst_bo->resource) ?
-		dst->ttm.cached_io_st : i915_ttm_tt_get_st(dst_bo->ttm);
-
+	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
 	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
-			dst_st, allow_accel);
+			dst_rsgt, allow_accel);
+
+	i915_refct_sgt_put(dst_rsgt);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c b/drivers/gpu/drm/i915/i915_scatterlist.c
index 4a6712dca838..8a510ee5d1ad 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.c
+++ b/drivers/gpu/drm/i915/i915_scatterlist.c
@@ -41,8 +41,32 @@ bool i915_sg_trim(struct sg_table *orig_st)
 	return true;
 }
 
+static void i915_refct_sgt_release(struct kref *ref)
+{
+	struct i915_refct_sgt *rsgt =
+		container_of(ref, typeof(*rsgt), kref);
+
+	sg_free_table(&rsgt->table);
+	kfree(rsgt);
+}
+
+static const struct i915_refct_sgt_ops rsgt_ops = {
+	.release = i915_refct_sgt_release
+};
+
+/**
+ * i915_refct_sgt_init - Initialize a struct i915_refct_sgt with default ops
+ * @rsgt: The struct i915_refct_sgt to initialize.
+ * @size: The size of the underlying memory buffer.
+ */
+void i915_refct_sgt_init(struct i915_refct_sgt *rsgt, size_t size)
+{
+	i915_refct_sgt_init_ops(rsgt, size, &rsgt_ops);
+}
+
 /**
- * i915_sg_from_mm_node - Create an sg_table from a struct drm_mm_node
+ * i915_rsgt_from_mm_node - Create a refcounted sg_table from a struct
+ * drm_mm_node
  * @node: The drm_mm_node.
  * @region_start: An offset to add to the dma addresses of the sg list.
  *
@@ -50,25 +74,28 @@ bool i915_sg_trim(struct sg_table *orig_st)
  * taking a maximum segment length into account, splitting into segments
  * if necessary.
  *
- * Return: A pointer to a kmalloced struct sg_table on success, negative
+ * Return: A pointer to a kmalloced struct i915_refct_sgt on success, negative
  * error code cast to an error pointer on failure.
  */
-struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
-				      u64 region_start)
+struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,
+					      u64 region_start)
 {
 	const u64 max_segment = SZ_1G; /* Do we have a limit on this? */
 	u64 segment_pages = max_segment >> PAGE_SHIFT;
 	u64 block_size, offset, prev_end;
+	struct i915_refct_sgt *rsgt;
 	struct sg_table *st;
 	struct scatterlist *sg;
 
-	st = kmalloc(sizeof(*st), GFP_KERNEL);
-	if (!st)
+	rsgt = kmalloc(sizeof(*rsgt), GFP_KERNEL);
+	if (!rsgt)
 		return ERR_PTR(-ENOMEM);
 
+	i915_refct_sgt_init(rsgt, node->size << PAGE_SHIFT);
+	st = &rsgt->table;
 	if (sg_alloc_table(st, DIV_ROUND_UP(node->size, segment_pages),
 			   GFP_KERNEL)) {
-		kfree(st);
+		i915_refct_sgt_put(rsgt);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -104,11 +131,11 @@ struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
 	sg_mark_end(sg);
 	i915_sg_trim(st);
 
-	return st;
+	return rsgt;
 }
 
 /**
- * i915_sg_from_buddy_resource - Create an sg_table from a struct
+ * i915_rsgt_from_buddy_resource - Create a refcounted sg_table from a struct
  * i915_buddy_block list
  * @res: The struct i915_ttm_buddy_resource.
  * @region_start: An offset to add to the dma addresses of the sg list.
@@ -117,11 +144,11 @@ struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
  * taking a maximum segment length into account, splitting into segments
  * if necessary.
  *
- * Return: A pointer to a kmalloced struct sg_table on success, negative
+ * Return: A pointer to a kmalloced struct i915_refct_sgt on success, negative
  * error code cast to an error pointer on failure.
  */
-struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
-					     u64 region_start)
+struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
+						     u64 region_start)
 {
 	struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
 	const u64 size = res->num_pages << PAGE_SHIFT;
@@ -129,18 +156,21 @@ struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
 	struct i915_buddy_mm *mm = bman_res->mm;
 	struct list_head *blocks = &bman_res->blocks;
 	struct i915_buddy_block *block;
+	struct i915_refct_sgt *rsgt;
 	struct scatterlist *sg;
 	struct sg_table *st;
 	resource_size_t prev_end;
 
 	GEM_BUG_ON(list_empty(blocks));
 
-	st = kmalloc(sizeof(*st), GFP_KERNEL);
-	if (!st)
+	rsgt = kmalloc(sizeof(*rsgt), GFP_KERNEL);
+	if (!rsgt)
 		return ERR_PTR(-ENOMEM);
 
+	i915_refct_sgt_init(rsgt, size);
+	st = &rsgt->table;
 	if (sg_alloc_table(st, res->num_pages, GFP_KERNEL)) {
-		kfree(st);
+		i915_refct_sgt_put(rsgt);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -181,7 +211,7 @@ struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
 	sg_mark_end(sg);
 	i915_sg_trim(st);
 
-	return st;
+	return rsgt;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h b/drivers/gpu/drm/i915/i915_scatterlist.h
index b8bd5925b03f..321fd4a9f777 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.h
+++ b/drivers/gpu/drm/i915/i915_scatterlist.h
@@ -144,10 +144,78 @@ static inline unsigned int i915_sg_segment_size(void)
 
 bool i915_sg_trim(struct sg_table *orig_st);
 
-struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
-				      u64 region_start);
+/**
+ * struct i915_refct_sgt_ops - Operations structure for struct i915_refct_sgt
+ */
+struct i915_refct_sgt_ops {
+	/**
+	 * release() - Free the memory of the struct i915_refct_sgt
+	 * @ref: struct kref that is embedded in the struct i915_refct_sgt
+	 */
+	void (*release)(struct kref *ref);
+};
+
+/**
+ * struct i915_refct_sgt - A refcounted scatter-gather table
+ * @kref: struct kref for refcounting
+ * @table: struct sg_table holding the scatter-gather table itself. Note that
+ * @table->sgl = NULL can be used to determine whether a scatter-gather table
+ * is present or not.
+ * @size: The size in bytes of the underlying memory buffer
+ * @ops: The operations structure.
+ */
+struct i915_refct_sgt {
+	struct kref kref;
+	struct sg_table table;
+	size_t size;
+	const struct i915_refct_sgt_ops *ops;
+};
+
+/**
+ * i915_refct_sgt_put - Put a refcounted sg-table
+ * @rsgt: The struct i915_refct_sgt to put.
+ */
+static inline void i915_refct_sgt_put(struct i915_refct_sgt *rsgt)
+{
+	if (rsgt)
+		kref_put(&rsgt->kref, rsgt->ops->release);
+}
+
+/**
+ * i915_refct_sgt_get - Get a refcounted sg-table
+ * @rsgt: The struct i915_refct_sgt to get.
+ */
+static inline struct i915_refct_sgt *
+i915_refct_sgt_get(struct i915_refct_sgt *rsgt)
+{
+	kref_get(&rsgt->kref);
+	return rsgt;
+}
+
+/**
+ * i915_refct_sgt_init_ops - Initialize a refcounted sg-list with a custom
+ * operations structure
+ * @rsgt: The struct i915_refct_sgt to initialize.
+ * @size: Size in bytes of the underlying memory buffer.
+ * @ops: A customized operations structure in case the refcounted sg-list
+ * is embedded into another structure.
+ */
+static inline void i915_refct_sgt_init_ops(struct i915_refct_sgt *rsgt,
+					   size_t size,
+					   const struct i915_refct_sgt_ops *ops)
+{
+	kref_init(&rsgt->kref);
+	rsgt->table.sgl = NULL;
+	rsgt->size = size;
+	rsgt->ops = ops;
+}
+
+void i915_refct_sgt_init(struct i915_refct_sgt *rsgt, size_t size);
+
+struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,
+					      u64 region_start);
 
-struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
-					     u64 region_start);
+struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
+						     u64 region_start);
 
 #endif
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 98c7339bf8ba..2e901a27e259 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -115,8 +115,8 @@ void intel_region_ttm_fini(struct intel_memory_region *mem)
 }
 
 /**
- * intel_region_ttm_resource_to_st - Convert an opaque TTM resource manager resource
- * to an sg_table.
+ * intel_region_ttm_resource_to_rsgt - Convert an opaque TTM resource manager
+ * resource to a refcounted sg_table.
  * @mem: The memory region.
  * @res: The resource manager resource obtained from the TTM resource manager.
  *
@@ -126,17 +126,18 @@ void intel_region_ttm_fini(struct intel_memory_region *mem)
  *
  * Return: A malloced sg_table on success, an error pointer on failure.
  */
-struct sg_table *intel_region_ttm_resource_to_st(struct intel_memory_region *mem,
-						 struct ttm_resource *res)
+struct i915_refct_sgt *
+intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
+				  struct ttm_resource *res)
 {
 	if (mem->is_range_manager) {
 		struct ttm_range_mgr_node *range_node =
 			to_ttm_range_mgr_node(res);
 
-		return i915_sg_from_mm_node(&range_node->mm_nodes[0],
-					    mem->region.start);
+		return i915_rsgt_from_mm_node(&range_node->mm_nodes[0],
+					      mem->region.start);
 	} else {
-		return i915_sg_from_buddy_resource(res, mem->region.start);
+		return i915_rsgt_from_buddy_resource(res, mem->region.start);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
index 6f44075920f2..7bbe2b46b504 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.h
+++ b/drivers/gpu/drm/i915/intel_region_ttm.h
@@ -22,8 +22,9 @@ int intel_region_ttm_init(struct intel_memory_region *mem);
 
 void intel_region_ttm_fini(struct intel_memory_region *mem);
 
-struct sg_table *intel_region_ttm_resource_to_st(struct intel_memory_region *mem,
-						 struct ttm_resource *res);
+struct i915_refct_sgt *
+intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
+				  struct ttm_resource *res);
 
 void intel_region_ttm_resource_free(struct intel_memory_region *mem,
 				    struct ttm_resource *res);
diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c
index efa86dffe3c6..2752b5b98f60 100644
--- a/drivers/gpu/drm/i915/selftests/mock_region.c
+++ b/drivers/gpu/drm/i915/selftests/mock_region.c
@@ -17,9 +17,9 @@
 static void mock_region_put_pages(struct drm_i915_gem_object *obj,
 				  struct sg_table *pages)
 {
+	i915_refct_sgt_put(obj->mm.rsgt);
+	obj->mm.rsgt = NULL;
 	intel_region_ttm_resource_free(obj->mm.region, obj->mm.res);
-	sg_free_table(pages);
-	kfree(pages);
 }
 
 static int mock_region_get_pages(struct drm_i915_gem_object *obj)
@@ -38,12 +38,14 @@ static int mock_region_get_pages(struct drm_i915_gem_object *obj)
 	if (IS_ERR(obj->mm.res))
 		return PTR_ERR(obj->mm.res);
 
-	pages = intel_region_ttm_resource_to_st(obj->mm.region, obj->mm.res);
-	if (IS_ERR(pages)) {
-		err = PTR_ERR(pages);
+	obj->mm.rsgt = intel_region_ttm_resource_to_rsgt(obj->mm.region,
+							 obj->mm.res);
+	if (IS_ERR(obj->mm.rsgt)) {
+		err = PTR_ERR(obj->mm.rsgt);
 		goto err_free_resource;
 	}
 
+	pages = &obj->mm.rsgt->table;
 	__i915_gem_object_set_pages(obj, pages, i915_sg_dma_sizes(pages->sgl));
 
 	return 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/6] drm/i915/ttm: Failsafe migration blits
  2021-10-08 13:35 ` [Intel-gfx] " Thomas Hellström
@ 2021-10-08 13:35   ` Thomas Hellström
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-08 13:35 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: maarten.lankhorst, matthew.auld, Thomas Hellström

If the initial fill blit or copy blit of an object fails, the old
content of the memory might be exposed and read as soon as either CPU-
or GPU PTEs are set up to point at the pages.

Intercept the blit fence with an async dma_fence_work that checks the
blit fence for errors and, if there are any, performs an async CPU
blit instead. If allocating the async dma_fence_work fails, allocate
it on the stack and synchronously wait for the blit to complete.

Add selftests that simulate GPU blit failures and failure to allocate
the async dma_fence_work.
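
For reference, a minimal sketch of the interception pattern used below,
assuming the updated dma_fence_work from patch 1. Selftest hooks, rsgt
refcounting and error paths are elided, and "blit_fence" is only a
placeholder for the fence returned by i915_ttm_accel_move():

	/* Work callback: fall back to a CPU blit if the GPU blit failed. */
	static void __memcpy_work(struct dma_fence_work *work)
	{
		struct i915_ttm_memcpy_work *copy_work =
			container_of(work, typeof(*copy_work), base);

		if (work->error)
			ttm_move_memcpy(copy_work->clear, copy_work->num_pages,
					copy_work->dst_iter, copy_work->src_iter);
	}

	/* Chain the memcpy work on the GPU blit fence and publish it. */
	dma_fence_work_init(&copy_work->base, &i915_ttm_memcpy_ops);
	dma_fence_work_chain(&copy_work->base, blit_fence);
	dma_fence_work_commit_imm(&copy_work->base);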

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 268 ++++++++++++++----
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   4 +
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  24 +-
 3 files changed, 240 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4b4d7457bef9..79d4d50aa4e5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -7,6 +7,7 @@
 #include <drm/ttm/ttm_placement.h>
 
 #include "i915_drv.h"
+#include "i915_sw_fence_work.h"
 #include "intel_memory_region.h"
 #include "intel_region_ttm.h"
 
@@ -25,6 +26,18 @@
 #define I915_TTM_PRIO_NO_PAGES  1
 #define I915_TTM_PRIO_HAS_PAGES 2
 
+I915_SELFTEST_DECLARE(static bool fail_gpu_migration;)
+I915_SELFTEST_DECLARE(static bool fail_work_allocation;)
+
+#ifdef CONFIG_DRM_I915_SELFTEST
+void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
+					bool work_allocation)
+{
+	fail_gpu_migration = gpu_migration;
+	fail_work_allocation = work_allocation;
+}
+#endif
+
 /*
  * Size of struct ttm_place vector in on-stack struct ttm_placement allocs
  */
@@ -466,11 +479,11 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 	return intel_region_ttm_resource_to_rsgt(obj->mm.region, res);
 }
 
-static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
-			       bool clear,
-			       struct ttm_resource *dst_mem,
-			       struct ttm_tt *dst_ttm,
-			       struct sg_table *dst_st)
+static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
+					     bool clear,
+					     struct ttm_resource *dst_mem,
+					     struct ttm_tt *dst_ttm,
+					     struct sg_table *dst_st)
 {
 	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
 						     bdev);
@@ -481,30 +494,29 @@ static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
 	int ret;
 
 	if (!i915->gt.migrate.context || intel_gt_is_wedged(&i915->gt))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
+
+	/* With fail_gpu_migration, we always perform a GPU clear. */
+	if (I915_SELFTEST_ONLY(fail_gpu_migration))
+		clear = true;
 
 	dst_level = i915_ttm_cache_level(i915, dst_mem, dst_ttm);
 	if (clear) {
-		if (bo->type == ttm_bo_type_kernel)
-			return -EINVAL;
+		if (bo->type == ttm_bo_type_kernel &&
+		    !I915_SELFTEST_ONLY(fail_gpu_migration))
+			return ERR_PTR(-EINVAL);
 
 		intel_engine_pm_get(i915->gt.migrate.context->engine);
 		ret = intel_context_migrate_clear(i915->gt.migrate.context, NULL,
 						  dst_st->sgl, dst_level,
 						  gpu_binds_iomem(dst_mem),
 						  0, &rq);
-
-		if (!ret && rq) {
-			i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);
-			i915_request_put(rq);
-		}
-		intel_engine_pm_put(i915->gt.migrate.context->engine);
 	} else {
 		struct i915_refct_sgt *src_rsgt =
 			i915_ttm_resource_get_st(obj, bo->resource);
 
 		if (IS_ERR(src_rsgt))
-			return PTR_ERR(src_rsgt);
+			return ERR_CAST(src_rsgt);
 
 		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
 		intel_engine_pm_get(i915->gt.migrate.context->engine);
@@ -515,55 +527,201 @@ static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
 						 dst_st->sgl, dst_level,
 						 gpu_binds_iomem(dst_mem),
 						 &rq);
+
 		i915_refct_sgt_put(src_rsgt);
-		if (!ret && rq) {
-			i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);
-			i915_request_put(rq);
-		}
-		intel_engine_pm_put(i915->gt.migrate.context->engine);
 	}
 
-	return ret;
+	intel_engine_pm_put(i915->gt.migrate.context->engine);
+
+	if (ret && rq) {
+		i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);
+		i915_request_put(rq);
+	}
+
+	return ret ? ERR_PTR(ret) : &rq->fence;
+}
+
+/**
+ * struct i915_ttm_memcpy_work - memcpy work item under a dma-fence
+ * @base: The struct dma_fence_work we subclass.
+ * @_dst_iter: Storage space for the destination kmap iterator.
+ * @_src_iter: Storage space for the source kmap iterator.
+ * @dst_iter: Pointer to the destination kmap iterator.
+ * @src_iter: Pointer to the source kmap iterator.
+ * @clear: Whether to clear instead of copy.
+ * @num_pages: Number of pages in the copy.
+ * @src_rsgt: Refcounted scatter-gather list of source memory.
+ * @dst_rsgt: Refcounted scatter-gather list of destination memory.
+ */
+struct i915_ttm_memcpy_work {
+	struct dma_fence_work base;
+	union {
+		struct ttm_kmap_iter_tt tt;
+		struct ttm_kmap_iter_iomap io;
+	} _dst_iter,
+	_src_iter;
+	struct ttm_kmap_iter *dst_iter;
+	struct ttm_kmap_iter *src_iter;
+	unsigned long num_pages;
+	bool clear;
+	struct i915_refct_sgt *src_rsgt;
+	struct i915_refct_sgt *dst_rsgt;
+};
+
+static void __memcpy_work(struct dma_fence_work *work)
+{
+	struct i915_ttm_memcpy_work *copy_work =
+		container_of(work, typeof(*copy_work), base);
+
+	if (I915_SELFTEST_ONLY(fail_gpu_migration))
+		cmpxchg(&work->error, 0, -EINVAL);
+
+	/* If there was an error in the gpu copy operation, run memcpy. */
+	if (work->error)
+		ttm_move_memcpy(copy_work->clear, copy_work->num_pages,
+				copy_work->dst_iter, copy_work->src_iter);
+
+	/*
+	 * Can't signal before we unref the rsgts, because then
+	 * ttms might be unpopulated before we unref these and we'll hit
+	 * a GEM_WARN_ON() in i915_ttm_tt_unpopulate. Not a real problem,
+	 * but good to keep the GEM_WARN_ON to check that we don't leak rsgts.
+	 */
+	i915_refct_sgt_put(copy_work->src_rsgt);
+	i915_refct_sgt_put(copy_work->dst_rsgt);
+}
+
+static const struct dma_fence_work_ops i915_ttm_memcpy_ops = {
+	.work = __memcpy_work,
+};
+
+static void i915_ttm_memcpy_work_init(struct i915_ttm_memcpy_work *copy_work,
+				      struct ttm_buffer_object *bo, bool clear,
+				      struct ttm_resource *dst_mem,
+				      struct ttm_tt *dst_ttm,
+				      struct i915_refct_sgt *dst_rsgt)
+{
+	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
+	struct intel_memory_region *dst_reg, *src_reg;
+
+	dst_reg = i915_ttm_region(bo->bdev, dst_mem->mem_type);
+	src_reg = i915_ttm_region(bo->bdev, bo->resource->mem_type);
+	GEM_BUG_ON(!dst_reg || !src_reg);
+
+	/*
+	 * We could consider populating only parts of this structure
+	 * (like avoiding the iterators) until it's actually
+	 * determined that we need it. But initializing the iterators
+	 * shouldn't really be that costly.
+	 */
+
+	copy_work->dst_iter = !cpu_maps_iomem(dst_mem) ?
+		ttm_kmap_iter_tt_init(&copy_work->_dst_iter.tt, dst_ttm) :
+		ttm_kmap_iter_iomap_init(&copy_work->_dst_iter.io, &dst_reg->iomap,
+					 &dst_rsgt->table, dst_reg->region.start);
+
+	copy_work->src_iter = !cpu_maps_iomem(bo->resource) ?
+		ttm_kmap_iter_tt_init(&copy_work->_src_iter.tt, bo->ttm) :
+		ttm_kmap_iter_iomap_init(&copy_work->_src_iter.io, &src_reg->iomap,
+					 &obj->ttm.cached_io_rsgt->table,
+					 src_reg->region.start);
+	copy_work->clear = clear;
+	copy_work->num_pages = bo->base.size >> PAGE_SHIFT;
+
+	copy_work->dst_rsgt = i915_refct_sgt_get(dst_rsgt);
+	copy_work->src_rsgt = clear ? NULL :
+		i915_ttm_resource_get_st(obj, bo->resource);
 }
 
-static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
-			    struct ttm_resource *dst_mem,
-			    struct ttm_tt *dst_ttm,
-			    struct i915_refct_sgt *dst_rsgt,
-			    bool allow_accel)
+/*
+ * This is only used as a last fallback if the copy_work
+ * memory allocation fails, prohibiting async moves.
+ */
+static void __i915_ttm_move_fallback(struct ttm_buffer_object *bo, bool clear,
+				     struct ttm_resource *dst_mem,
+				     struct ttm_tt *dst_ttm,
+				     struct i915_refct_sgt *dst_rsgt,
+				     bool allow_accel)
 {
 	int ret = -EINVAL;
 
-	if (allow_accel)
-		ret = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm,
-					  &dst_rsgt->table);
-	if (ret) {
-		struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
-		struct intel_memory_region *dst_reg, *src_reg;
-		union {
-			struct ttm_kmap_iter_tt tt;
-			struct ttm_kmap_iter_iomap io;
-		} _dst_iter, _src_iter;
-		struct ttm_kmap_iter *dst_iter, *src_iter;
-
-		dst_reg = i915_ttm_region(bo->bdev, dst_mem->mem_type);
-		src_reg = i915_ttm_region(bo->bdev, bo->resource->mem_type);
-		GEM_BUG_ON(!dst_reg || !src_reg);
-
-		dst_iter = !cpu_maps_iomem(dst_mem) ?
-			ttm_kmap_iter_tt_init(&_dst_iter.tt, dst_ttm) :
-			ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
-						 &dst_rsgt->table,
-						 dst_reg->region.start);
-
-		src_iter = !cpu_maps_iomem(bo->resource) ?
-			ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
-			ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
-						 &obj->ttm.cached_io_rsgt->table,
-						 src_reg->region.start);
-
-		ttm_move_memcpy(clear, dst_mem->num_pages, dst_iter, src_iter);
+	if (allow_accel) {
+		struct dma_fence *fence;
+
+		fence = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm,
+					    &dst_rsgt->table);
+		if (IS_ERR(fence)) {
+			ret = PTR_ERR(fence);
+		} else {
+			ret = dma_fence_wait(fence, false);
+			if (!ret)
+				ret = fence->error;
+			dma_fence_put(fence);
+		}
+	}
+
+	if (ret || I915_SELFTEST_ONLY(fail_gpu_migration)) {
+		struct i915_ttm_memcpy_work copy_work;
+
+		i915_ttm_memcpy_work_init(&copy_work, bo, clear, dst_mem,
+					  dst_ttm, dst_rsgt);
+
+		/* Trigger a copy by setting an error value */
+		copy_work.base.dma.error = -EINVAL;
+		__memcpy_work(&copy_work.base);
+	}
+}
+
+static int __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
+			   struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm,
+			   struct i915_refct_sgt *dst_rsgt, bool allow_accel)
+{
+	struct i915_ttm_memcpy_work *copy_work;
+	struct dma_fence *fence;
+	int ret = 0;
+
+	if (!I915_SELFTEST_ONLY(fail_work_allocation))
+		copy_work = kzalloc(sizeof(*copy_work), GFP_KERNEL);
+	else
+		copy_work = NULL;
+
+	if (!copy_work) {
+		/* Don't fail with -ENOMEM. Move sync instead. */
+		__i915_ttm_move_fallback(bo, clear, dst_mem, dst_ttm, dst_rsgt,
+					 allow_accel);
+		return 0;
+	}
+
+	dma_fence_work_init(&copy_work->base, &i915_ttm_memcpy_ops);
+	if (allow_accel) {
+		fence = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm,
+					    &dst_rsgt->table);
+		if (IS_ERR(fence)) {
+			i915_sw_fence_set_error_once(&copy_work->base.chain,
+						     PTR_ERR(fence));
+		} else {
+			ret = dma_fence_work_chain(&copy_work->base, fence);
+			dma_fence_put(fence);
+			GEM_WARN_ON(ret < 0);
+		}
+	} else {
+		i915_sw_fence_set_error_once(&copy_work->base.chain, -EINVAL);
 	}
+
+	/* Setup async memcpy */
+	i915_ttm_memcpy_work_init(copy_work, bo, clear, dst_mem, dst_ttm,
+				  dst_rsgt);
+	fence = dma_fence_get(&copy_work->base.dma);
+	dma_fence_work_commit_imm(&copy_work->base);
+
+	/*
+	 * We're synchronizing here for now. For async moves, return the
+	 * fence.
+	 */
+	dma_fence_wait(fence, false);
+	dma_fence_put(fence);
+
+	return ret;
 }
 
 static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index 0b7291dd897c..c5bf8863446d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -51,6 +51,10 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 			  struct drm_i915_gem_object *src,
 			  bool allow_accel, bool intr);
 
+I915_SELFTEST_DECLARE
+(void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
+					 bool work_allocation);)
+
 /* Internal I915 TTM declarations and definitions below. */
 
 #define I915_PL_LMEM0 TTM_PL_PRIV
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index 28a700f08b49..a2122bdcc1cb 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -4,6 +4,7 @@
  */
 
 #include "gt/intel_migrate.h"
+#include "gem/i915_gem_ttm.h"
 
 static int igt_fill_check_buffer(struct drm_i915_gem_object *obj,
 				 bool fill)
@@ -227,13 +228,34 @@ static int igt_lmem_pages_migrate(void *arg)
 	return err;
 }
 
+static int igt_lmem_pages_failsafe_migrate(void *arg)
+{
+	int fail_gpu, fail_alloc, ret;
+
+	for (fail_gpu = 0; fail_gpu < 2; ++fail_gpu) {
+		for (fail_alloc = 0; fail_alloc < 2; ++fail_alloc) {
+			pr_info("Simulated failure modes: gpu: %d, alloc: %d\n",
+				fail_gpu, fail_alloc);
+			i915_ttm_migrate_set_failure_modes(fail_gpu,
+							   fail_alloc);
+			ret = igt_lmem_pages_migrate(arg);
+			if (ret)
+				goto out_err;
+		}
+	}
+
+out_err:
+	i915_ttm_migrate_set_failure_modes(false, false);
+	return ret;
+}
+
 int i915_gem_migrate_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(igt_smem_create_migrate),
 		SUBTEST(igt_lmem_create_migrate),
 		SUBTEST(igt_same_create_migrate),
-		SUBTEST(igt_lmem_pages_migrate),
+		SUBTEST(igt_lmem_pages_failsafe_migrate),
 	};
 
 	if (!HAS_LMEM(i915))
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/6] drm/i915: Add a struct dma_fence_work timeline
  2021-10-08 13:35 ` [Intel-gfx] " Thomas Hellström
@ 2021-10-08 13:35   ` Thomas Hellström
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-08 13:35 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: maarten.lankhorst, matthew.auld, Thomas Hellström

The TTM managers and, possibly, the gtt address space managers will
need to be able to order fences for async operation.
Using dma_fence_is_later() for this will require that the fences we hand
them are from a single fence context and ordered.

Introduce a struct dma_fence_work_timeline, and a function to attach
struct dma_fence_work to such a timeline in a way that all previous
fences attached to the timeline will be signaled when the latest
attached struct dma_fence_work signals.
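
For reference, a rough usage sketch. The "xyz" names, the ops structs
and the structure embedding the work item and the callback are
placeholders and not part of this patch:

	struct xyz_work {
		struct dma_fence_work base;
		struct i915_sw_dma_fence_cb tl_cb;
	};

	/* One-time setup, typically when creating the embedding object. */
	dma_fence_work_timeline_init(&xyz->tl, "xyz", &xyz_tl_ops);

	/* For each work item: */
	dma_fence_work_init(&work->base, &xyz_work_ops);
	dma_fence_work_timeline_attach(&xyz->tl, &work->base, &work->tl_cb);
	dma_fence_work_commit(&work->base);

	/*
	 * All fences attached to the timeline now share a single fence
	 * context with increasing seqnos, so dma_fence_is_later() orders
	 * them.
	 */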

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_sw_fence_work.c | 89 ++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_sw_fence_work.h | 58 +++++++++++++++
 2 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index 5b55cddafc9b..87cdb3158042 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -5,6 +5,66 @@
  */
 
 #include "i915_sw_fence_work.h"
+#include "i915_utils.h"
+
+/**
+ * dma_fence_work_timeline_attach - Attach a struct dma_fence_work to a
+ * timeline.
+ * @tl: The timeline to attach to.
+ * @f: The struct dma_fence_work.
+ * @tl_cb: The i915_sw_dma_fence_cb needed to attach to the
+ * timeline. This is typically embedded into the structure that also
+ * embeds the struct dma_fence_work.
+ *
+ * This function takes a timeline reference and associates it with the
+ * struct dma_fence_work. That reference is given up when the fence
+ * signals. Furthermore it assigns a fence context and a seqno to the
+ * dma-fence, and then chains upon the previous fence of the timeline
+ * if any, to make sure that the fence signals after that fence. The
+ * @tl_cb callback structure is needed for that chaining. Finally
+ * the registered last fence of the timeline is replaced by this fence, and
+ * the timeline takes a reference on the fence, which is released when
+ * the fence signals.
+ */
+void dma_fence_work_timeline_attach(struct dma_fence_work_timeline *tl,
+				    struct dma_fence_work *f,
+				    struct i915_sw_dma_fence_cb *tl_cb)
+{
+	struct dma_fence *await;
+
+	if (tl->ops->get)
+		tl->ops->get(tl);
+
+	spin_lock(&tl->lock);
+	await = tl->last_fence;
+	tl->last_fence = dma_fence_get(&f->dma);
+	f->dma.seqno = tl->seqno++;
+	f->dma.context = tl->context;
+	f->tl = tl;
+	spin_unlock(&tl->lock);
+
+	if (await) {
+		__i915_sw_fence_await_dma_fence(&f->chain, await, tl_cb);
+		dma_fence_put(await);
+	}
+}
+
+static void dma_fence_work_timeline_detach(struct dma_fence_work *f)
+{
+	struct dma_fence_work_timeline *tl = f->tl;
+	bool put = false;
+
+	spin_lock(&tl->lock);
+	if (tl->last_fence == &f->dma) {
+		put = true;
+		tl->last_fence = NULL;
+	}
+	spin_unlock(&tl->lock);
+	if (tl->ops->put)
+		tl->ops->put(tl);
+	if (put)
+		dma_fence_put(&f->dma);
+}
 
 static void dma_fence_work_complete(struct dma_fence_work *f)
 {
@@ -13,6 +73,9 @@ static void dma_fence_work_complete(struct dma_fence_work *f)
 	if (f->ops->release)
 		f->ops->release(f);
 
+	if (f->tl)
+		dma_fence_work_timeline_detach(f);
+
 	dma_fence_put(&f->dma);
 }
 
@@ -53,14 +116,17 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 
 static const char *get_driver_name(struct dma_fence *fence)
 {
-	return "dma-fence";
+	struct dma_fence_work *f = container_of(fence, typeof(*f), dma);
+
+	return (f->tl && f->tl->ops->name) ? f->tl->ops->name : "dma-fence";
 }
 
 static const char *get_timeline_name(struct dma_fence *fence)
 {
 	struct dma_fence_work *f = container_of(fence, typeof(*f), dma);
 
-	return f->ops->name ?: "work";
+	return (f->tl && f->tl->name) ? f->tl->name :
+		f->ops->name ?: "work";
 }
 
 static void fence_release(struct dma_fence *fence)
@@ -84,6 +150,7 @@ void dma_fence_work_init(struct dma_fence_work *f,
 {
 	f->ops = ops;
 	f->error = 0;
+	f->tl = NULL;
 	spin_lock_init(&f->lock);
 	dma_fence_init(&f->dma, &fence_ops, &f->lock, 0, 0);
 	i915_sw_fence_init(&f->chain, fence_notify);
@@ -97,3 +164,21 @@ int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal)
 
 	return __i915_sw_fence_await_dma_fence(&f->chain, signal, &f->cb);
 }
+
+/**
+ * dma_fence_work_timeline_init - Initialize a dma_fence_work timeline
+ * @tl: The timeline to initialize.
+ * @name: The name of the timeline.
+ * @ops: The timeline operations.
+ */
+void dma_fence_work_timeline_init(struct dma_fence_work_timeline *tl,
+				  const char *name,
+				  const struct dma_fence_work_timeline_ops *ops)
+{
+	tl->name = name;
+	spin_lock_init(&tl->lock);
+	tl->context = dma_fence_context_alloc(1);
+	tl->seqno = 0;
+	tl->last_fence = NULL;
+	tl->ops = ops;
+}
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
index caa59fb5252b..6f41ee360133 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
@@ -14,6 +14,53 @@
 #include "i915_sw_fence.h"
 
 struct dma_fence_work;
+struct dma_fence_work_timeline;
+
+/**
+ * struct dma_fence_work_timeline_ops - Timeline operations struct
+ * @name: Timeline ops name. This field is used if the timeline itself has
+ * a NULL name. Can be set to NULL in which case a default name is used.
+ *
+ * The struct dma_fence_work_timeline is intended to be embeddable.
+ * We use the ops to get and put the parent structure.
+ */
+struct dma_fence_work_timeline_ops {
+	/**
+	 * Timeline ops name. Used if the timeline itself has no name.
+	 */
+	const char *name;
+
+	/**
+	 * put() - Put the structure embedding the timeline
+	 * @tl: The timeline
+	 */
+	void (*put)(struct dma_fence_work_timeline *tl);
+
+	/**
+	 * get() - Get the structure embedding the timeline
+	 * @tl: The timeline
+	 */
+	void (*get)(struct dma_fence_work_timeline *tl);
+};
+
+/**
+ * struct dma_fence_work_timeline - Simple timeline struct for dma_fence_work
+ * @name: The name of the timeline. May be set to NULL. Immutable.
+ * @lock: Protects mutable members of the structure.
+ * @context: The timeline fence context. Immutable.
+ * @seqno: The previous seqno used. Protected by @lock.
+ * @last_fence: The previous fence of the timeline. Protected by @lock.
+ * @ops: The timeline operations struct. Immutable.
+ */
+struct dma_fence_work_timeline {
+	const char *name;
+	/** Protects mutable members of the structure */
+	spinlock_t lock;
+	u64 context;
+	u64 seqno;
+	struct dma_fence *last_fence;
+	const struct dma_fence_work_timeline_ops *ops;
+};
 
 struct dma_fence_work_ops {
 	const char *name;
@@ -30,6 +77,9 @@ struct dma_fence_work {
 	struct i915_sw_dma_fence_cb cb;
 
 	struct work_struct work;
+
+	struct dma_fence_work_timeline *tl;
+
 	const struct dma_fence_work_ops *ops;
 };
 
@@ -65,4 +115,12 @@ static inline void dma_fence_work_commit_imm(struct dma_fence_work *f)
 	dma_fence_work_commit(f);
 }
 
+void dma_fence_work_timeline_attach(struct dma_fence_work_timeline *tl,
+				    struct dma_fence_work *f,
+				    struct i915_sw_dma_fence_cb *tl_cb);
+
+void dma_fence_work_timeline_init(struct dma_fence_work_timeline *tl,
+				  const char *name,
+				  const struct dma_fence_work_timeline_ops *ops);
+
 #endif /* I915_SW_FENCE_WORK_H */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/6] drm/i915/ttm: Attach the migration fence to a region timeline on eviction
  2021-10-08 13:35 ` [Intel-gfx] " Thomas Hellström
@ 2021-10-08 13:35   ` Thomas Hellström
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-08 13:35 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: maarten.lankhorst, matthew.auld, Thomas Hellström

On eviction, TTM requires that migration fences from the same region are
ordered using dma_fence_is_later(). For request-based fences we would
therefore need to use the same context for the migration blit. But since
we now use a dma_fence_work for error recovery, and in addition might
need to coalesce the migration fence with async unbind fences, create a
coalesce fence for this purpose.

Chain the coalesce fence on the migration fence and attach it to a region
timeline.
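
As a sketch of the resulting ordering property, with "f1"/"f2" and
"blit_fence1"/"blit_fence2" as placeholders for the coalesce and
migration fences of two subsequent evictions from the same region "mr"
(error paths elided):

	f1 = i915_ttm_coalesce_fence(blit_fence1, mr);
	f2 = i915_ttm_coalesce_fence(blit_fence2, mr);

	/*
	 * Both coalesce fences are attached to mr->tl and thus share a
	 * fence context with increasing seqnos, so TTM can order them
	 * with dma_fence_is_later(f2, f1).
	 */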

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c    | 84 ++++++++++++++++++----
 drivers/gpu/drm/i915/intel_memory_region.c | 43 +++++++++++
 drivers/gpu/drm/i915/intel_memory_region.h |  7 ++
 3 files changed, 119 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 79d4d50aa4e5..625ce52e8662 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -672,9 +672,10 @@ static void __i915_ttm_move_fallback(struct ttm_buffer_object *bo, bool clear,
 	}
 }
 
-static int __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
-			   struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm,
-			   struct i915_refct_sgt *dst_rsgt, bool allow_accel)
+static struct dma_fence *
+__i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
+		struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm,
+		struct i915_refct_sgt *dst_rsgt, bool allow_accel)
 {
 	struct i915_ttm_memcpy_work *copy_work;
 	struct dma_fence *fence;
@@ -689,7 +690,7 @@ static int __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 		/* Don't fail with -ENOMEM. Move sync instead. */
 		__i915_ttm_move_fallback(bo, clear, dst_mem, dst_ttm, dst_rsgt,
 					 allow_accel);
-		return 0;
+		return NULL;
 	}
 
 	dma_fence_work_init(&copy_work->base, &i915_ttm_memcpy_ops);
@@ -714,14 +715,45 @@ static int __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 	fence = dma_fence_get(&copy_work->base.dma);
 	dma_fence_work_commit_imm(&copy_work->base);
 
-	/*
-	 * We're synchronizing here for now. For async moves, return the
-	 * fence.
-	 */
-	dma_fence_wait(fence, false);
-	dma_fence_put(fence);
+	return fence;
+}
 
-	return ret;
+/**
+ * struct i915_coalesce_fence - A dma-fence used to coalesce multiple fences
+ * similar to struct dma_fence_array, and at the same time being timeline-
+ * attached.
+ * @base: struct dma_fence_work base.
+ * @cb: Callback for timeline attachment.
+ */
+struct i915_coalesce_fence {
+	struct dma_fence_work base;
+	struct i915_sw_dma_fence_cb cb;
+};
+
+/* No .work or .release callback. Just coalescing. */
+static const struct dma_fence_work_ops i915_coalesce_fence_ops = {
+	.name = "Coalesce fence",
+};
+
+static struct dma_fence *
+i915_ttm_coalesce_fence(struct dma_fence *fence, struct intel_memory_region *mr)
+{
+	struct i915_coalesce_fence *coalesce =
+		kmalloc(sizeof(*coalesce), GFP_KERNEL);
+
+	if (!coalesce) {
+		dma_fence_wait(fence, false);
+		dma_fence_put(fence);
+		return NULL;
+	}
+
+	dma_fence_work_init(&coalesce->base, &i915_coalesce_fence_ops);
+	dma_fence_work_chain(&coalesce->base, fence);
+	dma_fence_work_timeline_attach(&mr->tl, &coalesce->base, &coalesce->cb);
+	dma_fence_get(&coalesce->base.dma);
+	dma_fence_work_commit_imm(&coalesce->base);
+	dma_fence_put(fence);
+	return &coalesce->base.dma;
 }
 
 static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
@@ -734,6 +766,7 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 		ttm_manager_type(bo->bdev, dst_mem->mem_type);
 	struct ttm_tt *ttm = bo->ttm;
 	struct i915_refct_sgt *dst_rsgt;
+	struct dma_fence *fence = NULL;
 	bool clear;
 	int ret;
 
@@ -765,7 +798,23 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 
 	clear = !cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
 	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)))
-		__i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_rsgt, true);
+		fence = __i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_rsgt, true);
+	if (fence && evict) {
+		struct intel_memory_region *mr =
+			i915_ttm_region(bo->bdev, bo->resource->mem_type);
+
+		/*
+		 * Attach to the region timeline, since future async unbind
+		 * requires a timeline. Future async unbind fences can also be
+		 * attached here.
+		 */
+		fence = i915_ttm_coalesce_fence(fence, mr);
+	}
+
+	if (fence) {
+		dma_fence_wait(fence, false);
+		dma_fence_put(fence);
+	}
 
 	ttm_bo_move_sync_cleanup(bo, dst_mem);
 	i915_ttm_adjust_domains_after_move(obj);
@@ -1223,6 +1272,7 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 		.interruptible = intr,
 	};
 	struct i915_refct_sgt *dst_rsgt;
+	struct dma_fence *fence;
 	int ret;
 
 	assert_object_held(dst);
@@ -1238,10 +1288,14 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 		return ret;
 
 	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
-	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
-			dst_rsgt, allow_accel);
-
+	fence = __i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
+				dst_rsgt, allow_accel);
 	i915_refct_sgt_put(dst_rsgt);
 
+	if (fence) {
+		dma_fence_wait(fence, false);
+		dma_fence_put(fence);
+	}
+
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index e7f7e6627750..aa1733e840f7 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -7,6 +7,9 @@
 #include "i915_drv.h"
 #include "i915_ttm_buddy_manager.h"
 
+static const struct dma_fence_work_timeline_ops tl_ops;
+static void intel_region_timeline_release_work(struct work_struct *work);
+
 static const struct {
 	u16 class;
 	u16 instance;
@@ -127,6 +130,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
 	}
 
 	kref_init(&mem->kref);
+
+	INIT_WORK(&mem->tl_put_work, intel_region_timeline_release_work);
+	dma_fence_work_timeline_init(&mem->tl, NULL, &tl_ops);
+
 	return mem;
 
 err_free:
@@ -238,6 +245,42 @@ void intel_memory_regions_driver_release(struct drm_i915_private *i915)
 	}
 }
 
+static void intel_region_timeline_get(struct dma_fence_work_timeline *tl)
+{
+	struct intel_memory_region *mr = container_of(tl, typeof(*mr), tl);
+
+	intel_memory_region_get(mr);
+}
+
+static void intel_region_timeline_release_work(struct work_struct *work)
+{
+	struct intel_memory_region *mr =
+		container_of(work, typeof(*mr), tl_put_work);
+
+	__intel_memory_region_destroy(&mr->kref);
+}
+
+static void intel_region_timeline_release(struct kref *ref)
+{
+	struct intel_memory_region *mr = container_of(ref, typeof(*mr), kref);
+
+	/* May be called from hardirq context, so queue the final release. */
+	queue_work(system_unbound_wq, &mr->tl_put_work);
+}
+
+static void intel_region_timeline_put(struct dma_fence_work_timeline *tl)
+{
+	struct intel_memory_region *mr = container_of(tl, typeof(*mr), tl);
+
+	kref_put(&mr->kref, intel_region_timeline_release);
+}
+
+static const struct dma_fence_work_timeline_ops tl_ops = {
+	.name = "Region timeline",
+	.get = intel_region_timeline_get,
+	.put = intel_region_timeline_put,
+};
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/intel_memory_region.c"
 #include "selftests/mock_region.c"
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 3feae3353d33..928819e2edba 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -13,6 +13,8 @@
 #include <drm/drm_mm.h>
 #include <drm/i915_drm.h>
 
+#include "i915_sw_fence_work.h"
+
 struct drm_i915_private;
 struct drm_i915_gem_object;
 struct drm_printer;
@@ -94,6 +96,11 @@ struct intel_memory_region {
 	bool is_range_manager;
 
 	void *region_private;
+
+	/** Timeline for TTM eviction fences */
+	struct dma_fence_work_timeline tl;
+	/** Work struct for _region_put() from atomic / irq context */
+	struct work_struct tl_put_work;
 };
 
 struct intel_memory_region *
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 6/6] drm/i915: Use irq work for coalescing-only dma-fence-work
  2021-10-08 13:35 ` [Intel-gfx] " Thomas Hellström
@ 2021-10-08 13:35   ` Thomas Hellström
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-08 13:35 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: maarten.lankhorst, matthew.auld, Thomas Hellström

We are using a timeline-attached struct dma_fence_work to coalesce
dma-fences on eviction. In this mode no work callback is attached.
Similar to how the dma-fence-chain and dma-fence-array containers
handle signalling, use irq work to signal the fence, reducing latency.
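
An aside for readers, not part of the patch: the underlying pattern is that
dma_fence_signal() is hardirq-safe, so a fence that has nothing to run besides
signalling can be completed from irq_work rather than bouncing through the
system workqueue. A stand-alone, hypothetical sketch; struct quick_fence and
the quick_fence_* helpers are not part of i915.

#include <linux/dma-fence.h>
#include <linux/irq_work.h>
#include <linux/spinlock.h>

/* Hypothetical stand-alone fence that is signalled from irq_work. */
struct quick_fence {
	struct dma_fence fence;
	spinlock_t lock;
	struct irq_work irq_work;
};

static const char *quick_fence_name(struct dma_fence *fence)
{
	return "quick_fence";
}

static const struct dma_fence_ops quick_fence_ops = {
	.get_driver_name = quick_fence_name,
	.get_timeline_name = quick_fence_name,
};

static void quick_fence_irq_work(struct irq_work *work)
{
	struct quick_fence *qf = container_of(work, typeof(*qf), irq_work);

	/* dma_fence_signal() takes the fence lock with irqsave, so this is
	 * safe in hardirq context and avoids a workqueue round trip.
	 */
	dma_fence_signal(&qf->fence);
	dma_fence_put(&qf->fence);	/* drops the ref taken on completion */
}

static void quick_fence_init(struct quick_fence *qf, u64 context, u64 seqno)
{
	spin_lock_init(&qf->lock);
	dma_fence_init(&qf->fence, &quick_fence_ops, &qf->lock, context, seqno);
	init_irq_work(&qf->irq_work, quick_fence_irq_work);
}

/* Completion path, possibly called from an interrupt handler. */
static void quick_fence_complete(struct quick_fence *qf)
{
	dma_fence_get(&qf->fence);
	irq_work_queue(&qf->irq_work);
}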

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_sw_fence_work.c | 36 ++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_sw_fence_work.h |  2 ++
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index 87cdb3158042..4573f537ada4 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -32,16 +32,17 @@ void dma_fence_work_timeline_attach(struct dma_fence_work_timeline *tl,
 {
 	struct dma_fence *await;
 
+	might_sleep();
 	if (tl->ops->get)
 		tl->ops->get(tl);
 
-	spin_lock(&tl->lock);
+	spin_lock_irq(&tl->lock);
 	await = tl->last_fence;
 	tl->last_fence = dma_fence_get(&f->dma);
 	f->dma.seqno = tl->seqno++;
 	f->dma.context = tl->context;
 	f->tl = tl;
-	spin_unlock(&tl->lock);
+	spin_unlock_irq(&tl->lock);
 
 	if (await) {
 		__i915_sw_fence_await_dma_fence(&f->chain, await, tl_cb);
@@ -53,13 +54,14 @@ static void dma_fence_work_timeline_detach(struct dma_fence_work *f)
 {
 	struct dma_fence_work_timeline *tl = f->tl;
 	bool put = false;
+	unsigned long irq_flags;
 
-	spin_lock(&tl->lock);
+	spin_lock_irqsave(&tl->lock, irq_flags);
 	if (tl->last_fence == &f->dma) {
 		put = true;
 		tl->last_fence = NULL;
 	}
-	spin_unlock(&tl->lock);
+	spin_unlock_irqrestore(&tl->lock, irq_flags);
 	if (tl->ops->put)
 		tl->ops->put(tl);
 	if (put)
@@ -68,8 +70,6 @@ static void dma_fence_work_timeline_detach(struct dma_fence_work *f)
 
 static void dma_fence_work_complete(struct dma_fence_work *f)
 {
-	dma_fence_signal(&f->dma);
-
 	if (f->ops->release)
 		f->ops->release(f);
 
@@ -79,13 +79,32 @@ static void dma_fence_work_complete(struct dma_fence_work *f)
 	dma_fence_put(&f->dma);
 }
 
+static void dma_fence_work_irq_work(struct irq_work *irq_work)
+{
+	struct dma_fence_work *f = container_of(irq_work, typeof(*f), irq_work);
+
+	dma_fence_signal(&f->dma);
+	if (f->ops->release)
+		/* Note we take the signaled path in dma_fence_work_work() */
+		queue_work(system_unbound_wq, &f->work);
+	else
+		dma_fence_work_complete(f);
+}
+
 static void dma_fence_work_work(struct work_struct *work)
 {
 	struct dma_fence_work *f = container_of(work, typeof(*f), work);
 
+	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &f->dma.flags)) {
+		dma_fence_work_complete(f);
+		return;
+	}
+
 	if (f->ops->work)
 		f->ops->work(f);
 
+	dma_fence_signal(&f->dma);
+
 	dma_fence_work_complete(f);
 }
 
@@ -102,8 +121,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 		dma_fence_get(&f->dma);
 		if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
 			dma_fence_work_work(&f->work);
-		else
+		else if (f->ops->work)
 			queue_work(system_unbound_wq, &f->work);
+		else
+			irq_work_queue(&f->irq_work);
 		break;
 
 	case FENCE_FREE:
@@ -155,6 +176,7 @@ void dma_fence_work_init(struct dma_fence_work *f,
 	dma_fence_init(&f->dma, &fence_ops, &f->lock, 0, 0);
 	i915_sw_fence_init(&f->chain, fence_notify);
 	INIT_WORK(&f->work, dma_fence_work_work);
+	init_irq_work(&f->irq_work, dma_fence_work_irq_work);
 }
 
 int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal)
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
index 6f41ee360133..c412bb4cb288 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
@@ -8,6 +8,7 @@
 #define I915_SW_FENCE_WORK_H
 
 #include <linux/dma-fence.h>
+#include <linux/irq_work.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
 
@@ -77,6 +78,7 @@ struct dma_fence_work {
 	struct i915_sw_dma_fence_cb cb;
 
 	struct work_struct work;
+	struct irq_work irq_work;
 
 	struct dma_fence_work_timeline *tl;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: Failsafe migration blits
  2021-10-08 13:35 ` [Intel-gfx] " Thomas Hellström
                   ` (6 preceding siblings ...)
  (?)
@ 2021-10-08 17:00 ` Patchwork
  -1 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2021-10-08 17:00 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Failsafe migration blits
URL   : https://patchwork.freedesktop.org/series/95617/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
-
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:32:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:32:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:56:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:56:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_reset.c:1392:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block
+drivers/gpu/drm/i915/i915_perf.c:1442:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1496:15: warning: memset with byte count of 16777216
+./include/asm-generic/bitops/find.h:112:45: warning: shift count is negative (-262080)
+./include/asm-generic/bitops/find.h:32:31: warning: shift count is negative (-262080)
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'gen6_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'gen6_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:418:9: warning: context imbalance in 'gen6_write8' - different lock contexts for basic block



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Failsafe migration blits
  2021-10-08 13:35 ` [Intel-gfx] " Thomas Hellström
                   ` (7 preceding siblings ...)
  (?)
@ 2021-10-08 17:29 ` Patchwork
  -1 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2021-10-08 17:29 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 4578 bytes --]

== Series Details ==

Series: drm/i915: Failsafe migration blits
URL   : https://patchwork.freedesktop.org/series/95617/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10700 -> Patchwork_21293
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/index.html

Known issues
------------

  Here are the changes found in Patchwork_21293 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
    - fi-tgl-u2:          NOTRUN -> [SKIP][1] ([i915#2190])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/fi-tgl-u2/igt@gem_huc_copy@huc-copy.html

  * igt@kms_chamelium@dp-hpd-fast:
    - fi-tgl-u2:          NOTRUN -> [SKIP][2] ([fdo#109284] / [fdo#111827]) +8 similar issues
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/fi-tgl-u2/igt@kms_chamelium@dp-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-tgl-u2:          NOTRUN -> [SKIP][3] ([i915#4103]) +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/fi-tgl-u2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-tgl-u2:          NOTRUN -> [SKIP][4] ([fdo#109285])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/fi-tgl-u2/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cml-u2:          [PASS][5] -> [DMESG-WARN][6] ([i915#4269])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html

  * igt@prime_vgem@basic-userptr:
    - fi-tgl-u2:          NOTRUN -> [SKIP][7] ([i915#3301])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/fi-tgl-u2/igt@prime_vgem@basic-userptr.html

  * igt@runner@aborted:
    - fi-bdw-5557u:       NOTRUN -> [FAIL][8] ([i915#1602] / [i915#2029])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/fi-bdw-5557u/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-a:
    - {fi-tgl-dsi}:       [DMESG-WARN][9] ([i915#1982]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/fi-tgl-dsi/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-a.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/fi-tgl-dsi/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-a.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1602]: https://gitlab.freedesktop.org/drm/intel/issues/1602
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2029]: https://gitlab.freedesktop.org/drm/intel/issues/2029
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4269]: https://gitlab.freedesktop.org/drm/intel/issues/4269


Participating hosts (40 -> 38)
------------------------------

  Additional (1): fi-tgl-u2 
  Missing    (3): fi-ilk-m540 fi-bsw-cyan fi-hsw-4200u 


Build changes
-------------

  * Linux: CI_DRM_10700 -> Patchwork_21293

  CI-20190529: 20190529
  CI_DRM_10700: 6ecdd5e29c83cd8fc191f8cce5c283eefb53c97e @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6240: b232a092b9e1b10a8be13601acaa440903b226bc @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_21293: c5653fd415d13fb18f1941ee25d8d17bd84fa232 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

c5653fd415d1 drm/i915: Use irq work for coalescing-only dma-fence-work
d38939721ed8 drm/i915/ttm: Attach the migration fence to a region timeline on eviction
dfed5050f265 drm/i915: Add a struct dma_fence_work timeline
67524b4cd58a drm/i915/ttm: Failsafe migration blits
7b60ee40dcf3 drm/i915: Introduce refcounted sg-tables
1d852143d213 drm/i915: Update dma_fence_work

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/index.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: Failsafe migration blits
  2021-10-08 13:35 ` [Intel-gfx] " Thomas Hellström
                   ` (8 preceding siblings ...)
  (?)
@ 2021-10-09  0:04 ` Patchwork
  -1 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2021-10-09  0:04 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 30257 bytes --]

== Series Details ==

Series: drm/i915: Failsafe migration blits
URL   : https://patchwork.freedesktop.org/series/95617/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10700_full -> Patchwork_21293_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_21293_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21293_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_21293_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_workarounds@reset:
    - shard-snb:          NOTRUN -> [TIMEOUT][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-snb2/igt@gem_workarounds@reset.html

  * igt@i915_pm_dc@dc9-dpms:
    - shard-tglb:         NOTRUN -> [SKIP][2]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-tglb3/igt@i915_pm_dc@dc9-dpms.html

  
Known issues
------------

  Here are the changes found in Patchwork_21293_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_isolation@preservation-s3@rcs0:
    - shard-apl:          NOTRUN -> [DMESG-WARN][3] ([i915#180]) +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl6/igt@gem_ctx_isolation@preservation-s3@rcs0.html

  * igt@gem_ctx_isolation@preservation-s3@vcs0:
    - shard-kbl:          [PASS][4] -> [DMESG-WARN][5] ([i915#180]) +2 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-kbl6/igt@gem_ctx_isolation@preservation-s3@vcs0.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl6/igt@gem_ctx_isolation@preservation-s3@vcs0.html

  * igt@gem_ctx_persistence@legacy-engines-queued:
    - shard-snb:          NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#1099])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-snb7/igt@gem_ctx_persistence@legacy-engines-queued.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-apl:          NOTRUN -> [FAIL][7] ([i915#2846])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl3/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-none-solo@rcs0:
    - shard-kbl:          NOTRUN -> [FAIL][8] ([i915#2842])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl1/igt@gem_exec_fair@basic-none-solo@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs0:
    - shard-apl:          [PASS][9] -> [FAIL][10] ([i915#2842])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-apl2/igt@gem_exec_fair@basic-none@vcs0.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl3/igt@gem_exec_fair@basic-none@vcs0.html

  * igt@gem_exec_fair@basic-none@vecs0:
    - shard-kbl:          [PASS][11] -> [FAIL][12] ([i915#2842])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-kbl7/igt@gem_exec_fair@basic-none@vecs0.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl7/igt@gem_exec_fair@basic-none@vecs0.html
    - shard-apl:          [PASS][13] -> [FAIL][14] ([i915#2842] / [i915#3468])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-apl2/igt@gem_exec_fair@basic-none@vecs0.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl3/igt@gem_exec_fair@basic-none@vecs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-glk:          [PASS][15] -> [FAIL][16] ([i915#2842]) +1 similar issue
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-glk6/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk5/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-iclb:         NOTRUN -> [FAIL][17] ([i915#2842])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb8/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_exec_flush@basic-uc-rw-default:
    - shard-skl:          [PASS][18] -> [DMESG-WARN][19] ([i915#1982])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-skl2/igt@gem_exec_flush@basic-uc-rw-default.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl2/igt@gem_exec_flush@basic-uc-rw-default.html

  * igt@gem_exec_whisper@basic-queues-priority:
    - shard-iclb:         [PASS][20] -> [INCOMPLETE][21] ([i915#1895])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-iclb6/igt@gem_exec_whisper@basic-queues-priority.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb1/igt@gem_exec_whisper@basic-queues-priority.html

  * igt@gem_huc_copy@huc-copy:
    - shard-apl:          NOTRUN -> [SKIP][22] ([fdo#109271] / [i915#2190])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl6/igt@gem_huc_copy@huc-copy.html

  * igt@gem_pwrite@basic-exhaustion:
    - shard-snb:          NOTRUN -> [WARN][23] ([i915#2658])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-snb2/igt@gem_pwrite@basic-exhaustion.html
    - shard-apl:          NOTRUN -> [WARN][24] ([i915#2658])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl8/igt@gem_pwrite@basic-exhaustion.html
    - shard-glk:          NOTRUN -> [WARN][25] ([i915#2658])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk7/igt@gem_pwrite@basic-exhaustion.html

  * igt@gem_pxp@reject-modify-context-protection-off-2:
    - shard-tglb:         NOTRUN -> [SKIP][26] ([i915#4270])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-tglb3/igt@gem_pxp@reject-modify-context-protection-off-2.html
    - shard-iclb:         NOTRUN -> [SKIP][27] ([i915#4270])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb8/igt@gem_pxp@reject-modify-context-protection-off-2.html

  * igt@gem_render_copy@yf-tiled-mc-ccs-to-vebox-y-tiled:
    - shard-glk:          NOTRUN -> [SKIP][28] ([fdo#109271]) +36 similar issues
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk7/igt@gem_render_copy@yf-tiled-mc-ccs-to-vebox-y-tiled.html

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-kbl:          NOTRUN -> [SKIP][29] ([fdo#109271] / [i915#3323])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl4/igt@gem_userptr_blits@dmabuf-sync.html

  * igt@gen9_exec_parse@batch-invalid-length:
    - shard-iclb:         NOTRUN -> [SKIP][30] ([i915#2856])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb8/igt@gen9_exec_parse@batch-invalid-length.html

  * igt@i915_pm_dc@dc9-dpms:
    - shard-iclb:         NOTRUN -> [FAIL][31] ([i915#4275])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb8/igt@i915_pm_dc@dc9-dpms.html

  * igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-dp:
    - shard-apl:          NOTRUN -> [SKIP][32] ([fdo#109271] / [i915#1937])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl3/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-dp.html

  * igt@i915_suspend@sysfs-reader:
    - shard-skl:          [PASS][33] -> [INCOMPLETE][34] ([i915#146] / [i915#198])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-skl7/igt@i915_suspend@sysfs-reader.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl9/igt@i915_suspend@sysfs-reader.html

  * igt@kms_big_fb@x-tiled-32bpp-rotate-180:
    - shard-glk:          NOTRUN -> [DMESG-WARN][35] ([i915#118])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk7/igt@kms_big_fb@x-tiled-32bpp-rotate-180.html

  * igt@kms_big_fb@x-tiled-64bpp-rotate-270:
    - shard-iclb:         NOTRUN -> [SKIP][36] ([fdo#110725] / [fdo#111614])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb8/igt@kms_big_fb@x-tiled-64bpp-rotate-270.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip:
    - shard-apl:          NOTRUN -> [SKIP][37] ([fdo#109271] / [i915#3777]) +2 similar issues
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl3/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip.html
    - shard-kbl:          NOTRUN -> [SKIP][38] ([fdo#109271] / [i915#3777]) +1 similar issue
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl1/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip:
    - shard-kbl:          NOTRUN -> [SKIP][39] ([fdo#109271]) +131 similar issues
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl4/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0:
    - shard-apl:          NOTRUN -> [SKIP][40] ([fdo#109271]) +277 similar issues
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl3/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0.html

  * igt@kms_ccs@pipe-a-ccs-on-another-bo-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][41] ([fdo#109271] / [i915#3886]) +14 similar issues
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl3/igt@kms_ccs@pipe-a-ccs-on-another-bo-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_rc_ccs_cc:
    - shard-skl:          NOTRUN -> [SKIP][42] ([fdo#109271] / [i915#3886]) +1 similar issue
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl8/igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-b-bad-pixel-format-y_tiled_ccs:
    - shard-snb:          NOTRUN -> [SKIP][43] ([fdo#109271]) +218 similar issues
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-snb2/igt@kms_ccs@pipe-b-bad-pixel-format-y_tiled_ccs.html

  * igt@kms_ccs@pipe-c-crc-primary-rotation-180-y_tiled_gen12_mc_ccs:
    - shard-kbl:          NOTRUN -> [SKIP][44] ([fdo#109271] / [i915#3886]) +6 similar issues
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl1/igt@kms_ccs@pipe-c-crc-primary-rotation-180-y_tiled_gen12_mc_ccs.html

  * igt@kms_chamelium@hdmi-mode-timings:
    - shard-snb:          NOTRUN -> [SKIP][45] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-snb7/igt@kms_chamelium@hdmi-mode-timings.html

  * igt@kms_chamelium@vga-hpd:
    - shard-apl:          NOTRUN -> [SKIP][46] ([fdo#109271] / [fdo#111827]) +22 similar issues
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl7/igt@kms_chamelium@vga-hpd.html

  * igt@kms_color_chamelium@pipe-a-ctm-negative:
    - shard-kbl:          NOTRUN -> [SKIP][47] ([fdo#109271] / [fdo#111827]) +7 similar issues
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl4/igt@kms_color_chamelium@pipe-a-ctm-negative.html

  * igt@kms_color_chamelium@pipe-b-ctm-0-5:
    - shard-glk:          NOTRUN -> [SKIP][48] ([fdo#109271] / [fdo#111827]) +2 similar issues
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk7/igt@kms_color_chamelium@pipe-b-ctm-0-5.html

  * igt@kms_color_chamelium@pipe-d-ctm-red-to-blue:
    - shard-skl:          NOTRUN -> [SKIP][49] ([fdo#109271] / [fdo#111827]) +5 similar issues
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl8/igt@kms_color_chamelium@pipe-d-ctm-red-to-blue.html

  * igt@kms_content_protection@legacy:
    - shard-kbl:          NOTRUN -> [TIMEOUT][50] ([i915#1319])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl1/igt@kms_content_protection@legacy.html
    - shard-apl:          NOTRUN -> [TIMEOUT][51] ([i915#1319])
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl3/igt@kms_content_protection@legacy.html

  * igt@kms_content_protection@uevent:
    - shard-kbl:          NOTRUN -> [FAIL][52] ([i915#2105])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl4/igt@kms_content_protection@uevent.html

  * igt@kms_cursor_crc@pipe-b-cursor-32x32-sliding:
    - shard-iclb:         NOTRUN -> [SKIP][53] ([fdo#109278]) +4 similar issues
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb8/igt@kms_cursor_crc@pipe-b-cursor-32x32-sliding.html

  * igt@kms_cursor_crc@pipe-d-cursor-512x170-rapid-movement:
    - shard-tglb:         NOTRUN -> [SKIP][54] ([i915#3359])
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-tglb3/igt@kms_cursor_crc@pipe-d-cursor-512x170-rapid-movement.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions:
    - shard-skl:          [PASS][55] -> [FAIL][56] ([i915#2346])
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-skl7/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl7/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-kbl:          [PASS][57] -> [INCOMPLETE][58] ([i915#180] / [i915#636])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-kbl4/igt@kms_fbcon_fbt@fbc-suspend.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl7/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@bc-hdmi-a1-hdmi-a2:
    - shard-glk:          [PASS][59] -> [FAIL][60] ([i915#79])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-glk1/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@bc-hdmi-a1-hdmi-a2.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk4/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@bc-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@flip-vs-suspend@c-dp1:
    - shard-kbl:          NOTRUN -> [DMESG-WARN][61] ([i915#180]) +2 similar issues
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl1/igt@kms_flip@flip-vs-suspend@c-dp1.html

  * igt@kms_flip@plain-flip-ts-check-interruptible@b-hdmi-a1:
    - shard-glk:          [PASS][62] -> [FAIL][63] ([i915#2122])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-glk5/igt@kms_flip@plain-flip-ts-check-interruptible@b-hdmi-a1.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk3/igt@kms_flip@plain-flip-ts-check-interruptible@b-hdmi-a1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile:
    - shard-iclb:         [PASS][64] -> [SKIP][65] ([i915#3701])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-iclb1/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb2/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-16bpp-ytile:
    - shard-skl:          NOTRUN -> [INCOMPLETE][66] ([i915#3699])
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl8/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-16bpp-ytile.html

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-pwrite:
    - shard-glk:          [PASS][67] -> [FAIL][68] ([i915#1888] / [i915#2546])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-glk5/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-pwrite.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk3/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-pwrite.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-draw-render:
    - shard-iclb:         NOTRUN -> [SKIP][69] ([fdo#109280])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb8/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-draw-render.html
    - shard-tglb:         NOTRUN -> [SKIP][70] ([fdo#111825])
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-tglb3/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-draw-render.html

  * igt@kms_hdr@bpc-switch-dpms:
    - shard-skl:          [PASS][71] -> [FAIL][72] ([i915#1188]) +1 similar issue
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-skl9/igt@kms_hdr@bpc-switch-dpms.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl3/igt@kms_hdr@bpc-switch-dpms.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - shard-apl:          NOTRUN -> [SKIP][73] ([fdo#109271] / [i915#533]) +3 similar issues
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl1/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@kms_pipe_crc_basic@disable-crc-after-crtc-pipe-d:
    - shard-glk:          NOTRUN -> [SKIP][74] ([fdo#109271] / [i915#533])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk7/igt@kms_pipe_crc_basic@disable-crc-after-crtc-pipe-d.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-d:
    - shard-kbl:          NOTRUN -> [SKIP][75] ([fdo#109271] / [i915#533])
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl1/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-d.html

  * igt@kms_plane@pixel-format-source-clamping@pipe-a-planes:
    - shard-skl:          NOTRUN -> [SKIP][76] ([fdo#109271]) +100 similar issues
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl8/igt@kms_plane@pixel-format-source-clamping@pipe-a-planes.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb:
    - shard-apl:          NOTRUN -> [FAIL][77] ([i915#265])
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl2/igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-basic:
    - shard-glk:          NOTRUN -> [FAIL][78] ([fdo#108145] / [i915#265])
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk7/igt@kms_plane_alpha_blend@pipe-c-alpha-basic.html
    - shard-apl:          NOTRUN -> [FAIL][79] ([fdo#108145] / [i915#265]) +1 similar issue
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl8/igt@kms_plane_alpha_blend@pipe-c-alpha-basic.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [PASS][80] -> [FAIL][81] ([fdo#108145] / [i915#265]) +1 similar issue
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-skl1/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl5/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-2:
    - shard-apl:          NOTRUN -> [SKIP][82] ([fdo#109271] / [i915#658]) +4 similar issues
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl2/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-2.html
    - shard-skl:          NOTRUN -> [SKIP][83] ([fdo#109271] / [i915#658])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl8/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-2.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-5:
    - shard-kbl:          NOTRUN -> [SKIP][84] ([fdo#109271] / [i915#658]) +1 similar issue
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl4/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-5.html

  * igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-1:
    - shard-glk:          NOTRUN -> [SKIP][85] ([fdo#109271] / [i915#658])
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk7/igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-1.html

  * igt@kms_psr@psr2_primary_mmap_cpu:
    - shard-iclb:         [PASS][86] -> [SKIP][87] ([fdo#109441]) +4 similar issues
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-iclb2/igt@kms_psr@psr2_primary_mmap_cpu.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb5/igt@kms_psr@psr2_primary_mmap_cpu.html

  * igt@kms_setmode@clone-exclusive-crtc:
    - shard-skl:          NOTRUN -> [WARN][88] ([i915#2100])
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl8/igt@kms_setmode@clone-exclusive-crtc.html

  * igt@perf@short-reads:
    - shard-skl:          [PASS][89] -> [FAIL][90] ([i915#51])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-skl4/igt@perf@short-reads.html
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl2/igt@perf@short-reads.html

  * igt@sysfs_clients@recycle:
    - shard-skl:          NOTRUN -> [SKIP][91] ([fdo#109271] / [i915#2994])
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl8/igt@sysfs_clients@recycle.html

  * igt@sysfs_clients@recycle-many:
    - shard-apl:          NOTRUN -> [SKIP][92] ([fdo#109271] / [i915#2994]) +2 similar issues
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl8/igt@sysfs_clients@recycle-many.html
    - shard-glk:          NOTRUN -> [SKIP][93] ([fdo#109271] / [i915#2994])
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk7/igt@sysfs_clients@recycle-many.html

  * igt@sysfs_clients@split-50:
    - shard-kbl:          NOTRUN -> [SKIP][94] ([fdo#109271] / [i915#2994]) +3 similar issues
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl4/igt@sysfs_clients@split-50.html

  
#### Possible fixes ####

  * igt@gem_eio@in-flight-contexts-1us:
    - shard-iclb:         [TIMEOUT][95] ([i915#3070]) -> [PASS][96]
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-iclb8/igt@gem_eio@in-flight-contexts-1us.html
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb7/igt@gem_eio@in-flight-contexts-1us.html

  * igt@gem_eio@in-flight-contexts-immediate:
    - shard-apl:          [TIMEOUT][97] ([i915#3063]) -> [PASS][98]
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-apl7/igt@gem_eio@in-flight-contexts-immediate.html
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl1/igt@gem_eio@in-flight-contexts-immediate.html

  * igt@gem_exec_fair@basic-flow@rcs0:
    - shard-tglb:         [FAIL][99] ([i915#2842]) -> [PASS][100] +3 similar issues
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-tglb1/igt@gem_exec_fair@basic-flow@rcs0.html
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-tglb1/igt@gem_exec_fair@basic-flow@rcs0.html
    - shard-glk:          [FAIL][101] ([i915#2842]) -> [PASS][102] +1 similar issue
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-glk7/igt@gem_exec_fair@basic-flow@rcs0.html
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk8/igt@gem_exec_fair@basic-flow@rcs0.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-kbl:          [FAIL][103] ([i915#2842]) -> [PASS][104]
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-kbl1/igt@gem_exec_fair@basic-pace-solo@rcs0.html
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl1/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_huc_copy@huc-copy:
    - shard-tglb:         [SKIP][105] ([i915#2190]) -> [PASS][106]
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-tglb6/igt@gem_huc_copy@huc-copy.html
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-tglb1/igt@gem_huc_copy@huc-copy.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-skl:          [DMESG-WARN][107] ([i915#1436] / [i915#716]) -> [PASS][108]
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-skl8/igt@gen9_exec_parse@allowed-single.html
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl7/igt@gen9_exec_parse@allowed-single.html

  * igt@i915_pm_dc@dc6-dpms:
    - shard-iclb:         [FAIL][109] ([i915#454]) -> [PASS][110]
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-iclb3/igt@i915_pm_dc@dc6-dpms.html
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb4/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_rpm@system-suspend:
    - shard-skl:          [INCOMPLETE][111] ([i915#151]) -> [PASS][112]
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-skl10/igt@i915_pm_rpm@system-suspend.html
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl5/igt@i915_pm_rpm@system-suspend.html

  * igt@kms_async_flips@alternate-sync-async-flip:
    - shard-skl:          [FAIL][113] ([i915#2521]) -> [PASS][114]
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-skl8/igt@kms_async_flips@alternate-sync-async-flip.html
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl9/igt@kms_async_flips@alternate-sync-async-flip.html

  * igt@kms_big_fb@yf-tiled-16bpp-rotate-0:
    - shard-glk:          [DMESG-WARN][115] ([i915#118]) -> [PASS][116]
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-glk2/igt@kms_big_fb@yf-tiled-16bpp-rotate-0.html
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk8/igt@kms_big_fb@yf-tiled-16bpp-rotate-0.html

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
    - shard-kbl:          [DMESG-WARN][117] ([i915#180]) -> [PASS][118] +3 similar issues
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-kbl6/igt@kms_cursor_crc@pipe-a-cursor-suspend.html
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-kbl2/igt@kms_cursor_crc@pipe-a-cursor-suspend.html

  * igt@kms_flip@plain-flip-fb-recreate@a-edp1:
    - shard-skl:          [FAIL][119] ([i915#2122]) -> [PASS][120]
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-skl1/igt@kms_flip@plain-flip-fb-recreate@a-edp1.html
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl5/igt@kms_flip@plain-flip-fb-recreate@a-edp1.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - shard-apl:          [DMESG-WARN][121] ([i915#180]) -> [PASS][122]
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-apl2/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-apl8/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

  * igt@kms_psr@psr2_primary_mmap_gtt:
    - shard-iclb:         [SKIP][123] ([fdo#109441]) -> [PASS][124] +1 similar issue
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-iclb1/igt@kms_psr@psr2_primary_mmap_gtt.html
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb2/igt@kms_psr@psr2_primary_mmap_gtt.html

  * igt@perf@polling-parameterized:
    - shard-glk:          [FAIL][125] ([i915#1542]) -> [PASS][126]
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-glk5/igt@perf@polling-parameterized.html
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-glk3/igt@perf@polling-parameterized.html
    - shard-skl:          [FAIL][127] ([i915#1542]) -> [PASS][128]
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-skl8/igt@perf@polling-parameterized.html
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-skl9/igt@perf@polling-parameterized.html

  
#### Warnings ####

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
    - shard-iclb:         [FAIL][129] ([i915#2842]) -> [FAIL][130] ([i915#2852])
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-iclb1/igt@gem_exec_fair@basic-none-rrul@rcs0.html
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb2/igt@gem_exec_fair@basic-none-rrul@rcs0.html

  * igt@i915_pm_rc6_residency@rc6-idle:
    - shard-iclb:         [WARN][131] ([i915#2684]) -> [WARN][132] ([i915#1804] / [i915#2684])
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-iclb8/igt@i915_pm_rc6_residency@rc6-idle.html
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb7/igt@i915_pm_rc6_residency@rc6-idle.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-1:
    - shard-iclb:         [SKIP][133] ([i915#2920]) -> [SKIP][134] ([i915#658]) +3 similar issues
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-iclb2/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-1.html
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb8/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-1.html

  * igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-2:
    - shard-iclb:         [SKIP][135] ([i915#658]) -> [SKIP][136] ([i915#2920]) +3 similar issues
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-iclb1/igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-2.html
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/shard-iclb2/igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-2.html

  * igt@runner@aborted:
    - shard-kbl:          ([FAIL][137], [FAIL][138], [FAIL][139], [FAIL][140], [FAIL][141], [FAIL][142], [FAIL][143]) ([i915#1436] / [i915#180] / [i915#1814] / [i915#3002] / [i915#3363]) -> ([FAIL][144], [FAIL][145], [FAIL][146], [FAIL][147], [FAIL][148], [FAIL][149], [FAIL][150]) ([i915#1436] / [i915#180] / [i915#3002] / [i915#3363] / [i915#92])
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-kbl3/igt@runner@aborted.html
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-kbl6/igt@runner@aborted.html
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-kbl6/igt@runner@aborted.html
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-kbl7/igt@runner@aborted.html
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10700/shard-kbl7/igt@runner@abor

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21293/index.html

[-- Attachment #2: Type: text/html, Size: 33480 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] drm/i915: Update dma_fence_work
  2021-10-08 13:35   ` [Intel-gfx] " Thomas Hellström
@ 2021-10-13 12:41     ` Daniel Vetter
  -1 siblings, 0 replies; 33+ messages in thread
From: Daniel Vetter @ 2021-10-13 12:41 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-gfx, dri-devel, maarten.lankhorst, matthew.auld

On Fri, Oct 08, 2021 at 03:35:25PM +0200, Thomas Hellström wrote:
> Move the release callback to after fence signaling to align with
> what's done for upcoming VM_BIND user-fence signaling.
> 
> Finally call the work callback regardless of whether we have a fence
> error or not and update the existing callbacks accordingly. We will
> need this to intercept the error for failsafe migration.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

I think before we make this thing more complex we really should either
move this into dma-buf/ as a proper thing, or just open-code.

Minimally at least any new async dma_fence worker needs to have
dma_fence_begin/end_signalling annotations, or we're just digging a grave
here.
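
Roughly what I mean, as a sketch only (dma_fence_begin_signalling() and
dma_fence_end_signalling() are the existing lockdep helpers from
drivers/dma-buf, wrapped here around the worker function this patch
already has):

static void dma_fence_work_work(struct work_struct *work)
{
	struct dma_fence_work *f = container_of(work, typeof(*f), work);
	bool cookie = dma_fence_begin_signalling();

	/*
	 * lockdep can now catch anything in here that could deadlock
	 * against fence signalling, e.g. GFP_KERNEL allocations.
	 */
	if (f->ops->work)
		f->ops->work(f);

	dma_fence_work_complete(f);
	dma_fence_end_signalling(cookie);
}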

I'm also not seeing the point in building everything on top of this, for
many cases just an open-coded work_struct should be a lot simpler. It's
just more to clean up later on, that part is for sure.
-Daniel

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_clflush.c |  5 +++
>  drivers/gpu/drm/i915/i915_sw_fence_work.c   | 36 ++++++++++-----------
>  drivers/gpu/drm/i915/i915_sw_fence_work.h   |  1 +
>  drivers/gpu/drm/i915/i915_vma.c             | 12 +++++--
>  4 files changed, 33 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
> index f0435c6feb68..2143ebaf5b6f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
> @@ -28,6 +28,11 @@ static void clflush_work(struct dma_fence_work *base)
>  {
>  	struct clflush *clflush = container_of(base, typeof(*clflush), base);
>  
> +	if (base->error) {
> +		dma_fence_set_error(&base->dma, base->error);
> +		return;
> +	}
> +
>  	__do_clflush(clflush->obj);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
> index 5b33ef23d54c..5b55cddafc9b 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
> @@ -6,21 +6,24 @@
>  
>  #include "i915_sw_fence_work.h"
>  
> -static void fence_complete(struct dma_fence_work *f)
> +static void dma_fence_work_complete(struct dma_fence_work *f)
>  {
> +	dma_fence_signal(&f->dma);
> +
>  	if (f->ops->release)
>  		f->ops->release(f);
> -	dma_fence_signal(&f->dma);
> +
> +	dma_fence_put(&f->dma);
>  }
>  
> -static void fence_work(struct work_struct *work)
> +static void dma_fence_work_work(struct work_struct *work)
>  {
>  	struct dma_fence_work *f = container_of(work, typeof(*f), work);
>  
> -	f->ops->work(f);
> +	if (f->ops->work)
> +		f->ops->work(f);
>  
> -	fence_complete(f);
> -	dma_fence_put(&f->dma);
> +	dma_fence_work_complete(f);
>  }
>  
>  static int __i915_sw_fence_call
> @@ -31,17 +34,13 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>  	switch (state) {
>  	case FENCE_COMPLETE:
>  		if (fence->error)
> -			dma_fence_set_error(&f->dma, fence->error);
> -
> -		if (!f->dma.error) {
> -			dma_fence_get(&f->dma);
> -			if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
> -				fence_work(&f->work);
> -			else
> -				queue_work(system_unbound_wq, &f->work);
> -		} else {
> -			fence_complete(f);
> -		}
> +			cmpxchg(&f->error, 0, fence->error);
> +
> +		dma_fence_get(&f->dma);
> +		if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
> +			dma_fence_work_work(&f->work);
> +		else
> +			queue_work(system_unbound_wq, &f->work);
>  		break;
>  
>  	case FENCE_FREE:
> @@ -84,10 +83,11 @@ void dma_fence_work_init(struct dma_fence_work *f,
>  			 const struct dma_fence_work_ops *ops)
>  {
>  	f->ops = ops;
> +	f->error = 0;
>  	spin_lock_init(&f->lock);
>  	dma_fence_init(&f->dma, &fence_ops, &f->lock, 0, 0);
>  	i915_sw_fence_init(&f->chain, fence_notify);
> -	INIT_WORK(&f->work, fence_work);
> +	INIT_WORK(&f->work, dma_fence_work_work);
>  }
>  
>  int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal)
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
> index d56806918d13..caa59fb5252b 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
> @@ -24,6 +24,7 @@ struct dma_fence_work_ops {
>  struct dma_fence_work {
>  	struct dma_fence dma;
>  	spinlock_t lock;
> +	int error;
>  
>  	struct i915_sw_fence chain;
>  	struct i915_sw_dma_fence_cb cb;
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 4b7fc4647e46..5123ac28ad9a 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -301,6 +301,11 @@ static void __vma_bind(struct dma_fence_work *work)
>  	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
>  	struct i915_vma *vma = vw->vma;
>  
> +	if (work->error) {
> +		dma_fence_set_error(&work->dma, work->error);
> +		return;
> +	}
> +
>  	vma->ops->bind_vma(vw->vm, &vw->stash,
>  			   vma, vw->cache_level, vw->flags);
>  }
> @@ -333,7 +338,7 @@ struct i915_vma_work *i915_vma_work(void)
>  		return NULL;
>  
>  	dma_fence_work_init(&vw->base, &bind_ops);
> -	vw->base.dma.error = -EAGAIN; /* disable the worker by default */
> +	vw->base.error = -EAGAIN; /* disable the worker by default */
>  
>  	return vw;
>  }
> @@ -416,6 +421,9 @@ int i915_vma_bind(struct i915_vma *vma,
>  		 * part of the obj->resv->excl_fence as it only affects
>  		 * execution and not content or object's backing store lifetime.
>  		 */
> +
> +		work->base.error = 0; /* enable the queue_work() */
> +
>  		prev = i915_active_set_exclusive(&vma->active, &work->base.dma);
>  		if (prev) {
>  			__i915_sw_fence_await_dma_fence(&work->base.chain,
> @@ -424,8 +432,6 @@ int i915_vma_bind(struct i915_vma *vma,
>  			dma_fence_put(prev);
>  		}
>  
> -		work->base.dma.error = 0; /* enable the queue_work() */
> -
>  		if (vma->obj) {
>  			__i915_gem_object_pin_pages(vma->obj);
>  			work->pinned = i915_gem_object_get(vma->obj);
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Intel-gfx] [PATCH 4/6] drm/i915: Add a struct dma_fence_work timeline
  2021-10-08 13:35   ` [Intel-gfx] " Thomas Hellström
  (?)
@ 2021-10-13 12:43   ` Daniel Vetter
  2021-10-13 14:21     ` Thomas Hellström
  -1 siblings, 1 reply; 33+ messages in thread
From: Daniel Vetter @ 2021-10-13 12:43 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-gfx, dri-devel, maarten.lankhorst, matthew.auld

On Fri, Oct 08, 2021 at 03:35:28PM +0200, Thomas Hellström wrote:
> The TTM managers and, possibly, the gtt address space managers will
> need to be able to order fences for async operation.
> Using dma_fence_is_later() for this will require that the fences we hand
> them are from a single fence context and ordered.
> 
> Introduce a struct dma_fence_work_timeline, and a function to attach
> struct dma_fence_work to such a timeline in a way that all previous
> fences attached to the timeline will be signaled when the latest
> attached struct dma_fence_work signals.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

I'm not understanding why we need this:

- if we just want to order dma_fence work, then an ordered workqueue is
  what we want. Which is why hand-rolling is better than reusing
  dma_fence_work for absolutely everything.

- if we just need to make sure the public fences signal in order, then
  it's a dma_fence_chain.
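
For the first case, something as simple as this should do (rough sketch
only, all names made up for illustration):

static struct workqueue_struct *migrate_wq; /* alloc_ordered_workqueue("i915-migrate", 0) at init */

struct migrate_work {
	struct work_struct base;
	struct dma_fence *fence;	/* signalled when this step is done */
};

static void migrate_work_fn(struct work_struct *w)
{
	struct migrate_work *mw = container_of(w, typeof(*mw), base);

	/* do the blit / memcpy fallback here */
	dma_fence_signal(mw->fence);
	dma_fence_put(mw->fence);
	kfree(mw);
}

static void migrate_work_queue(struct migrate_work *mw)
{
	INIT_WORK(&mw->base, migrate_work_fn);
	/* the ordered workqueue runs (and hence signals) items in order */
	queue_work(migrate_wq, &mw->base);
}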

Definitely no more "it looks like it's shared code but isn't" stuff in
i915.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_sw_fence_work.c | 89 ++++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_sw_fence_work.h | 58 +++++++++++++++
>  2 files changed, 145 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
> index 5b55cddafc9b..87cdb3158042 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
> @@ -5,6 +5,66 @@
>   */
>  
>  #include "i915_sw_fence_work.h"
> +#include "i915_utils.h"
> +
> +/**
> + * dma_fence_work_timeline_attach - Attach a struct dma_fence_work to a
> + * timeline.
> + * @tl: The timeline to attach to.
> + * @f: The struct dma_fence_work.
> + * @tl_cb: The i915_sw_dma_fence_cb needed to attach to the
> + * timeline. This is typically embedded into the structure that also
> + * embeds the struct dma_fence_work.
> + *
> + * This function takes a timeline reference and associates it with the
> + * struct dma_fence_work. That reference is given up when the fence
> + * signals. Furthermore it assigns a fence context and a seqno to the
> + * dma-fence, and then chains upon the previous fence of the timeline
> + * if any, to make sure that the fence signals after that fence. The
> + * @tl_cb callback structure is needed for that chaining. Finally
> + * the registered last fence of the timeline is replaced by this fence, and
> + * the timeline takes a reference on the fence, which is released when
> + * the fence signals.
> + */
> +void dma_fence_work_timeline_attach(struct dma_fence_work_timeline *tl,
> +				    struct dma_fence_work *f,
> +				    struct i915_sw_dma_fence_cb *tl_cb)
> +{
> +	struct dma_fence *await;
> +
> +	if (tl->ops->get)
> +		tl->ops->get(tl);
> +
> +	spin_lock(&tl->lock);
> +	await = tl->last_fence;
> +	tl->last_fence = dma_fence_get(&f->dma);
> +	f->dma.seqno = tl->seqno++;
> +	f->dma.context = tl->context;
> +	f->tl = tl;
> +	spin_unlock(&tl->lock);
> +
> +	if (await) {
> +		__i915_sw_fence_await_dma_fence(&f->chain, await, tl_cb);
> +		dma_fence_put(await);
> +	}
> +}
> +
> +static void dma_fence_work_timeline_detach(struct dma_fence_work *f)
> +{
> +	struct dma_fence_work_timeline *tl = f->tl;
> +	bool put = false;
> +
> +	spin_lock(&tl->lock);
> +	if (tl->last_fence == &f->dma) {
> +		put = true;
> +		tl->last_fence = NULL;
> +	}
> +	spin_unlock(&tl->lock);
> +	if (tl->ops->put)
> +		tl->ops->put(tl);
> +	if (put)
> +		dma_fence_put(&f->dma);
> +}
>  
>  static void dma_fence_work_complete(struct dma_fence_work *f)
>  {
> @@ -13,6 +73,9 @@ static void dma_fence_work_complete(struct dma_fence_work *f)
>  	if (f->ops->release)
>  		f->ops->release(f);
>  
> +	if (f->tl)
> +		dma_fence_work_timeline_detach(f);
> +
>  	dma_fence_put(&f->dma);
>  }
>  
> @@ -53,14 +116,17 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>  
>  static const char *get_driver_name(struct dma_fence *fence)
>  {
> -	return "dma-fence";
> +	struct dma_fence_work *f = container_of(fence, typeof(*f), dma);
> +
> +	return (f->tl && f->tl->ops->name) ? f->tl->ops->name : "dma-fence";
>  }
>  
>  static const char *get_timeline_name(struct dma_fence *fence)
>  {
>  	struct dma_fence_work *f = container_of(fence, typeof(*f), dma);
>  
> -	return f->ops->name ?: "work";
> +	return (f->tl && f->tl->name) ? f->tl->name :
> +		f->ops->name ?: "work";
>  }
>  
>  static void fence_release(struct dma_fence *fence)
> @@ -84,6 +150,7 @@ void dma_fence_work_init(struct dma_fence_work *f,
>  {
>  	f->ops = ops;
>  	f->error = 0;
> +	f->tl = NULL;
>  	spin_lock_init(&f->lock);
>  	dma_fence_init(&f->dma, &fence_ops, &f->lock, 0, 0);
>  	i915_sw_fence_init(&f->chain, fence_notify);
> @@ -97,3 +164,21 @@ int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal)
>  
>  	return __i915_sw_fence_await_dma_fence(&f->chain, signal, &f->cb);
>  }
> +
> +/**
> + * dma_fence_work_timeline_init - Initialize a dma_fence_work timeline
> + * @tl: The timeline to initialize,
> + * @name: The name of the timeline,
> + * @ops: The timeline operations.
> + */
> +void dma_fence_work_timeline_init(struct dma_fence_work_timeline *tl,
> +				  const char *name,
> +				  const struct dma_fence_work_timeline_ops *ops)
> +{
> +	tl->name = name;
> +	spin_lock_init(&tl->lock);
> +	tl->context = dma_fence_context_alloc(1);
> +	tl->seqno = 0;
> +	tl->last_fence = NULL;
> +	tl->ops = ops;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
> index caa59fb5252b..6f41ee360133 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
> @@ -14,6 +14,53 @@
>  #include "i915_sw_fence.h"
>  
>  struct dma_fence_work;
> +struct dma_fence_work_timeline;
> +
> +/**
> + * struct dma_fence_work_timeline_ops - Timeline operations struct
> + * @name: Timeline ops name. This field is used if the timeline itself has
> + * a NULL name. Can be set to NULL in which case a default name is used.
> + *
> + * The struct dma_fence_work_timeline is intended to be embeddable.
> + * We use the ops to get and put the parent structure.
> + */
> +struct dma_fence_work_timeline_ops {
> +	/**
> +	 * Timeline ops name. Used if the timeline itself has no name.
> +	 */
> +	const char *name;
> +
> +	/**
> +	 * put() - Put the structure embedding the timeline
> +	 * @tl: The timeline
> +	 */
> +	void (*put)(struct dma_fence_work_timeline *tl);
> +
> +	/**
> +	 * get() - Get the structure embedding the timeline
> +	 * @tl: The timeline
> +	 */
> +	void (*get)(struct dma_fence_work_timeline *tl);
> +};
> +
> +/**
> + * struct dma_fence_work_timeline - Simple timeline struct for dma_fence_work
> + * @name: The name of the timeline. May be set to NULL. Immutable
> + * @lock: Protects mutable members of the structure.
> + * @context: The timeline fence context. Immutable.
> + * @seqno: The previous seqno used. Protected by @lock.
> + * @last_fence : The previous fence of the timeline. Protected by @lock.
> + * @ops: The timeline operations struct. Immutable.
> + */
> +struct dma_fence_work_timeline {
> +	const char *name;
> +	/** Protects mutable members of the structure */
> +	spinlock_t lock;
> +	u64 context;
> +	u64 seqno;
> +	struct dma_fence *last_fence;
> +	const struct dma_fence_work_timeline_ops *ops;
> +};
>  
>  struct dma_fence_work_ops {
>  	const char *name;
> @@ -30,6 +77,9 @@ struct dma_fence_work {
>  	struct i915_sw_dma_fence_cb cb;
>  
>  	struct work_struct work;
> +
> +	struct dma_fence_work_timeline *tl;
> +
>  	const struct dma_fence_work_ops *ops;
>  };
>  
> @@ -65,4 +115,12 @@ static inline void dma_fence_work_commit_imm(struct dma_fence_work *f)
>  	dma_fence_work_commit(f);
>  }
>  
> +void dma_fence_work_timeline_attach(struct dma_fence_work_timeline *tl,
> +				    struct dma_fence_work *f,
> +				    struct i915_sw_dma_fence_cb *tl_cb);
> +
> +void dma_fence_work_timeline_init(struct dma_fence_work_timeline *tl,
> +				  const char *name,
> +				  const struct dma_fence_work_timeline_ops *ops);
> +
>  #endif /* I915_SW_FENCE_WORK_H */
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] drm/i915: Update dma_fence_work
  2021-10-13 12:41     ` [Intel-gfx] " Daniel Vetter
@ 2021-10-13 12:59       ` Thomas Hellström
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-13 12:59 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, dri-devel, maarten.lankhorst, matthew.auld


On 10/13/21 14:41, Daniel Vetter wrote:
> On Fri, Oct 08, 2021 at 03:35:25PM +0200, Thomas Hellström wrote:
>> Move the release callback to after fence signaling to align with
>> what's done for upcoming VM_BIND user-fence signaling.
>>
>> Finally call the work callback regardless of whether we have a fence
>> error or not and update the existing callbacks accordingly. We will
>> need this to intercept the error for failsafe migration.
>>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> I think before we make this thing more complex we really should either
> move this into dma-buf/ as a proper thing, or just open-code.
>
> Minimally at least any new async dma_fence worker needs to have
> dma_fence_begin/end_signalling annotations, or we're just digging a grave
> here.
>
> I'm also not seeing the point in building everything on top of this, for
> many cases just an open-coded work_struct should be a lot simpler. It's
> just more to clean up later on, that part is for sure.
> -Daniel

Yes, as I mentioned to Matthew, I'm going to respin this based on our 
previous discussions.

Forgot to mention on the ML.

/Thomas


>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_clflush.c |  5 +++
>>   drivers/gpu/drm/i915/i915_sw_fence_work.c   | 36 ++++++++++-----------
>>   drivers/gpu/drm/i915/i915_sw_fence_work.h   |  1 +
>>   drivers/gpu/drm/i915/i915_vma.c             | 12 +++++--
>>   4 files changed, 33 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
>> index f0435c6feb68..2143ebaf5b6f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
>> @@ -28,6 +28,11 @@ static void clflush_work(struct dma_fence_work *base)
>>   {
>>   	struct clflush *clflush = container_of(base, typeof(*clflush), base);
>>   
>> +	if (base->error) {
>> +		dma_fence_set_error(&base->dma, base->error);
>> +		return;
>> +	}
>> +
>>   	__do_clflush(clflush->obj);
>>   }
>>   
>> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
>> index 5b33ef23d54c..5b55cddafc9b 100644
>> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
>> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
>> @@ -6,21 +6,24 @@
>>   
>>   #include "i915_sw_fence_work.h"
>>   
>> -static void fence_complete(struct dma_fence_work *f)
>> +static void dma_fence_work_complete(struct dma_fence_work *f)
>>   {
>> +	dma_fence_signal(&f->dma);
>> +
>>   	if (f->ops->release)
>>   		f->ops->release(f);
>> -	dma_fence_signal(&f->dma);
>> +
>> +	dma_fence_put(&f->dma);
>>   }
>>   
>> -static void fence_work(struct work_struct *work)
>> +static void dma_fence_work_work(struct work_struct *work)
>>   {
>>   	struct dma_fence_work *f = container_of(work, typeof(*f), work);
>>   
>> -	f->ops->work(f);
>> +	if (f->ops->work)
>> +		f->ops->work(f);
>>   
>> -	fence_complete(f);
>> -	dma_fence_put(&f->dma);
>> +	dma_fence_work_complete(f);
>>   }
>>   
>>   static int __i915_sw_fence_call
>> @@ -31,17 +34,13 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>>   	switch (state) {
>>   	case FENCE_COMPLETE:
>>   		if (fence->error)
>> -			dma_fence_set_error(&f->dma, fence->error);
>> -
>> -		if (!f->dma.error) {
>> -			dma_fence_get(&f->dma);
>> -			if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
>> -				fence_work(&f->work);
>> -			else
>> -				queue_work(system_unbound_wq, &f->work);
>> -		} else {
>> -			fence_complete(f);
>> -		}
>> +			cmpxchg(&f->error, 0, fence->error);
>> +
>> +		dma_fence_get(&f->dma);
>> +		if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
>> +			dma_fence_work_work(&f->work);
>> +		else
>> +			queue_work(system_unbound_wq, &f->work);
>>   		break;
>>   
>>   	case FENCE_FREE:
>> @@ -84,10 +83,11 @@ void dma_fence_work_init(struct dma_fence_work *f,
>>   			 const struct dma_fence_work_ops *ops)
>>   {
>>   	f->ops = ops;
>> +	f->error = 0;
>>   	spin_lock_init(&f->lock);
>>   	dma_fence_init(&f->dma, &fence_ops, &f->lock, 0, 0);
>>   	i915_sw_fence_init(&f->chain, fence_notify);
>> -	INIT_WORK(&f->work, fence_work);
>> +	INIT_WORK(&f->work, dma_fence_work_work);
>>   }
>>   
>>   int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal)
>> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
>> index d56806918d13..caa59fb5252b 100644
>> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
>> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
>> @@ -24,6 +24,7 @@ struct dma_fence_work_ops {
>>   struct dma_fence_work {
>>   	struct dma_fence dma;
>>   	spinlock_t lock;
>> +	int error;
>>   
>>   	struct i915_sw_fence chain;
>>   	struct i915_sw_dma_fence_cb cb;
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>> index 4b7fc4647e46..5123ac28ad9a 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -301,6 +301,11 @@ static void __vma_bind(struct dma_fence_work *work)
>>   	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
>>   	struct i915_vma *vma = vw->vma;
>>   
>> +	if (work->error) {
>> +		dma_fence_set_error(&work->dma, work->error);
>> +		return;
>> +	}
>> +
>>   	vma->ops->bind_vma(vw->vm, &vw->stash,
>>   			   vma, vw->cache_level, vw->flags);
>>   }
>> @@ -333,7 +338,7 @@ struct i915_vma_work *i915_vma_work(void)
>>   		return NULL;
>>   
>>   	dma_fence_work_init(&vw->base, &bind_ops);
>> -	vw->base.dma.error = -EAGAIN; /* disable the worker by default */
>> +	vw->base.error = -EAGAIN; /* disable the worker by default */
>>   
>>   	return vw;
>>   }
>> @@ -416,6 +421,9 @@ int i915_vma_bind(struct i915_vma *vma,
>>   		 * part of the obj->resv->excl_fence as it only affects
>>   		 * execution and not content or object's backing store lifetime.
>>   		 */
>> +
>> +		work->base.error = 0; /* enable the queue_work() */
>> +
>>   		prev = i915_active_set_exclusive(&vma->active, &work->base.dma);
>>   		if (prev) {
>>   			__i915_sw_fence_await_dma_fence(&work->base.chain,
>> @@ -424,8 +432,6 @@ int i915_vma_bind(struct i915_vma *vma,
>>   			dma_fence_put(prev);
>>   		}
>>   
>> -		work->base.dma.error = 0; /* enable the queue_work() */
>> -
>>   		if (vma->obj) {
>>   			__i915_gem_object_pin_pages(vma->obj);
>>   			work->pinned = i915_gem_object_get(vma->obj);
>> -- 
>> 2.31.1
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Intel-gfx] [PATCH 4/6] drm/i915: Add a struct dma_fence_work timeline
  2021-10-13 12:43   ` Daniel Vetter
@ 2021-10-13 14:21     ` Thomas Hellström
  2021-10-13 14:33       ` Daniel Vetter
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Hellström @ 2021-10-13 14:21 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, dri-devel, maarten.lankhorst, matthew.auld

On Wed, 2021-10-13 at 14:43 +0200, Daniel Vetter wrote:
> On Fri, Oct 08, 2021 at 03:35:28PM +0200, Thomas Hellström wrote:
> > The TTM managers and, possibly, the gtt address space managers will
> > need to be able to order fences for async operation.
> > Using dma_fence_is_later() for this will require that the fences we
> > hand
> > them are from a single fence context and ordered.
> > 
> > Introduce a struct dma_fence_work_timeline, and a function to
> > attach
> > struct dma_fence_work to such a timeline in a way that all previous
> > fences attached to the timeline will be signaled when the latest
> > attached struct dma_fence_work signals.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> I'm not understanding why we need this:
> 
> - if we just want to order dma_fence work, then an ordered workqueue
> is
>   what we want. Which is why hand-rolling is better than reusing
>   dma_fence_work for absolutely everything.
> 
> - if we just need to make sure the public fences signal in order,
> then
>   it's a dma_fence_chain.

Part of the same series that needs reworking.

What we need here is a way to coalesce multiple fences from various
contexts (including both gpu and work fences) into a single fence and
then attach it to a timeline.
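
Something along these lines (rough sketch only; "migration_fence" and
"work" stand in for what the later patches in the series build):

	struct dma_fence **fences;
	struct dma_fence_array *array;

	fences = kmalloc_array(2, sizeof(*fences), GFP_KERNEL);
	if (!fences)
		return -ENOMEM;

	fences[0] = dma_fence_get(migration_fence);	/* GPU blit */
	fences[1] = dma_fence_get(&work->dma);		/* error-check work */

	/* dma_fence_array_create() takes over the array and the refs */
	array = dma_fence_array_create(2, fences,
				       dma_fence_context_alloc(1), 0, false);
	if (!array) {
		dma_fence_put(fences[0]);
		dma_fence_put(fences[1]);
		kfree(fences);
		return -ENOMEM;
	}
	/* &array->base is then what gets attached to the timeline */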

/Thomas





^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Intel-gfx] [PATCH 4/6] drm/i915: Add a struct dma_fence_work timeline
  2021-10-13 14:21     ` Thomas Hellström
@ 2021-10-13 14:33       ` Daniel Vetter
  2021-10-13 14:39         ` Thomas Hellström
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Vetter @ 2021-10-13 14:33 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Daniel Vetter, intel-gfx, dri-devel, maarten.lankhorst, matthew.auld

On Wed, Oct 13, 2021 at 04:21:43PM +0200, Thomas Hellström wrote:
> On Wed, 2021-10-13 at 14:43 +0200, Daniel Vetter wrote:
> > On Fri, Oct 08, 2021 at 03:35:28PM +0200, Thomas Hellström wrote:
> > > The TTM managers and, possibly, the gtt address space managers will
> > > need to be able to order fences for async operation.
> > > Using dma_fence_is_later() for this will require that the fences we
> > > hand
> > > them are from a single fence context and ordered.
> > > 
> > > Introduce a struct dma_fence_work_timeline, and a function to
> > > attach
> > > struct dma_fence_work to such a timeline in a way that all previous
> > > fences attached to the timeline will be signaled when the latest
> > > attached struct dma_fence_work signals.
> > > 
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > 
> > I'm not understanding why we need this:
> > 
> > - if we just want to order dma_fence work, then an ordered workqueue
> > is
> >   what we want. Which is why hand-rolling is better than reusing
> >   dma_fence_work for absolutely everything.
> > 
> > - if we just need to make sure the public fences signal in order,
> > then
> >   it's a dma_fence_chain.
> 
> Part of the same series that needs reworking.
> 
> What we need here is a way to coalesce multiple fences from various
> contexts (including both gpu and work fences) into a single fence and
> then attach it to a timeline.

I thought dma_fence_chain does this for you, including coalescing on the
same timeline. Or at least it's supposed to, because if it doesn't you can
produce some rather epic chain explosions with vulkan :-)
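
I.e. roughly (untested sketch, with tl->last/tl->seqno standing in for
whatever the region timeline ends up storing):

	struct dma_fence_chain *link = dma_fence_chain_alloc();

	if (!link)
		return -ENOMEM;

	/* consumes the ref on tl->last and the extra ref taken on fence */
	dma_fence_chain_init(link, tl->last, dma_fence_get(fence),
			     ++tl->seqno);
	tl->last = dma_fence_get(&link->base);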
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Intel-gfx] [PATCH 4/6] drm/i915: Add a struct dma_fence_work timeline
  2021-10-13 14:33       ` Daniel Vetter
@ 2021-10-13 14:39         ` Thomas Hellström
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-13 14:39 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, dri-devel, maarten.lankhorst, matthew.auld


On 10/13/21 16:33, Daniel Vetter wrote:
> On Wed, Oct 13, 2021 at 04:21:43PM +0200, Thomas Hellström wrote:
>> On Wed, 2021-10-13 at 14:43 +0200, Daniel Vetter wrote:
>>> On Fri, Oct 08, 2021 at 03:35:28PM +0200, Thomas Hellström wrote:
>>>> The TTM managers and, possibly, the gtt address space managers will
>>>> need to be able to order fences for async operation.
>>>> Using dma_fence_is_later() for this will require that the fences we
>>>> hand
>>>> them are from a single fence context and ordered.
>>>>
>>>> Introduce a struct dma_fence_work_timeline, and a function to
>>>> attach
>>>> struct dma_fence_work to such a timeline in a way that all previous
>>>> fences attached to the timeline will be signaled when the latest
>>>> attached struct dma_fence_work signals.
>>>>
>>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> I'm not understanding why we need this:
>>>
>>> - if we just want to order dma_fence work, then an ordered workqueue
>>> is
>>>    what we want. Which is why hand-rolling is better than reusing
>>>    dma_fence_work for absolutely everything.
>>>
>>> - if we just need to make sure the public fences signal in order,
>>> then
>>>    it's a dma_fence_chain.
>> Part of the same series that needs reworking.
>>
>> What we need here is a way to coalesce multiple fences from various
>> contexts (including both gpu and work fences) into a single fence and
>> then attach it to a timeline.
> I thought dma_fence_chain does this for you, including coalescing on the
> same timeline. Or at least it's supposed to, because if it doesn't you can
> produce some rather epic chain explosions with vulkan :-)

I'll take a look to see if I can use dma_fence_chain for this case.

Thanks,

/Thomas

> -Daniel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] drm/i915: Introduce refcounted sg-tables
  2021-10-08 13:35   ` [Intel-gfx] " Thomas Hellström
@ 2021-10-13 14:41     ` Daniel Vetter
  -1 siblings, 0 replies; 33+ messages in thread
From: Daniel Vetter @ 2021-10-13 14:41 UTC (permalink / raw)
  To: Thomas Hellström, Christian König
  Cc: intel-gfx, dri-devel, maarten.lankhorst, matthew.auld

On Fri, Oct 08, 2021 at 03:35:26PM +0200, Thomas Hellström wrote:
> As we start to introduce asynchronous failsafe object migration,
> where we update the object state and then submit asynchronous
> commands we need to record what memory resources are actually used
> by various part of the command stream. Initially for three purposes:
> 
> 1) Error capture.
> 2) Asynchronous migration error recovery.
> 3) Asynchronous vma bind.
> 
> At the time where these happens, the object state may have been updated
> to be several migrations ahead and object sg-tables discarded.
> 
> In order to make it possible to keep sg-tables with memory resource
> information for these operations, introduce refcounted sg-tables that
> aren't freed until the last user is done with them.
> 
> The alternative would be to reference information sitting on the
> corresponding ttm_resources which typically have the same lifetime as
> these refcounted sg_tables, but that leads to other awkward constructs:
> Due to the design direction chosen for ttm resource managers that would
> lead to diamond-style inheritance, the LMEM resources may sometimes be
> prematurely freed, and finally the subclassed struct ttm_resource would
> have to bleed into the asynchronous vma bind code.

On the diamond inheritance I was pondering some more whether we shouldn't
just do the classic C union horrors, i.e.

struct ttm_resource {
	/* stuff */
};

struct ttm_drm_mm_resource {
	struct ttm_resource base;
	struct drm_mm_node node;
};

struct ttm_buddy_resource {
	struct ttm_resource base;
	struct drm_buddy_node node;
};

Whatever else we have, maybe also integer resources for guc_id.

And then the horrors:

struct i915_gem_resource {
	union {
		struct ttm_resource base;
		struct ttm_drm_mm_resource drm_mm;
		struct ttm_buddy_resource buddy;
	};

	/* i915 stuff */
};

BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) !=
	     offsetof(struct i915_gem_resource, drm_mm.base));
BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) !=
	     offsetof(struct i915_gem_resource, buddy.base));

This is horrible, but also in official C89 and later, unions are the only
way to do inheritance. The only reason we can do it differently in Linux
is because we compile with strict aliasing turned off.

So I think we can shrug this off as officially sanctioned horrors. There's
a small downside in overhead maybe, but I don't think the size difference
between the various allocators' resource structs is big enough that we
should care. Plus a pointer to driver stuff to resolve the diamond inheritance
through different means isn't free either.

But also, this is for much later; I think for now refcounting sg-tables as
a standalone thing is ok, since we do seem to need them in a bunch of
places. But eventually I do think we should aim to merge them with
ttm_resource, if/when those get refcounted.
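
For reference, the shape this boils down to is basically just a kref
around the sg_table (simplified from the usage visible in this patch;
the actual header may differ in the details):

struct i915_refct_sgt {
	struct kref kref;
	struct sg_table table;
	size_t size;
	const struct i915_refct_sgt_ops *ops;
};

static inline struct i915_refct_sgt *
i915_refct_sgt_get(struct i915_refct_sgt *rsgt)
{
	kref_get(&rsgt->kref);
	return rsgt;
}

static inline void i915_refct_sgt_put(struct i915_refct_sgt *rsgt)
{
	/* ops->release() frees the table once the last user is done */
	if (rsgt)
		kref_put(&rsgt->kref, rsgt->ops->release);
}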
-Daniel

> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |   3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 159 +++++++++++-------
>  drivers/gpu/drm/i915/i915_scatterlist.c       |  62 +++++--
>  drivers/gpu/drm/i915/i915_scatterlist.h       |  76 ++++++++-
>  drivers/gpu/drm/i915/intel_region_ttm.c       |  15 +-
>  drivers/gpu/drm/i915/intel_region_ttm.h       |   5 +-
>  drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
>  7 files changed, 238 insertions(+), 94 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 7c3da4e3e737..d600cf7ceb35 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -485,6 +485,7 @@ struct drm_i915_gem_object {
>  		 */
>  		struct list_head region_link;
>  
> +		struct i915_refct_sgt *rsgt;
>  		struct sg_table *pages;
>  		void *mapping;
>  
> @@ -538,7 +539,7 @@ struct drm_i915_gem_object {
>  	} mm;
>  
>  	struct {
> -		struct sg_table *cached_io_st;
> +		struct i915_refct_sgt *cached_io_rsgt;
>  		struct i915_gem_object_page_iter get_io_page;
>  		struct drm_i915_gem_object *backup;
>  		bool created:1;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 74a1ffd0d7dd..4b4d7457bef9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -34,7 +34,7 @@
>   * struct i915_ttm_tt - TTM page vector with additional private information
>   * @ttm: The base TTM page vector.
>   * @dev: The struct device used for dma mapping and unmapping.
> - * @cached_st: The cached scatter-gather table.
> + * @cached_rsgt: The cached scatter-gather table.
>   *
>   * Note that DMA may be going on right up to the point where the page-
>   * vector is unpopulated in delayed destroy. Hence keep the
> @@ -45,7 +45,7 @@
>  struct i915_ttm_tt {
>  	struct ttm_tt ttm;
>  	struct device *dev;
> -	struct sg_table *cached_st;
> +	struct i915_refct_sgt cached_rsgt;
>  };
>  
>  static const struct ttm_place sys_placement_flags = {
> @@ -179,6 +179,21 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj,
>  	placement->busy_placement = busy;
>  }
>  
> +static void i915_ttm_tt_release(struct kref *ref)
> +{
> +	struct i915_ttm_tt *i915_tt =
> +		container_of(ref, typeof(*i915_tt), cached_rsgt.kref);
> +	struct sg_table *st = &i915_tt->cached_rsgt.table;
> +
> +	GEM_WARN_ON(st->sgl);
> +
> +	kfree(i915_tt);
> +}
> +
> +static const struct i915_refct_sgt_ops tt_rsgt_ops = {
> +	.release = i915_ttm_tt_release
> +};
> +
>  static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>  					 uint32_t page_flags)
>  {
> @@ -203,6 +218,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>  		return NULL;
>  	}
>  
> +	i915_refct_sgt_init_ops(&i915_tt->cached_rsgt, bo->base.size,
> +				&tt_rsgt_ops);
>  	i915_tt->dev = obj->base.dev->dev;
>  
>  	return &i915_tt->ttm;
> @@ -211,13 +228,13 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>  static void i915_ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
>  {
>  	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
> +	struct sg_table *st = &i915_tt->cached_rsgt.table;
> +
> +	GEM_WARN_ON(kref_read(&i915_tt->cached_rsgt.kref) != 1);
>  
> -	if (i915_tt->cached_st) {
> -		dma_unmap_sgtable(i915_tt->dev, i915_tt->cached_st,
> -				  DMA_BIDIRECTIONAL, 0);
> -		sg_free_table(i915_tt->cached_st);
> -		kfree(i915_tt->cached_st);
> -		i915_tt->cached_st = NULL;
> +	if (st->sgl) {
> +		dma_unmap_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL, 0);
> +		sg_free_table(st);
>  	}
>  	ttm_pool_free(&bdev->pool, ttm);
>  }
> @@ -226,8 +243,10 @@ static void i915_ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
>  {
>  	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
>  
> +	GEM_WARN_ON(kref_read(&i915_tt->cached_rsgt.kref) != 1);
> +
>  	ttm_tt_fini(ttm);
> -	kfree(i915_tt);
> +	i915_refct_sgt_put(&i915_tt->cached_rsgt);
>  }
>  
>  static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
> @@ -261,12 +280,12 @@ static int i915_ttm_move_notify(struct ttm_buffer_object *bo)
>  	return 0;
>  }
>  
> -static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
> +static void i915_ttm_free_cached_io_rsgt(struct drm_i915_gem_object *obj)
>  {
>  	struct radix_tree_iter iter;
>  	void __rcu **slot;
>  
> -	if (!obj->ttm.cached_io_st)
> +	if (!obj->ttm.cached_io_rsgt)
>  		return;
>  
>  	rcu_read_lock();
> @@ -274,9 +293,8 @@ static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
>  		radix_tree_delete(&obj->ttm.get_io_page.radix, iter.index);
>  	rcu_read_unlock();
>  
> -	sg_free_table(obj->ttm.cached_io_st);
> -	kfree(obj->ttm.cached_io_st);
> -	obj->ttm.cached_io_st = NULL;
> +	i915_refct_sgt_put(obj->ttm.cached_io_rsgt);
> +	obj->ttm.cached_io_rsgt = NULL;
>  }
>  
>  static void
> @@ -347,7 +365,7 @@ static void i915_ttm_purge(struct drm_i915_gem_object *obj)
>  		obj->write_domain = 0;
>  		obj->read_domains = 0;
>  		i915_ttm_adjust_gem_after_move(obj);
> -		i915_ttm_free_cached_io_st(obj);
> +		i915_ttm_free_cached_io_rsgt(obj);
>  		obj->mm.madv = __I915_MADV_PURGED;
>  	}
>  }
> @@ -358,7 +376,7 @@ static void i915_ttm_swap_notify(struct ttm_buffer_object *bo)
>  	int ret = i915_ttm_move_notify(bo);
>  
>  	GEM_WARN_ON(ret);
> -	GEM_WARN_ON(obj->ttm.cached_io_st);
> +	GEM_WARN_ON(obj->ttm.cached_io_rsgt);
>  	if (!ret && obj->mm.madv != I915_MADV_WILLNEED)
>  		i915_ttm_purge(obj);
>  }
> @@ -369,7 +387,7 @@ static void i915_ttm_delete_mem_notify(struct ttm_buffer_object *bo)
>  
>  	if (likely(obj)) {
>  		__i915_gem_object_pages_fini(obj);
> -		i915_ttm_free_cached_io_st(obj);
> +		i915_ttm_free_cached_io_rsgt(obj);
>  	}
>  }
>  
> @@ -389,40 +407,35 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
>  					  ttm_mem_type - I915_PL_LMEM0);
>  }
>  
> -static struct sg_table *i915_ttm_tt_get_st(struct ttm_tt *ttm)
> +static struct i915_refct_sgt *i915_ttm_tt_get_st(struct ttm_tt *ttm)
>  {
>  	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
>  	struct sg_table *st;
>  	int ret;
>  
> -	if (i915_tt->cached_st)
> -		return i915_tt->cached_st;
> -
> -	st = kzalloc(sizeof(*st), GFP_KERNEL);
> -	if (!st)
> -		return ERR_PTR(-ENOMEM);
> +	if (i915_tt->cached_rsgt.table.sgl)
> +		return i915_refct_sgt_get(&i915_tt->cached_rsgt);
>  
> +	st = &i915_tt->cached_rsgt.table;
>  	ret = sg_alloc_table_from_pages_segment(st,
>  			ttm->pages, ttm->num_pages,
>  			0, (unsigned long)ttm->num_pages << PAGE_SHIFT,
>  			i915_sg_segment_size(), GFP_KERNEL);
>  	if (ret) {
> -		kfree(st);
> +		st->sgl = NULL;
>  		return ERR_PTR(ret);
>  	}
>  
>  	ret = dma_map_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL, 0);
>  	if (ret) {
>  		sg_free_table(st);
> -		kfree(st);
>  		return ERR_PTR(ret);
>  	}
>  
> -	i915_tt->cached_st = st;
> -	return st;
> +	return i915_refct_sgt_get(&i915_tt->cached_rsgt);
>  }
>  
> -static struct sg_table *
> +static struct i915_refct_sgt *
>  i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
>  			 struct ttm_resource *res)
>  {
> @@ -436,7 +449,21 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
>  	 * the resulting st. Might make sense for GGTT.
>  	 */
>  	GEM_WARN_ON(!cpu_maps_iomem(res));
> -	return intel_region_ttm_resource_to_st(obj->mm.region, res);
> +	if (bo->resource == res) {
> +		if (!obj->ttm.cached_io_rsgt) {
> +			struct i915_refct_sgt *rsgt;
> +
> +			rsgt = intel_region_ttm_resource_to_rsgt(obj->mm.region,
> +								 res);
> +			if (IS_ERR(rsgt))
> +				return rsgt;
> +
> +			obj->ttm.cached_io_rsgt = rsgt;
> +		}
> +		return i915_refct_sgt_get(obj->ttm.cached_io_rsgt);
> +	}
> +
> +	return intel_region_ttm_resource_to_rsgt(obj->mm.region, res);
>  }
>  
>  static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
> @@ -447,10 +474,7 @@ static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
>  {
>  	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
>  						     bdev);
> -	struct ttm_resource_manager *src_man =
> -		ttm_manager_type(bo->bdev, bo->resource->mem_type);
>  	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> -	struct sg_table *src_st;
>  	struct i915_request *rq;
>  	struct ttm_tt *src_ttm = bo->ttm;
>  	enum i915_cache_level src_level, dst_level;
> @@ -476,17 +500,22 @@ static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
>  		}
>  		intel_engine_pm_put(i915->gt.migrate.context->engine);
>  	} else {
> -		src_st = src_man->use_tt ? i915_ttm_tt_get_st(src_ttm) :
> -			obj->ttm.cached_io_st;
> +		struct i915_refct_sgt *src_rsgt =
> +			i915_ttm_resource_get_st(obj, bo->resource);
> +
> +		if (IS_ERR(src_rsgt))
> +			return PTR_ERR(src_rsgt);
>  
>  		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
>  		intel_engine_pm_get(i915->gt.migrate.context->engine);
>  		ret = intel_context_migrate_copy(i915->gt.migrate.context,
> -						 NULL, src_st->sgl, src_level,
> +						 NULL, src_rsgt->table.sgl,
> +						 src_level,
>  						 gpu_binds_iomem(bo->resource),
>  						 dst_st->sgl, dst_level,
>  						 gpu_binds_iomem(dst_mem),
>  						 &rq);
> +		i915_refct_sgt_put(src_rsgt);
>  		if (!ret && rq) {
>  			i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);
>  			i915_request_put(rq);
> @@ -500,13 +529,14 @@ static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
>  static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
>  			    struct ttm_resource *dst_mem,
>  			    struct ttm_tt *dst_ttm,
> -			    struct sg_table *dst_st,
> +			    struct i915_refct_sgt *dst_rsgt,
>  			    bool allow_accel)
>  {
>  	int ret = -EINVAL;
>  
>  	if (allow_accel)
> -		ret = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm, dst_st);
> +		ret = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm,
> +					  &dst_rsgt->table);
>  	if (ret) {
>  		struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>  		struct intel_memory_region *dst_reg, *src_reg;
> @@ -523,12 +553,13 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
>  		dst_iter = !cpu_maps_iomem(dst_mem) ?
>  			ttm_kmap_iter_tt_init(&_dst_iter.tt, dst_ttm) :
>  			ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
> -						 dst_st, dst_reg->region.start);
> +						 &dst_rsgt->table,
> +						 dst_reg->region.start);
>  
>  		src_iter = !cpu_maps_iomem(bo->resource) ?
>  			ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
>  			ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
> -						 obj->ttm.cached_io_st,
> +						 &obj->ttm.cached_io_rsgt->table,
>  						 src_reg->region.start);
>  
>  		ttm_move_memcpy(clear, dst_mem->num_pages, dst_iter, src_iter);
> @@ -544,7 +575,7 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>  	struct ttm_resource_manager *dst_man =
>  		ttm_manager_type(bo->bdev, dst_mem->mem_type);
>  	struct ttm_tt *ttm = bo->ttm;
> -	struct sg_table *dst_st;
> +	struct i915_refct_sgt *dst_rsgt;
>  	bool clear;
>  	int ret;
>  
> @@ -570,22 +601,24 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>  			return ret;
>  	}
>  
> -	dst_st = i915_ttm_resource_get_st(obj, dst_mem);
> -	if (IS_ERR(dst_st))
> -		return PTR_ERR(dst_st);
> +	dst_rsgt = i915_ttm_resource_get_st(obj, dst_mem);
> +	if (IS_ERR(dst_rsgt))
> +		return PTR_ERR(dst_rsgt);
>  
>  	clear = !cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
>  	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)))
> -		__i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_st, true);
> +		__i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_rsgt, true);
>  
>  	ttm_bo_move_sync_cleanup(bo, dst_mem);
>  	i915_ttm_adjust_domains_after_move(obj);
> -	i915_ttm_free_cached_io_st(obj);
> +	i915_ttm_free_cached_io_rsgt(obj);
>  
>  	if (gpu_binds_iomem(dst_mem) || cpu_maps_iomem(dst_mem)) {
> -		obj->ttm.cached_io_st = dst_st;
> -		obj->ttm.get_io_page.sg_pos = dst_st->sgl;
> +		obj->ttm.cached_io_rsgt = dst_rsgt;
> +		obj->ttm.get_io_page.sg_pos = dst_rsgt->table.sgl;
>  		obj->ttm.get_io_page.sg_idx = 0;
> +	} else {
> +		i915_refct_sgt_put(dst_rsgt);
>  	}
>  
>  	i915_ttm_adjust_gem_after_move(obj);
> @@ -649,7 +682,6 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj,
>  		.interruptible = true,
>  		.no_wait_gpu = false,
>  	};
> -	struct sg_table *st;
>  	int real_num_busy;
>  	int ret;
>  
> @@ -687,12 +719,16 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj,
>  	}
>  
>  	if (!i915_gem_object_has_pages(obj)) {
> -		/* Object either has a page vector or is an iomem object */
> -		st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
> -		if (IS_ERR(st))
> -			return PTR_ERR(st);
> +		struct i915_refct_sgt *rsgt =
> +			i915_ttm_resource_get_st(obj, bo->resource);
> +
> +		if (IS_ERR(rsgt))
> +			return PTR_ERR(rsgt);
>  
> -		__i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
> +		GEM_BUG_ON(obj->mm.rsgt);
> +		obj->mm.rsgt = rsgt;
> +		__i915_gem_object_set_pages(obj, &rsgt->table,
> +					    i915_sg_dma_sizes(rsgt->table.sgl));
>  	}
>  
>  	return ret;
> @@ -766,6 +802,11 @@ static void i915_ttm_put_pages(struct drm_i915_gem_object *obj,
>  	 * and shrinkers will move it out if needed.
>  	 */
>  
> +	if (obj->mm.rsgt) {
> +		i915_refct_sgt_put(obj->mm.rsgt);
> +		obj->mm.rsgt = NULL;
> +	}
> +
>  	i915_ttm_adjust_lru(obj);
>  }
>  
> @@ -1023,7 +1064,7 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
>  	struct ttm_operation_ctx ctx = {
>  		.interruptible = intr,
>  	};
> -	struct sg_table *dst_st;
> +	struct i915_refct_sgt *dst_rsgt;
>  	int ret;
>  
>  	assert_object_held(dst);
> @@ -1038,11 +1079,11 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
>  	if (ret)
>  		return ret;
>  
> -	dst_st = gpu_binds_iomem(dst_bo->resource) ?
> -		dst->ttm.cached_io_st : i915_ttm_tt_get_st(dst_bo->ttm);
> -
> +	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
>  	__i915_ttm_move(src_bo, false, dst_bo->resource, dst_bo->ttm,
> -			dst_st, allow_accel);
> +			dst_rsgt, allow_accel);
> +
> +	i915_refct_sgt_put(dst_rsgt);
>  
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c b/drivers/gpu/drm/i915/i915_scatterlist.c
> index 4a6712dca838..8a510ee5d1ad 100644
> --- a/drivers/gpu/drm/i915/i915_scatterlist.c
> +++ b/drivers/gpu/drm/i915/i915_scatterlist.c
> @@ -41,8 +41,32 @@ bool i915_sg_trim(struct sg_table *orig_st)
>  	return true;
>  }
>  
> +static void i915_refct_sgt_release(struct kref *ref)
> +{
> +	struct i915_refct_sgt *rsgt =
> +		container_of(ref, typeof(*rsgt), kref);
> +
> +	sg_free_table(&rsgt->table);
> +	kfree(rsgt);
> +}
> +
> +static const struct i915_refct_sgt_ops rsgt_ops = {
> +	.release = i915_refct_sgt_release
> +};
> +
> +/**
> + * i915_refct_sgt_init - Initialize a struct i915_refct_sgt with default ops
> + * @rsgt: The struct i915_refct_sgt to initialize.
> + * size: The size of the underlying memory buffer.
> + */
> +void i915_refct_sgt_init(struct i915_refct_sgt *rsgt, size_t size)
> +{
> +	i915_refct_sgt_init_ops(rsgt, size, &rsgt_ops);
> +}
> +
>  /**
> - * i915_sg_from_mm_node - Create an sg_table from a struct drm_mm_node
> + * i915_rsgt_from_mm_node - Create a refcounted sg_table from a struct
> + * drm_mm_node
>   * @node: The drm_mm_node.
>   * @region_start: An offset to add to the dma addresses of the sg list.
>   *
> @@ -50,25 +74,28 @@ bool i915_sg_trim(struct sg_table *orig_st)
>   * taking a maximum segment length into account, splitting into segments
>   * if necessary.
>   *
> - * Return: A pointer to a kmalloced struct sg_table on success, negative
> + * Return: A pointer to a kmalloced struct i915_refct_sgt on success, negative
>   * error code cast to an error pointer on failure.
>   */
> -struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
> -				      u64 region_start)
> +struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,
> +					      u64 region_start)
>  {
>  	const u64 max_segment = SZ_1G; /* Do we have a limit on this? */
>  	u64 segment_pages = max_segment >> PAGE_SHIFT;
>  	u64 block_size, offset, prev_end;
> +	struct i915_refct_sgt *rsgt;
>  	struct sg_table *st;
>  	struct scatterlist *sg;
>  
> -	st = kmalloc(sizeof(*st), GFP_KERNEL);
> -	if (!st)
> +	rsgt = kmalloc(sizeof(*rsgt), GFP_KERNEL);
> +	if (!rsgt)
>  		return ERR_PTR(-ENOMEM);
>  
> +	i915_refct_sgt_init(rsgt, node->size << PAGE_SHIFT);
> +	st = &rsgt->table;
>  	if (sg_alloc_table(st, DIV_ROUND_UP(node->size, segment_pages),
>  			   GFP_KERNEL)) {
> -		kfree(st);
> +		i915_refct_sgt_put(rsgt);
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> @@ -104,11 +131,11 @@ struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
>  	sg_mark_end(sg);
>  	i915_sg_trim(st);
>  
> -	return st;
> +	return rsgt;
>  }
>  
>  /**
> - * i915_sg_from_buddy_resource - Create an sg_table from a struct
> + * i915_rsgt_from_buddy_resource - Create a refcounted sg_table from a struct
>   * i915_buddy_block list
>   * @res: The struct i915_ttm_buddy_resource.
>   * @region_start: An offset to add to the dma addresses of the sg list.
> @@ -117,11 +144,11 @@ struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
>   * taking a maximum segment length into account, splitting into segments
>   * if necessary.
>   *
> - * Return: A pointer to a kmalloced struct sg_table on success, negative
> + * Return: A pointer to a kmalloced struct i915_refct_sgts on success, negative
>   * error code cast to an error pointer on failure.
>   */
> -struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
> -					     u64 region_start)
> +struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
> +						     u64 region_start)
>  {
>  	struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
>  	const u64 size = res->num_pages << PAGE_SHIFT;
> @@ -129,18 +156,21 @@ struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
>  	struct i915_buddy_mm *mm = bman_res->mm;
>  	struct list_head *blocks = &bman_res->blocks;
>  	struct i915_buddy_block *block;
> +	struct i915_refct_sgt *rsgt;
>  	struct scatterlist *sg;
>  	struct sg_table *st;
>  	resource_size_t prev_end;
>  
>  	GEM_BUG_ON(list_empty(blocks));
>  
> -	st = kmalloc(sizeof(*st), GFP_KERNEL);
> -	if (!st)
> +	rsgt = kmalloc(sizeof(*rsgt), GFP_KERNEL);
> +	if (!rsgt)
>  		return ERR_PTR(-ENOMEM);
>  
> +	i915_refct_sgt_init(rsgt, size);
> +	st = &rsgt->table;
>  	if (sg_alloc_table(st, res->num_pages, GFP_KERNEL)) {
> -		kfree(st);
> +		i915_refct_sgt_put(rsgt);
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> @@ -181,7 +211,7 @@ struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
>  	sg_mark_end(sg);
>  	i915_sg_trim(st);
>  
> -	return st;
> +	return rsgt;
>  }
>  
>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h b/drivers/gpu/drm/i915/i915_scatterlist.h
> index b8bd5925b03f..321fd4a9f777 100644
> --- a/drivers/gpu/drm/i915/i915_scatterlist.h
> +++ b/drivers/gpu/drm/i915/i915_scatterlist.h
> @@ -144,10 +144,78 @@ static inline unsigned int i915_sg_segment_size(void)
>  
>  bool i915_sg_trim(struct sg_table *orig_st);
>  
> -struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
> -				      u64 region_start);
> +/**
> + * struct i915_refct_sgt_ops - Operations structure for struct i915_refct_sgt
> + */
> +struct i915_refct_sgt_ops {
> +	/**
> +	 * release() - Free the memory of the struct i915_refct_sgt
> +	 * @ref: struct kref that is embedded in the struct i915_refct_sgt
> +	 */
> +	void (*release)(struct kref *ref);
> +};
> +
> +/**
> + * struct i915_refct_sgt - A refcounted scatter-gather table
> + * @kref: struct kref for refcounting
> + * @table: struct sg_table holding the scatter-gather table itself. Note that
> + * @table->sgl = NULL can be used to determine whether a scatter-gather table
> + * is present or not.
> + * @size: The size in bytes of the underlying memory buffer
> + * @ops: The operations structure.
> + */
> +struct i915_refct_sgt {
> +	struct kref kref;
> +	struct sg_table table;
> +	size_t size;
> +	const struct i915_refct_sgt_ops *ops;
> +};
> +
> +/**
> + * i915_refct_sgt_put - Put a refcounted sg-table
> + * @rsgt the struct i915_refct_sgt to put.
> + */
> +static inline void i915_refct_sgt_put(struct i915_refct_sgt *rsgt)
> +{
> +	if (rsgt)
> +		kref_put(&rsgt->kref, rsgt->ops->release);
> +}
> +
> +/**
> + * i915_refct_sgt_get - Get a refcounted sg-table
> + * @rsgt the struct i915_refct_sgt to get.
> + */
> +static inline struct i915_refct_sgt *
> +i915_refct_sgt_get(struct i915_refct_sgt *rsgt)
> +{
> +	kref_get(&rsgt->kref);
> +	return rsgt;
> +}
> +
> +/**
> + * i915_refct_sgt_init_ops - Initialize a refcounted sg-list with a custom
> + * operations structure
> + * @rsgt The struct i915_refct_sgt to initialize.
> + * @size: Size in bytes of the underlying memory buffer.
> + * @ops: A customized operations structure in case the refcounted sg-list
> + * is embedded into another structure.
> + */
> +static inline void i915_refct_sgt_init_ops(struct i915_refct_sgt *rsgt,
> +					   size_t size,
> +					   const struct i915_refct_sgt_ops *ops)
> +{
> +	kref_init(&rsgt->kref);
> +	rsgt->table.sgl = NULL;
> +	rsgt->size = size;
> +	rsgt->ops = ops;
> +}
> +
> +void i915_refct_sgt_init(struct i915_refct_sgt *rsgt, size_t size);
> +
> +struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,
> +					      u64 region_start);
>  
> -struct sg_table *i915_sg_from_buddy_resource(struct ttm_resource *res,
> -					     u64 region_start);
> +struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
> +						     u64 region_start);
>  
>  #endif
> diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
> index 98c7339bf8ba..2e901a27e259 100644
> --- a/drivers/gpu/drm/i915/intel_region_ttm.c
> +++ b/drivers/gpu/drm/i915/intel_region_ttm.c
> @@ -115,8 +115,8 @@ void intel_region_ttm_fini(struct intel_memory_region *mem)
>  }
>  
>  /**
> - * intel_region_ttm_resource_to_st - Convert an opaque TTM resource manager resource
> - * to an sg_table.
> + * intel_region_ttm_resource_to_rsgt -
> + * Convert an opaque TTM resource manager resource to a refcounted sg_table.
>   * @mem: The memory region.
>   * @res: The resource manager resource obtained from the TTM resource manager.
>   *
> @@ -126,17 +126,18 @@ void intel_region_ttm_fini(struct intel_memory_region *mem)
>   *
>   * Return: A malloced sg_table on success, an error pointer on failure.
>   */
> -struct sg_table *intel_region_ttm_resource_to_st(struct intel_memory_region *mem,
> -						 struct ttm_resource *res)
> +struct i915_refct_sgt *
> +intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
> +				  struct ttm_resource *res)
>  {
>  	if (mem->is_range_manager) {
>  		struct ttm_range_mgr_node *range_node =
>  			to_ttm_range_mgr_node(res);
>  
> -		return i915_sg_from_mm_node(&range_node->mm_nodes[0],
> -					    mem->region.start);
> +		return i915_rsgt_from_mm_node(&range_node->mm_nodes[0],
> +					      mem->region.start);
>  	} else {
> -		return i915_sg_from_buddy_resource(res, mem->region.start);
> +		return i915_rsgt_from_buddy_resource(res, mem->region.start);
>  	}
>  }
>  
> diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
> index 6f44075920f2..7bbe2b46b504 100644
> --- a/drivers/gpu/drm/i915/intel_region_ttm.h
> +++ b/drivers/gpu/drm/i915/intel_region_ttm.h
> @@ -22,8 +22,9 @@ int intel_region_ttm_init(struct intel_memory_region *mem);
>  
>  void intel_region_ttm_fini(struct intel_memory_region *mem);
>  
> -struct sg_table *intel_region_ttm_resource_to_st(struct intel_memory_region *mem,
> -						 struct ttm_resource *res);
> +struct i915_refct_sgt *
> +intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
> +				  struct ttm_resource *res);
>  
>  void intel_region_ttm_resource_free(struct intel_memory_region *mem,
>  				    struct ttm_resource *res);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c
> index efa86dffe3c6..2752b5b98f60 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_region.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_region.c
> @@ -17,9 +17,9 @@
>  static void mock_region_put_pages(struct drm_i915_gem_object *obj,
>  				  struct sg_table *pages)
>  {
> +	i915_refct_sgt_put(obj->mm.rsgt);
> +	obj->mm.rsgt = NULL;
>  	intel_region_ttm_resource_free(obj->mm.region, obj->mm.res);
> -	sg_free_table(pages);
> -	kfree(pages);
>  }
>  
>  static int mock_region_get_pages(struct drm_i915_gem_object *obj)
> @@ -38,12 +38,14 @@ static int mock_region_get_pages(struct drm_i915_gem_object *obj)
>  	if (IS_ERR(obj->mm.res))
>  		return PTR_ERR(obj->mm.res);
>  
> -	pages = intel_region_ttm_resource_to_st(obj->mm.region, obj->mm.res);
> -	if (IS_ERR(pages)) {
> -		err = PTR_ERR(pages);
> +	obj->mm.rsgt = intel_region_ttm_resource_to_rsgt(obj->mm.region,
> +							 obj->mm.res);
> +	if (IS_ERR(obj->mm.rsgt)) {
> +		err = PTR_ERR(obj->mm.rsgt);
>  		goto err_free_resource;
>  	}
>  
> +	pages = &obj->mm.rsgt->table;
>  	__i915_gem_object_set_pages(obj, pages, i915_sg_dma_sizes(pages->sgl));
>  
>  	return 0;
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Intel-gfx] [PATCH 2/6] drm/i915: Introduce refcounted sg-tables
@ 2021-10-13 14:41     ` Daniel Vetter
  0 siblings, 0 replies; 33+ messages in thread
From: Daniel Vetter @ 2021-10-13 14:41 UTC (permalink / raw)
  To: Thomas Hellström, Christian König
  Cc: intel-gfx, dri-devel, maarten.lankhorst, matthew.auld

On Fri, Oct 08, 2021 at 03:35:26PM +0200, Thomas Hellström wrote:
> As we start to introduce asynchronous failsafe object migration,
> where we update the object state and then submit asynchronous
> commands, we need to record what memory resources are actually used
> by various parts of the command stream. Initially for three purposes:
> 
> 1) Error capture.
> 2) Asynchronous migration error recovery.
> 3) Asynchronous vma bind.
> 
> At the time when these happen, the object state may have been updated
> to be several migrations ahead and object sg-tables discarded.
> 
> In order to make it possible to keep sg-tables with memory resource
> information for these operations, introduce refcounted sg-tables that
> aren't freed until the last user is done with them.
> 
> The alternative would be to reference information sitting on the
> corresponding ttm_resources which typically have the same lifetime as
> these refcounted sg_tables, but that leads to other awkward constructs:
> Due to the design direction chosen for ttm resource managers that would
> lead to diamond-style inheritance, the LMEM resources may sometimes be
> prematurely freed, and finally the subclassed struct ttm_resource would
> have to bleed into the asynchronous vma bind code.

On the diamond inheritance I was pondering some more whether we shouldn't
just do the classic C union horrors, i.e.

struct ttm_resource {
	/* stuff */
};

struct ttm_drm_mm_resource {
	struct ttm_resource base;
	struct drm_mm_node node;
};

struct ttm_buddy_resource {
	struct ttm_resource base;
	struct drm_buddy_node node;
};

Whatever else we have, maybe also integer resources for guc_id.

And then the horrors:

struct i915_gem_resource {
	union {
		struct ttm_resource base;
		struct ttm_drm_mm_resource drm_mm;
		struct ttm_buddy_resource buddy;
	};

	/* i915 stuff */
};

BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) !=
	offsetof(struct i915_gem_resource, drm_mm.base));
BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) !=
	offsetof(struct i915_gem_resource, buddy.base));

This is horrible, but also in official C89 and later, unions are the only
way to do inheritance. The only reason we can do it differently in Linux is
because we compile with strict aliasing turned off.

So I think we can shrug this off as officially sanctioned horrors. There's
maybe a small overhead downside, since the union is sized for its largest
member, but I don't think the size difference between the various allocators'
nodes is big enough that we should care. Plus a pointer to driver data to
resolve the diamond inheritance through different means isn't free either.
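
For completeness, here's a stand-alone sketch of that layout check which
compiles in user space, with placeholder structs standing in for the real
TTM/i915 ones and static_assert in place of BUILD_BUG_ON:

#include <assert.h>
#include <stddef.h>

/* Placeholders only, so the sketch is self-contained. */
struct ttm_resource { unsigned int mem_type; };
struct drm_mm_node { unsigned long start; };
struct drm_buddy_node { unsigned long offset; };

struct ttm_drm_mm_resource {
	struct ttm_resource base;
	struct drm_mm_node node;
};

struct ttm_buddy_resource {
	struct ttm_resource base;
	struct drm_buddy_node node;
};

struct i915_gem_resource {
	union {
		struct ttm_resource base;
		struct ttm_drm_mm_resource drm_mm;
		struct ttm_buddy_resource buddy;
	};

	/* i915 stuff */
};

/* All union members start at offset 0, so &res->base aliases either subclass. */
static_assert(offsetof(struct i915_gem_resource, base) ==
	      offsetof(struct i915_gem_resource, drm_mm.base), "layout");
static_assert(offsetof(struct i915_gem_resource, base) ==
	      offsetof(struct i915_gem_resource, buddy.base), "layout");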

But also this is for much later. I think for now refcounting sg-lists as a
standalone thing is ok, since we do seem to need them in a bunch of
places. But eventually I do think we should aim to merge them with
ttm_resource, if/when those get refcounted.
-Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] drm/i915: Introduce refcounted sg-tables
  2021-10-13 14:41     ` [Intel-gfx] " Daniel Vetter
@ 2021-10-13 14:55       ` Thomas Hellström
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-13 14:55 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: intel-gfx, dri-devel, maarten.lankhorst, matthew.auld


On 10/13/21 16:41, Daniel Vetter wrote:
> On Fri, Oct 08, 2021 at 03:35:26PM +0200, Thomas Hellström wrote:
>> As we start to introduce asynchronous failsafe object migration,
>> where we update the object state and then submit asynchronous
>> commands, we need to record what memory resources are actually used
>> by various parts of the command stream. Initially for three purposes:
>>
>> 1) Error capture.
>> 2) Asynchronous migration error recovery.
>> 3) Asynchronous vma bind.
>>
>> At the time when these happen, the object state may have been updated
>> to be several migrations ahead and object sg-tables discarded.
>>
>> In order to make it possible to keep sg-tables with memory resource
>> information for these operations, introduce refcounted sg-tables that
>> aren't freed until the last user is done with them.
>>
>> The alternative would be to reference information sitting on the
>> corresponding ttm_resources which typically have the same lifetime as
>> these refcountes sg_tables, but that leads to other awkward constructs:
>> Due to the design direction chosen for ttm resource managers that would
>> lead to diamond-style inheritance, the LMEM resources may sometimes be
>> prematurely freed, and finally the subclassed struct ttm_resource would
>> have to bleed into the asynchronous vma bind code.
> On the diamond inheritance I was pondering some more whether we shouldn't
> just do the classic C union horrors, i.e.
>
> struct ttm_resource {
> 	/* stuff */
> };
>
> struct ttm_drm_mm_resource {
> 	struct ttm_resource base;
> 	struct drm_mm_node node;
> };
>
> struct ttm_buddy_resource {
> 	struct ttm_resource base;
> 	struct drm_buddy_node node;
> };
>
> Whatever else we have, maybe also integer resources for guc_id.
>
> And then the horrors:
>
> struct i915_gem_resource {
> 	union {
> 		struct ttm_resource base;
> 		struct ttm_drm_mm_resource drm_mm;
> 		struct ttm_buddy_resource buddy;
> 	};
>
> 	/* i915 stuff */
> };
>
> BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) !=
> 	offsetof(struct i915_gem_resource, drm_mm.base));
> BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) !=
> 	offsetof(struct i915_gem_resource, buddy.base));
>
> This is horrible, but also in official C89 and later, unions are the only
> way to do inheritance. The only reason we can do it differently in Linux is
> because we compile with strict aliasing turned off.
>
> So I think we can shrug this off as officially sanctioned horrors. There's
> maybe a small overhead downside, since the union is sized for its largest
> member, but I don't think the size difference between the various allocators'
> nodes is big enough that we should care. Plus a pointer to driver data to
> resolve the diamond inheritance through different means isn't free either.

Yes, this is exactly what was meant by "awkward constructs" in the
commit message.

My thoughts are still that all this could be avoided by a different
design for struct ttm_resource, but I agree we can make do with
refcounted sg-lists for now, to see where this ends up when all the
related resource-on-LRU work lands in TTM.
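
Just to spell out what the refcounted variant buys us in practice: the
consumers end up with the usual kref pattern. Rough sketch below, built on
the helpers from this patch; "struct async_work" and the function names are
only stand-ins for whatever async consumer needs the table to stay alive:

struct async_work {
	struct i915_refct_sgt *rsgt;
};

static int sketch_setup_async(struct drm_i915_gem_object *obj,
			      struct ttm_buffer_object *bo,
			      struct async_work *work)
{
	struct i915_refct_sgt *rsgt =
		i915_ttm_resource_get_st(obj, bo->resource);

	if (IS_ERR(rsgt))
		return PTR_ERR(rsgt);

	/* Transfer the reference returned above to the async consumer. */
	work->rsgt = rsgt;
	return 0;
}

static void sketch_async_done(struct async_work *work)
{
	/* Last put frees the table, however many migrations later. */
	i915_refct_sgt_put(work->rsgt);
}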

/Thomas



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/6] drm/i915: Failsafe migration blits
  2021-10-08 13:35 ` [Intel-gfx] " Thomas Hellström
@ 2021-10-14  1:50   ` Dave Airlie
  -1 siblings, 0 replies; 33+ messages in thread
From: Dave Airlie @ 2021-10-14  1:50 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Intel Graphics Development, dri-devel, Maarten Lankhorst, Matthew Auld

On Fri, 8 Oct 2021 at 23:36, Thomas Hellström
<thomas.hellstrom@linux.intel.com> wrote:
>
> This patch series introduces failsafe migration blits.
> The reason for this seemingly strange concept is that if the initial
> clearing or readback of LMEM fails for some reason, and we then set up
> either GPU- or CPU ptes to the allocated LMEM, we can expose old
> contents from other clients.

Can we enumerate "for some reason" here?

This feels like "security" with no defined threat model. Maybe if the
cover letter contains more details on the threat model it would make
more sense.

Dave.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Intel-gfx] [PATCH 0/6] drm/i915: Failsafe migration blits
  2021-10-14  1:50   ` [Intel-gfx] " Dave Airlie
@ 2021-10-14  7:29     ` Thomas Hellström
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellström @ 2021-10-14  7:29 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Intel Graphics Development, dri-devel, Maarten Lankhorst, Matthew Auld

Hi, Dave,

On 10/14/21 03:50, Dave Airlie wrote:
> On Fri, 8 Oct 2021 at 23:36, Thomas Hellström
> <thomas.hellstrom@linux.intel.com> wrote:
>> This patch series introduces failsafe migration blits.
>> The reason for this seemingly strange concept is that if the initial
>> clearing or readback of LMEM fails for some reason, and we then set up
>> either GPU- or CPU ptes to the allocated LMEM, we can expose old
>> contents from other clients.
> Can we enumerate "for some reason" here?
>
> This feels like "security" with no defined threat model. Maybe if the
> cover letter contains more details on the threat model it would make
> more sense.

TBH, I'd be quite happy if we could find a way to skip this series (or 
even a reworked version) completely.

Assuming that the migration request setup code is bug-free enough to never
cause an engine reset itself, there are at least two ways I can see the
migration fail:

1) The migration fence we will be depending on when fully async
(ttm_bo->moving) may signal with an error after the following chain:
malicious_batchbuffer_causing_reset -> async eviction -> allocation ->
async clearing

2) malicious_batchbuffers_causing_gt_wedge submitted to the copy engine ->
migration_blit submitted to the copy engine. If the gt gets wedged, the
migration blit will never be executed, fence->error will end up as -EIO,
but TTM will happily fault the pages to user-space.

Now we had other versions around that looked at the ttm_bo->moving errors
at vma-binding and cpu-fault time, but this was the direction chosen after
discussions with our arch team. Either way, we'd probably want to block
the error propagation after async eviction.
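
As a rough sketch of that alternative (checking the moving fence at
CPU-fault time and refusing to expose the pages on error), something
along the following lines could sit in the fault path; the helper name
and placement are assumptions, not what the series or TTM actually
implements:

#include <linux/dma-fence.h>
#include <linux/mm.h>
#include <drm/ttm/ttm_bo_api.h>

/*
 * Sketch only: refuse to set up CPU PTEs to LMEM whose clearing or
 * migration blit signalled with an error.
 */
static vm_fault_t i915_ttm_check_moving_error(struct ttm_buffer_object *bo)
{
	struct dma_fence *moving = bo->moving; /* assumed available here */

	/* dma_fence_get_status() < 0 means signalled with an error. */
	if (moving && dma_fence_get_status(moving) < 0)
		return VM_FAULT_SIGBUS;

	return 0;
}

A similar check would be needed at vma-bind time so that GPU PTEs are
refused as well.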

I can of course add 1) and 2) above to the cover-letter, but if you have 
any additional input on the best way to handle this, that'd be appreciated.

Thanks,

Thomas

> Dave.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-10-14  7:30 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-08 13:35 [PATCH 0/6] drm/i915: Failsafe migration blits Thomas Hellström
2021-10-08 13:35 ` [Intel-gfx] " Thomas Hellström
2021-10-08 13:35 ` [PATCH 1/6] drm/i915: Update dma_fence_work Thomas Hellström
2021-10-08 13:35   ` [Intel-gfx] " Thomas Hellström
2021-10-13 12:41   ` Daniel Vetter
2021-10-13 12:41     ` [Intel-gfx] " Daniel Vetter
2021-10-13 12:59     ` Thomas Hellström
2021-10-13 12:59       ` [Intel-gfx] " Thomas Hellström
2021-10-08 13:35 ` [PATCH 2/6] drm/i915: Introduce refcounted sg-tables Thomas Hellström
2021-10-08 13:35   ` [Intel-gfx] " Thomas Hellström
2021-10-13 14:41   ` Daniel Vetter
2021-10-13 14:41     ` [Intel-gfx] " Daniel Vetter
2021-10-13 14:55     ` Thomas Hellström
2021-10-13 14:55       ` [Intel-gfx] " Thomas Hellström
2021-10-08 13:35 ` [PATCH 3/6] drm/i915/ttm: Failsafe migration blits Thomas Hellström
2021-10-08 13:35   ` [Intel-gfx] " Thomas Hellström
2021-10-08 13:35 ` [PATCH 4/6] drm/i915: Add a struct dma_fence_work timeline Thomas Hellström
2021-10-08 13:35   ` [Intel-gfx] " Thomas Hellström
2021-10-13 12:43   ` Daniel Vetter
2021-10-13 14:21     ` Thomas Hellström
2021-10-13 14:33       ` Daniel Vetter
2021-10-13 14:39         ` Thomas Hellström
2021-10-08 13:35 ` [PATCH 5/6] drm/i915/ttm: Attach the migration fence to a region timeline on eviction Thomas Hellström
2021-10-08 13:35   ` [Intel-gfx] " Thomas Hellström
2021-10-08 13:35 ` [PATCH 6/6] drm/i915: Use irq work for coalescing-only dma-fence-work Thomas Hellström
2021-10-08 13:35   ` [Intel-gfx] " Thomas Hellström
2021-10-08 17:00 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: Failsafe migration blits Patchwork
2021-10-08 17:29 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-10-09  0:04 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2021-10-14  1:50 ` [PATCH 0/6] " Dave Airlie
2021-10-14  1:50   ` [Intel-gfx] " Dave Airlie
2021-10-14  7:29   ` Thomas Hellström
2021-10-14  7:29     ` Thomas Hellström

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.