All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 00/10] drm/i915: ttm for stolen
@ 2022-06-20 21:33 ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Robert Beckett, Thomas Hellström, kernel, Matthew Auld

This series refactors i915's stolen memory region to use ttm.

v2:	handle disabled stolen similar to legacy version.
	relying on ttm to fail allocs works fine, but is dmesg noisy and causes testing
	dmesg warning regressions.

v3:	rebase to latest drm-tip.
	fix v2 code refactor which could leave a buffer pinned.
	locally passes fftl again now.

v4:	- Allow memory regions creators to do allocation. Allows stolen region to track
	  it's own reservations.
	- Pre-reserve first page of stolen mem (add back WaSkipStolenMemoryFirstPage:bdw+)
	- Improve commit descritpion for "drm/i915: sanitize mem_flags for stolen buffers"
	- replace i915_gem_object_pin_pages_unlocked() call with manual locking and pinning.
	  this avoids ww ctx class reuse during context creation -> ring vma obj alloc.

v5:	- detect both types of stolen as stolen buffers in
	  "drm/i915: sanitize mem_flags for stolen buffers"
	- in stolen_object_init limit page size to mem region minimum.
	  The range allocator expects the page_size to define the
	  alignment

v6:	- Share first 4 patches from ttm for internal series as generic
	  i915 ttm fixes
	- Drop patch 4 from v5. We don't need separate object ops just
	  to satisfy test interfaces. The tests have now been fixed via
	  checking whether the memory region is private to decide
	  whether to mmap
	- Add new buffer pin alloc flag to allow creation of buffers in
	  their final ttm placement instead of deferring until
	  get_pages. This fixes legacy fallback paths for buffer
	  allocations during stolen memory pressure.

v7: 	- fix mock_region_get_pages() to correctly handle I915_BO_INVALID_OFFSET

Robert Beckett (10):
  drm/i915/ttm: dont trample cache_level overrides during ttm move
  drm/i915: limit ttm to dma32 for i965G[M]
  drm/i915/ttm: only trust snooping for dgfx when deciding default
    cache_level
  drm/i915/gem: selftest should not attempt mmap of private regions
  drm/i915: instantiate ttm ranger manager for stolen memory
  drm/i915: sanitize mem_flags for stolen buffers
  drm/i915: ttm move/clear logic fix
  drm/i915: allow memory region creators to alloc and free the region
  drm/i915/ttm: add buffer pin on alloc flag
  drm/i915: stolen memory use ttm backend

 drivers/gpu/drm/i915/display/intel_fbc.c      |  78 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    | 440 +++++++-----------
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h    |  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  29 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   7 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  47 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |   3 +
 drivers/gpu/drm/i915/gt/intel_rc6.c           |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |   7 +-
 drivers/gpu/drm/i915/i915_drv.h               |   5 -
 drivers/gpu/drm/i915/intel_memory_region.c    |  16 +-
 drivers/gpu/drm/i915/intel_memory_region.h    |   2 +
 drivers/gpu/drm/i915/intel_region_ttm.c       |  80 +++-
 drivers/gpu/drm/i915/intel_region_ttm.h       |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 18 files changed, 423 insertions(+), 369 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v7 00/10] drm/i915: ttm for stolen
@ 2022-06-20 21:33 ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Thomas Hellström, kernel, Matthew Auld

This series refactors i915's stolen memory region to use ttm.

v2:	handle disabled stolen similar to legacy version.
	relying on ttm to fail allocs works fine, but is dmesg noisy and causes testing
	dmesg warning regressions.

v3:	rebase to latest drm-tip.
	fix v2 code refactor which could leave a buffer pinned.
	locally passes fftl again now.

v4:	- Allow memory regions creators to do allocation. Allows stolen region to track
	  it's own reservations.
	- Pre-reserve first page of stolen mem (add back WaSkipStolenMemoryFirstPage:bdw+)
	- Improve commit descritpion for "drm/i915: sanitize mem_flags for stolen buffers"
	- replace i915_gem_object_pin_pages_unlocked() call with manual locking and pinning.
	  this avoids ww ctx class reuse during context creation -> ring vma obj alloc.

v5:	- detect both types of stolen as stolen buffers in
	  "drm/i915: sanitize mem_flags for stolen buffers"
	- in stolen_object_init limit page size to mem region minimum.
	  The range allocator expects the page_size to define the
	  alignment

v6:	- Share first 4 patches from ttm for internal series as generic
	  i915 ttm fixes
	- Drop patch 4 from v5. We don't need separate object ops just
	  to satisfy test interfaces. The tests have now been fixed via
	  checking whether the memory region is private to decide
	  whether to mmap
	- Add new buffer pin alloc flag to allow creation of buffers in
	  their final ttm placement instead of deferring until
	  get_pages. This fixes legacy fallback paths for buffer
	  allocations during stolen memory pressure.

v7: 	- fix mock_region_get_pages() to correctly handle I915_BO_INVALID_OFFSET

Robert Beckett (10):
  drm/i915/ttm: dont trample cache_level overrides during ttm move
  drm/i915: limit ttm to dma32 for i965G[M]
  drm/i915/ttm: only trust snooping for dgfx when deciding default
    cache_level
  drm/i915/gem: selftest should not attempt mmap of private regions
  drm/i915: instantiate ttm ranger manager for stolen memory
  drm/i915: sanitize mem_flags for stolen buffers
  drm/i915: ttm move/clear logic fix
  drm/i915: allow memory region creators to alloc and free the region
  drm/i915/ttm: add buffer pin on alloc flag
  drm/i915: stolen memory use ttm backend

 drivers/gpu/drm/i915/display/intel_fbc.c      |  78 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    | 440 +++++++-----------
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h    |  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  29 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   7 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  47 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |   3 +
 drivers/gpu/drm/i915/gt/intel_rc6.c           |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |   7 +-
 drivers/gpu/drm/i915/i915_drv.h               |   5 -
 drivers/gpu/drm/i915/intel_memory_region.c    |  16 +-
 drivers/gpu/drm/i915/intel_memory_region.h    |   2 +
 drivers/gpu/drm/i915/intel_region_ttm.c       |  80 +++-
 drivers/gpu/drm/i915/intel_region_ttm.h       |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 18 files changed, 423 insertions(+), 369 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 01/10] drm/i915/ttm: dont trample cache_level overrides during ttm move
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-06-20 21:33   ` Robert Beckett
  -1 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: kernel, Robert Beckett, Matthew Auld, Thomas Hellström,
	linux-kernel

Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

TTM movement code came along and decided that it could make that
decision at that time, which is usually well after object creation, so
overrode the cache_level decision and reverted it back to its default
decision.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c       | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c          | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c     | 9 ++++++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
 	obj->cache_level = cache_level;
+	obj->ttm.cache_level_override = true;
 
 	if (cache_level != I915_CACHE_NONE)
 		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
 		struct i915_gem_object_page_iter get_io_page;
 		struct drm_i915_gem_object *backup;
 		bool created:1;
+		bool cache_level_override:1;
 	} ttm;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_gem_object_init_memory_region(obj, mem);
 	i915_ttm_adjust_domains_after_move(obj);
 	i915_ttm_adjust_gem_after_move(obj);
+	obj->ttm.cache_level_override = false;
 	i915_gem_object_unlock(obj);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 	obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
 		I915_BO_FLAG_STRUCT_PAGE;
 
-	cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-					   bo->ttm);
-	i915_gem_object_set_cache_coherency(obj, cache_level);
+	if (!obj->ttm.cache_level_override) {
+		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+						   bo->resource, bo->ttm);
+		i915_gem_object_set_cache_coherency(obj, cache_level);
+		obj->ttm.cache_level_override = false;
+	}
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 01/10] drm/i915/ttm: dont trample cache_level overrides during ttm move
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Robert Beckett, Thomas Hellström, kernel, Matthew Auld,
	linux-kernel

Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

TTM movement code came along and decided that it could make that
decision at that time, which is usually well after object creation, so
overrode the cache_level decision and reverted it back to its default
decision.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c       | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c          | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c     | 9 ++++++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
 	obj->cache_level = cache_level;
+	obj->ttm.cache_level_override = true;
 
 	if (cache_level != I915_CACHE_NONE)
 		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
 		struct i915_gem_object_page_iter get_io_page;
 		struct drm_i915_gem_object *backup;
 		bool created:1;
+		bool cache_level_override:1;
 	} ttm;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_gem_object_init_memory_region(obj, mem);
 	i915_ttm_adjust_domains_after_move(obj);
 	i915_ttm_adjust_gem_after_move(obj);
+	obj->ttm.cache_level_override = false;
 	i915_gem_object_unlock(obj);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 	obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
 		I915_BO_FLAG_STRUCT_PAGE;
 
-	cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-					   bo->ttm);
-	i915_gem_object_set_cache_coherency(obj, cache_level);
+	if (!obj->ttm.cache_level_override) {
+		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+						   bo->resource, bo->ttm);
+		i915_gem_object_set_cache_coherency(obj, cache_level);
+		obj->ttm.cache_level_override = false;
+	}
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v7 01/10] drm/i915/ttm: dont trample cache_level overrides during ttm move
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Thomas Hellström, kernel, Matthew Auld, linux-kernel

Various places within the driver override the default chosen cache_level.
Before ttm, these overrides were permanent until explicitly changed again
or for the lifetime of the buffer.

TTM movement code came along and decided that it could make that
decision at that time, which is usually well after object creation, so
overrode the cache_level decision and reverted it back to its default
decision.

Add logic to indicate whether the caching mode has been set by anything
other than the move logic. If so, assume that the code that overrode the
defaults knows best and keep it.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c       | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c          | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c     | 9 ++++++---
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..519887769c08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -125,6 +125,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
 	obj->cache_level = cache_level;
+	obj->ttm.cache_level_override = true;
 
 	if (cache_level != I915_CACHE_NONE)
 		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2c88bdb8ff7c..6632ed52e919 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -605,6 +605,7 @@ struct drm_i915_gem_object {
 		struct i915_gem_object_page_iter get_io_page;
 		struct drm_i915_gem_object *backup;
 		bool created:1;
+		bool cache_level_override:1;
 	} ttm;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4c25d9b2f138..27d59639177f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1241,6 +1241,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_gem_object_init_memory_region(obj, mem);
 	i915_ttm_adjust_domains_after_move(obj);
 	i915_ttm_adjust_gem_after_move(obj);
+	obj->ttm.cache_level_override = false;
 	i915_gem_object_unlock(obj);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index a10716f4e717..4c1de0b4a10f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -123,9 +123,12 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 	obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
 		I915_BO_FLAG_STRUCT_PAGE;
 
-	cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-					   bo->ttm);
-	i915_gem_object_set_cache_coherency(obj, cache_level);
+	if (!obj->ttm.cache_level_override) {
+		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
+						   bo->resource, bo->ttm);
+		i915_gem_object_set_cache_coherency(obj, cache_level);
+		obj->ttm.cache_level_override = false;
+	}
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 02/10] drm/i915: limit ttm to dma32 for i965G[M]
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-06-20 21:33   ` Robert Beckett
  -1 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: kernel, Robert Beckett, Matthew Auld, Thomas Hellström,
	linux-kernel

i965G[M] cannot relocate objects above 4GiB.
Ensure ttm uses dma32 on these systems.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 62ff77445b01..fd2ecfdd8fa1 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -32,10 +32,15 @@
 int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 {
 	struct drm_device *drm = &dev_priv->drm;
+	bool use_dma32 = false;
+
+	/* i965g[m] cannot relocate objects above 4GiB. */
+	if (IS_I965GM(dev_priv) || IS_I965G(dev_priv))
+		use_dma32 = true;
 
 	return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
 			       drm->dev, drm->anon_inode->i_mapping,
-			       drm->vma_offset_manager, false, false);
+			       drm->vma_offset_manager, false, use_dma32);
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 02/10] drm/i915: limit ttm to dma32 for i965G[M]
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Robert Beckett, Thomas Hellström, kernel, Matthew Auld,
	linux-kernel

i965G[M] cannot relocate objects above 4GiB.
Ensure ttm uses dma32 on these systems.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 62ff77445b01..fd2ecfdd8fa1 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -32,10 +32,15 @@
 int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 {
 	struct drm_device *drm = &dev_priv->drm;
+	bool use_dma32 = false;
+
+	/* i965g[m] cannot relocate objects above 4GiB. */
+	if (IS_I965GM(dev_priv) || IS_I965G(dev_priv))
+		use_dma32 = true;
 
 	return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
 			       drm->dev, drm->anon_inode->i_mapping,
-			       drm->vma_offset_manager, false, false);
+			       drm->vma_offset_manager, false, use_dma32);
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v7 02/10] drm/i915: limit ttm to dma32 for i965G[M]
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Thomas Hellström, kernel, Matthew Auld, linux-kernel

i965G[M] cannot relocate objects above 4GiB.
Ensure ttm uses dma32 on these systems.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/intel_region_ttm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 62ff77445b01..fd2ecfdd8fa1 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -32,10 +32,15 @@
 int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 {
 	struct drm_device *drm = &dev_priv->drm;
+	bool use_dma32 = false;
+
+	/* i965g[m] cannot relocate objects above 4GiB. */
+	if (IS_I965GM(dev_priv) || IS_I965G(dev_priv))
+		use_dma32 = true;
 
 	return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
 			       drm->dev, drm->anon_inode->i_mapping,
-			       drm->vma_offset_manager, false, false);
+			       drm->vma_offset_manager, false, use_dma32);
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 03/10] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-06-20 21:33   ` Robert Beckett
  -1 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: kernel, Robert Beckett, Matthew Auld, Thomas Hellström,
	linux-kernel

By default i915_ttm_cache_level() decides I915_CACHE_LLC if HAS_SNOOP.
This is divergent from existing backends code which only considers
HAS_LLC.
Testing shows that trusting snooping on gen5- is unreliable and bsw via
ggtt mappings, so limit DGFX for now and maintain previous behaviour.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 4c1de0b4a10f..40249fa28a7a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -46,7 +46,9 @@ static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 		     struct ttm_tt *ttm)
 {
-	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+	bool can_snoop = HAS_SNOOP(i915) && IS_DGFX(i915);
+
+	return ((HAS_LLC(i915) || can_snoop) &&
 		!i915_ttm_gtt_binds_lmem(res) &&
 		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
 		I915_CACHE_NONE;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 03/10] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Robert Beckett, Thomas Hellström, kernel, Matthew Auld,
	linux-kernel

By default i915_ttm_cache_level() decides I915_CACHE_LLC if HAS_SNOOP.
This is divergent from existing backends code which only considers
HAS_LLC.
Testing shows that trusting snooping on gen5- is unreliable and bsw via
ggtt mappings, so limit DGFX for now and maintain previous behaviour.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 4c1de0b4a10f..40249fa28a7a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -46,7 +46,9 @@ static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 		     struct ttm_tt *ttm)
 {
-	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+	bool can_snoop = HAS_SNOOP(i915) && IS_DGFX(i915);
+
+	return ((HAS_LLC(i915) || can_snoop) &&
 		!i915_ttm_gtt_binds_lmem(res) &&
 		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
 		I915_CACHE_NONE;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v7 03/10] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Thomas Hellström, kernel, Matthew Auld, linux-kernel

By default i915_ttm_cache_level() decides I915_CACHE_LLC if HAS_SNOOP.
This is divergent from existing backends code which only considers
HAS_LLC.
Testing shows that trusting snooping on gen5- is unreliable and bsw via
ggtt mappings, so limit DGFX for now and maintain previous behaviour.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 4c1de0b4a10f..40249fa28a7a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -46,7 +46,9 @@ static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 		     struct ttm_tt *ttm)
 {
-	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+	bool can_snoop = HAS_SNOOP(i915) && IS_DGFX(i915);
+
+	return ((HAS_LLC(i915) || can_snoop) &&
 		!i915_ttm_gtt_binds_lmem(res) &&
 		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
 		I915_CACHE_NONE;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 04/10] drm/i915/gem: selftest should not attempt mmap of private regions
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-06-20 21:33   ` Robert Beckett
  -1 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: kernel, Robert Beckett, Matthew Auld, Thomas Hellström,
	linux-kernel

During testing make can_mmap consider whether the region is private.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum i915_mmap_type type)
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 	bool no_map;
 
+	if (obj->mm.region && obj->mm.region->private)
+		return false;
+
 	if (obj->ops->mmap_offset)
 		return type == I915_MMAP_TYPE_FIXED;
 	else if (type == I915_MMAP_TYPE_FIXED)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 04/10] drm/i915/gem: selftest should not attempt mmap of private regions
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Robert Beckett, Thomas Hellström, kernel, Matthew Auld,
	linux-kernel

During testing make can_mmap consider whether the region is private.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum i915_mmap_type type)
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 	bool no_map;
 
+	if (obj->mm.region && obj->mm.region->private)
+		return false;
+
 	if (obj->ops->mmap_offset)
 		return type == I915_MMAP_TYPE_FIXED;
 	else if (type == I915_MMAP_TYPE_FIXED)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v7 04/10] drm/i915/gem: selftest should not attempt mmap of private regions
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Thomas Hellström, kernel, Matthew Auld, linux-kernel

During testing make can_mmap consider whether the region is private.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5bc93a1ce3e3..76181e28c75e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -869,6 +869,9 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum i915_mmap_type type)
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 	bool no_map;
 
+	if (obj->mm.region && obj->mm.region->private)
+		return false;
+
 	if (obj->ops->mmap_offset)
 		return type == I915_MMAP_TYPE_FIXED;
 	else if (type == I915_MMAP_TYPE_FIXED)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 05/10] drm/i915: instantiate ttm ranger manager for stolen memory
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-06-20 21:33   ` Robert Beckett
  -1 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: kernel, Robert Beckett, Matthew Auld, Thomas Hellström,
	linux-kernel

prepare for ttm based stolen region by using ttm range manager
as the resource manager for stolen region.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  6 ++--
 drivers/gpu/drm/i915/intel_region_ttm.c      | 31 +++++++++++++++-----
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 40249fa28a7a..675e9ab30396 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -60,11 +60,13 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
 	struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
 
 	/* There's some room for optimization here... */
-	GEM_BUG_ON(ttm_mem_type != I915_PL_SYSTEM &&
-		   ttm_mem_type < I915_PL_LMEM0);
+	GEM_BUG_ON(ttm_mem_type == I915_PL_GGTT);
+
 	if (ttm_mem_type == I915_PL_SYSTEM)
 		return intel_memory_region_lookup(i915, INTEL_MEMORY_SYSTEM,
 						  0);
+	if (ttm_mem_type == I915_PL_STOLEN)
+		return i915->mm.stolen_region;
 
 	return intel_memory_region_lookup(i915, INTEL_MEMORY_LOCAL,
 					  ttm_mem_type - I915_PL_LMEM0);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index fd2ecfdd8fa1..694e9acb69e2 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -54,7 +54,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private *dev_priv)
 
 /*
  * Map the i915 memory regions to TTM memory types. We use the
- * driver-private types for now, reserving TTM_PL_VRAM for stolen
+ * driver-private types for now, reserving I915_PL_STOLEN for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
 int intel_region_to_ttm_type(const struct intel_memory_region *mem)
@@ -63,11 +63,17 @@ int intel_region_to_ttm_type(const struct intel_memory_region *mem)
 
 	GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
 		   mem->type != INTEL_MEMORY_MOCK &&
-		   mem->type != INTEL_MEMORY_SYSTEM);
+		   mem->type != INTEL_MEMORY_SYSTEM &&
+		   mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+		   mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
 	if (mem->type == INTEL_MEMORY_SYSTEM)
 		return TTM_PL_SYSTEM;
 
+	if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+	    mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+		return I915_PL_STOLEN;
+
 	type = mem->instance + TTM_PL_PRIV;
 	GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -91,10 +97,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 	int mem_type = intel_region_to_ttm_type(mem);
 	int ret;
 
-	ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
-				      resource_size(&mem->region),
-				      mem->io_size,
-				      mem->min_page_size, PAGE_SIZE);
+	if (mem_type == I915_PL_STOLEN) {
+		ret = ttm_range_man_init(bdev, mem_type, false,
+					 resource_size(&mem->region) >> PAGE_SHIFT);
+		mem->is_range_manager = true;
+	} else {
+		ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+					      resource_size(&mem->region),
+					      mem->io_size,
+					      mem->min_page_size, PAGE_SIZE);
+	}
 	if (ret)
 		return ret;
 
@@ -114,6 +126,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
 	struct ttm_resource_manager *man = mem->region_private;
+	int mem_type = intel_region_to_ttm_type(mem);
 	int ret = -EBUSY;
 	int count;
 
@@ -144,8 +157,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
 	if (ret || !man)
 		return ret;
 
-	ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
-				      intel_region_to_ttm_type(mem));
+	if (mem_type == I915_PL_STOLEN)
+		ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+	else
+		ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
 	GEM_WARN_ON(ret);
 	mem->region_private = NULL;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 05/10] drm/i915: instantiate ttm ranger manager for stolen memory
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Robert Beckett, Thomas Hellström, kernel, Matthew Auld,
	linux-kernel

prepare for ttm based stolen region by using ttm range manager
as the resource manager for stolen region.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  6 ++--
 drivers/gpu/drm/i915/intel_region_ttm.c      | 31 +++++++++++++++-----
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 40249fa28a7a..675e9ab30396 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -60,11 +60,13 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
 	struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
 
 	/* There's some room for optimization here... */
-	GEM_BUG_ON(ttm_mem_type != I915_PL_SYSTEM &&
-		   ttm_mem_type < I915_PL_LMEM0);
+	GEM_BUG_ON(ttm_mem_type == I915_PL_GGTT);
+
 	if (ttm_mem_type == I915_PL_SYSTEM)
 		return intel_memory_region_lookup(i915, INTEL_MEMORY_SYSTEM,
 						  0);
+	if (ttm_mem_type == I915_PL_STOLEN)
+		return i915->mm.stolen_region;
 
 	return intel_memory_region_lookup(i915, INTEL_MEMORY_LOCAL,
 					  ttm_mem_type - I915_PL_LMEM0);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index fd2ecfdd8fa1..694e9acb69e2 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -54,7 +54,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private *dev_priv)
 
 /*
  * Map the i915 memory regions to TTM memory types. We use the
- * driver-private types for now, reserving TTM_PL_VRAM for stolen
+ * driver-private types for now, reserving I915_PL_STOLEN for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
 int intel_region_to_ttm_type(const struct intel_memory_region *mem)
@@ -63,11 +63,17 @@ int intel_region_to_ttm_type(const struct intel_memory_region *mem)
 
 	GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
 		   mem->type != INTEL_MEMORY_MOCK &&
-		   mem->type != INTEL_MEMORY_SYSTEM);
+		   mem->type != INTEL_MEMORY_SYSTEM &&
+		   mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+		   mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
 	if (mem->type == INTEL_MEMORY_SYSTEM)
 		return TTM_PL_SYSTEM;
 
+	if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+	    mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+		return I915_PL_STOLEN;
+
 	type = mem->instance + TTM_PL_PRIV;
 	GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -91,10 +97,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 	int mem_type = intel_region_to_ttm_type(mem);
 	int ret;
 
-	ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
-				      resource_size(&mem->region),
-				      mem->io_size,
-				      mem->min_page_size, PAGE_SIZE);
+	if (mem_type == I915_PL_STOLEN) {
+		ret = ttm_range_man_init(bdev, mem_type, false,
+					 resource_size(&mem->region) >> PAGE_SHIFT);
+		mem->is_range_manager = true;
+	} else {
+		ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+					      resource_size(&mem->region),
+					      mem->io_size,
+					      mem->min_page_size, PAGE_SIZE);
+	}
 	if (ret)
 		return ret;
 
@@ -114,6 +126,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
 	struct ttm_resource_manager *man = mem->region_private;
+	int mem_type = intel_region_to_ttm_type(mem);
 	int ret = -EBUSY;
 	int count;
 
@@ -144,8 +157,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
 	if (ret || !man)
 		return ret;
 
-	ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
-				      intel_region_to_ttm_type(mem));
+	if (mem_type == I915_PL_STOLEN)
+		ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+	else
+		ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
 	GEM_WARN_ON(ret);
 	mem->region_private = NULL;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v7 05/10] drm/i915: instantiate ttm ranger manager for stolen memory
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Thomas Hellström, kernel, Matthew Auld, linux-kernel

prepare for ttm based stolen region by using ttm range manager
as the resource manager for stolen region.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  6 ++--
 drivers/gpu/drm/i915/intel_region_ttm.c      | 31 +++++++++++++++-----
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 40249fa28a7a..675e9ab30396 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -60,11 +60,13 @@ i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
 	struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
 
 	/* There's some room for optimization here... */
-	GEM_BUG_ON(ttm_mem_type != I915_PL_SYSTEM &&
-		   ttm_mem_type < I915_PL_LMEM0);
+	GEM_BUG_ON(ttm_mem_type == I915_PL_GGTT);
+
 	if (ttm_mem_type == I915_PL_SYSTEM)
 		return intel_memory_region_lookup(i915, INTEL_MEMORY_SYSTEM,
 						  0);
+	if (ttm_mem_type == I915_PL_STOLEN)
+		return i915->mm.stolen_region;
 
 	return intel_memory_region_lookup(i915, INTEL_MEMORY_LOCAL,
 					  ttm_mem_type - I915_PL_LMEM0);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index fd2ecfdd8fa1..694e9acb69e2 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -54,7 +54,7 @@ void intel_region_ttm_device_fini(struct drm_i915_private *dev_priv)
 
 /*
  * Map the i915 memory regions to TTM memory types. We use the
- * driver-private types for now, reserving TTM_PL_VRAM for stolen
+ * driver-private types for now, reserving I915_PL_STOLEN for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
 int intel_region_to_ttm_type(const struct intel_memory_region *mem)
@@ -63,11 +63,17 @@ int intel_region_to_ttm_type(const struct intel_memory_region *mem)
 
 	GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
 		   mem->type != INTEL_MEMORY_MOCK &&
-		   mem->type != INTEL_MEMORY_SYSTEM);
+		   mem->type != INTEL_MEMORY_SYSTEM &&
+		   mem->type != INTEL_MEMORY_STOLEN_SYSTEM &&
+		   mem->type != INTEL_MEMORY_STOLEN_LOCAL);
 
 	if (mem->type == INTEL_MEMORY_SYSTEM)
 		return TTM_PL_SYSTEM;
 
+	if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+	    mem->type == INTEL_MEMORY_STOLEN_LOCAL)
+		return I915_PL_STOLEN;
+
 	type = mem->instance + TTM_PL_PRIV;
 	GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
 
@@ -91,10 +97,16 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 	int mem_type = intel_region_to_ttm_type(mem);
 	int ret;
 
-	ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
-				      resource_size(&mem->region),
-				      mem->io_size,
-				      mem->min_page_size, PAGE_SIZE);
+	if (mem_type == I915_PL_STOLEN) {
+		ret = ttm_range_man_init(bdev, mem_type, false,
+					 resource_size(&mem->region) >> PAGE_SHIFT);
+		mem->is_range_manager = true;
+	} else {
+		ret = i915_ttm_buddy_man_init(bdev, mem_type, false,
+					      resource_size(&mem->region),
+					      mem->io_size,
+					      mem->min_page_size, PAGE_SIZE);
+	}
 	if (ret)
 		return ret;
 
@@ -114,6 +126,7 @@ int intel_region_ttm_init(struct intel_memory_region *mem)
 int intel_region_ttm_fini(struct intel_memory_region *mem)
 {
 	struct ttm_resource_manager *man = mem->region_private;
+	int mem_type = intel_region_to_ttm_type(mem);
 	int ret = -EBUSY;
 	int count;
 
@@ -144,8 +157,10 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
 	if (ret || !man)
 		return ret;
 
-	ret = i915_ttm_buddy_man_fini(&mem->i915->bdev,
-				      intel_region_to_ttm_type(mem));
+	if (mem_type == I915_PL_STOLEN)
+		ret = ttm_range_man_fini(&mem->i915->bdev, mem_type);
+	else
+		ret = i915_ttm_buddy_man_fini(&mem->i915->bdev, mem_type);
 	GEM_WARN_ON(ret);
 	mem->region_private = NULL;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 06/10] drm/i915: sanitize mem_flags for stolen buffers
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-06-20 21:33   ` Robert Beckett
  -1 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: kernel, Robert Beckett, Matthew Auld, Thomas Hellström,
	linux-kernel

Stolen regions are not page backed or considered iomem.
Prevent flags indicating such.
This correctly prevents stolen buffers from attempting to directly map
them.

See i915_gem_object_has_struct_page() and i915_gem_object_has_iomem()
usage for where it would break otherwise.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 675e9ab30396..81c67ca9edda 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
+#include "gem/i915_gem_stolen.h"
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
@@ -124,8 +125,9 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 
 	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 
-	obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
-		I915_BO_FLAG_STRUCT_PAGE;
+	if (!i915_gem_object_is_stolen(obj))
+		obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
+			I915_BO_FLAG_STRUCT_PAGE;
 
 	if (!obj->ttm.cache_level_override) {
 		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 06/10] drm/i915: sanitize mem_flags for stolen buffers
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Robert Beckett, Thomas Hellström, kernel, Matthew Auld,
	linux-kernel

Stolen regions are not page backed or considered iomem.
Prevent flags indicating such.
This correctly prevents stolen buffers from attempting to directly map
them.

See i915_gem_object_has_struct_page() and i915_gem_object_has_iomem()
usage for where it would break otherwise.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 675e9ab30396..81c67ca9edda 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
+#include "gem/i915_gem_stolen.h"
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
@@ -124,8 +125,9 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 
 	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 
-	obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
-		I915_BO_FLAG_STRUCT_PAGE;
+	if (!i915_gem_object_is_stolen(obj))
+		obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
+			I915_BO_FLAG_STRUCT_PAGE;
 
 	if (!obj->ttm.cache_level_override) {
 		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v7 06/10] drm/i915: sanitize mem_flags for stolen buffers
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Thomas Hellström, kernel, Matthew Auld, linux-kernel

Stolen regions are not page backed or considered iomem.
Prevent flags indicating such.
This correctly prevents stolen buffers from attempting to directly map
them.

See i915_gem_object_has_struct_page() and i915_gem_object_has_iomem()
usage for where it would break otherwise.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 675e9ab30396..81c67ca9edda 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
+#include "gem/i915_gem_stolen.h"
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_gt.h"
@@ -124,8 +125,9 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 
 	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 
-	obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
-		I915_BO_FLAG_STRUCT_PAGE;
+	if (!i915_gem_object_is_stolen(obj))
+		obj->mem_flags |= i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
+			I915_BO_FLAG_STRUCT_PAGE;
 
 	if (!obj->ttm.cache_level_override) {
 		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev),
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 07/10] drm/i915: ttm move/clear logic fix
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-06-20 21:33   ` Robert Beckett
  -1 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: kernel, Robert Beckett, Matthew Auld, Thomas Hellström,
	linux-kernel

ttm managed buffers start off with system resource definitions and ttm_tt
tracking structures allocated (though unpopulated).
currently this prevents clearing of buffers on first move to desired
placements.

The desired behaviour is to clear user allocated buffers and any kernel
buffers that specifically requests it only.
Make the logic match the desired behaviour.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 22 +++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 81c67ca9edda..a3f8fc056dbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,7 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "drm/ttm/ttm_tt.h"
 #include <drm/ttm/ttm_bo_driver.h>
 
 #include "i915_deps.h"
@@ -476,6 +477,25 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
 	return fence;
 }
 
+static bool
+allow_clear(struct drm_i915_gem_object *obj, struct ttm_tt *ttm, struct ttm_resource *dst_mem)
+{
+	/* never clear stolen */
+	if (dst_mem->mem_type == I915_PL_STOLEN)
+		return false;
+	/*
+	 * we want to clear user buffers and any kernel buffers
+	 * that specifically request clearing.
+	 */
+	if (obj->flags & I915_BO_ALLOC_USER)
+		return true;
+
+	if (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)
+		return true;
+
+	return false;
+}
+
 /**
  * i915_ttm_move - The TTM move callback used by i915.
  * @bo: The buffer object.
@@ -526,7 +546,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 		return PTR_ERR(dst_rsgt);
 
 	clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
-	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+	if (!clear || allow_clear(obj, ttm, dst_mem)) {
 		struct i915_deps deps;
 
 		i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 07/10] drm/i915: ttm move/clear logic fix
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Robert Beckett, Thomas Hellström, kernel, Matthew Auld,
	linux-kernel

ttm managed buffers start off with system resource definitions and ttm_tt
tracking structures allocated (though unpopulated).
currently this prevents clearing of buffers on first move to desired
placements.

The desired behaviour is to clear user allocated buffers and any kernel
buffers that specifically requests it only.
Make the logic match the desired behaviour.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 22 +++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 81c67ca9edda..a3f8fc056dbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,7 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "drm/ttm/ttm_tt.h"
 #include <drm/ttm/ttm_bo_driver.h>
 
 #include "i915_deps.h"
@@ -476,6 +477,25 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
 	return fence;
 }
 
+static bool
+allow_clear(struct drm_i915_gem_object *obj, struct ttm_tt *ttm, struct ttm_resource *dst_mem)
+{
+	/* never clear stolen */
+	if (dst_mem->mem_type == I915_PL_STOLEN)
+		return false;
+	/*
+	 * we want to clear user buffers and any kernel buffers
+	 * that specifically request clearing.
+	 */
+	if (obj->flags & I915_BO_ALLOC_USER)
+		return true;
+
+	if (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)
+		return true;
+
+	return false;
+}
+
 /**
  * i915_ttm_move - The TTM move callback used by i915.
  * @bo: The buffer object.
@@ -526,7 +546,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 		return PTR_ERR(dst_rsgt);
 
 	clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
-	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+	if (!clear || allow_clear(obj, ttm, dst_mem)) {
 		struct i915_deps deps;
 
 		i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v7 07/10] drm/i915: ttm move/clear logic fix
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Thomas Hellström, kernel, Matthew Auld, linux-kernel

ttm managed buffers start off with system resource definitions and ttm_tt
tracking structures allocated (though unpopulated).
currently this prevents clearing of buffers on first move to desired
placements.

The desired behaviour is to clear user allocated buffers and any kernel
buffers that specifically requests it only.
Make the logic match the desired behaviour.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 22 +++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 81c67ca9edda..a3f8fc056dbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,6 +3,7 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include "drm/ttm/ttm_tt.h"
 #include <drm/ttm/ttm_bo_driver.h>
 
 #include "i915_deps.h"
@@ -476,6 +477,25 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
 	return fence;
 }
 
+static bool
+allow_clear(struct drm_i915_gem_object *obj, struct ttm_tt *ttm, struct ttm_resource *dst_mem)
+{
+	/* never clear stolen */
+	if (dst_mem->mem_type == I915_PL_STOLEN)
+		return false;
+	/*
+	 * we want to clear user buffers and any kernel buffers
+	 * that specifically request clearing.
+	 */
+	if (obj->flags & I915_BO_ALLOC_USER)
+		return true;
+
+	if (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC)
+		return true;
+
+	return false;
+}
+
 /**
  * i915_ttm_move - The TTM move callback used by i915.
  * @bo: The buffer object.
@@ -526,7 +546,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 		return PTR_ERR(dst_rsgt);
 
 	clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
-	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
+	if (!clear || allow_clear(obj, ttm, dst_mem)) {
 		struct i915_deps deps;
 
 		i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 08/10] drm/i915: allow memory region creators to alloc and free the region
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-06-20 21:33   ` Robert Beckett
  -1 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: kernel, Robert Beckett, Matthew Auld, Thomas Hellström,
	linux-kernel

add callbacks for alloc and free.
this allows region creators to allocate any extra storage they may
require.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/intel_memory_region.c | 16 +++++++++++++---
 drivers/gpu/drm/i915/intel_memory_region.h |  2 ++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index e38d2db1c3e3..3da07a712f90 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -231,7 +231,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
 	struct intel_memory_region *mem;
 	int err;
 
-	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+	if (ops->alloc)
+		mem = ops->alloc();
+	else
+		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return ERR_PTR(-ENOMEM);
 
@@ -265,7 +268,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
 	if (mem->ops->release)
 		mem->ops->release(mem);
 err_free:
-	kfree(mem);
+	if (mem->ops->free)
+		mem->ops->free(mem);
+	else
+		kfree(mem);
 	return ERR_PTR(err);
 }
 
@@ -288,7 +294,11 @@ void intel_memory_region_destroy(struct intel_memory_region *mem)
 
 	GEM_WARN_ON(!list_empty_careful(&mem->objects.list));
 	mutex_destroy(&mem->objects.lock);
-	if (!ret)
+	if (ret)
+		return;
+	if (mem->ops->free)
+		mem->ops->free(mem);
+	else
 		kfree(mem);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 3d8378c1b447..048955b5429f 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -61,6 +61,8 @@ struct intel_memory_region_ops {
 			   resource_size_t size,
 			   resource_size_t page_size,
 			   unsigned int flags);
+	struct intel_memory_region *(*alloc)(void);
+	void (*free)(struct intel_memory_region *mem);
 };
 
 struct intel_memory_region {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 08/10] drm/i915: allow memory region creators to alloc and free the region
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Robert Beckett, Thomas Hellström, kernel, Matthew Auld,
	linux-kernel

add callbacks for alloc and free.
this allows region creators to allocate any extra storage they may
require.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/intel_memory_region.c | 16 +++++++++++++---
 drivers/gpu/drm/i915/intel_memory_region.h |  2 ++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index e38d2db1c3e3..3da07a712f90 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -231,7 +231,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
 	struct intel_memory_region *mem;
 	int err;
 
-	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+	if (ops->alloc)
+		mem = ops->alloc();
+	else
+		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return ERR_PTR(-ENOMEM);
 
@@ -265,7 +268,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
 	if (mem->ops->release)
 		mem->ops->release(mem);
 err_free:
-	kfree(mem);
+	if (mem->ops->free)
+		mem->ops->free(mem);
+	else
+		kfree(mem);
 	return ERR_PTR(err);
 }
 
@@ -288,7 +294,11 @@ void intel_memory_region_destroy(struct intel_memory_region *mem)
 
 	GEM_WARN_ON(!list_empty_careful(&mem->objects.list));
 	mutex_destroy(&mem->objects.lock);
-	if (!ret)
+	if (ret)
+		return;
+	if (mem->ops->free)
+		mem->ops->free(mem);
+	else
 		kfree(mem);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 3d8378c1b447..048955b5429f 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -61,6 +61,8 @@ struct intel_memory_region_ops {
 			   resource_size_t size,
 			   resource_size_t page_size,
 			   unsigned int flags);
+	struct intel_memory_region *(*alloc)(void);
+	void (*free)(struct intel_memory_region *mem);
 };
 
 struct intel_memory_region {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v7 08/10] drm/i915: allow memory region creators to alloc and free the region
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Thomas Hellström, kernel, Matthew Auld, linux-kernel

add callbacks for alloc and free.
this allows region creators to allocate any extra storage they may
require.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/intel_memory_region.c | 16 +++++++++++++---
 drivers/gpu/drm/i915/intel_memory_region.h |  2 ++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index e38d2db1c3e3..3da07a712f90 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -231,7 +231,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
 	struct intel_memory_region *mem;
 	int err;
 
-	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+	if (ops->alloc)
+		mem = ops->alloc();
+	else
+		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return ERR_PTR(-ENOMEM);
 
@@ -265,7 +268,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
 	if (mem->ops->release)
 		mem->ops->release(mem);
 err_free:
-	kfree(mem);
+	if (mem->ops->free)
+		mem->ops->free(mem);
+	else
+		kfree(mem);
 	return ERR_PTR(err);
 }
 
@@ -288,7 +294,11 @@ void intel_memory_region_destroy(struct intel_memory_region *mem)
 
 	GEM_WARN_ON(!list_empty_careful(&mem->objects.list));
 	mutex_destroy(&mem->objects.lock);
-	if (!ret)
+	if (ret)
+		return;
+	if (mem->ops->free)
+		mem->ops->free(mem);
+	else
 		kfree(mem);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 3d8378c1b447..048955b5429f 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -61,6 +61,8 @@ struct intel_memory_region_ops {
 			   resource_size_t size,
 			   resource_size_t page_size,
 			   unsigned int flags);
+	struct intel_memory_region *(*alloc)(void);
+	void (*free)(struct intel_memory_region *mem);
 };
 
 struct intel_memory_region {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 09/10] drm/i915/ttm: add buffer pin on alloc flag
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-06-20 21:33   ` Robert Beckett
  -1 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: kernel, Robert Beckett, Matthew Auld, Thomas Hellström,
	linux-kernel

For situations where allocations need to fail on alloc instead of
delayed get_pages, add a new alloc flag to pin the ttm bo.
This makes sure that the resource has been allocated during buffer
creation, allowing it to fail with an error if the placement is
exhausted.
This allows existing fallback options for stolen backend allocation like
create_ring_vma to work as expected.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++++++----
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 25 ++++++++++++++++++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 6632ed52e919..07bc11247a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -325,17 +325,20 @@ struct drm_i915_gem_object {
  * dealing with userspace objects the CPU fault handler is free to ignore this.
  */
 #define I915_BO_ALLOC_GPU_ONLY	  BIT(6)
+/* object should be pinned in destination region from allocation */
+#define I915_BO_ALLOC_PINNED	  BIT(7)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 			     I915_BO_ALLOC_VOLATILE | \
 			     I915_BO_ALLOC_CPU_CLEAR | \
 			     I915_BO_ALLOC_USER | \
 			     I915_BO_ALLOC_PM_VOLATILE | \
 			     I915_BO_ALLOC_PM_EARLY | \
-			     I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY          BIT(7)
-#define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED         BIT(9)
-#define I915_BO_WAS_BOUND_BIT     10
+			     I915_BO_ALLOC_GPU_ONLY | \
+			     I915_BO_ALLOC_PINNED)
+#define I915_BO_READONLY          BIT(8)
+#define I915_TILING_QUIRK_BIT     9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED         BIT(10)
+#define I915_BO_WAS_BOUND_BIT     11
 	/**
 	 * @mem_flags - Mutable placement-related flags
 	 *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27d59639177f..bb988608296d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -998,6 +998,13 @@ static void i915_ttm_delayed_free(struct drm_i915_gem_object *obj)
 {
 	GEM_BUG_ON(!obj->ttm.created);
 
+	/* stolen objects are pinned for lifetime. Unpin before putting */
+	if (obj->flags & I915_BO_ALLOC_PINNED) {
+		ttm_bo_reserve(i915_gem_to_ttm(obj), true, false, NULL);
+		ttm_bo_unpin(i915_gem_to_ttm(obj));
+		ttm_bo_unreserve(i915_gem_to_ttm(obj));
+	}
+
 	ttm_bo_put(i915_gem_to_ttm(obj));
 }
 
@@ -1193,6 +1200,9 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 		.no_wait_gpu = false,
 	};
 	enum ttm_bo_type bo_type;
+	struct ttm_place _place;
+	struct ttm_placement _placement;
+	struct ttm_placement *placement;
 	int ret;
 
 	drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,6 +1232,17 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	 */
 	i915_gem_object_make_unshrinkable(obj);
 
+	if (obj->flags & I915_BO_ALLOC_PINNED) {
+		i915_ttm_place_from_region(mem, &_place, obj->bo_offset,
+					   obj->base.size, obj->flags);
+		_placement.num_placement = 1;
+		_placement.placement = &_place;
+		_placement.num_busy_placement = 0;
+		_placement.busy_placement = NULL;
+		placement = &_placement;
+	} else {
+		placement = &i915_sys_placement;
+	}
 	/*
 	 * If this function fails, it will call the destructor, but
 	 * our caller still owns the object. So no freeing in the
@@ -1230,7 +1251,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	 * until successful initialization.
 	 */
 	ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
-				   bo_type, &i915_sys_placement,
+				   bo_type, placement,
 				   page_size >> PAGE_SHIFT,
 				   &ctx, NULL, NULL, i915_ttm_bo_destroy);
 	if (ret)
@@ -1242,6 +1263,8 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_ttm_adjust_domains_after_move(obj);
 	i915_ttm_adjust_gem_after_move(obj);
 	obj->ttm.cache_level_override = false;
+	if (obj->flags & I915_BO_ALLOC_PINNED)
+		ttm_bo_pin(i915_gem_to_ttm(obj));
 	i915_gem_object_unlock(obj);
 
 	return 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 09/10] drm/i915/ttm: add buffer pin on alloc flag
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Robert Beckett, Thomas Hellström, kernel, Matthew Auld,
	linux-kernel

For situations where allocations need to fail on alloc instead of
delayed get_pages, add a new alloc flag to pin the ttm bo.
This makes sure that the resource has been allocated during buffer
creation, allowing it to fail with an error if the placement is
exhausted.
This allows existing fallback options for stolen backend allocation like
create_ring_vma to work as expected.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++++++----
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 25 ++++++++++++++++++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 6632ed52e919..07bc11247a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -325,17 +325,20 @@ struct drm_i915_gem_object {
  * dealing with userspace objects the CPU fault handler is free to ignore this.
  */
 #define I915_BO_ALLOC_GPU_ONLY	  BIT(6)
+/* object should be pinned in destination region from allocation */
+#define I915_BO_ALLOC_PINNED	  BIT(7)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 			     I915_BO_ALLOC_VOLATILE | \
 			     I915_BO_ALLOC_CPU_CLEAR | \
 			     I915_BO_ALLOC_USER | \
 			     I915_BO_ALLOC_PM_VOLATILE | \
 			     I915_BO_ALLOC_PM_EARLY | \
-			     I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY          BIT(7)
-#define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED         BIT(9)
-#define I915_BO_WAS_BOUND_BIT     10
+			     I915_BO_ALLOC_GPU_ONLY | \
+			     I915_BO_ALLOC_PINNED)
+#define I915_BO_READONLY          BIT(8)
+#define I915_TILING_QUIRK_BIT     9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED         BIT(10)
+#define I915_BO_WAS_BOUND_BIT     11
 	/**
 	 * @mem_flags - Mutable placement-related flags
 	 *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27d59639177f..bb988608296d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -998,6 +998,13 @@ static void i915_ttm_delayed_free(struct drm_i915_gem_object *obj)
 {
 	GEM_BUG_ON(!obj->ttm.created);
 
+	/* stolen objects are pinned for lifetime. Unpin before putting */
+	if (obj->flags & I915_BO_ALLOC_PINNED) {
+		ttm_bo_reserve(i915_gem_to_ttm(obj), true, false, NULL);
+		ttm_bo_unpin(i915_gem_to_ttm(obj));
+		ttm_bo_unreserve(i915_gem_to_ttm(obj));
+	}
+
 	ttm_bo_put(i915_gem_to_ttm(obj));
 }
 
@@ -1193,6 +1200,9 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 		.no_wait_gpu = false,
 	};
 	enum ttm_bo_type bo_type;
+	struct ttm_place _place;
+	struct ttm_placement _placement;
+	struct ttm_placement *placement;
 	int ret;
 
 	drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,6 +1232,17 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	 */
 	i915_gem_object_make_unshrinkable(obj);
 
+	if (obj->flags & I915_BO_ALLOC_PINNED) {
+		i915_ttm_place_from_region(mem, &_place, obj->bo_offset,
+					   obj->base.size, obj->flags);
+		_placement.num_placement = 1;
+		_placement.placement = &_place;
+		_placement.num_busy_placement = 0;
+		_placement.busy_placement = NULL;
+		placement = &_placement;
+	} else {
+		placement = &i915_sys_placement;
+	}
 	/*
 	 * If this function fails, it will call the destructor, but
 	 * our caller still owns the object. So no freeing in the
@@ -1230,7 +1251,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	 * until successful initialization.
 	 */
 	ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
-				   bo_type, &i915_sys_placement,
+				   bo_type, placement,
 				   page_size >> PAGE_SHIFT,
 				   &ctx, NULL, NULL, i915_ttm_bo_destroy);
 	if (ret)
@@ -1242,6 +1263,8 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_ttm_adjust_domains_after_move(obj);
 	i915_ttm_adjust_gem_after_move(obj);
 	obj->ttm.cache_level_override = false;
+	if (obj->flags & I915_BO_ALLOC_PINNED)
+		ttm_bo_pin(i915_gem_to_ttm(obj));
 	i915_gem_object_unlock(obj);
 
 	return 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v7 09/10] drm/i915/ttm: add buffer pin on alloc flag
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Thomas Hellström, kernel, Matthew Auld, linux-kernel

For situations where allocations need to fail on alloc instead of
delayed get_pages, add a new alloc flag to pin the ttm bo.
This makes sure that the resource has been allocated during buffer
creation, allowing it to fail with an error if the placement is
exhausted.
This allows existing fallback options for stolen backend allocation like
create_ring_vma to work as expected.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++++++----
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 25 ++++++++++++++++++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 6632ed52e919..07bc11247a3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -325,17 +325,20 @@ struct drm_i915_gem_object {
  * dealing with userspace objects the CPU fault handler is free to ignore this.
  */
 #define I915_BO_ALLOC_GPU_ONLY	  BIT(6)
+/* object should be pinned in destination region from allocation */
+#define I915_BO_ALLOC_PINNED	  BIT(7)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 			     I915_BO_ALLOC_VOLATILE | \
 			     I915_BO_ALLOC_CPU_CLEAR | \
 			     I915_BO_ALLOC_USER | \
 			     I915_BO_ALLOC_PM_VOLATILE | \
 			     I915_BO_ALLOC_PM_EARLY | \
-			     I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY          BIT(7)
-#define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED         BIT(9)
-#define I915_BO_WAS_BOUND_BIT     10
+			     I915_BO_ALLOC_GPU_ONLY | \
+			     I915_BO_ALLOC_PINNED)
+#define I915_BO_READONLY          BIT(8)
+#define I915_TILING_QUIRK_BIT     9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED         BIT(10)
+#define I915_BO_WAS_BOUND_BIT     11
 	/**
 	 * @mem_flags - Mutable placement-related flags
 	 *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27d59639177f..bb988608296d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -998,6 +998,13 @@ static void i915_ttm_delayed_free(struct drm_i915_gem_object *obj)
 {
 	GEM_BUG_ON(!obj->ttm.created);
 
+	/* stolen objects are pinned for lifetime. Unpin before putting */
+	if (obj->flags & I915_BO_ALLOC_PINNED) {
+		ttm_bo_reserve(i915_gem_to_ttm(obj), true, false, NULL);
+		ttm_bo_unpin(i915_gem_to_ttm(obj));
+		ttm_bo_unreserve(i915_gem_to_ttm(obj));
+	}
+
 	ttm_bo_put(i915_gem_to_ttm(obj));
 }
 
@@ -1193,6 +1200,9 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 		.no_wait_gpu = false,
 	};
 	enum ttm_bo_type bo_type;
+	struct ttm_place _place;
+	struct ttm_placement _placement;
+	struct ttm_placement *placement;
 	int ret;
 
 	drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,6 +1232,17 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	 */
 	i915_gem_object_make_unshrinkable(obj);
 
+	if (obj->flags & I915_BO_ALLOC_PINNED) {
+		i915_ttm_place_from_region(mem, &_place, obj->bo_offset,
+					   obj->base.size, obj->flags);
+		_placement.num_placement = 1;
+		_placement.placement = &_place;
+		_placement.num_busy_placement = 0;
+		_placement.busy_placement = NULL;
+		placement = &_placement;
+	} else {
+		placement = &i915_sys_placement;
+	}
 	/*
 	 * If this function fails, it will call the destructor, but
 	 * our caller still owns the object. So no freeing in the
@@ -1230,7 +1251,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	 * until successful initialization.
 	 */
 	ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
-				   bo_type, &i915_sys_placement,
+				   bo_type, placement,
 				   page_size >> PAGE_SHIFT,
 				   &ctx, NULL, NULL, i915_ttm_bo_destroy);
 	if (ret)
@@ -1242,6 +1263,8 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_ttm_adjust_domains_after_move(obj);
 	i915_ttm_adjust_gem_after_move(obj);
 	obj->ttm.cache_level_override = false;
+	if (obj->flags & I915_BO_ALLOC_PINNED)
+		ttm_bo_pin(i915_gem_to_ttm(obj));
 	i915_gem_object_unlock(obj);
 
 	return 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 10/10] drm/i915: stolen memory use ttm backend
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-06-20 21:33   ` Robert Beckett
  -1 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: kernel, Robert Beckett, Matthew Auld, Thomas Hellström,
	linux-kernel

refactor stolen memory region to use ttm.
this necessitates using ttm resources to track reserved stolen regions
instead of drm_mm_nodes.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/display/intel_fbc.c      |  78 ++--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    | 440 +++++++-----------
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h    |  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   7 +
 drivers/gpu/drm/i915/gt/intel_rc6.c           |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |   7 +-
 drivers/gpu/drm/i915/i915_drv.h               |   5 -
 drivers/gpu/drm/i915/intel_region_ttm.c       |  42 +-
 drivers/gpu/drm/i915/intel_region_ttm.h       |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 13 files changed, 303 insertions(+), 342 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
index 8b807284cde1..6f3afac5e8c9 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -38,6 +38,7 @@
  * forcibly disable it to allow proper screen updates.
  */
 
+#include "gem/i915_gem_stolen.h"
 #include <linux/string_helpers.h>
 
 #include <drm/drm_fourcc.h>
@@ -51,6 +52,7 @@
 #include "intel_display_types.h"
 #include "intel_fbc.h"
 #include "intel_frontbuffer.h"
+#include "gem/i915_gem_region.h"
 
 #define for_each_fbc_id(__dev_priv, __fbc_id) \
 	for ((__fbc_id) = INTEL_FBC_A; (__fbc_id) < I915_MAX_FBCS; (__fbc_id)++) \
@@ -92,8 +94,8 @@ struct intel_fbc {
 	struct mutex lock;
 	unsigned int busy_bits;
 
-	struct drm_mm_node compressed_fb;
-	struct drm_mm_node compressed_llb;
+	struct ttm_resource *compressed_fb;
+	struct ttm_resource *compressed_llb;
 
 	enum intel_fbc_id id;
 
@@ -331,16 +333,20 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
 	struct drm_i915_private *i915 = fbc->i915;
+	u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
+	u64 llb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_llb);
 
+	GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+	GEM_BUG_ON(llb_offset == I915_BO_INVALID_OFFSET);
 	GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-					 fbc->compressed_fb.start, U32_MAX));
+					 fb_offset, U32_MAX));
 	GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-					 fbc->compressed_llb.start, U32_MAX));
+					 llb_offset, U32_MAX));
 
 	intel_de_write(i915, FBC_CFB_BASE,
-		       i915->dsm.start + fbc->compressed_fb.start);
+		       i915->dsm.start + fb_offset);
 	intel_de_write(i915, FBC_LL_BASE,
-		       i915->dsm.start + fbc->compressed_llb.start);
+		       i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -448,8 +454,10 @@ static bool g4x_fbc_is_compressing(struct intel_fbc *fbc)
 static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
 	struct drm_i915_private *i915 = fbc->i915;
+	u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-	intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+	GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+	intel_de_write(i915, DPFC_CB_BASE, fb_offset);
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -499,8 +507,10 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
 static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
 	struct drm_i915_private *i915 = fbc->i915;
+	u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-	intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fbc->compressed_fb.start);
+	GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+	intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fb_offset);
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -744,21 +754,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
 	struct drm_i915_private *i915 = fbc->i915;
 	u64 end = intel_fbc_stolen_end(i915);
-	int ret, limit = min_limit;
+	int limit = min_limit;
+	struct ttm_resource *res;
 
 	size /= limit;
 
 	/* Try to over-allocate to reduce reallocations and fragmentation. */
-	ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-						   size <<= 1, 4096, 0, end);
-	if (ret == 0)
+	res = i915_gem_stolen_reserve_range(i915, size <<= 1, 0, end);
+	if (!IS_ERR(res)) {
+		fbc->compressed_fb = res;
 		return limit;
+	}
 
 	for (; limit <= intel_fbc_max_limit(i915); limit <<= 1) {
-		ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-							   size >>= 1, 4096, 0, end);
-		if (ret == 0)
+		res = i915_gem_stolen_reserve_range(i915, size >>= 1, 0, end);
+		if (!IS_ERR(res)) {
+			fbc->compressed_fb = res;
 			return limit;
+		}
 	}
 
 	return 0;
@@ -769,17 +782,18 @@ static int intel_fbc_alloc_cfb(struct intel_fbc *fbc,
 {
 	struct drm_i915_private *i915 = fbc->i915;
 	int ret;
+	struct ttm_resource *res;
 
-	drm_WARN_ON(&i915->drm,
-		    drm_mm_node_allocated(&fbc->compressed_fb));
-	drm_WARN_ON(&i915->drm,
-		    drm_mm_node_allocated(&fbc->compressed_llb));
+	drm_WARN_ON(&i915->drm, fbc->compressed_fb);
+	drm_WARN_ON(&i915->drm, fbc->compressed_llb);
 
 	if (DISPLAY_VER(i915) < 5 && !IS_G4X(i915)) {
-		ret = i915_gem_stolen_insert_node(i915, &fbc->compressed_llb,
-						  4096, 4096);
-		if (ret)
+		res = i915_gem_stolen_reserve_range(i915, 4096, I915_GEM_STOLEN_BIAS, 0);
+		if (IS_ERR(res)) {
+			ret = PTR_ERR(res);
 			goto err;
+		}
+		fbc->compressed_llb = res;
 	}
 
 	ret = find_compression_limit(fbc, size, min_limit);
@@ -793,15 +807,14 @@ static int intel_fbc_alloc_cfb(struct intel_fbc *fbc,
 
 	drm_dbg_kms(&i915->drm,
 		    "reserved %llu bytes of contiguous stolen space for FBC, limit: %d\n",
-		    fbc->compressed_fb.size, fbc->limit);
+		    i915_gem_stolen_reserve_size(fbc->compressed_fb), fbc->limit);
 
 	return 0;
 
 err_llb:
-	if (drm_mm_node_allocated(&fbc->compressed_llb))
-		i915_gem_stolen_remove_node(i915, &fbc->compressed_llb);
+	i915_gem_stolen_release_range(i915, fetch_and_zero(&fbc->compressed_llb));
 err:
-	if (drm_mm_initialized(&i915->mm.stolen))
+	if (IS_ERR(res) && (PTR_ERR(res) == -ENOMEM || PTR_ERR(res) == -ENXIO))
 		drm_info_once(&i915->drm, "not enough stolen space for compressed buffer (need %d more bytes), disabling. Hint: you may be able to increase stolen memory size in the BIOS to avoid this.\n", size);
 	return -ENOSPC;
 }
@@ -826,10 +839,10 @@ static void __intel_fbc_cleanup_cfb(struct intel_fbc *fbc)
 	if (WARN_ON(intel_fbc_hw_is_active(fbc)))
 		return;
 
-	if (drm_mm_node_allocated(&fbc->compressed_llb))
-		i915_gem_stolen_remove_node(i915, &fbc->compressed_llb);
-	if (drm_mm_node_allocated(&fbc->compressed_fb))
-		i915_gem_stolen_remove_node(i915, &fbc->compressed_fb);
+	if (fbc->compressed_llb)
+		i915_gem_stolen_release_range(i915, fetch_and_zero(&fbc->compressed_llb));
+	if (fbc->compressed_fb)
+		i915_gem_stolen_release_range(i915, fetch_and_zero(&fbc->compressed_fb));
 }
 
 void intel_fbc_cleanup(struct drm_i915_private *i915)
@@ -1034,9 +1047,10 @@ static bool intel_fbc_is_cfb_ok(const struct intel_plane_state *plane_state)
 {
 	struct intel_plane *plane = to_intel_plane(plane_state->uapi.plane);
 	struct intel_fbc *fbc = plane->fbc;
+	u64 fb_size = i915_gem_stolen_reserve_size(fbc->compressed_fb);
 
 	return intel_fbc_min_limit(plane_state) <= fbc->limit &&
-		intel_fbc_cfb_size(plane_state) <= fbc->compressed_fb.size * fbc->limit;
+		intel_fbc_cfb_size(plane_state) <= fb_size * fbc->limit;
 }
 
 static bool intel_fbc_is_ok(const struct intel_plane_state *plane_state)
@@ -1702,7 +1716,7 @@ void intel_fbc_init(struct drm_i915_private *i915)
 {
 	enum intel_fbc_id fbc_id;
 
-	if (!drm_mm_initialized(&i915->mm.stolen))
+	if (!i915->mm.stolen_region)
 		mkwrite_device_info(i915)->display.fbc_mask = 0;
 
 	if (need_fbc_vtd_wa(i915))
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 07bc11247a3e..13466cca7af8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -633,8 +633,6 @@ struct drm_i915_gem_object {
 		} userptr;
 #endif
 
-		struct drm_mm_node *stolen;
-
 		resource_size_t bo_offset;
 
 		unsigned long scratch;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 47b5e0e342ab..e7fdccf628d8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -4,75 +4,111 @@
  * Copyright © 2008-2012 Intel Corporation
  */
 
+#include "drm/ttm/ttm_placement.h"
+#include "gem/i915_gem_object_types.h"
 #include <linux/errno.h>
 #include <linux/mutex.h>
 
-#include <drm/drm_mm.h>
 #include <drm/i915_drm.h>
 
 #include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_region.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_region_lmem.h"
+#include "gem/i915_gem_ttm.h"
 #include "i915_drv.h"
 #include "i915_gem_stolen.h"
 #include "i915_reg.h"
 #include "i915_utils.h"
 #include "i915_vgpu.h"
 #include "intel_mchbar_regs.h"
+#include "intel_region_ttm.h"
 
-/*
- * The BIOS typically reserves some of the system's memory for the exclusive
- * use of the integrated graphics. This memory is no longer available for
- * use by the OS and so the user finds that his system has less memory
- * available than he put in. We refer to this memory as stolen.
- *
- * The BIOS will allocate its framebuffer from the stolen memory. Our
- * goal is try to reuse that object for our own fbcon which must always
- * be available for panics. Anything else we can reuse the stolen memory
- * for is a boon.
- */
+struct stolen_region {
+	struct intel_memory_region region;
+	struct ttm_resource *wa_resvd;
+};
 
-int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *i915,
-					 struct drm_mm_node *node, u64 size,
-					 unsigned alignment, u64 start, u64 end)
+inline struct stolen_region *to_stolen_region(struct intel_memory_region *mem)
 {
-	int ret;
-
-	if (!drm_mm_initialized(&i915->mm.stolen))
-		return -ENODEV;
+	return container_of(mem, struct stolen_region, region);
+}
 
-	/* WaSkipStolenMemoryFirstPage:bdw+ */
-	if (GRAPHICS_VER(i915) >= 8 && start < 4096)
-		start = 4096;
+/**
+ * i915_gem_stolen_reserve_range - reserve a region of space in a given range
+ * @i915: i915 device instance
+ * @size: size of region to reserve
+ * @start: start of search area
+ * @end: end of search area
+ *
+ * Search for @size amount of free space within the region delimeted by @start and @end.
+ * If found reserve it from future use until later release with @i915_gem_stolen_release_range.
+ *
+ * Return: pointer to resource tracking structure on success, ERR_PTR otherwise
+ */
+struct ttm_resource *
+i915_gem_stolen_reserve_range(struct drm_i915_private *i915,
+			      resource_size_t size,
+			      u64 start, u64 end)
+{
+	struct intel_memory_region *mem = i915->mm.stolen_region;
 
-	mutex_lock(&i915->mm.stolen_lock);
-	ret = drm_mm_insert_node_in_range(&i915->mm.stolen, node,
-					  size, alignment, 0,
-					  start, end, DRM_MM_INSERT_BEST);
-	mutex_unlock(&i915->mm.stolen_lock);
+	if (!mem)
+		return ERR_PTR(-ENODEV);
+	return intel_region_ttm_resource_alloc(mem, size, start, end, I915_BO_ALLOC_CONTIGUOUS);
+}
 
-	return ret;
+/**
+ * i915_gem_stolen_reserve_offset - return the offset of the reserved space
+ * @res: pointer to resource tracking structure to check
+ *
+ * Return: The offset of the reserved resource, or I915_BO_INVALID_OFFSET on error
+ */
+u64 i915_gem_stolen_reserve_offset(struct ttm_resource *res)
+{
+	if (!res)
+		return I915_BO_INVALID_OFFSET;
+	return PFN_PHYS(res->start);
 }
 
-int i915_gem_stolen_insert_node(struct drm_i915_private *i915,
-				struct drm_mm_node *node, u64 size,
-				unsigned alignment)
+/**
+ * i915_gem_stolen_reserve_size - return the reserved size of the reserved space
+ * @res: pointer to resource tracking structure to check
+ *
+ * Return: The size of the reserved resource, or I915_BO_INVALID_OFFSET on error
+ */
+u64 i915_gem_stolen_reserve_size(struct ttm_resource *res)
 {
-	return i915_gem_stolen_insert_node_in_range(i915, node,
-						    size, alignment,
-						    I915_GEM_STOLEN_BIAS,
-						    U64_MAX);
+	if (!res)
+		return I915_BO_INVALID_OFFSET;
+	return PFN_PHYS(res->num_pages);
 }
 
-void i915_gem_stolen_remove_node(struct drm_i915_private *i915,
-				 struct drm_mm_node *node)
+/**
+ * i915_gem_stolen_release_range - release the reserved area to be free for allocation again
+ * @i915: i915 device instance
+ * @res: pointer to resource tracking structure allocated via @i915_gem_stolen_reserve_range
+ */
+void i915_gem_stolen_release_range(struct drm_i915_private *i915,
+				   struct ttm_resource *res)
 {
-	mutex_lock(&i915->mm.stolen_lock);
-	drm_mm_remove_node(node);
-	mutex_unlock(&i915->mm.stolen_lock);
+	struct intel_memory_region *mem = i915->mm.stolen_region;
+
+	intel_region_ttm_resource_free(mem, res);
 }
 
+/*
+ * The BIOS typically reserves some of the system's memory for the exclusive
+ * use of the integrated graphics. This memory is no longer available for
+ * use by the OS and so the user finds that his system has less memory
+ * available than he put in. We refer to this memory as stolen.
+ *
+ * The BIOS will allocate its framebuffer from the stolen memory. Our
+ * goal is try to reuse that object for our own fbcon which must always
+ * be available for panics. Anything else we can reuse the stolen memory
+ * for is a boon.
+ */
+
 static int i915_adjust_stolen(struct drm_i915_private *i915,
 			      struct resource *dsm)
 {
@@ -173,14 +209,6 @@ static int i915_adjust_stolen(struct drm_i915_private *i915,
 	return 0;
 }
 
-static void i915_gem_cleanup_stolen(struct drm_i915_private *i915)
-{
-	if (!drm_mm_initialized(&i915->mm.stolen))
-		return;
-
-	drm_mm_takedown(&i915->mm.stolen);
-}
-
 static void g4x_get_stolen_reserved(struct drm_i915_private *i915,
 				    struct intel_uncore *uncore,
 				    resource_size_t *base,
@@ -394,8 +422,7 @@ static int i915_gem_init_stolen(struct intel_memory_region *mem)
 	struct intel_uncore *uncore = &i915->uncore;
 	resource_size_t reserved_base, stolen_top;
 	resource_size_t reserved_total, reserved_size;
-
-	mutex_init(&i915->mm.stolen_lock);
+	int ret;
 
 	if (intel_vgpu_active(i915)) {
 		drm_notice(&i915->drm,
@@ -513,258 +540,100 @@ static int i915_gem_init_stolen(struct intel_memory_region *mem)
 		return 0;
 
 	/* Basic memrange allocator for stolen space. */
-	drm_mm_init(&i915->mm.stolen, 0, i915->stolen_usable_size);
-
-	return 0;
-}
-
-static void dbg_poison(struct i915_ggtt *ggtt,
-		       dma_addr_t addr, resource_size_t size,
-		       u8 x)
-{
-#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
-	if (!drm_mm_node_allocated(&ggtt->error_capture))
-		return;
-
-	if (ggtt->vm.bind_async_flags & I915_VMA_GLOBAL_BIND)
-		return; /* beware stop_machine() inversion */
-
-	GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
-
-	mutex_lock(&ggtt->error_mutex);
-	while (size) {
-		void __iomem *s;
-
-		ggtt->vm.insert_page(&ggtt->vm, addr,
-				     ggtt->error_capture.start,
-				     I915_CACHE_NONE, 0);
-		mb();
-
-		s = io_mapping_map_wc(&ggtt->iomap,
-				      ggtt->error_capture.start,
-				      PAGE_SIZE);
-		memset_io(s, x, PAGE_SIZE);
-		io_mapping_unmap(s);
-
-		addr += PAGE_SIZE;
-		size -= PAGE_SIZE;
-	}
-	mb();
-	ggtt->vm.clear_range(&ggtt->vm, ggtt->error_capture.start, PAGE_SIZE);
-	mutex_unlock(&ggtt->error_mutex);
-#endif
-}
-
-static struct sg_table *
-i915_pages_create_for_stolen(struct drm_device *dev,
-			     resource_size_t offset, resource_size_t size)
-{
-	struct drm_i915_private *i915 = to_i915(dev);
-	struct sg_table *st;
-	struct scatterlist *sg;
-
-	GEM_BUG_ON(range_overflows(offset, size, resource_size(&i915->dsm)));
-
-	/* We hide that we have no struct page backing our stolen object
-	 * by wrapping the contiguous physical allocation with a fake
-	 * dma mapping in a single scatterlist.
-	 */
+	ret = intel_region_ttm_init(mem);
+	if (ret)
+		return ret;
 
-	st = kmalloc(sizeof(*st), GFP_KERNEL);
-	if (st == NULL)
-		return ERR_PTR(-ENOMEM);
+	/* WaSkipStolenMemoryFirstPage:bdw+ */
+	if (GRAPHICS_VER(i915) >= 8) {
+		struct stolen_region *region = to_stolen_region(mem);
+		struct ttm_resource *resvd;
+
+		resvd = intel_region_ttm_resource_alloc(mem, 4096, 0, 4096, 0);
+		if (IS_ERR(resvd)) {
+			ret = PTR_ERR(resvd);
+			goto err_no_reserve;
+		}
 
-	if (sg_alloc_table(st, 1, GFP_KERNEL)) {
-		kfree(st);
-		return ERR_PTR(-ENOMEM);
+		region->wa_resvd = resvd;
 	}
 
-	sg = st->sgl;
-	sg->offset = 0;
-	sg->length = size;
+	return ret;
 
-	sg_dma_address(sg) = (dma_addr_t)i915->dsm.start + offset;
-	sg_dma_len(sg) = size;
+err_no_reserve:
+	intel_region_ttm_fini(mem);
 
-	return st;
+	return ret;
 }
 
-static int i915_gem_object_get_pages_stolen(struct drm_i915_gem_object *obj)
+struct drm_i915_gem_object *
+i915_gem_object_create_stolen(struct drm_i915_private *i915,
+			      resource_size_t size)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	struct sg_table *pages =
-		i915_pages_create_for_stolen(obj->base.dev,
-					     obj->stolen->start,
-					     obj->stolen->size);
-	if (IS_ERR(pages))
-		return PTR_ERR(pages);
-
-	dbg_poison(to_gt(i915)->ggtt,
-		   sg_dma_address(pages->sgl),
-		   sg_dma_len(pages->sgl),
-		   POISON_INUSE);
-
-	__i915_gem_object_set_pages(obj, pages, obj->stolen->size);
-
-	return 0;
+	return i915_gem_object_create_region(i915->mm.stolen_region, size, 0,
+					     I915_BO_ALLOC_CONTIGUOUS | I915_BO_ALLOC_PINNED);
 }
 
-static void i915_gem_object_put_pages_stolen(struct drm_i915_gem_object *obj,
-					     struct sg_table *pages)
+static struct intel_memory_region *alloc_stolen_region(void)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	/* Should only be called from i915_gem_object_release_stolen() */
+	struct stolen_region *region = kzalloc(sizeof(*region), GFP_KERNEL);
 
-	dbg_poison(to_gt(i915)->ggtt,
-		   sg_dma_address(pages->sgl),
-		   sg_dma_len(pages->sgl),
-		   POISON_FREE);
+	if (!region)
+		return NULL;
 
-	sg_free_table(pages);
-	kfree(pages);
+	return &region->region;
 }
 
-static void
-i915_gem_object_release_stolen(struct drm_i915_gem_object *obj)
+static void free_stolen_region(struct intel_memory_region *mem)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	struct drm_mm_node *stolen = fetch_and_zero(&obj->stolen);
-
-	GEM_BUG_ON(!stolen);
-	i915_gem_stolen_remove_node(i915, stolen);
-	kfree(stolen);
+	struct stolen_region *region = to_stolen_region(mem);
 
-	i915_gem_object_release_memory_region(obj);
+	kfree(region);
 }
 
-static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
-	.name = "i915_gem_object_stolen",
-	.get_pages = i915_gem_object_get_pages_stolen,
-	.put_pages = i915_gem_object_put_pages_stolen,
-	.release = i915_gem_object_release_stolen,
-};
-
-static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
-					   struct drm_i915_gem_object *obj,
-					   struct drm_mm_node *stolen)
+static int init_stolen_smem(struct intel_memory_region *mem)
 {
-	static struct lock_class_key lock_class;
-	unsigned int cache_level;
-	unsigned int flags;
-	int err;
-
 	/*
-	 * Stolen objects are always physically contiguous since we just
-	 * allocate one big block underneath using the drm_mm range allocator.
+	 * Initialise stolen early so that we may reserve preallocated
+	 * objects for the BIOS to KMS transition.
 	 */
-	flags = I915_BO_ALLOC_CONTIGUOUS;
-
-	drm_gem_private_object_init(&mem->i915->drm, &obj->base, stolen->size);
-	i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, flags);
-
-	obj->stolen = stolen;
-	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
-	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
-	i915_gem_object_set_cache_coherency(obj, cache_level);
-
-	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
-		return -EBUSY;
+	return i915_gem_init_stolen(mem);
+}
 
-	i915_gem_object_init_memory_region(obj, mem);
+static int release_stolen(struct intel_memory_region *mem)
+{
+	struct stolen_region *region = to_stolen_region(mem);
 
-	err = i915_gem_object_pin_pages(obj);
-	if (err)
-		i915_gem_object_release_memory_region(obj);
-	i915_gem_object_unlock(obj);
+	if (region->wa_resvd)
+		intel_region_ttm_resource_free(mem, region->wa_resvd);
 
-	return err;
+	return intel_region_ttm_fini(mem);
 }
 
-static int _i915_gem_object_stolen_init(struct intel_memory_region *mem,
-					struct drm_i915_gem_object *obj,
-					resource_size_t offset,
-					resource_size_t size,
-					resource_size_t page_size,
-					unsigned int flags)
+static int stolen_object_init(struct intel_memory_region *mem,
+			      struct drm_i915_gem_object *obj,
+			      resource_size_t offset,
+			      resource_size_t size,
+			      resource_size_t page_size,
+			      unsigned int flags)
 {
-	struct drm_i915_private *i915 = mem->i915;
-	struct drm_mm_node *stolen;
-	int ret;
-
-	if (!drm_mm_initialized(&i915->mm.stolen))
+	if (!mem->region_private)
 		return -ENODEV;
 
 	if (size == 0)
 		return -EINVAL;
 
-	/*
-	 * With discrete devices, where we lack a mappable aperture there is no
-	 * possible way to ever access this memory on the CPU side.
-	 */
-	if (mem->type == INTEL_MEMORY_STOLEN_LOCAL && !mem->io_size &&
-	    !(flags & I915_BO_ALLOC_GPU_ONLY))
-		return -ENOSPC;
-
-	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
-	if (!stolen)
-		return -ENOMEM;
-
-	if (offset != I915_BO_INVALID_OFFSET) {
-		drm_dbg(&i915->drm,
-			"creating preallocated stolen object: stolen_offset=%pa, size=%pa\n",
-			&offset, &size);
-
-		stolen->start = offset;
-		stolen->size = size;
-		mutex_lock(&i915->mm.stolen_lock);
-		ret = drm_mm_reserve_node(&i915->mm.stolen, stolen);
-		mutex_unlock(&i915->mm.stolen_lock);
-	} else {
-		ret = i915_gem_stolen_insert_node(i915, stolen, size,
-						  mem->min_page_size);
-	}
-	if (ret)
-		goto err_free;
-
-	ret = __i915_gem_object_create_stolen(mem, obj, stolen);
-	if (ret)
-		goto err_remove;
-
-	return 0;
-
-err_remove:
-	i915_gem_stolen_remove_node(i915, stolen);
-err_free:
-	kfree(stolen);
-	return ret;
-}
-
-struct drm_i915_gem_object *
-i915_gem_object_create_stolen(struct drm_i915_private *i915,
-			      resource_size_t size)
-{
-	return i915_gem_object_create_region(i915->mm.stolen_region, size, 0, 0);
-}
-
-static int init_stolen_smem(struct intel_memory_region *mem)
-{
-	/*
-	 * Initialise stolen early so that we may reserve preallocated
-	 * objects for the BIOS to KMS transition.
-	 */
-	return i915_gem_init_stolen(mem);
-}
-
-static int release_stolen_smem(struct intel_memory_region *mem)
-{
-	i915_gem_cleanup_stolen(mem->i915);
-	return 0;
+	/* default range manager relies on page_size for alignment */
+	page_size = max(page_size, mem->min_page_size);
+	return __i915_gem_ttm_object_init(mem, obj, offset, size, page_size, flags);
 }
 
 static const struct intel_memory_region_ops i915_region_stolen_smem_ops = {
 	.init = init_stolen_smem,
-	.release = release_stolen_smem,
-	.init_object = _i915_gem_object_stolen_init,
+	.release = release_stolen,
+	.init_object = stolen_object_init,
+	.alloc = alloc_stolen_region,
+	.free = free_stolen_region,
 };
 
 static int init_stolen_lmem(struct intel_memory_region *mem)
@@ -793,7 +662,7 @@ static int init_stolen_lmem(struct intel_memory_region *mem)
 	return 0;
 
 err_cleanup:
-	i915_gem_cleanup_stolen(mem->i915);
+	intel_region_ttm_fini(mem);
 	return err;
 }
 
@@ -801,14 +670,15 @@ static int release_stolen_lmem(struct intel_memory_region *mem)
 {
 	if (mem->io_size)
 		io_mapping_fini(&mem->iomap);
-	i915_gem_cleanup_stolen(mem->i915);
-	return 0;
+	return release_stolen(mem);
 }
 
 static const struct intel_memory_region_ops i915_region_stolen_lmem_ops = {
 	.init = init_stolen_lmem,
 	.release = release_stolen_lmem,
-	.init_object = _i915_gem_object_stolen_init,
+	.init_object = stolen_object_init,
+	.alloc = alloc_stolen_region,
+	.free = free_stolen_region,
 };
 
 struct intel_memory_region *
@@ -896,7 +766,37 @@ i915_gem_stolen_smem_setup(struct drm_i915_private *i915, u16 type,
 	return mem;
 }
 
-bool i915_gem_object_is_stolen(const struct drm_i915_gem_object *obj)
+/**
+ * i915_gem_object_stolen_offset - return offset from start of stolen region
+ * @obj: the object to return the offset of
+ *
+ * Get the offset from stolen region if this object is currently placed in stolen memory.
+ *
+ * Return: offset from stolen if successful, I915_BO_INVALID_OFFSET otherwise
+ */
+u64 i915_gem_object_stolen_offset(struct drm_i915_gem_object *obj)
 {
-	return obj->ops == &i915_gem_object_stolen_ops;
+	struct ttm_buffer_object *ttm_obj;
+
+	if (!obj || !i915_gem_object_is_stolen(obj))
+		return I915_BO_INVALID_OFFSET;
+
+	ttm_obj = i915_gem_to_ttm(obj);
+	if (ttm_obj->resource->mem_type != I915_PL_STOLEN)
+		return I915_BO_INVALID_OFFSET;
+
+	return PFN_PHYS(ttm_obj->resource->start);
+}
+
+bool i915_gem_object_is_stolen(struct drm_i915_gem_object *obj)
+{
+	struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
+
+#ifdef CONFIG_LOCKDEP
+	if (i915_gem_object_migratable(obj) &&
+	    i915_gem_object_evictable(obj))
+		assert_object_held(obj);
+#endif
+	return mr && (mr->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+		      mr->type == INTEL_MEMORY_STOLEN_LOCAL);
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h
index d5005a39d130..b39cb6e6c768 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h
@@ -10,17 +10,9 @@
 
 struct drm_i915_private;
 struct drm_mm_node;
+struct ttm_resource;
 struct drm_i915_gem_object;
 
-int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
-				struct drm_mm_node *node, u64 size,
-				unsigned alignment);
-int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *dev_priv,
-					 struct drm_mm_node *node, u64 size,
-					 unsigned alignment, u64 start,
-					 u64 end);
-void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
-				 struct drm_mm_node *node);
 struct intel_memory_region *
 i915_gem_stolen_smem_setup(struct drm_i915_private *i915, u16 type,
 			   u16 instance);
@@ -32,7 +24,16 @@ struct drm_i915_gem_object *
 i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
 			      resource_size_t size);
 
-bool i915_gem_object_is_stolen(const struct drm_i915_gem_object *obj);
+u64 i915_gem_object_stolen_offset(struct drm_i915_gem_object *obj);
+bool i915_gem_object_is_stolen(struct drm_i915_gem_object *obj);
+struct ttm_resource *
+i915_gem_stolen_reserve_range(struct drm_i915_private *i915,
+			      resource_size_t size,
+			      u64 start, u64 end);
+u64 i915_gem_stolen_reserve_offset(struct ttm_resource *res);
+u64 i915_gem_stolen_reserve_size(struct ttm_resource *res);
+void i915_gem_stolen_release_range(struct drm_i915_private *i915,
+				   struct ttm_resource *res);
 
 #define I915_GEM_STOLEN_BIAS SZ_128K
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index bb988608296d..c7611efddcf7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1203,6 +1203,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	struct ttm_place _place;
 	struct ttm_placement _placement;
 	struct ttm_placement *placement;
+	bool is_stolen = intel_region_to_ttm_type(mem) == I915_PL_STOLEN;
 	int ret;
 
 	drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,7 +1223,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	obj->base.vma_node.driver_private = i915_gem_to_ttm(obj);
 
 	/* Forcing the page size is kernel internal only */
-	GEM_BUG_ON(page_size && obj->mm.n_placements);
+	GEM_BUG_ON(page_size && obj->mm.n_placements && !is_stolen);
 
 	/*
 	 * Keep an extra shrink pin to prevent the object from being made
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index 73e371aa3850..81654a51df51 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -49,6 +49,13 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 			       resource_size_t size,
 			       resource_size_t page_size,
 			       unsigned int flags);
+int i915_gem_ttm_object_init_in_place(struct intel_memory_region *mem,
+				      struct drm_i915_gem_object *obj,
+				      resource_size_t size,
+				      resource_size_t page_size,
+				      unsigned int flags,
+				      u64 start,
+				      u64 end);
 
 /* Internal I915 TTM declarations and definitions below. */
 
diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c
index f8d0523f4c18..846eef898f7e 100644
--- a/drivers/gpu/drm/i915/gt/intel_rc6.c
+++ b/drivers/gpu/drm/i915/gt/intel_rc6.c
@@ -355,9 +355,9 @@ static int vlv_rc6_init(struct intel_rc6 *rc6)
 
 	GEM_BUG_ON(range_overflows_end_t(u64,
 					 i915->dsm.start,
-					 pctx->stolen->start,
+					 i915_gem_object_stolen_offset(pctx),
 					 U32_MAX));
-	pctx_paddr = i915->dsm.start + pctx->stolen->start;
+	pctx_paddr = i915->dsm.start + i915_gem_object_stolen_offset(pctx);
 	intel_uncore_write(uncore, VLV_PCBR, pctx_paddr);
 
 out:
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 37c38bdd5f47..75bc7d90c9dc 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -6,6 +6,7 @@
 #include <linux/crc32.h>
 
 #include "gem/i915_gem_stolen.h"
+#include "intel_region_ttm.h"
 
 #include "i915_memcpy.h"
 #include "i915_selftest.h"
@@ -83,6 +84,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 		dma_addr_t dma = (dma_addr_t)dsm->start + (page << PAGE_SHIFT);
 		void __iomem *s;
 		void *in;
+		bool busy;
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
@@ -93,9 +95,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 				      ggtt->error_capture.start,
 				      PAGE_SIZE);
 
-		if (!__drm_mm_interval_first(&gt->i915->mm.stolen,
-					     page << PAGE_SHIFT,
-					     ((page + 1) << PAGE_SHIFT) - 1))
+		busy = intel_region_ttm_range_busy(gt->i915->mm.stolen_region,
+						   PFN_PHYS(page), PAGE_SIZE);
+		if (!busy)
 			memset_io(s, STACK_MAGIC, PAGE_SIZE);
 
 		in = (void __force *)s;
@@ -124,6 +126,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 		void __iomem *s;
 		void *in;
 		u32 x;
+		bool busy;
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
@@ -139,10 +142,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 			in = tmp;
 		x = crc32_le(0, in, PAGE_SIZE);
 
-		if (x != crc[page] &&
-		    !__drm_mm_interval_first(&gt->i915->mm.stolen,
-					     page << PAGE_SHIFT,
-					     ((page + 1) << PAGE_SHIFT) - 1)) {
+		busy = intel_region_ttm_range_busy(gt->i915->mm.stolen_region,
+						   PFN_PHYS(page), PAGE_SIZE);
+		if (x != crc[page] && !busy) {
 			pr_debug("unused stolen page %pa modified by GPU reset\n",
 				 &page);
 			if (count++ == 0)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 94e5c29d2ee3..e538b8f71dcc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -32,6 +32,7 @@
 
 #include <drm/drm_debugfs.h>
 
+#include "gem/i915_gem_region.h"
 #include "gem/i915_gem_context.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_buffer_pool.h"
@@ -157,6 +158,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	struct i915_vma *vma;
 	int pin_count = 0;
+	u64 offset;
 
 	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
 		   &obj->base,
@@ -241,8 +243,9 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	spin_unlock(&obj->vma.lock);
 
 	seq_printf(m, " (pinned x %d)", pin_count);
-	if (i915_gem_object_is_stolen(obj))
-		seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
+	offset = i915_gem_object_stolen_offset(obj);
+	if (offset != I915_BO_INVALID_OFFSET)
+		seq_printf(m, " (stolen: %08llx)", offset);
 	if (i915_gem_object_is_framebuffer(obj))
 		seq_printf(m, " (fb)");
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index eba94fa76b18..1ebb266e7b63 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -224,11 +224,6 @@ struct i915_gem_mm {
 	 * support stolen.
 	 */
 	struct intel_memory_region *stolen_region;
-	/** Memory allocator for GTT stolen memory */
-	struct drm_mm stolen;
-	/** Protects the usage of the GTT stolen memory allocator. This is
-	 * always the inner lock when overlapping with struct_mutex. */
-	struct mutex stolen_lock;
 
 	/* Protects bound_list/unbound_list and #drm_i915_gem_object.mm.link */
 	spinlock_t obj_lock;
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 694e9acb69e2..b3850188981b 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -194,11 +194,12 @@ intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
 	}
 }
 
-#ifdef CONFIG_DRM_I915_SELFTEST
 /**
  * intel_region_ttm_resource_alloc - Allocate memory resources from a region
  * @mem: The memory region,
  * @size: The requested size in bytes
+ * @start: start of allowed range
+ * @end: end of allowed range
  * @flags: Allocation flags
  *
  * This functionality is provided only for callers that need to allocate
@@ -212,8 +213,9 @@ intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
  */
 struct ttm_resource *
 intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
-				resource_size_t offset,
 				resource_size_t size,
+				u64 start,
+				u64 end,
 				unsigned int flags)
 {
 	struct ttm_resource_manager *man = mem->region_private;
@@ -222,11 +224,14 @@ intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
 	struct ttm_resource *res;
 	int ret;
 
+	if (!man)
+		return ERR_PTR(-ENODEV);
+
 	if (flags & I915_BO_ALLOC_CONTIGUOUS)
 		place.flags |= TTM_PL_FLAG_CONTIGUOUS;
-	if (offset != I915_BO_INVALID_OFFSET) {
-		place.fpfn = offset >> PAGE_SHIFT;
-		place.lpfn = place.fpfn + (size >> PAGE_SHIFT);
+	if (start || end) {
+		place.fpfn = PFN_DOWN(start);
+		place.lpfn = PFN_UP(end);
 	} else if (mem->io_size && mem->io_size < mem->total) {
 		if (flags & I915_BO_ALLOC_GPU_ONLY) {
 			place.flags |= TTM_PL_FLAG_TOPDOWN;
@@ -247,8 +252,6 @@ intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
 	return ret ? ERR_PTR(ret) : res;
 }
 
-#endif
-
 /**
  * intel_region_ttm_resource_free - Free a resource allocated from a resource manager
  * @mem: The region the resource was allocated from.
@@ -260,9 +263,34 @@ void intel_region_ttm_resource_free(struct intel_memory_region *mem,
 	struct ttm_resource_manager *man = mem->region_private;
 	struct ttm_buffer_object mock_bo = {};
 
+	if (!man)
+		return;
+
 	mock_bo.base.size = res->num_pages << PAGE_SHIFT;
 	mock_bo.bdev = &mem->i915->bdev;
 	res->bo = &mock_bo;
 
 	man->func->free(man, res);
 }
+
+/**
+ * intel_region_ttm_range_busy - check whether range has any allocations
+ * @mem: The region to check
+ * @start: the start of the range to check
+ * @size: size of the range to check
+ *
+ * Return: true if something is alloceted within the region, false otherwise.
+ */
+bool intel_region_ttm_range_busy(struct intel_memory_region *mem,
+				 u64 start, u64 size)
+{
+	struct ttm_resource *dummy;
+
+	dummy = intel_region_ttm_resource_alloc(mem, size, start, start + size,
+						I915_BO_ALLOC_CONTIGUOUS);
+	if (IS_ERR(dummy))
+		return true;
+
+	intel_region_ttm_resource_free(mem, dummy);
+	return false;
+}
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
index cf9d86dcf409..1e88472fb2ea 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.h
+++ b/drivers/gpu/drm/i915/intel_region_ttm.h
@@ -29,15 +29,17 @@ intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
 void intel_region_ttm_resource_free(struct intel_memory_region *mem,
 				    struct ttm_resource *res);
 
+bool intel_region_ttm_range_busy(struct intel_memory_region *mem,
+				 u64 start, u64 size);
+
 int intel_region_to_ttm_type(const struct intel_memory_region *mem);
 
 struct ttm_device_funcs *i915_ttm_driver(void);
 
-#ifdef CONFIG_DRM_I915_SELFTEST
 struct ttm_resource *
 intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
-				resource_size_t offset,
 				resource_size_t size,
+				u64 start,
+				u64 end,
 				unsigned int flags);
-#endif
 #endif /* _INTEL_REGION_TTM_H_ */
diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c
index 670557ce1024..a41d81dc345d 100644
--- a/drivers/gpu/drm/i915/selftests/mock_region.c
+++ b/drivers/gpu/drm/i915/selftests/mock_region.c
@@ -24,10 +24,20 @@ static int mock_region_get_pages(struct drm_i915_gem_object *obj)
 {
 	struct sg_table *pages;
 	int err;
+	u64 start, end;
+
+	if (obj->bo_offset == I915_BO_INVALID_OFFSET) {
+		start = 0;
+		end = 0;
+	} else {
+		start = obj->bo_offset;
+		end = obj->bo_offset + obj->base.size;
+	}
 
 	obj->mm.res = intel_region_ttm_resource_alloc(obj->mm.region,
-						      obj->bo_offset,
 						      obj->base.size,
+						      start,
+						      end,
 						      obj->flags);
 	if (IS_ERR(obj->mm.res))
 		return PTR_ERR(obj->mm.res);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 10/10] drm/i915: stolen memory use ttm backend
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Robert Beckett, Thomas Hellström, kernel, Matthew Auld,
	linux-kernel

refactor stolen memory region to use ttm.
this necessitates using ttm resources to track reserved stolen regions
instead of drm_mm_nodes.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/display/intel_fbc.c      |  78 ++--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    | 440 +++++++-----------
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h    |  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   7 +
 drivers/gpu/drm/i915/gt/intel_rc6.c           |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |   7 +-
 drivers/gpu/drm/i915/i915_drv.h               |   5 -
 drivers/gpu/drm/i915/intel_region_ttm.c       |  42 +-
 drivers/gpu/drm/i915/intel_region_ttm.h       |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 13 files changed, 303 insertions(+), 342 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
index 8b807284cde1..6f3afac5e8c9 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -38,6 +38,7 @@
  * forcibly disable it to allow proper screen updates.
  */
 
+#include "gem/i915_gem_stolen.h"
 #include <linux/string_helpers.h>
 
 #include <drm/drm_fourcc.h>
@@ -51,6 +52,7 @@
 #include "intel_display_types.h"
 #include "intel_fbc.h"
 #include "intel_frontbuffer.h"
+#include "gem/i915_gem_region.h"
 
 #define for_each_fbc_id(__dev_priv, __fbc_id) \
 	for ((__fbc_id) = INTEL_FBC_A; (__fbc_id) < I915_MAX_FBCS; (__fbc_id)++) \
@@ -92,8 +94,8 @@ struct intel_fbc {
 	struct mutex lock;
 	unsigned int busy_bits;
 
-	struct drm_mm_node compressed_fb;
-	struct drm_mm_node compressed_llb;
+	struct ttm_resource *compressed_fb;
+	struct ttm_resource *compressed_llb;
 
 	enum intel_fbc_id id;
 
@@ -331,16 +333,20 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
 	struct drm_i915_private *i915 = fbc->i915;
+	u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
+	u64 llb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_llb);
 
+	GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+	GEM_BUG_ON(llb_offset == I915_BO_INVALID_OFFSET);
 	GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-					 fbc->compressed_fb.start, U32_MAX));
+					 fb_offset, U32_MAX));
 	GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-					 fbc->compressed_llb.start, U32_MAX));
+					 llb_offset, U32_MAX));
 
 	intel_de_write(i915, FBC_CFB_BASE,
-		       i915->dsm.start + fbc->compressed_fb.start);
+		       i915->dsm.start + fb_offset);
 	intel_de_write(i915, FBC_LL_BASE,
-		       i915->dsm.start + fbc->compressed_llb.start);
+		       i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -448,8 +454,10 @@ static bool g4x_fbc_is_compressing(struct intel_fbc *fbc)
 static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
 	struct drm_i915_private *i915 = fbc->i915;
+	u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-	intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+	GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+	intel_de_write(i915, DPFC_CB_BASE, fb_offset);
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -499,8 +507,10 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
 static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
 	struct drm_i915_private *i915 = fbc->i915;
+	u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-	intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fbc->compressed_fb.start);
+	GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+	intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fb_offset);
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -744,21 +754,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
 	struct drm_i915_private *i915 = fbc->i915;
 	u64 end = intel_fbc_stolen_end(i915);
-	int ret, limit = min_limit;
+	int limit = min_limit;
+	struct ttm_resource *res;
 
 	size /= limit;
 
 	/* Try to over-allocate to reduce reallocations and fragmentation. */
-	ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-						   size <<= 1, 4096, 0, end);
-	if (ret == 0)
+	res = i915_gem_stolen_reserve_range(i915, size <<= 1, 0, end);
+	if (!IS_ERR(res)) {
+		fbc->compressed_fb = res;
 		return limit;
+	}
 
 	for (; limit <= intel_fbc_max_limit(i915); limit <<= 1) {
-		ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-							   size >>= 1, 4096, 0, end);
-		if (ret == 0)
+		res = i915_gem_stolen_reserve_range(i915, size >>= 1, 0, end);
+		if (!IS_ERR(res)) {
+			fbc->compressed_fb = res;
 			return limit;
+		}
 	}
 
 	return 0;
@@ -769,17 +782,18 @@ static int intel_fbc_alloc_cfb(struct intel_fbc *fbc,
 {
 	struct drm_i915_private *i915 = fbc->i915;
 	int ret;
+	struct ttm_resource *res;
 
-	drm_WARN_ON(&i915->drm,
-		    drm_mm_node_allocated(&fbc->compressed_fb));
-	drm_WARN_ON(&i915->drm,
-		    drm_mm_node_allocated(&fbc->compressed_llb));
+	drm_WARN_ON(&i915->drm, fbc->compressed_fb);
+	drm_WARN_ON(&i915->drm, fbc->compressed_llb);
 
 	if (DISPLAY_VER(i915) < 5 && !IS_G4X(i915)) {
-		ret = i915_gem_stolen_insert_node(i915, &fbc->compressed_llb,
-						  4096, 4096);
-		if (ret)
+		res = i915_gem_stolen_reserve_range(i915, 4096, I915_GEM_STOLEN_BIAS, 0);
+		if (IS_ERR(res)) {
+			ret = PTR_ERR(res);
 			goto err;
+		}
+		fbc->compressed_llb = res;
 	}
 
 	ret = find_compression_limit(fbc, size, min_limit);
@@ -793,15 +807,14 @@ static int intel_fbc_alloc_cfb(struct intel_fbc *fbc,
 
 	drm_dbg_kms(&i915->drm,
 		    "reserved %llu bytes of contiguous stolen space for FBC, limit: %d\n",
-		    fbc->compressed_fb.size, fbc->limit);
+		    i915_gem_stolen_reserve_size(fbc->compressed_fb), fbc->limit);
 
 	return 0;
 
 err_llb:
-	if (drm_mm_node_allocated(&fbc->compressed_llb))
-		i915_gem_stolen_remove_node(i915, &fbc->compressed_llb);
+	i915_gem_stolen_release_range(i915, fetch_and_zero(&fbc->compressed_llb));
 err:
-	if (drm_mm_initialized(&i915->mm.stolen))
+	if (IS_ERR(res) && (PTR_ERR(res) == -ENOMEM || PTR_ERR(res) == -ENXIO))
 		drm_info_once(&i915->drm, "not enough stolen space for compressed buffer (need %d more bytes), disabling. Hint: you may be able to increase stolen memory size in the BIOS to avoid this.\n", size);
 	return -ENOSPC;
 }
@@ -826,10 +839,10 @@ static void __intel_fbc_cleanup_cfb(struct intel_fbc *fbc)
 	if (WARN_ON(intel_fbc_hw_is_active(fbc)))
 		return;
 
-	if (drm_mm_node_allocated(&fbc->compressed_llb))
-		i915_gem_stolen_remove_node(i915, &fbc->compressed_llb);
-	if (drm_mm_node_allocated(&fbc->compressed_fb))
-		i915_gem_stolen_remove_node(i915, &fbc->compressed_fb);
+	if (fbc->compressed_llb)
+		i915_gem_stolen_release_range(i915, fetch_and_zero(&fbc->compressed_llb));
+	if (fbc->compressed_fb)
+		i915_gem_stolen_release_range(i915, fetch_and_zero(&fbc->compressed_fb));
 }
 
 void intel_fbc_cleanup(struct drm_i915_private *i915)
@@ -1034,9 +1047,10 @@ static bool intel_fbc_is_cfb_ok(const struct intel_plane_state *plane_state)
 {
 	struct intel_plane *plane = to_intel_plane(plane_state->uapi.plane);
 	struct intel_fbc *fbc = plane->fbc;
+	u64 fb_size = i915_gem_stolen_reserve_size(fbc->compressed_fb);
 
 	return intel_fbc_min_limit(plane_state) <= fbc->limit &&
-		intel_fbc_cfb_size(plane_state) <= fbc->compressed_fb.size * fbc->limit;
+		intel_fbc_cfb_size(plane_state) <= fb_size * fbc->limit;
 }
 
 static bool intel_fbc_is_ok(const struct intel_plane_state *plane_state)
@@ -1702,7 +1716,7 @@ void intel_fbc_init(struct drm_i915_private *i915)
 {
 	enum intel_fbc_id fbc_id;
 
-	if (!drm_mm_initialized(&i915->mm.stolen))
+	if (!i915->mm.stolen_region)
 		mkwrite_device_info(i915)->display.fbc_mask = 0;
 
 	if (need_fbc_vtd_wa(i915))
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 07bc11247a3e..13466cca7af8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -633,8 +633,6 @@ struct drm_i915_gem_object {
 		} userptr;
 #endif
 
-		struct drm_mm_node *stolen;
-
 		resource_size_t bo_offset;
 
 		unsigned long scratch;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 47b5e0e342ab..e7fdccf628d8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -4,75 +4,111 @@
  * Copyright © 2008-2012 Intel Corporation
  */
 
+#include "drm/ttm/ttm_placement.h"
+#include "gem/i915_gem_object_types.h"
 #include <linux/errno.h>
 #include <linux/mutex.h>
 
-#include <drm/drm_mm.h>
 #include <drm/i915_drm.h>
 
 #include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_region.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_region_lmem.h"
+#include "gem/i915_gem_ttm.h"
 #include "i915_drv.h"
 #include "i915_gem_stolen.h"
 #include "i915_reg.h"
 #include "i915_utils.h"
 #include "i915_vgpu.h"
 #include "intel_mchbar_regs.h"
+#include "intel_region_ttm.h"
 
-/*
- * The BIOS typically reserves some of the system's memory for the exclusive
- * use of the integrated graphics. This memory is no longer available for
- * use by the OS and so the user finds that his system has less memory
- * available than he put in. We refer to this memory as stolen.
- *
- * The BIOS will allocate its framebuffer from the stolen memory. Our
- * goal is try to reuse that object for our own fbcon which must always
- * be available for panics. Anything else we can reuse the stolen memory
- * for is a boon.
- */
+struct stolen_region {
+	struct intel_memory_region region;
+	struct ttm_resource *wa_resvd;
+};
 
-int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *i915,
-					 struct drm_mm_node *node, u64 size,
-					 unsigned alignment, u64 start, u64 end)
+inline struct stolen_region *to_stolen_region(struct intel_memory_region *mem)
 {
-	int ret;
-
-	if (!drm_mm_initialized(&i915->mm.stolen))
-		return -ENODEV;
+	return container_of(mem, struct stolen_region, region);
+}
 
-	/* WaSkipStolenMemoryFirstPage:bdw+ */
-	if (GRAPHICS_VER(i915) >= 8 && start < 4096)
-		start = 4096;
+/**
+ * i915_gem_stolen_reserve_range - reserve a region of space in a given range
+ * @i915: i915 device instance
+ * @size: size of region to reserve
+ * @start: start of search area
+ * @end: end of search area
+ *
+ * Search for @size amount of free space within the region delimeted by @start and @end.
+ * If found reserve it from future use until later release with @i915_gem_stolen_release_range.
+ *
+ * Return: pointer to resource tracking structure on success, ERR_PTR otherwise
+ */
+struct ttm_resource *
+i915_gem_stolen_reserve_range(struct drm_i915_private *i915,
+			      resource_size_t size,
+			      u64 start, u64 end)
+{
+	struct intel_memory_region *mem = i915->mm.stolen_region;
 
-	mutex_lock(&i915->mm.stolen_lock);
-	ret = drm_mm_insert_node_in_range(&i915->mm.stolen, node,
-					  size, alignment, 0,
-					  start, end, DRM_MM_INSERT_BEST);
-	mutex_unlock(&i915->mm.stolen_lock);
+	if (!mem)
+		return ERR_PTR(-ENODEV);
+	return intel_region_ttm_resource_alloc(mem, size, start, end, I915_BO_ALLOC_CONTIGUOUS);
+}
 
-	return ret;
+/**
+ * i915_gem_stolen_reserve_offset - return the offset of the reserved space
+ * @res: pointer to resource tracking structure to check
+ *
+ * Return: The offset of the reserved resource, or I915_BO_INVALID_OFFSET on error
+ */
+u64 i915_gem_stolen_reserve_offset(struct ttm_resource *res)
+{
+	if (!res)
+		return I915_BO_INVALID_OFFSET;
+	return PFN_PHYS(res->start);
 }
 
-int i915_gem_stolen_insert_node(struct drm_i915_private *i915,
-				struct drm_mm_node *node, u64 size,
-				unsigned alignment)
+/**
+ * i915_gem_stolen_reserve_size - return the reserved size of the reserved space
+ * @res: pointer to resource tracking structure to check
+ *
+ * Return: The size of the reserved resource, or I915_BO_INVALID_OFFSET on error
+ */
+u64 i915_gem_stolen_reserve_size(struct ttm_resource *res)
 {
-	return i915_gem_stolen_insert_node_in_range(i915, node,
-						    size, alignment,
-						    I915_GEM_STOLEN_BIAS,
-						    U64_MAX);
+	if (!res)
+		return I915_BO_INVALID_OFFSET;
+	return PFN_PHYS(res->num_pages);
 }
 
-void i915_gem_stolen_remove_node(struct drm_i915_private *i915,
-				 struct drm_mm_node *node)
+/**
+ * i915_gem_stolen_release_range - release the reserved area to be free for allocation again
+ * @i915: i915 device instance
+ * @res: pointer to resource tracking structure allocated via @i915_gem_stolen_reserve_range
+ */
+void i915_gem_stolen_release_range(struct drm_i915_private *i915,
+				   struct ttm_resource *res)
 {
-	mutex_lock(&i915->mm.stolen_lock);
-	drm_mm_remove_node(node);
-	mutex_unlock(&i915->mm.stolen_lock);
+	struct intel_memory_region *mem = i915->mm.stolen_region;
+
+	intel_region_ttm_resource_free(mem, res);
 }
 
+/*
+ * The BIOS typically reserves some of the system's memory for the exclusive
+ * use of the integrated graphics. This memory is no longer available for
+ * use by the OS and so the user finds that his system has less memory
+ * available than he put in. We refer to this memory as stolen.
+ *
+ * The BIOS will allocate its framebuffer from the stolen memory. Our
+ * goal is try to reuse that object for our own fbcon which must always
+ * be available for panics. Anything else we can reuse the stolen memory
+ * for is a boon.
+ */
+
 static int i915_adjust_stolen(struct drm_i915_private *i915,
 			      struct resource *dsm)
 {
@@ -173,14 +209,6 @@ static int i915_adjust_stolen(struct drm_i915_private *i915,
 	return 0;
 }
 
-static void i915_gem_cleanup_stolen(struct drm_i915_private *i915)
-{
-	if (!drm_mm_initialized(&i915->mm.stolen))
-		return;
-
-	drm_mm_takedown(&i915->mm.stolen);
-}
-
 static void g4x_get_stolen_reserved(struct drm_i915_private *i915,
 				    struct intel_uncore *uncore,
 				    resource_size_t *base,
@@ -394,8 +422,7 @@ static int i915_gem_init_stolen(struct intel_memory_region *mem)
 	struct intel_uncore *uncore = &i915->uncore;
 	resource_size_t reserved_base, stolen_top;
 	resource_size_t reserved_total, reserved_size;
-
-	mutex_init(&i915->mm.stolen_lock);
+	int ret;
 
 	if (intel_vgpu_active(i915)) {
 		drm_notice(&i915->drm,
@@ -513,258 +540,100 @@ static int i915_gem_init_stolen(struct intel_memory_region *mem)
 		return 0;
 
 	/* Basic memrange allocator for stolen space. */
-	drm_mm_init(&i915->mm.stolen, 0, i915->stolen_usable_size);
-
-	return 0;
-}
-
-static void dbg_poison(struct i915_ggtt *ggtt,
-		       dma_addr_t addr, resource_size_t size,
-		       u8 x)
-{
-#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
-	if (!drm_mm_node_allocated(&ggtt->error_capture))
-		return;
-
-	if (ggtt->vm.bind_async_flags & I915_VMA_GLOBAL_BIND)
-		return; /* beware stop_machine() inversion */
-
-	GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
-
-	mutex_lock(&ggtt->error_mutex);
-	while (size) {
-		void __iomem *s;
-
-		ggtt->vm.insert_page(&ggtt->vm, addr,
-				     ggtt->error_capture.start,
-				     I915_CACHE_NONE, 0);
-		mb();
-
-		s = io_mapping_map_wc(&ggtt->iomap,
-				      ggtt->error_capture.start,
-				      PAGE_SIZE);
-		memset_io(s, x, PAGE_SIZE);
-		io_mapping_unmap(s);
-
-		addr += PAGE_SIZE;
-		size -= PAGE_SIZE;
-	}
-	mb();
-	ggtt->vm.clear_range(&ggtt->vm, ggtt->error_capture.start, PAGE_SIZE);
-	mutex_unlock(&ggtt->error_mutex);
-#endif
-}
-
-static struct sg_table *
-i915_pages_create_for_stolen(struct drm_device *dev,
-			     resource_size_t offset, resource_size_t size)
-{
-	struct drm_i915_private *i915 = to_i915(dev);
-	struct sg_table *st;
-	struct scatterlist *sg;
-
-	GEM_BUG_ON(range_overflows(offset, size, resource_size(&i915->dsm)));
-
-	/* We hide that we have no struct page backing our stolen object
-	 * by wrapping the contiguous physical allocation with a fake
-	 * dma mapping in a single scatterlist.
-	 */
+	ret = intel_region_ttm_init(mem);
+	if (ret)
+		return ret;
 
-	st = kmalloc(sizeof(*st), GFP_KERNEL);
-	if (st == NULL)
-		return ERR_PTR(-ENOMEM);
+	/* WaSkipStolenMemoryFirstPage:bdw+ */
+	if (GRAPHICS_VER(i915) >= 8) {
+		struct stolen_region *region = to_stolen_region(mem);
+		struct ttm_resource *resvd;
+
+		resvd = intel_region_ttm_resource_alloc(mem, 4096, 0, 4096, 0);
+		if (IS_ERR(resvd)) {
+			ret = PTR_ERR(resvd);
+			goto err_no_reserve;
+		}
 
-	if (sg_alloc_table(st, 1, GFP_KERNEL)) {
-		kfree(st);
-		return ERR_PTR(-ENOMEM);
+		region->wa_resvd = resvd;
 	}
 
-	sg = st->sgl;
-	sg->offset = 0;
-	sg->length = size;
+	return ret;
 
-	sg_dma_address(sg) = (dma_addr_t)i915->dsm.start + offset;
-	sg_dma_len(sg) = size;
+err_no_reserve:
+	intel_region_ttm_fini(mem);
 
-	return st;
+	return ret;
 }
 
-static int i915_gem_object_get_pages_stolen(struct drm_i915_gem_object *obj)
+struct drm_i915_gem_object *
+i915_gem_object_create_stolen(struct drm_i915_private *i915,
+			      resource_size_t size)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	struct sg_table *pages =
-		i915_pages_create_for_stolen(obj->base.dev,
-					     obj->stolen->start,
-					     obj->stolen->size);
-	if (IS_ERR(pages))
-		return PTR_ERR(pages);
-
-	dbg_poison(to_gt(i915)->ggtt,
-		   sg_dma_address(pages->sgl),
-		   sg_dma_len(pages->sgl),
-		   POISON_INUSE);
-
-	__i915_gem_object_set_pages(obj, pages, obj->stolen->size);
-
-	return 0;
+	return i915_gem_object_create_region(i915->mm.stolen_region, size, 0,
+					     I915_BO_ALLOC_CONTIGUOUS | I915_BO_ALLOC_PINNED);
 }
 
-static void i915_gem_object_put_pages_stolen(struct drm_i915_gem_object *obj,
-					     struct sg_table *pages)
+static struct intel_memory_region *alloc_stolen_region(void)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	/* Should only be called from i915_gem_object_release_stolen() */
+	struct stolen_region *region = kzalloc(sizeof(*region), GFP_KERNEL);
 
-	dbg_poison(to_gt(i915)->ggtt,
-		   sg_dma_address(pages->sgl),
-		   sg_dma_len(pages->sgl),
-		   POISON_FREE);
+	if (!region)
+		return NULL;
 
-	sg_free_table(pages);
-	kfree(pages);
+	return &region->region;
 }
 
-static void
-i915_gem_object_release_stolen(struct drm_i915_gem_object *obj)
+static void free_stolen_region(struct intel_memory_region *mem)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	struct drm_mm_node *stolen = fetch_and_zero(&obj->stolen);
-
-	GEM_BUG_ON(!stolen);
-	i915_gem_stolen_remove_node(i915, stolen);
-	kfree(stolen);
+	struct stolen_region *region = to_stolen_region(mem);
 
-	i915_gem_object_release_memory_region(obj);
+	kfree(region);
 }
 
-static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
-	.name = "i915_gem_object_stolen",
-	.get_pages = i915_gem_object_get_pages_stolen,
-	.put_pages = i915_gem_object_put_pages_stolen,
-	.release = i915_gem_object_release_stolen,
-};
-
-static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
-					   struct drm_i915_gem_object *obj,
-					   struct drm_mm_node *stolen)
+static int init_stolen_smem(struct intel_memory_region *mem)
 {
-	static struct lock_class_key lock_class;
-	unsigned int cache_level;
-	unsigned int flags;
-	int err;
-
 	/*
-	 * Stolen objects are always physically contiguous since we just
-	 * allocate one big block underneath using the drm_mm range allocator.
+	 * Initialise stolen early so that we may reserve preallocated
+	 * objects for the BIOS to KMS transition.
 	 */
-	flags = I915_BO_ALLOC_CONTIGUOUS;
-
-	drm_gem_private_object_init(&mem->i915->drm, &obj->base, stolen->size);
-	i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, flags);
-
-	obj->stolen = stolen;
-	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
-	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
-	i915_gem_object_set_cache_coherency(obj, cache_level);
-
-	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
-		return -EBUSY;
+	return i915_gem_init_stolen(mem);
+}
 
-	i915_gem_object_init_memory_region(obj, mem);
+static int release_stolen(struct intel_memory_region *mem)
+{
+	struct stolen_region *region = to_stolen_region(mem);
 
-	err = i915_gem_object_pin_pages(obj);
-	if (err)
-		i915_gem_object_release_memory_region(obj);
-	i915_gem_object_unlock(obj);
+	if (region->wa_resvd)
+		intel_region_ttm_resource_free(mem, region->wa_resvd);
 
-	return err;
+	return intel_region_ttm_fini(mem);
 }
 
-static int _i915_gem_object_stolen_init(struct intel_memory_region *mem,
-					struct drm_i915_gem_object *obj,
-					resource_size_t offset,
-					resource_size_t size,
-					resource_size_t page_size,
-					unsigned int flags)
+static int stolen_object_init(struct intel_memory_region *mem,
+			      struct drm_i915_gem_object *obj,
+			      resource_size_t offset,
+			      resource_size_t size,
+			      resource_size_t page_size,
+			      unsigned int flags)
 {
-	struct drm_i915_private *i915 = mem->i915;
-	struct drm_mm_node *stolen;
-	int ret;
-
-	if (!drm_mm_initialized(&i915->mm.stolen))
+	if (!mem->region_private)
 		return -ENODEV;
 
 	if (size == 0)
 		return -EINVAL;
 
-	/*
-	 * With discrete devices, where we lack a mappable aperture there is no
-	 * possible way to ever access this memory on the CPU side.
-	 */
-	if (mem->type == INTEL_MEMORY_STOLEN_LOCAL && !mem->io_size &&
-	    !(flags & I915_BO_ALLOC_GPU_ONLY))
-		return -ENOSPC;
-
-	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
-	if (!stolen)
-		return -ENOMEM;
-
-	if (offset != I915_BO_INVALID_OFFSET) {
-		drm_dbg(&i915->drm,
-			"creating preallocated stolen object: stolen_offset=%pa, size=%pa\n",
-			&offset, &size);
-
-		stolen->start = offset;
-		stolen->size = size;
-		mutex_lock(&i915->mm.stolen_lock);
-		ret = drm_mm_reserve_node(&i915->mm.stolen, stolen);
-		mutex_unlock(&i915->mm.stolen_lock);
-	} else {
-		ret = i915_gem_stolen_insert_node(i915, stolen, size,
-						  mem->min_page_size);
-	}
-	if (ret)
-		goto err_free;
-
-	ret = __i915_gem_object_create_stolen(mem, obj, stolen);
-	if (ret)
-		goto err_remove;
-
-	return 0;
-
-err_remove:
-	i915_gem_stolen_remove_node(i915, stolen);
-err_free:
-	kfree(stolen);
-	return ret;
-}
-
-struct drm_i915_gem_object *
-i915_gem_object_create_stolen(struct drm_i915_private *i915,
-			      resource_size_t size)
-{
-	return i915_gem_object_create_region(i915->mm.stolen_region, size, 0, 0);
-}
-
-static int init_stolen_smem(struct intel_memory_region *mem)
-{
-	/*
-	 * Initialise stolen early so that we may reserve preallocated
-	 * objects for the BIOS to KMS transition.
-	 */
-	return i915_gem_init_stolen(mem);
-}
-
-static int release_stolen_smem(struct intel_memory_region *mem)
-{
-	i915_gem_cleanup_stolen(mem->i915);
-	return 0;
+	/* default range manager relies on page_size for alignment */
+	page_size = max(page_size, mem->min_page_size);
+	return __i915_gem_ttm_object_init(mem, obj, offset, size, page_size, flags);
 }
 
 static const struct intel_memory_region_ops i915_region_stolen_smem_ops = {
 	.init = init_stolen_smem,
-	.release = release_stolen_smem,
-	.init_object = _i915_gem_object_stolen_init,
+	.release = release_stolen,
+	.init_object = stolen_object_init,
+	.alloc = alloc_stolen_region,
+	.free = free_stolen_region,
 };
 
 static int init_stolen_lmem(struct intel_memory_region *mem)
@@ -793,7 +662,7 @@ static int init_stolen_lmem(struct intel_memory_region *mem)
 	return 0;
 
 err_cleanup:
-	i915_gem_cleanup_stolen(mem->i915);
+	intel_region_ttm_fini(mem);
 	return err;
 }
 
@@ -801,14 +670,15 @@ static int release_stolen_lmem(struct intel_memory_region *mem)
 {
 	if (mem->io_size)
 		io_mapping_fini(&mem->iomap);
-	i915_gem_cleanup_stolen(mem->i915);
-	return 0;
+	return release_stolen(mem);
 }
 
 static const struct intel_memory_region_ops i915_region_stolen_lmem_ops = {
 	.init = init_stolen_lmem,
 	.release = release_stolen_lmem,
-	.init_object = _i915_gem_object_stolen_init,
+	.init_object = stolen_object_init,
+	.alloc = alloc_stolen_region,
+	.free = free_stolen_region,
 };
 
 struct intel_memory_region *
@@ -896,7 +766,37 @@ i915_gem_stolen_smem_setup(struct drm_i915_private *i915, u16 type,
 	return mem;
 }
 
-bool i915_gem_object_is_stolen(const struct drm_i915_gem_object *obj)
+/**
+ * i915_gem_object_stolen_offset - return offset from start of stolen region
+ * @obj: the object to return the offset of
+ *
+ * Get the offset from stolen region if this object is currently placed in stolen memory.
+ *
+ * Return: offset from stolen if successful, I915_BO_INVALID_OFFSET otherwise
+ */
+u64 i915_gem_object_stolen_offset(struct drm_i915_gem_object *obj)
 {
-	return obj->ops == &i915_gem_object_stolen_ops;
+	struct ttm_buffer_object *ttm_obj;
+
+	if (!obj || !i915_gem_object_is_stolen(obj))
+		return I915_BO_INVALID_OFFSET;
+
+	ttm_obj = i915_gem_to_ttm(obj);
+	if (ttm_obj->resource->mem_type != I915_PL_STOLEN)
+		return I915_BO_INVALID_OFFSET;
+
+	return PFN_PHYS(ttm_obj->resource->start);
+}
+
+bool i915_gem_object_is_stolen(struct drm_i915_gem_object *obj)
+{
+	struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
+
+#ifdef CONFIG_LOCKDEP
+	if (i915_gem_object_migratable(obj) &&
+	    i915_gem_object_evictable(obj))
+		assert_object_held(obj);
+#endif
+	return mr && (mr->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+		      mr->type == INTEL_MEMORY_STOLEN_LOCAL);
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h
index d5005a39d130..b39cb6e6c768 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h
@@ -10,17 +10,9 @@
 
 struct drm_i915_private;
 struct drm_mm_node;
+struct ttm_resource;
 struct drm_i915_gem_object;
 
-int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
-				struct drm_mm_node *node, u64 size,
-				unsigned alignment);
-int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *dev_priv,
-					 struct drm_mm_node *node, u64 size,
-					 unsigned alignment, u64 start,
-					 u64 end);
-void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
-				 struct drm_mm_node *node);
 struct intel_memory_region *
 i915_gem_stolen_smem_setup(struct drm_i915_private *i915, u16 type,
 			   u16 instance);
@@ -32,7 +24,16 @@ struct drm_i915_gem_object *
 i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
 			      resource_size_t size);
 
-bool i915_gem_object_is_stolen(const struct drm_i915_gem_object *obj);
+u64 i915_gem_object_stolen_offset(struct drm_i915_gem_object *obj);
+bool i915_gem_object_is_stolen(struct drm_i915_gem_object *obj);
+struct ttm_resource *
+i915_gem_stolen_reserve_range(struct drm_i915_private *i915,
+			      resource_size_t size,
+			      u64 start, u64 end);
+u64 i915_gem_stolen_reserve_offset(struct ttm_resource *res);
+u64 i915_gem_stolen_reserve_size(struct ttm_resource *res);
+void i915_gem_stolen_release_range(struct drm_i915_private *i915,
+				   struct ttm_resource *res);
 
 #define I915_GEM_STOLEN_BIAS SZ_128K
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index bb988608296d..c7611efddcf7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1203,6 +1203,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	struct ttm_place _place;
 	struct ttm_placement _placement;
 	struct ttm_placement *placement;
+	bool is_stolen = intel_region_to_ttm_type(mem) == I915_PL_STOLEN;
 	int ret;
 
 	drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,7 +1223,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	obj->base.vma_node.driver_private = i915_gem_to_ttm(obj);
 
 	/* Forcing the page size is kernel internal only */
-	GEM_BUG_ON(page_size && obj->mm.n_placements);
+	GEM_BUG_ON(page_size && obj->mm.n_placements && !is_stolen);
 
 	/*
 	 * Keep an extra shrink pin to prevent the object from being made
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index 73e371aa3850..81654a51df51 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -49,6 +49,13 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 			       resource_size_t size,
 			       resource_size_t page_size,
 			       unsigned int flags);
+int i915_gem_ttm_object_init_in_place(struct intel_memory_region *mem,
+				      struct drm_i915_gem_object *obj,
+				      resource_size_t size,
+				      resource_size_t page_size,
+				      unsigned int flags,
+				      u64 start,
+				      u64 end);
 
 /* Internal I915 TTM declarations and definitions below. */
 
diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c
index f8d0523f4c18..846eef898f7e 100644
--- a/drivers/gpu/drm/i915/gt/intel_rc6.c
+++ b/drivers/gpu/drm/i915/gt/intel_rc6.c
@@ -355,9 +355,9 @@ static int vlv_rc6_init(struct intel_rc6 *rc6)
 
 	GEM_BUG_ON(range_overflows_end_t(u64,
 					 i915->dsm.start,
-					 pctx->stolen->start,
+					 i915_gem_object_stolen_offset(pctx),
 					 U32_MAX));
-	pctx_paddr = i915->dsm.start + pctx->stolen->start;
+	pctx_paddr = i915->dsm.start + i915_gem_object_stolen_offset(pctx);
 	intel_uncore_write(uncore, VLV_PCBR, pctx_paddr);
 
 out:
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 37c38bdd5f47..75bc7d90c9dc 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -6,6 +6,7 @@
 #include <linux/crc32.h>
 
 #include "gem/i915_gem_stolen.h"
+#include "intel_region_ttm.h"
 
 #include "i915_memcpy.h"
 #include "i915_selftest.h"
@@ -83,6 +84,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 		dma_addr_t dma = (dma_addr_t)dsm->start + (page << PAGE_SHIFT);
 		void __iomem *s;
 		void *in;
+		bool busy;
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
@@ -93,9 +95,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 				      ggtt->error_capture.start,
 				      PAGE_SIZE);
 
-		if (!__drm_mm_interval_first(&gt->i915->mm.stolen,
-					     page << PAGE_SHIFT,
-					     ((page + 1) << PAGE_SHIFT) - 1))
+		busy = intel_region_ttm_range_busy(gt->i915->mm.stolen_region,
+						   PFN_PHYS(page), PAGE_SIZE);
+		if (!busy)
 			memset_io(s, STACK_MAGIC, PAGE_SIZE);
 
 		in = (void __force *)s;
@@ -124,6 +126,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 		void __iomem *s;
 		void *in;
 		u32 x;
+		bool busy;
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
@@ -139,10 +142,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 			in = tmp;
 		x = crc32_le(0, in, PAGE_SIZE);
 
-		if (x != crc[page] &&
-		    !__drm_mm_interval_first(&gt->i915->mm.stolen,
-					     page << PAGE_SHIFT,
-					     ((page + 1) << PAGE_SHIFT) - 1)) {
+		busy = intel_region_ttm_range_busy(gt->i915->mm.stolen_region,
+						   PFN_PHYS(page), PAGE_SIZE);
+		if (x != crc[page] && !busy) {
 			pr_debug("unused stolen page %pa modified by GPU reset\n",
 				 &page);
 			if (count++ == 0)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 94e5c29d2ee3..e538b8f71dcc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -32,6 +32,7 @@
 
 #include <drm/drm_debugfs.h>
 
+#include "gem/i915_gem_region.h"
 #include "gem/i915_gem_context.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_buffer_pool.h"
@@ -157,6 +158,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	struct i915_vma *vma;
 	int pin_count = 0;
+	u64 offset;
 
 	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
 		   &obj->base,
@@ -241,8 +243,9 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	spin_unlock(&obj->vma.lock);
 
 	seq_printf(m, " (pinned x %d)", pin_count);
-	if (i915_gem_object_is_stolen(obj))
-		seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
+	offset = i915_gem_object_stolen_offset(obj);
+	if (offset != I915_BO_INVALID_OFFSET)
+		seq_printf(m, " (stolen: %08llx)", offset);
 	if (i915_gem_object_is_framebuffer(obj))
 		seq_printf(m, " (fb)");
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index eba94fa76b18..1ebb266e7b63 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -224,11 +224,6 @@ struct i915_gem_mm {
 	 * support stolen.
 	 */
 	struct intel_memory_region *stolen_region;
-	/** Memory allocator for GTT stolen memory */
-	struct drm_mm stolen;
-	/** Protects the usage of the GTT stolen memory allocator. This is
-	 * always the inner lock when overlapping with struct_mutex. */
-	struct mutex stolen_lock;
 
 	/* Protects bound_list/unbound_list and #drm_i915_gem_object.mm.link */
 	spinlock_t obj_lock;
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 694e9acb69e2..b3850188981b 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -194,11 +194,12 @@ intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
 	}
 }
 
-#ifdef CONFIG_DRM_I915_SELFTEST
 /**
  * intel_region_ttm_resource_alloc - Allocate memory resources from a region
  * @mem: The memory region,
  * @size: The requested size in bytes
+ * @start: start of allowed range
+ * @end: end of allowed range
  * @flags: Allocation flags
  *
  * This functionality is provided only for callers that need to allocate
@@ -212,8 +213,9 @@ intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
  */
 struct ttm_resource *
 intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
-				resource_size_t offset,
 				resource_size_t size,
+				u64 start,
+				u64 end,
 				unsigned int flags)
 {
 	struct ttm_resource_manager *man = mem->region_private;
@@ -222,11 +224,14 @@ intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
 	struct ttm_resource *res;
 	int ret;
 
+	if (!man)
+		return ERR_PTR(-ENODEV);
+
 	if (flags & I915_BO_ALLOC_CONTIGUOUS)
 		place.flags |= TTM_PL_FLAG_CONTIGUOUS;
-	if (offset != I915_BO_INVALID_OFFSET) {
-		place.fpfn = offset >> PAGE_SHIFT;
-		place.lpfn = place.fpfn + (size >> PAGE_SHIFT);
+	if (start || end) {
+		place.fpfn = PFN_DOWN(start);
+		place.lpfn = PFN_UP(end);
 	} else if (mem->io_size && mem->io_size < mem->total) {
 		if (flags & I915_BO_ALLOC_GPU_ONLY) {
 			place.flags |= TTM_PL_FLAG_TOPDOWN;
@@ -247,8 +252,6 @@ intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
 	return ret ? ERR_PTR(ret) : res;
 }
 
-#endif
-
 /**
  * intel_region_ttm_resource_free - Free a resource allocated from a resource manager
  * @mem: The region the resource was allocated from.
@@ -260,9 +263,34 @@ void intel_region_ttm_resource_free(struct intel_memory_region *mem,
 	struct ttm_resource_manager *man = mem->region_private;
 	struct ttm_buffer_object mock_bo = {};
 
+	if (!man)
+		return;
+
 	mock_bo.base.size = res->num_pages << PAGE_SHIFT;
 	mock_bo.bdev = &mem->i915->bdev;
 	res->bo = &mock_bo;
 
 	man->func->free(man, res);
 }
+
+/**
+ * intel_region_ttm_range_busy - check whether range has any allocations
+ * @mem: The region to check
+ * @start: the start of the range to check
+ * @size: size of the range to check
+ *
+ * Return: true if something is alloceted within the region, false otherwise.
+ */
+bool intel_region_ttm_range_busy(struct intel_memory_region *mem,
+				 u64 start, u64 size)
+{
+	struct ttm_resource *dummy;
+
+	dummy = intel_region_ttm_resource_alloc(mem, size, start, start + size,
+						I915_BO_ALLOC_CONTIGUOUS);
+	if (IS_ERR(dummy))
+		return true;
+
+	intel_region_ttm_resource_free(mem, dummy);
+	return false;
+}
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
index cf9d86dcf409..1e88472fb2ea 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.h
+++ b/drivers/gpu/drm/i915/intel_region_ttm.h
@@ -29,15 +29,17 @@ intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
 void intel_region_ttm_resource_free(struct intel_memory_region *mem,
 				    struct ttm_resource *res);
 
+bool intel_region_ttm_range_busy(struct intel_memory_region *mem,
+				 u64 start, u64 size);
+
 int intel_region_to_ttm_type(const struct intel_memory_region *mem);
 
 struct ttm_device_funcs *i915_ttm_driver(void);
 
-#ifdef CONFIG_DRM_I915_SELFTEST
 struct ttm_resource *
 intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
-				resource_size_t offset,
 				resource_size_t size,
+				u64 start,
+				u64 end,
 				unsigned int flags);
-#endif
 #endif /* _INTEL_REGION_TTM_H_ */
diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c
index 670557ce1024..a41d81dc345d 100644
--- a/drivers/gpu/drm/i915/selftests/mock_region.c
+++ b/drivers/gpu/drm/i915/selftests/mock_region.c
@@ -24,10 +24,20 @@ static int mock_region_get_pages(struct drm_i915_gem_object *obj)
 {
 	struct sg_table *pages;
 	int err;
+	u64 start, end;
+
+	if (obj->bo_offset == I915_BO_INVALID_OFFSET) {
+		start = 0;
+		end = 0;
+	} else {
+		start = obj->bo_offset;
+		end = obj->bo_offset + obj->base.size;
+	}
 
 	obj->mm.res = intel_region_ttm_resource_alloc(obj->mm.region,
-						      obj->bo_offset,
 						      obj->base.size,
+						      start,
+						      end,
 						      obj->flags);
 	if (IS_ERR(obj->mm.res))
 		return PTR_ERR(obj->mm.res);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] [PATCH v7 10/10] drm/i915: stolen memory use ttm backend
@ 2022-06-20 21:33   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-20 21:33 UTC (permalink / raw)
  To: dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Thomas Hellström, kernel, Matthew Auld, linux-kernel

refactor stolen memory region to use ttm.
this necessitates using ttm resources to track reserved stolen regions
instead of drm_mm_nodes.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/display/intel_fbc.c      |  78 ++--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    | 440 +++++++-----------
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h    |  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |   7 +
 drivers/gpu/drm/i915/gt/intel_rc6.c           |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |   7 +-
 drivers/gpu/drm/i915/i915_drv.h               |   5 -
 drivers/gpu/drm/i915/intel_region_ttm.c       |  42 +-
 drivers/gpu/drm/i915/intel_region_ttm.h       |   8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |  12 +-
 13 files changed, 303 insertions(+), 342 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
index 8b807284cde1..6f3afac5e8c9 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -38,6 +38,7 @@
  * forcibly disable it to allow proper screen updates.
  */
 
+#include "gem/i915_gem_stolen.h"
 #include <linux/string_helpers.h>
 
 #include <drm/drm_fourcc.h>
@@ -51,6 +52,7 @@
 #include "intel_display_types.h"
 #include "intel_fbc.h"
 #include "intel_frontbuffer.h"
+#include "gem/i915_gem_region.h"
 
 #define for_each_fbc_id(__dev_priv, __fbc_id) \
 	for ((__fbc_id) = INTEL_FBC_A; (__fbc_id) < I915_MAX_FBCS; (__fbc_id)++) \
@@ -92,8 +94,8 @@ struct intel_fbc {
 	struct mutex lock;
 	unsigned int busy_bits;
 
-	struct drm_mm_node compressed_fb;
-	struct drm_mm_node compressed_llb;
+	struct ttm_resource *compressed_fb;
+	struct ttm_resource *compressed_llb;
 
 	enum intel_fbc_id id;
 
@@ -331,16 +333,20 @@ static void i8xx_fbc_nuke(struct intel_fbc *fbc)
 static void i8xx_fbc_program_cfb(struct intel_fbc *fbc)
 {
 	struct drm_i915_private *i915 = fbc->i915;
+	u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
+	u64 llb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_llb);
 
+	GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+	GEM_BUG_ON(llb_offset == I915_BO_INVALID_OFFSET);
 	GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-					 fbc->compressed_fb.start, U32_MAX));
+					 fb_offset, U32_MAX));
 	GEM_BUG_ON(range_overflows_end_t(u64, i915->dsm.start,
-					 fbc->compressed_llb.start, U32_MAX));
+					 llb_offset, U32_MAX));
 
 	intel_de_write(i915, FBC_CFB_BASE,
-		       i915->dsm.start + fbc->compressed_fb.start);
+		       i915->dsm.start + fb_offset);
 	intel_de_write(i915, FBC_LL_BASE,
-		       i915->dsm.start + fbc->compressed_llb.start);
+		       i915->dsm.start + llb_offset);
 }
 
 static const struct intel_fbc_funcs i8xx_fbc_funcs = {
@@ -448,8 +454,10 @@ static bool g4x_fbc_is_compressing(struct intel_fbc *fbc)
 static void g4x_fbc_program_cfb(struct intel_fbc *fbc)
 {
 	struct drm_i915_private *i915 = fbc->i915;
+	u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-	intel_de_write(i915, DPFC_CB_BASE, fbc->compressed_fb.start);
+	GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+	intel_de_write(i915, DPFC_CB_BASE, fb_offset);
 }
 
 static const struct intel_fbc_funcs g4x_fbc_funcs = {
@@ -499,8 +507,10 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
 static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
 {
 	struct drm_i915_private *i915 = fbc->i915;
+	u64 fb_offset = i915_gem_stolen_reserve_offset(fbc->compressed_fb);
 
-	intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fbc->compressed_fb.start);
+	GEM_BUG_ON(fb_offset == I915_BO_INVALID_OFFSET);
+	intel_de_write(i915, ILK_DPFC_CB_BASE(fbc->id), fb_offset);
 }
 
 static const struct intel_fbc_funcs ilk_fbc_funcs = {
@@ -744,21 +754,24 @@ static int find_compression_limit(struct intel_fbc *fbc,
 {
 	struct drm_i915_private *i915 = fbc->i915;
 	u64 end = intel_fbc_stolen_end(i915);
-	int ret, limit = min_limit;
+	int limit = min_limit;
+	struct ttm_resource *res;
 
 	size /= limit;
 
 	/* Try to over-allocate to reduce reallocations and fragmentation. */
-	ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-						   size <<= 1, 4096, 0, end);
-	if (ret == 0)
+	res = i915_gem_stolen_reserve_range(i915, size <<= 1, 0, end);
+	if (!IS_ERR(res)) {
+		fbc->compressed_fb = res;
 		return limit;
+	}
 
 	for (; limit <= intel_fbc_max_limit(i915); limit <<= 1) {
-		ret = i915_gem_stolen_insert_node_in_range(i915, &fbc->compressed_fb,
-							   size >>= 1, 4096, 0, end);
-		if (ret == 0)
+		res = i915_gem_stolen_reserve_range(i915, size >>= 1, 0, end);
+		if (!IS_ERR(res)) {
+			fbc->compressed_fb = res;
 			return limit;
+		}
 	}
 
 	return 0;
@@ -769,17 +782,18 @@ static int intel_fbc_alloc_cfb(struct intel_fbc *fbc,
 {
 	struct drm_i915_private *i915 = fbc->i915;
 	int ret;
+	struct ttm_resource *res;
 
-	drm_WARN_ON(&i915->drm,
-		    drm_mm_node_allocated(&fbc->compressed_fb));
-	drm_WARN_ON(&i915->drm,
-		    drm_mm_node_allocated(&fbc->compressed_llb));
+	drm_WARN_ON(&i915->drm, fbc->compressed_fb);
+	drm_WARN_ON(&i915->drm, fbc->compressed_llb);
 
 	if (DISPLAY_VER(i915) < 5 && !IS_G4X(i915)) {
-		ret = i915_gem_stolen_insert_node(i915, &fbc->compressed_llb,
-						  4096, 4096);
-		if (ret)
+		res = i915_gem_stolen_reserve_range(i915, 4096, I915_GEM_STOLEN_BIAS, 0);
+		if (IS_ERR(res)) {
+			ret = PTR_ERR(res);
 			goto err;
+		}
+		fbc->compressed_llb = res;
 	}
 
 	ret = find_compression_limit(fbc, size, min_limit);
@@ -793,15 +807,14 @@ static int intel_fbc_alloc_cfb(struct intel_fbc *fbc,
 
 	drm_dbg_kms(&i915->drm,
 		    "reserved %llu bytes of contiguous stolen space for FBC, limit: %d\n",
-		    fbc->compressed_fb.size, fbc->limit);
+		    i915_gem_stolen_reserve_size(fbc->compressed_fb), fbc->limit);
 
 	return 0;
 
 err_llb:
-	if (drm_mm_node_allocated(&fbc->compressed_llb))
-		i915_gem_stolen_remove_node(i915, &fbc->compressed_llb);
+	i915_gem_stolen_release_range(i915, fetch_and_zero(&fbc->compressed_llb));
 err:
-	if (drm_mm_initialized(&i915->mm.stolen))
+	if (IS_ERR(res) && (PTR_ERR(res) == -ENOMEM || PTR_ERR(res) == -ENXIO))
 		drm_info_once(&i915->drm, "not enough stolen space for compressed buffer (need %d more bytes), disabling. Hint: you may be able to increase stolen memory size in the BIOS to avoid this.\n", size);
 	return -ENOSPC;
 }
@@ -826,10 +839,10 @@ static void __intel_fbc_cleanup_cfb(struct intel_fbc *fbc)
 	if (WARN_ON(intel_fbc_hw_is_active(fbc)))
 		return;
 
-	if (drm_mm_node_allocated(&fbc->compressed_llb))
-		i915_gem_stolen_remove_node(i915, &fbc->compressed_llb);
-	if (drm_mm_node_allocated(&fbc->compressed_fb))
-		i915_gem_stolen_remove_node(i915, &fbc->compressed_fb);
+	if (fbc->compressed_llb)
+		i915_gem_stolen_release_range(i915, fetch_and_zero(&fbc->compressed_llb));
+	if (fbc->compressed_fb)
+		i915_gem_stolen_release_range(i915, fetch_and_zero(&fbc->compressed_fb));
 }
 
 void intel_fbc_cleanup(struct drm_i915_private *i915)
@@ -1034,9 +1047,10 @@ static bool intel_fbc_is_cfb_ok(const struct intel_plane_state *plane_state)
 {
 	struct intel_plane *plane = to_intel_plane(plane_state->uapi.plane);
 	struct intel_fbc *fbc = plane->fbc;
+	u64 fb_size = i915_gem_stolen_reserve_size(fbc->compressed_fb);
 
 	return intel_fbc_min_limit(plane_state) <= fbc->limit &&
-		intel_fbc_cfb_size(plane_state) <= fbc->compressed_fb.size * fbc->limit;
+		intel_fbc_cfb_size(plane_state) <= fb_size * fbc->limit;
 }
 
 static bool intel_fbc_is_ok(const struct intel_plane_state *plane_state)
@@ -1702,7 +1716,7 @@ void intel_fbc_init(struct drm_i915_private *i915)
 {
 	enum intel_fbc_id fbc_id;
 
-	if (!drm_mm_initialized(&i915->mm.stolen))
+	if (!i915->mm.stolen_region)
 		mkwrite_device_info(i915)->display.fbc_mask = 0;
 
 	if (need_fbc_vtd_wa(i915))
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 07bc11247a3e..13466cca7af8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -633,8 +633,6 @@ struct drm_i915_gem_object {
 		} userptr;
 #endif
 
-		struct drm_mm_node *stolen;
-
 		resource_size_t bo_offset;
 
 		unsigned long scratch;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 47b5e0e342ab..e7fdccf628d8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -4,75 +4,111 @@
  * Copyright © 2008-2012 Intel Corporation
  */
 
+#include "drm/ttm/ttm_placement.h"
+#include "gem/i915_gem_object_types.h"
 #include <linux/errno.h>
 #include <linux/mutex.h>
 
-#include <drm/drm_mm.h>
 #include <drm/i915_drm.h>
 
 #include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_region.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_region_lmem.h"
+#include "gem/i915_gem_ttm.h"
 #include "i915_drv.h"
 #include "i915_gem_stolen.h"
 #include "i915_reg.h"
 #include "i915_utils.h"
 #include "i915_vgpu.h"
 #include "intel_mchbar_regs.h"
+#include "intel_region_ttm.h"
 
-/*
- * The BIOS typically reserves some of the system's memory for the exclusive
- * use of the integrated graphics. This memory is no longer available for
- * use by the OS and so the user finds that his system has less memory
- * available than he put in. We refer to this memory as stolen.
- *
- * The BIOS will allocate its framebuffer from the stolen memory. Our
- * goal is try to reuse that object for our own fbcon which must always
- * be available for panics. Anything else we can reuse the stolen memory
- * for is a boon.
- */
+struct stolen_region {
+	struct intel_memory_region region;
+	struct ttm_resource *wa_resvd;
+};
 
-int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *i915,
-					 struct drm_mm_node *node, u64 size,
-					 unsigned alignment, u64 start, u64 end)
+inline struct stolen_region *to_stolen_region(struct intel_memory_region *mem)
 {
-	int ret;
-
-	if (!drm_mm_initialized(&i915->mm.stolen))
-		return -ENODEV;
+	return container_of(mem, struct stolen_region, region);
+}
 
-	/* WaSkipStolenMemoryFirstPage:bdw+ */
-	if (GRAPHICS_VER(i915) >= 8 && start < 4096)
-		start = 4096;
+/**
+ * i915_gem_stolen_reserve_range - reserve a region of space in a given range
+ * @i915: i915 device instance
+ * @size: size of region to reserve
+ * @start: start of search area
+ * @end: end of search area
+ *
+ * Search for @size amount of free space within the region delimeted by @start and @end.
+ * If found reserve it from future use until later release with @i915_gem_stolen_release_range.
+ *
+ * Return: pointer to resource tracking structure on success, ERR_PTR otherwise
+ */
+struct ttm_resource *
+i915_gem_stolen_reserve_range(struct drm_i915_private *i915,
+			      resource_size_t size,
+			      u64 start, u64 end)
+{
+	struct intel_memory_region *mem = i915->mm.stolen_region;
 
-	mutex_lock(&i915->mm.stolen_lock);
-	ret = drm_mm_insert_node_in_range(&i915->mm.stolen, node,
-					  size, alignment, 0,
-					  start, end, DRM_MM_INSERT_BEST);
-	mutex_unlock(&i915->mm.stolen_lock);
+	if (!mem)
+		return ERR_PTR(-ENODEV);
+	return intel_region_ttm_resource_alloc(mem, size, start, end, I915_BO_ALLOC_CONTIGUOUS);
+}
 
-	return ret;
+/**
+ * i915_gem_stolen_reserve_offset - return the offset of the reserved space
+ * @res: pointer to resource tracking structure to check
+ *
+ * Return: The offset of the reserved resource, or I915_BO_INVALID_OFFSET on error
+ */
+u64 i915_gem_stolen_reserve_offset(struct ttm_resource *res)
+{
+	if (!res)
+		return I915_BO_INVALID_OFFSET;
+	return PFN_PHYS(res->start);
 }
 
-int i915_gem_stolen_insert_node(struct drm_i915_private *i915,
-				struct drm_mm_node *node, u64 size,
-				unsigned alignment)
+/**
+ * i915_gem_stolen_reserve_size - return the reserved size of the reserved space
+ * @res: pointer to resource tracking structure to check
+ *
+ * Return: The size of the reserved resource, or I915_BO_INVALID_OFFSET on error
+ */
+u64 i915_gem_stolen_reserve_size(struct ttm_resource *res)
 {
-	return i915_gem_stolen_insert_node_in_range(i915, node,
-						    size, alignment,
-						    I915_GEM_STOLEN_BIAS,
-						    U64_MAX);
+	if (!res)
+		return I915_BO_INVALID_OFFSET;
+	return PFN_PHYS(res->num_pages);
 }
 
-void i915_gem_stolen_remove_node(struct drm_i915_private *i915,
-				 struct drm_mm_node *node)
+/**
+ * i915_gem_stolen_release_range - release the reserved area to be free for allocation again
+ * @i915: i915 device instance
+ * @res: pointer to resource tracking structure allocated via @i915_gem_stolen_reserve_range
+ */
+void i915_gem_stolen_release_range(struct drm_i915_private *i915,
+				   struct ttm_resource *res)
 {
-	mutex_lock(&i915->mm.stolen_lock);
-	drm_mm_remove_node(node);
-	mutex_unlock(&i915->mm.stolen_lock);
+	struct intel_memory_region *mem = i915->mm.stolen_region;
+
+	intel_region_ttm_resource_free(mem, res);
 }
 
+/*
+ * The BIOS typically reserves some of the system's memory for the exclusive
+ * use of the integrated graphics. This memory is no longer available for
+ * use by the OS and so the user finds that his system has less memory
+ * available than he put in. We refer to this memory as stolen.
+ *
+ * The BIOS will allocate its framebuffer from the stolen memory. Our
+ * goal is try to reuse that object for our own fbcon which must always
+ * be available for panics. Anything else we can reuse the stolen memory
+ * for is a boon.
+ */
+
 static int i915_adjust_stolen(struct drm_i915_private *i915,
 			      struct resource *dsm)
 {
@@ -173,14 +209,6 @@ static int i915_adjust_stolen(struct drm_i915_private *i915,
 	return 0;
 }
 
-static void i915_gem_cleanup_stolen(struct drm_i915_private *i915)
-{
-	if (!drm_mm_initialized(&i915->mm.stolen))
-		return;
-
-	drm_mm_takedown(&i915->mm.stolen);
-}
-
 static void g4x_get_stolen_reserved(struct drm_i915_private *i915,
 				    struct intel_uncore *uncore,
 				    resource_size_t *base,
@@ -394,8 +422,7 @@ static int i915_gem_init_stolen(struct intel_memory_region *mem)
 	struct intel_uncore *uncore = &i915->uncore;
 	resource_size_t reserved_base, stolen_top;
 	resource_size_t reserved_total, reserved_size;
-
-	mutex_init(&i915->mm.stolen_lock);
+	int ret;
 
 	if (intel_vgpu_active(i915)) {
 		drm_notice(&i915->drm,
@@ -513,258 +540,100 @@ static int i915_gem_init_stolen(struct intel_memory_region *mem)
 		return 0;
 
 	/* Basic memrange allocator for stolen space. */
-	drm_mm_init(&i915->mm.stolen, 0, i915->stolen_usable_size);
-
-	return 0;
-}
-
-static void dbg_poison(struct i915_ggtt *ggtt,
-		       dma_addr_t addr, resource_size_t size,
-		       u8 x)
-{
-#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
-	if (!drm_mm_node_allocated(&ggtt->error_capture))
-		return;
-
-	if (ggtt->vm.bind_async_flags & I915_VMA_GLOBAL_BIND)
-		return; /* beware stop_machine() inversion */
-
-	GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
-
-	mutex_lock(&ggtt->error_mutex);
-	while (size) {
-		void __iomem *s;
-
-		ggtt->vm.insert_page(&ggtt->vm, addr,
-				     ggtt->error_capture.start,
-				     I915_CACHE_NONE, 0);
-		mb();
-
-		s = io_mapping_map_wc(&ggtt->iomap,
-				      ggtt->error_capture.start,
-				      PAGE_SIZE);
-		memset_io(s, x, PAGE_SIZE);
-		io_mapping_unmap(s);
-
-		addr += PAGE_SIZE;
-		size -= PAGE_SIZE;
-	}
-	mb();
-	ggtt->vm.clear_range(&ggtt->vm, ggtt->error_capture.start, PAGE_SIZE);
-	mutex_unlock(&ggtt->error_mutex);
-#endif
-}
-
-static struct sg_table *
-i915_pages_create_for_stolen(struct drm_device *dev,
-			     resource_size_t offset, resource_size_t size)
-{
-	struct drm_i915_private *i915 = to_i915(dev);
-	struct sg_table *st;
-	struct scatterlist *sg;
-
-	GEM_BUG_ON(range_overflows(offset, size, resource_size(&i915->dsm)));
-
-	/* We hide that we have no struct page backing our stolen object
-	 * by wrapping the contiguous physical allocation with a fake
-	 * dma mapping in a single scatterlist.
-	 */
+	ret = intel_region_ttm_init(mem);
+	if (ret)
+		return ret;
 
-	st = kmalloc(sizeof(*st), GFP_KERNEL);
-	if (st == NULL)
-		return ERR_PTR(-ENOMEM);
+	/* WaSkipStolenMemoryFirstPage:bdw+ */
+	if (GRAPHICS_VER(i915) >= 8) {
+		struct stolen_region *region = to_stolen_region(mem);
+		struct ttm_resource *resvd;
+
+		resvd = intel_region_ttm_resource_alloc(mem, 4096, 0, 4096, 0);
+		if (IS_ERR(resvd)) {
+			ret = PTR_ERR(resvd);
+			goto err_no_reserve;
+		}
 
-	if (sg_alloc_table(st, 1, GFP_KERNEL)) {
-		kfree(st);
-		return ERR_PTR(-ENOMEM);
+		region->wa_resvd = resvd;
 	}
 
-	sg = st->sgl;
-	sg->offset = 0;
-	sg->length = size;
+	return ret;
 
-	sg_dma_address(sg) = (dma_addr_t)i915->dsm.start + offset;
-	sg_dma_len(sg) = size;
+err_no_reserve:
+	intel_region_ttm_fini(mem);
 
-	return st;
+	return ret;
 }
 
-static int i915_gem_object_get_pages_stolen(struct drm_i915_gem_object *obj)
+struct drm_i915_gem_object *
+i915_gem_object_create_stolen(struct drm_i915_private *i915,
+			      resource_size_t size)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	struct sg_table *pages =
-		i915_pages_create_for_stolen(obj->base.dev,
-					     obj->stolen->start,
-					     obj->stolen->size);
-	if (IS_ERR(pages))
-		return PTR_ERR(pages);
-
-	dbg_poison(to_gt(i915)->ggtt,
-		   sg_dma_address(pages->sgl),
-		   sg_dma_len(pages->sgl),
-		   POISON_INUSE);
-
-	__i915_gem_object_set_pages(obj, pages, obj->stolen->size);
-
-	return 0;
+	return i915_gem_object_create_region(i915->mm.stolen_region, size, 0,
+					     I915_BO_ALLOC_CONTIGUOUS | I915_BO_ALLOC_PINNED);
 }
 
-static void i915_gem_object_put_pages_stolen(struct drm_i915_gem_object *obj,
-					     struct sg_table *pages)
+static struct intel_memory_region *alloc_stolen_region(void)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	/* Should only be called from i915_gem_object_release_stolen() */
+	struct stolen_region *region = kzalloc(sizeof(*region), GFP_KERNEL);
 
-	dbg_poison(to_gt(i915)->ggtt,
-		   sg_dma_address(pages->sgl),
-		   sg_dma_len(pages->sgl),
-		   POISON_FREE);
+	if (!region)
+		return NULL;
 
-	sg_free_table(pages);
-	kfree(pages);
+	return &region->region;
 }
 
-static void
-i915_gem_object_release_stolen(struct drm_i915_gem_object *obj)
+static void free_stolen_region(struct intel_memory_region *mem)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	struct drm_mm_node *stolen = fetch_and_zero(&obj->stolen);
-
-	GEM_BUG_ON(!stolen);
-	i915_gem_stolen_remove_node(i915, stolen);
-	kfree(stolen);
+	struct stolen_region *region = to_stolen_region(mem);
 
-	i915_gem_object_release_memory_region(obj);
+	kfree(region);
 }
 
-static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
-	.name = "i915_gem_object_stolen",
-	.get_pages = i915_gem_object_get_pages_stolen,
-	.put_pages = i915_gem_object_put_pages_stolen,
-	.release = i915_gem_object_release_stolen,
-};
-
-static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
-					   struct drm_i915_gem_object *obj,
-					   struct drm_mm_node *stolen)
+static int init_stolen_smem(struct intel_memory_region *mem)
 {
-	static struct lock_class_key lock_class;
-	unsigned int cache_level;
-	unsigned int flags;
-	int err;
-
 	/*
-	 * Stolen objects are always physically contiguous since we just
-	 * allocate one big block underneath using the drm_mm range allocator.
+	 * Initialise stolen early so that we may reserve preallocated
+	 * objects for the BIOS to KMS transition.
 	 */
-	flags = I915_BO_ALLOC_CONTIGUOUS;
-
-	drm_gem_private_object_init(&mem->i915->drm, &obj->base, stolen->size);
-	i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, flags);
-
-	obj->stolen = stolen;
-	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
-	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
-	i915_gem_object_set_cache_coherency(obj, cache_level);
-
-	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
-		return -EBUSY;
+	return i915_gem_init_stolen(mem);
+}
 
-	i915_gem_object_init_memory_region(obj, mem);
+static int release_stolen(struct intel_memory_region *mem)
+{
+	struct stolen_region *region = to_stolen_region(mem);
 
-	err = i915_gem_object_pin_pages(obj);
-	if (err)
-		i915_gem_object_release_memory_region(obj);
-	i915_gem_object_unlock(obj);
+	if (region->wa_resvd)
+		intel_region_ttm_resource_free(mem, region->wa_resvd);
 
-	return err;
+	return intel_region_ttm_fini(mem);
 }
 
-static int _i915_gem_object_stolen_init(struct intel_memory_region *mem,
-					struct drm_i915_gem_object *obj,
-					resource_size_t offset,
-					resource_size_t size,
-					resource_size_t page_size,
-					unsigned int flags)
+static int stolen_object_init(struct intel_memory_region *mem,
+			      struct drm_i915_gem_object *obj,
+			      resource_size_t offset,
+			      resource_size_t size,
+			      resource_size_t page_size,
+			      unsigned int flags)
 {
-	struct drm_i915_private *i915 = mem->i915;
-	struct drm_mm_node *stolen;
-	int ret;
-
-	if (!drm_mm_initialized(&i915->mm.stolen))
+	if (!mem->region_private)
 		return -ENODEV;
 
 	if (size == 0)
 		return -EINVAL;
 
-	/*
-	 * With discrete devices, where we lack a mappable aperture there is no
-	 * possible way to ever access this memory on the CPU side.
-	 */
-	if (mem->type == INTEL_MEMORY_STOLEN_LOCAL && !mem->io_size &&
-	    !(flags & I915_BO_ALLOC_GPU_ONLY))
-		return -ENOSPC;
-
-	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
-	if (!stolen)
-		return -ENOMEM;
-
-	if (offset != I915_BO_INVALID_OFFSET) {
-		drm_dbg(&i915->drm,
-			"creating preallocated stolen object: stolen_offset=%pa, size=%pa\n",
-			&offset, &size);
-
-		stolen->start = offset;
-		stolen->size = size;
-		mutex_lock(&i915->mm.stolen_lock);
-		ret = drm_mm_reserve_node(&i915->mm.stolen, stolen);
-		mutex_unlock(&i915->mm.stolen_lock);
-	} else {
-		ret = i915_gem_stolen_insert_node(i915, stolen, size,
-						  mem->min_page_size);
-	}
-	if (ret)
-		goto err_free;
-
-	ret = __i915_gem_object_create_stolen(mem, obj, stolen);
-	if (ret)
-		goto err_remove;
-
-	return 0;
-
-err_remove:
-	i915_gem_stolen_remove_node(i915, stolen);
-err_free:
-	kfree(stolen);
-	return ret;
-}
-
-struct drm_i915_gem_object *
-i915_gem_object_create_stolen(struct drm_i915_private *i915,
-			      resource_size_t size)
-{
-	return i915_gem_object_create_region(i915->mm.stolen_region, size, 0, 0);
-}
-
-static int init_stolen_smem(struct intel_memory_region *mem)
-{
-	/*
-	 * Initialise stolen early so that we may reserve preallocated
-	 * objects for the BIOS to KMS transition.
-	 */
-	return i915_gem_init_stolen(mem);
-}
-
-static int release_stolen_smem(struct intel_memory_region *mem)
-{
-	i915_gem_cleanup_stolen(mem->i915);
-	return 0;
+	/* default range manager relies on page_size for alignment */
+	page_size = max(page_size, mem->min_page_size);
+	return __i915_gem_ttm_object_init(mem, obj, offset, size, page_size, flags);
 }
 
 static const struct intel_memory_region_ops i915_region_stolen_smem_ops = {
 	.init = init_stolen_smem,
-	.release = release_stolen_smem,
-	.init_object = _i915_gem_object_stolen_init,
+	.release = release_stolen,
+	.init_object = stolen_object_init,
+	.alloc = alloc_stolen_region,
+	.free = free_stolen_region,
 };
 
 static int init_stolen_lmem(struct intel_memory_region *mem)
@@ -793,7 +662,7 @@ static int init_stolen_lmem(struct intel_memory_region *mem)
 	return 0;
 
 err_cleanup:
-	i915_gem_cleanup_stolen(mem->i915);
+	intel_region_ttm_fini(mem);
 	return err;
 }
 
@@ -801,14 +670,15 @@ static int release_stolen_lmem(struct intel_memory_region *mem)
 {
 	if (mem->io_size)
 		io_mapping_fini(&mem->iomap);
-	i915_gem_cleanup_stolen(mem->i915);
-	return 0;
+	return release_stolen(mem);
 }
 
 static const struct intel_memory_region_ops i915_region_stolen_lmem_ops = {
 	.init = init_stolen_lmem,
 	.release = release_stolen_lmem,
-	.init_object = _i915_gem_object_stolen_init,
+	.init_object = stolen_object_init,
+	.alloc = alloc_stolen_region,
+	.free = free_stolen_region,
 };
 
 struct intel_memory_region *
@@ -896,7 +766,37 @@ i915_gem_stolen_smem_setup(struct drm_i915_private *i915, u16 type,
 	return mem;
 }
 
-bool i915_gem_object_is_stolen(const struct drm_i915_gem_object *obj)
+/**
+ * i915_gem_object_stolen_offset - return offset from start of stolen region
+ * @obj: the object to return the offset of
+ *
+ * Get the offset from stolen region if this object is currently placed in stolen memory.
+ *
+ * Return: offset from stolen if successful, I915_BO_INVALID_OFFSET otherwise
+ */
+u64 i915_gem_object_stolen_offset(struct drm_i915_gem_object *obj)
 {
-	return obj->ops == &i915_gem_object_stolen_ops;
+	struct ttm_buffer_object *ttm_obj;
+
+	if (!obj || !i915_gem_object_is_stolen(obj))
+		return I915_BO_INVALID_OFFSET;
+
+	ttm_obj = i915_gem_to_ttm(obj);
+	if (ttm_obj->resource->mem_type != I915_PL_STOLEN)
+		return I915_BO_INVALID_OFFSET;
+
+	return PFN_PHYS(ttm_obj->resource->start);
+}
+
+bool i915_gem_object_is_stolen(struct drm_i915_gem_object *obj)
+{
+	struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
+
+#ifdef CONFIG_LOCKDEP
+	if (i915_gem_object_migratable(obj) &&
+	    i915_gem_object_evictable(obj))
+		assert_object_held(obj);
+#endif
+	return mr && (mr->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+		      mr->type == INTEL_MEMORY_STOLEN_LOCAL);
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h
index d5005a39d130..b39cb6e6c768 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h
@@ -10,17 +10,9 @@
 
 struct drm_i915_private;
 struct drm_mm_node;
+struct ttm_resource;
 struct drm_i915_gem_object;
 
-int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
-				struct drm_mm_node *node, u64 size,
-				unsigned alignment);
-int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *dev_priv,
-					 struct drm_mm_node *node, u64 size,
-					 unsigned alignment, u64 start,
-					 u64 end);
-void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
-				 struct drm_mm_node *node);
 struct intel_memory_region *
 i915_gem_stolen_smem_setup(struct drm_i915_private *i915, u16 type,
 			   u16 instance);
@@ -32,7 +24,16 @@ struct drm_i915_gem_object *
 i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
 			      resource_size_t size);
 
-bool i915_gem_object_is_stolen(const struct drm_i915_gem_object *obj);
+u64 i915_gem_object_stolen_offset(struct drm_i915_gem_object *obj);
+bool i915_gem_object_is_stolen(struct drm_i915_gem_object *obj);
+struct ttm_resource *
+i915_gem_stolen_reserve_range(struct drm_i915_private *i915,
+			      resource_size_t size,
+			      u64 start, u64 end);
+u64 i915_gem_stolen_reserve_offset(struct ttm_resource *res);
+u64 i915_gem_stolen_reserve_size(struct ttm_resource *res);
+void i915_gem_stolen_release_range(struct drm_i915_private *i915,
+				   struct ttm_resource *res);
 
 #define I915_GEM_STOLEN_BIAS SZ_128K
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index bb988608296d..c7611efddcf7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1203,6 +1203,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	struct ttm_place _place;
 	struct ttm_placement _placement;
 	struct ttm_placement *placement;
+	bool is_stolen = intel_region_to_ttm_type(mem) == I915_PL_STOLEN;
 	int ret;
 
 	drm_gem_private_object_init(&i915->drm, &obj->base, size);
@@ -1222,7 +1223,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	obj->base.vma_node.driver_private = i915_gem_to_ttm(obj);
 
 	/* Forcing the page size is kernel internal only */
-	GEM_BUG_ON(page_size && obj->mm.n_placements);
+	GEM_BUG_ON(page_size && obj->mm.n_placements && !is_stolen);
 
 	/*
 	 * Keep an extra shrink pin to prevent the object from being made
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index 73e371aa3850..81654a51df51 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -49,6 +49,13 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 			       resource_size_t size,
 			       resource_size_t page_size,
 			       unsigned int flags);
+int i915_gem_ttm_object_init_in_place(struct intel_memory_region *mem,
+				      struct drm_i915_gem_object *obj,
+				      resource_size_t size,
+				      resource_size_t page_size,
+				      unsigned int flags,
+				      u64 start,
+				      u64 end);
 
 /* Internal I915 TTM declarations and definitions below. */
 
diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c
index f8d0523f4c18..846eef898f7e 100644
--- a/drivers/gpu/drm/i915/gt/intel_rc6.c
+++ b/drivers/gpu/drm/i915/gt/intel_rc6.c
@@ -355,9 +355,9 @@ static int vlv_rc6_init(struct intel_rc6 *rc6)
 
 	GEM_BUG_ON(range_overflows_end_t(u64,
 					 i915->dsm.start,
-					 pctx->stolen->start,
+					 i915_gem_object_stolen_offset(pctx),
 					 U32_MAX));
-	pctx_paddr = i915->dsm.start + pctx->stolen->start;
+	pctx_paddr = i915->dsm.start + i915_gem_object_stolen_offset(pctx);
 	intel_uncore_write(uncore, VLV_PCBR, pctx_paddr);
 
 out:
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 37c38bdd5f47..75bc7d90c9dc 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -6,6 +6,7 @@
 #include <linux/crc32.h>
 
 #include "gem/i915_gem_stolen.h"
+#include "intel_region_ttm.h"
 
 #include "i915_memcpy.h"
 #include "i915_selftest.h"
@@ -83,6 +84,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 		dma_addr_t dma = (dma_addr_t)dsm->start + (page << PAGE_SHIFT);
 		void __iomem *s;
 		void *in;
+		bool busy;
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
@@ -93,9 +95,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 				      ggtt->error_capture.start,
 				      PAGE_SIZE);
 
-		if (!__drm_mm_interval_first(&gt->i915->mm.stolen,
-					     page << PAGE_SHIFT,
-					     ((page + 1) << PAGE_SHIFT) - 1))
+		busy = intel_region_ttm_range_busy(gt->i915->mm.stolen_region,
+						   PFN_PHYS(page), PAGE_SIZE);
+		if (!busy)
 			memset_io(s, STACK_MAGIC, PAGE_SIZE);
 
 		in = (void __force *)s;
@@ -124,6 +126,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 		void __iomem *s;
 		void *in;
 		u32 x;
+		bool busy;
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
@@ -139,10 +142,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 			in = tmp;
 		x = crc32_le(0, in, PAGE_SIZE);
 
-		if (x != crc[page] &&
-		    !__drm_mm_interval_first(&gt->i915->mm.stolen,
-					     page << PAGE_SHIFT,
-					     ((page + 1) << PAGE_SHIFT) - 1)) {
+		busy = intel_region_ttm_range_busy(gt->i915->mm.stolen_region,
+						   PFN_PHYS(page), PAGE_SIZE);
+		if (x != crc[page] && !busy) {
 			pr_debug("unused stolen page %pa modified by GPU reset\n",
 				 &page);
 			if (count++ == 0)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 94e5c29d2ee3..e538b8f71dcc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -32,6 +32,7 @@
 
 #include <drm/drm_debugfs.h>
 
+#include "gem/i915_gem_region.h"
 #include "gem/i915_gem_context.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_buffer_pool.h"
@@ -157,6 +158,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	struct i915_vma *vma;
 	int pin_count = 0;
+	u64 offset;
 
 	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
 		   &obj->base,
@@ -241,8 +243,9 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	spin_unlock(&obj->vma.lock);
 
 	seq_printf(m, " (pinned x %d)", pin_count);
-	if (i915_gem_object_is_stolen(obj))
-		seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
+	offset = i915_gem_object_stolen_offset(obj);
+	if (offset != I915_BO_INVALID_OFFSET)
+		seq_printf(m, " (stolen: %08llx)", offset);
 	if (i915_gem_object_is_framebuffer(obj))
 		seq_printf(m, " (fb)");
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index eba94fa76b18..1ebb266e7b63 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -224,11 +224,6 @@ struct i915_gem_mm {
 	 * support stolen.
 	 */
 	struct intel_memory_region *stolen_region;
-	/** Memory allocator for GTT stolen memory */
-	struct drm_mm stolen;
-	/** Protects the usage of the GTT stolen memory allocator. This is
-	 * always the inner lock when overlapping with struct_mutex. */
-	struct mutex stolen_lock;
 
 	/* Protects bound_list/unbound_list and #drm_i915_gem_object.mm.link */
 	spinlock_t obj_lock;
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 694e9acb69e2..b3850188981b 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -194,11 +194,12 @@ intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
 	}
 }
 
-#ifdef CONFIG_DRM_I915_SELFTEST
 /**
  * intel_region_ttm_resource_alloc - Allocate memory resources from a region
  * @mem: The memory region,
  * @size: The requested size in bytes
+ * @start: start of allowed range
+ * @end: end of allowed range
  * @flags: Allocation flags
  *
  * This functionality is provided only for callers that need to allocate
@@ -212,8 +213,9 @@ intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
  */
 struct ttm_resource *
 intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
-				resource_size_t offset,
 				resource_size_t size,
+				u64 start,
+				u64 end,
 				unsigned int flags)
 {
 	struct ttm_resource_manager *man = mem->region_private;
@@ -222,11 +224,14 @@ intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
 	struct ttm_resource *res;
 	int ret;
 
+	if (!man)
+		return ERR_PTR(-ENODEV);
+
 	if (flags & I915_BO_ALLOC_CONTIGUOUS)
 		place.flags |= TTM_PL_FLAG_CONTIGUOUS;
-	if (offset != I915_BO_INVALID_OFFSET) {
-		place.fpfn = offset >> PAGE_SHIFT;
-		place.lpfn = place.fpfn + (size >> PAGE_SHIFT);
+	if (start || end) {
+		place.fpfn = PFN_DOWN(start);
+		place.lpfn = PFN_UP(end);
 	} else if (mem->io_size && mem->io_size < mem->total) {
 		if (flags & I915_BO_ALLOC_GPU_ONLY) {
 			place.flags |= TTM_PL_FLAG_TOPDOWN;
@@ -247,8 +252,6 @@ intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
 	return ret ? ERR_PTR(ret) : res;
 }
 
-#endif
-
 /**
  * intel_region_ttm_resource_free - Free a resource allocated from a resource manager
  * @mem: The region the resource was allocated from.
@@ -260,9 +263,34 @@ void intel_region_ttm_resource_free(struct intel_memory_region *mem,
 	struct ttm_resource_manager *man = mem->region_private;
 	struct ttm_buffer_object mock_bo = {};
 
+	if (!man)
+		return;
+
 	mock_bo.base.size = res->num_pages << PAGE_SHIFT;
 	mock_bo.bdev = &mem->i915->bdev;
 	res->bo = &mock_bo;
 
 	man->func->free(man, res);
 }
+
+/**
+ * intel_region_ttm_range_busy - check whether range has any allocations
+ * @mem: The region to check
+ * @start: the start of the range to check
+ * @size: size of the range to check
+ *
+ * Return: true if something is alloceted within the region, false otherwise.
+ */
+bool intel_region_ttm_range_busy(struct intel_memory_region *mem,
+				 u64 start, u64 size)
+{
+	struct ttm_resource *dummy;
+
+	dummy = intel_region_ttm_resource_alloc(mem, size, start, start + size,
+						I915_BO_ALLOC_CONTIGUOUS);
+	if (IS_ERR(dummy))
+		return true;
+
+	intel_region_ttm_resource_free(mem, dummy);
+	return false;
+}
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
index cf9d86dcf409..1e88472fb2ea 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.h
+++ b/drivers/gpu/drm/i915/intel_region_ttm.h
@@ -29,15 +29,17 @@ intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
 void intel_region_ttm_resource_free(struct intel_memory_region *mem,
 				    struct ttm_resource *res);
 
+bool intel_region_ttm_range_busy(struct intel_memory_region *mem,
+				 u64 start, u64 size);
+
 int intel_region_to_ttm_type(const struct intel_memory_region *mem);
 
 struct ttm_device_funcs *i915_ttm_driver(void);
 
-#ifdef CONFIG_DRM_I915_SELFTEST
 struct ttm_resource *
 intel_region_ttm_resource_alloc(struct intel_memory_region *mem,
-				resource_size_t offset,
 				resource_size_t size,
+				u64 start,
+				u64 end,
 				unsigned int flags);
-#endif
 #endif /* _INTEL_REGION_TTM_H_ */
diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c
index 670557ce1024..a41d81dc345d 100644
--- a/drivers/gpu/drm/i915/selftests/mock_region.c
+++ b/drivers/gpu/drm/i915/selftests/mock_region.c
@@ -24,10 +24,20 @@ static int mock_region_get_pages(struct drm_i915_gem_object *obj)
 {
 	struct sg_table *pages;
 	int err;
+	u64 start, end;
+
+	if (obj->bo_offset == I915_BO_INVALID_OFFSET) {
+		start = 0;
+		end = 0;
+	} else {
+		start = obj->bo_offset;
+		end = obj->bo_offset + obj->base.size;
+	}
 
 	obj->mm.res = intel_region_ttm_resource_alloc(obj->mm.region,
-						      obj->bo_offset,
 						      obj->base.size,
+						      start,
+						      end,
 						      obj->flags);
 	if (IS_ERR(obj->mm.res))
 		return PTR_ERR(obj->mm.res);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: ttm for stolen (rev3)
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
                   ` (10 preceding siblings ...)
  (?)
@ 2022-06-21  1:09 ` Patchwork
  -1 siblings, 0 replies; 48+ messages in thread
From: Patchwork @ 2022-06-21  1:09 UTC (permalink / raw)
  To: Robert Beckett; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: ttm for stolen (rev3)
URL   : https://patchwork.freedesktop.org/series/101396/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev3)
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
                   ` (11 preceding siblings ...)
  (?)
@ 2022-06-21  1:39 ` Patchwork
  -1 siblings, 0 replies; 48+ messages in thread
From: Patchwork @ 2022-06-21  1:39 UTC (permalink / raw)
  To: Robert Beckett; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 14318 bytes --]

== Series Details ==

Series: drm/i915: ttm for stolen (rev3)
URL   : https://patchwork.freedesktop.org/series/101396/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11786 -> Patchwork_101396v3
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_101396v3 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_101396v3, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/index.html

Participating hosts (41 -> 43)
------------------------------

  Additional (3): fi-kbl-soraka bat-dg2-8 fi-rkl-11600 
  Missing    (1): fi-bdw-samus 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_101396v3:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@reset:
    - fi-rkl-guc:         [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/fi-rkl-guc/igt@i915_selftest@live@reset.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-guc/igt@i915_selftest@live@reset.html
    - bat-adlp-4:         [PASS][3] -> [DMESG-FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/bat-adlp-4/igt@i915_selftest@live@reset.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/bat-adlp-4/igt@i915_selftest@live@reset.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live@reset:
    - {bat-adlp-6}:       [PASS][5] -> [DMESG-FAIL][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/bat-adlp-6/igt@i915_selftest@live@reset.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/bat-adlp-6/igt@i915_selftest@live@reset.html

  * igt@kms_busy@basic@flip:
    - {bat-dg2-9}:        [PASS][7] -> [DMESG-WARN][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/bat-dg2-9/igt@kms_busy@basic@flip.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/bat-dg2-9/igt@kms_busy@basic@flip.html

  
Known issues
------------

  Here are the changes found in Patchwork_101396v3 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][9] ([fdo#109271] / [i915#2190])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-kbl-soraka/igt@gem_huc_copy@huc-copy.html
    - fi-rkl-11600:       NOTRUN -> [SKIP][10] ([i915#2190])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@basic:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][11] ([fdo#109271] / [i915#4613]) +3 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-kbl-soraka/igt@gem_lmem_swapping@basic.html
    - fi-rkl-11600:       NOTRUN -> [SKIP][12] ([i915#4613]) +3 similar issues
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@gem_lmem_swapping@basic.html

  * igt@gem_tiled_pread_basic:
    - fi-rkl-11600:       NOTRUN -> [SKIP][13] ([i915#3282])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
    - fi-rkl-11600:       NOTRUN -> [SKIP][14] ([i915#3012])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@i915_pm_backlight@basic-brightness.html

  * igt@i915_selftest@live@gem:
    - fi-pnv-d510:        NOTRUN -> [DMESG-FAIL][15] ([i915#4528])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-pnv-d510/igt@i915_selftest@live@gem.html

  * igt@i915_selftest@live@gt_pm:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-FAIL][16] ([i915#1886])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@hangcheck:
    - fi-snb-2600:        [PASS][17] -> [INCOMPLETE][18] ([i915#3921])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
    - bat-dg1-6:          [PASS][19] -> [DMESG-FAIL][20] ([i915#4494] / [i915#4957])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/bat-dg1-6/igt@i915_selftest@live@hangcheck.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/bat-dg1-6/igt@i915_selftest@live@hangcheck.html

  * igt@i915_suspend@basic-s3-without-i915:
    - fi-rkl-11600:       NOTRUN -> [INCOMPLETE][21] ([i915#5982])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_busy@basic@flip:
    - bat-adlp-4:         [PASS][22] -> [DMESG-WARN][23] ([i915#3576])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/bat-adlp-4/igt@kms_busy@basic@flip.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/bat-adlp-4/igt@kms_busy@basic@flip.html

  * igt@kms_chamelium@hdmi-edid-read:
    - fi-rkl-11600:       NOTRUN -> [SKIP][24] ([fdo#111827]) +7 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@kms_chamelium@hdmi-edid-read.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][25] ([fdo#109271] / [fdo#111827]) +7 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-kbl-soraka/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][26] ([fdo#109271]) +9 similar issues
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-kbl-soraka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy:
    - fi-rkl-11600:       NOTRUN -> [SKIP][27] ([i915#4103]) +1 similar issue
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-rkl-11600:       NOTRUN -> [SKIP][28] ([fdo#109285] / [i915#4098])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-rkl-11600:       NOTRUN -> [SKIP][29] ([i915#533])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html
    - fi-kbl-soraka:      NOTRUN -> [SKIP][30] ([fdo#109271] / [i915#533])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-kbl-soraka/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@kms_psr@primary_page_flip:
    - fi-rkl-11600:       NOTRUN -> [SKIP][31] ([i915#1072]) +3 similar issues
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@kms_psr@primary_page_flip.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - fi-rkl-11600:       NOTRUN -> [SKIP][32] ([i915#3555] / [i915#4098])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-read:
    - fi-rkl-11600:       NOTRUN -> [SKIP][33] ([fdo#109295] / [i915#3291] / [i915#3708]) +2 similar issues
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@prime_vgem@basic-read.html

  * igt@prime_vgem@basic-userptr:
    - fi-rkl-11600:       NOTRUN -> [SKIP][34] ([fdo#109295] / [i915#3301] / [i915#3708])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-rkl-11600/igt@prime_vgem@basic-userptr.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s0@smem:
    - {fi-ehl-2}:         [DMESG-WARN][35] ([i915#5122]) -> [PASS][36]
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/fi-ehl-2/igt@gem_exec_suspend@basic-s0@smem.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-ehl-2/igt@gem_exec_suspend@basic-s0@smem.html

  * igt@i915_selftest@live@gtt:
    - fi-bdw-5557u:       [DMESG-FAIL][37] ([i915#3674]) -> [PASS][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/fi-bdw-5557u/igt@i915_selftest@live@gtt.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-bdw-5557u/igt@i915_selftest@live@gtt.html

  * igt@i915_selftest@live@requests:
    - fi-pnv-d510:        [DMESG-FAIL][39] ([i915#4528]) -> [PASS][40]
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/fi-pnv-d510/igt@i915_selftest@live@requests.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/fi-pnv-d510/igt@i915_selftest@live@requests.html

  * igt@kms_flip@basic-flip-vs-modeset@a-edp1:
    - {bat-adlp-6}:       [DMESG-WARN][41] ([i915#3576]) -> [PASS][42]
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/bat-adlp-6/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/bat-adlp-6/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html

  * igt@kms_flip@basic-flip-vs-wf_vblank@a-edp1:
    - bat-adlp-4:         [DMESG-WARN][43] ([i915#3576]) -> [PASS][44] +1 similar issue
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11786/bat-adlp-4/igt@kms_flip@basic-flip-vs-wf_vblank@a-edp1.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/bat-adlp-4/igt@kms_flip@basic-flip-vs-wf_vblank@a-edp1.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1155]: https://gitlab.freedesktop.org/drm/intel/issues/1155
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#3012]: https://gitlab.freedesktop.org/drm/intel/issues/3012
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#3595]: https://gitlab.freedesktop.org/drm/intel/issues/3595
  [i915#3674]: https://gitlab.freedesktop.org/drm/intel/issues/3674
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3921]: https://gitlab.freedesktop.org/drm/intel/issues/3921
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4579]: https://gitlab.freedesktop.org/drm/intel/issues/4579
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4873]: https://gitlab.freedesktop.org/drm/intel/issues/4873
  [i915#4957]: https://gitlab.freedesktop.org/drm/intel/issues/4957
  [i915#5122]: https://gitlab.freedesktop.org/drm/intel/issues/5122
  [i915#5174]: https://gitlab.freedesktop.org/drm/intel/issues/5174
  [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190
  [i915#5270]: https://gitlab.freedesktop.org/drm/intel/issues/5270
  [i915#5274]: https://gitlab.freedesktop.org/drm/intel/issues/5274
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#5763]: https://gitlab.freedesktop.org/drm/intel/issues/5763
  [i915#5982]: https://gitlab.freedesktop.org/drm/intel/issues/5982


Build changes
-------------

  * Linux: CI_DRM_11786 -> Patchwork_101396v3

  CI-20190529: 20190529
  CI_DRM_11786: ab3bfa333f25d26bb8bf414419f9a2e6a46a141f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6537: 331658a8475c8b0c0f7ffe5268a7318ef83da34e @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_101396v3: ab3bfa333f25d26bb8bf414419f9a2e6a46a141f @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

a1d70c3cc916 drm/i915: stolen memory use ttm backend
2c64d9ebd97c drm/i915/ttm: add buffer pin on alloc flag
3abc482f5079 drm/i915: allow memory region creators to alloc and free the region
935868b734da drm/i915: ttm move/clear logic fix
89100835a17b drm/i915: sanitize mem_flags for stolen buffers
5c25b8204ae2 drm/i915: instantiate ttm ranger manager for stolen memory
2546c2d528c1 drm/i915/gem: selftest should not attempt mmap of private regions
e45c7df632f4 drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level
bd3cd3cf8a29 drm/i915: limit ttm to dma32 for i965G[M]
df2b4c0c5eeb drm/i915/ttm: dont trample cache_level overrides during ttm move

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v3/index.html

[-- Attachment #2: Type: text/html, Size: 15508 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: ttm for stolen (rev4)
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
                   ` (12 preceding siblings ...)
  (?)
@ 2022-06-21 15:10 ` Patchwork
  -1 siblings, 0 replies; 48+ messages in thread
From: Patchwork @ 2022-06-21 15:10 UTC (permalink / raw)
  To: Robert Beckett; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: ttm for stolen (rev4)
URL   : https://patchwork.freedesktop.org/series/101396/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev4)
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
                   ` (13 preceding siblings ...)
  (?)
@ 2022-06-21 15:32 ` Patchwork
  -1 siblings, 0 replies; 48+ messages in thread
From: Patchwork @ 2022-06-21 15:32 UTC (permalink / raw)
  To: Robert Beckett; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 7936 bytes --]

== Series Details ==

Series: drm/i915: ttm for stolen (rev4)
URL   : https://patchwork.freedesktop.org/series/101396/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11788 -> Patchwork_101396v4
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_101396v4 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_101396v4, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/index.html

Participating hosts (41 -> 40)
------------------------------

  Additional (1): bat-dg2-9 
  Missing    (2): fi-icl-u2 fi-bdw-samus 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_101396v4:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@reset:
    - bat-adlp-4:         [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11788/bat-adlp-4/igt@i915_selftest@live@reset.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/bat-adlp-4/igt@i915_selftest@live@reset.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@kms_busy@basic@flip:
    - {bat-dg2-9}:        NOTRUN -> [DMESG-WARN][3]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/bat-dg2-9/igt@kms_busy@basic@flip.html

  
Known issues
------------

  Here are the changes found in Patchwork_101396v4 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_pm_rpm@module-reload:
    - bat-adlp-4:         [PASS][4] -> [DMESG-WARN][5] ([i915#1888] / [i915#3576])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11788/bat-adlp-4/igt@i915_pm_rpm@module-reload.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/bat-adlp-4/igt@i915_pm_rpm@module-reload.html
    - fi-bsw-n3050:       [PASS][6] -> [FAIL][7] ([i915#6042])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11788/fi-bsw-n3050/igt@i915_pm_rpm@module-reload.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/fi-bsw-n3050/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live@hangcheck:
    - fi-hsw-g3258:       [PASS][8] -> [INCOMPLETE][9] ([i915#4785])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11788/fi-hsw-g3258/igt@i915_selftest@live@hangcheck.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/fi-hsw-g3258/igt@i915_selftest@live@hangcheck.html

  * igt@kms_busy@basic@flip:
    - bat-adlp-4:         [PASS][10] -> [DMESG-WARN][11] ([i915#3576]) +1 similar issue
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11788/bat-adlp-4/igt@kms_busy@basic@flip.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/bat-adlp-4/igt@kms_busy@basic@flip.html
    - fi-tgl-u2:          [PASS][12] -> [DMESG-WARN][13] ([i915#402]) +1 similar issue
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11788/fi-tgl-u2/igt@kms_busy@basic@flip.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/fi-tgl-u2/igt@kms_busy@basic@flip.html

  * igt@runner@aborted:
    - fi-hsw-g3258:       NOTRUN -> [FAIL][14] ([fdo#109271] / [i915#4312] / [i915#6246])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/fi-hsw-g3258/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@hangcheck:
    - bat-dg1-6:          [DMESG-FAIL][15] ([i915#4494] / [i915#4957]) -> [PASS][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11788/bat-dg1-6/igt@i915_selftest@live@hangcheck.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/bat-dg1-6/igt@i915_selftest@live@hangcheck.html

  * igt@kms_busy@basic@modeset:
    - bat-adlp-4:         [DMESG-WARN][17] ([i915#3576]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11788/bat-adlp-4/igt@kms_busy@basic@modeset.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/bat-adlp-4/igt@kms_busy@basic@modeset.html

  * igt@kms_flip@basic-flip-vs-modeset@b-edp1:
    - {bat-adlp-6}:       [DMESG-WARN][19] ([i915#3576]) -> [PASS][20] +1 similar issue
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11788/bat-adlp-6/igt@kms_flip@basic-flip-vs-modeset@b-edp1.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/bat-adlp-6/igt@kms_flip@basic-flip-vs-modeset@b-edp1.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#3595]: https://gitlab.freedesktop.org/drm/intel/issues/3595
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4785]: https://gitlab.freedesktop.org/drm/intel/issues/4785
  [i915#4957]: https://gitlab.freedesktop.org/drm/intel/issues/4957
  [i915#5122]: https://gitlab.freedesktop.org/drm/intel/issues/5122
  [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903
  [i915#6042]: https://gitlab.freedesktop.org/drm/intel/issues/6042
  [i915#6246]: https://gitlab.freedesktop.org/drm/intel/issues/6246


Build changes
-------------

  * Linux: CI_DRM_11788 -> Patchwork_101396v4

  CI-20190529: 20190529
  CI_DRM_11788: ec6f2ba27fe4ed0f5f6aca3977f62793b65bb1b6 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6537: 331658a8475c8b0c0f7ffe5268a7318ef83da34e @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_101396v4: ec6f2ba27fe4ed0f5f6aca3977f62793b65bb1b6 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

1ccfb13b3dd4 drm/i915: stolen memory use ttm backend
1f0a4f9e1589 drm/i915/ttm: add buffer pin on alloc flag
a1161f42a91f drm/i915: allow memory region creators to alloc and free the region
6203c83e2d80 drm/i915: ttm move/clear logic fix
46d68f12720a drm/i915: sanitize mem_flags for stolen buffers
56fda106e476 drm/i915: instantiate ttm ranger manager for stolen memory
3e6660a8ae89 drm/i915/gem: selftest should not attempt mmap of private regions
97ef5cdac3c7 drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level
5cc00573a107 drm/i915: limit ttm to dma32 for i965G[M]
7d7f92f49606 drm/i915/ttm: dont trample cache_level overrides during ttm move

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v4/index.html

[-- Attachment #2: Type: text/html, Size: 7930 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: ttm for stolen (rev5)
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
                   ` (14 preceding siblings ...)
  (?)
@ 2022-06-21 17:16 ` Patchwork
  -1 siblings, 0 replies; 48+ messages in thread
From: Patchwork @ 2022-06-21 17:16 UTC (permalink / raw)
  To: Robert Beckett; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: ttm for stolen (rev5)
URL   : https://patchwork.freedesktop.org/series/101396/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)
  2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
                   ` (15 preceding siblings ...)
  (?)
@ 2022-06-21 17:37 ` Patchwork
  2022-06-21 19:11   ` Robert Beckett
  -1 siblings, 1 reply; 48+ messages in thread
From: Patchwork @ 2022-06-21 17:37 UTC (permalink / raw)
  To: Robert Beckett; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 12076 bytes --]

== Series Details ==

Series: drm/i915: ttm for stolen (rev5)
URL   : https://patchwork.freedesktop.org/series/101396/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_101396v5 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_101396v5, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html

Participating hosts (40 -> 41)
------------------------------

  Additional (2): fi-icl-u2 bat-dg2-9 
  Missing    (1): fi-bdw-samus 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_101396v5:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@reset:
    - bat-adlp-4:         [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@kms_busy@basic@flip:
    - {bat-dg2-9}:        NOTRUN -> [DMESG-WARN][3]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-dg2-9/igt@kms_busy@basic@flip.html

  
Known issues
------------

  Here are the changes found in Patchwork_101396v5 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
    - fi-icl-u2:          NOTRUN -> [SKIP][4] ([i915#2190])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@random-engines:
    - fi-icl-u2:          NOTRUN -> [SKIP][5] ([i915#4613]) +3 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@gem_lmem_swapping@random-engines.html

  * igt@i915_pm_rpm@module-reload:
    - bat-adlp-4:         [PASS][6] -> [DMESG-WARN][7] ([i915#1888] / [i915#3576])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_pm_rpm@module-reload.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live@hangcheck:
    - bat-dg1-6:          [PASS][8] -> [DMESG-FAIL][9] ([i915#4494] / [i915#4957])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-dg1-6/igt@i915_selftest@live@hangcheck.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-dg1-6/igt@i915_selftest@live@hangcheck.html

  * igt@i915_suspend@basic-s3-without-i915:
    - fi-icl-u2:          NOTRUN -> [SKIP][10] ([i915#5903])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_busy@basic@flip:
    - bat-adlp-4:         [PASS][11] -> [DMESG-WARN][12] ([i915#3576])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@kms_busy@basic@flip.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@kms_busy@basic@flip.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-hsw-g3258:       NOTRUN -> [SKIP][13] ([fdo#109271] / [fdo#111827])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-hsw-g3258/igt@kms_chamelium@common-hpd-after-suspend.html
    - fi-hsw-4770:        NOTRUN -> [SKIP][14] ([fdo#109271] / [fdo#111827])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-hsw-4770/igt@kms_chamelium@common-hpd-after-suspend.html
    - fi-blb-e6850:       NOTRUN -> [SKIP][15] ([fdo#109271])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-blb-e6850/igt@kms_chamelium@common-hpd-after-suspend.html
    - fi-pnv-d510:        NOTRUN -> [SKIP][16] ([fdo#109271])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-pnv-d510/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-icl-u2:          NOTRUN -> [SKIP][17] ([fdo#111827]) +8 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy:
    - fi-icl-u2:          NOTRUN -> [SKIP][18] ([fdo#109278] / [i915#4103]) +1 similar issue
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html

  * igt@kms_flip@basic-flip-vs-modeset@a-edp1:
    - fi-tgl-u2:          [PASS][19] -> [DMESG-WARN][20] ([i915#402]) +1 similar issue
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html

  * igt@kms_force_connector_basic@force-connector-state:
    - fi-icl-u2:          NOTRUN -> [WARN][21] ([i915#6008])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_force_connector_basic@force-connector-state.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-icl-u2:          NOTRUN -> [SKIP][22] ([fdo#109285])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-icl-u2:          NOTRUN -> [SKIP][23] ([fdo#109278])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - fi-icl-u2:          NOTRUN -> [SKIP][24] ([i915#3555])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-userptr:
    - fi-icl-u2:          NOTRUN -> [SKIP][25] ([fdo#109295] / [i915#3301])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@prime_vgem@basic-userptr.html

  * igt@runner@aborted:
    - fi-cfl-guc:         NOTRUN -> [FAIL][26] ([i915#4312])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-cfl-guc/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@hangcheck:
    - fi-hsw-4770:        [INCOMPLETE][27] ([i915#3303] / [i915#4785]) -> [PASS][28]
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
    - fi-hsw-g3258:       [INCOMPLETE][29] ([i915#3303] / [i915#4785]) -> [PASS][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/fi-hsw-g3258/igt@i915_selftest@live@hangcheck.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-hsw-g3258/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@requests:
    - fi-blb-e6850:       [DMESG-FAIL][31] ([i915#4528]) -> [PASS][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/fi-blb-e6850/igt@i915_selftest@live@requests.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-blb-e6850/igt@i915_selftest@live@requests.html
    - fi-pnv-d510:        [DMESG-FAIL][33] ([i915#4528]) -> [PASS][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/fi-pnv-d510/igt@i915_selftest@live@requests.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-pnv-d510/igt@i915_selftest@live@requests.html

  * igt@kms_flip@basic-flip-vs-modeset@a-edp1:
    - bat-adlp-4:         [DMESG-WARN][35] ([i915#3576]) -> [PASS][36]
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html

  * igt@kms_flip@basic-plain-flip@a-edp1:
    - {bat-adlp-6}:       [DMESG-WARN][37] ([i915#3576]) -> [PASS][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-6/igt@kms_flip@basic-plain-flip@a-edp1.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-6/igt@kms_flip@basic-plain-flip@a-edp1.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3303]: https://gitlab.freedesktop.org/drm/intel/issues/3303
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#3595]: https://gitlab.freedesktop.org/drm/intel/issues/3595
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4785]: https://gitlab.freedesktop.org/drm/intel/issues/4785
  [i915#4957]: https://gitlab.freedesktop.org/drm/intel/issues/4957
  [i915#5122]: https://gitlab.freedesktop.org/drm/intel/issues/5122
  [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190
  [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903
  [i915#6008]: https://gitlab.freedesktop.org/drm/intel/issues/6008


Build changes
-------------

  * Linux: CI_DRM_11790 -> Patchwork_101396v5

  CI-20190529: 20190529
  CI_DRM_11790: a838c72d1e41413a5ace5e3cf2ada93c2e81f66b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6537: 331658a8475c8b0c0f7ffe5268a7318ef83da34e @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_101396v5: a838c72d1e41413a5ace5e3cf2ada93c2e81f66b @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

cb4854cf1605 drm/i915: stolen memory use ttm backend
25a96de09969 drm/i915/ttm: add buffer pin on alloc flag
ad58b349349b drm/i915: allow memory region creators to alloc and free the region
785ff03c0542 drm/i915: ttm move/clear logic fix
d0902500e074 drm/i915: sanitize mem_flags for stolen buffers
8a7630243d04 drm/i915: instantiate ttm ranger manager for stolen memory
055ef33c2ac8 drm/i915/gem: selftest should not attempt mmap of private regions
eb22d2a146a5 drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level
f02995012a97 drm/i915: limit ttm to dma32 for i965G[M]
1770347378a1 drm/i915/ttm: dont trample cache_level overrides during ttm move

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html

[-- Attachment #2: Type: text/html, Size: 13648 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)
  2022-06-21 17:37 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
@ 2022-06-21 19:11   ` Robert Beckett
  2022-06-22  9:05     ` Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Robert Beckett @ 2022-06-21 19:11 UTC (permalink / raw)
  To: intel-gfx



On 21/06/2022 18:37, Patchwork wrote:
> *Patch Details*
> *Series:*	drm/i915: ttm for stolen (rev5)
> *URL:*	https://patchwork.freedesktop.org/series/101396/ 
> <https://patchwork.freedesktop.org/series/101396/>
> *State:*	failure
> *Details:* 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html>
> 
> 
>   CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5
> 
> 
>     Summary
> 
> *FAILURE*
> 
> Serious unknown changes coming with Patchwork_101396v5 absolutely need to be
> verified manually.
> 
> If you think the reported changes have nothing to do with the changes
> introduced in Patchwork_101396v5, please notify your bug team to allow them
> to document this new failure mode, which will reduce false positives in CI.
> 
> External URL: 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
> 
> 
>     Participating hosts (40 -> 41)
> 
> Additional (2): fi-icl-u2 bat-dg2-9
> Missing (1): fi-bdw-samus
> 
> 
>     Possible new issues
> 
> Here are the unknown changes that may have been introduced in 
> Patchwork_101396v5:
> 
> 
>       IGT changes
> 
> 
>         Possible regressions
> 
>   * igt@i915_selftest@live@reset:
>       o bat-adlp-4: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html>
>         -> DMESG-FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html>
> 

I keep hitting clobbered pages during engine resets on bat-adlp-4.
It seems to happen most of the time on that machine and occasionally on 
bat-adlp-6.

Should bat-adlp-4 be considered an unreliable machine like bat-adlp-6 is 
for now?

Alternatively, seeing the history of this in

commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude low 
pages (128KiB) of stolen from use"

could this be an indication that maybe the original issue is worse on 
adlp machines?
I have only ever seen page page 135 or 136 clobbered across many runs 
via trybot, so it looks fairly consistent.
Though excluding the use of over 540K of stolen might be too severe.


> 
>         Suppressed
> 
> The following results come from untrusted machines, tests, or statuses.
> They do not affect the overall result.
> 
>   * igt@kms_busy@basic@flip:
>       o {bat-dg2-9}: NOTRUN -> DMESG-WARN
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-dg2-9/igt@kms_busy@basic@flip.html>
> 
> 
>     Known issues
> 
> Here are the changes found in Patchwork_101396v5 that come from known 
> issues:
> 
> 
>       IGT changes
> 
> 
>         Issues hit
> 
>   *
> 
>     igt@gem_huc_copy@huc-copy:
> 
>       o fi-icl-u2: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@gem_huc_copy@huc-copy.html>
>         (i915#2190 <https://gitlab.freedesktop.org/drm/intel/issues/2190>)
>   *
> 
>     igt@gem_lmem_swapping@random-engines:
> 
>       o fi-icl-u2: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@gem_lmem_swapping@random-engines.html>
>         (i915#4613
>         <https://gitlab.freedesktop.org/drm/intel/issues/4613>) +3
>         similar issues
>   *
> 
>     igt@i915_pm_rpm@module-reload:
> 
>       o bat-adlp-4: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_pm_rpm@module-reload.html>
>         -> DMESG-WARN
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_pm_rpm@module-reload.html>
>         (i915#1888
>         <https://gitlab.freedesktop.org/drm/intel/issues/1888> /
>         i915#3576 <https://gitlab.freedesktop.org/drm/intel/issues/3576>)
>   *
> 
>     igt@i915_selftest@live@hangcheck:
> 
>       o bat-dg1-6: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-dg1-6/igt@i915_selftest@live@hangcheck.html>
>         -> DMESG-FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-dg1-6/igt@i915_selftest@live@hangcheck.html>
>         (i915#4494
>         <https://gitlab.freedesktop.org/drm/intel/issues/4494> /
>         i915#4957 <https://gitlab.freedesktop.org/drm/intel/issues/4957>)
>   *
> 
>     igt@i915_suspend@basic-s3-without-i915:
> 
>       o fi-icl-u2: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@i915_suspend@basic-s3-without-i915.html>
>         (i915#5903 <https://gitlab.freedesktop.org/drm/intel/issues/5903>)
>   *
> 
>     igt@kms_busy@basic@flip:
> 
>       o bat-adlp-4: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@kms_busy@basic@flip.html>
>         -> DMESG-WARN
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@kms_busy@basic@flip.html>
>         (i915#3576 <https://gitlab.freedesktop.org/drm/intel/issues/3576>)
>   *
> 
>     igt@kms_chamelium@common-hpd-after-suspend:
> 
>       o
> 
>         fi-hsw-g3258: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-hsw-g3258/igt@kms_chamelium@common-hpd-after-suspend.html>
>         (fdo#109271
>         <https://bugs.freedesktop.org/show_bug.cgi?id=109271> /
>         fdo#111827 <https://bugs.freedesktop.org/show_bug.cgi?id=111827>)
> 
>       o
> 
>         fi-hsw-4770: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-hsw-4770/igt@kms_chamelium@common-hpd-after-suspend.html>
>         (fdo#109271
>         <https://bugs.freedesktop.org/show_bug.cgi?id=109271> /
>         fdo#111827 <https://bugs.freedesktop.org/show_bug.cgi?id=111827>)
> 
>       o
> 
>         fi-blb-e6850: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-blb-e6850/igt@kms_chamelium@common-hpd-after-suspend.html>
>         (fdo#109271 <https://bugs.freedesktop.org/show_bug.cgi?id=109271>)
> 
>       o
> 
>         fi-pnv-d510: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-pnv-d510/igt@kms_chamelium@common-hpd-after-suspend.html>
>         (fdo#109271 <https://bugs.freedesktop.org/show_bug.cgi?id=109271>)
> 
>   *
> 
>     igt@kms_chamelium@hdmi-hpd-fast:
> 
>       o fi-icl-u2: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html>
>         (fdo#111827
>         <https://bugs.freedesktop.org/show_bug.cgi?id=111827>) +8
>         similar issues
>   *
> 
>     igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy:
> 
>       o fi-icl-u2: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html>
>         (fdo#109278
>         <https://bugs.freedesktop.org/show_bug.cgi?id=109278> /
>         i915#4103
>         <https://gitlab.freedesktop.org/drm/intel/issues/4103>) +1
>         similar issue
>   *
> 
>     igt@kms_flip@basic-flip-vs-modeset@a-edp1:
> 
>       o fi-tgl-u2: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html>
>         -> DMESG-WARN
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html>
>         (i915#402 <https://gitlab.freedesktop.org/drm/intel/issues/402>)
>         +1 similar issue
>   *
> 
>     igt@kms_force_connector_basic@force-connector-state:
> 
>       o fi-icl-u2: NOTRUN -> WARN
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_force_connector_basic@force-connector-state.html>
>         (i915#6008 <https://gitlab.freedesktop.org/drm/intel/issues/6008>)
>   *
> 
>     igt@kms_force_connector_basic@force-load-detect:
> 
>       o fi-icl-u2: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_force_connector_basic@force-load-detect.html>
>         (fdo#109285 <https://bugs.freedesktop.org/show_bug.cgi?id=109285>)
>   *
> 
>     igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
> 
>       o fi-icl-u2: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html>
>         (fdo#109278 <https://bugs.freedesktop.org/show_bug.cgi?id=109278>)
>   *
> 
>     igt@kms_setmode@basic-clone-single-crtc:
> 
>       o fi-icl-u2: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@kms_setmode@basic-clone-single-crtc.html>
>         (i915#3555 <https://gitlab.freedesktop.org/drm/intel/issues/3555>)
>   *
> 
>     igt@prime_vgem@basic-userptr:
> 
>       o fi-icl-u2: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-icl-u2/igt@prime_vgem@basic-userptr.html>
>         (fdo#109295
>         <https://bugs.freedesktop.org/show_bug.cgi?id=109295> /
>         i915#3301 <https://gitlab.freedesktop.org/drm/intel/issues/3301>)
>   *
> 
>     igt@runner@aborted:
> 
>       o fi-cfl-guc: NOTRUN -> FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-cfl-guc/igt@runner@aborted.html>
>         (i915#4312 <https://gitlab.freedesktop.org/drm/intel/issues/4312>)
> 
> 
>         Possible fixes
> 
>   *
> 
>     igt@i915_selftest@live@hangcheck:
> 
>       o
> 
>         fi-hsw-4770: INCOMPLETE
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html>
>         (i915#3303
>         <https://gitlab.freedesktop.org/drm/intel/issues/3303> /
>         i915#4785
>         <https://gitlab.freedesktop.org/drm/intel/issues/4785>) -> PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html>
> 
>       o
> 
>         fi-hsw-g3258: INCOMPLETE
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/fi-hsw-g3258/igt@i915_selftest@live@hangcheck.html>
>         (i915#3303
>         <https://gitlab.freedesktop.org/drm/intel/issues/3303> /
>         i915#4785
>         <https://gitlab.freedesktop.org/drm/intel/issues/4785>) -> PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-hsw-g3258/igt@i915_selftest@live@hangcheck.html>
> 
>   *
> 
>     igt@i915_selftest@live@requests:
> 
>       o
> 
>         fi-blb-e6850: DMESG-FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/fi-blb-e6850/igt@i915_selftest@live@requests.html>
>         (i915#4528
>         <https://gitlab.freedesktop.org/drm/intel/issues/4528>) -> PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-blb-e6850/igt@i915_selftest@live@requests.html>
> 
>       o
> 
>         fi-pnv-d510: DMESG-FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/fi-pnv-d510/igt@i915_selftest@live@requests.html>
>         (i915#4528
>         <https://gitlab.freedesktop.org/drm/intel/issues/4528>) -> PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/fi-pnv-d510/igt@i915_selftest@live@requests.html>
> 
>   *
> 
>     igt@kms_flip@basic-flip-vs-modeset@a-edp1:
> 
>       o bat-adlp-4: DMESG-WARN
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html>
>         (i915#3576
>         <https://gitlab.freedesktop.org/drm/intel/issues/3576>) -> PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html>
>   *
> 
>     igt@kms_flip@basic-plain-flip@a-edp1:
> 
>       o {bat-adlp-6}: DMESG-WARN
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-6/igt@kms_flip@basic-plain-flip@a-edp1.html>
>         (i915#3576
>         <https://gitlab.freedesktop.org/drm/intel/issues/3576>) -> PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-6/igt@kms_flip@basic-plain-flip@a-edp1.html>
> 
> {name}: This element is suppressed. This means it is ignored when computing
> the status of the difference (SUCCESS, WARNING, or FAILURE).
> 
> 
>     Build changes
> 
>   * Linux: CI_DRM_11790 -> Patchwork_101396v5
> 
> CI-20190529: 20190529
> CI_DRM_11790: a838c72d1e41413a5ace5e3cf2ada93c2e81f66b @ 
> git://anongit.freedesktop.org/gfx-ci/linux
> IGT_6537: 331658a8475c8b0c0f7ffe5268a7318ef83da34e @ 
> https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
> Patchwork_101396v5: a838c72d1e41413a5ace5e3cf2ada93c2e81f66b @ 
> git://anongit.freedesktop.org/gfx-ci/linux
> 
> 
>       Linux commits
> 
> cb4854cf1605 drm/i915: stolen memory use ttm backend
> 25a96de09969 drm/i915/ttm: add buffer pin on alloc flag
> ad58b349349b drm/i915: allow memory region creators to alloc and free 
> the region
> 785ff03c0542 drm/i915: ttm move/clear logic fix
> d0902500e074 drm/i915: sanitize mem_flags for stolen buffers
> 8a7630243d04 drm/i915: instantiate ttm ranger manager for stolen memory
> 055ef33c2ac8 drm/i915/gem: selftest should not attempt mmap of private 
> regions
> eb22d2a146a5 drm/i915/ttm: only trust snooping for dgfx when deciding 
> default cache_level
> f02995012a97 drm/i915: limit ttm to dma32 for i965G[M]
> 1770347378a1 drm/i915/ttm: dont trample cache_level overrides during ttm 
> move
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)
  2022-06-21 19:11   ` Robert Beckett
@ 2022-06-22  9:05     ` Tvrtko Ursulin
  2022-06-27 17:08       ` Robert Beckett
  0 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2022-06-22  9:05 UTC (permalink / raw)
  To: Robert Beckett, intel-gfx


On 21/06/2022 20:11, Robert Beckett wrote:
> 
> 
> On 21/06/2022 18:37, Patchwork wrote:
>> *Patch Details*
>> *Series:*    drm/i915: ttm for stolen (rev5)
>> *URL:*    https://patchwork.freedesktop.org/series/101396/ 
>> <https://patchwork.freedesktop.org/series/101396/>
>> *State:*    failure
>> *Details:* 
>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html>
>>
>>
>>   CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5
>>
>>
>>     Summary
>>
>> *FAILURE*
>>
>> Serious unknown changes coming with Patchwork_101396v5 absolutely need 
>> to be
>> verified manually.
>>
>> If you think the reported changes have nothing to do with the changes
>> introduced in Patchwork_101396v5, please notify your bug team to allow 
>> them
>> to document this new failure mode, which will reduce false positives 
>> in CI.
>>
>> External URL: 
>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
>>
>>
>>     Participating hosts (40 -> 41)
>>
>> Additional (2): fi-icl-u2 bat-dg2-9
>> Missing (1): fi-bdw-samus
>>
>>
>>     Possible new issues
>>
>> Here are the unknown changes that may have been introduced in 
>> Patchwork_101396v5:
>>
>>
>>       IGT changes
>>
>>
>>         Possible regressions
>>
>>   * igt@i915_selftest@live@reset:
>>       o bat-adlp-4: PASS
>>         
>> <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>
>>         -> DMESG-FAIL
>>         
>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>
>>
> 
> I keep hitting clobbered pages during engine resets on bat-adlp-4.
> It seems to happen most of the time on that machine and occasionally on 
> bat-adlp-6.
> 
> Should bat-adlp-4 be considered an unreliable machine like bat-adlp-6 is 
> for now?
> 
> Alternatively, seeing the history of this in
> 
> commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude low 
> pages (128KiB) of stolen from use"
> 
> could this be an indication that maybe the original issue is worse on 
> adlp machines?
> I have only ever seen page page 135 or 136 clobbered across many runs 
> via trybot, so it looks fairly consistent.
> Though excluding the use of over 540K of stolen might be too severe.

Don't know but I see that on the latest version you even hit pages 165/166.

Any history of hitting this in CI without your series? If not, are there 
some other changes which could explain it? Are you touching the selftest 
itself?

Hexdump of the clobbered page looks quite complex. Especially 
POISON_FREE. Any idea how that ends up there?

Btw what is the benefit of converting stolen to start with? It's not 
much of a backend since it just uses the drm range manager. So quite 
thin and uneventful. Diffstats for the series also do not look like you 
end up with much code reduction?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)
  2022-06-22  9:05     ` Tvrtko Ursulin
@ 2022-06-27 17:08       ` Robert Beckett
  2022-06-28  8:46         ` Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Robert Beckett @ 2022-06-27 17:08 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx



On 22/06/2022 10:05, Tvrtko Ursulin wrote:
> 
> On 21/06/2022 20:11, Robert Beckett wrote:
>>
>>
>> On 21/06/2022 18:37, Patchwork wrote:
>>> *Patch Details*
>>> *Series:*    drm/i915: ttm for stolen (rev5)
>>> *URL:*    https://patchwork.freedesktop.org/series/101396/ 
>>> <https://patchwork.freedesktop.org/series/101396/>
>>> *State:*    failure
>>> *Details:* 
>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html> 
>>>
>>>
>>>
>>>   CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5
>>>
>>>
>>>     Summary
>>>
>>> *FAILURE*
>>>
>>> Serious unknown changes coming with Patchwork_101396v5 absolutely 
>>> need to be
>>> verified manually.
>>>
>>> If you think the reported changes have nothing to do with the changes
>>> introduced in Patchwork_101396v5, please notify your bug team to 
>>> allow them
>>> to document this new failure mode, which will reduce false positives 
>>> in CI.
>>>
>>> External URL: 
>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
>>>
>>>
>>>     Participating hosts (40 -> 41)
>>>
>>> Additional (2): fi-icl-u2 bat-dg2-9
>>> Missing (1): fi-bdw-samus
>>>
>>>
>>>     Possible new issues
>>>
>>> Here are the unknown changes that may have been introduced in 
>>> Patchwork_101396v5:
>>>
>>>
>>>       IGT changes
>>>
>>>
>>>         Possible regressions
>>>
>>>   * igt@i915_selftest@live@reset:
>>>       o bat-adlp-4: PASS
>>> <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>
>>>         -> DMESG-FAIL
>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>
>>>
>>
>> I keep hitting clobbered pages during engine resets on bat-adlp-4.
>> It seems to happen most of the time on that machine and occasionally 
>> on bat-adlp-6.
>>
>> Should bat-adlp-4 be considered an unreliable machine like bat-adlp-6 
>> is for now?
>>
>> Alternatively, seeing the history of this in
>>
>> commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude low 
>> pages (128KiB) of stolen from use"
>>
>> could this be an indication that maybe the original issue is worse on 
>> adlp machines?
>> I have only ever seen page page 135 or 136 clobbered across many runs 
>> via trybot, so it looks fairly consistent.
>> Though excluding the use of over 540K of stolen might be too severe.
> 
> Don't know but I see that on the latest version you even hit pages 165/166.
> 
> Any history of hitting this in CI without your series? If not, are there 
> some other changes which could explain it? Are you touching the selftest 
> itself?
> 
> Hexdump of the clobbered page looks quite complex. Especially 
> POISON_FREE. Any idea how that ends up there?


(see 
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings702)

after lots of slow debug via CI, it looks like the issue is that a ring 
buffer was allocated and taking up that page during the initial crc 
capture in the test, but by the time it came to check for corruption, it 
had been freed from that page.

The test has a number of weaknesses:

1. the busy check is done twice, without taking in to account any change 
in between. I assume previously this could be relied on never to occur, 
but now it can for some reason (more on that later)

2. the engine reset returns early with an error for guc submission 
engines, but it is silently ignored in the test. Perhaps it should 
ignore guc submission engines as it is a largely useless test for those 
situations.


A quick obvious fix is to have a busy bitmask that remembers each page's 
busy state initially and only check for corruption if it was busy during 
both checks.

However, the main question is why this is occurring now with my changes.
I have added more debug to check where the stolen memory is being freed, 
but the first run last night didn't hit the issue for once.
I am running again now, will report back if I figure out where it is 
being freed.

I am pretty sure the "corruption" (which isn't actually corruption) is 
from a ring buffer.
The POISON_FREE is the only difference between the captured before and 
after dumps:

[0040] 00000000 02800000 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 
6b6b6b6b

with the 2nd dword being the MI_ARB_CHECK used for the spinner.
I think this is the request poisoning from i915_request_retire()

The bit I don't know yet is why a ring buffer was freed between the 
initial crc capture and the corruption check. The spinner should be 
active across the entire test, maintaining a ref on the context and it's 
ring.

hopefully my latest debug will give more answers.


> 
> Btw what is the benefit of converting stolen to start with? It's not 
> much of a backend since it just uses the drm range manager. So quite 
> thin and uneventful. Diffstats for the series also do not look like you 
> end up with much code reduction?
> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)
  2022-06-27 17:08       ` Robert Beckett
@ 2022-06-28  8:46         ` Tvrtko Ursulin
  2022-06-28 16:22           ` Robert Beckett
  0 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2022-06-28  8:46 UTC (permalink / raw)
  To: Robert Beckett, intel-gfx


On 27/06/2022 18:08, Robert Beckett wrote:
> 
> 
> On 22/06/2022 10:05, Tvrtko Ursulin wrote:
>>
>> On 21/06/2022 20:11, Robert Beckett wrote:
>>>
>>>
>>> On 21/06/2022 18:37, Patchwork wrote:
>>>> *Patch Details*
>>>> *Series:*    drm/i915: ttm for stolen (rev5)
>>>> *URL:*    https://patchwork.freedesktop.org/series/101396/ 
>>>> <https://patchwork.freedesktop.org/series/101396/>
>>>> *State:*    failure
>>>> *Details:* 
>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html> 
>>>>
>>>>
>>>>
>>>>   CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5
>>>>
>>>>
>>>>     Summary
>>>>
>>>> *FAILURE*
>>>>
>>>> Serious unknown changes coming with Patchwork_101396v5 absolutely 
>>>> need to be
>>>> verified manually.
>>>>
>>>> If you think the reported changes have nothing to do with the changes
>>>> introduced in Patchwork_101396v5, please notify your bug team to 
>>>> allow them
>>>> to document this new failure mode, which will reduce false positives 
>>>> in CI.
>>>>
>>>> External URL: 
>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
>>>>
>>>>
>>>>     Participating hosts (40 -> 41)
>>>>
>>>> Additional (2): fi-icl-u2 bat-dg2-9
>>>> Missing (1): fi-bdw-samus
>>>>
>>>>
>>>>     Possible new issues
>>>>
>>>> Here are the unknown changes that may have been introduced in 
>>>> Patchwork_101396v5:
>>>>
>>>>
>>>>       IGT changes
>>>>
>>>>
>>>>         Possible regressions
>>>>
>>>>   * igt@i915_selftest@live@reset:
>>>>       o bat-adlp-4: PASS
>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>>
>>>>         -> DMESG-FAIL
>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>>
>>>>
>>>
>>> I keep hitting clobbered pages during engine resets on bat-adlp-4.
>>> It seems to happen most of the time on that machine and occasionally 
>>> on bat-adlp-6.
>>>
>>> Should bat-adlp-4 be considered an unreliable machine like bat-adlp-6 
>>> is for now?
>>>
>>> Alternatively, seeing the history of this in
>>>
>>> commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude 
>>> low pages (128KiB) of stolen from use"
>>>
>>> could this be an indication that maybe the original issue is worse on 
>>> adlp machines?
>>> I have only ever seen page page 135 or 136 clobbered across many runs 
>>> via trybot, so it looks fairly consistent.
>>> Though excluding the use of over 540K of stolen might be too severe.
>>
>> Don't know but I see that on the latest version you even hit pages 
>> 165/166.
>>
>> Any history of hitting this in CI without your series? If not, are 
>> there some other changes which could explain it? Are you touching the 
>> selftest itself?
>>
>> Hexdump of the clobbered page looks quite complex. Especially 
>> POISON_FREE. Any idea how that ends up there?
> 
> 
> (see 
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings702) 
> 
> 
> after lots of slow debug via CI, it looks like the issue is that a ring 
> buffer was allocated and taking up that page during the initial crc 
> capture in the test, but by the time it came to check for corruption, it 
> had been freed from that page.
> 
> The test has a number of weaknesses:
> 
> 1. the busy check is done twice, without taking in to account any change 
> in between. I assume previously this could be relied on never to occur, 
> but now it can for some reason (more on that later)

You mean the stolen page used/unused test? Probably the premise is that 
the test controls the driver completely ie. is the sole user and the two 
checks are run at the time where nothing else could have changed the state.

With the nerfed request (as with GuC) this actually should hold. In the 
generic case I am less sure, my working knowledge faded a bit, but 
perhaps there was something guaranteeing the spinner couldn't have been 
retired yet at the time of the second check. Would need clarifying at 
least in comments.
> 
> 2. the engine reset returns early with an error for guc submission 
> engines, but it is silently ignored in the test. Perhaps it should 
> ignore guc submission engines as it is a largely useless test for those 
> situations.

Yes looks dodgy indeed. You will need to summon the owners of the GuC 
backend to comment on this.

However even if the test should be skipped with GuC it is extremely 
interesting that you are hitting this so I suspect there is a more 
serious issue at play.

> A quick obvious fix is to have a busy bitmask that remembers each page's 
> busy state initially and only check for corruption if it was busy during 
> both checks.
> 
> However, the main question is why this is occurring now with my changes.
> I have added more debug to check where the stolen memory is being freed, 
> but the first run last night didn't hit the issue for once.
> I am running again now, will report back if I figure out where it is 
> being freed.
> 
> I am pretty sure the "corruption" (which isn't actually corruption) is 
> from a ring buffer.
> The POISON_FREE is the only difference between the captured before and 
> after dumps:
> 
> [0040] 00000000 02800000 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 
> 6b6b6b6b
> 
> with the 2nd dword being the MI_ARB_CHECK used for the spinner.
> I think this is the request poisoning from i915_request_retire()
> 
> The bit I don't know yet is why a ring buffer was freed between the 
> initial crc capture and the corruption check. The spinner should be 
> active across the entire test, maintaining a ref on the context and it's 
> ring.
> 
> hopefully my latest debug will give more answers.

Yeah if you can figure our whether the a) spinner is still active during 
the 2nd check (as I think it should be), and b) is the corruption 
detected in the same pages which were used in the 1st pass that would be 
interesting.

Regards,

Tvrtko

> 
> 
>>
>> Btw what is the benefit of converting stolen to start with? It's not 
>> much of a backend since it just uses the drm range manager. So quite 
>> thin and uneventful. Diffstats for the series also do not look like 
>> you end up with much code reduction?
>>
>> Regards,
>>
>> Tvrtko

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)
  2022-06-28  8:46         ` Tvrtko Ursulin
@ 2022-06-28 16:22           ` Robert Beckett
  2022-06-29 12:51             ` Robert Beckett
  0 siblings, 1 reply; 48+ messages in thread
From: Robert Beckett @ 2022-06-28 16:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx



On 28/06/2022 09:46, Tvrtko Ursulin wrote:
> 
> On 27/06/2022 18:08, Robert Beckett wrote:
>>
>>
>> On 22/06/2022 10:05, Tvrtko Ursulin wrote:
>>>
>>> On 21/06/2022 20:11, Robert Beckett wrote:
>>>>
>>>>
>>>> On 21/06/2022 18:37, Patchwork wrote:
>>>>> *Patch Details*
>>>>> *Series:*    drm/i915: ttm for stolen (rev5)
>>>>> *URL:*    https://patchwork.freedesktop.org/series/101396/ 
>>>>> <https://patchwork.freedesktop.org/series/101396/>
>>>>> *State:*    failure
>>>>> *Details:* 
>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html> 
>>>>>
>>>>>
>>>>>
>>>>>   CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5
>>>>>
>>>>>
>>>>>     Summary
>>>>>
>>>>> *FAILURE*
>>>>>
>>>>> Serious unknown changes coming with Patchwork_101396v5 absolutely 
>>>>> need to be
>>>>> verified manually.
>>>>>
>>>>> If you think the reported changes have nothing to do with the changes
>>>>> introduced in Patchwork_101396v5, please notify your bug team to 
>>>>> allow them
>>>>> to document this new failure mode, which will reduce false 
>>>>> positives in CI.
>>>>>
>>>>> External URL: 
>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
>>>>>
>>>>>
>>>>>     Participating hosts (40 -> 41)
>>>>>
>>>>> Additional (2): fi-icl-u2 bat-dg2-9
>>>>> Missing (1): fi-bdw-samus
>>>>>
>>>>>
>>>>>     Possible new issues
>>>>>
>>>>> Here are the unknown changes that may have been introduced in 
>>>>> Patchwork_101396v5:
>>>>>
>>>>>
>>>>>       IGT changes
>>>>>
>>>>>
>>>>>         Possible regressions
>>>>>
>>>>>   * igt@i915_selftest@live@reset:
>>>>>       o bat-adlp-4: PASS
>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>>>
>>>>>         -> DMESG-FAIL
>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>>>
>>>>>
>>>>
>>>> I keep hitting clobbered pages during engine resets on bat-adlp-4.
>>>> It seems to happen most of the time on that machine and occasionally 
>>>> on bat-adlp-6.
>>>>
>>>> Should bat-adlp-4 be considered an unreliable machine like 
>>>> bat-adlp-6 is for now?
>>>>
>>>> Alternatively, seeing the history of this in
>>>>
>>>> commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude 
>>>> low pages (128KiB) of stolen from use"
>>>>
>>>> could this be an indication that maybe the original issue is worse 
>>>> on adlp machines?
>>>> I have only ever seen page page 135 or 136 clobbered across many 
>>>> runs via trybot, so it looks fairly consistent.
>>>> Though excluding the use of over 540K of stolen might be too severe.
>>>
>>> Don't know but I see that on the latest version you even hit pages 
>>> 165/166.
>>>
>>> Any history of hitting this in CI without your series? If not, are 
>>> there some other changes which could explain it? Are you touching the 
>>> selftest itself?
>>>
>>> Hexdump of the clobbered page looks quite complex. Especially 
>>> POISON_FREE. Any idea how that ends up there?
>>
>>
>> (see 
>> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings702) 
>>
>>
>> after lots of slow debug via CI, it looks like the issue is that a 
>> ring buffer was allocated and taking up that page during the initial 
>> crc capture in the test, but by the time it came to check for 
>> corruption, it had been freed from that page.
>>
>> The test has a number of weaknesses:
>>
>> 1. the busy check is done twice, without taking in to account any 
>> change in between. I assume previously this could be relied on never 
>> to occur, but now it can for some reason (more on that later)
> 
> You mean the stolen page used/unused test? Probably the premise is that 
> the test controls the driver completely ie. is the sole user and the two 
> checks are run at the time where nothing else could have changed the state.
> 
> With the nerfed request (as with GuC) this actually should hold. In the 
> generic case I am less sure, my working knowledge faded a bit, but 
> perhaps there was something guaranteeing the spinner couldn't have been 
> retired yet at the time of the second check. Would need clarifying at 
> least in comments.
>>
>> 2. the engine reset returns early with an error for guc submission 
>> engines, but it is silently ignored in the test. Perhaps it should 
>> ignore guc submission engines as it is a largely useless test for 
>> those situations.
> 
> Yes looks dodgy indeed. You will need to summon the owners of the GuC 
> backend to comment on this.
> 
> However even if the test should be skipped with GuC it is extremely 
> interesting that you are hitting this so I suspect there is a more 
> serious issue at play.

indeed. That's why I am keen to get to the root cause instead of just 
slapping in a fix.

> 
>> A quick obvious fix is to have a busy bitmask that remembers each 
>> page's busy state initially and only check for corruption if it was 
>> busy during both checks.
>>
>> However, the main question is why this is occurring now with my changes.
>> I have added more debug to check where the stolen memory is being 
>> freed, but the first run last night didn't hit the issue for once.
>> I am running again now, will report back if I figure out where it is 
>> being freed.
>>
>> I am pretty sure the "corruption" (which isn't actually corruption) is 
>> from a ring buffer.
>> The POISON_FREE is the only difference between the captured before and 
>> after dumps:
>>
>> [0040] 00000000 02800000 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 
>> 6b6b6b6b
>>
>> with the 2nd dword being the MI_ARB_CHECK used for the spinner.
>> I think this is the request poisoning from i915_request_retire()
>>
>> The bit I don't know yet is why a ring buffer was freed between the 
>> initial crc capture and the corruption check. The spinner should be 
>> active across the entire test, maintaining a ref on the context and 
>> it's ring.
>>
>> hopefully my latest debug will give more answers.
> 
> Yeah if you can figure our whether the a) spinner is still active during 
> the 2nd check (as I think it should be), and b) is the corruption 
> detected in the same pages which were used in the 1st pass that would be 
> interesting.

yep. The latest run is still stuck in the CI queue after 27 hours.
I think I have enough debug in there to catch it now.
Hopefully I can get a root cause once it gets chance to run.


> 
> Regards,
> 
> Tvrtko
> 
>>
>>
>>>
>>> Btw what is the benefit of converting stolen to start with? It's not 
>>> much of a backend since it just uses the drm range manager. So quite 
>>> thin and uneventful. Diffstats for the series also do not look like 
>>> you end up with much code reduction?
>>>
>>> Regards,
>>>
>>> Tvrtko

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)
  2022-06-28 16:22           ` Robert Beckett
@ 2022-06-29 12:51             ` Robert Beckett
  2022-06-30 14:20               ` Robert Beckett
  0 siblings, 1 reply; 48+ messages in thread
From: Robert Beckett @ 2022-06-29 12:51 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx, Hellstrom, Thomas



On 28/06/2022 17:22, Robert Beckett wrote:
> 
> 
> On 28/06/2022 09:46, Tvrtko Ursulin wrote:
>>
>> On 27/06/2022 18:08, Robert Beckett wrote:
>>>
>>>
>>> On 22/06/2022 10:05, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/06/2022 20:11, Robert Beckett wrote:
>>>>>
>>>>>
>>>>> On 21/06/2022 18:37, Patchwork wrote:
>>>>>> *Patch Details*
>>>>>> *Series:*    drm/i915: ttm for stolen (rev5)
>>>>>> *URL:*    https://patchwork.freedesktop.org/series/101396/ 
>>>>>> <https://patchwork.freedesktop.org/series/101396/>
>>>>>> *State:*    failure
>>>>>> *Details:* 
>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
>>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html> 
>>>>>>
>>>>>>
>>>>>>
>>>>>>   CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5
>>>>>>
>>>>>>
>>>>>>     Summary
>>>>>>
>>>>>> *FAILURE*
>>>>>>
>>>>>> Serious unknown changes coming with Patchwork_101396v5 absolutely 
>>>>>> need to be
>>>>>> verified manually.
>>>>>>
>>>>>> If you think the reported changes have nothing to do with the changes
>>>>>> introduced in Patchwork_101396v5, please notify your bug team to 
>>>>>> allow them
>>>>>> to document this new failure mode, which will reduce false 
>>>>>> positives in CI.
>>>>>>
>>>>>> External URL: 
>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
>>>>>>
>>>>>>
>>>>>>
>>>>>>     Participating hosts (40 -> 41)
>>>>>>
>>>>>> Additional (2): fi-icl-u2 bat-dg2-9
>>>>>> Missing (1): fi-bdw-samus
>>>>>>
>>>>>>
>>>>>>     Possible new issues
>>>>>>
>>>>>> Here are the unknown changes that may have been introduced in 
>>>>>> Patchwork_101396v5:
>>>>>>
>>>>>>
>>>>>>       IGT changes
>>>>>>
>>>>>>
>>>>>>         Possible regressions
>>>>>>
>>>>>>   * igt@i915_selftest@live@reset:
>>>>>>       o bat-adlp-4: PASS
>>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>>>>
>>>>>>         -> DMESG-FAIL
>>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>>>>
>>>>>>
>>>>>
>>>>> I keep hitting clobbered pages during engine resets on bat-adlp-4.
>>>>> It seems to happen most of the time on that machine and 
>>>>> occasionally on bat-adlp-6.
>>>>>
>>>>> Should bat-adlp-4 be considered an unreliable machine like 
>>>>> bat-adlp-6 is for now?
>>>>>
>>>>> Alternatively, seeing the history of this in
>>>>>
>>>>> commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude 
>>>>> low pages (128KiB) of stolen from use"
>>>>>
>>>>> could this be an indication that maybe the original issue is worse 
>>>>> on adlp machines?
>>>>> I have only ever seen page page 135 or 136 clobbered across many 
>>>>> runs via trybot, so it looks fairly consistent.
>>>>> Though excluding the use of over 540K of stolen might be too severe.
>>>>
>>>> Don't know but I see that on the latest version you even hit pages 
>>>> 165/166.
>>>>
>>>> Any history of hitting this in CI without your series? If not, are 
>>>> there some other changes which could explain it? Are you touching 
>>>> the selftest itself?
>>>>
>>>> Hexdump of the clobbered page looks quite complex. Especially 
>>>> POISON_FREE. Any idea how that ends up there?
>>>
>>>
>>> (see 
>>> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings702) 
>>>
>>>
>>> after lots of slow debug via CI, it looks like the issue is that a 
>>> ring buffer was allocated and taking up that page during the initial 
>>> crc capture in the test, but by the time it came to check for 
>>> corruption, it had been freed from that page.
>>>
>>> The test has a number of weaknesses:
>>>
>>> 1. the busy check is done twice, without taking in to account any 
>>> change in between. I assume previously this could be relied on never 
>>> to occur, but now it can for some reason (more on that later)
>>
>> You mean the stolen page used/unused test? Probably the premise is 
>> that the test controls the driver completely ie. is the sole user and 
>> the two checks are run at the time where nothing else could have 
>> changed the state.
>>
>> With the nerfed request (as with GuC) this actually should hold. In 
>> the generic case I am less sure, my working knowledge faded a bit, but 
>> perhaps there was something guaranteeing the spinner couldn't have 
>> been retired yet at the time of the second check. Would need 
>> clarifying at least in comments.
>>>
>>> 2. the engine reset returns early with an error for guc submission 
>>> engines, but it is silently ignored in the test. Perhaps it should 
>>> ignore guc submission engines as it is a largely useless test for 
>>> those situations.
>>
>> Yes looks dodgy indeed. You will need to summon the owners of the GuC 
>> backend to comment on this.
>>
>> However even if the test should be skipped with GuC it is extremely 
>> interesting that you are hitting this so I suspect there is a more 
>> serious issue at play.
> 
> indeed. That's why I am keen to get to the root cause instead of just 
> slapping in a fix.
> 
>>
>>> A quick obvious fix is to have a busy bitmask that remembers each 
>>> page's busy state initially and only check for corruption if it was 
>>> busy during both checks.
>>>
>>> However, the main question is why this is occurring now with my changes.
>>> I have added more debug to check where the stolen memory is being 
>>> freed, but the first run last night didn't hit the issue for once.
>>> I am running again now, will report back if I figure out where it is 
>>> being freed.
>>>
>>> I am pretty sure the "corruption" (which isn't actually corruption) 
>>> is from a ring buffer.
>>> The POISON_FREE is the only difference between the captured before 
>>> and after dumps:
>>>
>>> [0040] 00000000 02800000 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 
>>> 6b6b6b6b
>>>
>>> with the 2nd dword being the MI_ARB_CHECK used for the spinner.
>>> I think this is the request poisoning from i915_request_retire()
>>>
>>> The bit I don't know yet is why a ring buffer was freed between the 
>>> initial crc capture and the corruption check. The spinner should be 
>>> active across the entire test, maintaining a ref on the context and 
>>> it's ring.
>>>
>>> hopefully my latest debug will give more answers.
>>
>> Yeah if you can figure our whether the a) spinner is still active 
>> during the 2nd check (as I think it should be), and b) is the 
>> corruption detected in the same pages which were used in the 1st pass 
>> that would be interesting.
> 
> yep. The latest run is still stuck in the CI queue after 27 hours.
> I think I have enough debug in there to catch it now.
> Hopefully I can get a root cause once it gets chance to run.
> 

https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v7/fi-adl-ddr5/igt@i915_selftest@live@reset.html#dmesg-warnings496

well, the run finally happened.
And it shows that the freed resource happens from a workqueue. Not helpful.

I'll now add a saved stack traces to all objects that saves where it is 
allocated and freed/queued for free.


> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>>
>>>>
>>>> Btw what is the benefit of converting stolen to start with? It's not 
>>>> much of a backend since it just uses the drm range manager. So quite 
>>>> thin and uneventful. Diffstats for the series also do not look like 
>>>> you end up with much code reduction?
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)
  2022-06-29 12:51             ` Robert Beckett
@ 2022-06-30 14:20               ` Robert Beckett
  2022-06-30 14:52                 ` Hellstrom, Thomas
  2022-07-01  7:34                 ` Tvrtko Ursulin
  0 siblings, 2 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-30 14:20 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx, Hellstrom, Thomas, John Harrison,
	Matthew Brost



On 29/06/2022 13:51, Robert Beckett wrote:
> 
> 
> On 28/06/2022 17:22, Robert Beckett wrote:
>>
>>
>> On 28/06/2022 09:46, Tvrtko Ursulin wrote:
>>>
>>> On 27/06/2022 18:08, Robert Beckett wrote:
>>>>
>>>>
>>>> On 22/06/2022 10:05, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 21/06/2022 20:11, Robert Beckett wrote:
>>>>>>
>>>>>>
>>>>>> On 21/06/2022 18:37, Patchwork wrote:
>>>>>>> *Patch Details*
>>>>>>> *Series:*    drm/i915: ttm for stolen (rev5)
>>>>>>> *URL:*    https://patchwork.freedesktop.org/series/101396/ 
>>>>>>> <https://patchwork.freedesktop.org/series/101396/>
>>>>>>> *State:*    failure
>>>>>>> *Details:* 
>>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
>>>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html> 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5
>>>>>>>
>>>>>>>
>>>>>>>     Summary
>>>>>>>
>>>>>>> *FAILURE*
>>>>>>>
>>>>>>> Serious unknown changes coming with Patchwork_101396v5 absolutely 
>>>>>>> need to be
>>>>>>> verified manually.
>>>>>>>
>>>>>>> If you think the reported changes have nothing to do with the 
>>>>>>> changes
>>>>>>> introduced in Patchwork_101396v5, please notify your bug team to 
>>>>>>> allow them
>>>>>>> to document this new failure mode, which will reduce false 
>>>>>>> positives in CI.
>>>>>>>
>>>>>>> External URL: 
>>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>     Participating hosts (40 -> 41)
>>>>>>>
>>>>>>> Additional (2): fi-icl-u2 bat-dg2-9
>>>>>>> Missing (1): fi-bdw-samus
>>>>>>>
>>>>>>>
>>>>>>>     Possible new issues
>>>>>>>
>>>>>>> Here are the unknown changes that may have been introduced in 
>>>>>>> Patchwork_101396v5:
>>>>>>>
>>>>>>>
>>>>>>>       IGT changes
>>>>>>>
>>>>>>>
>>>>>>>         Possible regressions
>>>>>>>
>>>>>>>   * igt@i915_selftest@live@reset:
>>>>>>>       o bat-adlp-4: PASS
>>>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>>>>>
>>>>>>>         -> DMESG-FAIL
>>>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> I keep hitting clobbered pages during engine resets on bat-adlp-4.
>>>>>> It seems to happen most of the time on that machine and 
>>>>>> occasionally on bat-adlp-6.
>>>>>>
>>>>>> Should bat-adlp-4 be considered an unreliable machine like 
>>>>>> bat-adlp-6 is for now?
>>>>>>
>>>>>> Alternatively, seeing the history of this in
>>>>>>
>>>>>> commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude 
>>>>>> low pages (128KiB) of stolen from use"
>>>>>>
>>>>>> could this be an indication that maybe the original issue is worse 
>>>>>> on adlp machines?
>>>>>> I have only ever seen page page 135 or 136 clobbered across many 
>>>>>> runs via trybot, so it looks fairly consistent.
>>>>>> Though excluding the use of over 540K of stolen might be too severe.
>>>>>
>>>>> Don't know but I see that on the latest version you even hit pages 
>>>>> 165/166.
>>>>>
>>>>> Any history of hitting this in CI without your series? If not, are 
>>>>> there some other changes which could explain it? Are you touching 
>>>>> the selftest itself?
>>>>>
>>>>> Hexdump of the clobbered page looks quite complex. Especially 
>>>>> POISON_FREE. Any idea how that ends up there?
>>>>
>>>>
>>>> (see 
>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings702) 
>>>>
>>>>
>>>> after lots of slow debug via CI, it looks like the issue is that a 
>>>> ring buffer was allocated and taking up that page during the initial 
>>>> crc capture in the test, but by the time it came to check for 
>>>> corruption, it had been freed from that page.
>>>>
>>>> The test has a number of weaknesses:
>>>>
>>>> 1. the busy check is done twice, without taking in to account any 
>>>> change in between. I assume previously this could be relied on never 
>>>> to occur, but now it can for some reason (more on that later)
>>>
>>> You mean the stolen page used/unused test? Probably the premise is 
>>> that the test controls the driver completely ie. is the sole user and 
>>> the two checks are run at the time where nothing else could have 
>>> changed the state.
>>>
>>> With the nerfed request (as with GuC) this actually should hold. In 
>>> the generic case I am less sure, my working knowledge faded a bit, 
>>> but perhaps there was something guaranteeing the spinner couldn't 
>>> have been retired yet at the time of the second check. Would need 
>>> clarifying at least in comments.
>>>>
>>>> 2. the engine reset returns early with an error for guc submission 
>>>> engines, but it is silently ignored in the test. Perhaps it should 
>>>> ignore guc submission engines as it is a largely useless test for 
>>>> those situations.
>>>
>>> Yes looks dodgy indeed. You will need to summon the owners of the GuC 
>>> backend to comment on this.
>>>
>>> However even if the test should be skipped with GuC it is extremely 
>>> interesting that you are hitting this so I suspect there is a more 
>>> serious issue at play.
>>
>> indeed. That's why I am keen to get to the root cause instead of just 
>> slapping in a fix.
>>
>>>
>>>> A quick obvious fix is to have a busy bitmask that remembers each 
>>>> page's busy state initially and only check for corruption if it was 
>>>> busy during both checks.
>>>>
>>>> However, the main question is why this is occurring now with my 
>>>> changes.
>>>> I have added more debug to check where the stolen memory is being 
>>>> freed, but the first run last night didn't hit the issue for once.
>>>> I am running again now, will report back if I figure out where it is 
>>>> being freed.
>>>>
>>>> I am pretty sure the "corruption" (which isn't actually corruption) 
>>>> is from a ring buffer.
>>>> The POISON_FREE is the only difference between the captured before 
>>>> and after dumps:
>>>>
>>>> [0040] 00000000 02800000 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 
>>>> 6b6b6b6b 6b6b6b6b
>>>>
>>>> with the 2nd dword being the MI_ARB_CHECK used for the spinner.
>>>> I think this is the request poisoning from i915_request_retire()
>>>>
>>>> The bit I don't know yet is why a ring buffer was freed between the 
>>>> initial crc capture and the corruption check. The spinner should be 
>>>> active across the entire test, maintaining a ref on the context and 
>>>> it's ring.
>>>>
>>>> hopefully my latest debug will give more answers.
>>>
>>> Yeah if you can figure our whether the a) spinner is still active 
>>> during the 2nd check (as I think it should be), and b) is the 
>>> corruption detected in the same pages which were used in the 1st pass 
>>> that would be interesting.
>>
>> yep. The latest run is still stuck in the CI queue after 27 hours.
>> I think I have enough debug in there to catch it now.
>> Hopefully I can get a root cause once it gets chance to run.
>>
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v7/fi-adl-ddr5/igt@i915_selftest@live@reset.html#dmesg-warnings496 
> 
> 
> well, the run finally happened.
> And it shows that the freed resource happens from a workqueue. Not helpful.
> 
> I'll now add a saved stack traces to all objects that saves where it is 
> allocated and freed/queued for free.
> 

https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v8/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings419

I'm pretty sure I know what is going on now.

igt_reset_engines_stolen() loops around each engine and calls 
__igt_reset_stolen() for that engine.


__igt_reset_stolen() does
intel_context_create()

igt_spinner_create_request()->intel_context_create_request()->__i915_request_create()->intel_context_get()

intel_context_put()

leaving the request as the remaining holder of the context.

it then does the reset, which does nothing on GuC setups, does the 
comparisons, then ends the spinner via igt_spinner_fini()->igt_spinner_end()
which lets the spinner request finish.

once the request is retired, intel_context_put() is eventually called, 
which starts the GuC teardown of the context as the request was the last 
holder of the context.

This GuC teardown is asynchronous via ct transactions.
By the time the ct_process_request() sees the 
INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE message, the test has already 
looped to the next engine and has already checked the original status of 
the page that the destroying context used for its ring buffer, so the 
test sees it being freed from the previous loop while testing the next 
engine. It considers this a corruption of the stolen memory due to the 
previously highlighted double checking of busy state for each page.

I think for now, we should simply not test GuC submission engines in 
line with the reset call returning an error.
If at some point we want to enable this test for GuC setups, then 
flushing and waiting for context cleanup would need to be added to the test.

Anyone know why per engine reset is not allowed for GuC submission setup?
looking at commit "eb5e7da736f3 drm/i915/guc: Reset implementation for 
new GuC interface" doesn't really detail why per engine resets are not 
allowed.
Maybe it just never got implemented? or are there reasons to not allow 
the host to request specific engine resets?



> 
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>>
>>>>
>>>>>
>>>>> Btw what is the benefit of converting stolen to start with? It's 
>>>>> not much of a backend since it just uses the drm range manager. So 
>>>>> quite thin and uneventful. Diffstats for the series also do not 
>>>>> look like you end up with much code reduction?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Tvrtko

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)
  2022-06-30 14:20               ` Robert Beckett
@ 2022-06-30 14:52                 ` Hellstrom, Thomas
  2022-06-30 15:24                   ` Robert Beckett
  2022-07-01  7:34                 ` Tvrtko Ursulin
  1 sibling, 1 reply; 48+ messages in thread
From: Hellstrom, Thomas @ 2022-06-30 14:52 UTC (permalink / raw)
  To: Harrison, John C, Brost, Matthew, bob.beckett, tvrtko.ursulin, intel-gfx

Hi!

On Thu, 2022-06-30 at 15:20 +0100, Robert Beckett wrote:
> 
> 
> On 29/06/2022 13:51, Robert Beckett wrote:
> > 
> > 
> > On 28/06/2022 17:22, Robert Beckett wrote:
> > > 
> > > 
> > > On 28/06/2022 09:46, Tvrtko Ursulin wrote:
> > > > 
> > > > On 27/06/2022 18:08, Robert Beckett wrote:
> > > > > 
> > > > > 
> > > > > On 22/06/2022 10:05, Tvrtko Ursulin wrote:
> > > > > > 
> > > > > > On 21/06/2022 20:11, Robert Beckett wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On 21/06/2022 18:37, Patchwork wrote:
> > > > > > > > *Patch Details*
> > > > > > > > *Series:*    drm/i915: ttm for stolen (rev5)
> > > > > > > > *URL:*   
> > > > > > > > https://patchwork.freedesktop.org/series/101396/ 
> > > > > > > > <https://patchwork.freedesktop.org/series/101396/>
> > > > > > > > *State:*    failure
> > > > > > > > *Details:* 
> > > > > > > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
> > > > > > > >  
> > > > > > > > <
> > > > > > > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
> > > > > > > > >
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > >   CI Bug Log - changes from CI_DRM_11790 ->
> > > > > > > > Patchwork_101396v5
> > > > > > > > 
> > > > > > > > 
> > > > > > > >     Summary
> > > > > > > > 
> > > > > > > > *FAILURE*
> > > > > > > > 
> > > > > > > > Serious unknown changes coming with Patchwork_101396v5
> > > > > > > > absolutely 
> > > > > > > > need to be
> > > > > > > > verified manually.
> > > > > > > > 
> > > > > > > > If you think the reported changes have nothing to do
> > > > > > > > with the 
> > > > > > > > changes
> > > > > > > > introduced in Patchwork_101396v5, please notify your
> > > > > > > > bug team to 
> > > > > > > > allow them
> > > > > > > > to document this new failure mode, which will reduce
> > > > > > > > false 
> > > > > > > > positives in CI.
> > > > > > > > 
> > > > > > > > External URL: 
> > > > > > > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
> > > > > > > >  
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > >     Participating hosts (40 -> 41)
> > > > > > > > 
> > > > > > > > Additional (2): fi-icl-u2 bat-dg2-9
> > > > > > > > Missing (1): fi-bdw-samus
> > > > > > > > 
> > > > > > > > 
> > > > > > > >     Possible new issues
> > > > > > > > 
> > > > > > > > Here are the unknown changes that may have been
> > > > > > > > introduced in 
> > > > > > > > Patchwork_101396v5:
> > > > > > > > 
> > > > > > > > 
> > > > > > > >       IGT changes
> > > > > > > > 
> > > > > > > > 
> > > > > > > >         Possible regressions
> > > > > > > > 
> > > > > > > >   * igt@i915_selftest@live@reset:
> > > > > > > >       o bat-adlp-4: PASS
> > > > > > > > <
> > > > > > > > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html
> > > > > > > > >
> > > > > > > > 
> > > > > > > >         -> DMESG-FAIL
> > > > > > > > <
> > > > > > > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html
> > > > > > > > >
> > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > I keep hitting clobbered pages during engine resets on
> > > > > > > bat-adlp-4.
> > > > > > > It seems to happen most of the time on that machine and 
> > > > > > > occasionally on bat-adlp-6.
> > > > > > > 
> > > > > > > Should bat-adlp-4 be considered an unreliable machine
> > > > > > > like 
> > > > > > > bat-adlp-6 is for now?
> > > > > > > 
> > > > > > > Alternatively, seeing the history of this in
> > > > > > > 
> > > > > > > commit 3da3c5c1c9825c24168f27b021339e90af37e969
> > > > > > > "drm/i915: Exclude 
> > > > > > > low pages (128KiB) of stolen from use"
> > > > > > > 
> > > > > > > could this be an indication that maybe the original issue
> > > > > > > is worse 
> > > > > > > on adlp machines?
> > > > > > > I have only ever seen page page 135 or 136 clobbered
> > > > > > > across many 
> > > > > > > runs via trybot, so it looks fairly consistent.
> > > > > > > Though excluding the use of over 540K of stolen might be
> > > > > > > too severe.
> > > > > > 
> > > > > > Don't know but I see that on the latest version you even
> > > > > > hit pages 
> > > > > > 165/166.
> > > > > > 
> > > > > > Any history of hitting this in CI without your series? If
> > > > > > not, are 
> > > > > > there some other changes which could explain it? Are you
> > > > > > touching 
> > > > > > the selftest itself?
> > > > > > 
> > > > > > Hexdump of the clobbered page looks quite complex.
> > > > > > Especially 
> > > > > > POISON_FREE. Any idea how that ends up there?
> > > > > 
> > > > > 
> > > > > (see 
> > > > > https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings702
> > > > > )
> > > > > 
> > > > > 
> > > > > after lots of slow debug via CI, it looks like the issue is
> > > > > that a 
> > > > > ring buffer was allocated and taking up that page during the
> > > > > initial 
> > > > > crc capture in the test, but by the time it came to check for
> > > > > corruption, it had been freed from that page.
> > > > > 
> > > > > The test has a number of weaknesses:
> > > > > 
> > > > > 1. the busy check is done twice, without taking in to account
> > > > > any 
> > > > > change in between. I assume previously this could be relied
> > > > > on never 
> > > > > to occur, but now it can for some reason (more on that later)
> > > > 
> > > > You mean the stolen page used/unused test? Probably the premise
> > > > is 
> > > > that the test controls the driver completely ie. is the sole
> > > > user and 
> > > > the two checks are run at the time where nothing else could
> > > > have 
> > > > changed the state.
> > > > 
> > > > With the nerfed request (as with GuC) this actually should
> > > > hold. In 
> > > > the generic case I am less sure, my working knowledge faded a
> > > > bit, 
> > > > but perhaps there was something guaranteeing the spinner
> > > > couldn't 
> > > > have been retired yet at the time of the second check. Would
> > > > need 
> > > > clarifying at least in comments.
> > > > > 
> > > > > 2. the engine reset returns early with an error for guc
> > > > > submission 
> > > > > engines, but it is silently ignored in the test. Perhaps it
> > > > > should 
> > > > > ignore guc submission engines as it is a largely useless test
> > > > > for 
> > > > > those situations.
> > > > 
> > > > Yes looks dodgy indeed. You will need to summon the owners of
> > > > the GuC 
> > > > backend to comment on this.
> > > > 
> > > > However even if the test should be skipped with GuC it is
> > > > extremely 
> > > > interesting that you are hitting this so I suspect there is a
> > > > more 
> > > > serious issue at play.
> > > 
> > > indeed. That's why I am keen to get to the root cause instead of
> > > just 
> > > slapping in a fix.
> > > 
> > > > 
> > > > > A quick obvious fix is to have a busy bitmask that remembers
> > > > > each 
> > > > > page's busy state initially and only check for corruption if
> > > > > it was 
> > > > > busy during both checks.
> > > > > 
> > > > > However, the main question is why this is occurring now with
> > > > > my 
> > > > > changes.
> > > > > I have added more debug to check where the stolen memory is
> > > > > being 
> > > > > freed, but the first run last night didn't hit the issue for
> > > > > once.
> > > > > I am running again now, will report back if I figure out
> > > > > where it is 
> > > > > being freed.
> > > > > 
> > > > > I am pretty sure the "corruption" (which isn't actually
> > > > > corruption) 
> > > > > is from a ring buffer.
> > > > > The POISON_FREE is the only difference between the captured
> > > > > before 
> > > > > and after dumps:
> > > > > 
> > > > > [0040] 00000000 02800000 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 
> > > > > 6b6b6b6b 6b6b6b6b
> > > > > 
> > > > > with the 2nd dword being the MI_ARB_CHECK used for the
> > > > > spinner.
> > > > > I think this is the request poisoning from
> > > > > i915_request_retire()
> > > > > 
> > > > > The bit I don't know yet is why a ring buffer was freed
> > > > > between the 
> > > > > initial crc capture and the corruption check. The spinner
> > > > > should be 
> > > > > active across the entire test, maintaining a ref on the
> > > > > context and 
> > > > > it's ring.
> > > > > 
> > > > > hopefully my latest debug will give more answers.
> > > > 
> > > > Yeah if you can figure our whether the a) spinner is still
> > > > active 
> > > > during the 2nd check (as I think it should be), and b) is the 
> > > > corruption detected in the same pages which were used in the
> > > > 1st pass 
> > > > that would be interesting.
> > > 
> > > yep. The latest run is still stuck in the CI queue after 27
> > > hours.
> > > I think I have enough debug in there to catch it now.
> > > Hopefully I can get a root cause once it gets chance to run.
> > > 
> > 
> > https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v7/fi-adl-ddr5/igt@i915_selftest@live@reset.html#dmesg-warnings496
> >  
> > 
> > 
> > well, the run finally happened.
> > And it shows that the freed resource happens from a workqueue. Not
> > helpful.
> > 
> > I'll now add a saved stack traces to all objects that saves where
> > it is 
> > allocated and freed/queued for free.
> > 
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v8/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings419
> 
> I'm pretty sure I know what is going on now.
> 
> igt_reset_engines_stolen() loops around each engine and calls 
> __igt_reset_stolen() for that engine.
> 
> 
> __igt_reset_stolen() does
> intel_context_create()
> 
> igt_spinner_create_request()->intel_context_create_request()-
> >__i915_request_create()->intel_context_get()
> 
> intel_context_put()
> 
> leaving the request as the remaining holder of the context.
> 
> it then does the reset, which does nothing on GuC setups, does the 
> comparisons, then ends the spinner via igt_spinner_fini()-
> >igt_spinner_end()
> which lets the spinner request finish.
> 
> once the request is retired, intel_context_put() is eventually
> called, 
> which starts the GuC teardown of the context as the request was the
> last 
> holder of the context.
> 
> This GuC teardown is asynchronous via ct transactions.
> By the time the ct_process_request() sees the 
> INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE message, the test has
> already 
> looped to the next engine and has already checked the original status
> of 
> the page that the destroying context used for its ring buffer, so the
> test sees it being freed from the previous loop while testing the
> next 
> engine. It considers this a corruption of the stolen memory due to
> the 
> previously highlighted double checking of busy state for each page.
> 
> I think for now, we should simply not test GuC submission engines in 
> line with the reset call returning an error.
> If at some point we want to enable this test for GuC setups, then 
> flushing and waiting for context cleanup would need to be added to
> the test.
> 
> Anyone know why per engine reset is not allowed for GuC submission
> setup?
> looking at commit "eb5e7da736f3 drm/i915/guc: Reset implementation
> for 
> new GuC interface" doesn't really detail why per engine resets are
> not 
> allowed.
> Maybe it just never got implemented? or are there reasons to not
> allow 
> the host to request specific engine resets?
> 

IIRC, the GuC by design decides for itself when it needs to do a per-
engine reset, hence the host can't trigger those.

/Thomas


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)
  2022-06-30 14:52                 ` Hellstrom, Thomas
@ 2022-06-30 15:24                   ` Robert Beckett
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Beckett @ 2022-06-30 15:24 UTC (permalink / raw)
  To: Hellstrom, Thomas, Harrison, John C, Brost, Matthew,
	tvrtko.ursulin, intel-gfx



On 30/06/2022 15:52, Hellstrom, Thomas wrote:
> Hi!
> 
> On Thu, 2022-06-30 at 15:20 +0100, Robert Beckett wrote:
>>
>>
>> On 29/06/2022 13:51, Robert Beckett wrote:
>>>
>>>
>>> On 28/06/2022 17:22, Robert Beckett wrote:
>>>>
>>>>
>>>> On 28/06/2022 09:46, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 27/06/2022 18:08, Robert Beckett wrote:
>>>>>>
>>>>>>
>>>>>> On 22/06/2022 10:05, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 21/06/2022 20:11, Robert Beckett wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 21/06/2022 18:37, Patchwork wrote:
>>>>>>>>> *Patch Details*
>>>>>>>>> *Series:*    drm/i915: ttm for stolen (rev5)
>>>>>>>>> *URL:*
>>>>>>>>> https://patchwork.freedesktop.org/series/101396/
>>>>>>>>> <https://patchwork.freedesktop.org/series/101396/>
>>>>>>>>> *State:*    failure
>>>>>>>>> *Details:*
>>>>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
>>>>>>>>>   
>>>>>>>>> <
>>>>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    CI Bug Log - changes from CI_DRM_11790 ->
>>>>>>>>> Patchwork_101396v5
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>      Summary
>>>>>>>>>
>>>>>>>>> *FAILURE*
>>>>>>>>>
>>>>>>>>> Serious unknown changes coming with Patchwork_101396v5
>>>>>>>>> absolutely
>>>>>>>>> need to be
>>>>>>>>> verified manually.
>>>>>>>>>
>>>>>>>>> If you think the reported changes have nothing to do
>>>>>>>>> with the
>>>>>>>>> changes
>>>>>>>>> introduced in Patchwork_101396v5, please notify your
>>>>>>>>> bug team to
>>>>>>>>> allow them
>>>>>>>>> to document this new failure mode, which will reduce
>>>>>>>>> false
>>>>>>>>> positives in CI.
>>>>>>>>>
>>>>>>>>> External URL:
>>>>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
>>>>>>>>>   
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>      Participating hosts (40 -> 41)
>>>>>>>>>
>>>>>>>>> Additional (2): fi-icl-u2 bat-dg2-9
>>>>>>>>> Missing (1): fi-bdw-samus
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>      Possible new issues
>>>>>>>>>
>>>>>>>>> Here are the unknown changes that may have been
>>>>>>>>> introduced in
>>>>>>>>> Patchwork_101396v5:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>        IGT changes
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>          Possible regressions
>>>>>>>>>
>>>>>>>>>    * igt@i915_selftest@live@reset:
>>>>>>>>>        o bat-adlp-4: PASS
>>>>>>>>> <
>>>>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>          -> DMESG-FAIL
>>>>>>>>> <
>>>>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> I keep hitting clobbered pages during engine resets on
>>>>>>>> bat-adlp-4.
>>>>>>>> It seems to happen most of the time on that machine and
>>>>>>>> occasionally on bat-adlp-6.
>>>>>>>>
>>>>>>>> Should bat-adlp-4 be considered an unreliable machine
>>>>>>>> like
>>>>>>>> bat-adlp-6 is for now?
>>>>>>>>
>>>>>>>> Alternatively, seeing the history of this in
>>>>>>>>
>>>>>>>> commit 3da3c5c1c9825c24168f27b021339e90af37e969
>>>>>>>> "drm/i915: Exclude
>>>>>>>> low pages (128KiB) of stolen from use"
>>>>>>>>
>>>>>>>> could this be an indication that maybe the original issue
>>>>>>>> is worse
>>>>>>>> on adlp machines?
>>>>>>>> I have only ever seen page page 135 or 136 clobbered
>>>>>>>> across many
>>>>>>>> runs via trybot, so it looks fairly consistent.
>>>>>>>> Though excluding the use of over 540K of stolen might be
>>>>>>>> too severe.
>>>>>>>
>>>>>>> Don't know but I see that on the latest version you even
>>>>>>> hit pages
>>>>>>> 165/166.
>>>>>>>
>>>>>>> Any history of hitting this in CI without your series? If
>>>>>>> not, are
>>>>>>> there some other changes which could explain it? Are you
>>>>>>> touching
>>>>>>> the selftest itself?
>>>>>>>
>>>>>>> Hexdump of the clobbered page looks quite complex.
>>>>>>> Especially
>>>>>>> POISON_FREE. Any idea how that ends up there?
>>>>>>
>>>>>>
>>>>>> (see
>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings702
>>>>>> )
>>>>>>
>>>>>>
>>>>>> after lots of slow debug via CI, it looks like the issue is
>>>>>> that a
>>>>>> ring buffer was allocated and taking up that page during the
>>>>>> initial
>>>>>> crc capture in the test, but by the time it came to check for
>>>>>> corruption, it had been freed from that page.
>>>>>>
>>>>>> The test has a number of weaknesses:
>>>>>>
>>>>>> 1. the busy check is done twice, without taking in to account
>>>>>> any
>>>>>> change in between. I assume previously this could be relied
>>>>>> on never
>>>>>> to occur, but now it can for some reason (more on that later)
>>>>>
>>>>> You mean the stolen page used/unused test? Probably the premise
>>>>> is
>>>>> that the test controls the driver completely ie. is the sole
>>>>> user and
>>>>> the two checks are run at the time where nothing else could
>>>>> have
>>>>> changed the state.
>>>>>
>>>>> With the nerfed request (as with GuC) this actually should
>>>>> hold. In
>>>>> the generic case I am less sure, my working knowledge faded a
>>>>> bit,
>>>>> but perhaps there was something guaranteeing the spinner
>>>>> couldn't
>>>>> have been retired yet at the time of the second check. Would
>>>>> need
>>>>> clarifying at least in comments.
>>>>>>
>>>>>> 2. the engine reset returns early with an error for guc
>>>>>> submission
>>>>>> engines, but it is silently ignored in the test. Perhaps it
>>>>>> should
>>>>>> ignore guc submission engines as it is a largely useless test
>>>>>> for
>>>>>> those situations.
>>>>>
>>>>> Yes looks dodgy indeed. You will need to summon the owners of
>>>>> the GuC
>>>>> backend to comment on this.
>>>>>
>>>>> However even if the test should be skipped with GuC it is
>>>>> extremely
>>>>> interesting that you are hitting this so I suspect there is a
>>>>> more
>>>>> serious issue at play.
>>>>
>>>> indeed. That's why I am keen to get to the root cause instead of
>>>> just
>>>> slapping in a fix.
>>>>
>>>>>
>>>>>> A quick obvious fix is to have a busy bitmask that remembers
>>>>>> each
>>>>>> page's busy state initially and only check for corruption if
>>>>>> it was
>>>>>> busy during both checks.
>>>>>>
>>>>>> However, the main question is why this is occurring now with
>>>>>> my
>>>>>> changes.
>>>>>> I have added more debug to check where the stolen memory is
>>>>>> being
>>>>>> freed, but the first run last night didn't hit the issue for
>>>>>> once.
>>>>>> I am running again now, will report back if I figure out
>>>>>> where it is
>>>>>> being freed.
>>>>>>
>>>>>> I am pretty sure the "corruption" (which isn't actually
>>>>>> corruption)
>>>>>> is from a ring buffer.
>>>>>> The POISON_FREE is the only difference between the captured
>>>>>> before
>>>>>> and after dumps:
>>>>>>
>>>>>> [0040] 00000000 02800000 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b
>>>>>> 6b6b6b6b 6b6b6b6b
>>>>>>
>>>>>> with the 2nd dword being the MI_ARB_CHECK used for the
>>>>>> spinner.
>>>>>> I think this is the request poisoning from
>>>>>> i915_request_retire()
>>>>>>
>>>>>> The bit I don't know yet is why a ring buffer was freed
>>>>>> between the
>>>>>> initial crc capture and the corruption check. The spinner
>>>>>> should be
>>>>>> active across the entire test, maintaining a ref on the
>>>>>> context and
>>>>>> it's ring.
>>>>>>
>>>>>> hopefully my latest debug will give more answers.
>>>>>
>>>>> Yeah if you can figure our whether the a) spinner is still
>>>>> active
>>>>> during the 2nd check (as I think it should be), and b) is the
>>>>> corruption detected in the same pages which were used in the
>>>>> 1st pass
>>>>> that would be interesting.
>>>>
>>>> yep. The latest run is still stuck in the CI queue after 27
>>>> hours.
>>>> I think I have enough debug in there to catch it now.
>>>> Hopefully I can get a root cause once it gets chance to run.
>>>>
>>>
>>> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v7/fi-adl-ddr5/igt@i915_selftest@live@reset.html#dmesg-warnings496
>>>   
>>>
>>>
>>> well, the run finally happened.
>>> And it shows that the freed resource happens from a workqueue. Not
>>> helpful.
>>>
>>> I'll now add a saved stack traces to all objects that saves where
>>> it is
>>> allocated and freed/queued for free.
>>>
>>
>> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v8/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings419
>>
>> I'm pretty sure I know what is going on now.
>>
>> igt_reset_engines_stolen() loops around each engine and calls
>> __igt_reset_stolen() for that engine.
>>
>>
>> __igt_reset_stolen() does
>> intel_context_create()
>>
>> igt_spinner_create_request()->intel_context_create_request()-
>>> __i915_request_create()->intel_context_get()
>>
>> intel_context_put()
>>
>> leaving the request as the remaining holder of the context.
>>
>> it then does the reset, which does nothing on GuC setups, does the
>> comparisons, then ends the spinner via igt_spinner_fini()-
>>> igt_spinner_end()
>> which lets the spinner request finish.
>>
>> once the request is retired, intel_context_put() is eventually
>> called,
>> which starts the GuC teardown of the context as the request was the
>> last
>> holder of the context.
>>
>> This GuC teardown is asynchronous via ct transactions.
>> By the time the ct_process_request() sees the
>> INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE message, the test has
>> already
>> looped to the next engine and has already checked the original status
>> of
>> the page that the destroying context used for its ring buffer, so the
>> test sees it being freed from the previous loop while testing the
>> next
>> engine. It considers this a corruption of the stolen memory due to
>> the
>> previously highlighted double checking of busy state for each page.
>>
>> I think for now, we should simply not test GuC submission engines in
>> line with the reset call returning an error.
>> If at some point we want to enable this test for GuC setups, then
>> flushing and waiting for context cleanup would need to be added to
>> the test.
>>
>> Anyone know why per engine reset is not allowed for GuC submission
>> setup?
>> looking at commit "eb5e7da736f3 drm/i915/guc: Reset implementation
>> for
>> new GuC interface" doesn't really detail why per engine resets are
>> not
>> allowed.
>> Maybe it just never got implemented? or are there reasons to not
>> allow
>> the host to request specific engine resets?
>>
> 
> IIRC, the GuC by design decides for itself when it needs to do a per-
> engine reset, hence the host can't trigger those.
> 
> /Thomas
> 

okay, understood.
I'll include a fix to the test in the next version to not test if GuC 
submissions are active.

The only curious bit is that the conversion to ttm makes this issue 
occur more often. In theory this should have been happening with the old 
code too.
I suspect it is just a timing thing. Maybe the ttm code has sped up or 
slowed down the process somewhere.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)
  2022-06-30 14:20               ` Robert Beckett
  2022-06-30 14:52                 ` Hellstrom, Thomas
@ 2022-07-01  7:34                 ` Tvrtko Ursulin
  1 sibling, 0 replies; 48+ messages in thread
From: Tvrtko Ursulin @ 2022-07-01  7:34 UTC (permalink / raw)
  To: Robert Beckett, intel-gfx, Hellstrom, Thomas, John Harrison,
	Matthew Brost


On 30/06/2022 15:20, Robert Beckett wrote:
> 
> 
> On 29/06/2022 13:51, Robert Beckett wrote:
>>
>>
>> On 28/06/2022 17:22, Robert Beckett wrote:
>>>
>>>
>>> On 28/06/2022 09:46, Tvrtko Ursulin wrote:
>>>>
>>>> On 27/06/2022 18:08, Robert Beckett wrote:
>>>>>
>>>>>
>>>>> On 22/06/2022 10:05, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 21/06/2022 20:11, Robert Beckett wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 21/06/2022 18:37, Patchwork wrote:
>>>>>>>> *Patch Details*
>>>>>>>> *Series:*    drm/i915: ttm for stolen (rev5)
>>>>>>>> *URL:*    https://patchwork.freedesktop.org/series/101396/ 
>>>>>>>> <https://patchwork.freedesktop.org/series/101396/>
>>>>>>>> *State:*    failure
>>>>>>>> *Details:* 
>>>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
>>>>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html> 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>   CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5
>>>>>>>>
>>>>>>>>
>>>>>>>>     Summary
>>>>>>>>
>>>>>>>> *FAILURE*
>>>>>>>>
>>>>>>>> Serious unknown changes coming with Patchwork_101396v5 
>>>>>>>> absolutely need to be
>>>>>>>> verified manually.
>>>>>>>>
>>>>>>>> If you think the reported changes have nothing to do with the 
>>>>>>>> changes
>>>>>>>> introduced in Patchwork_101396v5, please notify your bug team to 
>>>>>>>> allow them
>>>>>>>> to document this new failure mode, which will reduce false 
>>>>>>>> positives in CI.
>>>>>>>>
>>>>>>>> External URL: 
>>>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>     Participating hosts (40 -> 41)
>>>>>>>>
>>>>>>>> Additional (2): fi-icl-u2 bat-dg2-9
>>>>>>>> Missing (1): fi-bdw-samus
>>>>>>>>
>>>>>>>>
>>>>>>>>     Possible new issues
>>>>>>>>
>>>>>>>> Here are the unknown changes that may have been introduced in 
>>>>>>>> Patchwork_101396v5:
>>>>>>>>
>>>>>>>>
>>>>>>>>       IGT changes
>>>>>>>>
>>>>>>>>
>>>>>>>>         Possible regressions
>>>>>>>>
>>>>>>>>   * igt@i915_selftest@live@reset:
>>>>>>>>       o bat-adlp-4: PASS
>>>>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>>>>>>
>>>>>>>>         -> DMESG-FAIL
>>>>>>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html> 
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> I keep hitting clobbered pages during engine resets on bat-adlp-4.
>>>>>>> It seems to happen most of the time on that machine and 
>>>>>>> occasionally on bat-adlp-6.
>>>>>>>
>>>>>>> Should bat-adlp-4 be considered an unreliable machine like 
>>>>>>> bat-adlp-6 is for now?
>>>>>>>
>>>>>>> Alternatively, seeing the history of this in
>>>>>>>
>>>>>>> commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: 
>>>>>>> Exclude low pages (128KiB) of stolen from use"
>>>>>>>
>>>>>>> could this be an indication that maybe the original issue is 
>>>>>>> worse on adlp machines?
>>>>>>> I have only ever seen page page 135 or 136 clobbered across many 
>>>>>>> runs via trybot, so it looks fairly consistent.
>>>>>>> Though excluding the use of over 540K of stolen might be too severe.
>>>>>>
>>>>>> Don't know but I see that on the latest version you even hit pages 
>>>>>> 165/166.
>>>>>>
>>>>>> Any history of hitting this in CI without your series? If not, are 
>>>>>> there some other changes which could explain it? Are you touching 
>>>>>> the selftest itself?
>>>>>>
>>>>>> Hexdump of the clobbered page looks quite complex. Especially 
>>>>>> POISON_FREE. Any idea how that ends up there?
>>>>>
>>>>>
>>>>> (see 
>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v4/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings702) 
>>>>>
>>>>>
>>>>> after lots of slow debug via CI, it looks like the issue is that a 
>>>>> ring buffer was allocated and taking up that page during the 
>>>>> initial crc capture in the test, but by the time it came to check 
>>>>> for corruption, it had been freed from that page.
>>>>>
>>>>> The test has a number of weaknesses:
>>>>>
>>>>> 1. the busy check is done twice, without taking in to account any 
>>>>> change in between. I assume previously this could be relied on 
>>>>> never to occur, but now it can for some reason (more on that later)
>>>>
>>>> You mean the stolen page used/unused test? Probably the premise is 
>>>> that the test controls the driver completely ie. is the sole user 
>>>> and the two checks are run at the time where nothing else could have 
>>>> changed the state.
>>>>
>>>> With the nerfed request (as with GuC) this actually should hold. In 
>>>> the generic case I am less sure, my working knowledge faded a bit, 
>>>> but perhaps there was something guaranteeing the spinner couldn't 
>>>> have been retired yet at the time of the second check. Would need 
>>>> clarifying at least in comments.
>>>>>
>>>>> 2. the engine reset returns early with an error for guc submission 
>>>>> engines, but it is silently ignored in the test. Perhaps it should 
>>>>> ignore guc submission engines as it is a largely useless test for 
>>>>> those situations.
>>>>
>>>> Yes looks dodgy indeed. You will need to summon the owners of the 
>>>> GuC backend to comment on this.
>>>>
>>>> However even if the test should be skipped with GuC it is extremely 
>>>> interesting that you are hitting this so I suspect there is a more 
>>>> serious issue at play.
>>>
>>> indeed. That's why I am keen to get to the root cause instead of just 
>>> slapping in a fix.
>>>
>>>>
>>>>> A quick obvious fix is to have a busy bitmask that remembers each 
>>>>> page's busy state initially and only check for corruption if it was 
>>>>> busy during both checks.
>>>>>
>>>>> However, the main question is why this is occurring now with my 
>>>>> changes.
>>>>> I have added more debug to check where the stolen memory is being 
>>>>> freed, but the first run last night didn't hit the issue for once.
>>>>> I am running again now, will report back if I figure out where it 
>>>>> is being freed.
>>>>>
>>>>> I am pretty sure the "corruption" (which isn't actually corruption) 
>>>>> is from a ring buffer.
>>>>> The POISON_FREE is the only difference between the captured before 
>>>>> and after dumps:
>>>>>
>>>>> [0040] 00000000 02800000 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 
>>>>> 6b6b6b6b 6b6b6b6b
>>>>>
>>>>> with the 2nd dword being the MI_ARB_CHECK used for the spinner.
>>>>> I think this is the request poisoning from i915_request_retire()
>>>>>
>>>>> The bit I don't know yet is why a ring buffer was freed between the 
>>>>> initial crc capture and the corruption check. The spinner should be 
>>>>> active across the entire test, maintaining a ref on the context and 
>>>>> it's ring.
>>>>>
>>>>> hopefully my latest debug will give more answers.
>>>>
>>>> Yeah if you can figure our whether the a) spinner is still active 
>>>> during the 2nd check (as I think it should be), and b) is the 
>>>> corruption detected in the same pages which were used in the 1st 
>>>> pass that would be interesting.
>>>
>>> yep. The latest run is still stuck in the CI queue after 27 hours.
>>> I think I have enough debug in there to catch it now.
>>> Hopefully I can get a root cause once it gets chance to run.
>>>
>>
>> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v7/fi-adl-ddr5/igt@i915_selftest@live@reset.html#dmesg-warnings496 
>>
>>
>> well, the run finally happened.
>> And it shows that the freed resource happens from a workqueue. Not 
>> helpful.
>>
>> I'll now add a saved stack traces to all objects that saves where it 
>> is allocated and freed/queued for free.
>>
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_105517v8/fi-rkl-guc/igt@i915_selftest@live@reset.html#dmesg-warnings419 
> 
> 
> I'm pretty sure I know what is going on now.
> 
> igt_reset_engines_stolen() loops around each engine and calls 
> __igt_reset_stolen() for that engine.
> 
> 
> __igt_reset_stolen() does
> intel_context_create()
> 
> igt_spinner_create_request()->intel_context_create_request()->__i915_request_create()->intel_context_get() 
> 
> 
> intel_context_put()
> 
> leaving the request as the remaining holder of the context.
> 
> it then does the reset, which does nothing on GuC setups, does the 
> comparisons, then ends the spinner via 
> igt_spinner_fini()->igt_spinner_end()
> which lets the spinner request finish.
> 
> once the request is retired, intel_context_put() is eventually called, 
> which starts the GuC teardown of the context as the request was the last 
> holder of the context.
> 
> This GuC teardown is asynchronous via ct transactions.
> By the time the ct_process_request() sees the 
> INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE message, the test has already 
> looped to the next engine and has already checked the original status of 
> the page that the destroying context used for its ring buffer, so the 
> test sees it being freed from the previous loop while testing the next 
> engine. It considers this a corruption of the stolen memory due to the 
> previously highlighted double checking of busy state for each page.

Alright, makes sense. Test kind of depends on implementation details and 
perhaps ideally it should do some explicit flushing before moving to the 
next engine, instead of assuming engine reset leaves everything idle and 
flushed. I don't know from the top of my head what kind of flushing 
would that be. In theory all possible delayed workers that we have. 
Maybe. But never mind for now.

> I think for now, we should simply not test GuC submission engines in 
> line with the reset call returning an error.
> If at some point we want to enable this test for GuC setups, then 
> flushing and waiting for context cleanup would need to be added to the 
> test.

Yeah that's okay. Alternative could be to provoke for instance the 
preempt timeout and hit the reset in that way. I am pretty sure some 
tests fiddle with it to enable rapid execution time. (Remove 
MI_ARB_CEHCK from the spinner and send an enging pulse, then wait for 
reset.)

> Anyone know why per engine reset is not allowed for GuC submission setup?
> looking at commit "eb5e7da736f3 drm/i915/guc: Reset implementation for 
> new GuC interface" doesn't really detail why per engine resets are not 
> allowed.
> Maybe it just never got implemented? or are there reasons to not allow 
> the host to request specific engine resets?

As Thomas has said - thinking seems to be there must be no explicit 
external mechanism to make GuC trigger the engine reset. Maybe it would 
be useful for testing purposes, or maybe indirect route as above is enough.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2022-07-01  7:34 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-20 21:33 [PATCH v7 00/10] drm/i915: ttm for stolen Robert Beckett
2022-06-20 21:33 ` [Intel-gfx] " Robert Beckett
2022-06-20 21:33 ` [PATCH v7 01/10] drm/i915/ttm: dont trample cache_level overrides during ttm move Robert Beckett
2022-06-20 21:33   ` [Intel-gfx] " Robert Beckett
2022-06-20 21:33   ` Robert Beckett
2022-06-20 21:33 ` [PATCH v7 02/10] drm/i915: limit ttm to dma32 for i965G[M] Robert Beckett
2022-06-20 21:33   ` [Intel-gfx] " Robert Beckett
2022-06-20 21:33   ` Robert Beckett
2022-06-20 21:33 ` [PATCH v7 03/10] drm/i915/ttm: only trust snooping for dgfx when deciding default cache_level Robert Beckett
2022-06-20 21:33   ` [Intel-gfx] " Robert Beckett
2022-06-20 21:33   ` Robert Beckett
2022-06-20 21:33 ` [PATCH v7 04/10] drm/i915/gem: selftest should not attempt mmap of private regions Robert Beckett
2022-06-20 21:33   ` [Intel-gfx] " Robert Beckett
2022-06-20 21:33   ` Robert Beckett
2022-06-20 21:33 ` [PATCH v7 05/10] drm/i915: instantiate ttm ranger manager for stolen memory Robert Beckett
2022-06-20 21:33   ` [Intel-gfx] " Robert Beckett
2022-06-20 21:33   ` Robert Beckett
2022-06-20 21:33 ` [PATCH v7 06/10] drm/i915: sanitize mem_flags for stolen buffers Robert Beckett
2022-06-20 21:33   ` [Intel-gfx] " Robert Beckett
2022-06-20 21:33   ` Robert Beckett
2022-06-20 21:33 ` [PATCH v7 07/10] drm/i915: ttm move/clear logic fix Robert Beckett
2022-06-20 21:33   ` [Intel-gfx] " Robert Beckett
2022-06-20 21:33   ` Robert Beckett
2022-06-20 21:33 ` [PATCH v7 08/10] drm/i915: allow memory region creators to alloc and free the region Robert Beckett
2022-06-20 21:33   ` [Intel-gfx] " Robert Beckett
2022-06-20 21:33   ` Robert Beckett
2022-06-20 21:33 ` [PATCH v7 09/10] drm/i915/ttm: add buffer pin on alloc flag Robert Beckett
2022-06-20 21:33   ` [Intel-gfx] " Robert Beckett
2022-06-20 21:33   ` Robert Beckett
2022-06-20 21:33 ` [PATCH v7 10/10] drm/i915: stolen memory use ttm backend Robert Beckett
2022-06-20 21:33   ` [Intel-gfx] " Robert Beckett
2022-06-20 21:33   ` Robert Beckett
2022-06-21  1:09 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: ttm for stolen (rev3) Patchwork
2022-06-21  1:39 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-06-21 15:10 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: ttm for stolen (rev4) Patchwork
2022-06-21 15:32 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-06-21 17:16 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: ttm for stolen (rev5) Patchwork
2022-06-21 17:37 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-06-21 19:11   ` Robert Beckett
2022-06-22  9:05     ` Tvrtko Ursulin
2022-06-27 17:08       ` Robert Beckett
2022-06-28  8:46         ` Tvrtko Ursulin
2022-06-28 16:22           ` Robert Beckett
2022-06-29 12:51             ` Robert Beckett
2022-06-30 14:20               ` Robert Beckett
2022-06-30 14:52                 ` Hellstrom, Thomas
2022-06-30 15:24                   ` Robert Beckett
2022-07-01  7:34                 ` Tvrtko Ursulin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.