* [PATCH v3 0/4] drm/i915: Move system memory to TTM for discrete
@ 2021-06-14  9:52 ` Thomas Hellström
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Hellström @ 2021-06-14  9:52 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

Early implementation of moving system memory for discrete cards over to
TTM. We first add the notion of objects being migratable under the object
lock to i915 gem, and add some asserts to verify that objects are either
locked or pinned when the placement is checked by the gem code.

Patches 2 and 3 deal with updating the i915 gem bookkeeping after a TTM move,
and patch 4 moves system memory over from shmem to TTM for discrete.

Note that the mock device doesn't consider itself discrete, so the TTM
system path is not checked by the mock selftests.

v2:
- Style fixes (reported by Matthew Auld)
- Drop the last patch (migration). It needs selftests and some additional work.
- Unconditionally add VM_IO at mmap time.

v3:
- More style fixes (reported by Matthew Auld)
- Don't overfill the busy placement vector (reported by Matthew Auld)

Thomas Hellström (4):
  drm/i915: Update object placement flags to be mutable
  drm/i915/ttm: Adjust gem flags and caching settings after a move
  drm/i915/ttm: Calculate the object placement at get_pages time
  drm/i915/ttm: Use TTM for system memory

 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  12 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  38 +++
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  14 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  20 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_phys.c      |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 225 ++++++++++++++----
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   4 +-
 .../drm/i915/gem/selftests/huge_gem_object.c  |   4 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   5 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |   4 +-
 .../drm/i915/gem/selftests/i915_gem_phys.c    |   3 +-
 drivers/gpu/drm/i915/i915_drv.h               |   3 -
 drivers/gpu/drm/i915/intel_memory_region.c    |   7 +-
 drivers/gpu/drm/i915/intel_memory_region.h    |   8 +
 drivers/gpu/drm/i915/intel_region_ttm.c       |   8 +-
 drivers/gpu/drm/i915/intel_region_ttm.h       |   2 +
 19 files changed, 279 insertions(+), 96 deletions(-)

-- 
2.31.1


* [PATCH v3 1/4] drm/i915: Update object placement flags to be mutable
  2021-06-14  9:52 ` [Intel-gfx] " Thomas Hellström
@ 2021-06-14  9:52   ` Thomas Hellström
  -1 siblings, 0 replies; 18+ messages in thread
From: Thomas Hellström @ 2021-06-14  9:52 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

The object-ops flag I915_GEM_OBJECT_HAS_IOMEM and the object flag
I915_BO_ALLOC_STRUCT_PAGE are considered immutable by much of our code.
Introduce a new mem_flags member to hold these, and make sure that checks
for these flags being set are done either under the object lock or with
the pages properly pinned. The flags will change during migration under
the object lock.
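
As a rough illustration (not part of this patch; use_cpu_path() and
use_iomem_path() are hypothetical placeholders), callers are expected to
hold the object lock, or keep the pages pinned, across both the check and
the use of the result:

	i915_gem_object_lock(obj, NULL);
	if (i915_gem_object_has_struct_page(obj))
		use_cpu_path(obj);	/* placement is stable under the lock */
	else if (i915_gem_object_has_iomem(obj))
		use_iomem_path(obj);
	i915_gem_object_unlock(obj);

An unlocked, unpinned check would be racy, since the backing may change
during migration as soon as the lock is dropped.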

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
v2:
- Unconditionally set VM_IO on our VMAs, in line with the rest of core gem
  and TTM. Since the bo might be migrated while the VMA is still alive,
  it makes no sense to key VM_IO off the current placement: whether or not
  it maps iomem might change.
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      | 12 +++---
 drivers/gpu/drm/i915/gem/i915_gem_object.c    | 38 +++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_object.h    | 14 ++-----
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 20 +++++-----
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_phys.c      |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c     | 10 +++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |  4 +-
 .../drm/i915/gem/selftests/huge_gem_object.c  |  4 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  5 +--
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  4 +-
 .../drm/i915/gem/selftests/i915_gem_phys.c    |  3 +-
 14 files changed, 79 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index ce6b664b10aa..13b217f75055 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -177,8 +177,8 @@ i915_gem_object_create_internal(struct drm_i915_private *i915,
 		return ERR_PTR(-ENOMEM);
 
 	drm_gem_private_object_init(&i915->drm, &obj->base, size);
-	i915_gem_object_init(obj, &i915_gem_object_internal_ops, &lock_class,
-			     I915_BO_ALLOC_STRUCT_PAGE);
+	i915_gem_object_init(obj, &i915_gem_object_internal_ops, &lock_class, 0);
+	obj->mem_flags |= I915_BO_FLAG_STRUCT_PAGE;
 
 	/*
 	 * Mark the object as volatile, such that the pages are marked as
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 2fd155742bd2..6497a2dbdab9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -684,7 +684,7 @@ __assign_mmap_offset(struct drm_i915_gem_object *obj,
 
 	if (mmap_type != I915_MMAP_TYPE_GTT &&
 	    !i915_gem_object_has_struct_page(obj) &&
-	    !i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM))
+	    !i915_gem_object_has_iomem(obj))
 		return -ENODEV;
 
 	mmo = mmap_offset_attach(obj, mmap_type, file);
@@ -708,7 +708,12 @@ __assign_mmap_offset_handle(struct drm_file *file,
 	if (!obj)
 		return -ENOENT;
 
+	err = i915_gem_object_lock_interruptible(obj, NULL);
+	if (err)
+		goto out_put;
 	err = __assign_mmap_offset(obj, mmap_type, offset, file);
+	i915_gem_object_unlock(obj);
+out_put:
 	i915_gem_object_put(obj);
 	return err;
 }
@@ -932,10 +937,7 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
 		return PTR_ERR(anon);
 	}
 
-	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
-
-	if (i915_gem_object_has_iomem(obj))
-		vma->vm_flags |= VM_IO;
+	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
 
 	/*
 	 * We keep the ref on mmo->obj, not vm_file, but we require
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index cf18c430d51f..07e8ff9a8aae 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -475,6 +475,44 @@ bool i915_gem_object_migratable(struct drm_i915_gem_object *obj)
 	return obj->mm.n_placements > 1;
 }
 
+/**
+ * i915_gem_object_has_struct_page - Whether the object is page-backed
+ * @obj: The object to query.
+ *
+ * This function should only be called while the object is locked or pinned,
+ * otherwise the page backing may change under the caller.
+ *
+ * Return: True if page-backed, false otherwise.
+ */
+bool i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj)
+{
+#ifdef CONFIG_LOCKDEP
+	if (IS_DGFX(to_i915(obj->base.dev)) &&
+	    i915_gem_object_evictable((void __force *)obj))
+		assert_object_held_shared(obj);
+#endif
+	return obj->mem_flags & I915_BO_FLAG_STRUCT_PAGE;
+}
+
+/**
+ * i915_gem_object_has_iomem - Whether the object is iomem-backed
+ * @obj: The object to query.
+ *
+ * This function should only be called while the object is locked or pinned,
+ * otherwise the iomem backing may change under the caller.
+ *
+ * Return: True if iomem-backed, false otherwise.
+ */
+bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj)
+{
+#ifdef CONFIG_LOCKDEP
+	if (IS_DGFX(to_i915(obj->base.dev)) &&
+	    i915_gem_object_evictable((void __force *)obj))
+		assert_object_held_shared(obj);
+#endif
+	return obj->mem_flags & I915_BO_FLAG_IOMEM;
+}
+
 void i915_gem_init__objects(struct drm_i915_private *i915)
 {
 	INIT_WORK(&i915->mm.free_work, __i915_gem_free_work);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index e9eecebf5c9d..60c760ebde42 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -147,7 +147,7 @@ i915_gem_object_put(struct drm_i915_gem_object *obj)
 /*
  * If more than one potential simultaneous locker, assert held.
  */
-static inline void assert_object_held_shared(struct drm_i915_gem_object *obj)
+static inline void assert_object_held_shared(const struct drm_i915_gem_object *obj)
 {
 	/*
 	 * Note mm list lookup is protected by
@@ -261,17 +261,9 @@ i915_gem_object_type_has(const struct drm_i915_gem_object *obj,
 	return obj->ops->flags & flags;
 }
 
-static inline bool
-i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj)
-{
-	return obj->flags & I915_BO_ALLOC_STRUCT_PAGE;
-}
+bool i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj);
 
-static inline bool
-i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj)
-{
-	return i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM);
-}
+bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj);
 
 static inline bool
 i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2a23b77424b3..fb9ccc3f50e7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -33,10 +33,9 @@ struct i915_lut_handle {
 
 struct drm_i915_gem_object_ops {
 	unsigned int flags;
-#define I915_GEM_OBJECT_HAS_IOMEM	BIT(1)
-#define I915_GEM_OBJECT_IS_SHRINKABLE	BIT(2)
-#define I915_GEM_OBJECT_IS_PROXY	BIT(3)
-#define I915_GEM_OBJECT_NO_MMAP		BIT(4)
+#define I915_GEM_OBJECT_IS_SHRINKABLE	BIT(1)
+#define I915_GEM_OBJECT_IS_PROXY	BIT(2)
+#define I915_GEM_OBJECT_NO_MMAP		BIT(3)
 
 	/* Interface between the GEM object and its backing storage.
 	 * get_pages() is called once prior to the use of the associated set
@@ -201,17 +200,18 @@ struct drm_i915_gem_object {
 	unsigned long flags;
 #define I915_BO_ALLOC_CONTIGUOUS BIT(0)
 #define I915_BO_ALLOC_VOLATILE   BIT(1)
-#define I915_BO_ALLOC_STRUCT_PAGE BIT(2)
-#define I915_BO_ALLOC_CPU_CLEAR  BIT(3)
-#define I915_BO_ALLOC_USER       BIT(4)
+#define I915_BO_ALLOC_CPU_CLEAR  BIT(2)
+#define I915_BO_ALLOC_USER       BIT(3)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 			     I915_BO_ALLOC_VOLATILE | \
-			     I915_BO_ALLOC_STRUCT_PAGE | \
 			     I915_BO_ALLOC_CPU_CLEAR | \
 			     I915_BO_ALLOC_USER)
-#define I915_BO_READONLY         BIT(5)
-#define I915_TILING_QUIRK_BIT    6 /* unknown swizzling; do not release! */
+#define I915_BO_READONLY         BIT(4)
+#define I915_TILING_QUIRK_BIT    5 /* unknown swizzling; do not release! */
 
+	unsigned int mem_flags:2;
+#define I915_BO_FLAG_STRUCT_PAGE BIT(0)
+#define I915_BO_FLAG_IOMEM       BIT(1)
 	/*
 	 * Is the object to be mapped as read-only to the GPU
 	 * Only honoured if hardware has relevant pte bit
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 086005c1c7ea..f2f850e31b8e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -351,7 +351,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 	int err;
 
 	if (!i915_gem_object_has_struct_page(obj) &&
-	    !i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM))
+	    !i915_gem_object_has_iomem(obj))
 		return ERR_PTR(-ENXIO);
 
 	assert_object_held(obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
index be72ad0634ba..7986612f48fa 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
@@ -76,7 +76,7 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
 	intel_gt_chipset_flush(&to_i915(obj->base.dev)->gt);
 
 	/* We're no longer struct page backed */
-	obj->flags &= ~I915_BO_ALLOC_STRUCT_PAGE;
+	obj->mem_flags &= ~I915_BO_FLAG_STRUCT_PAGE;
 	__i915_gem_object_set_pages(obj, st, sg->length);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 5d16c4462fda..3648ae1d6628 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -284,6 +284,7 @@ __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj,
 				bool needs_clflush)
 {
 	GEM_BUG_ON(obj->mm.madv == __I915_MADV_PURGED);
+	GEM_WARN_ON(IS_DGFX(to_i915(obj->base.dev)));
 
 	if (obj->mm.madv == I915_MADV_DONTNEED)
 		obj->mm.dirty = false;
@@ -302,6 +303,7 @@ void i915_gem_object_put_pages_shmem(struct drm_i915_gem_object *obj, struct sg_
 	struct pagevec pvec;
 	struct page *page;
 
+	GEM_WARN_ON(IS_DGFX(to_i915(obj->base.dev)));
 	__i915_gem_object_release_shmem(obj, pages, true);
 
 	i915_gem_gtt_finish_pages(obj, pages);
@@ -444,7 +446,7 @@ shmem_pread(struct drm_i915_gem_object *obj,
 
 static void shmem_release(struct drm_i915_gem_object *obj)
 {
-	if (obj->flags & I915_BO_ALLOC_STRUCT_PAGE)
+	if (i915_gem_object_has_struct_page(obj))
 		i915_gem_object_release_memory_region(obj);
 
 	fput(obj->base.filp);
@@ -513,9 +515,8 @@ static int shmem_object_init(struct intel_memory_region *mem,
 	mapping_set_gfp_mask(mapping, mask);
 	GEM_BUG_ON(!(mapping_gfp_mask(mapping) & __GFP_RECLAIM));
 
-	i915_gem_object_init(obj, &i915_gem_shmem_ops, &lock_class,
-			     I915_BO_ALLOC_STRUCT_PAGE);
-
+	i915_gem_object_init(obj, &i915_gem_shmem_ops, &lock_class, 0);
+	obj->mem_flags |= I915_BO_FLAG_STRUCT_PAGE;
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 
@@ -561,6 +562,7 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv,
 	resource_size_t offset;
 	int err;
 
+	GEM_WARN_ON(IS_DGFX(dev_priv));
 	obj = i915_gem_object_create_shmem(dev_priv, round_up(size, PAGE_SIZE));
 	if (IS_ERR(obj))
 		return obj;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index bf33724bed5c..33ab47f1e05b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -563,7 +563,6 @@ static u64 i915_ttm_mmap_offset(struct drm_i915_gem_object *obj)
 
 const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
 	.name = "i915_gem_object_ttm",
-	.flags = I915_GEM_OBJECT_HAS_IOMEM,
 
 	.get_pages = i915_ttm_get_pages,
 	.put_pages = i915_ttm_put_pages,
@@ -620,6 +619,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_gem_object_init_memory_region(obj, mem);
 	i915_gem_object_make_unshrinkable(obj);
 	obj->read_domains = I915_GEM_DOMAIN_WC | I915_GEM_DOMAIN_GTT;
+	obj->mem_flags |= I915_BO_FLAG_IOMEM;
 	i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
 	INIT_RADIX_TREE(&obj->ttm.get_io_page.radix, GFP_KERNEL | __GFP_NOWARN);
 	mutex_init(&obj->ttm.get_io_page.lock);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 7487bab11f0b..e99fce4429a7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -538,8 +538,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
 		return -ENOMEM;
 
 	drm_gem_private_object_init(dev, &obj->base, args->user_size);
-	i915_gem_object_init(obj, &i915_gem_userptr_ops, &lock_class,
-			     I915_BO_ALLOC_STRUCT_PAGE);
+	i915_gem_object_init(obj, &i915_gem_userptr_ops, &lock_class, 0);
+	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
index 0c8ecfdf5405..f963b8e1e37b 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
@@ -114,8 +114,8 @@ huge_gem_object(struct drm_i915_private *i915,
 		return ERR_PTR(-ENOMEM);
 
 	drm_gem_private_object_init(&i915->drm, &obj->base, dma_size);
-	i915_gem_object_init(obj, &huge_ops, &lock_class,
-			     I915_BO_ALLOC_STRUCT_PAGE);
+	i915_gem_object_init(obj, &huge_ops, &lock_class, 0);
+	obj->mem_flags |= I915_BO_FLAG_STRUCT_PAGE;
 
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index dadd485bc52f..ccc67ed1a84b 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -167,9 +167,8 @@ huge_pages_object(struct drm_i915_private *i915,
 		return ERR_PTR(-ENOMEM);
 
 	drm_gem_private_object_init(&i915->drm, &obj->base, size);
-	i915_gem_object_init(obj, &huge_page_ops, &lock_class,
-			     I915_BO_ALLOC_STRUCT_PAGE);
-
+	i915_gem_object_init(obj, &huge_page_ops, &lock_class, 0);
+	obj->mem_flags |= I915_BO_FLAG_STRUCT_PAGE;
 	i915_gem_object_set_volatile(obj);
 
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 44b5de06ce64..fcea0ddabcc5 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -837,7 +837,7 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum i915_mmap_type type)
 
 	if (type != I915_MMAP_TYPE_GTT &&
 	    !i915_gem_object_has_struct_page(obj) &&
-	    !i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM))
+	    !i915_gem_object_has_iomem(obj))
 		return false;
 
 	return true;
@@ -991,7 +991,7 @@ static const char *repr_mmap_type(enum i915_mmap_type type)
 static bool can_access(const struct drm_i915_gem_object *obj)
 {
 	return i915_gem_object_has_struct_page(obj) ||
-	       i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM);
+	       i915_gem_object_has_iomem(obj);
 }
 
 static int __igt_mmap_access(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c
index 3a6ce87f8b52..d43d8dae0f69 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c
@@ -25,13 +25,14 @@ static int mock_phys_object(void *arg)
 		goto out;
 	}
 
+	i915_gem_object_lock(obj, NULL);
 	if (!i915_gem_object_has_struct_page(obj)) {
+		i915_gem_object_unlock(obj);
 		err = -EINVAL;
 		pr_err("shmem has no struct page\n");
 		goto out_obj;
 	}
 
-	i915_gem_object_lock(obj, NULL);
 	err = i915_gem_object_attach_phys(obj, PAGE_SIZE);
 	i915_gem_object_unlock(obj);
 	if (err) {
-- 
2.31.1


* [PATCH v3 2/4] drm/i915/ttm: Adjust gem flags and caching settings after a move
  2021-06-14  9:52 ` [Intel-gfx] " Thomas Hellström
@ 2021-06-14  9:52   ` Thomas Hellström
  -1 siblings, 0 replies; 18+ messages in thread
From: Thomas Hellström @ 2021-06-14  9:52 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

After a TTM move or object init we need to update the i915 gem flags and
caching settings to reflect the new placement. Currently caching settings
are not changed during the lifetime of an object, although that might
change moving forward if we run into performance issues or issues with
WC system page allocations.
Also introduce gpu_binds_iomem() and cpu_maps_iomem() to clean up the
various ways we previously used to detect iomem-backed placements.
Finally, initialize the TTM object reserved so that we can update the
flags and caching settings before anyone else gets hold of the object.
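
For reference, a condensed sketch (not part of the diff below) of the
object-init flow this enables:

	ret = ttm_bo_init_reserved(...);	/* object comes back locked */
	if (!ret) {
		obj->ttm.created = true;
		i915_ttm_adjust_domains_after_cpu_move(obj);
		i915_ttm_adjust_gem_after_move(obj);	/* mem_flags, cache level */
		i915_gem_object_unlock(obj);	/* only now visible to others */
	}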

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
v2:
- Style fixes (Reported by Matthew Auld)
v3:
- More style fixes. Clarify why we're updating caching settings after move.
  (Suggested by Matthew Auld)
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 111 +++++++++++++++++++-----
 1 file changed, 89 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 33ab47f1e05b..5176682a7d19 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -70,6 +70,17 @@ static struct ttm_placement i915_sys_placement = {
 	.busy_placement = &lmem0_sys_placement_flags[1],
 };
 
+static bool gpu_binds_iomem(struct ttm_resource *mem)
+{
+	return mem->mem_type != TTM_PL_SYSTEM;
+}
+
+static bool cpu_maps_iomem(struct ttm_resource *mem)
+{
+	/* Once / if we support GGTT, this is also false for cached ttm_tts */
+	return mem->mem_type != TTM_PL_SYSTEM;
+}
+
 static void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj);
 
 static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
@@ -175,6 +186,40 @@ static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
 	obj->ttm.cached_io_st = NULL;
 }
 
+static void
+i915_ttm_adjust_domains_after_cpu_move(struct drm_i915_gem_object *obj)
+{
+	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
+
+	if (cpu_maps_iomem(bo->resource) || bo->ttm->caching != ttm_cached) {
+		obj->write_domain = I915_GEM_DOMAIN_WC;
+		obj->read_domains = I915_GEM_DOMAIN_WC;
+	} else {
+		obj->write_domain = I915_GEM_DOMAIN_CPU;
+		obj->read_domains = I915_GEM_DOMAIN_CPU;
+	}
+}
+
+static void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
+	unsigned int cache_level;
+
+	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
+
+	obj->mem_flags |= cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
+		I915_BO_FLAG_STRUCT_PAGE;
+
+	if ((HAS_LLC(i915) || HAS_SNOOP(i915)) && !gpu_binds_iomem(bo->resource) &&
+	    bo->ttm->caching == ttm_cached)
+		cache_level = I915_CACHE_LLC;
+	else
+		cache_level = I915_CACHE_NONE;
+
+	i915_gem_object_set_cache_coherency(obj, cache_level);
+}
+
 static void i915_ttm_purge(struct drm_i915_gem_object *obj)
 {
 	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
@@ -190,8 +235,10 @@ static void i915_ttm_purge(struct drm_i915_gem_object *obj)
 
 	/* TTM's purge interface. Note that we might be reentering. */
 	ret = ttm_bo_validate(bo, &place, &ctx);
-
 	if (!ret) {
+		obj->write_domain = 0;
+		obj->read_domains = 0;
+		i915_ttm_adjust_gem_after_move(obj);
 		i915_ttm_free_cached_io_st(obj);
 		obj->mm.madv = __I915_MADV_PURGED;
 	}
@@ -273,12 +320,15 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 			 struct ttm_resource *res)
 {
 	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
-	struct ttm_resource_manager *man =
-		ttm_manager_type(bo->bdev, res->mem_type);
 
-	if (man->use_tt)
+	if (!gpu_binds_iomem(res))
 		return i915_ttm_tt_get_st(bo->ttm);
 
+	/*
+	 * If CPU mapping differs, we need to add the ttm_tt pages to
+	 * the resulting st. Might make sense for GGTT.
+	 */
+	GEM_WARN_ON(!cpu_maps_iomem(res));
 	return intel_region_ttm_node_to_st(obj->mm.region, res);
 }
 
@@ -290,8 +340,6 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
 	struct ttm_resource_manager *dst_man =
 		ttm_manager_type(bo->bdev, dst_mem->mem_type);
-	struct ttm_resource_manager *src_man =
-		ttm_manager_type(bo->bdev, bo->resource->mem_type);
 	struct intel_memory_region *dst_reg, *src_reg;
 	union {
 		struct ttm_kmap_iter_tt tt;
@@ -332,34 +380,36 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	if (IS_ERR(dst_st))
 		return PTR_ERR(dst_st);
 
-	/* If we start mapping GGTT, we can no longer use man::use_tt here. */
-	dst_iter = dst_man->use_tt ?
+	dst_iter = !cpu_maps_iomem(dst_mem) ?
 		ttm_kmap_iter_tt_init(&_dst_iter.tt, bo->ttm) :
 		ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
 					 dst_st, dst_reg->region.start);
 
-	src_iter = src_man->use_tt ?
+	src_iter = !cpu_maps_iomem(bo->resource) ?
 		ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
 		ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
 					 obj->ttm.cached_io_st,
 					 src_reg->region.start);
 
 	ttm_move_memcpy(bo, dst_mem->num_pages, dst_iter, src_iter);
+	/* Below dst_mem becomes bo->resource. */
 	ttm_bo_move_sync_cleanup(bo, dst_mem);
+	i915_ttm_adjust_domains_after_cpu_move(obj);
 	i915_ttm_free_cached_io_st(obj);
 
-	if (!dst_man->use_tt) {
+	if (gpu_binds_iomem(dst_mem) || cpu_maps_iomem(dst_mem)) {
 		obj->ttm.cached_io_st = dst_st;
 		obj->ttm.get_io_page.sg_pos = dst_st->sgl;
 		obj->ttm.get_io_page.sg_idx = 0;
 	}
 
+	i915_ttm_adjust_gem_after_move(obj);
 	return 0;
 }
 
 static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)
 {
-	if (mem->mem_type < I915_PL_LMEM0)
+	if (!cpu_maps_iomem(mem))
 		return 0;
 
 	mem->bus.caching = ttm_write_combined;
@@ -421,6 +471,16 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
 	if (ret)
 		return ret == -ENOSPC ? -ENXIO : ret;
 
+	i915_ttm_adjust_lru(obj);
+	if (bo->ttm && !ttm_tt_is_populated(bo->ttm)) {
+		ret = ttm_tt_populate(bo->bdev, bo->ttm, &ctx);
+		if (ret)
+			return ret;
+
+		i915_ttm_adjust_domains_after_cpu_move(obj);
+		i915_ttm_adjust_gem_after_move(obj);
+	}
+
 	/* Object either has a page vector or is an iomem object */
 	st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
 	if (IS_ERR(st))
@@ -428,8 +488,6 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
 
 	__i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
 
-	i915_ttm_adjust_lru(obj);
-
 	return ret;
 }
 
@@ -563,6 +621,7 @@ static u64 i915_ttm_mmap_offset(struct drm_i915_gem_object *obj)
 
 const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
 	.name = "i915_gem_object_ttm",
+	.flags = I915_GEM_OBJECT_IS_SHRINKABLE,
 
 	.get_pages = i915_ttm_get_pages,
 	.put_pages = i915_ttm_put_pages,
@@ -599,6 +658,10 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 {
 	static struct lock_class_key lock_class;
 	struct drm_i915_private *i915 = mem->i915;
+	struct ttm_operation_ctx ctx = {
+		.interruptible = true,
+		.no_wait_gpu = false,
+	};
 	enum ttm_bo_type bo_type;
 	size_t alignment = 0;
 	int ret;
@@ -618,15 +681,14 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_gem_object_init(obj, &i915_gem_ttm_obj_ops, &lock_class, flags);
 	i915_gem_object_init_memory_region(obj, mem);
 	i915_gem_object_make_unshrinkable(obj);
-	obj->read_domains = I915_GEM_DOMAIN_WC | I915_GEM_DOMAIN_GTT;
-	obj->mem_flags |= I915_BO_FLAG_IOMEM;
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
 	INIT_RADIX_TREE(&obj->ttm.get_io_page.radix, GFP_KERNEL | __GFP_NOWARN);
 	mutex_init(&obj->ttm.get_io_page.lock);
 
 	bo_type = (obj->flags & I915_BO_ALLOC_USER) ? ttm_bo_type_device :
 		ttm_bo_type_kernel;
 
+	obj->base.vma_node.driver_private = i915_gem_to_ttm(obj);
+
 	/*
 	 * If this function fails, it will call the destructor, but
 	 * our caller still owns the object. So no freeing in the
@@ -634,14 +696,19 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	 * Similarly, in delayed_destroy, we can't call ttm_bo_put()
 	 * until successful initialization.
 	 */
-	obj->base.vma_node.driver_private = i915_gem_to_ttm(obj);
-	ret = ttm_bo_init(&i915->bdev, i915_gem_to_ttm(obj), size,
-			  bo_type, &i915_sys_placement, alignment,
-			  true, NULL, NULL, i915_ttm_bo_destroy);
+	ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
+				   bo_type, &i915_sys_placement, alignment,
+				   &ctx, NULL, NULL, i915_ttm_bo_destroy);
+
+	if (ret)
+		goto out;
 
-	if (!ret)
-		obj->ttm.created = true;
+	obj->ttm.created = true;
+	i915_ttm_adjust_domains_after_cpu_move(obj);
+	i915_ttm_adjust_gem_after_move(obj);
+	i915_gem_object_unlock(obj);
 
+out:
 	/* i915 wants -ENXIO when out of memory region space. */
 	return (ret == -ENOSPC) ? -ENXIO : ret;
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Intel-gfx] [PATCH v3 2/4] drm/i915/ttm: Adjust gem flags and caching settings after a move
@ 2021-06-14  9:52   ` Thomas Hellström
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Hellström @ 2021-06-14  9:52 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

After a TTM move or object init we need to update the i915 gem flags and
caching settings to reflect the new placement. Currently caching settings
are not changed during the lifetime of an object, although that might
change moving forward if we run into performance issues or issues with
WC system page allocations.
Also introduce gpu_binds_iomem() and cpu_maps_iomem() to clean up the
various ways we previously used to detect this.
Finally, initialize the TTM object reserved to be able to update
flags and caching before anyone else gets hold of the object.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
v2:
- Style fixes (Reported by Matthew Auld)
v3:
- More style fixes. Clarify why we're updating caching settings after move.
  (Suggested by Matthew Auld)
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 111 +++++++++++++++++++-----
 1 file changed, 89 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 33ab47f1e05b..5176682a7d19 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -70,6 +70,17 @@ static struct ttm_placement i915_sys_placement = {
 	.busy_placement = &lmem0_sys_placement_flags[1],
 };
 
+static bool gpu_binds_iomem(struct ttm_resource *mem)
+{
+	return mem->mem_type != TTM_PL_SYSTEM;
+}
+
+static bool cpu_maps_iomem(struct ttm_resource *mem)
+{
+	/* Once / if we support GGTT, this is also false for cached ttm_tts */
+	return mem->mem_type != TTM_PL_SYSTEM;
+}
+
 static void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj);
 
 static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
@@ -175,6 +186,40 @@ static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
 	obj->ttm.cached_io_st = NULL;
 }
 
+static void
+i915_ttm_adjust_domains_after_cpu_move(struct drm_i915_gem_object *obj)
+{
+	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
+
+	if (cpu_maps_iomem(bo->resource) || bo->ttm->caching != ttm_cached) {
+		obj->write_domain = I915_GEM_DOMAIN_WC;
+		obj->read_domains = I915_GEM_DOMAIN_WC;
+	} else {
+		obj->write_domain = I915_GEM_DOMAIN_CPU;
+		obj->read_domains = I915_GEM_DOMAIN_CPU;
+	}
+}
+
+static void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
+	unsigned int cache_level;
+
+	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
+
+	obj->mem_flags |= cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
+		I915_BO_FLAG_STRUCT_PAGE;
+
+	if ((HAS_LLC(i915) || HAS_SNOOP(i915)) && !gpu_binds_iomem(bo->resource) &&
+	    bo->ttm->caching == ttm_cached)
+		cache_level = I915_CACHE_LLC;
+	else
+		cache_level = I915_CACHE_NONE;
+
+	i915_gem_object_set_cache_coherency(obj, cache_level);
+}
+
 static void i915_ttm_purge(struct drm_i915_gem_object *obj)
 {
 	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
@@ -190,8 +235,10 @@ static void i915_ttm_purge(struct drm_i915_gem_object *obj)
 
 	/* TTM's purge interface. Note that we might be reentering. */
 	ret = ttm_bo_validate(bo, &place, &ctx);
-
 	if (!ret) {
+		obj->write_domain = 0;
+		obj->read_domains = 0;
+		i915_ttm_adjust_gem_after_move(obj);
 		i915_ttm_free_cached_io_st(obj);
 		obj->mm.madv = __I915_MADV_PURGED;
 	}
@@ -273,12 +320,15 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 			 struct ttm_resource *res)
 {
 	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
-	struct ttm_resource_manager *man =
-		ttm_manager_type(bo->bdev, res->mem_type);
 
-	if (man->use_tt)
+	if (!gpu_binds_iomem(res))
 		return i915_ttm_tt_get_st(bo->ttm);
 
+	/*
+	 * If CPU mapping differs, we need to add the ttm_tt pages to
+	 * the resulting st. Might make sense for GGTT.
+	 */
+	GEM_WARN_ON(!cpu_maps_iomem(res));
 	return intel_region_ttm_node_to_st(obj->mm.region, res);
 }
 
@@ -290,8 +340,6 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
 	struct ttm_resource_manager *dst_man =
 		ttm_manager_type(bo->bdev, dst_mem->mem_type);
-	struct ttm_resource_manager *src_man =
-		ttm_manager_type(bo->bdev, bo->resource->mem_type);
 	struct intel_memory_region *dst_reg, *src_reg;
 	union {
 		struct ttm_kmap_iter_tt tt;
@@ -332,34 +380,36 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	if (IS_ERR(dst_st))
 		return PTR_ERR(dst_st);
 
-	/* If we start mapping GGTT, we can no longer use man::use_tt here. */
-	dst_iter = dst_man->use_tt ?
+	dst_iter = !cpu_maps_iomem(dst_mem) ?
 		ttm_kmap_iter_tt_init(&_dst_iter.tt, bo->ttm) :
 		ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
 					 dst_st, dst_reg->region.start);
 
-	src_iter = src_man->use_tt ?
+	src_iter = !cpu_maps_iomem(bo->resource) ?
 		ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
 		ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
 					 obj->ttm.cached_io_st,
 					 src_reg->region.start);
 
 	ttm_move_memcpy(bo, dst_mem->num_pages, dst_iter, src_iter);
+	/* Below dst_mem becomes bo->resource. */
 	ttm_bo_move_sync_cleanup(bo, dst_mem);
+	i915_ttm_adjust_domains_after_cpu_move(obj);
 	i915_ttm_free_cached_io_st(obj);
 
-	if (!dst_man->use_tt) {
+	if (gpu_binds_iomem(dst_mem) || cpu_maps_iomem(dst_mem)) {
 		obj->ttm.cached_io_st = dst_st;
 		obj->ttm.get_io_page.sg_pos = dst_st->sgl;
 		obj->ttm.get_io_page.sg_idx = 0;
 	}
 
+	i915_ttm_adjust_gem_after_move(obj);
 	return 0;
 }
 
 static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)
 {
-	if (mem->mem_type < I915_PL_LMEM0)
+	if (!cpu_maps_iomem(mem))
 		return 0;
 
 	mem->bus.caching = ttm_write_combined;
@@ -421,6 +471,16 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
 	if (ret)
 		return ret == -ENOSPC ? -ENXIO : ret;
 
+	i915_ttm_adjust_lru(obj);
+	if (bo->ttm && !ttm_tt_is_populated(bo->ttm)) {
+		ret = ttm_tt_populate(bo->bdev, bo->ttm, &ctx);
+		if (ret)
+			return ret;
+
+		i915_ttm_adjust_domains_after_cpu_move(obj);
+		i915_ttm_adjust_gem_after_move(obj);
+	}
+
 	/* Object either has a page vector or is an iomem object */
 	st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
 	if (IS_ERR(st))
@@ -428,8 +488,6 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
 
 	__i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
 
-	i915_ttm_adjust_lru(obj);
-
 	return ret;
 }
 
@@ -563,6 +621,7 @@ static u64 i915_ttm_mmap_offset(struct drm_i915_gem_object *obj)
 
 const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
 	.name = "i915_gem_object_ttm",
+	.flags = I915_GEM_OBJECT_IS_SHRINKABLE,
 
 	.get_pages = i915_ttm_get_pages,
 	.put_pages = i915_ttm_put_pages,
@@ -599,6 +658,10 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 {
 	static struct lock_class_key lock_class;
 	struct drm_i915_private *i915 = mem->i915;
+	struct ttm_operation_ctx ctx = {
+		.interruptible = true,
+		.no_wait_gpu = false,
+	};
 	enum ttm_bo_type bo_type;
 	size_t alignment = 0;
 	int ret;
@@ -618,15 +681,14 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_gem_object_init(obj, &i915_gem_ttm_obj_ops, &lock_class, flags);
 	i915_gem_object_init_memory_region(obj, mem);
 	i915_gem_object_make_unshrinkable(obj);
-	obj->read_domains = I915_GEM_DOMAIN_WC | I915_GEM_DOMAIN_GTT;
-	obj->mem_flags |= I915_BO_FLAG_IOMEM;
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
 	INIT_RADIX_TREE(&obj->ttm.get_io_page.radix, GFP_KERNEL | __GFP_NOWARN);
 	mutex_init(&obj->ttm.get_io_page.lock);
 
 	bo_type = (obj->flags & I915_BO_ALLOC_USER) ? ttm_bo_type_device :
 		ttm_bo_type_kernel;
 
+	obj->base.vma_node.driver_private = i915_gem_to_ttm(obj);
+
 	/*
 	 * If this function fails, it will call the destructor, but
 	 * our caller still owns the object. So no freeing in the
@@ -634,14 +696,19 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	 * Similarly, in delayed_destroy, we can't call ttm_bo_put()
 	 * until successful initialization.
 	 */
-	obj->base.vma_node.driver_private = i915_gem_to_ttm(obj);
-	ret = ttm_bo_init(&i915->bdev, i915_gem_to_ttm(obj), size,
-			  bo_type, &i915_sys_placement, alignment,
-			  true, NULL, NULL, i915_ttm_bo_destroy);
+	ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
+				   bo_type, &i915_sys_placement, alignment,
+				   &ctx, NULL, NULL, i915_ttm_bo_destroy);
+
+	if (ret)
+		goto out;
 
-	if (!ret)
-		obj->ttm.created = true;
+	obj->ttm.created = true;
+	i915_ttm_adjust_domains_after_cpu_move(obj);
+	i915_ttm_adjust_gem_after_move(obj);
+	i915_gem_object_unlock(obj);
 
+out:
 	/* i915 wants -ENXIO when out of memory region space. */
 	return (ret == -ENOSPC) ? -ENXIO : ret;
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 3/4] drm/i915/ttm: Calculate the object placement at get_pages time
  2021-06-14  9:52 ` [Intel-gfx] " Thomas Hellström
@ 2021-06-14  9:52   ` Thomas Hellström
  -1 siblings, 0 replies; 18+ messages in thread
From: Thomas Hellström @ 2021-06-14  9:52 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

Instead of relying on a static placement, calculate the placement at
get_pages() time. This should work for LMEM regions and system memory for
now. For stolen memory we need to take the preallocated range into
account; that will be added later if needed.
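
As an illustration, for an object created with a two-entry placement list
{ lmem0, smem }, i915_ttm_placement_from_obj() below ends up with

  placement.num_placement      = 1  (requested: lmem0)
  placement.num_busy_placement = 2  (busy: lmem0, smem)

so ttm_bo_validate() can fall back to system memory under memory pressure
while the preferred placement remains the first user-specified region.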

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
v2:
- Fixed a style issue (Reported by Matthew Auld)
v3:
- Make sure we don't add more placements to the stack-allocated vector
  than there is room for. (Reported by Matthew Auld)
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 94 ++++++++++++++++++-------
 drivers/gpu/drm/i915/intel_region_ttm.c |  8 ++-
 drivers/gpu/drm/i915/intel_region_ttm.h |  2 +
 3 files changed, 77 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 5176682a7d19..33213be45050 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -24,6 +24,11 @@
 #define I915_TTM_PRIO_NO_PAGES  1
 #define I915_TTM_PRIO_HAS_PAGES 2
 
+/*
+ * Size of struct ttm_place vector in on-stack struct ttm_placement allocs
+ */
+#define I915_TTM_MAX_PLACEMENTS INTEL_REGION_UNKNOWN
+
 /**
  * struct i915_ttm_tt - TTM page vector with additional private information
  * @ttm: The base TTM page vector.
@@ -42,32 +47,18 @@ struct i915_ttm_tt {
 	struct sg_table *cached_st;
 };
 
-static const struct ttm_place lmem0_sys_placement_flags[] = {
-	{
-		.fpfn = 0,
-		.lpfn = 0,
-		.mem_type = I915_PL_LMEM0,
-		.flags = 0,
-	}, {
-		.fpfn = 0,
-		.lpfn = 0,
-		.mem_type = I915_PL_SYSTEM,
-		.flags = 0,
-	}
-};
-
-static struct ttm_placement i915_lmem0_placement = {
-	.num_placement = 1,
-	.placement = &lmem0_sys_placement_flags[0],
-	.num_busy_placement = 1,
-	.busy_placement = &lmem0_sys_placement_flags[0],
+static const struct ttm_place sys_placement_flags = {
+	.fpfn = 0,
+	.lpfn = 0,
+	.mem_type = I915_PL_SYSTEM,
+	.flags = 0,
 };
 
 static struct ttm_placement i915_sys_placement = {
 	.num_placement = 1,
-	.placement = &lmem0_sys_placement_flags[1],
+	.placement = &sys_placement_flags,
 	.num_busy_placement = 1,
-	.busy_placement = &lmem0_sys_placement_flags[1],
+	.busy_placement = &sys_placement_flags,
 };
 
 static bool gpu_binds_iomem(struct ttm_resource *mem)
@@ -83,6 +74,55 @@ static bool cpu_maps_iomem(struct ttm_resource *mem)
 
 static void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj);
 
+static enum ttm_caching
+i915_ttm_select_tt_caching(const struct drm_i915_gem_object *obj)
+{
+	/*
+	 * Objects only allowed in system get cached cpu-mappings.
+	 * Other objects get WC mapping for now. Even if in system.
+	 */
+	if (obj->mm.region->type == INTEL_MEMORY_SYSTEM &&
+	    obj->mm.n_placements <= 1)
+		return ttm_cached;
+
+	return ttm_write_combined;
+}
+
+static void
+i915_ttm_place_from_region(const struct intel_memory_region *mr,
+			   struct ttm_place *place)
+{
+	memset(place, 0, sizeof(*place));
+	place->mem_type = intel_region_to_ttm_type(mr);
+}
+
+static void
+i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj,
+			    struct ttm_place *requested,
+			    struct ttm_place *busy,
+			    struct ttm_placement *placement)
+{
+	unsigned int num_allowed = obj->mm.n_placements;
+	unsigned int i;
+
+	placement->num_placement = 1;
+	i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] :
+				   obj->mm.region, requested);
+
+	/* Cache this on object? */
+	placement->num_busy_placement = num_allowed;
+	for (i = 0; i < placement->num_busy_placement; ++i)
+		i915_ttm_place_from_region(obj->mm.placements[i], busy + i);
+
+	if (num_allowed == 0) {
+		*busy = *requested;
+		placement->num_busy_placement = 1;
+	}
+
+	placement->placement = requested;
+	placement->busy_placement = busy;
+}
+
 static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 					 uint32_t page_flags)
 {
@@ -100,7 +140,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 	    man->use_tt)
 		page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC;
 
-	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, ttm_write_combined);
+	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags,
+			  i915_ttm_select_tt_caching(obj));
 	if (ret) {
 		kfree(i915_tt);
 		return NULL;
@@ -464,10 +505,15 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
 		.no_wait_gpu = false,
 	};
 	struct sg_table *st;
+	struct ttm_place requested, busy[I915_TTM_MAX_PLACEMENTS];
+	struct ttm_placement placement;
 	int ret;
 
+	GEM_BUG_ON(obj->mm.n_placements > I915_TTM_MAX_PLACEMENTS);
+
 	/* Move to the requested placement. */
-	ret = ttm_bo_validate(bo, &i915_lmem0_placement, &ctx);
+	i915_ttm_placement_from_obj(obj, &requested, busy, &placement);
+	ret = ttm_bo_validate(bo, &placement, &ctx);
 	if (ret)
 		return ret == -ENOSPC ? -ENXIO : ret;
 
@@ -683,7 +729,6 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_gem_object_make_unshrinkable(obj);
 	INIT_RADIX_TREE(&obj->ttm.get_io_page.radix, GFP_KERNEL | __GFP_NOWARN);
 	mutex_init(&obj->ttm.get_io_page.lock);
-
 	bo_type = (obj->flags & I915_BO_ALLOC_USER) ? ttm_bo_type_device :
 		ttm_bo_type_kernel;
 
@@ -707,7 +752,6 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	i915_ttm_adjust_domains_after_cpu_move(obj);
 	i915_ttm_adjust_gem_after_move(obj);
 	i915_gem_object_unlock(obj);
-
 out:
 	/* i915 wants -ENXIO when out of memory region space. */
 	return (ret == -ENOSPC) ? -ENXIO : ret;
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 27fe0668d094..5a664f6cc93f 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -50,12 +50,16 @@ void intel_region_ttm_device_fini(struct drm_i915_private *dev_priv)
  * driver-private types for now, reserving TTM_PL_VRAM for stolen
  * memory and TTM_PL_TT for GGTT use if decided to implement this.
  */
-static int intel_region_to_ttm_type(struct intel_memory_region *mem)
+int intel_region_to_ttm_type(const struct intel_memory_region *mem)
 {
 	int type;
 
 	GEM_BUG_ON(mem->type != INTEL_MEMORY_LOCAL &&
-		   mem->type != INTEL_MEMORY_MOCK);
+		   mem->type != INTEL_MEMORY_MOCK &&
+		   mem->type != INTEL_MEMORY_SYSTEM);
+
+	if (mem->type == INTEL_MEMORY_SYSTEM)
+		return TTM_PL_SYSTEM;
 
 	type = mem->instance + TTM_PL_PRIV;
 	GEM_BUG_ON(type >= TTM_NUM_MEM_TYPES);
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
index e8cf830fda6f..649491844e79 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.h
+++ b/drivers/gpu/drm/i915/intel_region_ttm.h
@@ -28,6 +28,8 @@ struct sg_table *intel_region_ttm_node_to_st(struct intel_memory_region *mem,
 void intel_region_ttm_node_free(struct intel_memory_region *mem,
 				struct ttm_resource *node);
 
+int intel_region_to_ttm_type(const struct intel_memory_region *mem);
+
 struct ttm_device_funcs *i915_ttm_driver(void);
 
 #ifdef CONFIG_DRM_I915_SELFTEST
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 4/4] drm/i915/ttm: Use TTM for system memory
  2021-06-14  9:52 ` [Intel-gfx] " Thomas Hellström
@ 2021-06-14  9:52   ` Thomas Hellström
  -1 siblings, 0 replies; 18+ messages in thread
From: Thomas Hellström @ 2021-06-14  9:52 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, matthew.auld

For discrete, use TTM for both cached and WC system memory. That means
we currently rely on the TTM memory accounting / shrinker. For cached
system memory we should consider keeping it shmem-backed, which can be
implemented from our ttm_tt_populate callback. We can then also reuse our
own very elaborate shrinker for that memory.
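
A rough sketch of what such a shmem-backed populate callback could look
like is below. It is purely illustrative and not part of this series: the
i915_tt->filp shmem file is an assumption, and the shrinker wiring,
dirty-page writeback and swap handling are omitted.

static int i915_ttm_tt_shmem_populate(struct ttm_device *bdev,
				      struct ttm_tt *ttm,
				      struct ttm_operation_ctx *ctx)
{
	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
	/* Assumed: a shmem file attached to the tt at creation time. */
	struct address_space *mapping = i915_tt->filp->f_mapping;
	pgoff_t i;

	/* Pull each backing page out of shmem into the TTM page vector. */
	for (i = 0; i < ttm->num_pages; ++i) {
		struct page *page = shmem_read_mapping_page(mapping, i);

		if (IS_ERR(page)) {
			while (i--)
				put_page(ttm->pages[i]);
			return PTR_ERR(page);
		}

		ttm->pages[i] = page;
	}

	return 0;
}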

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
v2:
- Fix IS_ERR_OR_NULL() check to IS_ERR() (Reported by Matthew Auld)
v3:
- Commit message typo fix
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c    | 22 ++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h            |  3 ---
 drivers/gpu/drm/i915/intel_memory_region.c |  7 ++++++-
 drivers/gpu/drm/i915/intel_memory_region.h |  8 ++++++++
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 33213be45050..1fb32593b2f0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -756,3 +756,25 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	/* i915 wants -ENXIO when out of memory region space. */
 	return (ret == -ENOSPC) ? -ENXIO : ret;
 }
+
+static const struct intel_memory_region_ops ttm_system_region_ops = {
+	.init_object = __i915_gem_ttm_object_init,
+};
+
+struct intel_memory_region *
+i915_gem_ttm_system_setup(struct drm_i915_private *i915,
+			  u16 type, u16 instance)
+{
+	struct intel_memory_region *mr;
+
+	mr = intel_memory_region_create(i915, 0,
+					totalram_pages() << PAGE_SHIFT,
+					PAGE_SIZE, 0,
+					type, instance,
+					&ttm_system_region_ops);
+	if (IS_ERR(mr))
+		return mr;
+
+	intel_memory_region_set_name(mr, "system-ttm");
+	return mr;
+}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 38ff2fb89744..9643bebb951d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1751,9 +1751,6 @@ void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv);
 void i915_gem_init_early(struct drm_i915_private *dev_priv);
 void i915_gem_cleanup_early(struct drm_i915_private *dev_priv);
 
-struct intel_memory_region *i915_gem_shmem_setup(struct drm_i915_private *i915,
-						 u16 type, u16 instance);
-
 static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
 {
 	/*
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index 12fb5423fd5e..0b016bdb8b84 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -220,7 +220,12 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915)
 		instance = intel_region_map[i].instance;
 		switch (type) {
 		case INTEL_MEMORY_SYSTEM:
-			mem = i915_gem_shmem_setup(i915, type, instance);
+			if (IS_DGFX(i915))
+				mem = i915_gem_ttm_system_setup(i915, type,
+								instance);
+			else
+				mem = i915_gem_shmem_setup(i915, type,
+							   instance);
 			break;
 		case INTEL_MEMORY_STOLEN_LOCAL:
 			mem = i915_gem_stolen_lmem_setup(i915, type, instance);
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index c7e635d62e1a..1a2bb9fc9de5 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -143,4 +143,12 @@ void intel_memory_region_unreserve(struct intel_memory_region *mem);
 int intel_memory_region_reserve(struct intel_memory_region *mem,
 				resource_size_t offset,
 				resource_size_t size);
+
+struct intel_memory_region *
+i915_gem_ttm_system_setup(struct drm_i915_private *i915,
+			  u16 type, u16 instance);
+struct intel_memory_region *
+i915_gem_shmem_setup(struct drm_i915_private *i915,
+		     u16 type, u16 instance);
+
 #endif
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Intel-gfx] [PATCH v3 2/4] drm/i915/ttm: Adjust gem flags and caching settings after a move
  2021-06-14  9:52   ` [Intel-gfx] " Thomas Hellström
@ 2021-06-14 10:20     ` Matthew Auld
  -1 siblings, 0 replies; 18+ messages in thread
From: Matthew Auld @ 2021-06-14 10:20 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Intel Graphics Development, Matthew Auld, ML dri-devel

On Mon, 14 Jun 2021 at 10:53, Thomas Hellström
<thomas.hellstrom@linux.intel.com> wrote:
>
> After a TTM move or object init we need to update the i915 gem flags and
> caching settings to reflect the new placement. Currently caching settings
> are not changed during the lifetime of an object, although that might
> change moving forward if we run into performance issues or issues with
> WC system page allocations.
> Also introduce gpu_binds_iomem() and cpu_maps_iomem() to clean up the
> various ways we previously used to detect this.
> Finally, initialize the TTM object reserved to be able to update
> flags and caching before anyone else gets hold of the object.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> v2:
> - Style fixes (Reported by Matthew Auld)
> v3:
> - More style fixes. Clarify why we're updating caching settings after move.
>   (Suggested by Matthew Auld)
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 111 +++++++++++++++++++-----
>  1 file changed, 89 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 33ab47f1e05b..5176682a7d19 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -70,6 +70,17 @@ static struct ttm_placement i915_sys_placement = {
>         .busy_placement = &lmem0_sys_placement_flags[1],
>  };
>
> +static bool gpu_binds_iomem(struct ttm_resource *mem)
> +{
> +       return mem->mem_type != TTM_PL_SYSTEM;
> +}
> +
> +static bool cpu_maps_iomem(struct ttm_resource *mem)
> +{
> +       /* Once / if we support GGTT, this is also false for cached ttm_tts */
> +       return mem->mem_type != TTM_PL_SYSTEM;
> +}
> +
>  static void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj);
>
>  static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
> @@ -175,6 +186,40 @@ static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
>         obj->ttm.cached_io_st = NULL;
>  }
>
> +static void
> +i915_ttm_adjust_domains_after_cpu_move(struct drm_i915_gem_object *obj)
> +{
> +       struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> +
> +       if (cpu_maps_iomem(bo->resource) || bo->ttm->caching != ttm_cached) {
> +               obj->write_domain = I915_GEM_DOMAIN_WC;
> +               obj->read_domains = I915_GEM_DOMAIN_WC;
> +       } else {
> +               obj->write_domain = I915_GEM_DOMAIN_CPU;
> +               obj->read_domains = I915_GEM_DOMAIN_CPU;
> +       }
> +}
> +
> +static void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> +{
> +       struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +       struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> +       unsigned int cache_level;
> +
> +       obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
> +
> +       obj->mem_flags |= cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
> +               I915_BO_FLAG_STRUCT_PAGE;
> +
> +       if ((HAS_LLC(i915) || HAS_SNOOP(i915)) && !gpu_binds_iomem(bo->resource) &&
> +           bo->ttm->caching == ttm_cached)
> +               cache_level = I915_CACHE_LLC;
> +       else
> +               cache_level = I915_CACHE_NONE;
> +
> +       i915_gem_object_set_cache_coherency(obj, cache_level);
> +}
> +
>  static void i915_ttm_purge(struct drm_i915_gem_object *obj)
>  {
>         struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> @@ -190,8 +235,10 @@ static void i915_ttm_purge(struct drm_i915_gem_object *obj)
>
>         /* TTM's purge interface. Note that we might be reentering. */
>         ret = ttm_bo_validate(bo, &place, &ctx);
> -
>         if (!ret) {
> +               obj->write_domain = 0;
> +               obj->read_domains = 0;
> +               i915_ttm_adjust_gem_after_move(obj);
>                 i915_ttm_free_cached_io_st(obj);
>                 obj->mm.madv = __I915_MADV_PURGED;
>         }
> @@ -273,12 +320,15 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
>                          struct ttm_resource *res)
>  {
>         struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> -       struct ttm_resource_manager *man =
> -               ttm_manager_type(bo->bdev, res->mem_type);
>
> -       if (man->use_tt)
> +       if (!gpu_binds_iomem(res))
>                 return i915_ttm_tt_get_st(bo->ttm);
>
> +       /*
> +        * If CPU mapping differs, we need to add the ttm_tt pages to
> +        * the resulting st. Might make sense for GGTT.
> +        */
> +       GEM_WARN_ON(!cpu_maps_iomem(res));
>         return intel_region_ttm_node_to_st(obj->mm.region, res);
>  }
>
> @@ -290,8 +340,6 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>         struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>         struct ttm_resource_manager *dst_man =
>                 ttm_manager_type(bo->bdev, dst_mem->mem_type);
> -       struct ttm_resource_manager *src_man =
> -               ttm_manager_type(bo->bdev, bo->resource->mem_type);
>         struct intel_memory_region *dst_reg, *src_reg;
>         union {
>                 struct ttm_kmap_iter_tt tt;
> @@ -332,34 +380,36 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>         if (IS_ERR(dst_st))
>                 return PTR_ERR(dst_st);
>
> -       /* If we start mapping GGTT, we can no longer use man::use_tt here. */
> -       dst_iter = dst_man->use_tt ?
> +       dst_iter = !cpu_maps_iomem(dst_mem) ?
>                 ttm_kmap_iter_tt_init(&_dst_iter.tt, bo->ttm) :
>                 ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
>                                          dst_st, dst_reg->region.start);
>
> -       src_iter = src_man->use_tt ?
> +       src_iter = !cpu_maps_iomem(bo->resource) ?
>                 ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
>                 ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
>                                          obj->ttm.cached_io_st,
>                                          src_reg->region.start);
>
>         ttm_move_memcpy(bo, dst_mem->num_pages, dst_iter, src_iter);
> +       /* Below dst_mem becomes bo->resource. */
>         ttm_bo_move_sync_cleanup(bo, dst_mem);
> +       i915_ttm_adjust_domains_after_cpu_move(obj);
>         i915_ttm_free_cached_io_st(obj);
>
> -       if (!dst_man->use_tt) {
> +       if (gpu_binds_iomem(dst_mem) || cpu_maps_iomem(dst_mem)) {
>                 obj->ttm.cached_io_st = dst_st;
>                 obj->ttm.get_io_page.sg_pos = dst_st->sgl;
>                 obj->ttm.get_io_page.sg_idx = 0;
>         }
>
> +       i915_ttm_adjust_gem_after_move(obj);
>         return 0;
>  }
>
>  static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)
>  {
> -       if (mem->mem_type < I915_PL_LMEM0)
> +       if (!cpu_maps_iomem(mem))
>                 return 0;
>
>         mem->bus.caching = ttm_write_combined;
> @@ -421,6 +471,16 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
>         if (ret)
>                 return ret == -ENOSPC ? -ENXIO : ret;
>
> +       i915_ttm_adjust_lru(obj);
> +       if (bo->ttm && !ttm_tt_is_populated(bo->ttm)) {
> +               ret = ttm_tt_populate(bo->bdev, bo->ttm, &ctx);
> +               if (ret)
> +                       return ret;
> +
> +               i915_ttm_adjust_domains_after_cpu_move(obj);
> +               i915_ttm_adjust_gem_after_move(obj);
> +       }
> +
>         /* Object either has a page vector or is an iomem object */
>         st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
>         if (IS_ERR(st))
> @@ -428,8 +488,6 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
>
>         __i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
>
> -       i915_ttm_adjust_lru(obj);
> -
>         return ret;
>  }
>
> @@ -563,6 +621,7 @@ static u64 i915_ttm_mmap_offset(struct drm_i915_gem_object *obj)
>
>  const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
>         .name = "i915_gem_object_ttm",
> +       .flags = I915_GEM_OBJECT_IS_SHRINKABLE,

We set is_shrinkable for both lmem and smem? Does the current shrinker
work with !shmem? I assume the put_pages() will just discard the page
contents, which is not what we want? Maybe keep this disabled for now?
Or am I missing something?

>
>         .get_pages = i915_ttm_get_pages,
>         .put_pages = i915_ttm_put_pages,
> @@ -599,6 +658,10 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
>  {
>         static struct lock_class_key lock_class;
>         struct drm_i915_private *i915 = mem->i915;
> +       struct ttm_operation_ctx ctx = {
> +               .interruptible = true,
> +               .no_wait_gpu = false,
> +       };
>         enum ttm_bo_type bo_type;
>         size_t alignment = 0;
>         int ret;
> @@ -618,15 +681,14 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
>         i915_gem_object_init(obj, &i915_gem_ttm_obj_ops, &lock_class, flags);
>         i915_gem_object_init_memory_region(obj, mem);
>         i915_gem_object_make_unshrinkable(obj);
> -       obj->read_domains = I915_GEM_DOMAIN_WC | I915_GEM_DOMAIN_GTT;
> -       obj->mem_flags |= I915_BO_FLAG_IOMEM;
> -       i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
>         INIT_RADIX_TREE(&obj->ttm.get_io_page.radix, GFP_KERNEL | __GFP_NOWARN);
>         mutex_init(&obj->ttm.get_io_page.lock);
>
>         bo_type = (obj->flags & I915_BO_ALLOC_USER) ? ttm_bo_type_device :
>                 ttm_bo_type_kernel;
>
> +       obj->base.vma_node.driver_private = i915_gem_to_ttm(obj);
> +
>         /*
>          * If this function fails, it will call the destructor, but
>          * our caller still owns the object. So no freeing in the
> @@ -634,14 +696,19 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
>          * Similarly, in delayed_destroy, we can't call ttm_bo_put()
>          * until successful initialization.
>          */
> -       obj->base.vma_node.driver_private = i915_gem_to_ttm(obj);
> -       ret = ttm_bo_init(&i915->bdev, i915_gem_to_ttm(obj), size,
> -                         bo_type, &i915_sys_placement, alignment,
> -                         true, NULL, NULL, i915_ttm_bo_destroy);
> +       ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
> +                                  bo_type, &i915_sys_placement, alignment,
> +                                  &ctx, NULL, NULL, i915_ttm_bo_destroy);
> +
> +       if (ret)
> +               goto out;
>
> -       if (!ret)
> -               obj->ttm.created = true;
> +       obj->ttm.created = true;
> +       i915_ttm_adjust_domains_after_cpu_move(obj);
> +       i915_ttm_adjust_gem_after_move(obj);
> +       i915_gem_object_unlock(obj);
>
> +out:
>         /* i915 wants -ENXIO when out of memory region space. */
>         return (ret == -ENOSPC) ? -ENXIO : ret;
>  }
> --
> 2.31.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Intel-gfx] [PATCH v3 2/4] drm/i915/ttm: Adjust gem flags and caching settings after a move
  2021-06-14 10:20     ` Matthew Auld
@ 2021-06-14 10:32       ` Thomas Hellström
  -1 siblings, 0 replies; 18+ messages in thread
From: Thomas Hellström @ 2021-06-14 10:32 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development, Matthew Auld, ML dri-devel


On 6/14/21 12:20 PM, Matthew Auld wrote:
> On Mon, 14 Jun 2021 at 10:53, Thomas Hellström
> <thomas.hellstrom@linux.intel.com> wrote:
>> After a TTM move or object init we need to update the i915 gem flags and
>> caching settings to reflect the new placement. Currently caching settings
>> are not changed during the lifetime of an object, although that might
>> change moving forward if we run into performance issues or issues with
>> WC system page allocations.
>> Also introduce gpu_binds_iomem() and cpu_maps_iomem() to clean up the
>> various ways we previously used to detect this.
>> Finally, initialize the TTM object reserved to be able to update
>> flags and caching before anyone else gets hold of the object.
>>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>> v2:
>> - Style fixes (Reported by Matthew Auld)
>> v3:
>> - More style fixes. Clarify why we're updating caching settings after move.
>>    (Suggested by Matthew Auld)
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 111 +++++++++++++++++++-----
>>   1 file changed, 89 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> index 33ab47f1e05b..5176682a7d19 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> @@ -70,6 +70,17 @@ static struct ttm_placement i915_sys_placement = {
>>          .busy_placement = &lmem0_sys_placement_flags[1],
>>   };
>>
>> +static bool gpu_binds_iomem(struct ttm_resource *mem)
>> +{
>> +       return mem->mem_type != TTM_PL_SYSTEM;
>> +}
>> +
>> +static bool cpu_maps_iomem(struct ttm_resource *mem)
>> +{
>> +       /* Once / if we support GGTT, this is also false for cached ttm_tts */
>> +       return mem->mem_type != TTM_PL_SYSTEM;
>> +}
>> +
>>   static void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj);
>>
>>   static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>> @@ -175,6 +186,40 @@ static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
>>          obj->ttm.cached_io_st = NULL;
>>   }
>>
>> +static void
>> +i915_ttm_adjust_domains_after_cpu_move(struct drm_i915_gem_object *obj)
>> +{
>> +       struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>> +
>> +       if (cpu_maps_iomem(bo->resource) || bo->ttm->caching != ttm_cached) {
>> +               obj->write_domain = I915_GEM_DOMAIN_WC;
>> +               obj->read_domains = I915_GEM_DOMAIN_WC;
>> +       } else {
>> +               obj->write_domain = I915_GEM_DOMAIN_CPU;
>> +               obj->read_domains = I915_GEM_DOMAIN_CPU;
>> +       }
>> +}
>> +
>> +static void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>> +{
>> +       struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +       struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>> +       unsigned int cache_level;
>> +
>> +       obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
>> +
>> +       obj->mem_flags |= cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
>> +               I915_BO_FLAG_STRUCT_PAGE;
>> +
>> +       if ((HAS_LLC(i915) || HAS_SNOOP(i915)) && !gpu_binds_iomem(bo->resource) &&
>> +           bo->ttm->caching == ttm_cached)
>> +               cache_level = I915_CACHE_LLC;
>> +       else
>> +               cache_level = I915_CACHE_NONE;
>> +
>> +       i915_gem_object_set_cache_coherency(obj, cache_level);
>> +}
>> +
>>   static void i915_ttm_purge(struct drm_i915_gem_object *obj)
>>   {
>>          struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>> @@ -190,8 +235,10 @@ static void i915_ttm_purge(struct drm_i915_gem_object *obj)
>>
>>          /* TTM's purge interface. Note that we might be reentering. */
>>          ret = ttm_bo_validate(bo, &place, &ctx);
>> -
>>          if (!ret) {
>> +               obj->write_domain = 0;
>> +               obj->read_domains = 0;
>> +               i915_ttm_adjust_gem_after_move(obj);
>>                  i915_ttm_free_cached_io_st(obj);
>>                  obj->mm.madv = __I915_MADV_PURGED;
>>          }
>> @@ -273,12 +320,15 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
>>                           struct ttm_resource *res)
>>   {
>>          struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>> -       struct ttm_resource_manager *man =
>> -               ttm_manager_type(bo->bdev, res->mem_type);
>>
>> -       if (man->use_tt)
>> +       if (!gpu_binds_iomem(res))
>>                  return i915_ttm_tt_get_st(bo->ttm);
>>
>> +       /*
>> +        * If CPU mapping differs, we need to add the ttm_tt pages to
>> +        * the resulting st. Might make sense for GGTT.
>> +        */
>> +       GEM_WARN_ON(!cpu_maps_iomem(res));
>>          return intel_region_ttm_node_to_st(obj->mm.region, res);
>>   }
>>
>> @@ -290,8 +340,6 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>>          struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>>          struct ttm_resource_manager *dst_man =
>>                  ttm_manager_type(bo->bdev, dst_mem->mem_type);
>> -       struct ttm_resource_manager *src_man =
>> -               ttm_manager_type(bo->bdev, bo->resource->mem_type);
>>          struct intel_memory_region *dst_reg, *src_reg;
>>          union {
>>                  struct ttm_kmap_iter_tt tt;
>> @@ -332,34 +380,36 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>>          if (IS_ERR(dst_st))
>>                  return PTR_ERR(dst_st);
>>
>> -       /* If we start mapping GGTT, we can no longer use man::use_tt here. */
>> -       dst_iter = dst_man->use_tt ?
>> +       dst_iter = !cpu_maps_iomem(dst_mem) ?
>>                  ttm_kmap_iter_tt_init(&_dst_iter.tt, bo->ttm) :
>>                  ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
>>                                           dst_st, dst_reg->region.start);
>>
>> -       src_iter = src_man->use_tt ?
>> +       src_iter = !cpu_maps_iomem(bo->resource) ?
>>                  ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
>>                  ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
>>                                           obj->ttm.cached_io_st,
>>                                           src_reg->region.start);
>>
>>          ttm_move_memcpy(bo, dst_mem->num_pages, dst_iter, src_iter);
>> +       /* Below dst_mem becomes bo->resource. */
>>          ttm_bo_move_sync_cleanup(bo, dst_mem);
>> +       i915_ttm_adjust_domains_after_cpu_move(obj);
>>          i915_ttm_free_cached_io_st(obj);
>>
>> -       if (!dst_man->use_tt) {
>> +       if (gpu_binds_iomem(dst_mem) || cpu_maps_iomem(dst_mem)) {
>>                  obj->ttm.cached_io_st = dst_st;
>>                  obj->ttm.get_io_page.sg_pos = dst_st->sgl;
>>                  obj->ttm.get_io_page.sg_idx = 0;
>>          }
>>
>> +       i915_ttm_adjust_gem_after_move(obj);
>>          return 0;
>>   }
>>
>>   static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)
>>   {
>> -       if (mem->mem_type < I915_PL_LMEM0)
>> +       if (!cpu_maps_iomem(mem))
>>                  return 0;
>>
>>          mem->bus.caching = ttm_write_combined;
>> @@ -421,6 +471,16 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
>>          if (ret)
>>                  return ret == -ENOSPC ? -ENXIO : ret;
>>
>> +       i915_ttm_adjust_lru(obj);
>> +       if (bo->ttm && !ttm_tt_is_populated(bo->ttm)) {
>> +               ret = ttm_tt_populate(bo->bdev, bo->ttm, &ctx);
>> +               if (ret)
>> +                       return ret;
>> +
>> +               i915_ttm_adjust_domains_after_cpu_move(obj);
>> +               i915_ttm_adjust_gem_after_move(obj);
>> +       }
>> +
>>          /* Object either has a page vector or is an iomem object */
>>          st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
>>          if (IS_ERR(st))
>> @@ -428,8 +488,6 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
>>
>>          __i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
>>
>> -       i915_ttm_adjust_lru(obj);
>> -
>>          return ret;
>>   }
>>
>> @@ -563,6 +621,7 @@ static u64 i915_ttm_mmap_offset(struct drm_i915_gem_object *obj)
>>
>>   const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
>>          .name = "i915_gem_object_ttm",
>> +       .flags = I915_GEM_OBJECT_IS_SHRINKABLE,
> We set is_shrinkable for both lmem and smem? Does the current shrinker
> work with !shmem? I assume the put_pages() will just discard the page
> contents, which is not what we want? Maybe keep this disabled for now?
> Or am I missing something?

This is to indicate to the gem shrinker that we want to be able to
determine dynamically whether the object is shrinkable or not (see
below). It's intended for the situation where we might want to use shmem
to back cached-only ttm objects and use our own shrinker for them.
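
Roughly, the longer-term intent is something like the sketch below. The
helper name and where it would be called from are my assumptions, not
code from this series:

	static void i915_ttm_adjust_shrinkable(struct drm_i915_gem_object *obj)
	{
		struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);

		/*
		 * Only cached, system-backed objects would be handed to the
		 * gem shrinker; everything else stays off the shrink lists
		 * even though the ops flag is set.
		 */
		if (bo->ttm && bo->ttm->caching == ttm_cached &&
		    !gpu_binds_iomem(bo->resource))
			i915_gem_object_make_shrinkable(obj);
		else
			i915_gem_object_make_unshrinkable(obj);
	}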

But if you think it's better, we could flip that ops flag once we 
actually have anything shrinkable in place. Let me know what you think. 
The TTM shmem pool idea becomes a bit doubtful by the fact that 
allocating some 8GB of system objects, populating and clearing them 
appears to be some 6X faster with a straightforward alloc approach (like 
TTM or the "internal" backend) than with shmem...
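
(Editorial aside, not part of the thread: a rough sketch of the split
described above, assuming the existing i915 helpers — the ops flag is the
static capability, while make_(un)shrinkable() toggles the dynamic
per-object state. The helper name below is made up for illustration.)

#include "gem/i915_gem_object.h"

/*
 * Hypothetical helper: only meaningful because the ops advertise
 * I915_GEM_OBJECT_IS_SHRINKABLE; the calls below then toggle whether
 * the gem shrinker may actually consider this object.
 */
static void i915_ttm_set_shrinkable(struct drm_i915_gem_object *obj,
				    bool shrinkable)
{
	if (!i915_gem_object_is_shrinkable(obj))
		return; /* ops capability not set; nothing to toggle */

	if (shrinkable)
		i915_gem_object_make_shrinkable(obj);
	else
		i915_gem_object_make_unshrinkable(obj);
}
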

>>          .get_pages = i915_ttm_get_pages,
>>          .put_pages = i915_ttm_put_pages,
>> @@ -599,6 +658,10 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
>>   {
>>          static struct lock_class_key lock_class;
>>          struct drm_i915_private *i915 = mem->i915;
>> +       struct ttm_operation_ctx ctx = {
>> +               .interruptible = true,
>> +               .no_wait_gpu = false,
>> +       };
>>          enum ttm_bo_type bo_type;
>>          size_t alignment = 0;
>>          int ret;
>> @@ -618,15 +681,14 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
>>          i915_gem_object_init(obj, &i915_gem_ttm_obj_ops, &lock_class, flags);
>>          i915_gem_object_init_memory_region(obj, mem);
>>          i915_gem_object_make_unshrinkable(obj);

Here we set the object to unshrinkable.

/Thomas



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Intel-gfx] [PATCH v3 2/4] drm/i915/ttm: Adjust gem flags and caching settings after a move
  2021-06-14 10:32       ` Thomas Hellström
@ 2021-06-14 10:49         ` Matthew Auld
  -1 siblings, 0 replies; 18+ messages in thread
From: Matthew Auld @ 2021-06-14 10:49 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Intel Graphics Development, Matthew Auld, ML dri-devel

On Mon, 14 Jun 2021 at 11:32, Thomas Hellström
<thomas.hellstrom@linux.intel.com> wrote:
>
>
> On 6/14/21 12:20 PM, Matthew Auld wrote:
> > On Mon, 14 Jun 2021 at 10:53, Thomas Hellström
> > <thomas.hellstrom@linux.intel.com> wrote:
> >> After a TTM move or object init we need to update the i915 gem flags and
> >> caching settings to reflect the new placement. Currently caching settings
> >> are not changed during the lifetime of an object, although that might
> >> change moving forward if we run into performance issues or issues with
> >> WC system page allocations.
> >> Also introduce gpu_binds_iomem() and cpu_maps_iomem() to clean up the
> >> various ways we previously used to detect this.
> >> Finally, initialize the TTM object reserved to be able to update
> >> flags and caching before anyone else gets hold of the object.
> >>
> >> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> >> ---
> >> v2:
> >> - Style fixes (Reported by Matthew Auld)
> >> v3:
> >> - More style fixes. Clarify why we're updating caching settings after move.
> >>    (Suggested by Matthew Auld)
> >> ---
> >>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 111 +++++++++++++++++++-----
> >>   1 file changed, 89 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> >> index 33ab47f1e05b..5176682a7d19 100644
> >> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> >> @@ -70,6 +70,17 @@ static struct ttm_placement i915_sys_placement = {
> >>          .busy_placement = &lmem0_sys_placement_flags[1],
> >>   };
> >>
> >> +static bool gpu_binds_iomem(struct ttm_resource *mem)
> >> +{
> >> +       return mem->mem_type != TTM_PL_SYSTEM;
> >> +}
> >> +
> >> +static bool cpu_maps_iomem(struct ttm_resource *mem)
> >> +{
> >> +       /* Once / if we support GGTT, this is also false for cached ttm_tts */
> >> +       return mem->mem_type != TTM_PL_SYSTEM;
> >> +}
> >> +
> >>   static void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj);
> >>
> >>   static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
> >> @@ -175,6 +186,40 @@ static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
> >>          obj->ttm.cached_io_st = NULL;
> >>   }
> >>
> >> +static void
> >> +i915_ttm_adjust_domains_after_cpu_move(struct drm_i915_gem_object *obj)
> >> +{
> >> +       struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> >> +
> >> +       if (cpu_maps_iomem(bo->resource) || bo->ttm->caching != ttm_cached) {
> >> +               obj->write_domain = I915_GEM_DOMAIN_WC;
> >> +               obj->read_domains = I915_GEM_DOMAIN_WC;
> >> +       } else {
> >> +               obj->write_domain = I915_GEM_DOMAIN_CPU;
> >> +               obj->read_domains = I915_GEM_DOMAIN_CPU;
> >> +       }
> >> +}
> >> +
> >> +static void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> >> +{
> >> +       struct drm_i915_private *i915 = to_i915(obj->base.dev);
> >> +       struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> >> +       unsigned int cache_level;
> >> +
> >> +       obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
> >> +
> >> +       obj->mem_flags |= cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
> >> +               I915_BO_FLAG_STRUCT_PAGE;
> >> +
> >> +       if ((HAS_LLC(i915) || HAS_SNOOP(i915)) && !gpu_binds_iomem(bo->resource) &&
> >> +           bo->ttm->caching == ttm_cached)
> >> +               cache_level = I915_CACHE_LLC;
> >> +       else
> >> +               cache_level = I915_CACHE_NONE;
> >> +
> >> +       i915_gem_object_set_cache_coherency(obj, cache_level);
> >> +}
> >> +
> >>   static void i915_ttm_purge(struct drm_i915_gem_object *obj)
> >>   {
> >>          struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> >> @@ -190,8 +235,10 @@ static void i915_ttm_purge(struct drm_i915_gem_object *obj)
> >>
> >>          /* TTM's purge interface. Note that we might be reentering. */
> >>          ret = ttm_bo_validate(bo, &place, &ctx);
> >> -
> >>          if (!ret) {
> >> +               obj->write_domain = 0;
> >> +               obj->read_domains = 0;
> >> +               i915_ttm_adjust_gem_after_move(obj);
> >>                  i915_ttm_free_cached_io_st(obj);
> >>                  obj->mm.madv = __I915_MADV_PURGED;
> >>          }
> >> @@ -273,12 +320,15 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
> >>                           struct ttm_resource *res)
> >>   {
> >>          struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> >> -       struct ttm_resource_manager *man =
> >> -               ttm_manager_type(bo->bdev, res->mem_type);
> >>
> >> -       if (man->use_tt)
> >> +       if (!gpu_binds_iomem(res))
> >>                  return i915_ttm_tt_get_st(bo->ttm);
> >>
> >> +       /*
> >> +        * If CPU mapping differs, we need to add the ttm_tt pages to
> >> +        * the resulting st. Might make sense for GGTT.
> >> +        */
> >> +       GEM_WARN_ON(!cpu_maps_iomem(res));
> >>          return intel_region_ttm_node_to_st(obj->mm.region, res);
> >>   }
> >>
> >> @@ -290,8 +340,6 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
> >>          struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> >>          struct ttm_resource_manager *dst_man =
> >>                  ttm_manager_type(bo->bdev, dst_mem->mem_type);
> >> -       struct ttm_resource_manager *src_man =
> >> -               ttm_manager_type(bo->bdev, bo->resource->mem_type);
> >>          struct intel_memory_region *dst_reg, *src_reg;
> >>          union {
> >>                  struct ttm_kmap_iter_tt tt;
> >> @@ -332,34 +380,36 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
> >>          if (IS_ERR(dst_st))
> >>                  return PTR_ERR(dst_st);
> >>
> >> -       /* If we start mapping GGTT, we can no longer use man::use_tt here. */
> >> -       dst_iter = dst_man->use_tt ?
> >> +       dst_iter = !cpu_maps_iomem(dst_mem) ?
> >>                  ttm_kmap_iter_tt_init(&_dst_iter.tt, bo->ttm) :
> >>                  ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
> >>                                           dst_st, dst_reg->region.start);
> >>
> >> -       src_iter = src_man->use_tt ?
> >> +       src_iter = !cpu_maps_iomem(bo->resource) ?
> >>                  ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
> >>                  ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
> >>                                           obj->ttm.cached_io_st,
> >>                                           src_reg->region.start);
> >>
> >>          ttm_move_memcpy(bo, dst_mem->num_pages, dst_iter, src_iter);
> >> +       /* Below dst_mem becomes bo->resource. */
> >>          ttm_bo_move_sync_cleanup(bo, dst_mem);
> >> +       i915_ttm_adjust_domains_after_cpu_move(obj);
> >>          i915_ttm_free_cached_io_st(obj);
> >>
> >> -       if (!dst_man->use_tt) {
> >> +       if (gpu_binds_iomem(dst_mem) || cpu_maps_iomem(dst_mem)) {
> >>                  obj->ttm.cached_io_st = dst_st;
> >>                  obj->ttm.get_io_page.sg_pos = dst_st->sgl;
> >>                  obj->ttm.get_io_page.sg_idx = 0;
> >>          }
> >>
> >> +       i915_ttm_adjust_gem_after_move(obj);
> >>          return 0;
> >>   }
> >>
> >>   static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)
> >>   {
> >> -       if (mem->mem_type < I915_PL_LMEM0)
> >> +       if (!cpu_maps_iomem(mem))
> >>                  return 0;
> >>
> >>          mem->bus.caching = ttm_write_combined;
> >> @@ -421,6 +471,16 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
> >>          if (ret)
> >>                  return ret == -ENOSPC ? -ENXIO : ret;
> >>
> >> +       i915_ttm_adjust_lru(obj);
> >> +       if (bo->ttm && !ttm_tt_is_populated(bo->ttm)) {
> >> +               ret = ttm_tt_populate(bo->bdev, bo->ttm, &ctx);
> >> +               if (ret)
> >> +                       return ret;
> >> +
> >> +               i915_ttm_adjust_domains_after_cpu_move(obj);
> >> +               i915_ttm_adjust_gem_after_move(obj);
> >> +       }
> >> +
> >>          /* Object either has a page vector or is an iomem object */
> >>          st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
> >>          if (IS_ERR(st))
> >> @@ -428,8 +488,6 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
> >>
> >>          __i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
> >>
> >> -       i915_ttm_adjust_lru(obj);
> >> -
> >>          return ret;
> >>   }
> >>
> >> @@ -563,6 +621,7 @@ static u64 i915_ttm_mmap_offset(struct drm_i915_gem_object *obj)
> >>
> >>   const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
> >>          .name = "i915_gem_object_ttm",
> >> +       .flags = I915_GEM_OBJECT_IS_SHRINKABLE,
> > We set is_shrinkable for both lmem and smem? Does the current shrinker
> > work with !shmem? I assume the put_pages() will just discard the page
> > contents, which is not what we want? Maybe keep this disabled for now?
> > Or am I missing something?
>
> This is to indicate to the gem shrinker that we want to be able to
> determine dynamically whether the object is shrinkable (by the gem
> shrinker) or not. (See below). It's intended for the situation where we
> might want to use shmem to back cached-only ttm objects, and use our own
> shrinker.
>
> But if you think it's better, we could flip that ops flag once we
> actually have anything shrinkable in place. Let me know what you think.
> The TTM shmem pool idea becomes a bit doubtful by the fact that
> allocating some 8GB of system objects, populating and clearing them
> appears to be some 6X faster with a straightforward alloc approach (like
> TTM or the "internal" backend) than with shmem...

I would vote for keeping it turned off for now.

>
> >>          .get_pages = i915_ttm_get_pages,
> >>          .put_pages = i915_ttm_put_pages,
> >> @@ -599,6 +658,10 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
> >>   {
> >>          static struct lock_class_key lock_class;
> >>          struct drm_i915_private *i915 = mem->i915;
> >> +       struct ttm_operation_ctx ctx = {
> >> +               .interruptible = true,
> >> +               .no_wait_gpu = false,
> >> +       };
> >>          enum ttm_bo_type bo_type;
> >>          size_t alignment = 0;
> >>          int ret;
> >> @@ -618,15 +681,14 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
> >>          i915_gem_object_init(obj, &i915_gem_ttm_obj_ops, &lock_class, flags);
> >>          i915_gem_object_init_memory_region(obj, mem);
> >>          i915_gem_object_make_unshrinkable(obj);
>
> Here we set the object to unshrinkable.

Oh, I totally missed that. But I don't think we are meant to call that
here, especially with IS_SHRINKABLE, since object_set_pages will later
make it shrinkable anyway, which looks like fun and games.

>
> /Thomas
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Intel-gfx] [PATCH v3 2/4] drm/i915/ttm: Adjust gem flags and caching settings after a move
  2021-06-14 10:49         ` Matthew Auld
@ 2021-06-14 11:14           ` Thomas Hellström
  -1 siblings, 0 replies; 18+ messages in thread
From: Thomas Hellström @ 2021-06-14 11:14 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development, Matthew Auld, ML dri-devel


On 6/14/21 12:49 PM, Matthew Auld wrote:
> On Mon, 14 Jun 2021 at 11:32, Thomas Hellström
> <thomas.hellstrom@linux.intel.com> wrote:
>>
>> On 6/14/21 12:20 PM, Matthew Auld wrote:
>>> On Mon, 14 Jun 2021 at 10:53, Thomas Hellström
>>> <thomas.hellstrom@linux.intel.com> wrote:
>>>> After a TTM move or object init we need to update the i915 gem flags and
>>>> caching settings to reflect the new placement. Currently caching settings
>>>> are not changed during the lifetime of an object, although that might
>>>> change moving forward if we run into performance issues or issues with
>>>> WC system page allocations.
>>>> Also introduce gpu_binds_iomem() and cpu_maps_iomem() to clean up the
>>>> various ways we previously used to detect this.
>>>> Finally, initialize the TTM object reserved to be able to update
>>>> flags and caching before anyone else gets hold of the object.
>>>>
>>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> ---
>>>> v2:
>>>> - Style fixes (Reported by Matthew Auld)
>>>> v3:
>>>> - More style fixes. Clarify why we're updating caching settings after move.
>>>>     (Suggested by Matthew Auld)
>>>> ---
>>>>    drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 111 +++++++++++++++++++-----
>>>>    1 file changed, 89 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>> index 33ab47f1e05b..5176682a7d19 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>> @@ -70,6 +70,17 @@ static struct ttm_placement i915_sys_placement = {
>>>>           .busy_placement = &lmem0_sys_placement_flags[1],
>>>>    };
>>>>
>>>> +static bool gpu_binds_iomem(struct ttm_resource *mem)
>>>> +{
>>>> +       return mem->mem_type != TTM_PL_SYSTEM;
>>>> +}
>>>> +
>>>> +static bool cpu_maps_iomem(struct ttm_resource *mem)
>>>> +{
>>>> +       /* Once / if we support GGTT, this is also false for cached ttm_tts */
>>>> +       return mem->mem_type != TTM_PL_SYSTEM;
>>>> +}
>>>> +
>>>>    static void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj);
>>>>
>>>>    static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>>>> @@ -175,6 +186,40 @@ static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
>>>>           obj->ttm.cached_io_st = NULL;
>>>>    }
>>>>
>>>> +static void
>>>> +i915_ttm_adjust_domains_after_cpu_move(struct drm_i915_gem_object *obj)
>>>> +{
>>>> +       struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>>>> +
>>>> +       if (cpu_maps_iomem(bo->resource) || bo->ttm->caching != ttm_cached) {
>>>> +               obj->write_domain = I915_GEM_DOMAIN_WC;
>>>> +               obj->read_domains = I915_GEM_DOMAIN_WC;
>>>> +       } else {
>>>> +               obj->write_domain = I915_GEM_DOMAIN_CPU;
>>>> +               obj->read_domains = I915_GEM_DOMAIN_CPU;
>>>> +       }
>>>> +}
>>>> +
>>>> +static void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>>> +{
>>>> +       struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>>> +       struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>>>> +       unsigned int cache_level;
>>>> +
>>>> +       obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
>>>> +
>>>> +       obj->mem_flags |= cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
>>>> +               I915_BO_FLAG_STRUCT_PAGE;
>>>> +
>>>> +       if ((HAS_LLC(i915) || HAS_SNOOP(i915)) && !gpu_binds_iomem(bo->resource) &&
>>>> +           bo->ttm->caching == ttm_cached)
>>>> +               cache_level = I915_CACHE_LLC;
>>>> +       else
>>>> +               cache_level = I915_CACHE_NONE;
>>>> +
>>>> +       i915_gem_object_set_cache_coherency(obj, cache_level);
>>>> +}
>>>> +
>>>>    static void i915_ttm_purge(struct drm_i915_gem_object *obj)
>>>>    {
>>>>           struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>>>> @@ -190,8 +235,10 @@ static void i915_ttm_purge(struct drm_i915_gem_object *obj)
>>>>
>>>>           /* TTM's purge interface. Note that we might be reentering. */
>>>>           ret = ttm_bo_validate(bo, &place, &ctx);
>>>> -
>>>>           if (!ret) {
>>>> +               obj->write_domain = 0;
>>>> +               obj->read_domains = 0;
>>>> +               i915_ttm_adjust_gem_after_move(obj);
>>>>                   i915_ttm_free_cached_io_st(obj);
>>>>                   obj->mm.madv = __I915_MADV_PURGED;
>>>>           }
>>>> @@ -273,12 +320,15 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
>>>>                            struct ttm_resource *res)
>>>>    {
>>>>           struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>>>> -       struct ttm_resource_manager *man =
>>>> -               ttm_manager_type(bo->bdev, res->mem_type);
>>>>
>>>> -       if (man->use_tt)
>>>> +       if (!gpu_binds_iomem(res))
>>>>                   return i915_ttm_tt_get_st(bo->ttm);
>>>>
>>>> +       /*
>>>> +        * If CPU mapping differs, we need to add the ttm_tt pages to
>>>> +        * the resulting st. Might make sense for GGTT.
>>>> +        */
>>>> +       GEM_WARN_ON(!cpu_maps_iomem(res));
>>>>           return intel_region_ttm_node_to_st(obj->mm.region, res);
>>>>    }
>>>>
>>>> @@ -290,8 +340,6 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>>>>           struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>>>>           struct ttm_resource_manager *dst_man =
>>>>                   ttm_manager_type(bo->bdev, dst_mem->mem_type);
>>>> -       struct ttm_resource_manager *src_man =
>>>> -               ttm_manager_type(bo->bdev, bo->resource->mem_type);
>>>>           struct intel_memory_region *dst_reg, *src_reg;
>>>>           union {
>>>>                   struct ttm_kmap_iter_tt tt;
>>>> @@ -332,34 +380,36 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
>>>>           if (IS_ERR(dst_st))
>>>>                   return PTR_ERR(dst_st);
>>>>
>>>> -       /* If we start mapping GGTT, we can no longer use man::use_tt here. */
>>>> -       dst_iter = dst_man->use_tt ?
>>>> +       dst_iter = !cpu_maps_iomem(dst_mem) ?
>>>>                   ttm_kmap_iter_tt_init(&_dst_iter.tt, bo->ttm) :
>>>>                   ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
>>>>                                            dst_st, dst_reg->region.start);
>>>>
>>>> -       src_iter = src_man->use_tt ?
>>>> +       src_iter = !cpu_maps_iomem(bo->resource) ?
>>>>                   ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
>>>>                   ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
>>>>                                            obj->ttm.cached_io_st,
>>>>                                            src_reg->region.start);
>>>>
>>>>           ttm_move_memcpy(bo, dst_mem->num_pages, dst_iter, src_iter);
>>>> +       /* Below dst_mem becomes bo->resource. */
>>>>           ttm_bo_move_sync_cleanup(bo, dst_mem);
>>>> +       i915_ttm_adjust_domains_after_cpu_move(obj);
>>>>           i915_ttm_free_cached_io_st(obj);
>>>>
>>>> -       if (!dst_man->use_tt) {
>>>> +       if (gpu_binds_iomem(dst_mem) || cpu_maps_iomem(dst_mem)) {
>>>>                   obj->ttm.cached_io_st = dst_st;
>>>>                   obj->ttm.get_io_page.sg_pos = dst_st->sgl;
>>>>                   obj->ttm.get_io_page.sg_idx = 0;
>>>>           }
>>>>
>>>> +       i915_ttm_adjust_gem_after_move(obj);
>>>>           return 0;
>>>>    }
>>>>
>>>>    static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)
>>>>    {
>>>> -       if (mem->mem_type < I915_PL_LMEM0)
>>>> +       if (!cpu_maps_iomem(mem))
>>>>                   return 0;
>>>>
>>>>           mem->bus.caching = ttm_write_combined;
>>>> @@ -421,6 +471,16 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
>>>>           if (ret)
>>>>                   return ret == -ENOSPC ? -ENXIO : ret;
>>>>
>>>> +       i915_ttm_adjust_lru(obj);
>>>> +       if (bo->ttm && !ttm_tt_is_populated(bo->ttm)) {
>>>> +               ret = ttm_tt_populate(bo->bdev, bo->ttm, &ctx);
>>>> +               if (ret)
>>>> +                       return ret;
>>>> +
>>>> +               i915_ttm_adjust_domains_after_cpu_move(obj);
>>>> +               i915_ttm_adjust_gem_after_move(obj);
>>>> +       }
>>>> +
>>>>           /* Object either has a page vector or is an iomem object */
>>>>           st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
>>>>           if (IS_ERR(st))
>>>> @@ -428,8 +488,6 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
>>>>
>>>>           __i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
>>>>
>>>> -       i915_ttm_adjust_lru(obj);
>>>> -
>>>>           return ret;
>>>>    }
>>>>
>>>> @@ -563,6 +621,7 @@ static u64 i915_ttm_mmap_offset(struct drm_i915_gem_object *obj)
>>>>
>>>>    const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
>>>>           .name = "i915_gem_object_ttm",
>>>> +       .flags = I915_GEM_OBJECT_IS_SHRINKABLE,
>>> We set is_shrinkable for both lmem and smem? Does the current shrinker
>>> work with !shmem? I assume the put_pages() will just discard the page
>>> contents, which is not what we want? Maybe keep this disabled for now?
>>> Or am I missing something?
>> This is to indicate to the gem shrinker that we want to be able to
>> determine dynamically whether the object is shrinkable (by the gem
>> shrinker) or not. (See below). It's intended for the situation where we
>> might want to use shmem to back cached-only ttm objects, and use our own
>> shrinker.
>>
>> But if you think it's better, we could flip that ops flag once we
>> actually have anything shrinkable in place. Let me know what you think.
>> The TTM shmem pool idea becomes a bit doubtful by the fact that
>> allocating some 8GB of system objects, populating and clearing them
>> appears to be some 6X faster with a straightforward alloc approach (like
>> TTM or the "internal" backend) than with shmem...
> I would vote for keeping it turned off for now.
>
>>>>           .get_pages = i915_ttm_get_pages,
>>>>           .put_pages = i915_ttm_put_pages,
>>>> @@ -599,6 +658,10 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
>>>>    {
>>>>           static struct lock_class_key lock_class;
>>>>           struct drm_i915_private *i915 = mem->i915;
>>>> +       struct ttm_operation_ctx ctx = {
>>>> +               .interruptible = true,
>>>> +               .no_wait_gpu = false,
>>>> +       };
>>>>           enum ttm_bo_type bo_type;
>>>>           size_t alignment = 0;
>>>>           int ret;
>>>> @@ -618,15 +681,14 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
>>>>           i915_gem_object_init(obj, &i915_gem_ttm_obj_ops, &lock_class, flags);
>>>>           i915_gem_object_init_memory_region(obj, mem);
>>>>           i915_gem_object_make_unshrinkable(obj);
>> Here we set the object to unshrinkable.
> Oh, I totally missed that. But I don't think we are meant to call that
> here, especially with IS_SHRINKABLE, since object_set_pages will later
> make it shrinkable anyway, which looks like fun and games.

Indeed. OK, I'll respin to keep these objects !IS_SHRINKABLE for now.

Need to take a deeper look into that interface...
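
(Editorial aside, not part of the thread: the agreed respin presumably just
drops the flag from the ops for now, along the lines of the sketch below;
members not shown are assumed unchanged from the patch.)

const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
	.name = "i915_gem_object_ttm",
	/* No I915_GEM_OBJECT_IS_SHRINKABLE until a TTM-aware shrinker exists. */
	.get_pages = i915_ttm_get_pages,
	.put_pages = i915_ttm_put_pages,
	/* Remaining members as in the patch, elided in this sketch. */
};
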

/Thomas



>> /Thomas
>>
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-06-14 11:14 UTC | newest]

Thread overview: 18+ messages
2021-06-14  9:52 [PATCH v3 0/4] drm/i915: Move system memory to TTM for discrete Thomas Hellström
2021-06-14  9:52 ` [Intel-gfx] " Thomas Hellström
2021-06-14  9:52 ` [PATCH v3 1/4] drm/i915: Update object placement flags to be mutable Thomas Hellström
2021-06-14  9:52   ` [Intel-gfx] " Thomas Hellström
2021-06-14  9:52 ` [PATCH v3 2/4] drm/i915/ttm: Adjust gem flags and caching settings after a move Thomas Hellström
2021-06-14  9:52   ` [Intel-gfx] " Thomas Hellström
2021-06-14 10:20   ` Matthew Auld
2021-06-14 10:20     ` Matthew Auld
2021-06-14 10:32     ` Thomas Hellström
2021-06-14 10:32       ` Thomas Hellström
2021-06-14 10:49       ` Matthew Auld
2021-06-14 10:49         ` Matthew Auld
2021-06-14 11:14         ` Thomas Hellström
2021-06-14 11:14           ` Thomas Hellström
2021-06-14  9:52 ` [PATCH v3 3/4] drm/i915/ttm: Calculate the object placement at get_pages time Thomas Hellström
2021-06-14  9:52   ` [Intel-gfx] " Thomas Hellström
2021-06-14  9:52 ` [PATCH v3 4/4] drm/i915/ttm: Use TTM for system memory Thomas Hellström
2021-06-14  9:52   ` [Intel-gfx] " Thomas Hellström
